Renáta Németh, Dávid Simon
With low measurement level variables, their distribution could be visualized using crosstabs. Do crosstabs work for high measurement level variables?
Let’s look at the distribution of age and income in Hungary in 1995
It seems, crosstabs are not very useful here, for various reasons:
the tables would be too large to take in
there would be many empty cells
there would be too few cases per cell
in all, crosstabs don’t answer the questions posed above
It seems more useful to use some kind of graph
This kind of graph is called a scatterplot.
Axis y (vertical): if it’s possible to interpret, then it usually depicts the dependent variable
Axis y (horizontal): if it’s possible to interpret, then it usually depicts the dependent variable.
One point (here small square) stands for one case.
What does the graph tell us?
the range of the variables (their min and max value) on the two axes
the tendencies of the relationship (or lack of thereof, its direction and shape!)
the presence or lack of extreme cases
To characterise the relationship we need to decide if we can see any relationship between the two variables according to their joined distribution
Let’s revise what makes a variable dependent or independent in a relationship!
For low measurement level variables:
There is a relationship between the two variables if the distribution of the dependent variable is different in various categories of the independent variable
If the two variables are independent of each other, the distribution of one does not vary according to the categories of the other
Note: Dependence is always symmetrical, so reversing the roles of dependent and independent variable mustn’t change the fact of there being a relationship between them
The definition of independence for high measurement level variables:
The conditional distribution of the dependent variable (its distribution if the independent variable takes a specific value) is the same regardless the independent variable as a condition
Less precisely: the dependent variable will take the same value for all the values of the independent variable
If we revisit the graph about age against income, can we tell if the two variables are independent?
What if we disregard the income categories 150 000+ and 0.
Now it’s more obvious that they’re not independent. How could we describe their relationship?