Ugrás a tartalomhoz

## SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE

Visualization

## Visualization

With low measurement level variables, their distribution could be visualized using crosstabs. Do crosstabs work for high measurement level variables?

Let’s look at the distribution of age and income in Hungary in 1995 It seems, crosstabs are not very useful here, for various reasons:

• the tables would be too large to take in

• there would be many empty cells

• there would be too few cases per cell

• in all, crosstabs don’t answer the questions posed above

It seems more useful to use some kind of graph This kind of graph is called a scatterplot.

Terminology:

Axis y (vertical): if it’s possible to interpret, then it usually depicts the dependent variable

Axis y (horizontal): if it’s possible to interpret, then it usually depicts the dependent variable.

One point (here small square) stands for one case.

What does the graph tell us?

• the range of the variables (their min and max value) on the two axes

• the tendencies of the relationship (or lack of thereof, its direction and shape!)

• the presence or lack of extreme cases

To characterise the relationship we need to decide if we can see any relationship between the two variables according to their joined distribution

Let’s revise what makes a variable dependent or independent in a relationship!

For low measurement level variables:

• There is a relationship between the two variables if the distribution of the dependent variable is different in various categories of the independent variable

• If the two variables are independent of each other, the distribution of one does not vary according to the categories of the other

Note: Dependence is always symmetrical, so reversing the roles of dependent and independent variable mustn’t change the fact of there being a relationship between them

The definition of independence for high measurement level variables:

• The conditional distribution of the dependent variable (its distribution if the independent variable takes a specific value) is the same regardless the independent variable as a condition

• Less precisely: the dependent variable will take the same value for all the values of the independent variable

If we revisit the graph about age against income, can we tell if the two variables are independent?

What if we disregard the income categories 150 000+ and 0. Now it’s more obvious that they’re not independent. How could we describe their relationship?