SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE

The relationship of nominal or ordinal variables with crosstabs

Crosstab: showing the joint distribution of two nominal or ordinal variables within one single table

Joint distribution: the distribution of each variable within the categories of the other one
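As a minimal sketch of what a crosstab is, the snippet below counts how often each combination of two categories occurs. The records are hypothetical toy data invented for illustration (they are not the survey data), and the helper name `crosstab` is likewise an assumption.

```python
from collections import Counter

# Hypothetical toy records: (settlement type, origin of best friendship).
# Illustrative only; NOT taken from the Social Report.
records = [
    ("Capital", "School"),
    ("Capital", "Work"),
    ("Village", "Childhood"),
    ("Village", "Childhood"),
    ("Village", "Neighbours"),
    ("Capital", "School"),
]


def crosstab(pairs):
    """Return table[row][col] = number of records with that combination."""
    counts = Counter(pairs)
    col_cats = sorted({col for col, _ in pairs})
    row_cats = sorted({row for _, row in pairs})
    # Combinations that never occur get a frequency of 0.
    return {r: {c: counts[(c, r)] for c in col_cats} for r in row_cats}


table = crosstab(records)
for row, cells in table.items():
    print(row, cells)
```

Each printed line is one row of the crosstab, and each value is a cell frequency.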

We know how the cases are distributed over every combination of the two variables' categories. E.g.: the distribution of the origins of best friendships by settlement type (from: Social Report, 2002, KSH)

| Origin of best friendship / Settlement type | Capital | County capital | Other city or town | Village | Total |
|---|---|---|---|---|---|
| Childhood | 22.2 | 20.0 | 22.2 | 29.5 | 24.2 |
| School | 33.6 | 24.9 | 22.9 | 18.4 | 24.0 |
| Work | 21.1 | 24.7 | 23.5 | 16.9 | 21.1 |
| Family | 5.7 | 5.3 | 6.7 | 8.1 | 6.7 |
| Neighbours | 8.7 | 13.1 | 13.5 | 15.3 | 13.0 |
| Other | 8.6 | 11.9 | 11.2 | 11.8 | 11.0 |
| Total | 100% | 100% | 100% | 100% | 100% |

Level of measurement? What do the rows and columns stand for?

What kind of percentage data does the table give for the joint distribution?

(Question: To what extent are you attached to the country of your residence? ISSP 1995)

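| | USA % | USA N | H % | H N | SK % | SK N |
|---|---|---|---|---|---|---|
| Very much | 35.4% | 463 | 79.6% | 794 | 41.6% | 567 |
| Considerably | 45.6% | 596 | 16.8% | 168 | 47.7% | 650 |
| Not very much | 15.3% | 200 | 2.8% | 28 | 7.6% | 103 |
| Not at all | 3.7% | 48 | 0.8% | 8 | 3.1% | 42 |
| Total | 100% | 1307 | 100% | 998 | 100% | 1362 |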

% within country

| | D-E | H | CZ | SLO | PL | BG | RUS | LV | SK |
|---|---|---|---|---|---|---|---|---|---|
| Very much | 27.7% | 79.6% | 47.5% | 49.3% | 54.6% | 72.1% | 41.7% | 41.3% | 41.6% |
| Considerably | 53.6% | 16.8% | 44.2% | 43.7% | 39.3% | 20.6% | 40.1% | 45.1% | 47.7% |
| Not very much | 16.6% | 2.8% | 7.0% | 6.0% | 5.1% | 4.3% | 12.3% | 10.9% | 7.6% |
| Not at all | 2.1% | 0.8% | 1.4% | 1.1% | 0.9% | 3.1% | 5.9% | 2.7% | 3.1% |
| Total | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

What's the difference?

Row percentage

Column percentage

Cell percentage
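To make the difference concrete, here is a small sketch that computes all three kinds of percentage from one table of counts. The counts are the USA, H and SK frequencies from the ISSP 1995 table; the function names are assumptions chosen for illustration.

```python
# Counts from the ISSP 1995 attachment table (columns: USA, H, SK).
row_labels = ["Very much", "Considerably", "Not very much", "Not at all"]
col_labels = ["USA", "H", "SK"]
counts = [
    [463, 794, 567],
    [596, 168, 650],
    [200, 28, 103],
    [48, 8, 42],
]


def row_percentages(counts):
    """Each cell as a percentage of its row total (rows sum to 100)."""
    return [[100 * cell / sum(row) for cell in row] for row in counts]


def column_percentages(counts):
    """Each cell as a percentage of its column total (columns sum to 100)."""
    col_totals = [sum(col) for col in zip(*counts)]
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in counts]


def cell_percentages(counts):
    """Each cell as a percentage of the grand total (all cells sum to 100)."""
    grand_total = sum(sum(row) for row in counts)
    return [[100 * cell / grand_total for cell in row] for row in counts]


row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
cell_pct = cell_percentages(counts)

for label, row in zip(row_labels, col_pct):
    print(label, [round(value, 1) for value in row])
```

The three functions differ only in the denominator: row total, column total, or grand total. The column percentages are the "% within country" figures, for instance 794 / 998 ≈ 79.6% for "Very much" in Hungary.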

Dependent and independent variables

If our hypothesis is that the origin of best friendships varies by type of settlement, that is, where you live affects where you make friends, then in this model 'origin of friendship' is the dependent variable and 'settlement type' is the independent variable.

Crosstabulation: terminology (in the above example)

Row variable: origin of friendship

Column variable: type of settlement

Cell: the overlap of a given row with a given column

Marginal: the distribution of the row variable or of the column variable on its own, without breaking it down by the other variable (here: the last column, Total)

In the example above, it was more practical to give the column percentages, since the column variable was the independent one: each settlement type adds up to 100%

If the data can be presented both ways (with either the row or the column variable being dependent), we can give both the row and the column percentages
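The marginal distributions defined above can be obtained from the cell counts by summing along rows or along columns. The sketch below does this for the USA, H and SK counts of the ISSP 1995 table; the function name `marginals` is an assumption made for illustration.

```python
# Counts from the ISSP 1995 attachment table (columns: USA, H, SK).
row_labels = ["Very much", "Considerably", "Not very much", "Not at all"]
col_labels = ["USA", "H", "SK"]
counts = [
    [463, 794, 567],
    [596, 168, 650],
    [200, 28, 103],
    [48, 8, 42],
]


def marginals(counts):
    """Return (row marginal, column marginal) as percentages of all cases."""
    row_totals = [sum(row) for row in counts]          # sum across columns
    col_totals = [sum(col) for col in zip(*counts)]    # sum across rows
    n = sum(row_totals)                                # grand total
    row_marginal = [100 * total / n for total in row_totals]
    col_marginal = [100 * total / n for total in col_totals]
    return row_marginal, col_marginal


row_marginal, col_marginal = marginals(counts)
for label, pct in zip(row_labels, row_marginal):
    print(f"{label}: {pct:.1f}%")
for label, pct in zip(col_labels, col_marginal):
    print(f"{label}: {pct:.1f}%")
```

The row marginal is the distribution of the answers with the three countries pooled. The column marginal only shows each country's share of the pooled sample.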

Example for investigating dependency

A fictitious example: the relationship between financial status and mental health

The direction of the relationship is not obvious – why?

How can you interpret the tables below? Which table presents which type of influence?

Column percentage:

| Mental health (having a medical condition) | Financial status: relatively bad | Financial status: relatively good | Total |
|---|---|---|---|
| Yes | 46% | 43% | 44% |
| No | 54% | 57% | 56% |
| Total | 100% | 100% | 100% |

Row percentage:

| Mental health (having a medical condition) | Financial status: relatively bad | Financial status: relatively good | Total |
|---|---|---|---|
| Yes | 44% | 56% | 100% |
| No | 42% | 58% | 100% |
| Total | 43% | 57% | 100% |
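Both tables describe the same joint distribution, so one can be derived from the other once a marginal is known. As a rough sketch, the snippet below takes the column percentages and the marginal distribution of financial status (43% / 57%, from the Total row of the row-percentage table) and recovers the row percentages. The variable and function names are assumptions for illustration.

```python
# Fictitious figures from the two tables, written as proportions.
# P(condition | financial status): the column percentages.
condition_given_status = {
    "Relatively bad": {"Yes": 0.46, "No": 0.54},
    "Relatively good": {"Yes": 0.43, "No": 0.57},
}
# Marginal distribution of financial status
# (Total row of the row-percentage table).
status_share = {"Relatively bad": 0.43, "Relatively good": 0.57}


def status_given_condition(condition_given_status, status_share):
    """Turn column percentages plus the column marginal into row percentages."""
    # Cell proportion = P(status) * P(condition | status)
    cell = {
        status: {cond: status_share[status] * p for cond, p in dist.items()}
        for status, dist in condition_given_status.items()
    }
    result = {}
    for cond in ["Yes", "No"]:
        row_total = sum(cell[status][cond] for status in cell)
        result[cond] = {status: cell[status][cond] / row_total for status in cell}
    return result


row_pct = status_given_condition(condition_given_status, status_share)
for cond, dist in row_pct.items():
    print(cond, {status: round(100 * p, 1) for status, p in dist.items()})
```

The recovered row percentages agree with the row-percentage table to within about one percentage point. A gap of that size is expected, because the inputs are rounded to whole percentages.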