SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE

Unit of analysis

Unit of analysis is the level of social life on which the analysis focuses (individuals, countries, companies etc.).

Example:

  • comparing children in two classrooms on test scores – unit of analysis is the individual child

  • comparing the two classes on classroom climate – unit of analysis is the group (the classroom).

The example of the ecological fallacy (see Page 8) shows how important it is to choose the appropriate unit of analysis. The fallacy arises when data collected at the group level (here, counties) are used to draw conclusions about individuals.

Dependent and independent variables

A previous example (see the section Role of statistics in social research) of research on intercompany relations:

according to our hypothesis, company size affects the type of intercompany relations.

In this context, type of relations is called the dependent variable, while company size is called the independent variable.

The particular research question determines the role of the variables. In another study, type of relations can be the independent variable (“Does type of intercompany relations affect business results?”).

Dependent variable: what we want to explain

Independent variable: what is expected to account for the dependent variable

Does the empirical relationship imply causation?

An empirical relationship between two variables does not automatically imply that one causes the other (see the example about smoking and seeing the GP on Page 12).

Two variables are causally related if

  1. the cause precedes the effect in time (in some cases the order is not clear, e.g. political preference/antisemitism, education/self-esteem), and

  2. there is an empirical relationship between the cause and the effect, and

  3. this relationship cannot be explained by other factors (see Page 12: the relationship between seeing the GP and smoking may be explained by gender).

Proof of causation is more problematic in the social sciences than in the natural sciences.

Suggested terminology: dependent/independent variables instead of cause/effect.

Example

Debate on drug policy: punishment or prevention/rehabilitation?

Suppose stricter punishment for drug users is introduced in a country. Two years later, the statistics show a significant decrease in drug use.

Did the change in drug policy reduce drug use?

Sample and population

A population is the total set of objects (individuals, groups, etc.) which the research question concerns.

Usually it is not possible to study the whole population (due to limitations in time and resources). Instead, we select a subset (a sample) from the population and generalize the results to the entire population.

Descriptive statistics and inferential statistics

Descriptive statistics: organizes, summarizes and describes data on the sample or on the population

Statistical inference: inferences about the whole population from observations of a sample

Important question: Is an attribute of a sample an accurate estimate for a population attribute?

Example: party preference surveys.

The tools of statistical inference help determine the accuracy of the sample estimates.
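The question of how accurately a sample reflects the population can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population below is hypothetical (100,000 voters, 40% of whom support party A), and the names `population`, `sample_share` and `estimates` are invented for the illustration. It draws several simple random samples of 1,000 and prints how far each sample share falls from the population share.

```python
import random

# Hypothetical population (an assumption for illustration, not survey data):
# 100,000 voters, 40,000 of whom support party A (coded 1), the rest coded 0.
POPULATION_SIZE = 100_000
SUPPORTERS = 40_000
population = [1] * SUPPORTERS + [0] * (POPULATION_SIZE - SUPPORTERS)
true_share = sum(population) / POPULATION_SIZE


def sample_share(pop, n, rng):
    """Share of supporters in a simple random sample of size n (without replacement)."""
    return sum(rng.sample(pop, n)) / n


rng = random.Random(2006)  # fixed seed so repeated runs draw the same samples
estimates = [sample_share(population, 1000, rng) for _ in range(5)]

print(f"population share: {true_share:.3f}")
for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i}: {estimate:.3f} (error {estimate - true_share:+.3f})")
```

Each sample gives a slightly different estimate. Statistical inference quantifies how large such deviations are likely to be.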

The present course covers methods of descriptive statistics. Statistical inference will be discussed in later courses.

It is important to make the distinction in the wording as well:

"X% of the interviewees": we describe data on the sample.

"From our last two surveys, we can conclude that support for party A has increased": statistical inference (especially if two distinct samples were drawn).

Frequency distributions

Data collection → 1,500 completed questionnaires → summary statistics

A frequency distribution is a table that presents the number of observations that fall into each category of the variable.

International Social Survey Programme (ISSP) 2006, Role of government.

“Do you think it should or should not be the government’s responsibility to reduce income differences between the rich and the poor?”

                             Hungary
Definitely should be             490
Probably should be               352
Probably should not be           119
Definitely should not be          23
Total                            984

The table shows the frequency distribution of the variable. Interpret the table.

(As an aside: do you think the sample consisted of exactly 984 persons?)

Interpretation is often easier using a percentage distribution:

                             Hungary         %
Definitely should be             490     49.8%
Probably should be               352     35.8%
Probably should not be           119     12.1%
Definitely should not be          23      2.3%
Total                            984    100.0%

How do we obtain a percentage distribution from a frequency distribution?
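The calculation divides each category's frequency by the total number of observations and multiplies by 100. A minimal sketch using the Hungarian frequencies from the table above (the function name `percentage_distribution` is chosen only for this illustration):

```python
# Frequencies from the ISSP 2006 table for Hungary, as given in the text.
frequencies = {
    "Definitely should be": 490,
    "Probably should be": 352,
    "Probably should not be": 119,
    "Definitely should not be": 23,
}


def percentage_distribution(freqs):
    """Return each category's share of the total in percent, rounded to one decimal."""
    total = sum(freqs.values())
    return {category: round(100 * count / total, 1) for category, count in freqs.items()}


percentages = percentage_distribution(frequencies)
for category, pct in percentages.items():
    print(f"{category:<26}{frequencies[category]:>5}{pct:>8.1f}%")
```

Because each value is rounded separately, rounded percentages do not always add up to exactly 100.0.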

Interpret the table: What percentage of the sample thinks the government is responsible to some extent?

Comparing groups: row, column and cell percentages

The table below adds the frequency distributions for two other ISSP countries.

Interpret the data.

                             Hungary    Sweden       USA
Definitely should be             490       419       423
Probably should be               352       343       349
Probably should not be           119       253       394
Definitely should not be          23       110       311
Total                            984      1125      1477

Which country has the lowest number of persons who chose the answer "Probably should be"? Is this comparison meaningful?

No, because the sample sizes of the three countries differ.

How could we make a valid comparison?

To make a valid comparison we have to compare the column percentages:

                                   Hungary          Sweden             USA
                                 N       %       N       %       N       %
Definitely should be           490   49.8%     419   37.2%     423   28.6%
Probably should be             352   35.8%     343   30.5%     349   23.6%
Probably should not be         119   12.1%     253   22.5%     394   26.7%
Definitely should not be        23    2.3%     110    9.8%     311   21.1%
Total                          984  100.0%    1125  100.0%    1477  100.0%

Interpret the data. Are your findings in accordance with your background knowledge?
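The column percentages in the table above can be reproduced by dividing every count by its own column (country) total. A minimal sketch, with the counts taken from the text and the names `counts` and `column_percentages` chosen only for this illustration:

```python
answers = [
    "Definitely should be",
    "Probably should be",
    "Probably should not be",
    "Definitely should not be",
]

# One list of counts per country (column), in the order of `answers`.
counts = {
    "Hungary": [490, 352, 119, 23],
    "Sweden": [419, 343, 253, 110],
    "USA": [423, 349, 394, 311],
}


def column_percentages(counts_by_column):
    """Express each count as a percentage of its own column total."""
    result = {}
    for column, values in counts_by_column.items():
        total = sum(values)
        result[column] = [round(100 * value / total, 1) for value in values]
    return result


col_pct = column_percentages(counts)
for i, answer in enumerate(answers):
    row = "".join(f"{col_pct[country][i]:>9.1f}%" for country in counts)
    print(f"{answer:<26}{row}")
```

Dividing by the column total removes the effect of the different sample sizes, which is what makes the countries comparable.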

Remark: comparative cross-national research always faces the problem of translation.

Based on our background knowledge, what kind of hypotheses can we make that could explain the cross-country differences?

1. USA vs. Hungary: public support for the redistributive role of the state is stronger in post-socialist countries

2. Sweden vs. USA: the state has a stronger role in Scandinavian than in liberal welfare regimes.

How can we test the hypotheses?

We should add further countries to the analysis:

1. other post-socialist countries,

2. other liberal and Scandinavian welfare regimes.

The table below presents ISSP data on other post-socialist countries. Do the data support our first hypothesis?

                            Croatia  Czech Republic  Hungary   Latvia   Poland   Russia  Slovenia
Definitely should be          55.5%           21.7%    49.8%    38.9%    54.1%    53.1%     54.2%
Probably should be            29.1%           32.9%    35.8%    44.4%    33.6%    33.1%     36.6%
Probably should not be         9.8%           28.6%    12.1%    13.3%     9.0%    11.1%      7.9%
Definitely should not be       5.6%           16.8%     2.3%     3.5%     3.3%     2.7%      1.3%
Total                        100.0%          100.0%   100.0%   100.0%   100.0%   100.0%    100.0%
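One simple way to compare the countries is to add up the two supportive answers ("Definitely should be" and "Probably should be") for each country. Collapsing the categories like this is an assumption made for this sketch, not a procedure prescribed in the text. The percentages are those in the table above.

```python
# Column percentages from the table above, in the order:
# definitely should, probably should, probably should not, definitely should not.
post_socialist = {
    "Croatia": (55.5, 29.1, 9.8, 5.6),
    "Czech Republic": (21.7, 32.9, 28.6, 16.8),
    "Hungary": (49.8, 35.8, 12.1, 2.3),
    "Latvia": (38.9, 44.4, 13.3, 3.5),
    "Poland": (54.1, 33.6, 9.0, 3.3),
    "Russia": (53.1, 33.1, 11.1, 2.7),
    "Slovenia": (54.2, 36.6, 7.9, 1.3),
}

# Share answering that the government "definitely" or "probably" should act.
support = {
    country: round(values[0] + values[1], 1)
    for country, values in post_socialist.items()
}

# Print the countries from highest to lowest support.
for country, share in sorted(support.items(), key=lambda item: item[1], reverse=True):
    print(f"{country:<16}{share:>6.1f}%")
```

On this summary, six of the seven countries lie between 83.3% and 90.8%, while the Czech Republic stands apart at 54.6%. For comparison, the corresponding figure for the USA from the earlier table is 28.6% + 23.6% = 52.2%.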

One might compute row percentages instead of column percentages.

How to interpret the table below? Are row percentages meaningful in this case?

                             Hungary    Sweden       USA     Total
Definitely should be           36.8%     31.5%     31.8%      100%
Probably should be             33.7%     32.9%     33.4%      100%
Probably should not be         15.5%     33.0%     51.4%      100%
Definitely should not be        5.2%     24.8%     70.0%      100%
Total                          27.4%     31.4%     41.2%      100%

Note that if row and column variables are exchanged, then comparing row percentages becomes meaningful:

               Definitely       Probably       Probably     Definitely          Total
                should be      should be  should not be  should not be
Hungary             49.8%          35.8%          12.1%           2.3%         100.0%
Sweden              37.2%          30.5%          22.5%           9.8%         100.0%
USA                 28.6%          23.6%          26.7%          21.1%         100.0%

Hint: it is easy to tell whether a table presents row or column percentages. Row percentages sum to 100 within each row; column percentages sum to 100 within each column.
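This rule of thumb can be written as a small check. The sketch below is illustrative only: the function name `percentage_direction` and the tolerance of 0.2 percentage points (to allow for rounding) are assumptions, and the tables are entered without their Total row and column.

```python
def percentage_direction(table, tolerance=0.2):
    """Guess whether a table (list of rows, totals excluded) holds row or column percentages."""
    rows_sum_to_100 = all(abs(sum(row) - 100) <= tolerance for row in table)
    cols_sum_to_100 = all(abs(sum(col) - 100) <= tolerance for col in zip(*table))
    if rows_sum_to_100 and cols_sum_to_100:
        return "ambiguous"
    if rows_sum_to_100:
        return "row"
    if cols_sum_to_100:
        return "column"
    return "neither"


# Columns: Hungary, Sweden, USA. Rows: the four answer categories.
row_pct_table = [
    [36.8, 31.5, 31.8],
    [33.7, 32.9, 33.4],
    [15.5, 33.0, 51.4],
    [5.2, 24.8, 70.0],
]
col_pct_table = [
    [49.8, 37.2, 28.6],
    [35.8, 30.5, 23.6],
    [12.1, 22.5, 26.7],
    [2.3, 9.8, 21.1],
]

print(percentage_direction(row_pct_table))
print(percentage_direction(col_pct_table))
```

A table for which neither the rows nor the columns sum to 100 may contain cell percentages, where only the grand total is 100.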

Another way to construct a table is to compute cell percentages (also called absolute percentages), where each cell is expressed as a percentage of the grand total. The table below presents ISSP 2006 data on Hungary. Interpret the table.

                                      Attitude to law
Gov. resp.: reduce              Obey the law   Follow conscience
income differences         without exception        on occasions     Total
Definitely should be                   27.6%               22.3%     49.9%
Probably should be                     24.0%               11.4%     35.3%
Probably should not be                  6.8%                5.5%     12.2%
Definitely should not be                1.7%                0.8%      2.5%
Total                                  60.0%               40.0%    100.0%

What percentage of respondents obey the law without exception? And what percentage of respondents obey the law without exception AND think that the government definitely should reduce income differences?
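Cell percentages divide every count by the grand total of the table, so the row and column totals become the percentage distributions of the two variables. The counts behind the attitude-to-law table are not given in the text, so this sketch reuses the Hungary/Sweden/USA counts from the earlier table. The variable names are chosen only for this illustration.

```python
countries = ["Hungary", "Sweden", "USA"]
answers = [
    "Definitely should be",
    "Probably should be",
    "Probably should not be",
    "Definitely should not be",
]

# Counts from the three-country frequency table; one row per answer category.
counts = [
    [490, 419, 423],
    [352, 343, 349],
    [119, 253, 394],
    [23, 110, 311],
]

grand_total = sum(sum(row) for row in counts)

# Cell percentage: each count as a percentage of the grand total.
cell_pct = [[100 * count / grand_total for count in row] for row in counts]

# The margins are sums of cell percentages along each row or column.
row_margin = [sum(row) for row in cell_pct]
col_margin = [sum(col) for col in zip(*cell_pct)]

for answer, row, margin in zip(answers, cell_pct, row_margin):
    cells = "".join(f"{value:>8.1f}%" for value in row)
    print(f"{answer:<26}{cells}{margin:>8.1f}%")
totals = "".join(f"{value:>8.1f}%" for value in col_margin)
print(f"{'Total':<26}{totals}{sum(col_margin):>8.1f}%")
```

Read this way, a single cell answers an "AND" question (the share of all respondents who fall in that row and that column), while the margins answer questions about one variable alone.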

The ISSP

The International Social Survey Programme (ISSP) is a continuing annual program of cross-national collaboration on surveys covering topics important for social science research. It was launched in 1983; in 2011 it had 47 member countries. It offers the opportunity for cross-national comparisons (e.g. new vs. old EU member states) and, since some important topics are repeated, for cross-time comparisons (e.g. socialist countries before and after the transition). The annual topics concentrate on highly relevant issues:

1985 Role of Government I

1986 Social Networks

1987 Social Inequality I

1988 Family and Changing Gender Roles I

1989 Work Orientations I

1990 Role of Government II

1991 Religion I

1992 Social Inequality II

1993 Environment I

1994 Family and Changing Gender Roles II

1995 National Identity I

1996 Role of Government III

1997 Work Orientations II

1998 Religion II

1999 Social Inequality III

2000 Environment II

2001 Social Relations and Support Systems

2002 Family and Changing Gender Roles III

2003 National Identity II

2004 Citizenship

2005 Work Orientations III

2006 Role of Government IV

2007 Leisure Time and Sports

2008 Religion III

2009 Social Inequality IV

2010 Environment III

2011 Health

ISSP data will often be used as examples during the course.