## SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE

## Unit of analysis

The unit of analysis is the level of social life on which the analysis focuses (individuals, countries, companies, etc.).

Example:

• comparing children in two classrooms on test scores – unit of analysis is the individual child

• comparing the two classes on classroom climate – unit of analysis is the group (the classroom).

The example of the ecological fallacy (see Page 8) shows how important it is to choose the appropriate unit of analysis. The fallacy arises when data collected on groups (here, counties) are used to draw conclusions about individuals.

Dependent and independent variables

A previous example (see Section Role of statistics in social research) concerned research on intercompany relations:

According to our hypothesis, company size affects the type of intercompany relations.

In this context, the type of relations is called the dependent variable, while company size is called the independent variable.

The particular research question determines the role of the variables. In another study, the type of relations can be the independent variable (“Does the type of intercompany relations affect business results?”).

Dependent variable: what we want to explain

Independent variable: what is expected to account for the dependent variable

Does the empirical relationship imply causation?

An empirical relationship between two variables does not automatically imply that one causes the other (see the example about smoking and seeing the GP on Page 12).

Two variables are causally related if

1. the cause precedes the effect in time (in some cases the order is not clear: political preference/antisemitism, education/self-esteem), and

2. there is an empirical relationship between the cause and the effect, and

3. this relationship cannot be explained by other factors (see Page 12: the relationship between seeing the GP and smoking may be explained by gender).

Proof of causation is more problematic in the social sciences than in the natural sciences.

Suggested terminology: dependent/independent variables instead of cause/effect.

Example

Debate on drug policy: punishment or prevention/rehabilitation?

Suppose stricter punishment for drug users is introduced in a country. Two years later, the statistics show a significant decrease in drug use.

Did the change in drug policy reduce drug use?

Sample and population

A population is the total set of objects (individuals, groups, etc.) which the research question concerns.

Usually it is not possible to study the whole population (due to limitations in time and resources). Instead, we select a subset (a sample) from the population and generalize the results to the entire population.

Descriptive statistics and inferential statistics

Descriptive statistics: organizes, summarizes and describes data on the sample or on the population

Statistical inference: inferences about the whole population from observations of a sample

Important question: Is an attribute of a sample an accurate estimate for a population attribute?

Example: party preference surveys.

The tools of statistical inference help determine the accuracy of the sample estimates.

The present course covers methods of descriptive statistics. Statistical inference will be discussed in later courses.

It is important to make the distinction in the wording as well:

“X% of the interviewees”: we describe data on the sample.

“From our last two surveys, we can conclude that support for party A has increased”: statistical inference (especially if two distinct samples were drawn).

Frequency distributions

Data collection → 1,500 completed questionnaires → summary statistics

A frequency distribution is a table that presents the number of observations that fall into each category of the variable.

International Social Survey Programme (ISSP) 2006, Role of government.

“Do you think it should or should not be the government’s responsibility to reduce income differences between the rich and the poor?”

| | Hungary |
|---|---|
| Definitely should be | 490 |
| Probably should be | 352 |
| Probably should not be | 119 |
| Definitely should not be | 23 |
| Total | 984 |

The table shows the frequency distribution of the variable. Interpret the table.

(As an aside: do you think the sample consisted of exactly 984 persons?)

Interpretation is often easier using a percentage distribution:

| | Hungary (count) | Hungary (%) |
|---|---|---|
| Definitely should be | 490 | 49.8% |
| Probably should be | 352 | 35.8% |
| Probably should not be | 119 | 12.1% |
| Definitely should not be | 23 | 2.3% |
| Total | 984 | 100.0% |

How do we obtain a percentage distribution from a frequency distribution?

Interpret the table: What percentage of the sample thinks the government is responsible to some extent?
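The two questions above can be worked through in a few lines of Python. This is only a minimal sketch: the counts are copied from the Hungarian frequency table, while the function and variable names are illustrative assumptions, not part of the course material.

```python
# Frequency distribution for Hungary (ISSP 2006), copied from the table above.
frequencies = {
    "Definitely should be": 490,
    "Probably should be": 352,
    "Probably should not be": 119,
    "Definitely should not be": 23,
}


def percentage_distribution(freq):
    """Divide each category count by the total count and multiply by 100."""
    total = sum(freq.values())
    return {category: 100 * count / total for category, count in freq.items()}


percentages = percentage_distribution(frequencies)
for category, pct in percentages.items():
    print(f"{category:<26}{frequencies[category]:>5}{pct:>8.1f}%")

# "Responsible to some extent" is read here as the two "should be" answers combined.
responsible = percentages["Definitely should be"] + percentages["Probably should be"]
print(f"Government responsible to some extent: {responsible:.1f}%")
```

Each percentage is the category count divided by the total (984) and multiplied by 100, so the percentages add up to 100 apart from rounding.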

Comparing groups: row, column and cell percentages

The table below shows frequency distributions for two other ISSP countries.

Interpret the data.

| | Hungary | Sweden | USA |
|---|---|---|---|
| Definitely should be | 490 | 419 | 423 |
| Probably should be | 352 | 343 | 349 |
| Probably should not be | 119 | 253 | 394 |
| Definitely should not be | 23 | 110 | 311 |
| Total | 984 | 1125 | 1477 |

Which country has the lowest number of persons who chose the answer “Probably should be”? Is this comparison meaningful?

No, because the sample sizes of the three countries differ.

How could we make a valid comparison?

To make a valid comparison we have to compare the column percentages:

| | Hungary | Sweden | USA |
|---|---|---|---|
| Definitely should be | 490 | 419 | 423 |
| | 49.8% | 37.2% | 28.6% |
| Probably should be | 352 | 343 | 349 |
| | 35.8% | 30.5% | 23.6% |
| Probably should not be | 119 | 253 | 394 |
| | 12.1% | 22.5% | 26.7% |
| Definitely should not be | 23 | 110 | 311 |
| | 2.3% | 9.8% | 21.1% |
| Total | 984 | 1125 | 1477 |
| | 100.0% | 100.0% | 100.0% |
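As a rough illustration of how the column percentages in the table are obtained, the sketch below divides every count by the total of its own column (country). The counts come from the table; the data layout and the names `counts`, `column_totals` and `column_pct` are assumptions made for this example.

```python
countries = ["Hungary", "Sweden", "USA"]

# Counts per answer category, in the same country order as `countries`.
counts = {
    "Definitely should be": [490, 419, 423],
    "Probably should be": [352, 343, 349],
    "Probably should not be": [119, 253, 394],
    "Definitely should not be": [23, 110, 311],
}

# Column totals: the number of valid answers in each country.
column_totals = [sum(column) for column in zip(*counts.values())]

# Column percentages: each count relative to its own country's total.
column_pct = {
    answer: [100 * n / total for n, total in zip(row, column_totals)]
    for answer, row in counts.items()
}

print(f"{'':<26}" + "".join(f"{c:>10}" for c in countries))
for answer, pcts in column_pct.items():
    print(f"{answer:<26}" + "".join(f"{p:>9.1f}%" for p in pcts))
```

Because every country is scaled by its own sample size, the resulting percentages can be compared across columns even though the samples differ in size.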

Interpret the data. Are your findings in accordance with your background knowledge?

Remark: Comparative cross-national research always faces the problem of translation.

Based on our background knowledge, what kind of hypotheses can we make that could explain the cross-country differences?

1. USA vs. Hungary: public support for the redistributive role of the state is stronger in post-socialist countries.

2. Sweden vs. USA: the state has a stronger role in Scandinavian than in liberal welfare regimes.

How to test the hypotheses?

We should add further countries to the analysis:

1. other post-socialist countries,

2. liberal and Scandinavian welfare regimes.

The table below presents ISSP data on other post-socialist countries. Do the data support our first hypothesis?

| | Croatia | Czech Republic | Hungary | Latvia | Poland | Russia | Slovenia |
|---|---|---|---|---|---|---|---|
| Definitely should be | 55.5% | 21.7% | 49.8% | 38.9% | 54.1% | 53.1% | 54.2% |
| Probably should be | 29.1% | 32.9% | 35.8% | 44.4% | 33.6% | 33.1% | 36.6% |
| Probably should not be | 9.8% | 28.6% | 12.1% | 13.3% | 9.0% | 11.1% | 7.9% |
| Definitely should not be | 5.6% | 16.8% | 2.3% | 3.5% | 3.3% | 2.7% | 1.3% |
| Total | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

One might compute row percentages instead of column percentages.

How to interpret the table below? Are row percentages meaningful in this case?

| | Hungary | Sweden | USA | Total |
|---|---|---|---|---|
| Definitely should be | 36.8% | 31.5% | 31.8% | 100% |
| Probably should be | 33.7% | 32.9% | 33.4% | 100% |
| Probably should not be | 15.5% | 33.0% | 51.4% | 100% |
| Definitely should not be | 5.2% | 24.8% | 70.0% | 100% |
| Total | 27.4% | 31.4% | 41.2% | 100% |
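The row percentages in the table can be reproduced from the three-country frequency table by dividing each count by the total of its row (answer category) instead of its column. The sketch below is illustrative only; the names `counts`, `row_pct` and `sample_share` are assumptions.

```python
countries = ["Hungary", "Sweden", "USA"]

# Counts per answer category, in the same country order as `countries`.
counts = {
    "Definitely should be": [490, 419, 423],
    "Probably should be": [352, 343, 349],
    "Probably should not be": [119, 253, 394],
    "Definitely should not be": [23, 110, 311],
}

# Row percentages: each count relative to all respondents who gave that answer.
row_pct = {}
for answer, row in counts.items():
    row_total = sum(row)
    row_pct[answer] = [100 * n / row_total for n in row]

# The "Total" row: each country's share of the pooled sample.
column_totals = [sum(column) for column in zip(*counts.values())]
grand_total = sum(column_totals)
sample_share = [100 * t / grand_total for t in column_totals]

print(f"{'':<26}" + "".join(f"{c:>10}" for c in countries))
for answer, pcts in row_pct.items():
    print(f"{answer:<26}" + "".join(f"{p:>9.1f}%" for p in pcts))
print(f"{'Total':<26}" + "".join(f"{s:>9.1f}%" for s in sample_share))
```

The last line shows why these figures are hard to interpret here: each row percentage depends on how large each country's sample happens to be, so it has to be read against the country's share of the pooled sample.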

Note that if row and column variables are exchanged, then comparing row percentages becomes meaningful:

| | Definitely should be | Probably should be | Probably should not be | Definitely should not be | Total |
|---|---|---|---|---|---|
| Hungary | 49.8% | 35.8% | 12.1% | 2.3% | 100.0% |
| Sweden | 37.2% | 30.5% | 22.5% | 9.8% | 100.0% |
| USA | 28.6% | 23.6% | 26.7% | 21.1% | 100.0% |

Hint: it is easy to tell whether a table presents row or column percentages: row percentages sum to 100 within each row, column percentages sum to 100 within each column.

Another way to construct a table is to compute cell percentages (also called absolute percentages). The table below presents ISSP 2006 data on Hungary. Interpret the table.
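The summing rule can be turned into a small check. The sketch below is one possible way to do it, not an established routine: the function name `percentage_orientation` and the rounding tolerance of 0.5 percentage points are assumptions. It expects the body of a percentage table without the Total row and Total column.

```python
def percentage_orientation(table, tolerance=0.5):
    """Guess whether a table body holds row, column or cell percentages.

    `table` is a list of rows of percentages, totals excluded.
    `tolerance` allows for rounding in published tables (an assumed value).
    """
    row_sums = [sum(row) for row in table]
    column_sums = [sum(column) for column in zip(*table)]
    rows_ok = all(abs(s - 100) <= tolerance for s in row_sums)
    columns_ok = all(abs(s - 100) <= tolerance for s in column_sums)
    if rows_ok and not columns_ok:
        return "row"
    if columns_ok and not rows_ok:
        return "column"
    if abs(sum(row_sums) - 100) <= tolerance:
        return "cell"  # all cells together add up to 100
    return "undetermined"


# Answers in rows, countries (Hungary, Sweden, USA) in columns.
by_country = [
    [49.8, 37.2, 28.6],
    [35.8, 30.5, 23.6],
    [12.1, 22.5, 26.7],
    [2.3, 9.8, 21.1],
]
# The same figures with rows and columns exchanged.
exchanged = [list(column) for column in zip(*by_country)]

print(percentage_orientation(by_country))
print(percentage_orientation(exchanged))
```

With the figures above, the first call should report column percentages and the second row percentages, matching the two tables in the text.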

| Gov. resp.: reduce income differences | Attitude to law: Obey the law without exception | Attitude to law: Follow conscience on occasions | Total |
|---|---|---|---|
| Definitely should be | 27.6% | 22.3% | 49.9% |
| Probably should be | 24.0% | 11.4% | 35.3% |
| Probably should not be | 6.8% | 5.5% | 12.2% |
| Definitely should not be | 1.7% | 0.8% | 2.5% |
| Total | 60.0% | 40.0% | 100.0% |

What percentage of respondents obey the law without exception? And what percentage of respondents obey the law without exception AND think that the government definitely should reduce income differences?
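To make the reading of cell percentages concrete, the sketch below stores the cell percentages from the table and derives the answers to both questions. The dictionary layout and names are assumptions. The last step, a conditional percentage, goes slightly beyond the questions and is included only to show how cell percentages relate to column percentages.

```python
OBEY = "Obey the law without exception"
CONSCIENCE = "Follow conscience on occasions"

# Cell percentages for Hungary (ISSP 2006), copied from the table above.
cell_pct = {
    "Definitely should be": {OBEY: 27.6, CONSCIENCE: 22.3},
    "Probably should be": {OBEY: 24.0, CONSCIENCE: 11.4},
    "Probably should not be": {OBEY: 6.8, CONSCIENCE: 5.5},
    "Definitely should not be": {OBEY: 1.7, CONSCIENCE: 0.8},
}

# Marginal percentage: add up the cells of one column.
# The sum can differ from the printed Total by 0.1 because the cells are rounded.
obey_total = sum(row[OBEY] for row in cell_pct.values())

# Joint percentage: read directly from a single cell.
joint = cell_pct["Definitely should be"][OBEY]

# Conditional (column) percentage: the joint share relative to the column total.
conditional = 100 * joint / obey_total

print(f"Obey the law without exception: {obey_total:.1f}% of all respondents")
print(f"Obey the law AND 'Definitely should be': {joint:.1f}% of all respondents")
print(f"'Definitely should be' among those who obey the law: {conditional:.1f}%")
```

Cell percentages are all expressed relative to the whole sample, so a single cell answers an "A and B" question directly, while a column or row total answers a question about one variable only.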

The ISSP

The International Social Survey Programme (ISSP) is a continuing annual program of cross-national collaboration on surveys covering topics important for social science research. It was launched in 1983; in 2011 it had 47 member countries. It offers the opportunity for cross-national comparisons (e.g. new vs. old EU member states) and, since some important topics are repeated, for cross-time comparisons (e.g. socialist countries before and after the transition). The annual topics concentrate on highly relevant issues:

1985 Role of Government I

1986 Social Networks

1987 Social Inequality

1988 Family and Changing Gender Roles I

1989 Work Orientations I

1990 Role of Government II

1991 Religion I

1992 Social Inequality II

1993 Environment I

1994 Family and Changing Gender Roles II

1995 National Identity I

1996 Role of Government III

1997 Work Orientations II

1998 Religion II

1999 Social Inequality III

2000 Environment II

2001 Social Relations and Support Systems

2002 Family and Changing Gender Roles III

2003 National Identity II

2004 Citizenship

2005 Work Orientations III

2006 Role of Government IV

2007 Leisure Time and Sports

2008 Religion III

2009 Social Inequality IV

2010 Environment III

2011 Health

ISSP data will often be used as examples during the course.