SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE

The mean

The (arithmetic) mean is what people call the “average”.

Appropriate for interval-ratio variables.

Let Y denote an interval-ratio variable. Its mean is:

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad (5.1)$$

where $\bar{y}$ (y-bar) denotes the sample mean of Y, n is the sample size, Σ (sigma) is the summation sign, here indicating the sum over all observed y-values, and $y_i$ is the value of Y measured for the i-th observation in the sample. Generally, lowercase letters denote sample values, while uppercase letters denote variables or population parameters.
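As a minimal sketch of the definition above, the following Python snippet computes a sample mean by summing the observed values and dividing by the sample size. The function name `sample_mean` and the five income values are hypothetical, chosen only for illustration.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of all y_i divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n


# Hypothetical sample of five monthly incomes (illustrative values only).
incomes = [1000, 2000, 3000, 4000, 5000]
y_bar = sample_mean(incomes)
print(y_bar)  # 3000.0
```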

Example. ISSP 2006, Hungarian data. Mean of monthly net income by party preference.

| Party preference | Mean | n | Std. deviation |
|---|---|---|---|
| MDF | 224,050.00 | 10 | 198,666.730 |
| SZDSZ | 133,392.86 | 14 | 158,119.986 |
| FKGP | 57,166.67 | 6 | 11,214.574 |
| MSZP | 123,963.76 | 264 | 149,650.388 |
| FIDESZ | 125,898.94 | 231 | 158,621.847 |
| Munkáspárt | 75,400.00 | 6 | 34,556.620 |
| MIÉP | 165,433.50 | 8 | 207,676.491 |
| Other | 159,100.00 | 10 | 181,112.273 |
| Uncertain | 148,636.12 | 283 | 176,798.697 |
| Total | 134,243.96 | 832 | 162,816.877 |

Which party's supporters have the highest mean income? Which have the second highest? Which have the lowest?

Note that the data on uncertain voters are also informative: they seem to have a higher mean income than voters with a stated party preference.
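The questions above can be checked with a few lines of Python. The sketch below copies the group means and sample sizes from the table, ranks the groups by mean income, and computes the mean of all voters with a stated preference as a weighted mean of the group means, using the group sizes as weights. The variable and function names are illustrative, and because the published group means are rounded, the results agree with the table only up to rounding.

```python
# (mean monthly net income, n) per group, copied from the table above.
groups = {
    "MDF": (224050.00, 10),
    "SZDSZ": (133392.86, 14),
    "FKGP": (57166.67, 6),
    "MSZP": (123963.76, 264),
    "FIDESZ": (125898.94, 231),
    "Munkáspárt": (75400.00, 6),
    "MIÉP": (165433.50, 8),
    "Other": (159100.00, 10),
    "Uncertain": (148636.12, 283),
}

# Rank the groups from highest to lowest mean income.
ranked = sorted(groups, key=lambda g: groups[g][0], reverse=True)
print("highest:", ranked[0], "second:", ranked[1], "lowest:", ranked[-1])


def pooled_mean(names):
    """Weighted mean of group means, with group sizes as weights."""
    total_income = sum(groups[g][0] * groups[g][1] for g in names)
    total_n = sum(groups[g][1] for g in names)
    return total_income / total_n


overall = pooled_mean(groups)
certain = pooled_mean([g for g in groups if g != "Uncertain"])
print("all respondents:", round(overall, 2))
print("stated preference only:", round(certain, 2))
```

The pooled mean over all nine groups reproduces the "Total" row of the table, and the mean for voters with a stated preference comes out at roughly 126,825, which is below the 148,636.12 reported for uncertain voters.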

Comment I:

The data above come from a sample. Party-specific differences in mean income may arise simply from sampling error, that is, from observing a sample instead of the whole population (e.g. if, by chance, the only MDF supporter with an extremely high income happened to be selected). The question is whether the observed differences in mean income also hold in the population (in technical terms: whether they are statistically significant). Statistical inference, introduced in later courses, answers that question.

Comment II:

The mean captures only one feature of the distribution. A high mean income among MDF supporters does not necessarily imply that every MDF supporter has a high income (that is, a high mean combined with low variability). As an extreme example, consider the case where only a few MDF supporters with extremely high incomes pull the mean up. In other words, income may be highly variable among MDF supporters. The standard deviation, a measure of the variability of the distribution, is shown in the fourth column of the table above. It will be discussed in the next lecture.
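To illustrate the point of Comment II, the sketch below compares two made-up groups that share the same mean but differ greatly in variability. All values are hypothetical, and the spread is measured with the sample standard deviation as computed by `statistics.stdev` from the Python standard library, ahead of its formal introduction in the next lecture.

```python
from statistics import mean, stdev

# Two hypothetical groups of five monthly incomes (made-up values).
group_a = [190, 195, 200, 205, 210]  # everyone is close to the mean
group_b = [50, 60, 70, 80, 740]      # one very high income pulls the mean up

for name, incomes in (("A", group_a), ("B", group_b)):
    # Same mean in both groups, very different standard deviation.
    print(name, "mean:", mean(incomes), "std. deviation:", round(stdev(incomes), 1))
```

Both groups have a mean of 200, yet in group B four of the five members earn far less than that; only the much larger standard deviation reveals the difference.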

Properties of the mean

Sensitivity to outliers (also called extremes).

Unlike the mode or the median, the mean uses every value in its calculation. It is therefore sensitive to extremely high or extremely low values in the distribution.

Example: a) no outlier

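| Y (monthly net income, $) | Sample frequency | Sum of values (value × frequency) |
|---|---|---|
| 1000 | 1 | 1000 |
| 2000 | 2 | 4000 |
| 3000 | 4 | 12000 |
| 4000 | 2 | 8000 |
| 5000 | 1 | 5000 |
| Total | n = 10 | Σyi = 30,000 |

$\bar{y}$ = 30,000/10 = 3,000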

b) one outlier

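| Y (monthly net income, $) | Sample frequency | Sum of values (value × frequency) |
|---|---|---|
| 1000 | 1 | 1000 |
| 2000 | 2 | 4000 |
| 3000 | 4 | 12000 |
| 4000 | 2 | 8000 |
| 35000 | 1 | 35000 |
| Total | n = 10 | Σyi = 60,000 |

$\bar{y}$ = 60,000/10 = 6,000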

The income of only one person has changed, but the mean has doubled!

What are the medians in the above cases?

The median is 3,000 in both cases. It did not change, because it is not sensitive to outliers.
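The two examples above can be reproduced with a short Python sketch. It expands each frequency table into the full list of ten observations and then computes the mean and the median with the standard-library `statistics` module. The helper name `expand` and the dictionary names are illustrative choices.

```python
from statistics import mean, median


def expand(freq_table):
    """Turn a {value: frequency} table into the full list of observations."""
    return [value for value, freq in freq_table.items() for _ in range(freq)]


# Frequency tables from examples a) and b): income value -> sample frequency.
no_outlier = {1000: 1, 2000: 2, 3000: 4, 4000: 2, 5000: 1}
one_outlier = {1000: 1, 2000: 2, 3000: 4, 4000: 2, 35000: 1}

for label, table in (("a", no_outlier), ("b", one_outlier)):
    ys = expand(table)
    print(label, "n:", len(ys), "mean:", mean(ys), "median:", median(ys))
```

With ten observations the median is the average of the fifth and sixth ordered values, both of which are 3,000 in either table, so replacing the largest value moves the mean from 3,000 to 6,000 but leaves the median untouched.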