SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE

The mean

The (arithmetic) mean is what people call the “average”.

It is appropriate for interval-ratio variables.

Let Y denote an interval-ratio variable. Its mean is:

Equation 5.1:

\[ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]

where ȳ (y-bar) denotes the sample mean of Y, n is the sample size, Σ (sigma) is the summation sign, which here denotes summing over all y-values, and y_i is the value of Y measured for the i-th observation in the sample. Generally, lowercase letters denote sample values, while uppercase letters denote variables or population factors.
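The definition above (the sum of the observed values divided by the sample size) can be sketched in a few lines of Python. This is only an illustration: the function name `sample_mean` and the five income values are assumptions made for the example and do not come from the course material.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    n = len(values)  # sample size n
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n  # (sum of all y_i) / n


# Hypothetical sample of five monthly incomes (illustrative numbers only).
incomes = [1000, 2000, 3000, 4000, 5000]
y_bar = sample_mean(incomes)
print(y_bar)  # 3000.0
```

Here the sum of the values is 15,000 and n is 5, so the mean is 3,000.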

Example. ISSP 2006, Hungarian data. Mean of monthly net income by party preference.

Party preference    Mean          n      Std. deviation
MDF                 224,050.00    10     198,666.730
SZDSZ               133,392.86    14     158,119.986
FKGP                 57,166.67     6      11,214.574
MSZP                123,963.76   264     149,650.388
FIDESZ              125,898.94   231     158,621.847
Munkáspárt           75,400.00     6      34,556.620
MIÉP                165,433.50     8     207,676.491
Other               159,100.00    10     181,112.273
Uncertain           148,636.12   283     176,798.697
Total               134,243.96   832     162,816.877

Which party's supporters have the highest mean income? Which have the second highest? Which have the lowest?

It is important to note that the data on uncertain voters are also informative: they seem to have a higher mean income than voters who named a party.
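The comparison between uncertain voters and those who named a party can be checked from the table itself, because a mean over several groups is the sum of (group mean × group size) divided by the total size. The sketch below uses the means and sample sizes from the table; the helper name `pooled_mean` is an assumption made for this illustration.

```python
# (party, mean income, n) copied from the table above.
rows = [
    ("MDF", 224050.00, 10),
    ("SZDSZ", 133392.86, 14),
    ("FKGP", 57166.67, 6),
    ("MSZP", 123963.76, 264),
    ("FIDESZ", 125898.94, 231),
    ("Munkáspárt", 75400.00, 6),
    ("MIÉP", 165433.50, 8),
    ("Other", 159100.00, 10),
    ("Uncertain", 148636.12, 283),
]


def pooled_mean(groups):
    """Combine group means into one mean, weighting each by its group size."""
    total_income = sum(mean * n for _, mean, n in groups)
    total_n = sum(n for _, _, n in groups)
    return total_income / total_n


certain = [row for row in rows if row[0] != "Uncertain"]
uncertain = [row for row in rows if row[0] == "Uncertain"]

certain_mean = pooled_mean(certain)      # voters who named a party
uncertain_mean = pooled_mean(uncertain)  # uncertain voters
overall_mean = pooled_mean(rows)         # should reproduce the "Total" row

print(round(certain_mean, 2), round(uncertain_mean, 2), round(overall_mean, 2))
```

Under these figures the pooled mean of voters who named a party is about 126,825, below the 148,636.12 of uncertain voters, and the overall value reproduces the Total row (134,243.96) up to rounding.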

Comment I:

The data above are from a sample. Party-specific differences in mean income may arise simply from sampling error, that is, from observing a sample instead of the whole population (for example, if the only MDF supporter with an extremely high income happened to be selected). The question arises whether the observed differences in mean income also hold for the population (in technical terms: whether they are statistically significant). Statistical inference, introduced in later courses, answers that question.

Comment II:

The mean captures only one feature of the distribution. The high mean income of MDF supporters does not necessarily imply that every MDF supporter has a high income (that is, that variability is low). As an extreme example, consider the case in which only a few MDF supporters with extremely high incomes pull the mean up. In other words, income may be highly variable among MDF supporters. The standard deviation, a measure of the variability of the distribution, is shown in the fourth column of the table above. It will be discussed in the next lecture.

Properties of the mean

Sensitivity to outliers (also called extremes).

Unlike with the mode or the median, every value enters into the calculation of the mean. Therefore the mean is sensitive to extremely high or extremely low values in the distribution.

Example: a) no outlier

Y (monthly net income, $)   Sample frequency   Σyi
1000                        1                  1000
2000                        2                  4000
3000                        4                  12000
4000                        2                  8000
5000                        1                  5000
Total                       n=10               Σyi=30,000

ȳ = 30,000/10 = 3,000

b) one outlier

Y (monthly net income, $)   Sample frequency   Σyi
1000                        1                  1000
2000                        2                  4000
3000                        4                  12000
4000                        2                  8000
35000                       1                  35000
Total                       n=10               Σyi=60,000

ȳ = 60,000/10 = 6,000

The income of only one person has changed, but the mean has doubled!

What are the medians in the above cases?

In both cases the median is 3,000. It did not change, because the median is not sensitive to outliers.
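The two cases can be reproduced with Python's standard `statistics` module. This is a minimal sketch: the helper `expand`, which turns a frequency table into a list of individual observations, is an assumption made for this illustration.

```python
from statistics import mean, median


def expand(freq_table):
    """Turn (value, frequency) pairs into a flat list of observations."""
    return [value for value, count in freq_table for _ in range(count)]


# Case a) no outlier; case b) the highest income is 35,000 instead of 5,000.
no_outlier = expand([(1000, 1), (2000, 2), (3000, 4), (4000, 2), (5000, 1)])
one_outlier = expand([(1000, 1), (2000, 2), (3000, 4), (4000, 2), (35000, 1)])

for label, sample in (("a) no outlier", no_outlier), ("b) one outlier", one_outlier)):
    print(label, "mean:", mean(sample), "median:", median(sample))
```

The mean moves from 3,000 to 6,000, while the median stays at 3,000, because with n=10 the two middle observations are 3,000 in both samples.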