
## SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE


## Stem-and-leaf plot

Appropriate for interval-ratio variables

Similar to a histogram, assists in visualizing the shape of a distribution

Construction: the numbers (the values of the variable) are broken up into stems and leaves. Typically, the stem contains the first (or first two) digits of the number, and the leaf contains the remaining digits. The plot is drawn as two columns separated by a vertical line. The stems are listed to the left of the vertical line, and the leaves belonging to each stem to its right.

Note that stems and leaves are digits, not numbers: given the numbers 8, 12 and 30, the number 8 is read as 08, so its first digit (its stem) is 0.

Next, stems are sorted in ascending order, and then leaves of the same stem are also sorted.
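The construction steps above can be sketched in a few lines of Python. This is only an illustration, assuming non-negative whole numbers with a one-digit leaf; the function name `stem_and_leaf` and the sample values beyond 8, 12 and 30 are made up for the example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot for non-negative integers.

    The last digit of each number is its leaf; the digits before it form the stem.
    """
    leaves_by_stem = defaultdict(list)
    # Sorting the values first means the leaves of each stem end up sorted too.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # 8 -> stem 0, leaf 8 (read as "08")
        leaves_by_stem[stem].append(leaf)

    rows = []
    # List every stem from the smallest to the largest, including empty ones.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


# Illustrative data: 8, 12 and 30 from the text, plus three invented values.
plot = stem_and_leaf([8, 12, 30, 15, 31, 12])
print("\n".join(plot))
```

Empty stems are kept on purpose, so that gaps in the distribution remain visible, just as empty classes do in a histogram.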

The result looks like a horizontal histogram, except that it also shows the individual values.

Example: country-specific mean of hours worked weekly (in ascending order, in hours):

| Country | Mean hours worked weekly |
|---|---|
| NL-Netherl | 35.2995 |
| CA-Canada | 37.265 |
| IE-Ireland | 37.396 |
| GB-Great B | 37.4716 |
| CH-Switzer | 37.8244 |
| NZ-New Zea | 37.881 |
| FI-Finland | 38.2314 |
| FR-France | 38.5404 |
| SE-Sweden | 38.5873 |
| DK-Denmark | 38.6112 |
| NO-Norway | 38.6197 |
| DE-Germany | 38.9049 |
| HU-Hungary | 39.9765 |
| ZA-South A | 40.5217 |
| AU-Austral | 40.8511 |
| VE-Venezue | 40.9579 |
| PT-Portuga | 41.2068 |
| ES-Spain | 41.402 |
| IL-Israel | 41.7687 |
| RU-Russia | 41.8208 |
| US-United | 42.3195 |
| LV-Latvia | 42.3569 |
| SI-Sloveni | 42.75 |
| UY-Uruguay | 42.8044 |
| HR-Croatia | 43.5 |
| PL-Poland | 44.0464 |
| CL-Chile | 44.2362 |
| JP-Japan | 44.5078 |
| CZ-Czech R | 45.4177 |
| DO-Dominic | 45.5187 |
| PH-Philipp | 47.1896 |
| KR-South K | 48.7125 |
| TW-Taiwan | 49.4881 |

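The stem-and-leaf plot (stems contain the first two digits; each leaf is the first decimal digit after rounding the value to one decimal place, so 39.9765, for example, appears as leaf 0 on stem 40):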

35* | 3

36* |

37* | 34589

38* | 256669

39* |

40* | 059

41* | 02488

42* | 3488

43* | 5

44* | 025

45* | 45

46* |

47* | 2

48* | 7

49* | 5

Stems are often further broken up, e.g. into two parts according to the 0-4 and 5-9 sets of digits.

The next plot was derived from the plot above:

35* | 3

35. |

36* |

36. |

37* | 34

37. | 589

38* | 2

38. | 56669

39* |

39. |

40* | 0

40. | 59

41* | 024

41. | 88

42* | 34

42. | 88

43* |

43. | 5

44* | 02

44. | 5

45* | 4

45. | 5

46* |

46. |

47* | 2

47. |

48* |

48. | 7

49* |

49. | 5

Remark: the last plot is less smooth than the one before. The same problem arose with histograms: the finer the classes (here: stems) we use, the less smooth the resulting shape.
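Both plots above can be reproduced from the listed country means. The following sketch assumes that each value is first rounded to one decimal place (half up), which is consistent with the leaves shown; the names `HOURS`, `to_tenths` and `stem_rows` are illustrative and not part of the original material.

```python
from decimal import Decimal, ROUND_HALF_UP

# Country means of weekly hours worked, copied from the example above.
HOURS = [
    "35.2995", "37.265", "37.396", "37.4716", "37.8244", "37.881",
    "38.2314", "38.5404", "38.5873", "38.6112", "38.6197", "38.9049",
    "39.9765", "40.5217", "40.8511", "40.9579", "41.2068", "41.402",
    "41.7687", "41.8208", "42.3195", "42.3569", "42.75", "42.8044",
    "43.5", "44.0464", "44.2362", "44.5078", "45.4177", "45.5187",
    "47.1896", "48.7125", "49.4881",
]


def to_tenths(text):
    """Round a decimal string to one decimal place; return the number of tenths."""
    rounded = Decimal(text).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return int(rounded * 10)  # e.g. "39.9765" -> 40.0 -> 400


def stem_rows(values, split=False):
    """Return (stem label, leaves) pairs; split=True halves each stem (0-4, 5-9)."""
    tenths = sorted(to_tenths(v) for v in values)
    rows = []
    for stem in range(tenths[0] // 10, tenths[-1] // 10 + 1):
        leaves = [t % 10 for t in tenths if t // 10 == stem]
        if split:
            rows.append((f"{stem}*", "".join(str(d) for d in leaves if d <= 4)))
            rows.append((f"{stem}.", "".join(str(d) for d in leaves if d >= 5)))
        else:
            rows.append((f"{stem}*", "".join(str(d) for d in leaves)))
    return rows


for label, leaves in stem_rows(HOURS, split=True):
    print(f"{label} | {leaves}".rstrip())
```

Calling `stem_rows(HOURS)` without `split=True` gives the first, coarser plot, so the effect of finer stems on smoothness can be compared directly.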

Construct a stem-and-leaf plot from the data above, with stems containing the first digits only.

## Statistical map

Maps are especially useful for describing geographical variations in variables.

Most often used for interval-ratio variables.

Example: Number of days spent in hospital per treatment, averaged over Hungarian small areas, 2007.

Interpret the map. How can we explain the observed inequalities?

(Hint: unequal need for health care, and/or unequal efficiency of health care providers)

Source: research report of HealthMonitor (in Hungarian)

## Time series chart

Appropriate for interval-ratio variables.

It displays changes in a variable at different points in time. It shows time (measured in units such as years or months) on the X axis and the values of the variable on the Y axis. Consecutive points can be joined by straight line segments.

Example: Change in income inequalities in post-socialist countries during the transition.

Source: Flemming J., and J. Micklewright, “Income Distribution, Economic Systems and Transition”. Innocenti Occasional Papers, Economic and Social Policy Series, No. 70. Florence: UNICEF International Child Development Centre.

Notes for interpretation:

• the Gini coefficient is a measure of inequality

• it can range from 0 to 1

• a value of 0 expresses total equality (everyone has the same income), and

• a value of 1 expresses maximal inequality (one person has all the income).
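To make the two extremes concrete, here is a minimal sketch that computes the Gini coefficient for a small list of incomes. It assumes one common definition (the mean absolute difference between all pairs of incomes, divided by twice the mean income); the source does not say which formula was used for its figures, and the incomes below are invented. With this formula a population of n persons in which one person has all the income gets 1 - 1/n, which approaches 1 as n grows.

```python
def gini(incomes):
    """Gini coefficient via the mean absolute difference (an assumed definition).

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean) = sum_i sum_j |x_i - x_j| / (2 * n * total)
    """
    n = len(incomes)
    total = sum(incomes)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return diff_sum / (2 * n * total)


# Hypothetical incomes, for illustration only.
equal = gini([10, 10, 10, 10])       # everyone has the same income
concentrated = gini([0, 0, 0, 100])  # one person has all the income
print(equal, concentrated)
```

The double loop is quadratic in the number of incomes, which is fine for a classroom example but would be replaced by a sorted-data formula for large samples.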

The figure below shows changes in the Gini coefficient in four post-socialist countries during the transition.

For comparison: in the 1990s Latin America had the highest Gini coefficient in the world (around 0.5); in developed Western European countries it was about 0.35.

Interpret the time series chart.

What is the general trend in each country? Did your findings meet your expectations? What cross-country differences can you observe?

(Missing points denote missing data, for example Russia in 1990 and 1991.)