
SOCIAL STATISTICS

Renáta Németh, Dávid Simon

ELTE

Stem-and-leaf plot


Appropriate for interval-ratio variables

Similar to a histogram, it helps to visualize the shape of a distribution.

Construction: the numbers (the values of the variable) are broken up into stems and leaves. Typically, the stem contains the first (or first two) digits of the number, and the leaf contains the next digit (any further digits are rounded off or dropped). The plot is drawn as two columns separated by a vertical line: the stems are listed to the left of the line, the leaves to the right.

Note that stems and leaves are defined by digit positions, not by the size of the numbers: if the data contain 8, 12 and 30, then 8 is read as 08, so its first digit (its stem) is 0.

Next, stems are sorted in ascending order, and then leaves of the same stem are also sorted.

The result looks like a histogram turned on its side, except that it also shows the actual values.
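The construction steps above can be sketched in Python for non-negative whole numbers. This is a minimal illustration, assuming integer data; the function names `stem_and_leaf` and `render` are our own. The last digit is taken as the leaf and everything before it as the stem, so 8 is read as 08, as in the note above.

```python
def stem_and_leaf(numbers):
    """Group non-negative integers into {stem: sorted leaves}.

    The leaf is the last digit and the stem is everything before it,
    so a one-digit number such as 8 is read as 08 (stem 0, leaf 8).
    """
    groups = {}
    for n in numbers:
        stem, leaf = divmod(n, 10)
        groups.setdefault(stem, []).append(leaf)
    # Sort the stems in ascending order (keeping empty stems in the range),
    # then sort the leaves within each stem.
    return {s: sorted(groups.get(s, [])) for s in range(min(groups), max(groups) + 1)}


def render(plot):
    """Stems on the left of a vertical line, leaves on the right."""
    width = max(len(str(stem)) for stem in plot)
    return "\n".join(
        f"{stem:>{width}} | {''.join(map(str, leaves))}".rstrip()
        for stem, leaves in plot.items()
    )


print(render(stem_and_leaf([8, 12, 30])))
```

For 8, 12 and 30 this prints four rows: `0 | 8`, `1 | 2`, an empty stem `2 |`, and `3 | 0`. Keeping empty stems matters: gaps in the data stay visible, just like empty classes in a histogram.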

Example: mean weekly working hours by country (in hours, in ascending order):

Country       Mean weekly hours
NL-Netherl    35.29948
CA-Canada     37.26501
IE-Ireland    37.39599
GB-Great B    37.47162
CH-Switzer    37.82437
NZ-New Zea    37.88102
FI-Finland    38.23138
FR-France     38.54045
SE-Sweden     38.5873
DK-Denmark    38.61125
NO-Norway     38.61965
DE-Germany    38.90488
HU-Hungary    39.9765
ZA-South A    40.52171
AU-Austral    40.85112
VE-Venezue    40.9579
PT-Portuga    41.2068
ES-Spain      41.40199
IL-Israel     41.76869
RU-Russia     41.82076
US-United     42.31947
LV-Latvia     42.35688
SI-Sloveni    42.75
UY-Uruguay    42.80439
HR-Croatia    43.5
PL-Poland     44.04636
CL-Chile      44.23623
JP-Japan      44.5078
CZ-Czech R    45.4177
DO-Dominic    45.51872
PH-Philipp    47.18957
KR-South K    48.71251
TW-Taiwan     49.48805

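The stem-and-leaf plot (values rounded to one decimal place; stems contain the first two digits, i.e. the whole hours, and leaves contain the first decimal digit):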

35* | 3
36* |
37* | 34589
38* | 256669
39* |
40* | 059
41* | 02488
42* | 3488
43* | 5
44* | 025
45* | 45
46* |
47* | 2
48* | 7
49* | 5
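The leaves in the plot above agree with the values rounded half up to one decimal place: the stem is the whole number of hours and the leaf is the first decimal digit. This is why Hungary's 39.9765 appears under stem 40 and Venezuela's 40.9579 under stem 41. The Python sketch below reproduces the plot under this rounding assumption; the helper names `tenths` and `stem_and_leaf_tenths` are our own.

```python
from decimal import Decimal, ROUND_HALF_UP

# Mean weekly working hours from the table above, in ascending order.
hours = [
    35.29948, 37.26501, 37.39599, 37.47162, 37.82437, 37.88102, 38.23138,
    38.54045, 38.5873, 38.61125, 38.61965, 38.90488, 39.9765, 40.52171,
    40.85112, 40.9579, 41.2068, 41.40199, 41.76869, 41.82076, 42.31947,
    42.35688, 42.75, 42.80439, 43.5, 44.04636, 44.23623, 44.5078, 45.4177,
    45.51872, 47.18957, 48.71251, 49.48805,
]


def tenths(x):
    """Round x half up to one decimal and return it in tenths (37.26501 -> 373)."""
    # Decimal avoids binary floating-point surprises when rounding e.g. 42.75.
    return int((Decimal(str(x)) * 10).to_integral_value(rounding=ROUND_HALF_UP))


def stem_and_leaf_tenths(values):
    """Stem = whole hours, leaf = first decimal digit, after rounding."""
    groups = {}
    for t in map(tenths, values):
        stem, leaf = divmod(t, 10)
        groups.setdefault(stem, []).append(leaf)
    return {s: sorted(groups.get(s, [])) for s in range(min(groups), max(groups) + 1)}


for stem, leaves in stem_and_leaf_tenths(hours).items():
    print(f"{stem}* | {''.join(map(str, leaves))}".rstrip())
```

It prints the same fifteen stems as the plot above, from `35* | 3` to `49* | 5`, with 33 leaves in total.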

Stems are often split further, e.g. into two lines each: one for leaves 0-4 (marked *) and one for leaves 5-9 (marked .).

The next plot was derived from the plot above:

35* | 3
35. |
36* |
36. |
37* | 34
37. | 589
38* | 2
38. | 56669
39* |
39. |
40* | 0
40. | 59
41* | 024
41. | 88
42* | 34
42. | 88
43* |
43. | 5
44* | 02
44. | 5
45* | 4
45. | 5
46* |
46. |
47* | 2
47. |
48* |
48. | 7
49* |
49. | 5
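Splitting each stem into a `*` line (leaves 0-4) and a `.` line (leaves 5-9) is a purely mechanical step. A minimal Python sketch that derives the split plot from the two-digit-stem plot; the leaves are typed in from that plot, and `split_stems` is a hypothetical helper name.

```python
# Stem-and-leaf plot with two-digit stems, copied from the earlier plot.
plot = {
    35: [3], 36: [], 37: [3, 4, 5, 8, 9], 38: [2, 5, 6, 6, 6, 9], 39: [],
    40: [0, 5, 9], 41: [0, 2, 4, 8, 8], 42: [3, 4, 8, 8], 43: [5],
    44: [0, 2, 5], 45: [4, 5], 46: [], 47: [2], 48: [7], 49: [5],
}


def split_stems(plot):
    """Split every stem into a '*' line (leaves 0-4) and a '.' line (leaves 5-9)."""
    rows = []
    for stem, leaves in plot.items():
        rows.append((f"{stem}*", [leaf for leaf in leaves if leaf <= 4]))
        rows.append((f"{stem}.", [leaf for leaf in leaves if leaf >= 5]))
    return rows


for label, leaves in split_stems(plot):
    print(f"{label} | {''.join(map(str, leaves))}".rstrip())
```

Each original stem yields exactly two rows, so the 15 stems become 30 rows, and sparse rows such as `36*`, `36.` and `46*` make the shape more jagged.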

Remark: the second plot is less smooth than the first. We saw the same problem with histograms: the finer the classes (here: stems) we use, the less smooth the resulting shape.

Construct a stem-and-leaf plot from the data above, using only the first digit as the stem.

Statistical map

Maps are especially useful for describing geographical variations in variables.

Most often used for interval-ratio variables.

Example: average number of days spent in hospital per treatment, by Hungarian small area, 2007.

Interpret the map. How can we explain the observed inequalities?

(Hint: unequal need for health care, and/or unequal efficiency of health care providers)

[Map: average number of days spent in hospital per treatment, Hungarian small areas, 2007]

Source: research report of HealthMonitor (in Hungarian)
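To make the mapped quantity concrete: assuming the indicator is the total number of hospital days divided by the number of treatments within each small area, it can be computed from individual treatment records as below. The records are made up for illustration, and `mean_days_per_treatment` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical treatment records: (small area, days spent in hospital).
records = [
    ("Area A", 3), ("Area A", 7), ("Area A", 5),
    ("Area B", 10), ("Area B", 4),
]


def mean_days_per_treatment(records):
    """Average number of hospital days per treatment, computed separately for each area."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for area, days in records:
        totals[area] += days
        counts[area] += 1
    return {area: totals[area] / counts[area] for area in totals}


print(mean_days_per_treatment(records))  # {'Area A': 5.0, 'Area B': 7.0}
```

Each area then receives one value, which is what the map shades.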

Time series chart

Appropriate for interval-ratio variables.

It displays changes in a variable at different points in time. It shows time (measured in units such as years or months) on the X axis and the values of the variable on the Y axis. Points can be joined by a straight line.

Example: Change in income inequalities in post-socialist countries during the transition.

Source: Flemming J., and J. Micklewright, “Income Distribution, Economic Systems and Transition”. Innocenti Occasional Papers, Economic and Social Policy Series, No. 70. Florence: UNICEF International Child Development Centre.

Notes on interpretation:

  • the Gini coefficient is a measure of inequality

  • it can range from 0 to 1

  • a value of 0 expresses total equality (everyone has the same income), and

  • a value of 1 expresses maximal inequality (one person has all the income).
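One common definition of the Gini coefficient is the mean absolute difference between all pairs of incomes, divided by twice the mean income. A direct O(n²) Python sketch, assuming a positive total income; `gini` is our own helper name. Note that with a finite number of people n, the case where one person has all the income gives (n-1)/n, which gets close to 1 only when n is large.

```python
def gini(incomes):
    """Gini coefficient: mean absolute difference over all pairs / (2 * mean)."""
    n = len(incomes)
    mean = sum(incomes) / n
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    total_diff = sum(abs(x - y) for x in incomes for y in incomes)
    return total_diff / (2 * n * n * mean)


print(gini([5, 5, 5, 5]))      # 0.0: everyone has the same income
print(gini([0, 0, 0, 100]))    # 0.75: one of four people has all the income
print(gini([0] * 99 + [100]))  # 0.99: one of 100 people has all the income
```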

The figure below shows changes in the Gini coefficient in four post-socialist countries during the transition.

For comparison: in the 1990s, Latin America had the highest Gini coefficients in the world (around 0.5), while in developed Western European countries the value was about 0.35.

Interpret the time series chart.

What is the general trend in each country? Did your findings meet your expectations? What cross-country differences can you observe?

[Figure: Gini coefficient over time in four post-socialist countries during the transition]

(Missing points denote missing data, for example: Russia 1990, 1991)
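A time series chart joins consecutive observed points with straight lines. Where data are missing (as for Russia in 1990 and 1991), the line is left broken rather than drawn across the gap. A minimal Python sketch of that logic, using made-up values; `line_segments` is a hypothetical helper, not part of any plotting library.

```python
def line_segments(points):
    """Split (year, value) points into runs that can be joined by straight lines.

    A value of None marks missing data; the line is broken there instead of
    being drawn across the gap.
    """
    segments, current = [], []
    for year, value in points:
        if value is None:
            if current:
                segments.append(current)
            current = []
        else:
            current.append((year, value))
    if current:
        segments.append(current)
    return segments


# Made-up series: the first two years and 1994 are missing.
series = [(1990, None), (1991, None), (1992, 1.0), (1993, 1.5), (1994, None), (1995, 2.0)]
print(line_segments(series))  # [[(1992, 1.0), (1993, 1.5)], [(1995, 2.0)]]
```

Each segment is drawn as its own line; a segment with a single point appears as an isolated marker.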