Renáta Németh, Dávid Simon

ELTE

**Stem-and-leaf plot**

Appropriate for **interval-ratio** variables.

Similar to a histogram, it helps to visualize the shape of a distribution.

Construction: the numbers (the values of the variable) are split into stems and leaves. Typically, the stem contains the first (or first two) digits of the number, and the leaf contains the remaining digits. The plot is drawn as two columns separated by a vertical line: the stems are listed to the left of the line and the leaves to the right.

Note that the plot works with **digits**, not numbers: all values are written with the same number of digits. For example, with the numbers 8, 12 and 30, the number 8 is read as 08, so its first digit (and hence its stem) is 0.

Next, the stems are sorted in ascending order (stems without any leaves are listed too), and the leaves belonging to each stem are sorted as well.

Turned on its side, it looks like a histogram, except that it also shows the values themselves.
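The construction steps can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative whole numbers and one-digit leaves (the last digit), with the remaining digits forming the stem; `stem_and_leaf` is a hypothetical helper name, not a standard function. `divmod(8, 10)` returns stem 0 and leaf 8, which is exactly why the first digit of 8 counts as 0.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot for non-negative integers.

    The last digit is the leaf and the remaining digits form the stem,
    so 8 is read as 08 (stem 0, leaf 8).
    """
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 12 -> stem 1, leaf 2
        leaves[stem].append(leaf)
    lines = []
    # Stems in ascending order; stems without any leaf are still listed.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_str = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem} | {leaf_str}".rstrip())
    return lines


for line in stem_and_leaf([8, 12, 30]):
    print(line)
```

For 8, 12 and 30 this prints the lines `0 | 8`, `1 | 2`, `2 |` and `3 | 0`; the empty stem 2 is kept so that gaps in the distribution remain visible.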

*Example:* mean weekly working hours by country (in ascending order, in hours):

| Country | Mean hours worked weekly |
|---|---|
| NL-Netherl | 35.29948 |
| CA-Canada | 37.26501 |
| IE-Ireland | 37.39599 |
| GB-Great B | 37.47162 |
| CH-Switzer | 37.82437 |
| NZ-New Zea | 37.88102 |
| FI-Finland | 38.23138 |
| FR-France | 38.54045 |
| SE-Sweden | 38.5873 |
| DK-Denmark | 38.61125 |
| NO-Norway | 38.61965 |
| DE-Germany | 38.90488 |
| HU-Hungary | 39.9765 |
| ZA-South A | 40.52171 |
| AU-Austral | 40.85112 |
| VE-Venezue | 40.9579 |
| PT-Portuga | 41.2068 |
| ES-Spain | 41.40199 |
| IL-Israel | 41.76869 |
| RU-Russia | 41.82076 |
| US-United | 42.31947 |
| LV-Latvia | 42.35688 |
| SI-Sloveni | 42.75 |
| UY-Uruguay | 42.80439 |
| HR-Croatia | 43.5 |
| PL-Poland | 44.04636 |
| CL-Chile | 44.23623 |
| JP-Japan | 44.5078 |
| CZ-Czech R | 45.4177 |
| DO-Dominic | 45.51872 |
| PH-Philipp | 47.18957 |
| KR-South K | 48.71251 |
| TW-Taiwan | 49.48805 |

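The stem-and-leaf plot (stems contain the first two digits; each value is first rounded to one decimal place, and that decimal digit is the leaf, so e.g. 39.9765 appears as leaf 0 of stem 40):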

```
35* | 3
36* |
37* | 34589
38* | 256669
39* |
40* | 059
41* | 02488
42* | 3488
43* | 5
44* | 025
45* | 45
46* |
47* | 2
48* | 7
49* | 5
```
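The plot above can be reproduced from the table with a short Python sketch. It assumes that each value is rounded half up to one decimal place before splitting (this assumption is what makes the leaves match, e.g. 42.75 becomes 42.8 and 39.9765 becomes 40.0); the function name `two_digit_stem_plot` is hypothetical. `Decimal` is used for the rounding so that binary floating-point representation cannot shift a borderline value.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Country means of weekly working hours, copied from the table above.
hours = [
    35.29948, 37.26501, 37.39599, 37.47162, 37.82437, 37.88102,
    38.23138, 38.54045, 38.5873, 38.61125, 38.61965, 38.90488,
    39.9765, 40.52171, 40.85112, 40.9579, 41.2068, 41.40199,
    41.76869, 41.82076, 42.31947, 42.35688, 42.75, 42.80439,
    43.5, 44.04636, 44.23623, 44.5078, 45.4177, 45.51872,
    47.18957, 48.71251, 49.48805,
]


def two_digit_stem_plot(values):
    """Stems: integer part (two digits); leaves: first decimal digit after rounding."""
    leaves = defaultdict(list)
    for v in values:
        # Round to one decimal, half up: 39.9765 -> 40.0, 42.75 -> 42.8.
        rounded = Decimal(str(v)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
        tenths = int(rounded * 10)       # 40.0 -> 400
        stem, leaf = divmod(tenths, 10)  # 400 -> stem 40, leaf 0
        leaves[stem].append(leaf)
    lines = []
    # Every stem between the smallest and largest is listed, even if empty.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_str = "".join(str(d) for d in sorted(leaves[stem]))
        lines.append(f"{stem}* | {leaf_str}".rstrip())
    return lines


for line in two_digit_stem_plot(hours):
    print(line)
```

Truncating instead of rounding would give a different plot (for example, 35.29948 would get leaf 2 instead of 3), which is why the rounding step is spelled out here.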

Stems are often split further, e.g. into two lines each: one for the leaves 0 to 4 (stem marked with an asterisk) and one for the leaves 5 to 9 (stem marked with a dot).

The next plot was derived from the one above in this way:

```
35* | 3
35. |
36* |
36. |
37* | 34
37. | 589
38* | 2
38. | 56669
39* |
39. |
40* | 0
40. | 59
41* | 024
41. | 88
42* | 34
42. | 88
43* |
43. | 5
44* | 02
44. | 5
45* | 4
45. | 5
46* |
46. |
47* | 2
47. |
48* |
48. | 7
49* |
49. | 5
```
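Splitting the stems is a purely mechanical step, sketched below in Python under the assumption that the leaves of each stem are already sorted (as in the two-digit plot). The dictionary copies that plot stem by stem; `split_stems` is a hypothetical helper name. Each stem produces a `*` line holding the leaves 0 to 4 and a `.` line holding the leaves 5 to 9, which reproduces the split plot above.

```python
# Leaves of the two-digit-stem plot, copied stem by stem (already sorted).
plot = {
    35: "3", 36: "", 37: "34589", 38: "256669", 39: "", 40: "059",
    41: "02488", 42: "3488", 43: "5", 44: "025", 45: "45", 46: "",
    47: "2", 48: "7", 49: "5",
}


def split_stems(plot):
    """Split every stem into a '*' line (leaves 0-4) and a '.' line (leaves 5-9)."""
    lines = []
    for stem in sorted(plot):
        low = "".join(d for d in plot[stem] if d in "01234")
        high = "".join(d for d in plot[stem] if d in "56789")
        lines.append(f"{stem}* | {low}".rstrip())
        lines.append(f"{stem}. | {high}".rstrip())
    return lines


for line in split_stems(plot):
    print(line)
```

Because every stem becomes two lines, the 15 stems turn into 30 rows while the number of leaves stays the same, so each row holds fewer values on average; this is the source of the loss of smoothness noted in the remark.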

Remark: the second plot is less smooth than the first one. We saw the same problem with histograms: the finer the classes (here: stems) we use, the less smooth the resulting shape.

Construct a stem-and-leaf plot from the data above, with stems containing only the first digit.

**Statistical map**

Maps are especially useful for describing geographical variations in variables.

Most often used for **interval-ratio** variables.

*Example:* Mean number of days spent in hospital per treatment in Hungarian small areas, 2007.

Interpret the map. How can we explain the observed inequalities?

(Hint: unequal need for health care, and/or unequal efficiency of health care providers)

Source: research report of HealthMonitor (in Hungarian)

**Time series chart**

Appropriate for **interval-ratio** variables.

It displays how a variable changes over time. Time (measured in units such as years or months) is shown on the X axis and the values of the variable on the Y axis. Consecutive points are often joined by straight line segments.

*Example:* Change in income inequalities in post-socialist countries during the transition.

Source: Flemming J., and J. Micklewright, “Income Distribution, Economic Systems and Transition”. Innocenti Occasional Papers, Economic and Social Policy Series, No. 70. Florence: UNICEF International Child Development Centre.

To help with the interpretation:

- the Gini coefficient is a measure of inequality;
- it ranges from 0 to 1;
- a value of 0 expresses total equality (everyone has the same income);
- a value of 1 expresses maximal inequality (one person has all the income).
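To make the two endpoints concrete, here is a minimal Python sketch computing the Gini coefficient from individual incomes with the mean-absolute-difference formula G = Σᵢ Σⱼ |xᵢ − xⱼ| / (2n²x̄). The function name `gini` and the toy incomes are hypothetical illustrations, not data from the figure. Note that with this formula and n people, the case "one person has all the income" gives (n − 1)/n, which approaches 1 only as n grows.

```python
def gini(incomes):
    """Gini coefficient: sum of all pairwise |x_i - x_j| divided by 2 * n^2 * mean.

    Simple O(n^2) version for illustration; assumes a positive mean income.
    """
    n = len(incomes)
    mean = sum(incomes) / n
    total_diff = sum(abs(xi - xj) for xi in incomes for xj in incomes)
    return total_diff / (2 * n * n * mean)


print(gini([100, 100, 100, 100]))  # everyone earns the same -> 0.0
print(gini([0, 0, 0, 400]))        # one of 4 people has all income -> 0.75
print(gini([0] * 99 + [100]))      # one of 100 people has all income -> 0.99
```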

The figure below shows changes in the Gini coefficient in four post-socialist countries during the transition.

For comparison: in the 1990s, Latin America had the highest Gini coefficients in the world (around 0.5), while in developed Western European countries the value was about 0.35.

Interpret the time series chart.

What is the general trend in each country? Did your findings meet your expectations? What cross-country differences can you observe?

(Missing points denote missing data, for example: Russia 1990, 1991)