Renáta Németh, Dávid Simon
An alternative way to present nominal or ordinal data graphically.
In the case of ordinal variables, the categories are sorted along the X axis.
Example: "Please show whether you would like to see more or less government spending on military and defense."
(From here on, the data source is ISSP 2006.)
Bar graphs are often used to compare the distribution of a variable across different groups.
Interpret the bar chart below.
Appropriate for interval-ratio variables whose values have been grouped into classes.
Shows the frequencies or percentages of the classes.
The classes are displayed as bars, with width proportional to the width of the class and area proportional to the frequency or percentage of that class.
A histogram is similar to a bar chart, but its bars touch each other (visually indicating that the variable is continuous rather than discrete), and the bars may be of unequal width.
(Remember what we have learned about classification in the section "Frequency distributions for interval-ratio variables".)
Example. Average hours worked weekly (Hungary, 2006). Classes are 5-hour intervals.
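The rule that area, not height, represents frequency matters as soon as the classes have unequal widths. The following is a minimal sketch of that rule, assuming NumPy is available; the working-hours values, the class boundaries, and the function name `histogram_heights` are hypothetical and are not taken from the ISSP data.

```python
import numpy as np

def histogram_heights(values, edges):
    """Return (counts, percentages, heights) for the classes given by edges.

    The height of each bar is its percentage divided by the class width,
    so the AREA of the bar (height * width) equals the percentage.
    """
    counts, edges = np.histogram(values, bins=edges)
    widths = np.diff(edges)
    percentages = 100.0 * counts / counts.sum()
    heights = percentages / widths
    return counts, percentages, heights

# Hypothetical weekly working hours (illustration only, not ISSP data).
hours = [12, 18, 22, 35, 38, 40, 40, 40, 42, 45, 50, 60]
# Hypothetical classes of unequal width: 0-20, 20-40, 40-45, 45-70.
edges = [0, 20, 40, 45, 70]

counts, percentages, heights = histogram_heights(hours, edges)
for lo, hi, c, p, h in zip(edges[:-1], edges[1:], counts, percentages, heights):
    print(f"{lo:>2}-{hi:<2} hours: n={c}, {p:.1f}%, bar height={h:.2f}% per hour")
```

In this sketch the narrow 40-45 class gets the tallest bar even though it holds only one more case than its neighbours, because its cases are concentrated in a 5-hour interval.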
Interpret the histogram.
Which is the most frequent class of working time?
A further difference compared to bar charts: a single bar chart can be used to compare the distribution of a variable among different groups. A single histogram is not appropriate for this purpose; a separate histogram has to be drawn for each group.
The histograms below can be used to compare Hungary with Japan and the Netherlands. The width of the bars is 5 hours in all three cases.
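When separate histograms are drawn for several groups, they are only comparable if all of them use the same classes and show percentages rather than raw counts. Below is a small sketch of that preparation step, assuming NumPy; the two groups, their values, and the helper `percent_histogram` are hypothetical.

```python
import numpy as np

def percent_histogram(values, edges):
    """Percentage of cases falling into each class defined by edges."""
    counts, _ = np.histogram(values, bins=edges)
    return 100.0 * counts / len(values)

# Common 5-hour classes from 0 to 80 hours, shared by every group.
edges = np.arange(0, 85, 5)

# Hypothetical weekly working hours for two groups (illustration only).
groups = {
    "group A": [20, 38, 40, 40, 40, 41, 45, 50],
    "group B": [8, 16, 20, 24, 32, 36, 40, 40],
}

# One histogram per group, all computed on the same class boundaries.
histograms = {name: percent_histogram(v, edges) for name, v in groups.items()}

for name, pct in histograms.items():
    modal = int(np.argmax(pct))
    print(f"{name}: modal class {edges[modal]}-{edges[modal + 1]} hours "
          f"({pct[modal]:.0f}% of cases)")
```

Using percentages makes groups of different sizes comparable, and the shared `edges` array guarantees that bars in the same position describe the same interval.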
Interpret the histograms.
In which country is working time most uniform? In which country are part-time jobs most common? In which country are workers most frequently expected to work extra hours?
Remark. A general classification problem:
The information presented by the chart depends on the width of the classes (also called the bin width). How should the bin width be selected?
There is no "best" number of bins, and different bin sizes can reveal different features of the data.
A large bin width smooths out the graph and gives only a rough picture. A smaller bin width highlights finer features. But the smaller the width we use, the more empty classes are formed, and the more jagged the graph becomes.
(As an aside: the population distribution is generally smooth, but the sample has a limited size and cannot be expected to give perfectly accurate information. The finer the classification we use, the less accurate the estimate of the distribution that the sample can provide.)
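The trade-off described above can be checked numerically. The sketch below, assuming NumPy, draws a simulated sample from a smooth distribution and counts how many classes remain empty at different bin widths; the sample, its parameters, and the helper `count_classes` are hypothetical and only illustrate the idea.

```python
import numpy as np

# Simulated sample of 200 "weekly hours" from a smooth (normal) distribution,
# clipped to the 0-80 range. This is NOT the ISSP data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=40, scale=10, size=200).clip(0, 80)

def count_classes(values, width, low=0, high=80):
    """Return (number of classes, number of empty classes) for a bin width."""
    edges = np.arange(low, high + width, width)
    counts, _ = np.histogram(values, bins=edges)
    return len(counts), int((counts == 0).sum())

results = {width: count_classes(sample, width) for width in (10, 5, 1)}

for width, (n_classes, n_empty) in results.items():
    print(f"width = {width:>2} hours: {n_classes} classes, {n_empty} empty")
```

Because each narrower set of classes subdivides the wider one, the number of empty classes can only stay the same or grow as the width shrinks, while the sample size stays fixed.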
Data on the Netherlands, with three different bin widths:
Width = 10 hours
Width = 5 hours