m1rko April 5, 2019 at 13:00

Incorrect charts: our experience

Translation

At The Economist we take data visualization very seriously. Every week we publish about 40 charts in print, online and in our apps. In each of them we strive to represent the numbers accurately and in the way that best supports the story. But sometimes we make mistakes. It is important to learn from them so that we do not repeat them, and we hope our experience will be useful to you too.

Digging through our archive, I found several instructive examples. I have grouped these crimes against data visualization into three categories. These are charts that:

are misleading;
are confusing;
fail to make a point.

For each, a revised version is shown that takes up the same amount of space, an important factor for print publication.

(Note: most of the “original” charts were published before our chart redesign. The improved charts follow the new specifications. The data are the same.)

Misleading Charts

Let's start with the worst of crimes: presenting data in such a way that it is misleading. We never do this on purpose! But sometimes this happens. Let's consider three examples from our archive.

Error: truncating the scale

( data in csv )

This chart shows the average number of Facebook likes on posts by pages of the political left. Its purpose was to show the gap between the likes on Mr Corbyn's posts and those of the others.

The original chart not only understates the number of likes on Mr Corbyn's posts but also exaggerates the figures for the others (here is another example of such an error). In the revised version, Mr Corbyn's bar is shown in full, and all the other bars remain visible.

Another oddity is the choice of color. In an attempt to echo Labour's color scheme, we used three shades of orange and red to distinguish 1) Mr Corbyn, 2) other MPs and 3) parties and groups. This is not explained anywhere. The logic may be obvious to many readers, but it makes little sense to those who are not familiar with British politics.

Error: an apparent relationship created by the choice of scales

A rare example of perfect correlation? Actually not ( data in csv )

The chart above comes from an article about the falling weight of dogs. At first glance, a dog's weight and its neck circumference appear to be perfectly correlated. But is that true? Only to some extent.

On the chart, both scales span three units (from 21 to 18 on the left; from 45 to 42 on the right). In percentage terms, however, the left scale falls by 14% and the right by only 7%. In the revised chart I kept the dual scale but adjusted the ranges to reflect a comparable proportional change.
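The arithmetic behind those percentages can be checked with a short sketch. The function names are hypothetical, and matching the proportional drop on both axes is only one reasonable reading of "comparable proportional change", not necessarily the method used for the revised chart.

```python
def percent_drop(top, bottom):
    """Return the drop from top to bottom as a percentage of top."""
    return (top - bottom) / top * 100


def matched_bottom(top, reference_top, reference_bottom):
    """Lower bound for an axis starting at `top` so that its proportional
    drop equals that of the reference axis (hypothetical helper)."""
    return top * reference_bottom / reference_top


left = percent_drop(21, 18)    # left scale: 21 down to 18
right = percent_drop(45, 42)   # right scale: 45 down to 42
print(f"left: {left:.1f}%, right: {right:.1f}%")

# Lower bound for the right axis that would match the left axis's drop
right_bottom = matched_bottom(45, 21, 18)
print(round(right_bottom, 1))
```

Under this assumption the right axis would need to run from 45 down to roughly 38.6 rather than 42 for the two lines to be visually comparable.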

Given the lighthearted subject, the error may seem relatively minor. After all, the message is the same in both versions. But the lesson is important: if two series track each other too closely, it is probably worth taking a closer look at the scales.

Error: wrong visualization method

Opinions about Brexit are almost as volatile as negotiations about it ( data in csv ).

We published this chart of polling data in our Espresso news app. It shows attitudes towards the outcome of the EU referendum as a line chart. Judging by the data, respondents seem to swing wildly in their views: the results jump by a few percentage points from one poll to the next.

Instead of drawing a smooth curve to show the trend, we plotted the actual values of each poll. This happened mainly because our charting tool could not draw smoothed lines. Only recently have we become comfortable with more advanced statistical software (such as R) that offers more sophisticated visualization methods. Today any of us can plot a smoothed curve over the poll results, as in the improved version.
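To illustrate what smoothing does to noisy poll results, here is a minimal sketch using a centered moving average. This is a stand-in technique chosen for simplicity, not the method used for the revised chart (the article mentions R but not a specific smoother). The function name and the poll numbers are made up for the example.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up poll results (percent), not the data behind the chart
polls = [44, 47, 43, 46, 42, 45, 44, 46, 43, 45]
trend = moving_average(polls, window=5)
print([round(v, 1) for v in trend])
```

The smoothed series varies far less than the raw polls, which is what lets the reader see the trend rather than the noise. Plotting the raw values as faint points behind the curve keeps the individual polls visible.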

The scale is also a problem. The original chart stretches the data across more of the plot than it should. In the revised version I left some space between the start of the scale and the lowest data point. Francis Gagnon offers a good rule of thumb for such situations: in a line chart that does not start at zero, leave at least 33% of the plot area free below the line.
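One way to apply such a rule is to compute the lower bound of the axis from the data range. The sketch below assumes that "33% of the area" can be approximated by the vertical share of the axis below the lowest data point, and that the axis ends at the highest data point. Both are simplifying assumptions, and `axis_floor` is a hypothetical helper.

```python
def axis_floor(data_min, data_max, free_share=1 / 3):
    """Lower bound of a y-axis whose top is data_max, chosen so that the
    space below the lowest point is `free_share` of the axis height."""
    if not 0 <= free_share < 1:
        raise ValueError("free_share must be in [0, 1)")
    # Solve (data_min - b) / (data_max - b) = free_share for b
    return (data_min - free_share * data_max) / (1 - free_share)


# Hypothetical poll values ranging from 40% to 48%
lower = axis_floor(40, 48)
print(lower)
```

With these numbers the axis would start near 36, so a third of its height lies below the lowest point.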

Charts that are confusing

This is not as serious a crime as misleading the reader, but a chart that is hard to understand is a sign of poorly done visualization work.

Error: charts that are too clever

… what? ( data in csv )

Journalists at The Economist strive to challenge the reader, in a good way. But sometimes we go too far. The chart above shows the US trade deficit in goods and the number of people employed in manufacturing.

This chart is incredibly hard to understand. It has two main problems. First, the values of one series (the trade deficit) are entirely negative, while those of the other (manufacturing employment) are positive. It is difficult to combine such different data in one chart. The obvious “solution” leads to the second problem: the two data series do not share a baseline. The baseline of the trade deficit is at the top of the chart (marked by a red line running across half of the chart). The baseline of the right-hand scale is at the bottom.

The revised chart shows that there was no need to combine the two data series. The relationship between the trade deficit and manufacturing employment remains clear, and the chart takes up only a little more space.

Error: confusing use of color

50 shades of blue ( the data in the csv )

This chart compares public spending on pensions with the share of people over 65 in a number of countries, with a special focus on Brazil. To keep the chart small, the visualizer labeled only some of the countries and highlighted them in blue. The OECD average is highlighted in light blue.

The visualizer (it was me!) ignored the fact that a change of color often implies a change of category. Here, too, the reader may get the impression that the highlighted countries belong to a different group from the rest. This is not true. The only difference is that the other countries are simply not labeled.

In the revised version, the color is the same for every country. I only changed its intensity for the labeled countries. Typography does the rest: Brazil, the country in focus, is in bold, and the OECD average is in italics.

Charts that fail to make a point

Errors in this last category are less obvious. Such charts are neither misleading nor especially confusing. They simply fail to justify their existence. Either they were built incorrectly, or we tried to squeeze too much information into too small a space.

Error: too much detail

“The more colors, the better!” ( data in csv )

A real rainbow! We published this chart in a column about Germany's budget surplus. It shows the budget balance and the current-account balance of ten euro-area countries. With so many colors, some of which are hard to tell apart or even to see because the values are so small, the point of the chart is hard to grasp. The brain all but switches off, and the reader skips the chart and moves on. More importantly, since we do not show figures for every country in the euro area, there is no point in stacking the data.

I reread the article to find a way to simplify the chart. The text refers to Germany, Greece, the Netherlands, Spain and the euro area as a whole. In the revised version of the chart I decided to highlight only these and placed the rest in an “Other” category (the total current-account balance in the revised chart is lower than in the original because Eurostat has since revised its data).
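The grouping step described above can be sketched in a few lines. The function `group_minor` and the numbers are illustrative only; they are not the Eurostat figures behind the chart.

```python
def group_minor(values, keep, other_label="Other"):
    """Keep the entries named in `keep`; sum everything else into one entry."""
    grouped = {name: values[name] for name in keep if name in values}
    rest = sum(v for name, v in values.items() if name not in keep)
    grouped[other_label] = rest
    return grouped


# Illustrative numbers only, not Eurostat figures
balances = {
    "Germany": 8.0, "Netherlands": 3.0, "Spain": 1.0, "Greece": -0.5,
    "Italy": 2.0, "France": -1.0, "Austria": 0.5,
}
focus = ["Germany", "Greece", "Netherlands", "Spain"]
simplified = group_minor(balances, focus)
print(simplified)
```

The total is unchanged by the grouping, so the simplified chart still adds up to the same overall balance while using five colors instead of one per country.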

Error: a lot of data, not enough space

I give up ( data in csv )

Constrained by the space on the page, we are often tempted to cram all the data into a slot that is too small. This saves valuable space, but there are consequences, as this chart from March 2017 shows. It accompanied an article arguing that men dominate science. All the data points are equally interesting and relevant to the article. But this much data is hard to take in: there are four categories of research field, plus the share of patent authors, for each country.

On reflection, I decided not to redesign this chart. With all the data kept, the chart would be too large for a short article. In such cases it is better to cut something. Alternatively, you could show a summary measure, such as the average share of publications by women across all fields. (Please let me know if you have ideas for visualizing this in a confined space!)

Best practices are developing rapidly: what is acceptable today will be condemned tomorrow. New and more advanced methods appear all the time. Have you ever committed an “infographic crime” that could easily be fixed?
