Data visualization in a time of pandemic - #1: Finding reliable data

Oh no! Not another coronavirus post! Yes, I know, we are bombarded by pandemic content these days. My apologies for creating even more. However, it is not my purpose to bore you with more of the same, or to confuse you with pointless details. Being a passionate information designer I decided to have a look at good and bad practices in COVID-19 related content from a data visualization point of view. I hope this will be a useful and inspiring overview.

We are living in remarkable times. The novel coronavirus is causing an epidemic spreading with a velocity we have never experienced before. Busy long-distance air and rail traffic have made it impossible to contain the virus after its first outbreak in China. For the first time our modern world is confronted with a pandemic of this scale and magnitude, and our healthcare systems are being put to the test.

But in fighting these challenges, the world has never been as united as today. Research teams across the globe are working together to develop cures, social media are used extensively to keep everyone informed, and innovative companies are coming up with solutions to keep people at home and the virus at bay. Technology plays a crucial role in this fight.

As an information designer, I am specifically fascinated by the efforts of the data science and visualization communities. The newest developments in these fields are put to use to turn a complex and rapidly changing topic into easy-to-communicate visuals. In only a matter of days, nearly everyone is familiar with the ‘flatten the curve’ visuals, or Washington Post’s animations on the impact of social distancing.

In this post, we will explore some of the marvelous ways people around the world are using data visualization in the fight against the novel coronavirus.

Title: data visualization in a time of pandemic

Chapter 1: Finding reliable data

As noted by Edward Tufte, excellent graphics consist of complex ideas communicated with clarity, precision, and efficiency. At the core of a good data visual, therefore, lies accurate data. So before we start diving into coronavirus graphs, we will first take a brief stop at trustworthy data sources.

Sources of reliable data

There are currently three important places where one can obtain reliable and relatively complete aggregate data about the Coronavirus epidemic:

  • World Health Organization
    The World Health Organization publishes daily Situation reports detailing the number of confirmed cases and deaths per country. They also provide a Situation dashboard which is updated three times per day.

WHO Coronavirus Situation dashboard

WHO Novel Coronavirus Situation Dashboard

  • John Hopkins University
    Researchers at John Hopkins University also maintain a dashboard providing an overview of the current number of cases, deaths and recoveries on a per country basis. The underlying data is made freely available through GitHub.

John Hopkins University coronavirus dashboard

John Hopkins University Coronavirus Dashboard

  • European Center for Disease Control and Prevention
    The ECDC publishes daily statistics on the pandemic for the entire country (despite its name!). Data is published daily at 1 p.m. CET and is presented on a situation update page.

  • Our World in Data
    The team of Max Roser collects and combines all available data and information about the epidemic on a single page. This excellent summary provides interactive charts on many different topics ranging from the number of cases to symptoms, incubation period and fatality rate. Each chart comes with a downloadable data set.

Accuracy of data

Collecting and aggregating global data in a rapidly changing environment, such as during a pandemic, is obviously very tricky. None of the above datasets should therefore be considered an ‘absolute truth’, as minor errors are bound to happen. Such errors can be related to reporting difficulties or contradicting sources, or differences and shifts in methodology, but can also be due to minor errors such as typos.

As an example, let us compare the three datasets above for the total number of confirmed cases in Belgium (between March 1 and March 19) with the official numbers communicated by the Belgian government (which can be found here).

Table of confirmed coronavirus cases in Belgium, by different sources

Comparison between different data sources of the reported total number of confirmed COVID-19 cases in Belgium between March 1 and March 19, 2020.

Immediately we can note some discrepancies. The John Hopkins University data follows the government data most closely, with an exception on March 12 where for some reason the number was not updated.

The two other datasets (WHO and Our World in Data) appear to lag behind by one day up until March 16, possibly because WHO Situation reports are published at specific timings which don’t match accurately with government reporting timings. Also, these datasets miss the same update as the John Hopkins numbers (from 314 to 399 cases), they were not updated on March 17, and they appear to have a typing error in them (1.085 cases on March 16, while the official government number was 1.058).

Finally, Our World in Data temporarily stopped updating beyond March 17 because WHO shifted their reporting window: up until Situation report 57 the observed 24-hour time window ended at 10 a.m. CET, since then it ends at midnight. This causes a small overlap making it difficult to accurately compare data and analyze trends.

  • Update March 23: Note that Our World in Data stopped relying on WHO data as they found too many errors in the daily Situation reports. Instead, they switched to data provided by the ECDC.

In summary, John Hopkins University data most closely matches official government numbers (for Belgium).

Graph of the total number of confirmed coronavirus cases in Belgium in March, from different sources

Total number of confirmed COVID-19 cases in Belgium in March 2020, comparison between different sources.

Finding more data sources

If you are looking for alternative data sources, direct reports by governments, or data on specific regions or cities, I highly recommend the data section of the Coronavirus Tech Handbook, a crowdsourced document bringing together all the tools, datasets and visualizations on this topic.

The sheer amount of available data can make it a bit overwhelming, especially taking into account that new numbers are being announced almost constantly. When in doubt, I would advise to stick to the four most complete data sources listed above.

This is a multi-chapter blog post!

Continue reading:

For all your comments, suggestions, errors, links and additional information, you can contact me at or via Twitter at @koen_vde.

Disclaimer: I am not a medical doctor or a virologist. I am a physicist running my own business (Baryon) focused on information design.

Read more:

Vreemde plaatsnamen in Vlaanderen

Iedereen kent wellicht 'Kontich' en 'Reet', maar in Vlaanderen hebben we nog veel meer merkwaardige, onverwachte, en vaak grappige plaatsnamen. Heb je bijvoorbeeld ooit al gehoord van Buitenland, Dikkebus, of Grote Homo?

Read More

Small multiples can save your chart

When you're dealing with a chart that has too much information on it, the most straightforward advice to follow is: break it down into multiple charts, each with less information on them. A powerful example of this is a so-called small multiple approach.

Read More

Data visualization podcasts 2023

At Baryon, we’re huge fans of podcasts! Data visualization podcasts are a great way to stay up to date on the latest trends and techniques in data visualization.

Read More

thumbnail for video 10 - can you use excel to create a powerful chart

Can you use Excel to create a powerful chart?

Spreadsheet tools such as Microsoft Excel or Numbers might not be the first thing on your mind when considering data visualization tools, but they can be pretty solid choices to build data visuals. Don’t let anyone convince you that using Excel to create data visuals is unprofessional.

Read More

thumbnail for video 09 - choosing the right font for your data visual

Choosing the right font for your data visual

Fonts evoke emotions: there are very sophisticated fonts, playful fonts, attention-grabbing fonts, and elegant handwritten fonts. Using the wrong type of font can have a lot of impact. In data visualization the implications of typography are mainly focused on readability. Labels and annotations can easily become so small they get hard to read. Above all else, we should choose a font which is readable at small sizes.

Read More

thumbnail for video 08 - three roles of colour in a data visual

Three roles of colour in a data visual

Colour is one of the most crucial tools we have to turn a normal chart into a powerful chart with a clear message, a chart which tells a story rather than simply presenting the information.

Read More

We are really into visual communication!

Every now and then we send out a newsletter with latest work, handpicked inspirational infographics, must-read blog posts, upcoming dates for workshops and presentations, and links to useful tools and tips. Leave your email address here and we’ll add you to our mailing list of awesome people!