The quest for good data on the COVID-19 pandemic

29 Apr 2020 12:29 | Anonymous

I’m a data geek, so I tend to see things through the prism of data. The COVID-19 pandemic is no exception. While some people are glued to the daily theatre of government briefings, I’m looking for reliable sources of information. In particular, I’m looking for evidence of how the pandemic is evolving, how long we’re likely to be in lockdown and what the impact is likely to be for me, my family, my colleagues and my clients.

There is certainly plenty of data out there, but how do you filter out the noise? The data is very noisy at the moment: there are some very wild claims, and data that appears to support a remarkably broad range of positions and theories. The conspiracy theorists are having a field day, but let’s not go down that rabbit hole.

My favourite source of information at the moment is the FT, in particular the work of John Burn-Murdoch (@jburnmurdoch) and Chris Giles (@ChrisGiles_). John collates data from around the world into a series of daily trackers with high-quality visualisations; the two I check every day show the daily rates of new cases and deaths. Chris merges the official daily count of COVID-19 hospital deaths from the Department of Health and Social Care with the weekly total death statistics from the ONS to produce an estimate of total COVID-19 deaths.

There are two things that I really like about their work. The first is that the information is presented very clearly. John uses logarithmic scales, which means that the slope of the line is what matters most. A straight line represents exponential growth, and while we were in that very scary phase, the straight line showed very clearly how serious the situation was. The same visualisations are now also showing that lockdown measures are working and that both deaths and new cases are coming down. I can see all this in less than 30 seconds every morning. Meanwhile, Chris’s visualisations show the difference between weekly deaths now and the five-year average, with the implication that the difference is down to COVID-19, which is clearly much worse than seasonal flu.
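To make the log-scale point concrete, here is a minimal Python sketch. It is not the FT's code, and the function names, input layout and least-squares approach are assumptions chosen for illustration. Exponential growth N(t) = N0 * e^(r*t) becomes log N(t) = log N0 + r*t, a straight line whose slope r is the daily growth rate. The sketch estimates that slope with an ordinary least-squares fit to the logged counts and converts it into a doubling time. It also shows the simple arithmetic behind comparing weekly deaths with a five-year average. The example inputs are synthetic, not real figures.

```python
import math


def daily_growth_rate(counts):
    """Estimate r in N(t) = N0 * exp(r * t) by a least-squares line fit to log(counts).

    Assumes one count per day and that every count is positive (log(0) is undefined).
    """
    xs = list(range(len(counts)))
    ys = [math.log(c) for c in counts]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope of the best-fit straight line on the log scale.
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def doubling_time(rate):
    """Days for the count to double at a constant daily growth rate."""
    return math.log(2) / rate


def excess_deaths(current, previous_years):
    """Weekly deaths minus the average of the same week across earlier years.

    current: weekly totals for this year.
    previous_years: one list of weekly totals per earlier year, aligned by week.
    """
    result = []
    for week, deaths in enumerate(current):
        baseline = sum(year[week] for year in previous_years) / len(previous_years)
        result.append(deaths - baseline)
    return result


if __name__ == "__main__":
    # Synthetic series that doubles every 3 days: a straight line on a log axis.
    series = [10 * 2 ** (t / 3) for t in range(15)]
    print(round(doubling_time(daily_growth_rate(series)), 6))
    # Synthetic five-year baseline (both weeks average 1000) and this year's figures.
    baseline_years = [[900, 950], [1000, 1000], [1100, 1050], [950, 1000], [1050, 1000]]
    print(excess_deaths([1100, 1600], baseline_years))
```

Run as a script, this prints a doubling time of 3.0 days and excess deaths of [100.0, 600.0]. A falling case count shows up here as a negative slope, which is the "coming down" that a log chart makes visible at a glance.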

The second thing I like is that they both show their working and are clear about the uncertainties and assumptions involved. John has a useful, informative video clip explaining why he uses the logarithmic scale, where his source data comes from, and what the inconsistencies are. He is open about the fact that the data is very noisy, and about what he has done to compensate for this. For example, he has settled on a 7-day rolling average to smooth out some of the noise in the daily reporting. Chris documents the assumptions he makes when merging two separate data sources, and he is clear about when and why he changes them. Because they show their working so transparently, and patiently respond to questions and criticism on Twitter, I can validate their output for myself to the point where I trust it. I feel confident that I understand what their work shows and what it can’t.
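A 7-day rolling average replaces each day's figure with the mean of that day and the six before it, so any pattern that repeats weekly cancels out. Here is a minimal Python sketch of a trailing window. Whether the FT uses a trailing or a centred window is not stated here, so the trailing choice is an assumption, as is the hypothetical reporting pattern in the example (five normal days followed by two low ones).

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing rolling mean; returns None until a full window of values is available."""
    buf = deque(maxlen=window)  # appending past maxlen drops the oldest value
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


if __name__ == "__main__":
    # Hypothetical weekly reporting cycle: two low days each week.
    daily = [120, 120, 120, 120, 120, 60, 40] * 2
    print(rolling_mean(daily))
```

The raw series swings between 40 and 120, but every full 7-day window sums to 700, so the smoothed series is a flat 100.0 from day 7 onward. That is exactly the effect the average is meant to have on a weekly reporting cycle. With pandas, `Series.rolling(7).mean()` gives the same trailing average, with NaN in place of None for the first six days.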

The COVID-19 pandemic is topical and it is putting some data under the spotlight, but it is also highlighting some unchanging truths (fundamentals, if you like). Making sense of data requires rigour, including understanding where the data has come from (lineage), how reliable it is (quality) and what it actually means.
