The Power of Data in the Everlasting Battle: Humanity vs. Virus (1/2)

Part 1: The role of data in the fight against epidemics and pandemics in the past and present

Over the past decades, all data technologies, from data acquisition and processing to storing and publishing, have become cheaper than ever before. This has revolutionised the world by providing unprecedented opportunities to produce insights from data. Scientists embrace these opportunities in many aspects of life, as illustrated by how big data and data analytics have become mainstream in business, the public sector, mobility and now outbreak response.

The ease of use and accessibility of data technology is very visible during the current pandemic, where data on COVID-19 is ubiquitous and new information is generated and processed at every moment. Data available on COVID-19 ranges from epidemiological, healthcare facility and medical research to data on governmental measures and the socioeconomic and environmental impacts of the pandemic. This widespread availability and the opportunity to access it in real-time enhance clarity on the current situation and assist informed decision-making. Moreover, data visualisations on outbreaks are increasingly prevalent, which helps illustrate aspects of crises such as the COVID-19 pandemic and makes them more tangible and actionable.

The lessons we learned from previous outbreaks, combined with our current capacity to collect and process data, shape a more data-driven and evidence-based response against COVID-19. In this data story, we explore the historical role of data in epidemics and pandemics over the past century and highlight how the role of data in COVID-19 is different.


The role of data in epidemics and pandemics in the past century

Today’s widespread availability of data during COVID-19 makes it hard to imagine living through an epidemic or pandemic again without the opportunity to access real-time data and the latest research insights. However, about a hundred years ago, data technology was still expensive and rarely used. Therefore, the use of data for managing epidemics and pandemics was highly limited.

In 1918, an H1N1 virus caused a horrendous pandemic referred to as “Spanish flu”, which infected approximately a third of the world’s population. As data collection, publishing, and news distribution systems were far from being as developed as they are today, barely any data was available to the wider public, causing the population to have limited awareness and knowledge of the crisis.

Scientific researchers have recently attempted to reconstruct the footprint of the pandemic by uncovering and analysing old data sitting in dusty libraries, church records and long-forgotten vital statistics books. Even this painstaking work cannot resolve all of the uncertainty surrounding the Spanish flu. For example, reconstructions of the number of deaths produce estimates that range between 17 million and 100 million. This is not surprising, since accurate epidemiological data was unavailable: medical records and death certificates were kept on paper, such as the death certificates depicted in Figure 1.

Figure 1: Death certificates of the Hiley family, who died after contracting Spanish Flu in Wales, United Kingdom in 1918.

Furthermore, health systems were strikingly different from those of the present day, as healthcare was highly underdeveloped compared to modern standards in the Western world. The Spanish flu occurred before antibiotics were available, and many deaths were caused not by the influenza virus itself but by secondary bacterial infections that medics were not yet able to cure. Moreover, medical research was in its infancy, and it was impossible for researchers to develop a vaccine that could bring the pandemic to a halt.

The poor research capacity and the lack of data made it almost impossible to understand exactly how the virus worked and how it spread. Consequently, the entire world population was defenceless, and authorities could do little other than lock down their countries and wait for the virus to disappear naturally. The Spanish flu lasted for two years and caused millions of deaths before it slowed down.

In the decades that followed, the digital revolution took place, massively disrupting but also advancing science and technology, including medicine. With the rise of data technology, scientists and researchers gained increased abilities to advance medical technology and research and to use epidemiological data, demographics and basic forecasting. For example, the 20th century saw medical advances[1] such as the discovery of penicillin, the first antibiotic (1928), the invention of the electron microscope, which allowed researchers to see viruses for the first time (1931), and the development of MRI scans (1980s). Furthermore, the pharmaceutical industry and immunology advanced, including techniques for developing vaccines, thanks to chemical technology and improved knowledge of chemistry. These advances were particularly driven by World War II. Lastly, communications technology and increased travel allowed medical information and insights to be shared more rapidly in the research community and beyond.

A compelling example of medical progress is that starting in the 1960s health technology experts were able to replace paper records with Electronic Health Records (EHRs). These were an enormous game changer, as EHRs not only offered an increased ease of use, lower healthcare costs and flexibility to access records from multiple locations, but also provided invaluable data to clinical researchers. Thereby, EHRs helped advance medical knowledge and the development of treatments for health problems, including viral outbreaks.

In 2009, another H1N1 virus impacted humans and caused a pandemic referred to as “Swine flu” (or “H1N1 2009”). Data on Swine flu suggests that this pandemic infected about 24% of the world’s population and caused a relatively low number of deaths worldwide, estimated at 280,000. Swine flu had a relatively short presence, as the first doses of monovalent H1N1 pandemic vaccine were administered only 6 months after research commenced. The rapid response and halting of the virus resulted, amongst other factors, from the advancements that occurred during the previous decades and the lower pathogenicity of H1N1 2009.

Within a century’s time span, the medical world had advanced impressively, the knowledge on infectious diseases improved, and the ability to effectively engage in vaccine research was established.


The indispensable role of data during COVID-19

Over the past ten years, data technology has progressed even further, enabling data to play an indispensable role in fighting COVID-19 today. Big data, machine learning and other technologies have shaped our medical world and increased the opportunities for successfully combating and preventing viruses, by helping people on the front lines determine the best preparation and response.

Below, we address four areas in which data strongly supports our response to COVID-19.

Early warnings on outbreaks – Alerts on the novel coronavirus outside of China occurred for the first time on 30 December 2019. The earliest alerts were sourced from AI-based early warning systems, including BlueDot and HealthMap (Figure 2), which scan social media, online news articles and government reports for signs of infectious disease outbreaks. The systems are used to help warn and inform global agencies such as the World Health Organization and give them a head start in identifying new outbreaks where bureaucracy, language barriers or government procedures might otherwise get in the way.

Figure 2: Screenshot of HealthMap’s interactive overview presenting the current global state of infectious diseases and their effect on human and animal health, on 30 April 2020 at 10:10 CEST.

About half an hour after HealthMap’s AI system raised the alert, a human research group called ProMed also flagged the outbreak after noticing a post on social media that spoke of an ‘unknown pneumonia’. The fact that the AI system was first to flag the outbreak caused excitement, as this was the first time that a system was faster than human researchers. Although this is true in essence, it still took human insight to recognise the significance of the outbreak and trigger a response from the health community.

In fact, HealthMap had already triggered a warning a couple of days earlier, yet it was overlooked because the system ranked the alert’s seriousness as “just” 3 out of 5. It is notable that AI systems are highly dependent on the quality and quantity of the data they are trained on, and often require humans to review their conclusions to ensure relevance and accuracy. Nonetheless, system improvements and enhanced collaboration between systems and humans are promising for earlier identification of outbreaks in the future.
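The interplay between automated scoring and human review can be illustrated with a toy sketch. This is not how HealthMap or BlueDot work internally; the keyword weights, the `triage` function and the review threshold are all hypothetical and only show why a mid-range score should still reach a human reviewer.

```python
from dataclasses import dataclass

# Hypothetical keyword weights, chosen purely for illustration.
KEYWORD_WEIGHTS = {"pneumonia": 2, "unknown": 1, "outbreak": 1, "cluster": 1}


@dataclass
class Alert:
    text: str
    score: int          # seriousness from 1 (low) to 5 (high)
    needs_review: bool  # True if a human should look at the alert


def triage(report: str, review_threshold: int = 3) -> Alert:
    """Score a text report and decide whether to route it to a human."""
    words = report.lower().split()
    # Add up the weights of the keywords that appear in the report.
    raw = sum(weight for keyword, weight in KEYWORD_WEIGHTS.items() if keyword in words)
    # Clamp the raw total to the 1-5 seriousness scale.
    score = max(1, min(5, raw))
    return Alert(report, score, score >= review_threshold)


alert = triage("cluster of unknown pneumonia cases reported")
quiet = triage("weather is fine")
```

In this sketch the review threshold is set at 3, so an alert scored "just" 3 out of 5 would still be sent to a person rather than being dropped.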

Open data in viral research – Collaboration and openness are key factors in the acceleration of pandemic research. The engagement of the scientific world with COVID-19 exceeds that seen in any previous outbreak. For instance, only one month after the first case of the virus was reported in China, the country’s scientists sequenced its genome and made it publicly available online.

Consequently, an international response to this novel coronavirus began, in which leading scientists, public health agencies, ministries of health and research funders were mobilised for global coronavirus research and innovation. Furthermore, sharing the genome publicly at a relatively early stage of the outbreak helped doctors to start testing and diagnosing infections, even in individuals who had no apparent symptoms. In comparison, during the SARS epidemic in 2003, it took months before the outbreak was acknowledged and the genome was shared, and the disease was originally thought to be caused by Chlamydia.

With a significant number of scientists researching COVID-19, the knowledge base on the virus is growing, with information such as its means of transmission, its survival time on surfaces, potential antiviral treatments and vaccines, as well as the extent to which individuals develop immunity after contracting and recovering from the virus. Sharing all the available research as quickly and as freely as possible is imperative, as swift responses to the virus are required.

Fortunately, although academic research is traditionally available only behind paywalls, publishers have responded and adjusted quickly. In particular, scientific journals from the publishers Elsevier, Springer Nature and Wiley made large strides towards openness by almost universally granting open access to articles on coronavirus-related research.

We at the European Data Portal are now wondering whether this new form of openness is here to stay and will create a competitive advantage for humanity in battling the next epidemic or pandemic, or whether we will go back to all-encompassing paywalls after the emergency has passed.

Better allocation of crucial resources – The value of data is also demonstrated by its use in allocating more effectively the scarce but crucial physical and human resources required for fighting COVID-19. For example, medical personnel, ventilators, hospital beds, ICU beds, testing kits, Personal Protective Equipment (PPE) and diagnostic, therapeutic and preventive interventions need to be distributed carefully and efficiently to maximise the benefits. Even though modelling and predicting the pandemic is challenging, models are widely used to help project where resources are needed most and to ensure that healthcare infrastructure is not overwhelmed.

One example of how data can support the allocation of resources comes from the Netherlands, where predictions on the availability of ICU beds suggested that capacity could fall short in May. In response, the government started ramping up capacity in March and had more than doubled the number of beds, from 1150 to 2400, by the beginning of April. Furthermore, more than a hundred Dutch patients were transferred to German hospitals as a precaution, to ensure sufficient capacity. Eventually, the number of Dutch patients decreased again. This clearly highlights the potential of data in supporting an effective response through resource allocation or reallocation.
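The kind of projection behind such a decision can be sketched in a few lines. The example below is a deliberately simplified illustration, not the model used by Dutch authorities: it assumes constant daily growth in ICU occupancy, and the starting occupancy (500 beds) and growth rate (3% per day) are hypothetical. Only the two capacity figures, 1150 and 2400 beds, come from the text above.

```python
def first_shortfall_day(occupied_today, daily_growth_rate, capacity, horizon_days=90):
    """Return the first day (counting from 1) on which projected ICU
    occupancy exceeds capacity, or None if that never happens within
    the horizon. Assumes constant compound daily growth."""
    occupied = float(occupied_today)
    for day in range(1, horizon_days + 1):
        occupied *= 1.0 + daily_growth_rate
        if occupied > capacity:
            return day
    return None


# Hypothetical inputs: 500 occupied beds today, growing by 3% per day.
baseline = first_shortfall_day(500, 0.03, capacity=1150)  # original capacity
expanded = first_shortfall_day(500, 0.03, capacity=2400)  # after ramp-up
stable = first_shortfall_day(500, 0.0, capacity=1150)     # no growth at all

print(f"Shortfall with 1150 beds after {baseline} days")
print(f"Shortfall with 2400 beds after {expanded} days")
```

Under these assumed inputs, expanding capacity pushes the projected shortfall several weeks further into the future, which is the time window that measures such as patient transfers are meant to buy.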

Leveraging data-driven tools – Leveraging data-driven tools to fight COVID-19 on multiple fronts has become a significant part of our approach to managing the pandemic today. These tools can take multiple forms and are created to realise different strategies.

For instance, China used its pre-existing surveillance infrastructure to track whether individuals comply with quarantine orders, as well as to detect potential COVID-19 infections by using thermal scanners to measure the body temperature of individuals at train stations. Another example is data-driven dashboards, which are used to map the status of the COVID-19 pandemic in real time. For more information on global and European dashboards that map the spread of COVID-19, read our data story about COVID-19 dashboards. Furthermore, contact-tracing apps are deployed across the world to automate the tracing of individuals who may have been in proximity to others who could be contagious, either because they tested positive for the virus, were otherwise diagnosed, or were themselves in contact with others who are sick. For more information on contact-tracing apps, read our data story about contact-tracing apps.

These four areas show that data supports us in the current pandemic to an extent unlike in any outbreak we have seen before. The response to COVID-19 is data-driven, and data utilisation has increased extraordinarily compared to previous epidemics and pandemics.

However, critical views exist on whether politicians and experts made effective use of the available data to operate and respond amidst this crisis. Or did the deluge of information prevent us from drawing the right conclusions in time? As a follow-up to this data story, we will address this question and reflect on how data can help the world prepare better for the next epidemic or pandemic, or even prevent it.


Disclaimer: Whether about COVID-19 or any other topic, data may not always be trustworthy or reflect the situation accurately. For example, data collection is easily biased by the assumptions of the individuals that collect the data, and their perspective of the world. Therefore, accurate visualisations and interpretations of data are key, as well as a critical review of trustworthiness.



[1] In the link, select the Pharmaceuticals and Medical Technology menu item within the Industry and Innovation section.