
The Power of Data in the Everlasting Battle: Humanity vs. Virus (2/2)

Humanity vs Virus
Part 2: The role of data in the fight against epidemics and pandemics in the present and future.

The lessons we learned from previous outbreaks, combined with our current capacity to collect and process data, shape a more data-driven and evidence-based response against COVID-19. In the first part of this data story, we explored the historical role of data in epidemics and pandemics over the past century and highlighted how the role of data in COVID-19 is different. In this second part, we describe what we can learn from this pandemic, including the importance of preserving data, knowledge and experience that can save human lives in the future. The question is: can data help the world better prepare for the next epidemic or pandemic, or even prevent it?


Reflecting on how data is used in COVID-19

Scientists argue that the current pandemic could have been prevented if more attention had been paid to crucial infectious disease predictions in the past. In early 2019, scientists argued that a deadly pandemic was long overdue, and top animal disease experts even specifically warned that a coronavirus pandemic, such as COVID-19, could occur.

Outbreak prevention

These warnings came from experts who found that occurrences of zoonotic spillover - meaning that a disease transfers from wildlife to humans - have become more frequent in recent decades and pose a growing threat to human life and society. This threat is increasing due to the speed of global travel, population growth, and increased contact with and exposure to wildlife and other species such as bats. If the threat of zoonotic spillover had been acknowledged earlier, authorities, scientists and researchers could have invested more strongly in prevention, effective response strategies and the establishment of resilient medical infrastructures that can cope with outbreaks.

As research predicted that a pandemic was likely to happen, the COVID-19 editorial team investigated what cost savings would be associated with early prevention of infectious diseases via vaccine development. The team found that in 2019, scientists identified 11 different diseases with the potential to kill millions of people in a pandemic. The cost of developing vaccines for all 11 of these diseases, including severe acute respiratory syndrome (SARS), was estimated at between €2.5 billion and €3.4 billion[1]. However, research institutions commonly lack the funding required to get preventative vaccine research off the ground. Furthermore, in 2016, a team of researchers developed a vaccine against a deadly strain of coronavirus, but as this occurred more than a decade after SARS in 2003, it was impossible for them to convince public or private investors to fund human trials.

To determine how the seemingly high figures for preventative vaccine development relate to the actual cost of a viral outbreak, we looked in more depth at SARS and COVID-19. For SARS in 2003, the cost of combating the outbreak was estimated at €49.1 billion[1]. However, the cost of developing a vaccine for SARS is thought to be strikingly lower, at €299 million to €426 million[1]. As for COVID-19, preliminary research estimates the cost of the pandemic to be at least €1 trillion[1] already. In contrast, the Coalition for Epidemic Preparedness Innovations (CEPI) estimates that it requires €1.8 billion[1] in funding to finance the development of a vaccine against COVID-19. A comparison of these figures is illustrated in Figure 1.

Visual comparison of estimated costs

Figure 1: Visual comparison of the estimated costs of vaccine development for SARS and COVID-19 and the estimated costs of actually combatting these diseases without early prevention, as created by the COVID-19 editorial team[2].

The estimations above confirm what is intuitive: that the cost of vaccine development is significantly lower than the cost of an epidemic or pandemic actually occurring.
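To make the size of the gap concrete, the short Python sketch below divides the estimated outbreak costs by the estimated vaccine development costs, using only the figures quoted in this article. The variable and function names are illustrative, and the results are only as reliable as the underlying preliminary estimates.

```python
# Figures quoted in this article, in euros (estimates; see footnote [1]).
SARS_OUTBREAK_COST = 49.1e9               # cost of combating SARS in 2003
SARS_VACCINE_COST_RANGE = (299e6, 426e6)  # low and high vaccine cost estimates
COVID_OUTBREAK_COST_MIN = 1e12            # "at least" figure, so a lower bound
COVID_VACCINE_COST = 1.8e9                # funding CEPI estimates it requires


def cost_ratio(outbreak_cost, vaccine_cost):
    """Return how many times larger the outbreak cost is than the vaccine cost."""
    return outbreak_cost / vaccine_cost


# The high vaccine estimate gives the smallest ratio, the low estimate the largest.
sars_low = cost_ratio(SARS_OUTBREAK_COST, SARS_VACCINE_COST_RANGE[1])
sars_high = cost_ratio(SARS_OUTBREAK_COST, SARS_VACCINE_COST_RANGE[0])
covid = cost_ratio(COVID_OUTBREAK_COST_MIN, COVID_VACCINE_COST)

print(f"SARS: outbreak cost roughly {sars_low:.0f}x to {sars_high:.0f}x the vaccine cost")
print(f"COVID-19: outbreak cost at least roughly {covid:.0f}x the vaccine cost")
```

On these estimates, combating SARS cost roughly 115 to 164 times as much as developing a vaccine would have, and the COVID-19 pandemic has already cost at least roughly 556 times the funding CEPI estimates it requires.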

Even though it is hard to predict whether a certain infectious disease will occur, and therefore whether it is worth the investment, it is key to acknowledge that - at the moment that an outbreak is occurring - it is already too late to develop a vaccine. The time that development takes allows costs and socio-economic losses to build up extensively, even if vaccine research is launched as soon as the outbreak appears. Therefore, allocating the required resources to preventative vaccine development for diseases that have the potential to kill millions of people in a pandemic, and developing AI capabilities to predict the evolution of coronaviruses, can give governments and researchers a significant advantage in fighting the next pandemic.

The lack of preventative vaccine development therefore points to a broader challenge, where inefficient allocation of funds leads to an inability to research and develop preventative vaccines that could tremendously reduce the socioeconomic impact of a viral outbreak. Early prevention could save a significant amount of money in the long run, which highlights that governmental institutions play a key role in acknowledging the risks of viral outbreaks and ensuring allocation of capital to research institutions for conducting research and vaccine development. 

A swifter response

Another aspect of COVID-19 where data could have been used better is the speed of the response once the outbreak was identified. Swift government responses were essential in slowing down the spread of the disease, as intervention measures are significantly more effective when taken early.

In one study, scientists used sophisticated modelling to determine how the number of infected individuals would have differed if China’s containment measures had been imposed earlier, which would have been achievable if the findings of the scientific community had been acknowledged sooner. Their non-peer-reviewed results indicated that the number of cases in mainland China could have been reduced by 66% if measures had been imposed just one week earlier. This impact of earlier intervention is illustrated in Figure 2.

Estimated affected areas of COVID19 in China

Figure 2: Visualisations of estimated affected areas of COVID-19 in mainland China on 29 February 2020, without inter-city travel restrictions (left) and with interventions imposed one week earlier than the actual timing (right), available on WorldPop.

Moreover, when China alerted the World Health Organization to cases of pneumonia-like symptoms of unknown origin, only a small number of countries took immediate action. These included South Korea, Taiwan, Singapore and China itself, all of which had experience with the SARS outbreak in 2003. In contrast, most Western authorities felt less urgency to rapidly impose containment measures, and only acted once the disease had reached their countries. This delayed response is not necessarily surprising, as governments may have trusted that the virus would be successfully contained within Asia, as happened with SARS in 2003. Furthermore, authorities have to strike a balance between responding in a timely manner and avoiding a response that their scientific experts and citizens would consider an overreaction.

Thus, when considering the question of whether we could have used our data better, it seems that we missed opportunities in two areas. First, evidence already existed that epidemics and pandemics were likely to occur in the near future, yet insufficient investments were made to prevent these viral outbreaks from spreading worldwide. Second, once the outbreak was identified, a timely response was crucial for containment, but the right contingency plans and agility were not in place.


What can we learn to fight, or even prevent, the next epidemic or pandemic?

As illustrated by the COVID-19 emergency, improvements in data technology enable researchers and scientists to assess the risks associated with epidemics and pandemics accurately, and can help governmental institutions to establish a timely response. In that regard, our abilities have improved tremendously in comparison to the Spanish flu pandemic.

Nonetheless, we identify two areas for further improvement:

1) data openness can help further accelerate successful epidemic and pandemic prevention and combatting, and

2) we must use the available data effectively.

Early and open sharing of data such as publications, datasets, software, code and other scientific material plays an essential role in enabling a quick response to epidemics or pandemics. When considering COVID-19, the contours of increased open data become visible, as seen in the growth of open access publishing, online availability of the COVID-19 genome, and the number of open data initiatives towards combatting COVID-19 collaboratively.

Nonetheless, a significant amount of relevant data is not freely accessible yet. Examples include articles about the availability of medical devices and about medical knowledge, for instance the effects of additional administration of oxygen during assisted respiration. Consequently, life-saving information may not reach the wider public, and this lack of openness and unity inhibits our ability to respond most effectively to COVID-19.

In June 2019, the European Union’s Directive on open data and the re-use of Public Sector Information, also known as the ‘Open Data Directive’, was adopted. With this Directive, Europe aims to encourage Member States to facilitate the re-use of public sector data, including publicly funded research data, with minimal or no legal, technical and financial constraints. In addition, the Directive has started the process by which high-value datasets will be made available for re-use. Member States will have to transpose the Directive into national law by the summer of 2021. It is interesting to imagine how much more data on COVID-19 would have been available today if the Directive had already been transposed and implemented in Member States.

Even if we had all the desired data openly available, the second challenge is how to ensure its effective use. First of all, benefits can be gained if researchers and scientists become more persistent and louder in disseminating their findings, even if such findings predict unpopular events such as viral outbreaks. Experts have a responsibility to do so, as they can save a great number of human lives and protect the world’s economy from significant damage. However, achieving this also requires a cultural change, in which we do not consider data as an alien threat, but rather embrace it as a valuable source of information for evidence-based politics and decision-making.

Secondly, AI can play a valuable role in helping scientists and researchers make effective use of data and in helping our politicians make decisions, though it will not replace democracy or human discourse in general. Ultimately, decisions need to be made by humans, who can incorporate ethics and values in a way that an AI cannot.

Finally, data can be used to communicate effectively. Data visualisations can illustrate phenomena that are not naturally intuitive for the human mind, such as slow transformation and long-term effects of incidents, and make them tangible. For example, it is nearly impossible to observe climate change from one day to another. However, we can use environmental data to visualise climate change over the decades, which makes the change of the climate undeniable. Data visualisations can succeed where words fail, and create the necessary sense of urgency to take action. In the case of epidemics and pandemics, visualisation of data can help accelerate the outbreak response. For instance, in the future a simple visual comparison of data on the infection rate and spread of a novel virus outbreak with data we now have on COVID-19 could lead to a more rapid intervention.

It is therefore imperative that we carefully preserve the data we are collecting on COVID-19 and other epidemics and pandemics. The better the records, the stronger our ability to act effectively upon the next emergency.


Disclaimer: Whether about COVID-19 or any other topic, data may not always be trustworthy or reflect the situation accurately. For example, data collection is easily biased by the assumptions of the individuals who collect the data, and by their perspective of the world. Therefore, accurate visualisations and interpretations of data are key, as well as a critical review of trustworthiness. Furthermore, constraints exist in the data used for the data visualisation on vaccine development and the actual costs of the SARS and COVID-19 outbreaks (Figure 1), as these are preliminary estimates.


Looking for more open COVID-19 related datasets or initiatives? Visit the EDP for COVID-19 curated lists and follow us on Twitter, Facebook or LinkedIn.


[1] Exchange rates as of 01 January 2020

[2] Exchange rates as of 01 January 2020