Washington, DC - Big data derived from electronic health records, social media, the internet and other digital sources have the potential to provide more timely and detailed information on infectious disease threats or outbreaks than traditional surveillance methods. A team of scientists led by the National Institutes of Health reviewed the growing body of research on the subject and has published its analyses in a special issue of The Journal of Infectious Diseases.

Traditional infectious disease surveillance - typically based on laboratory tests and other data collected by public health institutions - is the gold standard. But the authors note that it can have time lags, is expensive to produce, and typically lacks the local resolution needed for accurate monitoring. Further, it can be cost-prohibitive in low-income countries. In contrast, big data streams from internet queries, for example, are available in real time and can track disease activity locally, but have their own biases. Hybrid tools that combine traditional surveillance and big data sets may provide a way forward, the scientists suggest, serving to complement, rather than replace, existing methods.

“The ultimate goal is to be able to forecast the size, peak or trajectory of an outbreak weeks or months in advance in order to better respond to infectious disease threats. Integrating big data in surveillance is a first step toward this long-term goal,” says Cecile Viboud, Ph.D., co-editor of the supplement and a senior scientist at the NIH’s Fogarty International Center. “Now that we have demonstrated proof of concept by comparing data sets in high-income countries, we can examine these models in low-resource settings where traditional surveillance is sparse.”

Experts in epidemiology, computer science and modeling collaborated on the supplement’s 10 articles. They report on the opportunities and challenges associated with three types of data: medical encounter files, such as records from healthcare facilities and insurance claim forms; crowdsourced data collected from volunteers who self-report symptoms in near real time; and data generated by the use of social media, the internet and mobile phones, which may include self-reporting of health, behavior and travel information to help elucidate disease transmission.

But big data’s potential must be tempered with caution, the authors say. Non-traditional data streams may lack key demographic identifiers such as age and sex, or provide information that underrepresents infants, children, the elderly and developing countries. Social media outlets may not be stable sources of data, as they can disappear if there is a loss of interest or financing. Most importantly, any novel data stream must be validated against established infectious disease surveillance data and systems, the authors say.

Each article features a promising example of the use of big data to monitor and model infectious disease activity.

While the new hybrid models that combine traditional and digital disease surveillance methods show promise, the scientists agree there is still an overall scarcity of reliable surveillance information, especially compared to other fields such as climatology, where the data sets are huge. “To be able to produce accurate forecasts, we need better observational data that we just don’t have in infectious diseases,” notes Professor Shweta Bansal of Georgetown University, a co-editor of the supplement. “There’s a magnitude of difference between what we need and what we have, so our hope is that big data will help us fill this gap.”

Multi-disciplinary initiatives such as the NIH-led Big Data to Knowledge program will be instrumental in expanding the use of big data in research, as noted in the supplement.

The publication’s authors include scientists affiliated with Fogarty’s Research and Policy for Infectious Diseases program (RAPIDD), grantees from NIH’s National Institute of General Medical Sciences, and researchers from nearly 20 universities throughout North America and Europe. The supplement was produced with support from Georgia State University, the Fogarty International Center, Northeastern University and Georgetown University.