Natural Language Processing Tracks Public Interest in COVID-19 Topics

Using natural language processing, researchers can monitor popular online comment boards and better understand public concerns during the COVID-19 pandemic.

A team from Penn Medicine has shown that public health officials can use natural language processing techniques to track surges in interest in COVID-19 topics on online forums, such as Reddit.

These insights could help inform leaders’ understanding of public health concerns and priorities, as well as mitigate the spread of misinformation.

“Public health priorities do not always align with community priorities, and the success of public health efforts often depends on having a plan to address community concerns,” said Daniel Stokes, a research fellow with the Center for Emergency Care Policy and the Center for Digital Health at Penn Medicine. “Having a source like Reddit that is directly tied to people’s thoughts could prove invaluable in crafting plans that meet people where they are.”

For the study, published in the Journal of General Internal Medicine, researchers chose to evaluate discussions on Reddit, the 19th most popular site in the world. Reddit is also relatively unfiltered and up-to-date, making it an optimal online forum for the study.

“Real-time analysis of public response could lead to earlier recognition of changing public priorities, fluctuations in wellness, and uptake of public health measures, all of which carry implications for individual- and population-level health,” the team stated.

For example, researchers noted that real-time monitoring of Reddit could have enabled a faster response to a surge of questions in mid-March about whether it was safe to go outside. The CDC did not issue official guidelines for safely going to parks or conducting outdoor activities until much later.

If public health officials had monitored online discussion activity, the guidance could have been issued closer to the peak of interest, the researchers said.

“The CDC didn’t make their recommendations on wearing masks in public until early April, so it is interesting to see that masks were being discussed a great deal prior to that recommendation,” Stokes said. “Perhaps it was a sign that many people were ready for these guidelines earlier.”

Reddit is also a place where misinformation about COVID-19 has spread. Examples include one poster’s belief that a natural remedy like licorice root could prevent COVID-19 infection, and another poster’s claim that the virus was human-engineered.

To identify surges of public interest, researchers collected nearly 95,000 posts from March 3 through March 31, 2020, on r/Coronavirus, the most popular COVID-19 subreddit on Reddit.

The team identified 50 different discussion topics using natural language processing. Ten of those topics were determined to be most related to three areas of interest in the study: response to public health measures, the sense of the pandemic’s severity, and the pandemic’s impact on daily life.

By tracking the popularity of these topics day by day, the team was able to demonstrate how areas of interest ebbed and flowed. For example, the results showed that discussion of hand-washing peaked early on, between March 3 and 6. Researchers also found that users discussed concerns about personal finances roughly 50 percent more at the end of March than at the beginning of the month.
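The day-by-day tracking described above can be illustrated with a minimal Python sketch. It assumes each post has already been assigned a single topic label; the article does not describe the study's topic model, so that step is left out. The post counts, topic labels, and function names below are all hypothetical and chosen only to show the arithmetic behind a statement like "roughly 50 percent more."

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical data: one (post date, topic label) pair per post.
# Real labels would come from a topic model; these counts are invented.
posts = (
    [(date(2020, 3, 3), "hand-washing")] * 5
    + [(date(2020, 3, 3), "personal finances")] * 2
    + [(date(2020, 3, 3), "masks")] * 3
    + [(date(2020, 3, 31), "hand-washing")] * 1
    + [(date(2020, 3, 31), "personal finances")] * 3
    + [(date(2020, 3, 31), "masks")] * 6
)


def daily_topic_share(posts):
    """Return {day: {topic: fraction of that day's posts}}."""
    counts = defaultdict(Counter)
    for day, topic in posts:
        counts[day][topic] += 1
    shares = {}
    for day, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[day] = {t: n / total for t, n in topic_counts.items()}
    return shares


def relative_change(shares, topic, start, end):
    """Relative change in a topic's share of posts between two days."""
    before = shares[start].get(topic, 0.0)
    after = shares[end].get(topic, 0.0)
    if before == 0:
        raise ValueError("topic has no posts on the start day")
    return (after - before) / before


shares = daily_topic_share(posts)
change = relative_change(
    shares, "personal finances", date(2020, 3, 3), date(2020, 3, 31)
)
print(f"Change in share of personal-finance posts: {change:.0%}")
```

Using each topic's share of a day's posts rather than its raw count keeps the comparison meaningful when total posting volume differs from day to day.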

The analysis also showed that some topics that were popular at the start of the month either stayed top of mind or made a comeback later in the month, as was the case for mask-wearing.

Going forward, the team will continue tracking and analyzing posts on this COVID-19-specific thread.

“This analysis indicates that longitudinal topic modeling of Reddit content is effective in identifying patterns of public dialogue and could be used to guide targeted interventions. For instance, comparisons to the flu were embraced by the public,” the team said.

“Early recognition of this reality could have led to more specific information dissemination campaigns and earlier public acknowledgement of disease severity.”

In addition to this study, a team from Penn’s Center for Digital Health is collecting similar data through Twitter and mapping it across the US.

“We are aiming to incorporate input from several digital sources that would allow us to not just track the public’s sentiment and perception of the virus, but also track, in real time, the emergence of new outbreaks,” said Raina Merchant, MD, an associate professor of Emergency Medicine and senior author of the Journal of General Internal Medicine study.

Researchers on the Reddit study believe this natural language processing approach can help public health officials combat the spread of misinformation during the COVID-19 pandemic.

“The success of our public health efforts depends on public buy-in,” Stokes said. “Early comparisons to the flu on Reddit may have indicated a gap in public understanding of pandemic severity. Recognizing such gaps can be useful in developing targeted campaigns to close them.”
