Machine Learning Tracks Evolution of COVID-19 Misinformation

A new machine learning tool could help public health officials prevent the spread of COVID-19 misinformation.

Jessica Kent

Published: 21 Apr 2021

A machine learning algorithm could help public health officials identify COVID-19-related conspiracy theories on social media, potentially reducing the spread of misinformation online, a study published in JMIR revealed.

“A lot of machine-learning studies related to misinformation on social media focus on identifying different kinds of conspiracy theories,” said Courtney Shelley, a postdoctoral researcher in the Information Systems and Modeling Group at Los Alamos National Laboratory and co-author of the study.

“Instead, we wanted to create a more cohesive understanding of how misinformation changes as it spreads. Because people tend to believe the first message they encounter, public health officials could someday monitor which conspiracy theories are gaining traction on social media and craft factual public information campaigns to preempt widespread acceptance of falsehoods.”

Researchers used publicly available, anonymized Twitter data to characterize four COVID-19 conspiracy theory themes and provide context for each through the first five months of the pandemic.

The four theories examined were that 5G cell towers spread the virus, that the Bill and Melinda Gates Foundation engineered COVID-19, that the virus was developed in a laboratory, and that COVID-19 vaccines would be dangerous.

“We began with a dataset of approximately 1.8 million tweets that contained COVID-19 keywords or were from health-related Twitter accounts,” said Dax Gerts, a computer scientist also in Los Alamos’ Information Systems and Modeling Group and a co-author of the study.

“From this body of data, we identified subsets that matched the four conspiracy theories using pattern filtering, and hand-labeled several hundred tweets in each conspiracy theory category to construct training sets.”
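The pattern-filtering step can be pictured with a minimal Python sketch. The article does not list the study's actual filter terms, so the theme names and regular expressions in `THEME_PATTERNS`, along with the helpers `match_themes` and `build_subsets`, are hypothetical. A tweet can match several themes, and matches are only candidates: as Gerts describes, hand labeling is still needed to decide which tweets actually promote a conspiracy theory.

```python
import re

# Hypothetical filter patterns, one per theme; the study's real terms are not given.
THEME_PATTERNS = {
    "5g": re.compile(r"\b5g\b|cell towers?", re.IGNORECASE),
    "gates": re.compile(r"\bbill gates\b|gates foundation", re.IGNORECASE),
    "lab_origin": re.compile(r"\blab(oratory)?\b|bioweapon", re.IGNORECASE),
    "vaccine": re.compile(r"\bvaccin\w*", re.IGNORECASE),
}


def match_themes(tweet):
    """Return the set of themes whose pattern occurs anywhere in the tweet."""
    return {theme for theme, pattern in THEME_PATTERNS.items() if pattern.search(tweet)}


def build_subsets(tweets):
    """Group tweets into candidate subsets per theme; one tweet may join several."""
    subsets = {theme: [] for theme in THEME_PATTERNS}
    for tweet in tweets:
        for theme in match_themes(tweet):
            subsets[theme].append(tweet)
    return subsets


print(sorted(match_themes("Bill Gates wants a vaccine for everyone")))  # ['gates', 'vaccine']
```

Keyword filters like these trade precision for recall: they cheaply shrink a large dataset to candidate tweets, but a tweet debunking the 5G theory matches just as readily as one promoting it.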

Using data collected on each of the four theories, researchers built machine learning models that categorized tweets as COVID-19 misinformation or not.
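The classification step can be illustrated with a small supervised classifier. The article does not say which model the researchers trained; the sketch below assumes a multinomial naive Bayes classifier with add-one smoothing, written from scratch in Python. The class name `NaiveBayesTweetClassifier` is hypothetical, and its four training tweets and labels are invented for illustration rather than drawn from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; real pipelines also handle URLs, mentions, and hashtags.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTweetClassifier:
    """Multinomial naive Bayes with add-one smoothing (an illustrative stand-in)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[c] / n_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][word] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical hand-labeled examples for the 5G theory (not from the study).
train_texts = [
    "5g towers are spreading the virus wake up",
    "they are hiding the truth about 5g radiation and covid",
    "no evidence links 5g networks to covid says who",
    "fact check 5g does not spread the coronavirus",
]
train_labels = ["misinformation", "misinformation", "not_misinformation", "not_misinformation"]

clf = NaiveBayesTweetClassifier().fit(train_texts, train_labels)
print(clf.predict("wake up 5g radiation is spreading the virus"))  # misinformation
```

With only a few hundred hand-labeled tweets per theory, a lightweight model like this is a reasonable baseline, though the study's own choice of model is not described here.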

“This allowed us to observe the way individuals talk about these conspiracy theories on social media, and observe changes over time,” said Gerts.

The team found that misinformation tweets express more negative sentiment than factual tweets do. Additionally, conspiracy theories evolve over time, integrating details from unrelated conspiracy theories as well as real-world events.
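One simple way to compare sentiment between groups of tweets is a lexicon-based score. This is an assumption for illustration, not the study's sentiment method: the `LEXICON` below is a tiny hypothetical word list, the example tweets are invented, and real analyses rely on much larger validated lexicons or trained models.

```python
import re
from statistics import mean

# Tiny hypothetical lexicon mapping words to polarity scores (negative < 0 < positive).
LEXICON = {
    "lie": -2, "lies": -2, "evil": -3, "poison": -3, "scam": -2, "fear": -2,
    "safe": 2, "effective": 2, "protect": 2, "thanks": 2, "good": 1,
}


def sentiment(text):
    """Sum the lexicon scores of a tweet's tokens, normalized by its token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(LEXICON.get(t, 0) for t in tokens) / len(tokens)


def mean_sentiment(tweets):
    """Average per-tweet sentiment over a non-empty group of tweets."""
    return mean(sentiment(t) for t in tweets)


# Invented example groups, not study data.
misinfo = ["the vaccine is poison and a scam", "evil lies from the media"]
factual = ["the vaccine is safe and effective", "masks protect others"]
print(mean_sentiment(misinfo) < mean_sentiment(factual))  # True
```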

“This theory evolution will likely necessitate public health messaging, which also evolves to address a changing landscape,” the researchers said.

“Our work demonstrates that off-the-shelf methods can be combined to track conspiracy theories, both in the moment and through time, to provide public health professionals with better insight into when and how to address health-related conspiracy theories. These same methods can also track public reaction to messaging to assess its impact.”

The study also found that public health officials could use a supervised learning technique to automatically identify conspiracy theories and an unsupervised learning method to explore changes in word importance among topics within each theory.
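The article does not name the unsupervised method used to track word importance. As a deliberately simple stand-in (not a topic model, and not the study's approach), the sketch below weights terms by TF-IDF, treating each time period's tweets about one theory as a single document, so words that recur in every period score zero and period-specific vocabulary rises to the top. The `top_terms_by_period` helper and the example tweets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def top_terms_by_period(tweets_by_period, k=3):
    """Return the k highest TF-IDF terms per period, treating each period as one document.

    Words that appear in every period get an IDF of log(1) = 0 and are dropped,
    so the ranking surfaces vocabulary that is distinctive to a period.
    """
    counts = {
        period: Counter(w for tweet in tweets for w in tokenize(tweet))
        for period, tweets in tweets_by_period.items()
    }
    n_periods = len(counts)
    # Number of periods in which each word occurs at least once.
    doc_freq = Counter(w for c in counts.values() for w in c)
    result = {}
    for period, c in counts.items():
        total = sum(c.values())
        scores = {w: (n / total) * math.log(n_periods / doc_freq[w]) for w, n in c.items()}
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        result[period] = [w for w, s in ranked if s > 0][:k]
    return result


# Invented lab-origin tweets from two months, not study data.
tweets = {
    "2020-01": ["lab in wuhan made the virus", "the virus came from a wuhan lab"],
    "2020-04": ["the virus is a bioweapon says patent", "gates patent proves the virus was planned"],
}
print(top_terms_by_period(tweets, k=2))
# {'2020-01': ['lab', 'wuhan'], '2020-04': ['patent', 'bioweapon']}
```

In this toy example the later period picks up "gates" and "patent", mirroring the kind of cross-theory borrowing the researchers describe; a topic model would additionally separate each period's tweets into distinct themes.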

“It’s important for public health officials to know how conspiracy theories are evolving and gaining traction over time,” said Shelley.

“If not, they run the risk of inadvertently publicizing conspiracy theories that might otherwise ‘die on the vine.’ So, knowing how conspiracy theories are changing and perhaps incorporating other theories or real-world events is important when strategizing how to counter them with factual public information campaigns.”

Researchers have previously turned to AI technologies to help track people’s interest in topics related to COVID-19. A study published in May 2020 showed that public health leaders could use natural language processing to track surges in interest in COVID-19 topics on online forums like Reddit.

“Public health priorities do not always align with community priorities, and the success of public health efforts often depends on having a plan to address community concerns,” said Daniel Stokes, a research fellow with the Center for Emergency Care Policy and the Center for Digital Health at Penn Medicine.

“Having a source like Reddit that is directly tied to people’s thoughts could prove invaluable in crafting plans that meet people where they are.”
