COVID-19 Data Visualizations Shared on Twitter May Lack Reliability
Several common issues arise when Twitter users develop and share COVID-19 data visualizations, from unlabeled data sources that breed mistrust to framing that reflects the creator's biases.
The majority of COVID-19 data visualizations shared by average Twitter users contained at least one of five common errors that reduced their accuracy or reliability, according to a study published in Informatics.
During the COVID-19 pandemic, people have taken to Twitter to share charts from news outlets and government organizations, as well as to develop and post their own casual data visualizations about COVID-19.
Casual data visualizations refer to charts and graphs that rely on tools readily available to average users in order to depict information in a meaningful way. These visualizations differ from traditional data visualizations because they aren't developed or distributed by trusted health information sources, such as the CDC, the World Health Organization, or the media.
"Experts have not yet begun to explore the world of casual visualizations on Twitter," said Francesco Cafaro, an assistant professor in the School of Informatics and Computing, who led the study. "Studying the new ways people are sharing information online to understand the pandemic and its effect on their lives is an important step in navigating these uncharted waters."
Because more and more people are relying on their own data analyses to inform their personal choices about reopening businesses or going out to restaurants, effectively designing and interpreting these data visualizations is especially critical.
"The reality is that people depend upon these visualizations to make major decisions about their lives: whether or not it's safe to send their kids back to school, whether or not it's safe to take a vacation, and where to go," Cafaro said.
"Given their influence, we felt it was important to understand more about them, and to identify common issues that can cause people creating or viewing them to misinterpret data, often unintentionally."
Researchers identified 5,409 data visualizations shared on Twitter between April 14 and May 9, 2020. Of these, the team randomly selected 540 for analysis, reserving full statistical analysis for the 435 visualizations that met additional criteria. Of those, 112 data visualizations were made by average citizens.
The results showed that more than half of the analyzed visualizations from average users contained one of five common errors that reduced their clarity, accuracy, or trustworthiness.
More than 25 percent of the analyzed posts failed to clearly identify the source of their data, leading to mistrust of the information. Researchers found that data sources were often obscured because of bad color choices, unclear layouts, or typos.
To overcome this problem, the team recommended clearly labeling data sources and placing this information on the graphic itself rather than in the accompanying text.
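As a rough illustration of that advice (a sketch with invented numbers, not the authors' code), a chart built with a common tool such as matplotlib can carry its source line inside the image itself, so the attribution survives even when the picture is retweeted without its caption:

```python
import matplotlib.pyplot as plt

# Hypothetical daily case counts, for illustration only -- not real data.
days = list(range(1, 8))
cases = [120, 135, 150, 180, 210, 190, 220]

fig, ax = plt.subplots()
ax.plot(days, cases, marker="o")
ax.set_xlabel("Day")
ax.set_ylabel("Reported cases")
ax.set_title("Daily reported COVID-19 cases (illustrative data)")

# Put the data source on the graphic itself, not just in the tweet text,
# so it stays attached when the image is shared on its own.
fig.text(0.01, 0.01, "Source: <state health department>, retrieved 2020-05-01",
         fontsize=8, ha="left")

fig.savefig("daily_cases.png", dpi=150)
```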
Eleven percent of posts displayed issues related to proportional reasoning, which refers to users’ ability to compare variables based on ratios or fractions. For example, understanding infection rates across different geographical locations is a challenge of proportional reasoning.
The authors suggested that users rely on metrics such as the number of infections per 1,000 people to compare regions with disparate populations, because this measure is easier to understand than absolute numbers or percentages.
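The arithmetic behind that recommendation is simple: divide infections by population and scale by 1,000. A quick sketch, using county names and figures invented purely for illustration:

```python
# Illustrative figures only -- the populations and case counts are made up.
regions = {
    "County A": {"population": 50_000, "infections": 400},
    "County B": {"population": 1_200_000, "infections": 3_600},
}

for name, r in regions.items():
    per_1000 = r["infections"] / r["population"] * 1000
    print(f"{name}: {per_1000:.1f} infections per 1,000 residents")

# County A (8.0 per 1,000) has a higher rate than County B (3.0 per 1,000),
# even though County B reports far more infections in absolute terms.
```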
Researchers also identified seven percent of posts with issues of temporal reasoning, which refers to users' ability to understand change over time. This included visualizations that compared the number of deaths from flu in a full year to the number of COVID-19 deaths in only a few months.
Addressing this issue will require breaking metrics that rely on different time scales into separate charts, researchers said, as opposed to displaying the data in a single chart.
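One way to follow that advice (an illustrative sketch with invented monthly counts, not the authors' method) is to give each time scale its own axes instead of overlaying the two series on one chart:

```python
import matplotlib.pyplot as plt

# Invented monthly death counts, for illustration only.
flu_deaths_full_year = [3, 5, 9, 14, 12, 8, 4, 2, 1, 1, 2, 3]   # 12 months
covid_deaths_partial = [0, 2, 45, 120]                           # 4 months of data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.bar(range(1, 13), flu_deaths_full_year)
ax1.set_title("Flu deaths, full season (12 months)")
ax1.set_xlabel("Month")

ax2.bar(range(1, 5), covid_deaths_partial)
ax2.set_title("COVID-19 deaths, first 4 months of data")
ax2.set_xlabel("Month")

# Keeping the two series on separate axes makes the mismatched time spans
# explicit instead of inviting a direct, misleading comparison.
fig.tight_layout()
fig.savefig("separate_time_scales.png", dpi=150)
```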
In addition to these challenges, the team noted that a small percentage of posts contained text that seemed to encourage users to misinterpret data based on the creator’s biases related to race, country, and immigration. The group stated that information should be presented with clear, objective descriptions carefully separated from any accompanying political commentary.
Two percent of data visualizations were also based on misunderstandings of the coronavirus, such as the use of data related to SARS or the flu.
Moreover, researchers found that certain types of data visualizations performed best on social media. Data visualizations that showed change over time, including line or bar graphs, were most commonly shared. Users also engaged more frequently with charts conveying numbers of deaths as opposed to numbers of infections or impact on the economy.
This finding suggests that people were more interested in the virus's impact on mortality than in its influence on other health metrics or societal effects.
"The challenge of accurately conveying information visually is not limited to information sharing on Twitter, but we feel these communications should be considered especially carefully given the influence of social media on people's decision-making," Cafaro said.
"We believe our findings can help government agencies, news media and average citizens better understanding the types of information about which people care the most, as well as the challenges people may face while interpreting visual information related to the pandemic."
Social media has featured largely in the COVID-19 pandemic. Researchers have continually analyzed these networks to better understand the public’s interest in COVID-19 topics, how this information influences people’s decisions, and how much of the information being shared is actually accurate.
As the pandemic wears on, social media trends will continue to inform public health policies and research efforts.