
COVID-19 Mobility Data Can Exclude Older and Non-White People

Mobility data captured during COVID-19 is less likely to include older and non-white voters, potentially exacerbating disparities in underserved groups.

Smartphone-based mobility data used to respond to COVID-19 can leave out older and minority voters, which could lead jurisdictions to under-allocate important health resources to underserved populations.

That’s the principal finding of a study published in the Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, a publication of the Association for Computing Machinery.

Throughout the pandemic, researchers and public health officials have widely adopted anonymized smartphone-based mobility data to design and evaluate COVID-19 response strategies. Leaders can use this information to analyze the effectiveness of social distancing measures, determine how people’s travel impacts virus transmission, and understand how social distancing has affected different sectors of the economy.

However, there has been limited assessment of the reliability of this data, partly because of the lack of documentation about which users are represented.

Researchers from Carnegie Mellon University (CMU) and Stanford University sought to conduct the first independent audit of demographic bias of a smartphone-based mobility dataset used in response to COVID-19.

The team assessed the validity of SafeGraph data, a widely used mobility dataset containing information from approximately 47 million mobile devices in the US. The data comes from mobile applications, such as navigation, weather, and social media apps where users have opted in to location tracking.

When the pandemic began, SafeGraph released much of its data for free as part of the COVID-19 Data Consortium to help researchers, nonprofits, and governments gain insight and develop responses. As a result, SafeGraph’s mobility data has been widely used in pandemic research and to inform public health orders and guidelines issued by governors’ offices, large cities, and counties.

While SafeGraph has reported publicly on the representativeness of its data, researchers suggest that because the company’s analysis examined demographic bias only at census-aggregated levels and did not address demographic bias for inferences specific to places of interest, an independent audit was necessary.

A major challenge in conducting the audit was the lack of demographic information, as SafeGraph data doesn’t contain attributes such as age and race. In the study, the team showed how administrative data can provide the demographic information needed for a bias audit, supplementing the information provided by SafeGraph.

Researchers used North Carolina voter registration and turnout records, which typically include information on gender, age, and race, as well as voters’ travel to a polling location on Election Day.

The data came from a private voter file vendor that combines publicly available voter records. In all, the study included 539,000 voters from North Carolina who voted at 558 locations during the 2018 general election.

Researchers identified a sampling bias in the SafeGraph data that underrepresents two high-risk groups: older and minority voters. This gap is particularly concerning in the context of the COVID-19 pandemic, as jurisdictions may under-allocate pop-up testing sites, masks, and other important health resources to vulnerable communities.

"Older age is a major risk factor for COVID-19-related mortality, and African-American, Native-American, and Latinx communities bear a disproportionately high burden of COVID-19 cases and deaths," said Amanda Coston, a doctoral student at CMU's Heinz College and Machine Learning Department, who led the study as a summer research fellow at Stanford University's Regulation, Evaluation, and Governance Lab.

"If these demographic groups are not well represented in data that are used to inform policymaking, we risk enacting policies that fail to help those at greatest risk and further exacerbating serious disparities in the health care response to the pandemic."

Researchers urged leaders to consider other sources of information when determining how to allocate health resources. Additionally, the team said more work is needed to determine how mobility data can be made more representative. Leaders should ask firms that provide this kind of data to be more transparent about where their data comes from, such as by identifying which smartphone applications were used to collect the information.

“While SafeGraph information may help people make policy decisions, auxiliary information, including prior knowledge about local populations, should also be used to make policy decisions about allocating resources,” said Alexandra Chouldechova, assistant professor of statistics and public policy at CMU, who coauthored the study.

The study was limited in that voters in the US tend to be older and are more likely to be white than the general population, so the study’s results may underestimate the sampling bias in the general population.

Moreover, because SafeGraph provides researchers with an aggregated version of the data for privacy reasons, the team could not test for bias at the individual voter level. Instead, researchers tested for bias at physical places of interest, finding evidence that SafeGraph is more likely to capture traffic to places frequented by younger, mostly white visitors.
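The place-level test described above can be illustrated with a short sketch. The article does not say which statistic the researchers used, so this is only one plausible approach under stated assumptions: for each polling place, compute the share of actual voters that show up in the mobility data, then check whether that coverage falls as the share of older voters rises, using a rank correlation. All field names and numbers below are invented for illustration and are not taken from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical polling places: true voter counts (from voter records),
# device visits seen in the mobility data, and share of voters over 65.
places = [
    {"voters": 1000, "device_visits": 90, "share_over_65": 0.10},
    {"voters": 800, "device_visits": 60, "share_over_65": 0.20},
    {"voters": 1200, "device_visits": 70, "share_over_65": 0.30},
    {"voters": 900, "device_visits": 40, "share_over_65": 0.40},
]

# Coverage: fraction of real visitors captured by the mobility data.
coverage = [p["device_visits"] / p["voters"] for p in places]
older = [p["share_over_65"] for p in places]

# A negative value means places with more older voters are covered less well.
rho = spearman(older, coverage)
print(f"rank correlation between older share and coverage: {rho:.2f}")
```

In this toy data, coverage drops steadily as the share of older voters increases, so the rank correlation is strongly negative. A real audit would also need to account for confounders such as polling place size and location.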

Despite these limitations, the team noted that this study shows how administrative data can be used to overcome the lack of demographic information – a common obstacle in conducting bias audits. 
