
ML Model Estimates Weekly Opioid Overdose Deaths Using Proxy Data

New analysis shows that a machine learning model can estimate national opioid overdose mortality trends in near real-time using proxy data sources.

Researchers have developed a machine learning (ML) model capable of estimating national weekly opioid overdose mortality trends in near real-time using proxy data sources such as public health information and law enforcement data.

Forecasting opioid overdose deaths is a major component of efforts to combat the opioid crisis, but issues around overdose data prevent public health officials from doing so effectively. According to the study, national data on opioid overdose deaths are often delayed by several months or more, seriously limiting their usability.

Earlier in 2022, researchers from the National Institute on Drug Abuse at the National Institutes of Health (NIH) argued that these data lags are such a large issue that they force public health officials to fight the opioid epidemic “blindfolded.”

Further, the researchers stated that the only way to effectively address the opioid crisis is to use real-time, disaggregated data to identify which groups of individuals are most at risk and to use that information to target prevention and treatment at the local level.

However, real-time data access is a challenge because it relies on multiple processes that can take weeks or months to complete, such as toxicology testing, medicolegal investigations, and death certifications. Because of this, public health officials typically rely on data from previous years, and interest in the potential use of proxy data sources for mortality estimation has grown.

In this study, the researchers sought to evaluate whether a model using health, law enforcement, and online data proxies could accurately estimate weekly opioid overdose deaths. Data sources included time series data from 2014 to 2019 on emergency department visits for opioid overdose from the National Syndromic Surveillance Program, data on the volume of heroin and synthetic opioids circulating in illicit markets from the National Forensic Laboratory Information System, and data on Google search volume for heroin and synthetic opioids.

Data on the volume of Twitter and Reddit posts about heroin and synthetic opioids were also used to train and validate prediction models of opioid overdose deaths. Weekly predictions of opioid overdose mortality were made for 2018 and 2019. Results were compared with those from a baseline seasonal autoregressive integrated moving average (SARIMA) model, one of the most common approaches used to forecast mortality in this area.

Overall, the statistical models had an error rate of approximately one percent. For 2018, the ML model’s weekly estimates were off by an average of 60.3 overdose deaths, compared with 310.2 overdose deaths for the SARIMA model. For 2019, the ML model’s average error was 67.2 overdose deaths, compared with 83.3 overdose deaths for the SARIMA model.
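To make the comparison concrete, here is a minimal sketch of how an average weekly error and a percent error could be computed for two competing models. It assumes the "average error" is a mean absolute error, which the article does not confirm. The function names and the weekly counts are hypothetical and invented for illustration only; they are not the study's data or code.

```python
from statistics import mean


def mean_absolute_error(observed, estimated):
    """Average absolute gap between observed and estimated weekly counts."""
    if len(observed) != len(estimated):
        raise ValueError("series must cover the same number of weeks")
    return mean(abs(o - e) for o, e in zip(observed, estimated))


def percent_error(observed, estimated):
    """Mean absolute error expressed as a percentage of the mean observed count."""
    return 100 * mean_absolute_error(observed, estimated) / mean(observed)


# Hypothetical weekly death counts (NOT from the study).
observed = [900, 950, 1000, 1050]
model_a = [910, 940, 1020, 1040]   # stand-in for a proxy-data ML model
model_b = [850, 1000, 950, 1100]   # stand-in for a baseline model

print(mean_absolute_error(observed, model_a))  # 12.5
print(mean_absolute_error(observed, model_b))  # 50
print(round(percent_error(observed, model_a), 2))  # 1.28
```

Applied to the figures reported above, the same kind of comparison puts the SARIMA model's average error at roughly five times the ML model's for 2018 (310.2 / 60.3 ≈ 5.1) and roughly 1.2 times for 2019 (83.3 / 67.2 ≈ 1.2).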

These findings suggest that proxy administrative data sources can be used to successfully estimate national opioid overdose mortality trends, which could provide a more timely understanding of the opioid crisis, the study authors concluded. However, they also noted that proxy data sources, such as the ones used in this study, have significant limitations and that further validation of this research is needed.
