Natural Language Processing Accurately Detects HPV-Related Cancers

A natural language processing algorithm was able to extract information from pathology reports to accurately identify patients with HPV-related cancers.

Natural language processing tools can analyze pathology reports and identify individuals with HPV-related cancers, improving disease surveillance and treatment, according to a study published in JMIR Medical Informatics.

Infection with human papillomavirus (HPV) can lead to precancerous anogenital lesions as well as invasive cancer. Researchers noted that in the US, approximately 25,000 cases of anogenital cancer are diagnosed every year, with cervical and anal cancer making up 75 percent of them. Over 90 percent of these cases are attributed to infection with HPV types that HPV vaccines can prevent.

“Although HPV vaccines have high proven efficacy, the way we use these vaccines to prevent HPV cancers is still in need of improvement. Accurate identification and tracking of new cases of HPV cancers is an important step toward the development of strategies that optimize the use of HPV vaccines,” the research team stated.

Surveillance for HPV-associated outcomes is critical for monitoring and improving immunization programs, but tracking HPV cancers is a challenge. Most of the clinical data needed to diagnose a patient with an HPV-related cancer is stored in pathology reports, which are often written in a narrative format and can contain nondiagnostic information such as medical history.

“Although a manual review of these free-text pathology reports is the most accurate case-finding method, it is a laborious process that can become too impractical for large-scale surveillance projects,” the group said.

“To facilitate data capture and analysis, considerable efforts have been made to promote processes that encourage pathologists to document their findings in a specific format and using standardized terminology. However, most efforts to incorporate standardized reporting have yet to be consistently implemented by health care providers and institutions.”

The team set out to develop a natural language processing tool to better extract information from pathology reports, leading to an accurate and scalable surveillance platform for HPV vaccine-preventable cancers.

Researchers selected full-length cervical and anal pathology reports from four clinical pathology laboratories. Two researchers manually and independently reviewed all reports and classified them at the document level according to two domains: diagnosis and human papillomavirus testing results.

Using the manual review as the gold standard, researchers evaluated the algorithm’s performance using standard measurements of accuracy, recall, precision, and F-measure. The team validated the algorithm’s performance on 949 pathology reports.
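For readers unfamiliar with these measurements, the sketch below shows how accuracy, precision, recall, and F-measure are typically computed when a classifier's document-level labels are compared against a manual gold standard. It is an illustration only, not the researchers' code: the function name `evaluate`, the `Metrics` container, and the eight example labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def evaluate(gold: list[bool], predicted: list[bool]) -> Metrics:
    """Compare predicted labels against gold-standard labels.

    True means "abnormal" (the positive class); False means "normal".
    Hypothetical helper for illustration, not the study's implementation.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # abnormal, flagged
    tn = sum(1 for g, p in pairs if not g and not p)  # normal, not flagged
    fp = sum(1 for g, p in pairs if not g and p)      # normal, wrongly flagged
    fn = sum(1 for g, p in pairs if g and not p)      # abnormal, missed

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure here is the balanced F1: the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f_measure)


# Made-up labels for eight reports (not data from the study).
gold_labels = [True, True, True, False, False, False, False, False]
nlp_labels = [True, True, False, True, False, False, False, False]
print(evaluate(gold_labels, nlp_labels))
```

Recall falls when abnormal reports are missed, while precision falls when normal reports are wrongly flagged, which is why a surveillance tool is usually judged on both.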

The results showed that the algorithm identified abnormal cytology, histology, and positive HPV tests with an accuracy greater than 0.91. Precision was lowest for anal histology reports, at 0.87, and highest for cervical cytology at 0.98. The NLP algorithm missed two out of the 15 abnormal anal histology reports, which resulted in a relatively low recall (0.68).

These findings show the potential for NLP tools to accelerate disease surveillance and treatment, the team stated.

“This demonstration of accuracy is an important first step toward the development of a tool that can facilitate the automation of surveillance for HPV vaccine-preventable cancers and precancers,” researchers said.

“There is an increasing body of evidence showing the merits of an NLP system over manual review for data extraction and document classification for disease surveillance. A key contribution of this study is the integration and application of well-validated NLP methodologies to solve a real-world public health problem.”

The algorithm offers an efficient way to use existing resources to measure the extent to which HPV vaccines reduce infections at the population level. This could help healthcare leaders identify areas where immunization programs need to be strengthened.

While additional improvements are necessary to optimize the performance of this algorithm before it can be used in clinical practice or surveillance, the results show the potential of NLP to accelerate healthcare processes.

“We show that with this algorithm, it is possible to accurately detect patients with HPV-related abnormalities at these anatomical sites. These data provide preliminary support for the use of our NLP instrument for the surveillance of HPV cancer and precancer of the cervix and anus,” researchers concluded.

Natural language processing tools have previously demonstrated their ability to improve disease surveillance. In a 2019 study, researchers used NLP models to better detect low blood sugar in patients with diabetes, leading to improved chronic disease management.

“Knowledge of these factors could assist clinicians in identifying patients with higher risk of hypoglycemia, allowing them to intervene to help their patients in lowering that risk,” said Michael Weiner, MD, MPH, director of the Regenstrief Institute William M. Tierney Center for Health Services Research and the senior author of the study. 

“Some factors influencing hypoglycemia may not be immediately obvious. In addition, reassessing hypoglycemia risk as a patient's health status changes may be important as new factors are identified.”
