
How AI-driven data curation could advance precision medicine

The data curation requirements for successful precision medicine initiatives present hurdles for healthcare stakeholders. Can AI help alleviate that burden?

Precision medicine -- an approach that uses insights into a patient's genes, environment and behavior to personalize disease prevention and treatment -- relies on access to high-quality data. By analyzing vast amounts of information from EHRs and other sources, precision medicine has the potential to transform various healthcare and medical research applications, including organ-on-a-chip technology, stem cell therapies and cancer care.

Analytics-driven precision medicine in oncology has already seen significant interest across the healthcare and life sciences industries, but curating and processing the data necessary to inform these efforts remains a challenge.

To tackle this, Precision Health Informatics (PHI), a subsidiary of community-based cancer care provider Texas Oncology, is collaborating with healthcare technology company COTA, Inc. to pursue AI-enabled data curation aimed at accelerating precision medicine at the point of care.

Data curation challenges in precision medicine

Part of what makes data processing difficult in the context of cancer care is the wealth of information that healthcare organizations have access to thanks to the proliferation of EHRs, according to C.K. Wang, M.D., chief medical officer at COTA.

He emphasized that while the concept of real-world data (RWD) isn't new, the accessibility of medical record data is a more recent development. Prior to the advent of EHR systems, much of the RWD in the healthcare space was sourced from claims data and prescription information, which limited the insights that could be derived for use in the clinical sphere.

Wang underscored that EHR data gave clinicians and researchers unprecedented access to a wealth of new insights into patients and care patterns. However, this increased availability of RWD comes with its own pitfalls.

First, the scope of what constitutes RWD has continued to expand alongside the availability of tools like wearables and frameworks for capturing patient-reported outcomes. Further, data from sources like EHRs can be "messy," necessitating approaches to parse that information for data analytics efforts.

"Most of that data resides in unstructured information," Wang explained. "More and more over the years -- though there are discrete data elements [in] the more structured data, which you could pull out pretty quickly -- the insights that we're looking for when we talk about clinical real-world data still resides largely in that unstructured data."


He noted that, to date, abstracting unstructured data from medical records to make it more usable has required significant human expertise and resources. COTA's work centers on this kind of EHR abstraction, and the company has developed an AI-driven data curation tool known as CAILIN to streamline that process and enable users to query a data set as they would a search engine.

But even with such tools, Wang indicated that the role of an expert with a medical background is key to clinical data abstraction.

"For the foreseeable future, even with the rapid evolution of technologies such as AI, there will always be a human component to this work because you need that human expert," he stated, noting that these tools are meant to alleviate some of the burdens of manual data abstraction, rather than replace the humans involved in the process.

Wang also noted that when using data to inform analytics efforts or develop algorithms, data curation challenges give way to data quality concerns. The adage "junk in, junk out" is often referenced in conversations around AI technologies to reinforce that the quality of an algorithm is dependent on the quality of its training data.

In the healthcare industry, this point is particularly salient both inside and outside AI development, because care guidelines and treatment paradigms are informed by insights from clinical trials.

Lori Brisbin, chief operating officer of PHI, explained that in the context of precision medicine, the inclusion of RWD alongside clinical trial data is particularly useful because the parameters of a clinical trial do not necessarily mirror real-world scenarios. Only patients who fit a rigid set of criteria might be eligible for a drug trial, but in practice, other patient groups might also benefit from that drug, albeit at a different dose or in combination with other medications.

Utilizing the high-quality data needed to understand patients like these and provide precision-driven care is at the crux of PHI's partnership with COTA.

PHI has a database that represents approximately 1.6 million patients' journeys, from diagnosis to treatment. Brisbin noted that much of this information is structured, but stored in disparate sources. Then there is the unstructured information, such as clinicians' notes, which hold valuable insights but are difficult to analyze.

To gain the full picture of each patient's journey, PHI works with COTA to curate both the structured and unstructured clinical data.

Building meaningful partnerships

Brisbin emphasized that choosing a partner to help PHI improve its data curation and processing came down to considerations of clinical acumen alongside data analytics expertise.

"[COTA] had a very, very strong medical oncology clinical knowledge base, and they were able to look at records to make recommendations to us where there may be data gaps," she said.

Wang highlighted that providers are generally aware of the value of their data to help improve operations and care quality, but often face hurdles in terms of deciding whether to work with a partner or undertake data curation initiatives on their own.

As with any data-related project in healthcare, the investment of personnel and resources can be costly for providers to undertake. But a further hurdle lies in the organization's data curation capabilities, which he differentiates from data abstraction.

He explained that data abstraction can be understood as identifying and pulling data elements from the medical record. That is simpler than data curation, which weighs data quality considerations alongside extraction.

Some providers might not have a good sense of the quality of the data they wish to extract, making a collaboration with outside partners that have data quality expertise potentially valuable.

Flagging data quality issues and gaps is particularly relevant for cancer care. Wang indicated that survival data, for example, is important for assessing oncological outcomes. But if a health system is missing a significant portion of that data across its patient population because death events were not reported, the data may be unusable for measuring outcomes effectively.
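One way to picture the survival-data problem Wang describes: an unreported death usually does not show up as a blank field; the patient simply looks "alive" with an old last encounter. The minimal sketch below, which is illustrative only and not COTA's method, treats a patient's vital status as known if a death is recorded or the patient was seen within an assumed follow-up window, then flags the cohort when completeness falls below an assumed threshold. The record fields, the 180-day window and the 90% threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class PatientRecord:
    # Hypothetical fields for illustration only; not an actual EHR or COTA schema.
    patient_id: str
    last_contact: date                  # most recent documented encounter
    death_date: Optional[date] = None   # None if no death event was recorded


def vital_status_known(rec: PatientRecord, as_of: date, window_days: int = 180) -> bool:
    """Status counts as known if a death is recorded, or if the patient was seen
    recently enough (assumed window) to be presumed alive as of the analysis date."""
    if rec.death_date is not None:
        return True
    return (as_of - rec.last_contact) <= timedelta(days=window_days)


def survival_data_gap(
    records: List[PatientRecord],
    as_of: date,
    window_days: int = 180,
    min_completeness: float = 0.9,   # assumed quality threshold
) -> Tuple[bool, float]:
    """Return (flagged, completeness): flagged is True when too few patients
    have a known vital status for outcome measurement to be trusted."""
    if not records:
        return True, 0.0
    known = sum(vital_status_known(r, as_of, window_days) for r in records)
    completeness = known / len(records)
    return completeness < min_completeness, completeness
```

In a made-up four-patient cohort where one patient was seen recently, one has a recorded death and two have not been seen in well over a year, completeness is 0.5 and the cohort is flagged. Those two stale records are exactly the kind of gap a partner might try to fill from third-party sources.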

However, if the organization is working with a partner in the data space, it might be able to supplement the gaps in its EHR data using third-party sources. Advances in interoperability and EHR ecosystems could also aid in this process, but the national scale of those challenges makes them unlikely to be resolved quickly.

"The foundation of this partnership is based on data and data insights -- the potential value of the data in giving PHI and Texas Oncology insight into their patient population -- and for them to then stack that on top of their IT technology to meet their needs," Wang stated.

Bringing AI to precision medicine

The partnership is helping PHI conduct hypothesis-driven studies and clinical trial enrichment, which are critical to advancing cancer care.

Brisbin noted that PHI's attrition rate -- here, the share of records submitted for a study that end up unusable, for example because of missing data elements -- has significantly improved since partnering with COTA to utilize AI-driven data curation.

"We have the lowest attrition rate with COTA than any other data aggregator. So, that means if we send over 100 records, they're using 87 of those records -- with no missing data elements -- [to contribute] to a study. That's enormously high," she stated, noting that some other potential partners that PHI initially considered had attrition rates around 50%.
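Brisbin's figures can be made concrete with simple arithmetic: 87 usable records out of 100 sent is a 13% attrition rate, compared with roughly 50% for some other partners PHI considered. The sketch below uses hypothetical helper names and also shows what that gap means in practice, namely how many records would have to be sent at 50% attrition to end up with the same 87 usable ones.

```python
import math


def attrition_rate(records_sent: int, records_usable: int) -> float:
    """Share of submitted records that could not be used in a study."""
    if records_sent <= 0:
        raise ValueError("records_sent must be positive")
    if not 0 <= records_usable <= records_sent:
        raise ValueError("records_usable must be between 0 and records_sent")
    return (records_sent - records_usable) / records_sent


def records_needed(target_usable: int, attrition: float) -> int:
    """Records to submit so that, at a given attrition rate, at least
    target_usable records survive (rounded up to a whole record)."""
    if not 0 <= attrition < 1:
        raise ValueError("attrition must be in [0, 1)")
    return math.ceil(target_usable / (1 - attrition))


# Figures quoted in the article: 87 of 100 records usable vs. about 50% attrition.
cota_rate = attrition_rate(100, 87)
print(f"attrition: {cota_rate:.0%}")   # attrition: 13%
print(records_needed(87, 0.5))         # 174
```

Put another way, at 50% attrition PHI would need to send roughly twice as many records to support a study of the same size.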

The AI also allows PHI to streamline clinical trial inclusion. The inclusion and exclusion criteria for these trials are rigorous: comorbidities are just one of many factors that could exclude a patient from participating in a trial.

Identifying patients' trial eligibility requires sifting through entire medical records, but with well-curated data, this process is more straightforward. The addition of AI capabilities further streamlines these efforts by enabling keyword search and other workflow enhancements that make determining eligibility less cumbersome.

Brisbin underscored that the AI serves to reduce the burden of administrative tasks for PHI's workforce.

"If you apply AI to look for pictures of dogs, you're going to get all sorts of dogs. So, say you need to narrow it down to just pictures of German shepherds," she said. "You're going to get German shepherds, but you're probably also going to get wolves, huskies or even coyotes. You're going to get all sorts of things that are going to require somebody to look at it and say, 'No, that's a coyote, not a German shepherd.'"

"That's what we're saying -- AI is going to narrow [clinical data] down for us and make people's jobs easier, and just allow the experts to focus on that higher level of experience and allow them to work at the highest level," she concluded.

Shania Kennedy has been covering news related to health IT and analytics since 2022.
