
Study: ML mortality prediction misses 66% of severe injuries
When given test cases with different injury levels, ML mortality prediction models gave inconsistent risk scores, overestimating moderate injuries and underestimating severe ones.
New research has revealed serious deficiencies in the responsiveness of machine learning (ML) mortality prediction models.
The study, published in Nature Communications Medicine, found that ML mortality prediction models trained solely on patient data failed to recognize about 66% of injuries that could lead to patient death in the hospital.
Using multiple medical ML testing approaches, including a gradient ascent method and neural activation maps, researchers tested model accuracy using publicly available datasets containing health metrics for patients in ICUs or with cancer.
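To make the idea concrete, here is a minimal sketch of that kind of gradient-ascent responsiveness test. It assumes a toy logistic mortality model with made-up weights and feature names standing in for a trained model; the study's actual models, features, and step sizes would differ.

```python
import numpy as np

# Minimal sketch of gradient-ascent test-case generation, assuming a
# stand-in mortality model f(x) = sigmoid(w.x + b) over normalized vitals.
# Feature names, weights, and step sizes are illustrative only.

FEATURES = ["heart_rate", "resp_rate", "sys_bp", "glucose", "spo2"]

rng = np.random.default_rng(0)
w = rng.normal(size=len(FEATURES))   # stand-in for learned weights
b = -1.0                             # stand-in for learned bias

def predict_risk(x):
    """Predicted in-hospital mortality probability for one patient vector."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def risk_gradient(x):
    """Analytic gradient of the risk score with respect to the inputs."""
    p = predict_risk(x)
    return p * (1.0 - p) * w

# Start from a stable patient and push the inputs in the direction that
# increases predicted risk, recording the risk at each (more severe) step.
x = np.zeros(len(FEATURES))          # z-scored vitals: 0 = population mean
trajectory = []
for step in range(20):
    trajectory.append(predict_risk(x))
    x = x + 0.5 * risk_gradient(x)   # gradient-ascent step toward severity

# A responsive model should show risk rising monotonically along this path;
# flat or decreasing segments flag the kind of insensitivity the study reports.
increases = np.diff(trajectory)
print(f"monotonic: {bool(np.all(increases >= 0))}, final risk: {trajectory[-1]:.3f}")
```

Because the stand-in model here is a simple logistic function, the check passes trivially; the point is the procedure of walking inputs toward greater severity and verifying that predicted risk rises with them.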
The study found that in-hospital mortality prediction models failed to generate alerts for bradypnea (abnormally low respiratory rate) or hypoglycemia (abnormally low blood glucose).
When given test cases representing various injury levels, neural network models gave inconsistent risk predictions, assigning higher mortality risk to cases of moderate injury while giving disproportionately lower risk scores to severe injuries.
In addition to deficiencies in ML mortality prediction models, the study found similar responsiveness issues in five-year breast and lung cancer prediction models.
“Our findings highlight the importance of measuring how clinical ML models respond to serious patient conditions,” the study authors wrote. “Our results show that most ML models tested are unable to adequately respond to patients who are seriously ill, even when multiple vital signs are extremely abnormal.”
The study underscores the importance of using medical knowledge-based testing to ensure ML models are accurate, as well as the need to incorporate medical knowledge into model design. Further, the researchers called for new ML responsiveness metrics in healthcare.
“ML responsiveness is a new problem,” they wrote. “It differs from the well-studied ML robustness. ML robustness aims to ensure model stability and the ability to resist sample perturbations so that small (maliciously injected) noises to samples cannot change the prediction results.”
Lipschitzness, a common ML robustness metric, measures a model’s resilience to noisy data and perturbations. However, optimizing for Lipschitzness in healthcare use cases may make models even less sensitive to changes in patient condition by limiting their ability to capture critical input changes, the authors noted.
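As an illustration, the following sketch estimates a Lipschitz-style sensitivity ratio, the change in predicted risk per unit change in input, between a moderately abnormal and an extremely abnormal patient vector. The model, feature values, and numbers are hypothetical; the point is that a tightly bounded (small) ratio means the risk score barely moves even when the inputs change dramatically.

```python
import numpy as np

# Illustrative sketch of an empirical Lipschitz-style sensitivity check.
# predict_risk is a hypothetical stand-in; a real evaluation would call the
# actual model's scoring function on actual patient inputs.

def predict_risk(x):
    # Deliberately "smooth" toy model: its output changes little with its inputs.
    return 0.10 + 0.01 * np.tanh(x).sum()

x_moderate = np.array([0.5, 0.5, 0.5])   # mildly abnormal vitals (z-scores)
x_severe   = np.array([4.0, 4.0, 4.0])   # extremely abnormal vitals

# Lipschitz-style ratio: output change per unit of input change.
ratio = abs(predict_risk(x_severe) - predict_risk(x_moderate)) / np.linalg.norm(x_severe - x_moderate)

# A small Lipschitz constant bounds this ratio from above: the tighter the
# bound, the less the risk score can move even when vitals become extreme,
# which is the insensitivity the authors warn about.
print(f"risk change per unit input change: {ratio:.4f}")
```

In this toy case the ratio comes out close to zero, which is exactly the failure mode the authors describe: a model that is very "stable" under perturbation can also be one that barely reacts to a genuinely deteriorating patient.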
“Our work provides the first look into ML responsiveness,” they wrote. “Comprehensive measurement studies in other medical settings are needed.”
Hannah Nelson has been covering news related to health information technology and health data interoperability since 2020.