Open-Source Framework Simplifies Machine Learning Processes
The system runs hospital data through a set of machine learning models to help inform clinical decision-making.
Researchers at MIT’s Data to AI Lab (DAI Lab) have developed a new framework that can streamline machine learning processes to help organizations uncover actionable insights from big data.
The system, called Cardea, is open-source and uses generalizable techniques so that hospitals can share machine learning solutions with each other, leading to increased transparency and collaboration.
To develop Cardea, researchers leveraged automated machine learning, or AutoML. The goal of AutoML is to democratize predictive tools, making it easier for people to build, use, and understand machine learning models.
AutoML systems like Cardea surface existing machine learning tools instead of requiring individuals to design and code entire models. Additionally, AutoML systems include explanations of what they do and how they work, allowing users to mix and match modules to accomplish their goals.
Researchers noted that while data scientists have built multiple machine learning tools for healthcare, most of them aren’t very accessible, even to experts.
"They're written up in papers and hidden away," said Sarah Alnegheimish, a graduate student in MIT's Laboratory for Information and Decision Systems (LIDS).
To build Cardea, the team has been bringing these tools together to develop a comprehensive reference for hospital leaders.
Cardea walks users through a pipeline that features choices and safeguards at each step. Users are first greeted by a data assembler, which ingests the information they provide. Cardea is built to work with Fast Healthcare Interoperability Resources (FHIR), the current industry standard for exchanging electronic health records (EHRs).
Because hospitals vary in how they use FHIR, researchers built the system to adapt to different conditions and different datasets seamlessly. If there are discrepancies within the data, Cardea’s data auditor points them out so that they can be fixed or dismissed.
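To make the auditing step concrete, here is a minimal sketch of the kind of check a data auditor might run. It is not Cardea's code: the function name `audit_records`, the field names, and the sample records are all hypothetical, chosen only to show discrepancies being listed for a person to fix or dismiss.

```python
from datetime import date


def audit_records(records, required_fields):
    """Return (record_index, issue) pairs for a reviewer to fix or dismiss."""
    issues = []
    for i, rec in enumerate(records):
        # Flag any required field that is absent or empty.
        for field in required_fields:
            if rec.get(field) is None:
                issues.append((i, f"missing {field}"))
        # Flag stays whose discharge date precedes the admission date.
        admitted, discharged = rec.get("admitted"), rec.get("discharged")
        if admitted and discharged and discharged < admitted:
            issues.append((i, "discharged before admitted"))
    return issues


# Hypothetical hospital-stay records, not real patient data.
records = [
    {"patient_id": "a1", "admitted": date(2021, 3, 1), "discharged": date(2021, 3, 4)},
    {"patient_id": "a2", "admitted": date(2021, 3, 5), "discharged": date(2021, 3, 2)},
    {"patient_id": None, "admitted": date(2021, 3, 7), "discharged": None},
]
issues = audit_records(records, ["patient_id", "admitted", "discharged"])
for index, issue in issues:
    print(f"record {index}: {issue}")
```

The sketch only reports problems and never alters the records, which mirrors the article's description of discrepancies being pointed out rather than silently corrected.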
Next, Cardea asks users what they want to find out. For example, a provider may want to estimate how long a patient may stay in the hospital – a critical question in the context of the current pandemic, with healthcare organizations looking to manage resources.
Users can choose between different models, and the software system then uses the dataset and models to learn patterns from previous patients. The system predicts what could happen, helping stakeholders plan ahead.
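The idea of choosing between models that learn from previous patients can be illustrated with a small sketch. It does not use Cardea's actual models or interface: the two candidate models, the single `age` feature, and the tiny synthetic dataset are assumptions made purely for illustration.

```python
from statistics import mean


def fit_mean(train):
    """Baseline: always predict the average stay seen in training."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_line(train):
    """One-feature least-squares line fitted to (feature, stay) pairs."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def mean_abs_error(model, data):
    """Average absolute gap, in days, between predicted and actual stays."""
    return mean(abs(model(x) - y) for x, y in data)


# Synthetic (age, length_of_stay_days) pairs standing in for previous patients.
train = [(30, 2), (40, 3), (50, 4), (60, 5), (70, 6)]
held_out = [(35, 2.5), (65, 5.5)]

candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {name: mean_abs_error(fit(train), held_out) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Scoring each candidate on patients it did not see during fitting is one common way to compare models; the article does not say which comparison method Cardea itself uses.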
Cardea is currently set up to help with four types of resource-allocation questions. But because the pipeline incorporates so many different models, the system can be easily adapted to other scenarios that might arise. As Cardea continues to develop, the goal is for stakeholders to be able to use it to solve multiple prediction problems in the healthcare sector.
Researchers tested the accuracy of the system against users of a popular data science platform and found that it outperformed 90 percent of them. The team also tested the system’s efficiency, asking data analysts to use Cardea to make predictions on a demo healthcare dataset. The results showed that Cardea significantly sped up their work. For example, feature engineering took the analysts five minutes when it would typically take an average of two hours.
In building the Cardea system, researchers aimed to ensure that hospital workers would be able to trust the tool.
"They should get some sense of the model, and they should know what is going on," said Dongyu Liu, a postdoc in LIDS.
To build in even more transparency, the next step in Cardea’s pipeline is a model audit. By laying out a machine learning model’s strengths and weaknesses, the system gives users the ability to decide whether to accept this model’s results or to start again with a new one.
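One way to picture a model audit is as a per-group error report. The sketch below is an assumption about what such a report could look like, not a description of Cardea's audit: the function `audit_model`, the department names, and the two-day tolerance are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def audit_model(rows, tolerance_days):
    """Summarize error per group and list groups whose error exceeds the tolerance.

    rows holds (group, predicted_days, actual_days) tuples.
    """
    errors = defaultdict(list)
    for group, predicted, actual in rows:
        errors[group].append(abs(predicted - actual))
    report = {group: mean(errs) for group, errs in errors.items()}
    weak = sorted(group for group, mae in report.items() if mae > tolerance_days)
    return report, weak


# Hypothetical predictions versus actual stays, grouped by department.
rows = [
    ("cardiology", 4, 5),
    ("cardiology", 6, 6),
    ("oncology", 3, 7),
    ("oncology", 10, 6),
]
report, weak = audit_model(rows, tolerance_days=2)
print(report)
print("needs review:", weak)
```

A report like this shows where a model does well and where it does not, which is the kind of information a user would need before accepting its results or starting over with a different model.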
Researchers released Cardea to the public earlier this year. Because it’s open-source, users are able to integrate their own tools. The team made sure that the software system is not only available, but also understandable and easy to use. This should help with reproducibility as well, researchers noted, so that other individuals can check and understand predictions made by models built with the software.
The team also plans to build in more data visualizers and explanations to provide an even deeper view and make the software system more accessible to non-experts.
"The hope is for people to adopt it, and start contributing to it," said Alnegheimish. "With the help of the community, we can make it something much more powerful."