Compare 7 top AutoML tools for machine learning workflows
From cloud-based platforms to open source options, compare the pros and cons of leading AutoML tools, which automate key machine learning tasks to accelerate workflows.
Machine learning workflows often involve time-consuming, repetitive and tedious tasks. Automated machine learning (AutoML) tools streamline these processes, saving time, reducing manual effort for engineers and improving consistency.
With a large and growing range of AutoML tools available, finding the best option can be a daunting task. Read on for guidance on the top AutoML software available today, including both commercial and open source options.
What is AutoML?
AutoML is the practice of automating repetitive or time-consuming tasks in machine learning workflows, such as data preprocessing, model selection and hyperparameter optimization. For teams building machine learning models or AI applications, AutoML offers several benefits:
- Saved time. AutoML automates labor-intensive tasks, speeding up workflows.
- Less manual labor. By reducing the amount of tedious work that engineers must perform manually, AutoML helps machine learning teams focus on higher-level or creative tasks.
- Error reduction. AutoML lowers the risk of human error in ML workflows. It also helps eliminate inconsistencies that arise when different engineers approach the same task in different ways.
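To make the three stages named in the definition concrete, here is a deliberately tiny sketch of what an AutoML loop automates. It standardizes features (preprocessing), tries two model families (model selection) and several values of k for one of them (hyperparameter optimization), then ranks every candidate by validation accuracy. It uses only the Python standard library. The search space, the toy k-nearest-neighbors and nearest-centroid models and the synthetic data are illustrative assumptions, not how any specific AutoML product works.

```python
import random
import statistics


def standardize(train_rows, valid_rows):
    """Preprocessing: rescale each feature using statistics from the training split only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]

    return scale(train_rows), scale(valid_rows)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(range(len(X_train)), key=lambda i: squared_distance(X_train[i], x))[:k]
    votes = [y_train[i] for i in nearest]
    return max(set(votes), key=votes.count)


def centroid_predict(X_train, y_train, x):
    """Predict the class whose mean feature vector is closest to x."""
    centroids = {}
    for label in set(y_train):
        rows = [row for row, target in zip(X_train, y_train) if target == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return min(centroids, key=lambda label: squared_distance(centroids[label], x))


# Hypothetical search space: two model families, one with a hyperparameter to tune
SEARCH_SPACE = [("knn", {"k": k}) for k in (1, 3, 5, 7)] + [("centroid", {})]


def predict(name, params, X_train, y_train, x):
    if name == "knn":
        return knn_predict(X_train, y_train, x, params["k"])
    return centroid_predict(X_train, y_train, x)


def tiny_automl(X, y, valid_fraction=0.3, seed=0):
    """Split, preprocess, try every candidate and rank candidates by validation accuracy."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    cut = int(len(order) * (1 - valid_fraction))
    train_idx, valid_idx = order[:cut], order[cut:]
    X_tr, X_va = standardize([X[i] for i in train_idx], [X[i] for i in valid_idx])
    y_tr, y_va = [y[i] for i in train_idx], [y[i] for i in valid_idx]

    leaderboard = []
    for name, params in SEARCH_SPACE:
        preds = [predict(name, params, X_tr, y_tr, x) for x in X_va]
        accuracy = sum(p == t for p, t in zip(preds, y_va)) / len(y_va)
        leaderboard.append((accuracy, name, params))
    # Best candidate first
    leaderboard.sort(key=lambda entry: entry[0], reverse=True)
    return leaderboard


# Usage on hypothetical synthetic data: two well-separated 2D clusters
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(60)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(60)]
y = [0] * 60 + [1] * 60

leaderboard = tiny_automl(X, y)
best_accuracy, best_name, best_params = leaderboard[0]
print(best_name, best_params, round(best_accuracy, 3))
```

Real AutoML tools search far larger spaces and typically use cross-validation and smarter search strategies than this exhaustive loop, but the basic structure is similar: a search space, an evaluation metric and a ranking of candidates.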
Because of these benefits, AutoML has become an increasingly important component of managing complex, large-scale ML projects. AutoML doesn't replace skilled data scientists, machine learning engineers and software developers, who will always be necessary for the complex tasks that automated tools can't handle. But by automating more basic tasks, AutoML helps teams complete projects more quickly and efficiently.
A guide to 7 top AutoML tools
In recent years, a number of software tools have emerged to meet the growing demand for AutoML. Below, we'll look at some of the most popular AutoML tools in 2024.
Our selection was based on community-curated resources and AI product reviews, including the AutoML list and Awesome-AutoML GitHub repos, as well as a selection of tool suggestions on AutoML.org, an online academic community focused on AutoML. Tools appear in alphabetical order.
Amazon SageMaker Autopilot
SageMaker is a machine learning service hosted on AWS, with optional AutoML capabilities through SageMaker Autopilot. With Autopilot, users supply a data set stored in Amazon S3, then let Autopilot preprocess the data, select algorithms, train candidate models and optimize hyperparameters.
A top advantage of SageMaker Autopilot is that it's fully managed and hosted, requiring virtually no setup or deployment effort from users. However, the major downside is that users are restricted to the tooling and features built into SageMaker, making it difficult to integrate with non-Amazon ML tools. It also requires the use of Amazon compute resources to support ML pipelines, which could pose a challenge for organizations with existing on-premises or private cloud ML infrastructure.
AutoKeras
AutoKeras is an open source AutoML tool built on Keras, a Python-based deep learning API. It uses neural architecture search to look for well-performing deep learning models, and its core features include data preprocessing, model selection, hyperparameter optimization and results analysis. Results analysis, which helps teams assess how well a model performs on an intended task, is arguably AutoKeras' most notable feature: The ability to analyze model performance quickly is critical for rapid experimentation, which is AutoKeras' focus.
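As a rough illustration of what results analysis involves, the sketch below summarizes a binary classifier's held-out predictions as a confusion matrix plus accuracy, precision, recall and F1. It is plain Python, not AutoKeras' API; in practice these numbers would come from the tool's own evaluation step, and the labels and predictions here are made up.

```python
from collections import Counter


def analyze_results(y_true, y_pred, positive=1):
    """Summarize binary-classification results on a held-out set."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    # Guard each ratio so an empty denominator yields 0.0 instead of an error
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision, "recall": recall, "f1": f1,
    }


# Hypothetical held-out labels and a model's predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
report = analyze_results(y_true, y_pred)
print(report)
```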
However, one drawback is that AutoKeras automates less of the model training process than some other AutoML tools. Teams might need to configure training settings manually or rely on other automation tools to handle that part of the ML pipeline.
Auto-sklearn
Auto-sklearn is an open source AutoML tool built on the scikit-learn machine learning library, sometimes referred to as sklearn. While it supports automated model selection, its primary focus is hyperparameter optimization. Auto-sklearn uses Bayesian optimization, which builds a probabilistic model of how different configurations are likely to perform and uses that model to decide which configurations to test next, homing in on those that yield the best performance.
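The idea of using a probability model to decide what to test next can be sketched in a few dozen lines. The example below is a simplified, one-dimensional optimizer in the style of the tree-structured Parzen estimator (TPE), one family of Bayesian optimization methods. Auto-sklearn itself relies on SMAC, which uses a random-forest model instead, so this swaps in a different and much simpler technique. The objective function is a hypothetical stand-in for "train a model with this hyperparameter value and return its validation loss."

```python
import math
import random


def toy_loss(x):
    # Hypothetical validation loss for one hyperparameter; the best value is x = 2.0
    return (x - 2.0) ** 2


def parzen_density(x, points, bandwidth):
    """Kernel density estimate: average of Gaussian bumps centred on observed points."""
    norm = bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / (norm * len(points))


def tpe_style_minimize(objective, low, high, n_init=10, n_iter=40, gamma=0.25, n_candidates=24, seed=0):
    if n_init < 2:
        raise ValueError("need at least two initial samples to build both densities")
    rng = random.Random(seed)
    bandwidth = (high - low) / 10
    # Start with random configurations to seed the probability models
    history = []
    for _ in range(n_init):
        x = rng.uniform(low, high)
        history.append((x, objective(x)))

    for _ in range(n_iter):
        # Split past trials into a "good" fraction (lowest loss) and the rest
        ranked = sorted(history, key=lambda trial: trial[1])
        n_good = max(1, int(gamma * len(ranked)))
        good = [x for x, _ in ranked[:n_good]]
        bad = [x for x, _ in ranked[n_good:]]
        # Sample candidates from the "good" density, clipped to the search range
        candidates = [
            min(high, max(low, rng.choice(good) + rng.gauss(0, bandwidth)))
            for _ in range(n_candidates)
        ]
        # Test the candidate whose good-to-bad density ratio is highest
        best = max(
            candidates,
            key=lambda c: parzen_density(c, good, bandwidth) / (parzen_density(c, bad, bandwidth) + 1e-12),
        )
        history.append((best, objective(best)))

    return min(history, key=lambda trial: trial[1])


best_x, best_loss = tpe_style_minimize(toy_loss, low=-5.0, high=5.0)
print(round(best_x, 3), round(best_loss, 4))
```

Each evaluation here is cheap, but in a real AutoML run every call to the objective means training a model, which is why search strategies that pick promising configurations matter.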
A potential drawback of auto-sklearn is its high consumption of compute and memory resources, because its search trains and evaluates many candidate models. But for teams with lots of spare infrastructure capacity at their disposal, or those that can move their ML pipelines to the cloud and scale resources on demand, this is not likely to pose a major challenge.
Azure AutoML
Microsoft's Azure AutoML is another fully hosted and fully managed AutoML tool, similar in many respects to other cloud-based options like SageMaker Autopilot. Its core features include automatically selecting and training models, as well as optimizing hyperparameters. Like other cloud-based AutoML tools, Azure AutoML is easy to deploy and use, but a drawback is that it is tightly coupled to the Azure ecosystem of tools and services.
What sets Azure AutoML apart is Azure's focus on responsible AI, including a dashboard for assessing considerations like model fairness and causal inference. Integration between Azure AutoML and Azure's responsible AI features is currently limited, but it is expected to expand as Microsoft continues to invest in its responsible AI efforts.
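One metric a fairness dashboard might report is the demographic parity difference: the gap between the highest and lowest rates at which groups receive the positive prediction. The sketch below computes it in plain Python. It does not use Azure's APIs, and the predictions and group labels are hypothetical.

```python
from collections import defaultdict


def selection_rates(y_pred, groups, positive=1):
    """Fraction of each group that receives the positive prediction."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == positive)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(y_pred, groups, positive=1):
    """Largest gap in selection rate between any two groups (0.0 means parity)."""
    rates = selection_rates(y_pred, groups, positive)
    return max(rates.values()) - min(rates.values())


# Hypothetical predictions for members of two groups, "a" and "b"
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
group_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(predictions, group_labels)
gap = demographic_parity_difference(predictions, group_labels)
print(rates, gap)
```

Here group "a" receives the positive prediction 75% of the time and group "b" 25% of the time, a gap of 0.5 that a fairness review would likely flag for investigation.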
DataRobot
DataRobot is a commercial AI platform with built-in AutoML capabilities. Beyond standard AutoML features like data preprocessing and model selection, it offers tools for building interpretability and explainability into models.
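Permutation feature importance is one widely used, model-agnostic explainability technique: Shuffle one feature at a time and measure how much accuracy drops. The sketch below is not DataRobot's implementation. The toy model, which only looks at feature 0, and the random data are assumptions chosen so the expected result is easy to reason about.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with feature j replaced by a shuffled value
            X_shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical trained model that depends only on feature 0
    return int(row[0] > 0.5)


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
print([round(v, 3) for v in importances])
```

Because the model ignores feature 1, shuffling that column leaves accuracy unchanged and its importance is exactly 0, while shuffling feature 0 pushes accuracy toward chance.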
Compared with open source AutoML tools, DataRobot is likely easier for many users to deploy, especially those with limited coding abilities. And unlike AutoML tools tied to public cloud vendors, DataRobot is not locked into a specific ecosystem. Although users must work within DataRobot's AI platform to access its AutoML features, the platform offers a fair amount of flexibility in terms of deployable models and ML tools. That said, power users might find it more restrictive than open source AutoML tools, which tend to offer maximal control and customizability.
Google Cloud AutoML
Google Cloud AutoML is a suite of services within Google Cloud Platform (GCP) designed to automate various ML tasks. The most important part of the suite is Vertex AI, which offers features like data processing and hyperparameter tuning. GCP also offers additional tools for specialized use cases, such as AutoML Video, which supports automated streaming video analysis.
Like other cloud-based AutoML tools, Google Cloud AutoML is tightly coupled with the GCP ecosystem, which makes it less flexible than most open source options. Compared with Azure AutoML and SageMaker Autopilot, Google Cloud AutoML can feel more fragmented, as it involves several distinct tools and services. On the other hand, Google Cloud AutoML integrates easily with the powerful data analysis tools built into GCP, such as BigQuery. Of the big three cloud providers, Google has arguably invested the most in big data management and analytics.
H2O AutoML
H2O AutoML is part of H2O, an open source machine learning platform from H2O.ai, a company that also sells commercial AI products. Its AutoML capabilities focus on model selection and training. The platform provides a largely hands-off experience: Users specify a training data set and the response column they want to predict, then H2O AutoML trains a range of candidate models, including stacked ensembles, and ranks them on a leaderboard. Users can also cap the run time or the number of models trained, which helps prevent delays in model optimization workflows.
Unlike some commercial AutoML tools, H2O AutoML is typically used through code, via its Python or R APIs, rather than through a no-code or low-code interface. But engineers are still likely to find it easier to use than many other open source AutoML tools, which tend to require more setup and configuration effort. H2O is also a relatively flexible platform compared with cloud-based AutoML tools.
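The time-limit idea can be sketched as a budgeted random search that stops when either a wall-clock deadline or a model count is reached and returns a leaderboard, best first. The parameter names echo H2O AutoML's `max_runtime_secs` and `max_models` options, but this is not H2O's implementation. The scoring function and search space are hypothetical stand-ins for real model training.

```python
import random
import time


def budgeted_search(train_and_score, sample_config, max_runtime_secs, max_models=None, seed=0):
    """Sample and score configurations until the time budget or model cap runs out.

    Returns a leaderboard of (score, config) pairs, highest score first.
    The deadline is checked between models, so one slow model can overrun it.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + max_runtime_secs
    leaderboard = []
    while time.monotonic() < deadline:
        if max_models is not None and len(leaderboard) >= max_models:
            break
        config = sample_config(rng)
        leaderboard.append((train_and_score(config), config))
    leaderboard.sort(key=lambda entry: entry[0], reverse=True)
    return leaderboard


def toy_train_and_score(config):
    # Hypothetical validation score that peaks at depth=6, learning_rate=0.1
    return -abs(config["depth"] - 6) - abs(config["learning_rate"] - 0.1)


def sample_toy_config(rng):
    # Hypothetical search space for a tree-based model
    return {"depth": rng.randint(2, 10), "learning_rate": rng.choice([0.01, 0.05, 0.1, 0.3])}


board = budgeted_search(toy_train_and_score, sample_toy_config, max_runtime_secs=2.0, max_models=25)
for score, config in board[:3]:
    print(round(score, 3), config)
```

Capping both time and model count gives a predictable upper bound on cost, which is the practical point of exposing these limits to users.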
Chris Tozzi is a freelance writer, research adviser, and professor of IT and society who has previously worked as a journalist and Linux systems administrator.