AutoML platforms push data science projects to the finish line
Data science projects often have trouble reaching the production phase, but automated machine learning platforms are accelerating data scientists' work to help them come to fruition.
Because businesses often lack the time or resources to support the long, tedious work that data science projects require, most of these projects are never completed.
The fairly recent development of automated machine learning, or AutoML, rectifies this by speeding up the work data scientists perform through automation. Dennis Michael Sawyers, data scientist and author of Automated Machine Learning with Microsoft Azure, uses Azure's AutoML product as the foremost example of how automated ML software expedites and simplifies this otherwise arduous work.
In this Q&A, Sawyers discusses the evolution of automated machine learning platforms and how they are used to develop ML models.
Editor's note: The following interview was edited for length and clarity.
Will automated machine learning as a concept become a trending topic in 2022, or will it take longer?
Dennis Michael Sawyers: I think automated machine learning as a concept will become a trending topic in 2022 due to the data science labor shortage. It's more important than ever to get new data scientists up and running as quickly as possible and for existing data scientists to automate as much of their labor as possible to increase productivity. AutoML lets data scientists build models very quickly and also lets new data scientists on-ramp very quickly. Instead of having to learn how to prepare data for each algorithm, they can focus on learning how to prepare data for AutoML programs.
Which automated machine learning platforms will be the biggest competitors to the one offered by Microsoft Azure in the foreseeable future?
Sawyers: I think Databricks AutoML, DataRobot and Google Vertex AI AutoML are the biggest competitors to Azure AutoML.
Databricks is positioning itself as the 'data and AI company' across all three major clouds (AWS, Google Cloud Platform and Azure), and it has an AutoML feature that will surely take off due to Databricks' large and established user base. GCP positions AI as its main strength and has fairly advanced AutoML capabilities across multiple categories of data, including tabular, video, text and images. DataRobot is the most popular AutoML vendor and is mostly focused on making machine learning as accessible as possible to companies, even if they lack data scientists.
Will products like AutoML be used in course curriculums for those learning about machine learning?
Sawyers: I think AutoML should be the main focus of course curriculums for those learning about machine learning. Since most machine learning problems can be handled fairly well by AutoML these days, it makes sense that it should be the first and main tool that data scientists use before building out custom models. That said, there are still many use cases where building a custom model makes more sense, as AutoML programs still lag in areas like custom scoring.
How steep is the learning curve for someone without machine learning experience to successfully use AutoML?
Sawyers: It depends. If you have a lot of experience in using and manipulating data, you will find that the learning curve for using AutoML isn't very steep. However, there are many 'gotchas' in machine learning, and you need to spend a lot of time formulating the business problem as a data problem. It's super important to learn how to avoid overfitting and data leakage, and how to score models in a way that's representative of how they behave in the real world. That's just as true when using AutoML as it is for learning how to build custom models.
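To make the point about overfitting and representative scoring concrete, here is a minimal Python sketch. It is an illustration only, not something from the interview and not Azure AutoML's API: the data is synthetic, and every name in it (split_holdout, MemorizingModel, accuracy, make_synthetic_rows) is hypothetical. The "model" deliberately memorizes its training rows, including a row ID that should never be used as a feature, so it looks perfect on the data it has seen and does much worse on held-out rows.

```python
import random
from collections import Counter


def make_synthetic_rows(n=200, seed=1):
    """Build synthetic (features, label) rows with about 20% label noise.

    The features include a unique row id, which is exactly the kind of
    column a model can memorize instead of learning a real pattern.
    """
    rng = random.Random(seed)
    rows = []
    for row_id in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.2:  # flip some labels to simulate noise
            label = 1 - label
        rows.append(((row_id, round(x, 3)), label))
    return rows


def split_holdout(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


class MemorizingModel:
    """A deliberately overfit 'model' that memorizes every training row."""

    def fit(self, rows):
        self.lookup = {features: label for features, label in rows}
        labels = Counter(label for _, label in rows)
        self.default = labels.most_common(1)[0][0]  # majority class
        return self

    def predict(self, features):
        # Unseen rows fall back to the majority class.
        return self.lookup.get(features, self.default)


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    correct = sum(model.predict(features) == label for features, label in rows)
    return correct / len(rows)


rows = make_synthetic_rows()
train_rows, test_rows = split_holdout(rows)
model = MemorizingModel().fit(train_rows)

train_accuracy = accuracy(model, train_rows)
test_accuracy = accuracy(model, test_rows)
print(f"accuracy on training rows: {train_accuracy:.2f}")
print(f"accuracy on held-out rows: {test_accuracy:.2f}")
```

The training score here says nothing about real-world behavior; only the held-out score does. The same check applies whether the model is hand-built or produced by an AutoML tool.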