Automated machine learning streamlines model building
Automated machine learning speeds up model building while putting it within reach of more users and broadening adoption. Expert Mike Gualtieri answers major questions about the emerging technology.
Data science platforms are gunning toward automation. With the widespread release and popularity of Google Cloud AutoML, DataRobot Inc.'s namesake tools and other automated machine learning platforms, analysts, businesses and users are beginning to tap into the technology and the speed that automation brings.
In this Q&A, Mike Gualtieri, vice president and principal analyst at Forrester Research, outlines the state of automated machine learning platforms and their use cases.
Editor's note: The following has been edited for clarity and brevity.
What are some key capabilities of automated machine learning platforms?
Mike Gualtieri: The goal of these platforms is to produce a machine learning model. That model is used to predict something, make a decision or identify something. You might use it to identify quality problems in manufacturing, predict a recommendation for up-selling, or help decide whether to shut down a machine before it breaks down. The output of auto ML [machine learning] is the model, and the process of analyzing data to get there is generally known as data science.
The lifecycle [starts by] acquiring data, and the auto ML platforms generally don't get involved in the acquisition of data. The next step -- that they do get involved in -- is data preparation and making sure that the data is high quality. Then, [they perform] feature engineering.
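To make the data preparation and feature engineering steps concrete, here is a minimal sketch in plain Python, assuming a small numeric table stored as a list of rows. It fills missing values with column means, appends one engineered ratio feature, and standardizes each column. The helper names (`impute_missing`, `add_ratio_feature`, `standardize`) are hypothetical and are not taken from any auto ML product, which would apply far richer heuristics.

```python
from statistics import mean, pstdev

def impute_missing(rows):
    """Data preparation: replace None in each column with that column's mean."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        col_means.append(mean(present) if present else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def add_ratio_feature(rows, num, den):
    """Feature engineering (hypothetical example): append column[num] / column[den]."""
    return [r + [r[num] / r[den] if r[den] else 0.0] for r in rows]

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    n_cols = len(rows[0])
    stats = []
    for j in range(n_cols):
        col = [r[j] for r in rows]
        sd = pstdev(col)
        # A constant column would divide by zero; just center it instead.
        stats.append((mean(col), sd if sd > 0 else 1.0))
    return [[(r[j] - m) / s for j, (m, s) in enumerate(stats)] for r in rows]

raw = [[2.0, 4.0], [None, 8.0], [6.0, None]]
clean = impute_missing(raw)            # [[2.0, 4.0], [4.0, 8.0], [6.0, 6.0]]
features = add_ratio_feature(clean, 0, 1)
model_input = standardize(features)
print(model_input)
```

In a real pipeline the means and standard deviations would be computed on training data only and then reused on new data, so that evaluation is not contaminated by information from the held-out rows.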
After feature engineering, you use multiple machine learning algorithms to analyze the data set and find a model. The question is, is the model good enough? Is it accurate enough? That leads to model evaluation and various techniques for evaluating models, like cross-validation. The final step is deploying that model and using it in some application.
When you're writing that in code, you have to walk through those steps individually. Auto ML solutions are designed to do all of this at once and then spit out a list of models ranked by their accuracy. Automating those steps is much faster than coding them by hand.
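The core loop Gualtieri describes (try several algorithms, evaluate each with cross-validation, return a ranked list) can be sketched in plain Python. This is a toy illustration under stated assumptions, not how any named platform works: the three "algorithms" (a majority-class baseline, nearest centroid and 1-nearest-neighbor) and all function names are hypothetical stand-ins, and real systems also search hyperparameters and feature pipelines across many more model types.

```python
import random
from collections import Counter

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Each candidate "algorithm" is a function fit(X, y) that returns predict(x).
def fit_majority(X, y):
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def fit_nearest_centroid(X, y):
    """Predict the label whose mean feature vector is closest."""
    centroids = {}
    for label in sorted(set(y)):
        pts = [xi for xi, yi in zip(X, y) if yi == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return lambda x: min(centroids, key=lambda c: squared_distance(x, centroids[c]))

def fit_1nn(X, y):
    """Predict the label of the single closest training example."""
    return lambda x: y[min(range(len(X)), key=lambda i: squared_distance(x, X[i]))]

def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """k-fold cross-validation: train on k-1 folds, score the held-out fold, average."""
    if len(X) < k:
        raise ValueError("need at least k examples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)   # same seed, so every candidate sees the same folds
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k

def leaderboard(candidates, X, y, k=5):
    """Score every candidate with cross-validation and rank them best-first."""
    scored = [(name, cross_val_accuracy(fit, X, y, k)) for name, fit in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5]]
y = ["a"] * 5 + ["b"] * 5
candidates = [("majority", fit_majority),
              ("nearest_centroid", fit_nearest_centroid),
              ("1nn", fit_1nn)]
board = leaderboard(candidates, X, y)
for name, score in board:
    print(f"{name}: {score:.2f}")
```

On this deliberately well-separated toy data both distance-based models score 1.00 and the majority baseline lands at the bottom of the list; with real data, the leaderboard is where a human still has to judge whether the top model is accurate enough.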
Are automated machine learning platforms geared more toward citizen data scientists and enterprise teams that don't have data scientists or are they universal use?
Gualtieri: If you look at the platforms in the market, some are oriented strictly for citizen data scientists and some are oriented toward automation for data scientists. Some are a little bit of both. Automating many of the steps eliminates the need to understand the details of the steps, so that makes it good for a citizen data scientist. But if you're a data scientist, it's faster for you, too.
One platform oriented toward citizen data scientists is DMway. One platform oriented toward data scientists is H2O Driverless AI -- it's automation designed by data scientists for data scientists.
DataRobot is oriented toward citizen data scientists. Data scientists may be comfortable with the steps being automated, but they're going to want a little more visibility into what the automation process actually did.
There are other approaches, too. Google's AutoML is geared toward specific types of use cases that use deep learning algorithms -- computer vision, voice recognition or text analysis.
Traditional vendors have built some auto ML features into their existing products. They're not pure auto ML because they also provide coding and traditional drag-and-drop design features. We're seeing auto ML not just from vendors solely focused on it, but also being added as a feature in traditional machine learning tools, such as those from SAS, IBM and RapidMiner.
How mature are these technologies? What's the development process heading toward?
Gualtieri: We believe that every enterprise will, at some point in the next few years, have an auto ML capability; but it's not good for all use cases, especially ones that involve new algorithms. There's a lag between when new algorithms emerge and when the platforms add them.
It's very similar to software development in an enterprise. A tool like Tableau, which lets people build visualizations and dashboards without code, hasn't eliminated the need for coding. Enterprises still have Java programmers and Python programmers. It's very similar in the discipline of data science. Auto ML will absolutely enable citizen data scientists and make data scientists faster, but there will always have to be a core group that does this the old-fashioned way.
It sounds similar to the general conversation around implementing AI -- it won't eliminate your need for regular workers, it will just help their processes along.
Gualtieri: Yeah, absolutely. The demand for machine learning models is outstripping the supply of data scientists. You don't need to hire more data scientists if you can make them a thousand times more productive.
In a lot of use cases, auto ML can deliver results that make data scientists more productive. Automating model creation will not eliminate the need for data scientists, but it will make the process easier and faster.
This is also good for innovation at small companies that want to build AI solutions. The level of automation and sophistication auto ML has now is really giving citizen data scientists the tools to [implement AI].
Will automated machine learning trickle down to the average user, such as an entry-level worker who's not versed in data science?
Gualtieri: Citizen data scientists are not data scientists -- they don't have all the skills. Having said that, you have to understand the concept of what you're trying to do. You have to understand data. None of the [auto machine learning tools] work without data, so you have to be data-savvy. I wouldn't go as far as total entry level.
Now here's an exception to that: Google AutoML for computer vision. If I had a thousand images of cars, I could label them by color, or by whether I like or dislike them, upload them to Google, and it will create a model. So, for some use cases, auto ML is enormously democratizing for even the least knowledgeable user. For other use cases, you have to be a little more data-savvy.
Do you see that as the case for most AI?
Gualtieri: AI is more about makers and builders than users. Auto ML especially will expand the footprint of the people who can become makers of AI, but not to everyone. The biggest impact is with large companies that have an AI strategy. This set of tools will allow them to execute it that much faster and also let them fail faster.
That's the thing about machine learning -- it doesn't always work. You have an idea, but you might not have the right data, and you don't want to spend weeks or months trying only to realize [you] can't do it. Auto ML can bring you to that conclusion quicker, so you know it isn't going to work and can move on to the next use case.
What are the major challenges and limitations of using these auto machine learning platforms?
Gualtieri: There are three main challenges. The first one is that they're limited in use cases. Google has a lot of deep learning use cases on computer vision, but DataRobot doesn't do computer vision, and H2O doesn't do computer vision. They don't cover all possible use cases.
Number two is explainability, or transparency. Much of the automation happens behind the curtain. Citizen data scientists may not care that much, but data scientists and people who are going to use these models in business have to be able to explain why the model works. And in a regulated industry, they have to show, for example, that the model isn't using information that may introduce unfair bias.
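One common model-agnostic way to peek behind the curtain is permutation feature importance: shuffle one feature at a time and measure how much accuracy drops. Neither Gualtieri nor the platforms named here are said to use this specific technique; it is shown only as an illustration. The sketch below assumes a toy model that reads only feature 0, while feature 1 stands in for a hypothetical sensitive attribute. The function and variable names are illustrative assumptions.

```python
import random

def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """For each feature, the average accuracy drop when that column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                X_perm.append(new_row)
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy model: uses feature 0 only. Feature 1 plays a hypothetical sensitive attribute.
def model(x):
    return "high" if x[0] > 0.5 else "low"

X = [[0.1, 1], [0.2, 0], [0.3, 1], [0.4, 0],
     [0.6, 1], [0.7, 0], [0.8, 1], [0.9, 0]]
y = ["low"] * 4 + ["high"] * 4
importances = permutation_importance(model, X, y)
print(importances)
```

Because the toy model never reads feature 1, shuffling it leaves accuracy unchanged and its importance is exactly 0.0, which is the kind of evidence a regulated business might want when arguing that a sensitive attribute does not drive predictions. Importance only measures reliance, though, and a model can still absorb bias through correlated proxy features.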
The third challenge is that many data scientists have been reluctant to adopt, and critical of, this automated approach. They're trained to program -- that's their skill -- and they may be hesitant to use a tool that they fear diminishes their value. We don't think it will, and minds are changing.
Kaggle.com, a website that Google bought a couple of years ago, hosts data science competitions. Some of the world's best data scientists download a data set, build a model, and whoever's model is most accurate [wins]. A lot of those Kaggle contest winners -- called Kaggle grandmasters -- are the ones designing auto ML, and that's bringing more credibility and less criticism from the data science community.