
Hyperscale providers bet big on cloud AI

AI cloud services have emerged as yet another battleground for the hyperscale providers, which are eager to entice data scientists and developers to build and train models on their platforms.

If AI is indeed the future of IT, then the cloud vendors plan to be front and center to corral the impending wave of adoption.

There's no shortage of hype surrounding artificial intelligence and machine learning (ML), and the major cloud vendors have banked on that excitement generating big business for them in the years to come.

Amazon Web Services (AWS), Microsoft, Google, IBM and others have added dozens of cloud AI tools in the past year, with varying degrees of complexity. Whether these platforms are the best place for these workloads depends on how AI and machine learning fit into a company's business strategy. Nevertheless, these cloud companies have scurried to fill gaps in their services and make AI accessible to companies that make a living off machine learning, as well as those under pressure to have a strategy but lack experience.

Despite some early success stories of AI-based applications built on the cloud, most of the market is still on the sidelines, particularly with deep learning. Companies need to be selective in the techniques they adopt, whether that involves building from scratch or simply incorporating some of the API-driven cloud services, such as speech and image recognition, said Chirag Dekate, an analyst at Gartner.

"IT leaders need to realize -- and most do -- that AI shouldn't be used as a hammer, but as a scalpel," he said. "Amazon, Google, Microsoft and others are investing so heavily in AI for internal consumption and external cloud-based consumption because they understand there's going to be enormous value for these advanced analytics capabilities."


Going forward, Dekate said he expects these providers to be even more aggressive in adding cloud AI capabilities. And the current lack of familiarity with AI could play into cloud vendors' hands as they vie for customers, particularly those that want to experiment first.

Syniverse, a mobile networking company based in Tampa, Fla., extends its vRealize-orchestrated private cloud to IBM Cloud and AWS through a partnership with VMware. The company hasn't used many cloud-native services, but sees them as potential differentiators for one platform over another.

"One of the areas we see as very interesting is in AI and machine learning-based tools that allow us to essentially create new reporting and analytics for customers quickly," said Chris Rivera, CTO at Syniverse.

AI cloud services, from beginner to advanced

Cloud providers have essentially built three layers of AI services on their platforms. The lowest level -- which is the most complex, but potentially provides the best performance -- is at the infrastructure layer. The major providers support popular frameworks such as TensorFlow or Apache MXNet and GPU-based virtual machines that can then be connected to other cloud services to build and train models. Google went a step further this month with the beta release of its TensorFlow-integrated TPU instance type, which is built on custom processors.
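At its core, the workload those GPU instances accelerate is iterative model training. As a toy illustration -- plain Python with synthetic data, no framework -- a minimal gradient-descent loop fits a line; TensorFlow and MXNet apply the same basic loop structure to vastly larger models and data sets:

```python
# Toy illustration of model training: fit y = w*x + b to synthetic data
# by gradient descent. Cloud frameworks run the same basic loop, just
# over far larger models, data sets and GPU fleets.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # points on y = 2x + 1

w, b = 0.0, 0.0   # model parameters, starting from zero
lr = 0.01         # learning rate

for _ in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        err = (w * x + b) - y              # prediction error
        grad_w += 2 * err * x / len(data)  # gradient of mean squared error
        grad_b += 2 * err / len(data)
    w -= lr * grad_w                       # step against the gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))            # converges to 2.0 1.0
```

The expensive part in practice is not the loop itself but running it millions of times over terabytes of data, which is where GPU- and TPU-backed instances earn their keep.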

The next level up is an emerging space that's still tailored to data scientists, but which abstracts much of the underlying infrastructure and integrates the hardware configurations and ML frameworks. It pushes AI into more of the as-a-service category, with tools such as IBM Watson, Amazon SageMaker, Microsoft Azure Machine Learning Studio, and Google Machine Learning Engine and Google AutoML.

And the final layer involves API-based plug-in services that can be integrated into existing applications. These are geared toward AI novices, and all the major vendors have some flavor of services that provide tools for cognitive, speech and image recognition.
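At that plug-in layer, integration can be as simple as posting a JSON payload to a recognition endpoint. As a sketch, the helper below builds a request body in the shape Google Cloud Vision's `images:annotate` endpoint expects; the function name and sample bytes are illustrative:

```python
import base64
import json

def vision_label_request(image_bytes: bytes, max_labels: int = 5) -> str:
    """Build the JSON body for a LABEL_DETECTION request in the shape
    Google Cloud Vision's images:annotate endpoint expects."""
    body = {
        "requests": [{
            # Image bytes are sent base64-encoded inline.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_labels}],
        }]
    }
    return json.dumps(body)
```

An application would POST this body, with its API credentials, to the `images:annotate` endpoint and get back labels for the image; AWS and Microsoft offer comparable recognition APIs with their own request shapes.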

"Whether you're a data scientist or architect or developer trying to develop a smart AI-based app, they're basically trying to pull you into their ecosystem," Dekate said.

AI components

Know the benefits and drawbacks of cloud AI

However, the public cloud runs into limitations with deep learning and heavy users, because GPU-accelerated nodes require significant compute power, and training models demands vast amounts of data to be stored and processed.

"Most organizations are trying to figure out how to jump-start [AI] with the biggest Capex," Dekate said. "But if deep learning is a mainstay of your organization, then building on premises makes a lot more sense."

Deep learning is useful for specific needs, such as image recognition and text analysis, but even the creators of deep learning neural networks acknowledge it's not a silver bullet to solve the broader problems of AI for companies, Dekate said. And while there is a head-to-head cost advantage to doing it in-house, there's the important caveat of data gravity. If a corporation's data already sits on a public cloud, it will be more effective to do that work there, rather than incur the costs of moving it, he said.

New York-based Alpha Vertex trains machine learning models on Google Cloud Platform to incorporate into its analytics services for the financial sector. Cost would certainly be a concern if those models ran on the largest instance types all day, but the company has architected its infrastructure to use cheaper, smaller Preemptible VMs and Spot Instances. It also uses Kubernetes to scale from roughly 20 VMs to more than 1,000 when it trains the analytics models, which avoids underutilized in-house resources.
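A batch-training setup along those lines might be expressed as a Kubernetes Job that fans out workers and lets the cluster autoscaler add nodes as needed. The manifest below is an illustrative sketch -- the job name, container image and resource sizing are hypothetical -- using GKE's preemptible-node label to target the cheaper VMs:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model            # hypothetical job name
spec:
  parallelism: 50              # fan out training workers in parallel
  completions: 50
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-preemptible: "true"   # run on cheaper preemptible nodes
      containers:
      - name: trainer
        image: gcr.io/example/trainer:latest       # hypothetical training image
        resources:
          requests:
            cpu: "4"
            memory: 8Gi
```

When the job finishes, the autoscaler drains the extra nodes, so the company pays for the burst rather than for idle in-house hardware.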

"Having Kubernetes in there was the difference between that being something we could manage with one or two people versus an entire department," said Michael Bishop, CTO at Alpha Vertex.

The company's cost-benefit analysis of moving these models in-house consistently supports keeping them in the cloud to stay on top of the technology.


"The cost of higher-end GPUs is fairly enormous, and then they do not have a nice amortization lifecycle," Bishop said. "If you're relying on your bread and butter being the latest and greatest accelerator, you really will have a hard time keeping up with that yourself."

Zendesk built Answer Bot, a virtual assistant for its customers that uses Amazon Simple Storage Service, GPU instances, TensorFlow and Amazon Aurora. The bot uses deep learning predictive models to identify common problems and more quickly answer customer questions and suggest best practices.

Answer Bot arrived before AWS added its SageMaker service late last year to abstract much of the underlying infrastructure management, but Zendesk will consider the service for future endeavors for the same reason it has used AWS since 2011: to offload underlying IT operations and focus on its core business.

"None of that [administrative work] is really data science work," said Steve Loyd, vice president of technology operations at Zendesk, based in San Francisco. "The promise of SageMaker is that it gives you more of a full set of interface and automation built around TensorFlow [and] allows you to do more with less."

Beyond the hype, AI's worth the effort

Data scientists not only build these models, they constantly validate them. The better a tool handles the underlying infrastructure itself, the more time data scientists have to tweak their algorithms. As AWS and other cloud providers make their AI tool sets easier to use, the barrier to entry for machine learning continues to drop because it's easier to take a data set and get something out of it, Loyd said.

But even AI users say it's not a panacea, especially because most of the models' capabilities are relatively simplistic. Many companies are convinced they need AI, but have no idea what to do about it.

"One of the biggest misconceptions is that it's like some alchemy or magic box, and you just throw stuff in and on the other side some amazing insights come out," Alpha Vertex's Bishop said. "It's a very difficult slog to get high-quality outcomes, and I don't think people fully appreciate that."

Still, they caution it would be unwise to forgo AI just because the hype doesn't match reality. And more important than cloud vendors' AI menus is how adept companies are at integrating these technologies and speeding their own innovation, Dekate said. The most successful companies will be pragmatic, with good foundations around data and infrastructure management.

"Every organization, no matter how big or small, needs to have an AI strategy," he said. "Machine learning and AI are long-term things, but it's important to engage in it right now so they can be ahead of the competition."
