Data managers should study up on GPU deep learning
As GPU deep learning becomes more common, data managers will have to navigate several new layers of complexity in their quest to build or buy suitable data infrastructure.
AI-related deep learning and machine learning techniques have become a common area of discussion in big data circles. The trend is something for data managers to keep an eye on for a number of reasons, not the least of which is the new technologies' potential effect on modern data infrastructure.
Increasingly at the center of the discussion is the graphics processing unit (GPU), which has become an established presence on the AI landscape. GPU deep learning has been bubbling under the surface for some time, but the pace of development is quickening.
Deep learning is the branch of AI machine learning that works iteratively across many layers of neural networks trained on ultra-large data sets. GPU deep learning is a particularly potent combination of hardware infrastructure and advanced software aimed at use cases ranging from recommendation engines to autonomous cars.
Today's GPUs have very high memory bandwidth that enables them to crunch big data with gusto -- think matrix multiplication. As a result, they have an affinity for the type of parallel processing that deep learning requires. This is particularly useful at the training stage of deep learning model creation.
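To make that matrix-heavy training work concrete, here is a minimal sketch of a single GPU training step, assuming PyTorch is installed and a CUDA-capable GPU is available (it falls back to the CPU otherwise); the layer sizes and synthetic batch are illustrative rather than drawn from any real workload.

```python
# Minimal sketch: moving a matrix-heavy training step onto a GPU with PyTorch.
# Assumes PyTorch is installed; falls back to the CPU if no CUDA device is found.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A small multilayer network -- each layer is essentially a matrix multiplication,
# the kind of work GPUs parallelize well.
model = nn.Sequential(
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
).to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Synthetic batch standing in for real training data.
inputs = torch.randn(256, 1024, device=device)
labels = torch.randint(0, 10, (256,), device=device)

# One training step: forward pass, loss, backward pass, parameter update.
optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
optimizer.step()
print(f"device: {device}, loss: {loss.item():.4f}")
```

Each linear layer in the forward pass is a large matrix multiplication, which is precisely the kind of operation that benefits from the GPU's memory bandwidth and parallelism during training.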
GPUs are still relatively rare compared to CPUs. As a result, the chips cost more, as do the developers who can work with them. Also, their use is expanding beyond AI deep learning, as they show up in graph databases, Apache Spark and, most recently, Apache Hadoop 3.0 YARN implementations.
'Til there was GPU
Data managers who remember the world before all-purpose 32-bit CPUs may recall floating-point coprocessors and array processors, as well as the special uses they served in some applications.
What should managers look out for when a new generation of developers tells them their applications need to move to a math-heavy GPU infrastructure? The answer is that there are a lot of moving parts involved.
For one, things are relatively straightforward when the job is limited to a handful of GPUs running on a single server. According to Bernard Fraenkel, the gotchas appear when deep learning jobs go beyond that single server. It's not just about the chip.
"When you reach the point that you need to use more than one server, then you probably don't have real guarantees that the bandwidth between the two machines will be acceptable," said Fraenkel, who is the practice manager at Silicon Valley Software Group, a technology consulting practice based in San Francisco. "It's hard to foresee the overhead of the inter-server communications."
Inter-server issues surface
Inter-server issues have led cloud providers, server houses and chipmakers to seek improvements at the board and server level, Fraenkel said. But with each improvement, a GPU deep learning implementation can become more closely tied to the system it has been running on, and it can become harder to successfully migrate to another system. That is a gotcha.
In addition, cloud providers and others are working to optimize software to run on their setups, and this too becomes an encumbrance to rehosting your deep learning application on premises or in other clouds. Also, migrations that require additional computation will take a bigger piece of your budget, of course.
What is important, Fraenkel emphasized, is to understand that things are changing very fast in this area. That means data managers should take special heed, even if GPU deep learning is not on their immediate agenda.
"We are still at an early stage of [the] application of artificial intelligence. Algorithms, as well as hardware -- such as chips, servers and data centers -- are still evolving rapidly," Fraenkel said.
Moreover, CIOs especially need to learn about all the layers involved in building these applications.
"They should be evaluating and becoming cogent of all these moving parts now," Fraenkel said. "It's not something that you pick up in one quarter."
There is precedent for this type of infrastructure upheaval, but there is also much about it that is new.
Advances in big data analytics influenced infrastructure changes in recent years -- columnar databases and distributed file systems come to mind straightaway -- but any changes at the chip level were usually slight.
Deep learning, in particular, seems to be a different animal -- one calling for, as Fraenkel observed, system changes at the chip level and above. Also, GPUs, machine learning and deep learning may be among several inflection points for big data analytics in the future, as a slew of new artificial intelligence chips is being prepared for special use cases.
For now, many managers will be watching GPU-based deep learning activity as spectators, not participants. But taking an avid interest may be a prudent measure, as GPU deep learning is moving very quickly and may have a significant impact.