GPU databases bring greater parallelism to big data processing
GPU databases offer a new way to process data. 451 Research analyst James Curtis discusses where they fit in big data applications, particularly for parallel processing.
New data technologies are continually arriving these days. Among the latest entrants are GPU databases. Often associated with graph data processing tasks, these databases tap into the power of the GPU, which has gone beyond its original home in graphics displays and gaming systems to take on machine learning and other big data tasks.
In this Q&A, James Curtis offers his view. As senior analyst covering data platforms and analytics at 451 Research, he has been digging deep into the new technology to see where it may fit best.
We are seeing big data driving the entry of GPUs into the data center. We've even seen GPU databases appear in the form of Blazegraph, Kinetica, MapD and others. How extensive do you think this will become?
James Curtis: New databases tend to be tuned for certain workloads or scenarios.
The GPU database vendors are taking a few approaches. Some of them are trying to come to market with a broader perspective -- with something similar to a general-purpose data warehouse. They are driving SQL-92 compliance.
The GPUs can ingest a lot of data -- they can swallow it and process it whole. People can leverage these GPUs with certain queries. For example, they do geospatial analytics faster.
But, with something like a 20-page SQL statement -- a very complex SQL query with a lot of steps -- a GPU might not give you the performance benefits of a normal, in-memory SQL database. A GPU needs something that can run in parallel. And, with something like geospatial data, the math can be run in parallel.
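To make the point about parallel math concrete, here is a minimal Python sketch, not taken from any of the products named above. It computes great-circle distances from one origin to many points. The function names, the thread pool and the Earth-radius constant are illustrative assumptions. The thread pool only demonstrates that each distance can be computed independently of the others; it does not reproduce GPU performance.

```python
import math
from concurrent.futures import ThreadPoolExecutor

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distances_from(origin, points, workers=4):
    """Compute one distance per point; no result depends on any other.

    That independence is what lets this kind of work be spread across
    many processing units, whether threads here or GPU cores elsewhere.
    """
    def one(point):
        return haversine_km(origin[0], origin[1], point[0], point[1])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input points
        return list(pool.map(one, points))


if __name__ == "__main__":
    # Hypothetical sample coordinates as (latitude, longitude) in degrees
    print(distances_from((0.0, 0.0), [(0.0, 0.0), (0.0, 90.0)]))
```

A long, multi-step SQL statement looks different: each step often has to wait for the result of the one before it, so there is less independent work to hand out at once.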
Will GPUs replace CPUs?
Curtis: GPUs are not a replacement for the CPUs. They can be very specific in their application. They are parallel-process driven. Machine learning is an area where this parallel processing can be useful. But the idea of doing machine learning with GPU databases has not caught on fully yet.
Still, GPUs are not a fad. As people are building databases around GPUs, it brings another level of discussion to the broader database market.
I think GPUs will become somewhat ubiquitous as part of a number of systems, and that optimizers will get very good at reviewing queries, seeing what can run in parallel, and throwing the right queries at the GPU or CPU -- deciding where it is best to run different workloads or parts of workloads.
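As a rough illustration of the kind of routing decision Curtis describes, the following Python sketch assigns each step of a query to a device. Everything in it is hypothetical: the `Step` fields, the `choose_device` rule and the row threshold are assumptions made for the example, not the behavior of any real optimizer.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    rows: int            # estimated number of rows the step touches
    data_parallel: bool  # True when each row can be processed independently


def choose_device(step, min_rows_for_gpu=100_000):
    """Toy rule: send large, data-parallel steps to the GPU, the rest to the CPU.

    The threshold is an arbitrary assumption for illustration only.
    """
    if step.data_parallel and step.rows >= min_rows_for_gpu:
        return "gpu"
    return "cpu"


def plan(steps):
    """Map each step name to the device the toy rule picks for it."""
    return {step.name: choose_device(step) for step in steps}


if __name__ == "__main__":
    # Hypothetical query broken into three steps
    query = [
        Step("distance_filter", rows=5_000_000, data_parallel=True),
        Step("small_lookup", rows=200, data_parallel=True),
        Step("ordered_running_total", rows=5_000_000, data_parallel=False),
    ]
    print(plan(query))
```

A real optimizer would also have to weigh factors this sketch ignores, such as the cost of moving data between CPU memory and GPU memory.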
There was a time when the relational database management system seemed poised to handle almost any job people had. Now we have such a variety in data processing.
Curtis: Most any database can do what you want it to do -- but the question is whether it will do it quickly and efficiently. I am not aware of any single database that is extremely good at everything.
That's why you have NoSQL. There was a gap in the market that said some of the larger relational systems weren't scaling very well, and they were getting expensive to maintain. But, in the same way relational SQL is not good for everything, NoSQL is not good for everything either.
You are seeing what we call hybrid operational analytical processing, or HOAP. The idea there is that you stream data in as part of transactions and you have a small window there that allows you to do an analytical function or carry out an action on the transaction.
Companies want to do that, but that is a stretch. They want to do big data analytics all along the continuum.
The need and the thirst to do analytics is driving a lot of use cases. It drives the use of GPUs, for example -- people want to do analytics faster on all of their data.