IBM plans to acquire DataStax, a longtime open source database and real-time event streaming vendor that recently rebranded as an AI company.
The terms of the deal, expected to close in the second quarter, were not disclosed. DataStax has raised more than $343 million since its founding in 2010.
IBM said DataStax's technology will support the tech giant's family of Watsonx generative AI models, which aim to help enterprises derive value from unstructured data.
Open source strategy
While the Watsonx line is not fully open source, it provides users with access to open source models. Some of IBM's Granite large language models (LLMs) are open source.
For its part, DataStax has long played an active role in the open source database community. Its Astra DB and DataStax Enterprise databases, with their NoSQL and vector database capabilities, are based on Apache Cassandra. The vendor is also involved in Langflow, the open source tool and community for low-code AI application development.
Among DataStax's key capabilities are built-in, easy-to-use retrieval-augmented generation (RAG) features, which retrieve relevant data and supply it to LLMs to ground the output they generate.
"IBM has supported the open source Langflow tool for a while. But with the combination of the DataStax hybrid vector database and … RAG capabilities, IBM is hoping to gear enterprise RAG adoption more easily," said Andy Thurai, an analyst at Constellation Research. "While this solidifies IBM's position in the open source AI space, it will be interesting to see how they will license DataStax going forward to fit with their open source model."
The acquisition will strengthen IBM's efforts to scale generative AI applications for enterprise data, according to the vendor. DataStax's vector database excels at harnessing unstructured enterprise data. And Langflow supplies a graphical, low-code design environment and component orchestration for generative AI applications, IBM said.
IBM said it will continue to support the Apache Cassandra, Langflow, Apache Pulsar and OpenSearch communities, in which DataStax is involved.
Data-driven
Doug Henschen, another Constellation Research analyst, noted that DataStax provides the data underpinning for online giants including Netflix, Overstock, Priceline and Intuit.
IBM's announcement of the planned purchase mostly emphasized generative AI opportunities, reflecting DataStax's introduction last year of vector storage and embedding capabilities. But DataStax's "underlying platform is solid and geared to massive, global-scale deployments," Henschen said.
However, Henschen noted that DataStax has faced increasing competition in recent years, mostly from the big cloud providers, particularly AWS, with its DynamoDB scalable NoSQL database and Amazon Keyspaces for Apache Cassandra.
"It will be interesting to see whether big, cloud-native companies with skilled engineering teams turn to self-managing Cassandra in the wake of this acquisition," Henschen said.
Building enterprise AI applications requires a faster and more intelligent way for enterprises to train LLMs with RAG, Thurai said.
DataStax's technology "will help IBM solve vectorizing and retrieving data as RAG to feed LLMs with large-scale unstructured data," he said.
Meanwhile, Subbu Iyer, CEO of longtime DataStax competitor Aerospike, said IBM's planned acquisition highlights the critical importance of data in the age of generative AI. Aerospike, a vendor of a NoSQL database, raised $100 million in a funding round last year.
"It's clear that data drives the AI revolution. For AI to realize its potential, organizations need massive amounts of real-time data to provide timely, accurate and meaningful results," Iyer said.
Shaun Sutner is senior news director for Informa TechTarget's information management team, driving coverage of artificial intelligence, unified communications, analytics and data management technologies. He is a veteran journalist with more than 30 years of news experience.