SingleStore adds Iceberg integration, improved vector search

The data platform vendor's latest update targets GenAI development by enabling easier access to unstructured data, making searches more efficient and optimizing cloud consumption.

SingleStore on Wednesday unveiled a new integration with Apache Iceberg aimed at enabling joint users to more easily use vast amounts of data often left untouched in data lakes and lakehouses.

In addition, the vendor introduced improved vector search performance, enhancements to its text search capabilities, new autoscaling capabilities and a new cloud offering that enables customers to deploy SingleStore on their own private cloud.

While each of the new features adds value for SingleStore customers, the integration with Apache Iceberg stands out, according to Kevin Petrie, an analyst at BARC U.S. Apache Iceberg is now the preferred table storage format for many enterprises, and many of SingleStore's competitors already support the open source platform.

"The announcement enables SingleStore to swim with the tide, meaning it helps customers operate on Apache Iceberg," he said.

In early June during user conferences held by fellow data platform vendors Databricks and Snowflake -- along with Databricks' June 4 acquisition of Tabular to add support for Apache Iceberg -- it became clear that Apache Iceberg is the most popular table format for diverse datasets, Petrie continued.

"Other large vendors such as Microsoft also are getting on board, and Dremio has been a longtime proponent," he said.

Based in San Francisco, SingleStore is a data platform vendor whose platform works with on-premises, hybrid and cloud deployments. The vendor's tools are designed to quickly ingest data from a wide array of sources to fuel decisions in near real time.

In January, SingleStore introduced Pro Max, a rebranded version of its platform.

As a data platform vendor, SingleStore now competes with the likes of Snowflake and Databricks as well as tech giants, including AWS, Google, Microsoft and Oracle.

Additionally, as a former database specialist, SingleStore still vies with MongoDB and Couchbase, among others. Rockset, however, was acquired by generative AI vendor OpenAI on June 21, leaving Rockset's customers searching for a new database provider. SingleStore could attract some of those users, according to Sanjeev Mohan, founder and principal at SanjMo.

"Companies are already gunning for Rockset's customers," he said.

New capabilities

Apache Iceberg, an open source format for storing large analytics tables, is one of the two main storage formats for data lakes and lakehouses. It is used as a foundation for data storage by vendors such as Snowflake and Google. Delta Lake, developed by Databricks and also open source, is the other main storage format for data lakes and lakehouses.

Data lakes and lakehouses, meanwhile, are the main storage repositories for unstructured data, which is now estimated to make up well over three-quarters of all data.

With unstructured data such as text, images and audio files being so ubiquitous, structured data such as financial records and point-of-sale transactions are no longer enough for enterprises to fully understand their operations. To get a complete view, they need to combine their structured data with their unstructured data.

In addition, AI models and applications -- including generative AI -- require as much high-quality data as possible to deliver accurate outputs, making unstructured data vital to training models to understand an individual organization.

As a result, enterprises more than ever want to use their unstructured data.

Technologies such as vector embedding and retrieval-augmented generation now make that possible by automating pipelines for unstructured data and eliminating much of the complex manual labor previously needed to give structure to unstructured data to make it usable.

Given the increased need for unstructured data and technological advancements that make it accessible, vendors such as SingleStore need to support platforms such as Apache Iceberg that can handle large amounts of structured and unstructured data in one table. Because many enterprises prefer Apache Iceberg over other table storage formats, it's significant that SingleStore now integrates directly with the platform, according to Petrie.

"Data teams like Apache Iceberg because it provides open data access, minimizing the risk of vendor lock-in," he said. "It also supports transactional consistency across applications, schema evolution and time travel for querying historical datasets."

Madhukar Kumar, SingleStore's chief marketing officer, noted that Apache Iceberg was gaining popularity even before the surge in interest in generative AI sparked by OpenAI's launch of ChatGPT in November 2022.

However, Apache Iceberg and other table formats that enable enterprises to work with large amounts of disparate data have taken on even greater significance because of the rising interest enterprises have in developing generative AI models and applications.

As a result, SingleStore's integration with Apache Iceberg was in part a response to customer feedback.

"Very large [customers] have a massive amount of data in an Iceberg format and they have been looking for ways to utilize that for GenAI applications," Kumar said. "And then … in the [data management] industry itself, we are seeing Iceberg's adoption grow quite a lot."

In addition to the new integration with Apache Iceberg, SingleStore updated its platform by adding the following:

  • Improved vector search speeds that make discovery of relevant data 40% faster than previous iterations of SingleStore's platform, according to the vendor, and add new filtering capabilities for vector searches.
  • New text search capabilities to reduce the need for customers to deploy specialty databases for generative AI and real-time application development by improving the quality of results with the ability to interpret phonetic similarities and improving relevancy scoring.
  • Autoscaling, a feature that automatically scales up compute power consumption or reduces it to account for workload demand in a move aimed at helping customers control cloud computing costs.
  • Helios, a fully managed cloud offering that enables users who need to keep their data in a virtual private cloud (VPC) for security and governance reasons to deploy SingleStore within that VPC. Previously, customers deploying in a VPC had to self-manage SingleStore.

Both Helios and the new vector search capabilities are in private preview, while Autoscaling is in public preview and the text search capabilities are generally available.

Enabling vector search and text search together, meanwhile, could have important benefits for organizations as they develop generative AI models and applications, according to Petrie.

"I really like the support for vector search alongside full text search," he said. "Despite all the hoopla about GenAI, retrieval-augmented generation and vector search, you need additional capabilities to ensure GenAI gets the facts straight. And text search is a good way to do this because it helps pull up the most relevant documents related to a user query."
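
To illustrate the idea Petrie describes, the following is a minimal Python sketch of one common way to blend a vector similarity score with a keyword match score when ranking documents. It is not SingleStore's implementation or API. The function names, the `alpha` weighting parameter and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear in the document text."""
    query_terms = set(query.lower().split())
    doc_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank docs by a weighted blend of vector and keyword scores.

    alpha is a hypothetical weight: 1.0 means vector score only,
    0.0 means keyword score only.
    """
    scored = []
    for doc in docs:
        vector_part = cosine(query_vec, doc["vec"])
        keyword_part = keyword_score(query, doc["text"])
        score = alpha * vector_part + (1 - alpha) * keyword_part
        scored.append((score, doc["id"]))
    # Highest blended score first
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Toy data: hand-written vectors stand in for real embeddings.
docs = [
    {"id": "a", "text": "quarterly revenue report", "vec": [1.0, 0.0]},
    {"id": "b", "text": "office party photos", "vec": [0.9, 0.1]},
]

vector_only = hybrid_rank("revenue report", [0.9, 0.1], docs, alpha=1.0)
blended = hybrid_rank("revenue report", [0.9, 0.1], docs, alpha=0.5)
```

In this toy case, the vector score alone slightly favors the unrelated document "b," while the blended score puts the document that actually contains the query terms first.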

Kumar similarly pointed to the importance of faster vector search capabilities as a significant addition. When used in concert with the Apache Iceberg integration, text search capabilities and Autoscaling, improved vector search speed enables enterprises to build real-time generative AI applications.

"Last year was a lot about experimentation," Kumar said. "This year, it's about enterprises going to production. That has a very different set of requirements -- massive amounts of data, scalability, extremely fast, and mixing and matching of data."

Future moves

SingleStore is primarily used by data experts, according to Kumar.

Users frequently want to work with petabytes of data and do so in milliseconds. They want to build complex applications such as knowledge graphs. To meet the needs of those users, SingleStore's roadmap includes improving the vendor's engine to make it more efficient, adding new capabilities to further enable model and application development, and adding new connectors so customers can access data from new sources.

"You will see improvements in all three areas," Kumar said.

Petrie, meanwhile, noted that some database users find SingleStore alternatives such as Microsoft SQL Server, PostgreSQL and MySQL easier to use. As a result, further investment in making its tools easier to use would be wise.

"That's an ongoing area of development for SingleStore," Petrie said.

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
