Cockroach Labs adds vector search, updates pricing options

The database vendor's new update includes an integration with pgvector that provides users with the vector search capabilities needed to feed RAG pipelines and develop GenAI tools.

Cockroach Labs on Thursday unveiled vector search capabilities aimed at enabling customers to access and operationalize unstructured data to train generative AI models and applications.

In addition, the vendor introduced a new tool designed to improve efficiency by reducing query times and optimizing resource usage, as well as new pricing tiers for CockroachDB Cloud. Each of the new features is part of Cockroach Labs' CockroachDB 24.2 update.

As enterprise interest in generative AI has exploded over the past two years, vector search has become a common means of discovering the data -- much of it unstructured -- needed for the retrieval-augmented generation (RAG) pipelines that feed generative AI tools. As a result, adding vector search capabilities is important for database vendors such as Cockroach Labs, according to Stephen Catanzano, an analyst at TechTarget's Enterprise Strategy Group.

"Vector search is a crucial advancement for CockroachDB because it allows users to handle unstructured data," he said. "By adding vector search, Cockroach enables users to … manage data more intelligently. This is especially important as enterprises increasingly rely on AI and need databases that can handle vectors for enhanced performance and accuracy."

Based in New York City, Cockroach Labs is a database vendor that provides a cloud-native SQL database platform.

To date, the vendor has raised more than $600 million in funding, including $278 million in January 2021 and $160 million in May 2020. Competitors, meanwhile, include other database specialists such as MongoDB and Yugabyte, as well as database offerings from tech giants, such as Amazon DynamoDB and Microsoft SQL Server.

New capabilities

OpenAI's November 2022 launch of ChatGPT marked a significant advance in large language model (LLM) capabilities.

Since then, many enterprises have made developing generative AI features a priority, combining LLM capabilities with their own proprietary data to develop models and applications that understand their business.

Using such models and applications, enterprises can develop generative AI assistants that enable users of any skill level to use natural language processing to query and analyze data to make informed decisions. In addition, enterprises can program models and applications to take on repetitive tasks that otherwise need to be performed by data engineers and other experts, thus making those experts more efficient.

However, joining the capabilities of LLMs with proprietary data to train generative AI tools is not simple.

Without large volumes of quality data -- and sometimes even with it -- generative AI tools are prone to AI hallucinations, which are incorrect and sometimes bizarre outputs that can have serious consequences if not caught by humans. Reducing the likelihood of hallucinations requires feeding models and applications enough relevant data, and much of that data is unstructured.

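Unstructured data such as text, images and audio files is estimated to make up over 80% of all data. Without some form of structure, however, such data is difficult to operationalize. Vectors, which are numerical representations of data generated by algorithms known as embedding models, give unstructured data the structure it needs to be searched and discovered: items with similar meaning are assigned vectors that lie close together, so a search can rank items by how near their vectors are to the vector of a query.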
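To make the idea of "structure" concrete, here is a minimal sketch of how vectors let a system rank items by similarity. The three-dimensional vectors and item names are hypothetical placeholders rather than output from a real embedding model, and cosine similarity is just one common choice of metric.

```python
import math

# Hypothetical 3-dimensional vectors standing in for real embeddings,
# which typically have hundreds or thousands of dimensions.
EMBEDDINGS = {
    "invoice from supplier": [0.9, 0.1, 0.0],
    "purchase order": [0.5, 0.5, 0.2],
    "vacation photo": [0.0, 0.1, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query: list[float]) -> str:
    """Return the item whose vector points most nearly in the query's direction."""
    return max(EMBEDDINGS, key=lambda item: cosine_similarity(query, EMBEDDINGS[item]))


print(most_similar([0.85, 0.15, 0.05]))  # prints: invoice from supplier
```

The query vector never has to match an item exactly; the search only needs the item whose vector is closest in direction.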

Therefore, to meet the needs of customers wanting to develop generative AI tools, many database specialists and other data management vendors have added vector search and storage capabilities.

For example, Cockroach Labs competitors such as MongoDB and Couchbase now offer vector search and storage, while tech giants AWS and Oracle have made vector search and storage central to their database strategies.

Now, Cockroach Labs is introducing its own vector search capabilities, adding tools that are critical for any database vendor as enterprise interest in generative AI surges, according to Kevin Petrie, an analyst at BARC U.S.

CockroachDB's vector search capabilities are enabled by an integration with pgvector, an open source PostgreSQL extension that adds a vector data type along with distance functions for similarity search. Through the integration, Cockroach Labs customers can now perform semantic searches across large vector datasets to discover data relevant to generative AI models and applications such as recommendation engines and AI assistants.
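The kind of query this integration enables can be sketched in plain Python as an exact k-nearest-neighbor search. The rows and vectors below are hypothetical, and the code uses Euclidean (L2) distance, the metric pgvector exposes through its `<->` operator. This illustrates the technique only; it is not CockroachDB's implementation, which may support other metrics or use indexes instead of scanning every row.

```python
import math

# Hypothetical table rows: (id, embedding). In a database these would live
# in a vector column; here they are plain Python lists.
ROWS: list[tuple[int, list[float]]] = [
    (1, [1.0, 0.0]),
    (2, [0.9, 0.2]),
    (3, [0.0, 1.0]),
    (4, [-1.0, 0.0]),
]


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, the metric behind pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: list[float], k: int) -> list[int]:
    """Exact k-nearest-neighbor search, roughly the Python analogue of
    SELECT id FROM rows ORDER BY embedding <-> query LIMIT k
    """
    ranked = sorted(ROWS, key=lambda row: l2_distance(row[1], query))
    return [row_id for row_id, _ in ranked[:k]]


print(nearest([1.0, 0.1], k=2))  # prints: [1, 2]
```

Scanning and sorting every row is fine for a toy table; at the scale the vendor describes, databases typically rely on approximate indexes to avoid a full scan.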

"Given the popularity of GenAI, vector search has become somewhat of a must-have feature among database vendors," Petrie said.

In a typical RAG workflow, vector search is a way for enterprises to apply generative AI language models to their own proprietary data, he continued. A vector database finds and retrieves relevant unstructured data, such as text or imagery, then feeds it into pipelines that supply it to the language model as context, making the model less likely to hallucinate.
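The retrieval step Petrie describes can be sketched as follows, assuming a hypothetical set of proprietary documents and a toy bag-of-words function standing in for a real embedding model. The sketch stops at assembling the prompt; sending it to a language model is out of scope, and a production pipeline would store real embeddings in a vector-capable database rather than recomputing them in memory.

```python
import math
import re

# Hypothetical proprietary documents an enterprise might retrieve from.
DOCUMENTS = [
    "Refunds are processed within 5 business days.",
    "Our headquarters moved to Austin in 2023.",
    "Premium support is available 24 hours a day.",
]


def embed(text: str) -> dict[str, int]:
    """Toy stand-in for an embedding model: counts each lowercase word."""
    counts: dict[str, int] = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts


def similarity(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(DOCUMENTS, key=lambda d: similarity(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Augment the question with retrieved context before it goes to an LLM."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How long do refunds take?"))
```

Grounding the prompt in retrieved text is what gives the model proprietary facts it was never trained on.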

"Recognizing this opportunity, many database vendors are adding vector search features," Petrie said.

[Figure: Traditional search vs. vector search] Database vendor Cockroach Labs is adding vector search capabilities through an integration with pgvector.

While some vendors have had vector search capabilities for more than a year -- and vendors such as Pinecone specialize in vector databases -- Cockroach Labs is only getting started with vector search. Although recommendation engines and AI assistants are two target use cases, there are others, Petrie added.

"I'll be interested to see what additional detail they provide in coming announcements about capabilities, target use cases and ideal datasets," he said.

In addition to the new vector search capabilities, Cockroach Labs unveiled a new pricing structure for the fully managed version of its database. It also offers a self-managed version.

The vendor now offers CockroachDB Cloud at Basic, Standard and Advanced tiers. Previously, the vendor offered only Serverless and Dedicated tiers.

Basic and Advanced essentially replace Serverless and Dedicated, respectively, while Standard is a new tier between the two, giving customers three fully managed options.

Basic begins at no cost with customers incurring charges once they exceed 10 gigabytes of storage and 50 million request units per month. Standard begins at $146 per month per two virtual CPUs (vCPUs) and Advanced begins at $295 per month per two vCPUs.
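The published starting prices lend themselves to quick estimates. The sketch below assumes, for illustration only, that the Standard and Advanced prices scale linearly in two-vCPU increments and that Basic becomes billable once usage exceeds either allowance. The announcement confirms neither assumption, and actual bills depend on Cockroach Labs' full pricing terms.

```python
# Starting prices from the announcement, in U.S. dollars per month per 2 vCPUs.
STARTING_PRICE_PER_2_VCPUS = {"Standard": 146, "Advanced": 295}

# Basic tier allowances before charges begin.
BASIC_FREE_STORAGE_GB = 10
BASIC_FREE_REQUEST_UNITS = 50_000_000


def estimated_monthly_floor(tier: str, vcpus: int) -> int:
    """Rough monthly starting cost, ASSUMING linear pricing per 2 vCPUs."""
    if vcpus <= 0 or vcpus % 2:
        # Assumption: capacity is bought in two-vCPU increments.
        raise ValueError("vcpus must be a positive even number")
    return STARTING_PRICE_PER_2_VCPUS[tier] * (vcpus // 2)


def basic_is_free(storage_gb: float, request_units: int) -> bool:
    """True if usage stays within both Basic allowances (assumed interpretation)."""
    return storage_gb <= BASIC_FREE_STORAGE_GB and request_units <= BASIC_FREE_REQUEST_UNITS


print(estimated_monthly_floor("Standard", 4))  # prints: 292
print(estimated_monthly_floor("Advanced", 4))  # prints: 590
print(basic_is_free(8, 20_000_000))            # prints: True
```

At the same vCPU count, the Advanced starting price works out to roughly twice the Standard starting price.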

The new tiers do more than rename two existing pricing options and add a third: they are designed to better match each enterprise's workload needs to an appropriate pricing tier, according to Cockroach Labs CEO Spencer Kimball.

For example, the Basic tier might be best for an organization with entry-level workloads, whereas the Advanced tier is the likely fit for an enterprise requiring high security and scalability. The Standard tier, meanwhile, is intended to combine the cost efficiency of Basic with some of the performance, scalability and security of Advanced.

"The introduction of the Standard tier [enables] companies to consolidate a range of workloads while optimizing cost and performance," Kimball said.

Catanzano likewise said the addition of new pricing tiers is significant given that they provide both existing and potential customers with flexibility as their workload demands and budgets change.

"It simplifies cloud adoption and makes CockroachDB accessible to a wider range of users, supporting scalability from startups to large enterprises," he said.

Beyond new vector search capabilities and reorganized pricing, Cockroach Labs unveiled Generic Query Plans, a feature that cuts query times and makes complex queries more efficient and less expensive to run because they consume less compute power.
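The article does not detail how Generic Query Plans work. A common technique behind features with this name, including PostgreSQL's generic plans for prepared statements, is to optimize a parameterized statement once and reuse the resulting plan across different parameter values, saving the compute otherwise spent on repeated planning. The sketch below models that idea with hypothetical `Plan` and `PlanCache` classes; it is not CockroachDB's implementation.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical stand-in for an optimized query plan."""
    sql: str


class PlanCache:
    """Plan each parameterized statement once, then reuse that plan
    for every set of parameter values."""

    def __init__(self) -> None:
        self._plans: dict[str, Plan] = {}
        self.planning_runs = 0

    def _optimize(self, sql: str) -> Plan:
        # Placeholder for the expensive optimizer work that caching avoids.
        self.planning_runs += 1
        return Plan(sql)

    def execute(self, sql: str, params: tuple) -> tuple[Plan, tuple]:
        plan = self._plans.get(sql)
        if plan is None:
            plan = self._optimize(sql)
            self._plans[sql] = plan
        # A real engine would now run the plan with the bound parameters.
        return plan, params


cache = PlanCache()
query = "SELECT * FROM orders WHERE customer_id = $1"
for customer in (101, 102, 103):
    cache.execute(query, (customer,))
print(cache.planning_runs)  # prints: 1
```

The trade-off is that one generic plan can be less efficient for some parameter values than a plan optimized for those specific values, which is why databases that offer generic plans often let users choose between generic and custom planning.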

Customer feedback and market trends together provided the impetus for adding vector search and the other new features, according to Kimball.

Many enterprises are making generative AI a priority. To meet their needs, Cockroach Labs needed both to add vector search capabilities that enable those enterprises to find and operationalize relevant data and to improve the performance of its database so it can handle the workloads that AI demands.

"We've designed CockroachDB to meet those evolving needs by making sure our database is ready to handle the scale and complexity of these workloads," Kimball said.

Looking ahead

With CockroachDB 24.2 now available, Cockroach Labs plans to continue adding capabilities to enable customers to run AI and machine learning workloads, according to Kimball.

That plan reflects the recognition that many enterprises are only getting started with AI and machine learning and that both the size and complexity of their workloads will increase over time.

"Our goal is to provide businesses with a database that not only meets today's demands but is future proofed for tomorrow's challenges, allowing our customers to stay ahead in a rapidly evolving landscape," he said.

That focus on adding and improving capabilities that enable customers to develop generative AI models and applications is wise, according to Petrie.

With Cockroach Labs only now starting with vector search, it's important that the vendor demonstrate its commitment to enabling advanced application development.

"I'll be interested to see how serious Cockroach is about supporting RAG workflows," Petrie said. "If they are, I would expect more announcements about the benefits of enriching generative AI language model prompts with both vector and relational data."
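Petrie's point about enriching prompts with both vector and relational data can be illustrated with a single query that filters on an ordinary column and then ranks the remaining rows by vector distance. This sketch swaps in SQLite from Python's standard library, with a hypothetical `docs` table, vectors stored as JSON text and a user-defined `l2` distance function; it shows the hybrid pattern, not CockroachDB syntax.

```python
import json
import math
import sqlite3


def l2(a_json: str, b_json: str) -> float:
    """Euclidean distance between two vectors stored as JSON arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
conn.create_function("l2", 2, l2)  # make l2() callable from SQL
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, region TEXT, body TEXT, embedding TEXT)"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [
        (1, "EU", "EU refund policy", json.dumps([1.0, 0.0])),
        (2, "US", "US refund policy", json.dumps([1.0, 0.1])),
        (3, "EU", "EU shipping times", json.dumps([0.0, 1.0])),
    ],
)

# Relational filter (region) plus vector ranking (l2) in one query.
query_vec = json.dumps([1.0, 0.05])
rows = conn.execute(
    "SELECT body FROM docs WHERE region = ? ORDER BY l2(embedding, ?) LIMIT 2",
    ("EU", query_vec),
).fetchall()
context = [body for (body,) in rows]
print(context)  # prints: ['EU refund policy', 'EU shipping times']
```

The US row is the closest vector overall, but the relational filter excludes it; combining both kinds of data keeps the retrieved context relevant to the user's situation.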

Catanzano likewise suggested that Cockroach Labs continue to add support for customers interested in developing generative AI tools. Just as the integration with pgvector is how Cockroach Labs is adding vector search, integrations with other vendors could be a means of quickly developing an ecosystem for AI and machine learning.

"To continue its growth, Cockroach Labs could further integrate more AI-driven data management features such as enhanced support for machine learning workloads and more seamless multi-cloud capabilities," Catanzano said.

Adding new tools for developers and features such as data observability could also benefit Cockroach Labs and help the vendor stand out from its competitors, he continued.

"These steps could help Cockroach Labs solidify its leadership in cloud-native, resilient databases," Catanzano said.

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.