Aerospike adds new vector search capabilities to database

With vector search a critical component of AI development, the vendor's latest vector search and storage capabilities target simplifying data discovery for training AI tools.

Database specialist Aerospike on Wednesday launched the latest version of its vector search suite, including a flexible storage feature to reduce complexity and an indexing tool aimed at improving query speed and accuracy.

Vectors are numerical representations of data assigned by algorithms and are a popular means of giving structure to unstructured data such as text and images. By giving structure to such data, the data becomes searchable and can be used to inform analytics and AI applications.
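
To make the idea concrete, the following is a minimal sketch of a vector search written in plain Python. It is not Aerospike's implementation. The three-dimensional vectors are hypothetical values assigned by hand for illustration, whereas a real system would obtain them from an embedding model, and the names `DOCUMENTS`, `cosine_similarity` and `search` are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-assigned vectors standing in for the output of an
# embedding model. Each key is a piece of unstructured data.
DOCUMENTS = {
    "quarterly revenue report": [0.9, 0.1, 0.0],
    "photo of a warehouse": [0.1, 0.8, 0.3],
    "customer support transcript": [0.2, 0.2, 0.9],
}


def search(query_vector, documents, top_k=2):
    """Rank every document by similarity to the query (brute force)."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# A query vector that points in roughly the same direction as the first entry.
results = search([0.8, 0.2, 0.1], DOCUMENTS)
print(results)
```

Comparing the query against every stored vector, as this sketch does, becomes slow as data volumes grow, which is why production systems rely on indexes rather than exhaustive comparison.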

Aerospike raised $114 million in funding in April. The database vendor planned to use this funding to continue its development of vector search capabilities -- first introduced at that time -- along with graph technology, another frequently used means to discover the data needed to train models and applications.

The new capabilities not only add functionality beyond the initial introduction of Aerospike Vector Search but also help Aerospike stand apart from other database vendors that have provided vector search and storage, according to Matt Aslett, an analyst at ISG's Ventana Research.

"Many data platform software providers have added support for vector search in the past 18 months," he said. "Those that were among the earliest to do so are now refining their approaches. The latest version of Aerospike Vector Search enhances existing functionality … with indexing and storage improvements designed to enhance performance and provide additional differentiation."

Aerospike's main platforms -- Aerospike Database and Aerospike Cloud -- are aimed at enabling real-time data analysis. Competitors include other database specialists such as MongoDB and Couchbase.

New vector search capabilities

OpenAI's November 2022 launch of ChatGPT, which marked a significant improvement in generative AI technology, sparked a surge of enterprise interest in AI development.

Generative AI models, when combined with an enterprise's proprietary data, can be used to develop applications that enable employees to interact with data using natural language and automate repetitive tasks.

However, for those applications to be of worth, they need lots of relevant, high-quality data.

The more relevant, high-quality data there is to train an AI application, the more likely it is to deliver accurate, trustworthy outputs that can be used to inform business decisions.

Given the need for volume, unstructured data is more important than in the past. Structured data, such as financial records and point-of-sale transactions, makes up less than 20% of all data. To get enough data to properly train AI tools and provide a more comprehensive view of an organization's operations, unstructured data is required.

As a result, vector search has taken on a critical role over the past two years, enabling enterprises to access their unstructured data as they develop AI tools. Providing that access, meanwhile, was the impetus for Aerospike first developing vector search capabilities, according to Naren Narendran, the vendor's chief engineering officer.

"Graph and vector are particularly important for our AI strategy," he said. "They are foundational for the future of AI applications and is the reason we got into those."

However, how vectors are indexed is critical to their effectiveness. If not done well, relevant vectorized data will be difficult to discover.

Aerospike's new vector search capabilities include a hierarchical navigable small world (HNSW) index, a graph-based structure widely used for approximate nearest-neighbor search.
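
The core idea behind a navigable graph index can be sketched in a few lines of Python. The example below is a deliberate simplification and an assumption on our part, not Aerospike's code: it builds a single-layer graph in which each vector links to its nearest neighbors, then walks the graph greedily toward the query. A full HNSW index adds multiple layers, bidirectional links and candidate lists, none of which are shown here. The function names are hypothetical.

```python
def distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_graph(vectors, k=2):
    """Link every vector to its k nearest neighbors (single layer only)."""
    graph = {}
    for i, vector in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: distance(vector, vectors[j]),
        )
        graph[i] = others[:k]
    return graph


def greedy_search(vectors, graph, query, entry=0):
    """Hop to whichever neighbor is closer to the query until none is."""
    current = entry
    while True:
        best = min(graph[current], key=lambda j: distance(query, vectors[j]))
        if distance(query, vectors[best]) >= distance(query, vectors[current]):
            return current  # no neighbor improves on the current node
        current = best


# Ten one-dimensional vectors: [0.0], [1.0], ..., [9.0].
vectors = [[float(i)] for i in range(10)]
graph = build_graph(vectors)
found = greedy_search(vectors, graph, [7.2])
print(found)
```

The search visits only a handful of nodes instead of comparing the query with every vector. The trade-off is that a greedy walk can stop at a nearby but not optimal result, which is why such indexes are described as approximate.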

The indexing approach enables data to be ingested into the database and indexed at the same time so it can be searched across devices. In addition, although ingestion and indexing can run while users query the data in real time, the workloads are kept separate to optimize performance.

Performance speed, meanwhile, is important for Aerospike given not only its historical focus on real-time analysis but also the need to keep AI tools updated with the most current data possible, according to Aslett.

"The ability to scale vector ingestion and indexing independently is in keeping with Aerospike's focus on real-time application requirements," he said. "In addition, it supports the increasing need for high-performance GenAI and AI inference to facilitate intelligent operational applications that deliver contextually relevant recommendations, predictions and forecasting."

Stephen Catanzano, an analyst at Informa TechTarget's Enterprise Strategy Group, similarly noted the importance of HNSW. He pointed out that enabling data to be ingested in real time while the system asynchronously builds an index fuels real-time, AI-powered decisions.
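
The pattern Catanzano describes, accepting writes immediately while an index is built in the background, can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not Aerospike's architecture or API: the class `AsyncIndexedStore` and its methods are hypothetical, and the "index" is a flat dictionary rather than an HNSW graph.

```python
import queue
import threading


class AsyncIndexedStore:
    """Toy store: writes return at once, indexing happens on a worker thread."""

    def __init__(self):
        self.records = {}            # durable copy of every ingested vector
        self.index = {}              # only indexed vectors are searchable
        self._pending = queue.Queue()
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._build_index, daemon=True)
        worker.start()

    def ingest(self, key, vector):
        """Store the record and queue it for indexing without waiting."""
        with self._lock:
            self.records[key] = vector
        self._pending.put(key)

    def _build_index(self):
        """Background loop that moves queued records into the index."""
        while True:
            key = self._pending.get()
            with self._lock:
                self.index[key] = self.records[key]
            self._pending.task_done()

    def wait_until_indexed(self):
        """Block until every queued record has been indexed."""
        self._pending.join()

    def nearest(self, query):
        """Return the key of the closest indexed vector, or None if empty."""
        with self._lock:
            candidates = list(self.index.items())
        if not candidates:
            return None
        return min(
            candidates,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(query, item[1])),
        )[0]


store = AsyncIndexedStore()
store.ingest("a", [0.0, 0.0])
store.ingest("b", [1.0, 1.0])
store.wait_until_indexed()
closest = store.nearest([0.9, 0.8])
print(closest)
```

Because ingestion only appends to a queue, a burst of writes does not block queries. The cost is that a newly written record is not searchable until the background worker has processed it.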

In addition, Catanzano highlighted the importance of new storage options for vectorized data, such as in-memory storage for small indexes or hybrid memory for large indexes, that had previously been available only in Aerospike's core database.

"The most significant features in this update are the durable self-healing indexes and flexible storage configurations," he said. "Together, these features enable better scalability, reduced operational overhead and lower infrastructure costs for enterprise AI systems."

Beyond new indexing and storage capabilities, Aerospike's Vector Search update includes the following features:

  • Improvement of the vendor's multi-model database engine to better enable document, key-value, graph database and vector search capabilities in a single system.
  • Prebuilt Python clients and sample applications for common uses of vectorized data to speed development and deployment.
  • Integrations with AI development platforms LangChain and Amazon Bedrock to better enable users to build an ecosystem for creating AI applications, including generative AI.

Combined, the new features comprise a compelling update, according to Catanzano.

"These features address key industry challenges like uninterrupted performance, scalability and cost reduction," he said. "This release [is] a noteworthy advancement rather than just an incremental improvement. "While rising interest in AI development led Aerospike to add vector search and graph technology to its database platform, the motivation for developing new vector search capabilities came from customer feedback and market observations, according to Narendran.

In particular, multi-model database capabilities in the same system address customer needs.

"[With multi-model capabilities, customers don't have to get a vector database, pull their data out and move it into the vector database," Narendran said.

Regarding market trends, the rising interest in developing AI, both generative AI and traditional AI, is a driving force, he continued.

Graphic listing the differences between traditional and vector search.

Next steps

As Aerospike plots its product development roadmap, improving the performance and scale of its database platform is part of its plan, according to Narendran.

In addition, the vendor is looking into developing tools beyond vector search that play roles in the AI development process and providing services that make it easier for customers to use vector search.

"Vector search is just one component," Narendran said. "There are other components that come before that or after that, and there are services that can be built on top of vector search for those that are less proficient in developing AI pipelines."

Catanzano suggested integrations with AI and machine learning development frameworks beyond LangChain and Amazon Bedrock as one way Aerospike could expand its ecosystem beyond its core database capabilities. In addition, improving retrieval-augmented generation (RAG) capabilities that discover vectorized data and feed AI pipelines would be beneficial.

"There's also an opportunity to strengthen RAG capabilities with more pre-built templates, connectors, and workflow tools for companies rapidly adopting this architecture," Catanzano said.

Aslett, meanwhile, said that Aerospike is wise to continue improving its vector search and graph technology capabilities as interest in AI development increases and the AI-powered applications enterprises build become more advanced.

"As Aerospike [attempts to] further differentiate itself, we anticipate increased focus on the combination of its graph and vector capabilities to serve the development and deployment of high-performance AI-infused operational applications," he said.

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
