New MongoDB tools enable generative AI development

The independent database vendor added vector search and workload management tools that work together to enable developers to build and run cutting-edge analytics applications.

MongoDB launched two new tools this week designed to help customers develop generative AI applications.

Atlas Vector Search -- first unveiled in public preview in June and updated in September -- and Atlas Search Nodes were made generally available Monday.

Atlas Vector Search is designed to enable MongoDB customers to use their own data in concert with generative AI (GenAI) platforms to aid decision-making. Atlas Search Nodes enable customers to run large generative AI workloads without affecting performance.

Part of the power of each, therefore, is in the way they work together, according to Doug Henschen, an analyst at Constellation Research.

"A customer could use Atlas Search Nodes whether they are applying GenAI to the search experience or not," he said. "But who wouldn't want to apply GenAI to a modern search experience? As I see it, the two announcements really work hand in hand."

Based in New York City, MongoDB is a NoSQL database vendor whose platform provides an alternative to traditional relational databases.

Relational databases were first developed in the 1970s. But as the volume and complexity of the data organizations collect continue to increase, relational databases are often unable to handle modern demands.

As a result, alternatives have been developed, including graph databases such as TigerGraph and Neo4j and document-based databases such as MongoDB and Couchbase.

Atlas is MongoDB's developer platform, where much of the vendor's recent focus has been on enabling developers to build GenAI applications.

New capabilities

After the initial excitement inspired by OpenAI's launch of ChatGPT in November 2022, many organizations realized they needed to develop their own LLMs to truly benefit from generative AI.

LLMs such as ChatGPT and Google Bard that are trained on public data are good for information searches and content generation. They don't have the institutional knowledge organizations can get only from their own private data, however.

They also don't help inform business decisions.

Therefore, some organizations are augmenting public LLMs with their own data, while others are going even further and developing LLMs trained on their own data to inform decision-making.

However, to get accurate outputs from those LLMs -- whether augmenting public LLMs or developing private ones -- the LLMs need to be trained on as much data as possible. Otherwise, the LLMs have a propensity to deliver inaccurate outputs called AI hallucinations.

Vectors help organizations discover as much data as possible to inform private language models.

Vectors give structure to unstructured data, such as text and audio files, by assigning it a numerical representation. That, in turn, allows previously unstructured data to be combined with traditional structured data, thus increasing the pool of data that can be used to train an LLM.

In addition, vectors enable similarity searches so that data engineers can discover as much data as possible to inform their models.

In response to the increased demand for vectors, many data management vendors have unveiled plans to add vector search and storage capabilities. In addition to MongoDB, others include SingleStore and Dremio.

Rather than add vector search capabilities to its existing database, MongoDB instead made Atlas Vector Search a standalone vector database that integrates with MongoDB's operational database. The intent is to provide developers with a single API for their generative AI workloads without requiring data duplication and synchronization, according to MongoDB.
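As a rough illustration of what a single query API can look like, the sketch below builds a query using MongoDB's `$vectorSearch` aggregation stage. It only constructs the pipeline and does not connect to a database. The index name `vector_index`, the fields `embedding` and `title`, and the helper function name are assumptions made for the example, not details from MongoDB's announcement.

```python
def build_vector_search_pipeline(query_vector, limit=3):
    """Build an aggregation pipeline that runs a vector search.

    Hypothetical names: the index "vector_index" and the document
    fields "embedding" and "title" are placeholders for illustration.
    """
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",      # assumed index name
                "path": "embedding",          # assumed field holding the vectors
                "queryVector": query_vector,  # embedding of the user's query
                "numCandidates": limit * 20,  # candidates considered before ranking
                "limit": limit,               # number of results returned
            }
        },
        {
            # Return the title plus the similarity score for each match.
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.12, 0.87, 0.45], limit=3)
```

In an application, a pipeline like this would typically be passed to a driver's `aggregate` method on the collection that stores both the operational documents and their embeddings.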

Atlas Search Nodes, meanwhile, provide an infrastructure for managing generative AI workloads separate from the operational nodes of customers' other database infrastructures. The intended results are better performance at scale and cost optimization.

Together, Atlas Vector Search and Atlas Search Nodes are meant to provide a foundation for developing and running generative AI workloads. That addition of a new foundation for generative AI is significant for MongoDB users, according to Stephen Catanzano, an analyst at TechTarget's Enterprise Strategy Group.

"This opens up a new paradigm of possibilities to look at data in more of a three-dimensional way to create new and more accurate outcomes and recommendations [based on] similarity," he said.

Catanzano noted that following the release of ChatGPT, most database vendors quickly realized they didn't offer ways to identify similar data in a data set to make recommendations.

Vector search, which is not a new capability but was not previously viewed as critical, provides the means to discover similar data and make recommendations for using data to inform generative AI models. Vector searches don't provide exact matches. Instead, they find data with similar characteristics.
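The difference between exact matching and similarity matching can be sketched in a few lines of Python. The example below is a minimal illustration using cosine similarity, one common similarity measure, over made-up three-dimensional vectors. The listing names and values are invented for the example, and production systems use far larger vectors and approximate indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, items, k=2):
    """Rank every item by similarity to the query and keep the top k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented data: each vector stands in for an embedding of a listing.
listings = {
    "house_a": [0.9, 0.1, 0.8],
    "house_b": [0.1, 0.9, 0.2],
    "house_c": [0.8, 0.2, 0.7],
}

# No listing equals the query exactly, but two are close to it.
query = [1.0, 0.0, 0.9]
results = nearest(query, listings, k=2)
print([name for name, _ in results])  # ['house_a', 'house_c']
```

An exact-match lookup for the query vector would return nothing here, while the similarity ranking still surfaces the two closest listings.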

Catanzano likened vector search to searching for a new home. If someone has 10 criteria and is looking in a particular area, a vector search can suggest a home that meets nine of the criteria, is one town away and is priced 20% less.

"It's a real recommendation, which learns from your behaviors," he said.

Henschen, meanwhile, noted that the way Atlas Vector Search and Atlas Search Nodes work together separates MongoDB's new tools from the vector search tools offered by other vendors to date.

He explained that customers choose databases for their overall platform rather than individual features. MongoDB is now one of many vendors that offer vector search. But not all combine vector search with scaling and cost optimization capabilities.

"Customers choose databases for the underlying strengths of the product and product ecosystem, not X feature or Y feature," Henschen said. "If you consider only GenAI, I've seen plenty of vector embedding features added to databases. But the added option of scaling up separate search nodes is a bit of a differentiator for MongoDB."

Catanzano similarly said MongoDB is ahead of some competitors in adding capabilities that support generative AI development.

In particular, the independent vendor has so far moved more quickly than Oracle, which offers numerous database products.

"A big competitor is Oracle, and Mongo is moving much faster on adding features to attract AI workloads," Catanzano said. "However, Oracle has a loyal customer base that will likely wait for them to add these, but not too long."

While Atlas Vector Search and Atlas Search Nodes were only made generally available Monday, development of vector search capabilities began nearly two years ago, according to Sahir Azam, MongoDB's chief product officer.

The vendor previously developed Atlas Search to enable users to create semantic definitions of data and search for data based on definitions and synonyms. Customers, however, wanted to be able to search based not only on text but also on similarities.

"That smarter search experience is what drove us to start developing this capability," Azam said.

Once ChatGPT and other generative AI platforms were released and organizations wanted to use their own data to either augment LLMs trained on public data or develop their own LLMs from scratch, customer requests for vector search and storage capabilities increased, he continued.

"There were really two tracks [that led to Atlas Vector Search]," Azam said. "One was classic semantic search use cases. The other was to enable generative AI applications by leveraging vector databases as a way to tune or augment large-scale language models."

Future plans

As MongoDB continues to develop its platform, adding generative AI capabilities of its own is part of the vendor's product development roadmap, according to Azam.

Since ChatGPT was released, numerous data management and analytics vendors have unveiled plans to release natural language processing (NLP) capabilities that enable conversational interactions with data.

ThoughtSpot was among the first in May when it unveiled Sage to enable natural language search. Many others have followed, though most generative AI-driven NLP tools -- including Sage -- are still in preview.

MongoDB is part of that group, with NLP tools in preview.

"We're focused on making sure our developer community can benefit from the efficiency of AI as they're working with our tools," Azam said. "We have a bunch of announcements planned around natural language … and we've invented copilot capabilities around our development tools."

In addition, Azam noted that MongoDB's plans include developing generative AI tools that help customers modernize their data infrastructures -- an often laborious process that includes substantial manual work.

Henschen, meanwhile, noted that MongoDB is wise to make NLP part of its roadmap.

The vendor has, in recent years, expanded beyond its roots as a database specialist and, in June 2022, added analytics tools to its platform. Analytics users, in particular, can benefit from NLP.

"MongoDB has added a rich set of analytical capabilities to its product, so I'd like to see it add GenAI features such as natural language query, visualization and explanations to those capabilities," Henschen said.

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
