Zilliz raises $60M for open source Milvus vector database
The leading commercial backer behind the open source Milvus database project is looking to advance the technology to build out a database for AI application workloads.
Vector database vendor Zilliz on Wednesday said it raised $60 million in a series B round of funding to advance the open source Milvus platform.
A vector database works differently from a relational database, which relies on structured data. With a vector database, unstructured data is converted into vectors (arrays of numbers that represent features of the data), so the database can find relevant items by measuring how close their vectors are to one another.
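To make the idea of relevance concrete, here is a minimal Python sketch of the core operation a vector database performs: given a query vector, find the stored vectors that point in the most similar direction. The item names and three-dimensional vectors are invented for illustration. The brute-force scan is also a simplification. Production vector databases such as Milvus generally build approximate nearest-neighbor indexes rather than comparing the query against every stored vector.

```python
import math

# Toy "database": each item is stored as a fixed-length vector of floats.
# These 3-dimensional vectors are made up for illustration; real
# embeddings typically have hundreds of dimensions.
VECTORS = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.8, 0.3, 0.1],
    "car_photo": [0.0, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, vectors, top_k=2):
    """Brute-force nearest-neighbor search: score every stored vector, keep the best top_k names."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    # A query vector that sits near the "cat" region of the space.
    print(search([0.85, 0.15, 0.05], VECTORS))
```

Running the script prints `['cat_photo', 'dog_photo']`: the query points in nearly the same direction as the cat and dog vectors and far from the car vector.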
The open source Milvus vector database has a particular focus on AI applications and is a top-level project at the LF AI & Data Foundation, an organization within the Linux Foundation dedicated to AI and data. The Milvus 2.1 release became generally available on Aug. 3, providing performance gains as well as data processing improvements.
Zilliz and its Milvus database compete against other vector database vendors, including Pinecone, which raised $28 million in March. Zilliz is also building a managed cloud database-as-a-service platform for Milvus, which is currently in private preview.
In this Q&A, Charles Xie, founder and CEO of Zilliz, provides insight into the Milvus database and its vector database technology.
Why raise money now for Zilliz and your vector database technology?
Charles Xie: I think that, compared to traditional data processing technologies, vector database technology is still at a very early stage.
We raised this amount of money because we think it's time for the Milvus vector database and Zilliz to really go global. We will use this money to go out and hire more talented engineers and also build out our go-to-market efforts.
We want to use this new round of funding to double down on talent acquisition and to build out our commercial products.
What was the original vision behind starting the open source Milvus vector database?
Xie: At Zilliz, we are focusing on developing the next generation of database for the age of AI.
Over the past decade, we have seen a lot of innovation with different databases, including time series, graph and real-time analytical databases. All those databases rely on some form of structured data processing.
In the age of AI, unstructured data is on the rise. Examples of unstructured data include images, videos, human voices and even molecular structures, which are used in the life sciences.
The reality is that the real world is made up of a lot of unstructured data. We realized that, from a database perspective, there was a need for a new kind of database to manage, store, index and analyze unstructured data.
We started the Milvus database in 2018, open sourced the technology in 2019, and in 2020 we contributed the project into the Linux Foundation's LF AI & Data Foundation.
Milvus is all about trying to help companies make use of unstructured data.
How does the vector database technology actually ingest the unstructured data?
Xie: In order to bring unstructured data into the vector database, we have to go through a process we call the new ETL [extract, transform and load] for unstructured data. We have to transform all the unstructured data, whether it's an image or video, into a feature vector.
Zilliz has also developed another open source project, called Towhee, that helps enterprises convert unstructured data into feature vectors. With Towhee, we use deep learning algorithms to convert the unstructured data, and from there the resulting vectors land in the Milvus database, where they can be analyzed and searched.
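The extract, transform and load steps Xie describes can be sketched in a few lines of Python. This is a hypothetical illustration, not Towhee's or Milvus's API. The functions `extract`, `transform`, `load` and `run_pipeline` are names invented here. The "image" is a flat list of grayscale pixel values, and a simple intensity histogram stands in for the deep learning model that would produce feature vectors in a real pipeline.

```python
import math

# Hypothetical stand-in for a deep learning embedding model: an "image" is a
# flat list of grayscale pixel values in 0..255, and its "feature vector" is
# a unit-length 4-bin intensity histogram.
NUM_BINS = 4


def extract(raw_items):
    """Extract: yield (id, raw pixel list) pairs from a source (here, a dict)."""
    yield from raw_items.items()


def transform(pixels):
    """Transform: turn raw pixels into a fixed-length, unit-length feature vector."""
    counts = [0] * NUM_BINS
    for p in pixels:
        # Map 0..255 onto bins 0..NUM_BINS-1; min() guards against values above 255.
        counts[min(p * NUM_BINS // 256, NUM_BINS - 1)] += 1
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0  # avoid dividing by zero
    return [c / norm for c in counts]


def load(collection, item_id, vector):
    """Load: append the vector to an in-memory list standing in for a vector database."""
    collection.append({"id": item_id, "vector": vector})


def run_pipeline(raw_items):
    """Run extract -> transform -> load over every raw item and return the collection."""
    collection = []
    for item_id, pixels in extract(raw_items):
        load(collection, item_id, transform(pixels))
    return collection


if __name__ == "__main__":
    images = {
        "dark": [10, 20, 30, 40],
        "bright": [220, 230, 240, 250],
    }
    for row in run_pipeline(images):
        print(row["id"], row["vector"])
```

Running it prints `dark [1.0, 0.0, 0.0, 0.0]` and `bright [0.0, 0.0, 0.0, 1.0]`. Once loaded, such vectors can be compared with a similarity measure like the one used for search. A learned model would capture far richer features than a histogram, but the shape of the pipeline stays the same.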
How does data labeling for unstructured data relate to a vector database for AI?
Xie: The traditional approach to processing unstructured data for AI is with data labeling, where the data is labeled into different categories.
In our view, data labeling will be replaced in the future by feature vector-based processing. A labeling scheme has only a preset number of categories, a limitation that means labels can't express all the semantics of the data.
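The limitation Xie describes can be illustrated with a small, made-up example in Python. Three photos are described in two ways: by a single label from a preset list and by a two-dimensional feature vector. The names, labels and vector values are hypothetical. Cosine similarity is used here as one common way to compare vectors.

```python
import math

# Illustrative, made-up data: the same three photos described two ways.
# Labels: each photo gets exactly one category from a preset list.
LABELS = {"husky": "dog", "wolf_like_dog": "dog", "wolf": "wild animal"}

# Feature vectors: each photo gets a position in a continuous space.
# The two dimensions are invented for illustration.
VECTORS = {
    "husky": [0.9, 0.4],
    "wolf_like_dog": [0.6, 0.8],
    "wolf": [0.2, 0.95],
}


def label_match(a, b):
    """Labels only give a yes/no answer: 1.0 if same category, else 0.0."""
    return 1.0 if LABELS[a] == LABELS[b] else 0.0


def vector_similarity(a, b):
    """Vectors give a graded score: the cosine similarity of the two feature vectors."""
    va, vb = VECTORS[a], VECTORS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    for other in ("husky", "wolf"):
        print(other, label_match("wolf_like_dog", other),
              round(vector_similarity("wolf_like_dog", other), 2))
```

Here the labels say the wolf-like dog has nothing in common with the wolf. The vectors, however, score it as closer to the wolf (about 0.91) than to the husky (about 0.87), a nuance that a fixed set of categories cannot capture.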
With the vector database, we can help people understand unstructured data at a very detailed level. Data labeling was developed decades ago, and it works for many scenarios. But now, with the help of modern vector database technology, we can simply process and understand unstructured data better.
Editor's note: This Q&A has been edited for clarity and conciseness.