Rockset updates real-time indexing cloud database

Technology based on the open source RocksDB project is helping cloud database vendor Rockset improve scalability for real-time data applications that don't benefit from data lakes.

Rockset on Wednesday said it updated its real-time indexing cloud database with new features that improve scalability by letting storage and compute scale independently.

The cloud database vendor, based in San Mateo, Calif., uses the open source RocksDB key-value store that its founders created while working at Facebook.

Rockset then builds more features on top, including a converged index that helps users search across different types of data, such as time series, structured and unstructured data. Previously, to enable real-time indexing, storage was tightly coupled to the compute layer. With the new update, compute and storage scale independently, which gives users more control over utilization and costs.

Among Rockset's users is mobile health app developer Rumble Wellness, based in Tel Aviv, Israel. Yaron Levi, founder and chief architect at Rumble, said his company uses Rockset to serve online analytical processing (OLAP) queries for the leaderboard and statistics screens in the Rumble mobile app. Rumble also uses Rockset to store click data from its app, which it serves in real time through a web dashboard for some of its customers.

Rumble looked at other vendors to power the leaderboard and statistics screens, including Imply, which is based on the open source Apache Druid project, but found it costly and difficult to learn, Levi said. Imply positions its product as a real-time data platform that makes self-service analytics easier to set up.

For the click data, Levi said Rumble considered Snowflake, but because Rumble was already using Rockset for the real-time OLAP mobile screens, it was easy to use it for the real-time dashboards as well.

The separation of compute and storage in the new Rockset update is particularly useful for Rumble, Levi said.

"At Rumble, we ingest and store a very large amount of steps as data points," Levi said. "If we had to pay a fixed price that encompasses both storage and compute, we would probably pay a very high price for compute that is not being used. Also, our incoming stream of data fluctuates over the day, and Rockset simply scales along while we pay a fixed price per gigabyte ingested."

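Levi's cost argument can be illustrated with a little arithmetic. The sketch below is not Rockset's pricing model; the node sizes and workload figures are hypothetical, chosen only to show how a bundled design forces compute to grow along with storage.

```python
import math

def coupled_nodes(storage_gb, compute_units, gb_per_node, units_per_node):
    """Nodes needed when every node bundles a fixed amount of storage and compute.

    The cluster must satisfy whichever dimension demands more nodes.
    """
    return max(math.ceil(storage_gb / gb_per_node),
               math.ceil(compute_units / units_per_node))

def decoupled_nodes(storage_gb, compute_units, gb_per_node, units_per_node):
    """Storage and compute node counts when each tier is sized on its own."""
    return (math.ceil(storage_gb / gb_per_node),
            math.ceil(compute_units / units_per_node))

# Hypothetical workload: lots of stored data, modest query load.
STORAGE_GB = 10_000
COMPUTE_UNITS = 8
GB_PER_NODE = 500      # assumed storage capacity per node
UNITS_PER_NODE = 4     # assumed compute capacity per node

bundled = coupled_nodes(STORAGE_GB, COMPUTE_UNITS, GB_PER_NODE, UNITS_PER_NODE)
idle_compute = bundled * UNITS_PER_NODE - COMPUTE_UNITS
storage_nodes, compute_nodes = decoupled_nodes(
    STORAGE_GB, COMPUTE_UNITS, GB_PER_NODE, UNITS_PER_NODE)

print(f"bundled: {bundled} nodes, {idle_compute} compute units idle")
print(f"decoupled: {storage_nodes} storage nodes + {compute_nodes} compute nodes")
```

With these made-up numbers, the bundled cluster needs 20 nodes to hold the data and leaves 72 of its 80 compute units idle, while the decoupled layout pairs the same 20 storage nodes with only 2 compute nodes.
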
The Rockset approach is different from a data lake

Separating compute from storage is a common practice for layering data analytics on top of a cloud data lake. Venkat Venkataramani, CEO and co-founder of Rockset, said that Rockset is providing a real-time database capability that differs from the batch analytics commonly associated with data lakes.

In the real-time market, achieving the decoupling of compute and storage is harder than with a data lake model, because the data is constantly coming in and being analyzed immediately. Venkataramani noted that before the new update, compute and storage with Rockset were tightly coupled.

How Rockset decoupled storage for its real-time database

Because the cloud data lake model doesn't deliver the performance a real-time database needs, Rockset took a different tack.

Venkataramani's team built what he referred to as a dynamically scaling hot data storage tier. That storage tier can continuously ingest data into a real-time index and scale independently of the compute needed for processing. The storage is not cloud object storage such as Amazon S3, which data lakes commonly use as cold storage, Venkataramani noted.
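
As a rough illustration of what scaling the two tiers independently means, the toy model below sizes a storage tier from the data ingested and a compute pool from the query load. It is a sketch under stated assumptions, not Rockset's implementation: the class names, capacities and sizing rules are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class HotStorageTier:
    """Hypothetical storage tier sized only by how much data it holds."""
    gb_per_node: int
    nodes: int = 1
    stored_gb: float = 0.0

    def ingest(self, gb: float) -> None:
        # Ingestion grows the stored volume and, if needed, the node count.
        self.stored_gb += gb
        self.nodes = max(1, math.ceil(self.stored_gb / self.gb_per_node))

@dataclass
class ComputePool:
    """Hypothetical compute pool sized only by concurrent query load."""
    queries_per_worker: int
    workers: int = 1

    def resize(self, concurrent_queries: int) -> None:
        self.workers = max(1, math.ceil(concurrent_queries / self.queries_per_worker))

storage = HotStorageTier(gb_per_node=100)
compute = ComputePool(queries_per_worker=50)

storage.ingest(450)   # storage grows to 5 nodes
compute.resize(120)   # compute grows to 3 workers
storage.ingest(300)   # storage grows to 8 nodes; compute is untouched

print(storage.nodes, compute.workers)
```

The point of the sketch is the last step: a burst of ingested data adds storage nodes without changing the number of compute workers, and a change in query load would do the reverse.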

"Doing real-time updates to S3 is not possible and also doing millisecond reads or queries on top of S3 is not possible," Venkataramani said. "What we have built is a hot storage tier, which is relatively cost-wise per gigabyte more expensive than S3, but it is just as scalable."

Rockset is based on RocksDB and runs on AWS

The hot storage system that Rockset uses is built on the open source RocksDB embedded storage engine.

Venkataramani noted that Rockset also uses AWS infrastructure. The RocksDB instances are scaled up and down as needed on Amazon EC2 virtual instances that are configured with optimized SSD-based storage capacity.

Looking forward, Venkataramani said one area the vendor is working on is how to transition data from hot storage to cold storage in the Rockset real-time database.

"You can have massive volumes of data coming in and hot storage is where the real time analytics will happen," he said. "And as the data gets older we want to be able to seamlessly allow our users to manage the historical portions in cold storage, so they can get the best of both worlds."

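Rockset has not described how the hot-to-cold transition will work, so the following is only one plausible shape for it: an age-based policy that keeps recent records in hot storage and moves older ones to cold storage. The function name, record layout and 30-day window are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

def split_by_age(records, now, hot_window):
    """Partition records into (hot, cold) lists by age.

    Records no older than `hot_window` stay hot; everything else is cold.
    """
    hot, cold = [], []
    for rec in records:
        if now - rec["ts"] <= hot_window:
            hot.append(rec)
        else:
            cold.append(rec)
    return hot, cold

# Fixed reference time so the example is deterministic.
now = datetime(2021, 1, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "ts": now - timedelta(hours=1)},
    {"id": 2, "ts": now - timedelta(days=3)},
    {"id": 3, "ts": now - timedelta(days=45)},
]

# Hypothetical policy: anything older than 30 days is moved to cold storage.
hot, cold = split_by_age(records, now, hot_window=timedelta(days=30))
print([r["id"] for r in hot], [r["id"] for r in cold])
```

Here the two recent records remain in the hot tier and the 45-day-old record is assigned to the cold tier.
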