
Soda launches cloud service to improve data observability

Data quality vendor Soda has had a busy 2021, building out new services and raising funding to help organizations identify and remediate data quality problems.

With data spread across multiple sources and locations, getting visibility into what data an organization possesses can often be challenging.

Among the vendors that provide data observability tools is Soda, based in Brussels. Founded in 2018, the startup has spent 2021 expanding its product portfolio.

On April 1, Soda made its Soda Cloud platform generally available. The managed service provides organizations with data quality and data collaboration capabilities. Soda is unrelated to the SODA Foundation, an open source data effort operated by the Linux Foundation.

The launch of the Soda Cloud follows a series of notable events from the company in 2021.

On Feb. 9, the vendor released Soda SQL, an open source tool that enables users to test data sets to ensure that data is properly configured and structured. A week earlier, on Feb. 2, Soda said it had raised €11.5 million (approximately $17.7 million) in a Series A round of funding led by Singular, a Paris-based European venture investor.

Soda enables data observability for Cloud Academy

Among the early users of Soda's technology is Alessandro Lollo, a senior data engineer at technology training platform Cloud Academy, based in San Francisco.

Lollo said Cloud Academy uses Soda SQL, Soda's open source testing and monitoring tool, to apply tests and metrics directly to operational data sources. With Soda Cloud now available, Cloud Academy is working on integrating the platform into its environment to gain insight into the overall quality of its data.
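To make "tests and metrics" concrete, here is a minimal sketch of the general idea in Python, using the standard library's sqlite3 module: metrics are computed with plain SQL against the data source, and tests are predicates evaluated over those metrics. This is not Soda SQL's API or configuration format; the table, column, and function names (orders, customer_id, compute_metrics, run_tests) are hypothetical.

```python
import sqlite3
from typing import Callable, Dict

Metrics = Dict[str, float]


def compute_metrics(conn: sqlite3.Connection, table: str, column: str) -> Metrics:
    """Compute simple data-quality metrics for one column using plain SQL.

    Table and column names are interpolated directly into the query, so in
    this sketch they must come from trusted configuration, not user input.
    """
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    missing_count = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]
    # Avoid dividing by zero on an empty table.
    missing_pct = 100.0 * missing_count / row_count if row_count else 0.0
    return {
        "row_count": row_count,
        "missing_count": missing_count,
        "missing_percentage": missing_pct,
    }


def run_tests(metrics: Metrics, tests: Dict[str, Callable[[Metrics], bool]]) -> Dict[str, bool]:
    """Evaluate each named test (a predicate over the metrics) and report pass/fail."""
    return {name: check(metrics) for name, check in tests.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10), (2, None), (3, 12), (4, 13)],
    )
    metrics = compute_metrics(conn, "orders", "customer_id")
    tests = {
        "row_count > 0": lambda m: m["row_count"] > 0,
        "missing_percentage < 10": lambda m: m["missing_percentage"] < 10,
    }
    print(metrics)
    print(run_tests(metrics, tests))
```

In this example the second test fails because one of four rows (25%) has no customer_id. Computing the metrics in SQL keeps the data where it lives, which is the appeal of testing directly against the operational source rather than copying data out first.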

Cloud Academy runs many microservices that communicate with each other and exchange and transform data, Lollo explained.

"Each microservice is responsible for a specific domain of the Cloud Academy platform or product, so we have many different data sources to mix together when doing analytics," Lollo said. "Combining many different data sources to create meaningful analytics is a challenging task, especially when it comes to assessing the quality of the analytics."

Soda Cloud integrates data monitoring capabilities to help users identify incomplete data sets.

How Soda enables data observability and monitoring

Maarten Masschelein, co-founder and CEO of Soda, said the vendor's platform helps data teams discover data problems and then guides them to efficiently prioritize and resolve them.

Soda aims to be useful from the moment data is ingested into any data platform, Masschelein said. To that end, Soda is designed to be embedded into streaming data pipelines and Spark workloads to enable automated data monitoring.
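One way to picture checks embedded in a streaming workload is a wrapper that runs a set of checks on each micro-batch as it flows past, records any failures, and passes the batch through unchanged. The sketch below assumes batches arrive as Python lists of dicts; monitored and the check names are hypothetical and do not reflect how Soda actually integrates with Spark. In PySpark, a function playing a similar role could be attached to a streaming query through DataStreamWriter.foreachBatch.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Check = Callable[[List[Record]], bool]


def monitored(
    batches: Iterable[List[Record]],
    checks: Dict[str, Check],
    alerts: List[str],
) -> Iterator[List[Record]]:
    """Run every check on each micro-batch, record failures, and yield the batch unchanged.

    Downstream processing keeps running; failed checks only produce alerts.
    """
    for batch_id, batch in enumerate(batches):
        for name, check in checks.items():
            if not check(batch):
                alerts.append(f"batch {batch_id}: check '{name}' failed")
        yield batch


if __name__ == "__main__":
    stream = [
        [{"user_id": 1, "amount": 9.5}, {"user_id": 2, "amount": 3.0}],
        [{"user_id": None, "amount": 4.0}],
    ]
    checks: Dict[str, Check] = {
        "non_empty": lambda b: len(b) > 0,
        "user_id_not_null": lambda b: all(r.get("user_id") is not None for r in b),
    }
    alerts: List[str] = []
    # The "real" workload (here, a running total) consumes the monitored stream.
    total = sum(r["amount"] for batch in monitored(stream, checks, alerts) for r in batch)
    print(total)   # 16.5
    print(alerts)  # ["batch 1: check 'user_id_not_null' failed"]
```

Because the wrapper is a generator, checks run as each batch arrives rather than after the whole stream has been collected, which is the property that makes in-pipeline monitoring useful.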

Soda can monitor and track data updates to help determine whether a given data set is complete. For example, Soda could alert a business intelligence tool user during an analysis if only a certain percentage of the data set's average volume has been processed.
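The volume check in this example can be sketched as a comparison between the number of rows processed so far and the historical average. The 80% threshold, the function names, and the alert text below are assumptions for illustration, not Soda's actual logic.

```python
from statistics import mean
from typing import Optional, Sequence


def volume_completeness(current_rows: int, history: Sequence[int]) -> float:
    """Return the current row count as a percentage of the historical average volume."""
    if not history:
        raise ValueError("need at least one historical volume")
    baseline = mean(history)
    # If nothing has historically arrived, there is nothing to fall short of.
    return 100.0 * current_rows / baseline if baseline else 100.0


def completeness_alert(
    current_rows: int, history: Sequence[int], threshold_pct: float = 80.0
) -> Optional[str]:
    """Return an alert message if volume is below the threshold, else None."""
    pct = volume_completeness(current_rows, history)
    if pct < threshold_pct:
        return f"Data set may be incomplete: only {pct:.0f}% of the average volume has arrived"
    return None


if __name__ == "__main__":
    daily_rows = [10_000, 11_000, 9_000, 10_000]  # average is 10,000 rows
    print(completeness_alert(6_500, daily_rows))  # 65% of average, so an alert message
    print(completeness_alert(9_800, daily_rows))  # 98% of average, so None
```

A fixed percentage of a simple average is the crudest possible baseline; a production check would more likely account for seasonality such as weekday versus weekend volumes.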

"We try to communicate as far as possible into tools that the analysts are using, so that we can bring the analysts closer to the owners and the producers of data," Masschelein said.


The intersection of data observability and data fitness

A core element of Soda's data observability platform is a concept that Masschelein referred to as data fitness.

"Many people consume data, but it's not always clear to the data producer who is consuming the data and for what purposes, so they get what they actually need," Masschelein said. "For us, fitness comes from fit for purpose and that's really about bringing context to the use cases of data."

Enabling data fitness also overlaps with concepts often associated with data governance. Masschelein noted that, from Soda's perspective, data governance means connecting people to data in complex organizational settings.

"For us, it's really about bringing people together so they can do the right thing when it comes to data," he said.
