Immuta adds automated data governance for Databricks

Immuta has extended its data governance with new features that help enterprises with data privacy compliance for Databricks Unified Data Analytics Platform workloads.

Data analytics workloads that run on the Databricks Unified Data Analytics Platform can now benefit from automated data governance capabilities from Immuta.

Databricks is one of the lead contributors to the Apache Spark open source data query engine, and Spark is the foundation of its data analytics platform. Immuta's automated data governance software provides security controls that help organizations manage personally identifiable information.

While Databricks provides a set of data access controls, some users need additional capabilities. One early user of Immuta for Databricks is Cognoa, a pediatric health provider in Palo Alto, Calif.

Cognoa's data science team uses Databricks as a distributed computing platform for all of its computationally intensive machine learning tasks, Chief AI Officer Halim Abbas said.

"As a digital behavioral health company, data privacy and security are at the core of what we do," Abbas said. "[But] our legacy practices were extremely time- and labor-intensive."

Cognoa needed to provide its data scientists with data to build models, while removing sensitive information. However, this involved numerous steps, including complex data engineering, manual policy enforcement and labor-intensive reporting.

The convoluted approach to data security that Cognoa previously employed created friction between compliance officers and machine learning engineers. The former needed to ensure the company protected end users' privacy in accordance with healthcare data laws, while the latter wanted data faster, Abbas added.

"We needed to expedite our data processing, while also finding a way to dynamically anonymize sensitive information for reporting," he said. "We therefore required a solution that could help us enforce data access roles, permissions and policies beyond the standard resource- or table-based control levels."

Immuta competes with fellow data governance startups such as Okera, as well as offerings from Informatica and other large vendors. With Immuta's platform, Cognoa is now able to apply the appropriate restrictions to data and enforce data access and policy restrictions in real time, based on the needs of its data scientists, according to Abbas.

Governance critical for data analytics

Organizations that run data management and analytics workloads with Apache Spark and Databricks face common challenges, such as managing fine-grained access controls at scale, said Steven Touw, co-founder and CTO of Immuta.

They also need detailed audit logs for all data-level access that show who accessed what data, when and for what purpose, to comply with data protection laws such as CCPA, GDPR and HIPAA.
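
To make the "who, what, when and for what purpose" requirement concrete, here is a minimal Python sketch of one audit record. It is an illustration only and does not reflect Immuta's or Databricks' actual log format; the `AuditRecord` class, the `record_access` helper and every field name are hypothetical.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class AuditRecord:
    """One data-level access event (hypothetical schema, not a vendor format)."""
    user: str          # who accessed the data
    dataset: str       # what table or data set was read
    columns: tuple     # which columns were touched
    purpose: str       # the declared reason for access
    accessed_at: str   # when, as an ISO 8601 UTC timestamp


def record_access(user, dataset, columns, purpose, now=None):
    """Build an audit record; `now` can be injected to make the output reproducible."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return AuditRecord(user, dataset, tuple(columns), purpose, timestamp)


entry = record_access(
    "analyst_1",
    "patients",
    ["age", "diagnosis"],
    "model training",
    now=datetime(2020, 1, 1, tzinfo=timezone.utc),
)

# Serialize to one JSON line, a common shape for append-only audit logs.
audit_line = json.dumps(asdict(entry), sort_keys=True)
print(audit_line)
```

Capturing the purpose alongside the user and timestamp is what lets a compliance officer answer not only who read a table, but why.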

Immuta's platform can enforce policies either through a proxy or directly in the database engine, but Immuta for Databricks takes the latter approach. As such, users can clearly see what controls are being applied to a given data set inside of Databricks.
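
As a rough illustration of what a policy applied at read time might do, the Python sketch below masks columns according to the reader's role. It is not Immuta's implementation; the `POLICY` table, the rule names, the `compliance` role and both functions are assumptions made for the example.

```python
import hashlib

# Hypothetical per-column rules: "allow" passes the value through,
# "hash" replaces it with a truncated SHA-256 digest, "null" removes it.
POLICY = {
    "name": "hash",
    "email": "hash",
    "diagnosis": "allow",
    "ssn": "null",
}


def mask_value(value, rule):
    """Apply a single masking rule to one value."""
    if rule == "allow":
        return value
    if rule == "hash":
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]
    return None  # "null" and any unknown rule both hide the value


def apply_policy(row, user_roles, policy=POLICY, exempt_role="compliance"):
    """Return the view of `row` that a user with `user_roles` may see."""
    if exempt_role in user_roles:
        return dict(row)  # exempt users see the unmasked data
    # Columns missing from the policy are hidden by default.
    return {col: mask_value(val, policy.get(col, "null")) for col, val in row.items()}


row = {"name": "Jane Doe", "email": "jane@example.com", "diagnosis": "ASD", "ssn": "123-45-6789"}
scientist_view = apply_policy(row, {"data_scientist"})
compliance_view = apply_policy(row, {"compliance"})
```

In this sketch the underlying row never changes; only the view returned to each reader differs, which is the general idea behind dynamic, rather than copy-based, anonymization.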

The platform can also identify and catalog sensitive information in Databricks tables and provides a simplified policy builder for data platform engineers to help them create policies that are understandable by nontechnical users.
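
One simple way to picture sensitive-data discovery is pattern matching over sampled column values. The Python sketch below is a simplified stand-in under that assumption, not a description of how Immuta detects sensitive fields; the `PATTERNS` table, the 0.8 threshold and the function names are hypothetical.

```python
import re

# Hypothetical detectors: each tag maps to a pattern a whole value must match.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(values, threshold=0.8):
    """Return a sensitivity tag if enough non-null values match a pattern, else None."""
    non_null = [str(v) for v in values if v is not None]
    if not non_null:
        return None
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in non_null if pattern.match(v))
        if hits / len(non_null) >= threshold:
            return tag
    return None


def build_catalog(table):
    """Map each column name to its detected tag, skipping untagged columns."""
    catalog = {}
    for column, values in table.items():
        tag = classify_column(values)
        if tag is not None:
            catalog[column] = tag
    return catalog


table = {
    "contact": ["a@b.com", "c@d.org"],
    "id_number": ["123-45-6789", None],
    "age": [4, 5],
}
catalog = build_catalog(table)
```

A catalog like this could then feed a policy builder, so that rules are written against tags such as `email` instead of individual column names.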

In addition, Immuta for Databricks includes the ability to create secure data collaboration zones, where users with different permissions can read and write data sets without risk of a data leak within a Databricks cluster.
