jirsak - stock.adobe.com

Databricks bolsters security for data analytics tool

Databricks looks to bridge the gap between on-premises controls and the cloud with data analytics security policies in the company's latest platform update.

Some of the biggest challenges with data management and analytics efforts is security.

Databricks, based in San Francisco, is well aware of the data security challenge, and recently updated its Databricks' Unified Analytics Platform with enhanced security controls to help organizations minimize their data analytics attack surface and reduce risks. Alongside the security enhancements, new administration and automation capabilities make the platform easier to deploy and use, according to the company.

Organizations are embracing cloud-based analytics for the promise of elastic scalability, supporting more end users, and improving data availability, said Mike Leone, a senior analyst at Enterprise Strategy Group. That said, greater scale, more end users and different cloud environments create myriad challenges, with security being one of them, Leone said.

"Our research shows that security is the top disadvantage or drawback to cloud-based analytics today. This is cited by 40% of organizations," Leone said. "It's not only smart of Databricks to focus on security, but it's warranted."

He added that Databricks is extending foundational security in each environment with consistency across environments and the vendor is making it easy to proactively simplify administration.

As organizations turn to the cloud to enable more end users to access more data, they're finding that security is fundamentally different across cloud providers.
Mike LeoneSenior analyst, Enterprise Strategy Group

"As organizations turn to the cloud to enable more end users to access more data, they're finding that security is fundamentally different across cloud providers," Leone said. "That means it's more important than ever to ensure security consistency, maintain compliance and provide transparency and control across environments."

Additionally, Leone said that with its new update, Databricks provides intelligent automation to enable faster ramp-up times and improve productivity across the machine learning lifecycle for all involved personas, including IT, developers, data engineers and data scientists.

Gartner said in its February 2020 Magic Quadrant for Data Science and Machine Learning Platforms that Databricks Unified Analytics Platform has had a relatively low barrier to entry for users with coding backgrounds, but cautioned that "adoption is harder for business analysts and emerging citizen data scientists."

Bringing Active Directory policies to cloud data management

Data access security is handled differently on-premises compared with how it needs to be handled at scale in the cloud, according to David Meyer, senior vice president of product management at Databricks.

Meyer said the new updates to Databricks enable organizations to more efficiently use their on-premises access control systems, like Microsoft Active Directory, with Databricks in the cloud. A member of an Active Directory group becomes a member of the same policy group with the Databricks platform. Databricks then maps the right policies into the cloud provider as a native cloud identity.

Databricks uses the open source Apache Spark project as a foundational component and provides more capabilities, said Vinay Wagh, director of product at Databricks.

"The idea is, you, as the user, get into our platform, we know who you are, what you can do and what data you're allowed to touch," Wagh said. "Then we combine that with our orchestration around how Spark should scale, based on the code you've written, and put that into a simple construct."

Protecting personally identifiable information

Beyond just securing access to data, there is also a need for many organizations to comply with privacy and regulatory compliance policies to protect personally identifiable information (PII).

"In a lot of cases, what we see is customers ingesting terabytes and petabytes of data into the data lake," Wagh said. "As part of that ingestion, they remove all of the PII data that they can, which is not necessary for analyzing, by either anonymizing or tokenizing data before it lands in the data lake."

In some cases, though, there is still PII that can get into a data lake. For those cases, Databricks enables administrators to perform queries to selectively identify potential PII data records.

Improving automation and data management at scale

Another key set of enhancements in the Databricks platform update are for automation and data management.

Meyer explained that historically, each of Databricks' customers had basically one workspace in which they put all their users. That model doesn't really let organizations isolate different users, however, and has different settings and environments for various groups.

To that end, Databricks now enables customers to have multiple workspaces to better manage and provide capabilities to different groups within the same organization. Going a step further, Databricks now also provides automation for the configuration and management of workspaces.

Delta Lake momentum grows

Looking forward, the most active area within Databricks is with the company's Delta Lake and data lake efforts.

Delta Lake is an open source project started by Databrick and now hosted at the Linux Foundation. The core goal of the project is to enable an open standard around data lake connectivity.

"Almost every big data platform now has a connector to Delta Lake, and just like Spark is a standard, we're seeing Delta Lake become a standard and we're putting a lot of energy into making that happen," Meyer said.

Other data analytics platforms ranked similarly by Gartner include Alteryx, SAS, Tibco Software, Dataiku and IBM. Databricks' security features appear to be a differentiator.

Dig Deeper on Data management strategies