Starburst Galaxy update targets governance, data access
The vendor's latest update includes the public preview of Gravity, a centralized access and governance layer that enables users to better control and connect data across clouds.
Starburst on Tuesday launched the latest version of Galaxy, highlighted by new governance capabilities and improved access to data stored across multiple clouds.
Starburst is a data lake vendor whose platforms -- Galaxy for its cloud-based users and Enterprise for on-premises customers -- are often used by organizations to develop a data mesh architecture for data management.
Data mesh is a decentralized approach that enables different domains within organizations to manage their own data. Its intent is to reduce burdens on centralized data teams that lead to bottlenecks while also taking advantage of the domain expertise of users.
In April, Starburst released an integration with dbt Labs to better enable joint users to transform their data. Shortly before that, the vendor updated Galaxy to improve data discoverability.
New capabilities
Unlike some data management platforms that require different versions to work with data stored on different clouds -- for example, an AWS version to work with data stored on AWS -- Galaxy aims to provide Starburst customers with a single environment where they can work with data stored across all clouds.
Toward that end, the vendor unveiled Gravity in preview.
Gravity is a centralized access and governance layer that enables organizations to put user controls and other guidelines in place for all their connected data sources.
Included in Gravity are automated data catalogs to connect and organize data from different sources; attribute-based access control to limit use of data based on users' roles; a Great Lakes Connector that provides one source for data ingestion; and collaboration capabilities.
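To illustrate the general idea behind attribute-based access control, here is a minimal, hypothetical Python sketch. It is not Gravity's implementation or API. It assumes a policy is simply a rule that compares attributes of the user (a role and a region) with attributes attached to a data set (a sensitivity tag and a region), and grants access only when the rule matches. All names (`User`, `Dataset`, `can_read`, the tag values) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str
    region: str


@dataclass(frozen=True)
class Dataset:
    name: str
    sensitivity: str  # hypothetical tag, e.g. "public" or "pii"
    region: str


def can_read(user: User, dataset: Dataset) -> bool:
    """Hypothetical ABAC rule combining user and data attributes into one decision."""
    # Public data is readable by anyone.
    if dataset.sensitivity == "public":
        return True
    # Sensitive data requires a privileged role AND a region that matches the data's region.
    return user.role in {"data_steward", "compliance"} and user.region == dataset.region


analyst = User("ana", role="analyst", region="eu")
steward = User("sam", role="data_steward", region="eu")
orders = Dataset("orders", sensitivity="public", region="us")
customers = Dataset("customers", sensitivity="pii", region="eu")

print(can_read(analyst, orders))     # True
print(can_read(analyst, customers))  # False
print(can_read(steward, customers))  # True
```

Unlike a purely role-based check, the decision in this sketch also depends on attributes of the data itself, such as its sensitivity tag and where it resides.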
In addition, Starburst made cross-cloud querying generally available, enabling access not only to data in a customer's data lake but also to data outside it.
Often, in order to combine data stored in different clouds, users must first import all their data into a single data warehouse or data lake before they can combine and explore that data. Cross-cloud querying, however, enables Starburst users to explore data across clouds and data sources before moving that data, thus limiting the amount of data they have to move into their data lake or warehouse and saving on some of the costs associated with moving data in the cloud.
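The trade-off can be shown with a toy Python example under simplifying assumptions: two in-memory lists stand in for tables stored in different clouds, and "data moved" is counted in rows rather than bytes. This is not how Starburst or Trino actually execute queries. It only contrasts copying every row into one place before querying with aggregating where each table lives and moving just the small partial results. The table contents and function names are hypothetical.

```python
from collections import Counter

# Hypothetical tables living in two different clouds.
aws_orders = [
    {"region": "us", "amount": 120},
    {"region": "eu", "amount": 80},
    {"region": "us", "amount": 40},
    {"region": "us", "amount": 10},
]
gcp_orders = [
    {"region": "eu", "amount": 200},
    {"region": "apac", "amount": 60},
    {"region": "eu", "amount": 30},
]


def copy_then_query(*sources):
    """Move every row into a single 'warehouse' first, then aggregate there."""
    warehouse = [row for src in sources for row in src]
    rows_moved = len(warehouse)
    totals = Counter()
    for row in warehouse:
        totals[row["region"]] += row["amount"]
    return dict(totals), rows_moved


def query_in_place(*sources):
    """Aggregate inside each source, then move only the per-group partial results."""
    totals = Counter()
    rows_moved = 0
    for src in sources:
        partial = Counter()
        for row in src:  # this work happens where the data lives
            partial[row["region"]] += row["amount"]
        rows_moved += len(partial)  # one row per group crosses clouds
        totals.update(partial)      # Counter.update adds counts together
    return dict(totals), rows_moved


full, moved_full = copy_then_query(aws_orders, gcp_orders)
pushed, moved_pushed = query_in_place(aws_orders, gcp_orders)
print(full == pushed)            # True
print(moved_full, moved_pushed)  # 7 4
```

Both approaches return the same totals, but the second moves fewer rows; with real tables holding millions of rows and only a handful of groups, the gap would be far larger.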
Both Gravity and cross-cloud querying not only advance the capabilities of Galaxy but also directly target customer needs, according to Matt Aslett, an analyst at Ventana Research.
He noted that organizations increasingly store data across multiple clouds as well as keep some on premises, and Starburst is attempting to enable those organizations to better unify and govern their distributed data.
In fact, Aslett predicted that within two years more than three-quarters of all enterprises will have hybrid and multi-cloud data ecosystems and noted that Starburst has been steadily improving its ability to support hybrid and multi-cloud environments for several years.
"As an increasing number of organizations … are distributed across hybrid and multi-cloud architecture, advanced management and governance functionality is required to provide efficient and secure access to data without the need to move it to a single location," Aslett said.
The new Galaxy features further represent Starburst's emphasis on supporting hybrid and multi-cloud environments by better safeguarding data and enabling business continuity with easy -- but governed -- access to data wherever it's stored, he continued.
"The public preview of Gravity, combined with cross-cloud querying in Starburst Galaxy, are representative of the growing maturity of the Galaxy managed service," Aslett said.
Doug Henschen, an analyst at Constellation Research, also noted the potential significance of Gravity for Starburst's users.
However, Galaxy's new capabilities are not unique among data platform vendors, he noted.
"If you're going to choose a unifying platform for data, you want to be sure that it offers comprehensive access for all types of users, robust access and security controls, and extensive governance capabilities to help meet compliance requirements," Henschen said. "All the leading data platform companies have been expanding on metadata management, cataloging and governance capabilities in recent years."
Beyond Gravity and cross-cloud querying, Starburst's Galaxy update includes the following:
- The ability to work with any data architecture so that users can connect their pre-existing data storage repositories to Galaxy.
- Improved scalability with features such as cluster sizing and auto scaling.
- Warp Speed, a tool that aims to enable fast ad hoc business intelligence queries.
Taken together, Galaxy's new capabilities fit with Starburst's goal of making it easier to work with distributed data, according to Henschen.
"Starburst is sticking to its core mission of easing distributed data access, bolstering cross-cloud and cross-source query capabilities and, where performance is critical, continually pushing for better performance -- in this case through the addition of the Warp Speed feature," he said.
Meanwhile, with the new capabilities, Galaxy is maturing into a full-featured data management platform, according to Matt Fuller, Starburst's co-founder and vice president of product.
Starburst, which evolved out of the open source Trino project that built a query engine for large data sets, launched Enterprise in 2019 and over time developed that into a full-featured platform. Galaxy, however, wasn't released until February 2021. Since then, the vendor has added features to make its functionality equal to that of Enterprise.
"Today Galaxy is a full-fledged platform, whereas when it first launched it just ran Trino clusters," Fuller said.
To make Galaxy more full-featured, Starburst based its product development plans, in part, on feedback from its Enterprise users, he continued.
Fuller noted that Starburst has had a strategy for Galaxy since the SaaS platform was first under development and generally plots its own roadmap, but learning from its Enterprise customers also played a role in building out Galaxy.
"We try to understand the patterns we're seeing from customers," he said. "They may ask for a feature, but then we spend time trying to get to the root of what they're trying to accomplish by asking for that feature. That reveals a lot."
On the roadmap
Perhaps Starburst's greatest strengths are the speed with which it queries data lakes and its ability to augment data stored in data lakes with other data without requiring users to move that added data, according to Fuller.
Where the 2017 startup still has room for improvement is its incorporation of AI, Fuller continued.
Data management vendors including Informatica, Alteryx and Dremio are among those that have recently unveiled plans to incorporate generative AI and large language model technology. Starburst, meanwhile, is part of a group of vendors, including Collibra and Confluent, that are waiting to make specific plans to add generative AI.
"We've traditionally focused on data lake analytics, and I think there's an opportunity for us to partner or integrate with AI technologies," he said. "We're not a company that specializes in AI, so the opportunity would be to partner with best-in-class vendors."
Investing in AI capabilities, meanwhile, would be wise for Starburst, according to Henschen.
He noted that as competitors develop generative AI tools, Starburst's customers will want the vendor to do so as well, adding that Dremio introduced some "compelling" generative AI features last week.
"So as a customer I would want and expect Starburst to follow suit in the not too distant future," Henschen said.
Beyond potentially partnering with AI vendors -- and building out its partner ecosystem in other ways -- Starburst's product development plans include making its data lake architecture easier to use, according to Fuller.
"We made modern data lakes easier to use than people who implemented data lakes five years or so ago with Hadoop -- it's a lot easier now," he said. "But we intend to make it even simpler."
Aslett noted that while Starburst provides its own data loading and transformation capabilities through Trino, it could add more data pipeline development and data observability tools.
"The company addresses these through integration with open source tooling and partners today, but there could be the potential to bring some of this functionality into the platform to support more complex data engineering requirements," he said.
Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.