
SAS acquires synthetic data generator to aid AI development

The longtime analytics vendor plans to fold the intellectual property of Hazy into its existing synthetic data generation capabilities to help customers safely build AI tools.

SAS on Tuesday closed on the acquisition of the primary software capabilities of Hazy, a synthetic data generation vendor whose tools can aid in developing AI models and applications.

Financial terms of the transaction were not disclosed.

Based in London, Hazy is a 2017 startup whose synthetic data generation platform enables users to artificially manufacture data that mimics real data.

One of the main applications of synthetic data is to train AI models and applications without accidentally exposing sensitive information contained in an organization's real data. It is particularly useful in industries such as health care and financial services in which sensitive data is common.

Other uses of synthetic data include adding volume when data is too sparse or incomplete to properly train an analytics or AI tool, helping to reduce bias in data sets and informing scenario testing.
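To make the idea concrete, the following is a minimal Python sketch of one very simple way to manufacture rows that resemble real ones: learn basic per-column statistics from a real table, then sample new rows from those statistics. It does not represent Hazy's or SAS' method, which the article does not describe. The function names, column names and sample records are all hypothetical, and the approach is deliberately naive: it samples each column independently, so it ignores relationships between columns and provides no formal privacy guarantee.

```python
import random
import statistics
from collections import Counter


def fit_column_models(rows, numeric_cols, categorical_cols):
    """Learn simple per-column statistics from real rows (a list of dicts)."""
    model = {"numeric": {}, "categorical": {}}
    for col in numeric_cols:
        values = [row[col] for row in rows]
        # Summarize each numeric column by its mean and sample standard deviation.
        model["numeric"][col] = (statistics.mean(values), statistics.stdev(values))
    for col in categorical_cols:
        counts = Counter(row[col] for row in rows)
        # Summarize each categorical column by its observed label frequencies.
        model["categorical"][col] = (list(counts.keys()), list(counts.values()))
    return model


def sample_synthetic(model, n, seed=None):
    """Draw n synthetic rows from the fitted per-column statistics."""
    rng = random.Random(seed)
    synthetic_rows = []
    for _ in range(n):
        row = {}
        for col, (mean, stdev) in model["numeric"].items():
            row[col] = rng.gauss(mean, stdev)
        for col, (labels, weights) in model["categorical"].items():
            row[col] = rng.choices(labels, weights=weights, k=1)[0]
        synthetic_rows.append(row)
    return synthetic_rows


# Hypothetical "real" records, invented purely for illustration.
real_rows = [
    {"age": 34, "balance": 1200.0, "segment": "retail"},
    {"age": 51, "balance": 8300.5, "segment": "business"},
    {"age": 27, "balance": 450.0, "segment": "retail"},
    {"age": 45, "balance": 3100.0, "segment": "retail"},
]

model = fit_column_models(real_rows, ["age", "balance"], ["segment"])
synthetic = sample_synthetic(model, n=5, seed=42)
for row in synthetic:
    print(row)
```

None of the generated rows is copied from the real table, which is the basic appeal for privacy-sensitive industries. The same sampling step can produce as many rows as needed, which is how synthetic data adds volume when real data is sparse.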

SAS, meanwhile, is a longtime analytics vendor based in Cary, N.C. Like many of its peers, SAS has made generative AI development a significant aspect of its platform in the two years since OpenAI's launch of ChatGPT marked a substantial improvement in generative AI capabilities. It has developed certain generative AI-powered tools of its own, such as an AI assistant, while also adding features that enable customers to build AI models and applications.

SAS' acquisition of Hazy's synthetic data generation capabilities adds another feature aimed at helping customers develop their own generative AI tools and is therefore significant, according to Donald Farmer, founder and principal of TreeHive Strategy.

"This acquisition of Hazy is likely a good move for SAS' core data scientist audience," he said. "It adds important features to the portfolio, and it does align with some recent trends toward increased use of synthetic data."

It is unlikely, however, that acquiring synthetic data generation capabilities to aid AI development will help SAS attract new customers, Farmer added.

"This is a feature for their existing specialist users," he said.

Adding synthetic data

Enterprise interest in developing AI models and applications has surged over the past two years.

Organizations have long sought ways to make employees more data-driven, but have been held back by the complexity of using analytics and data management platforms. Coding skills were generally required to prepare data for analysis, and data literacy skills were needed to interpret data.

Natural language processing was seen as a way of broadening the use of analytics tools beyond trained experts. However, the NLP tools vendors developed up until a couple of years ago had limited scope and still required some level of expertise to use.

Generative AI changed that by enabling true NLP.

When language models such as ChatGPT and Google's Gemini are combined with an organization's proprietary data, users can query and analyze the data using natural language rather than code. In addition, generative AI can be trained to take on certain repetitive tasks, benefiting trained experts.

Now, generative AI is improving to become more autonomous. As a result, more enterprises are either developing or planning to develop generative AI tools to broaden data-driven decision-making and automate repetitive processes.

In response, vendors such as SAS, MicroStrategy, Qlik and many others have created development environments within their platforms designed to simplify the development of AI and machine learning tools.

SAS was initially slow to embrace generative AI, taking a cautious approach in early 2023 due to concerns related to the accuracy and security of large language models. By September 2023, however, the vendor joined the fray and unveiled plans to integrate its tools with LLM capabilities.

In April, SAS unveiled a generative AI-powered assistant, prebuilt AI models, initial synthetic data development capabilities through SAS Data Maker, and a complete environment for users to develop their own AI models and applications.

While introduced in April, SAS Data Maker remains in private preview, according to Brett Wujek, SAS' senior research and development manager. During the private preview, SAS has heard from customers in such industries as public service, pharmaceuticals and manufacturing.

"That [feedback] led us to expand the scope of our development efforts to fill some important gaps [with the acquisition]," Wujek said.

Some of those gaps include generating synthetic data sets based on certain data tables, support for time series data and protection of personally identifiable information, he continued.

"While these were all on the roadmap for SAS Data Maker, the acquisition of Hazy's technology provides an estimated two-year acceleration in product maturity," Wujek said.

Meanwhile, other benefits of the acquisition include the following, according to SAS:

  • AI systems that can be trusted and that adhere to ethical standards, because including synthetic data makes the underlying data more diverse.
  • Increased data security and privacy, with synthetic data taking the place of sensitive data to eliminate the risk of exposing it.
  • More diversified research and testing through the use of synthetic data sets.
  • Cost savings by reducing spending on data collection.
  • Faster development, given the speed with which synthetic data can be generated.

Despite a slower start than some of its peers, SAS has developed generative AI capabilities and an AI development environment that, with the addition of Hazy's synthetic data generation capabilities, are competitive with those of other analytics vendors, according to Farmer.

"The SAS Viya platform is quite comprehensive," he said.

In particular, SAS has done a good job of integrating open source programming languages and adding LLM capabilities into analytics workflows, Farmer continued. However, just as the acquisition of Hazy's intellectual property likely won't attract new customers, Farmer suggested that SAS' AI-related efforts to date haven't helped the vendor stand out to potential new customers.

"This focus on integrating GenAI into existing analytics workflows is good for their existing user base," Farmer said. "The concern is that they do not win many greenfield customers, but are primarily selling more value back into existing clients."

Next steps

With SAS' acquisition of Hazy's technology now complete, one of the vendor's primary focal points will be integrating Hazy's capabilities into SAS Data Maker and making SAS Data Maker generally available on marketplaces in Microsoft Azure, AWS, Google Cloud and Snowflake, according to Wujek.

"This acquisition ... reflects SAS' vision for the future of data and AI," he said.

Farmer, meanwhile, said SAS has a unique opportunity to develop pretrained AI models targeted to customers in certain industries, noting that the vendor already provides an array of applications specific to many industries.

SAS lists 17 industry applications on its website, including ones for agriculture, banking, education, health care and even sports. Vendors such as Databricks, SAP and Snowflake also offer industry-specific tools as part of their platforms.

"SAS has a great reputation for vertical specialization," Farmer said. "It would be good to see them developing specific AI models pretrained for specialized industries with APIs to match."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
