Generative AI dominates Google's data and analytics roadmap

Following recent integrations between Gemini and the tech giant's major data and analytics platforms, more product innovations featuring LLMs are expected over the next few months.

Just as generative AI has been a prominent part of Google's recent data management and analytics product development, the emerging technology will remain a primary focus for the tech giant in the near future.

In particular, Google hopes to make generative AI part of its data management and analytics experiences, with AI a constant presence within customers' workflows rather than a pop-up window or column alongside their work environment.

Google introduced Duet AI, its generative AI assistant, in May 2023. Many of the most significant new products that followed in August 2023 were integrations between Duet and existing Google tools, including data management and analytics platforms such as BigQuery and Looker.

The tech giant has since renamed Duet. It's now part of the Gemini generative AI platform; on Feb. 15 the vendor introduced Gemini 1.5.

With Google Cloud Next '24, the tech giant's biggest user conference, scheduled for April 9-11 in Las Vegas, customers can expect to see evidence of continued generative AI development, according to Gerrit Kazmaier, vice president and general manager of data and analytics at Google Cloud, and Andi Gutmans, Google Cloud's general manager and vice president for databases.

Kazmaier and Gutmans noted in a recent interview that Google is working toward moving tools already unveiled in preview to general availability.

In addition, Google plans to introduce a new wave of generative AI tools in the coming months. Among them will be features aimed at surfacing insights that humans might not otherwise discover, as well as capabilities that improve the accuracy and security of large language models (LLMs) by helping users combine structured data with previously inaccessible unstructured data.

Given the generative AI capabilities Google has already introduced, coupled with the plans Kazmaier and Gutmans revealed for future development, Doug Henschen, an analyst at Constellation Research, sees Google as one of the more innovative generative AI developers to date.

Many of Google's generative AI capabilities are still in the preview stage, but so are those from most other vendors, he noted.

"I see Google Cloud as one of two leaders -- along with the Microsoft-OpenAI partnership -- on bringing GenAI to the world. But in Google's case there's more choice in model types and model sources," Henschen said.

Gemini, in particular, is impressive, he continued. And beyond Gemini, Henschen noted that Google offers a Model Garden featuring more than 100 generative AI models developed by the tech giant as well as its partners.

"I see Google as being very thoughtful about where and how it's exposing or planning to expose generative AI capabilities," he said. "No, they're not all generally available yet, but then lots of GenAI capabilities are still in preview."

The present state

Looker is an analytics platform that Google acquired for $2.6 billion in 2019. In 2022 Google made Looker its primary platform for semantic modeling and data analysis.

BigQuery, meanwhile, was first unveiled in 2011 and is Google's fully managed data warehouse.

Integrations between what is now Gemini and both Looker and BigQuery unveiled last August provide natural language processing (NLP) capabilities that enable users to query data using freeform language rather than code.

Enterprises have struggled for decades to enable more employees to work with data. The platforms required to manipulate and analyze data are often complex, requiring code to take most actions and necessitating data literacy training to comprehend the results of different analyses.

As a result, analytics use has hovered near 25% of workers within organizations for about two decades.

Generative AI is changing that paradigm, potentially making analytics more accessible.

LLMs such as those underlying OpenAI's ChatGPT are trained on public data, which means they have vocabularies as extensive as any dictionary. In addition, generative AI can decipher intent. As a result, LLMs enable true NLP rather than the limited NLP previously available, potentially eliminating the most significant barrier holding back the widespread use of analytics tools.

Beyond NLP, the integrations between Gemini and Google's primary data management and analytics platforms enable users to generate code in LookML, or Looker Modeling Language, based on an intuitive understanding of intent; automate development of dashboards and other data products; and help engineers and analysts write SQL queries and Python code.
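That code generation follows a familiar pattern: combine the user's question with schema context in a prompt and ask an LLM to return SQL. The following is a minimal, hypothetical Python sketch of that pattern; the schema, prompt template and stubbed model call are invented for illustration and do not represent Google's Gemini integration.

```python
# Hypothetical sketch of natural language-to-SQL generation. The schema,
# prompt template and stubbed model call are invented for illustration and
# do not represent Google's Gemini integration.

SCHEMA = """
orders(order_id INT64, customer_id INT64, order_date DATE, total NUMERIC)
customers(customer_id INT64, region STRING)
"""

PROMPT_TEMPLATE = (
    "You are a SQL assistant. Given these tables:\n{schema}\n"
    'Write one BigQuery SQL query that answers: "{question}"\n'
    "Return only the SQL."
)


def build_prompt(question: str) -> str:
    """Ground the model in real table and column names so it does not guess."""
    return PROMPT_TEMPLATE.format(schema=SCHEMA, question=question)


def generate_sql(question: str, call_model) -> str:
    """call_model is any callable that sends a prompt to an LLM and returns text."""
    return call_model(build_prompt(question))


if __name__ == "__main__":
    # Stubbed model so the sketch runs without any external service.
    def fake_model(prompt: str) -> str:
        return (
            "SELECT c.region, SUM(o.total) AS revenue\n"
            "FROM orders o JOIN customers c USING (customer_id)\n"
            "GROUP BY c.region ORDER BY revenue DESC"
        )

    print(generate_sql("Which regions generated the most revenue?", fake_model))
```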

However, six months after being unveiled, the integrations between Gemini and Google's primary data management and analytics platforms remain in preview. So do numerous other new and improved features introduced last year.

In that sense, Google is like most other data management and analytics vendors.

Following OpenAI's launch of ChatGPT, which represented a significant advance in LLM capabilities, many vendors unveiled plans to add generative AI to their platforms.

Just as Google introduced Duet AI with plans to integrate the feature throughout its vast computing portfolio, fellow tech giant Microsoft unveiled similar intentions for its Copilot generative AI assistant and AWS did the same for its assistant, Q.

Also, more specialized vendors such as Alteryx, Informatica and ThoughtSpot all revealed plans to make generative AI a significant part of their platforms.

Some tools such as MicroStrategy AI and Dremio's Text-to-SQL translator are available to all customers. Those from Alteryx, Informatica, ThoughtSpot and many others remain in preview as vendors work to improve the accuracy and security of the generative AI tools.

Google, meanwhile, is planning to move at least some of its generative AI capabilities out of preview and make them generally available in concert with the upcoming user conference in April, according to Kazmaier.

"One of the big announcements at Next will be moving many of them to [general availability]," he said, adding that numerous developer capabilities unveiled last year are already generally available.

If the tech giant indeed makes the integrations between Gemini and Google's data management and analytics tools generally available in April, it will be an intriguing development, Henschen said.

"I'm excited about the [Gemini] capabilities that have been announced within BigQuery, Dataplex and Looker. And I'm optimistic that we'll see general availability at Next '24," he said.

However, how those integrations perform in the real world will ultimately determine whether they are significant for Google's data management and analytics customers, according to Donald Farmer, founder and principal at TreeHive Strategy.

He noted that despite extensive previews, vendors can sometimes make tools generally available before they're ready.

For example, Farmer said Microsoft Fabric, an AI-powered analytics and data management platform first unveiled in preview in May 2023 and made generally available six months later, has been a disappointment.

Instead of having the feel of a finished product that should be generally available, it feels like a tool still in need of adjustments, he continued.

"The April timeline is good [for Google], but the devil is in the details," he said. "What do they mean by [general availability]? If you are serious about the enterprise data space, are you really giving an enterprise-level commitment with [general availability]? I think we need to wait and see."

Future plans

Google on Feb. 8 unveiled Gemini as the new name for Duet AI and several other Google tools, including the generative AI chatbot formerly known as Bard. The vendor quickly followed the next week by introducing vector search in BigQuery and launching Gemini 1.5 in preview.
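For readers curious how the new BigQuery capability might be used, the following is a hedged sketch of submitting a vector similarity query from Python. The project, dataset, tables and columns are invented for illustration, and the VECTOR_SEARCH arguments should be verified against Google's current documentation.

```python
# Hedged sketch of querying BigQuery vector search from Python. The dataset,
# tables and columns are invented for illustration; verify the VECTOR_SEARCH
# arguments against Google's current documentation.
from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials and project are configured

sql = """
SELECT base.doc_id, base.content, distance
FROM VECTOR_SEARCH(
  TABLE mydataset.documents,   -- hypothetical table with a precomputed embedding column
  'embedding',
  (SELECT embedding FROM mydataset.queries WHERE query_id = 1),
  top_k => 5,
  distance_type => 'COSINE')
"""

for row in client.query(sql).result():
    print(row.doc_id, round(row.distance, 4), row.content[:80])
```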

The tech giant likely will continue to introduce new capabilities on an individual basis over the next couple of months. But Google's next collection of significant new features will be unveiled at Next '24, according to Gutmans and Kazmaier.

The two executives were not specific about what will be revealed at the user conference. But they gave some details about Google's vision for its data management and analytics tools as well as the progress the tech giant is making toward that vision.

Kazmaier said Google has two main goals for Looker.

One is to change the BI platform to an AI-first experience. The other is to improve Looker's AI capabilities so that the AI is able to do things humans cannot, such as automatically monitoring large amounts of data to surface insights that no human or team of humans could find.

An AI-first experience means making AI an invisible but constant presence. Rather than have a pop-up box or side column where users can ask questions with natural language, Looker will be proactive. It will find the right data and run queries based on what it has learned about users' habits and organizational needs.

"Our vision for BI and for Looker, specifically, is changing it to an AI-first experience," Kazmaier said. "We don't think of AI as having the best form factor in a side panel in Looker but actually re-thinking the entire experience from data curation to data analysis to data sharing in a way that you have an always-on intelligent co-creator."

Surfacing insights beyond what humans are capable of means analyzing every data point among potentially billions for relevant information.

Given human limitations, Kazmaier noted that most BI is done on aggregates of data that serve as a representation of an organization's overall data.

AI, and generative AI in particular, has no such limitations. LLMs can examine all of an organization's data and be trained to ask relevant questions that surface key information.
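As a generic illustration of that difference, one not specific to any Google product, the short sketch below shows how an aggregate such as a daily average can mask a single anomalous record that a scan of every data point would surface.

```python
# Generic illustration: an aggregate view versus scanning every data point.
import random
import statistics

random.seed(7)

# Simulate one day of transaction amounts: mostly normal, plus one extreme outlier.
transactions = [round(random.gauss(100, 10), 2) for _ in range(9_999)]
transactions.append(5_000.0)  # a single anomalous transaction

daily_average = statistics.mean(transactions)       # the aggregate barely moves
anomalies = [t for t in transactions if t > 1_000]  # a row-level scan finds it

print(f"Daily average: {daily_average:.2f}")
print(f"Anomalies found by scanning every record: {anomalies}")
```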

"With the future of Looker, we can unearth these insights and take actions, which is going to be not only more efficient but also more impactful for the customer experience, the employee experience and the supplier experience," Kazmaier said.

Gutmans, meanwhile, noted that vector embeddings are a critical part of the generative AI development process and that Google plans to continue adding more vector search and storage capabilities to its data management suite.

In particular, accuracy and security are focal points, he said.

Vectors are a means of giving structure to unstructured data so it can be combined with structured data and used to better inform an application or model than structured data alone.

It's estimated that over 80% of the world's data is unstructured. However, until recently, analytics dealt almost exclusively with structured data.

There previously was no simple, efficient way to extract meaning from PDFs, emails, texts, videos, audio files and other forms of unstructured data. Even if some limited information could be extracted from unstructured data, combining it with structured data was yet another obstacle.

Google in April 2022 introduced BigLake, a data lakehouse that combines the structured data management capabilities of a data warehouse with the unstructured data management capabilities of a data lake.

Using a combination of structured and unstructured data, with the unstructured data given a searchable form within platforms such as BigLake by algorithms that assign vector embeddings, organizations can develop retrieval-augmented generation (RAG) pipelines.

RAG pipelines, in turn, can be used to feed LLMs with an enterprise's proprietary data so the LLMs can be used for business-specific purposes.

In addition, vectors enable similarity searches so that relevant data can be grouped into a dataset and used to inform a model or application through a RAG pipeline.
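A minimal, generic sketch of that retrieval step, not tied to any Google API, looks like this: rank vectorized chunks of unstructured text by cosine similarity to a question, then stitch the best matches into a prompt for an LLM. The embed function below is a crude placeholder standing in for a real embedding model.

```python
# Generic RAG retrieval sketch: cosine similarity over vectors, then prompt assembly.
# embed() is a crude placeholder; retrieval quality depends entirely on the real
# embedding model used in practice.
import math


def embed(text: str) -> list[float]:
    """Placeholder: hash characters into a small fixed-size vector so the sketch runs."""
    vec = [0.0] * 16
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % 16] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Chunks of unstructured text that have been vectorized ahead of time.
documents = [
    "Q3 earnings call transcript: revenue grew 12% on strong cloud demand.",
    "Support email: customer reports login failures after the latest update.",
    "Contract PDF: renewal terms require 60 days written notice.",
]
index = [(doc, embed(doc)) for doc in documents]

question = "What notice period does the contract renewal require?"
q_vec = embed(question)

# Retrieve the most similar chunks and build the augmented prompt for an LLM.
top_matches = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:2]
context = "\n".join(doc for doc, _ in top_matches)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```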

Public LLMs, however, have suffered both security and accuracy problems, making them risks for enterprises that want to keep proprietary data private and need models to deliver correct outputs to make decisions.

Google and other data management providers, therefore, are developing tools that enable customers to develop their own domain-specific language models and train public LLMs with their own data in a secure environment.

"As more unstructured querying of databases happens, that opens up questions about accuracy and security," Gutmans said. "We believe that we will be able to innovate on both accuracy and security in a manner that truly enables the most mission critical enterprise workloads."

Proper focus

If there's a common theme in Google's data management and analytics product development plans, it's enabling analysis at an enterprise level.

Gutmans highlighted the tech giant's work to expand data management and analytics availability to more users within organizations by using AI to transform BI.

"The pendulum has swung toward developers being able to innovate with AI without having to be data scientists or AI experts," he said. "We're really thinking about what we can do to make sure that the developer who is building these GenAI-first apps can be the most productive and do that in the most cost-effective manner."

Kazmaier, meanwhile, noted that perhaps the most significant outcome of using generative AI in concert with data management and analytics is that enterprises can now operationalize previously untapped unstructured data.

"The way I would characterize it is that the days of big data are over," he said. "We are going to wide data now. It's not about having much more of the same data but actually getting to more data signals. The transformative power of generative AI is that it unlocks unstructured data."

That focus on expanding both the use of data by more employees within enterprises and the volume of data used to inform decisions is significant, according to Farmer.

He noted that other vendors are also focused on making analytics an enterprise-wide tool. But that doesn't detract from Google's own efforts.

"I don't think anything they are announcing in the data management or analytics space is particularly surprising or distinctive, but the fact that they are focusing on those areas is telling," Farmer said. "Clearly they see enterprise data as critical to their direction, and for those of us in the space that's interesting enough in itself."

In particular, Google's emphasis on simplifying application development so that more than just data scientists and other experts can build data products is promising, he noted.

However, without knowing exactly what Google plans to introduce leading up to the April conference, it's difficult to say whether the tech giant's near-term product development plans will produce anything significant for users, Farmer continued.

"Without specific details on the tools, it's challenging to pinpoint what could be particularly interesting or significantly beneficial for GCP users," he said. "However, the emphasis on GenAI integration into various services does suggests a future where analytics and data management are more accessible and possibly even more cost-effective on GCP."

Henschen similarly said that Google's recent development and future plans are sound.

He noted that unstructured data has long been an untapped resource and efforts to operationalize that vast resource are important.

"Observations on the power of combining structured data with a wide array of unstructured data is spot on," Henschen said. "It's a trend that's quickly unlocking never-before-possible insights."

One example of a Google customer using generative AI to unlock insights with unstructured data is Symphony Financial, a financial services company with offices in Maryland and Virginia, according to Kazmaier.

Symphony has copious amounts of trade data, much of it from phone calls featuring domain-specific language. The data includes audio files, trade information, security checks and post-trade reconciliations, among other things.

Before Symphony combined Google's LLM technology with BigQuery, none of that data was accessible. Now, however, the company is using it to develop market insights and improve compliance.

Another customer, seafood provider Camanchaca, is using Google's AI-based data management and analytics tools to enable employees to query and analyze data using natural language rather than visit dashboards, according to Kazmaier.

Specifically, Camanchaca is deploying Vertex AI for AI modeling, BigQuery as its data foundation and Looker for semantic modeling. Using those tools, the company has developed an information portal that employees can use to ask questions about inventory and sales processing.

"GenAI can do more than just drive productivity gains and accelerate development workflows," Henschen said. "The real power comes in [changing how data is consumed]."

Beyond Next '24

While improving the security and accuracy of LLMs is a current focal point of Google's data management plans, that work will not stop after Next '24, according to Gutmans.

In addition, he said the tech giant is working to make its tools more cost-effective by optimizing compute.

With data volume increasing at an exponential rate and enterprises using more compute power to deal with all their data, cloud computing costs often exceed expectations. As a result, vendors -- who are at risk of losing customers if their platforms become too expensive -- are making efforts to help users control spending by improving the efficiency of their cloud-based services.

"We're pretty nascent in the [generative AI] space and still figuring out how to have the best purpose-built model for the job that gets the right level of accuracy, right level of performance and gets the right level of cost," Gutmans said.

Toward that end, autonomous computing could play a significant role, according to Kazmaier.

Autonomous systems have the potential to constantly monitor for security and accuracy while powering up or down to meet an organization's needs at a given time.

"What we are building and what we are seeing is autonomous agents that are basically acting in collaboration but to a degree autonomously with GenAI and data systems as a huge value generator," Kazmaier said. "Think about system optimization that happens continuously, continuous security and compliance, and continuous data analysis."

Security, in particular, is a smart area of focus for Google's generative AI development, according to Farmer.

In addition to helping organizations protect data, making security a priority would further demonstrate Google's commitment to meeting the needs of entire enterprises rather than just data experts.

"If they are as serious about enterprise as they say, one area for further development could be deeper integration of GenAI into Google's security and compliance tools, helping businesses better predict vulnerabilities and automate security protocols," Farmer said.

Henschen, meanwhile, said that further development of an AI-first analytics experience would be beneficial.

Though Kazmaier mentioned it as part of Google's generative AI roadmap and Henschen noted that Looker's team shared a video in August alluding to a natural language-driven analytics interface, no specifics have been revealed. Meanwhile, other vendors such as Tableau with Pulse have already unveiled similar capabilities in preview.

"Google Cloud has been vague about what their product will be called and when it will be available," Henschen said. "Hopefully, we'll see a preview release of some sort in April."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
