
Citing data privacy, GitLab syncs with Google generative AI

GitLab's deal with Google lets it keep sensitive customer data in the GitLab cloud while training models, amid enterprise concerns about generative AI licensing and security risks.

GitLab's partnership with Google for generative AI looks to assuage enterprise concerns about data privacy, but many open questions about licensing and other risks remain.

The partnership news this week comes amid a tidal wave of IT products that incorporate generative AI built on large language models (LLMs). Such tools have held a high profile over the last year, from the general availability of GitHub's Copilot feature, based on OpenAI's Codex project, in June 2022 to OpenAI's public release of ChatGPT in November. A ChatGPT API followed in March, prompting software vendors, including infrastructure-as-code and DevOps tool makers, to integrate it into their products.

The GPT API can include enterprise licensing to ensure customer data is stored privately and not used to train OpenAI's models. GitLab's partnership with Google, by contrast, doesn't require users' data to leave the GitLab cloud at all, according to David DeSanto, chief product officer at GitLab.

"It means that we can see, end to end, how the data is being sent and responded to and stored, as opposed to other third-party services where it's more of a black box," he said.

GitLab plans to lean on Google's generative AI expertise to make an experimental "Explain this Vulnerability" feature production-ready this year. Other experimental and beta-stage generative AI features disclosed by GitLab since September 2022 include* suggested reviewers, code suggestions and vulnerability guidance.

DeSanto stopped short of saying GitLab plans to move all of these projects under the Google partnership, but didn't rule that possibility out. GitLab also has a commercial partnership with OpenAI, and DeSanto declined to comment on whether Google's Generative AI for Vertex AI, launched in March, is superior technically to OpenAI's GPT.

"We've got a lot of models that make up our code suggestions," he said. "We realized that if we just used one model, like most of our competitors, we would not be as effective in suggesting the right code."

Experts urge generative AI transparency as legal precedents pend

GitHub, OpenAI and Microsoft are the targets of a lawsuit alleging copyright violations in code generated by GitHub's Copilot tool, based on OpenAI's Codex. The outcome of that case is expected to answer major open questions regarding the licensing and copyright of AI-generated code.


Until then, large enterprises will stay mostly on the sidelines, said Ricardo Torres, chief engineer of open source and cloud native and associate technical fellow at aircraft manufacturer Boeing.

"If they use code licensed under a General Public License] to train the model, and that ends up in a customer's product, even though they didn't steal data, [the customer may have] been infected by their training data," Torres said. "Even open source foundations are concerned about this. If they start taking in AI-generated code, and this code was GPL licensed, for example, that's a viral license, and it could infect other things."

DeSanto declined to comment on the GitHub-OpenAI lawsuit. The company is cautious about generative AI licensing issues, another company official said.

"We're taking great care to help prevent and limit customer exposure to license poisoning," wrote Taylor McCaslin, GitLab's group manager of product for data science, in an email sent via a company spokesperson. "This includes care and caution with selecting AI foundation models that power our features, filtering and excluding training data sets, and controls for how customers interact with these features."

Enterprises shouldn't dismiss the innovation happening in generative AI, but they must require vendors to explain how their data is used by such systems. They should also demand transparency in how LLMs arrive at their results, said Andy Thurai, an analyst at Constellation Research.

"Every organization is liable for their data and their customer's data. Just because you pass it to some AI [provider] doesn't mean you pass on the ownership and liability," he said. "Enterprises need to demand to understand the model, algorithm, transparency, ethics, bias mitigation, et cetera before they start using any AI solution."

IT pros weigh generative AI benefits vs. risks

Despite enterprise caution, eventually the benefits of generative AI will outweigh the risks -- at least for some organizations -- said Melinda Marks, an analyst at TechTarget's Enterprise Strategy Group.

"There's a lot of exciting potential here to help security teams protect against threats and even get out ahead of them," she said. "Our industry is advancing quickly on this because of pressing needs to get the benefits of using the technology."

These benefits generally come down to increased velocity in software development, according to DeSanto. He said GitLab's internal use of generative AI allowed the company to move its code suggestions tool from an experimental phase to a limited-availability beta in a few months.


Caution at this stage is prudent, but it makes sense for vendors to start testing the waters of enterprise trust, Marks said.

"The litigation situation is tricky because this is so new. But vendors like GitLab are working out whether there is a way to collect customer data without violating their privacy or protection requirements, whether they can find a way to compensate for that data if they can't get it, and whether customers are willing to take some risk for the advantages they would get," she said.

For Torres, the ultimate benefits of generative AI tools will come from the way they inform humans rather than from the tasks they automate.

"Taking the human out of the loop is a big step," he said. "I'm not saying that we won't ever cross that chasm. In the near term, though, I do think that providing insight to developers like explaining a vulnerability is valuable, because that's something that can also be fact-checked."

So far, Torres said he hasn't been impressed with AI-generated code he's experimented with for personal projects, making human input all the more important in his view.

"It's great for the very simple, rote stuff that, to be fair, [integrated development environments] have also been able to generate for a pretty long time," Torres said. "Once you start getting down to really intricate or performance- or memory-management-intensive work … you have to push back on [an AI model]. And in order for you to tell if it's listening to you, you have to already know the topic."

*Correction: The original version of this story identified several of GitLab's recently released security features as generative AI features; only suggested reviewers, code suggestions and vulnerability guidance are generative AI features.

Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached at [email protected] or on Twitter @PariseauTT.
