Backup admins must consider GenAI legal issues -- eventually

As legal woes mount against popular LLMs powering generative AI, backup admins should work with IT to set safe and effective usage guardrails to avoid future headaches.

The adoption of generative AI in the enterprise is requiring legal departments to create new rules and guidelines. But the fallout from these decisions won't significantly affect backup and archive administrators' work just yet.

Companies hawking large language models (LLMs), the core technology powering GenAI applications such as ChatGPT, are facing numerous legal challenges worldwide. User privacy, copyright violations and deepfake creation are all contributing to the legal scrutiny facing GenAI.

These concerns, and the fear of legal liability stemming from GenAI use, are the domain of legal and security departments for the time being, experts say. Backup administrators typically need to concern themselves only with how to store abstracted data and metadata rather than the content itself.

How enterprises establish guardrails and rules for GenAI usage will vary across organizations. But the legal precedents established for data controls will likely remain the same, said Krista Macomber, an analyst at Futurum Group.

"When we think about security and privacy for data, [GenAI is] really no different than any other application," Macomber said. "It's going to be case by case based upon how the organization is using AI and what they're feeding into it."

Generating challenges

Most enterprises license popular LLMs or access them through subscriptions, because constructing an LLM in-house is beyond the time and budget of most organizations. Using an LLM in this manner makes data as vulnerable as it would be with any SaaS application, said Ray Lucchesi, founder and president of Silverton Consulting.

"Is there exposure by putting this out on the internet [through an LLM]? Possibly. But if a corporation owns the data, it should be fine," Lucchesi said.

Vendors of security services already see a potential market for easing the construction of guardrails and rules around GenAI usage before it reaches the purview of backup administrators.

Metomic, a security platform that protects sensitive data within SaaS applications, now provides the ability to halt the submission of personally identifiable information into the ChatGPT website through a browser plugin. The service can be configured to warn users submitting such information about the potential damage before then alerting security teams on what decisions the user makes.
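Metomic hasn't published how its plugin works, but the general pattern -- screening a prompt for PII before it leaves the browser, warning the user, then alerting security with the outcome -- can be sketched in a few lines of Python. The regex patterns and the warn_user and alert_security_team hooks below are hypothetical stand-ins for illustration, not Metomic's implementation:

import re

# Hypothetical PII patterns; a real deployment would use a broader,
# locale-aware catalog (national IDs, IBANs, health identifiers and so on).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def find_pii(prompt: str) -> list[str]:
    """Return the names of any PII patterns found in the prompt."""
    return [name for name, rx in PII_PATTERNS.items() if rx.search(prompt)]

def warn_user(message: str) -> bool:
    # Stand-in for the plugin's warning dialog; here, a console prompt.
    return input(f"{message}\nSend anyway? [y/N] ").strip().lower() == "y"

def alert_security_team(hits: list[str], proceeded: bool) -> None:
    # Stand-in for a real alerting integration (email, SIEM, chat).
    print(f"[security alert] PII types {hits}; user proceeded: {proceeded}")

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt may be sent to the GenAI service."""
    hits = find_pii(prompt)
    if not hits:
        return True
    proceeded = warn_user(f"Prompt appears to contain PII: {', '.join(hits)}")
    alert_security_team(hits, proceeded)
    return proceeded

The ordering matches the workflow described above: the user sees the warning and decides first, and the security team is notified only of the outcome.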

Many enterprises have yet to fully implement ChatGPT or other GenAI services into their IT workflows, said Rich Vibert, co-founder and CEO of Metomic. But his company is planning to expand into customer implementations of ChatGPT.

No matter how an enterprise implements GenAI, the amount of data being fed into an LLM likely cannot be managed by one specific department, Vibert said. Instead, every GenAI user in the enterprise will need education on how to avoid copyright violations, intellectual property infringement and personally identifiable information leaks.

"The world of alerts and mediation can't be handled by security teams themselves," Vibert said. "It has to be handled by employees as well."

[Figure: Steps to reduce the legal risk of using generative AI.]

Large legal malfeasance

That LLMs require massive amounts of data, and by proxy dip into nebulous legal territory, is inherent to GenAI service contracts, said Andy Thurai, an analyst at Constellation Research. Many GenAI vendors are now offering indemnity or other legal protections for customers.

Microsoft's Copilot Copyright Commitment, for example, offers an indemnity clause for the Copilot AI assistant that claims to protect enterprises that use the service against lawsuits. That clause, Thurai said, hasn't been tested in court and isn't a true source of protection for enterprises facing legal challenges. The requirements for Microsoft to step in also significantly limit the product's functionality.

"It's a [legal] can of worms that enterprises can't afford to open," Thurai said.

Unfortunately for enterprise legal teams, the need to create guidance is fast approaching.

Lawsuits by organizations such as The New York Times are looking to reclaim IP control and copyright from OpenAI's proprietary, commercial LLMs.

Those suits focus entirely on the content of the data itself rather than the mechanics of backup and storage that backup admins would concern themselves with, said Mauricio Uribe, chair of the software/IT and electrical practice groups at law firm Knobbe Martens.

The business advantages of GenAI within backup technology are still unproven and unknown, he added. Risks such as patent infringement remain a possibility.

Backup vendors are now building GenAI capabilities such as support chatbots into their tools, including Rubrik's Ruby and Cohesity's Turing AI. But neither incorporates enterprise customer data or specific customer information, according to the two vendors.

"I'm not sure how integrated or tightly coupled these benefits are to backup and recovery," Uribe said. "You see a lot of interfaces for the users. But [GenAI] hasn't yet modified the act of backup and storage."

AI guardrails

Beyond relying upon external services or other departments, IT teams are empowered to shape their AI tools through capabilities such as retrieval-augmented generation (RAG), said Marc Staimer, founder and president of Dragon Slayer Consulting.

RAG lets an LLM draw on alternative data sets to keep its information current or to avoid AI hallucinations, and it enables enterprises to limit which data sets the LLM can use.

This is useful not only to avoid bad or incorrect output from GenAI, Staimer said, but also to limit what data is fed into the LLM. Industry-specific models trained on enterprise data and cut off from the larger internet should remain in the legal clear, but building off an open source model could muddy an organization's liability.

"The open source LLMs out there today -- the ones using the internet as their training ground -- are going to have a problem," Staimer said.

Even the best safety and legal protocols are likely to change in the next several years as discussion and interest in GenAI continue to grow. But backup admins can likely remain focused on their standard duties.

"These questions are going through the courts right now," Staimer said. "It's not going to fall under [backup or] security; it's going to fall under legal."

Tim McCarthy is a journalist living on the North Shore of Massachusetts. He covers cloud and data storage news.
