New Nvidia, GitHub AI coding assistants expand devs' options

GitHub Copilot Enterprise and StarCoder2 LLMs, both released this week, will add to an array of AI coding assistants. But caution, especially with security, is still warranted.

Updates from GitHub and a consortium comprising Nvidia, HuggingFace and ServiceNow will bring fresh options to an already wide selection of AI coding assistants for developers. But experts urge caution on adoption amid ongoing security and copyright concerns.

GitHub Copilot Enterprise, a new tier of the popular GitHub Copilot AI coding assistant, became generally available this week at $39 per user per month for users of GitHub's Enterprise Cloud. This version offers customization for organizations using Copilot, generating chat answers, code completion and pull request diff analysis based on a specific codebase. An add-on that will offer fine-tuned AI models is coming soon, according to a GitHub blog post.


"Coding copilots are a solid use case for improving developer efficiency that many enterprises are considering, experimenting with and implementing," said Andy Thurai, an analyst at Constellation Research. "GitHub Copilot, … backed by Microsoft, has an early adopter advantage because of its integration with Visual Studio."

GitHub Copilot is already among the most widely used AI coding assistants available, according to a 2023 survey of 800 engineering professionals. The survey, conducted by software supply chain security vendor Sonatype, found that 97% of DevOps and SecOps leader respondents currently employ generative AI to some degree in their workflows. Of that 97%, a majority reported using two or more tools daily. Topping the list of most-used tools at 86% was ChatGPT, followed by GitHub Copilot at 70%.

As such, it will be difficult for competitors to unseat GitHub Copilot, Thurai said.

"Microsoft has complete control within the plugin to Visual Studio software," he said. "The additional cost of Copilot plugins is so minimally incremental that most enterprises have already opted to use that as a default practice."

Security caveats remain for AI coding assistants

With GitHub Copilot Enterprise, GitHub claims "enterprise-grade security, safety and privacy," which includes excluding organizations' data from model training by default. As with Copilot Business, Copilot Enterprise includes intellectual property indemnity for customers. IP indemnity is meant to assuage concerns about ongoing lawsuits against Microsoft, GitHub and large language model (LLM) partner OpenAI that claim their AI models were trained on copyrighted data. Microsoft and GitHub have pledged to cover any costs paying customers might incur depending on the outcome of those lawsuits.

Despite that indemnity, Sonatype's survey report sounded a note of caution about AI coding assistants due to copyright concerns.

"The copyright issues around the training sets and outputs of generative AI aren't going away anytime soon," the report read. "Overall, the devil is in the details, and the legal challenges are likely to help democratize the AI landscape."

Meanwhile, even this new high-end Copilot tier -- and any AI coding assistant, regardless of vendor -- comes with significant caveats for now, particularly around security. Recent research by cybersecurity vendor Snyk showed that AI coding assistants, including GitHub Copilot, are prone to reproducing security vulnerabilities and bad practices from a customer's existing codebase.

LLMs are being refined rapidly, but still sometimes "make stuff up," according to Thurai. "Which means you have to avoid that by either fine-tuning the model, [adding] RAG [retrieval augmented generation] and [doing] other things to make it better."
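Thurai's point about RAG can be sketched in a few lines: retrieve relevant snippets from the customer's own codebase and prepend them to the prompt, so the model answers from real code instead of inventing it. This is a toy illustration -- the bag-of-words embedding, the snippet strings and the function names are all assumptions; a production system would use a trained embedding model and a vector store.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, snippets: list[str], k: int = 1) -> list[str]:
    """Return the k codebase snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, snippets: list[str]) -> str:
    """Prepend retrieved context so the LLM grounds its answer in real code."""
    context = "\n".join(retrieve(query, snippets))
    return f"Context from codebase:\n{context}\n\nQuestion: {query}"

# Hypothetical codebase snippets for illustration only.
snippets = [
    "def connect_db(url): # open a database connection via psycopg2",
    "def render_header(title): # build page header HTML",
]
prompt = build_prompt("how do we connect to the database", snippets)
```

Fine-tuning bakes codebase knowledge into the model weights instead; RAG is often the cheaper first step because the index can be updated without retraining.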

GitHub offers Dependabot, a free tool that discovers vulnerable software dependencies in codebases, and requires two-factor authentication for all GitHub contributors. A GitHub Advanced Security license available for $49 per active code committer per month comes with code and secrets scanning, custom Dependabot auto-triage rules, and dependency reviews. Numerous third-party tools to scan and remediate security vulnerabilities in AI-generated code are also available.
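Dependabot's behavior is driven by a `dependabot.yml` file checked into the repository. A minimal example, assuming a hypothetical npm project at the repo root:

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "npm"   # ecosystem to monitor
    directory: "/"             # location of the manifest
    schedule:
      interval: "weekly"       # how often to check for updates
    open-pull-requests-limit: 5
```

The custom auto-triage rules in the Advanced Security tier layer on top of this baseline configuration.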

"Regardless of the tool used, teams cannot and should not depend on any single tool to guarantee the security of their software," a GitHub spokesperson wrote to TechTarget Editorial in response to the Snyk report.

As enterprises move forward with AI coding assistants, Sonatype's survey findings indicate these concerns feed lingering skepticism among some DevSecOps pros.

"A striking 75% of both [DevOps and SecOps leads] cited feeling pressured from leadership to adopt AI technologies, recognizing their potential to bolster productivity despite security concerns," the report read.

StarCoder2 offers lightweight LLM, opt-in data

ChatGPT and GitHub Copilot's early dominance notwithstanding, competitors abound, including Meta's Code Llama, Stability AI's StableCode, Amazon CodeWhisperer, and IBM's WatsonX code assistant. Soon, domain-specific AI code assistants will also be built using the StarCoder family of LLMs, which reached version 2 this week.

The industry group behind StarCoder2 -- enterprise workflow vendor ServiceNow, open source AI clearinghouse HuggingFace and AI chipmaker Nvidia -- claims the updated trio of LLMs will address multiple security and legal concerns around AI coding assistants. These models will be cheaper to run than existing models, can easily be fine-tuned to provide better-quality answers based on specific codebases, and address ongoing concerns about data sourcing and privacy, according to Nvidia officials.

"What you would call a frontier model, GPT-4 class models, is probably several hundred billion, maybe even up to a trillion parameters," said Jonathan Cohen, vice president of applied research at Nvidia. "[But] there's this emerging class of models that's in the five to 15 billion parameter range, … and what's nice about them is they fit very comfortably on a single GPU. … You don't need a special server with many GPUs or super-fast interconnects because you're going to split it across many nodes."


StarCoder2 models will come in three sizes in that smaller range: a 3-billion-parameter version trained by ServiceNow, a 7-billion-parameter model trained by HuggingFace and a 15-billion-parameter model trained by Nvidia. These smaller models were also trained for longer on a dataset seven times the size of the first generation of StarCoder models, improving their accuracy, Cohen said.

An Nvidia blog post this week also touted that StarCoder2 LLMs were trained "using responsibly sourced data under license from the digital commons of Software Heritage."

That approach might appeal to enterprises hesitant to use Copilot due to copyright concerns, Thurai said.

"Another important factor is that [StarCoder2] is trained in 619 programming languages," he said. "This can help programmers come up to speed on pretty much any language."

StarCoder2 is likely to find a home among vendors that have their own domain-specific languages or offer software platforms and want to create custom AI coding assistants, Cohen predicted. ServiceNow already made a domain-specific Now LLM available in September based on the first version of StarCoder.

"Workflow generation in addition to code generation coming from a workflow-rich company [such as] ServiceNow could be a value-add" for forthcoming AI coding assistants built using StarCoder2, Thurai said.

Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out on X, formerly known as Twitter, @PariseauTT.
