GitHub Copilot Autofix expands as AI snags software delivery
GitHub Copilot Autofix could help vulnerability management keep pace as the volume of AI-generated code swamps delivery processes, but can AI be trusted to rein in AI?
GitHub Copilot Autofix will support longer-term security vulnerability remediation campaigns and hook into third-party pipeline tools as auto-generated code creates new bottlenecks in software delivery.
Copilot Autofix, which became generally available for Advanced Security customers in August and for all repositories in September, scans code in GitHub repositories for security vulnerabilities, generates explanations for developers about why they matter and suggests code fixes. This week, the service expanded to include two new features in public preview: security campaigns, meant to tackle backlogs of security debt, and new integrations with third-party code scanning tools including ESLint, JFrog static application security testing (SAST) and Black Duck's Polaris.
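To make that concrete, the sketch below shows the kind of before-and-after change a SAST-driven fix suggestion typically targets -- replacing a string-concatenated SQL query with a parameterized one. It is a hypothetical TypeScript illustration using the node-postgres (pg) client, not actual Copilot Autofix output; the function and table names are invented for the example.

import { Pool } from "pg";

const pool = new Pool(); // connection details come from standard PG* environment variables

// Before: user input concatenated into the query string -- the pattern a
// SQL injection rule in a SAST scan would flag.
export async function findUserUnsafe(username: string) {
  return pool.query(`SELECT * FROM users WHERE name = '${username}'`);
}

// After: the style of fix a suggestion would typically propose -- a
// parameterized query, so the input is passed as a value rather than as SQL.
export async function findUserSafe(username: string) {
  return pool.query("SELECT * FROM users WHERE name = $1", [username]);
}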
Organizations need automated vulnerability remediation in two places -- as developers are coding and in working down existing backlogs, according to Katie Norton, an analyst at IDC.
"Autofix can help with the 'as coding' part, helping developers reduce the introduction of vulnerabilities. And security campaigns will help with the reduction of the backlog or security debt," she said. "Many of the existing tools in the automated remediation space are either addressing just the 'as coding' or backlog reduction currently, so having the dual-pronged approach as part of GitHub Advanced Security will be useful."
So far, security campaigns are primarily focused on GitHub's CodeQL vulnerability scanning and SAST findings, but Norton said she expects this to expand to include open source vulnerabilities and dependencies through GitHub's Dependabot as well. Other source code analysis tools -- such as Endor Labs Open Source -- model open source dependencies and the impact of upgrading vulnerable open source packages.
"There are some automatic security update capabilities, but I would like to see Dependabot more tightly integrated with the AI-powered Autofix to make it more intelligent," Norton said.
GitHub took a step in that direction this week with the private preview release of Copilot Autofix for Dependabot that covers TypeScript repositories. The new integration includes AI-generated fixes to resolve breaking changes caused by dependency upgrades in Dependabot-authored pull requests, according to a GitHub changelog update.
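The changelog doesn't detail what those fixes look like, but a common breaking change in a major-version bump is an API that switches from callbacks to promises. The TypeScript sketch below is a hypothetical illustration of that kind of accompanying code change; the "legacy-fetcher" package and its getJson function are invented for the example and are not actual Copilot Autofix output.

// Hypothetical package "legacy-fetcher": v1 exposed a Node-style callback API,
// and the Dependabot upgrade to v2 makes getJson return a Promise instead.
import { getJson } from "legacy-fetcher";

// Before the upgrade (v1): getJson(url, (err, data) => { ... });
// After the upgrade, call sites are rewritten to await the returned promise.
export async function loadConfig(url: string): Promise<Record<string, unknown>> {
  const data = await getJson(url);
  return data as Record<string, unknown>;
}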
AI's adverse effect on software delivery performance
Amid the expansion of AI-powered source code analysis tools, Google Cloud's DevOps Research and Assessment (DORA) team's annual Accelerate State of DevOps report this month examined the effect of AI-generated code on software delivery processes. AI adoption has been strong among some 3,000 respondents to the survey this year, with 81% reporting they use AI, primarily for writing, explaining, documenting and optimizing code, as well as for summarizing information.
However, while Google DORA's analysis showed benefits for AI in improving individuals' flow, productivity and job satisfaction, as well as the quality of project documentation, its effects on broader software delivery performance weren't as positive. For every 25% increase in AI adoption, Google DORA found a 1.5% decrease in software delivery throughput and a 7.2% decrease in delivery stability, a combined measure of the failure rate and rework rate for software changes.
"The fundamental paradigm shift that AI has produced in terms of respondent productivity and code generation speed may have caused the field to forget one of DORA's most basic principles -- the importance of small batch sizes," according to the report. "DORA has consistently shown that larger changes are slower and more prone to creating instability."
Another industry analyst had a different explanation.
"While 75% of respondents are using AI for code writing, less than 60% of them are using it for debugging, code review and test writing," said Andy Thurai, an analyst at Constellation Research. "It could be because tools in those areas may not be mature, or organizations choose not to trust them for those tasks. Regardless, if half of DevOps is automated and AI-driven and the rest is still trying to catch up, it won't end well."
Another industry observer said he believes it's even simpler than that.
"AI use for software development isn't remotely mature yet," said David Strauss, CTO at WebOps service provider Pantheon. "Companies are exaggerating their adoption; they ought to show stronger trust in AI code, better productivity or higher engineering-hour substitution effects if they're actually getting the results out of AI that they claim."
The Google DORA report indicates shaky trust in AI so far -- 39.2% of respondents reported having little or no trust in AI. A majority -- 67% -- reported at least some improvement in their ability to write code because of AI-assisted coding tools, but for about 40%, that improvement was described as "slight." Only 10% observed "extreme" improvements, according to the report.
Can AI be trusted to fix AI?
GitHub uses an automated test harness to monitor the quality of suggestions from Copilot Autofix, according to the company's documentation. GitHub red teams also stress-test the system to check for potential harm, and a filtering system on the large language model helps prevent potentially harmful suggestions from being displayed to users, the documentation states.
But GitHub's documentation also includes this caveat:
"You must always consider the limitations of AI and edit the changes as needed before you accept the changes," it reads. "You should also consider updating the CI testing and dependency management for a repository before enabling Copilot Autofix for code scanning."
That's exactly how one enterprise organization has approached using GitHub Copilot Autofix.
"It is sometimes astonishingly good, and sometimes does things I don't expect or would break other functionality," said Kyler Middleton, principal software engineer at healthcare tech company Veradigm. "It's strikingly like a human engineer in that way."
However, detecting AI-generated code is "an unsolved problem, with a capital P," Middleton said. "Without being able to detect which code is AI-generated versus human-generated, we have settled for testing our code as much as possible. … It's very much like the times before AI -- engineers are likely copying code from random places on the internet, too. We have to rely on smart folks to read the code, and good unit testing to find the problems."
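In practice, that means AI-suggested code runs through the same test gate as human-written code. Below is a minimal sketch of the idea, using Node's built-in test runner and a hypothetical redactEmail helper standing in for an AI-suggested change; it is not code from Veradigm.

import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical helper standing in for an AI-suggested change under review.
function redactEmail(input: string): string {
  return input.replace(/[^@\s]+@[^@\s]+/g, "[redacted]");
}

test("redacts a plain email address", () => {
  assert.equal(redactEmail("contact bob@example.com today"), "contact [redacted] today");
});

test("leaves text without email addresses untouched", () => {
  assert.equal(redactEmail("no addresses here"), "no addresses here");
});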
But others have qualms about using AI to test AI, particularly when the same underlying foundational models are used on both sides of the code-generation and code-testing equation.
"It's hard to use AI to trust AI for the same reason people often miss their own mistakes -- an AI system that understands a potential mistake generally won't make it in the first place," Pantheon's Strauss said. "A different AI might be able to spot problems in the work of another, but other AIs may still be too similar in their training and distribution of capabilities to 'code review' the work of another AI in a way that inspires confidence, at least today."
Meanwhile, using separate foundational models to evaluate the output of another introduces complexity and cost into the equation, even as many enterprises already struggle with generative AI return on investment, Constellation Research's Thurai said.
"That comes back to, what is the purpose of doing all this?" he said. "If you're trying to do it for production efficiency and cost efficiency, that might well defeat the purpose."
Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.