How useful is GenAI in software development? [Q&A]
Generative AI (GenAI) holds a lot of promise for software developers: an ability to help them ship code faster, improve productivity, and reduce time spent on menial and repetitive tasks. But how much of that promise are GenAI coding tools actually delivering -- and how much is hype?
We spoke with Matt Hoffman, product manager and data analyst at Uplevel, an engineering intelligence platform that uses data from across developer tools and collaboration platforms to help engineering leaders drive organizational improvements. Uplevel's data science team, Uplevel Data Labs, recently studied the impact of Copilot usage on developer productivity.
BN: How is generative AI used in software development today? How pervasive is it?
MH: We're seeing potential use cases run the gamut, and it really depends on each developer and each engineering org. GitHub's recent survey found that 92 percent of developers use AI tools, but this technology is still so new that we don't have a clear, agreed-upon picture of the best applications or best practices. We've even heard reports of training for these tools becoming outdated within weeks of creation.
Some developers might use GenAI tools for code generation or to get on-demand explanations for how blocks of code function (without bugging a senior developer). Teams might use GenAI for bug detection and testing in the review phase; still others might use it for documentation and writing release notes. It’s a bit like the Wild West.
BN: What metrics do engineering teams look to improve with GenAI?
MH: The most obvious ones are the 'classic' metrics around efficiency and quality. Is GenAI helping developers reduce cycle time by increasing automation in the review process? Is it surfacing bugs faster so that code is approved and merged, and we have fewer pull requests (PRs) that lead to a degradation of service?
But there are other metrics around the developer experience that we could also examine when it comes to GenAI. Some of these are qualitative -- do developers report that GenAI helps them be more collaborative and reduce waiting time? Do they say that it reduces the time they have to spend doing work they don't like doing? And some are quantitative -- do we see them spending more time in focus or deep work? Do we observe a reduction in what we call 'sustained always on' time, where devs are working after hours to catch up, which can indicate burnout risk?
BN: When it comes to productivity improvements in particular, do GenAI-based coding tools work?
MH: We conducted a study on this, and our initial hypothesis was that they would. But when we looked at around 800 developers in our data set, and compared them pre- and post-Copilot access, the data surprised us. We didn't see significant changes in PR cycle time or PR throughput, so efficiency didn't improve in a meaningful way with Microsoft's GitHub Copilot.
But quality was significantly impacted, and for the worse. We saw a 41 percent increase in bug rate in our test group. We also did not observe much benefit for 'sustained always on,' a potential indicator of burnout. If GenAI is improving the developer experience (and it still certainly might be), it's likely happening in ways that don't substantially change the amount of work developers are doing or how they're collaborating. The other hypothesis is that other factors at work continue to be bigger constraints that GenAI can't overcome, such as unclear requirements on a ticket or PR reviewers who don't have capacity to review work.
All told, these were not the results we thought we'd see, but they're still really important data points for engineering teams considering how to adopt this new technology and where to leverage AI’s value in the development process. I, for one, am eager to see whether this story continues to hold over the coming months.
BN: What would you advise organizations and their engineering teams, as they evaluate and implement GenAI for software development?
MH: First, we need to accept that if 92 percent of developers are using AI tools, the time for questioning adoption has passed. Developers are using tools like Copilot because they like them, and these technologies are evolving so fast that finding no substantive improvements today doesn’t mean that will always be the case.
Our biggest takeaway for engineering leaders is to keep experimenting: set goals, figure out a baseline metric to test against, and measure the impact over time. Discover what use cases yield better results for your developers, and find ways to incorporate those into their workflows. It's up to engineering leaders to put guardrails and guidelines in place so that this experimentation and innovation come with minimal risk.
On that note, before assuming that GenAI will solve all of an engineering organization’s woes, step back and consider: What are the biggest problems the organization is facing? Depending on the answer to that question, GenAI may be beneficial or have little tangible impact.
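The baseline-then-measure approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Uplevel's methodology: the `PullRequest` record, its two fields, and the helper functions are hypothetical names, and a real analysis would also need a large enough sample and a significance test before drawing conclusions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    """Hypothetical per-PR record; field names are assumptions for this sketch."""
    cycle_time_hours: float  # time from PR opened to PR merged
    caused_bug: bool         # True if the PR was later linked to a bug


def summarize(prs):
    """Reduce a non-empty list of PRs to two metrics: median cycle time and bug rate."""
    return {
        "median_cycle_time_hours": median(pr.cycle_time_hours for pr in prs),
        "bug_rate": sum(pr.caused_bug for pr in prs) / len(prs),
    }


def percent_change(baseline, current):
    """Relative change from baseline, in percent. Assumes baseline is non-zero."""
    return (current - baseline) / baseline * 100


def compare(baseline_prs, current_prs):
    """Percent change of each metric between the baseline and current periods."""
    before = summarize(baseline_prs)
    after = summarize(current_prs)
    return {name: percent_change(before[name], after[name]) for name in before}
```

A team could call `compare(prs_before_rollout, prs_after_rollout)` at regular intervals to see whether cycle time and bug rate are moving in the direction of the goals it set.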
BN: As we approach the end of the year, what does the future hold for GenAI in software development?
MH: It's hard to predict because this space is evolving so quickly! But the most impactful change in development is likely the same as the most impactful change in other industries: helping teams be more strategic and forward-thinking by taking away some of the mindless drudgery of work. You don't want AI to be creative so you can focus on code refactoring. You want it to do the refactoring so you can be creative. It's still so early, but we've also come a long way in such a short time. It's difficult to imagine that we won’t get there relatively soon.
Image credit: meshcube/depositphotos.com