Popular LLMs produce insecure code by default


A new study from Backslash Security examines seven current versions of OpenAI's GPT, Anthropic's Claude and Google's Gemini to test how different prompting techniques influence their ability to produce secure code.
Three tiers of prompting techniques, ranging from 'naive' to 'comprehensive,' were used to generate code for everyday use cases, and the output was measured by its resilience against 10 Common Weakness Enumeration (CWE) use cases. The results show that although the rate of secure output rises with prompt sophistication, all of the LLMs produced insecure code by default.
In response to simple, 'naive' prompts, every LLM tested generated insecure code vulnerable to at least four of the 10 common CWEs. Naive prompts simply asked the model to generate code for a specific application, without specifying any security requirements.
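To make the failure mode concrete, here is a rough sketch (not taken from the report) of the kind of flaw the study describes. A naively prompted model will often interpolate user input directly into a SQL query, producing a classic injection weakness (CWE-89), while a security-aware version binds the value as a parameter instead. The function names and schema below are hypothetical.

```python
import sqlite3

def find_user_naive(conn: sqlite3.Connection, username: str):
    # Typical naive-prompt output: user input is concatenated into the
    # query string, so input like "x' OR '1'='1" returns every row (CWE-89).
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_parameterized(conn: sqlite3.Connection, username: str):
    # Security-aware variant: the driver binds the value as data, not SQL,
    # which is the behavior OWASP guidance calls for.
    query = "SELECT id, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()
```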
Prompts that specified a general need for security produced more secure results, and prompts that requested code compliant with Open Web Application Security Project (OWASP) best practices did better still, yet both approaches left vulnerabilities in the code produced by five of the seven LLMs tested.
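As a rough illustration of the three tiers (these are not the study's actual prompts, which are not reproduced in this article), the wording might escalate roughly as follows.

```python
# Hypothetical examples of the three prompt tiers used to request the same code.
NAIVE_PROMPT = "Write a Flask endpoint that looks up a user by username."

SECURITY_AWARE_PROMPT = (
    "Write a Flask endpoint that looks up a user by username. "
    "Make sure the code is secure."
)

OWASP_PROMPT = (
    "Write a Flask endpoint that looks up a user by username. "
    "Follow OWASP secure coding best practices, including input validation "
    "and protection against injection."
)
```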
Overall, OpenAI's GPT-4o performed worst across all prompts, scoring 1/10 for secure code with 'naive' prompts. Even when prompted to generate secure code, it still produced output vulnerable to eight of the 10 issues. GPT-4.1 didn't fare much better with naive prompts, scoring 1.5/10.
Among the GenAI tools, the best performer was Claude 3.7 Sonnet, scoring 6/10 with naive prompts and 10/10 with security-focused prompts.
"For security teams, AI-generated code -- or vibe coding -- can feel like a nightmare," says Yossi Pik, co-founder and CTO of Backslash Security. "It creates a flood of new code and brings LLM risks like hallucinations and prompt sensitivity. But with the right controls -- like org-defined rules and a context-aware MCP server plugged into a purpose-built security platform -- AI can actually give AppSec teams more control from the start. That's where Backslash comes in, with dynamic policy-based rules, a context-sensitive MCP server, and an IDE extension built for the new coding era."
You can read more on the Backslash blog.
Image credit: meshcube/depositphotos.com