Popular LLMs produce insecure code by default


A new study from Backslash Security examines seven current versions of OpenAI's GPT, Anthropic's Claude and Google's Gemini to test how different prompting techniques influence their ability to produce secure code.
Three tiers of prompting techniques, ranging from 'naive' to 'comprehensive,' were used to generate code for everyday use cases, and the output was measured by its resilience against 10 Common Weakness Enumeration (CWE) use cases. The results show that although the rate of secure output rises with prompt sophistication, all of the LLMs produced insecure code by default.
In response to simple, 'naive' prompts, every LLM tested generated insecure code vulnerable to at least four of the 10 common CWEs. Naive prompts simply asked the model to generate code for a specific application, without specifying any security requirements.
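To make the failure mode concrete, here is a rough sketch (not taken from the report) of the kind of flaw the study describes. A naively prompted model will often interpolate user input directly into a SQL query, producing a classic injection weakness (CWE-89), while a security-aware version binds the value as a parameter instead. The function names and schema below are hypothetical.

```python
import sqlite3

def find_user_naive(conn: sqlite3.Connection, username: str):
    # Typical naive-prompt output: user input is concatenated into the
    # query string, so input like "x' OR '1'='1" returns every row (CWE-89).
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_parameterized(conn: sqlite3.Connection, username: str):
    # Security-aware variant: the driver binds the value as data, not SQL,
    # which is the behavior OWASP guidance calls for.
    query = "SELECT id, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()
```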
Prompts that specified a general need for security produced more secure results, and prompts that requested code compliant with Open Web Application Security Project (OWASP) best practices did better still, yet both approaches left vulnerabilities in the code produced by five of the seven LLMs tested.
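As a rough illustration of the three tiers (these are not the study's actual prompts, which are not reproduced in this article), the wording might escalate roughly as follows.

```python
# Hypothetical examples of the three prompt tiers used to request the same code.
NAIVE_PROMPT = "Write a Flask endpoint that looks up a user by username."

SECURITY_AWARE_PROMPT = (
    "Write a Flask endpoint that looks up a user by username. "
    "Make sure the code is secure."
)

OWASP_PROMPT = (
    "Write a Flask endpoint that looks up a user by username. "
    "Follow OWASP secure coding best practices, including input validation "
    "and protection against injection."
)
```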
Overall, OpenAI's GPT-4o performed worst across all prompts, scoring 1/10 for secure code with 'naive' prompts. Even when prompted to generate secure code, it still produced output vulnerable to eight of the 10 issues. GPT-4.1 didn't fare much better with naive prompts, scoring 1.5/10.
Among the GenAI tools, the best performer was Claude 3.7 Sonnet, scoring 6/10 with naive prompts and 10/10 with security-focused prompts.
"For security teams, AI-generated code -- or vibe coding -- can feel like a nightmare," says Yossi Pik, co-founder and CTO of Backslash Security. "It creates a flood of new code and brings LLM risks like hallucinations and prompt sensitivity. But with the right controls -- like org-defined rules and a context-aware MCP server plugged into a purpose-built security platform -- AI can actually give AppSec teams more control from the start. That's where Backslash comes in, with dynamic policy-based rules, a context-sensitive MCP server, and an IDE extension built for the new coding era."
You can read more on the Backslash blog.
Image credit: meshcube/depositphotos.com