Armis Launches Report Highlighting Security Gaps in AI Development


Research reveals 100% of leading generative AI models fail to generate secure code for critical development scenarios

Armis, the cyber exposure management & security company, is warning that the rapid enterprise adoption of AI-native development is outpacing critical security safeguards, leaving organizations exposed to systemic vulnerabilities.

New research from Armis Labs’ Trusted Vibing Benchmark Report, which evaluates 18 leading generative AI models across 31 test scenarios, reveals a 100% failure rate in generating secure code. These vulnerabilities are most prevalent in high-risk areas such as memory buffer overflows, file uploads and authentication systems. The findings underscore the need for organizations to implement AI-native application security controls to reduce risk.


“The era of vibe coding is here, but speed should not come at the cost of security,” said Nadir Izrael, CTO and Co-Founder of Armis. “Our research finds that the worst offenders are the same ones selling security solutions for the very vulnerabilities their models create. If the industry continues to integrate autonomous code without oversight, we aren’t just halting velocity – we are accelerating technical debt.”

The report identifies a concerning variance in security across the AI landscape:

  • Universal Blind Spots: Even the most advanced models produce vulnerable code in over 30% of scenarios. This is compounded by a dangerous perception gap. The 2026 Armis Cyberwarfare Report indicates that 77% of global IT decision-makers trust the integrity and security of the third-party code used in their most critical applications, despite 16% admitting they do not know if it is thoroughly checked for high-severity vulnerabilities.
  • The Performance Gap: Not all models are created equal. For example, Gemini 3.1 Pro emerges as a leader in security posture, while older proprietary models show significantly higher vulnerability counts and a lack of baseline security guardrails.
  • Cost vs. Security: A higher cost does not necessarily mean better safety. Low-cost open-source models, such as Qwen 3.5 and Minimax M2.5, provide highly competitive security performance at a fraction of the price.

“Organizations are currently playing a subjective guessing game with AI-generated code,” added Izrael. “To effectively move forward, application security must evolve from ‘scanner management’ to true ‘risk management.’ Security teams need to stop drowning in signal noise and start using AI-native controls that can prioritize findings based on real business impact.”


The Trusted Vibing Benchmark Report, which the Armis Labs team will update regularly, measures how well leading commercial and open-source AI models generate secure code and resist producing critical vulnerabilities across a range of scenarios. The methodology focuses on four core areas: testing generated code at the level of “atomic” features or functions, the choice of prompt, the choice of test harness, and the choice of application security tool.

Armis Centrix™ for Application Security helps organizations secure their entire software supply chain through AI-powered detection, contextualization and remediation.
