Armis Launches First-of-Its-Kind Benchmark Report Warning of Critical Security Gaps in AI-Native Development

News | 19.03.2026

SAN FRANCISCO – March 19, 2026 – Armis, the cyber exposure management & security company, is warning that the rapid enterprise adoption of AI-native development is outpacing critical security safeguards, leaving organizations exposed to systemic vulnerabilities.

New research from Armis Labs’ Trusted Vibing Benchmark Report, which evaluates 18 leading generative AI models across 31 test scenarios, reveals a 100% failure rate in generating secure code.

These vulnerabilities are most prevalent in high-risk areas such as memory buffer overflows, file uploads and authentication systems. Given that exposure, Armis recommends that organizations implement AI-native application security controls without delay to reduce risk.
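To make the buffer-overflow category concrete, the sketch below shows the kind of unbounded string copy that code assistants are often observed to emit, next to a bounded alternative. This is an illustrative example only, not code from the Armis report; the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Vulnerable pattern: strcpy() performs no bounds check, so any
   input of 64 bytes or more overflows `name` on the stack. */
void greet_unsafe(const char *input) {
    char name[64];
    strcpy(name, input);              /* buffer overflow risk */
    printf("Hello, %s\n", name);
}

/* Hardened helper: copy at most dstsz-1 bytes and always
   NUL-terminate, truncating oversized input instead of overflowing. */
void safe_copy(char *dst, size_t dstsz, const char *src) {
    if (dstsz == 0) return;
    strncpy(dst, src, dstsz - 1);
    dst[dstsz - 1] = '\0';
}

void greet_safe(const char *input) {
    char name[64];
    safe_copy(name, sizeof name, input);
    printf("Hello, %s\n", name);
}
```

The difference is exactly the kind of detail a static application security tool flags and an unguarded generative model can miss: the safe variant bounds every copy to the destination size and guarantees NUL termination.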

The era of vibe coding is here, but speed should not come at the cost of security. Our research finds that the worst offenders are the same vendors selling security solutions for the very vulnerabilities their models create. If the industry continues to integrate autonomous code without oversight, we aren't just gaining velocity; we are accelerating technical debt.

Nadir Izrael, CTO and Co-Founder of Armis

The report identifies a concerning variance in security across the AI landscape:

  • Universal Blind Spots: Even the most advanced models produce vulnerable code in over 30% of scenarios. This is compounded by a dangerous perception gap. The 2026 Armis Cyberwarfare Report indicates that 77% of global IT decision-makers trust the integrity and security of the third-party code used in their most critical applications, despite 16% admitting they do not know if it is thoroughly checked for high-severity vulnerabilities.
  • The Performance Gap: Not all models are created equal. For example, Gemini 3.1 Pro emerges as a leader in security posture, while older proprietary models show significantly higher vulnerability counts and a lack of baseline security guardrails.
  • Cost vs. Security: A higher cost does not necessarily mean better safety. Low-cost open-source models, such as Qwen 3.5 and Minimax M2.5, provide highly competitive security performance at a fraction of the price.

Organizations are currently playing a subjective guessing game with AI-generated code. To effectively move forward, application security must evolve from ‘scanner management’ to true ‘risk management.’ Security teams need to stop drowning in signal noise and start using AI-native controls that can prioritize findings based on real business impact.

Nadir Izrael, CTO and Co-Founder of Armis

The Trusted Vibing Benchmark Report, which Armis Labs will update regularly, measures how well leading commercial and open-source AI models generate secure code and resist producing critical vulnerabilities across a range of scenarios. It varies four core dimensions: testing generated code at the level of "atomic" features or functions, the choice of prompt, the choice of test harness, and the choice of application security tool.

Armis Centrix™ for Application Security helps organizations secure their entire software supply chain through AI-powered detection, contextualization and remediation.

While AI-native development and services like Google Cloud AI offer transformative potential for modern enterprises, the Armis report reminds us that innovation is only as strong as the security visibility supporting it. The key is not to slow down adoption, but to ensure that every AI asset and LLM interaction is accounted for within your security perimeter.

Softprom is ready to help you navigate this evolving landscape, ensuring your organization leverages the full power of AI without compromising on data integrity or infrastructure safety.

Contact our experts for a consultation to discuss how to align your AI initiatives with a robust cybersecurity strategy.