Researchers broke every AI defense they tested. Here are 7 questions to ask vendors.

1 2 minutes read

Security teams are facing a significant challenge with AI defenses that are failing to protect against modern threats. Recent research from OpenAI, Anthropic, and Google DeepMind has revealed that most AI security products are not effective against adaptive attacks. The paper titled “The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections” tested 12 AI defenses and found that they had bypass rates above 90% under adaptive attack conditions.

The study focused on prompting-based, training-based, and filtering-based defenses, all of which collapsed under the pressure of adaptive attacks. Prompting defenses had success rates ranging from 95% to 99%, while training-based methods had bypass rates of 96% to 100%. This highlights a critical issue: most AI security products are being tested against attackers that do not behave like real attackers.

One of the key reasons for the failure of these defenses is that they are stateless, while AI attacks are not. Modern prompt injection techniques, such as Crescendo and Greedy Coordinate Gradient (GCG), exploit this mismatch by breaking requests into fragments, automating attacks, and obfuscating malicious payloads. These attacks operate at the semantic layer, making signature-based detection ineffective.

The rapid deployment of AI in enterprise applications, as predicted by Gartner, is exacerbating the problem. Attackers are leveraging AI to execute sophisticated cyber operations with minimal human involvement, bypassing traditional endpoint defenses. The shift towards AI-based attacks is challenging security teams to keep pace with evolving threats.

Four distinct attacker profiles are already exploiting the gaps in AI defenses. External adversaries, malicious B2B clients, compromised API consumers, and negligent insiders are leveraging adaptive attack techniques to breach security controls. These attackers are adapting their approaches to bypass defenses that fail to detect multi-step attacks, encoded payloads, and context across conversation turns.

To address these challenges, security leaders must ask critical questions when evaluating AI security vendors. Questions about bypass rates against adaptive attackers, detection of multi-turn attacks, handling of encoded payloads, and context tracking are essential. Vendors must demonstrate their ability to adapt defenses against novel attack patterns and update them in a timely manner.

In conclusion, the research findings underscore the urgent need for enterprises to reassess their AI security controls. The gap between the rapid deployment of AI technologies and the stagnant state of security defenses poses a significant risk. Security teams must proactively address these vulnerabilities to prevent data breaches and protect sensitive information.