Anthropic’s browser agent got hijacked 31.5% of the time before safeguards engaged

9 2 minutes read

The latest findings from frontier labs reveal that Anthropic has the highest prompt injection figures compared to other leading AI companies. In a recent test of its newest model, an attacker managed to hijack it 31.5% of the time before safeguards kicked in. This number stands out as a potential security risk, but in reality, it provides valuable insights into the model’s resilience against attacks.

Unlike other labs like OpenAI, Google, and Meta, Anthropic took a comprehensive approach to measuring prompt injection across four different surfaces. Each lab had its own unique methodology for testing and reporting these figures, leading to discrepancies in the results. Prompt injection involves hiding malicious instructions within a document or webpage that can lead to unauthorized data access or actions.

Carter Rees, VP of AI at Reputation, highlighted the challenges posed by prompt injection, noting that it can bypass traditional security measures due to its unique nature. Adam Meyers, Senior Vice President of Counter Adversary Operations at CrowdStrike, emphasized the importance of protecting AI models against misuse, data poisoning, and prompt injection to prevent security breaches.

Anthropic’s Opus 4.8 model measured prompt injection success rates across different surfaces, showing varying levels of vulnerability. For example, the model was more susceptible to attacks in a browser environment compared to a coding environment. By analyzing these results, security teams can better understand the risks associated with deploying AI models in different contexts.

In contrast, OpenAI focused on measuring prompt injection in one specific area, connectors, using known attack vectors. Google and Meta took different approaches to prompt injection testing, with Google emphasizing resistance without providing specific numbers, and Meta relying on separate security tools like LlamaFirewall to protect against attacks.

To help security teams make informed decisions, a Cross-Vendor Prompt Injection Disclosure Grid was created to compare the testing methodologies and results from different labs. By analyzing this grid, security teams can gain a better understanding of how each lab approaches prompt injection testing and what the results mean in practice.

In conclusion, prompt injection testing is essential for evaluating the security of AI models, but there is currently no industry standard for measuring these risks. Security teams should carefully review the testing methodologies and results provided by AI vendors to ensure that deployed models are adequately protected against prompt injection attacks. By following best practices and conducting independent testing, organizations can mitigate the risks associated with prompt injection and safeguard their AI deployments.