Check Point Software Technologies Ltd. (NASDAQ: CHKP), a global leader in cyber security, together with Lakera, a leading AI-native security platform for Agentic AI applications, and researchers from The UK AI Security Institute (AISI), have announced the launch of the backbone breaker benchmark (b3). This open-source framework is designed specifically to evaluate the security of large language models (LLMs) used within AI agent systems.
The b3 benchmark is based on a new concept known as threat snapshots. Rather than requiring the recreation of an AI agent’s full operational workflow, threat snapshots focus on the precise interaction points where vulnerabilities in LLM behaviour are most likely to occur. By narrowing the assessment to these key moments, developers and model providers can gain clearer insight into how their systems respond under realistic adversarial pressures, without the complexity of modelling an entire agent lifecycle.
“We built the b3 benchmark because today’s AI agents are only as secure as the LLMs that power them,” said Mateo Rojas-Carulla, Co-Founder and Chief Scientist at Lakera, a Check Point company. “Threat Snapshots allow us to systematically surface vulnerabilities that have until now remained hidden in complex agent workflows. By making this benchmark open to the world, we hope to equip developers and model providers with a realistic way to measure, and improve, their security posture.”
The evaluation framework incorporates 10 representative agent “threat snapshots” supported by a high-quality dataset of 19,433 adversarial attacks gathered from Gandalf: Agent Breaker, a gamified red-teaming environment. The benchmark measures exposure to a range of attack types, including system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service behaviours, and unauthorised tool execution.
