Lakera Introduces Open-Source Security Benchmark to Assess LLM Vulnerabilities in AI Agents

Check Point Software Technologies Ltd. (NASDAQ: CHKP), a global leader in cyber security, has announced the launch of the backbone breaker benchmark (b3) together with Lakera, a leading AI-native security platform for Agentic AI applications, and researchers from the UK AI Security Institute (AISI). This open-source framework is designed specifically to evaluate the security of large language models (LLMs) used within AI agent systems.

The b3 benchmark is based on a new concept known as threat snapshots. Rather than requiring the recreation of an AI agent’s full operational workflow, threat snapshots focus on the precise interaction points where vulnerabilities in LLM behaviour are most likely to occur. By narrowing the assessment to these key moments, developers and model providers can gain clearer insight into how their systems respond under realistic adversarial pressures, without the complexity of modelling an entire agent lifecycle.

“We built the b3 benchmark because today’s AI agents are only as secure as the LLMs that power them,” said Mateo Rojas-Carulla, Co-Founder and Chief Scientist at Lakera, a Check Point company. “Threat Snapshots allow us to systematically surface vulnerabilities that have until now remained hidden in complex agent workflows. By making this benchmark open to the world, we hope to equip developers and model providers with a realistic way to measure, and improve, their security posture.”

The evaluation framework incorporates 10 representative agent “threat snapshots” supported by a high-quality dataset of 19,433 adversarial attacks gathered from Gandalf: Agent Breaker, a gamified red-teaming environment. The benchmark measures exposure to a range of attack types, including system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service behaviours, and unauthorised tool execution.
