From CVSS to ASR: Putting a Number on AI Agent Risk

For seventeen years I scored security risk in CVSS. A vulnerability arrived, it carried a number, and that number did the work: it set the patch priority, the SLA, and the line in the audit report. When AI agents started landing inside the same networks, they arrived with no number at all. The risk was real, but it was a paragraph of hand-waving, and a paragraph never makes it onto a risk register. On 28 May 2026 that changed. Anthropic published an attack-success-rate for prompt injection, and for the first time AI risk got a CVSS-shaped number you can actually manage.

I came into AI security from the network side: PAN-OS, Check Point, Cisco, Fortinet, OT segmentation, ISO 27001 audits. That order is the whole point of this post. The discipline that makes a firewall estate governable is risk measurement, and it transfers directly to AI agents the moment someone gives you a number to measure against. Anthropic just did.

What a number does that a worry does not

CVSS is not loved, but it is useful, because it converts a vague fear into a triage decision. A 9.3 jumps the queue; a 4.0 waits for the maintenance window. The score gives a security team a defensible way to rank, schedule, and evidence its response. Strip that away and you are left with opinion, and opinion does not survive an audit or a budget meeting.

AI agents lived in exactly that opinion-only world until two weeks ago. Everyone agreed prompt injection was dangerous. Nobody could tell you how dangerous, for which configuration, against which threat model. You cannot put "dangerous" on a risk register, assign it an owner, or set a treatment threshold. A measured attack-success-rate fixes that. It is the input the AI risk conversation has been missing, and it arrives in a shape the security trade already knows how to use.

What Anthropic actually published

In the Claude Opus 4.8 system card, Anthropic reported that a browser-using agent was hijacked by an injected instruction 31.5% of the time with no safeguards, and 0.5% with safeguards on. In a coding tool-use setting against an adaptive attacker, the corresponding figures were 7.03% and 2.09%. The rates were measured against held-out injection attacks, and, tellingly, the card also publishes a red-team metric on which the new model scored worse than its predecessor. A vendor that prints its own regression gives you more reason to trust its other numbers.
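To see what the safeguards are worth in each setting, the quoted figures can be turned into a fold reduction and a relative reduction. This is a minimal sketch: the dictionary keys and function name are my own, and the coding pair is read as (no safeguards, with safeguards) by parallel with the browser pair, which is an assumption about the ordering.

```python
# Figures as quoted above, in percent: (no safeguards, with safeguards).
# ASSUMPTION: the coding pair follows the same order as the browser pair.
REPORTED_ASR = {
    "browser_use": (31.5, 0.5),
    "coding_tool_use_adaptive": (7.03, 2.09),
}


def reduction(before_pct: float, after_pct: float) -> tuple:
    """Return (fold reduction, relative reduction in percent)."""
    if after_pct <= 0:
        raise ValueError("after_pct must be positive to compute a fold reduction")
    fold = before_pct / after_pct
    relative = (before_pct - after_pct) / before_pct * 100
    return fold, relative


if __name__ == "__main__":
    for setting, (before, after) in REPORTED_ASR.items():
        fold, relative = reduction(before, after)
        print(f"{setting}: {fold:.1f}x lower, {relative:.1f}% relative reduction")
```

On these numbers the browser setting improves 63-fold while the coding setting improves a little over threefold, which is a reminder that a single headline figure does not transfer between configurations.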

The specific values matter less than their existence. An attack-success-rate now exists for AI agents, the way a CVSS score exists for a CVE. Prompt injection has topped the OWASP Top 10 for LLM applications since 2023; what was missing was any way to quantify your exposure to it. Now there is one, and it belongs in the same risk-management machinery you already run.

CVSS for the CVE, ASR for the agent

The mapping is close enough to be operational. An attack-success-rate slots into the risk workflow at the same points a CVSS does, which is exactly why a network-security background reads it instantly.

In the risk workflow	Firewall CVE (CVSS)	AI agent (attack-success-rate)
What the number scores	Exploitability and impact of a flaw	Share of injection attempts that hijack the agent
Where it comes from	Vendor advisory, NVD	Vendor system card, or your own evaluation
Why it moves	Reassessed when exploitation is seen	Shifts with safeguards, tools, and threat model
What it triggers	Patch SLA, compensating controls	Safeguards, least privilege, a human approval gate
The residual	Unpatched exposure you accept or isolate	The 0.5% that still gets through, scored by blast radius

I am not claiming an attack-success-rate is as mature as CVSS. It is not. There is no shared rubric, no NVD equivalent, no agreed harness. But neither did CVSS have those on day one. The value is not precision; it is that the conversation finally has a quantitative anchor, and a rough number you can improve beats a perfect worry you cannot.

Why the network mindset is the right lens

The instinct that makes this useful is not an AI instinct. It is the habit of asking, of everything on the network, how likely is this to be abused and how much can it reach when it is. That is the question behind every firewall rule review and every CVSS triage I have run. An AI agent with tools is just another reachable, privileged thing on the network, and the same two questions apply: what is its attack-success-rate, and what is its blast radius if the attack lands.

This is the throughline of my move from network security into AI security. The failures are not novel; the scoring is. I made the structural version of this argument in "Why the firewall CVE and the AI-agent breach are the same mistake", where the shared root cause was missing provenance, least privilege, and change control. This post is the measurement companion to it: once you accept the failures are the same, you measure them the same way too. The framework view sits in the OWASP LLM Top 10 for 2026.

What I would tell a board this quarter

Treat an AI agent's attack-success-rate the way you already treat a CVSS score. Put every agent that can act on a register with its number and its blast radius. Make a published attack-success-rate a procurement requirement, the same way you stopped buying network gear from vendors who would not publish advisories. If a vendor cannot give you an attack-success-rate for prompt injection, assume they have not measured it, and you cannot defend an exposure you have never sized.
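A register of this kind can be very small. The sketch below is one possible shape, not a standard: the field names, the three blast-radius tiers, and the example agents are all hypothetical, and the sort order (unmeasured agents first, then widest reach, then highest rate) is one defensible triage rule among several.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class BlastRadius(IntEnum):
    """Hypothetical three-tier scale for what a hijacked agent can reach."""
    READ_ONLY = 1        # can only read or summarise
    INTERNAL_WRITE = 2   # can change internal, reversible state
    IRREVERSIBLE = 3     # can move money, delete data, or send data out


@dataclass(frozen=True)
class AgentEntry:
    name: str
    owner: str
    asr_pct: Optional[float]   # None means nobody has measured it yet
    asr_source: str            # e.g. "vendor system card" or "own evaluation"
    blast_radius: BlastRadius


def triage_order(register: List[AgentEntry]) -> List[AgentEntry]:
    """Unmeasured agents first, then by blast radius, then by attack-success-rate."""
    return sorted(
        register,
        key=lambda e: (e.asr_pct is not None, -e.blast_radius, -(e.asr_pct or 0.0)),
    )


# Illustrative entries only; names and owners are made up.
REGISTER = [
    AgentEntry("page-summariser", "web-team", 31.5, "vendor system card", BlastRadius.READ_ONLY),
    AgentEntry("invoice-payer", "finance-ops", 0.5, "vendor system card", BlastRadius.IRREVERSIBLE),
    AgentEntry("helpdesk-bot", "it-support", None, "not measured", BlastRadius.INTERNAL_WRITE),
]

if __name__ == "__main__":
    for entry in triage_order(REGISTER):
        print(entry.name, entry.asr_pct, entry.blast_radius.name)
```

Sorting unmeasured agents to the top encodes the procurement point directly: an agent with no number is treated as the most urgent line on the register, not the least.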

Then set a threshold and a treatment, exactly as you would for a CVE. Scope the agent's tools and credentials to the task, gate irreversible actions behind human approval, and deny egress by default so a successful hijack cannot exfiltrate. That is the same risk-quantification-to-control loop I have applied across hundreds of firewall audits, and I cover the detection side in "AI threat detection for CISOs".
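The three treatments above can be expressed as a single authorization check that sits between the agent and its tools. This is a sketch under stated assumptions, not any vendor's API: the policy fields, the tool names, the hostnames, and the string verdicts are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AgentPolicy:
    allowed_tools: FrozenSet[str]                      # least privilege: task scope only
    irreversible_tools: FrozenSet[str]                 # actions that need a human gate
    egress_allowlist: FrozenSet[str] = frozenset()     # empty means deny all egress


def authorize(
    policy: AgentPolicy,
    tool: str,
    destination: Optional[str] = None,
    human_approved: bool = False,
) -> str:
    """Return 'allow', 'hold: ...', or 'deny: ...' for one requested tool call."""
    if tool not in policy.allowed_tools:
        return "deny: tool out of scope"
    if destination is not None and destination not in policy.egress_allowlist:
        return "deny: egress not allowlisted"
    if tool in policy.irreversible_tools and not human_approved:
        return "hold: needs human approval"
    return "allow"


# Hypothetical policy for a support agent.
POLICY = AgentPolicy(
    allowed_tools=frozenset({"read_ticket", "issue_refund", "post_webhook"}),
    irreversible_tools=frozenset({"issue_refund"}),
    egress_allowlist=frozenset({"tickets.internal.example"}),
)

if __name__ == "__main__":
    print(authorize(POLICY, "delete_user"))
    print(authorize(POLICY, "post_webhook", destination="attacker.example"))
    print(authorize(POLICY, "issue_refund"))
    print(authorize(POLICY, "issue_refund", human_approved=True))
```

The check is deliberately outside the model: a hijacked agent can ask for anything, but the verdict does not depend on what the prompt says.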

The number is the start, not the end

A measured risk is not a solved one. Safeguards cut Anthropic's browser-use rate from 31.5% to 0.5%, a 63-fold reduction, but 0.5% is not zero. Run an agent through thousands of actions a day, with even a small share of them touching attacker-controlled content, and that residual is a steady trickle of successful attacks, not an edge case. The number tells you where you stand; it does not discharge the work. Score the consequence as carefully as the probability, because a 0.5% hijack that can move money is worse than a 31.5% one that can only summarise a page.
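The residual can be sized with back-of-envelope arithmetic. An attack-success-rate is a rate per injection attempt, not per action, so the sketch below needs one extra input that the system card does not supply: the share of actions that meet attacker-controlled content. That share, the daily action count, and the independence of attempts are all assumptions for illustration.

```python
def expected_hijacks_per_day(actions_per_day: int, injection_exposure: float, asr: float) -> float:
    """Expected successful hijacks per day.

    injection_exposure: ASSUMED fraction of actions that carry an injection attempt.
    asr: attack-success-rate as a fraction (0.005 for 0.5%).
    """
    return actions_per_day * injection_exposure * asr


def prob_at_least_one(attempts: int, asr: float) -> float:
    """Chance of at least one success in `attempts` tries, assuming independence."""
    return 1 - (1 - asr) ** attempts


if __name__ == "__main__":
    # Hypothetical workload: 5,000 actions a day, 2% of them exposed to injected content.
    actions, exposure, asr = 5000, 0.02, 0.005
    attempts = round(actions * exposure)
    print(f"expected hijacks/day: {expected_hijacks_per_day(actions, exposure, asr):.2f}")
    print(f"P(at least one today): {prob_at_least_one(attempts, asr):.3f}")
```

Under those assumed inputs the agent sees about 100 attempts a day, which works out to roughly one successful hijack every two days and close to a 40% chance of at least one on any given day.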

The honest takeaway is the one every audit reaches. The discipline that protects you is built before the incident, not during it. Inventory your agents, attach a number to each, define the treatment, and re-measure on every model and prompt change. None of that is new. It is the firewall playbook, redrawn for a surface that did not exist when the playbook was written.
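Re-measuring only helps if you know how much to trust your own number. When the rate comes from an in-house evaluation with a few hundred injection attempts, a confidence interval is worth recording next to it. The sketch below uses the Wilson score interval, a standard choice for small proportions; the function name and the sample counts are illustrative, not taken from any published harness.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical in-house run: 1 successful hijack in 200 injection attempts.
    low, high = wilson_interval(1, 200)
    print(f"measured 0.50%, 95% interval {low * 100:.2f}% to {high * 100:.2f}%")
```

One hijack in 200 attempts reads as 0.5%, but the interval stretches to nearly 3%, so a small evaluation cannot by itself show that a model or prompt change left the rate unchanged.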

If you are deploying AI agents and cannot put a number on how exposed they are, that gap is measurable now rather than after an incident. I run engagements that bring enterprise risk-scoring discipline onto AI workloads: measure the attack-success-rate, score the blast radius, and treat it like any other line on the register. Request a review.