Prompt Injection Isn’t a Mystery — It’s Measurable

1. The Objection

“Prompt injection sounds scary — but how do you even measure it?”

That’s the objection I hear most often. And it reminds me of Douglas Hubbard’s point in How to Measure Anything in Cybersecurity Risk: intangibility is a myth. Uncertainty doesn’t mean unmeasurable. Startups don’t need more hand-waving about “AI risk.” They need ranges, evidence, and numbers that answer practical questions like: do we spend $5,000 on defences today, or risk $50,000 on an incident tomorrow?

2. What Prompt Injection Actually Is

Prompt injection comes in two flavors. Direct injection is when a user slips malicious instructions into the model input. Indirect injection hides malicious instructions in external data — a poisoned web page, a booby-trapped PDF, or even a snippet in a database the model reads. OWASP ranks prompt injection as the number one risk in its 2025 Top 10 for LLM applications. And it’s not hypothetical: Anthropic’s Claude browser extension flaw showed how attackers can smuggle instructions through seemingly trusted sites.

3. Why “Unmeasurable AI Risk” Is a Myth

Hubbard’s argument applies here: everything can be measured if you define it. For prompt injection, the measurable parts are:

  • Likelihood: the percentage of crafted prompts that succeed.
  • Impact: the cost of what happens when they do (a leak, a hijacked workflow, a lost deal).
  • Exposure: how often your system interacts with untrusted input.

If you test with 100 malicious prompts and 40 succeed, you’ve quantified likelihood as 40%. If a successful attack could cost $250,000, your expected risk is:

$$ 0.40 \times 250{,}000 = 100{,}000 $$

That number is something a founder or a board can understand.

4. What the Research Shows

This isn’t just theory. Researchers at USENIX Security 2024 tested five attack techniques across ten models and seven tasks, with success rates hitting as high as 70%. Benchmarks like GenTel-Safe compiled over 84,000 attack samples, showing how prompt injection can be tested systematically, not just anecdotally.

Detection is improving but far from perfect. Known-answer detectors miss about 20%. Newer approaches like “Attention Tracker” improve detection by about 10% AUROC over baselines, and frameworks like PromptShield show further optimization. The evidence is clear: prompt injection is quantifiable, repeatable, and hard to ignore.

5. A Practical Measurement Model for Startups

You don’t need a research lab to measure. A founder’s playbook could look like this:

  1. Run 10–100 crafted prompts. Count how many succeed. That’s your likelihood range.
  2. Define your impact categories: nuisance (minor reputation hit), leak (customer concern), hijack (real financial loss). Assign dollar values from downtime, breach costs, or contract size.
  3. Multiply likelihood × impact. Example: 40% × $250,000 = $100,000 expected risk.
  4. Add defences — such as detection filters or allow/deny policies — and re-test. If success drops from 40% to 10%, your expected risk drops to $25,000. If controls cost $20,000, you’ve bought an $80,000 risk reduction.

That’s ROI you can explain without hand-waving.

6. The Takeaway

Prompt injection isn’t a black box. It’s a risk you can measure. Just as Hubbard argued, uncertainty isn’t a reason to do nothing — it’s a reason to measure. For startups, putting numbers around prompt injection isn’t just security hygiene. It’s a way to build trust with investors, customers, and your own team. And that trust is the real currency that lets you move fast without gambling blind.


References