Proving Ground · AEGIS adversarial arena

Fifteen adversaries attack our defense around the clock. Every breakthrough makes it stronger.

Each persona specializes in a different class of attack (prompt injection, social engineering, data exfiltration, supply-chain compromise) and works across all four checkpoints without pause. When one finds a gap, that gap is filed for review and, once approved, becomes a cryptographically signed detection that protects every agent on the network, not just the arena.

The roster

Fifteen adversaries, every threat across every checkpoint

Each of our fifteen named adversaries owns one cell of the threat matrix — a specific class of attack at a specific checkpoint. The Substrate Mole, for instance, runs supply-chain attacks at the integrity checkpoint. As the defense hardens, each adversary mutates within its specialty — probing deeper, not wandering — so coverage stays complete and pressure stays focused.

front door

Front door — inbound prompts and tool results: prompt injection, indirect injection, CEO-fraud, jailbreaks, multi-turn hijacks, agent spoofing, multilingual variants, and burst noise.

inside.autonomy

Autonomy checkpoint — what the agent is allowed to do: probing tool boundaries, undeclared tool calls, and actions beyond the agent's mandate.

inside.integrity

Integrity checkpoint — how the agent reasons: reasoning-chain integrity, supply-chain compromise (The Substrate Mole), prompt-laundering, and value drift.

back door

Back door — outbound responses: data-loss prevention, credential and canary exfiltration, PII/PHI leakage, and system-prompt leaks.

Every adversary maps to one fixed threat × checkpoint cell. Mutation explores within that cell and never crosses into another, so the matrix stays fully and evenly covered.

Adaptive pressure

When the defense starts winning, the attacks evolve

The arena spends most of its time hunting for new ways through. But when the defense reliably catches a given class of attack, the arena shifts gears there — it stops repeating known attacks and starts mutating them, pressure-testing the patched defense instead of resting on it. Each class of attack is tracked on its own, so the arena can be hardening one area while still exploring another.

Entry threshold

95% caught, sustained

Measured over enough volume (180–360 probes) to react fast without chasing noise.

Window

48-hour rolling

Continuous look-back, not calendar buckets.

Entry hysteresis

24-hour sustained

A spike won't trip it — the rate has to hold for a full day.

Exit hysteresis

24-hour sustained

A bucket must stay below the exit threshold for 24 hours before returning to find-mode.

Exit threshold

90% caught

Set below the entry bar so the system doesn't flap at the boundary.

Per-bucket independence

by substrate, industry, pattern, and source

A finance agent might be hardening against CEO-fraud while still exploring prompt injection — each tracked on its own.
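The thresholds above describe a hysteresis gate, which can be sketched roughly as follows. This is an illustrative sketch, not the arena's actual code: the class name `BucketGate`, the mode label "harden", and the hourly update cadence are assumptions, and the caller is assumed to supply a catch rate already computed over the 48-hour rolling window with enough probe volume.

```python
from dataclasses import dataclass
from typing import Optional

ENTRY_RATE = 0.95   # catch rate needed to leave find-mode
EXIT_RATE = 0.90    # catch rate below which a bucket may return to find-mode
HOLD_HOURS = 24     # both transitions must be sustained this long


@dataclass
class BucketGate:
    """Hysteresis gate for one independently tracked bucket."""
    mode: str = "find"                     # "find" or "harden" (hypothetical label)
    pending_since: Optional[float] = None  # hour at which the current streak began

    def update(self, now_hours: float, catch_rate: float) -> str:
        """Feed the bucket's rolling-window catch rate; return the resulting mode."""
        if self.mode == "find":
            crossing = catch_rate >= ENTRY_RATE
        else:
            crossing = catch_rate < EXIT_RATE

        if not crossing:
            self.pending_since = None          # streak broken, start over
        elif self.pending_since is None:
            self.pending_since = now_hours     # streak begins
        elif now_hours - self.pending_since >= HOLD_HOURS:
            self.mode = "harden" if self.mode == "find" else "find"
            self.pending_since = None
        return self.mode
```

Keeping one gate per bucket gives the independence described above: a rate between 90% and 95% changes nothing in either mode, which is what stops the system from flapping at the boundary.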

Honest framing

This adaptive gating is built and shipping in our arena. We'll report the first time it activates in production on /trust/advisories — we don't count code-complete as proven in the wild.

Least privilege by design

The arena can find a weakness. It can't ship a fix by itself.

The arena runs with its own narrowly-scoped identity. It can do exactly one thing: file a candidate detection for human review. It cannot promote a rule, edit a live one, retire one, or touch anything else. Every candidate is stamped — by the server, never the client — with where it came from, so a finding from the arena can never masquerade as a customer report or an operator action.

writer_identity = arena-bypass
auth            = ARENA_RECIPE_CANDIDATE_TOKEN
write_scope     = recipe_candidates only
read_scope      = none (no live-rule visibility)
promote_scope   = none (separate reviewer auth required)

Scope

The arena's credentials open exactly one door — filing a candidate for review. No read access to live rules. No write access to anything else.
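As a rough illustration, the scope declaration above amounts to a deny-by-default check. The function name `is_allowed` and the resource name `live_rules` are hypothetical; only `recipe_candidates` comes from the declaration itself.

```python
# Scopes mirrored from the arena identity's declaration. Any resource name
# other than "recipe_candidates" used with this table is hypothetical.
ARENA_SCOPES = {
    "write": {"recipe_candidates"},
    "read": set(),
    "promote": set(),
}


def is_allowed(scopes: dict, action: str, resource: str) -> bool:
    """Deny by default: allow only an explicitly granted action/resource pair."""
    return resource in scopes.get(action, set())
```

With this table, `is_allowed(ARENA_SCOPES, "write", "recipe_candidates")` is the only call that returns True; reads, promotions, and unknown actions all fall through to a denial.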

Server-stamped origin

Every candidate's source is set by the server from the authenticated identity, never by the client. The arena can't impersonate a customer report or an operator.
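A minimal sketch of server-side stamping, assuming a simple lookup from authenticated identity to source label. Only the `arena-bypass` identity appears on this page; the other identity names, the `Candidate` type, and `file_candidate` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from authenticated writer identity to stamped source.
SOURCE_BY_IDENTITY = {
    "arena-bypass": "arena",
    "customer-report": "customer",    # hypothetical identity name
    "operator-console": "operator",   # hypothetical identity name
}


@dataclass(frozen=True)
class Candidate:
    pattern: str
    source: str


def file_candidate(authenticated_identity: str, payload: dict) -> Candidate:
    """Build a candidate, taking the source from the identity, never the payload."""
    source = SOURCE_BY_IDENTITY.get(authenticated_identity)
    if source is None:
        raise PermissionError(
            f"identity {authenticated_identity!r} may not file candidates"
        )
    # Any client-supplied "source" key in the payload is deliberately ignored.
    return Candidate(pattern=payload["pattern"], source=source)
```

Because the payload's own `source` field is never read, a request authenticated as the arena is stamped as an arena finding even if it claims to be a customer report.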

Append-only audit

Every action on every rule is recorded in an append-only log: who did what, and when. Promoting an arena finding still requires a separately authenticated human reviewer.

Dual control, enforced

Rules that can block real production traffic can never auto-promote — no matter the source or the settings. Two authenticated reviewers are required, and that requirement is enforced by the database itself, not by a process that could be skipped.

Promotion integrity is enforced at the data layer, not by convention.
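One way to picture enforcement at the data layer is a table constraint that refuses a blocking-rule promotion without two distinct reviewers. This sketch uses SQLite as a stand-in; the actual database, table name, and column names are not stated on this page and are assumptions here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE promotions (
        rule_id     TEXT PRIMARY KEY,
        can_block   INTEGER NOT NULL,   -- 1 if the rule can block production traffic
        reviewer_a  TEXT NOT NULL,
        reviewer_b  TEXT,               -- second reviewer, mandatory for blocking rules
        CHECK (
            can_block = 0
            OR (reviewer_b IS NOT NULL AND reviewer_b <> reviewer_a)
        )
    )
""")


def promote(rule_id, can_block, reviewer_a, reviewer_b=None):
    """Return True if the database accepted the promotion, False if it refused."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO promotions VALUES (?, ?, ?, ?)",
                (rule_id, int(can_block), reviewer_a, reviewer_b),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The refusal comes from the constraint, not from application logic, so a caller that skips its own checks still cannot record a single-reviewer promotion of a blocking rule.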

How a finding becomes a fix

From arena breakthrough to live protection — five reviewed steps

Five stages turn an arena breakthrough into a live protection. Every step is logged in an append-only audit trail. It's the same pipeline customer reports and our cross-tenant intelligence flow through — the arena is one of three signal sources, not a shortcut.

1 · Bypass found

An adversary gets through. The finding is filed as a candidate for review — nothing is live yet.

2 · Human review

Operators triage every candidate. By default that's a manual approval; faster auto-approval is opt-in and only ever applies to advisory rules — anything that can block real traffic always needs two humans.

3 · Signed promotion

Approved rules are cryptographically signed, and every lifecycle event (created, reviewed, signed) is recorded in the append-only audit trail.

4 · Redundant distribution

Each signed rule is written to two independently keyed stores and verified before any gateway loads it. Poisoning the rule plane would mean breaching multiple independent systems at once.
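The verification step can be sketched as a gateway-side check that both stores hold the same rule and that each copy verifies under its own key. The page does not name the signature scheme, so HMAC-SHA256 is used here purely as a stand-in; the store layout and function names are hypothetical, and a production system would more likely use asymmetric signatures.

```python
import hashlib
import hmac


def sign(rule_bytes: bytes, key: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 of the rule under one store's key."""
    return hmac.new(key, rule_bytes, hashlib.sha256).hexdigest()


def verified_rule(store_a: dict, store_b: dict, key_a: bytes, key_b: bytes):
    """Return the rule bytes only if both stores hold the same, validly signed rule."""
    for store, key in ((store_a, key_a), (store_b, key_b)):
        expected = sign(store["rule"], key)
        if not hmac.compare_digest(expected, store["sig"]):
            return None        # this copy fails verification under its own key
    if store_a["rule"] != store_b["rule"]:
        return None            # each copy is validly signed, but they disagree
    return store_a["rule"]
```

An attacker who controls one store and its key can produce a validly signed forgery there, but the gateway still refuses to load it because the second store's copy does not match.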

5 · Observe, then enforce

New rules run in observe mode for 24 hours. If their false-positive rate stays clean, they're promoted to enforcing; if not, they're rolled back — operator-confirmed today, automatic in a later phase.
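The observe-then-enforce decision can be sketched as a small state function. The 24-hour soak comes from the step above; the false-positive ceiling, the state names, and the function signature are assumptions, since this page does not state what counts as a clean rate.

```python
OBSERVE_HOURS = 24
MAX_FALSE_POSITIVE_RATE = 0.01  # hypothetical; the page does not state a value


def next_state(hours_observed: float, false_positives: int, matches: int,
               operator_confirmed: bool) -> str:
    """Decide what happens to a newly promoted rule after its observe soak."""
    if hours_observed < OBSERVE_HOURS:
        return "observe"                       # soak still running
    fp_rate = false_positives / matches if matches else 0.0
    if fp_rate <= MAX_FALSE_POSITIVE_RATE:
        return "enforce"                       # clean soak, start enforcing
    # Rollback is operator-confirmed today, so a noisy rule waits for a human.
    return "rolled_back" if operator_confirmed else "pending_rollback"
```

Moving to automatic rollback in a later phase would amount to dropping the `operator_confirmed` gate from the final branch.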

Honest framing

What this page does and does not claim.

Every load-bearing claim on this page cites a public reference. The items below are the deferrals we name on purpose: CISOs respect honest disclosure of constraints and punish the ones they have to discover for themselves.

  • Adaptive pressure-testing is built and shipping in our arena. It hasn't activated in production yet — when it does, we'll post it to /trust/advisories.

  • The arena is the lab. Bypasses found here don't count as real-world detections. A promoted detection goes through reviewer approval and a 24-hour observe soak before it enforces — what you see on /dashboard/threats is the result of that pipeline, not the arena directly.

  • At GA the advisory list shows one synthetic post-mortem, clearly labeled synthetic. The IoC feed is empty by design. The system tells the truth — we don't fake activity to look busy.

  • Advisory rules ship auto-promoted at launch. Rules that can block production traffic require two authenticated reviewers — and that requirement is enforced by the platform today.

Where to go next

From the arena to your fleet.

The arena is one of three signal sources feeding the AEGIS Protection Network — alongside customer reports and our cross-tenant intelligence. All three flow through the same signed-promotion pipeline.

Your threat thermometer

Live state for your fleet at /dashboard/threats. Calm at GA by design — when something changes, you see it first.

The full L0-L5 network

Threat identity, cross-tenant intelligence, the under-attack overlay, managed-rule distribution, your threat thermometer, and the IoC feed — the whole protection network, end to end.

Talk to us

Enterprise, regulated industries, self-hosted deployments — the parts that need a conversation, not a checkout.
