Mnemom AEGIS

Cross-tenant defensive network for AI agents.

Mnemom AEGIS (Adaptive Enforcement, Governance & Intelligence Substrate) is the runtime security network behind Safe House. It screens every agent transaction at four checkpoints (front door, back door, inside.autonomy, inside.integrity), each independently configurable across four enforcement modes. Signed Managed Rules carry a P95 ≤ 30s cross-tenant propagation SLO target (first measurements publish 30 days post-GA).

AAP declares. AIP verifies in flight. CLPI governs and anchors. Safe House screens. AEGIS signs the cross-tenant defenses.

The threat model.

Seven attack patterns drive the agentic threat surface today. Each maps to one of the four checkpoints — so customers can dial enforcement per surface, not as a single global posture.

Threat | Checkpoint | What it looks like
Prompt injection | front door | Direct attempts to override the agent's instructions, role-swap, or bypass declared scope at the inbound surface.
Indirect injection | front door | Instructions hidden inside retrieved documents, tool outputs, and vector-store payloads: the prompt the agent never knew it received.
Tool misuse | inside.autonomy | Coerced or chained tool calls that exceed the Alignment Card's permitted scope. Argument-shape attacks against under-validated schemas (OWASP ASI02).
Data exfiltration | back door | PII, PHI, secrets, credentials, or cross-tenant data echoed back in agent responses, error traces, or split-token patterns.
BEC / impersonation fraud | front door | CEO-fraud style requests, urgency-and-authority pressure, and social engineering that targets the agent's escalation contract.
Agent spoofing | inside.integrity | Identity-abuse attempts that claim authority the Alignment Card does not declare (OWASP ASI03, Privilege Compromise via Identity Abuse).
Supply-chain compromise | inside.integrity | Behavioral signatures consistent with a compromised SDK, model fine-tune, or vendored prompt template, caught cross-tenant via substrate fingerprinting (OWASP ASI06).

Four checkpoints × four enforcement modes.

Every checkpoint is independently configurable. Composition is strictest-wins across Platform → Org → Team → Agent, so a stricter setting at any layer always governs. It mirrors the way Cloudflare WAF Managed Rules let you set severity × action per rule.

Checkpoints:

front door: inbound message screening of every prompt, retrieval payload, and tool response before the agent processes it.
back door: outbound response screening for PII, secrets, Alignment Card violations, and regulated advice before the response leaves the perimeter.
inside.autonomy: tool-call screening of every action the agent takes, checked against the autonomy bounds the Alignment Card declares.
inside.integrity: reasoning-integrity screening covering AIP verdicts on thinking-block payloads, substrate-deviation signatures, and identity-abuse patterns.

Modes:

off: Checkpoint disabled. Used in canary tenants and pre-onboarding.
observe: Evaluates every transaction; emits signed verdicts; never blocks. The default for new Managed Rules during the 24-hour observe soak.
nudge: Annotates or warns inline without blocking. The middle ground for tier-3 rules during ramp-up.
enforce: Blocks the transaction and surfaces a signed verdict to the dashboard. Reached only after the observe soak and FP-rate auto-rollback discipline (CLPI Phase 2).

Composition cascade: Platform → Org → Team → Agent, strictest-wins. Customer admins clamp at any layer.
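The strictest-wins cascade can be sketched in a few lines. This is a minimal illustration, not the AEGIS implementation: the strictness order off < observe < nudge < enforce, the `Mode` and `effective_mode` names, and the "off" default for a checkpoint with no configured layer are all assumptions made for the example.

```python
from enum import IntEnum


class Mode(IntEnum):
    # Assumed strictness order, least strict to most strict.
    OFF = 0
    OBSERVE = 1
    NUDGE = 2
    ENFORCE = 3


# Layers in cascade order, as named in the text above.
LAYERS = ("platform", "org", "team", "agent")


def effective_mode(settings: dict[str, Mode]) -> Mode:
    """Return the strictest mode configured at any layer; unset layers are skipped."""
    configured = [settings[layer] for layer in LAYERS if layer in settings]
    # Hypothetical default: a checkpoint with no configured layer is treated as off.
    return max(configured, default=Mode.OFF)


# One checkpoint's settings: the org clamps to enforce, so enforce governs.
front_door = {"platform": Mode.OBSERVE, "org": Mode.ENFORCE, "agent": Mode.NUDGE}
print(effective_mode(front_door).name)  # ENFORCE
```

Because each checkpoint is configured independently, a resolver like this would run once per checkpoint, so a strict back door never forces a strict front door.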

The Managed Rules pipeline.

Recipes are detection content. Managed Rules are the signed control-plane state that wraps them. The pipeline is structurally constrained — not procedurally — so tier-1 and tier-2 rules cannot auto-promote, regardless of operator-set mode.

  1. Arena

    Fifteen canonical adversarial personas probe Safe House 24/7. Mutation-phase gating activates per-bucket only when detection rate crosses 95% over a 48-hour rolling window with 24-hour hysteresis.

  2. Candidate

    Candidates that slip past the arena enter an isolated review queue with a strictly separated write path, so the system that proposes detection content can never be the same one that approves it. Customer false-negative and false-positive reports and cross-tenant network signals all flow into the same queue.

  3. Review

    Three reviewer modes — manual (default), auto-approve-trusted-sources, auto-approve-high-confidence. Tier-1 / tier-2 always require dual-control review under an append-only audit chain.

  4. 24h observe soak

    Every signed promotion lands in observe mode for 24 hours. FP-rate auto-rollback per CLPI Phase 2 retires the recipe before any production traffic is blocked.

  5. Enforce

    Tiered KV+R2+isolate-cache failover with independent signing chains pushes the rule to every gateway. P95 ≤ 30s signed-promotion → gateway-loaded.
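The observe-soak step in the pipeline above lends itself to a small decision function. The sketch below is illustrative only: the 24-hour soak comes from the text, but the 1% false-positive threshold, the `SoakStats` fields, and the three outcome labels are hypothetical, since the page does not publish the rollback criteria.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

SOAK = timedelta(hours=24)  # soak length stated in the pipeline description
MAX_FP_RATE = 0.01          # hypothetical threshold; the source does not state one


@dataclass
class SoakStats:
    promoted_at: datetime   # when the signed promotion landed in observe mode
    evaluations: int        # transactions evaluated by the rule so far
    false_positives: int    # evaluations later judged to be false positives


def soak_decision(stats: SoakStats, now: datetime) -> str:
    """Decide what happens to a rule that is currently in its observe soak."""
    fp_rate = stats.false_positives / stats.evaluations if stats.evaluations else 0.0
    if fp_rate > MAX_FP_RATE:
        # Roll back before the rule ever blocks production traffic.
        return "rollback"
    if now - stats.promoted_at < SOAK:
        return "observe"
    return "eligible_for_enforce"


promoted = datetime(2026, 1, 1, tzinfo=timezone.utc)
noisy = SoakStats(promoted_at=promoted, evaluations=10_000, false_positives=500)
print(soak_decision(noisy, promoted + timedelta(hours=2)))  # rollback
```

Checking the false-positive rate before the elapsed time means a noisy rule is retired as soon as it misbehaves, instead of waiting out the full soak.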

The protective invariant

A tier-1 or tier-2 Managed Rule — one that would actually block real production traffic — can never be promoted without two-person human review, no matter how aggressive the auto-promotion mode is set. The guarantee is enforced structurally, in the data model itself: an active rule cannot exist unless its review quorum has been met. It is a property of the system, not a procedure someone has to remember to follow.

Guaranteed by the data model, not by operator discipline.
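One way to picture a structurally enforced invariant is a record type that refuses to exist in an invalid state. The sketch below is an analogy in application code, not the actual AEGIS schema: the `ManagedRule` fields, the status strings, and the use of a constructor check in place of a database constraint are all assumptions.

```python
from dataclasses import dataclass

DUAL_CONTROL_TIERS = {1, 2}  # tiers the text says always need two-person review


@dataclass(frozen=True)
class ManagedRule:
    rule_id: str
    tier: int
    status: str                            # hypothetical states: "candidate" or "active"
    approvers: frozenset = frozenset()     # distinct human reviewer identifiers

    def __post_init__(self):
        # The invariant lives in the type: an active tier-1/tier-2 rule with
        # fewer than two distinct approvers cannot be constructed at all.
        needs_quorum = self.status == "active" and self.tier in DUAL_CONTROL_TIERS
        if needs_quorum and len(self.approvers) < 2:
            raise ValueError(f"{self.rule_id}: tier-{self.tier} needs two approvers")


ok = ManagedRule("r-1", tier=1, status="active", approvers=frozenset({"alice", "bob"}))

try:
    ManagedRule("r-2", tier=2, status="active", approvers=frozenset({"alice"}))
except ValueError as err:
    print("rejected:", err)
```

Storing approvers as a set means one reviewer approving twice still counts once, and freezing the record prevents the quorum from being stripped after activation.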

Substrate fingerprinting + supply-chain detection.

Every evaluation is stamped with a substrate fingerprint — the provider, model, and SDK version behind the request, plus an optional customer-supplied lockfile hash sent via the `X-Mnemom-Lockfile-Hash` header. AEGIS sees behavioral deviation across every customer running on the same substrate, simultaneously.
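As an illustration of how such a stamp could be derived, the sketch below hashes the four fields named above into a single stable identifier. The canonical-JSON-plus-SHA-256 scheme and the `substrate_fingerprint` function are assumptions for the example; the page does not describe how AEGIS actually encodes fingerprints.

```python
import hashlib
import json
from typing import Optional


def substrate_fingerprint(
    provider: str,
    model: str,
    sdk_version: str,
    lockfile_hash: Optional[str] = None,  # value of X-Mnemom-Lockfile-Hash, if sent
) -> str:
    """Hash the substrate fields into one stable hex digest (hypothetical scheme)."""
    fields = {"provider": provider, "model": model, "sdk_version": sdk_version}
    if lockfile_hash is not None:
        fields["lockfile_hash"] = lockfile_hash
    # Sorted keys and fixed separators keep the encoding deterministic, so
    # tenants on the same substrate produce the same fingerprint.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = substrate_fingerprint("example-provider", "example-model", "1.4.2")
b = substrate_fingerprint("example-provider", "example-model", "1.4.3")
print(a == b)  # False: a different SDK version is a different substrate
```

Grouping evaluations by a key like this is what would let an aggregator compare behavior across every tenant on one substrate and notice when a single SDK version starts to deviate.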

May 11, 2026 — the Mini Shai-Hulud worm compromised 170+ npm packages and 2 PyPI packages, including Mistral AI's SDK suite and Guardrails AI's PyPI package. The compromised `@tanstack/*` versions shipped with valid SLSA Build Level 3 attestations — the first documented case of a worm producing legitimate signed provenance for malicious packages. Per-tenant detection and package-layer Sigstore verification structurally cannot catch this class of attack.

OWASP Top 10 for Agentic Applications.

Honest mapping. Where coverage is partial, we say so. The full OWASP ASI taxonomy (Dec 2025) is at owasp.org.

OWASP category | Coverage | How AEGIS addresses it
ASI02: Tool Misuse | Full | Policy engine (CLPI Phase 1) + Managed Rules at the inside.autonomy checkpoint. Tool-call screening against the Alignment Card's declared autonomy bounds.
ASI03: Privilege Compromise via Identity Abuse | Full | AAP-declared autonomy bounds (Alignment Card) + AIP in-flight integrity verdicts + inside.integrity checkpoint screening for identity-abuse patterns.
ASI06: Agentic Supply Chain Compromise | Full (runtime) | Substrate fingerprinting on every evaluation. The cross-tenant aggregator detects behavioral deviation no single customer can see. Complements, and does not replace, package-layer provenance (SLSA, Sigstore).
ASI07: System Prompt Leakage | Partial | Back-door checkpoint screening for known system-prompt patterns + secrets and Alignment Card violations. Detection is content-based; agents that legitimately quote their system prompt at user request are not suppressed.

ASI01 (Prompt Injection), ASI04 (Resource Exhaustion), ASI05 (Cascading Hallucination), ASI08 (Repudiation & Untraceability), ASI09 (Identity Spoofing), ASI10 (Overreliance) map to other parts of the Mnemom stack (AAP cards, AIP verdicts, CLPI on-chain anchoring, Trust Ratings) — covered on /protection-network and /trust.

How AEGIS compares.

Abbreviated from the 2026-05-23 competitive landscape research. AEGIS is the network layer; the vendors below are complementary, not replacements — see /governance for the full integration story.

Capability | Mnemom AEGIS | Cloudflare WAF | Lakera Guard | Cisco AI Defense | AWS Bedrock Guardrails | Google Model Armor
Cross-tenant Managed Rules with signed promotion | Yes: Ed25519-signed, P95 ≤ 30s propagation, public audit chain | WAF Managed Rules (web-layer, not agent-layer) | Vendor-curated threat-intel; no customer-network-derived signal | Build-time SDK embed; no runtime cross-tenant network | AWS-only; no cross-customer learning | In-process filter; no network
Four-checkpoint × four-mode model per-agent | Yes: front door / back door / inside.autonomy / inside.integrity, each independently configurable | Per-route WAF rules; not agent-transaction-shaped | Single-detector at runtime | NeMo Guardrails integration; build-time policy | Bedrock Guardrails per-policy (denylist, PII, contextual grounding) | Prompt-injection + URL + harmful-content filters
Substrate fingerprinting (provider + model + SDK version) on every evaluation | Yes: cross-tenant supply-chain detection | No | No | No | No | No
Public STIX 2.1 IoC feed + signed advisories | Yes: /v1/trust/iocs (empty at GA by design) | Customer-internal Radar feeds only | No public feed | Talos for traditional threats; no public agent IoC feed | No | No
Dual-control invariant on tier-1/-2 (enforced in the data model) | Yes: schema-enforced, not procedural | Procedural change-management | Vendor-controlled | Vendor-controlled | Customer policy IAM | Vendor-controlled

Sources: vendor public documentation 2026-05-23. AEGIS is a layer customers run alongside these products, not a replacement.

SLOs published. Measured continuously.

Headline numbers below. The full table — measurement queries, historical data once the first 30-day window closes, and the four supporting SLOs — lives on /trust/slos.

Managed Rule propagation: P95 ≤ 30s. Signed promotion → gateway-loaded. Published target; first measurements 30 days post-GA.

Failover availability: 99.99%. Gateway loads a verified rule set across multiple independent read tiers.

Rule-set freshness: P99 ≤ 5 min under normal operation. P0 page at 24h stale.

First 30-day measurement window publishes 30 days post-GA. We do not pre-announce numbers we cannot defend.
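For readers who want to check a P95 target against their own observations, a nearest-rank percentile is enough. The sketch below is a generic calculation and not the measurement query published on /trust/slos; the sample latencies are made up purely to show the arithmetic.

```python
def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


SLO_SECONDS = 30.0  # propagation target stated above

# Illustrative propagation latencies in seconds (invented for the example).
latencies = [4.2, 6.1, 7.5, 8.0, 9.3, 11.0, 12.4, 14.8, 18.9, 41.0]

p95 = percentile_nearest_rank(latencies, 95)
print(p95, p95 <= SLO_SECONDS)  # 41.0 False
```

With only ten samples the P95 is the slowest observation, so one 41-second outlier breaches the target. That sensitivity is one reason a percentile SLO is only meaningful over a long measurement window.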


Bring your tools.

The IoC feed is machine-readable STIX 2.1. The audit chain is verifiable. The dashboard is open to every customer.

curl -s https://api.mnemom.ai/v1/trust/iocs | jq .
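Once fetched, the feed can be filtered with nothing but a JSON parser. The sketch below assumes the endpoint returns a standard STIX 2.1 bundle (an object of type `bundle` with an optional `objects` array); that response shape is an assumption, and the sample indicator is invented, since the feed is empty at GA.

```python
import json


def extract_indicators(bundle_json: str) -> list[dict]:
    """Return the STIX indicator objects contained in a bundle document."""
    bundle = json.loads(bundle_json)
    if bundle.get("type") != "bundle":
        raise ValueError("expected a STIX bundle")
    # "objects" is optional in STIX 2.1, so an empty feed may omit it entirely.
    return [obj for obj in bundle.get("objects", []) if obj.get("type") == "indicator"]


# Invented sample shaped like a STIX 2.1 bundle; not real feed output.
sample = json.dumps({
    "type": "bundle",
    "id": "bundle--00000000-0000-4000-8000-000000000001",
    "objects": [{
        "type": "indicator",
        "spec_version": "2.1",
        "id": "indicator--00000000-0000-4000-8000-000000000002",
        "pattern_type": "stix",
        "pattern": "[domain-name:value = 'example.invalid']",
        "valid_from": "2026-01-01T00:00:00Z",
    }],
})

for indicator in extract_indicators(sample):
    print(indicator["id"], indicator["pattern"])
```

Passing the body of the `curl` call above into `extract_indicators` would yield an empty list while the feed is empty, so downstream tooling can be wired up before the first advisory ships.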