> **Agent?** Fastest path: MCP at `https://api.mnemom.ai/mcp` — call `get_started` first (zero-auth, no args). Full agent guide: <https://www.mnemom.ai/agents.txt>

# AEGIS — Cross-tenant defensive network for AI agents

```json
{"@context":"https://schema.org","@type":"WebPage","name":"AEGIS \u2014 Cross-tenant defensive network for AI agents","description":"Mnemom AEGIS is the cross-tenant defensive network behind Safe House. Four checkpoints, four enforcement modes, signed Managed Rules, a sub-30s P95 propagation target, and a public STIX 2.1 IoC feed.","url":"https://www.mnemom.ai/security/","inLanguage":"en-US","dateModified":"2026-06-26","publisher":{"@type":"Organization","@id":"https://www.mnemom.ai#organization","name":"Mnemom","url":"https://www.mnemom.ai"}}
```

Mnemom AEGIS — Adaptive Enforcement, Governance & Intelligence Substrate — is the runtime security network behind Safe House. It screens every agent transaction at four checkpoints — front door, back door, inside.autonomy, inside.integrity — each independently configurable across four enforcement modes. Signed Managed Rules carry a sub-30s P95 cross-tenant propagation SLO target (first measurements publish 30 days post-GA).

AAP declares. AIP verifies in flight. CLPI governs and anchors. Safe House screens. AEGIS signs the cross-tenant defenses.

[Customer dashboard](/dashboard) · [curl /v1/trust/iocs](https://api.mnemom.ai/v1/trust/iocs) · [Contact sales](/contact)

## The threat model.

Seven attack patterns drive the agentic threat surface today. Each maps to one of the four checkpoints — so customers can dial enforcement per surface, not as a single global posture.

| Threat | Checkpoint | What it looks like |
| --- | --- | --- |
| Prompt injection | `front door` | Direct attempts to override the agent's instructions, role-swap, or bypass declared scope at the inbound surface. |
| Indirect injection | `front door` | Instructions hidden inside retrieved documents, tool outputs, and vector-store payloads: the prompt the agent never knew it received. |
| Tool misuse | `inside.autonomy` | Coerced or chained tool calls that exceed the agent's declared autonomy bounds or violate the org's protection-card protected surface (forbidden ops, protected assets). Argument-shape attacks against under-validated schemas (OWASP ASI02). |
| Data exfiltration | `back door` | PII, PHI, secrets, credentials, or cross-tenant data echoed back in agent responses, error traces, or split-token patterns. |
| BEC / impersonation fraud | `front door` | CEO-fraud-style requests, urgency-and-authority pressure, and social engineering that targets the agent's escalation contract. |
| Agent spoofing | `inside.integrity` | Identity-abuse attempts that claim authority the Alignment Card does not declare (OWASP ASI03, Identity & Privilege Abuse). |
| Supply-chain compromise | `inside.integrity` | Behavioral signatures consistent with a compromised SDK, model fine-tune, or vendored prompt template, caught cross-tenant via substrate fingerprinting (OWASP ASI04). |

## Four checkpoints × four enforcement modes.

Every checkpoint is independently configurable. Composition is strictest-wins across Platform → Org → Team → Agent, so a stricter setting at any layer always governs. It mirrors the way Cloudflare WAF Managed Rules let you set severity × action per rule.

| Checkpoint | What it screens |
| --- | --- |
| `front door` | Inbound message screening: every prompt, retrieval payload, and tool response before the agent processes it. |
| `back door` | Outbound response screening: PII, secrets, Alignment Card violations, and regulated advice before the response leaves the perimeter. |
| `inside.autonomy` | Tool-call screening: every action the agent takes, checked against the autonomy bounds the Alignment Card declares and the org's protection-card protected surface (forbidden ops, protected assets). |
| `inside.integrity` | Reasoning-integrity screening: AIP verdicts on thinking-block payloads, substrate-deviation signatures, and identity-abuse patterns. |

| Mode | Behavior |
| --- | --- |
| `off` | Checkpoint disabled. Used in canary tenants and pre-onboarding. |
| `observe` | Evaluates every transaction; emits signed verdicts; never blocks. The default for new Managed Rules during the 24-hour observe soak. |
| `nudge` | Annotates or warns inline without blocking. The middle ground for tier-3 rules during ramp-up. |
| `enforce` | Blocks the transaction and surfaces a signed verdict to the dashboard. Reached only after the observe soak and FP-rate rollback discipline (operator-confirmed today, automatic in CLPI Phase 2). |

Composition cascade: Platform → Org → Team → Agent, strictest-wins. Customer admins clamp at any layer.
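The strictest-wins cascade can be sketched in a few lines of Python. This is an illustrative model only, not Mnemom's implementation: the `Mode` ordering (off < observe < nudge < enforce), the `effective_mode` helper, the layer keys, and the fallback to `OFF` when no layer sets a mode are all assumptions made for the example.

```python
from enum import IntEnum
from typing import Optional


class Mode(IntEnum):
    """Enforcement modes, ordered from least to most strict (assumed ordering)."""
    OFF = 0
    OBSERVE = 1
    NUDGE = 2
    ENFORCE = 3


# Layer order taken from the cascade: Platform -> Org -> Team -> Agent.
LAYERS = ("platform", "org", "team", "agent")


def effective_mode(settings: dict[str, Optional[Mode]]) -> Mode:
    """Return the strictest mode set at any layer; unset layers are skipped."""
    chosen = [settings[layer] for layer in LAYERS if settings.get(layer) is not None]
    # Falling back to OFF when nothing is set is an assumption of this sketch.
    return max(chosen, default=Mode.OFF)


# Hypothetical per-layer settings for one checkpoint.
front_door = {
    "platform": Mode.OBSERVE,
    "org": Mode.ENFORCE,
    "team": None,
    "agent": Mode.NUDGE,
}
print(effective_mode(front_door).name)  # ENFORCE
```

Because the result is a maximum over the layers, a looser setting lower in the cascade (here the agent's `NUDGE`) cannot weaken the org's `ENFORCE`.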

## The Managed Rules pipeline.

Recipes are detection content. Managed Rules are the signed control-plane state that wraps them. The pipeline is structurally constrained — not procedurally — so tier-1 and tier-2 rules cannot auto-promote, regardless of operator-set mode.

1.  Arena

    Fifteen canonical adversarial personas probe Safe House 24/7. Mutation-phase gating activates per-bucket only when detection rate crosses 95% over a 48-hour rolling window with 24-hour hysteresis.

2.  Candidate

    Candidates that slip past the arena enter an isolated review queue with a strictly separated write path, so the system that proposes detection content can never be the same one that approves it. Customer false-negative and false-positive reports and cross-tenant network signals all flow into the same queue.

3.  Review

    Three reviewer modes: manual (default), auto-approve-trusted-sources, and auto-approve-high-confidence. Tier-1 / tier-2 always require dual-control review under an append-only audit chain.

4.  24h observe soak

    Every signed promotion lands in observe mode for 24 hours. FP-rate monitoring retires the recipe before any production traffic is blocked (operator-confirmed today, automatic in CLPI Phase 2).

5.  Enforce

    Tiered KV+R2+isolate-cache failover with independent signing chains pushes the rule to every gateway. P95 ≤ 30s signed-promotion → gateway-loaded.
    

### The protective invariant

A tier-1 or tier-2 Managed Rule — one that would actually block real production traffic — can never be promoted without two-person human review, no matter how aggressive the auto-promotion mode is set. The guarantee is enforced structurally, in the data model itself: an active rule cannot exist unless its review quorum has been met. It is a property of the system, not a procedure someone has to remember to follow.

Guaranteed by the data model, not by operator discipline.
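To make "structural, not procedural" concrete, here is a minimal Python sketch in which an active tier-1 or tier-2 rule simply cannot be constructed without two distinct approvers. The class, field, and function names (`ManagedRule`, `Status`, `promote`, `approvers`) are hypothetical, and a real system would more likely express this as a constraint in the datastore schema. Leaving tier-3 rules without a quorum check is also an assumption of the example.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class Status(Enum):
    CANDIDATE = "candidate"
    ACTIVE = "active"


@dataclass(frozen=True)
class ManagedRule:
    rule_id: str
    tier: int
    status: Status = Status.CANDIDATE
    approvers: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # The invariant lives in the type: every construction path runs this
        # check, so an active tier-1/tier-2 rule without a two-person quorum
        # can never exist as a value.
        if self.status is Status.ACTIVE and self.tier in (1, 2) and len(self.approvers) < 2:
            raise ValueError(f"{self.rule_id}: tier-{self.tier} needs two distinct approvers")


def promote(rule: ManagedRule, approvers, reviewer_mode: str = "manual") -> ManagedRule:
    """Return an active copy of the rule.

    reviewer_mode is accepted but never consulted for the quorum check:
    no auto-approve setting can bypass the constructor's validation.
    """
    return replace(rule, status=Status.ACTIVE, approvers=frozenset(approvers))


candidate = ManagedRule("rule-001", tier=1)
try:
    promote(candidate, {"alice"}, reviewer_mode="auto-approve-high-confidence")
except ValueError as exc:
    print("rejected:", exc)

active = promote(candidate, {"alice", "bob"})
print(active.status.value)  # active
```

`dataclasses.replace` builds the new object through `__init__`, so the check in `__post_init__` runs on promotion as well as on direct construction.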

## Substrate fingerprinting + supply-chain detection.

Every evaluation is stamped with a substrate fingerprint: the provider, model, and SDK version behind the request, plus an optional customer-supplied lockfile hash sent via the `X-Mnemom-Lockfile-Hash` header. AEGIS sees behavioral deviation across every customer running on the same substrate, simultaneously.
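As a rough illustration of what a client might send, the sketch below models the fingerprint fields and builds the optional header. Only the header name `X-Mnemom-Lockfile-Hash` and the four fingerprint components come from the text above. The hash algorithm (SHA-256, hex-encoded), the `SubstrateFingerprint` class, the `key` format, and the example values are assumptions; check the API reference for the encoding AEGIS actually expects.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def lockfile_hash(lockfile_bytes: bytes) -> str:
    """Hex SHA-256 of the lockfile contents (algorithm is an assumption)."""
    return hashlib.sha256(lockfile_bytes).hexdigest()


@dataclass(frozen=True)
class SubstrateFingerprint:
    provider: str
    model: str
    sdk_version: str
    lockfile_hash: Optional[str] = None

    def key(self) -> str:
        """Stable grouping key: requests sharing a key share a substrate."""
        return "|".join([self.provider, self.model, self.sdk_version, self.lockfile_hash or "-"])


def request_headers(fp: SubstrateFingerprint) -> dict[str, str]:
    """Attach the optional lockfile-hash header only when a hash is present."""
    if fp.lockfile_hash is None:
        return {}
    return {"X-Mnemom-Lockfile-Hash": fp.lockfile_hash}


# Placeholder values; a real client would read its own lockfile from disk.
fp = SubstrateFingerprint(
    provider="example-provider",
    model="example-model",
    sdk_version="1.2.3",
    lockfile_hash=lockfile_hash(b'{"lockfileVersion": 3}'),
)
print(fp.key())
print(request_headers(fp))
```

Grouping evaluations by a key like this is what would let a deviation on one substrate be compared across tenants rather than inside a single customer's traffic.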

On May 11, 2026, the Mini Shai-Hulud worm compromised 170+ npm packages and 2 PyPI packages, including Mistral AI's SDK suite and Guardrails AI's PyPI package. The compromised `@tanstack/*` versions shipped with valid SLSA Build Level 3 attestations, the first documented case of a worm producing legitimate signed provenance for malicious packages. Per-tenant detection and package-layer Sigstore verification structurally cannot catch this class of attack.

[Full threat model on /supply-chain](/supply-chain)

## OWASP Top 10 for Agentic Applications.

Honest mapping against the authoritative OWASP Top 10 for Agentic Applications (OWASP Gen AI Security Project, released 2025-12-09). Where coverage is partial or absent, we say so — see genai.owasp.org for the full ASI taxonomy.

[OWASP Top 10 for Agentic Applications (genai.owasp.org)](https://genai.owasp.org)

| OWASP category | Coverage | How AEGIS addresses it |
| --- | --- | --- |
| ASI02: Tool Misuse | Partial | Policy engine (CLPI Phase 1) bounded-actions enforcement + forbidden-rule Managed Rules at the inside.autonomy checkpoint, plus back-door screening for data-exfiltration-via-tool. Declared-scope enforcement is the primary control; Mnemom does not intercept every unsafe tool invocation at the gateway. |
| ASI03: Identity & Privilege Abuse | Full | AAP-declared autonomy bounds (Alignment Card) enforced by the CLPI policy engine + AIP in-flight integrity verdicts + inside.integrity checkpoint screening of runtime privilege/identity-abuse claims. |
| ASI04: Agentic Supply Chain Vulnerabilities | Full (runtime) | Substrate fingerprinting on every evaluation + the cross-tenant aggregator detect runtime-behavior deviation consistent with a compromised dependency/substrate that no single customer can see. Complements, and does not replace, build-time package provenance (SLSA, Sigstore). |
| ASI07: Insecure Inter-Agent Communication | Partial | Back-door checkpoint treats unauthenticated authority/identity claims arriving as inbound runtime messages as suspicious by design. This screens the content of inter-agent messages; legitimate agent-to-agent authority must be encoded in Alignment Cards. It is not a transport-authentication scheme. |

The remaining categories map elsewhere in the Mnemom stack, stated honestly. ASI01 (Agent Goal Hijack): Safe House front-door screening, shipped for direct injection and substantially covering multi-turn goal redirection (residual risk on novel multi-turn/multi-vector sequences). ASI09 (Human-Agent Trust Exploitation): shipped front-door detection of authority/urgency/secrecy manipulation. ASI10 (Rogue Agents): covered at the governance layer (AAP Alignment Cards + CLPI lifecycle + Trust Ratings), not by a single front-door pattern.

Honest gaps: ASI05 (Unexpected Code Execution) and ASI06 (Memory & Context Poisoning) have no front-door interception today. The policy engine reduces the action surface and AIP gives partial downstream observability; pair them with an app-layer sandbox and treat memory as untrusted input. ASI08 (Cascading Failures) is an application-architecture concern (timeouts, bulkheads, circuit breakers). See /protection-network and /trust.

## NIST AI Risk Management Framework.

How Mnemom's shipped runtime controls support the four NIST AI RMF functions. Honest mapping — Mnemom is a runtime trust substrate, not an AI-risk-management program; where a function is the customer's organizational responsibility, we say so.

[NIST AI Risk Management Framework (AI RMF 1.0)](https://www.nist.gov/itl/ai-risk-management-framework)

| AI RMF function | Coverage | How Mnemom supports it |
| --- | --- | --- |
| GOVERN | Partial | Alignment Card as the machine-readable per-agent policy artifact (principal, oversight, autonomy envelope) + CLPI lifecycle governance + dual-control Managed Rules promotion. Your organizational governance program (roles, approval authority, third-party-model intake) stays yours. |
| MAP | Partial | Alignment Card frames each agent's purpose + declared autonomy/integrity bounds; the EU AI Act risk-classification extension + the OWASP Agentic Top 10 threat mapping frame the risk context. Per-agent framing shipped; whole-estate framing is the customer's. |
| MEASURE | Partial | AIP integrity checkpoints + verdicts (per-decision), the 0–1000 Trust Rating, the published trust.mnemom.ai/slos SLIs, Safe House false-positive telemetry, and AEGIS substrate fingerprinting. Live runtime measurement; pre-deployment model eval is complementary + customer-run. |
| MANAGE | Partial | Policy Engine bounded-actions enforcement + Safe House observe/nudge/enforce treat detected risk; the advisory CMS + transparency log communicate incidents; AEGIS failover + the always-on responder handle respond/recover. Your org's risk-resource allocation + IR process stay yours. |

"Partial" is honest: the AI RMF is a voluntary, non-certifiable framework operated by your organization. Mnemom supplies the runtime controls + verifiable evidence each function can draw on; it does not discharge your GOVERN responsibilities or certify conformity. Full mapping in /guides/eu-compliance.

## How AEGIS compares.

Abbreviated from the 2026-05-23 competitive landscape research. AEGIS is the network layer; the vendors below are complementary, not replacements — see /governance for the full integration story.

| Capability | Mnemom AEGIS | Cloudflare WAF | Lakera Guard | Cisco AI Defense | AWS Bedrock Guardrails | Google Model Armor |
| --- | --- | --- | --- | --- | --- | --- |
| Cross-tenant Managed Rules with signed promotion | Yes: Ed25519-signed, P95 ≤ 30s propagation, public audit chain | WAF Managed Rules (web-layer, not agent-layer) | Vendor-curated threat-intel; no customer-network-derived signal | Build-time SDK embed; no runtime cross-tenant network | AWS-only; no cross-customer learning | In-process filter; no network |
| Four-checkpoint × four-mode model per-agent | Yes: front door / back door / inside.autonomy / inside.integrity, each independently configurable | Per-route WAF rules; not agent-transaction-shaped | Single-detector at runtime | NeMo Guardrails integration; build-time policy | Bedrock Guardrails per-policy (denylist, PII, contextual grounding) | Prompt-injection + URL + harmful-content filters |
| Substrate fingerprinting (provider + model + SDK version) on every evaluation | Yes: cross-tenant supply-chain detection | No | No | No | No | No |
| Public STIX 2.1 IoC feed + signed advisories | Yes: /v1/trust/iocs (empty at GA by design) | Customer-internal Radar feeds only | No public feed | Talos for traditional threats; no public agent IoC feed | No | No |
| Dual-control invariant on tier-1/-2 (enforced in the data model) | Yes: schema-enforced, not procedural | Procedural change-management | Vendor-controlled | Vendor-controlled | Customer policy IAM | Vendor-controlled |

Sources: vendor public documentation 2026-05-23. AEGIS is a layer customers run alongside these products, not a replacement.

## SLOs published. Measured continuously.

Headline numbers below. The full table — measurement queries, historical data once the first 30-day window closes, and the four supporting SLOs — lives on /trust/slos.

| SLO | Target | Notes |
| --- | --- | --- |
| Managed Rule propagation | P95 ≤ 30s | Signed promotion → gateway-loaded. Published target; first measurements 30 days post-GA. |
| Failover availability | 99.99% | Gateway loads a verified rule set across multiple independent read tiers. |
| Rule-set freshness | P99 ≤ 5 min | Under normal operation. P0 page at 24h stale. |

First 30-day measurement window publishes 30 days post-GA. We do not pre-announce numbers we cannot defend.
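For readers who want to check a percentile target against their own observations, here is a small sketch using the nearest-rank method. The latency samples are invented for illustration and are not Mnemom measurements. The helper name `percentile_nearest_rank` and the choice of nearest-rank (rather than an interpolating method) are assumptions; the measurement queries published on /trust/slos are the authority on how the SLI is actually computed.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


TARGET_SECONDS = 30.0  # published target: P95 <= 30s

# Made-up propagation latencies in seconds (signed promotion -> gateway-loaded).
latencies = [4.0, 6.5, 7.2, 8.0, 9.1, 11.3, 12.0, 14.8, 21.0, 41.0]

p95 = percentile_nearest_rank(latencies, 95)
print(f"P95 = {p95}s, target met: {p95 <= TARGET_SECONDS}")
```

With only ten samples the P95 is the slowest observation (41.0s here), so one outlier misses the target. That is one reason a full measurement window matters before any number is published.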

[See published SLOs](/trust/slos)

## Bring your tools.

The IoC feed is machine-readable STIX 2.1. The audit chain is verifiable. The dashboard is open to every customer.

```sh
curl -s https://api.mnemom.ai/v1/trust/iocs | jq .
```
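Once the feed is fetched, pulling indicators out of it takes only the standard library. The sketch below assumes the endpoint returns a single STIX 2.1 bundle at the top level; the response envelope is not specified on this page, so treat that as an assumption. The sample bundle, its IDs, and the `example.invalid` pattern are invented for illustration. In STIX 2.1 a bundle's `objects` property is optional, so an empty feed parses cleanly.

```python
import json


def indicators(bundle: dict) -> list[dict]:
    """Return the STIX indicator objects in a bundle (empty list if none)."""
    return [obj for obj in bundle.get("objects", []) if obj.get("type") == "indicator"]


# Invented sample; in practice this would be the body returned by /v1/trust/iocs.
SAMPLE = json.loads("""
{
  "type": "bundle",
  "id": "bundle--00000000-0000-4000-8000-000000000001",
  "objects": [
    {
      "type": "indicator",
      "spec_version": "2.1",
      "id": "indicator--00000000-0000-4000-8000-000000000002",
      "created": "2026-01-01T00:00:00.000Z",
      "modified": "2026-01-01T00:00:00.000Z",
      "pattern": "[domain-name:value = 'example.invalid']",
      "pattern_type": "stix",
      "valid_from": "2026-01-01T00:00:00Z"
    },
    {
      "type": "identity",
      "spec_version": "2.1",
      "id": "identity--00000000-0000-4000-8000-000000000003",
      "created": "2026-01-01T00:00:00.000Z",
      "modified": "2026-01-01T00:00:00.000Z",
      "name": "Example Publisher"
    }
  ]
}
""")

for ind in indicators(SAMPLE):
    print(ind["id"], ind["pattern"])

# A bundle with no objects (for example, an empty feed) yields no indicators.
empty = {"type": "bundle", "id": "bundle--00000000-0000-4000-8000-000000000004"}
print(len(indicators(empty)))  # 0
```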


---
_Source: /security/index.html · Generated by build-markdown-mirrors.mjs · For agent-readability commitment #4 see https://www.mnemom.ai/for-agents/_
