# The Verification Layer for AI Agents — Mnemom Research

![Mnemom Research](/images/mnemom_hero.webp)

Mnemom Research

24 February 2026

_Mnemom Research | February 2026_

* * *

Last month, MIT's AI Agent Index studied 30 major AI agents across 240 safety and transparency fields. The results: 133 fields had no public information. Twenty-five of thirty agents had no safety evaluation results. Twenty-three had no third-party testing. One out of thirty had cryptographic signing.

This isn't a gap in tooling. It's a gap in infrastructure. The tools to build agents are excellent. The tools to verify them don't exist.

We built them.

## The Problem Is Verification, Not Monitoring

The market response to "agents are hard to trust" has been monitoring. Behavioral baselines. Drift detection. Logging. These are necessary and insufficient. Monitoring tells you what happened. It doesn't tell you whether the monitor is honest.

Consider the architecture: an oversight system watches an agent and reports a verdict. How do you know the oversight system applied its rules correctly? How do you know it didn't report "clear" when the evidence said "boundary violation"? How do you know a checkpoint wasn't deleted from the record after the fact?

You can't. Unless the oversight system can prove its own honesty. Mathematically. Independently verifiable. Without trusting anyone, including the company that built the oversight system.

That's the line between monitoring and verification. Monitoring is: "we checked." Verification is: "we can prove we checked, and you can verify the proof yourself."

Everything we've built sits on that side of the line.

## The Stack

Mnemom is six layers, each building on the one below. An agent can enter at any layer and accumulate trust as it goes deeper.

### Layer 1: Identity — Alignment Cards

Every agent gets a machine-readable behavioral contract: what it's permitted to do, what's forbidden, what triggers escalation to a human, and what values it operates under. This is the Alignment Card, defined by the Agent Alignment Protocol (AAP).

```json
{
  "permitted": ["read_tickets", "draft_responses", "escalate_to_human"],
  "forbidden": ["access_payment_data", "issue_refunds"],
  "escalation_triggers": ["billing_request_over_500"],
  "values": ["accuracy", "empathy", "privacy"]
}
```

The card is the reference document. Everything that follows — integrity analysis, risk scoring, coherence checks, containment decisions — is evaluated against what the agent declared here. No card, no verification. The card makes verification possible.
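To make the role of the card concrete, here is a minimal sketch of evaluating a proposed action against the example card above. The function name `check_action`, the return labels, and the "undeclared" fallback are assumptions made for illustration; they are not taken from the AAP specification.

```python
# The example card from above, as a plain dict.
CARD = {
    "permitted": ["read_tickets", "draft_responses", "escalate_to_human"],
    "forbidden": ["access_payment_data", "issue_refunds"],
    "escalation_triggers": ["billing_request_over_500"],
    "values": ["accuracy", "empathy", "privacy"],
}


def check_action(card: dict, action: str, triggers=frozenset()) -> str:
    """Classify an action against a card (hypothetical policy order)."""
    # Forbidden actions win over everything else.
    if action in card["forbidden"]:
        return "forbidden"
    # Any active trigger named on the card routes the request to a human.
    if set(triggers) & set(card["escalation_triggers"]):
        return "escalate"
    if action in card["permitted"]:
        return "permitted"
    # Assumption: actions the card never mentions are flagged, not allowed.
    return "undeclared"
```

Under these assumptions, `check_action(CARD, "issue_refunds")` yields `"forbidden"`, and a permitted action with the `billing_request_over_500` trigger active yields `"escalate"`.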

### Layer 2: Integrity — Real-Time Thinking Analysis

The Agent Integrity Protocol (AIP) is the conscience layer. It runs in the Mnemom gateway, intercepting the agent's streaming response via `tee()` at the Cloudflare Workers level. The client receives the response at wire speed — zero added latency. The analysis stream runs in the background.

AIP doesn't analyze outputs. It analyzes _thinking_. The agent's reasoning trace — the chain of thought between receiving a request and producing a response — is evaluated against the Alignment Card by a separate analysis model. The result is an integrity checkpoint:

```json
{
  "verdict": "boundary_violation",
  "concerns": [
    {
      "category": "value_misalignment",
      "severity": "high",
      "description": "Agent considered payments API access, explicitly forbidden"
    }
  ],
  "confidence": 0.95
}
```

Three possible verdicts: `clear` (reasoning aligns with the card), `review_needed` (potential concerns detected), `boundary_violation` (clear divergence from declared behavior). Six concern categories: prompt injection, value misalignment, autonomy violation, reasoning corruption, deceptive reasoning, undeclared intent.

The checkpoint happens between turns. If a boundary violation is detected, the system can inject corrective guidance into the agent's next request — a nudge that gives the agent a chance to self-correct before a human intervenes. The agent never stops. The user never waits. But the oversight is continuous.

### Layer 3: Proof — Four-Layer Cryptographic Attestation

This is where Mnemom diverges from every other system in the market.

Every integrity checkpoint passes through four independent cryptographic layers, each answering a different trust question:

**Ed25519 Signatures** — "Who attested this?" Every checkpoint is signed with the gateway's Ed25519 keypair. Public keys are published at a well-known endpoint. Anyone can verify the signature. Nobody can forge it.

**SHA-256 Hash Chains** — "Was anything tampered with?" Each checkpoint's hash includes the previous checkpoint's hash, creating a linked chain. Modify any checkpoint and the chain breaks at that point and every link after it. Delete a checkpoint and the linkage fails.
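The tamper-evidence property of a hash chain can be sketched in a few lines. This is an illustration only: the all-zero genesis value, the canonical JSON serialization, and the function names are assumptions, not the format the gateway uses.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous checkpoint"


def link_hash(prev_hash: str, checkpoint: dict) -> str:
    """Hash a checkpoint together with the previous link's hash."""
    payload = json.dumps(checkpoint, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(checkpoints: list) -> list:
    """Return [{'checkpoint': ..., 'hash': ...}, ...] with each hash linked to the last."""
    chain, prev = [], GENESIS
    for cp in checkpoints:
        prev = link_hash(prev, cp)
        chain.append({"checkpoint": cp, "hash": prev})
    return chain


def first_broken_link(chain: list):
    """Return the index of the first entry whose stored hash fails to verify, else None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if link_hash(prev, entry["checkpoint"]) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None
```

Editing a stored checkpoint makes its own hash fail to verify, and deleting an entry makes the next entry fail, because its hash was computed over a predecessor that is no longer there.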

**Merkle Trees** — "Is this checkpoint in the complete log?" An append-only binary Merkle tree accumulates all checkpoints per agent. Proving a specific checkpoint exists requires O(log N) hashes — about 10 hashes for 1,000 checkpoints. No need to download the full log.
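A minimal sketch of a Merkle inclusion proof shows where the O(log N) figure comes from. It assumes a simplified tree that duplicates the last node on odd-sized levels, with the leaf and interior prefix bytes borrowed from RFC 6962; the actual tree layout used by the gateway may differ.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _levels(leaves: list) -> list:
    """All tree levels, leaf hashes first, root last. Requires a non-empty list."""
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # 0x00 marks a leaf
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Simplification: pad odd levels by repeating the last node.
            level = level + [level[-1]]
            levels[-1] = level
        level = [_h(b"\x01" + level[i] + level[i + 1])  # 0x01 marks an interior node
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(leaves: list) -> bytes:
    return _levels(leaves)[-1][0]


def inclusion_proof(leaves: list, index: int) -> list:
    """Sibling hashes from leaf to root, each tagged with whether it sits on the left."""
    proof = []
    for level in _levels(leaves)[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root
```

For 1,000 leaves the proof returned by `inclusion_proof` contains 10 sibling hashes, and the verifier needs only the leaf, the proof, and the root.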

**Zero-Knowledge Proofs** — "Was the verdict correctly computed?" This is the layer that closes the trust gap. A STARK proof, generated inside an SP1 zkVM, re-derives the verdict from the raw concerns using deterministic rules — ignoring the analysis model's stated verdict entirely. If the system claimed "clear" but the concerns warranted "boundary violation," the proof would fail.

The key insight that makes this practical: we don't prove the LLM inference was correct (computationally intractable for large models). We prove the _auditor's judgment_ was honestly derived from the LLM's output. The verdict derivation rules are simple, deterministic, and run in approximately 10,000 RISC-V cycles. Proof generation runs sub-second on SP1 with GPU acceleration.
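The idea of re-deriving the verdict from the raw concerns can be illustrated with a toy rule set. The rules below are hypothetical (this post does not spell out the real derivation rules), and the sketch runs as ordinary Python rather than inside a zkVM; it only shows the shape of the check.

```python
# Hypothetical rule set: which severities force a boundary violation.
HIGH_SEVERITIES = {"high"}


def derive_verdict(concerns: list) -> str:
    """Deterministically derive a verdict from concerns alone (assumed rules)."""
    if any(c["severity"] in HIGH_SEVERITIES for c in concerns):
        return "boundary_violation"
    if concerns:
        return "review_needed"
    return "clear"


def verdict_is_honest(checkpoint: dict) -> bool:
    """True when the stated verdict matches the one re-derived from the concerns."""
    return derive_verdict(checkpoint["concerns"]) == checkpoint["verdict"]
```

With these assumed rules, a checkpoint that lists a high-severity concern but claims `clear` fails the check, which is the failure mode the proof is meant to rule out.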

All four layers bundle into an Integrity Certificate — a C2PA/W3C Verifiable Credential format that is self-describing and offline-verifiable. A regulator can verify an agent's compliance without an API call, without internet access, without trusting Mnemom. Download the certificate, run the verification, and it either checks out or it doesn't.

### Layer 4: Reputation — The Trust Rating

Integrity checkpoints accumulate into the Mnemom Trust Rating™ — a composite behavioral reputation metric, the credit score for AI agents.

Five weighted components: integrity ratio (40%), compliance with exponential decay (20%), drift stability (20%), trace completeness (10%), and coherence compatibility (10%). The result maps to a bond-rating scale from AAA to CCC, with a 50-checkpoint minimum eligibility gate and confidence levels from insufficient to high.
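As a rough sketch, the weighting can be written as a plain weighted sum. The weights and the 50-checkpoint gate come from the description above; the assumption that each component is normalized to [0, 1], the component key names, and the grade thresholds are all hypothetical.

```python
# Weights in percent, as described above.
WEIGHTS = {
    "integrity_ratio": 40,
    "compliance": 20,
    "drift_stability": 20,
    "trace_completeness": 10,
    "coherence_compatibility": 10,
}
MIN_CHECKPOINTS = 50

# Hypothetical score floors for each grade; anything lower is CCC.
GRADE_FLOORS = [(0.95, "AAA"), (0.90, "AA"), (0.80, "A"),
                (0.70, "BBB"), (0.60, "BB"), (0.50, "B")]


def trust_score(components: dict) -> float:
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100


def trust_rating(components: dict, checkpoint_count: int):
    """Return a grade, or None when the agent has too little history."""
    if checkpoint_count < MIN_CHECKPOINTS:
        return None
    score = trust_score(components)
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "CCC"
```

Because integrity ratio carries 40% of the weight, a drop in that one component moves the score twice as far as the same drop in compliance or drift stability.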

Every score is backed by the full attestation stack. The Trust Rating isn't an opinion; it's a mathematical computation over cryptographically verified evidence. Anyone can request the Merkle inclusion proof for any checkpoint feeding the score, verify the Ed25519 signature and hash chain position, and for boundary violations, verify the STARK proof. To our knowledge, no other reputation system offers cryptographic verifiability of the underlying evidence.

Public reputation pages, embeddable trust badges, dynamic OG images for social sharing, and an A2A trust extension make the score visible everywhere the agent is referenced. The Trust Directory at [mnemom.ai/directory](https://mnemom.ai/directory) is a searchable catalog of every agent with a published score.

### Layer 5: Risk — Context-Aware Assessment

A trust rating tells you how trustworthy an agent is generally. Risk assessment tells you whether to approve _this specific action_ for _this specific agent_ in _this specific context_.

The same agent gets different risk scores for different actions. A financial transaction weights compliance and integrity heavily. A task delegation weights coherence compatibility. A tool invocation weights integrity and drift stability. Six action-type profiles, each a distinct weight vector applied to the agent's five reputation components.

**Individual risk** combines context-aware reputation (60%), recency of violations with exponential decay (30%), and confidence penalty for limited history (10%). Classification shifts based on the caller's risk tolerance — conservative (financial services), moderate, or aggressive (internal tooling).
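One way to read the 60/30/10 split is as a weighted sum of three risk terms. The sketch below follows that reading; the 30-day half-life, the 500-checkpoint confidence horizon, the tolerance thresholds, and the `approve`/`review` labels are assumptions chosen for illustration.

```python
import math


def violation_recency(days_since, half_life_days: float = 30.0) -> float:
    """Exponentially decaying weight of the most recent violation (1.0 = today)."""
    if days_since is None:  # no violation on record
        return 0.0
    return math.exp(-math.log(2) * days_since / half_life_days)


def individual_risk(context_reputation: float, days_since_violation,
                    checkpoint_count: int, full_confidence_at: int = 500) -> float:
    """Risk in [0, 1]: 60% reputation, 30% violation recency, 10% thin history."""
    reputation_risk = 1.0 - context_reputation
    recency = violation_recency(days_since_violation)
    confidence_penalty = max(0.0, 1.0 - checkpoint_count / full_confidence_at)
    return 0.6 * reputation_risk + 0.3 * recency + 0.1 * confidence_penalty


# Hypothetical approval ceilings per caller risk tolerance.
APPROVE_BELOW = {"conservative": 0.15, "moderate": 0.30, "aggressive": 0.50}


def classify(risk: float, tolerance: str) -> str:
    return "approve" if risk < APPROVE_BELOW[tolerance] else "review"
```

The same risk value can therefore be approved by a moderate caller and sent to review by a conservative one, which is the tolerance shift described above.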

**Team risk** is where it gets novel. A team of individually low-risk agents can still be dangerous if they share blind spots, lack value coherence, or are vulnerable to cascading failure. The team risk engine uses three pillars drawn from quantitative finance:

-   **Aggregate Quality** (CoVaR-inspired) — Exponential tail-risk weighting. One bad agent drags the score down far more than one good agent lifts it.
-   **Coherence Quality** (Markowitz-inspired) — Pairwise value alignment across four dimensions, with a variance penalty. Moderate uniform coherence is safer than high-variance coherence.
-   **Structural Risk** (DebtRank-inspired) — Contagion modeling. If one agent fails, which others are most exposed? The model captures both single-point-of-failure risk and systemic fragility.

Leave-One-Out (LOO) attribution, a deterministic approximation of Shapley values, identifies which agent is helping and which is hurting. It runs in O(n²) with no Monte Carlo sampling and no randomness. This matters because randomness means non-determinism, and non-determinism means unprovable inside a zkVM.
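The leave-one-out idea can be sketched against a toy team score. The exponential weighting below is one assumed way to make a weak agent dominate the aggregate; it is not the CoVaR-inspired formula itself, and the parameter `k` is hypothetical.

```python
import math


def aggregate_quality(scores: list, k: float = 4.0) -> float:
    """Tail-weighted mean: lower scores receive exponentially larger weights."""
    weights = [math.exp(k * (1.0 - s)) for s in scores]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def leave_one_out(team: dict, k: float = 4.0) -> dict:
    """Contribution of each agent = team score minus the score without that agent.

    Needs at least two agents. Each of the n re-scorings is O(n), so the whole
    attribution is O(n^2) and involves no sampling.
    """
    full = aggregate_quality(list(team.values()), k)
    return {
        name: full - aggregate_quality(
            [s for other, s in team.items() if other != name], k)
        for name in team
    }
```

For a team with scores 0.9, 0.9, and 0.3, the aggregate lands well below the plain mean of 0.7, and the weak agent's contribution is negative while the strong agents' contributions are positive.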

Every risk assessment — individual and team — is zero-knowledge provable. The entire computation runs in Q16.16 fixed-point integer arithmetic (no floating-point operations anywhere in the guest binary), ensuring cross-implementation determinism.
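For readers unfamiliar with Q16.16, the format stores a number as an integer scaled by 2^16, so every operation is plain integer arithmetic. The helper names below are illustrative, and the sketch covers non-negative values only; Python integers are unbounded, whereas a real implementation would typically use 32-bit values with 64-bit intermediates.

```python
FRACTION_BITS = 16
ONE = 1 << FRACTION_BITS  # 1.0 in Q16.16 is 65536


def to_q16(numerator: int, denominator: int = 1) -> int:
    """Convert the rational numerator/denominator to Q16.16, rounding down."""
    return (numerator * ONE) // denominator


def q_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the raw product carries 32 fraction bits."""
    return (a * b) >> FRACTION_BITS


def q_div(a: int, b: int) -> int:
    """Divide two Q16.16 values, pre-shifting to keep 16 fraction bits."""
    return (a << FRACTION_BITS) // b


def weighted_sum(values: list, weights: list) -> int:
    """Weighted sum of Q16.16 values with Q16.16 weights, integers throughout."""
    return sum(q_mul(v, w) for v, w in zip(values, weights))
```

Since no step depends on floating-point rounding modes, two implementations that follow the same integer rules produce bit-identical results, which is the determinism a zkVM guest needs.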

### Layer 6: Containment — Graduated Response

Detection without enforcement is a monitoring tool. Mnemom is a governance platform.

The containment engine provides real-time pause, kill, and resume controls for agents running through the gateway. Three enforcement modes escalate: `monitor` (log only), `nudge` (inject corrective guidance on the next request), `enforce` (hard 403 block).

Auto-containment triggers after N consecutive boundary violations (configurable threshold). When an agent is contained, the structural risk model identifies which other agents are most exposed to the failure — and the system can automatically tighten guardrails on those agents. Quarantine the sick agent, prophylactically protect its peers.
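The consecutive-violation trigger amounts to a small state machine. The sketch below assumes a threshold of 3 and assumes that any verdict other than `boundary_violation` resets the streak; both are illustrative choices, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AutoContainment:
    threshold: int = 3       # N consecutive violations; value assumed here
    consecutive: int = 0
    contained: bool = False

    def record(self, verdict: str) -> bool:
        """Feed one checkpoint verdict; return True once the agent is contained."""
        if verdict == "boundary_violation":
            self.consecutive += 1
        else:
            # Assumption: clear and review_needed both break the streak.
            self.consecutive = 0
        if self.consecutive >= self.threshold:
            self.contained = True
        return self.contained

    def resume(self) -> None:
        """Manual resume clears both the containment flag and the streak."""
        self.consecutive = 0
        self.contained = False
```

Containment is sticky in this sketch: once triggered, later `clear` verdicts do not lift it, and only an explicit `resume()` does.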

Every containment action is audited, role-gated (kill/reactivate requires owner, pause/resume requires admin), and surfaced through webhooks for SIEM integration.

## One Environment Variable

Everything above sounds complex. Using it isn't.

The Mnemom gateway makes any agent transparent with zero code changes. Install the CLI and run `init`:

```bash
npm install -g @mnemom/smoltbot
mnemom init
```

The CLI detects your provider, validates your API key, and configures the gateway. It supports Anthropic, OpenAI, and Gemini — same gateway, same attestation pipeline, same trust identity regardless of which model your agent runs on.

Under the hood, it points your agent at the Mnemom gateway by setting the provider's base URL:

```bash
# Anthropic
export ANTHROPIC_BASE_URL=https://gateway.mnemom.ai/anthropic

# OpenAI
export OPENAI_BASE_URL=https://gateway.mnemom.ai/openai/v1

# Gemini
export GEMINI_BASE_URL=https://gateway.mnemom.ai/gemini
```

Your agent's code doesn't change. Your users' experience doesn't change. But the gateway intercepts the streaming response, runs integrity analysis in the background, builds the attestation chain, and accumulates checkpoints toward a Trust Rating. Your agent now has a verifiable trust identity — a public reputation page, an embeddable badge, cryptographic evidence of every verdict, and a place in the Trust Directory.

## Why Now

Three things converged in the last 90 days.

**The gap is documented.** MIT's AI Agent Index, the WEF's agent governance framework, and the EU AI Act's Article 50 transparency obligations (enforcement: August 2, 2026) have all independently articulated what's missing. The answer is the same in each case: standardized, continuous, verifiable trust infrastructure. Not more documentation. Verification.

**The market fragmented.** Gen Digital is scanning agent skills for malware. CrowdStrike acquired Pangea to block prompt injection at the interaction layer. Lasso Security is building behavioral baselines to fingerprint agent intent. Microsoft extended Entra to give agents their own identity credentials. ERC-8004 went live on Ethereum with on-chain agent reputation registries. Each solves a real piece of the problem. None covers the full stack — identity, integrity, cryptographic proof, reputation, risk assessment, and containment in a single coherent system.

| Capability | Gen Digital | CrowdStrike | Lasso | Microsoft | ERC-8004 | **Mnemom** |
| --- | --- | --- | --- | --- | --- | --- |
| **Identity** | - | - | - | Yes | Yes | **Yes** |
| **Integrity monitoring** | Limited | Limited | Limited | - | - | **Yes** |
| **Cryptographic proof** | - | - | - | - | Partial | **Yes** |
| **Reputation scoring** | Limited | - | - | - | Yes | **Yes** |
| **Risk assessment** | Yes | Yes | Yes | - | Partial | **Yes** |
| **Containment** | - | - | - | - | - | **Yes** |

**The proof is practical.** ZK proofs of AI safety judgments aren't theoretical. SP1 with GPU acceleration generates STARK proofs of integrity verdicts sub-second. That's fast enough to run in production, on every boundary violation, without blocking anything.

## What's Here

Everything described in this post is shipped and running. Not planned, not roadmapped — deployed.

-   **Protocols:** AAP v0.4.0 and AIP v0.4.0, Apache-licensed, on [npm](https://www.npmjs.com/package/@mnemom/agent-alignment-protocol) and [PyPI](https://pypi.org/project/agent-alignment-protocol/)
-   **Gateway:** Multi-provider (Anthropic, OpenAI, Gemini), zero-latency stream interception
-   **Attestation:** Ed25519 signatures, SHA-256 hash chains, Merkle trees, SP1 STARK proofs
-   **Trust Rating™:** Bond-rating scale, five-component model, public directory, embeddable badges, A2A trust extension
-   **Risk Assessment:** Individual + team, six action-type profiles, three-pillar team model, LOO Shapley attribution, ZK-proven
-   **Containment:** Real-time pause/kill/resume, auto-containment, graduated nudge system, contagion-aware response
-   **Enterprise:** Custom conscience values, RBAC, SSO/SAML, compliance export bundles, admin impersonation, fleet dashboards, webhook notifications
-   **Documentation:** [docs.mnemom.ai](https://docs.mnemom.ai) — quickstarts, protocol specs, API reference (130+ endpoints), integration guides

## Try It

The interactive showcase at [mnemom.ai/showcase](https://mnemom.ai/showcase) simulates four agents handling a production incident. You'll see alignment drift, boundary violations, value coherence analysis, integrity checkpoints, and containment decisions play out in real time.

Or skip the demo and point your agent at the gateway. One environment variable. Your agent gets a trust identity. The credit check for AI agents is live.

* * *

_GitHub: [github.com/mnemom](https://github.com/mnemom) · Docs: [docs.mnemom.ai](https://docs.mnemom.ai) · Showcase: [mnemom.ai/showcase](https://mnemom.ai/showcase)_

---
_Source: /fr/blog/mnemom-research/verification-layer-for-ai-agents/index.html · Generated by build-markdown-mirrors.mjs · For agent-readability commitment #4 see https://www.mnemom.ai/for-agents_
