Governance in the Code Path

Mnemom Research | February 2026
Every AI agent governance product on the market works the same way. Monitor what the agent did. Log it. Flag anomalies. Show someone a dashboard. The human reads the dashboard. The human decides what to do.
This is the equivalent of a bank that records every transaction but has no fraud rules. You know that something bad happened. You just can't stop it from happening.
Today we're shipping Card Lifecycle & Policy Intelligence — CLPI. It puts governance directly into the code path: the actual request pipeline that every agent action flows through. Policy evaluation happens before the action executes, not after.
The Semantic Gap
Alignment cards declare what an agent is allowed to do in human terms:
"bounded_actions": ["inference", "read", "write", "web_fetch"]
Actual tool names are specific:
mcp__browser__navigate, execute_python_code, search_web, read_file
There's a gap between what the card declares and what the tools do. Until today, that gap was bridged by interpretation — the observer would analyze traces and infer whether the agent's tool usage matched its card declarations. Sometimes the inference was wrong. Configuration errors looked like behavioral violations. The trust score took the hit regardless.
CLPI eliminates the gap with a policy DSL. A YAML file that maps semantic capabilities to concrete tool patterns:
capability_mappings:
  web_browsing:
    tools:
      - "mcp__browser__*"
    card_actions:
      - "web_fetch"
      - "web_search"
  file_reading:
    tools:
      - "mcp__filesystem__read*"
      - "mcp__filesystem__list*"
    card_actions:
      - "read_file"
forbidden:
  - pattern: "mcp__filesystem__delete*"
    reason: "File deletion not permitted"
    severity: "critical"
  - pattern: "mcp__shell__*"
    reason: "Shell execution not permitted"
    severity: "high"
defaults:
  unmapped_tool_action: "warn"
  grace_period_hours: 24
This isn't a configuration file. It's a governance contract. The card declares intent. The policy enforces it.
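To make the mapping concrete, here is a minimal Python sketch of how such a policy could be evaluated for a single tool name. It is not Mnemom's engine: the function name evaluate_tool, the verdict strings, the rule that forbidden patterns are checked first, and the use of shell-style glob matching are all assumptions made for illustration. The policy is the YAML above written as a plain dict.

```python
from fnmatch import fnmatchcase

# The YAML policy above, written as a plain dict for illustration.
POLICY = {
    "capability_mappings": {
        "web_browsing": {
            "tools": ["mcp__browser__*"],
            "card_actions": ["web_fetch", "web_search"],
        },
        "file_reading": {
            "tools": ["mcp__filesystem__read*", "mcp__filesystem__list*"],
            "card_actions": ["read_file"],
        },
    },
    "forbidden": [
        {"pattern": "mcp__filesystem__delete*",
         "reason": "File deletion not permitted", "severity": "critical"},
        {"pattern": "mcp__shell__*",
         "reason": "Shell execution not permitted", "severity": "high"},
    ],
    "defaults": {"unmapped_tool_action": "warn", "grace_period_hours": 24},
}


def evaluate_tool(tool, card_actions, policy=POLICY):
    """Return (verdict, detail) for one tool name (hypothetical helper)."""
    # Assumption: forbidden patterns take precedence over any mapping.
    for rule in policy["forbidden"]:
        if fnmatchcase(tool, rule["pattern"]):
            return "forbidden", rule["reason"]

    # Allow the tool if a capability pattern matches it and the card
    # declares at least one of that capability's card_actions.
    mapped = False
    for name, mapping in policy["capability_mappings"].items():
        if any(fnmatchcase(tool, p) for p in mapping["tools"]):
            mapped = True
            if set(mapping["card_actions"]) & set(card_actions):
                return "allow", name

    if mapped:
        # A capability covers the tool, but the card declares none of its actions.
        return "undeclared", "card declares no action for this capability"

    # No capability mapping matches: fall back to the policy default.
    return policy["defaults"]["unmapped_tool_action"], "no capability mapping matches"
```

With a card that declares ["inference", "read", "write", "web_fetch"], this sketch allows mcp__browser__navigate through the web_browsing capability, rejects anything under mcp__shell__, and returns the default "warn" for a tool such as execute_python_code that no pattern covers.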
Three Checkpoints, One Engine
The policy engine evaluates the same rules at three points in the agent lifecycle:
CI/CD (before deployment). The CLI command mnemom policy evaluate validates that an agent's tool manifest matches its alignment card plus policy. Fails the build on mismatch. This runs in any CI system via npx @mnemom/smoltbot policy evaluate --card card.json --tools manifest.json.
Gateway (before each action). The gateway extracts tool names from the API request body (Anthropic, OpenAI, and Gemini formats are all supported) and evaluates them against the policy in real time. Three modes: enforce (reject with 403), warn (log and forward), off (skip). The response carries an X-Policy-Verdict header so downstream systems know the evaluation result. Policy fetch is parallelized with quota resolution, so it adds no sequential step to the hot path.
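The gateway step can be sketched in two small functions: one that pulls tool names out of a parsed request body, and one that turns a mode plus a list of violating tools into a decision. This is an illustration under assumptions, not the gateway's code. It reads the tool declarations in the request (the real gateway may inspect something else), and the function names, the "pass"/"fail" header values, and the choice to omit the header in off mode are all hypothetical.

```python
def extract_tool_names(body):
    """Collect declared tool names from a parsed JSON request body."""
    names = []
    for tool in body.get("tools", []):
        if "function" in tool:
            # OpenAI Chat Completions style: {"type": "function", "function": {"name": ...}}
            names.append(tool["function"]["name"])
        elif "functionDeclarations" in tool or "function_declarations" in tool:
            # Gemini style: each tool entry wraps a list of function declarations
            decls = tool.get("functionDeclarations") or tool.get("function_declarations") or []
            names.extend(d["name"] for d in decls)
        elif "name" in tool:
            # Anthropic Messages style: {"name": ..., "input_schema": ...}
            names.append(tool["name"])
    return names


def gateway_decision(mode, violations):
    """Map a policy mode and the violating tool names to a gateway decision."""
    if mode == "off":
        # Evaluation is skipped entirely; no verdict header is attached (assumption).
        return {"forward": True, "status": None, "headers": {}}

    # Header values "pass" and "fail" are placeholders, not documented values.
    headers = {"X-Policy-Verdict": "fail" if violations else "pass"}

    if mode == "enforce" and violations:
        # enforce: reject the request with 403 instead of forwarding it
        return {"forward": False, "status": 403, "headers": headers}

    # warn (or enforce with no violations): forward the request upstream
    return {"forward": True, "status": None, "headers": headers}
```

In this sketch only enforce mode ever blocks a request; warn mode records the same verdict in the header but still forwards.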
Observer (after execution). Retroactive verification against traces. This is where the system distinguishes card gaps (a tool the agent uses that isn't in its card — a configuration error) from real violations (a tool the agent uses that's forbidden). The distinction matters for what comes next.
Same engine. Same rules. No drift between what CI checks, what the gateway enforces, and what the observer verifies. One policy definition governs the entire lifecycle.
Trust Recovery
Before CLPI, this scenario was common: you add a new MCP tool to your agent, the agent starts using it, the observer flags UNBOUNDED_ACTION violations because the alignment card doesn't declare the new capability, and the Mnemom Trust Rating drops. The violations were real — but the root cause was a configuration error, not a behavioral failure. Fixing the card didn't fix the score. The damage was permanent.
CLPI changes this. When the observer detects a capability violation, it now classifies it as a card gap or a behavior gap. A card gap means the tool isn't in the alignment card but isn't forbidden by policy: it's a missing declaration, not a dangerous action. A behavior gap means the agent used a tool that the policy forbids.
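The classification rule itself is small. A minimal sketch, assuming forbidden tools are expressed as glob patterns like those in the policy file; the function name and the "card_gap" / "behavior_gap" labels are illustrative, not the observer's actual output:

```python
from fnmatch import fnmatchcase


def classify_violation(tool, forbidden_patterns):
    """Classify an undeclared tool use as a card gap or a behavior gap."""
    # A tool that matches any forbidden pattern is a real violation.
    if any(fnmatchcase(tool, pattern) for pattern in forbidden_patterns):
        return "behavior_gap"
    # Otherwise the tool is only missing from the card.
    return "card_gap"


forbidden = ["mcp__filesystem__delete*", "mcp__shell__*"]
print(classify_violation("mcp__calendar__create_event", forbidden))
print(classify_violation("mcp__shell__exec", forbidden))
```

Only violations in the first class are candidates for reclassification once the card is fixed.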
Fix the card, and the system reclassifies the violations:
POST /v1/agents/{agent_id}/reclassify
Reclassification triggers a cascade. The scoring engine recomputes the Mnemom Trust Rating with the violations excluded. Trust graph propagation recalculates downstream agents that were penalized by transitive trust decay — BFS traversal, three hops, capped at 50 agents. If the affected violations were in a proven session, the prover re-issues the proof (the original remains valid but is marked superseded). OTel events emit gen_ai.safety.reclassification with before and after scores.
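The bounded traversal described here can be sketched as a standard breadth-first search with a hop limit and a size cap. The graph representation (a dict from agent ID to downstream agent IDs), the function name, and the decision to exclude the starting agent from the cap are assumptions for illustration.

```python
from collections import deque


def affected_agents(trust_graph, start, max_hops=3, max_agents=50):
    """Return downstream agents reachable within max_hops, at most max_agents."""
    seen = {start}
    order = []                      # agents in the order BFS discovers them
    queue = deque([(start, 0)])     # (agent, hops from start)

    while queue and len(order) < max_agents:
        agent, hops = queue.popleft()
        if hops == max_hops:
            continue                # do not expand beyond the hop limit
        for neighbor in trust_graph.get(agent, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            order.append(neighbor)
            queue.append((neighbor, hops + 1))
            if len(order) == max_agents:
                break               # cap reached: stop collecting
    return order


graph = {"a": ["b", "c"], "b": ["d"], "d": ["e"], "e": ["f"]}
print(affected_agents(graph, "a"))  # ['b', 'c', 'd', 'e']; 'f' is four hops away
```

Breadth-first order means the cap, when it bites, drops the most distant agents first, which fits a decay model where nearer agents carry more of the penalty.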
This sounds like a bookkeeping exercise. It's not. It's the difference between a system people trust and one they fight. A governance system that punishes configuration errors the same way it punishes behavioral violations will train operators to stop updating their agents — or stop using the system. Trust recovery makes the system honest.
Grace periods smooth the curve further. When the gateway sees a tool for the first time, that tool gets a configurable window (default 24 hours) before its use counts as a violation. This gives operators time to update their alignment card after deploying new capabilities, without tripping enforcement during the rollout.
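A grace-period check reduces to a timestamp comparison. A minimal sketch, assuming the gateway records when each tool was first seen; the function name and the choice to treat the window as exclusive at its end are illustrative.

```python
from datetime import datetime, timedelta, timezone


def in_grace_period(first_seen, now, grace_period_hours=24):
    """True while a newly seen tool is still inside its grace window."""
    return now - first_seen < timedelta(hours=grace_period_hours)


first_seen = datetime(2026, 2, 1, 9, 0, tzinfo=timezone.utc)
print(in_grace_period(first_seen, datetime(2026, 2, 1, 20, 0, tzinfo=timezone.utc)))  # True
print(in_grace_period(first_seen, datetime(2026, 2, 2, 10, 0, tzinfo=timezone.utc)))  # False
```

The default of 24 hours matches grace_period_hours in the policy defaults.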
The Intelligence Layer
Detection tells you what happened. Prediction tells you what will happen. Recommendation tells you what to do about it.
CLPI's intelligence layer builds on N-way coherence analysis — given a team of agents, the system computes a matrix showing exactly where their values align and diverge. That's been live since V2. The intelligence layer extends it:
Fault line extraction. From the coherence matrix, extract specific value dimensions where agents conflict. "These two agents have opposing escalation policies for data access." Concrete, actionable, attributed.
POST /v1/teams/fault-lines
Predictive risk forecast. Map fault lines to a task context and predict failure modes. "68% chance of escalation conflict on data queries. Primary drivers: agent-a4c prefers to escalate, agent-b8f prefers to resolve autonomously."
POST /v1/teams/forecast
Policy recommendation. Given the team composition, task context, and fault lines, generate a policy.yaml that mitigates the predicted risks. Ready to deploy. Backed by a STARK proof covering the full derivation chain — fault lines, forecast, recommendation, guardrail merge, and enforcement verdicts — in a 5-step SP1 guest program.
POST /v1/teams/recommend-policy
Transaction-scoped guardrails. The recommended policy can be applied as a temporary overlay for a specific multi-agent transaction. The gateway enforces a three-layer merge: org policy + agent policy + transaction guardrails. The guardrails inject conscience values into AIP for the duration of the interaction, then expire automatically. No permanent policy changes required.
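One way to picture the three-layer merge is the sketch below. The actual merge semantics are not specified here, so this assumes the simplest reasonable rules: forbidden rules accumulate across layers, later layers override earlier defaults, and transaction guardrails are ignored once they expire. The function name, the guardrails shape ("policy" plus "expires_at"), and those rules are all assumptions. Conscience-value injection into AIP is not modeled.

```python
from datetime import datetime, timezone


def merge_policies(org, agent, guardrails, now):
    """Merge org, agent, and (unexpired) transaction guardrail policies."""
    layers = [org, agent]
    # Transaction guardrails apply only until their expiry time.
    if guardrails is not None and now < guardrails["expires_at"]:
        layers.append(guardrails["policy"])

    merged = {"forbidden": [], "defaults": {}}
    for layer in layers:
        # Assumption: forbidden rules are additive, so no layer can lift a ban.
        merged["forbidden"].extend(layer.get("forbidden", []))
        # Assumption: for scalar defaults, the most specific layer wins.
        merged["defaults"].update(layer.get("defaults", {}))
    return merged


org = {"forbidden": [{"pattern": "mcp__shell__*"}],
       "defaults": {"unmapped_tool_action": "warn"}}
agent = {"defaults": {"grace_period_hours": 24}}
guardrails = {
    "expires_at": datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc),
    "policy": {"forbidden": [{"pattern": "mcp__filesystem__write*"}],
               "defaults": {"unmapped_tool_action": "deny"}},
}

during = merge_policies(org, agent, guardrails, datetime(2026, 2, 1, 11, 0, tzinfo=timezone.utc))
after = merge_policies(org, agent, guardrails, datetime(2026, 2, 1, 13, 0, tzinfo=timezone.utc))
print(during["defaults"]["unmapped_tool_action"], after["defaults"]["unmapped_tool_action"])
```

Because the overlay is applied at merge time and never written back, the org and agent policies are unchanged once the guardrails expire.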
This is the consulting product embedded in the API. Instead of hiring a team to audit your agent fleet, you call an endpoint and get a governance configuration backed by cryptographic proof of how it was derived.
On-Chain Anchoring
The proof chain now extends to a public ledger.
Agent reputation scores are publishable to a MnemoReputationRegistry on Base L2 — an ERC-8004-aligned contract that stores scores (0–1000) with timestamps and on-chain verification. A second contract, MnemoMerkleAnchor, anchors Merkle roots of the complete proof tree at configurable intervals. Any party can verify a score publication or Merkle inclusion against the on-chain state without trusting Mnemom infrastructure.
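Merkle inclusion checking in general works as in the sketch below. It is a generic illustration, not the MnemoMerkleAnchor scheme: it uses SHA-256 from the Python standard library (an on-chain contract would more likely use keccak256), duplicates the last node on odd-sized levels, and assumes a non-empty leaf list. All function names are hypothetical.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Hash adjacent pairs to build the parent level.
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root hash of a non-empty list of leaf byte strings."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])     # duplicate the last node on odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Sibling path for leaves[index] as (hash, sibling_is_left) pairs."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1             # the other node in this pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf, proof, root):
    """Recompute the root from a leaf and its sibling path."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root
```

A verifier needs only the leaf, the sibling path, and the anchored root, which is why the check does not depend on the party that built the tree.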
The dashboard shows an "anchored on-chain" indicator with Basescan transaction links. Both contracts are open source with 20 Foundry tests.
On-chain anchoring is opt-in and additive. It doesn't change how proofs work or how scores are computed. It adds an external, immutable verification point for customers and counterparties who need it.
What Changed
Before today, Mnemom could tell you what your agents did and whether they could be trusted. After today, Mnemom governs what your agents can do — with the same cryptographic rigor applied to the governance layer itself.
Monitoring becomes enforcement. Observation becomes policy. Dashboards become code. And the proof chain extends from the zkVM to a public ledger that anyone can verify.
Card Lifecycle & Policy Intelligence is live today. Start with mnemom policy init to generate a starter policy, or call POST /v1/policies/evaluate to test your agents against existing alignment cards.
Policy Management Guide · Policy DSL Specification · Intelligence API · Trust Recovery Guide · On-Chain Verification
Mnemom builds alignment and integrity infrastructure for autonomous agents. AAP and AIP are open source and available on npm and PyPI.
GitHub: github.com/mnemom · Docs: docs.mnemom.ai
