Your Agents Have Credit Scores. Now Your Teams Do Too.

Mnemom Research | February 2026

Four days ago we shipped Credit Scores for AI Agents — the Mnemom Trust Rating, a bond-rating-style reputation system for individual agents. It answered a question the industry hadn't gotten around to asking yet: can I trust this agent, and can I prove it?

But the moment you build individual scores, the obvious next question surfaces: what about the team?

Nobody runs one agent. The interesting deployments are three, five, twelve agents coordinating on a task — and the risk profile of the group is not the sum of its parts. We knew this was coming because we'd already built team risk assessment. What we hadn't built was team memory.

The Gap

Mnemom already had team risk assessment: you could pass a list of agent IDs to POST /v1/risk/assess/team and get a three-pillar analysis (agent quality, coherence, systemic risk) with Shapley attribution and circuit breakers. That has been live since the V2 launch.

But every team assessment started cold. No persistent identity. No accumulated history. No way to answer "is this team getting better or worse?" If you ran the same five agents together every day for six months, the system treated each assessment as if those agents had never met.

Individual agents get persistent identity, accumulated history, weekly snapshots, trend lines, public reputation pages, badges, and CI enforcement. Teams got none of it.

That changes today.

Team Identity

Teams are now first-class entities in Mnemom. Register a team, give it a name, assign agents.

POST /v1/teams
{
  "org_id": "org-abc123",
  "name": "Incident Response Alpha",
  "agent_ids": ["smolt-a4c12709", "smolt-b8f23e11", "smolt-c1d45a03"],
  "metadata": { "environment": "production", "domain": "infrastructure" }
}

Teams have persistent identity, versioned rosters, and a full change log. When you add or remove an agent, the system records who made the change and when, and triggers a score recomputation. This isn't a tag or a label — it's an entity with its own lifecycle.

Minimum two agents, maximum fifty. Agents can belong to multiple teams. Roster changes are tracked and auditable.
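
A minimal sketch of building and checking the registration body on the client side, assuming the payload shape shown above. The helper name `build_team_payload` is hypothetical and the duplicate-ID handling is an assumption; only the two-to-fifty roster bounds come from the text.

```python
import json

MIN_AGENTS = 2   # roster bounds stated in the text
MAX_AGENTS = 50


def build_team_payload(org_id, name, agent_ids, metadata=None):
    """Return the JSON body for POST /v1/teams, rejecting invalid rosters early."""
    # Drop duplicate IDs while keeping order (an assumption about desired behavior).
    unique_ids = list(dict.fromkeys(agent_ids))
    if not MIN_AGENTS <= len(unique_ids) <= MAX_AGENTS:
        raise ValueError(
            f"a team needs {MIN_AGENTS}-{MAX_AGENTS} agents, got {len(unique_ids)}"
        )
    payload = {"org_id": org_id, "name": name, "agent_ids": unique_ids}
    if metadata:
        payload["metadata"] = metadata
    return json.dumps(payload)


body = build_team_payload(
    "org-abc123",
    "Incident Response Alpha",
    ["smolt-a4c12709", "smolt-b8f23e11", "smolt-c1d45a03"],
    {"environment": "production", "domain": "infrastructure"},
)
```

Validating locally keeps an obviously invalid roster from costing a round trip; the server remains the source of truth for the limits.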

The Team Trust Rating

Here's where it gets interesting. A team score is not the average of its members' scores. If it were, you wouldn't need one.

The thing that makes a team a team — as opposed to a bag of individuals — is how well they work together. A team of five AAA agents with terrible coherence should score worse than a team of five A agents with excellent coherence. The team score captures what individual scores can't: the emergent behavior of the group.

Five weighted components:

Team Coherence History (35%) — How consistently well-aligned is this team over time? This is the dominant signal because it measures the one thing that only exists at the team level. Derived from the historical coherence-pillar (CQ) scores across all assessments linked to this team.

Aggregate Member Quality (25%) — The floor. A team is only as strong as its members. This is a tail-risk-weighted aggregate of individual Trust Ratings — members with lower scores receive exponentially more weight, so one weak member drags the team down more than one strong member lifts it up. It matters — but it's not dominant, because a high-quality team with poor coordination is still a risky team.

Operational Track Record (20%) — How often has this team been assessed as low-risk? The historical hit rate across all team assessments. An operational measure, not a compositional one.

Structural Stability (10%) — Is the team's contagion profile stable? Does it churn members? A team that swaps agents every week cannot build a reliable track record. Frequent roster changes suppress this component. This is analogous to how employee turnover affects organizational risk ratings in traditional finance.

Assessment Density (10%) — Is this team actively monitored? Count and recency of assessments. An actively assessed team with 200 data points gets more credit than one assessed twice six months ago.
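
The tail-risk weighting behind Aggregate Member Quality can be illustrated with an exponentially weighted mean. This is a sketch, not the published formula: the weight function `exp(-k * score / 1000)` and the steepness `k` are assumptions chosen only to show how lower-scoring members can receive exponentially more weight.

```python
import math


def tail_weighted_quality(member_scores, k=4.0):
    """Weighted mean of 0-1000 member scores in which low scores weigh more.

    The weight exp(-k * score / 1000) and the default k are illustrative
    assumptions, not Mnemom's actual formula.
    """
    if not member_scores:
        raise ValueError("at least one member score is required")
    weights = [math.exp(-k * s / 1000.0) for s in member_scores]
    return sum(w * s for w, s in zip(weights, member_scores)) / sum(weights)


strong_team = [900, 900, 900, 900, 900]
one_weak = [900, 900, 900, 900, 500]

plain_mean = sum(one_weak) / len(one_weak)    # ordinary average
tail_mean = tail_weighted_quality(one_weak)   # pulled toward the weak member
```

With these inputs the tail-weighted value falls well below the plain average of 820, which is the asymmetry the component is meant to capture.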

The composite score maps to the same 0-1000 range and the same AAA-through-CCC grade scale as individual scores. Same confidence tiers. A team needs 10 team risk assessments before a score publishes; until it crosses that threshold it shows a progress label such as "Building 4/10".
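
A sketch of how the five weights could combine into the 0-1000 composite and how the publish threshold gates it. The weights and the 10-assessment threshold come from the text; the assumption that each component is normalized to the range 0 to 1 before weighting, and the names `composite_score` and `display_status`, are illustrative.

```python
WEIGHTS = {
    "coherence_history": 0.35,
    "member_quality": 0.25,
    "track_record": 0.20,
    "structural_stability": 0.10,
    "assessment_density": 0.10,
}
MIN_ASSESSMENTS = 10  # a score publishes once the team has this many assessments


def composite_score(components):
    """Weighted sum of components (each assumed in [0, 1]) scaled to 0-1000."""
    if set(components) != set(WEIGHTS):
        raise ValueError("expected exactly the five named components")
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(total * 1000)


def display_status(components, assessment_count):
    """Return the score as text, or a progress label below the threshold."""
    if assessment_count < MIN_ASSESSMENTS:
        return f"Building {assessment_count}/{MIN_ASSESSMENTS}"
    return str(composite_score(components))


example = {
    "coherence_history": 0.90,
    "member_quality": 0.80,
    "track_record": 0.75,
    "structural_stability": 0.70,
    "assessment_density": 0.60,
}
```

The grade boundaries are not given in this post, so the sketch stops at the numeric score rather than guessing a score-to-grade mapping.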

Weekly snapshots. Trend charts. All of it.

Proof Chain

Every team assessment that feeds the score is already cryptographically attested — Ed25519 signatures, hash chains, Merkle trees, STARK zero-knowledge proofs. The team score computation itself is also provable in the zkVM.

The guest program takes the list of historical assessment results (each with a proof_id linking to its own STARK proof), the current roster, and the component weights, and re-derives the team score in Q16.16 fixed-point arithmetic. The output is a proof that the team score was correctly computed from attested evidence.
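
To make the fixed-point step concrete, here is a sketch of a weighted sum in Q16.16 (16 integer bits, 16 fractional bits) that uses only integer operations after the inputs are encoded. It illustrates the number format only: the actual guest program, its rounding rules, and its input encoding are not described in this post, and the helper names and component values are hypothetical.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # 1.0 in Q16.16


def to_q16(x):
    """Convert a float to Q16.16 by rounding to the nearest 1/65536."""
    return int(round(x * ONE))


def q16_mul(a, b):
    """Multiply two Q16.16 values; the shift truncates the extra fractional bits."""
    return (a * b) >> FRAC_BITS


def q16_weighted_sum(values, weights):
    """Integer-only weighted sum of Q16.16 values with Q16.16 weights."""
    return sum(q16_mul(v, w) for v, w in zip(values, weights))


# Component weights from the text; component values are made up for illustration.
weights = [to_q16(w) for w in (0.35, 0.25, 0.20, 0.10, 0.10)]
components = [to_q16(c) for c in (0.90, 0.80, 0.75, 0.70, 0.60)]

score_q16 = q16_weighted_sum(components, weights)
score = (score_q16 * 1000) >> FRAC_BITS  # scale to 0-1000 and drop the fraction
```

In exact arithmetic these inputs give 0.795, while the truncating integer path above lands on 794. That small gap is why a verifier has to replay the same integer operations rather than compare against a floating-point result.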

This creates a proof chain: individual checkpoints lead to individual Trust Ratings, which feed team assessments, which feed the Team Trust Rating. Each link is independently verifiable. You don't have to trust us that a team is rated A — you can verify every step of the computation yourself.
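
The idea that each link can be checked independently can be sketched with a plain SHA-256 hash chain. This is illustrative only: the record fields, the JSON canonicalization, and the function names are assumptions, and the sketch does not cover the Ed25519, Merkle tree, or STARK layers.

```python
import hashlib
import json


def record_hash(record):
    """SHA-256 over a canonical JSON encoding (sorted keys, no whitespace)."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_link(chain, payload):
    """Append a record that commits to the hash of the previous record."""
    prev = record_hash(chain[-1]) if chain else None
    chain.append({"prev_hash": prev, "payload": payload})


def verify_chain(chain):
    """Check that every record's prev_hash matches the record before it."""
    for i, record in enumerate(chain):
        expected = record_hash(chain[i - 1]) if i > 0 else None
        if record["prev_hash"] != expected:
            return False
    return True


chain = []
append_link(chain, {"kind": "checkpoint", "agent": "smolt-a4c12709"})
append_link(chain, {"kind": "agent_rating", "score": 780})
append_link(chain, {"kind": "team_assessment", "team": "team-7f2a9c01"})

intact_ok = verify_chain(chain)
chain[1]["payload"]["score"] = 999  # tamper with a middle record
tampered_ok = verify_chain(chain)
```

Changing any earlier record changes its hash, so the next record's `prev_hash` no longer matches and verification fails at that link.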

Public Surfaces

Everything that exists for individual agents now exists for teams.

Reputation pages at /teams/{team_id}/reputation — score gauge, five-component breakdown, member roster with individual badges, trend charts, roster change timeline with score impact annotations, and a "Verify this score" button that checks the proof chain.

Team directory at /directory/teams — searchable catalog of public team scores. Filter by grade, confidence, domain, or team size.

Badges via GET /v1/teams/{team_id}/badge.svg — embeddable SVG in six variants including score, grade, score+trend, and compact. [ Team Trust | 812 ] in your README, your docs, your agent registry. Pre-eligible teams get a progress badge: [ Team Trust | Building 4/10 ].

GitHub Action — extend mnemom/reputation-check@v1 with a team-id parameter:

- uses: mnemom/reputation-check@v1
  with:
    team-id: team-7f2a9c01
    min-score: 700
    min-grade: A

Fails CI if the team drops below threshold. Enforce team-level quality gates alongside individual ones.
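
The check such a gate performs can be sketched as a pair of comparisons. The grade ordering below assumes the conventional bond-rating ladder between AAA and CCC (the post names only the endpoints), and `passes_gate` is a hypothetical helper, not the Action's actual code.

```python
# Assumed ordering, best to worst; only AAA and CCC are named in the text.
GRADE_ORDER = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC"]


def passes_gate(score, grade, min_score=None, min_grade=None):
    """Return True when the team meets every threshold that was supplied."""
    if min_score is not None and score < min_score:
        return False
    if min_grade is not None:
        # A lower index means a better grade.
        if GRADE_ORDER.index(grade) > GRADE_ORDER.index(min_grade):
            return False
    return True


ok = passes_gate(812, "AA", min_score=700, min_grade="A")
blocked = passes_gate(690, "BBB", min_score=700, min_grade="A")
```

A CI wrapper would turn a `False` result into a nonzero exit code so the pipeline stops.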

Team Alignment Cards

Teams get their own alignment cards — the behavioral contract that declares the team's collective values, autonomy boundaries, and coordination mode. You can auto-derive a team card from the union of member cards:

POST /v1/teams/{team_id}/card/derive

The derivation merges member cards: values are unioned and ordered by frequency, forbidden actions from any member apply to the team (strictest wins), and the highest audit retention policy is inherited. You can also set cards manually or start from an auto-derived base and customize. Every card change is versioned.
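
The merge rules above can be sketched as follows. The card fields (`values`, `forbidden_actions`, `audit_retention_days`) are hypothetical names chosen for illustration, retention is assumed to be expressed in days, and ties in value frequency are broken by first appearance, which the post does not specify.

```python
from collections import Counter


def derive_team_card(member_cards):
    """Merge member cards: values ordered by frequency, forbidden actions
    unioned, and the longest audit retention kept."""
    if not member_cards:
        raise ValueError("at least one member card is required")

    value_counts = Counter()
    for card in member_cards:
        # Count each value at most once per member card.
        value_counts.update(list(dict.fromkeys(card["values"])))

    forbidden = set()
    for card in member_cards:
        forbidden.update(card["forbidden_actions"])

    return {
        # most_common() sorts by count; equal counts keep first-seen order.
        "values": [value for value, _ in value_counts.most_common()],
        "forbidden_actions": sorted(forbidden),
        "audit_retention_days": max(c["audit_retention_days"] for c in member_cards),
    }


cards = [
    {"values": ["safety", "transparency"],
     "forbidden_actions": ["delete_prod_data"], "audit_retention_days": 30},
    {"values": ["safety", "speed"],
     "forbidden_actions": ["disable_logging"], "audit_retention_days": 90},
    {"values": ["transparency", "safety"],
     "forbidden_actions": ["delete_prod_data"], "audit_retention_days": 60},
]
team_card = derive_team_card(cards)
```

Taking the union of forbidden actions is what makes the strictest member win: an action banned for any one agent is banned for the whole team.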

This matters because the team card is what CQ measures against. It's the declared behavioral contract for the group, independent of any individual member's card.

Integration

Team reputation plugs into the systems you're already using.

Containment: When a team member is paused or killed via the containment engine, the team score reflects it immediately. Member quality drops. Structural stability drops. If the contained agent had high contagion vulnerability scores, other team members get flagged for tightened guardrails.

Predictive guardrails: For registered teams, the guardrail engine uses historical assessment data to improve predictions. "This team has historically struggled with speed-safety tradeoffs" is more useful than a cold-start coherence analysis.

CI gating: The same GitHub Action that enforces individual agent scores now enforces team scores. Deploy pipelines can gate on both.

What This Means

Individual Trust Ratings answered "can I trust this agent?" Team Trust Ratings answer the questions that come next:

"How has this team performed?" — answered with a persistent, trended, attested score.

"Is this team getting better or worse?" — answered with weekly snapshots and trend lines, not guesswork.

"Which team should I deploy?" — answered with side-by-side comparison, not gut feel.

If individual Trust Ratings are FICO for agents, team scores are Moody's for agent portfolios. Same rigor. Same verifiability. Applied to the unit that actually matters — the group.

Team Reputation & Risk Scoring is live today on Team and Enterprise plans. Register a team, run assessments, and watch the score build.

Team Management Guide · Team Trust Rating · Teams API

Mnemom builds alignment and integrity infrastructure for autonomous agents. AAP and AIP are open source and available on npm and PyPI.

GitHub: github.com/mnemom · Docs: docs.mnemom.ai