Agents Shipgate: Static release checks for tool-using AI agents

Q: What is agents-shipgate?

A static, manifest-first scanner that catches risky agent tool configurations at PR time. CLI + GitHub Action. Open source, Apache-2.0.

Q: What does it actually check?

Seven dimensions of an agent's tool surface across every declared tool source: inventory, schema, auth, approval policies, side effects, idempotency, and blast radius.

Q: How is this different from LLM evals?

Evals validate behavior on inputs you wrote. agents-shipgate validates the static release artifact (manifest, tool schemas, policies) without running the model. Use both.

Q: What inputs does it support?

MCP exports, OpenAPI 3.x specs, OpenAI Agents SDK Python entrypoints, Anthropic Messages API artifacts, Google ADK Python and YAML config, LangChain/LangGraph, CrewAI, and OpenAI Agents API artifacts.

Q: Is it production-ready?

v0.5.1 is the current public release. The manifest remains version 0.1 across the 0.x series; see STABILITY.md.

Q: How do I add it to GitHub Actions?

Five-minute setup with advisory mode by default; switch to strict with a baseline once the team has triaged existing findings.

Q: How is this different from observability or runtime guardrails?

Observability records what the agent did at runtime. Guardrails enforce policy at call time. agents-shipgate runs upstream, in CI, on the static artifact, so a release that violates policy never reaches them.

Q: Does it call my agent or send my data anywhere?

No. The scanner reads local manifest and tool-source files, runs static checks, and writes local reports. No agent execution, no user-code import by default, no model invocation, no MCP server connections, no LLM calls, no scanner telemetry by default, and no scanner network calls by default.

Q: What is the output format?

Markdown and JSON by default. SARIF is available through the GitHub Action or scan output format configuration. JSON uses report_schema_version 0.5.

Q: Does it certify my agent as safe?

No. agents-shipgate is an advisory release-readiness scanner, not a safety or compliance certification. It produces evidence for human review; it does not replace human review.

Quickstart

Run a fixture, scan your repo, then add advisory CI.

A 60-second path to a real Tool-Use Readiness Report once Python and pipx are available. Requires Python 3.12+.

Known fixture

$ pipx install agents-shipgate
$ agents-shipgate fixture run support_refund_agent

Runs a bundled refund-support fixture and writes local report artifacts.

Scan your repo

$ agents-shipgate init --workspace . --write
$ agents-shipgate scan -c shipgate.yaml

Creates a manifest, reads local tool sources, and generates a report for review.

Advisory CI

- uses: ThreeMoonsLab/agents-shipgate@v0.5.1
  with:
    config: shipgate.yaml
    ci_mode: advisory

Adds PR evidence without failing builds while your team triages findings.

Default artifacts: agents-shipgate-reports/report.md and agents-shipgate-reports/report.json. SARIF is available through the GitHub Action or scan output format configuration.

Product

Turn your agent's tool surface into release evidence.

Agents Shipgate reads declared local tool sources, normalizes them into a reviewable inventory, runs deterministic static checks, and writes a Tool-Use Readiness Report.

Company: Three Moons Lab

Product: Agents Shipgate

CLI / package / repo: agents-shipgate

Output: Tool-Use Readiness Report

Inputs

MCP exports

OpenAPI specs

SDK/framework metadata

OpenAI Agents SDK · Anthropic Messages API · Google ADK · LangChain/LangGraph · CrewAI · OpenAI Agents API

agents-shipgate

static · local · deterministic

Outputs

Markdown report

JSON report

GitHub Action summary

Optional SARIF for GitHub code scanning workflows.

Agent builders: Review MCP, OpenAPI, SDK, and framework tools before promotion.

Platform teams: Track approval, scope, idempotency, and baseline drift in PRs.

Security reviewers: Get static release evidence without running agents or importing user code.
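
As a rough illustration of the read, normalize, and check flow described in this section, the sketch below turns an MCP-style tool export into a flat inventory and flags free-form string fields. It is not the scanner's implementation: the export shape (a `tools` list with `name` and `inputSchema`) is assumed to follow the MCP tool-listing format, and the helper names, the `FREE_FORM_NAMES` set, and the "string without an enum" heuristic are hypothetical.

```python
# Hypothetical sketch: normalize an MCP-style export, then run one static check.
# Field names to watch are an assumption chosen for illustration.
FREE_FORM_NAMES = {"body", "command", "action", "updates"}


def normalize_mcp_tools(export: dict, source_id: str) -> list[dict]:
    """Flatten an export into inventory rows: source, tool name, input schema."""
    return [
        {"source": source_id, "name": tool["name"], "schema": tool.get("inputSchema", {})}
        for tool in export.get("tools", [])
    ]


def free_form_fields(tool: dict) -> list[str]:
    """Return watched string properties that have no enum constraint."""
    props = tool["schema"].get("properties", {})
    return sorted(
        name
        for name, spec in props.items()
        if name in FREE_FORM_NAMES and spec.get("type") == "string" and "enum" not in spec
    )


# Made-up export used only to exercise the functions above.
export = {
    "tools": [
        {
            "name": "send_email",
            "inputSchema": {
                "type": "object",
                "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
            },
        },
        {
            "name": "lookup_customer",
            "inputSchema": {
                "type": "object",
                "properties": {"customer_id": {"type": "string", "maxLength": 64}},
            },
        },
    ]
}

inventory = normalize_mcp_tools(export, "mcp_tools")
flagged = {t["name"]: free_form_fields(t) for t in inventory if free_form_fields(t)}
print(flagged)
```

Because the check only reads dictionaries loaded from local files, nothing in it needs to import user code or contact an MCP server.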

Checks

What it checks before release.

The landing page shows the release-review categories; the full check catalog stays in the repo.

01 · approval

Approval gaps

Write, destructive, financial, or external communication tools without declared approval policies.

02 · surface

Wildcard tool sources

Wildcard MCP or inventory sources that expose an unreviewable tool surface.

03 · auth

Broad scopes

Manifest or tool permissions that rely on wildcard or overly broad authorization scopes.

04 · schema

Free-form action fields

Fields such as body, command, action, or updates that let the model control too much.

05 · bounds

Missing bounds

Unbounded arrays, objects, strings, or numeric fields on side-effecting operations.

06 · retry

Idempotency gaps

Write actions where retry behavior could duplicate refunds, updates, messages, or deletes.

07 · static

Dynamic surfaces

Framework toolsets that cannot be statically reviewed without explicit inventory evidence.

08 · review

Owner and policy evidence

Tools missing reviewer-friendly ownership, scope, approval, or prohibited-action coverage.

09 · baseline

Baseline drift

New, matched, and resolved findings when a reviewed baseline is present.
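
One way to picture the baseline-drift category is as a set comparison between a reviewed baseline and the current scan. The sketch below is an illustration only: identifying a finding by its (check_id, tool) pair and the `classify_drift` helper are assumptions, not the scanner's documented matching logic. The check IDs and tool names are borrowed from the sample report on this page.

```python
from typing import NamedTuple


class FindingKey(NamedTuple):
    """Hypothetical identity for a finding: which check fired on which tool."""
    check_id: str
    tool: str


def classify_drift(
    baseline: set[FindingKey], current: set[FindingKey]
) -> dict[str, set[FindingKey]]:
    """Split findings into new, matched, and resolved relative to a baseline."""
    return {
        "new": current - baseline,        # present now, absent from the baseline
        "matched": current & baseline,    # present in both
        "resolved": baseline - current,   # in the baseline, gone from this scan
    }


baseline = {
    FindingKey("SHIP-SOURCE-WILDCARD", "wildcard_mcp_tools.*"),
    FindingKey("SHIP-SCHEMA-FREEFORM-ACTION", "support.send_email"),
}
current = {
    FindingKey("SHIP-SCHEMA-FREEFORM-ACTION", "support.send_email"),
    FindingKey("SHIP-POLICY-APPROVAL-MISSING", "stripe.create_refund"),
}

drift = classify_drift(baseline, current)
for label, keys in drift.items():
    print(label, sorted(keys))
```

Keying on stable identifiers rather than line numbers keeps a finding "matched" when the underlying file is merely reformatted, though the real tool may use a different fingerprint.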

Sample Tool-Use Readiness Report

Reports release owners can actually review.

Findings include severity, evidence, source reference, confidence, and a recommended next action — built for engineering and platform review, not for dashboards.

agents-shipgate-reports/report.md

Tool-Use Readiness Report

Release blockers

Critical: 2 · High: 14 · Medium: 2

Human review: recommended

Top findings

#01 Critical openapi/billing.yaml:142

stripe.create_refund lacks a declared approval policy

evidence: financial_action · external_write · POST /refunds

recommend: Add approval policy or remove from this release.

confidence: high · check_id: SHIP-POLICY-APPROVAL-MISSING

#02 Critical openapi/billing.yaml:142

stripe.create_refund lacks idempotency evidence

evidence: write action · amount/currency/payment_id schema · retry behavior unknown

recommend: Add idempotency key or document retry policy.

confidence: high · check_id: SHIP-IDEMPOTENCY-MISSING

#03 High shipgate.yaml:18

wildcard_mcp_tools.* exposes an unreviewable tool surface

evidence: wildcard tool source in shipgate.yaml

recommend: Replace wildcard with explicit allowlist.

confidence: high · check_id: SHIP-SOURCE-WILDCARD

#04 High mcp_tools.json:#/tools/send_email

support.send_email accepts free-form 'body' field

evidence: external_communication · no template binding

recommend: Constrain to template IDs or require human confirmation.

confidence: medium · check_id: SHIP-SCHEMA-FREEFORM-ACTION

#05 Medium openapi/tickets.yaml:88

tickets.update is missing maximum bound on 'fields'

evidence: unbounded object · broad write

recommend: Document or enforce a field allowlist.

confidence: medium · check_id: SHIP-SCHEMA-MISSING-BOUND

18 findings · support_refund_agent fixture · report_schema_version 0.5 · generated 2026-04-30T03:40Z

terminal

$ agents-shipgate fixture run support_refund_agent
loading shipgate.yaml ........................ ok
loading fixture support_refund_agent .......... ok
reading shipgate.yaml ......................... ok
reading local tool inventories ................ ok
normalizing inventory ......................... 8 tools, 3 sources
running 18 static checks ...................... done

Status: Release blockers detected
Critical: 2  High: 14  Medium: 2
Human review: recommended

Top findings
crit stripe.create_refund · approval policy missing
crit stripe.create_refund · idempotency evidence missing
high wildcard_mcp_tools.* · unreviewable tool surface
high support.send_email · free-form body field
... 14 more findings

→ wrote agents-shipgate-reports/report.md
→ wrote agents-shipgate-reports/report.json
→ exit 0  (advisory mode)

agents-shipgate-reports/report.json

{
  "schema_version": "0.1",
  "report_schema_version": "0.5",
  "summary": {
    "status": "blockers",
    "critical_count": 2,
    "high_count": 14,
    "medium_count": 2,
    "human_review_recommended": true
  },
  "generated_reports": ["report.md", "report.json"],
  "tool_inventory": {
    "tool_count": 8,
    "source_count": 3
  },
  "source_warnings": [],
  "findings": [
    {
      "id": "finding-001",
      "severity": "critical",
      "tool": "stripe.create_refund",
      "check_id": "SHIP-POLICY-APPROVAL-MISSING",
      "evidence": ["financial_action", "external_write"],
      "source": "samples/support_refund_agent/shipgate.yaml",
      "confidence": "high",
      "recommendation": "Add an approval policy or remove this tool from the release."
    },
    {
      "id": "finding-002",
      "severity": "high",
      "tool": "stripe.create_refund",
      "check_id": "SHIP-SIDE-EFFECT-IDEMPOTENCY",
      "evidence": ["external_write", "retryable_side_effect"],
      "source": "samples/support_refund_agent/shipgate.yaml",
      "confidence": "medium",
      "recommendation": "Declare idempotency evidence or document retry handling."
    },
    {
      "id": "finding-003",
      "severity": "medium",
      "tool": "support.lookup_customer",
      "check_id": "SHIP-SCHEMA-BOUNDS",
      "evidence": ["unbounded_string"],
      "source": "samples/support_refund_agent/shipgate.yaml",
      "confidence": "medium",
      "recommendation": "Add schema bounds for customer lookup inputs."
    }
  ]
}
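
Because the JSON report is plain data, reviewers can post-process it with a few lines of scripting. The sketch below is a hedged example that relies only on fields visible in the sample above (`findings`, `severity`, `tool`, `check_id`); the helper names are hypothetical, and the embedded string is a trimmed copy of the sample rather than real scanner output. A real script would read agents-shipgate-reports/report.json from disk instead.

```python
import json
from collections import Counter

# Trimmed copy of the sample report, embedded so the example is self-contained.
SAMPLE_REPORT = """
{
  "report_schema_version": "0.5",
  "findings": [
    {"id": "finding-001", "severity": "critical",
     "tool": "stripe.create_refund", "check_id": "SHIP-POLICY-APPROVAL-MISSING"},
    {"id": "finding-002", "severity": "high",
     "tool": "stripe.create_refund", "check_id": "SHIP-SIDE-EFFECT-IDEMPOTENCY"},
    {"id": "finding-003", "severity": "medium",
     "tool": "support.lookup_customer", "check_id": "SHIP-SCHEMA-BOUNDS"}
  ]
}
"""


def severity_counts(report: dict) -> Counter:
    """Count findings per severity label."""
    return Counter(f["severity"] for f in report.get("findings", []))


def checks_for_tool(report: dict, tool: str) -> list[str]:
    """List the check IDs reported against a single tool, in report order."""
    return [f["check_id"] for f in report.get("findings", []) if f["tool"] == tool]


report = json.loads(SAMPLE_REPORT)
counts = severity_counts(report)
refund_checks = checks_for_tool(report, "stripe.create_refund")

print(dict(counts))
print(refund_checks)
```

Grouping by tool rather than by severity can make review easier when one tool, such as a refund endpoint, accounts for several findings.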

What it catches

Wildcard tools · Approval gaps · Broad scopes · Free-form action fields · Missing bounds · Idempotency gaps · Dynamic surfaces · Baseline drift

Total findings: 18

Fixture: support_refund_agent

Sources: 3

Suppressed: 0

Real-world examples

Proof from public tool surfaces, without turning the homepage into docs.

Four public examples show the scanner on realistic SDK/framework code and larger API surfaces. Each card keeps one representative finding visible and leaves full output in GitHub or docs.

OpenAI Agents SDK · 2 tools · High findings

Airline customer service agent

Static AST extraction finds a write-capable update_seat tool without enough release-review evidence.

Representative finding: update_seat changes customer state and needs explicit scope and policy coverage.

Config excerpt:

tool_sources:
  - id: openai_agents_sdk
    type: openai_agents_sdk
    path: main.py
environment:
  target: production_like

Anthropic Messages API · 3 tools · Critical

Cookbook customer service agent

A real published tool-use example includes cancel_order, a destructive action that needs approval evidence.

Representative finding: cancel_order is destructive and ships without a declared approval policy.

Config excerpt:

tool_sources:
  - id: anthropic_tools
    type: anthropic_messages
    path: tools.json
policies:
  approval_required_for: [destructive]

OpenAPI · 591 tools · Stress test

DigitalOcean public API as agent tools

A cloud infrastructure API reframed as a broad agent surface exposes irreversible droplet, database, and Kubernetes operations.

Representative finding: Destructive infrastructure actions are present without explicit approval policies.

Config excerpt:

tool_sources:
  - id: digitalocean_openapi
    type: openapi
    path: openapi.yaml
permissions:
  scopes: ["*"]

OpenAPI · 167 tools · Stress test

Twilio Messaging API purpose mismatch

A read-only manifest pointed at a messaging API still exposes DELETE-capable tools that contradict the declared purpose.

Representative finding: Read-only release intent conflicts with message and phone-number deletion operations.

Config excerpt:

agent:
  declared_purpose:
    - read messaging inventory
tool_sources:
  - id: twilio_openapi
    type: openapi
    path: messaging.yaml
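
The purpose-mismatch idea can be approximated by walking an OpenAPI document's `paths` object and comparing what it exposes against the declared purpose. The sketch below is a simplified illustration, not the scanner's actual check: the `delete_operations` and `purpose_conflicts` helpers, the "every purpose starts with read" heuristic, and the tiny spec (which is not Twilio's real API) are all assumptions.

```python
def delete_operations(spec: dict) -> list[tuple[str, str]]:
    """Return (path, operationId) for every DELETE operation in an OpenAPI doc."""
    found = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path items also hold non-method keys such as "parameters"; skip them.
            if method.lower() == "delete":
                found.append((path, operation.get("operationId", "")))
    return sorted(found)


def purpose_conflicts(declared_purpose: list[str], spec: dict) -> list[tuple[str, str]]:
    """Flag DELETE operations when every declared purpose reads as read-only."""
    read_only = bool(declared_purpose) and all(
        p.strip().lower().startswith("read") for p in declared_purpose
    )
    return delete_operations(spec) if read_only else []


# Made-up spec fragment for illustration only.
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/messages": {"get": {"operationId": "list_messages"}},
        "/messages/{sid}": {
            "parameters": [{"name": "sid", "in": "path", "required": True}],
            "get": {"operationId": "fetch_message"},
            "delete": {"operationId": "delete_message"},
        },
    },
}

conflicts = purpose_conflicts(["read messaging inventory"], spec)
print(conflicts)
```

A string-prefix test on the purpose is deliberately crude; it only shows how a declared intent and a static spec can be compared without calling the API.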

Developer workflow

Run locally, add advisory PR checks, then tighten when ready.

CI is advisory by default. Once your team has reviewed the baseline, strict mode fails only on unsuppressed critical findings.

Local CLI

$ pipx install agents-shipgate
$ agents-shipgate init --workspace . --write
$ agents-shipgate scan -c shipgate.yaml

Requires Python 3.12+. Use python -m pip install agents-shipgate if pipx is not available.

.github/workflows/shipgate.yml

name: Agents Shipgate

on:
  pull_request:

permissions:
  contents: read

jobs:
  shipgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@<pinned-sha>
      - uses: ThreeMoonsLab/agents-shipgate@v0.5.1
        with:
          config: shipgate.yaml
          ci_mode: advisory
          output_dir: agents-shipgate-reports

SARIF is available through the GitHub Action or scan output format configuration.

Local CLI

Fast feedback while editing the manifest or tool definitions.

PR advisory

Generate Markdown and JSON artifacts without blocking merges.

Strict CI

Fail only on unsuppressed critical findings once a baseline is reviewed.

Release review

Give platform and security owners a named report to discuss.
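
The difference between the advisory and strict modes above comes down to how findings map to an exit code. The sketch below illustrates that rule under stated assumptions: the `ci_exit_code` helper, the `"strict"` mode string, suppressions keyed by finding `id`, and the exit code of 1 are hypothetical and are not taken from the tool's documentation.

```python
def ci_exit_code(findings: list[dict], suppressed_ids: set[str], ci_mode: str) -> int:
    """Return a process exit code for a scan result.

    Advisory mode never fails the build. Strict mode (name assumed) fails
    only when a critical finding is not covered by a suppression.
    """
    if ci_mode == "advisory":
        return 0
    blocking = [
        f
        for f in findings
        if f["severity"] == "critical" and f["id"] not in suppressed_ids
    ]
    return 1 if blocking else 0


findings = [
    {"id": "finding-001", "severity": "critical"},
    {"id": "finding-002", "severity": "high"},
]

advisory_code = ci_exit_code(findings, set(), "advisory")
strict_code = ci_exit_code(findings, set(), "strict")
strict_suppressed_code = ci_exit_code(findings, {"finding-001"}, "strict")
print(advisory_code, strict_code, strict_suppressed_code)
```

Note that the high-severity finding never blocks in this sketch, which mirrors the "fail only on unsuppressed critical findings" wording.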

Trust model · open source

Built for repositories that need careful review.

The scanner is static by default. It does not execute agents, import user code, run tools, call LLMs, connect to MCP servers, make scanner network calls, or collect scanner telemetry by default.

Designed to be safe to run before production tools are connected. Open-source core. Transparent checks. Suppressions require reasons.

Inspect the checks. Run the fixture. Open an issue with a false positive.

Default scanner guarantees

Static by default

No agent execution
No user-code import by default
No tool calls
No LLM calls
No MCP server connections
No scanner network calls by default
No scanner telemetry by default
Apache-2.0 open source

Differentiation

Not an eval tool. Not observability. Not a gateway.

Evals test behavior. Observability records runtime. Gateways enforce access. Shipgate reviews the tool surface before release.

Not evals. They test behavior. Shipgate reviews release artifacts.

Not observability. It records runtime. Shipgate runs before promotion.

Not a gateway. Gateways enforce access. Shipgate produces review evidence.

Shipgate: a release gate for agent tool surfaces. Static checks. Findings with evidence.

Side-by-side

Category	What it answers	What Shipgate answers
Evals	Did the model behave as expected?	What tool surface are we releasing?
Observability	What happened at runtime?	What should be reviewed before promotion?
MCP gateway	Can a tool call be allowed at runtime?	Does this tool surface need release review?
Security scanner	Are there known code or dependency risks?	Are agent tools, schemas, scopes, and policies reviewable?
Governance platform	How do we manage org-level policy?	What static release findings exist in this repo today?

For the runtime boundary, read Agents Shipgate vs runtime guardrails.

FAQ

Common release-review questions.

Short answers for developers, platform teams, and security reviewers before trying the scanner.

Does it call my agent or send my data anywhere?

No. The scanner is static by default: no agent execution, no user-code import by default, no tool calls, no LLM calls, no MCP server connections, no scanner network calls by default, and no scanner telemetry by default.

Is Agents Shipgate production-ready?

v0.5.1 is the current public release. Use advisory mode first to collect review evidence, then move to stricter CI behavior once your team has reviewed the baseline and suppression process.

How is this different from observability or runtime guardrails?

Observability records what happened at runtime, and guardrails enforce access at runtime. Agents Shipgate runs earlier: it turns declared tool surfaces into static release-review evidence before promotion.

Does it certify my agent as safe?

No. Agents Shipgate is not a safety certification, runtime gateway, or behavioral eval. It produces deterministic findings from tool definitions, schemas, scopes, and declared policies so release owners have evidence to review.

Long-term direction

Start with static release checks. Add release evidence over time.

Today, Agents Shipgate makes agent tool surfaces reviewable before release. The next layers are baselines, suppressions, release history, policy drift, re-review triggers, and runtime evidence integrations.

01 · now

Tool-surface release checks

CLI + GitHub Action.

02

Release evidence

Reports, baselines, history, exceptions.

03

Runtime integrations

Trace evidence without replacing static review.

Get started

Run a fixture in 60 seconds.

$ pipx install agents-shipgate
$ agents-shipgate fixture run support_refund_agent

Run the sample fixture

Have a real agent? We'll review your tool surface and walk through the findings with your team — help@threemoonslab.com
