Agents Shipgate: Release checks for tool-using agents

Find risky tools
before your agent ships.

Agents Shipgate is an open-source CLI + GitHub Action for reviewing what tools an agent can call before it gets production-like tools or higher permissions.

It scans shipgate.yaml, local MCP exports, OpenAPI specs, and optional OpenAI Agents SDK static metadata.

Static by default: no agent execution · no tool calls · no LLM calls · no MCP connections · no network · no telemetry
shipgate · tool surface check v0.1.0
What it checks
Missing approval policies · Wildcard MCP tools · Broad auth scopes · Free-form action fields · Missing bounds · Idempotency gaps · External communication · Injection-like descriptions

Agents are moving from chat to action.

Once an agent can refund, email, cancel, update tickets, or call internal APIs, tool changes become release risks. Shipgate makes those risks visible before promotion.

Tool surfaces drift

MCP exports, OpenAPI specs, and function tools change faster than most release reviews can track.

Policies are implicit

Approval, confirmation, idempotency, and scope expectations often live in prompts, docs, or tribal knowledge.

Runtime evidence arrives late

Traces are useful, but they appear after behavior exists. Shipgate runs before promotion.

A static pre-flight check for agent tool surfaces.

Shipgate turns an agent's tool surface into a reviewable release artifact: inventory, schemas, scopes, declared policies, findings, and recommended next actions.

Manifest-first

Use shipgate.yaml to declare agent purpose, tool sources, policies, overrides, and CI behavior.

MCP / OpenAPI analysis

Inspect local MCP exports and OpenAPI specs for broad schemas, write actions, missing bounds, and risky operations.

SDK-assisted

Optionally enrich reports with OpenAI Agents SDK static metadata without importing user code by default.

Tool-use readiness checks

Flag wildcard tools, missing approval policies, broad scopes, free-form action fields, and idempotency gaps.

CLI + GitHub Action

Run locally or add advisory checks to pull requests and release workflows.

Markdown + JSON reports

Produce human-readable release review reports and machine-readable JSON artifacts.

From tool surface to release-review report.

01

Declare

Define the agent, declared purpose, prohibited actions, tool sources, and policy expectations in shipgate.yaml.

02

Scan

Load local MCP exports, OpenAPI specs, and optional SDK static metadata.

03

Normalize

Build a unified tool inventory with schemas, scopes, annotations, source references, and risk hints.
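
A minimal sketch of what a normalized inventory entry could look like, assuming one record per tool regardless of source. The `ToolRecord` class, its field names, and both helper functions are hypothetical and are not Shipgate's actual data model. The only external assumptions are that MCP tool listings expose `name` and `inputSchema`, and that OpenAPI operations expose `operationId`, `security`, and `requestBody`.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRecord:
    """One entry in a unified tool inventory (hypothetical shape)."""
    name: str
    source_ref: str                       # e.g. "mcp_tools.json:#/tools/send_email"
    schema: dict = field(default_factory=dict)
    scopes: tuple = ()
    risk_hints: tuple = ()


def from_mcp_tool(tool: dict, source_file: str) -> ToolRecord:
    # MCP tool entries carry a name and a JSON Schema under "inputSchema".
    return ToolRecord(
        name=tool["name"],
        source_ref=f"{source_file}:#/tools/{tool['name']}",
        schema=tool.get("inputSchema", {}),
    )


def from_openapi_operation(path: str, method: str, op: dict, source_file: str) -> ToolRecord:
    # Fall back to "METHOD path" when the operation has no operationId.
    name = op.get("operationId", f"{method.upper()} {path}")
    body = op.get("requestBody", {}).get("content", {}).get("application/json", {})
    # Each security requirement maps a scheme name to a list of scopes.
    scopes = tuple(
        scope
        for requirement in op.get("security", [])
        for scope_list in requirement.values()
        for scope in scope_list
    )
    is_write = method.lower() in {"post", "put", "patch", "delete"}
    return ToolRecord(
        name=name,
        source_ref=f"{source_file}#{method.upper()} {path}",
        schema=body.get("schema", {}),
        scopes=scopes,
        risk_hints=("write_action",) if is_write else (),
    )
```

Keeping the source reference on every record is what lets a later finding point back to the file and location that introduced the tool.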

04

Check

Run deterministic static checks for wildcard tools, broad schemas, approval gaps, idempotency evidence, scope mismatch, and more.
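
To make "deterministic static checks" concrete, here is a small sketch of two such checks written as pure functions over already-loaded data. The function names, input shapes, and the `WRITE_HINTS` set are assumptions made for illustration. The check IDs and severities mirror the sample report on this page, but the logic is not taken from Shipgate's source.

```python
# Risk hints that mark a tool as able to change external state (assumed labels).
WRITE_HINTS = {"external_write", "financial_action"}


def check_wildcard_sources(tool_names):
    """Flag tool entries that end in a wildcard, e.g. 'wildcard_mcp_tools.*'."""
    return [
        {"severity": "high", "tool": name, "check": "tool-use/wildcard-source"}
        for name in sorted(tool_names)  # sorted so output order is stable
        if name.endswith("*")
    ]


def check_approval_missing(risk_hints_by_tool, tools_with_approval):
    """Flag write-capable tools that have no declared approval policy.

    risk_hints_by_tool: mapping of tool name -> iterable of risk hints
    tools_with_approval: set of tool names covered by an approval policy
    """
    findings = []
    for name in sorted(risk_hints_by_tool):
        write_hints = set(risk_hints_by_tool[name]) & WRITE_HINTS
        if write_hints and name not in tools_with_approval:
            findings.append({
                "severity": "critical",
                "tool": name,
                "check": "tool-use/approval-missing",
                "evidence": sorted(write_hints),
            })
    return findings
```

Because both functions read only their arguments and sort their output, the same inputs always yield the same findings in the same order, which is the property that makes results diffable across runs.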

05

Report

Generate Markdown and JSON reports for local review, CI artifacts, or PR comments.

Inputs: shipgate.yaml · MCP tools export · OpenAPI spec · Agents SDK metadata
Scanner: agents-shipgate (static · local · deterministic)
Outputs: Markdown report · JSON report · GitHub Action summary

Reports release owners can actually review.

Findings include severity, evidence, source reference, confidence, and a recommended next action. They are built for engineering and platform review, not for dashboards.

agents-shipgate-reports / report.md
Agents Shipgate Report
Release blockers
Critical: 2 · High: 13 · Medium: 1 · Human review: recommended
Top findings
#01 Critical openapi/billing.yaml:142
stripe.create_refund lacks a declared approval policy
evidence: financial_action · external_write · POST /refunds
recommend: Add approval policy or remove from this release.
confidence: high · check_id: tool-use/approval-missing
#02 Critical openapi/billing.yaml:142
stripe.create_refund lacks idempotency evidence
evidence: write action · amount/currency/payment_id schema · retry behavior unknown
recommend: Add idempotency key or document retry policy.
confidence: high · check_id: tool-use/approval-missing
#03 High shipgate.yaml:18
wildcard_mcp_tools.* exposes an unreviewable tool surface
evidence: wildcard tool source in shipgate.yaml
recommend: Replace wildcard with explicit allowlist.
confidence: high · check_id: tool-use/wildcard-source
#04 High mcp_tools.json:#/tools/send_email
support.send_email accepts free-form 'body' field
evidence: external_communication · no template binding
recommend: Constrain to template IDs or require human confirmation.
confidence: medium · check_id: tool-use/wildcard-source
#05 Medium openapi/tickets.yaml:88
tickets.update is missing maximum bound on 'fields'
evidence: unbounded object · broad write
recommend: Document or enforce a field allowlist.
confidence: medium · check_id: tool-use/unbounded-write
16 findings · 24 tools · 3 sources generated 2026-04-24T10:22Z
terminal
$ agents-shipgate scan --config shipgate.yaml
loading shipgate.yaml ........................ ok
reading mcp_tools.json (24 tools) ............. ok
reading openapi/billing.yaml .................. ok
reading openapi/tickets.yaml .................. ok
normalizing inventory ......................... 24 tools, 7 surfaces
running 18 static checks ...................... done
Status: Release blockers detected
Critical: 2 High: 13 Medium: 1
Human review: recommended
Top findings
crit stripe.create_refund · approval policy missing
crit stripe.create_refund · idempotency evidence missing
high wildcard_mcp_tools.* · unreviewable tool surface
high support.send_email · free-form body field
... 12 more findings
→ wrote agents-shipgate-reports/report.md
→ wrote agents-shipgate-reports/report.json
→ exit 0 (advisory mode)
report.json
// agents-shipgate-reports/report.json
{
  "schema_version": "1.0",
  "status": "blockers",
  "summary": {
    "critical": 2,
    "high": 13,
    "medium": 1,
    "human_review": "recommended"
  },
  "findings": [
    {
      "id": "approval-missing-001",
      "severity": "critical",
      "tool": "stripe.create_refund",
      "check": "tool-use/approval-missing",
      "evidence": ["financial_action", "external_write"],
      "source": "openapi/billing.yaml:142",
      "confidence": "high",
      "recommendation": "Add approval policy or remove from release."
    },
    // 15 more findings ...
  ]
}
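
Since the JSON report is meant to be machine-readable, a short sketch of consuming it may help. It assumes the real file is plain JSON (without the `//` comments used in the excerpt) and uses only field names shown in that excerpt. The helper names and the trimmed `SAMPLE` string are illustrative.

```python
import json
from collections import Counter

# Trimmed sample that reuses field names from the report.json excerpt.
SAMPLE = """
{
  "schema_version": "1.0",
  "status": "blockers",
  "findings": [
    {"id": "approval-missing-001", "severity": "critical",
     "tool": "stripe.create_refund", "check": "tool-use/approval-missing"}
  ]
}
"""


def count_by_severity(report: dict) -> dict:
    """Count findings per severity level."""
    return dict(Counter(f["severity"] for f in report.get("findings", [])))


def tools_with(report: dict, severity: str) -> list:
    """List the distinct tools that have at least one finding at this severity."""
    return sorted({f["tool"] for f in report.get("findings", []) if f["severity"] == severity})


report = json.loads(SAMPLE)
print(count_by_severity(report))
print(tools_with(report, "critical"))
```

In practice the string would be replaced by the contents of agents-shipgate-reports/report.json.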
GitHub Actions · Job Summary
▸ shipgate advisory · 2.3s
Agents Shipgate · pull/482
head: feature/refunds-mcp-tool · base: main
2 critical findings introduced · 13 high findings (4 new) · 1 medium finding
changed tool surface:
+ stripe.create_refund
+ support.send_email
~ tickets.update (schema widened)
→ report.md, report.json uploaded as artifacts
→ ci_mode: advisory (will not fail check)
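
The "changed tool surface" lines in the summary above amount to a diff between two inventories. A rough sketch of that comparison follows, assuming each inventory is a mapping of tool name to schema. The function name and input shape are hypothetical, and this is not how Shipgate necessarily computes it.

```python
def diff_tool_surface(base: dict, head: dict) -> dict:
    """Compare two {tool_name: schema} mappings from the base and head branches."""
    added = sorted(head.keys() - base.keys())
    removed = sorted(base.keys() - head.keys())
    # A tool counts as changed when it exists on both sides with a different schema.
    changed = sorted(name for name in base.keys() & head.keys() if base[name] != head[name])
    return {"added": added, "removed": removed, "changed": changed}


base = {"tickets.update": {"fields": ["status"]}}
head = {
    "tickets.update": {"fields": ["status", "priority"]},
    "stripe.create_refund": {},
    "support.send_email": {},
}
print(diff_tool_surface(base, head))
```

A plain equality comparison only says that a schema changed; deciding that it was "widened" would need a schema-aware comparison, which this sketch leaves out.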

What it catches


Wildcard tools · Approval gaps · Broad scopes · Free-form action fields · Missing bounds · Idempotency gaps · External communication · Injection-like descriptions

Total findings: 16
Tools inspected: 24
Sources: 3
Suppressed: 0

Tool-use risk is where agent releases become real.

An agent with no tools can still be wrong. An agent with tools can take action. Tool-use readiness focuses on the release risks that appear when tools, permissions, schemas, policies, and side effects enter the system.

Permission boundaries

Know when tools require broad scopes, service accounts, or missing approval policies.

Schema ambiguity

Catch broad free-form fields like updates, command, action, and body before they become model-controlled actions.
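
As an illustration of what a free-form field check might look for, the sketch below inspects the top-level properties of a JSON Schema. The field names come from the sentence above. The rule for what counts as "constrained" and the function name are assumptions, not Shipgate's actual heuristic.

```python
# Field names called out above as commonly free-form.
RISKY_FIELD_NAMES = {"updates", "command", "action", "body"}

# JSON Schema keywords that narrow what a string may contain (assumed sufficient here).
STRING_CONSTRAINTS = ("enum", "const", "pattern", "maxLength")


def free_form_fields(schema: dict) -> list:
    """Return risky top-level property names that look unconstrained."""
    flagged = []
    for name, prop in schema.get("properties", {}).items():
        if name not in RISKY_FIELD_NAMES:
            continue
        is_open_string = (
            prop.get("type") == "string"
            and not any(key in prop for key in STRING_CONSTRAINTS)
        )
        # An object with no declared properties that still allows extras accepts anything.
        is_open_object = (
            prop.get("type") == "object"
            and not prop.get("properties")
            and prop.get("additionalProperties", True) is not False
        )
        if is_open_string or is_open_object:
            flagged.append(name)
    return sorted(flagged)
```

For a schema with an unconstrained string `body` and an `action` restricted by `enum`, only `body` would be returned.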

Blast radius

Flag missing maximum bounds, confirmation flows, and idempotency evidence for write actions.
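
A missing-bounds check can be sketched against standard JSON Schema keywords: `maximum` and `exclusiveMaximum` for numbers, and `maxItems` for arrays. The function name and the decision to look only at top-level properties are assumptions made for brevity.

```python
def missing_bounds(schema: dict) -> list:
    """Return top-level numeric or array properties that declare no upper bound."""
    missing = []
    for name, prop in schema.get("properties", {}).items():
        kind = prop.get("type")
        if kind in ("integer", "number"):
            if "maximum" not in prop and "exclusiveMaximum" not in prop:
                missing.append(name)
        elif kind == "array":
            if "maxItems" not in prop:
                missing.append(name)
    return sorted(missing)


refund_schema = {
    "properties": {
        "amount": {"type": "integer", "minimum": 1},      # no upper bound
        "quantity": {"type": "integer", "maximum": 10},   # bounded
        "recipients": {"type": "array"},                  # no maxItems
        "note": {"type": "string"},                       # not checked here
    }
}
print(missing_bounds(refund_schema))
```

On a write action such as a refund, an unbounded `amount` is the kind of field where a declared maximum directly limits blast radius.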

Reviewability

Turn a tool surface into a report that release owners, platform teams, and security reviewers can discuss.

Release confidence

Run the same check locally, in PRs, or before promotion to production-like environments.

Run it locally. Add it to PRs. Keep it advisory by default.

CI is advisory by default, so findings never fail the check. When your team is ready, strict mode fails the check only on unsuppressed critical findings.
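
The advisory and strict behavior described above can be expressed as a small decision function. This is a sketch only: the `suppressed` field, the `"strict"` mode string, and the exit code `1` are assumptions, while `advisory` and exit `0` match the sample output on this page.

```python
def exit_code(findings, ci_mode="advisory"):
    """Return the process exit code for a list of finding dicts."""
    if ci_mode != "strict":
        # Advisory mode reports findings but never fails the check.
        return 0
    blocking = [
        f for f in findings
        if f["severity"] == "critical" and not f.get("suppressed", False)
    ]
    return 1 if blocking else 0
```

Under these assumptions, high and medium findings never change the exit code, and a critical finding stops blocking once it carries a suppression.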

Local CLI
bash
$ python -m pip install "git+https://github.com/ThreeMoonsLab/agents-shipgate@v0.1.0"
$ agents-shipgate init --workspace . --write
$ agents-shipgate scan --config shipgate.yaml
.github/workflows/shipgate.yml
yaml
name: Agents Shipgate
on:
  pull_request:
permissions:
  contents: read
jobs:
  shipgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@<pinned-sha>
      - uses: ThreeMoonsLab/agents-shipgate@v0.1.0
        with:
          config: shipgate.yaml
          ci_mode: advisory
          output_dir: agents-shipgate-reports
First run · 30 seconds

Run the fixture

Clone the repo and run Shipgate against the bundled example agent. You'll see real findings on a real tool surface, with no setup and no credentials.

bash
$ git clone https://github.com/ThreeMoonsLab/agents-shipgate
$ cd agents-shipgate
$ agents-shipgate scan --config examples/refunds-agent/shipgate.yaml
→ 16 findings on the example agent
Design partners

Run Shipgate on your tool surface

If your team ships an agent with MCP or OpenAPI tools, we'd like to run Shipgate against your repo and walk you through the findings. False positives, missing checks, and rough edges become roadmap.

  • Agents with tool calls in production or staging
  • MCP servers, OpenAPI specs, or Agents SDK projects
  • 30-minute walkthrough of report and findings

Built for repositories that need to be careful.

Agents Shipgate is static by default and open-source. It does not execute agent code, run tools, call LLMs, connect to MCP servers, make network calls, or collect telemetry.

Designed to be safe to run on internal repositories before you connect any hosted service. Open-source core. Transparent checks. Suppressions require reasons.

Inspect the checks. Run the fixture. Open an issue with a false positive.

Default scanner guarantees
 Static by default
  • No agent execution
  • No tool calls
  • No LLM calls
  • No MCP server connections
  • No network calls
  • No telemetry
  • No user-code import by default
  • Apache-2.0 open source

Not an eval tool. Not observability. Not a gateway.

Evals test behavior. Observability records runtime. Gateways enforce access. Shipgate reviews the tool surface before release.

Not evals: They test behavior. Shipgate reviews release artifacts.
Not observability: It records runtime. Shipgate runs before promotion.
Not a gateway: Gateways enforce access. Shipgate produces review evidence.
Shipgate: A release gate for agent tool surfaces. Static checks. Findings with evidence.
Side-by-side
Category | What it answers | What Shipgate answers
Evals | Did the model behave as expected? | What tool surface are we releasing?
Observability | What happened at runtime? | What should be reviewed before promotion?
MCP gateway | Can a tool call be allowed at runtime? | Does this tool surface need release review?
Security scanner | Are there known code or dependency risks? | Are agent tools, schemas, scopes, and policies reviewable?
Governance platform | How do we manage org-level policy? | What static release findings exist in this repo today?

From release readiness to agent healthcare.

Agents Shipgate starts with static release checks for tool-using agents. The longer-term vision is infrastructure for agent lifecycle health: release history, policy drift, trace-based evidence, approval workflows, re-review triggers, and governance signals across teams and agents.

01 · you are here

Tool-surface release checks

CLI + GitHub Action.

02

Release evidence

Reports, history, baselines, exceptions.

03

Runtime behavior evidence

Traces, tool routing, handoffs, guardrails.

04

Agent health infrastructure

Lifecycle health, policy drift, re-authorization, governance.

We are not building a closed governance suite. The lab works in the open and partners with teams shipping real agents.

Interested in design partnership
Get started

Try the pre-flight check on your agent tool surface.

Quickstart · 30 seconds
$ pipx install agents-shipgate
$ agents-shipgate init --write
$ agents-shipgate scan