Adding a release gate to an OpenAI Agents SDK project
If your agent is built with @function_tool decorators, agents-shipgate reads the source statically (no import) and produces release-readiness findings on every PR.
If you ship an agent built on the OpenAI Agents SDK,
the tool surface is whatever your @function_tool-decorated functions
declare — usually a handful of Python functions in one or more files.
That surface is a release artifact in the same way an OpenAPI spec on a
service is. It needs a release gate.
This post walks through wiring agents-shipgate into an Agents SDK project end-to-end.
What gets scanned
agents-shipgate parses your @function_tool-decorated Python statically
— the file is never imported, and the model is never invoked. The
AST extractor reads:
- The decorator kwargs (name_override, description_override, failure_error_function, etc.)
- The function signature, including type annotations
- The first-arg RunContextWrapper-style context parameter (skipped as not part of the model-facing schema)
- Docstrings (used as fallback descriptions when no description_override is set)
What comes out is a normalized tool inventory the rest of the scanner treats identically to MCP, OpenAPI, or Anthropic Messages API tools.
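To make the static-extraction idea concrete, here is a minimal sketch built on Python's standard ast module. It is not agents-shipgate's actual extractor. The helper names (extract_tools, _decorator_kwargs), the shape of the returned dicts, and the rule of dropping a first parameter whose annotation starts with RunContextWrapper are all assumptions made for illustration.

```python
import ast

# Example input: parsed as text only, so the `agents` package is never imported.
SOURCE = '''
from agents import function_tool, RunContextWrapper

@function_tool(name_override="faq_lookup_tool", description_override="Look up FAQs.")
async def faq_lookup(question: str) -> str:
    """Docstring fallback."""
    return "..."

@function_tool
async def update_seat(ctx: RunContextWrapper[dict], confirmation_number: str, new_seat: str) -> str:
    """Update the seat for a given confirmation number."""
    return "ok"

def helper(x: int) -> int:
    return x
'''


def _decorator_kwargs(dec):
    """Return literal kwargs for @function_tool / @function_tool(...), else None."""
    target = dec.func if isinstance(dec, ast.Call) else dec
    name = target.attr if isinstance(target, ast.Attribute) else getattr(target, "id", None)
    if name != "function_tool":
        return None
    kwargs = {}
    if isinstance(dec, ast.Call):
        for kw in dec.keywords:
            # Only constant values can be read without executing code.
            if kw.arg and isinstance(kw.value, ast.Constant):
                kwargs[kw.arg] = kw.value.value
    return kwargs


def extract_tools(source):
    """Build a tool inventory from source text (hypothetical output shape)."""
    tools = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            kwargs = _decorator_kwargs(dec)
            if kwargs is None:
                continue
            params = []
            for i, arg in enumerate(node.args.args):
                annotation = ast.unparse(arg.annotation) if arg.annotation else None
                # Assumed rule: a leading RunContextWrapper parameter is not model-facing.
                if i == 0 and annotation and annotation.startswith("RunContextWrapper"):
                    continue
                params.append((arg.arg, annotation))
            tools.append({
                "name": kwargs.get("name_override", node.name),
                "description": kwargs.get("description_override") or ast.get_docstring(node),
                "params": params,
            })
    return tools


TOOLS = extract_tools(SOURCE)
```

In this sketch the undecorated helper function is ignored, faq_lookup is reported under its name_override, and update_seat keeps only its two model-facing parameters. Tools built by factories or assembled at runtime would not be seen by an approach like this.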
1. Install
pipx install agents-shipgate
agents-shipgate init --workspace . --write
init --write produces a shipgate.yaml skeleton with CHANGE_ME
placeholders.
2. Point the manifest at your agent code
version: "0.1"
project:
  name: airline-customer-service
agent:
  name: airline-cs-agent
  sdk:
    type: openai_agents
    language: python
    entrypoint: examples/customer_service/main.py
  declared_purpose:
    - assist airline customers with seat changes and FAQs
environment:
  target: production
tool_sources:
  - id: customer_service_sdk
    type: openai_agents_sdk
    path: examples/customer_service/main.py
Two things matter here:
- agent.sdk.entrypoint tells the scanner which file to start AST-parsing from. It can be a single file or a package directory.
- tool_sources[].path is the same file or any subpath you want scanned. Multiple tool_sources entries are allowed.
3. First scan
agents-shipgate scan -c shipgate.yaml
Sample output for the canonical Agents SDK customer-service example:
Status: warnings_detected
Critical: 0 · High: 2 · Medium: 2
Top findings:
1. SHIP-AUTH-MISSING-SCOPE
update_seat lacks declared auth scopes (write tool)
2. SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE
production target with medium-confidence SDK extraction
3. SHIP-SCHEMA-FREEFORM-OUTPUT
faq_lookup_tool returns str (no schema)
The scanner detected update_seat as a write tool from its name
alone — no HTTP method, no docstring keyword, just AST + the keyword
classifier. The decorator’s description_override was correctly
preferred over the docstring. The RunContextWrapper first-arg was
correctly skipped.
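A name-based classifier of the kind described above can be sketched in a few lines. The verb lists and the classify_tool function below are hypothetical; the scanner's real vocabulary and rules are not shown in this post.

```python
import re

# Hypothetical vocabularies, chosen only to illustrate the approach.
WRITE_VERBS = {"update", "create", "delete", "cancel", "set", "remove", "write", "send", "book"}
READ_VERBS = {"get", "list", "lookup", "search", "fetch", "read", "find"}


def classify_tool(name):
    """Classify a tool as 'write', 'read', or 'unknown' from its name alone."""
    # Split on underscores and any other non-word characters, then match whole tokens.
    tokens = [t for t in re.split(r"[_\W]+", name.lower()) if t]
    if any(t in WRITE_VERBS for t in tokens):
        return "write"
    if any(t in READ_VERBS for t in tokens):
        return "read"
    return "unknown"
```

Matching whole tokens rather than substrings matters here: update_seat is classified as a write tool because of "update", not because "seat" happens to contain the letters of "set".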
The SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE finding is the
production safety net: SDK static extraction is medium-confidence by
design, because custom decorators, dynamic factories, and runtime-built
tools can hide parts of the surface from a static pass. Promoting an
SDK-extracted surface to production therefore gets a nudge to
declare tools through MCP or OpenAPI for a higher-confidence inventory.
4. Add it to GitHub Actions
.github/workflows/shipgate.yml:
name: Agents Shipgate
on:
  pull_request:
permissions:
  contents: read
  pull-requests: write
jobs:
  agents-shipgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ThreeMoonsLab/[email protected]
        with:
          config: shipgate.yaml
          ci_mode: advisory
          pr_comment: "true"
Advisory mode posts the finding list as a PR comment without failing
CI. Switch to ci_mode: strict with a baseline once your team has
triaged existing findings.
5. Move to strict
agents-shipgate baseline save -c shipgate.yaml --out .agents-shipgate/baseline.json
Commit the baseline, then in the workflow:
- uses: ThreeMoonsLab/[email protected]
  with:
    config: shipgate.yaml
    ci_mode: strict
    fail_on: critical,high
    baseline: .agents-shipgate/baseline.json
Strict mode fails CI only on new findings — pre-existing ones in the baseline don’t break unrelated PRs.
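The "only new findings fail" behavior amounts to a set difference against the baseline, filtered by severity. The sketch below assumes a finding is a dict with rule, tool, and severity keys and is identified by its (rule, tool) pair. That shape, the function names, and the severities in the sample data are invented for illustration and are not the real baseline.json format.

```python
def fingerprint(finding):
    """Identify a finding by rule and tool (assumed identity; real format may differ)."""
    return (finding["rule"], finding["tool"])


def new_blocking_findings(current, baseline, fail_on):
    """Return findings absent from the baseline whose severity is in fail_on."""
    known = {fingerprint(f) for f in baseline}
    return [
        f for f in current
        if fingerprint(f) not in known and f["severity"] in fail_on
    ]


# Sample data; severities here are made up for the example.
BASELINE = [
    {"rule": "SHIP-AUTH-MISSING-SCOPE", "tool": "update_seat", "severity": "high"},
]
CURRENT = BASELINE + [
    {"rule": "SHIP-AUTH-MISSING-SCOPE", "tool": "cancel_booking", "severity": "high"},
    {"rule": "SHIP-SCHEMA-FREEFORM-OUTPUT", "tool": "faq_lookup_tool", "severity": "medium"},
]

BLOCKING = new_blocking_findings(CURRENT, BASELINE, fail_on={"critical", "high"})
EXIT_CODE = 1 if BLOCKING else 0
```

Here the baselined update_seat finding is ignored, the medium finding falls below the fail_on threshold, and only the new cancel_booking finding would fail the build.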
What this catches that evals don’t
- update_seat becomes a write tool without an approval policy → flagged.
- A new @function_tool is added in a PR but the eval suite doesn’t exercise it → still flagged on the manifest diff.
- The agent’s prompt says “advise only” but the surface includes cancel_booking → contradiction surfaced.
- The SDK code calls into a generated OpenAPI spec for an internal API that adds a delete_user operation → flagged on the spec diff.
Each of these is the kind of release risk that ships when nothing static sits between PR and production.
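The diff-based cases above come down to comparing the tool inventory on the base branch with the one on the PR branch. As a rough sketch, assuming each inventory is a plain mapping from tool name to a signature string (a hypothetical representation, not the scanner's internal model):

```python
def diff_surface(base, head):
    """Compare two {tool_name: signature} inventories and report what moved."""
    added = sorted(set(head) - set(base))
    removed = sorted(set(base) - set(head))
    changed = sorted(n for n in set(base) & set(head) if base[n] != head[n])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical inventories for the base branch and the PR branch.
BASE = {
    "faq_lookup_tool": "(question: str) -> str",
    "update_seat": "(confirmation_number: str, new_seat: str) -> str",
}
HEAD = {**BASE, "cancel_booking": "(confirmation_number: str) -> str"}

DIFF = diff_surface(BASE, HEAD)
```

A tool that shows up under "added" is visible to this kind of check whether or not any eval exercises it, which is the gap the list above describes.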
For the broader argument that this is the right slot for static checks, see “Your AI agent has a tool surface. It needs a release gate.”