Adding a release gate to an OpenAI Agents SDK project

If you ship an agent built on the OpenAI Agents SDK, the tool surface is whatever your @function_tool-decorated functions declare, usually a handful of Python functions in one or more files. That surface is a release artifact in the same way a service’s OpenAPI spec is, and it needs a release gate.

This post walks through wiring agents-shipgate into an Agents SDK project end-to-end.

What gets scanned

agents-shipgate parses your @function_tool-decorated Python statically — the file is never imported, and the model is never invoked. The AST extractor reads:

The decorator kwargs (name_override, description_override, failure_error_function, etc.)
The function signature, including type annotations
The first-arg RunContextWrapper-style context parameter (skipped as not part of the model-facing schema)
Docstrings (used as fallback descriptions when no description_override is set)

What comes out is a normalized tool inventory the rest of the scanner treats identically to MCP, OpenAPI, or Anthropic Messages API tools.
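
To make the static extraction concrete, here is a minimal sketch of the idea using Python's standard ast module. It is an illustration under stated assumptions, not agents-shipgate's actual extractor: the SOURCE snippet, the extract_tools helper, and the shape of the returned dictionaries are all hypothetical. The tool source is handled purely as text, so nothing is imported or executed.

```python
import ast

# Hypothetical tool source. It is parsed as text only, never imported or run.
SOURCE = '''
from agents import RunContextWrapper, function_tool

@function_tool(name_override="faq_lookup_tool",
               description_override="Look up FAQ answers.")
async def faq_lookup(ctx: RunContextWrapper, question: str) -> str:
    """Fallback description taken from the docstring."""
    return "..."
'''


def decorator_name(dec: ast.expr) -> str:
    """Return the bare decorator name, with or without call parentheses."""
    target = dec.func if isinstance(dec, ast.Call) else dec
    if isinstance(target, ast.Attribute):
        return target.attr
    if isinstance(target, ast.Name):
        return target.id
    return ""


def extract_tools(source: str) -> list[dict]:
    """Collect a simple inventory entry per @function_tool-decorated function."""
    tools = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            if decorator_name(dec) != "function_tool":
                continue
            # Keep only decorator kwargs whose values are plain literals.
            kwargs = {}
            if isinstance(dec, ast.Call):
                for kw in dec.keywords:
                    if kw.arg and isinstance(kw.value, ast.Constant):
                        kwargs[kw.arg] = kw.value.value
            # Map each positional parameter to its annotation source text.
            params = {
                a.arg: ast.unparse(a.annotation) if a.annotation else None
                for a in node.args.args
            }
            # Skip a leading context parameter: it is not model-facing.
            if node.args.args:
                first = node.args.args[0]
                if "RunContextWrapper" in (params[first.arg] or ""):
                    del params[first.arg]
            tools.append({
                "name": kwargs.get("name_override", node.name),
                "description": kwargs.get("description_override")
                or ast.get_docstring(node),
                "parameters": params,
                "returns": ast.unparse(node.returns) if node.returns else None,
            })
    return tools


tools = extract_tools(SOURCE)
```

In this sketch the decorator's description_override wins over the docstring, and the docstring is used only when no override is present, mirroring the fallback order described above.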

1. Install

pipx install agents-shipgate
agents-shipgate init --workspace . --write

init --write produces a shipgate.yaml skeleton with CHANGE_ME placeholders.

2. Point the manifest at your agent code

version: "0.1"
project:
  name: airline-customer-service
agent:
  name: airline-cs-agent
  sdk:
    type: openai_agents
    language: python
    entrypoint: examples/customer_service/main.py
  declared_purpose:
    - assist airline customers with seat changes and FAQs
environment:
  target: production
tool_sources:
  - id: customer_service_sdk
    type: openai_agents_sdk
    path: examples/customer_service/main.py

Two things matter here:

agent.sdk.entrypoint tells the scanner which file to start AST-parsing from. It can be a single file or a package directory.
tool_sources[].path is the same file or any subpath you want scanned. Multiple tool_sources entries are allowed.
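
Before the first scan it can help to confirm that no CHANGE_ME placeholder survived and that the configured paths exist. The following is a rough pre-flight sketch, not part of agents-shipgate: the preflight helper is hypothetical, and it reads the manifest line by line instead of using a YAML parser, so it only understands simple entrypoint: and path: lines like the ones in the manifest above.

```python
import re
from pathlib import Path

# Matches simple "entrypoint: value" or "path: value" lines (assumption:
# one key per line, no multi-line or nested YAML values).
PATH_KEYS = re.compile(r"^\s*(?:-\s*)?(entrypoint|path):\s*(\S+)\s*$")


def preflight(manifest_text: str, root: Path) -> list[str]:
    """Return human-readable problems found in the manifest text."""
    problems = []
    for lineno, line in enumerate(manifest_text.splitlines(), start=1):
        if "CHANGE_ME" in line:
            problems.append(f"line {lineno}: unresolved CHANGE_ME placeholder")
            continue
        match = PATH_KEYS.match(line)
        if match:
            key = match.group(1)
            value = match.group(2).strip("\"'")
            if not (root / value).exists():
                problems.append(f"line {lineno}: {key} {value!r} not found")
    return problems
```

A call such as preflight(Path("shipgate.yaml").read_text(), Path(".")) returns an empty list when the manifest looks ready to scan.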

3. First scan

agents-shipgate scan -c shipgate.yaml

Sample output for the canonical Agents SDK customer-service example:

Status: warnings_detected
Critical: 0 · High: 2 · Medium: 2

Top findings:
1. SHIP-AUTH-MISSING-SCOPE
   update_seat lacks declared auth scopes (write tool)
2. SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE
   production target with medium-confidence SDK extraction
3. SHIP-SCHEMA-FREEFORM-OUTPUT
   faq_lookup_tool returns str (no schema)

The scanner detected update_seat as a write tool from its name alone — no HTTP method, no docstring keyword, just AST + the keyword classifier. The decorator’s description_override was correctly preferred over the docstring. The RunContextWrapper first-arg was correctly skipped.
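
Name-based classification of this kind can be sketched in a few lines. This is an illustrative approximation, not the scanner's real classifier: the WRITE_VERBS list and the classify_tool function are assumptions made up for the example.

```python
import re

# Hypothetical verb list; the real classifier's keywords are not shown here.
WRITE_VERBS = {"update", "create", "delete", "cancel", "set", "send", "refund"}


def classify_tool(name: str) -> str:
    """Label a tool "write" if any token of its name is a known write verb."""
    # Split camelCase into snake_case first, then tokenize on non-letters.
    snake = re.sub(r"([a-z])([A-Z])", r"\1_\2", name).lower()
    tokens = set(re.findall(r"[a-z]+", snake))
    return "write" if tokens & WRITE_VERBS else "read"


labels = {n: classify_tool(n) for n in ("update_seat", "faq_lookup_tool")}
```

With this verb list, update_seat is labeled write and faq_lookup_tool is labeled read. A purely lexical rule like this is cheap and needs no runtime information, at the cost of missing write tools with neutral names.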

The SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE finding is the production safety net. Static SDK extraction is medium-confidence by design, because tools hidden behind decorators, dynamic factories, or runtime construction can escape an AST scan. When an SDK-extracted surface targets production, the scanner therefore nudges you to declare tools through MCP or OpenAPI for a higher-confidence inventory.
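
A small example shows why static extraction cannot be fully trusted. The sketch below uses a deliberately naive decorator-only scan, which is a simplification for illustration and not a claim about what agents-shipgate itself misses. The SOURCE snippet and the decorated_tool_names helper are hypothetical.

```python
import ast

# Hypothetical module with one decorated tool and one tool built by a call.
SOURCE = '''
from agents import function_tool

@function_tool
def faq_lookup_tool(question: str) -> str:
    return "..."

def _cancel_booking(confirmation: str) -> str:
    return "cancelled"

# Built by calling function_tool directly: there is no decorator to find.
cancel_booking = function_tool(_cancel_booking, name_override="cancel_booking")
'''


def decorated_tool_names(source: str) -> list[str]:
    """Return names of functions that carry a @function_tool decorator."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Name) and target.id == "function_tool":
                    names.append(node.name)
    return names


found = decorated_tool_names(SOURCE)
```

The scan finds faq_lookup_tool but not cancel_booking, even though both would be exposed to the model at runtime. That gap is the reason an SDK-extracted inventory is treated as medium-confidence.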

4. Add it to GitHub Actions

.github/workflows/shipgate.yml:

name: Agents Shipgate

on:
  pull_request:

permissions:
  contents: read
  pull-requests: write

jobs:
  agents-shipgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ThreeMoonsLab/[email protected]
        with:
          config: shipgate.yaml
          ci_mode: advisory
          pr_comment: "true"

Advisory mode posts the finding list as a PR comment without failing CI. Switch to ci_mode: strict with a baseline once your team has triaged existing findings.

5. Move to strict

agents-shipgate baseline save -c shipgate.yaml --out .agents-shipgate/baseline.json

Commit the baseline, then in the workflow:

      - uses: ThreeMoonsLab/[email protected]
        with:
          config: shipgate.yaml
          ci_mode: strict
          fail_on: critical,high
          baseline: .agents-shipgate/baseline.json

With a baseline, strict mode fails CI only on new findings at the fail_on severities (here critical and high). Pre-existing findings recorded in the baseline don’t break unrelated PRs.
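
The baseline logic amounts to a set difference followed by a severity filter. The sketch below illustrates that idea only. The finding fields (rule, tool, severity) and the new_blocking_findings function are assumptions, and the real baseline.json format may differ.

```python
def new_blocking_findings(current, baseline, fail_on=("critical", "high")):
    """Return findings absent from the baseline whose severity should fail CI."""
    # Identify a finding by rule and tool (an assumed identity key).
    known = {(f["rule"], f["tool"]) for f in baseline}
    return [
        f for f in current
        if (f["rule"], f["tool"]) not in known and f["severity"] in fail_on
    ]


baseline = [
    {"rule": "SHIP-AUTH-MISSING-SCOPE", "tool": "update_seat", "severity": "high"},
]
current = baseline + [
    {"rule": "SHIP-SCHEMA-FREEFORM-OUTPUT", "tool": "faq_lookup_tool", "severity": "medium"},
    {"rule": "SHIP-AUTH-MISSING-SCOPE", "tool": "cancel_booking", "severity": "high"},
]
blocking = new_blocking_findings(current, baseline)
```

Here only the cancel_booking finding blocks the build: the update_seat finding is already in the baseline, and the medium-severity finding falls below the fail_on threshold.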

What this catches that evals don’t

update_seat becomes a write tool without an approval policy → flagged.
A new @function_tool is added in a PR but the eval suite doesn’t exercise it → still flagged on the manifest diff.
The agent’s prompt says “advise only” but the surface includes cancel_booking → contradiction surfaced.
The SDK code calls into a generated OpenAPI spec for an internal API that adds a delete_user operation → flagged on the spec diff.

Each of these is the kind of release risk that ships when nothing static sits between PR and production.
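
The case of a new tool that no eval exercises comes down to comparing the tool surface on the base branch with the surface on the PR branch. As a hedged sketch, with a hypothetical surface_diff helper and tool names represented as plain strings:

```python
def surface_diff(base_tools: set[str], head_tools: set[str]) -> dict[str, list[str]]:
    """Compare two tool inventories by name and report what changed."""
    return {
        "added": sorted(head_tools - base_tools),
        "removed": sorted(base_tools - head_tools),
    }


diff = surface_diff(
    {"faq_lookup_tool", "update_seat"},
    {"faq_lookup_tool", "update_seat", "cancel_booking"},
)
```

The added list contains cancel_booking whether or not any test or eval calls it, which is the property that makes a static diff complementary to evals.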

For the broader argument that this is the right slot for static checks, see “Your AI agent has a tool surface. It needs a release gate.”