
Agents Shipgate vs promptfoo

promptfoo answers whether a model produced the right output on a test case. Agents Shipgate answers whether the released tool surface is reviewable. They are different gates in the same CI pipeline.

Use promptfoo for prompt and behavior testing

promptfoo is an open-source CLI for evaluating prompts and model completions. Its strengths are regression-testing prompts across providers, running output assertions (regex, JavaScript, model-graded) on test cases, and A/B comparing prompt variants. It runs your model and grades the outputs against expectations.
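To make "grades the outputs against expectations" concrete, here is a minimal sketch of the idea in Python. It is not promptfoo's configuration format or API: the names `matches_regex`, `max_words`, and `grade` are hypothetical, and the model completion is a hard-coded stand-in.

```python
import re
from typing import Callable, Dict

# Hypothetical assertion type: takes a model output, returns pass/fail.
# This illustrates the concept only; it is not how promptfoo is configured.
Assertion = Callable[[str], bool]


def matches_regex(pattern: str) -> Assertion:
    """Pass when the pattern is found anywhere in the output."""
    compiled = re.compile(pattern)
    return lambda output: compiled.search(output) is not None


def max_words(limit: int) -> Assertion:
    """Pass when the output has at most `limit` whitespace-separated words."""
    return lambda output: len(output.split()) <= limit


def grade(output: str, assertions: Dict[str, Assertion]) -> Dict[str, bool]:
    """Run every named assertion against a single output."""
    return {name: check(output) for name, check in assertions.items()}


# Stand-in for a real model completion (assumed for illustration).
sample_output = "Your refund of $42.00 was issued on 2024-05-01."

results = grade(
    sample_output,
    {
        "mentions_amount": matches_regex(r"\$\d+\.\d{2}"),
        "is_concise": max_words(20),
    },
)
print(results)
```

Checks like these need an output to look at, so the model has to run before anything can be graded.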

Use Agents Shipgate for static release-readiness review

Agents Shipgate reads the static release artifact — manifest, tool schemas, scopes, approval policies, idempotency evidence — and produces a deterministic Tool-Use Readiness Report. It does not invoke the model, does not connect to MCP servers, and does not import your agent's code.

| Question | promptfoo | Agents Shipgate |
| --- | --- | --- |
| Did the model produce the expected output? | Yes | No |
| What tools is this PR releasing? | No | Yes |
| Are write-side tools missing approval policies? | No | Yes |
| Are retry-unsafe tools missing idempotency evidence? | No | Yes |
| Runs the model? | Yes | No |
| Should this fail CI before promotion? | If output regresses | If tool surface gap |

The pattern teams converge on

Behavior layer (promptfoo) plus release layer (Agents Shipgate). Both run in CI; neither replaces the other. The release layer catches things that are not visible from outputs alone — for example, a tool quietly gaining a wildcard scope without anyone reviewing it, or a destructive action shipping without a declared approval policy.
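The kind of gap a release layer looks for can be sketched as a static check over declared tool metadata. The example below is illustrative only: the `Tool` fields and the `review` function are hypothetical and do not reflect Agents Shipgate's actual manifest schema or report format. It shows that such checks need neither a model run nor any outputs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tool:
    # Hypothetical manifest fields, assumed for illustration.
    name: str
    scopes: List[str]
    writes: bool = False                      # tool mutates state
    retry_safe: bool = True                   # safe to call twice
    approval_policy: Optional[str] = None
    idempotency_evidence: Optional[str] = None


def review(tools: List[Tool]) -> List[str]:
    """Return findings in a stable order so the same input gives the same report."""
    findings: List[str] = []
    for tool in sorted(tools, key=lambda t: t.name):
        if any("*" in scope for scope in tool.scopes):
            findings.append(f"{tool.name}: wildcard scope")
        if tool.writes and not tool.approval_policy:
            findings.append(f"{tool.name}: write-side tool has no approval policy")
        if not tool.retry_safe and not tool.idempotency_evidence:
            findings.append(f"{tool.name}: retry-unsafe tool has no idempotency evidence")
    return findings


# Made-up tool surface for a PR under review.
tools = [
    Tool("get_balance", scopes=["accounts:read"]),
    Tool("delete_account", scopes=["accounts:*"], writes=True, retry_safe=False),
]

findings = review(tools)
for finding in findings:
    print(finding)

# A CI step could fail the build when any finding is present.
exit_code = 1 if findings else 0
```

Because the check reads only declared metadata, it can run before promotion even when no eval suite exists yet.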

When you only have time for one

If your agent already has output evals, add Agents Shipgate to catch the manifest-level gaps your evals do not see. If your agent has no evals yet, still start with Agents Shipgate: a release gate catches an unsafe tool before it ships to production, whereas an eval can only run once the agent and its tools are already executing.
