Agents Shipgate vs promptfoo
promptfoo answers whether a model produced the right output on a test case. Agents Shipgate answers whether the released tool surface is reviewable. They are different gates in the same CI pipeline.
Use promptfoo for prompt and behavior testing
promptfoo is an open-source CLI for evaluating prompts and model completions. Its strengths are regression-testing prompts across providers, running output assertions (regex, JavaScript, model-graded) on test cases, and A/B-comparing prompt variants. It runs your model and grades the outputs against expectations.
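To make the contrast with the release layer concrete, here is a minimal Python sketch of the behavior-testing loop described above: call a model on a test case, then check the completion against assertions. `fake_model`, `run_case`, and the test-case dictionary shape are hypothetical stand-ins for illustration; promptfoo itself configures providers and assertions declaratively, and this is not its API. Model-graded assertions are left out for brevity.

```python
import re
from typing import Callable, Dict, List

# Hypothetical test-case shape: a prompt plus assertions on the completion.
# Illustrates the idea of output assertions; not promptfoo's config format.
TestCase = Dict[str, object]

def fake_model(prompt: str) -> str:
    """Stand-in for a real provider call; a real behavior test invokes the model here."""
    if "refund" in prompt:
        return "I can open a refund request for order 1234."
    return "I'm not sure."

def run_case(model: Callable[[str], str], case: TestCase) -> List[str]:
    """Run the model on one case and return descriptions of failed assertions."""
    output = model(case["prompt"])
    failures = []
    for pattern in case.get("regex", []):
        if re.search(pattern, output) is None:
            failures.append(f"regex {pattern!r} not found")
    for needle in case.get("contains", []):
        if needle not in output:
            failures.append(f"missing {needle!r}")
    return failures

cases = [
    {"prompt": "Customer asks for a refund on order 1234",
     "regex": [r"order \d+"], "contains": ["refund"]},
    {"prompt": "Customer asks about shipping", "contains": ["ships in"]},
]

# Map each prompt to its list of failures (empty list means the case passed).
results = {c["prompt"]: run_case(fake_model, c) for c in cases}
```

Everything this loop can judge comes from the model's output, which is exactly the scope promptfoo covers.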
Use Agents Shipgate for static release-readiness review
Agents Shipgate reads the static release artifact — manifest, tool schemas, scopes, approval policies, idempotency evidence — and produces a deterministic Tool-Use Readiness Report. It does not invoke the model, does not connect to MCP servers, and does not import your agent's code.
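As an illustration of what "static" and "deterministic" mean here, the sketch below runs two of the kinds of checks this page describes (write-side tools without an approval policy, retry-unsafe tools without idempotency evidence) over plain data. The `Tool` record, its field names, and `readiness_findings` are hypothetical; they are not Agents Shipgate's manifest schema or rule set. Nothing in it calls a model, opens a connection, or imports agent code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical manifest entry, for illustration only.
@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool                                # does the tool change external state?
    retry_safe: bool                            # is a blind retry harmless?
    approval_policy: Optional[str] = None       # declared human/policy approval, if any
    idempotency_evidence: Optional[str] = None  # e.g. a documented dedupe key

def readiness_findings(tools: List[Tool]) -> List[str]:
    """Return findings computed only from manifest data, never from a model."""
    findings = []
    for t in tools:
        if t.writes and t.approval_policy is None:
            findings.append(f"{t.name}: write-side tool has no approval policy")
        if not t.retry_safe and t.idempotency_evidence is None:
            findings.append(f"{t.name}: retry-unsafe tool has no idempotency evidence")
    return sorted(findings)  # sorting makes the report independent of manifest order

manifest = [
    Tool("search_orders", writes=False, retry_safe=True),
    Tool("issue_refund", writes=True, retry_safe=False),
    Tool("send_email", writes=True, retry_safe=False,
         approval_policy="human-review", idempotency_evidence="dedupe key on message_id"),
]
report = readiness_findings(manifest)
```

Because the report is a pure function of the manifest, the same release artifact always yields the same findings, which is what makes this kind of check usable as a CI gate.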
| Question | promptfoo | Agents Shipgate |
|---|---|---|
| Did the model produce the expected output? | Yes | No |
| What tools is this PR releasing? | No | Yes |
| Are write-side tools missing approval policies? | No | Yes |
| Are retry-unsafe tools missing idempotency evidence? | No | Yes |
| Runs the model? | Yes | No |
| Should this fail CI before promotion? | If outputs regress | If the tool surface has a review gap |
The pattern teams converge on
Behavior layer (promptfoo) plus release layer (Agents Shipgate). Both run in CI; neither replaces the other. The release layer catches things that are not visible from outputs alone — for example, a tool quietly gaining a wildcard scope without anyone reviewing it, or a destructive action shipping without a declared approval policy.
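The wildcard-scope example above is a change between releases rather than a property of any single output, which is why a behavior test cannot see it. The minimal sketch below compares a previous and a current manifest, then combines the result with a behavior-test verdict into one CI exit code. The dict-based manifest shape, the `is_wildcard` convention, and the `release_gaps` and `ci_exit_code` helpers are hypothetical illustrations, not Agents Shipgate's real schema or CLI.

```python
from typing import Dict, List

# Hypothetical manifest shape: tool name -> {"scopes": [...], "destructive": bool,
# "approval_policy": str}. Illustrative only.
Manifest = Dict[str, dict]

def is_wildcard(scope: str) -> bool:
    """Assumed convention: '*' or a trailing ':*' / '.*' grants everything below it."""
    return scope == "*" or scope.endswith(":*") or scope.endswith(".*")

def release_gaps(previous: Manifest, current: Manifest) -> List[str]:
    """List release-layer gaps introduced by `current`, in a stable order."""
    gaps = []
    for name, tool in sorted(current.items()):
        old_scopes = set(previous.get(name, {}).get("scopes", []))
        for scope in sorted(tool.get("scopes", [])):
            if is_wildcard(scope) and scope not in old_scopes:
                gaps.append(f"{name}: gained wildcard scope {scope!r}")
        if tool.get("destructive") and not tool.get("approval_policy"):
            gaps.append(f"{name}: destructive action has no approval policy")
    return gaps

def ci_exit_code(behavior_passed: bool, gaps: List[str]) -> int:
    """Either layer can fail the build; a pass in one does not cover the other."""
    return 0 if behavior_passed and not gaps else 1

previous = {
    "read_crm": {"scopes": ["crm:read"]},
    "delete_record": {"scopes": ["crm:write"], "destructive": True,
                      "approval_policy": "two-person"},
}
current = {
    "read_crm": {"scopes": ["crm:*"]},
    "delete_record": {"scopes": ["crm:write"], "destructive": True,
                      "approval_policy": "two-person"},
    "purge_account": {"scopes": ["crm:write"], "destructive": True},
}
gaps = release_gaps(previous, current)
exit_code = ci_exit_code(behavior_passed=True, gaps=gaps)
```

Here the behavior suite passes, yet the build still fails on two release-layer gaps: `read_crm` widened to `crm:*`, and `purge_account` ships without an approval policy.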
When you only have time for one
If your agent already has output evals, add Agents Shipgate to catch the manifest-level gaps your evals do not see. If your agent has no evals yet, still start with Agents Shipgate: a static release gate needs no test cases, so it can stop an unsafe tool from shipping to production before you have written a single eval.