Agents Shipgate vs promptfoo · Three Moons Lab

Use promptfoo for prompt and behavior testing

promptfoo is an open-source CLI for evaluating prompts and model completions. Its strengths are regression-testing prompts across providers, running output assertions (regex, JavaScript, model-graded) on test cases, and A/B comparing prompt variants. It runs your model and grades the outputs against expectations.
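To make "runs your model and grades the outputs" concrete, here is a minimal Python sketch of a regex output assertion. It is not promptfoo's API or config format: `fake_model`, `run_case`, the prompt, and the pattern are all hypothetical stand-ins chosen for illustration.

```python
import re
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real provider call; returns a canned completion."""
    return "Your refund of $25.00 has been issued."


def run_case(model: Callable[[str], str], prompt: str, pattern: str) -> bool:
    """Run the model on one prompt and check the output against a regex."""
    output = model(prompt)
    return re.search(pattern, output) is not None


# One test case: the completion should mention a dollar amount.
passed = run_case(fake_model, "Refund order 1234", r"\$\d+\.\d{2}")
```

The point of the sketch is the shape of the check: the model has to be invoked before anything can be graded.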

Use Agents Shipgate for static release-readiness review

Agents Shipgate reads the static release artifact — manifest, tool schemas, scopes, approval policies, idempotency evidence — and produces a deterministic Tool-Use Readiness Report. It does not invoke the model, does not connect to MCP servers, and does not import your agent's code.

Question	promptfoo	Agents Shipgate
Did the model produce the expected output?	Yes	No
What tools is this PR releasing?	No	Yes
Are write-side tools missing approval policies?	No	Yes
Are retry-unsafe tools missing idempotency evidence?	No	Yes
Runs the model?	Yes	No
Should this fail CI before promotion?	If the output regresses	If the tool surface has a gap
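The two static questions in the table (approval policies and idempotency evidence) can be sketched as a check over a manifest, with no model call involved. This is an illustration only: the manifest layout and the field names `writes`, `approval_policy`, `retry_safe`, and `idempotency_evidence` are assumptions made for this example, not Agents Shipgate's actual schema.

```python
from typing import Any


def review_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return sorted findings for a hypothetical tool manifest."""
    findings = []
    for tool in manifest.get("tools", []):
        name = tool["name"]
        # Write-side tools should declare an approval policy.
        if tool.get("writes") and not tool.get("approval_policy"):
            findings.append(f"{name}: write-side tool has no approval policy")
        # Tools that are not safe to retry should carry idempotency evidence.
        if not tool.get("retry_safe", True) and not tool.get("idempotency_evidence"):
            findings.append(f"{name}: retry-unsafe tool has no idempotency evidence")
    # Sorting keeps the report identical across runs for the same input.
    return sorted(findings)


# Hypothetical manifest with one compliant write tool and one gap-ridden one.
manifest = {
    "tools": [
        {"name": "search_orders", "writes": False},
        {"name": "issue_refund", "writes": True, "retry_safe": False},
        {
            "name": "update_address",
            "writes": True,
            "approval_policy": "human_review",
            "retry_safe": True,
        },
    ]
}

findings = review_manifest(manifest)
# A CI step could fail the build whenever `findings` is non-empty.
```

Because the check only reads declared metadata, the same manifest always yields the same findings, which is what makes a report of this kind deterministic.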

The pattern teams converge on

Behavior layer (promptfoo) plus release layer (Agents Shipgate). Both run in CI; neither replaces the other. The release layer catches things that are not visible from outputs alone — for example, a tool quietly gaining a wildcard scope without anyone reviewing it, or a destructive action shipping without a declared approval policy.
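The wildcard-scope example can be illustrated by comparing the scopes declared in the previous release against those in the candidate release. The sketch below assumes a simple mapping of tool name to scope strings and treats any scope containing `*` as a wildcard; both the data shape and the function name `new_wildcard_scopes` are hypothetical, not taken from either tool.

```python
def new_wildcard_scopes(
    old: dict[str, list[str]], new: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Return, per tool, wildcard scopes present in `new` but absent from `old`."""
    flagged = {}
    for tool, scopes in new.items():
        added = set(scopes) - set(old.get(tool, []))
        wildcards = sorted(s for s in added if "*" in s)
        if wildcards:
            flagged[tool] = wildcards
    return flagged


# Hypothetical scope declarations for two releases.
old_scopes = {"crm": ["contacts:read"], "billing": ["invoices:read"]}
new_scopes = {"crm": ["contacts:read", "contacts:*"], "billing": ["invoices:read"]}

flagged = new_wildcard_scopes(old_scopes, new_scopes)
```

A change like this may leave every model output unchanged, so an output eval would have nothing to flag, while a diff over the declared scopes surfaces it immediately.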

When you only have time for one

If your agent already has output evals, add Agents Shipgate to catch the manifest-level gaps those evals do not see. If your agent has no evals yet, still start with Agents Shipgate: a release gate catches the failure mode in which an unsafe tool ships to production before any eval can run.