Agents Shipgate vs Braintrust

Braintrust is a hosted platform for scoring model outputs and monitoring runtime behavior. Agents Shipgate is a local CLI and GitHub Action that scans the declared tool surface before release. The two cover different layers, and teams often use them together.

Use Braintrust for eval and observability infrastructure

Braintrust is a hosted SaaS platform for organizing test sets, scoring model outputs with custom and built-in scorers, comparing experiments, and recording runtime traces. It fits teams that want managed eval infrastructure, dashboards, and shared workflows around model quality.

Use Agents Shipgate for static release-readiness review

Agents Shipgate reads a checked-in shipgate.yaml manifest plus local tool sources (MCP exports, OpenAPI specs, SDK/framework metadata) and produces a deterministic Tool-Use Readiness Report. It runs locally, never invokes the model, and emits no scanner telemetry by default. Output formats are Markdown, JSON, and SARIF.
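Because one of the output formats is SARIF, a CI step can fail the build on findings without any Agents Shipgate-specific parsing. The Python sketch below relies only on the SARIF 2.1.0 log structure (runs, results, level, ruleId, message.text). It assumes the report was saved to a file, and it treats only results with level "error" as blocking. The file name shipgate.sarif and that blocking policy are illustrative assumptions, not documented Agents Shipgate behavior.

```python
"""Minimal CI gate over a SARIF 2.1.0 log (illustrative sketch).

Assumptions: the report file name (e.g. shipgate.sarif) and treating only
"error"-level results as blocking are choices made for this example,
not documented Agents Shipgate behavior.
"""
import json
import sys


def blocking_results(sarif, blocking_levels=("error",)):
    """Return (ruleId, message text) pairs for results at a blocking level."""
    found = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: SARIF falls back to the rule's default
            # configuration and then to "warning" when "level" is absent;
            # this sketch skips the rule lookup and assumes "warning".
            level = result.get("level", "warning")
            if level in blocking_levels:
                text = result.get("message", {}).get("text", "")
                found.append((result.get("ruleId", "<no ruleId>"), text))
    return found


def main(path):
    """Print blocking findings and return a process exit code."""
    with open(path, encoding="utf-8") as fh:
        sarif = json.load(fh)
    found = blocking_results(sarif)
    for rule_id, text in found:
        print(f"{rule_id}: {text}")
    # A nonzero exit code fails the surrounding CI step.
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Usage would look like `python sarif_gate.py shipgate.sarif`, where the script name is also hypothetical. GitHub Actions marks a step as failed when its command exits nonzero, so the return code alone is enough to block a merge.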

| Dimension | Braintrust | Agents Shipgate |
|---|---|---|
| Deployment | Hosted SaaS | Local CLI + GitHub Action |
| Primary input | Test sets, runtime traces | Manifest plus tool sources |
| Runs the model? | Yes | No |
| Sends data off-machine? | Yes (hosted) | No, by default |
| License | Commercial | Apache-2.0 |
| Catches: failing test scorer | Yes | No |
| Catches: missing approval policy on a write tool | No | Yes |
| Catches: wildcard tool source in a PR | No | Yes |
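The last two rows in the table describe checks that need no model call: they only inspect what is declared. The following Python sketch shows what such deterministic checks could look like. The ToolDecl record and its fields (name, source, writes, approval) are hypothetical and invented for this example; they are not the shipgate.yaml schema, and the function review is not an Agents Shipgate API.

```python
"""Hypothetical sketch of two static tool-surface checks.

ToolDecl and its fields are invented for illustration; they are NOT the
shipgate.yaml schema, and review() is not an Agents Shipgate API.
"""
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToolDecl:
    name: str
    source: str              # where the tool is declared (MCP server, OpenAPI file, ...)
    writes: bool             # True if the tool mutates external state
    approval: Optional[str]  # name of an approval policy, None if undeclared


def review(tools: List[ToolDecl]) -> List[Tuple[str, str]]:
    """Return sorted (tool name, finding) pairs by reading declarations only."""
    findings = []
    for tool in tools:
        if tool.writes and not tool.approval:
            findings.append((tool.name, "missing approval policy on a write tool"))
        if "*" in tool.source:
            findings.append((tool.name, "wildcard tool source"))
    # Sorting makes the result independent of declaration order.
    return sorted(findings)
```

For example, a write tool such as a hypothetical send_email with approval=None would be flagged, while the same tool with an approval policy would pass. Sorting the findings is one simple way to keep the report deterministic, so the same inputs always yield byte-identical output in CI.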

The pattern teams converge on

If you already use Braintrust for model evals, Agents Shipgate fills the release-gate role Braintrust does not cover: static review of the tool surface itself, run locally in CI before any model is invoked. If you do not yet have a hosted eval platform, Agents Shipgate is a starting point for the release-readiness layer that needs no account and sends no telemetry by default; you can add Braintrust later for the behavior layer.

Why evals are not release gates Run a scan