What's in a release-readiness report? Walking a real finding list
A real agents-shipgate report on a real Anthropic-published agent. Thirteen findings — what each one means and the manifest change that resolves it.
The fastest way to understand what a release-readiness report actually contains is to look at one. This post walks through a real agents-shipgate scan against the anthropic-cookbook customer service agent — a published, working example with three tools.
The agent under review
Three Anthropic Messages API tools pulled verbatim from the cookbook:
- get_customer_info(customer_id: string): reads PII (name, email, phone).
- get_order_details(order_id: string): reads order data.
- cancel_order(order_id: string): cancels an order.
System prompt: a customer-service chatbot that can look up accounts, check order status, and cancel orders.
The report
Status: release_blockers_detected
Critical: 1 · High: 10 · Medium: 2
Tool inventory:
cancel_order risk_tags=[customer_communication, destructive, write]
get_customer_info risk_tags=[customer_communication, read_only]
get_order_details risk_tags=[read_only]
Three tools, thirteen findings. Let’s walk them.
Finding 1: SHIP-POLICY-APPROVAL-MISSING (critical)
[critical] cancel_order lacks a declared approval policy
evidence: risk_tags=[customer_communication, destructive, write]
recommendation: Declare an approval policy for cancel_order or remove
this tool from the release.
cancel_order has the destructive risk tag because the keyword
classifier flags cancel_*. Destructive write actions need an explicit
human approval gate before they fire — otherwise the agent can cancel
orders nobody approved.
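To make the cancel_* match concrete, here is a minimal Python sketch of a name-based classifier. The pattern table and the `classify` function are assumptions made for illustration, not agents-shipgate's actual rule set, and the sketch covers only verb-prefix tags (it does not try to derive customer_communication).

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table: NOT agents-shipgate's real rules.
RISK_PATTERNS = [
    ("cancel_*", ["destructive", "write"]),
    ("delete_*", ["destructive", "write"]),
    ("get_*", ["read_only"]),
]

def classify(tool_name: str) -> list[str]:
    """Return the sorted risk tags of every pattern the tool name matches."""
    tags: set[str] = set()
    for pattern, pattern_tags in RISK_PATTERNS:
        if fnmatchcase(tool_name, pattern):
            tags.update(pattern_tags)
    return sorted(tags)

for name in ("cancel_order", "get_customer_info", "get_order_details"):
    print(name, classify(name))
```

A classifier like this is deliberately conservative: it knows nothing about what the tool does, only what its name suggests, which is why the finding asks for an explicit declaration rather than trusting the guess.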
Fix: add to the manifest:
policies:
  require_approval_for_tools:
    - cancel_order
Critical resolved.
Findings 2–3: SHIP-AUTH-MISSING-SCOPE (high, ×2)
[high] cancel_order lacks declared auth scopes
[high] get_customer_info lacks declared auth scopes
evidence: write tool / customer_communication tool, no scopes declared
The cookbook example uses simulated data; in production these would hit a real database or order API with some service account. The release gate wants the auth scopes declared explicitly so reviewers can confirm they’re narrower than the service account’s actual permissions.
Fix: declare scopes per-tool:
permissions:
  scopes:
    cancel_order:
      - orders:cancel
    get_customer_info:
      - customers:read_pii
Two highs resolved.
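The reviewer's question, whether the declared scopes are narrower than what the service account can actually do, comes down to a set comparison. A minimal sketch, assuming a hypothetical list of service-account grants (the `service_account_scopes` values and the `review_scopes` helper are invented for this example):

```python
# Scopes declared per tool, as in the manifest fix above.
declared_scopes = {
    "cancel_order": {"orders:cancel"},
    "get_customer_info": {"customers:read_pii"},
}

# Hypothetical: what the production service account is actually granted.
service_account_scopes = {
    "orders:read", "orders:cancel", "orders:refund",
    "customers:read_pii", "customers:write",
}

def review_scopes(declared: dict[str, set[str]], granted: set[str]):
    """Return (granted but never declared, declared but not granted)."""
    needed = set().union(*declared.values())
    excess = granted - needed    # permissions no declared tool needs
    missing = needed - granted   # declarations the account cannot satisfy
    return excess, missing

excess, missing = review_scopes(declared_scopes, service_account_scopes)
print("granted but unused:", sorted(excess))
print("declared but not granted:", sorted(missing))
```

Anything in the first list is a candidate for removal from the service account; anything in the second means the declaration and the deployment disagree.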
Findings 4–5: SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING (high, ×2)
[high] cancel_order has no declared owner
[high] get_customer_info has no declared owner
evidence: high-risk tools (destructive / customer_communication / write)
without an `owner:` field on the tool or in the manifest
For high-risk tools, the release gate wants someone named who owns the tool’s behavior — the team to page when something goes wrong, the person whose review is required for changes.
Fix:
risk_overrides:
  tools:
    cancel_order:
      owner: support-platform-team
    get_customer_info:
      owner: customer-data-team
Two highs resolved.
Findings 6–7: SHIP-POLICY-CONFIRMATION-MISSING (high, ×2)
[high] cancel_order lacks a confirmation policy for customer-touching action
[high] get_customer_info lacks a confirmation policy for PII access
Customer-touching tools want confirmation in addition to approval — an explicit “yes” from the affected customer (or a documented opt-out for B2B agents where the operator is authoritative).
Fix:
policies:
  require_confirmation_for_tools:
    - cancel_order
    - get_customer_info
Two highs resolved.
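How the approval and confirmation lists are enforced at runtime is outside what this post shows. As one possible reading, a dispatcher could refuse a tool call until both conditions are met. The `ToolCall` class, its two flags, and `blocking_reasons` are hypothetical names for this sketch, not part of agents-shipgate:

```python
from dataclasses import dataclass

# These sets mirror the manifest policies shown in this post.
REQUIRE_APPROVAL = {"cancel_order"}
REQUIRE_CONFIRMATION = {"cancel_order", "get_customer_info"}

@dataclass
class ToolCall:
    name: str
    approved_by_human: bool = False      # hypothetical: set by a human reviewer
    confirmed_by_customer: bool = False  # hypothetical: set by the customer's "yes"

def blocking_reasons(call: ToolCall) -> list[str]:
    """Return why a call may not run yet; an empty list means it may proceed."""
    reasons = []
    if call.name in REQUIRE_APPROVAL and not call.approved_by_human:
        reasons.append("approval missing")
    if call.name in REQUIRE_CONFIRMATION and not call.confirmed_by_customer:
        reasons.append("confirmation missing")
    return reasons

print(blocking_reasons(ToolCall("cancel_order")))
print(blocking_reasons(ToolCall("get_customer_info", confirmed_by_customer=True)))
```

Keeping the two checks separate reflects the distinction in the findings: approval comes from the operator's side, confirmation from the affected customer.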
Findings 8–9: SHIP-SIDEFX-IDEMPOTENCY-MISSING (high, ×2)
[high] cancel_order may be retried without idempotency evidence
[high] get_customer_info may be retried without idempotency evidence
If a transient network blip causes the orchestrator to retry the call, will cancel_order cancel the order twice? Will get_customer_info read PII twice (low cost) or trigger the same audit-log event twice (medium cost)?
Fix: add an idempotency_key parameter to the schema, or declare
the tool as idempotent in policy:
policies:
  idempotency_tools:
    - cancel_order
    - get_customer_info  # safe: read-only
Two highs resolved.
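For the first option, an idempotency_key parameter, the handler has to remember which keys it has already served. A minimal sketch with in-memory stand-ins; the `orders` dict, the key cache, and the key value "req-42" are all assumptions for illustration, and a real service would persist the keys:

```python
# Hypothetical in-memory stand-ins for the order store and the key cache.
orders = {"ORD-1": "open"}
_results_by_key: dict[str, dict] = {}
cancel_attempts = 0  # counts how often the side effect actually runs

def cancel_order(order_id: str, idempotency_key: str) -> dict:
    """Cancel an order at most once per idempotency key."""
    global cancel_attempts
    if idempotency_key in _results_by_key:
        # A retry of a call already handled: replay the stored result.
        return _results_by_key[idempotency_key]
    cancel_attempts += 1
    orders[order_id] = "cancelled"
    result = {"order_id": order_id, "status": "cancelled"}
    _results_by_key[idempotency_key] = result
    return result

first = cancel_order("ORD-1", idempotency_key="req-42")
retry = cancel_order("ORD-1", idempotency_key="req-42")  # simulated retry
print(first == retry, cancel_attempts)  # prints: True 1
```

The retry returns the same result as the first call, and the cancellation itself runs only once.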
Findings 10–12: SHIP-API-FUNCTION-SCHEMA-STRICTNESS (high ×2 + medium ×1)
[high] cancel_order function schema is not strict enough
[high] get_customer_info function schema is not strict enough
[medium] get_order_details function schema is not strict enough
evidence: additionalProperties not false
Anthropic’s tool schema doesn’t have a strict: true flag the way
OpenAI’s does, but the underlying issue applies: if
additionalProperties isn’t false, the model can smuggle extra
fields into a tool call. For write or PII-touching tools
(cancel_order, get_customer_info) the severity is high; for the
purely read-only get_order_details it’s medium.
Fix: add additionalProperties: false to each tool’s input_schema.
Three findings resolved.
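As an illustration, here is cancel_order written as a Python dict in the Messages API tool shape (name, description, input_schema) with the fix applied. The description text is paraphrased rather than quoted from the cookbook, and `unexpected_fields` is a hypothetical helper that shows what the stricter schema lets a validator catch:

```python
cancel_order_tool = {
    "name": "cancel_order",
    "description": "Cancels an order based on the provided order ID.",  # paraphrased
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
        },
        "required": ["order_id"],
        "additionalProperties": False,  # the one-line fix
    },
}

def unexpected_fields(tool: dict, arguments: dict) -> list[str]:
    """List argument names the schema does not declare, when extras are forbidden."""
    schema = tool["input_schema"]
    if schema.get("additionalProperties", True) is not False:
        return []  # the schema allows extras, so nothing counts as unexpected
    return sorted(set(arguments) - set(schema.get("properties", {})))

call_args = {"order_id": "ORD-1", "refund_all": True}
print(unexpected_fields(cancel_order_tool, call_args))  # prints: ['refund_all']
```

Without the additionalProperties line, the same call would pass through with refund_all attached, and whether anything downstream acts on it would depend on the handler.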
Finding 13: SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH (medium)
[medium] Prompt lacks approval/confirmation language for high-risk tools
evidence: tools=[cancel_order, get_customer_info]
The system prompt says the agent “can help customers” but doesn’t include any language about asking for confirmation or approval before firing destructive or PII-touching actions. The release gate wants the prompt to acknowledge those rails so the model has explicit instructions, not just policy enforcement after the fact.
Fix: edit prompts/system.md to include something like “always
confirm the customer’s identity before changes; ask for explicit
confirmation before cancelling an order.”
One medium resolved.
What the report looks like after fixes
Status: pass
Critical: 0 · High: 0 · Medium: 0
Same agent, same model, same three tools. The model itself didn’t change. What changed is the release artifact: the manifest now contains the explicit declarations a reviewer needs to sign off, the tool schemas reject extra fields, and the system prompt states the confirmation rule.
What this is meant to feel like
A good release-readiness report should produce a finite, named list of changes. Every finding should have:
- A specific tool name (or “agent” for manifest-level findings).
- Concrete evidence — risk tags, schema issues, missing fields.
- A recommended remediation that’s actionable in the manifest.
- A severity that matches the production impact.
That’s it. No vague risk scores, no “secure your agent” platitudes, just the punch list.
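The four fields listed above map naturally onto a small record type. The sketch below is an assumption about shape only (the `Finding` class and `per_tool` helper are not agents-shipgate's output format); it rebuilds the thirteen findings from this post and tallies them by severity:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    check_id: str
    severity: str             # "critical", "high", or "medium"
    tool: str                 # tool name, or "agent" for manifest-level findings
    evidence: str = ""        # left empty here for brevity
    recommendation: str = ""  # left empty here for brevity

def per_tool(check_id: str, severity: str, tools: list[str]) -> list[Finding]:
    """Create one finding per affected tool."""
    return [Finding(check_id, severity, tool) for tool in tools]

WRITE_OR_PII = ["cancel_order", "get_customer_info"]

findings = (
    per_tool("SHIP-POLICY-APPROVAL-MISSING", "critical", ["cancel_order"])
    + per_tool("SHIP-AUTH-MISSING-SCOPE", "high", WRITE_OR_PII)
    + per_tool("SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING", "high", WRITE_OR_PII)
    + per_tool("SHIP-POLICY-CONFIRMATION-MISSING", "high", WRITE_OR_PII)
    + per_tool("SHIP-SIDEFX-IDEMPOTENCY-MISSING", "high", WRITE_OR_PII)
    + per_tool("SHIP-API-FUNCTION-SCHEMA-STRICTNESS", "high", WRITE_OR_PII)
    + per_tool("SHIP-API-FUNCTION-SCHEMA-STRICTNESS", "medium", ["get_order_details"])
    + per_tool("SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH", "medium", ["agent"])
)

counts = Counter(f.severity for f in findings)
print(len(findings), dict(counts))
# prints: 13 {'critical': 1, 'high': 10, 'medium': 2}
```

The tally matches the report header: one critical, ten high, two medium.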
If you want to run this on your own repo:
pipx install agents-shipgate
agents-shipgate init --workspace . --write
agents-shipgate scan -c shipgate.yaml
Sixty seconds for the first scan. Most agents produce 5–15 findings, most of them legitimate.