Every static check behind the merge verdict.
132 static checks across 24 categories — use this as your AI agent release-readiness checklist. Every category corresponds to a class of release risk for tool-using agents (MCP, OpenAPI, OpenAI Agents SDK, Anthropic, LangChain, CrewAI, Google ADK, Codex plugins, n8n). Vendored from docs/checks.json and refreshed on each agents-shipgate release.
Use this catalog as your...
- MCP security checklist — review wildcard sources, missing approval policies, idempotency gaps, and broad scopes before deploying an MCP server.
- AI agent release checklist — match every PR's tool-surface change against the categories below before approving merge.
- Framework-agnostic tool review — the same checks apply to OpenAI Agents SDK, Anthropic Messages API, LangChain / LangGraph, CrewAI, Google ADK, Codex plugins, and n8n workflows.
All checks run statically: no model invocation, no MCP connection, no verifier network calls, and no verifier telemetry by default. Run `agents-shipgate verify --base origin/main --head HEAD --format json` in CI; see the quickstart.
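As a rough illustration of wiring the verify command into a CI script, the sketch below builds the argument list quoted above and parses the result. It assumes the JSON report is written to stdout; the report's fields and the command's exit codes are not described in this catalog, so the sketch returns the parsed object without interpreting it. The helper names are hypothetical.

```python
import json
import subprocess


def build_verify_command(base: str = "origin/main", head: str = "HEAD") -> list[str]:
    """Return the argv for the verify invocation shown in this catalog."""
    return [
        "agents-shipgate", "verify",
        "--base", base,
        "--head", head,
        "--format", "json",
    ]


def run_verify(base: str = "origin/main", head: str = "HEAD") -> dict:
    """Run the verifier and parse its output.

    Assumption: the JSON report is printed to stdout. The report schema is
    not specified here, so the caller decides how to read it.
    """
    proc = subprocess.run(
        build_verify_command(base, head),
        capture_output=True,
        text=True,
        check=False,  # do not raise; exit-code semantics are not documented here
    )
    return json.loads(proc.stdout)
```

Keeping the argv builder separate from the subprocess call makes the command easy to log or unit-test without the CLI installed.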
action_surface
- SHIP-ACTION-APPROVAL-REMOVED (critical): Action approval policy was removed.
  Why: Removing approval weakens the release boundary for an existing action.
- SHIP-ACTION-CONTROL-DOWNGRADE (high): Action declaration weakens an inherited approval or safeguard control.
  Why: Manifest-wide approval and safeguard controls are governance requirements; per-action metadata should not silently weaken them.
- SHIP-ACTION-DESTRUCTIVE-ROLLBACK-MISSING (critical): New destructive action lacks approval or rollback controls.
  Why: Destructive actions need explicit approval and rollback evidence before release.
- SHIP-ACTION-EFFECT-DOWNGRADE-DECLARED (high): Action declaration weakens the inferred effect.
  Why: Per-action metadata should not be able to declare away a higher-risk operation inferred from the tool surface.
- SHIP-ACTION-EFFECT-ESCALATED (critical): Action effect escalated compared with the base surface.
  Why: Effect escalation changes what the agent can do in the real world and needs explicit review.
- SHIP-ACTION-EXTERNAL-COMMUNICATION-AUDIT-MISSING (high): New external communication action lacks audit evidence.
  Why: External communication changes agent blast radius and should be auditable.
- SHIP-ACTION-FINANCIAL-WRITE-CONTROL-MISSING (critical): New financial write action lacks required controls.
  Why: Financial write actions need approval, audit, and idempotency evidence before release.
- SHIP-ACTION-POLICY-VIOLATION (high): An action-surface policy requirement is not satisfied.
  Why: Action Surface Diff policies are the reviewer-facing release boundary for external action capability.
- SHIP-ACTION-SAFEGUARD-REMOVED (high): Action safeguard was removed.
  Why: Removing audit, idempotency, rollback, or dry-run safeguards expands blast radius.
- SHIP-ACTION-UNDECLARED (high): A loaded tool lacks explicit action-surface metadata.
  Why: Action Surface Diff depends on reviewer-visible action metadata for release decisions.
- SHIP-ACTION-WILDCARD-SCOPE (critical): Action surface includes a wildcard or admin-like scope.
  Why: Wildcard scopes make action blast radius too broad for deterministic release review.
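The base-versus-head comparisons in this category can be pictured with a small sketch. Everything below is illustrative: the `Action` shape, the safeguard names, and the `EFFECT_RANK` ordering are hypothetical and are not taken from the agents-shipgate manifest format.

```python
from dataclasses import dataclass

# Hypothetical ordering of effects from least to most risky.
EFFECT_RANK = {"read": 0, "write": 1, "destructive": 2}


@dataclass(frozen=True)
class Action:
    """Hypothetical per-action metadata, for illustration only."""
    name: str
    effect: str
    approval: bool = False
    safeguards: frozenset = frozenset()


def diff_action(base: Action, head: Action) -> list[str]:
    """Compare one action across base and head and return matching check IDs."""
    findings = []
    if base.approval and not head.approval:
        findings.append("SHIP-ACTION-APPROVAL-REMOVED")
    if base.safeguards - head.safeguards:  # any safeguard present before, absent now
        findings.append("SHIP-ACTION-SAFEGUARD-REMOVED")
    if EFFECT_RANK[head.effect] > EFFECT_RANK[base.effect]:
        findings.append("SHIP-ACTION-EFFECT-ESCALATED")
    return findings
```

An unchanged action yields an empty list, which is the property a diff-aware gate relies on: only the delta between base and head raises findings.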
adk
- SHIP-ADK-DYNAMIC-TOOLSET-NOT-ENUMERABLE (high): Google ADK toolset cannot be statically enumerated.
  Why: Release review needs an explicit tool inventory; ADK MCP/OpenAPI toolsets may resolve tools dynamically at runtime.
- SHIP-ADK-EVAL-COVERAGE-MISSING (medium): Google ADK eval coverage is not declared.
  Why: ADK releases should include response and tool-trajectory eval evidence before promotion.
- SHIP-ADK-FUNCTION-TOOL-METADATA-MISSING (medium): Google ADK function tool lacks static metadata.
  Why: Static review depends on descriptions and parameter schemas because user ADK code is not imported.
- SHIP-ADK-GUARDRAIL-EVIDENCE-MISSING (high): High-risk Google ADK tools lack static guardrail evidence.
  Why: Callbacks and plugins are the static ADK surface where release reviewers can see guardrail intent.
- SHIP-ADK-LONGRUNNING-CONTRACT-MISSING (high): Google ADK long-running tool lacks an operation contract.
  Why: Long-running tools need explicit status and operation-id semantics for safe continuation.
- SHIP-ADK-MCP-TOOLSET-UNFILTERED (high): Google ADK McpToolset lacks a static tool filter.
  Why: Unfiltered MCP toolsets can expose more tools than reviewers expect.
api
- SHIP-API-FUNCTION-SCHEMA-STRICTNESS (high): OpenAI API function schema is not strict enough for reliable tool calls.
  Why: Strict schemas reduce ambiguous tool arguments and downstream side-effect risk.
- SHIP-API-OPERATIONAL-READINESS (medium): Deprecated compatibility alias for the v0.3 OpenAI API operational readiness bundle.
  Why: v0.4 emits atomic OpenAI API readiness check IDs, but this ID remains available for existing suppressions, severity overrides, baselines, SARIF consumers, and explain/list-checks workflows during the deprecation window.
- SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH (high): Prompt scope contradicts enabled OpenAI API tools.
  Why: Prompt instructions should match the actual write/high-risk tool surface.
- SHIP-API-RETRY-POLICY-MISSING (medium): OpenAI API high-risk flow lacks retry policy metadata.
  Why: Retries need explicit policy metadata so reviewers can reason about duplicate side effects.
- SHIP-API-RETRY-WITHOUT-IDEMPOTENCY (high): OpenAI API write tool may be retried without idempotency evidence.
  Why: Retries against non-idempotent writes can duplicate financial, destructive, or external side effects.
- SHIP-API-STRUCTURED-OUTPUT-READINESS (medium): OpenAI API structured output schema is missing or under-specified.
  Why: Downstream release decisions need explicit, structured success/refusal/review modeling.
- SHIP-API-TEST-CASES-MISSING (medium): OpenAI API high-risk flow lacks test case metadata.
  Why: High-risk tool-call flows should have release evidence before promotion.
- SHIP-API-TIMEOUT-MISSING (medium): OpenAI API high-risk flow lacks timeout metadata.
  Why: Timeouts define failure behavior and reduce ambiguous tool-call continuation.
- SHIP-API-TOOL-OUTPUT-SCHEMA-MISSING (medium): OpenAI API high-risk tool lacks success/failure output modeling.
  Why: Tool output schemas help release reviewers reason about downstream failure handling.
- SHIP-API-TRACE-APPROVAL-MISSING (medium): OpenAI API trace sample shows a policy-controlled tool without approval.
  Why: Trace samples should demonstrate approval behavior for tools that require approval.
- SHIP-API-TRACE-CONFIRMATION-MISSING (medium): OpenAI API trace sample shows a policy-controlled tool without confirmation.
  Why: Trace samples should demonstrate explicit confirmation for tools that require confirmation.
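To make the schema-strictness idea concrete, here is a minimal sketch of a top-level strictness review. The catalog does not say exactly what SHIP-API-FUNCTION-SCHEMA-STRICTNESS inspects; the three conditions below mirror what OpenAI's strict function-calling mode asks of a parameter schema (strict enabled, `additionalProperties` set to false, every property listed in `required`). The function name is hypothetical, and nested objects are not walked.

```python
def strictness_gaps(parameters: dict, strict: bool) -> list[str]:
    """List reasons a function's top-level parameter schema is not strict."""
    gaps = []
    if not strict:
        gaps.append("strict is not enabled")
    if parameters.get("additionalProperties") is not False:
        gaps.append("additionalProperties is not false")
    declared = set(parameters.get("properties", {}))
    optional = sorted(declared - set(parameters.get("required", [])))
    if optional:
        gaps.append("properties not listed in required: " + ", ".join(optional))
    return gaps


# Example: 'reason' is optional and extra keys are allowed.
loose_schema = {
    "type": "object",
    "properties": {"order_id": {"type": "string"}, "reason": {"type": "string"}},
    "required": ["order_id"],
}
```

An empty list means the top level satisfies all three conditions; a real review would also need to recurse into nested object schemas.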
auth
- SHIP-AUTH-MANIFEST-BROAD-SCOPE (high): Manifest declares broad permission scopes.
  Why: Broad manifest scopes weaken least-privilege review.
- SHIP-AUTH-MISSING-SCOPE (high): Scope-requiring tool lacks declared auth scopes.
  Why: Reviewers cannot assess least privilege without scope metadata.
- SHIP-AUTH-SCOPE-COVERAGE-MISSING (high): Tool-required scopes are not covered by manifest permissions.scopes.
  Why: The manifest should describe the actual permissions needed by the release.
- SHIP-AUTH-TOOL-BROAD-SCOPE (high): Tool declares broad auth scopes.
  Why: Tool-level broad scopes may grant more power than the operation needs.
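Scope coverage reduces to a set difference between what each tool requires and what the manifest grants. The sketch below assumes scopes are plain strings; the `is_broad` heuristic (a bare `*`, a trailing wildcard segment, or the word "admin") is a hypothetical stand-in, since the catalog does not define what counts as a broad scope.

```python
def uncovered_scopes(tool_scopes: dict[str, set[str]], manifest_scopes: set[str]) -> dict[str, set[str]]:
    """Map each tool to the scopes it requires that the manifest does not declare."""
    gaps = {}
    for tool, scopes in tool_scopes.items():
        missing = scopes - manifest_scopes
        if missing:
            gaps[tool] = missing
    return gaps


def is_broad(scope: str) -> bool:
    """Hypothetical heuristic for a wildcard or admin-like scope."""
    return scope == "*" or scope.endswith((":*", ".*")) or "admin" in scope.lower()
```

Swapping the operands (`manifest_scopes` minus the union of all tool scopes) gives the declared-but-unused scopes that SHIP-MANIFEST-UNUSED-SCOPE describes.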
baseline
- SHIP-BASELINE-ENTRY-EXPIRED (high): Baseline entry's review window has expired.
  Why: Reviewer-set `provenance.expires` is the renewable consent for accepting technical debt. Past that date the entry needs a fresh review, not a silent extension.
- SHIP-BASELINE-ENTRY-STALE (low): Baseline entry no longer corresponds to an active finding or check ID.
  Why: Stale baseline entries hide intent: reviewers cannot tell whether the accepted debt was resolved or whether the check was renamed. Pruning keeps the baseline aligned with reality.
- SHIP-BASELINE-INTEGRITY-MISMATCH (critical): Baseline file integrity check failed.
  Why: The baseline JSON has been edited outside `agents-shipgate baseline save`, lacks an audit log row, has a malformed audit log row, or references a run_id not present in the audit log. A release gate that accepts silent baseline edits cannot claim to govern technical debt.
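The expiry rule can be sketched as a date comparison against `provenance.expires`. The entry layout below is an assumption: the catalog names the `provenance.expires` field but not its format, so this sketch assumes an ISO 8601 date string and a hypothetical `check_id` key, and it treats an entry as expired only once today is past that date.

```python
from datetime import date


def expired_entries(entries: list[dict], today: date) -> list[str]:
    """Return the check IDs of baseline entries whose review window has passed."""
    expired = []
    for entry in entries:
        expires = entry.get("provenance", {}).get("expires")
        # Entries without an expiry are left for other checks to judge.
        if expires and date.fromisoformat(expires) < today:
            expired.append(entry["check_id"])
    return expired
```

Passing `today` in explicitly, rather than calling `date.today()` inside, keeps the function deterministic, which matters for a gate that is meant to give the same verdict on the same inputs.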
codex_boundary
- SHIP-CODEX-BOUNDARY-AGENTS-SHIPGATE-REQUIREMENT-REMOVED (medium): AGENTS.md removed a Shipgate requirement.
  Why: Agent instructions are not controls; removing gate instructions requires human review because semantic weakening cannot be proven safely.
- SHIP-CODEX-BOUNDARY-APP-AUTO-APPROVE (high): Codex app connector tool approval changed to approve.
  Why: Connector-backed app tools are externally mediated and need human review before local auto-approval.
- SHIP-CODEX-BOUNDARY-CI-GATE-REMOVED (critical): Shipgate GitHub Action no longer invokes the gate.
  Why: Removing the local or CI gate is a direct bypass.
- SHIP-CODEX-BOUNDARY-CONFIG-PARSE-FAILED (medium): Codex project configuration could not be parsed.
  Why: A malformed Codex config prevents deterministic inspection of the local execution boundary, so the local agent check fails closed to review.
- SHIP-CODEX-BOUNDARY-DANGER-FULL-ACCESS (critical): Codex full-access sandbox is selected.
  Why: danger-full-access removes local sandbox restrictions and must not be silently approved by an agent.
- SHIP-CODEX-BOUNDARY-HOOK-COMMAND-CHANGED (high): A Codex executable hook changed.
  Why: Hooks execute in the agent lifecycle and can alter local behavior before or after tool calls.
- SHIP-CODEX-BOUNDARY-MCP-AUTO-APPROVE-UNKNOWN (high): Codex auto-approves an MCP server whose tool surface is not statically enumerable.
  Why: Without an explicit tool allowlist or per-tool metadata, Shipgate cannot prove auto-approved MCP calls are read-only.
- SHIP-CODEX-BOUNDARY-MCP-AUTO-APPROVE-WRITE (critical): Codex auto-approves a write or destructive MCP/app tool.
  Why: Auto-approval of write-capable external tools lets the agent take side-effecting actions without a review boundary.
- SHIP-CODEX-BOUNDARY-NETWORK-EXPANDED (high): Codex network access expanded.
  Why: Enabling workspace-write network access or full network mode changes the local execution boundary.
- SHIP-CODEX-BOUNDARY-NETWORK-WILDCARD (high): Codex network permissions allow a wildcard domain.
  Why: Wildcard network access expands the local agent's reachable resources beyond a reviewable allowlist.
- SHIP-CODEX-BOUNDARY-POLICY-WEAKENED (critical): Codex boundary policy was weakened.
  Why: The policy that judges a Codex boundary change must not be weakened by the same change under review.
- SHIP-CODEX-BOUNDARY-SKILL-COMMAND-CHANGED (medium): A Codex skill gained command-bearing instructions.
  Why: Skills can steer agents into shell commands or helper scripts, so command-bearing changes need review before local automation.
- SHIP-CODEX-BOUNDARY-UNKNOWN-PERMISSION-KEY (medium): Codex permissions contain an unknown high-risk key.
  Why: Permission-profile schema drift can change the sandbox boundary; unknown keys under permissions fail closed while unrelated top-level keys remain advisory.
codex_plugin
- SHIP-CODEX-PLUGIN-APP-SURFACE-NOT-ENUMERABLE (medium): Codex plugin app connector surface is not statically enumerable.
  Why: Connector-backed app capabilities are externally mediated and cannot be proven from local plugin metadata alone.
- SHIP-CODEX-PLUGIN-COMPONENT-PATH-MISSING (high): Codex plugin component path cannot be loaded.
  Why: Release review cannot inspect declared skills, MCP servers, apps, or hooks when component paths are missing or escape the package.
- SHIP-CODEX-PLUGIN-MARKETPLACE-POLICY-MISSING (medium): Codex plugin marketplace entry lacks policy metadata.
  Why: Marketplace installation and authentication policy are part of the release surface coding agents need to review.
- SHIP-CODEX-PLUGIN-MCP-SERVER-NOT-ENUMERABLE (high): Codex plugin MCP server is declared but not statically enumerable.
  Why: Agents Shipgate does not execute MCP commands, so reviewer-visible tool metadata requires an explicit local inventory.
- SHIP-CODEX-PLUGIN-METADATA-MISSING (medium): Codex plugin package metadata is incomplete or ambiguous.
  Why: Plugin identity needs to be stable before publication or downstream agent adoption.
- SHIP-CODEX-PLUGIN-SKILL-METADATA-MISSING (medium): Codex plugin skill metadata is missing or duplicated.
  Why: Skill frontmatter is the static routing surface agents use to decide whether a skill applies.
crewai
- SHIP-CREWAI-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE (high): CrewAI tool surface cannot be statically enumerated.
  Why: CrewAI exposes ad hoc agent-bound tool lists rather than a consistent toolset abstraction, so the check names the broader tool surface instead of ADK's toolset.
- SHIP-CREWAI-FUNCTION-TOOL-METADATA-MISSING (medium): CrewAI function tool lacks static metadata.
  Why: Static review depends on descriptions and parameter schemas because user CrewAI code is not imported.
documentation
- SHIP-DOC-MISSING-DESCRIPTION (medium): Tool description is missing or too short.
  Why: Poor tool descriptions increase wrong-tool and reviewer misunderstanding risk.
evidence
- SHIP-EVIDENCE-APPROVAL-TRACE-MISSING (high): Local HITL approval trace evidence is missing or incomplete for an approval-required tool.
  Why: Limited automation review depends on reviewer-visible local evidence that approval-controlled actions were approved before the tool call; absence of local evidence does not prove the runtime control is absent.
- SHIP-EVIDENCE-HIGH-RISK-EXCLUSION-MISSING (high): Local high-risk auto-approval exclusion evidence is missing or incomplete.
  Why: High-risk tools that already declare approval policy need separate local evidence that they are excluded from auto-approval review posture; absence of local evidence does not prove the runtime control is absent.
- SHIP-EVIDENCE-HITL-PROMOTION-CRITERIA-MISSING (high): Local HITL promotion criteria evidence is missing or incomplete.
  Why: A limited auto-approval review posture needs local criteria evidence; Shipgate structures the missing evidence for reviewers but does not certify runtime enforcement.
- SHIP-EVIDENCE-OVERRIDE-REASON-MISSING (high): Local HITL override reason evidence is missing or incomplete.
  Why: Override, bypass, and auto-approval events need reviewer-visible local reasons for governance review; absence of local evidence does not prove the runtime control is absent.
host_boundary
- SHIP-HOST-BOUNDARY-CONFIG-PARSE-FAILED (medium): A coding-agent host configuration file could not be parsed.
  Why: A malformed host config prevents deterministic inspection of the agent's host capability boundary, so the diff-aware check fails closed to review.
- SHIP-HOST-BOUNDARY-HOOK-CHANGED (high): Claude Code hooks changed.
  Why: Hooks execute commands inside the agent lifecycle and can alter host behavior before or after every tool call.
- SHIP-HOST-BOUNDARY-MCP-SERVER-ADDED (high): A new MCP server was declared for the coding-agent host.
  Why: A new MCP server adds an entire external tool surface to the agent host; it must be human-reviewed before the agent can use it.
- SHIP-HOST-BOUNDARY-MCP-SERVER-CHANGED (high): An existing MCP server declaration changed its command, URL, args, or env keys.
  Why: Changing what an already-trusted MCP server executes or connects to silently re-shapes the host tool surface behind an approved name.
- SHIP-HOST-BOUNDARY-PERMISSION-ALLOW-EXPANDED (high): The Claude Code permission allowlist expanded.
  Why: Every new allow rule widens what the host executes without a prompt; expansion needs a human in the loop.
- SHIP-HOST-BOUNDARY-PERMISSION-DENY-REMOVED (high): A Claude Code permission deny rule was removed.
  Why: Deny rules are the host's explicit guardrails; removing one silently re-enables a previously forbidden capability.
- SHIP-HOST-BOUNDARY-PERMISSION-WILDCARD-ALLOW (critical): A Claude Code allow rule grants a wildcard tool surface.
  Why: Wildcard allow rules remove the host's per-command approval boundary entirely; an agent must not self-grant unrestricted tool access.
- SHIP-HOST-BOUNDARY-PULL-REQUEST-TARGET-ADDED (critical): A GitHub workflow gained a pull_request_target trigger.
  Why: pull_request_target runs workflow code with secrets against fork PRs; adding it is a classic privilege-escalation surface.
- SHIP-HOST-BOUNDARY-WORKFLOW-PERMISSIONS-EXPANDED (high): GitHub workflow permissions expanded.
  Why: Moving a scope from read to write (or granting a new write scope) widens what CI can do with the repository token.
- SHIP-HOST-BOUNDARY-WORKFLOW-WRITE-ALL (critical): A GitHub workflow grants write-all permissions.
  Why: write-all hands the workflow token every write scope at once; an agent must not self-grant blanket CI write access.
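The permission checks in this category are diff-aware, which a short sketch can illustrate. It assumes base and head Claude Code settings have already been parsed into dicts with `permissions.allow` and `permissions.deny` lists. The wildcard test (a bare `*` or a rule ending in `(*)`) is a hypothetical heuristic, and applying it only to newly added rules is also an assumption.

```python
def diff_permissions(base: dict, head: dict) -> list[str]:
    """Compare allow/deny rules across base and head settings; return check IDs."""
    base_perms = base.get("permissions", {})
    head_perms = head.get("permissions", {})
    added_allow = set(head_perms.get("allow", [])) - set(base_perms.get("allow", []))
    removed_deny = set(base_perms.get("deny", [])) - set(head_perms.get("deny", []))

    findings = []
    if added_allow:
        findings.append("SHIP-HOST-BOUNDARY-PERMISSION-ALLOW-EXPANDED")
    # Hypothetical wildcard heuristic, applied to newly added rules only.
    if any(rule == "*" or rule.endswith("(*)") for rule in added_allow):
        findings.append("SHIP-HOST-BOUNDARY-PERMISSION-WILDCARD-ALLOW")
    if removed_deny:
        findings.append("SHIP-HOST-BOUNDARY-PERMISSION-DENY-REMOVED")
    return findings
```

Because the comparison is on sets, reordering existing rules produces no finding; only a rule that is new on the allow side or gone from the deny side does.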
inventory
- SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE (high): Production target includes low-confidence tool extraction.
  Why: Production promotion should not depend primarily on best-effort SDK inference.
- SHIP-INVENTORY-NOT-ENUMERABLE (high): Tool surface cannot be enumerated from declared inputs.
  Why: A release gate must fail closed when it cannot see the agent's tools.
- SHIP-INVENTORY-TOOL-SURFACE-TOO-LARGE (medium): Tool surface exceeds the MVP review threshold.
  Why: Large tool surfaces are harder to reason about during promotion.
- SHIP-INVENTORY-WILDCARD-TOOLS (high): Wildcard or all-tools exposure is declared.
  Why: Wildcard tools make review and least-privilege reasoning impossible.
langchain
- SHIP-LANGCHAIN-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE (high): LangChain tool surface cannot be statically enumerated.
  Why: LangChain and LangGraph expose ad hoc tool lists and agent-bound tools rather than a consistent toolset abstraction, so the check names the broader tool surface instead of ADK's toolset.
- SHIP-LANGCHAIN-FUNCTION-TOOL-METADATA-MISSING (medium): LangChain function tool lacks static metadata.
  Why: Static review depends on descriptions and parameter schemas because user LangChain code is not imported.
manifest
- SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING (high): Production high-risk tool has no declared owner.
  Why: High-risk production tools need an accountable owning team for review and remediation.
- SHIP-MANIFEST-STALE-POLICY (medium): A policy references a missing tool.
  Why: Approval, confirmation, and idempotency policies should map to the actual release surface.
- SHIP-MANIFEST-STALE-RISK-OVERRIDE (medium): A risk override references a missing tool.
  Why: Risk overrides should not outlive the tool they describe.
- SHIP-MANIFEST-STALE-SUPPRESSION (medium): A suppression references a missing check or tool.
  Why: Stale suppressions hide intent and make release review harder to audit.
- SHIP-MANIFEST-UNUSED-SCOPE (medium): Manifest declares permission scopes unused by loaded tools.
  Why: Unused permissions weaken least-privilege review and often indicate stale config.
mcp_permissions
- SHIP-MCP-AUTO-APPROVE-SIDE-EFFECT (critical): MCP side-effecting tool is auto-approved.
  Why: Auto-approved side-effecting MCP tools can let an agent write, publish, destroy, operate production systems, or make financial changes without review.
- SHIP-MCP-ENV-SECRET-PASSTHROUGH (high): MCP server passes through secret environment variables.
  Why: Secret-bearing environment pass-through expands the credential boundary available to a tool server.
- SHIP-MCP-PERMISSION-EXPANDED (high): MCP capability permissions expanded.
  Why: MCP capability broadening changes the agent's static tool authority.
- SHIP-MCP-READONLY-SERVER-ADDED (low): Read-only local documentation MCP server added.
  Why: Local read-only documentation servers are low risk but should remain visible in capability review.
- SHIP-MCP-UNKNOWN-TOOL-SCHEMA (high): MCP tool side effect cannot be proven from static schema.
  Why: Incomplete or wildcard MCP metadata cannot prove the callable tool surface is read-only.
n8n
- SHIP-N8N-AI-TOOL-METADATA-MISSING (medium): n8n AI-exposed tool lacks static metadata.
  Why: Static review depends on descriptions and parameter schemas because Shipgate does not execute n8n workflows.
- SHIP-N8N-CREDENTIAL-EVIDENCE-MISSING (high): n8n credential stubs are not declared.
  Why: Credential type evidence lets reviewers assess high-risk integrations without exposing secret values.
- SHIP-N8N-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE (high): n8n tool surface cannot be statically enumerated.
  Why: Release review needs an explicit local inventory when n8n workflow JSON uses runtime tool names, unresolved workflow references, wildcard MCP exposure, or community tool nodes without static metadata. This is high severity in every environment because static release evidence cannot prove the actual tool inventory.
- SHIP-N8N-EVAL-COVERAGE-MISSING (medium): n8n eval coverage is not declared.
  Why: n8n AI workflow releases should include response and tool-trajectory eval evidence before promotion.
- SHIP-N8N-MCP-CLIENT-TOOLSET-UNFILTERED (high): n8n MCP Client Tool exposes an unfiltered toolset.
  Why: All-tools and all-except MCP client selections can expose more tools than reviewers expect. The check is environment-sensitive because the selector is straightforward to narrow before production, while production-like use increases blast radius.
policy
- SHIP-POLICY-APPROVAL-MISSING (critical): High-risk tool lacks a declared approval policy.
  Why: High-risk actions need explicit approval before promotion.
- SHIP-POLICY-CONFIRMATION-MISSING (high): Destructive/external/customer-communication tool lacks a confirmation policy.
  Why: Destructive and external actions should require explicit confirmation.
schema
- SHIP-SCHEMA-BROAD-FREE-TEXT (high): Action-like tool accepts broad free-form input.
  Why: Broad action/body/update fields increase blast radius for write tools.
- SHIP-SCHEMA-FREEFORM-OUTPUT (medium): Tool returns free-form string output.
  Why: Free-form tool output may carry prompt injection into later model context.
- SHIP-SCHEMA-MISSING-BOUNDS (high): Risky numeric parameter lacks a maximum bound.
  Why: Unbounded counts or financial amounts weaken blast-radius control.
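A missing-bounds review can be sketched against a JSON Schema parameter object. The catalog does not say which numeric parameters count as risky, so this simplified version flags every top-level `integer` or `number` property that declares neither `maximum` nor `exclusiveMaximum`. The function name is hypothetical.

```python
def unbounded_numeric_params(parameters: dict) -> list[str]:
    """Return top-level numeric properties that declare no upper bound."""
    unbounded = []
    for name, spec in parameters.get("properties", {}).items():
        is_numeric = spec.get("type") in ("integer", "number")
        has_upper_bound = "maximum" in spec or "exclusiveMaximum" in spec
        if is_numeric and not has_upper_bound:
            unbounded.append(name)
    return sorted(unbounded)


# Example: 'amount_cents' has no ceiling, 'quantity' does.
refund_params = {
    "type": "object",
    "properties": {
        "amount_cents": {"type": "integer", "minimum": 1},
        "quantity": {"type": "integer", "maximum": 10},
        "note": {"type": "string"},
    },
}
```

A `minimum` alone does not help here: the blast-radius concern is about how large a count or amount the agent can request, so only an upper bound clears the finding.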
scope
- SHIP-SCOPE-PROHIBITED-TOOL-PRESENT (high): Tool appears to overlap with a manifest prohibited action.
  Why: Prohibited actions should not be contradicted by attached tool capabilities.
- SHIP-SCOPE-TOOL-OUTSIDE-PURPOSE (high): Write-capable tool contradicts a read-only declared purpose.
  Why: Declared purpose should constrain the attached tool surface.
security
- SHIP-DOC-INJECTION-RISK (medium): Tool description contains instruction-override-like language.
  Why: Tool metadata can be placed into model context and should not contain prompt-like directives.
- SHIP-DOC-SECRET-IN-DESCRIPTION (medium): Tool description contains a secret-like value.
  Why: Credentials in tool metadata can leak into reports, prompts, or logs.
- SHIP-N8N-SECRET-IN-WORKFLOW-PARAMETER (high): n8n workflow JSON contains a secret-like value.
  Why: Workflow JSON, pinned data, static data, and node notes can be committed or reported; hardcoded secret-like values should be moved into credentials or variables.
side_effects
- SHIP-SIDEFX-IDEMPOTENCY-MISSING (high): Risky write tool lacks idempotency evidence; critical when retry is known.
  Why: Retries against non-idempotent writes can duplicate financial or external side effects.
skill_lint
- LINT-BODY-001 (medium): Skill body lacks a step-by-step procedure.
  Why: Agents need explicit operational steps to apply a skill consistently.
- LINT-BODY-003 (medium): Skill body lacks an output contract.
  Why: Without an output contract, agents and reviewers cannot tell what artifact, response shape, or completion criteria the skill requires.
- LINT-BODY-004 (medium): Skill body lacks verification criteria.
  Why: Verification guidance lets an agent check whether the workflow was applied correctly before reporting completion.
- LINT-DESC-001 (high): Skill description is too vague to route reliably.
  Why: The skill description is the primary routing surface agents inspect before loading the full skill body.
- LINT-DESC-003 (medium): Skill description is overbroad and may false-trigger.
  Why: Broad skill descriptions can cause agents to load a skill for tasks outside its intended behavior boundary.
- LINT-SCRIPT-001 (medium): Skill script lacks documented --help usage.
  Why: Agent-invoked scripts need discoverable usage text so agents can run them non-interactively and safely.
- LINT-SCRIPT-004 (medium): Stateful skill script lacks dry-run support.
  Why: Scripts that mutate files or external state should expose a preview path before agents perform the stateful action.
- LINT-SPEC-002 (high): Skill has invalid YAML frontmatter.
  Why: Agents use SKILL.md frontmatter for discovery; malformed frontmatter makes the routing surface ambiguous or unreadable.
- LINT-SPEC-003 (high): Skill is missing required name frontmatter.
  Why: A stable skill name is required for discovery and unambiguous review.
- LINT-SPEC-004 (high): Skill is missing required description frontmatter.
  Why: A clear description tells agents when to select the skill before loading the full body.
skill_security
- SEC-FLOW-004 (medium): Skill lacks data and instruction separation guidance.
  Why: Skills that combine untrusted content with outbound channels or secret access need explicit guidance that untrusted content is data, not agent instructions.
- SEC-MISMATCH-001 (high): Skill declares read-only behavior but bundled scripts mutate state.
  Why: Reviewers and agents rely on the skill description to understand the behavior boundary before loading scripts and references.
- SEC-PI-001 (critical): Skill artifact contains instruction override language.
  Why: Skill and instruction artifacts are loaded into agent context and must not tell agents to ignore higher-priority instructions.
- SEC-PI-003 (high): Skill artifact tells the agent to hide behavior.
  Why: Instructions to conceal behavior from users, reviewers, or logs are unsafe in agent behavior surfaces.
- SEC-PROV-001 (high): Third-party skill lacks provenance metadata.
  Why: Third-party skills are supply-chain artifacts and need source, owner, version, and review metadata.
- SEC-REMOTE-001 (medium): Skill fetches remote instruction content.
  Why: Mutable remote prompt or instruction content can change skill behavior after review.
- SEC-REMOTE-002 (critical): Remote content is fetched and executed.
  Why: Runtime remote-code execution lets mutable external content change the behavior of a skill after review.
- SEC-SCRIPT-001 (critical): Remote content is piped to a shell or interpreter.
  Why: Piping remote content to an interpreter executes mutable external code in the agent environment.
- SEC-SCRIPT-002 (critical): Destructive or stateful command lacks guardrails.
  Why: Skills that drive destructive commands need dry-run, confirmation, or path validation before an agent can invoke them safely.
- SEC-SECRET-001 (critical): Skill artifact contains a hardcoded secret-like value.
  Why: Secrets in skill artifacts can leak into prompts, reports, logs, or agent execution environments.
- SEC-SECRET-003 (high): Skill artifact instructs credential or secret-file access.
  Why: Skills should not direct agents to inspect broad credential files or environment secrets unless the workflow explicitly scopes and protects them.
- SEC-TOOL-001 (high): Skill pre-approves shell or bash without justification.
  Why: Shell preapproval removes an important confirmation step before an agent runs commands from a skill.
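For a sense of how a pattern such as SEC-SCRIPT-001 might be matched statically, the sketch below uses a single regular expression for a download command piped into an interpreter. The pattern is a hypothetical heuristic, not the rule the tool ships: it covers only `curl` and `wget` on one line and a short list of interpreters.

```python
import re

# Hypothetical heuristic: a curl/wget invocation whose output is piped,
# on the same line, into a shell or Python interpreter (optionally via sudo).
PIPE_TO_INTERPRETER = re.compile(
    r"\b(curl|wget)\b[^|\n]*\|\s*(sudo\s+)?(sh|bash|zsh|python3?)\b"
)


def pipes_remote_to_interpreter(text: str) -> bool:
    """Return True if the text contains a download piped into an interpreter."""
    return bool(PIPE_TO_INTERPRETER.search(text))
```

The trailing word boundary keeps `| shasum` from matching as `sh`. A production rule would also need to handle process substitution, multi-line continuations, and other download tools.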
verify
- SHIP-VERIFY-AGENT-INSTRUCTIONS-WEAKENED (medium): The PR edits agent-instruction trust roots (AGENTS.md, CLAUDE.md, .cursor/rules, skills, etc.) and Shipgate cannot statically prove the instructions were not weakened.
  Why: Prompts are not controls (Principle 3); a static tool cannot judge whether instruction text was weakened, so any change is routed to human review rather than silently trusted.
- SHIP-VERIFY-BASELINE-OR-WAIVER-EXPANDED (high): The PR broadens what the gate forgives (a new suppression in checks.ignore, a widened waiver scope, or a larger accepted-debt baseline) versus the base report.
  Why: Expanding suppressions, waivers, or the baseline is a reward-hacking move that hides findings instead of fixing them. A base-vs-head superset comparison of the effective-policy snapshot makes the expansion release-visible.
- SHIP-VERIFY-CAPABILITY-SCOPE-BROADENED (critical): The PR removes or broadens a dynamically loaded agent toolkit's explicit least-privilege configuration bound (e.g. a stripe_agent_toolkit constructor allowlist) versus the base report, silently expanding the toolkit surface.
  Why: Toolkits that load tools through a runtime factory (*toolkit.get_tools()) hide the individual tools from static enumeration, so the only statically provable signal is the constructor's permission allowlist. Removing it mounts the full toolkit (refund / cancel / dispute on a customer-support agent); a coding agent must not self-approve that capability broadening. Without this check the opaque factory degrades the verdict to insufficient_evidence instead of blocked.
- SHIP-VERIFY-CI-GATE-REMOVED (critical): The PR deletes the Shipgate CI workflow from an opted-in repo, which would stop the release gate from running.
  Why: Removing CI enforcement is the bluntest gate bypass. Detecting a changed-and-now-absent workflow file makes the removal a release signal even without a base scan.
- SHIP-VERIFY-POLICY-WEAKENED (high): The PR's effective release policy is weaker than the base report's (CI mode downgraded, fail-on severity set loosened, or a check's severity override lowered across a tier).
  Why: Comparing the normalized base-vs-head policy snapshot catches a coding agent that makes CI pass by weakening the gate rather than fixing the finding. When no base snapshot is available but a policy trust root was touched, the check fails safe to a review-required finding instead of silently passing.
- SHIP-VERIFY-TRIGGER-CATALOG-DRIFT (medium): The PR changes the trigger catalog (docs/triggers.json or an .agents-shipgate trigger config) that decides when Shipgate runs.
  Why: Editing the trigger catalog can carve out paths so the gate stops firing on the very changes that matter, a gate evasion one level up from suppressing findings.
- SHIP-VERIFY-TRUST-ROOT-TOUCHED (medium): A PR changed a release trust-root file (manifest, shipgate state, policy, prompts, the Shipgate CI gate, agent instructions, or a tool-surface declaration).
  Why: Trust-root files define the release gate. A coding agent told to make CI pass can weaken the gate instead of fixing the underlying readiness issue (reward hacking); touching a trust root must therefore require human review.
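The base-vs-head snapshot comparison behind the policy checks above can be sketched with two set differences. The snapshot shape here is hypothetical (an `ignore` list of suppressed check IDs and a `fail_on` list of blocking severities); the real effective-policy snapshot is not documented in this catalog.

```python
def policy_findings(base: dict, head: dict) -> list[str]:
    """Compare two hypothetical policy snapshots and return matching check IDs."""
    findings = []
    # Head forgives something base did not: the suppression set grew.
    if set(head.get("ignore", [])) - set(base.get("ignore", [])):
        findings.append("SHIP-VERIFY-BASELINE-OR-WAIVER-EXPANDED")
    # Head no longer fails on a severity that base failed on.
    if set(base.get("fail_on", [])) - set(head.get("fail_on", [])):
        findings.append("SHIP-VERIFY-POLICY-WEAKENED")
    return findings
```

Note the asymmetry: suppressions are judged by what head adds, while the fail-on set is judged by what head drops. Tightening the policy in either direction yields no finding.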