Every static check behind the merge verdict.

132 static checks across 24 categories — use this as your AI agent release-readiness checklist. Every category corresponds to a class of release risk for tool-using agents (MCP, OpenAPI, OpenAI Agents SDK, Anthropic, LangChain, CrewAI, Google ADK, Codex plugins, n8n). Vendored from docs/checks.json and refreshed on each agents-shipgate release.

Use this catalog as your...

  • MCP security checklist — review wildcard sources, missing approval policies, idempotency gaps, and broad scopes before deploying an MCP server.
  • AI agent release checklist — match every PR's tool-surface change against the categories below before approving merge.
  • Framework-agnostic tool review — the same checks apply to OpenAI Agents SDK, Anthropic Messages API, LangChain / LangGraph, CrewAI, Google ADK, Codex plugins, and n8n workflows.

All checks run statically: no model invocation, no MCP connection, no verifier network calls, no verifier telemetry by default. Run `agents-shipgate verify --base origin/main --head HEAD --format json` in CI; see the quickstart.
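As a rough illustration of wiring the command above into a CI step, the sketch below builds the same argument list and forwards the process exit code. The helper names are hypothetical, and treating a non-zero exit code as a failed step is an assumption; the catalog does not document the CLI's exit codes or JSON output shape.

```python
import subprocess


def build_verify_command(base: str = "origin/main", head: str = "HEAD") -> list[str]:
    """Return the argv for the verify command quoted in the text."""
    return [
        "agents-shipgate", "verify",
        "--base", base,
        "--head", head,
        "--format", "json",
    ]


def run_verify(base: str = "origin/main", head: str = "HEAD") -> int:
    """Run the verifier and return its exit code unchanged.

    Assumption: the CI job decides what a non-zero code means; this
    sketch does not interpret the JSON report.
    """
    completed = subprocess.run(build_verify_command(base, head), check=False)
    return completed.returncode
```

A CI script could call `sys.exit(run_verify())` so the job status follows whatever the verifier reports.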

action_surface

  • SHIP-ACTION-APPROVAL-REMOVED critical

    Action approval policy was removed.

    Why: Removing approval weakens the release boundary for an existing action.

  • SHIP-ACTION-CONTROL-DOWNGRADE high

    Action declaration weakens an inherited approval or safeguard control.

    Why: Manifest-wide approval and safeguard controls are governance requirements; per-action metadata should not silently weaken them.

  • SHIP-ACTION-DESTRUCTIVE-ROLLBACK-MISSING critical

    New destructive action lacks approval or rollback controls.

    Why: Destructive actions need explicit approval and rollback evidence before release.

  • SHIP-ACTION-EFFECT-DOWNGRADE-DECLARED high

    Action declaration weakens the inferred effect.

    Why: Per-action metadata should not be able to declare away a higher-risk operation inferred from the tool surface.

  • SHIP-ACTION-EFFECT-ESCALATED critical

    Action effect escalated compared with the base surface.

    Why: Effect escalation changes what the agent can do in the real world and needs explicit review.

  • SHIP-ACTION-EXTERNAL-COMMUNICATION-AUDIT-MISSING high

    New external communication action lacks audit evidence.

    Why: External communication changes agent blast radius and should be auditable.

  • SHIP-ACTION-FINANCIAL-WRITE-CONTROL-MISSING critical

    New financial write action lacks required controls.

    Why: Financial write actions need approval, audit, and idempotency evidence before release.

  • SHIP-ACTION-POLICY-VIOLATION high

    An action-surface policy requirement is not satisfied.

    Why: Action Surface Diff policies are the reviewer-facing release boundary for external action capability.

  • SHIP-ACTION-SAFEGUARD-REMOVED high

    Action safeguard was removed.

    Why: Removing audit, idempotency, rollback, or dry-run safeguards expands blast radius.

  • SHIP-ACTION-UNDECLARED high

    A loaded tool lacks explicit action-surface metadata.

    Why: Action Surface Diff depends on reviewer-visible action metadata for release decisions.

  • SHIP-ACTION-WILDCARD-SCOPE critical

    Action surface includes a wildcard or admin-like scope.

    Why: Wildcard scopes make action blast radius too broad for deterministic release review.
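To make the wildcard and admin-like scope idea concrete, here is a minimal sketch of a string heuristic. The token list and separator set are assumptions chosen for illustration; the catalog does not state how Shipgate classifies a scope.

```python
import re

# Hypothetical token list: words that suggest an admin-like scope.
ADMIN_TOKENS = {"admin", "root", "superuser", "all"}


def is_wildcard_or_admin_scope(scope: str) -> bool:
    """Return True when a scope string looks wildcard or admin-like."""
    if "*" in scope:
        return True
    # Split on common scope separators such as ':', '.', '/', '_', and '-'.
    tokens = re.split(r"[:./_\-]", scope.lower())
    return any(token in ADMIN_TOKENS for token in tokens)


def broad_scopes(scopes: list[str]) -> list[str]:
    """Keep only the scopes the heuristic considers too broad."""
    return [scope for scope in scopes if is_wildcard_or_admin_scope(scope)]
```

For example, `broad_scopes(["orders:read", "orders:*", "org:admin"])` keeps the last two entries and leaves the narrowly scoped read permission alone.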

adk

  • SHIP-ADK-DYNAMIC-TOOLSET-NOT-ENUMERABLE high

    Google ADK toolset cannot be statically enumerated.

    Why: Release review needs an explicit tool inventory; ADK MCP/OpenAPI toolsets may resolve tools dynamically at runtime.

  • SHIP-ADK-EVAL-COVERAGE-MISSING medium

    Google ADK eval coverage is not declared.

    Why: ADK releases should include response and tool-trajectory eval evidence before promotion.

  • SHIP-ADK-FUNCTION-TOOL-METADATA-MISSING medium

    Google ADK function tool lacks static metadata.

    Why: Static review depends on descriptions and parameter schemas because user ADK code is not imported.

  • SHIP-ADK-GUARDRAIL-EVIDENCE-MISSING high

    High-risk Google ADK tools lack static guardrail evidence.

    Why: Callbacks and plugins are the static ADK surface where release reviewers can see guardrail intent.

  • SHIP-ADK-LONGRUNNING-CONTRACT-MISSING high

    Google ADK long-running tool lacks an operation contract.

    Why: Long-running tools need explicit status and operation-id semantics for safe continuation.

  • SHIP-ADK-MCP-TOOLSET-UNFILTERED high

    Google ADK McpToolset lacks a static tool filter.

    Why: Unfiltered MCP toolsets can expose more tools than reviewers expect.

api

  • SHIP-API-FUNCTION-SCHEMA-STRICTNESS high

    OpenAI API function schema is not strict enough for reliable tool calls.

    Why: Strict schemas reduce ambiguous tool arguments and downstream side-effect risk.

  • SHIP-API-OPERATIONAL-READINESS medium

    Deprecated compatibility alias for the v0.3 OpenAI API operational readiness bundle.

    Why: v0.4 emits atomic OpenAI API readiness check IDs, but this ID remains available for existing suppressions, severity overrides, baselines, SARIF consumers, and explain/list-checks workflows during the deprecation window.

  • SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH high

    Prompt scope contradicts enabled OpenAI API tools.

    Why: Prompt instructions should match the actual write/high-risk tool surface.

  • SHIP-API-RETRY-POLICY-MISSING medium

    OpenAI API high-risk flow lacks retry policy metadata.

    Why: Retries need explicit policy metadata so reviewers can reason about duplicate side effects.

  • SHIP-API-RETRY-WITHOUT-IDEMPOTENCY high

    OpenAI API write tool may be retried without idempotency evidence.

    Why: Retries against non-idempotent writes can duplicate financial, destructive, or external side effects.

  • SHIP-API-STRUCTURED-OUTPUT-READINESS medium

    OpenAI API structured output schema is missing or under-specified.

    Why: Downstream release decisions need explicit, structured success/refusal/review modeling.

  • SHIP-API-TEST-CASES-MISSING medium

    OpenAI API high-risk flow lacks test case metadata.

    Why: High-risk tool-call flows should have release evidence before promotion.

  • SHIP-API-TIMEOUT-MISSING medium

    OpenAI API high-risk flow lacks timeout metadata.

    Why: Timeouts define failure behavior and reduce ambiguous tool-call continuation.

  • SHIP-API-TOOL-OUTPUT-SCHEMA-MISSING medium

    OpenAI API high-risk tool lacks success/failure output modeling.

    Why: Tool output schemas help release reviewers reason about downstream failure handling.

  • SHIP-API-TRACE-APPROVAL-MISSING medium

    OpenAI API trace sample shows a policy-controlled tool without approval.

    Why: Trace samples should demonstrate approval behavior for tools that require approval.

  • SHIP-API-TRACE-CONFIRMATION-MISSING medium

    OpenAI API trace sample shows a policy-controlled tool without confirmation.

    Why: Trace samples should demonstrate explicit confirmation for tools that require confirmation.

auth

  • SHIP-AUTH-MANIFEST-BROAD-SCOPE high

    Manifest declares broad permission scopes.

    Why: Broad manifest scopes weaken least-privilege review.

  • SHIP-AUTH-MISSING-SCOPE high

    Scope-requiring tool lacks declared auth scopes.

    Why: Reviewers cannot assess least privilege without scope metadata.

  • SHIP-AUTH-SCOPE-COVERAGE-MISSING high

    Tool-required scopes are not covered by manifest permissions.scopes.

    Why: The manifest should describe the actual permissions needed by the release.

  • SHIP-AUTH-TOOL-BROAD-SCOPE high

    Tool declares broad auth scopes.

    Why: Tool-level broad scopes may grant more power than the operation needs.
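Scope coverage reduces to set arithmetic once tool scopes and manifest scopes are both known. The sketch below assumes a simple in-memory shape (a mapping from tool name to required scopes, plus the set from `permissions.scopes`); the function names are hypothetical. The second function shows the mirror-image comparison behind the related unused-scope check in the manifest category.

```python
def uncovered_scopes(
    tool_scopes: dict[str, set[str]], manifest_scopes: set[str]
) -> dict[str, set[str]]:
    """Map each tool to the required scopes the manifest does not declare."""
    missing = {}
    for tool, required in tool_scopes.items():
        gap = required - manifest_scopes
        if gap:
            missing[tool] = gap
    return missing


def unused_scopes(
    tool_scopes: dict[str, set[str]], manifest_scopes: set[str]
) -> set[str]:
    """Return manifest scopes that no loaded tool requires."""
    used = set().union(*tool_scopes.values())
    return manifest_scopes - used
```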

baseline

  • SHIP-BASELINE-ENTRY-EXPIRED high

    Baseline entry's review window has expired.

    Why: Reviewer-set `provenance.expires` is the renewable consent for accepting technical debt. Past that date the entry needs a fresh review, not a silent extension.

  • SHIP-BASELINE-ENTRY-STALE low

    Baseline entry no longer corresponds to an active finding or check ID.

    Why: Stale baseline entries hide intent — reviewers cannot tell whether the accepted debt was resolved or whether the check was renamed. Pruning keeps the baseline aligned with reality.

  • SHIP-BASELINE-INTEGRITY-MISMATCH critical

    Baseline file integrity check failed.

    Why: The baseline JSON has been edited outside `agents-shipgate baseline save`, lacks an audit log row, has a malformed audit log row, or references a run_id not present in the audit log. A release gate that accepts silent baseline edits cannot claim to govern technical debt.
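The expiry rule can be sketched as a date comparison over baseline entries. Only `provenance.expires` comes from the text above; the `check_id` field name and the ISO 8601 date format are assumptions made for this example.

```python
from datetime import date


def expired_entries(entries: list[dict], today: date) -> list[str]:
    """Return check IDs whose review window ended before `today`."""
    expired = []
    for entry in entries:
        expires = entry.get("provenance", {}).get("expires")
        if expires is None:
            # No review window declared; out of scope for this sketch.
            continue
        # Assumption: `expires` is an ISO date string such as "2025-01-31".
        if date.fromisoformat(expires) < today:
            expired.append(entry["check_id"])
    return expired
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the comparison deterministic in tests.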

codex_boundary

  • SHIP-CODEX-BOUNDARY-AGENTS-SHIPGATE-REQUIREMENT-REMOVED medium

    AGENTS.md removed a Shipgate requirement.

    Why: Agent instructions are not controls; removing gate instructions requires human review because semantic weakening cannot be proven safely.

  • SHIP-CODEX-BOUNDARY-APP-AUTO-APPROVE high

    Codex app connector tool approval changed to approve.

    Why: Connector-backed app tools are externally mediated and need human review before local auto-approval.

  • SHIP-CODEX-BOUNDARY-CI-GATE-REMOVED critical

    Shipgate GitHub Action no longer invokes the gate.

    Why: Removing the local or CI gate is a direct bypass.

  • SHIP-CODEX-BOUNDARY-CONFIG-PARSE-FAILED medium

    Codex project configuration could not be parsed.

    Why: A malformed Codex config prevents deterministic inspection of the local execution boundary, so the local agent check fails closed to review.

  • SHIP-CODEX-BOUNDARY-DANGER-FULL-ACCESS critical

    Codex full-access sandbox is selected.

    Why: danger-full-access removes local sandbox restrictions and must not be silently approved by an agent.

  • SHIP-CODEX-BOUNDARY-HOOK-COMMAND-CHANGED high

    A Codex executable hook changed.

    Why: Hooks execute in the agent lifecycle and can alter local behavior before or after tool calls.

  • SHIP-CODEX-BOUNDARY-MCP-AUTO-APPROVE-UNKNOWN high

    Codex auto-approves an MCP server whose tool surface is not statically enumerable.

    Why: Without an explicit tool allowlist or per-tool metadata, Shipgate cannot prove auto-approved MCP calls are read-only.

  • SHIP-CODEX-BOUNDARY-MCP-AUTO-APPROVE-WRITE critical

    Codex auto-approves a write or destructive MCP/app tool.

    Why: Auto-approval of write-capable external tools lets the agent take side-effecting actions without a review boundary.

  • SHIP-CODEX-BOUNDARY-NETWORK-EXPANDED high

    Codex network access expanded.

    Why: Enabling workspace-write network access or full network mode changes the local execution boundary.

  • SHIP-CODEX-BOUNDARY-NETWORK-WILDCARD high

    Codex network permissions allow a wildcard domain.

    Why: Wildcard network access expands the local agent's reachable resources beyond a reviewable allowlist.

  • SHIP-CODEX-BOUNDARY-POLICY-WEAKENED critical

    Codex boundary policy was weakened.

    Why: The policy that judges a Codex boundary change must not be weakened by the same change under review.

  • SHIP-CODEX-BOUNDARY-SKILL-COMMAND-CHANGED medium

    A Codex skill gained command-bearing instructions.

    Why: Skills can steer agents into shell commands or helper scripts, so command-bearing changes need review before local automation.

  • SHIP-CODEX-BOUNDARY-UNKNOWN-PERMISSION-KEY medium

    Codex permissions contain an unknown high-risk key.

    Why: Permission-profile schema drift can change the sandbox boundary; unknown keys under permissions fail closed while unrelated top-level keys remain advisory.

codex_plugin

  • SHIP-CODEX-PLUGIN-APP-SURFACE-NOT-ENUMERABLE medium

    Codex plugin app connector surface is not statically enumerable.

    Why: Connector-backed app capabilities are externally mediated and cannot be proven from local plugin metadata alone.

  • SHIP-CODEX-PLUGIN-COMPONENT-PATH-MISSING high

    Codex plugin component path cannot be loaded.

    Why: Release review cannot inspect declared skills, MCP servers, apps, or hooks when component paths are missing or escape the package.

  • SHIP-CODEX-PLUGIN-MARKETPLACE-POLICY-MISSING medium

    Codex plugin marketplace entry lacks policy metadata.

    Why: Marketplace installation and authentication policy are part of the release surface coding agents need to review.

  • SHIP-CODEX-PLUGIN-MCP-SERVER-NOT-ENUMERABLE high

    Codex plugin MCP server is declared but not statically enumerable.

    Why: Agents Shipgate does not execute MCP commands, so reviewer-visible tool metadata requires an explicit local inventory.

  • SHIP-CODEX-PLUGIN-METADATA-MISSING medium

    Codex plugin package metadata is incomplete or ambiguous.

    Why: Plugin identity needs to be stable before publication or downstream agent adoption.

  • SHIP-CODEX-PLUGIN-SKILL-METADATA-MISSING medium

    Codex plugin skill metadata is missing or duplicated.

    Why: Skill frontmatter is the static routing surface agents use to decide whether a skill applies.

crewai

  • SHIP-CREWAI-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE high

    CrewAI tool surface cannot be statically enumerated.

    Why: CrewAI exposes ad hoc agent-bound tool lists rather than a consistent toolset abstraction, so the check names the broader tool surface instead of ADK's toolset.

  • SHIP-CREWAI-FUNCTION-TOOL-METADATA-MISSING medium

    CrewAI function tool lacks static metadata.

    Why: Static review depends on descriptions and parameter schemas because user CrewAI code is not imported.

documentation

  • SHIP-DOC-MISSING-DESCRIPTION medium

    Tool description is missing or too short.

    Why: Poor tool descriptions increase wrong-tool and reviewer misunderstanding risk.

evidence

  • SHIP-EVIDENCE-APPROVAL-TRACE-MISSING high

    Local HITL approval trace evidence is missing or incomplete for an approval-required tool.

    Why: Limited automation review depends on reviewer-visible local evidence that approval-controlled actions were approved before the tool call; absence of local evidence does not prove the runtime control is absent.

  • SHIP-EVIDENCE-HIGH-RISK-EXCLUSION-MISSING high

    Local high-risk auto-approval exclusion evidence is missing or incomplete.

    Why: High-risk tools that already declare approval policy need separate local evidence that they are excluded from auto-approval review posture; absence of local evidence does not prove the runtime control is absent.

  • SHIP-EVIDENCE-HITL-PROMOTION-CRITERIA-MISSING high

    Local HITL promotion criteria evidence is missing or incomplete.

    Why: A limited auto-approval review posture needs local criteria evidence; Shipgate structures the missing evidence for reviewers but does not certify runtime enforcement.

  • SHIP-EVIDENCE-OVERRIDE-REASON-MISSING high

    Local HITL override reason evidence is missing or incomplete.

    Why: Override, bypass, and auto-approval events need reviewer-visible local reasons for governance review; absence of local evidence does not prove the runtime control is absent.

host_boundary

  • SHIP-HOST-BOUNDARY-CONFIG-PARSE-FAILED medium

    A coding-agent host configuration file could not be parsed.

    Why: A malformed host config prevents deterministic inspection of the agent's host capability boundary, so the diff-aware check fails closed to review.

  • SHIP-HOST-BOUNDARY-HOOK-CHANGED high

    Claude Code hooks changed.

    Why: Hooks execute commands inside the agent lifecycle and can alter host behavior before or after every tool call.

  • SHIP-HOST-BOUNDARY-MCP-SERVER-ADDED high

    A new MCP server was declared for the coding-agent host.

    Why: A new MCP server adds an entire external tool surface to the agent host; it must be human-reviewed before the agent can use it.

  • SHIP-HOST-BOUNDARY-MCP-SERVER-CHANGED high

    An existing MCP server declaration changed its command, URL, args, or env keys.

    Why: Changing what an already-trusted MCP server executes or connects to silently re-shapes the host tool surface behind an approved name.

  • SHIP-HOST-BOUNDARY-PERMISSION-ALLOW-EXPANDED high

    The Claude Code permission allowlist expanded.

    Why: Every new allow rule widens what the host executes without a prompt; expansion needs a human in the loop.

  • SHIP-HOST-BOUNDARY-PERMISSION-DENY-REMOVED high

    A Claude Code permission deny rule was removed.

    Why: Deny rules are the host's explicit guardrails; removing one silently re-enables a previously forbidden capability.

  • SHIP-HOST-BOUNDARY-PERMISSION-WILDCARD-ALLOW critical

    A Claude Code allow rule grants a wildcard tool surface.

    Why: Wildcard allow rules remove the host's per-command approval boundary entirely; an agent must not self-grant unrestricted tool access.

  • SHIP-HOST-BOUNDARY-PULL-REQUEST-TARGET-ADDED critical

    A GitHub workflow gained a pull_request_target trigger.

    Why: pull_request_target runs workflow code with secrets against fork PRs; adding it is a classic privilege-escalation surface.

  • SHIP-HOST-BOUNDARY-WORKFLOW-PERMISSIONS-EXPANDED high

    GitHub workflow permissions expanded.

    Why: Moving a scope from read to write (or granting a new write scope) widens what CI can do with the repository token.

  • SHIP-HOST-BOUNDARY-WORKFLOW-WRITE-ALL critical

    A GitHub workflow grants write-all permissions.

    Why: write-all hands the workflow token every write scope at once; an agent must not self-grant blanket CI write access.
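The two workflow checks that close this category can be approximated on an already-parsed GitHub workflow document. This sketch takes plain dictionaries so it stays free of third-party imports; the function names are hypothetical, and how Shipgate itself parses workflows is not described here. One detail worth handling: YAML 1.1 parsers such as PyYAML load the bare key `on` as the boolean `True`.

```python
def workflow_triggers(workflow: dict) -> set[str]:
    """Return the event names a parsed workflow is triggered by."""
    # Accept both the string key "on" and the boolean key True.
    raw = workflow.get("on", workflow.get(True))
    if raw is None:
        return set()
    if isinstance(raw, str):
        return {raw}
    # A list of event names, or a mapping keyed by event name.
    return set(raw)


def added_pull_request_target(base: dict, head: dict) -> bool:
    """True when head gains a pull_request_target trigger that base lacked."""
    name = "pull_request_target"
    return name in workflow_triggers(head) and name not in workflow_triggers(base)


def grants_write_all(workflow: dict) -> bool:
    """True when the workflow or any of its jobs sets permissions: write-all."""
    if workflow.get("permissions") == "write-all":
        return True
    jobs = workflow.get("jobs", {})
    return any(job.get("permissions") == "write-all" for job in jobs.values())
```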

inventory

  • SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE high

    Production target includes low-confidence tool extraction.

    Why: Production promotion should not depend primarily on best-effort SDK inference.

  • SHIP-INVENTORY-NOT-ENUMERABLE high

    Tool surface cannot be enumerated from declared inputs.

    Why: A release gate must fail closed when it cannot see the agent's tools.

  • SHIP-INVENTORY-TOOL-SURFACE-TOO-LARGE medium

    Tool surface exceeds the MVP review threshold.

    Why: Large tool surfaces are harder to reason about during promotion.

  • SHIP-INVENTORY-WILDCARD-TOOLS high

    Wildcard or all-tools exposure is declared.

    Why: Wildcard tools make review and least-privilege reasoning impossible.

langchain

  • SHIP-LANGCHAIN-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE high

    LangChain tool surface cannot be statically enumerated.

    Why: LangChain and LangGraph expose ad hoc tool lists and agent-bound tools rather than a consistent toolset abstraction, so the check names the broader tool surface instead of ADK's toolset.

  • SHIP-LANGCHAIN-FUNCTION-TOOL-METADATA-MISSING medium

    LangChain function tool lacks static metadata.

    Why: Static review depends on descriptions and parameter schemas because user LangChain code is not imported.

manifest

  • SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING high

    Production high-risk tool has no declared owner.

    Why: High-risk production tools need an accountable owning team for review and remediation.

  • SHIP-MANIFEST-STALE-POLICY medium

    A policy references a missing tool.

    Why: Approval, confirmation, and idempotency policies should map to the actual release surface.

  • SHIP-MANIFEST-STALE-RISK-OVERRIDE medium

    A risk override references a missing tool.

    Why: Risk overrides should not outlive the tool they describe.

  • SHIP-MANIFEST-STALE-SUPPRESSION medium

    A suppression references a missing check or tool.

    Why: Stale suppressions hide intent and make release review harder to audit.

  • SHIP-MANIFEST-UNUSED-SCOPE medium

    Manifest declares permission scopes unused by loaded tools.

    Why: Unused permissions weaken least-privilege review and often indicate stale config.

mcp_permissions

  • SHIP-MCP-AUTO-APPROVE-SIDE-EFFECT critical

    MCP side-effecting tool is auto-approved.

    Why: Auto-approved side-effecting MCP tools can let an agent write, publish, destroy, operate production systems, or make financial changes without review.

  • SHIP-MCP-ENV-SECRET-PASSTHROUGH high

    MCP server passes through secret environment variables.

    Why: Secret-bearing environment pass-through expands the credential boundary available to a tool server.

  • SHIP-MCP-PERMISSION-EXPANDED high

    MCP capability permissions expanded.

    Why: MCP capability broadening changes the agent's static tool authority.

  • SHIP-MCP-READONLY-SERVER-ADDED low

    Read-only local documentation MCP server added.

    Why: Local read-only documentation servers are low risk but should remain visible in capability review.

  • SHIP-MCP-UNKNOWN-TOOL-SCHEMA high

    MCP tool side effect cannot be proven from static schema.

    Why: Incomplete or wildcard MCP metadata cannot prove the callable tool surface is read-only.
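One way to picture the "cannot be proven read-only" rule: treat a tool as read-only only when its static metadata says so explicitly, and flag every other auto-approved tool. The sketch below uses the `readOnlyHint` tool annotation from the MCP specification as the evidence; whether Shipgate relies on that annotation is an assumption, and the specification itself describes annotations as hints rather than guarantees.

```python
def is_declared_read_only(tool: dict) -> bool:
    """True only when the tool explicitly declares readOnlyHint: true."""
    return tool.get("annotations", {}).get("readOnlyHint") is True


def risky_auto_approvals(tools: list[dict], auto_approved: set[str]) -> list[str]:
    """Return auto-approved tool names that are not provably read-only."""
    return [
        tool["name"]
        for tool in tools
        if tool["name"] in auto_approved and not is_declared_read_only(tool)
    ]
```

Missing metadata counts against the tool here, which mirrors the fail-closed posture the checks above describe.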

n8n

  • SHIP-N8N-AI-TOOL-METADATA-MISSING medium

    n8n AI-exposed tool lacks static metadata.

    Why: Static review depends on descriptions and parameter schemas because Shipgate does not execute n8n workflows.

  • SHIP-N8N-CREDENTIAL-EVIDENCE-MISSING high

    n8n credential stubs are not declared.

    Why: Credential type evidence lets reviewers assess high-risk integrations without exposing secret values.

  • SHIP-N8N-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE high

    n8n tool surface cannot be statically enumerated.

    Why: Release review needs an explicit local inventory when n8n workflow JSON uses runtime tool names, unresolved workflow references, wildcard MCP exposure, or community tool nodes without static metadata. This is high severity in every environment because static release evidence cannot prove the actual tool inventory.

  • SHIP-N8N-EVAL-COVERAGE-MISSING medium

    n8n eval coverage is not declared.

    Why: n8n AI workflow releases should include response and tool-trajectory eval evidence before promotion.

  • SHIP-N8N-MCP-CLIENT-TOOLSET-UNFILTERED high

    n8n MCP Client Tool exposes an unfiltered toolset.

    Why: All-tools and all-except MCP client selections can expose more tools than reviewers expect. The check is environment-sensitive because the selector is straightforward to narrow before production, while production-like use increases blast radius.

policy

  • SHIP-POLICY-APPROVAL-MISSING critical

    High-risk tool lacks a declared approval policy.

    Why: High-risk actions need explicit approval before promotion.

  • SHIP-POLICY-CONFIRMATION-MISSING high

    Destructive/external/customer-communication tool lacks a confirmation policy.

    Why: Destructive and external actions should require explicit confirmation.

schema

  • SHIP-SCHEMA-BROAD-FREE-TEXT high

    Action-like tool accepts broad free-form input.

    Why: Broad action/body/update fields increase blast radius for write tools.

  • SHIP-SCHEMA-FREEFORM-OUTPUT medium

    Tool returns free-form string output.

    Why: Free-form tool output may carry prompt injection into later model context.

  • SHIP-SCHEMA-MISSING-BOUNDS high

    Risky numeric parameter lacks a maximum bound.

    Why: Unbounded counts or financial amounts weaken blast-radius control.
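A missing-bounds check is straightforward to sketch against a JSON Schema parameter object. The example below flags every numeric property without `maximum` or `exclusiveMaximum`; which parameters count as "risky" is not defined in this catalog, so treating all numeric properties as candidates is an assumption.

```python
NUMERIC_TYPES = {"integer", "number"}


def unbounded_numeric_params(schema: dict) -> list[str]:
    """Return names of numeric properties that declare no upper bound."""
    flagged = []
    for name, spec in schema.get("properties", {}).items():
        declared = spec.get("type")
        # JSON Schema allows `type` to be a single string or a list of strings.
        types = set(declared) if isinstance(declared, list) else {declared}
        if not types & NUMERIC_TYPES:
            continue
        if "maximum" not in spec and "exclusiveMaximum" not in spec:
            flagged.append(name)
    return flagged
```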

scope

  • SHIP-SCOPE-PROHIBITED-TOOL-PRESENT high

    Tool appears to overlap with a manifest prohibited action.

    Why: Prohibited actions should not be contradicted by attached tool capabilities.

  • SHIP-SCOPE-TOOL-OUTSIDE-PURPOSE high

    Write-capable tool contradicts a read-only declared purpose.

    Why: Declared purpose should constrain the attached tool surface.

security

  • SHIP-DOC-INJECTION-RISK medium

    Tool description contains instruction-override-like language.

    Why: Tool metadata can be placed into model context and should not contain prompt-like directives.

  • SHIP-DOC-SECRET-IN-DESCRIPTION medium

    Tool description contains a secret-like value.

    Why: Credentials in tool metadata can leak into reports, prompts, or logs.

  • SHIP-N8N-SECRET-IN-WORKFLOW-PARAMETER high

    n8n workflow JSON contains a secret-like value.

    Why: Workflow JSON, pinned data, static data, and node notes can be committed or reported; hardcoded secret-like values should be moved into credentials or variables.

side_effects

  • SHIP-SIDEFX-IDEMPOTENCY-MISSING high

    Risky write tool lacks idempotency evidence; severity rises to critical when the tool is known to be retried.

    Why: Retries against non-idempotent writes can duplicate financial or external side effects.

skill_lint

  • LINT-BODY-001 medium

    Skill body lacks a step-by-step procedure.

    Why: Agents need explicit operational steps to apply a skill consistently.

  • LINT-BODY-003 medium

    Skill body lacks an output contract.

    Why: Without an output contract, agents and reviewers cannot tell what artifact, response shape, or completion criteria the skill requires.

  • LINT-BODY-004 medium

    Skill body lacks verification criteria.

    Why: Verification guidance lets an agent check whether the workflow was applied correctly before reporting completion.

  • LINT-DESC-001 high

    Skill description is too vague to route reliably.

    Why: The skill description is the primary routing surface agents inspect before loading the full skill body.

  • LINT-DESC-003 medium

    Skill description is overbroad and may false-trigger.

    Why: Broad skill descriptions can cause agents to load a skill for tasks outside its intended behavior boundary.

  • LINT-SCRIPT-001 medium

    Skill script lacks documented --help usage.

    Why: Agent-invoked scripts need discoverable usage text so agents can run them non-interactively and safely.

  • LINT-SCRIPT-004 medium

    Stateful skill script lacks dry-run support.

    Why: Scripts that mutate files or external state should expose a preview path before agents perform the stateful action.

  • LINT-SPEC-002 high

    Skill has invalid YAML frontmatter.

    Why: Agents use SKILL.md frontmatter for discovery; malformed frontmatter makes the routing surface ambiguous or unreadable.

  • LINT-SPEC-003 high

    Skill is missing required name frontmatter.

    Why: A stable skill name is required for discovery and unambiguous review.

  • LINT-SPEC-004 high

    Skill is missing required description frontmatter.

    Why: A clear description tells agents when to select the skill before loading the full body.
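The three frontmatter checks above can be illustrated with a deliberately small reader for SKILL.md files. This is a sketch that only understands flat `key: value` lines between `---` delimiters; a real linter would presumably use a full YAML parser, and the function names are hypothetical.

```python
from typing import Optional


def read_frontmatter(text: str) -> Optional[dict[str, str]]:
    """Return flat frontmatter fields, or None when the block is malformed."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return None  # the closing delimiter was never found


def missing_required_fields(text: str) -> list[str]:
    """List which of the required frontmatter fields are absent or empty."""
    fields = read_frontmatter(text)
    if fields is None:
        return ["frontmatter"]
    return [key for key in ("name", "description") if not fields.get(key)]
```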

skill_security

  • SEC-FLOW-004 medium

    Skill lacks data and instruction separation guidance.

    Why: Skills that combine untrusted content with outbound channels or secret access need explicit guidance that untrusted content is data, not agent instructions.

  • SEC-MISMATCH-001 high

    Skill declares read-only behavior but bundled scripts mutate state.

    Why: Reviewers and agents rely on the skill description to understand the behavior boundary before loading scripts and references.

  • SEC-PI-001 critical

    Skill artifact contains instruction override language.

    Why: Skill and instruction artifacts are loaded into agent context and must not tell agents to ignore higher-priority instructions.

  • SEC-PI-003 high

    Skill artifact tells the agent to hide behavior.

    Why: Instructions to conceal behavior from users, reviewers, or logs are unsafe in agent behavior surfaces.

  • SEC-PROV-001 high

    Third-party skill lacks provenance metadata.

    Why: Third-party skills are supply-chain artifacts and need source, owner, version, and review metadata.

  • SEC-REMOTE-001 medium

    Skill fetches remote instruction content.

    Why: Mutable remote prompt or instruction content can change skill behavior after review.

  • SEC-REMOTE-002 critical

    Remote content is fetched and executed.

    Why: Runtime remote-code execution lets mutable external content change the behavior of a skill after review.

  • SEC-SCRIPT-001 critical

    Remote content is piped to a shell or interpreter.

    Why: Piping remote content to an interpreter executes mutable external code in the agent environment.

  • SEC-SCRIPT-002 critical

    Destructive or stateful command lacks guardrails.

    Why: Skills that drive destructive commands need dry-run, confirmation, or path validation before an agent can invoke them safely.

  • SEC-SECRET-001 critical

    Skill artifact contains a hardcoded secret-like value.

    Why: Secrets in skill artifacts can leak into prompts, reports, logs, or agent execution environments.

  • SEC-SECRET-003 high

    Skill artifact instructs credential or secret-file access.

    Why: Skills should not direct agents to inspect broad credential files or environment secrets unless the workflow explicitly scopes and protects them.

  • SEC-TOOL-001 high

    Skill pre-approves shell or bash without justification.

    Why: Shell preapproval removes an important confirmation step before an agent runs commands from a skill.
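As an illustration of the pipe-to-interpreter pattern named above, a line-based regular expression can catch the common `curl ... | bash` shape. The pattern and the list of interpreters are assumptions for this sketch; a production scanner would need to handle more download tools, line continuations, and process substitution.

```python
import re

# Hypothetical pattern: a download command whose output is piped into an interpreter.
PIPE_TO_INTERPRETER = re.compile(
    r"\b(curl|wget)\b[^|\n]*\|\s*(sudo\s+)?(sh|bash|zsh|python3?|node)\b"
)


def find_remote_pipes(script: str) -> list[str]:
    """Return the script lines that pipe downloaded content into an interpreter."""
    return [
        line.strip()
        for line in script.splitlines()
        if PIPE_TO_INTERPRETER.search(line)
    ]
```

Downloads written to a file and pipes between local commands are left alone, since neither matches the download-then-interpret shape.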

verify

  • SHIP-VERIFY-AGENT-INSTRUCTIONS-WEAKENED medium

    The PR edits agent-instruction trust roots (AGENTS.md, CLAUDE.md, .cursor/rules, skills, etc.) and Shipgate cannot statically prove the instructions were not weakened.

    Why: Prompts are not controls (Principle 3); a static tool cannot judge whether instruction text was weakened, so any change is routed to human review rather than silently trusted.

  • SHIP-VERIFY-BASELINE-OR-WAIVER-EXPANDED high

    The PR broadens what the gate forgives — a new suppression in checks.ignore, a widened waiver scope, or a larger accepted-debt baseline — versus the base report.

    Why: Expanding suppressions, waivers, or the baseline is a reward-hacking move that hides findings instead of fixing them. A base-vs-head superset comparison of the effective-policy snapshot makes the expansion release-visible.

  • SHIP-VERIFY-CAPABILITY-SCOPE-BROADENED critical

    The PR removes or broadens a dynamically-loaded agent toolkit's explicit least-privilege configuration bound (e.g. a stripe_agent_toolkit constructor allowlist) versus the base report, silently expanding the toolkit surface.

    Why: Toolkits that load tools through a runtime factory (*toolkit.get_tools()) hide the individual tools from static enumeration, so the only statically-provable signal is the constructor's permission allowlist. Removing it mounts the full toolkit (refund / cancel / dispute on a customer-support agent); a coding agent must not self-approve that capability broadening. Without this check the opaque factory degrades the verdict to insufficient_evidence instead of blocked.

  • SHIP-VERIFY-CI-GATE-REMOVED critical

    The PR deletes the Shipgate CI workflow from an opted-in repo, which would stop the release gate from running.

    Why: Removing CI enforcement is the bluntest gate-bypass. Detecting a changed-and-now-absent workflow file makes the removal a release signal even without a base scan.

  • SHIP-VERIFY-POLICY-WEAKENED high

    The PR's effective release policy is weaker than the base report's (CI mode downgraded, fail-on severity set loosened, or a check's severity override lowered across a tier).

    Why: Comparing the normalized base-vs-head policy snapshot catches a coding agent that makes CI pass by weakening the gate rather than fixing the finding. When no base snapshot is available but a policy trust root was touched, the check fails safe to a review-required finding instead of silently passing.

  • SHIP-VERIFY-TRIGGER-CATALOG-DRIFT medium

    The PR changes the trigger catalog (docs/triggers.json or an .agents-shipgate trigger config) that decides when Shipgate runs.

    Why: Editing the trigger catalog can carve out paths so the gate stops firing on the very changes that matter — a gate-evasion one level up from suppressing findings.

  • SHIP-VERIFY-TRUST-ROOT-TOUCHED medium

    A PR changed a release trust-root file (manifest, shipgate state, policy, prompts, the Shipgate CI gate, agent instructions, or a tool-surface declaration).

    Why: Trust-root files define the release gate. A coding agent told to make CI pass can weaken the gate instead of fixing the underlying readiness issue (reward hacking); touching a trust root must therefore require human review.
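The base-versus-head comparisons described in this category come down to asking whether head forgives more than base did. The sketch below assumes two simplified snapshot pieces, a list of ignored check IDs and a mapping of severity overrides; the real effective-policy snapshot format is not documented here, and the function names are hypothetical. The severity ordering follows the labels used throughout this catalog.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def added_suppressions(base_ignored: list[str], head_ignored: list[str]) -> set[str]:
    """Return check IDs suppressed in head that were not suppressed in base."""
    return set(head_ignored) - set(base_ignored)


def lowered_overrides(base: dict[str, str], head: dict[str, str]) -> list[str]:
    """Return check IDs whose severity override is lower in head than in base.

    Overrides that exist only in head are ignored in this sketch, because
    judging them would require the catalog's default severities.
    """
    lowered = []
    for check_id, head_severity in head.items():
        base_severity = base.get(check_id)
        if base_severity is None:
            continue
        if SEVERITY_RANK[head_severity] < SEVERITY_RANK[base_severity]:
            lowered.append(check_id)
    return sorted(lowered)
```

A non-empty result from either function would be the signal to route the PR to human review rather than letting it pass on its own changes.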