Check catalog

Every static check behind the merge verdict.

132 static checks across 24 categories — use this as your AI agent release-readiness checklist. Every category corresponds to a class of release risk for tool-using agents (MCP, OpenAPI, OpenAI Agents SDK, Anthropic, LangChain, CrewAI, Google ADK, Codex plugins, n8n). Vendored from docs/checks.json and refreshed on each agents-shipgate release.

Use this catalog as your...

MCP security checklist — review wildcard sources, missing approval policies, idempotency gaps, and broad scopes before deploying an MCP server.
AI agent release checklist — match every PR's tool-surface change against the categories below before approving merge.
Framework-agnostic tool review — the same checks apply to OpenAI Agents SDK, Anthropic Messages API, LangChain / LangGraph, CrewAI, Google ADK, Codex plugins, and n8n workflows.

All checks run statically: no model invocation, no MCP connection, no verifier network calls, no verifier telemetry by default. Run `agents-shipgate verify --base origin/main --head HEAD --format json` in CI; see the quickstart.
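
As a rough illustration, a CI step could wrap the verify command in a small Python helper. The flags are the ones shown in the command above; the helper names `verify_argv` and `run_verify` are hypothetical, and the assumption that the JSON report is written to stdout is not confirmed by this page. The report's fields are not described here, so the sketch returns the parsed object without interpreting it.

```python
import json
import subprocess


def verify_argv(base="origin/main", head="HEAD"):
    """Build the argv for the verify command quoted above."""
    return [
        "agents-shipgate", "verify",
        "--base", base,
        "--head", head,
        "--format", "json",
    ]


def run_verify(base="origin/main", head="HEAD"):
    """Run the verifier and return (exit_code, report_or_None).

    Assumption: the JSON report is printed to stdout. If stdout is not
    valid JSON, the report is returned as None.
    """
    proc = subprocess.run(
        verify_argv(base, head),
        capture_output=True,
        text=True,
        check=False,  # leave exit-code handling to the caller
    )
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        report = None
    return proc.returncode, report
```

Keeping the argv builder separate from the subprocess call makes the command easy to log or unit-test without the CLI installed.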

action_surface

SHIP-ACTION-APPROVAL-REMOVED critical

Action approval policy was removed.

Why: Removing approval weakens the release boundary for an existing action.
SHIP-ACTION-CONTROL-DOWNGRADE high

Action declaration weakens an inherited approval or safeguard control.

Why: Manifest-wide approval and safeguard controls are governance requirements; per-action metadata should not silently weaken them.
SHIP-ACTION-DESTRUCTIVE-ROLLBACK-MISSING critical

New destructive action lacks approval or rollback controls.

Why: Destructive actions need explicit approval and rollback evidence before release.
SHIP-ACTION-EFFECT-DOWNGRADE-DECLARED high

Action declaration weakens the inferred effect.

Why: Per-action metadata should not be able to declare away a higher-risk operation inferred from the tool surface.
SHIP-ACTION-EFFECT-ESCALATED critical

Action effect escalated compared with the base surface.

Why: Effect escalation changes what the agent can do in the real world and needs explicit review.
SHIP-ACTION-EXTERNAL-COMMUNICATION-AUDIT-MISSING high

New external communication action lacks audit evidence.

Why: External communication changes agent blast radius and should be auditable.
SHIP-ACTION-FINANCIAL-WRITE-CONTROL-MISSING critical

New financial write action lacks required controls.

Why: Financial write actions need approval, audit, and idempotency evidence before release.
SHIP-ACTION-POLICY-VIOLATION high

An action-surface policy requirement is not satisfied.

Why: Action Surface Diff policies are the reviewer-facing release boundary for external action capability.
SHIP-ACTION-SAFEGUARD-REMOVED high

Action safeguard was removed.

Why: Removing audit, idempotency, rollback, or dry-run safeguards expands blast radius.
SHIP-ACTION-UNDECLARED high

A loaded tool lacks explicit action-surface metadata.

Why: Action Surface Diff depends on reviewer-visible action metadata for release decisions.
SHIP-ACTION-WILDCARD-SCOPE critical

Action surface includes a wildcard or admin-like scope.

Why: Wildcard scopes make action blast radius too broad for deterministic release review.
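
The idea behind SHIP-ACTION-SAFEGUARD-REMOVED can be pictured as a per-action set difference between the base and head surfaces. The sketch below is illustrative only and is not Shipgate's implementation: the function name and the input shape (action name mapped to a list of safeguard names) are assumptions, while the safeguard names come from the check's rationale.

```python
def removed_safeguards(base, head):
    """Return {action: sorted safeguards present in base but missing in head}.

    `base` and `head` map action names to iterables of safeguard names
    (hypothetical shape). Actions missing from head were deleted rather
    than weakened, so they are skipped; actions that exist only in head
    are never visited because the loop walks the base surface.
    """
    removed = {}
    for action, base_guards in base.items():
        if action not in head:
            continue
        lost = set(base_guards) - set(head[action])
        if lost:
            removed[action] = sorted(lost)
    return removed


base = {"refund_payment": ["audit", "idempotency", "rollback"], "send_email": ["audit"]}
head = {"refund_payment": ["audit"], "send_email": ["audit"], "new_tool": []}
print(removed_safeguards(base, head))
```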

adk

SHIP-ADK-DYNAMIC-TOOLSET-NOT-ENUMERABLE high

Google ADK toolset cannot be statically enumerated.

Why: Release review needs an explicit tool inventory; ADK MCP/OpenAPI toolsets may resolve tools dynamically at runtime.
SHIP-ADK-EVAL-COVERAGE-MISSING medium

Google ADK eval coverage is not declared.

Why: ADK releases should include response and tool-trajectory eval evidence before promotion.
SHIP-ADK-FUNCTION-TOOL-METADATA-MISSING medium

Google ADK function tool lacks static metadata.

Why: Static review depends on descriptions and parameter schemas because user ADK code is not imported.
SHIP-ADK-GUARDRAIL-EVIDENCE-MISSING high

High-risk Google ADK tools lack static guardrail evidence.

Why: Callbacks and plugins are the static ADK surface where release reviewers can see guardrail intent.
SHIP-ADK-LONGRUNNING-CONTRACT-MISSING high

Google ADK long-running tool lacks an operation contract.

Why: Long-running tools need explicit status and operation-id semantics for safe continuation.
SHIP-ADK-MCP-TOOLSET-UNFILTERED high

Google ADK McpToolset lacks a static tool filter.

Why: Unfiltered MCP toolsets can expose more tools than reviewers expect.

api

SHIP-API-FUNCTION-SCHEMA-STRICTNESS high

OpenAI API function schema is not strict enough for reliable tool calls.

Why: Strict schemas reduce ambiguous tool arguments and downstream side-effect risk.
SHIP-API-OPERATIONAL-READINESS medium

Deprecated compatibility alias for the v0.3 OpenAI API operational readiness bundle.

Why: v0.4 emits atomic OpenAI API readiness check IDs, but this ID remains available for existing suppressions, severity overrides, baselines, SARIF consumers, and explain/list-checks workflows during the deprecation window.
SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH high

Prompt scope contradicts enabled OpenAI API tools.

Why: Prompt instructions should match the actual write/high-risk tool surface.
SHIP-API-RETRY-POLICY-MISSING medium

OpenAI API high-risk flow lacks retry policy metadata.

Why: Retries need explicit policy metadata so reviewers can reason about duplicate side effects.
SHIP-API-RETRY-WITHOUT-IDEMPOTENCY high

OpenAI API write tool may be retried without idempotency evidence.

Why: Retries against non-idempotent writes can duplicate financial, destructive, or external side effects.
SHIP-API-STRUCTURED-OUTPUT-READINESS medium

OpenAI API structured output schema is missing or under-specified.

Why: Downstream release decisions need explicit, structured success/refusal/review modeling.
SHIP-API-TEST-CASES-MISSING medium

OpenAI API high-risk flow lacks test case metadata.

Why: High-risk tool-call flows should have release evidence before promotion.
SHIP-API-TIMEOUT-MISSING medium

OpenAI API high-risk flow lacks timeout metadata.

Why: Timeouts define failure behavior and reduce ambiguous tool-call continuation.
SHIP-API-TOOL-OUTPUT-SCHEMA-MISSING medium

OpenAI API high-risk tool lacks success/failure output modeling.

Why: Tool output schemas help release reviewers reason about downstream failure handling.
SHIP-API-TRACE-APPROVAL-MISSING medium

OpenAI API trace sample shows a policy-controlled tool without approval.

Why: Trace samples should demonstrate approval behavior for tools that require approval.
SHIP-API-TRACE-CONFIRMATION-MISSING medium

OpenAI API trace sample shows a policy-controlled tool without confirmation.

Why: Trace samples should demonstrate explicit confirmation for tools that require confirmation.
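
For SHIP-API-FUNCTION-SCHEMA-STRICTNESS, a reviewer can approximate the idea with a small schema walk. The sketch below tests three conditions that OpenAI documents for strict function calling: `strict` is true, every object sets `additionalProperties` to false, and every property is listed in `required`. Whether Shipgate checks exactly these conditions is not stated on this page; the function names and the flat tool dictionary shape are assumptions.

```python
def _walk(schema, path, problems):
    """Collect strictness problems for one schema node and its children."""
    if not isinstance(schema, dict):
        return
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties is not false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not required: {', '.join(missing)}")
        for name, sub in props.items():
            _walk(sub, f"{path}.{name}", problems)
    elif schema.get("type") == "array":
        _walk(schema.get("items"), f"{path}[]", problems)


def strictness_problems(tool):
    """Return a list of human-readable strictness problems for a tool dict."""
    problems = []
    if tool.get("strict") is not True:
        problems.append("strict is not true")
    _walk(tool.get("parameters", {}), "parameters", problems)
    return problems


loose_tool = {
    "name": "create_refund",  # hypothetical tool
    "parameters": {
        "type": "object",
        "properties": {"amount": {"type": "integer"}, "note": {"type": "string"}},
        "required": ["amount"],
    },
}
print(strictness_problems(loose_tool))
```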

auth

SHIP-AUTH-MANIFEST-BROAD-SCOPE high

Manifest declares broad permission scopes.

Why: Broad manifest scopes weaken least-privilege review.
SHIP-AUTH-MISSING-SCOPE high

Scope-requiring tool lacks declared auth scopes.

Why: Reviewers cannot assess least privilege without scope metadata.
SHIP-AUTH-SCOPE-COVERAGE-MISSING high

Tool-required scopes are not covered by manifest permissions.scopes.

Why: The manifest should describe the actual permissions needed by the release.
SHIP-AUTH-TOOL-BROAD-SCOPE high

Tool declares broad auth scopes.

Why: Tool-level broad scopes may grant more power than the operation needs.
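
SHIP-AUTH-SCOPE-COVERAGE-MISSING boils down to a coverage question: does the manifest's `permissions.scopes` list include every scope the loaded tools require? A minimal sketch of that comparison follows. The function name, the input shapes, and the example scope strings are hypothetical, and the real check may normalize scopes differently.

```python
def uncovered_scopes(manifest_scopes, tools):
    """Return {tool: sorted scopes the tool requires but the manifest omits}.

    `manifest_scopes` stands in for the manifest's permissions.scopes list;
    `tools` maps tool names to their required scopes (hypothetical shape).
    """
    granted = set(manifest_scopes)
    gaps = {}
    for name, required in tools.items():
        missing = sorted(set(required) - granted)
        if missing:
            gaps[name] = missing
    return gaps


manifest_scopes = ["orders:read", "refunds:write"]
tools = {"get_order": ["orders:read"], "cancel_order": ["orders:write"]}
print(uncovered_scopes(manifest_scopes, tools))
```

Swapping the operands of the set difference gives the converse question, which scopes the manifest declares that no loaded tool uses.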

baseline

SHIP-BASELINE-ENTRY-EXPIRED high

Baseline entry's review window has expired.

Why: Reviewer-set `provenance.expires` is the renewable consent for accepting technical debt. Past that date the entry needs a fresh review, not a silent extension.
SHIP-BASELINE-ENTRY-STALE low

Baseline entry no longer corresponds to an active finding or check ID.

Why: Stale baseline entries hide intent — reviewers cannot tell whether the accepted debt was resolved or whether the check was renamed. Pruning keeps the baseline aligned with reality.
SHIP-BASELINE-INTEGRITY-MISMATCH critical

Baseline file integrity check failed.

Why: The baseline JSON has been edited outside `agents-shipgate baseline save`, lacks an audit log row, has a malformed audit log row, or references a run_id not present in the audit log. A release gate that accepts silent baseline edits cannot claim to govern technical debt.
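
The expiry rule in SHIP-BASELINE-ENTRY-EXPIRED can be sketched as a date comparison against `provenance.expires`. The entry shape below (a `check_id` key and an ISO-8601 date string) is an assumption, not the documented baseline format, and the sketch treats an entry as still valid on its expiry date.

```python
from datetime import date


def expired_entries(entries, today):
    """Return check IDs of entries whose provenance.expires is before `today`.

    Entries without an expiry are skipped; this sketch does not decide
    whether a missing expiry should itself be a finding.
    """
    expired = []
    for entry in entries:
        expires = entry.get("provenance", {}).get("expires")
        if expires is None:
            continue
        if date.fromisoformat(expires) < today:
            expired.append(entry["check_id"])
    return expired


entries = [
    {"check_id": "SHIP-SCHEMA-MISSING-BOUNDS", "provenance": {"expires": "2024-01-31"}},
    {"check_id": "SHIP-DOC-MISSING-DESCRIPTION", "provenance": {"expires": "2024-03-01"}},
]
print(expired_entries(entries, date(2024, 2, 15)))
```

Passing `today` explicitly keeps the function deterministic, which matters for a gate that should give the same answer on a re-run of the same commit.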

codex_boundary

SHIP-CODEX-BOUNDARY-AGENTS-SHIPGATE-REQUIREMENT-REMOVED medium

AGENTS.md removed a Shipgate requirement.

Why: Agent instructions are not controls; removing gate instructions requires human review because semantic weakening cannot be proven safely.
SHIP-CODEX-BOUNDARY-APP-AUTO-APPROVE high

Codex app connector tool approval changed to auto-approve.

Why: Connector-backed app tools are externally mediated and need human review before local auto-approval.
SHIP-CODEX-BOUNDARY-CI-GATE-REMOVED critical

Shipgate GitHub Action no longer invokes the gate.

Why: Removing the local or CI gate is a direct bypass.
SHIP-CODEX-BOUNDARY-CONFIG-PARSE-FAILED medium

Codex project configuration could not be parsed.

Why: A malformed Codex config prevents deterministic inspection of the local execution boundary, so the local agent check fails closed to review.
SHIP-CODEX-BOUNDARY-DANGER-FULL-ACCESS critical

Codex full-access sandbox is selected.

Why: danger-full-access removes local sandbox restrictions and must not be silently approved by an agent.
SHIP-CODEX-BOUNDARY-HOOK-COMMAND-CHANGED high

A Codex executable hook changed.

Why: Hooks execute in the agent lifecycle and can alter local behavior before or after tool calls.
SHIP-CODEX-BOUNDARY-MCP-AUTO-APPROVE-UNKNOWN high

Codex auto-approves an MCP server whose tool surface is not statically enumerable.

Why: Without an explicit tool allowlist or per-tool metadata, Shipgate cannot prove auto-approved MCP calls are read-only.
SHIP-CODEX-BOUNDARY-MCP-AUTO-APPROVE-WRITE critical

Codex auto-approves a write or destructive MCP/app tool.

Why: Auto-approval of write-capable external tools lets the agent take side-effecting actions without a review boundary.
SHIP-CODEX-BOUNDARY-NETWORK-EXPANDED high

Codex network access expanded.

Why: Enabling workspace-write network access or full network mode changes the local execution boundary.
SHIP-CODEX-BOUNDARY-NETWORK-WILDCARD high

Codex network permissions allow a wildcard domain.

Why: Wildcard network access expands the local agent's reachable resources beyond a reviewable allowlist.
SHIP-CODEX-BOUNDARY-POLICY-WEAKENED critical

Codex boundary policy was weakened.

Why: The policy that judges a Codex boundary change must not be weakened by the same change under review.
SHIP-CODEX-BOUNDARY-SKILL-COMMAND-CHANGED medium

A Codex skill gained command-bearing instructions.

Why: Skills can steer agents into shell commands or helper scripts, so command-bearing changes need review before local automation.
SHIP-CODEX-BOUNDARY-UNKNOWN-PERMISSION-KEY medium

Codex permissions contain an unknown high-risk key.

Why: Permission-profile schema drift can change the sandbox boundary; unknown keys under permissions fail closed while unrelated top-level keys remain advisory.

codex_plugin

SHIP-CODEX-PLUGIN-APP-SURFACE-NOT-ENUMERABLE medium

Codex plugin app connector surface is not statically enumerable.

Why: Connector-backed app capabilities are externally mediated and cannot be proven from local plugin metadata alone.
SHIP-CODEX-PLUGIN-COMPONENT-PATH-MISSING high

Codex plugin component path cannot be loaded.

Why: Release review cannot inspect declared skills, MCP servers, apps, or hooks when component paths are missing or escape the package.
SHIP-CODEX-PLUGIN-MARKETPLACE-POLICY-MISSING medium

Codex plugin marketplace entry lacks policy metadata.

Why: Marketplace installation and authentication policy are part of the release surface coding agents need to review.
SHIP-CODEX-PLUGIN-MCP-SERVER-NOT-ENUMERABLE high

Codex plugin MCP server is declared but not statically enumerable.

Why: Agents Shipgate does not execute MCP commands, so reviewer-visible tool metadata requires an explicit local inventory.
SHIP-CODEX-PLUGIN-METADATA-MISSING medium

Codex plugin package metadata is incomplete or ambiguous.

Why: Plugin identity needs to be stable before publication or downstream agent adoption.
SHIP-CODEX-PLUGIN-SKILL-METADATA-MISSING medium

Codex plugin skill metadata is missing or duplicated.

Why: Skill frontmatter is the static routing surface agents use to decide whether a skill applies.

crewai

SHIP-CREWAI-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE high

CrewAI tool surface cannot be statically enumerated.

Why: CrewAI exposes ad hoc agent-bound tool lists rather than a consistent toolset abstraction, so the check names the broader tool surface instead of ADK's toolset.
SHIP-CREWAI-FUNCTION-TOOL-METADATA-MISSING medium

CrewAI function tool lacks static metadata.

Why: Static review depends on descriptions and parameter schemas because user CrewAI code is not imported.

documentation

SHIP-DOC-MISSING-DESCRIPTION medium

Tool description is missing or too short.

Why: Poor tool descriptions increase wrong-tool and reviewer misunderstanding risk.

evidence

SHIP-EVIDENCE-APPROVAL-TRACE-MISSING high

Local HITL approval trace evidence is missing or incomplete for an approval-required tool.

Why: Limited automation review depends on reviewer-visible local evidence that approval-controlled actions were approved before the tool call; absence of local evidence does not prove the runtime control is absent.
SHIP-EVIDENCE-HIGH-RISK-EXCLUSION-MISSING high

Local high-risk auto-approval exclusion evidence is missing or incomplete.

Why: High-risk tools that already declare approval policy need separate local evidence that they are excluded from auto-approval review posture; absence of local evidence does not prove the runtime control is absent.
SHIP-EVIDENCE-HITL-PROMOTION-CRITERIA-MISSING high

Local HITL promotion criteria evidence is missing or incomplete.

Why: A limited auto-approval review posture needs local criteria evidence; Shipgate structures the missing evidence for reviewers but does not certify runtime enforcement.
SHIP-EVIDENCE-OVERRIDE-REASON-MISSING high

Local HITL override reason evidence is missing or incomplete.

Why: Override, bypass, and auto-approval events need reviewer-visible local reasons for governance review; absence of local evidence does not prove the runtime control is absent.

host_boundary

SHIP-HOST-BOUNDARY-CONFIG-PARSE-FAILED medium

A coding-agent host configuration file could not be parsed.

Why: A malformed host config prevents deterministic inspection of the agent's host capability boundary, so the diff-aware check fails closed to review.
SHIP-HOST-BOUNDARY-HOOK-CHANGED high

Claude Code hooks changed.

Why: Hooks execute commands inside the agent lifecycle and can alter host behavior before or after every tool call.
SHIP-HOST-BOUNDARY-MCP-SERVER-ADDED high

A new MCP server was declared for the coding-agent host.

Why: A new MCP server adds an entire external tool surface to the agent host; it must be human-reviewed before the agent can use it.
SHIP-HOST-BOUNDARY-MCP-SERVER-CHANGED high

An existing MCP server declaration changed its command, URL, args, or env keys.

Why: Changing what an already-trusted MCP server executes or connects to silently re-shapes the host tool surface behind an approved name.
SHIP-HOST-BOUNDARY-PERMISSION-ALLOW-EXPANDED high

The Claude Code permission allowlist expanded.

Why: Every new allow rule widens what the host executes without a prompt; expansion needs a human in the loop.
SHIP-HOST-BOUNDARY-PERMISSION-DENY-REMOVED high

A Claude Code permission deny rule was removed.

Why: Deny rules are the host's explicit guardrails; removing one silently re-enables a previously forbidden capability.
SHIP-HOST-BOUNDARY-PERMISSION-WILDCARD-ALLOW critical

A Claude Code allow rule grants a wildcard tool surface.

Why: Wildcard allow rules remove the host's per-command approval boundary entirely; an agent must not self-grant unrestricted tool access.
SHIP-HOST-BOUNDARY-PULL-REQUEST-TARGET-ADDED critical

A GitHub workflow gained a pull_request_target trigger.

Why: pull_request_target runs workflow code with secrets against fork PRs; adding it is a classic privilege-escalation surface.
SHIP-HOST-BOUNDARY-WORKFLOW-PERMISSIONS-EXPANDED high

GitHub workflow permissions expanded.

Why: Moving a scope from read to write (or granting a new write scope) widens what CI can do with the repository token.
SHIP-HOST-BOUNDARY-WORKFLOW-WRITE-ALL critical

A GitHub workflow grants write-all permissions.

Why: write-all hands the workflow token every write scope at once; an agent must not self-grant blanket CI write access.
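
The comparison behind SHIP-HOST-BOUNDARY-WORKFLOW-PERMISSIONS-EXPANDED can be illustrated with the map form of a GitHub workflow `permissions` block, where each scope is set to `read`, `write`, or `none`. The sketch below flags a scope that becomes `write` in head when it was not `write` in base. It assumes both sides have already been parsed into dictionaries and that a scope omitted from an explicit block counts as `none`; shorthand values such as `write-all` are out of scope here. The function name is hypothetical.

```python
def expanded_write_scopes(base, head):
    """Return sorted (scope, before, after) tuples for newly writable scopes.

    `base` and `head` are parsed `permissions` maps, e.g. {"contents": "read"}.
    A scope missing from `base` is treated as "none".
    """
    expanded = []
    for scope, level in head.items():
        before = base.get(scope, "none")
        if level == "write" and before != "write":
            expanded.append((scope, before, level))
    return sorted(expanded)


base_perms = {"contents": "read"}
head_perms = {"contents": "write", "pull-requests": "write", "issues": "read"}
print(expanded_write_scopes(base_perms, head_perms))
```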

inventory

SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE high

Production target includes low-confidence tool extraction.

Why: Production promotion should not depend primarily on best-effort SDK inference.
SHIP-INVENTORY-NOT-ENUMERABLE high

Tool surface cannot be enumerated from declared inputs.

Why: A release gate must fail closed when it cannot see the agent's tools.
SHIP-INVENTORY-TOOL-SURFACE-TOO-LARGE medium

Tool surface exceeds the MVP review threshold.

Why: Large tool surfaces are harder to reason about during promotion.
SHIP-INVENTORY-WILDCARD-TOOLS high

Wildcard or all-tools exposure is declared.

Why: Wildcard tools make review and least-privilege reasoning impossible.

langchain

SHIP-LANGCHAIN-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE high

LangChain tool surface cannot be statically enumerated.

Why: LangChain and LangGraph expose ad hoc tool lists and agent-bound tools rather than a consistent toolset abstraction, so the check names the broader tool surface instead of ADK's toolset.
SHIP-LANGCHAIN-FUNCTION-TOOL-METADATA-MISSING medium

LangChain function tool lacks static metadata.

Why: Static review depends on descriptions and parameter schemas because user LangChain code is not imported.

manifest

SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING high

Production high-risk tool has no declared owner.

Why: High-risk production tools need an accountable owning team for review and remediation.
SHIP-MANIFEST-STALE-POLICY medium

A policy references a missing tool.

Why: Approval, confirmation, and idempotency policies should map to the actual release surface.
SHIP-MANIFEST-STALE-RISK-OVERRIDE medium

A risk override references a missing tool.

Why: Risk overrides should not outlive the tool they describe.
SHIP-MANIFEST-STALE-SUPPRESSION medium

A suppression references a missing check or tool.

Why: Stale suppressions hide intent and make release review harder to audit.
SHIP-MANIFEST-UNUSED-SCOPE medium

Manifest declares permission scopes unused by loaded tools.

Why: Unused permissions weaken least-privilege review and often indicate stale config.

mcp_permissions

SHIP-MCP-AUTO-APPROVE-SIDE-EFFECT critical

MCP side-effecting tool is auto-approved.

Why: Auto-approved side-effecting MCP tools can let an agent write, publish, destroy, operate production systems, or make financial changes without review.
SHIP-MCP-ENV-SECRET-PASSTHROUGH high

MCP server passes through secret environment variables.

Why: Secret-bearing environment pass-through expands the credential boundary available to a tool server.
SHIP-MCP-PERMISSION-EXPANDED high

MCP capability permissions expanded.

Why: MCP capability broadening changes the agent's static tool authority.
SHIP-MCP-READONLY-SERVER-ADDED low

Read-only local documentation MCP server added.

Why: Local read-only documentation servers are low risk but should remain visible in capability review.
SHIP-MCP-UNKNOWN-TOOL-SCHEMA high

MCP tool side effect cannot be proven from static schema.

Why: Incomplete or wildcard MCP metadata cannot prove the callable tool surface is read-only.

n8n

SHIP-N8N-AI-TOOL-METADATA-MISSING medium

n8n AI-exposed tool lacks static metadata.

Why: Static review depends on descriptions and parameter schemas because Shipgate does not execute n8n workflows.
SHIP-N8N-CREDENTIAL-EVIDENCE-MISSING high

n8n credential stubs are not declared.

Why: Credential type evidence lets reviewers assess high-risk integrations without exposing secret values.
SHIP-N8N-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE high

n8n tool surface cannot be statically enumerated.

Why: Release review needs an explicit local inventory when n8n workflow JSON uses runtime tool names, unresolved workflow references, wildcard MCP exposure, or community tool nodes without static metadata. This is high severity in every environment because static release evidence cannot prove the actual tool inventory.
SHIP-N8N-EVAL-COVERAGE-MISSING medium

n8n eval coverage is not declared.

Why: n8n AI workflow releases should include response and tool-trajectory eval evidence before promotion.
SHIP-N8N-MCP-CLIENT-TOOLSET-UNFILTERED high

n8n MCP Client Tool exposes an unfiltered toolset.

Why: All-tools and all-except MCP client selections can expose more tools than reviewers expect. The check is environment-sensitive because the selector is straightforward to narrow before production, while production-like use increases blast radius.

policy

SHIP-POLICY-APPROVAL-MISSING critical

High-risk tool lacks a declared approval policy.

Why: High-risk actions need explicit approval before promotion.
SHIP-POLICY-CONFIRMATION-MISSING high

Destructive/external/customer-communication tool lacks a confirmation policy.

Why: Destructive and external actions should require explicit confirmation.

schema

SHIP-SCHEMA-BROAD-FREE-TEXT high

Action-like tool accepts broad free-form input.

Why: Broad action/body/update fields increase blast radius for write tools.
SHIP-SCHEMA-FREEFORM-OUTPUT medium

Tool returns free-form string output.

Why: Free-form tool output may carry prompt injection into later model context.
SHIP-SCHEMA-MISSING-BOUNDS high

Risky numeric parameter lacks a maximum bound.

Why: Unbounded counts or financial amounts weaken blast-radius control.
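
To make SHIP-SCHEMA-MISSING-BOUNDS concrete, the sketch below walks a JSON Schema and lists numeric parameters that declare neither `maximum` nor `exclusiveMaximum`. It flags every numeric parameter; how Shipgate decides which parameters count as risky is not described on this page, and the function names and example schema are hypothetical.

```python
NUMERIC_TYPES = {"integer", "number"}


def _types(schema):
    """Normalize a JSON Schema `type` (string or list) to a set of names."""
    declared = schema.get("type")
    if isinstance(declared, str):
        return {declared}
    return set(declared or [])


def unbounded_numeric_params(schema, path=""):
    """Return dotted paths of numeric properties with no upper bound."""
    found = []
    for name, sub in schema.get("properties", {}).items():
        here = f"{path}.{name}" if path else name
        types = _types(sub)
        if types & NUMERIC_TYPES:
            if "maximum" not in sub and "exclusiveMaximum" not in sub:
                found.append(here)
        elif "object" in types:
            found.extend(unbounded_numeric_params(sub, here))
    return found


schema = {
    "type": "object",
    "properties": {
        "amount_cents": {"type": "integer", "minimum": 1},
        "limit": {"type": "integer", "maximum": 100},
        "shipping": {"type": "object", "properties": {"weight_kg": {"type": "number"}}},
    },
}
print(unbounded_numeric_params(schema))
```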

scope

SHIP-SCOPE-PROHIBITED-TOOL-PRESENT high

Tool appears to overlap with a manifest prohibited action.

Why: Prohibited actions should not be contradicted by attached tool capabilities.
SHIP-SCOPE-TOOL-OUTSIDE-PURPOSE high

Write-capable tool contradicts a read-only declared purpose.

Why: Declared purpose should constrain the attached tool surface.

security

SHIP-DOC-INJECTION-RISK medium

Tool description contains instruction-override-like language.

Why: Tool metadata can be placed into model context and should not contain prompt-like directives.
SHIP-DOC-SECRET-IN-DESCRIPTION medium

Tool description contains a secret-like value.

Why: Credentials in tool metadata can leak into reports, prompts, or logs.
SHIP-N8N-SECRET-IN-WORKFLOW-PARAMETER high

n8n workflow JSON contains a secret-like value.

Why: Workflow JSON, pinned data, static data, and node notes can be committed or reported; hardcoded secret-like values should be moved into credentials or variables.

side_effects

SHIP-SIDEFX-IDEMPOTENCY-MISSING high

Risky write tool lacks idempotency evidence; severity rises to critical when the tool is known to be retried.

Why: Retries against non-idempotent writes can duplicate financial or external side effects.

skill_lint

LINT-BODY-001 medium

Skill body lacks a step-by-step procedure.

Why: Agents need explicit operational steps to apply a skill consistently.
LINT-BODY-003 medium

Skill body lacks an output contract.

Why: Without an output contract, agents and reviewers cannot tell what artifact, response shape, or completion criteria the skill requires.
LINT-BODY-004 medium

Skill body lacks verification criteria.

Why: Verification guidance lets an agent check whether the workflow was applied correctly before reporting completion.
LINT-DESC-001 high

Skill description is too vague to route reliably.

Why: The skill description is the primary routing surface agents inspect before loading the full skill body.
LINT-DESC-003 medium

Skill description is overbroad and may false-trigger.

Why: Broad skill descriptions can cause agents to load a skill for tasks outside its intended behavior boundary.
LINT-SCRIPT-001 medium

Skill script lacks documented --help usage.

Why: Agent-invoked scripts need discoverable usage text so agents can run them non-interactively and safely.
LINT-SCRIPT-004 medium

Stateful skill script lacks dry-run support.

Why: Scripts that mutate files or external state should expose a preview path before agents perform the stateful action.
LINT-SPEC-002 high

Skill has invalid YAML frontmatter.

Why: Agents use SKILL.md frontmatter for discovery; malformed frontmatter makes the routing surface ambiguous or unreadable.
LINT-SPEC-003 high

Skill is missing required name frontmatter.

Why: A stable skill name is required for discovery and unambiguous review.
LINT-SPEC-004 high

Skill is missing required description frontmatter.

Why: A clear description tells agents when to select the skill before loading the full body.
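
LINT-SPEC-003 and LINT-SPEC-004 both concern required keys in SKILL.md frontmatter. The sketch below shows the general shape of such a check using a deliberately naive line parser instead of a real YAML library, so it only understands flat `key: value` lines. The function name is hypothetical, and this is not the linter's actual parsing logic.

```python
def missing_frontmatter_fields(skill_md, required=("name", "description")):
    """Return the required frontmatter keys that are absent or empty.

    Frontmatter is the block between a leading '---' line and the next
    '---' line. If either delimiter is missing, every required key is
    reported as missing.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(required)
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    else:
        return list(required)  # no closing delimiter was found
    return [name for name in required if name not in fields]


doc = "---\nname: release-notes\n---\n# Body\n"
print(missing_frontmatter_fields(doc))
```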

skill_security

SEC-FLOW-004 medium

Skill lacks data and instruction separation guidance.

Why: Skills that combine untrusted content with outbound channels or secret access need explicit guidance that untrusted content is data, not agent instructions.
SEC-MISMATCH-001 high

Skill declares read-only behavior but bundled scripts mutate state.

Why: Reviewers and agents rely on the skill description to understand the behavior boundary before loading scripts and references.
SEC-PI-001 critical

Skill artifact contains instruction override language.

Why: Skill and instruction artifacts are loaded into agent context and must not tell agents to ignore higher-priority instructions.
SEC-PI-003 high

Skill artifact tells the agent to hide behavior.

Why: Instructions to conceal behavior from users, reviewers, or logs are unsafe in agent behavior surfaces.
SEC-PROV-001 high

Third-party skill lacks provenance metadata.

Why: Third-party skills are supply-chain artifacts and need source, owner, version, and review metadata.
SEC-REMOTE-001 medium

Skill fetches remote instruction content.

Why: Mutable remote prompt or instruction content can change skill behavior after review.
SEC-REMOTE-002 critical

Remote content is fetched and executed.

Why: Runtime remote-code execution lets mutable external content change the behavior of a skill after review.
SEC-SCRIPT-001 critical

Remote content is piped to a shell or interpreter.

Why: Piping remote content to an interpreter executes mutable external code in the agent environment.
SEC-SCRIPT-002 critical

Destructive or stateful command lacks guardrails.

Why: Skills that drive destructive commands need dry-run, confirmation, or path validation before an agent can invoke them safely.
SEC-SECRET-001 critical

Skill artifact contains a hardcoded secret-like value.

Why: Secrets in skill artifacts can leak into prompts, reports, logs, or agent execution environments.
SEC-SECRET-003 high

Skill artifact instructs credential or secret-file access.

Why: Skills should not direct agents to inspect broad credential files or environment secrets unless the workflow explicitly scopes and protects them.
SEC-TOOL-001 high

Skill pre-approves shell or bash without justification.

Why: Shell preapproval removes an important confirmation step before an agent runs commands from a skill.
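
The pattern SEC-SCRIPT-001 describes, remote content piped straight into a shell or interpreter, can be approximated with a line-oriented regular expression. The sketch below is a heuristic illustration, not the scanner's actual rule: the regex, the list of interpreters, and the function name are assumptions, and it will miss multi-line pipelines and other download tools.

```python
import re

# curl or wget, followed on the same line by a pipe into a shell or Python.
PIPE_TO_INTERPRETER = re.compile(
    r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:sudo\s+)?(?:sh|bash|zsh|python3?)\b"
)


def piped_remote_lines(script):
    """Return 1-based line numbers where a download is piped to an interpreter."""
    return [
        lineno
        for lineno, line in enumerate(script.splitlines(), start=1)
        if PIPE_TO_INTERPRETER.search(line)
    ]


script = (
    "set -e\n"
    "curl -fsSL https://example.com/install.sh | bash\n"
    "curl -o tool.tgz https://example.com/tool.tgz\n"
    "wget -qO- https://example.com/x.py | python3\n"
)
print(piped_remote_lines(script))
```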

verify

SHIP-VERIFY-AGENT-INSTRUCTIONS-WEAKENED medium

The PR edits agent-instruction trust roots (AGENTS.md, CLAUDE.md, .cursor/rules, skills, etc.) and Shipgate cannot statically prove the instructions were not weakened.

Why: Prompts are not controls (Principle 3); a static tool cannot judge whether instruction text was weakened, so any change is routed to human review rather than silently trusted.
SHIP-VERIFY-BASELINE-OR-WAIVER-EXPANDED high

The PR broadens what the gate forgives — a new suppression in checks.ignore, a widened waiver scope, or a larger accepted-debt baseline — versus the base report.

Why: Expanding suppressions, waivers, or the baseline is a reward-hacking move that hides findings instead of fixing them. A base-vs-head superset comparison of the effective-policy snapshot makes the expansion release-visible.
SHIP-VERIFY-CAPABILITY-SCOPE-BROADENED critical

The PR removes or broadens a dynamically loaded agent toolkit's explicit least-privilege configuration bound (e.g. a stripe_agent_toolkit constructor allowlist) versus the base report, silently expanding the toolkit surface.

Why: Toolkits that load tools through a runtime factory (*toolkit.get_tools()) hide the individual tools from static enumeration, so the only statically-provable signal is the constructor's permission allowlist. Removing it mounts the full toolkit (refund / cancel / dispute on a customer-support agent); a coding agent must not self-approve that capability broadening. Without this check the opaque factory degrades the verdict to insufficient_evidence instead of blocked.
SHIP-VERIFY-CI-GATE-REMOVED critical

The PR deletes the Shipgate CI workflow from an opted-in repo, which would stop the release gate from running.

Why: Removing CI enforcement is the bluntest gate-bypass. Detecting a changed-and-now-absent workflow file makes the removal a release signal even without a base scan.
SHIP-VERIFY-POLICY-WEAKENED high

The PR's effective release policy is weaker than the base report's (CI mode downgraded, fail-on severity set loosened, or a check's severity override lowered across a tier).

Why: Comparing the normalized base-vs-head policy snapshot catches a coding agent that makes CI pass by weakening the gate rather than fixing the finding. When no base snapshot is available but a policy trust root was touched, the check fails safe to a review-required finding instead of silently passing.
SHIP-VERIFY-TRIGGER-CATALOG-DRIFT medium

The PR changes the trigger catalog (docs/triggers.json or an .agents-shipgate trigger config) that decides when Shipgate runs.

Why: Editing the trigger catalog can carve out paths so the gate stops firing on the very changes that matter, which is gate evasion one level up from suppressing findings.
SHIP-VERIFY-TRUST-ROOT-TOUCHED medium

A PR changed a release trust-root file (manifest, shipgate state, policy, prompts, the Shipgate CI gate, agent instructions, or a tool-surface declaration).

Why: Trust-root files define the release gate. A coding agent told to make CI pass can weaken the gate instead of fixing the underlying readiness issue (reward hacking); touching a trust root must therefore require human review.
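
The base-vs-head superset comparison mentioned under SHIP-VERIFY-BASELINE-OR-WAIVER-EXPANDED can be pictured as follows. The snapshot keys (`ignored_checks`, `waivers`, `baseline_entries`) and the function name are hypothetical stand-ins; the real effective-policy snapshot format is not shown on this page.

```python
def newly_forgiven(base_snapshot, head_snapshot):
    """Return {key: sorted items forgiven in head but not in base}.

    An empty result means head forgives nothing beyond what base already
    forgave, i.e. base is a superset of head for every tracked key.
    """
    added = {}
    for key in ("ignored_checks", "waivers", "baseline_entries"):
        extra = set(head_snapshot.get(key, ())) - set(base_snapshot.get(key, ()))
        if extra:
            added[key] = sorted(extra)
    return added


base_snapshot = {"ignored_checks": ["SHIP-DOC-MISSING-DESCRIPTION"], "waivers": []}
head_snapshot = {
    "ignored_checks": ["SHIP-DOC-MISSING-DESCRIPTION", "SHIP-POLICY-APPROVAL-MISSING"],
    "waivers": [],
}
print(newly_forgiven(base_snapshot, head_snapshot))
```

Removing a suppression in head produces no entry, so the comparison only reacts to changes that make the gate more forgiving.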
