Evidence Gate
Use this skill to insert a lightweight evidence gate into an existing workflow without replacing the workflow.
Its purpose is not to make the caller more cautious — capable agents are already cautious.
Its purpose is to make that caution structured, auditable, and actionable by answering a narrower question:
What evidence must exist before this conclusion or action is responsible enough to present, recommend, or execute?
Treat the caller's conclusion or action as tentative until the gate returns a verdict.
Keep the skill lightweight, selective, and non-blocking by default.
Scope
This skill gates the agent's own reasoning quality — not the user's intent.
It is not:
- content moderation or policy enforcement
- user intent classification (allow / refuse / clarify)
- a legal, compliance, or safety advisory tool
- a replacement for domain expertise
Core idea
Given a tentative claim or action, do three things:
1. Define the minimum evidence obligations for that claim/action.
2. Check what evidence already exists and what is still missing or conflicting.
3. Return a verdict and a safe next-step policy.
Do not fully own evidence collection.
Recommend missing evidence for the caller to gather using its existing tools.
Operating model
Use a single-pass gate instead of taking over the full workflow:
1. The caller reaches a tentative claim, diagnosis, recommendation, or action.
2. Generate the evidence obligations for that candidate.
3. Evaluate only the evidence currently available in the invocation.
4. Return a final verdict for this invocation:
   - whether the current evidence is sufficient
   - how the caller should downgrade if it is not
   - which next evidence checks would be most valuable
5. Exit.
Assume no durable skill state across calls.
Do not require a second gate pass unless the caller explicitly chooses to orchestrate one outside this skill.
When to use
Use this skill when one or more of the following are true:
- The caller is about to make a strong claim such as:
  - "the root cause is X"
  - "this is safe"
  - "this configuration should be changed"
  - "the correct action is Y"
- The caller is about to recommend or execute a high-impact step such as:
  - rollback
  - scale up/down
  - delete/disable/quarantine
  - approve/reject
  - change production configuration
- The current conclusion appears to rely on only one signal, one log line, one chart, or one tool result.
- Competing explanations have not been checked.
- The user explicitly asks for an evidence-backed answer.
- The environment or workflow has a policy requiring stronger justification before action.
When NOT to use
Do not use this skill when:
- The output is low-risk and easily reversible.
- The task is simple summarization or formatting.
- The caller is brainstorming possibilities and is not presenting a conclusion as established.
- The additional delay or cost of gating would outweigh the value.
- The caller already has an explicit evidence-validation layer for this exact step.
Design constraints
This skill must preserve the caller's original capability as much as possible.
It should:
- be selective rather than always-on
- avoid taking over the entire workflow
- avoid forcing chain-of-thought disclosure
- avoid blocking work unless a real risk threshold is crossed
- prefer downgrade/fallback over hard failure
- assume each invocation is stateless
Integration policy
Apply these defaults unless the caller provides stricter policy:
1. Run the gate only at conclusion points or before high-impact actions.
2. Generate only 2-5 concrete evidence obligations.
3. Evaluate only the evidence explicitly present in the current invocation.
4. Return one final verdict for the current invocation.
5. If evidence is insufficient, downgrade or defer instead of spinning.
6. Keep domain ownership with the caller.
7. Judge only explicit artifacts, not hidden reasoning.
Input contract
The only required input is the claim — the conclusion, diagnosis, recommendation, or action under consideration.
Invocation examples:
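- `/evidence-gate the root cause is a null pointer dereference in request parsing`
- `/evidence-gate deleting the staging database is safe`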
- Agent self-trigger: the agent recognizes a gate-worthy moment and invokes the skill with the current claim from context.
When invoked with just a claim, the skill infers the remaining context:
- `claim_type`: inferred from the claim language (e.g., "the cause is" → diagnosis, "safe to" → safety, "should delete" → action)
- `domain`: inferred from the current working context
- `risk_level`: inferred from the action's reversibility and blast radius
- `execution_mode`: inferred from whether the caller is informing, recommending, or about to execute
- `target_strength`: inferred from the claim's language strength
The caller may optionally provide any of these fields to override inference.
Use references/input-template.md when a caller wants a canonical explicit input shape.
See references/protocol.md for the full schema semantics.
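The inference from claim language described above could be sketched as a simple phrase lookup. This is a minimal sketch under stated assumptions, not the skill's actual inference logic: the names `infer_claim_type` and `build_input`, the `"unknown"` fallback, and the lookup approach itself are hypothetical. Only the three phrase-to-type pairs come from the text.

```python
# Hypothetical phrase table; only these three mappings appear in the text above.
CLAIM_TYPE_HINTS = [
    ("the cause is", "diagnosis"),
    ("safe to", "safety"),
    ("should delete", "action"),
]


def infer_claim_type(claim: str) -> str:
    """Return the first claim type whose hint phrase occurs in the claim."""
    lowered = claim.lower()
    for phrase, claim_type in CLAIM_TYPE_HINTS:
        if phrase in lowered:
            return claim_type
    # Assumed fallback: leave the type open rather than guess.
    return "unknown"


def build_input(claim: str, **overrides: str) -> dict:
    """Infer missing fields, letting caller-provided values override inference."""
    inferred = {"claim": claim, "claim_type": infer_claim_type(claim)}
    inferred.update(overrides)
    return inferred
```

A literal substring match like this is brittle, so real claims would likely need a richer matcher. The point here is only that caller-supplied fields take precedence over inferred ones.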
Output contract
The skill should return a structured gate result containing:
- whether a gate is required
- why the gate is required
- evidence requirements
- per-requirement status
- missing evidence
- conflicting evidence
- sufficiency rule
- verdict
- allowed next actions
- blocked next actions
- fallback behavior
- suggested caller wording when evidence is insufficient
- next evidence actions
Return JSON matching references/output-template.md.
Use references/verdict-schema.json as the machine-checkable schema.
Keep the gate_required field in the output even when the gate is invoked explicitly.
Use gate_required = false as a fast exit when the claim is already low-risk, exploratory, or sufficiently bounded.
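As a rough illustration of the result shape listed above, the sketch below models the thirteen items as a dataclass and shows the fast exit. Apart from `gate_required`, every field name here is an assumption derived from the list wording, and `fast_exit` is a hypothetical helper. The authoritative shape is whatever references/output-template.md and references/verdict-schema.json define.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class GateResult:
    # Field names are illustrative guesses, except gate_required.
    gate_required: bool
    gate_reason: str
    evidence_requirements: list = field(default_factory=list)
    requirement_status: dict = field(default_factory=dict)
    missing_evidence: list = field(default_factory=list)
    conflicting_evidence: list = field(default_factory=list)
    sufficiency_rule: str = ""
    verdict: str = ""
    allowed_next_actions: list = field(default_factory=list)
    blocked_next_actions: list = field(default_factory=list)
    fallback_behavior: str = ""
    suggested_wording: str = ""
    next_evidence_actions: list = field(default_factory=list)


def fast_exit(reason: str) -> GateResult:
    """Low-risk or exploratory claim: keep gate_required, set it to False."""
    return GateResult(gate_required=False, gate_reason=reason, verdict="PASS")


result = fast_exit("claim is exploratory and easily reversible")
print(json.dumps(asdict(result), indent=2))
```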
Verdict states
Use exactly these verdicts:
- Evidence is sufficient for the intended claim/action.
- Evidence is incomplete, but sufficient for a weaker claim, advisory output, or low-risk continuation.
- Evidence is insufficient for the intended strength or risk level. High-impact continuation should not proceed.
- Evidence materially disagrees or supports multiple competing interpretations. The caller should not present a strong conclusion as settled.
Required behavior
1. Normalize the candidate
Reduce the caller's current position to a tentative, explicit candidate.
If the caller already states the final conclusion as settled, rewrite it internally as tentative before gating it.
2. Define evidence obligations
Translate the candidate claim/action into a small set of concrete evidence requirements.
Good evidence requirements are:
- specific
- externally checkable
- operationally gatherable
- tied to the claim, not generic boilerplate
Bad evidence requirements are vague, such as:
- "get more proof"
- "verify better"
- "be more certain"
3. Evaluate sufficiency
Determine whether currently known evidence satisfies the requirements.
The skill should explicitly mark each requirement as one of:
- `satisfied`
- `missing`
- `conflicting`
- `not_applicable`
4. Produce a final verdict for the current invocation
Return a verdict immediately after evaluating known evidence.
If evidence is missing, identify only the smallest set of additional checks that would materially change the verdict.
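One way to turn per-requirement statuses into a verdict is sketched below. The four status values match the document. The sufficiency rule is an assumption, and so are the verdict labels other than `PASS`: `WEAKER_CLAIM_ONLY`, `INSUFFICIENT`, and `CONFLICT` are placeholders, not the skill's real verdict names.

```python
from enum import Enum


class Status(Enum):
    SATISFIED = "satisfied"
    MISSING = "missing"
    CONFLICTING = "conflicting"
    NOT_APPLICABLE = "not_applicable"


def decide(statuses: dict[str, Status], high_risk: bool) -> tuple[str, list[str]]:
    """Apply an assumed sufficiency rule; return (verdict, checks worth running next)."""
    # Requirements marked not_applicable do not count for or against the claim.
    applicable = {k: s for k, s in statuses.items() if s is not Status.NOT_APPLICABLE}
    conflicts = [k for k, s in applicable.items() if s is Status.CONFLICTING]
    missing = [k for k, s in applicable.items() if s is Status.MISSING]
    if conflicts:
        return "CONFLICT", conflicts + missing
    if not missing:
        return "PASS", []
    satisfied = len(applicable) - len(missing)
    # Partial evidence supports a weaker claim only when the action is not high-risk.
    if satisfied > 0 and not high_risk:
        return "WEAKER_CLAIM_ONLY", missing
    return "INSUFFICIENT", missing
```

The second return value is limited to requirements that are missing or conflicting, which keeps the follow-up list to checks that could actually change the verdict.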
5. Prefer downgrade over dead stop
If evidence is insufficient, prefer one of:
- provisional conclusion
- candidate hypotheses
- advisory-only output
- ask-for-human-review
- request-more-evidence plan
Do not hard-block low-risk work unnecessarily.
6. Assume stateless execution
Assume every call is fresh.
Do not depend on remembering prior requirements, prior verdicts, or prior collection attempts unless the caller explicitly embeds them in the current input.
7. Avoid hidden-reasoning dependence
Do not require access to hidden chain-of-thought.
Judge only from explicit claim, explicit evidence, explicit policy, and explicit outputs.
Suggested workflow
1. Receive normalized candidate claim/action.
2. Decide whether gating is required.
3. If no gate is required, return PASS with rationale.
4. If a gate is required:
   - generate evidence requirements
   - evaluate known evidence
   - identify gaps and conflicts
   - apply a sufficiency rule
   - produce a final verdict for this invocation
   - produce fallback and next-step guidance
5. Return a structured result without taking over execution.
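The control flow of the workflow above might look like the following single-pass function. It is a sketch only: `run_gate`, its callable parameters, the result keys, and the `NOT_PASS` label (which collapses the non-passing verdicts into one placeholder) are all hypothetical, and conflict handling is omitted for brevity.

```python
from typing import Callable


def run_gate(
    candidate: str,
    evidence: list[str],
    needs_gate: Callable[[str], bool],
    requirements_for: Callable[[str], list[str]],
    is_met: Callable[[str, list[str]], bool],
) -> dict:
    """Single-pass gate: decide, evaluate, return. It never executes the action."""
    if not needs_gate(candidate):
        return {"gate_required": False, "verdict": "PASS",
                "rationale": "no gate required for this candidate"}
    requirements = requirements_for(candidate)
    gaps = [r for r in requirements if not is_met(r, evidence)]
    return {
        "gate_required": True,
        "verdict": "PASS" if not gaps else "NOT_PASS",
        "requirements": requirements,
        "gaps": gaps,
        "next_steps": [f"gather evidence for: {g}" for g in gaps],
    }


# Usage with toy callables supplied by the caller.
result = run_gate(
    "the root cause is X",
    evidence=["reproduction path"],
    needs_gate=lambda c: "root cause" in c,
    requirements_for=lambda c: ["reproduction path", "code-path match"],
    is_met=lambda req, ev: req in ev,
)
```

Passing the domain-specific pieces in as callables keeps domain ownership with the caller, and the function holds no state between calls.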
Default trigger heuristics
Bias toward using this skill when any of the following are present:
- `risk_level = high`
- `execution_mode = auto`
- claim language is strong or definitive
- only one evidence source supports the claim
- no competing hypothesis check exists
- action is costly, irreversible, or externally visible
Bias away from using this skill when:
- `risk_level = low`
- the output is exploratory, not conclusive
- the result is easy to reverse
- the task is primarily formatting or summarization
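The two bias lists above can be read as signals to weigh against each other. The sketch below counts them and gates when the "toward" signals outnumber the "away" signals. That counting rule, the function name `should_gate`, the parameter names, and the default values (including `"inform"` for the execution mode) are assumptions made for illustration.

```python
def should_gate(
    risk_level: str = "low",
    execution_mode: str = "inform",
    definitive_language: bool = False,
    evidence_sources: int = 2,
    alternatives_checked: bool = True,
    costly_or_irreversible: bool = False,
    exploratory: bool = False,
    easily_reversible: bool = False,
    formatting_only: bool = False,
) -> bool:
    """Return True when signals favoring a gate outnumber signals against it."""
    toward = [
        risk_level == "high",
        execution_mode == "auto",
        definitive_language,
        evidence_sources <= 1,
        not alternatives_checked,
        costly_or_irreversible,
    ]
    away = [
        risk_level == "low",
        exploratory,
        easily_reversible,
        formatting_only,
    ]
    # Booleans sum as 0/1, so this compares the number of active signals.
    return sum(toward) > sum(away)
```

A caller with a stricter policy could replace the comparison with a rule such as "gate whenever any toward signal is present".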
Default fallback policy
When the gate does not fully pass, prefer these downgrades:
- intended strong conclusion -> provisional conclusion
- automatic action -> advisory recommendation
- settled diagnosis -> candidate hypotheses
- irreversible operation -> human approval required
- insufficient current evidence -> stop and return a bounded next-evidence plan
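The downgrade pairs above lend themselves to a small lookup table. In this sketch the keys and values are taken from the list, while the function name `downgrade`, the use of the bounded next-evidence plan as the default for unlisted output kinds, and the pass-through on `PASS` are assumptions.

```python
# Downgrade pairs copied from the list above.
FALLBACKS = {
    "strong conclusion": "provisional conclusion",
    "automatic action": "advisory recommendation",
    "settled diagnosis": "candidate hypotheses",
    "irreversible operation": "human approval required",
}

# Assumed default when no specific downgrade applies.
BOUNDED_PLAN = "stop and return a bounded next-evidence plan"


def downgrade(intended: str, verdict: str) -> str:
    """Keep the intended output on PASS; otherwise pick the weaker form."""
    if verdict == "PASS":
        return intended
    return FALLBACKS.get(intended, BOUNDED_PLAN)
```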
Output style guidance
When the verdict is not PASS, the caller should avoid overstating certainty.
Good examples:
- "Current evidence suggests X, but this is not yet sufficiently established."
- "This is a plausible diagnosis, not a confirmed root cause."
- "Evidence is currently insufficient for automatic execution."
- "Additional evidence is needed before recommending Y with confidence."
Bad examples:
- "This is definitely the cause" when key evidence is missing
- "Safe to proceed" when competing evidence exists
Example use cases
Before recommending scale-up, verify that bottleneck evidence is real and alternative explanations were checked.
Before claiming a bug root cause, verify reproduction path, code-path match, and at least one falsified alternative.
Before declaring an action safe, require policy match, scope confirmation, and risk checks.
Before presenting a strong conclusion, require source support and contradiction checks.
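Written out as data, the four use cases above give the following obligation sets. The dictionary keys and the helper `within_default_bounds` are illustrative names. The obligations themselves are quoted from the use cases, and the 2-5 bound is the default from the integration policy.

```python
# Obligations quoted from the example use cases; keys are illustrative labels.
OBLIGATIONS = {
    "recommend scale-up": [
        "bottleneck evidence is real",
        "alternative explanations were checked",
    ],
    "claim bug root cause": [
        "reproduction path",
        "code-path match",
        "at least one falsified alternative",
    ],
    "declare action safe": [
        "policy match",
        "scope confirmation",
        "risk checks",
    ],
    "present strong conclusion": [
        "source support",
        "contradiction checks",
    ],
}


def within_default_bounds(obligations: list[str]) -> bool:
    """Check the default policy of 2-5 concrete obligations per claim."""
    return 2 <= len(obligations) <= 5
```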
Non-goals
This skill is not:
- a universal orchestrator
- a replacement for domain expertise
- a guarantee of correctness
- a hidden chain-of-thought inspector
- a mandatory wrapper around every agent step
Its job is narrower:
make evidence obligations explicit, assess whether they are met, and enforce safe downgrade behavior when they are not.