GitHub Actions Failure Spike Audit
Use this skill to catch workflows that recently degraded (new flaky tests, broken deploy gates, bad dependency updates, or infra outages) before they become long-running incidents.
What this skill does
- Reads GitHub Actions run JSON exports
- Groups by repository + workflow + branch + event
- Splits each group into recent runs and baseline history
- Compares recent failure rate to baseline failure rate
- Scores severity (`ok`, `warn`, `critical`) using spike + recent failure rate gates
- Emits text or JSON output for CI automation
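The comparison and scoring steps above can be sketched in plain bash. This is a minimal illustration, not the bundled script's implementation: the function names `failure_rate_pct` and `classify_severity` are hypothetical, the thresholds mirror the documented defaults, and it assumes a severity level is reached only when both the spike gate and the recent-failure-rate gate are met.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the spike scoring; not the bundled script's actual code.

# Integer failure rate in percent: failures * 100 / total (0 when there are no runs).
failure_rate_pct() {
  local failures=$1 total=$2
  if (( total == 0 )); then
    echo 0
  else
    echo $(( failures * 100 / total ))
  fi
}

# Classify a group from its recent and baseline failure rates (both in percent).
# Assumption: a level applies only when BOTH the spike gate and the recent-rate gate pass.
classify_severity() {
  local recent_pct=$1 baseline_pct=$2
  local spike=$(( recent_pct - baseline_pct ))
  if (( spike >= ${CRITICAL_SPIKE_PCT:-30} && recent_pct >= ${CRITICAL_RECENT_FAILURE_RATE:-45} )); then
    echo critical
  elif (( spike >= ${WARN_SPIKE_PCT:-15} && recent_pct >= ${WARN_RECENT_FAILURE_RATE:-25} )); then
    echo warn
  else
    echo ok
  fi
}

# Example: 3 of 4 recent runs failed, 1 of 10 baseline runs failed.
recent=$(failure_rate_pct 3 4)
baseline=$(failure_rate_pct 1 10)
classify_severity "$recent" "$baseline"
```

With the default thresholds, the example group has a recent rate of 75% against a 10% baseline, so it is classified as `critical`. Integer percentages keep the sketch dependency-free; a real implementation may use finer precision.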
Inputs
Optional:
- `RUN_GLOB` (default: `artifacts/github-actions/*.json`)
- `TOP_N` (default: `20`)
- `OUTPUT_FORMAT` (`text` or `json`, default: `text`)
- `RECENT_RUNS` (default: `4`)
- `MIN_RECENT_RUNS` (default: `3`)
- `MIN_BASELINE_RUNS` (default: `4`)
- `WARN_SPIKE_PCT` (default: `15`)
- `CRITICAL_SPIKE_PCT` (default: `30`)
- `WARN_RECENT_FAILURE_RATE` (default: `25`)
- `CRITICAL_RECENT_FAILURE_RATE` (default: `45`)
- `WORKFLOW_MATCH` (regex, optional)
- `WORKFLOW_EXCLUDE` (regex, optional)
- `BRANCH_MATCH` (regex, optional)
- `BRANCH_EXCLUDE` (regex, optional)
- `EVENT_MATCH` (regex, optional)
- `EVENT_EXCLUDE` (regex, optional)
- `REPO_MATCH` (regex, optional)
- `REPO_EXCLUDE` (regex, optional)
- `FAIL_ON_CRITICAL` (`0` or `1`, default: `0`)
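As one way to picture the match/exclude regex pairs, the sketch below applies an optional include regex and an optional exclude regex to a single value using bash's `=~` operator. The helper name `passes_filter` is hypothetical, and the semantics (an empty pattern means no filter, exclude wins over match) are assumptions; the bundled script may behave differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper; the real script's filter semantics may differ.

# Returns 0 (keep) when the value matches the include regex (if set)
# and does not match the exclude regex (if set).
passes_filter() {
  local value=$1 match=${2:-} exclude=${3:-}
  if [[ -n $match && ! $value =~ $match ]]; then
    return 1
  fi
  if [[ -n $exclude && $value =~ $exclude ]]; then
    return 1
  fi
  return 0
}

# Example: keep release branches, but skip anything ending in -wip.
BRANCH_MATCH='^release/'
BRANCH_EXCLUDE='-wip$'
for branch in release/1.2 release/1.3-wip main; do
  if passes_filter "$branch" "$BRANCH_MATCH" "$BRANCH_EXCLUDE"; then
    echo "keep $branch"
  else
    echo "skip $branch"
  fi
done
```

In this example only `release/1.2` is kept: `release/1.3-wip` is dropped by the exclude pattern and `main` never matches the include pattern.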
Collect run JSON
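```bash
gh run view --json databaseId,workflowName,event,conclusion,headBranch,headSha,createdAt,updatedAt,startedAt,url,repository \
  > artifacts/github-actions/run-.json
```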
Run
Text report:
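```bash
RUN_GLOB=artifacts/github-actions/*.json \
RECENT_RUNS=8 \
WARN_SPIKE_PCT=12 \
bash skills/github-actions-failure-spike-audit/scripts/failure-spike-audit.sh
```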
JSON output + fail gate:
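```bash
RUN_GLOB=artifacts/github-actions/*.json \
OUTPUT_FORMAT=json \
FAIL_ON_CRITICAL=1 \
bash skills/github-actions-failure-spike-audit/scripts/failure-spike-audit.sh
```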
Run against bundled fixtures:
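```bash
RUN_GLOB=skills/github-actions-failure-spike-audit/fixtures/*.json \
bash skills/github-actions-failure-spike-audit/scripts/failure-spike-audit.sh
```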
Output contract
- Exit `0` in report mode (default)
- Exit `1` when `FAIL_ON_CRITICAL=1` and one or more groups are `critical`
- Text mode prints summary + ranked failure-rate spike groups
- JSON mode prints summary + ranked groups + critical groups
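The exit-code rule can be expressed as a small function. The sketch below is illustrative only: `exit_code_for` is a hypothetical name, and it takes the number of critical groups as an argument rather than computing it from run data.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the documented exit-code contract.

# Prints the exit code: 1 only when the fail gate is enabled
# and at least one group is critical; 0 otherwise.
exit_code_for() {
  local fail_on_critical=${1:-0} critical_count=${2:-0}
  if [[ $fail_on_critical == 1 ]] && (( critical_count > 0 )); then
    echo 1
  else
    echo 0
  fi
}

exit_code_for 0 3   # report mode: critical groups do not fail the run
exit_code_for 1 0   # gate enabled, nothing critical
exit_code_for 1 2   # gate enabled, two critical groups
```

Only the last call prints `1`, which is the case a CI job would treat as a failed step.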