Codex Review — Three-Tier Code Quality Defense

Unified orchestration layer: picks audit depth based on trigger phrases. bug-audit is invoked as an independent skill — never modified.

Security & Privacy

- Read-only by default: This skill only reads your project files for analysis. It does NOT modify, delete, or upload your code anywhere.
Optional external model: L1/L3 can use an external code-review API (OpenAI-compatible) for a second opinion. This is opt-in — if no API key is configured, the skill works fine with agent-only review.
Credentials via environment variables only: API keys are loaded from CODEX_REVIEW_API_KEY env var. Never hardcoded, never logged, never stored.
Local-only artifacts: Hotspot files are written to system temp directory and auto-cleaned. No network transmission of analysis results.
No data exfiltration: Code snippets sent to the external API are limited to the files being reviewed. No telemetry, no analytics, no third-party data sharing beyond the configured review model.

Prerequisites

- External model API (optional, for L1 Round 1 and L3): Any OpenAI-compatible endpoint.

- Set env vars: CODEX_REVIEW_API_BASE (default: https://api.openai.com/v1), CODEX_REVIEW_API_KEY, CODEX_REVIEW_MODEL (default: gpt-4o) - Works without this — falls back to agent-only audit

- bug-audit skill (optional): Required for L2/L3. Without it, L2 uses a built-in fallback.
curl: For API calls (standard on macOS/Linux)

Trigger Mapping

User says	Level	What it does	Est. time
"review" / "quick scan" / "review下" / "检查下"	L1	External model scan + agent deep pass	5-10 min
"audit" / "deep audit" / "审计下" / "排查下"

L2 | Full bug-audit flow (or built-in fallback) | 30-60 min | | "pre-deploy check" / "上线前检查" | L1→L2 | L1 scan → record hotspots → L2 audit → hotspot gap check | 40-70 min | | "cross-validate" / "highest level" / "交叉验证" | L3 | Dual independent audits + compare + adversarial test | 60-90 min |

Level 1: Quick Scan (core of codex-review)

Flow

1. Gather code — local read, git clone <url>, server scp, user-pasted snippet, or PR diff
Exclude — node_modules/, .git/, package-lock.json, dist/, *.db, pycache/, vendor/
Round 1 — send to external model API for automated scan (skipped if no API key)
Round 2 — current agent does deep supplementary pass
Merge & dedup — output severity-graded report
Write hotspot file (for L1→L2 handoff)

External Model API Call

CODEBLOCK0

Fallback: If API call fails or times out (120s), skip Round 1 and complete with agent-only audit.

System Prompt (L1 External Scan)

CODEBLOCK1

Agent Round 2 — Universal Checklist

- [ ] Cross-file logic consistency (imports, exports, shared state)
[ ] Authentication & authorization bypass
[ ] Race conditions (concurrent requests, DB write conflicts)
[ ] Unhandled exceptions / missing error boundaries
[ ] Input validation & sanitization (SQL injection, XSS, path traversal)
[ ] Memory/resource leaks (unclosed connections, event listener buildup)
[ ] Sensitive data exposure (keys in code, logs, error messages)
[ ] Timezone handling (UTC vs local)
[ ] Dependency vulnerabilities (outdated packages, known CVEs)

Agent Round 2 — Tech-Stack Specific (auto-detect & apply)

Node.js/Express:

- [ ] SQLite pitfalls (DEFAULT doesn't support functions, double-quote = column name)
[ ] Middleware ordering (auth before route handlers)
[ ] pm2/cluster mode compatibility

Python/Django/Flask:

- [ ] ORM N+1 queries
[ ] CSRF protection enabled
[ ] Debug mode in production

Frontend (React/Vue/vanilla):

- [ ] innerHTML / dangerouslySetInnerHTML without sanitization
[ ] WebView compatibility (WeChat, in-app browsers)
[ ] Nginx sub-path / base URL issues

Other stacks: adapt checklist to detected technology.

Code Volume Control

- Single API request: backend core files only (server + routes + db + config)
Send frontend as a second batch if needed
Very large projects (>50 files): summarize file tree first, then scan in priority order

Hotspot File (L1→L2 handoff)

After L1, write issue summary to ${TMPDIR:-/tmp}/codex-review-hotspots.json: CODEBLOCK2

This file is only used internally for L1→L2 handoff. bug-audit is unaware of it.

Level 2: Deep Audit

Flow (bug-audit available)

1. Read bug-audit's SKILL.md and execute its full flow (6 Phases)
bug-audit itself is never modified
Agent strictly follows bug-audit's specification

Flow (bug-audit NOT available — built-in fallback)

1. Phase 1: Project Dissection — read all source files, build dependency graph
Phase 2: Build Check Matrix — generate project-specific checklist from actual code patterns
Phase 3: Exhaustive Verification — verify every checklist item against real code
Phase 4: Reproduce — for each finding, trace the exact execution path
Phase 5: Report — output full severity-graded report
Phase 6: Fix Suggestions — provide concrete code patches

Level 1→2 Cascade: Pre-Deploy Check

Flow

1. Execute L1 quick scan
Write hotspot file
Execute L2 (bug-audit or fallback)
After L2, agent does hotspot gap analysis:

- Read hotspot file - Check if L2 report covers each L1 hotspot - Uncovered hotspots → targeted deep analysis, add to report - L1 vs L2 conclusions conflict → flag for manual review

5. Output final merged report

Level 3: Cross-Validation (highest level)

Flow

CODEBLOCK3

Adversarial Test Prompt

You are a security researcher. The following security fixes were applied to a project.
For each fix, analyze:
1. Can the fix be bypassed? How?
2. Does the fix introduce new vulnerabilities?
3. Are there edge cases the fix doesn't cover?
Be adversarial and thorough. Output language: match the user's language.

Report Format (all levels)

CODEBLOCK5

User Options

Users can customize behavior by saying:

- "only scan backend" / "只扫后端" → skip frontend files
"ignore LOW" / "忽略低级别" → filter out LOW severity
"output in English/Chinese" → control report language
"scan this PR" / "审这个PR" → fetch PR diff instead of full codebase
"skip external model" / "不用外部模型" → agent-only audit

Notes

1. External API timeout: 120 seconds. On failure, skip that round — agent completes independently
Large projects: split into batches (backend → frontend → config)
Long reports: split across multiple messages, adapted to current channel
L2/L3 bug-audit execution strictly follows its own SKILL.md — no modifications or shortcuts
Hotspot file is ephemeral — overwritten each L1 run, not persisted
All secrets/keys must come from env vars or user config — never hardcoded in this skill

Codex Review — 三层代码质量防御体系

统一编排层：根据触发短语选择审计深度。bug-audit 作为独立技能被调用——绝不修改。

安全与隐私

- 默认只读：本技能仅读取你的项目文件进行分析。它不会在任何地方修改、删除或上传你的代码。
可选外部模型：L1/L3 可使用外部代码审查 API（兼容 OpenAI）进行二次验证。这是自愿选择——如果未配置 API 密钥，该技能仅使用智能体审查即可正常工作。
仅通过环境变量传递凭证：API 密钥从 CODEXREVIEWAPI_KEY 环境变量加载。绝不硬编码、绝不记录、绝不存储。
仅本地产物：热点文件写入系统临时目录并自动清理。分析结果不进行网络传输。
无数据泄露：发送到外部 API 的代码片段仅限于被审查的文件。不进行遥测、不进行分析、不向配置的审查模型之外的第三方共享数据。

前置条件

- 外部模型 API（可选，用于 L1 第1轮和 L3）：任何兼容 OpenAI 的端点。

- 设置环境变量：CODEXREVIEWAPIBASE（默认：https://api.openai.com/v1）、CODEXREVIEWAPIKEY、CODEXREVIEWMODEL（默认：gpt-4o） - 无此配置也能工作——回退到仅智能体审计

- bug-audit 技能（可选）：L2/L3 需要。无此技能时，L2 使用内置回退方案。
curl：用于 API 调用（macOS/Linux 标准工具）

触发映射

用户输入	级别	功能	预计时间
review / quick scan / review下 / 检查下	L1	外部模型扫描 + 智能体深度检查	5-10 分钟
audit / deep audit / 审计下 / 排查下

L2 | 完整 bug-audit 流程（或内置回退方案） | 30-60 分钟 | | pre-deploy check / 上线前检查 | L1→L2 | L1 扫描 → 记录热点 → L2 审计 → 热点缺口检查 | 40-70 分钟 | | cross-validate / highest level / 交叉验证 | L3 | 双重独立审计 + 对比 + 对抗性测试 | 60-90 分钟 |

级别 1：快速扫描（codex-review 核心）

流程

1. 收集代码 — 本地 read、git clone 、服务器 scp、用户粘贴的代码片段或 PR diff
排除 — node_modules/、.git/、package-lock.json、dist/、*.db、pycache/、vendor/
第1轮 — 发送到外部模型 API 进行自动扫描（无 API 密钥则跳过）
第2轮 — 当前智能体执行深度补充检查
合并与去重 — 输出严重程度分级报告
写入热点文件（用于 L1→L2 交接）

外部模型 API 调用

bash
curl -s ${CODEXREVIEWAPI_BASE:-https://api.openai.com/v1}/chat/completions \
-H Authorization: Bearer ${CODEXREVIEWAPI_KEY} \
-H Content-Type: application/json \
-d {
model: ${CODEXREVIEWMODEL:-gpt-4o},
messages: [
{role: system, content: SYSTEMPROMPT>},
{role: user, content: } ], temperature: 0.2, max_tokens: 6000 }


回退方案：如果 API 调用失败或超时（120秒），跳过第1轮，仅使用智能体审计完成。
系统提示（L1 外部扫描）
你是一位代码审查专家。找出所有错误和安全问题：

1. 严重 — 安全漏洞（XSS、注入、认证绕过）、崩溃性错误
高 — 逻辑错误、竞态条件、未处理的异常
中 — 缺少验证、边界情况、性能问题
低 — 代码风格、死代码、微小改进

每项需包含：严重程度、文件+行号、问题描述、修复方案及代码片段。

关注真实错误，而非风格意见。输出语言：与用户语言一致。
智能体第2轮 — 通用检查清单
- [ ] 跨文件逻辑一致性（导入、导出、共享状态）
[ ] 认证与授权绕过
[ ] 竞态条件（并发请求、数据库写入冲突）
[ ] 未处理的异常 / 缺少错误边界
[ ] 输入验证与清理（SQL注入、XSS、路径遍历）
[ ] 内存/资源泄漏（未关闭的连接、事件监听器堆积）
[ ] 敏感数据暴露（代码中的密钥、日志、错误消息）
[ ] 时区处理（UTC 与本地时间）
[ ] 依赖漏洞（过时的包、已知 CVE）
智能体第2轮 — 技术栈特定（自动检测并应用）
Node.js/Express：

- [ ] SQLite 陷阱（DEFAULT 不支持函数，双引号=列名）
[ ] 中间件顺序（认证在路由处理器之前）
[ ] pm2/集群模式兼容性

Python/Django/Flask：

- [ ] ORM N+1 查询
[ ] CSRF 保护已启用
[ ] 生产环境中的调试模式

前端（React/Vue/原生）：

- [ ] 未经过滤的 innerHTML / dangerouslySetInnerHTML
[ ] WebView 兼容性（微信、应用内浏览器）
[ ] Nginx 子路径 / base URL 问题

其他技术栈： 根据检测到的技术调整检查清单。
代码量控制
- 单次 API 请求：仅后端核心文件（服务器 + 路由 + 数据库 + 配置）
如有需要，将前端作为第二批发送
非常大的项目（超过50个文件）：先汇总文件树，然后按优先级顺序扫描
热点文件（L1→L2 交接）
L1 完成后，将问题摘要写入 ${TMPDIR:-/tmp}/codex-review-hotspots.json：
json
{
  project: my-project,
  timestamp: 2026-03-05T22:00:00,
  hotspots: [
    {file: routes/admin.js, severity: CRITICAL, brief: 通过 localhost 绕过管理员认证},
    {file: routes/game.js, severity: CRITICAL, brief: 分数提交无服务器验证}
  ]
}
此文件仅用于 L1→L2 内部交接。bug-audit 不知晓此文件。



级别 2：深度审计

流程（有 bug-audit）
1. 读取 bug-audit 的 SKILL.md 并执行其完整流程（6个阶段）
bug-audit 本身绝不修改
智能体严格遵循 bug-audit 的规范
流程（无 bug-audit — 内置回退方案）
1. 阶段 1：项目剖析 — 读取所有源文件，构建依赖图
阶段 2：构建检查矩阵 — 根据实际代码模式生成项目特定检查清单
阶段 3：穷举验证 — 对照实际代码验证每个检查清单项
阶段 4：复现 — 对每个发现，追踪确切的执行路径
阶段 5：报告 — 输出完整的严重程度分级报告
阶段 6：修复建议 — 提供具体的代码补丁

级别 1→2 级联：预部署检查
流程
1. 执行 L1 快速扫描
写入热点文件
执行 L2（bug-audit 或回退方案）
L2 完成后，智能体执行热点缺口分析：
   - 读取热点文件
   - 检查 L2 报告是否覆盖每个 L1 热点
   - 未覆盖的热点 → 针对性深度分析，添加到报告
   - L1 与 L2 结论冲突 → 标记为需要人工审查
5. 输出最终合并报告

级别 3：交叉验证（最高级别）
流程
步骤 1：外部模型独立审计

  → 将完整代码发送到外部 API，附带详细系统提示

codex-review代码审查

codex-review

Codex Review — Three-Tier Code Quality Defense

Security & Privacy

Prerequisites

Trigger Mapping

Level 1: Quick Scan (core of codex-review)

Flow

External Model API Call

System Prompt (L1 External Scan)

Agent Round 2 — Universal Checklist

Agent Round 2 — Tech-Stack Specific (auto-detect & apply)

Code Volume Control

Hotspot File (L1→L2 handoff)

Level 2: Deep Audit

Flow (bug-audit available)

Flow (bug-audit NOT available — built-in fallback)

Level 1→2 Cascade: Pre-Deploy Check

Flow

Level 3: Cross-Validation (highest level)

Flow

Adversarial Test Prompt

Report Format (all levels)

User Options

Notes

Codex Review — 三层代码质量防御体系

安全与隐私

前置条件

触发映射

级别 1：快速扫描（codex-review 核心）

流程

外部模型 API 调用

系统提示（L1 外部扫描）

智能体第2轮 — 通用检查清单

智能体第2轮 — 技术栈特定（自动检测并应用）

代码量控制

热点文件（L1→L2 交接）

级别 2：深度审计

流程（有 bug-audit）

流程（无 bug-audit — 内置回退方案）

级别 1→2 级联：预部署检查

流程

级别 3：交叉验证（最高级别）

流程

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement