Codex Review — Three-Tier Code Quality Defense
Unified orchestration layer: picks audit depth based on trigger phrases. bug-audit is invoked as an independent skill — never modified.
Security & Privacy
- - Read-only by default: This skill only reads your project files for analysis. It does NOT modify, delete, or upload your code anywhere.
- Optional external model: L1/L3 can use an external code-review API (OpenAI-compatible) for a second opinion. This is opt-in — if no API key is configured, the skill works fine with agent-only review.
- Credentials via environment variables only: API keys are loaded from
CODEX_REVIEW_API_KEY env var. Never hardcoded, never logged, never stored. - Local-only artifacts: Hotspot files are written to system temp directory and auto-cleaned. No network transmission of analysis results.
- No data exfiltration: Code snippets sent to the external API are limited to the files being reviewed. No telemetry, no analytics, no third-party data sharing beyond the configured review model.
Prerequisites
- - External model API (optional, for L1 Round 1 and L3): Any OpenAI-compatible endpoint.
- Set env vars:
CODEX_REVIEW_API_BASE (default:
https://api.openai.com/v1),
CODEX_REVIEW_API_KEY,
CODEX_REVIEW_MODEL (default:
gpt-4o)
- Works without this — falls back to agent-only audit
- - bug-audit skill (optional): Required for L2/L3. Without it, L2 uses a built-in fallback.
- curl: For API calls (standard on macOS/Linux)
Trigger Mapping
| User says | Level | What it does | Est. time |
|---|
| "review" / "quick scan" / "review下" / "检查下" | L1 | External model scan + agent deep pass | 5-10 min |
| "audit" / "deep audit" / "审计下" / "排查下" |
L2 | Full bug-audit flow (or built-in fallback) | 30-60 min |
| "pre-deploy check" / "上线前检查" | L1→L2 | L1 scan → record hotspots → L2 audit → hotspot gap check | 40-70 min |
| "cross-validate" / "highest level" / "交叉验证" | L3 | Dual independent audits + compare + adversarial test | 60-90 min |
Level 1: Quick Scan (core of codex-review)
Flow
- 1. Gather code — local
read, git clone <url>, server scp, user-pasted snippet, or PR diff - Exclude — node_modules/, .git/, package-lock.json, dist/, *.db, pycache/, vendor/
- Round 1 — send to external model API for automated scan (skipped if no API key)
- Round 2 — current agent does deep supplementary pass
- Merge & dedup — output severity-graded report
- Write hotspot file (for L1→L2 handoff)
External Model API Call
CODEBLOCK0
Fallback: If API call fails or times out (120s), skip Round 1 and complete with agent-only audit.
System Prompt (L1 External Scan)
CODEBLOCK1
Agent Round 2 — Universal Checklist
- - [ ] Cross-file logic consistency (imports, exports, shared state)
- [ ] Authentication & authorization bypass
- [ ] Race conditions (concurrent requests, DB write conflicts)
- [ ] Unhandled exceptions / missing error boundaries
- [ ] Input validation & sanitization (SQL injection, XSS, path traversal)
- [ ] Memory/resource leaks (unclosed connections, event listener buildup)
- [ ] Sensitive data exposure (keys in code, logs, error messages)
- [ ] Timezone handling (UTC vs local)
- [ ] Dependency vulnerabilities (outdated packages, known CVEs)
Agent Round 2 — Tech-Stack Specific (auto-detect & apply)
Node.js/Express:
- - [ ] SQLite pitfalls (DEFAULT doesn't support functions, double-quote = column name)
- [ ] Middleware ordering (auth before route handlers)
- [ ] pm2/cluster mode compatibility
Python/Django/Flask:
- - [ ] ORM N+1 queries
- [ ] CSRF protection enabled
- [ ] Debug mode in production
Frontend (React/Vue/vanilla):
- - [ ] innerHTML / dangerouslySetInnerHTML without sanitization
- [ ] WebView compatibility (WeChat, in-app browsers)
- [ ] Nginx sub-path / base URL issues
Other stacks: adapt checklist to detected technology.
Code Volume Control
- - Single API request: backend core files only (server + routes + db + config)
- Send frontend as a second batch if needed
- Very large projects (>50 files): summarize file tree first, then scan in priority order
Hotspot File (L1→L2 handoff)
After L1, write issue summary to
${TMPDIR:-/tmp}/codex-review-hotspots.json:
CODEBLOCK2
This file is only used internally for L1→L2 handoff. bug-audit is unaware of it.
Level 2: Deep Audit
Flow (bug-audit available)
- 1. Read bug-audit's SKILL.md and execute its full flow (6 Phases)
- bug-audit itself is never modified
- Agent strictly follows bug-audit's specification
Flow (bug-audit NOT available — built-in fallback)
- 1. Phase 1: Project Dissection — read all source files, build dependency graph
- Phase 2: Build Check Matrix — generate project-specific checklist from actual code patterns
- Phase 3: Exhaustive Verification — verify every checklist item against real code
- Phase 4: Reproduce — for each finding, trace the exact execution path
- Phase 5: Report — output full severity-graded report
- Phase 6: Fix Suggestions — provide concrete code patches
Level 1→2 Cascade: Pre-Deploy Check
Flow
- 1. Execute L1 quick scan
- Write hotspot file
- Execute L2 (bug-audit or fallback)
- After L2, agent does hotspot gap analysis:
- Read hotspot file
- Check if L2 report covers each L1 hotspot
- Uncovered hotspots → targeted deep analysis, add to report
- L1 vs L2 conclusions conflict → flag for manual review
- 5. Output final merged report
Level 3: Cross-Validation (highest level)
Flow
CODEBLOCK3
Adversarial Test Prompt
You are a security researcher. The following security fixes were applied to a project.
For each fix, analyze:
1. Can the fix be bypassed? How?
2. Does the fix introduce new vulnerabilities?
3. Are there edge cases the fix doesn't cover?
Be adversarial and thorough. Output language: match the user's language.
Report Format (all levels)
CODEBLOCK5
User Options
Users can customize behavior by saying:
- - "only scan backend" / "只扫后端" → skip frontend files
- "ignore LOW" / "忽略低级别" → filter out LOW severity
- "output in English/Chinese" → control report language
- "scan this PR" / "审这个PR" → fetch PR diff instead of full codebase
- "skip external model" / "不用外部模型" → agent-only audit
Notes
- 1. External API timeout: 120 seconds. On failure, skip that round — agent completes independently
- Large projects: split into batches (backend → frontend → config)
- Long reports: split across multiple messages, adapted to current channel
- L2/L3 bug-audit execution strictly follows its own SKILL.md — no modifications or shortcuts
- Hotspot file is ephemeral — overwritten each L1 run, not persisted
- All secrets/keys must come from env vars or user config — never hardcoded in this skill
Codex Review — 三层代码质量防御体系
统一编排层:根据触发短语选择审计深度。bug-audit 作为独立技能被调用——绝不修改。
安全与隐私
- - 默认只读:本技能仅读取你的项目文件进行分析。它不会在任何地方修改、删除或上传你的代码。
- 可选外部模型:L1/L3 可使用外部代码审查 API(兼容 OpenAI)进行二次验证。这是自愿选择——如果未配置 API 密钥,该技能仅使用智能体审查即可正常工作。
- 仅通过环境变量传递凭证:API 密钥从 CODEXREVIEWAPI_KEY 环境变量加载。绝不硬编码、绝不记录、绝不存储。
- 仅本地产物:热点文件写入系统临时目录并自动清理。分析结果不进行网络传输。
- 无数据泄露:发送到外部 API 的代码片段仅限于被审查的文件。不进行遥测、不进行分析、不向配置的审查模型之外的第三方共享数据。
前置条件
- - 外部模型 API(可选,用于 L1 第1轮和 L3):任何兼容 OpenAI 的端点。
- 设置环境变量:CODEX
REVIEWAPI
BASE(默认:https://api.openai.com/v1)、CODEXREVIEW
APIKEY、CODEX
REVIEWMODEL(默认:gpt-4o)
- 无此配置也能工作——回退到仅智能体审计
- - bug-audit 技能(可选):L2/L3 需要。无此技能时,L2 使用内置回退方案。
- curl:用于 API 调用(macOS/Linux 标准工具)
触发映射
| 用户输入 | 级别 | 功能 | 预计时间 |
|---|
| review / quick scan / review下 / 检查下 | L1 | 外部模型扫描 + 智能体深度检查 | 5-10 分钟 |
| audit / deep audit / 审计下 / 排查下 |
L2 | 完整 bug-audit 流程(或内置回退方案) | 30-60 分钟 |
| pre-deploy check / 上线前检查 | L1→L2 | L1 扫描 → 记录热点 → L2 审计 → 热点缺口检查 | 40-70 分钟 |
| cross-validate / highest level / 交叉验证 | L3 | 双重独立审计 + 对比 + 对抗性测试 | 60-90 分钟 |
级别 1:快速扫描(codex-review 核心)
流程
- 1. 收集代码 — 本地 read、git clone 、服务器 scp、用户粘贴的代码片段或 PR diff
- 排除 — node_modules/、.git/、package-lock.json、dist/、*.db、pycache/、vendor/
- 第1轮 — 发送到外部模型 API 进行自动扫描(无 API 密钥则跳过)
- 第2轮 — 当前智能体执行深度补充检查
- 合并与去重 — 输出严重程度分级报告
- 写入热点文件(用于 L1→L2 交接)
外部模型 API 调用
bash
curl -s ${CODEXREVIEWAPI_BASE:-https://api.openai.com/v1}/chat/completions \
-H Authorization: Bearer ${CODEXREVIEWAPI_KEY} \
-H Content-Type: application/json \
-d {
model: ${CODEXREVIEWMODEL:-gpt-4o},
messages: [
{role: system, content: SYSTEMPROMPT>},
{role: user, content: }
],
temperature: 0.2,
max_tokens: 6000
}
回退方案:如果 API 调用失败或超时(120秒),跳过第1轮,仅使用智能体审计完成。
系统提示(L1 外部扫描)
你是一位代码审查专家。找出所有错误和安全问题:
- 1. 严重 — 安全漏洞(XSS、注入、认证绕过)、崩溃性错误
- 高 — 逻辑错误、竞态条件、未处理的异常
- 中 — 缺少验证、边界情况、性能问题
- 低 — 代码风格、死代码、微小改进
每项需包含:严重程度、文件+行号、问题描述、修复方案及代码片段。
关注真实错误,而非风格意见。输出语言:与用户语言一致。
智能体第2轮 — 通用检查清单
- - [ ] 跨文件逻辑一致性(导入、导出、共享状态)
- [ ] 认证与授权绕过
- [ ] 竞态条件(并发请求、数据库写入冲突)
- [ ] 未处理的异常 / 缺少错误边界
- [ ] 输入验证与清理(SQL注入、XSS、路径遍历)
- [ ] 内存/资源泄漏(未关闭的连接、事件监听器堆积)
- [ ] 敏感数据暴露(代码中的密钥、日志、错误消息)
- [ ] 时区处理(UTC 与本地时间)
- [ ] 依赖漏洞(过时的包、已知 CVE)
智能体第2轮 — 技术栈特定(自动检测并应用)
Node.js/Express:
- - [ ] SQLite 陷阱(DEFAULT 不支持函数,双引号=列名)
- [ ] 中间件顺序(认证在路由处理器之前)
- [ ] pm2/集群模式兼容性
Python/Django/Flask:
- - [ ] ORM N+1 查询
- [ ] CSRF 保护已启用
- [ ] 生产环境中的调试模式
前端(React/Vue/原生):
- - [ ] 未经过滤的 innerHTML / dangerouslySetInnerHTML
- [ ] WebView 兼容性(微信、应用内浏览器)
- [ ] Nginx 子路径 / base URL 问题
其他技术栈: 根据检测到的技术调整检查清单。
代码量控制
- - 单次 API 请求:仅后端核心文件(服务器 + 路由 + 数据库 + 配置)
- 如有需要,将前端作为第二批发送
- 非常大的项目(超过50个文件):先汇总文件树,然后按优先级顺序扫描
热点文件(L1→L2 交接)
L1 完成后,将问题摘要写入 ${TMPDIR:-/tmp}/codex-review-hotspots.json:
json
{
project: my-project,
timestamp: 2026-03-05T22:00:00,
hotspots: [
{file: routes/admin.js, severity: CRITICAL, brief: 通过 localhost 绕过管理员认证},
{file: routes/game.js, severity: CRITICAL, brief: 分数提交无服务器验证}
]
}
此文件仅用于 L1→L2 内部交接。bug-audit 不知晓此文件。
级别 2:深度审计
流程(有 bug-audit)
- 1. 读取 bug-audit 的 SKILL.md 并执行其完整流程(6个阶段)
- bug-audit 本身绝不修改
- 智能体严格遵循 bug-audit 的规范
流程(无 bug-audit — 内置回退方案)
- 1. 阶段 1:项目剖析 — 读取所有源文件,构建依赖图
- 阶段 2:构建检查矩阵 — 根据实际代码模式生成项目特定检查清单
- 阶段 3:穷举验证 — 对照实际代码验证每个检查清单项
- 阶段 4:复现 — 对每个发现,追踪确切的执行路径
- 阶段 5:报告 — 输出完整的严重程度分级报告
- 阶段 6:修复建议 — 提供具体的代码补丁
级别 1→2 级联:预部署检查
流程
- 1. 执行 L1 快速扫描
- 写入热点文件
- 执行 L2(bug-audit 或回退方案)
- L2 完成后,智能体执行热点缺口分析:
- 读取热点文件
- 检查 L2 报告是否覆盖每个 L1 热点
- 未覆盖的热点 → 针对性深度分析,添加到报告
- L1 与 L2 结论冲突 → 标记为需要人工审查
- 5. 输出最终合并报告
级别 3:交叉验证(最高级别)
流程
步骤 1:外部模型独立审计
→ 将完整代码发送到外部 API,附带详细系统提示