Debug Detective — Systematic Debugging Methodology
Find and fix bugs efficiently across the full stack using structured investigation techniques.
1. Debugging Mindset
1.1 The scientific method for debugging
CODEBLOCK0
1.2 Key debugging principles
- The bug is never where you think it is. Widen your search radius before going deep.
- Reproduce first, fix second. A bug you can't reproduce is a bug you can't verify as fixed.
- Change one thing at a time. Multiple simultaneous changes make it impossible to identify the fix.
- Trust nothing. Verify assumptions — check that the code you're reading is the code that's running.
- Read the error message. Fully. Including the stack trace. Including the "caused by" chain.
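The "trust nothing" principle can be checked mechanically. The sketch below is one way to confirm which file actually backs an imported module; the helper name `describe_loaded_module` is an assumption made for illustration, and the hash only describes the file currently on disk, not a copy that was imported before a later edit.

```python
import hashlib
import importlib
import inspect
from pathlib import Path


def describe_loaded_module(module_name: str) -> dict:
    """Report which source file backs a module and a hash of that file."""
    module = importlib.import_module(module_name)
    try:
        source_path = inspect.getsourcefile(module)
    except TypeError:
        # Built-in modules such as sys have no source file on disk.
        source_path = None
    if source_path is None:
        return {"module": module_name, "path": None, "sha256": None}
    digest = hashlib.sha256(Path(source_path).read_bytes()).hexdigest()
    return {"module": module_name, "path": source_path, "sha256": digest}


info = describe_loaded_module("json")
print(info["path"])    # compare with the file open in your editor
print(info["sha256"])  # compare with the hash of the deployed artifact
```

If the printed path points at a different checkout, virtualenv, or build directory than the one being edited, the edits will appear to have no effect.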
1.3 Cognitive biases that hinder debugging
| Bias | How it hurts | Counter-strategy |
|---|---|---|
| Confirmation bias | You look for evidence supporting your theory, ignore contradicting evidence | Actively try to disprove your hypothesis |
| Anchoring | First theory dominates even when evidence points elsewhere | Write down 3+ hypotheses before investigating any |
| Recency bias | "I just changed X, so X must be the problem" | Check `git log`; the bug might predate your change |
| Availability bias | "Last time it was a race condition, so it must be again" | Consider all categories: data, logic, timing, config, environment |
| Sunk cost | "I've spent 2 hours on this theory, it must be right" | Set a timebox: 30 min per hypothesis, then move on |
1.4 Rubber duck debugging
Explain the problem out loud (to a duck, a colleague, or a text file):
1. State what the code is supposed to do
2. Walk through the code line by line, explaining each step
3. The act of articulating often reveals the gap between expectation and reality
1.5 Feynman technique
1. Write the bug description as if explaining to a non-programmer
2. Identify gaps in your explanation; those are gaps in your understanding
3. Go back to the code to fill those gaps
4. Simplify your explanation further
2. Systematic Debugging Workflow
2.1 The six-step process
CODEBLOCK1
2.2 Reproducing the bug
Minimal reproduction checklist:
1. Start from a clean state (fresh install, empty database, incognito browser)
2. List exact steps to trigger the bug
3. Note the environment: OS, runtime version, browser, config
4. Strip away unrelated code until the bug is isolated
5. If intermittent: identify the timing/concurrency pattern
CODEBLOCK2
2.3 Binary search debugging
When you don't know where the bug is, bisect:
Code bisection:
CODEBLOCK3
Data bisection:
CODEBLOCK4
Config bisection:
CODEBLOCK5
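Data bisection can also be automated. The following is a minimal sketch, assuming a single bad record triggers the failure on its own; `find_failing_record` and the `fails` callback are hypothetical names, and the callback stands in for "run the program on this subset and report whether the bug appears".

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_failing_record(records: Sequence[T],
                        fails: Callable[[Sequence[T]], bool]) -> T:
    """Halve the input repeatedly until one failing record remains."""
    if not fails(records):
        raise ValueError("the full input does not trigger the failure")
    lo, hi = 0, len(records)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(records[lo:mid]):
            hi = mid  # the culprit is in the first half
        else:
            lo = mid  # assumed to be in the second half
    return records[lo]


def parser_crashes(rows: Sequence[str]) -> bool:
    """Stand-in for the real program: it 'crashes' on any non-numeric row."""
    return any(not row.isdigit() for row in rows)


rows = ["10", "20", "oops", "40", "50"]
culprit = find_failing_record(rows, parser_crashes)
print(culprit)
```

When the failure needs a combination of records, this simple halving is not enough and the assumption above no longer holds.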
2.4 Reading stack traces
CODEBLOCK6
Read bottom-up: The bottom shows where the call originated. The top shows where it failed. The line src/services/user.ts:42 is where to look, but the cause might be in order.ts:87 (passing undefined).
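The same idea applies to chained errors: the frame that failed is not always where the cause lives. As a rough Python illustration (note that Python tracebacks print the most recent call last, the reverse of the ordering described above), this sketch walks the "caused by" chain of an exception. `exception_chain` and `load_user` are hypothetical names.

```python
def exception_chain(exc: BaseException) -> list:
    """Return the error and each underlying cause, outermost first."""
    chain = []
    current = exc
    while current is not None:
        chain.append(f"{type(current).__name__}: {current}")
        # Prefer the explicit cause ("raise ... from err"), else the implicit context.
        current = current.__cause__ or current.__context__
    return chain


def load_user(user_id: int) -> dict:
    users = {}  # empty on purpose, so the lookup fails
    try:
        return users[user_id]
    except KeyError as err:
        raise RuntimeError(f"could not load user {user_id}") from err


try:
    load_user(42)
except RuntimeError as err:
    chain = exception_chain(err)

for entry in chain:
    print(entry)
```

The last entry in the chain is usually the one worth reading first.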
3. Git Bisect
3.1 Manual bisect
CODEBLOCK7
3.2 Automated bisect
CODEBLOCK8
3.3 Bisect with skip
CODEBLOCK9
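Skipping can be folded into an automated run, because `git bisect run` treats exit code 125 as "skip this commit", 0 as good, and other codes from 1 to 127 as bad. The sketch below is one possible check script under that convention; the `npm run build` and `npm test` commands are assumptions about the project, and the function names are illustrative.

```python
import subprocess

GOOD, BAD, SKIP = 0, 1, 125


def bisect_verdict(build_ok: bool, test_ok: bool) -> int:
    """Map build and test results to the exit code git bisect run expects."""
    if not build_ok:
        return SKIP  # an unbuildable commit says nothing about the bug
    return GOOD if test_ok else BAD


def succeeds(command: list) -> bool:
    """Run a command quietly and report whether it exited with 0."""
    try:
        return subprocess.run(command, capture_output=True).returncode == 0
    except FileNotFoundError:
        return False


def main() -> int:
    build_ok = succeeds(["npm", "run", "build"])
    test_ok = build_ok and succeeds(["npm", "test"])
    return bisect_verdict(build_ok, test_ok)
```

To use it, save the file under a name such as `bisect_check.py`, add `import sys` and `sys.exit(main())` at the bottom, and run `git bisect run python3 bisect_check.py`.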
4. Frontend Debugging
4.1 Chrome DevTools — Console power features
CODEBLOCK10
4.2 Sources panel — Advanced breakpoints
| Breakpoint type | How to set | Use case |
|---|---|---|
| Line breakpoint | Click line number | Stop at specific line |
| Conditional | Right-click line → "Add conditional" | Stop only when condition is true |
| Logpoint | Right-click → "Add logpoint" | Log without modifying code |
| DOM breakpoint | Elements panel → right-click → "Break on" | Stop when DOM changes |
| XHR breakpoint | Sources → XHR Breakpoints → add URL pattern | Stop on matching fetch/XHR |
| Event listener | Sources → Event Listener Breakpoints | Stop on click, keypress, etc. |
| Exception | Sources → pause icon → "Pause on exceptions" | Stop on any thrown error |
4.3 Performance panel — Finding slow code
CODEBLOCK11
4.4 Memory panel — Finding leaks
CODEBLOCK12
4.5 CSS debugging techniques
CODEBLOCK13
4.6 React DevTools profiler
CODEBLOCK14
5. Node.js / JavaScript Debugging
5.1 Inspect flag
CODEBLOCK15
5.2 VS Code launch.json
CODEBLOCK16
5.3 Memory leak hunting in Node.js
CODEBLOCK17
5.4 Debugging async code
CODEBLOCK18
5.5 Why is Node.js not exiting?
CODEBLOCK19
CODEBLOCK20
6. Python Debugging
6.1 Built-in debugger
CODEBLOCK21
6.2 ipdb (enhanced debugger)
CODEBLOCK22
CODEBLOCK23
6.3 py-spy — Production profiling
CODEBLOCK24
6.4 Memory profiling
CODEBLOCK25
CODEBLOCK26
6.5 tracemalloc — Built-in memory tracking
CODEBLOCK27
7. System-Level Debugging
7.1 strace — Trace system calls
CODEBLOCK28
Common findings:
CODEBLOCK29
7.2 Process inspection
CODEBLOCK30
7.3 tcpdump — Network packet capture
CODEBLOCK31
8. Database Debugging
8.1 EXPLAIN ANALYZE
CODEBLOCK32
Reading the output:
CODEBLOCK33
Key things to look for:
- Seq Scan on large tables → missing index
- INLINECODE3 much larger than rows estimate → stale statistics (ANALYZE table)
- INLINECODE6 → N+1 query pattern
- INLINECODE7 → not enough work_mem
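Some of these checks can be scripted against the text output of PostgreSQL's EXPLAIN ANALYZE. The sketch below is an assumption-heavy illustration: the thresholds, the `plan_warnings` name, and the sample plan are all invented, and it covers only sequential scans, row-estimate mismatches, and sorts that spill to disk.

```python
import re

ESTIMATE = re.compile(r"\(cost=[\d.]+\.\.[\d.]+ rows=(\d+) width=\d+\)")
ACTUAL = re.compile(r"\(actual time=[\d.]+\.\.[\d.]+ rows=([\d.]+) loops=(\d+)\)")


def plan_warnings(plan_text: str, big_rows: int = 10_000, ratio: int = 10) -> list:
    """Flag plan lines that match the warning signs listed above."""
    warnings = []
    for line in plan_text.splitlines():
        estimate = ESTIMATE.search(line)
        actual = ACTUAL.search(line)
        if estimate and actual:
            estimated_rows = int(estimate.group(1))
            actual_rows = float(actual.group(1))
            if "Seq Scan" in line and actual_rows >= big_rows:
                warnings.append("seq scan over many rows: possible missing index")
            if actual_rows >= ratio * max(estimated_rows, 1):
                warnings.append("actual rows far above estimate: statistics may be stale")
        if "Sort Method: external" in line:
            warnings.append("sort spilled to disk: work_mem may be too small")
    return warnings


# Hypothetical plan text, written by hand for this example.
SAMPLE_PLAN = """\
Sort  (cost=9000.00..9100.00 rows=500 width=16) (actual time=80.1..95.3 rows=98000 loops=1)
  Sort Method: external merge  Disk: 4096kB
  ->  Seq Scan on orders  (cost=0.00..1834.00 rows=500 width=16) (actual time=0.012..14.1 rows=98000 loops=1)
"""

warnings = plan_warnings(SAMPLE_PLAN)
for warning in warnings:
    print(warning)
```

A text scan like this is only a triage aid; the plan itself remains the source of truth.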
8.2 Finding slow queries
CODEBLOCK34
8.3 Lock debugging
CODEBLOCK35
8.4 N+1 query detection
CODEBLOCK36
9. Network Debugging
9.1 curl deep dive
CODEBLOCK37
9.2 DNS debugging
CODEBLOCK38
9.3 SSL/TLS debugging
CODEBLOCK39
9.4 CORS debugging checklist
CODEBLOCK40
10. Memory Debugging
10.1 Common memory leak patterns
| Language | Common cause | Detection |
|---|---|---|
| JavaScript | Event listeners not removed | Heap snapshot comparison |
| JavaScript | Closures capturing large objects | Heap snapshot retainer tree |
| JavaScript | Detached DOM nodes | DevTools Memory → "Detached" filter |
| JavaScript | Growing Map/Set/Array (cache without eviction) | Monitor `process.memoryUsage()` |
| Python | Circular references with `__del__` | `gc.get_referrers()`, `objgraph` |
| Python | Global/module-level caches | `tracemalloc` |
| Go | Goroutine leaks | `runtime.NumGoroutine()`, pprof |
| Go | Unclosed channels | `runtime.Stack()` |
10.2 JavaScript memory leak debugging workflow
CODEBLOCK41
10.3 Container OOM debugging
CODEBLOCK42
11. Performance Profiling
11.1 Flame graphs
CODEBLOCK43
Reading flame graphs:
- X-axis = proportion of time (NOT chronological)
- Y-axis = call stack depth (bottom = entry point, top = leaf functions)
- Wide bars = functions consuming the most CPU time
- Look for "plateaus" — wide, flat tops indicate hot functions
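The "plateau" rule corresponds to self time: samples where a function is the leaf of the stack. As a sketch, assuming profiler output in the collapsed-stack format (one semicolon-joined stack plus a sample count per line), the widest plateau can be computed directly. The sample data and function names are made up.

```python
from collections import Counter


def self_time_by_function(folded_lines: list) -> Counter:
    """Sum sample counts by leaf frame, i.e. the top edge of the flame graph."""
    totals = Counter()
    for line in folded_lines:
        stack, _, count = line.rpartition(" ")
        leaf = stack.split(";")[-1]
        totals[leaf] += int(count)
    return totals


# Hypothetical collapsed stacks: "root;child;leaf sample_count"
samples = [
    "main;handle_request;parse_json 30",
    "main;handle_request;render 10",
    "main;handle_request;query_db;serialize 55",
    "main;idle 5",
]

totals = self_time_by_function(samples)
hottest, count = totals.most_common(1)[0]
share = count / sum(totals.values())
print(f"{hottest}: {share:.0%} of samples")
```

Here `handle_request` is wide in the graph but has no self time, so the place to optimize is the leaf beneath it.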
11.2 Core Web Vitals debugging
| Metric | Target | How to debug |
|---|---|---|
| LCP (Largest Contentful Paint) | < 2.5s | DevTools → Performance → "LCP" marker; check image loading, font loading, render-blocking resources |
| INP (Interaction to Next Paint) | < 200ms | DevTools → Performance → click "Interactions"; look for long tasks blocking the main thread |
| CLS (Cumulative Layout Shift) | < 0.1 | DevTools → Performance → "Layout Shifts"; add explicit width/height to images and ads |
CODEBLOCK44
11.3 Load testing for debugging
CODEBLOCK45
12. Logging Strategies
12.1 Structured logging
CODEBLOCK46
12.2 Log levels
| Level | When to use | Example |
|---|---|---|
| INLINECODE15 | Application cannot continue | Database connection lost permanently |
| INLINECODE16 | Operation failed, needs attention | Payment processing failed |
| `warn` | Unexpected but handled | Rate limit approaching threshold |
| `info` | Significant business events | User registered, order placed |
| `debug` | Detailed technical info | SQL query executed, cache hit/miss |
| `trace` | Very fine-grained | Function entry/exit, variable values |
12.3 Correlation IDs
CODEBLOCK47
12.4 OpenTelemetry basics
CODEBLOCK48
13. Debugging in Production
13.1 Debug without redeploying
CODEBLOCK49
13.2 Safe debug endpoints
CODEBLOCK50
13.3 Sentry error tracking
CODEBLOCK51
14. Common Pitfalls
| Pitfall | Symptom | Investigation Approach |
|---|---|---|
| Debugging the wrong environment | Fix works locally, not in staging | Compare env vars, Node versions, and OS; diff the `printenv` output from each environment |
| Stale code running | Changes seem to have no effect | Hard refresh (Ctrl+Shift+R); restart dev server; check build output timestamps |
| Caching hiding the bug | Bug appears intermittently | Disable all caches (browser, CDN, Redis, ORM query cache); test in incognito |
| Race condition | Bug only happens under load or "randomly" | Add logging with timestamps; pause under `--inspect-brk` to change the timing; test with concurrent requests |
| Timezone bug | Dates off by hours; works in some regions | Log `new Date().toISOString()` at each step; check DB timezone settings; use UTC everywhere |
| Encoding issue | Garbled text, emoji broken, special chars wrong | Check Content-Type headers; verify UTF-8 at every boundary (DB, API, file I/O) |
| Silent error swallowed | Code does nothing; no error visible | Search for empty catch blocks; add `.catch(console.error)` to all promises |
| Missing await | Function returns Promise instead of value | TypeScript strict mode; search for calls to `async` functions that are missing `await` |
| Circular dependency | Module is undefined at import time | Check import order; use dynamic imports; restructure to break the cycle |
| DNS resolution failure | "ENOTFOUND" errors in containers | Check `/etc/resolv.conf`; verify DNS from inside the container with `nslookup` |
| Connection pool exhaustion | Timeouts after running fine for hours | Monitor active connections; check for uncommitted transactions; add pool max/idle settings |
| Off-by-one error | Wrong count, missing first/last item | Log array lengths and indices; test boundary values: 0, 1, N-1, N |
| Environment variable missing | `undefined` used as string, silent failures | Log all env vars on startup (redacted); use zod to validate env at boot |
| File descriptor leak | "EMFILE: too many open files" | `lsof -p PID \| wc -l`; check for unclosed streams, database connections, or file handles |
| Wrong dependency version | Code works in one project but not another | Check `npm ls package-name`; delete `node_modules` and reinstall; check for hoisting issues |
| Debugging minified code | Stack traces show line 1, column 43827 | Enable source maps; upload them to Sentry; build without minification when debugging (e.g. `--no-minify`, if your bundler supports it) |
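The "Environment variable missing" row suggests validating configuration at boot and logging it redacted. The table names zod for that; the sketch below is a plain Python stand-in for the same idea, not zod's API. `validate_env`, `redact`, and the variable names are hypothetical.

```python
import os


def validate_env(required: list, env=None) -> dict:
    """Fail fast at startup if any required variable is unset or empty."""
    env = os.environ if env is None else env
    missing = sorted(name for name in required if not env.get(name))
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in required}


def redact(value: str, visible: int = 2) -> str:
    """Keep the first few characters and mask the rest for safe logging."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Example with an explicit mapping instead of the real process environment.
config = validate_env(
    ["DATABASE_URL", "API_KEY"],
    {"DATABASE_URL": "postgres://localhost/app", "API_KEY": "secret"},
)
for name, value in config.items():
    print(f"{name}={redact(value)}")
```

Raising at boot turns a silent "undefined used as a string" failure into an immediate, named error.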
Debug Detective: Systematic Debugging Methodology
Find and fix bugs efficiently across the full stack using structured investigation techniques.
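1. Debugging Mindset
1.1 The scientific method for debugging
1. Observe: what exactly is happening? (symptoms, error messages, logs)
2. Hypothesize: what could cause this? (list 3+ possibilities)
3. Predict: if hypothesis X holds, then Y should be true
4. Test: design the smallest experiment that checks the prediction
5. Analyze: did the test confirm or refute the hypothesis?
6. Repeat: if refuted, move to the next hypothesis; if confirmed, fix and verify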
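1.2 Key debugging principles
- The bug is never where you think it is. Widen your search radius before going deep.
- Reproduce first, fix second. A bug you can't reproduce is a bug you can't verify as fixed.
- Change one thing at a time. Multiple simultaneous changes make it impossible to identify the fix.
- Trust nothing. Verify assumptions: check that the code you're reading is the code that's running.
- Read the error message. Fully. Including the stack trace. Including the "caused by" chain.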
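1.3 Cognitive biases that hinder debugging
| Bias | How it hurts | Counter-strategy |
|---|---|---|
| Confirmation bias | You look for evidence supporting your theory and ignore contradicting evidence | Actively try to disprove your hypothesis |
| Anchoring | The first theory dominates even when evidence points elsewhere | Write down 3+ hypotheses before investigating any |
| Recency bias | "I just changed X, so X must be the problem" | Check `git log`; the bug might predate your change |
| Availability bias | "Last time it was a race condition, so it must be again" | Consider all categories: data, logic, timing, config, environment |
| Sunk cost | "I've spent 2 hours on this theory, it must be right" | Set a timebox: 30 min per hypothesis, then move on |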
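1.4 Rubber duck debugging
Explain the problem out loud (to a duck, a colleague, or a text file):
1. State what the code is supposed to do
2. Walk through the code line by line, explaining each step
3. The act of articulating often reveals the gap between expectation and reality
1.5 Feynman technique
1. Write the bug description as if explaining to a non-programmer
2. Identify gaps in your explanation; those are gaps in your understanding
3. Go back to the code to fill those gaps
4. Simplify your explanation further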
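2. Systematic Debugging Workflow
2.1 The six-step process
```
┌──────────────┐
│ 1. Reproduce │ ← Can you trigger the bug reliably?
└──────┬───────┘
       ▼
┌──────────────┐
│ 2. Isolate   │ ← Narrow down: which component, input, or path?
└──────┬───────┘
       ▼
┌──────────────┐
│ 3. Identify  │ ← Find the root cause
└──────┬───────┘
       ▼
┌──────────────┐
│ 4. Fix       │ ← Minimal, targeted change
└──────┬───────┘
       ▼
┌──────────────┐
│ 5. Verify    │ ← Bug no longer reproduces; no regressions
└──────┬───────┘
       ▼
┌──────────────┐
│ 6. Prevent   │ ← Add a test, monitoring, or documentation
└──────────────┘
```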
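2.2 Reproducing the bug
Minimal reproduction checklist:
1. Start from a clean state (fresh install, empty database, incognito browser)
2. List exact steps to trigger the bug
3. Note the environment: OS, runtime version, browser, config
4. Strip away unrelated code until the bug is isolated
5. If intermittent: identify the timing/concurrency pattern
```bash
# Create a minimal reproduction project
mkdir bug-repro && cd bug-repro
npm init -y
# Add only the minimum dependencies needed to demonstrate the bug
npm install problematic-library@1.2.3
# Write the smallest script that triggers the issue
```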
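2.3 Binary search debugging
When you don't know where the bug is, bisect:
Code bisection:
```
// Add a return/exit at the midpoint of the suspect code
// If the bug disappears → the bug is after the midpoint
// If the bug persists → the bug is before the midpoint
// Repeat on the narrowed half
```
Data bisection:
```bash
# If a large input triggers the bug, split it in half
# (the line counts below assume input.csv has 1000 lines)
head -n 500 input.csv > first_half.csv
tail -n 500 input.csv > second_half.csv
# Test each half: which one triggers the bug?
```
Config bisection:
```bash
# Comment out half of the config, then test
# Narrow down to the config option that causes the problem
```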
2.4 Reading stack traces
```
Error: Cannot read properties of undefined (reading 'email')
    at getUserEmail (src/services/user.ts:42:18)              ← where it crashed
    at processOrder (src/services/order.ts:87:24)             ← who called it
    at OrderController.create (src/controllers/order.ts:23:5) ← entry point
    at Layer.handle (node_modules/express/lib/router/layer.js:95:5)
```
Read bottom-up: the bottom shows where the call originated and the top shows where it failed. The line src/services/user.ts:42 is where to look, but the cause might be in order.ts:87 (passing undefined).
3. Git Bisect
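3.1 Manual bisect
```bash
# Start bisecting
git bisect start
# Mark the current (broken) commit as bad
git bisect bad
# Mark a known-good commit (for example, the last release tag)
git bisect good v2.0.0
# Git checks out a midpoint commit: test it
# If this commit is broken:
git bisect bad
# If this commit works:
git bisect good
# Repeat until Git identifies the first bad commit
# Git prints: abc1234 is the first bad commit
# Done: reset
git bisect reset
```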
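3.2 Automated bisect
```bash
# Automate: supply a test script that exits 0 (good) or 1 (bad)
git bisect start HEAD v2.0.0
git bisect run npm test
# Or use a custom script
git bisect run bash -c '
  npm run build 2>/dev/null && \
  node -e "
    const { buggyFunction } = require(\"./dist\");
    const result = buggyFunction(\"test-input\");
    const expected = \"expected-output\"; // placeholder: the known-good result
    process.exit(result === expected ? 0 : 1);
  "
'
# Reset when finished
git bisect reset
```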
3.3 Bisect with skip
```bash
# If a commit can't be tested (for example, the build fails for an unrelated reason)
git bisect skip
# Skip a range of untestable commits
git bisect skip v2.0.1..v2.0.5
```
4. Frontend Debugging
4.1 Chrome DevTools: Console power features
```js
// $0: reference to the element currently selected in the Elements panel
$0.textContent
// $$(): shortcut for querySelectorAll
$$('button.primary').length
// copy(): copy any value to the clipboard
copy(JSON.stringify(data, null, 2))
// monitor(): log every call to a function
monitor(fetch)
// unmonitor(fetch) stops it
// monitorEvents(): log events on an element (here, clicks)
monitorEvents($0, 'click')
// unmonitorEvents($0) stops it
// queryObjects(): find all instances of a constructor
queryObjects(Promise) // finds all live Promises
// console.table(): display arrays/objects as a table
console.table(users, ['name', 'email', 'role'])
// console.time/timeEnd: measure execution time
console.time('render')
renderComponent()
console.timeEnd('render') // render: 42.3ms
// console.group: organize related logs
console.group('API Request')
console.log('URL:', url)
console.log('Method:', method)
console.log('Body:', body)
console.groupEnd()
// console.assert: log only when the condition fails
console.assert(user.id, 'User ID is missing', user)
```
4.2 Sources panel: Advanced breakpoints
| Breakpoint type | How to set | Use case |
|---|---|---|
| Line breakpoint | Click line number | Stop at specific line |
| Conditional |