Debug Detective — Systematic Debugging Methodology

Find and fix bugs efficiently across the full stack using structured investigation techniques.

1. Debugging Mindset

1.1 The scientific method for debugging

CODEBLOCK0

1.2 Key debugging principles

- The bug is never where you think it is. Widen your search radius before going deep.
- Reproduce first, fix second. A bug you can't reproduce is a bug you can't verify as fixed.
- Change one thing at a time. Multiple simultaneous changes make it impossible to identify the fix.
- Trust nothing. Verify assumptions: check that the code you're reading is the code that's running.
- Read the error message. Fully. Including the stack trace. Including the "caused by" chain.

1.3 Cognitive biases that hinder debugging

Bias	How it hurts	Counter-strategy
Confirmation bias	You look for evidence supporting your theory, ignore contradicting evidence	Actively try to disprove your hypothesis
Anchoring	First theory dominates even when evidence points elsewhere	Write down 3+ hypotheses before investigating any
Recency bias	"I just changed X, so X must be the problem"	Check git log: the bug might predate your change
Availability bias	"Last time it was a race condition, so it must be again"	Consider all categories: data, logic, timing, config, environment
Sunk cost	"I've spent 2 hours on this theory, it must be right"	Set a timebox: 30 min per hypothesis, then move on

1.4 Rubber duck debugging

Explain the problem out loud (to a duck, a colleague, or a text file):

1. State what the code is supposed to do
2. Walk through the code line by line, explaining each step
3. The act of articulating often reveals the gap between expectation and reality

1.5 Feynman technique

1. Write the bug description as if explaining to a non-programmer
2. Identify gaps in your explanation: those are gaps in your understanding
3. Go back to the code to fill those gaps
4. Simplify your explanation further

2. Systematic Debugging Workflow

2.1 The six-step process

CODEBLOCK1

2.2 Reproducing the bug

Minimal reproduction checklist:

1. Start from a clean state (fresh install, empty database, incognito browser)
2. List exact steps to trigger the bug
3. Note the environment: OS, runtime version, browser, config
4. Strip away unrelated code until the bug is isolated
5. If intermittent: identify the timing/concurrency pattern

CODEBLOCK2
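Two items on the checklist above, noting the environment and characterizing an intermittent failure, are easy to script. The following is a minimal Python sketch, not part of the original guide; `environment_report` and `failure_rate` are hypothetical helper names, and `trigger` stands for whatever callable reproduces your bug.

```python
import platform
import sys


def environment_report() -> dict:
    """Collect the environment details a bug report should state."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "python": platform.python_version(),
        "executable": sys.executable,
    }


def failure_rate(trigger, attempts: int = 50) -> float:
    """Run `trigger` repeatedly and return the fraction of runs that raised."""
    failures = 0
    for _ in range(attempts):
        try:
            trigger()
        except Exception:
            failures += 1
    return failures / attempts
```

Measuring the failure rate before and after a change gives you a number to compare, which matters when a bug fires only some of the time and a single passing run proves little.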

2.3 Binary search debugging

When you don't know where the bug is, bisect:

Code bisection:
CODEBLOCK3

Data bisection:
CODEBLOCK4

Config bisection:
CODEBLOCK5
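The data bisection idea can also be automated. Below is a rough Python sketch under stated assumptions: `fails` is a hypothetical predicate you supply that returns True when a chunk of input still triggers the bug, and the failure is assumed to be caused by rows that sit together in one half. When the bug needs rows from both halves, the loop stops early and returns the smallest chunk it could confirm.

```python
from typing import Callable, Sequence


def bisect_failing_input(
    rows: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> Sequence[str]:
    """Shrink `rows` by halving while one half alone still triggers the failure."""
    current = rows
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if fails(first):
            current = first
        elif fails(second):
            current = second
        else:
            # Neither half fails alone: the bug needs rows from both halves.
            break
    return current
```

For a single bad row in N rows this takes about log2(N) rounds, so a 1,000-row file narrows to one row in roughly ten steps.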

2.4 Reading stack traces

CODEBLOCK6

Read bottom-up: The bottom shows where the call originated. The top shows where it failed. The line src/services/user.ts:42 is where to look, but the cause might be in order.ts:87 (passing undefined).
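The same "crash site versus cause" distinction shows up in chained exceptions. The sketch below is in Python rather than the Node.js of the trace above, and the function names are illustrative only. It walks the "caused by" chain so the original error is not lost behind the wrapper.

```python
def cause_chain(exc: BaseException) -> list[str]:
    """Return 'Type: message' for exc and every exception it was raised from."""
    chain = []
    current = exc
    while current is not None:
        chain.append(f"{type(current).__name__}: {current}")
        # __cause__ is set by `raise ... from ...`; __context__ by implicit chaining.
        current = current.__cause__ or current.__context__
    return chain


def get_user_email(user):
    return user["email"]  # crashes here when user is None


def process_order(order):
    try:
        return get_user_email(order.get("user"))
    except TypeError as err:
        # The real defect: the order arrived without a user attached.
        raise ValueError("order has no user attached") from err
```

Calling `process_order({})` raises a `ValueError`, and `cause_chain` on that exception lists the wrapper first and the underlying `TypeError` second, which mirrors reading a trace from the failure back to its origin.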

3. Git Bisect

3.1 Manual bisect

CODEBLOCK7

3.2 Automated bisect

CODEBLOCK8
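`git bisect run` interprets the script's exit status: 0 marks the commit good, 125 tells bisect to skip it, and other codes from 1 to 127 mark it bad. A test script can use that to keep build failures from being blamed as the regression. The following Python sketch is one way to do it; the `npm` commands are assumptions about the project, and `classify`, `run`, and `main` are hypothetical names.

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def classify(build_ok: bool, test_ok: bool) -> int:
    """Map the build and test outcome of one commit to a bisect exit code."""
    if not build_ok:
        return SKIP  # cannot test this commit; let bisect pick another one
    return GOOD if test_ok else BAD


def run(cmd: list[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def main() -> int:
    # Assumed project commands; replace with your own build and test steps.
    build_ok = run(["npm", "run", "build"])
    test_ok = build_ok and run(["npm", "test"])
    return classify(build_ok, test_ok)

# When saved as a script (say bisect_check.py), finish with: sys.exit(main())
```

With that last line added, the script would be invoked as `git bisect run python bisect_check.py`.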

3.3 Bisect with skip

CODEBLOCK9

4. Frontend Debugging

4.1 Chrome DevTools — Console power features

CODEBLOCK10

4.2 Sources panel — Advanced breakpoints

Breakpoint type	How to set	Use case
Line breakpoint	Click line number	Stop at specific line
Conditional

4.3 Performance panel — Finding slow code

CODEBLOCK11

4.4 Memory panel — Finding leaks

CODEBLOCK12

4.5 CSS debugging techniques

CODEBLOCK13

4.6 React DevTools profiler

CODEBLOCK14

5. Node.js / JavaScript Debugging

5.1 Inspect flag

CODEBLOCK15

5.2 VS Code launch.json

CODEBLOCK16

5.3 Memory leak hunting in Node.js

CODEBLOCK17

5.4 Debugging async code

CODEBLOCK18

5.5 Why is Node.js not exiting?

CODEBLOCK19

CODEBLOCK20

6. Python Debugging

6.1 Built-in debugger

CODEBLOCK21

6.2 ipdb (enhanced debugger)

CODEBLOCK22

CODEBLOCK23

6.3 py-spy — Production profiling

CODEBLOCK24

6.4 Memory profiling

CODEBLOCK25

CODEBLOCK26

6.5 tracemalloc — Built-in memory tracking

CODEBLOCK27
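As a small illustration of the snapshot-comparison workflow that `tracemalloc` supports, the sketch below takes a snapshot before and after a suspect operation and ranks source lines by growth. `leaky_step` and `_cache` are stand-ins invented for the example, not code from any real application.

```python
import tracemalloc

_cache = []  # stands in for a structure that grows without bound


def leaky_step():
    _cache.append(bytearray(100_000))


def top_growth(action, repeats: int = 20, limit: int = 3):
    """Return the source lines whose allocations grew most while running `action`."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for _ in range(repeats):
            action()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to sorts by the absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]
```

Each returned entry has a `size_diff` in bytes and a `traceback` pointing at the allocating line, so `for stat in top_growth(leaky_step): print(stat)` is usually enough to see where the growth comes from.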

7. System-Level Debugging

7.1 strace — Trace system calls

CODEBLOCK28

Common findings:
CODEBLOCK29

7.2 Process inspection

CODEBLOCK30

7.3 tcpdump — Network packet capture

CODEBLOCK31

8. Database Debugging

8.1 EXPLAIN ANALYZE

CODEBLOCK32

Reading the output:
CODEBLOCK33

Key things to look for:

- Seq Scan on large tables → missing index
- INLINECODE3 much larger than rows estimate → stale statistics (ANALYZE table)
- INLINECODE6 → N+1 query pattern
- INLINECODE7 → not enough work_mem
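The first item, a sequential scan that disappears once an index exists, can be tried locally without a PostgreSQL server. The sketch below swaps in SQLite and its `EXPLAIN QUERY PLAN` statement, which is a different and much simpler tool than PostgreSQL's `EXPLAIN ANALYZE`; the table, column, and index names are invented for the example.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[-1] for row in rows)  # last column holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT * FROM users WHERE email = ?"
plan_before = query_plan(conn, lookup, ("user42@example.com",))

conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, ("user42@example.com",))
```

Printing the two plans shows a full scan of `users` before the index and a search that names `idx_users_email` afterwards. The exact wording varies between SQLite versions.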

8.2 Finding slow queries

CODEBLOCK34

8.3 Lock debugging

CODEBLOCK35

8.4 N+1 query detection

CODEBLOCK36
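One generic way to spot an N+1 pattern is to count the statements a code path sends to the database. The sketch below does this with Python's built-in `sqlite3` module and its `set_trace_callback` hook; the schema and function names are made up for the example, and an ORM would normally sit where the raw queries are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO posts (author_id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    """
)

executed = []
conn.set_trace_callback(executed.append)  # record every SQL statement sent


def titles_n_plus_one():
    """One query for the authors, then one more query per author."""
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors").fetchall():
        rows = conn.execute("SELECT title FROM posts WHERE author_id = ?", (author_id,))
        result[name] = [title for (title,) in rows]
    return result


def titles_joined():
    """The same data fetched with a single JOIN."""
    result = {}
    query = "SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id"
    for name, title in conn.execute(query):
        result.setdefault(name, []).append(title)
    return result


def count_queries(fn) -> int:
    executed.clear()
    fn()
    return len(executed)
```

A query count that grows with the number of rows returned, rather than staying constant, is the signature to look for. Asserting on the count in a test keeps the pattern from creeping back.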

9. Network Debugging

9.1 curl deep dive

CODEBLOCK37

9.2 DNS debugging

CODEBLOCK38

9.3 SSL/TLS debugging

CODEBLOCK39

9.4 CORS debugging checklist

CODEBLOCK40

10. Memory Debugging

10.1 Common memory leak patterns

Language	Common cause	Detection
JavaScript	Event listeners not removed	Heap snapshot comparison
JavaScript

10.2 JavaScript memory leak debugging workflow

CODEBLOCK41

10.3 Container OOM debugging

CODEBLOCK42

11. Performance Profiling

11.1 Flame graphs

CODEBLOCK43

Reading flame graphs:

- X-axis = proportion of time (NOT chronological)
- Y-axis = call stack depth (bottom = entry point, top = leaf functions)
- Wide bars = functions consuming the most CPU time
- Look for "plateaus": wide, flat tops indicate hot functions
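The reason the X-axis is not chronological is that identical stacks are merged before drawing. The Python sketch below illustrates that aggregation on a handful of made-up stack samples. It produces the semicolon-joined "folded" form that flame graph tools commonly accept, and is not any particular profiler's implementation.

```python
from collections import Counter


def fold_stacks(samples: list[list[str]]) -> Counter:
    """Merge identical call stacks, as flame graph tooling does before drawing."""
    return Counter(";".join(stack) for stack in samples)


def frame_widths(samples: list[list[str]]) -> dict[str, float]:
    """Fraction of all samples in which each function appears on the stack."""
    totals = Counter()
    for stack in samples:
        for frame in set(stack):  # count a recursive frame once per sample
            totals[frame] += 1
    return {frame: count / len(samples) for frame, count in totals.items()}


# Four samples in the order they were taken (entry point first, leaf last).
samples = [
    ["main", "handle", "parse"],
    ["main", "handle", "render"],
    ["main", "handle", "parse"],
    ["main", "idle"],
]
```

The first and third samples were not adjacent in time, yet they fold into one entry, so `parse` is drawn as a single bar half the width of `main`.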

11.2 Core Web Vitals debugging

Metric	Target	How to debug
LCP (Largest Contentful Paint)	< 2.5s	DevTools → Performance → "LCP" marker; check image loading, font loading, render-blocking resources
INP (Interaction to Next Paint)	< 200ms	DevTools → Performance → click "Interactions"; look for long tasks blocking the main thread
CLS (Cumulative Layout Shift)	< 0.1	DevTools → Performance → "Layout Shifts"; add explicit width/height to images and ads

CODEBLOCK44

11.3 Load testing for debugging

CODEBLOCK45

12. Logging Strategies

12.1 Structured logging

CODEBLOCK46
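For readers working in Python, structured logging can be approximated with the standard library alone. The formatter below is a minimal sketch: it emits one JSON object per record and merges in a `context` dictionary passed through `extra`. The `context` key is a convention chosen for this example, and a production formatter would normally add a timestamp too.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra={"context": {...}}` land on the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)
```

Attach it with `handler.setFormatter(JsonFormatter())` and log with `logger.error("payment failed", extra={"context": {"order_id": "order-17"}})`; each line can then be filtered by field instead of by regular expression.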

12.2 Log levels

Level	When to use	Example
INLINECODE15	Application cannot continue	Database connection lost permanently
INLINECODE16

12.3 Correlation IDs

CODEBLOCK47
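A correlation ID only helps if every log line in a request carries it without each call site passing it along. One way to sketch that in Python is a `contextvars.ContextVar` plus a logging filter; `handle_request` and the `"-"` default are placeholders invented for this example.

```python
import contextvars
import logging
import uuid
from typing import Optional

correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Copy the current correlation ID onto every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def handle_request(logger: logging.Logger, incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was sent, otherwise mint a new one."""
    request_id = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(request_id)
    try:
        logger.info("request started")
        logger.info("request finished")
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
    return request_id
```

With the filter attached to a handler, a format string such as `"%(correlation_id)s %(message)s"` prints the ID on every line, and the same value can be forwarded to downstream services in a request header.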

12.4 OpenTelemetry basics

CODEBLOCK48

13. Debugging in Production

13.1 Debug without redeploying

CODEBLOCK49

13.2 Safe debug endpoints

CODEBLOCK50

13.3 Sentry error tracking

CODEBLOCK51

14. Common Pitfalls

Pitfall	Symptom	Investigation Approach
Debugging the wrong environment	Fix works locally, not in staging	Compare env vars, node versions, OS; use `printenv` diff
Stale code running	Changes seem to have no effect	Hard refresh (Ctrl+Shift+R); restart dev server; check build output timestamps
Caching hiding the bug	Bug appears intermittently	Disable all caches (browser, CDN, Redis, ORM query cache); test in incognito
Race condition	Bug only happens under load or "randomly"	Add logging with timestamps; use --inspect-brk to slow execution; test with concurrent requests
Timezone bug	Dates off by hours; works in some regions	Log new Date().toISOString() at each step; check DB timezone settings; use UTC everywhere
Encoding issue	Garbled text, emoji broken, special chars wrong	Check Content-Type headers; verify UTF-8 at every boundary (DB, API, file I/O)
Silent error swallowed	Code does nothing; no error visible	Search for empty catch blocks; add .catch(console.error) to all promises
Missing await	Function returns Promise instead of value	TypeScript strict mode; search for async functions without await on calls
Circular dependency	Module is undefined at import time	Check import order; use dynamic imports; restructure to break the cycle
DNS resolution failure	"ENOTFOUND" errors in containers	Check /etc/resolv.conf; verify DNS from inside the container with nslookup
Connection pool exhaustion	Timeouts after running fine for hours	Monitor active connections; check for uncommitted transactions; add pool max/idle settings
Off-by-one error	Wrong count, missing first/last item	Log array lengths and indices; test boundary values: 0, 1, N-1, N
Environment variable missing	undefined used as string, silent failures	Log all env vars on startup (redacted); use zod to validate env at boot
File descriptor leak	"EMFILE: too many open files"	lsof -p PID | wc -l; check for unclosed streams, database connections, or file handles
Wrong dependency version	Code works in one project but not another	Check npm ls package-name; delete node_modules and reinstall; check for hoisting issues
Debugging minified code	Stack traces show line 1, column 43827	Enable source maps; upload them to Sentry; use --no-minify for debugging

Debug Detective: Systematic Debugging Methodology

Find and fix bugs efficiently across the full stack using structured investigation techniques.

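1. Debugging Mindset

1.1 The scientific method for debugging

1. Observe: what exactly is happening? (symptoms, error messages, logs)
2. Hypothesize: what could cause this? (list 3+ possibilities)
3. Predict: if hypothesis X is true, then Y should be true
4. Test: design the smallest experiment that checks the prediction
5. Analyze: did the test confirm or refute the hypothesis?
6. Repeat: if refuted, move to the next hypothesis; if confirmed, fix and verify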

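1.2 Key debugging principles

- The bug is never where you think it is. Widen your search radius before going deep.
- Reproduce first, fix second. A bug you can't reproduce is a bug you can't verify as fixed.
- Change one thing at a time. Multiple simultaneous changes make it impossible to tell which one fixed the problem.
- Trust nothing. Verify assumptions: check that the code you're reading is the code that's running.
- Read the error message. Fully. Including the stack trace. Including the "caused by" chain.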

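1.3 Cognitive biases that hinder debugging

Bias	How it hurts	Counter-strategy
Confirmation bias	You look for evidence supporting your theory, ignore contradicting evidence	Actively try to disprove your hypothesis
Anchoring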

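1.4 Rubber duck debugging

Explain the problem out loud (to a duck, a colleague, or a text file):

1. State what the code is supposed to do
2. Walk through the code line by line, explaining each step
3. The act of articulating often reveals the gap between expectation and reality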

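1.5 Feynman technique

1. Write the bug description as if explaining to a non-programmer
2. Identify gaps in your explanation: those are gaps in your understanding
3. Go back to the code to fill those gaps
4. Simplify your explanation further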

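2. Systematic Debugging Workflow

2.1 The six-step process

┌─────────────┐
│ 1. Reproduce│ ← Can you trigger the bug reliably?
└──────┬──────┘
       ▼
┌─────────────┐
│ 2. Isolate  │ ← Narrow down: which component, input, or path?
└──────┬──────┘
       ▼
┌─────────────┐
│ 3. Identify │ ← Find the root cause
└──────┬──────┘
       ▼
┌─────────────┐
│ 4. Fix      │ ← Minimal, targeted change
└──────┬──────┘
       ▼
┌─────────────┐
│ 5. Verify   │ ← Bug no longer reproduces; no regressions
└──────┬──────┘
       ▼
┌─────────────┐
│ 6. Prevent  │ ← Add a test, monitoring, or documentation
└─────────────┘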

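2.2 Reproducing the bug

Minimal reproduction checklist:

1. Start from a clean state (fresh install, empty database, incognito browser)
2. List exact steps to trigger the bug
3. Note the environment: OS, runtime version, browser, config
4. Strip away unrelated code until the bug is isolated
5. If intermittent: identify the timing/concurrency pattern

bash
# Create a minimal reproduction project
mkdir bug-repro && cd bug-repro
npm init -y

# Add only the minimum dependencies needed to demonstrate the bug
npm install problematic-library@1.2.3

# Write the smallest script that triggers the problem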

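2.3 Binary search debugging

When you don't know where the bug is, bisect:

Code bisection:

// Add a return/exit at the midpoint of the suspect code
// If the bug disappears → the bug is after the midpoint
// If the bug persists → the bug is before the midpoint
// Repeat on the narrowed half

Data bisection:
bash
# If a large input causes the bug, split it in half
# (the line counts below assume a 1,000-line file)
head -n 500 input.csv > first_half.csv
tail -n 500 input.csv > second_half.csv
# Test each half: which one triggers the bug?

Config bisection:
bash
# Comment out half of the config, then test
# Narrow down to the config option that causes the problem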

2.4 Reading stack traces

Error: Cannot read properties of undefined (reading 'email')
    at getUserEmail (src/services/user.ts:42:18) ← where it crashed
    at processOrder (src/services/order.ts:87:24) ← who called it
    at OrderController.create (src/controllers/order.ts:23:5) ← entry point
    at Layer.handle (node_modules/express/lib/router/layer.js:95:5)

Read bottom-up: The bottom shows where the call originated. The top shows where it failed. The line src/services/user.ts:42 is where to look, but the cause might be in order.ts:87 (passing undefined).

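3. Git Bisect

3.1 Manual bisect

bash
# Start bisecting
git bisect start

# Mark the current (broken) commit as bad
git bisect bad

# Mark a known-good commit (for example, the last release tag)
git bisect good v2.0.0

# Git checks out a midpoint: test it
# If this commit is broken:
git bisect bad
# If this commit works:
git bisect good

# Repeat until Git identifies the first bad commit
# Git prints: abc1234 is the first bad commit

# Done: reset
git bisect reset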

3.2 Automated bisect

bash
# Automate: provide a test script that exits 0 (good) or 1 (bad)
git bisect start HEAD v2.0.0
git bisect run npm test

# Or use a custom script ("expected" is a placeholder for the correct result)
git bisect run bash -c '
  npm run build 2>/dev/null && \
  node -e "
    const { buggyFunction } = require(\"./dist\");
    const result = buggyFunction(\"test-input\");
    process.exit(result === \"expected\" ? 0 : 1);
  "
'

# Reset when done
git bisect reset

3.3 Bisect with skip

bash
# If a commit can't be tested (for example, the build fails for an unrelated reason)
git bisect skip

# Skip a range of untestable commits
git bisect skip v2.0.1..v2.0.5

4. Frontend Debugging

4.1 Chrome DevTools: Console power features

js
// $0: reference to the element currently selected in the Elements panel
$0.textContent

// $$(): shortcut for querySelectorAll
$$('button.primary').length

// copy(): copy any value to the clipboard
copy(JSON.stringify(data, null, 2))

// monitor(): log every call to a function
monitor(fetch)
// unmonitor(fetch) to stop

// monitorEvents(): log all events on an element
monitorEvents($0, 'click')
// unmonitorEvents($0) to stop

// queryObjects(): find all instances of a constructor
queryObjects(Promise) // finds all live Promises

// table(): display arrays/objects as a table
console.table(users, ['name', 'email', 'role'])

// time/timeEnd: measure execution time
console.time('render')
renderComponent()
console.timeEnd('render') // render: 42.3ms

// group: organize related logs
console.group('API Request')
console.log('URL:', url)
console.log('Method:', method)
console.log('Body:', body)
console.groupEnd()

// assert: log only when the condition fails
console.assert(user.id, 'User ID is missing', user)

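4.2 Sources panel: Advanced breakpoints

Breakpoint type	How to set	Use case
Line breakpoint	Click the line number	Stop at a specific line
Conditional breakpoint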
