Linux Kernel Crash Debugging
This skill guides you through analyzing Linux kernel crash dumps using the crash utility.
Installation
Claude Code
CODEBLOCK0
OpenClaw
CODEBLOCK1
Quick Start
Starting a Session
CODEBLOCK2
Core Debugging Workflow
CODEBLOCK3
🤖 Agent Execution Directives
If you are an AI agent using this skill, do not invoke crash interactively: it will block your subshell.
1. Use the bundled wrapper `./scripts/agent-crash.sh`, which maps directly to the workflows below but safely truncates output:
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore triage` - Safely runs the initial `sys`, `log`, and `bt`.
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore flow-oom` - Top 15 memory checks.
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore flow-deadlock` - Pulls the stacks of tasks in the UN (uninterruptible) state.
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore dis-regs <func> <pid>` - Assembly regression.
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore check-poison <addr>` - Pattern-matches memory poison values.
2. Fallback strategy: if the macros don't solve the issue, fall back to basic primitives manually: `./scripts/agent-crash.sh -k vmlinux -c vmcore run "rd ffff880123456780"`.
3. Check `references/agentic-heuristics.md` for extended expert methodologies.
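The directive above exists because crash waits on standard input when started interactively. The sketch below is only an illustration of that idea and is not the contents of `agent-crash.sh`, which is not shown in this document. It assumes crash's `-s` (silent startup) and `-i <file>` (read commands from a file) options; the function names and the 200-line cap are hypothetical.

```shell
# Hypothetical sketch; NOT the bundled agent-crash.sh.

# Pass through at most N lines of stdin and report how many were dropped.
truncate_output() {
    local max="${1:-200}"
    awk -v max="$max" '
        NR <= max { print }
        END { if (NR > max) printf "[... %d more lines truncated ...]\n", NR - max }
    '
}

# Run crash commands without ever reaching an interactive prompt.
# Usage: run_crash_batch <vmlinux> <vmcore> <cmd> [<cmd>...]
run_crash_batch() {
    local vmlinux="$1" vmcore="$2" cmdfile
    shift 2
    cmdfile="$(mktemp)"
    # One command per line; the trailing "exit" ends the session.
    printf '%s\n' "$@" exit > "$cmdfile"
    crash -s -i "$cmdfile" "$vmlinux" "$vmcore" 2>&1 | truncate_output 200
    rm -f "$cmdfile"
}
```

For example, `run_crash_batch vmlinux vmcore sys "bt -a"` would print at most 200 lines and then return control to the caller.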
Prerequisites
| Item | Requirement |
|---|---|
| vmlinux | Must have debug symbols (`CONFIG_DEBUG_INFO=y`) |
| vmcore | kdump/netdump/diskdump/ELF format |
| Version | vmlinux must exactly match the vmcore kernel version |
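A quick pre-check for the version requirement can be sketched as follows. It assumes the `strings` tool from binutils and relies on the kernel banner (`Linux version <release> ...`) being embedded in vmlinux as plain text; the function names are hypothetical. Equal release strings are necessary but do not prove that the two files come from the same build, so treat this as a first filter only.

```shell
# Read text on stdin and print the release field of the first kernel banner line.
banner_release() {
    awk '/^Linux version / { print $3; exit }'
}

# Print the release string embedded in a vmlinux file.
vmlinux_release() {
    strings "$1" | banner_release
}

# Succeed only when both release strings are non-empty and identical.
releases_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

The second argument to `releases_match` would typically be the RELEASE value that crash's `sys` command reports for the dump.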
Package Installation
Anolis OS / Alibaba Cloud Linux
CODEBLOCK4
RHEL / CentOS / Rocky / AlmaLinux
CODEBLOCK5
Ubuntu / Debian
CODEBLOCK6
Self-compiled Kernel
CODEBLOCK7
Verify Installation
CODEBLOCK8
Core Command Reference
Debugging Analysis
| Command | Purpose | Example |
|---|---|---|
| `sys` | System info/panic reason | `sys`, `sys -i` |
| `log` | Kernel message buffer | `log`, `log \| tail` |
| `bt` | Stack backtrace | `bt`, `bt -a`, `bt -f` |
| `struct` | View structures | `struct task_struct <addr>` |
| `p`/`px`/`pd` | Print variables | `p jiffies`, `px current` |
| `kmem` | Memory analysis | `kmem -i`, `kmem -S <cache>` |
Tasks and Processes
| Command | Purpose | Example |
|---|---|---|
| `ps` | Process list | `ps`, `ps -m \| grep UN` |
| `set` | Switch context | `set <pid>`, `set -p` |
| `foreach` | Batch task operations | `foreach bt`, `foreach UN bt` |
| `task` | task_struct contents | `task <pid>` |
| `files` | Open files | `files <pid>` |
Memory Operations
| Command | Purpose | Example |
|---|---|---|
| `rd` | Read memory | `rd <addr>`, `rd -p <phys>` |
| `search` | Search memory | `search -k deadbeef` |
| `vtop` | Address translation | `vtop <addr>` |
| `list` | Traverse linked lists | `list task_struct.tasks -h <addr>` |
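When reading memory with `rd` or hunting for a pattern with `search`, it helps to recognize the kernel's well-known poison values. The helper below is a small sketch, not part of this skill: the function name is hypothetical, the byte patterns are the `POISON_FREE` (0x6b) and `POISON_INUSE` (0x5a) slab poisons, and the `LIST_POISON1` value assumes x86_64 with the default pointer-poison offset.

```shell
# Classify a 64-bit hex word (with or without a 0x prefix) as a known poison value.
classify_poison() {
    local v
    v="$(printf '%s' "$1" | tr 'A-F' 'a-f')"
    v="${v#0x}"
    case "$v" in
        6b6b6b6b6b6b6b6b) echo "POISON_FREE: freed slab object, possible use-after-free" ;;
        5a5a5a5a5a5a5a5a) echo "POISON_INUSE: allocated slab object that was never initialized" ;;
        dead000000000100) echo "LIST_POISON1: list entry that was already deleted (x86_64 assumption)" ;;
        *) echo "unknown" ;;
    esac
}
```

A word that comes back as `unknown` is not necessarily valid data; it only means the value is not one of the three patterns checked here.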
bt Command Details
The most important debugging command:
CODEBLOCK9
Context Management
A crash session has a "current context" that affects the `bt`, `files`, and `vm` commands:
CODEBLOCK10
Session Control
CODEBLOCK11
Typical Debugging Scenarios
Kernel BUG Location
CODEBLOCK12
Deadlock Analysis
CODEBLOCK13
Memory Issues
CODEBLOCK14
Stack Overflow
CODEBLOCK15
Advanced Techniques
Chained Queries
CODEBLOCK16
Batch Slab Inspection
CODEBLOCK17
Kernel Linked List Traversal
CODEBLOCK18
Extended Reference
For detailed information, refer to the following reference files:
| File | Content |
|---|---|
| INLINECODE56 | Advanced commands: list, rd, search, vtop, kmem, foreach |
| INLINECODE57 | vmcore file format, ELF structure, VMCOREINFO |
| `references/case-studies.md` | Debugging cases: kernel BUG, deadlock, OOM, NULL pointer, stack overflow |
| `references/debug-tools-guide.md` | Advanced debugging tools: KASAN, Kprobes, Kmemleak, UBSAN (require a kernel rebuild) |
Usage:
CODEBLOCK19
Common Errors
CODEBLOCK20
Security Warnings
⚠️ Dangerous Operations
The following commands can cause system damage or data loss:
| Command | Risk | Recommendation |
|---|---|---|
| INLINECODE60 | Writes to live kernel memory | NEVER use on production systems: it can crash or corrupt the running kernel |
| GDB passthrough | Unrestricted memory access | Use with caution; it may modify memory or registers |
🔒 Sensitive Data Handling
- vmcore files contain complete kernel memory, potentially including:
  - User process memory and credentials
  - Encryption keys and secrets
  - Network connection data and passwords
- Access control: Restrict vmcore file access to authorized personnel
- Secure storage: Store dump files in encrypted or access-controlled directories
- Secure disposal: Use `shred` or secure delete when disposing of vmcore files
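The access-control and disposal points above can be sketched with standard tools. This is a minimal example under stated assumptions: GNU coreutils (`stat -c`, `shred`) is available, and the function names are hypothetical. Note that `shred` gives weaker guarantees on copy-on-write or journaling filesystems and on SSDs.

```shell
# Restrict a dump file to its owner and print the resulting mode for confirmation.
lock_down_dump() {
    chmod 600 "$1" && stat -c '%a' "$1"
}

# Overwrite the file contents, then unlink it.
dispose_dump() {
    shred -u "$1"
}
```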
🛡️ Best Practices
1. Only analyze vmcore files in isolated/test environments when possible
2. Never share raw vmcore files publicly without sanitization
3. Consider using `makedumpfile -d` to filter sensitive pages before analysis
4. Document and audit all crash analysis sessions for compliance
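The value passed to `makedumpfile -d` is a bitmask of page types to exclude. The sketch below builds that number from readable names; the bit values (1, 2, 4, 8, 16) follow the dump-level table in the makedumpfile man page, while the function name and the keyword spellings are assumptions made for this example.

```shell
# Combine page-type keywords into a makedumpfile dump level.
dump_level() {
    local level=0 kind
    for kind in "$@"; do
        case "$kind" in
            zero)          level=$((level | 1)) ;;   # pages filled with zeros
            cache)         level=$((level | 2)) ;;   # page cache without private data
            cache-private) level=$((level | 4)) ;;   # page cache with private data
            user)          level=$((level | 8)) ;;   # user process data
            free)          level=$((level | 16)) ;;  # free pages
            *) echo "unknown page type: $kind" >&2; return 1 ;;
        esac
    done
    echo "$level"
}
```

For instance, `makedumpfile -d "$(dump_level zero cache cache-private user free)" -l vmcore vmcore.filtered` would request level 31. Excluding user data removes most process secrets, but it also means crash cannot show user-space memory from the filtered dump.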
Important Notes
1. Version Match: vmlinux must exactly match the vmcore kernel version
2. Debug Info: Must use vmlinux with debug symbols
3. Context Awareness: the `bt`, `files`, and `vm` commands are affected by the current context
4. Live System Modification: the `wr` command modifies the running kernel and is extremely dangerous
Resources
Contributing
This is an open-source project. Contributions are welcome!
- GitHub Repository: https://github.com/crazyss/linux-kernel-crash-debug
- Report Issues: GitHub Issues
- Submit PRs: Pull requests are welcome for bug fixes, new features, or documentation improvements
See CONTRIBUTING.md for guidelines.
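Linux Kernel Crash Debugging
This skill guides you through analyzing Linux kernel crash dump files with the crash utility.
Installation
Claude Code
```bash
claude skill install linux-kernel-crash-debug.skill
```
OpenClaw
```bash
# Method 1: install via ClawHub
clawhub install linux-kernel-crash-debug
# Method 2: install manually
mkdir -p ~/.openclaw/workspace/skills/linux-kernel-crash-debug
cp SKILL.md ~/.openclaw/workspace/skills/linux-kernel-crash-debug/
```
Quick Start
Starting a Session
```bash
# Analyze a dump file
crash vmlinux vmcore
# Debug the running system
crash vmlinux
# Raw RAM dump
crash vmlinux ddr.bin --ram_start=0x80000000
```
Core Debugging Workflow
```
1. crash> sys      # Confirm the panic reason
2. crash> log      # Review the kernel log
3. crash> bt       # Analyze the call stack
4. crash> struct   # Inspect data structures
5. crash> kmem     # Analyze memory
```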
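🤖 Agent Execution Directives
If you are an AI agent using this skill, do not invoke crash interactively: it will block your subshell.
1. Use the bundled wrapper `./scripts/agent-crash.sh`, which maps directly to the workflows below but safely truncates output:
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore triage` - Safely runs the initial `sys`, `log`, and `bt`.
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore flow-oom` - Top 15 memory checks.
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore flow-deadlock` - Pulls the stacks of tasks in the UN state.
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore dis-regs <func> <pid>` - Assembly regression analysis.
   - `./scripts/agent-crash.sh -k vmlinux -c vmcore check-poison <addr>` - Pattern-matches memory poison values.
2. Fallback strategy: if the macros don't solve the issue, fall back to basic primitives manually: `./scripts/agent-crash.sh -k vmlinux -c vmcore run "rd ffff880123456780"`.
3. Check `references/agentic-heuristics.md` for extended expert methodologies.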
Prerequisites
| Item | Requirement |
|---|---|
| vmlinux | Must have debug symbols (`CONFIG_DEBUG_INFO=y`) |
| vmcore | kdump/netdump/diskdump/ELF format |
| Version | vmlinux must exactly match the vmcore kernel version |
Package Installation
Anolis OS / Alibaba Cloud Linux
```bash
# Install the crash utility
sudo dnf install crash
# Install kernel debuginfo (matching your kernel version)
sudo dnf install kernel-debuginfo-$(uname -r)
# Install additional analysis tools (readelf and objdump ship in binutils)
sudo dnf install gdb binutils makedumpfile
# Optional: install kernel-devel for source reference
sudo dnf install kernel-devel-$(uname -r)
```
RHEL / CentOS / Rocky / AlmaLinux
```bash
sudo dnf install crash kernel-debuginfo-$(uname -r)
sudo dnf install gdb binutils makedumpfile
```
Ubuntu / Debian
```bash
sudo apt install crash linux-crashdump gdb binutils makedumpfile
sudo apt install linux-image-$(uname -r)-dbgsym
```
Self-compiled Kernel
```bash
# Enable debug symbols in the kernel configuration
make menuconfig # Enable CONFIG_DEBUG_INFO, set CONFIG_DEBUG_INFO_REDUCED=n
# Or set the options directly
scripts/config --enable CONFIG_DEBUG_INFO
scripts/config --enable CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
```
Verify Installation
```bash
# Check the crash version
crash --version
# Verify that the debuginfo matches the running kernel
crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /proc/kcore
```
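Before starting crash, it can save time to confirm that a vmlinux file actually carries DWARF debug information. The sketch below assumes `readelf` from binutils; the function names are hypothetical, and the check only looks for a `.debug_info` (or compressed `.zdebug_info`) section in the section-header listing.

```shell
# Read a section-header listing on stdin; succeed if a DWARF info section is present.
sections_have_debug_info() {
    grep -Eq '[.]z?debug_info'
}

# Succeed if the given vmlinux file contains DWARF debug information.
vmlinux_has_debug_info() {
    readelf -S "$1" | sections_have_debug_info
}
```

A stripped kernel image fails this check, which usually means the separate debuginfo package still needs to be installed.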
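Core Command Reference
Debugging Analysis
| Command | Purpose | Example |
|---|---|---|
| `sys` | System info/panic reason | `sys`, `sys -i` |
| `log` | Kernel message buffer | `log`, `log \| tail` |
| `bt` | Stack backtrace | `bt`, `bt -a`, `bt -f` |
| `struct` | View structures | `struct task_struct <addr>` |
| `p`/`px`/`pd` | Print variables | `p jiffies`, `px current` |
| `kmem` | Memory analysis | `kmem -i`, `kmem -S <cache>` |
Tasks and Processes
| Command | Purpose | Example |
|---|---|---|
| `ps` | Process list | `ps`, `ps -m \| grep UN` |
| `set` | Switch context | `set <pid>`, `set -p` |
| `foreach` | Batch task operations | `foreach bt`, `foreach UN bt` |
| `task` | task_struct contents | `task <pid>` |
| `files` | Open files | `files <pid>` |
Memory Operations
| Command | Purpose | Example |
|---|---|---|
| `rd` | Read memory | `rd <addr>`, `rd -p <phys>` |
| `search` | Search memory | `search -k deadbeef` |
| `vtop` | Address translation | `vtop <addr>` |
| `list` | Traverse linked lists | `list task_struct.tasks -h <addr>` |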
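bt Command Details
The most important debugging command:
```
crash> bt              # Stack of the current task
crash> bt -a           # Active tasks on all CPUs
crash> bt -f           # Expand raw stack frame data
crash> bt -F           # Symbolize stack frame data
crash> bt -l           # Show source file and line number
crash> bt -e           # Search for exception frames
crash> bt -v           # Check for stack overflow
crash> bt -R <symbol>  # Show only stacks that reference the symbol
crash> bt <pid>        # A specific process
```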
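Backtraces saved to a file are easier to compare once reduced to their function names. The filter below is a sketch that assumes the usual frame layout in `bt` output, `#N [stack address] function at text address`; the function name is hypothetical.

```shell
# Read bt output on stdin and print one function name per stack frame.
bt_functions() {
    awk '$1 ~ /^#[0-9]+$/ && $2 ~ /^\[/ { print $3 }'
}
```

Running `bt_functions < bt.all` on a file written with crash's output redirection gives a compact call chain per task, which makes repeated stacks easy to spot with `sort | uniq -c`.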
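Context Management
A crash session has a current context that affects the `bt`, `files`, and `vm` commands:
```
crash> set              # Show the current context
crash> set <pid>        # Switch to the given PID
crash> set <task_addr>  # Switch to a task address
crash> set -p           # Return to the panic task
```
Session Control
```
# Output control
crash> set scroll off   # Disable pagination
crash> sf               # Alias for scroll off
# Output redirection
crash> foreach bt > bt.all
# GDB passthrough
crash> gdb bt           # Single gdb invocation
crash> set gdb on       # Enter gdb mode
(gdb) info registers
(gdb) set gdb off
# Read commands from a file
crash> < commands.txt
```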
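Typical Debugging Scenarios
Kernel BUG Location
```
crash> sys                   # Confirm the panic
crash> log | tail -50        # Review the log
crash> bt                    # Call stack
crash> bt -f                 # Expand frames to recover arguments
crash> struct <type> <addr>  # Inspect data structures
```
Deadlock Analysis
```
crash> bt -a                 # Call stacks on all CPUs
crash> ps -m | grep UN       # Uninterruptible tasks
crash> foreach UN bt         # See what each task is waiting on
crash> struct mutex <addr>   # Inspect the lock state
```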
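To follow up on individual blocked tasks, the PIDs can be pulled out of saved `ps -m` output. This is a sketch with a hypothetical function name; it assumes each task line carries a bracketed state such as `[UN]` and a `PID:` label followed by the number, so check it against the output of your crash version.

```shell
# Read ps -m output on stdin and print the PID of every task in the UN state.
un_pids() {
    awk '/\[UN\]/ {
        for (i = 1; i < NF; i++)
            if ($i == "PID:") { print $(i + 1); break }
    }'
}
```

Each PID printed can then be passed to `bt <pid>` or `set <pid>` for a closer look.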
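Memory Issues
```
crash> kmem -i             # Memory statistics
crash> kmem -S <cache>     # Inspect a slab cache
crash> vm                  # Process memory mappings
crash> search -k <value>   # Search memory
```
Stack Overflow
```
crash> bt -v               # Check for stack overflow
crash> bt -r               # Raw stack data
```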
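One of the signs of a stack overflow is a clobbered end-of-stack marker: the kernel writes `STACK_END_MAGIC` (0x57AC6E9D) at the far end of each kernel stack. The sketch below only compares a word you have already read (for example with `rd`) against that constant; the function name is hypothetical, and locating the end-of-stack address depends on the kernel version and architecture.

```shell
# Succeed if the given hex word equals STACK_END_MAGIC (0x57AC6E9D).
stack_end_intact() {
    local word
    # Lower-case the digits, then strip an optional 0x prefix and leading zeros.
    word="$(printf '%s' "$1" | tr 'A-F' 'a-f' | sed -e 's/^0x//' -e 's/^0*//')"
    [ "$word" = "57ac6e9d" ]
}
```

Any other value at that location suggests the stack grew past its limit and overwrote the marker.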
Advanced Techniques
Chained Queries
crash> bt -f # Get pointers