Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run — statistical rigor and controlled conditions are prerequisites before any optimization decision.
Thinking mode: Use ultrathink for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.
Go Benchmarking & Performance Measurement
Performance improvement does not exist without measurement: if you can measure it, you can improve it.
This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, → See samber/cc-skills-golang@golang-performance skill. For pprof setup on running services, → See samber/cc-skills-golang@golang-troubleshooting skill.
Writing Benchmarks
b.Loop() (Go 1.24+) — preferred
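`b.Loop()` prevents the compiler from optimizing away the code under test. Without it, the compiler can detect that a result is never used and eliminate the computation, producing misleadingly fast numbers. It also resets the timer on its first call, so setup code before the loop is excluded from timing automatically, and the benchmark function runs only once per `-count`, so expensive setup is not repeated.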
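```go
import "testing"

func BenchmarkParse(b *testing.B) {
	data := loadFixture("large.json") // setup: excluded from timing (loadFixture/Parse stand in for your code)
	for b.Loop() {
		Parse(data) // the compiler cannot eliminate this call
	}
}
```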
Existing `for range b.N` benchmarks still work but should migrate to `b.Loop()`: the old pattern requires a manual `b.ResetTimer()` after setup and a package-level sink variable to prevent dead code elimination.
Memory tracking
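```go
import (
	"strings"
	"testing"
)

func BenchmarkAlloc(b *testing.B) {
	b.ReportAllocs() // or run with the -benchmem flag
	for b.Loop() {
		_ = strings.Repeat("x", 1024) // heap-allocates a new 1 KiB string each iteration
	}
}
```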
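Not every `make` shows up in B/op and allocs/op: a constant-size slice that never escapes may be placed on the stack by escape analysis, in which case it typically reports 0 allocs/op. The sketch below contrasts that with `strings.Repeat`, which returns a freshly heap-allocated string, and reads the two columns from `testing.Benchmark` via `AllocedBytesPerOp` and `AllocsPerOp`. The benchmark names are hypothetical; the exact numbers for the `make` variant depend on the compiler version, so only print them.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// BenchmarkStackMake allocates a constant-size slice that never escapes.
// Escape analysis can place it on the stack, so -benchmem may report 0 allocs/op.
func BenchmarkStackMake(b *testing.B) {
	b.ReportAllocs()
	for b.Loop() {
		_ = make([]byte, 1024)
	}
}

// BenchmarkHeapRepeat calls strings.Repeat, which builds and returns a new
// 1 KiB string on the heap on every iteration.
func BenchmarkHeapRepeat(b *testing.B) {
	b.ReportAllocs()
	for b.Loop() {
		_ = strings.Repeat("x", 1024)
	}
}

// report prints the same B/op and allocs/op columns that go test -benchmem shows.
func report(name string, r testing.BenchmarkResult) {
	fmt.Printf("%-10s %d B/op  %d allocs/op\n", name, r.AllocedBytesPerOp(), r.AllocsPerOp())
}

func main() {
	report("StackMake", testing.Benchmark(BenchmarkStackMake))
	report("HeapRepeat", testing.Benchmark(BenchmarkHeapRepeat))
}
```

If a benchmark reports fewer allocations than expected, the Compiler Analysis reference (escape analysis) is the place to confirm why.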
`b.ReportMetric()` adds custom metrics (e.g., throughput):
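```go
// Call after the loop; totalBytes is the number of bytes processed across all iterations.
b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
```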
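A complete sketch of that throughput metric, using `crc32.ChecksumIEEE` over a 64 KiB buffer as a stand-in for the code under test (the benchmark name and buffer size are illustrative). `totalBytes` is accumulated inside the loop, and `b.Elapsed()` is read after the loop has finished, when the timer is already stopped.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"testing"
)

// BenchmarkChecksum reports throughput as a custom "bytes/s" metric.
func BenchmarkChecksum(b *testing.B) {
	data := make([]byte, 64<<10) // setup: excluded from timing by b.Loop
	totalBytes := 0
	for b.Loop() {
		crc32.ChecksumIEEE(data)
		totalBytes += len(data)
	}
	// Elapsed returns only the timed duration of the benchmark.
	b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
}

func main() {
	r := testing.Benchmark(BenchmarkChecksum)
	// Custom metrics land in BenchmarkResult.Extra, keyed by unit.
	fmt.Printf("%.0f bytes/s over %d iterations\n", r.Extra["bytes/s"], r.N)
}
```

For plain byte throughput, calling `b.SetBytes(int64(len(data)))` before the loop is the built-in alternative: `go test` then prints an MB/s column without any manual bookkeeping. `ReportMetric` is the more general tool for units such as items/s or requests/op.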
Sub-benchmarks and table-driven
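```go
import (
	"fmt"
	"testing"
)

func BenchmarkEncode(b *testing.B) {
	for _, size := range []int{64, 256, 4096} {
		b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
			data := make([]byte, size) // per-case setup, excluded from timing
			for b.Loop() {
				Encode(data)
			}
		})
	}
}
```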
Running Benchmarks
```bash
go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt
```
| Flag | Purpose |
|---|---|
| `-bench=.` | Run all benchmarks (the value is a regexp filter) |
| `-benchmem` | Report allocations (B/op, allocs/op) |
| `-count=10` | Run each benchmark 10 times, giving benchstat enough samples for a statistical comparison |
| `-benchtime=3s` | Run enough iterations to take about 3s per benchmark (default 1s; `Nx` fixes the iteration count instead) |
| `-cpu=1,2,4` | Run with different GOMAXPROCS values |
| `-cpuprofile=cpu.prof` | Write CPU profile (single package only) |
| `-memprofile=mem.prof` | Write memory profile |
| `-trace=trace.out` | Write execution trace |
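Output format: `BenchmarkEncode/size=64-8  5000000  230.5 ns/op  128 B/op  2 allocs/op`. The `-8` suffix is GOMAXPROCS (omitted when it is 1), `5000000` is the number of iterations the loop ran, `ns/op` is time per operation, `B/op` is bytes allocated per operation, and `allocs/op` is the number of heap allocations per operation.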
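To make the column layout concrete, here is a simplified, hypothetical parser for one result line: a name (with an optional `-N` GOMAXPROCS suffix), an iteration count, then value/unit pairs. It is only an illustration of the format; benchstat reads these files through the `golang.org/x/perf/benchfmt` package, which handles configuration lines and edge cases this sketch ignores (for example, a sub-benchmark name that itself ends in `-<digits>`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Result holds the pieces of one benchmark output line.
type Result struct {
	Name       string             // benchmark name without the GOMAXPROCS suffix
	Procs      int                // GOMAXPROCS suffix (e.g. 8 for "-8"); 1 when absent
	Iterations int                // how many times the benchmark loop ran
	Metrics    map[string]float64 // unit -> value, e.g. "ns/op" -> 230.5
}

// parseLine expects "name iterations (value unit)+" separated by whitespace.
func parseLine(line string) (Result, error) {
	f := strings.Fields(line)
	if len(f) < 4 || len(f)%2 != 0 {
		return Result{}, fmt.Errorf("unexpected field count in %q", line)
	}
	r := Result{Name: f[0], Procs: 1, Metrics: make(map[string]float64)}
	if i := strings.LastIndexByte(f[0], '-'); i > 0 {
		if p, err := strconv.Atoi(f[0][i+1:]); err == nil {
			r.Name, r.Procs = f[0][:i], p
		}
	}
	n, err := strconv.Atoi(f[1])
	if err != nil {
		return Result{}, fmt.Errorf("iterations: %w", err)
	}
	r.Iterations = n
	for i := 2; i < len(f); i += 2 {
		v, err := strconv.ParseFloat(f[i], 64)
		if err != nil {
			return Result{}, fmt.Errorf("metric %q: %w", f[i+1], err)
		}
		r.Metrics[f[i+1]] = v
	}
	return r, nil
}

func main() {
	r, err := parseLine("BenchmarkEncode/size=64-8 5000000 230.5 ns/op 128 B/op 2 allocs/op")
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Name, r.Procs, r.Iterations, r.Metrics["ns/op"], r.Metrics["allocs/op"])
	// prints: BenchmarkEncode/size=64 8 5000000 230.5 2
}
```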
Profiling from Benchmarks
Generate profiles directly from benchmark runs — no HTTP server needed:
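```bash
# CPU profile
go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser
go tool pprof cpu.prof

# Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)
go test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser
go tool pprof -alloc_objects mem.prof

# Execution trace
go test -bench=BenchmarkParse -trace=trace.out ./pkg/parser
go tool trace trace.out
```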
For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see pprof Reference. For execution trace interpretation, see Trace Reference. For statistical comparison, see benchstat Reference.
Reference Files
- pprof Reference: Interactive and non-interactive analysis of CPU, memory, and goroutine profiles. Full CLI commands, profile types (CPU vs alloc_objects vs inuse_space), web UI navigation, and interpretation patterns. Use this to dig into where time and memory are being spent in your code.
- benchstat Reference: Statistical comparison of benchmark runs with confidence intervals and p-value tests. Covers reading the output, filtering old benchmarks, interleaving results for visual clarity, and regression detection. Use this when you need to prove a change made a meaningful performance difference, not just a lucky run.
- Trace Reference: The execution tracer, for understanding when and why code runs. Visualizes goroutine scheduling, garbage collection phases, network blocking, and custom span annotations. Use this when pprof (which shows where CPU time goes) isn't enough and you need to see the timeline of what happened.
- Diagnostic Tools: Quick reference for ancillary tools such as fieldalignment (struct padding waste), GODEBUG (runtime logging flags), fgprof (wall-clock profiling of both on-CPU and off-CPU time), the race detector (data races), and others. Use this when you have a specific symptom and need a focused diagnostic; don't reach for pprof if a simpler tool already answers your question.
- Compiler Analysis: Low-level insight into compiler optimizations, covering escape analysis (when values move to the heap), inlining decisions (which function calls get inlined), SSA dumps (intermediate representation), and assembly output. Use this when benchmarks show allocations you didn't expect, or when you want to verify the compiler did what you intended.
- CI Regression Detection: Automated performance regression gating in CI pipelines. Covers three tools (benchdiff for quick PR comparisons, cob for strict threshold-based gating, gobenchdata for long-term trend dashboards), noisy-neighbor mitigation (why cloud CI benchmarks vary 5-10% even on quiet machines), and self-hosted runner tuning to make benchmarks reproducible. Use this to ensure pull requests don't silently slow down your codebase; detecting regressions early prevents shipping performance debt.
- Investigation Session: A production performance troubleshooting workflow combining Prometheus runtime metrics (heap size, GC frequency, goroutine counts), PromQL queries that correlate metrics with code changes, runtime configuration flags (GODEBUG environment variables that enable GC logging), and cost warnings (signs that you are paying a performance tax). Use this when benchmarks look good but real production traffic behaves differently.
- Prometheus Go Metrics Reference: Complete listing of the Go runtime metrics actually exposed as Prometheus metrics by prometheus/client_golang. Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between runtime/metrics (Go's internal data) and Prometheus metrics (what you scrape from /metrics). Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.
Cross-References
- → See samber/cc-skills-golang@golang-performance skill for optimization patterns to apply after measuring ("if X bottleneck, apply Y")
- → See samber/cc-skills-golang@golang-troubleshooting skill for pprof setup on running services (enable, secure, capture), Delve debugger, GODEBUG flags, root cause methodology
- → See samber/cc-skills-golang@golang-observability skill for everyday always-on monitoring, continuous profiling (Pyroscope), distributed tracing (OpenTelemetry)
- → See samber/cc-skills-golang@golang-testing skill for general testing practices
- → See samber/cc-skills@promql-cli skill for querying Prometheus runtime metrics in production to validate benchmark findings