Repomix — Codebase Packer & Analyzer
Pack entire codebases into a single, AI-friendly file for analysis. Repomix intelligently collects repository files, respects .gitignore, runs security checks, and generates structured output optimized for LLM consumption.
When to Use
- "Analyze this repo" / "Explore this codebase"
- "What's the structure of facebook/react?"
- "Find all authentication-related code"
- "How many tokens is this project?"
- "Pack this repo for AI analysis"
- "Show me the main components of vercel/next.js"
Quick Reference
Pack a Remote Repository

    npx repomix@latest --remote <owner/repo> --output /tmp/<repo-name>-analysis.xml

Always output to a temporary directory (/tmp on Unix, %TEMP% on Windows) for remote repositories to avoid polluting the user's working directory.
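As a minimal sketch of the temporary-directory advice, the helper below derives an output path from a repository slug. The function name `analysis_path` is hypothetical, and the sketch assumes a POSIX shell where `TMPDIR`, when set, names the temporary directory.

```shell
# Hypothetical helper: build a temp-dir output path from an owner/repo slug.
analysis_path() {
  ap_slug=$1
  ap_name=${ap_slug##*/}        # keep the part after the last slash
  ap_tmp=${TMPDIR:-/tmp}        # fall back to /tmp when TMPDIR is unset
  printf '%s/%s-analysis.xml\n' "${ap_tmp%/}" "$ap_name"
}

# Usage (needs network access, so it is left commented out):
# npx repomix@latest --remote facebook/react --output "$(analysis_path facebook/react)"
```

Stripping a trailing slash from the temp directory avoids a doubled separator on systems where `TMPDIR` ends in `/`.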
Pack a Local Directory

    npx repomix@latest [directory] --output /tmp/<name>-analysis.xml

Key Options
| Option | Description |
|---|---|
| --style <format> | Output format: xml (default, recommended), markdown, plain, json |
| --compress | Tree-sitter compression (~70% token reduction); use for large repos |
| --include <patterns> | Include only matching patterns (e.g., "src/**/*.ts,**/*.md") |
| --ignore <patterns> | Additional ignore patterns |
| --output <path> | Custom output path (default: repomix-output.xml) |
| --remote-branch <name> | Specific branch, tag, or commit (for remote repos) |
Workflow
Step 1: Pack the Repository
Choose the appropriate command based on the target:
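
    # Remote repository (always output to /tmp)
    npx repomix@latest --remote yamadashy/repomix --output /tmp/repomix-analysis.xml
    # Large remote repository (with compression)
    npx repomix@latest --remote facebook/react --compress --output /tmp/react-analysis.xml
    # Local directory
    npx repomix@latest ./src --output /tmp/src-analysis.xml
    # Specific file types only
    npx repomix@latest --include "**/*.{ts,tsx,js,jsx}" --output /tmp/filtered-analysis.xml
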
Step 2: Check Command Output
The command displays:
- Files processed: Number of files included
- Total characters: Size of content
- Total tokens: Estimated AI tokens
- Output file location: Where the file was saved
Note the output file location for subsequent analysis.
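One way to keep those figures available for the report is to save the console output and filter it afterwards. This is only a sketch: `summary_lines` is a hypothetical helper, and the label patterns are assumptions drawn from the list above, so adjust them to whatever your Repomix version prints.

```shell
# Hypothetical helper: print the summary lines (files, characters, tokens,
# output path) from saved console output. The patterns are assumptions.
summary_lines() {
  grep -iE 'files|char|tokens|output' "$1"
}

# Usage (needs network access, so it is left commented out):
# npx repomix@latest --remote yamadashy/repomix --output /tmp/repomix-analysis.xml | tee /tmp/repomix-run.log
# summary_lines /tmp/repomix-run.log
```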
Step 3: Analyze the Output
Structure overview:
1. Search for the file tree section (near the beginning of the output)
2. Check the metrics summary for overall statistics
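For XML output, the file tree can be pulled out without opening the whole file. The sketch below assumes the tree is wrapped in a `<directory_structure>` element; verify the tag name against your own output file before relying on it. `show_tree` is a hypothetical name.

```shell
# Print the file tree block from an XML-style output file.
# Assumption: the tree sits between <directory_structure> and
# </directory_structure>; check your output file if nothing is printed.
show_tree() {
  sed -n '/<directory_structure>/,/<\/directory_structure>/p' "$1"
}

# Usage:
# show_tree /tmp/repomix-analysis.xml
```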
Search for patterns (use the output file path from Step 2):
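
    # Find exports and main entry points
    grep -iE "export.*function|export.*class" <output-file>
    # Search with context
    grep -iE -A 5 -B 5 "authentication|auth" <output-file>
    # Find API endpoints
    grep -iE "router|route|endpoint|api" <output-file>
    # Find database models
    grep -iE "model|schema|database|query" <output-file>
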
Read specific sections using offset/limit for large outputs.
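Where no offset/limit-aware reader is available, the same windowed read can be approximated in the shell. A minimal sketch, with `read_window` as a hypothetical helper and the offset treated as a 1-based line number:

```shell
# Hypothetical helper: print LIMIT lines of FILE starting at line OFFSET (1-based).
read_window() {
  rw_file=$1
  rw_offset=$2
  rw_limit=$3
  rw_end=$((rw_offset + rw_limit - 1))
  # Quit after the last wanted line so sed does not scan the rest of a large file.
  sed -n "${rw_offset},${rw_end}p;${rw_end}q" "$rw_file"
}

# Usage: read 200 lines starting at line 1000
# read_window /tmp/react-analysis.xml 1000 200
```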
Step 4: Report Findings
- Metrics: Files, tokens, size from command output
- Structure: Directory layout from file tree analysis
- Key findings: Based on pattern search results
- Next steps: Suggestions for deeper exploration
Best Practices
1. Use --compress for large repos (>100k lines) to reduce token usage by ~70%
2. Use pattern search first before reading entire output files
3. Use a temporary directory for output (/tmp on Unix, %TEMP% on Windows) to keep the user's workspace clean
4. Use --include to focus on specific parts of a codebase
5. XML is the default and recommended format; it has clear file boundaries for structured analysis
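The size threshold in the first practice can be turned into a small decision helper. This is a sketch under assumptions: `compress_flag` is a hypothetical name, and counting lines across git-tracked files is just one possible way to measure a local checkout.

```shell
# Hypothetical helper: echo "--compress" when a line count exceeds the
# 100k-line threshold named above, and print nothing otherwise.
compress_flag() {
  if [ "$1" -gt 100000 ]; then
    echo "--compress"
  fi
}

# Usage (commented out; requires a local git checkout and network access):
# lines=$(git ls-files -z | xargs -0 cat | wc -l | tr -d ' ')
# npx repomix@latest . $(compress_flag "$lines") --output /tmp/local-analysis.xml
```

The command substitution is left unquoted on purpose so that an empty result adds no argument to the command line.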
Output Formats
| Format | Best For |
|---|---|
| XML (default) | Structured analysis, clear file boundaries |
| Markdown | Human-readable documentation |
| Plain | Simple grep-friendly output |
| JSON | Programmatic/machine analysis |
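When a non-default format is chosen, it helps to give the output file a matching extension. The mapping below is an illustrative assumption rather than something Repomix requires, and `output_ext` is a hypothetical helper.

```shell
# Hypothetical helper: map a --style value to a file extension.
# The extensions are a naming convention assumed for this sketch.
output_ext() {
  case $1 in
    xml)      echo xml ;;
    markdown) echo md ;;
    plain)    echo txt ;;
    json)     echo json ;;
    *)        echo "unknown style: $1" >&2; return 1 ;;
  esac
}

# Usage (needs network access, so it is left commented out):
# style=markdown
# npx repomix@latest ./src --style "$style" --output "/tmp/src-analysis.$(output_ext "$style")"
```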
Error Handling
- Command fails: Check error message, verify repository URL/path, check permissions
- Output too large: Use --compress, narrow scope with --include
- Network issues (remote): Verify connection, suggest local clone as alternative
- Pattern not found: Try alternative patterns, check file tree to verify files exist
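The "local clone as alternative" suggestion can be sketched as a primary command with a fallback. `try_or_fallback` is a hypothetical wrapper, and the clone URL and paths in the usage comment are examples only.

```shell
# Hypothetical wrapper: run a primary command string; if it fails,
# report the failure on stderr and run the fallback command string.
try_or_fallback() {
  if sh -c "$1"; then
    return 0
  fi
  echo "primary command failed, trying fallback" >&2
  sh -c "$2"
}

# Usage (needs network access, so it is left commented out):
# try_or_fallback \
#   'npx repomix@latest --remote facebook/react --compress --output /tmp/react-analysis.xml' \
#   'git clone --depth 1 https://github.com/facebook/react /tmp/react-src && npx repomix@latest /tmp/react-src --compress --output /tmp/react-analysis.xml'
```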
Security
Repomix automatically excludes potentially sensitive files (API keys, credentials, .env files) through built-in security checks. Trust its security defaults unless the user explicitly requests otherwise.
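Repomix: Codebase Packing and Analysis Tool
Pack an entire codebase into a single file suited to AI analysis. Repomix collects repository files, respects .gitignore rules, runs security checks, and generates structured output optimized for large language models.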
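When to Use
- "Analyze this repo" / "Explore this codebase"
- "What's the structure of facebook/react?"
- "Find all authentication-related code"
- "How many tokens is this project?"
- "Pack this repo for AI analysis"
- "Show me the main components of vercel/next.js"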
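Quick Reference
Pack a Remote Repository

    npx repomix@latest --remote <owner/repo> --output /tmp/<repo-name>-analysis.xml

For remote repositories, always write output to a temporary directory (/tmp on Unix, %TEMP% on Windows) so the user's working directory stays clean.
Pack a Local Directory

    npx repomix@latest [directory] --output /tmp/<name>-analysis.xml
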
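Key Options
| Option | Description |
|---|---|
| --style <format> | Output format: xml (default, recommended), markdown, plain, json |
| --compress | Tree-sitter compression (about 70% fewer tokens); suited to large repositories |
| --include <patterns> | Include only matching patterns (e.g., "src/**/*.ts,**/*.md") |
| --ignore <patterns> | Additional ignore patterns |
| --output <path> | Custom output path (default: repomix-output.xml) |
| --remote-branch <name> | Specific branch, tag, or commit (for remote repositories) |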
Workflow
Step 1: Pack the Repository
Choose the appropriate command for the target:

    # Remote repository (always output to /tmp)
    npx repomix@latest --remote yamadashy/repomix --output /tmp/repomix-analysis.xml
    # Large remote repository (with compression)
    npx repomix@latest --remote facebook/react --compress --output /tmp/react-analysis.xml
    # Local directory
    npx repomix@latest ./src --output /tmp/src-analysis.xml
    # Specific file types only
    npx repomix@latest --include "**/*.{ts,tsx,js,jsx}" --output /tmp/filtered-analysis.xml

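Step 2: Check Command Output
The command displays:
- Files processed: Number of files included
- Total characters: Size of the content
- Total tokens: Estimated AI token count
- Output file location: Where the file was saved
Note the output file location for later analysis.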
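Step 3: Analyze the Output
Structure overview:
1. Find the file tree section (near the beginning of the output)
2. Check the metrics summary for overall statistics
Search for patterns (use the output file path from Step 2):

    # Find exports and main entry points
    grep -iE "export.*function|export.*class" <output-file>
    # Search with context
    grep -iE -A 5 -B 5 "authentication|auth" <output-file>
    # Find API endpoints
    grep -iE "router|route|endpoint|api" <output-file>
    # Find database models
    grep -iE "model|schema|database|query" <output-file>
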
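Before reading any matches in detail, it can help to see which pattern group is most common in the packed file. The following is a rough sketch; `count_matches` is a hypothetical helper that reports matching lines per pattern, not individual occurrences.

```shell
# Hypothetical helper: print "count<TAB>pattern" for each pattern, where count
# is the number of matching lines (case-insensitive, extended regex).
count_matches() {
  cm_file=$1
  shift
  for cm_pat in "$@"; do
    printf '%s\t%s\n' "$(grep -icE "$cm_pat" "$cm_file")" "$cm_pat"
  done
}

# Usage:
# count_matches /tmp/repomix-analysis.xml 'authentication|auth' 'router|route|endpoint|api' 'model|schema|database|query'
```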
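Read specific sections using offset/limit for large outputs.
Step 4: Report Findings
- Metrics: File count, token count, and size from the command output
- Structure: Directory layout from the file tree analysis
- Key findings: Based on pattern search results
- Next steps: Suggestions for deeper exploration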
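Best Practices
1. Use --compress for large repositories (over 100k lines) to reduce token usage by about 70%
2. Search for patterns first, before reading the entire output file
3. Use a temporary directory for output (/tmp on Unix, %TEMP% on Windows) to keep the user's workspace clean
4. Use --include to focus on specific parts of the codebase
5. XML is the default and recommended format; its clear file boundaries make structured analysis easier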
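Output Formats
| Format | Best For |
|---|---|
| XML (default) | Structured analysis, clear file boundaries |
| Markdown | Human-readable documentation |
| Plain | Simple grep-friendly output |
| JSON | Programmatic/machine analysis |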
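Error Handling
- Command fails: Check the error message, verify the repository URL/path, check permissions
- Output too large: Use --compress, narrow the scope with --include
- Network issues (remote): Verify the connection, suggest a local clone as an alternative
- Pattern not found: Try alternative patterns, check the file tree to confirm the files exist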
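Security
Repomix automatically excludes potentially sensitive files (API keys, credentials, .env files) through its built-in security checks. Trust its security defaults unless the user explicitly requests otherwise.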