Git Repository Auditor
What This Does
A CLI tool to audit Git repositories for security issues, code quality problems, and repository health. Scan repositories for secrets, large files, sensitive data, and common security anti-patterns.
Key features:
- Secrets detection: Scan Git history for API keys, passwords, tokens, and other sensitive data using regex patterns
- Large file detection: Identify large files (>10MB) in repository history that may impact performance
- Security anti-patterns: Detect hardcoded credentials, insecure configuration files, and dangerous permissions
- Repository health: Check for merge conflicts, stale branches, and other repository hygiene issues
- Compliance reporting: Generate security compliance reports for audits and team reviews
- Multiple output formats: Human-readable, JSON, and CSV output for integration with other tools
- Custom scanning: Configure custom regex patterns and file extensions to scan
- Historical analysis: Scan entire Git history or specific time ranges
- Remediation guidance: Suggest fixes for identified security issues
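As a rough illustration of the regex-based secrets detection listed above, here is a minimal sketch. The pattern names and expressions are illustrative assumptions, not the tool's actual rule set, and `find_secrets` is a hypothetical helper that scans a single text blob rather than Git history.

```python
import re

# Illustrative patterns only (assumption); the tool's real rule set is not documented here.
PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "hardcoded_password": re.compile(r"(?i)password\s*[:=]\s*['\"]?[^\s'\"]{4,}"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for every matching line in text."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample = "DB_HOST=localhost\nAWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\npassword: secret123\n"
print(find_secrets(sample))
```

Because matching is purely textual, a sketch like this shares the false-positive and false-negative behavior noted under Limitations.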
When To Use
- You need to audit a Git repository for security compliance
- You want to detect accidental commits of secrets or sensitive data
- You're preparing a repository for open-source release
- You need to identify performance issues (large files in history)
- You're onboarding new developers and want to ensure repository hygiene
- You need to generate security audit reports for compliance requirements
- You want to automate security scanning in CI/CD pipelines
- You're cleaning up old repositories and need to identify issues
Usage
Basic commands:
CODEBLOCK0
Examples
Example 1: Basic security scan
CODEBLOCK1
Output:
CODEBLOCK2
Example 2: JSON output for CI/CD integration
CODEBLOCK3
Output (excerpt):
CODEBLOCK4
Example 3: Check repository health
CODEBLOCK5
Output:
CODEBLOCK6
Example 4: Large files detection only
CODEBLOCK7
Output:
CODEBLOCK8
Requirements
- Git 2.20+ installed and available in PATH
- Python 3.x
- No external Python dependencies required (uses standard library)
Limitations
- Scanning large repositories with extensive history may be slow
- Secrets detection uses regex patterns; may have false positives/negatives
- Does not automatically remove secrets from history (requires manual remediation)
- Limited to Git repositories (does not work with other VCS)
- No support for scanning encrypted repositories
- Large file detection scans entire history; may miss files in ignored directories
- Does not integrate with external secret managers (Vault, AWS Secrets Manager, etc.)
- No real-time monitoring; scans only historical commits
- Limited to text file scanning; cannot detect secrets in binary files
- May not detect all secret patterns; custom patterns may be needed
- Performance depends on repository size and history depth
- No support for scanning Git submodules automatically
- No support for custom Git hooks or pre-commit integration
- Performance may be impacted on repositories with millions of commits
- No support for distributed scanning across multiple repositories
- Limited error handling for corrupted Git repositories
- No support for scanning Git worktrees or shallow clones
- Cannot scan remote repositories without a local clone
- No built-in notification system for new issues
Directory Structure
The tool works with any local Git repository. No special configuration directories are required, but you can provide custom patterns files for secrets detection.
Error Handling
- Invalid repository paths show helpful error messages with suggestions
- Git command failures show the underlying error and suggest troubleshooting steps
- Permission errors suggest checking repository access rights
- Pattern file parsing errors show line numbers and validation issues
- Memory errors suggest using smaller commit ranges or more specific scanning
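To make the pattern-file error handling above more concrete, here is a minimal sketch of how a custom patterns file might be validated. The file schema (a JSON object mapping a pattern name to a regex string) and the `load_patterns` helper are assumptions for illustration; the tool's actual pattern file format is not described in this document.

```python
import json
import re


def load_patterns(raw_json):
    """Parse a pattern file and return (compiled_patterns, error_messages).

    Assumes a hypothetical schema: {"pattern_name": "regex", ...}.
    """
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        # JSONDecodeError carries the line number of the syntax error.
        return {}, [f"line {exc.lineno}: {exc.msg}"]
    if not isinstance(data, dict):
        return {}, ["top-level value must be an object of name/regex pairs"]
    compiled, errors = {}, []
    for name, expression in data.items():
        try:
            compiled[name] = re.compile(expression)
        except (re.error, TypeError) as exc:
            errors.append(f"pattern '{name}': {exc}")
    return compiled, errors


raw = '{"internal_token": "tok_[a-z0-9]{8}", "broken": "([a-z"}'
patterns, errors = load_patterns(raw)
print(sorted(patterns), errors)
```

Collecting errors instead of raising on the first one lets a user fix every invalid pattern in a single pass.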
Contributing
This is a skill built by the Skill Factory. Issues and improvements should be reported through the OpenClaw project.
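Git Repository Auditor
What This Does
A CLI tool to audit Git repositories for security issues, code quality problems, and repository health. It scans repositories for secrets, large files, sensitive data, and common security anti-patterns.
Key features:
- Secrets detection: Scan Git history for API keys, passwords, tokens, and other sensitive data using regex patterns
- Large file detection: Identify large files (>10MB) in repository history that may impact performance
- Security anti-patterns: Detect hardcoded credentials, insecure configuration files, and dangerous permissions
- Repository health: Check for merge conflicts, stale branches, and other repository hygiene issues
- Compliance reporting: Generate security compliance reports for audits and team reviews
- Multiple output formats: Human-readable, JSON, and CSV output for integration with other tools
- Custom scanning: Configure custom regex patterns and file extensions to scan
- Historical analysis: Scan entire Git history or specific time ranges
- Remediation guidance: Suggest fixes for identified security issues
When To Use
- You need to audit a Git repository for security compliance
- You want to detect accidental commits of secrets or sensitive data
- You're preparing a repository for open-source release
- You need to identify performance issues (large files in history)
- You're onboarding new developers and want to ensure repository hygiene
- You need to generate security audit reports for compliance requirements
- You want to automate security scanning in CI/CD pipelines
- You're cleaning up old repositories and need to identify issues
Usage
Basic commands: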
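```bash
# Scan the repository in the current directory
python3 scripts/main.py scan .

# Scan a specific repository path
python3 scripts/main.py scan /path/to/repo

# Scan with a custom secrets pattern file
python3 scripts/main.py scan . --patterns custom-patterns.json

# Generate a JSON report for automation
python3 scripts/main.py scan . --json

# Check only for large files (>50MB)
python3 scripts/main.py scan . --check large-files --threshold 50

# Scan a specific branch or commit range
python3 scripts/main.py scan . --branch main --since 2024-01-01

# Generate a remediation report with suggested fixes
python3 scripts/main.py scan . --remediation

# List all branches with their last commit time
python3 scripts/main.py branches .
```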
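The branch listing in the last command can be approximated with plain Git plumbing. The sketch below is one possible approach, not the tool's implementation: it reads branch names and committer timestamps from `git for-each-ref` and flags branches older than 90 days, the threshold used in the example outputs. The helper names `list_branch_dates` and `stale_branches` are hypothetical.

```python
import subprocess

SECONDS_PER_DAY = 86400


def list_branch_dates(repo_path):
    """Return `git for-each-ref` output: one '<branch> <unix timestamp>' per line."""
    result = subprocess.run(
        ["git", "-C", repo_path, "for-each-ref",
         "--format=%(refname:short) %(committerdate:unix)", "refs/heads"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def stale_branches(listing, now, max_age_days=90):
    """Return names of branches whose last commit is older than max_age_days."""
    stale = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        # Branch names cannot contain spaces, so the timestamp is the last field.
        name, timestamp = line.rsplit(" ", 1)
        age_days = (now - int(timestamp)) / SECONDS_PER_DAY
        if age_days > max_age_days:
            stale.append(name)
    return stale


now = 1_709_721_000  # fixed reference time so the example is reproducible
listing = f"main {now - 2 * SECONDS_PER_DAY}\nold-feature {now - 120 * SECONDS_PER_DAY}\n"
print(stale_branches(listing, now))
```

Against a real repository, the two pieces would be combined as `stale_branches(list_branch_dates("."), int(time.time()))` after importing `time`.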
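Examples
Example 1: Basic security scan
```bash
python3 scripts/main.py scan ~/projects/my-app
```
Output:
```
🔍 Scanning repository: /home/user/projects/my-app
📊 Repository info: 247 commits, 5 branches, 3 contributors

🔐 Security issues found (3):
  ⚠️ HIGH: AWS_ACCESS_KEY_ID found in commit abc123 (2024-02-15)
     File: config/old-config.env
     Pattern: AWS_ACCESS_KEY_ID=AKIA.*
     Remediation: Rotate the key immediately; remove it from history with BFG
  ⚠️ MEDIUM: Hardcoded database password found in commit def456 (2024-01-20)
     File: src/database.js
     Pattern: password: secret123
     Remediation: Move to environment variables; use a secrets manager
  ⚠️ LOW: Private key file extension found in commit ghi789 (2023-12-05)
     File: backup/id_rsa.old
     Pattern: Private key files (.pem, .key, .ppk, id_rsa)
     Remediation: Remove the file from repository history

💾 Large files found (2):
  📦 42MB: assets/video/demo.mp4 (commit xyz123)
  📦 18MB: database/backup.sql (commit uvw456)

✅ Repository health: Good
⏰ Stale branches: 2 branches not updated in over 90 days
```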
Example 2: JSON output for CI/CD integration
```bash
python3 scripts/main.py scan . --json > security-report.json
```
Output (excerpt):
```json
{
  "repository": "/home/user/projects/my-app",
  "scan_date": "2024-03-06T10:30:00Z",
  "security_issues": [
    {
      "severity": "high",
      "type": "aws_access_key",
      "commit": "abc123",
      "date": "2024-02-15",
      "file": "config/old-config.env",
      "pattern": "AWS_ACCESS_KEY_ID=AKIA.*",
      "remediation": "Rotate the key immediately; remove it from history with BFG"
    }
  ],
  "large_files": [
    {
      "size_mb": 42,
      "path": "assets/video/demo.mp4",
      "commit": "xyz123"
    }
  ],
  "summary": {
    "total_issues": 3,
    "by_severity": {"high": 1, "medium": 1, "low": 1},
    "large_files_count": 2,
    "total_size_mb": 60
  }
}
```
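A CI/CD job typically needs to turn a report like the excerpt above into a pass/fail decision. The following is a minimal sketch of such a gate, assuming the `summary` and `by_severity` fields shown in the excerpt; the `should_fail_build` helper and the severity ordering are assumptions, not part of the tool.

```python
import json

# Severity labels match the report excerpt; the numeric ranking is an assumption.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


def should_fail_build(report, fail_on="high"):
    """Return True if the report contains any issue at or above fail_on."""
    threshold = SEVERITY_RANK[fail_on]
    counts = report.get("summary", {}).get("by_severity", {})
    return any(
        count > 0 and SEVERITY_RANK.get(severity, 0) >= threshold
        for severity, count in counts.items()
    )


report = json.loads(
    '{"summary": {"total_issues": 3, "by_severity": {"high": 1, "medium": 1, "low": 1}}}'
)
print(should_fail_build(report))
```

In a pipeline step, the report would be loaded from `security-report.json` and the job would exit with a non-zero status when the function returns `True`.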
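Example 3: Check repository health
```bash
python3 scripts/main.py health .
```
Output:
```
📈 Repository health report: /home/user/projects/my-app

📊 Basic metrics:
- Commits: 1,247
- Branches: 12 (3 active, 9 stale)
- Contributors: 8
- First commit: 2022-05-15
- Last commit: 2024-03-06

⚠️ Health issues:
- Stale branches: 9 branches with no commits in over 90 days
- Large files: 2 files >10MB in history
- Binary files: 45 binary files (consider using Git LFS)
- Merge conflicts: 3 unresolved merge markers in code

✅ Good practices:
- .gitignore is present and comprehensive
- No secrets detected in recent commits
- Regular commit activity (average 15 commits per week)
- Meaningful commit messages (87% good)

💡 Recommendations:
1. Clean up stale branches: git branch -d branch1 branch2...
2. Consider using Git LFS for binary files
3. Resolve merge conflicts in: src/app.js, config/settings.yaml
```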
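The unresolved merge markers reported above can be found with a simple line scan. The sketch below is one way to do it, not the tool's own logic: it only reports a block once the opening marker, the separator, and the closing marker have appeared in order, which avoids flagging a lone `=======` line such as a text underline. The `find_conflict_blocks` helper is hypothetical.

```python
def find_conflict_blocks(text):
    """Return (start_line, end_line) for each complete conflict block in text."""
    blocks = []
    start = None            # line number of the currently open "<<<<<<<" marker
    seen_separator = False  # whether "=======" has appeared since that marker
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<< "):
            start, seen_separator = number, False
        elif line.startswith("=======") and start is not None:
            seen_separator = True
        elif line.startswith(">>>>>>> ") and start is not None and seen_separator:
            blocks.append((start, number))
            start, seen_separator = None, False
    return blocks


sample = "a = 1\n<<<<<<< HEAD\nb = 2\n=======\nb = 3\n>>>>>>> feature\nc = 4\n"
print(find_conflict_blocks(sample))
```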
Example 4: Large files detection only
```bash
python3 scripts/main.py scan . --check large-files --threshold 20
```
Output:
```
💾 Large files in repository history (>20MB):

1. assets/videos/presentation.mp4
   - Size: 42MB
   - Commit: xyz123 (2024-01-15)
   - Author: Jane Doe
   - Message: Add presentation video
2. database/backup/archive.sql.gz
   - Size: 38MB
   - Commit: uvw456 (2023-12-20)
   - Author: John Smith
   - Message: Database backup

Total: 2 files, 80MB
Recommendation: Consider using Git LFS for files >20MB
```
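For readers curious how large files can be located in history, a common Git plumbing approach is to pipe `git rev-list --objects --all` into `git cat-file --batch-check` and filter blobs by size. The sketch below follows that approach under stated assumptions: it is not necessarily how the tool works, the helper names are hypothetical, and 1 MB is taken to mean 1024 × 1024 bytes.

```python
import subprocess

BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def list_objects(repo_path):
    """Pipe `git rev-list --objects --all` into `git cat-file --batch-check`."""
    rev_list = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    )
    cat_file = subprocess.run(
        ["git", "-C", repo_path, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=rev_list.stdout, capture_output=True, text=True, check=True,
    )
    return cat_file.stdout


def large_blobs(listing, threshold_mb=10):
    """Return (size_mb, path) for blobs above threshold_mb, largest first."""
    found = []
    for line in listing.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue  # skip commits, trees, and malformed lines
        size = int(parts[2])
        if size > threshold_mb * BYTES_PER_MB:
            found.append((round(size / BYTES_PER_MB, 1), parts[3]))
    return sorted(found, reverse=True)


listing = (
    "commit aaa 250 \n"
    "blob bbb 44040192 assets/video/demo.mp4\n"
    "blob ccc 2048 README.md\n"
)
print(large_blobs(listing))
```

Calling `large_blobs(list_objects("."), threshold_mb=20)` would apply the same filter to a real local clone.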
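Requirements
- Git 2.20+ installed and available in PATH
- Python 3.x
- No external Python dependencies required (uses standard library)
Limitations
- Scanning large repositories with extensive history may be slow
- Secrets detection uses regex patterns; may have false positives/negatives
- Does not automatically remove secrets from history (requires manual remediation)
- Limited to Git repositories (does not work with other VCS)
- No support for scanning encrypted repositories
- Large file detection scans entire history; may miss files in ignored directories
- Does not integrate with external secret managers (Vault, AWS Secrets Manager, etc.)
- No real-time monitoring; scans only historical commits
- Limited to text file scanning; cannot detect secrets in binary files
- May not detect all secret patterns; custom patterns may be needed
- Performance depends on repository size and history depth
- No support for scanning Git submodules automatically
- No support for custom Git hooks or pre-commit integration
- Performance may be impacted on repositories with millions of commits
- No support for distributed scanning across multiple repositories
- Limited error handling for corrupted Git repositories
- No support for scanning Git worktrees or shallow clones
- Cannot scan remote repositories without a local clone
- No built-in notification system for new issues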
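Directory Structure
The tool works with any local Git repository. No special configuration directories are required, but you can provide custom patterns files for secrets detection.
Error Handling
- Invalid repository paths show helpful error messages with suggestions
- Git command failures show the underlying error and suggest troubleshooting steps
- Permission errors suggest checking repository access rights
- Pattern file parsing errors show line numbers and validation issues
- Memory errors suggest using smaller commit ranges or more specific scanning
Contributing
This is a skill built by the Skill Factory. Issues and improvements should be reported through the OpenClaw project.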