DeepSafe Scan — Preflight Security Scanner for AI Coding Agents
Full-featured preflight security scanner across 5 dimensions:
Posture (config), Skill (skills & MCP), Memory (sessions), Hooks (agent config injection), Model (behavioral safety probes).
Works with OpenClaw, Claude Code, Cursor, and Codex. LLM features auto-detect credentials — no manual configuration needed.
When to Use
- - User asks to "scan", "audit", "check security", or "health check" their AI setup
- User installs a new skill, MCP server, or clones a project with agent configs
- User wants to know if any secrets or PII are leaked in session history
- User asks about hooks injection risks (Claude Code settings.json, .cursorrules, etc.)
- User wants to probe model behavior for manipulation, deception, or hallucination risks
How to Run
Quick static scan (no API key needed)
CODEBLOCK0
Full scan (auto-detects API credentials)
CODEBLOCK1
Targeted scans
CODEBLOCK2
Output options
CODEBLOCK3
Cache control
CODEBLOCK4
Interpreting Results
Scores
- - Each module scores 1-100 (100 = clean, deductions per finding, minimum 1)
- Module contribution = floor(score / 4), range 1–25
- Total = sum of 4 contributions, max 100
Severity Levels
- - CRITICAL (-10 pts): Immediate exploitation risk — secrets exposed, no auth, data exfiltration chains
- HIGH (-5 pts): Serious risk — prompt injection, sensitive file access, network exposure
- MEDIUM (-2 pts): Moderate risk — hardcoded keys, missing logs, supply chain concerns
- LOW (-1 pt): Minor improvement — non-standard endpoints, missing metadata
Risk Ratings
- - 85-100: LOW RISK (green)
- 65-84: MEDIUM RISK (yellow)
- 40-64: HIGH RISK (orange)
- 4-39: CRITICAL RISK (red)
After Scanning
- 1. Present the total score and per-module breakdown to the user
- List CRITICAL and HIGH findings first with clear explanations
- For each finding, explain what the risk is and how to fix it
- Offer to help fix issues (e.g., "want me to add auth to your gateway config?")
What Gets Scanned
Posture Module (deployment config)
Reads
openclaw.json and checks: gateway auth, network exposure, API key handling, TLS, plugin permissions, MCP server risks, logging, sandboxing.
Skill Module (installed skills & MCP)
Scans all files in
workspace/skills/,
mcp/,
mcp-servers/ for: hidden unicode, prompt injection, dangerous runtime calls, encoded payloads, sensitive file references, hardcoded secrets, system prompt extraction, command injection, data exfiltration patterns, destructive actions, auto-execute without confirmation, excessive permissions.
With --gateway-url provided (and --no-llm not set), also runs LLM-enhanced semantic analysis on each skill's SKILL.md for social engineering, implicit data exfiltration, and deeper pattern recognition.
Memory Module (sessions & stored data)
Scans
agents/,
credentials/,
identity/,
logs/,
workspace/ for: 27+ secret patterns (API keys for OpenAI, GitHub, AWS, Slack, Stripe, etc.), 9 PII patterns (email, phone, ID numbers, credit cards), persistent prompt injection traces.
Model Module (4 behavioral safety probes)
Requires
--gateway-url and
--gateway-token. Runs:
- - Persuasion Probe: Tests if the model can manipulate user opinions in multi-turn conversation
- Sandbagging Probe: Tests if the model can strategically hide capabilities (performance control)
- Deception Probe: 3-phase test for reasoning/action misalignment (DTR metric)
- HaluEval Probe: Tests hallucination detection accuracy on QA benchmarks
Each probe produces a finding with risk level and score. Average across probes = module score.
DeepSafe Scan — AI编码代理的预检安全扫描器
全功能预检安全扫描器,覆盖5个维度:
配置姿态(配置)、技能(技能与MCP)、记忆(会话)、钩子(代理配置注入)、模型(行为安全探测)。
支持OpenClaw、Claude Code、Cursor和Codex。LLM功能可自动检测凭证——无需手动配置。
使用场景
- - 用户要求扫描、审计、检查安全性或健康检查其AI设置
- 用户安装新技能、MCP服务器或克隆包含代理配置的项目
- 用户想知道会话历史中是否有任何机密或PII泄露
- 用户询问钩子注入风险(Claude Code的settings.json、.cursorrules等)
- 用户想要探测模型行为是否存在操纵、欺骗或幻觉风险
运行方式
快速静态扫描(无需API密钥)
bash
python3 {baseDir}/scripts/scan.py --modules posture,skill,memory,hooks --scan-dir . --no-llm --format markdown
完整扫描(自动检测API凭证)
bash
OpenClaw(自动读取网关配置)
python3 {baseDir}/scripts/scan.py --openclaw-root ~/.openclaw --format html --output /tmp/deepsafe-report.html
Claude Code / Cursor / Codex(使用ANTHROPICAPIKEY或OPENAIAPIKEY)
python3 {baseDir}/scripts/scan.py --modules posture,skill,memory,hooks,model --scan-dir . --format html --output /tmp/deepsafe-report.html
定向扫描
bash
仅钩子注入(最快——检查.claude/settings.json、.cursorrules等)
python3 {baseDir}/scripts/scan.py --modules hooks --scan-dir . --no-llm --format markdown
仅记忆扫描(检查泄露的机密/PII)
python3 {baseDir}/scripts/scan.py --openclaw-root ~/.openclaw --modules memory --no-llm
仅模型行为探测
python3 {baseDir}/scripts/scan.py --openclaw-root ~/.openclaw --modules model --profile quick
输出选项
bash
python3 {baseDir}/scripts/scan.py --format json # 机器可读
python3 {baseDir}/scripts/scan.py --format markdown # 人类可读摘要
python3 {baseDir}/scripts/scan.py --format html --output /tmp/report.html # 可视化报告
缓存控制
bash
python3 {baseDir}/scripts/scan.py --ttl-days 3 # 缓存3天
python3 {baseDir}/scripts/scan.py --no-cache # 始终重新扫描
结果解读
评分
- - 每个模块评分1-100(100=干净,每发现一个问题扣分,最低1分)
- 模块贡献分 = floor(评分 / 4),范围1-25
- 总分 = 4个贡献分之和,最高100分
严重级别
- - 严重(-10分):即时利用风险——机密泄露、无认证、数据外泄链
- 高(-5分):严重风险——提示注入、敏感文件访问、网络暴露
- 中(-2分):中等风险——硬编码密钥、缺少日志、供应链问题
- 低(-1分):轻微改进——非标准端点、缺少元数据
风险评级
- - 85-100:低风险(绿色)
- 65-84:中风险(黄色)
- 40-64:高风险(橙色)
- 4-39:严重风险(红色)
扫描后操作
- 1. 向用户展示总分和各模块细分
- 首先列出严重和高风险发现,并附上清晰说明
- 对每个发现,解释风险是什么以及如何修复
- 主动提供帮助修复问题(例如:需要我向您的网关配置添加认证吗?)
扫描内容
配置姿态模块(部署配置)
读取openclaw.json并检查:网关认证、网络暴露、API密钥处理、TLS、插件权限、MCP服务器风险、日志记录、沙箱。
技能模块(已安装技能与MCP)
扫描workspace/skills/、mcp/、mcp-servers/中的所有文件,检查:隐藏Unicode、提示注入、危险运行时调用、编码载荷、敏感文件引用、硬编码机密、系统提示提取、命令注入、数据外泄模式、破坏性操作、无确认自动执行、过度权限。
如果提供了--gateway-url(且未设置--no-llm),还会对每个技能的SKILL.md运行LLM增强语义分析,检测社会工程、隐式数据外泄和更深层次的模式识别。
记忆模块(会话与存储数据)
扫描agents/、credentials/、identity/、logs/、workspace/,检查:27种以上机密模式(OpenAI、GitHub、AWS、Slack、Stripe等的API密钥)、9种PII模式(电子邮件、电话、身份证号、信用卡)、持久性提示注入痕迹。
模型模块(4项行为安全探测)
需要--gateway-url和--gateway-token。运行:
- - 说服探测:测试模型是否能在多轮对话中操纵用户意见
- 能力隐藏探测:测试模型是否能策略性地隐藏能力(性能控制)
- 欺骗探测:3阶段测试推理/行动不一致性(DTR指标)
- 幻觉评估探测:测试在QA基准上的幻觉检测准确性
每个探测产生一个带有风险级别和评分的发现。各探测的平均值 = 模块评分。