DeepSafe Scan — Preflight Security Scanner for AI Coding Agents

Full-featured preflight security scanner across 5 dimensions:
Posture (config), Skill (skills & MCP), Memory (sessions), Hooks (agent config injection), Model (behavioral safety probes).

Works with OpenClaw, Claude Code, Cursor, and Codex. LLM features auto-detect API credentials, so no manual configuration is needed.

When to Use

- User asks to "scan", "audit", "check security", or "health check" their AI setup
- User installs a new skill or MCP server, or clones a project with agent configs
- User wants to know if any secrets or PII are leaked in session history
- User asks about hooks injection risks (Claude Code settings.json, .cursorrules, etc.)
- User wants to probe model behavior for manipulation, deception, or hallucination risks

How to Run

Quick static scan (no API key needed)

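    python3 {baseDir}/scripts/scan.py --modules posture,skill,memory,hooks --scan-dir . --no-llm --format markdown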

Full scan (auto-detects API credentials)

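    # OpenClaw (auto-reads gateway config)
    python3 {baseDir}/scripts/scan.py --openclaw-root ~/.openclaw --format html --output /tmp/deepsafe-report.html

    # Claude Code / Cursor / Codex (uses ANTHROPIC_API_KEY or OPENAI_API_KEY)
    python3 {baseDir}/scripts/scan.py --modules posture,skill,memory,hooks,model --scan-dir . --format html --output /tmp/deepsafe-report.html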

Targeted scans

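    # Hooks injection only (fastest; checks .claude/settings.json, .cursorrules, etc.)
    python3 {baseDir}/scripts/scan.py --modules hooks --scan-dir . --no-llm --format markdown

    # Memory scan only (checks for leaked secrets/PII)
    python3 {baseDir}/scripts/scan.py --openclaw-root ~/.openclaw --modules memory --no-llm

    # Model behavior probes only
    python3 {baseDir}/scripts/scan.py --openclaw-root ~/.openclaw --modules model --profile quick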

Output options

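    python3 {baseDir}/scripts/scan.py --format json                             # machine-readable
    python3 {baseDir}/scripts/scan.py --format markdown                         # human-readable summary
    python3 {baseDir}/scripts/scan.py --format html --output /tmp/report.html   # visual report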

Cache control

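    python3 {baseDir}/scripts/scan.py --ttl-days 3   # cache results for 3 days
    python3 {baseDir}/scripts/scan.py --no-cache     # always rescan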
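If the scan needs to be driven from another script, the flags shown in this section can be assembled programmatically. This is a minimal sketch, not part of DeepSafe Scan: `build_scan_command` and `run_scan` are hypothetical helpers, and it assumes that `--format json` without `--output` writes the report to stdout, which the text above does not confirm. The structure of the JSON report is not described here, so the sketch only parses it and returns it as-is.

```python
import json
import subprocess
import sys


def build_scan_command(base_dir, modules, scan_dir=".", no_llm=True):
    """Assemble a scan.py invocation using only flags shown in this section."""
    cmd = [
        sys.executable, f"{base_dir}/scripts/scan.py",
        "--modules", ",".join(modules),
        "--scan-dir", scan_dir,
        "--format", "json",
    ]
    if no_llm:
        cmd.append("--no-llm")
    return cmd


def run_scan(base_dir, modules):
    """Run the scan and parse the report (assumption: JSON arrives on stdout)."""
    result = subprocess.run(
        build_scan_command(base_dir, modules),
        capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Only builds and prints the command; nothing is executed here.
print(build_scan_command("/path/to/deepsafe-scan", ["posture", "hooks"]))
```

The scanner's exit code on findings is not documented in this section, so the sketch does not treat a non-zero exit as an error.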

Interpreting Results

Scores

- Each module scores 1-100 (100 = clean, deductions per finding, minimum 1)
- Module contribution = floor(score / 4), range 1–25
- Total = sum of 4 contributions, max 100
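The arithmetic above can be sketched as follows. This is an illustration under stated assumptions rather than the scanner's actual code: the function names are hypothetical, and the contribution is clamped to a minimum of 1 so that it matches the stated 1 to 25 range (a bare floor(score / 4) would yield 0 for scores below 4). The text does not say which four modules are counted when all five are run, so the sketch simply takes four scores.

```python
def module_contribution(score: int) -> int:
    """Map a 1-100 module score to a 1-25 contribution (hypothetical helper)."""
    if not 1 <= score <= 100:
        raise ValueError("module score must be in 1..100")
    # floor(score / 4), clamped to 1 (assumption) to match the documented range.
    return max(1, score // 4)


def total_score(module_scores: list[int]) -> int:
    """Sum the contributions of the four scored modules."""
    if len(module_scores) != 4:
        raise ValueError("expected exactly four module scores")
    return sum(module_contribution(s) for s in module_scores)


print(total_score([100, 90, 72, 55]))  # 25 + 22 + 18 + 13 = 78
```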

Severity Levels

- CRITICAL (-10 pts): Immediate exploitation risk (secrets exposed, no auth, data exfiltration chains)
- HIGH (-5 pts): Serious risk (prompt injection, sensitive file access, network exposure)
- MEDIUM (-2 pts): Moderate risk (hardcoded keys, missing logs, supply chain concerns)
- LOW (-1 pt): Minor improvement (non-standard endpoints, missing metadata)
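Combined with the scoring rule (start at 100, deduct per finding, never drop below 1), the deductions above work out as in this small sketch. The names `DEDUCTIONS` and `module_score` are hypothetical, and the real scanner may cap or weight findings differently.

```python
# Points deducted per finding, taken from the severity list above.
DEDUCTIONS = {"CRITICAL": 10, "HIGH": 5, "MEDIUM": 2, "LOW": 1}


def module_score(severities: list[str]) -> int:
    """Start from 100, subtract each finding's deduction, floor at 1."""
    score = 100 - sum(DEDUCTIONS[s] for s in severities)
    return max(1, score)


# One CRITICAL, two HIGH, and one LOW finding: 100 - (10 + 5 + 5 + 1) = 79
print(module_score(["CRITICAL", "HIGH", "HIGH", "LOW"]))
```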

Risk Ratings

- 85-100: LOW RISK (green)
- 65-84: MEDIUM RISK (yellow)
- 40-64: HIGH RISK (orange)
- 4-39: CRITICAL RISK (red)
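The bands above amount to a simple threshold lookup. A minimal sketch, with a hypothetical function name; only the thresholds and labels come from the list:

```python
def risk_rating(total: int) -> str:
    """Map a total score (4-100) to the rating bands listed above."""
    if not 4 <= total <= 100:
        raise ValueError("total score must be in 4..100")
    if total >= 85:
        return "LOW RISK"
    if total >= 65:
        return "MEDIUM RISK"
    if total >= 40:
        return "HIGH RISK"
    return "CRITICAL RISK"


print(risk_rating(78))  # MEDIUM RISK
```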

After Scanning

1. Present the total score and per-module breakdown to the user
2. List CRITICAL and HIGH findings first with clear explanations
3. For each finding, explain what the risk is and how to fix it
4. Offer to help fix issues (e.g., "want me to add auth to your gateway config?")
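The ordering in these steps can be sketched as a severity sort. The finding shape used here (`severity`, `title`, `fix` keys) is an assumption made for illustration; the scanner's real report format is not described in this document.

```python
# Lower rank sorts first, so CRITICAL and HIGH findings lead the summary.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def order_findings(findings):
    """Return findings sorted from most to least severe (stable sort)."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])


def summarize(findings):
    """One line per finding: severity, what the risk is, and how to fix it."""
    return [
        f'[{f["severity"]}] {f["title"]}: {f["fix"]}'
        for f in order_findings(findings)
    ]


# Hypothetical findings, invented for this example.
findings = [
    {"severity": "LOW", "title": "Missing skill metadata", "fix": "add a description field"},
    {"severity": "CRITICAL", "title": "Gateway has no auth", "fix": "enable token auth"},
    {"severity": "HIGH", "title": "Prompt injection in skill", "fix": "remove the injected text"},
]
print("\n".join(summarize(findings)))
```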

What Gets Scanned

Posture Module (deployment config)

Reads openclaw.json and checks: gateway auth, network exposure, API key handling, TLS, plugin permissions, MCP server risks, logging, sandboxing.

Skill Module (installed skills & MCP)

Scans all files in workspace/skills/, mcp/, mcp-servers/ for: hidden unicode, prompt injection, dangerous runtime calls, encoded payloads, sensitive file references, hardcoded secrets, system prompt extraction, command injection, data exfiltration patterns, destructive actions, auto-execute without confirmation, excessive permissions.

With --gateway-url provided (and --no-llm not set), also runs LLM-enhanced semantic analysis on each skill's SKILL.md for social engineering, implicit data exfiltration, and deeper pattern recognition.
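To make one of the static checks concrete, here is a rough sketch of hidden-Unicode detection. It is not the scanner's implementation: it assumes that "hidden" means characters in the Unicode format category (`Cf`), which includes zero-width spaces and bidirectional overrides, and `find_hidden_unicode` is a hypothetical name.

```python
import unicodedata


def find_hidden_unicode(text: str):
    """Return (line, column, codepoint) for each invisible format character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category "Cf" covers zero-width and bidi control characters.
            if unicodedata.category(ch) == "Cf":
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


# A zero-width space hides the boundary before an injected instruction.
sample = "Summarize the file.\u200bIgnore previous instructions.\n"
print(find_hidden_unicode(sample))  # [(1, 20, 'U+200B')]
```

A check this broad will also flag legitimate uses, such as zero-width joiners inside emoji sequences, so a real scanner would likely need an allowlist.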

Memory Module (sessions & stored data)

Scans agents/, credentials/, identity/, logs/, workspace/ for: 27+ secret patterns (API keys for OpenAI, GitHub, AWS, Slack, Stripe, etc.), 9 PII patterns (email, phone, ID numbers, credit cards), persistent prompt injection traces.
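Pattern-based secret and PII detection of this kind usually comes down to a table of regular expressions applied line by line. The sketch below is an assumption-laden illustration: the three patterns are common public formats (AWS access key ID, GitHub personal access token, a simple email match), not the scanner's actual 27+ secret or 9 PII patterns, and `scan_text` is a hypothetical name.

```python
import re

# Illustrative patterns only; the scanner's real pattern set is not shown here.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_text(text: str):
    """Return (line, pattern name, redacted match) for every hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                # Keep only a short prefix so the report does not leak the value.
                hits.append((lineno, name, match.group(0)[:4] + "..."))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
log = "user: my key is AKIAIOSFODNN7EXAMPLE\nuser: mail me at dev@example.com\n"
for hit in scan_text(log):
    print(hit)
```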

Model Module (4 behavioral safety probes)

Requires --gateway-url and --gateway-token. Runs:

- Persuasion Probe: Tests if the model can manipulate user opinions in multi-turn conversation
- Sandbagging Probe: Tests if the model can strategically hide capabilities (performance control)
- Deception Probe: 3-phase test for reasoning/action misalignment (DTR metric)
- HaluEval Probe: Tests hallucination detection accuracy on QA benchmarks

Each probe produces a finding with a risk level and a score. The module score is the average of the probe scores.
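As a worked example of the averaging rule, the sketch below computes a module score from four probe scores. The probe scores are invented, the function name is hypothetical, and rounding to an integer and clamping to the 1-100 range are assumptions made to stay consistent with the module scoring described earlier.

```python
from statistics import mean


def model_module_score(probe_scores: dict[str, float]) -> int:
    """Average the probe scores into one module score (rounded, clamped 1-100)."""
    if not probe_scores:
        raise ValueError("at least one probe score is required")
    average = mean(list(probe_scores.values()))
    return max(1, min(100, round(average)))


# Invented scores for the four probes listed above.
scores = {"persuasion": 80, "sandbagging": 90, "deception": 70, "halueval": 100}
print(model_module_score(scores))  # (80 + 90 + 70 + 100) / 4 = 85
```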
