12-Factor Agents Compliance Analysis
Reference: 12-Factor Agents
Input Parameters
| Parameter | Description | Required |
|---|---|---|
| INLINECODE0 | Path to documentation directory (for existing analyses) | Optional |
| INLINECODE1 | Root path of the codebase to analyze | Required |
Analysis Framework
Factor 1: Natural Language to Tool Calls
Principle: Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.
Search Patterns:
CODEBLOCK0
File Patterns: **/agents/*.py, **/schemas/*.py, INLINECODE4
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | All LLM outputs use Pydantic/dataclass schemas with validators |
| Partial | Some outputs typed, but dict returns or unvalidated strings exist |
| Weak | LLM returns raw strings parsed manually or with regex |
Anti-patterns:
- `json.loads(llm_response)` without schema validation
- INLINECODE6 or regex parsing of LLM responses
- INLINECODE7 return types from agents
- No validation between LLM output and handler execution
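Here is a minimal sketch of the Strong pattern that uses only the standard library. A frozen dataclass with `__post_init__` checks stands in for a Pydantic model with validators. `CreateTask` and its `intent`/`title`/`priority` fields are a hypothetical tool schema. Malformed output raises `ValueError` before any handler runs (`json.JSONDecodeError` is a subclass of `ValueError`).

```python
import json
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class CreateTask:
    """Hypothetical tool-call schema; stands in for a Pydantic model."""
    intent: Literal["create_task"]
    title: str
    priority: int

    def __post_init__(self) -> None:
        # Validators: reject malformed LLM output before any handler runs.
        if self.intent != "create_task":
            raise ValueError(f"unexpected intent: {self.intent!r}")
        if not self.title.strip():
            raise ValueError("title must be non-empty")
        if not 1 <= self.priority <= 5:
            raise ValueError("priority must be between 1 and 5")


def parse_tool_call(llm_response: str) -> CreateTask:
    """Turn raw LLM text into a validated, typed tool call, or raise ValueError."""
    data = json.loads(llm_response)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("tool call must be a JSON object")
    expected = {"intent", "title", "priority"}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    if not isinstance(data["title"], str) or not isinstance(data["priority"], int):
        raise ValueError("title must be str and priority must be int")
    return CreateTask(**data)
```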
Factor 2: Own Your Prompts
Principle: Treat prompts as first-class code you control, version, and iterate on.
Search Patterns:
CODEBLOCK1
File Patterns: **/prompts/**, **/templates/**, INLINECODE10
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Prompts in separate files, templated (Jinja2), versioned |
| Partial | Prompts as module constants, some parameterization |
| Weak | Prompts hardcoded inline in functions, f-strings only |
Anti-patterns:
- `f"You are a {role}..."` inline in agent methods
- Prompts mixed with business logic
- No way to iterate on prompts without code changes
- No prompt versioning or A/B testing capability
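This dependency-free sketch keys templates by name and version. That lets you iterate on or A/B-test a prompt without touching agent logic. `string.Template` stands in for Jinja2. The `planner` prompts and their variables are hypothetical; in a real project each template would live in its own file under `prompts/`.

```python
from string import Template

# Hypothetical prompt registry keyed by (name, version).
PROMPTS: dict[tuple[str, str], Template] = {
    ("planner", "v1"): Template(
        "You are a $role. Break the issue below into at most $max_steps steps.\n"
        "Issue: $issue"
    ),
    ("planner", "v2"): Template(
        "Role: $role\nGoal: produce a plan of <= $max_steps steps.\n"
        "Issue:\n$issue"
    ),
}


def render_prompt(name: str, version: str, **variables: object) -> str:
    """Render one specific prompt version; a missing variable raises KeyError."""
    return PROMPTS[(name, version)].substitute(**variables)
```

`substitute()` raises `KeyError` on a missing variable, so an incomplete render fails loudly instead of sending a half-filled prompt.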
Factor 3: Own Your Context Window
Principle: Control how history, state, and tool results are formatted for the LLM.
Search Patterns:
CODEBLOCK2
File Patterns: **/context/*.py, **/state/*.py, INLINECODE14
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Custom context format, token optimization, typed events, compaction |
| Partial | Basic message history with some structure |
| Weak | Raw message accumulation, standard OpenAI format only |
Anti-patterns:
- Unbounded message accumulation
- Large artifacts embedded inline (diffs, files)
- No agent-specific context filtering
- Same context for all agent types
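This sketch shows one possible shape for an owned context window, assuming events are stored as typed records. It makes three choices:

- Each event is rendered into a compact tag instead of the standard chat-message format.
- Large artifacts are truncated.
- The oldest events are dropped once a budget is exceeded.

`Event` and `render_context` are hypothetical names, and character counts are only a rough stand-in for tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    type: str  # e.g. "user_message", "tool_call", "tool_result", "error"
    data: str


def render_context(events: list[Event], max_chars: int = 2000,
                   max_artifact_chars: int = 200) -> str:
    """Serialize events into a compact custom format for the LLM.

    Characters approximate tokens here; a real version would use the model's tokenizer.
    """
    blocks = []
    for event in events:
        body = event.data
        if len(body) > max_artifact_chars:
            # Keep large artifacts (diffs, files) from flooding the window.
            dropped = len(body) - max_artifact_chars
            body = body[:max_artifact_chars] + f"... [truncated {dropped} chars]"
        blocks.append(f"<{event.type}>\n{body}\n</{event.type}>")
    # Compaction: drop the oldest events until the rendered context fits the budget.
    while len(blocks) > 1 and len("\n".join(blocks)) > max_chars:
        blocks.pop(0)
    return "\n".join(blocks)
```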
Factor 4: Tools Are Structured Outputs
Principle: Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.
Search Patterns:
CODEBLOCK3
File Patterns: **/tools/*.py, **/handlers/*.py, INLINECODE17
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | All tool outputs schema-validated, handlers type-safe |
| Partial | Most tools typed, some loose dict returns |
| Weak | Tools return arbitrary dicts, no validation layer |
Anti-patterns:
- Tool handlers that directly execute LLM output
- INLINECODE18 or `exec()` on LLM-generated code
- No separation between decision (LLM) and execution (code)
- Magic method dispatch based on string matching
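This sketch separates decision from execution, assuming the LLM emits a JSON object of the form `{"tool": ..., "args": {...}}`. The model only names a tool; an explicit registry of typed, deterministic handlers does the work. No `eval()`, `exec()`, or string-based method lookup is involved. The tool names and the `ToolResult` type are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    output: str


# Hypothetical handlers: plain, deterministic functions owned by the application.
def run_tests(args: dict) -> ToolResult:
    return ToolResult(ok=True, output=f"ran tests in {args['path']}")


def read_file(args: dict) -> ToolResult:
    return ToolResult(ok=True, output=f"contents of {args['path']}")


# Explicit registry: tool name -> (required argument names, handler).
HANDLERS: dict[str, tuple[set[str], Callable[[dict], ToolResult]]] = {
    "run_tests": ({"path"}, run_tests),
    "read_file": ({"path"}, read_file),
}


def dispatch(llm_output: str) -> ToolResult:
    """The LLM only chooses a tool; this code validates the choice and executes it."""
    call = json.loads(llm_output)
    if not isinstance(call, dict) or not isinstance(call.get("args", {}), dict):
        return ToolResult(ok=False, output="tool call must be an object with an 'args' object")
    name, args = call.get("tool"), call.get("args", {})
    if not isinstance(name, str) or name not in HANDLERS:
        return ToolResult(ok=False, output=f"unknown tool: {name!r}")
    required, handler = HANDLERS[name]
    missing = required - set(args)
    if missing:
        return ToolResult(ok=False, output=f"missing args: {sorted(missing)}")
    return handler(args)
```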
Factor 5: Unify Execution State
Principle: Merge execution state (step, retries) with business state (messages, results).
Search Patterns:
CODEBLOCK4
File Patterns: **/state/*.py, **/models/*.py, INLINECODE22
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Single serializable state object with all execution metadata |
| Partial | State exists but split across systems (memory + DB) |
| Weak | Execution state scattered, requires multiple queries to reconstruct |
Anti-patterns:
- Retry count stored separately from task state
- Error history in logs but not in state
- LangGraph checkpoints + separate database storage
- No unified event thread
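This minimal sketch assumes a single JSON-serializable object per task. Execution metadata (step, retries, status) sits next to the event thread. As a result, errors and retries live in state rather than in logs, and one load reconstructs everything. `Thread` and its fields are illustrative, not taken from any framework.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Thread:
    """One serializable object holding business and execution state (hypothetical)."""
    task_id: str
    events: list[dict] = field(default_factory=list)  # messages, tool calls, results, errors
    current_step: int = 0
    retry_count: int = 0
    status: str = "running"  # running | paused | done | failed

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Thread":
        return cls(**json.loads(raw))
```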
Factor 6: Launch/Pause/Resume
Principle: Agents support simple APIs for launching, pausing at any point, and resuming.
Search Patterns:
CODEBLOCK5
File Patterns: **/routes/*.py, **/api/*.py, INLINECODE25
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | REST API + webhook resume, pause at any point including mid-tool |
| Partial | Launch/pause/resume exists but only at coarse-grained points |
| Weak | CLI-only launch, no pause/resume capability |
Anti-patterns:
- Blocking `input()` or `confirm()` calls
- No way to resume after process restart
- Approval only at plan level, not per-tool
- No webhook-based resume from external systems
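This sketch builds launch/pause/resume on serialized state, using hypothetical `launch`, `step`, and `resume` functions. Pausing just means returning the state with a pending tool. Resuming (for example, from a webhook handler) reloads that state. Because the state is a plain JSON string, it can outlive the process that created it.

```python
import json


def launch(task: str) -> str:
    """Start a run and return its serialized state (hypothetical API)."""
    return json.dumps({"task": task, "events": [], "status": "running"})


def step(state_json: str, next_tool: str, needs_approval: bool) -> str:
    """Advance one step; pause before any tool that needs approval."""
    state = json.loads(state_json)
    if needs_approval:
        state["status"] = "paused"
        state["pending_tool"] = next_tool
    else:
        state["events"].append({"type": "tool_call", "tool": next_tool})
    return json.dumps(state)  # persist anywhere: DB row, file, queue message


def resume(state_json: str, approved: bool) -> str:
    """Resume a paused run, e.g. when a webhook delivers a human's decision."""
    state = json.loads(state_json)
    if state.get("status") != "paused":
        raise ValueError("run is not paused")
    tool = state.pop("pending_tool")
    state["events"].append({"type": "human_response", "tool": tool, "approved": approved})
    if approved:
        state["events"].append({"type": "tool_call", "tool": tool})
    state["status"] = "running"
    return json.dumps(state)
```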
Factor 7: Contact Humans with Tools
Principle: Human contact is a tool call with question, options, and urgency.
Search Patterns:
CODEBLOCK6
File Patterns: **/agents/*.py, **/tools/*.py, INLINECODE30
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | INLINECODE31 tool with question/options/urgency/format |
| Partial | Approval gates exist but hardcoded in graph structure |
| Weak | Blocking CLI prompts, no tool-based human contact |
Anti-patterns:
- `typer.confirm()` in agent code
- Human contact hardcoded at specific graph nodes
- No way for agents to ask clarifying questions
- Single response format (yes/no only)
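This sketch expresses human contact as a structured tool call. `RequestHumanInput` and `handle_human_request` are hypothetical names. The fields mirror the question/options/urgency/format criteria above. The handler returns a waiting status instead of blocking on `input()`.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass(frozen=True)
class RequestHumanInput:
    """Hypothetical tool-call schema for contacting a human."""
    question: str
    options: list[str] = field(default_factory=list)  # empty = free-form answer
    urgency: Literal["low", "medium", "high"] = "medium"
    response_format: Literal["free_text", "yes_no", "multiple_choice"] = "free_text"

    def __post_init__(self) -> None:
        if self.urgency not in ("low", "medium", "high"):
            raise ValueError(f"invalid urgency: {self.urgency!r}")
        if self.response_format == "multiple_choice" and len(self.options) < 2:
            raise ValueError("multiple_choice needs at least two options")


def handle_human_request(call: RequestHumanInput) -> dict:
    """Turn the tool call into an outbound message (Slack, email, ...) and pause.

    Nothing here blocks; the answer arrives later and resumes the run.
    """
    return {
        "status": "waiting_for_human",
        "message": call.question,
        "choices": call.options,
        "urgency": call.urgency,
    }
```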
Factor 8: Own Your Control Flow
Principle: Custom control flow, not framework defaults. Full control over routing, retries, compaction.
Search Patterns:
CODEBLOCK7
File Patterns: **/orchestrator/*.py, **/graph/*.py, INLINECODE35
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Custom routing functions, conditional edges, execution mode control |
| Partial | Framework control flow with some customization |
| Weak | Default framework loop with no custom routing |
Anti-patterns:
- Single path through graph with no branching
- No distinction between tool types (all treated same)
- Framework-default error handling only
- No rate limiting or resource management
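This minimal routing function assumes tools can be classified by risk; the tool sets and the 8000-token limit are hypothetical. Code, not the framework's default loop, decides whether the next step executes, waits for approval, compacts context, or finishes. A graph framework's conditional edges would delegate to a function of this kind.

```python
from typing import Literal

Route = Literal["execute", "request_approval", "compact_context", "finish"]

# Hypothetical classification of tools by risk.
READ_ONLY_TOOLS = {"read_file", "search_code", "run_tests"}
HIGH_RISK_TOOLS = {"deploy", "delete_branch", "merge_pr"}


def route(next_tool: str, context_tokens: int, token_limit: int = 8000) -> Route:
    """Choose the next node based on the tool the LLM picked and the context size."""
    if next_tool == "done":
        return "finish"
    if context_tokens > token_limit:
        # Compact first so the next LLM call fits in the window.
        return "compact_context"
    if next_tool in HIGH_RISK_TOOLS:
        return "request_approval"
    if next_tool in READ_ONLY_TOOLS:
        return "execute"
    # Unknown tools are not trusted by default.
    return "request_approval"
```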
Factor 9: Compact Errors into Context
Principle: Errors in context enable self-healing. Track consecutive errors, escalate after threshold.
Search Patterns:
CODEBLOCK8
File Patterns: **/agents/*.py, **/orchestrator/*.py, INLINECODE38
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Errors in context, retry with threshold, automatic escalation |
| Partial | Errors logged and returned, no automatic retry loop |
| Weak | Errors logged only, not fed back to LLM, task fails immediately |
Anti-patterns:
- `logger.error()` without adding to context
- No retry mechanism (fail immediately)
- No consecutive error tracking
- No escalation to humans after repeated failures
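This sketch compacts errors into context and escalates after repeated failures, using a hypothetical `ErrorTracker`. Each error becomes one line in the context the LLM sees. Consecutive failures are counted, and any success resets the streak. The default threshold of three is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorTracker:
    """Feed errors back into context and escalate after repeated failures."""
    max_consecutive: int = 3
    consecutive: int = 0
    context: list[str] = field(default_factory=list)

    def record_success(self, result: str) -> None:
        self.consecutive = 0  # any success resets the streak
        self.context.append(f"<tool_result>{result}</tool_result>")

    def record_error(self, error: Exception) -> str:
        """Return the next action: 'retry' or 'escalate_to_human'."""
        self.consecutive += 1
        # Compact the error to one line instead of a full stack trace.
        self.context.append(f"<error>{type(error).__name__}: {error}</error>")
        if self.consecutive >= self.max_consecutive:
            return "escalate_to_human"
        return "retry"
```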
Factor 10: Small, Focused Agents
Principle: Each agent has a narrow responsibility and runs 3-10 steps at most.
Search Patterns:
CODEBLOCK9
File Patterns: INLINECODE40
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | 3+ specialized agents, each with single responsibility, step limits |
| Partial | Multiple agents but some have broad scope |
| Weak | Single "god" agent that handles everything |
Anti-patterns:
- Single agent with 20+ tools
- Agent with unbounded step count
- Mixed responsibilities (planning + execution + review)
- No step or time limits on agent execution
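This sketch shows a small, focused agent. `FocusedAgent` is hypothetical, and its `decide` callback stands in for the LLM. The agent may only call its own short list of tools, and it stops with an error once its step budget runs out.

```python
from dataclasses import dataclass
from typing import Callable


class StepLimitExceeded(RuntimeError):
    pass


@dataclass(frozen=True)
class FocusedAgent:
    """A narrowly scoped agent: few tools and a hard step budget (hypothetical)."""
    name: str
    tools: frozenset[str]
    max_steps: int = 10

    def run(self, decide: Callable[[int], str]) -> list[str]:
        """`decide(step)` stands in for the LLM; it returns a tool name or 'done'."""
        trace: list[str] = []
        for step in range(self.max_steps):
            action = decide(step)
            if action == "done":
                return trace
            if action not in self.tools:
                raise ValueError(f"{self.name} may not call {action!r}")
            trace.append(action)
        raise StepLimitExceeded(f"{self.name} exceeded {self.max_steps} steps")
```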
Factor 11: Trigger from Anywhere
Principle: Workflows triggerable from CLI, REST, WebSocket, Slack, webhooks, etc.
Search Patterns:
CODEBLOCK10
File Patterns: **/routes/*.py, **/cli/*.py, INLINECODE43
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | CLI + REST + WebSocket + webhooks + chat integrations |
| Partial | CLI + REST API available |
| Weak | CLI only, no programmatic access |
Anti-patterns:
- Only an `if __name__ == "__main__"` entry point
- No REST API for external systems
- No event streaming for real-time updates
- Trigger logic tightly coupled to execution
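This sketch decouples triggers from execution. Every transport funnels into one hypothetical `start_workflow` function, so adding a Slack or WebSocket trigger means adding a thin adapter rather than touching agent code. The CLI adapter uses `argparse`. The webhook adapter assumes a JSON body with a `task` field.

```python
import argparse
import json


def start_workflow(task: str, source: str) -> dict:
    """Single transport-agnostic entry point (hypothetical)."""
    # A real implementation would create state and enqueue the run here.
    return {"task": task, "source": source, "status": "started"}


def from_cli(argv: list[str]) -> dict:
    parser = argparse.ArgumentParser(prog="agent")
    parser.add_argument("task")
    args = parser.parse_args(argv)
    return start_workflow(args.task, source="cli")


def from_webhook(body: bytes) -> dict:
    # Could be called by a REST route, a Slack event handler, or a Git host webhook.
    payload = json.loads(body)
    return start_workflow(payload["task"], source=payload.get("source", "webhook"))
```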
Factor 12: Stateless Reducer
Principle: Agents as pure functions: (state, input) -> (state, output). No side effects in agent logic.
Search Patterns:
CODEBLOCK11
File Patterns: **/agents/*.py, INLINECODE46
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Immutable state updates, side effects isolated to tools/handlers |
| Partial | Mostly immutable, some in-place mutations |
| Weak | State mutated in place, side effects mixed with agent logic |
Anti-patterns:
- `state.field = new_value` (mutation)
- File writes inside agent methods
- HTTP calls inside agent decision logic
- Shared mutable state between agents
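This minimal stateless reducer assumes frozen dataclasses for state. `agent_reducer` returns a new state via `dataclasses.replace`. It describes side effects as `Effect` values for a handler to carry out, instead of writing files or calling HTTP itself. All names here are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentState:
    step: int = 0
    history: tuple[str, ...] = ()


@dataclass(frozen=True)
class Effect:
    """A side effect for a handler to perform; the reducer only describes it."""
    kind: str
    payload: str


def agent_reducer(state: AgentState, event: str) -> tuple[AgentState, list[Effect]]:
    """(state, input) -> (state, output): no I/O and no in-place mutation."""
    new_state = replace(state, step=state.step + 1, history=state.history + (event,))
    effects: list[Effect] = []
    if event.startswith("write:"):
        # Describe the file write; a separate tool/handler layer executes it.
        effects.append(Effect(kind="write_file", payload=event[len("write:"):]))
    return new_state, effects
```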
Factor 13: Pre-fetch Context
Principle: Fetch likely-needed data upfront rather than mid-workflow.
Search Patterns:
CODEBLOCK12
File Patterns: **/context/*.py, **/retrieval/*.py, INLINECODE50
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Automatic pre-fetch of related tests, files, docs before planning |
| Partial | Manual context passing, design doc support |
| Weak | No pre-fetching, LLM must request all context via tools |
Anti-patterns:
- Architect starts with the issue only, without codebase context
- No semantic search for similar past work
- Related tests/files discovered only during execution
- No RAG or document retrieval system
Output Format
Executive Summary Table
CODEBLOCK13
Per-Factor Analysis
For each factor, provide:
1. Current Implementation
   - Evidence with file:line references
   - Code snippets showing patterns
2. Compliance Level
   - Strong/Partial/Weak with justification
3. Gaps
   - What's missing vs. 12-Factor ideal
4. Recommendations
   - Actionable improvements with code examples
Analysis Workflow
1. Initial Scan
   - Run the search patterns for all factors
   - Identify key files for each factor
   - Note any existing compliance documentation
2. Deep Dive (per factor)
   - Read the identified files
   - Evaluate against the compliance criteria
   - Document evidence with file paths
3. Gap Analysis
   - Compare the current implementation with the 12-Factor ideal
   - Identify anti-patterns present
   - Prioritize by impact
4. Recommendations
   - Provide actionable improvements
   - Include before/after code examples
   - Reference the roadmap, if one exists
5. Summary
   - Compile the executive summary table
   - Highlight strengths and critical gaps
   - Suggest a priority order for improvements
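The Initial Scan step can be automated with a small script. This sketch uses Python's `re` and `pathlib` rather than `grep`, and it records `file:line` evidence per factor. It includes only a few illustrative patterns. The factor labels and pattern lists are assumptions; extend them with each factor's search patterns.

```python
import re
from pathlib import Path

# A few illustrative patterns per factor (hypothetical subset).
PATTERNS: dict[str, list[str]] = {
    "F1 structured outputs": [r"class \w+\(BaseModel\)", r"model_json_schema"],
    "F2 own your prompts": [r"SYSTEM_PROMPT", r"You are a"],
    "F7 contact humans": [r"typer\.confirm", r"\binput\("],
    "F9 errors in context": [r"logger\.error", r"consecutive_errors"],
}


def initial_scan(codebase_path: str) -> dict[str, list[str]]:
    """Return 'file:line' hits per factor for every *.py file under codebase_path."""
    compiled = {f: [re.compile(p) for p in pats] for f, pats in PATTERNS.items()}
    hits: dict[str, list[str]] = {factor: [] for factor in PATTERNS}
    for path in sorted(Path(codebase_path).rglob("*.py")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for factor, regexes in compiled.items():
                if any(r.search(line) for r in regexes):
                    rel = path.relative_to(codebase_path).as_posix()
                    hits[factor].append(f"{rel}:{lineno}")
    return hits
```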
Quick Reference: Compliance Scoring
| Score | Meaning | Action |
|---|---|---|
| Strong | Fully implements principle | Maintain, minor optimizations |
| Partial | Some implementation, significant gaps | Planned improvements |
| Weak | Minimal or no implementation | High priority for roadmap |
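This sketch compiles the executive summary from per-factor scores, ordered so that Weak factors (highest roadmap priority) come first. The row layout and the `summary_table` name are assumptions, not a required format.

```python
from typing import Literal

Level = Literal["Strong", "Partial", "Weak"]

# Mirrors the scoring table: Weak is highest priority, Strong is maintenance.
ACTION: dict[str, str] = {
    "Weak": "High priority for roadmap",
    "Partial": "Planned improvements",
    "Strong": "Maintain, minor optimizations",
}
PRIORITY = {"Weak": 0, "Partial": 1, "Strong": 2}


def summary_table(scores: dict[str, Level]) -> str:
    """Render a Markdown summary ordered by suggested priority."""
    rows = ["| Factor | Score | Action |", "|---|---|---|"]
    # sorted() is stable, so factors with equal scores keep their input order.
    for factor, level in sorted(scores.items(), key=lambda item: PRIORITY[item[1]]):
        rows.append(f"| {factor} | {level} | {ACTION[level]} |")
    return "\n".join(rows)
```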
When to Use This Skill
- Evaluating new LLM-powered systems
- Reviewing agent architecture decisions
- Auditing production agentic applications
- Planning improvements to existing agents
- Comparing frameworks or implementations
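12-Factor Agents Compliance Analysis
Reference: 12-Factor Agents
Input Parameters
| Parameter | Description | Required |
|---|---|---|
| `docs_path` | Path to documentation directory (for existing analyses) | Optional |
| `codebase_path` | Root path of the codebase to analyze | Required |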
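Analysis Framework
Factor 1: Natural Language to Tool Calls
Principle: Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.
Search Patterns:
```bash
# Find Pydantic schemas
grep -r "class.*BaseModel" --include="*.py"
grep -r "TaskDAG\|TaskResponse\|ToolCall" --include="*.py"

# Find JSON schema generation
grep -r "model_json_schema\|json_schema" --include="*.py"

# Find structured output generation
grep -r "output_type\|response_model" --include="*.py"
```
File Patterns: **/agents/*.py, **/schemas/*.py, **/models/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | All LLM outputs use Pydantic/dataclass schemas with validators |
| Partial | Some outputs typed, but dict returns or unvalidated strings exist |
| Weak | LLM returns raw strings parsed manually or with regex |
Anti-patterns:
- `json.loads(llm_response)` without schema validation
- `output.split()` or regex parsing of LLM responses
- `dict[str, Any]` return types from agents
- No validation between LLM output and handler execution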
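Factor 2: Own Your Prompts
Principle: Treat prompts as first-class code you control, version, and iterate on.
Search Patterns:
```bash
# Find inline prompts
grep -r "SYSTEM_PROMPT\|system_prompt" --include="*.py"
grep -r "You are" --include="*.py"

# Find template systems
grep -r "jinja\|Jinja\|render_template" --include="*.py"
find . -name "*.jinja2" -o -name "*.j2"

# Find prompt directories
find . -type d -name "prompts"
```
File Patterns: **/prompts/**, **/templates/**, **/agents/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Prompts in separate files, templated (Jinja2), versioned |
| Partial | Prompts as module constants, some parameterization |
| Weak | Prompts hardcoded inline in functions, f-strings only |
Anti-patterns:
- `f"You are a {role}..."` inline in agent methods
- Prompts mixed with business logic
- No way to iterate on prompts without code changes
- No prompt versioning or A/B testing capability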
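Factor 3: Own Your Context Window
Principle: Control how history, state, and tool results are formatted for the LLM.
Search Patterns:
```bash
# Find context/message management
grep -r "AgentMessage\|ChatMessage\|messages" --include="*.py"
grep -r "context_window\|context_compiler" --include="*.py"

# Find custom serialization
grep -r "to_xml\|to_context\|serialize" --include="*.py"

# Find token management
grep -r "token_count\|max_tokens\|truncate" --include="*.py"
```
File Patterns: **/context/*.py, **/state/*.py, **/core/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Custom context format, token optimization, typed events, compaction |
| Partial | Basic message history with some structure |
| Weak | Raw message accumulation, standard OpenAI format only |
Anti-patterns:
- Unbounded message accumulation
- Large artifacts embedded inline (diffs, files)
- No agent-specific context filtering
- Same context for all agent types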
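Factor 4: Tools Are Structured Outputs
Principle: Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.
Search Patterns:
```bash
# Find tool/response schemas
grep -r "class.*Response.*BaseModel" --include="*.py"
grep -r "ToolResult\|ToolOutput" --include="*.py"

# Find deterministic handlers
grep -r "def handle_\|def execute_" --include="*.py"

# Find validation layer
grep -r "model_validate\|parse_obj" --include="*.py"
```
File Patterns: **/tools/*.py, **/handlers/*.py, **/agents/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | All tool outputs schema-validated, handlers type-safe |
| Partial | Most tools typed, some loose dict returns |
| Weak | Tools return arbitrary dicts, no validation layer |
Anti-patterns:
- Tool handlers that directly execute LLM output
- `eval()` or `exec()` on LLM-generated code
- No separation between decision (LLM) and execution (code)
- Magic method dispatch based on string matching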
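Factor 5: Unify Execution State
Principle: Merge execution state (step, retries) with business state (messages, results).
Search Patterns:
```bash
# Find state models
grep -r "ExecutionState\|WorkflowState\|Thread" --include="*.py"

# Find dual state systems
grep -r "checkpoint\|MemorySaver" --include="*.py"
grep -r "sqlite\|database\|repository" --include="*.py"

# Find state reconstruction
grep -r "load_state\|restore\|reconstruct" --include="*.py"
```
File Patterns: **/state/*.py, **/models/*.py, **/database/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | Single serializable state object with all execution metadata |
| Partial | State exists but split across systems (memory + DB) |
| Weak | Execution state scattered, requires multiple queries to reconstruct |
Anti-patterns:
- Retry count stored separately from task state
- Error history in logs but not in state
- LangGraph checkpoints + separate database storage
- No unified event thread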
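Factor 6: Launch/Pause/Resume
Principle: Agents support simple APIs for launching, pausing at any point, and resuming.
Search Patterns:
```bash
# Find REST endpoints
grep -r "@router.post\|@app.post" --include="*.py"
grep -r "start_workflow\|pause\|resume" --include="*.py"

# Find interrupt mechanisms
grep -r "interrupt_before\|interrupt_after" --include="*.py"

# Find webhook handlers
grep -r "webhook\|callback" --include="*.py"
```
File Patterns: **/routes/*.py, **/api/*.py, **/orchestrator/*.py
Compliance Criteria:
| Level | Criteria |
|---|---|
| Strong | REST API + webhook resume, pause at any point including mid-tool |
| Partial | Launch/pause/resume exists but only at coarse-grained points |
| Weak | CLI-only launch, no pause/resume capability |
Anti-patterns:
- Blocking `input()` or `confirm()` calls
- No way to resume after process restart
- Approval only at plan level, not per-tool
- No webhook-based resume from external systems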
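Factor 7: Contact Humans with Tools
Principle: Human contact is a tool call with question, options, and urgency.
Search Patterns:
```bash
# Find human input mechanisms
grep -r "typer.confirm\|input(\|prompt(" --include="*.py"
grep -r "request_human_input\|human_contact" --include="*.py"

# Find approval patterns
grep -r "approval\|approve\|reject" --include="*.py"

# Find structured questions
```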