Ship Loop v5.0 — TARS Convergence
Orchestrate multi-segment feature work as a self-healing pipeline. Three nested loops ensure maximum autonomy: Loop 1 runs the standard code→preflight→ship→verify chain, Loop 2 auto-repairs failures via the coding agent, Loop 3 spawns experiment branches when repairs stall. A SQLite state backend provides crash recovery and cross-run analytics. A verdict router replaces hardcoded branching with a configurable decision table. A reflection loop audits historical effectiveness and auto-generates learnings.
Architecture: Three Loops + Event Queue + Verdict Router
CODEBLOCK0
Security Notice
SHIPLOOP.yml is equivalent to running a script. The agent_command, all preflight commands (build, lint, test), and custom deploy scripts execute with your full user privileges. Ship Loop does not sandbox these commands. Never use on untrusted repos without reviewing the config. Treat SHIPLOOP.yml with the same caution as a Makefile or CI pipeline.
When to Use
- Building multiple features for a project in sequence
- Any work that follows: code → preflight → commit → deploy → verify → next
- When you need checkpointing so progress survives session restarts
- When you want self-healing: failures auto-repair before asking humans
- When you want cost visibility and learning from past runs
Prerequisites
- Python 3.10+ with pyyaml and pydantic installed
- A git repository with a remote
- A deployment pipeline triggered by push (Vercel, Netlify, etc.)
- A coding agent CLI configured via agent_command in SHIPLOOP.yml
Installation
CODEBLOCK1
CLI Usage
CODEBLOCK2
Pipeline Definition (SHIPLOOP.yml)
CODEBLOCK3
SQLite State Backend (v5.0)
State is now stored in .shiploop/tars.db (SQLite, WAL mode). SHIPLOOP.yml is config-only.
Tables
| Table | Purpose |
|---|---|
| runs | Pipeline execution records (id, project, started_at, status, cost) |
| segments | Segment execution records per run (status, commit, touched_paths) |
| run_events | Event queue for crash recovery and audit trail |
| learnings | Failure/success lessons with effectiveness scores |
| usage | Token and cost records per agent invocation |
| decision_gaps | Situations the system didn't know how to handle |
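As a rough illustration of the storage setup described in this section, the sketch below opens a SQLite file in WAL mode with Python's standard `sqlite3` module and creates a simplified `run_events` table. The `open_state_db` helper and the column names are assumptions made for the example, not Ship Loop's actual schema.

```python
import os
import sqlite3
import tempfile


def open_state_db(path: str) -> sqlite3.Connection:
    """Open (or create) a state database in WAL mode. Hypothetical helper."""
    conn = sqlite3.connect(path)
    # WAL keeps readers unblocked while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # Assumed, simplified schema: the real table may have different columns.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS run_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            run_id TEXT NOT NULL,
            event_type TEXT NOT NULL,
            processed INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    conn.commit()
    return conn


# Use a throwaway directory so the example never touches a real project.
db_path = os.path.join(tempfile.mkdtemp(), "tars.db")
conn = open_state_db(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

WAL mode only applies to file-backed databases, which is why the sketch writes to a temporary file rather than `:memory:`.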
Event Types
| Event | When emitted |
|---|---|
| agent_started | Agent invocation begins |
| preflight_passed | All preflight steps pass |
| preflight_failed | Any preflight step fails |
| repair_done | Repair loop succeeded |
| repair_failed | Repair loop failed or exhausted |
| meta_done | Meta loop winner merged |
| segment_shipped | Segment fully complete |
| segment_failed | Segment permanently failed |
| deploy_failed | Deploy or verification failed |
| file_overlap_warning | Segment may touch files changed by prior segment |
Crash recovery: On startup, unprocessed events are replayed to restore pipeline state.
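One way to picture the replay step is the sketch below. It assumes each event row carries a `processed` flag and that the latest event for a segment determines its restored state; both are illustrative assumptions, as are the `emit` and `replay_unprocessed` helpers and the segment names.

```python
import sqlite3

# In-memory database with an assumed, simplified event-queue schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run_events ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " segment TEXT NOT NULL,"
    " event_type TEXT NOT NULL,"
    " processed INTEGER NOT NULL DEFAULT 0)"
)


def emit(segment: str, event_type: str) -> None:
    """Append an event to the queue; it starts out unprocessed."""
    conn.execute(
        "INSERT INTO run_events (segment, event_type) VALUES (?, ?)",
        (segment, event_type),
    )


def replay_unprocessed(state: dict) -> dict:
    """Apply unprocessed events in insertion order, then mark them processed."""
    rows = conn.execute(
        "SELECT id, segment, event_type FROM run_events"
        " WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for event_id, segment, event_type in rows:
        state[segment] = event_type  # the last event wins for each segment
        conn.execute("UPDATE run_events SET processed = 1 WHERE id = ?", (event_id,))
    conn.commit()
    return state


# Simulate events written before a crash, then recover on "startup".
emit("dark-mode", "preflight_failed")
emit("dark-mode", "repair_done")
emit("dark-mode", "segment_shipped")
emit("contact-form", "preflight_failed")

recovered = replay_unprocessed({})
second_pass = replay_unprocessed({})
```

Marking events as processed makes the replay idempotent: a second startup finds nothing left to apply.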
Verdict Router (v5.0)
The orchestrator no longer uses if/else chains. Every outcome maps to a Verdict, and a VerdictRouter maps verdicts to Action values.
Default Routing Table
| Verdict | Default Action |
|---|---|
| success | ship |
| preflight_fail | repair |
| agent_fail | fail |
| deploy_fail | retry |
| repair_success | ship |
| repair_exhausted | meta |
| meta_success | ship |
| meta_exhausted | fail |
| budget_exceeded | fail |
| converged | meta ← skip remaining repairs, jump to meta |
| no_changes | fail |
| unknown | pause_and_alert |
Override via router: section in SHIPLOOP.yml (see above).
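The table-driven routing can be sketched roughly as follows. The `Action` enum, the `VerdictRouter` constructor signature, and the fallback for unlisted verdicts are assumptions for illustration; only the verdict and action names come from this document.

```python
from enum import Enum
from typing import Optional


class Action(str, Enum):
    SHIP = "ship"
    REPAIR = "repair"
    RETRY = "retry"
    META = "meta"
    FAIL = "fail"
    PAUSE_AND_ALERT = "pause_and_alert"


# Defaults mirror the routing table in this document.
DEFAULT_ROUTES = {
    "success": Action.SHIP,
    "preflight_fail": Action.REPAIR,
    "agent_fail": Action.FAIL,
    "deploy_fail": Action.RETRY,
    "repair_success": Action.SHIP,
    "repair_exhausted": Action.META,
    "meta_success": Action.SHIP,
    "meta_exhausted": Action.FAIL,
    "budget_exceeded": Action.FAIL,
    "converged": Action.META,
    "no_changes": Action.FAIL,
    "unknown": Action.PAUSE_AND_ALERT,
}


class VerdictRouter:
    """Hypothetical router: a lookup table plus per-project overrides."""

    def __init__(self, overrides: Optional[dict] = None) -> None:
        self.routes = dict(DEFAULT_ROUTES)
        for verdict, action in (overrides or {}).items():
            # Action(...) raises ValueError if the config names an unknown action.
            self.routes[verdict] = Action(action)

    def route(self, verdict: str) -> Action:
        # Assumed behavior: verdicts missing from the table fall back to "unknown".
        return self.routes.get(verdict, self.routes["unknown"])


# Overrides as they might be parsed from the router: section of SHIPLOOP.yml.
router = VerdictRouter({"agent_fail": "retry", "deploy_fail": "fail"})
```

Keeping the defaults in one dictionary means an override changes a single entry and leaves the rest of the table untouched.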
Meta-Reflection Loop (v5.0)
Runs automatically after pipeline completion (when reflection.auto_run: true) or manually via shiploop reflect.
What It Analyzes
1. Repeat failures: same error_signature across multiple segments/runs
2. Repair-heavy segments: segments that needed >1 repair loop (same error type)
3. Efficiency trends: cost/time per segment trending up or down
4. Stale learnings: learnings with score < 0.3 that haven't helped
5. Decision gaps: situations that triggered INLINECODE55
Auto-creates learnings from patterns
If an error signature appears 3+ times across runs, the reflect loop auto-generates an AUTO-<sig> learning flagging it for human review.
CODEBLOCK4
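The "3+ occurrences" rule for auto-generated learnings can be sketched with a simple counter. The function name, the flat list of signatures, and the sample signature values are hypothetical; the real reflect loop presumably reads signatures from the database.

```python
from collections import Counter


def auto_learning_ids(error_signatures: list, threshold: int = 3) -> list:
    """Return AUTO-<sig> ids for signatures seen at least `threshold` times."""
    counts = Counter(error_signatures)
    return sorted(f"AUTO-{sig}" for sig, seen in counts.items() if seen >= threshold)


# Signatures collected across several runs (made-up values).
history = ["a1b2", "ffee", "a1b2", "a1b2", "ffee"]
flagged = auto_learning_ids(history)
```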
Playbook Evolution (v5.0)
When a repair fails with an error that doesn't match any existing learning, the system records a decision_gap:
CODEBLOCK5
Decision gaps surface in shiploop reflect output and the decision_gaps DB table. Operators use them to add new learnings or router overrides.
Convergence Detection (v5.0 Enhanced)
Same-segment: if two consecutive repair attempts produce the same error hash → CONVERGED verdict → router jumps to META (skipping remaining repair attempts).
Cross-segment: before starting a segment, the orchestrator checks whether any already-shipped segment touched the same files (via touched_paths in DB). If an overlap is detected, a file_overlap_warning event is emitted.
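Both checks can be sketched in a few lines. The hashing scheme (SHA-256 over whitespace-normalized output) and the idea that the next segment's candidate paths are known up front are assumptions for illustration; the document only states that error hashes and touched_paths are compared.

```python
import hashlib


def error_hash(output: str) -> str:
    """Hash an error message after trimming whitespace on every line."""
    normalized = "\n".join(line.strip() for line in output.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def has_converged(attempt_outputs: list) -> bool:
    """True when the last two repair attempts failed with the same error hash."""
    if len(attempt_outputs) < 2:
        return False
    return error_hash(attempt_outputs[-1]) == error_hash(attempt_outputs[-2])


def overlapping_paths(shipped: dict, candidate_paths: set) -> dict:
    """Map each shipped segment to the paths it shares with the next segment."""
    overlaps = {}
    for name, touched in shipped.items():
        shared = touched & candidate_paths
        if shared:
            overlaps[name] = shared
    return overlaps


same_error_twice = has_converged(["TypeError: x is undefined", "  TypeError: x is undefined\n"])
different_errors = has_converged(["TypeError: x is undefined", "SyntaxError: unexpected token"])
warnings = overlapping_paths(
    {"dark-mode": {"src/app.tsx", "src/nav.tsx"}, "docs": {"README.md"}},
    {"src/nav.tsx", "src/footer.tsx"},
)
```

A non-empty `warnings` result is the condition under which a file_overlap_warning event would be emitted in this sketch.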
Learnings Scoring (v5.0)
CODEBLOCK6
Search results are sorted by combined keyword-relevance × score. Learnings with score < 0.3 are flagged as stale in reflection.
CODEBLOCK7
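To make the scoring rules concrete, here is a minimal sketch. The +0.1/-0.2 adjustments and the 0.3 stale threshold come from this document; the starting score, the clamp to the range 0 to 1, and the keyword-relevance metric (fraction of query words found in the text) are assumptions.

```python
from dataclasses import dataclass

STALE_THRESHOLD = 0.3  # learnings scoring below this are flagged as stale


@dataclass
class Learning:
    id: str
    text: str
    score: float = 1.0  # assumed starting score


def record_outcome(learning: Learning, helped: bool) -> None:
    """Apply +0.1 on success or -0.2 on failure, clamped to [0, 1] (clamp assumed)."""
    delta = 0.1 if helped else -0.2
    learning.score = round(min(1.0, max(0.0, learning.score + delta)), 2)


def relevance(learning: Learning, query: str) -> float:
    """Fraction of query words found in the learning text (assumed metric)."""
    words = query.lower().split()
    if not words:
        return 0.0
    text = learning.text.lower()
    return sum(word in text for word in words) / len(words)


def search(learnings: list, query: str) -> list:
    """Rank by relevance x score, dropping learnings with no keyword match."""
    ranked = [(relevance(item, query) * item.score, item) for item in learnings]
    ranked = [pair for pair in ranked if pair[0] > 0]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in ranked]


def is_stale(learning: Learning) -> bool:
    return learning.score < STALE_THRESHOLD


learnings = [
    Learning("L1", "dark mode toggle breaks hydration", score=0.5),
    Learning("L2", "dark mode theme needs css variables", score=0.9),
    Learning("L3", "netlify redirects need a trailing slash", score=1.0),
]
results = [item.id for item in search(learnings, "dark mode theme")]
```

Because the failure penalty is twice the success bonus, a learning that helps only half the time drifts toward the stale threshold in this sketch.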
State Machine
CODEBLOCK8
SHIPLOOP.yml is still checkpointed after every transition for backward compatibility. SQLite is the primary state store.
Deploy Providers
| Provider | How it works |
|---|---|
| vercel | Polls routes for HTTP 200, checks x-vercel-deployment-url header |
| netlify | Polls routes for HTTP 200, checks x-nf-request-id header |
| custom | Runs deploy.script with SHIPLOOP_COMMIT and SHIPLOOP_SITE env vars |
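The polling behavior of the hosted providers might look roughly like the sketch below. The `verify_deploy` function, the injected `fetch` callable, and the attempt limit are assumptions chosen so the example runs without network access; a real poller would also wait between attempts and respect the configured timeout.

```python
from typing import Callable

# A fetch function returns (status_code, headers) for a URL.
Fetch = Callable[[str], tuple[int, dict[str, str]]]


def verify_deploy(site: str, routes: list, header: str,
                  fetch: Fetch, attempts: int = 3) -> bool:
    """Return True if every route answers 200 and carries the expected header."""
    for route in routes:
        url = site.rstrip("/") + route
        for _ in range(attempts):
            status, headers = fetch(url)
            present = {name.lower() for name in headers}
            if status == 200 and header.lower() in present:
                break  # this route is healthy, move on to the next one
        else:
            return False  # the route never became healthy within the limit
    return True


# Canned responses standing in for a real deployment (made-up data).
responses = {
    "https://production-url.com/": [(200, {"x-vercel-deployment-url": "demo"})],
    "https://production-url.com/api/health": [
        (503, {}),  # still warming up on the first poll
        (200, {"X-Vercel-Deployment-Url": "demo"}),
    ],
}


def fake_fetch(url: str) -> tuple[int, dict[str, str]]:
    queue = responses[url]
    return queue.pop(0) if len(queue) > 1 else queue[0]


ok = verify_deploy(
    "https://production-url.com", ["/", "/api/health"],
    "x-vercel-deployment-url", fake_fetch,
)
```

Header names are compared case-insensitively because HTTP header names are case-insensitive.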
Budget Tracking
Token usage and estimated costs are tracked per agent invocation in SQLite (falling back to metrics.json).
CODEBLOCK9
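A budget check along these lines could back the budget_exceeded verdict. The limit names and values follow the budget section of the sample config in this document; the `BudgetLimits` class and `find_breaches` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetLimits:
    max_usd_per_segment: float = 10.0
    max_usd_per_run: float = 50.0
    max_tokens_per_segment: int = 500_000


def find_breaches(limits: BudgetLimits, segment_usd: float,
                  run_usd: float, segment_tokens: int) -> list:
    """Return the names of every limit that current spending exceeds."""
    checks = [
        ("max_usd_per_segment", segment_usd, limits.max_usd_per_segment),
        ("max_usd_per_run", run_usd, limits.max_usd_per_run),
        ("max_tokens_per_segment", segment_tokens, limits.max_tokens_per_segment),
    ]
    return [name for name, spent, limit in checks if spent > limit]


limits = BudgetLimits()
within_budget = find_breaches(limits, segment_usd=2.5, run_usd=12.0, segment_tokens=90_000)
over_budget = find_breaches(limits, segment_usd=12.0, run_usd=30.0, segment_tokens=600_000)
```

Returning the breached limit names, rather than a bare boolean, leaves room for a useful message when the pipeline halts.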
Critical Rules
1. Never break the chain: after a segment ships, immediately start the next
2. Preflight is mandatory: no exceptions, no "ship now fix later"
3. Explicit staging only: never git add -A, only changed files from INLINECODE74
4. Prompts via file: never shell arguments (prevents injection)
5. SQLite is source of truth: SHIPLOOP.yml config-only; runtime state in .shiploop/tars.db
6. Agent command from config: always read from agent_command, never hardcode
7. Budget-aware: track costs, enforce limits, fail gracefully
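Two of these rules, explicit staging and prompts via file, lend themselves to a short sketch. The helper names and the `prompt.txt` file name are hypothetical, and how Ship Loop actually hands the prompt file to the agent is not specified here; the sketch only shows that the prompt text never enters the argument list.

```python
import shlex
import tempfile
from pathlib import Path


def write_prompt_file(prompt: str, directory: str) -> Path:
    """Store the prompt on disk so its text never appears in a command line."""
    path = Path(directory) / "prompt.txt"
    path.write_text(prompt, encoding="utf-8")
    return path


def build_agent_argv(agent_command: str) -> list:
    """Tokenize the configured command; the result is run without a shell."""
    return shlex.split(agent_command)


def build_stage_argv(changed_files: list) -> list:
    """Stage only the listed files, never the whole working tree."""
    if not changed_files:
        raise ValueError("nothing to stage")
    # "--" stops git from interpreting a file name as an option.
    return ["git", "add", "--", *changed_files]


prompt = 'Add a dark mode toggle; $(whoami) "quoted" text stays inert'
with tempfile.TemporaryDirectory() as tmp:
    prompt_path = write_prompt_file(prompt, tmp)
    round_trip = prompt_path.read_text(encoding="utf-8")

agent_argv = build_agent_argv("claude --print --permission-mode bypassPermissions")
stage_argv = build_stage_argv(["src/nav.tsx", "src/theme.css"])
```

In this sketch the argument lists would be passed to `subprocess.run` without `shell=True`, with the prompt file supplied on standard input, so shell metacharacters in the prompt are never interpreted.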
Project Structure
CODEBLOCK10
Changelog
v5.0.0 (2026-03-27) — TARS Convergence
- SQLite state backend: tars.db replaces metrics.json + learnings.yml for runtime state
- Event queue: all phase transitions emit events; unprocessed events enable crash recovery
- Verdict router: configurable Verdict → Action table replaces if/else chains in orchestrator
- Meta-reflection loop: shiploop reflect analyzes run history, finds patterns, auto-generates learnings
- Playbook evolution: MISSING_DECISION_BRANCH detection → decision_gaps table
- Cross-segment convergence: touched_paths tracked per segment for overlap warnings
- Learnings scoring: score field (+0.1 on success, -0.2 on failure), sorted by score
- New CLI commands: reflect, events, history
- New config sections: reflection, router
v4.0.0
- Python CLI replaces bash scripts
- Pydantic v2 config validation
- Budget tracking with per-segment and per-run limits
- Error convergence detection (hash-based)
- Deploy provider plugins (Vercel, Netlify, Custom)
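Ship Loop v5.0 - TARS Convergence
Orchestrates multi-segment feature work as a self-healing pipeline. Three nested loops maximize autonomy: Loop 1 runs the standard code → preflight → ship → verify chain, Loop 2 automatically repairs failures via the coding agent, and Loop 3 spawns experiment branches when repairs stall. The SQLite state backend provides crash recovery and cross-run analytics. The verdict router replaces hardcoded branching with a configurable decision table. The reflection loop audits historical effectiveness and automatically generates learnings.
Architecture: Three Loops + Event Queue + Verdict Router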
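```
┌─────────────────────────────────────────────────────────────┐
│                       SHIP LOOP v5.0                        │
│                                                             │
│  LOOP 1: Ship Loop                                          │
│    code → preflight → ship → verify → emit(segment_shipped) │
│         │                                                   │
│      on fail (verdict → action via VerdictRouter)           │
│         ▼                                                   │
│  LOOP 2: Repair Loop                                        │
│    capture context → agent fix → re-preflight (max N)       │
│    ↳ emits: repair_done | repair_failed                     │
│    ↳ convergence detected → CONVERGED verdict → META        │
│    ↳ unknown error → record_decision_gap()                  │
│         │                                                   │
│      exhausted                                              │
│         ▼                                                   │
│  LOOP 3: Meta Loop                                          │
│    meta-analysis → N experiment branches → winner → merge   │
│    ↳ emits: meta_done                                       │
│                                                             │
│  🗄 SQLite (tars.db): runs, segments, events, learnings     │
│  📋 Event queue: crash recovery via unprocessed events      │
│  🔀 Verdict router: configurable verdict → action table     │
│  📚 Learnings engine: scored learnings (score tracks usage) │
│  🪞 Reflection loop: post-run analysis + suggestions        │
│  💰 Budget tracker: per-run token/cost tracking             │
└─────────────────────────────────────────────────────────────┘
```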
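Security Notice
SHIPLOOP.yml is equivalent to running a script. The agent_command, all preflight commands (build, lint, test), and custom deploy scripts execute with your full user privileges. Ship Loop does not sandbox these commands. Never use it on untrusted repositories without reviewing the config. Treat SHIPLOOP.yml with the same caution as a Makefile or CI pipeline.
When to Use
- Building multiple features for a project in sequence
- Any work that follows the flow: code → preflight → commit → deploy → verify → next
- When you need checkpointing so progress survives session restarts
- When you want self-healing: failures are auto-repaired before asking humans
- When you want cost visibility and learning from past runs
Prerequisites
- Python 3.10+ with pyyaml and pydantic installed
- A git repository with a remote
- A deployment pipeline triggered by push (Vercel, Netlify, etc.)
- A coding agent CLI configured via agent_command in SHIPLOOP.yml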
Installation
```bash
pip install pyyaml pydantic
```
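CLI Usage
```bash
# Core pipeline
shiploop run                    # Start or resume the pipeline
shiploop run --dry-run          # Preview what would be executed
shiploop status                 # Show segment status (read from the database)
shiploop reset                  # Reset a segment to pending

# Learnings
shiploop learnings list
shiploop learnings search dark mode theme toggle

# Budget
shiploop budget                 # Show cost summary

# New in v5.0
shiploop reflect                # Run meta-reflection on recent run history
shiploop reflect --depth 20     # Analyze the last 20 runs
shiploop events                 # View event history for the latest run
shiploop events                 # View event history for a specific run
shiploop history                # View past runs from the database

# Options
shiploop -c /path/to/SHIPLOOP.yml run
shiploop -v run                 # Verbose logging
shiploop --version              # Show version (5.0.0)
```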
Pipeline Definition (SHIPLOOP.yml)
```yaml
project: Project Name
repo: /absolute/path/to/project
site: https://production-url.com
branch: pr                    # direct-to-main | per-segment | pr
mode: solo

agent_command: claude --print --permission-mode bypassPermissions

preflight:
  build: npm run build
  lint: npm run lint
  test: npm run test

deploy:
  provider: vercel            # vercel | netlify | custom
  routes: [/, /api/health]
  marker: data-version
  health_endpoint: /api/health
  deploy_header: x-vercel-deployment-url
  timeout: 300

repair:
  max_attempts: 3

meta:
  enabled: true
  experiments: 3

budget:
  max_usd_per_segment: 10.0
  max_usd_per_run: 50.0
  max_tokens_per_segment: 500000
  halt_on_breach: true

# New in v5.0: reflection config
reflection:
  enabled: true               # Run the reflection loop after pipeline completion
  auto_run: true              # Run automatically, not only via the CLI command
  history_depth: 10           # How many past runs to analyze

# New in v5.0: custom verdict routing
router:
  agent_fail: retry           # Override the default (fail) with retry
  deploy_fail: fail           # Override the default (retry) with fail

segments:
  - name: feature-name
    status: pending
    prompt: |
      Your coding agent prompt goes here.
    depends_on: []
```
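SQLite State Backend (v5.0)
State is now stored in .shiploop/tars.db (SQLite, WAL mode). SHIPLOOP.yml is config-only.
Tables
| Table | Purpose |
|---|---|
| runs | Pipeline execution records (id, project, started_at, status, cost) |
| segments | Segment execution records per run (status, commit, touched_paths) |
| run_events | Event queue for crash recovery and audit trail |
| learnings | Failure/success lessons with effectiveness scores |
| usage | Token and cost records per agent invocation |
| decision_gaps | Situations the system didn't know how to handle |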
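Event Types
| Event | When emitted |
|---|---|
| agent_started | Agent invocation begins |
| preflight_passed | All preflight steps pass |
| preflight_failed | Any preflight step fails |
| repair_done | Repair loop succeeded |
| repair_failed | Repair loop failed or exhausted |
| meta_done | Meta loop winner merged |
| segment_shipped | Segment fully complete |
| segment_failed | Segment permanently failed |
| deploy_failed | Deploy or verification failed |
| file_overlap_warning | Segment may touch files changed by a prior segment |
Crash recovery: on startup, unprocessed events are replayed to restore pipeline state.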
Verdict Router (v5.0)
The orchestrator no longer uses if/else chains. Every outcome maps to a Verdict, and the VerdictRouter maps verdicts to Action values.
Default Routing Table
| Verdict | Default Action |
|---|---|
| success | ship |
| preflight_fail | repair |
| agent_fail | fail |
| deploy_fail | retry |
| repair_success | ship |
| repair_exhausted | meta |
| meta_success | ship |
| meta_exhausted | fail |
| budget_exceeded | fail |
| converged | meta ← skip remaining repairs, jump to the meta loop |
| no_changes | fail |
| unknown | pause_and_alert |
Via SHIPLO