Pattern Mine

"A pattern isn't repeated code. It's a repeated decision — and every repeated decision is a decision that should have been made once."

What It Does

Your codebase has patterns. Some are intentional (design patterns, conventions, shared utilities). Most are accidental — the same logic independently invented by different developers at different times, slightly different each time, all slowly diverging.

Pattern Mine excavates these buried patterns and brings them to the surface:

1. Convergent patterns: Different code doing the same thing (should be unified)
2. Divergent patterns: Same code doing different things (should be separated)
3. Emerging patterns: A pattern forming but not yet crystallized (candidate for abstraction)
4. Fossilized patterns: Old patterns still followed long after the reason died

The Four Mining Operations

Operation 1: Convergent Pattern Detection

"Three developers independently wrote the same thing"

Not just copy-paste detection (your linter does that). Pattern Mine finds semantically equivalent code with different syntax — code that does the same thing but looks different.

CODEBLOCK0
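Comparing what code does, rather than how it is spelled, is what separates this from copy-paste detection. As a minimal illustration (not Pattern Mine's actual algorithm), the sketch below models the waits each of the three retry variants incurs when every call fails, so their behavior can be compared side by side. The function names are hypothetical.

```typescript
// Hypothetical models of the three retry variants: each returns the waits (ms)
// incurred when every call fails, so behavior can be compared directly.
type Schedule = number[];

// Variant 1 (exponential): waits 1000 * 2^i after each of `attempts` failures.
function exponentialSchedule(attempts = 3): Schedule {
  return Array.from({ length: attempts }, (_, i) => 1000 * Math.pow(2, i));
}

// Variant 2 (linear): waits attempt * 2000 after each failure, attempt = 1..max.
function linearSchedule(max = 3): Schedule {
  return Array.from({ length: max }, (_, i) => (i + 1) * 2000);
}

// Variant 3 (fixed): the first call plus `retries` retries, 1000 ms before each retry.
function fixedSchedule(retries = 5): Schedule {
  return Array.from({ length: retries }, () => 1000);
}

// Summarize a schedule as (number of waits, total time spent waiting).
function summarize(schedule: Schedule): { waits: number; totalMs: number } {
  return { waits: schedule.length, totalMs: schedule.reduce((sum, ms) => sum + ms, 0) };
}

console.log(summarize(exponentialSchedule())); // { waits: 3, totalMs: 7000 }
console.log(summarize(linearSchedule()));      // { waits: 3, totalMs: 12000 }
console.log(summarize(fixedSchedule()));       // { waits: 5, totalMs: 5000 }
```

All three share one shape (wait, then try again) and differ only in parameters, which is exactly what a syntax-level diff would miss. Note that the third variant makes six calls in total, because `retries = 5` counts retries after the first call.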

Operation 2: Divergent Pattern Detection

"Same abstraction, different behavior — the abstraction is lying"

Finds code that looks like it follows a pattern but actually deviates in meaningful ways:

CODEBLOCK1
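One simple way to decide which instance is the deviant is to treat the majority behavior as the pattern. The sketch below assumes each validator has already been summarized into a hypothetical `ValidatorShape` record (result property names, empty-input behavior, success-case errors value), groups identical signatures, and reports every validator outside the most common group. `EmailValidator` and `OrderValidator` are made-up names for illustration.

```typescript
// Hypothetical summary of one validator's observed behavior.
interface ValidatorShape {
  name: string;
  resultKeys: string[];          // property names on the object validate() returns
  throwsOnEmpty: boolean;        // does validate() throw on empty input?
  emptyErrorsOnSuccess: boolean; // is the errors list [] (not undefined) on success?
}

// Canonical string for a shape; key order is ignored so { valid, errors } == { errors, valid }.
function signature(v: ValidatorShape): string {
  return JSON.stringify([[...v.resultKeys].sort(), v.throwsOnEmpty, v.emptyErrorsOnSuccess]);
}

// The most common signature is taken as "the pattern"; everything else deviates.
function findDeviants(validators: ValidatorShape[]): string[] {
  const counts = new Map<string, number>();
  for (const v of validators) {
    const sig = signature(v);
    counts.set(sig, (counts.get(sig) ?? 0) + 1);
  }
  let pattern = "";
  let best = 0;
  counts.forEach((n, sig) => {
    if (n > best) {
      pattern = sig;
      best = n;
    }
  });
  return validators.filter(v => signature(v) !== pattern).map(v => v.name);
}

const shapes: ValidatorShape[] = [
  { name: "EmailValidator", resultKeys: ["valid", "errors"], throwsOnEmpty: true, emptyErrorsOnSuccess: true },
  { name: "OrderValidator", resultKeys: ["errors", "valid"], throwsOnEmpty: true, emptyErrorsOnSuccess: true },
  { name: "UserValidator", resultKeys: ["isValid", "messages"], throwsOnEmpty: false, emptyErrorsOnSuccess: false },
];
console.log(findDeviants(shapes)); // [ 'UserValidator' ]
```

A majority vote is only a heuristic: the deviant may be the correct one and the pattern wrong, so treat the output as a review candidate rather than a verdict.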

Operation 3: Emerging Pattern Detection

"This is about to become a pattern — should it be one?"

Finds code that is repeated 2-3 times but hasn't yet become an abstraction. This is the sweet spot for extraction — enough repetition to justify it, but not yet so much that extraction requires touching dozens of files.

CODEBLOCK2
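A cheap first pass at spotting an emerging pattern is to normalize away the parts expected to vary, such as string and number literals, and count how often each remaining shape occurs. The sketch below is a text-level heuristic under that assumption (a real implementation would more likely work on a syntax tree); `normalize` and `emergingShapes` are hypothetical names, and the 2-3 threshold follows the range stated above.

```typescript
// Replace string/number literals with placeholders and collapse whitespace, so
// snippets that differ only in constants (role names, messages) compare equal.
function normalize(snippet: string): string {
  return snippet
    .replace(/(["'`])(?:\\.|(?!\1)[^\\])*\1/g, "STR")
    .replace(/\b\d+(?:\.\d+)?\b/g, "NUM")
    .replace(/\s+/g, " ")
    .trim();
}

// Shapes that occur 2-3 times: repeated enough to notice, not yet widespread.
function emergingShapes(snippets: string[]): { shape: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const s of snippets) {
    const shape = normalize(s);
    counts.set(shape, (counts.get(shape) ?? 0) + 1);
  }
  const result: { shape: string; count: number }[] = [];
  counts.forEach((count, shape) => {
    if (count >= 2 && count <= 3) result.push({ shape, count });
  });
  return result;
}

const admin = `if (!user.hasRole("admin")) {
  auditLog.write({ action: "ADMIN_ACCESS_DENIED", userId: user.id });
  throw new ForbiddenError("Admin access required");
}`;
const billing = `if (!user.hasRole("billing")) {
  auditLog.write({ action: "BILLING_ACCESS_DENIED", userId: user.id });
  throw new ForbiddenError("Billing access required");
}`;
console.log(emergingShapes([admin, billing])); // one shape, seen twice
```

Normalizing literals also hides differences that can matter, so the output answers "is this forming?" and leaves "should it be one?" to a human.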

Operation 4: Fossilized Pattern Detection

"Everyone follows this pattern. Nobody remembers why."

Finds patterns that are consistently followed but serve no current purpose:

CODEBLOCK3
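For the simplest fossils, a textual scan is enough to size the problem. The sketch below assumes the fossil is a defensive null check after a call that can no longer return null (such as a `getUser` that now throws instead) and looks for a `const x = await fn(...)` line immediately followed by `if (!x)`. `findRedundantNullChecks` is a hypothetical helper, and `fnName` is assumed to be a plain identifier (it is not regex-escaped). It only finds candidates; whether a branch is truly unreachable still depends on the callee's contract.

```typescript
// Return 1-based line numbers where `const x = await fnName(...)` is immediately
// followed by `if (!x)`. Text-level heuristic only: no type or data-flow analysis.
function findRedundantNullChecks(source: string, fnName: string): number[] {
  const pattern = new RegExp(
    String.raw`const\s+(\w+)\s*=\s*await\s+${fnName}\([^)]*\);?[^\n]*\n\s*if\s*\(\s*!\s*\1\s*\)`,
    "g",
  );
  const lines: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    // Count the lines before the match to get the line of the `const` declaration.
    lines.push(source.slice(0, match.index).split("\n").length);
  }
  return lines;
}

const sample = [
  "const user = await getUser(id); // never null any more",
  "if (!user) {",
  "  throw new NotFoundError();",
  "}",
  "const order = await getOrder(orderId);",
  "if (!order) {",
  "  throw new NotFoundError();",
  "}",
  "const owner = await getUser(order.ownerId);",
  "if (!owner) throw new NotFoundError();",
].join("\n");

console.log(findRedundantNullChecks(sample, "getUser")); // [ 1, 9 ]
```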

The Mining Process

CODEBLOCK4

Output Format

CODEBLOCK5

When to Invoke

- Before any refactoring effort (know what patterns exist before restructuring)
- When onboarding (understand the codebase's actual patterns, not just the documented ones)
- During sprint planning for cleanup work (prioritized extraction targets)
- When a code review reveals "we have this pattern everywhere"
- After a new developer joins and writes code that almost matches existing patterns
- Quarterly, as a health check (are patterns converging or diverging?)

Why It Matters

Unmined patterns are a hidden tax on every developer who reads, writes, or modifies the code. Every time someone writes retry logic from scratch because they didn't know a retry utility exists (or because the existing three retry utilities are all slightly different), the codebase gets a little bigger, a little more inconsistent, and a little harder to understand.

Pattern Mine doesn't tell you to DRY everything. It tells you where DRY matters and where it doesn't — so you abstract the right things at the right time.

Zero external dependencies. Zero API calls. Pure structural and semantic analysis.

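Pattern Mine

"A pattern isn't repeated code. It's a repeated decision, and every repeated decision is a decision that should have been made once."

What It Does

Your codebase has patterns. Some are intentional (design patterns, conventions, shared utilities). Most are accidental: the same logic written independently by different developers at different times, slightly different each time, gradually diverging.

Pattern Mine excavates these buried patterns and brings them to the surface: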

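1. Convergent patterns: Different code doing the same thing (should be unified)
2. Divergent patterns: Same code doing different things (should be separated)
3. Emerging patterns: A pattern forming but not yet crystallized (candidate for abstraction)
4. Fossilized patterns: Old patterns still followed long after the reason died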

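The Four Mining Operations

Operation 1: Convergent Pattern Detection

"Three developers independently wrote the same thing"

Not just copy-paste detection (your linter does that). Pattern Mine finds semantically equivalent code with different syntax: code that does the same thing but looks different.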

Example: found 3 independent implementations of retry with backoff

Location 1: src/api/client.ts:45
async function fetchWithRetry(url, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try { return await fetch(url); }
    catch (e) { await sleep(1000 * Math.pow(2, i)); }
  }
  throw new Error("Failed after retries");
}

Location 2: src/services/payment.ts:112
const retry = async (fn, max = 3) => {
  let lastError;
  for (let attempt = 1; attempt <= max; attempt++) {
    try { return await fn(); }
    catch (err) { lastError = err; await delay(attempt * 2000); }
  }
  throw lastError;
};

Location 3: src/workers/email.ts:67
function withRetry(operation, retries = 5) {
  return operation().catch(err => {
    if (retries <= 0) throw err;
    return new Promise(r => setTimeout(r, 1000))
      .then(() => withRetry(operation, retries - 1));
  });
}

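Analysis:
├── All three implement retry with backoff
├── Differences: attempt limits (3 attempts, 3 attempts, 5 retries after the first call), backoff strategy (exponential, linear, fixed)
├── Differences: error handling (generic throw, preserve last error, rethrow)
├── None is configurable enough to replace the others
└── Recommendation: extract a shared retry utility with configurable
    attempt count, backoff strategy, and error handling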
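A minimal sketch of the recommended shared utility, assuming promise-returning operations. The names `retry`, `RetryOptions`, `shouldRetry` and `sleep` are illustrative, not an existing API. It keeps the last error (the second variant's behavior), makes attempts, base delay and backoff strategy configurable, exposes error handling through a `shouldRetry` predicate, and skips the pointless wait after the final failure that the first two variants incur.

```typescript
type Backoff = "exponential" | "linear" | "fixed";

interface RetryOptions {
  attempts?: number;                       // total calls, including the first (default 3)
  baseDelayMs?: number;                    // delay unit in milliseconds (default 1000)
  backoff?: Backoff;                       // how the delay grows (default "exponential")
  shouldRetry?: (err: unknown) => boolean; // return false to fail fast on this error
  sleep?: (ms: number) => Promise<void>;   // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Delay after the n-th failed attempt (n starts at 1).
function delayFor(backoff: Backoff, baseMs: number, n: number): number {
  if (backoff === "exponential") return baseMs * Math.pow(2, n - 1);
  if (backoff === "linear") return baseMs * n;
  return baseMs; // "fixed"
}

async function retry<T>(fn: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
  const {
    attempts = 3,
    baseDelayMs = 1000,
    backoff = "exponential",
    shouldRetry = () => true,
    sleep = defaultSleep,
  } = options;
  if (attempts < 1) throw new RangeError("attempts must be at least 1");

  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (!shouldRetry(err)) throw err; // non-retryable: rethrow immediately
      // No wait after the final failure: nothing is left to retry.
      if (attempt < attempts) await sleep(delayFor(backoff, baseDelayMs, attempt));
    }
  }
  throw lastError; // preserve the last underlying error
}
```

With this shape, `{ attempts: 6, backoff: "fixed" }` reproduces the third variant's schedule exactly, and the first two variants map to `"exponential"` and `"linear"` (with `baseDelayMs: 2000` for the second), minus their trailing wait.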

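Operation 2: Divergent Pattern Detection

"Same abstraction, different behavior: the abstraction is lying"

Finds code that looks like it follows a pattern but actually deviates in meaningful ways: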

Example: found UserValidator deviating from the pattern

Pattern: every validator in src/validators/ follows:
├── validate(input) → { valid: boolean, errors: string[] }
├── Throws on empty input
└── Returns an empty errors array on success

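Deviation: UserValidator
├── validate() returns { isValid: boolean, messages: string[] }
│   └── Different property names: valid→isValid, errors→messages
├── Returns null on empty input (does not throw)
├── Returns undefined errors on success (not an empty array)
└── Every consumer of UserValidator has special-case handling

Recommendation: align UserValidator with the common pattern.
Estimated consumer cleanup: 8 files.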
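A sketch of what the aligned `UserValidator` could look like. The shared contract (`{ valid, errors }`, throw on empty input, empty array on success) comes from the example; the `UserInput` fields and the two checks are hypothetical placeholders, since the real validation rules are not shown.

```typescript
// Shared result contract followed by the validators in src/validators/.
interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// Hypothetical input shape; the real UserValidator's rules are not shown in the source.
interface UserInput {
  email?: string;
  name?: string;
}

class UserValidator {
  validate(input: UserInput | null | undefined): ValidationResult {
    // Aligned: throw on empty input instead of returning null.
    if (input == null) {
      throw new TypeError("UserValidator.validate: input is required");
    }
    const errors: string[] = [];
    if (!input.email || !input.email.includes("@")) errors.push("email is invalid");
    if (!input.name || input.name.trim() === "") errors.push("name is required");
    // Aligned: `valid`/`errors` names, and an empty array (never undefined) on success.
    return { valid: errors.length === 0, errors };
  }
}

const validator = new UserValidator();
console.log(validator.validate({ email: "ada@example.com", name: "Ada" })); // { valid: true, errors: [] }
```

Once every consumer reads `valid` and `errors`, the special-case handling in the 8 consumer files can be deleted rather than adapted.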

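Operation 3: Emerging Pattern Detection

"This is about to become a pattern. Should it be one?"

Finds code that is repeated 2-3 times but hasn't yet become an abstraction. This is the sweet spot for extraction: enough repetition to justify it, but not yet so much that extraction requires touching dozens of files.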

Example (emerging): permission check + audit log (2 occurrences, likely to grow)

src/routes/admin.ts:
if (!user.hasRole("admin")) {
  auditLog.write({ action: "ADMIN_ACCESS_DENIED", userId: user.id });
  throw new ForbiddenError("Admin access required");
}

src/routes/billing.ts:
if (!user.hasRole("billing")) {
  auditLog.write({ action: "BILLING_ACCESS_DENIED", userId: user.id });
  throw new ForbiddenError("Billing access required");
}

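Analysis:
├── Pattern: role check → audit the denial → throw a forbidden error
├── Occurrences: 2 (a third route is being written this sprint)
├── Variation: only the role name and audit action differ
└── Recommendation: extract requireRole(user, role) middleware before a third copy appears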
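A minimal sketch of the suggested extraction. `User`, `AuditLog` and `ForbiddenError` are stand-ins declared here so the example runs; in the real code they come from the application. Passing `auditLog` as a third parameter is an assumption made for testability (the original snippets use a shared `auditLog`), and deriving the action name from the role (for example `BILLING_ACCESS_DENIED`) assumes one naming scheme fits every role. Wrapping it as framework middleware depends on the web framework in use and is not shown.

```typescript
// Stand-in types so the sketch is self-contained (hypothetical, not the app's real ones).
interface User {
  id: string;
  hasRole(role: string): boolean;
}

interface AuditLog {
  write(entry: { action: string; userId: string }): void;
}

class ForbiddenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ForbiddenError";
  }
}

// The decision, made once: check the role, audit the denial, then throw.
function requireRole(user: User, role: string, auditLog: AuditLog): void {
  if (user.hasRole(role)) return;
  auditLog.write({ action: `${role.toUpperCase()}_ACCESS_DENIED`, userId: user.id });
  throw new ForbiddenError(`${role} access required`);
}

// Usage: each route's three-line block becomes one call.
const entries: { action: string; userId: string }[] = [];
const log: AuditLog = { write: entry => { entries.push(entry); } };
const alice: User = { id: "u1", hasRole: role => role === "admin" };

requireRole(alice, "admin", log); // passes, nothing logged
try {
  requireRole(alice, "billing", log);
} catch (err) {
  console.log((err as Error).name, entries); // the ForbiddenError name and the one audit entry
}
```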

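Operation 4: Fossilized Pattern Detection

"Everyone follows this pattern. Nobody remembers why."

Finds patterns that are consistently followed but serve no current purpose: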

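Example (fossil): defensive null checks after calls that never return null

Pattern found in 23 locations:
const user = await getUser(id); // getUser now always returns a User or throws
if (!user) {                    // this branch is unreachable
  throw new NotFoundError();    // getUser itself already throws NotFoundError
}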

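History:
├── getUser() used to return null for nonexistent users (before 2024)
├── Changed to throw NotFoundError directly (commit a8f3d2e, 2024-03)
├── The null checks were not removed in the rewrite
└── New code copied the pattern from old code (cargo cult)

Recommendation: remove the 23 unreachable null checks.
Safe to remove: yes (getUser's contract guarantees a non-null return).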
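A sketch of the cleaned-up shape. The in-memory `users` map and the `getUserName` caller are hypothetical stand-ins; the point is that declaring `getUser` as `Promise<User>` rather than `Promise<User | null>` records the contract in the type, so readers (and type-aware lint rules such as typescript-eslint's `no-unnecessary-condition`) can see that `if (!user)` can never fire.

```typescript
class NotFoundError extends Error {
  constructor(message = "Not found") {
    super(message);
    this.name = "NotFoundError";
  }
}

interface User {
  id: string;
  name: string;
}

// Hypothetical data source standing in for the real user store.
const users = new Map<string, User>([["u1", { id: "u1", name: "Ada" }]]);

// Current contract: resolves to a User or rejects with NotFoundError, never null.
async function getUser(id: string): Promise<User> {
  const user = users.get(id);
  if (user === undefined) throw new NotFoundError(`User ${id} not found`);
  return user;
}

// Caller after cleanup: no null check; a missing user already surfaces as NotFoundError.
async function getUserName(id: string): Promise<string> {
  const user = await getUser(id);
  return user.name;
}

getUserName("u1").then(console.log); // Ada
```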

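The Mining Process

Stage 1: Extraction
├── Parse all source files into a structural representation
├── Identify functional blocks (functions, methods, handlers, middleware)
├── For each block, extract:
│   ├── Input/output signature
│   ├── Core operations performed
│   ├── Error-handling strategy
│   ├── Side effects
│   └── Dependencies
└── Build a similarity matrix across all blocks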
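The extraction stage ends with a similarity matrix, but the source does not say which measure is used. A minimal sketch, assuming each block has been reduced to a set of feature labels (operations, error handling, side effects), uses Jaccard similarity: the size of the intersection divided by the size of the union. The feature labels below are hypothetical.

```typescript
// Hypothetical per-block summary produced by the extraction stage.
interface BlockFeatures {
  id: string;
  features: Set<string>;
}

// Jaccard similarity: |intersection| / |union|; 1 for identical sets, 0 for disjoint ones.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach(f => {
    if (b.has(f)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Symmetric pairwise matrix; entry [i][j] compares block i with block j.
function similarityMatrix(blocks: BlockFeatures[]): number[][] {
  return blocks.map(x => blocks.map(y => jaccard(x.features, y.features)));
}

const blocks: BlockFeatures[] = [
  { id: "fetchWithRetry", features: new Set(["loop", "await-call", "sleep", "throw-generic"]) },
  { id: "retry", features: new Set(["loop", "await-call", "sleep", "rethrow-last"]) },
  { id: "withRetry", features: new Set(["recursion", "await-call", "sleep", "rethrow-last"]) },
];
console.log(similarityMatrix(blocks).map(row => row.map(v => v.toFixed(2)).join(" ")).join("\n"));
// 1.00 0.60 0.33
// 0.60 1.00 0.60
// 0.33 0.60 1.00
```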

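Stage 2: Clustering
├── Group blocks by semantic similarity (not just syntax)
├── For each cluster:
│   ├── How many instances? (2-3 = emerging, 4+ = established)
│   ├── How consistent? (identical = convergent, varied = divergent)
│   ├── How old? (all recent = emerging, all stale = fossil)
│   └── Trend? (growing = emerging, stable = established, shrinking = fossil)
└── Filter noise: single-line patterns, framework boilerplate, trivial duplication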
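The clustering questions above can conflict (a cluster can be both old and numerous), so any implementation has to pick an order. The sketch below is one possible reading, with hypothetical field names: noise is filtered first (only the single-line rule is modeled), age or shrinkage marks a fossil, small, recent or growing clusters are emerging, and everything else is established; consistency separately decides convergent versus divergent.

```typescript
type Maturity = "emerging" | "established" | "fossil";
type Consistency = "convergent" | "divergent";

// Hypothetical statistics gathered for one cluster of similar blocks.
interface ClusterStats {
  instances: number;
  linesPerInstance: number;
  identical: boolean; // all instances behave the same
  allRecent: boolean; // every instance written recently
  allOld: boolean;    // every instance untouched for a long time
  trend: "growing" | "stable" | "shrinking";
}

interface Classification {
  maturity: Maturity;
  consistency: Consistency;
}

function classify(c: ClusterStats): Classification | null {
  // Noise filter: a single instance or single-line snippets are not patterns.
  if (c.instances < 2 || c.linesPerInstance <= 1) return null;
  const consistency: Consistency = c.identical ? "convergent" : "divergent";
  let maturity: Maturity;
  if (c.allOld || c.trend === "shrinking") {
    maturity = "fossil";
  } else if (c.instances <= 3 || c.allRecent || c.trend === "growing") {
    maturity = "emerging";
  } else {
    maturity = "established";
  }
  return { maturity, consistency };
}

console.log(classify({ instances: 23, linesPerInstance: 3, identical: true, allRecent: false, allOld: true, trend: "stable" }));
// { maturity: 'fossil', consistency: 'convergent' }
```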

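Stage 3: Analysis
├── For convergent patterns:
│   ├── What is the canonical form? (the most common variant)
│   ├── Which variations are meaningful? (configurable vs copy-paste mistakes)
│   ├── Extraction difficulty (how coupled each instance is)
│   └── Extraction benefit (code eliminated × change frequency)
├── For divergent patterns:
│   ├── Which instance is wrong? (Or is the pattern itself wrong?)
│   ├── Impact of the divergence (confuses developers? causes bugs?)
│   └── Alignment difficulty
├── For emerging patterns:
│   ├── Is abstraction justified? (rule of three)
│   ├── What should the interface look like?
│   └── Will this pattern keep growing?
└── For fossilized patterns:
    ├── When did the original reason stop applying?
    ├── Is removal safe?
    └── How many instances need cleanup?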
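Two of the convergent-pattern questions are directly computable. A minimal sketch, assuming each variant carries a normalized form, a line count, and a change count from version control (all hypothetical fields): the canonical form is the most common normalized variant, and extraction benefit multiplies code eliminated by change frequency. Treating the shared implementation as roughly the size of the largest copy is an assumption made here, not something the source specifies. The line counts below are those of the three retry variants in the example; the form labels and change counts are made up.

```typescript
// Hypothetical per-instance data for one convergent cluster.
interface Variant {
  file: string;
  normalizedForm: string;    // shape after normalizing names and literals
  lines: number;
  changesPerQuarter: number; // how often the instance is edited
}

// Canonical form: the most common normalized variant (first seen wins ties).
function canonicalForm(variants: Variant[]): string | undefined {
  const counts = new Map<string, number>();
  for (const v of variants) counts.set(v.normalizedForm, (counts.get(v.normalizedForm) ?? 0) + 1);
  let best: string | undefined;
  let bestCount = 0;
  counts.forEach((n, form) => {
    if (n > bestCount) {
      best = form;
      bestCount = n;
    }
  });
  return best;
}

// Extraction benefit = lines eliminated x change frequency.
// Assumes the shared implementation is about as large as the largest copy.
function extractionBenefit(variants: Variant[]): number {
  const total = variants.reduce((sum, v) => sum + v.lines, 0);
  const shared = variants.reduce((max, v) => Math.max(max, v.lines), 0);
  const changes = variants.reduce((sum, v) => sum + v.changesPerQuarter, 0);
  return (total - shared) * changes;
}

const retryVariants: Variant[] = [
  { file: "src/api/client.ts", normalizedForm: "loop-retry", lines: 7, changesPerQuarter: 2 },
  { file: "src/services/payment.ts", normalizedForm: "loop-retry", lines: 8, changesPerQuarter: 3 },
  { file: "src/workers/email.ts", normalizedForm: "recursive-retry", lines: 7, changesPerQuarter: 1 },
];
console.log(canonicalForm(retryVariants));     // loop-retry
console.log(extractionBenefit(retryVariants)); // 84
```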

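Stage 4: Mining Report
├── Patterns found, grouped by type
├── Extraction/cleanup recommendations, prioritized by:
│   ├── Bug risk (divergent patterns first)
│   ├── Developer velocity (most-repeated convergent patterns)
│   ├── Code health (fossilized patterns, for cleanup)
│   └── Timeliness (emerging patterns, before they spread)
└── Estimated effort for each recommendation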
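The report's ordering can be expressed as a comparator. A minimal sketch, assuming each recommendation carries its pattern type, a payoff score, and an effort estimate (hypothetical fields): it ranks by the category order listed above (divergent, convergent, fossil, emerging), then by larger payoff, then by smaller effort. The two tie-breakers are assumptions; the source only gives the category order. In the usage, payoffs reuse instance counts from the examples and every effort is a placeholder.

```typescript
type PatternType = "divergent" | "convergent" | "fossil" | "emerging";

interface Recommendation {
  title: string;
  type: PatternType;
  payoff: number;     // e.g. extraction benefit or instance count
  effortDays: number; // estimated effort
}

// Category order: bug risk, developer velocity, code health, timeliness.
const CATEGORY_RANK: Record<PatternType, number> = {
  divergent: 0,
  convergent: 1,
  fossil: 2,
  emerging: 3,
};

function prioritize(recs: Recommendation[]): Recommendation[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...recs].sort(
    (a, b) =>
      CATEGORY_RANK[a.type] - CATEGORY_RANK[b.type] || // category first
      b.payoff - a.payoff ||                           // then bigger payoff
      a.effortDays - b.effortDays,                     // then cheaper work
  );
}

const ordered = prioritize([
  { title: "Extract requireRole()", type: "emerging", payoff: 2, effortDays: 1 },
  { title: "Remove getUser null checks", type: "fossil", payoff: 23, effortDays: 1 },
  { title: "Unify retry with backoff", type: "convergent", payoff: 3, effortDays: 1 },
  { title: "Unify API response formatting", type: "convergent", payoff: 4, effortDays: 1 },
  { title: "Align UserValidator", type: "divergent", payoff: 8, effortDays: 1 },
]);
console.log(ordered.map(r => r.title));
```

This puts the UserValidator alignment first and the requireRole extraction last, with the larger convergent cluster ahead of the smaller one.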

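Output Format

╔══════════════════════════════════════════════════════════════╗
║ Pattern Mine                                                 ║
║ Codebase: acme-platform                                      ║
║ Files scanned: 347 / Patterns found: 18                      ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║ Convergent (should be unified): 6 patterns                   ║
║ ├── Retry with backoff ............... 3 variants, 3 files   ║
║ │   Extraction saves: ~45 lines, unifies behavior            ║
║ ├── API response formatting .......... 4 variants, 12 files  ║
║ │   Extraction saves: ~120 lines, fixes 2 inconsistencies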
