master-data-matching

Author: admin | Source: ClawHub
Version: V 1.0.0
Security check: Passed
Downloads: 144
Favorites: 0

# Master Data Intelligent Matching System

## Overview

A production-ready skill for intelligent entity resolution across business domains. It combines exact-match and vector-semantic retrieval, OCR field mapping with confidence coloring, and human-in-the-loop verification with active learning.

## Usage

```javascript
import mdm from './index.js';

// ocrFields, ocrEntity, and dbRecords are your own inputs
// (see the Example section for sample entity and record shapes).

// 1. Get supported domains
mdm.getSupportedDomains(); // ['procurement', 'finance', 'sales', 'hr']

// 2. Build OCR-to-schema mapping with confidence colors
const mapping = mdm.buildOcrSchemaMapping(ocrFields, 'procurement');

// 3. Run the full matching pipeline
const result = mdm.runMatchingPipeline(ocrEntity, 'procurement', dbRecords);

// 4. Format the result as a summary
console.log(mdm.formatMatchingSummary(result));
```

## Key Features

### Business Domain Isolation

Four isolated schemas:

- **procurement**: vendor records (vendor_name, vendor_code, tax_id, contact, etc.)
- **finance**: company records (company_name, registration_number, fiscal_year_end, etc.)
- **sales**: customer records (customer_name, customer_code, industry, credit_limit, etc.)
- **hr**: employee records (employee_name, employee_id, id_number, department, etc.)

### OCR Field to Schema Visual Line Mapping

`buildOcrSchemaMapping(ocrFields, domain)` maps raw OCR field names to schema fields and colors each mapping by confidence:

| Color     | Score              | Meaning                          |
|-----------|--------------------|----------------------------------|
| 🟢 green  | ≥ 0.92             | High-confidence mapping          |
| 🟡 yellow | ≥ 0.70 and < 0.92  | Medium-confidence mapping        |
| 🔴 red    | < 0.70             | Low confidence or unmapped       |
| 🔵 blue   | n/a (DB-only)      | Database field with no OCR data  |

### Dual-Path Entity Retrieval

`dualPathEntityRetrieval(entity, domain, dbRecords)` runs two parallel paths:

1. **Exact Match** (threshold 0.92): all critical fields must match exactly.
2. **Vector Semantic** (threshold 0.70): weighted similarity across all fields.

Results include `needsHumanReview: true` if confidence is below 0.92 or no match is found.
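The color bands and the dual-path rule above can be illustrated with a small sketch. It is not the skill's implementation: only the 0.92 and 0.70 thresholds and the `needsHumanReview` rule come from this README. The helpers `confidenceColor` and `retrieveEntity`, the `criticalFields`/`weights` config, and the character-bigram Jaccard similarity (a stand-in for real vector embeddings) are hypothetical. For simplicity it checks the exact path before the semantic path instead of running them in parallel.

```javascript
// Hypothetical sketch, not the skill's code. Thresholds come from the README;
// criticalFields, weights, and the bigram similarity are assumptions.
const EXACT_THRESHOLD = 0.92;
const SEMANTIC_THRESHOLD = 0.70;

// Map a confidence score to the README's color bands.
function confidenceColor(score, dbOnly = false) {
  if (dbOnly) return 'blue'; // database field with no OCR data
  if (score >= EXACT_THRESHOLD) return 'green';
  if (score >= SEMANTIC_THRESHOLD) return 'yellow';
  return 'red';
}

// Exact path: every critical field must be present and identical.
function isExactMatch(entity, record, criticalFields) {
  return (
    criticalFields.length > 0 &&
    criticalFields.every((f) => entity[f] != null && entity[f] === record[f])
  );
}

// Stand-in for vector similarity: Jaccard overlap of character bigrams.
function bigrams(value) {
  const s = String(value).toLowerCase();
  const grams = new Set();
  for (let i = 0; i < s.length - 1; i++) grams.add(s.slice(i, i + 2));
  return grams;
}

function similarity(a, b) {
  const A = bigrams(a);
  const B = bigrams(b);
  if (A.size === 0 || B.size === 0) {
    // Single-character or empty values: fall back to equality.
    return String(a).toLowerCase() === String(b).toLowerCase() ? 1 : 0;
  }
  let shared = 0;
  for (const g of A) if (B.has(g)) shared++;
  return shared / (A.size + B.size - shared);
}

// Semantic path: weighted mean similarity over fields present on both sides.
function semanticScore(entity, record, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [field, w] of Object.entries(weights)) {
    if (entity[field] == null || record[field] == null) continue;
    total += w * similarity(entity[field], record[field]);
    weightSum += w;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Try the exact path first, then fall back to the best semantic candidate.
function retrieveEntity(entity, records, { criticalFields, weights }) {
  const exact = records.find((r) => isExactMatch(entity, r, criticalFields));
  if (exact) {
    // Assumption: an exact hit is reported with confidence 1.
    return { match: exact, path: 'exact', confidence: 1, needsHumanReview: false };
  }
  let best = null;
  let bestScore = 0;
  for (const r of records) {
    const s = semanticScore(entity, r, weights);
    if (s > bestScore) {
      best = r;
      bestScore = s;
    }
  }
  if (best === null || bestScore < SEMANTIC_THRESHOLD) {
    return { match: null, path: 'none', confidence: bestScore, needsHumanReview: true };
  }
  return {
    match: best,
    path: 'semantic',
    confidence: bestScore,
    needsHumanReview: bestScore < EXACT_THRESHOLD,
  };
}

// Example based on the README's sample vendor (weights are assumptions).
const records = [
  { id: 'rec_001', vendor_name: 'Acme Corporation Ltd', tax_id: '91110000123456789X' },
];
const config = { criticalFields: ['tax_id'], weights: { vendor_name: 0.6, tax_id: 0.4 } };
const hit = retrieveEntity({ vendor_name: 'Acme Corporation' }, records, config);
console.log(hit.path, hit.needsHumanReview, confidenceColor(hit.confidence));
// prints: semantic true yellow
```

Here the OCR entity has no `tax_id`, so the exact path fails and the name similarity (about 0.78) lands in the yellow band, which routes the match to human review.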
### Field Value Verification

`verifyFieldValues(ocrEntity, dbRecord, domain)` assigns one of four verification states to each field:

| State      | Meaning                                                |
|------------|--------------------------------------------------------|
| `match`    | OCR and DB values agree                                |
| `mismatch` | Values differ (requires human resolution)              |
| `new_info` | Field appears only in OCR (new information)            |
| `db_only`  | Field appears only in the DB (not in the OCR document) |

### Human-in-the-Loop

Every pipeline result generates a `hitlRequest` containing:

- Mismatched fields, highlighted
- New-info fields, listed
- Available review actions: `confirm_match`, `reject_match`, `create_new`, `update_fields`

Use `processHumanDecision(decision, state)` to process human feedback and generate learning payloads.

### Active Learning

`updateActiveLearning(payloads, stats)` tracks:

- Per-domain confirmation, rejection, and new-record rates
- Per-field error rates

It auto-adjusts thresholds when a field's error rate exceeds 30%.

## Example

```javascript
import mdm from './index.js';

// Sample OCR entity from a vendor invoice
const ocrVendor = {
  vendor_name: 'Acme Corporation Ltd',
  vendor_code: 'V-5001',
  tax_id: '91110000123456789X',
  contact_person: 'John Smith',
  email: 'john.smith@acme.com',
};

// Existing database records
const dbRecords = [
  {
    id: 'rec_001',
    vendor_name: 'Acme Corporation Ltd',
    vendor_code: 'V-5001',
    tax_id: '91110000123456789X',
    contact_person: 'John Smith',
    email: 'j.smith@acme.com', // slight email mismatch
    phone: '+86-10-12345678',
    address: 'Beijing Chaoyang District',
    bank_account: '6222021234567890',
  },
];

// Run the pipeline
const result = mdm.runMatchingPipeline(ocrVendor, 'procurement', dbRecords);
console.log(mdm.formatMatchingSummary(result));

// Process a human decision
const decision = { action: 'confirm_match', notes: 'Email mismatch acceptable' };
const { status, learningPayload } = mdm.processHumanDecision(decision, {
  domain: 'procurement',
  ocrEntity: ocrVendor,
  matchResult: result.matchResult,
});

// Update active learning, starting from empty stats
const newStats = mdm.updateActiveLearning([learningPayload], {});
```

## API Reference

| Function                          | Description                                  |
|-----------------------------------|----------------------------------------------|
| `getSupportedDomains()`           | List all supported business domains          |
| `getDomainSchema(domain)`         | Get the field schema for a domain            |
| `buildOcrSchemaMapping(ocr, dom)` | Map OCR fields to the schema with confidence |
| `dualPathEntityRetrieval(...)`    | Run exact + semantic matching                |
| `verifyFieldValues(...)`          | 4-state field verification                   |
| `runMatchingPipeline(...)`        | Full orchestration pipeline                  |
| `generateHitlReviewRequest(...)`  | Build the human review request payload       |
| `processHumanDecision(...)`       | Handle human feedback                        |
| `updateActiveLearning(...)`       | Update learning stats from decisions         |
| `formatMatchingSummary(...)`      | Human-readable result summary                |
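A minimal sketch of the four verification states from Field Value Verification, plus the mismatch and new-info highlighting that the `hitlRequest` is described as carrying. `verifyFields` and `reviewHighlights` are hypothetical helpers, not the exported `verifyFieldValues` or `generateHitlReviewRequest`; treating blank values as absent and comparing values case-insensitively after trimming are assumptions. The data is a copy of the sample vendor and database record from the Example.

```javascript
// Hypothetical sketch of 4-state field verification, not the skill's code.
// Assumptions: blank values count as absent; values are compared after
// trimming and lowercasing.
function isPresent(value) {
  return value !== null && value !== undefined && String(value).trim() !== '';
}

function normalize(value) {
  return String(value).trim().toLowerCase();
}

function verifyFields(ocrEntity, dbRecord, fields) {
  const states = {};
  for (const field of fields) {
    const inOcr = isPresent(ocrEntity[field]);
    const inDb = isPresent(dbRecord[field]);
    if (inOcr && inDb) {
      states[field] =
        normalize(ocrEntity[field]) === normalize(dbRecord[field]) ? 'match' : 'mismatch';
    } else if (inOcr) {
      states[field] = 'new_info'; // only the OCR document has this field
    } else if (inDb) {
      states[field] = 'db_only'; // only the database record has this field
    }
    // Fields missing on both sides get no state.
  }
  return states;
}

// Collect the fields a reviewer should look at (hypothetical helper).
function reviewHighlights(states) {
  const fieldsIn = (state) => Object.keys(states).filter((f) => states[f] === state);
  return { mismatched: fieldsIn('mismatch'), newInfo: fieldsIn('new_info') };
}

// Sample data copied from the Example section.
const ocrVendor = {
  vendor_name: 'Acme Corporation Ltd',
  vendor_code: 'V-5001',
  tax_id: '91110000123456789X',
  contact_person: 'John Smith',
  email: 'john.smith@acme.com',
};
const dbRecord = {
  id: 'rec_001',
  vendor_name: 'Acme Corporation Ltd',
  vendor_code: 'V-5001',
  tax_id: '91110000123456789X',
  contact_person: 'John Smith',
  email: 'j.smith@acme.com',
  phone: '+86-10-12345678',
  address: 'Beijing Chaoyang District',
  bank_account: '6222021234567890',
};

// Check every field seen on either side, except the record id.
const fields = [...new Set([...Object.keys(ocrVendor), ...Object.keys(dbRecord)])].filter(
  (f) => f !== 'id'
);
const states = verifyFields(ocrVendor, dbRecord, fields);
console.log(states.email, states.phone); // prints: mismatch db_only
```

With this data, `email` is the only mismatch and `phone`, `address`, and `bank_account` come out as `db_only`, consistent with the "slight email mismatch" the Example calls out.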

Tags

skill ai

Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

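Option 1: Install SkillHub and the skill

Help me install SkillHub and the master-data-matching-1776105661 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the master-data-matching-1776105661 skill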

Install via the command line

skillhub install master-data-matching-1776105661

Download the Zip package

Download master-data-matching v1.0.0

File size: 15.42 KB | Published: 2026-4-17 15:18

v1.0.0 (latest) 2026-4-17 15:18
Initial release of the production-ready Master Data Intelligent Matching System.

- Supports entity resolution for procurement, finance, sales, and HR domains
- Dual-path retrieval: exact-match and vector-semantic matching
- OCR-to-schema mapping with confidence coloring (green/yellow/red/blue)
- 4-state per-field value verification and human-in-the-loop review flow
- Active learning that tracks errors and feedback and auto-adjusts thresholds
- Modular API for integration and orchestration
