geo-bulk-processor

Overview

geo-bulk-processor is a GEO (Generative Engine Optimization) bulk-processing orchestrator designed for large sites and content libraries. It helps an agent:

- Design and run batch optimization pipelines over thousands of pages or documents.
- Standardize templates and patterns for repeated page types.
- Track progress, quality, and rollout phases instead of treating each page as an isolated “single-shot” task.

Use this skill when the user:

- Manages a large website, multi-language site, or content library.
- Wants to bulk retrofit existing pages for GEO readiness, AI citations, or structured data.
- Provides sitemaps, URL lists, spreadsheets, or exports from CMS/analytics tools.
- Needs a repeatable, scalable workflow instead of one-off page edits.

This skill focuses on planning, orchestration, and consistency. It does not tie you to any specific GEO framework, but is optimized to work well with the family of geo-* skills (e.g. content optimizers, schema generators, local optimizers).

When to use this skill

Always consider using geo-bulk-processor when:

- The user mentions “bulk”, “batch”, “mass update”, “large site”, “thousands of pages”, “entire catalog”, “all blog posts”, or similar language.
- The user shares CSV/Excel exports, sitemaps, URL lists, or folders of content files.
- The user wants to apply similar GEO strategies across many items (e.g., all product pages, all location pages, all support articles).
- The task obviously cannot be solved efficiently by optimizing a single page at a time.

If the request is clearly about one page or a very small number of pages (e.g., “fix GEO for this single article”), prefer the more focused single-page GEO skills instead of this bulk processor.

High-level workflow

When this skill is invoked, follow this structured workflow:

1. Clarify the bulk goal

- Identify the primary GEO goals (e.g., AI citations for specific queries, better structured data, consistent FAQ blocks, improved internal linking).
- Ask what content universe is in scope (full site, one section, specific content types, language variants, etc.).
- Ask about constraints (timelines, systems, headcount, tech stack, allowed file formats, risk tolerance).

2. Ingest and profile the corpus

- Accept inputs such as:
  - URL lists or sitemaps (XML, CSV, plain text).
  - Content exports from a CMS (CSV, JSON, Markdown/HTML files).
  - Analytics exports (top pages, landing pages, long-tail queries).
- Build a lightweight inventory summary:
  - Number of items by content type, language, directory, or template.
  - Obvious clusters (e.g., /blog/, /product/, /docs/, /locations/).
  - Known GEO gaps (missing schema, thin pages, duplicated structures).
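The inventory summary can be sketched in a few lines of Python. This minimal example assumes the corpus arrives as a list of absolute URLs; the names top_directory and inventory_summary are hypothetical and not taken from the bundled script. Grouping by language or template works the same way with a different key function.

```python
from collections import Counter
from typing import Callable, Iterable
from urllib.parse import urlparse


def top_directory(url: str) -> str:
    """Return the first path segment as a directory label, e.g. '/blog/'."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return f"/{segments[0]}/" if segments else "/"


def inventory_summary(
    urls: Iterable[str], key: Callable[[str], str] = top_directory
) -> Counter:
    """Count items per group; pass another key to group by language, template, etc."""
    return Counter(key(url) for url in urls)


urls = [
    "https://example.com/blog/geo-basics",
    "https://example.com/blog/faq-schema",
    "https://example.com/product/widget",
    "https://example.com/",
]
for group, count in inventory_summary(urls).most_common():
    print(group, count)
```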

3. Define content clusters and page types

- Group content into clusters that can share the same optimization strategy:
  - By URL pattern, content type, template ID, language, or topic.
- For each cluster, define:
  - A page-type definition (what the pages are, who they serve, business role).
  - The GEO opportunities specific to that cluster.
  - Risks and special constraints (e.g., legal wording, regulated content).
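Clustering by URL pattern is often the first pass. The sketch below assumes clusters can be expressed as ordered regular-expression rules over the URL path; the cluster names and patterns are illustrative, not a recommended taxonomy.

```python
import re

# Hypothetical rules, checked in order: the first matching pattern wins,
# so list specific patterns before general ones.
CLUSTER_RULES = [
    ("location_pages", re.compile(r"^/locations/[^/]+/?$")),
    ("product_pages", re.compile(r"^/product/")),
    ("blog_posts", re.compile(r"^/blog/")),
    ("docs", re.compile(r"^/docs/")),
]


def assign_cluster(path: str, default: str = "unclustered") -> str:
    """Map a URL path to a cluster name using the ordered rules above."""
    for name, pattern in CLUSTER_RULES:
        if pattern.match(path):
            return name
    return default
```

Items that fall into unclustered are worth reviewing: either a rule is missing or the page genuinely needs individual treatment.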

4. Design bulk GEO pipelines

- For each cluster, design a pipeline describing:
  - Inputs (fields from the CMS/CSV, existing content fields, metadata).
  - Transformations (rewrites, new sections, schema generation, FAQ extraction).
  - External skills/tools to call (e.g., schema generators, content optimizers).
  - Outputs (updated HTML/Markdown, JSON fields, CSV columns, migration specs).
- Capture pipelines in a structured, reusable format, using the guidance in references/bulk_pipelines.md.
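A pipeline description can also be captured as data so it is reusable across runs. This is a minimal sketch under assumed names (PipelineStep, Pipeline, add_meta_title); it is not the model defined in scripts/geo_bulk_pipeline.py or references/bulk_pipelines.md. Each step is a plain function from record to record, and the pipeline checks its declared inputs and outputs so a misconfigured step fails on the first record rather than silently across thousands.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class PipelineStep:
    name: str
    tool: str  # skill or script responsible for the step (hypothetical names)
    transform: Callable[[Record], Record]


@dataclass
class Pipeline:
    cluster: str
    inputs: List[str]   # fields every record must provide
    outputs: List[str]  # fields the pipeline promises to write
    steps: List[PipelineStep] = field(default_factory=list)

    def run(self, record: Record) -> Record:
        missing = [f for f in self.inputs if not record.get(f)]
        if missing:
            raise ValueError(f"{record.get('id', '?')}: missing inputs {missing}")
        result = dict(record)  # never mutate the source record
        for step in self.steps:
            result = step.transform(result)
        unwritten = [f for f in self.outputs if f not in result]
        if unwritten:
            raise ValueError(f"{record.get('id', '?')}: outputs not written {unwritten}")
        return result


def add_meta_title(record: Record) -> Record:
    """Hypothetical step: derive a meta title from the product name."""
    return {**record, "meta_title": f"{record['product_name']} | Example Store"}


product_pipeline = Pipeline(
    cluster="product_pages",
    inputs=["id", "product_name"],
    outputs=["meta_title"],
    steps=[PipelineStep("meta_title", "template", add_meta_title)],
)
print(product_pipeline.run({"id": "p1", "product_name": "Widget"})["meta_title"])
```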

5. Create reusable templates and patterns

- For each major page type, define:
  - A content skeleton (sections, headings, FAQs, schema blocks).
  - Variable slots tied to data fields (e.g., product name, city, category).
  - Style and tone expectations (aligned with GEO and brand constraints).
- Make these templates explicit so they can be applied programmatically to many items.
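A content skeleton with variable slots maps naturally onto Python's string.Template. The location-page skeleton and field names below are hypothetical; the point is the failure mode. substitute() raises on a missing field, which at scale is usually preferable to safe_substitute(), which would leave a literal $city in published copy.

```python
from string import Template

# Hypothetical skeleton for a location page; each $slot maps to a data field.
LOCATION_SKELETON = Template(
    "# $business_name in $city\n"
    "\n"
    "## Services in $city\n"
    "$services_summary\n"
    "\n"
    "## FAQ\n"
    "### Where is $business_name located in $city?\n"
    "$address\n"
)


def render(template: Template, record: dict) -> str:
    """Fill every slot, or fail loudly rather than publish a half-filled page."""
    try:
        return template.substitute(record)
    except KeyError as missing:
        raise ValueError(f"record {record.get('id', '?')} lacks slot {missing}") from None
```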

6. Plan execution and rollout

- Propose a phased rollout:
  - Pilot set → expansion → full coverage.
- For each phase, specify:
  - Sample size, selection criteria, and evaluation plan.
  - Success metrics (GEO readiness scores, AI citation coverage, traffic proxies).
  - The feedback loop and how to adjust templates/pipelines between phases.
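One option for phased rollout is to assign each item to a phase deterministically from its ID, so the pilot set stays stable across reruns. The sketch assumes string IDs and hypothetical 5% / 45% / 50% splits. It uses SHA-256 rather than the built-in hash(), because Python randomizes string hashes per process by default.

```python
import hashlib

# Hypothetical cut-offs on a 0-99 bucket: 5% pilot, next 45% expansion, rest full.
PHASES = [("pilot", 5), ("expansion", 50), ("full", 100)]


def bucket(item_id: str) -> int:
    """Stable 0-99 bucket derived from the item ID, identical across runs and machines."""
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def assign_phase(item_id: str) -> str:
    """Return the first phase whose cut-off exceeds the item's bucket."""
    b = bucket(item_id)
    for phase, upper in PHASES:
        if b < upper:
            return phase
    return PHASES[-1][0]
```

Because buckets never change, widening the pilot cut-off (say from 5 to 10) keeps every original pilot item in the pilot and only adds new ones.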

7. Generate concrete artifacts

- Depending on user needs, generate:
  - A master plan document summarizing clusters, pipelines, and rollout.
  - CSV/JSON specs for engineers or operators to implement changes.
  - Example before/after content and schema for each major page type.
- Use scripts/geo_bulk_pipeline.py as a conceptual reference: it shows how to represent content items, clusters, and pipelines in code or data. You do not have to execute it, but you may mirror its structures.
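When the artifact is a CSV spec for engineers, generating it from records keeps the file from drifting away from the plan the way hand-edited spreadsheets tend to. The column names here are a hypothetical example; csv.DictWriter with extrasaction="raise" rejects any row carrying a field that is not in the spec.

```python
import csv
import io

# Hypothetical column set for a change spec handed to engineers or operators.
SPEC_COLUMNS = ["id", "url", "cluster", "field", "action", "new_value"]


def write_change_spec(rows: list) -> str:
    """Serialize change instructions as CSV text; unknown keys raise ValueError."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SPEC_COLUMNS, extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {
        "id": "p1",
        "url": "https://example.com/product/widget",
        "cluster": "product_pages",
        "field": "faq_block",
        "action": "append",
        "new_value": "Q: Is it waterproof? A: Yes.",
    },
]
print(write_change_spec(rows))
```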

8. Quality assurance and risk management

- Recommend:
  - Sampling and manual review before full rollout.
  - Regression checks for critical pages (homepage, top products, legal pages).
  - Monitoring plan for GEO and traffic indicators after deployment.
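The QA recommendations can be turned into a reproducible review list: every critical page plus a seeded random sample of the rest. The 5% rate and the seed are placeholders to adjust per phase, and the function name is hypothetical.

```python
import random


def review_sample(item_ids, critical_ids, rate=0.05, seed=42):
    """Return items for manual review: all critical pages plus a seeded random sample.

    Critical pages (homepage, top products, legal pages) are never left to chance;
    the fixed seed makes the sample reproducible, so every reviewer sees the same set.
    """
    critical = [i for i in item_ids if i in critical_ids]
    others = [i for i in item_ids if i not in critical_ids]
    k = min(len(others), max(1, round(len(others) * rate))) if others else 0
    return critical + random.Random(seed).sample(others, k)
```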

Input formats and expectations

Typical inputs for this skill:

- URL lists / sitemaps
  - Plain-text lists of URLs.
  - XML sitemaps (possibly multiple).
- Spreadsheets / tables
  - CSV or Excel exports listing URLs, titles, categories, traffic, etc.
- Content exports
  - Folders of HTML/Markdown files.
  - CMS exports with JSON per document.
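XML sitemaps are the most structured of these inputs. A minimal parser using only the standard library might look like this; it assumes the sitemap has already been downloaded and decompressed into a string, and it handles both a <urlset> and a <sitemapindex>. For untrusted input, the Python documentation recommends defusedxml over xml.etree.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str):
    """Return (page_urls, child_sitemap_urls) from one sitemap document.

    A <urlset> lists pages; a <sitemapindex> lists further sitemaps, which the
    caller fetches and parses the same way.
    """
    root = ET.fromstring(xml_text)
    pages = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
    children = [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]
    return pages, children
```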

When possible, normalize these into a tabular or record-based view:

- One row or record per content item.
- Columns / fields for URL, path, language, category, template, and any key metrics.

If files are not already in a structured format, design and describe a simple manifest format that the user’s team can produce (for example, a CSV with id,url,type,cluster,language columns).
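A loader for such a manifest can enforce the basics before any pipeline runs. This sketch assumes a CSV with a header row containing the id, url, type, cluster, and language columns from the example; extra columns are kept, and duplicate IDs are rejected because later joins and progress tracking depend on IDs being unique. The function name is hypothetical.

```python
import csv
import io

# Columns from the example manifest; extra columns are allowed and kept.
REQUIRED_COLUMNS = ("id", "url", "type", "cluster", "language")


def load_manifest(text: str) -> list:
    """Parse a manifest CSV into one dict per content item, checking basic integrity."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        raise ValueError(f"manifest is missing columns: {missing}")
    records, seen = [], set()
    for row in reader:
        # Short rows yield None for absent fields; normalize to stripped strings.
        record = {c: (row[c] or "").strip() for c in columns}
        if record["id"] in seen:
            raise ValueError(f"duplicate id {record['id']!r}")
        seen.add(record["id"])
        records.append(record)
    return records
```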

Output expectations

Unless the user asks for something highly specific, structure your main outputs as:

1. High-level GEO bulk strategy

- Clear description of scope, goals, and constraints.
- Overview of clusters, page types, and priorities.

2. Cluster-by-cluster plan

- For each cluster:
  - Description and rationale.
  - Proposed pipeline steps and external skills/tools.
  - Suggested templates and content patterns.

3. Implementation-ready specifications

- Tables or pseudo-schemas describing:
  - Required input fields and data sources.
  - Output fields/files and where they should be written.
  - Recommended automation approach (ETL, scripts, CMS workflows).

4. Rollout and QA plan

- Phased rollout with criteria and success metrics.
- QA checklists for sampling and sign-off.
- Monitoring and iteration loops.

When the user explicitly requests machine-consumable artifacts (for example, “give me a CSV spec that my data team can use”), prioritize precise, unambiguous formats (clear column names, data types, and examples).
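A machine-consumable spec stays unambiguous most easily when each column carries its name, type, and an example in one place. The columns below are hypothetical; the validator shows one way to coerce raw CSV strings, including the common pitfall that bool("false") is True in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: type
    example: str
    required: bool = True


# Hypothetical spec: column names, types, and examples are illustrative only.
SPEC = [
    Column("id", str, "prod-00123"),
    Column("url", str, "https://example.com/product/widget"),
    Column("faq_count", int, "4"),
    Column("has_product_schema", bool, "true"),
    Column("notes", str, "", required=False),
]


def coerce(col: Column, raw: str):
    """Convert one raw CSV string to the column's declared type."""
    if raw == "":
        if col.required:
            raise ValueError(f"{col.name} is required")
        return None
    if col.dtype is bool:  # bool("false") is True, so parse explicitly
        if raw.lower() not in ("true", "false"):
            raise ValueError(f"{col.name} must be true or false, got {raw!r}")
        return raw.lower() == "true"
    return col.dtype(raw)  # int("abc") raises ValueError


def validate_row(row: dict) -> dict:
    """Coerce every spec column; missing keys are treated as empty strings."""
    return {col.name: coerce(col, row.get(col.name, "")) for col in SPEC}
```

A useful self-check is that every example in the spec passes validate_row, so the spec documents and tests itself.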

Use of bundled resources

This skill ships with additional reference material and helper code:

- references/bulk_pipelines.md
  - Explains common GEO bulk pipeline patterns.
  - Provides example cluster definitions, pipeline step types, and rollout patterns.
  - Read this when you need ideas for structuring pipelines or explaining them to the user’s team.
- scripts/geo_bulk_pipeline.py
  - Contains lightweight data models and helper functions that show one way to model content items, clusters, and pipelines.
  - You can mirror its structures when designing data formats, but you do not need to execute it to complete user requests.
  - If the environment allows executing scripts, it can be adapted into a real automation helper, but this skill does not require that.

When in doubt, prefer clear written plans and specs over over-engineered code. The primary value of this skill is in designing scalable GEO bulk workflows, not in implementing full production systems inside the skill.

Style and collaboration guidelines

- Think like a systems designer.
  - Aim for solutions that will still work when the corpus doubles or triples.
  - Avoid plans that require per-page manual tweaking unless limited to a small, high-value subset.
- Explain the “why”, not just the “what”.
  - Connect GEO decisions (clusters, templates, schema types) to the user’s business goals and constraints.
- Be opinionated but adaptable.
  - Suggest defaults and best practices, but clearly mark them as such.
  - Offer alternative paths when user constraints are unclear or strict.
- Make it easy to operationalize.
  - Prefer concrete artifacts: specs, templates, checklists, and example records.
  - Avoid vague recommendations that cannot be turned into tickets or automation.
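To keep a plan workable as the corpus doubles or triples, items can be processed in fixed-size batches, so that memory use and the scope of a failed run depend on the batch size rather than the corpus size, provided the input is read lazily. This is a small helper under a hypothetical name; Python 3.12 and later also ship itertools.batched, which yields tuples instead of lists.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, reading the input lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```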

If the user already uses other geo-* skills, explicitly reference where they fit into each pipeline step (for example, “use geo-schema-gen here to generate Product and FAQ schema for this cluster”).
