BatchJob Async Job Skill

Use this skill when the user wants to run or manage batch jobs through the BatchJob service.

Required Environment

- BATCHJOB_BASE_URL
- BATCHJOB_BEARER_TOKEN

All HTTP requests must include:

```bash
-H "Authorization: Bearer ${BATCHJOB_BEARER_TOKEN}"
-H "Content-Type: application/json"
```
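A minimal sketch that fails fast when either required variable is missing, so that no request is sent with an empty base URL or token. It is written in POSIX shell, so it also runs under bash. The function name bj_require_env is hypothetical.

```bash
# Hypothetical helper: returns non-zero and names the missing variable(s)
# when BATCHJOB_BASE_URL or BATCHJOB_BEARER_TOKEN is unset or empty.
bj_require_env() {
  local missing=""
  [ -n "${BATCHJOB_BASE_URL:-}" ] || missing="$missing BATCHJOB_BASE_URL"
  [ -n "${BATCHJOB_BEARER_TOKEN:-}" ] || missing="$missing BATCHJOB_BEARER_TOKEN"
  if [ -n "$missing" ]; then
    echo "missing required environment variable(s):$missing" >&2
    return 1
  fi
}
```

Call it once at the start of a run, for example `bj_require_env || exit 1`.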

API Endpoints

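- POST /v1/batch/files:upload
- POST /v1/batch/jobs:precheck
- POST /v1/batch/jobs
- GET /v1/batch/jobs/{job_id}
- GET /v1/batch/jobs?page=1&page_size=10&status=...
- POST /v1/batch/jobs/{job_id}:cancel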

Automation Policy (Default)

- Always run in full-auto mode.
- Do not ask the user for a file_id first.
- Resolve the file source from the current message/context, then upload automatically when needed.
- Ask follow-up questions only when no readable file source can be obtained.
- Accepted input file formats for upload: jsonl, csv, xlsx, xls (BatchJob normalizes them to its internal JSONL).
- For jsonl, each line must be either:
  - canonical Vertex format (contents + optional generationConfig), or
  - simple prompt format (prompt + optional aspect_ratio / image_urls, where image_urls must be publicly reachable URLs).
- If the user sends only the model/mode after a file message, treat it as confirmation and continue automatically.
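The accepted-format rule can be checked before any upload attempt. This is a minimal sketch that assumes the format is judged by file extension alone; the server may also inspect the content. bj_is_accepted_format is a hypothetical name.

```bash
# Hypothetical helper: succeeds only for the upload formats this skill accepts
# (jsonl, csv, xlsx, xls). The extension is compared case-insensitively.
bj_is_accepted_format() {
  local name ext
  name="${1##*/}"                 # last path component only
  case "$name" in
    *.*) ext="${name##*.}" ;;
    *) return 1 ;;                # no extension at all
  esac
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    jsonl|csv|xlsx|xls) return 0 ;;
    *) return 1 ;;
  esac
}
```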

Guardrails (Must Follow)

- Do not auto-retry by creating a second job unless the user explicitly asks.
- Do not auto-rewrite the dataset format after a terminal failure unless the user explicitly asks.
- Before upload, auto-normalization is allowed only once, and only for known safe mappings (e.g. simple prompt JSONL -> Vertex JSONL).
- Upload failure is terminal for this run: when upload fails, do not continue to precheck/submit.
- Never submit when row_count <= 0.
- Do not fetch or parse output_summary_url automatically; do so only when the user asks for the detailed failure reason.
- After reaching a terminal status (completed, failed, partially_failed), stop execution and return the summary immediately.
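Two of these guardrails are mechanical enough to encode directly. This is a minimal sketch with hypothetical function names. It assumes row_count and the job status arrive as plain strings that have already been extracted from the API responses.

```bash
# Hypothetical guard: succeeds only when row_count is a positive integer.
# Submit must be skipped whenever this fails ("Never submit when row_count <= 0").
bj_can_submit() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, negative, or non-numeric
  esac
  [ "$1" -gt 0 ]
}

# Hypothetical guard: succeeds when the status is terminal, which means execution
# must stop and the summary must be returned.
bj_is_terminal() {
  case "$1" in
    completed|failed|partially_failed) return 0 ;;
    *) return 1 ;;
  esac
}
```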

JSONL Compatibility Rule (Important)

BatchJob's internal execution expects each JSONL line to have the VertexGeminiImageRequest shape.

- Canonical line format:
  - contents[0].parts[0].text contains the prompt text.
  - generationConfig.imageConfig.aspectRatio is optional.
  - generationConfig.responseModalities should include IMAGE and TEXT.
- Non-canonical simple JSONL such as {"prompt":"...","aspect_ratio":"1:1"} is acceptable; the server normalizes it to Vertex format.
- If a JSONL line has neither contents nor prompt, stop and ask the user to provide valid data.
- Explicitly unsupported (must be rejected before submit):
  - OpenAI Batch-style lines containing method + url + body (for example a /v1/chat/completions payload).
  - This schema fails with the Vertex error "at least one contents field is required".
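The server normalizes simple lines itself, so client-side conversion is optional. If a canonical line is needed locally (the single allowed pre-upload normalization), the sketch below builds one from the fields listed above. Several details are assumptions: the prompt is a single line without control characters, only backslashes and double quotes are escaped, and the order of IMAGE and TEXT in responseModalities does not matter. bj_vertex_line is a hypothetical name.

```bash
# Hypothetical sketch: build one canonical Vertex line from a simple prompt
# and an optional aspect ratio. Assumes a single-line prompt; only backslashes
# and double quotes are JSON-escaped.
bj_vertex_line() {
  local prompt aspect="$2" image_cfg=""
  prompt="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  if [ -n "$aspect" ]; then
    image_cfg=",\"imageConfig\":{\"aspectRatio\":\"$aspect\"}"
  fi
  # contents[0].parts[0].text holds the prompt; responseModalities lists IMAGE and TEXT.
  printf '{"contents":[{"parts":[{"text":"%s"}]}],"generationConfig":{"responseModalities":["IMAGE","TEXT"]%s}}\n' \
    "$prompt" "$image_cfg"
}
```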

Dataset Output Rule (Important)

When the user asks you to generate a template/demo file for BatchJob image tasks:

- Prefer the CSV headers: prompt,aspect_ratio,image_urls
- Or use the JSONL line format with prompt / aspect_ratio / image_urls.
- If image_urls is provided, it must be publicly accessible (http/https).
- Do NOT generate the OpenAI batch envelope (custom_id + method + url + body) as the final upload file.
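The image_urls rule can be partly enforced with a scheme check. This is a minimal sketch with the hypothetical name bj_check_image_urls. It validates only that each URL uses http or https; it cannot prove that a URL is publicly reachable.

```bash
# Hypothetical check for the image_urls rule: every argument must be an
# http:// or https:// URL with something after the scheme.
# Scheme only; public reachability is NOT verified here.
bj_check_image_urls() {
  local url
  for url in "$@"; do
    case "$url" in
      http://?*|https://?*) ;;
      *) echo "image_urls entry is not an http(s) URL: $url" >&2; return 1 ;;
    esac
  done
}
```

A reachability probe such as `curl -fsI "$url"` could be added on top, though some servers reject HEAD requests.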

User Template Response (Important)

When the user asks for a format/template, return:

1. A short explanation (prompt is required; aspect_ratio/image_urls are optional; image_urls must be public URLs).
2. One copyable CSV snippet (default).
3. Optionally, one JSONL snippet (simple prompt schema).
4. Local template file paths (if available):
   - /home/node/.openclaw/workspace/templates/batchjob-input-template.csv
   - /home/node/.openclaw/workspace/templates/batchjob-input-template.jsonl
   - /home/node/.openclaw/workspace/templates/batchjob-format-guide.md

Do not output OpenAI batch envelope examples in template replies.
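This sketch writes both template files into a given directory. It assumes two details that are not specified above: a CSV image_urls cell holds a single URL, and image_urls in JSONL is an array of strings. The prompts and the example.com URL are placeholders and must be replaced with real, publicly reachable image URLs. bj_write_templates is a hypothetical name.

```bash
# Hypothetical helper: writes a CSV and a JSONL template into directory $1.
# Assumptions: one URL per CSV image_urls cell; image_urls is a JSON array in JSONL.
# Prompts and the example.com URL are placeholders only.
bj_write_templates() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  cat > "$dir/batchjob-input-template.csv" <<'EOF'
prompt,aspect_ratio,image_urls
"A watercolor lighthouse at sunset",16:9,
"A ceramic mug on a wooden table, studio light",1:1,https://example.com/reference-mug.jpg
EOF
  cat > "$dir/batchjob-input-template.jsonl" <<'EOF'
{"prompt":"A watercolor lighthouse at sunset","aspect_ratio":"16:9"}
{"prompt":"A ceramic mug on a wooden table, studio light","aspect_ratio":"1:1","image_urls":["https://example.com/reference-mug.jpg"]}
EOF
}
```

Both files use only the simple prompt schema, so neither contains the OpenAI batch envelope.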

File Source Resolver (Strict Order)

1. Existing file_id:
   - If provided, skip upload.
2. Public file_url (http:// or https://):
   - Download it to a temporary local file, then upload.
3. Explicit local file_path:
   - If readable, upload.
4. Inbound attachment local path from the channel/runtime context:
   - Examples: /tmp/..., MEDIA:<path>, /tmp/openclaw-media/...
   - If readable, upload.
5. Channel private file token/object (no local path, no public URL):
   - If the runtime has a channel adapter that can download the attachment bytes, use it and upload.
   - If not, enter the fallback interaction.

The resolver output must be normalized to one of:

- file_id
- file_path (a readable local file)
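The strict order can be sketched as a function that picks the first applicable branch. This is a minimal sketch under several assumptions. It only selects the source: a file_url result still has to be downloaded (Playbook A) before it becomes a file_path. Step 5 (channel adapter download) is runtime-specific and is left to the caller. MEDIA:<path> is assumed to refer to the local <path>. bj_pick_source is a hypothetical name.

```bash
# Hypothetical sketch of the strict resolver order. Prints "<kind>:<value>" for
# the first applicable source, or returns 1 so the caller can try a channel
# adapter (step 5) or start the fallback interaction.
# Args: FILE_ID FILE_URL FILE_PATH ATTACHMENT_PATH (pass "" for missing values).
# Assumption: an attachment given as MEDIA:<path> refers to the local <path>.
bj_pick_source() {
  local file_id="$1" file_url="$2" file_path="$3" attachment="${4#MEDIA:}"

  if [ -n "$file_id" ]; then                      # 1. existing file_id
    echo "file_id:$file_id"
    return 0
  fi
  case "$file_url" in                             # 2. public URL (download next)
    http://?*|https://?*) echo "file_url:$file_url"; return 0 ;;
  esac
  if [ -n "$file_path" ] && [ -f "$file_path" ] && [ -r "$file_path" ]; then
    echo "file_path:$file_path"                   # 3. explicit local path
    return 0
  fi
  if [ -n "$attachment" ] && [ -f "$attachment" ] && [ -r "$attachment" ]; then
    echo "file_path:$attachment"                  # 4. inbound attachment path
    return 0
  fi
  return 1                                        # 5. adapter or fallback
}
```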

Execution Flow

1. Confirm the model and mode; if either is missing, use the safe defaults (model=google/gemini-2.5-flash-image, mode=fast) and tell the user.
2. Resolve the file source using the resolver above.
3. If the resolver returns a file_path, upload it via POST /v1/batch/files:upload to get a file_id.
   - For .jsonl, inspect a few non-empty lines first:
     - If contents exists, upload as-is.
     - If only prompt/aspect_ratio/image_urls exist, upload as-is (the server will normalize them).
     - If method + url + body exist, stop and ask the user to switch to the BatchJob schema.
     - If the structure is unknown, stop and ask the user for a valid schema.
   - Backward-compatibility fallback:
     - If the upload fails with an unsupported-file-type error for csv/xlsx/xls, convert the file to JSONL once and retry the upload once.
     - Do not retry more than once.
   - If the upload returns a validation error (unsupported schema, no valid data rows), stop immediately and return fix guidance.
4. Run precheck with record_count (prefer the uploaded file's row_count).
5. Submit the job with the file_id.
6. Poll the job status until it reaches a terminal state (completed, failed, partially_failed), with a bounded timeout:
   - Interval: 5 seconds.
   - Max polls: 12 (about 60 seconds).
   - If the job is still non-terminal after the max polls, return the current status and job_id, then stop.
7. Return a concise summary with job_id, status, progress, and output_summary_url.

When the user only asks for an estimate, stop at precheck and do not submit.
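Step 6 can be sketched as a bounded loop. This assumes a caller-supplied command prints the job's current status string, for example a hypothetical wrapper around GET /v1/batch/jobs/{job_id} that extracts the status field (not shown). The interval and max polls default to the documented 5 seconds and 12 polls. bj_poll_until_terminal is a hypothetical name.

```bash
# Hypothetical polling sketch for step 6.
# $1 = command that prints the current status, $2 = interval in seconds (default 5),
# $3 = max polls (default 12).
# Prints the last status seen; returns 0 if terminal, 2 on timeout, 1 if $1 fails.
bj_poll_until_terminal() {
  local get_status="$1" interval="${2:-5}" max_polls="${3:-12}" i=1 job_status=""
  while [ "$i" -le "$max_polls" ]; do
    job_status="$($get_status)" || return 1
    case "$job_status" in
      completed|failed|partially_failed)
        echo "$job_status"
        return 0
        ;;
    esac
    if [ "$i" -lt "$max_polls" ]; then
      sleep "$interval"     # no sleep after the last poll
    fi
    i=$((i + 1))
  done
  echo "$job_status"        # still non-terminal: report it together with job_id
  return 2
}
```

A return code of 2 maps to the last rule of step 6: report the current status and job_id, then stop without submitting anything new.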

Fallback Interaction (Only When Needed)

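Use this when the resolver cannot read the file bytes from the current channel/context:

I received a file reference, but the current runtime cannot read the attachment content directly. Please choose one of the following:
1) Send a publicly downloadable URL.
2) Provide a locally readable path (e.g. /tmp/xxx.csv).
3) Upload the file to BatchJob first and give me the file_id.

If the current channel supports resending the file as a direct attachment path in the context, also ask the user to resend it once.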

File Source Playbook

A) Public URL -> Local Temp File

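```bash
FILE_URL="https://example.com/input.jsonl"
# Keep the URL's extension so the upload can detect the format (GNU mktemp
# treats text after the X's as a suffix).
EXT="${FILE_URL##*.}"
FILE_PATH="$(mktemp "/tmp/batchjob-input.XXXXXX.${EXT:-jsonl}")"
curl -fL --retry 3 --connect-timeout 10 "$FILE_URL" -o "$FILE_PATH"
```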
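`${FILE_URL##*.}` keeps everything after the last dot. A URL such as https://example.com/download?id=7 therefore yields com/download?id=7, which is not a usable suffix. A possible refinement, sketched under the assumption that the extension should come from the last path segment only: drop the query string and fragment first, and fall back to jsonl when no accepted extension is found. bj_ext_from_url is a hypothetical name.

```bash
# Hypothetical helper: derive an accepted extension from a download URL.
bj_ext_from_url() {
  local url name ext
  url="${1%%#*}"            # drop a #fragment
  url="${url%%[?]*}"        # drop a ?query string
  name="${url##*/}"         # keep only the last path segment
  case "$name" in
    *.*) ext="${name##*.}" ;;
    *) ext="" ;;
  esac
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    jsonl|csv|xlsx|xls) printf '%s\n' "$ext" ;;
    *) printf '%s\n' jsonl ;;   # fall back to jsonl, like ${EXT:-jsonl}
  esac
}
```

Usage: `FILE_PATH="$(mktemp "/tmp/batchjob-input.XXXXXX.$(bj_ext_from_url "$FILE_URL")")"`.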

A2) JSONL Schema Sanity Check

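Use this before upload to avoid submitting the wrong schema.

```bash
SRC_JSONL="/tmp/input.jsonl"
# Print the first non-empty line to see which schema the file uses.
FIRST_LINE="$(grep -m1 -v '^[[:space:]]*$' "$SRC_JSONL")"
echo "$FIRST_LINE"
```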
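The jsonl inspection in step 3 of the execution flow can be sketched on top of this check. The sketch samples the first few non-empty lines and maps them to the upload or stop decisions. It uses a grep-based key heuristic rather than a JSON parser; a real parser such as jq would be more precise. The function names and the stop:<reason> labels are hypothetical.

```bash
# Helper: does this JSON text contain "<key>": ? (heuristic, not a JSON parser)
_bj_line_has_key() {
  printf '%s\n' "$1" | grep -q "\"$2\"[[:space:]]*:"
}

# Hypothetical pre-upload check over the first N non-empty lines (default 5).
# Prints "upload" when every sampled line has contents (Vertex) or prompt (simple);
# otherwise prints "stop:<reason>" and returns 1.
bj_jsonl_precheck() {
  local src="$1" n="${2:-5}" sample line
  [ -f "$src" ] && [ -r "$src" ] || { echo "stop:unreadable_file"; return 1; }
  sample="$(grep -v '^[[:space:]]*$' "$src" | head -n "$n")"
  [ -n "$sample" ] || { echo "stop:no_data_rows"; return 1; }
  while IFS= read -r line; do
    if _bj_line_has_key "$line" method && _bj_line_has_key "$line" url &&
       _bj_line_has_key "$line" body; then
      echo "stop:openai_batch_schema"; return 1
    elif ! _bj_line_has_key "$line" contents && ! _bj_line_has_key "$line" prompt; then
      echo "stop:unknown_schema"; return 1
    fi
  done <<EOF
$sample
EOF
  echo "upload"
}
```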

B) Feishu/Channel Attachment Path

If the message context already includes a local attachment path, treat it as FILE_PATH directly.

If only a channel token/link is provided, with no downloadable URL and no local path, try downloading through the channel adapter first. If no adapter is available, use the fallback interaction.
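Picking a local attachment path out of the message context can be sketched as follows. The sketch assumes the context is free-form text in which paths are whitespace-delimited and contain no spaces. It recognizes only the MEDIA:<path> and /tmp/... forms named in the resolver. bj_attachment_path_from_context is a hypothetical name.

```bash
# Hypothetical helper: finds the first MEDIA:<path> or /tmp/... reference in the
# message context, strips the MEDIA: prefix, and prints it as FILE_PATH if it is
# a readable regular file. Returns 1 when no usable path is found.
bj_attachment_path_from_context() {
  local found
  found="$(printf '%s\n' "$1" |
    grep -oE 'MEDIA:[^[:space:]]+|/tmp/[^[:space:]]+' |
    while IFS= read -r ref; do
      path="${ref#MEDIA:}"
      if [ -f "$path" ] && [ -r "$path" ]; then
        printf '%s\n' "$path"
        break
      fi
    done)"
  [ -n "$found" ] || return 1
  printf '%s\n' "$found"
}
```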

Curl Templates

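A hypothetical sketch of one curl call per endpoint listed under API Endpoints. The field names record_count, file_id, model and mode come from this skill, but the exact JSON body shapes are assumptions. So is the multipart form field name file for the upload. curl -F sets its own multipart/form-data Content-Type with a boundary, so this sketch leaves the JSON Content-Type header off the upload call.

```bash
# Hypothetical curl templates. Assumed: multipart field "file" for upload and
# flat JSON bodies for precheck/submit. Values are not JSON-escaped.

bj_upload() {            # $1 = local file path
  curl -sS -X POST "${BATCHJOB_BASE_URL%/}/v1/batch/files:upload" \
    -H "Authorization: Bearer ${BATCHJOB_BEARER_TOKEN}" \
    -F "file=@$1"
}

bj_json() {              # $1 = method, $2 = path, $3 = optional JSON body
  if [ -n "${3:-}" ]; then
    curl -sS -X "$1" "${BATCHJOB_BASE_URL%/}$2" \
      -H "Authorization: Bearer ${BATCHJOB_BEARER_TOKEN}" \
      -H "Content-Type: application/json" \
      -d "$3"
  else
    curl -sS -X "$1" "${BATCHJOB_BASE_URL%/}$2" \
      -H "Authorization: Bearer ${BATCHJOB_BEARER_TOKEN}" \
      -H "Content-Type: application/json"
  fi
}

bj_precheck()  { bj_json POST /v1/batch/jobs:precheck "{\"record_count\":$1}"; }
bj_submit()    { bj_json POST /v1/batch/jobs "{\"file_id\":\"$1\",\"model\":\"$2\",\"mode\":\"$3\"}"; }
bj_get_job()   { bj_json GET "/v1/batch/jobs/$1"; }
bj_list_jobs() { bj_json GET "/v1/batch/jobs?page=${1:-1}&page_size=${2:-10}${3:+&status=$3}"; }
bj_cancel()    { bj_json POST "/v1/batch/jobs/$1:cancel"; }
```

For example, `bj_submit "$FILE_ID" google/gemini-2.5-flash-image fast` would submit with the documented safe defaults.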
