uni-vision-engine (Unified Vision Engine)

Automated high-quality video generation (text-to-video and image-to-video) through a local jimeng-api Docker service. It features native OpenClaw image interception, so users can send images directly in chat to generate videos without a separate upload UI.

Author: admin | Source: ClawHub

Uni Vision Engine (v1.2.0)

This skill leverages a local jimeng-api Docker service. It allows AI agents to fully control high-quality image-to-video and text-to-video generation using a valid sessionid.

🌟 Core Feature: Native Chat Image Interception (Best Practice)

With this skill, the AI Assistant can automatically intercept clothing/character images sent by the user in the chat interface and seamlessly pass them to the generation model—no manual web uploads required!

When a user sends a "character/outfit" photo in the chat and intends to animate it (e.g., showing off the clothing, turning around), you MUST execute the following steps:

1. Intercept the Image Payload: Use the read tool or native execution flow to extract the base64 content or cache path of this image from the chat context. Save it as a local temporary file (e.g., /tmp/target.jpg).
2. Never use text-based URLs or JSON payloads for image uploads. You MUST use Node.js multipart/form-data to submit the physical file stream.
3. Initiate the Video Generation Task using the core script:

node {baseDir}/scripts/generate.js --prompt "The model naturally turns around, fully showcasing the gloss of the fabric, extremely high quality, natural sunlight..." --image /tmp/target.jpg

4. Monitor the Output: Generation usually takes 60-310 seconds. Monitor the Docker logs to retrieve the direct MP4 link and return it to the user.
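The interception and upload steps above can be sketched in plain Node.js. This is a minimal illustration, not the code inside generate.js: it assumes the chat payload is available as a base64 string (optionally a `data:` URL), that the image is a JPEG, and that the upload field is named `image`. The helper names `saveBase64Image` and `buildMultipart` are hypothetical. The multipart body is assembled by hand so the sketch depends only on Node's standard library.

```javascript
// Hypothetical helpers; the real generate.js may differ.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const crypto = require("node:crypto");

// Step 1: decode a base64 image (a "data:...;base64," prefix is stripped)
// and write the bytes to a local temp file, e.g. <tmpdir>/target.jpg.
function saveBase64Image(base64, fileName = "target.jpg") {
  const raw = base64.replace(/^data:[^;]+;base64,/, "");
  const filePath = path.join(os.tmpdir(), fileName);
  fs.writeFileSync(filePath, Buffer.from(raw, "base64"));
  return filePath;
}

// Step 2: wrap the file's raw bytes in a multipart/form-data body
// (never a URL or a JSON string). The file field name "image" and the
// image/jpeg content type are assumptions.
function buildMultipart(filePath, fields = {}, fileField = "image") {
  const boundary = "----uve" + crypto.randomBytes(12).toString("hex");
  const parts = [];
  // Plain text fields first, one part each.
  for (const [name, value] of Object.entries(fields)) {
    parts.push(Buffer.from(
      `--${boundary}\r\nContent-Disposition: form-data; name="${name}"\r\n\r\n${value}\r\n`
    ));
  }
  // Then the file part: headers, a blank line, and the file bytes.
  parts.push(Buffer.from(
    `--${boundary}\r\nContent-Disposition: form-data; name="${fileField}"; ` +
      `filename="${path.basename(filePath)}"\r\nContent-Type: image/jpeg\r\n\r\n`
  ));
  parts.push(fs.readFileSync(filePath));
  // The closing delimiter ends the body.
  parts.push(Buffer.from(`\r\n--${boundary}--\r\n`));
  return {
    body: Buffer.concat(parts),
    contentType: `multipart/form-data; boundary=${boundary}`,
  };
}
```

The resulting `body` and `contentType` could be passed to `fetch` (Node 18+) or `http.request`. The actual jimeng-api route and field names are not specified by this skill and would have to be taken from generate.js.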

Content Moderation Warning (China Firewall)

Note: Because this skill relies on the domestic Jimeng/Seedance engine, clothing images go through strict automated content moderation. If you encounter error -2001 ("First frame image upload failed: may contain violating content"), the image has been judged "too revealing", shows too much skin, or contains other sensitive elements. The firewall blocks such images outright, and no credits are deducted. In that case, ask the user for a different image or switch to an overseas engine such as Luma or Runway.
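The monitoring step and the -2001 case can be handled by one small classifier over whatever text the Docker logs or generate.js print. `classifyOutput` is a hypothetical helper, not part of the skill. It assumes the final link appears as an http(s) URL ending in `.mp4` and that a moderation failure includes the literal code `-2001`. The 310-second threshold reuses the typical generation time quoted above and is only a heuristic.

```javascript
// Hypothetical helper, not part of the skill.
// Assumed: a finished job prints an http(s) link ending in .mp4, and a
// moderation rejection contains the literal code -2001.
const MODERATION_RE = /(?<!\d)-2001(?!\d)/;
const MP4_RE = /https?:\/\/[^\s"'<>]+\.mp4(?:\?[^\s"'<>]*)?/i;

function classifyOutput(text, elapsedSeconds = 0, expectedMaxSeconds = 310) {
  // Moderation block: ask for another image (no credits were deducted).
  if (MODERATION_RE.test(text)) {
    return { status: "moderation_blocked", code: -2001 };
  }
  // Success: return the first direct MP4 link found.
  const match = text.match(MP4_RE);
  if (match) {
    return { status: "done", url: match[0] };
  }
  // No result yet; flag runs that exceed the usual upper bound (a heuristic).
  if (elapsedSeconds > expectedMaxSeconds) {
    return { status: "overdue" };
  }
  return { status: "pending" };
}
```

Checking for -2001 before looking for a link reports a blocked upload immediately. The agent can then ask for a different image instead of waiting out the full generation window.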

CLI Usage (For Automation Scripts)

1. Text-to-Video

node {baseDir}/scripts/generate.js --prompt "A Shiba Inu surfing" --session your_sessionid

2. Image-to-Video (Requires --image)

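node {baseDir}/scripts/generate.js --prompt "The model naturally turns around to show off the outfit" --image /tmp/target.jpg --session your_sessionid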
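Automation scripts written in Node.js can invoke the same CLI without a shell. The sketch below is illustrative only. It builds the argument list from the `--prompt`, `--image` and `--session` flags that the skill's CLI examples use, and runs it with `child_process.execFile`. The helper names, the `baseDir` parameter and the 10-minute timeout are assumptions. Whether generate.js blocks until the video is ready is not stated by the skill.

```javascript
// Hypothetical wrapper around generate.js; only the flags come from the skill.
const { execFile } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// Build the argument list; adding --image switches the run to image-to-video.
function buildGenerateArgs(baseDir, { prompt, image, session }) {
  if (!prompt) throw new Error("--prompt is required");
  const args = [path.join(baseDir, "scripts", "generate.js"), "--prompt", prompt];
  if (image) {
    // Fail early if the intercepted image was never written to disk.
    if (!fs.existsSync(image)) throw new Error(`image not found: ${image}`);
    args.push("--image", image);
  }
  if (session) args.push("--session", session);
  return args;
}

// Run the current Node binary with those arguments. Passing an array avoids
// shell quoting problems with long prompts. The timeout is an assumed
// margin above the usual 60-310 s generation time.
function runGenerate(baseDir, options, timeoutMs = 10 * 60 * 1000) {
  return new Promise((resolve, reject) => {
    execFile(
      process.execPath,
      buildGenerateArgs(baseDir, options),
      { timeout: timeoutMs, maxBuffer: 10 * 1024 * 1024 },
      (err, stdout, stderr) => (err ? reject(Object.assign(err, { stderr })) : resolve(stdout))
    );
  });
}
```

For example, `runGenerate("/path/to/skill", { prompt: "A Shiba Inu surfing", session: "your_sessionid" })` resolves with whatever generate.js prints to stdout.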

Notes:

1. The Jimeng account must have sufficient credits. Each run with jimeng-video-3.0-pro deducts 50 credits.
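With a flat cost per run, the credit check is simple division. This sketch is illustrative. The balance would have to come from the Jimeng account itself, and how to query it is not covered by the skill. The helper names are hypothetical.

```javascript
// Hypothetical helpers based on the 50-credits-per-run figure for
// jimeng-video-3.0-pro.
const CREDITS_PER_PRO_RUN = 50;

// Whole runs a balance can cover.
function affordableRuns(balance, costPerRun = CREDITS_PER_PRO_RUN) {
  if (!Number.isFinite(balance) || balance < 0) {
    throw new RangeError("balance must be a non-negative number");
  }
  return Math.floor(balance / costPerRun);
}

// Check a planned batch before starting it; a negative remainder means
// the account cannot pay for every run.
function checkBatch(balance, runs, costPerRun = CREDITS_PER_PRO_RUN) {
  const needed = runs * costPerRun;
  return { needed, ok: balance >= needed, remaining: balance - needed };
}
```

Runs rejected with error -2001 do not deduct credits, so a retry after a moderation block does not need to be budgeted twice.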
