LLM Video Generator

Generate videos via ZhipuAI CogVideoX-3. Each API call produces ~5s of video.
For longer videos, chain multiple calls using last-frame continuation, then concatenate.

Scripts

All scripts use /opt/anaconda3/bin/python3. Resolve <skill-dir> to this skill's directory.

| Script | Purpose |
| --- | --- |
| scripts/video_gen.py | Core generation (3 modes: text2video, image2video, frames2video) |
| scripts/extract_last_frame.py | Extract last frame from a video (for continuation) |
| scripts/concat_videos.py | Concatenate multiple video segments into one |

Workflow

Step 1: Assess Request & Clarify

Clear request → proceed to Step 2. A request is clear when:

- Video content/scene is described with enough detail
- Style or visual tone is specified or implied
- Duration is stated (default: 5s if not specified)

Vague request → propose a plan first:

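    Based on your request, I have drafted the following video plan:

    📹 Content: [detailed scene description including key moments]
    🎨 Style: [e.g., realistic / animated / cinematic / heartwarming ...]
    ⏱️ Duration: [X seconds; note: generated in 5-second segments]
    🔊 Background music: yes / no
    📐 Resolution: 1920x1080
    🎞️ Frame rate: 30fps

    Does this plan work for you? Which parts would you like to adjust?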

Iterate with the user until confirmed.

Step 2: Estimate Time & Notify User

Before starting generation, calculate and report the estimated time:

Time estimation formula:

- Base: 1 minute per second of video (e.g., 20s video ≈ 20 minutes)
- High-definition (4K or 60fps): add 30% to the base (e.g., 20s 4K video ≈ 26 minutes)
- Additional overhead: ~2 minutes for frame extraction, concatenation, and compression
- Segments: ceil(target_duration / 5)
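
The formula above is plain arithmetic, so it can be sketched as a small helper. This is an illustrative sketch, not part of the skill's scripts; the function name `estimate_generation` and the `high_definition` flag are assumptions introduced here.

```python
import math

SEGMENT_SECONDS = 5    # each API call yields roughly 5 s of video
OVERHEAD_MINUTES = 2   # frame extraction, concatenation, compression


def estimate_generation(duration_s: int, high_definition: bool = False):
    """Return (segments, estimated_minutes) for a target duration in seconds.

    high_definition should be True for 4K or 60fps requests (hypothetical flag).
    """
    segments = math.ceil(duration_s / SEGMENT_SECONDS)
    base_minutes = float(duration_s)      # 1 minute per second of video
    if high_definition:
        base_minutes *= 1.3               # +30% for 4K or 60fps
    return segments, round(base_minutes + OVERHEAD_MINUTES)


print(estimate_generation(30))         # 30s 1080P request
print(estimate_generation(20, True))   # 20s 4K request
```

For a 30s 1080P request this gives 6 segments and 32 minutes; for a 20s 4K request it gives 4 segments and 28 minutes.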

MUST send this message to the user before starting generation:

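    ⏳ Video generation estimate

    📊 Segment plan: {N} segments (about 5 seconds each)
    ⏱️ Estimated total time: about {estimated_minutes} minutes
    📐 Resolution: {resolution}

    Video generation takes a while, so please be patient. I will report progress as each segment completes.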

Example for a 30s 1080P video:

- 6 segments, base time = 30 minutes, +2 min overhead → ~32 minutes
- Message: "Estimated total time: about 32 minutes"

Example for a 20s 4K video:

- 4 segments, base time = 20 * 1.3 = 26 min, +2 min → ~28 minutes

Step 3: Plan Generation Segments

Each API call produces ~5 seconds. Calculate segments: ceil(target_duration / 5)

For multi-segment videos, plan how the content evolves across segments. Write a prompt for each segment describing what happens in that 5-second window, maintaining visual continuity.
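
As a rough illustration of the planning step, the sketch below pairs one prompt with each 5-second window and records which generation mode the segment would use (text2video first, image2video for continuations). `SegmentPlan` and `plan_segments` are hypothetical names, not part of the skill's scripts.

```python
import math
from dataclasses import dataclass


@dataclass
class SegmentPlan:
    index: int     # 1-based segment number
    mode: str      # "text2video" for the first segment, "image2video" afterwards
    start_s: int   # start of the window within the final video
    end_s: int     # nominal end of the window within the final video
    prompt: str


def plan_segments(target_duration_s, prompts, segment_s=5):
    """Pair one prompt with each ~5-second window of the target video."""
    count = math.ceil(target_duration_s / segment_s)
    if len(prompts) != count:
        raise ValueError(f"expected {count} prompts, got {len(prompts)}")
    plans = []
    for i, prompt in enumerate(prompts):
        mode = "text2video" if i == 0 else "image2video"
        start = i * segment_s
        end = min(start + segment_s, target_duration_s)
        plans.append(SegmentPlan(i + 1, mode, start, end, prompt))
    return plans
```

Raising an error when the prompt count does not match the segment count catches a plan that skips a window before any generation time is spent.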

Step 4: Execute Generation with Progress Reports

CRITICAL: After each segment completes, IMMEDIATELY send a progress message to the user before starting the next segment. Do not wait until all segments are done.

Progress message format (send via message tool or inline reply after each segment):

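    ✅ Progress: {completed}/{total} segments done (segment {N} generated)
    📝 Content: {brief segment description}
    ⏱️ This segment took: {minutes} minutes
    📊 Estimated remaining: about {remaining_minutes} minutes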

Generation process:

Segment 1 — Text-to-Video:

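    /opt/anaconda3/bin/python3 <skill-dir>/scripts/video_gen.py text2video \
      --prompt "<segment_1_prompt>" \
      --quality quality --audio true --size 1920x1080 --fps 30 \
      --output-dir <output-dir> --max-wait 900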

→ Send progress message to user

Segments 2+ — Image-to-Video (last-frame continuation):

For each subsequent segment:

1. Extract last frame from the previous segment's video:

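    /opt/anaconda3/bin/python3 <skill-dir>/scripts/extract_last_frame.py \
      <prev_video.mp4> --output <output-dir>/frame_segN.png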

2. Generate next segment using the last frame as input:

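    /opt/anaconda3/bin/python3 <skill-dir>/scripts/video_gen.py image2video \
      --prompt "<segment_N_prompt>" \
      --image-url <output-dir>/frame_segN.png \
      --quality quality --audio true --size 1920x1080 --fps 30 \
      --output-dir <output-dir> --max-wait 900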

3. → Send progress message to user

Repeat for all segments.
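
The continuation chain can be sketched as a function that builds the ordered command lines for all segments. It is a sketch under stated assumptions: the script name `extract_last_frame.py` is reconstructed from garbled text, and `segment_video` is a hypothetical callback, because the source does not say how video_gen.py names its output files.

```python
PYTHON = "/opt/anaconda3/bin/python3"
COMMON = ["--quality", "quality", "--audio", "true",
          "--size", "1920x1080", "--fps", "30"]


def build_commands(skill_dir, out_dir, prompts, segment_video):
    """Return the ordered argv lists for a last-frame continuation chain.

    segment_video(n) must return the path of segment n's video (hypothetical
    callback; the output naming of video_gen.py is not documented here).
    """
    gen = f"{skill_dir}/scripts/video_gen.py"
    extract = f"{skill_dir}/scripts/extract_last_frame.py"
    tail = COMMON + ["--output-dir", out_dir, "--max-wait", "900"]

    # Segment 1 is generated from text alone.
    commands = [[PYTHON, gen, "text2video", "--prompt", prompts[0]] + tail]
    # Every later segment starts from the previous segment's last frame.
    for n, prompt in enumerate(prompts[1:], start=2):
        frame = f"{out_dir}/frame_seg{n - 1}.png"
        commands.append([PYTHON, extract, segment_video(n - 1), "--output", frame])
        commands.append([PYTHON, gen, "image2video", "--prompt", prompt,
                         "--image-url", frame] + tail)
    return commands
```

Each argv list could be passed to `subprocess.run(cmd, check=True)`, with the progress message sent after every generation command returns.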

Alternative — Frames-to-Video mode:

If you have both a starting and ending image for a segment:
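
    /opt/anaconda3/bin/python3 <skill-dir>/scripts/video_gen.py frames2video \
      --prompt "<prompt>" \
      --first-frame <first-frame-image> --last-frame <last-frame-image> \
      --quality quality --audio true --size 1920x1080 --fps 30 \
      --output-dir <output-dir>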

Step 5: Concatenate Segments

After all segments are generated, combine them:

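    /opt/anaconda3/bin/python3 <skill-dir>/scripts/concat_videos.py \
      --inputs <segment-1.mp4> <segment-2.mp4> ... \
      --output <output-dir>/final_video.mp4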

If the final file exceeds 25MB (Feishu upload limit), compress with ffmpeg:

    ffmpeg -i <input.mp4> -c:v libx264 -crf 32 -c:a aac -b:a 96k -vf scale=1280:720 -y <output.mp4>
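
The size check and compression step might be wired together as below. This is a minimal sketch: it reuses the ffmpeg settings this document gives for compression (libx264, CRF 32, AAC at 96k, scaled to 1280x720) and assumes "25MB" means 25 × 1024 × 1024 bytes, which the source does not specify. The function names are hypothetical.

```python
import os

FEISHU_LIMIT_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB


def needs_compression(path, limit_bytes=FEISHU_LIMIT_BYTES):
    """True when the file at path is larger than the upload limit."""
    return os.path.getsize(path) > limit_bytes


def compress_command(src, dst):
    """argv for the ffmpeg compression settings used in this document."""
    return ["ffmpeg", "-i", src,
            "-c:v", "libx264", "-crf", "32",
            "-c:a", "aac", "-b:a", "96k",
            "-vf", "scale=1280:720",
            "-y", dst]
```

Running the check again on the compressed file is worthwhile, since a long video can still exceed the limit at these settings.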

Step 6: Deliver

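- Share the final video file with the user
- For Feishu delivery: use the feishu-send-file skill to send the .mp4 file
- Final report:

    🎬 Video generation complete!

    ⏱️ Total duration: {duration} seconds
    📦 File size: {size}MB
    📊 {N} segments in total, {total_minutes} minutes elapsed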

Prompt Tips

- Use English prompts for best quality (translate Chinese descriptions)
- Be specific: scene, camera angle, lighting, motion, atmosphere
- Include style keywords: cinematic, realistic, cartoon, watercolor, etc.
- For continuation segments, describe the action progression, not the full scene from scratch
- Keep each segment prompt concise (1-3 sentences)

Parameters Reference

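| Parameter | Flag | Default | Options |
| --- | --- | --- | --- |
| Prompt | --prompt | (required) | Descriptive text |
| Quality | --quality | quality | quality / speed |
| Audio | --audio | true | true / false |
| Resolution | --size | 1920x1080 | 1280x720, 1920x1080, 3840x2160 |
| Frame rate | --fps | 30 | 30 / 60 |
| Output directory | --output-dir | . | Any writable path |
| Poll interval | --poll-interval | 10 | Seconds |
| Max wait | --max-wait | 900 | Seconds (default increased for reliability) |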

Error Handling

- Missing ZHIPU_API_KEY: ask the user to set the environment variable
- Missing zai-sdk: run pip install zai-sdk in the Anaconda environment
- Missing ffmpeg: required for frame extraction and concatenation
- Task timeout: increase --max-wait or retry; check task status manually via the API
- Task failed: simplify the prompt and retry
- File too large for Feishu: compress with ffmpeg (reduce resolution or increase CRF)
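
The first three failure cases can be detected before any generation time is spent. The preflight sketch below is illustrative only: the variable name `ZHIPU_API_KEY` is a reconstruction of the garbled name in the source, and `zai` as the import name of zai-sdk is an assumption, so both are parameters rather than fixed values.

```python
import importlib.util
import os
import shutil


def preflight(env=None, api_key_var="ZHIPU_API_KEY", sdk_module="zai",
              which=shutil.which, find_spec=importlib.util.find_spec):
    """Return a list of human-readable problems; an empty list means ready.

    api_key_var and sdk_module are assumed names (see the note above).
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get(api_key_var):
        problems.append(f"{api_key_var} is not set; ask the user to set it")
    if find_spec(sdk_module) is None:
        problems.append("zai-sdk is missing; run: pip install zai-sdk")
    if which("ffmpeg") is None:
        problems.append("ffmpeg is missing; needed for frame extraction and concatenation")
    return problems
```

The lookups are injectable so the check can be exercised without touching the real environment.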

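LLM Video Generator

Generate videos via ZhipuAI CogVideoX-3. Each API call produces about 5 seconds of video.
For longer videos, chain multiple calls using last-frame continuation, then concatenate.

Scripts

All scripts use /opt/anaconda3/bin/python3. Resolve <skill-dir> to this skill's directory.

| Script | Purpose |
| --- | --- |
| scripts/video_gen.py | Core generation (3 modes: text-to-video, image-to-video, frames-to-video) |
| scripts/extract_last_frame.py | Extract the last frame from a video (for continuation) |
| scripts/concat_videos.py | Concatenate multiple video segments into one |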

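Workflow

Step 1: Assess Request & Clarify

Clear request → proceed to Step 2. A request is clear when:

- The video content/scene is described in enough detail
- The style or visual tone is specified or implied
- The duration is stated (default: 5 seconds if not specified)

Vague request → propose a plan first:

    Based on your request, I have drafted the following video plan:

    📹 Content: [detailed scene description including key moments]
    🎨 Style: [e.g., realistic / animated / cinematic / heartwarming ...]
    ⏱️ Duration: [X seconds; note: generated in 5-second segments]
    🔊 Background music: yes / no
    📐 Resolution: 1920x1080
    🎞️ Frame rate: 30fps

    Does this plan work for you? Which parts would you like to adjust?

Iterate with the user until the plan is confirmed.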

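Step 2: Estimate Time & Notify User

Before starting generation, calculate and report the estimated time:

Time estimation formula:

- Base: 1 minute per second of video (e.g., 20s video ≈ 20 minutes)
- High-definition (4K or 60fps): add 30% to the base (e.g., 20s 4K video ≈ 26 minutes)
- Additional overhead: about 2 minutes for frame extraction, concatenation, and compression
- Segments: ceil(target_duration / 5)

This message must be sent to the user before generation starts:

    ⏳ Video generation estimate

    📊 Segment plan: {N} segments (about 5 seconds each)
    ⏱️ Estimated total time: about {estimated_minutes} minutes
    📐 Resolution: {resolution}

    Video generation takes a while, so please be patient. I will report progress as each segment completes.

Example for a 30s 1080P video:

- 6 segments, base time = 30 minutes, +2 minutes overhead → about 32 minutes
- Message: "Estimated total time: about 32 minutes"

Example for a 20s 4K video:

- 4 segments, base time = 20 * 1.3 = 26 minutes, +2 minutes → about 28 minutes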

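Step 3: Plan Generation Segments

Each API call produces about 5 seconds. Calculate segments: ceil(target_duration / 5)

For multi-segment videos, plan how the content evolves across segments. Write a prompt for each segment describing what happens in that 5-second window, maintaining visual continuity.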

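Step 4: Execute Generation with Progress Reports

CRITICAL: After each segment completes, immediately send a progress message to the user before starting the next segment. Do not wait until all segments are done.

Progress message format (send via the message tool or an inline reply after each segment):

    ✅ Progress: {completed}/{total} segments done (segment {N} generated)
    📝 Content: {brief segment description}
    ⏱️ This segment took: {minutes} minutes
    📊 Estimated remaining: about {remaining_minutes} minutes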
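
Filling the progress template after each segment could look like the sketch below. The source does not define how the remaining time is computed; extrapolating from the average time per finished segment is an assumption made here, and `progress_message` is a hypothetical helper.

```python
def progress_message(completed, total, description, segment_minutes, elapsed_minutes):
    """Fill the progress template after a segment finishes.

    Remaining time is extrapolated from the average time per finished
    segment, which is an assumption; the source does not define it.
    """
    average = elapsed_minutes / completed
    remaining = round(average * (total - completed))
    return "\n".join([
        f"✅ Progress: {completed}/{total} segments done (segment {completed} generated)",
        f"📝 Content: {description}",
        f"⏱️ This segment took: {segment_minutes} minutes",
        f"📊 Estimated remaining: about {remaining} minutes",
    ])


print(progress_message(2, 6, "the camera pans across the harbor", 5, 10))
```

An alternative is to subtract elapsed time from the Step 2 estimate, which keeps the reported total consistent with what the user was told up front.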

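Generation process:

Segment 1: Text-to-Video

    /opt/anaconda3/bin/python3 <skill-dir>/scripts/video_gen.py text2video \
      --prompt "<segment_1_prompt>" \
      --quality quality --audio true --size 1920x1080 --fps 30 \
      --output-dir <output-dir> --max-wait 900

→ Send progress message to user

Segments 2+: Image-to-Video (last-frame continuation)

For each subsequent segment:

1. Extract the last frame from the previous segment's video:

    /opt/anaconda3/bin/python3 <skill-dir>/scripts/extract_last_frame.py \
      <prev_video.mp4> --output <output-dir>/frame_segN.png

2. Generate the next segment using the last frame as input:

    /opt/anaconda3/bin/python3 <skill-dir>/scripts/video_gen.py image2video \
      --prompt "<segment_N_prompt>" \
      --image-url <output-dir>/frame_segN.png \
      --quality quality --audio true --size 1920x1080 --fps 30 \
      --output-dir <output-dir> --max-wait 900

3. → Send progress message to user

Repeat for all segments.

Alternative: Frames-to-Video mode

If you have both a starting and an ending image for a segment:

    /opt/anaconda3/bin/python3 <skill-dir>/scripts/video_gen.py frames2video \
      --prompt "<prompt>" \
      --first-frame <first-frame-image> --last-frame <last-frame-image> \
      --quality quality --audio true --size 1920x1080 --fps 30 \
      --output-dir <output-dir>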

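Step 5: Concatenate Segments

After all segments are generated, combine them:

    /opt/anaconda3/bin/python3 <skill-dir>/scripts/concat_videos.py \
      --inputs <segment-1.mp4> <segment-2.mp4> ... \
      --output <output-dir>/final_video.mp4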
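
The source does not show how concat_videos.py works internally. One common way to join segments that share codec and encoding parameters is ffmpeg's concat demuxer with stream copy; the sketch below illustrates that approach and is not the script's actual implementation. Both function names are hypothetical.

```python
def concat_list(paths):
    """Text of an ffmpeg concat-demuxer list file, one `file` line per segment."""
    lines = []
    for p in paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = p.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file, output):
    """argv that joins the listed segments without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", "-y", output]
```

Stream copy avoids a second lossy encode, which matters because the result may be compressed again for upload.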

If the final file exceeds 25MB (the Feishu upload limit), compress it with ffmpeg:

    ffmpeg -i <input.mp4> -c:v libx264 -crf 32 -c:a aac -b:a 96k -vf scale=1280:720 -y <output.mp4>

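Step 6: Deliver

- Share the final video file with the user
- For Feishu delivery: use the feishu-send-file skill to send the .mp4 file
- Final report:

    🎬 Video generation complete!

    ⏱️ Total duration: {duration} seconds
    📦 File size: {size}MB
    📊 {N} segments in total, {total_minutes} minutes elapsed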

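Prompt Tips

- Use English prompts for best quality (translate Chinese descriptions)
- Be specific: scene, camera angle, lighting, motion, atmosphere
- Include style keywords: cinematic, realistic, cartoon, watercolor, etc.
- For continuation segments, describe how the action progresses instead of describing the full scene from scratch
- Keep each segment prompt concise (1-3 sentences)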

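Parameters Reference

| Parameter | Flag | Default | Options |
| --- | --- | --- | --- |
| Prompt | --prompt | (required) | Descriptive text |
| Quality | --quality | quality | quality / speed |
| Audio | --audio | true | true / false |
| Resolution | --size | 1920x1080 | 1280x720, 1920x1080, 3840x2160 |
| Frame rate | --fps | 30 | 30 / 60 |
| Output directory | --output-dir | . | Any writable path |
| Poll interval | --poll-interval | 10 | Seconds |
| Max wait | --max-wait | 900 | Seconds (default increased for reliability) |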
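
To make the defaults and allowed values concrete, the parameter table can be mirrored in an argparse parser. This is a sketch of how such a command-line interface might be declared, not the actual source of video_gen.py; `build_parser` is a hypothetical name, and the flags and defaults are taken from the table and the commands in this document.

```python
import argparse


def build_parser():
    """Parser mirroring the documented flags (illustrative, not the real script)."""
    parser = argparse.ArgumentParser(prog="video_gen.py")
    modes = parser.add_subparsers(dest="mode", required=True)
    for mode in ("text2video", "image2video", "frames2video"):
        sub = modes.add_parser(mode)
        sub.add_argument("--prompt", required=True)
        sub.add_argument("--quality", choices=["quality", "speed"], default="quality")
        sub.add_argument("--audio", choices=["true", "false"], default="true")
        sub.add_argument("--size", default="1920x1080",
                         choices=["1280x720", "1920x1080", "3840x2160"])
        sub.add_argument("--fps", type=int, choices=[30, 60], default=30)
        sub.add_argument("--output-dir", default=".")
        sub.add_argument("--poll-interval", type=int, default=10)  # seconds
        sub.add_argument("--max-wait", type=int, default=900)      # seconds
        if mode == "image2video":
            sub.add_argument("--image-url", required=True)
        if mode == "frames2video":
            sub.add_argument("--first-frame", required=True)
            sub.add_argument("--last-frame", required=True)
    return parser


args = build_parser().parse_args(["text2video", "--prompt", "a quiet harbor at dawn"])
print(args.size, args.fps, args.max_wait)
```

Declaring the allowed values as `choices` makes an unsupported resolution or frame rate fail at parse time, before an API call is made.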

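Error Handling

- Missing ZHIPU_API_KEY: ask the user to set the environment variable
- Missing zai-sdk: run pip install zai-sdk in the Anaconda environment
- Missing ffmpeg: required for frame extraction and concatenation
- Task timeout: increase --max-wait or retry; check task status manually via the API
- Task failed: simplify the prompt and retry
- File too large for Feishu: compress with ffmpeg (reduce resolution or increase the CRF value)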
