Getting Started
Let's turn your text into AI-generated video. Upload your files or describe what you're after.
Try saying:
- "generate my text prompts"
- "export 1080p MP4"
- "turn this script into a 30-second video"
First-Time Connection
When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").
Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 2.
1. Obtain a free token: Generate a random UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to that UUID. The response field data.token is your NEMO_TOKEN (100 free credits, valid for 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't display raw API responses or token values to the user.
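The two setup steps can be sketched as plain request descriptions. This is a minimal illustration, not the skill's actual client: the helper names (build_token_request, build_session_request, first_request) and the default language "en" are assumptions, the attribution headers are left out for brevity, and nothing is sent over the network.

```python
import json
import os
import uuid

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id=None):
    """Describe the anonymous-token request (step 1)."""
    # A fresh random UUID serves as the client identifier when none is given.
    client_id = client_id or str(uuid.uuid4())
    return {
        "method": "POST",
        "url": f"{BASE_URL}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
        "body": None,
    }


def build_session_request(token, language="en"):
    """Describe the create-session request (step 2)."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"task_name": "project", "language": language}),
    }


def first_request(env=os.environ):
    """Skip step 1 when NEMO_TOKEN is already set in the environment."""
    token = env.get("NEMO_TOKEN")
    if token:
        return build_session_request(token)
    return build_token_request()
```

Returning dictionaries keeps the sketch independent of any particular HTTP library; a real client would pass these fields to whichever one it uses and read data.token and session_id from the responses.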
From Text Prompts to AI-Generated Videos
This skill creates AI-generated video from text prompts. Everything runs server-side.
A quick walkthrough: upload a 150-word product description → ask for "turn this script into a 30-second video with visuals and background music" → wait roughly 1-2 minutes → download your MP4 at 1080p. The backend handles rendering, encoding, all of it.
Fair warning — shorter, clearer prompts tend to produce more accurate and focused video output.
Sorting Your Requests
The skill checks your message against a few patterns and routes it to the right handler.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
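One way to picture the routing table is a naive keyword matcher. The sketch below is an assumption about how such matching might look, not the skill's real implementation: the route function, the handler names, and the plain substring test are all hypothetical.

```python
ROUTES = [
    # (trigger phrases, handler name, skip SSE?) in the same order as the table
    (("export", "导出", "download", "send me the video"), "export", True),
    (("credits", "积分", "balance", "余额"), "credits", True),
    (("status", "状态", "show tracks"), "state", True),
    (("upload", "上传"), "upload", True),
]


def route(message, has_file=False):
    """Return (handler, skip_sse) for a user message."""
    text = message.lower()
    for phrases, handler, skip_sse in ROUTES:
        matched = any(phrase in text for phrase in phrases)
        # An attached file counts as an upload request even without a keyword.
        if handler == "upload" and has_file:
            matched = True
        if matched:
            return (handler, skip_sse)
    # Everything else (generate, edit, add BGM...) goes through the SSE handler.
    return ("sse", False)
```

Substring matching is deliberately simple and will misroute a message such as "generate a video and download it"; a production router would likely need stricter patterns.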
Backend Processing
The heavy lifting runs on NemoVideo's GPU cluster. You upload, the backend processes, you download. No local rendering needed.
Base URL: https://mega-api-prod.nemovideo.ai
| Endpoint | Method | Purpose |
|---|---|---|
| /api/tasks/me/with-session/nemo_agent | POST | Start a new editing session. Body: {"task_name":"project","language":"<lang>"}. Returns session_id. |
| /run_sse | POST | Send a user message. Body includes app_name, session_id, new_message. Stream the response with Accept: text/event-stream. Timeout: 15 min. |
| /api/upload-video/nemo_agent/me/<sid> | POST | Upload a file (multipart) or URL. |
| /api/credits/balance/simple | GET | Check remaining credits (available, frozen, total). |
| /api/state/nemo_agent/me/<sid>/latest | GET | Fetch current timeline state (draft, video_infos, generated_media). |
| /api/render/proxy/lambda | POST | Start export. Body: {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll status every 30s. |
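The export call can be sketched as a body builder plus a generic polling loop. Several details here are assumptions: that <ts> is a millisecond Unix timestamp, the helper names, the 40-poll limit, and the fetch_status and is_done callbacks, which stand in for a status endpoint and response shape that this document does not describe.

```python
import time


def build_export_body(session_id, draft, now=None):
    """Build the JSON body for POST /api/render/proxy/lambda."""
    # Assumption: <ts> is the current Unix time in milliseconds.
    seconds = now if now is not None else time.time()
    return {
        "id": f"render_{int(seconds * 1000)}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_until_done(fetch_status, is_done, interval_s=30, max_polls=40,
                    sleep=time.sleep):
    """Call fetch_status() every interval_s seconds until is_done(status)."""
    for _ in range(max_polls):
        status = fetch_status()
        if is_done(status):
            return status
        sleep(interval_s)
    raise TimeoutError("export did not finish within the polling budget")
```

Injecting sleep and the two callbacks keeps the loop testable without waiting 30 seconds per iteration.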
Accepted file types: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: text-to-video-ai
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
Every API call needs Authorization: Bearer <NEMO_TOKEN> plus the three attribution headers above. If any header is missing, exports return 402.
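The header set might be assembled as in the sketch below. The function names are hypothetical, the version is passed in rather than parsed from the frontmatter, and matching the install path by substring is an assumption about how the detection could work.

```python
def detect_platform(install_path):
    """Map an install path to a platform name."""
    path = install_path.replace("\\", "/")
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def build_headers(token, version, install_path):
    """Return the Authorization header plus the three attribution headers."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "text-to-video-ai",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

For example, build_headers(token, "1.0.0", "~/.cursor/skills/text-to-video-ai/SKILL.md") would report the platform as cursor; the version string and file name in that call are placeholders.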
Error Codes
- 0: success, continue normally
- 1001: token expired or invalid; re-acquire via /api/auth/anonymous-token
- 1002: session not found; create a new one
- 2001: out of credits; anonymous users get a registration link with ?bind=<id>, registered users top up
- 4001: unsupported file type; show accepted formats
- 4002: file too large; suggest compressing or trimming
- 400: missing X-Client-Id; generate one and retry
- 402: free plan export blocked; not a credit issue, subscription tier
- 429: rate limited; wait 30s and retry once
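A lookup table is one simple way to act on these codes. In the sketch below the action names are hypothetical labels, and send is an assumed callable that performs a request and returns its code.

```python
import time

ERROR_ACTIONS = {
    0: "continue",
    1001: "refresh_token",
    1002: "create_session",
    2001: "out_of_credits",
    4001: "show_accepted_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "explain_subscription_tier",
    429: "wait_30s_and_retry_once",
}


def action_for(code):
    """Return the follow-up action for a code, with a fallback for unknown ones."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def call_with_rate_limit_retry(send, sleep=time.sleep):
    """Call send(); on 429 wait 30 seconds and retry exactly once."""
    code = send()
    if code == 429:
        sleep(30)
        code = send()
    return code
```

Retrying only once mirrors the rule for 429; if the second attempt is also rate limited, the caller sees 429 and can report it.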
Reading the SSE Stream
Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.
About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
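The stream handling described above could look roughly like this. It is a sketch under assumptions: comment lines starting with ":" are treated as heartbeats (a common SSE convention), every non-empty data: line is treated as a payload, and telling text events apart from tool calls is left out because the event schema is not documented here. All function names are hypothetical.

```python
def read_stream(lines):
    """Split raw SSE lines into data payloads and a keep-alive count."""
    payloads, keepalives = [], 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            keepalives += 1  # SSE comment line, assumed to be a heartbeat
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data:
                payloads.append(data)
            else:
                keepalives += 1  # empty data: line, backend still working
    return payloads, keepalives


def needs_state_poll(payloads):
    """True when the stream closed without delivering anything to show."""
    return not payloads


def should_show_progress(seconds_since_last_notice):
    """Show the "Still working..." notice at most every 2 minutes."""
    return seconds_since_last_notice >= 120
```

When needs_state_poll returns True, the next step would be fetching the latest state and comparing it with the previous timeline before telling the user what changed.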
Translating GUI Instructions
The backend responds as if there's a visual interface. Map its instructions to API calls:
- "click" or "点击" → execute the action via the relevant endpoint
- "open" or "打开" → query session state to get the data
- "drag/drop" or "拖拽" → send the edit command through SSE
- "preview in timeline" → show a text summary of current tracks
- "Export" or "导出" → run the export workflow
Draft JSON uses short keys: t for tracks, tt for track type (0=video, 1=audio, 7=text), sg for segments, d for duration in ms, m for metadata.
Example timeline summary:
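Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)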
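The short keys described above can be turned into a text summary roughly as follows. The nesting is an assumption (a list of tracks under t, each with tt and a list of segments under sg, each with d in milliseconds), and summarize_draft is a hypothetical helper; the metadata key m is ignored because its contents are not documented here.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft):
    """Render one line per track: type, segment count, total duration."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        label = TRACK_TYPES.get(track.get("tt"), "Unknown")
        segments = track.get("sg", [])
        # Durations are stored in milliseconds; report seconds.
        total_s = sum(seg.get("d", 0) for seg in segments) / 1000
        lines.append(f"{index}. {label}: {len(segments)} segment(s), {total_s:g}s")
    return "\n".join(lines)
```

Summing segment durations gives only the total length of each track; start offsets, clip names, and volume levels would have to come from fields this sketch does not model.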
Quick Start Guide
First time? Just upload a text prompt and describe what you need. I'll run it through NemoVideo's backend and hand you back a 1080p MP4.
Processing takes about 1-2 minutes depending on video length. You start with 100 free credits — most edits cost 1-3.
Tips and Tricks
Keep your source files under 500MB for fastest processing. If you're working with longer content, split it into chunks first.
For best results at 1080p, make sure your input is at least 720p. Upscaling from 480p works but you'll notice it.
Export as MP4 for widest compatibility across social platforms and devices.
Best Practices
Use source material in TXT, DOCX, PDF, or SRT format for best compatibility. 1080p input gives the cleanest results, but 720p works fine too.
Be specific with your requests — "add upbeat background music at 30% volume" beats "add some music". The AI works better with concrete details.