free-video-maker
## Getting Started
> Welcome! Ready to turn your clips, photos, or ideas into a real video — completely free? Tell me what you're making and share your assets or concept, and let's build something worth watching together.
**Try saying:**
- "I have 12 vacation photos and a song I love — help me make a 60-second slideshow video with smooth transitions and text overlays."
- "Create a script and shot-by-shot breakdown for a 30-second promotional video for my bakery using only footage I can shoot on my phone."
- "I recorded 5 short screen capture clips for a software tutorial — help me structure them into a cohesive how-to video with intro text and chapter titles."
### Quick Start Setup
This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").
**Token check**: Look for `NEMO_TOKEN` in the environment. If found, skip to session creation. Otherwise:
- Generate a UUID as client identifier
- POST `https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token` with `X-Client-Id` header
- Extract `data.token` from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry)
**Session**: POST `https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent` with Bearer auth and body `{"task_name":"project"}`. Keep the returned `session_id` for all operations.
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
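The setup steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the helper names (`connect`, `send_json`, `find_session_id`) are hypothetical, and the location of `session_id` in the response is an assumption, so the sketch checks both the top level and a `data` wrapper. Attribution headers from the API reference are omitted for brevity.

```python
import json
import os
import urllib.request
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def anonymous_token_request(client_id):
    """Build the POST to /api/auth/anonymous-token with the X-Client-Id header."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def session_request(token):
    """Build the session-creation POST with Bearer auth."""
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=json.dumps({"task_name": "project"}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_json(request):
    """Send a request and decode the JSON body (the only networked helper)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def find_session_id(payload):
    """Assumption: session_id sits at the top level or under "data"."""
    if "session_id" in payload:
        return payload["session_id"]
    return payload["data"]["session_id"]


def connect(send=send_json, environ=os.environ):
    """Return (token, session_id), requesting an anonymous token if needed."""
    token = environ.get("NEMO_TOKEN")
    if not token:
        client_id = str(uuid.uuid4())  # client identifier for X-Client-Id
        token = send(anonymous_token_request(client_id))["data"]["token"]
    session_id = find_session_id(send(session_request(token)))
    return token, session_id
```

Calling `connect()` with no arguments would perform the real network calls. Passing a fake `send` function makes the flow easy to exercise offline.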
# Turn Raw Footage Into Finished Videos Instantly
Making a video used to mean downloading software, wrestling with timelines, and burning hours on export settings. This free-video-maker skill cuts all of that out. Describe what you want — a product promo, a birthday slideshow, a how-to tutorial — and it builds the structure, suggests pacing, and helps you produce something that actually looks intentional.
Whether you're working with a handful of phone photos or a folder of screen recordings, the skill helps you sequence content logically, write captions that match your tone, and choose background music that fits the mood. It handles the decisions that usually slow creators down, so you spend your time on the message rather than the mechanics.
This is built for people who aren't professional editors but still need professional-looking output. Social posts, YouTube intros, event recaps, classroom projects — the free-video-maker skill adapts to your goal and guides you through every step without requiring any prior editing experience.
## Routing Your Video Requests
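When you describe a video project, whether it's a slideshow from photos, a trimmed clip, or a text-animated reel, your request is matched against the trigger phrases below and routed to the corresponding operation. Export, credits, status, and upload requests call their endpoints directly and skip the SSE stream; everything else is sent to the backend as an SSE message.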
| User says... | Action | Skip SSE? |
|-------------|--------|----------|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
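One possible way to implement the routing table is a simple keyword lookup. This is a sketch under assumptions: the function names and action labels are hypothetical, and plain substring matching is a deliberate simplification (for example, a request to "add end credits" would be misrouted to the balance check), so a real router would likely need more context.

```python
# Trigger phrases copied from the routing table, checked in table order.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message, has_attachment=False):
    """Return the action label for a user message; "sse" is the fallback."""
    if has_attachment:
        return "upload"  # the table routes any sent file to upload
    text = message.casefold()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action
    return "sse"


def skips_sse(action):
    """Only the fallback route goes through the SSE stream."""
    return action != "sse"
```

For instance, `route("Please export my video")` yields `"export"`, while `route("add some BGM")` falls through to `"sse"`.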
## Cloud Rendering API Reference
Free Video Maker runs on a cloud-based rendering backend that processes your photos, clips, and text overlays through distributed encoding pipelines, returning a downloadable video URL once the job completes. Rendering times vary by project length, resolution, and transition complexity.
Skill attribution — read from this file's YAML frontmatter at runtime:
- `X-Skill-Source`: `free-video-maker`
- `X-Skill-Version`: from frontmatter `version`
- `X-Skill-Platform`: detect from install path (`~/.clawhub/` → `clawhub`, `~/.cursor/skills/` → `cursor`, else `unknown`)
**All requests** must include: `Authorization: Bearer <NEMO_TOKEN>`, `X-Skill-Source`, `X-Skill-Version`, `X-Skill-Platform`. Missing attribution headers will cause export to fail with 402.
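The header rules above could be assembled along these lines. This is a minimal sketch: the function names are hypothetical, and the frontmatter scan only handles a flat `version:` key between the opening and closing `---` lines rather than parsing full YAML.

```python
def detect_platform(install_path):
    """Map the install path to a platform name, following the rules above."""
    normalized = install_path.replace("\\", "/") + "/"
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def frontmatter_version(skill_text, default="unknown"):
    """Read `version` from a leading YAML frontmatter block, if present."""
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return default
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "version":
            return value.strip().strip("\"'") or default
    return default


def build_headers(token, skill_text, install_path):
    """Headers that every request is expected to carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "free-video-maker",
        "X-Skill-Version": frontmatter_version(skill_text),
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send a request that is missing one of the attribution fields.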
**API base**: `https://mega-api-prod.nemovideo.ai`
**Create session**: POST `/api/tasks/me/with-session/nemo_agent` — body `{"task_name":"project","language":"<lang>"}` — returns `task_id`, `session_id`.
**Send message (SSE)**: POST `/run_sse` — body `{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}` with `Accept: text/event-stream`. Max timeout: 15 minutes.
**Upload**: POST `/api/upload-video/nemo_agent/me/<sid>` — file: multipart `-F "files=@/path"`, or URL: `{"urls":["<url>"],"source_type":"url"}`
**Credits**: GET `/api/credits/balance/simple` — returns `available`, `frozen`, `total`
**Session state**: GET `/api/state/nemo_agent/me/<sid>/latest` — key fields: `data.state.draft`, `data.state.video_infos`, `data.state.generated_media`
**Export** (free, no credits): POST `/api/render/proxy/lambda` — body `{"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}`. Poll GET `/api/render/proxy/lambda/<id>` every 30s until `status` = `completed`. Download URL at `output.url`.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
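As an illustration of the export flow, the sketch below builds the render body and polls for completion. Several details are assumptions rather than documented behavior: `<ts>` is treated as a millisecond timestamp, `status` and `output.url` are read from the top level of the poll response, and the `max_polls` cap is hypothetical. The `fetch_status` callable stands in for the GET to `/api/render/proxy/lambda/<id>`.

```python
import time


def render_job(session_id, draft, timestamp=None):
    """Build the export request body described above."""
    if timestamp is None:
        timestamp = int(time.time() * 1000)  # assumption: <ts> is in ms
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(fetch_status, render_id, interval_s=30, max_polls=30,
                    sleep=time.sleep):
    """Poll until status is "completed", then return the download URL."""
    for attempt in range(max_polls):
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait 30s between polls by default
    raise TimeoutError(f"render {render_id} did not complete")
```

Injecting `fetch_status` and `sleep` keeps the polling logic testable without network access or real delays.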
### SSE Event Handling
| Event | Action |
|-------|--------|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| `heartbeat` / empty `data:` | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
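The event handling above might look something like this in practice. The payload shape is an assumption: the sketch supposes each `data:` line carries JSON with text under `content.parts[].text`, mirroring the request body, which the text does not confirm. `fetch_state` is a hypothetical callable standing in for the session state GET.

```python
import json


def collect_stream_text(lines):
    """Join the text parts found in an iterable of raw SSE lines."""
    texts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # event names, comments, and blank separators
        payload = line[len("data:"):].strip()
        if not payload or payload == "heartbeat":
            continue  # heartbeat or empty data: keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not JSON, nothing to show the user
        content = event.get("content") if isinstance(event, dict) else None
        if not isinstance(content, dict):
            continue
        for part in content.get("parts") or []:
            # Tool calls and results carry no "text" key and are skipped.
            if isinstance(part, dict) and part.get("text"):
                texts.append(part["text"])
    return "".join(texts)


def handle_stream(lines, fetch_state):
    """Use streamed text if any arrived, otherwise fall back to session state."""
    text = collect_stream_text(lines)
    if text:
        return {"source": "stream", "text": text}
    return {"source": "state", "state": fetch_state()}
```

The fallback branch covers the silent-edit case: when the stream closes without text, the caller gets the latest state to summarize instead.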
### Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|-------------|--------|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
**Draft field mapping**: `t` = tracks, `tt` = track type (0 = video, 1 = audio, 7 = text), `sg` = segments, `d` = duration (ms), `m` = metadata. For a timeline preview, present a track summary such as:
```
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
```
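A track summary could be derived from the draft roughly as shown here. The nesting is an assumption: the sketch supposes `t` is a list of tracks, each with a `tt` code and an `sg` list whose items carry `d` in milliseconds, and it treats a track's length as the sum of its segment durations. It does not try to read names or volume from `m`, since that structure is not described.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft):
    """Render a short text summary of the tracks in a draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        code = track.get("tt")
        kind = TRACK_TYPES.get(code, f"Type {code}")
        segments = track.get("sg", [])
        # Assumption: segment durations add up to the track length.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s"
        )
    return "\n".join(lines)
```

Unknown track type codes are shown as `Type <n>` rather than dropped, so unexpected tracks remain visible in the summary.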
### Error Handling
| Code | Meaning | Action |
|------|---------|--------|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with `?bind=<id>` (get `<id>` from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
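The error table lends itself to a lookup, sketched below. The action labels and function names are hypothetical, and the text does not say whether a given code arrives as an HTTP status or in the response body, so the sketch takes the code as a plain integer. The registration URL is not given in this document and is passed in as a parameter.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_new_session",
    2001: "handle_no_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def action_for(code):
    """Look up the action label for a code; unknown codes get a fallback."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def no_credits_message(is_anonymous, registration_url=None, bind_id=None):
    """Message for code 2001, which differs for anonymous and registered users."""
    if is_anonymous:
        return f"Register to get more credits: {registration_url}?bind={bind_id}"
    return "Top up credits in your account"
```

Keeping 402 separate from 2001 reflects the table's point that blocked export is a plan issue, not a credit shortage.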
## Performance Notes
The free-video-maker skill performs best when you give it clear context upfront — intended platform (Instagram Reels, YouTube, TikTok, presentation), approximate target length, and the mood or tone you're going for. Vague prompts like 'make a video' will produce generic structures, while specific ones like 'a 45-second energetic product reveal for TikTok targeting Gen Z' produce tight, usable results.
For photo-based slideshows, providing the number of images and any preferred order helps the skill pace transitions accurately. For footage-based projects, describing each clip briefly (even just 'clip 1: person walking into store, 5 seconds') allows the skill to build a proper edit sequence rather than guessing at content.
Export format suggestions are optimized for common platforms by default. If you have a specific resolution, aspect ratio, or file format requirement, mention it early so recommendations stay aligned throughout the session.
## Troubleshooting
If the generated video structure feels off-paced or too long, try specifying a hard time cap in your prompt (e.g., 'keep it under 90 seconds'). The skill defaults to completeness over brevity, so setting a limit forces tighter editing decisions.
If captions or text overlays don't match your brand voice, share a few examples of your existing content or describe your tone explicitly — 'casual and funny' versus 'formal and informative' produces noticeably different caption styles.
For music sync issues where the beat doesn't feel matched to visual cuts, ask the skill to generate a 'cut list timed to BPM' and provide the song's tempo if you know it. This gives the edit a rhythmic backbone.
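To make the arithmetic behind a BPM-timed cut list concrete: one beat lasts 60 / BPM seconds, so cutting every four beats at 120 BPM means a cut every 2 seconds. The helper below is a hypothetical illustration of that calculation, not what the skill runs internally; `beats_per_cut` is an assumed parameter.

```python
def cut_list(bpm, duration_s, beats_per_cut=4):
    """Return cut times in seconds, one every `beats_per_cut` beats."""
    if bpm <= 0 or beats_per_cut <= 0:
        raise ValueError("bpm and beats_per_cut must be positive")
    interval = 60.0 / bpm * beats_per_cut  # seconds between cuts
    cuts = []
    n = 1
    while n * interval < duration_s:
        cuts.append(round(n * interval, 3))
        n += 1
    return cuts
```

For a 10-second clip at 120 BPM, `cut_list(120, 10)` gives cuts at 2, 4, 6, and 8 seconds.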
If you're getting output that feels too templated, try rephrasing your request as a story rather than a task — describe the viewer's experience from start to finish, and the skill will generate a more narrative-driven structure.