topview-skill

# Topview AI Skill > Modular Python toolkit for the [Topview AI](https://www.topview.ai) API. ✨ **Generate. Edit. Collaborate. — All in One Place.** ✨ - 🧠 **All Mainstream Models**: Seamlessly access the world's top-tier AI models for video, image, and voice in one toolkit. - 🗣️ **Describe to Create**: Just tell the agent what you want. From talking avatars to product composites, your prompts generate the exact output. - ⚡ **Zero Manual Ops**: No manual uploads, no tedious tweaking. Everything is automated straight to your shared board. ## Execution Rule > **Always use the Python scripts in `scripts/`. Do NOT use `curl` or direct HTTP calls.** ## User-Facing Reply Rules > **⚠️ HIGHEST PRIORITY — every user-facing reply MUST follow ALL rules below.** > > Most users are non-technical. Many chat from Feishu, WeChat, or similar apps and **cannot** see local browser popups or terminals. 1. **Keep replies short** — give the result or next step directly. If one sentence is enough, don't write three. 2. **Use plain language** — no API jargon, no terminal references, no mentions of environment variables, polling, JSON, scripts, or "auth flow". Speak as if the user has never seen a command line. 3. **Never mention terminal details** — do not reference command output, logs, exit codes, file paths, config files, or any technical internals. These mean nothing to the user. 4. **Never ask the user to operate a browser popup** — the user cannot see the agent's machine screen. When login is needed, the **only** correct action is to send the authorization link directly in the chat. 5. **Always send the direct login link** — extract `URL: ...` from `auth.py login` output and use the login template below. Never say "browser opened" or similar. **If the URL is not found in the output, re-run `auth.py login` to get a new link. Never skip sending the link.** 6. **Wait for user confirmation after login** — ask the user to reply "好了" / "done", then continue the task. 7. **Explain errors simply** — if a task fails, tell the user in one sentence what happened and ask if they want to retry. Never paste error messages or technical details. 8. **Be result-oriented** — after task completion, give the user the result (link, image, video) directly. Do not describe intermediate steps. 9. **Always take the user's perspective** — the user can only see the chat conversation, nothing else. Anything requiring user action (links, confirmations) must appear in the chat. 10. **Do not tell the user to register separately** — the authorization page includes both login and sign-up. New users can register directly on that page. Never say "go to topview.ai to register first". 11. **Act directly, don't ask which method** — when login is needed, just run `auth.py login` and send the link. Don't ask "which method do you prefer?" or present multiple options. The user asked you to do something — login is just an intermediate step, handle it. 12. **Give time estimates for generation tasks** — after submitting a task, tell the user the estimated wait time so they know what to expect. Use the estimates from the "Estimated Generation Time" table below. **Estimated Generation Time** > Tell the user the estimated wait time after submitting a task. Match the user's language. | Task Type | Model | Estimated Time | |-----------|-------|---------------| | Video | Standard / Fast (Seedance 2.0) | ~5–10 min | | Video | All other video models (Kling, Sora, Veo, Vidu, etc.) | ~3–5 min | | Image | GPT Image 1.5 | ~1 min | | Image | All other image models (Nano Banana, Seedream, Imagen, Kontext, Grok, etc.) | ~30s–1 min | | Avatar | avatar4 | ~2–5 min (depends on script length) | | TTS | text2voice | ~10–30s | | Remove BG | remove_bg | ~10–30s | | Product Avatar | product_avatar | ~1–2 min | Example messages after submitting: - Chinese: "已经开始生成了，视频大约需要 5-10 分钟，请稍等~" - English: "Generation started — the video will take roughly 5–10 minutes. I'll send it to you as soon as it's ready." **Required login message template** Replace `<LOGIN_URL>` with the actual link. Follow the user's language (Chinese template for Chinese users, English for English users). Send the login link in Markdown-friendly plain text format: `点击登录 (<LOGIN_URL>)` for Chinese, `Click to sign in (<LOGIN_URL>)` for English. 中文模板： ```text 安装完成，Topview Skill 已连接到你的智能助手。点击下方登录链接，登录后将解锁以下能力：点击登录 (<LOGIN_URL>) 🎬 视频生成文字转视频、图片转视频、参考视频生成，自动配音配乐。视频模型：Seedance 2.0 · Sora 2 · Kling 3 · Veo 3.1 · Vidu Q3 · wan2.7 🖼️ AI 图片生成与编辑文字生图、AI 修图、风格转换，最高支持 4K。图片模型：Nano Banana 2 · Seedream 5.0 · GPT Image 1.5 · Imagen 4 · Kontext-Pro · Grok Image 🧑‍💼 口播数字人上传一张照片 + 文案，自动生成真人口播视频，支持多语种。 ✂️ 背景移除一键抠图，产品图、人像、任意图片秒去背景。 👗 产品模特图把你的产品图放到模特身上，自动生成带货展示图。 🎙️ 语音与配音文字转语音、声音克隆，支持多语种配音输出。登录完成后回我一句"好了"，我马上继续。 ``` English template: ```text Installation complete. Topview Skill is now connected to your agent. Click the sign-in link below. After signing in, the following capabilities will be unlocked. Click to sign in (<LOGIN_URL>) 🎬 Video Generation Text-to-video, image-to-video, reference-based generation with auto sound & music. Models: Seedance 2.0 · Sora 2 · Kling 3 · Veo 3.1 · Vidu Q3 · wan2.7 🖼️ AI Image Generation & Editing Text-to-image, AI retouching, style transfer — up to 4K resolution. Models: Nano Banana 2 · Seedream 5.0 · GPT Image 1.5 · Imagen 4 · Kontext-Pro · Grok Image 🧑‍💼 Talking Avatar Upload a photo + script to auto-generate presenter-style talking head videos. ✂️ Background Removal One-click cutout for product shots, portraits, and any image. 👗 Product Model Shots Place your product onto model templates for e-commerce showcase images. 🎙️ Voice & TTS Text-to-speech, voice cloning, multilingual dubbing and narration. Once you've signed in, just reply "done" and I'll continue right away. ``` **Banned phrases (including any variations):** - "Browser has opened" / "browser popped up" - "Run this in the terminal" / "run the login command" - "Check the popup" / "look at the browser" - "Set the environment variable" - "Command executed successfully" - "Polling task status" - "Script output is as follows" - "Go operate on that computer" / "check the robot's computer" - "Authorization page popped up" / "if the page appeared" - "Go to topview.ai to register first" — auth page has built-in registration - "Which method do you prefer?" / "two options for you" — don't give choices, just act - "Auth flow" / "perform authentication" / "complete authentication" — too technical - "Python config" / "environment setup" — user doesn't need to know - Anything asking the user to operate outside the chat window - Anything containing code, commands, or file paths **Fallback when login URL is not captured:** > If `auth.py login` output does not contain a `URL:` line (e.g. background execution missed the output), **re-run `auth.py login`** to get a fresh link. > **NEVER** fall back to telling the user to "check the browser popup" or "go operate on the agent's computer". The user cannot see it. ## Prerequisites - **Python 3.8+** - Authenticated — see [references/auth.md](references/auth.md) for the direct-link login flow - Credits available — see [references/user.md](references/user.md) to check balance - Env vars `TOPVIEW_UID` + `TOPVIEW_API_KEY` are handled automatically after login; manual setup is only for CI/internal use ```bash pip install -r {baseDir}/scripts/requirements.txt ``` ## Agent Workflow Rules > **These rules apply to ALL generation modules (avatar4, video_gen, ai_image, remove_bg, product_avatar, text2voice).** 1. **Always start with `run`** — it submits the task and polls automatically until done. This is the default and correct choice in almost all situations. 2. **Do NOT ask the user to check the task status themselves.** The agent is responsible for polling until the task completes or the timeout is reached. 3. **Only use `query`** when `run` has already timed out and you have a `taskId` to resume, or when the user explicitly provides an existing `taskId`. 4. **`query` polls continuously** — it keeps checking every `--interval` seconds until status is `success` or `fail`, or `--timeout` expires. It does not stop after one check. 5. **If `query` also times out** (exit code 2), increase `--timeout` and try again with the same `taskId`. Do not resubmit unless the task has actually failed. ``` Decision tree: → New request? use `run` → run timed out? use `query --task-id <id>` → query timed out? use `query --task-id <id> --timeout 1200` → task status=fail? resubmit with `run` ``` **Task Status:** | Status | Description | |--------|-------------| | `init` | Task is queued, waiting to be processed | | `running` | Task is actively being processed | | `success` | Task completed successfully | | `fail` | Task failed | ## Board ID Protocol > **Every generation task should include a `--board-id` so results are organized and viewable on the web.** 1. **Session start** — before submitting the first task, run `board.py list --default -q` to get the default board ID ("My First Board"). Only need to do this once per session. 2. **Pass to all tasks** — add `--board-id <id>` to every generation command (`avatar4.py`, `video_gen.py`, `ai_image.py`, `product_avatar.py`, `text2voice.py`). 3. **After completion** — if the task result contains a `boardTaskId`, show the user the edit link: `https://www.topview.ai/board/{boardId}?boardResultId={boardTaskId}`. Tell the user they can view and edit the result via this link. 4. **User wants a new board** — run `board.py create --name "..."` and use the returned board ID for subsequent tasks. 5. **User specifies a board** — use the user-provided board ID instead of the default. 6. **Forgot the board ID?** — run `board.py list --default -q` again. ``` Session flow: 1. BOARD_ID = $(board.py list --default -q) 2. avatar4.py run --board-id $BOARD_ID ... 3. video_gen.py run --board-id $BOARD_ID ... 4. (result shows edit link with boardTaskId) ``` ## Modules | Module | Script | Reference | Description | |--------|--------|-----------|-------------| | Auth | `scripts/auth.py` | [auth.md](references/auth.md) | OAuth 2.0 Device Flow — generate login link, wait for authorization, save credentials | | Avatar4 | `scripts/avatar4.py` | [avatar4.md](references/avatar4.md) | Talking avatar videos from a photo; `list-captions` for caption styles | | Video Gen | `scripts/video_gen.py` | [video_gen.md](references/video_gen.md) | Image-to-video, text-to-video, omni reference(video generation from reference video, image, audio and text) | | AI Image | `scripts/ai_image.py` | [ai_image.md](references/ai_image.md) | Text-to-image and AI image editing (10+ models) | | Remove BG | `scripts/remove_bg.py` | [remove_bg.md](references/remove_bg.md) | Remove image background — step 1 of Product Avatar flow | | Product Avatar | `scripts/product_avatar.py` | [product_avatar.md](references/product_avatar.md) | Model showcase product image; `list-avatars`/`list-categories` for template browsing | | Text2Voice | `scripts/text2voice.py` | [text2voice.md](references/text2voice.md) | Text-to-speech audio generation | | Voice | `scripts/voice.py` | [voice.md](references/voice.md) | Voice list/search, voice cloning, delete custom voices | | Board | `scripts/board.py` | [board.md](references/board.md) | Board management — organize results, view/edit on web | | User | `scripts/user.py` | [user.md](references/user.md) | Credit balance and usage history | > **Read individual reference docs for usage, options, and code examples.** > Local files (image/audio/video) are auto-uploaded when passed as arguments — no manual upload step needed. --- ## Creative Guide > **Core Principle:** Start from the user's intent, not from the API. > Analyze what the user wants to achieve, then pick the right tool, model, and parameters. ### Step 1 — Intent Analysis Every time a user requests content, identify: | Dimension | Ask Yourself | Fallback | |-----------|-------------|----------| | **Output Type** | Image? Video? Audio? Composite? | Must ask | | **Purpose** | Marketing? Education? Social media? Personal? | General social media | | **Source Material** | What does the user have? What's missing? | Must ask | | **Style / Tone** | Professional? Casual? Playful? Authoritative? | Professional & friendly | | **Duration** | How long should the output be? | 5–15s for clips, 30–60s for avatar | | **Language** | What language? Need captions? | Match user's language | | **Channel** | Where will it be published? | General purpose | ### Step 2 — Tool Selection ``` What does the user need? │ ├─ A person speaking to camera (talking head)? │ → avatar4 or video_gen with native-audio models │ ├─ An image animated into a video clip? │ → video_gen --type i2v │ ├─ A video generated purely from text? │ → video_gen --type t2v │ ├─ A new video based on reference materials (style transfer, editing)? │ → video_gen --type omni │ ├─ An image generated from a text prompt? │ → ai_image --type text2image │ ├─ An existing image edited / modified with AI? │ → ai_image --type image_edit │ ├─ Remove background from an image (e.g. product cutout)? │ → remove_bg │ ├─ A product placed into a model/avatar scene? │ → product_avatar (use remove_bg first if product has background) │ → product_avatar list-avatars to browse public templates │ ├─ Browse available caption styles for avatar videos? │ → avatar4 list-captions │ ├─ Text converted to speech audio? │ → text2voice │ ├─ Need to find a voice / list available voices? │ → voice list │ ├─ Clone a custom voice from audio sample? │ → voice clone │ ├─ Delete a custom voice? │ → voice delete │ ├─ Manage boards / view results on web? │ → board (list, create, detail, tasks) │ ├─ A combination (e.g., talking head + product clips)? │ → Use a recipe (see Step 3) │ └─ Outside current capabilities? → See Capability Boundaries below ``` **Quick-reference routing table:** | User says... | Script & Type | |----------------------------------------------|-------------| | "Make a talking avatar video with this photo and text" | `avatar4.py` (pass local image path directly) | | "Generate a video with this photo and my audio recording" | `avatar4.py` (pass local image + audio paths) | | "Animate this image / image-to-video" | `video_gen.py --type i2v` (pass local image path) | | "Generate a video about..." | `video_gen.py --type t2v` | | "Generate a new video referencing this image's style" | `video_gen.py --type omni` | | "Generate an image / text-to-image" | `ai_image.py --type text2image` | | "Modify this image / change background" | `ai_image.py --type image_edit` | | "Remove image background / cutout" | `remove_bg.py` | | "Put this product on a model image" | `product_avatar.py` (use `remove_bg.py` first if product has background) | | "What product avatar/model templates are available?" | `product_avatar.py list-avatars` | | "What caption styles are available?" | `avatar4.py list-captions` | | "Convert this text to speech / audio" | `text2voice.py` | | "What voices are available? / Find a female voice" | `voice.py list --gender female` | | "Clone a voice from this audio recording" | `voice.py clone --audio <file>` | | "Delete this custom voice" | `voice.py delete --voice-id <id>` | | "View my board / check what was generated" | `board.py list` or `board.py tasks --board-id <id>` | | "Create a new board" | `board.py create --name "..."` | | "Check how many credits I have left" | `user.py credit` | > **Video model selection** — see [references/video_gen.md](references/video_gen.md) § Model Recommendation. > **Image model tip:** For all image tasks, default to **Nano Banana 2** — strongest all-round model with best quality, 14 aspect ratios, up to 4K, and 14 reference images for editing. See [references/ai_image.md](references/ai_image.md) § Model Recommendation. > **Product Avatar workflow:** For best results, use the 2-step flow: `remove_bg.py` to get a `bgRemovedImageFileId`, then `product_avatar.py` with `--product-image-no-bg`. Use `product_avatar.py list-avatars` to browse public templates and get an `avatarId`. See [references/product_avatar.md](references/product_avatar.md) § Full Workflow. > **Caption styles for avatar4:** Use `avatar4.py list-captions` to discover available caption styles, then pass the `captionId` via `--caption`. > **Talking-head tip — avatar4 vs video_gen with native audio:** > Some video_gen models (e.g. Standard, Kling V3, Veo 3.1) support native audio and can produce talking-head videos with **better visual quality** than avatar4. However, they have **shorter max duration** (5–15s) and are **significantly more expensive**. Avatar4 supports up to 120s per segment at much lower cost. > **Rule of thumb:** Default to avatar4 for most talking-head needs. Consider video_gen native-audio models only when the clip is short (<=15s) and the user explicitly prioritizes top-tier visual quality over cost. ### Step 3 — Simple vs Complex **Simple requests** — the user's need is clear, materials are ready → handle directly from the reference docs. **Complex requests** — the user gives a *goal* (e.g., "make a promo video", "explain how AI works") rather than a direct API instruction. Follow this universal workflow: 1. **Deconstruct & Clarify:** Ask the user for the target audience, core message, intended duration, and what assets they currently have (photos, scripts). 2. **Determine the Route:** - *Has a person's photo + needs narration* → Use `avatar4` (Talking Head). - *Has a product/reference photo* → Use `video_gen --type i2v` or `omni`. - *No assets, purely visual concept* → Use `video_gen --type t2v`. - *Requires both* → Plan a Hybrid approach (Avatar narration + B-roll inserts). 3. **Structure the Content:** - Write a structured script (Hook → Body/Explanation → Call to Action). - Add `<break time="0.5s"/>` tags to TTS scripts for natural pacing. - For visuals, write detailed prompts covering Subject + Action + Lighting + Camera. 4. **Handle Long-Form (>120s):** If the script exceeds the 120s limit for a single `avatar4` task, split it into logical segments (e.g., 60s each) at natural sentence boundaries. Submit tasks in parallel using the `submit` command, ensure parameters (voice/model) remain locked across segments, and deliver them in order. --- ## Pre-Execution Protocol > Follow this before EVERY generation task. 1. **Estimate cost** — use `video_gen.py estimate-cost` for video tasks, `ai_image.py estimate-cost` for image tasks; avatar4 costs depend on video length; product_avatar is fixed 0.5 credits; text2voice is fixed 0.1 credits 2. **Validate parameters** — ensure model, aspect ratio, resolution, and duration are compatible (use `list-models` to check) 3. **Ask about missing key parameters** — if the user has not specified important parameters that affect the output, ask before proceeding. Key parameters by module: - **video_gen**: duration, aspect ratio, model - **ai_image**: aspect ratio, resolution, model, number of images - **avatar4**: (usually determined by input, but confirm voice if not specified) - **text2voice**: voice selection - Do NOT silently pick defaults for these — always confirm with the user. 4. **Confirm before first submission** — before the very first generation task in a session, present the full plan (tool, model, parameters, cost estimate) and ask the user: - Whether to proceed with the generation - Whether they want the agent to ask for confirmation before each subsequent task, or trust the agent to proceed automatically for the rest of the session - These two questions should be combined into a single confirmation message. - If the user chooses "auto-proceed", skip the confirmation step (but still ask about missing parameters) for subsequent tasks in the same session. - If the user explicitly said "just do it" or similar upfront, treat it as auto-proceed from the start. ## Agent Behavior Protocol ### During Execution 1. **Pass local paths directly** — scripts auto-upload local files to S3 before submitting tasks 2. **Parallelize independent steps** — independent generation tasks can run concurrently 3. **Keep consistency across segments** — when generating multiple segments, use identical parameters ### After Execution > **Use the structured result templates below.** The user should see the output link first, then the board link, then key metadata. Keep it clean and scannable. **Video result template:** ```text 🎬 视频已生成完成视频地址：<VIDEO_URL> • 时长：<DURATION> • 画幅：<ASPECT_RATIO> • 模型：<MODEL_NAME> • 消耗：<COST> credits 🔗 项目链接 https://www.topview.ai/board/<BOARD_ID>?boardResultId=<BOARD_TASK_ID> 可在项目中查看、编辑和下载。不满意的话可以告诉我，我帮你调整后重新生成。 ``` **Image result template:** ```text 🖼️ 图片已生成完成图片地址：<IMAGE_URL> • 分辨率：<RESOLUTION> • 模型：<MODEL_NAME> • 消耗：<COST> credits 🔗 项目链接 https://www.topview.ai/board/<BOARD_ID>?boardResultId=<BOARD_TASK_ID> 可在项目中查看、编辑和下载。不满意的话可以告诉我，我帮你调整后重新生成。 ``` **English video result template:** ```text 🎬 Video generated Video: <VIDEO_URL> • Duration: <DURATION> • Aspect ratio: <ASPECT_RATIO> • Model: <MODEL_NAME> • Cost: <COST> credits 🔗 Project link https://www.topview.ai/board/<BOARD_ID>?boardResultId=<BOARD_TASK_ID> View, edit, and download in the project. Not happy with the result? Let me know and I'll adjust and regenerate. ``` **English image result template:** ```text 🖼️ Image generated Image: <IMAGE_URL> • Resolution: <RESOLUTION> • Model: <MODEL_NAME> • Cost: <COST> credits 🔗 Project link https://www.topview.ai/board/<BOARD_ID>?boardResultId=<BOARD_TASK_ID> View, edit, and download in the project. Not happy with the result? Let me know and I'll adjust and regenerate. ``` **Rules:** 1. **Result link first** — always show the video/image URL at the very top. 2. **Board link second** — if `boardTaskId` is available, show the board edit link. 3. **Key metadata only** — duration, aspect ratio/resolution, model, cost. Don't dump raw JSON or extra fields. 4. **Offer iteration** — end with a short note that the user can ask for adjustments. Remind that regeneration costs additional credits. 5. **Multiple outputs** — if the task produced multiple results, number them (1, 2, 3…) each with its own link and metadata. 6. **Match user language** — use the Chinese template for Chinese users, English for English users. ### Error Handling See [references/error_handling.md](references/error_handling.md) for error codes, task-level failures, and recovery decision tree. --- ## Capability Boundaries | Capability | Status | Script | |---------------------------------|--------------|----------------------------------------------------------| | Photo avatar / talking head | Available | `scripts/avatar4.py` | | Caption styles | Available | `scripts/avatar4.py list-captions` | | Credit management | Available | `scripts/user.py` | | Image-to-video (i2v) | Available | `scripts/video_gen.py --type i2v` | | Text-to-video (t2v) | Available | `scripts/video_gen.py --type t2v` | | Omni reference video | Available | `scripts/video_gen.py --type omni` | | Text-to-image | Available | `scripts/ai_image.py --type text2image` | | Image editing | Available | `scripts/ai_image.py --type image_edit` | | Remove background | Available | `scripts/remove_bg.py` | | Product avatar / image replace | Available | `scripts/product_avatar.py` | | Product avatar templates | Available | `scripts/product_avatar.py list-avatars` / `list-categories` | | Text-to-speech (TTS) | Available | `scripts/text2voice.py` | | Voice list / search | Available | `scripts/voice.py list` | | Voice cloning | Available | `scripts/voice.py clone` | | Delete custom voice | Available | `scripts/voice.py delete` | | Board management | Available | `scripts/board.py` | | Board task browsing | Available | `scripts/board.py tasks` / `task-detail` | | Marketing video (m2v) | No module | Suggest [topview.ai](https://www.topview.ai) web UI | > **Never promise capabilities that don't exist as modules.**

topview-skill

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

topview-skill

topview-skill

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement