Music Generator Skill (Full SOP)
Capability Overview
This skill supports the following intents:
1) Generate a full song with lyrics
2) Generate pure background music (BGM)
3) Generate lyrics only (no audio)
4) Query music generation task status
Users can describe the music they want in plain language. The system auto-determines the mode and
handles parameter inference and task tracking.
Triggers and Natural Language Examples
The following natural language requests will trigger this skill:
- Generate a romantic love song
- Write lyrics about the night sky
- Create electronic music suitable for short‑video background
- Check my music generation progress
- I want a cheerful background music track
Execution (SOP Step‑by‑Step)
Preflight Check (Mandatory)
- Read MUSICFUL_API_KEY from the .env file in the skill folder (the path is resolved at runtime from the running script's location).
- If it is not configured (empty or missing), immediately inform the user:
  - "MUSICFUL_API_KEY is not configured. Please visit https://www.musicful.ai/api/authentication/interface-key/ to obtain or purchase an interface key, then write the key into the .env file in the skill folder under MUSICFUL_API_KEY."
- Stop subsequent calls and wait for the user to complete configuration before continuing.
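The preflight check above can be sketched as follows. This is a minimal illustration rather than the skill's actual script: it assumes the variable is spelled MUSICFUL_API_KEY, that the .env file holds plain KEY=VALUE lines, and the helper names (`parse_env`, `load_api_key`, `preflight`) are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Optional

KEY_NAME = "MUSICFUL_API_KEY"  # assumed spelling of the variable name


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path: Path) -> Optional[str]:
    """Return the key from the .env file, else from the environment, else None."""
    file_values: Dict[str, str] = {}
    if env_path.is_file():
        file_values = parse_env(env_path.read_text(encoding="utf-8"))
    key = file_values.get(KEY_NAME) or os.environ.get(KEY_NAME, "")
    return key or None


def preflight(env_path: Path) -> Dict[str, object]:
    """Stop early with a user-facing message when the key is missing or empty."""
    if load_api_key(env_path) is None:
        return {"status": "error", "message": f"{KEY_NAME} is not configured."}
    return {"status": "success", "data": {}}
```

A caller would run `preflight(Path(__file__).resolve().parent / ".env")` before any API call and stop when the returned status is `error`; where the .env file sits relative to the script is also an assumption here.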
The execution flow is intent‑based and incorporates a two‑stage return and a "lyrics‑first" UX:
- Single command entry: /music_generator, with mode branch control:
  - mode=normal (default): generate and show lyrics → submit generation → return preview (status=2) → return final (status=0)
  - mode=bgm: pure music (instrumental=1), no lyrics → preview first → then final
  - mode=lyrics: return lyrics text immediately
- When using the "custom lyrics" path (built into the normal flow or future extensions): submit generation directly and poll (preview first, then final)
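One way to picture the mode branching is a small keyword-based classifier. The sketch below is only an illustration: the hint lists and the function name `infer_mode` are hypothetical, the extra `status` label stands for the task-status intent, and a real implementation may use a language model instead of keywords.

```python
# Hypothetical keyword hints; tune or replace them for real traffic.
STATUS_HINTS = ("progress", "status", "task_id")
BGM_HINTS = ("background", "bgm", "instrumental", "pure music", "accompaniment")
AUDIO_HINTS = ("song", "audio", "music")


def infer_mode(user_text: str) -> str:
    """Map a plain-language request to 'status', 'bgm', 'lyrics' or 'normal'."""
    text = user_text.lower()
    if any(hint in text for hint in STATUS_HINTS):
        return "status"
    if any(hint in text for hint in BGM_HINTS):
        return "bgm"
    # Lyrics-only: the user mentions lyrics but asks for no audio at all.
    if "lyrics" in text and not any(hint in text for hint in AUDIO_HINTS):
        return "lyrics"
    return "normal"
```

Checking the status and BGM hints first keeps a request such as "Check my music generation progress" from falling through to the default song flow.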
Scenario A: Generate a Full Song with Lyrics
Typical user inputs:
- Generate a romantic electronic song
- Write a sad rock song and generate audio
- Here are some lyrics, please use them to create the song: [...user‑provided lyrics...]
Detailed Flow
Step 1: Intent Recognition
- If the user provides complete lyrics, treat the request as lyrics-provided generation.
- Otherwise, assume lyrics need to be generated automatically and then used to synthesize the song.
Step 2: Lyrics Handling
- If lyrics are provided → use them directly.
- If not provided → call the V1 Lyrics API to generate the lyrics content.
Step 3: Submit Music Generation Task
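- POST {BASE_URL}/v1/music/generate
- body: `{ "action": "custom", "lyrics": "<lyrics>", "style": "<inferred from user input>", "mv": "<defaults to the latest high-quality model>" }`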
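A submission request for this step might be assembled as in the sketch below. It uses only the standard library and builds the request without sending it. The body fields (`action`, `lyrics`, `style`, `mv`) follow the request body this document gives for the endpoint; the `x-api-key` header name, the `mv` value passed in, and the function name are assumptions, so check the API reference for the real authentication scheme.

```python
import json
import urllib.request

BASE_URL = "https://api.musicful.ai"  # default base URL named in this document


def build_generate_request(api_key: str, lyrics: str, style: str, mv: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a song with custom lyrics."""
    body = {"action": "custom", "lyrics": lyrics, "style": style, "mv": mv}
    return urllib.request.Request(
        f"{BASE_URL}/v1/music/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed header name, not confirmed by this document
        },
        method="POST",
    )
```

Sending it would be a matter of passing the result to `urllib.request.urlopen` and decoding the JSON response; the shape of that response is not described here, so the sketch stops short of parsing it.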
Step 4: Automatic Task Polling (Two Stages)
- GET {BASE_URL}/v1/music/tasks?ids=<task_id[,task_id2,…]>
- Status semantics (key):
  - status = 2 → preview stage complete (returns audio_url as the preview link)
  - status = 0 → full audio complete (returns audio_url as the downloadable final link)
  - others → processing or failed (use fail_code/fail_reason)
1) On the first status=2: immediately announce "preview is ready" and return audio_url for listening.
2) Continue polling until status=0: then announce "final audio is ready" and return audio_url (download/publish).
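The two-stage polling can be sketched as a generator that emits one event per stage. Several details are assumptions made for illustration: `fetch` is a caller-supplied function returning a list of dicts with `id`, `status`, `audio_url` and optionally `fail_code`/`fail_reason`; a truthy `fail_code` is treated as failure; and the interval and poll limit are arbitrary.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

PREVIEW_READY = 2  # status value for "preview stage complete"
FINAL_READY = 0    # status value for "full audio complete"

Event = Tuple[str, str, Optional[str]]


def poll_two_stage(
    task_ids: Iterable[str],
    fetch: Callable[[List[str]], List[Dict]],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Event]:
    """Yield (task_id, stage, detail); stage is preview, final, failed or timeout."""
    pending = set(task_ids)
    previewed = set()
    for _ in range(max_polls):
        if not pending:
            return
        for task in fetch(sorted(pending)):
            tid, status = task["id"], task["status"]
            if tid not in pending:
                continue  # ignore tasks that already finished
            if status == PREVIEW_READY and tid not in previewed:
                previewed.add(tid)
                yield tid, "preview", task.get("audio_url")
            elif status == FINAL_READY:
                pending.discard(tid)
                yield tid, "final", task.get("audio_url")
            elif task.get("fail_code"):
                pending.discard(tid)
                yield tid, "failed", task.get("fail_reason")
        if pending:
            sleep(interval)
    for tid in sorted(pending):
        yield tid, "timeout", None
```

Because the generator yields as soon as a task reaches status=2, the caller can announce the preview link immediately and keep iterating for the final link.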
Step 5: Return Results to the User (Two‑Stage × Two Songs)
- By default the system generates two songs, that is, two task_ids (e.g., ids=[id1,id2]). For each id, perform the two-stage return independently:
  - Stage 1: when this id reaches status=2 → return the preview link (audio_url)
  - Stage 2: keep polling this id → when status=0 → return the full mp3 download link (audio_url)
- Recommended output format (one block per song):
  - title: <title>
  - prompt: <user's original description>
  - lyrics: <full lyrics in lyrics mode; empty in BGM mode>
  - preview: <preview link (status=2)>
  - full: <final mp3 (status=0)>
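The per-song output block can be rendered with a small formatter like the one below. The function name and the `pending` placeholder shown before a link is available are illustrative choices, not part of the skill's specification.

```python
from typing import Optional


def format_song_block(
    title: str,
    prompt: str,
    lyrics: str = "",
    preview: Optional[str] = None,
    full: Optional[str] = None,
) -> str:
    """Render one result block; links that are not ready yet show as 'pending'."""
    lines = [
        f"title: {title}",
        f"prompt: {prompt}",
        f"lyrics: {lyrics}",
        f"preview: {preview or 'pending'}",
        f"full: {full or 'pending'}",
    ]
    return "\n".join(lines)
```

Calling it once at the preview stage and again at the final stage keeps both returns in the same layout, with only the `full` line changing.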
Scenario B: Generate Pure Background Music
Typical user inputs:
- Generate a piece of pure background music
- I want an electronic instrumental suitable for video background
Detailed Flow
Step 1: Intent Recognition
- Detect semantics such as "pure music", "background music", or "accompaniment" → enter the pure BGM flow.
Step 2: Submit Music Generation Task
- POST {BASE_URL}/v1/music/generate
- body: `{ "action": "auto", "style": "<inferred from user input>", "mv": "<defaults to the latest model>", "instrumental": 1 }`
Step 3: Automatic Task Polling (Two Stages)
- Same as Scenario A: status=2 → preview; status=0 → full
Step 4: Return Preview & Final Links (Two Steps)
- Stage 1: preview link (status=2)
- Stage 2: final link (status=0)
Scenario C: Generate Lyrics Only
User inputs:
- Write lyrics about a summer beach
- I only need lyrics, about a rainy day
Process
- POST {BASE_URL}/v1/lyrics with body: `{ "prompt": "<user description>" }`
- Return: a JSON object whose `lyrics` field holds the generated lyrics text
Scenario D: Query Music Generation Task Status
User inputs:
- Check the progress of task_id=abc123
- See how far my song generation has progressed
Detailed Flow
1. Extract the task_id from the user input.
2. GET {BASE_URL}/v1/music/tasks?ids=<task_id>
3. Return the task status and audio information.
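Extracting the task id and building the status URL could look like the sketch below. The regular expression assumes ids are written as `task_id=<value>` with letters, digits, underscores or hyphens, which is a guess based on the example input above; the function names are hypothetical.

```python
import re
from typing import List
from urllib.parse import urlencode

# Assumed id shape: letters, digits, underscores and hyphens after "task_id=".
TASK_ID_PATTERN = re.compile(r"task_id\s*[=:]\s*([A-Za-z0-9_-]+)")


def extract_task_ids(user_text: str) -> List[str]:
    """Return every task id mentioned in the user's message, in order."""
    return TASK_ID_PATTERN.findall(user_text)


def build_status_url(task_ids: List[str], base_url: str = "https://api.musicful.ai") -> str:
    """Build the task-status URL with a comma-separated ids parameter."""
    query = urlencode({"ids": ",".join(task_ids)}, safe=",")
    return f"{base_url}/v1/music/tasks?{query}"
```

When the message contains no explicit id (for example "See how far my song generation has progressed"), `extract_task_ids` returns an empty list and the caller would fall back to the ids it tracked for the current session.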
Parameter Inference Rules
| Parameter | Source | Default/Notes |
|---|---|---|
| style | Inferred from user input | Defaults to Pop/general if none is given |
| mv | Default high-quality | Prefer the latest high-quality model |
| instrumental | Set to 1 for BGM | Otherwise 0 |
| lyrics | User-provided / auto | - |
| title | Inferred or auto-named | - |
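The inference rules in the table can be sketched as a single function. The style vocabulary, the `DEFAULT_MODEL` placeholder, and the title rule (the first 40 characters of the request) are hypothetical stand-ins chosen for illustration; only the defaults for style and instrumental come from the table.

```python
from typing import Dict, Optional

KNOWN_STYLES = ("rock", "electronic", "ambient", "jazz", "pop")  # hypothetical vocabulary
DEFAULT_MODEL = "<latest high-quality model>"  # placeholder; the real mv value is not given here


def infer_parameters(user_text: str, mode: str = "normal", lyrics: Optional[str] = None) -> Dict[str, object]:
    """Fill request parameters from the user's text, falling back to defaults."""
    text = user_text.lower()
    style = next((s for s in KNOWN_STYLES if s in text), "pop")
    return {
        "style": style,                             # inferred, defaults to pop
        "mv": DEFAULT_MODEL,                        # prefer the latest high-quality model
        "instrumental": 1 if mode == "bgm" else 0,  # 1 only for BGM
        "lyrics": lyrics,                           # user-provided, or filled in later
        "title": user_text.strip()[:40] or "Untitled",
    }
```

For example, `infer_parameters("Write a sad rock song")` picks the `rock` style with `instrumental` set to 0, while passing `mode="bgm"` flips `instrumental` to 1.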
BASE_URL and API Key:
- MUSICFUL_BASE_URL (default: https://api.musicful.ai)
- MUSICFUL_API_KEY (read from the skill folder's .env; the environment variable MUSICFUL_API_KEY is also honored if set)
- Entry points: scripts/musicfulapi.py, CLI: scripts/runmusicful.py / scripts/dispatchmusicgenerator.py
- Important: ensure MUSICFUL_API_KEY is configured before calling. If it is missing, the server may respond with HTTP 500, so check the key first when diagnosing auth/config failures.
Error Handling and Fallback
1. If the request is unclear (e.g., "generate music" without clarifying lyrics vs BGM) → ask a follow-up question.
2. If the API call fails → return a clear failure reason and suggestions.
3. If polling times out → prompt the user to wait or retry.
Unified Return Format
Success:
{ "status": "success", "data": { ... } }
Error:
{ "status": "error", "message": "<reason>" }
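A pair of helpers can keep every return in the unified shape. The sketch assumes the error form carries `status` and `message`, matching the error example in this document; the helper names and the `run_safely` wrapper are hypothetical.

```python
from typing import Any, Callable, Dict


def success(data: Any) -> Dict[str, Any]:
    """Wrap a result in the unified success envelope."""
    return {"status": "success", "data": data}


def error(message: str) -> Dict[str, Any]:
    """Wrap a failure reason in the unified error envelope."""
    return {"status": "error", "message": message}


def run_safely(action: Callable[[], Any]) -> Dict[str, Any]:
    """Run an action and convert any exception into the error envelope."""
    try:
        return success(action())
    except Exception as exc:  # broad on purpose: every failure must reach the user
        return error(str(exc))
```

Wrapping each API call in `run_safely` means a failed request surfaces as a readable message instead of an unhandled traceback.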
Example Dialogues
- User: Generate a sad rock song
  - Skill: Shows generated lyrics → submits job → returns preview link → returns full mp3 link
- User: An ambient BGM for a quiet night
  - Skill: Submits job (instrumental=1) → returns preview → returns full
- User: Write lyrics about the night sky
  - Skill: Returns generated lyrics
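Music Generator Skill (Full Standard Operating Procedure)
Capability Overview
This skill supports the following intents:
1) Generate a full song with lyrics
2) Generate pure background music (BGM)
3) Generate lyrics only (no audio)
4) Query music generation task status
Users can describe the music they want in natural language. The system automatically determines the mode and handles parameter inference and task tracking.
Triggers and Natural Language Examples
The following natural language requests trigger this skill:
- Generate a romantic love song
- Write lyrics about the night sky
- Create electronic music suitable for short-video backgrounds
- Check my music generation progress
- I want a cheerful piece of background music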
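Execution (SOP Step-by-Step)
Preflight Check (Mandatory)
- Read MUSICFUL_API_KEY from the .env file in the skill folder (the path is resolved at runtime from the running script's location).
- If it is not configured (empty or missing), immediately inform the user:
  - "MUSICFUL_API_KEY is not configured. Please visit https://www.musicful.ai/api/authentication/interface-key/ to obtain or purchase an interface key, then write the key into the MUSICFUL_API_KEY field of the .env file in the skill folder."
- Stop subsequent calls and wait for the user to complete configuration before continuing.
The execution flow is intent-based and uses a two-stage return with a lyrics-first user experience:
- Single command entry: /music_generator, with mode branch control:
  - mode=normal (default): generate and show lyrics → submit generation → return preview (status=2) → return final result (status=0)
  - mode=bgm: pure music (instrumental=1), no lyrics → preview first → then final result
  - mode=lyrics: return lyrics text immediately
- When using the custom lyrics path (built into the normal flow or future extensions): submit generation directly and poll (preview first, then final result)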
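Scenario A: Generate a Full Song with Lyrics
Typical user inputs:
- Generate a romantic electronic song
- Write a sad rock song and generate audio
- Here are some lyrics, please use them to create the song: [...user-provided lyrics...]
Detailed Flow
Step 1: Intent Recognition
- If the user provides complete lyrics, treat the request as lyrics-provided generation.
- Otherwise, assume lyrics need to be generated automatically and then used to synthesize the song.
Step 2: Lyrics Handling
- If lyrics are provided → use them directly.
- If not provided → call the V1 Lyrics API to generate the lyrics content.
Step 3: Submit Music Generation Task
- POST {BASE_URL}/v1/music/generate
- body: `{ "action": "custom", "lyrics": "<lyrics>", "style": "<inferred from user input>", "mv": "<defaults to the latest high-quality model>" }`
Step 4: Automatic Task Polling (Two Stages)
- GET {BASE_URL}/v1/music/tasks?ids=<task_id[,task_id2,…]>
- Status semantics (key):
  - status = 2 → preview stage complete (returns audio_url as the preview link)
  - status = 0 → full audio complete (returns audio_url as the downloadable final link)
  - others → processing or failed (use fail_code/fail_reason)
1) On the first status=2: immediately announce that the preview is ready and return audio_url for listening.
2) Continue polling until status=0: then announce that the final audio is ready and return audio_url (download/publish).
Step 5: Return Results to the User (Two-Stage × Two Songs)
- By default the system generates two songs, that is, two task_ids (e.g., ids=[id1,id2]). For each id, perform the two-stage return independently:
  - Stage 1: when this id reaches status=2 → return the preview link (audio_url)
  - Stage 2: keep polling this id → when status=0 → return the full mp3 download link (audio_url)
- Recommended output format (one block per song):
  - title: <title>
  - prompt: <user's original description>
  - lyrics: <full lyrics in lyrics mode; empty in BGM mode>
  - preview: <preview link (status=2)>
  - full: <final mp3 (status=0)>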
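Scenario B: Generate Pure Background Music
Typical user inputs:
- Generate a piece of pure background music
- I want an electronic instrumental suitable for video background
Detailed Flow
Step 1: Intent Recognition
- Detect semantics such as pure music, background music, or accompaniment → enter the pure BGM flow.
Step 2: Submit Music Generation Task
- POST {BASE_URL}/v1/music/generate
- body: `{ "action": "auto", "style": "<inferred from user input>", "mv": "<defaults to the latest model>", "instrumental": 1 }`
Step 3: Automatic Task Polling (Two Stages)
- Same as Scenario A: status=2 → preview; status=0 → full
Step 4: Return Preview and Final Links (Two Steps)
- Stage 1: preview link (status=2)
- Stage 2: final link (status=0)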
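Scenario C: Generate Lyrics Only
User inputs:
- Write lyrics about a summer beach
- I only need lyrics, about a rainy day
Process
- POST {BASE_URL}/v1/lyrics with body: `{ "prompt": "<user description>" }`
- Return: a JSON object whose `lyrics` field holds the generated lyrics text
Scenario D: Query Music Generation Task Status
User inputs:
- Check the progress of task_id=abc123
- See how far my song generation has progressed
Detailed Flow
1. Extract the task_id from the user input.
2. GET {BASE_URL}/v1/music/tasks?ids=<task_id>
3. Return the task status and audio information.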
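Parameter Inference Rules
| Parameter | Source | Default/Notes |
|---|---|---|
| style | Inferred from user input | Defaults to Pop/general style |
| mv | Default high-quality | Prefer the latest high-quality model |
| instrumental | Set to 1 in BGM mode | Otherwise 0 |
| lyrics | User-provided / auto-generated | - |
| title | Inferred or auto-named | - |
BASE_URL and API Key:
- MUSICFUL_BASE_URL (default: https://api.musicful.ai)
- MUSICFUL_API_KEY (read from the skill folder's .env file; the environment variable MUSICFUL_API_KEY is also honored if set)
- Entry points: scripts/musicfulapi.py, CLI: scripts/runmusicful.py / scripts/dispatchmusicgenerator.py
- Important: ensure MUSICFUL_API_KEY is configured before calling. If it is missing, the server may respond with HTTP 500, so check the key first when diagnosing auth/config failures.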
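Error Handling and Fallback
1. If the request is unclear (e.g., "generate music" without saying whether lyrics or BGM is wanted) → ask a follow-up question.
2. If the API call fails → return a clear failure reason and suggestions.
3. If polling times out → prompt the user to wait or retry.
Unified Return Format
Success:
{ "status": "success", "data": { ... } }
Error:
{ "status": "error", "message": "<reason>" }
Example Dialogues
- User: Generate a sad rock song
  - Skill: Shows generated lyrics → submits job → returns preview link → returns full mp3 link
- User: An ambient BGM for a quiet night
  - Skill: Submits job (instrumental=1) → returns preview → returns full
- User: Write lyrics about the night sky
  - Skill: Returns generated lyrics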