Music Analysis (Local, No External APIs)
Primary tool: a full listen that combines snapshot analysis, structure, groove, harmonic tension, temporal mood mapping, and optional Whisper lyric alignment into one report.
1. Full Listen — primary / recommended
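```bash
python3 skills/music-analysis/scripts/listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/listen.py track.mp3 --json
python3 skills/music-analysis/scripts/listen.py track.mp3 --out report.txt
python3 skills/music-analysis/scripts/listen.py track.mp3 --json --out report.json
```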
What it does in one pass:
- Snapshot analysis: tempo, pulse stability, swing proxy, key clarity, harmonic tension, timbre, structure
- Whisper lyric transcription, with filtering applied first: keep only real lyric text and drop artifact tags like `[MUSIC]`
- Temporal listen: windowed energy / mood / tension journey
- Synthesis layer: aligns lyrics with peak / tension / quiet windows and lets the lyric layer override the final vibe when confidence is high
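The lyric filtering step can be illustrated with a minimal sketch. It assumes Whisper segments arrive as dictionaries with `start`, `end`, and `text` keys, and that artifact tags are bracketed tokens such as `[MUSIC]`. The function name and segment shape are hypothetical, and the filter in `listen.py` may work differently.

```python
import re

# Assumption: artifact tags are bracketed tokens such as [MUSIC] or [APPLAUSE].
TAG_PATTERN = re.compile(r"\[[^\]]*\]")

def filter_lyric_segments(segments):
    """Strip bracketed tags and keep only segments that still contain letters."""
    kept = []
    for seg in segments:
        text = TAG_PATTERN.sub("", seg["text"]).strip()
        if any(ch.isalpha() for ch in text):
            kept.append({**seg, "text": text})
    return kept

# Hypothetical segments, for illustration only.
segments = [
    {"start": 0.0, "end": 4.0, "text": "[MUSIC]"},
    {"start": 4.0, "end": 8.0, "text": "Hold on to the night [MUSIC]"},
    {"start": 8.0, "end": 12.0, "text": "  "},
]
print(filter_lyric_segments(segments))
```

Only the middle segment survives here, with its trailing tag removed and its timestamps intact for later alignment.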
Human-readable output structure
- groove/pocket
- structure summary + repeated sections
- harmony (key clarity + tension)
- timbre descriptor tags
- likely instrument palette (strong/likely/possible confidence)
- per-section instrument entrances and exits
- how instruments color the emotional feel
- written as natural language, not clinical data
- opening / middle / closing mood-energy-tension read
- peak / quietest / tensest moments
- mood journey and transition count
- explainable emotion summary based on measured features
- Whisper segment count
- excerpt or graceful skip note
- lyric-energy/tension alignment
- peak / tension / quiet lyric moments
- per-window moments where transitions / lyrics / tension spikes occur
2. Snapshot Analysis — standalone
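```bash
python3 skills/music-analysis/scripts/analyze_music.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/analyze_music.py track.mp3 --json
```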
Reports:
- tempo / pulse stability / pulse confidence / swing proxy / pocket
- key estimate / key clarity / chroma entropy / harmonic change / tonal motion / tension
- timbre descriptors (brightness, richness, low-end, contrast, dynamic range)
- section labels (A/B/C...) and repeated material detection
- explainable emotional read with reasons
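One of the reported features, chroma entropy, can be illustrated with a small sketch. It assumes the measure is the normalized Shannon entropy of a 12-bin pitch-class energy vector, so values near 0 mean energy sits on a few pitch classes and values near 1 mean it is spread evenly. That definition is an assumption made for illustration and may not match what `analyze_music.py` computes.

```python
import math

def chroma_entropy(chroma):
    """Normalized Shannon entropy (0..1) of a pitch-class energy vector."""
    total = sum(chroma)
    if total <= 0:
        return 0.0
    probs = [v / total for v in chroma if v > 0]
    entropy = sum(p * math.log2(1.0 / p) for p in probs)
    # Divide by the maximum possible entropy so the result lands in 0..1.
    return entropy / math.log2(len(chroma))

# Hypothetical chroma vectors, for illustration only.
flat = [1.0] * 12               # energy spread evenly over all pitch classes
focused = [1.0] + [0.0] * 11    # all energy on a single pitch class
triad = [1.0, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 0]  # C, E, G

for name, vec in [("flat", flat), ("focused", focused), ("triad", triad)]:
    print(name, round(chroma_entropy(vec), 3))
```

Under this definition, a lower entropy would go with a clearer tonal centre, which is how it could sit next to key clarity in the report.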
3. Temporal Listen — standalone
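```bash
python3 skills/music-analysis/scripts/temporal_listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/temporal_listen.py track.mp3 --json
```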
Reports:
- sliding-window timeline (4s windows, 2s hops)
- energy contour
- mood labels
- harmonic tension + tonal motion
- transition types (drop hits, pulls back, tightens harmonically, shifts color, evolves)
- narrative arc (mountain / ascending / descending / plateau / wave)
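The 4s window / 2s hop timeline can be sketched as follows. The sketch assumes windows must fit entirely inside the track and uses plain RMS as the energy measure. Both choices, and the function names, are illustrative assumptions, not the behaviour of `temporal_listen.py`.

```python
import math

def window_bounds(duration_s, window_s=4.0, hop_s=2.0):
    """Return (start, end) times for windows that fit fully inside the track."""
    bounds = []
    start = 0.0
    while start + window_s <= duration_s:
        bounds.append((start, start + window_s))
        start += hop_s
    return bounds

def rms(samples):
    """Root-mean-square level of a list of samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def energy_contour(samples, sample_rate, window_s=4.0, hop_s=2.0):
    """Return (window_start_seconds, rms_energy) pairs for each window."""
    duration = len(samples) / sample_rate
    contour = []
    for start, end in window_bounds(duration, window_s, hop_s):
        chunk = samples[int(start * sample_rate):int(end * sample_rate)]
        contour.append((start, rms(chunk)))
    return contour

# Toy signal: 5 s of silence, then 5 s at full level, at 10 samples per second.
toy = [0.0] * 50 + [1.0] * 50
for start, energy in energy_contour(toy, sample_rate=10):
    print(f"{start:4.1f}s  energy={energy:.3f}")
```

Because consecutive windows overlap by half, a sudden change in the audio shows up gradually across two or three windows instead of in a single one.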
Interpretation rules
- Structure labels are similarity labels, not verse/chorus claims.
- Swing proxy is a feel estimate, not drummer-grade microtiming truth.
- Emotion is explainable, derived from pulse + timbre + harmonic tension rather than a black-box mood guess.
- Lyrics can override the final vibe when filtered Whisper text is confident and emotionally clear.
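The lyric override rule can be sketched as a simple gate. The `Read` type, the `final_vibe` function, and the 0.75 threshold are all hypothetical, since the source does not say how confidence is measured or where the cutoff sits.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Read:
    mood: str
    confidence: float  # assumed to be a score in 0..1

def final_vibe(audio_read: Read, lyric_read: Optional[Read],
               min_confidence: float = 0.75) -> Tuple[str, str]:
    """Return (mood, source); lyrics win only when present and confident."""
    if lyric_read is not None and lyric_read.confidence >= min_confidence:
        return lyric_read.mood, "lyrics"
    return audio_read.mood, "audio"

# Hypothetical reads, for illustration only.
audio = Read("uplifting", 0.6)
print(final_vibe(audio, Read("bittersweet", 0.9)))
print(final_vibe(audio, Read("bittersweet", 0.4)))
print(final_vibe(audio, None))
```

Returning the source alongside the mood keeps the result explainable: the report can state whether the final vibe came from the audio features or from the lyrics.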
Audio sourcing
The tool needs a real audio file on disk.
- Direct file: mp3, wav, flac, ogg, m4a, or anything else ffmpeg/librosa can read
- YouTube / supported URLs: `yt-dlp -x --audio-format mp3 -o output.mp3 URLORSEARCH`
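When the download is driven from Python instead of the shell, the same `yt-dlp` invocation can be wrapped as below. The helper names are hypothetical; the flags are the ones shown in the command above.

```python
import shutil
import subprocess

def build_ytdlp_command(url_or_search, output_path="output.mp3"):
    """Build the yt-dlp argument list: extract audio (-x) and convert to mp3."""
    return ["yt-dlp", "-x", "--audio-format", "mp3", "-o", output_path, url_or_search]

def fetch_audio(url_or_search, output_path="output.mp3"):
    """Download audio to disk; raises if yt-dlp is missing or exits non-zero."""
    if shutil.which("yt-dlp") is None:
        raise RuntimeError("yt-dlp is not installed or not on PATH")
    subprocess.run(build_ytdlp_command(url_or_search, output_path), check=True)
    return output_path

# Print the command without running it (the URL is a placeholder).
print(build_ytdlp_command("https://example.com/watch?v=PLACEHOLDER", "track.mp3"))
```

Passing the arguments as a list avoids shell quoting problems when the URL or search string contains spaces or special characters.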
Whisper lyrics transcription
`listen.py` uses:
- CLI: `/opt/homebrew/bin/whisper-cli`
- Model: `~/.local/share/whisper-cpp/ggml-large-v3-turbo.bin`
- Preprocess: convert input to mono 16 kHz WAV via ffmpeg
- Fallback: skip gracefully if Whisper is missing or errors
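The preprocess-then-transcribe flow with a graceful skip can be sketched as follows. The function names and return shape are hypothetical. The `-m` (model) and `-f` (input file) flags are standard whisper.cpp CLI options, but whether `listen.py` invokes the tool this way is an assumption.

```python
import shutil
import subprocess
from pathlib import Path

# Paths as listed above; adjust for your machine.
WHISPER_CLI = "/opt/homebrew/bin/whisper-cli"
WHISPER_MODEL = Path("~/.local/share/whisper-cpp/ggml-large-v3-turbo.bin").expanduser()

def ffmpeg_to_wav_command(src, dst):
    """ffmpeg arguments: -ac 1 = mono, -ar 16000 = 16 kHz, -y = overwrite."""
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", "16000", str(dst)]

def transcribe_or_skip(src, wav_path, whisper_cli=WHISPER_CLI, model=WHISPER_MODEL):
    """Return (transcript, None) on success or (None, reason) when skipped."""
    if shutil.which(whisper_cli) is None or not Path(model).exists():
        return None, "whisper unavailable"
    if shutil.which("ffmpeg") is None:
        return None, "ffmpeg unavailable"
    try:
        subprocess.run(ffmpeg_to_wav_command(src, wav_path),
                       check=True, capture_output=True)
        result = subprocess.run([whisper_cli, "-m", str(model), "-f", str(wav_path)],
                                check=True, capture_output=True, text=True)
    except (subprocess.CalledProcessError, OSError) as exc:
        return None, f"whisper step failed: {exc}"
    return result.stdout, None

print(ffmpeg_to_wav_command("track.mp3", "track_16k.wav"))
```

Returning a reason string instead of raising lets the caller print a skip note in the report and continue with the audio-only analysis.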
Dependencies
Python:
System:
Workspace hygiene
- Keep temporary audio files in a dedicated temp/output folder for the skill.
- Avoid modifying unrelated project files while working on audio analysis tasks.