ai-podcast-pipeline

Create Korean AI podcast packages from QuickView trend notes. Use for dual-host script writing (Callie × Nick), Gemini multi-speaker TTS audio generation, subtitle timing/render fixes, thumbnail+MP4 packaging, and YouTube title/description output. Supports both full (15~20 min) and compressed (5~7 min) editions.

作者: admin | 来源: ClawHub

# AI Podcast Pipeline ## ⚠️ Security Notice This skill may trigger antivirus false positives due to legitimate use of: - **base64 decoding**: Used ONLY to decode audio data from Gemini TTS API responses (standard practice for binary data in JSON) - **subprocess calls**: Used ONLY to invoke ffmpeg for audio/video processing - **Environment variables**: Reads API keys from user-configured environment (`GEMINI_API_KEY`) - **Network requests**: Calls Google Gemini API for text-to-speech generation All code is open source and auditable in this repository. No malicious behavior. Build end-to-end podcast assets from `Trend/QuickView-*` content. ## Core Workflow 1. Select source QuickView file. 2. Generate script (full or compressed mode). 3. Build dual-voice MP3 (Gemini multi-speaker, chunked for reliability). 4. Generate full-text Korean subtitles (no ellipsis truncation). 5. Render subtitle MP4 with tuned font/size/timing shift. 6. Build thumbnail + YouTube metadata. 7. Deliver final package. ## Step 1) Select Source Prefer weekly QuickView file from your configured Quartz root. If user gives `wk.aiee.app` URL, map to local Quartz markdown first. ## Step 2) Generate Script Read and apply: - `references/podcast_prompt_template_ko.md` Modes: - **Full mode**: 15~20 minutes - **Compressed mode**: 5~7 minutes (core tips only) Rules: - no system/meta text in spoken lines - host intro once at opening only - conversational Korean, short sentences, actionable - save script in `archive/` ## Step 3) Build Audio (Gemini Multi-Speaker, Reliable) ### Preferred: chunked builder (timeout-safe) ```bash # Set API key via environment (required) export GEMINI_API_KEY="<YOUR_KEY>" # Run from skills/ai-podcast-pipeline/ python3 scripts/build_dualvoice_audio.py \ --input <script.txt> \ --outdir <outdir> \ --basename podcast_full_dualvoice \ --chunk-lines 6 ``` ### Single-pass (short scripts) ```bash python3 scripts/gemini_multispeaker_tts.py \ --input-file <dialogue.txt> \ --outdir <outdir> \ --basename podcast_dualvoice \ --retries 3 \ --timeout-seconds 120 ``` Default voice mapping (2026-02-10 fixed): - Callie (female) → `Kore` - Nick (male) → `Puck` Output: MP3 (default delivery format) ## Step 4) Build Korean Subtitles (Full Text) Use full-text subtitle builder (no `...` truncation): ```bash python3 scripts/build_korean_srt.py \ --script <script.txt> \ --audio <final.mp3> \ --output <outdir>/podcast.srt \ --max-chars 22 ``` ## Step 5) Render Subtitled MP4 (Font + Timing) Use renderer with adjustable font and timing shift: ```bash python3 scripts/render_subtitled_video.py \ --image <thumbnail.png> \ --audio <final.mp3> \ --srt <podcast.srt> \ --output <outdir>/final.mp4 \ --font-name "Do Hyeon" \ --font-size 27 \ --shift-ms -250 ``` Notes: - `shift-ms` negative = subtitle earlier (for lag fixes) - If text clipping occurs, lower `font-size` (e.g., 25~27) - keep text inside safe area; avoid overlap with character/object ## Step 6) Build Thumbnail + YouTube Metadata ```bash # Set API key via environment (required) export GEMINI_API_KEY="<YOUR_KEY>" python3 scripts/build_podcast_assets.py \ --source "<QuickView path or URL>" ``` Reference (layout/copy guardrails): - `references/thumbnail_guidelines_ko.md` ## Step 7) Final Delivery Checklist Always include: 1. source used 2. final MP3 path 3. subtitle MP4 path + size 4. thumbnail path 5. YouTube title options (3) 6. YouTube description ## Reliability Rules - Gemini timeout on long input: use chunked builder (`build_dualvoice_audio.py`) - Subtitle clipping: reduce font size and increase bottom margin - Subtitle lag: adjust `--shift-ms` (usually `-150` to `-300`) - Keep generated assets under Telegram practical limits ## Security Notes - API keys must be passed via environment variables (`GEMINI_API_KEY`), not hardcoded. - Never paste raw keys into prompts, logs, screenshots, or public posts. - Recent hardening: thumbnail generation now passes keys via env (not CLI args). ## References - `references/podcast_prompt_template_ko.md` - `references/workflow_runbook.md` - `references/thumbnail_guidelines_ko.md`

ai-podcast-pipeline