# AI Podcast Pipeline
## ⚠️ Security Notice
This skill may trigger antivirus false positives due to legitimate use of:
- **base64 decoding**: Used ONLY to decode audio data from Gemini TTS API responses (standard practice for binary data in JSON)
- **subprocess calls**: Used ONLY to invoke ffmpeg for audio/video processing
- **Environment variables**: Reads API keys from user-configured environment (`GEMINI_API_KEY`)
- **Network requests**: Calls Google Gemini API for text-to-speech generation
All code is open source and auditable in this repository. No malicious behavior.
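As an illustration of the base64 point above, here is a minimal sketch of decoding audio bytes carried inside a JSON payload. The field name `audio_b64` and the function name are hypothetical and chosen for the example; the actual Gemini response layout and the repository's own decoding code may differ.

```python
import base64
import json


def decode_inline_audio(response_json: str, field: str = "audio_b64") -> bytes:
    """Return raw audio bytes from a JSON payload.

    `field` is a hypothetical key used for illustration only.
    """
    payload = json.loads(response_json)
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(payload[field], validate=True)
```

The decoded bytes are only ever written to an audio file; nothing in this step executes the decoded data.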
Build end-to-end podcast assets from `Trend/QuickView-*` content.
## Core Workflow
1. Select source QuickView file.
2. Generate script (full or compressed mode).
3. Build dual-voice MP3 (Gemini multi-speaker, chunked for reliability).
4. Generate full-text Korean subtitles (no ellipsis truncation).
5. Render subtitle MP4 with tuned font/size/timing shift.
6. Build thumbnail + YouTube metadata.
7. Deliver final package.
## Step 1) Select Source
Prefer the weekly QuickView file from your configured Quartz root.
If the user gives a `wk.aiee.app` URL, map it to the local Quartz markdown file first.
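The URL-to-file mapping could look roughly like the sketch below. It assumes the URL path mirrors the note's path relative to the Quartz root and that notes use the `.md` extension; both are assumptions, and `url_to_local_markdown` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def url_to_local_markdown(url: str, quartz_root: str) -> Path:
    """Map a published page URL to a local markdown path (assumed layout)."""
    # Accept URLs pasted without a scheme, e.g. "wk.aiee.app/Trend/...".
    if "://" not in url:
        url = "https://" + url
    # Decode percent-escapes so non-ASCII note names resolve correctly.
    relative = unquote(urlparse(url).path).strip("/")
    if not relative:
        raise ValueError(f"URL has no page path: {url!r}")
    return Path(quartz_root) / f"{relative}.md"
```

Check that the returned path exists before using it, since the real site may rewrite slugs in ways this sketch does not model.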
## Step 2) Generate Script
Read and apply:
- `references/podcast_prompt_template_ko.md`
Modes:
- **Full mode**: 15 to 20 minutes
- **Compressed mode**: 5 to 7 minutes (core tips only)
Rules:
- no system/meta text in spoken lines
- host intro once at opening only
- conversational Korean, short sentences, actionable
- save script in `archive/`
## Step 3) Build Audio (Gemini Multi-Speaker, Reliable)
### Preferred: chunked builder (timeout-safe)
```bash
# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"
# Run from skills/ai-podcast-pipeline/
python3 scripts/build_dualvoice_audio.py \
--input <script.txt> \
--outdir <outdir> \
--basename podcast_full_dualvoice \
--chunk-lines 6
```
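To show what `--chunk-lines 6` implies, here is a minimal sketch of splitting a script into fixed-size groups of lines so each TTS request stays short. This is an assumption about the general approach, not the actual logic of `build_dualvoice_audio.py`; `chunk_lines` is a hypothetical helper.

```python
def chunk_lines(lines, chunk_size=6):
    """Group non-empty script lines into chunks of at most `chunk_size`."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Blank lines carry no speech, so drop them before grouping.
    spoken = [line.strip() for line in lines if line.strip()]
    return [spoken[i:i + chunk_size] for i in range(0, len(spoken), chunk_size)]
```

Each chunk would then be synthesized separately and the resulting audio segments concatenated in order.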
### Single-pass (short scripts)
```bash
python3 scripts/gemini_multispeaker_tts.py \
--input-file <dialogue.txt> \
--outdir <outdir> \
--basename podcast_dualvoice \
--retries 3 \
--timeout-seconds 120
```
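The `--retries` flag suggests a retry loop around the API call. A minimal sketch of that pattern follows, assuming a simple linear backoff; the real script's retry policy is not documented here, and `call_with_retries` is a hypothetical name.

```python
import time


def call_with_retries(fn, retries=3, base_delay=1.0):
    """Call `fn()` up to `retries` times, sleeping longer after each failure."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception as exc:  # in real code, catch only transient errors
            last_error = exc
            if attempt < retries:
                # Linear backoff: base_delay, 2 * base_delay, ...
                time.sleep(base_delay * attempt)
    raise last_error
```

The per-request timeout itself would be enforced by the HTTP client inside `fn`, not by this wrapper.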
Default voice mapping (fixed as of 2026-02-10):
- Callie (female) → `Kore`
- Nick (male) → `Puck`
Output: MP3 (default delivery format)
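The mapping above can be expressed as a small lookup. The sketch assumes dialogue lines of the form `Speaker: text`, which is a guess at the script format rather than something this document specifies; `assign_voices` is a hypothetical helper.

```python
# Speaker-to-voice mapping taken from the defaults listed above.
VOICE_BY_SPEAKER = {"Callie": "Kore", "Nick": "Puck"}


def assign_voices(dialogue: str):
    """Return (voice, text) pairs for each `Speaker: text` line (assumed format)."""
    turns = []
    for raw in dialogue.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so colons inside the text survive.
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip()
        if not sep or speaker not in VOICE_BY_SPEAKER:
            raise ValueError(f"Unrecognized speaker line: {line!r}")
        turns.append((VOICE_BY_SPEAKER[speaker], text.strip()))
    return turns
```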
## Step 4) Build Korean Subtitles (Full Text)
Use full-text subtitle builder (no `...` truncation):
```bash
python3 scripts/build_korean_srt.py \
--script <script.txt> \
--audio <final.mp3> \
--output <outdir>/podcast.srt \
--max-chars 22
```
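For orientation, here is a minimal sketch of how one SRT cue can be produced without truncating text: long lines are wrapped at `max_chars` instead of being cut with `...`. How `build_korean_srt.py` actually interprets `--max-chars` and derives timings from the audio is not shown in this document, so the helpers below are hypothetical.

```python
def format_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def wrap_text(text: str, max_chars: int = 22):
    """Greedy wrap on spaces; a single over-long word stays on its own line."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def srt_cue(index: int, start_ms: int, end_ms: int, text: str, max_chars: int = 22) -> str:
    """Build one SRT cue block with the full text, wrapped but never truncated."""
    body = "\n".join(wrap_text(text, max_chars))
    return f"{index}\n{format_srt_time(start_ms)} --> {format_srt_time(end_ms)}\n{body}\n"
```

Wrapping keeps every word of the script on screen, which is the property the full-text builder is meant to guarantee.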
## Step 5) Render Subtitled MP4 (Font + Timing)
Use renderer with adjustable font and timing shift:
```bash
python3 scripts/render_subtitled_video.py \
--image <thumbnail.png> \
--audio <final.mp3> \
--srt <podcast.srt> \
--output <outdir>/final.mp4 \
--font-name "Do Hyeon" \
--font-size 27 \
--shift-ms -250
```
Notes:
- A negative `--shift-ms` shows subtitles earlier (use it to fix lag).
- If text clipping occurs, lower `--font-size` (e.g., to a value in the 25 to 27 range).
- Keep text inside the safe area; avoid overlapping characters or objects in the image.
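The effect of a timing shift can be sketched as follows: every timestamp on a cue timing line moves by the same number of milliseconds, clamped at zero. This illustrates the idea only; it is not the code in `render_subtitled_video.py`, and `shift_srt` is a hypothetical helper.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text: str, shift_ms: int) -> str:
    """Shift all cue timestamps by `shift_ms` (negative = earlier)."""

    def _shift(match):
        h, m, s, ms = (int(group) for group in match.groups())
        # Clamp at zero so early cues never get negative times.
        total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + shift_ms)
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    shifted = []
    for line in srt_text.splitlines(keepends=True):
        # Only timing lines contain "-->"; subtitle text is left untouched.
        shifted.append(TIMESTAMP.sub(_shift, line) if "-->" in line else line)
    return "".join(shifted)
```

With `shift_ms=-250`, a cue starting at `00:00:01,000` would start at `00:00:00,750`.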
## Step 6) Build Thumbnail + YouTube Metadata
```bash
# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"
python3 scripts/build_podcast_assets.py \
--source "<QuickView path or URL>"
```
Reference (layout/copy guardrails):
- `references/thumbnail_guidelines_ko.md`
## Step 7) Final Delivery Checklist
Always include:
1. source used
2. final MP3 path
3. subtitle MP4 path + size
4. thumbnail path
5. YouTube title options (3)
6. YouTube description
## Reliability Rules
- Gemini timeout on long input: use chunked builder (`build_dualvoice_audio.py`)
- Subtitle clipping: reduce font size and increase bottom margin
- Subtitle lag: adjust `--shift-ms` (usually `-150` to `-300`)
- Keep generated assets under Telegram practical limits
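A size check before delivery might look like the sketch below. This document does not state the Telegram limit, so the threshold is a caller-supplied parameter rather than a hardcoded value; both helper names are hypothetical.

```python
import os


def describe_asset(path: str) -> str:
    """Return 'path (N.N MB)' for use in the delivery checklist."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return f"{path} ({size_mb:.1f} MB)"


def within_limit(path: str, limit_mb: float) -> bool:
    """True if the file is no larger than `limit_mb` (caller-chosen threshold)."""
    return os.path.getsize(path) <= limit_mb * 1024 * 1024
```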
## Security Notes
- API keys must be passed via environment variables (`GEMINI_API_KEY`), not hardcoded.
- Never paste raw keys into prompts, logs, screenshots, or public posts.
- Recent hardening: thumbnail generation now passes keys via env (not CLI args).
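A minimal sketch of the environment-variable rule: read the key once, fail fast when it is missing, and never echo its value. The helper name `require_api_key` is hypothetical; the scripts in this repository may handle this differently.

```python
import os


def require_api_key(env=None, name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        # The message names the variable but never prints a key value.
        raise RuntimeError(f"{name} is not set; export it before running the pipeline")
    return key
```

Child processes started from the same shell inherit the variable through the environment, so the key does not need to appear in a command line where it could be visible in process listings.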
## References
- `references/podcast_prompt_template_ko.md`
- `references/workflow_runbook.md`
- `references/thumbnail_guidelines_ko.md`