obsidian-clipper

# Obsidian Clipper Universal clipper — saves URLs from any platform to your Obsidian Vault with local media, tags, and wikilinks. ## Configuration On first run, read `config.yml` from **the same directory as this SKILL.md file**. If missing, tell the user to copy `config.yml.example` to `config.yml` and fill in `vault.base_path`. Key paths derived from config: ``` BASE = config.vault.base_path ATTACHMENTS = BASE / config.vault.attachments_dir X_DIR = BASE / config.vault.dirs.x WECHAT_DIR = BASE / config.vault.dirs.wechat XHS_DIR = BASE / config.vault.dirs.xiaohongshu DOUYIN_DIR = BASE / config.vault.dirs.douyin GITHUB_DIR = BASE / config.vault.dirs.github WEB_DIR = BASE / config.vault.dirs.web ``` --- ## URL Router Match the URL and dispatch to the correct handler: | URL pattern | Handler | |---|---| | `x.com/*` or `twitter.com/*` | X Handler | | `mp.weixin.qq.com/*` | WeChat Handler | | `xhslink.com/*` or `xiaohongshu.com/*` | Xiaohongshu Handler | | `v.douyin.com/*` or `douyin.com/video/*` | Douyin Handler | | `github.com/{owner}/{repo}` (no deeper path) | GitHub Handler | | Everything else | Web Handler | --- ## Shared Rules These apply to ALL handlers: ### File naming - Use the content title as filename - Strip `/\:*?"<>|` and all emoji / special Unicode - Truncate if >60 characters - If a file with the same name exists, ask the user before overwriting ### Tags - Every clipping MUST have the `clipping` tag - No `.` or spaces in tags (`llms.txt` → `llms-txt`, `Claude Code` → `Claude-Code`) - Add 2-4 content-based tags automatically ### Media download - Download all images and videos to `ATTACHMENTS/` - Image naming: `{title-slug}-{n}.{ext}` (slug ≤20 chars, no emoji) - Video naming: `{title-slug}.mp4` or `{title-slug}-video-{n}.mp4` - Replace remote URLs with Obsidian wikilinks: `![[filename.ext]]` ### "Why clipped" field - Infer from conversation context if possible - If context is insufficient, ask the user ### Cross-platform linking - If the content contains a `github.com/{owner}/{repo}` link, auto-trigger the GitHub Handler to create a GitHub note, then add bidirectional wikilinks --- ## X Handler Saves X (Twitter) posts and articles. ### Step 1: Fetch post data ```bash curl -s "https://api.fxtwitter.com/{handle}/status/{tweet_id}" ``` Extract from URL: `x.com/{handle}/status/{id}` → handle, tweet_id. Fields: - `tweet.author.name` → author name - `tweet.author.screen_name` → @handle - `tweet.text` → post body (short posts) - `tweet.article` → long-form article (if present) - `tweet.article.title` → article title - `tweet.article.content.blocks` → structured content (Draft.js format) - `tweet.created_at` → publish date - `tweet.likes` / `tweet.retweets` / `tweet.views` → engagement - `tweet.media` → short post images - `tweet.article.media_entities` → article inline images ### Step 2: Determine content type **Short post** (`tweet.article` is null): use `tweet.text`, images from `tweet.media.photos`. **Article** (`tweet.article` exists): parse `tweet.article.content.blocks` (Draft.js): | block type | Markdown | |---|---| | `unstyled` | paragraph | | `header-one` | `# heading` | | `header-two` | `## heading` | | `header-three` | `### heading` | | `unordered-list-item` | `- item` | | `ordered-list-item` | `1. item` | | `atomic` | insert image `![[filename]]` | | `blockquote` | `> quote` | Inline styles: `Bold` → `**text**`, `Italic` → `*text*`. For `atomic` blocks: `entityMap` → `value.data.mediaItems[].mediaId` → match `media_entities` → `media_info.original_img_url`. ### Step 3: Download images Download all images to `ATTACHMENTS/`. ### Step 4: Generate file Write to `X_DIR/{title}.md`: **Short post:** ```markdown --- title: "{author name}的推文 - {first 30 chars}" author: "{author name}" handle: "@{screen_name}" source: {original URL} date: {publish date YYYY-MM-DD} tags: - clipping - {auto tags} --- # {author name}的推文 > X: @{handle} ({name}) | {date} | {likes} likes · {retweets} retweets · {views} views {body text} {images ![[filename]]} ``` **Article:** ```markdown --- title: "{article.title}" author: "{author name}" handle: "@{screen_name}" source: {original URL} date: {publish date YYYY-MM-DD} tags: - clipping - {auto tags} --- # {article.title} > X: @{handle} ({name}) | {date} | {likes} likes · {retweets} retweets · {views} views {parsed Markdown body with ![[local images]]} ``` ### Notes - fxtwitter API needs no auth but may rate-limit; try `vxtwitter.com` as fallback - Title: articles use `article.title`; short posts use `{name}-{first 15 chars of text}` --- ## WeChat Handler Saves WeChat Official Account (公众号) articles. ### Step 1: Fetch article data ```bash curl -s "https://down.mptext.top/api/public/v1/download?url={URL-encoded-link}&format=json" ``` Extract: `title`, `nick_name`, `create_time`, `content_noencode`, `link`. If API returns 204 or fails, fall back to `defuddle` or `WebFetch`. ### Step 2: Fetch HTML (if rich content needed) If content has images or complex formatting: ```bash curl -s "https://down.mptext.top/api/public/v1/download?url={URL-encoded-link}&format=html" ``` Extract image URLs and formatted content from HTML. ### Step 3: Download images - Download each image to `ATTACHMENTS/` - WeChat image URLs: `mmbiz.qpic.cn/mmbiz_png/...` → `.png`, `mmbiz_jpg/...` → `.jpg` - Replace with `![[filename.jpg]]` ### Step 4: Generate file Write to `WECHAT_DIR/{title}.md`: ```markdown --- title: "{article title}" author: "{公众号 name}" source: {original link} date: {publish date YYYY-MM-DD} tags: - clipping - {auto tags} --- # {article title} > 公众号：{name} | {publish date} {body with ![[local images]]} ``` --- ## Xiaohongshu Handler Saves 小红书 notes (images + video). ### Step 1: Resolve short link If URL is `xhslink.com`, follow redirects to get the real URL **with full query parameters** (especially `xsec_token`): ```bash curl -sL "<short-link>" -o /dev/null -w "%{url_effective}" ``` > **Important**: The full query string is required. Bare URLs return empty `noteDetailMap`. ### Step 2: Fetch SSR data Xiaohongshu is an SPA, but SSR embeds data in `window.__INITIAL_STATE__`: ```bash curl -sL "<full-URL-with-params>" \ -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36" \ -H "Accept: text/html" | python3 -c " import sys, re, json html = sys.stdin.read() m = re.search(r'window\.__INITIAL_STATE__\s*=\s*(\{.*?\})\s*</script>', html, re.DOTALL) if not m: print('NO_DATA') sys.exit(1) raw = m.group(1).replace(':undefined', ':null') d = json.loads(raw) note_map = d.get('note', {}).get('noteDetailMap', {}) for nid, detail in note_map.items(): n = detail.get('note', {}) result = { 'title': n.get('title', ''), 'desc': n.get('desc', ''), 'type': n.get('type', ''), 'author': n.get('user', {}).get('nickname', ''), 'time': n.get('time', ''), 'images': [], 'video_url': None, 'tags': [t.get('name', '') for t in n.get('tagList', [])] } for img in n.get('imageList', []): url = img.get('urlDefault', '') or img.get('urlPre', '') or img.get('url', '') if url: result['images'].append(url) video = n.get('video', {}) if video: media = video.get('media', {}) stream = media.get('stream', {}) for codec in ['h264', 'h265', 'h266', 'av1']: streams = stream.get(codec, []) if streams: result['video_url'] = streams[0].get('masterUrl', '') break print(json.dumps(result, ensure_ascii=False)) " ``` ### Step 3: Download media - Images: `curl -L -o` each image to `ATTACHMENTS/` - Video (if present): `curl -L -o` to `ATTACHMENTS/{title-slug}-video.mp4` ### Step 4: Generate file Write to `XHS_DIR/{title}.md`: ```markdown --- title: "{title}" author: "{author}" source: {real URL} domain: xiaohongshu.com date: {today YYYY-MM-DD} tags: - clipping - {tags from tagList, 2-4} --- # {title} > 来源：小红书 | {author} | {publish date} ![[cover-image.jpg]] ![[video.mp4]] (if video exists) {desc body, strip [话题]# markers} ## 标签 {tags joined with /} ``` ### Notes - If curl can't reach Xiaohongshu, add `--proxy {config.xiaohongshu.proxy}` - `desc` often contains `[话题]#tag[话题]#` markers — strip them for clean text --- ## Douyin Handler Saves 抖音 videos. **Requires douyin-downloader tool** — check `config.douyin.enabled` first. If disabled, tell the user to install douyin-downloader and enable it in config. ### Step 1: Extract and resolve link From share text like `7.94 复制打开抖音...https://v.douyin.com/xxxxx/ DUL:/...`, extract the `v.douyin.com` short link. Resolve to full video ID: ```bash curl -sI -L --max-redirs 5 "https://v.douyin.com/xxxxx/" 2>&1 | grep -i location | grep '/video/' | sed 's|.*/video/\([0-9]*\).*|\1|' | head -1 ``` Construct: `https://www.douyin.com/video/{aweme_id}` ### Step 2: Download with douyin-downloader 1. Edit `{config.douyin.tool_path}/config.yml`: set `link` to the full URL 2. Run: ```bash cd {config.douyin.tool_path} && {config.douyin.python} run.py ``` 3. Find files in `Downloaded/` — `.mp4`, `_cover.jpg`, `_data.json` ### Step 3: Extract metadata From `_data.json`: - `desc` → description (title + hashtags) - `author.nickname` → author - `statistics.digg_count` → likes - `statistics.comment_count` → comments - `statistics.share_count` → shares - `create_time` → Unix timestamp → convert to YYYY-MM-DD ### Step 4: Copy media to vault Copy video and cover to `ATTACHMENTS/`. ### Step 5: Generate file Write to `DOUYIN_DIR/{title}.md`: ```markdown --- title: "{video title}" source: https://www.douyin.com/video/{aweme_id} author: "{author}" platform: 抖音 digg: {likes} comment: {comments} share: {shares} created: {publish date YYYY-MM-DD} date: {today YYYY-MM-DD} tags: - clipping - 抖音 - {auto tags} --- # {video title} > {original desc with hashtags} ## 为什么收藏 {inferred or asked} ## 视频 ![[{video-file}.mp4]] ## 封面 ![[{cover-file}_cover.jpg]] ``` ### Notes - Short links MUST be resolved first — douyin-downloader cannot handle them - If download fails (0% success), cookies may be expired — tell user to re-run cookie fetcher: ```bash cd {config.douyin.tool_path} && {config.douyin.python} -m tools.cookie_fetcher --config config.yml ``` - `create_time` is Unix timestamp: `python3 -c "import datetime; print(datetime.datetime.fromtimestamp({ts}).strftime('%Y-%m-%d'))"` --- ## GitHub Handler Saves GitHub repositories. ### Step 1: Fetch repo metadata ```bash curl -s "https://api.github.com/repos/{owner}/{repo}" ``` Extract: `name`, `full_name`, `description`, `stargazers_count`, `language`, `license.spdx_id`, `topics`, `created_at`, `html_url`, `homepage`. ### Step 2: Fetch README ```bash curl -s "https://api.github.com/repos/{owner}/{repo}/readme" ``` Decode base64 `content` to get README markdown. ### Step 3: Summarize README Do NOT copy the full README. Extract: 1. **Core features**: 3-5 bullet points, one sentence each 2. **Quick start**: only the most essential command or API snippet 3. **Notes**: limitations, dependencies, special requirements ### Step 4: Generate file Write to `GITHUB_DIR/{repo-name}.md`: ```markdown --- title: "{repo name}" source: {repo URL} author: "{owner}" stars: {star count} language: "{primary language}" license: "{license}" created: {repo created date YYYY-MM-DD} date: {today YYYY-MM-DD} tags: - clipping - github - {auto tags from topics} --- # {repo name} > {one-line description} ## 为什么收藏 {inferred or asked} ## 核心功能 - {point 1} - {point 2} - ... ## 快速开始 {minimal code block} ## 备注 {limitations, dependencies, notes} ``` ### Notes - Star count as raw number (no `1.2k` formatting — for Dataview sorting) - Use `name` not `full_name` for filename - If API returns 403 (rate limit), tell user to wait - Strip emoji from filenames --- ## Web Handler Saves generic web pages. Fallback for URLs that don't match any platform handler. ### Step 1: Extract content with defuddle ```bash defuddle parse <url> --json ``` Extract: `title`, `author`, `content` (HTML), `description`, `published`, `domain`. If defuddle is not installed, run `npm install -g defuddle-cli` first. ### Step 2: Handle empty content (SPA fallback) If `content` is empty or just `<body></body>` (client-rendered SPA), try fallback paths in order: **Path A: CDP browser** (if `config.web.cdp_enabled`): 1. Open page: `curl -s "{config.web.cdp_url}/new?url=<URL>"` → get `targetId` 2. Scroll: `curl -s "{cdp_url}/scroll?target=<ID>&direction=bottom"` 3. Extract title: `curl -s -X POST "{cdp_url}/eval?target=<ID>" -d 'document.title'` 4. Extract body: `curl -s -X POST "{cdp_url}/eval?target=<ID>" -d "document.querySelector('article')?.innerHTML || document.querySelector('main')?.innerHTML || document.body.innerHTML"` 5. Extract images: `curl -s -X POST "{cdp_url}/eval?target=<ID>" -d "[...document.querySelectorAll('img')].map(i=>i.src).filter(s=>s.startsWith('http'))"` 6. Extract videos: `curl -s -X POST "{cdp_url}/eval?target=<ID>" -d "[...document.querySelectorAll('video source, video')].map(v=>v.src||v.querySelector('source')?.src).filter(Boolean)"` 7. Close tab: `curl -s "{cdp_url}/close?target=<ID>"` **Path B: WebFetch** — use the `WebFetch` tool as fallback. **Path C: Bookmark mode** — if URL is a web app/tool/dashboard or all paths fail, create a bookmark note from meta info only. ### Step 3: Download media Download images and videos to `ATTACHMENTS/`, replace with `![[filename]]`. ### Step 4: Generate file Write to `WEB_DIR/{title}.md`: **Content page:** ```markdown --- title: "{page title}" author: "{author}" source: {URL} domain: "{domain}" date: {today YYYY-MM-DD} tags: - clipping - {auto tags} --- # {page title} > 来源：{domain} | {author} | {publish date} {Markdown body with ![[local media]]} ``` **Bookmark page** (tool/app/SPA): ```markdown --- title: "{page title}" author: "{author}" source: {URL} domain: "{domain}" date: {today YYYY-MM-DD} tags: - clipping - {auto tags} --- # {page title} > 来源：{domain} | {URL} {description} ## 为什么收藏 {inferred or asked} ## 核心功能 - {points from meta info} ``` ### Notes - defuddle does not work on SPAs (React/Vue client-rendered) — use CDP path - Always close CDP tabs after use

obsidian-clipper

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

obsidian-clipper