Xiaohongshu Search and Summarize

This skill extracts high-quality multi-modal content (text and images) from Xiaohongshu (小红书) and then guides you through producing a deeply integrated, analytical final report for the user. Because of Xiaohongshu's aggressive anti-scraping mechanisms, direct HTTP requests or naive scraping often end in 404s or blocks. The skill avoids these by driving a headed browser window through playwright-cli, so that the traffic looks like a real user.

It operates in two distinct phases:

Phase 1: Subagent Data Collection

1. Simulate a search for the keyword on Xiaohongshu in a headed browser.
2. Advance through the image sliders of the top N posts so that every lazy-loaded image is fully loaded.
3. Extract titles, descriptions, top comments, and all high-resolution images.
4. Download those images to a local directory and generate a raw data document ([keyword]_raw_data.md).
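
As a rough illustration of the download step, the helper below builds a local file name in the post_X_img_Y pattern that the raw data file refers to. It is only a sketch: the function name, the accepted extensions, and the .jpg fallback are assumptions, and the real parse.py may name files differently.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions this sketch accepts as-is (an assumption, not taken from parse.py).
KNOWN_SUFFIXES = {".webp", ".jpg", ".jpeg", ".png"}


def local_image_name(post_index: int, image_index: int, url: str) -> str:
    """Build a post_X_img_Y.<ext> file name for one downloaded image."""
    # Use only the path component so query strings do not leak into the name.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix not in KNOWN_SUFFIXES:
        suffix = ".jpg"  # hypothetical fallback when the URL has no usable extension
    return f"post_{post_index}_img_{image_index}{suffix}"
```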

Phase 2: AI Multi-Modal Synthesis (Your Job)

5. You MUST use your file reading capabilities to read the [keyword]_raw_data.md file.
6. The raw data markdown contains paths to image files. You MUST use your file reading / vision capabilities on these image file paths to actually ingest and "see" their visual content. If you skip this step, you are only reading file names, not the images themselves!
7. Analyze the texts, summarize the genuinely useful comments (discarding noise like "pm me"), and interpret the semantic content of the images you just viewed (e.g. diagrams, guidelines, step-by-step UI flows).
8. Compile everything into a single, comprehensive, well-synthesized report rather than a linear list of posts.
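
The comment triage described above can be approximated mechanically before the real judgment call is made. The following is a minimal sketch; the noise patterns and the `min_length` threshold are hypothetical and should be tuned to the topic being researched.

```python
import re

# Hypothetical noise patterns; extend them for the topic at hand.
NOISE_PATTERNS = [
    re.compile(r"\bpm me\b", re.IGNORECASE),
    re.compile(r"\bdm me\b", re.IGNORECASE),
    re.compile(r"私信"),
]


def filter_comments(comments: list[str], min_length: int = 2) -> list[str]:
    """Drop empty, very short, promotional, or repeated comments."""
    kept: list[str] = []
    seen: set[str] = set()
    for comment in comments:
        text = comment.strip()
        key = text.casefold()
        # Skip blanks, near-empty comments, and case-insensitive repeats.
        if len(text) < min_length or key in seen:
            continue
        # Skip anything that matches a known noise pattern.
        if any(pattern.search(text) for pattern in NOISE_PATTERNS):
            continue
        seen.add(key)
        kept.append(text)
    return kept
```

A filter like this only removes obvious noise. Deciding which of the remaining comments add a viewpoint or counter-argument still requires reading them.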

Dependencies

- playwright-cli (must be available on the PATH)
- python3 (required to download images and stitch the raw data markdown)
- requests Python package (pip install requests), used by parse.py to download images
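
Before running the extraction it can help to confirm that these dependencies are actually present. The check below is a minimal sketch using only the standard library; the function name is hypothetical and the skill itself does not ship this check.

```python
import importlib.util
import shutil


def missing_dependencies() -> list[str]:
    """Return the names of required tools or packages that cannot be found."""
    missing = []
    # Command-line tools must be resolvable on the PATH.
    for tool in ("playwright-cli", "python3"):
        if shutil.which(tool) is None:
            missing.append(tool)
    # The requests package must be importable by the current interpreter.
    if importlib.util.find_spec("requests") is None:
        missing.append("requests (pip install requests)")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else f"Missing: {', '.join(problems)}")
```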

Usage Instructions

Step 1: Run the Extraction Script

Execute the wrapper script in scripts/run.sh. It accepts the following arguments:

    /bin/bash /scripts/run.sh "YOUR KEYWORD" <MAX_POSTS> <OUTPUT_DIRECTORY>

- YOUR KEYWORD: The search term to look up on Xiaohongshu.
- <MAX_POSTS>: (Optional, default = 10) The number of top posts to scan.
- <OUTPUT_DIRECTORY>: (Optional, default = ./) Directory where the raw data and images will be saved.

Example execution:

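    /bin/bash ~/.claude/skills/xiaohongshu-search-summarizer/scripts/run.sh "openclaw使用场景" 10 ./xhs_report_openclaw_scenarios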
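
If the script is launched from Python rather than typed into a shell, the argument order described above can be assembled as a list, which avoids quoting problems with keywords that contain spaces. This is a sketch under assumptions: `build_command` and `run_extraction` are hypothetical helper names, and the caller supplies the location of run.sh.

```python
import subprocess
from pathlib import Path


def build_command(script: Path, keyword: str, max_posts: int = 10,
                  output_dir: str = "./") -> list[str]:
    """Assemble the run.sh invocation: keyword, then max posts, then output directory."""
    return ["/bin/bash", str(script), keyword, str(max_posts), output_dir]


def run_extraction(script: Path, keyword: str, max_posts: int = 10,
                   output_dir: str = "./") -> int:
    """Run the wrapper script and return its exit code (0 means success)."""
    completed = subprocess.run(
        build_command(script, keyword, max_posts, output_dir),
        check=False,  # let the caller decide how to react to a failure
    )
    return completed.returncode
```

Passing the arguments as a list means the keyword reaches the script as a single argument without any shell escaping.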

Step 2: Read Raw Data & Images

Once the bash script finishes successfully, navigate to the OUTPUT_DIRECTORY and use your file reading capabilities to ingest the generated [keyword]_raw_data.md file.

Inside this file, you will find descriptions, comments, and file paths pointing to post_X_img_Y.webp or post_X_img_Y.jpg.
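
To make sure no image is overlooked, the referenced paths can be collected from the raw data file before viewing them one by one. The sketch below assumes that image references contain the post_X_img_Y file names described above and that relative paths are relative to the output directory; the exact layout written by parse.py is not specified here.

```python
import re
from pathlib import Path

# Matches an optional path prefix followed by post_<n>_img_<m>.webp or .jpg.
IMAGE_REF = re.compile(r"[^\s()\[\]\"']*post_\d+_img_\d+\.(?:webp|jpg)\b")


def find_image_paths(raw_markdown: str, base_dir: Path) -> list[Path]:
    """Return unique image paths referenced in the raw data, in order of appearance."""
    paths: list[Path] = []
    seen: set[Path] = set()
    for match in IMAGE_REF.finditer(raw_markdown):
        candidate = Path(match.group(0))
        # Treat relative references as relative to the output directory (assumption).
        resolved = candidate if candidate.is_absolute() else base_dir / candidate
        if resolved not in seen:
            seen.add(resolved)
            paths.append(resolved)
    return paths
```

Each returned path still has to be opened with your file reading / vision capabilities. Listing the paths does not replace looking at the images.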

Step 3: Synthesis & Summarization

This is the most critical step. Do not just return the raw markdown file to the user. Instead, write a polished comprehensive markdown report that reorganizes the information logically, while retaining a high level of detail.

Follow these strict compilation rules:

- Do not list posts individually (e.g. avoid "Post 1: ... Post 2: ...").
- Read the Images: You MUST use your file reading and vision capabilities on the .webp or .jpg image files found in the raw data directory to interpret their contents.
- Detailed & Comprehensive Synthesis: Provide a highly detailed summary that includes diverse viewpoints, nuances, and specific examples found across different posts. Avoid over-summarizing or losing important context; preserve the richness and diversity of the information.
- Extract and merge themes: Group ideas by concepts, steps, recurring themes, or pros/cons.
- Evaluate comments: Merge insights from valuable comments directly into the core narrative. Skip useless or repetitive comments, but preserve diverse opinions or helpful counter-arguments from the comments section.
- Integrate images contextually: Embed the most relevant and high-quality images directly into the flow of your final report to support the analytical points being made. Describe their visual meaning based on what you saw with your vision capabilities.
- Save to OUTPUT_DIRECTORY: Save your beautifully compiled final Markdown report using your file writing capabilities directly into the same <OUTPUT_DIRECTORY> as the raw data (e.g., <OUTPUT_DIRECTORY>/[keyword]_synthesis.md), and give the user the path to it.
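
The final save step can be sketched as follows. The helper names are hypothetical, and the sketch assumes the keyword is used verbatim in the file name, which may not match how run.sh names the raw data file for keywords containing spaces or special characters.

```python
from pathlib import Path


def synthesis_path(output_dir: str, keyword: str) -> Path:
    """Return <OUTPUT_DIRECTORY>/[keyword]_synthesis.md as a Path."""
    return Path(output_dir) / f"{keyword}_synthesis.md"


def save_report(output_dir: str, keyword: str, report_markdown: str) -> Path:
    """Write the finished report next to the raw data and return its path."""
    target = synthesis_path(output_dir, keyword)
    # Create the directory if the extraction step has not already done so.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(report_markdown, encoding="utf-8")
    return target
```

The returned path is the one to hand back to the user.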

Error Handling

If you encounter 404 Not Found or "element not visible" errors during the browser invocation:

- Keep in mind that Xiaohongshu may present a login challenge. If the site pauses waiting for a login, ask the user to check the playwright-cli browser window and complete the authentication manually, then run the script again.
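
This check can be automated loosely by scanning the captured script output for the two error strings mentioned above. The sketch assumes the errors surface as those literal phrases in the output, which is not guaranteed; the function names and the wording of the advice are hypothetical.

```python
# Phrases that suggest the browser is blocked behind a login challenge.
LOGIN_HINTS = ("404 Not Found", "element not visible")


def needs_manual_login(script_output: str) -> bool:
    """Return True when the output contains one of the known error phrases."""
    lowered = script_output.casefold()
    return any(hint.casefold() in lowered for hint in LOGIN_HINTS)


def advise(script_output: str) -> str:
    """Turn the script output into a short instruction for the user."""
    if needs_manual_login(script_output):
        return ("Xiaohongshu may be waiting for a login. Check the playwright-cli "
                "browser window, sign in manually, then run the script again.")
    return "No login-related error detected."
```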

