Add Directories
Workflow
1. Parse the input (URL or pasted text) into a list of directories
2. Deduplicate against existing entries in directories.json
3. Append new entries with required fields
4. Classify by running the analysis and verification pipeline
5. Discover forms on submission targets
6. Submit via automation or manual browser interaction
Step 1: Parse Input
From URL
Fetch the page and extract directory entries. Look for patterns like:
- Name + URL pairs in lists, tables, or cards
- Structured data (JSON-LD, markdown tables, CSV)
- Repeated DOM patterns with links
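A minimal sketch of the "Name + URL pairs" extraction for static HTML, using Python's standard html.parser. LinkCollector and extract_links are hypothetical names, not part of the project scripts. The sketch assumes directory links point off-site, so it treats same-domain links as site navigation. Pages that render their list with JavaScript would still need a browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (anchor text, absolute URL) pairs, skipping same-site links."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc.lower()
        self.links: list[tuple[str, str]] = []
        self._href = None          # href of the <a> currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            self._href = urljoin(self.base_url, href) if href else None
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        name = " ".join("".join(self._text).split())  # collapse whitespace
        parsed = urlparse(self._href)
        # Assumption: same-domain links are navigation, not directory entries.
        if name and parsed.scheme in ("http", "https") and parsed.netloc.lower() != self.base_host:
            self.links.append((name, self._href))
        self._href = None


def extract_links(html: str, base_url: str) -> list[tuple[str, str]]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Grouping the results by their parent element (to find the "repeated DOM pattern") would be a natural next step. It is left out here.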
From GitHub Topics/Repos
Use the gh CLI to explore curated lists:
- gh repo clone <owner>/<repo> to clone awesome-lists
- Parse README.md for directory links (markdown link format)
- Check for JSON/YAML data files with directory entries
- You can also create PRs to add your product to these lists
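For the README step, here is a rough Python sketch that pulls Markdown links out of bullet lines in a cloned awesome-list. It makes two assumptions: entries live in "-" or "*" bullets, and github.com links are repo or badge links to skip. Awesome-lists vary in format, so both heuristics may need tuning. readme_links is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

# Inline Markdown link: [text](http(s)://...)
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def readme_links(markdown: str, skip_hosts=("github.com",)) -> list[tuple[str, str]]:
    """Return (name, url) pairs from list items in an awesome-list README."""
    results = []
    for line in markdown.splitlines():
        # Heuristic: entries live in "-" or "*" bullet items; headings and prose are skipped.
        if not line.lstrip().startswith(("-", "*")):
            continue
        for name, url in MD_LINK.findall(line):
            host = (urlparse(url).hostname or "").removeprefix("www.")
            # Drop repo/badge links; the skip list is an assumption to tune per list.
            if any(host == h or host.endswith("." + h) for h in skip_hosts):
                continue
            results.append((name.strip(), url))
    return results
```

Usage: `readme_links(open("README.md", encoding="utf-8").read())`. JSON/YAML data files, when a repo has them, are usually easier to load directly.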
From Pasted Text
Parse the input line by line (or row by row for tables). Common formats:
- Name - https://url.com or Name | https://url.com
- Markdown links: [Name](https://url.com)
- Markdown tables with Name and URL columns
- Plain URLs (one per line); derive the name from the domain
- CSV/TSV with headers
Extract at minimum: name and url (submission or homepage).
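One way to handle the line-based formats above is sketched below. The parser tries, in order, a Markdown link, a "Name - url" or "Name | url" pair (which also covers simple Markdown table rows), and then a bare URL whose name is derived from the domain. The function names and the capitalize-the-domain rule are assumptions for illustration. CSV/TSV input is better served by csv.DictReader and is not covered here.

```python
import re
from urllib.parse import urlparse

MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")
SEPARATED = re.compile(r"^(.+?)\s+[-|]\s+(https?://\S+)$")  # "Name - url" / "Name | url"
PLAIN_URL = re.compile(r"^(https?://\S+)$")


def name_from_domain(url: str) -> str:
    """Derive a display name from the domain, e.g. https://www.saashub.com -> Saashub."""
    host = (urlparse(url).hostname or "").removeprefix("www.")
    return host.split(".")[0].capitalize()


def parse_pasted(text: str) -> list[dict]:
    entries = []
    for raw in text.splitlines():
        # Stripping outer pipes lets "| Name | url |" table rows hit the SEPARATED rule.
        line = raw.strip().strip("|").strip()
        if not line:
            continue
        if m := MD_LINK.search(line):
            name, url = m.group(1), m.group(2)
        elif m := SEPARATED.match(line):
            name, url = m.group(1), m.group(2)
        elif m := PLAIN_URL.match(line):
            url = m.group(1)
            name = name_from_domain(url)
        else:
            continue  # header rows, table separators, prose
        entries.append({"name": name.strip(), "url": url})
    return entries
```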
Step 2: Deduplicate
Load directories.json and check each parsed entry against existing ones by:
- Exact URL match (normalize: strip trailing slash, lowercase domain)
- Domain match (same domain = likely duplicate)
- Name match (case-insensitive)
Report duplicates to the user and skip them.
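The three checks can be applied in order of strictness, as in the sketch below. It makes three assumptions: each existing entry has name, url and submission_url keys; "www." is insignificant when comparing domains; and entries inside the same pasted batch should also be deduplicated against each other. split_duplicates is a hypothetical helper.

```python
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    """Lowercase scheme and domain, strip the trailing slash; path case is kept."""
    p = urlparse(url.strip())
    norm = f"{p.scheme.lower()}://{p.netloc.lower()}{p.path.rstrip('/')}"
    return f"{norm}?{p.query}" if p.query else norm


def domain_of(url: str) -> str:
    # Treating "www." as insignificant is an assumption, not a stated rule.
    return (urlparse(url.strip()).hostname or "").removeprefix("www.")


def split_duplicates(parsed: list[dict], existing: list[dict]):
    """Return (new_entries, duplicates), where duplicates are (entry, reason) pairs."""
    urls = [e[k] for e in existing for k in ("url", "submission_url") if e.get(k)]
    seen_urls = {normalize_url(u) for u in urls}
    seen_domains = {domain_of(u) for u in urls}
    seen_names = {e["name"].casefold() for e in existing if e.get("name")}

    new, duplicates = [], []
    for entry in parsed:
        url, name = entry["url"], entry["name"].casefold()
        if normalize_url(url) in seen_urls:
            duplicates.append((entry, "exact url"))
        elif domain_of(url) in seen_domains:
            duplicates.append((entry, "same domain"))
        elif name in seen_names:
            duplicates.append((entry, "same name"))
        else:
            new.append(entry)
            # Register the entry so repeats inside the same batch are caught too.
            seen_urls.add(normalize_url(url))
            seen_domains.add(domain_of(url))
            seen_names.add(name)
    return new, duplicates
```

The returned reasons can be shown to the user directly when reporting what was skipped.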
Step 3: Append New Entries
For each new directory, create an entry with this structure:
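```json
{
  "categories": ["General"],
  "description": "",
  "is_active": true,
  "name": "Directory Name",
  "pricing_type": "free",
  "slug": "directory-name",
  "submission_url": "https://example.com/submit",
  "url": "https://example.com/submit"
}
```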
Field rules:
- slug: lowercase name, spaces to hyphens, strip special chars
- submission_url and url: use the submission/signup URL if available, otherwise the homepage
- description: leave as an empty string (filled in later, or by the user)
- categories: default ["General"] unless context provides a category
- pricing_type: default "free" unless explicitly marked paid
- is_active: always true for new entries
Save the updated directories.json.
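The field rules translate fairly directly into code. In the sketch below, slugify, new_entry and append_and_save are hypothetical helpers. Three parts are assumptions rather than documented behavior: folding accented characters to ASCII, using "paid" as the non-free pricing value, and treating directories.json as a top-level JSON list.

```python
from __future__ import annotations

import json
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase, spaces to hyphens, special characters stripped."""
    # Folding accents to ASCII (e.g. "é" -> "e") is an extra assumption.
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    slug = re.sub(r"\s+", "-", text.strip().lower())
    slug = re.sub(r"[^a-z0-9-]", "", slug)
    return re.sub(r"-{2,}", "-", slug).strip("-")


def new_entry(name: str, url: str, category: str | None = None, paid: bool = False) -> dict:
    """Build an entry with the documented defaults; url doubles as submission_url."""
    return {
        "categories": [category] if category else ["General"],
        "description": "",
        "is_active": True,
        "name": name,
        "pricing_type": "paid" if paid else "free",  # the "paid" value is assumed
        "slug": slugify(name),
        "submission_url": url,
        "url": url,
    }


def append_and_save(path: str, entries: list[dict]) -> None:
    # Assumes directories.json holds a top-level JSON list.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    data.extend(entries)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
```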
Step 4: Classify
Run the pipeline scripts in order using the project venv at .venv/:
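```bash
# 1. HTTP-level analysis (auth, captcha, pricing signals, dead domains)
.venv/bin/python analyze_directories.py

# 2. Clean up obvious failures + build the browser check list
.venv/bin/python cleanup_and_categorize.py

# 3. Browser verification with Playwright (10 concurrent workers)
.venv/bin/python browser_verify.py

# 4. Deep recheck of remaining unknowns
.venv/bin/python browser_verify.py --recheck-unknown
```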
Each script reads/writes directories.json. Steps 3-4 use browser_check_list.json as intermediate state (generated by step 2).
After completion, report a summary: how many entries were added, and the auth/status breakdown for the new entries.
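A small sketch of that summary follows. It assumes that entries added in this run can be identified by their slugs from Step 3. It also assumes the pipeline stores auth_type and is_active on each entry; auth_type is the field Step 5 filters on, while "unknown" as a default is an assumption. summarize is a hypothetical helper.

```python
from collections import Counter


def summarize(directories: list[dict], new_slugs: set[str]) -> str:
    """Summarize only the entries added in this run (identified by slug)."""
    new = [d for d in directories if d.get("slug") in new_slugs]
    # "unknown" is used when the pipeline left auth_type unset (assumed default).
    auth = Counter(d.get("auth_type") or "unknown" for d in new)
    state = Counter("active" if d.get("is_active") else "inactive" for d in new)
    lines = [f"Added {len(new)} directories"]
    lines += [f"  auth_type={k}: {v}" for k, v in sorted(auth.items())]
    lines += [f"  {k}: {v}" for k, v in sorted(state.items())]
    return "\n".join(lines)
```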
Step 5: Discover Forms
For directories that are active and have auth_type = none or auth_type = email_password:
```bash
# Discover form fields on submission pages
.venv/bin/python discover_forms.py
```
This visits each submission URL with Playwright, extracts form fields via DOM queries, and updates submission_plan.json with discovered fields and form paths.
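To show what "extracts form fields via DOM queries" produces, here is a static-HTML stand-in that uses html.parser instead of Playwright. It is not how discover_forms.py works. It also includes the Step 5 eligibility filter. The form_path format mirrors the "form#submit-form" example in the submission plan. Labels are matched only through `<label for=...>`, so implicit (wrapping) labels are missed. All names here are hypothetical.

```python
from html.parser import HTMLParser

ELIGIBLE_AUTH = {"none", "email_password"}
FIELD_TAGS = {"input", "textarea", "select"}
SKIP_TYPES = {"hidden", "submit", "button", "reset", "image"}


def eligible(directory: dict) -> bool:
    """Step 5 filter: active entries whose auth_type is none or email_password."""
    return bool(directory.get("is_active")) and directory.get("auth_type") in ELIGIBLE_AUTH


class FormScanner(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.forms: list[dict] = []
        self.labels: dict[str, str] = {}  # <label for=...> id -> label text
        self._in_form = False
        self._label_for = None
        self._label_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._in_form = True
            # A bare "form" selector is ambiguous when a page has several forms.
            path = f"form#{a['id']}" if a.get("id") else "form"
            self.forms.append({"form_path": path, "fields": []})
        elif tag == "label":
            self._label_for, self._label_text = a.get("for") or None, []
        elif tag in FIELD_TAGS and self._in_form:
            ftype = (a.get("type") or "text").lower() if tag == "input" else tag
            name = a.get("name") or a.get("id")
            if name and ftype not in SKIP_TYPES:
                self.forms[-1]["fields"].append(
                    {"name": name, "type": ftype, "id": a.get("id"), "placeholder": a.get("placeholder")}
                )

    def handle_data(self, data):
        if self._label_for is not None:
            self._label_text.append(data)

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False
        elif tag == "label" and self._label_for is not None:
            self.labels[self._label_for] = " ".join("".join(self._label_text).split())
            self._label_for = None


def discover_fields(html: str) -> list[dict]:
    scanner = FormScanner()
    scanner.feed(html)
    scanner.close()
    for form in scanner.forms:
        for field in form["fields"]:
            # Labels are matched by id after parsing, so their position in the page does not matter.
            field["label"] = scanner.labels.get(field.pop("id") or "", "")
    return scanner.forms
```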
Step 6: Submit
Automated Submission
Configure the PRODUCT dict in submit_directories.py with your details (search for YOUR_ placeholders), then:
```bash
# Auto-submit to all discovered directories
.venv/bin/python submit_directories.py
```
The script uses heuristic field mapping (matching field names/labels to product data) and handles file uploads for logo/screenshot.
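Heuristic field mapping can be as simple as keyword matching over each discovered field's name, label and placeholder. This sketch shows the idea only; it is not the logic in submit_directories.py. The PRODUCT keys, the keyword lists and their first-match-wins order are all assumptions. File uploads are not covered.

```python
from __future__ import annotations

import re

# Hypothetical product data; the real PRODUCT dict lives in submit_directories.py.
PRODUCT = {
    "name": "YOUR_PRODUCT_NAME",
    "url": "https://YOUR_PRODUCT_URL",
    "email": "YOUR_EMAIL",
    "tagline": "YOUR_TAGLINE",
    "description": "YOUR_DESCRIPTION",
}

# Illustrative keywords, checked in order; the first hit wins.
KEYWORDS = [
    ("email", ["email", "e mail"]),
    ("url", ["url", "website", "link", "homepage"]),
    ("tagline", ["tagline", "slogan", "short description", "subtitle"]),
    ("description", ["description", "about", "details", "summary"]),
    ("name", ["name", "title", "product", "tool"]),
]


def map_field(field: dict) -> str | None:
    """Return the PRODUCT key a discovered field should receive, or None."""
    if field.get("type") in ("email", "url"):
        return field["type"]
    haystack = " ".join(str(field.get(k) or "") for k in ("name", "label", "placeholder"))
    tokens = re.sub(r"[_\-]+", " ", haystack).lower()  # tool_name -> "tool name"
    for key, words in KEYWORDS:
        if any(w in tokens for w in words):
            return key
    return None  # unmatched fields are left empty


def fill_values(fields: list[dict]) -> dict[str, str]:
    return {f["name"]: PRODUCT[key] for f in fields if f.get("name") and (key := map_field(f))}
```

Substring matching is crude ("your name" would receive the product name, "linkedin" would match "link"). That is one reason the manual browser route below is still needed.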
Manual Browser Submission (via Playwright MCP)
For directories that need manual interaction (captcha, OAuth, complex forms), use the Playwright browser tools:
1. Navigate to the submission URL
2. Take a snapshot to understand the page structure
3. Fill form fields using browser_fill_form or browser_type
4. Handle OAuth flows by switching tabs when Google login popups open
5. Upload files via browser_file_upload
6. Click submit and verify the confirmation
GitHub PR Submissions
Some directories accept submissions via GitHub PRs to awesome-lists:
1. Fork the repo: gh repo fork <owner>/<repo>
2. Clone it and create a branch
3. Add your product entry following the repo's format
4. Push and create a PR: gh pr create
Notes
Pipeline Scripts
- analyze_directories.py uses ThreadPoolExecutor with plain HTTP as a fast first pass
- cleanup_and_categorize.py triages errors (dead domains, invalid URLs, Facebook groups) and builds browser_check_list.json
- browser_verify.py uses async Playwright with 10 concurrent tabs; --recheck-unknown does a deep DOM pass on active unknowns only
- discover_forms.py uses async Playwright with 10 concurrent tabs; it extracts form field names, types, labels, and paths
- submit_directories.py uses async Playwright with 5 concurrent tabs; heuristic field mapping with file upload support
- All scripts are idempotent, so they are safe to re-run
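The "ThreadPoolExecutor with plain HTTP" first pass can be sketched as a thread pool over urllib requests. This is not the project's code. first_pass, http_fetch and the 20-worker default are assumptions. The fetch function is injectable, so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Fetch = Callable[[str], tuple[int, str, str]]  # url -> (status, final_url, html)


def http_fetch(url: str, timeout: float = 15.0) -> tuple[int, str, str]:
    """GET a URL and return (status, final URL after redirects, body prefix)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read(500_000).decode("utf-8", "replace")
            return resp.status, resp.geturl(), body
    except urllib.error.HTTPError as err:
        return err.code, url, ""  # 4xx/5xx still tell us something (e.g. 403)


def first_pass(urls: list[str], fetch: Fetch = http_fetch, workers: int = 20) -> dict[str, dict]:
    def check(url: str) -> tuple[str, dict]:
        try:
            status, final_url, html = fetch(url)
        except Exception as exc:  # DNS failure, timeout, TLS error, ...
            return url, {"ok": False, "error": type(exc).__name__}
        return url, {"ok": status < 400, "status": status, "final_url": final_url, "bytes": len(html)}

    # Threads suit this I/O-bound work: most time is spent waiting on sockets.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check, urls))
```

Anything this pass cannot settle (JavaScript-rendered pages, challenge pages) is what the slower Playwright stages are for.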
Common Submission Blockers
When evaluating or submitting to directories, watch for these issues:
| Blocker | Frequency | How to Detect |
|---|---|---|
| Paid listing required | ~20% | Look for a pricing page, Stripe/PayPal links, "$" on the submit page |
| reCAPTCHA / Turnstile | ~10% | iframe[src*=recaptcha] or [data-turnstile] elements |
| Broken captcha | ~2% | "Invalid site key" errors, disabled submit buttons |
| Login/account required | ~15% | Redirect to /login or /register on the submit URL |
| Business email required | ~3% | Rejects gmail/yahoo domains (e.g., SoftwareSuggest) |
| Reciprocal link required | ~5% | Old web directories require a backlink before listing |
| Newsletter-only forms | ~10% | Page looks like a submit form but is actually an email signup |
| Backend API broken | ~2% | Form submits but returns GraphQL/API errors |
| Domain parked/dead | ~8% | No content, parking page, DNS failure |
| Cloudflare blocked | ~3% | Challenge page, 403 errors |
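Several rows of the table can be checked mechanically from a fetched page. The sketch below covers the paid, captcha, broken-captcha, login, Cloudflare and parked/no-content signals. The regexes are rough adaptations of the "How to Detect" column. The parking phrases, the "Just a moment..." challenge title and the 200-character threshold are assumptions. Business-email and reciprocal-link requirements cannot be seen this way. detect_blockers is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

# Signals adapted from the "How to Detect" column; each is a hint, not proof.
HTML_SIGNALS = {
    "paid": re.compile(r"stripe\.com|paypal\.com|\$\s?\d", re.I),
    "captcha": re.compile(r"<iframe[^>]+src=[\"'][^\"']*recaptcha|data-turnstile", re.I),
    "broken_captcha": re.compile(r"invalid site key", re.I),
    "cloudflare": re.compile(r"just a moment\.\.\.", re.I),  # typical challenge-page title
    "parked": re.compile(r"domain is for sale|buy this domain", re.I),  # assumed phrasings
}


def detect_blockers(status: int, final_url: str, html: str) -> list[str]:
    found = [name for name, pattern in HTML_SIGNALS.items() if pattern.search(html)]
    # Login/account required: the submit URL redirected to /login or /register.
    if re.search(r"/(login|register)\b", urlparse(final_url).path.lower()):
        found.append("login_required")
    if status == 403 and "cloudflare" not in found:
        found.append("cloudflare")  # a bare 403 could also come from another firewall
    # Near-empty HTML may be a dead page, but JS-rendered sites look the same over plain HTTP.
    if not found and len(re.sub(r"<[^>]+>|\s+", "", html)) < 200:
        found.append("no_content")
    return found
```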
Automation Tips
- Simple HTML forms have the highest auto-submit success rate
- reCAPTCHA v3 (invisible) sometimes passes; v2 (checkbox) never passes automatically
- Google Forms are reliably automatable
- Rich text editors (TinyMCE, Quill) need browser_evaluate to set content
- Cloudinary/custom upload widgets often break automation; use the manual browser flow instead
- Cross-origin OAuth popups: switch tabs with the browser_tabs action to handle Google login
- Combobox/select fields: use browser_click on the dropdown, then click the option
- Multi-step forms: take a snapshot after each step to see the new fields
Submission Plan Structure
Each entry in submission_plan.json contains:
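```json
{
  "directory_name": "Example AI",
  "submission_url": "https://example.com/submit",
  "status": "discovered",
  "copy": {
    "title": "Product title variant",
    "description": "Product description variant for this directory."
  },
  "discovered_fields": [...],
  "form_path": "form#submit-form",
  "credentials": {
    "email": "YOUR_EMAIL",
    "name": "YOUR_NAME",
    "username": "YOUR_USERNAME",
    "password": "YOUR_PASSWORD"
  }
}
```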
Status values: discovered, submitted, skipped, skipped_paid, timeout, no_form_found, no_fields_matched, submit_timeout, captcha, cloudflare_blocked, domain_parked, skipped_login_required, deferred.
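Because the status set is closed, a quick report can count plan entries per status and flag anything outside the documented values. The sketch below assumes submission_plan.json is either a list of entries or a dict keyed by directory; its real top-level shape is not stated here. status_report and load_plan are hypothetical helpers.

```python
import json
from collections import Counter

STATUSES = {
    "discovered", "submitted", "skipped", "skipped_paid", "timeout", "no_form_found",
    "no_fields_matched", "submit_timeout", "captcha", "cloudflare_blocked",
    "domain_parked", "skipped_login_required", "deferred",
}


def status_report(plan: list[dict]) -> Counter:
    """Count plan entries per status; fail loudly on values outside the documented set."""
    unknown = {e.get("status") for e in plan} - STATUSES
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(map(str, unknown))}")
    return Counter(e["status"] for e in plan)


def load_plan(path: str = "submission_plan.json") -> list[dict]:
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file is either a list of entries or a dict keyed by directory.
    return list(data.values()) if isinstance(data, dict) else data
```

Usage: `print(status_report(load_plan()).most_common())`.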
Best ROI Directory Types (for AI/SaaS products)
1. AI tool directories with simple forms (FutureTools, SaaSHub, AItools.inc, etc.)
2. Startup directories with Google Form submissions
3. GitHub awesome-lists accepting PRs (free, high-quality backlinks)
4. NoCode/SaaS aggregators (NoCodeList, NoCodeDevs)
5. General web directories with DA ≥ 30 (for SEO value)
Security Note
Before pushing to GitHub, ensure all personal data is stripped:
- Confirm that submission_plan.json and the PRODUCT dict in submit_directories.py contain only YOUR_ placeholders, not your real details
- Never commit real emails, passwords, or API keys
- The .playwright-mcp/ folder may contain console logs with personal data; add it to .gitignore
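A pre-push check can automate the first two points. The sketch below flags two things: credential-like keys whose value is not a YOUR_ placeholder, and email addresses outside example.com. The key list, the example.com allow rule and the default file list are assumptions to adapt. The regexes only understand quoted "key": "value" pairs, as in JSON or a Python dict literal.

```python
import re
from pathlib import Path

# Quoted "key": "value" pairs for credential-like keys (JSON or Python dict literals).
CREDENTIAL = re.compile(r"[\"'](email|password|username|api_key)[\"']\s*:\s*[\"']([^\"']*)[\"']", re.I)
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_personal_data(text: str) -> list[str]:
    """Flag credential values that are not YOUR_ placeholders, plus real-looking emails."""
    hits = [f"credential:{k}" for k, v in CREDENTIAL.findall(text) if v and not v.startswith("YOUR_")]
    hits += [f"email:{e}" for e in EMAIL.findall(text) if not e.lower().endswith("@example.com")]
    return hits


def scan(paths=("submission_plan.json", "submit_directories.py")) -> dict[str, list[str]]:
    report = {}
    for path in map(Path, paths):
        if path.is_file():
            hits = find_personal_data(path.read_text(encoding="utf-8", errors="replace"))
            if hits:
                report[str(path)] = hits
    return report
```

A non-empty result from `scan()` is a reason to stop and scrub before pushing.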