Indeed Bright Data Skill
Search Indeed for job listings and company info via Bright Data's Web Scraper API. Designed for recruiting workflows on messaging platforms (Telegram, Signal) with smart defaults.
Prerequisites
- `BRIGHTDATA_API_KEY` environment variable must be set
- `curl` and `jq` must be available
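A preflight check along the following lines could catch a missing prerequisite before any API call is made. This is a minimal sketch, not code taken from the skill's scripts; the function name `check_prereqs` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the skill's scripts).
# Returns 0 when every prerequisite is met; otherwise reports what is
# missing on stderr and returns 1.
check_prereqs() {
  local missing=0 tool
  if [ -z "${BRIGHTDATA_API_KEY:-}" ]; then
    echo "BRIGHTDATA_API_KEY is not set" >&2
    missing=1
  fi
  for tool in curl jq; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool is not installed or not on PATH" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_prereqs || exit 1` at the top of a wrapper script would stop early with a readable message instead of failing partway through a search.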
Workflow Decision Tree
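```text
User wants job info?
├── Has a specific Indeed URL?
│   ├── Job URL (/viewjob?) → indeed_jobs_by_url.sh [SYNC, seconds]
│   ├── Company jobs URL (/cmp/*/jobs) → indeed_jobs_by_company.sh [ASYNC, minutes]
│   └── Company page URL (/cmp/*) → indeed_company_by_url.sh [SYNC, seconds]
├── Wants to search by keyword/location?
│   └── indeed_smart_search.sh [ASYNC, 3-8 minutes]
│       Agent says: "Searching now, this will take a few minutes."
│       If results < 5: auto-expand the date range, do NOT ask the user
│       Always pipe output through: indeed_format_results.sh --top 5
├── Wants company info?
│   ├── Has an Indeed company URL → indeed_company_by_url.sh [SYNC, seconds]
│   ├── Has a keyword → indeed_company_by_keyword.sh [ASYNC, minutes]
│   └── Has industry + state → indeed_company_by_industry.sh [ASYNC, minutes]
└── Check pending results? → indeed_check_pending.sh (run on heartbeat)
```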
Always prefer sync (URL-based) scripts when the user provides a URL — they return in seconds.
Scripts Reference
| Script | Purpose | Mode |
|---|---|---|
| `indeed_smart_search.sh` | Primary job search: keyword expansion, parallel queries, dedup, caching | ASYNC |
| `indeed_jobs_by_url.sh` | Collect job details by URL(s) | SYNC |
| `indeed_jobs_by_keyword.sh` | Low-level single-keyword job search (used by smart search internally) | ASYNC |
| `indeed_jobs_by_company.sh` | Discover jobs from company page | ASYNC |
| `indeed_company_by_url.sh` | Collect company info by URL | SYNC |
| `indeed_company_by_keyword.sh` | Discover companies by keyword | ASYNC |
| `indeed_company_by_industry.sh` | Discover companies by industry/state | ASYNC |
| `indeed_format_results.sh` | Format JSON results into summary, full, or CSV | Local |
| `indeed_check_pending.sh` | Check/fetch completed pending searches + auto-cleanup | Local/API |
| `indeed_poll_and_fetch.sh` | Poll async job and fetch results (internal) | API |
| `indeed_list_datasets.sh` | List available Indeed dataset IDs | API |
Quick Start
User says: "Find me cybersecurity jobs in New York"
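```bash
scripts/indeed_smart_search.sh "cybersecurity" US "New York, NY" \
  | scripts/indeed_format_results.sh --type jobs --top 5
```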
User says: "Get details on this job: https://www.indeed.com/viewjob?jk=abc123"
```bash
scripts/indeed_jobs_by_url.sh "https://www.indeed.com/viewjob?jk=abc123"
```
Behavior Rules (MANDATORY)
1. NEVER return raw JSON to the user. Always pipe results through `indeed_format_results.sh`.
2. NEVER ask "want me to try broader keywords?" if results < 5. The smart search expands automatically. Just tell the user: "Found only N results with recent postings, expanding search..."
3. NEVER present results older than 30 days without noting they may be stale.
4. When a discovery search is running, immediately acknowledge: "Searching Indeed now. This usually takes 3-5 minutes. I'll come back with results."
5. If the user asks a follow-up while a search is pending, run `indeed_check_pending.sh` first before starting a new search.
6. For Telegram: keep each message under 3500 characters. Use the `---SPLIT---` markers from `indeed_format_results.sh` to break output across messages.
7. Always show the total result count and offer to show more: "Showing top 5 of 23 results. Want to see more, or filter by salary/location?"
8. Default to "Last 7 days" for date filtering. If the user says "find me jobs" without a time preference, no extra flag is needed because this default is already applied.
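The Telegram rule above can be sketched as a small loop that cuts formatter output at the `---SPLIT---` markers and hands each chunk to a sender. This is only an illustration: it assumes each marker sits on a line of its own, which the source does not state, and the names `send_chunks` and `print_chunk` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the skill's scripts.
MAX_LEN=3500   # per-message budget from the Telegram rule
NL='
'

# Read text on stdin, split it at lines that consist solely of ---SPLIT---
# (assumed layout), and run the command given as arguments once per chunk,
# passing the chunk as the final argument.
send_chunks() {
  local chunk="" line
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$line" = "---SPLIT---" ]; then
      if [ -n "$chunk" ]; then "$@" "$chunk"; fi
      chunk=""
    else
      chunk="${chunk:+$chunk$NL}$line"
    fi
  done
  if [ -n "$chunk" ]; then "$@" "$chunk"; fi
}

# Example sender: warn when a chunk reaches the budget, then print it.
print_chunk() {
  if [ "${#1}" -ge "$MAX_LEN" ]; then
    echo "warning: chunk is ${#1} characters, limit is $MAX_LEN" >&2
  fi
  printf '%s\n' "$1"
}
```

In practice the sender would be whatever delivers one message to the chat, for example `scripts/indeed_format_results.sh --top 5 results.json | send_chunks print_chunk` with `print_chunk` swapped for the platform's send command.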
Smart Search (Primary Entry Point)
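```bash
# Basic search (expands keywords, deduplicates, defaults to last 7 days)
scripts/indeed_smart_search.sh "cybersecurity" US "Remote"

# All-time search
scripts/indeed_smart_search.sh "nursing" US "Texas" --all-time

# Skip keyword expansion
scripts/indeed_smart_search.sh "registered nurse" US "Ohio" --no-expand

# Bypass the 6-hour cache
scripts/indeed_smart_search.sh "data science" US "New York" --force
```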
Output is a JSON object of the form `{"meta": {...}, "results": [...]}`, where `meta` holds the query params, keywords used, and result counts.
Result Formatting
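```bash
# Telegram-friendly summary (default)
scripts/indeed_format_results.sh --type jobs --top 5 results.json

# CSV export
scripts/indeed_format_results.sh --type jobs --format csv results.json

# Company info
scripts/indeed_format_results.sh --type companies --top 5 companies.json

# Pipe from smart search
scripts/indeed_smart_search.sh "nurse" US "Ohio" | scripts/indeed_format_results.sh --top 5
```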
Heartbeat: Checking Pending Results
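```bash
scripts/indeed_check_pending.sh
# Output: {"completed":[...],"still_pending":[...],"failed":[...]}
```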
Run this periodically. If `~/.config/indeed-brightdata/pending.json` exists and is non-empty, check for completed results. Format completed results with `indeed_format_results.sh` and send them to the user.
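The "exists and is non-empty" test can be sketched as a guard that runs before each heartbeat check. The function name `has_pending` is hypothetical, and treating an empty JSON container (`[]` or `{}`) as "empty" is an assumption, since the layout of `pending.json` is not described here.

```bash
#!/usr/bin/env bash
# Hypothetical heartbeat guard; the default path is the one documented above.
PENDING_FILE="${PENDING_FILE:-$HOME/.config/indeed-brightdata/pending.json}"

# Returns 0 only when the file exists, has a non-zero size, and contains
# something other than an empty JSON container (assumption).
has_pending() {
  local file="${1:-$PENDING_FILE}" content
  [ -s "$file" ] || return 1
  content="$(tr -d '[:space:]' < "$file")"
  case "$content" in
    ""|"[]"|"{}") return 1 ;;
  esac
  return 0
}

# Usage on each heartbeat:
#   if has_pending; then scripts/indeed_check_pending.sh; fi
```

Skipping the call when nothing is pending keeps the heartbeat cheap, though running `indeed_check_pending.sh` unconditionally also triggers its auto-cleanup.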
Exit Codes
| Code | Meaning | Agent should... |
|---|---|---|
| 0 | Success: results on stdout | Format and present results |
| 1 | Error: something failed | Report the error |
| 2 | Deferred: still processing, saved to pending | Tell user "results are still processing, I'll follow up" |
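One way an agent wrapper might map these exit codes to actions is sketched below. The function name and the action labels are hypothetical, and treating any code other than 0 or 2 as an error is an assumption that goes slightly beyond the table.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from a script's exit code to the agent's next step.
# Prints one of: present, follow_up, report_error.
action_for_exit_code() {
  case "$1" in
    0) echo "present" ;;        # results are on stdout: format and show them
    2) echo "follow_up" ;;      # deferred: results were saved to pending
    *) echo "report_error" ;;   # 1, or any unexpected code (assumption)
  esac
}

# Usage sketch:
#   output="$(scripts/indeed_smart_search.sh "nurse" US "Ohio")" && status=0 || status=$?
#   case "$(action_for_exit_code "$status")" in
#     present)   printf '%s\n' "$output" | scripts/indeed_format_results.sh --top 5 ;;
#     follow_up) echo "Results are still processing, I'll follow up." ;;
#     *)         echo "The search failed." ;;
#   esac
```

Capturing the status with `&& status=0 || status=$?` keeps the wrapper safe under `set -e`, where a bare non-zero exit would otherwise abort the script before the code can be inspected.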
Caching
Smart search caches results for 6 hours. Identical searches (same keyword + location + country) return cached results without API calls. Use `--force` to bypass the cache. Old results (>7 days) are auto-cleaned by `indeed_check_pending.sh`.
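The cache rule can be illustrated with two small helpers: one builds a lookup key from the three fields that define an identical search, the other applies the 6-hour window. Both are hypothetical sketches; how the skill actually keys and timestamps entries in `history.json` is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the cache rule, not the skill's own code.
CACHE_TTL_SECONDS=$((6 * 60 * 60))   # 6 hours

# cache_key KEYWORD LOCATION COUNTRY
# Joins the fields that must all match for a search to count as identical.
cache_key() {
  printf '%s|%s|%s' "$1" "$2" "$3"
}

# is_fresh CACHED_AT [NOW]
# Both arguments are Unix epoch seconds; NOW defaults to the current time.
# Returns 0 while the entry is younger than the TTL.
is_fresh() {
  local cached_at="$1" now="${2:-$(date +%s)}"
  [ $((now - cached_at)) -lt "$CACHE_TTL_SECONDS" ]
}
```

Under this sketch a `--force` run would simply skip the `is_fresh` check and query the API regardless of the entry's age.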
Data Storage
All persistent data is stored under `~/.config/indeed-brightdata/`:
| File | Purpose | Lifecycle |
|---|---|---|
| `datasets.json` | Bright Data dataset IDs | Created on first `indeed_list_datasets.sh --save`, rarely changes |
| `pending.json` | In-flight async snapshots | Entries added on poll timeout (exit 2) or fire-and-forget (`--no-wait`), removed when fetched or after 24h |
| `history.json` | Search cache index | Entries added per search, auto-cleaned after 7 days |
| `results/*.json` | Fetched result data | Written when snapshots complete, auto-cleaned after 7 days |
Auto-cleanup runs at the start of `indeed_check_pending.sh`. No data is sent anywhere other than the Bright Data API.
Security
All scripts source `scripts/_lib.sh` for shared HTTP and persistence functions. The library:
- Makes requests to a single endpoint: `https://api.brightdata.com/datasets/v3`
- Uses one credential: `BRIGHTDATA_API_KEY` (sent via `Authorization: Bearer` header)
- Writes only to `~/.config/indeed-brightdata/` (see Data Storage above)
- Does not read other environment variables, contact other hosts, or modify files outside its config directory
For full API parameter details
See `references/api-reference.md` for complete endpoint documentation, response schemas, and country/domain mappings.
For keyword expansions
See `references/keyword-expansions.json` for the lookup table of keyword-to-job-title mappings.