apify

# Apify

Run any of the 17,000+ Actors on [Apify Store](https://apify.com/store) and retrieve structured results via the REST API.

Full OpenAPI spec: [openapi.json](openapi.json)

## Authentication

All requests need the `APIFY_TOKEN` env var. Use it as a Bearer token:

```bash
-H "Authorization: Bearer $APIFY_TOKEN"
```

Base URL: `https://api.apify.com`

## Core workflow

### 1. Find the right Actor

Search the Apify Store by keyword:

```bash
curl -s "https://api.apify.com/v2/store?search=web+scraper&limit=5" \
  -H "Authorization: Bearer $APIFY_TOKEN" | jq '.data.items[] | {name: (.username + "/" + .name), title, description}'
```

Actors are identified by `username~name` (tilde) in API paths, e.g. `apify~web-scraper`.

### 2. Get Actor README and input schema

Before running an Actor, fetch its default build to get the README (usage docs) and input schema (expected JSON fields):

```bash
curl -s "https://api.apify.com/v2/acts/apify~web-scraper/builds/default" \
  -H "Authorization: Bearer $APIFY_TOKEN" | jq '.data | {readme, inputSchema}'
```

`inputSchema` is a JSON-stringified object: parse it to see required/optional fields, types, defaults, and descriptions. Use this to construct valid input for the run.

You can also get the Actor's per-build OpenAPI spec (no auth required):

```bash
curl -s "https://api.apify.com/v2/acts/apify~web-scraper/builds/default/openapi.json"
```

### 3. Run an Actor (async, recommended for most cases)

Start the Actor and get the run object back immediately:

```bash
curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":10}'
```

The response includes `data.id` (run ID), `data.defaultDatasetId`, and `data.status`.

Optional query params: `?timeout=300&memory=4096&maxItems=100&waitForFinish=60`

- `waitForFinish` (0-60): seconds the API waits before returning.
  Useful to avoid polling for short runs.

### 4. Poll run status

```bash
curl -s "https://api.apify.com/v2/actor-runs/RUN_ID?waitForFinish=60" \
  -H "Authorization: Bearer $APIFY_TOKEN" | jq '.data | {status, defaultDatasetId}'
```

Terminal statuses: `SUCCEEDED`, `FAILED`, `ABORTED`, `TIMED-OUT`.

### 5. Get results

**Dataset items** (most common; structured scraped data):

```bash
curl -s "https://api.apify.com/v2/datasets/DATASET_ID/items?clean=true&limit=100" \
  -H "Authorization: Bearer $APIFY_TOKEN"
```

Or directly from the run (shortcut; same parameters):

```bash
curl -s "https://api.apify.com/v2/actor-runs/RUN_ID/dataset/items?clean=true&limit=100" \
  -H "Authorization: Bearer $APIFY_TOKEN"
```

Params: `format` (`json`|`csv`|`jsonl`|`xml`|`xlsx`|`rss`), `fields`, `omit`, `limit`, `offset`, `clean`, `desc`.

**Key-value store record** (screenshots, HTML, OUTPUT). Use the run's `data.defaultKeyValueStoreId` as `STORE_ID`:

```bash
curl -s "https://api.apify.com/v2/key-value-stores/STORE_ID/records/OUTPUT" \
  -H "Authorization: Bearer $APIFY_TOKEN"
```

**Run log:**

```bash
curl -s "https://api.apify.com/v2/logs/RUN_ID" \
  -H "Authorization: Bearer $APIFY_TOKEN"
```

### 6. Run an Actor synchronously (short-running Actors only)

For Actors that finish within 300 seconds, get dataset items in one call:

```bash
curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/run-sync-get-dataset-items?timeout=120" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":5}'
```

This returns the dataset items array directly (not wrapped in `data`). If the run exceeds 300s, the API responds with `408` instead.

Alternative: `/run-sync` returns the key-value store `OUTPUT` record instead of dataset items.
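For large datasets, the `offset` and `limit` params from step 5 can be looped to fetch items page by page rather than in one response. This is a minimal sketch, not an official client: `fetch_page` and `dump_dataset` are hypothetical helper names, it assumes `APIFY_TOKEN` is set, and treating an empty or short page as the end of the dataset is an assumption of this example.

```bash
# Sketch: page through a dataset with the documented `offset`/`limit` params.
# Assumes APIFY_TOKEN is set. fetch_page and dump_dataset are hypothetical
# helper names for this example, not part of any Apify tool.

# Print one page of items as JSON Lines (one item per line).
# -f makes curl fail on HTTP errors instead of printing the error body.
fetch_page() {  # $1 = dataset ID, $2 = offset, $3 = limit
  curl -sf "https://api.apify.com/v2/datasets/$1/items?clean=true&format=jsonl&offset=$2&limit=$3" \
    -H "Authorization: Bearer $APIFY_TOKEN"
}

# Stream every item of a dataset to stdout, one page at a time.
dump_dataset() {  # $1 = dataset ID, $2 = page size (default 1000)
  local dataset_id="$1" page_size="${2:-1000}" offset=0 page count
  while true; do
    page=$(fetch_page "$dataset_id" "$offset" "$page_size") || return 1
    [ -n "$page" ] || break                      # empty page: nothing left
    printf '%s\n' "$page"
    count=$(printf '%s\n' "$page" | grep -c .)   # items on this page
    offset=$((offset + count))
    [ "$count" -eq "$page_size" ] || break       # short page: assume it was the last
  done
}
```

For example, `dump_dataset DATASET_ID 1000 > items.jsonl` writes every item as one JSON object per line. `format=jsonl` is used so that counting lines counts items.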
## Quick recipes

### Scrape a website

```bash
curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/run-sync-get-dataset-items?timeout=120" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":20}'
```

### Google search

```bash
curl -s -X POST "https://api.apify.com/v2/acts/apify~google-search-scraper/run-sync-get-dataset-items?timeout=120" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"queries":"site:example.com openai","maxPagesPerQuery":1}'
```

### Long-running Actor (async with polling)

```bash
# 1. Start
RUN=$(curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs?waitForFinish=60" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://example.com"}],"maxPagesPerCrawl":500}')
RUN_ID=$(echo "$RUN" | jq -r '.data.id')
# Stop early if the run did not start (e.g. bad token or unknown Actor)
if [ -z "$RUN_ID" ] || [ "$RUN_ID" = "null" ]; then
  echo "Run did not start: $RUN" >&2; exit 1
fi

# 2. Poll until done
while true; do
  STATUS=$(curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID?waitForFinish=60" \
    -H "Authorization: Bearer $APIFY_TOKEN" | jq -r '.data.status')
  echo "Status: $STATUS"
  case "$STATUS" in
    SUCCEEDED|FAILED|ABORTED|TIMED-OUT) break;;
    ""|null) echo "Could not read run status" >&2; break;;  # avoid looping forever on errors
  esac
done

# 3. Fetch results (if the run did not succeed, check the run log)
[ "$STATUS" = "SUCCEEDED" ] || echo "Run ended with status $STATUS; see /v2/logs/$RUN_ID" >&2
curl -s "https://api.apify.com/v2/actor-runs/$RUN_ID/dataset/items?clean=true" \
  -H "Authorization: Bearer $APIFY_TOKEN"
```

### Abort a run

```bash
curl -s -X POST "https://api.apify.com/v2/actor-runs/RUN_ID/abort" \
  -H "Authorization: Bearer $APIFY_TOKEN"
```

## Paid / rental Actors

Some Actors require a monthly subscription before they can be run. If the API returns a permissions or payment error for an Actor, ask the user to subscribe manually via the Apify Console:

```
https://console.apify.com/actors/ACTOR_ID
```

Replace `ACTOR_ID` with the Actor's ID (e.g. `AhEsMsQyLfHyMLaxz`). The user needs to click **Start** on that page to activate the subscription. Most rental Actors offer a free trial period set by the developer.
You can get the Actor ID from the store search response (`data.items[].id`) or from `GET /v2/acts/username~name` (`data.id`).

## Error handling

- **401**: `APIFY_TOKEN` is missing or invalid.
- **404 Actor not found**: check the `username~name` format (tilde, not slash). Browse https://apify.com/store.
- **400 run-failed**: check `GET /v2/logs/RUN_ID` for details.
- **402/403 payment required**: the Actor likely requires a subscription. See "Paid / rental Actors" above.
- **408 run-timeout-exceeded**: sync endpoints have a 300s limit. Use the async workflow instead.
- **429 rate-limit-exceeded**: retry with exponential backoff (start at 500ms, double each time).

## Additional resources

- API docs (LLM-friendly): https://docs.apify.com/api/v2.md
- OpenAPI spec: [openapi.json](openapi.json)
- Apify Store (browse Actors): https://apify.com/store
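The 429 guidance under Error handling (start at 500ms, double each time) can be sketched as a small wrapper around `curl`. This is illustrative only: `backoff_ms`, `apify_get`, and the `MAX_RETRIES` cap (default 5) are hypothetical names and values chosen for this example. It assumes `APIFY_TOKEN` is set, handles GET requests only, and relies on a `sleep` that accepts fractional seconds (GNU and macOS/BSD `sleep` do).

```bash
# Sketch: GET an Apify API path, retrying on HTTP 429 with exponential backoff
# (500 ms, then 1 s, 2 s, ...). Assumes APIFY_TOKEN is set. backoff_ms,
# apify_get and MAX_RETRIES are hypothetical names for this example.

# Delay in milliseconds before retry number $1 (0-based): 500 * 2^n.
backoff_ms() {
  echo $(( 500 * (1 << $1) ))
}

# Print the response body; return non-zero on HTTP errors or once retries run out.
apify_get() {  # $1 = API path, e.g. /v2/actor-runs/RUN_ID
  local attempt=0 max="${MAX_RETRIES:-5}" resp code body ms
  while true; do
    # -w appends the HTTP status code on its own line after the body
    resp=$(curl -s -w '\n%{http_code}' "https://api.apify.com$1" \
      -H "Authorization: Bearer $APIFY_TOKEN") || return 1
    code=$(printf '%s\n' "$resp" | tail -n 1)   # last line: status code
    body=$(printf '%s\n' "$resp" | sed '$d')    # all other lines: body
    if [ "$code" != "429" ] || [ "$attempt" -ge "$max" ]; then
      printf '%s\n' "$body"
      [ "$code" -lt 400 ]   # function's exit status: success only for non-error codes
      return
    fi
    ms=$(backoff_ms "$attempt")
    sleep "$(printf '%d.%03d' $((ms / 1000)) $((ms % 1000)))"
    attempt=$((attempt + 1))
  done
}
```

For example, `apify_get /v2/actor-runs/RUN_ID | jq -r '.data.status'`. The retry cap keeps a persistently rate-limited call from looping forever; once it is reached, the last response body is printed and the function returns non-zero.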
