Building LLM-Powered Applications with Claude
This skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.
Defaults
Unless the user requests otherwise:
For the Claude model version, please use Claude Opus 4.6, which you can access via the exact model string claude-opus-4-6. Please default to using adaptive thinking (thinking: {type: "adaptive"}) for anything remotely complicated. And finally, please default to streaming for any request that may involve long input, long output, or high max_tokens — it prevents hitting request timeouts. Use the SDK's .get_final_message() / .finalMessage() helper to get the complete response if you don't need to handle individual stream events
Language Detection
Before reading code examples, determine which language the user is working in:
- 1. Look at project files to infer the language:
- *.py, requirements.txt, pyproject.toml, setup.py, Pipfile → Python — read from python/
- *.ts, *.tsx, package.json, tsconfig.json → TypeScript — read from typescript/
- *.js, *.jsx (no .ts files present) → TypeScript — JS uses the same SDK, read from typescript/
- *.java, pom.xml, build.gradle → Java — read from java/
- *.kt, *.kts, build.gradle.kts → Java — Kotlin uses the Java SDK, read from java/
- *.scala, build.sbt → Java — Scala uses the Java SDK, read from java/
- *.go, go.mod → Go — read from go/
- *.rb, Gemfile → Ruby — read from ruby/
- *.cs, *.csproj → C# — read from csharp/
- *.php, composer.json → PHP — read from INLINECODE42
- 2. If multiple languages detected (e.g., both Python and TypeScript files):
- Check which language the user's current file or question relates to
- If still ambiguous, ask: "I detected both Python and TypeScript files. Which language are you using for the Claude API integration?"
- 3. If language can't be inferred (empty project, no source files, or unsupported language):
- Use AskUserQuestion with options: Python, TypeScript, Java, Go, Ruby, cURL/raw HTTP, C#, PHP
- If AskUserQuestion is unavailable, default to Python examples and note: "Showing Python examples. Let me know if you need a different language."
- 4. If unsupported language detected (Rust, Swift, C++, Elixir, etc.):
- Suggest cURL/raw HTTP examples from curl/ and note that community SDKs may exist
- Offer to show Python or TypeScript examples as reference implementations
- 5. If user needs cURL/raw HTTP examples, read from
curl/.
Language-Specific Feature Support
| Language | Tool Runner | Agent SDK | Notes |
|---|
| Python | Yes (beta) | Yes | Full support — @beta_tool decorator |
| TypeScript |
Yes (beta) | Yes | Full support —
betaZodTool + Zod |
| Java | Yes (beta) | No | Beta tool use with annotated classes |
| Go | Yes (beta) | No |
BetaToolRunner in
toolrunner pkg |
| Ruby | Yes (beta) | No |
BaseTool +
tool_runner in beta |
| cURL | N/A | N/A | Raw HTTP, no SDK features |
| C# | No | No | Official SDK |
| PHP | No | No | Official SDK |
Which Surface Should I Use?
Start simple. Default to the simplest tier that meets your needs. Single API calls and workflows handle most use cases — only reach for agents when the task genuinely requires open-ended, model-driven exploration.
| Use Case | Tier | Recommended Surface | Why |
|---|
| Classification, summarization, extraction, Q&A | Single LLM call | Claude API | One request, one response |
| Batch processing or embeddings |
Single LLM call |
Claude API | Specialized endpoints |
| Multi-step pipelines with code-controlled logic | Workflow |
Claude API + tool use | You orchestrate the loop |
| Custom agent with your own tools | Agent |
Claude API + tool use | Maximum flexibility |
| AI agent with file/web/terminal access | Agent |
Agent SDK | Built-in tools, safety, and MCP support |
| Agentic coding assistant | Agent |
Agent SDK | Designed for this use case |
| Want built-in permissions and guardrails | Agent |
Agent SDK | Safety features included |
Note: The Agent SDK is for when you want built-in file/web/terminal tools, permissions, and MCP out of the box. If you want to build an agent with your own tools, Claude API is the right choice — use the tool runner for automatic loop handling, or the manual loop for fine-grained control (approval gates, custom logging, conditional execution).
Decision Tree
CODEBLOCK0
Should I Build an Agent?
Before choosing the agent tier, check all four criteria:
- - Complexity — Is the task multi-step and hard to fully specify in advance? (e.g., "turn this design doc into a PR" vs. "extract the title from this PDF")
- Value — Does the outcome justify higher cost and latency?
- Viability — Is Claude capable at this task type?
- Cost of error — Can errors be caught and recovered from? (tests, review, rollback)
If the answer is "no" to any of these, stay at a simpler tier (single call or workflow).
Architecture
Everything goes through POST /v1/messages. Tools and output constraints are features of this single endpoint — not separate APIs.
User-defined tools — You define tools (via decorators, Zod schemas, or raw JSON), and the SDK's tool runner handles calling the API, executing your functions, and looping until Claude is done. For full control, you can write the loop manually.
Server-side tools — Anthropic-hosted tools that run on Anthropic's infrastructure. Code execution is fully server-side (declare it in tools, Claude runs code automatically). Computer use can be server-hosted or self-hosted.
Structured outputs — Constrains the Messages API response format (output_config.format) and/or tool parameter validation (strict: true). The recommended approach is client.messages.parse() which validates responses against your schema automatically. Note: the old output_format parameter is deprecated; use output_config: {format: {...}} on messages.create().
Supporting endpoints — Batches (POST /v1/messages/batches), Files (POST /v1/files), Token Counting, and Models (GET /v1/models, GET /v1/models/{id} — live capability/context-window discovery) feed into or support Messages API requests.
Current Models (cached: 2026-02-17)
| Model | Model ID | Context | Input $/1M | Output $/1M |
|---|
| Claude Opus 4.6 | INLINECODE63 | 200K (1M beta) | $5.00 | $25.00 |
| Claude Sonnet 4.6 |
claude-sonnet-4-6 | 200K (1M beta) | $3.00 | $15.00 |
| Claude Haiku 4.5 |
claude-haiku-4-5 | 200K | $1.00 | $5.00 |
ALWAYS use claude-opus-4-6 unless the user explicitly names a different model. This is non-negotiable. Do not use claude-sonnet-4-6, claude-sonnet-4-5, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours.
CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes. For example, use claude-sonnet-4-5, never claude-sonnet-4-5-20250514 or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read shared/models.md for the exact ID — do not construct one yourself.
A note: if any of the model strings above look unfamiliar to you, that's to be expected — that just means they were released after your training data cutoff. Rest assured they are real models; we wouldn't mess with you like that.
Live capability lookup: The table above is cached. When the user asks "what's the context window for X", "does X support vision/thinking/effort", or "which models support Y", query the Models API (client.models.retrieve(id) / client.models.list()) — see shared/models.md for the field reference and capability-filter examples.
Thinking & Effort (Quick Reference)
Opus 4.6 — Adaptive thinking (recommended): Use thinking: {type: "adaptive"}. Claude dynamically decides when and how much to think. No budget_tokens needed — budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6 and must not be used. Adaptive thinking also automatically enables interleaved thinking (no beta header needed). When the user asks for "extended thinking", a "thinking budget", or budget_tokens: always use Opus 4.6 with thinking: {type: "adaptive"}. The concept of a fixed token budget for thinking is deprecated — adaptive thinking replaces it. Do NOT use budget_tokens and do NOT switch to an older model.
Effort parameter (GA, no beta header): Controls thinking depth and overall token spend via output_config: {effort: "low"|"medium"|"high"|"max"} (inside output_config, not top-level). Default is high (equivalent to omitting it). max is Opus 4.6 only. Works on Opus 4.5, Opus 4.6, and Sonnet 4.6. Will error on Sonnet 4.5 / Haiku 4.5. Combine with adaptive thinking for the best cost-quality tradeoffs. Use low for subagents or simple tasks; max for the deepest reasoning.
Sonnet 4.6: Supports adaptive thinking (thinking: {type: "adaptive"}). budget_tokens is deprecated on Sonnet 4.6 — use adaptive thinking instead.
Older models (only if explicitly requested): If the user specifically asks for Sonnet 4.5 or another older model, use thinking: {type: "enabled", budget_tokens: N}. budget_tokens must be less than max_tokens (minimum 1024). Never choose an older model just because the user mentions budget_tokens — use Opus 4.6 with adaptive thinking instead.
Compaction (Quick Reference)
Beta, Opus 4.6 and Sonnet 4.6. For long-running conversations that may exceed the 200K context window, enable server-side compaction. The API automatically summarizes earlier context when it approaches the trigger threshold (default: 150K tokens). Requires beta header compact-2026-01-12.
Critical: Append response.content (not just the text) back to your messages on every turn. Compaction blocks in the response must be preserved — the API uses them to replace the compacted history on the next request. Extracting only the text string and appending that will silently lose the compaction state.
See {lang}/claude-api/README.md (Compaction section) for code examples. Full docs via WebFetch in shared/live-sources.md.
Reading Guide
After detecting the language, read the relevant files based on what the user needs:
Quick Task Reference
Single text classification/summarization/extraction/Q&A:
→ Read only INLINECODE97
Chat UI or real-time response display:
→ Read {lang}/claude-api/README.md + INLINECODE99
Long-running conversations (may exceed context window):
→ Read {lang}/claude-api/README.md — see Compaction section
Function calling / tool use / agents:
→ Read {lang}/claude-api/README.md + shared/tool-use-concepts.md + INLINECODE103
Batch processing (non-latency-sensitive):
→ Read {lang}/claude-api/README.md + INLINECODE105
File uploads across multiple requests:
→ Read {lang}/claude-api/README.md + INLINECODE107
Agent with built-in tools (file/web/terminal):
→ Read {lang}/agent-sdk/README.md + INLINECODE109
Claude API (Full File Reference)
Read the language-specific Claude API folder ({language}/claude-api/):
- 1.
{language}/claude-api/README.md — Read this first. Installation, quick start, common patterns, error handling. shared/tool-use-concepts.md — Read when the user needs function calling, code execution, memory, or structured outputs. Covers conceptual foundations.{language}/claude-api/tool-use.md — Read for language-specific tool use code examples (tool runner, manual loop, code execution, memory, structured outputs).{language}/claude-api/streaming.md — Read when building chat UIs or interfaces that display responses incrementally.{language}/claude-api/batches.md — Read when processing many requests offline (not latency-sensitive). Runs asynchronously at 50% cost.{language}/claude-api/files-api.md — Read when sending the same file across multiple requests without re-uploading.shared/error-codes.md — Read when debugging HTTP errors or implementing error handling.shared/live-sources.md — WebFetch URLs for fetching the latest official documentation.
Note: For Java, Go, Ruby, C#, PHP, and cURL — these have a single file each covering all basics. Read that file plus shared/tool-use-concepts.md and shared/error-codes.md as needed.
Agent SDK
Read the language-specific Agent SDK folder ({language}/agent-sdk/). Agent SDK is available for Python and TypeScript only.
- 1.
{language}/agent-sdk/README.md — Installation, quick start, built-in tools, permissions, MCP, hooks. {language}/agent-sdk/patterns.md — Custom tools, hooks, subagents, MCP integration, session resumption.shared/live-sources.md — WebFetch URLs for current Agent SDK docs.
When to Use WebFetch
Use WebFetch to get the latest documentation when:
- - User asks for "latest" or "current" information
- Cached data seems incorrect
- User asks about features not covered here
Live documentation URLs are in shared/live-sources.md.
Common Pitfalls
- - Don't truncate inputs when passing files or content to the API. If the content is too long to fit in the context window, notify the user and discuss options (chunking, summarization, etc.) rather than silently truncating.
- Opus 4.6 / Sonnet 4.6 thinking: Use
thinking: {type: "adaptive"} — do NOT use budget_tokens (deprecated on both Opus 4.6 and Sonnet 4.6). For older models, budget_tokens must be less than max_tokens (minimum 1024). This will throw an error if you get it wrong. - Opus 4.6 prefill removed: Assistant message prefills (last-assistant-turn prefills) return a 400 error on Opus 4.6. Use structured outputs (
output_config.format) or system prompt instructions to control response format instead. max_tokens defaults: Don't lowball max_tokens — hitting the cap truncates output mid-thought and requires a retry. For non-streaming requests, default to ~16000 (keeps responses under SDK HTTP timeouts). For streaming requests, default to ~64000 (timeouts aren't a concern, so give the model room). Only go lower when you have a hard reason: classification (~256), cost caps, or deliberately short outputs.- 128K output tokens: Opus 4.6 supports up to 128K
max_tokens, but the SDKs require streaming for values that large to avoid HTTP timeouts. Use .stream() with .get_final_message() / .finalMessage(). - Tool call JSON parsing (Opus 4.6): Opus 4.6 may produce different JSON string escaping in tool call
input fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with json.loads() / JSON.parse() — never do raw string matching on the serialized input. - Structured outputs (all models): Use
output_config: {format: {...}} instead of the deprecated output_format parameter on messages.create(). This is a general API change, not 4.6-specific. - Don't reimplement SDK functionality: The SDK provides high-level helpers — use them instead of building from scratch. Specifically: use
stream.finalMessage() instead of wrapping .on() events in new Promise(); use typed exception classes (Anthropic.RateLimitError, etc.) instead of string-matching error messages; use SDK types (Anthropic.MessageParam, Anthropic.Tool, Anthropic.Message, etc.) instead of redefining equivalent interfaces. - Don't define custom types for SDK data structures: The SDK exports types for all API objects. Use
Anthropic.MessageParam for messages, Anthropic.Tool for tool definitions, Anthropic.ToolUseBlock / Anthropic.ToolResultBlockParam for tool results, Anthropic.Message for responses. Defining your own interface ChatMessage { role: string; content: unknown } duplicates what the SDK already provides and loses type safety. - Report and document output: For tasks that produce reports, documents, or visualizations, the code execution sandbox has
python-docx, python-pptx, matplotlib, pillow, and pypdf pre-installed. Claude can generate formatted files (DOCX, PDF, charts) and return them via the Files API — consider this for "report" or "document" type requests instead of plain stdout text.
使用Claude构建基于LLM的应用程序
本技能帮助您使用Claude构建基于LLM的应用程序。根据您的需求选择合适的接口,检测项目语言,然后阅读相关的语言特定文档。
默认设置
除非用户另有要求:
对于Claude模型版本,请使用Claude Opus 4.6,您可以通过确切的模型字符串claude-opus-4-6访问。对于任何稍微复杂的内容,请默认使用自适应思考(thinking: {type: adaptive})。最后,对于任何可能涉及长输入、长输出或高maxtokens的请求,请默认使用流式传输——这可以防止请求超时。如果您不需要处理单个流事件,请使用SDK的.getfinal_message() / .finalMessage()辅助方法获取完整响应。
语言检测
在阅读代码示例之前,确定用户正在使用的语言:
- 1. 查看项目文件以推断语言:
- *.py、requirements.txt、pyproject.toml、setup.py、Pipfile → Python — 从python/读取
- .ts、.tsx、package.json、tsconfig.json → TypeScript — 从typescript/读取
- .js、.jsx(不存在.ts文件) → TypeScript — JS使用相同的SDK,从typescript/读取
- *.java、pom.xml、build.gradle → Java — 从java/读取
- .kt、.kts、build.gradle.kts → Java — Kotlin使用Java SDK,从java/读取
- *.scala、build.sbt → Java — Scala使用Java SDK,从java/读取
- *.go、go.mod → Go — 从go/读取
- *.rb、Gemfile → Ruby — 从ruby/读取
- .cs、.csproj → C# — 从csharp/读取
- *.php、composer.json → PHP — 从php/读取
- 2. 如果检测到多种语言(例如,同时存在Python和TypeScript文件):
- 检查用户当前文件或问题涉及哪种语言
- 如果仍然不明确,询问:我检测到了Python和TypeScript文件。您使用哪种语言进行Claude API集成?
- 3. 如果无法推断语言(空项目、没有源文件或不受支持的语言):
- 使用AskUserQuestion并提供选项:Python、TypeScript、Java、Go、Ruby、cURL/原始HTTP、C#、PHP
- 如果AskUserQuestion不可用,默认使用Python示例并注明:显示Python示例。如果您需要其他语言,请告知。
- 4. 如果检测到不受支持的语言(Rust、Swift、C++、Elixir等):
- 建议使用curl/中的cURL/原始HTTP示例,并说明可能存在社区SDK
- 提供Python或TypeScript示例作为参考实现
- 5. 如果用户需要cURL/原始HTTP示例,从curl/读取。
语言特定功能支持
| 语言 | 工具运行器 | Agent SDK | 备注 |
|---|
| Python | 是(测试版) | 是 | 完全支持 — @beta_tool装饰器 |
| TypeScript |
是(测试版) | 是 | 完全支持 — betaZodTool + Zod |
| Java | 是(测试版) | 否 | 使用带注解类的测试版工具 |
| Go | 是(测试版) | 否 | toolrunner包中的BetaToolRunner |
| Ruby | 是(测试版) | 否 | 测试版中的BaseTool + tool_runner |
| cURL | 不适用 | 不适用 | 原始HTTP,无SDK功能 |
| C# | 否 | 否 | 官方SDK |
| PHP | 否 | 否 | 官方SDK |
应该使用哪个接口?
从简单开始。 默认使用满足您需求的最简单层级。单个API调用和工作流可以处理大多数用例——只有当任务真正需要开放式、模型驱动的探索时才使用Agent。
| 用例 | 层级 | 推荐接口 | 原因 |
|---|
| 分类、摘要、提取、问答 | 单次LLM调用 | Claude API | 一个请求,一个响应 |
| 批处理或嵌入 |
单次LLM调用 |
Claude API | 专用端点 |
| 代码控制逻辑的多步骤流水线 | 工作流 |
Claude API + 工具使用 | 您编排循环 |
| 使用您自己工具的自定义Agent | Agent |
Claude API + 工具使用 | 最大灵活性 |
| 具有文件/网络/终端访问权限的AI Agent | Agent |
Agent SDK | 内置工具、安全性和MCP支持 |
| Agent编码助手 | Agent |
Agent SDK | 为此用例设计 |
| 想要内置权限和护栏 | Agent |
Agent SDK | 包含安全功能 |
注意: 当您想要开箱即用的内置文件/网络/终端工具、权限和MCP时,使用Agent SDK。如果您想使用自己的工具构建Agent,Claude API是正确的选择——使用工具运行器进行自动循环处理,或使用手动循环进行精细控制(审批门、自定义日志记录、条件执行)。
决策树
您的应用程序需要什么?
- 1. 单次LLM调用(分类、摘要、提取、问答)
└── Claude API — 一个请求,一个响应
- 2. Claude是否需要在其工作过程中读取/写入文件、浏览网页或运行shell命令?
(不是:您的应用程序读取文件并将其交给Claude — 而是Claude本身是否需要发现和访问文件/网络/shell?)
└── 是 → Agent SDK — 内置工具,不要重新实现它们
示例:扫描代码库查找错误、汇总目录中的每个文件、
使用子Agent查找错误、通过网络搜索研究主题
- 3. 工作流(多步骤、代码编排、使用您自己的工具)
└── 带工具使用的Claude API — 您控制循环
- 4. 开放式Agent(模型决定自己的轨迹,使用您自己的工具)
└── Claude API Agent循环(最大灵活性)
我应该构建Agent吗?
在选择Agent层级之前,检查所有四个标准:
- - 复杂性 — 任务是否多步骤且难以提前完全指定?(例如,将此设计文档转化为PR vs. 从PDF中提取标题)
- 价值 — 结果是否值得更高的成本和延迟?
- 可行性 — Claude在此任务类型上是否胜任?
- 错误成本 — 错误能否被捕获并从中恢复?(测试、审查、回滚)
如果任何一项答案为否,请保持在更简单的层级(单次调用或工作流)。
架构
一切通过POST /v1/messages进行。工具和输出约束是这个单一端点的功能——而不是单独的API。
用户定义的工具 — 您定义工具(通过装饰器、Zod模式或原始JSON),SDK的工具运行器处理调用API、执行您的函数并循环直到Claude完成。要完全控制,您可以手动编写循环。
服务器端工具 — Anthropic托管的工具,在Anthropic的基础设施上运行。代码执行完全在服务器端(在tools中声明,Claude自动运行代码)。计算机使用可以是服务器托管或自托管。
结构化输出 — 约束Messages API响应格式(outputconfig.format)和/或工具参数验证(strict: true)。推荐的方法是client.messages.parse(),它会自动根据您的模式验证响应。注意:旧的outputformat参数已弃用;在messages.create()上使用output_config: {format: {...}}。
支持端点 — 批处理(POST /v1/messages/batches)、文件(POST /v1/files)、令牌计数和模型(GET /v1/models、GET /v1/models/{id} — 实时能力/上下文窗口发现)为Messages API请求提供支持。
当前模型(缓存日期:2026-02-17)
| 模型 | 模型ID | 上下文 | 输入 $/1M | 输出 $/1M |
| ---------------- | ------------------- | -------------- | ---------