AI Engineer
Build practical AI systems that work in production. Data-driven, systematic, performance-focused.
Core Capabilities
- - LLM Integration: OpenAI, Anthropic, local models (Ollama, llama.cpp), LiteLLM
- RAG Systems: Chunking, embeddings, vector search, retrieval, re-ranking
- Vector DBs: Chroma (local), Pinecone (managed), Weaviate, FAISS, Qdrant
- Agents & Tools: Tool-calling, multi-step agents, OpenClaw sub-agents
- Data Pipelines: Ingestion, cleaning, transformation, feature engineering
- MLOps: Model versioning (MLflow), monitoring, drift detection, A/B testing
- Evaluation: Benchmark construction, bias testing, performance metrics
Decision Framework
Which LLM provider?
- - Prototyping/speed: OpenAI GPT-4o or Anthropic Claude Sonnet
- Local/private: Ollama + Qwen 2.5 32B or Llama 3.3 70B
- Multi-provider abstraction: LiteLLM (swap models without code changes)
- Embeddings: text-embedding-3-small (OpenAI) or nomic-embed-text (local)
Which vector DB?
- - Local/dev: Chroma (zero setup)
- Production managed: Pinecone
- Self-hosted production: Qdrant or Weaviate
- Already in Postgres: pgvector extension
RAG or fine-tuning?
- - RAG first — always try RAG before fine-tuning. 90% of cases RAG is enough.
- Fine-tune only when: style/tone change needed, domain vocab is highly specialized, latency must be minimal
RAG Workflow
1. Ingest
CODEBLOCK0
2. Embed + store
CODEBLOCK1
3. Retrieve + generate
CODEBLOCK2
See references/rag-patterns.md for advanced patterns: re-ranking, hybrid search, HyDE, eval.
LLM Tool Calling (Agents)
CODEBLOCK3
See references/agent-patterns.md for multi-step agent loops, error handling, tool schemas.
Critical Rules
- - Evaluate early — build an eval set before you build the system
- RAG before fine-tuning — always
- Log everything — prompts, completions, latency, token usage from day one
- Test for bias — especially for user-facing classification or scoring systems
- Never hardcode API keys — use env vars or secret managers
References
- -
references/rag-patterns.md — Chunking strategies, re-ranking, HyDE, hybrid search, evaluation - INLINECODE3 — Tool calling, multi-step loops, memory, error handling
AI工程师
构建可在生产环境中运行的实用AI系统。数据驱动、系统化、性能导向。
核心能力
- - 大语言模型集成:OpenAI、Anthropic、本地模型(Ollama、llama.cpp)、LiteLLM
- RAG系统:文本分块、嵌入、向量搜索、检索、重排序
- 向量数据库:Chroma(本地)、Pinecone(托管)、Weaviate、FAISS、Qdrant
- 智能体与工具:工具调用、多步骤智能体、OpenClaw子智能体
- 数据管道:数据摄取、清洗、转换、特征工程
- MLOps:模型版本管理(MLflow)、监控、漂移检测、A/B测试
- 评估:基准构建、偏差测试、性能指标
决策框架
选择哪个大语言模型提供商?
- - 原型开发/速度优先:OpenAI GPT-4o 或 Anthropic Claude Sonnet
- 本地/私有部署:Ollama + Qwen 2.5 32B 或 Llama 3.3 70B
- 多提供商抽象:LiteLLM(无需修改代码即可切换模型)
- 嵌入模型:text-embedding-3-small(OpenAI)或 nomic-embed-text(本地)
选择哪个向量数据库?
- - 本地/开发环境:Chroma(零配置)
- 生产环境托管:Pinecone
- 自托管生产环境:Qdrant 或 Weaviate
- 已在Postgres中使用:pgvector扩展
RAG还是微调?
- - 优先使用RAG——在微调前始终先尝试RAG。90%的情况下RAG已足够。
- 仅在以下情况考虑微调:需要改变风格/语气、领域词汇高度专业化、延迟必须最小化
RAG工作流程
1. 数据摄取
python
文档分块(经验法则:512个token,50个重叠)
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk
size=512, chunkoverlap=50)
chunks = splitter.split_documents(docs)
2. 嵌入与存储
python
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
client = chromadb.PersistentClient(path=./chroma_db)
ef = OpenAIEmbeddingFunction(apikey=os.environ[OPENAIAPIKEY], modelname=text-embedding-3-small)
collection = client.getorcreatecollection(docs, embeddingfunction=ef)
collection.add(documents=[c.page_content for c in chunks], ids=[str(i) for i in range(len(chunks))])
3. 检索与生成
python
results = collection.query(query
texts=[userquery], n_results=5)
context = \n\n.join(results[documents][0])
response = client.chat.completions.create(
model=gpt-4o,
messages=[
{role: system, content: f基于以下上下文回答:\n{context}},
{role: user, content: user_query},
]
)
高级模式请参阅 references/rag-patterns.md:重排序、混合搜索、HyDE、评估。
大语言模型工具调用(智能体)
python
tools = [{
type: function,
function: {
name: search_docs,
description: 搜索内部文档,
parameters: {
type: object,
properties: {query: {type: string}},
required: [query]
}
}
}]
response = openai.chat.completions.create(model=gpt-4o, messages=messages, tools=tools)
多步骤智能体循环、错误处理、工具模式请参阅 references/agent-patterns.md。
关键规则
- - 尽早评估——在构建系统之前先构建评估集
- 先RAG后微调——始终如此
- 记录一切——从第一天开始记录提示词、完成结果、延迟、token使用量
- 测试偏差——特别是面向用户的分类或评分系统
- 绝不硬编码API密钥——使用环境变量或密钥管理器
参考资料
- - references/rag-patterns.md——分块策略、重排序、HyDE、混合搜索、评估
- references/agent-patterns.md——工具调用、多步骤循环、记忆、错误处理