PharmaClaw AlphaFold Agent
Overview
Protein structure retrieval and ligand docking agent for the PharmaClaw drug discovery pipeline. Fetches experimental structures from RCSB PDB and predicted structures from AlphaFold DB, detects binding sites, and performs conformer-based docking with RDKit.
Quick Start
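```bash
# Fetch a structure and dock a ligand
python scripts/alphafold_agent.py '{"uniprot": "P01116", "smiles": "CC(=O)Nc1ccc(O)cc1"}'

# Fetch a structure only
python scripts/alphafold_agent.py '{"uniprot": "P01116"}'
```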
Capabilities
| Feature | Method | Source |
|---|---|---|
| Structure Fetch | RCSB Search API + AlphaFold DB | Public PDB files |
| Fold Prediction | ESMFold via HuggingFace | Sequence → 3D structure |
| Binding Sites | Pocket detection | Residue-level pockets |
| Ligand Docking | RDKit conformer generation | SMILES → affinity score |
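The Structure Fetch row can be illustrated with a small sketch. The helper names below (`rcsb_pdb_url`, `alphafold_pdb_url`, `fetch_text`) are hypothetical and are not taken from `scripts/alphafold_agent.py`. The URL patterns are the public file-download patterns as commonly documented, and the AlphaFold DB model version (`v4` here) is an assumption that may need updating.

```python
# Assumed public download URL patterns; verify against current RCSB / AlphaFold DB docs.
RCSB_FILE_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"
ALPHAFOLD_FILE_URL = "https://alphafold.ebi.ac.uk/files/AF-{accession}-F1-model_v{version}.pdb"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the download URL for an experimental structure by 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return RCSB_FILE_URL.format(pdb_id=pdb_id)


def alphafold_pdb_url(uniprot: str, version: int = 4) -> str:
    """Build the download URL for a predicted structure by UniProt accession."""
    accession = uniprot.strip().upper()
    if not accession.isalnum():
        raise ValueError(f"not a UniProt accession: {uniprot!r}")
    return ALPHAFOLD_FILE_URL.format(accession=accession, version=version)


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a PDB file as text; raises on HTTP errors such as 404."""
    import requests  # imported lazily so the URL helpers work without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

A caller might try `fetch_text(alphafold_pdb_url("P01116"))` and treat an HTTP error as "no structure found".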
Decision Tree
- UniProt ID provided? → Fetch from RCSB PDB / AlphaFold DB
- FASTA sequence provided? → Predict fold via ESMFold
- SMILES provided? → Dock ligand into detected binding pocket
- No structure found? → Fall back to ESMFold prediction
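The decision tree above can be sketched as a small planning function. This is an illustration only: the function name `plan_steps`, the step labels, and the `structure_found` flag are hypothetical, and running pocket detection only when a SMILES string is supplied is an assumption rather than documented behavior.

```python
from typing import Dict, List, Optional


def plan_steps(request: Dict[str, str], structure_found: Optional[bool] = None) -> List[str]:
    """Return the ordered steps implied by the decision tree for one request.

    structure_found is None before any lookup, or True/False once the
    RCSB PDB / AlphaFold DB lookup has been attempted.
    """
    steps: List[str] = []

    if request.get("uniprot"):
        steps.append("fetch_structure")
        if structure_found is False:
            # No structure found: fall back to ESMFold prediction.
            steps.append("predict_fold")
    elif request.get("fasta"):
        steps.append("predict_fold")
    else:
        raise ValueError("request needs a 'uniprot' ID or a 'fasta' path")

    if request.get("smiles"):
        # Assumption: docking targets a pocket detected in the same run.
        steps.extend(["detect_binding_sites", "dock_ligand"])

    return steps
```

Note that the ESMFold fallback needs a sequence, so a request carrying only a UniProt ID would have to obtain one first; the source does not say how.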
Input Format
```json
{
  "uniprot": "P01116",
  "smiles": "CC(=O)Nc1ccc(O)cc1",
  "fasta": "path/to/sequence.fasta"
}
```
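A minimal sketch of validating this input, assuming the three keys `uniprot`, `smiles`, and `fasta` are the only ones accepted and that at least one of `uniprot` or `fasta` is required. The `AgentRequest` class and `parse_request` function are hypothetical names, not part of the agent script.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_KEYS = {"uniprot", "smiles", "fasta"}


@dataclass
class AgentRequest:
    uniprot: Optional[str] = None  # UniProt accession, e.g. "P01116"
    smiles: Optional[str] = None   # ligand SMILES string
    fasta: Optional[str] = None    # path to a FASTA file


def parse_request(raw: str) -> AgentRequest:
    """Parse the JSON argument and reject malformed or unknown input early."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("input must be a JSON object")

    unknown = set(data) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")

    # Drop empty values so that "" and null behave like a missing key.
    request = AgentRequest(**{key: str(value) for key, value in data.items() if value})
    if not (request.uniprot or request.fasta):
        raise ValueError("provide 'uniprot' or 'fasta'")
    return request
```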
Output Format
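```json
{
  "pdb": "1abc.pdb",
  "sites": [{"res": "G12", "pocket_vol": 150}],
  "docking": {"affinity": -15.2, "viz": "docked.png"},
  "compliance": "Public AlphaFold 2 DB / ESMFold (commercially usable)"
}
```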
Chain Integration
- Receives from: Chemistry Query (SMILES for docking), Literature (target proteins)
- Feeds into: IP Expansion (novel binding modes), Catalyst Design (structure-guided synthesis)
Dependencies
- `rdkit-pypi`: Conformer generation and molecular descriptors
- `biopython`: PDB parsing and FASTA sequence handling
- `requests`: API calls to RCSB and AlphaFold DB
Compliance
Uses only publicly available protein structures (RCSB PDB, AlphaFold DB) and open-source prediction (ESMFold). All data sources are commercially permissible. No proprietary AlphaFold 3 server calls.
Scripts
- `scripts/alphafold_agent.py`: Main agent (fetch, predict, detect sites, dock)
Limitations
- Docking uses RDKit conformer scoring (not full physics-based docking like Vina)
- ESMFold prediction requires significant compute for large proteins
- Binding site detection is simplified; production use should integrate fpocket or P2Rank
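To make the first limitation concrete, here is a rough sketch of RDKit conformer generation with MMFF94 energies. It is not the agent's scoring function: `conformer_energies` is a hypothetical name, and the returned values are internal force-field energies of the isolated ligand (kcal/mol), not binding affinities. No protein is involved, which is why this approach is weaker than physics-based docking.

```python
from typing import List


def conformer_energies(smiles: str, n_confs: int = 10, seed: int = 42) -> List[float]:
    """Embed conformers for a ligand and return their MMFF94 energies, lowest first."""
    # Imported lazily so this module still loads where RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry

    conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=seed)
    if len(conf_ids) == 0:
        raise RuntimeError("conformer embedding failed")
    if not AllChem.MMFFHasAllMoleculeParams(mol):
        raise RuntimeError("MMFF94 parameters are not available for this molecule")

    # Each result is a (not_converged_flag, energy) pair, one per conformer.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)
    return sorted(energy for _not_converged, energy in results)
```

For example, `conformer_energies("CC(=O)Nc1ccc(O)cc1", n_confs=5)` ranks up to five conformers of the ligand used in Quick Start.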
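Skill name: pharmaclaw-alphafold-agent
Detailed description:
PharmaClaw AlphaFold Agent
Overview
Protein structure retrieval and ligand docking agent for the PharmaClaw drug discovery pipeline. Fetches experimental structures from RCSB PDB and predicted structures from AlphaFold DB, detects binding sites, and performs conformer-based docking with RDKit.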
Quick Start
```bash
# Fetch a structure and dock a ligand
python scripts/alphafold_agent.py '{"uniprot": "P01116", "smiles": "CC(=O)Nc1ccc(O)cc1"}'

# Fetch a structure only
python scripts/alphafold_agent.py '{"uniprot": "P01116"}'
```
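Capabilities
| Feature | Method | Source |
|---|---|---|
| Structure Fetch | RCSB Search API + AlphaFold DB | Public PDB files |
| Fold Prediction | ESMFold via HuggingFace | Sequence → 3D structure |
| Binding Sites | Pocket detection | Residue-level pockets |
| Ligand Docking | RDKit conformer generation | SMILES → affinity score |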
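Decision Tree
- UniProt ID provided? → Fetch from RCSB PDB / AlphaFold DB
- FASTA sequence provided? → Predict fold via ESMFold
- SMILES provided? → Dock ligand into detected binding pocket
- No structure found? → Fall back to ESMFold prediction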
Input Format
```json
{
  "uniprot": "P01116",
  "smiles": "CC(=O)Nc1ccc(O)cc1",
  "fasta": "path/to/sequence.fasta"
}
```
Output Format
```json
{
  "pdb": "1abc.pdb",
  "sites": [{"res": "G12", "pocket_vol": 150}],
  "docking": {"affinity": -15.2, "viz": "docked.png"},
  "compliance": "Public AlphaFold 2 DB / ESMFold (commercially usable)"
}
```
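Chain Integration
- Receives from: Chemistry Query (SMILES for docking), Literature (target proteins)
- Feeds into: IP Expansion (novel binding modes), Catalyst Design (structure-guided synthesis)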
Dependencies
- `rdkit-pypi`: Conformer generation and molecular descriptors
- `biopython`: PDB parsing and FASTA sequence handling
- `requests`: API calls to RCSB and AlphaFold DB
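Compliance
Uses only publicly available protein structures (RCSB PDB, AlphaFold DB) and open-source prediction (ESMFold). All data sources are commercially permissible. No proprietary AlphaFold 3 server calls.
Scripts
- `scripts/alphafold_agent.py`: Main agent (fetch, predict, detect sites, dock)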
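Limitations
- Docking uses RDKit conformer scoring (not full physics-based docking like Vina)
- ESMFold prediction requires significant compute for large proteins
- Binding site detection is simplified; production use should integrate fpocket or P2Rank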