Cheminformatics Agent v1.0.0
Overview
Advanced cheminformatics toolkit for 3D molecular analysis and drug development workflows. Extends Chemistry Query (which handles 2D lookup/properties/visualization) with predictive and structural capabilities that require 3D reasoning.
Chemistry Query = "What is this molecule?" (2D, lookup, descriptors)
Cheminformatics = "What can this molecule become?" (3D, conformers, pharmacophores, fragments, stereoisomers)
Scripts
scripts/conformer_gen.py
3D conformer ensemble generation using ETKDG with MMFF/UFF optimization.
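```
--smiles <SMILES> --action <generate|ensemble|best> [--num_confs N] [--optimize mmff|uff|none] [--energy_window F] [--prune_rms F] [--output file.sdf]
```
| Action | Description |
|---|---|
| generate | Generate N conformers with energies and RMSD matrix |
| ensemble | Same as generate + write SDF file |
| best | Find lowest-energy conformer with 3D coordinates |
```bash
python scripts/conformer_gen.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action generate --num_confs 20
python scripts/conformer_gen.py --smiles "CCO" --action best --output best.sdf
python scripts/conformer_gen.py --smiles "c1ccccc1" --action ensemble --num_confs 50 --output benzene_confs.sdf
```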
Output includes: conformer energies (kcal/mol), relative energies, convergence status, RMSD matrix (top 20), SDF file.
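The relative energies in the output are presumably each conformer's energy minus the ensemble minimum, and `--energy_window` presumably keeps conformers within a given distance of that minimum. A minimal sketch of that arithmetic, assuming a plain list of energies in kcal/mol; the function names and sample values are illustrative and not taken from `conformer_gen.py`:

```python
def relative_energies(energies_kcal):
    """Return (index of the lowest-energy conformer, energies relative to it)."""
    if not energies_kcal:
        raise ValueError("need at least one conformer energy")
    best_idx = min(range(len(energies_kcal)), key=lambda i: energies_kcal[i])
    e_min = energies_kcal[best_idx]
    return best_idx, [e - e_min for e in energies_kcal]


def within_window(rel_energies, window_kcal):
    """Indices of conformers whose relative energy is at most window_kcal."""
    return [i for i, rel in enumerate(rel_energies) if rel <= window_kcal]


# Illustrative energies for four conformers (kcal/mol), not real script output.
energies = [12.5, 11.0, 14.25, 11.75]
best, rel = relative_energies(energies)
print(best, rel)                # index of the minimum and the relative energies
print(within_window(rel, 1.0))  # conformers within 1 kcal/mol of the minimum
```

The same two steps correspond roughly to the `best` action (pick the minimum) and to energy-window pruning, under the assumptions stated above.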
scripts/format_converter.py
Convert between molecular file formats.
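```
--smiles <SMILES> | --input <file> --to <format> [--output file] [--batch] [--name label]
```
Supported formats: `smiles`, `sdf`, `mol`, `inchi`, `inchikey`, `pdb`, `xyz`
```bash
python scripts/format_converter.py --smiles "CCO" --to sdf --output ethanol.sdf
python scripts/format_converter.py --smiles "CCO" --to inchi
python scripts/format_converter.py --input mols.sdf --to smiles --batch
python scripts/format_converter.py --smiles "CCO" --to pdb --output ethanol.pdb
```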
Batch mode reads multi-molecule SDF files. All 3D formats auto-generate and optimize conformers.
scripts/pharmacophore.py
Pharmacophore feature extraction, fingerprints, and comparison.
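```
--smiles <SMILES> --action <features|fingerprint|compare|map> [--target_smiles smi1,smi2] [--output file.png]
```
| Action | Description |
|---|---|
| features | Extract 3D pharmacophoric features (HBD, HBA, aromatic, hydrophobic, ionizable) with coordinates |
| fingerprint | Generate Gobbi 2D pharmacophore fingerprint |
| compare | Pairwise pharmacophore similarity (Tanimoto) across multiple molecules |
| map | Generate color-coded pharmacophore PNG (green=donor, red=acceptor, yellow=aromatic, blue=hydrophobic) |
```bash
python scripts/pharmacophore.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action features
python scripts/pharmacophore.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action map --output pharm.png
python scripts/pharmacophore.py --target_smiles "CCO,CC(=O)O,c1ccccc1" --action compare
```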
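The `compare` action reports Tanimoto similarity between pharmacophore fingerprints. As a rough illustration of the metric itself, the sketch below computes it on sets of on-bit indices; the bit sets and molecule labels are made up for the example, and how `pharmacophore.py` stores its fingerprints is not assumed here:

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two collections of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def pairwise_similarity(fingerprints):
    """Map each unordered pair of labels to its Tanimoto similarity."""
    names = sorted(fingerprints)
    return {
        (m, n): tanimoto(fingerprints[m], fingerprints[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


# Hypothetical on-bit indices, not real Gobbi fingerprint output.
fps = {"mol_a": {1, 2, 3}, "mol_b": {2, 3, 4}, "mol_c": {7}}
for pair, score in pairwise_similarity(fps).items():
    print(pair, score)
```

A value of 1.0 means identical bit sets and 0.0 means no shared bits.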
scripts/recap_fragment.py
RECAP (Retrosynthetic Combinatorial Analysis Procedure) fragmentation at synthetically accessible bonds (amide, ester, amine, urea, ether, olefin, sulfonamide, etc.).
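```
--smiles <SMILES> --action <fragment|leaves|tree|common_fragments> [--target_smiles smi1,smi2] [--max_depth N]
```
| Action | Description |
|---|---|
| fragment | All RECAP fragments with metadata |
| leaves | Terminal building blocks only (for library design) |
| tree | Hierarchical decomposition tree |
| common_fragments | Shared fragments across multiple molecules (common scaffolds) |
```bash
python scripts/recap_fragment.py --smiles "CC(=O)Nc1ccc(O)cc1" --action fragment
python scripts/recap_fragment.py --smiles "CC(=O)Nc1ccc(O)cc1" --action leaves
python scripts/recap_fragment.py --target_smiles "CC(=O)Nc1ccc(O)cc1,CC(=O)Nc1ccccc1" --action common_fragments
```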
Use case: Leaf fragments → building blocks for combinatorial library enumeration. Common fragments across a compound series → shared pharmacophoric scaffolds.
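Finding fragments shared across a compound series amounts to intersecting the per-molecule fragment sets. A small sketch of that step, assuming each molecule's fragments are already available as SMILES strings; the fragment strings below are hand-written illustrations of an amide cleavage, not actual output of `recap_fragment.py`:

```python
def common_fragments(fragments_by_molecule):
    """Return the sorted fragments that occur in every molecule's fragment set."""
    sets = [set(frags) for frags in fragments_by_molecule.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical leaf fragments; '*' marks the attachment point left by the cut.
series = {
    "CC(=O)Nc1ccc(O)cc1": ["*C(C)=O", "*Nc1ccc(O)cc1"],
    "CC(=O)Nc1ccccc1": ["*C(C)=O", "*Nc1ccccc1"],
}
print(common_fragments(series))
```

Comparing strings this way only works if the fragment SMILES are canonicalized consistently, which is assumed here.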
scripts/stereoisomers.py
Stereoisomer enumeration and analysis (chiral centers R/S, double bond E/Z).
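```
--smiles <SMILES> --action <enumerate|analyze|compare> [--max_isomers N] [--only_unassigned]
```
| Action | Description |
|---|---|
| enumerate | Generate all stereoisomers with configurations |
| analyze | Count chiral centers and stereo bonds without enumerating |
| compare | Compare properties across all stereoisomers (drug dev relevance) |
```bash
python scripts/stereoisomers.py --smiles "C(F)(Cl)Br" --action enumerate
python scripts/stereoisomers.py --smiles "CC=CC" --action analyze
python scripts/stereoisomers.py --smiles "OC(F)(Cl)Br" --action compare
```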
Drug relevance: FDA requires characterization of each stereoisomer for chiral drug candidates. Flags meso forms and provides R/S assignments.
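The counts from `analyze` give a quick upper bound on how many stereoisomers enumeration can produce: each chiral center (R/S) and each stereo double bond (E/Z) contributes a factor of two. A minimal sketch of that bound; the function name is illustrative and not part of `stereoisomers.py`:

```python
def max_stereoisomers(n_chiral_centers, n_stereo_bonds):
    """Upper bound 2**(centers + bonds) on the number of stereoisomers."""
    if n_chiral_centers < 0 or n_stereo_bonds < 0:
        raise ValueError("counts must be non-negative")
    return 2 ** (n_chiral_centers + n_stereo_bonds)


print(max_stereoisomers(1, 0))  # one chiral center, e.g. C(F)(Cl)Br
print(max_stereoisomers(0, 1))  # one stereo double bond, e.g. CC=CC
print(max_stereoisomers(3, 2))  # three centers and two stereo bonds
```

The true count can be lower than this bound: a molecule with two chiral centers and a meso form, such as tartaric acid, has three distinct stereoisomers rather than four.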
scripts/chain_entry.py
Standard agent chain interface. Runs all 5 modules on a SMILES input.
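```bash
python scripts/chain_entry.py --input-json '{"smiles": "CC(=O)Nc1ccc(O)cc1", "context": "user"}'
python scripts/chain_entry.py --input-json '{"smiles": "CCO", "actions": ["conformers", "pharmacophore"]}'
```
Input JSON fields:
- `smiles` (required): Input SMILES
- `context`: Chain context string
- `actions`: Array selecting a subset of modules to run, from `["conformers", "pharmacophore", "recap", "stereoisomers", "formats"]`
- `output_dir`: Directory for SDF/PNG output files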
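Because SMILES strings contain parentheses and the payload is JSON, hand-typing the `--input-json` argument is error-prone. One way to avoid quoting mistakes is to build the payload with `json.dumps` and pass it as a single argument; the helper below is a hypothetical convenience wrapper, not part of the toolkit, and it assumes only the field names `smiles`, `context`, `actions`, and `output_dir`:

```python
import json
import shlex


def build_chain_command(smiles, context=None, actions=None, output_dir=None):
    """Return an argv list that invokes chain_entry.py with a JSON payload."""
    payload = {"smiles": smiles}
    if context is not None:
        payload["context"] = context
    if actions is not None:
        payload["actions"] = list(actions)
    if output_dir is not None:
        payload["output_dir"] = output_dir
    return ["python", "scripts/chain_entry.py", "--input-json", json.dumps(payload)]


argv = build_chain_command("CC(=O)Nc1ccc(O)cc1", actions=["conformers", "pharmacophore"])
print(shlex.join(argv))  # shell-safe form, suitable for pasting into a terminal
```

The list form can be handed to `subprocess.run(argv)` directly, which bypasses shell quoting altogether.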
Output schema:
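```json
{
  "agent": "cheminformatics",
  "version": "1.0.0",
  "smiles": "<canonical SMILES>",
  "status": "success|error",
  "report": {
    "conformers": {...},
    "pharmacophore": {...},
    "recap": {...},
    "stereoisomers": {...},
    "formats": {...}
  },
  "risks": [],
  "warnings": [],
  "viz": ["path/to/file.sdf", "path/to/pharmacophore_map.png"],
  "recommend_next": ["pharmacology", "catalyst-design", "ip-expansion"],
  "confidence": 0.9,
  "timestamp": "ISO8601"
}
```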
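A downstream agent would typically check the result before acting on it. The sketch below parses the output and reads `recommend_next`; which keys are treated as mandatory is an assumption made for this example, and the helper names are hypothetical:

```python
import json

# Assumed minimum set of keys; the toolkit may guarantee more or fewer.
REQUIRED_KEYS = {"agent", "version", "smiles", "status", "report"}


def parse_result(raw):
    """Parse chain output JSON and check the keys this sketch relies on."""
    result = json.loads(raw)
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if result["status"] not in ("success", "error"):
        raise ValueError(f"unexpected status: {result['status']!r}")
    return result


def next_agents(result):
    """Agents suggested for the next chain step, or none if the run failed."""
    if result["status"] != "success":
        return []
    return list(result.get("recommend_next", []))


# Hand-written sample payload, not real output.
sample = json.dumps({
    "agent": "cheminformatics", "version": "1.0.0", "smiles": "CCO",
    "status": "success", "report": {}, "recommend_next": ["pharmacology"],
})
print(next_agents(parse_result(sample)))
```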
Chaining
| From | To | What passes |
|---|---|---|
| Chemistry Query → | Cheminformatics | SMILES + basic properties |
| Cheminformatics → | Pharmacology | SMILES + pharmacophore profile for ADME context |
| Cheminformatics → | Catalyst Design | 3D conformer data for catalyst selection |
| Cheminformatics → | IP Expansion | Stereoisomers as patentable variants |
| Cheminformatics → | Toxicology | Fragment analysis for structural alerts |
Dependencies
- Python ≥ 3.10
- rdkit-pypi
- Pillow (for pharmacophore map PNG)
- numpy
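Cheminformatics Agent v1.0.0
Overview
Advanced cheminformatics toolkit for 3D molecular analysis and drug development workflows. Extends Chemistry Query (which handles 2D lookup, properties, and visualization) with predictive and structural capabilities that require 3D reasoning.
Chemistry Query = "What is this molecule?" (2D, lookup, descriptors)
Cheminformatics = "What can this molecule become?" (3D, conformers, pharmacophores, fragments, stereoisomers)
Scripts
scripts/conformer_gen.py
3D conformer ensemble generation using ETKDG, with MMFF/UFF optimization.
```
--smiles <SMILES> --action <generate|ensemble|best> [--num_confs N] [--optimize mmff|uff|none] [--energy_window F] [--prune_rms F] [--output file.sdf]
```
| Action | Description |
|---|---|
| generate | Generate N conformers with energies and RMSD matrix |
| ensemble | Same as generate + write SDF file |
| best | Find lowest-energy conformer with 3D coordinates |
```bash
python scripts/conformer_gen.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action generate --num_confs 20
python scripts/conformer_gen.py --smiles "CCO" --action best --output best.sdf
python scripts/conformer_gen.py --smiles "c1ccccc1" --action ensemble --num_confs 50 --output benzene_confs.sdf
```
Output includes: conformer energies (kcal/mol), relative energies, convergence status, RMSD matrix (top 20), SDF file.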
scripts/format_converter.py
Convert between molecular file formats.
```
--smiles <SMILES> | --input <file> --to <format> [--output file] [--batch] [--name label]
```
Supported formats: `smiles`, `sdf`, `mol`, `inchi`, `inchikey`, `pdb`, `xyz`
```bash
python scripts/format_converter.py --smiles "CCO" --to sdf --output ethanol.sdf
python scripts/format_converter.py --smiles "CCO" --to inchi
python scripts/format_converter.py --input mols.sdf --to smiles --batch
python scripts/format_converter.py --smiles "CCO" --to pdb --output ethanol.pdb
```
Batch mode reads multi-molecule SDF files. All 3D formats auto-generate and optimize conformers.
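scripts/pharmacophore.py
Pharmacophore feature extraction, fingerprints, and comparison.
```
--smiles <SMILES> --action <features|fingerprint|compare|map> [--target_smiles smi1,smi2] [--output file.png]
```
| Action | Description |
|---|---|
| features | Extract 3D pharmacophoric features (H-bond donor, H-bond acceptor, aromatic, hydrophobic, ionizable) with coordinates |
| fingerprint | Generate Gobbi 2D pharmacophore fingerprint |
| compare | Pairwise pharmacophore similarity (Tanimoto coefficient) across multiple molecules |
| map | Generate color-coded pharmacophore PNG (green=donor, red=acceptor, yellow=aromatic, blue=hydrophobic) |
```bash
python scripts/pharmacophore.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action features
python scripts/pharmacophore.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action map --output pharm.png
python scripts/pharmacophore.py --target_smiles "CCO,CC(=O)O,c1ccccc1" --action compare
```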
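scripts/recap_fragment.py
RECAP (Retrosynthetic Combinatorial Analysis Procedure) fragmentation at synthetically accessible bonds (amide, ester, amine, urea, ether, olefin, sulfonamide, etc.).
```
--smiles <SMILES> --action <fragment|leaves|tree|common_fragments> [--target_smiles smi1,smi2] [--max_depth N]
```
| Action | Description |
|---|---|
| fragment | All RECAP fragments with metadata |
| leaves | Terminal building blocks only (for library design) |
| tree | Hierarchical decomposition tree |
| common_fragments | Shared fragments across multiple molecules (common scaffolds) |
```bash
python scripts/recap_fragment.py --smiles "CC(=O)Nc1ccc(O)cc1" --action fragment
python scripts/recap_fragment.py --smiles "CC(=O)Nc1ccc(O)cc1" --action leaves
python scripts/recap_fragment.py --target_smiles "CC(=O)Nc1ccc(O)cc1,CC(=O)Nc1ccccc1" --action common_fragments
```
Use case: Leaf fragments → building blocks for combinatorial library enumeration. Common fragments across a compound series → shared pharmacophoric scaffolds.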
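scripts/stereoisomers.py
Stereoisomer enumeration and analysis (chiral centers R/S, double bond E/Z).
```
--smiles <SMILES> --action <enumerate|analyze|compare> [--max_isomers N] [--only_unassigned]
```
| Action | Description |
|---|---|
| enumerate | Generate all stereoisomers with configurations |
| analyze | Count chiral centers and stereo bonds without enumerating |
| compare | Compare properties across all stereoisomers (drug development relevance) |
```bash
python scripts/stereoisomers.py --smiles "C(F)(Cl)Br" --action enumerate
python scripts/stereoisomers.py --smiles "CC=CC" --action analyze
python scripts/stereoisomers.py --smiles "OC(F)(Cl)Br" --action compare
```
Drug relevance: FDA requires characterization of each stereoisomer for chiral drug candidates. Flags meso forms and provides R/S assignments.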
scripts/chain_entry.py
Standard agent chain interface. Runs all 5 modules on a SMILES input.
```bash
python scripts/chain_entry.py --input-json '{"smiles": "CC(=O)Nc1ccc(O)cc1", "context": "user"}'
python scripts/chain_entry.py --input-json '{"smiles": "CCO", "actions": ["conformers", "pharmacophore"]}'
```
Input JSON fields:
- `smiles` (required): Input SMILES
- `context`: Chain context string
- `actions`: Array selecting a subset of modules to run, from `["conformers", "pharmacophore", "recap", "stereoisomers", "formats"]`
- `output_dir`: Directory for SDF/PNG output files
Output schema:
```json
{
  "agent": "cheminformatics",
  "version": "1.0.0",
  "smiles": "<canonical SMILES>",
  "status": "success|error",
  "report": {
    "conformers": {...},
    "pharmacophore": {...},
    "recap": {...},
    "stereoisomers": {...},
    "formats": {...}
  },
  "risks": [],
  "warnings": [],
  "viz": ["path/to/file.sdf", "path/to/pharmacophore_map.png"],
  "recommend_next": ["pharmacology", "catalyst-design", "ip-expansion"],
  "confidence": 0.9,
  "timestamp": "ISO8601"
}
```
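Chaining
| From | To | What passes |
|---|---|---|
| Chemistry Query → | Cheminformatics | SMILES + basic properties |
| Cheminformatics → | Pharmacology | SMILES + pharmacophore profile for ADME context |
| Cheminformatics → | Catalyst Design | 3D conformer data for catalyst selection |
| Cheminformatics → | IP Expansion | Stereoisomers as patentable variants |
| Cheminformatics → | Toxicology | Fragment analysis for structural alerts |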
Dependencies
- Python ≥ 3.10
- rdkit-pypi
- Pillow (for pharmacophore map PNG)
- numpy