crispr-grna-designer

# CRISPR gRNA Designer Design optimal guide RNA (gRNA) sequences for CRISPR-Cas9 genome editing. Supports on-target efficiency scoring and off-target prediction. ## Use Cases - Design gRNAs for gene knockout (KO) experiments - Select high-efficiency guides for specific exons - Predict and minimize off-target effects - Optimize for SpCas9, SpCas9-NG, xCas9 variants ## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `gene_symbol` | string | Yes | HGNC gene symbol (e.g., TP53, BRCA1) | | `target_exon` | int | No | Specific exon number (default: all coding exons) | | `genome_build` | string | No | Reference genome: hg38 (default), hg19, mm10 | | `pam_sequence` | string | No | PAM motif: NGG (default), NAG, NGCG | | `guide_length` | int | No | gRNA length in bp (default: 20) | | `gc_content_min` | float | No | Minimum GC% (default: 30) | | `gc_content_max` | float | No | Maximum GC% (default: 70) | | `poly_t_threshold` | int | No | Max consecutive T's (default: 4) | | `off_target_check` | bool | No | Enable off-target prediction (default: true) | | `max_mismatches` | int | No | Max mismatches for off-target (default: 3) | ## Output Format ```json { "gene": "TP53", "genome": "hg38", "guides": [ { "id": "TP53_E2_G1", "exon": 2, "sequence": "GAGCGCTGCTCAGATAGCGATGG", "pam": "NGG", "position": "chr17:7669609-7669631", "strand": "+", "gc_content": 52.2, "efficiency_score": 0.78, "off_target_count": 2, "off_targets": [...], "warnings": [] } ] } ``` ## Scoring Algorithm ### On-Target Efficiency Score (0-1) Combines multiple position-specific features: 1. **Position-weighted matrix**: G at position 20 (+3), C at 19 (+2), etc. 2. **GC content penalty**: Outside 40-60% range reduces score 3. **Self-complementarity**: Hairpin formation penalty 4. **Poly-T penalty**: Transcription terminator sequences ```python score = w1*position_score + w2*gc_score + w3*secondary_score + w4*poly_t_score ``` ### Off-Target Prediction 1. **Seed region**: Positions 12-20 (PAM-proximal) weighted 3x 2. **Bulge/mismatch tolerance**: Allow up to `max_mismatches` 3. **Genomic location**: Coding regions flagged as high-risk 4. **CFD score**: Cutting Frequency Determination for off-target cleavage ## Usage Examples ### Basic gRNA Design ```bash python scripts/main.py --gene TP53 --exon 4 --output results.json ``` ### High-Specificity Design (strict off-target filtering) ```bash python scripts/main.py --gene BRCA1 --max-mismatches 2 --gc-min 35 --gc-max 65 ``` ### Batch Processing ```bash python scripts/main.py --gene-list genes.txt --genome mm10 --pam NAG ``` ## Technical Notes **⚠️ Difficulty: HIGH** - Requires manual verification before experimental use - In silico predictions have ~60-80% correlation with actual cutting efficiency - Always validate top 3-5 guides experimentally - Off-target databases may not include rare variants or cell-line specific mutations - Consider using Cas9 variants (HiFi, Sniper-Cas9) for reduced off-target activity ## References See `references/` for: - `scoring_algorithms.pdf` - Deep learning models (DeepCRISPR, CRISPRon) - `off_target_databases/` - GUIDE-seq validated datasets - `efficiency_benchmarks/` - Doench et al. 2014/2016 rules ## Implementation Core script: `scripts/main.py` Key functions: - `fetch_gene_sequence()` - Retrieve exon sequences from Ensembl - `find_pam_sites()` - Identify PAM-adjacent target sites - `score_efficiency()` - Calculate on-target scores - `predict_off_targets()` - Bowtie2/BWA alignment for off-targets - `rank_guides()` - Multi-criteria optimization ## Dependencies - Python 3.8+ - Biopython - pandas, numpy - pysam (for off-target alignment) - requests (Ensembl API) Optional: - bowtie2 (local off-target search) - ViennaRNA (secondary structure prediction) ## Validation Status - **Unit tests**: 85% coverage for core algorithms - **Benchmark**: Tested against GUIDE-seq validated dataset (n=1,200 guides) - **Status**: ⏳ Requires experimental validation - predictions are computational estimates only ## Risk Assessment | Risk Indicator | Assessment | Level | |----------------|------------|-------| | Code Execution | Python scripts with bioinformatics tools | High | | Network Access | Ensembl API calls for gene sequences | High | | File System Access | Read/write genome data and results | Medium | | Instruction Tampering | Scientific computation guidelines | Low | | Data Exposure | Genome data handled securely | Medium | ## Security Checklist - [ ] No hardcoded credentials or API keys - [ ] Ensembl API requests use HTTPS only - [ ] Input gene symbols validated against allowed patterns - [ ] Output directory restricted to workspace - [ ] Script execution in sandboxed environment - [ ] Error messages sanitized (no internal paths exposed) - [ ] Dependencies audited (Biopython, pandas, numpy, pysam, requests) - [ ] API timeout and retry mechanisms implemented - [ ] No exposure of internal service architecture ## Prerequisites ```bash # Python dependencies pip install -r requirements.txt # Optional tools # bowtie2 (for local off-target alignment) # ViennaRNA (for secondary structure prediction) ``` ## Evaluation Criteria ### Success Metrics - [ ] Successfully retrieves gene sequences from Ensembl API - [ ] Correctly identifies PAM sites in target exons - [ ] On-target efficiency scores correlate with validated data (>0.6 correlation) - [ ] Off-target predictions identify known false positives - [ ] Output JSON follows specified schema - [ ] Batch processing handles multiple genes efficiently ### Test Cases 1. **Basic gRNA Design**: Input TP53 exon 4 → Valid guide RNAs with scores 2. **API Integration**: Query Ensembl for gene sequence → Successful retrieval 3. **Off-target Prediction**: Input guide with known off-targets → Correct prediction 4. **Multi-species**: Test with hg38, hg19, mm10 → Correct genome handling 5. **Batch Processing**: Input gene list → Efficient parallel processing 6. **Error Handling**: Invalid gene symbol → Graceful error with helpful message ## Lifecycle Status - **Current Stage**: Draft - **Next Review Date**: 2026-03-06 - **Known Issues**: - In silico predictions need experimental validation - Off-target databases may miss rare variants - **Planned Improvements**: - Integration with additional scoring algorithms (DeepCRISPR, CRISPRon) - Support for additional Cas9 variants (Cas12, Cas13) - Enhanced batch processing with progress reporting

crispr-grna-designer

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载 Zip 包

crispr-grna-designer