Frequently Asked Questions
Common questions about Omics807: file types, analysis time, accuracy, tumor-only vs. tumor-normal analysis, and troubleshooting.
Answers to common questions about using Omics807 and understanding somatic variant calling.
General Questions
What is Omics807?
Omics807 is a cloud-based platform that combines deep learning variant calling with clinical insights and comprehensive multi-omics enrichment. It identifies cancer-specific mutations from sequencing data and provides interpretable results through an intuitive interface.
Key features:
- Somatic variant calling for tumor samples with deep learning
- Omics807 interpretation with multi-omics enrichment (population genetics, protein structures, drug targets, pathways, clinical evidence)
- Support for multiple sequencing technologies
- Cloud processing with real-time progress tracking
- Interactive visualizations
Who should use Omics807?
Ideal users: - Cancer researchers studying somatic mutations - Bioinformaticians analyzing tumor samples - Clinical genomics labs - Students learning variant calling - Anyone needing accessible genomic analysis
Not recommended for: - Clinical diagnosis (use validated clinical pipelines) - Germline variant calling (use DeepVariant instead) - Production clinical workflows (regulatory considerations)
Is Omics807 free?
Omics807 itself is free, but you supply the infrastructure it runs on:
- Cloud server: your own infrastructure (costs vary by provider)
- AI service API: used for clinical interpretations and enrichment (~$0.01-0.10 per analysis)
- Storage: for BAM files and results
Estimated costs: - Small test (chr1): $1-2 for server time - Whole genome: $10-50 depending on server specs - AI service API: $0.05-0.20 per job
Input Data Questions
What file types does Omics807 accept?
Accepted formats:
- BAM files (.bam): Binary Alignment Map
- BAI files (.bam.bai): BAM index files, required alongside each BAM
- CRAM files: compressed alternative to BAM (supported)
Not accepted:
- ❌ FASTQ files (raw reads; align them first)
- ❌ VCF files (variants already called)
- ❌ Images or PDFs
- ❌ CSV or text files
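A file-name check before upload can catch unsupported inputs early. The sketch below is illustrative only: `classify_input` and `expected_index` are hypothetical helpers, not part of Omics807. The extension lists mirror the formats named in this FAQ, and the `.crai` index suffix for CRAM is the usual samtools convention, assumed here.

```python
from pathlib import Path

# Alignment extension -> index suffix appended to the file name.
# .bam/.bai come from this FAQ; .cram/.crai is an assumed convention.
ACCEPTED = {".bam": ".bai", ".cram": ".crai"}
# Examples of formats the FAQ lists as not accepted.
REJECTED = {".fastq", ".fq", ".vcf", ".csv", ".txt", ".pdf", ".png", ".jpg"}


def classify_input(filename: str) -> str:
    """Return 'accepted', 'rejected', or 'unknown' for a file name."""
    name = filename.lower()
    # Judge compressed files by the extension under the .gz
    if name.endswith(".gz"):
        name = name[:-3]
    suffix = Path(name).suffix
    if suffix in ACCEPTED:
        return "accepted"
    if suffix in REJECTED:
        return "rejected"
    return "unknown"


def expected_index(filename: str) -> str:
    """Name of the index file expected next to a BAM or CRAM."""
    suffix = Path(filename.lower()).suffix
    if suffix not in ACCEPTED:
        raise ValueError(f"not an alignment file: {filename}")
    return filename + ACCEPTED[suffix]
```

For example, `expected_index("tumor.bam")` gives `tumor.bam.bai`, the index name this FAQ lists.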
Why BAM files?
BAM files contain aligned sequencing reads, which DeepSomatic needs to create pileup images for variant calling.
Do I need both tumor and normal samples?
Recommended: Yes, tumor-normal pairs give best accuracy
Tumor-Normal (Paired): - ✅ Distinguishes somatic from germline - ✅ Higher precision (fewer false positives) - ✅ No Panel of Normals needed - ✅ 95%+ precision typical
Tumor-Only: - ⚠️ Cannot distinguish somatic/germline - ⚠️ Higher false positive rate - ⚠️ Requires Panel of Normals filtering - ⚠️ 70-80% precision typical
When to use tumor-only: - No matched normal available - Archival FFPE samples - Budget constraints - Preliminary screening
What's the difference between WGS and WES?
| Feature | WGS | WES |
|---|---|---|
| Coverage | Entire genome | Exome only (~2%) |
| Size | ~100-200GB | ~10-20GB |
| Cost | $800-1500 | $200-500 |
| Runtime | 3-6 hours | 15-30 minutes |
| Variants | All regions | Coding only |
| Use case | Complete analysis | Targeted, cost-effective |
Choose WGS when: - Need non-coding variants - Studying structural variants - Comprehensive analysis required
Choose WES when: - Budget limited - Focus on coding mutations - Faster turnaround needed - Most pathogenic variants in exons
Can I upload files from my computer?
Yes, but with limitations:
File upload: - Maximum: 10GB per file (configurable) - Recommended: <5GB for reasonable upload times - Network speed: Critical factor
Better option: Use URLs
Instead of uploading 100GB BAM:
→ Upload to cloud storage (S3, GCS)
→ Generate public URL
→ Paste URL into Omics807
→ Server downloads directly (faster!)
Quick test URLs:
Tumor: https://storage.googleapis.com/.../tumor.bam
Normal: https://storage.googleapis.com/.../normal.bam
What reference genome does Omics807 use?
Default: GRCh38 (hg38)
Why GRCh38? - Current standard (2013 release) - Better accuracy than GRCh37 - Fewer gaps and errors - Used by DeepSomatic training
Important: Your BAM files must be aligned to GRCh38
- Check the BAM header: samtools view -H your.bam | grep '@SQ'
- Reads aligned to GRCh37 will produce incorrect variant calls; realign them to GRCh38 before analysis
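The header check can also be scripted. The sketch below is a rough heuristic, not an Omics807 feature: `guess_build` is a hypothetical helper that reads the `@SQ` lines of header text (for example the output of `samtools view -H`) and compares the chromosome 1 length against the known lengths for GRCh38 (248,956,422 bp) and GRCh37 (249,250,621 bp).

```python
# chr1 lengths of the two assemblies, used as a fingerprint of the build.
CHR1_LENGTHS = {248956422: "GRCh38", 249250621: "GRCh37"}


def guess_build(header_text: str) -> str:
    """Guess the reference build from the @SQ lines of a SAM/BAM header."""
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields look like SN:chr1 and LN:248956422, tab-separated.
        fields = dict(
            f.split(":", 1) for f in line.split("\t")[1:] if ":" in f
        )
        if fields.get("SN") in ("chr1", "1"):
            return CHR1_LENGTHS.get(int(fields.get("LN", 0)), "unknown")
    return "unknown"
```

A result of `"GRCh37"` or `"unknown"` suggests the BAM should be inspected before submitting a job.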
Analysis Questions
How long does analysis take?
Quick start (chr1 subset, 100kb): - ~5-10 minutes
Chromosome 1 complete: - WGS: ~30-45 minutes - WES: ~10-15 minutes
Full sample (all chromosomes):
- WGS: 3-6 hours (96 CPUs)
- WES: 15-30 minutes
- PacBio: 5-6 hours
- ONT: 5-6 hours
Factors affecting runtime: - BAM file size and coverage - Model type (WES faster than WGS) - Server CPU count - Number of shards (parallelization) - GPU availability (for call_variants)
Can I speed up the analysis?
Yes, several strategies:
1. Use more CPU cores:
--num_shards=32 # Use 32 cores instead of 4
Scales nearly linearly with cores
2. Add GPU acceleration:
- Use the Docker image tagged with the -gpu suffix
- Cuts call_variants runtime by 50-70%
- NVIDIA T4, V100, or A100 recommended
3. Analyze specific regions:
--regions=chr17 # Only chromosome 17
--regions=chr1:1000000-2000000 # Specific region
4. Use faster models: - WES instead of WGS (if applicable) - Skip tumor-only models if normal available
5. Optimize server: - SSD storage (faster I/O) - More RAM (avoid swapping) - Faster network (for downloads)
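The first and third strategies come down to two command-line flags. The sketch below assembles a `run_deepsomatic` argument list that sets `--num_shards` to the machine's core count and optionally restricts `--regions`. `build_args` is a hypothetical helper; the remaining flags are the ones shown in the Docker example later in this FAQ, and the paths are placeholders.

```python
import os
from typing import List, Optional


def build_args(model_type: str, ref: str, tumor: str, normal: str,
               out_vcf: str, num_shards: Optional[int] = None,
               regions: Optional[str] = None) -> List[str]:
    """Assemble a run_deepsomatic argument list (illustrative only)."""
    # Default to one shard per available core.
    shards = num_shards or os.cpu_count() or 1
    args = [
        "run_deepsomatic",
        f"--model_type={model_type}",
        f"--ref={ref}",
        f"--reads_tumor={tumor}",
        f"--reads_normal={normal}",
        f"--output_vcf={out_vcf}",
        f"--num_shards={shards}",
    ]
    # Restricting regions (e.g. "chr17") shortens the run considerably.
    if regions:
        args.append(f"--regions={regions}")
    return args
```

The returned list can be passed to `subprocess.run` inside the container, or joined with spaces to print the command for review.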
How accurate is DeepSomatic?
WGS (Illumina): - SNV: 95% recall, 99% precision - Indel: 93% recall, 85% precision - Overall: 98% precision
WES (Illumina): - SNV: 94% recall, 99% precision - Indel: 90% recall, 94% precision
Compared to alternatives: - MuTect2: DeepSomatic 15-20% better on indels - Strelka2: DeepSomatic ~5% better precision - VarScan2: DeepSomatic significantly better
Accuracy depends on: - Coverage depth (higher = better) - Sample quality (FFPE vs fresh) - Tumor purity (higher = easier) - Variant allele frequency (higher = easier)
See Model Guide for detailed metrics.
What's the minimum coverage required?
Recommended minimum:
| Sample Type | Minimum | Recommended | Optimal |
|---|---|---|---|
| Normal WGS | 30x | 50x | 80x |
| Tumor WGS | 50x | 80x | 100x+ |
| Normal WES | 80x | 120x | 150x |
| Tumor WES | 100x | 150x | 200x |
| Tumor-only | 60x | 100x | 150x+ |
Why higher coverage matters: - Detect low VAF variants (<10%) - Improve indel calling accuracy - Reduce false negatives - Better genotype confidence
Too low coverage (<20x): - Many false negatives - Unreliable VAF estimates - Poor GQ scores
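The table can be turned into a simple lookup. The sketch below is a hypothetical helper (`coverage_tier` is not part of Omics807) that classifies a measured mean depth against the thresholds copied from the table; the "+" on the optimal values is treated as "at least".

```python
# (minimum, recommended, optimal) mean depth per sample type,
# copied from the coverage table in this FAQ.
THRESHOLDS = {
    "normal_wgs": (30, 50, 80),
    "tumor_wgs": (50, 80, 100),
    "normal_wes": (80, 120, 150),
    "tumor_wes": (100, 150, 200),
    "tumor_only": (60, 100, 150),
}


def coverage_tier(sample_type: str, mean_depth: float) -> str:
    """Classify mean depth as below minimum, minimum, recommended, or optimal."""
    minimum, recommended, optimal = THRESHOLDS[sample_type]
    if mean_depth >= optimal:
        return "optimal"
    if mean_depth >= recommended:
        return "recommended"
    if mean_depth >= minimum:
        return "minimum"
    return "below minimum"
```

For instance, an 85x tumor WGS sample lands in the "recommended" tier, while a 25x normal WGS sample falls below the minimum.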
Can Omics807 detect all types of variants?
Well detected ✅: - SNVs (single nucleotide variants) - Small indels (<50bp) - MNVs (multi-nucleotide variants)
Limited detection ⚠️:
- Medium indels (50-200bp): accuracy drops
- Structural variants (>200bp): not designed for this
Not detected ❌:
- Large SVs (>1kb): use an SV caller
- Gene fusions: use RNA-seq tools
- Methylation: use bisulfite-seq
- Copy number alterations (CNVs): use CNVkit, FACETS, etc.
For comprehensive analysis: - DeepSomatic: SNVs and indels - Manta/GRIDSS: Structural variants - CNVkit/FACETS: Copy number - STAR-Fusion: Gene fusions
Model Selection Questions
Which model should I use?
Follow this decision tree:
1. Do you have matched normal? - Yes → Tumor-normal models - No → Tumor-only models
2. What sequencing platform? - Illumina → WGS or WES - PacBio → PACBIO - Nanopore → ONT_R104
3. Is tissue FFPE? - Yes → FFPE_WGS or FFPE_WES - No → Standard models
4. Coverage type? - Whole genome → WGS - Exome only → WES
Quick examples:
- Illumina WGS + normal → WGS
- Illumina WES + normal → WES
- Illumina WGS, no normal → WGS_TUMOR_ONLY
- FFPE WGS + normal → FFPE_WGS
- PacBio + normal → PACBIO
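The decision tree can be written as a small function. This is a sketch under stated assumptions: `select_model` is a hypothetical helper, and it returns only model names that appear in this FAQ. Tumor-only names other than `WGS_TUMOR_ONLY` are not listed here, so the function raises an error for those combinations instead of guessing.

```python
def select_model(platform: str, has_normal: bool,
                 ffpe: bool = False, assay: str = "WGS") -> str:
    """Pick a model name following the FAQ's decision tree."""
    platform = platform.lower()
    assay = assay.upper()
    if platform == "illumina":
        if assay not in ("WGS", "WES"):
            raise ValueError(f"unknown assay: {assay}")
        model = f"FFPE_{assay}" if ffpe else assay
    elif platform in ("pacbio", "nanopore", "ont"):
        if ffpe:
            # The FAQ names FFPE models only for WGS and WES.
            raise ValueError("no FFPE long-read model is listed in this FAQ")
        model = "PACBIO" if platform == "pacbio" else "ONT_R104"
    else:
        raise ValueError(f"unknown platform: {platform}")

    if has_normal:
        return model
    if model == "WGS":
        return "WGS_TUMOR_ONLY"
    # Other tumor-only model names are not given here; see the Model Guide.
    raise ValueError(f"tumor-only model for {model} is not listed in this FAQ")
```

Calling `select_model("Illumina", has_normal=True, ffpe=True)` returns `FFPE_WGS`, matching the quick examples.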
What if I use the wrong model?
Consequences: - Lower accuracy (10-30% drop) - More false positives/negatives - Incorrect quality scores - Wasted computational time
Common mistakes:
Using WGS on WES data: - Too many false positives in exons - Wrong statistical assumptions
Not using FFPE model on FFPE: - C→T artifacts called as variants - 20-30% more false positives
Using Illumina model on PacBio: - Different error profiles - Poor indel performance
How to verify correct model: - Check sequencing platform in metadata - Examine error rates in BAM - Review tissue preservation method
When should I use tumor-only mode?
Use tumor-only when: - ✅ No matched normal available - ✅ Archival/FFPE samples only - ✅ Budget constraints (half the sequencing cost) - ✅ Rapid screening needed - ✅ Have access to Panel of Normals
Avoid tumor-only when: - ❌ Normal tissue accessible - ❌ Need high precision (clinical decisions) - ❌ No Panel of Normals available - ❌ Studying rare variants
Tumor-only best practices: 1. Use high coverage (100x+) 2. Apply Panel of Normals filtering 3. Validate key variants orthogonally 4. Filter common population variants (dbSNP) 5. Be conservative with interpretation
Results Questions
What does "PASS" mean in the results?
PASS = High-confidence somatic variant
Criteria for PASS: - Sufficient quality score (QUAL > 30) - Present in tumor, absent (or low) in normal - Passes all filters (strand bias, mapping quality, etc.) - Likely true somatic mutation
Other FILTER values: - GERMLINE: Inherited, not somatic - RefCall: No variant, matches reference - LowQual: Quality below threshold
Focus on PASS variants for: - Clinical interpretation - Downstream analysis - Actionable mutation discovery
See Understanding Results for details.
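Keeping only PASS records is a one-column filter on the VCF. The sketch below works on uncompressed VCF text and relies only on the standard VCF layout, where FILTER is the seventh tab-separated column. `pass_variants` is a hypothetical helper name; for a `.vcf.gz` file, read the text through `gzip.open(path, "rt")` first.

```python
from typing import Iterator, Tuple


def pass_variants(vcf_text: str) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, ref, alt) for records whose FILTER is PASS."""
    for line in vcf_text.splitlines():
        # Skip blank lines and header lines (## meta and #CHROM).
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if len(cols) >= 7 and cols[6] == "PASS":
            yield cols[0], int(cols[1]), cols[3], cols[4]
```

Records labelled GERMLINE, RefCall, or LowQual are skipped, leaving the set recommended above for interpretation and downstream analysis.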
How do I interpret variant allele frequency (VAF)?
VAF = Variant reads / Total reads
Interpretation:
VAF = 100% → Homozygous in tumor
VAF = 50% → Heterozygous or germline
VAF = 40% → Somatic, ~80% tumor purity
VAF = 20% → Somatic, low purity or subclonal
VAF = 5% → Very low frequency, validate carefully
Factors affecting VAF:
- Tumor purity:
Pure tumor (100% purity):
→ Heterozygous somatic = 50% VAF
Mixed (50% tumor, 50% normal):
→ Heterozygous somatic = 25% VAF
- Copy number:
Normal: 1 variant copy, 1 normal → 50% VAF
Amplification: 3 variant copies, 1 normal → 75% VAF
LOH: 1 variant copy, 0 normal → 100% VAF
- Subclonality:
Clonal (all cells): High VAF (30-50%)
Subclonal (subset): Low VAF (5-20%)
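The purity and copy-number examples follow from one mixing formula: expected VAF = (purity × variant copies) / (purity × tumor copies + (1 − purity) × normal copies). The sketch below implements that standard model for a clonal variant. `expected_vaf` is a hypothetical helper, not part of Omics807's output, and it assumes a diploid normal genome by default.

```python
def expected_vaf(purity: float, variant_copies: int = 1,
                 tumor_copies: int = 2, normal_copies: int = 2) -> float:
    """Expected VAF of a clonal somatic variant in a tumor/normal cell mix."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if variant_copies > tumor_copies:
        raise ValueError("variant copies cannot exceed total tumor copies")
    # Variant alleles come only from tumor cells.
    variant_alleles = purity * variant_copies
    # All alleles at the locus: tumor cells plus contaminating normal cells.
    total_alleles = purity * tumor_copies + (1 - purity) * normal_copies
    return variant_alleles / total_alleles
```

With the defaults, `expected_vaf(1.0)` is 0.5 and `expected_vaf(0.5)` is 0.25, matching the purity examples. `expected_vaf(1.0, variant_copies=3, tumor_copies=4)` gives 0.75 (amplification) and `expected_vaf(1.0, variant_copies=1, tumor_copies=1)` gives 1.0 (LOH).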
What are "Omics807 Insights" and how reliable are they?
Omics807 Insights are AI-generated clinical interpretations that integrate multi-omics enrichment data.
What Omics807 provides:
- Population genetics analysis for germline filtering
- Protein structure predictions for mutation impact
- Drug target matching and therapeutic options
- Pathway analysis for biological context
- Clinical evidence from curated databases
- Research literature citations
- Clinical significance assessment
- Associated cancer types
- Treatment implications
- Recommended follow-up actions
Example:
Variant: BRAF p.V600E
AI: "Well-established oncogenic driver in melanoma
and colorectal cancer. Targetable with BRAF inhibitors
(vemurafenib, dabrafenib). Consider resistance testing."
Reliability: - ✅ Good for well-known hotspot mutations - ✅ Summarizes published literature - ⚠️ May not reflect latest research (knowledge cutoff) - ⚠️ Should be validated with databases (COSMIC, ClinVar) - ❌ Not a substitute for clinical interpretation
Best practice: 1. Use AI as starting point 2. Verify with clinical databases 3. Consult with oncologists/genetic counselors 4. Consider patient-specific context
How many variants should I expect?
Typical ranges per sample:
WGS: - Total detected: 10,000-100,000 - PASS (somatic): 1,000-10,000 - High impact: 10-100
WES: - Total detected: 100-1,000 - PASS (somatic): 50-500 - High impact: 5-50
Factors affecting count: - Cancer type (melanoma > pediatric tumors) - Tumor mutational burden (TMB) - Patient age (more with age) - Exposure (smoking, UV) - Filtering stringency
Concerning patterns:
- Too few (<100): low coverage or low tumor purity
- Too many (>100,000): wrong reference or artifacts
- All filtered (no PASS): quality issues
Troubleshooting
Why is my job stuck at "Queued"?
Possible causes:
- Server connection failed:
  - Check SSH credentials
  - Verify server is running
  - Test connection manually
- Previous job still running:
  - Omics807 runs one job at a time
  - Wait for completion or cancel
- Server out of resources:
  - Check disk space
  - Verify sufficient RAM
  - Monitor CPU usage
Solutions:
# Test SSH connection
ssh root@your-server-ip
# Check disk space
df -h
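# Check running processes (pgrep avoids matching the grep command itself)
pgrep -af deepsomatic
# Stop a stuck process: try a normal termination first
pkill -f deepsomatic
# Force-kill only if it is still running
pkill -9 -f deepsomatic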
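The disk-space check can also be done from Python, for example at the start of a submission script. This is a minimal sketch using only the standard library. `free_gb` and `enough_space` are hypothetical helpers, and the amount of space a job needs is left as a parameter because it depends on your BAM sizes.

```python
import shutil


def free_gb(path: str = "/") -> float:
    """Free space, in gigabytes, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def enough_space(required_gb: float, path: str = "/") -> bool:
    """True if at least `required_gb` gigabytes are free at `path`."""
    return free_gb(path) >= required_gb
```

For example, `enough_space(200, "/root")` checks for 200 GB free under `/root` before a whole-genome run; the 200 GB figure is only an illustration.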
Why did my job fail?
Common error messages:
"BAM file not found": - URL is incorrect or inaccessible - File was not uploaded properly - Network issues during transfer
"Reference genome missing": - GRCh38 reference not on server - Wrong path specified - Need to run setup script
"Out of memory": - BAM file too large for server RAM - Increase server memory - Use fewer shards
"Docker not installed": - Server setup incomplete - Run setup script - Install Docker manually
"Invalid BAM file": - BAM corrupted or incomplete - Wrong reference genome used - Missing BAM index (.bai)
Debugging steps:
1. Check job logs in Omics807
2. SSH to server and check /root/deepsomatic_jobs/[job_id]/deepsomatic.log
3. Verify input files exist and are valid
4. Test with quick start dataset
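Step 2 can be partly automated by scanning the log for the messages listed above. The sketch below assumes the log contains those messages verbatim, which may not hold for every failure. `diagnose` is a hypothetical helper, and the hints are condensed from this FAQ.

```python
from typing import List, Tuple

# Error message -> short hint, condensed from the list in this FAQ.
KNOWN_ERRORS = {
    "BAM file not found": "Check the URL, the upload, and the network transfer.",
    "Reference genome missing": "Check the reference path or run the setup script.",
    "Out of memory": "Increase server memory or use fewer shards.",
    "Docker not installed": "Run the setup script or install Docker manually.",
    "Invalid BAM file": "Check for corruption, the reference build, and the .bai index.",
}


def diagnose(log_text: str) -> List[Tuple[str, str]]:
    """Return (message, hint) pairs for known errors found in the log text."""
    lowered = log_text.lower()
    return [
        (message, hint)
        for message, hint in KNOWN_ERRORS.items()
        if message.lower() in lowered
    ]
```

An empty result means none of the known messages were found; in that case, read the log itself.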
How do I validate my results?
Validation hierarchy (best to good):
1. Sanger Sequencing (Gold standard): - PCR amplify variant region - Sanger sequence - Confirms variant presence; not quantitative, and unreliable below roughly 15-20% VAF
2. Droplet Digital PCR (ddPCR): - Precise VAF measurement - Good for low-frequency variants - Quantitative validation
3. Alternative NGS Platform: - Re-sequence on different platform - Illumina → PacBio - Different library prep
4. Orthogonal Variant Caller: - Run MuTect2 or Strelka2 - Take consensus of multiple callers - Higher confidence on shared calls
5. Database Cross-Reference: - Check COSMIC for known cancer mutations - Review ClinVar for pathogenicity - Compare to published literature
Validation targets: - All clinically actionable variants - Variants driving treatment decisions - Novel/unexpected mutations - Low VAF variants (<10%)
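The caller-consensus idea in option 4 reduces to a set intersection once each caller's output is expressed as (chrom, pos, ref, alt) keys. This is a minimal sketch: `consensus` is a hypothetical helper, and it assumes the call sets were already normalized the same way (for example, left-aligned indels), which matters for indel agreement in practice.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def consensus(*call_sets: Iterable[Variant], min_callers: int = 2) -> Set[Variant]:
    """Variants reported by at least `min_callers` of the given call sets."""
    counts: Counter = Counter()
    for calls in call_sets:
        # set() so a caller that lists a variant twice is counted once.
        counts.update(set(calls))
    return {variant for variant, n in counts.items() if n >= min_callers}
```

With two callers and the default `min_callers=2`, this is the plain intersection; with three callers it becomes a majority vote.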
Can I run Omics807 on my laptop?
Short answer: Not recommended
Why not: - DeepSomatic requires 32GB+ RAM (laptops typically 8-16GB) - WGS takes 3-6 hours on 96 CPU server (days on laptop) - Large BAM files (100GB+) need substantial storage - Resource-intensive Docker containers
Better alternatives:
1. Cloud server (recommended):
- Rent on-demand (AWS, GCP, Kamatera)
- Pay only when analyzing
- Scale resources as needed
2. Institutional HPC:
- Use university/hospital cluster
- Often free for researchers
- Pre-installed tools
3. Small datasets on laptop (chr1 only):
- Possible for testing
- Use quick start dataset
- Expect 1-2 hour runtime
Minimum specs for laptop testing: - CPU: 8+ cores - RAM: 32GB - Storage: 100GB free - Time: Hours, not minutes
Advanced Questions
Can I use Omics807 for clinical diagnostics?
Currently: No, not recommended for clinical use
Reasons: - Not FDA approved for diagnostics - No CLIA/CAP validation - Research-grade implementation - Lacks clinical-grade QC
For clinical use, you need: - Validated clinical pipeline - CLIA-certified laboratory - CAP accreditation - Clinical-grade reporting
Omics807 is appropriate for: - Research studies - Method development - Educational purposes - Preliminary screening
If clinical application needed: - Partner with certified lab - Validate against clinical standards - Obtain regulatory approval - Implement QC procedures
How can I integrate Omics807 into my pipeline?
Integration options:
1. API Integration (future): - REST API for job submission - Programmatic result retrieval - Webhook notifications
2. Docker Container:
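# Run DeepSomatic directly. The working directory is mounted at /data so the
# container can read the inputs (with their .bai/.fai indexes) and write the output.
docker run \
  -v "${PWD}":/data \
  google/deepsomatic:1.9.0 \
  run_deepsomatic \
  --model_type=WGS \
  --ref=/data/ref.fasta \
  --reads_tumor=/data/tumor.bam \
  --reads_normal=/data/normal.bam \
  --output_vcf=/data/output.vcf.gz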
3. Workflow Integration: - Add to Nextflow pipeline - Integrate with Snakemake - Use in WDL workflows
4. Batch Processing: - Process multiple samples - Cloud-based scaling - Results aggregation
Example Nextflow:
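process deepsomatic {
    container 'google/deepsomatic:1.9.0'

    input:
    path ref_fasta   // reference FASTA; its .fai index must be staged alongside
    path tumor_bam   // .bai indexes must be staged alongside both BAMs
    path normal_bam

    output:
    path 'output.vcf.gz'

    script:
    """
    run_deepsomatic \
        --model_type=WGS \
        --ref=${ref_fasta} \
        --reads_tumor=${tumor_bam} \
        --reads_normal=${normal_bam} \
        --output_vcf=output.vcf.gz
    """
}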
Where can I get help?
Omics807 Support: - Check this FAQ - Review documentation guides - Examine case studies
DeepSomatic Support: - GitHub Issues - Documentation - Community forums
General Genomics Help: - Biostars - SEQanswers - r/bioinformatics
Training Resources: - Broad Institute Workshops - Coursera Genomics Courses - Galaxy Training
Still Have Questions?
Documentation: - Getting Started - Basic introduction - Model Guide - Technical details - Understanding Results - Interpretation help
External Resources: - DeepSomatic GitHub - SEQC2 Project - COSMIC Database
Community: Join discussions on genomics forums or create an issue on the Omics807 GitHub repository.