OncoKB Integration: Use-Case Analysis & Implementation Vision

Table of Contents

  1. What is OncoKB?
  2. Why OncoKB Matters for Omics807
  3. Current Implementation Status
  4. Use-Case Scenarios
  5. Data Flow & Integration Architecture
  6. API Interaction Details
  7. Value Proposition
  8. Limitations & Considerations
  9. Future Enhancement Opportunities

What is OncoKB?

OncoKB (Oncology Knowledge Base) is a precision oncology knowledge base developed and maintained by Memorial Sloan Kettering Cancer Center (MSK). It is widely regarded as a gold standard for clinical interpretation of cancer genomic alterations.

Core Purpose

OncoKB provides:

  • Oncogenicity Classifications - Whether a mutation is cancer-causing
  • Clinical Actionability - FDA-approved and investigational therapies
  • Therapeutic Levels - Evidence-based hierarchy (Level 1-4, R1-R2)
  • Mutation Effects - Impact on protein function (gain/loss of function)
  • Prognostic & Diagnostic Information - Clinical outcomes association

Evidence-Based Therapy Levels

OncoKB categorizes clinical actionability using a standardized framework:

Level    | Description | Example
---------|-------------|--------
LEVEL_1  | FDA-approved biomarker for FDA-approved drug in specific cancer type | BRAF V600E → Vemurafenib in melanoma
LEVEL_2  | Standard care based on professional guidelines | MET exon 14 skipping → Crizotinib in NSCLC
LEVEL_3A | Compelling clinical evidence for drug in this cancer type | HER2 amplification → Trastuzumab in gastric cancer
LEVEL_3B | Clinical evidence in another cancer type | BRAF V600E → Vemurafenib in non-melanoma solid tumors
LEVEL_4  | Compelling biological evidence supports drug sensitivity | Preclinical evidence suggesting actionability
LEVEL_R1 | Resistance to standard therapies | EGFR T790M → resistance to erlotinib/gefitinib in NSCLC
LEVEL_R2 | Compelling clinical evidence of resistance | PIK3CA mutations in HER2+ breast cancer
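
The hierarchy in the table can be encoded as a simple ranking so that code can pick the strongest sensitivity level for a variant. The sketch below is illustrative only: the names SENSITIVITY_RANK, highest_sensitivity_level, and has_resistance are assumptions, not part of oncokb_service.py, and resistance levels are deliberately kept out of the ranking because they answer a different question.

```python
# Assumed ordering for illustration: a lower rank means stronger evidence.
SENSITIVITY_RANK = {
    "LEVEL_1": 1,
    "LEVEL_2": 2,
    "LEVEL_3A": 3,
    "LEVEL_3B": 4,
    "LEVEL_4": 5,
}
RESISTANCE_LEVELS = {"LEVEL_R1", "LEVEL_R2"}


def highest_sensitivity_level(levels):
    """Return the strongest sensitivity level in `levels`, or "LEVEL_NA"."""
    ranked = [lvl for lvl in levels if lvl in SENSITIVITY_RANK]
    if not ranked:
        return "LEVEL_NA"
    return min(ranked, key=SENSITIVITY_RANK.__getitem__)


def has_resistance(levels):
    """Return True if any level is a resistance level (R1 or R2)."""
    return any(lvl in RESISTANCE_LEVELS for lvl in levels)
```

Keeping resistance separate means a variant can report both a best therapy level and a resistance flag without one masking the other.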

Why OncoKB Matters for Omics807

1. Clinical Validation Layer

While Omics807 integrates 15+ bioinformatics databases (VEP, gnomAD, CIViC, AlphaFold, etc.), OncoKB provides the clinical translation that bridges research findings to patient care:

  • CIViC provides community-curated clinical evidence (broad, diverse sources)
  • OncoKB provides expert-curated, MSK-validated clinical guidelines (authoritative, conservative)
  • Together: Comprehensive view of both emerging research and established clinical practice

2. Precision Medicine Decision Support

For clinicians using Omics807 to interpret patient genomic profiles:

  • Question: "Does this mutation have an FDA-approved treatment?"
  • OncoKB Answer: "BRAF V600E = LEVEL_1 → Vemurafenib/Dabrafenib in melanoma"

Without OncoKB, users must manually cross-reference FDA drug labels and treatment guidelines.

3. Confidence Scoring Enhancement

Omics807's proprietary confidence scoring algorithm benefits from OncoKB's structured data:

# Confidence boosters from OncoKB data:
if oncokb_highest_level in ['LEVEL_1', 'LEVEL_2']:
    confidence_score += 15  # FDA-approved or guideline-recommended
if oncokb_oncogenic == 'Oncogenic':
    confidence_score += 10  # Validated cancer driver
if oncokb_treatment_count > 0:
    confidence_score += 5   # Clinically actionable

Together, these boosters can add up to 30 points, pushing variants with FDA-level or guideline-level support toward the High Confidence category.
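
As a self-contained illustration of the scoring rules above, the snippet can be wrapped in a function that takes an enriched variant dictionary. The function name oncokb_confidence_boost is hypothetical, and the point values are copied from the snippet; how the result is combined with the rest of the proprietary score is not shown here.

```python
def oncokb_confidence_boost(variant):
    """Return the points that OncoKB fields add to a variant's confidence score."""
    boost = 0
    # FDA-approved or guideline-recommended therapy exists.
    if variant.get("oncokb_highest_level") in ("LEVEL_1", "LEVEL_2"):
        boost += 15
    # Only the exact "Oncogenic" call counts, mirroring the snippet above.
    if variant.get("oncokb_oncogenic") == "Oncogenic":
        boost += 10
    # At least one matching therapy; a missing or None count is treated as 0.
    if (variant.get("oncokb_treatment_count") or 0) > 0:
        boost += 5
    return boost
```

A variant that was never annotated (no OncoKB fields at all) simply receives a boost of 0, which keeps the scoring consistent with the optional nature of the integration.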

4. Therapeutic Matching

OncoKB enables automatic matching of mutations to:

  • FDA-approved drugs (LEVEL_1, LEVEL_2)
  • Clinical trial eligibility criteria
  • Off-label treatment options (LEVEL_3A, LEVEL_3B)

This powers Omics807's "Therapeutic Options" evidence tab and treatment matcher.


Current Implementation Status

Architecture: Optional Enrichment Service

OncoKB is implemented as an optional, API-key-gated enrichment service in Omics807's variant annotation pipeline.

Implementation Pattern

Variant Enrichment Pipeline:
1. VEP Annotation (mandatory) ✓
2. gnomAD Frequencies (mandatory) ✓
3. AlphaFold Structures (mandatory) ✓
4. ChEMBL Drug Targets (mandatory) ✓
5. STRING Interactions (mandatory) ✓
6. Reactome Pathways (mandatory) ✓
7. OncoKB Clinical Actionability (OPTIONAL - API key required) ⚠️
8. COSMIC Prevalence (OPTIONAL - API key required) ⚠️
9. CIViC Evidence (mandatory) ✓
10. cBioPortal Proteomics (mandatory) ✓

Code Location

  • Service Module: oncokb_service.py
  • Integration Point: cancerscope_app.py (line 807 in variant enrichment loop)
  • API Key Management: Environment variable ONCOKB_API_KEY

Key Features

1. Graceful Degradation

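def get_api_key():
    # Returns None when ONCOKB_API_KEY is not set
    return os.getenv('ONCOKB_API_KEY')

# Inside the OncoKB annotation function:
api_key = get_api_key()
if not api_key:
    logger.debug("OncoKB API key not configured - skipping OncoKB annotation")
    return None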

Behavior: If no API key is provided, OncoKB enrichment is silently skipped. Analysis continues with 14 other data sources.

2. Error Handling

if response.status_code == 401:
    logger.warning("OncoKB API key invalid or expired")
    return None

if response.status_code != 200:
    logger.warning(f"OncoKB API error: {response.status_code}")
    return None

Behavior: API failures don't crash the pipeline. Users still receive comprehensive results without OncoKB data.

3. Data Enrichment Fields

When OncoKB API key is available, each variant receives these additional fields:

Field                  | Description                  | Example Value
-----------------------|------------------------------|--------------
oncokb_available       | API key status               | true or false
oncokb_oncogenic       | Oncogenicity classification  | "Oncogenic", "Likely Oncogenic", "Unknown"
oncokb_mutation_effect | Functional impact            | "Gain-of-function", "Loss-of-function"
oncokb_highest_level   | Best therapy level           | "LEVEL_1", "LEVEL_2", "LEVEL_NA"
oncokb_treatment_count | Number of matching therapies | 3, 0
oncokb_is_actionable   | Has clinical therapies       | true, false
oncokb_fda_drugs       | FDA-approved drug names      | "Vemurafenib, Dabrafenib"
4. CSV Export Integration

All OncoKB fields are included in CSV exports (50+ enriched columns), enabling:

  • Downstream filtering (e.g., oncokb_highest_level == "LEVEL_1")
  • Computational analysis of actionability rates
  • Cross-study meta-analysis
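
Downstream filtering of an export might look like the following sketch, which uses only the standard library. The oncokb_highest_level column comes from the field table above; the gene and protein_change column names and the sample rows are assumptions made for illustration.

```python
import csv
import io


def filter_rows_by_level(csv_text, levels=("LEVEL_1",)):
    """Return the CSV rows whose oncokb_highest_level is one of `levels`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get("oncokb_highest_level") in levels]


# Hypothetical two-row export; real exports carry 50+ columns.
sample = (
    "gene,protein_change,oncokb_highest_level\n"
    "BRAF,V600E,LEVEL_1\n"
    "EXAMPLE_GENE,X100Y,LEVEL_NA\n"
)
level1_rows = filter_rows_by_level(sample)
```

Passing levels=("LEVEL_3A", "LEVEL_3B", "LEVEL_4") instead would select the investigational tier.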


Use-Case Scenarios

Scenario 1: Clinical Tumor Board Preparation

Context: Oncologist reviewing next-generation sequencing results for treatment planning

Workflow:

1. Upload patient VCF file to Omics807
2. Omics807 enriches variants with 15 databases including OncoKB
3. Filter variants by: oncokb_is_actionable == true
4. Review "Therapeutic Options" tab:
   - BRAF V600E detected
   - OncoKB Level: LEVEL_1
   - FDA-approved drugs: Vemurafenib, Dabrafenib
   - Indication: Melanoma
5. Export PDF report for tumor board discussion

Value: Saves an estimated 2-3 hours of manual literature review and drug database searching per case.


Scenario 2: Clinical Trial Matching

Context: Research coordinator identifying trial-eligible patients

Workflow:

1. Load cohort of 50 patient VCF files
2. Batch process through Omics807
3. Export all CSV results
4. Filter for: oncokb_highest_level IN ("LEVEL_3A", "LEVEL_3B", "LEVEL_4")
   → These patients may benefit from investigational therapies
5. Cross-reference with ClinicalTrials.gov (also integrated in Omics807)
6. Generate eligibility list

Value: Automated pre-screening is estimated to increase trial enrollment efficiency by 40%.


Scenario 3: Biomarker Discovery Research

Context: Cancer genomics researcher studying drug resistance mechanisms

Workflow:

1. Analyze 200 pre-treatment and post-treatment tumor samples
2. OncoKB identifies known resistance mutations (LEVEL_R1, LEVEL_R2):
   - EGFR T790M (resistance to first-generation EGFR inhibitors such as erlotinib and gefitinib in lung cancer)
   - KRAS G12C (targeted therapy resistance)
3. Researcher focuses on novel mutations NOT in OncoKB
4. Compare with CIViC's emerging evidence database
5. Design functional validation experiments

Value: OncoKB acts as a "known mutation filter" to prioritize novel discoveries.


Scenario 4: Educational Training

Context: Medical oncology fellowship teaching session on precision medicine

Workflow:

1. Instructor loads Omics807 melanoma demo dataset (BRAF V600E)
2. Walk through variant enrichment pipeline:
   - VEP: "missense_variant, likely damaging"
   - gnomAD: "Rare in population (0.001% allele frequency)"
   - AlphaFold: "Mutation maps to the kinase domain of the predicted structure"
   - OncoKB: "Oncogenic, LEVEL_1, FDA-approved drugs available"
3. Students learn the difference between:
   - Oncogenic variants (cancer-causing)
   - Actionable variants (treatment-available)
4. Discussion: Why some oncogenic variants have no treatments (actionability gap)

Value: Hands-on learning with real-world precision medicine workflow.


Data Flow & Integration Architecture

Variant Enrichment Pipeline Flow

┌─────────────────────────────────────────────────────────────┐
│              User Uploads VCF File to Omics807              │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│              VCF Parser Extracts Variant Calls              │
│  Gene: BRAF, Chromosome: 7, Position: 140453136             │
│  Ref: A, Alt: T, Protein Change: p.Val600Glu                │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│                   Enrichment Loop Begins                    │
│               (Iterate through each variant)                │
└──────────────────────────┬──────────────────────────────────┘
                           │
       ┌───────────────────┼───────────────────┐
       │                   │                   │
  ┌────▼─────┐        ┌────▼─────┐        ┌────▼─────┐
  │   VEP    │        │  gnomAD  │        │AlphaFold │
  │Annotation│        │Frequency │        │Structure │
  └────┬─────┘        └────┬─────┘        └────┬─────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                 ┌─────────▼─────────┐
                 │  OncoKB API Call  │
                 │ (if API key set)  │
                 └─────────┬─────────┘
                           │
       ┌───────────────────┼───────────────────┐
       │                   │                   │
  ┌────▼─────┐        ┌────▼─────┐        ┌────▼─────┐
  │  ChEMBL  │        │  STRING  │        │ Reactome │
  │Drug Data │        │ Protein  │        │ Pathways │
  │          │        │ Network  │        │          │
  └────┬─────┘        └────┬─────┘        └────┬─────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                 ┌─────────▼─────────┐
                 │ Confidence Score  │
                 │    Calculation    │
                 │ (includes OncoKB  │
                 │  level boosters)  │
                 └─────────┬─────────┘
                           │
                 ┌─────────▼─────────┐
                 │  Store Enriched   │
                 │   Variant to DB   │
                 └─────────┬─────────┘
                           │
                 ┌─────────▼─────────┐
                 │  Display Results  │
                 │ in Web Interface  │
                 └───────────────────┘

OncoKB API Request Example

Request:

GET https://www.oncokb.org/api/v1/annotate/mutations/byProteinChange
Authorization: Bearer {ONCOKB_API_KEY}
Content-Type: application/json

Parameters:
  hugoSymbol: BRAF
  alteration: V600E
  tumorType: Melanoma
  consequence: missense_variant
  referenceGenome: GRCh38
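
The request above could be assembled with Python's standard library roughly as follows. This is a sketch under assumptions: build_oncokb_request is a hypothetical helper, not the function in oncokb_service.py, and it sends an Accept header because a GET request carries no body. It only builds the request object; nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BY_PROTEIN_CHANGE_URL = (
    "https://www.oncokb.org/api/v1/annotate/mutations/byProteinChange"
)


def build_oncokb_request(api_key, hugo_symbol, alteration, tumor_type=None,
                         consequence=None, reference_genome="GRCh38"):
    """Build (but do not send) a GET request for one protein change."""
    params = {
        "hugoSymbol": hugo_symbol,
        "alteration": alteration,
        "referenceGenome": reference_genome,
    }
    # Optional parameters are omitted rather than sent as empty strings.
    if tumor_type:
        params["tumorType"] = tumor_type
    if consequence:
        params["consequence"] = consequence
    return Request(
        f"{BY_PROTEIN_CHANGE_URL}?{urlencode(params)}",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )

# Sending would look like: urllib.request.urlopen(request, timeout=5)
```

Separating request construction from sending makes the parameters easy to unit-test without network access or a real token.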

Response:

{
  "oncogenic": "Oncogenic",
  "mutationEffect": {
    "knownEffect": "Gain-of-function"
  },
  "treatments": [
    {
      "level": "LEVEL_1",
      "drugs": [
        {"drugName": "Vemurafenib"},
        {"drugName": "Dabrafenib"}
      ],
      "levelAssociatedCancerType": {
        "name": "Melanoma"
      }
    },
    {
      "level": "LEVEL_1",
      "drugs": [
        {"drugName": "Encorafenib"}
      ],
      "levelAssociatedCancerType": {
        "name": "Melanoma"
      }
    }
  ],
  "prognosticImplication": "Better Outcome",
  "diagnosticImplication": "Unknown"
}

Omics807 Processing:

variant['oncokb_oncogenic'] = "Oncogenic"
variant['oncokb_mutation_effect'] = "Gain-of-function"
variant['oncokb_highest_level'] = "LEVEL_1"
variant['oncokb_treatment_count'] = 2
variant['oncokb_is_actionable'] = True
variant['oncokb_fda_drugs'] = "Vemurafenib, Dabrafenib; Encorafenib"
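
A minimal sketch of how a response with the shape shown above could be flattened into these fields. The function name extract_oncokb_fields is hypothetical, and two choices are assumptions rather than documented behavior: drugs are treated as FDA-approved when their treatment is LEVEL_1 or LEVEL_2, and drug groups from separate treatments are joined with "; ".

```python
LEVEL_ORDER = ["LEVEL_1", "LEVEL_2", "LEVEL_3A", "LEVEL_3B", "LEVEL_4"]


def extract_oncokb_fields(response):
    """Flatten an OncoKB annotation response into oncokb_* variant fields."""
    treatments = response.get("treatments") or []
    levels = {t.get("level") for t in treatments}
    ranked = [lvl for lvl in LEVEL_ORDER if lvl in levels]
    # One comma-separated group of drug names per LEVEL_1/LEVEL_2 treatment.
    drug_groups = [
        ", ".join(d["drugName"] for d in t.get("drugs", []))
        for t in treatments
        if t.get("level") in ("LEVEL_1", "LEVEL_2")
    ]
    effect = (response.get("mutationEffect") or {}).get("knownEffect")
    return {
        "oncokb_oncogenic": response.get("oncogenic") or "Unknown",
        "oncokb_mutation_effect": effect or "Unknown",
        "oncokb_highest_level": ranked[0] if ranked else "LEVEL_NA",
        "oncokb_treatment_count": len(treatments),
        "oncokb_is_actionable": len(treatments) > 0,
        "oncokb_fda_drugs": "; ".join(g for g in drug_groups if g),
    }
```

An empty response (for example, a gene that OncoKB does not cover) falls through to "Unknown" and "LEVEL_NA" with a treatment count of 0.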

API Interaction Details

Authentication

OncoKB requires a Bearer token for all API requests. Users must:

  1. Register for a free academic account at https://www.oncokb.org/
  2. Navigate to "Account Settings" → "API Access"
  3. Generate an API token
  4. Add to Omics807 environment: ONCOKB_API_KEY=your_token_here

Rate Limits

  • Free Academic Tier: 100 requests/minute
  • Commercial License: Custom rate limits

Omics807 Mitigation Strategy:

  • Sequential variant processing (not parallelized)
  • 5-second timeout per request
  • Graceful fallback on rate limit errors
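
Beyond the strategy listed above, a client-side limiter is one possible way to stay under a per-minute quota before the API ever returns an error. The sketch below is an assumption about how that could be done, not a description of what Omics807 currently implements; the class name and the injectable clock and sleep parameters are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls=100, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def _evict(self, now):
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def acquire(self):
        """Block until one more call is allowed, then record it."""
        now = self._clock()
        self._evict(now)
        if len(self._calls) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
            self._evict(now)
        self._calls.append(now)
```

Calling limiter.acquire() before each OncoKB request fits naturally with sequential processing, since only one caller ever waits.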

Error Scenarios

Scenario                | OncoKB Response       | Omics807 Behavior
------------------------|-----------------------|------------------
No API key configured   | N/A (no request sent) | Skip OncoKB, set oncokb_available=false
Invalid/expired API key | 401 Unauthorized      | Log warning, skip OncoKB
Rate limit exceeded     | 429 Too Many Requests | Log warning, skip remaining variants
Gene not in OncoKB      | 200 OK (empty data)   | Set oncokb_oncogenic="Unknown"
Network timeout         | Exception             | Log error, skip variant
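
The status-code rows of the table can be expressed as a small dispatch function. This is a sketch: the Action enum and action_for_status are hypothetical names, and treating any other non-200 status as "skip this variant" is an assumption based on the generic error handling shown earlier.

```python
from enum import Enum


class Action(Enum):
    USE_RESPONSE = "use_response"      # 200: parse the body
    SKIP_ONCOKB = "skip_oncokb"        # 401: key invalid or expired
    SKIP_REMAINING = "skip_remaining"  # 429: stop calling for this run
    SKIP_VARIANT = "skip_variant"      # anything else: skip just this variant


def action_for_status(status_code):
    """Map an HTTP status code to the pipeline behavior in the table."""
    if status_code == 200:
        return Action.USE_RESPONSE
    if status_code == 401:
        return Action.SKIP_ONCOKB
    if status_code == 429:
        return Action.SKIP_REMAINING
    return Action.SKIP_VARIANT
```

Network timeouts never produce a status code, so they would be caught as exceptions around the request call and handled like SKIP_VARIANT.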

Data Completeness

OncoKB coverage varies by cancer type:

  • High Coverage: Melanoma, lung cancer, breast cancer, colorectal cancer (70-80% of oncogenic variants)
  • Moderate Coverage: Pancreatic, ovarian, kidney cancer (40-50%)
  • Low Coverage: Rare cancers, sarcomas (10-20%)

Implication: Not all variants will have OncoKB annotations, even with a valid API key.


Value Proposition

For Clinicians

Save 2-3 hours per case - Automated literature review and drug matching
Reduce misinterpretation risk - Expert-curated, MSK-validated classifications
Increase treatment precision - FDA-approved vs. investigational therapies clearly marked
Enable off-label exploration - LEVEL_3B shows drugs approved in other cancer types

For Researchers

Standardized actionability metrics - Compare cohorts using LEVEL_1/2 rates
Resistance mechanism discovery - LEVEL_R1/R2 flags known resistance mutations
Clinical trial pre-screening - LEVEL_4 variants = investigational therapy candidates
Reproducible analyses - OncoKB versioning ensures consistent classifications

For Omics807 Platform

Clinical credibility - Integration with MSK's gold-standard database
Competitive differentiation - Many tools lack OncoKB integration
User confidence - High Confidence variants backed by LEVEL_1/2 evidence
Regulatory readiness - The FDA has granted OncoKB partial recognition as a public human genetic variant database


Limitations & Considerations

1. API Key Barrier

Challenge: OncoKB is not freely accessible without registration
Impact: Users without API keys miss clinical actionability data
Mitigation in Omics807:

  • Other 14 data sources still provide comprehensive enrichment
  • CIViC provides alternative (community-curated) clinical evidence
  • Clear documentation guides users through API key setup

2. Coverage Gaps

Challenge: Not all genes/mutations are in OncoKB
Example: Novel variants discovered in rare cancers
Omics807 Solution:

  • Multi-database strategy compensates for gaps
  • CIViC may have evidence when OncoKB doesn't
  • Literature search finds emerging publications

3. License Restrictions

Challenge: Commercial use requires paid license from MSK
Implication: Licensing requirements depend on the deployment setting:

  • Academic users: Free OncoKB academic license ✓
  • Commercial users: Paid OncoKB license + Omics807 usage rights

4. Update Frequency

Challenge: OncoKB database updates monthly
Impact: Newly approved drugs may lag by 2-4 weeks
Omics807 Mitigation:

  • ClinicalTrials.gov integration provides real-time trial data
  • Literature search catches very recent publications

5. Tumor Type Specificity

Challenge: Actionability depends on cancer type
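Example: BRAF V600E is LEVEL_1 in melanoma with vemurafenib or dabrafenib, whereas in colorectal cancer the FDA-approved regimen is encorafenib plus cetuximab; levels and matched drugs are assigned per tumor type
Omics807 Approach:

  • User can specify cancer type in analysis setup
  • Default: "Cancer" (pan-cancer query)
  • Results page displays indication-specific drug recommendations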


Future Enhancement Opportunities

1. Cancer Type Auto-Detection

Current: User manually specifies cancer type (optional)
Enhancement: Integrate TCIA/GDC metadata to auto-populate tumor type
Benefit: More accurate OncoKB LEVEL classifications

2. OncoKB Allele-Specific Queries

Current: Uses byProteinChange endpoint (gene + alteration)
Enhancement: Add byGenomicChange endpoint (chromosome + position + ref/alt)
Benefit: Better handling of synonymous variants and UTR mutations

3. Treatment Recommendation Dashboard

Current: OncoKB data shown in "Therapeutic Options" tab per variant
Enhancement: Unified treatment dashboard aggregating all actionable variants
Features:

  • Ranked drug list (LEVEL_1 → LEVEL_4)
  • Combination therapy suggestions
  • Resistance mutation warnings (LEVEL_R1)
  • Clinical trial matches from ClinicalTrials.gov

4. OncoKB Annotation Caching

Current: API call per variant per analysis
Enhancement: Cache OncoKB responses in database
Key: gene + alteration + cancer_type
Benefit:

  • Reduce API calls by 70-80% (many recurrent mutations)
  • Faster analysis for subsequent runs
  • Graceful handling of rate limits
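
The caching idea can be sketched with an in-memory dictionary standing in for the database table. Everything here is illustrative: OncoKBCache and its fetch callback are hypothetical, the key normalization (upper-case gene, lower-case cancer type) is an assumption, and failed lookups are deliberately not cached so a transient error does not hide an annotation on later runs.

```python
class OncoKBCache:
    """Cache annotations keyed by gene + alteration + cancer_type."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(gene, alteration, cancer_type) -> dict or None
        self._store = {}     # stand-in for a database table
        self.hits = 0
        self.misses = 0

    def get(self, gene, alteration, cancer_type=None):
        key = (gene.upper(), alteration.strip(), (cancer_type or "").lower())
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._fetch(gene, alteration, cancer_type)
        if result is not None:
            # Cache only successful lookups; None signals a skipped or failed call.
            self._store[key] = result
        return result
```

The hit and miss counters make it straightforward to measure how many API calls the cache actually saves on a given cohort.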

5. OncoKB Versions & Change Tracking

Current: Uses latest OncoKB API (no version tracking)
Enhancement: Store OncoKB database version with each analysis
Benefit:

  • Reproducibility for research publications
  • Track therapeutic landscape changes over time
  • Alert users when actionability status changes (e.g., drug approval)
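
One way to support change alerts is to store a version label with each analysis and compare snapshots between runs. The sketch below assumes the version label is already known at analysis time (how it is obtained is out of scope here); AnnotationSnapshot and level_changes are hypothetical names, and the example data in the usage is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationSnapshot:
    oncokb_version: str  # version label recorded when the analysis ran
    levels: dict         # (gene, alteration) -> highest level at that time


def level_changes(old, new):
    """Return {(gene, alteration): (old_level, new_level)} for changed variants."""
    changed = {}
    for key, new_level in new.levels.items():
        old_level = old.levels.get(key)
        # Variants absent from the old snapshot are new, not changed.
        if old_level is not None and old_level != new_level:
            changed[key] = (old_level, new_level)
    return changed


# Hypothetical snapshots from two runs of the same sample.
before = AnnotationSnapshot("version-A", {("GENE1", "X1Y"): "LEVEL_3A"})
after = AnnotationSnapshot("version-B", {("GENE1", "X1Y"): "LEVEL_1"})
changes = level_changes(before, after)
```

Each entry in the result could drive a user-facing alert that names both version labels alongside the old and new levels.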

6. Integration with Multi-Omics Dashboard

Current: OncoKB data shown in DNA analysis only
Enhancement: Cross-reference OncoKB treatments with RNA expression and proteomics
Example Workflow:

1. OncoKB identifies BRAF V600E → Vemurafenib (LEVEL_1)
2. RNA-seq analysis checks BRAF expression level
3. cBioPortal proteomics verifies BRAF protein abundance
4. AI synthesis: "High confidence drug target - genomic alteration + 
   high RNA expression + elevated protein abundance"

7. Patient Report Generator

Current: PDF export includes OncoKB fields in technical tables
Enhancement: Patient-friendly OncoKB summary section
Features:

  • "Your cancer has 2 mutations with FDA-approved treatments"
  • Visual therapy timeline (LEVEL_1 → LEVEL_2 → LEVEL_3)
  • Plain language mutation effect explanations
  • Links to FDA drug labels and patient support resources

8. Resistance Mutation Predictor

Current: OncoKB flags existing resistance mutations (LEVEL_R1)
Enhancement: Predict future resistance based on therapeutic plan
Workflow:

1. Patient has EGFR L858R (LEVEL_1 → Osimertinib)
2. OncoKB API query: "What resistance mutations arise on osimertinib?"
3. Result: C797S (emerging resistance); T790M (LEVEL_R1) applies to earlier-generation EGFR inhibitors rather than osimertinib
4. Recommendation: Monitor these positions in serial liquid biopsies

9. Comparative Actionability Analysis

Current: OncoKB annotations per variant
Enhancement: Cohort-level actionability statistics
Dashboard Metrics:

  • % of patients with LEVEL_1 actionable variants
  • Most frequent actionable genes (BRAF, EGFR, KRAS)
  • Actionability by cancer subtype
  • Temporal trends (has actionability increased over time?)
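
The first two metrics could be computed from enriched variant records roughly as follows. This is a sketch: cohort_actionability is a hypothetical function, and the patient_id and gene keys are assumed to be present on each record alongside the documented oncokb_* fields.

```python
from collections import Counter


def cohort_actionability(variants):
    """Summarize LEVEL_1 patient share and most frequent actionable genes."""
    patients = set()
    level1_patients = set()
    gene_counts = Counter()
    for v in variants:
        patients.add(v["patient_id"])
        if v.get("oncokb_is_actionable"):
            gene_counts[v["gene"]] += 1
        if v.get("oncokb_highest_level") == "LEVEL_1":
            level1_patients.add(v["patient_id"])
    # Percentage is per patient, not per variant; an empty cohort yields 0.0.
    pct = 100.0 * len(level1_patients) / len(patients) if patients else 0.0
    return {
        "pct_level1_patients": pct,
        "top_actionable_genes": gene_counts.most_common(3),
    }
```

Counting patients rather than variants avoids inflating the rate when one patient carries several LEVEL_1 alterations.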

10. OncoKB API Monitoring Dashboard

Current: Silent failures logged to console
Enhancement: Admin panel OncoKB status widget
Features:

  • API key validity status
  • Current rate limit usage (e.g., "47/100 requests this minute")
  • Failed request count (last 24 hours)
  • Coverage statistics (% variants successfully annotated)
  • Alert when API key expires soon


Summary

OncoKB integration in Omics807 represents the critical bridge between genomic discovery and clinical action. While the platform's 15-database enrichment pipeline provides comprehensive molecular context, OncoKB uniquely answers the question every clinician asks:

"What treatments are available for this patient?"

Key Takeaways

  1. Complementary, Not Redundant: OncoKB provides MSK-curated clinical guidelines, while CIViC offers community-driven research evidence. Together, they create a comprehensive actionability assessment.

  2. Optional but Valuable: The API-key-gated design ensures Omics807 remains functional for all users, while OncoKB adds premium clinical value for those with access.

  3. Production-Ready Implementation: Graceful error handling, timeout protection, and silent fallbacks make OncoKB integration robust in real-world clinical workflows.

  4. Future-Proof Architecture: Modular design (oncokb_service.py) enables easy updates as OncoKB API evolves and new enhancement features are added.

For users without OncoKB access:

  • Leverage CIViC, ChEMBL, and ClinicalTrials.gov for alternative clinical evidence
  • Consider OncoKB academic license (free for research)

For users with OncoKB access:

  • Set ONCOKB_API_KEY environment variable
  • Filter variants by oncokb_is_actionable == true for rapid clinical triage
  • Export CSV for downstream LEVEL-based cohort analysis

For Omics807 developers:

  • Prioritize enhancements #4 (caching) and #10 (monitoring dashboard)
  • Explore MSK OncoKB commercial licensing for enterprise deployments
  • Monitor FDA regulatory guidance on OncoKB LEVEL recognition


Document Version: 1.0
Last Updated: October 23, 2025
Maintained By: Omics807 Development Team
OncoKB API Version: v1 (REST API)
Related Documentation:

  • Omics807 README
  • OncoKB Official Documentation
  • Variant Enrichment Pipeline