Cross-Database Validation
Systematic search across 10+ genomic databases found zero quintuple matches among 31,000+ myeloid patients
Databases Searched
10+
Unique Patients
31,000+
Deduplicated myeloid cohort
Quintuple Matches
0
5 drivers · Expected: 7.7×10⁻¹³
Triple Matches
25
Closest: Patient 2642 (IDH2+PTPN11+SETBP1 triple)
Database Coverage
Zero quintuple matches across 10+ databases and 31,000+ deduplicated myeloid patients.
GENIE v19.0 provides the statistical backbone (20,820 myeloid samples), supplemented
by cBioPortal (46 studies, 25,873 samples), GDC (24 hematological projects, 16,411
cases), ICGC, and ClinVar for independent confirmation. GDC identified 33 unique
cases not represented in GENIE, including 3 double matches (DNMT3A+IDH2) in the
BeatAML project. Additional European and Japanese cohorts (aCML, MDS/MPN overlap
cases sequenced under degron-focused or deep-sequencing panels) would extend the
analysis into populations not yet represented in these predominantly Western-centric
databases.
| Database | Patients | Unique Addition | EZH2 V662A | Quintuple Result |
|---|---|---|---|---|
| GENIE v19.0 [17] | 20,820 | Baseline | 0 / 20,739 | 0 |
| cBioPortal (46 studies) | 25,873 | ~5,000 | N/A | 0 |
| GDC/TCGA (24 projects) | 16,411 | 33 | N/A | 0 |
| IPSS-M (Bernard 2022) [13] | 2,957 | ~1,750 | N/A | N/A |
| ICGC/PCAWG | 1,575 | ~883 | N/A | 0 |
| Beat AML / Vizome | 903 | 0 (overlap) | N/A | 0 |
| DepMap (cell lines) | 84 | 84 | N/A | N/A |
| ClinVar | N/A | N/A | N/A | N/A (no co-occurrence) |
| Open Targets | N/A | N/A | N/A | N/A (gene-drug only) |
Patient counts represent panel-eligible myeloid samples with sequencing
coverage for all five target genes (DNMT3A, IDH2, SETBP1, PTPN11, EZH2).
Unique addition estimates account for patient overlap between databases.
EZH2 V662A has 0 carriers in GENIE (0/20,739). The variant itself does not
appear in any public database.
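
The overlap accounting described above can be sketched as a running set difference. This is a minimal illustration, not the pipeline actually used: it assumes every database exposes a shared patient identifier (in practice, identifiers usually have to be mapped between sources first), and the function name and toy IDs are hypothetical.

```python
def unique_additions(baseline_ids, other_sources):
    """Count how many new patients each source adds beyond those already seen.

    baseline_ids: iterable of patient IDs in the baseline database.
    other_sources: list of (name, iterable of patient IDs), in query order.
    Returns (additions per source, size of the deduplicated cohort).
    """
    seen = set(baseline_ids)
    additions = {}
    for name, ids in other_sources:
        new_ids = set(ids) - seen      # patients not present in any earlier source
        additions[name] = len(new_ids)
        seen |= new_ids                # later sources are compared against the union
    return additions, len(seen)


# Toy example with made-up identifiers (not real patient IDs).
toy_baseline = {"p1", "p2", "p3"}
toy_sources = [("source_b", {"p2", "p4"}), ("source_c", {"p1", "p4", "p5"})]
toy_additions, toy_total = unique_additions(toy_baseline, toy_sources)
```

Because each source is compared against everything seen so far, the per-source "unique addition" depends on the order in which the databases are processed; the deduplicated total does not.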
GDC Expanded Results
24 hematological projects queried, 16,411 cases screened.
The GDC expanded search used GRCh38 genomic coordinates to query all hematological malignancy projects in the Genomic Data Commons. Of 16,411 cases, 33 were identified as potentially unique (not in GENIE).
| Gene | Variant Hits | Double Matches | Notes |
|---|---|---|---|
| DNMT3A | 19 | 3 (with IDH2) | All doubles in BEATAML (base GDC: 9 SSMs) |
| IDH2 | 17 | 3 (with DNMT3A) | All doubles in BEATAML (base GDC: 2 SSMs) |
| SETBP1 | 1 | 0 | Single hit, no overlap with others (base GDC: 0 SSMs) |
| PTPN11 | 3 | 0 | No co-occurrence with target genes (base GDC: 3 SSMs) |
GDC Data Portal v2 (portal.gdc.cancer.gov). Query: simple somatic mutations
across 24 hematological projects using GRCh38 coordinates. March 2026.
cBioPortal Expanded
46 myeloid studies, 25,873 samples screened.
Expanded cBioPortal query across all publicly available myeloid/AML/MDS studies identified 20 patients carrying 3 or more of the target genes (triples). Notable: patient P-0032912 with DNMT3A+IDH2+SETBP1, the closest match to the patient profile. Zero quadruple or quintuple matches were found.
| Metric | Value |
|---|---|
| Studies queried | 46 |
| Total samples | 25,873 |
| Patients with 3+ target genes | 20 |
| Closest match | P-0032912 (DNMT3A+IDH2+SETBP1) |
| Quadruple matches | 0 |
| Quintuple matches | 0 |
cBioPortal for Cancer Genomics (cbioportal.org). 46 myeloid-lineage studies
queried via web API. March 2026.
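
The triple/quadruple/quintuple tally can be illustrated with a small counting routine. This is a sketch under stated assumptions: mutation records are taken to be already downloaded as (patient ID, gene symbol) pairs, matching is done at the gene level only (the analysis described here additionally requires the specific EZH2 V662A variant), and the function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict

TARGET_GENES = {"DNMT3A", "IDH2", "SETBP1", "PTPN11", "EZH2"}


def target_genes_by_patient(mutations):
    """Group mutated target genes by patient.

    mutations: iterable of (patient_id, gene_symbol) pairs.
    Patients with no target-gene mutation are omitted from the result.
    """
    genes = defaultdict(set)
    for patient_id, gene in mutations:
        if gene in TARGET_GENES:
            genes[patient_id].add(gene)
    return dict(genes)


def tally_match_levels(genes_by_patient):
    """Count patients by number of distinct target genes mutated (1 to 5)."""
    return Counter(len(g) for g in genes_by_patient.values())


# Toy records with made-up patient IDs.
toy_mutations = [
    ("a", "DNMT3A"), ("a", "IDH2"), ("a", "SETBP1"), ("a", "TP53"),
    ("b", "IDH2"), ("b", "IDH2"),
    ("c", "TP53"),
]
toy_genes = target_genes_by_patient(toy_mutations)
toy_tally = tally_match_levels(toy_genes)
```

Using a set per patient means repeated calls in the same gene count once, so a patient with two IDH2 variants is still a single-gene match.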
Genomic Landscape Context
The patient's mutation profile intersects two well-characterized genomic
landscapes. Papaemmanuil et al. [12] classified AML into
genomic subgroups based on driver mutations and demonstrated that DNMT3A,
IDH2, and PTPN11 co-mutations define a chromatin-spliceosome subgroup with
distinct prognosis. Haferlach et al. [48] mapped the MDS mutational
landscape in 944 patients, establishing SETBP1 mutations [1][2] as
recurrent in MDS/MPN overlap syndromes with adverse prognosis.
Quintuple Rarity Context
The original four-gene analysis found 0 matches with an expected frequency of
1.13×10⁻⁴ (1 in ~8,850). With EZH2 V662A confirmed as a fifth
driver, the expected frequency drops to ~7.7×10⁻¹³
(1 in ~1.3 trillion). At this frequency a match is not merely rare; no
existing cohort is large enough to make one plausible. Even if every myeloid
patient ever sequenced worldwide (~150,000-250,000) were available, the
expected number of matches would be ~2×10⁻⁷.
The pairwise-corrected estimate uses the maximum entropy approximation with observed/expected ratios for all 10 gene pairs [30][31], applied to
variant-specific frequencies in GENIE v19.0 [17].
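
One common way to write a pairwise-corrected estimate is the independence product of the marginal frequencies multiplied by the observed/expected (O/E) ratio of every gene pair. The sketch below assumes that form; the exact formula used for this analysis is not given in the text, and the function name and toy numbers are hypothetical, not GENIE values.

```python
from itertools import combinations
from math import prod


def pairwise_corrected_frequency(marginals, oe_ratios):
    """Joint frequency under a pairwise O/E correction (assumed form).

    marginals: dict of gene -> marginal mutation frequency.
    oe_ratios: dict of frozenset({gene_a, gene_b}) -> observed/expected ratio.
               Pairs missing from the dict are treated as independent (1.0).
    """
    genes = sorted(marginals)
    independent = prod(marginals[g] for g in genes)
    correction = prod(
        oe_ratios.get(frozenset(pair), 1.0) for pair in combinations(genes, 2)
    )
    return independent * correction


# Toy values for illustration only.
toy_marginals = {"A": 0.1, "B": 0.2, "C": 0.05}
toy_oe = {frozenset({"A", "B"}): 2.0}   # A and B co-occur twice as often as chance
toy_estimate = pairwise_corrected_frequency(toy_marginals, toy_oe)

# Five genes give C(5, 2) = 10 pairs, matching the 10 pairs mentioned above.
n_pairs_for_five_genes = len(list(combinations(range(5), 2)))
```

An O/E ratio above 1 raises the estimate (co-occurring pair) and a ratio below 1 lowers it (mutually exclusive pair); with all ratios at 1 the result reduces to the plain independence product.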
| Scenario | Expected Frequency | Expected Matches in 250,000 Patients |
|---|---|---|
| Quadruple (DNMT3A+IDH2+SETBP1+PTPN11) | 1.13×10⁻⁴ | ~28 |
| Quintuple (+EZH2 V662A) | 7.7×10⁻¹³ | ~2×10⁻⁷ |
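
The expected number of matches in a cohort is the per-patient frequency multiplied by the cohort size. The sketch below shows that arithmetic, plus a Poisson approximation for the chance of seeing at least one match; the Poisson step is an added assumption for illustration and is not stated in the analysis above. Function names are hypothetical.

```python
from math import expm1


def expected_matches(frequency, cohort_size):
    """Expected number of matching patients: frequency times cohort size."""
    return frequency * cohort_size


def prob_at_least_one(frequency, cohort_size):
    """P(at least one match) under a Poisson approximation (assumption).

    Uses expm1 so that very small expected counts keep their precision.
    """
    return -expm1(-expected_matches(frequency, cohort_size))


# Quintuple frequency and upper cohort bound quoted in the text.
quintuple_expected = expected_matches(7.7e-13, 250_000)
quintuple_p_any = prob_at_least_one(7.7e-13, 250_000)
```

With the quintuple frequency of 7.7×10⁻¹³ and 250,000 patients, the expected count comes to roughly 1.9×10⁻⁷, in line with the ~2×10⁻⁷ quoted above; at values this small, the probability of at least one match is essentially equal to the expected count.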
References
- [17] AACR Project GENIE Consortium. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov (2017).
- [13] Bernard E et al. Molecular International Prognostic Scoring System for myelodysplastic syndromes. NEJM Evid (2022).
- [12] Papaemmanuil E et al. Genomic classification and prognosis in acute myeloid leukemia. N Engl J Med (2016).
- [48] Haferlach T et al. Landscape of genetic lesions in 944 patients with myelodysplastic syndromes. Leukemia (2014).
- [1] Piazza R et al. Recurrent SETBP1 mutations in atypical chronic myeloid leukemia. Nat Genet (2013).
- [2] Makishima H et al. Somatic SETBP1 mutations in myeloid malignancies. Nat Genet (2013).
- [30] Canisius S et al. A novel independence test for somatic alterations in cancer shows that biology drives mutual exclusivity but chance explains most co-occurrence. Genome Biol (2016).
- [31] Kandoth C et al. Mutational landscape and significance across 12 major cancer types. Nature (2013).