AACR Project GENIE v19.0 · 21,017 myeloid patients · Panel-adjusted Fisher's exact with Benjamini-Hochberg FDR · N=1 case study · Not clinical guidance

Statistical Rarity

Bayesian rare-event estimation, 1 in 1.9 billion variant-level frequency

Jeffreys Median Ultra-rare
1.22e-5
Posterior median under the Jeffreys Beta(0.5, 0.5) prior
Patients for 50%
55,875
Future cohort needed for a 50% chance of at least 1 match
Variant Specificity
7.1 orders
The specific variants are 12.6 million times rarer than the gene-level combination
Combined Search Zero
0 matches
26,642 patients (GENIE + HARMONY)

Jeffreys Posterior Estimation

With 0 observed quintuples in 18,625 myeloid patients, the Jeffreys non-informative prior Beta(0.5, 0.5) yields the posterior Beta(0.5, 18,625.5). The posterior median (1.22e-5) is used as the point estimate of the true population frequency. The 95% credible interval spans 2.64e-8 to 1.35e-4, which bounds the plausible frequency range.
| Statistic | Value | Interpretation |
| --- | --- | --- |
| Posterior median | 1.22e-5 | Point estimate of the true frequency |
| Posterior mean | 2.68e-5 | Mean of Beta(0.5, 18,625.5) |
| 95% CI lower | 2.64e-8 | 2.5th percentile of the posterior |
| 95% CI upper | 1.35e-4 | 97.5th percentile of the posterior |
| One-sided 95% upper | 1.03e-4 | 95th percentile of the posterior (one-sided upper bound) |
Source: bayesian_rarity_results.json. Jeffreys prior: non-informative conjugate prior for binomial data.
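The posterior summary above can be approximately reproduced with the Python standard library. The sketch below is not the pipeline behind bayesian_rarity_results.json; it assumes the large-sample relation that, for p ~ Beta(0.5, b) with large b, 2·b·p is approximately chi-square distributed with 1 degree of freedom. The helper names are hypothetical, and an exact calculation would use a Beta quantile function instead.

```python
from statistics import NormalDist

N_PATIENTS = 18_625            # patients searched, from the text
A_POST = 0.5                   # Jeffreys a = 0.5 plus 0 observed quintuples
B_POST = N_PATIENTS + 0.5      # Jeffreys b = 0.5 plus 18,625 non-events


def chi2_1df_quantile(q: float) -> float:
    """q-quantile of a chi-square variable with 1 degree of freedom.

    A chi-square(1) variable is a squared standard normal, so its q-quantile
    is the square of the standard normal (1 + q) / 2 quantile.
    """
    z = NormalDist().inv_cdf((1.0 + q) / 2.0)
    return z * z


def beta_half_quantile(q: float, b: float) -> float:
    """Approximate q-quantile of Beta(0.5, b), valid only for large b."""
    return chi2_1df_quantile(q) / (2.0 * b)


summary = {
    "median": beta_half_quantile(0.5, B_POST),
    "mean": A_POST / (A_POST + B_POST),          # exact: a / (a + b)
    "ci95_lower": beta_half_quantile(0.025, B_POST),
    "ci95_upper": beta_half_quantile(0.975, B_POST),
    "one_sided_95": beta_half_quantile(0.95, B_POST),
}

for name, value in summary.items():
    print(f"{name:>14s}: {value:.2e}")
```

With n this large the approximation agrees with the tabulated values to the three significant figures shown; for small cohorts it should not be relied on.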

Rule of Three (Frequentist)

The Rule of Three (Hanley and Lippman-Hand, 1983) provides a quick frequentist upper bound: with 0 events in n trials, the 95% upper confidence limit is approximately 3/n. For 18,625 patients, this gives an upper bound of 1.61e-4. The Clopper-Pearson exact 95% upper limit (1.61e-4) and the Poisson exact 99% upper bound (2.47e-4) are alternative frequentist bounds. All three are of the same order of magnitude as the upper limit of the Bayesian credible interval (1.35e-4).
| Method | Upper Bound | Confidence Level |
| --- | --- | --- |
| Rule of Three | 1.61e-4 | 95% |
| Clopper-Pearson exact | 1.61e-4 | 95% |
| Poisson exact | 2.47e-4 | 99% |
Source: bayesian_rarity_results.json. Hanley JA, Lippman-Hand A. If nothing goes wrong, is everything all right? JAMA (1983).
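For zero observed events, all three frequentist bounds have closed forms, so they can be checked in a few lines. The sketch below assumes the Clopper-Pearson figure is the one-sided 95% limit, which for zero events reduces to 1 - 0.05^(1/n); this assumption matches the tabulated 1.61e-4, whereas a two-sided interval would use 0.025 and give a larger value. Function names are hypothetical.

```python
import math

N_PATIENTS = 18_625  # patients searched, from the text


def rule_of_three(n: int) -> float:
    """Approximate 95% upper limit for a proportion with 0 events in n trials."""
    return 3.0 / n


def clopper_pearson_upper_zero(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided binomial upper limit with 0 events: solves (1 - p)^n = alpha."""
    return 1.0 - alpha ** (1.0 / n)


def poisson_upper_zero(n: int, alpha: float = 0.01) -> float:
    """Exact Poisson upper limit on the rate with 0 events: solves exp(-n * p) = alpha."""
    return -math.log(alpha) / n


bounds = {
    "Rule of Three (95%)": rule_of_three(N_PATIENTS),
    "Clopper-Pearson (95%)": clopper_pearson_upper_zero(N_PATIENTS),
    "Poisson exact (99%)": poisson_upper_zero(N_PATIENTS),
}

for method, bound in bounds.items():
    print(f"{method:<22s} {bound:.2e}")
```

The Rule of Three and the Clopper-Pearson limit agree to three significant figures here because -ln(0.05) is about 2.996, which the rule rounds to 3.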

Gene-Level vs Variant-Level Rarity

The five genes (DNMT3A, IDH2, SETBP1, PTPN11, EZH2) are individually common in myeloid malignancies. At the gene level, the expected co-occurrence count is 1.94e-2, placing this combination at the 77.8th percentile among all 278,256 possible five-gene combinations from 34 myeloid genes. The rarity is driven entirely by variant specificity: the specific amino acid changes are 7.1 orders of magnitude (about 1.26e7-fold, or 12.6 million times) rarer than the gene-level combination. EZH2 V662A (0 carriers in GENIE) is the primary driver of variant-level rarity.
| Gene | Variant-to-Gene Ratio | Interpretation |
| --- | --- | --- |
| DNMT3A | 0.2752 | Moderate specificity |
| IDH2 | 0.8084 | Moderate specificity |
| SETBP1 | 0.3170 | Moderate specificity |
| PTPN11 | 0.0210 | High specificity |
| EZH2 | 5.37e-5 | Extreme specificity (0 carriers) |
Gene-Level Expected
1.94e-2
77.8th percentile among 278,256 combinations
Variant-Level Expected
1.54e-9
7.1 orders rarer than gene-level
Quintuple Probability
~7.7e-13
1 in 1.3 trillion
Source: bayesian_rarity_results.json. Extreme Value Theory: Coles S. An Introduction to Statistical Modeling of Extreme Values. Springer (2001).
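The arithmetic linking the gene-level and variant-level figures can be retraced from the tabulated numbers. The sketch below uses only values reported in this section; the variable names are hypothetical. Because the per-gene ratios are rounded in the table, their product (about 7.95e-8) differs slightly from the reported specificity product of 7.93e-8, which was presumably computed from unrounded inputs.

```python
import math

# Variant-to-gene ratios as tabulated (rounded values from the report).
VARIANT_TO_GENE_RATIO = {
    "DNMT3A": 0.2752,
    "IDH2": 0.8084,
    "SETBP1": 0.3170,
    "PTPN11": 0.0210,
    "EZH2": 5.37e-5,
}

GENE_LEVEL_EXPECTED = 1.94e-2            # reported gene-level expected count
REPORTED_SPECIFICITY_PRODUCT = 7.93e-8   # reported product of the ratios

# Product recomputed from the rounded table entries.
specificity_product = math.prod(VARIANT_TO_GENE_RATIO.values())

# Variant-level expected count = gene-level expected count x specificity product.
variant_level_expected = GENE_LEVEL_EXPECTED * REPORTED_SPECIFICITY_PRODUCT

# Rarity gap, as a fold change and in orders of magnitude.
fold_rarer = 1.0 / REPORTED_SPECIFICITY_PRODUCT
rarity_gap_orders = math.log10(fold_rarer)

# Number of five-gene combinations that can be drawn from 34 genes.
n_combinations = math.comb(34, 5)

print(f"specificity product (recomputed): {specificity_product:.3e}")
print(f"variant-level expected count:     {variant_level_expected:.2e}")
print(f"rarity gap: {fold_rarer:.3g}-fold, {rarity_gap_orders:.1f} orders of magnitude")
print(f"five-gene combinations from 34 genes: {n_combinations:,}")
```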

Posterior Predictive: Patients Needed

Using the Jeffreys posterior, the probability of observing at least one quintuple match in a future cohort of M patients follows from the posterior predictive distribution. A cohort of 55,875 patients is needed for a 50% chance of finding one match, and 7,431,474 patients for a 95% chance. Both figures exceed the combined cohort searched so far (26,642 patients), so the zero-match observation is consistent with expectations rather than a sampling artifact.
| Future Cohort Size | P(at least 1 match) |
| --- | --- |
| 1,000 | 2.6% |
| 5,000 | 11.2% |
| 10,000 | 19.3% |
| 50,000 | 47.9% |
| 100,000 | 60.4% |
| 500,000 | 81.0% |
| 1,000,000 | 86.5% |
50% Detection
55,875 patients
Future cohort needed for 50% chance of 1 match
95% Detection
7,431,474 patients
Future cohort needed for 95% chance of 1 match
Source: bayesian_rarity_results.json. Posterior predictive under Jeffreys conjugate Beta prior.
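Under a Beta(a, b) posterior, the probability of no match in M future patients is E[(1 - p)^M] = B(a, b + M) / B(a, b), which for a = 0.5 and b = n + 0.5 reduces to a ratio of gamma functions. The sketch below evaluates that ratio with log-gamma and searches for the smallest cohort reaching a target probability. It is an illustration with hypothetical function names, not the original pipeline, and the integer it returns may differ by about one patient from the reported figures depending on rounding convention and floating-point error.

```python
import math

N_OBSERVED = 18_625  # patients already searched with 0 matches, from the text


def prob_no_match(m: int, n: int = N_OBSERVED) -> float:
    """Posterior predictive P(0 matches in m future patients).

    Equals B(0.5, n + m + 0.5) / B(0.5, n + 0.5), evaluated on the log scale.
    """
    log_p = (
        math.lgamma(n + m + 0.5) - math.lgamma(n + m + 1.0)
        - math.lgamma(n + 0.5) + math.lgamma(n + 1.0)
    )
    return math.exp(log_p)


def prob_at_least_one(m: int, n: int = N_OBSERVED) -> float:
    """Posterior predictive P(at least 1 match in m future patients)."""
    return 1.0 - prob_no_match(m, n)


def cohort_for_target(target: float, n: int = N_OBSERVED) -> int:
    """Smallest cohort size whose P(at least 1 match) reaches `target` (0 < target < 1)."""
    lo, hi = 0, 1
    while prob_at_least_one(hi, n) < target:   # grow the bracket geometrically
        hi *= 2
    while hi - lo > 1:                         # bisection on the integer cohort size
        mid = (lo + hi) // 2
        if prob_at_least_one(mid, n) >= target:
            hi = mid
        else:
            lo = mid
    return hi


for size in (1_000, 5_000, 10_000, 50_000, 100_000, 500_000, 1_000_000):
    print(f"{size:>9,d}  {100 * prob_at_least_one(size):5.1f}%")

print("50% detection:", cohort_for_target(0.50))
print("95% detection:", cohort_for_target(0.95))
```

For large n the no-match probability is close to sqrt(n / (n + M)), which explains the slow growth in the table: each additional gain in detection probability requires a much larger cohort, so the 95% target needs roughly 400 times the current sample.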

Extreme Value Theory

All 278,256 possible five-gene combinations from the 34 target myeloid genes were enumerated and their expected counts computed under independence. A Generalized Extreme Value (GEV) distribution was fitted to the log-transformed expected counts. The patient quintuple ranks 216,373 out of 278,256 at the gene level (77.8th percentile), confirming that these five genes are individually common enough in myeloid malignancies that their gene-level co-occurrence is unremarkable. The rarity originates from the specific variant combination, not the gene combination.
| EVT Metric | Value | Interpretation |
| --- | --- | --- |
| Gene-level percentile | 77.8th | Genes common, combination not extreme |
| Gene-level expected count | 1.94e-2 | Expected quintuples at gene level |
| Variant specificity product | 7.93e-8 | Product of variant-to-gene ratios |
| Variant-level expected count | 1.54e-9 | Gene-level times specificity product |
| Rarity gap | 7.1 orders of magnitude | Variants 12.6 million times rarer than genes |
| GEV shape parameter | 0.2278 | Positive shape: heavy-tailed distribution |
The GEV fit indicates a heavy-tailed distribution (positive shape parameter 0.23): a small number of gene combinations have disproportionately high expected counts (e.g., DNMT3A+TET2+ASXL1 combinations), while the vast majority have low expected counts. The patient quintuple sits in the body of this distribution at the gene level, but falls in the extreme tail at the variant level.
Source: bayesian_rarity_results.json. Coles S. An Introduction to Statistical Modeling of Extreme Values. Springer (2001). Babur et al. Genome Biol (2015). Canisius et al. Genome Biol (2016).
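The enumeration and ranking step described above can be sketched on a toy gene set. The per-gene frequencies below are hypothetical placeholders, not GENIE values, and the sketch assumes that rank 1 is the combination with the lowest expected count; the report does not state the ranking direction. The GEV fit itself is not reproduced here, since it needs a statistics library beyond the standard library.

```python
from itertools import combinations
from math import comb, prod

# HYPOTHETICAL per-gene mutation frequencies, for illustration only.
GENE_FREQ = {
    "DNMT3A": 0.20, "TET2": 0.18, "ASXL1": 0.12, "IDH2": 0.06,
    "SETBP1": 0.03, "PTPN11": 0.04, "EZH2": 0.05,
}
N_PATIENTS = 18_625  # cohort size from the text
TARGET = ("DNMT3A", "IDH2", "SETBP1", "PTPN11", "EZH2")


def expected_counts(freqs: dict, n: int, k: int = 5) -> dict:
    """Expected co-occurrence count of every k-gene combination under independence."""
    return {
        combo: n * prod(freqs[gene] for gene in combo)
        for combo in combinations(sorted(freqs), k)
    }


def percentile_rank(counts: dict, target: tuple) -> tuple:
    """Rank (1 = lowest expected count) and percentile of the target combination."""
    value = counts[tuple(sorted(target))]
    rank = sum(1 for other in counts.values() if other <= value)
    return rank, 100.0 * rank / len(counts)


counts = expected_counts(GENE_FREQ, N_PATIENTS)
rank, percentile = percentile_rank(counts, TARGET)

print(f"combinations enumerated: {len(counts)} (toy set of {len(GENE_FREQ)} genes)")
print(f"target rank: {rank} of {len(counts)} ({percentile:.1f}th percentile)")
print(f"full problem size, C(34, 5): {comb(34, 5):,}")
```

With the real 34-gene frequency table in place of the placeholders, the same loop would enumerate all 278,256 combinations; the toy ranking printed here carries no meaning beyond showing the mechanics.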
References
  1. AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov (2017).
  2. Coles S. An Introduction to Statistical Modeling of Extreme Values. Springer Series in Statistics (2001).
  3. Babur O, Gonen M, Aksoy BA, et al. Systematic identification of cancer driving signaling pathways based on mutual exclusivity of genomic alterations. Genome Biol (2015).
  4. Canisius S, Martens JW, Wessels LF. A novel independence test for somatic alterations in cancer. Genome Biol (2016).
  5. Hanley JA, Lippman-Hand A. If nothing goes wrong, is everything all right? Interpreting zero numerators. JAMA (1983).