Mapping Genotypes To Chromatin Accessibility Profiles in Single Cells


Article

Mapping genotypes to chromatin accessibility profiles in single cells

Franco Izzo1,2,3,12,17 ✉, Robert M. Myers1,2,3,4,17, Saravanan Ganesan1,2,3,17, Levan Mekerishvili1,2,3,5,17, Sanjay Kottapalli1,2,3, Tamara Prieto1,2,3, Elliot O. Eton1,2,3,4, Theo Botella1,2,3, Andrew J. Dunbar6, Robert L. Bowman6, Jesus Sotelo1,2,3, Catherine Potenski1,2,3, Eleni P. Mimitou1,13, Maximilian Stahl6,14, Sebastian El Ghaity-Beckley7, JoAnn Arandela7, Ramya Raviram1,2,3, Daniel C. Choi8,9,10, Ronald Hoffman7, Ronan Chaligné1,2,3,15, Omar Abdel-Wahab6, Peter Smibert1,16, Irene M. Ghobrial11, Joseph M. Scandura8,9,10, Bridget Marcellino7, Ross L. Levine6 & Dan A. Landau1,2,3 ✉

https://doi.org/10.1038/s41586-024-07388-y
Received: 22 May 2022
Accepted: 4 April 2024
Published online: xx xx xxxx

In somatic tissue differentiation, chromatin accessibility changes govern priming
and precursor commitment towards cellular fates1–3. Therefore, somatic mutations
are likely to alter chromatin accessibility patterns, as they disrupt differentiation
topologies leading to abnormal clonal outgrowth. However, defining the impact of
somatic mutations on the epigenome in human samples is challenging due to admixed
mutated and wild-type cells. Here, to chart how somatic mutations disrupt epigenetic
landscapes in human clonal outgrowths, we developed genotyping of targeted loci
with single-cell chromatin accessibility (GoT–ChA). This high-throughput platform
links genotypes to chromatin accessibility at single-cell resolution across thousands
of cells within a single assay. We applied GoT–ChA to CD34+ cells from patients with
myeloproliferative neoplasms with JAK2V617F-mutated haematopoiesis. Differential
accessibility analysis between wild-type and JAK2V617F-mutant progenitors revealed
both cell-intrinsic and cell-state-specific shifts within mutant haematopoietic
precursors, including cell-intrinsic pro-inflammatory signatures in haematopoietic
stem cells, and a distinct profibrotic inflammatory chromatin landscape in
megakaryocytic progenitors. Integration of mitochondrial genome profiling and
cell-surface protein expression measurement allowed expansion of genotyping onto
DOGMA-seq through imputation, enabling single-cell capture of genotypes, chromatin
accessibility, RNA expression and cell-surface protein expression. Collectively, we
show that the JAK2V617F mutation leads to epigenetic rewiring in a cell-intrinsic and
cell-type-specific manner, influencing inflammation states and differentiation trajectories.
We envision that GoT–ChA will empower broad future investigations of the critical
link between somatic mutations and epigenetic alterations across clonal populations
in malignant and non-malignant contexts.

Differentiation topologies support human tissue homeostasis through coordinated epigenetic regulation reflected in the chromatin accessibility landscape, which, for example, determines haematopoietic stem cell (HSC) commitment towards cellular fates1. Single-cell chromatin accessibility mapping has revealed that key transcription factors (TFs) change the accessibility of regulatory elements, underlying progenitor cell epigenetic heterogeneity2 through epigenetic priming3.

Somatic driver mutations within HSCs promote clonal expansions and reshape differentiation landscapes with increasing impact in clonal haematopoiesis4,5 (CH), myeloproliferative neoplasm (MPN)6 and leukaemia7. The recurrent p.V617F mutation in the Janus kinase 2 (JAK2) gene is associated with CH8 and overt differentiation skews in polycythaemia vera (PV) and essential thrombocythemia (ET), leading to bone marrow failure in myelofibrosis (MF)9–12. JAK2V617F is constitutively
1New York Genome Center, New York, NY, USA.
2Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medicine, New York, NY, USA.
3Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA.
4Tri-Institutional MD-PhD Program, Weill Cornell Medicine, Rockefeller University, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
5Physiology, Biophysics and Systems Biology Graduate Program, Weill Cornell Medicine, New York, NY, USA.
6Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
7Division of Hematology/Medical Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
8Laboratory of Molecular Hematopoiesis, Hematology and Oncology, Weill Cornell Medicine, New York, NY, USA.
9Richard T. Silver MD Myeloproliferative Neoplasm Center, Weill Cornell Medicine, New York, NY, USA.
10Regenerative Medicine, Department of Medicine, Weill Cornell Medicine, New York, NY, USA.
11Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
12Present address: Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
13Present address: Immunai, New York, NY, USA.
14Present address: Department of Medical Oncology, Division of Leukemia, Dana-Farber Cancer Institute, Boston, MA, USA.
15Present address: SAIL: Single-cell Analytics Innovation Lab, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
16Present address: 10x Genomics, Pleasanton, CA, USA.
17These authors contributed equally: Franco Izzo, Robert M. Myers, Saravanan Ganesan, Levan Mekerishvili.
✉e-mail: franco.izzo@mssm.edu; dlandau@nygenome.org

Nature | www.nature.com | 1
Fig. 1 | GoT–ChA profiles single-cell genotypes with chromatin accessibility. a, The GoT–ChA workflow (Methods). b, TP53R248 mixing study (top) and accessibility-based uniform manifold approximation and projection (UMAP; bottom) for HEL (TP53WT/WT) and CA46 (TP53MUT/MUT) cells. c, TP53R248 locus coverage (Methods). d, UMAP coloured by GoT–ChA genotyping of CA46 and HEL cells assigned as WT, mutant (MUT) or not assignable (NA). e, scATAC–seq-inferred CNV scores in TP53R248 WT HEL cells (with known chromosome 9 amplification) or mutant cells (Methods). Chr., chromosome; FC, fold change. f, JAK2V617 mixing study (top) and chromatin accessibility-based UMAP for HEL (JAK2MUT/MUT), CCRF-CEM (JAK2WT/WT) and SET-2 (JAK2WT/MUT) cells (bottom). g, Chromatin-accessibility-based UMAP for HEL (n = 1,334 cells), SET-2 (n = 1,268 cells) and CCRF-CEM (n = 638 cells) coloured according to GoT–ChA genotyping of homozygous WT, homozygous mutant (MUT), heterozygous (HET) and not assignable (NA) cells. SET-2 genotyping showing multiallelic capture in a subset of cells (inset; Methods). h, Multiplex-adapted GoT–ChA cell-mixing experiment (top) and chromatin-accessibility-based UMAP for multiplexing GoT–ChA targeting four distinct loci, coloured by cell line for OCI-AML3, SET-2, HEL and CA46 cells. i, Chromatin-accessibility-based UMAP coloured according to multiplexed GoT–ChA genotype for JAK2V617, TP53M133, NRASQ61 and TP53R248. j, The percentage of cells with 0 to 4 genotyped loci per single cell.
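The cell-mixing benchmarks summarized in Fig. 1 report two quantities per experiment: the fraction of cells genotyped and the accuracy of the calls. A minimal sketch of how such a summary could be computed is given below. It assumes that accuracy is the share of genotyped cells whose call matches the known genotype of their cell line; the exact definition and the background noise correction are in the Methods and are not reproduced here. The function name, the input layout and the toy cells are hypothetical.

```python
from collections import Counter

# Known TP53 R248 genotypes of the two mixed lines, as stated in the text.
EXPECTED = {"HEL": "WT", "CA46": "MUT"}


def genotyping_summary(cells, expected=EXPECTED):
    """Summarize a cell-mixing experiment.

    cells: iterable of (cell_line, call) pairs, where cell_line comes from
    chromatin-accessibility clustering and call is "WT", "MUT", "HET" or "NA".
    Returns (fraction genotyped per cell line, accuracy among genotyped cells).
    """
    total = Counter()
    genotyped = Counter()
    correct = 0
    for line, call in cells:
        total[line] += 1
        if call != "NA":
            genotyped[line] += 1
            # Assumption: a call is correct if it equals the line's known genotype.
            correct += call == expected[line]
    n_genotyped = sum(genotyped.values())
    rate = {line: genotyped[line] / total[line] for line in total}
    accuracy = correct / n_genotyped if n_genotyped else float("nan")
    return rate, accuracy


# Toy input (not data from the study): eight cells, two unassigned, one miscalled.
toy = ([("HEL", "WT")] * 3 + [("HEL", "NA")]
       + [("CA46", "MUT")] * 2 + [("CA46", "WT")] + [("CA46", "NA")])
rate, accuracy = genotyping_summary(toy)
print(rate, round(accuracy, 3))
```

Keeping unassigned (NA) cells out of the accuracy denominator separates capture efficiency from call correctness, which is how the two numbers are reported side by side in the text.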

active and phosphorylates downstream signal transducers and activators of transcription (STATs)9, promoting an inflammatory phenotype13 and cytokine-independent expansion of the erythroid–megakaryocytic lineage14. However, the epigenetic underpinning of JAK2V617F differentiation in human disease remains largely unclear, as mutated and wild-type (WT) cells are admixed, limiting bulk or single-cell chromatin profiling. Moreover, interindividual comparisons are confounded by clinical covariates and microenvironmental cell-extrinsic effects across patients.

To resolve admixed WT and mutated cells in primary human samples, single-cell multi-omics methods can link genotypes with transcriptional15–19 or cell-surface protein20,21 profiles. Plate-based TARGET-seq performs simultaneous single-cell RNA-sequencing (scRNA-seq) with genotyping through targeted amplification of cDNA and genomic DNA in hundreds of single cells16,17. Droplet- or nanowell-based techniques enable higher throughput with simultaneous scRNA-seq and genotyping of actively transcribed genes18,19,22,23, but these rely on captured mRNA, resulting in a dependency on target-gene expression level and the distance of the mutated locus from the transcript end, which severely impacts the genotyping of low-expressed genes, including JAK2 (refs. 19,23). Furthermore, although a plate-based method is available24, existing droplet-based methods cannot jointly capture genotyping and chromatin accessibility to define the impact of somatic mutations on the epigenetic landscape of clonal expansions.

Here we developed genotyping of targeted loci with single-cell chromatin accessibility (GoT–ChA) for droplet-based high-throughput simultaneous capture of genotyping and chromatin accessibility from the same single cell. GoT–ChA enables high-resolution intrasample comparisons of mutated versus WT chromatin accessibility profiles within the same patient, bypassing interindividual clinical confounding factors and decoupling cell-intrinsic from microenvironment effects. Single-cell genotyping using GoT–ChA is based on targeted amplification of genomic DNA, obviating both the dependency on target gene expression level and mutation position. When applied to human JAK2V617-mutated CH, PV and MF, GoT–ChA substantially increases the genotyping rate relative to RNA-based methods19,22,23. Furthermore, we identify intrinsic and cell-type-specific effects of JAK2V617F in human haematopoiesis and demonstrate the ability of GoT–ChA to resolve clonal admixtures in primary human samples.

GoT–ChA for single-cell multi-omics
To integrate genotyping into single-cell assay for transposase-accessible chromatin followed by sequencing (scATAC–seq), we modified the 10x Genomics platform by adding two custom primers (GoT–ChA primers) to the cell barcoding PCR reaction mixture before loading the microfluidics chip for droplet generation (Fig. 1a, Extended Data Fig. 1a–e, Supplementary Fig. 1, Supplementary Table 1 and Supplementary

Methods). Genotype capture does not require target site tagmentation, as amplicons already contain the capture sequence. Our R package (see the ‘Code availability’ section) provides start-to-end processing of GoT–ChA data and integration with scATAC–seq profiles.

We performed GoT–ChA profiling of two mixed (1:1 ratio) human cell lines with differing TP53R248 genotypes (CA46, TP53R248Q homozygous mutant; HEL, TP53R248 homozygous WT; Extended Data Fig. 1f). Cell lines were identified using chromatin accessibility information alone (Fig. 1b) and annotated on the basis of marker gene accessibility and mutually exclusive mitochondrial variants, while retaining high data quality (Extended Data Fig. 1g–l). We developed a computational framework for genotyping (Methods, Extended Data Fig. 1m–n, Supplementary Methods and Supplementary Fig. 2). Whereas less than 4% of cells had a scATAC–seq read covering TP53R248, GoT–ChA genotyped 49.8% (HEL) and 49.1% (CA46) of cells with 99.7% accuracy after background noise correction (Fig. 1c,d). WT and mutant cell copy-number variation (CNV) scores inferred from the chromatin accessibility profiles (Methods) across chromosome 9 (amplified in WT HEL cells25) orthogonally confirmed successful single-cell genotyping (Fig. 1e).

We tested heterozygous genotyping with the JAK2V617 hotspot by mixing (1:1:1 ratio) HEL (JAK2V617F homozygous mutant), SET-2 (JAK2V617F heterozygous) and CCRF-CEM (JAK2 homozygous WT) cells (Extended Data Fig. 2a). Clustering using scATAC–seq data alone resulted in discrete cell populations (Fig. 1f) that were annotated on the basis of differential gene accessibility and unique mitochondrial variants, while retaining high scATAC–seq data quality (Extended Data Fig. 2b–f and Supplementary Methods). Whereas less than 2% of cells had a scATAC–seq read covering JAK2V617, GoT–ChA genotyping captured 30.7% of CCRF-CEM, 64.7% of SET-2 and 78.8% of HEL cells (Extended Data Fig. 2g–i). In the WT CCRF-CEM and mutant HEL homozygous lines, 63.2% of cells were genotyped for JAK2V617 with 96.2% accuracy; 64.7% of heterozygous SET-2 cells were genotyped, of which 34.3% were correctly identified as heterozygous, confirming biallelic, albeit incomplete, capture (Fig. 1g). CNV scores at chromosome 9, amplified in JAK2V617F homozygous mutant HEL cells25, orthogonally validated successful genotyping (Extended Data Fig. 2j). The proportion of genotyped cells was positively correlated with the JAK2 copy number (Extended Data Fig. 2k), while genotyping accuracy remained constant (Extended Data Fig. 2l). We further tested a high-GC-content target, FOXO1S22 (78.9% GC versus 50.83 ± 13.6% in other tested targets; mean ± s.d.) by mixing (1:1 ratio) HEPG2 and SUM159 cells. Despite the high GC content, we achieved genotyping for 14.3% of cells with 94.8% accuracy (Extended Data Fig. 2m,n). We optimized GoT–ChA for multiplexing up to four targets (Fig. 1h–j and Extended Data Fig. 3a–l) and demonstrated that genotyping efficiency is not correlated with locus accessibility (Methods, Extended Data Fig. 3m, Supplementary Table 2 and Supplementary Methods). Together, GoT–ChA enables high-throughput co-capture of genotypes and chromatin accessibility profiles in single nuclei, independent of target expression level, chromatin accessibility or the distance of the mutated locus from the transcript end.

GoT–ChA in primary human JAK2V617F MF
JAK2V617F has a central role in MPN pathogenesis9–12. To investigate how JAK2V617F disrupts cell fate and the regulatory chromatin landscape of HSCs, we applied GoT–ChA to CD34+-sorted progenitor cells (Supplementary Fig. 3) from 21 human primary samples, comprising 18 patients with JAK2V617F-mutated MF who were untreated (n = 12; including three longitudinal PV-to-MF samples) or treated with ruxolitinib (n = 6), a JAK1/2 inhibitor (Supplementary Table 3). We included a JAK2V617F CH sample (n = 1) to explore early epigenetic changes, before the onset of overt haematological abnormality. scATAC–seq data quality was not affected by the inclusion of GoT–ChA genotyping (Extended Data Fig. 4a–c). We performed cell clustering based solely on chromatin accessibility data (Fig. 2a). Reciprocal latent semantic indexing (LSI)26 applied to the peak-by-cell matrix was used for sample integration to correct for batch effects followed by doublet removal (Methods and Extended Data Fig. 4d–j). Individual cluster identities were assigned using gene accessibility scores of haematopoietic lineage marker genes, confirmed by TF motif accessibility and differential enrichment of marker peaks (Extended Data Fig. 5a–c). We identified distinct HSC subclusters, namely early, lymphoid-biased (HSCLY) and myeloid-biased (HSCMY), through TF motif accessibility scores, further supported with orthogonal annotation by cell mapping using multi-omic bridge integration (Methods and Extended Data Fig. 5d–f).

JAK2V617 was genotyped in 48,372 out of 150,643 total cells (38.1 ± 10.6%; mean ± s.d. across samples; Extended Data Fig. 6a–h, Supplementary Fig. 4 and Supplementary Methods) compared with 7–10% of cells in previous droplet-based scRNA-seq cDNA-based genotyping19,22 (Extended Data Fig. 7a). Projection of genotypes onto the differentiation map showed mixing of JAK2-WT and JAK2-mutant cells in HSC and myeloid progenitor clusters, while 91.6% of common lymphoid progenitors (CLPs) and lymphoid cell clusters were composed of WT cells (Fig. 2b; versus 51.7% of WT cells in the remaining clusters; P < 10^−10; Fisher exact test), consistent with previous MPN studies14,27.

However, genotype densities projected onto the differentiation map suggested an uneven distribution of mutant cells across progenitors, with WT cells enriched in HSC and lymphoid clusters, JAK2V617F heterozygous cells enriched in granulocyte–monocyte progenitors, and homozygous JAK2V617F cells enriched in both granulocyte–monocyte and erythroid–megakaryocytic clusters, patterns that are more predominant in untreated compared with ruxolitinib-treated samples (Fig. 2c–e). While HSCs and myeloid progenitor clusters in untreated samples comprised WT and JAK2V617F-mutant cell admixtures, their frequencies varied greatly. The normalized mutant fraction was increased in early (EP1) and late (EP2 and EP3) erythroid progenitor (EP), megakaryocytic progenitor (MkP) (Extended Data Fig. 7b), HSCMY and granulocyte–monocyte progenitor clusters (Extended Data Fig. 7c,d). By contrast, ruxolitinib-treated patients had a more even mutant cell distribution across cell types (Extended Data Fig. 7b,e,f), suggesting that JAK inhibition alters the JAK2V617F frequency in these lineages, but does not eliminate the mutated clone28–30. A patient with no on-treatment response to ruxolitinib (Pt-07) showed a similar normalized mutant fraction distribution to untreated patients across cell types (Extended Data Fig. 7g,h), suggesting that differentiation skews are seen with ruxolitinib resistance. The increased mutated cell fraction trend was already observed in our CH sample and was more pronounced in a patient with PV who progressed to MF (Extended Data Fig. 7b). Pseudotemporal ordering (Extended Data Fig. 7i) showed an increased fraction of mutant cells along erythroid differentiation in untreated patients, while ruxolitinib treatment led to a more uniform distribution of mutant cells (Fig. 2f), underscoring a progenitor-specific mutant cell predominance in MF that is eliminated by therapeutic JAK2 inhibition.

JAK2V617F HSC and MkP cell-intrinsic phenotypes
Inflammatory disruption of the bone marrow microenvironment is a hallmark of MF31–35, but how cell-intrinsic epigenetic profiles differ in WT and JAK2V617F-mutant early progenitors in human MF remains unclear. We compared gene accessibility scores (Methods) between WT and JAK2V617F-mutated cells in early HSC and HSCMY clusters, revealing a cell-intrinsic pro-inflammatory phenotype in JAK2V617F-mutant HSCs with increased gene accessibility of inflammatory genes, including the NF-κB pathway gene TRAPPC9 (ref. 36) and the TGF-β superfamily gene BMPR1B, which are involved in leukaemic HSC maintenance37 (Fig. 3a, Extended Data Fig. 8a and Supplementary Table 4). The TGF-β superfamily secreted ligand gene GDF10 had increased accessibility, suggesting a co-increase in ligands and receptors, potentially promoting NF-κB and TGF-β signalling activation in JAK2V617F-mutated HSCs. Upregulated genes included MMP15 (Fig. 3a), which is involved in extracellular matrix

Fig. 2 | GoT–ChA applied to human JAK2V617F-mutated MF samples. a, Chromatin accessibility UMAP after reciprocal LSI integration (Methods) from CD34+-sorted patient samples (n = 21 samples, 19 patients, 150,643 cells). BaEoMa, basophil/eosinophil/mastocyte progenitors; CD14 monocytes, CD14+ monocytes; CD16 monocytes, CD16+ monocyte progenitors; CLP, common lymphoid progenitors; CMP, common myeloid progenitors; EP1 to EP3, erythroid progenitors; GMP, granulocyte–macrophage progenitors; HSC, haematopoietic stem cells; HSCLY, lymphoid-biased HSCs; HSCMY, myeloid-biased HSCs; JUN/FOS, high JUN/FOS motif accessibility; LMPP, lymphoid-myeloid pluripotent progenitors; MEP, megakaryocyte–erythrocyte progenitors; MkP, megakaryocytic progenitors; LMPPEM, erythroid–megakaryocyte-biased LMPPs; NK, natural killer cells; pCD4/8, CD4 and CD8 progenitors. A full list of cell type definitions is provided in Supplementary Table 3. b, Integrated UMAP with GoT–ChA-assigned JAK2V617 genotypes: homozygous WT (n = 15,028), homozygous mutant (MUT; n = 22,954), heterozygous (HET; n = 10,390) and not assignable (NA; n = 102,271). c, Density estimation of the WT, HET or MUT cell distribution across UMAP embedding. LY, lymphoid; EM, erythroid–megakaryocyte; GMP, granulocyte–monocyte progenitors. The dotted lines highlight increased density. d, Density estimation as in c but for untreated (top) or ruxolitinib-treated (bottom) patients. e, Cell density difference (Δ density) between mutated and WT for untreated (top) and ruxolitinib-treated (bottom) patients. The dotted lines highlight changes in Δ density. f, The normalized mutant fraction along erythroid pseudotime quantiles for untreated (n = 9,855 cells) or ruxolitinib-treated (n = 6,356 cells) samples, with quantiles 9–10 merged to increase the cell number. The points represent mean mutant cell fraction, the error bars indicate the s.e.m. across samples, the lines indicate the fit and the shadowed areas represent the 95% confidence interval of the generalized additive model (top).
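The enrichment of WT cells in lymphoid clusters relative to the remaining clusters (Fig. 2b) is reported with a Fisher exact test. The sketch below shows how such a test on a 2 × 2 table of genotype by cluster group could be computed with the standard library only. It assumes a two-sided test, which the text does not state, and the function name and the toy counts are hypothetical rather than taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Example layout (an assumption for illustration):
        rows    = lymphoid clusters, remaining clusters
        columns = WT cells, mutant cells
    The P value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability of x counts in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Toy counts (not from the study): 11 WT and 1 mutant cell in lymphoid
# clusters versus 6 WT and 6 mutant cells elsewhere.
print(fisher_exact_two_sided(11, 1, 6, 6))
```

With cell numbers in the tens of thousands, as in this dataset, exact integer arithmetic still works but becomes slow; a statistics library would be the practical choice there.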

remodelling and acute myeloid leukaemia38. Thus, mutated HSCs show epigenetic priming with increased accessibility of genes involved in pro-inflammatory signalling and extracellular matrix remodelling.

In contrast, WT HSCs showed increased accessibility for stem- and quiescence-associated genes (FRY (ref. 39), HLF (ref. 40)), together with HSC maintenance and lymphoid priming genes (PBX1 (refs. 41,42); Fig. 3a). Decreased accessibility of these quiescence/stem genes in JAK2V617F-mutated HSCs compared with WT HSCs might underlie their higher contribution to myeloid cell types.

We inferred TF activity based on DNA-binding-motif accessibility (Methods). Comparing WT and JAK2V617F-mutated cells in the HSC and HSCMY clusters, we identified TFs with increased motif accessibility (false-discovery rate (FDR) < 0.05, Δz > 0.1) in early mutant HSCs. We observed increased motif accessibility of canonical JAK2 downstream targets and TFs involved in NF-κB signalling and the TGF-β pathway in mutant cells (Fig. 3b, Extended Data Fig. 8b and Supplementary Table 5), further suggesting that JAK2V617F already primes early HSCs towards a pro-inflammatory state. Ruxolitinib abrogated the cell-intrinsic differences in TF motif accessibility between WT and JAK2V617F-mutated HSCs (Supplementary Table 6), consistent with JAK2V617F constitutive activation mediating these changes in untreated patients.

DNA-binding motifs of TFs associated with myeloid–erythroid differentiation also had increased accessibility in mutant cells (Fig. 3b), suggestive of myeloid epigenetic priming in early JAK2V617F-mutated haematopoiesis. Overall, differentially accessible TF motifs between WT and JAK2V617F-mutated cells from HSC and HSCMY clusters had similar changes in homozygous and heterozygous JAK2V617F-mutant cells, except for NFKB1 and REL, which were specifically increased in homozygous mutants (Extended Data Fig. 8c).

Through longitudinal sampling of patient Pt-01 (PV to MF; Supplementary Table 3), we observed that JAK2V617F-mutant early HSCs already show increased STAT1 motif accessibility in the PV stage before MF (Extended Data Fig. 8d). Thus, while pro-inflammatory phenotypes were linked to extrinsic effects16,43,44 and altered transcription in committed EPs23, we show in early mutant human HSCs that JAK2V617F promotes cell-intrinsic pro-inflammatory gene accessibility and TF activity, before erythroid commitment. We correlated NF-κB-related TF motif accessibility with STAT-family TF motifs in HSCs, observing increased correlation of motif accessibility between canonical STAT targets and NF-κB factors, with reduced correlation of non-canonical JAK2 targets and NFKB1 (Extended Data Fig. 8e), suggesting that JAK2-mediated activation underlies the increased activity of NF-κB-associated TFs.
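The motif-level comparison above rests on two numbers per TF: the difference in motif accessibility z-score between mutant and WT cells (Δz) and an FDR, with TFs called as increased when FDR < 0.05 and Δz > 0.1. The following sketch illustrates only the Δz computation and the thresholding step. It assumes that Δz is the difference of mean per-cell motif z-scores and takes FDR values as given; the linear mixed model used in the study is not implemented. All names and toy values are hypothetical.

```python
from statistics import mean


def delta_z(z_mutant, z_wt):
    """Difference in mean motif accessibility z-score (mutant minus WT).

    Assumption: per-cell motif deviation z-scores are already computed.
    """
    return mean(z_mutant) - mean(z_wt)


def increased_motifs(results, fdr_max=0.05, dz_min=0.1):
    """Return motifs passing the thresholds stated in the text.

    results: dict mapping motif name -> (delta_z, fdr).
    """
    return sorted(
        motif for motif, (dz, fdr) in results.items()
        if fdr < fdr_max and dz > dz_min
    )


# Toy values for illustration only (not results from the study).
toy_results = {
    "MOTIF_A": (delta_z([0.5, 0.7], [0.1, 0.3]), 0.001),  # passes both cut-offs
    "MOTIF_B": (0.05, 0.001),                             # effect too small
    "MOTIF_C": (0.40, 0.200),                             # not significant
}
print(increased_motifs(toy_results))
```

Requiring both an effect-size cut-off and an FDR cut-off avoids calling motifs whose shift is statistically detectable in thousands of cells but negligible in magnitude.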

Fig. 3 | JAK2V617F-mutant HSPCs exhibit intrinsic pro-inflammatory and myeloid-biased epigenetic priming. a, Differential gene accessibility scores between WT (n = 1,868 cells) and mutant (n = 1,814 cells) cells within the HSC and HSCMY clusters of untreated patients with MF (n = 12; excluding CH and PV sample). The horizontal dotted line represents FDR = 0.05; the vertical dotted lines represent absolute log2[fold change] > 0.1. b, Differentially accessible TF motifs between WT (n = 1,858 cells) and JAK2V617F-mutant (n = 1,800 cells) cells from untreated patients with MF with >50 genotyped cells in the cluster (n = 11 MF samples; excluding CH and PV samples). The horizontal dotted line represents FDR = 0.05; the vertical dotted lines represent absolute Δz > 0.1. c, TF motif accessibility for WT or JAK2V617F-mutated HSC and HSCMY clusters (n = 240 cells) for a patient with JAK2V617F CH (Pt-19). The error bars represent the range, the boxes represent the interquartile range and centre lines represent the median. Statistical analysis was performed using two-sided Wilcoxon rank-sum tests. d, Differential TF motif accessibility between WT (n = 378 cells) and mutant (n = 1,521 cells) cells within the MkP cluster of untreated patients with MF with >50 genotyped cells in the cluster (n = 7 patients). The horizontal dotted line represents FDR = 0.05; the vertical dotted lines represent absolute Δz > 0.1. For a, b and d, statistical analysis was performed using a linear mixed model (LMM) followed by Benjamini–Hochberg correction.
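The FDR values in Fig. 3a, b and d come from Benjamini–Hochberg correction of the per-feature P values. A minimal sketch of the standard Benjamini–Hochberg step-up adjustment is shown below. It takes raw P values as input, whichever test produced them; the upstream linear mixed model is not reproduced, and the function name and toy P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downwards enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values for four features (not from the study).
toy_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

The adjusted values can then be compared directly with the FDR = 0.05 line drawn in the volcano plots.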

To validate mutant HSC signatures, we used bulk RNA-seq data of Lin−SCA1+KIT+ (LSK) progenitor cells from a dre-rox, cre-lox dual recombinase Jak2V617F mouse model that allows for sequential mutated allele knock-in followed by knockout45 (Jak2Rox/Lox; Extended Data Fig. 8f). Gene set analysis revealed enrichment of haem metabolism genes, including erythroid TFs, and of the TNF signalling through NF-κB gene set in Jak2V617F compared with Jak2V617F-deleted mouse LSK cells (Extended Data Fig. 8f), supporting the causality of the JAK2V617F mutation in driving the observed phenotypes.

We assessed whether TF motif accessibility changes observed in MF were already evident in CH, and found that JAK2V617F-mutated HSCs in CH had increased accessibility of canonical downstream JAK2 targets and the predicted JAK2 phosphorylation target LMO2 (Fig. 3c, Extended Data Fig. 8g,h and Supplementary Table 6), as seen in MF (Fig. 3b), while the motif accessibility of TFs involved in TGF-β or NF-κB pathways was not increased in JAK2V617F-mutated HSCs in CH (Fig. 3c), suggesting a more restricted inflammatory phenotype in our CH sample.

We next assessed cell-intrinsic epigenetic changes in JAK2V617F-mutant MkPs, identifying a distinct pro-inflammatory signature characterized by increased accessibility of JUN and FOS TF motifs (Fig. 3d, Extended Data Fig. 8i and Supplementary Table 7), validated by TF footprinting (Extended Data Fig. 8j). JUN and FOS have been associated with increased bone marrow remodelling and fibrosis46, and differential gene accessibility score analysis followed by gene pathway enrichment revealed a pro-inflammatory signature in mutated MkPs (Extended Data Fig. 8k), findings supported by our Jak2V617F mouse model (Extended Data Fig. 8l–m). These findings show that JAK2V617F-mutated MkPs have a distinct cell-intrinsic pro-inflammatory signature mediated by JUN and FOS TFs, which differs from the NF-κB- and TGF-β-mediated pro-inflammatory signature observed in HSCs.

Epigenetic changes in JAK2-mutant EPs
We next defined epigenetic changes in EPs, which are enriched in JAK2V617F-mutated cells in untreated patients (Fig. 2f). We found increased accessibility of STAT1, STAT3, STAT5A and STAT5B motifs in EP clusters (EP1–3; Fig. 4a, Extended Data Fig. 8n and Supplementary Table 8), highlighting cell-type-specific activation of canonical JAK2 targets in EPs versus MkPs, possibly increasing the fitness of mutated EPs. We observed increased accessibility of multiple myeloid/erythroid TF-binding motifs (Fig. 4a), consistent with haem metabolism pathway enrichment (Extended Data Fig. 8o) and our early HSC observations, pointing to sustained JAK2V617F effects across erythroid differentiation. By contrast, PU.1 (encoded by SPI1), a myeloid/lymphoid cell fate regulator47 that, when downregulated in myeloid-biased stem cells, promotes erythroid differentiation48, had decreased motif accessibility in mutant cells (Fig. 4a). Thus, increased erythroid-associated and STAT activity may increase the expansion of mutated EPs.

A single JAK2V617F copy was sufficient for STAT-family activation in heterozygous EPs, but other TFs (such as GATA1) showed increased motif accessibility proportional to allelic burden (Extended Data Fig. 8p). BCL11A showed decreased motif accessibility in mutated EPs (FDR = 7.64 × 10^−6, Δz = −0.54; Fig. 4a and Extended Data Fig. 8q).

[Fig. 4 graphic, panels a–g. a, EP clusters (untreated): Δz score for motif accessibility (JAK2V617F − WT) versus −log10[FDR], with STAT, myeloid/erythroid and BCL11A motifs labelled. b, Chr. 11 position (bp), 5,240,000–5,300,000: LCR, BCL11A motifs, normalized signal tracks (WT, MUT), genes (HBE1, HBG2, HBG1), peaks and co-accessibility of the HBG1 enhancer and of a control peak; inset, 5,237,500–5,250,000. c, BCL11A motif accessibility, WT versus MUT, Pt-01 (PV) and Pt-01 (MF). d, Cells with HBG1 accessibility (%), WT versus MUT, Pt-01 (PV) and Pt-01 (MF). e, HbF+ cells (%), control (n = 3) versus MF (n = 5), P = 0.035. f, Pt-14 (myelofibrosis): flow cytometry gates for CD34+CD71+, CD34+CD71+HbF− and CD34+CD71+HbF+ cells. g, JAK2V617F (c.1849G>T) Sanger traces: G (30%):T (70%) in HbF− cells and G (2%):T (98%) in HbF+ cells.]

Fig. 4 | JAK2V617F-driven epigenetic dysregulation of the EP haemoglobin locus. a, EP1–3 differential TF motif accessibility between WT (n = 1,372 cells) and mutant (n = 3,796 cells) cells of samples of untreated patients with MF (n = 8) with >50 cells per cluster. The horizontal line represents FDR = 0.05; the vertical lines represent absolute Δz > 0.1. Statistical analysis was performed using an LMM followed by likelihood ratio test and Benjamini–Hochberg correction. b, Co-accessibility (correlation > 0.1, FDR < 0.05; two-sided Wilcoxon rank-sum test and Benjamini–Hochberg correction) at the HBG1 locus for WT or JAK2V617F cells (downsampled to n = 1,372 cells per genotype). Haemoglobin locus control region (LCR), BCL11A motifs, normalized accessibility signal tracks for WT or JAK2V617F-mutated EPs, gene annotations and peak regions are shown. Co-accessibility centred on peaks (shaded boxes) for an HBG1 enhancer (top). Negative control: peak with no BCL11A motif (bottom). Inset: HBG1 proximal peaks. Differential peak accessibility was calculated using Fisher exact tests followed by Bonferroni correction. The family-wise error rate (FWER) is shown for each peak. NS, not significant (FWER > 0.05). c, BCL11A motif accessibility for Pt-01 PV (n = 37 WT cells; n = 201 JAK2V617F cells) to MF (n = 213 WT cells; n = 1,690 JAK2V617F cells). Statistical analysis was performed using two-sided Wilcoxon rank-sum tests. d, The percentage of EP1–3 with HBG1 gene accessibility for Pt-01. Statistical analysis was performed using two-sided Fisher tests. e, The percentage of HbF-positive EPs was determined using flow cytometry analysis of mobilized peripheral blood of healthy individuals or patients with MF with JAK2V617F mutation. Statistical analysis was performed using two-sided Wilcoxon rank-sum tests. f, Flow cytometry gating of EPs for sorting HbF-positive or HbF-negative cells. g, Sanger sequencing traces for sorted cells from f. The grey box shows the expected base change (c.1849G>T). For all box plots, the whiskers show the range, the box limits show the interquartile range and the centre line shows the median.
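The two multiple-testing corrections named in this caption can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and the example P values are assumptions made for illustration, and only the Python standard library is used. Benjamini–Hochberg controls the false discovery rate (FDR), whereas Bonferroni controls the family-wise error rate (FWER).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so that each adjusted
    # value is never larger than the one ranked above it (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(p_values):
    """Return Bonferroni adjusted P values (FWER), capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values, for example one per TF motif or per peak.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
fwer = bonferroni(raw)
```

Bonferroni is the more conservative of the two, which is consistent with its use here for the small number of proximal peaks tested in the inset, while the FDR-based correction is applied to the much larger set of motifs.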

BCL11A represses γ-globin genes HBG1 and HBG2 (ref. 49), and its loss increases expression of HBG1 (ref. 50), a fetal haemoglobin (HbF) component. To examine the regulation of the haemoglobin locus, we defined pairs of co-accessible regions, reflecting coordinated activation of regulatory elements in WT or JAK2V617F-mutant EPs (Methods). A peak containing an enhancer region (ENCODE; https://screen.wenglab.org/search/?q=ENCODE:+EH38E2941881&uuid=0&assembly=GRCh38) upstream of HBG1 and a predicted BCL11A-binding motif showed increased co-accessibility in JAK2V617F-mutant EPs compared with WT EPs, while another proximal enhancer region with no predicted BCL11A-binding site showed unchanged co-accessibility with the locus control region across both WT and JAK2V617F-mutant cells (Fig. 4b).

By comparing genotyped cells, we observed increased accessibility of HBG1 proximal peaks in mutant cells (Fig. 4b). In Pt-01 (PV to MF), decreased genome-wide BCL11A motif accessibility (Fig. 4c) and an increased proportion of cells with HBG1 gene accessibility (Fig. 4d) were already present in mutant cells in PV and persisted in MF. To validate increased HBG1 in JAK2V617F-mutated EPs, we measured HbF using flow cytometry in CD34+CD71+ EPs (Methods). Patients with MF had increased total HbF protein levels compared with healthy control individuals (Fig. 4e), confirming previous reports in MPN⁵¹. Sorting HbF+CD34+CD71+ or HbF−CD34+CD71+ EP cells followed by sequencing showed an enrichment of JAK2V617F-mutated (c.1849G>T transversion) reads in the HbF+ cell fraction (98%) compared with the HbF− cell fraction (70%; Fig. 4f,g, Methods and Supplementary Fig. 5). Thus, while defective erythropoiesis can increase HbF levels, cell-intrinsic regulatory changes in JAK2V617F-mutant EPs may also promote HbF expression, preceding marrow fibrosis. These changes were reverted by ruxolitinib treatment (Supplementary Table 9), supporting a cell-intrinsic mechanism and indicating that constitutive JAK2V617F kinase activity increases the accessibility of HBG1 and decreases the accessibility of BCL11A-binding motifs.
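The genotype comparison of HBG1 accessibility (Fig. 4d) relies on two-sided Fisher exact tests on 2 × 2 tables of cell counts. The following is a minimal sketch of that test, assuming the common definition of the two-sided P value as the sum of all table probabilities no larger than that of the observed table. The function name and the counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are genotypes (for example mutant, WT); columns are cells with
    and without accessibility at the locus of interest.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x accessible cells in row 1,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme as the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: 40 of 201 mutant cells and 2 of 37 WT cells
# with an accessible HBG1 gene.
p_value = fisher_exact_two_sided(40, 161, 2, 35)
```

An exact test suits this comparison because some genotype groups contain few cells (for example, n = 37 WT cells for Pt-01 at the PV stage), where large-sample approximations such as the χ² test are less reliable.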

[Fig. 5 graphic, panels a–f. a, Pt-02 sample mtDNA coverage: average reads per cell for GoT–ChA–ASAP, DOGMA-seq and GoT–ChA. b, Pt-02 sample: heteroplasmy (%) in JAK2V617F and JAK2 WT cells; average heteroplasmy: 12,786 A>G (77.7%); 3,834 G>A (18.1%). c, Droplet-based JAK2V617 single-cell genotyping: cells genotyped (%) for Ref. 19, Ref. 22, GoT–ChA, GoT–ChA–ASAP and GoT–ChA–ASAP + mtDNA (Pt-02). d, HSCMY cluster: protein expression Δz score (MUT − WT) versus rank, with CD90, CD99 and CD34 labelled. e, Normalized CD90 protein expression per cell cluster in WT and JAK2V617F cells. f, Pt-02 (HSC to MkP): CD36 feature value (min.–max. normalized) versus MkP trajectory pseudotime for DORC, RNA and protein expression (ADT), by genotype.]

Fig. 5 | GoT–ChA integration with ASAP–seq. a, Mitochondrial genome (mtDNA) coverage (sample Pt-02) for GoT–ChA–ASAP, DOGMA-seq or GoT–ChA. b, Pt-02 mitochondrial variant heteroplasmy per cell, showing 12,786 A>G and 3,834 G>A variants that co-occur with GoT–ChA genotyping. c, The percentage of JAK2V617-genotyped cells across technologies. Data are mean ± s.d. The dots represent individual samples. The dotted line shows an increased percentage of genotyped cells for Pt-02 after mtDNA-based genotype imputation. d, Differential protein expression between WT and mutant cells within the HSCMY cluster. Statistical analysis was performed using an LMM followed by likelihood ratio test and Bonferroni correction. e, Normalized CD90 protein levels across clusters in WT or JAK2V617F-mutated cells in untreated patients with MF. Data are mean ± 95% confidence intervals. Cell numbers are shown in Supplementary Table 10. f, The dynamics of CD36-associated features in MkPs pseudotime trajectory, minimum–maximum normalized for visualization. Statistical analysis was performed using a two-sided F-test. The shadowed regions represent the 95% confidence interval.

GoT–ChA with select antigen profiling
For expanded multimodality single-cell sequencing, we integrated GoT–ChA with ATAC with select antigen profiling by sequencing (ASAP–seq)⁵², which assays chromatin accessibility simultaneously with targeted cell-surface protein expression using fixed whole cells, and applied this to ten patient samples (Supplementary Table 3). JAK2V617 genotyping was obtained for 34.1 ± 29.4% (mean ± s.d.) of cells, while retaining chromatin accessibility quality (Extended Data Fig. 9a–c). Comparing a sample processed by both methods (Pt-02), we noted a moderate decrease in the genotyping rate with GoT–ChA–ASAP (20.7% of cells) compared with GoT–ChA alone (50.4% of cells; Supplementary Table 3), which is likely to be due to ASAP–seq fixation conditions. However, by assaying entire cells rather than isolated nuclei, ASAP–seq provides increased coverage of mitochondrial genomes⁵² compared with standard scATAC (Fig. 5a). We hypothesized that JAK2V617F-mutant cells might contain mitochondrial variants that could be used for genotype imputation in cells without direct capture of JAK2 (Extended Data Fig. 9d). Pt-08, Pt-09 and Pt-10 lacked variants with sufficient coverage (Methods) and, of the remaining seven samples, four had low-level heteroplasmy and two had variants common to all cells (Extended Data Fig. 9e). Only Pt-02 had two mitochondrial variants (12,786 A>G and 3,834 G>A) at high heteroplasmy in JAK2V617F cells, with low heteroplasmy in WT cells (Fig. 5b). Both variants co-occurred with the JAK2V617F mutation, but were mutually exclusive, suggesting early acquisition after the JAK2V617F mutation.

To leverage mitochondrial variants, we developed and applied a random-forest classifier to impute missing JAK2 genotypes on the basis of heteroplasmy levels (Methods and Supplementary Fig. 6). Application of genotype imputation to GoT–ChA–ASAP from Pt-02 increased the genotyping rate from 20.6% to 85.4% of cells (Fig. 5c). TF motif accessibility changes between WT and JAK2V617F-mutated HSCs showed a significant positive correlation between methods (Extended Data Fig. 9f) and lymphoid clusters were mostly WT (91.3%) compared with the remaining clusters (35.6%; P < 10⁻¹⁰; Fisher exact test; Extended Data Fig. 9g), supporting accurate genotype imputation. We validated the genotyping consistency through correlation of the mutant cell fraction per cluster for GoT–ChA or GoT–ChA–ASAP with or without mitochondrial-variant-based genotyping (Extended Data Fig. 9h). Thus, our imputation classifier built on co-occurring mitochondrial variants yielded accurate genotyping for nearly all cells in the sample.

We also used GoT–ChA–ASAP to measure protein expression, applying a panel of haematopoietic cell-surface protein markers to validate cluster assignments (Methods and Extended Data Fig. 9i). Differential protein expression comparisons showed that mutant HSCMY cells had higher CD90 expression compared with the WT (Fig. 5d), but not in ruxolitinib-treated patients (Extended Data Fig. 10a,b). Mutant HSCs and HSCMY cells had increased gene accessibility of THY1 (encoding CD90) across untreated samples, with no change in ruxolitinib-treated patients (Extended Data Fig. 10c). CD90 is a cell-surface marker expressed in a subset of primitive HSCs⁵³. We verified the earliest HSC cluster assignment based on CD90 levels in WT or JAK2V617F haematopoiesis; CD90 protein is highly expressed in WT cells from the HSC cluster as expected, while, for JAK2V617F-mutated cells, the HSCMY cluster has the highest CD90 levels (Fig. 5e and Supplementary Table 10). CD90high HSC population expansion was reported in PV¹⁴, and aberrant CD90 expression was noted in homozygous JAK2V617F MPN HSCs using TARGET-seq¹⁶; however, this observation was supported by only five cells. For further validation, we measured CD90 in HSCs using flow cytometry in MF samples (n = 71; Extended Data Fig. 10d). We observed a positive correlation between CD90 expression and JAK2V617F allele burden in mutated HSCs (Extended Data Fig. 10e), and no such correlation in more differentiated progenitor cells (HPCs; Extended Data Fig. 10f,g). We further verified CD90 enrichment in mutated HSCMY cells using an orthogonal single-cell genotyping method, observing the trend towards increased CD90 levels in mutated HSCs in Pt-11 (Extended Data Fig. 10h). Thus, CD90 is a potential marker of JAK2V617F-mutated HSCs in MF.

We next integrated our mitochondrial-variant-based classifier with DOGMA-seq⁵², which captures single-cell chromatin accessibility, RNA expression and cell-surface protein profiles, but is incompatible with GoT–ChA genotyping due to the lack of an in-droplet PCR step. To circumvent this limitation, we applied our classifier to Pt-02 DOGMA-seq data, which yielded imputed JAK2V617F genotyping for 45.8% of cells (Extended Data Fig. 9g), and single-cell co-capture of genotype, chromatin accessibility, RNA expression and cell-surface protein levels. As expected, lymphoid clusters were predominantly WT (Extended Data Fig. 9g). We analysed THY1 (encoding CD90), observing increased accessibility and RNA expression in JAK2V617F-mutated HSCs and verifying that the proximal peak is linked to THY1 RNA levels (Extended Data Fig. 10i). When comparing the overall gene accessibility and gene expression changes, we again (Fig. 3a) detected that BMPR1B was upregulated and FRY was downregulated in mutant HSCs (Extended Data Fig. 10j,k). Thus, mitochondrial-variant-based genotype imputation allows for linking epigenetic changes driven by somatic mutations with gene expression.

MkP cluster analysis identified CD36, a platelet membrane glycoprotein receptor for thrombospondin-1⁵⁴ that facilitates megakaryocyte maturation through fatty acid uptake⁵⁵, as a top upregulated gene in JAK2V617F-mutated cells in both gene accessibility and gene expression (Extended Data Fig. 10l). Leveraging the multimodal

capture, we compared WT and JAK2V617F-mutated cells along megakaryocyte differentiation for CD36 features: associated domain of open regulatory chromatin (DORC), RNA expression and protein levels. JAK2V617F-mutated progenitors displayed an early increase in the CD36 DORC, followed by increased gene and later protein levels for CD36 compared to WT (Fig. 5f). We validated cell-surface protein expression in GoT–ChA–ASAP-processed samples, finding increased CD36 protein expression in mutated MkPs across untreated, but not ruxolitinib-treated, samples (Extended Data Fig. 10m). These results point to JAK2V617F as a driver of epigenetic deregulation of CD36 with concomitant increase in CD36 transcript and protein levels, which may support MkP differentiation. Thus, mitochondrial-variant-based classifiers for somatic mutations built on GoT–ChA–ASAP data allow for the extension of multimodal capacity to DOGMA-seq, capturing genotypes, chromatin accessibility, gene expression and protein expression in the same single cell.

Discussion
GoT–ChA delivers droplet-based, high-throughput joint capture of genotypes and chromatin accessibility, with multiplexing capabilities for targeting multiple loci and the ability to be integrated with protein and mitochondrial DNA capture, enabling linkage of somatic genotypes to a variety of signals in single cells (Supplementary Discussion). GoT–ChA obviates the limiting dependencies on mutated locus expression and location, and compatibility with nuclei allows for potential application to archived frozen solid tissues, critical for the emerging field of clonal mosaicism across human tissues⁵⁶⁻⁵⁸.

Analysis of clonal expansions in primary human samples from patients with PV and MF revealed that the epigenetic consequences of the JAK2V617F mutation are highly cell-state dependent. Previous research has linked JAK2-mediated activation of STAT1 and STAT3 to increased NF-κB signalling in mouse models⁴³,⁴⁴ and highlighted cell-extrinsic effects of the microenvironment. Here we show that the JAK2V617F-mutant HSCs display epigenetic profiles that are consistent with cell-intrinsic pro-inflammatory phenotypes, opening a route for potential combined therapeutic strategies for mutant-specific targeting, aimed at both JAK2V617F constitutive activation as well as pro-inflammatory signalling in mutant HSCs. Together, our data from MF, PV and CH samples suggest that JAK2V617F-mediated inflammation and fibrosis result from a complex interplay between cell-extrinsic¹⁶,⁴³,⁴⁴ and cell-intrinsic effects that vary across different progenitor populations and disease stages.

We anticipate that GoT–ChA will serve as a foundation for broad future explorations to uncover the critical links between somatic mutations and epigenetic deregulation in malignant and non-malignant contexts.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-024-07388-y.

1. Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
2. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
3. Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116 (2020).
4. Izzo, F. et al. DNA methylation disruption reshapes the hematopoietic differentiation landscape. Nat. Genet. 52, 378–387 (2020).
5. Nam, A. S. et al. Single-cell multi-omics of human clonal hematopoiesis reveals that DNMT3A R882 mutations perturb early progenitor states through selective hypomethylation. Nat. Genet. 54, 1514–1526 (2022).
6. Mullally, A. et al. Physiological Jak2V617F expression causes a lethal myeloproliferative neoplasm with differential effects on hematopoietic stem and progenitor cells. Cancer Cell 17, 584–596 (2010).
7. Gerritsen, M. et al. RUNX1 mutations enhance self-renewal and block granulocytic differentiation in human in vitro models and primary AMLs. Blood Adv. 3, 320–332 (2019).
8. Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
9. Levine, R. L. et al. Activating mutation in the tyrosine kinase JAK2 in polycythemia vera, essential thrombocythemia, and myeloid metaplasia with myelofibrosis. Cancer Cell 7, 387–397 (2005).
10. Kralovics, R. et al. A gain-of-function mutation of JAK2 in myeloproliferative disorders. N. Engl. J. Med. 352, 1779–1790 (2005).
11. James, C. et al. A unique clonal JAK2 mutation leading to constitutive signalling causes polycythaemia vera. Nature 434, 1144–1148 (2005).
12. Baxter, E. J. et al. Acquired mutation of the tyrosine kinase JAK2 in human myeloproliferative disorders. Lancet 365, 1054–1061 (2005).
13. Panteli, K. E. et al. Serum interleukin (IL)-1, IL-2, sIL-2Ra, IL-6 and thrombopoietin levels in patients with chronic myeloproliferative diseases. Br. J. Haematol. 130, 709–715 (2005).
14. Jamieson, C. H. M. et al. The JAK2 V617F mutation occurs in hematopoietic stem cells in polycythemia vera and predisposes toward erythroid differentiation. Proc. Natl Acad. Sci. USA 103, 6224–6229 (2006).
15. Giustacchini, A. et al. Single-cell transcriptomics uncovers distinct molecular signatures of stem cells in chronic myeloid leukemia. Nat. Med. 23, 692–702 (2017).
16. Rodriguez-Meira, A. et al. Unravelling intratumoral heterogeneity through high-sensitivity single-cell mutational analysis and parallel RNA sequencing. Mol. Cell 73, 1292–1305 (2019).
17. Rodriguez-Meira, A., O’Sullivan, J., Rahman, H. & Mead, A. J. TARGET-seq: a protocol for high-sensitivity single-cell mutational analysis and parallel RNA sequencing. STAR Protoc. 1, 100125 (2020).
18. van Galen, P. et al. Single-cell RNA-seq reveals AML hierarchies relevant to disease progression and immunity. Cell 176, 1265–1281 (2019).
19. Nam, A. S. et al. Somatic mutations and cell identity linked by genotyping of transcriptomes. Nature 571, 355–360 (2019).
20. Morita, K. et al. Clonal evolution of acute myeloid leukemia revealed by high-throughput single-cell genomics. Nat. Commun. 11, 5327 (2020).
21. Miles, L. A. et al. Single-cell mutation analysis of clonal evolution in myeloid malignancies. Nature 587, 477–482 (2020).
22. Van Egeren, D. et al. Reconstructing the lineage histories and differentiation trajectories of individual cancer cells in myeloproliferative neoplasms. Cell Stem Cell 28, 514–523 (2021).
23. Van Egeren, D. et al. Transcriptional differences between JAK2-V617F and wild-type bone marrow cells in patients with myeloproliferative neoplasms. Exp. Hematol. 107, 14–19 (2022).
24. Turkalj, S. et al. GTAC enables parallel genotyping of multiple genomic loci with chromatin accessibility profiling in single cells. Cell Stem Cell 30, 722–740 (2023).
25. Mackinnon, R. N. et al. Genome organization and the role of centromeres in evolution of the erythroleukaemia cell line HEL. Evol. Med. Publ. Health 2013, 225–240 (2013).
26. Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).
27. Mustjoki, S. et al. JAK2V617F mutation and spontaneous megakaryocytic or erythroid colony formation in patients with essential thrombocythaemia (ET) or polycythaemia vera (PV). Leuk. Res. 33, 54–59 (2009).
28. Schieber, M., Crispino, J. D. & Stein, B. Myelofibrosis in 2019: moving beyond JAK2 inhibition. Blood Cancer J. 9, 74 (2019).
29. Pardanani, A. & Tefferi, A. Definition and management of ruxolitinib treatment failure in myelofibrosis. Blood Cancer J. 4, e268 (2014).
30. Cervantes, F. et al. Three-year efficacy, safety, and survival findings from COMFORT-II, a phase 3 study comparing ruxolitinib with best available therapy for myelofibrosis. Blood 122, 4047–4053 (2013).
31. Mondet, J., Hussein, K. & Mossuz, P. Circulating cytokine levels as markers of inflammation in Philadelphia negative myeloproliferative neoplasms: diagnostic and prognostic interest. Mediators Inflamm. 2015, 670580 (2015).
32. Tefferi, A. et al. Circulating interleukin (IL)-8, IL-2R, IL-12, and IL-15 levels are independently prognostic in primary myelofibrosis: a comprehensive cytokine profiling study. J. Clin. Oncol. 29, 1356–1363 (2011).
33. Verstovsek, S. et al. A double-blind, placebo-controlled trial of ruxolitinib for myelofibrosis. N. Engl. J. Med. 366, 799–807 (2012).
34. Vukotić, M. et al. Inhibition of proinflammatory signaling impairs fibrosis of bone marrow mesenchymal stromal cells in myeloproliferative neoplasms. Exp. Mol. Med. 54, 273–284 (2022).
35. Dunbar, A. J. et al. CXCL8/CXCR2 signaling mediates bone marrow fibrosis and is a therapeutic target in myelofibrosis. Blood 141, 2508–2519 (2023).
36. Hu, W.-H. et al. NIBP, a novel NIK and IKKβ-binding protein that enhances NF-κB activation. J. Biol. Chem. 280, 29233–29241 (2005).
37. Jeanpierre, S. et al. The quiescent fraction of chronic myeloid leukemic stem cells depends on BMPR1B, Stat3 and BMP4-niche signals to persist in patients in remission. Haematologica 106, 111–122 (2021).
38. Wu, Y. et al. The prognostic value of matrix metalloproteinase-7 and matrix metalloproteinase-15 in acute myeloid leukemia. J. Cell. Biochem. 120, 10613–10624 (2019).
39. Ikeda, M., Chiba, S., Ohashi, K. & Mizuno, K. Furry protein promotes aurora A-mediated Polo-like kinase 1 activation. J. Biol. Chem. 287, 27670–27681 (2012).
40. Komorowska, K. et al. Hepatic leukemia factor maintains quiescence of hematopoietic stem cells and protects the stem cell pool during regeneration. Cell Rep. 21, 3514–3523 (2017).

41. Ficara, F. et al. Pbx1 restrains myeloid maturation while preserving lymphoid potential in hematopoietic progenitors. J. Cell Sci. 126, 3181–3191 (2013).
42. Ficara, F., Murphy, M. J., Lin, M. & Cleary, M. L. Pbx1 regulates self-renewal of long-term hematopoietic stem cells by maintaining their quiescence. Cell Stem Cell 2, 484–496 (2008).
43. Kleppe, M. et al. JAK-STAT pathway activation in malignant and nonmalignant cells contributes to MPN pathogenesis and therapeutic response. Cancer Discov. 5, 316–331 (2015).
44. Kleppe, M. et al. Dual targeting of oncogenic activation and inflammatory signaling increases therapeutic efficacy in myeloproliferative neoplasms. Cancer Cell 33, 785–787 (2018).
45. Dunbar, A. J. et al. Jak2V617F reversible activation shows its essential requirement in myeloproliferative neoplasms. Cancer Discov. https://doi.org/10.1158/2159-8290.CD-22-0952 (2024).
46. Wernig, G. et al. Unifying mechanism for different fibrotic diseases. Proc. Natl Acad. Sci. USA 114, 4757–4762 (2017).
47. Burda, P., Laslo, P. & Stopka, T. The role of PU.1 and GATA-1 transcription factors during normal and leukemogenic hematopoiesis. Leukemia 24, 1249–1257 (2010).
48. Zhang, P. et al. Negative cross-talk between hematopoietic regulators: GATA proteins repress PU.1. Proc. Natl Acad. Sci. USA 96, 8705–8710 (1999).
49. Basak, A. & Sankaran, V. G. Regulation of the fetal hemoglobin silencing factor BCL11A. Ann. N. Y. Acad. Sci. 1368, 25–30 (2016).
50. Sankaran, V. G. et al. Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science 322, 1839–1842 (2008).
51. Hoffman, R. et al. Fetal hemoglobin in polycythemia vera: cellular distribution in 50 unselected patients. Blood 53, 1148–1155 (1979).
52. Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol. 39, 1246–1258 (2021).
53. Baum, C. M., Weissman, I. L., Tsukamoto, A. S., Buckle, A. M. & Peault, B. Isolation of a candidate human hematopoietic stem-cell population. Proc. Natl Acad. Sci. USA 89, 2804–2808 (1992).
54. Asch, A. S., Barnwell, J., Silverstein, R. L. & Nachman, R. L. Isolation of the thrombospondin membrane receptor. J. Clin. Invest. 79, 1054–1061 (1987).
55. Valet, C. et al. Adipocyte fatty acid transfer supports megakaryocyte maturation. Cell Rep. 32, 107875 (2020).
56. Mustjoki, S. & Young, N. S. Somatic mutations in “benign” disease. N. Engl. J. Med. 384, 2039–2052 (2021).
57. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
58. Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© The Author(s), under exclusive licence to Springer Nature Limited 2024

Methods

Cell lines
Human CA46 (ATCC, CRL-1648), HEL (ATCC, TIB-180), SET-2 (DSMZ, ACC 608), CCRF-CEM (ATCC, CRM-CCL-119), OCI-AML3 (provided by C. Park), HEPG2 with the FOXO1S22W mutation (provided by P. Campbell) and SUM159 (a gift from Y. Yarden) cell lines were maintained according to standard procedures in RPMI-1640 (Thermo Fisher Scientific, 11-875-119) or in DMEM (HEPG2, Thermo Fisher Scientific, 11965092) with 10% (or 20% for CA46 and SET-2 cells) FBS (Thermo Fisher Scientific, 10-437-028) at 37 °C with 5% CO₂. Cell lines in culture were screened biweekly for mycoplasma contamination using the MycoAlert PLUS Mycoplasma Detection Kit (Lonza, LT07-703).

Patient samples
This study was approved by the local ethics committee and by the Institutional Review Boards of Weill Cornell Medicine, Memorial Sloan Kettering Cancer Center and the Icahn School of Medicine at Mount Sinai, and was conducted in accordance with the Declaration of Helsinki protocol. Either fresh peripheral blood or cryopreserved mononuclear cells isolated from bone marrow biopsies or peripheral blood from patients with JAK2V617F mutations were retrieved after a database search (Supplementary Table 3). Samples were selected to contain only JAK2V617F mutations, as assessed by MSK-IMPACT, RainDance or Neogenomics haematology panels, the sole exception being Pt-08. Fresh peripheral blood mononuclear cells were isolated within 24 h of blood collection using a Ficoll (Thermo Fisher Scientific, 45-001-750) gradient according to the manufacturer’s recommendations, and were either processed immediately or cryopreserved for future experiments. Isolated mononuclear cells from peripheral blood or bone marrow biopsies were thawed if cryopreserved and stained according to standard procedures, beginning with resuspension in staining buffer (BioLegend, 420201) and incubation with Human TruStain FcX (10 min at 4 °C; BioLegend, 422302) to block Fc receptor-mediated binding. Cells were then stained with a CD34-PE-Vio770 antibody (20 min at 4 °C; Miltenyi Biotec, AC136, 130-113-180; 1:50 for up to 10⁶ cells in a final volume of 100 µl) and DAPI (Invitrogen, D1306). The samples were then sorted for DAPI−CD34+ cells using the BD Influx cell sorter (Supplementary Fig. 3). For validation of fetal haemoglobin expression, CD71 (CD71-FITC; BioLegend, 334104) and HbF (HbF-PE; Invitrogen, MHFH04) antibodies were used (5 µl per million cells in 100 µl staining volume).

scATAC–seq with GoT–ChA
Cells were subjected to nucleus isolation according to the Nuclei Isolation for Single Cell ATAC Sequencing protocol (version CG000169 Rev D, 10x Genomics). In brief, cells were resuspended with lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl₂, 0.1% Tween-20, 0.1% Nonidet P40 substitute, 0.01% digitonin, 1% BSA) and incubated on ice (3 min for patient samples, 5 min for cell lines), followed by addition of chilled wash buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl₂, 1% BSA, 0.1% Tween-20) and centrifugation to pellet isolated nuclei. Nuclei were then resuspended in 1× diluted nucleus buffer (10x Genomics) and counted using trypan blue and a Countess II FL Automated Cell Counter.

(2) During the post-GEM incubation clean-up (step 3.2), 45.5 µl of elution solution I is used to elute material from SPRIselect beads. A total of 5 µl is used for GoT–ChA library construction, and the remaining 40 µl is used for ATAC library construction as indicated in the standard protocol.
(3) To generate the GoT–ChA library, two additional PCRs were performed on the 5 µl set aside during step 3.2 (Extended Data Fig. 1c). The first PCR aims to amplify genotyping fragments before sample indexing and uses P5 (binds to the P5 Illumina sequencing handle: AATGATACGGCGACCACCGAGATCTACAC) and GoT–ChA_nested (a nested, biotinylated, locus-specific primer with a partial TruSeq small RNA read 2 handle: /5BiosG/CCTTGGCACCCGAGAATTCCA, 20–22 bp locus specific) primers with the following thermocycler program: 95 °C for 3 min; 15 cycles of 95 °C for 20 s, 65 °C for 30 s and 72 °C for 20 s; followed by 72 °C for 5 min and ending with hold at 4 °C. After a 1.2× SPRIselect clean-up, biotinylated PCR product is bound and isolated using Dynabeads M-280 Streptavidin magnetic beads (Thermo Fisher Scientific, 11206D). In brief, the beads are washed three times with 1× sodium chloride sodium phosphate-EDTA buffer (SSPE, VWR, VWRV0810-4L), added to the purified PCR product and incubated at room temperature for 15 min. The beads are then washed twice with 1× SSPE buffer and once with 10 mM Tris-HCl (pH 8.0) before resuspending in water. The bead-bound fragments are then amplified and sample indexed using P5 and RPI-X (binds to the partial TruSeq small RNA read 2 handle and adds a sample index and P7 Illumina sequencing handle: CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA, where X denotes a user-defined sample index) primers with the following thermocycler program: 95 °C for 3 min; 6–10 cycles of 95 °C for 20 s, 65 °C for 30 s and 72 °C for 20 s; followed by 72 °C for 5 min and ending with hold at 4 °C.

Final libraries were quantified using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Q32854) and the High Sensitivity DNA chip (Agilent Technologies, 5067-4626) run on a Bioanalyzer 2100 system (Agilent Technologies) and sequenced on a NovaSeq 6000 system at the Weill Cornell Medicine Genomics Resources Core Facility with the following parameters: paired-end 50 cycles; read 1N, 50 cycles; i7 index, 8 cycles; i5 index, 16 cycles; read 2N, 50 cycles. ATAC libraries were sequenced to a depth of 25,000 read pairs per nucleus and GoT–ChA libraries were sequenced to 5,000 read pairs per nucleus. A list of the primer sequences used in this study is provided in Supplementary Table 1.

Multiplex-adapted GoT–ChA
Multiplex-adapted GoT–ChA follows the same sample preparation protocol as described above. Additional modifications incorporated in the GoT–ChA protocol for multiplexing are as follows:
(1) An IS2 handle was added to GoT–ChA_Rev primers (locus-specific primer with an IS2 handle sequence: AGCAAGTGAGAAGCATCGTGTC, 20–22 bp locus specific).
(2) During the GEM generation and barcoding reaction (step 2.1), 1 µl of 22.5 µM GoT–ChA primer mix for all targets was added to the barcoding reaction mixture. With more than three targets, 22.5 µM
Nuclei were subsequently processed according to the Chromium GoT–ChA primer mixes of two targets were made and 1 μl for each
Next GEM Single Cell ATAC Solution user guide (version CG000209 pair of targets was added to minimize the added volume.
Rev F, 10x Genomics) with the following modifications: (3) During GEM incubation (step 2.5), the annealing step (59 °C)
(1) During the GEM generation and barcoding reaction (step 2.1), 1 µl time was increased from 30 s to 2 min:30 s for the first 6 cycles of
of 22.5 µM GoT–ChA primer mix was added to the barcoding reac- in-droplet PCR. This resulted in an increased genotyping efficiency
tion mixture. The primers used are GoT–ChA_R1N (locus-specific of each individual target and in an increased proportion of cells
primer with a read 1N handle sequence: TCGTCGGCAGCGTCAGAT- with more than one locus captured (Fig. 1j).
GTGTATAAGAGACAG, 20–22 bp locus specific) and GoT–ChA_Rev (4) To generate the GoT–ChA libraries, one additional PCR was per-
(20-22 bp locus-specific primer). These primers allow for exponen- formed on the 5 µl set aside during step 3.2, before nested and
tial amplification of the GoT–ChA fragments relative to the linear sample indexing PCRs. This PCR aims to amplify all genotyping
amplification of ATAC fragments. fragments and uses P5 (binds to the P5 Illumina sequencing handle:
AATGATACGGCGACCACCGAGATCTACAC) and IS2 (binds to the IS2 handle: AGCAAGTGAGAAGCATCGTGTC) primers with the following thermocycler program: 95 °C for 3 min; 8 cycles of 95 °C for 20 s, 65 °C for 30 s and 72 °C for 20 s; followed by 72 °C for 5 min and ending with hold at 4 °C. After 1.2× SPRIselect clean-up, the PCR product was equally divided for separate GoT–ChA library preparation for each target. The two PCRs, biotin binding and clean-ups were performed as described in the single-target GoT–ChA protocol above, with one modification: the number of cycles for each heminested PCR (with P5 and GoT–ChA_nested primers) was reduced from 15 to 8.

scASAP–seq with GoT–ChA
The samples were processed in a similar manner to that described previously for standard scATAC–seq, with a few key differences as described by the original authors52. Additional minor modifications for incorporation of GoT–ChA into ASAP–seq are as follows:
(1) During the GEM generation and the barcoding reaction (step 2.1), 1 µl of 22.5 µM GoT–ChA primer mix is added to the barcoding reaction mixture, just as was done for scATAC–seq.
(2) During the post-GEM incubation clean-up (step 3.2), 45.5 µl of elution solution I is used to elute material from SPRIselect beads. A total of 5 µl is used for GoT–ChA library construction, and the remaining 40 µl is used for ATAC library construction as indicated in the standard protocol. GoT–ChA library construction proceeded as described above.

Final libraries were quantified using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Q32854) and the High Sensitivity DNA chip (Agilent Technologies, 5067-4626) run on the Bioanalyzer 2100 system (Agilent Technologies) and sequenced on the NovaSeq 6000 system with the following parameters: paired-end 50 cycles; read 1N, 50 cycles; i7 index, 8 cycles; i5 index, 16 cycles; read 2N, 50 cycles. ATAC libraries were sequenced to a depth of 25,000 read pairs per cell, and both GoT–ChA and any protein tag libraries were sequenced to 5,000 read pairs per cell.
Protein expression measurements were performed using TotalSeq-A reagents from BioLegend (1 µl each antibody per 10^6 cells in a final volume of 100 µl) according to the manufacturer’s recommendations. The following surface markers were assayed: CD34 (343537), CD38 (356635), CD90 (328135), CD49f (313633), CD45RA (304157), CD41 (303737), CD36 (336225), CD69 (310947), CD9 (312119), CD71 (334123), CD99 (371317), CD184 (306531), HLA-DR (307659), CD134 (350033), CD48 (336709), CD52 (316017), CD135 (313317), CD47 (323129), CD7 (343123), CD56 (362557) and CD45 (368543).

GoT–ChA primer design
GoT–ChA_R1N (primer with a read 1N handle sequence: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG, 20–22 bp locus specific) and GoT–ChA_Rev (primer with an IS2 handle sequence: AGCAAGTGAGAAGCATCGTGTC, 20–22 bp locus specific) primers were designed around the locus of interest with desired PCR product length in the approximate range of 150–500 bp. GoT–ChA_nested (a nested, biotinylated, locus-specific primer with a TruSeq small RNA read 2 handle: /5BiosG/CCTTGGCACCCGAGAATTCCA, 20–22 bp locus specific) primers were designed in reverse position to and at least 100 bp downstream from the respective GoT–ChA_R1N primers (Extended Data Fig. 1a,b). Primers were designed such that the mutation site was within 50 bp from either read 1N or TruSeq small RNA read 2 handles. A list of the GoT–ChA primers used in this study is provided in Supplementary Table 1.

GoT–ChA genotyping data processing
Initial read processing and genotype calling. Raw sequencing data for GoT–ChA libraries were demultiplexed using cellranger-ATAC mkfastq. The GoT–ChA sequencing FASTQ files were then used as input into the GoT–ChA pre-processing pipeline designed to result in a genotype-per-cell output. These functions are publicly available as an R package (see the ‘Code availability’ section).
As a first step, input FASTQ files are split into files of a user-defined n reads each to allow for parallelized downstream processing through the FastqSplit function. Next, the FastqFiltering function takes each newly generated split FASTQ file and identifies read pairs that do not pass a set of user-defined parameters for base quality filtering. Default usage was designed to identify poor-quality bases at or surrounding a single-nucleotide variant site, although the function includes parameters to easily adjust for filtering of all base pairs in paired read sequences or of only a single read for each pair. After quality filtering, the BatchMutationCalling function first identifies whether a read contains the expected sequence of the nested primer used during library construction through pattern matching. If so, then each read’s paired cell barcode is matched to a provided whitelist, considering a maximum Hamming distance of two. All reads that pass these criteria are then assessed for whether the read contains a specified WT or mutant sequence at the specified position. This process is performed for each read in each split and filtered FASTQ file and can be run via parallelized processing through a Slurm workload manager, ultimately outputting a cell-barcode-by-genotyped-reads matrix for each FASTQ file processed. Finally, the MergeMutationCalling function combines all matrices generated from each split FASTQ file and merges them together, grouping by cell barcode and summarizing the counts of reads that were identified as WT, mutant or neither. The BatchMutationCalling function allows this step to be run in parallel on a Slurm cluster. The summarized genotyping data must then be integrated with the chromatin accessibility information through shared cell barcodes. This is achieved using the AddGenotyping function, which is compatible with the Signac26 and ArchR59 scATAC–seq pipelines, and results in addition of the genotyping read counts to the object metadata.

Background correction and genotype assignment. Once read counts per cell corresponding to the WT and MUT alleles are generated, we then require a method to label these cells with their corresponding biological genotype. Given the high level of PCR amplification needed to capture in single cells the 1–2 molecules of DNA containing the targeted locus, the data are vulnerable to various sources of noise such as PCR errors and PCR recombination, aggravated by the exponential amplification bias inherent to PCR. For example, the total observed read counts for WT do not correspond to the number of WT alleles. Rather, they are defined by the cycle of PCR in which the targeted locus is captured, as well as the sequencing depth. Thus, if an allele is captured earlier on in the PCR process, exponential amplification would then result in inflated read counts. Moreover, ambient contamination of fragments between neighbouring cells can influence the null distribution of reads for each allele. To account for the potential sources of noise, we developed two alternative approaches to quantify and correct for potential background noise in the genotyping data. Our first approach (empty droplet-based) uses the genotyping information present in empty droplets generated during the 10x run (that is, barcodes that do not contain a cell, and therefore yield few to zero ATAC–seq fragments). Our second approach (cluster-based) leverages the presence of a bimodal distribution of genotyping reads in cells, representing successful versus unsuccessful genotyping. While the cluster-based method is used by default, both methods are described in the sections below.

Empty droplet-based background noise correction and genotype assignment. To estimate the background noise present in the genotyping data, we leveraged the presence of empty droplets obtained from every 10x run, as has previously been done for noise correction in single-cell protein expression60. First, the background noise is estimated for either WT or mutant reads independently. Given the zero-inflated distribution of genotyping reads present in empty droplets and to avoid
the potential presence of outlier values (that is, a droplet that contains a cell but was assigned as empty), we estimate the value of the background noise as that of the 99th percentile of the read number distribution for each genotype independently. Once the background noise is quantified, we subtract this value from the read count of the corresponding genotype in the barcodes containing real cells. Moreover, cells are required to contain a minimum number of genotyping reads (>250 after background subtraction). This procedure can be performed by using the AddGenotyping function followed by the FilterGenotyping function, both available in the Gotcha package (see the ‘Code availability’ section).

Cluster-based background noise correction and genotype assignment. An alternative approach that results in higher genotyping efficiency and includes the ability to detect heterozygous mutated cells was informed by pre-existing approaches for normalization of CITE-seq data and hashtag oligo (HTO) demultiplexing such as DSB60 and HTODemux61, respectively. The basic assumption is that, in our dataset, we may encounter populations of homozygous WT, homozygous MUT and heterozygous cells. Furthermore, we anticipate a population of cells for which no genotyping call can be made, due to experimental constraints on capture. These cells may still have non-zero read counts for each allele, representing the level of background signal for each allele. Thus, the data will typically follow a bimodal distribution of supporting reads for either the mutated or WT allele. One mode represents cells with true capture of the mutated allele and the second mode represents cells for which reads reflect background noise.
In the DSB60 method, the authors use a log-transformation of the protein expression counts, which produces a bimodal distribution whereby the lower mode represents the background signal, and the higher mode represents the true signal. The log-counts are then z-scored using the mean and s.d. of the noise distribution, as defined by a Gaussian mixture model with two components. In several methods for HTO demultiplexing, the transformed hashtag counts are clustered, and the resulting clusters are characterized to produce interpretable labels. We therefore sought to combine these two basic steps: normalization based on background noise distribution followed by clustering.
The flowchart depicting the steps of our method is shown in Supplementary Fig. 2. All analyses, including estimation of background, are performed only on the set of barcodes representing captured cells as defined by the ATAC library profile (see the ‘ATAC–seq data processing’ section below). Each feature (WT and MUT read counts) is normalized independently. First, the read counts are log-transformed with a pseudocount of 1, resulting in a bimodal distribution. We then define a probability density function (PDF) on these data for decomposition into background signal and true signal distributions. For this purpose, we use a kernel density estimation (KDE) with a Gaussian kernel and a fixed bandwidth. Next, the value defining the threshold between background and signal distributions is defined as the lowest point in the KDE between the two modes with the greatest widths, as calculated using the scipy.signal.find_peaks function in Python. The KDE is decomposed by computing KDEs separately on the data above or below the defined boundary, where the weights of each PDF component are defined by the proportion of the data above or below the threshold. The mean and s.d. of the inferred distribution of the measurement of background noise are then computed by integration.
To compute z scores, we then go back to the raw, log-transformed counts, and subtract the mean and divide by the s.d. of the background noise distribution estimated from the PDF. Importantly, as opposed to the Gaussian mixture model approach, this approach does not force the inferred background noise and signal distributions to be Gaussian. In fact, the observed background noise across multiple samples shows a positively skewed distribution. Thus, this approach is more broadly generalizable to unseen datasets. Below is the formulation of the final z score, where X represents the log transformation of the read counts, with pseudocount 1:

$$z\text{-score} = \frac{X - E(\mathrm{background})}{\sqrt{\mathrm{Var}(\mathrm{background}) - \mathrm{Var}[Y]}}$$

The distribution of the background noise is computed as the KDE of the values of X + Y below the previously computed threshold. We can then plot the z scores for the WT and MUT distributions in two dimensions (WT z scores and MUT z scores), with the thresholds dividing the data into four quadrants as shown in Extended Data Fig. 1n. The resulting quadrants are then easily interpretable: cells in the bottom left quadrant have low z scores for both WT and MUT and as such are non-genotyped. The bottom right quadrant corresponds to homozygous WT cells, the top left is homozygous MUT, and the top right consists of heterozygous cells. Crude genotype calls can in fact be made at this stage and these labels are included in the final output. However, as straight decision thresholds are rarely optimal, we proceeded with an alternative approach based on semi-supervised learning. To do this, we first remove the genotype label of data points within 5% of the axis range of the previously defined quadrant boundaries. To relabel them, we used a semi-supervised k-nearest neighbours (k-NN) classifier. With k = sqrt(N), the classifier builds a k-NN graph using all the data and iteratively assigns unlabelled points based on the labels of their neighbours. The edges of the k-NN graph were weighted by the distance between neighbours. Given the size of most single-cell datasets, we chose to opt for precision over performance, and forced the classifier to relabel only the single best point each time before the k-NN graph is recomputed on each iteration.
In this way, all the previously unlabelled points are labelled by their closest inferred genotype. The resulting output labelling therefore tends to better reflect the underlying data distribution and clustering structure. The pipeline outputs and stores all relevant plots for each sample. Users are advised to inspect the plots to assess the two possible outputs (quadrant versus cluster-based genotyping) and determine which output’s labels fit the distribution of the data better. The default choice should be the cluster outputs, except when the assigned clusters span multiple quadrants to a large degree. Importantly, as the finding of quadrant thresholds can fall prey to suboptimal local minima, the pipeline creates several combinations of outputs in addition to the default that the user should inspect and compare. Finally, the output data are stored for cell-barcode-matched integration into the scATAC–seq metadata using the AddGenotype function for downstream analysis. The genotype labelling analysis was performed in Python, using the packages pandas (v.1.4.1), numpy (v.1.21.5), matplotlib (v.3.5.1), seaborn (v.0.11.2), scipy (v.1.7.3) and sklearn (v.1.0.2). Detailed documentation of the method and plots at each step are available (see the ‘Code availability’ section).

scATAC–seq data processing
Cell line mixing data processing and analysis. Raw sequencing data for ATAC–seq libraries were demultiplexed using cellranger-ATAC (v.2.0.0) mkfastq. ATAC–seq reads were then aligned to the hg38 reference genome using cellranger-ATAC count. Fragment files generated by cellranger-ATAC were used as input for processing through either the ArchR59 (v.1.0.0) pipeline or Signac26 (v.1.7.0) for downstream analysis. For barcode quality control, a minimum transcription start site (TSS) enrichment score of 5 and a minimum number of unique fragments of 20,000 were set on the basis of the data distribution. For ArchR processing, potential doublets were identified by the addDoubletScores function and removed using the filterDoublets function. Initial dimensionality reduction was performed through iterative LSI using the cell by genomic bin matrix (bin size = 500 bp) with the following parameters: iterations = 4, resolution = 0.1–4, sampleCells = 10,000, n.start = 10. For Signac processing, doublet identification was performed using AMULET62, and barcodes with FDR < 0.05 were called as multiplets and filtered out from downstream analysis. In both ArchR and Signac processing, cell clustering was performed through the addClusters function using the Seurat method, with a resolution of 0.5.
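
As a rough illustration of the barcode quality-control thresholds quoted above for the cell line mixing data, the following Python sketch filters a small, made-up table of barcodes. The record fields (tss_enrichment, n_fragments, multiplet_fdr), the example values and the use of inclusive (>=) cut-offs are assumptions made for illustration only; they are not taken from ArchR, Signac or AMULET output, and the published analysis was run with those packages rather than with this code.

```python
from dataclasses import dataclass

# Thresholds quoted in the text for the cell line mixing data.
MIN_TSS_ENRICHMENT = 5.0
MIN_UNIQUE_FRAGMENTS = 20_000
MULTIPLET_FDR_CUTOFF = 0.05  # FDR below this value marks a multiplet


@dataclass(frozen=True)
class BarcodeQC:
    """Per-barcode QC summary; the field names are hypothetical."""
    barcode: str
    tss_enrichment: float
    n_fragments: int
    multiplet_fdr: float


def passes_qc(row: BarcodeQC) -> bool:
    """Keep barcodes that meet both quality cut-offs and are not multiplets."""
    good_quality = (
        row.tss_enrichment >= MIN_TSS_ENRICHMENT
        and row.n_fragments >= MIN_UNIQUE_FRAGMENTS
    )
    is_multiplet = row.multiplet_fdr < MULTIPLET_FDR_CUTOFF
    return good_quality and not is_multiplet


# Made-up example values, chosen so that each filter rejects one barcode.
table = [
    BarcodeQC("AAAC-1", 8.2, 35_000, 0.80),  # passes every filter
    BarcodeQC("AAAG-1", 3.1, 42_000, 0.90),  # TSS enrichment too low
    BarcodeQC("AACC-1", 6.5, 12_000, 0.70),  # too few unique fragments
    BarcodeQC("AAGT-1", 9.0, 51_000, 0.01),  # flagged as a multiplet
]

kept = [row.barcode for row in table if passes_qc(row)]
print(kept)
```

Keeping the cut-offs as named constants makes it easy to swap in the patient-sample thresholds, which differ from those used for the cell lines.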
Patient data processing, dimensionality reduction and clustering. Raw sequencing data for ATAC–seq libraries were demultiplexed using cellranger-ATAC (v.2.0.0) mkfastq. ATAC–seq reads were then aligned to the hg38 reference genome using the cellranger-ATAC count function. Fragment files generated by cellranger-ATAC were used as input for the Signac (v.1.7.0) pipeline to generate the cell-by-peak matrices for each patient sample. High-quality barcodes representing captured cells were identified on the basis of a minimum number of unique fragments of above 3,000 and a nucleosome score of below 4, based on their distribution as shown in Extended Data Fig. 4b,c. For initial dimensionality reduction, the cell-by-peak matrix was used as an input for reciprocal LSI as calculated using the Signac26 (v.1.7.0) pipeline. In brief, the term frequency–inverse document frequency matrix is calculated followed by singular value decomposition as implemented in the RunSVD function. Dimensionality reduction using UMAP was performed with the RunUMAP function of the Seurat63 (v.4.0.1) package for LSI components 1:50. Next, to generate an integrated embedding across patient samples, we performed unsupervised identification of anchor correspondence between datasets64 using the FindIntegrationAnchors function. To generate a common LSI space, we ran the IntegrateEmbeddings function followed by the RunUMAP function with the following parameters: dims = 1:50, min.dist = 0.01 and spread = 0.5 for dimensionality reduction for visualization in two dimensions. Cell clustering was performed using the FindNeighbors function using the first 50 dimensions of the integrated LSI space as input followed by the FindClusters function with resolution = 1. Downstream analysis was performed based on the genotype assignment obtained through the Gotcha pipeline (see the ‘GoT–ChA genotyping data processing’ section above). Pseudotime cell ordering was performed in a semi-supervised manner defining the HSC cluster as the initial point of differentiation and estimated using Monocle 3 (ref. 65). For calculating the normalized fraction of mutant cells along pseudotime as defined by the differentiation axis, cells involved in the differentiation trajectory were divided into quantiles based on their pseudotime values, and the fraction of mutant cells within the window was calculated for each sample. The code used for data pre-processing, sample integration and downstream analysis is publicly available (see the ‘Code availability’ section).

Differential gene and motif accessibility modelling interpatient variability. Gene accessibility scores were obtained through the ArchR59 (v.1.0.0) pipeline, and ChromVAR66 (v.1.8.0) was used to estimate TF motif accessibility z scores. For intracluster differential testing of either gene or TF motif accessibility, we took a statistical approach that accounts for potential technical confounders arising from sample-specific batch effects. To that end, we applied an LMM approach followed by likelihood ratio test as follows:

$$m_1 = y_i \sim g_i + (1 \mid p_i)$$

$$m_2 = y_i \sim (1 \mid p_i)$$

$$\mathrm{anova}(m_1, m_2)$$

where $y_i$ is the response feature for cell $i$, represented by either motif or gene accessibility scores, $g_i$ is the genotype of cell $i$ and $p_i$ is the patient sample of origin for cell $i$, explicitly modelled as a random factor of the LMM model. This model was selected to account for patient-specific confounders, and to account for variability in cell numbers across samples. Raw P values were then adjusted using Benjamini–Hochberg correction. This analysis can be applied using parallelized computing through the DiffLMM function available within the Gotcha R package (see the ‘Code availability’ section). For gene accessibility scores, accessibility differences were calculated as the ratio of the mean value across the cells for the specified genotype and cell cluster for mutant versus WT cells, followed by application of log2-transformation. When indicated, imputation of gene accessibility scores was performed using the imputeMatrix function within the ArchR59 package (v.1.0.0). For TF motif accessibility, differences between genotypes were calculated as the difference in the mean z score for each TF for the specified cell cluster and genotype.
Pathway enrichment analysis was performed through preranked differential gene accessibility scores using the msigdbr (v.7.2.1) and fgsea (v.1.12.0) packages. We used the unadjusted P values to rank candidate genes for subsequent gene-set enrichment analysis using the −log10[P value] times the sign based on the direction of change (1 or −1), correcting the nominal P values for multiple-hypothesis testing. This rank was then used as the input for a preselected set of Hallmark pathways into the fgsea function with minSize = 10, maxSize = 1,000 and nperm = 100,000.
TF motif accessibility correlations were calculated based on the z score matrices estimated using ChromVAR66 (v.1.8.0). Cell-by-motif matrices were subset based on the TFs of interest and motif–motif correlations were calculated using the cor base function in R (v.3.6.2). For STAT1 and NFKB1 correlations, the linear model was estimated using the lm base function in R, and F-test parameters were retrieved using the summary base R function. Co-accessibility scores for either WT or mutant cells for a given cluster were calculated using Cicero67 (v.1.3.9). We generated co-accessibility calculations by genotype, generating loop files for each group separately. In brief, after specifying the cell clusters, genotype levels (that is, WT or MUT), subset of peaks of interest and the maximum distance between pairs of peaks, co-accessibility is calculated for each genotype independently after downsampling to the same number of cells in each group. The loop files for each genotype are provided as an input for plotting and visualization.

Protein data processing
Protein expression was estimated using the antibody-derived tag (ADT) information. ADT FASTQ files were first processed with the Python script ASAP_to_kite.py (obtained from the ASAP–seq authors), which converts the files into a format similar to the 10x scRNA-seq FASTQ format, thus enabling analysis using the kallisto68 (v.0.46.0), bustools69 (v.0.39.3) and kite70 (v.0.0.2) frameworks. Protein data were normalized using the DSBNormalizeProtein function from the DSB60 (v.0.1.0) package. Statistical comparisons between genotypes were performed using an LMM followed by a likelihood ratio test, followed by Bonferroni multiple-hypothesis correction.

Mitochondrial data pre-processing and variant calling
A modified version of the hg38 reference genome with hard-masked nuclear mitochondrial DNA sequences was generated using the cellranger-atac (v.2.0.0) mkref function. Sequencing reads were then mapped to the modified reference genome using cellranger-atac count. Note that most scATAC fragments are not expected to arise from nuclear mitochondrial DNA sequences, so they can be a priori safely uniquely mapped to the mitochondrial genome by masking their nuclear paralogues71. Next, the mgatk (v.0.5.0) package function mgatk with tenx mode was used to generate base counts at each mitochondrial genomic position for each cell passing cellranger quality-control standards. Reads with a mapping quality of lower than 20 were not considered, as well as bases with a sequencing quality of lower than 20. Files containing the combined per-cell base count information for each sample were loaded into R and analysed using custom code and functions developed to identify mitochondrial variants72. In summary, only those variants supported by at least 2 forward and 2 reverse reads in a minimum of 5 cells, likely heteroplasmic (variance to mean ratio in log scale higher than −2) and with a high strand concordance (correlation between forward and reverse counts across cells higher than 0.65) were considered for downstream analysis71. Some previously described false-positive calls were further discarded, as well
as those variants present in two or more unrelated samples, as these probably arise from technical artifacts. For each variant call, an alternative allele frequency (also called heteroplasmy in the mitochondrial context) was calculated for each cell. However, for those cells showing fewer than ten reads, the allele frequency was set to undetermined to favour robust calling of heteroplasmic fractions. The GoT–ChA genotyping information was combined with these results and used to visualize mitochondrial mutations present in a subgroup or the whole population of cells carrying the GoT–ChA-targeted mutation. In this situation, both the nuclear and mitochondrial mutations are referred to as being co-occurring.

Cell genotype classifier based on mitochondrial variants
Mitochondrial mutations co-occurring with the GoT–ChA mutation status were used to impute genotyping of cells whose mutation status could not be determined based on GoT–ChA information. This was achieved by implementing a supervised learning approach based on a random-forest classifier. For training and testing the classification model, the dataset was downsampled to contain the same number of mutant and WT cells to obtain a balanced dataset for training and testing, and 90% of the cells were assigned to the training set and 10% were assigned to the test set. The classification model was built by applying the randomForest function within the R package tidymodels73 (v.0.1.3) to the training set, using the heteroplasmy of the in-phase mutations as features and WT and mutant genotypes as classes. The number of decision trees was set to 1,000. The model was then applied to the test set to obtain genotype predictions. The classifier accuracy was measured as the percentage of correctly classified instances out of all predictions. As the model showed a high accuracy on the test set, it was subsequently applied to make genotype imputations for the undetermined GoT–ChA calls.

Inference of CNVs from scATAC–seq data
A CNV score was calculated for each cell adapted from a method previously described74. Here, instead of using bins of similar GC content as a baseline CNV score, we compared the normalized counts in 10 Mb bins (step size of 2 Mb) to normal diploid CD34+ cells. CNV scores were plotted as a heat map for visualization and hierarchical clustering was performed using the fastcluster (v.1.2.3) and parallelDist (v.0.2.6) R packages using the default parameters.

Cell label prediction based on scRNA-seq reference and bridge integration
To orthogonally validate our manually assigned cluster labels, we used a method for annotation of scATAC–seq cell clusters using a scRNA-seq reference dataset (available at Zenodo; https://zenodo.org/record/5521512#.YmnDDi-B1uV), by multiome (scATAC–seq plus scRNA-seq) bridge integration through Seurat63 (v.4.1.0). In brief,

Use Committees at MSKCC. The animal holding room is maintained at 72 ± 2 °F (21.5 ± 1 °C), relative humidity of between 30% and 70%, and all animals are maintained under a 12 h–12 h light–dark cycle with access to water and standard chow ad libitum. Veterinary staff provided regular monitoring and husbandry care. Jak2V617F knock-in was carried out using dre mRNA electroporation45. For gene expression analysis, secondary cohorts of lethally irradiated C57BL/6 mice transplanted with UbccreER-Jak2RL bone marrow 8 weeks after transplant and exhibiting MPN were treated with ruxolitinib (60 mg per kg orally twice per day), tamoxifen (100 mg per kg daily ×4 followed by 80 mg per kg daily of TAM chow ×3) or vehicle (MPN control) for 7 days and then euthanized. Bone marrow cells were isolated from limb bones into FACS buffer (phosphate-buffered saline (PBS) + 2% fetal bovine serum) by centrifugation (8,000 rpm for 1 min). After red blood cell lysis (BioLegend), single-cell suspensions were depleted of lineage-committed haematopoietic cells using the Lineage Cell Depletion Kit according to the manufacturer’s protocol (EasySep, StemCell Technologies). Lineage-depleted bone marrow was then stained with an antibody cocktail composed of a lineage cocktail (1:100 dilution, BV421, BioLegend), as well as KIT (BV785; 1:100), SCA1 (D7; 1:500), FcγRII/III (2.4G2; 1:100) and CD34 (RAM34; 1:100). All of the antibodies were purchased from BioLegend. After staining, the samples were washed in FACS buffer and resuspended in FACS buffer with DAPI as a live–dead stain for cell sorting. TdTomato+ (Jak2RL knock-in) or GFP+ (Jak2RL knockout) LSKs and MEPs were then sorted on a FACSAria III directly into Trizol LS (Invitrogen) and stored at −80 °C until processing. RNA was subsequently isolated using the Direct-Zol Microprep Kit (Zymo Research, R2061) according to the manufacturer’s protocol and quantified using the Agilent High Sensitivity RNA ScreenTape (Agilent, 5067-5579) on the Agilent 2200 TapeStation. cDNA was generated from 1 ng of input RNA using the SMART-Seq HT Kit (Takara, 634455) at half reaction volume followed by Nextera XT (Illumina, FC-131-1024) library preparation. cDNA and tagmented libraries were quantified using High Sensitivity D5000 ScreenTape (5067-5592) and High Sensitivity D1000 ScreenTape (5067-5584), respectively. Libraries were sequenced on the NovaSeq system at the Integrated Genomics Operation (IGO) at MSKCC.

DOGMA-seq
DOGMA-seq was performed using a previously described protocol52. In brief, sorted CD34+ cells were stained with TotalSeq-A universal antibody panel V1 (BioLegend, 399907; one vial resuspended in 25 µl, according to the manufacturer’s instructions and added to 10^6 cells in a final volume of 100 µl) along with CD34 and CD90 TotalSeq-A antibody (BioLegend, 343537 and 328135, respectively; 1 µl each antibody per 10^6 cells in a final volume of 100 µl). Cells were fixed with 0.1% formaldehyde (Thermo Fisher Scientific, 28906) in PBS/RI (PBS supplemented with 0.1% BSA and 0.2 U µl−1 of RNase inhibitor) for 5 min at room temperature and quenched with glycine solution to a final concentration of
genomic features present in the publicly available multiome data 0.125 M. The fixed cells were treated with 100 μl LLL lysis buffer sup-
(Gene Expression Omnibus: GSE194122) were used to recalculate the plemented with 1 mM DTT and 2 U µl−1 of RNase inhibitor for 3 min on
count matrices from the scATAC–seq query data. A common dimen- ice, followed by washing with 100 μl of chilled LLL wash buffer with 1 mM
sionality reduction space was generated allowing for the creation of DTT and 1 U µl−1 of RNase inhibitor. The cells were diluted in 1× nucleus
a bridge reference, followed by defining bridge anchors between the buffer supplemented with 1 mM DTT and 1 U µl−1 of RNase inhibitor (as
extended bridge reference and the scATAC–seq query. Finally, cells described by 10x Genomics), followed by counting using Trypan Blue
from the query were mapped to the reference cluster labels through and the Countess II FL Automated Cell Counter. The cells were subjected
the extended bridge integration. to further transposition (as suggested by 10x Genomics Multiome kit)
and loaded into the 10x Chip J. The Multiome protocol was followed
JAK2V617F reversible mouse model, RNA-seq and data analysis with the following minor modifications:
Female and male Jak2Rox/Lox (Jak2RL) dre-rox, cre-lox dual-recombinase (1) Spike in 1 μl of 0.2 μM ADT primer during pre-amp step (4.1a).
knock-in/knockout mice (aged 6–8 weeks)45 were crossed to UbccreER (2) Elution of the SPRI-cleaned products in 100 μl of EB instead of 160 μl
tamoxifen-inducible Cre lines and RLTG dual-recombinase reporter (step 4.3k), followed by using 25 μl for ATAC library generation,
lines75,76. All of the mice were housed at Memorial Sloan Kettering 35 μl for cDNA amplification and 35 μl for ADT amplification.
Cancer Center (MSKCC). All animal procedures were completed in (3) The protein tags were sample indexed similar to ASAP–seq cite
accordance with the Guidelines for the Care and Use of Laboratory libraries mentioned above for a total of 10–12 cycles followed by
Animals and were approved by the Institutional Animal Care and SPRI purification (1.6×).
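
The evaluation scheme described earlier in these Methods for genotype imputation (balanced downsampling of mutant and WT cells, a 90%/10% training/test split, and accuracy measured as the percentage of correctly classified instances) can be illustrated with a minimal Python sketch. The function names, the toy heteroplasmy values and the simple threshold rule standing in for the classifier are assumptions made for illustration only; the study itself reports using a random-forest classifier with 1,000 trees in R.

```python
import random


def balance_and_split(cells, train_frac=0.9, seed=0):
    """Downsample to equal WT/MUT counts, then split into train and test.

    cells: list of (features, label) tuples, label in {"WT", "MUT"}.
    """
    rng = random.Random(seed)
    wt = [c for c in cells if c[1] == "WT"]
    mut = [c for c in cells if c[1] == "MUT"]
    n = min(len(wt), len(mut))  # size of the smaller class
    balanced = rng.sample(wt, n) + rng.sample(mut, n)
    rng.shuffle(balanced)
    n_train = int(round(train_frac * len(balanced)))
    return balanced[:n_train], balanced[n_train:]


def accuracy_percent(predicted, truth):
    """Percentage of correctly classified instances out of all predictions."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return 100.0 * correct / len(truth)


# Hypothetical toy data: heteroplasmy of one in-phase variant per cell.
toy_rng = random.Random(1)
cells = [([toy_rng.uniform(0.6, 1.0)], "MUT") for _ in range(300)]
cells += [([toy_rng.uniform(0.0, 0.4)], "WT") for _ in range(500)]

train, test = balance_and_split(cells)


def placeholder_classifier(features):
    # Stand-in for a trained model (assumption): threshold on heteroplasmy.
    return "MUT" if features[0] > 0.5 else "WT"


predictions = [placeholder_classifier(features) for features, _ in test]
accuracy = accuracy_percent(predictions, [label for _, label in test])
```

In practice the placeholder rule would be replaced by a model fitted on `train` only, and the fitted model would then be applied to cells whose genotype could not be called.
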
Chromatin accessibility and RNA expression sequencing data were processed through the cell-ranger-arc (v.2.0.2) pipeline.

scDNA-seq using MissionBio Tapestri
Single-cell DNA + protein sequencing was performed using the Tapestri platform (MissionBio). A custom amplicon panel targeting 45 genes with 312 amplicons (which covers the JAK2V617F mutation in the panel) was used in this study. For the surface protein panel, TotalSeq-D Human Heme Oncology Cocktail, V1.0 (BioLegend, 399906; one vial per 150,000 cells according to the MissionBio protocol instructions) was also used. In brief, the CD34+ cells were sorted and stained for the surface antibodies, washed twice and resuspended in the cell buffer provided in the kit and were loaded on the Tapestri platform for cell encapsulation and subsequent barcoding after the droplet workflow (chemistry V1) according to the manufacturer’s instructions. Libraries were analysed with a High Sensitivity DNA Kit on a Bioanalyzer (Agilent Technologies) and sequenced on the NovaSeq 6000 instrument (Illumina) according to the manufacturer’s instructions.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Raw data and processed data files generated from cell lines are available at Gene Expression Omnibus (GEO) as part of the superseries GSE203251. Processed data files generated from patient samples are deposited at GEO as part of the superseries GSE203251. Patient raw sequencing data containing genomic sequences generated in this study have been deposited at the European Genome–Phenome Archives under accession number EGAS50000000164. The GRCh38 reference genome was used for alignment of single-cell ATAC–seq data (refdata-cellranger-atac-GRCh38-1.2.0) and for DOGMA-seq data (refdata-cellranger-arc-GRCh38-2020-A-2.0.0) and are freely available from the 10x Genomics website (https://support.10xgenomics.com).

Code availability
The code used for raw data processing and noise correction approaches for the genotyping data obtained through GoT–ChA, as well as functions for downstream differential gene accessibility and TF motif accessibility are available at GitHub (https://github.com/landau-lab/Gotcha).

59. Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403–411 (2021).
60. Mulè, M. P., Martins, A. J. & Tsang, J. S. Normalizing and denoising protein expression data from droplet-based single cell profiling. Nat. Commun. 13, 2099 (2022).
61. Stoeckius, M. et al. Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol. 19, 224 (2018).
62. Thibodeau, A. et al. AMULET: a novel read count-based method for effective multiplet detection from single nucleus ATAC-seq data. Genome Biol. 22, 252 (2021).
63. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2021).
64. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
65. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
66. Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).
67. Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871 (2018).
68. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
69. Melsted, P. et al. Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat. Biotechnol. 39, 813–818 (2021).
70. Gehring, J., Hwee Park, J., Chen, S., Thomson, M. & Pachter, L. Highly multiplexed single-cell RNA-seq by DNA oligonucleotide tagging of cellular proteins. Nat. Biotechnol. 38, 35–38 (2020).
71. Lareau, C. A. et al. Massively parallel single-cell mitochondrial DNA genotyping and chromatin profiling. Nat. Biotechnol. 39, 451–461 (2021).
72. Ludwig, L. S. et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 176, 1325–1339 (2019).
73. Svetnik, V. et al. Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 43, 1947–1958 (2003).
74. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).
75. Plummer, N. W. et al. Expanding the power of recombinase-based labeling to uncover cellular diversity. Development 142, 4385–4393 (2015).
76. Ruzankina, Y. et al. Deletion of the developmentally essential gene ATR in adult mice leads to age-related phenotypes and stem cell loss. Cell Stem Cell 1, 113–126 (2007).
77. Kozlov, A., Alves, J. M., Stamatakis, A. & Posada, D. CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data. Genome Biol. 23, 37 (2022).

Acknowledgements R.M.M. is supported by a Medical Scientist Training Program grant from the National Institute of General Medical Sciences of the National Institutes of Health under award number T32GM007739 to the Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program and by the Weill Cornell Medicine NYSTEM Training Program under award number C32558GG. F.I. is supported by the American Society of Hematology Fellow-to-Faculty Scholar Award number 204377-01. A.J.D. is a William Raveis Charitable Fund Physician-Scientist of the Damon Runyon Cancer Research Foundation (PST-24-19) and has received funding from the American Association of Cancer Research and the American Association of Clinical Oncology. E.O.E. was supported by a Medical Scientist Training Program grant from the National Institute of General Medical Sciences of the National Institutes of Health under award number T32GM007739 to the Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program. A subset of biospecimens and data for this work were provided through the Hematological Malignancies Tissue Bank, which is administered and functions under the auspices of the NCI-designated Tisch Cancer Institute at the Icahn School of Medicine at Mount Sinai. R.C. is supported by Lymphoma Research Foundation and Marie Skłodowska-Curie fellowships. R.L.L. is supported by a Leukemia & Lymphoma Society Specialized Center of Research grant and a National Cancer Institute award (P01 CA108671) and the National Institutes of Health/National Cancer Institute (P50 CA254838-01). B.M. is supported by the National Heart, Blood and Lung Institute (K08 1K08HL163489-01A1) and the National Cancer Institute (P01 2P01CA108671-15). D.A.L. is supported by the Burroughs Wellcome Fund Career Award for Medical Scientists, Valle Scholar Award, Leukemia Lymphoma Scholar Award and the Mark Foundation Emerging Leader Award. This work was also supported by the Tri-Institutional Stem Cell Initiative, the National Heart Lung and Blood Institute (R01HL157387-01A1), the National Cancer Institute (R33 CA267219), the National Human Genome Research Institute, Center of Excellence in Genomic Science (RM1HG011014) and the National Institutes of Health Common Fund Somatic Mosaicism Across Human Tissues (UG3NS132139). This work was enabled by the Weill Cornell Flow Cytometry Core. We thank A. Melnick for commenting on the manuscript and M. Hoare for providing the FOXO1S22W mutant cell line. This work was made possible by the MacMillan Family Foundation and the MacMillan Center for the Study of the Non-Coding Cancer Genome at the New York Genome Center.

Author contributions R.M.M., F.I. and D.A.L. conceived the project, devised the research strategy and analysed the data. R.M.M., F.I., E.P.M., R.C., P.S. and D.A.L. developed GoT–ChA. R.M.M., F.I., S.K., T.B. and D.A.L. developed the analytical pipelines for processing GoT–ChA data. M.S., S.E.G.-B., J.A., R.H., B.M., I.M.G., D.C.C. and O.A.-W. conducted a database search and retrieved patient samples for experimental use. R.M.M., S.G., L.M. and J.S. performed the experiments. F.I., R.M.M., S.K., T.P., R.R., E.O.E., T.B. and L.M. performed the computational analyses. A.J.D. and R.L.B. performed the in vivo mouse bulk RNA-seq experiments. J.M.S. and D.C.C. collected the samples and performed the CD90 flow cytometry measurements. R.M.M., F.I., C.P. and D.A.L. wrote the manuscript. R.M.M., F.I., S.K., T.P., A.J.D., R.L.B., E.P.M., M.S., R.R., S.G., L.M., R.H., R.C., O.A.-W., P.S., B.M., I.M.G., J.M.S., R.L.L. and D.A.L. helped to interpret results. R.M.M., F.I. and D.A.L. acquired funding for this work. All of the authors reviewed and approved the final manuscript.

Competing interests M.S. served on the advisory board for Novartis, Kymera, Sierra Oncology, GSK, Rigel, BMS and Taiho; consulted for Boston Consulting and Dedham group and participated in GME activity for Novartis, Curis Oncology, Haymarket Media and Clinical care options. R.H. has served as a consultant for Protagonist Therapeutics, received research funding from Kartos Therapeutics, Novartis and AbbVie, and is on the data safety monitoring board of Novartis and AbbVie. O.A.-W. has served as a consultant for H3B Biomedicine, Foundation Medicine, Merck, Pfizer, Codify Therapeutics and Janssen, and is on the scientific advisory board of Envisagenics, AIChemy and Codify Therapeutics. O.A.-W. has received previous research funding from H3B Biomedicine, LOXO Oncology, Nurix Therapeutics, Codify Therapeutics and Minovia unrelated to the current work. O.A.-W. is a scientific co-founder of Codify Therapeutics. P.S. and E.P.M. are current employees of 10x Genomics and Immunai, respectively. R.L.L. is on the supervisory board of Qiagen and is a scientific advisor to Imago, Mission Bio, Bakx, Zentalis, Ajax, Auron, Prelude, C4 Therapeutics and Isoplexis. R.L.L. has received research support from Abbvie, Constellation, Ajax, Zentalis and Prelude. R.L.L. has received research support from and consulted for Celgene and Roche and has consulted for Syndax, Incyte, Janssen, Astellas, Morphosys and Novartis. R.L.L. has received honoraria from Astra Zeneca and Novartis for invited lectures and from Gilead and Novartis for grant reviews. D.A.L. has served as a consultant for Abbvie and Illumina and is on the scientific advisory board of Mission Bio and C2i Genomics. D.A.L. has received previous research funding from BMS, 10x Genomics and Illumina unrelated to the current work. R.M.M., F.I., E.P.M., R.C., P.S. and D.A.L. have filed a patent for GoT–ChA (63/288,874). The other authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41586-024-07388-y.
Correspondence and requests for materials should be addressed to Franco Izzo or Dan A. Landau.
Peer review information Nature thanks Andrew Adey, Vijay Sankaran and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.

Extended Data Fig. 1 | GoT-ChA primers, genotyping and quality control metrics. a, Primer design schematic for GoT-ChA. b, Primer binding sites (blue) for TP53 R248 and JAK2V617 genotyping, with custom primer handles from a. c, Schematic showing GoT-ChA library construction, composed of a biotinylated hemi-nested PCR, a streptavidin-biotin pull-down, and an on-bead sample indexing PCR, resulting in genotyping libraries compatible with Illumina sequencing. d, Representative image of electrophoresis gel for GoT-ChA for two out of 21 total samples. Full length gel can be found in Supplementary Fig. 1. e, Representative bioanalyzer traces of GoT-ChA genotyping (top) and GoT-ChA scATAC (bottom) libraries for two samples. FU, fluorescent units. f, Sanger sequencing confirming known homozygosity of TP53 R248 WT HEL cells and TP53 R248Q mutant CA46 cells. g, Differential gene accessibility score heat map showing distinct HEL and CA46 cells in the TP53 R248 mixing study (FDR < 0.05 and log2FC > 1.25; Wilcoxon rank sum test followed by Benjamini-Hochberg correction). h, Chromatin accessibility coverage of marker genes (EBF1 and GATA1; FDR < 0.05, log2FC > 1.25; Wilcoxon rank sum test followed by Benjamini-Hochberg correction), agnostic to genotyping information, used for cell line identity assignments (Methods). i, Heatmap showing heteroplasmy for mutually exclusive mitochondrial variants detected in the scATAC-seq data for HEL or CA46 cells (Methods). j, scATAC-seq library fragment size distribution for the TP53 R248 mixing study, showing expected nucleosomal periodicity. k, Number of unique nuclear fragments per cell for each cell line in the TP53 R248 mixing study, indicating adequate complexity of the scATAC-seq libraries (HEL n = 2,540 cells; CA46 n = 2,117 cells). l, Transcription start site (TSS) enrichment scores per cell in the TP53 R248 mixing study, showing high signal-to-background ratio in the scATAC-seq data (HEL n = 2,540 cells; CA46 n = 2,117 cells). m, Histograms of WT (left) and MUT (right) number of reads per cell from the TP53 R248 mixing study. Kernel density estimation (KDE) lines for overall data (red), background (yellow), and signal (pink) are shown for each genotype. n, Scatter plots comparing GoT-ChA-assigned genotypes (top) to the true genotypes as determined by cell line identity (bottom). Dotted lines show the detected threshold for the distinction between background and signal before updated cluster assignments for both WT and MUT data. For all boxplots, error bars represent the range, boxes represent the interquartile range and lines represent the median.

Extended Data Fig. 2 | Genotyping accuracy, quality control metrics, and GoT-ChA data processing for JAK2V617F locus. a, Sanger sequencing confirmation of known genotypes for the JAK2V617 mixing study: CCRF-CEM WT cells, SET-2 heterozygous cells, and HEL homozygous mutant cells. SET-2 data confirm the known allelic ratio of 3:1 for mutated:WT alleles in this cell line. b, Heat map of differential gene accessibility score (FDR < 0.05, log2FC > 1.25; Wilcoxon rank sum test followed by Benjamini-Hochberg correction) distinguishing the CCRF-CEM, SET-2, and HEL cells used in the JAK2V617 mixing study. c, Chromatin accessibility coverage of marker genes (FDR < 0.05, log2FC > 1.25), agnostic to genotyping information used for cell line identity assignments. Wilcoxon rank sum test followed by Benjamini-Hochberg correction. d, Heatmap showing heteroplasmy of mutually exclusive mitochondrial variants detected in the scATAC-seq data for HEL, CCRF-CEM and SET-2 cells (Methods). e, Fragment size distribution for the JAK2V617 mixing study scATAC-seq library, showing expected nucleosomal periodicity. f, Scatter plots showing the number of unique nuclear fragments per cell vs. the transcriptional start site (TSS) enrichment. Dotted lines indicate the selected thresholds based on the distribution. g, Histograms of WT (left) and MUT (right) read distributions from the JAK2V617 mixing study. KDE lines for overall data (red), background (yellow), and signal (pink) are shown for each genotype. h, Scatter plots comparing GoT-ChA-assigned genotypes (left) to the true genotypes (right) as determined by cell line identity. Dotted lines indicate the initial thresholds identified between background noise and signal for either WT (vertical line) or MUT (horizontal line) data before final genotype assignment after clustering (Methods). i, JAK2V617 locus coverage (Methods). j, Same as Fig. 1e for JAK2V617-mutant HEL cells (with known chromosome 9 amplification) vs healthy control (Methods). k,l, Fraction of cells genotyped by GoT-ChA (k) or GoT-ChA genotyping accuracy (l) per targeted locus copy number. Grey area, 95% confidence interval. m, Sanger sequencing confirmation of known genotypes for the FOXO1 S22 (c.65 C > G) mixing study: SUM159 WT cells and HEPG2 homozygous mutant cells. n, UMAP coloured by GoT-ChA FOXO1 S22 genotype classifications of HEPG2 (n = 8,111 cells) and SUM159 (n = 2,841 cells) assigned as wild-type (WT, blue), mutant (MUT, red), or not assignable (NA, grey) cells.
Extended Data Fig. 3 | Multiplexed GoT-ChA protocol for simultaneous capture of multiple targeted loci. a, Sanger sequencing traces showing the expected genotypes of OCI-AML3, CA46, HEL, and SET-2 cell lines for NRAS Q61, TP53 M133, TP53 R248 and JAK2V617 utilized in the multiplexed-adapted GoT-ChA cell mixing experiment. Extended Data Fig. 2a has JAK2V617 sequencing traces for HEL and SET-2 cells. b, Accessibility-based UMAP for original GoT-ChA protocol for CA46 (grey), HEL (gold), OCI-AML3 (violet) and SET-2 (green) cells. c, Accessibility-based UMAP for multiplexing-adapted GoT-ChA protocol (Methods) for cell lines from b. d, Differential gene accessibility markers (FDR < 0.05, log2FC > 1.25; Wilcoxon rank sum test followed by Benjamini-Hochberg correction) used for cell line identification. e, UMAP coloured by GoT-ChA JAK2V617 genotypes of each cell as wild type (WT, blue), mutant (MUT, red), or not assignable (NA, grey) for original GoT-ChA (left) and multiplexed-adapted GoT-ChA (right). f-i, Same as panel e, but for NRAS Q61, TP53 M133, TP53 R248_1 and TP53 R248_2, respectively. j, Percentage of cells genotyped for targeted loci (JAK2V617, NRAS Q61, TP53 M133, TP53 R248_1 and TP53 R248_2) for either GoT-ChA original or GoT-ChA adapted protocols (Methods). k, Accuracy for targeted loci and protocols as in j (Methods). l, Distribution of percentage of cells for which a given number of targeted loci were captured, for either the GoT-ChA original or multiplex adapted GoT-ChA protocols (Methods). m, Fraction of cells genotyped according to targeted gene accessibility quantile across targeted loci. Accessibility was assessed as normalized scATAC fragments mapping to the gene body; cells with zero scATAC fragments mapped to the targeted gene were assigned to the first quantile.
Extended Data Fig. 4 | Quality control, data integration and doublet filtering of primary samples processed with GoT-ChA. a, scATAC-seq library fragment size distribution for primary samples, showing expected nucleosomal periodicity. b, Distribution of the number of ATAC fragments per cell for each processed primary sample. Cells with fragment counts below 1,000 or above 50,000 were filtered out. Cell numbers are in Supplementary Table 3. c, Distribution of nucleosome signal per cell for each of the processed primary samples. Cells with nucleosome signal above 4 were filtered out. Cell numbers are in Supplementary Table 3. d, Accessibility-based UMAP for primary samples. e, Accessibility-based UMAP split according to the technology used to generate the scATAC profiles (GoT-ChA [n = 72,318 cells], GoT-ChA-ASAP [n = 62,860 cells] or DOGMA-seq [n = 15,465 cells], Methods). f, UMAP coloured according to multiplet calling (Methods), either cells (n = 163,964; grey) or multiplet (n = 9,899; red) are shown. Multiplet detection rate corresponds to 5.7% of total barcodes. g, Percentage of detected multiplets according to initial Seurat clusters. Cell clusters with multiplet detection above 25% (red) were filtered out. h, Percentage of detected multiplets per primary sample before filtering. i, Count of ATAC fragments per single cell according to multiplet calling as cell (grey) or multiplet (red) for each primary sample. j, Count of detected ATAC features according to multiplet calling as cell (grey) or multiplet (red) for each primary sample. For all boxplots, error bars represent the range, boxes represent the interquartile range and lines represent the median.
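
The filtering thresholds stated in the Extended Data Fig. 4 caption (fragment counts between 1,000 and 50,000, nucleosome signal of at most 4, and removal of clusters in which more than 25% of cells are called multiplets) can be sketched in Python as follows. The record layout, field names, toy cells and the order in which the filters are applied are assumptions for illustration; the caption does not specify them.

```python
from collections import Counter

MIN_FRAGMENTS = 1_000          # cells below this were filtered out
MAX_FRAGMENTS = 50_000         # cells above this were filtered out
MAX_NUCLEOSOME_SIGNAL = 4      # cells above this were filtered out
MAX_CLUSTER_MULTIPLET_FRACTION = 0.25  # clusters above this were filtered out


def passes_cell_qc(cell):
    """Per-cell thresholds on fragment count and nucleosome signal."""
    return (MIN_FRAGMENTS <= cell["n_fragments"] <= MAX_FRAGMENTS
            and cell["nucleosome_signal"] <= MAX_NUCLEOSOME_SIGNAL)


def filter_cells(cells):
    """Apply per-cell QC, then drop clusters enriched for multiplets."""
    qc_pass = [c for c in cells if passes_cell_qc(c)]
    totals = Counter(c["cluster"] for c in qc_pass)
    flagged = Counter(c["cluster"] for c in qc_pass if c["is_multiplet"])
    dropped = {cl for cl in totals
               if flagged[cl] / totals[cl] > MAX_CLUSTER_MULTIPLET_FRACTION}
    return [c for c in qc_pass if c["cluster"] not in dropped]


def cell(barcode, n_fragments, nucleosome_signal, cluster, is_multiplet):
    # Hypothetical record layout used only in this sketch.
    return {"barcode": barcode, "n_fragments": n_fragments,
            "nucleosome_signal": nucleosome_signal,
            "cluster": cluster, "is_multiplet": is_multiplet}


toy_cells = [
    cell("c1", 5_000, 1.2, "A", False),
    cell("c2", 800, 1.0, "A", False),      # too few fragments
    cell("c3", 60_000, 0.9, "A", False),   # too many fragments
    cell("c4", 12_000, 4.5, "A", False),   # nucleosome signal too high
    cell("c5", 9_000, 2.0, "B", True),     # multiplet in cluster B
    cell("c6", 7_000, 1.5, "B", False),
    cell("c7", 20_000, 0.8, "A", False),
]
kept_barcodes = [c["barcode"] for c in filter_cells(toy_cells)]

# Multiplet rate from the counts given in panel f of the caption.
multiplet_rate_percent = round(100 * 9_899 / (163_964 + 9_899), 1)
```

With these toy records, cluster B is removed because half of its QC-passing cells are flagged, and the rate computed from the caption's counts agrees with the reported 5.7% of total barcodes.
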
Extended Data Fig. 5 | Marker features for cell cluster identity assignment in primary samples. a, Differential gene accessibility score (FDR < 0.05, log2FC > 1.25; Wilcoxon rank sum test followed by Benjamini-Hochberg correction) heatmap for each identified cell cluster. Mean gene accessibility and proportion of cells with detected accessibility are shown. b, Representative TF motif accessibility across cell clusters for primary samples (n = 21 samples). c, Genomic track examples of differentially accessible peaks (FDR < 0.05, log2FC > 1; Wilcoxon rank sum test followed by Benjamini-Hochberg correction) across cell clusters. d, Differential TF motif accessibility score (FDR < 0.05, log2FC > 0; Wilcoxon rank sum test followed by Benjamini-Hochberg correction) between HSC, HSCMY and HSCLY clusters. e, Accessibility-based UMAP coloured by the predicted cell type label obtained via bridge integration mapping (Methods). f, Confusion matrix between manually annotated cluster labels and predicted labels based on scRNA-seq reference via bridge integration mapping (Methods).

Extended Data Fig. 6 | Genotype assignment based on GoT-ChA read distribution for primary samples. a, Accessibility-based UMAP coloured by GoT-ChA genotype assignment (blue = WT; red = homozygous mutant; gold = heterozygous; grey = NA) for each primary sample (n = 21 samples). b, Correlation between JAK2V617F variant allele fraction (VAF) as measured by bulk DNA sequencing (Bulk DNA VAF) and pseudobulk JAK2V617F VAF as estimated from GoT-ChA genotype calls (Spearman’s ρ = 0.64; R² = 0.51; P = 1.2 × 10⁻³; Two-sided F-test). Grey area represents the 95% confidence interval. c, Genotype frequency for Pt-10 JAK2V617 locus as measured by GoT-ChA (n = 8,682 cells) or Mission Bio Tapestri (n = 2,223 cells). HET = JAK2V617F heterozygous; MUT = homozygous JAK2V617F mutant; WT = wild-type. d, Accessibility tracks of normalized ATAC signal across all genotyped cells in the dataset (n = 45,167 cells) for the JAK2 promoter region (± 2 kb from transcriptional start site) for WT (n = 14,878 cells), homozygous MUT (n = 22,842 cells) and HET (n = 7,647 cells). e, Accessibility tracks of normalized ATAC signal across HSCs (n = 7,627 cells), EP1 (n = 11,816 cells), GMP (n = 10,310 cells) or MkP (n = 7,154 cells) clusters for the JAK2 promoter region (± 2 kb from transcriptional start site). f, Percentage of genotyped cells according to JAK2 gene accessibility quantile. Each quantile comprises 100 randomly sampled cells. Quantiles were defined by normalized accessibility score (as reads mapping the gene for every 10,000 reads per cell, Methods), with ranges corresponding to: 0.01–10.84 (Quantile 1), 10.85–21.67 (Quantile 2), 21.68–32.51 (Quantile 3), 32.52–43.33 (Quantile 4) and 43.34–108.33 (Quantile 5). g, Percentage of genotyped cells and mean JAK2 gene accessibility per cell cluster (Spearman’s ρ = −0.003; R² = 0.015; P = 0.55; Two-sided F-test). Grey area represents the 95% confidence interval. h, Accessibility tracks across the JAK2 gene body (± 2 kb) for each cell cluster. The genomic coordinates corresponding to the JAK2V617F (c.1849G>T) mutation are highlighted in pink.
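
The normalized accessibility score and quantile ranges given in panel f of the Extended Data Fig. 6 caption can be sketched in Python as follows. The function names are hypothetical, and the handling of scores below the first reported range or above the last one is an assumption, since the caption only lists the observed ranges.

```python
from bisect import bisect_left

# Upper bounds of Quantiles 1-5 as listed in the caption.
QUANTILE_UPPER_BOUNDS = [10.84, 21.67, 32.51, 43.33, 108.33]


def normalized_accessibility(gene_reads, total_reads):
    """Reads mapping to the gene for every 10,000 reads in the cell."""
    return 10_000 * gene_reads / total_reads


def assign_quantile(score):
    """Return the 1-based quantile for a score, or None above the last bound.

    Scores below the first reported range fall into Quantile 1 (assumption).
    """
    if score > QUANTILE_UPPER_BOUNDS[-1]:
        return None
    return bisect_left(QUANTILE_UPPER_BOUNDS, score) + 1


# Hypothetical cell: 12 reads in the JAK2 gene out of 8,000 reads in total.
example_score = normalized_accessibility(12, 8_000)
example_quantile = assign_quantile(example_score)
```

The fraction of genotyped cells per quantile would then be the number of cells with a genotype call divided by the number of cells sampled in that quantile (100 per quantile in the caption).
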

Extended Data Fig. 7 | JAK2V617F-mutated cells are enriched in erythroid, megakaryocyte and granulocyte-monocyte progenitor cells in untreated patients or patients with no clinical response to ruxolitinib. a, JAK2V617 genotyping efficiency across studies applying single-cell droplet-based genotyping, plotted as mean ± s.d. of biologically independent samples (points). b, Heatmap showing the normalized mutant fraction across indicated HSCs, MEPs, MkPs and erythroid progenitor (EP[1–3]) clusters (>20 cells genotyped) for untreated (green) or ruxolitinib-treated (yellow) clonal hematopoiesis (CH), polycythemia vera (PV) and myelofibrosis (MF) patient samples with >20 cells genotyped per cluster. c, Normalized fraction of mutated cells in HSCs (n = 1,365 cells), MEP (n = 2,565 cells), erythroid progenitors (EP1, n = 2,315 cells; EP2-3, n = 3,610 cells) and MkP (n = 1,784 cells) in untreated MF patients (each dot represents a single patient; Two-sided Wilcoxon rank sum test; error bars show standard error). d, Normalized fraction of mutated cells in HSCs (n = 1,365 cells), HSCMY (n = 2,970 cells) and GMP (n = 2,209 cells) in untreated MF patients (each dot represents a single patient; Two-sided Wilcoxon rank sum test; error bars show standard error). e, Normalized fraction of mutated cells in HSC (n = 883 cells), MEP (n = 1,352 cells), erythroid progenitors (EP1, n = 1,639 cells; EP2-3, n = 2,482 cells) and MkP (n = 907 cells) in ruxolitinib-treated MF patients (each dot represents a single patient; Two-sided Wilcoxon rank sum test; error bars show standard error). f, Normalized fraction of mutated cells in HSC (n = 883 cells), HSCMY (n = 889 cells) and GMP (n = 2,562 cells) in ruxolitinib-treated MF patients (each dot represents a single patient; Two-sided Wilcoxon rank sum test; error bars show standard error). g, Accessibility-based UMAP coloured by GoT-ChA genotype assignment for the Pt-07 sample (no on-treatment response to ruxolitinib) as WT (n = 994 cells), homozygous MUT (n = 674 cells), HET (n = 193 cells) or not assignable (NA, n = 3,448). h, Top: odds ratio between the fraction of mutated cells in each of the indicated clusters and the fraction of mutated cells for the remaining clusters (Two-sided Fisher Exact test; dots indicate the estimated odds ratio, error bars show the 95% confidence interval; the dotted line indicates an odds ratio of 1, signifying no change). Bottom: total number of cells in each cluster for which genotyping data are available. i, Pseudotime estimation as calculated by Monocle 3 (Methods) for the Pt-07 sample UMAP from (g), setting the HSC cluster as the starting point of the trajectories.
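
The comparison in panel h of the Extended Data Fig. 7 caption (mutated versus WT cells in one cluster against the remaining clusters, tested with a two-sided Fisher exact test) can be illustrated with a small pure-Python sketch. The cell counts below are hypothetical, and the sample odds ratio (a·d)/(b·c) is used here as an assumption; statistical packages may instead report a conditional maximum-likelihood estimate, and the caption does not say which was used.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts: rows = (cluster of interest, remaining clusters),
# columns = (mutated cells, WT cells).
mut_in, wt_in, mut_out, wt_out = 30, 10, 40, 60
cluster_odds_ratio = odds_ratio(mut_in, wt_in, mut_out, wt_out)
cluster_p_value = fisher_exact_two_sided(mut_in, wt_in, mut_out, wt_out)
```

An odds ratio above 1 indicates enrichment of mutated cells in the cluster relative to the remaining clusters, matching the dotted reference line at 1 described in the caption.
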
Extended Data Fig. 8 | Per sample differences in TF motif accessibility and gene pathway enrichment. a, Normalized accessibility tracks for genes with increased accessibility in JAK2V617F-mutated HSC and HSCMY clusters (BMPR1B, MMP15) or in WT cells (HLF, BAG2). b, Heatmap for examples of differentially accessible TF motifs in early HSCs and HSCMY clusters in untreated patients. Hierarchical clustering was performed and heatmap was split by rows defining two expected groups. Colour scale indicates the mean z-score difference between JAK2V617F-mutated and WT cells. TF motifs are defined as upregulated (red) or downregulated (blue) in JAK2V617F. Samples with > 50 cells genotyped in the analysed clusters were included. c, Heatmap showing the TF motif accessibility for those TFs found to be statistically significant between WT (n = 1,902 cells) and JAK2V617F homozygous mutant (n = 1,885 cells) HSCs and HSCMY, including JAK2V617F heterozygous cells (n = 371 cells) for visualization. Colour scale represents row scaled mean z-scores of motif accessibility for the indicated TFs. d, STAT1 TF motif accessibility in a longitudinal sample (Pt-01) that progressed from PV (n = 76 WT cells; n = 30 JAK2V617F cells) to MF (n = 192 WT cells; n = 117 JAK2V617F cells). e, Heatmap of correlation values between STAT TFs (STAT1, STAT2, STAT3, STAT4, STAT5A, STAT5B and STAT6) and TFs involved in the NF-κB pathway (NFKB1, NFKB2, REL, RELA and RELB). Colour scale represents the Spearman’s ρ value. Side barplot represents the mean correlation across columns for the indicated row. f, Jak2 RL experiment schematic. Bulk RNA-seq was performed on sorted LSK cells from Jak2V617F and Jak2V617F-deleted mice (top). Pre-ranked gene set enrichment of differentially expressed genes within the erythroid (FDR = 2.5 × 10⁻⁴; normalized enrichment score (NES) = −1.87; heme metabolism Hallmark gene set) and TNF via NF-κB (FDR = 4.1 × 10⁻⁴; NES = −1.59) gene sets in Jak2V617F compared to Jak2V617F-deleted mouse LSK cells (bottom). NES, normalized enrichment score. g, Differential TF motif accessibility (FDR < 0.05, absolute Δz-score > 0.1; Two-sided Wilcoxon rank sum test followed by Benjamini-Hochberg correction) in Pt-19 CH sample within the early stem cell clusters (HSC and HSCMY). h, Heatmap comparing changes in TF motif accessibility between JAK2V617F-mutated and WT early HSC and HSCMY clusters in CH (P < 0.05, absolute Δz-score > 0.1; Two-sided Wilcoxon rank sum test) or MF (FDR < 0.05, absolute Δz-score > 0.1; LMM followed by likelihood ratio test and Benjamini-Hochberg correction). Colour scale represents the Δz-score. Concordant changes (black, same direction in both CH and MF) and significance (red, P < 0.05 in CH; FDR < 0.05 in MF) are shown. i, Heatmap for examples of differentially accessible TF motifs in the MkP cluster in untreated patients. Hierarchical clustering was performed and heatmap was split by rows defining two expected groups. Colour scale indicates the mean z-score difference between JAK2V617F-mutated and WT cells. TF motifs are defined as upregulated (red) or downregulated (blue) in JAK2V617F. Samples with at least 50 cells genotyped in the analysed clusters were included. j, TF footprinting for JUN comparing WT (blue) and mutant (red) in untreated (n = 12) MF patient samples. Shadowed regions represent the 95% confidence interval. k, Gene set enrichment analysis illustrating an enrichment of Hallmark inflammatory signature in JAK2V617F-mutated MkPs compared to WT MkPs (FDR = 0.15; normalized enrichment score [NES] = 1.42). l, Schematic of mouse model experiment. m, Gene set enrichment analysis illustrating a depletion of Jun targets in Jak2V617F-deleted compared to Jak2V617F mouse MEPs (FDR = 4.9 × 10⁻⁵; normalized enrichment score [NES] = −1.75). n, Heatmap for examples of differentially accessible TF motifs in the erythroid progenitor (EP[1–3]) clusters in untreated patients. Hierarchical clustering was performed and heatmap was split by rows defining two expected groups. Colour scale indicates the mean z-score difference between JAK2V617F-mutated and WT cells. TF motifs are defined as upregulated (red) or downregulated (blue) in JAK2V617F. Samples with at least 50 cells genotyped in the analysed clusters were included. o, Gene set enrichment analysis illustrating an enrichment of heme metabolism genes in JAK2V617F erythroid progenitor clusters (EP[1–3]) compared to WT (FDR = 0.05; normalized enrichment score [NES] = 1.52). p, Heatmap showing the TF motif accessibility for those TFs found to be statistically significant between WT (n = 1,312 cells) and JAK2V617F homozygous mutant (n = 3,745 cells) EP[1–3] cells, including JAK2V617F heterozygous cells (n = 446 cells) or JAK2V617F homozygous cells (n = 3,745 cells) for visualization. Colour scale represents row scaled mean z-scores of motif accessibility for the indicated TFs. q, TF footprinting for BCL11A comparing WT (blue) and mutant (red) in EPs (EP[1–3]; n = 5,925 cells) of untreated MF patient samples. Shadowed areas represent the 95% confidence interval.
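
The motif selection rule quoted in panels g and h of Extended Data Fig. 8 (FDR < 0.05 after Benjamini-Hochberg correction, absolute Δz-score > 0.1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, motif labels, P values and z-scores are all hypothetical, and the P values are assumed to come from a two-sided Wilcoxon rank sum test computed elsewhere.

```python
from statistics import mean


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def delta_z(mutant_z, wt_z):
    """Mean motif z-score in mutant cells minus mean z-score in WT cells."""
    return mean(mutant_z) - mean(wt_z)


def select_motifs(motifs, fdr_cutoff=0.05, dz_cutoff=0.1):
    """motifs maps a motif name to (p_value, mutant z-scores, WT z-scores)."""
    names = list(motifs)
    fdrs = benjamini_hochberg([motifs[name][0] for name in names])
    selected = {}
    for name, fdr in zip(names, fdrs):
        dz = delta_z(motifs[name][1], motifs[name][2])
        if fdr < fdr_cutoff and abs(dz) > dz_cutoff:
            selected[name] = (fdr, dz)
    return selected


# Hypothetical toy input; these values are not taken from the paper.
example = {
    "motif_A": (0.001, [0.5, 0.7, 0.6], [0.1, 0.2, 0.0]),
    "motif_B": (0.02, [0.05, 0.10], [0.02, 0.08]),
    "motif_C": (0.60, [1.0, 1.2], [0.1, 0.3]),
}
print(select_motifs(example))
```

In this toy input, motif_B passes the FDR cutoff but not the effect-size cutoff, and motif_C has a large Δz-score but does not pass the FDR cutoff, which is why both filters are applied together.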

Extended Data Fig. 9 | Quality control, mitochondrial-based genotype imputation and protein measurements with GoT-ChA-ASAP. a, scATAC-seq library fragment size distribution for primary samples processed through GoT-ChA-ASAP, showing expected nucleosomal periodicity. b, Distribution of scATAC fragment counts per cell for samples processed through GoT-ChA-ASAP. Cells with fragment counts below 1,000 or above 50,000 were filtered out (Methods). c, Distribution of nucleosome signal per cell for samples processed through GoT-ChA-ASAP. Cells with nucleosome signal above 4 were filtered out (Methods). d, Lineage tree of HSPCs from a patient (ET1) with essential thrombocythemia (ET)22 built from 21,430 clonal SNVs detected within the single-cell expanded clones across the whole genome using CellPhy77. Terminal nodes are coloured based on JAK2 genotype. Cell heteroplasmies for two mitochondrial mutations are shown in the heatmap on the right. e, Heatmap of heteroplasmy of mitochondrial variants per cell per patient sample (Methods). f, Correlation of TF motif accessibility mean Δz-score between JAK2V617F-mutated and WT early HSC and HSCMY clusters between cells genotyped via GoT-ChA-ASAP or via mitochondrial-based genotype imputation for Pt-02 (Pearson’s ρ = 0.94; R² = 0.88; P < 2.2 × 10⁻¹⁶; Two-sided F-test, shadowed area represents the 95% confidence interval). g, Pt-02 UMAP coloured by genotype from GoT-ChA (n = 7,763 cells), GoT-ChA-ASAP (n = 11,602 cells), GoT-ChA-ASAP with mtDNA-based genotype imputation (n = 11,602 cells) or DOGMA-seq with mtDNA-based genotype imputation (n = 15,465 cells), showing percent of genotyped cells. h, Pearson correlation values between mutant cell fractions for each cluster for Pt-02 between methods in g or shuffled control. i, UMAP from g, coloured by cell-surface protein expression from GoT-ChA-ASAP.
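
The per-cell quality-control thresholds stated in panels b and c of Extended Data Fig. 9 can be expressed as a simple filter. The sketch below is illustrative only: the `CellQC` record, the `passes_qc` helper and the example cells are hypothetical, and only the three cutoffs (1,000 and 50,000 fragments, nucleosome signal of 4) come from the legend.

```python
from dataclasses import dataclass

# Cutoffs quoted in the legend of Extended Data Fig. 9b,c.
MIN_FRAGMENTS = 1_000
MAX_FRAGMENTS = 50_000
MAX_NUCLEOSOME_SIGNAL = 4.0


@dataclass
class CellQC:
    """Hypothetical per-cell QC record."""
    barcode: str
    fragments: int
    nucleosome_signal: float


def passes_qc(cell: CellQC) -> bool:
    """Keep cells that are not below/above the fragment cutoffs
    and whose nucleosome signal is not above the cutoff."""
    return (
        MIN_FRAGMENTS <= cell.fragments <= MAX_FRAGMENTS
        and cell.nucleosome_signal <= MAX_NUCLEOSOME_SIGNAL
    )


# Hypothetical example cells; values are not taken from the paper.
cells = [
    CellQC("cell_1", 5_000, 1.2),
    CellQC("cell_2", 800, 0.9),      # too few fragments
    CellQC("cell_3", 60_000, 1.0),   # too many fragments
    CellQC("cell_4", 12_000, 4.5),   # nucleosome signal too high
    CellQC("cell_5", 1_000, 4.0),    # exactly at both cutoffs, kept
]
kept = [cell.barcode for cell in cells if passes_qc(cell)]
print(kept)
```

Cells lying exactly on a threshold are kept here because the legend describes removing cells "below", "above" or "above" the respective cutoffs; how the original analysis treated boundary values is an assumption.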

Extended Data Fig. 10 | Integrated mitochondrial-based genotype imputation with chromatin accessibility, gene expression and protein measurements using GoT-ChA-ASAP. a, Differential cell surface protein expression rank between JAK2V617F-mutated and WT HSCMY cells in ruxolitinib-treated patients (LMM followed by likelihood ratio test and Bonferroni correction). b, CD90 protein expression in the HSCMY cluster for patients processed with GoT-ChA-ASAP with > 50 genotyped cells in the cluster. Patient Pt-08 was removed due to the presence of additional mutations. Two-sided Wilcoxon rank sum test; Δ represents the effect size. c, CD90 (THY1 gene) imputed gene accessibility scores in HSC and HSCMY clusters for untreated MF samples (n = 12); excluding Pt-01 (PV) and Pt-02, or ruxolitinib-treated samples (n = 6). LMM modelling patient identity as random effects, followed by likelihood ratio test. d, Flow cytometry gating for measurements of CD90 mean fluorescence intensity (MFI) in HSCs defined as Lineage-, CD45+, CD34+, CD38-, CD45RA- cells. e, Correlation between CD90 mean fluorescence intensity (MFI) and JAK2V617F variant allele fraction (VAF) in HSCs (Two-sided F-test). f, Correlation between CD90 MFI and JAK2V617F variant allele fraction (VAF) in the hematopoietic progenitor cell (HPC) compartment defined as Lineage-, CD45+, CD34+, CD38+, CD45RA- cells (n = 71 patients; P > 0.05 [n.s.]; Two-sided F-test; grey area represents the 95% confidence interval). g, Comparison of correlation between JAK2V617F VAF and CD90 MFI within HPCs or HSCs (Methods). Dots represent Spearman’s ρ values, error bars represent the 95% confidence interval, the dotted line marks zero (no correlation). Two-sided F-test. h, CD90 protein expression as measured by MissionBio Tapestri (Methods) in Pt-11 (n = 195 cells WT; n = 62 cells JAK2V617F); Two-sided Wilcoxon rank sum test. The trend towards increased CD90 in JAK2V617F-mutated HSCs does not reach statistical significance due to low cell number (Two-sided Wilcoxon rank sum test). i, Accessibility track for THY1 in WT (blue) or JAK2V617F-mutated (red) HSC and HSCMY cells defined by mitochondrial-based genotype imputation in the Pt-02 sample processed through DOGMA-seq. Imputed THY1 expression at the RNA level is shown in the violin plot on the right panel (P < 2.2 × 10⁻¹⁶; Two-sided Wilcoxon rank sum test). Peak to gene expression linkage is shown (FDR < 0.05; colour scale shows the correlation value). j, Correlation between WT vs JAK2V617F changes in RNA expression or gene accessibility for the same gene in HSC + HSCMY clusters for Pt-02 DOGMA-seq data. Two-sided F-test. k, RNA expression levels for Pt-02 HSC and HSCMY clusters for BMPR1B (top) and FRY (bottom) in WT (n = 467 cells) and JAK2V617F-mutated cells (n = 163 cells). Two-sided Wilcoxon rank sum test. l, Correlation between WT vs JAK2V617F changes in RNA expression or gene accessibility for the same gene in MkP cluster for Pt-02 DOGMA-seq data. Two-sided F-test. m, CD36 protein expression in the MkP cluster for either untreated (n = 225 cells WT; n = 550 cells JAK2V617F) or ruxolitinib-treated (n = 36 cells WT; n = 108 cells JAK2V617F) MF patients (Two-sided Wilcoxon rank sum test). For all boxplots, error bars represent the range, boxes represent the interquartile range and lines represent the median.