Integrated multiple samples #44

Open
Aaron-sqw opened this issue Oct 5, 2023 · 7 comments


@Aaron-sqw

Hi,
Thanks for the excellent method for CITE-seq!
The vignette at "https://cran.r-project.org/web/packages/dsb/vignettes/end_to_end_workflow.html" only deals with one sample.
If I have CITE-seq data from multiple samples (more than 3) and want to integrate them with anchors to correct for batch effects, how should I handle this? Can I use the dsb-normalized values directly?

immune.anchors_adt <- FindIntegrationAnchors(object.list = immune.combined_adt.list, anchor.features = features)
immune.combined_adt <- IntegrateData(anchorset = immune.anchors_adt, new.assay.name = "integrated.adt")

Thanks.

@MattPM
Collaborator

MattPM commented Oct 9, 2023

Hi @Aaron-sqw. If by samples you mean you have 3 batches, first verify that you have a large batch effect before doing any computational integration or correction. You can do this by calculating the variance explained by batch. If you observe a large batch effect, then yes, you could do that. However, it would not be appropriate if you have only a few proteins measured. Methods like Seurat integration and Harmony compress the ADT data to find shared latent components in high-dimensional space, so they are more geared toward mRNA, where you have thousands of features (genes).

For protein, since there are fewer features, we have used a simple linear model batch correction (with limma) after dsb on a few projects with 80-100 proteins, which also works well. If you used one of the larger protein panels currently available, with hundreds of proteins, you could try a less parsimonious correction like the one you're describing, as long as the comparison groups of your experiment are not confounded with batch.
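
A minimal sketch of the "variance explained by batch" check mentioned above, in base R. The toy matrix, the protein names, and the helper `variance_explained_by_batch` are all hypothetical; the thread does not say how the check was implemented, so this uses the R² of a one-way linear model per protein as one reasonable reading.

```r
# Toy dsb-normalized matrix: proteins in rows, cells in columns (made-up data).
set.seed(1)
n_cells <- 300
batch <- factor(rep(c("b1", "b2", "b3"), each = n_cells / 3))
batch_shift <- unname(c(b1 = 0, b2 = 2, b3 = 4)[as.character(batch)])

adt <- rbind(
  CD3 = rnorm(n_cells, mean = 5),               # no batch shift
  CD4 = rnorm(n_cells, mean = 5) + batch_shift  # strong batch shift
)
colnames(adt) <- paste0("cell", seq_len(n_cells))

# Hypothetical helper: fraction of each protein's variance explained by batch,
# taken as the R^2 of the one-way model `protein ~ batch`.
variance_explained_by_batch <- function(mat, batch) {
  apply(mat, 1, function(y) summary(lm(y ~ batch))$r.squared)
}

r2 <- variance_explained_by_batch(adt, batch)
print(round(r2, 2))
```

In this toy data CD4 should show a high value and CD3 a value near zero. If real data show a large batch effect, `limma::removeBatchEffect` is one linear-model correction available in limma; the thread does not name the limma function that was used, so treat that choice as an assumption.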

@MattPM MattPM closed this as completed Oct 9, 2023
@corbettberry

Hi @MattPM! I'm hoping you can assist with a similar question. I have 8 distinct biological samples and I want to use dsb to normalize so I can better compare between the samples. They are not separate batches but completely distinct samples.

It looks like the raw feature matrices are no longer generated when aggregating multiple samples using Cell Ranger ("Starting with Cell Ranger 6.1, the unfiltered or raw feature-barcode matrix is no longer output by cellranger aggr."). So I have 8 separate raw_feature_bc_matrix and filtered_feature_bc_matrix outputs. Would you recommend finding some way to concatenate these files and then running DSBNormalizeProtein, or running it separately on each sample before combining? What would be the best way to aggregate these raw files? Thank you so much for any thoughts/recs.

@MattPM MattPM reopened this Sep 20, 2024
@ddmk7

ddmk7 commented Sep 30, 2024

Hi @MattPM,

Thank you for developing this amazing noise reduction technique. I have 4 tumor samples and 4 healthy donor samples, and for each sample, I added a DSB-normalized matrix (containing over 90 antibodies plus 3 isotype controls) before integrating them based on RNA. The Seurat FindMarkers function recommends using either the raw count or normalized data slot (DSB normalized in my case), but doing so generates large log2 fold changes (>100) between tumor and healthy cells in different cell types. I'd appreciate your thoughts on how to handle this. Thank you!

@MattPM
Collaborator

MattPM commented Oct 5, 2024

@corbettberry
Can you clarify your experiment design? Do you have 8 separate antibody staining reactions, or are you multiplexing 8 samples together, staining with antibodies in a single tube, then partitioning that across 8 lanes?

@MattPM
Collaborator

MattPM commented Oct 5, 2024

@ddmk7 This is more of a downstream analysis question, a bit outside the scope of this denoising package.

I will say that since you have several distinct individuals, the tests in "FindMarkers" are not ideal. The cells (more technically, the errors in the statistical test) are not independent if you're comparing n=4 tumor vs n=4 normal and treating all the cells as "replicates", ignoring the fact that they come from several individuals. That is an example of pseudoreplication. The resulting p values are too small (significance is inflated) and technically invalid. Unfortunately it happens a lot in published literature. This paper explains some of the issues: https://www.nature.com/articles/s41467-021-21038-1.

FindMarkers is fine for defining cluster-specific markers for the purpose of cell type annotation, but not for testing experimental effects in your case.

I would recommend using a mixed effects model and a pseudobulk approach. See here for more information: https://www.nature.com/articles/s41467-020-19894-4.
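
A minimal base R sketch of the pseudobulk idea, assuming one dsb value per cell for a single protein. The data frame, its column names, and the choice to average dsb values per sample are illustrative assumptions, not the method from the linked papers. A plain two-sample t-test on the per-sample means stands in for the recommended mixed effects model, which would need an additional package.

```r
# Hypothetical per-cell table for one protein: 8 samples, 2 groups, 2 cell types.
set.seed(2)
samples <- paste0("s", 1:8)
group_of <- setNames(rep(c("tumor", "healthy"), each = 4), samples)

cells <- data.frame(
  sample = rep(samples, each = 50),
  celltype = rep(c("T", "B"), times = 200),
  stringsAsFactors = FALSE
)
cells$group <- unname(group_of[cells$sample])
cells$dsb_value <- rnorm(nrow(cells), mean = ifelse(cells$group == "tumor", 3, 2))

# Collapse cells to one value per sample within each cell type, so that
# the sample (individual), not the cell, becomes the unit of replication.
pb <- aggregate(dsb_value ~ sample + group + celltype, data = cells, FUN = mean)

# Test the group effect within one cell type with n = 4 vs n = 4 samples.
pb_T <- pb[pb$celltype == "T", ]
fit <- t.test(dsb_value ~ group, data = pb_T)
print(fit$p.value)
```

The point of the design is that the test sees 8 observations per cell type rather than hundreds of correlated cells, which is what avoids pseudoreplication.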

You can also read the methods of the differential expression approach used in our recent paper: https://mattpm.net/man/pdf/natural_adjuvant_immunity_2024.pdf. The code associated with the paper has a lot of examples: https://niaid.github.io/fsc/

That framework leans heavily on Gabe Hoffman's method for applying mixed models to RNA-seq data. His more recent advance of empirical Bayes methods for mixed models with single cell data would be good to read for a deeper dive: https://academic.oup.com/bioinformatics/article/37/2/192/5878955
https://pubmed.ncbi.nlm.nih.gov/36993704/

-matt

@corbettberry

@MattPM we had 8 separate antibody staining reactions that were then run across 8 separate lanes. Thanks for your help!

@MattPM
Collaborator

MattPM commented Oct 5, 2024

@corbettberry (edited 6 Oct 2024 for clarity)
Please see the answer and code provided in #12.

dsb is based on the underlying physics of each antibody staining reaction. In theory, each unique staining reaction for your 8 samples is a separate data generation process, so the data should be normalized separately, and then the normalized matrices can be combined.

However, in practice, if you stained the cells the same way for the same amount of time carefully, and the samples are relatively similar in their composition, the background distributions will be very similar between samples. In that case you can run one normalization after first combining all of the matrices into one large cell matrix and one large background matrix.

There is no way of knowing which will be better for your data without some testing, and I have seen datasets where either scheme worked better. In issue 12 linked above, I provided code to run both ways. You can then see which method works better for you. If combining the matrices, append a string to all the barcodes of each "lane", like "_1" for the first lane, "_2" for the second and so on, to avoid barcode collisions (some barcodes of different lanes will be the same).
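
A minimal base R sketch of the barcode-suffix step described above. This is not the code from issue #12; the toy matrices, the barcodes, and the helper `combine_lanes` are hypothetical, and real data would normally be sparse matrices read from each lane's raw and filtered output.

```r
# Hypothetical raw ADT count matrices: proteins in rows, barcodes in columns.
make_lane <- function(barcodes) {
  matrix(
    rpois(3 * length(barcodes), lambda = 20),
    nrow = 3,
    dimnames = list(c("CD3", "CD4", "IgG1_isotype"), barcodes)
  )
}

set.seed(3)
lanes <- list(
  make_lane(c("AAAC-1", "AAAG-1", "AAAT-1")),
  make_lane(c("AAAC-1", "AACC-1"))  # "AAAC-1" also occurs in lane 1
)

# Hypothetical helper: append "_1", "_2", ... to each lane's barcodes,
# then column-bind the lanes into one matrix.
combine_lanes <- function(mats) {
  # All lanes must have the same proteins in the same order.
  stopifnot(length(unique(lapply(mats, rownames))) == 1)
  suffixed <- lapply(seq_along(mats), function(i) {
    m <- mats[[i]]
    colnames(m) <- paste0(colnames(m), "_", i)
    m
  })
  do.call(cbind, suffixed)
}

combined <- combine_lanes(lanes)
print(colnames(combined))
```

For the combined scheme, the same helper would be applied once to the per-lane cell matrices and once to the per-lane background matrices, and the two results passed to `DSBNormalizeProtein` as `cell_protein_matrix` and `empty_drop_matrix`. For the separate scheme, each lane would be normalized first and the normalized matrices combined afterwards in the same way.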
