Closing Wrap Up
Mathematical Frameworks for Integrative Analysis of Emerging Biological Data Types
June 15 - 19, 2020
Zoom from Banff International Research Station, Canada
Aedin Culhane (Dana-Farber Cancer Institute, Harvard TH Chan School of Public Health)
Elana Fertig (Johns Hopkins University)
Kim-Anh Lê Cao (University of Melbourne)
#BIRSBioIntegration
Goals of this workshop
Multi-omics integration of single cell data is an active and emerging field
○ May provide insight that cannot be obtained from single datasets
○ Lacks established performance benchmarks, gold standard datasets, assessment standards.
Bring together interdisciplinary computational scientists to examine cutting edge techniques for integrative analysis of diverse multi-omics.
○ Provide & assess open source resources for multi-platform analysis
○ Formulate goals and future directions to advance multi-omics analysis
○ Products: Guidelines, build collaboration, code & datasets, a white paper
Transparency, Collaboration, Open science, Fairness, Inclusion
#BIRSBiointegration Community
- 3 data challenges
- 16 contributed talks focusing on analysis
- 5 keynotes
- 9 brainstorming sessions
- Data and GitHub code shared
- 339 commits to manubot
- 156 members, 16 active channels on Slack
Outreach Beyond Banff
- Live Stream: http://www.birs.ca/live
- #BIRSBioIntegration: https://twitter.com/hashtag/BIRSBiointegration
Emerging Research: Five keynote speakers
- Mon: Prof. GC Yuan (Dana-Farber Cancer Institute, Harvard TH Chan School of Public Health)
- Tues: Prof. Bernd Bodenmiller (University of Zurich)
- Wed: Prof. Oliver Stegle (German Cancer Research Center & EMBL)
- Thurs: Prof. Susan Holmes (Stanford University)
- Fri: Prof. Vincent Carey (Harvard Medical School, Brigham & Women’s Hospital)
Contributed talks from hackathon participants
Challenges: sc seq-FISH, sc Targ Proteomics, scNMT-seq
Speakers: Alexis Coullomb, Yingxin Lin, Al J Abadi, Hang Xu, Chen Meng, Joshua Welch, Dario Righelli, Pratheepa Jeganathan, Arshi Arora, Amrit Singh, Kris Sankaran, Wouter Meuleman, Joshua Sodicoff, Lauren Hsu, Duncan Forster
Slides from the brainstorming sessions are available on Slack, see #information
3 Hackathon Challenges
Breast Cancer sc Proteomics
- Non-overlapping patients
- MIBI 40 TN, Mass Tag 7 TN … with 20 overlapping proteins
Gastrulation (scNMT)
- 826 cells matching across all data sets (transcriptome, DNA accessibility and DNA methylation) after quality control and filtering
Adult mouse visual cortex seqFISH, scRNAseq
- seqFISH: 1,597 single cells x 125 genes mapped (Zhu et al 2018)
- scRNA-seq: ~1,600 cells (Tasic et al 2016)
Hackathon Challenge Brainstorms
Spatial Fish
- Expt design, Normalisation, Platform Specific bias
- Inclusion of spatial information
- Non-linear driven
- Objective Assessment, Annotation
Targeted Proteomics
- Partial feature overlap
- Non-overlapping cells
- Integrating by phenotype
- Inherent spatial nature of biological data
- Scale/metrics from single cell to cell communities
- Atlases and maps for benchmarking
RNA - DNA
- Binary data
- Transfer learning or imputation using other atlases
- DNA features summary
- Annotation of histone db
Summary
- Summary of common challenges: Non-overlapping features and/or cells
- From data-driven towards mechanistic integration
- Generic towards context specific methods
- Incorporate prior knowledge
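The "partial feature overlap" challenge can be illustrated with a minimal sketch: restrict two datasets to the features they share before any joint analysis. The protein names and values below are hypothetical placeholders, not the hackathon panels, and the helper names `shared_features` and `restrict` are introduced only for this example.

```python
def shared_features(panel_a, panel_b):
    """Return features present in both panels, in the order of panel_a."""
    in_b = set(panel_b)
    return [feature for feature in panel_a if feature in in_b]


def restrict(matrix, features, keep):
    """Keep only the columns of `matrix` (cells x features) named in `keep`."""
    column_index = [features.index(name) for name in keep]
    return [[row[i] for i in column_index] for row in matrix]


# Hypothetical panels from two platforms measured on non-overlapping cells.
panel_a = ["CD45", "CD3", "HER2", "Ki67"]
panel_b = ["Ki67", "CD45", "ER", "PR"]

# Hypothetical intensities: one row per cell, one column per protein.
cells_a = [[1.0, 0.2, 3.1, 0.5], [0.9, 0.1, 2.8, 0.7]]
cells_b = [[0.6, 1.1, 0.0, 0.3]]

common = shared_features(panel_a, panel_b)
cells_a_common = restrict(cells_a, panel_a, common)
cells_b_common = restrict(cells_b, panel_b, common)
```

After this step both matrices have the same columns in the same order, so cells from the two platforms can be compared directly. The features that are dropped are exactly what transfer learning or imputation approaches would try to recover.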
9 Brainstorming sessions
- seqfish_theme: Guo-Cheng Yuan & Ruben Dries (Dana-Farber Cancer Institute, Harvard TH Chan School of Public Health & Boston University)
- sc_targ_proteomics_theme: Aedin Culhane & Olga Vitek (Dana-Farber Cancer Institute, Harvard TH Chan School of Public Health & Northeastern University)
- scNMT-seq_theme: Ricard Argelaguet & Oliver Stegle (German Cancer Research Center & EMBL)
- summary_analyses_theme: Kim-Anh Lê Cao & Casey Greene (University of Melbourne & University of Pennsylvania)
- benchmark_theme: Mike Love & Matt Ritchie (University of North Carolina-Chapel Hill & Walter and Eliza Hall Institute)
- interpretation_theme, software_theme, future_theme: Vincent Carey (Harvard Medical School and Brigham & Women’s Hospital), Elana Fertig (Johns Hopkins University), Susan Holmes (Stanford University)
Benchmarking, Interpretation, Software, Future
- Establish benchmarking: performance benchmarks and assessment standards
- Assessment metrics
- Datasets blueprint; benchmarks
- Data sharing
- Issue of representation of multi-view data; datasets
- Vocabulary for data science versus biologists
- Glossary for paper (appendix)
- Figures and visualization for communication
- Color blind (import for UMAP)
- Containers
- Scalability
- Deliver open source resources for multi-platform analysis (data wrangling)
- Awesome-multi-omics
- High cell/large tissue (HCA, Allen, HTAN)
- Spatial; immunology gated; modality; discrete; colocation; eQTL
- Inside annotation
- 4D: need perturbations/dynamic datasets
- Cell state - cell state; dropouts
- Molecular coverage; deeper sampling
- Connecting to consortiums
- Which data for which question
- Standard versus discovery
- Training on model
Community Coordination & Communication
- Representations
- Scale
- Metrics
- Unified language
- Annotation, ontology resources
- Leverages skills in other disciplines (spatial)
- Training across disciplines
- Benchmarking dataset: ground truth
- What would be most interesting?
DNA “accessible” for gene expression?
● DNA -> Regulation -> RNA -> Protein -> Regulation
● Heterochromatin v euchromatin (silent v active) DNA defines the genome accessible for transcription
● Genome organization variability in cell types and states (differentiation, development, stress, disease) is unknown
● If some regions are expected to be off (background) and others are expected to be “accessible”, can they serve as a within-experiment negative control?

Using the Genome in experimental design
- Which chromatin features are under selection (active) and which features are evolutionarily silent (historical)?
- How precisely can chromatin define normal cell types?

“Stable functional states and cell populations can be generated by two mechanisms: time- or population averaging of gene activity ( Fig. 4A ) or the formation of functionally equivalent but morphologically diverse cellular structures ( Fig. 4B ).” Finn & Misteli
Suggests timing

- Is there a timing delay between methylation and gene expression?
- How to capture dynamics with the right technology?
- How do we distinguish cause vs effect of interactions?
- Multi-omics integration is a fancy word, but are we learning anything new here (biology-wise)?
- Multi-omics done well might help us understand how the different levels of regulation are influencing each other.
- Methylation and gene expression don’t match up (or do they, if you consider timing delays?). We haven’t accurately captured the direction of the regulation.
- How should multi-omic experiments be designed to be useful? What we learn will be constrained by how the experiment is designed.
- Filtering using a different omic lens might make it easier to identify functionally important events that are not particularly differential themselves but that are implicated by other features.
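One way to probe the timing question is a lagged correlation: compare methylation at time t with expression at time t + lag and see which lag gives the strongest (here, most negative) association. The sketch below is only an illustration under strong assumptions: a regularly sampled time course, a toy series in which expression follows methylation two steps later, and hypothetical helper names (`pearson`, `lagged_correlation`). It says nothing about causality.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(methylation, expression, max_lag):
    """Correlate methylation[t] with expression[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        m = methylation[: len(methylation) - lag]
        e = expression[lag:]
        result[lag] = pearson(m, e)
    return result


# Toy time course (hypothetical values): expression is repressed by the
# methylation level observed two time points earlier.
methylation = [0.1, 0.2, 0.9, 0.8, 0.3, 0.1, 0.7, 0.9, 0.2, 0.1]
expression = [5.0, 5.0] + [10.0 * (1 - m) for m in methylation[:-2]]

corr = lagged_correlation(methylation, expression, max_lag=3)
best_lag = min(corr, key=corr.get)  # lag with the most negative correlation
```

With same-time measurements (lag 0) the two layers look only weakly related in this toy series, while the delayed comparison recovers the relationship. Snapshot single-cell data do not provide such a time axis directly, which is part of the experimental design question raised above.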
Use one omics as a ‘surrogate’ for omics data integration
○ How much can we even hack data from one technology to understand another (e.g., copy number estimates from single cell RNA) to capture regulation or distinct processes occurring at different scales?
○ Use omics as a surrogate for temporal measurements?
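As a rough illustration of the copy number example, one common idea is to order genes by genomic position, subtract a reference expression profile, and smooth along the chromosome so that regional gains or losses stand out from gene-level noise. The sketch below is a simplified toy version of that idea, not the method of any particular tool. The values, the window size, and the names `moving_average` and `copy_number_signal` are assumptions made for illustration.

```python
def moving_average(values, window):
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def copy_number_signal(cell_log_expr, reference_log_expr, window=5):
    """Smooth the per-gene difference to a reference along genomic order."""
    relative = [c - r for c, r in zip(cell_log_expr, reference_log_expr)]
    return moving_average(relative, window)


# Toy data (hypothetical): 30 genes in genomic order, log-scale expression.
# Genes 10..19 sit in a region with a simulated gain of +1.0, and every gene
# carries alternating +/-0.2 noise.
reference = [5.0] * 30
cell = [
    5.0 + (0.2 if i % 2 == 0 else -0.2) + (1.0 if 10 <= i < 20 else 0.0)
    for i in range(30)
]

signal = copy_number_signal(cell, reference, window=5)
```

Inside the simulated gain the smoothed signal sits near 1, and outside it stays near 0. Real expression changes driven by regulation rather than copy number would produce the same signal, which is exactly the ambiguity the slide questions.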
The accessible genome “open” for gene expression
Predicting # functional mRNA molecules
- Delineate heterochromatin and transcriptional silencing
- Histone marks, methylation of promoters/enhancers
- Transcription bursts (3 state model)
- Nascent mRNA, half life (cap/tail), miRNA
How do we distinguish cause vs effect of interactions?
*Activity dependent on functional network of gene
- Protein complexes
- Activation enzyme (precursor -> active form cleavage)
- Post-translational modification
- Co-localization
Requires Multi-omics
*activity can be measured with proteins or inferred by expression of downstream targets

Bulk RNAseq normalization approaches assumed 50% genes silent in sample
- >50% RNAseq in single cells are silent?
- Impact on DE gene expression analysis of scRNAseq (imputation, dropout..) if:
  Heterochromatin ∋ G: p(E) = 0
  Euchromatin ∋ G: p(E) > 0
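The distinction between p(E) = 0 and p(E) > 0 can be made concrete with a toy simulation: a gene in heterochromatin yields a zero in every cell, while an accessible gene yields zeros only through technical dropout. Everything below is an assumption made for illustration, including the 60% dropout rate, the uniform count range, and the names `simulate_counts` and `zero_fraction`. It is not a model of any real dataset.

```python
import random


def simulate_counts(n_cells, accessible, dropout_rate, rng):
    """Simulate counts for one gene across cells.

    accessible=False models a heterochromatin gene, p(E) = 0: always zero.
    accessible=True models a euchromatin gene, p(E) > 0: expressed, but each
    measurement is lost with probability `dropout_rate`.
    """
    counts = []
    for _ in range(n_cells):
        if not accessible:
            counts.append(0)  # structural zero
        elif rng.random() < dropout_rate:
            counts.append(0)  # technical zero (dropout)
        else:
            counts.append(rng.randint(1, 20))  # toy expression level
    return counts


def zero_fraction(counts):
    """Fraction of cells in which the gene was not detected."""
    return sum(c == 0 for c in counts) / len(counts)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
silent_gene = simulate_counts(1000, accessible=False, dropout_rate=0.6, rng=rng)
active_gene = simulate_counts(1000, accessible=True, dropout_rate=0.6, rng=rng)

silent_zero_fraction = zero_fraction(silent_gene)
active_zero_fraction = zero_fraction(active_gene)
```

In any single cell a zero from the active gene is indistinguishable from a zero from the silent gene. Only the population-level pattern, or a second modality such as chromatin accessibility, separates the two, which matters for imputation: filling in structural zeros would invent expression that cannot occur.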
Bulk - single cell
BULK: Qualitative assessments of cell identity
sc: Quantitative, high-resolution cell atlases
Cell State - dependent on local autocrine, paracrine, community signalling. More dynamic/variant.
Cell Type - relatively stable except for chromatin reorganization (stress/CNV/dev)
Cell lineage -> Cell Type ≠ Cell State
=> Would predict bulk RNAseq captures cell type >> cell state
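The prediction that bulk RNAseq captures cell type more than cell state follows from averaging: a marker shared by all cells of a type survives the average, while a state marker expressed in a minority of cells is diluted. A minimal sketch, assuming a toy population of 100 cells of one type with 10 of them in an activated state. The gene names, values, and the helper `pseudobulk` are hypothetical.

```python
def pseudobulk(cells):
    """Average each gene over all cells, mimicking a bulk measurement."""
    n_cells = len(cells)
    genes = cells[0].keys()
    return {gene: sum(cell[gene] for cell in cells) / n_cells for gene in genes}


# Toy population: every cell expresses the type marker, and only the first
# 10 cells are in the activated state and express the state marker.
cells = [
    {"type_marker": 10.0, "state_marker": 8.0 if i < 10 else 0.0}
    for i in range(100)
]

bulk = pseudobulk(cells)
```

The type marker keeps its full level in the average, while the state marker drops to a tenth of its single-cell level and could easily be mistaken for low uniform expression.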
Single Cells -> Communities -> Phenotype
Human Phenotype: defined by Systems, Organs that are composed of Cell Communities
Cell Communities: composed of organized cell types, polarity; connected by signalling (paracrine, endocrine, gap junctions, autocrine)
Cells: ‘Omics (DNA, Chromatin, RNA, Protein, Glycosylation, metabolites etc)