Single-cell analysis workshop
Sydney Precision Bioinformatics Group, The University of Sydney

Sydney Precision Bioinformatics Research Group
We share an interest in developing statistical and computational methodologies to tackle the most significant challenges posed by modern biology and medicine.
Meet our senior and junior research leaders: Kitty Lo, Rachel Wang, Samuel Muller, Pengyi Yang, Ellis Patrick, Garth Tarr, Jean Yang, John Ormerod; and senior research associates, PhD candidates, Honours and TSP students: 25
Find out more: http://www.maths.usyd.edu.au/bioinformatics/
Get interactive: http://shiny.maths.usyd.edu.au/

Roadmap for the workshop
- Setting up: 1:15 – 1:30 Google cloud set up
- Session 1: 1:30 – 2:00 Single cell analysis overview (scdney)
- Session 2: 2:00 – 2:45 Quality control and data integration
- Session 3: 2:45 – 3:45 Cell type identification via cluster analysis
- Session 4: 3:45 – 4:30 Downstream analysis: identify marker genes & cell type composition
- Extension: cell type identification via supervised classification and single cell trajectory analysis
Workshop presenters in each session: Jean Yang, Kevin Wang, Pengyi Yang, Yingxin Lin

Configuring Google Cloud
– Machine 1: 34.69.169.142
– Machine 2: 34.94.220.230
source("/home/user_setup.R")

Exponential growth in single-cell RNA-seq technologies
Svensson et al. (2018), Nature Protocols

Droplet-based technologies are now dominating
Macosko et al. (2015), Cell
10X Genomics is a commercial provider of a droplet-based scRNA-seq platform

scRNA-seq experiments approaching 1 million cells
Saunders et al. (2018), Cell: 690,000 individual cells from 9 regions of adult mouse brain

Number of scRNA-seq tools also increasing rapidly
Downloaded from www.scrna-tools.org

Single-cell RNA-seq analysis

Components of a typical scRNA-seq analysis process

Component 1: Data acquisition
Software
• CellRanger for 10X Genomics data
• Macosko’s custom scripts for DropSeq data
• STAR for alignment plus custom scripts (or there is STAR-solo)
Input
• BCL or fastq file from the sequencer
Considerations
• Single or mix of species? Does it include ERCC spike-ins? May need to build a custom reference
• Barcode and/or UMI sequencing errors – CellRanger takes care of this automatically
• Align to exon or exon and intron?
Output
• Gene/cell counts matrix

Component 2: Data preprocessing – Quality control
Software
• Seurat (all-purpose single cell R package)
• Scater
• DropletUtils (R package with a number of handy utility functions)
• Your own custom scripts
Considerations
• Filter out droplets with doublets – may be difficult to find. Can estimate the expected rate by doing a species mixture experiment
• Filter out droplets with no cells
• Filter out droplets with damaged cells – look for high mitochondrial gene content or high spike-in
Croset (2018), eLife
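The filtering considerations above can be sketched in a few lines. This is a minimal illustration in plain Python, not the workflow of Seurat, Scater, or DropletUtils. The toy counts, the `mt-` gene-name prefix, and both thresholds (`min_total`, `max_mito`) are assumptions chosen for the example; real thresholds depend on the dataset.

```python
# Toy UMI counts per droplet barcode (made-up numbers for illustration only).
counts = {
    "cell_A": {"Actb": 120, "Gapdh": 80, "mt-Co1": 10, "mt-Nd1": 5},
    "cell_B": {"Actb": 15, "Gapdh": 5, "mt-Co1": 50, "mt-Nd1": 30},
    "cell_C": {"Actb": 2, "mt-Co1": 1},
}


def mito_fraction(gene_counts, prefix="mt-"):
    """Fraction of a droplet's counts that come from mitochondrial genes.

    Assumes mitochondrial gene names start with `prefix` (case-insensitive).
    """
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for g, c in gene_counts.items() if g.lower().startswith(prefix))
    return mito / total


def keep_droplets(counts, min_total=50, max_mito=0.2):
    """Keep barcodes with enough counts and a low mitochondrial fraction.

    min_total removes near-empty droplets; max_mito removes likely damaged cells.
    Both thresholds are hypothetical defaults for this sketch.
    """
    kept = []
    for barcode, gene_counts in counts.items():
        total = sum(gene_counts.values())
        if total >= min_total and mito_fraction(gene_counts) <= max_mito:
            kept.append(barcode)
    return kept


kept = keep_droplets(counts)
```

In this toy input, `cell_B` is dominated by mitochondrial counts and `cell_C` has almost no counts, so only `cell_A` passes both filters. Doublet detection is not covered here, since it needs more than per-droplet summaries.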

Component 3: Data integration
Software
• Seurat (all-purpose single cell R package) for very basic normalization
• Batch effect correction
  • mnnCorrect
  • ZINB-WaVE
  • scMerge

scMerge motivation - Liver fetal development time course dataset
Time points: E9.5, E10.5, E11.5, E12.5, E13.5, E14.5, E15.5, E16.5, E17.5
GSE87795, Su et al.

Liver fetal development time course datasets
Time points: E9.5, E10.5, E11.5, E12.5, E13.5, E14.5, E15.5, E16.5, E17.5
Accession | Study | Cells
GSE87795 | Su et al. | N = 389
GSE90047 | Yang et al. | N = 448
GSE87038 | Dong et al. | N = 320
GSE96981 | Camp et al. | N = 79

tSNE of liver fetal development time course datasets
Panels: highlighted by batches; highlighted by cell types
Challenge: Strong “batch effect”

Breaking observed data into components
For n cells with data collected for m genes:
The data we observe = biologically relevant variation + unwanted variation + random noise
• Biologically relevant variation: cell types (p wanted variables)
• Unwanted variation: batch and technical effects (k unwanted variables)
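One common way to write a decomposition like this is as a linear model. The symbols below are assumed notation chosen for illustration and are not taken from the slide; only the dimensions n, m, p, and k come from the text above.

```latex
Y_{n \times m} = X_{n \times p}\,\beta_{p \times m} + W_{n \times k}\,\alpha_{k \times m} + \varepsilon_{n \times m}
```

Under this assumed notation, Y is the observed data for n cells and m genes, X\beta is the biologically relevant variation driven by p wanted variables, W\alpha is the unwanted variation driven by k unwanted variables, and \varepsilon is random noise.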

scMerge algorithm
• One component is estimated from stably expressed genes by factor analysis
• One component is estimated with replicates by factor analysis
RUVIII algorithm: Molania et al. (2019), Nucleic Acids Res

scMerge algorithm: pseudo-replicates
• Clustering for each batch (k-means by default)
• Find Mutual Nearest Clusters as pseudo-replicates
• Frame as pseudo-replicate information
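The idea of mutual nearest clusters can be sketched as follows. This is a simplified illustration, not the scMerge implementation: it assumes each batch has already been clustered, that each cluster is summarised by a centroid, and that Euclidean distance between centroids is used. The cluster labels and coordinates are made up.

```python
import math


def nearest(src, dst):
    """Map each cluster label in src to the label of its closest centroid in dst."""
    return {
        a: min(dst, key=lambda b: math.dist(centroid_a, dst[b]))
        for a, centroid_a in src.items()
    }


def mutual_nearest_clusters(centroids_1, centroids_2):
    """Return cluster pairs that are each other's nearest cluster across two batches."""
    one_to_two = nearest(centroids_1, centroids_2)
    two_to_one = nearest(centroids_2, centroids_1)
    return sorted(
        (a, b) for a, b in one_to_two.items() if two_to_one[b] == a
    )


# Hypothetical cluster centroids in a 2-D reduced space for two batches.
batch_1 = {"b1_c1": (0.0, 0.0), "b1_c2": (10.0, 10.0), "b1_c3": (5.0, -8.0)}
batch_2 = {"b2_c1": (1.0, 1.0), "b2_c2": (9.0, 11.0)}

pairs = mutual_nearest_clusters(batch_1, batch_2)
```

In this toy input, `b1_c3` points to `b2_c1` as its nearest cluster, but `b2_c1` is closer to `b1_c1`, so `b1_c3` is left unpaired. Only mutually nearest pairs would be treated as pseudo-replicates.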

Coming back to our motivational data – Liver fetal development time course datasets
[Figure: tSNE plots (tSNE1 vs tSNE2) before scMerge (logcounts) and after scMerge (scMerge_scSEG). Legend cell_types: cholangiocyte, Endothelial Cell, Epithelial Cell, Hematopoietic, hepatoblast/hepatocyte, Immune cell, Mesenchymal Cell, Stellate Cell. Legend batch: GSE87038, GSE87795, GSE90047, GSE96981.]

More information
scMerge R package and website: https://sydneybiox.github.io/scMerge/
PNAS: https://doi.org/10.1073/pnas.1820006116

We will try this soon …
2:00 – 2:45 Quality control and data integration

Component 4: Cell type identification
Science questions
• What cell types are present in the dataset?
• Can we identify the cell types?
Analysis techniques
• Visualization (dimension reduction)
• Clustering (unsupervised learning)
• Classification (supervised learning)

Dimension-reduced plot of our data (tSNE plot)
[Figure: t-SNE plot, axes tsne1 and tsne2]
How many cell types are there? What are the cell types?

k-means clustering
[Figure: t-SNE plot, axes tsne1 and tsne2]
How many cell types are there? What are the cell types?

Clustering algorithms for scRNA-seq
• k-means
• Hierarchical
• RaceID
• SC3
• CIDR
• countClust
• RCA
• SIMLR
Figure annotation: 25%+ (Luke Zappia, et al. PLoS Comp. Bio. 2018)

Similarity metric is the core of clustering algorithm
Key question: is there a similarity metric that performs (on average) better for clustering single cells based on their transcriptome?
Clustering algorithms: k-means, Hierarchical, RaceID, SC3, CIDR, countClust, RCA, SIMLR
Distance-based metrics: Euclidean, Manhattan, Maximum
Correlation-based metrics: Pearson, Spearman
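To make the distinction between the two metric families concrete, here is a small sketch comparing Euclidean distance with a Pearson-based dissimilarity (1 minus the correlation). The three expression profiles are invented, and the scenario (the same profile measured at ten times the depth) is an assumption used to illustrate how the two families can disagree; it is not an explanation given in the slides.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1 - cov / (sd_x * sd_y)


# Hypothetical profiles over four genes.
cell_a = [1, 2, 0, 4]
cell_b = [10, 20, 0, 40]  # same relative profile as cell_a, 10x the counts
cell_c = [4, 0, 2, 1]     # similar total counts to cell_a, different profile

euclid_ab, euclid_ac = euclidean(cell_a, cell_b), euclidean(cell_a, cell_c)
pearson_ab, pearson_ac = pearson_distance(cell_a, cell_b), pearson_distance(cell_a, cell_c)
```

With these numbers, Euclidean distance places `cell_a` closer to `cell_c`, while the Pearson-based dissimilarity places it closer to `cell_b`, because correlation ignores overall scale.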

k-means clustering on GSE60361
Panels: k-means; pre-defined cell types
Zeisel A, et al. Science 2015

Evaluation framework
Agreement to pre-defined classes:
• Normalized Mutual Information (NMI)
• Adjusted Rand Index (ARI)
• Fowlkes-Mallows Index (FM)
• Jaccard Index (Jaccard)
Taiyun Kim
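Three of the agreement measures listed above can be computed from counts of cell pairs. The sketch below shows the standard pair-counting forms of the Jaccard index, the Fowlkes-Mallows index, and the Adjusted Rand Index in plain Python. The function names and the toy labels are assumptions for illustration, and NMI is left out because it is based on entropies rather than pair counts.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Count cell pairs by whether they share a label in truth and/or pred.

    Returns (a, b, c, d): together in both, together in truth only,
    together in pred only, and separated in both.
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1
        elif same_truth:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(truth, pred):
    # Assumes at least one pair is grouped together in truth or pred.
    a, b, c, _ = pair_counts(truth, pred)
    return a / (a + b + c)


def fowlkes_mallows(truth, pred):
    # Assumes both labelings group at least one pair together.
    a, b, c, _ = pair_counts(truth, pred)
    return a / sqrt((a + b) * (a + c))


def adjusted_rand(truth, pred):
    # Assumes the labelings are not both trivial (denominator is non-zero).
    a, b, c, d = pair_counts(truth, pred)
    n_pairs = a + b + c + d
    expected = (a + b) * (a + c) / n_pairs   # chance-level agreement
    max_index = ((a + b) + (a + c)) / 2
    return (a - expected) / (max_index - expected)


# Hypothetical pre-defined cell types and cluster assignments for six cells.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
```

All three measures equal 1 when the clustering reproduces the pre-defined classes exactly, and the Adjusted Rand Index is additionally corrected for chance agreement.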

Evaluation results (against the pre-defined cell types)
Multiple datasets
PhD student: Taiyun Kim

Evaluation results (against the pre-defined cell types) using other measures
On average, correlation-based metrics improved on distance-based metrics by 31.5% (NMI), 39.6% (ARI), 16% (FM), and 23% (Jaccard)

Account for data scaling and zero-counts
Additional processing
• Linnorm normalisation
• SAVER imputation
Agreement to pre-defined classes:
• Normalized Mutual Information (NMI)
• Adjusted Rand Index (ARI)
• Fowlkes-Mallows Index (FM)
• Jaccard Index (Jaccard)

Account for normalisation and imputation

Improving the state-of-the-art clustering method using correlation metric
SIMLR
