Automated High-dimensional Cytometric Data Analysis

Philip L. De Jager, M.D. Ph.D.
Director, Program in Translational NeuroPsychiatric Genomics, Brigham & Women’s Hospital
Assistant Professor of Neurology, Harvard Medical School

Challenges in cytometric analysis
• Large amount of high-dimensional data
• Manual data processing (subjective, slow)
• Not suitable for high-throughput study
• Difficult to use in inferential analysis – “hypothesis limited”
• Sub-optimal usage of data dimensions
  - Increasingly multi-parametric
  - Restricted visualization
Solution: Automated & Multivariate Analysis

FLAME: FLow cytometry analysis with Automated Multivariate Estimation
• Clustering – parametric and multivariate mixture modeling of the populations in each flow sample
• Meta-clustering – match the corresponding populations from multiple samples to compare features of these matched populations
• Feature selection – identify features that distinguish populations between different classes (such as normal vs. disease, wt vs. mutant, longitudinal observations, etc.)
• Classification – predict class membership for new samples based on those distinctive features

FLAME summary
1. Clustering flow data (Sample 1, Sample 2, Sample 3)
2. Meta-clustering
3. Feature Selection (class1, class2, class3)
• Frequencies
• Locations
• Means
• Modes
• Variances
• Scales
• Orientations
• Shapes
Downstream Analyses
• Visualization
• Class Discovery
• Class Prediction
• Etc.

FLAME Methodology

Concept: Finite Mixture Model
Finite mixture model: weighted sum of g univariate or multivariate densities
[Figure: univariate Gaussian mixture and bivariate Gaussian mixture. Surviving labels: w1=0.5, w2=0.5; fitted curve; µ1, µ2, µ3; g=3; g=2; sum of 3 Gaussians]
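As a minimal sketch of the definition above, the following evaluates a univariate Gaussian mixture as a weighted sum of component densities. The equal weights follow the slide's w1 = w2 = 0.5; the component means and standard deviations are illustrative assumptions, not values from the slide, and the function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a univariate Gaussian N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sds):
    """Weighted sum of g Gaussian component densities at x."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

def integrate(f, lo, hi, n=20000):
    """Trapezoidal rule, used only to check that the mixture integrates to about 1."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# Two components (g = 2) with equal weights, as on the slide.
weights = [0.5, 0.5]
means = [-2.0, 2.0]  # assumed for illustration
sds = [1.0, 1.0]     # assumed for illustration

area = integrate(lambda x: mixture_pdf(x, weights, means, sds), -12.0, 12.0)
print(f"mixture density at 0: {mixture_pdf(0.0, weights, means, sds):.4f}")
print(f"area under the mixture density: {area:.6f}")
```

Because the weights sum to 1 and each component is a proper density, the mixture is itself a proper density, which the numerical integral confirms.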

Different distributions
[Figure: distribution panels. Surviving labels: N, Skew N, Skew]

Model Selection options in FLAME
[Figure: distribution panels. Surviving labels: N, Skew N, Skew]

Step 1: Fitting a distribution
• Lymphoblastic cell line

Fitting skew t deals with asymmetry
[Figure: density plot of an asymmetric data distribution, with Gaussian and skew fits]

Step 2: Meta-clustering
1. Input: Individual samples clustered by mixture model
2. Take all samples and pool their cluster locations
3. Algorithm: Run Partitioning Around Medoids (PAM) to obtain k meta-clusters
4. Output: Matched features used for classification of samples
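The pooling and PAM steps can be sketched as follows. This is a small textbook-style PAM (greedy BUILD, then SWAP until no swap lowers the cost) on hypothetical pooled cluster locations; it is not the FLAME implementation, and the Euclidean distance, the data, and all names are assumptions made for illustration.

```python
import math
from itertools import product

def euclid(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(euclid(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Partitioning Around Medoids: returns (medoid indices, label per point)."""
    n = len(points)
    medoids = []
    # BUILD: add, one at a time, the point that lowers the total cost the most.
    while len(medoids) < k:
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    # SWAP: replace a medoid by a non-medoid whenever that lowers the cost.
    improved = True
    while improved:
        improved = False
        current = total_cost(points, medoids)
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(points, trial)
            if cost < current - 1e-12:
                medoids, current, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: euclid(p, points[medoids[j]]))
              for p in points]
    return medoids, labels

# Hypothetical cluster locations pooled from three samples, two populations each.
pooled = [(1.0, 1.1), (5.0, 5.2),   # sample 1
          (0.9, 1.0), (5.1, 4.9),   # sample 2
          (1.2, 0.9), (4.8, 5.0)]   # sample 3
sample_of = [1, 1, 2, 2, 3, 3]

medoids, labels = pam(pooled, k=2)
for s, loc, lab in zip(sample_of, pooled, labels):
    print(f"sample {s}: cluster at {loc} -> meta-cluster {lab}")
```

Clusters from different samples that receive the same meta-cluster label are treated as the same population, which is what makes their features comparable across samples.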

Example 2: Identifying discriminating features
• Experiment: examine ZAP70 and SLP76 phosphorylation events before and after T cell receptor activation in naïve and memory T cells
• Lymphocytes stained with four markers:
  • CD4
  • CD45RA
  • ZAP70Y292
  • SLP76Y128
• 60 samples: 30 subjects x two time points: pre- and post-anti-CD3 antibody stimulation

Registering populations across samples
[Figure: pre-stimulation samples and post-stimulation samples]

[Figure: panels a to e plotted against CD45RA; pre-stimulation sample 121106A_0min and post-stimulation sample 121106A_5min]

Step 3: Discriminating features
[Figure: pre-stimulation (zero-minute) vs. post-stimulation (five-minute)]

Discriminating features

Feature name     Type          Cluster #   Dimension(s)   ∆ mean [five-min]   p-value
vars11.4         Variance      4           1              -0.156              1.65E-18
orientation 72   Orientation   5           3              -0.649              1.01E-14
orientation 56   Orientation   4           3              -0.609              1.13E-12
vars11.5         Variance      5           1              -0.082              4.00E-08
orientation 66   Orientation   5           1              -0.515              1.37E-05
shape 11         Shape         3           3              -0.175              2.62E-08
scale4           Scale         4           NA             -0.052              3.32E-06
orientation 19   Orientation   2           1              -0.632              1.34E-06
shape 8          Shape         2           2              -0.141              4.41E-09
shape 15         Shape         4           4              -0.178              5.17E-07
vars41.5         Variance      5           1,4            -0.024              2.63E-05
orientation 42   Orientation   3           3              -0.422              9.73E-04
shape 20         Shape         5           4              -0.060              7.93E-05
scale5           Scale         5           NA             -0.038              7.23E-04
vars43.3         Variance      3           3,4            -0.020              7.10E-04
vars31.4         Variance      4           1,3            -0.015              3.34E-03
vars11.3         Variance      3           1              0.314               6.22E-12
orientation 52   Orientation   4           1              0.552               1.87E-10
vars21.2         Variance      2           1,2            0.251               1.22E-10
vars21.3         Variance      3           1,2            0.259               1.14E-11
vars21.4         Variance      4           1,2            0.060               3.42E-08
orientation 20   Orientation   2           1              0.504               2.17E-11
shape 10         Shape         3           3              0.740               1.31E-09
shape 7          Shape         2           2              0.682               4.49E-16
shape 13         Shape         4           4              1.023               4.37E-09
mus1.4           Mean          4           1              1.761               6.13E-22
orientation 54   Orientation   4           2              0.534               1.26E-08
mus1.5           Mean          5           1              1.657               2.47E-21
vars22.2         Variance      2           2              0.282               5.45E-05
orientation 59   Orientation   4           3              0.548               1.51E-04
vars22.3         Variance      3           2              0.146               1.09E-05
orientation 47   Orientation   3           4              0.561               4.65E-05
orientation 43   Orientation   3           3              0.066               8.01E-05
orientation 70   Orientation   5           2              0.308               4.07E-03
scale3           Scale         3           NA             0.063               2.19E-04
mus1.2           Mean          2           1              1.571               1.52E-18
vars11.2         Variance      2           1              0.131               1.62E-04
vars22.5         Variance      5           2              0.023               2.65E-04

[Figure: populations labeled I to V, CD45RA axis]
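A ∆ mean and a p-value for one matched feature could be computed along the following lines. The slides do not say which statistical test produced the p-values above, so the paired sign-flip permutation test here is an assumption chosen because each subject contributes a pre- and a post-stimulation sample; the feature values and function name are hypothetical.

```python
import random

def paired_sign_flip_test(pre, post, n_perm=10000, seed=0):
    """Return (delta_mean, p_value) for paired feature values, pre vs. post.

    Assumption: a two-sided sign-flip permutation test stands in for the
    unnamed test behind the p-values on the slide.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Under the null, each paired difference is equally likely to be + or -.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical values of one feature (e.g. a cluster mean) for 8 subjects.
pre = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
post = [2.75, 3.5, 3.5, 4.25, 5.0, 5.0, 5.75, 6.5]
delta, p = paired_sign_flip_test(pre, post)
print(f"delta mean = {delta:.3f}, p = {p:.4f}")

# A feature with no systematic shift should not be flagged.
pre_null = [1.0, 2.0, 3.0, 4.0]
post_null = [1.5, 1.5, 3.25, 3.75]
delta_null, p_null = paired_sign_flip_test(pre_null, post_null)
print(f"delta mean = {delta_null:.3f}, p = {p_null:.4f}")
```

With dozens of features tested at once, as in the table, some correction for multiple testing would normally be applied before calling a feature discriminating.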

Example 3: Identifying a rare cell population
Regulatory T cells occur as a less than 0.5-1.0% population in human peripheral blood mononuclear cells
[Figure: Foxp3-PE staining]
Baecher-Allan et al., JI, 2006

Stepwise detection of Tregs
[Figure: Step 1, Step 2]

Overview
[Figure: Operator/QC, FLAME]

FLAME
• Automated analysis method
• Deconstructs the components of a mixture of cells
• Cross-registers cell clusters across samples
• Provides a specific record of analysis parameters, allowing exact replication of an analysis by a third party
• Operator Modes:
  – Cell population discovery mode
  – Clinical trial mode

Availability
• Free software
• Available through the GenePattern toolkit on the Broad Institute website
  – http://www.broadinstitute.org/cancer/software/genepattern/index.html
• GenePattern – an environment with pipelining capabilities and a repertoire of downstream analysis tools
• Pyne et al. Proc Natl Acad Sci USA 2009; 106: 8519-8524.

Acknowledgements
• De Jager lab
  – Cristin Aubin
  – Aaron Brandes
  – Becky Briskin
  – Lori Chibnik
  – Portia Chipendo
  – Xinli Hu
  – Linda Ottoboni
  – Nikolaos Patsopoulos
  – Joshua Shulman
  – Dong Tran
  – Irene Wood
  – Zongqi Xia
• Jill Mesirov
  – Saumyadipta Pyne
  – Pablo Tamayo
• Geoff McLachlan
• Kui Wang
• David Hafler
  – Clare Baecher-Allan
  – Lisa Maier
Funding Sources
• National MS Society
• NIH: NIA, NINDS

Illustrative Examples

Example 3: Feature selection – Phosphorylation of naïve & memory T cells pre- and post-stimulation
4-dimensional samples
Mixture modeling (ZAP70Y292 not shown)
[Figure axes: CD45RA, CD4]

Phosphorylation causes feature alterations in populations
[Figure: 0 min. (pre-stimulation) vs. 5 min. (post-stimulation)]

Matching pre- and post-stimulation populations
[Figure: pre-stimulation, post-stimulation]

Matching pre- and post-stimulation populations across all samples

Feature Selection Heatmap
[Figure: zero-minutes vs. five-minutes]

Data QC/standardization
• Carefully selected panels
• Minimal cross-sample variation

FLow analysis with Automated Multivariate Estimation
10/01/2008

Low dimension t-mixture
Outliers?
Low dimension clustering is not good enough

Multivariate t-mixture is better
Symmetric density is often not good enough

Modeling with skewed distributions
Better fit with skew
Skew-normal distribution
Photo courtesy: Azzalini J.M. et al. Statistical applications of the multivariate skew-normal distribution, 1999.
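To make the skew-normal distribution concrete, here is a sketch of its density in the univariate case, written in the standard form 2/ω · φ(z) · Φ(αz) with z = (x − ξ)/ω. The slide concerns the multivariate distribution; restricting to one dimension is a simplification, and the parameter names (location, scale, shape) are chosen here for illustration.

```python
import math

def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def skew_normal_pdf(x, location=0.0, scale=1.0, shape=0.0):
    """Univariate skew-normal density: (2 / scale) * phi(z) * Phi(shape * z)."""
    z = (x - location) / scale
    return 2.0 / scale * std_normal_pdf(z) * std_normal_cdf(shape * z)

# shape = 0 recovers the symmetric Gaussian; shape > 0 puts more mass on the right.
for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  gaussian={skew_normal_pdf(x):.4f}  "
          f"skewed={skew_normal_pdf(x, shape=4.0):.4f}")
```

The single extra shape parameter is what lets a component follow an asymmetric population that a symmetric density would fit poorly.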

Parametric mixture modeling
• A biological population is assumed to follow a mathematical distribution, such as Gaussian
• Each population can be abstracted as a cluster, described by parameters such as mean, mode, standard deviation, and skew
• A mixture of populations can be abstracted as a mixture of distributions
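The idea of abstracting each population as a parameterized cluster can be sketched with expectation-maximization (EM) for a univariate Gaussian mixture. This uses the simplest component family named above rather than the skew distributions discussed elsewhere in the deck; the simulated data, the quantile-based initialization, and the function names are all assumptions for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, g, n_iter=100):
    """Fit a g-component univariate Gaussian mixture by EM.

    Returns (weights, means, variances), one entry per component.
    """
    n = len(data)
    ordered = sorted(data)
    # Initialize means at evenly spaced quantiles, variances at the overall variance.
    means = [ordered[int((j + 0.5) * n / g)] for j in range(g)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * g
    weights = [1.0 / g] * g
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(g):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing component
    return weights, means, variances

# Simulated one-marker sample with two populations (hypothetical data).
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(6.0, 1.0) for _ in range(300)])

weights, means, variances = fit_gmm_1d(data, g=2)
for w, m, v in zip(weights, means, variances):
    print(f"weight={w:.2f}  mean={m:.2f}  sd={math.sqrt(v):.2f}")
```

Each fitted component is then a compact description of one population: its weight is the population frequency and its mean and variance are the location and spread features used downstream.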

Modeling with Gaussian
Gaussian may be too “skinny” to capture the entire population
