Automated High-dimensional Cytometric Data Analysis

Automated High-dimensional Cytometric Data Analysis - PowerPoint PPT Presentation


  1. Automated High-dimensional Cytometric Data Analysis. Philip L. De Jager, M.D. Ph.D., Director, Program in Translational NeuroPsychiatric Genomics, Brigham & Women’s Hospital; Assistant Professor of Neurology, Harvard Medical School

  2. Challenges in cytometric analysis
     • Large amount of high-dimensional data
     • Manual data processing (subjective, slow)
     • Not suitable for high-throughput study
     • Difficult to use in inferential analysis: “hypothesis limited”
     • Sub-optimal usage of data dimensions
       - Increasingly multi-parametric
       - Restricted visualization
     Solution: Automated & Multivariate Analysis

  3. FLAME: FLow cytometry analysis with Automated Multivariate Estimation
     • Clustering: parametric and multivariate mixture modeling of the populations in each flow sample
     • Meta-clustering: match the corresponding populations from multiple samples to compare features of these matched populations
     • Feature selection: identify features that distinguish populations between different classes (such as normal vs. disease, wt vs. mutant, longitudinal observations, etc.)
     • Classification: predict class membership for new samples based on those distinctive features

  4. FLAME summary
     1. Clustering flow data (Sample 1, Sample 2, Sample 3)
     2. Meta-clustering
     3. Feature selection (class1, class2, class3)
        • Frequencies • Locations • Means • Modes • Variances • Scales • Orientations • Shapes
     Downstream analyses: • Visualization • Class Discovery • Class Prediction • Etc.

  5. FLAME Methodology

  6. Concept: Finite Mixture Model
     Finite mixture model: weighted sum of g univariate or multivariate densities
     [Figure labels: univariate Gaussian mixture; bivariate Gaussian mixture; w1=0.5, w2=0.5; fitted curve; µ1, µ2, µ3; g=3; g=2; sum of 3 Gaussians]
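
The "weighted sum of g densities" idea on this slide can be sketched in a few lines. This is only an illustration for the univariate Gaussian case: the function names, component means and standard deviations are made up for the example, and only the equal weights (w1 = w2 = 0.5) echo the slide.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and std. dev. sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Finite mixture density: weighted sum of g Gaussian component densities."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

def integrate(f, lo, hi, n=20000):
    """Trapezoidal rule, used only to check that the mixture is a proper density."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))

# g = 2 components with equal weights; means and std. devs. are illustrative.
weights = [0.5, 0.5]
mus = [-2.0, 2.0]
sigmas = [1.0, 1.0]

total_mass = integrate(lambda x: mixture_pdf(x, weights, mus, sigmas), -12.0, 12.0)
```

Because the weights sum to one and each component integrates to one, the mixture itself integrates to one, which is what `total_mass` checks numerically.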

  7. Different distributions [Figure panel labels: N, Skew N]

  8. Model Selection options in FLAME [Figure panel labels: N, Skew N]

  9. Step 1: Fitting a distribution • Lymphoblastic cell line

  10. Fitting skew t deals with asymmetry [Figure: density plot of an asymmetric data distribution; fits labeled Gaussian and Skew]

  11. Step 2: Meta-clustering
     1. Input: Individual samples clustered by mixture model
     2. Take all samples and pool their cluster locations
     3. Algorithm: Run Partitioning Around Medoids (PAM) to obtain k meta-clusters
     4. Output: Matched features used for classification of samples
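
The pooling and PAM steps on this slide can be sketched as follows. This is a simplified, first-improvement variant of Partitioning Around Medoids written for illustration, not FLAME's own implementation; the function names, the Euclidean distance and the three samples' cluster locations are all assumptions.

```python
import math

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Simplified PAM: greedy BUILD, then SWAP until no swap lowers the cost."""
    n = len(points)
    medoids = []
    # BUILD: repeatedly add the point that gives the lowest total cost.
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    # SWAP: try replacing each medoid with each non-medoid point.
    improved = True
    while improved:
        improved = False
        current = total_cost(points, medoids)
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True
    # Assign every pooled cluster location to its nearest medoid (meta-cluster).
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels

# Hypothetical cluster locations from three samples, two clusters per sample.
sample_1 = [(1.0, 1.1), (5.0, 5.2)]
sample_2 = [(0.9, 1.0), (5.1, 4.9)]
sample_3 = [(1.2, 0.8), (4.8, 5.0)]
pooled = sample_1 + sample_2 + sample_3

medoids, labels = pam(pooled, k=2)
```

Clusters from different samples that receive the same label are treated as the same population, which is what lets their features be compared across samples.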

  12. Example 2: Identifying discriminating features
     • Experiment: examine ZAP70 and SLP76 phosphorylation events before and after T cell receptor activation in naïve and memory T cells
     • Lymphocytes stained with four markers:
       - CD4
       - CD45RA
       - ZAP70Y292
       - SLP76Y128
     • 60 samples: 30 subjects x two time points: pre- and post-anti-CD3 antibody stimulation

  13. Registering populations across samples [Figure panels: pre-stimulation samples, post-stimulation samples]

  14. [Figure: panels a to e with CD45RA axes; Sample 121106A_0min (pre-stimulation) and Sample 121106A_5min (post-stimulation)]

  15. Step 3: Discriminating features [Figure panels: pre-stimulation (zero-minute), post-stimulation (five-minute)]

  16. Discriminating features
     Feature name    Type         Cluster #  Dimension(s)  ∆ mean feature [five-min]  p-value
     vars11.4        Variance     4          1             -0.156                     1.65E-18
     orientation 72  Orientation  5          3             -0.649                     1.01E-14
     orientation 56  Orientation  4          3             -0.609                     1.13E-12
     vars11.5        Variance     5          1             -0.082                     4.00E-08
     orientation 66  Orientation  5          1             -0.515                     1.37E-05
     shape 11        Shape        3          3             -0.175                     2.62E-08
     scale4          Scale        4          NA            -0.052                     3.32E-06
     orientation 19  Orientation  2          1             -0.632                     1.34E-06
     shape 8         Shape        2          2             -0.141                     4.41E-09
     shape 15        Shape        4          4             -0.178                     5.17E-07
     vars41.5        Variance     5          1,4           -0.024                     2.63E-05
     orientation 42  Orientation  3          3             -0.422                     9.73E-04
     shape 20        Shape        5          4             -0.060                     7.93E-05
     scale5          Scale        5          NA            -0.038                     7.23E-04
     vars43.3        Variance     3          3,4           -0.020                     7.10E-04
     vars31.4        Variance     4          1,3           -0.015                     3.34E-03
     vars11.3        Variance     3          1             0.314                      6.22E-12
     orientation 52  Orientation  4          1             0.552                      1.87E-10
     vars21.2        Variance     2          1,2           0.251                      1.22E-10
     vars21.3        Variance     3          1,2           0.259                      1.14E-11
     vars21.4        Variance     4          1,2           0.060                      3.42E-08
     orientation 20  Orientation  2          1             0.504                      2.17E-11
     shape 10        Shape        3          3             0.740                      1.31E-09
     shape 7         Shape        2          2             0.682                      4.49E-16
     shape 13        Shape        4          4             1.023                      4.37E-09
     mus1.4          Mean         4          1             1.761                      6.13E-22
     orientation 54  Orientation  4          2             0.534                      1.26E-08
     mus1.5          Mean         5          1             1.657                      2.47E-21
     vars22.2        Variance     2          2             0.282                      5.45E-05
     orientation 59  Orientation  4          3             0.548                      1.51E-04
     vars22.3        Variance     3          2             0.146                      1.09E-05
     orientation 47  Orientation  3          4             0.561                      4.65E-05
     orientation 43  Orientation  3          3             0.066                      8.01E-05
     orientation 70  Orientation  5          2             0.308                      4.07E-03
     scale3          Scale        3          NA            0.063                      2.19E-04
     mus1.2          Mean         2          1             1.571                      1.52E-18
     vars11.2        Variance     2          1             0.131                      1.62E-04
     vars22.5        Variance     5          2             0.023                      2.65E-04
     [Adjacent figure labels: I, II, III, IV, V; axis CD45RA]
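
The slide reports a change in mean and a p-value per feature but does not say which test produced them. As a rough sketch of how such a pair of numbers could be obtained for one feature, the example below assumes a paired design (the same subjects at zero and five minutes) and uses an exact sign-flip permutation test. The test choice, the function name and the feature values are all assumptions made for illustration.

```python
from itertools import product

def paired_sign_flip_test(pre, post):
    """Return (delta_mean, two-sided p-value) for paired pre/post values.

    Exact sign-flip permutation test: under the null hypothesis each
    within-subject difference is equally likely to be positive or negative.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    observed = sum(diffs) / n
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        total += 1
        flipped = sum(s * d for s, d in zip(signs, diffs)) / n
        if abs(flipped) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total

# Invented values of one cluster feature for six subjects (not from the slides).
pre_stimulation = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10]
post_stimulation = [0.30, 0.35, 0.28, 0.33, 0.36, 0.31]

delta_mean, p_value = paired_sign_flip_test(pre_stimulation, post_stimulation)
```

With only six subjects the smallest attainable two-sided p-value is 2/64; the much smaller values in the table would need the full 30 subjects or a parametric test.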

  17. Example 3: Identifying a rare cell population
     Regulatory T cells occur as a less than 0.5-1.0% population in human peripheral blood mononuclear cells
     [Figure axis: Foxp3-PE]
     Baecher-Allan et al., JI, 2006

  18. Stepwise detection of Tregs [Figure panels: Step 1, Step 2]

  19. Overview [Figure labels: Operator/QC, FLAME]

  20. FLAME
     • Automated analysis method
     • Deconstructs the components of a mixture of cells
     • Cross-registers cell clusters across samples
     • Provides a specific record of analysis parameters, allowing exact replication of an analysis by a third party
     • Operator modes:
       - Cell population discovery mode
       - Clinical trial mode

  21. Availability
     • Free software
     • Available through the GenePattern toolkit on the Broad Institute website
       - http://www.broadinstitute.org/cancer/software/genepattern/index.html
     • GenePattern: an environment with pipelining capabilities and a repertoire of downstream analysis tools
     • Pyne et al. Proc Natl Acad Sci USA 2009; 106: 8519-8524.

  22. Acknowledgements
     • De Jager lab
       - Cristin Aubin
       - Aaron Brandes
       - Becky Briskin
       - Lori Chibnik
       - Portia Chipendo
       - Xinli Hu
       - Linda Ottoboni
       - Nikolaos Patsopoulos
       - Joshua Shulman
       - Dong Tran
       - Irene Wood
       - Zongqi Xia
     • Jill Mesirov
       - Saumyadipta Pyne
       - Pablo Tamayo
     • Geoff McLachlan
     • Kui Wang
     • David Hafler
       - Clare Baecher-Allan
       - Lisa Maier
     Funding Sources
     • National MS Society
     • NIH: NIA, NINDS

  23. Illustrative Examples

  24. Example 3: Feature selection (phosphorylation of naïve & memory T cells pre- and post-stimulation)
     4-dimensional samples; mixture modeling (ZAP70Y292 not shown)
     [Figure axes: CD45RA, CD4]

  25. Phosphorylation causes feature alterations in populations [Figure panels: 0 min. (pre-stimulation), 5 min. (post-stimulation)]

  26. Matching pre- and post-stimulation populations [Figure panels: pre-stimulation, post-stimulation]

  27. Matching pre- and post-stimulation populations across all samples

  28. Feature Selection Heatmap [Column groups: zero-minute, five-minute]

  29. Data QC/standardization • Carefully selected panels • Minimal cross-sample variation

  30. FLow analysis with Automated Multivariate Estimation 10/01/2008

  31. Low dimension t-mixture: Outliers? Low dimension clustering is not good enough

  32. Multivariate t-mixture is better. [Figure annotation: ?] Symmetric density is often not good enough

  33. Modeling with skewed distributions: Better fit with skew. [Figure labels: Skew N, skew-normal distribution] Photo courtesy: Azzalini J.M. et al. Statistical applications of the multivariate skew-normal distribution, 1999.
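
For readers unfamiliar with the skew-normal distribution named on this slide, the sketch below evaluates its univariate density in the standard form 2·φ(z)·Φ(αz), where α controls the skew. It covers only the univariate skew-normal case; the multivariate and skew t variants mentioned elsewhere in the deck are not shown, and the function and parameter names are illustrative.

```python
import math

def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Standard normal cumulative distribution Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def skew_normal_pdf(x, alpha, loc=0.0, scale=1.0):
    """Skew-normal density: (2 / scale) * phi(z) * Phi(alpha * z), z = (x - loc) / scale."""
    z = (x - loc) / scale
    return 2.0 / scale * std_normal_pdf(z) * std_normal_cdf(alpha * z)

# alpha = 0 recovers the symmetric Gaussian; alpha > 0 shifts mass to the right.
symmetric_value = skew_normal_pdf(1.0, alpha=0.0)
right_of_center = skew_normal_pdf(1.0, alpha=4.0)
left_of_center = skew_normal_pdf(-1.0, alpha=4.0)
```

The single extra parameter is what lets one component follow an asymmetric population that a symmetric Gaussian would need two or more components to cover.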

  34. Parametric mixture modeling
     • A biological population is assumed to follow a mathematical distribution, such as Gaussian
     • Each population can be abstracted as a cluster, described by parameters such as mean, mode, standard deviation, skew, etc.
     • A mixture of populations can be abstracted as a mixture of distributions
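
To make the abstraction on this slide concrete, the sketch below recovers the parameters of two populations from unlabeled one-dimensional data using the expectation-maximization (EM) algorithm with Gaussian components. The deck does not describe FLAME's fitting procedure at this level, so EM, the two-component restriction, the initialisation and the simulated data are all assumptions for illustration.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and std. dev. sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(data, n_iter=100):
    """EM for a two-component univariate Gaussian mixture."""
    n = len(data)
    center = sum(data) / n
    spread = math.sqrt(sum((x - center) ** 2 for x in data) / n)
    # Start the two means at the data extremes with a shared, wide spread.
    w, mu, sigma = [0.5, 0.5], [min(data), max(data)], [spread, spread]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            p = [w[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(2)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean and std. dev. of each component.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigma[j] = max(math.sqrt(var), 1e-6)
    return w, mu, sigma

# Simulated measurements from two populations (values are illustrative).
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(300)]
        + [random.gauss(6.0, 1.0) for _ in range(300)])

weights, means, sds = fit_two_gaussians(data)
```

Each fitted component is one "cluster" in the slide's sense: its weight estimates the population's frequency, and its mean and standard deviation describe the population's location and spread.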

  35. Modeling with Gaussian: Gaussian may be too “skinny” to capture the entire population
