flowMerge: Merging Mixture Components to Identify Distinct Cell Populations in Flow Cytometry
Greg Finak, PhD
R. Gottardo Laboratory, Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
flowCAP 2010 Summit, September 21, 2010
Greg Finak (FHCRC) flowMerge flowCAP 2010 1 / 25
Outline
1 Introduction
  Goals of Automated Gating
  Challenges for Automated Gating
2 The flowClust and flowMerge Algorithms
  The flowMerge Algorithm
  Shortfalls of flowClust / flowMerge
3 flowCAP
  Algorithm Settings and Gating Strategy
  Our Take-home Lessons
  Future Improvements to flowClust / flowMerge
  Acknowledgements
Goals of Automated Gating
In an ideal world: identify the same populations that a human expert can identify... as well as those they can't.
Specifically:
Identify biologically relevant cell populations.
Classify events into one or more of the identified cell populations.
Do it accurately (relative to some standard).
Do it quickly (or at least faster than the human expert).
Many approaches exist, both parametric and non-parametric.
Characteristics of FCM Data
Globally, FCM data are well represented by a mixture of distributions.
However:
Cell populations in FCM data tend to be noisy, asymmetric and overlapping, and are not always well resolved by existing markers.
Not all populations in an experiment are of interest to the question at hand.
From a modelling perspective:
The distributions of individual cell populations are not "nice": they are noisy, non-Gaussian and asymmetric, with non-constant variance.
Gating strategies depend on the data.
Quick Intro to Mixture Models I
Mixture model: model a complicated distribution using a weighted combination of "simpler" distributions,
$$f_0(y) = \sum_{g=1}^{G} \pi_g \, f_g(y \mid \theta_g),$$
where the mixing proportions π_g are non-negative and sum to 1.
The f_g(·) can be any distributions. In practice:
Multivariate Gaussian
Multivariate t (flowClust)
Multivariate t with Box–Cox transformation (flowClust)
Skewed multivariate t (FLAME)
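To make the weighted-sum formula concrete, here is a minimal sketch that evaluates a univariate mixture density with Gaussian components. The function names and the two-component parameter values are made up for illustration; flowClust itself works with multivariate t components.

```python
import math

def normal_pdf(y, mean, sd):
    """Univariate Gaussian density."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, weights, means, sds):
    """f_0(y) = sum_g pi_g * f_g(y | theta_g), here with Gaussian f_g."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))

# Hypothetical two-component mixture (illustration values only).
weights, means, sds = [0.4, 0.6], [0.0, 3.0], [1.0, 1.5]
for y in (-2.0, 0.0, 2.0, 4.0, 6.0):
    print(f"f0({y:+.1f}) = {mixture_pdf(y, weights, means, sds):.4f}")
```

Swapping `normal_pdf` for a t or skewed-t density gives the other component families listed above without changing `mixture_pdf`.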
Quick Intro to Mixture Models II
[Figure: density of a two-component mixture (Component 1, Component 2), plotted over the range −2 to 6.]
Quick Intro to Mixture Models III
Gaussian mixtures: spherical or ellipsoidal covariance.
t distribution: robust to outliers.
Box–Cox transformation: allows for asymmetry.
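Why the t distribution is robust to outliers can be seen by comparing tail densities. This sketch uses the standard univariate Student-t density (location, scale, ν degrees of freedom) next to a Gaussian; the function names and the choice ν = 4 are illustrative assumptions.

```python
import math

def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def t_pdf(y, mean, scale, nu):
    """Univariate Student-t density with location, scale and nu degrees of freedom."""
    z = (y - mean) / scale
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z * z / nu))

# Far from the center the t keeps much more density, so a single outlying
# event barely pulls a t component, while it strongly affects a Gaussian fit.
for y in (0.0, 3.0, 6.0):
    print(y, normal_pdf(y, 0.0, 1.0), t_pdf(y, 0.0, 1.0, nu=4))
```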
The Box–Cox Transformation
Flow cytometry data are usually transformed prior to gating (arcsinh, log, logicle, Box–Cox), but individual populations can still be skewed.
Generalized Box–Cox transform:
$$x = \begin{cases} \dfrac{\operatorname{sgn}(y)\,|y|^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(y) & \text{otherwise} \end{cases}$$
The Box–Cox family encompasses power (including square) and log transformations, depending on the value of λ.
flowClust implements the Box–Cox transformation as part of the model-fitting procedure.
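A direct transcription of the generalized Box–Cox formula above, as a minimal sketch (the name `box_cox` is ours, not flowClust's). In flowClust, λ is estimated during model fitting rather than fixed by hand as it is here.

```python
import math

def box_cox(y, lam):
    """Generalized Box-Cox: (sgn(y)|y|^lam - 1)/lam if lam != 0, log(y) otherwise."""
    if lam != 0:
        # copysign attaches sgn(y) to |y|^lam, so negative events stay negative.
        return (math.copysign(abs(y) ** lam, y) - 1.0) / lam
    if y <= 0:
        raise ValueError("the log branch requires y > 0")
    return math.log(y)

# lam = 1 is a shift (y - 1), lam = 0.5 a square-root-like transform,
# and as lam -> 0 the power branch approaches log(y).
print(box_cox(4.0, 1.0), box_cox(4.0, 0.5), box_cox(10.0, 1e-6), math.log(10.0))
```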
Automated Gating With flowClust I
flowClust (Lo K, Brinkman R, Gottardo R. Cytometry A, 2008): a robust, flexible, model-based approach to automated gating of flow cytometry data.
Mixture model framework.
Multivariate t: robust to outliers.
Box–Cox transformation: allows for asymmetric populations.
Automated Gating With flowClust II
The flowClust model (Lo et al.) uses a normal–Gamma compound parameterization of the multivariate t.
Complete-data log-likelihood:
$$L_c(\Psi \mid y, z, u) = \sum_{i=1}^{n} \sum_{g=1}^{G} z_{ig} \log\left\{ \pi_g \, \phi_p\!\left(y_i^{(\lambda)} \mid \mu_g, \Sigma_g / u_i\right) \cdot \left|J_p(y_i; \lambda)\right| \cdot \mathrm{Ga}\!\left(u_i; \tfrac{\nu}{2}, \tfrac{\nu}{2}\right) \right\}$$
Here φ_p is the p-dimensional normal density, J_p is the Jacobian of the Box–Cox transformation, and Ψ = {μ_g, Σ_g, π_g, ν, λ} collects the population means, covariances and proportions, the degrees of freedom of the t distribution, and the transformation parameter.
The parameters can be estimated efficiently via the EM algorithm.
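The role of z_ig and u_i in the likelihood above shows up in the E-step of EM. The sketch below is a simplified, hypothetical illustration: univariate data, a fixed ν, and no Box–Cox step, using the standard t-mixture results z_ig ∝ π_g t(y_i | μ_g, σ_g, ν) and E[u_i] = (ν + p)/(ν + δ_ig), where δ_ig is the squared Mahalanobis distance. It is not flowClust's code.

```python
import math

def t_pdf(y, mean, scale, nu):
    z = (y - mean) / scale
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z * z / nu))

def e_step(ys, pis, mus, scales, nu):
    """Return (z, u): z[i][g] = posterior membership of event i in component g,
    u[i][g] = E[u_i | y_i, component g], which down-weights outlying events."""
    z, u = [], []
    for y in ys:
        dens = [p * t_pdf(y, m, s, nu) for p, m, s in zip(pis, mus, scales)]
        total = sum(dens)
        z.append([d / total for d in dens])
        # p = 1 dimension: E[u] = (nu + 1) / (nu + squared Mahalanobis distance)
        u.append([(nu + 1) / (nu + ((y - m) / s) ** 2) for m, s in zip(mus, scales)])
    return z, u

def update_means(ys, z, u):
    """M-step for the means: weights z_ig * u_ig shrink the influence of outliers."""
    n, G = len(ys), len(z[0])
    return [sum(z[i][g] * u[i][g] * ys[i] for i in range(n)) /
            sum(z[i][g] * u[i][g] for i in range(n)) for g in range(G)]

ys = [0.0, 0.1, -0.1, 50.0]          # one gross outlier
z, u = e_step(ys, [1.0], [0.0], [1.0], nu=4.0)
print(update_means(ys, z, u))        # stays near 0; the plain mean is 12.5
```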
flowClust: Model Selection
Standard approach: the BIC (Bayesian Information Criterion), written here in the convention where larger is better,
$$\mathrm{BIC} = 2\ln(L) - k\ln(n),$$
where L is the maximized likelihood, k the number of free parameters and n the number of events. (This is the negative of the equally common form −2 ln(L) + k ln(n), which is minimized instead.)
Fit flowClust models with G = 1 through G = 20 clusters and choose the model with the largest BIC value.
[Figure: BIC versus number of clusters, G = 1 to 20.]
When G is large, or there are many events or many samples, this becomes time-consuming; it can be parallelized.
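Selecting G by BIC amounts to scoring each fitted model and taking the maximum. In this sketch the log-likelihoods are supplied by the caller (in practice they come from fitting flowClust for each G), and the parameter count is a hypothetical choice that assumes one shared ν and one shared λ: (G − 1) proportions, G·p means, G·p(p + 1)/2 covariance entries, plus 2.

```python
import math

def bic(loglik, n_params, n_events):
    """BIC in the 'larger is better' convention: 2 ln L - k ln n."""
    return 2.0 * loglik - n_params * math.log(n_events)

def n_params_t_mixture(G, p):
    """Hypothetical parameter count for a G-component, p-dimensional t mixture
    with a single shared nu and a single shared lambda."""
    return (G - 1) + G * p + G * p * (p + 1) // 2 + 2

def choose_G(logliks, p, n_events):
    """logliks maps G -> maximized log-likelihood; return (best G, all scores)."""
    scores = {G: bic(ll, n_params_t_mixture(G, p), n_events)
              for G, ll in logliks.items()}
    return max(scores, key=scores.get), scores

# Made-up log-likelihoods: G = 3 fits slightly better than G = 2,
# but not by enough to pay for its extra parameters.
best, scores = choose_G({1: -1000.0, 2: -900.0, 3: -899.0}, p=2, n_events=1000)
print(best, scores)
```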
Problems with flowClust
Fixing G to the "true" number of populations doesn't necessarily give the best model fit.
Multiple mixture components may represent the same cell population.
[Figure: flowClust fits with G = 4 (19% misclassified) and G = 9 (18% misclassified), FL1-H vs FL2-H.]
flowMerge: Modelling Distinct Cell Populations
flowMerge (Finak G, Bashashati A, Brinkman R, Gottardo R. Advances in Bioinformatics, 2009) extends the flowClust methodology to identify and model distinct cell populations:
Merges overlapping mixture components based on entropy.
Summarizes merged components using a single multivariate t distribution based on moment-matching conditions.
Mixture Components and Entropy
Entropy measures the uncertainty of a random variable. For mixture models, we define the entropy of clustering of a G-component mixture model.
Definition (entropy of clustering):
$$H(G) = -2 \sum_{i=1}^{G} \sum_{j=1}^{N} z_{ij} \log(z_{ij})$$
where z_ij is the probability that cell j is assigned to population i.
Overlapping mixture components: large uncertainty, high entropy.
[Figure: "Entropy: Single Cell, Two Clusters", entropy plotted against P(Z=1) from 0 to 1.]
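The entropy of clustering can be computed directly from the posterior membership probabilities. This sketch stores z per cell (z[j] is the list of probabilities for cell j), uses the natural log, and takes 0·log 0 = 0; those storage and log-base choices are assumptions of the sketch.

```python
import math

def clustering_entropy(z):
    """H = -2 * sum over cells and components of z log z (0 log 0 taken as 0)."""
    return -2.0 * sum(p * math.log(p) for row in z for p in row if p > 0)

# Single cell, two clusters: entropy is largest when the cell is split evenly
# (overlapping components) and falls to 0 as the assignment becomes certain.
for p1 in (0.5, 0.9, 0.99, 1.0):
    print(p1, clustering_entropy([[p1, 1.0 - p1]]))
```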
The flowMerge Algorithm
1 Start with the max(BIC) flowClust model (k clusters).
2 Compute the entropy for all pairwise combinations of model components.
3 Merge the two components that contribute most to the entropy.
4 Recompute the pairwise entropy for the new merged cluster.
5 Repeat from step 2 until one component remains.
6 Choose the "best"-fitting merged model from the plot of entropy vs. number of clusters.
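The merging loop can be sketched on the posterior probabilities alone. The slides do not spell out how "contribute most to the entropy" is scored, so this hypothetical sketch assumes one common reading: merge the pair whose union yields the lowest entropy of clustering (the largest reduction), where merging two components sums their posterior columns. It skips refitting and the moment-matching summary and is not the flowMerge package's implementation.

```python
import math
from itertools import combinations

def clustering_entropy(z):
    return -2.0 * sum(p * math.log(p) for row in z for p in row if p > 0)

def merge_columns(z, a, b):
    """Merge component b into component a (a < b): merged posterior is z_a + z_b."""
    return [[row[a] + row[b] if k == a else p for k, p in enumerate(row) if k != b]
            for row in z]

def greedy_merge(z):
    """Repeatedly merge the pair of components whose union lowers entropy most.
    Returns [(number_of_components, entropy, merged_pair), ...] for an entropy plot."""
    path = [(len(z[0]), clustering_entropy(z), None)]
    while len(z[0]) > 1:
        best = min(combinations(range(len(z[0])), 2),
                   key=lambda ab: clustering_entropy(merge_columns(z, *ab)))
        z = merge_columns(z, *best)
        path.append((len(z[0]), clustering_entropy(z), best))
    return path

# Components 0 and 1 overlap heavily; component 2 is well separated.
z = [[0.5, 0.5, 0.0], [0.45, 0.55, 0.0], [0.0, 0.0, 1.0]]
for G, H, pair in greedy_merge(z):
    print(G, round(H, 4), pair)
```

The resulting (number of components, entropy) pairs are what the entropy vs. number of clusters plot in step 6 displays.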
Summarizing Components
We can summarize merged components using the same multivariate t framework used in flowClust:
$$p^* f^* = p_i f_i + p_j f_j$$
Moment-matching conditions:
$$p^* = p_i + p_j, \qquad \mu^* = \frac{p_i \mu_i + p_j \mu_j}{p^*}$$
$$\Sigma^* = \frac{(\nu^* - 2)\, p_i}{p^* \nu^*}\left[\frac{\nu_i}{\nu_i - 2}\Sigma_i + \mu_i \mu_i'\right] + \frac{(\nu^* - 2)\, p_j}{p^* \nu^*}\left[\frac{\nu_j}{\nu_j - 2}\Sigma_j + \mu_j \mu_j'\right] - \frac{(\nu^* - 2)\, p^*}{p^* \nu^*}\,\mu^* \mu^{*\prime}$$
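The moment-matching conditions translate line by line into code: average the component second moments ν/(ν − 2)·Σ + μμ′ with weights p/p*, subtract μ*μ*′ to get the merged covariance, then rescale by (ν* − 2)/ν* to obtain a t scale matrix. This is a sketch with plain nested lists; the function name is hypothetical, ν* is assumed to be supplied by the caller, and all ν values must exceed 2.

```python
def outer(a, b):
    return [[x * y for y in b] for x in a]

def merge_t_components(p_i, mu_i, S_i, nu_i, p_j, mu_j, S_j, nu_j, nu_star):
    """Moment-matched (p*, mu*, Sigma*) for p_i f_i + p_j f_j.
    S_i and S_j are t scale matrices; nu_star is chosen by the caller."""
    d = len(mu_i)
    p_star = p_i + p_j
    mu_star = [(p_i * a + p_j * b) / p_star for a, b in zip(mu_i, mu_j)]

    def second_moment(S, mu, nu):
        # E[y y'] of a t component: nu/(nu - 2) * Sigma + mu mu'   (needs nu > 2)
        mm = outer(mu, mu)
        return [[nu / (nu - 2) * S[r][c] + mm[r][c] for c in range(d)] for r in range(d)]

    M_i = second_moment(S_i, mu_i, nu_i)
    M_j = second_moment(S_j, mu_j, nu_j)
    mm_star = outer(mu_star, mu_star)
    k = (nu_star - 2) / nu_star  # converts the merged covariance to a t scale matrix
    sigma_star = [[k * ((p_i * M_i[r][c] + p_j * M_j[r][c]) / p_star - mm_star[r][c])
                   for c in range(d)] for r in range(d)]
    return p_star, mu_star, sigma_star

# Two 1-D components at -1 and +1 merge into one wider component centered at 0.
print(merge_t_components(0.5, [-1.0], [[1.0]], 4.0, 0.5, [1.0], [[1.0]], 4.0, 4.0))
```

The merged scale (1.5 in the example) is larger than either input because a single component has to cover both original locations.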
Example: flowClust vs flowMerge
[Figure 4: (A) flowClust ICL solution, (B) flowClust BIC solution, (C) flowMerge solution, each shown in three two-dimensional projections of CD7 FITC, CD4 PE and CD8 PC5; (D) entropy vs. number of clusters.]