1. SWIFT: SCALABLE WEIGHTED ITERATIVE FLOW-CLUSTERING TECHNIQUE
Iftekhar Naim∗, Gaurav Sharma∗, Suprakash Datta†, James S. Cavenaugh∗, Jyh-Chiang E. Wang∗, Jonathan A. Rebhahn∗, Sally A. Quataert∗, and Tim R. Mosmann∗
∗University of Rochester, Rochester, NY; †York University, Toronto, ON
FlowCAP Summit, 2010

2. OUTLINE
1 INTRODUCTION: Flow cytometry (FC) data analysis; automated multivariate clustering of FC data
2 SWIFT METHOD FOR FC DATA ANALYSIS: SWIFT algorithm; weighted iterative sampling based EM; bimodality splitting; graph-based merging
3 DOES IT WORK?: Does it work? How do we know it works?
4 FLOWCAP CONTEST: Results on FlowCAP datasets; a few thoughts for FlowCAP II
5 CONCLUSION

4. FLOW CYTOMETRY (FC) OVERVIEW
◮ Rapid multivariate analysis of individual cells.
◮ High-throughput data generation (description of ∼1 million cells).
◮ High dimensionality (∼20 measurements per cell).
FIGURE: Flow cytometry system (fluorochrome, antibody, antigen, cell). Ref: http://probes.invitrogen.com

5. FC DATA ANALYSIS
◮ Traditionally, FC data are analyzed by manual gating:
  ◮ Subjective; scales poorly with increasing dimensions
  ◮ 1D/2D projections may not represent the full picture
  ◮ Inaccurate for overlapping clusters
FIGURE: Manual gating for overlapping clusters. (a) Two overlapping clusters; (b) combined view; (c) manual gating.
◮ Automated multivariate clustering is desirable for FC data analysis.
  ◮ Repeatable, nonsubjective, comprehends multivariate structure

7. CHALLENGES OF AUTOMATED CLUSTERING OF FC DATA
◮ Challenges of automated clustering:
  ◮ Large FC datasets (∼1 million events)
  ◮ High dimensionality (20 or more dimensions)
  ◮ Very small clusters that are important in immunological analysis (100–200 cells out of millions)
  ◮ Overlapping clusters and background noise
◮ Our goal: design an automated clustering method capable of addressing these challenges.

8. MANY DIFFERENT CLUSTERING METHODS
Partitional clustering:
◮ Soft: mixture model, fuzzy clustering
◮ Hard: K-means, grid based, spectral clustering, ...

12. MODEL BASED CLUSTERING FOR FC DATA
◮ Model based clustering offers several advantages:
  ◮ Soft clustering: comprehends overlapping clusters and background noise
  ◮ BUT, it is computationally expensive, and the choice of model imposes limitations
◮ Recent proposals for statistical model based FC clustering (Chan et al. [2008], Lo et al. [2008], Finak et al. [2009], Pyne et al. [2009]).
◮ We propose a computationally efficient model-based clustering method, SWIFT (Naim et al. [2010]), that offers two advantages:
  ◮ Scalability: faster computation + less memory usage
  ◮ Detection of small populations: ∼100 cells out of 1 million

17. SWIFT ALGORITHM FOR FC DATA CLUSTERING
SWIFT is a three stage algorithm:
1 Weighted iterative sampling based EM: Gaussian mixture model clustering + novel weighted iterative sampling
  ◮ Bayesian Information Criterion (BIC)
2 Bimodality splitting: split any cluster that is
  ◮ bimodal in any dimension or any principal component
  ◮ Useful for clustering high dimensional data
3 Graph-based merging: merge overlapping Gaussians (Hennig [2009], Finak et al. [2009], Baudry et al. [2010])
  ◮ Allows representation of non-Gaussian clusters
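The slides do not say which bimodality test stage 2 applies to each dimension or principal component. As an illustrative stand-in only (not the authors' implementation), Sarle's bimodality coefficient is one cheap per-dimension screen: values above ~5/9, the value for a uniform distribution, suggest the projection may be bimodal.

```python
def bimodality_coefficient(xs):
    """Sarle's bimodality coefficient from sample skewness and
    excess kurtosis; values above ~5/9 hint at bimodality.
    Illustrative stand-in for an unspecified bimodality test."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # central moments
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skew ** 2 + 1.0) / (excess_kurt + correction)

import random
rng = random.Random(0)
bimodal = [0.0] * 500 + [10.0] * 500                             # two clear modes
unimodal = [sum(rng.random() for _ in range(12)) for _ in range(1000)]  # near-normal
```

Here `bimodality_coefficient(bimodal)` exceeds the 5/9 reference while the near-normal sample falls below it; a cluster flagged this way in any dimension would be a candidate for splitting.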

18. CLUSTERING STRATEGY: SWIFT
GMM clustering with sampling for k ∈ [K_min, K_max]
→ BIC to decide the number of Gaussians (K̂)
→ Split bimodal clusters until unimodal; results in K_split clusters
→ Graph-based merging using overlap/entropy criteria; results in K_entropy clusters
→ Soft clustering for the K_entropy clusters
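The BIC step above can be made concrete: for a k-component, d-dimensional Gaussian mixture with full covariances, BIC = −2 log L + p ln N, where p counts the free parameters, and K̂ is the k in [K_min, K_max] that minimizes BIC. A minimal sketch (the helper names are my own, not from the slides):

```python
import math

def gmm_param_count(k, d):
    """Free parameters of a k-component, d-dimensional Gaussian mixture:
    k means (d each), k full covariances (d(d+1)/2 each), k-1 weights."""
    return k * d + k * d * (d + 1) // 2 + (k - 1)

def bic(log_likelihood, k, d, n):
    """Bayesian Information Criterion; lower is better."""
    return -2.0 * log_likelihood + gmm_param_count(k, d) * math.log(n)
```

Given the fitted log-likelihood for each k, K̂ = argmin over k of `bic(...)`; the ln N penalty stops larger mixtures from winning on likelihood alone.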

21. STAGE 1: GAUSSIAN MIXTURE MODEL CLUSTERING
◮ Gaussian mixture model (GMM) clustering is chosen among the model based methods:
  ◮ Faster than other model based clustering methods
  ◮ Closed form solution
◮ Expectation Maximization (EM) algorithm for parameter estimation
  ◮ Computational complexity of each iteration: O(Nkd²)
  ◮ N = the number of data vectors in the dataset
  ◮ k = the number of Gaussian components
  ◮ d = the dimension of each data vector
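The closed-form EM updates can be shown in a minimal 1-D sketch (SWIFT itself fits multivariate Gaussians, where the covariance terms give the O(Nkd²) per-iteration cost; this toy version is O(Nk) and is not the authors' code):

```python
import math

def em_gmm_1d(xs, k, iters=30):
    """A few EM iterations for a 1-D Gaussian mixture (illustrative sketch)."""
    n = len(xs)
    srt = sorted(xs)
    w = [1.0 / k] * k                                       # mixing weights
    mu = ([srt[i * (n - 1) // (k - 1)] for i in range(k)]   # quantile init
          if k > 1 else [srt[n // 2]])
    m = sum(xs) / n
    var = [sum((x - m) ** 2 for x in xs) / n] * k           # overall variance
    for _ in range(iters):
        # E-step: responsibilities gamma[i][l] = P(component l | x_i)
        gamma = []
        for x in xs:
            p = [w[l] / math.sqrt(2 * math.pi * var[l])
                 * math.exp(-(x - mu[l]) ** 2 / (2 * var[l])) for l in range(k)]
            s = sum(p) or 1e-300
            gamma.append([pl / s for pl in p])
        # M-step: closed-form updates for weights, means, variances
        for l in range(k):
            nl = sum(g[l] for g in gamma) or 1e-300
            w[l] = nl / n
            mu[l] = sum(g[l] * x for g, x in zip(gamma, xs)) / nl
            var[l] = max(sum(g[l] * (x - mu[l]) ** 2
                             for g, x in zip(gamma, xs)) / nl, 1e-6)
    return w, mu, var

import random
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(10, 1) for _ in range(300)]
w, mu, var = em_gmm_1d(xs, 2)   # recovers means near 0 and 10
```

Every update is a weighted average, which is the "closed form solution" advantage the slide refers to: no inner numerical optimization is needed per iteration.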

23. STAGE 1: SAMPLING FOR SCALABILITY
◮ Operate on a smaller subsample of the dataset for better computational performance.
◮ Challenge: poor representation of smaller clusters.
FIGURE: (a) 4 Gaussians with 150K, 100K, 50K and 150 datapoints; (b) after 10% sampling.
◮ Solution: weighted iterative sampling
  ◮ Faster computation
  ◮ Better detection of small clusters
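The weighted resampling idea can be sketched as: draw the next subsample with probability proportional to the responsibility mass each point assigns to the not-yet-fixed clusters, Σ_{l∉F} γ_l^(i). Points already well explained by the fixed large clusters are rarely redrawn, so small clusters are enriched in later subsamples. The function name and signature below are hypothetical:

```python
import random

def weighted_resample(gamma, fixed, size, seed=0):
    """Resample point indices with probability proportional to the
    responsibility mass on clusters not yet fixed (not in `fixed`).
    gamma[i][l] = responsibility of cluster l for point i."""
    rng = random.Random(seed)
    weights = [sum(g[l] for l in range(len(g)) if l not in fixed)
               for g in gamma]
    return rng.choices(range(len(gamma)), weights=weights, k=size)

# Toy example: point 0 is fully explained by fixed cluster 0,
# so it never appears in the resample.
gamma = [[1.0, 0.0], [0.2, 0.8], [0.0, 1.0]]
sample = weighted_resample(gamma, fixed={0}, size=10)
```

This is what lets the small 150-point cluster in the figure survive: once the 150K-point cluster is fixed, its points stop crowding the subsample.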

26. STAGE 1: WEIGHTED ITERATIVE SAMPLING BASED EM
Flowchart (F = set of clusters whose parameters are fixed):
1 Start with the FCS dataset X; initially F = ∅.
2 Subsample S from X, selecting X^(i) with probability P(X^(i) is selected in S) = Σ_{l∉F} γ_l^(i).
3 GMM fitting to S using EM.
4 Fix the p largest clusters and add them to F.
5 If not all clusters are fixed, resample S from X and repeat from step 3.
6 Otherwise, perform a few EM iterations on the full dataset X and output the model parameters (θ).

27. STAGE 1: WEIGHTED ITERATIVE SAMPLING BASED EM
FIGURE: 4 Gaussian clusters with 150K, 100K, 50K and 150 datapoints.
