On the Use of NMF and curvHDR to Cluster Flow Cytometry Data
José M. Maisog (1,2), Andrea A. Barbo (2), George Luta (2)
(1) Medical Numerics, Inc., Germantown, MD 20876
(2) Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC 20057
FlowCAP Summit, September 21-22, 2010

Outline
1. Non-Negative Matrix Factorization
2. curvHDR
3. Strategy for FlowCAP Challenge 2
4. Discussion
NMF and curvHDR September, 2010

Non-Negative Matrix Factorization (NMF)
- A relatively new method of matrix decomposition [LS99]
- Given Y, an M × N non-negative matrix
- Find W and H such that: Y ≈ W × H
- W is M × k, H is k × N
- W and H are non-negative
- Must specify k (cf. k-means clustering)
- Dimensionality reduction: k < M, N
- There are multiple variations, e.g. different optimization criteria

NMF: Algorithm
[Figure: schematic of Y = W • H + E, with axes labeled "Samples" and "Variables (e.g., genes)". Based on a figure from [You09].]
- Initialize W and H with random values.
- Optimize so that $\sum_{i,j} \left( y_{ij} - (WH)_{ij} \right)^2$ is minimized.
- The k rows of H define “metagenes”, while the i-th row of W represents the “metagene expression pattern of the corresponding sample” [Dev08]

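The initialize-then-optimize procedure above can be sketched in a few lines of NumPy. The slides do not say which optimizer was used, so this sketch assumes the widely used multiplicative update rule for the squared-error criterion. The function names `nmf` and `squared_error`, the iteration count, the small constant `eps`, and the toy matrix are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def nmf(Y, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative M x N matrix Y into W (M x k) and H (k x N)."""
    rng = np.random.default_rng(seed)
    M, N = Y.shape
    # Initialize W and H with random non-negative values.
    W = rng.random((M, k)) + eps
    H = rng.random((k, N)) + eps
    for _ in range(n_iter):
        # Multiplicative updates (an assumed choice of optimizer): every
        # factor stays non-negative because each entry is only rescaled by
        # a non-negative ratio. eps guards against division by zero.
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H


def squared_error(Y, W, H):
    """Sum over i, j of (y_ij - (WH)_ij)^2."""
    return float(np.sum((Y - W @ H) ** 2))


# Toy data: a non-negative matrix of rank 2 (6 samples x 5 variables).
rng = np.random.default_rng(1)
Y = rng.random((6, 2)) @ rng.random((2, 5))

W, H = nmf(Y, k=2)
print("W shape:", W.shape, "H shape:", H.shape)
print("squared error:", squared_error(Y, W, H))
```

Rows of `W` correspond to samples and rows of `H` to the k basis patterns, matching the description above. Because the starting values are random, different seeds can lead to different factorizations.
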
NMF Results are “Sparse”
NMF has decomposed the face data into discrete “parts.”
(Lee and Seung, Nature 1999 Oct 21;401(6755):788-91)

PCA of Face Data
Principal components are “holistic” rather than discrete “parts.”
(Lee and Seung, Nature 1999 Oct 21;401(6755):788-91)

Comparison of Matrix Decomposition Methods

Method    Constraints                            Basis        Encodings
PCA/SVD   components are orthogonal              non-sparse   non-sparse
ICA       statistically independent components   sparse       non-sparse
NMF       data and factors are non-negative      sparse       sparse

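As a rough numerical check of the PCA/SVD row of the table, the sketch below computes principal components of a small random non-negative matrix with NumPy. The data and variable names are made up for illustration. It shows that the components are orthonormal and that they carry negative entries, which NMF factors by construction cannot.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((8, 5))  # non-negative toy data: 8 samples x 5 variables

# PCA via SVD: center each column, then take the right singular vectors.
Yc = Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
components = Vt[:2]  # first two principal directions, one per row

# Orthogonality constraint: the Gram matrix of the components is the identity.
gram = components @ components.T

# Orthogonal directions with no zero entries cannot all be non-negative,
# so at least one component mixes positive and negative weights.
has_negative = bool((components < 0).any())

print(np.round(gram, 6))
print("negative entries present:", has_negative)
```
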
curvHDR
- Unsupervised clustering with unknown number of clusters [NLW10]
- Algorithm:
  1. Remove excess boundary points and other debris
  2. Obtain significant high negative curvature regions [DCKW08]
  3. Replace each of the significant curvature regions by their convex hull
  4. Grow each convex hull by a factor G.
  5. Obtain a kernel density estimate for data within each grown region [DCKW08]
  6. The curvHDR gate is the union of the level-τ high density regions (HDRs).
- Currently only the 2D version is implemented, but a 3D version will be released soon

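The hull-growing, density-estimation, and HDR steps listed above can be illustrated with a NumPy-only sketch for a single 2D region. This is not the curvHDR implementation, and several pieces are assumptions: the significance test for curvature [DCKW08] is not reproduced, so a hypothetical seed region (points near a known center) stands in for it; "grow by a factor G" is taken to mean scaling the hull about the mean of its vertices; the kernel is Gaussian with a fixed bandwidth `h`; and a level-τ HDR is taken to be the region holding probability mass 1 − τ. All function names are illustrative.

```python
import numpy as np


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(map(tuple, points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out[:-1]

    return np.array(half(pts) + half(reversed(pts)))


def grow(hull, G):
    """Scale hull vertices about their mean by G (assumed meaning of 'grow')."""
    center = hull.mean(axis=0)
    return center + G * (hull - center)


def inside(hull, pts):
    """True for points inside a convex polygon with counter-clockwise vertices."""
    a = hull
    edge = np.roll(hull, -1, axis=0) - a
    rel = pts[:, None, :] - a[None, :, :]
    cross = edge[None, :, 0] * rel[:, :, 1] - edge[None, :, 1] * rel[:, :, 0]
    return (cross >= 0).all(axis=1)


def kde(data, query, h):
    """2D Gaussian kernel density estimate with fixed bandwidth h."""
    d2 = ((query[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * h * h)).sum(axis=1) / (len(data) * 2 * np.pi * h * h)


def hdr_mask(data, tau, h):
    """Mask of points in the level-tau HDR (assumed to hold mass 1 - tau)."""
    dens = kde(data, data, h)
    return dens >= np.quantile(dens, tau)


rng = np.random.default_rng(0)
cells = rng.normal(0.0, 1.0, size=(300, 2))      # one dense population
debris = rng.uniform(-6.0, 6.0, size=(100, 2))   # scattered background
data = np.vstack([cells, debris])

# Hypothetical stand-in for a significant high-curvature region.
seed = data[np.linalg.norm(data, axis=1) < 0.5]
hull = convex_hull(seed)
grown = grow(hull, G=3.0)
in_region = inside(grown, data)
gate = hdr_mask(data[in_region], tau=0.1, h=0.3)

print("points in grown region:", int(in_region.sum()))
print("points kept by the gate:", int(gate.sum()))
```

With several regions, the same steps would be repeated per region and the resulting gates combined by union, as in the last step of the algorithm.
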
curvHDR: Illustration
(Naumann et al., BMC Bioinformatics 2010 Jan 22;11:44)