EMPIRICAL COMPARISON OF COLUMN SUBSET SELECTION ALGORITHMS
Yining Wang, Aarti Singh
Machine Learning Department, Carnegie Mellon University
COLUMN SUBSET SELECTION
Given $M \in \mathbb{R}^{n_1 \times n_2}$, select a subset of at most $s$ columns of $M$, collected as $C \in \mathbb{R}^{n_1 \times s}$ ($|C| \le s$), so as to minimize $\|M - CC^\dagger M\|_F$.
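For concreteness, a minimal numpy sketch (mine, not from the slides) of the objective: the error of a candidate subset is the Frobenius norm of $M$ minus its projection onto the span of the selected columns.

```python
import numpy as np

def css_error(M, cols):
    """Objective ||M - C C^+ M||_F for a candidate column subset `cols`."""
    C = M[:, cols]                 # n1 x s matrix of the selected columns
    P = C @ np.linalg.pinv(C)      # orthogonal projector onto span(C), i.e. C C^+
    return np.linalg.norm(M - P @ M, "fro")
```

Each algorithm below is a different randomized rule for picking `cols`; this function is what all of them are evaluated on.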
COLUMN SUBSET SELECTION
Interpretable low-rank approximation (compared to PCA).
Applications:
- Unsupervised feature selection
- Image compression
- Genetic analysis: target SNP selection, etc.
Challenge: exact column subset selection is NP-hard.
ALGORITHMS
Deterministic algorithms:
- Rank-revealing QR (RRQR) [Chan, 87]: most accurate, but expensive: $O(n^3)$
Sampling-based algorithms (slightly less accurate, but cheap: $O(n^2 k)$):
- Norm sampling [Frieze et al., 04]
- Leverage score sampling [Drineas et al., 08]
- Iterative norm sampling (approximate volume sampling) [Deshpande & Vempala, 06]
NORM SAMPLING
The algorithm:
1. Compute column norms $\|M^{(i)}\|_2$.
2. Sample each column with probability $p_i \propto \|M^{(i)}\|_2^2$.
Time complexity: $O(n^2)$
Error analysis ("additive error"), where $M_k$ denotes the best rank-$k$ approximation of $M$:
$\|M - CC^\dagger M\|_F^2 \le \|M - M_k\|_F^2 + O(k/s) \cdot \|M\|_F^2$
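A minimal numpy sketch of norm sampling, assuming we draw distinct column indices without replacement (the additive-error analysis is usually stated for i.i.d. sampling with replacement):

```python
import numpy as np

def norm_sampling(M, s, rng=None):
    """Sample s column indices with probability proportional to squared column norms."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = np.sum(M**2, axis=0)           # ||M^(i)||_2^2 for each column i
    p = p / p.sum()
    return rng.choice(M.shape[1], size=s, replace=False, p=p)
```

The single pass over the entries to compute the norms is where the $O(n^2)$ cost comes from.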
LEVERAGE SCORE SAMPLING
The algorithm:
1. Top-$k$ truncated SVD: $M = U_k \Sigma_k V_k^\top + U_{\setminus k} \Sigma_{\setminus k} V_{\setminus k}^\top$.
2. Leverage score sampling: $p_i \propto \|V_k^\top e_i\|_2^2$.
Time complexity: $O(n^2 k)$
Error analysis ("relative error"), assuming $s = \Omega(k^2/\epsilon^2)$:
$\|M - CC^\dagger M\|_F \le (1 + \epsilon) \|M - M_k\|_F$
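A sketch of leverage score sampling under the same conventions; for simplicity it computes a full SVD, which a practical implementation would replace with a truncated rank-$k$ routine to get the $O(n^2 k)$ cost on the slide:

```python
import numpy as np

def leverage_sampling(M, k, s, rng=None):
    """Sample s columns with probability proportional to rank-k leverage scores."""
    if rng is None:
        rng = np.random.default_rng(0)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    lev = np.sum(Vt[:k, :]**2, axis=0)  # leverage of column i: ||V_k^T e_i||_2^2
    p = lev / lev.sum()                 # leverage scores sum to k; normalize
    return rng.choice(M.shape[1], size=s, replace=False, p=p)
```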
ITERATIVE NORM SAMPLING
The algorithm: initialize $C = \emptyset$; repeat until $s$ columns are selected:
1. Compute residuals $r_i = M^{(i)} - CC^\dagger M^{(i)}$.
2. Residual norm sampling: $p_i \propto \|r_i\|_2^2$.
Time complexity: $O(n^2 s)$
Error analysis ("multiplicative error"):
$\mathbb{E}\left[\|M - CC^\dagger M\|_F^2\right] \le (k+1)! \, \|M - M_k\|_F^2$
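A sketch of iterative norm sampling; instead of recomputing $CC^\dagger M$ from scratch each round, it maintains the residual with a Gram-Schmidt style rank-one update, matching the $O(n^2 s)$ cost on the slide (assumes $s \le \mathrm{rank}(M)$ so the residual never vanishes):

```python
import numpy as np

def iterative_norm_sampling(M, s, rng=None):
    """Adaptively sample s columns, each round proportional to squared residual norms."""
    if rng is None:
        rng = np.random.default_rng(0)
    cols = []
    R = M.copy()                        # residual M - C C^+ M; C starts empty
    for _ in range(s):
        p = np.sum(R**2, axis=0)        # ||r_i||_2^2 for each column i
        i = rng.choice(M.shape[1], p=p / p.sum())
        cols.append(i)
        q = R[:, i] / np.linalg.norm(R[:, i])  # new orthonormal direction
        R = R - np.outer(q, q @ R)      # project the new direction out of the residual
    return np.array(cols)
```

Already-selected columns have (numerically) zero residual, so they are effectively never resampled.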
QUESTION
Three different algorithms:
- Norm sampling: $\|M - CC^\dagger M\|_F^2 \le \|M - M_k\|_F^2 + \epsilon \|M\|_F^2$
- Leverage score sampling: $\|M - CC^\dagger M\|_F^2 \le (1 + \epsilon) \|M - M_k\|_F^2$
- Iterative norm sampling: $\|M - CC^\dagger M\|_F^2 \le (k+1)! \, \|M - M_k\|_F^2$
Which one works best in practice?
EXPERIMENTS
Synthetic data:
- Generate an $n \times k$ random Gaussian matrix $A$; set $M = AA^\top$, then normalize so that $\|M\|_F = 1$.
- Coherent design: pick a random column of $M$, enlarge its norm by 10 times, and repeat the same column five times.
- Noise corruption: impose entrywise zero-mean noise on the normalized matrix $M$.
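A hypothetical generator matching this description (numpy-based; the exact coherent construction, e.g. which five columns get overwritten with the enlarged one, is my reading of the slide):

```python
import numpy as np

def synthetic_matrix(n, k, coherent=True, noise=0.0, rng=None):
    """Rank-k design M = A A^T with unit Frobenius norm, optional coherence and noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    A = rng.standard_normal((n, k))
    M = A @ A.T
    M = M / np.linalg.norm(M, "fro")
    if coherent:
        j = rng.integers(n)
        M[:, j] *= 10                                            # enlarge one column's norm 10x
        M[:, rng.choice(n, size=5, replace=False)] = M[:, [j]]   # repeat that column five times
    if noise > 0:
        M += noise * rng.standard_normal((n, n))                 # entrywise zero-mean corruption
    return M
```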
EXPERIMENTS
[Figure: low-rank input, coherent design]
[Figure: full-rank input, coherent design]
[Figure: computational efficiency]
[Figure: human genetic data, HapMap Phase II]
CONCLUSION
- Iterative norm sampling performs much better than leverage score sampling in practice, which is not predicted by existing theoretical results.
- Iterative norm sampling is also computationally cheaper than leverage score sampling, which requires a truncated SVD.
- This calls for improved analysis of iterative norm sampling!
REFERENCES
T.F. Chan, "Rank Revealing QR Factorizations," Linear Algebra and Its Applications, vol. 88, pp. 67-82, 1987.
A. Frieze, R. Kannan, and S. Vempala, "Fast Monte-Carlo Algorithms for Finding Low-rank Approximations," Journal of the ACM, vol. 51, no. 6, pp. 1025-1041, 2004.
P. Drineas, M.W. Mahoney, and S. Muthukrishnan, "Relative-error CUR Matrix Decompositions," SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 2, pp. 844-881, 2008.
A. Deshpande and S. Vempala, "Adaptive Sampling and Fast Low-rank Matrix Approximation," in Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques, 2006, pp. 292-303.