Recent Theoretical Advances in Sparse Approximation (Joel A. Tropp) - PowerPoint PPT Presentation
  1. Recent Theoretical Advances in Sparse Approximation
  Joel A. Tropp <jtropp@ices.utexas.edu>
  Institute for Computational Engineering and Sciences, The University of Texas at Austin
  Includes joint work with A. C. Gilbert, S. Muthukrishnan, and M. J. Strauss of AT&T Research. S. Muthukrishnan is also affiliated with Rutgers Univ.

  2-4. What is Sparse Approximation?
  ❧ We work in the finite-dimensional Hilbert space C^d
  ❧ Let D = { φ_ω } be a dictionary of N unit-norm atoms indexed by Ω
  ❧ Let m be a fixed, positive integer
  ❧ Suppose x is an arbitrary input vector
  ❧ The sparse approximation problem is to solve

      \min_{\Lambda \subset \Omega} \; \min_{b \in \mathbb{C}^{\Lambda}} \Big\| x - \sum_{\lambda \in \Lambda} b_\lambda \varphi_\lambda \Big\|_2 \quad \text{subject to} \quad |\Lambda| \le m

  ❧ The inner minimization is a least-squares problem
  ❧ But the outer minimization is combinatorial
  ❧ Formally, we call the problem (D, m)-Sparse

  Greed is Good
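To make the two nested minimizations concrete, here is a minimal brute-force sketch, not from the talk: plain numpy with a small made-up dictionary, solving (D, m)-Sparse exactly by enumerating every m-atom index set Λ and solving the inner least-squares problem for each.

```python
import itertools

import numpy as np

def sparse_approx_exhaustive(x, D, m):
    """Solve (D, m)-Sparse by brute force: for every index set Lambda with
    |Lambda| = m, solve the inner least-squares problem and keep the best.
    D is a d x N matrix whose columns are the unit-norm atoms."""
    d, N = D.shape
    best_err, best_idx, best_coef = np.inf, (), None
    for Lam in itertools.combinations(range(N), m):
        A = D[:, list(Lam)]
        b, *_ = np.linalg.lstsq(A, x, rcond=None)   # inner minimization
        err = np.linalg.norm(x - A @ b)
        if err < best_err:
            best_err, best_idx, best_coef = err, Lam, b
    return best_idx, best_coef, best_err

# Toy dictionary in R^3: three spikes plus one redundant atom (all unit-norm)
D = np.array([[1.0, 0.0, 0.0, 0.6],
              [0.0, 1.0, 0.0, 0.8],
              [0.0, 0.0, 1.0, 0.0]])
x = 2.0 * D[:, 0] + 3.0 * D[:, 3]      # a 2-sparse input
idx, coef, err = sparse_approx_exhaustive(x, D, 2)
```

The enumeration visits all C(N, m) subsets, which is exactly the combinatorial outer minimization the slide warns about; it is feasible only at toy sizes.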

  5-7. Basic Dictionary Properties
  ❧ The dictionary is complete if the atoms span C^d
  ❧ The dictionary is redundant if it contains linearly dependent atoms
  ❧ A complete dictionary can represent every vector without error
  ❧ Each vector has infinitely many representations over a redundant dictionary
  ❧ In most modern applications, dictionaries are complete and redundant
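A small numpy illustration (mine, not the speaker's) of the last point: over a complete, redundant dictionary in R^2, the same vector admits more than one exact representation.

```python
import numpy as np

# A complete, redundant dictionary in R^2: the two spikes plus their
# normalized sum (three unit-norm atoms, necessarily linearly dependent).
phi1 = np.array([1.0, 0.0])
phi2 = np.array([0.0, 1.0])
phi3 = (phi1 + phi2) / np.sqrt(2.0)

x = np.array([1.0, 1.0])

# The same vector, represented two different ways over the dictionary:
rep_a = 1.0 * phi1 + 1.0 * phi2    # two atoms
rep_b = np.sqrt(2.0) * phi3        # a single atom
```

This non-uniqueness is what makes sparse approximation over redundant dictionaries interesting: among the many representations, we want a sparse one.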

  8-10. Subset Selection in Regression
  ❧ Suppose x is a vector of d observations of a random variable X
  ❧ Suppose φ_ω is a vector of d observations of a random variable Φ_ω
  ❧ Want to find a small subset of { Φ_ω } for linear prediction of X
  ❧ Method: solve the sparse approximation problem!
  ❧ Statisticians have developed many approaches:
    1. Forward selection
    2. Backward elimination
    3. Sequential replacement
    4. Stepwise regression [Efroymson 1960]
    5. Exhaustive search [Garside 1965, Beale et al. 1967]
    6. Projection Pursuit Regression [Friedman-Stuetzle 1981]
  Reference: [A. J. Miller 2002]
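As a sketch of approach 1, forward selection, here is a hedged numpy version (the function name and demo data are made up): at each step, greedily add the predictor whose inclusion gives the smallest least-squares residual.

```python
import numpy as np

def forward_selection(X, y, k):
    """Forward selection: at each step, add the column of X that gives the
    smallest least-squares residual when refit jointly with the columns
    already chosen."""
    chosen = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            A = X[:, chosen + [j]]
            b, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.linalg.norm(y - A @ b)
            if err < best_err:
                best_j, best_err = j, err
        chosen.append(best_j)
    return chosen

# Demo: 5 orthonormal predictors, response built from predictors 1 and 4
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 5)))
y = 2.0 * Q[:, 1] - 3.0 * Q[:, 4]
chosen = forward_selection(Q, y, 2)
```

With orthonormal predictors, as in this demo, the greedy choice finds the true support; with strongly correlated predictors it can fail, which is one reason the other listed approaches exist.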

  11. Transform Coding
  ❧ In its simplest form, transform coding can be viewed as a sparse approximation problem
  [Figure: image → DCT → IDCT → reconstruction]
  Reference: [Evans-Mersereau 2003]
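The DCT → IDCT pipeline can be mimicked in a few lines of numpy (a sketch under my own assumptions, using an explicitly constructed orthonormal DCT-II matrix rather than a library transform): keep the m largest transform coefficients and invert.

```python
import numpy as np

def dct_matrix(d):
    """Orthonormal DCT-II matrix: row k is the k-th cosine atom."""
    k = np.arange(d)[:, None]
    n = np.arange(d)[None, :]
    C = np.sqrt(2.0 / d) * np.cos(np.pi * k * (n + 0.5) / d)
    C[0] /= np.sqrt(2.0)
    return C

d, m = 64, 8
C = dct_matrix(d)

# A smooth test signal is highly compressible in the DCT basis
t = np.arange(d) / d
x = np.cos(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)

coeffs = C @ x                          # "DCT" step: analysis coefficients
keep = np.argsort(-np.abs(coeffs))[:m]  # indices of the m largest
sparse = np.zeros_like(coeffs)
sparse[keep] = coeffs[keep]
x_hat = C.T @ sparse                    # "IDCT" step: synthesis
err = np.linalg.norm(x - x_hat)
```

Because the DCT atoms form an orthonormal basis, the squared reconstruction error equals exactly the energy in the discarded coefficients, a fact the later slides on orthonormal dictionaries make precise.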

  12. Computational Complexity
  Theorem 1 [Davis 1994, Natarajan 1995]. Any instance of Exact Cover by Three Sets (x3c) is reducible in polynomial time to a sparse approximation problem.
  [Figure: an instance of x3c]

  13-14. Computational Complexity II
  Corollary 2. Any algorithm that can solve (D, m)-Sparse for every dictionary and sparsity level must solve an NP-hard problem.
  ❧ It is widely believed that no tractable algorithms exist for NP-hard problems
  ❧ BUT a specific problem (D, m)-Sparse may be easy
  ❧ AND preprocessing is allowed

  15-19. Orthonormal Dictionaries
  ❧ Suppose that D is an orthonormal basis (ONB)
  ❧ For any vector x and sparsity level m:
    1. Sort the indices { ω_n } so the numbers |⟨x, φ_{ω_n}⟩| are decreasing
    2. The solution to (D, m)-Sparse for input x is \sum_{n=1}^{m} \langle x, \varphi_{\omega_n} \rangle \, \varphi_{\omega_n}
    3. The squared approximation error is \sum_{n=m+1}^{d} |\langle x, \varphi_{\omega_n} \rangle|^2
  Insight: (D, m)-Sparse can be solved approximately so long as sub-collections of m atoms in D are sufficiently close to being orthogonal.
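The optimality of coefficient thresholding for an ONB is easy to check numerically; the following sketch (not from the talk) compares the sorted-coefficient solution against exhaustive search over all m-atom subsets in a small random orthonormal basis.

```python
import itertools

import numpy as np

# For an ONB, keeping the m largest coefficients should match exhaustive
# search over all m-atom subsets; verify this on a random small instance.
rng = np.random.default_rng(1)
d, m = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # columns form a random ONB
x = rng.standard_normal(d)

c = Q.T @ x                                        # coefficients <x, phi_n>
order = np.argsort(-np.abs(c))                     # sort so |<x, phi_n>| decreases
a = Q[:, order[:m]] @ c[order[:m]]                 # keep the m largest terms
err_greedy = np.linalg.norm(x - a)

# Exhaustive search: project x onto the span of every m-subset of atoms
err_best = min(
    np.linalg.norm(x - Q[:, list(S)] @ (Q[:, list(S)].T @ x))
    for S in itertools.combinations(range(d), m)
)
```

The two errors agree, and the squared error equals the energy in the d − m smallest coefficients, matching item 3 on the slide.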

  20-21. Coherence
  ❧ Donoho and Huo introduced the coherence parameter µ of a dictionary:

      \mu = \max_{j \ne k} |\langle \varphi_{\omega_j}, \varphi_{\omega_k} \rangle|

  ❧ It measures how much distinct atoms look alike
  ❧ Many natural dictionaries are incoherent [Donoho-Huo 2000]
  ❧ Example: spikes + sines  [Figure: spike and sine atoms]
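The definition translates directly into numpy (my sketch; the spikes-plus-exponentials dictionary below is a complex cousin of the spikes + sines example, chosen so the coherence comes out to exactly 1/√d):

```python
import numpy as np

def coherence(D):
    """Coherence mu: the largest |<phi_j, phi_k>| over distinct atoms
    (columns of D)."""
    G = np.abs(D.conj().T @ D)   # Gram matrix magnitudes
    np.fill_diagonal(G, 0.0)     # ignore each atom against itself
    return G.max()

# Spikes + normalized Fourier exponentials in C^d: every spike/exponential
# pair has inner product of magnitude exactly 1/sqrt(d)
d = 16
F = np.exp(2j * np.pi * np.outer(np.arange(d), np.arange(d)) / d) / np.sqrt(d)
D = np.hstack([np.eye(d), F])
mu = coherence(D)
```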

  22-23. Coherence Bounds
  ❧ In general,

      \mu \ge \sqrt{ \frac{N - d}{d\,(N - 1)} }

  ❧ If the dictionary contains an orthonormal basis,

      \mu \ge \frac{1}{\sqrt{d}}

  ❧ Incoherent dictionaries can be enormous [GMS 2003]
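The general lower bound (the Welch bound) can be sanity-checked numerically; this sketch, with made-up dimensions, draws a random unit-norm dictionary and verifies that its coherence sits above the bound.

```python
import numpy as np

# Draw a random unit-norm dictionary and check it obeys the general bound
rng = np.random.default_rng(2)
d, N = 8, 20
D = rng.standard_normal((d, N))
D /= np.linalg.norm(D, axis=0)          # normalize the atoms

G = np.abs(D.T @ D)
np.fill_diagonal(G, 0.0)
mu = G.max()                            # coherence of this dictionary

welch = np.sqrt((N - d) / (d * (N - 1)))  # the lower bound from the slide
```

Random dictionaries typically sit well above the bound; dictionaries that achieve it with equality are the equiangular tight frames.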

  24. Quasi-Coherence
  ❧ Donoho-Elad [2003] and JAT [2003] independently introduced the quasi-coherence:

      \mu_1(m) = \max_{\omega} \; \max_{\lambda_1, \ldots, \lambda_m} \sum_{t=1}^{m} |\langle \varphi_\omega, \varphi_{\lambda_t} \rangle|

  ❧ Observe that µ_1(1) = µ
  ❧ The coherence gives the bound µ_1(m) ≤ m µ
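The definition translates into a short numpy routine (my sketch; the helper name quasi_coherence is made up): for each atom, sum the m largest absolute inner products with the other atoms, then take the worst atom.

```python
import numpy as np

def quasi_coherence(D, m):
    """Quasi-coherence mu_1(m): for each atom, sum the m largest absolute
    inner products with the *other* atoms; return the worst case."""
    G = np.abs(D.conj().T @ D)
    np.fill_diagonal(G, 0.0)        # exclude <phi, phi> = 1
    G.sort(axis=1)                  # ascending within each row
    return G[:, -m:].sum(axis=1).max()

# A random unit-norm dictionary for demonstration
rng = np.random.default_rng(3)
D = rng.standard_normal((6, 10))
D /= np.linalg.norm(D, axis=0)

mu = quasi_coherence(D, 1)          # mu_1(1) is the plain coherence mu
```

On any dictionary, µ_1 is nondecreasing in m and never exceeds m·µ, which is what makes it a finer tool than the coherence alone.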

  25. Quasi-Coherence Example
  ❧ Consider the dictionary of translates of a double pulse
  [Figure: the double pulse, with heights \sqrt{35}/6 and 1/6]
  ❧ The coherence is µ = \sqrt{35}/36
  ❧ The quasi-coherence is

      \mu_1(m) = \begin{cases} \sqrt{35}/36, & m = 1 \\ \sqrt{35}/18, & m = 2 \\ \sqrt{35}/12, & m \ge 3 \end{cases}

  26. Roadmap
  ❧ First, a few basic algorithms for sparse approximation
  ❧ Then, the role of quasi-coherence in the performance of these algorithms
  ❧ Finally, a new algorithm that offers better approximation guarantees

  27-29. Matching Pursuit (MP)
  ❧ In 1993, Mallat and Zhang presented a greedy method for sparse approximation over redundant dictionaries
  ❧ Equivalent to Projection Pursuit Regression [Friedman-Stuetzle 1981]
  ❧ Developed independently by Qian and Chen [1993]
  ❧ Procedure:
    1. Initialize a_0 = 0 and r_0 = x
    2. At step t, select an atom \varphi_{\lambda_t} that solves \max_{\omega} |\langle r_{t-1}, \varphi_\omega \rangle|
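The transcript breaks off at the selection step; the standard Mallat-Zhang iteration also subtracts the selected atom's contribution from the residual. A hedged numpy sketch of the full loop (function name and demo data are mine):

```python
import numpy as np

def matching_pursuit(x, D, steps):
    """Matching Pursuit (Mallat-Zhang): greedily select the atom most
    correlated with the current residual, add its contribution to the
    approximation, and subtract it from the residual."""
    a = np.zeros_like(x)
    r = x.copy()
    for _ in range(steps):
        corr = D.conj().T @ r                 # <r_{t-1}, phi_omega> for all atoms
        lam = int(np.argmax(np.abs(corr)))    # atom solving max |<r, phi>|
        a = a + corr[lam] * D[:, lam]
        r = r - corr[lam] * D[:, lam]
    return a, r

# Demo: over an orthonormal dictionary, a 2-term signal is recovered in 2 steps
Q, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((8, 8)))
x = 2.0 * Q[:, 1] - 3.0 * Q[:, 5]
a, r = matching_pursuit(x, Q, 2)
```

Over an orthonormal dictionary the demo terminates exactly; over redundant dictionaries the residual only converges, and quantifying how fast is where the quasi-coherence analysis comes in.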
