Recent Theoretical Advances in Sparse Approximation (Joel A. Tropp) - PowerPoint PPT Presentation
  1. Recent Theoretical Advances in Sparse Approximation
  Joel A. Tropp <jtropp@ices.utexas.edu>
  Institute for Computational Engineering and Sciences, The University of Texas at Austin
  Includes joint work with A. C. Gilbert, S. Muthukrishnan, and M. J. Strauss of AT&T Research. S. Muthukrishnan is also affiliated with Rutgers Univ.

  2-4. What is Sparse Approximation?
  ❧ We work in the finite-dimensional Hilbert space C^d
  ❧ Let D = { φ_ω } be a dictionary of N unit-norm atoms indexed by Ω
  ❧ Let m be a fixed, positive integer
  ❧ Suppose x is an arbitrary input vector
  ❧ The sparse approximation problem is to solve

      \min_{\Lambda \subset \Omega} \; \min_{b \in \mathbb{C}^{\Lambda}} \Big\| x - \sum_{\lambda \in \Lambda} b_\lambda \varphi_\lambda \Big\|_2 \quad \text{subject to} \quad |\Lambda| \le m

  ❧ The inner minimization is a least-squares problem
  ❧ But the outer minimization is combinatorial
  ❧ Formally, we call the problem (D, m)-Sparse

  Greed is Good
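To make the two nested minimizations concrete, here is a minimal brute-force sketch, not from the talk: plain numpy with a small made-up dictionary, solving (D, m)-Sparse exactly by enumerating every m-atom index set Λ and solving the inner least-squares problem for each.

```python
import itertools

import numpy as np

def sparse_approx_exhaustive(x, D, m):
    """Solve (D, m)-Sparse by brute force: for every index set Lambda with
    |Lambda| = m, solve the inner least-squares problem and keep the best.
    D is a d x N matrix whose columns are the unit-norm atoms."""
    d, N = D.shape
    best_err, best_idx, best_coef = np.inf, (), None
    for Lam in itertools.combinations(range(N), m):
        A = D[:, list(Lam)]
        b, *_ = np.linalg.lstsq(A, x, rcond=None)   # inner minimization
        err = np.linalg.norm(x - A @ b)
        if err < best_err:
            best_err, best_idx, best_coef = err, Lam, b
    return best_idx, best_coef, best_err

# Toy dictionary in R^3: three spikes plus one redundant atom (all unit-norm)
D = np.array([[1.0, 0.0, 0.0, 0.6],
              [0.0, 1.0, 0.0, 0.8],
              [0.0, 0.0, 1.0, 0.0]])
x = 2.0 * D[:, 0] + 3.0 * D[:, 3]      # a 2-sparse input
idx, coef, err = sparse_approx_exhaustive(x, D, 2)
```

The enumeration visits all C(N, m) subsets, which is exactly the combinatorial outer minimization the slide warns about; it is feasible only at toy sizes.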

  5-7. Basic Dictionary Properties
  ❧ The dictionary is complete if the atoms span C^d
  ❧ The dictionary is redundant if it contains linearly dependent atoms
  ❧ A complete dictionary can represent every vector without error
  ❧ Each vector has infinitely many representations over a redundant dictionary
  ❧ In most modern applications, dictionaries are complete and redundant
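A small numpy illustration (mine, not the speaker's) of the last point: over a complete, redundant dictionary in R^2, the same vector admits more than one exact representation.

```python
import numpy as np

# A complete, redundant dictionary in R^2: the two spikes plus their
# normalized sum (three unit-norm atoms, necessarily linearly dependent).
phi1 = np.array([1.0, 0.0])
phi2 = np.array([0.0, 1.0])
phi3 = (phi1 + phi2) / np.sqrt(2.0)

x = np.array([1.0, 1.0])

# The same vector, represented two different ways over the dictionary:
rep_a = 1.0 * phi1 + 1.0 * phi2    # two atoms
rep_b = np.sqrt(2.0) * phi3        # a single atom
```

This non-uniqueness is what makes sparse approximation over redundant dictionaries interesting: among the many representations, we want a sparse one.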

  8-10. Subset Selection in Regression
  ❧ Suppose x is a vector of d observations of a random variable X
  ❧ Suppose φ_ω is a vector of d observations of a random variable Φ_ω
  ❧ Want to find a small subset of { Φ_ω } for linear prediction of X
  ❧ Method: solve the sparse approximation problem!
  ❧ Statisticians have developed many approaches:
    1. Forward selection
    2. Backward elimination
    3. Sequential replacement
    4. Stepwise regression [Efroymson 1960]
    5. Exhaustive search [Garside 1965, Beale et al. 1967]
    6. Projection Pursuit Regression [Friedman-Stuetzle 1981]
  Reference: [A. J. Miller 2002]
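As a sketch of approach 1, forward selection, here is a hedged numpy version (the function name and demo data are made up): at each step, greedily add the predictor whose inclusion gives the smallest least-squares residual.

```python
import numpy as np

def forward_selection(X, y, k):
    """Forward selection: at each step, add the column of X that gives the
    smallest least-squares residual when refit jointly with the columns
    already chosen."""
    chosen = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            A = X[:, chosen + [j]]
            b, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.linalg.norm(y - A @ b)
            if err < best_err:
                best_j, best_err = j, err
        chosen.append(best_j)
    return chosen

# Demo: 5 orthonormal predictors, response built from predictors 1 and 4
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 5)))
y = 2.0 * Q[:, 1] - 3.0 * Q[:, 4]
chosen = forward_selection(Q, y, 2)
```

With orthonormal predictors, as in this demo, the greedy choice finds the true support; with strongly correlated predictors it can fail, which is one reason the other listed approaches exist.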

  11. Transform Coding
  ❧ In its simplest form, transform coding can be viewed as a sparse approximation problem
  [Figure: image → DCT → IDCT → reconstruction]
  Reference: [Evans-Mersereau 2003]
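The DCT → IDCT pipeline can be mimicked in a few lines of numpy (a sketch under my own assumptions, using an explicitly constructed orthonormal DCT-II matrix rather than a library transform): keep the m largest transform coefficients and invert.

```python
import numpy as np

def dct_matrix(d):
    """Orthonormal DCT-II matrix: row k is the k-th cosine atom."""
    k = np.arange(d)[:, None]
    n = np.arange(d)[None, :]
    C = np.sqrt(2.0 / d) * np.cos(np.pi * k * (n + 0.5) / d)
    C[0] /= np.sqrt(2.0)
    return C

d, m = 64, 8
C = dct_matrix(d)

# A smooth test signal is highly compressible in the DCT basis
t = np.arange(d) / d
x = np.cos(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)

coeffs = C @ x                          # "DCT" step: analysis coefficients
keep = np.argsort(-np.abs(coeffs))[:m]  # indices of the m largest
sparse = np.zeros_like(coeffs)
sparse[keep] = coeffs[keep]
x_hat = C.T @ sparse                    # "IDCT" step: synthesis
err = np.linalg.norm(x - x_hat)
```

Because the DCT atoms form an orthonormal basis, the squared reconstruction error equals exactly the energy in the discarded coefficients, a fact the later slides on orthonormal dictionaries make precise.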

  12. Computational Complexity
  Theorem 1 [Davis 1994, Natarajan 1995]. Any instance of Exact Cover by Three Sets (x3c) is reducible in polynomial time to a sparse approximation problem.
  [Figure: an instance of x3c]

  13-14. Computational Complexity II
  Corollary 2. Any algorithm that can solve (D, m)-Sparse for every dictionary and sparsity level must solve an NP-hard problem.
  ❧ It is widely believed that no tractable algorithms exist for NP-hard problems
  ❧ BUT a specific problem (D, m)-Sparse may be easy
  ❧ AND preprocessing is allowed

  15-19. Orthonormal Dictionaries
  ❧ Suppose that D is an orthonormal basis (ONB)
  ❧ For any vector x and sparsity level m:
    1. Sort the indices { ω_n } so the numbers |⟨x, φ_{ω_n}⟩| are decreasing
    2. The solution to (D, m)-Sparse for input x is \sum_{n=1}^{m} \langle x, \varphi_{\omega_n} \rangle \, \varphi_{\omega_n}
    3. The squared approximation error is \sum_{n=m+1}^{d} |\langle x, \varphi_{\omega_n} \rangle|^2
  Insight: (D, m)-Sparse can be solved approximately so long as sub-collections of m atoms in D are sufficiently close to being orthogonal.
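The optimality of coefficient thresholding for an ONB is easy to check numerically; the following sketch (not from the talk) compares the sorted-coefficient solution against exhaustive search over all m-atom subsets in a small random orthonormal basis.

```python
import itertools

import numpy as np

# For an ONB, keeping the m largest coefficients should match exhaustive
# search over all m-atom subsets; verify this on a random small instance.
rng = np.random.default_rng(1)
d, m = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # columns form a random ONB
x = rng.standard_normal(d)

c = Q.T @ x                                        # coefficients <x, phi_n>
order = np.argsort(-np.abs(c))                     # sort so |<x, phi_n>| decreases
a = Q[:, order[:m]] @ c[order[:m]]                 # keep the m largest terms
err_greedy = np.linalg.norm(x - a)

# Exhaustive search: project x onto the span of every m-subset of atoms
err_best = min(
    np.linalg.norm(x - Q[:, list(S)] @ (Q[:, list(S)].T @ x))
    for S in itertools.combinations(range(d), m)
)
```

The two errors agree, and the squared error equals the energy in the d − m smallest coefficients, matching item 3 on the slide.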

  20-21. Coherence
  ❧ Donoho and Huo introduced the coherence parameter µ of a dictionary:

      \mu = \max_{j \ne k} |\langle \varphi_{\omega_j}, \varphi_{\omega_k} \rangle|

  ❧ It measures how much distinct atoms look alike
  ❧ Many natural dictionaries are incoherent [Donoho-Huo 2000]
  ❧ Example: spikes + sines  [Figure: spike and sine atoms]
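The definition translates directly into numpy (my sketch; the spikes-plus-exponentials dictionary below is a complex cousin of the spikes + sines example, chosen so the coherence comes out to exactly 1/√d):

```python
import numpy as np

def coherence(D):
    """Coherence mu: the largest |<phi_j, phi_k>| over distinct atoms
    (columns of D)."""
    G = np.abs(D.conj().T @ D)   # Gram matrix magnitudes
    np.fill_diagonal(G, 0.0)     # ignore each atom against itself
    return G.max()

# Spikes + normalized Fourier exponentials in C^d: every spike/exponential
# pair has inner product of magnitude exactly 1/sqrt(d)
d = 16
F = np.exp(2j * np.pi * np.outer(np.arange(d), np.arange(d)) / d) / np.sqrt(d)
D = np.hstack([np.eye(d), F])
mu = coherence(D)
```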

  22-23. Coherence Bounds
  ❧ In general,

      \mu \ge \sqrt{ \frac{N - d}{d\,(N - 1)} }

  ❧ If the dictionary contains an orthonormal basis,

      \mu \ge \frac{1}{\sqrt{d}}

  ❧ Incoherent dictionaries can be enormous [GMS 2003]
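The general lower bound (the Welch bound) can be sanity-checked numerically; this sketch, with made-up dimensions, draws a random unit-norm dictionary and verifies that its coherence sits above the bound.

```python
import numpy as np

# Draw a random unit-norm dictionary and check it obeys the general bound
rng = np.random.default_rng(2)
d, N = 8, 20
D = rng.standard_normal((d, N))
D /= np.linalg.norm(D, axis=0)          # normalize the atoms

G = np.abs(D.T @ D)
np.fill_diagonal(G, 0.0)
mu = G.max()                            # coherence of this dictionary

welch = np.sqrt((N - d) / (d * (N - 1)))  # the lower bound from the slide
```

Random dictionaries typically sit well above the bound; dictionaries that achieve it with equality are the equiangular tight frames.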

  24. Quasi-Coherence
  ❧ Donoho-Elad [2003] and JAT [2003] independently introduced the quasi-coherence:

      \mu_1(m) = \max_{\omega} \; \max_{\lambda_1, \ldots, \lambda_m} \sum_{t=1}^{m} |\langle \varphi_\omega, \varphi_{\lambda_t} \rangle|

  ❧ Observe that µ_1(1) = µ
  ❧ The coherence gives the bound µ_1(m) ≤ m µ
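The definition translates into a short numpy routine (my sketch; the helper name quasi_coherence is made up): for each atom, sum the m largest absolute inner products with the other atoms, then take the worst atom.

```python
import numpy as np

def quasi_coherence(D, m):
    """Quasi-coherence mu_1(m): for each atom, sum the m largest absolute
    inner products with the *other* atoms; return the worst case."""
    G = np.abs(D.conj().T @ D)
    np.fill_diagonal(G, 0.0)        # exclude <phi, phi> = 1
    G.sort(axis=1)                  # ascending within each row
    return G[:, -m:].sum(axis=1).max()

# A random unit-norm dictionary for demonstration
rng = np.random.default_rng(3)
D = rng.standard_normal((6, 10))
D /= np.linalg.norm(D, axis=0)

mu = quasi_coherence(D, 1)          # mu_1(1) is the plain coherence mu
```

On any dictionary, µ_1 is nondecreasing in m and never exceeds m·µ, which is what makes it a finer tool than the coherence alone.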

  25. Quasi-Coherence Example
  ❧ Consider the dictionary of translates of a double pulse
  [Figure: the double pulse, with heights \sqrt{35}/6 and 1/6]
  ❧ The coherence is µ = \sqrt{35}/36
  ❧ The quasi-coherence is

      \mu_1(m) = \begin{cases} \sqrt{35}/36, & m = 1 \\ \sqrt{35}/18, & m = 2 \\ \sqrt{35}/12, & m \ge 3 \end{cases}

  26. Roadmap
  ❧ First, a few basic algorithms for sparse approximation
  ❧ Then, the role of quasi-coherence in the performance of these algorithms
  ❧ Finally, a new algorithm that offers better approximation guarantees

  27-29. Matching Pursuit (MP)
  ❧ In 1993, Mallat and Zhang presented a greedy method for sparse approximation over redundant dictionaries
  ❧ Equivalent to Projection Pursuit Regression [Friedman-Stuetzle 1981]
  ❧ Developed independently by Qian and Chen [1993]
  ❧ Procedure:
    1. Initialize a_0 = 0 and r_0 = x
    2. At step t, select an atom \varphi_{\lambda_t} that solves \max_{\omega} |\langle r_{t-1}, \varphi_\omega \rangle|
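The transcript breaks off at the selection step; the standard Mallat-Zhang iteration also subtracts the selected atom's contribution from the residual. A hedged numpy sketch of the full loop (function name and demo data are mine):

```python
import numpy as np

def matching_pursuit(x, D, steps):
    """Matching Pursuit (Mallat-Zhang): greedily select the atom most
    correlated with the current residual, add its contribution to the
    approximation, and subtract it from the residual."""
    a = np.zeros_like(x)
    r = x.copy()
    for _ in range(steps):
        corr = D.conj().T @ r                 # <r_{t-1}, phi_omega> for all atoms
        lam = int(np.argmax(np.abs(corr)))    # atom solving max |<r, phi>|
        a = a + corr[lam] * D[:, lam]
        r = r - corr[lam] * D[:, lam]
    return a, r

# Demo: over an orthonormal dictionary, a 2-term signal is recovered in 2 steps
Q, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((8, 8)))
x = 2.0 * Q[:, 1] - 3.0 * Q[:, 5]
a, r = matching_pursuit(x, Q, 2)
```

Over an orthonormal dictionary the demo terminates exactly; over redundant dictionaries the residual only converges, and quantifying how fast is where the quasi-coherence analysis comes in.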
