1. Reconstruction from Anisotropic Random Measurements
Mark Rudelson and Shuheng Zhou
The University of Michigan, Ann Arbor
Coding, Complexity, and Sparsity Workshop 2013
Ann Arbor, Michigan, August 7, 2013

2. Want to estimate a parameter β ∈ R^p
Example: how is a response y ∈ R (say, Parkinson's disease status) affected by a set of genes among the Chinese population?
Construct a linear model: y = β^T x + ε, where E(y | x) = β^T x.
◮ Parameter: the non-zero entries of β (the sparsity of β) identify a subset of genes and indicate how much they influence y.
Take a random sample of (X, Y) and use the sample to estimate β; that is, we have Y = Xβ + ε.
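
A minimal numpy sketch of this setup (the dimensions, sparsity level, and noise scale below are illustrative choices, not values from the talk): draw an s-sparse β, a random design X, and noisy responses Y = Xβ + ε.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                     # illustrative sizes: n samples, p parameters, s non-zeros

beta = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
beta[support] = rng.normal(size=s)        # the s non-zero coefficients of beta

X = rng.normal(size=(n, p))               # random design matrix
eps = 0.1 * rng.normal(size=n)            # noise
Y = X @ beta + eps                        # observations: Y = X beta + eps
```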

3. Model selection and parameter estimation
When can we approximately recover β from n noisy observations Y?
Questions:
◮ How many measurements n do we need in order to recover the non-zero positions in β?
◮ How does n scale with p or s, where s is the number of non-zero entries of β?
◮ What assumptions about the data matrix X are reasonable?

4. Sparse recovery
β is known to be s-sparse for some 1 ≤ s ≤ n: at most s of the coefficients of β are non-zero.
Assume every 2s columns of X are linearly independent; this identifiability condition is reasonable once n ≥ 2s:
Λ_min(2s) ≜ min over υ ≠ 0, υ 2s-sparse of ‖Xυ‖_2 / (√n ‖υ‖_2) > 0.
Proposition (Candès-Tao 05). Suppose that any 2s columns of the n × p matrix X are linearly independent. Then any s-sparse signal β ∈ R^p can be reconstructed uniquely from Xβ.

5. ℓ0-minimization
How do we reconstruct an s-sparse signal β ∈ R^p from the measurements Y = Xβ, given Λ_min(2s) > 0?
β is then the unique sparsest solution to Xβ = Y:
β = arg min over {β : Xβ = Y} of ‖β‖_0,
where ‖β‖_0 := #{1 ≤ i ≤ p : β_i ≠ 0} is the sparsity of β.
Unfortunately, ℓ0-minimization is computationally intractable (in fact, it is an NP-complete problem).
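
To make the intractability concrete, here is a hypothetical brute-force sketch of ℓ0 recovery that enumerates every candidate support and solves a least-squares problem on each; the search space grows combinatorially in p, which is exactly why this is only feasible for toy sizes.

```python
import numpy as np
from itertools import combinations

def l0_recover(X, Y, s, tol=1e-8):
    """Brute-force l0 recovery from noiseless Y = X beta:
    try every support of size <= s and keep the first exact fit.
    The loop is exponential in s, which is why l0-minimization is intractable."""
    n, p = X.shape
    for k in range(s + 1):
        for S in combinations(range(p), k):
            S = list(S)
            if not S:
                if np.linalg.norm(Y) <= tol:
                    return np.zeros(p)
                continue
            coef, *_ = np.linalg.lstsq(X[:, S], Y, rcond=None)
            if np.linalg.norm(X[:, S] @ coef - Y) <= tol:
                beta = np.zeros(p)
                beta[S] = coef
                return beta
    return None
```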

6. Basis pursuit
Consider the following convex optimization problem:
β^* := arg min over {β : Xβ = Y} of ‖β‖_1.
Basis pursuit works whenever the n × p measurement matrix X is sufficiently incoherent.
RIP (Candès-Tao 05) requires that for all T ⊂ {1, …, p} with |T| ≤ s and for all coefficient sequences (c_j)_{j ∈ T},
(1 − δ_s) ‖c‖_2 ≤ ‖X_T c‖_2 / √n ≤ (1 + δ_s) ‖c‖_2
holds for some 0 < δ_s < 1 (the s-restricted isometry constant).
The "good" matrices for compressed sensing should satisfy these inequalities for the largest possible s.
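
Basis pursuit is a linear program, so it can be solved with an off-the-shelf LP solver. A sketch using scipy (the split of β into nonnegative parts u and v is a standard reformulation, not something specified in the talk):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, Y):
    """Solve min ||beta||_1 subject to X beta = Y as a linear program.
    Write beta = u - v with u, v >= 0, so that ||beta||_1 = sum(u) + sum(v)."""
    n, p = X.shape
    c = np.ones(2 * p)                    # objective: sum of all u_i and v_i
    A_eq = np.hstack([X, -X])             # equality constraint: X u - X v = Y
    res = linprog(c, A_eq=A_eq, b_eq=Y, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v
```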

7. Restricted Isometry Property (RIP): examples
◮ For a Gaussian random matrix, or any sub-Gaussian ensemble, RIP holds with s ≍ n / log(p/n).
◮ For the random Fourier ensemble, or randomly sampled rows of orthonormal matrices, RIP holds for s = O(n / log^4 p).
◮ For a random matrix composed of columns that are independent isotropic vectors with log-concave densities, RIP holds for s = O(n / log^2(p/n)).
References: Candès-Tao 05, 06, Rudelson-Vershynin 05, Donoho 06, Baraniuk et al. 08, Mendelson et al. 08, Adamczak et al. 09.

8. Basis pursuit for high-dimensional data
These algorithms are also robust with regard to noise, and RIP can be replaced by more relaxed conditions. In particular, the isotropicity condition assumed in all of the literature cited above needs to be dropped.
Let X_i ∈ R^p, i = 1, …, n, be the i.i.d. random row vectors of the design matrix X.
Covariance matrices:
Σ(X_i) = E X_i ⊗ X_i = E X_i X_i^T,
Σ̂_n = (1/n) ∑_{i=1}^n X_i ⊗ X_i = (1/n) ∑_{i=1}^n X_i X_i^T.
X_i is isotropic if Σ(X_i) = I, in which case E ‖X_i‖_2^2 = p.
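
A short numpy sketch of these quantities for a standard Gaussian design (sizes are illustrative): the empirical covariance Σ̂_n of the rows and a numerical check of isotropy.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 50
X = rng.normal(size=(n, p))               # rows X_i are isotropic: E X_i X_i^T = I

Sigma_hat = (X.T @ X) / n                 # empirical covariance (1/n) sum_i X_i X_i^T
print(np.linalg.norm(Sigma_hat - np.eye(p), ord=2))   # small for n >> p
print(np.mean(np.sum(X**2, axis=1)))      # approximately p, since E ||X_i||_2^2 = p
```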

9. Sparse recovery for Y = Xβ + ε
Lasso (Tibshirani 96), a.k.a. Basis Pursuit (Chen, Donoho and Saunders 98, and others):
β̂ = arg min over β of ‖Y − Xβ‖_2^2 / (2n) + λ_n ‖β‖_1,
where the scaling factor 1/(2n) is chosen for convenience.
Dantzig selector (Candès-Tao 07):
(DS)  arg min over β̂ ∈ R^p of ‖β̂‖_1 subject to ‖X^T (Y − Xβ̂) / n‖_∞ ≤ λ_n.
References: Greenshtein-Ritov 04, Meinshausen-Bühlmann 06, Zhao-Yu 06, Bunea et al. 07, Candès-Tao 07, van de Geer 08, Zhang-Huang 08, Wainwright 09, Koltchinskii 09, Meinshausen-Yu 09, Bickel et al. 09, and others.
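
The Lasso objective above matches the one minimized by scikit-learn's Lasso, with alpha playing the role of λ_n; a minimal sketch, assuming scikit-learn is available and with an arbitrary choice of λ_n:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5
support = rng.choice(p, size=s, replace=False)
beta = np.zeros(p)
beta[support] = 1.0
X = rng.normal(size=(n, p))
Y = X @ beta + 0.1 * rng.normal(size=n)

lambda_n = 0.1                            # illustrative tuning parameter, not prescribed in the talk
lasso = Lasso(alpha=lambda_n, fit_intercept=False)
lasso.fit(X, Y)                           # minimizes ||Y - X b||_2^2 / (2n) + lambda_n ||b||_1
print(sorted(support), np.flatnonzero(lasso.coef_))   # true vs. estimated support
```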

10. The Cone Constraint
For an appropriately chosen λ_n, the solution of the Lasso or the Dantzig selector satisfies (under i.i.d. Gaussian noise), with high probability,
υ := β̂ − β ∈ C(s, k_0),
with k_0 = 1 for the Dantzig selector and k_0 = 3 for the Lasso.
Object of interest: for 1 ≤ s_0 ≤ p and a positive number k_0,
C(s_0, k_0) = { x ∈ R^p | ∃ J ⊂ {1, …, p}, |J| = s_0, s.t. ‖x_{J^c}‖_1 ≤ k_0 ‖x_J‖_1 }.
This object has appeared in earlier work in the noiseless setting.
References: Donoho-Huo 01, Elad-Bruckstein 02, Feuer-Nemirovski 03, Candès-Tao 07, Bickel-Ritov-Tsybakov 09, Cohen-Dahmen-DeVore 09.
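
Membership in C(s_0, k_0) is easy to test, because the most favourable choice of J is the set of the s_0 largest coordinates in absolute value. A small sketch (the function name is ours, for illustration only):

```python
import numpy as np

def in_cone(x, s0, k0):
    """Check whether x lies in C(s0, k0), i.e. whether some |J| = s0 satisfies
    ||x_{J^c}||_1 <= k0 ||x_J||_1.  The best J is the set of the s0 largest |x_i|."""
    idx = np.argsort(np.abs(x))[::-1]
    head = np.sum(np.abs(x[idx[:s0]]))
    tail = np.sum(np.abs(x[idx[s0:]]))
    return tail <= k0 * head
```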

  11. The Lasso solution

12. Restricted Eigenvalue (RE) condition
Object of interest:
C(s_0, k_0) = { x ∈ R^p | ∃ J ⊂ {1, …, p}, |J| = s_0, s.t. ‖x_{J^c}‖_1 ≤ k_0 ‖x_J‖_1 }.
Definition. A q × p matrix A satisfies the RE(s_0, k_0, A) condition with parameter K(s_0, k_0, A) if
1 / K(s_0, k_0, A) := min over J ⊆ {1, …, p}, |J| ≤ s_0, and over υ ≠ 0 with ‖υ_{J^c}‖_1 ≤ k_0 ‖υ_J‖_1, of ‖Aυ‖_2 / ‖υ_J‖_2 > 0.
References: van de Geer 07, Bickel-Ritov-Tsybakov 09, van de Geer-Bühlmann 09.

13. An elementary estimate
Lemma. For each vector υ ∈ C(s_0, k_0), let T_0 denote the locations of the s_0 largest coefficients of υ in absolute value. Then
‖υ_{T_0^c}‖_1 ≤ k_0 ‖υ_{T_0}‖_1  and  ‖υ_{T_0}‖_2 ≥ ‖υ‖_2 / √(1 + k_0).
Implication: let A be a q × p matrix such that the RE(s_0, 3k_0, A) condition holds with 0 < K(s_0, 3k_0, A) < ∞. Then for all υ ∈ C(s_0, k_0) ∩ S^{p−1},
‖Aυ‖_2 ≥ ‖υ_{T_0}‖_2 / K(s_0, k_0, A) ≥ 1 / (K(s_0, k_0, A) √(1 + k_0)) > 0.
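
A quick numerical sanity check of the lemma (the way the cone vector is built below is an arbitrary illustration): construct υ ∈ C(s_0, k_0) with the tail ℓ1-norm exactly k_0 times the head ℓ1-norm, and verify ‖υ_{T_0}‖_2 ≥ ‖υ‖_2 / √(1 + k_0).

```python
import numpy as np

rng = np.random.default_rng(2)
p, s0, k0 = 200, 10, 3.0

head = rng.normal(size=s0)                                # the s0 coordinates indexed by J
tail = rng.normal(size=p - s0)
tail *= k0 * np.sum(np.abs(head)) / np.sum(np.abs(tail))  # force ||tail||_1 = k0 ||head||_1
v = np.concatenate([head, tail])                          # so v is in C(s0, k0)

T0 = np.argsort(np.abs(v))[::-1][:s0]                     # locations of the s0 largest |v_i|
print(np.linalg.norm(v[T0]) >= np.linalg.norm(v) / np.sqrt(1 + k0))   # lemma predicts True
```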

14. Sparse eigenvalues
Definition. For m ≤ p, we define the largest and smallest m-sparse eigenvalues of a q × p matrix A as
ρ_max(m, A) := max over t ∈ R^p, t ≠ 0, t m-sparse of ‖At‖_2^2 / ‖t‖_2^2,
ρ_min(m, A) := min over t ∈ R^p, t ≠ 0, t m-sparse of ‖At‖_2^2 / ‖t‖_2^2.
If RE(s_0, k_0, A) is satisfied with k_0 ≥ 1, then the square submatrices of size 2s_0 of A^T A are necessarily positive definite, that is, ρ_min(2s_0, A) > 0.
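
A brute-force sketch of the m-sparse eigenvalues for a small matrix: over every support S of size m, the quotient ‖At‖_2^2 / ‖t‖_2^2 ranges over the eigenvalues of the principal submatrix (A^T A)_{SS}, so it suffices to enumerate supports. Only feasible for tiny p.

```python
import numpy as np
from itertools import combinations

def sparse_eigs(A, m):
    """Extreme m-sparse eigenvalues of A: the smallest and largest eigenvalues
    of the m x m principal submatrices of A^T A, over all supports of size m."""
    G = A.T @ A
    p = G.shape[0]
    rho_min, rho_max = np.inf, -np.inf
    for S in combinations(range(p), m):
        w = np.linalg.eigvalsh(G[np.ix_(S, S)])
        rho_min, rho_max = min(rho_min, w[0]), max(rho_max, w[-1])
    return rho_min, rho_max

A = np.random.default_rng(0).normal(size=(6, 10))
print(sparse_eigs(A, 3))
```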

15. Examples of A which satisfy the Restricted Eigenvalue condition but not RIP (Raskutti, Wainwright, and Yu 10)
Spiked identity matrix: for a ∈ [0, 1),
Σ_{p×p} = (1 − a) I_{p×p} + a 1 1^T,
where 1 ∈ R^p is the vector of all ones. Then ρ_min(Σ) > 0, and for every s_0 × s_0 submatrix Σ_SS we have
ρ_max(Σ_SS) / ρ_min(Σ_SS) = (1 + a(s_0 − 1)) / (1 − a).
The largest sparse eigenvalue → ∞ as s_0 → ∞, but ‖Σ^{1/2} e_j‖_2 = 1 is bounded.
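
A numerical check of this computation (the values of p, s_0, a are arbitrary): every s_0 × s_0 principal submatrix of the spiked identity equals (1 − a)I + a 1 1^T, whose eigenvalues are 1 − a and 1 + a(s_0 − 1).

```python
import numpy as np

p, s0, a = 50, 10, 0.5
Sigma = (1 - a) * np.eye(p) + a * np.ones((p, p))       # spiked identity

S = np.arange(s0)                                       # by symmetry, any s0 coordinates will do
w = np.linalg.eigvalsh(Sigma[np.ix_(S, S)])
print(w[-1] / w[0], (1 + a * (s0 - 1)) / (1 - a))       # the two ratios agree
print(np.sqrt(Sigma[0, 0]))                             # ||Sigma^{1/2} e_j||_2 = sqrt(Sigma_jj) = 1
```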

16. Motivation: to construct classes of design matrices such that the Restricted Eigenvalue condition will be satisfied.
◮ Design matrix X has just independent rows, rather than independent entries: e.g., consider X = ΨA for some matrix A_{q×p}, where the rows of the matrix Ψ_{n×q} are independent isotropic vectors with subgaussian marginals, and RE(s_0, (1+ε)k_0, A) holds for some ε > 0, p > s_0 ≥ 0, and k_0 > 0. (A sketch of this construction follows below.)
◮ Design matrix X consists of independent identically distributed rows with bounded entries, whose covariance matrix Σ(X_i) = E X_i X_i^T satisfies RE(s_0, (1+ε)k_0, Σ^{1/2}).
The rows of X will be sampled from some distributions in R^p; the distribution may be highly non-Gaussian and perhaps discrete.
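
A sketch of the first construction (the dimensions, the choice of Rademacher entries as the subgaussian distribution, and taking A = Σ^{1/2} for the spiked identity are all illustrative assumptions): X = ΨA, with independent isotropic subgaussian rows in Ψ.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, a = 100, 300, 0.5
p = q                                                   # here A is square: A = Sigma^{1/2}

Sigma = (1 - a) * np.eye(p) + a * np.ones((p, p))       # spiked-identity covariance
w, U = np.linalg.eigh(Sigma)
A = U @ np.diag(np.sqrt(w)) @ U.T                       # symmetric square root of Sigma

Psi = rng.choice([-1.0, 1.0], size=(n, q))              # Rademacher rows: isotropic, subgaussian
X = Psi @ A                                             # design with independent, anisotropic rows
```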

17. Outline
Introduction
The main results
◮ The reduction principle
◮ Applications of the reduction principle
Ingredients of the proof
Conclusion

18. Notation
Let e_1, …, e_p be the canonical basis of R^p.
For a set J ⊂ {1, …, p}, denote E_J = span{e_j : j ∈ J}.
For a matrix A, we use ‖A‖_2 to denote its operator norm.
For a set V ⊂ R^p, we let conv V denote the convex hull of V.
For a finite set Y, the cardinality is denoted by |Y|.
Let B_2^p and S^{p−1} be the unit Euclidean ball and the unit sphere, respectively.

19. The reduction principle
Theorem. Let E = ∪_{|J| = d(3k_0)} E_J for d(3k_0) < p, where
d(3k_0) = s_0 + s_0 max_j ‖Ae_j‖_2^2 · 16 K^2(s_0, 3k_0, A) (3k_0)^2 (3k_0 + 1) / δ^2,
and E = R^p otherwise. Let Ψ̃ be a matrix such that
∀ x ∈ AE:  (1 − δ) ‖x‖_2 ≤ ‖Ψ̃ x‖_2 ≤ (1 + δ) ‖x‖_2.
Then RE(s_0, k_0, Ψ̃A) holds with
0 < K(s_0, k_0, Ψ̃A) ≤ K(s_0, k_0, A) / (1 − 5δ).
In words: if the matrix Ψ̃ acts as an almost isometry on the images of the d-sparse vectors under A, then the product Ψ̃A satisfies the RE condition with the smaller parameter k_0.
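
A small arithmetic helper for the quantity d(3k_0) appearing in the theorem (the numeric values of s_0, k_0, K, δ and max_j ‖Ae_j‖_2 below are placeholders, not values from the talk):

```python
def d_sparsity(s0, k0, K, delta, max_col_norm):
    """d(3 k0) = s0 + s0 * max_j ||A e_j||_2^2 * 16 K(s0, 3k0, A)^2 (3 k0)^2 (3 k0 + 1) / delta^2."""
    return s0 + s0 * max_col_norm**2 * 16 * K**2 * (3 * k0)**2 * (3 * k0 + 1) / delta**2

print(d_sparsity(s0=5, k0=1.0, K=2.0, delta=0.25, max_col_norm=1.0))
```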
