Decoding in Compressed Sensing

Ronald DeVore, USC, 2008 (p. 1/33)

Discrete Compressed Sensing (p. 2/33)
- $x \in \mathbb{R}^N$ with $N$ large
- We are able to ask $n$ questions about $x$


Sample Results: ℓp (p. 10/33)
- If an n × N matrix Φ is instance optimal of order k = 1 in ℓ2 with constant C0, then n ≥ N/C0
- This shows that instance optimality is not a viable concept for ℓ2
- For 1 < p < 2, instance optimality of order k in ℓp is viable for any k with $k^{2/p-1} N^{2-2/p} \le c_0\, n/\log(N/n)$
- This bound cannot be improved
- Matrices that are instance optimal for this range of k are obtained from matrices which satisfy RIP of order $\bar{k} = k^{2/p-1} N^{2-2/p}$
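As a quick sanity check (our remark, not on the slide), the RIP order $\bar{k}$ interpolates between the two endpoint cases: at p = 1 it reduces to the usual order k, while at p = 2 it forces order N, consistent with the non-viability of ℓ2 instance optimality:

```latex
\[
\bar{k} \;=\; k^{2/p-1}\,N^{2-2/p},
\qquad
\bar{k}\big|_{p=1} = k^{1}N^{0} = k,
\qquad
\bar{k}\big|_{p=2} = k^{0}N^{1} = N .
\]
```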

What are good matrices? (p. 11/33)
- How can we construct Φ satisfying RIP for the largest range of k?
- Choose N vectors at random from the unit sphere in $\mathbb{R}^n$ and use these as the columns of Φ
- Choose each entry of Φ independently at random from the Gaussian distribution $N(0, 1/\sqrt{n})$
- Choose each entry of Φ independently at random from the Bernoulli distribution and then normalize the columns to have length one
- With high probability on the draw, the resulting matrix will satisfy RIP for the largest range of k
- Problem: none of these are constructive. Can we put our hands on such matrices?
- No constructions are known for the largest range of k
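For concreteness, here is a minimal NumPy sketch (our own, not from the talk) of the three random constructions above; the Gaussian entries are drawn with standard deviation $1/\sqrt{n}$ so that every column has unit length in expectation, and the helper names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_matrix(n, N):
    # Entries i.i.d. Gaussian with std 1/sqrt(n): columns have expected squared length 1.
    return rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))

def bernoulli_matrix(n, N):
    # Entries i.i.d. +-1, then columns normalized to unit length (entries become +-1/sqrt(n)).
    return rng.choice([-1.0, 1.0], size=(n, N)) / np.sqrt(n)

def sphere_matrix(n, N):
    # Columns drawn uniformly from the unit sphere in R^n.
    G = rng.normal(size=(n, N))
    return G / np.linalg.norm(G, axis=0)

n, N = 128, 1024
Phi = gaussian_matrix(n, N)
x = np.zeros(N)
x[rng.choice(N, 10, replace=False)] = rng.normal(size=10)   # a 10-sparse signal
y = Phi @ x                                                  # the n linear "questions" about x
```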

Instance-Optimality in Probability (p. 12/33)
- We saw that instance optimality for $\ell_2^N$ is not viable
- Suppose {Φ(ω)} is a collection of random matrices
- We say this family satisfies RIP of order k with probability 1 − ε if a random draw Φ(ω) will satisfy RIP of order k with probability 1 − ε
- We say {Φ(ω)} is bounded with probability 1 − ε if, given any $x \in \mathbb{R}^N$, with probability 1 − ε a random draw Φ(ω) will satisfy $\|\Phi(\omega)x\|_{\ell_2} \le C_0 \|x\|_{\ell_2^N}$, with C0 an absolute constant
- Our earlier analysis showed that Gaussian and Bernoulli random matrices have these properties with $\epsilon = e^{-cn}$
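To make the two probabilistic properties concrete, the following spot check (our own, with hypothetical parameter choices) estimates the boundedness ratio $\|\Phi(\omega)x\|_{\ell_2}/\|x\|_{\ell_2^N}$ over random $x$ and inspects the singular values of randomly sampled $n \times k$ column submatrices; an exact RIP verification would require all $\binom{N}{k}$ submatrices, so this is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, k = 128, 1024, 10
Phi = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))   # one Gaussian draw

# Boundedness: ||Phi x||_2 <= C_0 ||x||_2 should hold outside an event of probability ~ e^{-cn}.
ratios = []
for _ in range(1000):
    x = rng.normal(size=N)
    ratios.append(np.linalg.norm(Phi @ x) / np.linalg.norm(x))
print("max ||Phi x|| / ||x|| over random x:", max(ratios))

# RIP spot check: squared singular values of n x k submatrices should lie in [1 - delta, 1 + delta].
devs = []
for _ in range(1000):
    T = rng.choice(N, size=k, replace=False)
    s2 = np.linalg.svd(Phi[:, T], compute_uv=False) ** 2
    devs.append(max(s2.max() - 1.0, 1.0 - s2.min()))
print("largest RIP deviation delta over sampled supports:", max(devs))
```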

Theorem: Cohen-Dahmen-DeVore (p. 13/33)
- If {Φ(ω)} satisfies RIP of order 3k and boundedness, each with probability 1 − ε, then there are decoders Δ(ω) such that for any $x \in \ell_2^N$ we have, with probability 1 − 2ε,
  $\|x - \Delta(\omega)\Phi(\omega)x\|_{\ell_2^N} \le C_0\, \sigma_k(x)_{\ell_2^N}$
- Instance optimality in probability
- Range of k is $k \le c_0\, n/\log(N/n)$
- Decoder is impractical
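The benchmark $\sigma_k(x)_{\ell_2^N}$ is the best k-term approximation error, $\sigma_k(x)_{\ell_p} = \min_{\#\,\mathrm{supp}(z) \le k} \|x - z\|_{\ell_p}$; a minimal sketch (ours) of how this quantity is computed, by simply discarding the k largest-magnitude entries:

```python
import numpy as np

def sigma_k(x, k, p=2):
    """Best k-term approximation error of x in l_p: the l_p norm of
    everything except the k largest-magnitude entries."""
    x = np.abs(np.asarray(x, dtype=float))
    tail = np.sort(x)[:-k] if k > 0 else x
    return float(np.linalg.norm(tail, ord=p))

x = np.array([5.0, -3.0, 0.1, 0.05, 0.02, 0.0])
print(sigma_k(x, 2))   # small: only the negligible entries are left over
```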

Decoding (p. 14/33)
- By far the most intriguing part of Compressed Sensing is the decoding
- There are continuing debates as to which decoding is numerically fastest
- Some common decoders:
- ℓ1 minimization: long history (Donoho; Candès-Romberg)
- Greedy algorithms: find the support of a good approximating vector and then decode using ℓ2 minimization (Gilbert-Tropp; Needell-Vershynin (ROMP); Donoho (StOMP))
- Iteratively Reweighted Least Squares (Osborne; Daubechies-DeVore-Fornasier-Güntürk)
- We shall make some remarks on these decoders, emphasizing the last one

Issues in Decoding (p. 15/33)
- Range of instance optimality: when combined with the encoder, does it give the full range of instance optimality?
- Number of computations to decode?
- Robustness to noise?
- Theorems versus numerical examples
- Instance optimality in probability
- Given that we can't construct the best encoding matrices, it seems that the best results would correspond to random draws of matrices

ℓ1 minimization (p. 16/33)
- ℓ1 minimization: $x^* := \operatorname{Argmin}_{z \in \mathcal{F}(y)} \|z\|_{\ell_1}$
- $x^* = x - \eta^*$ where $\eta^* := \operatorname{Argmin}_{\eta \in \mathcal{N}} \|x - \eta\|_{\ell_1}$
- Can be solved by Linear Programming
- Let T := supp(x)
- Recall that x is an ℓ1 minimizer if and only if
  $\Big|\sum_{i \in T} \operatorname{sign}(x_i)\,\eta_i\Big| \le \sum_{i \in T^c} |\eta_i|, \quad \forall \eta \in \mathcal{N}$
- If $\mathcal{N}$ has the following null space property: there is a γ < 1 with
  $\|\eta_T\|_{\ell_1} \le \gamma\, \|\eta_{T^c}\|_{\ell_1}, \quad \forall \eta \in \mathcal{N},\ \#(T) \le k$
- then all $x \in \Sigma_k$ have unique ℓ1 minimizers equal to x
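The linear-programming reduction is the standard splitting $z = u - v$ with $u, v \ge 0$, minimizing $\mathbf{1}^t(u+v)$ subject to $\Phi u - \Phi v = y$; a minimal sketch (ours) using scipy.optimize.linprog, assuming noiseless measurements $y = \Phi x$:

```python
import numpy as np
from scipy.optimize import linprog

def l1_decode(Phi, y):
    """Solve min ||z||_1 subject to Phi z = y as an LP:
       min 1'(u + v)  s.t.  [Phi, -Phi] [u; v] = y,  u, v >= 0,  z = u - v."""
    n, N = Phi.shape
    res = linprog(c=np.ones(2 * N),
                  A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                  bounds=(0, None))
    return res.x[:N] - res.x[N:]

rng = np.random.default_rng(0)
n, N, k = 60, 200, 8
Phi = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.normal(size=k)
print("recovery error:", np.linalg.norm(x - l1_decode(Phi, Phi @ x)))
```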

Orthogonal Matching Pursuit (OMP) (p. 17/33)
- We seek approximations to y from the dictionary D := {φ_1, ..., φ_N} consisting of the columns of Φ
- $j_1 := \operatorname{Argmax}_{j=1,\dots,N} |\langle y, \varphi_j\rangle|$
- $y^1 := z^1_{j_1}\varphi_{j_1}$ with $z^1_{j_1} := \langle y, \varphi_{j_1}\rangle / \|\varphi_{j_1}\|^2$
- i-th step: given $\{j_1,\dots,j_i\}$, let $y^i = \sum_{l=1}^{i} z^i_{j_l}\varphi_{j_l}$ be the orthogonal projection of y onto $\operatorname{Span}\{\varphi_{j_1},\dots,\varphi_{j_i}\}$
- $j_{i+1} := \operatorname{Argmax}_{j=1,\dots,N} |\langle r^i, \varphi_j\rangle|$, where $r^i := y - y^i$ is the residual
- $x^i$ is $z^i_{j_1},\dots,z^i_{j_i}$ augmented by zeros in the other coordinates: $x^i_j = z^i_j$ if $j \in \{j_1,\dots,j_i\}$, and 0 otherwise
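A compact NumPy sketch (ours) of the iteration just described, assuming unit-norm columns; at each step the most correlated column is added and the coefficients are refit by least squares, which realizes the orthogonal projection:

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: greedily select the column most correlated
    with the residual, then project y onto the span of the selected columns."""
    n, N = Phi.shape
    support, r = [], y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ r)))      # j_{i+1} = Argmax_j |<r^i, phi_j>|
        if j not in support:
            support.append(j)
        z, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)   # orthogonal projection
        r = y - Phi[:, support] @ z                # new residual r^i
    x = np.zeros(N)
    x[support] = z                                 # coefficients on the support, zero elsewhere
    return x

rng = np.random.default_rng(0)
n, N, k = 60, 200, 8
Phi = rng.normal(size=(n, N)); Phi /= np.linalg.norm(Phi, axis=0)
x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.normal(size=k)
print("recovery error:", np.linalg.norm(x - omp(Phi, Phi @ x, k)))
```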

Results for Decoding for Random Draws (p. 18/33)
- Gilbert and Tropp proved that for Bernoulli random matrices, OMP captures a k-sparse vector with high probability for the largest range of k
- Cohen-Dahmen-DeVore extend this to general random families of matrices
- Are there practical decoders that give instance optimality in ℓ2?
- Wojtaszczyk has shown that for Gaussian matrices, ℓ1 minimization does the job
- This result rests on a geometric property of Gaussian matrices
- Namely, the image of the unit ℓ1 ball under such matrices will, with high probability, contain an ℓ2 ball of radius $1/\sqrt{k}$

General Random Families (p. 19/33)
- Cohen-Dahmen-DeVore
- Given an arbitrary ε > 0, we can obtain $\|x - \Delta(\Phi x)\|_{\ell_2} \le C\,\sigma_k(x)_{\ell_2} + \epsilon$ with high probability
- Decoders that attain this:
- ℓ1 minimization
- Greedy thresholding: at each iteration take all coordinates for which the inner product satisfies $|\langle r, \varphi_\nu\rangle| \ge \delta\, \|r\|_{\ell_2}/\sqrt{k}$, where r is the residual
- Here δ > 0 is a fixed threshold parameter
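A rough sketch (ours) of the thresholding selection rule, only to illustrate the criterion; the actual Cohen-Dahmen-DeVore algorithm and its stopping rule differ in detail:

```python
import numpy as np

def threshold_decode(Phi, y, k, delta=0.5, iters=10):
    """Grow a support by adding every coordinate nu with
    |<r, phi_nu>| >= delta * ||r||_2 / sqrt(k), then refit by least squares."""
    n, N = Phi.shape
    support, r = set(), y.copy()
    for _ in range(iters):
        picked = np.flatnonzero(np.abs(Phi.T @ r)
                                >= delta * np.linalg.norm(r) / np.sqrt(k))
        support.update(int(j) for j in picked)
        if not support:
            break                                   # nothing passes the threshold
        S = sorted(support)
        z, *_ = np.linalg.lstsq(Phi[:, S], y, rcond=None)
        r = y - Phi[:, S] @ z
    x = np.zeros(N)
    if support:
        x[sorted(support)] = z
    return x
```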

Other Possible Decoders (p. 20/33)
- We seek other possible decoding algorithms which may be faster
- Here we add in the knowledge of our Compressed Sensing setting: the RIP property for Φ
- We have also discussed the least squares problem $\bar{x} := \operatorname{Argmin}_{z \in \mathcal{F}(y)} \|z\|_{\ell_2}$, i.e. $\bar{x} = x - \operatorname{Argmin}_{\eta \in \mathcal{N}} \|x - \eta\|_{\ell_2}$
- We know this does not work well
- However it is easy to compute: $\bar{x} = \Phi^t[\Phi\Phi^t]^{-1}\Phi x = \Phi^t[\Phi\Phi^t]^{-1} y$
- $O(Nn^2)$ arithmetic operations
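Since $\bar{x}$ has the closed form above, it costs one Gram-matrix solve; a two-line sketch (ours), using a linear solve instead of forming the inverse explicitly:

```python
import numpy as np

def min_norm_ls(Phi, y):
    # x_bar = Phi^t (Phi Phi^t)^{-1} y : the minimum l2-norm element of F(y).
    return Phi.T @ np.linalg.solve(Phi @ Phi.T, y)
```

Forming and factoring the n × n Gram matrix ΦΦᵗ dominates the work, which is where the O(Nn²) operation count comes from.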

Weighted Least Squares (p. 21/33)
- Consider weighted ℓ2 minimization
- Let $w_j > 0$, $j = 1,\dots,N$, be positive weights
- $\|u\|_{\ell_2(w)} := \big(\sum_{j=1}^{N} w_j u_j^2\big)^{1/2}$
- $\langle u, v\rangle_w := \sum_{j=1}^{N} w_j u_j v_j$
- Define $x(w) := \operatorname{Argmin}_{z \in \mathcal{F}(y)} \|z\|_{\ell_2(w)}$
- $x(w) = x - \eta(w)$ where $\eta(w) := \operatorname{Argmin}_{\eta \in \mathcal{N}} \|x - \eta\|_{\ell_2(w)}$
- Note that this solution is characterized by the orthogonality conditions $\langle x(w), \eta\rangle_w = 0$ for all $\eta \in \mathcal{N}$
- We can again solve for x(w) in $O(Nn^2)$ operations
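From the orthogonality conditions one can write the minimizer in closed form: with $D_w := \mathrm{diag}(w_1,\dots,w_N)$, $x(w) = D_w^{-1}\Phi^t(\Phi D_w^{-1}\Phi^t)^{-1} y$. A short sketch (ours):

```python
import numpy as np

def weighted_min_norm(Phi, y, w):
    """Minimize ||z||_{l2(w)} = (sum_j w_j z_j^2)^{1/2} over all z with Phi z = y.
    w is an array of N positive weights.
    Closed form: x(w) = D_w^{-1} Phi^t (Phi D_w^{-1} Phi^t)^{-1} y."""
    Dinv_PhiT = Phi.T / w[:, None]                  # row j scaled by 1/w_j, i.e. D_w^{-1} Phi^t
    return Dinv_PhiT @ np.linalg.solve(Phi @ Dinv_PhiT, y)
```

Re-choosing the weights from the current iterate, roughly $w_j \approx 1/(|x_j| + \epsilon)$ so that $w_j x_j^2 \approx |x_j|$, and repeating this solve is the idea behind the iteratively reweighted least squares decoder mentioned earlier.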
