Sequential Compressed Sensing
Dmitry Malioutov (DRW; research done mostly at MIT)
Joint work with Sujay Sanghavi and Alan Willsky
September 03, 2009
Motivation
Many important classes of signals are either sparse or compressible.
[Figure: examples of sparsity in (a) the standard, (b) the Fourier, (c) the wavelet basis.]
CS: K-sparse x ∈ R^N. We take M ≪ N measurements y = A x + n and try to recover x knowing that it is sparse.
Related problems: recovering the structure of graphical models from samples, recovering low-rank matrices from few measurements, ...
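A minimal sketch of this measurement model, assuming a Gaussian A; the particular values of N, K, M and the noise level are illustrative choices, not taken from the talk.

    # Compressed sensing setup: K-sparse x*, M << N random measurements y = A x* + n.
    import numpy as np

    rng = np.random.default_rng(0)
    N, K, M = 100, 10, 40                      # ambient dimension, sparsity, number of measurements
    x_true = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x_true[support] = rng.standard_normal(K)   # K-sparse signal

    A = rng.standard_normal((M, N))            # random Gaussian measurement matrix
    noise = 0.0 * rng.standard_normal(M)       # noiseless here; use a positive scale for noisy y = A x + n
    y = A @ x_true + noise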
Motivation
CS: for certain random A, x can be efficiently recovered with high probability after O(K log(N/K)) samples, where x is K-sparse.
However:
• we may not know K a priori,
• such bounds are not available for all decoders,
• the constants may not be tight.
How many samples do we need?
[Figure: required M for a signal with K = 10, for Gaussian (top) and Bernoulli (bottom) measurements.]
Our approach: receive samples sequentially, y_i = a_i' x, and stop once we know that enough samples have been received.
Presentation Outline
1. CS formulation with sequential observations.
2. Stopping rule for the Gaussian case.
3. Stopping rule for the Bernoulli case.
4. Near-sparse and noisy signals.
5. Efficient solution of the sequential problem.
Batch CS
Batch CS: suppose y = A x*. Find the sparsest x satisfying y = A x.
Relaxations: greedy methods, convex ℓ1, non-convex ℓp, sparse Bayesian learning, message passing, etc. – these all give sparse solutions.
How do we verify that the solution also recovers x*?
[Figure: top, reconstruction from M = 20 samples with N = 80 (incorrect); bottom, reconstruction from M = 21 samples (correct).]
To guarantee correct reconstruction with high probability, we need a superfluous number of samples, to be on the 'safe side'.
Sequential CS formulation
Observations are available in sequence: y_i = a_i' x*, i = 1, ..., M.
At step M we use any sparse decoder to get a feasible solution x̂^M, e.g. the ℓ1 decoder
    x̂^M = arg min ||x||_1  s.t.  a_i' x = y_i,  i = 1, ..., M,
and either declare victory and stop, or ask for another sample.
Q: How does one know when enough samples have been received?
Waiting for M ∝ C K log(N/K) requires knowledge of K = ||x*||_0. Moreover, only rough bounds on the proportionality constant may be known, and not for all algorithms.
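A hedged sketch of such an ℓ1 decoder, posed as a linear program with scipy; the split x = u − v is a standard reformulation and not something specified in the talk.

    # Basis-pursuit (l1) decoder: argmin ||x||_1 subject to A x = y.
    import numpy as np
    from scipy.optimize import linprog

    def l1_decode(A, y):
        """Solve min ||x||_1 s.t. A x = y by splitting x into nonnegative parts."""
        M, N = A.shape
        # Variables [u; v] with x = u - v, u >= 0, v >= 0; objective sum(u) + sum(v) = ||x||_1.
        c = np.ones(2 * N)
        A_eq = np.hstack([A, -A])
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N), method="highs")
        u, v = res.x[:N], res.x[N:]
        return u - v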
Gaussian measurement case
Receive y_i = a_i' x*, where the a_i ~ N(0, I) are i.i.d. Gaussian. Let A^M = [a_1, ..., a_M]'.
Claim: if x̂^{M+1} = x̂^M, then x̂^M = x* with probability 1.
A new sample a'_{M+1} x = y_{M+1} passes a random hyperplane through x*. The probability that this hyperplane also goes through x̂^M is zero.
[Figure: geometry of x*, x̂^M, the constraint set A^M x = y_{1:M}, and the new hyperplane a'_{M+1} x = y_{M+1}.]
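A sketch of the resulting sequential scheme with this stopping rule. It reuses the l1_decode sketch above, uses the true x* only to simulate measurements, and the numerical tolerance tol is an implementation choice.

    # Sequential CS with Gaussian measurements: stop when the new sample leaves x̂ unchanged.
    import numpy as np

    def sequential_cs_gaussian(x_true, N, max_samples=None, tol=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        max_samples = max_samples or N
        rows, obs = [], []
        x_prev = None
        for M in range(1, max_samples + 1):
            a = rng.standard_normal(N)            # new Gaussian measurement vector a_M
            rows.append(a)
            obs.append(a @ x_true)                # y_M = a_M' x*
            x_hat = l1_decode(np.vstack(rows), np.array(obs))
            if x_prev is not None and np.max(np.abs(x_hat - x_prev)) < tol:
                return x_hat, M                   # x̂^M = x̂^{M-1}: declare victory
            x_prev = x_hat
        return x_prev, max_samples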
Gaussian case (continued)
Even simpler rules: if (i) ||x̂^M||_0 < M, or if (ii) a'_{M+1} x̂^M = y_{M+1}, then x̂^M = x*.
This works because, for a random Gaussian matrix, all M × M submatrices are non-singular with probability 1.
Example: N = 100 and K = 10.
[Figure: top, ||x̂^M||_0; middle, ||x̂^M||_1; bottom, ||x* − x̂^M||_2, each plotted against M.]
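The two simpler checks written as predicates; the thresholds used to decide "nonzero" and "equal" are illustrative, since in floating point both tests need a tolerance.

    # Simpler stopping checks for the Gaussian case.
    import numpy as np

    def rule_sparsity(x_hat, M, tol=1e-6):
        """(i) stop if the current solution has fewer than M nonzeros."""
        return np.sum(np.abs(x_hat) > tol) < M

    def rule_new_sample(x_hat, a_new, y_new, tol=1e-6):
        """(ii) stop if the next measurement is already explained by x̂^M."""
        return abs(a_new @ x_hat - y_new) < tol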
Bernoulli case
Let a_i have i.i.d. equiprobable Bernoulli entries ±1. Now M × M submatrices of A^M can be singular (with non-zero probability), so the stopping rule for the Gaussian case does not hold. We modify it as follows: wait until x̂^M = x̂^{M+1} = ... = x̂^{M+T}.
Claim: after T-step agreement, P(x̂^{M+T} ≠ x*) < 2^{−T}.
The proof depends on a lemma (Tao and Vu): let a ∈ {−1, 1}^N be i.i.d. equiprobable Bernoulli, and let W be a fixed d-dimensional subspace of R^N, 0 ≤ d < N. Then P(a ∈ W) ≤ 2^{d−N}.
Suppose x̂^M ≠ x*. Let J and I be their supports, and L = |I ∪ J|. Then A = {a_{I∪J} | (x̂^M − x*)' a_{I∪J} = 0} is an (L − 1)-dimensional subspace of R^L, so the probability that a^{M+1}_{I∪J} belongs to A is at most 1/2. ⋄
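A sketch of the Bernoulli variant: stop after T consecutive samples leave the ℓ1 solution unchanged, which by the claim gives error probability below 2^{−T}. It reuses l1_decode; T, tol and the sample budget are user choices.

    # Sequential CS with +/-1 Bernoulli measurements and a T-step agreement rule.
    import numpy as np

    def sequential_cs_bernoulli(x_true, N, T=10, max_samples=None, tol=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        max_samples = max_samples or 5 * N
        rows, obs = [], []
        x_prev, agree = None, 0
        for M in range(1, max_samples + 1):
            a = rng.choice([-1.0, 1.0], size=N)   # equiprobable +/-1 entries
            rows.append(a)
            obs.append(a @ x_true)
            x_hat = l1_decode(np.vstack(rows), np.array(obs))
            if x_prev is not None and np.max(np.abs(x_hat - x_prev)) < tol:
                agree += 1
                if agree >= T:                    # T-step agreement reached
                    return x_hat, M
            else:
                agree = 0
            x_prev = x_hat
        return x_prev, max_samples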
Bernoulli case (continued)
The rule only uses T. Ideally it should also use M and N: errors are more likely for smaller M and N.
Conjecture: for an M × M Bernoulli matrix, P(det(A) = 0) ∝ M^2 2^{1−M} (the main failure mode is a pair of equal rows or columns). The best provable upper bound is still quite loose. Such an analysis could allow a shorter delay.
[Figure: examples with K = 3, N = 40 (left) and K = 10, N = 40 (right), showing ||x̂^M||_0 and ||y − A x̂^M||_2 as functions of M.]
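A quick Monte Carlo check (my own illustration, not from the talk) of how often a random M × M ±1 matrix is singular; for small M one can eyeball whether the fraction tracks the conjectured M^2 2^{1−M} trend up to a constant.

    # Empirical singularity frequency of random +/-1 matrices.
    import numpy as np

    def singular_fraction(M, trials=20000, seed=0):
        rng = np.random.default_rng(seed)
        count = 0
        for _ in range(trials):
            A = rng.choice([-1, 1], size=(M, M))
            if np.linalg.matrix_rank(A) < M:      # rank deficiency = zero determinant
                count += 1
        return count / trials

    # e.g. compare singular_fraction(10) against the shape of M**2 * 2.0**(1 - M) for several M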
Near-sparse signals
In many practical settings signals are near-sparse: e.g. Fourier or wavelet transforms of smooth signals.
[Figure: (a) a signal, (b) its wavelet coefficients, (c) the coefficients sorted by magnitude.]
CS results: with roughly O(K log N) samples, x̂^M has error similar to that of keeping the K largest entries of x*.
Our approach: given x̂^M, we obtain T new samples and find the distance from x̂^M to H^{M+T} = {x | y_i = a_i' x, 1 ≤ i ≤ M + T}. This distance can be used to bound the reconstruction error ||x* − x̂^M||_2.
Near-sparse signals (continued)
Let H^{M+T} = {x | y_i = a_i' x, i = 1, ..., M + T}, and let θ_T be the angle between the line (x*, x̂^M) and H^{M+T}. Then
    d(x*, x̂^M) = d(x̂^M, H^{M+T}) / sin(θ_T) = C_T d(x̂^M, H^{M+T}),   with C_T = 1 / sin(θ_T).
Let L = N − M. Using properties of the χ_L and χ²_L distributions and Jensen's inequality, we have
    E[1 / sin(θ_T)] ≈ √(L / T) ≤ √((L − 2) / (T − 2)),
    Var[1 / sin(θ_T)] ≤ (L − 2) / (T − 2) − L / T.
[Figure: geometry of x*, x̂^M, x̂^{M+T}, the angle θ_T, the subspace H^{M+T}, and the constraint set A^M x = y_{1:M}.]
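A sketch of how this error estimate could be computed: take T extra samples, measure the distance from x̂^M to the enlarged constraint set H^{M+T} via a minimum-norm correction, and scale it by √(L/T) as a rough stand-in for C_T = 1/sin(θ_T). The function signature and names are my own; A_full stacks all M + T measurement rows.

    # Estimate ||x* - x̂^M||_2 for a near-sparse signal from T additional samples.
    import numpy as np

    def error_estimate(x_hat, A_full, y_full, M):
        N = A_full.shape[1]
        T = A_full.shape[0] - M
        L = N - M
        # Distance from x̂^M to H^{M+T}: minimum-norm d with A_full (x̂ + d) = y_full.
        d, *_ = np.linalg.lstsq(A_full, y_full - A_full @ x_hat, rcond=None)
        dist = np.linalg.norm(d)
        C_T = np.sqrt(L / T)                  # rough proxy for E[1/sin(theta_T)]
        return C_T * dist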
Examples: near-sparse signals
[Figure, left: (top) sample C_T, its approximation, and the bound on the mean; (bottom) sample std of C_T and a bound; L = 100.]
[Figure, right: errors (in dB) and bounds versus M for (top) a sparse signal, N = 100, T = 5, K = 10, and (bottom) a power-law decay signal, N = 1000, T = 10.]
Near-sparse and noisy: simplified approach
Current solution x̂, true signal x*. Take T new samples y_i = a_i' x* and compute ŷ_i = a_i' x̂. Denote the error by δ = x̂ − x*, and let z_i = ŷ_i − y_i. Then
    z_i = a_i' δ,  1 ≤ i ≤ T.
The z_i are i.i.d. from a zero-mean distribution with variance ||δ||²_2 Var(a_ij), so we can estimate ||δ||²_2 by estimating the variance of the z_i. For example, for Gaussian a_i, confidence bounds on ||δ||²_2 can be obtained from the χ²_T distribution.
This is related to a recent paper by Ward, "Compressed sensing with cross validation", which uses the Johnson–Lindenstrauss lemma.
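A sketch of this check, assuming unit-variance Gaussian a_i so that Σ z_i² / ||δ||² ~ χ²_T; the names and the choice of a two-sided interval are mine.

    # Point estimate and confidence interval for ||x̂ - x*||_2^2 from T fresh samples.
    import numpy as np
    from scipy.stats import chi2

    def error_norm_interval(x_hat, A_new, y_new, alpha=0.05):
        z = A_new @ x_hat - y_new              # z_i = a_i' delta
        T = len(z)
        s = np.sum(z**2)
        point = s / T                          # E[z_i^2] = ||delta||^2 for unit-variance entries
        lo = s / chi2.ppf(1 - alpha / 2, df=T) # since s / ||delta||^2 ~ chi^2_T
        hi = s / chi2.ppf(alpha / 2, df=T)
        return point, (lo, hi)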
Solving sequential CS
The main goal of sequential CS is to minimize the number of samples, yet we also want an efficient solution – not just re-solving from scratch at each step.
Warm-starting the simplex method: x̂^M is not feasible after the (M+1)-st sample arrives. Add a 'slack' variable: min ||x||_1 + Q z, where y_{1:M} = A^M x, y_{M+1} = a'_{M+1} x − z, z ≥ 0. For Q large enough, z is forced to 0.
[Figure: number of simplex iterations versus M for LP1 and LP2.]
Alternative approach: homotopy continuation between x̂^M and x̂^{M+1} – follow the piecewise-linear solution path (Garrigues and El Ghaoui, 2008, and independently Asif and Romberg, 2008).
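A hedged sketch of the slack-variable formulation only; it does not perform the simplex warm start itself, which needs a solver that accepts a starting basis. To stay agnostic about the sign of the constraint violation, I use two nonnegative slacks, which is an equivalent reformulation rather than the exact LP from the talk.

    # l1 decoding with a heavily penalized slack on the newest constraint.
    import numpy as np
    from scipy.optimize import linprog

    def l1_decode_with_slack(A_old, y_old, a_new, y_new, Q=1e4):
        M, N = A_old.shape
        # Variables [u; v; zp; zm]: x = u - v, slack z = zp - zm on the new constraint.
        c = np.concatenate([np.ones(2 * N), Q * np.ones(2)])
        A_eq = np.zeros((M + 1, 2 * N + 2))
        A_eq[:M, :N], A_eq[:M, N:2 * N] = A_old, -A_old
        A_eq[M, :N], A_eq[M, N:2 * N] = a_new, -a_new
        A_eq[M, 2 * N:] = [-1.0, 1.0]          # enforces a_new' x - z = y_new
        b_eq = np.concatenate([y_old, [y_new]])
        res = linprog(c, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (2 * N + 2), method="highs")
        x = res.x[:N] - res.x[N:2 * N]
        z = res.x[2 * N] - res.x[2 * N + 1]    # driven to 0 for large enough Q
        return x, z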
Summary and future work
Sequential processing can minimize the number of required measurements.
• Gaussian case: a simple rule requires the least possible number of samples.
• Bernoulli case: a trade-off between probability of error and delay.
• Near-sparse and noisy case: the change in successive solutions gives us information about solution accuracy.
Interesting questions:
• Related results in graphical model structure recovery from samples, and in low-rank matrix recovery.
• More efficient sequential solutions.
• Comparison with active learning approaches?