The Noise Collector for sparse recovery in high dimensions
Alexei Novikov, Department of Mathematics, Penn State University, USA
with M. Moscoso (Madrid), G. Papanicolaou (Stanford), and C. Tsogka (UC Merced).
Supported by NSF and AFOSR.
Inverse problems in wave propagation
[Three figure slides; labels: known incident field, scatterers, unknown medium; test, scatterers, unknown medium; known scatterers, unknown medium.]
Imaging setup
[Figure: array of N transducers, sources at $\vec x_s$, receivers at $\vec x_r$, image window (IW) of size $a$ at range $L$, scatterers at $\vec y_j$, wavelength $\lambda$.]
A signal is emitted from $\vec x_s$ at the array of N transducers and illuminates the scatterers in the image window (IW). The point scatterers, located at $\vec y_j$, can then be viewed as sources that send the scattered signal back to the array. There are K pixels in the IW. The number of scatterers is M, with M < N and N < K, typically. In the paraxial approximation, the map $\rho \mapsto b = A\rho$ is (up to a constant) a partial Fourier transform.
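Since A acts like a partial Fourier transform in this regime, a simple way to experiment with the setup is to simulate it with a random partial Fourier model matrix. A minimal numpy sketch, with illustrative sizes rather than the authors' configuration:

```python
import numpy as np

# Minimal sketch (not the authors' exact setup): model the imaging map as a
# random partial Fourier matrix, as suggested by the paraxial approximation.
rng = np.random.default_rng(0)
N, K, M = 100, 500, 5                        # transducers, pixels, scatterers

rows = rng.choice(K, size=N, replace=False)  # N frequencies out of K
A = np.exp(-2j * np.pi * np.outer(rows, np.arange(K)) / K) / np.sqrt(N)

rho = np.zeros(K)                            # M-sparse reflectivity vector
rho[rng.choice(K, size=M, replace=False)] = 1.0
e = 0.5 * rng.standard_normal(N)             # additive measurement noise
b = A @ rho + e                              # noisy data b = A rho + e
```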
Imaging of sparse scenes
[Figure. Left: the true image (2-dimensional images are shown for simplicity). Right: the recovered solution vector (red stars) and the true solution vector of Aρ = b (green circles).]
Noise, ℓ1 versus ℓ2 regularization
[Figure: true ρ, ℓ1 solution, and ℓ2 solution, plotted in range and cross-range.]
ℓ1-methods are unstable to noise; ℓ2-methods lose resolution.
ℓ1-regularization and Lasso
Noiseless case. We want to solve the sparsity-promoting optimization
$$\rho = \arg\min \|\tilde\rho\|_0, \quad \text{subject to } A\tilde\rho = b,$$
where $\|\rho\|_0 = \#\{i : \rho_i \neq 0\}$. This is expensive, so we solve instead
$$\rho = \arg\min \|\tilde\rho\|_1, \quad \text{subject to } A\tilde\rho = b,$$
where $\|\rho\|_1 = \sum_i |\rho_i|$.
Noisy case, Lasso (R. Tibshirani '96; S. Chen & D. Donoho '94; F. Santosa & W. Symes '86):
$$\rho = \arg\min \Big( \lambda \|\tilde\rho\|_1 + \tfrac{1}{2}\|A\tilde\rho - b\|_2^2 \Big),$$
where $\|\rho\|_2 = \big(\sum_i |\rho_i|^2\big)^{1/2}$ and $\lambda$ is a tuning parameter.
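As an illustration, the Lasso objective above can be minimized with iterative soft thresholding (ISTA). This is a minimal sketch, not the solver used in the talk; the function names and the step-size rule are illustrative choices:

```python
import numpy as np

def soft_threshold(y, t):
    # S_t(y) = sign(y) * max(0, |y| - t), applied entrywise.
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def lasso_ista(A, b, lam, n_iter=500):
    # Minimal ISTA sketch for min_rho lam*||rho||_1 + 0.5*||A rho - b||_2^2.
    # Step size 1/L, with L the Lipschitz constant of the quadratic term.
    L = np.linalg.norm(A, 2) ** 2
    rho = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = (A.conj().T @ (A @ rho - b)).real  # real reflectivities assumed
        rho = soft_threshold(rho - grad / L, lam / L)
    return rho
```

With A and b from the previous sketch, `rho_hat = lasso_ista(A, b, lam=0.5)` returns an approximately sparse reflectivity; the difficulty, as the next slides show, is choosing λ.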
Tuning λ in Lasso
[Figure: LASSO results with λ = 1, λ = 0.5 (optimal), and λ = 0.1. Top row: the images; bottom row: the recovered coefficients.]
l 1 & LASSO LASSO: finds a sparse approximate solution Aρ ≈ b if the tuning parameter λ is chosen correctly. l 1 : finds a sparse solution only if noise e = 0. No tuning parameters. If e is small, the sparse approximate solution can be found by thresholding. Thresholding has to be tuned. We propose to solve ( ρ τ , η ) = arg min ( τ � ρ � 1 + � η � 1 ) , subject to Aρ + Cη = b where C is the noise collector matrix and τ is the weight of the noise collector, and b = b 0 + e . We can prove ρ ≈ ρ τ and Cη ≈ e if τ is chosen correctly. √ We can choose τ = O ( ln N ) for any level of noise, before de-noising.
no NC vs. with NC, but no weight (τ = 1)
[Figure. Top row: the images; bottom row: the solution vector (red stars) and the true solution vector (green circles).]
with NC and weight vs. ℓ2 on the support
[Figure. Top row: the images; bottom row: the solution vector (red stars) and the true solution vector (green circles).]
no NC vs. with NC
[Figure: images (top row) and solution vectors (bottom row), as on the previous slides.]
Design of the Noise Collector
(i) Columns of the NC should be sufficiently orthogonal to the columns of A, so that it does not absorb signals carrying meaningful information.
(ii) Columns of the NC should be uniformly distributed on the unit sphere $S^{N-1}$, so that a typical noise vector can be approximated well.
(iii) The number of columns of the NC should grow slower than exponentially with N; otherwise the method is impractical.
(iv) Deterministic approach. If we fill up C imposing
$$|\vec a_i \cdot \vec c_j| < \frac{\alpha}{\sqrt N} \ \ \forall i, j, \quad \text{and} \quad |\vec c_i \cdot \vec c_j| < \frac{\alpha}{\sqrt N} \ \ \forall i \neq j, \qquad (1)$$
then the Kabatjanskii-Levenstein inequality implies that the number Σ of columns in C grows at most polynomially: $N^{\alpha} \lesssim \Sigma \lesssim N^{\alpha^2}$.
Probabilistic approach to the design of the NC
If the columns of C are drawn at random and independently, then the dot product of any two random unit vectors is still typically of order $1/\sqrt N$. The event that the noise collector is bad is asymptotically negligible; the decoherence constraint is only weakened by a logarithmic factor.
Lemma: Choose β > 1 and pick $\Sigma = N^\beta$ vectors $\vec c_i$ at random and independently on $S^{N-1}$. Then, for any κ > 0 there are constants $c_0(\kappa, \beta)$ and $\alpha > 1/2$ such that, with large probability $1 - 1/N^\kappa$,
(i) $|\vec a_i \cdot \vec c_j| < c_0 \sqrt{\ln N}/\sqrt{N}$ for all $i, j$, (2)
and (ii) for any $\vec e \in S^{N-1}$ there exists at least one $\vec c_j$ such that
$$|\vec e \cdot \vec c_j| > \alpha/\sqrt N. \qquad (3)$$
In addition, the condition number of $[A \,|\, C]$ is O(1).
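The $\sqrt{\ln N}/\sqrt N$ scaling in (2) is easy to check numerically. A small sanity-check sketch with illustrative sizes:

```python
import numpy as np

# Sanity check for the lemma: random unit columns have pairwise inner
# products of order sqrt(ln N / N). Sizes are illustrative.
rng = np.random.default_rng(2)
N, n_model, Sigma = 400, 100, 4000

A = rng.standard_normal((N, n_model))
A /= np.linalg.norm(A, axis=0)             # unit-norm model columns
C = rng.standard_normal((N, Sigma))
C /= np.linalg.norm(C, axis=0)             # random noise-collector columns

coherence = np.max(np.abs(A.T @ C))        # max_{i,j} |a_i . c_j|
print(coherence, np.sqrt(np.log(N) / N))   # same order of magnitude
```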
False Discovery Rate is zero
Theorem 1 (No phantom signal): Suppose there is no signal, ρ = 0, and $e/\|e\|_{\ell_2}$ is uniformly distributed on the unit sphere. For any κ > 0 we can construct the noise collector and choose the weight τ so that $\rho_\tau = 0$ with probability $1 - 1/N^\kappa$.
Theorem 2: Let ρ be an M-sparse solution of $A\rho = b_0$. If the columns of A are decoherent, $|\vec a_i \cdot \vec a_j| \leq \frac{1}{3M}$, then $\mathrm{supp}(\rho_\tau) \subseteq \mathrm{supp}(\rho)$ with probability $1 - 1/N^\kappa$.
Supports of ρ and ρ_τ agree
Theorem 3: Suppose r is the magnitude of the smallest non-zero entry of ρ. If $\|e\|_{\ell_2}/\|b_0\|_{\ell_2} \leq c_2 \sqrt{\ln N}$, with $c_2 = c_2(\kappa, \beta, r, M)$, then $\mathrm{supp}(\rho_\tau) = \mathrm{supp}(\rho)$ with probability $1 - 1/N^\kappa$.
Theorem 4 (Exact Recovery): If there is no noise, e = 0, then $\rho_\tau = \rho$ with probability $1 - 1/N^\kappa$.
Failure to recover
[Figure: recovery failure as a function of SNR and sparsity M = 1, ..., 10. Left: NC; right: Lasso with optimal λ.]
Image Window and Noise Collector
[Figure. Left: NC and IW; right: IW only. Coefficients of the solution, 0 dB, N = 625, K = 1000, Σ = 10000.]
GeLMA with Fast Noise Collector
Require: Set ρ = 0, z = 0, η = 0. Pick β = β(A, C) and $\tau = 0.8\sqrt{\ln N}$.
repeat
  r = b − Aρ − Cη
  z ← z + βr
  ρ ← $S_{\tau\beta}(\rho + \beta A^*(z + r))$
  η ← $S_{\beta}(\eta + \beta C^*(z + r))$
until convergence
Calibrate τ so that the FDR is zero when b = e (only noise): the "no phantom signal" criterion.
The Fast Noise Collector C is built from several random circulant matrices. Then C is cheap to store, and matrix-vector multiplications can be done with the FFT.
The soft shrinkage-thresholding operator is $S_\tau(y_i) = \mathrm{sign}(y_i)\max\{0, |y_i| - \tau\}$.
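A minimal numpy transcription of the iteration above, assuming real-valued A and C; the step size β and the fixed iteration count are illustrative, not a tuned configuration:

```python
import numpy as np

def soft(y, t):
    # Soft shrinkage-thresholding S_t(y) = sign(y) * max(0, |y| - t).
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def gelma_nc(A, C, b, beta, n_iter=2000):
    # Sketch of the GeLMA iteration with a noise collector (real A, C).
    # tau = 0.8 * sqrt(ln N), as on the slide.
    N = A.shape[0]
    tau = 0.8 * np.sqrt(np.log(N))
    rho, eta, z = np.zeros(A.shape[1]), np.zeros(C.shape[1]), np.zeros(N)
    for _ in range(n_iter):
        r = b - A @ rho - C @ eta          # residual of the augmented system
        z = z + beta * r                   # Lagrange multiplier update
        rho = soft(rho + beta * (A.T @ (z + r)), tau * beta)
        eta = soft(eta + beta * (C.T @ (z + r)), beta)
    return rho, eta
```

With A, C, and b as in the earlier sketches, `rho_hat, eta_hat = gelma_nc(A, C, b, beta=0.1)`; a step size of order $1/\|[A\,|\,C]\|_2^2$ is a safe, if conservative, assumption.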
Geometric interpretation of $\vec z$
$$\vec a_i \cdot \vec z = \tau\,\mathrm{sign}(\rho_i) \ \text{ if } \rho_i \neq 0, \quad \text{and} \quad |\vec a_i \cdot \vec z| \leq \tau \ \text{ if } \rho_i = 0.$$
Assume both $\vec a_i$ and $-\vec a_i$ are columns of A, and all $\rho_i \geq 0$. Then
$$H_A = \Big\{ x \in \mathbb{R}^N : x = \sum_i \alpha_i \vec a_i, \ \sum_i \alpha_i \leq 1, \ \alpha_i \geq 0 \Big\}.$$
Suppose Λ is the support of ρ; typically (for non-sparse ρ) |Λ| = N. Then the simplex
$$\Big\{ x \in \mathbb{R}^N : x = \sum_{i \in \Lambda} \alpha_i \vec a_i, \ \sum_{i \in \Lambda} \alpha_i = 1, \ \alpha_i \geq 0 \Big\}$$
has a unique normal vector $\vec n$, which is collinear to $\vec z$, because
$$\vec z \cdot \vec a_i = 1 = \vec z \cdot \frac{\vec b}{\|\vec b\|_A}, \ \forall i \in \Lambda, \quad \text{and} \quad \vec z \cdot \vec a_j < 1, \ \forall j \notin \Lambda. \qquad (4)$$