Compressed sensing off-the-grid: The Fisher metric, support stability and optimal sampling bounds

Clarice Poon
University of Bath

Joint work with: Nicolas Keriven and Gabriel Peyré
École Normale Supérieure

February 6, 2019
Outline

1. Compressed sensing off-the-grid
2. The Fisher metric and the minimum separation condition
3. Support stability for the subsampled problem
4. Ideas behind the proofs: dual certificates
5. Removal of random signs assumption
Compressed sensing [Candès, Romberg & Tao '06; Donoho '06]

Task: Recover $a \in \mathbb{C}^N$ from $y = \Phi a$, where $\Phi \in \mathbb{C}^{m \times N}$ with $m \ll N$ and $a$ is $s$-sparse.

Typical compressed sensing statement: For certain random matrices $\Phi \in \mathbb{C}^{m \times N}$, with high probability, $a$ can be uniquely recovered from $m = O(s \log(N))$ measurements by solving

$$\min_{z \in \mathbb{C}^N} \|z\|_1 \quad \text{subject to} \quad \Phi z = y,$$

or, in the noisy case $y = \Phi a + w$, the minimizer $\hat{a}$ of

$$\min_{z \in \mathbb{C}^N} \lambda \|z\|_1 + \frac{1}{2} \|\Phi z - y\|_2^2$$

with $\lambda \sim \delta / \sqrt{s}$ and $\|w\| \le \delta$ satisfies

$$\|a - \hat{a}\|_1 \lesssim \sigma_s(a)_1 + \sqrt{s}\,\delta,$$

where $\sigma_s(a)_1$ denotes the $\ell_1$ error of the best $s$-term approximation of $a$.

In the case where $U$ is unitary, the above statement holds with $\Phi = P_\Omega U$, where $\Omega$ consists of $m = O(N \cdot \mu(U)^2 \cdot s \cdot \log(N))$ uniformly drawn indices and $\mu(U) = \max_{i,j} |U_{ij}|$ is the so-called coherence. When $U$ is the DFT, we have $\mu(U)^2 = 1/N$.
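The coherence of the DFT can be checked numerically. The sketch below builds the unitary DFT matrix and evaluates $\mu(U)^2$; the helper names `unitary_dft` and `coherence` and the size `N = 16` are illustrative assumptions, not taken from the talk.

```python
import cmath
import math


def unitary_dft(n):
    """Return the n x n unitary DFT matrix as nested lists."""
    scale = 1.0 / math.sqrt(n)
    return [
        [scale * cmath.exp(-2j * math.pi * j * k / n) for k in range(n)]
        for j in range(n)
    ]


def coherence(u):
    """Coherence mu(U) = max_{i,j} |U_ij|."""
    return max(abs(entry) for row in u for entry in row)


N = 16  # illustrative size
mu_squared = coherence(unitary_dft(N)) ** 2
print(mu_squared, 1.0 / N)  # the two values should agree up to rounding
```

With $\mu(U)^2 = 1/N$, the sampling bound $m = O(N \cdot \mu(U)^2 \cdot s \cdot \log(N))$ reduces to $m = O(s \log(N))$.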
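Compressed sensing off the grid

Aim: Recover $\mu_0 \in \mathcal{M}(\mathcal{X})$, $\mathcal{X} \subseteq \mathbb{R}^d$, from $m$ observations $y = \Phi \mu_0 + w$.

Let $(\Omega, \Lambda)$ be a probability space. For $\omega \in \Omega$, we have random features $\varphi_\omega \in C(\mathcal{X})$. For $k = 1, \ldots, m$, let $\omega_k \overset{\text{iid}}{\sim} \Lambda$. The measurement operator is

$$\Phi : \mathcal{M}(\mathcal{X}) \to \mathbb{C}^m, \qquad \Phi \mu \overset{\text{def.}}{=} \frac{1}{\sqrt{m}} \left( \int \varphi_{\omega_k}(x) \, \mathrm{d}\mu(x) \right)_{k=1}^m.$$

Typically, the measure of interest is $\mu_0 = \sum_{j=1}^s a_j \delta_{x_j}$, where $a \delta_x$ denotes the Dirac at $x \in \mathcal{X}$ with amplitude $a \in \mathbb{C}$ (also called a "spike").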
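Imaging

Sampling the Fourier transform (e.g. astronomy)
Recover $\mu \in \mathcal{M}(\mathbb{T}^d)$ from $(\mathcal{F}\mu(\omega_k))_{k=1}^m$, where $\mathcal{F}$ is the Fourier transform and the $\omega_k$ are drawn iid from $(\llbracket -f_c, f_c \rrbracket^d, \mathrm{Unif})$.
Here, $\varphi_\omega(x) = \exp(-\mathrm{i} 2\pi x^\top \omega)$ and

$$\Phi \mu_0 = \frac{1}{\sqrt{m}} \left( \sum_{j=1}^s a_j \exp(-\mathrm{i} 2\pi x_j^\top \omega_k) \right)_{k=1}^m.$$

Sampling the Laplace transform (e.g. fluorescence microscopy)
Recover $\mu \in \mathcal{M}(\mathbb{R}_+^d)$ from $(\mathcal{L}\mu(\omega_k))_{k=1}^m$, where $\mathcal{L}$ is the Laplace transform and the $\omega_k$ are drawn iid from $(\mathbb{R}_+^d, \Lambda_\alpha)$, where $\Lambda_\alpha(\omega) \propto \exp(-2\alpha^\top \omega)$.
Here, $\varphi_\omega(x) = \exp(-x^\top \omega)$ and

$$\Phi \mu_0 = \frac{1}{\sqrt{m}} \left( \sum_{j=1}^s a_j \exp(-x_j^\top \omega_k) \right)_{k=1}^m.$$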
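Both imaging examples apply the same operator $\Phi$ to a discrete measure and differ only in the feature $\varphi_\omega$ and the sampling law. The sketch below is one way to write this down; the function names, the dimensions, the spike values, and the reading of the Fourier sampling set as integer frequencies in $\{-f_c, \ldots, f_c\}^d$ are all assumptions made for illustration.

```python
import cmath
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def fourier_feature(omega, x):
    """phi_omega(x) = exp(-i 2 pi x^T omega)."""
    return cmath.exp(-2j * math.pi * dot(x, omega))


def laplace_feature(omega, x):
    """phi_omega(x) = exp(-x^T omega)."""
    return math.exp(-dot(x, omega))


def apply_phi(feature, omegas, amplitudes, positions):
    """(Phi mu)_k = m^{-1/2} sum_j a_j phi_{omega_k}(x_j) for mu = sum_j a_j delta_{x_j}."""
    scale = 1.0 / math.sqrt(len(omegas))
    return [
        scale * sum(a * feature(w, x) for a, x in zip(amplitudes, positions))
        for w in omegas
    ]


rng = random.Random(0)
d, m, f_c = 2, 8, 5               # illustrative sizes
amplitudes = [1.0, -0.5]          # hypothetical spike amplitudes a_j
positions = [(0.2, 0.7), (0.6, 0.1)]  # hypothetical spike positions x_j

# Fourier case: integer frequencies drawn uniformly (assumed reading of the slide).
fourier_omegas = [tuple(rng.randint(-f_c, f_c) for _ in range(d)) for _ in range(m)]
y_fourier = apply_phi(fourier_feature, fourier_omegas, amplitudes, positions)

# Laplace case: a density proportional to exp(-2 alpha^T omega) on R_+^d is a
# product of independent exponential laws with rates 2 * alpha_i.
alpha = (1.0, 0.5)
laplace_omegas = [tuple(rng.expovariate(2 * a) for a in alpha) for _ in range(m)]
y_laplace = apply_phi(laplace_feature, laplace_omegas, amplitudes, positions)
```

Passing the feature as an argument keeps `apply_phi` identical across the two settings, which mirrors how the talk treats them as instances of one measurement model.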
Two-layer neural network [Bach, 2015]

Let $\Omega \subseteq \mathbb{R}^d$, and let $\omega_1, \ldots, \omega_m$ be training samples drawn from $(\Omega, \Lambda)$, with corresponding values $y_1, \ldots, y_m \in \mathbb{R}$. Find a function of the form

$$f(\omega) = \sum_{j=1}^s a_j \max(\langle x_j, \omega \rangle, 0),$$

where $a_j \in \mathbb{R}$ and $x_j \in \mathbb{R}^d$, such that $f(\omega_k) \approx y_k$ for $k = 1, \ldots, m$. We can then use the function $f$ to predict $y$ given $\omega \in \Omega$.

This is precisely our sparse spikes problem, where we let $\varphi_\omega(x) = \max(\langle x, \omega \rangle, 0)$ and

$$\Phi \mu_0 = \left( \sum_{j=1}^s a_j \max(\langle x_j, \omega_k \rangle, 0) \right)_{k=1}^m,$$

where $\mu_0 = \sum_{j=1}^s a_j \delta_{x_j}$.
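To make the correspondence concrete, the sketch below evaluates the network $f$ as the integral of the ReLU feature against a discrete measure. The names `relu_feature` and `network` and the numerical values are hypothetical, chosen only so the result can be checked by hand.

```python
def relu_feature(omega, x):
    """phi_omega(x) = max(<x, omega>, 0)."""
    return max(sum(xi * wi for xi, wi in zip(x, omega)), 0.0)


def network(omega, amplitudes, positions):
    """f(omega) = sum_j a_j max(<x_j, omega>, 0), i.e. the integral of phi_omega against mu_0."""
    return sum(a * relu_feature(omega, x) for a, x in zip(amplitudes, positions))


# Hypothetical network with s = 2 hidden units: outer weights a_j, inner weights x_j.
amplitudes = [2.0, -1.0]
positions = [(1.0, 0.0), (0.0, 1.0)]

f_a = network((3.0, 4.0), amplitudes, positions)   # 2 * 3 - 1 * 4
f_b = network((-1.0, 2.0), amplitudes, positions)  # 2 * 0 - 1 * 2
print(f_a, f_b)
```

In this reading the spike positions $x_j$ are the hidden-layer weights and the amplitudes $a_j$ are the output weights, so recovering $\mu_0$ amounts to learning the network.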
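Density estimation

Task: Given data on $\mathcal{T}$, estimate parameters $(a_i)_{i=1}^s \in \mathbb{R}_+^s$ and $(x_i)_{i=1}^s \in \mathcal{X}^s$ of a mixture

$$\xi(t) = \sum_{j=1}^s a_j \xi_{x_j}(t) = \int_{\mathcal{X}} \xi_x(t) \, \mathrm{d}\mu_0(x),$$

where $\mu_0 = \sum_j a_j \delta_{x_j}$ and $(\xi_x)_{x \in \mathcal{X}}$ is a family of template distributions. E.g. $x = (m, \sigma) \in \mathcal{X} = \mathbb{R} \times \mathbb{R}_+$ and $\xi_x = \mathcal{N}(m, \sigma^2)$.

Sketching [Gribonval, Blanchard, Keriven & Traonmilin, 2017]

There is no direct access to $\xi$, only to $n$ iid samples $(t_1, \ldots, t_n) \in \mathcal{T}^n$ drawn from $\xi$. You do not record this (possibly huge) set of data, but compute online a small set $y \in \mathbb{C}^m$ of $m$ sketches against sketching functions $\theta_\omega(t)$:

$$y_k \overset{\text{def.}}{=} \frac{1}{n} \sum_{j=1}^n \theta_{\omega_k}(t_j) \approx \int_{\mathcal{T}} \theta_{\omega_k}(t) \xi(t) \, \mathrm{d}t = \int_{\mathcal{X}} \int_{\mathcal{T}} \theta_{\omega_k}(t) \xi_x(t) \, \mathrm{d}t \, \mathrm{d}\mu_0(x).$$

So $\varphi_\omega(x) \overset{\text{def.}}{=} \int_{\mathcal{T}} \theta_\omega(t) \xi_x(t) \, \mathrm{d}t$. E.g. $\theta_\omega(t) = e^{\mathrm{i} \langle \omega, t \rangle}$ and $\varphi_\cdot(x)$ is the characteristic function of $\xi_x$.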
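The sketching step can be illustrated on a one-dimensional Gaussian mixture, where $\varphi_\omega(x)$ has the closed form $\exp(\mathrm{i} \omega m - \sigma^2 \omega^2 / 2)$ for $x = (m, \sigma)$. The code below is a minimal sketch under assumed values: the mixture weights, component parameters, frequencies, sample size and all function names are hypothetical.

```python
import cmath
import random


def gaussian_char_fn(omega, mean, sigma):
    """phi_omega(x) for x = (mean, sigma): characteristic function of N(mean, sigma^2)."""
    return cmath.exp(1j * omega * mean - 0.5 * (sigma * omega) ** 2)


def sample_mixture(rng, weights, params):
    """Draw one sample from sum_j a_j N(m_j, sigma_j^2)."""
    mean, sigma = rng.choices(params, weights=weights)[0]
    return rng.gauss(mean, sigma)


def online_sketch(samples, omegas):
    """Running average of theta_omega(t) = exp(i omega t); samples are never stored."""
    sketch = [0j] * len(omegas)
    count = 0
    for t in samples:
        count += 1
        for k, w in enumerate(omegas):
            sketch[k] += (cmath.exp(1j * w * t) - sketch[k]) / count
    return sketch


rng = random.Random(0)
weights = [0.3, 0.7]                # hypothetical amplitudes a_j (summing to 1)
params = [(-1.0, 0.5), (2.0, 1.0)]  # hypothetical x_j = (m_j, sigma_j)
omegas = [0.2, 0.5, 1.0, 1.5]       # hypothetical sketching frequencies
n = 20000

samples = (sample_mixture(rng, weights, params) for _ in range(n))
y = online_sketch(samples, omegas)

# Integral of phi_omega against mu_0, which the empirical sketch approximates.
expected = [
    sum(a * gaussian_char_fn(w, mean, sigma) for a, (mean, sigma) in zip(weights, params))
    for w in omegas
]
max_error = max(abs(u - v) for u, v in zip(y, expected))
print(max_error)
```

The samples are consumed from a generator, so memory use depends on $m$ and not on $n$, which is the point of computing the sketch online.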
The Beurling LASSO

The BLASSO was initially proposed by [De Castro & Gamboa, 2012] and [Bredies & Pikkarainen, 2013]. Solve

$$\min_{\mu \in \mathcal{M}(\mathcal{X})} \frac{1}{2} \|\Phi \mu - y\|^2 + \lambda |\mu|(\mathcal{X}) \qquad (\hat{P}_\lambda(y))$$

where $|\mu|(\mathcal{X}) \overset{\text{def.}}{=} \sup \left\{ \mathrm{Re}(\langle f, \mu \rangle) \; ; \; f \in C(\mathcal{X}), \ \|f\|_\infty \le 1 \right\}$.

Noiseless problem: for $y_0 = \Phi \mu_0$,

$$\min_{\mu \in \mathcal{M}(\mathcal{X})} |\mu|(\mathcal{X}) \quad \text{subject to} \quad \Phi \mu = y_0 \qquad (\hat{P}_0(y_0))$$

NB: If $\mu = \sum_j a_j \delta_{x_j}$, then $|\mu|(\mathcal{X}) = \|a\|_1$.

Goal: A CS-type theory. Under what conditions can we recover $\mu_0 = \sum_{j=1}^s a_j \delta_{x_j}$ exactly (stably) from $m = O(s \times \text{log factors})$ (noisy) randomised linear measurements?
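Restricted to discrete measures, the BLASSO objective becomes a finite-dimensional function of amplitudes and positions, with the total variation norm replaced by $\|a\|_1$. The sketch below only evaluates that objective; it does not solve the BLASSO. It assumes one-dimensional Fourier features, and every name and value in it is hypothetical.

```python
import cmath
import math


def forward(amplitudes, positions, omegas):
    """Phi mu for mu = sum_j a_j delta_{x_j}, with phi_omega(x) = exp(-i 2 pi x omega) in 1D."""
    scale = 1.0 / math.sqrt(len(omegas))
    return [
        scale * sum(a * cmath.exp(-2j * math.pi * x * w) for a, x in zip(amplitudes, positions))
        for w in omegas
    ]


def tv_norm(amplitudes):
    """|mu|(X) = ||a||_1 for a discrete measure with distinct positions."""
    return sum(abs(a) for a in amplitudes)


def blasso_objective(amplitudes, positions, omegas, y, lam):
    """0.5 * ||Phi mu - y||^2 + lam * |mu|(X), evaluated on a discrete measure."""
    residual = [u - v for u, v in zip(forward(amplitudes, positions, omegas), y)]
    return 0.5 * sum(abs(r) ** 2 for r in residual) + lam * tv_norm(amplitudes)


omegas = list(range(-4, 5))  # hypothetical frequencies
true_amplitudes = [1.0, -0.5]
true_positions = [0.2, 0.7]
y0 = forward(true_amplitudes, true_positions, omegas)  # noiseless data y_0 = Phi mu_0
lam = 0.1

value_at_truth = blasso_objective(true_amplitudes, true_positions, omegas, y0, lam)
value_shifted = blasso_objective(true_amplitudes, [0.25, 0.7], omegas, y0, lam)
print(value_at_truth, value_shifted)
```

At $\mu_0$ with noiseless data the data-fit term vanishes and the objective equals $\lambda \|a\|_1$; moving a spike leaves the penalty unchanged and adds a positive residual.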
Remarks

Other approaches include Prony-type methods (1795): MUSIC [Schmidt, 1986], ESPRIT [Roy, 1987], Finite Rate of Innovation [Vetterli, 2002] ...
- Nonvariational approaches which encode the spike positions as the zeros of some polynomial, whose coefficients are derived from the measurements.
- Generally restricted to Fourier-type measurements.
- Extension to the multivariate setting is nontrivial.

There are efficient algorithms for solving this infinite-dimensional problem, e.g. SDP approaches [Candès & Fernandez-Granda, 2012; De Castro, Gamboa, Henrion & Lasserre 2015] and Frank-Wolfe approaches [Bredies & Pikkarainen 2013; Boyd, Schiebinger & Recht '15; Denoyelle, Duval & Peyré '18].
Background on the BLASSO

Recovery of spikes of arbitrary signs requires a minimum separation condition:

[Candès & Fernandez-Granda '12]: Given $\{ \mathcal{F}\mu_0(k) \; ; \; k \in \mathbb{Z}^d, \ \|k\|_\infty \le f_c \}$, $\mu_0$ can be recovered uniquely if

$$\Delta = \min_{i \ne j} \|x_i - x_j\|_\infty \ge \frac{C_d}{f_c}.$$

There are many extensions to other measurement operators; minimum separation is fundamental (for BLASSO) and often imposed via ad hoc metrics [Bendory et al '15, Tang '15].

Stability for the recovered measure $\hat{\mu}$:

Integral type stability estimates [Candès & Fernandez-Granda '13]: $\| K_{\mathrm{hi}} \star (\hat{\mu} - \mu_0) \|_{L^1}$.

Support concentration [Fernandez-Granda '13; Azaïs, De Castro & Gamboa '12]: Bounds on $| \hat{\mu}(\mathcal{X}_j^{\mathrm{near}}) - a_j |$ and $|\hat{\mu}|(\mathcal{X}^{\mathrm{far}})$.

Support stability [Duval and Peyré '15]: in the small noise regime where $\|w\|$ and $\lambda$ are sufficiently small, $\hat{\mu}$ consists of exactly $s$ spikes, and the recovered amplitudes and positions vary continuously with respect to $\lambda$ and $w$.
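The separation condition is easy to test for a given configuration of spikes. The sketch below computes $\Delta$ with the sup norm on the torus $[0, 1)^d$, using the wrap-around distance in each coordinate (an assumption about how the norm is read on $\mathbb{T}^d$). The value of the constant `c_d` is not given in the talk and is a placeholder here, as are the positions and function names.

```python
from itertools import combinations


def wrap(u):
    """Distance from u to the nearest integer, i.e. |u| measured on the circle R/Z."""
    u = abs(u) % 1.0
    return min(u, 1.0 - u)


def torus_sup_distance(x, y):
    """Sup-norm distance between x and y on the torus [0, 1)^d."""
    return max(wrap(xi - yi) for xi, yi in zip(x, y))


def min_separation(positions):
    """Delta = min over pairs i != j of ||x_i - x_j||_inf."""
    return min(torus_sup_distance(x, y) for x, y in combinations(positions, 2))


def is_separated(positions, f_c, c_d):
    """Check Delta >= c_d / f_c; c_d is a placeholder for the constant C_d."""
    return min_separation(positions) >= c_d / f_c


positions = [(0.1, 0.1), (0.9, 0.15), (0.5, 0.5)]  # hypothetical spikes in T^2
delta = min_separation(positions)
print(delta, is_separated(positions, f_c=10, c_d=1.5))
```

The first two spikes are close across the boundary of the unit square, so ignoring the wrap-around would overestimate their separation.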