Phase Retrieval Using Partial Unitary Sensing Matrices
Rishabh Dudeja, Milad Bakhshizadeh, Junjie Ma, Arian Maleki
The Phase Retrieval Problem
◮ Recover the unknown $x_\star$ from $y = |A x_\star|$
◮ $x_\star \in \mathbb{C}^n$: signal vector
◮ $y \in \mathbb{R}^m$: measurements
◮ $A$: sensing matrix
◮ $\delta = m/n$: sampling ratio
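To make the setup concrete, here is a minimal numerical sketch of the measurement model (the dimensions, seed, and Gaussian sensing choice are illustrative; the normalization $\|x_\star\|^2 = n$ matches $x_\star = \sqrt{n}\, e_1$ used later in the deck):

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 128, 4.0                 # illustrative signal size and sampling ratio
m = int(delta * n)

# Signal normalized so that ||x_star||^2 = n.
x_star = rng.normal(size=n) + 1j * rng.normal(size=n)
x_star *= np.sqrt(n) / np.linalg.norm(x_star)

# Gaussian sensing: A_ij i.i.d. CN(0, 1/n), i.e. real/imag parts N(0, 1/(2n)).
A = (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))) / np.sqrt(2 * n)

y = np.abs(A @ x_star)              # phaseless measurements y = |A x_star|
```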
Popular sensing matrices (in theoretical work)
◮ Gaussian: $A_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{CN}(0, 1/n)$
◮ Coded diffraction pattern (CDP):
$$A_{\mathrm{CDP}} = \begin{bmatrix} F D_1 \\ \vdots \\ F D_L \end{bmatrix}$$
◮ $D_l = \mathrm{Diag}(e^{i\phi^{(l)}_1}, \dots, e^{i\phi^{(l)}_n})$
◮ $F$: Fourier matrix
◮ $\phi_1, \dots, \phi_n$: independent uniform phases
◮ Objective: which matrix performs better from a purely theoretical standpoint?
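A sketch of how one might assemble $A_{\mathrm{CDP}}$ in code ($n$, $L$, and the unitary $1/\sqrt{n}$ normalization of $F$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, L = 64, 4                                  # here delta = L

# Unitary DFT matrix F.
idx = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

# Stack the blocks F D_l, with D_l a diagonal of i.i.d. uniform phases.
blocks = []
for _ in range(L):
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n)
    blocks.append(F * np.exp(1j * phases))    # F @ Diag(e^{i phi}) scales columns
A_cdp = np.vstack(blocks)                     # shape (L * n, n)
```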
Flashback to compressed sensing
◮ Performance of partial orthogonal versus Gaussian matrices for the LASSO:
◮ Noiseless measurements: same phase transition
◮ Noisy measurements: partial orthogonal (Fourier) is better
Related work:
◮ Originally observed by Donoho, Tanner (2009)
◮ Phase transition analysis of Gaussian matrices: Donoho, Tanner (2006)
◮ Mean square error calculation for Gaussian matrices: Donoho, Maleki, Montanari (2011); Bayati, Montanari (2011); Thrampoulidis, Oymak, Hassibi (2015)
◮ Mean square error calculation for partial orthogonal matrices: Tulino, Verdú, Caire (2013); Thrampoulidis, Hassibi (2015)
This talk: the Spectral Estimator (P. Netrapalli, P. Jain & S. Sanghavi, 2015)
◮ The spectral estimator $\hat{x}$: the leading eigenvector of the matrix $M \triangleq A^H T A$
◮ $T = \mathrm{Diag}(\mathcal{T}(y_1), \dots, \mathcal{T}(y_m))$
◮ $\mathcal{T}: \mathbb{R}_{\geq 0} \to [0, 1]$ is a continuous trimming function
◮ Population behaviour: $\mathbb{E} M = \lambda_1 x_\star x_\star^H + \lambda_2 (I_n - x_\star x_\star^H)$
◮ $\lambda_1 = \mathbb{E}[T |Z|^2]$, $\lambda_2 = \mathbb{E}[T]$
◮ $Z \sim \mathcal{CN}(0, 1)$, $T = \mathcal{T}(|Z|/\sqrt{\delta})$
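A minimal numerical sketch of the estimator; the trimming function below is one of the choices from the experiments later in the deck, and the dimensions and seed are illustrative:

```python
import numpy as np

def spectral_estimate(A, y, delta):
    """Leading eigenvector of M = A^H T A, with the trimming function
    T(y) = delta*y^2 / (delta*y^2 + 0.1) from the deck's experiments."""
    T = delta * y**2 / (delta * y**2 + 0.1)   # entries of Diag(T(y_1), ..., T(y_m))
    M = A.conj().T @ (T[:, None] * A)         # A^H T A without forming Diag(T)
    eigvals, eigvecs = np.linalg.eigh(M)      # M is Hermitian
    return eigvecs[:, -1]                     # eigenvector of the largest eigenvalue

# Example with a Gaussian A (same setup as the first sketch):
rng = np.random.default_rng(0)
n, delta = 128, 4.0
m = int(delta * n)
x_star = rng.normal(size=n) + 1j * rng.normal(size=n)
x_star *= np.sqrt(n) / np.linalg.norm(x_star)
A = (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))) / np.sqrt(2 * n)
y = np.abs(A @ x_star)

x_hat = spectral_estimate(A, y, delta)
# Overlap rho: eigh returns a unit-norm x_hat, so divide by ||x_star|| = sqrt(n).
rho = np.abs(np.vdot(x_star, x_hat)) / np.sqrt(n)
print(rho)
```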
CDP behaves like the oversampled Haar model
[Figure: $\rho^2$ versus $\delta$, where $\rho = \frac{1}{n}|x_\star^H \hat{x}|$, for three trimming functions: $\mathcal{T}(y) = \delta y^2/(\delta y^2 + 0.1)$, $\mathcal{T}(y) = \delta y^2/(\delta y^2 + \sqrt{\delta} - 1)$, and $\mathcal{T}(y) = \delta y^2 \,\mathbb{1}(\delta y^2 \leq 2)/4$.]
Oversampled Haar model explains CDP.
Refined objective
◮ Compare the spectral estimator on:
◮ Gaussian: $A_{ij} \sim \mathcal{CN}(0, \frac{1}{n})$
◮ Oversampled Haar: $H_m \sim \mathrm{Unif}(\mathbb{U}(m))$, $A = H_m S_{m,n}$, $y = |A x_\star|$
◮ $S_{m,n}$: selects $n$ columns at random
◮ We use the asymptotic framework: (1) $m, n \to \infty$; (2) $m/n \to \delta$
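A sketch of sampling from the oversampled Haar model (the QR phase fix is the standard way to make the factor exactly Haar-distributed rather than merely unitary; sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, delta = 64, 4.0
m = int(delta * n)

# Haar-distributed unitary via QR of a complex Ginibre matrix.
G = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
Q, R = np.linalg.qr(G)
d = np.diagonal(R)
H = Q * (d / np.abs(d))             # column-wise phase fix -> exactly Haar

# S_{m,n}: keep n columns chosen uniformly at random.
cols = rng.choice(m, size=n, replace=False)
A_haar = H[:, cols]

# Sanity check: the columns are exactly orthonormal, A^H A = I_n.
assert np.allclose(A_haar.conj().T @ A_haar, np.eye(n))
```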
Sharp asymptotics: Gaussian sensing matrices
◮ For $\delta > 1$: there exists a spectral estimator $\hat{x}$ with $|\hat{x}^H x_\star|^2/n \to \rho^2 > 0$ (Y. Lu and G. Li, 2016; M. Mondelli and A. Montanari, 2017)
◮ Lu and Li also showed how $\rho$ can be calculated.
Main result: oversampled Haar matrices
Theorem: We have
$$\frac{|x_\star^H \hat{x}|^2}{n} \xrightarrow{P} \begin{cases} 0, & \psi_1(\tau_\star) < \frac{\delta}{\delta - 1}, \\ \rho^2_{\mathcal{T}}(\delta), & \psi_1(\tau_\star) > \frac{\delta}{\delta - 1}, \end{cases}$$
where, for $\tau \in [1, \infty)$,
$$\Lambda(\tau) = \tau - \frac{1 - 1/\delta}{\mathbb{E}\left[\frac{1}{\tau - T}\right]}, \qquad \psi_1(\tau) = \frac{\mathbb{E}\left[\frac{|Z|^2}{\tau - T}\right]}{\mathbb{E}\left[\frac{1}{\tau - T}\right]},$$
and $\tau_\star = \arg\min_\tau \Lambda(\tau)$, $Z \sim \mathcal{CN}(0, 1)$, $T = \mathcal{T}(|Z|/\sqrt{\delta})$.
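The quantities in the theorem are plain scalar expectations, so they are easy to estimate by Monte Carlo. A sketch (the trimming function is again the $\delta y^2/(\delta y^2 + 0.1)$ choice from the experiments; sample size, grid, and $\delta$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
delta = 4.0

# Z ~ CN(0,1) samples and T = T(|Z| / sqrt(delta)).
Z = (rng.normal(size=100_000) + 1j * rng.normal(size=100_000)) / np.sqrt(2)
y = np.abs(Z) / np.sqrt(delta)
T = delta * y**2 / (delta * y**2 + 0.1)
absZ2 = np.abs(Z) ** 2

def Lam(tau):
    return tau - (1.0 - 1.0 / delta) / np.mean(1.0 / (tau - T))

def psi1(tau):
    return np.mean(absZ2 / (tau - T)) / np.mean(1.0 / (tau - T))

taus = np.linspace(1.001, 10.0, 2000)
tau_star = taus[np.argmin([Lam(t) for t in taus])]
print(tau_star, psi1(tau_star), delta / (delta - 1.0))  # compare with threshold
```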
Application: optimal trimming functions
◮ Weak recovery threshold of $\mathcal{T}$: $\delta_{\mathcal{T}} \triangleq \inf\{\delta \geq 1 : \rho^2_{\mathcal{T}}(\delta) > 0\}$
◮ For the oversampled Haar measurement matrix, the optimal trimming function is
$$\mathcal{T}_\star(y) = 1 - \frac{1}{\delta y^2}, \qquad \delta_{\mathcal{T}_\star} = 2.$$
◮ For Gaussian sensing: $\delta_{\mathcal{T}_\star} = \delta_{\mathrm{IT}} = 1$ (Luo, Alghamdi and Lu, 2018).
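As a sanity check, the sketch below scans $\delta$ and tests the theorem's weak-recovery condition for $\mathcal{T}_\star$, using the formulas exactly as written on the theorem slide; it should flag recovery only for $\delta$ above 2 (values very close to 2 are numerically delicate, and the grids and tolerances are illustrative):

```python
import numpy as np

# G = |Z|^2 ~ Exp(1); with T_star(y) = 1 - 1/(delta*y^2) and y = |Z|/sqrt(delta),
# delta*y^2 = |Z|^2 = G, so tau - T_star = tau - 1 + 1/G.
g = np.linspace(1e-4, 40.0, 100_000)
w = np.exp(-g)
dg = g[1] - g[0]

def E(f):                                   # E[f(G)] by simple quadrature
    return np.sum(f * w) * dg

def weak_recovery(delta, taus=np.linspace(1.0 + 1e-3, 8.0, 800)):
    # 1/(tau - T_star) = G/((tau-1)G + 1); |Z|^2/(tau - T_star) = G^2/((tau-1)G + 1).
    Lam = [t - (1 - 1 / delta) / E(g / ((t - 1) * g + 1)) for t in taus]
    t_star = taus[int(np.argmin(Lam))]
    psi1 = E(g**2 / ((t_star - 1) * g + 1)) / E(g / ((t_star - 1) * g + 1))
    return psi1 > delta / (delta - 1)

for delta in [1.5, 1.8, 2.2, 3.0, 4.0]:
    print(delta, weak_recovery(delta))      # expect False below 2, True above
```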
Remainder of the Talk: A sketch of the proof.
Main ingredient: free probability
◮ Classical probability theory: for two independent random variables $X \sim f_X$, $Y \sim f_Y$:
◮ $f_{X+Y}(t) = (f_X * f_Y)(t) = \int f_X(z) f_Y(t - z)\, dz$
◮ $f_{XY}(t) = \int f_X(x) f_Y(t/x) \frac{1}{|x|}\, dx$
◮ Free probability theory (for random matrices):
◮ Let $X$ and $Y$ be "freely independent"
◮ Let $\mu_X$ denote the empirical spectral distribution of $X$
◮ $\mu_{X+Y} = \mu_X \boxplus \mu_Y$ (free "additive" convolution)
◮ $\mu_{XY} = \mu_X \boxtimes \mu_Y$ (free "multiplicative" convolution)
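A quick numerical illustration of free additive convolution (sizes and spectra are illustrative): for two matrices with $\pm 1$ spectra, conjugating one by a Haar unitary makes the pair asymptotically free, so the spectrum of the sum follows $\mu_X \boxplus \mu_Y$, the arcsine law on $(-2, 2)$, rather than the classical convolution's point masses at $-2, 0, 2$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# X and Y each have spectrum +/-1 (half each).
spec = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
X = np.diag(spec)
Y = np.diag(rng.permutation(spec))

# Haar unitary conjugation makes X and U Y U^H asymptotically free.
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(G)
eigs = np.linalg.eigvalsh(X + U @ Y @ U.conj().T)

# Compare the eigenvalue histogram with the arcsine density 1/(pi*sqrt(4 - t^2))
# away from the edges; the gap vanishes as n grows.
hist, edges = np.histogram(eigs, bins=50, range=(-2.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
arcsine = 1.0 / (np.pi * np.sqrt(4.0 - centers**2))
inner = np.abs(centers) < 1.5
print(np.max(np.abs(hist[inner] - arcsine[inner])))   # small, up to finite-n noise
```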
Step 1: Reduction to a rank-1 additive deformation
◮ Rotational invariance $\Rightarrow$ we may assume $x_\star = \sqrt{n}\, e_1$.
◮ Partition $M$:
$$M = \begin{bmatrix} A_1^H T A_1 & A_1^H T A_{-1} \\ A_{-1}^H T A_1 & A_{-1}^H T A_{-1} \end{bmatrix}$$
Proposition (Lu & Li, 2017): For $a \in \mathbb{R}$, $\mu \in \mathbb{R}_{\geq 0}$, define
$$M(a) = \begin{bmatrix} a & q^H \\ q & P \end{bmatrix}, \qquad \widetilde{M}(\mu) = P + \mu q q^H.$$
There exists $\mu_{\mathrm{eff}}(a)$ such that:
(a) $\lambda_1(M(a)) = \lambda_1(\widetilde{M}(\mu_{\mathrm{eff}}(a)))$
(b) $|e_1^H v_1|^2 = \frac{d}{da} \lambda_1(M(a))$
New goal: analyze $L(\mu) \triangleq \lambda_1\big(A_{-1}^H (T + \mu\, T A_1 (T A_1)^H) A_{-1}\big)$.
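The sketch below builds this partition numerically for the Haar model and verifies, for an illustrative $\mu$, that the matrix behind $L(\mu)$ is exactly the rank-one deformation $P + \mu q q^H$, since $A_{-1}^H T A_1 = q$ (the trimming function, dimensions, and $\mu$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, delta = 64, 4.0
m = int(delta * n)

# Oversampled Haar sensing (n columns of a Haar unitary) with x_star = sqrt(n) e_1.
G = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
Q, _ = np.linalg.qr(G)
A = Q[:, :n]
x_star = np.zeros(n, dtype=complex)
x_star[0] = np.sqrt(n)
y = np.abs(A @ x_star)
T = delta * y**2 / (delta * y**2 + 0.1)       # illustrative trimming function

A1, Arest = A[:, :1], A[:, 1:]                # A_1 and A_{-1}
a = (A1.conj().T @ (T[:, None] * A1)).real[0, 0]   # scalar top-left block
P = Arest.conj().T @ (T[:, None] * Arest)          # A_{-1}^H T A_{-1}
q = Arest.conj().T @ (T * A1[:, 0])                # q = A_{-1}^H T A_1

# L(mu) = lambda_1(A_{-1}^H (T + mu T A_1 (T A_1)^H) A_{-1}) = lambda_1(P + mu q q^H).
mu = 0.5
L_mu = np.linalg.eigvalsh(P + mu * np.outer(q, q.conj()))[-1]
print(a, L_mu)
```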
Why free probability?
◮ Analyze $L(\mu) = \lambda_1\big(A_{-1}^H (T + \mu\, T A_1 (T A_1)^H) A_{-1}\big)$.
◮ $A_{-1}$ and $A_1$ are dependent.
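This dependence is easy to see numerically: the columns of a partial unitary are exactly orthogonal, a hard constraint tying $A_1$ to $A_{-1}$ that i.i.d. Gaussian columns do not satisfy. A quick contrast (dimensions illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 64, 256

G = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
Q, _ = np.linalg.qr(G)
A_haar = Q[:, :n]
A_gauss = (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))) / np.sqrt(2 * n)

for name, A in [("haar", A_haar), ("gauss", A_gauss)]:
    cross = A[:, :1].conj().T @ A[:, 1:]       # A_1^H A_{-1}
    print(name, np.linalg.norm(cross))         # exactly 0 for Haar, O(1) for Gaussian
```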
Conclusion
◮ Compared the oversampled Haar sensing matrix with the Gaussian one
◮ Oversampled Haar sensing matrix with optimal trimming: $\delta = 2$
◮ Gaussian matrix with optimal trimming: $\delta = 1$
◮ The oversampled Haar model approximates the CDP sensing matrices
[Figure: same plot of $\rho^2$ versus $\delta$ as before, with $\rho = \frac{1}{n}|x_\star^H \hat{x}|$ and the trimming functions $\mathcal{T}(y) = \delta y^2/(\delta y^2 + 0.1)$, $\delta y^2/(\delta y^2 + \sqrt{\delta} - 1)$, $\delta y^2\,\mathbb{1}(\delta y^2 \leq 2)/4$. Oversampled Haar model explains CDP.]