Nonconvex Phase Retrieval with Random Gaussian Measurements

Yuejie Chi
Department of Electrical and Computer Engineering
December 2017, CSA, Berlin
Acknowledgements

• Primary Collaborators: Yingbin Liang (OSU), Yuanxin Li (OSU), Yi Zhou (OSU), Huishuai Zhang (MSRA), Yuxin Chen (Princeton), Cong Ma (Princeton), and Kaizheng Wang (Princeton).
• This research is supported by AFOSR, ONR, and NSF.
Phase Retrieval: The Missing Phase Problem

• In high-frequency (e.g., optical) applications, the detection devices (e.g., CCD cameras, photosensitive films, and the human eye) cannot measure the phase of a light wave.

[Figure: oscillations at frequencies ω0, 10 ω0, and 100 ω0]

• Optical devices measure the photon flux (number of photons per second per unit area), which is proportional to the magnitude.
• This leads to the so-called phase retrieval problem: inference with only intensity measurements.
Computational Imaging

Phase retrieval is the foundation for modern computational imaging.

[Figure panels: Terahertz Imaging, Ankylography, Ptychography, Space Telescope]
Mathematical Setup

• Phase retrieval: estimate x♮ ∈ R^n / C^n from m phaseless measurements

    y_i = |⟨a_i, x♮⟩|,   i = 1, ..., m,

  where a_i corresponds to the i-th measurement vector:
  • a_i's are (coded or oversampled) Fourier transform vectors;
  • a_i's are short-time Fourier transform vectors;
  • a_i's are "generic" vectors such as random Gaussian vectors.
• In a vectorized notation, we write

    y = |A x♮| ∈ R_+^m,   where A stacks a_1^*, a_2^*, ..., a_m^* as its rows, A ∈ R^{m×n} / C^{m×n}.

• Phase retrieval solves a quadratic nonlinear system, since

    y_i^2 = |⟨a_i, x♮⟩|^2 = (x♮)^* a_i a_i^* x♮,   i = 1, ..., m.
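For concreteness, here is a minimal numpy sketch (not part of the slides) of the real Gaussian measurement model, including a check of the quadratic-system identity above; the problem sizes are illustrative.

```python
import numpy as np

# Minimal sketch of the measurement model under the real Gaussian design.
rng = np.random.default_rng(0)
n, m = 50, 400
x_nat = rng.standard_normal(n)        # ground truth x_natural
A = rng.standard_normal((m, n))       # rows a_i^T
y = np.abs(A @ x_nat)                 # phaseless measurements y_i = |<a_i, x_natural>|

# Quadratic-system view: y_i^2 = x_natural^T a_i a_i^T x_natural.
i = 0
assert np.isclose(y[i] ** 2, x_nat @ np.outer(A[i], A[i]) @ x_nat)
```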
Nonconvex Procedure

    x̂ = argmin_{x ∈ R^n / C^n}  (1/m) Σ_{i=1}^m ℓ(y_i; x)

• Initialize x_0 via spectral methods to land in the neighborhood of the ground truth;
• Iteratively update using simple methods such as gradient descent and alternating minimization.

Figure credit: Yuxin Chen.
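As a sketch of the first step, the following numpy snippet implements one common (unregularized, WF-style) spectral initializer built from (1/m) Σ_i y_i^2 a_i a_i^*. The initializer analyzed for RWF differs in its details, so treat this as illustrative rather than the paper's exact procedure.

```python
import numpy as np

def spectral_init(A, y):
    """WF-style spectral initializer (illustrative sketch): leading eigenvector
    of (1/m) * sum_i y_i^2 a_i a_i^*, rescaled to the estimated signal norm."""
    m, n = A.shape
    Y = (A.conj().T * (y ** 2)) @ A / m          # (1/m) sum_i y_i^2 a_i a_i^*
    eigvals, eigvecs = np.linalg.eigh(Y)         # Hermitian eigendecomposition
    x0 = eigvecs[:, -1]                          # leading eigenvector
    scale = np.sqrt(np.mean(y ** 2))             # estimate of ||x_natural||_2
    return scale * x0
```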
Quadratic Loss of Amplitudes

Wirtinger Flow (WF) employs the intensity-based loss surface [Candès et al.]:

    ℓ_WF(x) = (1/m) Σ_{i=1}^m ( |⟨a_i, x♮⟩|^2 − |⟨a_i, x⟩|^2 )^2,

which is nonconvex and smooth.

Reshaped Wirtinger Flow (RWF): In contrast, we propose to minimize the quadratic loss of amplitude measurements:

    ℓ(x) := (1/m) ‖y − |Ax|‖_2^2 = (1/m) Σ_{i=1}^m ℓ(y_i; x) = (1/m) Σ_{i=1}^m ( |⟨a_i, x♮⟩| − |⟨a_i, x⟩| )^2,

which is nonconvex and nonsmooth.

Which one is better?
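Both loss surfaces are straightforward to evaluate. The following numpy sketch (function names are mine, not from the slides) computes the intensity-based and amplitude-based losses from the same amplitude measurements y.

```python
import numpy as np

def loss_wf(A, y, x):
    """Intensity-based WF loss: (1/m) * sum_i (y_i^2 - |<a_i, x>|^2)^2.
    Assumes y holds amplitude measurements y_i = |<a_i, x_natural>|."""
    r = y ** 2 - np.abs(A @ x) ** 2
    return np.mean(r ** 2)

def loss_rwf(A, y, x):
    """Amplitude-based RWF loss: (1/m) * sum_i (y_i - |<a_i, x>|)^2."""
    r = y - np.abs(A @ x)
    return np.mean(r ** 2)
```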
The Choice of Loss Function is Important

The quadratic loss of amplitudes enjoys better curvature in expectation!

[Figure: Surface of the expected loss function of (a) least-squares (mirrored symmetrically), (b) the quadratic loss of amplitudes ℓ(x), and (c) the quadratic loss of intensity ℓ_WF(x), when x = [1, −1]^T.]

In fact, Error Reduction (ER), proposed by Gerchberg and Saxton in 1972, can be interpreted as alternating minimization using ℓ(x).
Gradient Descent

Reshaped Wirtinger Flow (RWF):

• The generalized gradient of ℓ(x) can be calculated as

    ∇ℓ(x) = (1/m) Σ_{i=1}^m ( ⟨a_i, x⟩ − y_i · sign(⟨a_i, x⟩) ) a_i.

• Start with an initialization x_0. At iteration t = 0, 1, ...

    x_{t+1} = x_t − µ ∇ℓ(x_t) = ( I − (µ/m) A^* A ) x_t + (µ/m) A^* diag(y) sign(A x_t),

  where µ is the step size.
• Stochastic versions are even faster.
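A minimal numpy sketch of the batch RWF iteration in the real-valued case is below; the function name, step size, and iteration count are illustrative choices rather than the paper's tuned settings.

```python
import numpy as np

def rwf_gradient_descent(A, y, x0, mu=0.8, num_iters=500):
    """Batch RWF iteration (real-valued case):
    x_{t+1} = x_t - (mu/m) * A^T (A x_t - y * sign(A x_t))."""
    m = A.shape[0]
    x = x0.copy()
    for _ in range(num_iters):
        Ax = A @ x
        grad = A.T @ (Ax - y * np.sign(Ax)) / m   # generalized gradient of the amplitude loss
        x = x - mu * grad
    return x
```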
Statistical Measurement Model

Strong performance guarantees are possible by leveraging statistical properties of the measurement ensemble.

• Gaussian measurement model:
    a_i ∼ N(0, I) i.i.d. if real-valued,
    a_i ∼ CN(0, I) i.i.d. if complex-valued.
• Distance measure (accounting for the global phase ambiguity):
    dist(x, z) = min_{φ ∈ [0, 2π)} ‖x − e^{jφ} z‖.
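The minimization over the global phase has a closed form, e^{jφ} = ⟨z, x⟩ / |⟨z, x⟩|, which the following sketch uses (hypothetical function name, not from the slides):

```python
import numpy as np

def dist_up_to_phase(x, z):
    """dist(x, z) = min over phi in [0, 2*pi) of ||x - exp(j*phi) * z||_2.
    The optimal phase satisfies exp(j*phi) = <z, x> / |<z, x>|; for real
    vectors this reduces to a sign check."""
    inner = np.vdot(z, x)                          # z^H x
    if np.abs(inner) < 1e-15:                      # any phase is optimal when orthogonal
        return np.linalg.norm(x - z)
    phase = inner / np.abs(inner)                  # optimal exp(j*phi)
    return np.linalg.norm(x - phase * z)
```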
Linear Convergence of RWF

Theorem (Zhang, Zhou, Liang, C., 2016)
Under i.i.d. Gaussian design, RWF achieves

    ‖x_t − x♮‖_2 ≲ (1 − µ/2)^t ‖x♮‖_2   (linear convergence)

provided that the step size µ = O(1) is small enough and the sample size m ≳ n.

            loss function      regularization   step size   sample size
    WF      intensity-based    no               O(1/n)      O(n log n)
    RWF     amplitude-based    no               O(1)        O(n)
    TWF     intensity-based    truncation       O(1)        O(n)

WF can be improved by designing a better loss function or introducing proper regularization. But is it really that bad?

Zhang, Zhou, Liang, and C., "Reshaped Wirtinger Flow and Incremental Algorithms for Solving Quadratic Systems of Equations," Journal of Machine Learning Research, to appear.
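Putting the earlier sketches together, a toy end-to-end run in the real-valued case looks as follows; it reuses the hypothetical helpers spectral_init, rwf_gradient_descent, and dist_up_to_phase defined after the previous slides, and the problem size, step size, and iteration count are illustrative.

```python
import numpy as np

# Toy end-to-end run: Gaussian design, spectral init, RWF iterations.
rng = np.random.default_rng(0)
n, m = 100, 800                       # m on the order of n, as in the RWF guarantee
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))       # rows a_i^T, a_i ~ N(0, I)
y = np.abs(A @ x_true)                # amplitude measurements

x0 = spectral_init(A, y)
x_hat = rwf_gradient_descent(A, y, x0, mu=0.8, num_iters=300)
print(dist_up_to_phase(x_hat, x_true) / np.linalg.norm(x_true))  # small relative error
```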
Another look at WF

• The local Hessian of the WF loss satisfies w.h.p. when m = O(n log n):

    (1/2) I ⪯ ∇^2 ℓ_WF(x) ⪯ n I.

• Implies a step size of µ = O(1/n), hence O(n log(1/ε)) iterations to reach ε-accuracy.

Numerically, WF can run much more aggressively! (µ = 0.1)
Gradient descent theory

Which region enjoys both strong convexity and smoothness?

    ∇^2 ℓ_WF(x) = (1/m) Σ_{k=1}^m [ 3 (a_k^⊤ x)^2 − (a_k^⊤ x♮)^2 ] a_k a_k^⊤

• Not smooth if x and a_k are too close (coherent).

[Figure: x near x♮ together with sampling vectors a_1, a_2, illustrating the incoherence condition
    |a_k^⊤ (x − x♮)| ≲ √(log n) · ‖x − x♮‖_2 for each k.]

• x is not far away from x♮;
• x is incoherent w.r.t. sampling vectors (incoherence region).
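One way to get a feel for the incoherence condition is to measure max_k |a_k^⊤(x − x♮)| / ‖x − x♮‖_2 empirically; the small sketch below (hypothetical helper, not from the slides) does exactly that.

```python
import numpy as np

def incoherence_ratio(A, x, x_true):
    """Empirical incoherence of x - x_true w.r.t. the sampling vectors:
    max_k |a_k^T (x - x_true)| / ||x - x_true||_2, to be compared with the
    sqrt(log n) scale defining the incoherence region."""
    h = x - x_true
    return np.max(np.abs(A @ h)) / np.linalg.norm(h)

# For h aligned with some a_k the ratio is roughly sqrt(n), while for a
# "generic" h it concentrates around sqrt(2 * log m), i.e. the sqrt(log n) regime.
```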
A second look at gradient descent theory

[Figure: region of local strong convexity + smoothness, with the gradient descent iterates drawn inside the ℓ_2 ball.]

• Prior theory only ensures that iterates remain in the ℓ_2 ball, but not in the incoherence region.