  1. Nonconvex Phase Retrieval with Random Gaussian Measurements
  Yuejie Chi, Department of Electrical and Computer Engineering
  December 2017, CSA, Berlin

  2. Acknowledgements
  • Primary collaborators: Yingbin Liang (OSU), Yuanxin Li (OSU), Yi Zhou (OSU), Huishuai Zhang (MSRA), Yuxin Chen (Princeton), Cong Ma (Princeton), and Kaizheng Wang (Princeton).
  • This research is supported by AFOSR, ONR, and NSF.

  3. Phase Retrieval: The Missing Phase Problem
  • In high-frequency (e.g., optical) applications, detection devices (e.g., CCD cameras, photosensitive films, and the human eye) cannot measure the phase of a light wave. [Figure: waveforms at frequencies $\omega_0$, $10\omega_0$, and $100\omega_0$.]
  • Optical devices measure the photon flux (number of photons per second per unit area), which is proportional to the squared magnitude (intensity) of the field.
  • This leads to the so-called phase retrieval problem: inference with only intensity measurements.

  4. Computational Imaging
  Phase retrieval is the foundation of modern computational imaging, with applications including terahertz imaging, ankylography, ptychography, and space telescopes.

  5. Mathematical Setup
  • Phase retrieval: estimate $x^\natural \in \mathbb{R}^n / \mathbb{C}^n$ from $m$ phaseless measurements
    $y_i = |\langle a_i, x^\natural \rangle|, \quad i = 1, \ldots, m,$
  where $a_i$ denotes the $i$-th measurement vector. Typical choices:
  • $a_i$'s are (coded or oversampled) Fourier transform vectors;
  • $a_i$'s are short-time Fourier transform vectors;
  • $a_i$'s are "generic" vectors such as random Gaussian vectors.
  • In vectorized notation, $y = |A x^\natural| \in \mathbb{R}^m_+$, where $A = [a_1, a_2, \ldots, a_m]^* \in \mathbb{R}^{m \times n} / \mathbb{C}^{m \times n}$ stacks the $a_i^*$'s as rows.
  • Phase retrieval solves a quadratic (nonlinear) system of equations, since
    $y_i^2 = |\langle a_i, x^\natural \rangle|^2 = (x^\natural)^* a_i a_i^* x^\natural, \quad i = 1, \ldots, m.$
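  To make the measurement model concrete, here is a minimal NumPy sketch of the real-valued Gaussian case; the dimensions and variable names are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 640                        # assumed signal dimension and number of measurements

x_nat = rng.standard_normal(n)        # unknown ground truth x^natural
A = rng.standard_normal((m, n))       # rows a_i ~ N(0, I), stacked as in the slide
y = np.abs(A @ x_nat)                 # phaseless measurements y_i = |<a_i, x^natural>|
```

  The complex-valued model would instead draw each entry of $A$ as $(\mathcal{N}(0, 1/2) + j\,\mathcal{N}(0, 1/2))$ and keep the modulus of $Ax^\natural$.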

  6. Nonconvex Procedure
  $\hat{x} = \arg\min_{x \in \mathbb{R}^n / \mathbb{C}^n} \ \frac{1}{m} \sum_{i=1}^m \ell(y_i; x)$
  • Initialize $x_0$ via spectral methods to land in a neighborhood of the ground truth (a sketch of one spectral initializer follows below);
  • Iteratively update using simple methods such as gradient descent or alternating minimization.
  Figure credit: Yuxin Chen.
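  As an illustration of the initialization step, here is a hedged sketch of one common spectral initializer (the leading eigenvector of a data-weighted covariance surrogate). The talk does not commit to a specific variant; truncated or reweighted versions are often preferred in practice.

```python
import numpy as np

def spectral_init(A, y):
    """Spectral initialization from phaseless measurements y = |A x|.

    Returns the leading eigenvector of (1/m) * sum_i y_i^2 a_i a_i^T,
    rescaled to the estimated signal norm.
    """
    m, _ = A.shape
    Y = (A * (y**2)[:, None]).T @ A / m   # (1/m) * sum_i y_i^2 a_i a_i^T
    _, eigvecs = np.linalg.eigh(Y)        # eigenvalues in ascending order
    x0 = eigvecs[:, -1]                   # top eigenvector
    return x0 * np.sqrt(np.mean(y**2))    # E[y_i^2] ~ ||x||^2 under the Gaussian model
```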

  7-9. Quadratic Loss of Amplitudes
  Wirtinger Flow (WF) employs the intensity-based loss [Candès et al.]:
  $\ell_{WF}(x) = \frac{1}{m} \sum_{i=1}^m \big( |\langle a_i, x^\natural \rangle|^2 - |\langle a_i, x \rangle|^2 \big)^2,$
  which is nonconvex and smooth.
  Reshaped Wirtinger Flow (RWF): in contrast, we propose to minimize the quadratic loss of amplitude measurements:
  $\ell(x) := \frac{1}{m} \| y - |Ax| \|_2^2 = \frac{1}{m} \sum_{i=1}^m \ell(y_i; x) = \frac{1}{m} \sum_{i=1}^m \big( |\langle a_i, x^\natural \rangle| - |\langle a_i, x \rangle| \big)^2,$
  which is nonconvex and nonsmooth.
  Which one is better?
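  For concreteness, the two empirical losses can be written in a few lines of NumPy; this is a sketch using the notation above, not code from the papers.

```python
import numpy as np

def loss_wf(A, y, x):
    """Intensity loss: (1/m) * sum_i ( y_i^2 - |<a_i, x>|^2 )^2  (smooth)."""
    return np.mean((y**2 - np.abs(A @ x)**2) ** 2)

def loss_rwf(A, y, x):
    """Amplitude loss: (1/m) * sum_i ( y_i - |<a_i, x>| )^2  (nonsmooth)."""
    return np.mean((y - np.abs(A @ x)) ** 2)
```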

  10. The Choice of Loss Function is Important
  The quadratic loss of amplitudes enjoys better curvature in expectation!
  [Figure: surfaces of the expected loss functions of (a) least-squares (mirrored symmetrically), (b) the quadratic loss of amplitudes $\ell(x)$, and (c) the quadratic loss of intensities $\ell_{WF}(x)$, when $x^\natural = [1, -1]^\top$.]
  In fact, Error Reduction (ER), proposed by Gerchberg and Saxton in 1972, can be interpreted as alternating minimization on $\ell(x)$.
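  The alternating-minimization view of ER mentioned above can be sketched as follows for generic, real-valued measurement vectors: fix the signs (phases) of $A x_t$, then solve the resulting least-squares problem. This is an illustrative sketch, not Gerchberg and Saxton's original Fourier-domain implementation.

```python
import numpy as np

def er_step(A, y, x_t):
    """One alternating-minimization step on the amplitude loss l(x):
    x_{t+1} = argmin_x || diag(y) sign(A x_t) - A x ||_2.
    """
    c = np.sign(A @ x_t)                               # current sign estimates (use z/|z| for complex data)
    x_next, *_ = np.linalg.lstsq(A, y * c, rcond=None) # least-squares fit with the signs fixed
    return x_next
```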

  11. Gradient Descent
  Reshaped Wirtinger Flow (RWF):
  • The generalized gradient of $\ell(x)$ can be calculated as
    $\nabla \ell(x) = \frac{1}{m} \sum_{i=1}^m \big( \langle a_i, x \rangle - y_i \cdot \mathrm{sign}(\langle a_i, x \rangle) \big) a_i.$
  • Start with an initialization $x_0$. At iteration $t = 0, 1, \ldots$,
    $x_{t+1} = x_t - \mu \nabla \ell(x_t) = \big( I - \tfrac{\mu}{m} A^* A \big) x_t + \tfrac{\mu}{m} A^* \,\mathrm{diag}(y)\, \mathrm{sign}(A x_t),$
  where $\mu$ is the step size.
  • Stochastic versions are even faster.
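  A minimal end-to-end sketch of the RWF iteration above (real-valued case), which can be combined with the spectral initializer sketched earlier. The step size and iteration count below are illustrative assumptions, not the paper's recommended constants.

```python
import numpy as np

def rwf(A, y, x0, mu=0.8, iters=200):
    """Gradient descent on the amplitude loss, following the update above."""
    m = A.shape[0]
    x = x0.copy()
    for _ in range(iters):
        Ax = A @ x
        grad = A.T @ (Ax - y * np.sign(Ax)) / m   # (1/m) sum_i (<a_i,x> - y_i sign(<a_i,x>)) a_i
        x = x - mu * grad
    return x
```

  With the Gaussian data from slide 5 and the spectral initializer from slide 6, `rwf(A, y, spectral_init(A, y))` typically recovers $x^\natural$ up to a global sign.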

  12. Statistical Measurement Model
  Strong performance guarantees are possible by leveraging statistical properties of the measurement ensemble.
  • Gaussian measurement model: $a_i \sim \mathcal{N}(0, I)$ i.i.d. if real-valued; $a_i \sim \mathcal{CN}(0, I)$ i.i.d. if complex-valued.
  • Distance measure (up to a global phase): $\mathrm{dist}(x, z) = \min_{\phi \in [0, 2\pi)} \| x - e^{j\phi} z \|_2.$
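  The distance measure has a closed-form minimizer over the global phase (the optimal $\phi$ is the angle of $z^* x$), so it can be evaluated without a search; a small sketch:

```python
import numpy as np

def dist(x, z):
    """dist(x, z) = min over phi in [0, 2*pi) of || x - exp(1j*phi) * z ||_2."""
    phase = np.exp(1j * np.angle(np.vdot(z, x)))   # np.vdot conjugates its first argument: z^* x
    return np.linalg.norm(x - phase * z)           # for real data this reduces to min(||x-z||, ||x+z||)
```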

  13-14. Linear Convergence of RWF
  Theorem (Zhang, Zhou, Liang, C., 2016). Under the i.i.d. Gaussian design, RWF achieves
  $\| x_t - x^\natural \|_2 \lesssim (1 - \mu)^t \, \| x^\natural \|_2$ (linear convergence),
  provided that the step size $\mu = O(1)$ is small enough and the sample size satisfies $m \gtrsim n$.
  Comparison (loss function / regularization / step size / sample size):
  • WF: intensity-based loss, no regularization, step size $O(1/n)$, sample size $O(n \log n)$;
  • RWF: amplitude-based loss, no regularization, step size $O(1)$, sample size $O(n)$;
  • TWF: intensity-based loss, truncation, step size $O(1)$, sample size $O(n)$.
  WF can be improved by designing a better loss function or by introducing proper regularization. But is it really that bad?
  Zhang, Zhou, Liang, and Chi, "Reshaped Wirtinger Flow and Incremental Algorithms for Solving Quadratic Systems of Equations," Journal of Machine Learning Research, to appear.

  15-16. Another Look at WF
  • The local Hessian of the WF loss satisfies, w.h.p., when $m = O(n \log n)$:
    $\tfrac{1}{2} I \preceq \nabla^2 \ell_{WF}(x) \preceq n I.$
  • This implies a step size of $\mu = O(1/n)$, and hence $O(n \log(1/\epsilon))$ iterations to reach $\epsilon$-accuracy.
  • Numerically, however, WF can run much more aggressively ($\mu = 0.1$)! A small numerical sketch follows.
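  Below is a self-contained sketch (assumed sizes, step size, and iteration count) of the observation above: vanilla gradient descent on the intensity loss with a constant step size such as $\mu = 0.1$ typically converges quickly in practice on a unit-norm signal, despite the $O(1/n)$ worst-case theory.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, mu, iters = 50, 500, 0.1, 500
x_true = rng.standard_normal(n)
x_true /= np.linalg.norm(x_true)          # unit-norm signal so the step size is scale-free
A = rng.standard_normal((m, n))
y2 = (A @ x_true) ** 2                    # intensity measurements |<a_i, x_true>|^2

# Spectral initialization (one common variant).
Y = (A * y2[:, None]).T @ A / m
x = np.linalg.eigh(Y)[1][:, -1] * np.sqrt(np.mean(y2))

for _ in range(iters):
    Ax = A @ x
    x = x - mu * (A.T @ ((Ax**2 - y2) * Ax)) / m   # Wirtinger-flow-style gradient step

err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
print(f"relative error: {err:.2e}")       # typically very small after a few hundred iterations
```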

  17-21. Gradient Descent Theory
  Which region enjoys both strong convexity and smoothness?
  $\nabla^2 \ell_{WF}(x) = \frac{1}{m} \sum_{k=1}^m \big[ 3 (a_k^\top x)^2 - (a_k^\top x^\natural)^2 \big] a_k a_k^\top.$
  • The loss is not smooth if $x$ and some $a_k$ are too close (coherent).
  • Desired region: $x$ is not far away from $x^\natural$, and $x$ is incoherent w.r.t. the sampling vectors (the incoherence region), i.e.,
    $| a_k^\top (x - x^\natural) | \lesssim \sqrt{\log n} \, \| x - x^\natural \|_2$ for all $k$.
  [Figure: the incoherence region around $x^\natural$, drawn relative to the sampling vectors $a_1, a_2, \ldots$]
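  A hedged numerical illustration of the discussion above: form the local Hessian from the formula on this slide and compare its eigenvalue range at an incoherent point near $x^\natural$ with a point aligned with one sampling vector $a_1$, where smoothness degrades. All sizes are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 500
x_nat = rng.standard_normal(n)
x_nat /= np.linalg.norm(x_nat)
A = rng.standard_normal((m, n))

def hessian(x):
    """(1/m) * sum_k [ 3 (a_k^T x)^2 - (a_k^T x_nat)^2 ] a_k a_k^T."""
    w = 3 * (A @ x) ** 2 - (A @ x_nat) ** 2
    return (A * w[:, None]).T @ A / m

points = [
    ("near x_nat (incoherent)", x_nat + 0.1 * rng.standard_normal(n) / np.sqrt(n)),
    ("aligned with a_1 (coherent)", A[0] / np.linalg.norm(A[0])),
]
for label, x in points:
    eigs = np.linalg.eigvalsh(hessian(x))
    print(f"{label}: lambda_min = {eigs[0]:.2f}, lambda_max = {eigs[-1]:.2f}")
```

  The coherent point inflates the largest eigenvalue through the single term involving $a_1$, which is exactly why the analysis restricts attention to the incoherence region.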

  22-29. A Second Look at Gradient Descent Theory
  [Figure: iterates inside the region of local strong convexity + smoothness.]
  • Prior theory only ensures that the iterates remain in an $\ell_2$ ball around $x^\natural$, but not in the incoherence region.

