Modern Optimization Meets Physics: Recent Progress on Phase Retrieval
Yuxin Chen (on behalf of Emmanuel Candès)
CSA 2015, Technical University Berlin, Dec. 2015
Missing phase problem

Detectors record intensities of diffracted rays: electric field $x(t_1, t_2)$ → Fraunhofer diffraction (Fourier transform) → $\hat{x}(f_1, f_2)$. (Fig credit: Stanford SLAC)

Intensity of the electric field:
$$|\hat{x}(f_1, f_2)|^2 = \Big| \int x(t_1, t_2)\, e^{-i 2\pi (f_1 t_1 + f_2 t_2)}\, \mathrm{d}t_1\, \mathrm{d}t_2 \Big|^2$$

Phase retrieval: recover the signal $x(t_1, t_2)$ from the intensity measurements $|\hat{x}(f_1, f_2)|^2$.
Origin in X-ray crystallography

A method for determining the atomic structure within a crystal. Survey: Shechtman, Eldar, Cohen, Chapman, Miao and Segev ('15).
Phase retrieval in X-ray crystallography

Knowledge of the phase is crucial to build an electron density map. Algorithmic means of recovering the phase structure without sophisticated setups: Sayre ('52), Fienup ('78). Initial success in certain cases by using very specific prior knowledge: H. Hauptman and J. Karle, 1985 Nobel Prize.
Phase retrieval is a feasibility problem

[Figure: a signal $x$, a sampling matrix $A$, and the phaseless measurements $y = |Ax|^2$.]

Solve for $x \in \mathbb{C}^n$ from $m$ quadratic equations
$$y_k \approx |\langle a_k, x \rangle|^2, \quad k = 1, \dots, m, \qquad \text{or} \qquad y \approx |Ax|^2,$$
where $|z|^2 := [\,|z_1|^2, \cdots, |z_m|^2\,]^\top$.
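To make the setup concrete, here is a minimal sketch (not from the talk; the sizes and the complex Gaussian sampling model are illustrative choices) that generates such a random quadratic system:

```python
# A minimal sketch: generating a random quadratic system y = |Ax|^2
# with i.i.d. complex Gaussian sampling vectors (illustrative model).
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 6 * 64                   # unknowns and number of equations (arbitrary)

# Ground-truth signal x in C^n
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Rows a_k of A are i.i.d. complex Gaussian sampling vectors
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

# Intensity-only (phaseless) measurements: y_k = |<a_k, x>|^2
y = np.abs(A @ x) ** 2
```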
An equivalent view: low-rank factorization

Introduce $X = xx^*$ to linearize the constraints:
$$y_k \approx |a_k^* x|^2 = a_k^* (xx^*) a_k \quad \Longrightarrow \quad y_k \approx a_k^* X a_k.$$

This gives the feasibility problem
$$\text{find } X \quad \text{s.t.} \quad y_k \approx a_k^* X a_k, \;\; k = 1, \cdots, m; \qquad \mathrm{rank}(X) = 1.$$

Solving quadratic systems is essentially low-rank matrix completion.
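A small numerical check (illustrative, not part of the talk) of the lifting identity: each constraint that is quadratic in $x$ becomes linear in the lifted matrix $X = xx^*$.

```python
# Verify |a^* x|^2 = a^* (x x^*) a for a random complex vector pair.
import numpy as np

rng = np.random.default_rng(1)
n = 8
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)

X = np.outer(x, x.conj())               # rank-1 lift X = x x^*
lhs = np.abs(np.vdot(a, x)) ** 2        # |a^* x|^2, quadratic in x
rhs = (a.conj() @ X @ a).real           # a^* X a, linear in X
assert np.isclose(lhs, rhs)             # the two agree exactly
```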
Solving quadratic systems is NP-complete in general ...

"I can't find an efficient algorithm, but neither can all these people." (Fig credit: Coding Horror)
NP-complete stone problem

Given weights $w_i \in \mathbb{R}$, $i = 1, \dots, n$, is there an assignment $x_i = \pm 1$ s.t. $\sum_{i=1}^n w_i x_i = 0$?

Formulation as a quadratic system:
$$|x_i|^2 = 1, \;\; i = 1, \dots, n; \qquad \Big| \sum\nolimits_i w_i x_i \Big|^2 = 0.$$

Many combinatorial problems can be reduced to solving quadratic systems.
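An illustrative brute-force check (not from the talk; the weights below are an arbitrary example) that solving this quadratic system is the same as solving the ±1 balancing problem:

```python
# Exhaustively test whether some sign assignment solves the quadratic
# system {|x_i|^2 = 1 for all i, |sum_i w_i x_i|^2 = 0}.
import itertools
import numpy as np

w = np.array([3.0, 1.0, 4.0, 2.0, 2.0])     # example weights (hypothetical)

def solves_quadratic_system(x, w, tol=1e-9):
    """Check |x_i|^2 = 1 for all i and |sum_i w_i x_i|^2 = 0."""
    return np.allclose(np.abs(x) ** 2, 1.0) and np.abs(w @ x) ** 2 < tol

# Search over all 2^n sign patterns -- exponential time, as one would
# expect for an NP-complete problem.
for signs in itertools.product([-1.0, 1.0], repeat=len(w)):
    x = np.array(signs)
    if solves_quadratic_system(x, w):
        print("balanced assignment:", x)
        break
else:
    print("no balanced assignment exists")
```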
This talk: random quadratic systems are solvable in linear time!
A first impulse: maximum likelihood estimation

Assume (pretend) a noise model, then solve
$$\text{minimize}_z \quad \ell(z) = \sum\nolimits_{k=1}^m \ell_k(z) \qquad \text{(negative log-likelihood)}$$

Gaussian data: $y_k \sim |a_k^* x|^2 + \mathcal{N}(0, \sigma^2)$, with $\ell_k(z) = \big( y_k - |a_k^* z|^2 \big)^2$.

Poisson data: $y_k \sim \mathrm{Poisson}\big( |a_k^* x|^2 \big)$, with $\ell_k(z) = |a_k^* z|^2 - y_k \log |a_k^* z|^2$.

Problem: $\ell$ is nonconvex, with many local stationary points.
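As a concrete illustration, a minimal sketch of the two losses (the `eps` guard in the Poisson loss is my own numerical safeguard, not from the talk):

```python
# The two per-sample negative log-likelihoods from this slide, summed
# over all measurements. A and y are as in the earlier setup sketch.
import numpy as np

def gaussian_loss(z, A, y):
    """sum_k (y_k - |a_k^* z|^2)^2  -- Gaussian noise model."""
    return np.sum((y - np.abs(A @ z) ** 2) ** 2)

def poisson_loss(z, A, y, eps=1e-12):
    """sum_k |a_k^* z|^2 - y_k log |a_k^* z|^2  -- Poisson model."""
    intensities = np.abs(A @ z) ** 2
    return np.sum(intensities - y * np.log(intensities + eps))
```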
This talk: solving random quadratic systems $y = |Ax|^2$, $A \in \mathbb{R}^{m \times n}$
($n$: # unknowns; $m$: sample size, i.e. # equations)

[Figure: prior methods placed on a chart of computational cost (ticks: $mn$, $mn^2$) vs. sample complexity (ticks: $n$, $n \log^3 n$). "cvx relaxation" achieves sample complexity $\propto n$ but at a computationally infeasible cost; "alt-min (fresh samples at each iter)" needs $\propto n \log^3 n$ samples; "This talk" achieves both sample complexity $\propto n$ and computational cost $\propto mn$.]
Nonconvex paradigm: optimize vector variables directly¹
$$\text{minimize}_z \quad \ell(z) = \sum\nolimits_{k=1}^m \ell_k(z)$$

[Figure: iterates $z^0, z^1, z^2, \dots$ moving toward $x$ inside a basin of attraction]

Start from an appropriate initial point $z^0$, then proceed via some iterative updates.

¹ Fienup; Candès et al.; Netrapalli et al.; Shechtman et al.; Schniter et al.; ...
Wirtinger flow: Candès, Li and Soltanolkotabi

Initialization via the spectral method: $z^0$ ← leading eigenvector of
$$\frac{1}{m} \sum_{k=1}^m y_k a_k a_k^*$$

Iterations: for $t = 0, 1, \dots$
$$z^{t+1} = z^t - \frac{\mu_t}{m} \nabla \ell(z^t),$$
making use of the Wirtinger derivative of $\ell(z) = \sum_k \big( y_k - |a_k^* z|^2 \big)^2$:
$$\nabla \ell(z) := 4 \sum\nolimits_k \big( |a_k^* z|^2 - y_k \big) (a_k a_k^*)\, z.$$

Already a rich theory (see Soltanolkotabi's Ph.D. thesis).
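A compact Wirtinger-flow sketch in the spirit of this slide: spectral initialization followed by gradient iterations. The problem sizes, iteration count, and step-size constant are illustrative choices rather than the tuned values from the paper; in particular, the $1/\|z^0\|^2$ normalization from the "ideal scheme" slide below is folded into $\mu_t$.

```python
# Wirtinger flow on a random complex Gaussian instance (illustrative).
import numpy as np

rng = np.random.default_rng(2)
n, m, T = 64, 10 * 64, 500
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y = np.abs(A @ x) ** 2                       # phaseless measurements

# Spectral initialization: leading eigenvector of (1/m) sum_k y_k a_k a_k^*,
# rescaled to the signal energy (E[y_k] = ||x||^2 in this model).
Y = (A.conj().T * y) @ A / m
eigvals, eigvecs = np.linalg.eigh(Y)
z = np.sqrt(np.mean(y)) * eigvecs[:, -1]

# Gradient iterations z <- z - (mu_t/m) grad l(z), with
#   grad l(z) = 4 sum_k (|a_k^* z|^2 - y_k) (a_k a_k^*) z.
# The constant 0.02 is an assumed, untuned step-size choice.
mu = 0.02 / np.linalg.norm(z) ** 2
for _ in range(T):
    Az = A @ z
    grad = 4 * A.conj().T @ ((np.abs(Az) ** 2 - y) * Az)
    z = z - (mu / m) * grad

# Report error modulo the global phase, which intensities cannot reveal.
c = np.vdot(z, x)
rel_err = np.linalg.norm(x - z * c / abs(c)) / np.linalg.norm(x)
print(f"relative error after {T} iterations: {rel_err:.2e}")
```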
Interpretation of spectral initialization

The initialization is the leading eigenvector of $Y := \frac{1}{m} \sum_{k=1}^m y_k a_k a_k^*$.

Under the random Gaussian or coded diffraction models,
$$\mathbb{E}[Y] = \|x\|^2 I + 2 xx^*,$$
with eigenvalues $(3, 1, 1, \dots, 1) \cdot \|x\|^2$, so the leading eigenvector of $\mathbb{E}[Y]$ is exactly $x$ (up to scale).
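A quick Monte Carlo check (illustrative) of this identity in the real Gaussian model, where $\mathbb{E}[y_k a_k a_k^\top] = \|x\|^2 I + 2xx^\top$:

```python
# Empirically verify E[Y] = ||x||^2 I + 2 x x^T for a_k ~ N(0, I).
import numpy as np

rng = np.random.default_rng(3)
n, m = 10, 200_000
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))              # real Gaussian sampling vectors
y = (A @ x) ** 2

Y = (A.T * y) @ A / m                        # (1/m) sum_k y_k a_k a_k^T
expected = np.dot(x, x) * np.eye(n) + 2 * np.outer(x, x)
print(np.max(np.abs(Y - expected)))          # deviation shrinks like 1/sqrt(m)

# The top eigenvalue of E[Y] is 3||x||^2, with eigenvector x/||x||.
w, V = np.linalg.eigh(expected)
print(w[-1] / np.dot(x, x))                  # -> 3.0
```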
Wirtinger flow as a stochastic gradient scheme

Population (expected) loss (here for $\|x\| = 1$):
$$L(z) = \underbrace{z^* (I - xx^*) z}_{\text{penalizes orientation mismatch}} + \frac{3}{4} \underbrace{\big( \|z\|^2 - \|x\|^2 \big)^2}_{\text{penalizes norm mismatch}}$$

Ideal gradient scheme:
$$z^{t+1} = z^t - \frac{\mu_t}{\|z^0\|^2} \nabla L(z^t)$$

But we don't have access to $L(\cdot)$! What we can compute is the empirical loss
$$\ell(z) = \frac{1}{4m} \sum_{k=1}^m \big( y_k - |a_k^* z|^2 \big)^2.$$

For fixed $z$, i.e. if $z^t$ were independent of the sampling vectors (a false assumption),
$$\mathbb{E}[z^{t+1}] = z^t - \frac{\mu_t}{\|z^0\|^2} \nabla L(z^t).$$

So Wirtinger flow is a stochastic optimization scheme for maximizing the population likelihood.
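An illustrative check of this unbiasedness claim in the real Gaussian model with $\|x\| = 1$: for a fixed $z$ independent of the sampling vectors, the empirical gradient of $\ell$ concentrates around the population gradient $\nabla L(z) = 2(I - xx^\top)z + 3(\|z\|^2 - \|x\|^2)z$ (a closed form I derived from the $L$ above, not stated on the slide).

```python
# Compare the empirical gradient of l with the population gradient of L
# at a fixed z, drawn independently of the sampling vectors.
import numpy as np

rng = np.random.default_rng(4)
n, m = 10, 500_000
x = rng.standard_normal(n); x /= np.linalg.norm(x)   # ||x|| = 1
z = rng.standard_normal(n)                           # fixed, independent of a_k

A = rng.standard_normal((m, n))
y = (A @ x) ** 2
Az = A @ z

# grad l(z) = (1/m) sum_k (|a_k^T z|^2 - y_k) (a_k^T z) a_k
grad_emp = A.T @ ((Az ** 2 - y) * Az) / m

# grad L(z) = 2 (I - x x^T) z + 3 (||z||^2 - ||x||^2) z
grad_pop = 2 * (z - x * (x @ z)) + 3 * (z @ z - 1.0) * z

# Relative deviation is small and shrinks like 1/sqrt(m)
print(np.linalg.norm(grad_emp - grad_pop) / np.linalg.norm(grad_pop))
```

The "false assumption" on the slide is exactly what breaks in the actual algorithm: after one update, $z^t$ depends on all the sampling vectors, so this unbiasedness no longer holds verbatim and the convergence analysis has to work harder.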