Solving composite optimization problems, with applications to phase retrieval
John Duchi (based on joint work with Feng Ruan)
Outline
◮ Composite optimization problems
◮ Methods for composite optimization
◮ Application: robust phase retrieval
◮ Experimental evaluation
◮ Large-scale composite optimization?
What I hope to accomplish today
◮ Investigate problem structures that are not quite convex but still amenable to elegant solution approaches
◮ Show how we can leverage stochastic structure to turn hard non-convex problems into “easy” ones [Keshavan, Montanari, Oh 10; Loh & Wainwright 12]
◮ Consider large-scale versions of these problems
Composite optimization problems
The problem:
$$\min_x \; f(x) := h(c(x)),$$
where $h : \mathbb{R}^m \to \mathbb{R}$ is convex and $c : \mathbb{R}^n \to \mathbb{R}^m$ is smooth.
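Motivation: the exact penalty
The constrained problem
$$\min_x \; f(x) \quad \text{subject to } x \in X$$
is equivalent (for all large enough $\lambda$) to
$$\min_x \; f(x) + \lambda \operatorname{dist}(x, X).$$
[Figure: $\operatorname{dist}(x, X)$ plotted against $x$]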
For equality constraints, the problem
$$\min_x \; f(x) \quad \text{subject to } c(x) = 0$$
is equivalent (for all large enough $\lambda$) to
$$\min_x \; f(x) + \lambda \|c(x)\|,$$
and the penalty is itself composite: $\lambda \|c(x)\| = h(c(x))$ where $h(z) = \lambda \|z\|$. [Fletcher & Watson 80, 82; Burke 85]
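A small numerical check of the “large enough $\lambda$” condition, on a toy problem of our own choosing (not from the talk): minimize $f(x) = x^2$ subject to $c(x) = x - 1 = 0$, so $X = \{1\}$ and $\operatorname{dist}(x, X) = |c(x)|$. For simplicity the sketch minimizes the penalized objective by brute-force grid search, which is an illustrative assumption and not a recommended solver.

```python
import numpy as np

# Hypothetical toy instance (our choice, not from the talk):
#   minimize f(x) = x^2  subject to  c(x) = x - 1 = 0,
# so X = {1} and dist(x, X) = |c(x)| = |x - 1|.
GRID = np.linspace(-2.0, 3.0, 500_001)   # brute-force search grid, spacing 1e-5

def penalized_minimizer(lam):
    """Approximate argmin over x of the exact penalty x^2 + lam * |x - 1|."""
    values = GRID ** 2 + lam * np.abs(GRID - 1.0)
    return GRID[np.argmin(values)]

for lam in [0.5, 1.0, 2.0, 5.0]:
    print(f"lambda = {lam:.1f}: penalized minimizer ~ {penalized_minimizer(lam):.4f}")
```

In this instance the penalty recovers the constrained solution $x = 1$ precisely when $\lambda \ge 2 = |f'(1)|$; for smaller $\lambda$ the penalized minimizer is $\lambda / 2$, which is infeasible.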
Motivation: nonlinear measurements and modeling
◮ Have true signal $x^\star \in \mathbb{R}^n$ and measurement vectors $a_i \in \mathbb{R}^n$
◮ Observe nonlinear measurements $b_i = \phi(\langle a_i, x^\star \rangle) + \xi_i$, $i = 1, \ldots, m$, for $\phi(\cdot)$ a nonlinear but smooth function
An objective:
$$f(x) = \frac{1}{m} \sum_{i=1}^m \big(\phi(\langle a_i, x \rangle) - b_i\big)^2$$
Nonlinear least squares [Nocedal & Wright 06; Plan & Vershynin 15; Oymak & Soltanolkotabi 16]
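The nonlinear least-squares objective fits the composite template with $h(z) = \frac{1}{m}\|z\|_2^2$ and $c(x) = \phi(Ax) - b$. A minimal sketch, assuming $\phi = \tanh$ purely for illustration (the slide leaves $\phi$ generic) and synthetic Gaussian data; the sizes and names are hypothetical.

```python
import numpy as np

# Hypothetical setup: phi = tanh is one smooth nonlinearity chosen for illustration.
rng = np.random.default_rng(0)
n, m = 5, 200
x_star = rng.standard_normal(n)          # true signal x_star in R^n
A = rng.standard_normal((m, n))          # rows are the measurement vectors a_i
xi = 0.1 * rng.standard_normal(m)        # measurement noise xi_i
b = np.tanh(A @ x_star) + xi             # b_i = phi(<a_i, x_star>) + xi_i

def c(x):
    """Smooth inner map c(x) = phi(Ax) - b."""
    return np.tanh(A @ x) - b

def h(z):
    """Convex outer function h(z) = (1/m) * ||z||_2^2."""
    return np.dot(z, z) / len(z)

def f(x):
    """Nonlinear least-squares objective f(x) = h(c(x))."""
    return h(c(x))

print(f(x_star), f(np.zeros(n)))
```

At $x^\star$ only the noise remains, so $f(x^\star) = \frac{1}{m}\sum_i \xi_i^2$.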
(Robust) Phase retrieval [Candès, Li, Soltanolkotabi 15]
Observations (usually) $b_i = \langle a_i, x^\star \rangle^2$ yield the objective
$$f(x) = \frac{1}{m} \sum_{i=1}^m \big| \langle a_i, x \rangle^2 - b_i \big|$$
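Squaring discards the sign of $\langle a_i, x \rangle$, so $x^\star$ and $-x^\star$ fit noiseless data equally well, and $x^\star$ can only be recovered up to a global sign. A quick sketch with hypothetical random Gaussian measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 100
A = rng.standard_normal((m, n))    # rows are the measurement vectors a_i
x_star = rng.standard_normal(n)
b = (A @ x_star) ** 2              # noiseless observations b_i = <a_i, x_star>^2

def f(x):
    """Phase retrieval objective f(x) = (1/m) * sum_i |<a_i, x>^2 - b_i|."""
    return np.mean(np.abs((A @ x) ** 2 - b))

# Both x_star and -x_star attain the minimum value 0; the origin does not.
print(f(x_star), f(-x_star), f(np.zeros(n)))
```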
Optimization methods
How do we solve optimization problems?
1. Build a “good” but simple local model of $f$
2. Minimize the model (perhaps regularizing)
Gradient descent: Taylor (first-order) model
$$f(y) \approx f_x(y) := f(x) + \nabla f(x)^T (y - x)$$
Newton’s method: Taylor (second-order) model
$$f(y) \approx f_x(y) := f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2} (y - x)^T \nabla^2 f(x) (y - x)$$
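Both methods follow the same recipe: build a model, then minimize it. Minimizing the first-order model plus a proximal term $\frac{1}{2\alpha}\|y - x\|_2^2$ gives the gradient step $y = x - \alpha \nabla f(x)$, and minimizing the second-order model (when $\nabla^2 f(x) \succ 0$) gives the Newton step $y = x - \nabla^2 f(x)^{-1} \nabla f(x)$. A sketch on a hypothetical test function $f(x) = \sum_j \big(e^{x_j} - x_j + x_j^2/2\big)$, whose unique minimizer is $x = 0$:

```python
import numpy as np

# Hypothetical smooth test function (our choice, not from the slides):
#   f(x) = sum_j (exp(x_j) - x_j + x_j**2 / 2), minimized at x = 0.
def grad(x):
    return np.exp(x) - 1.0 + x

def hess(x):
    return np.diag(np.exp(x) + 1.0)

def gradient_step(x, alpha):
    # Minimizer over y of f(x) + grad(x)^T (y - x) + ||y - x||^2 / (2 * alpha).
    return x - alpha * grad(x)

def newton_step(x):
    # Minimizer of the second-order Taylor model (Hessian is positive definite here).
    return x - np.linalg.solve(hess(x), grad(x))

def run(step, x0, iters):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = step(x)
    return x

x_gd = run(lambda x: gradient_step(x, alpha=0.2), np.ones(3), iters=100)
x_newton = run(newton_step, np.ones(3), iters=6)
print(np.abs(x_gd).max(), np.abs(x_newton).max())
```

On this function, Newton's method reaches high accuracy in a handful of steps, while gradient descent with $\alpha = 0.2$ needs many more iterations.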
Modeling composite problems
Now we make a convex model of $f(x) = h(c(x))$ by linearizing the smooth inner map $c$:
$$f(y) \approx h\big(\underbrace{c(x) + \nabla c(x)^T (y - x)}_{=\; c(y) + O(\|x - y\|^2)}\big)$$
This gives the convex model
$$f_x(y) := h\big(c(x) + \nabla c(x)^T (y - x)\big)$$
[Burke 85; Drusvyatskiy, Ioffe, Lewis 16]
Example: $f(x) = |x^2 - 1|$, with $h(z) = |z|$ and $c(x) = x^2 - 1$, so the model at $x$ is $f_x(y) = |x^2 - 1 + 2x(y - x)|$.
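For the running example, the model $f_x(y) = |x^2 - 1 + 2x(y - x)|$ is the absolute value of an affine function of $y$ and hence convex. Because $c(y) - [c(x) + 2x(y - x)] = (y - x)^2$ exactly, the model satisfies $|f_x(y) - f(y)| \le (y - x)^2$. A minimal sketch that checks this numerically (the function names are our own):

```python
def c(x):
    return x ** 2 - 1.0              # smooth inner map c(x) = x^2 - 1

def h(z):
    return abs(z)                    # convex outer function h(z) = |z|

def f(x):
    return h(c(x))                   # f(x) = |x^2 - 1|

def model(x, y):
    """Convex model f_x(y) = h(c(x) + c'(x) (y - x)), using c'(x) = 2x."""
    return h(c(x) + 2.0 * x * (y - x))

# The model is exact at y = x, and its error grows at most quadratically in |y - x|.
for x, y in [(0.5, 0.5), (0.5, 1.0), (2.0, 1.0), (-1.5, 0.0)]:
    gap = abs(model(x, y) - f(y))
    print(f"x={x:+.1f}, y={y:+.1f}: |f_x(y) - f(y)| = {gap:.3f} <= {(y - x) ** 2:.3f}")
```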
The prox-linear method [Burke, Drusvyatskiy et al.]
Iteratively (1) form a regularized convex model and (2) minimize it:
$$x_{k+1} = \operatorname*{argmin}_{x \in X} \Big\{ f_{x_k}(x) + \frac{1}{2\alpha} \|x - x_k\|_2^2 \Big\} = \operatorname*{argmin}_{x \in X} \Big\{ h\big(c(x_k) + \nabla c(x_k)^T (x - x_k)\big) + \frac{1}{2\alpha} \|x - x_k\|_2^2 \Big\}$$
[Figure: successive prox-linear iterates, with $|x_k - x^\star| = 0.3,\ 0.024,\ 3 \cdot 10^{-4},\ 4 \cdot 10^{-8}$]
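A minimal sketch of the prox-linear iteration on $f(x) = |x^2 - 1|$, assuming $X = \mathbb{R}$ and a fixed $\alpha$ (both our choices; it is not claimed to reproduce the figure's numbers). Writing $d = x - x_k$, $a = c(x_k)$ and $g = c'(x_k) = 2x_k$, the subproblem $\min_d |a + g d| + \frac{1}{2\alpha} d^2$ has a closed form: if $|a| \le \alpha g^2$ the step zeroes the linearization, $d = -a/g$; otherwise $d = -\alpha g \operatorname{sign}(a)$.

```python
def prox_linear_step(x, alpha):
    """One prox-linear step for f(x) = |x^2 - 1|, assuming X = R (unconstrained).

    Solves min_d |a + g d| + d^2 / (2 alpha) with a = c(x) = x^2 - 1 and
    g = c'(x) = 2x, then returns x + d.
    """
    a, g = x * x - 1.0, 2.0 * x
    if g == 0.0:
        return x                                    # model is constant in d, so d = 0
    if abs(a) <= alpha * g * g:
        d = -a / g                                  # drive the linearization a + g d to zero
    else:
        d = -alpha * g * (1.0 if a > 0 else -1.0)   # capped step; a + g d keeps the sign of a
    return x + d

def prox_linear(x0, alpha=1.0, iters=6):
    """Return the iterates x_0, x_1, ..., x_iters."""
    xs = [x0]
    for _ in range(iters):
        xs.append(prox_linear_step(xs[-1], alpha))
    return xs

for k, xk in enumerate(prox_linear(1.3)):
    print(k, abs(xk - 1.0))                         # distance to x_star = 1
```

Near $x^\star$ the first case applies, so the step is $d = -c(x_k)/c'(x_k)$, which is Newton's method for $c(x) = 0$; this explains the very fast local convergence on this example.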
Robust phase retrieval problems
A nice application for these composite methods
Data model: true signal $x^\star \in \mathbb{R}^n$; for $p_{\rm fail} < \frac{1}{2}$, observe
$$b_i = \langle a_i, x^\star \rangle^2 + \xi_i, \qquad \xi_i = \begin{cases} 0 & \text{w.p.} \ge 1 - p_{\rm fail} \\ \text{arbitrary} & \text{otherwise} \end{cases}$$
Goal: solve
$$\min_x \; f(x) = \frac{1}{m} \sum_{i=1}^m \big| \langle a_i, x \rangle^2 - b_i \big|$$
Composite problem: $f(x) = \frac{1}{m} \|\phi(Ax) - b\|_1 = h(c(x))$, where $\phi(\cdot)$ is the elementwise square, $h(z) = \frac{1}{m} \|z\|_1$, and $c(x) = \phi(Ax) - b$.
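Evaluating the robust objective in its composite form $h(c(x))$: a sketch on hypothetical synthetic data in which each measurement independently fails with probability $p_{\rm fail} = 0.1$, and failures add a large positive corruption (one arbitrary choice among the many the data model allows).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p_fail = 20, 400, 0.1                  # illustrative sizes, p_fail < 1/2
A = rng.standard_normal((m, n))              # rows are the measurement vectors a_i
x_star = rng.standard_normal(n)              # true signal
failed = rng.random(m) < p_fail              # each measurement fails independently
# Corruptions are "arbitrary"; here, hypothetically, large positive values on failures.
xi = np.where(failed, 100.0 * rng.random(m), 0.0)
b = (A @ x_star) ** 2 + xi

def c(x):
    """Smooth inner map c(x) = phi(Ax) - b, with phi the elementwise square."""
    return (A @ x) ** 2 - b

def h(z):
    """Convex outer function h(z) = (1/m) * ||z||_1."""
    return np.abs(z).sum() / len(z)

def f(x):
    """Robust phase retrieval objective f(x) = h(c(x))."""
    return h(c(x))

print(f"fraction failed: {failed.mean():.3f}, f(x_star) = {f(x_star):.3f}")
```

Because the corruptions enter through an $\ell_1$ loss, $f(x^\star)$ equals the average absolute corruption, and $f(-x^\star)$ takes the same value.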
A convergence theorem
Three key ingredients.
(1) Stability: $f(x) - f(x^\star) \ge \lambda \|x - x^\star\|_2 \|x + x^\star\|_2$
(2) Close models: $|f_x(y) - f(y)| \le \frac{1}{m} \|A^T A\|_{\rm op} \|x - y\|_2^2$
(3) A good initialization
◮ Measurement matrix $A = [a_1 \cdots a_m]^T \in \mathbb{R}^{m \times n}$ and $\frac{1}{m} A^T A = \frac{1}{m} \sum_{i=1}^m a_i a_i^T$
◮ Convex model $f_x$ of $f$ at $x$ defined by $f_x(y) = h\big(c(x) + \nabla c(x)^T (y - x)\big)$
For phase retrieval, the convex model of $f$ at $x$ is
$$f_x(y) = \frac{1}{m} \sum_{i=1}^m \big| \langle a_i, x \rangle^2 + 2 \langle a_i, x \rangle \langle a_i, y - x \rangle - b_i \big|$$
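The close-models bound follows from an exact identity: elementwise, $c(y) - [c(x) + \nabla c(x)^T (y - x)] = (A(y - x))^2$, so by the triangle inequality $|f_x(y) - f(y)| \le \frac{1}{m}\|A(y - x)\|_2^2 \le \frac{1}{m}\|A^T A\|_{\rm op}\|x - y\|_2^2$. A sketch that checks the bound on hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 120
A = rng.standard_normal((m, n))              # hypothetical measurement matrix
b = (A @ rng.standard_normal(n)) ** 2        # the bound below holds for any b

def f(y):
    """f(y) = (1/m) * sum_i |<a_i, y>^2 - b_i|."""
    return np.mean(np.abs((A @ y) ** 2 - b))

def model(x, y):
    """f_x(y) = (1/m) * sum_i |<a_i,x>^2 + 2 <a_i,x> <a_i,y-x> - b_i|."""
    Ax = A @ x
    return np.mean(np.abs(Ax ** 2 + 2.0 * Ax * (A @ (y - x)) - b))

OP_NORM = np.linalg.norm(A.T @ A, 2)         # ||A^T A||_op (largest singular value)

def close_model_bound(x, y):
    """(1/m) * ||A^T A||_op * ||x - y||_2^2."""
    return OP_NORM * np.sum((x - y) ** 2) / m

x, y = rng.standard_normal(n), rng.standard_normal(n)
print(abs(model(x, y) - f(y)), "<=", close_model_bound(x, y))
```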