Solving composite optimization problems, with applications to phase retrieval


  1. Solving composite optimization problems, with applications to phase retrieval
     John Duchi (based on joint work with Feng Ruan)

  2. Outline
     ◮ Composite optimization problems
     ◮ Methods for composite optimization
     ◮ Application: robust phase retrieval
     ◮ Experimental evaluation
     ◮ Large-scale composite optimization?

  3. What I hope to accomplish today
     ◮ Investigate problem structures that are not quite convex but still amenable to elegant solution approaches
     ◮ Show how we can leverage stochastic structure to turn hard non-convex problems into “easy” ones [Keshavan, Montanari, Oh 10; Loh & Wainwright 12]
     ◮ Consider large-scale versions of these problems

  4. Composite optimization problems
     The problem:
         minimize_x  f(x) := h(c(x)),
     where h: R^m → R is convex and c: R^n → R^m is smooth.
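
     To make the h ∘ c structure concrete, here is a minimal numpy sketch; the particular h, c, and data below are my illustrative choices, not from the talk:

         import numpy as np

         rng = np.random.default_rng(0)
         m, n = 20, 5
         A = rng.normal(size=(m, n))          # illustrative data (my choice)
         b = (A @ rng.normal(size=n)) ** 2

         def h(z):
             # Convex outer function h: R^m -> R (here the l1 norm).
             return np.abs(z).sum()

         def c(x):
             # Smooth inner map c: R^n -> R^m (here elementwise squares minus b,
             # foreshadowing the phase retrieval objective later in the talk).
             return (A @ x) ** 2 - b

         def f(x):
             # Composite objective f(x) = h(c(x)).
             return h(c(x))

         print(f(rng.normal(size=n)))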

  5. Motivation: the exact penalty
     minimize_x  f(x)  subject to  x ∈ X
     is equivalent (for all large enough λ) to
     minimize_x  f(x) + λ·dist(x, X)

  6. Motivation: the exact penalty
     minimize_x  f(x)  subject to  c(x) = 0
     is equivalent (for all large enough λ) to
     minimize_x  f(x) + λ‖c(x)‖,  where λ‖c(x)‖ = h(c(x)) for h(z) = λ‖z‖
     [Fletcher & Watson 80, 82; Burke 85]
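
     A quick sanity check of exactness on a toy instance (my example, not from the slides): for minimize_x x² subject to x − 1 = 0, the Lagrange multiplier at x⋆ = 1 has magnitude 2, and for any λ ≥ 2 the subdifferential of x² + λ|x − 1| at x = 1, namely 2 + λ[−1, 1], contains 0, so the penalized problem is minimized exactly at x⋆ = 1. For λ < 2 the minimizer is x = λ/2, which violates the constraint.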

  7. Motivation: nonlinear measurements and modeling
     ◮ Have true signal x⋆ ∈ R^n and measurement vectors a_i ∈ R^n
     ◮ Observe nonlinear measurements
           b_i = φ(⟨a_i, x⋆⟩) + ξ_i,  i = 1, ..., m,
       for φ(·) a nonlinear but smooth function
     An objective:
           f(x) = (1/m) Σ_{i=1}^m (φ(⟨a_i, x⟩) − b_i)²
     Nonlinear least squares [Nocedal & Wright 06; Plan & Vershynin 15; Oymak & Soltanolkotabi 16]

  8. (Robust) phase retrieval [Candès, Li, Soltanolkotabi 15]
     Observations (usually) b_i = ⟨a_i, x⋆⟩² yield objective
           f(x) = (1/m) Σ_{i=1}^m |⟨a_i, x⟩² − b_i|
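
     As a concrete check, a short numpy sketch of this objective on synthetic data (the variable names and data model here are mine):

         import numpy as np

         rng = np.random.default_rng(1)
         m, n = 200, 10
         A = rng.normal(size=(m, n))      # rows are the measurement vectors a_i
         x_star = rng.normal(size=n)      # true signal (synthetic)
         b = (A @ x_star) ** 2            # noiseless b_i = <a_i, x*>^2

         def f(x):
             # f(x) = (1/m) sum_i |<a_i, x>^2 - b_i|
             return np.abs((A @ x) ** 2 - b).mean()

         # Both evaluate to 0: the measurements cannot distinguish x* from -x*.
         print(f(x_star), f(-x_star))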

  9. Optimization methods
     How do we solve optimization problems?
     1. Build a “good” but simple local model of f
     2. Minimize the model (perhaps regularizing)
     Gradient descent: Taylor (first-order) model
           f(y) ≈ f_x(y) := f(x) + ∇f(x)ᵀ(y − x)

  10. Optimization methods
      Newton’s method: Taylor (second-order) model
            f(y) ≈ f_x(y) := f(x) + ∇f(x)ᵀ(y − x) + (1/2)(y − x)ᵀ∇²f(x)(y − x)
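
      For contrast, a small sketch of the updates that minimizing these two models produces (hypothetical helper names; the quadratic test function is my choice):

          import numpy as np

          def grad_step(x, grad, alpha):
              # Minimizer of the regularized first-order model
              # f(x) + grad(x)^T (y - x) + (1/(2*alpha)) ||y - x||^2.
              return x - alpha * grad(x)

          def newton_step(x, grad, hess):
              # Minimizer of the second-order model (when the Hessian is PD).
              return x - np.linalg.solve(hess(x), grad(x))

          # Sanity check on a strongly convex quadratic f(x) = x^T Q x / 2 - v^T x:
          rng = np.random.default_rng(2)
          n = 5
          Q = rng.normal(size=(n, n))
          Q = Q @ Q.T + np.eye(n)
          v = rng.normal(size=n)
          grad = lambda x: Q @ x - v
          hess = lambda x: Q
          x1 = newton_step(np.zeros(n), grad, hess)
          print(np.linalg.norm(grad(x1)))   # ~0: one Newton step solves a quadratic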

  11. Modeling composite problems
      Now we make a convex model of f(x) = h(c(x)) by linearizing c inside h:
            f(y) ≈ h(c(x) + ∇c(x)ᵀ(y − x)),
      where c(x) + ∇c(x)ᵀ(y − x) = c(y) + O(‖x − y‖²). This gives the model
            f_x(y) := h(c(x) + ∇c(x)ᵀ(y − x))
      [Burke 85; Drusvyatskiy, Ioffe, Lewis 16]
      Example: f(x) = |x² − 1|, with h(z) = |z| and c(x) = x² − 1
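
      Written out for the running example (my arithmetic, following the slide's definitions): with c(x) = x² − 1 and ∇c(x) = 2x, the model at x is f_x(y) = |x² − 1 + 2x(y − x)|. At x = 1.3, for instance, f_x(y) = |2.6y − 2.69|, whose zero sits at y ≈ 1.0346, already much closer to x⋆ = 1 than x was.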

  12. The prox-linear method [Burke, Drusvyatskiy et al.]
      Iteratively (1) form a regularized convex model and (2) minimize it:
            x_{k+1} = argmin_{x ∈ X} { f_{x_k}(x) + (1/(2α)) ‖x − x_k‖₂² }
                    = argmin_{x ∈ X} { h(c(x_k) + ∇c(x_k)ᵀ(x − x_k)) + (1/(2α)) ‖x − x_k‖₂² }
      On the example f(x) = |x² − 1|, successive iterates satisfy
            |x_k − x⋆| = 0.3,  0.024,  3·10⁻⁴,  4·10⁻⁸,
      that is, quadratic convergence.
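
      A minimal sketch of this iteration on the running example f(x) = |x² − 1| (my implementation; with h = |·| the one-dimensional subproblem min_t |c + g·t| + t²/(2α) has the closed-form soft-thresholding solution used below):

          import numpy as np

          def prox_linear_1d(x0, alpha=1.0, iters=5):
              # Prox-linear for f(x) = |x^2 - 1|: h(z) = |z|, c(x) = x^2 - 1.
              # Assumes iterates stay away from x = 0, where c'(x) vanishes.
              x = x0
              for _ in range(iters):
                  cx, g = x**2 - 1.0, 2.0 * x      # c(x_k) and c'(x_k)
                  # Subproblem: min_t |cx + g*t| + t^2 / (2*alpha). Substituting
                  # u = cx + g*t reduces it to soft-thresholding cx at level alpha*g^2.
                  u = np.sign(cx) * max(abs(cx) - alpha * g**2, 0.0)
                  x = x + (u - cx) / g
                  print(abs(x - 1.0))               # distance to x* = 1
              return x

          prox_linear_1d(1.3)   # errors shrink roughly quadratically, as on the slide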

  13. Robust phase retrieval problems
      A nice application for these composite methods.
      Data model: true signal x⋆ ∈ R^n; for p_fail < 1/2, observe
            b_i = ⟨a_i, x⋆⟩² + ξ_i,  where ξ_i = 0 w.p. ≥ 1 − p_fail and arbitrary otherwise
      Goal: solve
            minimize_x  f(x) = (1/m) Σ_{i=1}^m |⟨a_i, x⟩² − b_i|
      Composite problem: f(x) = (1/m) ‖φ(Ax) − b‖₁ = h(c(x)), where φ(·) is the elementwise square, h(z) = (1/m) ‖z‖₁, and c(x) = φ(Ax) − b
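
      A numpy sketch of this composite formulation and its convex model (synthetic data and names of my choosing):

          import numpy as np

          rng = np.random.default_rng(3)
          m, n, p_fail = 300, 10, 0.1
          A = rng.normal(size=(m, n))
          x_star = rng.normal(size=n)
          b = (A @ x_star) ** 2
          outliers = rng.random(m) < p_fail
          b[outliers] += rng.normal(scale=25.0, size=outliers.sum())  # arbitrary corruptions

          def f(x):
              # f(x) = (1/m) || phi(Ax) - b ||_1, phi = elementwise square.
              return np.abs((A @ x) ** 2 - b).mean()

          def f_model(y, x):
              # Convex model f_x(y): linearize each square inside the l1 norm,
              # f_x(y) = (1/m) sum_i |<a_i,x>^2 + 2<a_i,x><a_i,y-x> - b_i|.
              Ax = A @ x
              return np.abs(Ax**2 + 2.0 * Ax * (A @ (y - x)) - b).mean()

          # Only the corrupted measurements contribute at the true signal:
          print(f(x_star), f(rng.normal(size=n)))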

  14. A convergence theorem
      Three key ingredients:
      (1) Stability: f(x) − f(x⋆) ≥ λ ‖x − x⋆‖₂ ‖x + x⋆‖₂
      (2) Close models: |f_x(y) − f(y)| ≤ ‖(1/m) AᵀA‖_op ‖x − y‖₂²
      (3) A good initialization
      ◮ Measurement matrix A = [a_1 ··· a_m]ᵀ ∈ R^{m×n}, with (1/m) AᵀA = (1/m) Σ_{i=1}^m a_i a_iᵀ
      ◮ Convex model f_x of f at x defined by f_x(y) = h(c(x) + ∇c(x)ᵀ(y − x)), which here is
            f_x(y) = (1/m) Σ_{i=1}^m |⟨a_i, x⟩² + 2⟨a_i, x⟩⟨a_i, y − x⟩ − b_i|
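
      Continuing the previous sketch, a quick numerical check of ingredient (2): for this h and c, the arguments of f_x(y) and f(y) differ by exactly ⟨a_i, y − x⟩² in coordinate i, so the triangle inequality gives the stated operator-norm bound.

          # Numerical check of ingredient (2), reusing A, m, n, f, f_model, rng above.
          op_norm = np.linalg.norm(A.T @ A / m, ord=2)    # ||(1/m) A^T A||_op
          x, y = rng.normal(size=n), rng.normal(size=n)
          gap = abs(f_model(y, x) - f(y))
          bound = op_norm * np.linalg.norm(x - y) ** 2
          print(gap <= bound + 1e-9, gap, bound)          # the bound holds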
