  1. Introductory Course on Non-smooth Optimisation Lecture 09 - Non-convex optimisation Jingwei Liang Department of Applied Mathematics and Theoretical Physics

  2. Table of contents 1 Examples 2 Non-convex optimisation 3 Convex relaxation 4 Łojasiewicz inequality 5 Kurdyka-Łojasiewicz inequality

  3. Compressed sensing Forward observation: b = A x̊, where x̊ ∈ R^n is sparse and A : R^n → R^m with m << n. Compressed sensing: min_{x ∈ R^n} ||x||_0 s.t. Ax = b. NB: NP-hard problem. Jingwei Liang, DAMTP Introduction to Non-smooth Optimisation March 13, 2019
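The NP-hardness of the ℓ0 problem can be felt directly: the only general strategy is to enumerate supports, whose number grows combinatorially in n. A minimal brute-force sketch (toy sizes only; the matrix, seed and sparsity level are illustrative assumptions, not from the slide):

```python
import itertools
import numpy as np

def l0_min(A, b, tol=1e-10):
    """Brute-force search for the sparsest x with Ax = b.

    Enumerates supports of increasing size, so the cost grows
    combinatorially in n -- this is why the l0 problem is NP-hard
    in general.  Toy sizes only.
    """
    m, n = A.shape
    for k in range(n + 1):
        for S in map(list, itertools.combinations(range(n), k)):
            if k == 0:
                if np.linalg.norm(b) < tol:
                    return np.zeros(n)
                continue
            # Least squares on the candidate support, then check feasibility.
            xS, *_ = np.linalg.lstsq(A[:, S], b, rcond=None)
            if np.linalg.norm(A[:, S] @ xS - b) < tol:
                x = np.zeros(n)
                x[S] = xS
                return x
    return None

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))        # m = 3 << n = 6
x_true = np.zeros(6); x_true[2] = 1.5  # 1-sparse ground truth
b = A @ x_true
x_hat = l0_min(A, b)
print(np.count_nonzero(np.round(x_hat, 8)))  # -> 1
```

Already at n = 6 this loop visits up to 2^6 supports; doubling n squares the work.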

  4. Image processing Two-phase segmentation: given an image I, which consists of foreground and background, segment the foreground. Ideally, I = f·1_C + b·1_{Ω\C}. Mumford–Shah model: E(u, C) = ∫_Ω (u − I)^2 dx + λ ∫_{Ω\C} ||∇u||^2 dx + α |C|, where |C| = peri(C).

  5. Principal component pursuit Forward mixture model: w = x̊ + ẙ + ǫ, where x̊ ∈ R^{m×n} is κ-sparse, ẙ ∈ R^{m×n} is σ-low-rank and ǫ is noise. Non-convex PCP: min_{x, y ∈ R^{m×n}} (1/2)||x + y − w||^2 s.t. ||x||_0 ≤ κ and rank(y) ≤ σ.

  6. Neural networks Each layer of NNs is convex: linear operation, e.g. convolution; non-linear activation function, e.g. rectifier max{x, 0}. The composition of convex functions is not necessarily convex... Neural networks are universal function approximators, hence need to approximate non-convex functions. Cannot approximate non-convex functions with convex functions.
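A one-dimensional illustration of the claim above (my own toy example, not from the slide): |x| and (t − 1)^2 are both convex, but their composition is not, because the outer function is not monotone.

```python
def f(x):          # convex: absolute value (a ReLU-style kink)
    return abs(x)

def g(t):          # convex, but NOT non-decreasing
    return (t - 1.0) ** 2

def h(x):          # composition g(f(x)) = (|x| - 1)^2
    return g(f(x))

# Convexity would require h(0) <= (h(-1) + h(1)) / 2, but:
print(h(-1), h(1), h(0))   # -> 0.0 0.0 1.0: the midpoint inequality fails
```

h has two global minima at ±1 and a local maximum at 0; the standard sufficient condition (composition of convex functions is convex when the outer one is non-decreasing) fails here because g decreases on t < 1.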

  7. Outline 1 Examples 2 Non-convex optimisation 3 Convex relaxation 4 Łojasiewicz inequality 5 Kurdyka-Łojasiewicz inequality

  8. Non-convex optimisation Non-convex problem: any problem that is not convex/concave is non-convex...

  9. Challenges Potentially many local minima. Saddle points. Very flat regions. Widely varying curvature. NP-hard.

  10. Outline 1 Examples 2 Non-convex optimisation 3 Convex relaxation 4 Łojasiewicz inequality 5 Kurdyka-Łojasiewicz inequality

  11. Convex relaxation Non-convex optimisation problem: min_x E(x). Convex optimisation problem: min_x F(x). What if Argmin(F) ⊆ Argmin(E)? Subtle and case-dependent. Somehow, finding such an F is almost equivalent to solving E.

  12. Convex relaxation Loose relaxation vs. ideal relaxation. In practice, it is easier to obtain Argmin(E) ⊆ Argmin(F). Loose relaxation will work if the two global minima are close enough. Ideal relaxation will fail if Argmin(F) is too large.
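The classic instance of convex relaxation, matching the compressed-sensing example from slide 3, is replacing ||x||_0 by ||x||_1 (basis pursuit). A sketch using SciPy's linear-programming solver; the problem sizes and the random data are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Convex relaxation of the l0 problem: min ||x||_1 s.t. Ax = b.

    Standard LP reformulation with x = u - v, u, v >= 0, so that
    ||x||_1 = 1^T u + 1^T v at the optimum.
    """
    m, n = A.shape
    c = np.ones(2 * n)                       # objective 1^T u + 1^T v
    A_eq = np.hstack([A, -A])                # A(u - v) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 20))
x_true = np.zeros(20); x_true[[3, 11]] = [2.0, -1.0]
b = A @ x_true
x_hat = basis_pursuit(A, b)
print(np.linalg.norm(x_hat - x_true))
```

By LP optimality, x_hat is feasible and ||x_hat||_1 ≤ ||x_true||_1; with enough random measurements the relaxation typically recovers the sparse solution exactly, which is the Argmin(E) ⊆ Argmin(F) situation of the slide.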

  13. Convolution For certain problems, non-convexity can be treated as noise... [Figures: original function vs. its convolution.] Symmetric boundary condition for the convolution. Almost convex problem after convolution.
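A small numerical sketch of this smoothing idea (the test function, Gaussian kernel and grid are my own illustrative choices): convolving an oscillatory non-convex function with a Gaussian, using the symmetric (reflect) boundary condition mentioned on the slide, removes most spurious local minima.

```python
import numpy as np

# Non-convex 1-D function sampled on a grid: a quadratic plus an
# oscillatory term that creates many spurious local minima.
x = np.linspace(-3, 3, 601)
f = x ** 2 + 0.5 * np.sin(12 * x)

# Gaussian kernel for the smoothing convolution (sigma = 0.3).
t = np.linspace(-1, 1, 201)
kernel = np.exp(-t ** 2 / (2 * 0.3 ** 2))
kernel /= kernel.sum()

# Symmetric (reflect) boundary condition, then a 'valid' convolution
# so the output lives on the same grid as the input.
pad = len(kernel) // 2
f_pad = np.concatenate([f[pad:0:-1], f, f[-2:-pad - 2:-1]])
f_smooth = np.convolve(f_pad, kernel, mode="valid")

def count_local_minima(y):
    """Count strict interior local minima on the grid."""
    return int(np.sum((y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])))

print(count_local_minima(f), count_local_minima(f_smooth))
```

The oscillation is damped by a factor exp(-ω²σ²/2), so after convolution the quadratic trend dominates and the problem is almost convex, as the slide claims.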

  14. Outline 1 Examples 2 Non-convex optimisation 3 Convex relaxation 4 Łojasiewicz inequality 5 Kurdyka-Łojasiewicz inequality

  15. Smooth problem Let F ∈ C^1_L. Gradient descent: x_{k+1} = x_k − γ ∇F(x_k). Descent property: F(x_k) − F(x_{k+1}) ≥ γ(1 − γL/2) ||∇F(x_k)||^2. Let γ ∈ ]0, 2/L[, then γ(1 − γL/2) Σ_{i=0}^{k} ||∇F(x_i)||^2 ≤ F(x_0) − F(x_{k+1}) ≤ F(x_0) − F(x⋆). Since F(x⋆) > −∞, the rhs is a finite constant; for the lhs, let k → +∞, then lim_{k→+∞} ||∇F(x_k)||^2 = 0. NB: for the smooth case, a critical point is guaranteed. For non-smooth problems...
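The argument above can be checked numerically. A sketch on a smooth non-convex test function of my own choosing, F(x) = x²/2 + 2cos(x), whose second derivative lies in [−1, 3], so L = 3: each iteration satisfies the descent inequality, and the gradient vanishes in the limit.

```python
import numpy as np

def F(x):  return 0.5 * x**2 + 2.0 * np.cos(x)   # non-convex, in C^1_L
def dF(x): return x - 2.0 * np.sin(x)            # F'' in [-1, 3] => L = 3

L = 3.0
gamma = 1.0 / L          # step size in ]0, 2/L[
x = 3.0
for k in range(200):
    g = dF(x)
    x_new = x - gamma * g
    # Descent property from the slide:
    # F(x_k) - F(x_{k+1}) >= gamma (1 - gamma L / 2) ||dF(x_k)||^2
    assert F(x) - F(x_new) >= gamma * (1 - gamma * L / 2) * g**2 - 1e-12
    x = x_new

print(abs(dF(x)) < 1e-8)   # -> True: the gradient vanishes, a critical point
```

Note the limit point here is a critical point (≈ 1.895, a local minimiser), exactly the guarantee on the slide; nothing certifies global optimality.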

  16. Semi-algebraic sets and functions Semi-algebraic set: a semi-algebraic subset of R^n is a finite union of sets of the form { x ∈ R^n : f_i(x) = 0, g_j(x) ≤ 0, i ∈ I, j ∈ J }, where I, J are finite and f_i, g_j : R^n → R are real polynomial functions. Stability under finite ∩, ∪ and complementation. Semi-algebraic function: a function or a mapping is semi-algebraic if its graph is a semi-algebraic set. Same definition for extended real-valued functions or multivalued mappings.

  17. Properties Tarski–Seidenberg: the image of a semi-algebraic set under a linear projection is semi-algebraic. The closure of a semi-algebraic set A is semi-algebraic. Examples: the graph of the derivative of a semi-algebraic function is semi-algebraic. Let A be a semi-algebraic subset of R^n and f : R^n → R^p semi-algebraic; then f(A) is semi-algebraic. g(x) = max{ F(x, y) : y ∈ S } is semi-algebraic if F and S are semi-algebraic. Other examples: min_x (1/2)||Ax − b||^2 + µ||x||_p with p rational, and min_X (1/2)||AX − B||^2 + µ rank(X).

  18. Subdifferentials Convex subdifferential: for R ∈ Γ_0(R^n), ∂R(x) = { g : R(x′) ≥ R(x) + ⟨g, x′ − x⟩, ∀ x′ ∈ R^n }. Fréchet subdifferential: given x ∈ dom(R), the Fréchet subdifferential ∂̂R(x) of R at x is the set of vectors v such that liminf_{x′ → x, x′ ≠ x} [ R(x′) − R(x) − ⟨v, x′ − x⟩ ] / ||x − x′|| ≥ 0. If x ∉ dom(R), then ∂̂R(x) = ∅. Limiting subdifferential: the limiting subdifferential (or simply subdifferential) of R at x, written ∂R(x), reads ∂R(x) := { v ∈ R^n : ∃ x_k → x, R(x_k) → R(x), v_k ∈ ∂̂R(x_k), v_k → v }. ∂̂R is convex and ∂R is closed.
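The distinction between the two subdifferentials is visible on the standard example R(x) = −|x| (my own illustration, not from the slide): at 0 the Fréchet subdifferential is empty, while the limiting subdifferential is {−1, +1}, obtained through sequences x_k → 0 from the right (gradient −1) and from the left (gradient +1). A numerical sketch of the liminf condition:

```python
import numpy as np

def R(x):
    return -np.abs(x)   # concave kink: classic non-convex example

def in_frechet_subdiff(v, x0, h=1e-4):
    """Numerically test the Fréchet condition
    liminf_{x'->x0} (R(x') - R(x0) - v (x'-x0)) / |x'-x0| >= 0
    by sampling points near x0 (a sketch, not a proof)."""
    xs = x0 + np.linspace(-h, h, 2001)
    xs = xs[xs != x0]
    q = (R(xs) - R(x0) - v * (xs - x0)) / np.abs(xs - x0)
    return bool(q.min() >= -1e-8)

# At x0 = 0 the quotient equals -1 - v*sign(x'), so no v works:
print(any(in_frechet_subdiff(v, 0.0) for v in np.linspace(-2, 2, 81)))  # -> False
# Away from 0, R is smooth with gradient -sign(x), giving the two
# gradient values whose limits form the limiting subdifferential at 0:
print(in_frechet_subdiff(-1.0, 0.5), in_frechet_subdiff(1.0, -0.5))    # -> True True
```

Since {−1, +1} is not a convex set, this also illustrates that ∂R, unlike ∂̂R, need not be convex, only closed.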

  19. Critical points Minimal norm subgradient: ||∂R(x)||_− = min{ ||v|| : v ∈ ∂R(x) }. Critical points: Fermat’s rule: if x is a minimiser of R, then 0 ∈ ∂R(x). Conversely, when 0 ∈ ∂R(x), the point x is called a critical point. When R is convex, any minimiser is a global minimiser. When R is non-convex, a critical point can be: a local minimum, a local maximum, or a saddle point.
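For R(x) = |x| (an example of my own, anticipating the sharpness slide), the minimal norm subgradient can be written down in closed form, and Fermat's rule is visible at x = 0:

```python
def min_norm_subgrad_abs(x):
    """||dR(x)||_- for R(x) = |x|: the subdifferential is {sign(x)}
    for x != 0 and the interval [-1, 1] at x = 0, whose minimal-norm
    element is 0 -- so 0 is a critical point (here the minimiser)."""
    if x != 0:
        return 1.0          # dR(x) = {sign(x)}, norm 1
    return 0.0              # min{|v| : v in [-1, 1]} = 0

print(min_norm_subgrad_abs(0.3), min_norm_subgrad_abs(0.0))  # -> 1.0 0.0
```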

  20. Sharpness Function R : R^n → R ∪ {+∞} is called sharp on the slice [a < R < b] := { x ∈ R^n : a < R(x) < b } if there exists α > 0 such that ||∂R(x)||_− ≥ α, ∀ x ∈ [a < R < b]. Example: norms, e.g. R(x) = ||x||.

  21. Łojasiewicz inequality Łojasiewicz inequality: let R : R^n → R ∪ {+∞} be proper, lower semi-continuous and, moreover, continuous on its domain. Then R is said to have the Łojasiewicz property if for any critical point x̄ there exist C, ǫ > 0 and θ ∈ [0, 1[ such that |R(x) − R(x̄)|^θ ≤ C ||v||, ∀ x ∈ B_{x̄}(ǫ), v ∈ ∂R(x). By convention, 0^0 = 0. Property: suppose R has the Łojasiewicz property. If S is a connected subset of the set of critical points of R, that is 0 ∈ ∂R(x) for all x ∈ S, then R is constant on S. If in addition S is compact, then there exist C, ǫ > 0 and θ ∈ [0, 1[ such that, for all x ∈ R^n with dist(x, S) ≤ ǫ and all v ∈ ∂R(x), |R(x) − R(x̄)|^θ ≤ C ||v||.
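The inequality can be verified by hand on the simplest example (my own, not from the slide): for R(x) = x² with critical point x̄ = 0, take θ = 1/2, so |R(x) − R(0)|^θ = |x| and ||∇R(x)|| = 2|x|, and any C ≥ 1/2 works. A quick numerical check on a grid:

```python
import numpy as np

# Łojasiewicz inequality for R(x) = x^2 at the critical point 0:
# |R(x) - R(0)|^theta <= C |R'(x)| with theta = 1/2 and C = 1/2,
# since |x^2|^{1/2} = |x| and |R'(x)| = 2|x|.
theta, C = 0.5, 0.5
xs = np.linspace(-1.0, 1.0, 10001)
lhs = np.abs(xs ** 2) ** theta
rhs = C * np.abs(2 * xs)
print(bool(np.all(lhs <= rhs + 1e-12)))   # -> True (equality, in fact)
```

The exponent θ governs how "flat" R may be around its critical points; for polynomials such a θ always exists, which is why the semi-algebraic setting of the previous slides matters.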

  22. Non-convex PPA Proximal point algorithm: let R : R^n → R ∪ {+∞} be proper and lower semi-continuous. From arbitrary x_0 ∈ R^n, x_{k+1} ∈ argmin_x γR(x) + (1/2)||x − x_k||^2. Assumptions: 1. R is proper and inf_{x ∈ R^n} R(x) > −∞; this implies argmin_x γR(x) + (1/2)||x − x_k||^2 is non-empty and compact. 2. The restriction of R to its domain is a continuous function. 3. R has the Łojasiewicz property.
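A one-dimensional sketch of the non-convex PPA (the test function and the brute-force prox computation on a grid are my own simplifications; they stand in for the argmin in the update):

```python
import numpy as np

def ppa(R, x0, gamma=0.5, iters=100):
    """Proximal point algorithm for a non-convex R on the real line.
    The update x_{k+1} in argmin_x gamma R(x) + 0.5 (x - x_k)^2 is
    computed by brute force on a fixed grid -- a sketch, not a solver."""
    grid = np.linspace(-4.0, 4.0, 8001)
    x, traj = x0, [x0]
    for _ in range(iters):
        x = grid[np.argmin(gamma * R(grid) + 0.5 * (grid - x) ** 2)]
        traj.append(x)
    return np.array(traj)

# Non-convex, coercive, continuous: satisfies the slide's assumptions
# (up to the Łojasiewicz property, which holds for such analytic R).
R = lambda x: 0.5 * x ** 2 + 2.0 * np.cos(x)
traj = ppa(R, x0=3.0)
vals = R(traj)
print(bool(np.all(np.diff(vals) <= 1e-9)))   # -> True: {R(x_k)} is decreasing
```

The decrease follows from optimality of x_{k+1}: comparing with the candidate x = x_k gives γR(x_{k+1}) + (1/2)||x_{k+1} − x_k||² ≤ γR(x_k), which is exactly the first item of the property slide that follows.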

  23. Property Let {x_k}_{k ∈ N} be the sequence generated by the non-convex PPA and ω(x_k) the set of its limit points. Then: The sequence {R(x_k)}_{k ∈ N} is decreasing. Σ_k ||x_k − x_{k+1}||^2 < +∞. If R satisfies assumption 2, then ω(x_k) ⊂ crit(R). If, moreover, {x_k}_{k ∈ N} is bounded, then ω(x_k) is a non-empty compact set and dist(x_k, ω(x_k)) → 0. If R satisfies assumption 2, then R is finite and constant on ω(x_k). NB: boundedness can be guaranteed if R is coercive.
