Constrained Optimization in ℜ: Recap
Global Extrema on Closed Intervals

Recall the extreme value theorem: a continuous function f on a closed, bounded interval [a, b] attains its maximum and minimum, say at points c and d. A consequence is that if either of c or d lies in (a, b), then it is a critical number of f; else each of c and d must lie on one of the boundaries of [a, b]. This gives us a procedure for finding the maximum and minimum of a continuous function f on a closed, bounded interval I.

Procedure [Finding extreme values on closed, bounded intervals]:
1. Find the critical points in int(I).
2. Compute the values of f at the critical points and at the endpoints of the interval.
3. Select the least and greatest of the computed values.
Global Extrema on Closed Intervals (contd)

To compute the maximum and minimum values of f(x) = 4x³ − 8x² + 5x on the interval [0, 1]:
▶ We first compute f′(x) = 12x² − 16x + 5, which is 0 at x = 1/2 and x = 5/6.
▶ The values at the critical points are f(1/2) = 1 and f(5/6) = 25/27.
▶ The values at the endpoints are f(0) = 0 and f(1) = 1.
▶ Therefore, the minimum value is f(0) = 0 and the maximum value is f(1/2) = f(1) = 1.

In this context, it is relevant to discuss the one-sided derivatives of a function at the endpoints of the closed interval on which it is defined.
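Before turning to those, the three-step procedure is mechanical enough to script. A minimal sketch in Python using sympy (our code, applied to the example just worked):

```python
import sympy as sp

x = sp.symbols('x')
f = 4*x**3 - 8*x**2 + 5*x
a, b = 0, 1

# Step 1: critical points in int(I) = (0, 1)
critical = [c for c in sp.solve(sp.diff(f, x), x) if a < c < b]  # [1/2, 5/6]

# Step 2: values of f at the critical points and at the endpoints
values = {c: f.subs(x, c) for c in critical + [a, b]}

# Step 3: the least and greatest of the computed values
print("minimum:", min(values.values()))  # 0, attained at x = 0
print("maximum:", max(values.values()))  # 1, attained at x = 1/2 and x = 1
```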
Global Extrema on Closed Intervals (contd)

Definition [One-sided derivatives at endpoints]: Let f be defined on a closed bounded interval [a, b]. The (right-sided) derivative of f at x = a is defined as
$$f'(a) = \lim_{h \to 0^+} \frac{f(a+h) - f(a)}{h}$$
Similarly, the (left-sided) derivative of f at x = b is defined as
$$f'(b) = \lim_{h \to 0^-} \frac{f(b+h) - f(b)}{h}$$
Essentially, each of the one-sided derivatives defines a one-sided slope at the corresponding endpoint.
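As a quick numerical illustration (ours, not from the slides), one-sided difference quotients approximate these definitions; note how the right-sided slope of √x at 0 blows up, matching the ±∞ case allowed in the claim below:

```python
import math

def right_derivative(f, a, h=1e-6):
    """Approximate f'(a) via the right-sided difference quotient (h -> 0+)."""
    return (f(a + h) - f(a)) / h

def left_derivative(f, b, h=-1e-6):
    """Approximate f'(b) via the left-sided difference quotient (h -> 0-)."""
    return (f(b + h) - f(b)) / h

print(right_derivative(math.sqrt, 0.0))      # ~1000: diverges to +inf as h -> 0+
print(right_derivative(lambda t: t*t, 0.0))  # ~0: ordinary slope at the left endpoint
print(left_derivative(lambda t: t*t, 1.0))   # ~2: ordinary slope at the right endpoint
```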
Global Extrema on Closed Intervals (contd)

Based on these definitions, the following result can be derived.

Claim: If f is continuous on [a, b] and f′(a) exists as a real number or as ±∞, then we have the following necessary conditions for an extremum at a:
▶ If f(a) is the maximum value of f on [a, b], then f′(a) ≤ 0 or f′(a) = −∞.
▶ If f(a) is the minimum value of f on [a, b], then f′(a) ≥ 0 or f′(a) = ∞.
If f is continuous on [a, b] and f′(b) exists as a real number or as ±∞, then we have the following necessary conditions for an extremum at b:
▶ If f(b) is the maximum value of f on [a, b], then f′(b) ≥ 0 or f′(b) = ∞.
▶ If f(b) is the minimum value of f on [a, b], then f′(b) ≤ 0 or f′(b) = −∞.
Global Extrema on Closed Intervals (contd)

The following result gives a useful procedure for finding extrema on closed intervals.

Claim: Suppose f is continuous on [a, b] and f″(x) exists for all x ∈ (a, b). Then:
▶ If f″(x) ≤ 0, ∀x ∈ (a, b), then the minimum value of f on [a, b] is either f(a) or f(b). If, in addition, f has a critical point c ∈ (a, b), then f(c) is the maximum value of f on [a, b].
▶ If f″(x) ≥ 0, ∀x ∈ (a, b), then the maximum value of f on [a, b] is either f(a) or f(b). If, in addition, f has a critical point c ∈ (a, b), then f(c) is the minimum value of f on [a, b].
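For instance (an example of ours, not from the slides), f(x) = sin x on [0, π] has f″(x) = −sin x ≤ 0, so the minimum sits at an endpoint and the interior critical point gives the maximum. A quick sympy check:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)                 # f''(x) = -sin(x) <= 0 on (0, pi)
a, b = 0, sp.pi

c = [s for s in sp.solve(sp.diff(f, x), x) if a < s < b][0]  # pi/2
print("maximum:", f.subs(x, c))                      # 1, at the critical point
print("minimum:", min(f.subs(x, a), f.subs(x, b)))   # 0, at an endpoint
```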
Global Extrema on Open Intervals

The next result is very useful for finding extrema on open intervals.

Claim: Let I be an open interval and let f″(x) exist ∀x ∈ I.
▶ If f″(x) ≥ 0, ∀x ∈ I, and if there is a number c ∈ I where f′(c) = 0, then f(c) is the global minimum value of f on I.
▶ If f″(x) ≤ 0, ∀x ∈ I, and if there is a number c ∈ I where f′(c) = 0, then f(c) is the global maximum value of f on I.

For example, let f(x) = 2x/3 − sec x and I = (−π/2, π/2). Then
$$f'(x) = \frac{2}{3} - \sec x \tan x = \frac{2}{3} - \frac{\sin x}{\cos^2 x} = 0 \;\Longrightarrow\; x = \frac{\pi}{6}.$$
Further, f″(x) = −sec x (tan²x + sec²x) < 0 on (−π/2, π/2). Therefore, f attains the maximum value f(π/6) = π/9 − 2/√3 on I.
Global Extrema on Open Intervals (contd)

As another example, let us find the dimensions of the cone with minimum volume that can contain a sphere with radius R. Let h be the height of the cone and r the radius of its base, as shown in Figure 10. The objective to be minimized is the volume f(r, h) = (1/3)πr²h. The triangle AEF is similar to the triangle ADB, and therefore the constraint between r and h is
$$\frac{h - R}{R} = \frac{\sqrt{h^2 + r^2}}{r}.$$

Figure 10: A sphere of radius R inscribed in a cone of height h and base radius r (similar triangles AEF and ADB).
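The slide stops at the constraint; completing the computation it sets up (the algebra below is ours, reproducing the standard solution): squaring the constraint and solving for r² gives
$$\frac{(h-R)^2}{R^2} = \frac{h^2 + r^2}{r^2} \;\Longrightarrow\; r^2\big[(h-R)^2 - R^2\big] = R^2 h^2 \;\Longrightarrow\; r^2 = \frac{R^2 h}{h - 2R} \quad (h > 2R).$$
Substituting into the volume yields a one-variable objective on the open interval (2R, ∞):
$$f(h) = \frac{\pi}{3}\,\frac{R^2 h^2}{h - 2R}, \qquad f'(h) = \frac{\pi R^2}{3}\,\frac{h(h - 4R)}{(h - 2R)^2},$$
so f′ < 0 on (2R, 4R) and f′ > 0 on (4R, ∞). The global minimum is therefore at h = 4R, giving r = √2 R and minimum volume f(4R) = (8/3)πR³.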
Constrained Optimization and Subgradient Descent
Constrained Optimization

Consider the objective
min f(x) s.t. g_i(x) ≤ 0, ∀i

Recall the indicator function for g_i(x):
$$I_{g_i}(x) = \begin{cases} 0 & \text{if } g_i(x) \le 0 \\ \infty & \text{otherwise} \end{cases}$$
▶ We have shown that this is convex if each g_i(x) is convex.

Option 1: Use subgradient descent to minimize f(x) + Σ_i I_{g_i}(x).
Option 2: Barrier Method (approximate I_{g_i}(x) using some differentiable and non-decreasing function such as −(1/t) log(−u)), Augmented Lagrangian, ADMM, etc.
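As a sketch of Option 2's barrier idea (an illustration of ours; the toy problem, names, and parameters are made up, not from the course), the log-barrier −(1/t) log(−g_i(x)) smooths the indicator, and its minimizer approaches the constrained optimum as t grows:

```python
import numpy as np

def barrier_objective(f, gs, t):
    """Smooth stand-in for f(x) + sum_i I_{g_i}(x): each indicator is
    approximated by -(1/t) * log(-g_i(x)), finite only where g_i(x) < 0."""
    def F(x):
        vals = np.array([g(x) for g in gs])
        if np.any(vals >= 0):          # outside the strictly feasible region
            return np.inf
        return f(x) - np.sum(np.log(-vals)) / t
    return F

# Toy instance: minimize (x - 3)^2 subject to x - 1 <= 0 (optimum: x = 1).
f = lambda x: (x - 3.0) ** 2
g = lambda x: x - 1.0
xs = np.linspace(-1.0, 1.0 - 1e-4, 20001)        # crude grid over feasible x
for t in (1.0, 10.0, 100.0):                      # larger t => tighter approximation
    F = barrier_objective(f, [g], t)
    print(t, xs[np.argmin([F(x) for x in xs])])   # ~0.775, ~0.975, ~0.998
```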
Option 1: (Sub)Gradient Descent with Sum of indicators

Convert our objective to the following unconstrained optimization problem. Each C_i = {x | g_i(x) ≤ 0} is convex if g_i(x) is convex. We take
$$\min_x F(x) = \min_x \; f(x) + \sum_i I_{C_i}(x)$$
Recap a subgradient of F: h_F(x) = h_f(x) + Σ_i h_{I_{C_i}}(x). Recall that
▶ h_f(x) = ∇f(x) if f(x) is differentiable. Also, −∇f(x^k) at x^k optimizes the first-order approximation for f(x) around x^k:
$$-\nabla f(x^k) = \operatorname*{argmin}_h \; f(x^k) + \nabla f(x^k)^T h + \tfrac{1}{2}\|h\|^2$$
Variations on the form of (1/2)||h||² (replacing it with an entropic regularizer, say) lead to Mirror Descent, etc.
▶ h_{I_{C_i}}(x) is a d ∈ ℝⁿ s.t. d^T x ≥ d^T y, ∀y ∈ C_i. Also, h_{I_{C_i}}(x) = 0 if x is in the interior of C_i, and it has other solutions if x is on the boundary. Analysis for convex g_i's leads to KKT conditions, Dual Ascent, etc.
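To make Option 1 concrete (a sketch of ours, with made-up data): inside int(C_i) the indicator contributes the zero subgradient, so the iteration is plain gradient descent there; on the boundary, one practical choice of the normal-cone subgradient amounts to projecting the gradient step back onto C. The projected variant below uses a box constraint, whose projection is a coordinate-wise clip:

```python
import numpy as np

# Minimize f(x) = ||Ax - b||^2 over C = {x : 0 <= x <= 1} (made-up data).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x = np.zeros(5)
t = 5e-3                                  # fixed step size
for k in range(2000):
    grad = 2 * A.T @ (A @ x - b)          # h_f(x) = grad f(x); f is differentiable
    x = np.clip(x - t * grad, 0.0, 1.0)   # projection onto the box C
print(x)                                  # approximate constrained minimizer
```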
Option 1: Generalized Gradient Descent

Consider the problem of minimizing the following sum of a differentiable function f(x) and a (possibly) nondifferentiable function c(x) (an example being Σ_i I_{C_i}(x)):
$$\min_x F(x) = \min_x \; f(x) + c(x)$$
As in gradient descent, consider the first-order approximation for f(x) around x^k, leaving c(x) alone, to obtain the next iterate x^{k+1}:
$$x^{k+1} = \operatorname*{argmin}_x \; f(x^k) + \nabla f(x^k)^T (x - x^k) + \frac{1}{2t}\|x - x^k\|^2 + c(x)$$
Deleting f(x^k) from the objective and adding (t/2)||∇f(x^k)||² to the objective (without any loss) to complete squares, we obtain x^{k+1} as:
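$$x^{k+1} = \operatorname*{argmin}_x \; \frac{1}{2t}\left\|x - \big(x^k - t\,\nabla f(x^k)\big)\right\|^2 + c(x) = \operatorname{prox}_{tc}\big(x^k - t\,\nabla f(x^k)\big),$$
the proximal operator of c evaluated at the ordinary gradient step. As a concrete sketch (our illustration; the data, λ, and step size are made up), taking c(x) = λ||x||₁ makes the prox coordinate-wise soft-thresholding, giving the classic ISTA iteration:

```python
import numpy as np

def soft_threshold(z, tau):
    """prox of tau*||.||_1: shrink each coordinate of z toward 0 by tau."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

# Generalized gradient descent with f(x) = 0.5*||Ax - b||^2, c(x) = lam*||x||_1.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
lam, t = 0.1, 1e-2          # step size t should satisfy t <= 1/lambda_max(A^T A)
x = np.zeros(10)
for k in range(1000):
    grad = A.T @ (A @ x - b)                   # gradient of the smooth part f
    x = soft_threshold(x - t * grad, t * lam)  # prox step on the nonsmooth part c
print(x)                                       # sparse approximate minimizer
```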