AM 205: lecture 18

◮ Last time: optimization methods
◮ Today: conditions for optimality

Newton’s Method
Example: Newton’s method for the two-point Gauss quadrature rule
Recall the system of equations
$$
\begin{aligned}
F_1(x_1, x_2, w_1, w_2) &= w_1 + w_2 - 2 = 0 \\
F_2(x_1, x_2, w_1, w_2) &= w_1 x_1 + w_2 x_2 = 0 \\
F_3(x_1, x_2, w_1, w_2) &= w_1 x_1^2 + w_2 x_2^2 - 2/3 = 0 \\
F_4(x_1, x_2, w_1, w_2) &= w_1 x_1^3 + w_2 x_2^3 = 0
\end{aligned}
$$

Newton’s Method
We can solve this in Python using our own implementation of Newton’s method
To do this, we require the Jacobian of this system:
$$
J_F(x_1, x_2, w_1, w_2) =
\begin{bmatrix}
0 & 0 & 1 & 1 \\
w_1 & w_2 & x_1 & x_2 \\
2 w_1 x_1 & 2 w_2 x_2 & x_1^2 & x_2^2 \\
3 w_1 x_1^2 & 3 w_2 x_2^2 & x_1^3 & x_2^3
\end{bmatrix}
$$

Newton’s Method
Alternatively, we can use the fsolve function from SciPy (scipy.optimize.fsolve)
Note that fsolve computes a finite difference approximation to the Jacobian by default
(Or we can pass in an analytical Jacobian if we want)
Matlab has an equivalent fsolve function
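
A minimal sketch of the fsolve route, assuming NumPy and SciPy are installed. The names F, J and v0 are illustrative and not taken from the lecture code. The first call lets fsolve estimate the Jacobian by finite differences; the second passes the analytical Jacobian through the fprime argument.

```python
import numpy as np
from scipy.optimize import fsolve

def F(v):
    """Residuals of the two-point Gauss quadrature system."""
    x1, x2, w1, w2 = v
    return [
        w1 + w2 - 2.0,
        w1 * x1 + w2 * x2,
        w1 * x1**2 + w2 * x2**2 - 2.0 / 3.0,
        w1 * x1**3 + w2 * x2**3,
    ]

def J(v):
    """Analytical Jacobian: row i holds the partial derivatives of F_i."""
    x1, x2, w1, w2 = v
    return [
        [0.0, 0.0, 1.0, 1.0],
        [w1, w2, x1, x2],
        [2 * w1 * x1, 2 * w2 * x2, x1**2, x2**2],
        [3 * w1 * x1**2, 3 * w2 * x2**2, x1**3, x2**3],
    ]

v0 = [-1.0, 1.0, 1.0, 1.0]          # starting guess (x1, x2, w1, w2)
sol_fd = fsolve(F, v0)              # Jacobian approximated by finite differences
sol_an = fsolve(F, v0, fprime=J)    # analytical Jacobian supplied
print(sol_fd)
print(sol_an)
```

Both calls should agree to within the solver tolerance; supplying fprime mainly saves function evaluations.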

Newton’s Method
Python example: With either approach and with starting guess $x_0 = [-1, 1, 1, 1]$, we get
x_k =
  -0.577350269189626
   0.577350269189626
   1.000000000000000
   1.000000000000000
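
The hand-written Newton iteration can be sketched as follows, assuming NumPy is available. This is one possible implementation, not the lecture's own code; the function names, the step-size stopping test and the tolerance are assumptions made for illustration.

```python
import numpy as np

def F(v):
    """Residuals of the two-point Gauss quadrature system."""
    x1, x2, w1, w2 = v
    return np.array([
        w1 + w2 - 2.0,
        w1 * x1 + w2 * x2,
        w1 * x1**2 + w2 * x2**2 - 2.0 / 3.0,
        w1 * x1**3 + w2 * x2**3,
    ])

def J(v):
    """Jacobian of F with respect to (x1, x2, w1, w2)."""
    x1, x2, w1, w2 = v
    return np.array([
        [0.0, 0.0, 1.0, 1.0],
        [w1, w2, x1, x2],
        [2 * w1 * x1, 2 * w2 * x2, x1**2, x2**2],
        [3 * w1 * x1**2, 3 * w2 * x2**2, x1**3, x2**3],
    ])

def newton(F, J, v0, tol=1e-14, max_iter=50):
    """Newton's method: solve J(v) step = -F(v), then update v."""
    v = np.array(v0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(J(v), -F(v))
        v = v + step
        if np.linalg.norm(step) < tol:   # stop once the update is negligible
            break
    return v

sol = newton(F, J, [-1.0, 1.0, 1.0, 1.0])
print(sol)
```

The computed nodes are $\mp 1/\sqrt{3}$ with unit weights, which matches the values quoted above.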

Conditions for Optimality

Existence of Global Minimum
In order to guarantee existence and uniqueness of a global min. we need to make assumptions about the objective function
e.g. if $f$ is continuous on a closed¹ and bounded set $S \subset \mathbb{R}^n$ then it has a global minimum in $S$
In one dimension, this says $f$ achieves a minimum on the interval $[a, b] \subset \mathbb{R}$
In general $f$ does not achieve a minimum on $(a, b)$, e.g. consider $f(x) = x$
(Though $\inf_{x \in (a, b)} f(x)$, the largest lower bound of $f$ on $(a, b)$, is well-defined)
¹ A set is closed if it contains its own boundary

Existence of Global Minimum
Another helpful concept for existence of global min. is coercivity
A continuous function $f$ on an unbounded set $S \subset \mathbb{R}^n$ is coercive if
$$\lim_{\|x\| \to \infty} f(x) = +\infty$$
That is, $f(x)$ must be large whenever $\|x\|$ is large

Existence of Global Minimum
If $f$ is coercive on a closed, unbounded² set $S$, then $f$ has a global minimum in $S$
Proof: From the definition of coercivity, for any $M \in \mathbb{R}$, $\exists\, r > 0$ such that $f(x) \geq M$ for all $x \in S$ where $\|x\| \geq r$
Suppose that $0 \in S$, and set $M = f(0)$
Let $Y \equiv \{x \in S : \|x\| \geq r\}$, so that $f(x) \geq f(0)$ for all $x \in Y$
And we already know that $f$ achieves a minimum (which is at most $f(0)$) on the closed, bounded set $\{x \in S : \|x\| \leq r\}$
Hence $f$ achieves a minimum on $S$ □
² e.g. $S$ could be all of $\mathbb{R}^n$, or a “closed strip” in $\mathbb{R}^n$

Existence of Global Minimum
For example:
◮ $f(x, y) = x^2 + y^2$ is coercive on $\mathbb{R}^2$ (global min. at $(0, 0)$)
◮ $f(x) = x^3$ is not coercive on $\mathbb{R}$ ($f \to -\infty$ for $x \to -\infty$)
◮ $f(x) = e^x$ is not coercive on $\mathbb{R}$ ($f \to 0$ for $x \to -\infty$)

Convexity
An important concept for uniqueness is convexity
A set $S \subset \mathbb{R}^n$ is convex if it contains the line segment between any two of its points
That is, $S$ is convex if for any $x, y \in S$, we have
$$\{\theta x + (1 - \theta) y : \theta \in [0, 1]\} \subset S$$

Convexity
Similarly, we define convexity of a function $f : S \subset \mathbb{R}^n \to \mathbb{R}$
$f$ is convex if its graph along any line segment in $S$ is on or below the chord connecting the function values
i.e. $f$ is convex if for any $x, y \in S$ and any $\theta \in (0, 1)$, we have
$$f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)$$
Also, if the inequality is strict for all $x \neq y$, i.e.
$$f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y),$$
then $f$ is strictly convex
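
The defining inequality can be probed numerically. The sketch below, for scalar functions only, tests it on a finite set of sample points; the helper names, the sample grid and the tolerance are assumptions made for illustration. Passing such a check does not prove convexity, but a single violation does disprove it.

```python
def violates_convexity(f, x, y, theta, tol=1e-12):
    """True if f(theta*x + (1-theta)*y) exceeds theta*f(x) + (1-theta)*f(y)."""
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    return lhs > rhs + tol

def passes_sampled_check(f, points, thetas):
    """True if no sampled triple (x, y, theta) violates the inequality."""
    return not any(
        violates_convexity(f, x, y, t)
        for x in points for y in points for t in thetas
    )

points = [i / 10 for i in range(-10, 11)]   # samples in [-1, 1]
thetas = [0.1, 0.25, 0.5, 0.75, 0.9]

square_ok = passes_sampled_check(lambda x: x * x, points, thetas)
cube_ok = passes_sampled_check(lambda x: x**3, points, thetas)
print(square_ok, cube_ok)
```

Here $x^2$ passes, while $x^3$ fails on $[-1, 1]$: for example, with $x = -1$, $y = 0$, $\theta = 1/2$ the midpoint value $-1/8$ lies above the chord value $-1/2$.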

Convexity
[Figure: plot of a strictly convex function]

Convexity
[Figure: plot of a non-convex function]

Convexity
[Figure: plot of a function that is convex but not strictly convex]

Convexity
If $f$ is a convex function on a convex set $S$, then any local minimum of $f$ must be a global minimum³
Proof: Suppose $x$ is a local minimum, i.e. $f(x) \leq f(y)$ for $y \in B(x, \epsilon)$ (where $B(x, \epsilon) \equiv \{y \in S : \|y - x\| \leq \epsilon\}$)
Suppose that $x$ is not a global minimum, i.e. that there exists $w \in S$ such that $f(w) < f(x)$
(Then we will show that this gives a contradiction)
³ A global minimum is defined as a point $z$ such that $f(z) \leq f(x)$ for all $x \in S$. Note that a global minimum may not be unique, e.g. if $f(x) = -\cos x$ then $0$ and $2\pi$ are both global minima.

Convexity
Proof (continued...): For $\theta \in [0, 1]$ we have $f(\theta w + (1 - \theta) x) \leq \theta f(w) + (1 - \theta) f(x)$
Let $\sigma \in (0, 1]$ be sufficiently small so that $z \equiv \sigma w + (1 - \sigma) x \in B(x, \epsilon)$
Then
$$f(z) \leq \sigma f(w) + (1 - \sigma) f(x) < \sigma f(x) + (1 - \sigma) f(x) = f(x),$$
i.e. $f(z) < f(x)$, which contradicts that $x$ is a local minimum!
Hence we cannot have $w \in S$ such that $f(w) < f(x)$ □

Convexity
Note that convexity does not guarantee uniqueness of global minimum
e.g. a convex function can clearly have a “horizontal” section (see earlier plot)
If $f$ is a strictly convex function on a convex set $S$, then a local minimum of $f$ is the unique global minimum
Optimization of convex functions over convex sets is called convex optimization, which is an important subfield of optimization

Optimality Conditions
We have discussed existence and uniqueness of minima, but haven’t considered how to find a minimum
The familiar optimization idea from calculus in one dimension is: set derivative to zero, check the sign of the second derivative
This can be generalized to $\mathbb{R}^n$

Optimality Conditions
If $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable, then the gradient vector $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$ is
$$
\nabla f(x) \equiv
\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} \\
\frac{\partial f(x)}{\partial x_2} \\
\vdots \\
\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
$$
The importance of the gradient is that $\nabla f$ points “uphill,” i.e. towards points with larger values than $f(x)$
And similarly $-\nabla f$ points “downhill”

Optimality Conditions
This follows from Taylor’s theorem for $f : \mathbb{R}^n \to \mathbb{R}$
Recall that
$$f(x + \delta) = f(x) + \nabla f(x)^T \delta + \text{H.O.T.}$$
Let $\delta \equiv -\epsilon \nabla f(x)$ for $\epsilon > 0$ and suppose that $\nabla f(x) \neq 0$, then:
$$f(x - \epsilon \nabla f(x)) \approx f(x) - \epsilon \nabla f(x)^T \nabla f(x) < f(x)$$
Also, we see from Cauchy–Schwarz that $-\nabla f(x)$ is the steepest descent direction
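
A small numerical illustration of both claims, using only the standard library. The objective $f(x, y) = x^2 + 3y^2$, the evaluation point and the step size are assumptions chosen for the example. The code takes a short step along $-\nabla f$ and then scans unit directions on a one-degree grid to find the one with the most negative directional derivative.

```python
import math

def f(x, y):
    return x**2 + 3.0 * y**2          # illustrative objective (an assumption)

def grad_f(x, y):
    return (2.0 * x, 6.0 * y)         # exact gradient of f

x, y = 1.0, 1.0
gx, gy = grad_f(x, y)
eps = 1e-3

# A small step along -grad f lowers the function value.
f_here = f(x, y)
f_down = f(x - eps * gx, y - eps * gy)
first_order = f_here - eps * (gx**2 + gy**2)   # Taylor prediction

# Directional derivative grad_f . d over unit directions d = (cos a, sin a).
angles = [math.radians(k) for k in range(360)]
best_angle = min(angles, key=lambda a: gx * math.cos(a) + gy * math.sin(a))

# Angle of the normalized steepest descent direction -grad f, in [0, 2*pi).
descent_angle = math.atan2(-gy, -gx) % (2.0 * math.pi)

print(f_here, f_down, first_order)
print(math.degrees(best_angle), math.degrees(descent_angle))
```

The best grid direction lands within a degree of the direction of $-\nabla f$, as Cauchy–Schwarz predicts.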

Optimality Conditions
Similarly, we see that a necessary condition for a local minimum at $x^* \in S$ is that $\nabla f(x^*) = 0$
In this case there is no “downhill direction” at $x^*$
The condition $\nabla f(x^*) = 0$ is called a first-order necessary condition for optimality, since it only involves first derivatives

Optimality Conditions
A point $x^* \in S$ that satisfies the first-order optimality condition is called a critical point of $f$
But of course a critical point can be a local min., local max., or saddle point
(Recall that a saddle point is where some directions are “downhill” and others are “uphill”, e.g. $(x, y) = (0, 0)$ for $f(x, y) = x^2 - y^2$)

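Optimality Conditions
As in the one-dimensional case, we can look to second derivatives to classify critical points
If $f : \mathbb{R}^n \to \mathbb{R}$ is twice differentiable, then the Hessian is the matrix-valued function $H_f : \mathbb{R}^n \to \mathbb{R}^{n \times n}$
$$
H_f(x) \equiv
\begin{bmatrix}
\frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2}
\end{bmatrix}
$$
The Hessian is the Jacobian matrix of the gradient $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$
If the second partial derivatives of $f$ are continuous, then $\partial^2 f / \partial x_i \partial x_j = \partial^2 f / \partial x_j \partial x_i$, and $H_f$ is symmetric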
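
To illustrate that the Hessian is the Jacobian of the gradient, the sketch below differentiates an exact gradient by central differences. The test function $f(x, y) = x^2 y + y^3$, the helper name hessian_fd and the step size h are assumptions chosen for the example.

```python
def grad_f(x, y):
    """Exact gradient of the illustrative function f(x, y) = x^2 * y + y^3."""
    return (2.0 * x * y, x**2 + 3.0 * y**2)

def hessian_fd(grad, x, y, h=1e-5):
    """Central-difference Jacobian of the gradient, i.e. an approximate Hessian."""
    gxp, gxm = grad(x + h, y), grad(x - h, y)   # perturb the first variable
    gyp, gym = grad(x, y + h), grad(x, y - h)   # perturb the second variable
    # Entry [i][j] approximates d(grad_i)/d(x_j).
    return [
        [(gxp[0] - gxm[0]) / (2 * h), (gyp[0] - gym[0]) / (2 * h)],
        [(gxp[1] - gxm[1]) / (2 * h), (gyp[1] - gym[1]) / (2 * h)],
    ]

H = hessian_fd(grad_f, 1.0, 2.0)
print(H)
```

For this function the exact Hessian is $\begin{bmatrix} 2y & 2x \\ 2x & 6y \end{bmatrix}$, so at $(1, 2)$ the result should be close to $\begin{bmatrix} 4 & 2 \\ 2 & 12 \end{bmatrix}$, with equal off-diagonal entries reflecting the symmetry of $H_f$.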

Optimality Conditions
Suppose we have found a critical point $x^*$, so that $\nabla f(x^*) = 0$
From Taylor’s Theorem, for $\delta \in \mathbb{R}^n$, we have
$$
\begin{aligned}
f(x^* + \delta) &= f(x^*) + \nabla f(x^*)^T \delta + \tfrac{1}{2} \delta^T H_f(x^* + \eta \delta) \delta \\
&= f(x^*) + \tfrac{1}{2} \delta^T H_f(x^* + \eta \delta) \delta
\end{aligned}
$$
for some $\eta \in (0, 1)$

Optimality Conditions
Recall positive definiteness: $A$ is positive definite if $x^T A x > 0$ for all $x \neq 0$
Suppose $H_f(x^*)$ is positive definite
Then (by continuity) $H_f(x^* + \eta \delta)$ is also positive definite for $\|\delta\|$ sufficiently small, so that: $\delta^T H_f(x^* + \eta \delta) \delta > 0$ for $\delta \neq 0$
Hence, we have $f(x^* + \delta) > f(x^*)$ for $\|\delta\|$ sufficiently small and nonzero, i.e. $x^*$ is a local minimum
Hence, in general, positive definiteness of $H_f$ at a critical point $x^*$ is a second-order sufficient condition for a local minimum

Optimality Conditions
A matrix can also be negative definite: $x^T A x < 0$ for all $x \neq 0$
Or indefinite: There exist $x, y$ such that $x^T A x < 0 < y^T A y$
Then we can classify critical points as follows:
◮ $H_f(x^*)$ positive definite $\implies$ $x^*$ is a local minimum
◮ $H_f(x^*)$ negative definite $\implies$ $x^*$ is a local maximum
◮ $H_f(x^*)$ indefinite $\implies$ $x^*$ is a saddle point
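
For a symmetric Hessian, definiteness can be read off the signs of its eigenvalues, so the classification above can be sketched as follows, assuming NumPy is available. The function name classify and the tolerance are assumptions; the "inconclusive" branch covers the semidefinite case, which the three rules above do not decide.

```python
import numpy as np

def classify(H, tol=1e-12):
    """Classify a critical point from the eigenvalues of a symmetric Hessian."""
    eig = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if np.all(eig > tol):
        return "local minimum"        # positive definite
    if np.all(eig < -tol):
        return "local maximum"        # negative definite
    if eig.min() < -tol and eig.max() > tol:
        return "saddle point"         # indefinite
    return "inconclusive"             # some eigenvalue is (numerically) zero

# Hessians at the critical point (0, 0) of three simple functions.
print(classify([[2.0, 0.0], [0.0, 2.0]]))     # f = x^2 + y^2
print(classify([[-2.0, 0.0], [0.0, -2.0]]))   # f = -(x^2 + y^2)
print(classify([[2.0, 0.0], [0.0, -2.0]]))    # f = x^2 - y^2
```

The last case is the saddle example $f(x, y) = x^2 - y^2$ from the earlier slide.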
