
Introduction to Optimization: Amy Langville, SAMSI Undergraduate Workshop



  1. Introduction to Optimization, Amy Langville, SAMSI Undergraduate Workshop, N.C. State University, 6/1/05

  2. GOAL: minimize $f(x_1, x_2, x_3, x_4, x_5) = x_1^2 - 0.5\,x_2 x_3 + x_4 / x_5$. PRIZE: $1 million.
  • # of independent variables = ?
  • $z = f(x_1, x_2, x_3, x_4, x_5)$ lives in ℜ^?

  3. GOAL: minimize $f(x_1, x_2, x_3, x_4, x_5) = x_1^2 - 0.5\,x_2 x_3 + x_4 / x_5$. PRIZE: $1 million.
  • # of independent variables = ?
  • $z = f(x_1, x_2, x_3, x_4, x_5)$ lives in ℜ^?
  • Suppose you know little to nothing about Calculus or Optimization. Could you win the prize? How?

  4. GOAL: minimize $f(x_1, x_2, x_3, x_4, x_5) = x_1^2 - 0.5\,x_2 x_3 + x_4 / x_5$. PRIZE: $1 million.
  • # of independent variables = ?
  • $z = f(x_1, x_2, x_3, x_4, x_5)$ lives in ℜ^?
  • Suppose you know little to nothing about Calculus or Optimization. Could you win the prize? How?
  • Trial and error: repeated function evaluations (a sketch follows).
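A minimal Matlab sketch of the trial-and-error idea, assuming guesses are restricted to a box such as $[-10, 10]^5$; the box, trial count, and safeguard below are illustrative, and since $f$ is unbounded below on all of ℜ⁵ (take $x_2 = x_3 \to \infty$), random search can only report the best point it happened to sample:

    % Trial and error by random search: sample many points, keep the best.
    f = @(x) x(1)^2 - 0.5*x(2)*x(3) + x(4)/x(5);
    rng(0);                                 % reproducible guesses
    best = Inf;  xbest = [];
    for k = 1:100000
        x = -10 + 20*rand(5,1);             % random guess in [-10,10]^5
        if abs(x(5)) < 1e-3, continue, end  % avoid dividing by near-zero
        if f(x) < best
            best = f(x);  xbest = x;        % best value seen so far
        end
    end
    best, xbest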

  5. Calculus III Review: local min vs. global min vs. saddle point
  • CPs (critical points) and horizontal tangent planes
  • Local mins and the 2nd Derivative Test
  • Global mins, CPs, and BPs (boundary points)
  • Gradient = direction of ?
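As a reminder of the 2nd Derivative Test (standard Calc III material, filled in here for reference): at a critical point $(a, b)$, form the discriminant
$$D(a,b) = f_{xx}(a,b)\, f_{yy}(a,b) - f_{xy}(a,b)^2 .$$
If $D > 0$ and $f_{xx}(a,b) > 0$, then $(a,b)$ is a local min; if $D > 0$ and $f_{xx}(a,b) < 0$, a local max; if $D < 0$, a saddle point; if $D = 0$, the test is inconclusive. (And the gradient $\nabla f$ points in the direction of steepest ascent.)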

  6. Constrained vs. Unconstrained Opt.
  • Unconstrained: min $f(x, y) = x^2 + y^2$
  • Constrained:
  — min $f(x, y) = x^2 + y^2$ s.t. $x \geq 0,\ y \geq 0$
  — min $f(x, y) = x^2 + y^2$ s.t. $x > 0,\ y > 0$
  — min $f(x, y) = x^2 + y^2$ s.t. $1 \leq x \leq 2,\ 0 \leq y \leq 3$
  — min $f(x, y) = x^2 + y^2$ s.t. $y = x + 2$ (worked below)
  • EVT (Extreme Value Theorem)
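A worked instance of the last, equality-constrained case (an illustration, not from the slide): substituting the constraint into the objective reduces it to one variable,
$$g(x) = x^2 + (x+2)^2 = 2x^2 + 4x + 4, \qquad g'(x) = 4x + 4 = 0 \;\Rightarrow\; x = -1,\ y = 1,$$
and $g''(x) = 4 > 0$, so the constrained minimum is $f(-1, 1) = 2$: the point on the line $y = x + 2$ closest to the origin.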

  7. Gradient Descent Methods
  • Hillclimbers on a Cloudy Day: $\max f(x, y) = -\min\,(-f(x, y))$
  • Initializations
  • 1st-order and 2nd-order info. from partials: Gradient + Hessian
  • Matlab function: gd(α, x₀) (a sketch follows)
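A hypothetical sketch of what a fixed-step descent routine gd(alpha, x0) might look like; the workshop's actual gd.m is not reproduced here, and the test gradient, iteration cap, and tolerance below are placeholders:

    function x = gd(alpha, x0)
        % Fixed-step gradient descent on f(x,y) = x^2 + y^2 (illustrative).
        gradf = @(x) [2*x(1); 2*x(2)];   % gradient of the test function
        x = x0;
        for k = 1:1000                   % cap on the number of iterations
            g = gradf(x);
            if norm(g) < 1e-8            % convergence test: gradient ~ 0
                break
            end
            x = x - alpha*g;             % step in the downhill direction
        end
    end

For example, x = gd(0.1, [4; -3]) walks to the minimizer (0, 0); the sign and size of α decide whether the walk goes downhill at all (see slide 8).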

  8. Iterative Methods Issues
  — Convergence Test: what is it for gd.m?
  — Convergence Proof: is gd.m guaranteed to converge to a local min? For α > 0? For α < 0?
  — Rate of Convergence: how many iterations? How do starting points x₀ affect the number of iterations? Worst starting point for α = 4? Best?
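A worked one-dimensional instance (illustrative, not from the slides) makes the role of α concrete. For $f(x) = x^2$ the descent iteration is
$$x_{k+1} = x_k - \alpha f'(x_k) = (1 - 2\alpha)\, x_k \;\Longrightarrow\; x_k = (1 - 2\alpha)^k x_0,$$
which converges to the minimizer $x = 0$ exactly when $|1 - 2\alpha| < 1$, i.e. $0 < \alpha < 1$, at a geometric rate governed by $|1 - 2\alpha|$; for $\alpha < 0$ the step points uphill and the iterates diverge from any $x_0 \neq 0$.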

  9. Convergence of Optimization Methods: global vs. local vs. stationary point vs. none
  • Most optimization algorithms cannot guarantee convergence to a global min, much less a local min.
  • However, some classes of optim. problems are particularly nice. Convex objective, EX: $z = 0.5\,(\alpha x^2 + y^2)$, $\alpha > 0$: every local min is a global min!
  • Even for particularly tough optim. problems, sometimes the most popular, successful algorithms perform well on many problems, despite a lack of convergence theory. Must qualify statements: I found the best "global min" to date.
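Checking the example directly (a standard computation, added here for completeness):
$$\nabla^2 z = \begin{pmatrix} \alpha & 0 \\ 0 & 1 \end{pmatrix},$$
which is positive definite everywhere when $\alpha > 0$, so $z$ is strictly convex and its single critical point, the origin, is the global min.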

  10. Your Least Squares Problem
  • how many variables/unknowns, n = ?
  • $z = f(x_1, x_2, \dots, x_n)$ lives in ℜ^?
  • can we graph z?
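For reference, a generic linear least-squares solve in Matlab; the matrix and data below are made up, and the workshop's actual problem is not reproduced here:

    % Choose x to minimize the residual norm ||A*x - b||.
    A = [1 1; 1 2; 1 3; 1 4];     % example design matrix: fit a line
    b = [2.1; 3.9; 6.2; 7.8];     % example observations
    x = A \ b;                    % backslash solves least squares
    res = norm(A*x - b)           % residual norm at the minimizer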

  11. Nonsmooth, Nondifferentiable Surfaces
  • Can't compute the gradient ∇f ⇒ can't use GD Methods
  • Line Search Methods
  • Method of Alternating Variables (Coordinate Descent): solve a series of 1-D problems. What would these steps look like on a contour map? (see the sketch below)
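A minimal coordinate-descent sketch, assumed for illustration (the test function, search interval, and sweep count are placeholders); each pass minimizes over one variable at a time with Matlab's derivative-free fminbnd:

    % Nonsmooth example: minimum at (1, -2), with no gradient there.
    f = @(x) abs(x(1) - 1) + abs(x(2) + 2);
    x = [5; 5];                                 % starting point
    for sweep = 1:20
        for i = 1:numel(x)
            g = @(t) f([x(1:i-1); t; x(i+1:end)]);  % f along coordinate i
            x(i) = fminbnd(g, -10, 10);             % 1-D minimization
        end
    end
    x                                           % approximately [1; -2]

On a contour map each update moves parallel to a coordinate axis, so the path traced out is a staircase of horizontal and vertical segments.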

  12. fminsearch and Nelder-Mead
  • maintain a basis of n + 1 points, where n = # variables
  • form a simplex using these points: their convex hull
  • idea: move in a direction away from the worst of these points
  • EX: n = 2, so maintain a basis of 3 points living in the xy-plane ⇒ the simplex is a triangle
  • create a new simplex by moving away from the worst point: reflect, expand, contract, shrink steps (usage example below)
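Matlab's built-in fminsearch implements Nelder-Mead; applied to the same nonsmooth example as above, it needs no gradient:

    f = @(x) abs(x(1) - 1) + abs(x(2) + 2);
    [xmin, fval] = fminsearch(f, [5; 5])    % xmin is approximately (1, -2)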

  13. [Excerpt, p. 117 of Lagarias et al., "Convergence Properties of the Nelder–Mead Simplex Method in Low Dimensions," on properties of Nelder–Mead.]
  Fig. 1. Nelder–Mead simplices after a reflection and an expansion step. The original simplex is shown with a dashed line.
  Fig. 2. Nelder–Mead simplices after an outside contraction, an inside contraction, and a shrink. The original simplex is shown with a dashed line.
  … then $x_1^{(k+1)} = x_1^{(k)}$. Beyond this, whatever rule is used to define the original ordering may be applied after a shrink. We define the change index $k^*$ of iteration $k$ as the smallest index of a vertex that differs between iterations $k$ and $k + 1$:
  $$k^* = \min \{\, i \mid x_i^{(k)} \neq x_i^{(k+1)} \,\}. \qquad (2.8)$$
  (Tie-breaking rules are needed to define a unique value of $k^*$.) When Algorithm NM terminates in step 2, $1 < k^* \leq n$; with termination in step 3, $k^* = 1$; with termination in step 4, $1 \leq k^* \leq n + 1$; and with termination in step 5, $k^* = 1$ or $2$. A statement that "$x_j$ changes" means that $j$ is the change index at the relevant iteration. The rules and definitions given so far imply that, for a nonshrink iteration, …

  14. N-M Algorithm

  15. N-M Algorithm

  16. Illustration of the Nelder-Mead algorithm, in which a little simplex (triangle) crawls downhill towards the local minimum like an amoeba.
  [Figure snapshots after successive numbers of steps, each a pair of panels: "SN1939A, Residual norm as a function of λ1 and λ2" (residual norm vs. λ1, λ2) and the SN1939A light curve fit (luminosity vs. days).]

  17. [Figures, continued: the same two panels after further Nelder-Mead steps.]

  18. [Figures, continued: the same two panels after further Nelder-Mead steps.]
