Local, Unconstrained Function Optimization
COMPSCI 527 — Computer Vision
Outline
1 Gradient, Hessian, and Convexity
2 A Local, Unconstrained Optimization Template
3 Steepest Descent
4 Termination
5 Convergence Speed of Steepest Descent
6 Newton’s Method
7 Convergence Speed of Newton’s Method
8 Counting Steps versus Clocking
Motivation and Scope
• Most estimation problems are solved by optimization
• Machine learning:
  • Parametric predictor: $h(\mathbf{x}; \mathbf{v}) : \mathbb{R}^d \times \mathbb{R}^m \to Y$
  • Risk: $L_T(\mathbf{v}) = \frac{1}{N} \sum_{n=1}^{N} \ell(y_n, h(\mathbf{x}_n; \mathbf{v}))$, with $L_T : \mathbb{R}^m \to \mathbb{R}$
  • Training: $\hat{\mathbf{v}} = \arg\min_{\mathbf{v} \in \mathbb{R}^m} L_T(\mathbf{v})$
• 3D reconstruction: $I = \pi(C, S)$, where $I$ are the images, $C$ are the camera positions and orientations, and $S$ is the scene shape
  • Given $I$, find $\hat{C}, \hat{S} = \arg\min_{C, S} \|I - \pi(C, S)\|$
• In general, “solving” an equation $E(\mathbf{z}) = \mathbf{0}$ can be viewed as $\hat{\mathbf{z}} = \arg\min_{\mathbf{z}} \|E(\mathbf{z})\|$
Only Local Minimization
$\hat{\mathbf{z}} = \arg\min_{\mathbf{z} \in \mathbb{R}^m} f(\mathbf{z})$
• All we know about $f$ is a “black box” (think Python function)
• For many problems, $f$ has many local minima
• Start somewhere ($\mathbf{z}_0$) and take steps “down”: $f(\mathbf{z}_{k+1}) < f(\mathbf{z}_k)$
• When we get stuck at a local minimum, we declare success
• We would like global minima, but all we get is local ones
• For some problems, $f$ has a unique minimum...
• ... or at least a single connected set of minima
Gradient, Hessian, and Convexity
Gradient
$\nabla f(\mathbf{z}) = \dfrac{\partial f}{\partial \mathbf{z}} = \begin{bmatrix} \frac{\partial f}{\partial z_1} \\ \vdots \\ \frac{\partial f}{\partial z_m} \end{bmatrix}$
• We saw the gradient for the case $\mathbf{z} \in \mathbb{R}^2$
• If $\nabla f(\mathbf{z})$ exists everywhere, the condition $\nabla f(\mathbf{z}) = \mathbf{0}$ is necessary and sufficient for a stationary point (max, min, or saddle)
• Warning: only necessary for a minimum!
• Reduces to the first derivative $\frac{df}{dz}$ for $f : \mathbb{R} \to \mathbb{R}$
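As a concrete sketch (not from the slides): the gradient can be approximated component by component with central differences, which is a handy check on a hand-derived gradient. The test function and step size below are my own illustrative choices.

```python
import numpy as np

def numerical_gradient(f, z, h=1e-6):
    """Central-difference approximation of the gradient of f at z."""
    g = np.zeros_like(z, dtype=float)
    for i in range(z.size):
        e = np.zeros_like(z, dtype=float)
        e[i] = h
        # i-th partial derivative: (f(z + h e_i) - f(z - h e_i)) / (2h)
        g[i] = (f(z + e) - f(z - e)) / (2 * h)
    return g

# Hypothetical example: f(z) = z1^2 + 3 z2^2 has gradient (2 z1, 6 z2)
f = lambda z: z[0]**2 + 3 * z[1]**2
print(numerical_gradient(f, np.array([1.0, 2.0])))  # approx [2. 12.]
```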
First-Order Taylor Expansion
$f(\mathbf{z}) \approx g_1(\mathbf{z}) = f(\mathbf{z}_0) + [\nabla f(\mathbf{z}_0)]^T (\mathbf{z} - \mathbf{z}_0)$
approximates $f(\mathbf{z})$ near $\mathbf{z}_0$ with a (hyper)plane through $\mathbf{z}_0$
[Figure: the tangent plane $g_1$ to the surface $f(z_1, z_2)$ at $\mathbf{z}_0$]
• $\nabla f(\mathbf{z}_0)$ points in the direction of steepest increase of $f$ at $\mathbf{z}_0$
• If we want to find $\mathbf{z}_1$ where $f(\mathbf{z}_1) < f(\mathbf{z}_0)$, going along $-\nabla f(\mathbf{z}_0)$ seems promising
• This is the general idea of steepest descent
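To make the idea concrete, here is a minimal one-step sketch (my own toy example, with a hand-picked step size): moving along $-\nabla f(\mathbf{z}_0)$ does decrease $f$.

```python
import numpy as np

# One steepest-descent step on a toy quadratic (illustrative values)
f     = lambda z: z[0]**2 + 3 * z[1]**2
gradf = lambda z: np.array([2 * z[0], 6 * z[1]])

z0 = np.array([1.0, 2.0])
alpha = 0.1                      # step size, chosen by hand here
z1 = z0 - alpha * gradf(z0)      # move along the negative gradient

print(f(z0), f(z1))              # f decreases: 13.0 -> 2.56
```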
Hessian
$H(\mathbf{z}) = \begin{bmatrix} \frac{\partial^2 f}{\partial z_1^2} & \cdots & \frac{\partial^2 f}{\partial z_1 \partial z_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial z_m \partial z_1} & \cdots & \frac{\partial^2 f}{\partial z_m^2} \end{bmatrix}$
• Symmetric matrix because of Schwarz’s theorem: $\frac{\partial^2 f}{\partial z_i \partial z_j} = \frac{\partial^2 f}{\partial z_j \partial z_i}$
• Eigenvalues are real because of symmetry
• Reduces to $\frac{d^2 f}{dz^2}$ for $f : \mathbb{R} \to \mathbb{R}$
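A matching numerical sketch (again my own illustration, not course code): the Hessian can be approximated by central differences as well, and the symmetry promised by Schwarz’s theorem shows up in the result up to rounding error.

```python
import numpy as np

def numerical_hessian(f, z, h=1e-4):
    """Central-difference approximation of the Hessian of f at z."""
    m = z.size
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei = np.zeros(m); ei[i] = h
            ej = np.zeros(m); ej[j] = h
            # mixed second partial d^2 f / (dz_i dz_j)
            H[i, j] = (f(z + ei + ej) - f(z + ei - ej)
                       - f(z - ei + ej) + f(z - ei - ej)) / (4 * h * h)
    return H

# Hypothetical example: f has constant Hessian [[2, 1], [1, 6]]
f = lambda z: z[0]**2 + 3 * z[1]**2 + z[0] * z[1]
print(numerical_hessian(f, np.array([0.0, 0.0])))  # approx [[2, 1], [1, 6]]
```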
Convexity
[Figure: for a convex $f$, the chord value $u\,f(\mathbf{z}) + (1-u)\,f(\mathbf{z}')$ lies above $f(u\mathbf{z} + (1-u)\mathbf{z}')$ between $\mathbf{z}$ and $\mathbf{z}'$]
• Convex everywhere: for all $\mathbf{z}, \mathbf{z}'$ in the (open) domain of $f$ and for all $u \in [0, 1]$,
  $f(u\mathbf{z} + (1-u)\mathbf{z}') \le u\,f(\mathbf{z}) + (1-u)\,f(\mathbf{z}')$
• Convex at $\mathbf{z}_0$: the function $f$ is convex everywhere in some open neighborhood of $\mathbf{z}_0$
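As a one-line sanity check of the definition (my example, not on the slide), $f(z) = z^2$ satisfies the inequality for every $u \in [0, 1]$:

```latex
% Convexity check for f(z) = z^2, with u in [0, 1]:
\[
u z^2 + (1-u) {z'}^2 - \bigl(u z + (1-u) z'\bigr)^2
  = u (1-u) (z - z')^2 \;\ge\; 0 ,
\]
% so the chord never dips below the function.
```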
Convexity and Hessian
• If $H(\mathbf{z})$ is defined at a stationary point $\mathbf{z}$ of $f$, then $\mathbf{z}$ is a minimum iff $H(\mathbf{z}) \succeq 0$
• “$\succeq$” means positive semidefinite: $\mathbf{z}^T H \mathbf{z} \ge 0$ for all $\mathbf{z} \in \mathbb{R}^m$
• The above is the definition of $H(\mathbf{z}) \succeq 0$
• To check computationally: all eigenvalues of $H$ are nonnegative
• $H(\mathbf{z}) \succeq 0$ reduces to $\frac{d^2 f}{dz^2} \ge 0$ for $f : \mathbb{R} \to \mathbb{R}$
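A computational version of the eigenvalue test (a sketch; the tolerance is an assumption to absorb floating-point noise):

```python
import numpy as np

def is_positive_semidefinite(H, tol=1e-10):
    """Test H >= 0 (PSD) by checking that all eigenvalues are nonnegative."""
    # eigvalsh assumes a symmetric matrix and returns real eigenvalues
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

print(is_positive_semidefinite(np.array([[2.0, 1.0], [1.0, 6.0]])))   # True
print(is_positive_semidefinite(np.array([[1.0, 0.0], [0.0, -1.0]])))  # False
```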
Second-Order Taylor Expansion
$f(\mathbf{z}) \approx g_2(\mathbf{z}) = f(\mathbf{z}_0) + [\nabla f(\mathbf{z}_0)]^T (\mathbf{z} - \mathbf{z}_0) + \frac{1}{2} (\mathbf{z} - \mathbf{z}_0)^T H(\mathbf{z}_0) (\mathbf{z} - \mathbf{z}_0)$
approximates $f(\mathbf{z})$ near $\mathbf{z}_0$ with a quadratic function through $\mathbf{z}_0$
• For minimization, this is useful only when $H(\mathbf{z}_0) \succeq 0$
• The function then looks locally like a bowl
[Figure: the quadratic bowl $g_2$ fitted to $f(z_1, z_2)$ at $\mathbf{z}_0$]
• If we want to find $\mathbf{z}_1$ where $f(\mathbf{z}_1) < f(\mathbf{z}_0)$, going to the bottom of the bowl seems promising
• This is the general idea of Newton’s method
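A minimal sketch of the “bottom of the bowl” step (my toy example, not course code): on an exact quadratic with $H \succ 0$, solving $H \mathbf{p} = -\nabla f(\mathbf{z}_0)$ and stepping by $\mathbf{p}$ lands on the minimum in one move.

```python
import numpy as np

# One Newton step on a toy quadratic (illustrative values)
f     = lambda z: z[0]**2 + 3 * z[1]**2 + z[0] * z[1]
gradf = lambda z: np.array([2 * z[0] + z[1], 6 * z[1] + z[0]])
H     = np.array([[2.0, 1.0], [1.0, 6.0]])  # constant Hessian of a quadratic

z0 = np.array([1.0, 2.0])
p = np.linalg.solve(H, -gradf(z0))          # solve H p = -grad (no inverse)
z1 = z0 + p                                 # bottom of the local bowl

print(f(z0), f(z1))                         # 15.0 -> 0.0, the exact minimum
```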
A Local, Unconstrained Optimization Template
A Template
• Regardless of method, most local, unconstrained optimization methods fit the following template:

k = 0
while $\mathbf{z}_k$ is not a minimum
    compute step direction $\mathbf{p}_k$ with $\|\mathbf{p}_k\| > 0$
    compute step size $\alpha_k > 0$
    $\mathbf{z}_{k+1} = \mathbf{z}_k + \alpha_k \mathbf{p}_k$
    k = k + 1
end
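The template becomes runnable once the design decisions are filled in. Below is a minimal instance (my sketch, not code from the course): steepest-descent direction, fixed step size, and a small gradient norm standing in for “$\mathbf{z}_k$ is a minimum”; `alpha`, `tol`, and `max_iters` are assumed defaults.

```python
import numpy as np

def minimize(f, gradf, z0, alpha=0.1, tol=1e-8, max_iters=10_000):
    """Template instance: steepest-descent direction, fixed step size."""
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iters):           # k = 0, 1, 2, ...
        g = gradf(z)
        if np.linalg.norm(g) < tol:      # stand-in for "z_k is a minimum"
            break
        p = -g                           # step direction p_k
        z = z + alpha * p                # z_{k+1} = z_k + alpha_k p_k
    return z

f     = lambda z: z[0]**2 + 3 * z[1]**2
gradf = lambda z: np.array([2 * z[0], 6 * z[1]])
print(minimize(f, gradf, [1.0, 2.0]))    # converges near [0, 0]
```

With these choices the loop is plain gradient descent; the later sections in the outline vary the direction $\mathbf{p}_k$ and step size $\alpha_k$ to obtain steepest descent and Newton’s method.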
Design Decisions
• Whether to stop (“while $\mathbf{z}_k$ is not a minimum”)
• In what direction to proceed ($\mathbf{p}_k$)
• How long a step to take in that direction ($\alpha_k$)
• Different decisions for the last two lead to different methods with very different behaviors and computational costs