

  1. Lecture 5 Math Prerequisite II: Nonlinear Least-Squares. Lin ZHANG, PhD, School of Software Engineering, Tongji University, Spring 2020

  2. Why is least squares an important problem? In engineering fields, the following mathematical terms are frequently encountered:
     • Jacobian matrix
     • Hessian matrix
     • Homogeneous linear equation system
     • Inhomogeneous linear equation system
     • Lagrange multiplier
     • Line search
     • Steepest descent method
     • Newton method
     • Damped method
     • Damped Newton method
     • Trust-region method
     • Gauss-Newton method
     • Levenberg-Marquardt method
     • Dog-leg method

  3. Outline
     • Non-linear Least Squares
       – General Methods for Non-linear Optimization
         · Basic Concepts
         · Descent Methods
       – Non-linear Least Squares Problems

  4. Basic Concepts
     Definition 1: Local minimizer
     Given $F: \mathbb{R}^n \to \mathbb{R}$, find $x^*$ so that
     $$F(x^*) \le F(x) \quad \text{for } \|x - x^*\| < \delta,$$
     where $\delta$ is a small positive number.

  5. Basic Concepts
     Assume that the function F is differentiable and so smooth that the following Taylor expansion is valid:
     $$F(x + h) = F(x) + h^T F'(x) + \frac{1}{2} h^T F''(x) h + O(\|h\|^3),$$
     where $F'(x)$ is the gradient and $F''(x)$ is the Hessian:
     $$F'(x) = \begin{bmatrix} \dfrac{\partial F}{\partial x_1}(x) \\ \vdots \\ \dfrac{\partial F}{\partial x_n}(x) \end{bmatrix}, \qquad
     F''(x) = \left[ \dfrac{\partial^2 F}{\partial x_i \partial x_j}(x) \right]_{n \times n}
            = \begin{bmatrix} \dfrac{\partial^2 F}{\partial x_1^2} & \cdots & \dfrac{\partial^2 F}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 F}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 F}{\partial x_n^2} \end{bmatrix}.$$
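To make the expansion concrete, here is a small numerical sanity check (not from the slides; the test function F(x) = x1² + 3·x1·x2 + 2·x2⁴ is an illustrative assumption) comparing F(x+h) with its second-order Taylor model:

```python
import numpy as np

# Illustrative test function (an assumption, not from the lecture):
# F(x) = x1^2 + 3*x1*x2 + 2*x2^4
F = lambda x: x[0]**2 + 3*x[0]*x[1] + 2*x[1]**4
grad = lambda x: np.array([2*x[0] + 3*x[1], 3*x[0] + 8*x[1]**3])  # F'(x)
hess = lambda x: np.array([[2.0, 3.0], [3.0, 24*x[1]**2]])        # F''(x)

x = np.array([1.0, -0.5])
h = np.array([1e-3, 2e-3])

# Second-order Taylor model: F(x) + h^T F'(x) + 0.5 * h^T F''(x) h
taylor = F(x) + h @ grad(x) + 0.5 * h @ hess(x) @ h
err = abs(F(x + h) - taylor)
print(err)  # tiny: the remainder is O(||h||^3)
```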

  6. Basic Concepts
     Assume that the function F is differentiable and so smooth that the following Taylor expansion is valid:
     $$F(x + h) = F(x) + h^T F'(x) + \frac{1}{2} h^T F''(x) h + O(\|h\|^3),$$
     where $F'(x)$ is the gradient and $F''(x)$ is the Hessian. It is easy to verify that
     $$\frac{d F'(x)}{d x^T} = F''(x).$$
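The identity can also be checked numerically: differentiating the gradient with finite differences should reproduce the analytic Hessian. A minimal sketch, reusing the assumed test function above:

```python
import numpy as np

grad = lambda x: np.array([2*x[0] + 3*x[1], 3*x[0] + 8*x[1]**3])  # F'(x) of the test F above
hess = lambda x: np.array([[2.0, 3.0], [3.0, 24*x[1]**2]])        # F''(x)

x, eps = np.array([1.0, -0.5]), 1e-6
# Numerical Jacobian of the gradient: column j is (F'(x + eps*e_j) - F'(x)) / eps
J = np.column_stack([(grad(x + eps * np.eye(2)[:, j]) - grad(x)) / eps
                     for j in range(2)])
print(np.allclose(J, hess(x), atol=1e-4))  # True: dF'(x)/dx^T = F''(x)
```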

  7. Basic Concepts
     Theorem 1: Necessary condition for a local minimizer
     If $x^*$ is a local minimizer, then $F'(x^*) = 0$.
     Definition 2: Stationary point
     If $F'(x_s) = 0$, then $x_s$ is said to be a stationary point for F.
     A local minimizer (or maximizer) is also a stationary point. A stationary point which is neither a local maximizer nor a local minimizer is called a saddle point.

  8. Basic Concepts
     Theorem 2: Sufficient condition for a local minimizer
     Assume that $x_s$ is a stationary point and that $F''(x_s)$ is positive definite; then $x_s$ is a local minimizer.
     If $F''(x_s)$ is negative definite, then $x_s$ is a local maximizer. If $F''(x_s)$ is indefinite (i.e., it has both positive and negative eigenvalues), then $x_s$ is a saddle point.
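As a sketch of Theorem 2 in code (the example Hessians are my own, each with the origin as the stationary point), classify x_s by the eigenvalues of F''(x_s):

```python
import numpy as np

def classify(hessian):
    """Classify a stationary point from the eigenvalues of F''(x_s)."""
    eig = np.linalg.eigvalsh(hessian)   # Hessian is symmetric
    if np.all(eig > 0):
        return "local minimizer"        # positive definite
    if np.all(eig < 0):
        return "local maximizer"        # negative definite
    if eig.min() < 0 < eig.max():
        return "saddle point"           # indefinite
    return "inconclusive"               # semidefinite: the theorem does not apply

print(classify(np.diag([2.0, 6.0])))    # F = x1^2 + 3*x2^2  -> local minimizer
print(classify(np.diag([-2.0, -6.0])))  # F = -x1^2 - 3*x2^2 -> local maximizer
print(classify(np.diag([2.0, -6.0])))   # F = x1^2 - 3*x2^2  -> saddle point
```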

  9. Outline
     • Non-linear Least Squares
       – General Methods for Non-linear Optimization
         · Basic Concepts
         · Descent Methods
       – Non-linear Least Squares Problems

  10. Descent Methods
     • All methods for non-linear optimization are iterative: from a starting point $x_0$, the method produces a series of vectors $x_1, x_2, \ldots$, which (hopefully) converges to $x^*$.
     • The methods have measures to enforce the descending condition
       $$F(x_{k+1}) < F(x_k).$$
       Thus, these kinds of methods are referred to as "descent methods".
     • For descent methods, in each iteration, we need to
       – figure out a suitable descent direction along which to update the parameters, and
       – find a step length giving a good decrease in the F-value.

  11. Descent Methods
     Consider the variation of the F-value along the half line starting at x with direction h:
     $$F(x + \alpha h) = F(x) + \alpha h^T F'(x) + O(\alpha^2),$$
     so $F(x + \alpha h) < F(x)$ for sufficiently small $\alpha > 0$ whenever $h^T F'(x) < 0$.
     Definition 3: Descent direction
     h is a descent direction for F at x if $h^T F'(x) < 0$.
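A quick numerical illustration of Definition 3 (the direction and test function are assumptions): any h with h^T F'(x) < 0 decreases F for a sufficiently small step:

```python
import numpy as np

F = lambda x: x[0]**2 + 3*x[0]*x[1] + 2*x[1]**4                   # illustrative F, as before
grad = lambda x: np.array([2*x[0] + 3*x[1], 3*x[0] + 8*x[1]**3])  # F'(x)

x = np.array([1.0, -0.5])   # here F'(x) = [0.5, 2.0]
h = np.array([-1.0, -0.3])  # candidate direction: h^T F'(x) = -1.1 < 0
assert h @ grad(x) < 0      # Definition 3: h is a descent direction
alpha = 1e-3                # "sufficiently small" positive step
print(F(x + alpha * h) < F(x))  # True: F decreases along h
```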

  12. Descent Methods
     Descent methods fall into two families:
     • 1-phase methods (direction and step length are determined jointly):
       – Trust-region methods
       – Damped methods (e.g., the damped Newton method)
     • 2-phase methods (direction and step length are determined in 2 phases separately):
       – Phase I: methods for computing the descent direction, e.g., steepest descent, Newton's method, and SD-Newton hybrids
       – Phase II: methods for computing the step length, e.g., line search

  13. 2-phase methods: General Algorithm Framework
     Algo #1: 2-phase Descent Method (a general framework)
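The algorithm listing on the original slide is a figure; below is a minimal Python sketch of such a 2-phase loop, with the direction rule and line search passed in as pluggable pieces (function names and defaults are my own, not from the lecture):

```python
import numpy as np

def descent(F, grad, x0, direction, line_search, tol=1e-8, max_iter=1000):
    """Generic 2-phase descent loop: Phase I picks a direction, Phase II a step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # (nearly) stationary: stop
            break
        h = direction(x, g)                # Phase I: descent direction
        alpha = line_search(F, x, h, g)    # Phase II: step length
        x = x + alpha * h                  # enforces F(x_{k+1}) < F(x_k)
    return x

def backtracking(F, x, h, g, alpha=1.0, rho=0.5, c=1e-4):
    """Simple backtracking line search enforcing the Armijo decrease condition."""
    while F(x + alpha * h) > F(x) + c * alpha * (h @ g):
        alpha *= rho
    return alpha
```

With direction=lambda x, g: -g this reduces to steepest descent (next slide); with the Newton direction it becomes the line-searched Newton method of slide 17.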

  14. 2-phase methods: steepest descent to compute the descent direction
     When we perform a step $\alpha h$ with positive $\alpha$, the relative gain in function value satisfies
     $$\lim_{\alpha \to 0} \frac{F(x) - F(x + \alpha h)}{\alpha \|h\|} = -\frac{h^T F'(x)}{\|h\|} = -\|F'(x)\| \cos\theta,$$
     where $\theta$ is the angle between the vectors $h$ and $F'(x)$.
     This shows that we get the greatest relative gain when $\theta = \pi$, i.e., when we use the steepest descent direction $h_{sd}$ given by
     $$h_{sd} = -F'(x).$$
     This is called the steepest (gradient) descent method.
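Plugging h_sd = -F'(x) into the Algo #1 sketch above gives a minimal steepest descent run (test function assumed, as before):

```python
import numpy as np

# Reuses descent() and backtracking() from the Algo #1 sketch.
F = lambda x: x[0]**2 + 3*x[0]*x[1] + 2*x[1]**4                   # illustrative F
grad = lambda x: np.array([2*x[0] + 3*x[1], 3*x[0] + 8*x[1]**3])  # F'(x)

x_star = descent(F, grad, x0=[1.0, -0.5],
                 direction=lambda x, g: -g,   # h_sd = -F'(x)
                 line_search=backtracking)
print(x_star, grad(x_star))  # gradient ~ 0 at the local minimizer near (1.125, -0.75)
```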

  15. 2-phase methods: steepest descent to compute the descent direction
     • Properties of the steepest descent method
       – The choice of descent direction is "the best" (locally), and we can combine it with an exact line search.
       – A method like this converges, but the final convergence is linear and often very slow.
       – For many problems, however, the method performs quite well in the initial stage of the iteration.
     Considerations like these have led to the so-called hybrid methods, which, as the name suggests, combine two different methods: one that is good in the initial stage, like the gradient method, and another that is good in the final stage, like Newton's method.

  16. 2-phase methods: Newton's method to compute the descent direction
     Newton's method is derived from the condition that $x^*$ is a stationary point, i.e., $F'(x^*) = 0$.
     Starting from the current point x, along which direction h should we move so that we are most likely to arrive at a stationary point? That is, we solve for h from
     $$F'(x + h) = 0.$$
     What is the solution for h?

  17. 2-phase methods: Newton's method to compute the descent direction
     Expanding each component of the gradient to first order,
     $$\frac{\partial F}{\partial x_i}\Big|_{x+h} \approx \frac{\partial F}{\partial x_i}\Big|_{x} + \nabla\!\left(\frac{\partial F}{\partial x_i}\right)^{T}\Big|_{x} h, \quad i = 1, \ldots, n,$$
     gives
     $$F'(x + h) \approx F'(x) + F''(x) h.$$
     So $h_n$ is the solution to
     $$F''(x)\, h_n = -F'(x).$$
     Suppose that $F''(x)$ is positive definite; then
     $$h_n^T F'(x) = -h_n^T F''(x)\, h_n < 0,$$
     i.e., $h_n^T F'(x) < 0$ indicates that $h_n$ is a descent direction.
     In the classical Newton method, the update is $x := x + h_n$ (then it can be regarded as a 1-phase method). However, in most modern implementations, the update is $x := x + \alpha h_n$, where $\alpha$ is determined by line search.
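A minimal sketch of the line-searched Newton update described above, reusing backtracking() from the Algo #1 sketch and the assumed test function (whose Hessian is positive definite along this run):

```python
import numpy as np

F = lambda x: x[0]**2 + 3*x[0]*x[1] + 2*x[1]**4                   # illustrative F
grad = lambda x: np.array([2*x[0] + 3*x[1], 3*x[0] + 8*x[1]**3])  # F'(x)
hess = lambda x: np.array([[2.0, 3.0], [3.0, 24*x[1]**2]])        # F''(x)

x = np.array([1.0, -0.5])
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    h_n = np.linalg.solve(hess(x), -g)   # Newton direction: F''(x) h_n = -F'(x)
    alpha = backtracking(F, x, h_n, g)   # modern variant: x := x + alpha * h_n
    x = x + alpha * h_n
print(x)  # converges to approximately (1.125, -0.75)
```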
