8. Least squares

  1. CS/ECE/ISyE 524 Introduction to Optimization, Spring 2017–18

  8. Least squares
  - Review of linear equations
  - Least squares
  - Example: curve-fitting
  - Vector norms
  - Geometrical intuition

  Laurent Lessard (www.laurentlessard.com)

  2. Review of linear equations

  System of m linear equations in n unknowns:

      a_{11} x_1 + \cdots + a_{1n} x_n = b_1
      a_{21} x_1 + \cdots + a_{2n} x_n = b_2
      \vdots
      a_{m1} x_1 + \cdots + a_{mn} x_n = b_m

  or, equivalently,

      \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}
      \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
      =
      \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}

  Compact representation: Ax = b. Only three possibilities:
  1. exactly one solution (e.g. x_1 + x_2 = 3 and x_1 − x_2 = 1)
  2. infinitely many solutions (e.g. x_1 + x_2 = 0)
  3. no solutions (e.g. x_1 + x_2 = 1 and x_1 + x_2 = 2)
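  The unique-solution case can be checked directly in Julia with the backslash operator (a minimal sketch, not part of the original slides):

      A = [1.0 1.0; 1.0 -1.0]   # coefficient matrix of x1 + x2 = 3, x1 - x2 = 1
      b = [3.0, 1.0]
      x = A \ b                 # unique solution: [2.0, 1.0]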

  3. Review of linear equations

  - Column interpretation: the vector b is a linear combination of {a_1, \dots, a_n}, the columns of A:

        Ax = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
             \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
           = a_1 x_1 + \cdots + a_n x_n = b

    The solution x tells us how the vectors a_i can be combined in order to produce b.
  - Can be visualized in the output space R^m.

  4. Review of linear equations

  - Row interpretation: the intersection of the hyperplanes \tilde a_i^T x = b_i, where \tilde a_i^T is the i-th row of A:

        Ax = \begin{bmatrix} \tilde a_1^T \\ \tilde a_2^T \\ \vdots \\ \tilde a_m^T \end{bmatrix} x
           = \begin{bmatrix} \tilde a_1^T x \\ \tilde a_2^T x \\ \vdots \\ \tilde a_m^T x \end{bmatrix}
           = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}

    The solution x is a point at the intersection of the affine hyperplanes. Each \tilde a_i is a normal vector to a hyperplane.
  - Can be visualized in the input space R^n.

  5. Review of linear equations

  - The set of solutions of Ax = b is an affine subspace.
  - If m > n, there is (usually but not always) no solution. This is the case where A is tall (overdetermined).
    - Can we find x so that Ax ≈ b?
    - One possibility is to use least squares.
  - If m < n, there are infinitely many solutions. This is the case where A is wide (underdetermined).
    - Among all solutions to Ax = b, which one should we pick?
    - One possibility is to use regularization.

  In this lecture, we will discuss least squares.

  6. Least squares

  - Typical case of interest: m > n (overdetermined). If there is no solution to Ax = b, we try instead to have Ax ≈ b.
  - The least-squares approach: make the Euclidean norm ‖Ax − b‖ as small as possible.
  - Equivalently: make ‖Ax − b‖^2 as small as possible.

  Standard form:

      minimize_x  ‖Ax − b‖^2

  It is an unconstrained optimization problem.

  7. Least squares

  - Typical case of interest: m > n (overdetermined). If there is no solution to Ax = b, we try instead to have Ax ≈ b.
  - The least-squares approach: make the Euclidean norm ‖Ax − b‖ as small as possible.
  - Equivalently: make ‖Ax − b‖^2 as small as possible.

  Properties:
  - ‖x‖ = sqrt(x_1^2 + \cdots + x_n^2) = sqrt(x^T x)
  - In Julia: ‖x‖ = norm(x)
  - In JuMP: ‖x‖^2 = dot(x,x) = sum(x.^2)
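  A quick numerical check of these identities (a minimal sketch; on Julia 0.7 and later, norm and dot live in the LinearAlgebra standard library):

      using LinearAlgebra   # provides norm and dot (part of Base on older Julia)

      x = [3.0, 4.0]

      norm(x)               # Euclidean norm: sqrt(3^2 + 4^2) = 5.0
      sqrt(dot(x, x))       # same value, computed as sqrt(x'x)
      sum(x.^2)             # squared norm: 25.0 == norm(x)^2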

  8. Least squares

  - Column interpretation: find the linear combination of the columns {a_1, \dots, a_n} that is closest to b:

        ‖Ax − b‖^2 = ‖(a_1 x_1 + \cdots + a_n x_n) − b‖^2

  (Figure: the vector b, the columns a_1 and a_2, and the closest combination a_1 x_1 + a_2 x_2.)

  9. Least squares

  - Row interpretation: if \tilde a_i^T is the i-th row of A, define r_i := \tilde a_i^T x − b_i to be the i-th residual component. Then

        ‖Ax − b‖^2 = (\tilde a_1^T x − b_1)^2 + \cdots + (\tilde a_m^T x − b_m)^2

    We minimize the sum of squares of the residuals.
  - Solving Ax = b would make all residual components zero. Least squares attempts to make all of them small.

  10. Example: curve-fitting

  - We are given noisy data points (x_i, y_i).
  - We suspect they are related by y = p x^2 + q x + r.
  - Find the p, q, r that best agree with the data.

  Writing all the equations:

      y_1 ≈ p x_1^2 + q x_1 + r
      y_2 ≈ p x_2^2 + q x_2 + r
      \vdots
      y_m ≈ p x_m^2 + q x_m + r

  or, in matrix form,

      \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_m^2 & x_m & 1 \end{bmatrix}
      \begin{bmatrix} p \\ q \\ r \end{bmatrix}
      ≈
      \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}

  - Also called regression.
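  A minimal Julia sketch of this quadratic fit; the data vectors xdata and ydata are hypothetical placeholders, not values from the lecture:

      # Hypothetical noisy data, roughly following y = 2x^2 - 3x + 1
      xdata = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
      ydata = [1.1, 0.1, 0.0, 1.0, 3.1, 6.0, 10.1]

      # Build the m-by-3 matrix whose columns are x^2, x, 1
      A = [xdata.^2  xdata  ones(length(xdata))]

      # Least-squares estimates of the coefficients
      p, q, r = A \ ydata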

  11. Example: curve-fitting

  - More complicated: y = p e^x + q cos(x) − r sqrt(x) + s x^3.
  - Find the p, q, r, s that best agree with the data.

  Writing all the equations:

      \begin{bmatrix} e^{x_1} & \cos(x_1) & -\sqrt{x_1} & x_1^3 \\ e^{x_2} & \cos(x_2) & -\sqrt{x_2} & x_2^3 \\ \vdots & \vdots & \vdots & \vdots \\ e^{x_m} & \cos(x_m) & -\sqrt{x_m} & x_m^3 \end{bmatrix}
      \begin{bmatrix} p \\ q \\ r \\ s \end{bmatrix}
      ≈
      \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}

  - Julia notebook: Regression.ipynb
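  The same pattern works here; only the columns of the matrix change (a sketch, reusing the hypothetical xdata and ydata from above):

      # Columns are the basis functions e^x, cos(x), -sqrt(x), x^3 evaluated at the data
      A = [exp.(xdata)  cos.(xdata)  (-sqrt.(xdata))  xdata.^3]

      # Least-squares estimates of the coefficients
      p, q, r, s = A \ ydata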

  12. Vector norms

  We want to solve Ax = b, but there is no solution. Define the residual to be the quantity r := b − Ax. We can't make it zero, so instead we try to make it small. Many options!

  - Minimize the largest component (a.k.a. the ∞-norm):

        ‖r‖_∞ = max_i |r_i|

  - Minimize the sum of absolute values (a.k.a. the 1-norm):

        ‖r‖_1 = |r_1| + |r_2| + \cdots + |r_m|

  - Minimize the Euclidean norm (a.k.a. the 2-norm):

        ‖r‖_2 = ‖r‖ = sqrt(r_1^2 + r_2^2 + \cdots + r_m^2)
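  In Julia, all three are available through the norm function (a sketch; r here is just a hypothetical residual vector):

      using LinearAlgebra   # on Julia 0.7+; norm is in Base on older versions

      r = [1.0, -3.0, 2.0]

      norm(r, Inf)   # largest absolute component: 3.0
      norm(r, 1)     # sum of absolute values: 6.0
      norm(r)        # Euclidean (2-)norm: sqrt(14)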

  13. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  (Figure: the blue line is the set of points with coordinates (x, x); the red point is located at (1, 2).)

  Find the point on the line closest to the red point. The answer depends on your notion of distance!

  14. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  Minimize the largest component:

      min_x  max{ |x − 1|, |x − 2| }

  Optimum is at x = 1.5.

  15. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  Minimize the sum of absolute values:

      min_x  |x − 1| + |x − 2|

  Optimum is any 1 ≤ x ≤ 2.

  16. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  Minimize the sum of squares:

      min_x  (x − 1)^2 + (x − 2)^2

  Optimum is at x = 1.5.

  17. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  Equivalently, we can minimize the square root of the sum of squares:

      min_x  sqrt( (x − 1)^2 + (x − 2)^2 )

  Optimum is at x = 1.5.

  18. Vector norms

  - Minimizing the largest component is an LP:

        min_x  max_i |\tilde a_i^T x − b_i|
        ⟺
        min_{x,t}  t    subject to  −t ≤ \tilde a_i^T x − b_i ≤ t  for i = 1, …, m

  - Minimizing the sum of absolute values is an LP (see the JuMP sketch below):

        min_x  \sum_{i=1}^m |\tilde a_i^T x − b_i|
        ⟺
        min_{x,t}  t_1 + \cdots + t_m    subject to  −t_i ≤ \tilde a_i^T x − b_i ≤ t_i  for i = 1, …, m

  - Minimizing the 2-norm is not an LP!

        min_x  \sum_{i=1}^m (\tilde a_i^T x − b_i)^2
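  A sketch of the 1-norm problem written as an LP in JuMP, using the same older JuMP/Gurobi syntax that appears later in these slides; the matrix A and vector b are assumed to be given:

      using JuMP, Gurobi

      m = Model(solver=GurobiSolver(OutputFlag=0))
      @variable(m, x[1:size(A,2)])
      @variable(m, t[1:size(A,1)])        # one bound variable per residual component
      @constraint(m, -t .<= A*x - b)      # -t_i <= a_i'x - b_i
      @constraint(m, A*x - b .<= t)       #  a_i'x - b_i <= t_i
      @objective(m, Min, sum(t))          # minimize t_1 + ... + t_m
      solve(m)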

  19. Geometry of LS

  (Figure: the vector b, the subspace spanned by a_1 and a_2, the projection Ax̂, and the residual b − Ax̂.)

  - The set of points {Ax} is a subspace.
  - We want to find x̂ such that Ax̂ is closest to b.
  - Insight: (b − Ax̂) must be orthogonal to all line segments contained in the subspace.

  20. Geometry of LS

  (Figure: the vector b, the projection Ax̂, and the residual b − Ax̂, as on the previous slide.)

  - Must have: (Ax̂ − Az)^T (b − Ax̂) = 0 for all z.
  - Simplifies to: (x̂ − z)^T (A^T b − A^T A x̂) = 0. Since this holds for all z, the normal equations are satisfied:

        A^T A x̂ = A^T b
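  A quick numerical check of this orthogonality condition (a sketch; A and b are small hypothetical examples):

      using LinearAlgebra   # on Julia 0.7+

      A = [1.0 0.0; 1.0 1.0; 1.0 2.0]   # hypothetical tall matrix (m = 3, n = 2)
      b = [1.0, 2.0, 2.0]

      xhat = (A'*A) \ (A'*b)   # solve the normal equations

      A' * (b - A*xhat)        # ≈ zero vector: the residual is orthogonal to the columns of A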

  21. Normal equations

  Theorem: if x̂ satisfies the normal equations, then x̂ is a solution to the least-squares optimization problem

      minimize_x  ‖Ax − b‖^2

  Proof: suppose A^T A x̂ = A^T b, and let x be any other point. Then

      ‖Ax − b‖^2 = ‖A(x − x̂) + (Ax̂ − b)‖^2
                 = ‖A(x − x̂)‖^2 + ‖Ax̂ − b‖^2 + 2(x − x̂)^T A^T (Ax̂ − b)
                 = ‖A(x − x̂)‖^2 + ‖Ax̂ − b‖^2
                 ≥ ‖Ax̂ − b‖^2

  (The cross term vanishes because A^T(Ax̂ − b) = A^T A x̂ − A^T b = 0.)

  22. Normal equations

  Least squares problems are easy to solve!
  - Solving a least squares problem amounts to solving the normal equations.
  - The normal equations can be solved in a variety of standard ways: LU (Cholesky) factorization, for example (see the sketch below).
  - More specialized methods are available if A is very large, sparse, or has a particular structure that can be exploited.
  - Comparable to LPs in terms of solution difficulty.
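  For instance, a Cholesky-based solve of the normal equations might look like this (a sketch assuming A has full column rank; cholesky and Symmetric are the Julia 0.7+ names from LinearAlgebra):

      using LinearAlgebra

      F = cholesky(Symmetric(A'*A))   # factor the symmetric positive-definite Gram matrix
      xhat = F \ (A'*b)               # two triangular solves give the least-squares solution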

  23. Least squares in Julia

  1. Using JuMP:

         using JuMP, Gurobi
         m = Model(solver=GurobiSolver(OutputFlag=0))
         @variable( m, x[1:size(A,2)] )
         @objective( m, Min, sum((A*x-b).^2) )
         solve(m)

     Note: only Gurobi or Mosek currently support this syntax.

  2. Solving the normal equations directly:

         x = inv(A'*A)*(A'*b)

     Note: requires A to have full column rank (A^T A invertible).

  3. Using the backslash operator (similar to Matlab):

         x = A\b

     Note: fastest and most reliable option!
