8. Least squares

  1. CS/ECE/ISyE 524 Introduction to Optimization, Spring 2017–18

  8. Least squares
  - Review of linear equations
  - Least squares
  - Example: curve-fitting
  - Vector norms
  - Geometrical intuition

  Laurent Lessard (www.laurentlessard.com)

  2. Review of linear equations

  System of m linear equations in n unknowns:

      a_{11} x_1 + \cdots + a_{1n} x_n = b_1
      a_{21} x_1 + \cdots + a_{2n} x_n = b_2
      \vdots
      a_{m1} x_1 + \cdots + a_{mn} x_n = b_m

  or, equivalently,

      \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}
      \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
      =
      \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}

  Compact representation: Ax = b. Only three possibilities:
  1. exactly one solution (e.g. x_1 + x_2 = 3 and x_1 − x_2 = 1)
  2. infinitely many solutions (e.g. x_1 + x_2 = 0)
  3. no solutions (e.g. x_1 + x_2 = 1 and x_1 + x_2 = 2)
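  The unique-solution case can be checked directly in Julia with the backslash operator (a minimal sketch, not part of the original slides):

      A = [1.0 1.0; 1.0 -1.0]   # coefficient matrix of x1 + x2 = 3, x1 - x2 = 1
      b = [3.0, 1.0]
      x = A \ b                 # unique solution: [2.0, 1.0]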

  3. Review of linear equations

  - Column interpretation: the vector b is a linear combination of {a_1, \dots, a_n}, the columns of A:

        Ax = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
             \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
           = a_1 x_1 + \cdots + a_n x_n = b

    The solution x tells us how the vectors a_i can be combined in order to produce b.
  - Can be visualized in the output space R^m.

  4. Review of linear equations

  - Row interpretation: the intersection of the hyperplanes \tilde a_i^T x = b_i, where \tilde a_i^T is the i-th row of A:

        Ax = \begin{bmatrix} \tilde a_1^T \\ \tilde a_2^T \\ \vdots \\ \tilde a_m^T \end{bmatrix} x
           = \begin{bmatrix} \tilde a_1^T x \\ \tilde a_2^T x \\ \vdots \\ \tilde a_m^T x \end{bmatrix}
           = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}

    The solution x is a point at the intersection of the affine hyperplanes. Each \tilde a_i is a normal vector to a hyperplane.
  - Can be visualized in the input space R^n.

  5. Review of linear equations

  - The set of solutions of Ax = b is an affine subspace.
  - If m > n, there is (usually but not always) no solution. This is the case where A is tall (overdetermined).
    - Can we find x so that Ax ≈ b?
    - One possibility is to use least squares.
  - If m < n, there are infinitely many solutions. This is the case where A is wide (underdetermined).
    - Among all solutions to Ax = b, which one should we pick?
    - One possibility is to use regularization.

  In this lecture, we will discuss least squares.

  6. Least squares

  - Typical case of interest: m > n (overdetermined). If there is no solution to Ax = b, we try instead to have Ax ≈ b.
  - The least-squares approach: make the Euclidean norm ‖Ax − b‖ as small as possible.
  - Equivalently: make ‖Ax − b‖^2 as small as possible.

  Standard form:

      minimize_x  ‖Ax − b‖^2

  It is an unconstrained optimization problem.

  7. Least squares

  - Typical case of interest: m > n (overdetermined). If there is no solution to Ax = b, we try instead to have Ax ≈ b.
  - The least-squares approach: make the Euclidean norm ‖Ax − b‖ as small as possible.
  - Equivalently: make ‖Ax − b‖^2 as small as possible.

  Properties:
  - ‖x‖ = sqrt(x_1^2 + \cdots + x_n^2) = sqrt(x^T x)
  - In Julia: ‖x‖ = norm(x)
  - In JuMP: ‖x‖^2 = dot(x,x) = sum(x.^2)
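  A quick numerical check of these identities (a minimal sketch; on Julia 0.7 and later, norm and dot live in the LinearAlgebra standard library):

      using LinearAlgebra   # provides norm and dot (part of Base on older Julia)

      x = [3.0, 4.0]

      norm(x)               # Euclidean norm: sqrt(3^2 + 4^2) = 5.0
      sqrt(dot(x, x))       # same value, computed as sqrt(x'x)
      sum(x.^2)             # squared norm: 25.0 == norm(x)^2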

  8. Least squares

  - Column interpretation: find the linear combination of the columns {a_1, \dots, a_n} that is closest to b:

        ‖Ax − b‖^2 = ‖(a_1 x_1 + \cdots + a_n x_n) − b‖^2

  (Figure: the vector b, the columns a_1 and a_2, and the closest combination a_1 x_1 + a_2 x_2.)

  9. Least squares

  - Row interpretation: if \tilde a_i^T is the i-th row of A, define r_i := \tilde a_i^T x − b_i to be the i-th residual component. Then

        ‖Ax − b‖^2 = (\tilde a_1^T x − b_1)^2 + \cdots + (\tilde a_m^T x − b_m)^2

    We minimize the sum of squares of the residuals.
  - Solving Ax = b would make all residual components zero. Least squares attempts to make all of them small.

  10. Example: curve-fitting

  - We are given noisy data points (x_i, y_i).
  - We suspect they are related by y = p x^2 + q x + r.
  - Find the p, q, r that best agree with the data.

  Writing all the equations:

      y_1 ≈ p x_1^2 + q x_1 + r
      y_2 ≈ p x_2^2 + q x_2 + r
      \vdots
      y_m ≈ p x_m^2 + q x_m + r

  or, in matrix form,

      \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_m^2 & x_m & 1 \end{bmatrix}
      \begin{bmatrix} p \\ q \\ r \end{bmatrix}
      ≈
      \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}

  - Also called regression.
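  A minimal Julia sketch of this quadratic fit; the data vectors xdata and ydata are hypothetical placeholders, not values from the lecture:

      # Hypothetical noisy data, roughly following y = 2x^2 - 3x + 1
      xdata = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
      ydata = [1.1, 0.1, 0.0, 1.0, 3.1, 6.0, 10.1]

      # Build the m-by-3 matrix whose columns are x^2, x, 1
      A = [xdata.^2  xdata  ones(length(xdata))]

      # Least-squares estimates of the coefficients
      p, q, r = A \ ydata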

  11. Example: curve-fitting

  - More complicated: y = p e^x + q cos(x) − r sqrt(x) + s x^3.
  - Find the p, q, r, s that best agree with the data.

  Writing all the equations:

      \begin{bmatrix} e^{x_1} & \cos(x_1) & -\sqrt{x_1} & x_1^3 \\ e^{x_2} & \cos(x_2) & -\sqrt{x_2} & x_2^3 \\ \vdots & \vdots & \vdots & \vdots \\ e^{x_m} & \cos(x_m) & -\sqrt{x_m} & x_m^3 \end{bmatrix}
      \begin{bmatrix} p \\ q \\ r \\ s \end{bmatrix}
      ≈
      \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}

  - Julia notebook: Regression.ipynb
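  The same pattern works here; only the columns of the matrix change (a sketch, reusing the hypothetical xdata and ydata from above):

      # Columns are the basis functions e^x, cos(x), -sqrt(x), x^3 evaluated at the data
      A = [exp.(xdata)  cos.(xdata)  (-sqrt.(xdata))  xdata.^3]

      # Least-squares estimates of the coefficients
      p, q, r, s = A \ ydata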

  12. Vector norms

  We want to solve Ax = b, but there is no solution. Define the residual to be the quantity r := b − Ax. We can't make it zero, so instead we try to make it small. Many options!

  - Minimize the largest component (a.k.a. the ∞-norm):

        ‖r‖_∞ = max_i |r_i|

  - Minimize the sum of absolute values (a.k.a. the 1-norm):

        ‖r‖_1 = |r_1| + |r_2| + \cdots + |r_m|

  - Minimize the Euclidean norm (a.k.a. the 2-norm):

        ‖r‖_2 = ‖r‖ = sqrt(r_1^2 + r_2^2 + \cdots + r_m^2)
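  In Julia, all three are available through the norm function (a sketch; r here is just a hypothetical residual vector):

      using LinearAlgebra   # on Julia 0.7+; norm is in Base on older versions

      r = [1.0, -3.0, 2.0]

      norm(r, Inf)   # largest absolute component: 3.0
      norm(r, 1)     # sum of absolute values: 6.0
      norm(r)        # Euclidean (2-)norm: sqrt(14)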

  13. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  (Figure: the blue line is the set of points with coordinates (x, x); the red point is located at (1, 2).)

  Find the point on the line closest to the red point. The answer depends on your notion of distance!

  14. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  Minimize the largest component:

      min_x  max{ |x − 1|, |x − 2| }

  Optimum is at x = 1.5.

  15. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  Minimize the sum of absolute values:

      min_x  |x − 1| + |x − 2|

  Optimum is any 1 ≤ x ≤ 2.

  16. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  Minimize the sum of squares:

      min_x  (x − 1)^2 + (x − 2)^2

  Optimum is at x = 1.5.

  17. Vector norms

  Example: find the point [x; x] that is closest to [1; 2].

  Equivalently, we can minimize the square root of the sum of squares:

      min_x  sqrt( (x − 1)^2 + (x − 2)^2 )

  Optimum is at x = 1.5.

  18. Vector norms

  - Minimizing the largest component is an LP:

        min_x  max_i |\tilde a_i^T x − b_i|
        ⟺
        min_{x,t}  t    subject to  −t ≤ \tilde a_i^T x − b_i ≤ t  for i = 1, …, m

  - Minimizing the sum of absolute values is an LP (see the JuMP sketch below):

        min_x  \sum_{i=1}^m |\tilde a_i^T x − b_i|
        ⟺
        min_{x,t}  t_1 + \cdots + t_m    subject to  −t_i ≤ \tilde a_i^T x − b_i ≤ t_i  for i = 1, …, m

  - Minimizing the 2-norm is not an LP!

        min_x  \sum_{i=1}^m (\tilde a_i^T x − b_i)^2
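  A sketch of the 1-norm problem written as an LP in JuMP, using the same older JuMP/Gurobi syntax that appears later in these slides; the matrix A and vector b are assumed to be given:

      using JuMP, Gurobi

      m = Model(solver=GurobiSolver(OutputFlag=0))
      @variable(m, x[1:size(A,2)])
      @variable(m, t[1:size(A,1)])        # one bound variable per residual component
      @constraint(m, -t .<= A*x - b)      # -t_i <= a_i'x - b_i
      @constraint(m, A*x - b .<= t)       #  a_i'x - b_i <= t_i
      @objective(m, Min, sum(t))          # minimize t_1 + ... + t_m
      solve(m)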

  19. Geometry of LS

  (Figure: the vector b, the subspace spanned by a_1 and a_2, the projection Ax̂, and the residual b − Ax̂.)

  - The set of points {Ax} is a subspace.
  - We want to find x̂ such that Ax̂ is closest to b.
  - Insight: (b − Ax̂) must be orthogonal to all line segments contained in the subspace.

  20. Geometry of LS

  (Figure: the vector b, the projection Ax̂, and the residual b − Ax̂, as on the previous slide.)

  - Must have: (Ax̂ − Az)^T (b − Ax̂) = 0 for all z.
  - Simplifies to: (x̂ − z)^T (A^T b − A^T A x̂) = 0. Since this holds for all z, the normal equations are satisfied:

        A^T A x̂ = A^T b
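  A quick numerical check of this orthogonality condition (a sketch; A and b are small hypothetical examples):

      using LinearAlgebra   # on Julia 0.7+

      A = [1.0 0.0; 1.0 1.0; 1.0 2.0]   # hypothetical tall matrix (m = 3, n = 2)
      b = [1.0, 2.0, 2.0]

      xhat = (A'*A) \ (A'*b)   # solve the normal equations

      A' * (b - A*xhat)        # ≈ zero vector: the residual is orthogonal to the columns of A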

  21. Normal equations

  Theorem: if x̂ satisfies the normal equations, then x̂ is a solution to the least-squares optimization problem

      minimize_x  ‖Ax − b‖^2

  Proof: suppose A^T A x̂ = A^T b, and let x be any other point. Then

      ‖Ax − b‖^2 = ‖A(x − x̂) + (Ax̂ − b)‖^2
                 = ‖A(x − x̂)‖^2 + ‖Ax̂ − b‖^2 + 2(x − x̂)^T A^T (Ax̂ − b)
                 = ‖A(x − x̂)‖^2 + ‖Ax̂ − b‖^2
                 ≥ ‖Ax̂ − b‖^2

  (The cross term vanishes because A^T(Ax̂ − b) = A^T A x̂ − A^T b = 0.)

  22. Normal equations

  Least squares problems are easy to solve!
  - Solving a least squares problem amounts to solving the normal equations.
  - The normal equations can be solved in a variety of standard ways: LU (Cholesky) factorization, for example (see the sketch below).
  - More specialized methods are available if A is very large, sparse, or has a particular structure that can be exploited.
  - Comparable to LPs in terms of solution difficulty.
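  For instance, a Cholesky-based solve of the normal equations might look like this (a sketch assuming A has full column rank; cholesky and Symmetric are the Julia 0.7+ names from LinearAlgebra):

      using LinearAlgebra

      F = cholesky(Symmetric(A'*A))   # factor the symmetric positive-definite Gram matrix
      xhat = F \ (A'*b)               # two triangular solves give the least-squares solution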

  23. Least squares in Julia

  1. Using JuMP:

         using JuMP, Gurobi
         m = Model(solver=GurobiSolver(OutputFlag=0))
         @variable( m, x[1:size(A,2)] )
         @objective( m, Min, sum((A*x-b).^2) )
         solve(m)

     Note: only Gurobi or Mosek currently support this syntax.

  2. Solving the normal equations directly:

         x = inv(A'*A)*(A'*b)

     Note: requires A to have full column rank (A^T A invertible).

  3. Using the backslash operator (similar to Matlab):

         x = A\b

     Note: fastest and most reliable option!
