The Geometry of Least Squares
Richard Lockhart, STAT 350


1. The Geometry of Least Squares: Mathematical Basics

◮ Inner / dot product: for column vectors $a$ and $b$,
$$a \cdot b = a^T b = \sum_i a_i b_i, \qquad a \perp b \iff a^T b = 0.$$
◮ Matrix product: if $A$ is $r \times s$ and $B$ is $s \times t$, then
$$(AB)_{rt} = \sum_s A_{rs} B_{st}.$$
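As a concrete illustration (a minimal sketch assuming NumPy; the vectors and matrices are made up for the example, not part of the original slides):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, -1.0, 0.0])
print(a @ b)  # 1*2 + 2*(-1) + 3*0 = 0, so a is perpendicular to b

A = np.arange(6.0).reshape(2, 3)   # r x s with r = 2, s = 3
B = np.arange(12.0).reshape(3, 4)  # s x t with s = 3, t = 4
C = A @ B                          # r x t: the shared index s is summed out
print(C.shape)                     # (2, 4)
```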

2. Partitioned Matrices

◮ Partitioned matrices are like ordinary matrices, but the entries are matrices themselves.
◮ They add and multiply (if the dimensions match properly) just like regular matrices, but(!) you must remember that matrix multiplication is not commutative.
◮ Here is an example:
$$A = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix}, \qquad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{pmatrix}.$$

3.

◮ Think of $A$ as a $2 \times 3$ matrix and $B$ as a $3 \times 2$ matrix.
◮ Multiply them to get $C = AB$, a $2 \times 2$ matrix, as follows:
$$AB = \begin{pmatrix}
A_{11}B_{11} + A_{12}B_{21} + A_{13}B_{31} & A_{11}B_{12} + A_{12}B_{22} + A_{13}B_{32} \\
A_{21}B_{11} + A_{22}B_{21} + A_{23}B_{31} & A_{21}B_{12} + A_{22}B_{22} + A_{23}B_{32}
\end{pmatrix}.$$
◮ BUT: this only works if each of the matrix products in the formulas makes sense.
◮ So $A_{11}$ must have the same number of columns as $B_{11}$ has rows, and many other similar restrictions apply; a numerical check follows below.
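The numerical check promised above, as a sketch assuming NumPy. The block sizes are arbitrary, chosen only so that every product in the formula is conformable:

```python
import numpy as np

rng = np.random.default_rng(0)
# A's block-rows have heights 2 and 4; its block-columns have widths 3, 1, 2,
# which must match the heights of B's block-rows.
A11, A12, A13 = rng.normal(size=(2, 3)), rng.normal(size=(2, 1)), rng.normal(size=(2, 2))
A21, A22, A23 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1)), rng.normal(size=(4, 2))
B11, B12 = rng.normal(size=(3, 5)), rng.normal(size=(3, 2))
B21, B22 = rng.normal(size=(1, 5)), rng.normal(size=(1, 2))
B31, B32 = rng.normal(size=(2, 5)), rng.normal(size=(2, 2))

A = np.block([[A11, A12, A13], [A21, A22, A23]])
B = np.block([[B11, B12], [B21, B22], [B31, B32]])
C_block = np.block([
    [A11 @ B11 + A12 @ B21 + A13 @ B31, A11 @ B12 + A12 @ B22 + A13 @ B32],
    [A21 @ B11 + A22 @ B21 + A23 @ B31, A21 @ B12 + A22 @ B22 + A23 @ B32],
])
print(np.allclose(A @ B, C_block))  # True: blockwise multiplication agrees
```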

4.

First application: write $X = [X_1 | X_2 | \cdots | X_p]$, where each $X_i$ is a column of $X$. Then
$$X\beta = [X_1 | X_2 | \cdots | X_p] \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
= X_1 \beta_1 + X_2 \beta_2 + \cdots + X_p \beta_p,$$
which is a linear combination of the columns of $X$.

Definition: The column space of $X$, written $\mathrm{col}(X)$, is the vector space of all linear combinations of the columns of $X$, also called the space "spanned" by the columns of $X$.

SO: $\hat\mu = X\hat\beta$ is in $\mathrm{col}(X)$.
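A tiny sketch, assuming NumPy and a made-up $3 \times 2$ design, of the fact that $X\beta$ is exactly the linear combination $X_1\beta_1 + X_2\beta_2$:

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])       # columns X_1 and X_2 (made up)
beta = np.array([3.0, -2.0])
# X @ beta is the combination beta_1 X_1 + beta_2 X_2, a vector in col(X)
print(np.allclose(X @ beta, beta[0] * X[:, 0] + beta[1] * X[:, 1]))
```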

5. Back to normal equations

$$X^T Y = X^T X \hat\beta$$
or
$$X^T \left( Y - X\hat\beta \right) = 0$$
or
$$\begin{pmatrix} X_1^T \\ \vdots \\ X_p^T \end{pmatrix} \left( Y - X\hat\beta \right) = 0$$
or
$$X_i^T \left( Y - X\hat\beta \right) = 0, \qquad i = 1, \ldots, p,$$
or
$$Y - X\hat\beta \perp \text{every vector in } \mathrm{col}(X).$$
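The orthogonality statement can be checked numerically. A minimal sketch assuming NumPy and simulated data; `np.linalg.lstsq` plays the role of solving the normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(10), rng.normal(size=10)])  # made-up design
Y = 2 + 3 * X[:, 1] + rng.normal(size=10)                # simulated response

# lstsq solves the normal equations X^T Y = X^T X beta-hat
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta_hat
print(np.allclose(X.T @ resid, 0))  # True: X_i^T resid = 0 for every column
```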

6.

Definition: $\hat\epsilon = Y - X\hat\beta$ is the fitted residual vector.

SO: $\hat\epsilon \perp \mathrm{col}(X)$ and $\hat\epsilon \perp \hat\mu$.

Pythagoras' Theorem: If $a \perp b$ then $\|a\|^2 + \|b\|^2 = \|a + b\|^2$.

Definition: $\|a\|$ is the "length" or "norm" of $a$:
$$\|a\| = \sqrt{\sum_i a_i^2} = \sqrt{a^T a}.$$
Moreover, if $a, b, c, \ldots$ are all perpendicular to each other then
$$\|a\|^2 + \|b\|^2 + \cdots = \|a + b + \cdots\|^2.$$

7. Application

$$Y = (Y - X\hat\beta) + X\hat\beta = \hat\epsilon + \hat\mu,$$
so
$$\|Y\|^2 = \|\hat\epsilon\|^2 + \|\hat\mu\|^2 \qquad \text{or} \qquad \sum Y_i^2 = \sum \hat\epsilon_i^2 + \sum \hat\mu_i^2.$$

Definitions:

◮ $\sum Y_i^2$ = Total Sum of Squares (unadjusted)
◮ $\sum \hat\epsilon_i^2$ = Error or Residual Sum of Squares
◮ $\sum \hat\mu_i^2$ = Regression Sum of Squares
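A short numerical check, assuming NumPy and simulated data, that the two squared lengths really do add up to $\|Y\|^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
Y = 1 + 0.5 * X[:, 1] + rng.normal(size=20)

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
mu_hat = X @ beta_hat          # fitted values
eps_hat = Y - mu_hat           # residuals, perpendicular to mu_hat
# Total SS (unadjusted) = Error SS + Regression SS
print(np.allclose(Y @ Y, eps_hat @ eps_hat + mu_hat @ mu_hat))
```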

8. Alternative formulas for the Regression SS

$$\sum \hat\mu_i^2 = \hat\mu^T \hat\mu = (X\hat\beta)^T (X\hat\beta) = \hat\beta^T X^T X \hat\beta.$$

Notice the matrix identity, which I will use regularly: $(AB)^T = B^T A^T$.
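A sketch, again assuming NumPy and made-up data, confirming that the two formulas for the regression SS agree:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 3))
Y = rng.normal(size=15)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
mu_hat = X @ beta_hat
# (X beta)^T (X beta) = beta^T X^T X beta, by (AB)^T = B^T A^T
print(np.allclose(mu_hat @ mu_hat, beta_hat @ (X.T @ X) @ beta_hat))
```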

9. What is least squares?

Choose $\hat\beta$ to minimize
$$\sum (Y_i - \hat\mu_i)^2 = \|Y - \hat\mu\|^2,$$
that is, to minimize $\|\hat\epsilon\|^2$. The resulting $\hat\mu$ is called the orthogonal projection of $Y$ onto the column space of $X$.

Extension: partition
$$X = [X_1 | X_2], \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \qquad p = p_1 + p_2,$$
where $X_1$ has $p_1$ columns and $X_2$ has $p_2$ columns. Imagine we fit 2 models:

1. The FULL model: $Y = X\beta + \epsilon \; (= X_1\beta_1 + X_2\beta_2 + \epsilon)$
2. The REDUCED model: $Y = X_1\beta_1 + \epsilon$

10.

If we fit the full model we get $\hat\beta_F$, $\hat\mu_F$, $\hat\epsilon_F$ with
$$\hat\epsilon_F \perp \mathrm{col}(X). \tag{1}$$
If we fit the reduced model we get $\hat\beta_R$, $\hat\mu_R$, $\hat\epsilon_R$ with
$$\hat\mu_R \in \mathrm{col}(X_1) \subset \mathrm{col}(X). \tag{2}$$
Notice that
$$\hat\epsilon_F \perp \hat\mu_R. \tag{3}$$
(The vector $\hat\mu_R$ is in the column space of $X_1$, so it is in the column space of $X$, and $\hat\epsilon_F$ is orthogonal to everything in the column space of $X$.)

So:
$$Y = \hat\epsilon_F + \hat\mu_F = \hat\epsilon_F + \hat\mu_R + (\hat\mu_F - \hat\mu_R) = \hat\epsilon_R + \hat\mu_R.$$

11.

You know $\hat\epsilon_F \perp \hat\mu_R$ (from (3) above) and $\hat\epsilon_F \perp \hat\mu_F$ (from (1) above). So
$$\hat\epsilon_F \perp \hat\mu_F - \hat\mu_R.$$
Also
$$\hat\mu_R \perp \hat\epsilon_R = \hat\epsilon_F + (\hat\mu_F - \hat\mu_R).$$
So
$$0 = (\hat\epsilon_F + \hat\mu_F - \hat\mu_R)^T \hat\mu_R
= \underbrace{\hat\epsilon_F^T \hat\mu_R}_{0} + (\hat\mu_F - \hat\mu_R)^T \hat\mu_R,$$
so
$$\hat\mu_F - \hat\mu_R \perp \hat\mu_R.$$

12. Summary

We have
$$Y = \hat\mu_R + (\hat\mu_F - \hat\mu_R) + \hat\epsilon_F.$$
All three vectors on the right-hand side are perpendicular to each other. This gives
$$\|Y\|^2 = \|\hat\mu_R\|^2 + \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2,$$
which is an Analysis of Variance (ANOVA) table!
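A numerical sketch of the whole summary, assuming NumPy, an intercept-only reduced model, and simulated data; all three pieces come out mutually perpendicular and their squared lengths add:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
X1 = np.ones((n, 1))                      # reduced model: intercept only
X2 = rng.normal(size=(n, 2))              # two extra covariates
X = np.hstack([X1, X2])                   # full model design matrix
Y = 1 + X2 @ np.array([2.0, -1.0]) + rng.normal(size=n)

def fitted(Xmat, y):
    """Least squares fitted values for design Xmat."""
    b, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return Xmat @ b

mu_F, mu_R = fitted(X, Y), fitted(X1, Y)
eps_F = Y - mu_F

# The three pieces are mutually perpendicular...
print(np.allclose([mu_R @ (mu_F - mu_R), mu_R @ eps_F, (mu_F - mu_R) @ eps_F], 0))
# ...so their squared lengths add up to ||Y||^2 (the ANOVA identity).
print(np.allclose(Y @ Y,
                  mu_R @ mu_R + (mu_F - mu_R) @ (mu_F - mu_R) + eps_F @ eps_F))
```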

13.

Here is the most basic version of the above:
$$X = [\mathbf{1} \,|\, X_1], \qquad Y_i = \beta_0 + \cdots + \epsilon_i.$$
The notation here is that
$$\mathbf{1} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$$
is a column vector with all entries equal to 1. The coefficient of this column, $\beta_0$, is called the "intercept" term in the model.

14.

To find $\hat\mu_R$ we minimize $\sum (Y_i - \beta_0)^2$ over $\beta_0$ and get simply
$$\hat\beta_0 = \bar{Y} \qquad \text{and} \qquad \hat\mu_R = \begin{pmatrix} \bar{Y} \\ \vdots \\ \bar{Y} \end{pmatrix}.$$
Our ANOVA identity is now
$$\|Y\|^2 = \|\hat\mu_R\|^2 + \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2 = n\bar{Y}^2 + \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2.$$
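A one-line check, assuming NumPy and four made-up observations, that the intercept-only least squares fit is just $\bar{Y}$:

```python
import numpy as np

Y = np.array([3.0, 5.0, 4.0, 8.0])                     # made-up data
beta0, *_ = np.linalg.lstsq(np.ones((4, 1)), Y, rcond=None)
print(beta0[0], Y.mean())                              # both 5.0: the fit is Y-bar
```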

15.

This identity is usually rewritten in subtracted form:
$$\|Y\|^2 - n\bar{Y}^2 = \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2.$$
Remembering the identity $\sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - n\bar{Y}^2$, we find
$$\sum (Y_i - \bar{Y})^2 = \sum (\hat\mu_{F,i} - \bar{Y})^2 + \sum \hat\epsilon_{F,i}^2.$$
These terms are respectively:

◮ the Adjusted or Corrected Total Sum of Squares,
◮ the Regression or Model Sum of Squares, and
◮ the Error Sum of Squares.
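A short sketch, assuming NumPy and simulated data, of the subtracted-form identity:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 25
x = rng.normal(size=n)
Y = 2 + 1.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

b, *_ = np.linalg.lstsq(X, Y, rcond=None)
mu_F = X @ b
Ybar = Y.mean()
# Corrected Total SS = Regression (Model) SS + Error SS
print(np.allclose(np.sum((Y - Ybar) ** 2),
                  np.sum((mu_F - Ybar) ** 2) + np.sum((Y - mu_F) ** 2)))
```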

16. Simple Linear Regression

◮ Filled the gas tank 107 times.
◮ Recorded the distance since the last fill and the gas needed to fill.
◮ Question for discussion: what is a natural model?
◮ Look at the JMP analysis.

17. The sum of squares decomposition in one example

◮ Example discussed in the Introduction.
◮ Consider the model $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ with $\alpha_4 = -(\alpha_1 + \alpha_2 + \alpha_3)$.
◮ The data consist of blood coagulation times for 24 animals fed one of 4 different diets.
◮ Now I write the data in a table and decompose the table into a sum of several tables.
◮ The 4 columns of the table correspond to Diets A, B, C and D.
◮ You should think of the entries in each table as being stacked up into a column vector, but the tables save space.

18.

◮ The design matrix can be partitioned into a column of 1s and 3 other columns.
◮ You should compute the product $X^T X$ and get
$$X^T X = \begin{pmatrix}
24 & -4 & -2 & -2 \\
-4 & 12 & 8 & 8 \\
-2 & 8 & 14 & 8 \\
-2 & 8 & 8 & 14
\end{pmatrix}.$$
◮ The matrix $X^T Y$ is just
$$X^T Y = \begin{pmatrix}
\sum_{ij} Y_{ij} \\
\sum_j Y_{1j} - \sum_j Y_{4j} \\
\sum_j Y_{2j} - \sum_j Y_{4j} \\
\sum_j Y_{3j} - \sum_j Y_{4j}
\end{pmatrix}.$$
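A sketch, assuming NumPy, that builds this design matrix from the group sizes (4, 6, 6, 8, read off the tables below) and reproduces $X^T X$:

```python
import numpy as np

# Group sizes for diets A, B, C, D: 4, 6, 6, 8 (24 animals in all).
sizes = [4, 6, 6, 8]

# Sum-to-zero coding: columns 1, x1, x2, x3 with x_i = 1 in group i
# (i = 1, 2, 3) and x1 = x2 = x3 = -1 in group 4, since a4 = -(a1+a2+a3).
blocks = []
for i, n_i in enumerate(sizes):
    x = np.full(3, -1.0) if i == 3 else np.eye(3)[i]
    blocks.append(np.column_stack([np.ones(n_i), np.tile(x, (n_i, 1))]))
X = np.vstack(blocks)

print(X.T @ X)  # reproduces the 4 x 4 matrix above
```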

19.

◮ The matrix $X^T X$ can be inverted using a program like Maple.
◮ I found that
$$384 \, (X^T X)^{-1} = \begin{pmatrix}
17 & 7 & -1 & -1 \\
7 & 65 & -23 & -23 \\
-1 & -23 & 49 & -15 \\
-1 & -23 & -15 & 49
\end{pmatrix}.$$
◮ It now takes quite a bit of algebra to verify that the vector of fitted values can be computed by simply averaging the data in each column; the sketch below checks this numerically.
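The averaging claim can also be checked numerically rather than algebraically. A sketch assuming NumPy, with the coagulation data taken from the table on the final slide and the same design matrix as in the previous sketch:

```python
import numpy as np

# Coagulation times by diet (A, B, C, D), from the data table below.
groups = [
    [62, 60, 63, 59],
    [63, 67, 71, 64, 65, 66],
    [68, 66, 71, 67, 68, 68],
    [56, 62, 60, 61, 63, 64, 63, 59],
]
Y = np.concatenate([np.asarray(g, float) for g in groups])

blocks = []
for i, g in enumerate(groups):
    x = np.full(3, -1.0) if i == 3 else np.eye(3)[i]
    blocks.append(np.column_stack([np.ones(len(g)), np.tile(x, (len(g), 1))]))
X = np.vstack(blocks)

XtX_inv = np.linalg.inv(X.T @ X)
print(np.round(384 * XtX_inv))     # reproduces the matrix above
beta_hat = XtX_inv @ X.T @ Y
mu_hat = X @ beta_hat
print(np.round(mu_hat, 6))         # 61, 66, 68, 61: the column averages
```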

20.

That is, the fitted value $\hat\mu$ is the table

    61 66 68 61
    61 66 68 61
    61 66 68 61
    61 66 68 61
       66 68 61
       66 68 61
             61
             61

21.

On the other hand, fitting the model with a design matrix consisting only of a column of 1s just leads to $\hat\mu_R$ (notation from the lecture) given by

    64 64 64 64
    64 64 64 64
    64 64 64 64
    64 64 64 64
       64 64 64
       64 64 64
             64
             64

22.

Earlier I gave the identity $Y = \hat\mu_R + (\hat\mu_F - \hat\mu_R) + \hat\epsilon_F$, which corresponds to the following identity of tables:

    62 63 68 56     64 64 64 64     -3  2  4 -3      1 -3  0 -5
    60 67 66 62     64 64 64 64     -3  2  4 -3     -1  1 -2  1
    63 71 71 60     64 64 64 64     -3  2  4 -3      2  5  3 -1
    59 64 67 61  =  64 64 64 64  +  -3  2  4 -3  +  -2 -2 -1  0
       65 68 63        64 64 64         2  4 -3        -1  0  2
       66 68 64        64 64 64         2  4 -3         0  0  3
             63              64               -3               2
             59              64               -3              -2
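Finally, a sketch, assuming NumPy, that rebuilds the four tables as stacked column vectors and confirms both the table identity and the resulting ANOVA decomposition:

```python
import numpy as np

# Coagulation times by diet (A, B, C, D), stacked column by column.
groups = [
    [62, 60, 63, 59],
    [63, 67, 71, 64, 65, 66],
    [68, 66, 71, 67, 68, 68],
    [56, 62, 60, 61, 63, 64, 63, 59],
]
Y = np.concatenate([np.asarray(g, float) for g in groups])
mu_R = np.full_like(Y, Y.mean())                                      # the all-64 table
mu_F = np.concatenate([np.full(len(g), np.mean(g)) for g in groups])  # column averages
eps_F = Y - mu_F

print(np.allclose(Y, mu_R + (mu_F - mu_R) + eps_F))   # the table identity
print(mu_R @ mu_R, (mu_F - mu_R) @ (mu_F - mu_R), eps_F @ eps_F)
# 98304.0 228.0 112.0 -- three perpendicular pieces adding up to ||Y||^2 = 98644.0
```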
