The Geometry of Least Squares
Richard Lockhart, STAT 350

Mathematical Basics

◮ Inner / dot product: for column vectors a and b,
  $$a \cdot b = a^T b = \sum_i a_i b_i, \qquad a \perp b \iff a^T b = 0.$$
◮ Matrix product: if A is r × s and B is s × t, then AB is r × t with
  $$(AB)_{ij} = \sum_{k=1}^{s} A_{ik} B_{kj}.$$
Partitioned Matrices

◮ Partitioned matrices are like ordinary matrices except that the entries are themselves matrices.
◮ They add and multiply (when the dimensions match properly) just like regular matrices, but(!) you must remember that matrix multiplication is not commutative.
◮ Here is an example:
  $$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{bmatrix}
    \qquad
    B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{bmatrix}$$
◮ Think of A as a 2 × 3 matrix and B as a 3 × 2 matrix.
◮ Multiply them to get C = AB, a 2 × 2 matrix, as follows:
  $$AB = \begin{bmatrix}
    A_{11}B_{11} + A_{12}B_{21} + A_{13}B_{31} & A_{11}B_{12} + A_{12}B_{22} + A_{13}B_{32} \\
    A_{21}B_{11} + A_{22}B_{21} + A_{23}B_{31} & A_{21}B_{12} + A_{22}B_{22} + A_{23}B_{32}
  \end{bmatrix}$$
◮ BUT: this only works if each of the matrix products in the formula makes sense.
◮ So $A_{11}$ must have the same number of columns as $B_{11}$ has rows, and many other similar restrictions apply.
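A minimal numerical sketch of the block-multiplication rule (Python/NumPy; the block sizes here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary block sizes: row blocks of A have 2 and 3 rows; the shared
# "inner" blocks have widths 2, 1 and 4; column blocks of B have 3 and 2 columns.
r, s, t = [2, 3], [2, 1, 4], [3, 2]

A = [[rng.standard_normal((ri, sj)) for sj in s] for ri in r]
B = [[rng.standard_normal((si, tj)) for tj in t] for si in s]

# Block formula: C_ij = sum_k A_ik B_kj
C_blocks = [[sum(A[i][k] @ B[k][j] for k in range(len(s)))
             for j in range(len(t))] for i in range(len(r))]

# Compare against multiplying the assembled (unpartitioned) matrices.
C_full = np.block(A) @ np.block(B)
assert np.allclose(np.block(C_blocks), C_full)
```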
First application: write X = [X_1 | X_2 | ⋯ | X_p], where each $X_i$ is a column of X. Then
$$X\beta = [X_1 | X_2 | \cdots | X_p] \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}
  = X_1\beta_1 + X_2\beta_2 + \cdots + X_p\beta_p,$$
which is a linear combination of the columns of X.

Definition: The column space of X, written col(X), is the (vector space of) all linear combinations of the columns of X, also called the space "spanned" by the columns of X.

SO: $\hat\mu = X\hat\beta$ is in col(X).
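A tiny sketch (made-up numbers) showing that Xβ computed as a matrix product equals the corresponding linear combination of the columns of X:

```python
import numpy as np

X = np.array([[1., 2.],
              [1., 5.],
              [1., 7.]])          # columns X_1 (all ones) and X_2
beta = np.array([3., -0.5])

# Matrix product versus explicit linear combination of the columns.
assert np.allclose(X @ beta, beta[0] * X[:, 0] + beta[1] * X[:, 1])
```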
Back to the normal equations:
$$X^T Y = X^T X \hat\beta$$
or
$$X^T\bigl(Y - X\hat\beta\bigr) = 0$$
or
$$\begin{bmatrix} X_1^T \\ \vdots \\ X_p^T \end{bmatrix}\bigl(Y - X\hat\beta\bigr) = 0$$
or
$$X_i^T\bigl(Y - X\hat\beta\bigr) = 0, \qquad i = 1, \ldots, p,$$
or
$$Y - X\hat\beta \perp \text{every vector in } \mathrm{col}(X).$$
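A short sketch with simulated data: solve the normal equations directly, then check that the residual is perpendicular to every column of X.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.standard_normal(n)

# Solve the normal equations X^T X beta_hat = X^T Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat

# X^T (Y - X beta_hat) = 0: the residual is perpendicular to col(X).
assert np.allclose(X.T @ resid, 0, atol=1e-10)
```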
Definition: $\hat\epsilon = Y - X\hat\beta$ is the fitted residual vector.

SO: $\hat\epsilon \perp \mathrm{col}(X)$ and $\hat\epsilon \perp \hat\mu$.

Pythagoras' Theorem: If $a \perp b$ then $\|a\|^2 + \|b\|^2 = \|a + b\|^2$.

Definition: $\|a\|$ is the "length" or "norm" of a:
$$\|a\| = \sqrt{\textstyle\sum_i a_i^2} = \sqrt{a^T a}.$$

Moreover, if $a, b, c, \ldots$ are all mutually perpendicular then
$$\|a\|^2 + \|b\|^2 + \cdots = \|a + b + \cdots\|^2.$$
Application

$$Y = (Y - X\hat\beta) + X\hat\beta = \hat\epsilon + \hat\mu$$
so
$$\|Y\|^2 = \|\hat\epsilon\|^2 + \|\hat\mu\|^2
\qquad\text{or}\qquad
\sum Y_i^2 = \sum \hat\epsilon_i^2 + \sum \hat\mu_i^2.$$

Definitions:
◮ $\sum Y_i^2$ = Total Sum of Squares (unadjusted)
◮ $\sum \hat\epsilon_i^2$ = Error or Residual Sum of Squares
◮ $\sum \hat\mu_i^2$ = Regression Sum of Squares
Alternative formulas for the Regression SS:
$$\sum \hat\mu_i^2 = \hat\mu^T \hat\mu = (X\hat\beta)^T (X\hat\beta) = \hat\beta^T X^T X \hat\beta.$$
Notice the matrix identity, which I will use regularly: $(AB)^T = B^T A^T$.
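A simulated sketch checking both the Pythagorean split of $\|Y\|^2$ and the alternative formula for the Regression SS:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 25
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Y = X @ np.array([2.0, 1.5]) + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
mu_hat = X @ beta_hat
eps_hat = Y - mu_hat

# ||Y||^2 = ||eps_hat||^2 + ||mu_hat||^2  (Pythagoras, since eps_hat ⟂ mu_hat)
assert np.isclose(Y @ Y, eps_hat @ eps_hat + mu_hat @ mu_hat)

# Regression SS: sum of mu_hat_i^2 equals beta_hat^T X^T X beta_hat
assert np.isclose(mu_hat @ mu_hat, beta_hat @ (X.T @ X) @ beta_hat)
```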
What is least squares? Choose $\hat\beta$ to minimize
$$\sum (Y_i - \hat\mu_i)^2 = \|Y - \hat\mu\|^2,$$
that is, to minimize $\|\hat\epsilon\|^2$. The resulting $\hat\mu$ is called the orthogonal projection of Y onto the column space of X.

Extension: partition
$$X = [X_1 | X_2], \qquad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}, \qquad p = p_1 + p_2.$$
Imagine we fit 2 models:
1. The FULL model: $Y = X\beta + \epsilon$ ($= X_1\beta_1 + X_2\beta_2 + \epsilon$)
2. The REDUCED model: $Y = X_1\beta_1 + \epsilon$
If we fit the full model we get $\hat\beta_F$, $\hat\mu_F$, $\hat\epsilon_F$ with
$$\hat\epsilon_F \perp \mathrm{col}(X). \tag{1}$$
If we fit the reduced model we get $\hat\beta_R$, $\hat\mu_R$, $\hat\epsilon_R$ with
$$\hat\mu_R \in \mathrm{col}(X_1) \subset \mathrm{col}(X). \tag{2}$$
Notice that
$$\hat\epsilon_F \perp \hat\mu_R. \tag{3}$$
(The vector $\hat\mu_R$ is in the column space of $X_1$, so it is in the column space of X, and $\hat\epsilon_F$ is orthogonal to everything in the column space of X.)

So:
$$Y = \hat\epsilon_F + \hat\mu_F = \hat\epsilon_F + \hat\mu_R + (\hat\mu_F - \hat\mu_R) = \hat\epsilon_R + \hat\mu_R.$$
You know $\hat\epsilon_F \perp \hat\mu_R$ (from (3) above) and $\hat\epsilon_F \perp \hat\mu_F$ (from (1) above). So
$$\hat\epsilon_F \perp \hat\mu_F - \hat\mu_R.$$
Also
$$\hat\mu_R \perp \hat\epsilon_R = \hat\epsilon_F + (\hat\mu_F - \hat\mu_R).$$
So
$$0 = (\hat\epsilon_F + \hat\mu_F - \hat\mu_R)^T \hat\mu_R
  = \underbrace{\hat\epsilon_F^T \hat\mu_R}_{0} + (\hat\mu_F - \hat\mu_R)^T \hat\mu_R,$$
so
$$\hat\mu_F - \hat\mu_R \perp \hat\mu_R.$$
Summary

We have
$$Y = \hat\mu_R + (\hat\mu_F - \hat\mu_R) + \hat\epsilon_F.$$
All three vectors on the right-hand side are perpendicular to each other. This gives
$$\|Y\|^2 = \|\hat\mu_R\|^2 + \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2,$$
which is an Analysis of Variance (ANOVA) table!
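A simulated sketch of the full-versus-reduced decomposition (the split of X into $X_1$ and $X_2$ below is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
X1 = np.column_stack([np.ones(n), rng.standard_normal(n)])   # reduced model
X2 = rng.standard_normal((n, 2))                              # extra columns
X = np.hstack([X1, X2])                                       # full model
Y = X @ rng.standard_normal(4) + rng.standard_normal(n)

def fit(M, Y):
    """Return the fitted values from least squares on design M."""
    b = np.linalg.solve(M.T @ M, M.T @ Y)
    return M @ b

mu_F, mu_R = fit(X, Y), fit(X1, Y)
eps_F = Y - mu_F

# The three pieces are mutually perpendicular ...
for a, b in [(mu_R, mu_F - mu_R), (mu_R, eps_F), (mu_F - mu_R, eps_F)]:
    assert np.isclose(a @ b, 0, atol=1e-8)

# ... so the squared lengths add up (the ANOVA identity).
assert np.isclose(Y @ Y,
                  mu_R @ mu_R + (mu_F - mu_R) @ (mu_F - mu_R) + eps_F @ eps_F)
```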
Here is the most basic version of the above: X = [1 | X_1] and
$$Y_i = \beta_0 + \cdots + \epsilon_i.$$
The notation here is that
$$1 = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$$
is a column vector with all entries equal to 1. The coefficient of this column, $\beta_0$, is called the "intercept" term in the model.
To find $\hat\mu_R$ we minimize $\sum (Y_i - \beta_0)^2$ over $\beta_0$ and get simply
$$\hat\beta_0 = \bar Y
\qquad\text{and}\qquad
\hat\mu_R = \begin{bmatrix} \bar Y \\ \vdots \\ \bar Y \end{bmatrix}.$$
Our ANOVA identity is now
$$\|Y\|^2 = \|\hat\mu_R\|^2 + \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2
         = n\bar Y^2 + \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2.$$
This identity is usually rewritten in subtracted form:
$$\|Y\|^2 - n\bar Y^2 = \|\hat\mu_F - \hat\mu_R\|^2 + \|\hat\epsilon_F\|^2.$$
Remembering the identity $\sum (Y_i - \bar Y)^2 = \sum Y_i^2 - n\bar Y^2$, we find
$$\sum (Y_i - \bar Y)^2 = \sum (\hat\mu_{F,i} - \bar Y)^2 + \sum \hat\epsilon_{F,i}^2.$$
These terms are respectively:
◮ the Adjusted or Corrected Total Sum of Squares,
◮ the Regression or Model Sum of Squares, and
◮ the Error Sum of Squares.
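A simulated check of this corrected ANOVA identity for a model that contains an intercept:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.standard_normal(n)
Y = 3.0 + 2.0 * x + rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])            # full model: intercept + slope
mu_F = X @ np.linalg.solve(X.T @ X, X.T @ Y)
Ybar = Y.mean()

SS_total_corrected = np.sum((Y - Ybar) ** 2)
SS_regression = np.sum((mu_F - Ybar) ** 2)
SS_error = np.sum((Y - mu_F) ** 2)

# Corrected Total SS = Regression SS + Error SS
assert np.isclose(SS_total_corrected, SS_regression + SS_error)
```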
Simple Linear Regression

◮ Filled the gas tank 107 times.
◮ Recorded the distance since the last fill and the amount of gas needed to fill the tank.
◮ Question for discussion: what is a natural model?
◮ Look at the JMP analysis.
The sum of squares decomposition in one example

◮ Example discussed in the Introduction.
◮ Consider the model $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ with $\alpha_4 = -(\alpha_1 + \alpha_2 + \alpha_3)$.
◮ The data consist of blood coagulation times for 24 animals, each fed one of 4 different diets.
◮ Now I write the data in a table and decompose the table into a sum of several tables.
◮ The 4 columns of the table correspond to Diets A, B, C and D.
◮ You should think of the entries in each table as being stacked up into a column vector, but the tables save space.
◮ The design matrix can be partitioned into a column of 1s and 3 other columns.
◮ You should compute the product $X^T X$ and get
  $$X^T X = \begin{bmatrix}
    24 & -4 & -2 & -2 \\
    -4 & 12 &  8 &  8 \\
    -2 &  8 & 14 &  8 \\
    -2 &  8 &  8 & 14
  \end{bmatrix}.$$
◮ The vector $X^T Y$ is just
  $$\Bigl( \sum_{ij} Y_{ij},\;\; \sum_j Y_{1j} - \sum_j Y_{4j},\;\; \sum_j Y_{2j} - \sum_j Y_{4j},\;\; \sum_j Y_{3j} - \sum_j Y_{4j} \Bigr).$$
◮ The matrix $X^T X$ can be inverted using a program like Maple.
◮ I found that
  $$384\,(X^T X)^{-1} = \begin{bmatrix}
    17 &   7 &  -1 &  -1 \\
     7 &  65 & -23 & -23 \\
    -1 & -23 &  49 & -15 \\
    -1 & -23 & -15 &  49
  \end{bmatrix}.$$
◮ It now takes quite a bit of algebra to verify that the vector of fitted values can be computed by simply averaging the data in each column.
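A sketch of one way to build this design matrix and reproduce $X^T X$ and $384\,(X^T X)^{-1}$ numerically; the group sizes 4, 6, 6 and 8 for Diets A–D are read off the column lengths in the tables on the following slides:

```python
import numpy as np

# Group sizes for diets A, B, C, D (column lengths in the fitted-value tables).
sizes = [4, 6, 6, 8]
n = sum(sizes)
diet = np.repeat(np.arange(4), sizes)          # 0,1,2,3 code the four diets

# Design: intercept column plus three effect-coded columns
# (diet i gets +1 in column i, diet D gets -1 in every column), which
# matches the constraint alpha_4 = -(alpha_1 + alpha_2 + alpha_3).
X = np.ones((n, 4))
for i in range(3):
    X[:, i + 1] = (diet == i).astype(float) - (diet == 3).astype(float)

XtX = X.T @ X
print(XtX.astype(int))
# [[24 -4 -2 -2]
#  [-4 12  8  8]
#  [-2  8 14  8]
#  [-2  8  8 14]]

print(np.round(384 * np.linalg.inv(XtX)).astype(int))
# [[ 17   7  -1  -1]
#  [  7  65 -23 -23]
#  [ -1 -23  49 -15]
#  [ -1 -23 -15  49]]
```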
That is, the fitted value vector $\hat\mu$, written as a table, is

  A   B   C   D
 61  66  68  61
 61  66  68  61
 61  66  68  61
 61  66  68  61
     66  68  61
     66  68  61
             61
             61
On the other hand, fitting the model with a design matrix consisting only of a column of 1s just leads to $\hat\mu_R$ (notation from the lecture) given by

  A   B   C   D
 64  64  64  64
 64  64  64  64
 64  64  64  64
 64  64  64  64
     64  64  64
     64  64  64
             64
             64
Earlier I gave the identity
$$Y = \hat\mu_R + (\hat\mu_F - \hat\mu_R) + \hat\epsilon_F,$$
which corresponds to the following identity between tables (columns are Diets A, B, C, D):

       Y                μ̂_R            μ̂_F − μ̂_R           ε̂_F
 62 63 68 56       64 64 64 64       -3  2  4 -3        1 -3  0 -5
 60 67 66 62       64 64 64 64       -3  2  4 -3       -1  1 -2  1
 63 71 71 60   =   64 64 64 64   +   -3  2  4 -3   +    2  5  3 -1
 59 64 67 61       64 64 64 64       -3  2  4 -3       -2 -2 -1  0
    65 68 63          64 64 64           2  4 -3           -1  0  2
    66 68 64          64 64 64           2  4 -3            0  0  3
          63                64                 -3                 2
          59                64                 -3                -2
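A sketch reproducing these tables from the data: the full-model fitted value is the diet mean, the reduced-model fitted value is the grand mean, and the squared lengths add up as claimed.

```python
import numpy as np

# Coagulation times, grouped by diet (read off the data table above).
groups = {
    "A": [62, 60, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64, 63, 59],
}
Y = np.concatenate([np.array(v, float) for v in groups.values()])
grand = Y.mean()                                   # 64: the entries of mu_hat_R

# Full-model fitted values: each observation is replaced by its diet mean.
mu_F = np.concatenate([np.full(len(v), np.mean(v)) for v in groups.values()])

mu_R = np.full_like(Y, grand)
effect = mu_F - mu_R                               # the (mu_hat_F - mu_hat_R) table
eps_F = Y - mu_F                                   # the residual table

# The three pieces are mutually perpendicular, so the squared lengths add up.
assert np.isclose(Y @ Y,
                  mu_R @ mu_R + effect @ effect + eps_F @ eps_F)
print(dict(zip(groups, [np.mean(v) for v in groups.values()])))
# {'A': 61.0, 'B': 66.0, 'C': 68.0, 'D': 61.0}
```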