Fitting Linear Statistical Models to Data by Least Squares III: Multivariate

Brian R. Hunt and C. David Levermore
University of Maryland, College Park

Math 420: Mathematical Modeling
January 25, 2012 version
Outline

1) Introduction to Linear Statistical Models
2) Linear Euclidean Least Squares Fitting
3) Linear Weighted Least Squares Fitting
4) Least Squares Fitting for Univariate Polynomial Models
5) Least Squares Fitting with Orthogonalization
6) Multivariate Linear Least Squares Fitting
7) General Multivariate Linear Least Squares Fitting
6. Multivariate Linear Least Squares Fitting

The least squares method extends to settings with a multivariate dependent variable $y$. Suppose we are given data $\{(x_j, y_j)\}_{j=1}^n$ where the $x_j$ lie within a domain $X \subset \mathbb{R}^p$ and the $y_j$ lie in $\mathbb{R}^q$. The problem we will examine is now the following.

How can you use this data set to make a reasonable guess about the value of $y$ when $x$ takes a value in $X$ that is not represented in the data set?

In this setting $x$ is called the independent variable while $y$ is called the dependent variable. We will use weighted least squares to fit the data to a linear statistical model with $m$ parameter $q$-vectors in the form
\[
f(x; \beta_1, \cdots, \beta_m) = \sum_{i=1}^m \beta_i f_i(x),
\]
where each basis function $f_i(x)$ is defined over $X$ and takes values in $\mathbb{R}$.
We now define the $j$-th residual by the vector-valued formula
\[
r_j(\beta_1, \cdots, \beta_m) = y_j - \sum_{i=1}^m \beta_i f_i(x_j).
\]
Introduce the $m \times q$ matrix $B$, the $n \times q$ matrices $Y$ and $R$, and the $n \times m$ matrix $F$ by
\[
B = \begin{pmatrix} \beta_1^T \\ \vdots \\ \beta_m^T \end{pmatrix}, \qquad
Y = \begin{pmatrix} y_1^T \\ \vdots \\ y_n^T \end{pmatrix}, \qquad
R = \begin{pmatrix} r_1^T \\ \vdots \\ r_n^T \end{pmatrix}, \qquad
F = \begin{pmatrix} f_1(x_1) & \cdots & f_m(x_1) \\ \vdots & \ddots & \vdots \\ f_1(x_n) & \cdots & f_m(x_n) \end{pmatrix}.
\]
We will assume the matrix $F$ has rank $m$. The fitting problem then can be recast as finding $B$ so as to minimize the size of
\[
R(B) = Y - FB.
\]
As we did for univariate weighted least squares fitting, we will minimize
\[
q(B) = \tfrac{1}{2} \sum_{j=1}^n w_j\, r_j(\beta_1, \cdots, \beta_m)^T r_j(\beta_1, \cdots, \beta_m),
\]
where the $w_j$ are positive weights. If we again let $W$ be the $n \times n$ diagonal matrix whose $j$-th diagonal entry is $w_j$ then this can be expressed as
\[
q(B) = \tfrac{1}{2} \operatorname{tr}\!\big(R(B)^T W R(B)\big)
= \tfrac{1}{2} \operatorname{tr}\!\big((Y - FB)^T W (Y - FB)\big)
= \tfrac{1}{2} \operatorname{tr}\!\big(Y^T W Y\big) - \operatorname{tr}\!\big(B^T F^T W Y\big) + \tfrac{1}{2} \operatorname{tr}\!\big(B^T F^T W F B\big).
\]
Because $F$ has rank $m$, the $m \times m$ matrix $F^T W F$ is positive definite. The function $q(B)$ thereby has a strictly convex structure similar to the one it had in the univariate case. It therefore has a unique global minimizer $B = \widehat{B}$ given by
\[
\widehat{B} = (F^T W F)^{-1} F^T W Y.
\]
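This closed-form solution translates into a few lines of NumPy. The following is a minimal sketch, not code from these notes; the function name, the basis functions, and the example data are all hypothetical:

```python
import numpy as np

def fit_multivariate(X, Y, w, basis):
    """Weighted least squares fit of f(x) = sum_i beta_i f_i(x).

    X: (n, p) inputs, Y: (n, q) outputs, w: (n,) positive weights,
    basis: list of m scalar-valued basis functions f_i : R^p -> R.
    Returns B_hat, the (m, q) matrix whose rows are the beta_i^T.
    """
    F = np.column_stack([[f(x) for x in X] for f in basis])  # n x m
    W = np.diag(w)
    # Solve (F^T W F) B = F^T W Y instead of forming the inverse explicitly.
    return np.linalg.solve(F.T @ W @ F, F.T @ W @ Y)

# Hypothetical usage: an affine model in p = 2 inputs, q = 3 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = X @ rng.normal(size=(2, 3)) + rng.normal(size=(1, 3))
basis = [lambda x: 1.0, lambda x: x[0], lambda x: x[1]]   # m = 3
B_hat = fit_multivariate(X, Y, np.ones(50), basis)
```

Solving the normal equations with `np.linalg.solve` is both cheaper and numerically safer than computing $(F^T W F)^{-1}$ explicitly.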
The fact that $\widehat{B}$ is a global minimizer again can be seen from the fact that $F^T W F$ is positive definite and the identity
\[
q(B) = \tfrac{1}{2} \operatorname{tr}\!\big(Y^T W Y\big) - \tfrac{1}{2} \operatorname{tr}\!\big(\widehat{B}^T F^T W F \widehat{B}\big) + \tfrac{1}{2} \operatorname{tr}\!\big((B - \widehat{B})^T F^T W F (B - \widehat{B})\big)
= q(\widehat{B}) + \tfrac{1}{2} \operatorname{tr}\!\big((B - \widehat{B})^T F^T W F (B - \widehat{B})\big),
\]
which follows by completing the square using $F^T W Y = F^T W F \widehat{B}$. In particular, this shows that $q(B) \geq q(\widehat{B})$ for every $B \in \mathbb{R}^{m \times q}$ and that $q(B) = q(\widehat{B})$ if and only if $B = \widehat{B}$.

If we let $\widehat{\beta}_i^T$ be the $i$-th row of $\widehat{B}$ then the fit is given by
\[
\widehat{f}(x) = \sum_{i=1}^m \widehat{\beta}_i f_i(x).
\]
The geometric interpretation of this fit is similar to that for the univariate weighted least squares fit.
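The identity is also easy to check numerically: any nonzero perturbation of $\widehat{B}$ increases $q$. A small self-contained sketch with synthetic data (all names here are ours, not the notes'):

```python
import numpy as np

def q_of(B, F, W, Y):
    """q(B) = (1/2) tr(R(B)^T W R(B)) with R(B) = Y - F B."""
    R = Y - F @ B
    return 0.5 * np.trace(R.T @ W @ R)

rng = np.random.default_rng(1)
n, m, q = 40, 3, 2
F = rng.normal(size=(n, m))            # has rank m with probability one
Y = rng.normal(size=(n, q))
W = np.diag(rng.uniform(0.5, 2.0, n))  # positive weights on the diagonal
B_hat = np.linalg.solve(F.T @ W @ F, F.T @ W @ Y)

for _ in range(5):
    D = rng.normal(size=(m, q))        # random perturbation of B_hat
    assert q_of(B_hat + D, F, W, Y) >= q_of(B_hat, F, W, Y)
```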
Example. Use least squares to fit the affine model
\[
f(x; a, B) = a + Bx
\]
with $a \in \mathbb{R}^q$ and $B \in \mathbb{R}^{q \times p}$ to the data $\{(x_j, y_j)\}_{j=1}^n$. Begin by setting (writing $\mathbf{B}$ for the $(1+p) \times q$ parameter matrix to distinguish it from $B$)
\[
\mathbf{B} = \begin{pmatrix} a^T \\ B^T \end{pmatrix}, \qquad
Y = \begin{pmatrix} y_1^T \\ \vdots \\ y_n^T \end{pmatrix}, \qquad
F = \begin{pmatrix} 1 & x_1^T \\ \vdots & \vdots \\ 1 & x_n^T \end{pmatrix}.
\]
Because
\[
F^T W Y = \begin{pmatrix} \langle y^T \rangle \\ \langle x\, y^T \rangle \end{pmatrix}, \qquad
F^T W F = \begin{pmatrix} 1 & \langle x^T \rangle \\ \langle x \rangle & \langle x\, x^T \rangle \end{pmatrix},
\]
where $\langle \cdot \rangle$ denotes the weighted average over the data (with the weights normalized so that $\sum_j w_j = 1$), we find that
\[
\widehat{\mathbf{B}} = (F^T W F)^{-1} F^T W Y
= \begin{pmatrix} 1 & \langle x^T \rangle \\ \langle x \rangle & \langle x\, x^T \rangle \end{pmatrix}^{-1}
\begin{pmatrix} \langle y^T \rangle \\ \langle x\, y^T \rangle \end{pmatrix}
= \begin{pmatrix}
\langle y^T \rangle - \langle x^T \rangle \big( \langle x\, x^T \rangle - \langle x \rangle \langle x^T \rangle \big)^{-1} \big( \langle x\, y^T \rangle - \langle x \rangle \langle y^T \rangle \big) \\[2pt]
\big( \langle x\, x^T \rangle - \langle x \rangle \langle x^T \rangle \big)^{-1} \big( \langle x\, y^T \rangle - \langle x \rangle \langle y^T \rangle \big)
\end{pmatrix}.
\]
Because $\widehat{\mathbf{B}}^T = \begin{pmatrix} \widehat{a} & \widehat{B} \end{pmatrix}$, by setting $\langle x \rangle = \bar{x}$ and $\langle y \rangle = \bar{y}$ we can express these formulas for $\widehat{a}$ and $\widehat{B}$ simply as
\[
\widehat{B} = \big\langle y (x - \bar{x})^T \big\rangle \big\langle (x - \bar{x})(x - \bar{x})^T \big\rangle^{-1}, \qquad
\widehat{a} = \bar{y} - \widehat{B} \bar{x}.
\]
The affine fit is therefore
\[
\widehat{f}(x) = \bar{y} + \widehat{B}(x - \bar{x}).
\]

Remark. The linear multivariate models considered above have the form
\[
f(x; \beta_1, \cdots, \beta_m) = \sum_{i=1}^m \beta_i f_i(x),
\]
where each parameter vector $\beta_i$ lies in $\mathbb{R}^q$ while each basis function $f_i(x)$ is defined over the bounded domain $X \subset \mathbb{R}^p$ and takes values in $\mathbb{R}$. This assumes that each entry of $f$ is being fit to the same family, namely the family spanned by the basis $\{f_i(x)\}_{i=1}^m$. Such families often are too large to be practical. We will therefore consider more general linear models.
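The centered-moment formulas give an even more direct implementation of the affine fit. A minimal sketch under the same assumed data layout as before (the function name is ours):

```python
import numpy as np

def affine_fit(X, Y, w):
    """Weighted least squares affine fit y ~ a + B x via centered moments."""
    w = w / w.sum()                    # normalize so the weights sum to 1
    x_bar = w @ X                      # weighted mean of the x_j, shape (p,)
    y_bar = w @ Y                      # weighted mean of the y_j, shape (q,)
    Xc, Yc = X - x_bar, Y - y_bar      # centered data
    Cxx = (w[:, None] * Xc).T @ Xc     # <(x - x_bar)(x - x_bar)^T>, p x p
    Cyx = (w[:, None] * Yc).T @ Xc     # <(y - y_bar)(x - x_bar)^T>, q x p
    B_hat = np.linalg.solve(Cxx, Cyx.T).T   # B_hat = Cyx Cxx^{-1}
    a_hat = y_bar - B_hat @ x_bar
    return a_hat, B_hat
```

The prediction at a new $x$ is then `a_hat + B_hat @ x`, which is exactly $\bar{y} + \widehat{B}(x - \bar{x})$.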
7. General Multivariate Linear Least Squares Fitting

We now extend the least squares method to the general multivariate setting. Suppose we are given data $\{(x_j, y_j)\}_{j=1}^n$ where the $x_j$ lie within a bounded domain $X \subset \mathbb{R}^p$ while the $y_j$ lie in $\mathbb{R}^q$. We will use weighted least squares to fit the data to a linear statistical model with $m$ real parameters in the form
\[
f(x; \beta_1, \cdots, \beta_m) = \sum_{i=1}^m \beta_i f_i(x),
\]
where each basis function $f_i(x)$ is defined over $X$ and takes values in $\mathbb{R}^q$. The $j$-th residual is again defined by the vector-valued formula
\[
r_j(\beta_1, \cdots, \beta_m) = y_j - \sum_{i=1}^m \beta_i f_i(x_j).
\]
Following what was done earlier, introduce the $m$-vector $\beta$, the $nq$-vectors $Y$ and $R$, and the $nq \times m$ matrix $F$ by
\[
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}, \qquad
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad
R = \begin{pmatrix} r_1 \\ \vdots \\ r_n \end{pmatrix}, \qquad
F = \begin{pmatrix} f_1(x_1) & \cdots & f_m(x_1) \\ \vdots & \ddots & \vdots \\ f_1(x_n) & \cdots & f_m(x_n) \end{pmatrix}.
\]
We will assume the matrix $F$ has rank $m$. The fitting problem then can be recast as finding $\beta$ so as to minimize the size of the vector
\[
R(\beta) = Y - F\beta.
\]
We assume that $\mathbb{R}^q$ is endowed with an inner product. Without loss of generality we can assume that this inner product has the form $y^T G z$ where $G$ is a symmetric, positive definite $q \times q$ matrix. We will minimize
\[
q(\beta) = \tfrac{1}{2} \sum_{j=1}^n w_j\, r_j(\beta_1, \cdots, \beta_m)^T G\, r_j(\beta_1, \cdots, \beta_m),
\]
where the $w_j$ are positive weights. If we let $W$ be the symmetric, positive definite $nq \times nq$ block-diagonal matrix
\[
W = \begin{pmatrix}
w_1 G & 0 & \cdots & 0 \\
0 & w_2 G & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & w_n G
\end{pmatrix},
\]
then $q(\beta)$ can be expressed in terms of the weight matrix $W$ as
\[
q(\beta) = \tfrac{1}{2} R(\beta)^T W R(\beta) = \tfrac{1}{2} (Y - F\beta)^T W (Y - F\beta)
= \tfrac{1}{2} Y^T W Y - \beta^T F^T W Y + \tfrac{1}{2} \beta^T F^T W F \beta.
\]
Because $F$ has rank $m$, the $m \times m$ matrix $F^T W F$ is positive definite. The function $q(\beta)$ thereby has the same strictly convex structure as it had in the univariate case. It therefore has a unique minimizer $\beta = \widehat{\beta}$ where
\[
\widehat{\beta} = (F^T W F)^{-1} F^T W Y.
\]
The fact that $\widehat{\beta}$ is a minimizer again follows from the fact that $F^T W F$ is positive definite and the identity
\[
q(\beta) = \tfrac{1}{2} Y^T W Y - \tfrac{1}{2} \widehat{\beta}^T F^T W F \widehat{\beta} + \tfrac{1}{2} (\beta - \widehat{\beta})^T F^T W F (\beta - \widehat{\beta})
= q(\widehat{\beta}) + \tfrac{1}{2} (\beta - \widehat{\beta})^T F^T W F (\beta - \widehat{\beta}).
\]
In particular, this shows that $q(\beta) \geq q(\widehat{\beta})$ for every $\beta \in \mathbb{R}^m$ and that $q(\beta) = q(\widehat{\beta})$ if and only if $\beta = \widehat{\beta}$.

Remark. The geometric interpretation of this fit is the same as that for the weighted least squares fit, except here the $W$-inner product on $\mathbb{R}^{nq}$ is
\[
(P \,|\, Q)_W = P^T W Q.
\]
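In code, the general setting amounts to stacking the $q$-vector basis evaluations into an $nq \times m$ matrix $F$ and forming $W$ as a Kronecker product. A hedged sketch; the function name and argument layout are assumptions, not an API from the notes:

```python
import numpy as np

def fit_general(X, Y, w, G, basis):
    """General multivariate weighted least squares.

    X: (n, p) inputs, Y: (n, q) outputs, w: (n,) positive weights,
    G: (q, q) symmetric positive definite inner-product matrix,
    basis: list of m functions, each mapping an x in R^p to a q-vector.
    Returns beta_hat, the m-vector of fitted parameters.
    """
    n, q = Y.shape
    # Each x_j contributes a q x m block whose columns are f_1(x_j), ..., f_m(x_j).
    F = np.vstack([np.column_stack([f(x) for f in basis]) for x in X])
    Yv = Y.reshape(n * q)              # stack the y_j into an nq-vector
    W = np.kron(np.diag(w), G)         # block diagonal with blocks w_j G
    # beta_hat = (F^T W F)^{-1} F^T W Y
    return np.linalg.solve(F.T @ W @ F, F.T @ W @ Yv)
```

Materializing the dense $nq \times nq$ matrix $W$ is fine for small problems; for large $n$ one would exploit its block structure rather than forming it explicitly.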
Further Questions

We have seen how to use least squares to fit linear statistical models with $m$ parameters to data sets containing $n$ pairs when $m \ll n$. Among the questions that arise are the following.

• How does one pick a basis that is well suited to the given data?
• How can one avoid overfitting?
• Do these methods extend to nonlinear statistical models?
• Can one use other notions of smallness of the residual?