Projection
Ping Yu
School of Economics and Finance, The University of Hong Kong
Outline
1. Hilbert Space and Projection Theorem
2. Projection in the $L^2$ Space
3. Projection in $\mathbb{R}^n$
   - Projection Matrices
   - Partitioned Fit and Residual Regression
4. Projection along a Subspace
Overview
Whenever we discuss projection, there must be an underlying Hilbert space, since we must define "orthogonality". We explain projection in two Hilbert spaces ($L^2$ and $\mathbb{R}^n$) and integrate many estimators into one framework.
- Projection in the $L^2$ space: linear projection and regression (linear regression is a special case)
- Projection in $\mathbb{R}^n$: Ordinary Least Squares (OLS) and Generalized Least Squares (GLS)
One main topic of this course is the (ordinary) least squares estimator (LSE). Although the LSE has many interpretations, e.g., as an MLE or an MoM estimator, the most intuitive interpretation is that it is a projection estimator.
Hilbert Space and Projection Theorem
Hilbert Space
Definition (Hilbert Space). A complete inner product space is called a Hilbert space.[a] An inner product is a bilinear operator $\langle\cdot,\cdot\rangle : H \times H \to \mathbb{R}$, where $H$ is a real vector space, satisfying for any $x, y, z \in H$ and $\alpha \in \mathbb{R}$,
(i) $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$;
(ii) $\langle \alpha x, z\rangle = \alpha\langle x, z\rangle$;
(iii) $\langle x, z\rangle = \langle z, x\rangle$;
(iv) $\langle x, x\rangle \geq 0$, with equality if and only if $x = 0$.
We denote this Hilbert space as $(H, \langle\cdot,\cdot\rangle)$.
[a] A metric space $(H, d)$ is complete if every Cauchy sequence in $H$ converges in $H$, where $d$ is a metric on $H$. A sequence $\{x_n\}$ in a metric space is called a Cauchy sequence if for any $\varepsilon > 0$, there is a positive integer $N$ such that for all natural numbers $m, n > N$, $d(x_m, x_n) < \varepsilon$.
Angle and Orthogonality
An important inequality in the inner product space is the Cauchy–Schwarz inequality:
$$|\langle x, y\rangle| \leq \|x\| \cdot \|y\|,$$
where $\|\cdot\| \equiv \sqrt{\langle\cdot,\cdot\rangle}$ is the norm induced by $\langle\cdot,\cdot\rangle$. Due to this inequality, we can define
$$\mathrm{angle}(x, y) = \arccos\left(\frac{\langle x, y\rangle}{\|x\| \cdot \|y\|}\right).$$
We assume the value of the angle is chosen to be in the interval $[0, \pi]$. [Figure Here]
If $\langle x, y\rangle = 0$, then $\mathrm{angle}(x, y) = \frac{\pi}{2}$; we say $x$ is orthogonal to $y$ and denote it as $x \perp y$.
Figure: Angle in Two-dimensional Euclidean Space
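As a quick numerical illustration (not part of the original slides), the following Python sketch evaluates the angle formula in the Euclidean Hilbert space $\mathbb{R}^n$ with the usual dot product; the vectors `x` and `y` are arbitrary choices for illustration.

```python
import numpy as np

def angle(x, y):
    # angle(x, y) = arccos( <x, y> / (||x|| * ||y||) ).
    # Cauchy-Schwarz guarantees the arccos argument lies in [-1, 1];
    # clip guards against tiny floating-point overshoots.
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])
print(angle(x, y))                          # pi/2, since <x, y> = 0
print(np.isclose(angle(x, y), np.pi / 2))   # True: x is orthogonal to y
```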
Projection and Projector
The ingredients of a projection are $\{y, M, (H, \langle\cdot,\cdot\rangle)\}$, where $M$ is a subspace of $H$. Note that the same $H$ endowed with different inner products yields different Hilbert spaces, so the Hilbert space is denoted as $(H, \langle\cdot,\cdot\rangle)$ rather than $H$.
Our objective is to find some $\Pi(y) \in M$ such that
$$\Pi(y) = \operatorname*{argmin}_{h \in M} \|y - h\|^2. \tag{1}$$
$\Pi(\cdot) : H \to M$ is called a projector, and $\Pi(y)$ is called a projection of $y$.
Direct Sum, Orthogonal Space and Orthogonal Projector
Definition. Let $M_1$ and $M_2$ be two disjoint subspaces of $H$, so that $M_1 \cap M_2 = \{0\}$. The space $V = \{h \in H \mid h = h_1 + h_2, h_1 \in M_1, h_2 \in M_2\}$ is called the direct sum of $M_1$ and $M_2$, and it is denoted by $V = M_1 \oplus M_2$.
Definition. Let $M$ be a subspace of $H$. The space $M^{\perp} \equiv \{h \in H \mid \langle h, M\rangle = 0\}$ is called the orthogonal space or orthogonal complement of $M$, where $\langle h, M\rangle = 0$ means $h$ is orthogonal to every element of $M$.
Definition. Suppose $H = M_1 \oplus M_2$. Let $h \in H$, so that $h = h_1 + h_2$ for unique $h_i \in M_i$, $i = 1, 2$. Then $P$ is a projector onto $M_1$ along $M_2$ if $Ph = h_1$ for all $h$. In other words, $PM_1 = M_1$ and $PM_2 = 0$. When $M_2 = M_1^{\perp}$, we call $P$ an orthogonal projector.
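To make the "projector onto $M_1$ along $M_2$" concrete, here is a minimal numerical sketch in $\mathbb{R}^2$, assuming the illustrative subspaces $M_1 = \mathrm{span}\{(1,0)'\}$ and $M_2 = \mathrm{span}\{(1,1)'\}$ (hypothetical choices, not from the slides); it builds $P$ from a basis adapted to the direct sum and checks $PM_1 = M_1$, $PM_2 = 0$, and idempotency.

```python
import numpy as np

m1 = np.array([[1.0], [0.0]])   # basis vector of M1
m2 = np.array([[1.0], [1.0]])   # basis vector of M2
B = np.hstack([m1, m2])         # basis of H = M1 (+) M2
D = np.diag([1.0, 0.0])         # keep the M1 coordinate, zero out the M2 one
P = B @ D @ np.linalg.inv(B)    # projector onto M1 along M2

print(P @ m1)                   # = m1 : PM1 = M1
print(P @ m2)                   # = 0  : PM2 = 0
print(np.allclose(P @ P, P))    # True : projectors are idempotent
# P is orthogonal only when M2 = M1's orthogonal complement; here P != P',
# so this P is an oblique (non-orthogonal) projector.
print(np.allclose(P, P.T))      # False
```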
Figure: Projector and Orthogonal Projector
What is $M_2$?
Hilbert Projection Theorem
Theorem (Hilbert Projection Theorem). If $M$ is a closed subspace of a Hilbert space $H$, then for each $y \in H$, there exists a unique point $x \in M$ for which $\|y - x\|$ is minimized over $M$. Moreover, $x$ is the closest element in $M$ to $y$ if and only if $\langle y - x, M\rangle = 0$.
The first part of the theorem states the existence and uniqueness of the projection. The second part of the theorem states something related to the first-order conditions (FOCs) of (1) or, simply, the orthogonality conditions.
From the theorem, given any closed subspace $M$ of $H$, $H = M \oplus M^{\perp}$. Also, the closest element in $M$ to $y$ is determined by $M$ itself, not by the vectors generating $M$, since there may be some redundancy in these vectors.
Figure: Projection
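A minimal numerical sketch of the theorem in $\mathbb{R}^n$ (an illustration under assumed simulated data, not from the slides): project $y$ onto the column span of a matrix $A$ by least squares, and verify the orthogonality condition $\langle y - x, M\rangle = 0$, i.e., the residual is orthogonal to every column of $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))   # columns span the closed subspace M
y = rng.normal(size=100)

# Least squares finds the unique closest point x in M to y.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
x = A @ coef

# Orthogonality condition: <y - x, m> = 0 for every m spanning M.
print(np.allclose(A.T @ (y - x), 0))   # True
```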
Sequential Projection
Theorem (Law of Iterated Projections or LIP). If $M_1$ and $M_2$ are closed subspaces of a Hilbert space $H$, and $M_1 \subseteq M_2$, then $\Pi_1(y) = \Pi_1(\Pi_2(y))$, where $\Pi_j(\cdot)$, $j = 1, 2$, is the orthogonal projector of $y$ onto $M_j$.
Proof. Write $y = \Pi_2(y) + \Pi_2^{\perp}(y)$. Then
$$\Pi_1(y) = \Pi_1\left(\Pi_2(y) + \Pi_2^{\perp}(y)\right) = \Pi_1\left(\Pi_2(y)\right) + \Pi_1\left(\Pi_2^{\perp}(y)\right) = \Pi_1\left(\Pi_2(y)\right),$$
where the last equality holds because $\langle \Pi_2^{\perp}(y), x\rangle = 0$ for any $x \in M_2$, and $M_1 \subseteq M_2$.
We first project $y$ onto the larger space $M_2$, and then project the resulting projection onto the smaller space $M_1$. The theorem shows that such a sequential procedure is equivalent to projecting $y$ onto $M_1$ directly. We will see some applications of this theorem below.
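The following sketch verifies the LIP numerically in $\mathbb{R}^n$, assuming (for illustration) that $M_1$ is the span of the first column of a simulated matrix $X$ and $M_2$ is the span of both columns, so that $M_1 \subseteq M_2$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = rng.normal(size=200)

def proj(A, v):
    # Orthogonal projection of v onto the column span of A.
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return A @ coef

X1 = X[:, [0]]                        # M1, a subspace of M2 = span(X)
direct = proj(X1, y)                  # project y onto M1 directly
iterated = proj(X1, proj(X, y))       # project onto M2 first, then M1
print(np.allclose(direct, iterated))  # True: Pi_1(y) = Pi_1(Pi_2(y))
```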
Projection in the $L^2$ Space
Linear Projection
A random variable $x \in L^2(P)$ if $E[x^2] < \infty$. $L^2(P)$ endowed with the inner product below is a Hilbert space.
Here $y \in L^2(P)$; $x_1, \ldots, x_k \in L^2(P)$; $M = \mathrm{span}(x_1, \ldots, x_k) \equiv \mathrm{span}(x)$;[1] and $H = L^2(P)$ with $\langle\cdot,\cdot\rangle$ defined as $\langle x, y\rangle = E[xy]$.
$$\Pi(y) = \operatorname*{argmin}_{h \in M} E\left[(y - h)^2\right] = x' \cdot \operatorname*{argmin}_{\beta \in \mathbb{R}^k} E\left[\left(y - x'\beta\right)^2\right] \tag{2}$$
is called the best linear predictor (BLP) of $y$ given $x$, or the linear projection of $y$ onto $x$.
[1] $\mathrm{span}(x) = \left\{z \in L^2(P) \mid z = x'\alpha, \alpha \in \mathbb{R}^k\right\}$.
Linear Projection (continued)
Since this is a convex minimization problem, the FOCs are sufficient:[2]
$$-2E\left[x\left(y - x'\beta_0\right)\right] = 0 \;\Rightarrow\; E[xu] = 0, \tag{3}$$
where $u = y - \Pi(y)$ is the error, and $\beta_0 = \operatorname*{argmin}_{\beta \in \mathbb{R}^k} E\left[\left(y - x'\beta\right)^2\right]$.
$\Pi(y)$ always exists and is unique, but $\beta_0$ need not be unique unless $x_1, \ldots, x_k$ are linearly independent, that is, unless there is no nonzero vector $a \in \mathbb{R}^k$ such that $a'x = 0$ almost surely (a.s.).
Why? If $\forall a \neq 0$, $a'x \neq 0$, then $E\left[(a'x)^2\right] > 0$ and $a'E[xx']a > 0$, thus $E[xx'] > 0$. So from (3),
$$\beta_0 = \left(E[xx']\right)^{-1}E[xy] \quad \text{(why?)} \tag{4}$$
and $\Pi(y) = x'\left(E[xx']\right)^{-1}E[xy]$.
In the literature, $\beta$ with a subscript 0 usually represents the true value of $\beta$.
[2] $\frac{\partial}{\partial x}(a'x) = \frac{\partial}{\partial x}(x'a) = a$.
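As a sanity check on (3) and (4), here is a minimal simulation sketch using sample analogues (the data-generating process is an arbitrary assumption, not from the slides): $\hat{\beta} = (X'X)^{-1}X'y$ is the sample counterpart of $\beta_0 = (E[xx'])^{-1}E[xy]$, and the sample version of the orthogonality condition $E[xu] = 0$ holds by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
# Regressors include a constant; true coefficients are (1, 2).
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * x[:, 1] + rng.normal(size=n)

beta_hat = np.linalg.solve(x.T @ x, x.T @ y)  # sample analogue of (4)
u = y - x @ beta_hat                          # projection error

print(beta_hat)      # close to (1, 2)
print(x.T @ u / n)   # ~ 0: sample analogue of E[xu] = 0 in (3)
```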
Regression
The setup is the same as in linear projection except that $M = L^2(P, \sigma(x))$, where $L^2(P, \sigma(x))$ is the space spanned by any function of $x$ (not only linear functions of $x$) as long as it is in $L^2(P)$:
$$\Pi(y) = \operatorname*{argmin}_{h \in M} E\left[(y - h)^2\right]. \tag{5}$$
Note that
$$\begin{aligned}
E\left[(y - h)^2\right] &= E\left[\left(y - E[y|x] + E[y|x] - h\right)^2\right] \\
&= E\left[\left(y - E[y|x]\right)^2\right] + 2E\left[\left(y - E[y|x]\right)\left(E[y|x] - h\right)\right] + E\left[\left(E[y|x] - h\right)^2\right] \\
&= E\left[\left(y - E[y|x]\right)^2\right] + E\left[\left(E[y|x] - h\right)^2\right] \\
&\geq E\left[\left(y - E[y|x]\right)^2\right] \equiv E[u^2],
\end{aligned}$$
where the cross term vanishes by the law of iterated expectations. So $\Pi(y) = E[y|x]$, which is called the population regression function (PRF), where the error $u$ satisfies $E[u|x] = 0$ (why?).
We can use a variational argument to characterize the FOCs:
$$0 = \operatorname*{argmin}_{\varepsilon \in \mathbb{R}} E\left[\left(y - \left(\Pi(y) + \varepsilon h(x)\right)\right)^2\right] \tag{6}$$
$$\Rightarrow\; -2E\left[h(x)\left(y - \left(\Pi(y) + \varepsilon h(x)\right)\right)\right]\Big|_{\varepsilon = 0} = 0 \;\Rightarrow\; E[h(x)u] = 0, \quad \forall\, h(x) \in L^2(P, \sigma(x)).$$
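A minimal simulation sketch of the two claims above, assuming the illustrative model $y = x^2 + e$ with $e$ independent of $x$, so that the PRF is $E[y|x] = x^2$: the PRF attains a smaller MSE than an arbitrary competitor $h(x)$, and the regression error is orthogonal to arbitrary functions of $x$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)   # PRF is E[y|x] = x^2, error variance 1

u = y - x**2                    # error of the PRF
print(np.mean(u**2))            # ~ 1: the minimized MSE, E[u^2]
print(np.mean((y - 2 * x)**2))  # larger MSE for the competitor h(x) = 2x

# FOCs: E[h(x)u] = 0 for arbitrary functions h of x.
for h in (np.sin(x), x**3, np.exp(-x**2)):
    print(np.mean(h * u))       # each ~ 0 (up to simulation noise)
```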
Relationship Between the Two Projections
$\Pi_1(y)$ is the BLP of $\Pi_2(y)$ given $x$; i.e., the BLPs of $y$ and $\Pi_2(y)$ given $x$ are the same. This is a straightforward application of the law of iterated projections.
Explicitly, define
$$\beta_o = \operatorname*{argmin}_{\beta \in \mathbb{R}^k} E\left[\left(E[y|x] - x'\beta\right)^2\right] = \operatorname*{argmin}_{\beta \in \mathbb{R}^k} \int \left(E[y|x] - x'\beta\right)^2 dF(x).$$
The FOCs for this minimization problem are
$$E\left[-2x\left(E[y|x] - x'\beta_o\right)\right] = 0 \;\Rightarrow\; E[xx']\beta_o = E\left[xE[y|x]\right] = E[xy] \;\Rightarrow\; \beta_o = \left(E[xx']\right)^{-1}E[xy] = \beta_0.$$
In other words, $\beta_0$ is a (weighted) least squares approximation to the true model. If $E[y|x]$ is not linear in $x$, $\beta_o$ depends crucially on the weighting function $F(x)$, i.e., the distribution of $x$. The weighting ensures that frequently drawn $x_i$ yield small approximation errors, at the cost of larger approximation errors for less frequently drawn $x_i$.
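To see $\beta_o = \beta_0$ numerically, here is a sketch under the assumed nonlinear PRF $E[y|x] = x^2$ with $x \sim N(0,1)$ (an illustrative choice): regressing $y$ and $E[y|x]$ on $(1, x)$ gives the same BLP coefficients up to sampling noise, as the LIP predicts.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])   # regressors (1, x)
m = x1**2                               # Pi_2(y) = E[y|x], nonlinear in x
y = m + rng.normal(size=n)              # y with E[u|x] = 0

b_y = np.linalg.solve(X.T @ X, X.T @ y)  # BLP coefficients of y
b_m = np.linalg.solve(X.T @ X, X.T @ m)  # BLP coefficients of E[y|x]
# With x ~ N(0,1), the population BLP of x^2 on (1, x) is 1 + 0*x,
# so both should be close to (1, 0).
print(b_y, b_m)
print(np.allclose(b_y, b_m, atol=0.05))  # equal up to sampling noise
```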