Advantage of Orthonormal Bases
• Recall that writing a vector in a basis B in general requires solving a linear system
• When B = { u_1, u_2, ..., u_n } is orthonormal, the coordinates are immediately available
• Simply write x = ξ_1 u_1 + ξ_2 u_2 + ··· + ξ_n u_n and notice that
  $$\langle x \mid u_i \rangle = \Big\langle \sum_{j=1}^{n} \xi_j u_j \;\Big|\; u_i \Big\rangle = \sum_{j=1}^{n} \xi_j \langle u_j \mid u_i \rangle = \xi_i \langle u_i \mid u_i \rangle = \xi_i$$
Fourier Expansion
• If B = { u_1, u_2, ..., u_n } is an orthonormal basis for an inner product space V, then each x in V can be expressed as
  $$x = \langle x \mid u_1 \rangle u_1 + \langle x \mid u_2 \rangle u_2 + \cdots + \langle x \mid u_n \rangle u_n$$
• This is the Fourier expansion of x
• The scalars ⟨x | u_i⟩ are the coordinates of x with respect to B, its Fourier coefficients
• Each ⟨x | u_i⟩ u_i represents the projection of x onto the line spanned by u_i
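Not from the slides: a minimal NumPy sketch of the Fourier expansion, computing the coefficients ⟨x | u_i⟩ for an orthonormal basis obtained from a QR factorization (all variable names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Build an orthonormal basis {u_1, ..., u_n} as the columns of Q
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

x = rng.standard_normal(n)

# Fourier coefficients: xi_i = <x | u_i> (standard inner product)
xi = Q.T @ x

# Fourier expansion: x = sum_i xi_i * u_i
x_reconstructed = Q @ xi
assert np.allclose(x, x_reconstructed)
```

Because the basis is orthonormal, no linear system is solved: each coordinate is just an inner product.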
Idea of the Gram-Schmidt Procedure
• Orthonormal bases are often desirable
• Given a general basis B = { x_1, x_2, ..., x_n } for a vector space V, can we construct an orthonormal basis spanning V?
• Yes: use the Gram-Schmidt procedure
• At each step k we complete an orthonormal basis S_k for span{ x_1, x_2, ..., x_k }
• I.e., proceed incrementally
Gram-Schmidt Procedure Steps
• The first step is easy: u_1 = x_1 / ‖x_1‖, so S_1 = { u_1 }
• Now assume we have S_k and want to find u_{k+1} such that
  • u_{k+1} ⊥ u_i for i < k + 1
  • span{ x_1, x_2, ..., x_k, x_{k+1} } = span( S_k ∪ { u_{k+1} } )
  • ‖u_{k+1}‖ = 1
• Use the Fourier expansion of x_{k+1} to solve for u_{k+1} (a code sketch follows this slide):
  $$x_{k+1} = \alpha_{k+1} u_{k+1} + \sum_{i=1}^{k} \langle u_i \mid x_{k+1} \rangle u_i
  \;\Rightarrow\;
  \alpha_{k+1} u_{k+1} = x_{k+1} - \sum_{i=1}^{k} \langle u_i \mid x_{k+1} \rangle u_i,
  \qquad
  u_{k+1} = \frac{\alpha_{k+1} u_{k+1}}{\lVert \alpha_{k+1} u_{k+1} \rVert}$$
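A sketch of the classical Gram-Schmidt loop described above, assuming the input columns are linearly independent (illustrative code, not part of the slides; stability caveats are discussed later).

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt: orthonormalize the columns of X.

    Assumes the columns of X are linearly independent.
    """
    m, n = X.shape
    U = np.zeros((m, n))
    for k in range(n):
        # Subtract the projections onto the previously found u_i
        v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])
        U[:, k] = v / np.linalg.norm(v)
    return U

# Example: orthonormalize a random basis of R^4
X = np.random.default_rng(1).standard_normal((4, 4))
U = gram_schmidt(X)
assert np.allclose(U.T @ U, np.eye(4))
```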
Fourier Expansion in Matrix Form
• Let B = { u_1, u_2, ..., u_n } be an orthonormal basis for V, and put all u_i as columns in U
• Now, using the standard inner product,
  $$x = \sum_{j=1}^{n} u_j (u_j^* x)
      = U \begin{pmatrix} u_1^* x \\ u_2^* x \\ \vdots \\ u_n^* x \end{pmatrix}
      = U \begin{pmatrix} [U^*]_{1*}\, x \\ [U^*]_{2*}\, x \\ \vdots \\ [U^*]_{n*}\, x \end{pmatrix}
      = U U^* x$$
• So U^* = U^{-1}?
  $$[U^* U]_{ij} = u_i^* u_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$
Gram-Schmidt in Matrix Form
• The first step is u_1 = x_1 / ‖x_1‖
• To produce u_{k+1}, compute
  $$u_{k+1} = \frac{(I - U_k U_k^*)\, x_{k+1}}{\lVert (I - U_k U_k^*)\, x_{k+1} \rVert}$$
  where U_k = [ u_1  u_2  ···  u_k ] is the matrix whose (orthonormal) columns are the vectors found so far
Unitary and Orthogonal Matrices
• A unitary matrix is a complex matrix U_{n×n} whose columns form an orthonormal basis for C^n
• An orthogonal matrix is a real matrix P_{n×n} whose columns form an orthonormal basis for R^n
• The following characterizations are equivalent
  • P (or U) has orthonormal columns
  • P (or U) has orthonormal rows
  • P^{-1} = P^T (or U^{-1} = U^*)
  • ‖Px‖_2 = ‖x‖_2 for all x in R^{n×1} (or ‖Ux‖_2 = ‖x‖_2 for all x in C^{n×1})
The QR Factorization
• The Gram-Schmidt procedure can be seen as a matrix factorization
• Put each x_k of the basis B = { x_1, x_2, ..., x_n } as a column of matrix A
• Put each u_k in S_n as a column of an orthogonal (or unitary) matrix Q
• Column i of matrix R holds the coefficients of the linear combination of the u_k that results in x_i
• This results in R being upper triangular!
The QR Factorization
• Every matrix A_{m×n} with linearly independent columns can be uniquely factored as A = QR, where the columns of Q_{m×n} form an orthonormal basis for R(A) and R_{n×n} is an upper-triangular matrix with positive diagonal entries
  $$a_k = \alpha_k q_k + \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i,
  \qquad
  \alpha_k = \Big\lVert a_k - \sum_{i=1}^{k-1} \langle a_k \mid q_i \rangle q_i \Big\rVert$$
  $$R = \begin{pmatrix}
  \alpha_1 & q_1^* a_2 & q_1^* a_3 & \cdots & q_1^* a_n \\
  0 & \alpha_2 & q_2^* a_3 & \cdots & q_2^* a_n \\
  0 & 0 & \alpha_3 & \cdots & q_3^* a_n \\
  \vdots & \vdots & \vdots & \ddots & \vdots \\
  0 & 0 & 0 & \cdots & \alpha_n
  \end{pmatrix}$$
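For comparison, a hedged NumPy sketch: np.linalg.qr computes a QR factorization directly (via Householder reflections), and its R can be rescaled to the positive-diagonal convention used here.

```python
import numpy as np

A = np.random.default_rng(2).standard_normal((5, 3))  # independent columns (almost surely)
Q, R = np.linalg.qr(A)                 # reduced QR: Q is 5x3, R is 3x3

assert np.allclose(Q.T @ Q, np.eye(3))  # orthonormal columns
assert np.allclose(np.triu(R), R)       # R is upper triangular
assert np.allclose(Q @ R, A)            # A = QR

# Match the convention R_ii > 0 by flipping signs where needed
D = np.diag(np.sign(np.diag(R)))
Q, R = Q @ D, D @ R                     # Q R is unchanged since D^2 = I
assert np.all(np.diag(R) > 0)
```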
Least-Squares Application of QR
• Start from the normal equations A^T A x = A^T b
• Decomposing A^T A into LU is both inefficient and leads to information loss
• From the QR factorization, we have
  $$(QR)^T QR\,x = R^T (Q^T Q) R\,x = R^T R\,x = R^T Q^T b$$
• If R^T is non-singular (what if it isn't?),
  $$R\,x = Q^T b \quad\Rightarrow\quad x = R^{-1} Q^T b$$
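A small sketch of least squares via QR, cross-checked against NumPy's least-squares routine (illustrative; assumes A has full column rank).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 4))    # tall, full column rank (almost surely)
b = rng.standard_normal(20)

# Least squares via QR: minimize ||Ax - b||_2
Q, R = np.linalg.qr(A)              # reduced QR
x_qr = np.linalg.solve(R, Q.T @ b)  # solve R x = Q^T b (R upper triangular, non-singular)

# Cross-check against the library least-squares routine
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_qr, x_ls)
```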
General Orthogonal Projectors
• Given an m×n matrix U with orthonormal columns, find the orthogonal projection z in R(U) of a vector b not necessarily in R(U)
  $$z = Ux, \qquad (z - b) \perp Uy \;\;\forall y$$
  $$(Uy)^* (Ux - b) = 0 \;\Rightarrow\; y^* U^* (Ux - b) = 0 \;\Rightarrow\; y^* x = y^* U^* b$$
  Choose y = e_i to prove x = U^* b, so z = U U^* b
• This is just like the Fourier expansion, for an incomplete basis
Elementary Orthogonal Projectors
• For a vector u ∈ C^{n×1}, u ≠ 0, the matrix
  $$Q = I - \frac{u u^*}{u^* u} \qquad \text{(not a unitary matrix!)}$$
  is called an elementary orthogonal projector
• Note that Q^2 = Q and Q^* = Q
• x = Qx + (I − Q)x, with Qx ⊥ (I − Q)x
• (I − Q)x = u(u^* x)/(u^* u) is the orthogonal projection of x onto span{u}
• Qx is the orthogonal projection of x onto u^⊥, the orthogonal complement of u
Elementary Reflectors (Householder Transformations)
• For a vector u ∈ C^{n×1}, u ≠ 0, the matrix
  $$R = I - 2\,\frac{u u^*}{u^* u} \qquad \text{(a unitary matrix!)}$$
  is the elementary reflector about u^⊥
• Note that R^2 = I, R^{-1} = R, and R^* = R
• QR = RQ = Q
• R = 2Q − I
• ‖x − Qx‖ = ‖Rx − Qx‖ = ‖(u u^*/(u^* u))\,x‖
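A quick numerical check of the projector/reflector identities on this slide (illustrative sketch with randomly chosen real vectors).

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.standard_normal(5)
x = rng.standard_normal(5)
I = np.eye(5)

Q = I - np.outer(u, u) / (u @ u)      # elementary orthogonal projector onto u-perp
R = I - 2 * np.outer(u, u) / (u @ u)  # elementary reflector about u-perp

assert np.allclose(Q @ Q, Q)          # idempotent
assert np.allclose(R @ R, I)          # involutory: R^{-1} = R
assert np.allclose(R, 2 * Q - I)
assert np.allclose(np.linalg.norm(x - Q @ x), np.linalg.norm(R @ x - Q @ x))
```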
Geometry of Reflectors and Projectors
• Figure: a vector x, its orthogonal projection Qx onto u^⊥, the component (I − Q)x along u, and the reflection Rx about u^⊥, illustrating ‖x − Qx‖ = ‖Rx − Qx‖, with Q = I − uu^*/(u^*u) and R = I − 2uu^*/(u^*u)
Reflector into a Given Vector
• Given v and a unit vector e (‖e‖ = 1), we can find u such that the reflector R sends v onto the line spanned by e:
  $$u = v \pm \mu e, \quad R = I - 2\,\frac{u u^*}{u^* u} \;\Rightarrow\; R v = \mp \mu e,
  \qquad \mu = \lVert v \rVert\, \frac{e^* v}{\lvert e^* v \rvert}$$
• Proof
  $$R v = v - 2\,\frac{u (u^* v)}{u^* u} = \mp \mu e \;\Leftrightarrow\; 2 u^* v = u^* u$$
  $$2 u^* v - u^* u = v^* v - \lvert \mu \rvert^2 \pm \bar{\mu}\, e^* v \mp \mu\, v^* e = 0$$
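A hedged real-case sketch: choosing u = v + sign(v_1)‖v‖ e_1 (the sign choice commonly used for numerical stability) yields a reflector that sends v to ∓‖v‖ e_1; the helper name is illustrative.

```python
import numpy as np

def householder_to_e1(v):
    """Return u such that R = I - 2uu^T/(u^T u) maps v to -sign(v[0]) * ||v|| * e1."""
    e1 = np.zeros_like(v)
    e1[0] = 1.0
    sign = 1.0 if v[0] >= 0 else -1.0
    return v + sign * np.linalg.norm(v) * e1   # u = v + mu*e1 with mu = sign(v_1)*||v||

v = np.array([3.0, 1.0, -2.0])
u = householder_to_e1(v)
R = np.eye(3) - 2 * np.outer(u, u) / (u @ u)
Rv = R @ v
assert np.allclose(Rv[1:], 0)                      # all entries below the first are zeroed
assert np.isclose(abs(Rv[0]), np.linalg.norm(v))   # the norm of v is preserved
```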
Plane Rotations (Givens Rotations)
• Orthogonal matrix P_{ij}(c, s) equal to the identity except for four entries:
  $$[P_{ij}]_{ii} = c, \quad [P_{ij}]_{ij} = s, \quad [P_{ij}]_{ji} = -s, \quad [P_{ij}]_{jj} = c,
  \qquad c^2 + s^2 = 1$$
• Performs a rotation in the (i, j) plane of R^n
Pivoting with Plane Rotations
• Kill the element below the pivot: with
  $$c = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \qquad s = \frac{x_j}{\sqrt{x_i^2 + x_j^2}}, \qquad x_i^2 + x_j^2 \neq 0,$$
  entry i becomes the combined magnitude and entry j becomes zero:
  $$P_{ij}(c, s)\, x = \big( x_1, \ldots, \sqrt{x_i^2 + x_j^2}, \ldots, 0, \ldots, x_n \big)^T$$
• “Exchange” rows: with c = 0 and s = 1,
  $$P_{ij}(0, 1)\, x = \big( x_1, \ldots, x_j, \ldots, -x_i, \ldots, x_n \big)^T$$
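A minimal sketch of a plane rotation that zeroes entry j, as above (illustrative; building the full n×n matrix is for clarity only, since in practice one applies the 2×2 rotation to the two affected rows).

```python
import numpy as np

def givens_zero(x, i, j):
    """Return an n x n plane rotation P such that (P @ x)[j] == 0."""
    r = np.hypot(x[i], x[j])           # assumes r != 0
    c, s = x[i] / r, x[j] / r
    P = np.eye(len(x))
    P[i, i], P[i, j] = c, s
    P[j, i], P[j, j] = -s, c
    return P

x = np.array([3.0, 0.0, 4.0])
P = givens_zero(x, 0, 2)
Px = P @ x
assert np.isclose(Px[2], 0) and np.isclose(Px[0], 5.0)  # sqrt(3^2 + 4^2) = 5
assert np.allclose(P.T @ P, np.eye(3))                  # P is orthogonal
```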
Combined Rotations into a Given Vector
• Rotations take any vector x into ‖x‖ e_1:
  $$P_{12}\, x = \begin{pmatrix} \sqrt{x_1^2 + x_2^2} \\ 0 \\ x_3 \\ x_4 \\ \vdots \\ x_n \end{pmatrix}, \quad
  P_{13} P_{12}\, x = \begin{pmatrix} \sqrt{x_1^2 + x_2^2 + x_3^2} \\ 0 \\ 0 \\ x_4 \\ \vdots \\ x_n \end{pmatrix}, \quad \ldots, \quad
  P_{1n} \cdots P_{13} P_{12}\, x = \begin{pmatrix} \lVert x \rVert \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
• Use this as a building block for rotations into an arbitrary vector ‖x‖ e, with ‖e‖ = 1
• The result of the combined rotations is not a single plane rotation, but it is an orthogonal transformation
Orthogonal Reduction
• Matrix A can be reduced to echelon form by elementary row operations
  • Eliminate all entries below the pivot
• Matrix A can also be reduced using elementary reflectors or plane rotations
  • Using elementary reflectors leads to Householder reduction
  • Using plane rotations leads to Givens reduction
Orthogonal Reduction
• Start by finding R_1 such that R_1 [A]_{*1} = μ_1 e_1:
  $$R_1 A = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & A_2 \end{pmatrix}$$
• Then find R_2 such that R_2 [A_2]_{*1} = μ_2 e_1:
  $$\begin{pmatrix} 1 & 0 \\ 0 & R_2 \end{pmatrix} R_1 A
  = \begin{pmatrix} \mu_1 & t_1^T \\ 0 & R_2 A_2 \end{pmatrix},
  \qquad
  R_2 A_2 = \begin{pmatrix} \mu_2 & t_2^T \\ 0 & A_3 \end{pmatrix}$$
• And so on…
Reduces to Trapezoidal Form
• For m > n:
  $$PA = \begin{pmatrix}
  * & * & \cdots & * \\
  0 & * & \cdots & * \\
  \vdots & \vdots & \ddots & \vdots \\
  0 & 0 & \cdots & * \\
  0 & 0 & \cdots & 0 \\
  \vdots & \vdots & & \vdots \\
  0 & 0 & \cdots & 0
  \end{pmatrix}$$
• For m < n:
  $$PA = \begin{pmatrix}
  * & * & \cdots & * & * & \cdots & * \\
  0 & * & \cdots & * & * & \cdots & * \\
  \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
  0 & 0 & \cdots & * & * & \cdots & *
  \end{pmatrix}$$
Uniqueness of the QR Factorization
• When the matrix is square and non-singular, Gram-Schmidt, Householder, and Givens all produce A = QR with R triangular
• Gram-Schmidt yields a positive diagonal in R; Householder and Givens don't necessarily, but can be made to
• In that case (positive diagonal), the factorization is unique
We Now Have Two Factorizations
• The LU factorization and the QR factorization
• LU gives the roadmap to Gaussian elimination applied to a square, non-singular matrix
• QR gives the roadmap to Gram-Schmidt applied to a matrix with independent columns
• Both LU and QR can be used to solve linear systems efficiently
  • QRx = b ⇒ Rx = Q^T b
• LU is about twice as efficient
• QR is useful for least squares as well, and is more stable
Practical Considerations
• Gram-Schmidt as presented is numerically unstable in floating point
  • For example, when applied to a set of vectors that is far from orthogonal
• Even modified Gram-Schmidt is stable only for least squares
• Householder and Givens are stable, and faster
• When dealing with sparse problems
  • Gram-Schmidt and Householder can turn them into dense problems
  • Not enough memory? Maybe Givens can help
COMPLEMENTARY SUBSPACES
Complementary Subspaces
• Subspaces X, Y of V are said to be complementary whenever V = X + Y and X ∩ Y = {0}, in which case V is said to be the direct sum of X and Y, denoted by V = X ⊕ Y
• Given bases B_X and B_Y for X and Y, the following statements are equivalent
  • V = X ⊕ Y
  • For each v in V, there are unique vectors x in X and y in Y such that v = x + y
  • B_X ∪ B_Y is a basis for V and B_X ∩ B_Y = ∅
Projection
• Suppose that V = X ⊕ Y, so that for each v in V there are unique vectors x in X and y in Y such that v = x + y
• The vector x is the projection of v onto X along Y
• The vector y is the projection of v onto Y along X
• When X and Y are not orthogonal subspaces, this is an “oblique projection”
  • As opposed to the orthogonal projection
Visually
• Figure: oblique projection in the plane — a vector v = x + y decomposed into its component x along X and its component y along the (non-orthogonal) direction Y
Computing Oblique Projections
• Assume we have bases B_X and B_Y for complementary subspaces X and Y of R^n
• This means that B = B_X ∪ B_Y = { x_1, x_2, ..., x_r, y_1, y_2, ..., y_{n−r} } is a basis for R^n
• It is easier to first find the coordinate matrix of the projection P onto X along Y in basis B:
  $$[P]_B = \big[\, [P(x_1)]_B \;\cdots\; [P(x_r)]_B \;\; [P(y_1)]_B \;\cdots\; [P(y_{n-r})]_B \,\big]
  = \big[\, e_1 \;\cdots\; e_r \;\; 0 \;\cdots\; 0 \,\big]
  = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$$
• Placing the elements of B (the x_i and y_i) as columns in a matrix B (a sketch follows this slide),
  $$P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$$
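A hedged sketch of the formula P = B diag(I_r, 0) B^{-1} for the oblique projector onto X along Y, using small illustrative bases.

```python
import numpy as np

# Bases (as columns) for complementary subspaces of R^3:
# X = span{(1,0,0), (0,1,0)},  Y = span{(1,1,1)}
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y = np.array([[1.0], [1.0], [1.0]])

B = np.hstack([X, Y])          # B = [X | Y], non-singular since X and Y are complementary
r = X.shape[1]
J = np.zeros((3, 3))
J[:r, :r] = np.eye(r)          # block diag(I_r, 0)
P = B @ J @ np.linalg.inv(B)   # projector onto X along Y

assert np.allclose(P @ P, P)   # idempotent
assert np.allclose(P @ X, X)   # fixes X
assert np.allclose(P @ Y, 0)   # annihilates Y
```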
Summary of Projections
• Let X and Y be complementary subspaces of V. Each v in V can be uniquely expressed as x + y, with x in X and y in Y
• The unique linear operator P defined by Pv = x is the projector onto X along Y
• The following properties hold
  • P^2 = P (i.e., P is idempotent)
  • I − P is the complementary projector onto Y along X
  • R(P) = { x | Px = x }
  • R(P) = N(I − P) = X and R(I − P) = N(P) = Y
• If V = R^n or C^n, then P is given by
  $$P = B \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} B^{-1}$$
  where B = [X | Y] and X, Y hold respective bases for X, Y as columns
Projectors and Idempotents
• A linear operator P on V is a projector if and only if it is idempotent
• Proof
  • We have just seen that projector ⇒ idempotent
  • Need to prove that idempotent ⇒ projector, i.e., P^2 = P ⇒ R(P) and N(P) are complementary
  • R(P) ∩ N(P) = {0}: if x = Py and Px = 0, then 0 = Px = P^2 y = Py = x
  • R(P) + N(P) = V: v = Pv + (I − P)v, with Pv ∈ R(P) and (I − P)v ∈ N(P)
RANGE-NULLSPACE DECOMPOSITION
Complementary Subspaces Associated with a Square Matrix
• Let's go back to a square matrix A_{n×n}
• The Rank + Nullity theorem guarantees that dim R(A) + dim N(A) = n
• But are R(A) and N(A) complementary?
• They certainly are if A is non-singular
• Otherwise, it can happen that R(A) ∩ N(A) ≠ {0}:
  $$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
  \;\Rightarrow\;
  \begin{pmatrix} 1 \\ 0 \end{pmatrix} \in R(A) \cap N(A)$$
• We can find complementary subspaces by looking at the powers of A
Range-Nullspace Decomposition
• For every singular matrix A_{n×n} there exists a positive integer k such that R(A^k) and N(A^k) are complementary subspaces, i.e.,
  $$\mathbb{R}^n = R(A^k) \oplus N(A^k)$$
• The smallest positive integer k for which this holds is called the index of A
• For non-singular matrices, we simply define index(A) = 0
Proof
• The nullspace doesn't shrink and the range doesn't grow with k:
  $$N(A) \subseteq N(A^2) \subseteq \cdots \subseteq N(A^k) \subseteq N(A^{k+1}) \subseteq \cdots$$
  $$R(A) \supseteq R(A^2) \supseteq \cdots \supseteq R(A^k) \supseteq R(A^{k+1}) \supseteq \cdots$$
• At some point there must be equality
  • The dimensions can't shrink below zero or grow above n
• Once equality is attained, it is preserved forever:
  $$R(A^k) = R(A^{k+1}) \;\Rightarrow\; A\,R(A^k) = A\,R(A^{k+1}) \;\Rightarrow\; R(A^{k+1}) = R(A^{k+2})$$
• Equality is attained at the same k for the nullspace:
  $$\dim N(A^k) = n - \dim R(A^k) = n - \dim R(A^{k+1}) = \dim N(A^{k+1})$$
• When equality is attained, R(A^k) ∩ N(A^k) = {0}:
  $$x \in R(A^k) \cap N(A^k) \;\Rightarrow\; x = A^k y,\; A^k x = 0 \;\Rightarrow\; A^{2k} y = 0
  \;\Rightarrow\; y \in N(A^{2k}) = N(A^k) \;\Rightarrow\; x = A^k y = 0$$
• Finally, when equality is attained, R(A^k) + N(A^k) = R^n:
  $$\dim\!\big[ R(A^k) + N(A^k) \big] = \dim R(A^k) + \dim N(A^k) - \dim\!\big[ R(A^k) \cap N(A^k) \big]
  = \dim R(A^k) + \dim N(A^k) = n$$
Summary about the Index of a Square Matrix
• The index of a square matrix A is the smallest nonnegative integer k such that any of these holds
  • rank(A^k) = rank(A^{k+1})
  • R(A^k) = R(A^{k+1}) (the point where the range stops shrinking)
  • N(A^k) = N(A^{k+1}) (the point where the nullspace stops growing)
• For non-singular matrices, index(A) = 0
• For singular matrices, index(A) is the smallest positive integer k such that any of these holds
  • R(A^k) ∩ N(A^k) = {0}
  • R^n = R(A^k) ⊕ N(A^k)
Nilpotent Matrices
• N_{n×n} is said to be nilpotent whenever N^k = 0 for some positive integer k
• index(N) is the smallest positive integer k such that N^k = 0
• Proof
  • Let k be the smallest positive integer with N^k = 0
  • As soon as N^k = 0, we have R(N^{k+i}) = {0} for all i ≥ 0, so index(N) ≤ k
  • On the other hand, if R(N^{index(N)}) ≠ {0}, the range would never shrink further and no power of N could be zero; hence N^{index(N)} = 0 and index(N) ≥ k
Example of a Nilpotent Matrix
• Start taking powers:
  $$N = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
  N^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
  N^3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
• index(N) = 3, since N^2 ≠ 0 and N^3 = 0
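A small check of the example by taking powers numerically (illustrative; with floating point one would test ranks rather than exact zeros).

```python
import numpy as np

N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

Nk = np.eye(3)
for k in range(1, 4):
    Nk = Nk @ N
    print(f"rank(N^{k}) = {np.linalg.matrix_rank(Nk)}")
# Prints ranks 2, 1, 0: N^2 != 0 and N^3 == 0, so index(N) = 3
```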
Application of the Range-Nullspace Decomposition
• If k = index(A), then both R(A^k) and N(A^k) are invariant subspaces of A
  • A R(A^k) = R(A^{k+1}) = R(A^k)
  • x ∈ A N(A^k) ⇒ x = Aw with w ∈ N(A^k) = N(A^{k+1}) ⇒ A^k x = A^{k+1} w = 0 ⇒ x ∈ N(A^k)
• When we write A in the basis [ R(A^k) | N(A^k) ], we get a block-diagonal matrix
• This will be very useful as a step toward the Jordan form we will see later on
Core-Nilpotent Decomposition
• Let A_{n×n} be a singular matrix with index k, and let rank(A^k) = r. There exists a non-singular matrix Q such that
  $$Q^{-1} A Q = \begin{pmatrix} C_{r \times r} & 0 \\ 0 & N \end{pmatrix}$$
  where C is non-singular and N is nilpotent of index k
• A is similar to a 2×2 block-diagonal matrix containing a non-singular “core” C and a nilpotent component N
• This block-diagonal matrix is known as the core-nilpotent decomposition of A
• When A is non-singular, this is trivial (and useless)
Core-Nilpotent Decomposition
• Proof
  • We have already seen the block-diagonal property
  • To see that N is nilpotent, let the columns of X, Y be bases for R(A^k), N(A^k), put Q = [X | Y], and partition Q^{-1} into row blocks U, V; then
    $$\begin{pmatrix} C^k & 0 \\ 0 & N^k \end{pmatrix} = Q^{-1} A^k Q
    = \begin{pmatrix} U \\ V \end{pmatrix} A^k \begin{pmatrix} X & Y \end{pmatrix}
    = \begin{pmatrix} U A^k X & 0 \\ V A^k X & 0 \end{pmatrix}
    \;\Rightarrow\; N^k = 0$$
  • Since rank(C^k) = rank(A^k) = r, C is non-singular
  • To see that index(N) = k, imagine it were smaller
    • Then rank(A^{k−1}) = rank(C^{k−1}) = r = rank(A^k)
    • So index(A) < k, which is a contradiction
ORTHOGONAL DECOMPOSITION
Orthogonal Complement
• For a subset M of an inner-product space V, the orthogonal complement M^⊥ of M is the set of all vectors in V that are orthogonal to every vector in M:
  $$M^\perp = \{\, x \in V \mid \langle m \mid x \rangle = 0 \text{ for all } m \in M \,\}$$
• Figure: examples of a set M and its orthogonal complement M^⊥, including the case M = {x}
Orthogonal Complementary Subspaces
• If M is a subspace of a finite-dimensional inner-product space V, then V = M ⊕ M^⊥
• Furthermore, if N is another subspace such that V = M ⊕ N and N ⊥ M, then N = M^⊥
• Finally, if dim V = n, then dim M^⊥ = n − dim M
Orthogonal Decomposition Theorem
• For every A_{m×n},
  $$R(A)^\perp = N(A^T), \qquad N(A)^\perp = R(A^T)$$
• Proof
  $$x \in R(A)^\perp \;\Leftrightarrow\; x^T A y = 0 \;\;\forall y \;\Leftrightarrow\; x^T A = 0 \;\Leftrightarrow\; x \in N(A^T)$$
• In other words, every matrix in R^{m×n} produces orthogonal decompositions of R^m and R^n:
  $$\mathbb{R}^m = R(A) \oplus N(A^T), \qquad \mathbb{R}^n = N(A) \oplus R(A^T)$$
Significance #1
• The orthogonal decomposition theorem gives us decompositions of R^m and R^n in terms of the four fundamental subspaces
• The resulting bases for R^m and R^n reveal more about the basic components of the matrix A itself
• Combine orthonormal bases for R(A), N(A^T) and for R(A^T), N(A) into bases of R^m and R^n:
  $$U_{m \times m} = \big[\, R(A) \;\big|\; N(A^T) \,\big], \qquad
  V_{n \times n} = \big[\, R(A^T) \;\big|\; N(A) \,\big]$$
Significance #2
• What does A look like w.r.t. these bases?
  $$U_{m \times m} = \big[\, R(A) \;\big|\; N(A^T) \,\big], \qquad
  V_{n \times n} = \big[\, R(A^T) \;\big|\; N(A) \,\big]$$
• Multiply on the right by V and on the left by U^{-1} = U^T (the subspace symbols stand for the matrices holding their orthonormal bases as columns):
  $$R = U^{-1} A V = U^T A V
  = U^T A \big[\, R(A^T) \;\big|\; N(A) \,\big]
  = U^T \big[\, A\,R(A^T) \;\big|\; 0 \,\big]
  = \begin{pmatrix} R(A)^T A\, R(A^T) & 0 \\ N(A^T)^T A\, R(A^T) & 0 \end{pmatrix}
  = \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}$$
The URV Factorization
• For each matrix A_{m×n} in R^{m×n} of rank r, there are orthogonal matrices U_{m×m} and V_{n×n} and a non-singular matrix C_{r×r} such that
  $$A = U R V^T = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}_{m \times n} V^T$$
• The first r columns of U are an orthonormal basis for R(A)
• The last m − r columns of U are an orthonormal basis for N(A^T)
• The first r columns of V are an orthonormal basis for R(A^T)
• The last n − r columns of V are an orthonormal basis for N(A)
• In the complex case, replace orthogonal by unitary and transposes by conjugate transposes
Four Fundamental Subspaces
• Figure: R^n = R(A^T) ⊕ N(A) and R^m = R(A) ⊕ N(A^T), with dim R(A^T) = dim R(A) = r, dim N(A) = n − r, dim N(A^T) = m − r; A maps R(A^T) onto R(A) and sends N(A) (Az = 0) to 0, while A^T maps R(A) onto R(A^T) and sends N(A^T) (A^T w = 0) to 0
The Action of A
• Figure: a vector v = x + z with x ∈ R(A^T) and z ∈ N(A) is mapped by A to y = Av = Ax ∈ R(A); the nullspace component z is annihilated
Inverting What We Can
• Figure (three views): we would like a map A^† that sends a vector u = y + w, with y ∈ R(A) and w ∈ N(A^T), back to x ∈ R(A^T); the N(A^T) component w is annihilated, and the original N(A) component z of v = x + z cannot be recovered
Finding a Matrix Representation
• Start from a URV factorization of A
• The singularity of A has been isolated
  • The problem comes from N(A) and N(A^T)
  • The map from R(A^T) to R(A) is invertible
  • It is given by the invertible matrix C
• Simply transpose the block structure and replace C with C^{-1}:
  $$A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T
  \qquad\Longrightarrow\qquad
  A^{\dagger}_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$$
The Moore-Penrose Inverse
• In terms of the URV factorization, the Moore-Penrose inverse of
  $$A_{m \times n} = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} V^T
  \quad\text{is}\quad
  A^{\dagger}_{n \times m} = V \begin{pmatrix} C^{-1}_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$$
• When Ax = b is consistent, x = A^† b is the solution with minimal Euclidean norm
• When Ax = b is inconsistent, x = A^† b is the least-squares solution of minimal Euclidean norm
• Note: much like the inverse, the pseudoinverse is useful in theory, not in practical applications
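A hedged NumPy sketch: np.linalg.pinv computes the Moore-Penrose inverse (via the SVD), and for a rank-deficient least-squares problem it returns the same minimum-norm solution as np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(5)
# Rank-deficient A: 4x3 with rank 2
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
b = rng.standard_normal(4)

x_pinv = np.linalg.pinv(A) @ b                  # minimum-norm least-squares solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None) # also returns the minimum-norm solution
assert np.allclose(x_pinv, x_lstsq)
```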
The Moore-Penrose Inverse
• Proof for the consistent case
  • x_0 = A^† b solves Ax = b: using A A^† A = A and consistency (b = Ax for some x),
    $$A x_0 = A A^{\dagger} b = A A^{\dagger} A x = A x = b$$
  • x_0 = A^† b has minimum Euclidean norm: every solution has the form x = x_0 + n with n ∈ N(A), and
    $$R(A^{\dagger}) = R(A^T), \quad R(A^T) \perp N(A) \;\Rightarrow\; x_0 \perp n
    \;\Rightarrow\; \lVert x \rVert^2 = \lVert x_0 + n \rVert^2 = \lVert x_0 \rVert^2 + \lVert n \rVert^2 \geq \lVert x_0 \rVert^2$$
Matrix Decompositions Based on the Four Fundamental Subspaces #1
• Orthogonal decomposition
  • Used by the URV decomposition
  • Works with any matrix
  • Different orthogonal transformations on each side
  • Specializes to the singular value decomposition
• Range-nullspace decomposition
  • Used by the core-nilpotent factorization
  • Matrix must be square
  • Based on a similarity transformation
  • Paves the way to the Jordan form
Matrix Decompositions Based on the Four Fundamental Subspaces #2
• Similarities reveal properties of a transformation that are independent of bases or coordinate systems
• Orthogonality is useful in least squares, or for numerical stability when using floating point
• Core-nilpotent is usually not a special case of URV
  • Not even necessarily when A is a square matrix
• U R V^T is not a similarity unless U = V
• Surprisingly, U = V defines a rather large class of matrices
Range Perpendicular to Nullspace
• For rank(A_{n×n}) = r, the following are equivalent
  • R(A) ⊥ N(A)
  • R(A) = R(A^T)
  • N(A) = N(A^T)
  • $$A = U \begin{pmatrix} C_{r \times r} & 0 \\ 0 & 0 \end{pmatrix} U^T$$ with U orthogonal and C non-singular
• Such matrices are called RPN matrices, or range-symmetric matrices, or EP matrices
• In the complex case, replace orthogonal by unitary and transposes by conjugate transposes
A Map of Matrix Types
• Diagram of matrix classes: Real-Symmetric ⊂ Hermitian ⊂ Normal ⊂ RPN, together with the overlapping class of non-singular matrices
SINGULAR VALUE DECOMPOSITION
Can We Specialize URV Even More?
• We can certainly make C triangular
  • Factor C = QR
  • Note that
    $$\begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix}$$
  • So that
    $$A = U \begin{pmatrix} Q & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix} V^T$$
• We can do even better, by making C diagonal
• This leads to the Singular Value Decomposition
• The SVD is the Swiss-army knife of decompositions
Derivation of SVD #1
• We proceed incrementally, with P_1 = [ x_1  X_1 ] and Q_1 = [ y_1  Y_1 ] orthogonal:
  $$D_1 = P_1^T C_1 Q_1 = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix}, \qquad
  D_2 = P_2^T C_2 Q_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix}, \qquad \ldots$$
• Multiplying by blocks, we have
  $$P_1^T C_1 Q_1 = \begin{pmatrix} x_1^T \\ X_1^T \end{pmatrix} C_1 \begin{pmatrix} y_1 & Y_1 \end{pmatrix}
  = \begin{pmatrix} x_1^T C_1 y_1 & x_1^T C_1 Y_1 \\ X_1^T C_1 y_1 & X_1^T C_1 Y_1 \end{pmatrix}
  = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix}$$
• So we have two equations to satisfy: X_1^T C_1 y_1 = 0 and x_1^T C_1 Y_1 = 0
• The first requires C_1 y_1 ⊥ X_1, and we know x_1 ⊥ X_1
• Simply use C_1 y_1 to define x_1 and normalize:
  $$x_1 = \frac{C_1 y_1}{\lVert C_1 y_1 \rVert} \;\Rightarrow\; X_1^T C_1 y_1 = \lVert C_1 y_1 \rVert\, X_1^T x_1 = 0$$
Derivation of SVD #2
• The second equation, x_1^T C_1 Y_1 = 0, requires C_1^T x_1 ⊥ Y_1
• In other words (substituting x_1 ∝ C_1 y_1), C_1^T C_1 y_1 ⊥ Y_1
• The problem is solved if there is y_1 such that
  $$C_1^T C_1\, y_1 = \lambda y_1, \quad\text{i.e.,}\quad (C_1^T C_1 - \lambda I)\, y_1 = 0$$
• There is such a y_1, namely
  $$y_1 = \arg\max_{\lVert y \rVert = 1} \lVert C_1 y \rVert$$
• Now that we can pick x_1 and y_1, complete them to orthonormal bases [ x_1  X_1 ] and [ y_1  Y_1 ]
• Then set C_2 = X_1^T C_1 Y_1
• Finally, note that
  $$\sigma_1 = x_1^T C_1 y_1 = \lVert C_1 y_1 \rVert, \qquad
  C_1^T C_1\, y_1 = \lambda_1 y_1 \;\Rightarrow\; \lVert C_1 y_1 \rVert^2 = \lambda_1 \;\Rightarrow\; \sigma_1^2 = \lambda_1$$
Derivation of SVD #3
• Now that we have
  $$D_1 = P_1^T C_1 Q_1 = \begin{pmatrix} \sigma_1 & 0 \\ 0 & C_2 \end{pmatrix}
  \quad\text{and}\quad
  D_2 = P_2^T C_2 Q_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & C_3 \end{pmatrix},$$
  set
  $$\hat{P}_2 = \begin{pmatrix} 1 & 0 \\ 0 & P_2 \end{pmatrix}, \qquad
  \hat{Q}_2 = \begin{pmatrix} 1 & 0 \\ 0 & Q_2 \end{pmatrix}$$
  so that we get
  $$\hat{P}_2^T P_1^T\, C_1\, Q_1 \hat{Q}_2 =
  \begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & C_3 \end{pmatrix}$$
• After r − 1 steps, $\hat{P}^T C \hat{Q} = D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ with $\hat{P}$ and $\hat{Q}$ orthogonal
• To complete the proof, use
  $$\begin{pmatrix} \hat{P}^T & 0 \\ 0 & I \end{pmatrix} U^T A V \begin{pmatrix} \hat{Q} & 0 \\ 0 & I \end{pmatrix}
  = \begin{pmatrix} \hat{P}^T & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \hat{Q} & 0 \\ 0 & I \end{pmatrix}
  = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}$$
The Singular Value Decomposition
• For each A in R^{m×n} of rank r, there are orthogonal matrices U_{m×m} and V_{n×n} and a diagonal matrix D = diag(σ_1, σ_2, ..., σ_r) such that
  $$A = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T
  \qquad\text{with}\quad \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$$
• The σ_i are called the nonzero singular values of A
• When r < p = min(m, n), A is said to have p − r additional zero singular values
• This factorization is called the singular value decomposition of A
• The columns of U and V are called left-hand and right-hand singular vectors of A, respectively
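A minimal NumPy sketch of the SVD and of the four orthonormal bases it supplies (illustrative; note that np.linalg.svd returns V^T, not V).

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5x4, rank 2

U, s, Vt = np.linalg.svd(A)     # full SVD: U is 5x5, Vt is 4x4, s holds singular values
r = int(np.sum(s > 1e-10))      # numerical rank (here r == 2; trailing values ~ 0)

# Reconstruction: A = U diag(s) V^T (padding diag(s) to 5x4)
S = np.zeros_like(A)
S[:len(s), :len(s)] = np.diag(s)
assert np.allclose(U @ S @ Vt, A)

# U[:, :r] spans R(A); U[:, r:] spans N(A^T); Vt[:r].T spans R(A^T); Vt[r:].T spans N(A)
assert np.allclose(A.T @ U[:, r:], 0)
assert np.allclose(A @ Vt[r:].T, 0)
```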
Left and Right Singular Vectors
• The bases for R(A^T) in V and for R(A) in U are chosen in a very special way:
  $$A V = U \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \;\Rightarrow\; A v_i = \sigma_i u_i,
  \qquad
  U^T A = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} V^T \;\Rightarrow\; u_i^T A = \sigma_i v_i^T$$
Image of the Unit Sphere
• For a non-singular matrix A in R^{n×n} with singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_n and SVD A = U D V^T, D = diag(σ_1, σ_2, ..., σ_n), the image of the unit 2-sphere is an ellipsoid whose k-th semiaxis is given by σ_k u_k
• Furthermore, v_k on the unit sphere is such that A v_k = σ_k u_k
• Figure: the unit sphere (with v_1, v_2, v_3) mapped by A to an ellipsoid with semiaxes A v_1 = σ_1 u_1, A v_2 = σ_2 u_2, A v_3 = σ_3 u_3
Connection to A^T A and A A^T
• The squared singular values σ_i^2 are the non-zero eigenvalues of both A^T A and A A^T
• Furthermore, the vectors v_i in V and u_i in U are corresponding eigenvectors of A^T A and A A^T
• Proof
  $$(A^T A)_{n \times n} = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} V^T
  \;\Rightarrow\; A^T A\, V = V \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix}
  \;\Rightarrow\; A^T A\, v_i = \sigma_i^2 v_i$$
  $$(A A^T)_{m \times m} = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix} U^T
  \;\Rightarrow\; A A^T\, U = U \begin{pmatrix} D^2 & 0 \\ 0 & 0 \end{pmatrix}
  \;\Rightarrow\; A A^T\, u_i = \sigma_i^2 u_i$$
Compact SVDs
• When a matrix A_{m×n} has many more rows than columns (n << m), it is better to use the
  • Thin SVD: A = U_n Σ_n V^T
  • Compute only the first n columns of U and Σ
  • We usually don't care about the nullspaces
• When rank(A) = r is much smaller than the number of rows or columns, use the
  • Economy/compact SVD: A = U_r Σ_r V_r^T
  • Compute and store only the first r columns
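A hedged sketch of the thin and compact forms in NumPy: full_matrices=False gives the thin SVD, and truncating at the numerical rank gives the compact SVD.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((1000, 10)) @ rng.standard_normal((10, 50))  # tall, rank 10

# Thin SVD: U is 1000x50 rather than 1000x1000
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Compact SVD: keep only the r = rank(A) leading columns/values
r = int(np.sum(s > 1e-8 * s[0]))
Ur, sr, Vtr = U[:, :r], s[:r], Vt[:r]
assert np.allclose(Ur @ np.diag(sr) @ Vtr, A)
```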
Dimensionality Reduction
• Map points in a high-dimensional space to points in a lower-dimensional space
• Try to preserve structure
  • Pairwise distances, etc.
• Useful for further processing
  • Less computation, fewer parameters
  • Easier to understand, visualize
Principal Component Analysis
• Principal Component Analysis (PCA) is a method for approximating a high-dimensional data set using a lower-dimensional affine space
• Figure: a scatter of data points with the original axes and the first and second principal components overlaid
PCA: Formal Definition
• Given a set of points D = { x_1, x_2, ..., x_N } in R^n, can we find the m-dimensional subspace V of R^n that allows us to best represent the set D?
• For each x_i, we want to find a y_i such that
  $$y_i = P x_i + a, \qquad P x_i \in \mathcal{V}, \quad a \in \mathbb{R}^n$$
• Given m, find V and a that minimize
  $$\varepsilon = \frac{1}{N} \sum_{i=1}^{N} \lVert y_i - x_i \rVert^2
  = \frac{1}{N} \sum_{i=1}^{N} \lVert a - (I - P)\, x_i \rVert^2$$
PCA: Derivation #1
• First, let us find the value of a:
  $$\varepsilon = \frac{1}{N} \sum_{i=1}^{N} \big( a - (I - P) x_i \big)^T \big( a - (I - P) x_i \big)
  = \frac{1}{N} \sum_{i=1}^{N} \Big[ a^T a - 2 a^T (I - P) x_i + x_i^T (I - P)^T (I - P) x_i \Big]$$
• Using calculus,
  $$\frac{\partial \varepsilon}{\partial [a]_k} = 0
  \;\Rightarrow\; \frac{1}{N} \sum_{i=1}^{N} \Big[ 2 e_k^T a - 2 e_k^T (I - P) x_i \Big] = 0
  \;\Rightarrow\; a = (I - P)\, \frac{1}{N} \sum_{i=1}^{N} x_i = (I - P)\, \bar{x}$$
PCA: Derivation #2
• Now that we know a, we can write
  $$\varepsilon = \frac{1}{N} \sum_{i=1}^{N} \lVert (I - P)(x_i - \bar{x}) \rVert^2$$
• Take orthonormal bases for V and its orthogonal complement and put them in matrices V and U, so that I − P = U U^T:
  $$\varepsilon = \frac{1}{N} \sum_{i=1}^{N} \lVert U U^T (x_i - \bar{x}) \rVert^2
  = \frac{1}{N} \sum_{i=1}^{N} \lVert U^T (x_i - \bar{x}) \rVert^2
  = \sum_{j=1}^{n-m} u_j^T \left( \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T \right) u_j
  = \sum_{j=1}^{n-m} u_j^T S\, u_j$$
  $$S = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{N} X X^T, \qquad [X]_{*j} = x_j - \bar{x}$$
• S is the covariance matrix, and it does not depend on U
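Not from the slides: a minimal PCA sketch following the derivation above — center the data, form the covariance matrix S, and keep the m leading eigenvectors (the helper name pca and the synthetic data are illustrative).

```python
import numpy as np

def pca(X, m):
    """X: N x n data matrix (rows are points). Returns mean, n x m basis, and N x m scores."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                        # centered data
    S = Xc.T @ Xc / X.shape[0]            # covariance matrix
    evals, evecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    W = evecs[:, ::-1][:, :m]             # top-m principal directions (basis for V)
    return x_bar, W, Xc @ W               # scores = coordinates in V

# Example: 200 points in R^5 that are approximately 2-dimensional
rng = np.random.default_rng(8)
X = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 5)) \
    + 0.01 * rng.standard_normal((200, 5))
x_bar, W, scores = pca(X, m=2)
X_approx = x_bar + scores @ W.T           # best 2-D affine approximation
print(np.mean(np.sum((X - X_approx) ** 2, axis=1)))  # small reconstruction error
```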