Linear Algebra
Shan-Hung Wu (shwu@cs.nthu.edu.tw)
Department of Computer Science, National Tsing Hua University, Taiwan
Large-Scale ML, Fall 2016
Outline
1. Span & Linear Dependence
2. Norms
3. Eigendecomposition
4. Singular Value Decomposition
5. Traces and Determinant
Matrix Representation of Linear Functions
A linear function (or map, or transformation) f : R^n → R^m can be represented by a matrix A ∈ R^{m×n} such that
  f(x) = Ax = y, ∀x ∈ R^n, y ∈ R^m
span(A_{:,1}, …, A_{:,n}) is called the column space of A
rank(A) = dim(span(A_{:,1}, …, A_{:,n}))
System of Linear Equations
Given A and y, solve for x in Ax = y
What kind of A makes Ax = y always have a solution?
  Since Ax = Σ_i x_i A_{:,i}, the column space of A must contain R^m, i.e., R^m ⊆ span(A_{:,1}, …, A_{:,n})
  This implies n ≥ m
When does Ax = y always have exactly one solution?
  A must have at most m columns; otherwise there is more than one x parametrizing each y
  This implies n = m and the columns of A are linearly independent
  In this case A^{-1} exists, and x = A^{-1}y
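The uniqueness claim above can be checked numerically. A minimal NumPy sketch (not part of the original slides; the matrix values are made up for illustration): for a square A with linearly independent columns, Ax = y has exactly one solution, and solving directly agrees with forming A^{-1} explicitly.

```python
import numpy as np

# Square matrix with linearly independent columns (n = m = 2); values are illustrative
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([5.0, 10.0])

# Solve Ax = y directly (numerically preferred over forming the inverse)
x = np.linalg.solve(A, y)

# The same solution via x = A^{-1} y
x_inv = np.linalg.inv(A) @ y

solves_system = np.allclose(A @ x, y)  # the solution satisfies the system
routes_agree = np.allclose(x, x_inv)   # both routes give the same x
```

In practice `np.linalg.solve` is used instead of `np.linalg.inv` because it is faster and more numerically stable.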
Vector Norms
A norm of vectors is a function ‖·‖ that maps vectors to non-negative values and satisfies:
  ‖x‖ = 0 ⇒ x = 0
  ‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality)
  ‖cx‖ = |c|·‖x‖, ∀c ∈ R
E.g., the L^p norm: ‖x‖_p = (Σ_i |x_i|^p)^{1/p}
  L^2 (Euclidean) norm: ‖x‖ = (x^T x)^{1/2}
  L^1 norm: ‖x‖_1 = Σ_i |x_i|
  Max norm: ‖x‖_∞ = max_i |x_i|
x^T y = ‖x‖‖y‖ cos θ, where θ is the angle between x and y
x and y are orthonormal iff x^T y = 0 (orthogonal) and ‖x‖ = ‖y‖ = 1 (unit vectors)
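These norms are all available through NumPy's `norm`. A quick sketch (the vectors are made up for illustration):

```python
import numpy as np

x = np.array([3.0, -4.0])

l2 = np.linalg.norm(x)                # L^2 norm: sqrt(3^2 + 4^2) = 5
l1 = np.linalg.norm(x, ord=1)         # L^1 norm: |3| + |-4| = 7
linf = np.linalg.norm(x, ord=np.inf)  # max norm: max(|3|, |4|) = 4

# Triangle inequality: ||x + y|| <= ||x|| + ||y||
y = np.array([1.0, 2.0])
triangle_holds = np.linalg.norm(x + y) <= l2 + np.linalg.norm(y)

# Angle identity: x^T y = ||x|| ||y|| cos(theta)
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
```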
Matrix Norms
Frobenius norm: ‖A‖_F = (Σ_{i,j} A_{i,j}^2)^{1/2}
  Analogous to the L^2 norm of a vector
An orthogonal matrix is a square matrix whose columns (resp. rows) are mutually orthonormal, i.e., A^T A = I = A A^T
  This implies A^{-1} = A^T
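To illustrate (a sketch, not from the slides): a 2×2 rotation matrix is orthogonal, so Q^T Q = I = Q Q^T and its inverse is simply its transpose.

```python
import numpy as np

# A rotation matrix is orthogonal: its columns are mutually orthonormal
theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

I = np.eye(2)
ortho = np.allclose(Q.T @ Q, I) and np.allclose(Q @ Q.T, I)  # Q^T Q = I = Q Q^T
inv_is_transpose = np.allclose(np.linalg.inv(Q), Q.T)        # Q^{-1} = Q^T

# Frobenius norm: the L^2 norm of the flattened matrix
frob = np.linalg.norm(Q, 'fro')  # here cos^2 + sin^2 per row gives sqrt(2)
```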
Decomposition
Integers can be decomposed into prime factors
  E.g., 12 = 2 × 2 × 3
  This helps identify useful properties, e.g., 12 is not divisible by 5
Can we decompose matrices to identify information about their functional properties more easily?
Eigenvectors and Eigenvalues
An eigenvector of a square matrix A is a non-zero vector v such that multiplication by A alters only the scale of v:
  Av = λv,
where λ ∈ R is called the eigenvalue corresponding to this eigenvector
If v is an eigenvector, so is any scaling cv, c ∈ R, c ≠ 0, and cv has the same eigenvalue
  Thus, we usually look for unit eigenvectors
Eigendecomposition I
Every real symmetric matrix A ∈ R^{n×n} can be decomposed into
  A = Q diag(λ) Q^T
λ ∈ R^n consists of the real-valued eigenvalues (usually sorted in descending order)
Q = [v^{(1)}, …, v^{(n)}] is an orthogonal matrix whose columns are the corresponding eigenvectors
Eigendecomposition may not be unique: when two or more eigenvectors share the same eigenvalue, any set of orthogonal vectors lying in their span are also eigenvectors with that eigenvalue
What can we tell after decomposition?
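This decomposition can be computed with NumPy's `eigh`, which is specialized for symmetric matrices. A sketch with a made-up symmetric matrix (note that `eigh` returns eigenvalues in ascending order, not the descending order mentioned above):

```python
import numpy as np

# A real symmetric matrix (values chosen for illustration)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns real eigenvalues (ascending) and orthonormal eigenvectors
lam, Q = np.linalg.eigh(A)

# Q is orthogonal, and A is reconstructed as Q diag(lambda) Q^T
Q_orthogonal = np.allclose(Q.T @ Q, np.eye(2))
A_rec = Q @ np.diag(lam) @ Q.T
```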
Eigendecomposition II
Because Q = [v^{(1)}, …, v^{(n)}] is an orthogonal matrix, we can think of A as scaling space by λ_i in direction v^{(i)}
Rayleigh’s Quotient
Theorem (Rayleigh’s Quotient): Given a symmetric matrix A ∈ R^{n×n}, then ∀x ∈ R^n,
  λ_min ≤ (x^T A x)/(x^T x) ≤ λ_max,
where λ_min and λ_max are the smallest and largest eigenvalues of A.
(x^T A x)/(x^T x) = λ_i when x is the eigenvector corresponding to λ_i
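A numerical sanity check of the theorem (a sketch with a random symmetric matrix, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric matrix by symmetrizing
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
lam, Q = np.linalg.eigh(A)  # eigenvalues in ascending order

# The Rayleigh quotient of any x lies in [lambda_min, lambda_max]
x = rng.standard_normal(4)
r = (x @ A @ x) / (x @ x)
in_bounds = lam[0] - 1e-12 <= r <= lam[-1] + 1e-12

# At an eigenvector, the quotient equals its eigenvalue
v = Q[:, -1]  # eigenvector of the largest eigenvalue
r_max = (v @ A @ v) / (v @ v)
```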
Singularity
Suppose A = Q diag(λ) Q^T; then A^{-1} = Q diag(λ)^{-1} Q^T
A is non-singular (invertible) iff none of the eigenvalues is zero
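A short NumPy check of this identity (the matrix is an illustrative assumption of this sketch):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

lam, Q = np.linalg.eigh(A)       # eigendecomposition of a symmetric matrix
nonsingular = np.all(lam != 0)   # invertible iff no eigenvalue is zero

# A^{-1} = Q diag(lambda)^{-1} Q^T: invert by reciprocating the eigenvalues
A_inv = Q @ np.diag(1.0 / lam) @ Q.T
```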