
Linear Algebra for Machine Learning - Sargur N. Srihari



  1. Linear Algebra for Machine Learning
     Sargur N. Srihari, srihari@cedar.buffalo.edu

  2. Overview
     • Linear Algebra is based on continuous rather than discrete mathematics
       – Computer scientists often have little experience with it
     • It is essential for understanding ML algorithms
     • Here we discuss:
       – Scalars, vectors, matrices, tensors
       – Multiplying matrices and vectors
       – Inverse, span, linear independence
       – SVD, PCA

  3. Scalar
     • A single number
     • Represented in lower-case italic, e.g., x
     • Defining a real-valued scalar: let x ∈ ℝ be the slope of the line
     • Defining a natural-number scalar: let n ∈ ℕ be the number of units

  4. Vector
     • An array of numbers arranged in order
     • Each number is identified by an index
     • Vectors are shown in lower-case bold:
       x = [x_1, x_2, ..., x_n]^T, i.e., x^T = [x_1, x_2, ..., x_n]
     • If each element is in ℝ then x ∈ ℝ^n
     • We think of vectors as points in space
       – Each element gives the coordinate along an axis

  5. Matrix
     • 2-D array of numbers
     • Each element is identified by two indices
     • Denoted by bold typeface A, with elements A_{m,n}
       – E.g., A = [A_{1,1}  A_{1,2}
                    A_{2,1}  A_{2,2}]
     • A_{i,:} is the i-th row of A; A_{:,j} is the j-th column of A
     • If A has height m and width n with real values, then A ∈ ℝ^{m×n}

  6. Tensor
     • Sometimes we need an array with more than two axes
     • An array arranged on a regular grid with a variable number of axes is referred to as a tensor
     • Denote a tensor with bold typeface: A
     • Element (i, j, k) of the tensor is denoted A_{i,j,k}
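
The scalar/vector/matrix/tensor hierarchy maps directly onto array rank. A minimal sketch in NumPy (the library choice is mine; the slides name none):

```python
import numpy as np

x = 3.7                                   # scalar: a single number
v = np.array([1.0, 2.0, 3.0])             # vector: 1-D array, element v[i]
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])                # matrix: 2-D array, element A[i, j]
T = np.zeros((2, 3, 4))                   # tensor: 3-D array, element T[i, j, k]

print(v.shape, A.shape, T.shape)          # (3,) (2, 2) (2, 3, 4)
```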

  7. Transpose of a Matrix
     • Mirror image across the principal diagonal: (A^T)_{i,j} = A_{j,i}
       – E.g., for a 3×3 matrix:
         A = [A_{1,1} A_{1,2} A_{1,3}         A^T = [A_{1,1} A_{2,1} A_{3,1}
              A_{2,1} A_{2,2} A_{2,3}    ⇒           A_{1,2} A_{2,2} A_{3,2}
              A_{3,1} A_{3,2} A_{3,3}]               A_{1,3} A_{2,3} A_{3,3}]
     • Vectors are matrices with a single column
       – Often written in-line using the transpose: x = [x_1, ..., x_n]^T
     • Since a scalar is a matrix with one element, a = a^T
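
A quick numerical check of the mirror-image property, as a sketch:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                 # shape (2, 3)
print(A.T)                                # shape (3, 2): rows and columns swapped
assert A.T[0, 1] == A[1, 0]               # (A^T)_{i,j} = A_{j,i}
```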

  8. Matrix Addition
     • If A and B have the same shape (height m, width n):
       C = A + B ⇒ C_{i,j} = A_{i,j} + B_{i,j}
     • A matrix can be multiplied by a scalar and have a scalar added to it:
       D = aB + c ⇒ D_{i,j} = a·B_{i,j} + c
     • A vector can be added to a matrix (non-standard matrix algebra):
       C = A + b ⇒ C_{i,j} = A_{i,j} + b_j
       – Called broadcasting, since vector b is added to each row of A
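
A minimal sketch of these three additions in NumPy, whose broadcasting rules implement exactly the row-wise vector addition described above:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.ones((2, 2))
b = np.array([10.0, 20.0])

C = A + B        # same-shape addition: C_{i,j} = A_{i,j} + B_{i,j}
D = 2 * B + 5    # scalar multiply and scalar add: D_{i,j} = 2*B_{i,j} + 5
E = A + b        # broadcasting: b is added to each row of A
print(E)         # [[11. 22.]
                 #  [13. 24.]]
```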

  9. Multiplying Matrices
     • For the product C = AB to be defined, A must have the same number of columns as B has rows
     • If A has shape m×n and B has shape n×p, then the matrix product C has shape m×p:
       C = AB ⇒ C_{i,j} = Σ_k A_{i,k} B_{k,j}
     • Note that the standard matrix product is not just the element-wise product of individual elements
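
A sketch checking the shape rule and the definition against NumPy's built-in product:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)    # shape (2, 3): m=2, n=3
B = np.arange(12).reshape(3, 4)   # shape (3, 4): n=3, p=4

C = A @ B                         # matrix product, shape (2, 4)

# recompute from the definition C_{i,j} = sum_k A_{i,k} * B_{k,j}
C_manual = np.array([[sum(A[i, k] * B[k, j] for k in range(3))
                      for j in range(4)]
                     for i in range(2)])
assert np.array_equal(C, C_manual)

# by contrast, the element-wise product requires equal shapes
H = A * A                         # Hadamard product, not the matrix product
```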

  10. Multiplying Vectors
      • The dot product of two vectors x and y of the same dimensionality is the matrix product x^T y
      • Conversely, the matrix product C = AB can be viewed as computing each C_{i,j} as the dot product of row i of A and column j of B

  11. Matrix Product Properties
      • Distributivity over addition: A(B + C) = AB + AC
      • Associativity: A(BC) = (AB)C
      • Not commutative: AB = BA is not always true
      • The dot product between vectors is commutative: x^T y = y^T x
      • The transpose of a matrix product has a simple form: (AB)^T = B^T A^T
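
These properties can be spot-checked numerically; a sketch with random matrices (the random-testing setup is my addition, not the slide's):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

assert np.allclose(A @ (B + C), A @ B + A @ C)  # distributivity
assert np.allclose(A @ (B @ C), (A @ B) @ C)    # associativity
assert np.allclose((A @ B).T, B.T @ A.T)        # transpose of a product
print(np.allclose(A @ B, B @ A))                # False: not commutative in general

x, y = rng.standard_normal(3), rng.standard_normal(3)
assert np.isclose(x @ y, y @ x)                 # dot product is commutative
```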

  12. Linear Transformation
      • Ax = b, where A ∈ ℝ^{n×n} and b ∈ ℝ^n
      • More explicitly, n equations in n unknowns:
        A_{1,1} x_1 + A_{1,2} x_2 + ... + A_{1,n} x_n = b_1
        A_{2,1} x_1 + A_{2,2} x_2 + ... + A_{2,n} x_n = b_2
        ...
        A_{n,1} x_1 + A_{n,2} x_2 + ... + A_{n,n} x_n = b_n
      • In matrix form, A is n×n, x is n×1, and b is n×1; we can view A as a linear transformation of vector x to vector b
      • Sometimes we wish to solve for the unknowns x = {x_1, ..., x_n} when A and b provide constraints
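
A small illustration of A acting as a transformation; the 90-degree rotation matrix is my choice of example, not the slide's:

```python
import numpy as np

theta = np.pi / 2                              # rotate by 90 degrees
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.array([1.0, 0.0])
b = A @ x                                      # A transforms x into b
print(np.round(b, 6))                          # [0. 1.]
```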

  13. Identity and Inverse Matrices
      • Matrix inversion is a powerful tool for analytically solving Ax = b
      • It needs the concept of an identity matrix
      • The identity matrix does not change the value of a vector when we multiply the vector by it
        – Denote the identity matrix that preserves n-dimensional vectors as I_n
        – Formally, I_n ∈ ℝ^{n×n} and ∀x ∈ ℝ^n, I_n x = x
        – Example, I_3:
          [1 0 0
           0 1 0
           0 0 1]

  14. Matrix Inverse
      • The inverse of a square matrix A is defined by A^{-1} A = I_n
      • We can now solve Ax = b as follows:
        Ax = b
        A^{-1} A x = A^{-1} b
        I_n x = A^{-1} b
        x = A^{-1} b
      • This depends on being able to find A^{-1}
      • If A^{-1} exists, there are several methods for finding it
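
A sketch of this derivation in code. Note that np.linalg.solve, which factorizes A rather than inverting it, is what is preferred in practice (see slide 18):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.inv(A) @ b      # textbook route: x = A^{-1} b
x2 = np.linalg.solve(A, b)    # preferred route: solve Ax = b directly
assert np.allclose(x, x2) and np.allclose(A @ x, b)
print(x)                      # [0.8 1.4]
```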

  15. Solving Simultaneous Equations
      • Ax = b, where
        – A is (M+1) × (M+1)
        – x is (M+1) × 1: the set of weights to be determined
        – b is (M+1) × 1
      • Two closed-form solutions:
        1. Matrix inversion: x = A^{-1} b
        2. Gaussian elimination

  16. Linear Equations: Closed-Form Solutions
      1. Matrix formulation: Ax = b, with solution x = A^{-1} b
      2. Gaussian elimination followed by back-substitution, using row operations such as:
         L2 - 3·L1 → L2
         L3 - 2·L1 → L3
         -L2/4 → L2
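
A minimal sketch of Gaussian elimination with back-substitution, assuming a square nonsingular system; partial pivoting is added to avoid the division-by-small-number instability mentioned on slide 18:

```python
import numpy as np

def gaussian_solve(A, b):
    """Solve Ax = b by forward elimination then back-substitution."""
    A, b = A.astype(float).copy(), b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))    # partial pivoting: pick largest pivot
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]              # row op, e.g., L2 - 3*L1 -> L2
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):             # back-substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gaussian_solve(A, b))                    # [ 2.  3. -1.]
```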

  17. Example: System of Linear Equations in Linear Regression
      • Instead of Ax = b, we have Φw = t
        – where Φ is the design matrix of m features (basis functions φ_i(x_j)) for samples x_j, and t is the vector of sample targets
        – We need the weights w to be used with the m basis functions to determine the output
          y(x, w) = Σ_{i=1}^{m} w_i φ_i(x)
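
A hedged sketch of the Φw = t setup, using polynomial basis functions φ_i(x) = x^i; the basis choice and the synthetic data are my assumptions, since the slide fixes neither:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)                               # samples x_j
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)   # noisy targets

m = 4                                          # number of basis functions
Phi = np.vander(x, m, increasing=True)         # design matrix: Phi[j, i] = x_j**i

# least-squares solution of Phi w = t (Phi is tall, so no exact inverse exists)
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

y = Phi @ w                                    # y(x, w) = sum_i w_i * phi_i(x)
print(w)
```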

  18. Disadvantages of Closed-Form Solutions
      • If A^{-1} exists, the same A^{-1} can be used for any given b
        – But on a digital computer, A^{-1} can be represented only with limited precision
        – It is not used in practice
      • Gaussian elimination also has disadvantages:
        – Numerical instability (division by a small number)
        – O(n^3) cost for an n×n matrix
      • Software solutions use the value of b in finding x
        – E.g., the difference (derivative) between b and the current output is used iteratively
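
The last point, that software uses b iteratively via the difference between b and the current output, can be read as gradient descent on the squared residual; a sketch under that reading:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.zeros(2)
lr = 0.1                           # step size, chosen small enough to converge
for _ in range(500):
    residual = A @ x - b           # difference between current output and b
    x -= lr * (A.T @ residual)     # gradient step on (1/2)||Ax - b||^2
print(x)                           # converges to [2. 3.] = np.linalg.solve(A, b)
```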

  19. How Many Solutions for Ax = b Exist?
      • A system of equations with n variables and m equations:
        A_{1,1} x_1 + A_{1,2} x_2 + ... + A_{1,n} x_n = b_1
        A_{2,1} x_1 + A_{2,2} x_2 + ... + A_{2,n} x_n = b_2
        ...
        A_{m,1} x_1 + A_{m,2} x_2 + ... + A_{m,n} x_n = b_m
      • The solution is x = A^{-1} b
      • For A^{-1} to exist, Ax = b must have exactly one solution for every value of b
        – It is also possible for the system of equations to have no solutions, or infinitely many solutions, for some values of b
        – It is not possible to have more than one but fewer than infinitely many solutions
          • If x and y are both solutions, then z = αx + (1-α)y is a solution for any real α

  20. Span of a Set of Vectors
      • The span of a set of vectors is the set of points obtainable as linear combinations of those vectors
        – A linear combination of vectors {v^(1), ..., v^(n)} with coefficients c_i is Σ_i c_i v^(i)
      • For the system of equations Ax = b:
        – A column of A, i.e., A_{:,i}, specifies travel in direction i
        – How much we need to travel is given by x_i
        – This is a linear combination of the columns: Ax = Σ_i x_i A_{:,i}
      • Thus determining whether Ax = b has a solution is equivalent to determining whether b is in the span of the columns of A
        – This span is referred to as the column space, or range, of A
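
The span test can be run numerically: Ax = b is consistent exactly when appending b to the columns of A does not increase the rank (a standard fact, not stated on the slide). A sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])         # rank 1: second column is 2x the first
b_in = np.array([1.0, 2.0, 3.0])   # lies in the span of the columns
b_out = np.array([1.0, 0.0, 0.0])  # does not lie in the span

for b in (b_in, b_out):
    augmented = np.column_stack([A, b])
    in_span = np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)
    print(in_span)                 # True, then False
```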

  21. Conditions for a Solution to Ax = b
      • For a solution to exist for every b ∈ ℝ^m, we require the column space of A to be all of ℝ^m
        – Necessary condition: n ≥ m
        – Sufficient condition: for the column space to encompass all of ℝ^m, A must contain at least one set of m linearly independent columns
      • If some columns are linear combinations of other columns, the column space is less than ℝ^m
        – The columns are linearly dependent; a square matrix with linearly dependent columns is singular
      • For solution by matrix inversion, the matrix must be square, i.e., m = n, and all columns must be linearly independent
      • For non-square and singular matrices, methods other than matrix inversion are used

  22. Norms
      • Used for measuring the size of a vector
      • Norms map vectors to non-negative values
      • The norm of vector x is the distance from the origin to x
      • A norm is any function f that satisfies:
        – f(x) = 0 ⇒ x = 0
        – f(x + y) ≤ f(x) + f(y)  (triangle inequality)
        – f(αx) = |α| f(x)  ∀α ∈ ℝ

  23. L^p Norm
      • Definition: ||x||_p = (Σ_i |x_i|^p)^{1/p}
      • L^2 norm
        – Called the Euclidean norm, written simply as ||x||
        – The squared Euclidean norm equals x^T x
      • L^1 norm
        – Useful when zero and non-zero elements must be distinguished (the L^2 norm increases slowly near the origin, e.g., 0.1^2 = 0.01)
      • L^∞ norm: ||x||_∞ = max_i |x_i|
        – Called the max norm
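
A sketch of the three norms in NumPy:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l2 = np.linalg.norm(x)             # Euclidean (L^2) norm: 5.0
l1 = np.linalg.norm(x, 1)          # L^1 norm, sum of |x_i|: 7.0
linf = np.linalg.norm(x, np.inf)   # max norm, max |x_i|: 4.0

assert np.isclose(l2 ** 2, x @ x)  # squared L^2 norm equals x^T x
print(l2, l1, linf)
```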

  24. Size of a Matrix
      • Frobenius norm: ||A||_F = (Σ_{i,j} A_{i,j}^2)^{1/2}
      • It is analogous to the L^2 norm of a vector
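
For matrices, np.linalg.norm defaults to the Frobenius norm; a one-line check against the definition:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro = np.linalg.norm(A)                          # Frobenius norm by default
assert np.isclose(fro, np.sqrt((A ** 2).sum()))  # matches the definition
print(fro)                                       # sqrt(30) ≈ 5.477
```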

  25. Angle Between Vectors
      • The dot product of two vectors can be written in terms of their L^2 norms and the angle θ between them:
        x^T y = ||x||_2 ||y||_2 cos θ
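
Rearranging gives the angle itself; a sketch (the clip guards against rounding pushing cos θ just outside [-1, 1]):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
print(np.degrees(theta))           # 45.0
```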
