Review of Linear Algebra
Fereshte Khani
April 9, 2020
Outline
1. Basic Concepts and Notation
2. Matrix Multiplication
3. Operations and Properties
4. Matrix Calculus
Basic Concepts and Notation
Basic Notation

- By $x \in \mathbb{R}^n$, we denote a vector with $n$ entries:
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
- By $A \in \mathbb{R}^{m \times n}$ we denote a matrix with $m$ rows and $n$ columns, where the entries of $A$ are real numbers:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
= \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix}
= \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}$$
Here $a_j$ denotes the $j$-th column of $A$ and $a_i^T$ denotes its $i$-th row.
The Identity Matrix

The identity matrix, denoted $I \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else. That is,
$$I_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$
It has the property that for all $A \in \mathbb{R}^{m \times n}$, $AI = A = IA$. (The notation is slightly overloaded here: the identity on the right is $n \times n$, while the one on the left is $m \times m$.)
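The identity property can be checked numerically. A small NumPy sketch (NumPy itself is an assumption; the slides contain no code):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # A is 2 x 3, so m = 2, n = 3

I_n = np.eye(3)   # n x n identity multiplies on the right
I_m = np.eye(2)   # m x m identity multiplies on the left

# AI = A = IA, with identities of the appropriate sizes
right = A @ I_n
left = I_m @ A
```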
Diagonal Matrices

A diagonal matrix is a matrix where all non-diagonal elements are 0. This is typically denoted $D = \operatorname{diag}(d_1, d_2, \ldots, d_n)$, with
$$D_{ij} = \begin{cases} d_i & i = j \\ 0 & i \neq j \end{cases}$$
Clearly, $I = \operatorname{diag}(1, 1, \ldots, 1)$.
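In NumPy (used here only as an illustrative sketch), `np.diag` builds exactly this matrix from a vector of diagonal entries:

```python
import numpy as np

d = [1.0, 2.0, 3.0]
D = np.diag(d)   # builds diag(d1, d2, d3)

# diagonal entries are d_i, all off-diagonal entries are 0;
# diag(1, 1, 1) recovers the identity
I_check = np.diag([1.0, 1.0, 1.0])
```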
Vector-Vector Product

- Inner product (or dot product): given $x, y \in \mathbb{R}^n$,
$$x^T y \in \mathbb{R} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \sum_{i=1}^n x_i y_i.$$
- Outer product: given $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$,
$$xy^T \in \mathbb{R}^{m \times n} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{bmatrix}$$
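Both products in one NumPy sketch (an illustration, not part of the slides): the inner product collapses to a scalar, the outer product expands to a matrix whose $(i, j)$ entry is $x_i y_j$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y            # scalar: 1*4 + 2*5 + 3*6 = 32
outer = np.outer(x, y)   # 3 x 3 matrix with (i, j) entry x_i * y_j
```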
Matrix-Vector Product

- If we write $A$ by rows, then we can express $Ax$ as:
$$y = Ax = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} x = \begin{bmatrix} a_1^T x \\ a_2^T x \\ \vdots \\ a_m^T x \end{bmatrix}$$
- If we write $A$ by columns, then we have:
$$y = Ax = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = a_1 x_1 + a_2 x_2 + \ldots + a_n x_n \quad (1)$$
$y$ is a linear combination of the columns of $A$.
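The column view, equation (1), says $Ax$ equals an explicit linear combination of the columns of $A$. A NumPy sketch (illustrative only) confirms both computations agree:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, -1.0])

y = A @ x
# equation (1): y is the linear combination x_1 * a_1 + x_2 * a_2
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
```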
Matrix-Vector Product

It is also possible to multiply on the left by a row vector.
- If we write $A$ by columns, then we can express $x^T A$ as:
$$y^T = x^T A = x^T \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} x^T a_1 & x^T a_2 & \cdots & x^T a_n \end{bmatrix}$$
- Expressing $A$ in terms of rows, we have:
$$y^T = x^T A = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} = x_1 a_1^T + x_2 a_2^T + \ldots + x_m a_m^T$$
$y^T$ is a linear combination of the rows of $A$.
Matrix-Matrix Multiplication (different views)

1. As a set of vector-vector products:
$$C = AB = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} \begin{bmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_p \\ | & | & & | \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & a_1^T b_2 & \cdots & a_1^T b_p \\ a_2^T b_1 & a_2^T b_2 & \cdots & a_2^T b_p \\ \vdots & \vdots & \ddots & \vdots \\ a_m^T b_1 & a_m^T b_2 & \cdots & a_m^T b_p \end{bmatrix}$$
Matrix-Matrix Multiplication (different views)

2. As a sum of outer products:
$$C = AB = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_n^T \end{bmatrix} = \sum_{i=1}^n a_i b_i^T$$
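This view is easy to verify numerically: summing the outer products of $A$'s columns with $B$'s rows reproduces $AB$. A NumPy sketch (illustrative, with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# view 2: AB is the sum over i of (column i of A) outer (row i of B)
C = sum(np.outer(A[:, i], B[i, :]) for i in range(4))
```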
Matrix-Matrix Multiplication (different views)

3. As a set of matrix-vector products:
$$C = AB = A \begin{bmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_p \\ | & | & & | \end{bmatrix} = \begin{bmatrix} | & | & & | \\ Ab_1 & Ab_2 & \cdots & Ab_p \\ | & | & & | \end{bmatrix} \quad (2)$$
Here the $i$-th column of $C$ is given by the matrix-vector product with the vector on the right, $c_i = Ab_i$. These matrix-vector products can in turn be interpreted using both viewpoints given in the previous subsection.
Matrix-Matrix Multiplication (different views)

4. As a set of vector-matrix products:
$$C = AB = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} B = \begin{bmatrix} a_1^T B \\ a_2^T B \\ \vdots \\ a_m^T B \end{bmatrix}$$
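Views 3 and 4 can be checked side by side: each column of $C$ comes from a matrix-vector product, and each row from a vector-matrix product. A NumPy sketch (illustrative, random data):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = A @ B

# view 3: column i of C is A times column i of B
col0 = A @ B[:, 0]
# view 4: row i of C is row i of A times B
row0 = A[0, :] @ B
```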
Matrix-Matrix Multiplication (properties)

- Associative: $(AB)C = A(BC)$.
- Distributive: $A(B + C) = AB + AC$.
- In general, not commutative; that is, it can be the case that $AB \neq BA$. (For example, if $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times q}$, the matrix product $BA$ does not even exist if $m$ and $q$ are not equal!)
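Even when both products exist, $AB$ and $BA$ generally differ. A small NumPy sketch (illustrative) using a permutation matrix makes the asymmetry concrete:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # permutation matrix

AB = A @ B   # swaps the columns of A
BA = B @ A   # swaps the rows of A
```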
Operations and Properties
The Transpose

The transpose of a matrix results from "flipping" the rows and columns. Given a matrix $A \in \mathbb{R}^{m \times n}$, its transpose, written $A^T \in \mathbb{R}^{n \times m}$, is the $n \times m$ matrix whose entries are given by $(A^T)_{ij} = A_{ji}$.

The following properties of transposes are easily verified:
- $(A^T)^T = A$
- $(AB)^T = B^T A^T$
- $(A + B)^T = A^T + B^T$
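The order reversal in $(AB)^T = B^T A^T$ is the property most worth internalizing; a quick NumPy sketch (illustrative, random data) confirms it:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# (AB)^T = B^T A^T; note the order of the factors reverses
lhs = (A @ B).T
rhs = B.T @ A.T
```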
Trace

The trace of a square matrix $A \in \mathbb{R}^{n \times n}$, denoted $\operatorname{tr} A$, is the sum of the diagonal elements of the matrix:
$$\operatorname{tr} A = \sum_{i=1}^n A_{ii}.$$
The trace has the following properties:
- For $A \in \mathbb{R}^{n \times n}$, $\operatorname{tr} A = \operatorname{tr} A^T$.
- For $A, B \in \mathbb{R}^{n \times n}$, $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$.
- For $A \in \mathbb{R}^{n \times n}$, $t \in \mathbb{R}$, $\operatorname{tr}(tA) = t \operatorname{tr} A$.
- For $A, B$ such that $AB$ is square, $\operatorname{tr} AB = \operatorname{tr} BA$.
- For $A, B, C$ such that $ABC$ is square, $\operatorname{tr} ABC = \operatorname{tr} BCA = \operatorname{tr} CAB$, and so on for the product of more matrices.
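The cyclic property $\operatorname{tr} AB = \operatorname{tr} BA$ holds even when $AB$ and $BA$ have different sizes, which a NumPy sketch (illustrative, random data) shows nicely:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

# AB is 3 x 3 and BA is 5 x 5, yet their traces agree
tr_AB = np.trace(A @ B)
tr_BA = np.trace(B @ A)
```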
Norms

A norm of a vector, $\|x\|$, is informally a measure of the "length" of the vector. More formally, a norm is any function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies four properties:
1. For all $x \in \mathbb{R}^n$, $f(x) \geq 0$ (non-negativity).
2. $f(x) = 0$ if and only if $x = 0$ (definiteness).
3. For all $x \in \mathbb{R}^n$, $t \in \mathbb{R}$, $f(tx) = |t| f(x)$ (homogeneity).
4. For all $x, y \in \mathbb{R}^n$, $f(x + y) \leq f(x) + f(y)$ (triangle inequality).
Examples of Norms

The commonly used Euclidean or $\ell_2$ norm:
$$\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}.$$
The $\ell_1$ norm:
$$\|x\|_1 = \sum_{i=1}^n |x_i|.$$
The $\ell_\infty$ norm:
$$\|x\|_\infty = \max_i |x_i|.$$
In fact, all three norms presented so far are examples of the family of $\ell_p$ norms, which are parameterized by a real number $p \geq 1$ and defined as
$$\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}.$$
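All three norms are available through `np.linalg.norm` via its `ord` parameter. A NumPy sketch (illustrative) on the classic 3-4-5 vector:

```python
import numpy as np

x = np.array([3.0, -4.0])

l2 = np.linalg.norm(x)                # sqrt(9 + 16) = 5
l1 = np.linalg.norm(x, ord=1)         # |3| + |-4| = 7
linf = np.linalg.norm(x, ord=np.inf)  # max(|3|, |-4|) = 4
```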
Matrix Norms

Norms can also be defined for matrices, such as the Frobenius norm:
$$\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n A_{ij}^2} = \sqrt{\operatorname{tr}(A^T A)}.$$
Many other norms exist, but they are beyond the scope of this review.
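The two expressions for the Frobenius norm can be checked against each other in NumPy (an illustrative sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4))

# entrywise definition vs. the trace identity sqrt(tr(A^T A))
fro = np.linalg.norm(A, ord='fro')
via_trace = np.sqrt(np.trace(A.T @ A))
```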
Linear Independence

A set of vectors $\{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^m$ is said to be (linearly) dependent if one vector belonging to the set can be represented as a linear combination of the remaining vectors; that is, if
$$x_n = \sum_{i=1}^{n-1} \alpha_i x_i$$
for some scalar values $\alpha_1, \ldots, \alpha_{n-1} \in \mathbb{R}$; otherwise, the vectors are (linearly) independent.

Example:
$$x_1 = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} \quad x_2 = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} \quad x_3 = \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$$
are linearly dependent because $x_3 = -2x_1 + x_2$.
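The dependence in the example can be verified numerically (a NumPy sketch; the vectors are as reconstructed above from the slide):

```python
import numpy as np

x1 = np.array([1.0, 2.0, -1.0])
x2 = np.array([4.0, 1.0, 3.0])
x3 = np.array([2.0, -3.0, 5.0])

# the dependence x3 = -2*x1 + x2 holds entrywise
dependent = np.allclose(x3, -2 * x1 + x2)

# stacking the three vectors as columns gives a matrix of rank 2, not 3
rank = np.linalg.matrix_rank(np.column_stack([x1, x2, x3]))
```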
Rank of a Matrix

- The column rank of a matrix $A \in \mathbb{R}^{m \times n}$ is the size of the largest subset of columns of $A$ that constitute a linearly independent set.
- The row rank is the largest number of rows of $A$ that constitute a linearly independent set.
- For any matrix $A \in \mathbb{R}^{m \times n}$, it turns out that the column rank of $A$ is equal to the row rank of $A$ (prove it yourself!), and so both quantities are referred to collectively as the rank of $A$, denoted $\operatorname{rank}(A)$.
Properties of the Rank

- For $A \in \mathbb{R}^{m \times n}$, $\operatorname{rank}(A) \leq \min(m, n)$. If $\operatorname{rank}(A) = \min(m, n)$, then $A$ is said to be full rank.
- For $A \in \mathbb{R}^{m \times n}$, $\operatorname{rank}(A) = \operatorname{rank}(A^T)$.
- For $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, $\operatorname{rank}(AB) \leq \min(\operatorname{rank}(A), \operatorname{rank}(B))$.
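These properties can be spot-checked with `np.linalg.matrix_rank` (a NumPy sketch with random data; note that numerical rank is computed up to a tolerance):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 3))

rank_A = np.linalg.matrix_rank(A)     # at most min(4, 6) = 4
rank_At = np.linalg.matrix_rank(A.T)  # equals rank_A
rank_B = np.linalg.matrix_rank(B)
rank_AB = np.linalg.matrix_rank(A @ B)
```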