Lecture 3: Principal Component Analysis
Lin ZHANG, PhD
School of Software Engineering, Tongji University
Fall 2020
Content
• Matrix Differentiation
• Lagrange Multiplier
• Principal Component Analysis
• Eigen-face based face classification
Matrix differentiation
• Function is a vector and the variable is a scalar:
$\mathbf{f}(t) = [f_1(t), f_2(t), \ldots, f_n(t)]^T$
Definition:
$\dfrac{d\mathbf{f}}{dt} = \left[ \dfrac{df_1(t)}{dt}, \dfrac{df_2(t)}{dt}, \ldots, \dfrac{df_n(t)}{dt} \right]^T$
Matrix differentiation
• Function is a matrix and the variable is a scalar:
$\mathbf{f}(t) = \big(f_{ij}(t)\big)_{n \times m} = \begin{bmatrix} f_{11}(t) & f_{12}(t) & \cdots & f_{1m}(t) \\ f_{21}(t) & f_{22}(t) & \cdots & f_{2m}(t) \\ \vdots & \vdots & & \vdots \\ f_{n1}(t) & f_{n2}(t) & \cdots & f_{nm}(t) \end{bmatrix}$
Definition:
$\dfrac{d\mathbf{f}}{dt} = \left( \dfrac{df_{ij}(t)}{dt} \right)_{n \times m} = \begin{bmatrix} \frac{df_{11}(t)}{dt} & \frac{df_{12}(t)}{dt} & \cdots & \frac{df_{1m}(t)}{dt} \\ \frac{df_{21}(t)}{dt} & \frac{df_{22}(t)}{dt} & \cdots & \frac{df_{2m}(t)}{dt} \\ \vdots & \vdots & & \vdots \\ \frac{df_{n1}(t)}{dt} & \frac{df_{n2}(t)}{dt} & \cdots & \frac{df_{nm}(t)}{dt} \end{bmatrix}$
Matrix differentiation
• Function is a scalar and the variable is a vector:
$f(\mathbf{x}), \quad \mathbf{x} = (x_1, x_2, \ldots, x_n)^T$
Definition:
$\dfrac{df}{d\mathbf{x}} = \left[ \dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}, \ldots, \dfrac{\partial f}{\partial x_n} \right]^T$
In a similar way, for a row-vector variable $\mathbf{x} = (x_1, x_2, \ldots, x_n)$:
$\dfrac{df}{d\mathbf{x}} = \left[ \dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}, \ldots, \dfrac{\partial f}{\partial x_n} \right]$
Matrix differentiation
• Function is a vector and the variable is a vector:
$\mathbf{x} = [x_1, x_2, \ldots, x_n]^T, \quad \mathbf{y} = [y_1(\mathbf{x}), y_2(\mathbf{x}), \ldots, y_m(\mathbf{x})]^T$
Definition:
$\dfrac{d\mathbf{y}}{d\mathbf{x}^T} = \begin{bmatrix} \frac{\partial y_1(\mathbf{x})}{\partial x_1} & \frac{\partial y_1(\mathbf{x})}{\partial x_2} & \cdots & \frac{\partial y_1(\mathbf{x})}{\partial x_n} \\ \frac{\partial y_2(\mathbf{x})}{\partial x_1} & \frac{\partial y_2(\mathbf{x})}{\partial x_2} & \cdots & \frac{\partial y_2(\mathbf{x})}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_m(\mathbf{x})}{\partial x_1} & \frac{\partial y_m(\mathbf{x})}{\partial x_2} & \cdots & \frac{\partial y_m(\mathbf{x})}{\partial x_n} \end{bmatrix}_{m \times n}$
Matrix differentiation
• Function is a vector and the variable is a vector:
$\mathbf{x} = [x_1, x_2, \ldots, x_n]^T, \quad \mathbf{y} = [y_1(\mathbf{x}), y_2(\mathbf{x}), \ldots, y_m(\mathbf{x})]^T$
In a similar way,
$\dfrac{d\mathbf{y}^T}{d\mathbf{x}} = \begin{bmatrix} \frac{\partial y_1(\mathbf{x})}{\partial x_1} & \frac{\partial y_2(\mathbf{x})}{\partial x_1} & \cdots & \frac{\partial y_m(\mathbf{x})}{\partial x_1} \\ \frac{\partial y_1(\mathbf{x})}{\partial x_2} & \frac{\partial y_2(\mathbf{x})}{\partial x_2} & \cdots & \frac{\partial y_m(\mathbf{x})}{\partial x_2} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_1(\mathbf{x})}{\partial x_n} & \frac{\partial y_2(\mathbf{x})}{\partial x_n} & \cdots & \frac{\partial y_m(\mathbf{x})}{\partial x_n} \end{bmatrix}_{n \times m}$
Matrix differentiation
• Function is a vector and the variable is a vector. Example:
$\mathbf{y} = \begin{bmatrix} y_1(\mathbf{x}) \\ y_2(\mathbf{x}) \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad y_1(\mathbf{x}) = x_1^2 - x_2, \quad y_2(\mathbf{x}) = x_3^2 + 3x_2$
$\dfrac{d\mathbf{y}^T}{d\mathbf{x}} = \begin{bmatrix} \frac{\partial y_1(\mathbf{x})}{\partial x_1} & \frac{\partial y_2(\mathbf{x})}{\partial x_1} \\ \frac{\partial y_1(\mathbf{x})}{\partial x_2} & \frac{\partial y_2(\mathbf{x})}{\partial x_2} \\ \frac{\partial y_1(\mathbf{x})}{\partial x_3} & \frac{\partial y_2(\mathbf{x})}{\partial x_3} \end{bmatrix} = \begin{bmatrix} 2x_1 & 0 \\ -1 & 3 \\ 0 & 2x_3 \end{bmatrix}$
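A minimal numerical sanity check of this example (a sketch: the test point x0, the step size, and the function names are my own choices, not part of the slide):

```python
import numpy as np

def y(x):
    # The example above: y1 = x1^2 - x2, y2 = x3^2 + 3*x2
    return np.array([x[0]**2 - x[1], x[2]**2 + 3.0 * x[1]])

def dyT_dx(x, eps=1e-6):
    # Numerical d(y^T)/dx: entry (i, j) estimates dy_j/dx_i by central differences
    n, m = len(x), len(y(x))
    J = np.zeros((n, m))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        J[i] = (y(x + d) - y(x - d)) / (2 * eps)
    return J

x0 = np.array([1.0, 2.0, 3.0])  # an arbitrary test point
analytic = np.array([[2 * x0[0], 0.0], [-1.0, 3.0], [0.0, 2 * x0[2]]])
print(np.allclose(dyT_dx(x0), analytic))  # True
```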
Matrix differentiation
• Function is a scalar and the variable is a matrix:
$f(\mathbf{X}), \quad \mathbf{X} \in \mathbb{R}^{m \times n}$
Definition:
$\dfrac{df(\mathbf{X})}{d\mathbf{X}} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{12}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f}{\partial x_{m1}} & \frac{\partial f}{\partial x_{m2}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{bmatrix}$
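The definition can be checked entry by entry with finite differences. A small sketch (the test function f(X) = sum of x_ij^2, whose gradient is 2X by the definition above, is my own illustrative choice):

```python
import numpy as np

def grad_matrix(f, X, eps=1e-6):
    # Entry (i, j) is the central-difference estimate of df/dx_ij
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            D = np.zeros_like(X)
            D[i, j] = eps
            G[i, j] = (f(X + D) - f(X - D)) / (2 * eps)
    return G

X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# f(X) = sum of x_ij^2, so each partial is 2*x_ij and df/dX = 2X
print(np.allclose(grad_matrix(lambda M: (M ** 2).sum(), X), 2 * X))  # True
```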
Matrix differentiation
• Useful results
(1) $\mathbf{x}, \mathbf{a} \in \mathbb{R}^{n \times 1}$. Then
$\dfrac{d\,\mathbf{a}^T\mathbf{x}}{d\mathbf{x}} = \dfrac{d\,\mathbf{x}^T\mathbf{a}}{d\mathbf{x}} = \mathbf{a}$
How to prove?
Matrix differentiation
• Useful results
(2) $\mathbf{A} \in \mathbb{R}^{m \times n}, \mathbf{x} \in \mathbb{R}^{n \times 1}$. Then $\dfrac{d\,\mathbf{A}\mathbf{x}}{d\mathbf{x}^T} = \mathbf{A}$
(3) $\mathbf{A} \in \mathbb{R}^{m \times n}, \mathbf{x} \in \mathbb{R}^{n \times 1}$. Then $\dfrac{d\,\mathbf{x}^T\mathbf{A}^T}{d\mathbf{x}} = \mathbf{A}^T$
(4) $\mathbf{A} \in \mathbb{R}^{n \times n}, \mathbf{x} \in \mathbb{R}^{n \times 1}$. Then $\dfrac{d\,\mathbf{x}^T\mathbf{A}\mathbf{x}}{d\mathbf{x}} = (\mathbf{A} + \mathbf{A}^T)\mathbf{x}$
(5) $\mathbf{X} \in \mathbb{R}^{m \times n}, \mathbf{a} \in \mathbb{R}^{m \times 1}, \mathbf{b} \in \mathbb{R}^{n \times 1}$. Then $\dfrac{d\,\mathbf{a}^T\mathbf{X}\mathbf{b}}{d\mathbf{X}} = \mathbf{a}\mathbf{b}^T$
(6) $\mathbf{X} \in \mathbb{R}^{n \times m}, \mathbf{a} \in \mathbb{R}^{m \times 1}, \mathbf{b} \in \mathbb{R}^{n \times 1}$. Then $\dfrac{d\,\mathbf{a}^T\mathbf{X}^T\mathbf{b}}{d\mathbf{X}} = \mathbf{b}\mathbf{a}^T$
(7) $\mathbf{x} \in \mathbb{R}^{n \times 1}$. Then $\dfrac{d\,\mathbf{x}^T\mathbf{x}}{d\mathbf{x}} = 2\mathbf{x}$
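These identities are easy to sanity-check numerically. A sketch for (1), (4), and (7) (the random test data and seed are arbitrary; (2), (3), (5), and (6) can be checked the same way):

```python
import numpy as np

def grad(f, x, eps=1e-6):
    # Central-difference gradient of a scalar function f at the point x
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
a = rng.standard_normal(n)
A = rng.standard_normal((n, n))

print(np.allclose(grad(lambda v: a @ v, x), a))                  # (1) d(a^T x)/dx = a
print(np.allclose(grad(lambda v: v @ A @ v, x), (A + A.T) @ x))  # (4) d(x^T A x)/dx = (A + A^T) x
print(np.allclose(grad(lambda v: v @ v, x), 2 * x))              # (7) d(x^T x)/dx = 2x
```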
Content
• Matrix Differentiation
• Lagrange Multiplier
• Principal Component Analysis
• Eigen-face based face classification
Lagrange multiplier
• Single-variable function: $f(x)$ is differentiable in $(a, b)$. If $f(x)$ achieves an extremum at $x_0 \in (a, b)$, then
$\left. \dfrac{df}{dx} \right|_{x_0} = 0$
• Two-variable function: $f(x, y)$ is differentiable in its domain. If $f(x, y)$ achieves an extremum at $(x_0, y_0)$, then
$\left. \dfrac{\partial f}{\partial x} \right|_{(x_0, y_0)} = 0, \quad \left. \dfrac{\partial f}{\partial y} \right|_{(x_0, y_0)} = 0$
Lagrange multiplier
• In the general case, $\mathbf{x} \in \mathbb{R}^{n \times 1}$: if $f(\mathbf{x})$ achieves a local extremum at $\mathbf{x}_0$ and is differentiable at $\mathbf{x}_0$, then $\mathbf{x}_0$ is a stationary point of $f(\mathbf{x})$, i.e.,
$\left. \dfrac{\partial f}{\partial x_1} \right|_{\mathbf{x}_0} = 0, \quad \left. \dfrac{\partial f}{\partial x_2} \right|_{\mathbf{x}_0} = 0, \quad \ldots, \quad \left. \dfrac{\partial f}{\partial x_n} \right|_{\mathbf{x}_0} = 0$
Or, in other words,
$\nabla f(\mathbf{x}) \big|_{\mathbf{x} = \mathbf{x}_0} = \mathbf{0}$
Lagrange multiplier
• The Lagrange multiplier method is a strategy for finding the stationary points of a function subject to equality constraints.
Problem: find the stationary points of $y = f(\mathbf{x}), \ \mathbf{x} \in \mathbb{R}^{n \times 1}$, under the $m$ constraints $g_k(\mathbf{x}) = 0, \ k = 1, 2, \ldots, m$.
Solution: construct
$F(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = f(\mathbf{x}) + \sum_{k=1}^{m} \lambda_k g_k(\mathbf{x})$
If $(\mathbf{x}_0, \lambda_{10}, \lambda_{20}, \ldots, \lambda_{m0})$ is a stationary point of $F$, then $\mathbf{x}_0$ is a stationary point of $f(\mathbf{x})$ subject to the constraints.
[Portrait: Joseph-Louis Lagrange, Jan. 25, 1736 to Apr. 10, 1813]
Lagrange multiplier
• Problem: find the stationary points of $y = f(\mathbf{x}), \ \mathbf{x} \in \mathbb{R}^{n \times 1}$, under the $m$ constraints $g_k(\mathbf{x}) = 0, \ k = 1, 2, \ldots, m$.
Solution: construct $F(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = f(\mathbf{x}) + \sum_{k=1}^{m} \lambda_k g_k(\mathbf{x})$
$(\mathbf{x}_0, \lambda_{10}, \ldots, \lambda_{m0})$ being a stationary point of $F$ means that, at that point,
$\dfrac{\partial F}{\partial x_1} = 0, \ \dfrac{\partial F}{\partial x_2} = 0, \ \ldots, \ \dfrac{\partial F}{\partial x_n} = 0, \quad \dfrac{\partial F}{\partial \lambda_1} = 0, \ \dfrac{\partial F}{\partial \lambda_2} = 0, \ \ldots, \ \dfrac{\partial F}{\partial \lambda_m} = 0$
That is a system of $n + m$ equations!
Lagrange multiplier
• Example. Problem: for a given point $p_0 = (1, 0)$, among all the points lying on the line $y = x$, identify the one having the least distance to $p_0$. The squared distance is
$f(x, y) = (x - 1)^2 + (y - 0)^2$
We want to find the stationary point of $f(x, y)$ under the constraint
$g(x, y) = x - y = 0$
According to the Lagrange multiplier method, construct
$F(x, y, \lambda) = f(x, y) + \lambda\, g(x, y) = (x - 1)^2 + y^2 + \lambda (x - y)$
and find the stationary points of $F(x, y, \lambda)$.
[Figure: the point $p_0$ and the line $y = x$]
Lagrange multiplier
• Example (continued). Setting the partial derivatives of $F$ to zero:
$\dfrac{\partial F}{\partial x} = 0: \quad 2(x - 1) + \lambda = 0$
$\dfrac{\partial F}{\partial y} = 0: \quad 2y - \lambda = 0$
$\dfrac{\partial F}{\partial \lambda} = 0: \quad x - y = 0$
Solving gives $x = 0.5, \ y = 0.5, \ \lambda = 1$.
$(0.5, 0.5, 1)$ is a stationary point of $F(x, y, \lambda)$, so $(0.5, 0.5)$ is a stationary point of $f(x, y)$ under the constraint.
[Figure: $p_0$, the line $y = x$, and the closest point $(0.5, 0.5)$]
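The same stationary point can be obtained symbolically. A minimal sketch using SymPy (the variable names are my own):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam')
# F(x, y, lam) = f + lam*g with f = (x-1)^2 + y^2 and g = x - y
F = (x - 1)**2 + y**2 + lam * (x - y)
# Stationary points of F: all first-order partial derivatives vanish
sols = sp.solve([sp.diff(F, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sols)  # [{x: 1/2, y: 1/2, lam: 1}]
```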
Content
• Matrix Differentiation
• Lagrange Multiplier
• Principal Component Analysis
• Eigen-face based face classification
Principal Component Analysis (PCA)
• PCA converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components
• This transformation is defined in such a way that the first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (i.e., uncorrelated with) the preceding components
Principal Component Analysis (PCA)
• Illustration. Along which orientation do the data points scatter most? How to find it? De-correlation!
$(x, y)$ data:
(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
(2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)
[Figure: scatter plot of the ten points]
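A compact sketch of how the orientation of largest scatter can be found for these ten points via an eigen-decomposition of the sample covariance matrix (one standard route; the variable names are my own):

```python
import numpy as np

# The ten (x, y) observations from the slide
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

Xc = X - X.mean(axis=0)           # 1. center the data
C = Xc.T @ Xc / (len(X) - 1)      # 2. sample covariance matrix (2 x 2)
evals, evecs = np.linalg.eigh(C)  # 3. eigen-decomposition (eigh returns ascending eigenvalues)
order = np.argsort(evals)[::-1]   #    reorder by decreasing variance
evals, evecs = evals[order], evecs[:, order]

print(evecs[:, 0])   # first principal component: the orientation of largest scatter
print(evals)         # variance captured along each principal direction
scores = Xc @ evecs  # 4. project onto the principal directions
print(np.round(np.cov(scores, rowvar=False), 6))  # off-diagonals ~ 0: de-correlated
```

The printed covariance of the projected coordinates is (numerically) diagonal, which is exactly the de-correlation the slide asks for.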