Nonlinear Dimension Reduction Using Kernel Representations

Katie Kempfert
University of North Carolina Wilmington
Statistical and Machine Learning REU
July 25, 2017
Outline
◮ Introduction
◮ Theoretical Development
    ◮ Kernel Principal Component Analysis (KPCA)
    ◮ Supervised Kernel Principal Component Analysis (SKPCA)
    ◮ Kernel Fisher’s Discriminant Analysis (KFDA)
◮ Simulation
◮ Application
◮ Conclusion
Introduction
◮ Dimensionality reduction techniques have become more popular with the rise of big data.
◮ In particular, dimensionality reduction is important for image processing.
◮ Features extracted from images often have very high dimension, in large part because they contain redundant information and noise.
◮ Dimensionality reduction can be used to find meaningful patterns in the data.
Kernel Principal Component Analysis (KPCA)
Principal Component Analysis (PCA)
◮ Let $X$ be an $n \times d$ data matrix with covariance matrix $\Sigma$.
◮ In standard PCA, we assume the directions of variability in $X$ are linear.
◮ Hence, we seek a transformation
$$Y_{n \times p} = X_{n \times d} \, A_{d \times p}, \tag{1}$$
such that $A$ is an orthogonal matrix.
◮ This optimization problem can be expressed as the following eigenproblem:
$$\Sigma a_i = \lambda_i a_i, \quad i = 1, \dots, d, \tag{2}$$
where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ are eigenvalues with associated eigenvectors $a_i$. (A base-R sketch follows.)
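A minimal base-R sketch of Eqs. (1)–(2): center the data, eigendecompose the covariance matrix, and project onto the leading eigenvectors. The function name pca_project and the choice p = 2 are illustrative, not from the slides.

```r
# PCA via the eigendecomposition of the covariance matrix (Eq. 2).
pca_project <- function(X, p = 2) {
  Xc    <- scale(X, center = TRUE, scale = FALSE)  # subtract column means
  Sigma <- cov(Xc)                                 # d x d covariance matrix
  eig   <- eigen(Sigma, symmetric = TRUE)          # eigenvalues in decreasing order
  A     <- eig$vectors[, 1:p, drop = FALSE]        # d x p orthogonal loadings (the A of Eq. 1)
  Xc %*% A                                         # Y = XA: n x p principal component scores
}

# Sanity check against R's built-in PCA (scores agree up to sign):
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
all.equal(abs(pca_project(X)), abs(prcomp(X)$x[, 1:2]), check.attributes = FALSE)
```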
Kernel Principal Component Analysis (KPCA)
Nonlinear Mapping
◮ A disadvantage of standard PCA is that it can only identify linear directions of variability.
◮ We overcome this by mapping $X$ into some higher-dimensional Hilbert space $\mathbb{R}^q$ via the nonlinear function $\Phi$:
$$\Phi : \mathbb{R}^n \to \mathbb{R}^q, \quad X \mapsto Z \tag{3}$$
◮ The goal is to find $\Phi$ such that the directions of variability in $\Phi(X) = Z$ are linear.
Kernel Principal Component Analysis (KPCA)
Example of Nonlinear Mapping
[Figure 1: Intuition of KPCA]
Kernel Principal Component Analysis (KPCA)
Kernel Trick
◮ Performing PCA in a higher-dimensional space like $\mathbb{R}^q$ may present computational complexities.
◮ In order to reduce the complexity, we use the kernel trick:
$$k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle, \quad i, j = 1, 2, \dots, n \tag{4}$$
◮ The kernel is substituted for any dot product used in the covariance or Gram matrix.
◮ Then we can essentially perform PCA on $Z$ in $\mathbb{R}^q$. (See the sketch below.)
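A hedged base-R sketch of KPCA with the RBF kernel used later in Eq. (11): build the Gram matrix, center it in feature space, eigendecompose, and project. The normalization of the eigenvectors and the default δ = 1 are assumptions for illustration.

```r
kpca_project <- function(X, p = 2, delta = 1) {
  n   <- nrow(X)
  K   <- exp(-delta * as.matrix(dist(X))^2)  # K_ij = k(x_i, x_j), Eq. (4) with an RBF kernel
  H   <- diag(n) - matrix(1 / n, n, n)       # centering matrix
  Kc  <- H %*% K %*% H                       # Gram matrix centered in feature space
  eig <- eigen(Kc, symmetric = TRUE)
  lam <- pmax(eig$values[1:p], 1e-12)        # guard against tiny negative eigenvalues
  V   <- sweep(eig$vectors[, 1:p, drop = FALSE], 2, sqrt(lam), "/")  # scale eigenvectors
  Kc %*% V                                   # n x p nonlinear projections
}
```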
Supervised Kernel Principal Component Analysis (SKPCA)
SKPCA Problem
◮ PCA and KPCA are unsupervised methods, since they do not consider a response variable when identifying directions of variability in the data.
◮ SKPCA is a generalization of PCA and KPCA that incorporates class information.
◮ This is done by solving the maximization problem
$$\max_{\beta} \; \operatorname{tr}(\beta K H L H K \beta^t), \tag{5}$$
where $K_{ij} = k(x_i, x_j)$ is the kernel matrix as defined for KPCA, $L_{ij} = l(y_i, y_j) = \mathbb{1}(y_i = y_j) \, k(x_i, x_j)$ is the link matrix, and $H_{ij} = \mathbb{1}(i = j) - \frac{1}{n}$ is the centering matrix, for $i, j = 1, \dots, n$.
Supervised Kernel Principal Component Analysis (SKPCA)
SKPCA Solution
Assuming $K$ is non-singular, this is a regular eigenproblem, since we have
$$A v = \lambda B v, \quad \text{which implies} \quad B^{-1} A v = \lambda v, \tag{6}$$
where $\lambda = \frac{v^t A v}{v^t K v}$, $B = K$, and $A = KHLHK$, and $B^{-1} A$ is symmetric. (A sketch follows.)
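With $B = K$ and $A = KHLHK$, the matrix $B^{-1}A$ in Eq. (6) simplifies to $HLHK$ when $K$ is non-singular, which suggests the hedged base-R sketch below. The function name, the default δ, and the use of an RBF kernel are illustrative assumptions.

```r
skpca_project <- function(X, y, p = 2, delta = 1) {
  n <- nrow(X)
  K <- exp(-delta * as.matrix(dist(X))^2)       # kernel matrix, as for KPCA
  L <- K * outer(y, y, "==")                    # link matrix: 1(y_i = y_j) * k(x_i, x_j)
  H <- diag(n) - matrix(1 / n, n, n)            # centering matrix
  eig  <- eigen(H %*% L %*% H %*% K)            # eigenproblem for B^{-1}A = HLHK
  beta <- Re(eig$vectors[, 1:p, drop = FALSE])  # keep real parts of leading eigenvectors
  K %*% beta                                    # n x p supervised projections
}
```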
Kernel Fisher’s Discriminant Analysis (KFDA)
Fisher’s Discriminant Analysis (FDA)
◮ FDA is a popular dimension reduction technique in statistical and machine learning.
◮ Given a dataset with $m$ classes, FDA aims to find the best set of features to discriminate between the classes.
◮ FDA is a supervised method: every observation $x_i$ has an associated class label, which FDA uses.
◮ FDA can only identify groups that are linearly separable.
Kernel Fisher’s Discriminant Analysis (KFDA)
◮ In standard FDA, we seek to maximize the following objective function $J(v)$:
$$J(v) = \frac{v^t S_B v}{v^t S_W v}, \tag{7}$$
where $S_B$ is the between-classes scatter matrix, $S_W$ is the within-classes scatter matrix, and $v$ is a $p \times 1$ vector:
$$\underset{p \times p}{S_B} = \sum_c (\bar{x}_c - \bar{x})(\bar{x}_c - \bar{x})^t, \qquad \underset{p \times p}{S_W} = \sum_c \sum_{i \in c} (x_i - \bar{x}_c)(x_i - \bar{x}_c)^t \tag{8}$$
◮ The solution to the maximization of $J(v)$ is the eigenproblem
$$S_B^{1/2} S_W^{-1} S_B^{1/2} u = \lambda u. \tag{9}$$
(A sketch of this construction follows.)
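A minimal base-R sketch of Eqs. (7)–(9), solving the equivalent eigenproblem $S_W^{-1} S_B v = \lambda v$ (same nonzero eigenvalues as the symmetric form in Eq. (9)); the function name and the default p = 1 are illustrative.

```r
fda_directions <- function(X, y, p = 1) {
  d    <- ncol(X)
  xbar <- colMeans(X)
  SB <- matrix(0, d, d)
  SW <- matrix(0, d, d)
  for (cl in unique(y)) {
    Xc <- X[y == cl, , drop = FALSE]
    xc <- colMeans(Xc)
    SB <- SB + tcrossprod(xc - xbar)        # (xbar_c - xbar)(xbar_c - xbar)^t
    SW <- SW + crossprod(sweep(Xc, 2, xc))  # sum over i in c of (x_i - xbar_c)(x_i - xbar_c)^t
  }
  eig <- eigen(solve(SW) %*% SB)            # equivalent to the eigenproblem in Eq. (9)
  Re(eig$vectors[, 1:p, drop = FALSE])      # leading discriminant direction(s)
}
```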
Kernel Fisher’s Discriminant Analysis (KFDA)
Kernel Trick
◮ FDA can be generalized to KFDA to accommodate nonlinearities in the data.
◮ As with KPCA and SKPCA, this is achieved through the kernel trick.
◮ Essentially, any occurrence of the dot product in the scatter matrices is replaced with the kernel function
$$k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle, \quad i, j = 1, 2, \dots, n. \tag{10}$$
Simulation Summary
◮ KPCA, SKPCA, and KFDA are applied to three simulated datasets generated in R.
◮ For all methods, a modification of the radial basis function (RBF) kernel
$$k(x_i, x_j) = e^{-\delta \| x_i - x_j \|^2} \tag{11}$$
is used.
◮ The tuning parameter $\delta$ is chosen through a grid search for each combination of dimensionality reduction method and dataset (sketched after this list).
◮ For data visualization purposes, plots comparing the original data and the reduced-dimension data in two dimensions (the projections) are given for each dataset.
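A hedged sketch of such a grid search; the candidate δ values and the scoring criterion score_fn are placeholders, since the slides do not specify them.

```r
rbf_kernel <- function(X, delta) exp(-delta * as.matrix(dist(X))^2)  # Eq. (11)

# score_fn maps a kernel matrix to a quality score (e.g., downstream
# classification accuracy of the projections); it is a placeholder here.
grid_search_delta <- function(X, score_fn, deltas = 10^seq(-3, 2, by = 1)) {
  scores <- sapply(deltas, function(d) score_fn(rbf_kernel(X, d)))
  deltas[which.max(scores)]  # delta achieving the highest score
}
```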
Three Ring Data
[Figure: (a) Original Data; (b) KFDA Projections in 2D; (c) KPCA Projections in 2D; (d) SKPCA Projections in 2D]
Wine Chocolate Data
[Figure: (a) Original Data; (b) KFDA Projections in 2D; (c) KPCA Projections in 2D; (d) SKPCA Projections in 2D]
Swiss Roll Data
[Figure: (a) Original Data; (b) KFDA Projections in 2D; (c) KPCA Projections in 2D; (d) SKPCA Projections in 2D]
Introduction to MORPH-II
◮ MORPH-II is a face imaging database used by over 500 researchers worldwide for a variety of race, gender, and age face imaging tasks.
◮ It includes 55,134 mugshots of 13,617 individuals collected over a 5-year span.
◮ Additionally, MORPH-II provides relevant metadata such as subject ID number, picture number, date of birth, date of arrest, race, gender, and age.
◮ On average, there are 4 images per subject, with ages ranging from 16 to 77 years.
Process for MORPH-II
1. Clean, pre-process, and subset the MORPH-II database.
2. Extract features from the MORPH-II images: biologically-inspired features (BIFs), histograms of oriented gradients (HOGs), and local binary patterns (LBPs).
3. Use KPCA, SKPCA, and KFDA to reduce the dimension of the feature data.
4. Perform gender classification with a linear support vector machine (SVM), taking the reduced-dimension data as input.
5. Compare results for all combinations of dimensionality reduction technique and feature type.
Tuning Parameters
◮ The dimension reduction techniques used on MORPH-II require careful tuning of parameters.
◮ Tuning is done on a subset of 1000 “even” images from MORPH-II using two-fold cross-validation (sketched below).
◮ The tuning parameters (including the number of dimensions) that yield the highest average gender classification accuracy on the subset of 1000 images are the ones used on the full set of images in MORPH-II.
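A hedged sketch of the two-fold tuning loop, using e1071's linear SVM with cost 1 (the setting reported with Table 1 on the next slide). For simplicity it scores pre-computed projections Z; in the actual pipeline the dimension reduction would be refit on each training fold. All names are illustrative.

```r
library(e1071)  # provides svm()

cv2_accuracy <- function(Z, y) {
  n     <- nrow(Z)
  folds <- sample(rep(1:2, length.out = n))     # random two-fold split
  accs  <- sapply(1:2, function(f) {
    train <- folds != f
    fit   <- svm(Z[train, , drop = FALSE], factor(y[train]),
                 kernel = "linear", cost = 1)   # linear SVM, cost c = 1
    pred  <- predict(fit, Z[!train, , drop = FALSE])
    mean(pred == y[!train])                     # held-out fold accuracy
  })
  mean(accs)  # average accuracy across the two folds
}
```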
Results from MORPH-II Subset

Table 1:
Feature Type | Tuning Summary Values                 | KFDA Accuracy           | KPCA Accuracy           | SKPCA Accuracy
LBP          | r = 1,2,3; s = 10,12,14,16,18,20      | 80.80% (r=1, s=10)      | 85.20% (r=1, s=14)      | 86.00% (r=1, s=10)
HOG          | o = 4,6,8; s = 4,6,8,10,12,14         | 79.40% (o=8, s=4)       | 90.40% (o=4, s=4)       | 88.90% (o=4, s=4)
BIF          | g = 0.1,0.2,...,1.0; s = 15-29, 7-37  | 83.50% (g=0.4, s=15-29) | 90.40% (g=0.1, s=15-29) | 89.80% (g=1.0, s=15-29)

In all cases, a linear SVM with cost c = 1 is used to classify gender.
Results from MORPH-II
◮ The proposed machine learning pipeline for MORPH-II has not yet been run on the full set of images.
◮ This process requires high-performance computing, due to the size of MORPH-II, the dimension of the extracted features, and the computational complexity of kernel-based methods.
◮ Hopefully, a supercomputer can be used for the task in the future.