

  1. Dimension Reduction for Classification. Alfred O. Hero, Dept. EECS, Dept. BME, Dept. Statistics, University of Michigan - Ann Arbor. hero@eecs.umich.edu, http://www.eecs.umich.edu/~hero. BIRS, July 2005. Outline: 1. The manifold supporting a data sample; 2. Classification constrained dimension reduction; 3. Dimension estimation on smooth manifolds; 4. Applications; 5. Conclusions.

  2. 1. The manifold supporting the data sample. Example: Yale Face Database B, where each 128x128 image lies in R^16384, yet the face images are supported on a d-dimensional subspace (manifold) with d much smaller than 16384.

  3. Data-driven dimensionality reduction. Data-driven dimensionality reduction consists of: (1) estimation of the intrinsic dimension d (direct intrinsic dimension estimation); (2) reconstruction of the data samples in the manifold domain (manifold learning); (3) classifiers on the intrinsic data dimension (estimated dimension as a discriminant; label-constrained manifold learning).

  4. Manifold Learning. Manifold learning problem setup: given a finite sample {y_1, ..., y_n} from a d-dimensional manifold M embedded in R^D, find an embedding of M into a lower-dimensional subset (usually R^d) without any prior knowledge about M or d.

  5. Manifold learning background. Reconstructing the mapping and attributes of the manifold from a finite dataset is the general manifold learning problem. Manifold reconstruction for fixed d: 1. ISOMAP, Tenenbaum, de Silva, Langford (2000); 2. Locally Linear Embedding (LLE), Roweis, Saul (2000); 3. Laplacian Eigenmaps, Belkin, Niyogi (2002); 4. Hessian Eigenmaps (HLLE), Donoho, Grimes (2003); 5. Local Tangent Space Alignment (LTSA), Zhang, Zha (2003); 6. Semidefinite Embedding (SDE), Weinberger, Saul (2004).

  6. Laplacian Eigenmaps. Laplacian Eigenmaps: preserving local information (Belkin & Niyogi 2002). 1. Constructing an adjacency graph: a. compute a k-NN graph on the dataset; b. compute a similarity/weight matrix W between data points that encodes neighborhood information, e.g., the heat kernel W_ij = exp(-||y_i - y_j||^2 / t) if i and j are neighbors, and W_ij = 0 otherwise.

  7. Laplacian Eigenmaps. 2. Manifold learning as an optimization problem: a. objective function J(Z) = sum_{i,j} W_ij ||z_i - z_j||^2 = 2 tr(Z^T L Z), where L = D - W (with D_ii = sum_j W_ij) is the graph Laplacian; b. the embedding is the solution of min_Z tr(Z^T L Z) subject to Z^T D Z = I. (*)

  8. Laplacian Eigenmaps. 3. Eigenmaps: a. the solution to (*) is given by the d generalized eigenvectors associated with the d smallest nonzero generalized eigenvalues of L f = lambda D f, equivalently eigenvectors of the normalized graph Laplacian D^{-1} L; b. if F = [f_1, ..., f_d] is the collection of such eigenvectors, then the embedded points are z_i = (f_1(i), ..., f_d(i)).
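
A minimal sketch of slides 6-8 in Python/NumPy (not the authors' code; the neighborhood size k and heat-kernel width t are illustrative choices):

    # Laplacian Eigenmaps sketch: k-NN graph, heat-kernel weights,
    # and the generalized eigenproblem L f = lambda D f.
    import numpy as np
    from scipy.spatial.distance import squareform, pdist
    from scipy.linalg import eigh

    def laplacian_eigenmaps(Y, d=2, k=10, t=1.0):
        """Embed the n x D data matrix Y into d dimensions."""
        n = Y.shape[0]
        dist = squareform(pdist(Y))                      # pairwise Euclidean distances

        # 1. Adjacency: symmetrized k-NN graph with heat-kernel weights.
        W = np.zeros((n, n))
        for i in range(n):
            nbrs = np.argsort(dist[i])[1:k + 1]          # skip the point itself
            W[i, nbrs] = np.exp(-dist[i, nbrs] ** 2 / t)
        W = np.maximum(W, W.T)                           # symmetrize

        # 2. Graph Laplacian L = D - W.
        D = np.diag(W.sum(axis=1))
        L = D - W

        # 3. Generalized eigenvectors of L f = lambda D f; drop the constant
        #    eigenvector (eigenvalue ~ 0) and keep the next d.
        _, vecs = eigh(L, D)
        return vecs[:, 1:d + 1]

The rows of the returned matrix are the embedded points z_i; discarding the first eigenvector removes the trivial constant solution.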

  9. Dimension Reduction for Labeled Data. [Figure: original data Y, 800 points uniform on the Swiss roll (400 from each class), and the corresponding dimension-reduced data X.]

  10. 2. Classification constrained dimensionality reduction. Adding class-dependent constraints: "virtual" class vertices. [Figure: labeled Swiss roll data with virtual class vertices.]

  11. Label-penalized Laplacian Eigenmaps. 1. If C is the class membership matrix (i.e., c_ij = 1 if point j is from class i), define the objective function J(Z, {mu_i}) = sum_{i,j} W_ij ||z_i - z_j||^2 + beta * sum_{i,j} c_ij ||mu_i - z_j||^2, where the mu_i are the "virtual" class centers and beta is a regularization parameter. 2. The embedding is the solution of the corresponding generalized eigenvalue problem, where L is the Laplacian of the weight matrix augmented with the class-membership weights beta*C.
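
The exact form of the augmented weight matrix was not captured on the slide; the sketch below follows the description above, appending one "virtual" vertex per class and connecting it to that class's members with weight beta (my reading of the construction, not the authors' code):

    # Label-augmented weight matrix and its Laplacian (assumed construction).
    import numpy as np

    def augmented_laplacian(W, C, beta):
        """W: n x n data weight matrix, C: m x n class-membership matrix, beta: regularization."""
        m, n = C.shape
        W_aug = np.zeros((m + n, m + n))
        W_aug[:m, m:] = beta * C          # class-center <-> member edges
        W_aug[m:, :m] = beta * C.T
        W_aug[m:, m:] = W                 # original data-data edges
        D_aug = np.diag(W_aug.sum(axis=1))
        return D_aug - W_aug, D_aug       # Laplacian and degree matrix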

  12. [Figure: unconstrained dimensionality reduction vs. classification constrained dimensionality reduction of the labeled Swiss roll data.]

  13. Partially Labeled Data: Semi-Supervised Learning on Manifolds. [Figure: Swiss roll with a few labeled samples and many unlabeled samples.]

  14. Semisupervised extension. Algorithm: 1. Compute the constrained embedding of the entire data set, inserting a zero column in C for each unlabeled sample. 2. Fit a (e.g., linear) classifier to the labeled embedded points by minimizing the quadratic error loss between the classifier outputs and the class labels. 3. For an unlabeled point, assign the label predicted by the fitted (linear) classifier at its embedded coordinates (see the sketch below).
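
A hedged sketch of steps 2-3, assuming one-hot class targets and an ordinary least-squares fit for the quadratic loss (function names and the intercept handling are illustrative):

    # Linear classifier on the embedded coordinates.
    import numpy as np

    def fit_linear_classifier(Z_lab, labels, n_classes):
        """Z_lab: n_lab x d embedded labeled points; labels: integer classes."""
        X = np.hstack([Z_lab, np.ones((Z_lab.shape[0], 1))])   # add intercept
        T = np.eye(n_classes)[labels]                           # one-hot targets
        A, *_ = np.linalg.lstsq(X, T, rcond=None)               # quadratic error loss
        return A

    def predict(A, Z_unlab):
        """Label unlabeled points by the fitted linear classifier."""
        X = np.hstack([Z_unlab, np.ones((Z_unlab.shape[0], 1))])
        return np.argmax(X @ A, axis=1)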

  15. Classification Error Rates. [Figure: percentage of errors for labeling the unlabeled samples (k-NN, Laplacian, CCDR) as a function of the number of labeled points, out of a total of 1000 points on the Swiss roll.]

  16. 3. Methods of dimension estimation. • Scree plots: plot the residual fitting errors of SVD, ISOMAP, LE, LLE and look for a "knee" [Figure: ISOMAP residual variance vs. dimensionality]. • Kolmogorov/entropy/correlation dimension: box counting, sphere packing (Liebovitch and Toth). • Maximum likelihood: Poisson approximation to the binomial (Levina & Bickel 2004). • Entropic graphs: spanner-graph length approximation to an entropy functional (Costa & Hero 2003).

  17. Euclidean Random Graphs. • Data X_1, ..., X_n in D-dimensional Euclidean space. • Euclidean MST with edge power weighting gamma: length is the minimum over spanning trees T of the pairwise distance matrix of sum_{e in T} |e|^gamma. • Euclidean k-NNG with edge power weighting gamma: length is sum_i sum_{j in N_k(i)} |X_i - X_j|^gamma, where N_k(i) is the set of k nearest neighbors of X_i. (A code sketch of both length functionals follows.)
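
A short sketch of the two power-weighted length functionals (illustrative parameter defaults; the MST is computed with SciPy):

    # L_gamma for the k-NN graph and the Euclidean MST.
    import numpy as np
    from scipy.spatial.distance import squareform, pdist
    from scipy.sparse.csgraph import minimum_spanning_tree

    def knng_length(X, k=4, gamma=1.0):
        """Sum of k-nearest-neighbor edge lengths raised to the power gamma."""
        dist = squareform(pdist(X))
        np.fill_diagonal(dist, np.inf)                  # exclude self-distances
        knn_dists = np.sort(dist, axis=1)[:, :k]
        return np.sum(knn_dists ** gamma)

    def mst_length(X, gamma=1.0):
        """Sum of MST edge lengths raised to the power gamma."""
        dist = squareform(pdist(X))
        mst = minimum_spanning_tree(dist)               # sparse matrix of MST edges
        return np.sum(mst.data ** gamma)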

  18. Example: Uniform Planar Sample. [Figure: uniform sample on the square, N = 400; its minimal spanning tree and its k-NN graph, gamma = 1, k = 4.]

  19. Convergence of Euclidean CQFs. Beardwood, Halton, Hammersley Theorem (BHH, 1959): see the reconstructed statement below.
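
The BHH limit itself appeared as an equation image on the slide and did not survive extraction; its standard form, which is presumably what appeared here, is:

    For i.i.d. samples $X_1,\dots,X_n$ with density $f$ on $\mathbb{R}^d$, $d \ge 2$, $0 < \gamma < d$,
    \[
    \lim_{n\to\infty} \frac{L_\gamma(X_1,\dots,X_n)}{n^{(d-\gamma)/d}}
      = \beta_{d,\gamma} \int f(x)^{(d-\gamma)/d}\, dx \quad \text{a.s.},
    \]
    where $L_\gamma$ is the power-weighted MST length and $\beta_{d,\gamma}$ is a constant depending only on $d$ and $\gamma$.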

  20. k-NNG Convergence Theorem in Non-Euclidean Spaces. Costa, Hero: TSP (2004), Birkhauser (2005). A sketch of the result and of the resulting dimension/entropy estimator is given below.
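
The statement on this slide was also an image; in the cited papers the result takes roughly the following form, and it underlies the joint dimension/entropy estimator used in the remaining slides (constants and regularity conditions omitted):

    For i.i.d. samples on a compact $d$-dimensional manifold $\mathcal{M}$ with density $f$
    relative to its volume measure, the power-weighted k-NNG length satisfies
    \[
    \frac{L_\gamma(X_1,\dots,X_n)}{n^{\alpha}} \;\longrightarrow\;
      c_{d,\gamma,k} \int_{\mathcal{M}} f(x)^{\alpha}\, d\mu(x),
      \qquad \alpha = \frac{d-\gamma}{d},
    \]
    so that $\log L_\gamma \approx \alpha \log n + b$. A linear fit of $\log L_\gamma$
    versus $\log n$ then yields
    \[
    \hat d = \operatorname{round}\!\Big(\frac{\gamma}{1-\hat\alpha}\Big), \qquad
    \hat H_\alpha = \frac{\hat b - \log c_{d,\gamma,k}}{1-\alpha},
    \]
    i.e., estimates of the intrinsic dimension and of the R\'enyi $\alpha$-entropy of $f$.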

  21. Application to 2D Torus. • 600 uniformly distributed random samples. • Estimates: d = 2, H = 9.6 bits. [Figure: mean k-NNG (k = 5) length versus number of samples.]

  22. Local Extension via kNNG. Run the kNNG estimator locally: initialize the estimator parameters, then for i = 1, 2, ..., p compute the k-NNG length statistics within a neighborhood of the i-th point and set the corresponding local dimension/entropy estimate (a hedged code sketch follows).
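
A hedged Python sketch of a local kNNG dimension estimator in this spirit (the neighborhood size Q, the resampling scheme, and the subsample sizes are illustrative assumptions, not necessarily the authors' exact procedure):

    # Local intrinsic dimension via the slope of log k-NNG length vs. log n.
    import numpy as np
    from scipy.spatial.distance import squareform, pdist

    def knng_length(X, k=4, gamma=1.0):
        dist = squareform(pdist(X))
        np.fill_diagonal(dist, np.inf)
        return np.sum(np.sort(dist, axis=1)[:, :k] ** gamma)

    def local_dimension(X, Q=65, k=4, gamma=1.0, n_boot=10, seed=None):
        """Estimate the intrinsic dimension around each point of the n x D array X."""
        rng = np.random.default_rng(seed)
        dist = squareform(pdist(X))
        sizes = np.linspace(Q // 2, Q, 5, dtype=int)       # subsample sizes for the fit
        d_hat = np.empty(X.shape[0])
        for i in range(X.shape[0]):
            nbrs = np.argsort(dist[i])[:Q]                 # local neighborhood of point i
            log_len = []
            for m in sizes:                                # mean log-length at each size
                lens = [knng_length(X[rng.choice(nbrs, m, replace=False)], k, gamma)
                        for _ in range(n_boot)]
                log_len.append(np.log(np.mean(lens)))
            slope = np.polyfit(np.log(sizes), log_len, 1)[0]   # slope ~ (d - gamma) / d
            slope = min(slope, 1.0 - 1e-6)                     # guard the division below
            d_hat[i] = max(round(gamma / (1.0 - slope)), 1)
        return d_hat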

  23. 4. Application to MNIST Digits. • Large database of 8-bit images of the digits 0-9. • 28x28 pixels per image. • The first 1000 images in the training set are used here. • Non-adaptive: digit labels are known.

  24. Scree Plot (Costa & Hero, Birkhauser 2005). [Figure]

  25. Local Dimension/Entropy Statistics (Costa & Hero, Birkhauser 2005). [Figure]

  26. Adaptive Anomaly Detection. • Spatio-temporal measurement vector: a time series (e.g., temperature vs. day) measured at each of the 11 Abilene router sites (STTL, SNVA, LOSA, DNVR, KSCY, HSTN, IPLS, CHIN, ATLA, WASH, NYCM), stacked into a single measurement vector. [Figure: Abilene network map with per-node time series.]

  27. Data Observed from Abilene Network • Objective: detect changes in network traffic via local intrinsic dimension • Hypotheses: – High traffic from few sources lowers the local dimension of the network traffic – Changes in distribution of dimension estimate can be used as a marker for more subtle changes in traffic • Data collection period: 1/1/05-1/2/05 • Data sampling: packet flow sampled every 5 minutes from all 11 routers on Abilene Network • Data fields: aggregate of all flows to/from all ports

  28. kNN Algorithm (Costa). [Figure: "Modified Knn-Algo II": estimated local dimension d (top) and actual traffic data (bottom, scale x10^6) over time; 1/01/05-1/02/05, n=150, Q=65, NN=4, N=10, M=10.]

  29. Example: 1/1/05, 12:20 pm. • Large data transfers from IPs 145.146.96 and 192.31.120 drastically increase flows through Chicago and NYC. [Figure: estimated local dimension d (top) and actual traffic data (bottom) around the event; 1/01/05-1/02/05, n=150, Q=65, NN=4, N=10, M=10.]

  30. 5. Conclusions. • Classification constraints can be included in manifold-learning dimension reduction algorithms. • kNNGs jointly estimate the dimension and entropy of high dimensional data. • Dimension can be used as a discriminant in anomaly detection. • These estimates can be used as a precursor to model reduction and database compression. • The methods suffer only from the curse of intrinsic dimensionality.

  31. References.
  • J. Costa, N. Patwari and A. O. Hero, "Distributed multidimensional scaling with adaptive weighting for node localization in sensor networks," ACM Journal on Networking, to appear 2005. (http://www.eecs.umich.edu/~hero/Preprints/wmds_v9.pdf)
  • J. Costa and A. O. Hero, "Geodesic entropic graphs for dimension and entropy estimation in manifold learning," IEEE Trans. on Signal Processing, Vol. 52, No. 8, pp. 2210-2221, Aug. 2004. (http://www.eecs.umich.edu/~hero/Preprints/sp_mlsi_final_twocolumn.pdf)
  • J. A. Costa, A. Girotra and A. O. Hero, "Estimating local intrinsic dimension with k-nearest neighbor graphs," IEEE Workshop on Statistical Signal Processing (SSP), Bordeaux, July 2005. (http://www.eecs.umich.edu/~hero/Preprints/ssp_2005_final_1.pdf)
  • J. Costa and A. O. Hero, "Classification constrained dimensionality reduction," Proc. of ICASSP, Philadelphia, March 2005. (http://www.eecs.umich.edu/~hero/Preprints/costa_icassp2005.pdf)
