  1. Introduction · Methodology · Results

Fast Direct Search in an Optimally Compressed Continuous Target Space for Efficient Multi-Label Active Learning
Weishi Shi and Qi Yu
B. Thomas Golisano College of Computing and Information Sciences, Rochester Institute of Technology
Jun 2019

  2. Multi-Label Active Learning

Multi-label classification (ML-C) aims to learn a model that automatically assigns a set of relevant labels to a data instance. Multi-label problems naturally arise in many applications, including various image classification and video/audio recognition tasks. Data labeling for model training becomes more labor-intensive because each label in a potentially large label space must be checked, which makes active learning all the more important.

Key challenges for multi-label active learning (AL):
- The sampling measure is hard to design due to label correlations.
- Rare labels are much harder to detect.
- Computational cost increases quickly with the number of labels.

  3. CS-BPCA Label Transformation

We propose a principled two-level label transformation strategy, Compressed Sensing (CS) followed by Bayesian Principal Component Analysis (BPCA), that enables multi-label active learning to be performed in an optimally compressed target space.

CS-BPCA pipeline (compression/sampling direction; recovery/prediction runs in reverse):
Original label space (Y) --CS--> Compressed space (R) --BPCA--> Target space (U) --> MOGP data sampling

  4. CS-BPCA Label Transformation (cont.)

Key properties of the transformed label space:
- Optimally compressed: the optimal compression rate is determined automatically.
- Orthogonal: label correlation is fully decoupled.
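The two-level transformation can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a random Gaussian sensing matrix for the CS step and plain PCA as a stand-in for BPCA, with the dataset sizes and the 95% retained-variance threshold chosen for illustration only (BPCA would infer the retained dimensionality probabilistically).

```python
import numpy as np

rng = np.random.default_rng(0)
n, L, d_cs = 200, 50, 20                        # instances, labels, CS dimension (assumed)
Y = (rng.random((n, L)) < 0.05).astype(float)   # sparse binary label matrix

# Level 1: compressed sensing -- random Gaussian projection of the label space
A = rng.normal(size=(d_cs, L)) / np.sqrt(d_cs)
R = Y @ A.T                                     # compressed space R

# Level 2: PCA (stand-in for BPCA) -- orthogonal, variance-weighted target space
R_c = R - R.mean(axis=0)
_, s, Vt = np.linalg.svd(R_c, full_matrices=False)
var = s**2 / (n - 1)
k = np.searchsorted(np.cumsum(var) / var.sum(), 0.95) + 1  # keep 95% of variance
U = R_c @ Vt[:k].T                              # target space U for MOGP sampling
```

The columns of U are orthogonal directions weighted by explained variance, which is what allows the MOGP sampling stage to treat the compressed targets as decoupled continuous outputs.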

  5. Multi-output GP (MOGP) based Data Sampling

Two key benefits:
- Outputs the predictive entropy, which provides an informative measure for uncertainty-based data sampling.
- Uses a flexible covariance function to precisely capture the covariance structure of the input data.

A flexible kernel function:
k(x_i, x_j) = θ₀ exp{−(θ₁/2) ‖x_i − x_j‖²} + θ₂ x_iᵀ x_j + θ₃

Applying the MOGP to the optimally compressed target space works because that space is:
- Continuous: consistent with the MOGP assumption;
- Compact: efficient computation;
- Weighted: precise sampling;
- Orthogonal: label correlation is decoupled.
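The flexible kernel above (an RBF term plus a linear term plus a bias) translates directly into NumPy; the parameter values below are illustrative, not tuned.

```python
import numpy as np

def kernel_matrix(X, theta):
    # k(xi, xj) = t0 * exp(-(t1/2) * ||xi - xj||^2) + t2 * xi.T @ xj + t3
    t0, t1, t2, t3 = theta
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return t0 * np.exp(-0.5 * t1 * sq_dists) + t2 * (X @ X.T) + t3

X = np.random.default_rng(0).normal(size=(5, 3))
K = kernel_matrix(X, (1.0, 0.5, 0.1, 0.01))  # symmetric 5x5 covariance matrix
```

Combining a stationary RBF term with a non-stationary linear term lets the covariance capture both local smoothness and global trends in the input data.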

  6. Gradient-free Hyper-parameter Optimization

High computational cost of gradient-based methods:
- Compute the gradient of the likelihood over each hyperparameter until convergence (via p iterations): O(|θ| p m³). [This needs to be run multiple times because the likelihood is non-convex.]
- Construct the covariance matrix of the input data: O(m² n).
- Overall complexity: O(|θ| (p m³ + m² n)).

Fast kernel re-estimation for covariance matrix construction:
We separate out two blocks of computation that are invariant to θ and only partially update the kernel matrix for fast covariance matrix construction: O(m² n) → O(m²).
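The re-estimation idea can be sketched as follows: for the kernel above, the pairwise inner products and squared distances do not depend on θ, so they can be computed once in O(m² n) and reused, leaving only element-wise O(m²) work per new θ. The class and variable names here are ours, not from the paper.

```python
import numpy as np

class CachedKernel:
    """Cache the theta-invariant blocks of the flexible kernel
    k(xi, xj) = t0*exp(-(t1/2)*||xi - xj||^2) + t2*(xi . xj) + t3."""

    def __init__(self, X):
        self.G = X @ X.T                                   # inner products: O(m^2 n), once
        d = np.diag(self.G)
        self.sq = d[:, None] + d[None, :] - 2.0 * self.G   # squared distances, once

    def __call__(self, theta):
        # Only element-wise operations on cached m x m blocks: O(m^2) per new theta
        t0, t1, t2, t3 = theta
        return t0 * np.exp(-0.5 * t1 * self.sq) + t2 * self.G + t3

X = np.random.default_rng(1).normal(size=(6, 3))
ck = CachedKernel(X)
K1 = ck((1.0, 0.5, 0.1, 0.01))   # each call re-estimates the kernel cheaply
K2 = ck((2.0, 0.2, 0.3, 0.10))
```

This is exactly the kind of partial update that makes repeated likelihood evaluations affordable inside a direct-search loop.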

  7. Gradient-free Hyper-parameter Optimization (cont.)

Bayesian Optimization (B-OPT):
- Uses expected improvement as a cheap surrogate of the likelihood to choose a candidate θ from the grid search space.
- Requires a grid search space to be defined.

Simplex Optimization (S-OPT):
- Explores the search space by evolving (i.e., expanding, reflecting, and contracting) a simplex.
- Explores the search space automatically.

Overall complexity reduction:
O(|θ| (p m³ + m² n)) → O(q m³ + m²), where q ≪ p.
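A gradient-free sketch in the spirit of S-OPT, using SciPy's Nelder-Mead simplex method to minimize the GP negative log marginal likelihood under the flexible kernel above. The toy data, the log-space parameterization (to keep θ positive), and the jitter term are our assumptions, not details from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_theta, X, y):
    # Optimize in log space so all kernel parameters stay positive.
    t0, t1, t2, t3 = np.exp(log_theta)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = t0 * np.exp(-0.5 * t1 * sq) + t2 * (X @ X.T) + t3
    K += 1e-6 * np.eye(len(X))                 # jitter for numerical stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L)))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

# Simplex search: no gradients, only q likelihood evaluations (q << p).
res = minimize(neg_log_marginal_likelihood, x0=np.zeros(4), args=(X, y),
               method="Nelder-Mead", options={"maxiter": 200})
theta_opt = np.exp(res.x)
```

Because the simplex only evaluates the likelihood (never its gradient), each step costs one O(m³) Cholesky on a kernel matrix that the cached re-estimation rebuilds in O(m²).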

  8. Benchmark Datasets and Compared Models

Summary of datasets:

| Dataset   | Domain      | Instances | Features | Labels | Label Card. | Label Sparsity |
|-----------|-------------|-----------|----------|--------|-------------|----------------|
| Delicious | web         | 8172      | 500      | 157    | 5.56        | 0.03           |
| BookMark  | publication | 38548     | 2150     | 136    | 3.45        | 0.02           |
| WebAPI    | software    | 9166      | 5659     | 90     | 2.50        | 0.02           |
| Corel5K   | images      | 5000      | 499      | 132    | 3.25        | 0.02           |
| Bibtex    | text        | 7013      | 1836     | 127    | 2.4         | 0.02           |

Competitive active learning models for multi-label classification:
- Type I models: perform active learning in a compressed label space (CS-MIML, CS-BR, CS-RR).
- Type II models: perform active learning in the original label space (MMC, Adaptive).

  9. Comparison Results

Comparison Result I (figure): F-score vs. number of active iterations (0 to 500) on the WebAPI, Delicious, Bookmark, Corel5K, and Bibtex datasets, comparing CS-BPCA-GP against the Type I models CS-BR, CS-RR, and CS-MIML.

Comparison Result II (figure): F-score vs. number of active iterations (0 to 500) on reduced versions of the same five datasets, comparing CS-BPCA-GP against the Type II models MMC and Adaptive.

  10. Rare Label Prediction Comparison

(Figure) Recall vs. label frequency on the Bookmark, Delicious, Corel5K, Bibtex, and Web-API datasets, comparing CS-BPCA-GP against Adaptive.

The proposed model is effective at detecting rare labels by leveraging label correlation.

  11. CPU Time of Hyper-parameter Optimization

| Dataset   | GA    | B-OPT | S-OPT |
|-----------|-------|-------|-------|
| Delicious | 1.83  | 0.17  | 0.20  |
| BookMark  | 15.0  | 0.80  | 0.79  |
| WebAPI    | 10.10 | 0.54  | 0.55  |
| Corel5K   | 0.58  | 0.08  | 0.08  |
| Bibtex    | 8.71  | 0.48  | 0.51  |

The proposed direct search methods learn the kernel parameters 10 to 15 times faster than the gradient-based methods.

  12. Conclusions

- Proposed a two-level CS-BPCA process that generates an optimally compressed, weighted, orthogonal, and continuous target space to support multi-label data sampling.
- Proposed an MOGP-based sampling function that accurately captures the covariance structure of the input data.
- Proposed gradient-free hyper-parameter optimization to enable fast online active learning.
- Evaluated the effectiveness of the proposed model on real-world multi-label datasets from diverse domains.

Poster ID: 261
