
Aggressive Double Sampling for Reducing Multi-class Classification to Binary Classification



  1. Aggressive Double Sampling for Reducing Multi-class Classification to Binary Classification
     Bikash Joshi (PhD Student), AMA team, LIG
     Supervised by: Prof. Massih-Reza Amini and Dr. Franck Iutzeler
     March 20, 2017
     Bikash Joshi (AMA team, LIG) Multi to binary reduction March 20, 2017 1 / 27

  2. Outline
     1. Introduction
     2. Multiclass to Binary Reduction
     3. Double-Sampled Multiclass to Binary Reduction
     4. Experimental Results
     5. Conclusion


  4. Multiclass Classification: Introduction
     [Figures: digit classification, image classification, text classification]
     - Finite set of categories (K > 2).
     - Popular applications: image and text classification.

  5. Multiclass classification: Related Work
     1. Combined approaches based on binary classification:
        ◮ One-Vs-Rest
          ⋆ One binary problem for each class
          ⋆ K binary problems
          ⋆ O(K × d)
        ◮ One-Vs-One
          ⋆ One binary problem for each pair of classes
          ⋆ O(K² × d)
     2. Uncombined approaches
        ◮ For example: multiclass SVM, MLP
        ◮ One scoring function per class
     3. Logarithmic-time algorithms
        ◮ For example: logTree, Recall-Tree
        ◮ Each leaf node represents a class
        ◮ O(log K)
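
To make the orders of magnitude on this slide concrete, the following sketch counts the binary problems and weights needed by the two combined approaches. It assumes one linear model with d weights per binary problem; the function name `combined_approach_cost` is hypothetical and not taken from the talk.

```python
def combined_approach_cost(K: int, d: int) -> dict:
    """Count binary problems and weights, assuming one d-dimensional
    linear model per binary problem (an assumption for illustration)."""
    ovr_problems = K                   # one problem per class
    ovo_problems = K * (K - 1) // 2    # one problem per pair of classes
    return {
        "one_vs_rest": {"problems": ovr_problems, "weights": ovr_problems * d},
        "one_vs_one": {"problems": ovo_problems, "weights": ovo_problems * d},
    }


cost = combined_approach_cost(K=1000, d=100)
print(cost["one_vs_rest"])
print(cost["one_vs_one"])
```

The One-Vs-One count K(K − 1)/2 grows quadratically in K, which is consistent with the O(K² × d) bound quoted on the slide.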

  6. Multiclass classification: Challenges
     - The number of classes, K, in newly emerging multiclass problems, for example in text and image classification, may reach $10^5$ to $10^6$ categories.
     - For example:
       ◮ $4 \times 10^6$ sites
       ◮ $10^6$ categories
       ◮ $10^5$ editors
       ◮ Imbalanced nature of hierarchies

  7. Multiclass classification: Challenges
     - Class imbalance problem
     - The majority of classes have few representative examples
     - Long-tailed distribution
     [Figure: DMOZ-7500 histogram of # Classes (0 to 4000) against # Documents per class, with bins 2-5, 6-10, 11-30, 31-100, 101-200, >200]

  8. Text Classification
     Task: automatic classification of an example text into one of a fixed set of categories.
     Feature representation:
     - Bag of words:
       ◮ Extract the vocabulary from the training corpus.
       ◮ Represent each term as 0 or 1.
       ◮ Highly sparse.
     - Document-class joint feature representation:
       ◮ Inspired by learning to rank.
       ◮ Similarity features between an example and a class of examples.
       ◮ For example: $\sum_{t \in y \cap x} 1$, where x is one document and y is a class of documents.


  10. Motivation of our work
      - Baselines: model complexity increases with the number of classes (K) and the feature dimension (d).
      - An algorithm that scales well to large-scale data
      - Does not suffer from the class imbalance problem
      - Less complex model
      - Competitive with state-of-the-art approaches

  11. Framework
      - $\mathcal{X} \subseteq \mathbb{R}^d$: input space
      - $\mathcal{Y} = \{1, \ldots, K\}$: output space
      - $S = (x_i^{y_i})_{i=1}^{m}$: training set of i.i.d. pairs
      - $\mathcal{G} = \{g : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}\}$: class of predictors
      Instantaneous loss:
      $$e(g, x^y) = \frac{1}{K-1} \sum_{y' \in \mathcal{Y} \setminus \{y\}} \mathbb{1}_{g(x^y) \le g(x^{y'})} \qquad (1)$$
      - $\mathbb{1}_{\pi}$ is the indicator function (its value is 0 or 1)
      - Average number of classes that g scores at least as high as the true class
      - Ranking loss used in multiclass SVM (Weston et al., 1998)
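
As a minimal sketch of the instantaneous loss (1), the function below takes the scores g assigns to every class for one example and returns the fraction of wrong classes scored at least as high as the true one. The list-of-scores interface, the zero-based class indices, and the name `instantaneous_loss` are assumptions made for illustration.

```python
def instantaneous_loss(scores, y):
    """Loss (1) for one example.

    scores[k] stands for g(x^k), with classes indexed from 0 (an
    assumption; the slide indexes classes from 1). y is the true class.
    """
    K = len(scores)
    # Count the wrong classes y' with g(x^y) <= g(x^{y'}).
    errors = sum(1 for k in range(K) if k != y and scores[y] <= scores[k])
    return errors / (K - 1)


# One of the three wrong classes outscores the true class (index 2).
print(instantaneous_loss([0.2, 0.9, 0.5, 0.1], y=2))
```

The loss is 0 when the true class strictly outscores every other class and 1 when every other class ties or beats it.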

  12. Framework: Empirical Loss
      The empirical error of $g \in \mathcal{G}$ over S is:
      $$L_m(g, S) = \frac{1}{m(K-1)} \sum_{i=1}^{m} \sum_{y' \in \mathcal{Y} \setminus \{y_i\}} \mathbb{1}_{g(x_i^{y_i}) \le g(x_i^{y'})} \qquad (2)$$
      $$= \frac{1}{m(K-1)} \sum_{i=1}^{m} \sum_{y' \in \mathcal{Y} \setminus \{y_i\}} \mathbb{1}_{h(x_i^{y_i}, x_i^{y'}) \le 0} \qquad (3)$$
      where $h(x_i^{y_i}, x_i^{y'}) = g(x_i^{y_i}) - g(x_i^{y'})$.
      - Resembles a risk based on the binary classification loss
      - Selecting a hypothesis in $\mathcal{G}$ that minimizes the risk over S is equivalent to searching for a hypothesis in $\mathcal{H}$ that minimizes the risk over T(S), of size $m \times (K - 1)$
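
The equality between (2) and (3) can be checked numerically. The sketch below computes the empirical loss once from the class scores and once from the pairwise margins h; the data layout (one list of K scores per example, zero-based labels) and both function names are assumptions for illustration.

```python
def empirical_loss(score_rows, labels):
    """Equation (2): score_rows[i][k] stands for g(x_i^k)."""
    m, K = len(score_rows), len(score_rows[0])
    errors = 0
    for scores, y in zip(score_rows, labels):
        errors += sum(1 for k in range(K) if k != y and scores[y] <= scores[k])
    return errors / (m * (K - 1))


def pairwise_loss(score_rows, labels):
    """Equation (3): binary error of h = g(x_i^{y_i}) - g(x_i^{y'})."""
    margins = [
        scores[y] - scores[k]
        for scores, y in zip(score_rows, labels)
        for k in range(len(scores))
        if k != y
    ]
    # There are m * (K - 1) margins, one per transformed example.
    return sum(1 for h in margins if h <= 0) / len(margins)


rows = [[0.2, 0.9, 0.5], [0.7, 0.1, 0.3]]
labels = [2, 0]
print(empirical_loss(rows, labels), pairwise_loss(rows, labels))
```

Both functions return the same value on this toy input, because each term of (2) is an error exactly when the corresponding margin in (3) is non-positive.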

  13. Multiclass to binary reduction example
      We consider the following transformation:
      $$T(S) = \left( \begin{cases} \left(z_j = (x_i^{k}, x_i^{y_i}),\ \tilde{y}_j = -1\right) & \text{if } k < y_i \\ \left(z_j = (x_i^{y_i}, x_i^{k}),\ \tilde{y}_j = +1\right) & \text{elsewhere} \end{cases} \right)_{j = (i-1)(K-1) + k}$$
      $|T(S)| = m \times (K - 1)$
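
A minimal sketch of this transformation is given below. It pairs the joint representation of each example with its true class against the representation with every other class, and labels the pair by which side holds the true class. The callable `phi` (a joint document-class representation), the loop over all k different from the true class, and the name `reduce_to_binary` are assumptions for illustration, not the authors' implementation.

```python
def reduce_to_binary(examples, K, phi):
    """Build T(S) from S.

    examples: list of (x, y) pairs with y in 1..K.
    phi(x, k): hypothetical joint representation of x with class k.
    Returns a list of ((left, right), label) with label in {-1, +1}.
    """
    transformed = []
    for x, y in examples:
        for k in range(1, K + 1):
            if k == y:
                continue  # only wrong classes are paired with the true one
            if k < y:
                transformed.append(((phi(x, k), phi(x, y)), -1))
            else:
                transformed.append(((phi(x, y), phi(x, k)), +1))
    return transformed


# Toy usage: phi simply tags the document with the class.
pairs = reduce_to_binary([("a", 2), ("b", 1)], K=3, phi=lambda x, k: (x, k))
for z, label in pairs:
    print(z, label)
```

With m = 2 examples and K = 3 classes the result has m × (K − 1) = 4 pairs, matching the size stated on the slide.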

  14. Multiclass to binary reduction algorithm [Bikash et al. 2015]

  15. Improvements and New Challenges
      Improvements:
      - One parameter vector for all classes.
      - Low-dimensional feature space.
      - Overcomes class imbalance.
      New challenges:
      - The number of transformed examples is huge for larger K
      - Large computational overhead
      - Large memory requirement


  17. Aggressive double sampling
      1. Draw uniformly µ examples per class to form a practical set $S_\mu$.
         ◮ Reduces redundancy in examples
         ◮ Emphasizes rare classes
      2. For each example $x^y$ in $S_\mu$, draw uniformly κ adversarial classes in $\mathcal{Y} \setminus \{y\}$.
         ◮ Reduces time complexity
         ◮ Low memory requirement
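
The two sampling steps can be sketched as follows. This is an illustration under stated assumptions: sampling is done without replacement, classes with fewer than µ examples contribute all of their examples, and the function name `double_sample` and its `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def double_sample(examples, K, mu, kappa, seed=0):
    """Return (x, y, k) triples: example x, true class y, adversarial class k.

    examples: list of (x, y) pairs with y in 1..K.
    """
    rng = random.Random(seed)

    # Step 1: draw up to mu examples per class to form S_mu.
    by_class = defaultdict(list)
    for x, y in examples:
        by_class[y].append(x)
    s_mu = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, min(mu, len(xs))):
            s_mu.append((x, y))

    # Step 2: draw kappa adversarial classes for each kept example.
    triples = []
    for x, y in s_mu:
        others = [k for k in range(1, K + 1) if k != y]
        for k in rng.sample(others, min(kappa, len(others))):
            triples.append((x, y, k))
    return triples


data = [(f"d{i}", 1) for i in range(5)] + [("e0", 2)] + [(f"f{i}", 3) for i in range(3)]
print(len(double_sample(data, K=3, mu=2, kappa=1)))
```

Under these assumptions the output has at most K × µ × κ triples, compared with m × (K − 1) pairs for the full reduction, which is where the savings in time and memory come from.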

  18. Double Sampled Multi to Binary Reduction


  20. Experimental Setup
      Datasets:
      - Application: text classification
      - DMOZ and Wikipedia datasets (LSHTC challenge)
      - Pre-processed with stop-word removal and stemming
      - Random samples of 1000, 2000, 3000, 4000, 5000, 7500, 10000, and 20000 classes
      Comparison:
      - DS-m2b: proposed double-sampled multiclass to binary algorithm
      - OVA: One-Vs-All algorithm
      - M-SVM: Crammer-Singer implementation of multiclass SVM
      - Recall Tree: hierarchical One-Vs-Some algorithm

  21. Feature representation $\Phi(x^y)$
      Features:
      1. $\sum_{t \in y \cap x} \ln(1 + y_t)$
      2. $\sum_{t \in y \cap x} \ln\left(1 + \frac{l_S}{S_t}\right)$
      3. $\sum_{t \in y \cap x} I_t$
      4. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|}\right)$
      5. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|} \cdot I_t\right)$
      6. $\sum_{t \in y \cap x} \ln\left(1 + \frac{y_t}{|y|} \cdot \frac{l_S}{S_t}\right)$
      7. $\sum_{t \in y \cap x} 1$
      8. $\sum_{t \in y \cap x} \frac{y_t}{|y|} \cdot I_t$
      9. BM25
      10. $d(x^y, \mathrm{centroid}(y))$
      Notation:
      - $x_t$: number of occurrences of term t in document x
      - $\mathcal{V}$: set of distinct terms in S
      - $y_t = \sum_{x \in y} x_t$, $|y| = \sum_{t \in \mathcal{V}} y_t$
      - $S_t = \sum_{x \in S} x_t$, $l_S = \sum_{t \in \mathcal{V}} S_t$
      - $I_t$: idf of the term t
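
As an illustration of how such joint features could be computed, the sketch below evaluates features 1, 4, and 7 for one document and one class. It assumes documents are given as lists of already pre-processed tokens; the function name `joint_features` and this input format are hypothetical.

```python
import math
from collections import Counter


def joint_features(doc, class_docs):
    """Features 1, 4 and 7 for document x (doc) and class y (class_docs).

    doc: list of tokens; class_docs: list of token lists (assumed format).
    """
    x = Counter(doc)                 # x_t: term counts in the document
    y = Counter()                    # y_t: term counts summed over the class
    for d in class_docs:
        y.update(d)
    size_y = sum(y.values())         # |y| = sum over terms of y_t
    common = [t for t in x if t in y]  # terms t in y ∩ x

    f1 = sum(math.log(1 + y[t]) for t in common)
    f4 = sum(math.log(1 + y[t] / size_y) for t in common)
    f7 = len(common)                 # sum of 1 over shared terms
    return f1, f4, f7


print(joint_features(["cat", "sat", "cat"], [["cat", "dog"], ["cat", "mat"]]))
```

Because every feature is a sum over the terms shared by the document and the class, the representation has a fixed small dimension regardless of the vocabulary size or the number of classes.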

  22. Results: Runtime Comparison
      [Figure: total runtime (seconds, log scale from $10^1$ to $10^6$) against # of classes (1000 to 20000) for OVA, MSVM, Recall Tree, and DS-m2b]

  23. Results: Memory Comparison
      [Figure: total memory usage (GB, 0 to 60) against # of classes (1000 to 20000) for OVA, MSVM, Recall Tree, and DS-m2b, with the 16GB and 32GB limits marked]

  24. Results: Prediction Performance Comparison
      [Figure: MAF (0.0 to 0.6) against # of classes (1000 to 20000) for OVA, MSVM, Recall Tree, and DS-m2b]


  26. Conclusion
      - Multiclass to binary reduction handles the large-class scenario and overcomes the class imbalance problem.
      - Double sampling further improves computational complexity and memory usage.

  27. Questions?
