

  1. On Flat versus Hierarchical Classification in Large-Scale Taxonomies
     R. Babbar, I. Partalas, É. Gaussier, M.-R. Amini
     Gargantua (CNRS Mastodons), November 26, 2013

  2. Large-scale Hierarchical Classification in Practice
     Outline: Challenges · Proposed approach · Hierarchy Pruning · Experiments · Conclusion and Future Work
     ❑ Directory Mozilla (DMOZ)
       ❑ 5 × 10⁶ sites
       ❑ 10⁶ categories
       ❑ 10⁵ editors
     [Figure: example taxonomy. Root → Arts, Sports; Arts → Movies, Video; Sports → Tennis, Soccer; Soccer → Players, Fun]
     Gargantua - Mastodons Massih-Reza.Amini@imag.fr

  3. Approaches for Large-Scale Hierarchical Classification (LSHC)
     ❑ Hierarchical
       ❑ Top-down: solve an individual classification problem at every node
       ❑ Big-bang: solve the problem once at the root for the entire tree
     ❑ Flat: ignore the taxonomy structure altogether
     ❑ Flattening approaches in LSHTC
       ❑ Somewhat arbitrary, as they flatten entire layers
       ❑ Unclear which layers to flatten when taxonomies are much deeper, with 10-15 levels
     [Figure: example taxonomy. Root → Books, Music; Books → Comics, Poetry; Music → Rock, Jazz, Funky, Fusion]
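The difference between the flat and top-down strategies above can be sketched in a few lines. The toy taxonomy, class names, and scores below are illustrative (they stand in for the per-node scorers ⟨Φ(x), w_v⟩), not the authors' implementation:

```python
# Minimal sketch of flat vs. top-down prediction over a toy taxonomy.
# Tree, leaf set, and scores are illustrative, not from the paper.

tree = {
    "Root": ["Books", "Music"],
    "Books": ["Comics", "Poetry"],
    "Music": ["Rock", "Jazz"],
}
leaves = ["Comics", "Poetry", "Rock", "Jazz"]

def predict_flat(score, x):
    """Flat: one multi-class decision over all leaves, ignoring the tree."""
    return max(leaves, key=lambda y: score(x, y))

def predict_top_down(score, x):
    """Top-down: at each internal node, descend into the best-scoring daughter."""
    v = "Root"
    while v in tree:  # descend until a leaf is reached
        v = max(tree[v], key=lambda c: score(x, c))
    return v

# Toy scorer: fixed per-class scores (stands in for <Phi(x), w_v>).
scores = {"Books": 0.2, "Music": 0.9, "Comics": 0.8,
          "Poetry": 0.1, "Rock": 0.3, "Jazz": 0.6}
score = lambda x, v: scores[v]

print(predict_flat(score, None))      # Comics (highest-scoring leaf overall)
print(predict_top_down(score, None))  # Jazz (Music beats Books at the root)
```

The example also illustrates error propagation: the two strategies disagree because the top-down pass commits to "Music" at the root and can never reach the best leaf under "Books".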

  4. Key Challenges in LSHC
     ❑ How reliable is the given hierarchical structure?
       ❑ Arbitrariness in taxonomy creation, based on personal biases and choices
       ❑ Other sources of noise include the imbalanced nature of hierarchies
     ❑ Which approach: flat or hierarchical?
       ❑ Lack of clarity on how to exploit the hierarchical structure of categories
       ❑ Speed versus accuracy trade-off

  5. Hierarchical Rademacher-based Generalization Bound
     ❑ The hierarchy of classes H = (V, E) is defined as a rooted tree, with a root ⊥ and a parent relationship π
     ❑ Nodes at the leaf level, Y = {y ∈ V : ∄v ∈ V, (y, v) ∈ E} ⊂ V, constitute the set of target classes
     ❑ ∀v ∈ V \ {⊥}, we define the set of its sisters S(v) = {v′ ∈ V \ {⊥} ; v ≠ v′ ∧ π(v) = π(v′)} and of its daughters D(v) = {v′ ∈ V \ {⊥} ; π(v′) = v}
     ❑ ∀y ∈ Y, the path from y to the root is P(y) = {v_1^y, …, v_{k_y}^y ; v_1^y = π(y) ∧ ∀l ∈ {1, …, k_y − 1}, v_{l+1}^y = π(v_l^y) ∧ π(v_{k_y}^y) = ⊥}
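The sets S(v) and D(v) and the root path P(y) can all be read off the parent relationship π. A small sketch over a hypothetical parent map (node names are illustrative):

```python
# Compute sisters S(v), daughters D(v) and the root path P(y)
# from a parent map pi. The tree here is illustrative.

ROOT = "⊥"
pi = {"A": ROOT, "B": ROOT, "A1": "A", "A2": "A", "B1": "B"}

def daughters(v):
    """D(v): all nodes whose parent is v."""
    return {u for u, p in pi.items() if p == v}

def sisters(v):
    """S(v): all nodes sharing v's parent, excluding v itself."""
    return {u for u, p in pi.items() if p == pi[v] and u != v}

def path_to_root(y):
    """P(y): the ancestors of y, from pi(y) up to (but excluding) the root."""
    path, v = [], pi[y]
    while v != ROOT:
        path.append(v)
        v = pi[v]
    return path

print(daughters("A"))      # the set {'A1', 'A2'}
print(sisters("A1"))       # the set {'A2'}
print(path_to_root("A1"))  # ['A']
```

Note that P(y), as defined on the slide, starts at π(y) and stops at the daughter of the root, so the root ⊥ itself never appears in the path.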


  9. Hierarchical Rademacher-based Generalization Bound
     ❑ We consider a top-down hierarchical classification strategy
     ❑ Let K : X × X → ℝ be a PDS kernel and Φ : X → H the associated feature mapping; we suppose there exists R > 0 such that K(x, x) ≤ R² for all x ∈ X
     ❑ We consider the class of functions F_B = {f : (x, v) ∈ X × V ↦ ⟨Φ(x), w_v⟩ | W = (w_1, …, w_|V|), ‖W‖_H ≤ B}
     ❑ An example (x, y) is misclassified by f ∈ F_B iff min_{v ∈ P(y)} (f(x, v) − max_{v′ ∈ S(v)} f(x, v′)) ≤ 0 (the multi-class margin)
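The misclassification condition above is a minimum, along the path toward the root, of margins between a node and its best-scoring sister. A sketch under a hypothetical parent-map representation; here the min runs over y together with its ancestors, i.e. every node at which a top-down decision is taken (tree and scores are illustrative):

```python
# Sketch of the hierarchical margin g_f(x, y): an example (x, y) is
# misclassified by f iff this quantity is <= 0. Tree and scores illustrative.

ROOT = "root"
pi = {"A": ROOT, "B": ROOT, "A1": "A", "A2": "A", "B1": "B", "B2": "B"}

def sisters(v):
    return [u for u, p in pi.items() if p == pi[v] and u != v]

def decision_nodes(y):
    """y and its ancestors below the root: the nodes where margins matter."""
    nodes, v = [], y
    while v != ROOT:
        nodes.append(v)
        v = pi[v]
    return nodes

def hierarchical_margin(f, x, y):
    # min over the decision nodes v of f(x, v) - max_{v' in S(v)} f(x, v')
    # (assumes every such node has at least one sister)
    return min(f(x, v) - max(f(x, s) for s in sisters(v))
               for v in decision_nodes(y))

scores = {"A": 1.0, "B": -0.5, "A1": 0.2, "A2": 0.7, "B1": 0.0, "B2": 0.1}
f = lambda x, v: scores[v]

m = hierarchical_margin(f, None, "A1")
print(m <= 0)  # True: A1 loses to its sister A2, so (x, A1) is misclassified
```

In this toy run the decision at the root is correct (A beats B comfortably), but the margin at the leaf level is negative, so the single bad decision makes the whole example misclassified.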


  12. Hierarchical Rademacher-based Generalization Bound
      [Figure: taxonomy in which a wrong decision (×) on the path from the root ⊥ to the leaf y propagates downward]
      ❑ Top-down hierarchical techniques suffer from error propagation, but class imbalance harms them less than it does flat approaches ⇒ a generalization bound to study these effects

  13. Hierarchical Rademacher-based Generalization Bound

      Theorem. Let S = ((x^(i), y^(i)))_{i=1}^m be an i.i.d. training set drawn according to a probability distribution D over X × Y, and let A be a Lipschitz function with constant L dominating the 0/1 loss; further, let K : X × X → ℝ be a PDS kernel and Φ : X → H the associated feature mapping. Assume R > 0 such that K(x, x) ≤ R² for all x ∈ X. Then, with probability at least 1 − δ, the following bound holds for all f ∈ F_B = {f : (x, v) ∈ X × V ↦ ⟨Φ(x), w_v⟩ | W = (w_1, …, w_|V|), ‖W‖_H ≤ B}:

      E(g_f) ≤ (1/m) Σ_{i=1}^m A(g_f(x^(i), y^(i))) + (8BRL/√m) Σ_{v ∈ V\Y} |D(v)|(|D(v)| − 1) + 3 √(ln(2/δ)/(2m))   (1)

      where G_{F_B} = {g_f : (x, y) ∈ X × Y ↦ min_{v ∈ P(y)} (f(x, v) − max_{v′ ∈ S(v)} f(x, v′)) | f ∈ F_B} and |D(v)| denotes the number of daughters of node v.

  14. Extension of an existing result for flat multi-class classification

      Theorem (Guermeur, 2007). Let S = ((x^(i), y^(i)))_{i=1}^m be an i.i.d. training set drawn according to a probability distribution D over X × Y, and let A be a Lipschitz function with constant L dominating the 0/1 loss; further, let K : X × X → ℝ be a PDS kernel and Φ : X → H the associated feature mapping. Assume R > 0 such that K(x, x) ≤ R² for all x ∈ X. Then, with probability at least 1 − δ, the following bound holds for all f ∈ F_B = {f : (x, y) ∈ X × Y ↦ ⟨Φ(x), w_y⟩ | W = (w_1, …, w_|Y|), ‖W‖_H ≤ B}:

      E(g_f) ≤ (1/m) Σ_{i=1}^m A(g_f(x^(i), y^(i))) + (8BRL/√m) |Y|(|Y| − 1) + 3 √(ln(2/δ)/(2m))   (2)

      where G_{F_B} = {g_f : (x, y) ∈ X × Y ↦ f(x, y) − max_{y′ ∈ Y\{y}} f(x, y′) | f ∈ F_B}.
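The hierarchical bound (1) and the flat bound (2) differ only in their complexity terms: Σ_{v ∈ V\Y} |D(v)|(|D(v)| − 1) versus |Y|(|Y| − 1). The two terms can be compared directly for any taxonomy; a sketch on a toy perfect b-ary tree (the branching factor and depth are illustrative):

```python
# Compare the complexity terms of bounds (1) and (2) on a perfect
# b-ary tree of depth d (illustrative sizes).
#   hierarchical: sum over internal nodes v of |D(v)| * (|D(v)| - 1)
#   flat:         |Y| * (|Y| - 1), where Y is the set of leaves

def hierarchical_term(b, d):
    # internal nodes: 1 + b + ... + b^(d-1); each has exactly b daughters
    internal = sum(b**l for l in range(d))
    return internal * b * (b - 1)

def flat_term(b, d):
    leaves = b**d
    return leaves * (leaves - 1)

b, d = 10, 3  # 1000 leaf classes
print(hierarchical_term(b, d))  # 9990
print(flat_term(b, d))          # 999000
```

The flat term grows quadratically in the number of target classes, while the hierarchical term grows only with the number of internal nodes times the squared branching factor, which is how the bounds capture when a top-down approach can generalize better than a flat one on large taxonomies.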
