Merging Classifiers of Different Classification Approaches
Incremental Classification, Concept Drift and Novelty Detection Workshop
Antonina Danylenko and Welf Löwe
antonina.danylenko@lnu.se
Department of Computer Science, Linnaeus University, Sweden
14 December, 2014
Agenda
◮ Introduction;
◮ Problem, Motivation, Approach;
◮ Decision Algebra;
◮ Merge as an Operation of Decision Algebra;
◮ Merging Classifiers;
◮ Experiments;
◮ Conclusions.
Introduction
◮ Classification is a common problem that arises in many fields of Computer Science (data mining, information storage and retrieval, knowledge management);
◮ Classification approaches are often tightly coupled to:
  ◮ learning strategies: different algorithms are used;
  ◮ data structures: information is represented in different ways;
  ◮ how common problems are addressed: workarounds;
◮ Selecting an appropriate classification model for a given classification problem is not easy (accuracy, robustness, and scalability must all be considered);
Problem and Motivation
◮ Simply combining classifiers learned over different data sets of the same problem is not straightforward;
◮ Current work is done in aggregation and meta-learning:
  ◮ combining different classifiers learned over the same data set;
  ◮ constructing a single classifier learned on different variations of the same classification problem;
  ◮ as a result, these approaches do not take into account that the context can differ.
◮ Approaches that combine classifiers with partly or completely disjoint contexts use one single classification approach for the base-level classifiers;
◮ Generality gets lost: classifiers become incomparable, benchmarking is difficult, and it is hard to propagate advances between domains;
Proposed Approach
◮ Use Decision Algebra, which defines classifiers as reusable black boxes in terms of so-called decision functions;
◮ Define a general merge operation over these decision functions that allows for symbolic computations with the captured classification information;
◮ Show an example of merging classifiers of different classification approaches;
◮ Show that the merged classifier tends to become more accurate;
Classification Information
◮ Classification information is a set of decision tuples: $CI = \{(\vec{a}_1, c_1), \ldots, (\vec{a}_n, c_n)\}$;
◮ It is complete if: $\forall \vec{a} \in \vec{A} : (\vec{a}, c) \in CI$;
◮ It is non-contradictive if: $\forall (\vec{a}_i, c_i), (\vec{a}_j, c_j) \in CI : \vec{a}_i = \vec{a}_j \Rightarrow c_i = c_j$;
◮ The problem domain $(A, C)$ of $CI$ is a superset of $\vec{A} \times C$ that defines the actual classification problem, where $\vec{A} \in A$;
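The two properties above can be checked mechanically. A minimal sketch, assuming classification information is represented as a set of (attribute-tuple, class) pairs; the function names and the representation are illustrative, not the authors' implementation:

```python
# Classification information CI as a set of (attribute-vector, class) tuples.

def is_complete(ci, attribute_space):
    """Complete: every attribute vector of the formal context occurs in CI."""
    covered = {a for (a, _c) in ci}
    return all(a in covered for a in attribute_space)

def is_non_contradictive(ci):
    """Non-contradictive: no attribute vector maps to two different classes."""
    seen = {}
    for a, c in ci:
        if a in seen and seen[a] != c:
            return False
        seen[a] = c
    return True

ci = {(("high", "low"), "a"), (("low", "low"), "na")}
assert is_non_contradictive(ci)
assert not is_complete(ci, {("high", "low"), ("low", "low"), ("high", "high")})
```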
Decision Function
◮ A decision function is a representation of complete and possibly contradictive decision information: $df : \vec{A} \to D(C)$ maps an actual context $\vec{a} \in \vec{A}$ to a (probability) distribution $D(C)$;
◮ It is a higher-order (or curried) function: $df^n : A_n \to (A_{n-1} \to (\ldots (A_1 \to D(C))))$;
◮ It can easily be represented as a decision tree or decision graph: $df^n = x_1(df_1^{n-1}, \ldots, df_{|\Lambda_1|}^{n-1})$, where $\Lambda_1$ is the domain of attribute $A_1$;
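The curried view can be sketched with nested dictionaries: binding one attribute at a time walks one level down the tree until a class distribution (a leaf) remains. The representation below is an assumption for illustration only:

```python
# Inner node: dict mapping attribute value -> subtree.
# Leaf: dict mapping class -> probability (a distribution over C).
df2 = {
    "low":  {"na": 1.0},
    "high": {"low": {"na": 1.0}, "high": {"a": 1.0}},
}

def apply_df(df, attr_values):
    """Consume attribute values one by one (currying) until a leaf remains."""
    node = df
    for v in attr_values:
        node = node[v]
    return node

assert apply_df(df2, ["high", "high"]) == {"a": 1.0}
```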
Graph Representation of Decision Function
◮ Decision function $df_2 = x_1(na, x_2(na, na, a, a), x_2(na, na, a, a), a)$;
[Figure: a tree (left) and graph (right) representation of $df_2$, with edges labeled by the attribute values low, med, high, vhigh.]
Figure: A tree (left) and graph (right) representation of $df_2$. Each node labeled with $n$ represents a decision term with a selection operator $x_n$; each square leaf node labeled with $c$ corresponds to a probability distribution over classes $C$ with $c$ the most probable class.
Decision Algebra
◮ Decision Algebra (DA) is a theoretical framework defined as a parameterized specification, with $\vec{A}$ and $D(C)$ as parameters. It provides a general representation of classification information as an abstract classifier;
Operations Over Decision Functions
◮ Constructor $x_n$:
$x_n : \underbrace{\Lambda_1 \times DF[\vec{A}', D] \times \cdots \times \Lambda_1 \times DF[\vec{A}', D]}_{|\Lambda_1| \text{ times}} \to DF[\vec{A}, D]$
◮ Bind binds attribute $A_i$ to an attribute value $a \in \Lambda_i$:
$bind_{A_i} : DF[\vec{A}, D] \times \Lambda_i \to DF[\vec{A}', D]$
$bind_{A_i}(x_n(a_1, df_1, \ldots, a_{|\Lambda_1|}, df_{|\Lambda_1|}), a) \equiv df_i$, if $a = a_i$
Example: $bind_{A_1}(df_2, high) = x_2(na, na, a, a)$
◮ Evert changes the order of attributes in the decision function:
$evert_{A_i} : DF[\vec{A}, D] \to DF[\vec{A}', D]$
$evert_{A_i}(df) := x(a_1, bind_{A_i}(df, a_1), \ldots, a_{|\Lambda_i|}, bind_{A_i}(df, a_{|\Lambda_i|}))$
Example: $evert_{A_2}(df_2) = x_2(x_1(na, na, na, a), x_1(na, na, na, a), x_1(na, a, a, a), x_1(na, a, a, a))$
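On the nested-dictionary representation of decision terms, bind and evert have short concrete counterparts. A sketch for a two-attribute decision function; the representation and helper names are assumptions, not the Decision Algebra API:

```python
def bind_first(df, value):
    """bind_{A_1}: fix the first attribute to `value`, dropping one level."""
    return df[value]

def evert_first_two(df):
    """evert_{A_2}: lift the second attribute to the top of a 2-level term."""
    values_a1 = list(df)
    values_a2 = list(next(iter(df.values())))
    return {v2: {v1: df[v1][v2] for v1 in values_a1} for v2 in values_a2}

df = {
    "high": {"yes": "a",  "no": "na"},
    "low":  {"yes": "na", "no": "na"},
}
assert bind_first(df, "high") == {"yes": "a", "no": "na"}
assert evert_first_two(df) == {
    "yes": {"high": "a",  "low": "na"},
    "no":  {"high": "na", "low": "na"},
}
```

Note that evert does not change what the function computes, only the order in which attributes are consulted.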
Merge Operation over Decision Functions
◮ Merge operator $\sqcup_D$ over class distributions $D(C)$:
$\sqcup_D : D(C) \times D(C) \to D(C)$
$d(C) \sqcup_D d'(C) = \{(c, p + p') \mid (c, p) \in d(C), (c, p') \in d'(C)\}$
◮ General merge operation over decision functions:
$\sqcup : DF_1[\vec{A}, D] \times DF_2[\vec{A}, D] \to DF'[\vec{A}, D]$
◮ Merge over constant decision functions $df_1^0, df_2^0 \in DF^\emptyset[\{\vec{0}\}, D]$:
$\sqcup(df_1^0, df_2^0) := x_0(\sqcup_D(df_1^0, df_2^0))$
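The distribution merge $\sqcup_D$ is just a pointwise sum of class weights. A minimal sketch, assuming a distribution is represented as a dict from class to weight (integer counts here, to be renormalized into probabilities if needed):

```python
def merge_distributions(d1, d2):
    """⊔_D: pointwise sum of class weights over the union of classes."""
    return {c: d1.get(c, 0) + d2.get(c, 0) for c in set(d1) | set(d2)}

d = merge_distributions({"a": 7, "na": 3}, {"a": 2, "na": 8})
assert d == {"a": 9, "na": 11}
```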
Scenario One: Same Formal Context
◮ Prerequisite: the decision functions $df_1 \in DF_1[\vec{A}, D]$ and $df_2 \in DF_2[\vec{A}', D]$ are constructed over different samples of the same problem domain and $\vec{A} = \vec{A}' = \Lambda_1 \times \ldots \times \Lambda_n$;
$\sqcup(df_1, df_2) := x_n(a_1, \sqcup(bind_{A_1}(df_1, a_1), bind_{A_1}(df_2, a_1)), \ldots, a_k, \sqcup(bind_{A_1}(df_1, a_k), bind_{A_1}(df_2, a_k)))$
Scenario One: Cont'd
Merge algorithm:
1: if $df_1 \in DF^\emptyset[\{\vec{0}\}, D] \wedge df_2 \in DF^\emptyset[\{\vec{0}\}, D]$ then
2:   return $x(\sqcup_D(df_1, df_2))$
3: end if
4: for all $a \in \Lambda_1$ do
5:   $df_a = \sqcup(bind_1(df_1, a), bind_1(df_2, a))$
6: end for
7: return $x(a_1, df_{a_1}, \ldots, a_{|\Lambda_1|}, df_{a_{|\Lambda_1|}})$
[Figure: steps (a)–(d) of merging two example decision trees over the attribute values low, med, high, vhigh; the intermediate trees are not recoverable from the extracted text.]
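The Scenario One algorithm above can be sketched as a short recursion on the nested-dictionary tree representation (an assumed encoding, with leaves as class-to-count distributions):

```python
def merge_dist(d1, d2):
    """⊔_D at the leaves: pointwise sum of class weights."""
    return {c: d1.get(c, 0) + d2.get(c, 0) for c in set(d1) | set(d2)}

def is_leaf(node):
    # A leaf maps classes to numeric weights; inner nodes map values to dicts.
    return all(isinstance(v, (int, float)) for v in node.values())

def merge(df1, df2):
    if is_leaf(df1) and is_leaf(df2):       # lines 1-3: both constant
        return merge_dist(df1, df2)
    # lines 4-7: bind both functions to each value of the first attribute
    return {a: merge(df1[a], df2[a]) for a in df1}

df1 = {"high": {"a": 2}, "low": {"na": 3}}
df2 = {"high": {"a": 1, "na": 1}, "low": {"na": 2}}
assert merge(df1, df2) == {"high": {"a": 3, "na": 1}, "low": {"na": 5}}
```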
Scenario Two: Disjoint Formal Contexts
◮ Prerequisite: the decision functions $df_1 \in DF_1[\vec{A}, D]$ and $df_2 \in DF_2[\vec{A}', D]$ are constructed over samples with disjoint formal contexts of the same problem domain: $\vec{A} = \Lambda_1 \times \ldots \times \Lambda_n$, $\vec{A}' = \Lambda'_1 \times \ldots \times \Lambda'_m$, and the attributes $\{A_1, \ldots, A_n\} \cap \{A'_1, \ldots, A'_m\} = \emptyset$;
$\sqcup(df_1, df_2) := x_n(a_1, \sqcup(bind_{A_1}(df_1, a_1), bind_{A_1}(df_2, a_1)), \ldots, a_k, \sqcup(bind_{A_1}(df_1, a_k), bind_{A_1}(df_2, a_k)))$
$\sqcup(df_1^0, df_2) := \sqcup(df_2, df_1^0)$
Scenario Two: Cont'd
Merge algorithm:
1: if $df_1 \in DF^\emptyset[\{\vec{0}\}, D] \wedge df_2 \in DF^\emptyset[\{\vec{0}\}, D]$ then
2:   return $x(\sqcup_D(df_1, df_2))$
3: end if
4: if $df_1 \in DF^\emptyset[\{\vec{0}\}, D]$ then
5:   return $\sqcup(df_2, df_1)$
6: end if
7: for all $a \in \Lambda_1$ do
8:   $df_a = \sqcup(bind_1(df_1, a), bind_1(df_2, a))$
9: end for
10: return $x(a_1, df_{a_1}, \ldots, a_{|\Lambda_1|}, df_{a_{|\Lambda_1|}})$
[Figure: steps (a)–(d) of merging two example decision trees with disjoint attribute sets (car-domain values such as acc, g, vg, 2, 4, 5more); the intermediate trees are not recoverable from the extracted text.]
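In Scenario Two, since $df_2$ has none of $df_1$'s attributes, binding an attribute of $df_1$ leaves $df_2$ unchanged; the effect is that $df_2$ is replicated below every branch of $df_1$, with the swap rule handling the case where $df_1$ bottoms out first. A sketch on the same assumed nested-dictionary encoding:

```python
def merge_dist(d1, d2):
    """⊔_D at the leaves: pointwise sum of class weights."""
    return {c: d1.get(c, 0) + d2.get(c, 0) for c in set(d1) | set(d2)}

def is_leaf(node):
    return all(isinstance(v, (int, float)) for v in node.values())

def merge_disjoint(df1, df2):
    if is_leaf(df1) and is_leaf(df2):   # lines 1-3: both constant
        return merge_dist(df1, df2)
    if is_leaf(df1):                    # lines 4-6: swap rule ⊔(df1^0, df2) := ⊔(df2, df1^0)
        return merge_disjoint(df2, df1)
    # lines 7-10: df2 lacks attribute A_1, so binding leaves it unchanged
    return {a: merge_disjoint(df1[a], df2) for a in df1}

df1 = {"high": {"a": 2}, "low": {"na": 3}}   # over attribute A_1
df2 = {"yes": {"a": 1}, "no": {"na": 1}}     # over a disjoint attribute A'_1
merged = merge_disjoint(df1, df2)
assert merged["high"]["yes"] == {"a": 3}
```

The merged function now decides on both attribute sets, which is exactly what makes classifiers learned over disjoint contexts combinable.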