Pattern-Based Classification: A Unifying Perspective
LeGo Workshop, Bled, Slovenia, 07.09.2009
Albrecht Zimmermann, Siegfried Nijssen, Björn Bringmann
Katholieke Universiteit Leuven, Belgium
Observations
The LeGo schema: DB → Pattern Mining → pattern set (PS) → Feature Selection → PS → Model Induction → model (M)
A general schema that augments or replaces the data mining step in KDD; the topic of this workshop.
Observations (cont.)
Each step of the schema has many instantiations: exhaustive or heuristic search, frequent, closed, or correlating pattern sets, and decision trees, decision lists, or SVMs as induced models.
No overview (Ramamohanarao et al. '07) → reinventions → revisited dead ends → lost progress
What patterns and how?
Which pattern type: itemsets, multi-itemsets, sequences, trees, graphs (sequences ⊂ trees ⊂ graphs)
Which data structure: FP-trees, ZBDDs, TID-lists, bit-vectors
The results hold for lattices (itemsets) or even partial orders (graphs): they are independent of the pattern type and of the data structure.
Why mine explicit patterns? Traditional classification
Excursus: why should we care in the first place, apart from attending the workshop?
Attributes: {A_1, ..., A_d}; values: V(A) = {v_1, ..., v_r}
Decision trees branch on tests such as A_1=v_2, A_4=v_1, A_3=v_2; rules conjoin them, e.g. A_1=v_2 ∧ A_4=v_1 ⇒ + and A_3=v_2 ∧ A_2=v_1 ⇒ -
Why mine explicit patterns? Pattern-based classification
Transactions are structured: t ⊆ {i_1, ..., i_ℑ}
Patterns provide the instance description
Models can be built independent of the data type
This yields interpretable classifiers, whereas the alternatives (kernels, NN, ...) are opaque
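To make "patterns provide the instance description" concrete, here is a minimal sketch for the itemset case (the helper name and the toy patterns are ours, not from the talk): every mined pattern becomes a binary feature that fires when the pattern is contained in a transaction, and any standard learner can then be trained on the resulting vectors.

    def to_feature_vector(transaction, patterns):
        # one binary feature per mined pattern:
        # 1 if the pattern occurs in the transaction, 0 otherwise
        return [1 if p <= transaction else 0 for p in patterns]

    patterns = [frozenset({'a', 'b'}), frozenset({'c'})]
    print(to_feature_vector(frozenset({'a', 'b', 'd'}), patterns))  # [1, 0]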
Thus: leverage pattern mining techniques
Advantages: 15 years of research → fast and scalable; described in a structured language → persistent, not opaque
Challenge(s): (re-)entangle instance description and classification
Roadmap
Class-sensitive patterns and the mining thereof
Model-independence: post-processing, iterative mining
Model-dependence: post-processing, iterative mining
Disclaimer: we will probably miss some approaches that should have been included in the presentation, which just proves our point.
Should we use frequent patterns?
Pro: well-researched; frequent → expected to hold on unseen data; efficient mining
Contra: which threshold?; frequent → possibly no or anti-correlation with the classes; (too) many patterns
New item! Class-sensitive patterns: taking the relationship to the class labels into account
A timeline of such notions: Nuggets '94, Subgroup Descriptions '96 (SGD), Interesting Rules '98 (IR), Class-Association Rules '98 (CAR), Emerging Patterns '99 (EP), Contrast Sets '99 (CS), Correlating Patterns '00 (CP), Jumping Emerging Patterns '01 (JEP), Version Space Patterns '01, Discriminative Patterns '07 (DP)
We take no sides and do not subscribe to a particular universe of terms.
Evaluating class-sensitivity
Confidence, lift, WRAcc (novelty), χ², correlation coefficient, information gain, Fisher score
Some of them are mathematically equivalent, some only semantically (Lavrač et al. '09).
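For orientation, the first three measures have standard definitions (not spelled out on the slide) for a rule X ⇒ c, writing P(X) for the fraction of examples covered by X and P(c) for the class prior:

    conf(X ⇒ c)  = P(c | X)
    lift(X ⇒ c)  = P(c | X) / P(c)
    WRAcc(X ⇒ c) = P(X) · (P(c | X) − P(c))

For a fixed class c, for instance, lift is just confidence rescaled by the constant P(c), so the two rank rules identically; this is one example of the kind of equivalence mentioned above.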
How to mine them?
Two families: (a) mining frequent patterns and post-processing; (b) bounding the specific measure during mining.
Representative references: Wrobel '97 (SGD), Liu et al. '98 (CAR), Bay et al. '99 (CS), Kavšek et al. '06 (SGD), Wang et al. '05 (CAR), Atzmüller et al. '06 (SGD), Arunasalam et al. '06 (CAR), Cheng et al. '07 (DP), Nowozin et al. '07 (CAR), Cheng et al. '08 (DP) (1 bound)
CAR: Class-Association Rules, CS: Contrast Sets, DP: Discriminative Patterns, SGD: SubGroup Descriptions
How to? (cont.)
General branch-and-bound (earlier than most of the specific approaches, and subsumes them), iterative deepening, and sequential sampling.
Representative references: Webb '95 (CAR), Klösgen '96 (SGD), Morishita et al. '00, Scheffer et al. '02 (SGD), Bringmann et al. '06 (CP), Cerf et al. '08 (CAR), Grosskreutz et al. '08 (SGD), Yan et al. '08 (DP) (2 bounds), Nijssen et al. '09 (4 bounds)*
*) itemset-specific, constraint programming
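To make the branch-and-bound idea concrete, here is a minimal sketch over itemsets using the standard optimistic estimate for WRAcc (a specialization can at best keep all covered positives and lose all covered negatives). The function names and toy data are ours; this does not reproduce any of the algorithms cited above.

    def wracc(p, n, P, N):
        # p, n: positive/negative support of the pattern; P, N: class totals
        if p + n == 0:
            return 0.0
        return ((p + n) / (P + N)) * (p / (p + n) - P / (P + N))

    def wracc_bound(p, n, P, N):
        # optimistic estimate: a specialization keeps at most all covered
        # positives and, in the best case, loses every covered negative
        return (p / (P + N)) * (1 - P / (P + N))

    def branch_and_bound(items, db, labels):
        P = sum(labels); N = len(labels) - P
        best = [float('-inf'), None]

        def support(pattern):
            p = n = 0
            for t, y in zip(db, labels):
                if pattern <= t:
                    p += y; n += 1 - y
            return p, n

        def recurse(pattern, remaining):
            if pattern:
                p, n = support(pattern)
                score = wracc(p, n, P, N)
                if score > best[0]:
                    best[0], best[1] = score, set(pattern)
                if wracc_bound(p, n, P, N) <= best[0]:
                    return  # no specialization can beat the incumbent: prune
            for i, item in enumerate(remaining):
                recurse(pattern | {item}, remaining[i + 1:])

        recurse(frozenset(), list(items))
        return best[0], best[1]

    db = [{'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'c'}]
    labels = [1, 1, 0, 0]  # 1 = positive class, 0 = negative class
    print(branch_and_bound(['a', 'b', 'c'], db, labels))  # (0.25, {'a'})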
What traversal strategy? Seriously?
Result sets
Are still too big
May include irrelevant patterns
May include much redundancy
The (extended) LeGo
The pipeline (DB → Pattern Mining → PS → Feature Selection → PS → Model Induction → M) is extended with constraints and criteria feeding back into the steps: a mining constraint, a pattern set constraint, optimisation criteria, and a model constraint.
This gives four ways of coupling the steps: model-independent post-processing, model-independent iterative mining, model-dependent post-processing, and model-dependent iterative mining.
Model-independence (model-independent post-processing and iterative mining)
Only patterns affect other patterns' selection
Modular: usable in any classifier (often an SVM)
Model-independent post-processing
Mine a large set of patterns, then select a subset:
exhaustively: too expensive
heuristically: usually ordered
Use a measure to quantify the combined worth of the selected patterns.
Model-independent post-processing: pattern set scores
Pattern sets can be scored based on
• the TID lists of the patterns only (computable for all data types):
  significance: incorporate support / class-sensitivity
  redundancy: similarity between TID lists
• pattern structure and TID lists (requires specialization to the pattern type):
  using a pattern distance measure
  by computing how well the patterns compress the data
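As one possible instantiation of a TID-list-only score (a toy formulation of ours, not a measure from the talk): reward patterns whose TID lists are pure with respect to the positive class and penalise pairwise TID-list overlap.

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    def pattern_set_score(tid_lists, labels, alpha=0.5):
        # significance: mean fraction of positives among the covered transactions
        # (assumes every TID list is non-empty)
        sig = sum(sum(labels[t] for t in tids) / len(tids)
                  for tids in tid_lists) / len(tid_lists)
        # redundancy: mean pairwise Jaccard similarity between TID lists
        pairs = [(i, j) for i in range(len(tid_lists))
                        for j in range(i + 1, len(tid_lists))]
        red = (sum(jaccard(tid_lists[i], tid_lists[j]) for i, j in pairs) / len(pairs)
               if pairs else 0.0)
        return sig - alpha * red

    # labels[t] is 1/0 for transaction t; each TID list is the set of covered t's
    print(pattern_set_score([{0, 1}, {0, 2}], [1, 1, 0, 0]))  # 0.75 - 0.5 * 1/3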
Model-independent post-processing: exhaustive
Disclaimer: the following algorithms should be considered illustrative examples, NOT recommendations; other approaches vary.
Knobbe et al. '06: exhaustive enumeration, explicit size constraint, boundable pruning, implicit redundancy control (via entropy)
De Raedt et al. '07: exhaustive enumeration, arbitrary constraints, monotone, boundable pruning, explicit redundancy control
Extremely large search space → scalability issues
Counter-intuitive result: all sets
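To illustrate the exhaustive flavour only (a brute-force sketch without the pruning mentioned above, and not a reproduction of either cited algorithm): enumerate all size-k subsets of patterns and score each by the joint entropy of the binary features their TID lists induce, so that redundant sets score low.

    from itertools import combinations
    from collections import Counter
    from math import log2

    def joint_entropy(tid_lists, n_transactions):
        # a candidate set partitions the transactions by their binary signatures
        signatures = Counter(
            tuple(t in tids for tids in tid_lists) for t in range(n_transactions)
        )
        return -sum((c / n_transactions) * log2(c / n_transactions)
                    for c in signatures.values())

    def best_size_k_set(tid_lists, n_transactions, k):
        # brute-force enumeration of all size-k pattern sets
        return max(combinations(range(len(tid_lists)), k),
                   key=lambda idx: joint_entropy([tid_lists[i] for i in idx],
                                                 n_transactions))

    tid_lists = [{0, 1}, {0, 2}, {0, 1, 2}]
    print(best_size_k_set(tid_lists, n_transactions=4, k=2))  # (0, 1): least redundant pair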
Model-independent post-processing: heuristic search strategies
• Fixed order: scan the patterns in a (possibly random) fixed order and add each pattern that improves the running score (O(n))
• Greedy: repeatedly re-rank the patterns and pick the one that improves the score most (O(n²))
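A minimal sketch of both strategies against an arbitrary pattern-set scoring function, such as the one sketched earlier; the function names are ours.

    def fixed_order_selection(patterns, score):
        # single pass in the given (possibly random) order;
        # keep a pattern whenever it improves the running score
        selected, best = [], float('-inf')
        for p in patterns:
            s = score(selected + [p])
            if s > best:
                selected.append(p)
                best = s
        return selected

    def greedy_selection(patterns, score):
        # repeatedly add the pattern with the largest improvement
        # (O(n^2) score evaluations overall)
        selected, remaining, best = [], list(patterns), float('-inf')
        while remaining:
            s, p = max(((score(selected + [q]), q) for q in remaining),
                       key=lambda t: t[0])
            if s <= best:
                break
            selected.append(p); remaining.remove(p); best = s
        return selected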