Dynamic Re-ordering in Mining Top-k Productive Discriminative Patterns
Yoshitaka Kameya* and Ken’ya Ito
Meijo University
TAAI-17
Outline
• Background
• Dynamic re-ordering in mining top-k productive discriminative patterns
• Experiments
• Related work and Conclusion
Background: Discriminative Patterns (1)
• Discriminative patterns:
  – Show differences between two groups (classes)
  – Used for:
    • Characterizing the positive class
    • Building more precise classifiers
• Example: discriminative pattern x: milk=True, aquatic=False ➔ + (+: positive class, –: negative class)
[Figure: data table with class labels]
Background: Discriminative Patterns (2)
• Discriminative patterns tend to be more meaningful than frequent patterns (thanks to class labels)
• Are class labels always available?
  – Comparing groups is a standard (and promising) starting point in data analysis
  – Clustering can find groups (classes)! → Cluster labeling
[Figure: original data → 1. Clustering → clusters → 2. Discriminative pattern mining → clusters labeled with discriminative patterns]
Background: Discriminative Patterns (3)
• Quality score: Measures the overlap between pattern x and positive class c
[Figure: two diagrams of x and c, one where quality is high and one where quality is low]
• Most popular quality scores are not anti-monotonic:
  – Confidence, Lift
  – Support difference, Weighted relative accuracy, Leverage
  – F-score, Dice, Jaccard
  – ...
➔ Branch & bound pruning is often used [Morishita+ 00][Zimmermann+ 09][Nijssen+ 09]
Background: B&B Pruning for Top-k Patterns
• Suppose we are visiting a pattern x in a depth-first search
• We compute the upper bound U(x) of its quality R(x) (U(x) = an optimistic estimate of the qualities of x’s extensions)
• We prune the subtree below x if U(x) < R(z), where z is the k-th candidate
[Figure: enumeration tree over items A, B, C, D (levels: A B C D / AB AC BC AD BD CD / ABD ABC ACD BCD / ABCD), with the search currently visiting x = CD; beside it, the candidate list for the tentative top-k patterns, ranked 1, 2, ..., k in descending order of quality, with z at rank k]
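The pruning rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the quality R(x) is taken to be the F-score, the bound U(x) is the F-score of a hypothetical extension that keeps every positive transaction of x and none of its negatives, the toy dataset is the one used later in the talk, and the search is a plain depth-first enumeration without the productivity constraint. All names (`counts`, `quality`, `upper_bound`, `top_k`) are illustrative.

```python
import heapq

# Toy dataset (assumed encoding): (class label, transaction).
DATASET = [
    ("+", {"A", "B"}), ("+", {"A", "C", "E"}), ("+", {"A", "D"}),
    ("+", {"B", "C", "E"}), ("+", {"B", "D"}),
    ("-", {"A", "B", "C"}), ("-", {"B", "E"}), ("-", {"C", "D"}),
    ("-", {"C", "D", "E"}), ("-", {"E"}),
]
ITEMS = "ABCDE"
N_POS = sum(1 for label, _ in DATASET if label == "+")

def counts(pattern):
    """Return (tp, fp): positive / negative transactions containing the pattern."""
    labels = [label for label, items in DATASET if pattern <= items]
    return labels.count("+"), labels.count("-")

def quality(pattern):
    """R(x): F-score, written in the equivalent form 2TP / (TP + FP + #positives)."""
    tp, fp = counts(pattern)
    return 2 * tp / (tp + fp + N_POS)

def upper_bound(pattern):
    """U(x): an extension can at best keep all positives of x and no negatives."""
    tp, _ = counts(pattern)
    return 2 * tp / (tp + N_POS)

def top_k(k):
    heap = []        # min-heap of (quality, pattern); heap[0] is the k-th candidate z
    visited = 0

    def visit(pattern, next_index):
        nonlocal visited
        visited += 1
        r = quality(pattern)
        entry = (r, "".join(sorted(pattern)))
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif r > heap[0][0]:
            heapq.heapreplace(heap, entry)
        # Prune the subtree below x if U(x) < R(z).
        if len(heap) == k and upper_bound(pattern) < heap[0][0]:
            return
        for i in range(next_index, len(ITEMS)):
            visit(pattern | {ITEMS[i]}, i + 1)

    for i in range(len(ITEMS)):
        visit(frozenset(ITEMS[i]), i + 1)
    return sorted(heap, reverse=True), visited

best, visited = top_k(3)
print(best, visited)
```

On this dataset the three best patterns are {A}, {B} and {C, E}, and the search visits fewer than the 31 non-empty patterns because some subtrees are cut off by the bound.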
Background: Suffix Enumeration Trees (1)
[Figure: prefix enumeration tree over items A, B, C, D (levels: A B C D / AB AC AD BC BD CD / ABC ABD ACD BCD / ABCD)]
[Figure: suffix enumeration tree over the same items (levels: A B C D / AB AC BC AD BD CD / ABD ABC ACD BCD / ABCD)]
Background: Suffix Enumeration Trees (1)
• Beneficial for checking the productivity constraint in a depth-first search
• Productivity constraint: No pattern may have lower quality than any of its sub-patterns
[Figure: prefix enumeration tree over items A, B, C, D (levels: A B C D / AB AC AD BC BD CD / ABC ABD ACD BCD / ABCD)]
[Figure: suffix enumeration tree over the same items (levels: A B C D / AB AC BC AD BD CD / ABD ABC ACD BCD / ABCD), with some patterns annotated with example quality values (0.4, 0.3, 0.5, 0.2, 0.4, 0.6, 0.5); ACD will be removed]
Background: Suffix Enumeration Trees (1)
• Beneficial for checking the productivity constraint in a depth-first search
• Prefix enumeration tree → NOT “sub-patterns first”
• Suffix enumeration tree → “Sub-patterns first”
• “Sub-patterns first” property: When visiting a pattern x, we have already visited all sub-patterns of x
[Figure: prefix enumeration tree over items A, B, C, D (levels: A B C D / AB AC AD BC BD CD / ABC ABD ACD BCD / ABCD)]
[Figure: suffix enumeration tree over the same items (levels: A B C D / AB AC BC AD BD CD / ABD ABC ACD BCD / ABCD), with some patterns annotated with example quality values]
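The difference between the two trees can be checked with a small Python sketch. It assumes that a prefix enumeration tree extends a pattern by appending a later item and that a suffix enumeration tree extends it by prepending an earlier item; the function names are illustrative.

```python
from itertools import combinations

def prefix_order(items):
    """Depth-first visiting order of the prefix enumeration tree."""
    order = []
    def visit(pattern, start):
        order.append(pattern)
        for i in range(start, len(items)):
            visit(pattern + items[i], i + 1)   # append a later item
    for i in range(len(items)):
        visit(items[i], i + 1)
    return order

def suffix_order(items):
    """Depth-first visiting order of the suffix enumeration tree."""
    order = []
    def visit(pattern, first):
        order.append(pattern)
        for i in range(first):
            visit(items[i] + pattern, i)       # prepend an earlier item
    for i in range(len(items)):
        visit(items[i], i)
    return order

def sub_patterns_first(order):
    """True if every pattern is visited after all of its proper sub-patterns."""
    seen = set()
    for pattern in order:
        subs = {frozenset(c) for n in range(1, len(pattern))
                for c in combinations(pattern, n)}
        if not subs <= seen:
            return False
        seen.add(frozenset(pattern))
    return True

print(suffix_order("ABCD"), sub_patterns_first(suffix_order("ABCD")))
print(prefix_order("ABCD"), sub_patterns_first(prefix_order("ABCD")))
```

The prefix order visits AB before B, so it fails the check, while the suffix order passes it.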
Background: Suffix Enumeration Trees (2)
• Also beneficial for effective B&B pruning
• Suppose: A = the highest quality item, B = the 2nd highest quality item, C = the 3rd highest quality item, …
  ➔ Items of higher quality are combined earlier
  ➔ Patterns of higher quality would be visited earlier
  ➔ B&B pruning would be more aggressive!
• We prune the subtree below x if U(x) < R(z)
  ➔ Threshold in B&B pruning is higher if z has a higher quality
[Figure: suffix enumeration tree over items A, B, C, D, with regions marked “A only”, “A, B combined”, “A, B, C combined” and “A, B, C, D combined”; candidate list ranked 1, 2, ..., k in descending order of quality, with z at rank k]
Outline
✓ Background
• Dynamic re-ordering in mining top-k productive discriminative patterns
  – Basic idea
  – Justification
• Experiments
• Related work and Conclusion
Our proposal: Basic idea (1)
• Basic idea: Re-order sibling patterns dynamically according to their qualities
  ➔ Patterns of higher quality will be visited even earlier
  ➔ B&B pruning will be even more aggressive
[Figure: suffix enumeration tree over items A, B, C, D, with each group of sibling patterns marked “siblings”]
Our proposal: Basic idea (2)
• Example:
  – 10 transactions
  – Quality is measured by F-score

Dataset
Class          Transaction
+ (Positive)   {A, B}
+ (Positive)   {A, C, E}
+ (Positive)   {A, D}
+ (Positive)   {B, C, E}
+ (Positive)   {B, D}
– (Negative)   {A, B, C}
– (Negative)   {B, E}
– (Negative)   {C, D}
– (Negative)   {C, D, E}
– (Negative)   {E}
Our proposal: Basic idea (4)
• Example (same dataset: 10 transactions, quality measured by F-score)
  – Recall of {A} = 3 / 5 = 0.6
  – Precision of {A} = 3 / 4 = 0.75
  – F-score of {A} = 2 * 0.6 * 0.75 / (0.6 + 0.75) = 0.67
  – Similarly, we have:
    • F-score of {A} = 0.67
    • F-score of {B} = 0.6
    • F-score of {C} = 0.4
    • F-score of {D} = 0.44
    • F-score of {E} = 0.4
• Static ordering among patterns: A < B < D < C < E
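The arithmetic on this slide can be reproduced with a few lines of Python. The dataset is the one from the example; the list-of-pairs encoding and the helper name `f_score` are illustrative assumptions, and returning 0 for a pattern with no positive transaction is a convention chosen here.

```python
# Toy dataset from the example: (class label, transaction).
DATASET = [
    ("+", {"A", "B"}), ("+", {"A", "C", "E"}), ("+", {"A", "D"}),
    ("+", {"B", "C", "E"}), ("+", {"B", "D"}),
    ("-", {"A", "B", "C"}), ("-", {"B", "E"}), ("-", {"C", "D"}),
    ("-", {"C", "D", "E"}), ("-", {"E"}),
]

def f_score(pattern, dataset=DATASET):
    """F-score of a pattern (a set of items) with respect to the positive class."""
    pattern = set(pattern)
    covered = [label for label, items in dataset if pattern <= items]
    tp = covered.count("+")                 # positive transactions containing the pattern
    n_pos = sum(1 for label, _ in dataset if label == "+")
    if tp == 0:
        return 0.0                          # convention: no positive coverage
    recall = tp / n_pos
    precision = tp / len(covered)
    return 2 * recall * precision / (recall + precision)

scores = {item: round(f_score({item}), 2) for item in "ABCDE"}
print(scores)

# Sort items by descending F-score; the tie between C and E keeps alphabetical order.
static_order = sorted("ABCDE", key=lambda item: -f_score({item}))
print(" < ".join(static_order))
```

The printed ordering is the static ordering A < B < D < C < E given on the slide.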
Our proposal: Basic idea (4)
• Example (same dataset: 10 transactions, quality measured by F-score)
• Suffix enumeration tree under static ordering A < B < D < C < E, with F-scores:
  – Level 1: A (0.67), B (0.6), D (0.44), C (0.4), E (0.4)
  – Level 2: AB (0.29), AD (0.33), BD (0.33), AC (0.29), BC (0.29), AE (0.33), BE (0.29), CE (0.5)
  – Level 3: ACE (0.33), BCE (0.33)
  (Note) Patterns that do not appear in the dataset are hidden
• The “sub-patterns first” property holds, and we have the productive patterns {A}, {B}, {C, E}, {D}, {C}, {E}
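The list of productive patterns can be double-checked by brute force. The sketch below is not the depth-first procedure from the talk: it simply tests every pattern against all of its proper sub-patterns, using the productivity constraint as stated earlier (a pattern must not have lower quality than any sub-pattern). The names `f_score` and `is_productive` are illustrative.

```python
from itertools import combinations

# Toy dataset from the example: (class label, transaction).
DATASET = [
    ("+", {"A", "B"}), ("+", {"A", "C", "E"}), ("+", {"A", "D"}),
    ("+", {"B", "C", "E"}), ("+", {"B", "D"}),
    ("-", {"A", "B", "C"}), ("-", {"B", "E"}), ("-", {"C", "D"}),
    ("-", {"C", "D", "E"}), ("-", {"E"}),
]
ITEMS = "ABCDE"
N_POS = sum(1 for label, _ in DATASET if label == "+")

def f_score(pattern):
    """F-score in the equivalent form 2TP / (covered + #positives); 0 if TP = 0."""
    pattern = set(pattern)
    labels = [label for label, items in DATASET if pattern <= items]
    return 2 * labels.count("+") / (len(labels) + N_POS)

def is_productive(pattern):
    """True if no proper non-empty sub-pattern has a higher quality."""
    return all(
        f_score(pattern) >= f_score(sub)
        for n in range(1, len(pattern))
        for sub in combinations(pattern, n)
    )

patterns = ["".join(c) for n in range(1, len(ITEMS) + 1)
            for c in combinations(ITEMS, n)]
# Stable sort by descending quality, so ties keep enumeration order.
productive = sorted((p for p in patterns if is_productive(p)),
                    key=lambda p: -f_score(p))
print(productive)
```

For this dataset the result is the six patterns named on the slide, in descending order of F-score.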
Our proposal: Basic idea (4)
• Example (same dataset: 10 transactions, quality measured by F-score)
• Suffix enumeration tree with dynamic re-ordering, with F-scores:
  – Level 1: A (0.67), B (0.6), D (0.44), C (0.4), E (0.4)
  – Level 2: AB (0.29), AD (0.33), BD (0.33), AC (0.29), BC (0.29), CE (0.5), AE (0.33), BE (0.29)
  – Level 3: CAE (0.33), CBE (0.33)
• {C, E} comes earlier than before, and, interestingly, the “sub-patterns first” property still holds ➔ Why?
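One way to read the re-ordering is sketched below in Python. It rests on assumptions that are not spelled out on the slides: children of a pattern are formed by prepending one item, siblings are sorted by descending F-score before being visited, a child may only be extended by the items of its earlier siblings, and patterns with no positive transaction are skipped (which reproduces the hidden nodes). The names `visiting_order` and `sub_patterns_first` are illustrative.

```python
from itertools import combinations

# Toy dataset from the example: (class label, transaction).
DATASET = [
    ("+", {"A", "B"}), ("+", {"A", "C", "E"}), ("+", {"A", "D"}),
    ("+", {"B", "C", "E"}), ("+", {"B", "D"}),
    ("-", {"A", "B", "C"}), ("-", {"B", "E"}), ("-", {"C", "D"}),
    ("-", {"C", "D", "E"}), ("-", {"E"}),
]
N_POS = sum(1 for label, _ in DATASET if label == "+")

def f_score(pattern):
    """F-score in the equivalent form 2TP / (covered + #positives); 0 if TP = 0."""
    labels = [label for label, items in DATASET if set(pattern) <= items]
    return 2 * labels.count("+") / (len(labels) + N_POS)

# Static ordering of the items by descending F-score: A < B < D < C < E.
STATIC_ORDER = sorted("ABCDE", key=lambda item: -f_score((item,)))

def visiting_order(dynamic):
    order = []

    def visit(pattern, candidates):
        order.append("".join(pattern))
        # Children prepend one candidate item; skip those with no positive support.
        children = [(item,) + pattern for item in candidates
                    if f_score((item,) + pattern) > 0]
        if dynamic:
            children.sort(key=lambda child: -f_score(child))   # stable sort
        for i, child in enumerate(children):
            # A child may only be extended by the items of its earlier siblings.
            visit(child, [sibling[0] for sibling in children[:i]])

    visit((), STATIC_ORDER)
    return order[1:]          # drop the empty root pattern

def sub_patterns_first(order):
    seen = set()
    for pattern in order:
        subs = {frozenset(c) for n in range(1, len(pattern))
                for c in combinations(pattern, n)}
        if not subs <= seen:
            return False
        seen.add(frozenset(pattern))
    return True

static, dynamic = visiting_order(False), visiting_order(True)
print(static)
print(dynamic, sub_patterns_first(dynamic))
```

Under these assumptions {C, E} moves from after {A, E} and {B, E} to directly after {E}, and the “sub-patterns first” check still passes.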
Outline
✓ Background
• Dynamic re-ordering in mining top-k productive discriminative patterns
  ✓ Basic idea
  – Justification
• Experiments
• Related work and Conclusion
Our proposal: Justification (1)
• The “sub-patterns first” property is assured even with dynamic re-ordering
• Key observation: Visiting order of a search = topological order over a Hasse diagram
  ➔ The search is “sub-patterns first”
[Figure: Hasse diagram over items A, B, C, D (levels: A B C D / AB AC AD BC BD CD / ABD ACD BCD ABC / ABCD)]
Our proposal: Justification (2)
• The “sub-patterns first” property is assured even with dynamic re-ordering
• Key observation: Visiting order of a search = topological order over a Hasse diagram
  ➔ The search is “sub-patterns first”
• Topological sorting by right-to-left traverse
[Figure: Hasse diagram over items A, B, C, D (levels: A B C D / AB AC AD BC BD CD / ABD ACD BCD ABC / ABCD), next to a stack that is still empty]
Our proposal: Justification (2)
• The “sub-patterns first” property is assured even with dynamic re-ordering
• Key observation: Visiting order of a search = topological order over a Hasse diagram
  ➔ The search is “sub-patterns first”
• Topological sorting by right-to-left traverse
[Figure: Hasse diagram over items A, B, C, D (levels: A B C D / AB AC AD BC BD CD / ABD ACD BCD ABC / ABCD); stack (top to bottom): BCD, ABCD]
Our proposal: Justification (2)
• The “sub-patterns first” property is assured even with dynamic re-ordering
• Key observation: Visiting order of a search = topological order over a Hasse diagram
  ➔ The search is “sub-patterns first”
• Topological sorting by right-to-left traverse
[Figure: Hasse diagram over items A, B, C, D (levels: A B C D / AB AC AD BC BD CD / ABD ACD BCD ABC / ABCD); stack (top to bottom): CD, ACD, BCD, ABCD]
Our proposal: Justification (2)
• The “sub-patterns first” property is assured even with dynamic re-ordering
• Key observation: Visiting order of a search = topological order over a Hasse diagram
  ➔ The search is “sub-patterns first”
[Figure: Hasse diagram over items A, B, C, D (levels: A B C D / AB AC AD BC BD CD / ABD ACD BCD ABC / ABCD); stack (top to bottom): A, B, AB, C, AC, BC, ABC, D, AD, BD, ABD, CD, ACD, BCD, ABCD]
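The stack contents shown across these slides can be reproduced with a depth-first topological sort of the Hasse diagram. The sketch below assumes the traversal inferred from those stack snapshots: edges run from a pattern to its immediate super-patterns, starting points and successors are taken right to left (D before C before B before A), and a pattern is pushed once all of its super-patterns are finished. The function name is illustrative.

```python
def hasse_topological_order(items):
    """Topological order of the non-empty subsets of `items` (sub-patterns first)."""
    stack, done = [], set()

    def dfs(pattern):
        done.add(pattern)
        # Immediate super-patterns, traversed right to left (largest added item first).
        for item in reversed(items):
            if item not in pattern:
                sup = pattern | {item}
                if sup not in done:
                    dfs(sup)
        stack.append(pattern)      # push after all super-patterns are finished

    for item in reversed(items):
        dfs(frozenset(item))
    # Reading the stack from top to bottom gives the topological order.
    return ["".join(sorted(p)) for p in reversed(stack)]

order = hasse_topological_order("ABCD")
print(order)
```

Read from top to bottom, the final stack lists every pattern after all of its sub-patterns, and for items A, B, C, D it coincides with the depth-first visiting order of the suffix enumeration tree.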