Parameter-free Mining of Non-redundant Discriminative Itemsets

An Exhaustive Covering Approach to Parameter-free Mining of Non-redundant Discriminative Itemsets. Yoshitaka Kameya, Meijo University. DaWaK-16.


  1. An Exhaustive Covering Approach to Parameter-free Mining of Non-redundant Discriminative Itemsets
     Yoshitaka Kameya, Meijo University
     DaWaK-16

  2. Outline
     • Background
     • Our proposal
     • Experiments

  4. Background: Discriminative Patterns (1)
     • Discriminative patterns:
       – Show differences between two groups (classes)
       – Used for:
         • Characterizing the positive class
         • Building more precise classifiers
     • Example of a discriminative pattern x: milk=True ∧ aquatic=False ⇒ +
       (class labels: + is the positive class, – is the negative class)

  5. Background: Discriminative Patterns (2)
     • Discriminative patterns tend to be more meaningful than frequent patterns (thanks to class labels)
     • Are class labels always available?
       – Comparing groups is a standard starting point in data analysis
       – Clustering can find groups (classes) → Cluster labeling
     • [Figure: original data → 1. Clustering → clusters → 2. Discriminative pattern mining → clusters labeled with discriminative patterns]

  6. Background: Discriminative Patterns (3)
     • Quality score: measures the overlap between pattern x and positive class c
       – [Figure: when x and c overlap heavily, quality is high; when they overlap little, quality is low]
     • Most popular quality scores are not anti-monotonic:
       – Confidence, Lift
       – Support difference, Weighted relative accuracy, Leverage
       – F-score, Dice, Jaccard
       – ...
     → Branch & bound pruning is often used [Morishita+ 00][Zimmermann+ 09][Nijssen+ 09]
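
As a hedged aside, three of the scores named on this slide can be written with two counts. The notation is not taken from the slides and is introduced here only for illustration: p(x) and n(x) are the numbers of positive and negative transactions covered by x, |D_c| is the number of positive transactions, and |D| is the total number of transactions.

```latex
\begin{align*}
\mathrm{conf}(x) &= \frac{p(x)}{p(x) + n(x)}, \\
\mathrm{lift}(x) &= \frac{\mathrm{conf}(x)}{|D_c| / |D|}, \\
F(x) &= \frac{2\,\mathrm{conf}(x)\,\mathrm{rec}(x)}{\mathrm{conf}(x) + \mathrm{rec}(x)}
      = \frac{2\,p(x)}{p(x) + n(x) + |D_c|},
\qquad \mathrm{rec}(x) = \frac{p(x)}{|D_c|}.
\end{align*}
```

Adding an item to x can only shrink p(x) and n(x). If n(x) shrinks faster than p(x), the score of the larger pattern goes up, so a super-pattern can outscore its sub-pattern and simple anti-monotone pruning does not apply.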

  7. Background: Coping with redundancy (1)
     • Example: item A is relevant to the positive class
       → Patterns containing A tend to be top-ranked in the candidate list (most of them are redundant)

     Dataset (TIDs 1 to 5 are positive transactions, TIDs 6 to 10 are negative transactions):

     | TID | Class | Transaction     |
     |-----|-------|-----------------|
     | 1   | +     | {A, B, D, E}    |
     | 2   | +     | {A, B, C, D, E} |
     | 3   | +     | {A, C, D, E}    |
     | 4   | +     | {A, B, C}       |
     | 5   | +     | {B}             |
     | 6   | –     | {A, B, D, E}    |
     | 7   | –     | {B, C, D, E}    |
     | 8   | –     | {C, D, E}       |
     | 9   | –     | {A, D, E}       |
     | 10  | –     | {A, D}          |

     Top-15 patterns (+1 due to a tied score):

     | Rank | Pattern      | F-score | Covered TIDs |
     |------|--------------|---------|--------------|
     | 1    | {A, C}       | 0.75    | 2, 3, 4      |
     | 2    | {B}          | 0.73    | 1, 2, 4, 5   |
     | 3    | {A}          | 0.67    | 1, 2, 3, 4   |
     | 3    | {A, B}       | 0.67    | 1, 2, 4      |
     | 5    | {A, D, E}    | 0.60    | 1, 2, 3      |
     | 5    | {A, E}       | 0.60    | 1, 2, 3      |
     | 5    | {C}          | 0.60    | 2, 3, 4      |
     | 8    | {A, B, C}    | 0.57    | 2, 4         |
     | 8    | {A, C, D}    | 0.57    | 2, 3         |
     | 8    | {A, C, D, E} | 0.57    | 2, 3         |
     | 8    | {A, C, E}    | 0.57    | 2, 3         |
     | 12   | {A, D}       | 0.55    | 1, 2, 3      |
     | 13   | {A, B, D}    | 0.50    | 1, 2         |
     | 13   | {A, B, D, E} | 0.50    | 1, 2         |
     | 13   | {A, B, E}    | 0.50    | 1, 2         |
     | 13   | {B, C}       | 0.50    | 2, 4         |
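
The F-score and covered-TID columns on this slide can be recomputed from the ten transactions. The following is a minimal sketch, not the author's code: the names `DATASET`, `cover`, and `f_score` are hypothetical, and the F-score is assumed to be the usual harmonic mean of precision and recall with respect to the positive class.

```python
# Toy dataset transcribed from the slide: (class label, items), TIDs 1..10 in order.
DATASET = [("+", "ABDE"), ("+", "ABCDE"), ("+", "ACDE"), ("+", "ABC"), ("+", "B"),
           ("-", "ABDE"), ("-", "BCDE"), ("-", "CDE"), ("-", "ADE"), ("-", "AD")]
N_POS = sum(cls == "+" for cls, _ in DATASET)


def cover(pattern, label=None):
    """TIDs of transactions containing every item of `pattern` (optionally one class only)."""
    return [tid for tid, (cls, items) in enumerate(DATASET, start=1)
            if set(pattern) <= set(items) and (label is None or cls == label)]


def f_score(pattern):
    """F-score w.r.t. the positive class: 2 * TP / (covered + positives)."""
    tp = len(cover(pattern, "+"))
    return 2 * tp / (len(cover(pattern)) + N_POS)


for pattern in ["AC", "B", "A", "AD"]:
    print(pattern, round(f_score(pattern), 2), cover(pattern, "+"))
```

With this definition the rounded scores of the listed patterns agree with the table, and the "Covered TIDs" column corresponds to the positive transactions that each pattern covers.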

  8. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]

  9. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]
     • Closedness: for patterns covering the same (positive) transactions, pick the largest one
     • [Table: the 16 ranked patterns from slide 7]

  11. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]
     • After applying closedness to the ranked list from slide 7: 16 patterns → 8 patterns
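
The closedness filter described on slides 9 to 11 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the author's implementation: `POSITIVES`, `CANDIDATES`, `pos_cover`, and `closed_on_positives` are hypothetical names, and the filter is applied only to the 16 listed candidates, where each group of patterns with the same positive cover has a unique largest member.

```python
from collections import defaultdict

# Positive transactions from the slide-7 dataset: TID -> items.
POSITIVES = {1: set("ABDE"), 2: set("ABCDE"), 3: set("ACDE"), 4: set("ABC"), 5: set("B")}

# The 16 ranked candidate patterns, in rank order.
CANDIDATES = [frozenset(s) for s in
              "AC B A AB ADE AE C ABC ACD ACDE ACE AD ABD ABDE ABE BC".split()]


def pos_cover(pattern):
    """Set of positive TIDs whose transaction contains every item of `pattern`."""
    return frozenset(tid for tid, items in POSITIVES.items() if pattern <= items)


def closed_on_positives(patterns):
    """Group patterns by positive cover and keep the largest pattern of each group."""
    groups = defaultdict(list)
    for pattern in patterns:
        groups[pos_cover(pattern)].append(pattern)
    return [max(group, key=len) for group in groups.values()]


kept = closed_on_positives(CANDIDATES)
print(len(kept), sorted("".join(sorted(p)) for p in kept))
```

On the listed candidates this keeps 8 patterns, which agrees with the count on the slide.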

  12. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]
     • Productivity: if a super-pattern has no higher quality, remove it
     • [Table: the 16 ranked patterns from slide 7]

  14. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]
     • After applying productivity to the ranked list from slide 7: 16 patterns → 4 patterns
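
The productivity filter on slides 12 to 14 can be sketched in the same way. This is a hedged illustration, not the author's code: the helper names are hypothetical, a pattern is assumed to be kept only if its F-score is strictly higher than that of every non-empty proper sub-pattern, and the empty pattern is excluded from the comparison (an assumption made here so that single items are always kept). Exact fractions are used so that ties are detected reliably.

```python
from fractions import Fraction
from itertools import combinations

# Slide-7 dataset as (class label, items); patterns are written as strings of items.
DATASET = [("+", "ABDE"), ("+", "ABCDE"), ("+", "ACDE"), ("+", "ABC"), ("+", "B"),
           ("-", "ABDE"), ("-", "BCDE"), ("-", "CDE"), ("-", "ADE"), ("-", "AD")]
CANDIDATES = "AC B A AB ADE AE C ABC ACD ACDE ACE AD ABD ABDE ABE BC".split()
N_POS = sum(cls == "+" for cls, _ in DATASET)


def f_score(pattern):
    """Exact F-score w.r.t. the positive class: 2 * TP / (covered + positives)."""
    tp = sum(cls == "+" and set(pattern) <= set(items) for cls, items in DATASET)
    covered = sum(set(pattern) <= set(items) for _, items in DATASET)
    return Fraction(2 * tp, covered + N_POS)


def is_productive(pattern):
    """True if `pattern` strictly beats every non-empty proper sub-pattern."""
    subsets = (sub for r in range(1, len(pattern)) for sub in combinations(pattern, r))
    return all(f_score(pattern) > f_score(sub) for sub in subsets)


productive = [p for p in CANDIDATES if is_productive(p)]
print(productive)
```

On the listed candidates this keeps {A, C}, {B}, {A}, and {C}, which agrees with the count of 4 on the slide.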

  15. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Productivity + Closedness [Kameya+ 13]
     • After applying both constraints to the ranked list from slide 7: 16 patterns → 3 patterns

  16. Background: Coping with redundancy (3)
     • The best-covering constraint
       – In the same spirit as the HCC (highest confidence covering) constraint in HARMONY [Wang+ 05]
     • Best-covering: every pattern must be the best for at least one positive transaction
     • [Table: the 16 ranked patterns from slide 7]
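
One way to read the best-covering constraint is sketched below. This is an assumption-laden illustration rather than the method from the talk: the helper names are hypothetical, "best" is taken to mean the highest F-score among the 16 listed candidates that cover a given positive transaction, and tied best patterns are all kept.

```python
from fractions import Fraction

# Slide-7 dataset as (class label, items); patterns are written as strings of items.
DATASET = [("+", "ABDE"), ("+", "ABCDE"), ("+", "ACDE"), ("+", "ABC"), ("+", "B"),
           ("-", "ABDE"), ("-", "BCDE"), ("-", "CDE"), ("-", "ADE"), ("-", "AD")]
CANDIDATES = "AC B A AB ADE AE C ABC ACD ACDE ACE AD ABD ABDE ABE BC".split()
N_POS = sum(cls == "+" for cls, _ in DATASET)


def f_score(pattern):
    """Exact F-score w.r.t. the positive class: 2 * TP / (covered + positives)."""
    tp = sum(cls == "+" and set(pattern) <= set(items) for cls, items in DATASET)
    covered = sum(set(pattern) <= set(items) for _, items in DATASET)
    return Fraction(2 * tp, covered + N_POS)


def best_covering(candidates):
    """Keep patterns that are a top-scoring cover of at least one positive transaction."""
    kept = set()
    for cls, items in DATASET:
        if cls != "+":
            continue
        covering = [p for p in candidates if set(p) <= set(items)]
        if covering:
            best = max(f_score(p) for p in covering)
            kept.update(p for p in covering if f_score(p) == best)
    return [p for p in candidates if p in kept]


print(best_covering(CANDIDATES))
```

Under these assumptions only {A, C} (best for TIDs 2, 3, 4) and {B} (best for TIDs 1, 5) survive on the toy dataset.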
