An Efficient Algorithm for SPM with CP
J. Aoga (1), T. Guns (2), P. Schaus (1)
(1) UCLouvain, (2) KULeuven, Belgium
ECML PKDD 2016, Riva del Garda, Italy, 19–23/09/2016
Aoga et al., An Efficient Algorithm for SPM with CP, ECML PKDD 2016

Problem of Sequential Pattern Mining (SPM)

Client1: Milk Coffee Sugar Coffee Sugar
Client2: Coffee Milk Coffee Sugar
Client3: Milk Coffee
Client4: Coffee Sugar Egg

• Sequence Database (SDB): the four client sequences above
• Sequence: <Milk Coffee Sugar Coffee Sugar>
• Subsequence: <Coffee Sugar>
• Support(<Coffee Sugar>) = 3 (it occurs in the sequences of Client1, Client2 and Client4)

Problem: find all subsequences with support ≥ a given threshold.

Related Work

Specialised methods (timeline ticks: 1995, 1996, 2000, 2001, 2005; citations shown: [16], [18], [10], [1])
• Apriori-All
• GSP: HashTree
• [c]SPADE: Vertical SDB
• PrefixSpan: Prefix Projection
• Lapin-Spam: Last Position idea

CP-based methods (timeline ticks: 2012, 2015, 2016; citations shown: [8], [3], [6], [5])
• SatEms: SAT-Based
• CPSM: One Prop./Seq.
• PP: Global Prop.
• GapSeq: Gap constraint
• PPIC: ?
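To make the definitions above concrete, here is a minimal Python sketch of subsequence testing and support counting on the four-client example. The names `SDB`, `is_subsequence` and `support` are illustrative choices and do not appear on the slides.

```python
# Example database from the slides: one purchase sequence per client.
SDB = [
    ["Milk", "Coffee", "Sugar", "Coffee", "Sugar"],  # Client1
    ["Coffee", "Milk", "Coffee", "Sugar"],           # Client2
    ["Milk", "Coffee"],                              # Client3
    ["Coffee", "Sugar", "Egg"],                      # Client4
]


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so the order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern, sdb):
    """Number of sequences of `sdb` that contain `pattern` as a subsequence."""
    return sum(1 for sequence in sdb if is_subsequence(pattern, sequence))


print(support(["Coffee", "Sugar"], SDB))  # 3
```

The mining problem then amounts to listing every pattern whose `support` reaches the chosen threshold.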
CP : Filtering + DFSearch

Variables: P1 P2 P3 P4 P5
Domains: each of P1 … P5 ranges over {𝝑, Milk, Coffee, Sugar, Egg}

Branching decisions shown in the search tree:
• P1 = Coffee / P1 != Coffee
• P1 = Milk
• P2 = Sugar / P2 != Sugar

Frequent Pattern Found: P1 P2 … PL = M S 𝝑 𝝑 𝝑
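The binary branching of the search tree (P = v on one side, P != v on the other) can be sketched in plain Python. This is only an illustration under simplifying assumptions, not the CP model of the talk: filtering is replaced by a direct support check, the padding symbol 𝝑 is left out, and the names `ITEMS`, `dfs` and `mine` as well as the threshold value passed in the usage note are hypothetical.

```python
SDB = [
    ["Milk", "Coffee", "Sugar", "Coffee", "Sugar"],  # Client1
    ["Coffee", "Milk", "Coffee", "Sugar"],           # Client2
    ["Milk", "Coffee"],                              # Client3
    ["Coffee", "Sugar", "Egg"],                      # Client4
]
ITEMS = ["Milk", "Coffee", "Sugar", "Egg"]  # initial domain of every variable


def support(pattern, sdb):
    """Number of sequences containing `pattern` as a subsequence."""
    def contains(sequence):
        remaining = iter(sequence)
        return all(item in remaining for item in pattern)
    return sum(1 for sequence in sdb if contains(sequence))


def dfs(prefix, domain, min_sup, max_len, found):
    """Depth-first search with binary branching on the next variable."""
    if len(prefix) == max_len or not domain:
        return
    value, rest = domain[0], domain[1:]
    # Left branch: next variable = value; kept only if the prefix stays frequent.
    candidate = prefix + [value]
    if support(candidate, SDB) >= min_sup:
        found.append(tuple(candidate))
        dfs(candidate, ITEMS, min_sup, max_len, found)  # move to the next variable
    # Right branch: next variable != value; same variable, smaller domain.
    dfs(prefix, rest, min_sup, max_len, found)


def mine(min_sup, max_len=5):
    """Collect every frequent pattern of length at most `max_len`."""
    found = []
    dfs([], ITEMS, min_sup, max_len, found)
    return found
```

Calling `mine(3)` on the example database returns the patterns <Milk>, <Milk Coffee>, <Coffee>, <Coffee Sugar> and <Sugar>, in that order. Recomputing the support from scratch at every node is deliberately naive; it keeps the sketch short.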
CP : Filtering + DFSearch

Constraint Store
• Constraints over P1 P2 P3 P4 P5: Support counting, RegExpr

Fix-Point Algorithm
    repeat
        select a constraint c
        if c is OK wrt the domain store
            apply filtering algorithm of c   // i.e. remove impossible values
        else
            return FAIL
    until domain store did not change

Slide callout: this is the main bottleneck.

Improvements of Literature (1/4)

1. Memory and DFS improvement. How to store and restore databases in the DFSearch? => Reversible vectors making use of trailing techniques.
2. Support count improvement. Visit only the last position of each symbol after the start position. [weakness 2]
3. Sequence visited improvement. Visit a sequence only if the current start position is less than the last position of the prefix. [weakness 3]
4. Pruning improvement. Remove infrequent items from only the domain D_{i+1} of P_{i+1}. [weakness 1]
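The support count improvement (item 2) can be illustrated with a small Python sketch. It assumes that, for each sequence, the last position of every symbol is precomputed and stored latest first, so that counting can stop at the first entry lying before the start position. The names `last_pos_list`, `LAST` and `extension_supports`, the 0-based positions and the exact data layout are assumptions made for this example, not details taken from the slides.

```python
SDB = [
    ["Milk", "Coffee", "Sugar", "Coffee", "Sugar"],  # Client1
    ["Coffee", "Milk", "Coffee", "Sugar"],           # Client2
    ["Milk", "Coffee"],                              # Client3
    ["Coffee", "Sugar", "Egg"],                      # Client4
]


def last_pos_list(sequence):
    """(position, symbol) of the last occurrence of each symbol, latest first."""
    last = {}
    for pos, symbol in enumerate(sequence):
        last[symbol] = pos  # later occurrences overwrite earlier ones
    return sorted(((pos, symbol) for symbol, pos in last.items()), reverse=True)


# Precomputed once for the whole database.
LAST = [last_pos_list(sequence) for sequence in SDB]


def extension_supports(projected):
    """Support of each symbol over a projected database.

    `projected` is a list of (sequence index, start position) pairs; a symbol
    is counted for a sequence if it occurs at or after the start position.
    """
    counts = {}
    for sid, start in projected:
        for pos, symbol in LAST[sid]:
            if pos < start:
                break  # all remaining last positions are even earlier
            counts[symbol] = counts.get(symbol, 0) + 1
    return counts
```

With every start position at 0, `extension_supports` returns the supports of the single items. After the prefix <Coffee>, the start positions become 2, 1, 2 and 1, and only the symbols whose last occurrence lies at or beyond those positions are visited.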
Example

Client1: Milk Coffee Sugar Coffee Sugar
Client2: Coffee Milk Coffee Sugar
Client3: Milk Coffee
Client4: Coffee Sugar Egg

The same SDB indexed by sequence (rows 1 to 4) and position (columns 0 to 4), with M = Milk, C = Coffee, S = Sugar, E = Egg:

     0  1  2  3  4
1    M  C  S  C  S
2    C  M  C  S
3    M  C
4    C  S  E

Variables: P1 P2 P3 P4 P5
Domains: Milk, Coffee, Sugar and Egg in every domain; 𝝑 is listed in four of the five domains.
Given Threshold = 3 (75%)

Supports
M : 3
C : 4
S : 3
E : 1

Variables: P1 P2 P3 P4 P5
Domains: Milk, Coffee and Sugar remain in all five domains; in the last frame Egg (support 1 < 3) is listed in only four of them.
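The item supports and the resulting pruning can be reproduced with a short Python sketch. It counts each symbol once per sequence and keeps only the symbols that reach the threshold; `item_supports`, `MIN_SUP` and `frequent_items` are illustrative names, not identifiers from the talk.

```python
from collections import Counter

SDB = [
    ["Milk", "Coffee", "Sugar", "Coffee", "Sugar"],  # Client1
    ["Coffee", "Milk", "Coffee", "Sugar"],           # Client2
    ["Milk", "Coffee"],                              # Client3
    ["Coffee", "Sugar", "Egg"],                      # Client4
]
MIN_SUP = 3  # 3 of 4 sequences, i.e. 75%


def item_supports(sdb):
    """Number of sequences in which each symbol appears at least once."""
    counts = Counter()
    for sequence in sdb:
        counts.update(set(sequence))  # a symbol counts once per sequence
    return counts


def frequent_items(sdb, min_sup):
    """Symbols that may stay in the domain of the next pattern variable."""
    return sorted(s for s, c in item_supports(sdb).items() if c >= min_sup)


print(frequent_items(SDB, MIN_SUP))  # ['Coffee', 'Milk', 'Sugar']
```

Egg has support 1, below the threshold, so it is the only symbol dropped.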