17th International Conference on Electronic Commerce and Web Technologies - EC-Web 2016 MFI-TransSW+ : Efficiently Mining Frequent Itemsets in Clickstreams Franklin Anderson de Amorim Bernardo Pereira Nunes Gisele Rabello Lopes Marco Antonio Casanova
Agenda 1. Frequent Itemsets and Data Streams 2. MFI-TransSW+ algorithm 3. ClickRec Recommendation System 4. Experiments and results.
Frequent Itemsets
Transactions: {bread, milk, coffee}, {bread, milk, cheese}, {bread, cheese}; N = 3, s = 0.5
X is frequent if and only if sup(X) ≥ N · s, where N is the number of transactions and s is a user-defined threshold called the minimum support.
If a set I of items is frequent, then so is every subset of I.
Itemsets (k = 2) and their support:
bread, milk      2  (frequent)
bread, coffee    1
milk, coffee     1
bread, cheese    2  (frequent)
milk, cheese     1
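The support rule on this slide can be checked with a small brute-force sketch. It enumerates every candidate pair and counts the transactions that contain it; this is only an illustration of the definition sup(X) ≥ N · s, not the mining strategy used by MFI-TransSW. The function name `frequent_itemsets` is an assumption made for this example.

```python
from itertools import combinations

# Transactions and minimum support taken from the slide.
transactions = [
    {"bread", "milk", "coffee"},
    {"bread", "milk", "cheese"},
    {"bread", "cheese"},
]
min_support = 0.5  # s


def frequent_itemsets(transactions, k, min_support):
    """Return {itemset: support} for every k-itemset with sup(X) >= N * s."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for candidate in combinations(items, k):
        # Support = number of transactions containing every item of the candidate.
        support = sum(1 for t in transactions if set(candidate) <= t)
        if support >= n * min_support:
            result[candidate] = support
    return result


pairs = frequent_itemsets(transactions, 2, min_support)
print(pairs)
```

With N = 3 and s = 0.5 the threshold is 1.5, so only pairs that occur in at least two transactions are kept.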
Data Streams {a,b,c},{a,b,d}, {c,d,f},{a,d},{d,e},{b,c,e},{a,f},{a,c,f},{a,b,c}... Data stream
Data Stream - Sliding Windows window size = 6 {a,b,c},{a,b,d}, {c,d,f},{a,d},{d,e},{b,c,e},{a,f},{a,c,f} ,{a,b,c} Sliding window
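A sliding window over a stream can be sketched with a bounded queue: once the window is full, each new transaction pushes the oldest one out. The example below uses the stream and window size from the slide; the variable names are assumptions made for this illustration.

```python
from collections import deque

# Stream of transactions from the slide (each string is one transaction).
stream = [set(t) for t in ["abc", "abd", "cdf", "ad", "de", "bce", "af", "acf", "abc"]]

WINDOW_SIZE = 6
window = deque(maxlen=WINDOW_SIZE)  # the oldest transaction is dropped automatically

snapshots = []  # contents of the window each time it is full
for transaction in stream:
    window.append(transaction)
    if len(window) == WINDOW_SIZE:
        snapshots.append(list(window))

print(len(snapshots), "full windows")
```

Each snapshot differs from the previous one by exactly one dropped and one added transaction, which is what makes incremental updates attractive.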
MFI-TransSW & MFI-TransSW+
MFI-TransSW (original algorithm) • Process sliding windows • Uses bit vectors bit(x)=101001
MFI-TransSW Phases 1. Load window 2. Slide window 3. Generate frequent itemsets
MFI-TransSW Loading and sliding window. Data stream: T1=(acd), T2=(bce), T3=(abce), T4=(be); window size = 3. Bit vectors over T1..T4: bit(a)=1010, bit(b)=0111, bit(c)=1110, bit(d)=1000, bit(e)=0111. The first window (T1..T3) holds the first three bits of each vector.
MFI-TransSW Loading and sliding window. Sliding the window to T2..T4 applies a left bit-shift to every vector: the bit for T1 is dropped and the bit for T4 is appended, giving bit(a)=010, bit(b)=111, bit(c)=110, bit(d)=000, bit(e)=111.
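The load and slide phases can be sketched with Python integers as bit vectors. This is a minimal illustration that assumes the leftmost bit is the oldest transaction and that merges the left shift and the append into one helper, `push`; the helper names are hypothetical and not taken from the original algorithm.

```python
WINDOW_SIZE = 3
MASK = (1 << WINDOW_SIZE) - 1  # keeps only the newest WINDOW_SIZE bits


def push(bits, transaction):
    """Shift every bit vector left by one and append the new transaction's bit."""
    for item in set(bits) | set(transaction):
        shifted = (bits.get(item, 0) << 1) & MASK  # the oldest bit falls off
        bits[item] = shifted | (1 if item in transaction else 0)


def show(bits):
    """Render each vector as a fixed-width binary string."""
    return {item: format(v, f"0{WINDOW_SIZE}b") for item, v in sorted(bits.items())}


bits = {}
for t in ["acd", "bce", "abce"]:  # load T1..T3
    push(bits, t)
window_1 = show(bits)

push(bits, "be")  # slide: T1 leaves, T4 enters
window_2 = show(bits)

print(window_1)
print(window_2)
```

Note that every item's vector is touched on each slide, which is the cost that grows with the number of items.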
MFI-TransSW Mining frequent itemsets window size = 3, s = 0.5: bit(a)=101, freq(a)=2; bit(b)=011, freq(b)=2; bit(c)=111, freq(c)=3; bit(d)=100, freq(d)=1; bit(e)=011, freq(e)=2
MFI-TransSW Mining frequent itemsets window size =3 s =0.5 bit(a)=101 freq(a)=2 bit(b)=011 freq(b)=2 bitwise AND bit(a <and> b)=001 freq(a <and> b)=1
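Counting support with bit vectors amounts to a bitwise AND followed by a count of the set bits. The sketch below uses the bit vectors from the slides for a window of size 3 and s = 0.5; the helper `freq` is an assumption named after the slide's notation.

```python
from itertools import combinations

WINDOW_SIZE = 3
MIN_SUPPORT = 0.5  # s
bits = {"a": 0b101, "b": 0b011, "c": 0b111, "d": 0b100, "e": 0b011}


def freq(*items):
    """Support of an itemset: AND the item vectors, then count the 1 bits."""
    combined = (1 << WINDOW_SIZE) - 1  # start with all bits set
    for item in items:
        combined &= bits[item]
    return bin(combined).count("1")


threshold = WINDOW_SIZE * MIN_SUPPORT  # 1.5 for this window
frequent_items = [i for i in sorted(bits) if freq(i) >= threshold]
# Candidate pairs are built only from frequent items (every subset of a
# frequent itemset must itself be frequent).
frequent_pairs = [p for p in combinations(frequent_items, 2) if freq(*p) >= threshold]

print(frequent_items)
print(frequent_pairs)
```

For example, bit(a) AND bit(b) = 001, so freq(a, b) = 1, which is below the threshold of 1.5 and therefore not frequent.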
MFI-TransSW • Fast • Finds all frequent itemsets • No false positives or false negatives • On-demand generation of frequent itemsets • Small memory footprint
MFI-TransSW+ Clickstream: (user-1,a), (user-2,b), (user-3,a), (user-2,c), (user-3,b), (user-2,a). Transactions before the last click: ({a}), ({b,c}), ({a,b}); after it: ({a}), ({a,b,c}), ({a,b}). List of UIDs (positions 0, 1, 2): user-1, user-2, user-3. Bit vectors: bit(a) = 1 0 1 becomes 1 1 1; bit(b) = 0 1 1; bit(c) = 0 1 0.
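MFI-TransSW+ Clickstream: (user-1,a), (user-2,b), (user-3,a), (user-2,c), (user-3,b), (user-2,a), (user-4,b); window size = 3. List of UIDs (positions 0, 1, 2): user-1, user-2, user-3 becomes user-4, user-2, user-3 (user-4 takes position 0 from user-1). Bit vectors: bit(a) = 1 1 1 becomes 0 1 1; bit(b) = 0 1 1 becomes 1 1 1; bit(c) = 0 1 0. List of Bit Vectors per User: each position keeps the indices of the bit vectors it has set (for example 0,1,2 for user-2 and 0,1 for user-3).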
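The two clickstream slides can be approximated by the sketch below. It is reconstructed from the slides only, so the details are assumptions: slots are reused in a circular fashion in order of each user's first click, and a per-slot set of items stands in for the "list of bit vectors per user" so that only the affected vectors are cleaned when a slot is reused. The class and attribute names are hypothetical.

```python
class ClickWindow:
    """Sliding window over users, with one bit per slot in each item vector."""

    def __init__(self, size):
        self.size = size
        self.uids = [None] * size                     # circular list of user ids
        self.position = {}                            # uid -> slot index
        self.items_at = [set() for _ in range(size)]  # items set in each slot
        self.bits = {}                                # item -> list of `size` bits
        self.next_slot = 0                            # slot to reuse next

    def click(self, uid, item):
        slot = self.position.get(uid)
        if slot is None:
            # New user: take the next slot, evicting its previous owner if any.
            slot = self.next_slot
            self.next_slot = (self.next_slot + 1) % self.size
            old_uid = self.uids[slot]
            if old_uid is not None:
                # Clean only the vectors the evicted user had actually set.
                for old_item in self.items_at[slot]:
                    self.bits[old_item][slot] = 0
                self.items_at[slot].clear()
                del self.position[old_uid]
            self.uids[slot] = uid
            self.position[uid] = slot
        self.bits.setdefault(item, [0] * self.size)[slot] = 1
        self.items_at[slot].add(item)


w = ClickWindow(3)
for uid, item in [("user-1", "a"), ("user-2", "b"), ("user-3", "a"),
                  ("user-2", "c"), ("user-3", "b"), ("user-2", "a")]:
    w.click(uid, item)
before = {item: vec[:] for item, vec in w.bits.items()}

w.click("user-4", "b")  # window is full: user-4 reuses user-1's slot
after = {item: vec[:] for item, vec in w.bits.items()}

print(before)
print(after, w.uids)
```

In this sketch no vector is shifted when the window slides; only the bits in the reused slot change, which is consistent with the "more efficient clean and update" point on the next slide.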
MFI-TransSW+ • Processes clickstreams • Uses bit vectors as circular lists • More efficient “clean and update” • Faster
ClickRec
ClickRec A news article realtime recommendation system based on web clickstreams and semantic annotations.
ClickRec [Diagram: Clickstream → 1) Data Streams Processor (MFI-TransSW+) → 2) Frequent Itemsets Miner (MFI-TransSW+) → 3) Recommender (ClickRec)]
ClickRec (user-1, {a,b,c}) (user-1, {<tag1>, <tag2>,<tag3>,<tag4>})
ClickRec (user-1, {a,b,c}) (user-1, {<neymar>, <messi>,<c.ronaldo>,<barcelona>})
ClickRec [Diagram: frequent itemsets and TF-IDF over tags such as <messi>, <neymar>, <c.ronaldo>, <barcelona>, <chelsea> and <robben>]
Experiments
Experiments 1. Real-world clickstream from one of the largest news websites in Brazil 2. Total = 24 hours of clickstream = 25 million “clicks” (pageviews) 3. Two editorials: sports and entertainment
Experiments MFI-TransSW vs MFI-TransSW+ 1. Load a window with w transactions 2. Slide the window 10,000 times 3. Measure the time taken by step 2
MFI-TransSW vs MFI-TransSW+ [Chart: execution time in seconds for window size = 1,000. MFI-TransSW: 41.45; MFI-TransSW+: 0.41 (100x faster)]
MFI-TransSW vs MFI-TransSW+ [Chart: times faster by window size. 1,000: 102x; 2,000: 216x; 3,000: 286x; 4,000: 337x; 5,000: 413x; 6,000: 476x; 7,000: 521x; 8,000: 623x; 9,000: 666x; 10,000: 816x]
Experiments MFI-TransSW vs MFI-TransSW+
Execution time (seconds)
Window size    MFI-TransSW    MFI-TransSW+
 1,000             41.45          0.41
 2,000            136.74          0.63
 3,000            272.24          0.95
 4,000            395.55          1.18
 5,000            533.10          1.29
 6,000            761.31          1.60
 7,000            996.10          1.91
 8,000          1,295.16          2.08
 9,000          1,484.10          2.23
10,000          1,928.76          2.36
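The speedup for each window size follows directly from the two timing columns. The sketch below recomputes it from the values in the table; because those timings are rounded to two decimals, the ratios can differ by a unit or two from the figures shown in the speedup chart.

```python
# Execution times in seconds, copied from the table:
# window size -> (MFI-TransSW, MFI-TransSW+)
times = {
    1000: (41.45, 0.41),
    2000: (136.74, 0.63),
    3000: (272.24, 0.95),
    4000: (395.55, 1.18),
    5000: (533.10, 1.29),
    6000: (761.31, 1.60),
    7000: (996.10, 1.91),
    8000: (1295.16, 2.08),
    9000: (1484.10, 2.23),
    10000: (1928.76, 2.36),
}

# Speedup = time of the original algorithm / time of MFI-TransSW+.
speedups = {w: original / plus for w, (original, plus) in times.items()}

for w, ratio in speedups.items():
    print(f"{w:>6}: {ratio:.0f}x")
```

The ratio grows with the window size, from roughly 100x at 1,000 transactions to roughly 800x at 10,000.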
Experiments ClickRec
1. Divide the clickstream into pairs of consecutive hours
   A. The first hour is used to mine the frequent itemsets
   B. The second hour is used to extract a sample of 10k users (each sampled user must have accessed more than one page)
2. Test recommendations
   C. Feed the first page accessed by the user to ClickRec, which recommends 10 pages to the user
   D. Verify whether the user accessed one of the recommendations
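The evaluation protocol above can be sketched as a hit-rate function. Everything here is illustrative: `hit_rate`, the `recommend` callable and the toy sessions are hypothetical stand-ins, since the slides do not show ClickRec's interface or data.

```python
def hit_rate(sessions, recommend, k=10):
    """Fraction of users who later accessed at least one recommended page.

    sessions:  list of page lists, one per user, in access order
    recommend: callable mapping a page to a ranked list of recommended pages
               (a stand-in for ClickRec, whose real interface is not shown)
    """
    hits = 0
    evaluated = 0
    for pages in sessions:
        if len(pages) < 2:
            continue  # sampled users must have accessed more than one page
        first, later = pages[0], set(pages[1:])
        recommendations = recommend(first)[:k]
        evaluated += 1
        if later & set(recommendations):
            hits += 1
    return hits / evaluated if evaluated else 0.0


# Toy data, purely hypothetical.
toy_sessions = [["p1", "p2"], ["p1", "p3"], ["p4"], ["p2", "p9"]]
toy_table = {"p1": ["p2"], "p2": ["p5"]}
rate = hit_rate(toy_sessions, lambda page: toy_table.get(page, []))
print(f"hit rate: {rate:.2f}")
```

In the toy data one of the three eligible users follows a recommendation, so the hit rate is one third.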
Experiments ClickRec [Chart: hit rate (axis 0% to 40%) for the Sports editorial, by hour pair: 0:00 vs 1:00 (Late Night), 6:00 vs 7:00 (Morning), 12:00 vs 13:00 (Afternoon), 18:00 vs 19:00 (Night)]
Experiments ClickRec [Chart: hit rate (axis 0% to 50%) for the Entertainment editorial, by hour pair: 0:00 vs 1:00 (Late Night), 6:00 vs 7:00 (Morning), 12:00 vs 13:00 (Afternoon), 18:00 vs 19:00 (Night)]
Conclusion
Conclusion MFI-TransSW+ • Processes clickstreams • Uses bit vectors as circular lists • Up to 2 orders of magnitude faster than the original algorithm (MFI-TransSW)
Conclusion ClickRec • Based on MFI-TransSW+ • Uses semantic annotations • Generates recommendations in realtime • Hit rate > 20%
References
[Agrawal et al. 1994] AGRAWAL, R.; SRIKANT, R. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf. Very Large Data Bases (VLDB), p. 1–32, 1994.
[Chi et al. 2006] CHI, Y.; WANG, H.; YU, P. S.; MUNTZ, R. R. Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowledge and Information Systems, 10(3):265–294, 2006.
[Li et al. 2009] LI, H.-F.; LEE, S.-Y. Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Systems with Applications, 36(2):1466–1477, 2009.
Thanks!