
MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams

  1. 17th International Conference on Electronic Commerce and Web Technologies - EC-Web 2016 MFI-TransSW+ : Efficiently Mining Frequent Itemsets in Clickstreams Franklin Anderson de Amorim Bernardo Pereira Nunes Gisele Rabello Lopes Marco Antonio Casanova

  3. Agenda 1. Frequent Itemsets and Data Streams 2. MFI-TransSW+ algorithm 3. ClickRec Recommendation System 4. Experiments and results.

  4. Frequent Itemsets
      Transactions: {bread, milk, coffee}, {bread, milk, cheese}, {bread, cheese}; N = 3, s = 0.5.
      An itemset X is frequent if and only if sup(X) ≥ N · s, where sup(X) is the number of transactions that contain X, N is the number of transactions, and s is a user-defined threshold called the minimum support.
      If a set I of items is frequent, then so is every subset of I.
      Support of the 2-itemsets (k = 2), with N · s = 1.5:
        {bread, milk}     2   frequent itemset
        {bread, coffee}   1
        {milk, coffee}    1
        {bread, cheese}   2   frequent itemset
        {milk, cheese}    1
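The support computation on this slide can be checked with a short Python sketch (the function name `support_counts` is ours, for illustration): it counts every 2-itemset of the three example transactions and keeps those with sup(X) ≥ N · s.

```python
from collections import Counter
from itertools import combinations

# Example transactions from the slide
transactions = [
    {"bread", "milk", "coffee"},
    {"bread", "milk", "cheese"},
    {"bread", "cheese"},
]
s = 0.5                  # minimum support, chosen by the user
N = len(transactions)    # number of transactions (3)

def support_counts(transactions, k):
    """Count how many transactions contain each k-itemset (as a sorted tuple)."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1
    return counts

pairs = support_counts(transactions, 2)
# X is frequent iff sup(X) >= N * s (here 3 * 0.5 = 1.5)
frequent_pairs = {x: c for x, c in pairs.items() if c >= N * s}
print(frequent_pairs)  # → {('bread', 'milk'): 2, ('bread', 'cheese'): 2}
```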

  5. Data Streams. Example data stream: {a,b,c}, {a,b,d}, {c,d,f}, {a,d}, {d,e}, {b,c,e}, {a,f}, {a,c,f}, {a,b,c}, ...

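  6. Data Stream - Sliding Windows. With window size = 6, the sliding window over the stream {a,b,c}, {a,b,d}, {c,d,f}, {a,d}, {d,e}, {b,c,e}, {a,f}, {a,c,f}, {a,b,c} holds only the 6 most recent transactions; each new transaction enters the window and the oldest one leaves it.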
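A minimal sketch of a transaction-based sliding window, assuming Python's `collections.deque` as the container (MFI-TransSW itself stores the window as bit vectors, not as a list of sets):

```python
from collections import deque

# The example stream from the slide, oldest transaction first
stream = [
    {"a", "b", "c"}, {"a", "b", "d"}, {"c", "d", "f"}, {"a", "d"}, {"d", "e"},
    {"b", "c", "e"}, {"a", "f"}, {"a", "c", "f"}, {"a", "b", "c"},
]
WINDOW_SIZE = 6

# A deque with maxlen discards the oldest transaction when a new one arrives
window = deque(maxlen=WINDOW_SIZE)
for transaction in stream:
    window.append(transaction)

# The window now holds the last 6 transactions: {a,d} ... {a,b,c}
print(list(window))
```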

  7. MFI-TransSW & MFI-TransSW+

  8. MFI-TransSW (original algorithm) • Processes sliding windows • Uses bit vectors, e.g. bit(x) = 101001, where the i-th bit is 1 if item x occurs in the i-th transaction of the window

  9. MFI-TransSW Phases 1. Load window 2. Slide window 3. Generate frequent itemsets

  10. MFI-TransSW: loading and sliding the window. Data stream: T1 = (acd), T2 = (bce), T3 = (abce), T4 = (be); window size = 3. Bit vectors over T1 to T4: bit(a) = 1010, bit(b) = 0111, bit(c) = 1110, bit(d) = 1000, bit(e) = 0111

  11. MFI-TransSW: loading and sliding the window (continued). When T4 arrives, a left bit-shift removes the bit of the oldest transaction, T1, from every bit vector, so the window again holds 3 transactions (T2 to T4).
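The load and slide steps can be sketched with Python integers used as fixed-width bit vectors. This is an illustration under our own assumptions (a fixed item alphabet, the leftmost bit standing for the oldest transaction, and a hypothetical helper `add_transaction`), not the authors' implementation:

```python
WINDOW_SIZE = 3
MASK = (1 << WINDOW_SIZE) - 1  # keeps only the WINDOW_SIZE most recent bits

def add_transaction(bits, transaction, items):
    """Left-shift every bit vector, drop the bit that falls out of the window,
    and set the newest (rightmost) bit for the items of the transaction."""
    for item in items:
        bits[item] = (bits[item] << 1) & MASK
        if item in transaction:
            bits[item] |= 1

items = "abcde"
stream = ["acd", "bce", "abce", "be"]  # T1..T4
bits = {item: 0 for item in items}

for t in stream[:WINDOW_SIZE]:         # 1. load window (T1..T3)
    add_transaction(bits, set(t), items)
print({i: format(b, "03b") for i, b in bits.items()})
# → {'a': '101', 'b': '011', 'c': '111', 'd': '100', 'e': '011'}

add_transaction(bits, set(stream[3]), items)  # 2. slide window with T4
print({i: format(b, "03b") for i, b in bits.items()})
# → {'a': '010', 'b': '111', 'c': '110', 'd': '000', 'e': '111'}
```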

  12. MFI-TransSW: mining frequent itemsets (window size = 3, s = 0.5, window T1 to T3). freq(x) is the number of 1 bits in bit(x): bit(a) = 101, freq(a) = 2; bit(b) = 011, freq(b) = 2; bit(c) = 111, freq(c) = 3; bit(d) = 100, freq(d) = 1; bit(e) = 011, freq(e) = 2
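Counting the 1 bits of each vector gives freq(x) directly; a sketch, assuming the window T1 to T3 is encoded as Python integers (leftmost bit = T1):

```python
WINDOW_SIZE = 3
s = 0.5
# Bit vectors of the window T1..T3 from the slide
bits = {"a": 0b101, "b": 0b011, "c": 0b111, "d": 0b100, "e": 0b011}

# freq(x) is the number of 1 bits in bit(x)
freq = {item: bin(b).count("1") for item, b in bits.items()}
# An item is frequent if freq(x) >= WINDOW_SIZE * s (here 1.5)
frequent_items = sorted(i for i, f in freq.items() if f >= WINDOW_SIZE * s)
print(freq)            # → {'a': 2, 'b': 2, 'c': 3, 'd': 1, 'e': 2}
print(frequent_items)  # → ['a', 'b', 'c', 'e']
```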

  13. MFI-TransSW: mining frequent itemsets (window size = 3, s = 0.5). bit(a) = 101, freq(a) = 2; bit(b) = 011, freq(b) = 2. The bitwise AND gives bit(ab) = bit(a) AND bit(b) = 001, so freq(ab) = 1 < 3 · 0.5 and {a, b} is not frequent.
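The AND step extends to longer itemsets. The sketch below combines it with an Apriori-style level-wise candidate join, which is our assumption for illustration (the slide only shows the AND of two bit vectors):

```python
from itertools import combinations

WINDOW_SIZE = 3
s = 0.5
min_count = WINDOW_SIZE * s  # 1.5
bits = {"a": 0b101, "b": 0b011, "c": 0b111, "d": 0b100, "e": 0b011}

def popcount(x):
    return bin(x).count("1")

# Level 1: frequent single items, keyed by 1-tuples
level = {(i,): b for i, b in bits.items() if popcount(b) >= min_count}
frequent = dict(level)

k = 2
while level:
    prev = sorted(level)
    candidates = {}
    for x, y in combinations(prev, 2):
        # Join two (k-1)-itemsets that share their first k-2 items
        if x[:-1] == y[:-1]:
            cand = x + (y[-1],)
            # Apriori pruning: every (k-1)-subset must itself be frequent
            if all(sub in level for sub in combinations(cand, k - 1)):
                candidates[cand] = level[x] & level[y]  # bitwise AND
    level = {c: b for c, b in candidates.items() if popcount(b) >= min_count}
    frequent.update(level)
    k += 1

for itemset, b in frequent.items():
    print("".join(itemset), format(b, "03b"), popcount(b))
```

With this window the frequent itemsets are a, b, c, e, ac, bc, be, ce and bce, while ab (bit vector 001) is pruned, as on the slide.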

  14. MFI-TransSW • Fast • Finds all frequent itemsets • No false positives or false negatives • On-demand generation of frequent itemsets • Small memory footprint

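  15. MFI-TransSW+
      Clickstream: (user-1,a), (user-2,b), (user-3,a), (user-2,c), (user-3,b), (user-2,a)
      List of UIDs: user-1 → 0, user-2 → 1, user-3 → 2 (one transaction, i.e. one bit position, per user)
      Transactions before the last click: ({a}), ({b,c}), ({a,b}); after (user-2,a): ({a}), ({a,b,c}), ({a,b})
      Bit vectors (positions 0 1 2): bit(a) = 1 0 1, which becomes 1 1 1 after (user-2,a); bit(b) = 0 1 1; bit(c) = 0 1 0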
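One way to picture this is a sketch in which each user owns one bit position, so a later click by the same user updates that user's transaction in place. The class `ClickBitVectors` is hypothetical; here bit i of an integer stands for position i:

```python
class ClickBitVectors:
    """Hypothetical helper: one bit position per user (i.e. per transaction)."""

    def __init__(self):
        self.uid_pos = {}  # list of UIDs: user -> bit position
        self.bits = {}     # item -> int bit vector (bit i = user at position i)

    def click(self, user, item):
        if user not in self.uid_pos:
            self.uid_pos[user] = len(self.uid_pos)  # next free position
        pos = self.uid_pos[user]
        self.bits[item] = self.bits.get(item, 0) | (1 << pos)

    def vector(self, item):
        """Bits listed from position 0 to n-1, as drawn on the slide."""
        n = len(self.uid_pos)
        b = self.bits.get(item, 0)
        return " ".join(str((b >> i) & 1) for i in range(n))

clicks = [("user-1", "a"), ("user-2", "b"), ("user-3", "a"),
          ("user-2", "c"), ("user-3", "b")]
bv = ClickBitVectors()
for user, item in clicks:
    bv.click(user, item)
print(bv.vector("a"))  # → 1 0 1
bv.click("user-2", "a")
print(bv.vector("a"))  # → 1 1 1
print(bv.vector("b"), "|", bv.vector("c"))  # → 0 1 1 | 0 1 0
```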

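  16. MFI-TransSW+
      Clickstream: (user-1,a), (user-2,b), (user-3,a), (user-2,c), (user-3,b), (user-2,a), (user-4,b); window size = 3
      When user-4 arrives, the window already holds 3 users, so user-1 (the oldest) leaves the list of UIDs, which becomes user-2, user-3, user-4.
      [Figure: list of UIDs with positions 0 to 2, bit vectors bit(a), bit(b), bit(c), and the list of bit vectors per user, before and after the arrival of user-4.]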

  17. MFI-TransSW+ • Processes clickstreams • Uses bit vectors as circular lists • More efficient “clean and update” • Faster
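The circular-list idea can be sketched as follows, assuming that a new user reuses the position of the user who entered the window earliest, after that position is cleared in every bit vector (“clean”) and the new click is set (“update”). `CircularClickWindow` is a hypothetical name and this is not the authors' code:

```python
class CircularClickWindow:
    """Hypothetical sketch: w bit positions reused as a circular list."""

    def __init__(self, w):
        self.w = w
        self.next_slot = 0           # position to (re)use next, cycles 0..w-1
        self.slot_user = [None] * w  # which user occupies each position
        self.uid_pos = {}            # user -> position
        self.bits = {}               # item -> int, bit i = position i

    def click(self, user, item):
        if user not in self.uid_pos:
            pos = self.next_slot
            old = self.slot_user[pos]
            if old is not None:      # clean: evict the oldest user's column
                del self.uid_pos[old]
                clear = ~(1 << pos)
                for it in self.bits:
                    self.bits[it] &= clear
            self.slot_user[pos] = user
            self.uid_pos[user] = pos
            self.next_slot = (pos + 1) % self.w
        pos = self.uid_pos[user]     # update: set this user's bit for the item
        self.bits[item] = self.bits.get(item, 0) | (1 << pos)

    def freq(self, item):
        return bin(self.bits.get(item, 0)).count("1")

clicks = [("user-1", "a"), ("user-2", "b"), ("user-3", "a"), ("user-2", "c"),
          ("user-3", "b"), ("user-2", "a"), ("user-4", "b")]
win = CircularClickWindow(3)
for user, item in clicks:
    win.click(user, item)
print(win.slot_user)                                 # → ['user-4', 'user-2', 'user-3']
print(win.freq("a"), win.freq("b"), win.freq("c"))  # → 2 3 1
```

Reusing a fixed position avoids shifting every bit vector on each new transaction; only the evicted column is cleared.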

  18. ClickRec

  19. ClickRec: a real-time news article recommendation system based on web clickstreams and semantic annotations.

  20. ClickRec architecture: Clickstream → 1) Data Streams Processor (MFI-TransSW+) → 2) Frequent Itemsets Miner (MFI-TransSW+) → 3) ClickRec Recommender

  21. ClickRec: (user-1, {a,b,c}) → (user-1, {<tag1>, <tag2>, <tag3>, <tag4>}); the pages accessed by the user are represented by their semantic annotations (tags).

  22. ClickRec example: (user-1, {a,b,c}) → (user-1, {<neymar>, <messi>, <c.ronaldo>, <barcelona>})
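A sketch of this mapping, assuming a hypothetical page-to-tags table `annotations` and taking the union of the tags of the pages a user accessed:

```python
# Hypothetical annotation table: page -> semantic tags
annotations = {
    "a": {"<neymar>", "<messi>"},
    "b": {"<messi>", "<c.ronaldo>"},
    "c": {"<barcelona>"},
}

def to_tag_transaction(pages, annotations):
    """Replace the pages of a user transaction by the union of their tags."""
    tags = set()
    for page in pages:
        tags |= annotations.get(page, set())
    return tags

print(sorted(to_tag_transaction({"a", "b", "c"}, annotations)))
# → ['<barcelona>', '<c.ronaldo>', '<messi>', '<neymar>']
```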

  23. ClickRec
      [Figure: frequent itemsets of semantic tags (e.g. <messi>, <neymar>) related through TF-IDF to tag sets containing <c.ronaldo>, <barcelona>, <chelsea>, <robben>.]
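The slide does not detail how TF-IDF is applied. As an assumption for illustration only, the sketch below computes classic TF-IDF weights over hypothetical article tag sets and ranks the articles by the summed weights of one frequent itemset's tags:

```python
import math
from collections import Counter

# Hypothetical tag lists of candidate articles (one "document" per article)
articles = {
    "art1": ["<c.ronaldo>", "<barcelona>", "<messi>", "<neymar>"],
    "art2": ["<barcelona>", "<messi>", "<c.ronaldo>", "<chelsea>"],
    "art3": ["<messi>", "<robben>", "<chelsea>"],
}

def tf_idf(docs):
    """Classic TF-IDF: tf = count / doc length, idf = log(N / df)."""
    n = len(docs)
    df = Counter(tag for tags in docs.values() for tag in set(tags))
    weights = {}
    for name, tags in docs.items():
        counts = Counter(tags)
        weights[name] = {
            tag: (c / len(tags)) * math.log(n / df[tag]) for tag, c in counts.items()
        }
    return weights

def score(itemset, article_weights):
    """Sum of the TF-IDF weights of the itemset's tags in one article."""
    return sum(article_weights.get(tag, 0.0) for tag in itemset)

w = tf_idf(articles)
itemset = {"<messi>", "<neymar>"}  # a frequent itemset of tags
ranking = sorted(articles, key=lambda a: score(itemset, w[a]), reverse=True)
print(ranking)
```

Here <messi> occurs in every article, so its IDF (and weight) is 0, and the ranking is driven by the rarer tag <neymar>.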

  24. Experiments

  25. Experiments 1. Real-world clickstream from one of the largest news websites in Brazil 2. 24 hours of clickstream in total, 25 million “clicks” (pageviews) 3. Two editorials: sports and entertainment

  26. Experiments MFI-TransSW vs MFI-TransSW+ 1. Load a window with w transactions 2. Execute 10k slides 3. Measure the time taken by step 2
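A minimal timing harness for these three steps, assuming synthetic random transactions and Python integers as bit vectors (the helper `benchmark_slides` is hypothetical, and its timings are not comparable to the results on the next slides):

```python
import random
import time

def benchmark_slides(w, n_slides=10_000, n_items=50, seed=0):
    """Load a window of w random transactions, then time n_slides slides."""
    rng = random.Random(seed)
    mask = (1 << w) - 1        # keeps only the w most recent bits
    bits = [0] * n_items       # one bit vector per item (leftmost bit = oldest)

    def add(transaction):
        # Left-shift, drop the oldest bit, set the newest bit if item present
        for i in range(n_items):
            bits[i] = ((bits[i] << 1) & mask) | (1 if i in transaction else 0)

    def random_transaction():
        return set(rng.sample(range(n_items), 5))

    for _ in range(w):         # 1. load a window with w transactions
        add(random_transaction())

    stream = [random_transaction() for _ in range(n_slides)]
    start = time.perf_counter()
    for t in stream:           # 2. execute n_slides slides
        add(t)
    return time.perf_counter() - start  # 3. time taken by step 2 only

print(f"{benchmark_slides(1_000):.3f} s")
```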

  27. MFI-TransSW vs MFI-TransSW+ (window size = 1,000): execution time of 41.45 s for MFI-TransSW vs 0.41 s for MFI-TransSW+, i.e., 100x faster

  28. MFI-TransSW vs MFI-TransSW+: times faster by window size
      1,000: 102x; 2,000: 216x; 3,000: 286x; 4,000: 337x; 5,000: 413x; 6,000: 476x; 7,000: 521x; 8,000: 623x; 9,000: 666x; 10,000: 816x

  29. Experiments: MFI-TransSW vs MFI-TransSW+, execution time (seconds)
      Window size   MFI-TransSW   MFI-TransSW+
      1,000               41.45           0.41
      2,000              136.74           0.63
      3,000              272.24           0.95
      4,000              395.55           1.18
      5,000              533.10           1.29
      6,000              761.31           1.60
      7,000              996.10           1.91
      8,000            1,295.16           2.08
      9,000            1,484.10           2.23
      10,000           1,928.76           2.36
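The speedups on the “times faster” chart can be recomputed from this table. Ratios of the rounded times differ from the chart by at most 2, presumably because the chart was computed from unrounded measurements:

```python
# Execution times (s) from the table: window size -> (MFI-TransSW, MFI-TransSW+)
times = {
    1_000: (41.45, 0.41), 2_000: (136.74, 0.63), 3_000: (272.24, 0.95),
    4_000: (395.55, 1.18), 5_000: (533.10, 1.29), 6_000: (761.31, 1.60),
    7_000: (996.10, 1.91), 8_000: (1_295.16, 2.08), 9_000: (1_484.10, 2.23),
    10_000: (1_928.76, 2.36),
}
# Speedups as printed on the chart
chart = {1_000: 102, 2_000: 216, 3_000: 286, 4_000: 337, 5_000: 413,
         6_000: 476, 7_000: 521, 8_000: 623, 9_000: 666, 10_000: 816}

for w, (old, new) in times.items():
    speedup = old / new
    print(f"{w:>6}: {speedup:6.1f}x (chart: {chart[w]}x)")
```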

  30. Experiments ClickRec
      1. Divide the clickstream into pairs of consecutive hours:
         A. The first hour is used to mine the frequent itemsets.
         B. The second hour is used to extract a sample of 10k users (sampled users must have accessed more than one page).
      2. Test the recommendations:
         C. Feed the first page accessed by each user to ClickRec, which recommends 10 pages.
         D. Check whether the user accessed one of the recommended pages.
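The hit-rate check in steps C and D can be sketched as below. `recommend` stands in for ClickRec and is hypothetical, as are the toy sessions; the 10k-user sampling of step B is omitted:

```python
def hit_rate(test_sessions, recommend, k=10):
    """Fraction of users who later accessed at least one of the k recommended pages.

    test_sessions: user -> list of pages in access order (second hour).
    recommend: hypothetical function, page -> ranked list of pages.
    """
    # Only users who accessed more than one page are evaluated
    eligible = {u: pages for u, pages in test_sessions.items() if len(pages) > 1}
    hits = 0
    for pages in eligible.values():
        first, later = pages[0], set(pages[1:])
        recommended = set(recommend(first)[:k])
        if recommended & later:
            hits += 1
    return hits / len(eligible) if eligible else 0.0

# Toy data: in ClickRec, recommendations would come from the first hour's itemsets
recs = {"p1": ["p2", "p3"], "p4": ["p5"]}
sessions = {"u1": ["p1", "p3"], "u2": ["p4", "p6"], "u3": ["p1"]}
print(hit_rate(sessions, lambda page: recs.get(page, [])))  # → 0.5
```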

  31. Experiments ClickRec: sports editorial
      [Chart: hit rate (axis 0% to 40%) for the hour pairs 0:00 vs 1:00 (late night), 6:00 vs 7:00 (morning), 12:00 vs 13:00 (afternoon), and 18:00 vs 19:00 (night).]

  32. Experiments ClickRec: entertainment editorial
      [Chart: hit rate (axis 0% to 50%) for the hour pairs 0:00 vs 1:00 (late night), 6:00 vs 7:00 (morning), 12:00 vs 13:00 (afternoon), and 18:00 vs 19:00 (night).]

  33. Conclusion

  34. Conclusion MFI-TransSW+ • Processes clickstreams • Uses bit vectors as circular lists • 102x to 816x faster than the original algorithm (MFI-TransSW) in the experiments, i.e., two to nearly three orders of magnitude

  35. Conclusion ClickRec • Based on MFI-TransSW+ • Uses semantic annotations • Generates recommendations in real time • Hit rate > 20%

  36. References
      [Agrawal et al. 1994] AGRAWAL, R.; SRIKANT, R. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf. Very Large Data Bases (VLDB), p. 1–32, 1994.
      [Chi et al. 2006] CHI, Y.; WANG, H.; YU, P. S.; MUNTZ, R. R. Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowledge and Information Systems, 10(3):265–294, 2006.
      [Li et al. 2009] LI, H.-F.; LEE, S.-Y. Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Systems with Applications, 36(2):1466–1477, 2009.

  37. Thanks!
