MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
Authors: Doug Burdick, Manuel Calimlim, Johannes Gehrke
Presented by: Benjamin Chu
CMPUT 695 Fall 2004
MAFIA Presentation

Outline
• Introduction
• Related Work
• Algorithmic Components
• Database Representation
• Experimental Results
• Comparison - DepthProject
• Conclusion

Introduction - Problem
• Association Rule Mining, Two Phases
  1) Find Frequent Itemsets (FI)
  2) Generate “interesting” patterns
• Most of the time in association rule mining is spent finding the frequent itemsets
• When itemsets are long (more than 15-20 items), finding the entire FI can be infeasible.

Introduction - Solutions
• Alternatives to “FI”
  • Frequent Closed Itemsets (FCI)
    • FCI: Itemset X is closed if no proper superset of X has the same support.
  • Maximal Frequent Itemsets (MFI)
    • MFI: Itemset X is maximally frequent if X is frequent and no proper superset of X is frequent.
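The three definitions (FI, FCI, MFI) can be compared on a toy database. The following is a brute-force sketch for illustration only, not the MAFIA algorithm; the four transactions, the minimum support of 2, and all function names are assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute force: test every non-empty subset of the item universe."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


def closed_itemsets(fi):
    """Frequent itemsets with no proper superset of equal support."""
    return {x for x, s in fi.items()
            if not any(x < y and fi[y] == s for y in fi)}


def maximal_itemsets(fi):
    """Frequent itemsets with no frequent proper superset."""
    return {x for x in fi if not any(x < y for y in fi)}


# Hypothetical toy database (an assumption for this sketch).
transactions = [{1, 2, 4}, {1, 2, 5}, {1, 2, 3, 4}, {1, 4}]
fi = frequent_itemsets(transactions, min_support=2)
fci = closed_itemsets(fi)
mfi = maximal_itemsets(fi)
print(len(fi), len(fci), len(mfi))
```

On this toy input the MFI is a subset of the FCI, which is a subset of the FI, which is why the maximal sets are the most compact of the three summaries.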
Introduction - MAFIA
• Integrates new and old ideas into a practical algorithm for solving the MFI problem
• Problem of mining frequent itemsets viewed as finding a cut through the itemset lattice
  • All itemsets above the cut are frequent
  • All itemsets below the cut are infrequent

Introduction - Item Subset Lattice / Tree
• Head(N) – itemset identifying node N
• Tail(N) – set of all possible extensions of the node N
• HUT – Head U(nion) Tail

Related/Prior Work on MFI
• Apriori
• MaxMiner
• DepthProject
• MaxClique
• MaxEclat
• Pincer-Search
• VIPER
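A small sketch of the item subset tree may help make Head, Tail, and HUT concrete. It assumes a fixed item order and represents each node as a (head, tail) pair of tuples; the function names and the four-item universe are illustrative assumptions, not taken from the slides.

```python
def hut(head, tail):
    """Head Union Tail: the largest itemset reachable in this node's subtree."""
    return head + tail


def children(head, tail):
    """Each child extends the head by one tail item; its tail is what follows."""
    for i, item in enumerate(tail):
        yield head + (item,), tail[i + 1:]


def walk(head, tail):
    """Depth-first enumeration of every node in the subset tree."""
    yield head, tail
    for child_head, child_tail in children(head, tail):
        yield from walk(child_head, child_tail)


# Hypothetical item universe (an assumption for this sketch).
items = ("a", "b", "c", "d")
nodes = list(walk((), items))
for head, tail in nodes[:5]:
    print("head:", head, "tail:", tail, "HUT:", hut(head, tail))
```

Every subset of the items appears exactly once as a head, so a cut through this tree separates the frequent heads from the infrequent ones.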
Algorithmic Components
• MAFIA: depth-first traversal of the item subset lattice with search space pruning:
  • PEP
  • FHUT
  • HUTMFI
  • Dynamic reordering

Search Space Pruning - PEP
• Parent Equivalence Pruning
• Given the current node in the itemset tree with head x and tail element y, t(x) ⊆ t(y) means any transaction containing x also contains y
• Since we only want maximal frequent itemsets, we can move y to the head if t(x) ⊆ t(y) holds

Search Space Pruning - FHUT
• Frequent Head Union Tail
• For a node n, the largest possible frequent itemset contained in the subtree rooted at n is n’s HUT (Head Union Tail).
• If n’s HUT is found to be frequent, do not explore any subsets of the HUT.
• The subtree rooted at n can be pruned away.
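The PEP and FHUT conditions can each be written as a one-line test on transaction-id sets. This is a minimal sketch under assumptions: t(x) is computed by scanning a small dictionary of transactions, and the names `tidset`, `pep_applies`, `fhut_applies`, and the toy database are hypothetical.

```python
def tidset(itemset, transactions):
    """t(X): ids of the transactions that contain every item of X."""
    return {tid for tid, items in transactions.items() if set(itemset) <= items}


def pep_applies(head, y, transactions):
    """PEP condition: t(head) is a subset of t(y), so y can move into the head."""
    return tidset(head, transactions) <= tidset({y}, transactions)


def fhut_applies(head, tail, transactions, min_support):
    """FHUT condition: the node's HUT is itself frequent, so the subtree can be pruned."""
    hut = set(head) | set(tail)
    return len(tidset(hut, transactions)) >= min_support


# Hypothetical toy database keyed by transaction id (an assumption).
db = {1: {1, 2, 4}, 2: {1, 2, 5}, 3: {1, 2, 3, 4}, 4: {1, 4}}

print(pep_applies({2}, 1, db))                # every transaction with 2 also has 1
print(pep_applies({1}, 2, db))                # transaction 4 has 1 but not 2
print(fhut_applies({1}, {2, 4}, db, 2))       # HUT {1, 2, 4} is in 2 transactions
print(fhut_applies({1}, {2, 3, 4}, db, 2))    # HUT {1, 2, 3, 4} is in only 1
```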
Search Space Pruning - HUTMFI
• Head Union Tail Maximal Frequent Itemset
• If a superset of the HUT for the current node is already in the MFI, then the HUT is frequent.
• The subtree rooted at this node can be pruned away.

Search Space Pruning - Dynamic Reordering
• The tail of a node is trimmed so that it only contains frequent extensions of the current node
• Tail elements are ordered by increasing support (keeps the search space as small as possible).

MAFIA Algorithm
(from: http://himalaya-tools.sourceforge.net/mafiappt_files/800x600/Slide26.html)
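The pruning components can be combined into one depth-first traversal. The sketch below is a simplified illustration written for this summary and is not the authors' pseudocode: it uses Python sets of transaction ids instead of bitmaps, and it tests FHUT by intersecting the tid-sets of the whole tail directly, which is an assumption of this sketch. The name `mafia_sketch` and the toy database are hypothetical.

```python
def mafia_sketch(transactions, min_support):
    """Return the maximal frequent itemsets of a list of item sets."""
    tid = {}
    for i, t in enumerate(transactions):
        for item in t:
            tid.setdefault(item, set()).add(i)
    tid = {item: frozenset(s) for item, s in tid.items()}
    mfi = []

    def visit(head, head_tids, tail):
        # Dynamic reordering, part 1: keep only frequent extensions.
        ext = {y: head_tids & tid[y] for y in tail}
        ext = {y: t for y, t in ext.items() if len(t) >= min_support}
        # PEP: t(head) is a subset of t(y), so y moves into the head.
        moved = [y for y, t in ext.items() if t == head_tids]
        head = head | frozenset(moved)
        for y in moved:
            del ext[y]
        # Dynamic reordering, part 2: increasing support.
        tail = sorted(ext, key=lambda y: (len(ext[y]), y))
        hut = head | frozenset(tail)
        # HUTMFI: a superset of the HUT is already known to be maximal.
        if any(hut <= m for m in mfi):
            return
        # FHUT (simplified): if the whole HUT is frequent, record it and stop.
        hut_tids = head_tids
        for y in tail:
            hut_tids &= ext[y]
        if hut and len(hut_tids) >= min_support:
            mfi.append(hut)
            return
        for i, y in enumerate(tail):
            visit(head | {y}, ext[y], tail[i + 1:])

    visit(frozenset(), frozenset(range(len(transactions))), sorted(tid))
    return mfi


# Hypothetical toy database (an assumption for this sketch).
db = [{1, 2, 4}, {1, 2, 5}, {1, 2, 3, 4}, {1, 4}]
print(mafia_sketch(db, 2))
print(mafia_sketch(db, 3))
```

A node with an empty tail has a HUT equal to its head, so leaves are recorded by the same FHUT branch once the HUTMFI check has ruled out a known superset.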
Database Representation
• Vertical bitmap representation allows for optimized support counting and efficient itemset generation
[Figure: Itemset Generation (bitmap of X & bitmap of Y gives bitmap of XY); Support Counting (lookup of pregenerated ‘onecount’ for 219 …)]

Database Representation
• Vertical Bitmap
  • Each item is allocated a set of bits, one bit for each transaction in the database
  • If item X appears in transaction j, then the j-th bit of item X is set to one

TID  Items        1 2 3 4 5
1    1, 2, 4      1 1 0 1 0
2    1, 2, 5      1 1 0 0 1
3    1, 2, 3, 4   1 1 1 1 0
4    1, 4         1 0 0 1 0
…
(each column under 1-5 is that item's bitmap)

Database Compression
• Problem: Sparse bitmaps at low support levels
• Solution: Remove bits that don’t matter
  • To count support of the subtree rooted at a node N, only need transactions containing itemset X at node N
• Product: projected bit vector
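The bitmap operations can be sketched with Python integers standing in for bit vectors. This is an illustration under assumptions: bit j of an item's integer stands for the (j+1)-th transaction, `ONECOUNT` is a byte-wise lookup table built for this example, and `project` is one hypothetical way to drop the bits of transactions that do not contain the head itemset.

```python
# Pregenerated table: number of one-bits in each possible byte value.
ONECOUNT = [bin(b).count("1") for b in range(256)]


def build_bitmaps(transactions):
    """One integer per item; bit j is set if transaction j contains the item."""
    bitmaps = {}
    for j, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << j)
    return bitmaps


def count_ones(bitmap):
    """Support counting: add up table lookups, one byte at a time."""
    total = 0
    while bitmap:
        total += ONECOUNT[bitmap & 0xFF]
        bitmap >>= 8
    return total


def project(bitmap, onto):
    """Keep only the bits of `bitmap` where `onto` has a one, packed densely."""
    out, pos, j = 0, 0, 0
    while onto >> j:
        if (onto >> j) & 1:
            out |= ((bitmap >> j) & 1) << pos
            pos += 1
        j += 1
    return out


# Hypothetical toy database (an assumption for this sketch).
db = [{1, 2, 4}, {1, 2, 5}, {1, 2, 3, 4}, {1, 4}]
bm = build_bitmaps(db)

xy = bm[2] & bm[4]                      # itemset generation: bitmap of {2, 4}
print(count_ones(xy))                   # support of {2, 4}
print(count_ones(project(bm[4], bm[2])))  # same support from the projected vector
```

The projected vector for item 4 has only as many bits as there are transactions containing item 2, which is the saving the compression step is after when bitmaps are sparse.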
Experimental Results - Compression
[Figure]

Experimental Results
[Figure]

Comparison - DepthProject
• DepthProject: “state-of-the-art” maximal pattern algorithm
• Differences:
  • Uses horizontal database layout
  • Alternate pruning: bucketing
Comparison - DepthProject
• Influence of PEP

Comparison - DepthProject
[Figure: Reduction Factor of Nodes Considered Due to PEP Pruning]

Comparison - DepthProject
[Figure: Scaleup of Chess.data]
• MAFIA: Scales very well with the number of transactions

Comparison - DepthProject
[Figure: Time Comparison on Chess.data]
• MAFIA: Only a factor of two better than DepthProject on this dataset
Conclusions
• Increased efficiency of MAFIA over DepthProject due to:
  • Fast itemset generation and support counting
  • Parent-equivalence pruning

Conclusions - MAFIA flexibility
• MAFIA can also be used to find all FI
• To find FI:
  • Suppress all pruning tools (PEP, FHUT, HUTMFI).
  • Add all frequent nodes in the itemset lattice to FI without superset checking

Conclusions - MAFIA flexibility
• MAFIA can be used to mine FCI
• To find FCI:
  • Only use PEP for pruning
  • Still check for supersets in previously discovered FCI
Conclusion - Followup
• After the original paper, a new version of MAFIA uses the progressive focusing technique introduced in GenMax [Gouda, Zaki]: LMFI update
• MAFIA and GenMax are both useful

Conclusions
• MAFIA shines when:
  • Data is dense and contains long itemsets
  • Database is large
• MAFIA is not so good when:
  • Minimum support is high (short itemsets)

Thank you! Questions?