Outline Fast Algorithms for Mining Association Rules This is an - PDF document

11/9/2009 Outline Fast Algorithms for Mining Association Rules � This is an important paper because � VLDB 10 Years Best Paper Award Rakesh Agrawal , Ramakrishnan Srikant � Has been 1st highest cited paper of all papers in the fields of databases and data mining until 2007 in Citeseer � 2009 Citeseer Citations: Rank 18 in all computer science papers � Two authors all better jobs!!! Agenda What is the problem? Why it is so important? Presented by Wenhao Xu � It addresses an important problem. � It proposes an algorithm that is Apriori Algorithm Discussion led by Sophia Liang better than previous algorithms What are its basic What are its basic � Lots of papers afterwards are concepts? concepts? based on its basic concepts Recent development Conclusion Example of Association Rule Mining Example & Notions Transaction Items 1 {milk, diaper, beer, Coke} {milk, diaper} � {beer} 2 {milk, bread} 3 {milk, bread, beer, diaper} For Amazon: Earn more money! 4 {milk, bread, diaper, coke} {milk} � {bread} For you: Good user experience! 5 {bread, diaper, beer, eggs} � Item Sets : a set of items, like {milk, diaper}, is an item set; � Association rule : implication in the form of X � � � Y; X and Y are both item sets. � Like {milk, diaper} � {beer} � Implication means co-occurrence, not causation � Support of the rule: the fraction of transactions that contain both X and Y. I.e. F ({X, Y}) S(({milk, diaper} � {beer}) = F({milk, diaper, beer}) = 2/5 Amazon � Confidence of the rule: the ratio of transactions that contain X contain Y, i.e. F( X, Y )/F( X ) C({milk, diaper} � {beer}) = F({milk, diaper, beer})/F({milk, diaper}) = (2/5)/(3/5) = 2/3 Formal definition: Association Rule Mining Generic Algorithms � Step 1: Find all itemsets that have transaction support above � Given a large set of transaction D, generate all minimum support. These itemsets are called large itemsets . association rules that have support and � Focus of this paper: find large itemsets � AIS, SETM confidence greater than the user-specified � Apriori, AprioriTid, ApriorHybrid � minimum support (called minsup ), and Step 2: Use the large itemsets to generate the desired rules. � A straightforward algorithm: minimum confidence (called minconf ) For every large itemset L respectively. for every non-empty subset a of L, rule <- a � (L-a) � Minsup & Minconf : ensure usefulness if(C(rule) >= minconf ) output � Large : endfor � A significant of data sets in data mining endfor - Refer to <fast algorithms for mining association rules in large � require effective algorithms databases> for a fast algorithm 1

11/9/2009 Apriori: Find Large Itemsets Apriori-gen Apriori-gen(L k-1 ) � Basic Concepts: � Join step Get the superset of the set of all large k- � Generate all possible candicate large itemsets: Any Subset(Ck, t) itemsets from (k-1)-itemsets subset of a large itemset must be large � Filter out small itemsets � Assumption: items with in an itemset are kept in lexicographic order � Basic steps: Delete all itemsets that have some (k-1)- Prune step subsets which are not in (k-1) large � Generate candidate k-itemsets from large (k-1)- itemsets itemsets � For each candidate k-itemsets, calculate its support; � If its support is larger than minsup, add it to large k- itemsets � Continue the above three steps by adding 1 to k until no large k itemsets are found. Subset Example � Candidate itemsets are stored in a hash-tree � Leaf-node: contains a list of itemsets � � �� Interior node: contains a hash table, each �� bucket of the hash table points to a children Where is node. �� {1,4,5}? �� A Hash(1) �� Hash(2) �� B C Hash(2) Hash(3) {1,2,3,4}.count++; D {2,3,5}.count++; �� E F G �� AprioriTid & AprioriHybrid Evaluation Scan the whole database every time � Still use apriori-gen to generate candidate itemsets � Try to reduce the times of scan the database � Use a candidate set (called D k ) that include TID of the corresponding transaction � Due to the time limit, refer to the paper by yourself. � D k could be smaller than the whole database when k is large and can fit into the memory � Use 6 sets of the synthetic � However, may be slower than Apriori because when k is small, D k is even data larger than the original transaction. � An IBM RS/6000 530H � AprioriHybrid to combine the benefits of Apriori and AprioriTid by using a workstation heuristic to swich from Apriori to AprioriTid on the fly. � Compare to SETM and AIS 2

11/9/2009 Recent Development Inefficient when there is a large number of large itemsets and /or long large itemsets. A large itemset with the size of 100, need to generate 2 100 -2 candidates in total. � FP-tree � Refer to “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach”, Jiawei Han, Jian Pei, et al. � About an order of magnitude faster than Aprori � I don’t find other significant improved approaches. Conclusion � Important problem � Good paper as a foundation of association- Thanks for your attention! rule mining � Can be improved � Fp-tree 3

Outline Fast Algorithms for Mining Association Rules This is an - PDF document

11/9/2009 Outline Fast Algorithms for Mining Association Rules This is an important paper because VLDB 10 Years Best Paper Award Rakesh Agrawal , Ramakrishnan Srikant Has been 1st highest cited paper of all papers in the fields of

Mining Association Rules Mining Association Rules Additional Measures of rule interestingness

Association Rules from transactional databases ! Mining multilevel association rules from

Association Rules Data Mining and Exploration: Association Rules Itemsets, association rules

Web Mining Web Mining Web Mining Web Mining Web mining is the use of data mining techniques

Web Mining Web Mining Web mining is the use of data mining techniques to automatically

Contents Association Rules: Concept and Algorithms Basics of Association Rules Algorithms:

Association Rule Mining 1 What Is Association Rule Mining? Association rule mining is finding

Web MINING Web MINING Overview Overview Dr Ahmed Rafea Rafea Dr Ahmed 1 Web Mining Outline

Outline Association Rules: Concept and Algorithms Basics of Association Rules Algorithms:

Relationship Mining Association Rule Mining Association Rule Mining Try to automatically find

Week 5 Video 3 Relationship Mining Association Rule Mining Association Rule Mining Try to

Frequent Pattern Mining Frequent Sequence Mining Frequent Tree Mining Christian Borgelt

Being a METS Startup Fast Failure; Fast Reward November 2016 Fast Failure; Fast Reward

Week 5 Video 4 Relationship Mining Sequential Pattern Mining Association Rule Mining Try to

Cement, Aggregates, Mining Presentation Cement, Aggregates and Mining Cement, Aggregates and

Data Mining 2020 Frequent Pattern Mining (2) Ad Feelders Universiteit Utrecht October 2, 2020

Marshall McLuhan Media Poet, Confused Academic, Guru? we make our tools, and then our tools

University of North Carolina Wilmington Opened September 5 th ,

State of the X.Org Foundation FOSDEM 2014 Martin Peres & others of the board of directors

Money! https://github.com/drbean/curriculum/tree/master/conversation/trend Wed Oct 19 19:44:19

Frequent Itemsets Itemset: a set of items E.g., acm = {a, c, m} Transaction database TDB

Fundamental Data Mining Algorithms Weinan Zhang Shanghai Jiao Tong University

Languages Solve problems using a computer, give the computer instructions. Remember our

dilettantes A DILEMMA THE AGE OF INNOCENCE CHAPTERS 14 - 18 Issues from MYE IRRELEVANT

Outline Fast Algorithms for Mining Association Rules This is an - PDF document

11/9/2009 Outline Fast Algorithms for Mining Association Rules This is an important paper because VLDB 10 Years Best Paper Award Rakesh Agrawal , Ramakrishnan Srikant Has been 1st highest cited paper of all papers in the fields of

Mining Association Rules Mining Association Rules Additional Measures of rule interestingness

Association Rules from transactional databases ! Mining multilevel association rules from

Association Rules Data Mining and Exploration: Association Rules Itemsets, association rules

Web Mining Web Mining Web Mining Web Mining Web mining is the use of data mining techniques

Web Mining Web Mining Web mining is the use of data mining techniques to automatically

Contents Association Rules: Concept and Algorithms Basics of Association Rules Algorithms:

Association Rule Mining 1 What Is Association Rule Mining? Association rule mining is finding

Web MINING Web MINING Overview Overview Dr Ahmed Rafea Rafea Dr Ahmed 1 Web Mining Outline

Outline Association Rules: Concept and Algorithms Basics of Association Rules Algorithms:

Relationship Mining Association Rule Mining Association Rule Mining Try to automatically find

Week 5 Video 3 Relationship Mining Association Rule Mining Association Rule Mining Try to

Frequent Pattern Mining Frequent Sequence Mining Frequent Tree Mining Christian Borgelt

Being a METS Startup Fast Failure; Fast Reward November 2016 Fast Failure; Fast Reward

Week 5 Video 4 Relationship Mining Sequential Pattern Mining Association Rule Mining Try to

Cement, Aggregates, Mining Presentation Cement, Aggregates and Mining Cement, Aggregates and

Data Mining 2020 Frequent Pattern Mining (2) Ad Feelders Universiteit Utrecht October 2, 2020

Marshall McLuhan Media Poet, Confused Academic, Guru? we make our tools, and then our tools

University of North Carolina Wilmington Opened September 5 th ,

State of the X.Org Foundation FOSDEM 2014 Martin Peres &amp; others of the board of directors

Money! https://github.com/drbean/curriculum/tree/master/conversation/trend Wed Oct 19 19:44:19

Frequent Itemsets Itemset: a set of items E.g., acm = {a, c, m} Transaction database TDB

Fundamental Data Mining Algorithms Weinan Zhang Shanghai Jiao Tong University

Languages Solve problems using a computer, give the computer instructions. Remember our

dilettantes A DILEMMA THE AGE OF INNOCENCE CHAPTERS 14 - 18 Issues from MYE IRRELEVANT

State of the X.Org Foundation FOSDEM 2014 Martin Peres & others of the board of directors