Mining Association Rules

Mining Association Rules - PowerPoint PPT Presentation



  1. Mining Association Rules (outline)
     - What is Association rule mining
     - Apriori Algorithm
     - Additional Measures of rule interestingness
     - Advanced Techniques

     What Is Association Rule Mining?
     - Association rule mining: finding frequent patterns, associations, correlations, or causal structures among sets of items in transaction databases.
     - Goal: understand customer buying habits by finding associations and correlations between the different items that customers place in their "shopping basket".
     - Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, web log analysis, fraud detection (supervisor → examiner).
     - Rule form: Antecedent → Consequent [support, confidence], where support and confidence are user-defined measures of interestingness.
     - Examples:
       - buys(x, "computer") → buys(x, "financial management software") [0.5%, 60%]
       - age(x, "30..39") ∧ income(x, "42..48K") → buys(x, "car") [1%, 75%]

  2. How can Association Rules be used?
     - Let the discovered rule be {Bagels, ...} → {Potato Chips}.
     - Potato chips as consequent: can be used to determine what should be done to boost its sales.
     - Bagels in the antecedent: can be used to see which products would be affected if the store discontinues selling bagels.
     - Bagels in the antecedent and Potato chips in the consequent: can be used to see what products should be sold with bagels to promote the sale of potato chips.
     - (Beer-and-diapers anecdote) Probably mom was calling dad at work to buy diapers on the way home, and he decided to buy a six-pack as well. The retailer could move diapers and beer to separate places and position high-profit items of interest to young fathers along the path.

     Association Rule: Basic Concepts
     - Given: (1) a database of transactions, where (2) each transaction is a list of items purchased by a customer in a visit.
     - Find: all rules that correlate the presence of one set of items (an itemset) with that of another set of items.
     - E.g., 98% of people who purchase tires and auto accessories also get automotive services done.

     Rule basic Measures: Support and Confidence
     - A ⇒ B [s, c]
     - Support denotes the frequency of the rule within transactions; a high value means that the rule involves a large part of the database.
       support(A ⇒ B [s, c]) = p(A ∪ B)
     - Confidence denotes the percentage of transactions containing A which also contain B; it is an estimate of the conditional probability.
       confidence(A ⇒ B [s, c]) = p(B|A) = sup(A, B) / sup(A)
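     To make the two measures concrete, here is a minimal sketch in Python (not from the slides; the helper names `support` and `confidence` and the toy baskets are illustrative assumptions) that computes them directly from a list of transactions.

```python
# Minimal sketch (illustrative, not from the slides): computing support and
# confidence of a rule A => B over a list of transactions.

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = sup(A ∪ B) / sup(A)."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

if __name__ == "__main__":
    # Toy basket data (hypothetical)
    transactions = [{"bagels", "potato chips"},
                    {"bagels", "milk"},
                    {"potato chips", "beer"},
                    {"bagels", "potato chips", "beer"}]
    print(support({"bagels", "potato chips"}, transactions))       # 0.5
    print(confidence({"bagels"}, {"potato chips"}, transactions))  # ~0.67
```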

  3. Example
     - Itemset: {A, B} or {B, E, F}.
     - Support of an itemset: sup(A, B) = 1, sup(A, C) = 2.
     - Frequent pattern: given min. sup = 2, {A, C} is a frequent pattern.

       Trans. Id | Purchased Items
       1         | A, D
       2         | A, C
       3         | A, B, C
       4         | B, E, F

     - For minimum support = 50% and minimum confidence = 50%, we have the following rules:
       - A ⇒ C with 50% support and 66% confidence
       - C ⇒ A with 50% support and 100% confidence

     Mining Association Rules (outline)
     - What is Association rule mining
     - Apriori Algorithm
     - Additional Measures of rule interestingness
     - Advanced Techniques

     Mining Association Rules - An Example (Boolean association rules)
     - Each transaction is represented by a Boolean vector.
     - Min. support 50%, min. confidence 50%.

       Transaction ID | Items Bought
       2000           | A, B, C
       1000           | A, C
       4000           | A, D
       5000           | B, E, F

       Frequent Itemset | Support
       {A}              | 75%
       {B}              | 50%
       {C}              | 50%
       {A, C}           | 50%

     - For rule A ⇒ C:
       support = support({A, C}) = 50%
       confidence = support({A, C}) / support({A}) = 66.6%
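     The numbers in this example can be reproduced by brute force: enumerate every candidate itemset over the four transactions and keep those meeting the minimum support. The sketch below is only a sanity check under that assumption; the Apriori algorithm introduced next avoids enumerating the whole lattice.

```python
# Brute-force frequent-itemset enumeration for the 4-transaction example above.
# Sketch for illustration only; Apriori (next slides) prunes this search.
from itertools import combinations

transactions = [{"A", "B", "C"},   # TID 2000
                {"A", "C"},        # TID 1000
                {"A", "D"},        # TID 4000
                {"B", "E", "F"}]   # TID 5000
min_support = 0.5                  # 50%, i.e. at least 2 of the 4 transactions

items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        sup = sum(1 for t in transactions if set(cand) <= t) / len(transactions)
        if sup >= min_support:
            frequent[frozenset(cand)] = sup

# Expected output: {A}: 0.75, {B}: 0.5, {C}: 0.5, {A, C}: 0.5
for itemset, sup in frequent.items():
    print(set(itemset), sup)
```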

  4. The Apriori principle
     - Any subset of a frequent itemset must be frequent.
       - A transaction containing {beer, diaper, nuts} also contains {beer, diaper}.
       - If {beer, diaper, nuts} is frequent, then {beer, diaper} must also be frequent.
     - Equivalently, no superset of any infrequent itemset should be generated or tested: many item combinations can be pruned.

     Itemset Lattice / Apriori principle for pruning candidates
     - If an itemset is infrequent, then all of its supersets must also be infrequent.
     - [Figure: the itemset lattice over items A–E, from the empty set (null) up to ABCDE; once an itemset is found to be infrequent, all of its supersets are pruned from the search.]
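     One way to read the pruning rule is as a single test: a size-k candidate is worth counting only if every one of its (k-1)-subsets is frequent. A minimal sketch follows (the helper name `has_infrequent_subset` is an assumption, not from the slides).

```python
# Sketch of the Apriori pruning test: a candidate can be dropped as soon as
# any of its (k-1)-subsets is known to be infrequent.
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of `candidate` is not in the frequent set L_{k-1}."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))

# Example: if {A, B} was found infrequent, every superset such as {A, B, C}
# is pruned without ever scanning the database.
L2 = {frozenset({"A", "C"}), frozenset({"B", "C"})}      # frequent 2-itemsets
print(has_infrequent_subset(("A", "B", "C"), L2))        # True -> prune ABC
```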

  5. Mining Frequent Itemsets (the Key Step)
     - Find the frequent itemsets: the sets of items that have minimum support.
       - A subset of a frequent itemset must also be a frequent itemset.
       - Generate length-(k+1) candidate itemsets from length-k frequent itemsets, and test the candidates against the DB to determine which are in fact frequent.
     - Use the frequent itemsets to generate association rules; generation is straightforward.

     The Apriori Algorithm - Example (min. support: 2 transactions)

       Database D:
       TID | Items
       100 | 1 3 4
       200 | 2 3 5
       300 | 1 2 3 5
       400 | 2 5

     - Scan D to count the candidate 1-itemsets C_1, then keep those with min. support as L_1:
       C_1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
       L_1: {1}: 2, {2}: 3, {3}: 3, {5}: 3
     - Join L_1 with itself to get C_2, scan D to count, keep those with min. support as L_2:
       C_2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
       counts: {1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2
       L_2: {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2
     - Join L_2 with itself to get C_3, scan D to count:
       C_3: {2 3 5}
       L_3: {2 3 5}: 2

     How to Generate Candidates?
     - The items in L_{k-1} are listed in an order.
     - Step 1: self-joining L_{k-1}
       insert into C_k
       select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1}
       from L_{k-1} p, L_{k-1} q
       where p.item_1 = q.item_1, ..., p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
     - Step 2: pruning
       for all itemsets c in C_k do
         for all (k-1)-subsets s of c do
           if (s is not in L_{k-1}) then delete c from C_k
     - Example: joining (A D E) with (A D F), where the first k-2 items are equal and E < F, yields the candidate (A D E F).
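     The SQL-style join and the prune loop above translate almost line for line into code. The sketch below (function name `apriori_gen` assumed) joins pairs of (k-1)-itemsets that agree on their first k-2 items, then discards candidates that have an infrequent (k-1)-subset; it reproduces the L_3 to C_4 example shown on a later slide.

```python
# Sketch of Apriori candidate generation: self-join L_{k-1} on the first k-2
# items, then prune candidates that have an infrequent (k-1)-subset.
from itertools import combinations

def apriori_gen(L_prev):
    """L_prev: set of frozensets, all of size k-1. Returns candidate k-itemsets C_k."""
    sorted_prev = sorted(tuple(sorted(s)) for s in L_prev)
    candidates = set()
    # Step 1: self-join -- p and q agree on the first k-2 items, p[-1] < q[-1]
    for i, p in enumerate(sorted_prev):
        for q in sorted_prev[i + 1:]:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(frozenset(p) | {q[-1]})
    # Step 2: prune -- every (k-1)-subset of a candidate must be in L_{k-1}
    k = len(sorted_prev[0]) + 1
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

# Example of generating candidates: L3 = {abc, abd, acd, ace, bcd} -> C4 = {abcd}
# (acde is generated by the join but pruned because ade is not in L3)
L3 = {frozenset(s) for s in ("abc", "abd", "acd", "ace", "bcd")}
print(apriori_gen(L3))   # {frozenset({'a', 'b', 'c', 'd'})}
```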

  6. The Apriori Algorithm
     - C_k: candidate itemsets of size k; L_k: frequent itemsets of size k.
     - Join Step: C_k is generated by joining L_{k-1} with itself.
     - Prune Step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
     - Algorithm:
       L_1 = {frequent items};
       for (k = 1; L_k != ∅; k++) do begin
         C_{k+1} = candidates generated from L_k;
         for each transaction t in database do
           increment the count of all candidates in C_{k+1} that are contained in t
         L_{k+1} = candidates in C_{k+1} with min_support
       end
       return L = ∪_k L_k;

     Example of Generating Candidates
     - L_3 = {abc, abd, acd, ace, bcd}
     - Self-joining: L_3 * L_3
       - abcd from abc and abd
       - acde from acd and ace
     - Pruning (before counting its support):
       - acde is removed because ade is not in L_3
     - C_4 = {abcd}

     How to Count Supports of Candidates?
     - Why is counting supports of candidates a problem?
       - The total number of candidates can be very large.
       - One transaction may contain many candidates.
     - Method:
       - Candidate itemsets are stored in a hash-tree.
       - A leaf node of the hash-tree contains a list of itemsets and counts; an interior node contains a hash table.
       - A subset function finds all the candidates contained in a transaction.

     Generating AR from frequent itemsets
     - Confidence(A ⇒ B) = P(B|A) = support_count({A, B}) / support_count({A})
     - For every frequent itemset x, generate all non-empty subsets of x.
     - For every non-empty subset s of x, output the rule "s ⇒ (x − s)" if support_count({x}) / support_count({s}) ≥ min_conf.
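     Putting the pieces together, a compact sketch of the whole pipeline might look like the following: level-wise mining of frequent itemsets followed by rule generation from every frequent itemset. The function names `apriori` and `generate_rules` are illustrative assumptions, the candidate-pruning step is omitted for brevity (see the `apriori_gen` sketch above), and candidates are counted by a direct scan rather than a hash-tree.

```python
# Sketch of the full Apriori pipeline: level-wise mining of frequent itemsets,
# then rule generation from each frequent itemset. Illustrative only.
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {itemset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # L_1: frequent single items
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(L)
    while L:
        # Candidate generation: join L_{k-1} with itself (prune step omitted
        # here for brevity; see the apriori_gen sketch above)
        prev = sorted(tuple(sorted(s)) for s in L)
        C = {frozenset(p) | {q[-1]}
             for i, p in enumerate(prev) for q in prev[i + 1:]
             if p[:-1] == q[:-1]}
        # Count the candidates contained in each transaction
        counts = {c: sum(1 for t in transactions if c <= t) for c in C}
        L = {s: c for s, c in counts.items() if c >= min_support_count}
        frequent.update(L)
    return frequent

def generate_rules(frequent, min_conf):
    """For every frequent itemset x and non-empty subset s, emit s => x - s."""
    rules = []
    for x, x_count in frequent.items():
        if len(x) < 2:
            continue
        for r in range(1, len(x)):
            for s in combinations(x, r):
                s = frozenset(s)
                conf = x_count / frequent[s]   # support_count(x) / support_count(s)
                if conf >= min_conf:
                    rules.append((set(s), set(x - s), conf))
    return rules

# The slides' example database (min. support = 2 transactions)
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
freq = apriori(D, 2)
for antecedent, consequent, conf in generate_rules(freq, 0.5):
    print(antecedent, "=>", consequent, f"(conf={conf:.2f})")
```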
