  1. Association Rule Mining

  2. What Is Association Rule Mining?
     - Association rule mining is finding frequent patterns or associations among sets of items or objects, usually in transactional data.
     - Applications include market basket analysis, cross-marketing, catalog design, etc.

  3. Association Mining: Examples
     - Rule form: "Body → Head [support, confidence]"
     - buys(x, "diapers") → buys(x, "beers") [0.5%, 60%]
     - buys(x, "bread") → buys(x, "milk") [0.6%, 65%]
     - major(x, "CS") ∧ takes(x, "DB") → grade(x, "A") [1%, 75%]
     - age(X, 30-45) ∧ income(X, 50K-75K) → buys(X, SUVcar)
       i.e., age = "30-45", income = "50K-75K" → car = "SUV"

  4. Market-Basket Analysis & Finding Associations
     - Do items occur together?
     - Proposed by Agrawal et al. in 1993.
     - An important data mining model studied extensively by the database and data mining community.
     - Assumes all data are categorical.
     - Initially used for market basket analysis to find how items purchased by customers are related.
     - Example rule: Bread → Milk [sup = 5%, conf = 100%]

  5. Association Rules: Basic Concepts
     - Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)
     - Find: all rules that correlate the presence of one set of items with that of another set of items
       - E.g., 98% of people who purchase tires and auto accessories also get automotive services done
     - Applications:
       - Maintenance agreements (what should the store do to boost maintenance agreement sales?)
       - Home electronics (what other products should the store stock up on?)
       - Detecting "ping-pong"-ing of patients, faulty "collisions"

  6. Association Rule Mining
     - Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

     Market-basket transactions:
       TID  Items
       1    Bread, Milk
       2    Bread, Diaper, Beer, Eggs
       3    Milk, Diaper, Beer, Coke
       4    Bread, Milk, Diaper, Beer
       5    Bread, Milk, Diaper, Coke

     Example association rules:
       {Diaper} → {Beer}
       {Milk, Bread} → {Eggs, Coke}
       {Beer, Bread} → {Milk}

     - Implication means co-occurrence, not causality!
     - An itemset is simply a set of items.
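For later reference, the table above can be encoded directly in code. The snippet below is a minimal sketch (the variable name `transactions` is mine, not the presentation's) that stores each TID with its set of purchased items.

```python
# Minimal encoding of the market-basket transactions above:
# each TID maps to the set of items bought in that visit.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}
```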

  7. Examples from a Supermarket
     - Can you think of association rules from a supermarket?
     - Let's say you identify association rules from a supermarket; how might you exploit them?
     - That is, if you are the store manager, how might you make money?
     - Assume you have a rule of the form X → Y.

  8. Supermarket Examples
     - If you have a rule X → Y, you could:
       - Run a sale on X if you want to increase sales of Y
       - Locate the two items near each other
       - Locate the two items far from each other to make the shopper walk through the store
       - Print out a coupon at checkout for Y if the shopper bought X but not Y

  9. Association "Rules" – Standard Format
     - Rule format: If {set of items} Then {set of items}
       (a set can consist of just a single item)
     - The "If" part is the condition and the "Then" part is the result, e.g.:
       If {Diapers, Baby Food} Then {Beer, Chips}
     - The condition implies the result; the right side is very often a single item.
     - Rules do not imply causality.
     [Diagram: overlapping circles for "Customer buys diaper" and "Customer buys beer"; the overlap is "Customer buys both".]

  10. What Is an Interesting Association?
     - Requires domain-knowledge validation: actionable, non-trivial, understandable.
     - Algorithms provide a first pass based on statistics on how "unexpected" an association is.
     - Some standard statistics used, for a rule C → R:
       - support ≈ p(R & C): percent of "baskets" where the rule holds
       - confidence ≈ p(R | C): percent of times R holds when C holds

  11. Support and Confidence
     - Find all the rules X → Y with minimum confidence and support.
     - Support = probability that a transaction contains {X, Y},
       i.e., the ratio of transactions in which X and Y occur together to all transactions in the DB.
     - Confidence = conditional probability that a transaction having X also contains Y,
       i.e., the ratio of transactions in which X and Y occur together to those in which X occurs.
     - The confidence of a rule LHS => RHS can be computed as the support of the whole itemset divided by the support of the LHS:
       Confidence(LHS => RHS) = Support(LHS ∪ RHS) / Support(LHS)
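The two definitions translate directly into code. A minimal sketch, assuming the set-of-items transaction encoding shown earlier (function names are illustrative, not from the slides):

```python
from typing import Iterable, Set

def support(baskets: Iterable[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    baskets = list(baskets)
    return sum(1 for t in baskets if itemset <= t) / len(baskets)

def confidence(baskets: Iterable[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Confidence(LHS => RHS) = Support(LHS ∪ RHS) / Support(LHS)."""
    baskets = list(baskets)
    return support(baskets, lhs | rhs) / support(baskets, lhs)
```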

  12. Definition: Frequent Itemset
     - Itemset: a collection of one or more items
       - Example: {Milk, Bread, Diaper}
       - k-itemset: an itemset with k items
     - Support count (σ): frequency count of occurrence of an itemset
       - E.g., σ({Milk, Bread, Diaper}) = 2
     - Support (s): fraction of transactions containing the itemset
       - E.g., s({Milk, Bread, Diaper}) = 2/5
     - Frequent itemset: an itemset whose support is greater than or equal to a minsup threshold

     Reference transactions:
       TID  Items
       1    Bread, Milk
       2    Bread, Diaper, Beer, Eggs
       3    Milk, Diaper, Beer, Coke
       4    Bread, Milk, Diaper, Beer
       5    Bread, Milk, Diaper, Coke

  13. Support and Confidence Calculations
     - Given the association rule {Milk, Diaper} → {Beer}, compute the two rule evaluation metrics over the transactions below:
       - Support (s): fraction of transactions that contain both X and Y
       - Confidence (c): measures how often items in Y appear in transactions that contain X

       TID  Items
       1    Bread, Milk
       2    Bread, Diaper, Beer, Eggs
       3    Milk, Diaper, Beer, Coke
       4    Bread, Milk, Diaper, Beer
       5    Bread, Milk, Diaper, Coke

     - s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
     - c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
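Assuming the `transactions` mapping and the `support`/`confidence` helpers sketched earlier, the slide's calculation can be checked mechanically:

```python
# Hypothetical check of the calculation above, reusing the earlier sketch.
baskets = list(transactions.values())

s = support(baskets, {"Milk", "Diaper", "Beer"})       # 2/5 = 0.4
c = confidence(baskets, {"Milk", "Diaper"}, {"Beer"})  # 2/3 ≈ 0.67
print(f"support = {s:.2f}, confidence = {c:.2f}")
```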

  14. Support and Confidence – 2nd Example

       Transaction ID  Items Bought
       1001            A, B, C
       1002            A, C
       1003            A, D
       1004            B, E, F
       1005            A, D, F

     - Itemset {A, C} has a support of 2/5 = 40%
     - Rule {A} => {C} has a confidence of 50%
     - Rule {C} => {A} has a confidence of 100%
     - Support for {A, C, E}?
     - Support for {A, D, F}?
     - Confidence for {A, D} => {F}?
     - Confidence for {A} => {D, F}?
     - Goal: find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
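The open questions on this slide can be answered with the same kind of arithmetic. A self-contained sketch over the five transactions listed above (the helper name `sup` is mine):

```python
baskets_2 = [
    {"A", "B", "C"},   # 1001
    {"A", "C"},        # 1002
    {"A", "D"},        # 1003
    {"B", "E", "F"},   # 1004
    {"A", "D", "F"},   # 1005
]

def sup(itemset):
    """Support of `itemset` as a fraction of all transactions."""
    return sum(1 for t in baskets_2 if itemset <= t) / len(baskets_2)

print(sup({"A", "C", "E"}))                    # support of {A, C, E}
print(sup({"A", "D", "F"}))                    # support of {A, D, F}
print(sup({"A", "D", "F"}) / sup({"A", "D"}))  # confidence of {A, D} => {F}
print(sup({"A", "D", "F"}) / sup({"A"}))       # confidence of {A} => {D, F}
```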

  15. Example
     - Transaction data:
       t1: Beef, Chicken, Milk
       t2: Beef, Cheese
       t3: Cheese, Boots
       t4: Beef, Chicken, Cheese
       t5: Beef, Chicken, Clothes, Cheese, Milk
       t6: Chicken, Clothes, Milk
       t7: Chicken, Milk, Clothes
     - Assume: minsup = 30%, minconf = 80%
     - An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7]
     - Rules from the itemset are partitions of its items. Association rules from the above itemset include:
       Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
       ...
       Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
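One way to see that "rules from the itemset are partitions of its items" is to enumerate them. This sketch (illustrative code, not the presentation's) splits {Chicken, Clothes, Milk} into every LHS/RHS pair and keeps the rules that meet minconf = 80% on the seven transactions above:

```python
from itertools import combinations

data = [
    {"Beef", "Chicken", "Milk"},                       # t1
    {"Beef", "Cheese"},                                # t2
    {"Cheese", "Boots"},                               # t3
    {"Beef", "Chicken", "Cheese"},                     # t4
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},  # t5
    {"Chicken", "Clothes", "Milk"},                    # t6
    {"Chicken", "Milk", "Clothes"},                    # t7
]

itemset = {"Chicken", "Clothes", "Milk"}
count = lambda s: sum(1 for t in data if s <= t)

for k in range(1, len(itemset)):
    for lhs in map(set, combinations(sorted(itemset), k)):
        rhs = itemset - lhs
        conf = count(itemset) / count(lhs)
        if conf >= 0.8:
            print(f"{sorted(lhs)} -> {sorted(rhs)} "
                  f"[sup = {count(itemset)}/{len(data)}, conf = {conf:.2f}]")
```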

  16. Mining Association Rules

       TID  Items
       1    Bread, Milk
       2    Bread, Diaper, Beer, Eggs
       3    Milk, Diaper, Beer, Coke
       4    Bread, Milk, Diaper, Beer
       5    Bread, Milk, Diaper, Coke

     Example rules:
       {Milk, Diaper} → {Beer}  (s = 0.4, c = 0.67)
       {Milk, Beer} → {Diaper}  (s = 0.4, c = 1.0)
       {Diaper, Beer} → {Milk}  (s = 0.4, c = 0.67)
       {Beer} → {Milk, Diaper}  (s = 0.4, c = 0.67)
       {Diaper} → {Milk, Beer}  (s = 0.4, c = 0.5)
       {Milk} → {Diaper, Beer}  (s = 0.4, c = 0.5)

     Observations:
     - All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
     - Rules originating from the same itemset have identical support (by definition) but may have different confidence values

  17. Drawback of Confidence

              Coffee  ¬Coffee  Total
       Tea      15       5       20
       ¬Tea     75       5       80
       Total    90      10      100

     - Association rule: Tea → Coffee
     - Confidence = P(Coffee | Tea) = 15/20 = 0.75, but P(Coffee) = 0.9
     - Although the confidence is high, the rule is misleading: P(Coffee | ¬Tea) = 75/80 = 0.9375
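A short sketch of the contingency-table arithmetic on this slide (variable names are mine):

```python
tea_coffee, tea_no_coffee = 15, 5        # Tea row
no_tea_coffee, no_tea_no_coffee = 75, 5  # ¬Tea row
total = tea_coffee + tea_no_coffee + no_tea_coffee + no_tea_no_coffee  # 100

conf_tea_coffee = tea_coffee / (tea_coffee + tea_no_coffee)           # P(Coffee|Tea)  = 0.75
p_coffee = (tea_coffee + no_tea_coffee) / total                       # P(Coffee)      = 0.90
p_coffee_no_tea = no_tea_coffee / (no_tea_coffee + no_tea_no_coffee)  # P(Coffee|¬Tea) = 0.9375

print(conf_tea_coffee, p_coffee, p_coffee_no_tea)
```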

  18. Mining Association Rules
     - Two-step approach:
       1. Frequent itemset generation: generate all itemsets whose support ≥ minsup
       2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset (see the sketch below)
     - Frequent itemset generation is still computationally expensive.
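Step 2 can be sketched as a generic routine: try every binary partition of each frequent itemset and keep the high-confidence rules. This is illustrative code under the same set-of-items encoding used above, not the presentation's implementation:

```python
from itertools import combinations

def generate_rules(frequent_itemsets, baskets, minconf):
    """Binary-partition each frequent itemset into LHS -> RHS and keep
    rules whose confidence >= minconf.
    Returns (lhs, rhs, support, confidence) tuples."""
    n = len(baskets)
    count = lambda s: sum(1 for t in baskets if s <= t)
    rules = []
    for itemset in map(frozenset, frequent_itemsets):
        for k in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, k)):
                conf = count(itemset) / count(lhs)
                if conf >= minconf:
                    rules.append((set(lhs), set(itemset - lhs),
                                  count(itemset) / n, conf))
    return rules
```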

  19. Transaction Data Representation
     - A simplistic view of "shopping baskets"
     - Some important information is not considered, e.g.:
       - the quantity of each item purchased
       - the price paid

  20. Many Mining Algorithms
     - There are a large number of them, using different strategies and data structures.
     - Their resulting sets of rules are all the same: given a transaction data set T, a minimum support, and a minimum confidence, the set of association rules existing in T is uniquely determined.
     - Any algorithm should find the same set of rules, although their computational efficiencies and memory requirements may differ.
     - We study only one: the Apriori algorithm.

  21. The Apriori Algorithm
     - The best-known algorithm
     - Two steps:
       1. Find all itemsets that have minimum support (frequent itemsets, also called large itemsets).
       2. Use the frequent itemsets to generate rules.
     - E.g., from the frequent itemset {Chicken, Clothes, Milk} [sup = 3/7], one rule is
       Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]

  22. Step 1: Mining All Frequent Itemsets
     - A frequent itemset is an itemset whose support is ≥ minsup.
     - Key idea, the Apriori property (downward closure): any subset of a frequent itemset is also a frequent itemset.
     [Diagram: itemset lattice over items A, B, C, D, from the single items up to the 3-itemsets ABC, ABD, ACD, BCD.]
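A compact sketch of this level-wise search, assuming the set-of-items transaction encoding used in the earlier examples (function and variable names are illustrative): size-k candidates are joined from frequent (k-1)-itemsets, and the downward closure property prunes any candidate with an infrequent subset before its support is counted.

```python
from itertools import combinations

def apriori_frequent_itemsets(baskets, minsup):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    n = len(baskets)
    support = lambda s: sum(1 for t in baskets if s <= t) / n

    # Level 1: frequent single items.
    items = {i for t in baskets for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= minsup}
    result = {s: support(s) for s in level}
    k = 2

    while level:
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= minsup}
        result.update({c: support(c) for c in level})
        k += 1
    return result
```

On the seven-transaction example from slide 15, apriori_frequent_itemsets(data, 0.3) would, among other itemsets, return {Chicken, Clothes, Milk} with support 3/7.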
