
Interesting Patterns - Jilles Vreeken - 15 May 2015 (PowerPoint PPT presentation)



  1. Interesting Patterns Jilles Vreeken 15 May 2015

  2. Questions of the Day What is interestingness? What is a pattern? And how can we mine interesting patterns?

  3. What is a pattern? [Figure: Data, and the Pattern it contains: y = x - 1]

  4. What is a pattern? A recurring structure. [Figure: Data, and the recurring Pattern in it]

  5. Pattern mining, formally For a database D, a pattern language P, and a set of constraints C, the goal is to find the set of patterns S βŠ† P such that each p ∈ S satisfies each c ∈ C on D, and S is maximal. That is, find all patterns that satisfy the constraints.

  6. Frequent Pattern Mining Suppose a supermarket, which sells items I, and logs every transaction t βŠ† I in a database D. An interesting question to ask is, β€˜What products are often sold together?’ Pattern language: all possible sets of items, P = 2^I. Pattern: an itemset X βŠ† I, X ∈ P.
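The pattern language above is just the powerset of I. A minimal Python sketch, using illustrative item names:

```python
from itertools import chain, combinations

def pattern_language(items):
    """The pattern language for itemset mining: all subsets of I (2^|I| itemsets)."""
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

P = pattern_language({'a', 'b', 'c', 'd'})
print(len(P))  # 16 patterns, from the empty itemset up to abcd
```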

  7. Frequent Itemsets [Figure: an example itemset, shown as product pictures, with supp(Β·) = 3]

  8. Frequent Conjunctive Formulas Pattern: Petal length <= 2.0 and Petal width <= 0.5. [Table: rows of the Iris dataset (sepal length, sepal width, petal length, petal width, class), e.g. 4.9, 3.1, 1.5, 0.1, Iris-setosa and 7.0, 3.2, 4.7, 1.4, Iris-versicolor]

  9. Frequent Subgraphs

  10. The Frequent Pattern Problem The task is to find all frequent patterns. β€˜how often is X sold’ ↔ supp(X) = |{t ∈ D | X βŠ† t}|, the number of transactions in D that β€˜support’ the pattern. β€˜often enough’ ↔ supp(X) β‰₯ minsup, a support of at least the minimal-support threshold. So, the problem is to find all X ∈ 2^I with supp(X) β‰₯ minsup. How can we do this?
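Support can be computed directly from its definition. A sketch, over a toy database reconstructed from the lattice slides (the reconstruction is an assumption on my part):

```python
def supp(X, D):
    """Number of transactions in D that contain ('support') itemset X."""
    return sum(1 for t in D if X <= t)

# toy database matching the itemset-lattice slides (reconstructed, so an assumption)
D = [{'a','b','c','d'}, {'a','b','c'}, {'a','b','d'}, {'a','b','d'}, {'c'}, {'a'}]

print(supp({'a', 'b'}, D))  # 4: four of the six transactions contain both a and b
```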

  11. Monotonicity The number of possible patterns is exponential, and hence exhaustive search is not a feasible option. However, in 1994 it was discovered that support exhibits monotonicity. That is, for two itemsets X and Z, we know X βŠ‚ Z β†’ supp(X) β‰₯ supp(Z). This is known as the A Priori property. It allows efficient search for frequent itemsets over the lattice of all itemsets.
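The A Priori property can be verified exhaustively on a small example. A sketch over the toy database from the lattice slides (my reconstruction, so an assumption):

```python
from itertools import chain, combinations

def supp(X, D):
    """Number of transactions in D that contain itemset X."""
    return sum(1 for t in D if X <= t)

D = [{'a','b','c','d'}, {'a','b','c'}, {'a','b','d'}, {'a','b','d'}, {'c'}, {'a'}]
items = sorted({'a', 'b', 'c', 'd'})
lattice = [frozenset(c) for c in chain.from_iterable(
    combinations(items, r) for r in range(len(items) + 1))]

# X βŠ‚ Z implies supp(X) >= supp(Z), for every pair in the lattice
assert all(supp(X, D) >= supp(Z, D)
           for X in lattice for Z in lattice if X < Z)
print("A Priori property holds on the toy data")
```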

  12. The Itemset Lattice [Figure: a toy database of 6 transactions over items a, b, c, d (rows 1111, 1110, 1101, 1101, 0010, 1000) and its itemset lattice, with supports: βˆ… (6); a (5), b (4), c (3), d (3); ab (4), ac (2), ad (3), bc (2), bd (3), cd (1); abc (2), abd (3), acd (1), bcd (1); abcd (1)]

  13. The Itemset Lattice [Figure: the same database and lattice, now with the frequent itemsets marked; e.g. at minsup = 3 these are a (5), b (4), c (3), d (3), ab (4), ad (3), bd (3), and abd (3)]

  14. Levelwise search 1. F_1 = {i ∈ I | supp(i) β‰₯ minsup} 2. while F_l not empty: 3. C_{l+1} = {X ∈ P | |X| = l+1 and βˆ€Z βŠ‚ X with |Z| = l: Z ∈ F_l} 4. F_{l+1} = {X ∈ C_{l+1} | supp(X) β‰₯ minsup} 5. return F_1 βˆͺ F_2 βˆͺ β‹― The A Priori algorithm can be applied to mine patterns for any enumerable pattern language P and any monotonic constraint c. Many algorithms exist that are more efficient, but none so versatile.
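The levelwise search can be sketched in Python; this is a minimal, unoptimized rendering, run on the toy database reconstructed from the lattice slides (an assumption):

```python
from itertools import combinations

def supp(X, D):
    return sum(1 for t in D if X <= t)

def apriori(D, items, minsup):
    """Levelwise search: extend only itemsets all of whose subsets are frequent."""
    frequent = {}
    F_l = [frozenset([i]) for i in sorted(items) if supp({i}, D) >= minsup]
    l = 1
    while F_l:
        for X in F_l:
            frequent[X] = supp(X, D)
        # candidate (l+1)-itemsets whose every l-subset is frequent
        F_set = set(F_l)
        candidates = {X | Y for X in F_l for Y in F_l if len(X | Y) == l + 1}
        pruned = {C for C in candidates
                  if all(frozenset(S) in F_set for S in combinations(C, l))}
        F_l = [C for C in pruned if supp(C, D) >= minsup]
        l += 1
    return frequent

D = [{'a','b','c','d'}, {'a','b','c'}, {'a','b','d'}, {'a','b','d'}, {'c'}, {'a'}]
result = apriori(D, {'a', 'b', 'c', 'd'}, minsup=3)
print(sorted((''.join(sorted(X)), n) for X, n in result.items()))
# [('a', 5), ('ab', 4), ('abd', 3), ('ad', 3), ('b', 4), ('bd', 3), ('c', 3), ('d', 3)]
```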

  15. Problems in pattern paradise The pattern explosion: high thresholds give few, but well-known patterns; low thresholds give a gazillion patterns. Many patterns are redundant. Unstable: a small data change gives different results, even when the distribution did not really change.

  16. The Wine Explosion The Wine dataset has 178 rows and 14 columns.

  17. To the Max! Why not just report only patterns for which there is no extension that is frequent? These patterns are called maximally frequent. [Figure: the lattice with the maximal frequent itemsets marked] (Bayardo, 1998)
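Given the frequent itemsets, the maximal ones are those without a frequent proper superset. A sketch over the frequent collection from the lattice slides at minsup = 3 (my reading of the figure, so an assumption):

```python
# frequent itemsets and supports at minsup = 3 (read off the lattice slides)
frequent = {frozenset(s): n for s, n in [
    ('a', 5), ('b', 4), ('c', 3), ('d', 3),
    ('ab', 4), ('ad', 3), ('bd', 3), ('abd', 3)]}

def maximal(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {X for X in frequent if not any(X < Y for Y in frequent)}

print(sorted(''.join(sorted(X)) for X in maximal(frequent)))  # ['abd', 'c']
```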

  18. Closure! Why throw away so much information? If we keep all X that cannot be extended without supp(X) dropping, all frequent itemsets and their frequencies can be reconstructed without loss! These are called closed frequent itemsets. [Figure: the lattice with the closed frequent itemsets marked] (Pasquier, 1999)
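Restricted to the frequent collection, the closed itemsets are those whose every frequent superset has strictly lower support, and every frequent itemset's support can be recovered from them. A sketch, reusing the frequent itemsets from the lattice slides at minsup = 3 (an assumption):

```python
frequent = {frozenset(s): n for s, n in [
    ('a', 5), ('b', 4), ('c', 3), ('d', 3),
    ('ab', 4), ('ad', 3), ('bd', 3), ('abd', 3)]}

def closed(frequent):
    """Itemsets whose every frequent proper superset has strictly lower support."""
    return {X: n for X, n in frequent.items()
            if not any(X < Y and m == n for Y, m in frequent.items())}

def reconstruct(X, closed_sets):
    """Support of a frequent itemset X: max support over closed supersets of X."""
    return max(n for Y, n in closed_sets.items() if X <= Y)

c = closed(frequent)
print(sorted(''.join(sorted(X)) for X in c))  # ['a', 'ab', 'abd', 'c']
print(reconstruct(frozenset('bd'), c))        # 3, recovered losslessly
```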

  19. Non-Derivable Patterns Through inclusion/exclusion, we can derive the support of abd. As supp(ad) = supp(d) = 3, we know a and d always co-occur. Then, knowing that supp(bd) = 3, we can derive supp(abd) = 3. [Figure: the lattice with the non-derivable itemsets marked] (Calders & Goethals, 2003)

  20. Margin-Closed Who cares that we can reconstruct all frequencies exactly? Why not allow a little bit of slack and zap more patterns? That is the main idea of margin-closed frequent itemsets. [Figure: the lattice with the margin-closed frequent itemsets marked] (Moerchen et al, 2011)

  21. Associations Why is a frequent pattern X interesting? Because it identifies associations between elements of X. Many people buy two particular products together. What’s going on? Many patients have active genes A, B and C. What’s going on? Many molecules share this structure. What’s going on? Okay… but does higher frequency mean more interesting?

  22. Expectation Frequency alone is deceiving, and leads to redundant results. Say that very many people buy one particular item. Then all β€˜real’ patterns can be extended with that item, and we will likely find that the extension is also frequent. Do we want it to be reported? Not unless its support deviates strongly from our expectation.

  23. What did you expect? What do we expect? How do we model this? How can we measure whether expectation and reality are different enough? Let’s start simple. Let’s assume all items are independent.

  24. Independence! Under the assumption that all items i ∈ I are independent, the expected frequency of an itemset X is simply exp(X) = ∏_{x ∈ X} fr(x), where we write fr(x) = supp(x) / |D| for the frequency, the relative support, of an item x ∈ X in our database. Item frequencies can easily be extracted from data, and can reasonably be expected to be known by your domain expert.

  25. Bro, do you even lift? We want to identify patterns whose frequency in the data deviates strongly from our expectation. One way to measure this deviation is lift: lift(X) = fr(X) / exp(X). Patterns with a lift higher than 1 are more frequent than expected; those with lift lower than 1 are less frequent. In our data/lattice example, lift(AB) = 1.2 and lift(ABD) β‰ˆ 1.8. (IBM, 1996)

  26. Example: Lift fr(AB) = 4/6 β‰ˆ 0.67 and exp(AB) = 5/6 Β· 4/6 β‰ˆ 0.56, so lift(AB) = 0.67 / 0.56 = 1.2. fr(ABD) = 3/6 = 0.5 and exp(ABD) = 5/6 Β· 4/6 Β· 3/6 β‰ˆ 0.28, so lift(ABD) = 0.5 / 0.28 β‰ˆ 1.8. That is, according to lift, ABD is more interesting than AB.
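The example above can be sketched directly in Python, on the toy database reconstructed from the lattice slides (an assumption):

```python
def supp(X, D):
    return sum(1 for t in D if X <= t)

D = [{'a','b','c','d'}, {'a','b','c'}, {'a','b','d'}, {'a','b','d'}, {'c'}, {'a'}]

def fr(X, D):
    """Relative support (frequency) of itemset X."""
    return supp(X, D) / len(D)

def exp_ind(X, D):
    """Expected frequency of X if all items were independent."""
    p = 1.0
    for x in X:
        p *= fr({x}, D)
    return p

def lift(X, D):
    return fr(X, D) / exp_ind(X, D)

print(round(lift({'a', 'b'}, D), 2))       # 1.2
print(round(lift({'a', 'b', 'd'}, D), 2))  # 1.8
```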

  27. Lift Lift is ad hoc. Lift strongly over-estimates, or under-estimates, how surprising the frequency of a pattern is. It is a bad interestingness measure. Somewhat more formally: lift is ad hoc because it compares scores directly, it does not consider how likely scores are, and it does not use a proper statistical test to determine how significant the deviation is.

  28. Better Lift The probability of a random transaction to support X is exp(X). Assume our dataset contains N transactions, and let S be a random variable stating how many transactions support X. Then P(S = k), the probability that the support of X is k, is given by the binomial distribution with q = exp(X): P(S = k) = (N choose k) q^k (1 - q)^(N - k). We can now calculate how likely it is to observe a support of supp(X) or higher, and decide whether the p-value P(S β‰₯ supp(X)) is significant (e.g. < 0.05).
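This binomial tail probability is easy to compute directly. A sketch for the itemset ABD of the toy example, where the support and item frequencies come from my reconstruction of the lattice slides (an assumption):

```python
from math import comb

def binom_pvalue(s, N, q):
    """P(S >= s) for S ~ Binomial(N, q): probability of observing support s
    or higher if each of the N transactions contained X with probability q."""
    return sum(comb(N, k) * q**k * (1 - q)**(N - k) for k in range(s, N + 1))

# ABD: supp = 3, N = 6 transactions, q = exp(ABD) = (5/6) * (4/6) * (3/6)
q = (5/6) * (4/6) * (3/6)
p = binom_pvalue(3, 6, q)
print(round(p, 3))  # 0.216: not significant at the 0.05 level
```

On such a tiny database even a lift of 1.8 is unsurprising; with more transactions at the same frequencies, the p-value would shrink rapidly.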
