Part 2: MDL in Action
Jilles Vreeken



  1. Part 2: MDL in Action (Jilles Vreeken)

  2. Explicit Coding. Ad hoc sounds bad, but is it really? Bayesian learning, for instance, is inherently subjective, plus biasing search is a time-honoured tradition in data analysis. Using an explicit encoding allows us to steer towards the type of structure we want to discover. We also mitigate one of the practical weak spots of AIT: all data is a string, but wouldn't it be nice if the structure you found did not depend on the order of the data?

  3. Matrix Factorization. The rank of a matrix B is the number of rank-1 matrices that, when summed, form B (Schein rank): B = c_1 ∘ d_1 + c_2 ∘ d_2 + c_3 ∘ d_3 + …

  4. Boolean Matrix Factorization. The rank of a Boolean matrix B is the number of rank-1 matrices that, when summed, form B (Schein rank), where the sum is Boolean (1 + 1 = 1): B = c_1 ∘ d_1 ∨ c_2 ∘ d_2 ∨ c_3 ∘ d_3 ∨ … (Miettinen et al. 2006, 2008)
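To make the Boolean sum concrete: under Boolean algebra 1 + 1 = 1, so the sum of rank-1 outer products is an element-wise OR. A minimal numpy sketch (the matrices are made up for illustration):

```python
import numpy as np

# A rank-3 Boolean factorization: B is the element-wise OR of the
# rank-1 matrices c_i ∘ d_i. All values here are illustrative.
C = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1]], dtype=bool)        # n x k: column factors c_i
D = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=bool)     # k x m: row factors d_i

# Boolean sum: OR together the k rank-1 outer products.
B = np.zeros((C.shape[0], D.shape[1]), dtype=bool)
for i in range(C.shape[1]):
    B |= np.outer(C[:, i], D[i, :])

# Equivalently: ordinary matrix product, thresholded (1 + 1 = 1).
assert (B == ((C.astype(int) @ D.astype(int)) >= 1)).all()
```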

  5. Boolean Matrix Factorization. Noise quickly inflates the 'true' latent rank: for an n-by-m matrix it is driven up towards min(n, m). (Miettinen et al. 2006, 2008)

  6. Boolean Matrix Factorization. Noise quickly inflates the rank to min(n, m); how can we determine the 'true' latent rank? B ≈ C ∘ D. (Miettinen & Vreeken 2012, 2014)

  7. Boolean Matrix Factorization. Separating structure and noise: matrices C and D contain the structure, matrix F contains the noise: B = (C ∘ D) ⊕ F. (Miettinen & Vreeken 2012, 2014)

  8. Boolean Matrix Factorization. Encoding the structure: L(C) = log n + Σ_{c ∈ C} ( log n + log binom(n, |c|) ), summing over the columns c of C, with B = (C ∘ D) ⊕ F. (Miettinen & Vreeken 2012, 2014)

  9. Boolean Matrix Factorization. Encoding the structure: L(D) = log m + Σ_{d ∈ D} ( log m + log binom(m, |d|) ), summing over the rows d of D, with B = (C ∘ D) ⊕ F. (Miettinen & Vreeken 2012, 2014)

  10. Boolean Matrix Factorization. Encoding the noise: L(F) = log nm + log binom(nm, |F|), with B = (C ∘ D) ⊕ F. (Miettinen & Vreeken 2012, 2014)

  11. Boolean Matrix Factorization. MDL for BMF: L(B, M) = L(C) + L(D) + L(F), with B = (C ∘ D) ⊕ F. (Miettinen & Vreeken 2012, 2014)
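A minimal sketch of such a score, assuming the reading above: the number of 1s of each factor vector costs log2 of its dimension and their positions a binomial index. The papers define several refined variants of this encoding; this is only an illustration:

```python
import numpy as np
from scipy.special import gammaln

def log2_binom(n, k):
    """log2 of the binomial coefficient binom(n, k), via log-gamma."""
    return (gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)) / np.log(2)

def L_columns(M):
    """Bits for a Boolean factor matrix, column by column: log2(dim) for
    the number of 1s in the column, plus log2 binom(dim, |c|) for which
    rows hold them; a leading log2(dim) term as on the slides."""
    dim = M.shape[0]
    bits = np.log2(dim)
    for c in M.T:
        bits += np.log2(dim) + log2_binom(dim, int(c.sum()))
    return bits

def L_noise(F):
    """Bits for the noise matrix: its number of 1s, then their positions."""
    return np.log2(F.size) + log2_binom(F.size, int(F.sum()))

def L_bmf(B, C, D):
    """Total bits L(B, M) = L(C) + L(D) + L(F), with F = B xor (C ∘ D)."""
    F = B ^ ((C.astype(int) @ D.astype(int)) >= 1)
    return L_columns(C) + L_columns(D.T) + L_noise(F)
```

With such a score, the 'true' latent rank is simply the k whose factorization minimizes L(B, M).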

  12. Pattern Mining. The ideal outcome of pattern mining: patterns that show the structure of the data, preferably a small set, without redundancy or noise. Frequent pattern mining does not achieve this: the pattern explosion yields overly many, overly redundant results. MDL allows us to effectively pursue the ideal: we want a group of patterns that summarise the data well, so we take a pattern set mining approach. (Tatti & Vreeken 2012, Bertens et al. 2016, Bhattacharyya & Vreeken 2017; for transaction data, Vreeken et al. 2011; for graphs, Koutra et al. 2014)

  13. Event Sequences. Alphabet Ω = { a, b, c, d, … }. Data D: one sequence, e.g. a b d c a d b a a b c a d a b a b c, or multiple sequences, e.g. { a b d c a d b a a b c, a b d c a d b, a b d c a d b a a, … }. (Tatti & Vreeken 2012, Bertens et al. 2016, Bhattacharyya & Vreeken 2017)

  14. Event Sequences. Alphabet Ω = { a, b, c, d, … }. Data D: one or multiple sequences. Patterns: serial episodes, 'subsequences allowing gaps', e.g. a b. (Tatti & Vreeken 2012, Bertens et al. 2016, Bhattacharyya & Vreeken 2017)

  15. Event Sequences. Alphabet Ω = { a, b, c, d, … }. Data D: one or multiple sequences. Patterns: serial episodes, 'subsequences allowing gaps', e.g. a b or d b c. (Tatti & Vreeken 2012, Bertens et al. 2016, Bhattacharyya & Vreeken 2017)
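A serial episode matches wherever its symbols occur in order, gaps allowed. A minimal sketch of such a test (the function name is illustrative):

```python
def matches(pattern, window):
    """Check whether `pattern` occurs as a serial episode, i.e. an
    order-preserving subsequence with gaps allowed, inside `window`."""
    it = iter(window)
    return all(sym in it for sym in pattern)  # each `in` advances the iterator

assert matches("ab", "adb")       # a _ b: matches with one gap
assert not matches("ab", "ba")    # order matters for serial episodes
```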

  16. Models. As models we use code tables: dictionaries of patterns and codes, e.g. the patterns p = a b c and q = d a plus the singletons a, b, c, d, each paired with its code. A code table always contains all singletons. We use optimal prefix codes: they are easy to compute, behave predictably, and give good results; more details follow.

  17. Encoding Event Sequences. Data D: a b d c a d b a a b c. Encoding 1: using only singletons, the code stream is Cp: a b d c a d b a a b c. The length of the code for pattern X is L(code_p(X)) = −log( usage(X) / Σ_{Y ∈ CT} usage(Y) ). The length of the code stream is L(Cp) = Σ_{X ∈ CT} usage(X) · L(code_p(X)).
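A small sketch of Encoding 1 on the slide's sequence, computing the per-singleton code lengths and the total stream length:

```python
from collections import Counter
from math import log2

# Encoding 1: every event gets the optimal prefix code implied by its
# relative usage, so its code length is -log2 of its empirical frequency.
D = list("abdcadbaabc")                  # the slide's example sequence
usage = Counter(D)                       # a: 4, b: 3, c: 2, d: 2
total = sum(usage.values())              # 11 codes in the stream

code_len = {x: -log2(usage[x] / total) for x in usage}   # L(code_p(X))
L_Cp = sum(usage[x] * code_len[x] for x in usage)        # L(Cp)
print({x: round(l, 2) for x, l in code_len.items()}, round(L_Cp, 2))
```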

  18. Encoding Event Sequences. Data D: a b d c a d b a a b c. Encoding 2: using patterns. The code table CT2 now also holds p = a b c and q = d a, each with a non-gap code ! and a gap code ?. The pattern stream is Cp: p d a q b p and the gap stream is Cg: ! ? ! ? ! ! !. Alignment: a b d c a d b a a b c ↔ p ! ? ! q ? ! p ! !

  19. Encoding Event Sequences. Data D: a b d c a d b a a b c. Encoding 2: using patterns p = a b c and q = d a. The length of a gap code for pattern X is L(code_?(X)) = −log( gaps(X) / (gaps(X) + fills(X)) ), and analogously for non-gap codes.
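A sketch of these gap and non-gap code lengths, assuming they are estimated from how often a pattern is used with and without gaps; the counts below are chosen to match the toy alignment above, but are purely for illustration:

```python
from math import log2

def gap_code_lengths(gaps, fills):
    """Per-pattern gap (?) and non-gap (!) code lengths, assuming the
    empirical estimate L = -log2(count / (gaps + fills))."""
    total = gaps + fills
    return -log2(gaps / total), -log2(fills / total)

# Pattern p = a b c: used twice in the toy cover, with 1 gap position
# and 4 non-gap positions in total (hypothetical counts).
L_gap, L_fill = gap_code_lengths(gaps=1, fills=4)
print(round(L_gap, 2), round(L_fill, 2))   # ? is expensive, ! is cheap
```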

  20. Encoding Event Sequences. By which, the encoded size of the data D given the code table CT is L(D | CT) = L(Cp | CT) + L(Cg | CT), which leaves us to define L(CT | D).

  21. Encoding a Code Table. [code table figure: patterns X and Y with gap (?) and non-gap (!) codes, plus the singletons a through z] L(CT | D) consists of three parts, given on the following slides.

  22. Encoding a Code Table. L(CT | D) consists of: 1) the base singleton counts in D: L_N(|Ω|) + L_N(||D||) + log binom(||D|| − 1, |Ω| − 1). (Rissanen 1983)

  24. Encoding a Code Table. L(CT | D) consists of: 1) the base singleton counts in D: L_N(|Ω|) + L_N(||D||) + log binom(||D|| − 1, |Ω| − 1); 2) the number of patterns, their total usage, and the per-pattern usages: L_N(|P| + 1) + L_N(usage(P) + 1) + log binom(usage(P) − 1, |P| − 1).

  26. Encoding a Code Table. L(CT | D) consists of: 1) the base singleton counts in D: L_N(|Ω|) + L_N(||D||) + log binom(||D|| − 1, |Ω| − 1); 2) the number of patterns, their total usage, and the per-pattern usages: L_N(|P| + 1) + L_N(usage(P) + 1) + log binom(usage(P) − 1, |P| − 1); 3) per pattern X: its length, its elements, and its number of gaps: L_N(|X|) − Σ_{x ∈ X} log p(x | D) + L_N(gaps(X) + 1).

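The L_N terms above are Rissanen's universal code for the integers. A minimal implementation, plus item 1) evaluated for the toy alphabet and sequence from the earlier slides:

```python
from math import comb, log2

def L_N(n):
    """Rissanen's (1983) universal code length for an integer n >= 1:
    log2(c0) + log2(n) + log2(log2(n)) + ... over the positive terms,
    with c0 approximately 2.865064."""
    assert n >= 1
    bits, x = log2(2.865064), float(n)
    while (x := log2(x)) > 0:
        bits += x
    return bits

# Item 1) for the toy data: |Omega| = 4 symbols, ||D|| = 11 events.
base_bits = L_N(4) + L_N(11) + log2(comb(11 - 1, 4 - 1))
print(round(base_bits, 2))
```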

  28. Encoding Event Sequences. By which we have a lossless encoding; in other words, an objective function. By MDL, our goal is now to minimise L(CT, D) = L(CT | D) + L(D | CT). For how to do so, please see the papers: Tatti & Vreeken (2012), Bertens et al. (2016), Bhattacharyya & Vreeken (2017); for transaction data, Vreeken et al. (2011), Budhathoki & Vreeken (2015); for graphs, Koutra et al. (2014).
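The papers give the actual search algorithms; the common skeleton is a greedy loop over candidate patterns. A sketch of that skeleton, not the exact SQS algorithm, where L_total stands in for L(CT, D) as defined above:

```python
def greedy_mdl(candidates, data, L_total):
    """Generic greedy MDL search: start from the singleton-only code
    table and keep a candidate pattern only if adding it lowers the
    total encoded size L(CT, D)."""
    ct = []                               # non-singleton patterns only
    best = L_total(ct, data)
    for pattern in candidates:            # e.g. ordered by estimated gain
        bits = L_total(ct + [pattern], data)
        if bits < best:
            ct.append(pattern)
            best = bits
    return ct, best
```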

  29. Experiments. Synthetic data: on random data, no structure is found; on HMM-generated data, the structure is recovered. Real data: text data, for interpretation.

      Dataset      |Ω|     |D|   ||D||    |P| SQS-CANDS   |P| SQS-SEARCH   #Cands
      Addresses    5,295    56   15,506   138             155              5k
      JMLR         3,846   788   40,879   563             580              30k
      Moby Dick   10,277     1   22,559   215             231              10k

      (implementation available at http://eda.mmci.uni-saarland/sqs)
