Vegan fleas, movie ratings, and the EM algorithm, by Carlos Cotrini


  1. Vegan fleas, movie ratings, and the EM algorithm
     Carlos Cotrini, Department of Computer Science, ETH Zürich, ccarlos@inf.ethz.ch
     March 25, 2019
     Carlos Cotrini (ETH Zürich), The EM algorithm, March 25, 2019, 1 / 36

  2. Overview
     1. The vegan-flea optimization problem
     2. Building a movie recommendation system
     3. The EM algorithm

  3. The vegan-flea optimization problem

  4. A two-dimensional dog

  5. The dog’s cardiovascular system

  6. The dog’s cardiovascular system

  7. The flea, the dog’s skin, and the vessel’s upper border

  8. Animation

  9. Formalization

  10. Assumptions
      - For any $x \in [0, 1]$ and any two time points $t_1, t_2 \in [0, \infty)$, $\mathrm{skin}(x, t_1) - \mathrm{vessel}(x, t_1) = \mathrm{skin}(x, t_2) - \mathrm{vessel}(x, t_2)$.
      - For any $x \in [0, 1]$ and any $t \in [0, \infty)$, there is $t' \geq t$ such that $\mathrm{vessel}(x, t')$ is a maximum of $\mathrm{vessel}(\cdot, t')$.
      - For any $t \in [0, \infty)$, the flea can efficiently compute a point $x^*$ that maximizes $\mathrm{skin}(\cdot, t)$.
      - For any $x \in [0, 1]$ and any $t \in [0, \infty)$, the flea can efficiently compute $\hat{t} \geq t$ such that $\mathrm{vessel}(x, \hat{t})$ is a maximum of $\mathrm{vessel}(\cdot, \hat{t})$.

  11. Objective
      Can the flea compute $x^*$ such that $d(x^*) \geq d(x_0)$, where $x_0$ is the flea’s current position and $d(x) = \mathrm{skin}(x, t) - \mathrm{vessel}(x, t)$ is the skin-to-vessel distance at $x$ (the same at every time $t$, by the first assumption)?

  12. Optimization algorithm

  13. Optimization algorithm

  14. Optimization algorithm
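
The algorithm itself is drawn on the slides and is not captured in this transcript. One reading consistent with the assumptions on slide 10 is: wait until a time $\hat{t}$ at which the vessel peaks under the flea, then jump to a maximizer of the skin at that time. The following is a minimal sketch of that reading on a made-up one-dimensional dog. The functions `d`, `vessel`, `skin`, `wait_for_peak`, `flea_step`, and `run_flea`, the quadratic shapes, and the grid search are all illustrative assumptions, not taken from the talk.

```python
import math

def d(x):
    # Hypothetical skin-to-vessel distance, largest at x = 0.8.
    return 1.0 - (x - 0.8) ** 2

def vessel(x, t):
    # Hypothetical vessel border: a bump whose peak sits at frac(t),
    # so the peak sweeps across [0, 1) once per unit of time.
    peak = t - math.floor(t)
    return -(x - peak) ** 2

def skin(x, t):
    # Assumption 1: skin minus vessel is the same at every time.
    return d(x) + vessel(x, t)

def wait_for_peak(x, t):
    # Assumption 4: earliest time t_hat >= t at which the vessel peaks at x.
    # Valid for x in [0, 1), which is all this toy run visits.
    t_hat = math.floor(t) + x
    return t_hat if t_hat >= t else t_hat + 1.0

def flea_step(x0, t, grid_size=1001):
    # One jump: wait for the vessel to peak under the flea, then move to
    # the highest point of the skin at that moment (assumption 3, done
    # here by brute force over a grid that also contains x0).
    t_hat = wait_for_peak(x0, t)
    candidates = [i / (grid_size - 1) for i in range(grid_size)] + [x0]
    x_star = max(candidates, key=lambda x: skin(x, t_hat))
    return x_star, t_hat

def run_flea(x0=0.1, steps=30):
    # Repeat the jump and record the flea's positions.
    xs, t = [x0], 0.0
    for _ in range(steps):
        x_new, t = flea_step(xs[-1], t)
        xs.append(x_new)
    return xs
```

Calling `run_flea()` returns the sequence of positions. In this toy setup each jump moves the flea roughly halfway toward the point where `d` is largest, and `d` never decreases along the way.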

  15. Why does this work?
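
The slide’s own argument is not preserved in this transcript. A short argument that follows from the assumptions on slide 10 (a reconstruction, not necessarily the one given in the talk) goes as follows. Suppose the flea at $x_0$ waits until a time $\hat{t}$ at which $\mathrm{vessel}(x_0, \hat{t})$ is a maximum of $\mathrm{vessel}(\cdot, \hat{t})$, and then moves to a maximizer $x^*$ of $\mathrm{skin}(\cdot, \hat{t})$. Writing $d(x)$ for skin minus vessel at $x$:

```latex
\begin{aligned}
d(x^*) &= \mathrm{skin}(x^*, \hat{t}) - \mathrm{vessel}(x^*, \hat{t}) \\
       &\geq \mathrm{skin}(x_0, \hat{t}) - \mathrm{vessel}(x^*, \hat{t})
         && \text{($x^*$ maximizes $\mathrm{skin}(\cdot, \hat{t})$)} \\
       &\geq \mathrm{skin}(x_0, \hat{t}) - \mathrm{vessel}(x_0, \hat{t})
         && \text{($x_0$ maximizes $\mathrm{vessel}(\cdot, \hat{t})$)} \\
       &= d(x_0).
\end{aligned}
```

The first and last equalities rely on the assumption that skin minus vessel is the same at every time, so $d$ can be evaluated at $\hat{t}$.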

  16. A movie recommendation system

  17. A simple dataset of movie ratings

  18. A simple dataset of movie ratings

  19. A simple dataset of movie ratings

  20. A simple dataset of movie ratings

  21. A probability model for movie ratings

  22. A probability model for movie ratings

  23. A probability model for movie ratings

  24. A probability model for movie ratings

  25. A probability model for movie ratings

  26. A probability model for movie ratings

  27. A probability model for movie ratings

  28. Notation
      - $X = (x_{i,j})_{i \leq N, j \leq D}$. Here, $x_{i,j} \in \{0, 1\}$ indicates whether person $i$ liked movie $j$ or not.
      - $\bar\mu = (\mu_{k,j})_{k \leq K, j \leq D}$. Here, $\mu_{k,j} \in [0, 1]$ denotes the probability that someone in category $k$ likes movie $j$.
      - $\bar\nu = (\nu_k)_{k \leq K}$. Here, $\nu_k \in [0, 1]$ denotes the probability that a person belongs to category $k$.
      - $\bar z = (z(i))_{i \leq N}$. Here, $z(i) \in \{1, \ldots, K\}$ indicates person $i$’s category.
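
To make the notation concrete, the following is a minimal sketch of how a ratings matrix $X$ could be generated under this model: draw each person’s category from $\bar\nu$, then draw each like or dislike from the corresponding row of $\bar\mu$. The function name `sample_ratings`, the example parameter values, and the 0-based category indices are illustrative assumptions, not from the slides.

```python
import random

def sample_ratings(nu, mu, n_people, seed=0):
    """Sample categories z and a binary ratings matrix X.

    nu[k]    : probability that a person belongs to category k
    mu[k][j] : probability that someone in category k likes movie j
    Categories are indexed from 0 here for convenience.
    """
    rng = random.Random(seed)
    categories = list(range(len(nu)))
    z, X = [], []
    for _ in range(n_people):
        k = rng.choices(categories, weights=nu)[0]  # z(i) drawn from nu
        z.append(k)
        # x_{i,j} is Bernoulli(mu_{k,j}), independently for each movie j
        X.append([1 if rng.random() < p else 0 for p in mu[k]])
    return z, X

# Hypothetical example: K = 2 categories, D = 3 movies, N = 5 people.
nu = [0.6, 0.4]
mu = [[0.9, 0.8, 0.1],
      [0.2, 0.1, 0.9]]
z, X = sample_ratings(nu, mu, n_people=5)
```

In the learning problem only `X` is observed; `z`, `nu`, and `mu` are unknown.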

  29. How to mine a probability model from $X$?
      Maximum-likelihood approach: solve the following problem.
      $$\arg\max_{\bar\mu, \bar\nu} \log p(X \mid \bar\mu, \bar\nu) \quad \text{s.t.} \quad \sum_{k \leq K} \nu_k = 1.$$
      The objective above is the incomplete-data log likelihood. The complete-data log likelihood is
      $$\log p(X, \bar z \mid \bar\mu, \bar\nu).$$

  30. How to mine a probability model from $X$?
      Maximum-likelihood approach: solve the following problem.
      $$\arg\max_{\bar\mu, \bar\nu} \sum_{i \leq N} \log \sum_{z(i)} \nu_{z(i)} \prod_{j \leq D} \mu_{z(i),j}^{x_{i,j}} (1 - \mu_{z(i),j})^{1 - x_{i,j}} \quad \text{s.t.} \quad \sum_{k \leq K} \nu_k = 1.$$
      The objective above is the incomplete-data log likelihood. The complete-data log likelihood is
      $$\sum_{i \leq N} \Big( \log \nu_{z(i)} + \sum_{j \leq D} \big( x_{i,j} \log \mu_{z(i),j} + (1 - x_{i,j}) \log (1 - \mu_{z(i),j}) \big) \Big).$$
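
As a sanity check on the two formulas, here is a small sketch that evaluates both on a tiny dataset. The function names, the example numbers, and the 0-based category indices are assumptions made for illustration. The incomplete-data version sums over categories inside the logarithm, whereas the complete-data version needs an assignment `z` and splits into simple sums.

```python
import math

def bernoulli_row_prob(x_row, mu_k):
    # prod_j mu_{k,j}^{x_{i,j}} * (1 - mu_{k,j})^{1 - x_{i,j}}
    p = 1.0
    for x, m in zip(x_row, mu_k):
        p *= m if x == 1 else 1.0 - m
    return p

def incomplete_log_lik(X, nu, mu):
    # sum_i log sum_k nu_k * prod_j (...)
    return sum(
        math.log(sum(nu[k] * bernoulli_row_prob(row, mu[k])
                     for k in range(len(nu))))
        for row in X
    )

def complete_log_lik(X, z, nu, mu):
    # sum_i [ log nu_{z(i)} + sum_j ( x log mu + (1 - x) log(1 - mu) ) ]
    total = 0.0
    for row, k in zip(X, z):
        total += math.log(nu[k])
        for x, m in zip(row, mu[k]):
            total += x * math.log(m) + (1 - x) * math.log(1.0 - m)
    return total

# Hypothetical data: N = 3 people, D = 2 movies, K = 2 categories.
X = [[1, 0], [1, 1], [0, 1]]
nu = [0.5, 0.5]
mu = [[0.9, 0.2], [0.3, 0.8]]
print(incomplete_log_lik(X, nu, mu))
print(complete_log_lik(X, [0, 1, 1], nu, mu))
```

Summing `exp(complete_log_lik(...))` over all possible assignments `z` recovers `exp(incomplete_log_lik(...))`, which is exactly the marginalization that makes the incomplete-data objective hard to maximize directly.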

  31. The dilemma
      We are caught between a problem we want to solve but don’t know how to (maximizing the incomplete-data log likelihood) and a problem we know how to solve but don’t want to solve (maximizing the complete-data log likelihood). Let’s try to connect them.

  32. Connecting incomplete-data and complete-data log likelihoods
      Let $\theta = (\bar\mu, \bar\nu)$. How can we connect $\log p(X \mid \theta)$ and $\log p(X, \bar z \mid \theta)$? We can start with the following:
      $$p(\bar z \mid X, \theta) = \frac{p(X, \bar z \mid \theta)}{p(X \mid \theta)}.$$
      From here, we can derive that
      $$\log p(X \mid \theta) = \log p(X, \bar z \mid \theta) - \log p(\bar z \mid X, \theta).$$
      But we don’t know the value of $\bar z$.

  35. Take expectations on both sides with respect to $\bar z$, using some pdf $\tilde p(\bar z)$ for $\bar z$.
      $$\int \tilde p(\bar z) \log p(X \mid \theta) \, d\bar z = \int \tilde p(\bar z) \log p(X, \bar z \mid \theta) \, d\bar z - \int \tilde p(\bar z) \log p(\bar z \mid X, \theta) \, d\bar z.$$

  36. Since $\log p(X \mid \theta)$ does not depend on $\bar z$ and $\tilde p$ integrates to 1, we get
      $$\log p(X \mid \theta) = \int \tilde p(\bar z) \log p(X, \bar z \mid \theta) \, d\bar z - \int \tilde p(\bar z) \log p(\bar z \mid X, \theta) \, d\bar z.$$
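
The identity holds for any choice of $\tilde p$, which is easy to check numerically. The sketch below does so for a single person and $K = 2$ categories, where $\bar z$ is discrete and the integral becomes a sum over two values. The joint probabilities and the function name `decomposition_gap` are made up for illustration.

```python
import math

def decomposition_gap(joint, p_tilde):
    """Return log p(X) minus (E log p(X, z) - E log p(z | X)) under p_tilde.

    joint[k] stands for p(X, z = k | theta) for one fixed X and theta.
    """
    evidence = sum(joint)                      # p(X | theta)
    posterior = [p / evidence for p in joint]  # p(z | X, theta)
    e_joint = sum(q * math.log(p) for q, p in zip(p_tilde, joint))
    e_post = sum(q * math.log(p) for q, p in zip(p_tilde, posterior))
    return math.log(evidence) - (e_joint - e_post)

joint = [0.09, 0.03]  # hypothetical values of p(X, z = k | theta)
for p_tilde in ([0.5, 0.5], [0.9, 0.1], [0.2, 0.8]):
    # The gap is zero up to floating-point rounding for every p_tilde.
    print(p_tilde, decomposition_gap(joint, p_tilde))
```

Each of the two expectations changes with `p_tilde`, but their difference does not.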

  37. In other words,
      $$\log p(X \mid \theta) = \mathbb{E}_{\tilde p(\bar z)} \log p(X, \bar z \mid \theta) - \mathbb{E}_{\tilde p(\bar z)} \log p(\bar z \mid X, \theta).$$
      Does this look familiar?
      $$d(\theta) = \mathrm{skin}(\theta, \tilde p) - \mathrm{vessel}(\theta, \tilde p).$$
      Like a vegan flea, we want to find the value of $\theta$ that maximizes the distance between $\mathbb{E}_{\tilde p(\bar z)} \log p(X, \bar z \mid \theta)$ and $\mathbb{E}_{\tilde p(\bar z)} \log p(\bar z \mid X, \theta)$!
      It turns out that all assumptions hold! We can apply our optimization algorithm to approximately maximize $\log p(X \mid \theta)$ with respect to $\theta$.
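
For this model, the flea’s two moves correspond to the two steps of EM. Setting $\tilde p$ to the posterior $p(\bar z \mid X, \theta_0)$ at the current parameters $\theta_0$ plays the role of waiting for the vessel to peak (E-step), and maximizing $\mathbb{E}_{\tilde p} \log p(X, \bar z \mid \theta)$ over $\theta$ plays the role of jumping to the top of the skin (M-step). The transcript does not spell out the resulting update rules; the sketch below uses the standard closed-form updates for a mixture of Bernoullis. The function names, the toy data, the starting values, and the clipping constant `eps` are illustrative assumptions.

```python
import math

def joint_probs(row, nu, mu):
    # [nu_k * prod_j mu_kj^x * (1 - mu_kj)^(1 - x) for each category k]
    out = []
    for nu_k, mu_k in zip(nu, mu):
        p = nu_k
        for x, m in zip(row, mu_k):
            p *= m if x == 1 else 1.0 - m
        out.append(p)
    return out

def log_lik(X, nu, mu):
    # Incomplete-data log likelihood: sum_i log sum_k nu_k p(x_i | k).
    return sum(math.log(sum(joint_probs(row, nu, mu))) for row in X)

def em_step(X, nu, mu, eps=1e-6):
    K, D, N = len(nu), len(X[0]), len(X)
    # E-step: responsibilities r[i][k] = p(z(i) = k | x_i, theta).
    r = []
    for row in X:
        joint = joint_probs(row, nu, mu)
        total = sum(joint)
        r.append([p / total for p in joint])
    # M-step: maximize the expected complete-data log likelihood.
    weight = [sum(r[i][k] for i in range(N)) for k in range(K)]
    new_nu = [w / N for w in weight]
    new_mu = [
        [min(1 - eps, max(eps,  # keep probabilities away from 0 and 1
             sum(r[i][k] * X[i][j] for i in range(N)) / weight[k]))
         for j in range(D)]
        for k in range(K)
    ]
    return new_nu, new_mu

# Hypothetical ratings: N = 6 people, D = 3 movies, K = 2 categories.
X = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
nu, mu = [0.5, 0.5], [[0.6, 0.6, 0.4], [0.4, 0.4, 0.6]]
history = [log_lik(X, nu, mu)]
for _ in range(20):
    nu, mu = em_step(X, nu, mu)
    history.append(log_lik(X, nu, mu))
print(history[0], history[-1])
```

The list `history` records $\log p(X \mid \theta)$ after each step. As with the flea, each step can only keep it the same or increase it, which is why the result is a local rather than a guaranteed global maximum.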
