Vegan fleas, movie ratings, and the EM algorithm
Carlos Cotrini
Department of Computer Science, ETH Zürich
ccarlos@inf.ethz.ch
March 25, 2019
Overview
1. The vegan-flea optimization problem
2. Building a movie recommendation system
3. The EM algorithm
The vegan-flea optimization problem

A two-dimensional dog

The dog’s cardiovascular system

The flea, the dog’s skin, and the vessel’s upper border

Animation

Formalization
Assumptions

We assume the following:
1. For any x ∈ [0, 1] and any two time points t₁, t₂ ∈ [0, ∞), skin(x, t₁) − vessel(x, t₁) = skin(x, t₂) − vessel(x, t₂).
2. For any x ∈ [0, 1] and any t ∈ [0, ∞), there is t′ ≥ t such that vessel(x, t′) is a maximum of vessel(·, t′).
3. For any t ∈ [0, ∞), the flea can efficiently compute a point x∗ that maximizes skin(·, t).
4. For any x ∈ [0, 1] and any t ∈ [0, ∞), the flea can efficiently compute t̂ ≥ t such that vessel(x, t̂) is a maximum of vessel(·, t̂).
Objective

Can the flea compute x∗ such that d(x∗) ≥ d(x₀), where x₀ is the flea’s current position?
Optimization algorithm

Why does this work?
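The figures for the algorithm are not reproduced here, but the assumptions above pin down the two alternating steps: wait until the vessel’s peak passes directly under the current position, then jump to the point where the skin is highest. The `skin`, `vessel`, and `d` functions below are invented purely for illustration; they satisfy the assumptions (the gap d depends only on x, and the vessel’s peak sweeps every position over time):

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 1001)           # candidate positions on the dog

def d(x):
    # skin-to-vessel distance; depends only on x (assumption 1)
    return -(x - 0.7) ** 2

def vessel(x, t):
    # vessel's upper border; its peak sits at frac(t) and wanders over
    # time, so for any x there is a later time when it peaks exactly at x
    return -(x - (t % 1.0)) ** 2

def skin(x, t):
    # skin = distance + vessel, so skin - vessel = d(x) at every time t
    return d(x) + vessel(x, t)

x, t = 0.1, 0.0                            # flea's starting position and time
for _ in range(10):
    t_hat = np.ceil(t) + x                 # wait until the vessel peaks at x
    x = xs[np.argmax(skin(xs, t_hat))]     # jump to the skin's maximum
    t = t_hat
print(x)                                   # approaches 0.7, the argmax of d
```

The jump can only increase d: at time t̂ the vessel under the flea is as high as anywhere, so moving to the skin’s maximum raises skin without raising vessel above its previous value.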
A movie recommendation system

A simple dataset of movie ratings

A probability model for movie ratings
Notation

- X = (x_{i,j})_{i≤N, j≤D}. Here, x_{i,j} ∈ {0, 1} indicates whether person i liked movie j or not.
- μ̄ = (μ_{k,j})_{k≤K, j≤D}. Here, μ_{k,j} ∈ [0, 1] denotes the probability that someone in category k likes movie j.
- ν̄ = (ν_k)_{k≤K}. Here, ν_k ∈ [0, 1] denotes the probability that a person belongs to category k.
- z̄ = (z(i))_{i≤N}. Here, z(i) ∈ {1, …, K} indicates person i’s category.
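With made-up sizes N = 4, D = 3, K = 2, this notation translates into arrays as follows (the ratings and categories are invented, and the code indexes categories from 0):

```python
import numpy as np

N, D, K = 4, 3, 2                  # people, movies, categories (made up)
X = np.array([[1, 1, 0],           # X[i, j] = 1 iff person i liked movie j
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
mu = np.full((K, D), 0.5)          # mu[k, j] = P(category k likes movie j)
nu = np.full(K, 1.0 / K)           # nu[k] = P(a person is in category k)
z = np.array([0, 0, 1, 1])         # z[i] = person i's (unobserved) category
```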
How to mine a probability model from X?

Maximum-likelihood approach: solve the following problem.

argmax_{μ̄, ν̄} log p(X | μ̄, ν̄),   s.t. Σ_{k≤K} ν_k = 1.

Incomplete-data log likelihood: log p(X | μ̄, ν̄).
Complete-data log likelihood: log p(X, z̄ | μ̄, ν̄).
How to mine a probability model from X?

Maximum-likelihood approach: solve the following problem.

Incomplete-data log likelihood:

argmax_{μ̄, ν̄} Σ_{i≤N} log Σ_{z(i)} ν_{z(i)} Π_{j≤D} μ_{z(i),j}^{x_{i,j}} (1 − μ_{z(i),j})^{1 − x_{i,j}},   s.t. Σ_{k≤K} ν_k = 1.

Complete-data log likelihood:

Σ_{i≤N} [ log ν_{z(i)} + Σ_{j≤D} ( x_{i,j} log μ_{z(i),j} + (1 − x_{i,j}) log(1 − μ_{z(i),j}) ) ].
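Both objectives can be written down directly. The sketch below is mine (the helper names are invented, categories are 0-indexed, and array shapes follow the notation slide):

```python
import numpy as np

def incomplete_ll(X, mu, nu):
    # log p(X | mu, nu) =
    #   sum_i log sum_k nu_k prod_j mu_kj^x_ij (1 - mu_kj)^(1 - x_ij)
    lik = np.prod(mu[None, :, :] ** X[:, None, :]
                  * (1 - mu[None, :, :]) ** (1 - X[:, None, :]), axis=2)
    return float(np.sum(np.log(lik @ nu)))   # lik has shape (N, K)

def complete_ll(X, z, mu, nu):
    # log p(X, z | mu, nu): no inner sum over categories, so it splits
    # into simple per-person, per-movie terms
    m = mu[z]                                # m[i, j] = mu[z(i), j]
    return float(np.sum(np.log(nu[z]))
                 + np.sum(X * np.log(m) + (1 - X) * np.log(1 - m)))
```

As a sanity check, summing exp(complete_ll) over every possible assignment z̄ recovers exp(incomplete_ll), i.e., marginalizing out z̄ gives back p(X | μ̄, ν̄).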
The dilemma

We stand between a problem we want to solve but don’t know how (maximizing the incomplete-data log likelihood) and a problem we know how to solve but isn’t the one we want (maximizing the complete-data log likelihood, which needs the unobserved z̄). Let’s try to connect them.
Connecting incomplete-data and complete-data log likelihoods

Let θ = (μ̄, ν̄). How can we connect log p(X | θ) and log p(X, z̄ | θ)? We can start with the following:

p(z̄ | X, θ) = p(X, z̄ | θ) / p(X | θ).

From here, we can derive that:

log p(X | θ) = log p(X, z̄ | θ) − log p(z̄ | X, θ).

But we don’t know the value of z̄.
Take expectations on both sides with respect to z̄, using some pdf p̃(z̄) for z̄:

∫ p̃(z̄) log p(X | θ) dz̄ = ∫ p̃(z̄) log p(X, z̄ | θ) dz̄ − ∫ p̃(z̄) log p(z̄ | X, θ) dz̄.
Since log p(X | θ) does not depend on z̄, we get

log p(X | θ) = ∫ p̃(z̄) log p(X, z̄ | θ) dz̄ − ∫ p̃(z̄) log p(z̄ | X, θ) dz̄.
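Note that this identity holds for any choice of p̃, because the integrand log p(X, z̄ | θ) − log p(z̄ | X, θ) equals log p(X | θ), which is constant in z̄. A quick numeric check on a made-up three-valued z (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)
p_joint = np.array([0.10, 0.25, 0.05])   # p(X, z | theta) for z = 0, 1, 2
p_x = p_joint.sum()                      # p(X | theta), marginalizing out z
p_post = p_joint / p_x                   # p(z | X, theta), by Bayes' rule

q = rng.dirichlet(np.ones(3))            # an arbitrary pdf p~ over z
lhs = np.log(p_x)
rhs = q @ np.log(p_joint) - q @ np.log(p_post)
assert np.isclose(lhs, rhs)              # holds no matter how q was drawn
```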
In other words,

log p(X | θ) = E_{p̃(z̄)} log p(X, z̄ | θ) − E_{p̃(z̄)} log p(z̄ | X, θ).

Does this look familiar?

d(θ) = skin(θ, p̃) − vessel(θ, p̃).

Like a vegan flea, we want to find the value of θ that maximizes the distance between E_{p̃(z̄)} log p(X, z̄ | θ) and E_{p̃(z̄)} log p(z̄ | X, θ)! It turns out that all assumptions hold! We can apply our optimization algorithm to approximately maximize log p(X | θ) with respect to θ.
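Instantiating the flea’s algorithm for the movie-ratings model gives EM for a mixture of Bernoullis. The sketch below is my own implementation, not code from the slides: the E-step computes the responsibilities p(z(i) = k | X, θ) (the vessel touching the skin), and the M-step re-estimates μ̄ and ν̄ (the jump to the skin’s maximum):

```python
import numpy as np

def em_bernoulli_mixture(X, K, iters=50, seed=0):
    """EM for the movie-ratings model (a mixture of Bernoullis)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = rng.uniform(0.25, 0.75, size=(K, D))  # random init breaks symmetry
    nu = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities r[i, k] = p(z(i) = k | X, mu, nu)
        log_r = (np.log(nu) + X @ np.log(mu.T)
                 + (1 - X) @ np.log(1 - mu.T))
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximize the expected complete-data log likelihood
        nk = r.sum(axis=0)
        mu = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
        nu = nk / N
    return mu, nu

# made-up data with two obvious taste groups:
# fans of the first two movies vs fans of the last two
X = np.array([[1, 1, 0, 0]] * 10 + [[0, 0, 1, 1]] * 10)
mu, nu = em_bernoulli_mixture(X, K=2)
```

On this clean data the recovered rows of μ̄ approach the two rating patterns and ν̄ approaches (1/2, 1/2); each EM iteration cannot decrease log p(X | θ), just as each of the flea’s jumps cannot decrease its distance to the vessel.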