Vegan fleas, movie ratings, and the EM algorithm
Carlos Cotrini
Department of Computer Science, ETH Zürich
ccarlos@inf.ethz.ch
March 25, 2019
Overview
1. The vegan-flea optimization problem
2. Building a movie recommendation system
3. The EM algorithm
The vegan-flea optimization problem

A two-dimensional dog

The dog’s cardiovascular system

The flea, the dog’s skin, and the vessel’s upper border

Animation

Formalization
Assumptions

We assume the following:
1. For any x ∈ [0, 1] and any two time points t₁, t₂ ∈ [0, ∞), skin(x, t₁) − vessel(x, t₁) = skin(x, t₂) − vessel(x, t₂).
2. For any x ∈ [0, 1] and any t ∈ [0, ∞), there is t′ ≥ t such that vessel(x, t′) is a maximum of vessel(·, t′).
3. For any t ∈ [0, ∞), the flea can efficiently compute a point x∗ that maximizes skin(·, t).
4. For any x ∈ [0, 1] and any t ∈ [0, ∞), the flea can efficiently compute t̂ ≥ t such that vessel(x, t̂) is a maximum of vessel(·, t̂).
Objective

Can the flea compute x∗ such that d(x∗) ≥ d(x₀), where x₀ is the flea’s current position?
Optimization algorithm

Why does this work?
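The figures for the algorithm are not reproduced here, but the assumptions above pin down the two alternating steps: wait until the vessel’s peak passes directly under the current position, then jump to the point where the skin is highest. The `skin`, `vessel`, and `d` functions below are invented purely for illustration; they satisfy the assumptions (the gap d depends only on x, and the vessel’s peak sweeps every position over time):

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 1001)           # candidate positions on the dog

def d(x):
    # skin-to-vessel distance; depends only on x (assumption 1)
    return -(x - 0.7) ** 2

def vessel(x, t):
    # vessel's upper border; its peak sits at frac(t) and wanders over
    # time, so for any x there is a later time when it peaks exactly at x
    return -(x - (t % 1.0)) ** 2

def skin(x, t):
    # skin = distance + vessel, so skin - vessel = d(x) at every time t
    return d(x) + vessel(x, t)

x, t = 0.1, 0.0                            # flea's starting position and time
for _ in range(10):
    t_hat = np.ceil(t) + x                 # wait until the vessel peaks at x
    x = xs[np.argmax(skin(xs, t_hat))]     # jump to the skin's maximum
    t = t_hat
print(x)                                   # approaches 0.7, the argmax of d
```

The jump can only increase d: at time t̂ the vessel under the flea is as high as anywhere, so moving to the skin’s maximum raises skin without raising vessel above its previous value.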
A movie recommendation system

A simple dataset of movie ratings

A probability model for movie ratings
Notation

- X = (x_{i,j})_{i≤N, j≤D}. Here, x_{i,j} ∈ {0, 1} indicates whether person i liked movie j or not.
- μ̄ = (μ_{k,j})_{k≤K, j≤D}. Here, μ_{k,j} ∈ [0, 1] denotes the probability that someone in category k likes movie j.
- ν̄ = (ν_k)_{k≤K}. Here, ν_k ∈ [0, 1] denotes the probability that a person belongs to category k.
- z̄ = (z(i))_{i≤N}. Here, z(i) ∈ {1, …, K} indicates person i’s category.
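With made-up sizes N = 4, D = 3, K = 2, this notation translates into arrays as follows (the ratings and categories are invented, and the code indexes categories from 0):

```python
import numpy as np

N, D, K = 4, 3, 2                  # people, movies, categories (made up)
X = np.array([[1, 1, 0],           # X[i, j] = 1 iff person i liked movie j
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
mu = np.full((K, D), 0.5)          # mu[k, j] = P(category k likes movie j)
nu = np.full(K, 1.0 / K)           # nu[k] = P(a person is in category k)
z = np.array([0, 0, 1, 1])         # z[i] = person i's (unobserved) category
```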
How to mine a probability model from X?

Maximum-likelihood approach: solve the following problem.

argmax_{μ̄, ν̄} log p(X | μ̄, ν̄),   s.t. Σ_{k≤K} ν_k = 1.

Incomplete-data log likelihood: log p(X | μ̄, ν̄).
Complete-data log likelihood: log p(X, z̄ | μ̄, ν̄).
How to mine a probability model from X?

Maximum-likelihood approach: solve the following problem.

Incomplete-data log likelihood:

argmax_{μ̄, ν̄} Σ_{i≤N} log Σ_{z(i)} ν_{z(i)} Π_{j≤D} μ_{z(i),j}^{x_{i,j}} (1 − μ_{z(i),j})^{1 − x_{i,j}},   s.t. Σ_{k≤K} ν_k = 1.

Complete-data log likelihood:

Σ_{i≤N} [ log ν_{z(i)} + Σ_{j≤D} ( x_{i,j} log μ_{z(i),j} + (1 − x_{i,j}) log(1 − μ_{z(i),j}) ) ].
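Both objectives can be written down directly. The sketch below is mine (the helper names are invented, categories are 0-indexed, and array shapes follow the notation slide):

```python
import numpy as np

def incomplete_ll(X, mu, nu):
    # log p(X | mu, nu) =
    #   sum_i log sum_k nu_k prod_j mu_kj^x_ij (1 - mu_kj)^(1 - x_ij)
    lik = np.prod(mu[None, :, :] ** X[:, None, :]
                  * (1 - mu[None, :, :]) ** (1 - X[:, None, :]), axis=2)
    return float(np.sum(np.log(lik @ nu)))   # lik has shape (N, K)

def complete_ll(X, z, mu, nu):
    # log p(X, z | mu, nu): no inner sum over categories, so it splits
    # into simple per-person, per-movie terms
    m = mu[z]                                # m[i, j] = mu[z(i), j]
    return float(np.sum(np.log(nu[z]))
                 + np.sum(X * np.log(m) + (1 - X) * np.log(1 - m)))
```

As a sanity check, summing exp(complete_ll) over every possible assignment z̄ recovers exp(incomplete_ll), i.e., marginalizing out z̄ gives back p(X | μ̄, ν̄).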
The dilemma

We stand between a problem we want to solve but don’t know how (maximizing the incomplete-data log likelihood) and a problem we know how to solve but isn’t the one we want (maximizing the complete-data log likelihood, which needs the unobserved z̄). Let’s try to connect them.
Connecting incomplete-data and complete-data log likelihoods

Let θ = (μ̄, ν̄). How can we connect log p(X | θ) and log p(X, z̄ | θ)? We can start with the following:

p(z̄ | X, θ) = p(X, z̄ | θ) / p(X | θ).

From here, we can derive that:

log p(X | θ) = log p(X, z̄ | θ) − log p(z̄ | X, θ).

But we don’t know the value of z̄.
Take expectations on both sides with respect to z̄, using some pdf p̃(z̄) for z̄:

∫ p̃(z̄) log p(X | θ) dz̄ = ∫ p̃(z̄) log p(X, z̄ | θ) dz̄ − ∫ p̃(z̄) log p(z̄ | X, θ) dz̄.
Since log p(X | θ) does not depend on z̄, we get

log p(X | θ) = ∫ p̃(z̄) log p(X, z̄ | θ) dz̄ − ∫ p̃(z̄) log p(z̄ | X, θ) dz̄.
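Note that this identity holds for any choice of p̃, because the integrand log p(X, z̄ | θ) − log p(z̄ | X, θ) equals log p(X | θ), which is constant in z̄. A quick numeric check on a made-up three-valued z (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)
p_joint = np.array([0.10, 0.25, 0.05])   # p(X, z | theta) for z = 0, 1, 2
p_x = p_joint.sum()                      # p(X | theta), marginalizing out z
p_post = p_joint / p_x                   # p(z | X, theta), by Bayes' rule

q = rng.dirichlet(np.ones(3))            # an arbitrary pdf p~ over z
lhs = np.log(p_x)
rhs = q @ np.log(p_joint) - q @ np.log(p_post)
assert np.isclose(lhs, rhs)              # holds no matter how q was drawn
```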
In other words,

log p(X | θ) = E_{p̃(z̄)} log p(X, z̄ | θ) − E_{p̃(z̄)} log p(z̄ | X, θ).

Does this look familiar?

d(θ) = skin(θ, p̃) − vessel(θ, p̃).

Like a vegan flea, we want to find the value of θ that maximizes the distance between E_{p̃(z̄)} log p(X, z̄ | θ) and E_{p̃(z̄)} log p(z̄ | X, θ)! It turns out that all assumptions hold! We can apply our optimization algorithm to approximately maximize log p(X | θ) with respect to θ.
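Instantiating the flea’s algorithm for the movie-ratings model gives EM for a mixture of Bernoullis. The sketch below is my own implementation, not code from the slides: the E-step computes the responsibilities p(z(i) = k | X, θ) (the vessel touching the skin), and the M-step re-estimates μ̄ and ν̄ (the jump to the skin’s maximum):

```python
import numpy as np

def em_bernoulli_mixture(X, K, iters=50, seed=0):
    """EM for the movie-ratings model (a mixture of Bernoullis)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = rng.uniform(0.25, 0.75, size=(K, D))  # random init breaks symmetry
    nu = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities r[i, k] = p(z(i) = k | X, mu, nu)
        log_r = (np.log(nu) + X @ np.log(mu.T)
                 + (1 - X) @ np.log(1 - mu.T))
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximize the expected complete-data log likelihood
        nk = r.sum(axis=0)
        mu = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
        nu = nk / N
    return mu, nu

# made-up data with two obvious taste groups:
# fans of the first two movies vs fans of the last two
X = np.array([[1, 1, 0, 0]] * 10 + [[0, 0, 1, 1]] * 10)
mu, nu = em_bernoulli_mixture(X, K=2)
```

On this clean data the recovered rows of μ̄ approach the two rating patterns and ν̄ approaches (1/2, 1/2); each EM iteration cannot decrease log p(X | θ), just as each of the flea’s jumps cannot decrease its distance to the vessel.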