Unbiased Offline Recommender Evaluation for Missing-Not-At-Random - PowerPoint PPT Presentation

Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback
Serge Belongie, Deborah Estrin, Longqi Yang, Yuan Xuan, Chenyang Wang, Yin Cui

Offline Evaluation of Recommendation Algorithm
[Diagram: user interaction history, a set of (user, item) pairs → recommendation algorithms → rewards R]
Pros:
• Cost effective.
• Efficient.
• Iterate faster.
• Experiment before deployment.
Cons:
• The data is Missing-Not-At-Random (MNAR).

Offline Evaluation Procedure
[Figure: user-item interaction matrix; a marked cell (u, i) means user u interacted with item i. The interactions are split into train and test sets.]
1. Train and validate a recommendation model (policy).
2. Average the performance over held-out (user, item) interaction pairs (Average-Over-All).
Applies to: rating-based recommendation systems; implicit feedback-based recommendation systems.
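Step 2 of the procedure can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `aoa_evaluate`, the `predicted_rank` lookup, and the reciprocal-rank score are all assumptions chosen for the example.

```python
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (user, item)


def aoa_evaluate(
    held_out: Iterable[Pair],
    predicted_rank: Dict[Pair, int],
    score: Callable[[int], float],
) -> float:
    """Average-Over-All: mean score over every held-out (user, item) pair.

    Each held-out pair counts equally, so items that are observed more
    often contribute more terms to the average.
    """
    pairs = list(held_out)
    if not pairs:
        raise ValueError("need at least one held-out pair")
    return sum(score(predicted_rank[p]) for p in pairs) / len(pairs)


# Hypothetical toy data: rank that a model assigns to each held-out item.
toy_ranks = {("u1", "a"): 1, ("u1", "b"): 4, ("u2", "a"): 2}
aoa_value = aoa_evaluate(toy_ranks.keys(), toy_ranks, lambda rank: 1.0 / rank)
print(aoa_value)
```

Because the average runs over observed pairs only, whatever process decided which pairs were observed is baked into the result.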

Previous work: Average-Over-All is biased for rating-based recommendation systems, because ratings are MNAR [Marlin et al. 09], [Schnabel et al. 16], [Steck 10], [Steck 11], and [Steck 13].
Previous work: Average-Over-All is unbiased for implicit feedback-based recommendation systems, because implicit feedback is missing uniformly at random [Lim 15].

This work: Average-Over-All is biased for implicit feedback-based recommendation systems, because implicit feedback is NOT missing uniformly at random.
[Figure: trending recommendation]
Popularity bias: users are more likely to be exposed to popular items.

A Hypothetical Example

                                        Popular Items : Long-tail Items
# of liked items (over all items)             1       :       10
# of liked items (over observations)         10       :        1
Algorithm 1 performance                      0.8              0
Algorithm 2 performance                      0.75             0.75

Annotations added in later builds of the slide: "Any sensible evaluation" and "Average-Over-All".
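One way to read the numbers in this example is as a weighted average. The sketch below assumes that each evaluator combines the per-group performance with weights proportional to the number of liked items in each group. That weighting is an illustrative assumption, not something stated on the slide, and the helper name `weighted_performance` is hypothetical.

```python
def weighted_performance(perf, weights):
    """Weighted average of per-group performance values."""
    total = sum(weights)
    return sum(p * w for p, w in zip(perf, weights)) / total


# (popular, long-tail) values taken from the example.
all_items_ratio = (1, 10)   # liked items over all items
observed_ratio = (10, 1)    # liked items over observations
alg1 = (0.8, 0.0)
alg2 = (0.75, 0.75)

# Weighting by all liked items (assumed stand-in for an ideal evaluation).
ideal_alg1 = weighted_performance(alg1, all_items_ratio)
ideal_alg2 = weighted_performance(alg2, all_items_ratio)

# Weighting by observed liked items (assumed stand-in for Average-Over-All).
aoa_alg1 = weighted_performance(alg1, observed_ratio)
aoa_alg2 = weighted_performance(alg2, observed_ratio)

print(f"all items:    alg1={ideal_alg1:.3f} alg2={ideal_alg2:.3f}")
print(f"observations: alg1={aoa_alg1:.3f} alg2={aoa_alg2:.3f}")
```

Under these assumptions, weighting by all liked items gives Algorithm 1 about 0.07 against 0.75 for Algorithm 2, while weighting by observations gives about 0.73 against 0.75, so the large gap between the two algorithms nearly disappears.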

Formalize Reward
\hat{Z}: item rankings predicted by an algorithm
Ideal evaluation:
\[
R(\hat{Z}) = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|S_u|} \sum_{i \in S_u} c(\hat{Z}_{u,i})
\]
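The ideal evaluation can be sketched in a few lines. This is an illustration under stated assumptions: S_u is taken to be the set of items user u likes among all items (the extracted slide text does not define it), \hat{Z}_{u,i} is taken to be the predicted rank of item i for user u, and the DCG-style choice of c is only one possible scoring function. The names `ideal_reward` and `dcg_score` are hypothetical.

```python
import math
from typing import Callable, Dict, Set, Tuple


def ideal_reward(
    liked: Dict[str, Set[str]],            # assumed S_u: items each user likes
    rank: Dict[Tuple[str, str], int],      # assumed Z-hat: predicted rank of (u, i)
    c: Callable[[int], float],             # per-item scoring function
) -> float:
    """R = (1/|U|) * sum_u (1/|S_u|) * sum_{i in S_u} c(rank[u, i])."""
    if not liked:
        raise ValueError("need at least one user")
    total = 0.0
    for user, items in liked.items():
        if not items:
            raise ValueError(f"S_u is empty for user {user!r}")
        total += sum(c(rank[(user, item)]) for item in items) / len(items)
    return total / len(liked)


def dcg_score(position: int) -> float:
    """DCG-style gain for a 1-based rank position (an assumed choice of c)."""
    return 1.0 / math.log2(1 + position)


# Hypothetical toy data.
toy_liked = {"u1": {"a", "b"}, "u2": {"c"}}
toy_rank = {("u1", "a"): 1, ("u1", "b"): 3, ("u2", "c"): 1}
reward = ideal_reward(toy_liked, toy_rank, dcg_score)
print(reward)
```

The inner sum ranges over every liked item, observed or not, which is why this quantity cannot be computed directly from logged interactions.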
