
Unbiased Offline Recommender Evaluation for Missing-Not-At-Random - PowerPoint PPT Presentation



  1. Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback
     Serge Belongie, Deborah Estrin, Longqi Yang, Yuan Xuan, Chenyang Wang, Yin Cui
     Funders:

  2. Offline Evaluation of Recommendation Algorithms
     Diagram labels: user-item interactions, shown as (user, item) pairs; recommendation algorithms (R); rewards for (user, item) pairs.

  3. Offline Evaluation of Recommendation Algorithms
     Diagram labels: user interaction history, shown as (user, item) pairs; recommendation algorithms (R); rewards for (user, item) pairs.
     Pros:
     • Cost effective.
     • Efficient.
     • Iterate faster.
     • Experiment before deployment.

  4. Offline Evaluation of Recommendation Algorithms
     Diagram labels: user interaction history, shown as (user, item) pairs; recommendation algorithms (R); rewards for (user, item) pairs.
     Pros:
     • Cost effective.
     • Efficient.
     • Iterate faster.
     • Experiment before deployment.
     Cons:
     • The data is Missing-Not-At-Random (MNAR).

  5. Offline Evaluation procedure
     Diagram with users and items as axes: a marked entry means that the user interacted with the item.

  6. Offline Evaluation procedure
     Train/test split of the observed interactions.

  7. Offline Evaluation procedure
     1. Train and validate a recommendation model.
     2. Average its performance over held-out (user, item) interaction pairs (Average-Over-All).

  8. Offline Evaluation procedure
     Two settings: rating-based recommendation systems and implicit feedback-based recommendation systems.
     1. Train and validate a recommendation model (policy).
     2. Average its performance over held-out (user, item) interaction pairs (Average-Over-All).

  9. Previous work: Average-Over-All is biased for rating-based recommendation systems, because ratings are MNAR [Marlin et al. 09], [Schnabel et al. 16], [Steck 10], [Steck 11], and [Steck 13].

  10. Previous work: Average-Over-All is biased for rating-based recommendation systems, because ratings are MNAR [Marlin et al. 09], [Schnabel et al. 16], [Steck 10], [Steck 11], and [Steck 13].
      Previous work: Average-Over-All is unbiased for implicit feedback-based recommendation systems, because implicit feedback is missing uniformly at random [Lim 15].

  11. This work: Average-Over-All is biased for implicit feedback-based recommendation systems, because implicit feedback is NOT missing uniformly at random.

  12. This work: Average-Over-All is biased for implicit feedback-based recommendation systems, because implicit feedback is NOT missing uniformly at random.
      Trending recommendation. Popularity bias (users are more likely to be exposed to popular items).

  13. A Hypothetical Example
                                                    Popular items    Long-tail items
      # of liked items (over all items)       1                10
      # of liked items (over observations)    10               1
      Algorithm 1 performance                 0.8              0
      Algorithm 2 performance                 0.75             0.75


  18. A Hypothetical Example (same table as slide 13), annotated "Any sensible evaluation".

  19. A Hypothetical Example (same table as slide 13), with "Average-Over-All" marked next to the "# of liked items (over observations)" row.

  20. Formalize
      c(·): reward; Ẑ: item rankings predicted by an algorithm.
      Ideal evaluation:
      $$R(\hat{Z}) = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|S_u|} \sum_{i \in S_u} c(\hat{Z}_{u,i})$$
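The ideal evaluation on slide 20 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code, and it rests on stated assumptions: S_u is taken to be every item user u likes (logged or not), Ẑ_{u,i} is the 1-based rank the algorithm assigns to item i for user u, and the reward c(z) = 1/z (reciprocal rank) is a hypothetical choice, since the slide does not fix c. For contrast, `average_over_all` pools the reward over held-out observed (user, item) pairs, as described on slide 7. The helper names `ideal_evaluation`, `average_over_all` and `reciprocal_rank` are illustrative only.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical layout: ranks[u][i] is the predicted rank Z_hat[u, i] (1 = top).
Ranks = Dict[str, Dict[str, int]]


def reciprocal_rank(z: int) -> float:
    """Hypothetical reward c(z) = 1/z; the slide leaves c unspecified."""
    return 1.0 / z


def ideal_evaluation(
    ranks: Ranks,
    relevant: Dict[str, Set[str]],
    c: Callable[[int], float] = reciprocal_rank,
) -> float:
    """R(Z_hat) = 1/|U| * sum over u of 1/|S_u| * sum over i in S_u of c(Z_hat[u, i]).

    relevant[u] plays the role of S_u (assumed: every item user u likes,
    whether or not the interaction was logged). Each S_u must be non-empty.
    """
    if not relevant:
        raise ValueError("U is empty")
    per_user = []
    for u, s_u in relevant.items():
        if not s_u:
            raise ValueError(f"S_u is empty for user {u!r}")
        # Inner average over the user's relevant items.
        per_user.append(sum(c(ranks[u][i]) for i in s_u) / len(s_u))
    # Outer average over users.
    return sum(per_user) / len(per_user)


def average_over_all(
    ranks: Ranks,
    held_out: List[Tuple[str, str]],
    c: Callable[[int], float] = reciprocal_rank,
) -> float:
    """Average-Over-All: mean reward over the held-out observed (user, item) pairs."""
    if not held_out:
        raise ValueError("no held-out pairs")
    return sum(c(ranks[u][i]) for u, i in held_out) / len(held_out)


# Toy data (hypothetical): u1 likes a popular item and a long-tail item,
# but only the interaction with the popular item was logged and held out.
ranks = {"u1": {"popular": 1, "long_tail": 4}}
print(ideal_evaluation(ranks, {"u1": {"popular", "long_tail"}}))  # 0.625
print(average_over_all(ranks, [("u1", "popular")]))  # 1.0
```

In this toy case Average-Over-All scores the model at 1.0, because the only logged interaction is the one it ranks first, while the ideal evaluation also counts the unlogged long-tail item at rank 4 and gives (1 + 1/4) / 2 = 0.625.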
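The hypothetical example on slides 13 to 19 can be checked with a few lines of arithmetic. This is a sketch under one reading of the table, not a calculation taken from the slides: each algorithm's score is the average of its per-group performance, weighted by the relative number of liked items in each group. The observation-based weights stand in for Average-Over-All, and the all-items weights stand in for an evaluation over every liked item. The two ratios also imply that a liked popular item is (10/1) / (1/10) = 100 times more likely to be observed than a liked long-tail item. The names `weighted_score` and `relative_exposure` are illustrative.

```python
# Per-group performance from the hypothetical example: (popular, long-tail).
performance = {
    "Algorithm 1": (0.8, 0.0),
    "Algorithm 2": (0.75, 0.75),
}

# Relative number of liked items in each group: (popular, long-tail).
liked_over_all_items = (1, 10)
liked_over_observations = (10, 1)


def weighted_score(perf, weights):
    """Average per-group performance, weighted by each group's share of liked items."""
    return sum(p * w for p, w in zip(perf, weights)) / sum(weights)


# How much more often a liked popular item is observed than a liked long-tail item.
relative_exposure = (liked_over_observations[0] * liked_over_all_items[1]) / (
    liked_over_observations[1] * liked_over_all_items[0]
)

for name, perf in performance.items():
    observed = weighted_score(perf, liked_over_observations)
    ideal = weighted_score(perf, liked_over_all_items)
    print(f"{name}: over observations {observed:.3f}, over all liked items {ideal:.3f}")
print(f"relative exposure of liked popular vs long-tail items: {relative_exposure:.0f}x")
```

With these numbers both weightings rank Algorithm 2 first. However, the observation-weighted gap is only about 0.02 (0.727 vs 0.750), while the all-items gap is about 0.68 (0.073 vs 0.750), so the logged data hides most of Algorithm 1's failure on long-tail items.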
