Get on with it! Recommender system industry challenges move towards real-world, online evaluation


  1. Get on with it! Recommender system industry challenges move towards real-world, online evaluation Padova – March 23rd, 2016 Andreas Lommatzsch - TU Berlin, Berlin, Germany Jonas Seiler - plista, Berlin, Germany Daniel Kohlsdorf - XING, Hamburg, Germany CrowdRec - www.crowdrec.eu

  2. Andreas Lommatzsch • Andreas.Lommatzsch@tu-berlin.de • http://www.dai-lab.de

  3. Jonas Seiler • Jonas.Seiler@plista.com • http://www.plista.com

  4. Daniel Kohlsdorf • Daniel.Kohlsdorf@xing.com • http://www.xing.com

  5. Moving towards real-world evaluation Where are recommender system challenges headed? Direction 1: Use info beyond the user-item matrix. Direction 2: Online evaluation + multiple metrics. Flickr credit: rodneycampbell

  6. Why evaluate? <Images showing “our” use cases> Evaluation is crucial for the success of real-life systems. How should we evaluate? ● Influence on sales ● Precision and Recall ● Required hardware resources ● Technical complexity ● Business models ● User satisfaction ● Scalability ● Diversity of the presented results

  7. Traditional Evaluation in IR (“The Cranfield paradigm”) Evaluation Settings • A static collection of documents • A set of queries • A list of relevant documents defined by experts for each query Advantages • Reproducible setting • All researchers have exactly the same information • Optimized for measuring precision

  8. Traditional Evaluation in IR Weaknesses of traditional IR evaluation • High costs for creating datasets • Datasets are not up-to-date • Domain-specific documents • The expert-defined ground truth does not consider individual user preferences • Context-awareness is not considered (“context is everything”) • Technical aspects are ignored

  9. Industry and recsys challenges Challenges benefit both industry and academic research. We look at how industry challenges have evolved since the Netflix Prize in 2009.

  10. Traditional Evaluation in RecSys (“The Netflix paradigm”) Evaluation Settings • Rating prediction on user-item matrices • Large, sparse datasets • Predict personalized ratings • Cross-validation, RMSE Advantages • Reproducible setting • Personalization • Dataset is based on real user ratings
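Since the slide names cross-validation and RMSE as the standard offline protocol, here is a minimal sketch of how RMSE is computed on a held-out fold of predicted ratings; the sample values are illustrative only.

```python
import numpy as np

def rmse(true_ratings, predicted_ratings):
    """Root Mean Squared Error over a held-out set of ratings."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    predicted_ratings = np.asarray(predicted_ratings, dtype=float)
    return np.sqrt(np.mean((true_ratings - predicted_ratings) ** 2))

# Example: a tiny held-out fold from a user-item rating matrix
y_true = [4.0, 3.0, 5.0, 2.0]
y_pred = [3.8, 3.4, 4.5, 2.5]
print(rmse(y_true, y_pred))  # ~0.42
```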

  11. Traditional Evaluation in RecSys Weaknesses of traditional recommender evaluation • Static data • Only one type of data (user ratings) • User ratings are noisy • Temporal aspects tend to be ignored • Context-awareness is not considered • Technical aspects are ignored

  12. Challenges of Developing Applications Challenges • Data streams - continuous changes • Big data • Combine knowledge from different sources • Context-awareness • Users expect personally relevant results • Heterogeneous devices • Technical complexity, real-time requirements

  13. How to Set Up a Better Evaluation? How to address these challenges in the evaluation? • Realistic evaluation setting – Heterogeneous data sources – Streams – Dynamic user feedback • Appropriate metrics – Precision and user satisfaction – Technical complexity – Sales and business models • Online and offline evaluation

  14. Approaches for a better Evaluation • News recommendations @ plista • Job recommendations @ XING

  15. The plista Recommendation Scenario Setting ● 250 ms response time ● 350 million AI/day ● Active in 10 countries Challenges ● News change continuously ● Users do not log in explicitly ● Seasonality, context-dependent user preferences

  16. Evaluation @ plista Offline • Cross-validation – Limited by memory, computational resources, and time complexity • Caching • Integration into Spark • How well does it correlate with online evaluation? Online • A/B tests – Metric Optimization Engine (MOE, https://github.com/Yelp/MOE)

  17. Evaluation using MOE Offline • Mean and variance estimation of the parameter space with a Gaussian Process • Evaluate the parameter with the highest Expected Improvement (EI), Upper Confidence Bound, … • REST API
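MOE fits a Gaussian Process to the parameter settings evaluated so far and proposes the point with the highest Expected Improvement. A minimal sketch of that idea, using scikit-learn's GaussianProcessRegressor rather than MOE's own REST API; the objective values and the parameter range are made up for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best_so_far):
    """EI for maximization: E[max(f - best, 0)] under a Gaussian posterior."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

# Observed evaluations of one tuning parameter (e.g. a recency decay rate)
X = np.array([[0.1], [0.4], [0.7]])
y = np.array([0.52, 0.61, 0.55])          # illustrative offline metric values

# Fit a GP to the (parameter, metric) pairs
gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

# Score candidate parameters by EI and pick the next one to evaluate
candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
mu, sigma = gp.predict(candidates, return_std=True)
ei = expected_improvement(mu, sigma, y.max())
next_parameter = candidates[np.argmax(ei)][0]
print(f"evaluate next: {next_parameter:.2f}")
```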

  18. Evaluation using MOE Online • A/B Tests are expensive • Model non-stationarity • Integrate out non-stationarity to get mean EI

  19. The CLEF-NewsREEL challenge Provides an API enabling researchers to test their own ideas • The CLEF-NewsREEL challenge • A challenge in CLEF (Conference and Labs of the Evaluation Forum) • 2 tasks: online and offline evaluation

  20. CLEF-NewsREEL Online Task How does the challenge work? • Live streams consisting of impressions, requests, and clicks; 5 publishers; approx. 6 million messages per day • Technical requirement: 100 ms per request • Live evaluation based on CTR (click-through rate)
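To make the online task concrete, here is a minimal sketch of a simple baseline that answers recommendation requests from a buffer of recently read articles, together with the CTR metric used for the live evaluation. The handler and field names are assumptions for illustration, not plista's actual message format.

```python
from collections import deque

# Ring buffer of recently read article ids (most recent at the right end)
recent_items = deque(maxlen=100)

def on_impression(item_id):
    """A user read an article: remember it as a recommendation candidate."""
    if item_id in recent_items:
        recent_items.remove(item_id)
    recent_items.append(item_id)

def on_request(current_item_id, limit=6):
    """Answer a recommendation request; must respond within ~100 ms."""
    return [i for i in reversed(recent_items) if i != current_item_id][:limit]

def ctr(clicks, served_recommendations):
    """Live evaluation metric: clicks per served recommendation."""
    return clicks / served_recommendations if served_recommendations else 0.0
```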

  21. CLEF-NewsREEL Offline Task Online vs. Offline Evaluation • Technical aspects can be evaluated without user feedback • Analyze the required resources and the response time • Simulate the online evaluation by replaying a recorded stream
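The replay idea can be sketched as follows: feed the recorded messages to the recommender in their original order and check the response time of every answered request. This only illustrates the principle; the log format and the recommender interface (update/recommend) are assumptions, not the actual NewsREEL or Idomaar formats.

```python
import json
import time

def replay(log_path, recommender, time_budget_ms=100):
    """Replay a recorded stream and count requests exceeding the time budget."""
    violations, total = 0, 0
    with open(log_path) as log:
        for line in log:
            message = json.loads(line)          # assumed: one JSON message per line
            if message["type"] != "request":
                recommender.update(message)     # impressions, clicks, item updates
                continue
            total += 1
            start = time.perf_counter()
            recommender.recommend(message["payload"])
            elapsed_ms = (time.perf_counter() - start) * 1000
            if elapsed_ms > time_budget_ms:
                violations += 1
    print(f"{violations}/{total} requests exceeded {time_budget_ms} ms")
```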

  22. CLEF-NewsREEL Offline Task Challenge • Realistic simulation of streams • Reproducible setup of computing environments Solution • A framework simplifying the setup of the evaluation environment • The Idomaar framework developed in the CrowdRec project http://rf.crowdrec.eu

  23. CLEF-NewsREEL More Information • SIGIR Forum, Dec 2015 (Vol. 49, No. 2): http://sigir.org/files/forum/2015D/p129.pdf • Evaluate your algorithm online and offline in NewsREEL • Register for the challenge! http://crowdrec.eu/2015/11/clef-newsreel-2016/ (register until 22nd of April) • Tutorials and templates are provided at orp.plista.com

  24. XING - RecSys Challenge https://recsys.xing.com/

  25. Job Recommendations @ XING

  26. XING - Evaluation based on interaction ● On XING, users can give feedback on recommendations. ● The amount of explicit user feedback is far lower than that of implicit measures. ● A/B tests focus on click-through rate.

  27. XING - RecSys Challenge, Scoring, Space on Page: Top 6 ● Predict 30 items for each user. ● Score: weighted combination of precision at several cutoffs ○ precisionAt(2) ○ precisionAt(4) ○ precisionAt(6) ○ precisionAt(20)
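A minimal sketch of the leaderboard score described above: precision at cutoffs 2, 4, 6, and 20 over the 30 predicted items, combined with weights. The weights (set to 1.0 here) are placeholders, since the slide does not state them, and the official challenge score may include further components.

```python
def precision_at(k, recommended, relevant):
    """Fraction of the top-k recommended items the user actually interacted with."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def leaderboard_score(recommended, relevant, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted combination of precision@2/4/6/20 over the predicted items."""
    cutoffs = (2, 4, 6, 20)
    return sum(w * precision_at(k, recommended, relevant)
               for w, k in zip(weights, cutoffs))

# Example: 30 predicted job ids for one user vs. the jobs the user clicked
predicted = list(range(30))
clicked = {0, 3, 5, 21}
print(leaderboard_score(predicted, clicked))  # 0.5 + 0.5 + 0.5 + 0.15 = 1.65
```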

  28. XING - RecSys Challenge, User Data • User ID • Job Title • Educational Degree • Field of Study • Location

  29. XING - RecSys Challenge, User Data • Number of past jobs • Years of Experience • Current career level • Current discipline • Current industry

  30. XING - RecSys Challenge, Item Data • Job title • Desired career level • Desired discipline • Desired industry

  31. XING - RecSys Challenge, Interaction Data • Timestamp • User • Job • Type: – Deletion – Click – Bookmark
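The three preceding slides describe the released data; below is a minimal sketch of the corresponding record types. The field names paraphrase the slides and are not the dataset's actual column names.

```python
from dataclasses import dataclass
from enum import Enum

@dataclass
class User:
    # Profile attributes listed on the user-data slides
    user_id: int
    job_title: str
    degree: str
    field_of_study: str
    location: str
    past_jobs: int
    years_of_experience: int
    career_level: int
    discipline: str
    industry: str

@dataclass
class JobPosting:
    # Attributes of the recommendable items
    item_id: int
    title: str
    desired_career_level: int
    desired_discipline: str
    desired_industry: str

class InteractionType(Enum):
    CLICK = "click"
    BOOKMARK = "bookmark"
    DELETION = "deletion"

@dataclass
class Interaction:
    # One logged user action on a job posting
    timestamp: int
    user_id: int
    item_id: int
    type: InteractionType
```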

  32. XING - RecSys Challenge, Anonymization

  33. XING - RecSys Challenge, Anonymization

  34. XING - RecSys Challenge, Future • Live Challenge – Users submit predicted future interactions – The solution is recommended on the platform – Participants get points for actual user clicks (Cycle: Work on predictions → Release to challenge → Collect clicks → Score)

  35. Concluding ... How to set up a better evaluation • Consider different quality criteria (prediction, technical, business models) • Aggregate heterogeneous information sources • Consider user feedback • Use online and offline analyses to understand users and their requirements

  36. Concluding ... Participate in challenges based on real-life scenarios • NewsREEL challenge: http://orp.plista.com • RecSys 2016 challenge: http://2016.recsyschallenge.com/ => Organize a challenge. Focus on real-life data.

  37. Thank You More Information • http://www.crowdrec.eu • http://www.clef-newsreel.org • http://orp.plista.com • http://2016.recsyschallenge.com • http://www.xing.com
