10-601 Introduction to Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University

Ensemble Methods + Recommender Systems

Matt Gormley
Lecture 28
Apr. 29, 2019
Reminders

• Homework 9: Learning Paradigms
  – Out: Wed, Apr 24
  – Due: Wed, May 1 at 11:59pm
  – Can only be submitted up to 3 days late, so we can return grades before the final exam
• Today's In-Class Poll
  – http://p28.mlcourse.org
Q&A

Q: In k-Means, since we don't have a validation set, how do we pick k?
A: Look at the training objective function J(c, z) as a function of k and pick the value at the "elbow" of the curve.

Q: What if our random initialization for k-Means gives us poor performance?
A: Do random restarts: that is, run k-Means from scratch, say, 10 times and pick the run that gives the lowest training objective function value. The objective function is nonconvex, so we're just looking for the best local minimum.
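The random-restart recipe above can be sketched in a few lines. This is a minimal NumPy sketch, not course-provided code; the function names and the convergence check are my own.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=None):
    """One run of Lloyd's algorithm; returns (centers, labels, objective)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center (keep the old one if a cluster goes empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    # final assignment and training objective J(c, z)
    labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    objective = ((X - centers[labels]) ** 2).sum()
    return centers, labels, objective

def kmeans_restarts(X, k, restarts=10, seed=0):
    """Random restarts: run k-Means several times, keep the lowest objective."""
    runs = [kmeans(X, k, seed=seed + r) for r in range(restarts)]
    return min(runs, key=lambda run: run[2])
```

Plotting the returned objective against several values of k then gives the "elbow" curve described in the answer above.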
ML Big Picture

Learning Paradigms (what data is available and when? what form of prediction?):
• supervised learning
• unsupervised learning
• semi-supervised learning
• reinforcement learning
• active learning
• imitation learning
• domain adaptation
• online learning
• density estimation
• recommender systems
• feature learning
• manifold learning
• dimensionality reduction
• ensemble learning
• distant supervision
• hyperparameter optimization

Problem Formulation (what is the structure of our output prediction?):
• boolean → Binary Classification
• categorical → Multiclass Classification
• ordinal → Ordinal Classification
• real → Regression
• ordering → Ranking
• multiple discrete → Structured Prediction
• multiple continuous → (e.g. dynamical systems)
• both discrete & cont. → (e.g. mixed graphical models)

Application Areas (key challenges?):
NLP, Speech, Computer Vision, Robotics, Medicine, Search

Facets of Building ML Systems (how to build systems that are robust, efficient, adaptive, effective?):
1. Data prep
2. Model selection
3. Training (optimization / search)
4. Hyperparameter tuning on validation data
5. (Blind) Assessment on test data

Big Ideas in ML (which are the ideas driving development of the field?):
• inductive bias
• generalization / overfitting
• bias-variance decomposition
• generative vs. discriminative
• deep nets, graphical models
• PAC learning
• distant rewards

Theoretical Foundations (what principles guide learning?):
❑ probabilistic
❑ information theoretic
❑ evolutionary search
❑ ML as optimization
Outline for Today

We'll talk about two distinct topics:
1. Ensemble Methods: combine or learn multiple classifiers into one (i.e. a family of algorithms)
2. Recommender Systems: produce recommendations of what a user will like (i.e. the solution to a particular type of task)

We'll use a prominent example of a recommender system (the Netflix Prize) to motivate both topics…
RECOMMENDER SYSTEMS
Recommender Systems

A Common Challenge:
– Assume you're a company selling items of some sort: movies, songs, products, etc.
– The company collects millions of ratings from users of its items
– To maximize profit / user happiness, you want to recommend items that users are likely to want
Recommender Systems

Problem Setup
• 500,000 users
• 20,000 movies
• 100 million ratings
• Goal: To obtain lower root mean squared error (RMSE) than Netflix's existing system on 3 million held-out ratings
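The evaluation metric can be made concrete with a small sketch. The dict-of-ratings representation and the function name below are my own illustration, not Netflix's actual pipeline; for scale, the Netflix Prize famously required roughly a 10% RMSE improvement over Netflix's own system.

```python
import numpy as np

def rmse(predicted, heldout):
    """Root mean squared error between predicted ratings and held-out
    true ratings, both keyed by (user, item) pairs."""
    errors = [predicted[ui] - r for ui, r in heldout.items()]
    return float(np.sqrt(np.mean(np.square(errors))))
```

On the real data, `heldout` would hold the 3 million withheld ratings and `predicted` the system's guesses for exactly those (user, movie) pairs.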
ENSEMBLE METHODS
Recommender Systems

Top performing systems were ensembles.
Weighted Majority Algorithm
(Littlestone & Warmuth, 1994)

• Given: pool A of binary classifiers (that you know nothing about)
• Data: stream of examples (i.e. online learning setting)
• Goal: design a new learner that uses the predictions of the pool to make new predictions
• Algorithm:
  – Initially weight all classifiers equally
  – Receive a training example and predict the (weighted) majority vote of the classifiers in the pool
  – Down-weight classifiers that contribute to a mistake by a factor of β
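The three bullet points of the algorithm translate almost directly into code. This is a sketch with my own names; it assumes the experts are callables returning ±1, uses β = 1/2, and (following the slide's wording) down-weights erring classifiers only on rounds where the ensemble itself makes a mistake.

```python
import numpy as np

def weighted_majority(experts, stream, beta=0.5):
    """Weighted Majority sketch: experts are fixed binary classifiers
    (callables returning +1/-1); stream yields (x, y) examples."""
    w = np.ones(len(experts))          # initially weight all classifiers equally
    mistakes = 0
    for x, y in stream:
        votes = np.array([h(x) for h in experts])
        # predict the weighted majority vote (ties broken toward +1)
        y_hat = 1 if w[votes == 1].sum() >= w[votes == -1].sum() else -1
        if y_hat != y:
            mistakes += 1
            # down-weight every classifier that contributed to the mistake
            w[votes != y] *= beta
    return w, mistakes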
Weighted Majority Algorithm
(Littlestone & Warmuth, 1994)

This is a "mistake bound" of the variety we saw for the Perceptron algorithm.
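For reference, the bound can be stated as follows (a sketch of the Littlestone & Warmuth result; treat the exact constants as from memory): if the best classifier in the pool A makes m mistakes on the stream, then

```latex
\#\text{mistakes}(\mathrm{WM}) \;\le\; \frac{m \ln(1/\beta) + \ln |A|}{\ln\!\left(\frac{2}{1+\beta}\right)},
\qquad\text{e.g. for } \beta = \tfrac{1}{2}:\quad
\#\text{mistakes}(\mathrm{WM}) \;\le\; 2.41\,\big(m + \log_2 |A|\big).
```

Like the Perceptron mistake bound, this holds for every sequence of examples, with no statistical assumptions on how the stream is generated.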
ADABOOST
Comparison

Weighted Majority Algorithm
• an example of an ensemble method
• assumes the classifiers are learned ahead of time
• only learns the (majority vote) weight for each classifier

AdaBoost
• an example of a boosting method
• simultaneously learns:
  – the classifiers themselves
  – the (majority vote) weight for each classifier
AdaBoost: Toy Example
(Slides from Schapire NIPS Tutorial)

• Initial distribution D_1; weak classifiers = vertical or horizontal half-planes.
• Round 1: weak hypothesis h_1, new distribution D_2; ε_1 = 0.30, α_1 = 0.42.
• Round 2: h_2, D_3; ε_2 = 0.21, α_2 = 0.65.
• Round 3: h_3; ε_3 = 0.14, α_3 = 0.92.
• Final hypothesis: H_final = sign(0.42·h_1 + 0.65·h_2 + 0.92·h_3).
AdaBoost

Given: (x_1, y_1), …, (x_m, y_m) where x_i ∈ X, y_i ∈ {−1, +1}.
Initialize D_1(i) = 1/m.
For t = 1, …, T:
  • Train weak learner using distribution D_t.
  • Get weak hypothesis h_t : X → {−1, +1} with error ε_t = Pr_{i∼D_t}[h_t(x_i) ≠ y_i].
  • Choose α_t = (1/2) ln((1 − ε_t) / ε_t).
  • Update:
      D_{t+1}(i) = (D_t(i) / Z_t) · e^{−α_t}  if h_t(x_i) = y_i
      D_{t+1}(i) = (D_t(i) / Z_t) · e^{α_t}   if h_t(x_i) ≠ y_i
    where Z_t is a normalization factor (chosen so that D_{t+1} will be a distribution).
Output the final hypothesis: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) ).

(Algorithm from Freund & Schapire, 1999)
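The pseudocode above translates almost line-for-line into code. The sketch below uses exhaustive decision stumps as the weak learner; that choice, and all function names, are my own, not from Freund & Schapire.

```python
import numpy as np

def best_stump(X, y, D):
    """Weak learner: pick the axis-aligned threshold stump
    (feature j, threshold thr, sign s) with lowest D-weighted error."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] >= thr, 1, -1)
                err = D[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, s), err
    j, thr, s = best
    return lambda Xq: s * np.where(Xq[:, j] >= thr, 1, -1)

def adaboost(X, y, T=10):
    """AdaBoost (Freund & Schapire, 1999); labels y must be in {-1, +1}."""
    m = len(y)
    D = np.full(m, 1.0 / m)                    # D_1(i) = 1/m
    hyps, alphas = [], []
    for _ in range(T):
        h = best_stump(X, y, D)                # train weak learner on D_t
        pred = h(X)
        eps = D[pred != y].sum()               # weighted error epsilon_t
        if eps == 0:                           # perfect weak hypothesis: use it alone
            return h
        if eps >= 0.5:                         # no weak-learning edge left
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t
        D *= np.exp(-alpha * y * pred)         # up-weight mistakes, down-weight hits
        D /= D.sum()                           # Z_t normalization
        hyps.append(h)
        alphas.append(alpha)
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hyps)))
```

Run on a tiny 1-D dataset that no single stump can classify, three rounds of boosting already drive the training error to zero, mirroring the toy example on the earlier slides.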
AdaBoost

Figure 2: Error curves and the margin distribution graph for boosting C4.5 on the letter dataset, as reported by Schapire et al. [41]. Left: the training and test error curves (lower and upper curves, respectively) of the combined classifier as a function of the number of rounds of boosting. The horizontal lines indicate the test error rate of the base classifier as well as the test error of the final combined classifier. Right: the cumulative distribution of margins of the training examples after 5, 100, and 1000 iterations, indicated by short-dashed, long-dashed (mostly hidden), and solid curves, respectively.

(Figure from Freund & Schapire, 1999)
Learning Objectives

Ensemble Methods / Boosting

You should be able to…
1. Implement the Weighted Majority Algorithm
2. Implement AdaBoost
3. Distinguish what is learned in the Weighted Majority Algorithm vs. AdaBoost
4. Contrast the theoretical result for the Weighted Majority Algorithm to that of the Perceptron
5. Explain a surprisingly common empirical result regarding AdaBoost train/test curves
Outline

• Recommender Systems
  – Content Filtering
  – Collaborative Filtering (CF)
  – CF: Neighborhood Methods
  – CF: Latent Factor Methods
• Matrix Factorization
  – Background: Low-rank Factorizations
  – Residual matrix
  – Unconstrained Matrix Factorization
    • Optimization problem
    • Gradient Descent, SGD, Alternating Least Squares
    • User/item bias terms (matrix trick)
  – Singular Value Decomposition (SVD)
  – Non-negative Matrix Factorization
RECOMMENDER SYSTEMS
Recommender Systems

• Setup:
  – Items: movies, songs, products, etc. (often many thousands)
  – Users: watchers, listeners, purchasers, etc. (often many millions)
  – Feedback: 5-star ratings, not-clicking 'next', purchases, etc.
• Key Assumptions:
  – Can represent ratings numerically as a user/item matrix
  – Users only rate a small number of items (the matrix is sparse)

[Figure: example user/item matrix with users Alice, Bob, Charlie and movies Doctor Strange, Star Trek: Beyond, Zootopia; only some cells contain 1–5 star ratings.]
Two Types of Recommender Systems

Content Filtering
• Example: Pandora.com music recommendations (Music Genome Project)
• Con: Assumes access to side information about items (e.g. properties of a song)
• Pro: Got a new item to add? No problem, just be sure to include the side information

Collaborative Filtering
• Example: Netflix movie recommendations
• Pro: Does not assume access to side information about items (e.g. does not need to know about movie genres)
• Con: Does not work on new items that have no ratings
COLLABORATIVE FILTERING
Collaborative Filtering
(Slide from William Cohen)

• Everyday examples of collaborative filtering…
  – Bestseller lists
  – Top 40 music lists
  – The "recent returns" shelf at the library
  – Unmarked but well-used paths through the woods
  – The printer room at work
  – "Read any good books lately?"
  – …
• Common insight: personal tastes are correlated
  – If Alice and Bob both like X and Alice likes Y, then Bob is more likely to like Y – especially (perhaps) if Bob knows Alice
Two Types of Collaborative Filtering
(Figures from Koren et al., 2009)

1. Neighborhood Methods
2. Latent Factor Methods
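A toy version of the neighborhood approach can make the idea concrete: predict a user's missing rating from the most similar users who did rate the item. Everything below (the 0 = unrated convention, cosine similarity over co-rated items, and all names) is my own illustration, far simpler than the systems described in Koren et al.

```python
import numpy as np

def predict_rating(R, u, i, k=2):
    """Neighborhood-method sketch: predict user u's rating of item i as a
    similarity-weighted average over the k most similar users who rated i.
    R is a small dense user x item array with 0 meaning 'unrated'."""
    def cos(a, b):
        mask = (a > 0) & (b > 0)               # compare only co-rated items
        if not mask.any():
            return 0.0
        return float(a[mask] @ b[mask] /
                     (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])))
    raters = [v for v in range(R.shape[0]) if v != u and R[v, i] > 0]
    neighbors = sorted(((cos(R[u], R[v]), v) for v in raters), reverse=True)[:k]
    num = sum(s * R[v, i] for s, v in neighbors)
    den = sum(abs(s) for s, v in neighbors)
    return num / den if den > 0 else R[R > 0].mean()  # fall back to global mean
```

Latent factor methods (next in the outline) instead learn a low-dimensional vector per user and per item and predict with their dot product.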