Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent
Rainer Gemulla, Peter J. Haas, Yannis Sismanis, Erik Nijkamp
August 23, 2011
Outline
◮ Matrix Factorization
◮ Stochastic Gradient Descent
◮ Distributed SGD with MapReduce
◮ Experiments
◮ Summary
Collaborative Filtering
◮ Problem
  ◮ Set of users
  ◮ Set of items (movies, books, jokes, products, stories, ...)
  ◮ Feedback (ratings, purchases, click-throughs, tags, ...)
  ◮ Predict additional items a user may like
  ◮ Assumption: similar feedback =⇒ similar taste
◮ Example

              Avatar   The Matrix   Up
    Alice       ?          4         2
    Bob         3          2         ?
    Charlie     5          ?         3

◮ Netflix competition: 500k users, 20k movies, 100M movie ratings, 3M question marks
Semantic Factors (Koren et al., 2009)
[Figure: two-dimensional latent factor space. Horizontal axis: geared toward males ↔ geared toward females; vertical axis: serious ↔ escapist. Movies such as Braveheart, Amadeus, The Color Purple, Lethal Weapon, Sense and Sensibility, Ocean's 11, The Lion King, Dumb and Dumber, The Princess Diaries, and Independence Day are placed in this space, along with hypothetical users Dave and Gus.]
Latent Factor Models
◮ Discover latent factors (r = 1): one latent factor per user (left) and per movie (top); predicted ratings, shown in parentheses, are products of the corresponding factors

                     Avatar     The Matrix   Up
                     (2.24)     (1.92)       (1.18)
    Alice   (1.98)   ? (4.4)    4 (3.8)      2 (2.3)
    Bob     (1.21)   3 (2.7)    2 (2.3)      ? (1.4)
    Charlie (2.30)   5 (5.2)    ? (4.4)      3 (2.7)

◮ Minimum loss

      min_{W,H} Σ_{(i,j)∈Z} ( V_ij − [WH]_ij )²

◮ With bias

      min_{W,H,u,m} Σ_{(i,j)∈Z} ( V_ij − µ − u_i − m_j − [WH]_ij )²

◮ With bias and regularization

      min_{W,H,u,m} Σ_{(i,j)∈Z} ( V_ij − µ − u_i − m_j − [WH]_ij )² + λ( ‖W‖ + ‖H‖ + ‖u‖ + ‖m‖ )

◮ With bias, regularization, and time

      min_{W,H,u,m} Σ_{(i,j,t)∈Z} ( V_ij − µ − u_i(t) − m_j(t) − [W(t)H]_ij )² + λ( ‖W(t)‖ + ‖H‖ + ‖u(t)‖ + ‖m(t)‖ )
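The rank-1 example on this slide can be checked numerically. A minimal sketch (factor values copied from the slide; NumPy assumed, no bias or regularization terms):

```python
import numpy as np

# Rank-1 factors from the example (r = 1): one latent factor
# per user (rows of W) and per movie (columns of H).
W = np.array([[1.98], [1.21], [2.30]])   # Alice, Bob, Charlie
H = np.array([[2.24, 1.92, 1.18]])       # Avatar, The Matrix, Up

# Predicted rating matrix: every cell, including the unknown ones.
pred = W @ H

# Observed ratings; NaN marks the "?" entries to be predicted.
V = np.array([[np.nan, 4.0,    2.0],
              [3.0,    2.0,    np.nan],
              [5.0,    np.nan, 3.0]])

# Training loss: squared error summed over observed cells Z only.
Z = ~np.isnan(V)
loss = np.sum((V[Z] - pred[Z]) ** 2)
```

Rounding `pred` to one decimal reproduces the parenthesized values on the slide, including the predictions 4.4, 1.4, and 4.4 for the three question marks.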
Generalized Matrix Factorization
◮ A general machine learning problem
  ◮ Recommender systems, text indexing, face recognition, ...
◮ Training data
  ◮ V: m × n input matrix (e.g., rating matrix)
  ◮ Z: training set of indexes in V (e.g., subset of known ratings)
◮ Parameter space
  ◮ W: row factors (e.g., m × r latent customer factors)
  ◮ H: column factors (e.g., r × n latent movie factors)
◮ Model
  ◮ L_ij(W_i∗, H_∗j): loss at element (i, j)
  ◮ Includes prediction error, regularization, auxiliary information, ...
  ◮ Constraints (e.g., non-negativity)
◮ Find best model

      argmin_{W,H} Σ_{(i,j)∈Z} L_ij(W_i∗, H_∗j)

[Figure: V ≈ WH; row W_i∗ and column H_∗j together determine entry V_ij]
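The key structural property of this objective is that the total loss is a sum of local losses, each touching only one row of W and one column of H. The slide leaves L_ij abstract; the sketch below uses one illustrative choice (squared error plus an L2 penalty on the touched factors) to make the decomposition concrete:

```python
import numpy as np

def local_loss(w_i, h_j, v_ij, lam=0.05):
    """L_ij: loss at a single cell (i, j), depending only on row
    factor W_i* and column factor H_*j. The squared-error-plus-L2
    form is an illustrative assumption; L_ij is abstract on the slide."""
    err = v_ij - float(w_i @ h_j)
    return err ** 2 + lam * (float(w_i @ w_i) + float(h_j @ h_j))

def total_loss(V, Z, W, H, lam=0.05):
    """Objective: sum of local losses over the training set Z,
    where Z is a list of (i, j) index pairs of known cells."""
    return sum(local_loss(W[i], H[:, j], V[i, j], lam) for i, j in Z)
```

This per-cell decomposition is exactly what stochastic gradient descent exploits on the following slides: a single training point (i, j) yields a gradient that updates only W_i∗ and H_∗j.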
Successful Applications
◮ Movie recommendation (Netflix, competition papers)
  ◮ >12M users, >20k movies, 2.4B ratings (projected)
  ◮ 36GB data, 9.2GB model (projected)
  ◮ Latent factor model
◮ Website recommendation (Microsoft, WWW10)
  ◮ 51M users, 15M URLs, 1.2B clicks
  ◮ 17.8GB data, 161GB metadata, 49GB model
  ◮ Gaussian non-negative matrix factorization
◮ News personalization (Google, WWW07)
  ◮ Millions of users, millions of stories, ? clicks
  ◮ Probabilistic latent semantic indexing

Distributed processing is necessary!
◮ Big data
◮ Large models
◮ Expensive computations
Outline
◮ Matrix Factorization
◮ Stochastic Gradient Descent
◮ Distributed SGD with MapReduce
◮ Experiments
◮ Summary
Stochastic Gradient Descent
◮ Find minimum θ∗ of function L
◮ Pick a starting point θ₀
[Figure: contour plot of L over two parameters, marking the minimum θ∗ and the starting point θ₀]
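Applied to the matrix factorization loss, the procedure above becomes: pick a random training cell, compute the gradient of that single cell's loss, and take a small step against it. A minimal sequential sketch (plain squared-error loss; no bias terms, regularization, or step-size schedule; the factor of 2 in the gradient is absorbed into the step size eta):

```python
import numpy as np

def sgd_factorize(V, Z, r=1, steps=50000, eta=0.01, seed=0):
    """Sequential SGD for min_{W,H} sum_{(i,j) in Z} (V_ij - [WH]_ij)^2.
    Z is a list of (i, j) index pairs of observed cells."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Starting point theta_0: small random factors.
    W = rng.standard_normal((m, r)) * 0.1
    H = rng.standard_normal((r, n)) * 0.1
    for _ in range(steps):
        i, j = Z[rng.integers(len(Z))]          # pick a random training point
        err = V[i, j] - W[i] @ H[:, j]          # local prediction error
        w_row = W[i].copy()                     # save before updating
        W[i]    += eta * err * H[:, j]          # step against the local gradient
        H[:, j] += eta * err * w_row
    return W, H
```

Only the touched row W_i∗ and column H_∗j change per step, which is what later makes the algorithm amenable to distribution.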