SVD-LDA: Topic Modeling for Full-Text Recommender Systems

Sergey Nikolenko
Steklov Mathematical Institute at St. Petersburg
Laboratory for Internet Studies, National Research University Higher School of Economics, St. Petersburg

October 30, 2015
Outline

1. Intro: recsys overview.
2. From LDA to SVD-LDA: Latent Dirichlet Allocation; SVD-LDA.
Overview

A very brief overview of the paper:
- our main goal is to recommend full-text items (posts in social networks, web pages, etc.) to users;
- in particular, we want to extend recommender systems with features coming from the texts;
- this is especially important for the cold start problem;
- these features can come from topic modeling;
- in this work, we combine the classical SVD and LDA models into one, training them together.
Recommender systems

Recommender systems analyze user interests and attempt to predict what the current user will be most interested in now.

Collaborative filtering: given a sparse matrix of ratings assigned by users to items, predict unknown ratings (and hence recommend the items with the best predictions):
- nearest neighbor methods (user-user and item-item), e.g., GroupLens;
- SVD (singular value decomposition): decompose the user × item matrix, reducing dimensionality as user × item = (user × feature)(feature × item), with very few features compared to users and items, learning user and item features that can be used to make predictions; a toy sketch is given after this slide.
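To make the decomposition concrete, here is a minimal toy sketch in Python, assuming a small dense rating matrix (real rating matrices are sparse, and production systems fit the factors by minimizing error over observed entries only):

```python
import numpy as np

# Hypothetical toy rating matrix (3 users x 3 items); real data is sparse.
R = np.array([[5., 4., 1.],
              [4., 5., 2.],
              [1., 2., 5.]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                    # very few features vs. users/items
P = U[:, :k] * np.sqrt(s[:k])            # user x feature matrix
Q = np.sqrt(s[:k])[:, None] * Vt[:k, :]  # feature x item matrix
R_hat = P @ Q                            # low-rank approximation of ratings
```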
Recommender systems

Formally speaking, SVD models a rating as

r^{\mathrm{SVD}}_{i,a} = \mu + b_i + b_a + q_a^\top p_i,

where
- b_i is the baseline predictor for user i;
- b_a is the baseline predictor for item a;
- q_a and p_i are feature vectors for a and i respectively.

Then you can train b_i, b_a, q_a, p_i together by fitting actual ratings to the model (by alternating least squares).

Importantly for us, if you have likes/dislikes rather than explicit ratings, you can use logistic SVD (trained by alternating logistic regression):

p(\mathrm{Like}_{i,a}) = \sigma\left( \mu + b_i + b_a + q_a^\top p_i \right).
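A minimal sketch of the logistic SVD prediction, assuming already-trained parameters (all names here are hypothetical):

```python
import numpy as np

def p_like(mu, b_user, b_item, p_user, q_item):
    """Logistic SVD: p(Like_{i,a}) = sigma(mu + b_i + b_a + q_a^T p_i)."""
    score = mu + b_user + b_item + np.dot(q_item, p_user)
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid maps the score to (0, 1)
```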
Recommender systems

Many modifications of classical recommender systems use additional information:
- implicit user preferences (what the user viewed), e.g., SVD++;
- the time when ratings appear, e.g., timeSVD++;
- the social graph, when the users' social network profiles are available;
- context-aware recommendations (time of day, situation, company, etc.);
- recommendations aware of other recommendations (optimizing diversity, novelty, serendipity).

In this work, we concentrate on the textual content of items.
Recommender systems

The main dataset for the project comes from the Russian recommender system Surfingbird:
- Surfingbird recommends web pages to users;
- a user clicks "Surf", sees a new page, and may rate it by clicking "Like" or "Dislike";
- web pages usually have content, often textual content;
- the text may be very useful for recommendations; how do we use it?
Topic modeling with LDA

Latent Dirichlet Allocation (LDA): topic modeling for a corpus of texts:
- a document is represented as a mixture of topics;
- a topic is a distribution over words;
- to generate a document, for each word we sample a topic and then sample a word from that topic (see the sketch after this list);
- by learning these distributions, we learn what topics appear in a dataset and in which documents.
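A minimal sketch of this generative process for a single document, with hypothetical toy dimensions (T topics, a W-word vocabulary, N tokens):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: T topics, vocabulary of W words, N tokens.
T, W, N = 3, 100, 50
alpha, beta = 0.1, 0.01

phi = rng.dirichlet(beta * np.ones(W), size=T)  # each topic: distribution over words
theta = rng.dirichlet(alpha * np.ones(T))       # this document: mixture of topics

doc = []
for _ in range(N):
    z = rng.choice(T, p=theta)   # sample a topic for this position
    w = rng.choice(W, p=phi[z])  # sample a word from that topic
    doc.append(w)
```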
Topic modeling with LDA

[Figure: sample LDA result from (Blei, 2012).]
PGM for LDA

[Figure: probabilistic graphical model for LDA.]
Inference in LDA

There are two major approaches to inference in probabilistic models with a loopy factor graph like LDA:
- variational approximations simplify the graph by approximating the underlying distribution with a simpler one, with new parameters that are subject to optimization;
- Gibbs sampling approximates the underlying distribution by repeatedly sampling a subset of variables conditional on fixed values of all the others.

Both approaches have been applied to LDA.

In a way, LDA is similar to SVD: it performs dimensionality reduction and, so to speak, decomposes document × word = (document × topic)(topic × word).
LDA likelihood

Thus, the total likelihood of the LDA model is

p(z, w \mid \alpha, \beta) = \int_{\theta, \varphi} p(\theta \mid \alpha)\, p(z \mid \theta)\, p(w \mid z, \varphi)\, p(\varphi \mid \beta)\, d\theta\, d\varphi.

And in Gibbs sampling, we sample

p(z_w = t \mid z_{-w}, w, \alpha, \beta) \propto q(z_w, t, z_{-w}, w, \alpha, \beta) = \frac{n^{(d)}_{-w,t} + \alpha}{\sum_{t' \in T} \left( n^{(d)}_{-w,t'} + \alpha \right)} \cdot \frac{n^{(w)}_{-w,t} + \beta}{\sum_{w' \in W} \left( n^{(w')}_{-w,t} + \beta \right)}.
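A minimal sketch of this sampling step, assuming count arrays from which the current token has already been subtracted (the names are hypothetical: n_dt is documents × topics, n_wt is words × topics, n_t is the per-topic total; the document-side denominator is constant in t and cancels):

```python
import numpy as np

def sample_topic(n_dt, n_wt, n_t, d, w, alpha, beta, rng):
    """Draw z_w = t with probability proportional to
    (n_dt[d, t] + alpha) * (n_wt[w, t] + beta) / (n_t[t] + W * beta),
    the collapsed Gibbs update above."""
    W = n_wt.shape[0]  # vocabulary size
    p = (n_dt[d] + alpha) * (n_wt[w] + beta) / (n_t + W * beta)
    return rng.choice(len(p), p=p / p.sum())
```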
LDA extensions

There already exist LDA extensions relevant to our research:
- DiscLDA: LDA for classification with a class-dependent transformation of the topic mixtures;
- Supervised LDA: documents come with a response variable, and we mine topics that are indicative of the response;
- TagLDA: words have tags that mark context or linguistic features;
- Tag-LDA: documents have topical tags, and the goal is to recommend new tags for documents;
- Topics over Time: topics change their proportions with time;
- hierarchical modifications with nested topics are also important.

In this work, we develop a novel extension: SVD-LDA.
Supervised LDA

We begin with supervised LDA:
- it assumes that each document has a response variable;
- the purpose is to predict this variable rather than just "learn something about the dataset";
- can we learn topics that are relevant to this specific response variable?

In recommender systems, the response variable would be the probability of a like, an explicit rating, or some other desirable action. This adds new variables to the graph.
PGM for sLDA

[Figure: probabilistic graphical model for supervised LDA.]
Supervised LDA

Mathematically, we add a factor corresponding to the response variable (Gaussian in sLDA):

p(y_d \mid z, b, \sigma^2) = \exp\left( -\frac{1}{2} \left( y_d - b^\top \bar{z}_d - a \right)^2 \right);

the total likelihood is now

p(z \mid w, y, b, \alpha, \beta, \sigma^2) \propto \prod_d \frac{B(n_d + \alpha)}{B(\alpha)} \prod_t \frac{B(n_t + \beta)}{B(\beta)} \prod_d \exp\left( -\frac{1}{2} \left( y_d - b^\top \bar{z}_d - a \right)^2 \right).
Supervised LDA

The Gibbs sampling goes as

p(z_w = t \mid z_{-w}, w, \alpha, \beta) \propto q(z_w, t, z_{-w}, w, \alpha, \beta) \exp\left( -\frac{1}{2} \left( y_d - b^\top \bar{z} - a \right)^2 \right) = \frac{n^{(d)}_{-w,t} + \alpha}{\sum_{t' \in T} \left( n^{(d)}_{-w,t'} + \alpha \right)} \cdot \frac{n^{(w)}_{-w,t} + \beta}{\sum_{w' \in W} \left( n^{(w')}_{-w,t} + \beta \right)} \exp\left( -\frac{1}{2} \left( y_d - b^\top \bar{z} - a \right)^2 \right),

but it is now a two-step iterative algorithm:
- sample z according to the equations above;
- train b, a as a regression.

A sketch of the response reweighting is given after this slide.
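A minimal sketch of the response reweighting in the sampling step, assuming unit response variance as in the formulas above (all names are hypothetical; counts_d is the document's topic-count vector without the current token, N_d its length):

```python
import numpy as np

def reweight_by_response(p_lda, counts_d, N_d, y_d, b, a):
    """Multiply the plain-LDA topic probabilities p_lda (one entry per
    topic) by the Gaussian factor exp(-(y_d - b^T z_bar - a)^2 / 2),
    where z_bar is the topic histogram with the candidate z_w = t added."""
    p = np.empty_like(p_lda)
    for t in range(len(p_lda)):
        z_bar = counts_d.astype(float)
        z_bar[t] += 1.0
        z_bar /= N_d
        resid = y_d - b @ z_bar - a
        p[t] = p_lda[t] * np.exp(-0.5 * resid * resid)
    return p / p.sum()
```

The outer loop then alternates full sampling sweeps with refitting b, a by least squares on the current topic histograms.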
Logistic sLDA

Hence, our first results:
- a Gibbs sampling scheme for supervised LDA (the original paper offered only variational approximations);
- an extension of supervised LDA to handle logistic response variables:

p = \sigma\left( b^\top \bar{z} + a \right).
Logistic sLDA

Hence, our first results (continued):
- logistic regression is used to train b, a;
- the Gibbs sampling goes as

p(z_w = t \mid z_{-w}, w, \alpha, \beta) \propto q(z_w, t, z_{-w}, w, \alpha, \beta) \prod_{x \in X_d} \sigma\left( b^\top \bar{z}_d + a \right)^{y_x} \left( 1 - \sigma\left( b^\top \bar{z}_d + a \right) \right)^{1 - y_x} = \frac{n^{(d)}_{-w,t} + \alpha}{\sum_{t' \in T} \left( n^{(d)}_{-w,t'} + \alpha \right)} \cdot \frac{n^{(w)}_{-w,t} + \beta}{\sum_{w' \in W} \left( n^{(w')}_{-w,t} + \beta \right)} \times \exp\left( s_d \log p_d + (|X_d| - s_d) \log(1 - p_d) \right),

where p_d = \sigma(b^\top \bar{z}_d + a) and s_d = \sum_{x \in X_d} y_x is the number of likes among the ratings X_d of document d.
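A minimal sketch of the logistic response factor in this update (hypothetical names; s_d is the number of likes among the n_ratings = |X_d| ratings of document d):

```python
import numpy as np

def logistic_response_weight(z_bar, s_d, n_ratings, b, a):
    """exp(s_d * log p_d + (|X_d| - s_d) * log(1 - p_d))
    with p_d = sigma(b^T z_bar + a)."""
    p_d = 1.0 / (1.0 + np.exp(-(b @ z_bar + a)))
    return p_d ** s_d * (1.0 - p_d) ** (n_ratings - s_d)
```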