artm vs. lda: an svd extension case study . Sergey I. Nikolenko (Steklov Institute of Mathematics at St. Petersburg; National Research University Higher School of Economics, St. Petersburg; Kazan (Volga Region) Federal University, Kazan, Russia; Deloitte Analytics Institute, Moscow, Russia). April 7, 2016
problem setting . • Main goal: to recommend full-text items (posts in social networks, web pages, etc.) to users. • In particular, enrich recommender systems with text-based features; this is especially important for cold start. • These features can come from topic modeling. • LDA extensions can be developed to extract features relevant to recommendations.
supervised lda . • Supervised LDA: • assumes that each document has a response variable; • and the purpose is to predict this variable rather than just “learn something about the dataset”; • can we learn topics that are relevant to this specific response variable? • In recommender systems, the response variable would be the probability of a like, an explicit rating, or some other desirable action. • This adds new variables to the graph.
pgm for slda . [Figure: plate diagram of the probabilistic graphical model for supervised LDA.]
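As a reference for the plate diagram above, here is a hedged sketch of the supervised LDA generative process, following Blei and McAuliffe's original sLDA formulation with a Gaussian response (the deck replaces it with a logistic response on the next slide); the response parameters η and σ² are part of that formulation and are not defined elsewhere in this deck.

```latex
% Sketch of the sLDA generative process (assumption: Blei & McAuliffe's
% formulation; the logistic variant discussed later swaps the last line).
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{topic proportions of document } d,\\
z_{dn} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d)
  && \text{topic of word } n,\\
w_{dn} \mid z_{dn} &\sim \mathrm{Multinomial}(\beta_{z_{dn}})
  && \text{observed word},\\
y_d \mid z_{d,1:N_d} &\sim \mathcal{N}\bigl(\eta^\top \bar z_d,\ \sigma^2\bigr)
  && \text{response, } \bar z_d = \tfrac{1}{N_d}\textstyle\sum_n z_{dn}.
\end{align*}
```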
logistic slda and svd-lda .
• Our previous results (MICAI 2015):
  • an extension of supervised LDA to handle logistic response variables: p = σ(b⊤ z̄ + a);
  • a Gibbs sampling scheme for supervised LDA (the original paper offered only variational approximations);
  • a new unified SVD-LDA model: p(success_{i,a}) = σ(μ + b_i + b_a + q_a⊤ p_i + θ_a⊤ l_i), where b_i, b_a, q_a, p_i are SVD predictors, θ_a are topic distributions, and l_i are user predictors for the topics, and a Gibbs sampling scheme for it.
• This wasn't easy: Gibbs sampling was too computationally intensive, so we developed an approximate sampler (first-order approximation).
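A minimal sketch of how the SVD-LDA prediction above could be computed from precomputed parameters; all names (mu, b_user, b_item, p_user, q_item, theta_item, l_user) are hypothetical and simply mirror the symbols μ, b_i, b_a, p_i, q_a, θ_a, l_i from the slide.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def svd_lda_score(mu, b_user, b_item, p_user, q_item, theta_item, l_user):
    """p(success_{i,a}) = sigma(mu + b_i + b_a + q_a^T p_i + theta_a^T l_i).

    mu, b_user, b_item : scalar biases (global, user i, item a)
    p_user, q_item     : latent SVD factors of user i and item a
    theta_item         : topic distribution of item a (from the topic model)
    l_user             : user i's predictor weights over topics
    """
    r_hat = mu + b_user + b_item + q_item @ p_user + theta_item @ l_user
    return sigmoid(r_hat)

# toy usage with random parameters
rng = np.random.default_rng(0)
k, t = 10, 5          # SVD rank and number of topics
print(svd_lda_score(0.1, 0.05, -0.02,
                    rng.normal(size=k), rng.normal(size=k),
                    rng.dirichlet(np.ones(t)), rng.normal(size=t)))
```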
artm .
• Additive Regularization of Topic Models (ARTM) simply adds regularizers to the objective function at the training stage of the basic pLSA model (Vorontsov et al., 2013, 2014, 2015):
  L(Φ, Θ) + R(Φ, Θ) = Σ_{d∈D} Σ_{w∈d} n_{dw} ln p(w | d) + Σ_{i=1}^{r} ρ_i R_i(Φ, Θ).
• Solve the Karush-Kuhn-Tucker conditions (Newton's method); this yields EM-like updates:
  p_{tdw} = norm⁺_{t∈T} (φ_{wt} θ_{td}),
  n_{wt} = Σ_{d∈D} n_{dw} p_{tdw},   n_{td} = Σ_{w∈d} n_{dw} p_{tdw},
  φ_{wt} = norm⁺_{w∈W} (n_{wt} + φ_{wt} ∂R/∂φ_{wt}),
  θ_{td} = norm⁺_{t∈T} (n_{td} + θ_{td} ∂R/∂θ_{td}),
  where norm⁺ denotes non-negative normalization.
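A hedged sketch of one such EM-like iteration for a dense term-document matrix, with a generic regularizer supplied through its gradient functions; this is only an illustration of the update equations above, not the BigARTM implementation, and all function and variable names here are made up.

```python
import numpy as np

def artm_em_step(n_dw, phi, theta, dR_dphi=None, dR_dtheta=None, eps=1e-12):
    """One EM-like ARTM iteration:
       p_tdw    = norm^+_t ( phi_wt * theta_td )
       n_wt     = sum_d n_dw p_tdw,   n_td = sum_w n_dw p_tdw
       phi_wt   = norm^+_w ( n_wt + phi_wt   * dR/dphi_wt )
       theta_td = norm^+_t ( n_td + theta_td * dR/dtheta_td )
    n_dw: (D, W) counts; phi: (W, T); theta: (T, D)."""
    # E-step: p(t | d, w) proportional to phi_wt * theta_td
    p = phi[None, :, :] * theta.T[:, None, :]            # (D, W, T)
    p /= p.sum(axis=2, keepdims=True) + eps

    # sufficient statistics
    n_wt = np.einsum('dw,dwt->wt', n_dw, p)              # (W, T)
    n_td = np.einsum('dw,dwt->dt', n_dw, p).T            # (T, D)

    # M-step with non-negative normalization norm^+ (clip negatives to 0)
    phi_new = n_wt + (phi * dR_dphi(phi, theta) if dR_dphi else 0.0)
    theta_new = n_td + (theta * dR_dtheta(phi, theta) if dR_dtheta else 0.0)
    phi_new = np.maximum(phi_new, 0.0)
    theta_new = np.maximum(theta_new, 0.0)
    phi_new /= phi_new.sum(axis=0, keepdims=True) + eps      # normalize over w
    theta_new /= theta_new.sum(axis=0, keepdims=True) + eps  # normalize over t
    return phi_new, theta_new

# toy usage: random data, no regularizers (reduces to plain pLSA EM)
rng = np.random.default_rng(1)
D, W, T = 4, 30, 3
n_dw = rng.poisson(1.0, size=(D, W)).astype(float)
phi = rng.dirichlet(np.ones(W), size=T).T     # (W, T), columns sum to 1
theta = rng.dirichlet(np.ones(T), size=D).T   # (T, D), columns sum to 1
phi, theta = artm_em_step(n_dw, phi, theta)
```

With both gradient arguments left as None the step reduces to plain pLSA EM, which is exactly the "basic model plus additive regularizers" design the slide describes.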
svd-artm .
• We extend ARTM with an SVD-based regularizer: the total likelihood of the dataset with ratings, comprised of triples D = {(i, a, r)} (user i rated item a as r ∈ {−1, 1}):
  R(Φ, Θ) = ln p(D | μ, b_i, b_a, p_i, q_a, l_i, θ_a) = Σ_{(i,a,r)∈D} ln |[r = −1] − σ(r̂_{i,a})|,
  where [r = −1] = 1 if r = −1 and [r = −1] = 0 otherwise,
  r̂_{i,a} = μ + b_i + b_a + q_a⊤ p_i + θ_a⊤ l_i = r̂^{SVD}_{i,a} + θ_a⊤ l_i,
  and θ_a = (1/N_a) Σ_{w∈a} z_w is the vector of topics trained for document a in the LDA model, N_a being the length of document a.
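To make the compact notation above concrete, the per-rating term can be expanded by cases (this expansion is mine, not from the slides, but it follows directly from the definition of the indicator [r = −1]):

```latex
% The per-rating term of R(\Phi, \Theta), expanded by cases:
\ln\bigl|[r=-1] - \sigma(\hat r_{i,a})\bigr| =
\begin{cases}
  \ln \sigma(\hat r_{i,a}),              & r = 1,\\[2pt]
  \ln\bigl(1 - \sigma(\hat r_{i,a})\bigr), & r = -1,
\end{cases}
\qquad \text{i.e. the Bernoulli log-likelihood of the observed rating.}
```

Differentiating either case with respect to θ_a gives ([r = 1] − σ(r̂_{i,a})) l_i, which is exactly the gradient that appears on the next slide.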
svd-artm .
• To add this regularizer to the pLSA model, we compute its partial derivatives:
  ∂R(Φ, Θ)/∂φ_{wt} = 0,
  ∂R(Φ, Θ)/∂θ_{ta} = Σ_{(i,a,r)∈D} [[r = 1] − σ(r̂^{SVD}_{i,a} + θ_a⊤ l_i)] l_i.
• Turns out it's exactly the same as the painstakingly developed approximation scheme, and ARTM produces this scheme automatically.
• So this is a good case study for ARTM: we got a reasonable approximation scheme for SVD-LDA!
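A hedged sketch of this regularizer gradient plugged into the θ-update from the "artm" slide, for a single document a; the names (ratings_a, r_svd_a, l_users, n_ta) mirror the symbols in the formulas and are not taken from any actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dR_dtheta_a(ratings_a, r_svd_a, l_users, theta_a):
    """Gradient of the SVD regularizer w.r.t. theta_a for one document a:
       dR/dtheta_a = sum_{(i,r)} ([r = 1] - sigma(r^SVD_{i,a} + theta_a^T l_i)) l_i
    ratings_a : list of (user_index, r) pairs with r in {-1, +1}
    r_svd_a   : per-user SVD-only predictions r^SVD_{i,a} (indexable by user)
    l_users   : (num_users, T) matrix of per-user topic predictors l_i
    """
    grad = np.zeros_like(theta_a)
    for i, r in ratings_a:
        r_hat = r_svd_a[i] + theta_a @ l_users[i]
        grad += ((1.0 if r == 1 else 0.0) - sigmoid(r_hat)) * l_users[i]
    return grad

def update_theta_a(n_ta, theta_a, ratings_a, r_svd_a, l_users, eps=1e-12):
    """theta_ta = norm^+_t ( n_ta + theta_ta * dR/dtheta_ta )."""
    new = n_ta + theta_a * dR_dtheta_a(ratings_a, r_svd_a, l_users, theta_a)
    new = np.maximum(new, 0.0)
    return new / (new.sum() + eps)
```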
thank you! . Thank you for your attention!