Relational Stacked Denoising Autoencoder for Tag Recommendation Hao Wang Dept. of Computer Science and Engineering Hong Kong University of Science and Technology Joint work with Xingjian Shi and Dit-Yan Yeung To appear in AAAI 2015

Outline
1. Background and Related Work
2. Generalized Probabilistic SDAE
3. Relational SDAE
4. Performance Evaluation
5. Case Study
6. Conclusion

  Tag Recommendation: Flickr

  Tag Recommendation: CiteULike

Tag Recommendation
[Figure: items 1 to 5 and tags 1 to 5.]

Related Work
Content-based:
1. Chen et al., 2008
2. Chen et al., 2010
3. Shen and Fan, 2010
Co-occurrence based:
1. Garg and Weber, 2008
2. Weinberger et al., 2008
3. Rendle and Schmidt-Thieme, 2010
Hybrid:
1. Wu et al., 2009
2. Wang and Blei, 2011
3. Yang et al., 2013

Content-based
1. Chen et al., 2008
2. Chen et al., 2010
3. Shen and Fan, 2010
4. ...
Pros:
1. Tag independence
2. Interpretability
3. No new-item problem
Cons:
1. Need domain knowledge

Co-occurrence based
1. Garg and Weber, 2008
2. Weinberger et al., 2008
3. Rendle and Schmidt-Thieme, 2010
4. ...
Pros:
1. No domain knowledge needed
Cons:
1. Requires some form of rating feedback (co-occurrence matrix)
2. New-tag problem and new-item problem

Hybrid
1. Wu et al., 2009
2. Wang and Blei, 2011
3. Yang et al., 2013
4. ...
Best of both worlds.

Collaborative Topic Regression (CTR) (Wang and Blei, KDD 2011)
LDA: sparse, relatively high dimension
MF: low rank, low dimension

Problems to Explore
1. Can SDAE learn effective representation for recommendation?
2. How to incorporate relational information into SDAE?
3. How is the performance?

Stacked Denoising Autoencoder (Vincent et al., JMLR 2010)
[Figure: SDAE network with layers X_0, X_1, X_2, X_3, X_4 and clean input X_c.]
$$\min_{\{W_l\},\{b_l\}} \|X_c - X_L\|_F^2 + \lambda \sum_l \|W_l\|_F^2,$$
where $\lambda$ is a regularization parameter and $\|\cdot\|_F$ denotes the Frobenius norm.
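The SDAE objective can be evaluated in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the sigmoid activation, the masking-noise corruption, the layer sizes, and the function names `sdae_forward` and `sdae_objective` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sdae_forward(X0, weights, biases):
    """Propagate the input X0 through every layer and return all layer outputs."""
    outputs = []
    X = X0
    for W, b in zip(weights, biases):
        X = sigmoid(X @ W + b)          # assumed activation: sigmoid
        outputs.append(X)
    return outputs

def sdae_objective(Xc, X0, weights, biases, lam):
    """Squared reconstruction error ||Xc - XL||_F^2 plus lam * sum_l ||W_l||_F^2."""
    XL = sdae_forward(X0, weights, biases)[-1]
    recon = np.sum((Xc - XL) ** 2)
    decay = lam * sum(np.sum(W ** 2) for W in weights)
    return recon + decay

rng = np.random.default_rng(0)
J, B, K = 6, 8, 3                        # items, input width, bottleneck width (hypothetical)
Xc = rng.random((J, B))                  # clean input
X0 = Xc * (rng.random((J, B)) > 0.3)     # masking noise (an assumed corruption scheme)
weights = [0.1 * rng.standard_normal((B, K)), 0.1 * rng.standard_normal((K, B))]
biases = [np.zeros(K), np.zeros(B)]
loss = sdae_objective(Xc, X0, weights, biases, lam=0.01)
```

Only the weight matrices are penalized here, matching the formula on the slide; the bias terms enter the regularizer only in the probabilistic formulation.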

Generalized Probabilistic SDAE
[Figure: graphical model with nodes x_0, x_1, x_2, x_3, x_4, x_c, weights W^+, hyperparameters λ_w and λ_n, and a plate over J items.]
1. For each layer $l$ of the SDAE network,
   1. For each column $n$ of the weight matrix $W_l$, draw $W_{l,*n} \sim \mathcal{N}(0, \lambda_w^{-1} I_{K_l})$.
   2. Draw the bias vector $b_l \sim \mathcal{N}(0, \lambda_w^{-1} I_{K_l})$.
   3. For each row $j$ of $X_l$, draw $X_{l,j*} \sim \mathcal{N}(\sigma(X_{l-1,j*} W_l + b_l), \lambda_s^{-1} I_{K_l})$.
2. For each item $j$, draw a clean input $X_{c,j*} \sim \mathcal{N}(X_{L,j*}, \lambda_n^{-1} I_B)$.

Relational SDAE: Generative Process
1. Draw the relational latent matrix $S$ from a matrix variate normal distribution: $S \sim \mathcal{N}_{K,J}(0, I_K \otimes (\lambda_l L_a)^{-1})$.
2. For layer $l$ of the SDAE where $l = 1, 2, \ldots, L/2 - 1$,
   1. For each column $n$ of the weight matrix $W_l$, draw $W_{l,*n} \sim \mathcal{N}(0, \lambda_w^{-1} I_{K_l})$.
   2. Draw the bias vector $b_l \sim \mathcal{N}(0, \lambda_w^{-1} I_{K_l})$.
   3. For each row $j$ of $X_l$, draw $X_{l,j*} \sim \mathcal{N}(\sigma(X_{l-1,j*} W_l + b_l), \lambda_s^{-1} I_{K_l})$.
3. For layer $L/2$ of the SDAE network, draw the representation vector for item $j$ from the product of two Gaussians (PoG): $X_{L/2,j*} \sim \mathrm{PoG}(\sigma(X_{L/2-1,j*} W_l + b_l), s_j^T, \lambda_s^{-1} I_K, \lambda_r^{-1} I_K)$.

Relational SDAE: Generative Process (continued)
4. For layer $l$ of the SDAE network where $l = L/2 + 1, L/2 + 2, \ldots, L$,
   1. For each column $n$ of the weight matrix $W_l$, draw $W_{l,*n} \sim \mathcal{N}(0, \lambda_w^{-1} I_{K_l})$.
   2. Draw the bias vector $b_l \sim \mathcal{N}(0, \lambda_w^{-1} I_{K_l})$.
   3. For each row $j$ of $X_l$, draw $X_{l,j*} \sim \mathcal{N}(\sigma(X_{l-1,j*} W_l + b_l), \lambda_s^{-1} I_{K_l})$.
5. For each item $j$, draw a clean input $X_{c,j*} \sim \mathcal{N}(X_{L,j*}, \lambda_n^{-1} I_B)$.
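The product-of-Gaussians draw at the middle layer can be made concrete. The product of two isotropic Gaussians with precisions λ_s and λ_r is proportional to a Gaussian whose precision is λ_s + λ_r and whose mean is the precision-weighted average of the two means. The sketch below illustrates that fact only; the function names and the example vectors are hypothetical.

```python
import numpy as np

def pog_params(mu1, mu2, lam_s, lam_r):
    """Mean and precision of the Gaussian proportional to
    N(mu1, I / lam_s) * N(mu2, I / lam_r)."""
    precision = lam_s + lam_r
    mean = (lam_s * mu1 + lam_r * mu2) / precision
    return mean, precision

def sample_pog(mu1, mu2, lam_s, lam_r, rng):
    """Draw one sample from the normalized product of the two Gaussians."""
    mean, precision = pog_params(mu1, mu2, lam_s, lam_r)
    return mean + rng.standard_normal(mean.shape) / np.sqrt(precision)

rng = np.random.default_rng(0)
encoder_out = np.array([0.2, 0.8, 0.5])   # stands in for sigma(X_{L/2-1,j*} W_l + b_l)
s_j = np.array([0.6, 0.4, 0.5])           # stands in for the relational latent vector s_j^T
mean, precision = pog_params(encoder_out, s_j, lam_s=3.0, lam_r=1.0)
x_mid = sample_pog(encoder_out, s_j, 3.0, 1.0, rng)
```

With λ_s three times larger than λ_r, the mean sits three quarters of the way toward the encoder output, which shows how the two hyperparameters trade off content against relational information.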

Relational SDAE: Graphical Model
[Figure: graphical model with nodes x_0, x_1, x_2, x_3, x_4, x_c, weights W^+, relational latent variable s, relational input A, hyperparameters λ_w, λ_n, λ_l, λ_r, and a plate over J items.]

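Multi-Relational SDAE: Graphical Model
[Figure: graphical model with nodes x_0, x_1, x_2, x_3, x_4, x_c, weights W^+, relational latent variable s, relational input A, hyperparameters λ_w, λ_n, λ_l, λ_r, a plate over J items, and a plate over Q.]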

Relational SDAE: Objective Function
The log-likelihood:
$$\mathcal{L} = -\frac{\lambda_l}{2}\,\mathrm{tr}(S L_a S^T) - \frac{\lambda_r}{2}\sum_j \|s_j^T - X_{L/2,j*}\|_2^2 - \frac{\lambda_w}{2}\sum_l \left(\|W_l\|_F^2 + \|b_l\|_2^2\right) - \frac{\lambda_n}{2}\sum_j \|X_{L,j*} - X_{c,j*}\|_2^2 - \frac{\lambda_s}{2}\sum_l \sum_j \|\sigma(X_{l-1,j*} W_l + b_l) - X_{l,j*}\|_2^2,$$
where $X_{l,j*} = \sigma(X_{l-1,j*} W_l + b_l)$. Similar to the generalized SDAE, taking $\lambda_s$ to infinity, the last term of the joint log-likelihood will vanish.
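A rough NumPy evaluation of this log-likelihood in the limit λ_s → ∞ (so the last term is dropped) might look as follows. Several things here are assumptions rather than statements from the slides: the sigmoid activation, the two-layer network, the toy chain graph, and in particular the form L_a = D − A (graph Laplacian of the adjacency matrix A), which the slides do not spell out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rsdae_log_likelihood(S, La, X0, Xc, weights, biases, lam_l, lam_r, lam_w, lam_n):
    """Joint log-likelihood with the lam_s term dropped (lam_s -> infinity).

    S: K x J, La: J x J, X0 and Xc: J x B; the middle-layer output is J x K.
    """
    X = X0
    layer_outputs = []
    for W, b in zip(weights, biases):
        X = sigmoid(X @ W + b)
        layer_outputs.append(X)
    X_mid = layer_outputs[len(weights) // 2 - 1]    # X_{L/2}
    X_L = layer_outputs[-1]

    relational = -0.5 * lam_l * np.trace(S @ La @ S.T)
    coupling = -0.5 * lam_r * np.sum((S.T - X_mid) ** 2)
    decay = -0.5 * lam_w * sum(np.sum(W ** 2) + np.sum(b ** 2)
                               for W, b in zip(weights, biases))
    recon = -0.5 * lam_n * np.sum((X_L - Xc) ** 2)
    return relational + coupling + decay + recon

rng = np.random.default_rng(0)
J, B, K = 4, 6, 2                                   # hypothetical sizes
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)           # toy chain of four linked items
La = np.diag(A.sum(axis=1)) - A                     # assumed form of L_a
Xc = rng.random((J, B))
X0 = Xc * (rng.random((J, B)) > 0.3)                # assumed masking noise
weights = [0.1 * rng.standard_normal((B, K)), 0.1 * rng.standard_normal((K, B))]
biases = [np.zeros(K), np.zeros(B)]
S = rng.random((K, J))
ll = rsdae_log_likelihood(S, La, X0, Xc, weights, biases,
                          lam_l=1.0, lam_r=1.0, lam_w=0.1, lam_n=1.0)
```

Under the Laplacian assumption, the trace term equals half the sum of A_ij ||s_i − s_j||² over all item pairs, so it penalizes linked items whose relational latent vectors differ.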

Updating Rules
For $S$:
$$S_{k*}(t+1) \leftarrow S_{k*}(t) + \delta(t)\, r(t)$$
$$r(t) \leftarrow \lambda_r X_{L/2,*k}^T - (\lambda_l L_a + \lambda_r I_J)\, S_{k*}(t)$$
$$\delta(t) \leftarrow \frac{r(t)^T r(t)}{r(t)^T (\lambda_l L_a + \lambda_r I_J)\, r(t)}$$
For $X$, $W$, and $b$: use back-propagation.
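The rules for S are steepest descent with an exact line search on a quadratic, so each row S_k* converges to the solution of the linear system (λ_l L_a + λ_r I_J) S_k*^T = λ_r X_{L/2,*k}. The sketch below applies the three rules row by row and compares the result with a direct solve. The function name, the step count, the stopping tolerance, and the toy Laplacian are illustrative assumptions.

```python
import numpy as np

def update_S(S, X_mid, La, lam_l, lam_r, n_steps=200):
    """Apply the steepest-descent rules to every row S_{k*}.

    S: K x J, X_mid: J x K (the middle-layer output X_{L/2}), La: J x J.
    """
    J = La.shape[0]
    M = lam_l * La + lam_r * np.eye(J)
    S = S.copy()
    for k in range(S.shape[0]):
        for _ in range(n_steps):
            r = lam_r * X_mid[:, k] - M @ S[k]      # residual r(t)
            rr = r @ r
            if rr < 1e-20:                          # residual is numerically zero
                break
            delta = rr / (r @ M @ r)                # exact line-search step size
            S[k] = S[k] + delta * r
    return S

rng = np.random.default_rng(0)
J, K = 4, 2                                         # hypothetical sizes
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
La = np.diag(A.sum(axis=1)) - A                     # assumed Laplacian form of L_a
X_mid = rng.random((J, K))
S = update_S(np.zeros((K, J)), X_mid, La, lam_l=1.0, lam_r=1.0)
S_exact = np.linalg.solve(1.0 * La + 1.0 * np.eye(J), 1.0 * X_mid).T
```

For a small J the direct solve is simpler; the iterative rules are attractive when J is large, because each step needs only matrix-vector products with the (typically sparse) L_a.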

From Representation to Tag Recommendation
Objective function:
$$\mathcal{L} = -\frac{\lambda_u}{2}\sum_i \|u_i\|_2^2 - \frac{\lambda_v}{2}\sum_j \|v_j - X_{L/2,j*}^T\|_2^2 - \sum_{i,j} \frac{c_{ij}}{2}\left(R_{ij} - u_i^T v_j\right)^2,$$
where $\lambda_u$ and $\lambda_v$ are hyperparameters. $c_{ij}$ is set to 1 for the existing ratings and 0.01 for the missing entries.
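The slides do not say which optimizer is used for this objective. One common choice for objectives of this form is block coordinate ascent with closed-form updates for each u_i and v_j, as in CTR-style models; the sketch below uses that choice as an assumption. The function names, the sizes, and the random data are hypothetical.

```python
import numpy as np

def objective(U, V, R, C, X_mid, lam_u, lam_v):
    """U: I x K (rows u_i), V: J x K (rows v_j), R and C: I x J, X_mid: J x K."""
    reg_u = -0.5 * lam_u * np.sum(U ** 2)
    reg_v = -0.5 * lam_v * np.sum((V - X_mid) ** 2)
    fit = -0.5 * np.sum(C * (R - U @ V.T) ** 2)
    return reg_u + reg_v + fit

def als_sweep(U, V, R, C, X_mid, lam_u, lam_v):
    """One coordinate-ascent sweep; each row update maximizes the objective exactly."""
    K = U.shape[1]
    U, V = U.copy(), V.copy()
    for i in range(U.shape[0]):
        Ci = C[i]                                   # confidences of tag i over all items
        lhs = (V * Ci[:, None]).T @ V + lam_u * np.eye(K)
        U[i] = np.linalg.solve(lhs, V.T @ (Ci * R[i]))
    for j in range(V.shape[0]):
        Cj = C[:, j]                                # confidences of item j over all tags
        lhs = (U * Cj[:, None]).T @ U + lam_v * np.eye(K)
        V[j] = np.linalg.solve(lhs, U.T @ (Cj * R[:, j]) + lam_v * X_mid[j])
    return U, V

rng = np.random.default_rng(0)
I, J, K = 5, 4, 2                                   # tags, items, latent size (hypothetical)
R = (rng.random((I, J)) > 0.6).astype(float)        # toy tag-item matrix
C = np.where(R > 0, 1.0, 0.01)                      # confidence values from the slide
X_mid = rng.random((J, K))                          # stands in for the learned X_{L/2}
U = 0.1 * rng.standard_normal((I, K))
V = X_mid.copy()
history = [objective(U, V, R, C, X_mid, 0.1, 10.0)]
for _ in range(5):
    U, V = als_sweep(U, V, R, C, X_mid, 0.1, 10.0)
    history.append(objective(U, V, R, C, X_mid, 0.1, 10.0))
```

Because each row update solves its own concave quadratic subproblem exactly, the values in `history` should never decrease from one sweep to the next.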

Algorithm
1. Learning representation:
   repeat
      Update $S$ using the updating rules
      Update $X$, $W$, and $b$
   until convergence
   Get resulting representation $X_{L/2,j*}$
2. Learning $u_i$ and $v_j$:
   Optimize the objective function $\mathcal{L}$
3. Recommend tags to items according to the predicted $R_{ij}$:
   $R_{ij} = u_i^T v_j$
   Rank $R_{1j}, R_{2j}, \ldots, R_{Ij}$
   Recommend tags with largest $R_{ij}$ to item $j$
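Step 3 of the algorithm amounts to scoring every tag for one item and keeping the highest-scoring ones. A minimal sketch, with a hypothetical function name and toy latent vectors:

```python
import numpy as np

def recommend_tags(U, V, j, top_m):
    """Return the indices of the top_m tags for item j, best first.

    U: I x K (rows u_i), V: J x K (rows v_j).
    """
    scores = U @ V[j]                               # predicted R_ij for every tag i
    order = np.argsort(-scores, kind="stable")      # descending by score
    return order[:top_m]

U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.0]])                          # four toy tag vectors
V = np.array([[1.0, 2.0]])                          # one toy item vector
top = recommend_tags(U, V, j=0, top_m=2)            # scores are [1, 2, 3, 0.5]
```

Here tag 2 scores 3 and tag 1 scores 2, so `top` holds the indices 2 and 1 in that order.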

Datasets
Description of datasets

                   citeulike-a   citeulike-t   movielens-plot
#items                   16980         25975             7261
#tags                     7386          8311             2988
#tag-item pairs         204987        134860            51301
#relations               44709         32665           543621

citeulike-a, Sparse Setting
[Figure: Recall versus M (50 to 300) for RSDAE, SDAE, CTR-SR, and CTR.]

citeulike-a, Dense Setting
[Figure: Recall versus M (50 to 300) for RSDAE, SDAE, CTR-SR, and CTR.]

movielens-plot, Sparse Setting
[Figure: Recall versus M (50 to 300) for RSDAE, SDAE, CTR-SR, and CTR.]

movielens-plot, Dense Setting
[Figure: Recall versus M (50 to 300) for RSDAE, SDAE, CTR-SR, and CTR.]
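The plots report recall as a function of M, the number of recommended tags. The slides do not define the metric, so the sketch below assumes the usual definition: for each item, the fraction of its true tags that appear among the top M recommendations, averaged over items. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def recall_at_m(scores, held_out, m):
    """scores: I x J predicted ratings; held_out: I x J binary matrix of true tag-item pairs.

    Returns the mean over items of (hits in top m) / (number of true tags).
    Items without any true tag are skipped.
    """
    recalls = []
    for j in range(scores.shape[1]):
        true_tags = np.flatnonzero(held_out[:, j])
        if true_tags.size == 0:
            continue
        top = np.argsort(-scores[:, j], kind="stable")[:m]
        hits = np.intersect1d(top, true_tags).size
        recalls.append(hits / true_tags.size)
    return float(np.mean(recalls))

scores = np.array([[0.9, 0.1],
                   [0.8, 0.7],
                   [0.1, 0.6],
                   [0.2, 0.3]])                     # four tags, two items
held_out = np.array([[1, 0],
                     [0, 1],
                     [1, 0],
                     [0, 1]])
r2 = recall_at_m(scores, held_out, m=2)             # one of two true tags found per item
r4 = recall_at_m(scores, held_out, m=4)             # every tag recommended
```

In this toy case each item has one of its two true tags in the top 2, so `r2` is 0.5, and `r4` is 1.0 because all four tags are recommended.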

Tagging Scientific Articles
An example article with recommended tags

Example article
Title: Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews
Top topic 1: language, text, mining, representation, semantic, concepts, words, relations, processing, categories

Top 10 tags
      SDAE                 True?   RSDAE                      True?
 1.   instance             no      sentiment analysis         no
 2.   consumer             yes     instance                   no
 3.   sentiment analysis   no      consumer                   yes
 4.   summary              no      summary                    no
 5.   31july09             no      sentiment                  yes
 6.   medline              no      product review mining      yes
 7.   eit2                 no      sentiment classification   yes
 8.   l2r                  no      31july09                   no
 9.   exploration          no      opinion mining             yes
10.   biomedical           no      product                    yes


