POLAR: Attention-based CNN for One-shot Personalized Article Recommendation

Zhengxiao Du, Jie Tang, Yuhui Ding
Tsinghua University
{duzx16, dingyh15}@mails.tsinghua.edu.cn, jietang@tsinghua.edu.cn

September 13, 2018
Motivation

The publication output is growing every year (data source: DBLP).

Figure: Number of publications per year, 1997-2016, by type (books and theses, conference and workshop papers, editorship, informal publications, journal articles, parts in books or collections, reference works).
Related-Article Recommendation

Figure: An example of related-article recommendation from AMiner.org.
Challenges

- How to provide both personalized and non-personalized recommendations?
- How to overcome the sparsity of user feedback?
- How to utilize the representative texts of articles effectively?
Problem Definition

Definition (One-shot Personalized Article Recommendation Problem)

Input:
- query article $d_q$
- candidate set $D = \{d_1, d_2, \cdots, d_N\}$
- support set $S = \{(\hat{d}_i, \hat{y}_i)\}_{i=1}^{T}$ related to user $u$

Output: a totally ordered set $R(d_q, S) \subset D$ with $|R| = k$
One-shot Learning

Image Classification [1]:

$$\hat{y} = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i$$

Article Recommendation: given the query article $d_q$ and the support set $\{(\hat{d}_i, \hat{y}_i)\}_{i=1}^{T}$, the score of candidate $d_i$ is

$$s_i = \underbrace{c(d_q, d_i)}_{\text{matching to the query article}} + \underbrace{\frac{1}{T}\sum_{j=1}^{T} c(\hat{d}_j, d_i)\, \hat{y}_j}_{\text{matching to the user preference (maybe missing)}}$$

[1] Vinyals et al., Matching Networks for One Shot Learning.
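A minimal sketch of this one-shot scoring rule in Python, assuming a pairwise matching function $c(\cdot,\cdot)$ is already available; the function and variable names below are illustrative, not from the paper:

import numpy as np

def one_shot_score(c_query, c_support, y_support):
    # c_query:   matching score c(d_q, d_i) between the query and one candidate
    # c_support: matching scores c(d_hat_j, d_i) against the T support articles
    # y_support: user feedback y_hat_j for the T support articles
    # Score = matching to the query + average feedback-weighted matching
    # to the user's support set.
    return c_query + np.mean(np.asarray(c_support) * np.asarray(y_support))

def recommend(c_query_all, c_support_all, y_support, k):
    # Illustrative usage: rank all candidates and keep the indices of the top k.
    scores = [one_shot_score(cq, cs, y_support)
              for cq, cs in zip(c_query_all, c_support_all)]
    return np.argsort(scores)[::-1][:k]

Ranking all candidates by $s_i$ and keeping the top $k$ yields the output set $R(d_q, S)$; when the user feedback is missing, the second term is dropped and the score reduces to the non-personalized match $c(d_q, d_i)$.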
Architecture

Figure: Overview of the POLAR architecture.
- Embedding layer: maps the query words $w_{q1}, w_{q2}, \cdots, w_{ql_q}$ and the candidate words $w_{d1}, w_{d2}, \cdots, w_{dl_d}$ to $k$-dimensional vectors.
- A matching matrix and an attention matrix are built from the two word sequences and combined as the conv input of a CNN (convolution and max-pooling layers, feature maps, hidden state, fully-connected layer), which outputs the matching score.
- One-shot matching against the support set $\{(\hat{d}_i, \hat{y}_i)\}_{i=1}^{T}$ turns the matching score into the personalized (final) score.
Matching Matrix and Attention Matrix

Matching Matrix: $(d_m, d_n) \to \mathbb{R}^{l_m \times l_n}$, the similarity between the words of the two articles.

$$M^{(m,n)}_{i,j} = \frac{\vec{w}_{mi}^{\top} \cdot \vec{w}_{nj}}{\|\vec{w}_{mi}\| \cdot \|\vec{w}_{nj}\|}$$

Attention Matrix: $(d_m, d_n) \to \mathbb{R}^{l_m \times l_n}$, the importance of the matching signals.

$$A^{(m,n)}_{i,j} = r_{mi} \cdot r_{nj}$$
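Both matrices follow directly from the word embeddings and word weights. A minimal sketch under these definitions (array names and shapes are illustrative assumptions):

import numpy as np

def matching_matrix(W_m, W_n):
    # W_m: (l_m, k) word embeddings of d_m; W_n: (l_n, k) word embeddings of d_n.
    # Entry (i, j) is the cosine similarity between word i of d_m and word j of d_n.
    W_m = W_m / np.linalg.norm(W_m, axis=1, keepdims=True)
    W_n = W_n / np.linalg.norm(W_n, axis=1, keepdims=True)
    return W_m @ W_n.T                      # shape (l_m, l_n)

def attention_matrix(r_m, r_n):
    # r_m: (l_m,) word weights of d_m; r_n: (l_n,) word weights of d_n.
    # Entry (i, j) is the product r_mi * r_nj.
    return np.outer(r_m, r_n)               # shape (l_m, l_n)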
Local Weight and Global Weight

The word weight $r_t$ is the product of its local weight and its global weight.

Global Weight: the importance of a word in the corpus (shared among different articles),

$$\upsilon_{ij} = [\mathrm{IDF}(t_{ij})]^{\beta}$$

The local weight is a little more complicated...
Local Weight

Local Weight: the importance of a word in the article. A neural network is employed to compute the local weight from the feature vector

$$\vec{x}_{ij} = \vec{w}_{ij} - \bar{\vec{w}}_i$$

Figure: The triangular points denote the vectors of the words in two texts, the circular points denote the mean vectors of the texts, and the lines with arrows denote the feature vectors.
Local Weight Network

The feature vector $\vec{x}_{ij}$ represents the semantic difference between the article and the term. Let $\vec{u}^{(L)}_{ij}$ be the output of the last linear layer; the output of the local weight network is

$$\mu_{ij} = \sigma(W^{(L)} \cdot \vec{u}^{(L)}_{ij} + b^{(L)}) + \alpha$$

where $\alpha$ sets a lower bound for the local weights.
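A minimal sketch of the word-weight computation under these definitions; the hidden size, depth, and the default values of alpha and beta below are assumptions, not the paper's configuration:

import torch
import torch.nn as nn

class LocalWeightNet(nn.Module):
    # Sketch of the local weight network; layer sizes are illustrative.
    def __init__(self, emb_dim, hidden_dim=64, alpha=0.1):
        super().__init__()
        self.alpha = alpha                                   # lower bound for local weights
        self.hidden = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU())
        self.out = nn.Linear(hidden_dim, 1)                  # W^(L), b^(L)

    def forward(self, word_vecs):                            # word_vecs: (l_i, emb_dim)
        # Feature vectors: word embeddings minus the article's mean embedding.
        x = word_vecs - word_vecs.mean(dim=0, keepdim=True)
        u = self.hidden(x)
        return torch.sigmoid(self.out(u)).squeeze(-1) + self.alpha

def word_weights(local_net, word_vecs, idf, beta=1.0):
    # Final weight r = local weight * global weight (IDF raised to the power beta).
    return local_net(word_vecs) * (idf ** beta)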
CNN & Training

The matching matrix and the attention matrix are combined by element-wise multiplication and fed to a CNN.

Figure: The combined matrix is the conv input of convolution and max-pooling layers; the resulting feature maps form a hidden state that is passed through a fully-connected layer to produce the matching score.

The entire model, including the local weight network, is trained on the target task.
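A minimal sketch of such a matching CNN; the channel count, kernel size, and fixed pooling size are assumptions rather than the paper's exact settings:

import torch
import torch.nn as nn

class MatchingCNN(nn.Module):
    # Sketch of the CNN that turns the combined matrix into a matching score.
    def __init__(self, channels=16, pooled=8):
        super().__init__()
        self.conv = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool2d(pooled)      # handles variable l_m x l_n sizes
        self.fc = nn.Linear(channels * pooled * pooled, 1)

    def forward(self, matching, attention):           # both: (l_m, l_n)
        # Combine the matching and attention matrices by element-wise multiplication.
        x = (matching * attention).unsqueeze(0).unsqueeze(0)   # (1, 1, l_m, l_n)
        h = torch.relu(self.conv(x))                  # convolution
        h = self.pool(h).flatten(1)                   # max-pooling, flatten to hidden state
        return self.fc(h).squeeze()                   # matching score c(d_m, d_n)

Because the local weight network feeds into the attention matrix, training the model end to end on the target task updates it jointly with the CNN.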
Dataset

- AMiner: papers from ArnetMiner [1]
- Patent: patent documents from the USPTO
- RARD (Related-Article Recommendation Dataset) [2]: from Sowiport, a digital library service provider

[1] Tang et al. ArnetMiner: Extraction and Mining of Academic Social Networks. In SIGKDD'2008.
[2] Beel et al. RARD: The Related-Article Recommendation Dataset. 2017.