Shallow Reading with Deep Learning: Predicting popularity of online content using only its title
W. Stokowiec, T. Trzciński, K. Wołk, K. Marasek and P. Rokita
Polish-Japanese Academy of Information Technology, Warsaw University of Technology, Tooploox
Presentation plan
1. Popularity: what is it exactly?
2. Datasets description
3. Baselines
   1. BoW + SVM
   2. CNN
4. Bi-LSTM (our approach)
5. Results
Problem
Given a title, predict whether the content will perform well with respect to a given popularity metric (views, reactions, etc.).
Problem definition
Title: This Syrian-American poet just lost 10 family members in Syria — her story will break your heart (via NowThis Politics)
Views: 5,623,842
Title: ‘He colluded or obstructed’: Trump turns Russia suspicions against Obama
Reactions: 4,336
Problem definition
• Popularity prediction is framed as a binary classification task.
• Samples are split into two classes at the median of the normalized popularity metric distribution.
[Figure panels: Distribution of views for the NowThisNews dataset; Distribution of logarithmized views for the NowThisNews dataset]
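A minimal sketch of the labeling step in Python. It assumes that "normalized" means a log transform, log(1 + x), and that items exactly at the median go to the unpopular class; the function name `popularity_labels` and the view counts are hypothetical.

```python
import math
import statistics

def popularity_labels(counts):
    """Label each item 1 (popular) or 0 (unpopular) by a median split.

    Assumptions: normalization is log(1 + x); ties at the median go to class 0.
    """
    log_counts = [math.log1p(c) for c in counts]
    threshold = statistics.median(log_counts)
    return [1 if v > threshold else 0 for v in log_counts]

views = [120, 5_623_842, 3_400, 87_000, 950]  # hypothetical view counts
print(popularity_labels(views))
```

Because log(1 + x) is monotonic, this split yields the same labels as a median split on the raw counts; the transform mainly changes how the heavy-tailed distribution looks, not where the class boundary falls.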
Datasets: NowThis News (4K)
Popularity proxy: number of views one week after publication
Datasets: The BreakingNews Dataset (38K)
Popularity proxy: number of comments under the article
This is amazing!
Last chance for a good title
Keyword analysis
Related work
• Most prior work focuses on Twitter and its specific characteristics, such as retweets or social graph analysis (Hong, 2011).
• Recently, several works have addressed multimodal popularity prediction (Trzcinski, 2017; Chen, 2016).
• Popularity of online articles has also been predicted from their whole text (Ramisa, 2016).
• In our work, we focus only on the title, ignoring everything else.
Baselines
Bag of Words + SVM with a linear kernel
CNN (Ramisa, 2016)
• A title is represented by a D x N matrix of concatenated GloVe word vectors.
• 256 convolution filters with width 5 and stride 1, followed by max pooling (x3).
• FC layer with L2 regularization, ReLU and dropout.
• Final FC layer with sigmoid*.
[Figure: Convolutional Neural Network (Ramisa, 2016)]
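The BoW + SVM baseline can be sketched without external libraries. This is an illustration under assumptions, not the setup used in the talk: whitespace tokenization, toy titles and labels, no bias term, and a linear SVM trained by Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss (a stand-in for whatever SVM solver was actually used). All function names are hypothetical.

```python
import random
from collections import Counter

def tokenize(title):
    # Lowercased whitespace tokenization (hypothetical; no punctuation handling).
    return title.lower().split()

def build_vocab(titles):
    # Map each distinct token to a feature index.
    vocab = {}
    for t in titles:
        for w in tokenize(t):
            vocab.setdefault(w, len(vocab))
    return vocab

def bow_vector(title, vocab):
    # Sparse bag of words {feature index: count}; unknown words are dropped.
    counts = Counter(w for w in tokenize(title) if w in vocab)
    return {vocab[w]: c for w, c in counts.items()}

def dot(w, x):
    return sum(w[j] * v for j, v in x.items())

def train_linear_svm(X, y, dim, lam=0.01, epochs=200, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss; y[i] in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decreasing step size
            margin = y[i] * dot(w, X[i])   # margin under the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
            if margin < 1.0:               # hinge loss active: move toward y[i] * x
                for j, v in X[i].items():
                    w[j] += eta * y[i] * v
    return w

def predict(w, x):
    return 1 if dot(w, x) > 0 else -1

titles = ["this will break your heart", "you will not believe this",
          "quarterly report released", "committee publishes minutes"]
labels = [1, 1, -1, -1]  # hypothetical: +1 popular, -1 unpopular
vocab = build_vocab(titles)
X = [bow_vector(t, vocab) for t in titles]
w = train_linear_svm(X, labels, dim=len(vocab))
print(predict(w, bow_vector("this will amaze you", vocab)))
```

A sparse dictionary per title keeps the feature vectors small, since a title uses only a handful of the vocabulary's words.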
Bidirectional Long Short-Term Memory Network
Architecture
• 1-of-K word encoding
• GloVe as an embedding layer
• Bidirectional LSTM for title encoding
• Regularization (dropout, L2)
• Sigmoid output on top
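The forward pass of this architecture can be sketched in dependency-free Python. Assumptions: random vectors stand in for the pretrained GloVe rows, the weights are untrained, dropout and L2 (training-time regularizers) are omitted, and the title is encoded by concatenating the last forward state with the backward state at the first position, which is one common choice rather than a confirmed detail of the model. Sizes, the toy vocabulary, and all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LSTMCell:
    """Single-layer LSTM with randomly initialized (untrained) weights."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input i, forget f, output o, candidate g.
        self.params = {gate: (rand_matrix(hidden_dim, input_dim, rng),
                              rand_matrix(hidden_dim, hidden_dim, rng),
                              [0.0] * hidden_dim)
                       for gate in "ifog"}

    def _pre(self, gate, x, h):
        W, U, b = self.params[gate]
        return [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h, c):
        i = [sigmoid(z) for z in self._pre("i", x, h)]
        f = [sigmoid(z) for z in self._pre("f", x, h)]
        o = [sigmoid(z) for z in self._pre("o", x, h)]
        g = [math.tanh(z) for z in self._pre("g", x, h)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        states = []
        for x in xs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states

def bilstm_encode(xs, fwd, bwd):
    """Per-word states [h_fwd_t ; h_bwd_t], aligned to input positions."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # read right-to-left, then realign
    return [hf + hb for hf, hb in zip(forward, backward)]

rng = random.Random(0)
vocab = {"<unk>": 0, "her": 1, "story": 2, "will": 3, "break": 4, "your": 5, "heart": 6}
EMB, HID = 4, 3
embedding = rand_matrix(len(vocab), EMB, rng)  # stand-in for GloVe rows
fwd, bwd = LSTMCell(EMB, HID, rng), LSTMCell(EMB, HID, rng)
w_out, b_out = rand_matrix(1, 2 * HID, rng)[0], 0.0

title = "her story will break your heart"
ids = [vocab.get(w, 0) for w in title.split()]  # 1-of-K indices
xs = [embedding[k] for k in ids]                # embedding lookup
states = bilstm_encode(xs, fwd, bwd)
# Title encoding (assumption): last forward state + first-position backward state.
title_vec = states[-1][:HID] + states[0][HID:]
p_popular = sigmoid(sum(a * v for a, v in zip(w_out, title_vec)) + b_out)
print(len(states), len(states[0]), round(p_popular, 3))
```

Running the backward cell on the reversed sequence and reversing its outputs keeps both directions aligned, so position t holds states that summarize the words before and after w_t.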
Results
We used a k-fold cross-validation protocol with k = 5.
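A sketch of the 5-fold split, assuming a single random shuffle without stratification; `k_fold_indices` is a hypothetical helper. Each item lands in exactly one test fold, and the model is trained on the remaining four folds each time; per-fold scores are then typically averaged.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Shuffles once; fold sizes differ by at most one.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # round-robin assignment to folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

splits = list(k_fold_indices(23, k=5))
for train, test in splits:
    print(len(train), len(test))
```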
Results
BiLSTM Hidden State Interpretation
• The concatenation of the forward and backward hidden states at time t, $h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$, can be seen as a context-dependent vector representation of word $w_t$.
• This allows us to introspect a given title and approximate the contribution of each word in the sequence to its popularity.
• The output of the last fully connected layer, applied to $h_t$, can be interpreted as the context-dependent influence of word $w_t$ on popularity.
[Figure: Visualization of context-dependent word influence]
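A sketch of the introspection step: the final fully connected layer and sigmoid are applied to each per-word state h_t instead of only to the title encoding. The states, weights, and the helper name `word_influence` are hypothetical toy values (hidden size 1 per direction), not outputs of the trained model.

```python
import math

def word_influence(words, states, w_out, b_out):
    """Apply the final FC layer + sigmoid to each per-word state h_t.

    Returns (word, score) pairs; a score above 0.5 means that state on its own
    would be classified as popular.
    """
    scores = []
    for word, h_t in zip(words, states):
        z = sum(wi * hi for wi, hi in zip(w_out, h_t)) + b_out
        scores.append((word, 1.0 / (1.0 + math.exp(-z))))
    return scores

words = ["her", "story", "will", "break", "your", "heart"]
# Hypothetical concatenated states [forward ; backward].
states = [[0.1, -0.3], [0.0, 0.0], [0.3, 0.1], [0.9, 0.4], [0.2, -0.1], [0.8, 0.6]]
w_out, b_out = [2.0, 1.0], 0.0  # hypothetical trained FC weights
result = word_influence(words, states, w_out, b_out)
for word, s in result:
    print(f"{word:>6s} {s:.3f}")
```

Because the same layer scores the whole title, per-word scores live on the same scale as the title-level prediction, which is what makes a word-level heat map comparable across titles.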
Conclusions
• To our knowledge, this is the first attempt at predicting the performance of content on social media using only textual information from its title.
• We show that our method consistently outperforms the baseline models.
• We are able to introspect the model and use its hidden states to better understand audience preferences.
Thank you!
References
1. A. Ramisa, F. Yan, F. Moreno-Noguer, K. Mikolajczyk. BreakingNews: Article Annotation by Image and Text Processing. arXiv:1603.07141 [cs.CV], 2016.
2. J. Chen, X. Song, L. Nie, X. Wang, H. Zhang, T. Chua. Micro tells macro: Predicting the popularity of micro-videos via a transductive model. In Proc. ACM Multimedia (ACM MM), 2016.
3. L. Hong, O. Dan, B. Davison. Predicting popular messages in Twitter. In Proc. International Conference Companion on World Wide Web, 2011.
4. T. Trzcinski, P. Rokita. Predicting popularity of online videos using Support Vector Regression. IEEE Transactions on Multimedia (TMM), 2017.