  1. #TagSpace: Semantic Embeddings from Hashtags Jason Weston, Sumit Chopra, Keith Adams – 2014 Jack Lanchantin

  2. Motivation • Word and document embeddings are difficult to learn • Most current techniques use unsupervised methods • word2vec learns word embeddings by trying to predict each word in a document from its surrounding text • Hashtags are labels of text, such as sentiment (#happy) or topic annotation (#nyc), written by the post author • Hashtag prediction provides a better way to learn word and document embeddings than unsupervised learning because hashtags give stronger semantic guidance

  3. Overview Model Hashtag Prediction Document Recommendation Conclusion

  4. Overview • #TagSpace: Convolutional Neural Network that learns features (embeddings) of short textual posts using hashtags as the supervised signal • Train the network to be able to optimally predict hashtags on test posts • The learned embedding of text (ignoring the hashtag labels) is useful for other tasks such as document recommendation

  5. Overview Model Hashtag Prediction Document Recommendation Conclusion

  6. Neural Net For Scoring a (doc, hashtag) Pair [architecture diagram: a d-dimensional vector is assigned to each of the l words in the post, and a d-dimensional vector to the hashtag; hidden network layers (convolution and pooling) build a representation of the entire document, which is fed to a scoring function together with the hashtag vector]
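The diagram above can be sketched in code. This is a minimal illustration, not the paper's implementation: the vocabulary size, tag count, embedding dimension d, convolution window K, and random weights are all assumptions chosen for the example; in #TagSpace these weights are learned.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T, d, K = 1000, 50, 64, 3          # assumed sizes for illustration

word_lut = rng.normal(scale=0.1, size=(V, d))    # lookup table: word vectors
tag_lut = rng.normal(scale=0.1, size=(T, d))     # lookup table: hashtag vectors
conv_W = rng.normal(scale=0.1, size=(K * d, d))  # convolution weights
lin_W = rng.normal(scale=0.1, size=(d, d))       # final linear layer

def embed_doc(word_ids):
    """Map a post (list of l word ids) to one d-dimensional embedding."""
    E = word_lut[word_ids]                        # (l, d) word vectors
    l = len(word_ids)
    # Convolution over a sliding window of K words (zero-padded at the ends).
    padded = np.vstack([np.zeros(((K - 1) // 2, d)), E, np.zeros((K // 2, d))])
    conv = np.tanh(np.stack([padded[i:i + K].ravel() @ conv_W
                             for i in range(l)]))  # (l, d)
    pooled = conv.max(axis=0)                      # max pooling over positions
    return np.tanh(pooled @ lin_W)                 # document representation

def score(word_ids, tag_id):
    """f(w, t): inner product of document and hashtag embeddings."""
    return float(embed_doc(word_ids) @ tag_lut[tag_id])

doc = [5, 17, 42, 7]                               # a toy post of l=4 word ids
ranked = sorted(range(T), key=lambda t: -score(doc, t))
```

Ranking all T hashtags by `score` is exactly how predictions are read off the network at test time.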

  7. Training the Scoring Function • Given a document w, rank all hashtags t by the score f(w, t) • A WARP (weighted approximate-rank pairwise) ranking loss is used to approximately optimize the top of the ranked list, which is useful for precision and recall at k • More energy is spent improving the ranking of positive labels near the top of the ranked list
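The WARP sampling trick behind this loss can be sketched as follows. This is an illustrative sketch, assuming precomputed scores for all tags; the function name, margin value, and rank-weighting choice L(k) = sum of 1/i are taken from the WARP literature, not verbatim from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def warp_update_weight(scores, pos, margin=1.0, max_trials=None):
    """One WARP sampling step for a positive tag `pos`.

    Samples negative tags until one violates the margin, then returns that
    negative and a weight L(rank) that grows with the estimated rank of the
    positive tag -- positives ranked low receive larger updates, so the top
    of the list is optimized first. Returns (None, 0.0) if no violator is
    found within the sampling budget.
    """
    T = len(scores)
    max_trials = max_trials or T - 1
    trials = 0
    while trials < max_trials:
        neg = int(rng.integers(T))
        if neg == pos:
            continue
        trials += 1
        if scores[neg] + margin > scores[pos]:     # margin violated
            est_rank = (T - 1) // trials           # rank estimated from trials
            # L(k) = sum_{i=1}^{k} 1/i emphasizes the top of the ranking.
            weight = sum(1.0 / i for i in range(1, est_rank + 1))
            return neg, weight
    return None, 0.0
```

In training, the returned weight scales a pairwise gradient step pushing f(w, pos) above f(w, neg) by the margin.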

  8. Overview Model Hashtag Prediction Document Recommendation Conclusion

  9. Hashtag Prediction • Goal: rank a post’s ground-truth hashtags higher than hashtags it does not contain • Test using: Precision@1, Recall@10, and mean rank for the hashtags of 50,000 test posts • Compared to 4 other models: – Frequency: always ranks hashtags by training frequency – #words: “crazy commute this am” → #crazy, #commute, #this, #am – Word2vec (unsupervised) – WSABIE (supervised)
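The three evaluation metrics above are simple to state in code. A per-post sketch (in practice each metric is averaged over the 50,000 test posts); the example hashtags are hypothetical.

```python
def precision_at_1(ranked_tags, true_tags):
    """1.0 if the top-ranked hashtag is a true hashtag of the post, else 0.0."""
    return float(ranked_tags[0] in true_tags)

def recall_at_k(ranked_tags, true_tags, k=10):
    """Fraction of the post's true hashtags found in the top k predictions."""
    return len(set(ranked_tags[:k]) & set(true_tags)) / len(true_tags)

def mean_rank(ranked_tags, true_tags):
    """Average 1-based position of the true hashtags in the ranked list."""
    pos = {t: i + 1 for i, t in enumerate(ranked_tags)}
    return sum(pos[t] for t in true_tags) / len(true_tags)

# Hypothetical example: model ranking vs. the post's actual hashtags.
ranked = ["#nyc", "#happy", "#food", "#la"]
true = ["#nyc", "#food"]
```

For this example, precision_at_1 is 1.0, recall_at_k with k=2 is 0.5 (only #nyc is in the top 2), and mean_rank is 2.0.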

  10. Data • Posts drawn from two sources: pages (business, celebrity, brand, or product) and individual users

  11. #TagSpace Examples (256 dim) [table: example posts and their predicted hashtags]

  12. Hashtag Prediction Results

  13. Overview Model Hashtag Prediction Document Recommendation Conclusion

  14. Personalized Document Recommendation • Goal: extend the representations learned for hashtag prediction to other tasks • Document recommendation: recommending documents to users based on their interaction history • Used day-long interaction histories of 34,000 people on Facebook • A history is the text of posts the user liked, clicked, or replied to • Given the n-1 trailing posts, predict the nth post by ranking it against 10,000 other posts • The score of the nth post is its maximum embedding similarity over the n-1 trailing posts • Cosine similarity between post embeddings is used
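The max-over-history cosine scoring described above can be sketched directly. A minimal sketch assuming post embeddings are already computed (e.g. by the #TagSpace network); the function names and toy vectors are illustrative.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(history_embs, candidate_embs):
    """Rank candidate posts for one user.

    history_embs:   (n-1, d) embeddings of the user's trailing posts
    candidate_embs: (m, d) embeddings of candidate posts (the held-out
                    nth post plus distractors)
    Each candidate is scored by its maximum cosine similarity to any post
    in the history; candidates are returned in descending score order.
    """
    scores = [max(cosine(h, c) for h in history_embs) for c in candidate_embs]
    order = sorted(range(len(candidate_embs)), key=lambda i: -scores[i])
    return order, scores

# Toy 2-d example: history post points along the x-axis.
hist = np.array([[1.0, 0.0]])
cands = np.array([[2.0, 0.0], [0.0, 3.0]])
order, scores = recommend(hist, cands)
```

In the evaluation, the true nth post is ranked against 10,000 distractors, so a good embedding should place it near the top of `order`.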

  15. Document Recommendation Results • Baseline: TF-IDF weighted bag-of-words • Best results come from summing the bag-of-words scores with the #TagSpace scores

  16. Overview Model Hashtag Prediction Document Recommendation Conclusion

  17. Conclusion • Outperformed all comparison models in hashtag prediction • The model scales well to a large number (millions) of hashtags, where logistic regression and SVMs do not • The semantics of hashtags lead #TagSpace to learn features that capture important aspects of the text • The learned embeddings port to the task of personalized document recommendation with better accuracy than the other models

  18. References • #TagSpace: https://research.facebook.com/publications/279494668926031/-tagspace-semantic-embeddings-from-hashtags/ • WSABIE: http://www.thespermwhale.com/jaseweston/papers/wsabie-ijcai.pdf • Word2Vec: http://arxiv.org/abs/1301.3781
