
Word Embeddings for Arabic Sentiment Analysis



  1. Word Embeddings for Arabic Sentiment Analysis. A. Aziz Altowayan and L. Tao, Pace University. IEEE BigData 2016, Workshop (Dec. 5, 2016).

  2. Disclaimer: This work directly applies existing word embedding techniques to Arabic text, with two main objectives in mind:
     - Measure how well (automatic) embedding-based features perform compared with manually crafted features on Arabic text.
     - Release the pre-trained embeddings (and the datasets) for future use and comparison.

  3. Problem: Polarity classification and subjectivity detection in Arabic text, treated as a simple classification problem.

  4. Approach: Workflow

  5. Data: For training word embeddings, we build the following corpus:

  6. Data: For classification, we experiment on 3 different datasets:
     - Arabic translation of MPQA (Banea et al., 2010)
     - Book reviews “LABR” (Aly and Atiya, 2013)
     - Twitter (table below)

  7. Word Vectors: CBOW (word2vec parameters: window size 10, dimension 300). The training objective is $\sum_{w=1}^{T} \sum_{c \in C} \log p(w \mid c)$, where w is the current word and c ranges over its context words C. The probability p is calculated using Negative Sampling (“NCE”).
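To make the objective on the slide above concrete, here is a minimal sketch of how negative sampling scores one training position in CBOW. It is an illustration under stated assumptions, not the authors' code: the function name `cbow_neg_sampling_log_prob`, the plain-list vectors, and the choice of two noise words are all hypothetical, and a real word2vec run would use 300-dimensional vectors and a window of 10 as stated on the slide.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cbow_neg_sampling_log_prob(context_vecs, target_vec, negative_vecs):
    """Negative-sampling surrogate for log p(w | c) at one position.

    context_vecs:  input vectors of the context words c
    target_vec:    output vector of the current word w
    negative_vecs: output vectors of sampled noise words (hypothetical here)
    """
    dim = len(target_vec)
    # CBOW averages the context vectors into one hidden representation.
    h = [sum(v[i] for v in context_vecs) / len(context_vecs) for i in range(dim)]
    # Reward a high score for the true word ...
    score = math.log(sigmoid(dot(target_vec, h)))
    # ... and a low score for each sampled noise word.
    for neg in negative_vecs:
        score += math.log(sigmoid(-dot(neg, h)))
    return score


# Toy 2-dimensional example (values are made up for illustration).
context = [[0.5, 0.1], [0.3, -0.2]]
target = [0.4, 0.0]
noise = [[-0.3, 0.2], [0.1, -0.5]]
print(cbow_neg_sampling_log_prob(context, target, noise))
```

Summing this quantity over every position in the corpus gives the surrogate that training maximizes; each term is a log of a sigmoid, so the value is always negative and moves toward zero as the model improves.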

  8. Vectors Evaluation: How do we evaluate the quality of the embedding vectors? Embeddings are evaluated based on the classifiers’ performance.

  9. Vectors Evaluation: How do we evaluate the quality of the embedding vectors? Embeddings are evaluated based on the classifiers’ performance. The results are then verified manually, e.g. with sample analogy queries.
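An analogy query of the kind mentioned above can be sketched with vector arithmetic and cosine similarity. The slides do not show the queries the authors ran, so the vocabulary and the 3-dimensional vectors below are made-up English stand-ins, and `analogy` is a hypothetical helper rather than part of any library.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Return the word d that best completes 'a is to b as c is to d'."""
    dim = len(vectors[a])
    # Offset query: vec(b) - vec(a) + vec(c).
    query = [vectors[b][i] - vectors[a][i] + vectors[c][i] for i in range(dim)]
    # The three query words themselves are excluded from the candidates.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], query))


# Toy vectors, invented for illustration only.
toy_vectors = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [0.2, 0.0, 1.0],
    "queen": [0.2, 1.0, 1.0],
    "book": [0.5, 0.5, 0.0],
}
print(analogy(toy_vectors, "man", "woman", "king"))  # prints "queen"
```

With real embeddings the same query would be run over the full vocabulary, and the top few neighbours would be inspected by hand rather than only the single best match.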

  10. Features Representation: Embedding vs. manual features
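One common way to turn word embeddings into a fixed-length feature vector for a classifier is to average the vectors of the words in a sentence. The slides do not say how the authors built their features, so the averaging scheme, the helper name `sentence_features`, and the toy vectors below are assumptions for illustration.

```python
def sentence_features(tokens, embeddings, dim):
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No known word: fall back to an all-zero feature vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 3-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "good": [1.0, 0.0, 2.0],
    "film": [3.0, 2.0, 0.0],
}
print(sentence_features(["good", "film", "unseen"], toy_embeddings, 3))
# prints [2.0, 1.0, 1.0]
```

The resulting vector has the same length for every sentence, so it can be passed to a conventional classifier without any manually designed features.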

  11. Features Representation (linguistic issues):
     - Spelling errors
     - Semantic relatedness (lexicon-like?)

  12. Results and Comparison: Banea et al. (2010) and Mourad et al. (2013) compared their models on two datasets: MPQA (“Subjectivity”) and ArabSenti (“Sentiment”). Results are compared on MPQA only, because we could not obtain the ArabSenti data.

  13. Results and Comparison:
     - Performance can improve with more data.
     - Accuracy is better with more examples (not a fair comparison).
     - Possibly, a larger embedding vocabulary will give better accuracy too. Current size: ~159K vocabulary.

  14. Future Work and Conclusion:
     - Future work: vector evaluation; incorporating manual features into the model; deep models instead of the conventional classifiers.
     - Conclusion: simple word embeddings (i.e. without any modification of the training samples) perform better than the top hand-crafted methods.

  15. Q/A: Thank you for your attention.
