Word Embeddings for Arabic Sentiment Analysis
A. Aziz Altowayan and L. Tao, Pace University
IEEE BigData 2016, Workshop (Dec. 5, 2016)
DISCLAIMER
This work directly applies existing word embedding techniques to Arabic text, with two main objectives in mind:
How well can (automatic) embedding-based features perform compared to manually crafted features on Arabic text?
Release the pre-trained embeddings (and the datasets) for future use and comparison.
Problem
Polarity classification and subjectivity detection in Arabic text. A simple classification problem.
Approach
Workflow
Data
For training word embeddings, we build the following corpus:
Data
For classification, we experiment on 3 different datasets:
Arabic translation of MPQA (Banea et al., 2010)
Book reviews "LABR" (Aly and Atiya, 2013)
Twitter (table below)
Word Vectors
CBOW objective:

  (1/T) Σ_{w=1}^{T} Σ_{c ∈ C} log p(w | c)

where w = the current word and c = its context words.
(word2vec parameters: window size 10, dimension 300; p(w | c) is estimated using negative sampling "NCE".)
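The CBOW-with-negative-sampling objective above can be sketched in a few lines of numpy. This is a toy illustration with a hypothetical corpus and small hyperparameters, not the paper's actual training setup (which uses word2vec with window 10 and dimension 300):

```python
# Toy CBOW with negative sampling: predict a word from the average of its
# context vectors, updating only the positive target and a few noise words.
import numpy as np

rng = np.random.default_rng(0)
corpus = "the movie was great the movie was bad".split()  # hypothetical corpus
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, dim, window, negatives, lr = len(vocab), 8, 2, 3, 0.05

W_in = rng.normal(0, 0.1, (V, dim))   # context (input) vectors
W_out = rng.normal(0, 0.1, (V, dim))  # target (output) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(50):
    for t, word in enumerate(corpus):
        # context word indices within the window, excluding the target itself
        ctx = [w2i[corpus[j]]
               for j in range(max(0, t - window), min(len(corpus), t + window + 1))
               if j != t]
        h = W_in[ctx].mean(axis=0)            # averaged context representation
        targets = [(w2i[word], 1.0)]          # one positive sample ...
        targets += [(int(rng.integers(V)), 0.0) for _ in range(negatives)]  # ... plus noise
        grad_h = np.zeros(dim)
        for idx, label in targets:
            score = sigmoid(h @ W_out[idx])
            g = (score - label) * lr          # gradient of the log-loss
            grad_h += g * W_out[idx]
            W_out[idx] -= g * h
        W_in[ctx] -= grad_h / len(ctx)        # propagate back to context vectors

print(W_in.shape)
```

Each update touches only `1 + negatives` output rows instead of the full vocabulary, which is what makes negative sampling cheap at scale.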
Vectors Evaluation
How do we evaluate the quality of the embedding vectors?
Embeddings are evaluated based on the classifiers' performance.
Then results are verified manually, e.g., with sample analogy queries.
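A manual analogy query amounts to nearest-neighbor search over vector arithmetic. A minimal sketch, using hypothetical toy vectors (the paper's actual embeddings are 300-dimensional Arabic word2vec vectors):

```python
# Analogy query: find the word whose vector is closest (by cosine similarity)
# to vec(a) - vec(b) + vec(c), excluding the query words themselves.
import numpy as np

emb = {  # hypothetical 3-d vectors for illustration only
    "king":  np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.8, 0.0, 0.1]),
    "woman": np.array([0.1, 0.9, 0.1]),
    "queen": np.array([0.2, 0.95, 0.85]),
}

def analogy(a, b, c, emb):
    query = emb[a] - emb[b] + emb[c]
    best, best_sim = None, -2.0
    for w, v in emb.items():
        if w in (a, b, c):               # skip the input words
            continue
        sim = query @ v / (np.linalg.norm(query) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("king", "man", "woman", emb))  # -> "queen" for these toy vectors
```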
Features Representation
Embedding vs. manual features
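A common way to turn word embeddings into document-level features is to average the vectors of a sentence's words, then feed that fixed-size vector to a conventional classifier. A minimal sketch with hypothetical tokens and vectors (the paper's vectors are 300-dimensional):

```python
# Embedding-based feature representation: mean of the known word vectors.
import numpy as np

dim = 4
emb = {"good": np.ones(dim), "movie": np.zeros(dim), "bad": -np.ones(dim)}

def sentence_features(tokens, emb, dim):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

x = sentence_features(["good", "movie"], emb, dim)
print(x)  # [0.5 0.5 0.5 0.5]
```

Unlike manually crafted lexicon features, this representation requires no feature engineering, though it is sensitive to out-of-vocabulary words (e.g., spelling errors).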
Features Representation (linguistic issues)
Spelling errors
Semantic relatedness (lexicon-like?)
Results and Comparison
Banea et al. (2010) and Mourad et al. (2013) compared their models on two datasets: MPQA (subjectivity) and ArabSenti (sentiment).
Results compared on MPQA.
(We could not obtain the ArabSenti data.)
Results and Comparison
Performance can improve with more data: better accuracy with more examples (though not a fair comparison).
A larger embedding vocabulary would likely improve accuracy too. Current size: ~159K vocabulary.
Future Work and Conclusion
Future Work
Vectors evaluation.
Incorporate manual features into the model.
Deep models instead of the conventional classifiers.
Conclusion
Simple word embeddings (i.e., without any modification of the training samples) perform better than top hand-crafted methods.
Q/A
Thank you for your attention.