Introduction to Artificial Intelligence: CoreNLP, Semantic Analysis, Naive Bayes Classifier - PowerPoint PPT Presentation

Janyl Jumadinova, November 18, 2016


  1. Introduction to Artificial Intelligence: CoreNLP, Semantic Analysis, Naive Bayes Classifier. Janyl Jumadinova, November 18, 2016

  2. CoreNLP
     ◮ Reference: http://stanfordnlp.github.io/CoreNLP/
     ◮ Package available in /opt/corenlp/
     ◮ Run: java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file input.txt (a programmatic equivalent is sketched below)
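
     The same pipeline can also be driven from Java rather than the command line. A minimal sketch, assuming the CoreNLP 3.x jars under /opt/corenlp/ are on the classpath; the class name PipelineDemo and the example sentence are illustrative, while the pipeline classes and the "annotators" property come from the CoreNLP reference above:

        import edu.stanford.nlp.pipeline.Annotation;
        import edu.stanford.nlp.pipeline.StanfordCoreNLP;
        import java.util.Properties;

        public class PipelineDemo {
            public static void main(String[] args) {
                // Same annotators as the command-line invocation above.
                Properties props = new Properties();
                props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
                StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

                // Annotate an in-memory string instead of -file input.txt.
                Annotation document = new Annotation("Stanford CoreNLP was released in 2010.");
                pipeline.annotate(document);
                // 'document' now carries sentence, token, POS, lemma, and NER annotations.
            }
        }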

  3-5. CoreNLP Annotators (http://stanfordnlp.github.io/CoreNLP/annotators.html)
     ◮ tokenize: Creates tokens from the given text.
     ◮ ssplit: Separates a sequence of tokens into sentences.
     ◮ pos: Creates Parts of Speech (POS) tags for tokens.
     ◮ ner: Performs Named Entity Recognition classification.

  6-8. CoreNLP Annotators (continued)
     ◮ lemma: Creates word lemmas for tokens.
       – The goal of lemmatization (as of stemming) is to reduce related forms of a word to a common base form.
       – Lemmatization usually uses a vocabulary and morphological analysis of words to:
         – remove inflectional endings only, and
         – return the base or dictionary form of a word, which is known as the lemma.
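
     A sketch of reading these annotators' output back token by token, again assuming the CoreNLP 3.x API from the reference above (the example sentence is illustrative):

        import edu.stanford.nlp.ling.CoreAnnotations;
        import edu.stanford.nlp.ling.CoreLabel;
        import edu.stanford.nlp.pipeline.Annotation;
        import edu.stanford.nlp.pipeline.StanfordCoreNLP;
        import edu.stanford.nlp.util.CoreMap;
        import java.util.Properties;

        public class AnnotatorDemo {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
                StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

                Annotation document = new Annotation("The children were running in California.");
                pipeline.annotate(document);

                // ssplit groups the tokens into sentences; each sentence is a CoreMap.
                for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
                    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                        // tokenize -> word(), pos -> tag(), lemma -> lemma(), ner -> ner()
                        System.out.printf("%-12s %-6s %-12s %s%n",
                                token.word(), token.tag(), token.lemma(), token.ner());
                    }
                }
                // For example, "running" should be lemmatized to "run" and
                // "California" recognized as a location entity.
            }
        }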

  9. Sentiment Analysis

  10. Sentiment Analysis
     ◮ https://www.csc.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
     ◮ http://www.alchemyapi.com/developers/getting-started-guide/twitter-sentiment-analysis
     ◮ www.sentiment140.com

  11. Sentiment analysis has many other names
     ◮ Opinion extraction
     ◮ Opinion mining
     ◮ Sentiment mining
     ◮ Subjectivity analysis

  12. Sentiment analysis is the detection of attitudes
     ◮ “enduring, affectively colored beliefs, dispositions towards objects or persons”

  13-15. Attitudes
     ◮ Holder (source) of attitude
     ◮ Target (aspect) of attitude
     ◮ Type of attitude
       – From a set of types: like, love, hate, value, desire, etc.
       – Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
     ◮ Text containing the attitude
       – Sentence or entire document

  16-18. Sentiment analysis
     ◮ Simplest task: Is the attitude of this text positive or negative?
     ◮ More complex: Rank the attitude of this text from 1 to 5.
     ◮ Advanced: Detect the target, source, or complex attitude types.

  19. Baseline Algorithm
     ◮ Tokenization
     ◮ Feature extraction (a bag-of-words sketch follows this list)
     ◮ Classification using different classifiers
       – Naive Bayes
       – MaxEnt
       – SVM
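
     As a minimal illustration of the feature-extraction step (not code from the lecture), a bag-of-words count over a tokenized document, the representation typically fed to Naive Bayes, MaxEnt, or SVM classifiers:

        import java.util.HashMap;
        import java.util.Map;

        public class BagOfWords {
            // Turn a tokenized document into bag-of-words counts.
            static Map<String, Integer> features(String[] tokens) {
                Map<String, Integer> counts = new HashMap<>();
                for (String t : tokens) {
                    counts.merge(t.toLowerCase(), 1, Integer::sum);
                }
                return counts;
            }

            public static void main(String[] args) {
                String[] tokens = {"I", "really", "really", "like", "this", "movie"};
                // really=2, every other token 1 (map iteration order is unspecified)
                System.out.println(features(tokens));
            }
        }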

  20. Sentiment Tokenization Issues
     ◮ Deal with HTML and XML markup
     ◮ Twitter/Facebook/... mark-up (names, hash tags)
     ◮ Capitalization (preserve for words in all caps)
     ◮ Phone numbers, dates
     ◮ Emoticons
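
     A rough sketch of a tokenizer that addresses some of these issues: Twitter-style @-mentions and #hashtags, URLs, emoticons, and preserving capitalization only for all-caps tokens. The regular expressions are simplified illustrative assumptions, not a production tokenizer:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class TweetTokenizer {
            // Match emoticons, @-mentions, #hashtags, and URLs before plain words,
            // so they survive as single tokens instead of being split apart.
            private static final Pattern TOKEN = Pattern.compile(
                "[:;=]-?[)(DPp]" +          // simple emoticons such as :-) ;) =D
                "|@\\w+" +                  // @mentions
                "|#\\w+" +                  // #hashtags
                "|https?://\\S+" +          // URLs
                "|\\w+(?:'\\w+)?" +         // words, keeping contractions like didn't
                "|[^\\s\\w]");              // any other single non-space character

            static List<String> tokenize(String text) {
                List<String> tokens = new ArrayList<>();
                Matcher m = TOKEN.matcher(text);
                while (m.find()) {
                    String tok = m.group();
                    // Preserve words written in all caps (e.g. GREAT); lowercase the rest.
                    tokens.add(tok.equals(tok.toUpperCase()) ? tok : tok.toLowerCase());
                }
                return tokens;
            }

            public static void main(String[] args) {
                System.out.println(tokenize("@janyl this movie was GREAT :-) #ai http://example.com"));
            }
        }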

  21-22. Extracting Features for Sentiment Classification
     ◮ How to handle negation: I didn't like this movie vs. I really like this movie
     ◮ Which words to use?
       – Only adjectives
       – All words

  23. Negation
     Add NOT to every word between the negation and the following punctuation.
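
     A sketch of this negation-marking step. A common convention (assumed here) is to prepend a NOT_ prefix to each affected word; the list of negation triggers is a small illustrative subset:

        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.HashSet;
        import java.util.List;
        import java.util.Set;

        public class NegationMarker {
            // Small illustrative set of negation triggers; real systems use longer lists.
            private static final Set<String> NEGATIONS =
                    new HashSet<>(Arrays.asList("not", "no", "never", "didn't", "don't", "isn't"));

            static List<String> markNegation(List<String> tokens) {
                List<String> out = new ArrayList<>();
                boolean inNegation = false;
                for (String tok : tokens) {
                    if (tok.matches("[.,!?;:]")) {
                        inNegation = false;               // punctuation ends the negated span
                        out.add(tok);
                    } else if (NEGATIONS.contains(tok.toLowerCase())) {
                        inNegation = true;                // start marking after the negation word
                        out.add(tok);
                    } else {
                        out.add(inNegation ? "NOT_" + tok : tok);
                    }
                }
                return out;
            }

            public static void main(String[] args) {
                List<String> tokens = Arrays.asList(
                        "I", "didn't", "like", "this", "movie", ",", "but", "the", "cast", "was", "great");
                // [I, didn't, NOT_like, NOT_this, NOT_movie, ,, but, the, cast, was, great]
                System.out.println(markNegation(tokens));
            }
        }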

  24. Naive Bayes Algorithm
     ◮ Simple (“naive”) classification method based on Bayes rule
     ◮ Relies on very simple representation of document:
       – Bag of words

  25-27. Naive Bayes Algorithm

  28. Naive Bayes Algorithm
     For a document d and a class c:
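
     The equations on this slide are not reproduced in the transcript. As a reconstruction based on the slide title and the standard textbook derivation of multinomial Naive Bayes (an assumption, not a verbatim copy of the slide), the most probable class for a document d with features x_1, ..., x_n is

        \begin{align*}
        c_{\mathrm{MAP}} &= \operatorname*{argmax}_{c \in C} P(c \mid d)
            && \text{most probable (maximum a posteriori) class} \\
        &= \operatorname*{argmax}_{c \in C} \frac{P(d \mid c)\, P(c)}{P(d)}
            && \text{Bayes rule} \\
        &= \operatorname*{argmax}_{c \in C} P(d \mid c)\, P(c)
            && \text{$P(d)$ is constant across classes} \\
        &= \operatorname*{argmax}_{c \in C} P(c) \prod_{i=1}^{n} P(x_i \mid c)
            && \text{bag of words + conditional independence}
        \end{align*}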

  29-31. Naive Bayes Algorithm (continued)

  32-33. Binarized (Boolean feature) Multinomial Naive Bayes
     Intuition:
     ◮ Word occurrence may matter more than word frequency.
     ◮ The occurrence of the word fantastic tells us a lot.
     ◮ The fact that it occurs 5 times may not tell us much more.
     Boolean Multinomial Naive Bayes: clip all the word counts in each document at 1.
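
     A compact sketch of this binarized variant: each document's word counts are clipped at 1, class priors and word likelihoods are estimated from the clipped counts with add-one (Laplace) smoothing (the smoothing choice is an assumption, not stated on the slide), and classification is done in log space. The toy training documents and labels are illustrative:

        import java.util.Arrays;
        import java.util.HashMap;
        import java.util.HashSet;
        import java.util.List;
        import java.util.Map;
        import java.util.Set;

        public class BinarizedNaiveBayes {
            private final Map<String, Integer> docsPerClass = new HashMap<>();            // N_c
            private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // binarized count(w, c)
            private final Set<String> vocabulary = new HashSet<>();
            private int totalDocs = 0;

            // Train on one labeled document: each word counts at most once per document.
            void add(String label, List<String> tokens) {
                totalDocs++;
                docsPerClass.merge(label, 1, Integer::sum);
                Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
                for (String w : new HashSet<>(tokens)) {   // the set is the "clip at 1" step
                    counts.merge(w, 1, Integer::sum);
                    vocabulary.add(w);
                }
            }

            // Pick argmax over classes of log P(c) + sum_w log P(w | c), with add-one smoothing.
            String classify(List<String> tokens) {
                String best = null;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (String c : docsPerClass.keySet()) {
                    Map<String, Integer> counts = wordCounts.get(c);
                    int classTotal = counts.values().stream().mapToInt(Integer::intValue).sum();
                    double score = Math.log(docsPerClass.get(c) / (double) totalDocs);  // log prior
                    for (String w : new HashSet<>(tokens)) {                            // binarize the test document too
                        if (!vocabulary.contains(w)) continue;                          // skip unseen words
                        int count = counts.getOrDefault(w, 0);
                        score += Math.log((count + 1.0) / (classTotal + vocabulary.size()));
                    }
                    if (score > bestScore) { bestScore = score; best = c; }
                }
                return best;
            }

            public static void main(String[] args) {
                BinarizedNaiveBayes nb = new BinarizedNaiveBayes();
                nb.add("pos", Arrays.asList("fantastic", "fantastic", "great", "movie")); // "fantastic" counts once
                nb.add("pos", Arrays.asList("loved", "the", "movie"));
                nb.add("neg", Arrays.asList("boring", "terrible", "movie"));
                nb.add("neg", Arrays.asList("NOT_like", "this", "movie"));
                System.out.println(nb.classify(Arrays.asList("fantastic", "cast")));     // expected: pos
            }
        }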

  34. Neural Networks and Deep Learning: Next!
     ◮ http://nlp.stanford.edu/sentiment/
     ◮ java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g edu.stanford.nlp.sentiment.SentimentPipeline -file input.txt
