constructing aspect-based sentiment lexicons with topic modeling



  1. constructing aspect-based sentiment lexicons with topic modeling. Elena Tutubalina 1 and Sergey I. Nikolenko 1,2,3,4. 1 Kazan (Volga Region) Federal University, Kazan, Russia; 2 Steklov Institute of Mathematics at St. Petersburg; 3 National Research University Higher School of Economics, St. Petersburg; 4 Deloitte Analytics Institute, Moscow, Russia. April 7, 2016

  2. intro: topic modeling and sentiment analysis.

  3. overview. • Very brief overview of the paper: • we would like to do sentiment analysis; • there are topic model extensions that deal with sentiment; • but they always rely on an external dictionary of sentiment words; • in this work, we show a way to extend this dictionary automatically from that same topic model.

  4. opinion mining. • Sentiment analysis / opinion mining: • traditional approaches set positive/negative labels by hand; • recently, machine learning models have been trained to assign sentiment scores to most words in the corpus; • however, they cannot really work totally unsupervised, and high-quality manual annotation is expensive; • moreover, there are different aspects. • Problem: automatically mine sentiment lexicons for specific aspects.

  5. topic modeling with lda. • Latent Dirichlet Allocation (LDA) – topic modeling for a corpus of texts: • a document is represented as a mixture of topics; • a topic is a distribution over words; • to generate a document, for each word we sample a topic and then sample a word from that topic; • by learning these distributions, we learn what topics appear in a dataset and in which documents.
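
To make the generative story concrete, here is a minimal sketch of it in Python (not from the paper); the toy vocabulary, number of topics, and Dirichlet hyperparameters are assumptions chosen purely for illustration.

```python
# Minimal sketch of the LDA generative process (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["good", "bad", "screen", "battery", "service", "price"]   # toy vocabulary
W, T, alpha, beta = len(vocab), 2, 0.1, 0.01                        # assumed sizes/priors

phi = rng.dirichlet([beta] * W, size=T)        # one word distribution per topic

def generate_document(n_words=10):
    theta = rng.dirichlet([alpha] * T)         # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(T, p=theta)             # sample a topic for this position
        w = rng.choice(W, p=phi[z])            # sample a word from that topic
        words.append(vocab[w])
    return words

print(generate_document())
```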

  6.–7. topic modeling with lda. • Sample LDA result from (Blei, 2012). [figure]

  8. topic modeling with lda. • There are two major approaches to inference in probabilistic models with a loopy factor graph like LDA: • variational approximations simplify the graph by approximating the underlying distribution with a simpler one, but with new parameters that are subject to optimization; • Gibbs sampling approaches the underlying distribution by sampling a subset of variables conditional on fixed values of all other variables. • Both approaches have been applied to LDA. • We will extend the Gibbs sampling approach.

  9. lda likelihood. • The total likelihood of the LDA model is p(z, w ∣ α, β) = ∫_{θ,φ} p(θ ∣ α) p(z ∣ θ) p(w ∣ z, φ) p(φ ∣ β) dθ dφ.

  10. gibbs sampling. • In collapsed Gibbs sampling, we sample p(z_j = t ∣ z_{−j}, w, α, β) ∝ (n^{¬j}_{∗,t,d} + α) / (n^{¬j}_{∗,∗,d} + Tα) ⋅ (n^{¬j}_{w,t,∗} + β) / (n^{¬j}_{∗,t,∗} + Wβ), where z_{−j} denotes the set of all z values except z_j. • Samples are then used to estimate model variables: θ_{td} = (n_{∗,t,d} + α) / (n_{∗,∗,d} + Tα), φ_{wt} = (n_{w,t,∗} + β) / (n_{∗,t,∗} + Wβ).
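
As an illustration of the update above, here is a sketch of one collapsed Gibbs sweep for plain LDA; the count-array names and the overall structure are assumptions, not the authors' implementation.

```python
# Sketch of one collapsed Gibbs sweep for plain LDA (illustrative).
# n_wt[w, t], n_td[t, d] and n_t[t] are the count arrays from the formula above.
import numpy as np

def gibbs_sweep(docs, z, n_wt, n_td, n_t, alpha, beta, rng):
    W, T = n_wt.shape
    for d, doc in enumerate(docs):
        for j, w in enumerate(doc):
            t_old = z[d][j]
            # remove the current assignment (the "¬j" counts)
            n_wt[w, t_old] -= 1; n_td[t_old, d] -= 1; n_t[t_old] -= 1
            # p(z_j = t | z_{-j}, w); the document-length denominator is constant in t
            p = (n_td[:, d] + alpha) * (n_wt[w, :] + beta) / (n_t + W * beta)
            p /= p.sum()
            t_new = rng.choice(T, p=p)
            # add the new assignment back
            n_wt[w, t_new] += 1; n_td[t_new, d] += 1; n_t[t_new] += 1
            z[d][j] = t_new
    return z
```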

  11. lda extensions. • There exist many LDA extensions: • DiscLDA: LDA for classification with a class-dependent transformation in the topic mixtures; • Supervised LDA: documents with a response variable, we mine topics that are indicative of the response; • TagLDA: words have tags that mark context or linguistic features; • Tag-LDA: documents have topical tags, the goal is to recommend new tags to documents; • Topics over Time: topics change their proportions with time; • hierarchical modifications with nested topics are also important. • In particular, there are extensions tailored for sentiment analysis.

  12. joint sentiment-topic. • JST: topics depend on sentiments from a document's sentiment distribution π_d, words are conditional on sentiment–topic pairs. • Generative process, for each word position j: (1) sample a sentiment label l_j ∼ Mult(π_d); (2) sample a topic z_j ∼ Mult(θ_{d,l_j}); (3) sample a word w ∼ Mult(φ_{l_j,z_j}).
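
A small sketch of this three-step sampling chain for a single word position; the array names pi_d, theta_dl, phi_lz and their shapes are hypothetical, chosen only to mirror the notation above.

```python
# Sketch of generating one word in JST (illustrative).
# pi_d: shape (S,); theta_dl: shape (S, T); phi_lz: shape (S, T, W).
import numpy as np

def jst_generate_word(pi_d, theta_dl, phi_lz, rng):
    l = rng.choice(len(pi_d), p=pi_d)                   # (1) sentiment label l_j ~ Mult(pi_d)
    z = rng.choice(theta_dl.shape[1], p=theta_dl[l])    # (2) topic z_j ~ Mult(theta_{d,l_j})
    w = rng.choice(phi_lz.shape[2], p=phi_lz[l, z])     # (3) word w ~ Mult(phi_{l_j,z_j})
    return l, z, w
```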

  13. joint sentiment-topic. • In Gibbs sampling, one can marginalize out π_d: p(z_j = t, l_j = k ∣ z_{−j}, w, α, β, γ, λ) ∝ (n^{¬j}_{∗,k,t,d} + α_{tk}) / (n^{¬j}_{∗,k,∗,d} + ∑_t α_{tk}) ⋅ (n^{¬j}_{w,k,t,∗} + β_{kw}) / (n^{¬j}_{∗,k,t,∗} + ∑_w β_{kw}) ⋅ (n^{¬j}_{∗,k,∗,d} + γ) / (n^{¬j}_{∗,∗,∗,d} + Sγ), where n_{w,k,t,d} is the number of words w generated with topic t and sentiment label k in document d, and α_{tk} is the Dirichlet prior for topic t with sentiment label k.

  14. aspect and sentiment unification model. • ASUM: aspect-based analysis + sentiment for user reviews; a review is broken down into sentences, assuming that each sentence speaks about only one aspect. • Basic model, Sentence LDA (SLDA): for each review d with topic distribution θ_d, for each sentence s in d: (1) choose its sentiment label l_s ∼ Mult(π_d); (2) choose topic t_s ∼ Mult(θ_{d,l_s}), conditional on the sentiment label l_s; (3) generate words w ∼ Mult(φ_{l_s,t_s}).

  15. gibbs sampling for asum. • Denoting by s_{k,t,d} the number of sentences (rather than words) assigned with topic t and sentiment label k in document d: p(z_j = t, l_j = k ∣ l_{−j}, z_{−j}, w, γ, α, β) ∝ (s^{¬j}_{k,t,d} + α_t) / (s^{¬j}_{k,∗,d} + ∑_t α_t) ⋅ (s^{¬j}_{k,∗,d} + γ_k) / (s^{¬j}_{∗,∗,d} + ∑_{k′} γ_{k′}) × [Γ(n^{¬j}_{∗,k,t,∗} + ∑_w β_{kw}) / Γ(n^{¬j}_{∗,k,t,∗} + ∑_w β_{kw} + W_{∗,j})] ⋅ ∏_w [Γ(n^{¬j}_{w,k,t,∗} + β_{kw} + W_{w,j}) / Γ(n^{¬j}_{w,k,t,∗} + β_{kw})], where W_{w,j} is the number of words w in sentence j. • There are other models and extensions (USTM).

  16. learning sentiment priors.

  17. idea. • All of the models above assume that we have prior sentiment information from an external vocabulary: • in JST and Reverse-JST, word-sentiment priors λ are drawn from an external dictionary and incorporated into the β priors: β_{kw} = β if word w can have sentiment label k and β_{kw} = 0 otherwise; • in ASUM, prior sentiment information is also encoded in the β prior, making β_{kw} asymmetric similar to JST; • the same holds for other extensions such as USTM.
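
For concreteness, this is roughly how such an asymmetric β prior could be assembled from a seed lexicon; the label set, toy vocabulary, and the use of a small ε instead of an exact zero are assumptions for illustration only.

```python
# Sketch of building asymmetric beta priors from a seed sentiment lexicon,
# as in JST/ASUM: beta_kw = beta if word w may carry label k, else (near) 0.
# Words not in the seed lexicon keep a symmetric prior.
import numpy as np

labels = ["positive", "negative", "neutral"]
vocab = ["good", "awful", "screen", "price"]
seed_lexicon = {"good": {"positive"}, "awful": {"negative"}}

beta, eps = 0.01, 1e-7      # eps instead of an exact 0 keeps the Dirichlet proper
beta_kw = np.full((len(labels), len(vocab)), beta)
for w, word in enumerate(vocab):
    if word in seed_lexicon:                     # seed word: allow only its labels
        for k, label in enumerate(labels):
            if label not in seed_lexicon[word]:
                beta_kw[k, w] = eps
```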

  18. idea. • Dictionaries of sentiment words do exist. • But they are often incomplete; for instance, we wanted to apply these models to Russian, where there are few such dictionaries. • It would be great to extend topic models for sentiment analysis so that they train sentiment for new words automatically! • We can assume access to a small seed vocabulary with predefined sentiment, but the goal is to extend it to new words and learn their sentiment from the model.

  19. idea. • In all of these models, word sentiments are input as different β priors for sentiment labels. • If only we could train these priors automatically...

  20. idea. • In all of these models, word sentiments are input as different β priors for sentiment labels. • If only we could train these priors automatically... • ...and we can do it with EM! • General EM scheme:
      1: while inference has not converged do
      2:   M-step: for N steps do
      3:     run one Gibbs sampling update step
      4:   E-step: update the β_{kw} priors

  21. em to train β. • This scheme works for every LDA extension considered above. • At the E-step, we update β_{kw} ∝ n_{w,k,∗,∗}, and we can choose the normalization coefficient ourselves, so we start with high variance and then gradually refine the β_{kw} estimates in simulated annealing: β_{kw} = (1/τ) n_{w,k,∗,∗}, where τ is a regularization coefficient (temperature) that starts large (high variance) and then decreases (lower variance).

  22. em to train β. • Thus, the final algorithm is as follows: • start with some initial approximation to β_{kw} (from a small seed dictionary and maybe some simpler learning method used for initialization and then smoothed); • then, iteratively: • at the E-step of iteration i, update β_{kw} as β_{kw} = (1/τ(i)) n_{w,k,∗,∗} with, e.g., τ(i) = max(1, 200/i); • at the M-step, perform several iterations of Gibbs sampling for the corresponding model with fixed values of β_{kw}.
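
A compact sketch of this outer loop, using the annealing schedule τ(i) = max(1, 200/i) from the slide; run_gibbs and count_n_wk are caller-supplied placeholders standing in for the model-specific Gibbs sampler and the counts n_{w,k,∗,∗}, not a real API.

```python
# Sketch of the EM scheme from the slides: alternate Gibbs sampling (M-step)
# with an annealed update of beta from the word-sentiment counts (E-step).
def learn_beta(beta_kw, run_gibbs, count_n_wk, n_em_iters=50, n_gibbs_steps=20):
    for i in range(1, n_em_iters + 1):
        # M-step: several Gibbs iterations with beta_kw held fixed
        state = run_gibbs(beta_kw, n_steps=n_gibbs_steps)
        # E-step: beta_kw = (1 / tau(i)) * n_{w,k,*,*}, with tau(i) = max(1, 200/i)
        tau = max(1.0, 200.0 / i)
        beta_kw = count_n_wk(state) / tau
    return beta_kw
```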

  23. word embeddings. • Earlier (MICAI 2015), we showed that this approach leads to improved results in terms of sentiment prediction quality. • In this work, we use the improved sentiment-topic models to learn new aspect-based sentiment dictionaries. • To do so, we use distributed word representations (word embeddings).

  24. word embeddings. • Distributed word representations map each word occurring in the dictionary to a Euclidean space, attempting to capture semantic relationships between the words as geometric relationships in the Euclidean space. • Started back in (Bengio et al., 2003), exploded after the works of Bengio et al. and Mikolov et al. (2009–2011), and now used everywhere; we use embeddings trained on a very large Russian dataset (thanks to Nikolay Arefyev and Alexander Panchenko!). [Figure: CBOW and skip-gram architectures]

  25. how to extend lexicons. • Intuition: words similar in some aspects of their meaning, e.g., sentiment, will be expected to be close in the semantic Euclidean space. • To expand the top words of the resulting topics: • extract word vectors for all top words from the distribution φ in the topics and all words in available general-purpose sentiment lexicons; • for every top word in the topics, construct a list of its nearest neighbors according to the cosine similarity measure in the R^500 space among the sentiment words from the lexicons (20 neighbors is almost always enough). • We have experimented with other similarity metrics (L_1, L_2, variations on L_∞), with either worse or very similar results.
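
A sketch of the nearest-neighbour lookup using gensim word vectors; the embedding file name and the lexicon argument are assumptions, and only the cosine-similarity ranking follows the slide.

```python
# Sketch: for a top word of a topic, find the closest words among a
# general-purpose sentiment lexicon by cosine similarity in embedding space.
from gensim.models import KeyedVectors

emb = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)  # placeholder path

def nearest_sentiment_words(top_word, lexicon, k=20):
    if top_word not in emb:
        return []
    candidates = [w for w in lexicon if w in emb]
    sims = [(w, emb.similarity(top_word, w)) for w in candidates]   # cosine similarity
    return sorted(sims, key=lambda x: -x[1])[:k]
```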

  26. experiments.
