Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews. A paper by Kushal Dave, Steve Lawrence, and David M. Pennock. Presented by Ledao Chen and David Zhao.
Problem ● Product reviews are everywhere! ● How can you possibly read them all?
Relevant background ● Objectivity classification ○ Separating reviews from other content ● Word classification ○ How similar are two words? ● Sentiment classification ○ What emotion is a word associated with?
Data ● CNET ○ 7 categories, all electronics ○ Reviews with a binary good/bad rating ● Amazon ○ 7 categories, varied ○ Reviews with a 5-star rating
[Figure: Test 1 setup. Positive and negative reviews from the seven categories are split into train and test folds, with a portion held out for evaluation.]
[Figure: Test 2 setup. 10x sets: each set repeats the split of the seven categories' positive and negative reviews into evaluation, train, and test portions.]
Tokenization ● Strip HTML ● Tokenize document into sentences ● Tokenize sentences into words [ [“Peace”, “cannot”, “be”, “kept”, “by”, “force”, “;”, “it”, “can”, “only”, ...], [“Darkness”, “cannot”, “drive”, “out”, “darkness”, “;”, “only”, “light”, ...], [“Hate”, “cannot”, “drive”, “out”, “hate”, “;”, “only”, “love”, “can”, “do”, ...] ]
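A minimal sketch of this pipeline, assuming a simple regex-based tokenizer rather than the authors' exact tools:

```python
import re

def tokenize(html):
    text = re.sub(r"<[^>]+>", " ", html)            # strip HTML tags
    sentences = re.split(r"(?<=[.!?;])\s+", text)   # naive sentence split
    # split each sentence into word and punctuation tokens
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences if s.strip()]

doc = "Peace cannot be kept by force; it can only be achieved by understanding."
print(tokenize(doc))
# [['Peace', 'cannot', 'be', 'kept', 'by', 'force', ';'],
#  ['it', 'can', 'only', 'be', 'achieved', 'by', 'understanding', '.']]
```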
Metadata and statistical substitution ● Numerical tokens ‒ “I have 35” → “I have number” ● Product names ‒ “I like Nikon” & “I like Kodak” → “I like productname” ● Low-frequency terms ‒ “Peach fuzz” and “Pollen fuzz” → “unique fuzz” ● Product-specific terms ‒ “Lens is bad” and “RAM is bad” → “producttypeword is bad”
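A sketch of these substitutions; the name and type-word lists and the frequency cutoff are illustrative assumptions, not the paper's actual lists:

```python
import re
from collections import Counter

PRODUCT_NAMES = {"nikon", "kodak"}       # assumed per-product name list
PRODUCT_TYPE_WORDS = {"lens", "ram"}     # assumed product-specific terms

def substitute(tokens, corpus_freq, min_freq=2):
    out = []
    for t in tokens:
        low = t.lower()
        if re.fullmatch(r"\d+", t):                 # numerical tokens
            out.append("number")
        elif low in PRODUCT_NAMES:                  # product names
            out.append("productname")
        elif low in PRODUCT_TYPE_WORDS:             # product-specific terms
            out.append("producttypeword")
        elif corpus_freq[low] < min_freq:           # low-frequency terms
            out.append("unique")
        else:
            out.append(t)
    return out

corpus = ["I", "like", "Nikon", "I", "like", "Kodak", "I", "have", "35"]
freq = Counter(t.lower() for t in corpus)
print(substitute(["I", "like", "Nikon"], freq))    # ['I', 'like', 'productname']
```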
Linguistic substitution ● WordNet from tokens with part-of-speech tags ● Collocation of nouns and modifying adjectives ● Stemming of tokens ● Negation propagation ○ “not good or useful” → “not NOT good NOT or NOT useful”
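A sketch of the negation-propagation step alone, assuming negation scope ends at clause punctuation:

```python
NEGATIONS = {"not", "no", "never"}
STOPPERS = {".", "!", "?", ";", ","}   # assumed clause boundaries

def propagate_negation(tokens):
    out, negating = [], False
    for t in tokens:
        if t.lower() in NEGATIONS:     # start of a negated span
            out.append(t)
            negating = True
        elif t in STOPPERS:            # end of the negated span
            out.append(t)
            negating = False
        elif negating:                 # mark tokens inside the span
            out.append("NOT " + t)
        else:
            out.append(t)
    return out

print(propagate_negation(["not", "good", "or", "useful"]))
# ['not', 'NOT good', 'NOT or', 'NOT useful']
```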
N-gram and proximity ● For Test 1, trigrams performed best ● For Test 2, bigrams performed best ● Mixing n-grams with lower-order features ○ e.g. bigrams mixed with unigrams ● Smoothing using a lower-order reference model ● Proximity features: in [Peace cannot be kept by force it can only be achieved by understanding], “achieved” and “understanding” are combined into an “achieved-understanding” feature
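A minimal n-gram extractor; "mixing" here simply means emitting unigrams alongside higher-order n-grams (a sketch, not the paper's exact smoothing scheme):

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def mixed_features(tokens, max_n=2):
    feats = []
    for n in range(1, max_n + 1):      # unigrams up through max_n-grams
        feats.extend(ngrams(tokens, n))
    return feats

print(mixed_features(["kept", "by", "force"], max_n=2))
# [('kept',), ('by',), ('force',), ('kept', 'by'), ('by', 'force')]
```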
Substrings
Substring trade-off: as substrings become longer, their frequency decreases; they are generally more discriminatory, but there is less evidence for considering them relevant.
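A sketch of substring features with a frequency cutoff, which makes the trade-off concrete (the length bounds and cutoff are illustrative assumptions):

```python
from collections import Counter

def substring_features(text, min_len=3, max_len=8, min_count=2):
    counts = Counter()
    for i in range(len(text)):
        for n in range(min_len, max_len + 1):
            if i + n <= len(text):
                counts[text[i:i + n]] += 1     # count every substring
    # longer substrings are rarer, so fewer survive the cutoff
    return {s: c for s, c in counts.items() if c >= min_count}

print(substring_features("great lens, great price"))
```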
Thresholding 1. Count the frequency of features 2. Normalize (optional) 3. Threshold Results across different threshold values were not significantly different.
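A minimal sketch of these three steps, with an assumed cutoff value:

```python
from collections import Counter

def threshold(features, min_count=3, normalize=False):
    counts = Counter(features)                       # 1. count frequencies
    total = sum(counts.values())
    kept = {f: c for f, c in counts.items() if c >= min_count}  # 3. threshold
    if normalize:                                    # 2. optional normalization
        kept = {f: c / total for f, c in kept.items()}
    return kept
```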
Smoothing ● Add-one smoothing: P(f) = (c(f) + 1) / (N + V) ● Witten-Bell smoothing: P(f) = c(f) / (N + T) for seen features, reserving mass T / (N + T) for unseen ones ● Good-Turing smoothing: P(f) = c*(f) / N, where c* = (c + 1) · N_{c+1} / N_c (c = feature count, N = total tokens, V = vocabulary size, T = number of distinct observed features, N_c = number of features occurring exactly c times)
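A minimal sketch of the simplest of these, add-one smoothing over feature counts:

```python
from collections import Counter

def add_one_prob(feature, counts, total, vocab_size):
    # every count is incremented by one, so unseen features
    # receive a small nonzero probability
    return (counts.get(feature, 0) + 1) / (total + vocab_size)

counts = Counter(["great", "great", "bad"])
total, vocab = sum(counts.values()), len(counts)
print(add_one_prob("great", counts, total, vocab))   # 3/5 = 0.6
print(add_one_prob("unseen", counts, total, vocab))  # 1/5 = 0.2
```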
Scoring Baseline: the normalized term frequency, obtained by taking the number of times a feature f_i occurs in class C and dividing it by the total number of tokens in C. A term's score compares these frequencies across the two classes and is thus a measure of bias ranging from –1 to 1.
Scoring score(f_i) = (P(f_i | C) − P(f_i | C′)) / (P(f_i | C) + P(f_i | C′))
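A sketch of this bias score, comparing a feature's normalized frequency in class C against class C′:

```python
from collections import Counter

def bias_scores(pos_tokens, neg_tokens):
    pos, neg = Counter(pos_tokens), Counter(neg_tokens)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    scores = {}
    for f in set(pos) | set(neg):
        p, q = pos[f] / n_pos, neg[f] / n_neg    # P(f|C) and P(f|C')
        scores[f] = (p - q) / (p + q)            # bias in [-1, 1]
    return scores

scores = bias_scores(["great", "great", "fine"], ["bad", "fine"])
print(scores["great"], scores["bad"])   # 1.0 and -1.0
```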
Scoring Alternatives: odds ratio ● Performs on par with an SVM ● Sensitive to different class sizes, and thus performs poorly on Test 1
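A sketch using the standard odds-ratio formulation (the paper's exact variant may differ); eps is an assumed guard against division by zero:

```python
def odds_ratio(p_pos, p_neg, eps=1e-9):
    # odds of the feature under C divided by its odds under C'
    return (p_pos * (1 - p_neg) + eps) / (p_neg * (1 - p_pos) + eps)
```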
Scoring Alternatives: Fisher discriminant ● Performs poorly on both tests
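A sketch of a per-feature Fisher discriminant, assuming the usual two-class formulation (squared difference of class means over the sum of class variances):

```python
import statistics

def fisher_score(pos_values, neg_values):
    mp, mn = statistics.mean(pos_values), statistics.mean(neg_values)
    vp, vn = statistics.pvariance(pos_values), statistics.pvariance(neg_values)
    # larger when class means are far apart relative to in-class spread
    return (mp - mn) ** 2 / (vp + vn) if (vp + vn) else 0.0
```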
Reweighting ● Multiplying by document frequency, dampened by a logarithm, improved results on Test 1 ● A Gaussian weighting scheme on term frequency also improved results on Test 1
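Sketches of both reweighting schemes; the Gaussian's center and width are illustrative assumptions:

```python
import math

def df_reweighted(score, doc_freq):
    # dampen document frequency with a log so common features don't dominate
    return score * math.log(1 + doc_freq)

def gaussian_tf_weight(tf, mu=2.0, sigma=1.0):
    # bell-shaped weight centered on a "typical" term frequency
    return math.exp(-((tf - mu) ** 2) / (2 * sigma ** 2))
```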
Classifying Basic idea: sum the scores of the words in an unknown document and use the sign of the total to determine its class
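A minimal sketch of this classifier over the per-feature scores built earlier:

```python
def classify(tokens, scores):
    # unknown words contribute nothing; the sign of the sum decides the class
    total = sum(scores.get(t, 0.0) for t in tokens)
    return "positive" if total > 0 else "negative"

print(classify(["great", "fine"], {"great": 1.0, "fine": -0.2}))  # positive
```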
Mining Basic idea: crawl search-engine results for a given product’s name and attempt to identify and analyze product reviews within this set, scoring them with the model trained on the review data described earlier. Discard certain pages, paragraphs, and sentences (such as pages without “review” in their title, paragraphs not containing the name of the product, and excessively long or short sentences).
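A sketch of the filtering heuristics listed above; the sentence-length bounds are assumed values:

```python
def keep_page(title):
    return "review" in title.lower()            # page must look like a review

def keep_paragraph(paragraph, product):
    return product.lower() in paragraph.lower() # must mention the product

def keep_sentence(sentence, min_words=3, max_words=50):
    n = len(sentence.split())                   # drop very short/long sentences
    return min_words <= n <= max_words
```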
Mining Evaluation Randomly selected 600 sentences (200 for each of 3 products) from search-engine results, as parsed and thresholded by the mining tool. Manually tagged as positive (P), negative (N), or ambiguous (I); tagging was very subjective. “Ambiguous” means the sentence was ambiguous when taken out of context, did not express an opinion at all, or was not describing the product. Counts: P: 173, N: 71, I: 356.
Mining Evaluation Classification of these sentences performed worse than tossing a coin.
Summary and Conclusions ● Obtained fairly good results for the review classification task through the choice of appropriate features and metrics ● Identified a number of issues that make this problem difficult
The Issues ● Rating inconsistency ○ Different understandings of the 1–5 star scale ● Ambivalence and comparison ○ Some reviewers use terms with negative connotations, but then write an equivocating final sentence explaining that overall they were satisfied.
The Issues ● Sparse data ○ Many of the reviews are very short ○ Amazon is OK, but many features in the C|net reviews occur in 3 or fewer documents ● Skewed distribution ○ Positive reviews are predominant ○ Certain products and product types have more reviews ■ e.g. “camera” ends up scored as positive ■ Negative reviews are longer, and their language is more varied.
Questions?