
Mining and Summarizing Customer Reviews ---- Mansuo Shen, Yonghua - PowerPoint PPT Presentation

  1. Mining and Summarizing Customer Reviews ---- Mansuo Shen, Yonghua Yu, Weicheng Chao

  2. Introduction
     ● Problems
        ○ Too many reviews
           ■ Customers: difficult to read them and make decisions
           ■ Manufacturers: difficult to track and manage products
     ● Goal
        ○ Mine and summarize all the customer reviews of a product
           ■ Mine the features of the product
           ■ Positive and negative opinions

  3. Introduction
     ● Tasks
        ○ Mine product features
           ■ Data mining and natural language processing techniques
        ○ Identify opinion sentences
           ■ Find opinion words
        ○ Decide whether each opinion sentence is positive or negative
           ■ Semantic orientation
        ○ Summarize the results

  4. Related Work
     ● Subjective genre classification
     ● Sentiment classification
     ● Text summarization
     ● Terminology finding

  5. The Proposed Techniques --- Part-of-Speech Tagging (POS)
     ● Product features: usually nouns
        e.g. <W C='NN'>: noun; <NG>: noun group / phrase
     ● Tool
        ○ NLProcessor linguistic parser (XML outputs)
           ■ Splits texts into sentences
           ■ Produces the part-of-speech tag for each word
           ■ Identifies noun and verb groups
     ● Results saved in a database
     ● Transaction file generated
     ● Preprocessing: stopword removal, stemming, etc.
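The preprocessing on slide 5 can be sketched roughly as follows. This is a hypothetical Python sketch, not the authors' code: it assumes sentences arrive already POS-tagged as (word, tag) pairs with Penn Treebank tags (NLProcessor's XML output would first have to be parsed into that form), and `STOPWORDS` and `naive_stem` are toy stand-ins for a real stopword list and stemmer. Each sentence becomes one "transaction" holding its noun stems.

```python
# Hypothetical sketch: build a "transaction file" of candidate feature words
# from POS-tagged review sentences. Not the authors' implementation.

# Toy stopword list (assumption; a real system would use a full list).
STOPWORDS = {"the", "a", "an", "this", "it", "of", "and", "thing"}


def naive_stem(word: str) -> str:
    """Crude stand-in for a real stemmer: strip a plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def to_transaction(tagged_sentence):
    """Keep nouns (Penn Treebank tags NN, NNS, NNP, ...), lowercased,
    stopword-filtered and stemmed; each item appears at most once."""
    items = []
    for word, tag in tagged_sentence:
        w = word.lower()
        if tag.startswith("NN") and w not in STOPWORDS:
            stem = naive_stem(w)
            if stem not in items:
                items.append(stem)
    return items


def build_transactions(tagged_sentences):
    """One transaction (list of noun stems) per review sentence."""
    return [to_transaction(s) for s in tagged_sentences]


if __name__ == "__main__":
    sentences = [
        [("The", "DT"), ("pictures", "NNS"), ("are", "VBP"),
         ("very", "RB"), ("clear", "JJ")],
        [("The", "DT"), ("battery", "NN"), ("life", "NN"),
         ("is", "VBZ"), ("short", "JJ")],
    ]
    for transaction in build_transactions(sentences):
        print(transaction)
    # prints ['picture'] then ['battery', 'life']
```

Only nouns enter the transactions because slide 5 treats product features as usually nouns; noun-group handling is left out to keep the sketch short.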

  6. The Proposed Techniques --- Frequent Features Identification
     ● Problem
        ○ Difficulty of natural language understanding
        ○ e.g.
           ■ “The pictures are very clear.” ---- picture quality
           ■ “While light, it will not easily fit in my pocket.” ---- camera size
     ● Frequent features
        ○ Association mining
           ■ Word coverage
           ■ Frequent: appears in more than 1% of the review sentences
        ○ Not all candidate features are genuine features
           ■ Compactness pruning: prune candidate features whose words do not appear in a specific order
           ■ Redundancy pruning: p-support
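A minimal sketch of the frequent-feature step, assuming transactions are sets of noun stems (one per sentence). It runs a plain Apriori pass with the slide's 1% support threshold, then applies redundancy pruning with p-support: the number of sentences containing a feature but no feature phrase that is a superset of it. The function names, the 3-word cap on candidates, and the default minimum p-support of 3 are assumptions. Compactness pruning is omitted because it needs the original word order, which unordered transactions do not keep.

```python
# Hypothetical sketch of frequent-feature mining: a plain Apriori pass over
# noun transactions, followed by p-support redundancy pruning.
from itertools import combinations


def apriori(transactions, min_support=0.01, max_size=3):
    """Return {frozenset(itemset): count} for itemsets that occur in more than
    min_support of the transactions (1% on the slide). max_size caps the
    number of words per candidate feature (an assumption)."""
    sets = [frozenset(t) for t in transactions]
    threshold = min_support * len(sets)

    counts = {}
    for s in sets:
        for item in s:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {k: c for k, c in counts.items() if c > threshold}
    result = dict(frequent)

    size = 2
    while frequent and size <= max_size:
        items = sorted({i for itemset in frequent for i in itemset})
        # Apriori property: every (size-1)-subset of a candidate must be frequent.
        candidates = [
            frozenset(c) for c in combinations(items, size)
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
        ]
        frequent = {}
        for cand in candidates:
            count = sum(1 for s in sets if cand <= s)
            if count > threshold:
                frequent[cand] = count
        result.update(frequent)
        size += 1
    return result


def p_support(feature, features, transactions):
    """Sentences containing `feature` but no feature phrase that is a superset of it."""
    supersets = [g for g in features if feature < g]
    return sum(
        1 for t in map(frozenset, transactions)
        if feature <= t and not any(g <= t for g in supersets)
    )


def redundancy_prune(features, transactions, min_p_support=3):
    """Drop features that are a subset of another feature phrase and whose
    p-support is below min_p_support (3 is an assumed default)."""
    kept = set()
    for f in features:
        has_superset = any(f < g for g in features)
        if has_superset and p_support(f, features, transactions) < min_p_support:
            continue
        kept.add(f)
    return kept
```

For example, if "battery" almost always occurs inside the phrase "battery life", its p-support is low and it is pruned in favour of the phrase, while a standalone feature such as "picture" survives.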

  7. The Proposed Techniques --- Opinion Words Extraction
     ● Opinion sentence
        ○ Definition
           ■ Contains one or more product features
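The definition on this slide is cut off, so the following is only a sketch under stated assumptions. It treats adjectives as the opinion words (consistent with the limitation listed on slide 19, "Only adjectives are considered as indicators of opinion orientations") and additionally requires an opinion sentence to contain at least one adjective besides a known feature; that second condition is an assumption. The function name `extract_opinions` is hypothetical.

```python
# Hypothetical sketch: find opinion sentences and opinion words, assuming
# (as slide 19 states) that only adjectives count as opinion words.


def extract_opinions(tagged_sentences, features):
    """Return (indices of opinion sentences, set of opinion words).

    A sentence counts as an opinion sentence here if it mentions at least one
    known feature (single-word features, matched on lowercased tokens) and
    contains at least one adjective (Penn Treebank tags JJ, JJR, JJS).
    """
    opinion_sentences = []
    opinion_words = set()
    for idx, sentence in enumerate(tagged_sentences):
        words = [w.lower() for w, _ in sentence]
        if not any(f in words for f in features):
            continue  # no product feature, so not an opinion sentence
        adjectives = [w.lower() for w, tag in sentence if tag.startswith("JJ")]
        if adjectives:
            opinion_sentences.append(idx)
            opinion_words.update(adjectives)
    return opinion_sentences, opinion_words
```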

  8. The Proposed Techniques --- Orientation Identification for Opinion Words
     In WordNet, adjectives are organized into bipolar clusters; satellite synsets have senses similar to their head synset.
     Procedure:
     1. Initialize a seed set with common adjectives and their orientations.
     2. If a given adjective has a synonym or antonym in the seed set, its orientation is known (the same as the synonym's, the opposite of the antonym's), so add it to the seed set.
     3. If not, set the adjective aside and check it again whenever the seed set is updated.
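The seed-set procedure above amounts to iterating until no new adjective can be resolved. In this hypothetical sketch, orientations are encoded as +1/-1 and the synonym and antonym lookups come from small hand-written dictionaries standing in for WordNet (a real system could query WordNet, for example through NLTK's WordNet corpus reader). All names and the toy lexicon are illustrative assumptions.

```python
# Hypothetical sketch of seed-set orientation propagation.
# Toy synonym/antonym dictionaries stand in for WordNet lookups.


def propagate_orientations(adjectives, seeds, synonyms, antonyms):
    """Assign +1/-1 to adjectives via synonym/antonym links to known words.

    seeds: dict adjective -> +1 or -1 (the initial seed set).
    synonyms / antonyms: dict adjective -> set of adjectives.
    Returns (orientation dict including seeds, set of unresolved adjectives).
    """
    known = dict(seeds)
    pending = [a for a in adjectives if a not in known]
    changed = True
    while changed and pending:          # stop when a full pass adds nothing
        changed = False
        still_pending = []
        for adj in pending:
            orientation = None
            for syn in synonyms.get(adj, ()):
                if syn in known:        # synonym: same orientation
                    orientation = known[syn]
                    break
            if orientation is None:
                for ant in antonyms.get(adj, ()):
                    if ant in known:    # antonym: opposite orientation
                        orientation = -known[ant]
                        break
            if orientation is None:
                still_pending.append(adj)   # retry after the seed set grows
            else:
                known[adj] = orientation
                changed = True
        pending = still_pending
    return known, set(pending)


if __name__ == "__main__":
    known, unresolved = propagate_orientations(
        ["fantastic", "blurry", "great", "sharp", "vague"],
        seeds={"good": 1, "bad": -1},
        synonyms={"fantastic": {"great"}, "great": {"good"}, "sharp": {"good"}},
        antonyms={"blurry": {"sharp"}},
    )
    print(known["fantastic"], known["blurry"], unresolved)  # 1 -1 {'vague'}
```

"fantastic" and "blurry" are only resolved on the second pass, after "great" and "sharp" have joined the seed set, which is why step 3 revisits deferred adjectives.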

  9. The Proposed Techniques --- Infrequent Feature Identification
     Infrequent features: only a small number of people talk about them, but they can still be useful to customers and manufacturers.
     Procedure: for each sentence that contains no frequent feature but has one or more opinion words, find the noun/noun phrase nearest to the opinion word and store it in the feature set as an infrequent feature.
     Irrelevant nouns may be picked up, but this is not a serious problem:
     ● Infrequent features account for only a small part of the whole feature set.
     ● Infrequent features have lower p-support, so they rank lower.
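A minimal sketch of this procedure, assuming POS-tagged sentences, single-word frequent features, and an opinion-word set produced by the earlier steps. As a simplification it picks the nearest single noun (by token distance, leftmost on ties) rather than a full noun phrase; the function name is hypothetical.

```python
# Hypothetical sketch: infrequent features are the nouns nearest to opinion
# words in sentences that mention no frequent feature.


def infrequent_features(tagged_sentences, frequent_features, opinion_words):
    """Return the set of nouns found next to opinion words in sentences
    without any frequent feature (single nouns only, a simplification)."""
    found = set()
    for sentence in tagged_sentences:
        words = [w.lower() for w, _ in sentence]
        if any(f in words for f in frequent_features):
            continue  # already covered by a frequent feature
        noun_positions = [i for i, (_, tag) in enumerate(sentence)
                          if tag.startswith("NN")]
        if not noun_positions:
            continue
        for i, w in enumerate(words):
            if w in opinion_words:
                # Nearest noun by token distance (ties go to the leftmost noun).
                nearest = min(noun_positions, key=lambda j: abs(j - i))
                found.add(words[nearest])
    return found
```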

  10. The Proposed Techniques --- Predicting the Orientations of Opinion Sentences
     Procedure:
     1. Count the positive and negative opinion words in a sentence; if one side wins, that gives the sentence's orientation.
     2. If there is a tie, count only the effective opinions (the opinion word nearest to each feature).
     3. If the orientation still can't be decided, use the orientation of the previous sentence.
     If a negation word is close to an opinion word, use its opposite orientation.
     Examples:
     1. “Overall this is a good camera with a really good picture clarity & an exceptional close-up shooting capability.”
     2. “The auto and manual along with movie modes are very easy to use, but the software is not intuitive.”
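The three-step rule above can be sketched as follows. Assumptions, all hypothetical: tokens are plain lowercased words, the orientation lexicon maps opinion words to +1/-1, "close to" means within the two preceding tokens, and `NEGATIONS` is a small illustrative list.

```python
# Hypothetical sketch of the three-step sentence orientation rule.
# Orientations are +1 (positive) / -1 (negative); 0 means "undecided".

NEGATIONS = {"not", "no", "never", "n't"}  # assumed negation list


def word_orientation(words, i, lexicon, window=2):
    """Orientation of opinion word words[i], flipped when a negation word
    appears among the `window` preceding tokens (window size is assumed)."""
    orientation = lexicon[words[i]]
    if any(w in NEGATIONS for w in words[max(0, i - window):i]):
        orientation = -orientation
    return orientation


def sign(x):
    return (x > 0) - (x < 0)


def sentence_orientation(tokens, features, lexicon, previous=0):
    words = [t.lower() for t in tokens]
    opinion_idx = [i for i, w in enumerate(words) if w in lexicon]

    # Step 1: majority vote over all opinion words in the sentence.
    total = sum(word_orientation(words, i, lexicon) for i in opinion_idx)
    if total != 0:
        return sign(total)

    # Step 2: on a tie, use only each feature's effective opinion,
    # i.e. the opinion word nearest to that feature.
    feature_idx = [i for i, w in enumerate(words) if w in features]
    if opinion_idx and feature_idx:
        effective = sum(
            word_orientation(words, min(opinion_idx, key=lambda j: abs(j - f)), lexicon)
            for f in feature_idx
        )
        if effective != 0:
            return sign(effective)

    # Step 3: still undecided, so inherit the previous sentence's orientation.
    return previous
```

With a toy lexicon containing "easy" and "intuitive" as positive, the slide's second example ties: "easy" counts +1, "not intuitive" counts -1, and each feature ("modes", "software") has one effective opinion of each sign, so this sketch falls back to the previous sentence's orientation.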

  11. The Proposed Techniques --- Summary Generation
     ● For each feature, list the related opinion sentences grouped into positive and negative, and show the count of each group.
     ● All features are ranked by frequency; feature phrases receive a higher rank.
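A rough sketch of the summary step, assuming the earlier steps produce (feature, sentence, orientation) triples with orientation +1 or -1. It ranks features only by how many opinion sentences mention them; the extra rank given to feature phrases is not modelled. The names `summarize` and `render` are hypothetical.

```python
# Hypothetical sketch of feature-based summary generation.
from collections import defaultdict


def summarize(opinions):
    """opinions: iterable of (feature, sentence, orientation), orientation +1/-1.
    Returns [(feature, positive_sentences, negative_sentences), ...] ranked by
    the number of opinion sentences per feature (most frequent first)."""
    groups = defaultdict(lambda: {"pos": [], "neg": []})
    for feature, sentence, orientation in opinions:
        if orientation == 0:
            continue  # undecided sentences are left out of the summary
        key = "pos" if orientation > 0 else "neg"
        groups[feature][key].append(sentence)
    ranked = sorted(groups.items(),
                    key=lambda kv: len(kv[1]["pos"]) + len(kv[1]["neg"]),
                    reverse=True)
    return [(f, g["pos"], g["neg"]) for f, g in ranked]


def render(summary):
    """Plain-text layout: each feature, then counts and sentences per side."""
    lines = []
    for feature, pos, neg in summary:
        lines.append(f"Feature: {feature}")
        lines.append(f"  Positive: {len(pos)}")
        lines.extend(f"    {s}" for s in pos)
        lines.append(f"  Negative: {len(neg)}")
        lines.extend(f"    {s}" for s in neg)
    return "\n".join(lines)
```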

  12. Evaluation
     ● Reviews are from:
        ○ 2 digital cameras, 1 DVD player, 1 MP3 player, 1 cellular phone
        ○ Amazon.com and CNet.com
     ● Each review contains:
        ○ A text review and a title
     ● The first 100 reviews of each product were crawled and downloaded
     ● NLProcessor was used to generate POS tags
     ● Manual tagging

  13. Evaluation
     FBS (Feature-Based Summarization) is evaluated from the following perspectives:
     ● Effectiveness of feature extraction
     ● Effectiveness of opinion sentence extraction
     ● Accuracy of orientation prediction for opinion sentences
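The slide names the evaluation perspectives without giving formulas. One natural way to score them against the manually tagged reviews from slide 12, assumed here rather than taken from the slides, is precision and recall for the two extraction tasks and plain accuracy for orientation prediction. The helper names and inputs below are hypothetical.

```python
# Hypothetical scoring helpers; the metric choice is an assumption.


def precision_recall(extracted, gold):
    """Precision and recall of extracted items against a manually tagged gold set."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def accuracy(predicted, actual):
    """Share of opinion sentences whose predicted orientation matches the manual tag."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

For instance, extracting {picture, battery, camera} when the manual tags are {picture, battery, flash, zoom} gives precision 2/3 and recall 1/2.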

  14. Evaluation
     Issue 1:
     --- “The taste of this burger is quite amazing”
     --- “This burger is coming from heaven”
     --- “The burger’s taste makes me say NO to any other burgers”
     Issue 2:
     --- “The taste makes me wonder”

  15. Evaluation

  16. Evaluation

  17. Evaluation
     Two reasons:
     ● FASTR generates a large number of terms
        ○ Many of them are not features at all
     ● FASTR does not find one-word terms

  18. Evaluation

  19. Limitations
     ● Pronoun resolution is hard
        ○ E.g. “It’s quite light”
     ● Only adjectives are considered as indicators of opinion orientation
        ○ E.g. “I love its resolution”
     ● The strength of opinions is not considered
        ○ E.g. “The color of it is astonishing!!!!!! But the screen is not that good.”

  20. Conclusion
     ● The system provides a feature-based summary of large numbers of customer reviews of a product sold online.
     ● The experimental results indicate that the proposed techniques are very promising.
     ● The problem will become increasingly important as more people buy products online.

  21. Questions?
