

  1. Opinion Observer: Analyzing and Comparing Opinions on the Web
  Authors: Bing Liu, Minqing Hu, Junsheng Cheng
  Paper Presentation: Asif Salekin

  2. Introduction
  • Web: an excellent source of consumer opinions
  • Useful information for customers and product manufacturers
  • Opinion Observer

  3. Technical Tasks
  • Identify product features
  • For each feature, identify whether the opinion is positive or negative
  • Review format:
    – Pros
    – Cons
    – Detailed review
  • The paper proposes a technique to identify product features from the Pros and Cons in this format

  4. Problem Statement
  • Set of products P = {P1, P2, …, Pn}
  • Set of reviews for Pi: Ri = {r1, r2, …, rk}
  • rj = {sj1, sj2, …, sjm}: a sequence of sentences
  • A product feature f in rj is an attribute of the product that has been commented on in rj
  • If f appears in rj, it is an explicit feature
    – "The battery life of this camera is too short"
  • If f does not appear in rj but is implied, it is an implicit feature
    – "This camera is too large" (size)

  5. Problem Statement
  • Opinion segment of a feature f
    – A set of consecutive sentences that expresses a positive or negative opinion on f
    – "The picture quality is good, but the battery life is short"
  • Positive opinion set of a feature (Pset)
    – The set of opinion segments of f that express positive opinions about f across all reviews of the product
    – Nset, the negative opinion set, is defined similarly

  6. Problem Statement
  • Observation: each sentence segment contains at most one product feature. Sentence segments are separated by ',', '.', ':', ';' and 'but'.
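The segmentation observation above can be sketched as a small splitter (a sketch, not the paper's code; the delimiter set follows the slide):

```python
import re

def split_segments(sentence):
    """Split a review sentence into segments at , . : ; and the word 'but';
    per the slide, each resulting segment holds at most one product feature."""
    parts = re.split(r"[,.:;]|\bbut\b", sentence)
    return [p.strip() for p in parts if p.strip()]

print(split_segments("The picture quality is good, but the battery life is short"))
# -> ['The picture quality is good', 'the battery life is short']
```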

  7. Prepare a Training Dataset
  "Battery usage; included 16MB is stingy"
  • Perform Part-Of-Speech (POS) tagging and remove digits
    – "<N> Battery <N> usage"
    – "<V> included <N> MB <V> is <Adj> stingy"
  • Replace feature words with [feature]
    – "<N> [feature] <N> usage"
    – "<V> included <N> [feature] <V> is <Adj> stingy"
  • Use 3-grams to produce shorter segments
    – "<V> included <N> [feature] <V> is <Adj> stingy" →
      "<V> included <N> [feature] <V> is"
      "<N> [feature] <V> is <Adj> stingy"
  • Distinguish duplicate tags
    – "<N1> [feature] <N2> usage"
  • Perform word stemming
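The feature-replacement and 3-gram steps above might be sketched as follows (`replace_feature` and `three_grams` are hypothetical helper names; a real pipeline would obtain the tags from a POS tagger):

```python
def replace_feature(tagged, feature_words):
    """Replace known feature words with the [feature] placeholder."""
    return [(tag, "[feature]" if w.lower() in feature_words else w)
            for tag, w in tagged]

def three_grams(tagged):
    """Produce 3-gram segments from a list of (tag, word) pairs;
    segments of three words or fewer are kept whole."""
    if len(tagged) <= 3:
        return [tagged]
    return [tagged[i:i + 3] for i in range(len(tagged) - 2)]

# The slide's example segment, already POS-tagged with digits removed.
seg = [("V", "included"), ("N", "MB"), ("V", "is"), ("Adj", "stingy")]
seg = replace_feature(seg, {"mb"})
print(three_grams(seg))
# -> the two 3-grams "<V> included <N> [feature] <V> is"
#    and "<N> [feature] <V> is <Adj> stingy"
```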

  8. Association Rule Mining
  • I = {i1, …, in}: a set of items
    – e.g., I = {milk, bread, butter, beer}
  • D: a set of transactions; each transaction consists of a subset of items in I
  • Association rule: X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅
    – e.g., {butter, bread} → {milk}
  • The rule has support s in D if s% of the transactions in D contain X ∪ Y
    – Support: 1/5 = 0.2, since X ∪ Y occurs in only one of the five transactions
  • The rule X → Y holds in D with confidence c if c% of the transactions in D that support X also support Y
    – Confidence: 0.2/0.2 = 1.0, since 100% of the transactions containing butter and bread also contain milk
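The support and confidence figures on the slide can be reproduced on a toy transaction database (the five transactions below are hypothetical, chosen so the 1/5 = 0.2 and 100% numbers come out as stated):

```python
def support(D, itemset):
    """Fraction of transactions in D containing every item in itemset."""
    return sum(1 for t in D if itemset <= t) / len(D)

def confidence(D, X, Y):
    """Fraction of transactions supporting X that also support Y."""
    return support(D, X | Y) / support(D, X)

# Hypothetical five-transaction database matching the slide's numbers.
D = [
    {"milk", "bread"},
    {"butter"},
    {"beer"},
    {"milk", "bread", "butter"},
    {"bread"},
]
X, Y = {"butter", "bread"}, {"milk"}
print(support(D, X | Y))    # -> 0.2 : one of five transactions has all three
print(confidence(D, X, Y))  # -> 1.0 : every transaction with butter and bread has milk
```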

  9. Association Rule Mining
  • The resulting sentence (3-gram) segments, labeled by humans, are saved in a transaction file D
  • Association rule mining finds all rules in the database that satisfy given minimum support and minimum confidence constraints
  • The association mining system CBA (Liu, B., Hsu, W., Ma, Y. 1998) is used to mine rules
  • Minimum support: 1%
  • No minimum confidence is used
  • Some example rules:
    – <N1>, <N2> → [feature]
    – <V>, <N> → [feature]
    – <N1> → [feature], <N2>
    – <N1>, [feature] → <N2>

  10. Post-processing
  Rules:
    <N1>, <N2> → [feature]
    <V>, <N> → [feature]
    <N1> → [feature], <N2>
    <N1>, [feature] → <N2>
  • Step 1: Keep only rules that have [feature] on the RHS
    – Only rules 1 and 2 are needed
  • Step 2: Consider the sequence of items on the LHS
    – e.g., "<V>, <N> → [feature]" can have variations such as "<N>, <V> → [feature]"
    – Check each rule against the transaction file to find the possible sequences
    – Remove derived rules with confidence < 50%
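Steps 1 and 2 above can be sketched as follows (`feature_rules` and `ordered_variants` are hypothetical names; the confidence re-check against the transaction file is only indicated, not implemented):

```python
from itertools import permutations

def feature_rules(rules):
    """Step 1: keep only rules whose right-hand side is the [feature] item."""
    return [(lhs, rhs) for lhs, rhs in rules if rhs == ("[feature]",)]

def ordered_variants(lhs):
    """Step 2: enumerate every ordering of the LHS items; each variant must
    then be checked against the transaction file (confidence >= 50%)."""
    return list(permutations(lhs))

rules = [
    (("N1", "N2"), ("[feature]",)),
    (("V", "N"), ("[feature]",)),
    (("N1",), ("[feature]", "N2")),
]
print(feature_rules(rules))          # only the first two rules survive
print(ordered_variants(("V", "N")))  # -> [('V', 'N'), ('N', 'V')]
```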

  11. Post-processing
  Rules:
    <N1>, <N2> → [feature]
    <N>, <V> → [feature]
  • Step 3: Generate language patterns
    – The rules are converted to language patterns according to the ordering of the items in the rules from Step 2 and the feature location:
      <N1> [feature] <N2>
      <N> <V> [feature]
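Step 3 can be sketched as a small conversion (a sketch only; the feature positions are taken from the slide's two example patterns):

```python
def to_pattern(ordered_items, feature_index):
    """Turn an ordered rule LHS into a language pattern by placing [feature]
    at the position it occupied in the matching segments."""
    pattern = list(ordered_items)
    pattern.insert(feature_index, "[feature]")
    return " ".join(t if t == "[feature]" else f"<{t}>" for t in pattern)

print(to_pattern(("N1", "N2"), 1))  # -> <N1> [feature] <N2>
print(to_pattern(("N", "V"), 2))    # -> <N> <V> [feature]
```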

  12. Extraction of Product Features
  • Perform POS tagging on new reviews
  • The resulting patterns are used to match and identify candidate features
  • Gaps are allowed during pattern matching
    – <N1> [feature] <N2> can match "Animals like kind people", with "like" as the gap and F: kind
  • If a sentence segment satisfies multiple patterns, choose the pattern with the highest confidence
  • If no pattern applies, use nouns or noun phrases as features
  • If a sentence segment has only a single word, e.g., "heavy" or "big", use that word as the feature
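The gap-tolerant matching might look like the following sketch (the `max_gap` limit and the assumption that the [feature] slot fills a noun position are mine, not the paper's; the example also assumes the tagger labels "kind" as a noun):

```python
def matches(pattern, tagged, max_gap=2):
    """Match a tagged segment against a pattern like ['N1', '[feature]', 'N2'],
    allowing up to max_gap extra words between consecutive pattern items.
    Returns the word captured in the [feature] slot, or None on no match."""
    i, gap, feature = 0, 0, None
    for tag, word in tagged:
        if i == len(pattern):
            break
        item = pattern[i]
        # [feature] is assumed to fill a noun slot; <N1>/<N2> both match tag N.
        want = "N" if item == "[feature]" else item.rstrip("12")
        if tag == want:
            if item == "[feature]":
                feature = word
            i, gap = i + 1, 0
        else:
            gap += 1
            if gap > max_gap:
                return None
    return feature if i == len(pattern) else None

# "Animals like kind people" -- assumed tagging, with "kind" as a noun.
seg = [("N", "Animals"), ("V", "like"), ("N", "kind"), ("N", "people")]
print(matches(["N1", "[feature]", "N2"], seg))  # -> kind ("like" bridged as a gap)
```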

  13. Feature Refinement
  Two main mistakes made during extraction:
  • Feature conflict: two or more features in one sentence segment
  • A more likely feature exists in the sentence segment but is not extracted by any pattern
    – e.g., "slight noise from speaker when not in use": "noise" is found to be the feature, but not "speaker"
    – How to find this? "speaker" was found as a candidate feature in other reviews, but "noise" never was

  14. Feature Refinement
  Frequent-noun
  • The generated product features, together with their frequency counts, are saved in a candidate feature list
  • For each sentence segment with two or more nouns, choose the noun that is most frequent in the candidate feature list
  Frequent-term
  • For each sentence segment, simply choose the word/phrase (it need not be a noun) with the highest frequency in the candidate feature list
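The frequent-noun strategy can be sketched as follows (the frequency counts are hypothetical, chosen to reproduce the "speaker" vs. "noise" example from the previous slide):

```python
def refine_frequent_noun(segment, candidate_counts):
    """Frequent-noun strategy: among the nouns in a tagged segment, keep the
    one with the highest frequency in the candidate feature list."""
    nouns = [w for tag, w in segment if tag == "N"]
    return max(nouns, key=lambda w: candidate_counts.get(w, 0))

# Hypothetical counts: "speaker" appears often as a candidate feature,
# "noise" almost never, so "speaker" wins the refinement.
counts = {"speaker": 12, "noise": 1}
seg = [("Adj", "slight"), ("N", "noise"), ("Prep", "from"), ("N", "speaker")]
print(refine_frequent_noun(seg, counts))  # -> speaker
```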

  15. Mapping to Implicit Features
  • In tagging the training data for mining rules, the mappings of candidate features to their actual features are also tagged
  • e.g., in "<V> included <N> MB <V> is <Adj> stingy", MB was tagged as the feature; now map it to Memory
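At extraction time this mapping reduces to a dictionary lookup (the mapping contents below are hypothetical, following the slide's MB → Memory example; unmapped features pass through unchanged):

```python
def map_implicit(feature, mapping):
    """Map a candidate feature word to its actual feature using the mapping
    tagged during training; explicit features fall through unchanged."""
    return mapping.get(feature.lower(), feature)

# Hypothetical mapping learned from the tagged training data.
mapping = {"mb": "memory", "large": "size"}
print(map_implicit("MB", mapping))       # -> memory
print(map_implicit("battery", mapping))  # -> battery (already explicit)
```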
