Extracting Product Feature Assessments from Reviews
Ana-Maria Popescu
Oren Etzioni
http://www.cs.washington.edu/homes/amp

Overview
- Motivation & Terminology
- Opinion Mining Work
- Overview of OPINE
- Product Feature Extraction
- Customer Opinion Extraction
- Experimental Results
- Conclusion and Future Work

Motivation
- Reviews abound on the Web: consumer electronics, hotels, etc.
- Automatic extraction of customer opinions can benefit both manufacturers and customers.
- Other applications:
  - Automatic analysis of survey information
  - Automatic analysis of newsgroup posts

Terminology
- Reviews contain features and opinions.
- Product features include:
  - Parts: the cover of the scanner
  - Properties: the size of the Epson 3200
  - Related concepts: the image from this scanner
  - Properties & parts of related concepts: the image size for the HP610
- Product features can be:
  - Explicit: "the size is too big"
  - Implicit: "the scanner is not small"

Terminology (continued)
- Reviews contain features and opinions.
- Opinions can be expressed by:
  - Adjectives: "noisy scanner"
  - Nouns: "scanner is a disappointment"
  - Verbs: "I love this scanner"
  - Adverbs: "the scanner performs beautifully"
- Opinions are characterized by polarity (+, -) and strength (great > good).
Opinion Mining Work
- Extract positive/negative opinion words (Hatzivassiloglou & McKeown '97, Turney '03, etc.)
- Classify reviews as positive or negative (Turney '02, Pang '02, Kushal '03)
- Identify feature-opinion pairs together with the polarity of each opinion (Hu & Liu '04, Hu & Liu '05)
- OPINE: high-precision feature-opinion extraction, opinion polarity and strength extraction

The OPINE System
Sample OPINE output in the Hotel domain: Hotel Majestic, Barcelona (feature: HotelNoise)

  Opinion Phrase   Rank   Polarity   Frequency
  Deafening        1      -          2
  Loud             2      -          7
  Silent           3      +          3
  Quiet            4      +          4

KIA Overview
- OPINE is built on top of KIA, a domain-independent IE system which extracts concepts and relationships from the Web.
- Given relation R and pattern P, KIA instantiates P into extraction rules for R.
- KIA extracts candidate facts from the Web.
- Each fact is assessed using a form of PMI, e.g.:
    PMI(Seattle, "is a city") = Hits("Seattle is a city") / Hits("Seattle")
  where "is a city" is a discriminator for the IS-A relationship.

OPINE Overview
Input: product class C, reviews R
Output: set of feature-opinion pairs {(f, o)}

  R'  <- parseReviews(R)
  E   <- findExplicitFeatures(R', C)
  O   <- findOpinions(R', E)
  CO  <- clusterOpinions(O)
  I   <- findImplicitFeatures(CO, E)
  RO  <- solveOpinionRankingCSP(CO)
  {(f, o)} <- outputFeatureOpinionPairs(RO, I ∪ E)
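The KIA-style PMI assessment on the slide above is straightforward to put into code. Below is a minimal sketch, assuming a hypothetical hit_count(query) source of Web hit counts; the counts in the usage example are made up.

```python
def discriminator_pmi(instance: str, discriminator: str, hit_count) -> float:
    """KIA-style assessment, e.g. PMI(Seattle, "is a city") =
    Hits("Seattle is a city") / Hits("Seattle")."""
    joint = hit_count(f"{instance} {discriminator}")
    alone = hit_count(instance)
    return joint / alone if alone else 0.0

# Toy usage with a stubbed hit-count source (replace with a real search API).
toy_hits = {"Seattle is a city": 120_000, "Seattle": 35_000_000}
score = discriminator_pmi("Seattle", "is a city", lambda q: toy_hits.get(q, 0))
print(f'PMI(Seattle, "is a city") = {score:.4f}')
```

The meronym discriminators on the Parts and Properties slide below use the same idea, with an extra Hits(discriminator) factor in the denominator.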
Explicit Feature Extraction
Given product class C:
1. Extract parts and properties of C.
   - Recursively extract parts and properties of C's parts and properties, etc.
2. Extract related concepts of C (Popescu et al., 2004).
   - Extract parts and properties of related concepts.

Parts and Properties
- Extract review noun phrases with frequency f > k as potential meronyms.
- Assess candidates using discriminators D derived from patterns P.
  Example: C = scanner, M = size, P = "[M] of C"
    D_0 = "[M] of scanner", ..., D_k = "[M] of Epson 3200"
    PMI(size, [M] of scanner)    = Hits("size of scanner") / (Hits("of scanner") * Hits("size"))
    ...
    PMI(size, [M] of Epson 3200) = Hits("size of Epson 3200") / (Hits("of Epson 3200") * Hits("size"))
- Compute PMI_T(M, P) = f(PMI(M, D_0), ..., PMI(M, D_k)).
- Convert PMI_T(M, P_0), ..., PMI_T(M, P_j) into binary features for a Naive Bayes classifier (NBC).
- Retain meronyms M with p(meronym(M, C)) > t (see the sketch after the Polarity Extraction slide below).
- Separate parts from properties using WordNet and Web information.

Opinion Extraction
Given feature f and sentence s containing f:
- Extract phrases whose head modifies head(f).
Examples:
  f = resolution   s = "... great resolution ..."
  f = scanner      s = "... scanner is white ..."
  f = scanner      s = "... scanner is a horror ..."
  f = scanner      s = "I hate this scanner."
  f = scanner      s = "The scanner works well."
OPINE then determines the polarity of each potential opinion phrase.

Polarity Extraction
Each potential opinion op has a semantic orientation label L(op): +, -, or | (neutral).

Initial SO label assignment
- OPINE derives an initial label for each potential opinion:
    SO(op) = PMI(op, good) - PMI(op, bad)
  If |SO(op)| < t or Hits(op) < t_1, L(op) = "|" (neutral).
  Else, if SO(op) > 0, L(op) = "+"; otherwise L(op) = "-".

Final SO label assignment
- OPINE uses constraints to derive a final set of labels:
  - WordNet constraints: antonym(operative, inoperative)
  - Conjunction/disjunction constraints: "attractive, but expensive"
- Iteration i: L_i(op) = f(L_{i-1}(op_0), L_{i-1}(op_1), ..., L_{i-1}(op_k))
- Termination condition: labels remain constant over consecutive iterations.
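The Parts and Properties assessment step flagged above can be sketched as follows. The toy hit counts stand in for Web hit counts, the binarization threshold and the labelled seed set are illustrative assumptions, and scikit-learn's BernoulliNB plays the role of the slide's Naive Bayes classifier.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy hit counts standing in for Web search hits; every value here is made up.
TOY_HITS = {
    "size of scanner": 60_000, "size of Epson 3200": 4_000,
    "cover of scanner": 30_000, "cover of Epson 3200": 2_500,
    "politics of scanner": 3, "politics of Epson 3200": 1,
    "yesterday of scanner": 2, "yesterday of Epson 3200": 1,
}
def hit_count(query: str) -> int:
    return TOY_HITS.get(query, 1_000)

def pmi(m: str, discriminator: str) -> float:
    """PMI(M, "[M] of C") = Hits("M of C") / (Hits("of C") * Hits("M"))."""
    joint = hit_count(discriminator.replace("[M]", m))
    denom = hit_count(discriminator.replace("[M] ", "")) * hit_count(m)
    return joint / denom if denom else 0.0

DISCRIMINATORS = ["[M] of scanner", "[M] of Epson 3200"]

def binary_features(m: str, threshold: float = 1e-5) -> list[int]:
    # One binary feature per discriminator: is its PMI above an (assumed) threshold?
    return [int(pmi(m, d) > threshold) for d in DISCRIMINATORS]

# Tiny labelled seed set (1 = meronym of scanner, 0 = not), purely illustrative.
seeds = {"size": 1, "cover": 1, "politics": 0, "yesterday": 0}
X = np.array([binary_features(m) for m in seeds])
y = np.array(list(seeds.values()))
nb = BernoulliNB().fit(X, y)

candidate = "resolution"
p = nb.predict_proba([binary_features(candidate)])[0, 1]
print(f"p(meronym({candidate}, scanner)) = {p:.2f}")
```

Per the slide, a candidate M is retained when p(meronym(M, C)) exceeds the threshold t.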
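The two stages of the Polarity Extraction slide can be sketched the same way. The SO scores, hit counts, and thresholds below are made up, and the iterative update is a deliberately simplified stand-in: the slide only says that L_i(op) is a function of the neighbours' labels at iteration i-1, so this sketch just lets labelled neighbours fill in neutral ones through antonymy/conjunction constraints until nothing changes.

```python
# Initial semantic-orientation labels, then iterative refinement over constraints.
# SO(op) = PMI(op, "good") - PMI(op, "bad"); the values below are invented.
SO = {"great": 0.9, "noisy": -0.7, "white": 0.01, "deafening": -1.2, "quiet": 0.5}
HITS = {"great": 9e6, "noisy": 2e6, "white": 8e6, "deafening": 4e5, "quiet": 5e6}
T_SO, T_HITS = 0.05, 1e4   # assumed thresholds

def initial_label(op: str) -> str:
    if abs(SO[op]) < T_SO or HITS[op] < T_HITS:
        return "|"                       # neutral
    return "+" if SO[op] > 0 else "-"

labels = {op: initial_label(op) for op in SO}

# Constraints: (op1, op2, same_polarity?). WordNet antonyms and "but"-style
# conjunctions give opposite-polarity links; "and"-style conjunctions give same.
constraints = [("quiet", "deafening", False), ("noisy", "deafening", True)]

changed = True
while changed:                            # stop when labels stay constant
    changed = False
    for a, b, same in constraints:
        for x, y in ((a, b), (b, a)):
            if labels[x] == "|" and labels[y] in "+-":
                flipped = {"+": "-", "-": "+"}[labels[y]]
                labels[x], changed = (labels[y] if same else flipped), True

print(labels)   # e.g. {'great': '+', 'noisy': '-', 'white': '|', ...}
```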
Implicit Properties
- Adjectival opinions refer to implicit or explicit properties.
  Example: in "slow driver", the adjective slow refers to the property driver speed.
- OPINE extracts properties corresponding to adjectives and uses them to derive implicit features.
  Example adjective clusters:
    Clarity: intuitive, understandable, clear, straightforward
    Noise:   silent, noisy, quiet, loud, deafening
    Price:   cheap, inexpensive, affordable, expensive
- Generate adjective cluster labels:
  - WordNet: big = valueOf(size)
  - Add suffixes to cluster elements: -iness, -ity
- Implicit features:
    "the interface is intuitive"  ->  clarity(interface): intuitive
    "straightforward interface"   ->  clarity(interface): straightforward

Clustering Adjectives
- Generate initial clusters using WordNet synonyms/antonyms.
- Clusters A_i and A_j are merged if there exist multiple elements a_i, a_j such that a_i is similar to a_j with respect to WordNet:
    similar(a1, a2) :- derived(a1, C), att(C, a2).
    similar(a1, a2) :- att(C1, a1), att(C2, a2), subclass(C1, C2), etc.
- For each cluster A_i, OPINE uses queries such as [a_1, a_2 and X], [a_1, even X], [a_1, or even X], etc. to extract additional related adjectives a_r from the Web.
- If multiple a_r are elements of a cluster A_r, the clusters are merged: A_i + A_r = A', e.g. {intuitive} + {clear, straightforward}.

Rank Opinion Phrases
- Initial opinion phrase ranking: derived from the magnitude of the SO scores, e.g. |SO(great)| > |SO(good)|, so great > good.
- Final opinion phrase ranking (see the first sketch after the OPINE vs. Hu & Liu slide below):
  - Given cluster A, use patterns such as [a, even a'], [a, just not a'], [a, but not a'], etc. to derive a set S of constraints on relative opinion strength, e.g. c = silent > quiet, c = deafening > loud.
  - Augment S with antonymy/synonymy constraints.
  - Solve CSP S to find the final opinion phrase ranking, e.g. HotelNoise: deafening > loud > silent > quiet.

Opinion Sentences
- Opinion sentences are sentences containing at least one product feature and at least one corresponding opinion.
- Determining opinion sentence polarity (sketched in code after the OPINE vs. Hu & Liu slide below):
  - Determine the average strength s of the sentence's opinions op.
  - If |s| > t, sentence polarity is given by the sign of s.
  - Else, sentence polarity is that of the previous sentence.

Experimental Results
- Datasets: 7 product classes, 1621 reviews
  - 5 product classes from Hu & Liu '04
  - 2 additional classes: Hotels, Scanners
- Experiments:
  - Feature extraction: Hu & Liu '04 vs. OPINE
  - Opinion sentences: Hu & Liu '04 vs. OPINE
  - Opinion phrase extraction & ranking: OPINE

OPINE vs. Hu & Liu
- Feature extraction: OPINE improves precision by 22% with a 3% loss in recall; the increased precision is due to Web-based feature assessment.
- Opinion sentence extraction: OPINE outperforms Hu & Liu (22% higher precision, 11% higher recall).
- Sentence polarity extraction: OPINE outperforms Hu & Liu (8% higher accuracy).
- OPINE handles adjective, noun, verb, and adverb opinions, plus limited pronoun resolution.
- OPINE also uses a more restrictive definition of opinion sentence than Hu & Liu.
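For the Rank Opinion Phrases step referenced above, the slides solve a CSP over relative-strength constraints. A minimal stand-in, assuming the constraints form an acyclic "stronger than" graph, is a topological sort of that graph; the loud > silent link below is added only to make the toy graph connected and is not one of the slide's extracted constraints.

```python
from graphlib import TopologicalSorter   # Python 3.9+

# Relative-strength constraints mined from Web patterns such as
# "[a], even [a']" / "[a], but not [a']": stronger > weaker.
stronger_than = {
    "deafening": {"loud"},    # deafening > loud (from the slide)
    "loud": {"silent"},       # illustrative cross-cluster link, not on the slide
    "silent": {"quiet"},      # silent > quiet (from the slide)
}

# TopologicalSorter expects node -> set of predecessors (things that come first),
# so feed it "weaker -> {stronger}" to emit the strongest phrase first.
weaker_first = {}
for strong, weak_set in stronger_than.items():
    for weak in weak_set:
        weaker_first.setdefault(weak, set()).add(strong)
    weaker_first.setdefault(strong, set())

ranking = list(TopologicalSorter(weaker_first).static_order())
print(" > ".join(ranking))   # deafening > loud > silent > quiet
```

A real constraint solver would also have to reconcile antonymy/synonymy constraints and conflicting Web evidence, which a plain topological sort cannot do.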
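The Opinion Sentences polarity rule is small enough to state directly; the threshold and the signed strength values in the usage line are assumptions.

```python
def sentence_polarity(opinion_strengths, previous_polarity, threshold=0.1):
    """Average the signed strengths of a sentence's opinions; if the average is
    decisive, its sign gives the polarity, otherwise inherit the previous
    sentence's polarity (per the slide's rule)."""
    if not opinion_strengths:
        return previous_polarity
    s = sum(opinion_strengths) / len(opinion_strengths)
    if abs(s) > threshold:
        return "+" if s > 0 else "-"
    return previous_polarity

# Hypothetical signed strengths for "The room was quiet but a bit small."
print(sentence_polarity([0.6, -0.3], previous_polarity="+"))   # '+'
```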