Opinion Mining
Feiyu XU & Xiwen CHENG
Xiwen.cheng@dfki.de
DFKI, Saarbruecken, Germany
Jan 19th, 2011
Language Technology I
Discussion on Opinion Mining Applications
Textmap: topic monitoring system
Tweetmotif: Topic summarization on Twitter, e.g. wikileak, parenting
What the trend: Trend monitoring, e.g. wikileak
Opinion gathering speed on the Internet
• WSJ publishes an article "Why Chinese Mothers Are Superior" written by Amy Chua on 8th Jan, 2011. Until 18th Jan:
• 6,800 comments on WSJ
Keyword: Amy Chua
• 3,490,000 on Google
• 5,600 on twitter.com
• 5,289 on wordpress.com
Keyword: parenting
• 83,200,000 search results on Google
• 1,620,000 from twitter.com
• 502,000 from wordpress.com
A question from Quora
Proposals for Opinion Mining Applications and Solutions?
Discussion on Resources for Movie Review Summarization
Reviews on "Das Leben der Anderen" @ imdb
Top 250 movies voted by imdb users
Which resources and which features would you choose for OM tasks?
Experiment on KomParse: Making NPCs express their opinions emotionally
Gossip Galore in Rascalli
Hank in KomParse
Paul's solution
• Unsupervised machine learning
• Data: comments rated by reviewers (1 ~ 10 stars)
• Features
– N-gram token patterns
– Dependency patterns
• Extra knowledge
– WordNet
– Negation expressions
• Learning algorithm
– Scoring system
Data Processing
• Resource
– IMDb (http://www.imdb.com/), an online movie database
• IMDb pages of interest:
– with name (actors, authors, directors etc.)
– with title (movie title, movie recommendations from IMDb)
• Containing the information:
– Movie title
– Review
– Review title
– Review date
– Author name
– Author origin (optional)
– Recommendation of other users for this review (optional)
– The score the author gave the reviewed movie, x/10 (optional)
Data Processing
<Record name="Paycheck (2003)" isA="Movie" type="IMDb user reviews">
  <Feature name="Recommend">0 out of 3</Feature>
  <Feature name="Time">25 December 2003</Feature>
  <Feature name="Author">ak2k</Feature>
  <Feature name="Review">A poor remake of Minority Report, with less talented actors. Promising plot line that wilted away in the first thirty minutes of the film. Interesting inductive journey and neat car chases, but nowhere close to my money's worth. I'd recommend to go and see LOR again.</Feature>
  <Feature name="Score">1/10</Feature>
  <Feature name="From">Illinois</Feature>
  <Feature name="Title">A perfect Christmas movie has about as much connection with reality as Santa Clause does.</Feature>
</Record>
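A record in this format could be read roughly as follows. This is only a sketch using Python's standard `xml.etree.ElementTree`; the function name `parse_record` and the shape of the returned dictionary are assumptions made for illustration, and the embedded record is shortened from the slide.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record shown on the slide.
RECORD = """<Record name="Paycheck (2003)" isA="Movie" type="IMDb user reviews">
  <Feature name="Recommend">0 out of 3</Feature>
  <Feature name="Score">1/10</Feature>
  <Feature name="Review">A poor remake of Minority Report, with less talented actors.</Feature>
</Record>"""


def parse_record(xml_text):
    """Turn one <Record> into a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Feature names are stripped so stray spaces in the attribute do not matter.
    features = {
        feature.get("name").strip(): (feature.text or "").strip()
        for feature in root.iter("Feature")
    }
    # The score is optional; "1/10" becomes the integer 1.
    score_text = features.get("Score")
    rating = int(score_text.split("/")[0]) if score_text else None
    return {"movie": root.get("name"), "rating": rating, "features": features}
```

The integer rating is what the later sentence scoring would start from; records without a score yield `None` here.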
Data Processing
Presumptions and observations:
• The score indicates the sentiment of the review
• Short reviews are preferred over long reviews
– long reviews have many objective parts about the storyline, anecdotes etc.
– short reviews contain only the opinion about the movie and are often expressed emotionally
• Sentiment classification of extreme reviews (very high or very low rating) is mostly unambiguous and clear, while mid-rated reviews contain many unclear sentences, such as "on the one hand … on the other"
Data Processing
• Filtering out reviews
– with more than 900 tokens
– with a rating of 4, 5, 6, 7 or 8 out of 10
• SCORE assignment to each sentence in the selected reviews
– SCORE = rating (1 ~ 10 stars)
– SCORE + 1, if the sentence:
• is the first, second or last sentence
• and contains keywords such as I, me, movie, film and this movie
– SCORE – 1, if the sentence:
• has a length > 100
• and contains keywords such as imdb, you, your, spoiler and review etc.
• The sentence with the highest SCORE in a review is selected.
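The SCORE rules above can be sketched as follows. This is an illustration under assumptions, not the original implementation: the slide does not say whether the length of 100 is in characters or tokens (characters are assumed here), the keyword lists contain only the words named on the slide, and the function names and the tie-breaking rule (first sentence wins) are hypothetical.

```python
import re

BOOST_KEYWORDS = {"i", "me", "movie", "film"}  # "this movie" is covered by "movie"
PENALTY_KEYWORDS = {"imdb", "you", "your", "spoiler", "review"}
MAX_LENGTH = 100  # assumption: measured in characters


def words_of(sentence):
    """Lowercased word set of a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def score_sentences(sentences, rating):
    """Return one SCORE per sentence, starting from the review rating."""
    last = len(sentences) - 1
    scores = []
    for index, sentence in enumerate(sentences):
        words = words_of(sentence)
        score = rating
        # +1: first, second or last sentence AND a boost keyword.
        if index in (0, 1, last) and words & BOOST_KEYWORDS:
            score += 1
        # -1: long sentence AND a penalty keyword.
        if len(sentence) > MAX_LENGTH and words & PENALTY_KEYWORDS:
            score -= 1
        scores.append(score)
    return scores


def select_sentence(sentences, rating):
    """Pick the sentence with the highest SCORE (first one on ties)."""
    scores = score_sentences(sentences, rating)
    return sentences[scores.index(max(scores))]
```

Applied to the four sentences of the Paycheck review (rating 1), only the second sentence gets the bonus, because it is in a boosted position and contains "film".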
Features – N-gram token pattern
Extracting uni-, bi- and trigrams out of every sentence from the sentiment corpus
• For example: I absolutely loved this movie.
• Unigrams:
– i (NP), absolutely (RB), loved (VVD)
• Bigrams:
– i absolutely (NP RB), absolutely loved (RB VVD)
• Trigrams:
– i absolutely loved (NP RB VVD), absolutely loved this (RB VVD DT)
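A minimal sketch of this extraction step, assuming the sentence has already been tokenized and POS-tagged by some upstream tagger. The tags for the first four words are the ones shown on the slide; the tag `NN` for "movie" and the function names are assumptions.

```python
def ngrams(tagged_tokens, n):
    """All contiguous windows of n (word, tag) pairs."""
    return [
        tuple(tagged_tokens[i:i + n])
        for i in range(len(tagged_tokens) - n + 1)
    ]


def token_patterns(tagged_tokens, max_n=3):
    """Uni-, bi- and trigram patterns rendered as 'words (TAGS)'."""
    patterns = []
    for n in range(1, max_n + 1):
        for gram in ngrams(tagged_tokens, n):
            words = " ".join(word.lower() for word, _ in gram)
            tags = " ".join(tag for _, tag in gram)
            patterns.append(f"{words} ({tags})")
    return patterns


# "I absolutely loved this movie." with the tags from the slide;
# the tag for "movie" is assumed.
TAGGED = [("I", "NP"), ("absolutely", "RB"), ("loved", "VVD"),
          ("this", "DT"), ("movie", "NN")]
```

For the five tokens above this yields 5 unigrams, 4 bigrams and 3 trigrams, including `i absolutely loved (NP RB VVD)`.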
Features – Dependency Pattern
Example: This is a funny super interesting and exciting movie.
Some important information is missed by the N-gram token patterns.
• funny and movie are not caught together by an n-gram (n<6)
So we include dependency patterns:
• amod(movie-9, funny-4)
Tool: Stanford Dependency Parser
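The claim about n<6 can be checked with a few lines. The sketch below does not run a parser: the `amod` relation is copied from the slide, and the helper names are hypothetical. It only shows how far apart the two words are and how the pattern string with 1-based token positions is formed.

```python
def min_ngram_size(tokens, first, second):
    """Smallest n for which one n-gram window contains both words."""
    i, j = tokens.index(first), tokens.index(second)
    return abs(j - i) + 1


def dependency_pattern(relation, tokens, governor, dependent):
    """Render a relation as relation(governor-pos, dependent-pos), 1-based."""
    g = tokens.index(governor) + 1
    d = tokens.index(dependent) + 1
    return f"{relation}({governor}-{g}, {dependent}-{d})"


TOKENS = "This is a funny super interesting and exciting movie".split()
```

With these tokens, "funny" and "movie" need a window of six tokens, which is why uni-, bi- and trigrams miss the pair while the dependency pattern captures it directly.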
Extra Knowledge – Unigram patterns extended with WordNet
• All 1-gram adjective and adverb patterns are extended with WordNet. Both the synonyms and the antonyms are used.
• For instance, the 1-gram pattern "dry" can be extended with
– parched / arid / anhydrous / sere / dried-up
– wet / watery / damp / moist / humid / soggy
• In our experiment, the antonyms/synonyms are the words connected to the original word within a maximum distance of two.
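The "maximum distance of two" can be pictured as a breadth-first walk over synonym and antonym links. The sketch below uses a tiny hand-written relation table instead of real WordNet data, and it assumes that polarity flips each time an antonym link is crossed; both the table and that rule are assumptions for illustration.

```python
from collections import deque

# Hypothetical toy relations, not real WordNet content.
SYNONYMS = {"dry": ["arid", "parched"], "wet": ["damp", "moist"]}
ANTONYMS = {"dry": ["wet"]}


def expand(word, max_distance=2):
    """Map each reachable word to (distance, same_polarity_as_seed)."""
    found = {}
    seen = {word}
    queue = deque([(word, 0, True)])
    while queue:
        current, distance, same = queue.popleft()
        if distance == max_distance:
            continue  # do not walk further than the allowed distance
        links = [(s, same) for s in SYNONYMS.get(current, [])]
        links += [(a, not same) for a in ANTONYMS.get(current, [])]
        for neighbour, neighbour_same in links:
            if neighbour not in seen:
                seen.add(neighbour)
                found[neighbour] = (distance + 1, neighbour_same)
                queue.append((neighbour, distance + 1, neighbour_same))
    return found
```

Starting from "dry", the walk reaches "arid" and "parched" at distance one with the same polarity, "wet" at distance one with reversed polarity, and "damp" and "moist" at distance two, still reversed.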
Extra Knowledge – Negations
• Some elements in a sentence can change the sentiment of a word or phrase, such as
– Subjunctive: I thought this movie is good.
– Tense: This movie was good.
– Negation: This film is not funny.
– Quotation: My friend told me "this is the best movie ever, you have to watch it" but I didn't like it.
• In our work, the content in the quotation is removed
• We consider only negations such as not, no, never and n't, including
– no wonder, not just, not to mention etc.
– Restricted comparative sentences "not better as" "no more" etc.
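A rough sketch of the two steps named above: quoted content is removed, then the sentence is checked for the negation cues not, no, never and n't. The regular expressions and function names are assumptions; the special phrases and restricted comparatives from the slide get no separate treatment here, they are matched only because they contain "no" or "not".

```python
import re

# Straight or curly double-quoted spans.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')
# The cue words from the slide; n't is matched as a suffix.
NEGATION = re.compile(r"\b(?:not|no|never)\b|n't\b")


def strip_quotations(sentence):
    """Remove quoted spans so their sentiment is not counted."""
    return QUOTED.sub(" ", sentence)


def has_negation(sentence):
    """True if a negation cue occurs outside of quotations."""
    text = strip_quotations(sentence).lower().replace("’", "'")
    return bool(NEGATION.search(text))
```

For the quotation example on the slide, the friend's praise is dropped and the remaining "didn't" marks the sentence as negated.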
Algorithm – Score of patterns
• Each pattern has an iSCORE, including two sub-values
– iSCORE_pos: the value of being positive
– iSCORE_neg: the value of being negative
• The iSCORE is initialized with the frequency of this pattern in the corpus
Algorithm – Data bias
• Although "more" negatively scored sentences are used, i.e. (1/10, 2/10, 3/10) vs. (9/10, 10/10), positive reviews are still twice as many as the negative ones.
• Assuming 1) there are X negative sentences and Y positive ones, or the other way round, and 2) Y > X:
equalizer = Y / X
BIAS = equalizer / (X + Y + Y – X)
iSCORE_Y = iSCORE_Y / 2Y – BIAS
Algorithm – iSCORE
• iSCORE = iSCORE_pos – iSCORE_neg
– If the value of the iSCORE is positive, the computed polarity of the pattern is positive; if the value is negative, the polarity is negative
• iSCORE = iSCORE * 2, if the pattern is a bigram
• iSCORE = iSCORE * 3, if the pattern is a trigram
• iSCORE = iSCORE * 2.5, if the pattern is a dependency pattern
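The combination of the two sub-values and the pattern-type weights might look like this. The sketch reads "binary" and "triple" as bigram and trigram patterns, leaves out the data-bias correction, uses a multiplier of 1 for unigrams and calls a zero score "neutral"; the last two points are not stated on the slides and are assumptions, as are all names.

```python
# Weights from the slide; the unigram weight of 1.0 is assumed.
MULTIPLIER = {
    "unigram": 1.0,
    "bigram": 2.0,
    "trigram": 3.0,
    "dependency": 2.5,
}


def iscore(pos_value, neg_value, pattern_type):
    """iSCORE = (iSCORE_pos - iSCORE_neg) * weight of the pattern type."""
    return (pos_value - neg_value) * MULTIPLIER[pattern_type]


def polarity(score):
    """Sign of the iSCORE; 'neutral' for zero is an assumption."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, a bigram seen 5 times in positive and 2 times in negative sentences gets (5 - 2) * 2 = 6 and is therefore treated as positive.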
Algorithm – iSCORE extended by WordNet
• The synonyms have the same polarity as the word, while the antonyms have a reversed polarity.
Algorithm – iSCORE extended by WordNet
• For instance, if Polarity(fast JJ) = positive, with a maximum WordNet depth = 1
– [iSCORE(swift), iSCORE(prompt), …] += 0.3
– [iSCORE(slow)] += –0.3
• With a maximum WordNet depth = 2
– [iSCORE(swift), iSCORE(prompt), …] += 0.3 * 2 = 0.6
– [iSCORE(slow)] += –0.3 * 2 = –0.6
– [iSCORE(sluggish)] += –0.3
• In general
– iSCORE(synonyms at the x-th depth) += 0.3 * ((max. depth + 1) – x)
– iSCORE(antonyms at the x-th depth) += –0.3 * ((max. depth + 1) – x)
• # 0.3 is an arbitrarily chosen value
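The depth-weighted update can be written as a small function. This sketch covers only the case shown on the slide, a seed word with positive polarity; the neighbour list for "fast" is typed in by hand rather than taken from WordNet, and the names `wordnet_delta` and `extend_scores` are hypothetical.

```python
STEP = 0.3  # arbitrarily chosen value, as noted on the slide


def wordnet_delta(depth, max_depth, same_polarity, step=STEP):
    """Increment for a word found at the given depth (1-based)."""
    if not 1 <= depth <= max_depth:
        raise ValueError("depth must lie between 1 and max_depth")
    weight = step * ((max_depth + 1) - depth)
    return weight if same_polarity else -weight


def extend_scores(scores, neighbours, max_depth):
    """Add the increments for (word, depth, same_polarity) triples."""
    updated = dict(scores)
    for word, depth, same_polarity in neighbours:
        if depth <= max_depth:
            delta = wordnet_delta(depth, max_depth, same_polarity)
            updated[word] = updated.get(word, 0.0) + delta
    return updated


# Hand-written neighbours of the positive seed "fast" from the slide.
FAST_NEIGHBOURS = [
    ("swift", 1, True),
    ("prompt", 1, True),
    ("slow", 1, False),
    ("sluggish", 2, False),
]
```

With a maximum depth of 2 this reproduces the slide's numbers: 0.6 for "swift" and "prompt", -0.6 for "slow" and -0.3 for "sluggish". With a maximum depth of 1, "sluggish" is skipped and the other words receive 0.3 or -0.3.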