SENTIMENT ANALYSIS CS 498 | Mar 6
Macbeth, Act 2, Scene 1, from Wordle
my Citeulike page
Brad Paley’s TextArc.
Fernanda Viégas’s Themail.
Martin Wattenberg’s recent Word Tree visualization, showing Alberto Gonzales’s testimony.
PNNL’s ThemeRiver.
PNNL’s IN-SPIRE.
tools you know...practical stuff
Stanford’s list: http://nlp.stanford.edu/links/statnlp.html
LIWC: http://www.liwc.net
SentiWordNet: http://sentiwordnet.isti.cnr.it
Pang & Lee’s data at Cornell: http://www.cs.cornell.edu/People/pabo/movie-review-data
http://www.cs.cornell.edu/home/llee/data/convote.html
analysis & design how we might use it and why
“If I could figure out a way to determine whether people are more fearful or changing to more euphoric … I can forecast the economy better than any way I know. The trouble is, we can't figure that out.”
Alan Greenspan, Jan 2008
1841
Nasdaq vs. LiveJournal “anxious” moods, Jan 3 – Oct 26, 2007.
BOOSTED DECISION TREE CLASSIFIER
1. nerv*
2. wor*
3. anx*
4. hop*
5. you*
6. scar*
7. tomorrow
8. fun
9. war
10. your*
11. going
12. be*
13. interview
other notables: 16. lov*, 21. hospital, 36. awesome, 51. yay, 89. exam*
All LiveJournal blog posts (posts per minute: ~107)
Bagged Naive Bayes classifier: anxious true positive rate 28%, anxious false positive rate 3.4%
Boosted Decision Tree classifier: anxious true positive rate ~30%, anxious false positive rate ~6%
[Figure, one panel per classifier: percentage of anxious posts in 10-min period, adapted Wald adjustment (lower bound on 95% CI). Legend: average, 60-min moving average.]
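The slides do not spell out the "adapted Wald adjustment", so the following is only a minimal sketch of one common reading: the Agresti-Coull adjusted Wald interval, whose lower bound gives a conservative estimate of the share of anxious posts in a 10-minute window. The function name and the choice of z = 1.96 for a 95% interval are assumptions made for illustration, not the deck's actual procedure.

```python
import math


def adjusted_wald_lower_bound(anxious_posts, total_posts, z=1.96):
    """Lower bound of the Agresti-Coull (adjusted Wald) interval.

    Assumption: this stands in for the slides' "adapted Wald adjustment",
    which is not defined in the deck. z=1.96 corresponds to a 95% CI.
    """
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")
    # Add z^2 pseudo-observations, half of them counted as "anxious".
    n_adj = total_posts + z ** 2
    p_adj = (anxious_posts + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # A proportion cannot be negative, so clamp at zero.
    return max(0.0, p_adj - half_width)


# Roughly 107 posts per minute gives about 1,070 posts per 10-minute window.
# The count of 50 anxious posts is a made-up illustrative value.
lower = adjusted_wald_lower_bound(anxious_posts=50, total_posts=1070)
print(f"lower bound on anxious share: {lower:.4f}")
```

Using the lower bound rather than the raw proportion keeps sparse windows (few posts, a handful flagged as anxious) from producing spurious peaks.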
[Figure: Dow Jones daily close (axis 11.6K to 13.2K) above the percentage of anxious blog posts (axis 0 to 15%, threshold at μ + 6σ), Jan to Jun 2008, 11.5M posts. Legend: Dow Jones daily close, 7-day exponential moving average. Four spikes are annotated: Jan 26, Feb 24, Mar 25, May 16.]
Jan 26: Of three predictive spikes, this is the furthest from a local maximum: it appears 5 trading days later on Feb 1. The SC primary happens on this date. The Fed lowers rate 4 days before and 4 days after this date. Consumer confidence and poor business/housing reports follow in the next two days.
Feb 24: This spike comes three days before the second most critical maxima over this six month period. The Dow takes nearly two months to recover. After searching newspapers near this date, it is not clear what event may have caused this spike.
Mar 25: This spike is probably noise, although it does preface a steep decline. Detecting important blogs and topics may eliminate spikes like these. Conference Board’s consumer confidence came out this day and could be responsible for the spike.
May 16: This spike appears three days before the most important local maxima. As of June 24, the Dow has still not recovered from May 19, dropping nearly 10% to date. Michigan’s consumer sentiment index came out this day, along with unexpectedly poor housing numbers. May 19 followed with many poor business reports (2.5 s.d. anxiety spikes on May 19).
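The figure mentions a 7-day exponential moving average and a μ + 6σ threshold on the percentage of anxious posts. As a rough sketch of how those two pieces could be computed, assuming the mean and standard deviation are taken over the whole series (the slides do not say which baseline was used), and with both function names invented for illustration:

```python
from statistics import mean, pstdev


def exponential_moving_average(values, span=7):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(value)  # seed with the first observation
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def find_spikes(percent_anxious, num_sigmas=6.0):
    """Indices whose value exceeds mean + num_sigmas * standard deviation.

    Assumption: mu and sigma are computed over the entire series.
    """
    mu = mean(percent_anxious)
    sigma = pstdev(percent_anxious)
    threshold = mu + num_sigmas * sigma
    return [i for i, v in enumerate(percent_anxious) if v > threshold]


# Synthetic series: a flat 5% baseline with one 15% burst at index 40.
series = [5.0] * 100
series[40] = 15.0
print(find_spikes(series))
print(exponential_moving_average([0.0, 8.0], span=7))
```

One design point worth noting: because the spike itself inflates σ, a single outlier in n points can sit at most about √(n − 1) standard deviations above the mean, so a 6σ rule only makes sense over a long series such as the six months shown here.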
OUR BLOG COMMENT DATASET 5 blog genres 33 top blogs 1,094 blog comments
“Great post and I really like the video. This is extremely similar to the approach I use in writing almost anything …” ProBlogger
“Just wait until hackers exploit the print layer to this mesh stuff enough to grab root and start injecting python code …” Scobleizer
Proportions of agreement
neither: 49.4%
agree: 39.2%
disagree: 11.1%
Wald method, p < 0.05
Features for predicting AGREE/DISAGREE/NEITHER
LEXICAL: uni/bi/trigrams, TFIDF
POS: raw tags, combo lexical
SENTIMENT: congressional floor, rotten tomatoes, LIWC
SEMANTIC: sim to post, ESA
NAMED ENTITY: organizations, people
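Two of the features here compare a comment against the post it responds to (TFIDF and similarity to post). A minimal sketch of that idea follows, using only the standard library. The whitespace tokenizer, the smoothed IDF formula, and the example texts are all assumptions chosen for brevity, not the pipeline behind these slides.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map each document to a {term: tf-idf weight} dictionary.

    Assumptions: lowercase whitespace tokenization and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, so shared terms keep a nonzero weight.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical post and two hypothetical comments.
post = "great video about writing"
on_topic = "great post and video"
off_topic = "hackers exploit mesh"
post_vec, on_vec, off_vec = tfidf_vectors([post, on_topic, off_topic])
print(cosine_similarity(post_vec, on_vec) > cosine_similarity(post_vec, off_vec))
```

The unnormalized dot product of the same vectors would correspond to a "tf-idf dot product with post" style feature, while the cosine form removes the effect of comment length.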
Features + Info Gain

Feature                            Class        Info gain
LIWC pos. emotion words            agree        0.079
LIWC affect words                  agree        0.049
exclamations                       agree        0.043
adjectives                         agree        0.041
@                                  neither      0.041
ellipsis                           !disagree    0.038
great                              agree        0.035
is tech blog                       neither      0.034
cosine similarity to post          !disagree    0.034
great [noun]                       agree        0.03
personal pronouns                  !disagree    0.028
present tense verbs                neither      0.026
[prepos] [poss pronoun]            agree        0.026
tf-idf dot product with post       !neither     0.026
coordinating conjunctions          agree        0.026
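For reference, information gain for a single discrete feature can be computed as the drop in label entropy after splitting on that feature. The sketch below assumes the feature has already been discretized (for example, "contains an exclamation mark" as True/False); the toy labels are invented, and the slides do not state how continuous features were binned.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """H(labels) minus the weighted entropy of labels within each feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): does the comment contain an exclamation mark?
has_exclamation = [True, True, True, False, False, False]
labels = ["agree", "agree", "neither", "neither", "disagree", "neither"]
print(round(information_gain(has_exclamation, labels), 3))
```

A feature that splits the comments into perfectly pure groups earns the full label entropy as its gain, while a feature unrelated to the label earns zero, which is why the values in the table are small but still rank the features usefully.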