Using sentiment analysis for stock market prediction BIRGER KLEVE

Project Goals • Increase Machine Learning knowledge – Learning real world practice – Facing real world problems – Optimize algorithm parameters

Project Definition Hypothesis: There is a correlation between the sentiment of tweets from certain people and a stock's movement. System: 1. Find tweets mentioning stocks 2. Classify the sentiment of each tweet 3. Predict stock movement by processing stock data and tweet sentiment

Availability of Financial data on Twitter

Project Redefinition • Drop the financial aspect of the project and only focus on the sentiment of tweets

Sentiment Analysis • Keyword spotting – E.g. happy, sad, bored • Lexical affinity – Words have an affinity to a certain polarity, expressed as a probability • Statistical methods • Concept-level techniques – Semantic analysis of text Cambria, E. An Introduction to Concept-Level Sentiment Analysis. National University of Singapore

Pang & Lee • Thumbs up? 2002 • Movie reviews • Presence of unigrams + bigrams with negation Pang, B., Lee, L., Vaithyanathan, S. Thumbs up? Sentiment Classification using Machine Learning Techniques. Cornell University, IBM Almaden. 2002
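A minimal sketch of "presence" features over unigrams and bigrams, assuming the text has already been split into tokens. The function name `presence_features` is hypothetical; presence means a feature is either on or off, regardless of how often it occurs.

```python
def presence_features(tokens):
    """Return the set of unigrams and bigrams present in a token list.

    Each feature is recorded once, however many times it occurs.
    """
    unigrams = set(tokens)
    # Adjacent token pairs, joined with a space.
    bigrams = {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}
    return unigrams | bigrams


example = presence_features(["not", "good_NOT", "good_NOT"])
```

Using a set keeps only whether a feature occurs, which is the distinction between presence and frequency features.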

Social Media Features • Words entirely in caps • Elongated words like angryyyyy • Positive/negative emoticons • Number of hashtags • Frequency of different POS tags
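The surface features listed above could be counted roughly as follows. This is a sketch under assumptions: the emoticon sets are small hypothetical examples, tokens are split on whitespace only, and POS tag frequencies are left out because they need a tagger such as NLTK's.

```python
import re

# Hypothetical emoticon lists for illustration; not taken from the slides.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}

# A word character repeated three or more times in a row, e.g. "angryyyyy".
ELONGATED = re.compile(r"(\w)\1{2,}")


def social_media_features(tweet):
    """Count simple surface features of a whitespace-tokenized tweet."""
    tokens = tweet.split()
    return {
        # Purely alphabetic tokens of two or more letters, all upper case.
        "all_caps": sum(
            1 for t in tokens if len(t) > 1 and t.isalpha() and t.isupper()
        ),
        "elongated": sum(1 for t in tokens if ELONGATED.search(t)),
        "pos_emoticons": sum(1 for t in tokens if t in POSITIVE_EMOTICONS),
        "neg_emoticons": sum(1 for t in tokens if t in NEGATIVE_EMOTICONS),
        "hashtags": sum(1 for t in tokens if t.startswith("#") and len(t) > 1),
    }


features = social_media_features("I am SO angryyyyy :( #fail #monday")
```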

Sentiment lexicon • Look up each word in a sentiment lexicon. • Lexical affinity • Features used: – Highest score – Total score – Mean score
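As a rough illustration of the three lexicon features, the sketch below looks up each token in a lexicon and summarizes the scores found. The lexicon here is a toy dictionary with made-up scores, and returning zeros when no word matches is an assumption, not something the slides specify.

```python
# Toy lexicon with made-up scores, for illustration only.
TOY_LEXICON = {"good": 1.5, "great": 2.0, "bad": -1.5, "awful": -2.5}


def lexicon_features(tokens, lexicon):
    """Summarize the lexicon scores of the tokens found in the lexicon."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        # Assumption: a tweet with no lexicon words gets neutral features.
        return {"highest": 0.0, "total": 0.0, "mean": 0.0}
    return {
        "highest": max(scores),
        "total": sum(scores),
        "mean": sum(scores) / len(scores),
    }


lex = lexicon_features(["a", "great", "but", "bad", "day"], TOY_LEXICON)
```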

Tokenization and negation • Change usernames, URLs, hashtags etc. into normalized tokens • Tag certain words with negation. E.g. "This horse is not that bad" => "This horse is not that_NOT bad_NOT" "not quite as great" => "not quite_NOT as great" • Use the presence of each unigram as a feature
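One simple way to approximate this step is sketched below. It assumes a rule in the style of Pang & Lee: every token between a negation word and the next punctuation mark gets a `_NOT` suffix. That reproduces the first example above, but the second example suggests the actual system tagged only certain words (possibly chosen by POS tag), which this sketch does not attempt. The placeholder tokens `<USER>`, `<URL>` and `<HASHTAG>` and the negation word list are hypothetical.

```python
import re

# Hypothetical negation cues; a real list would be longer.
NEGATION_WORDS = {"not", "no", "never", "cannot"}
PUNCTUATION = {".", ",", ":", ";", "!", "?"}

# URLs first, then optional @/# prefixed words, then single punctuation marks.
TOKEN_PATTERN = re.compile(r"https?://\S+|www\.\S+|[@#]?\w+(?:'\w+)?|[.,:;!?]")


def normalize(token):
    """Map usernames, URLs and hashtags to placeholder tokens."""
    if token.startswith("@") and len(token) > 1:
        return "<USER>"
    if token.startswith(("http://", "https://", "www.")):
        return "<URL>"
    if token.startswith("#") and len(token) > 1:
        return "<HASHTAG>"
    return token


def tokenize_with_negation(text):
    """Tokenize, normalize, and tag tokens that follow a negation word."""
    result = []
    negated = False
    for token in TOKEN_PATTERN.findall(text):
        if token in PUNCTUATION:
            negated = False  # punctuation ends the negated context
            result.append(token)
        elif token.lower() in NEGATION_WORDS or token.lower().endswith("n't"):
            negated = True
            result.append(token)
        elif negated:
            result.append(normalize(token) + "_NOT")
        else:
            result.append(normalize(token))
    return result


tagged = tokenize_with_negation("This horse is not that bad")
```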

Classifier • SVM with Linear kernel • Parameters: C

Training • Tokenize and collect each unique word in the training data and save it as a vocabulary. • Fit SVM to the entire training set • Optimizing parameter C – 3-fold Cross Validation – Grid Search – Test the final classifier against a separate test set
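The training procedure can be sketched with scikit-learn as below. This is an illustration under assumptions, not the author's code: the corpus is a tiny made-up sample, the grid of C values is arbitrary, and the vectorizer sits inside the pipeline so the vocabulary is rebuilt for each cross-validation fold (the slides describe building it once from the training data).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Tiny made-up corpus; 1 = positive, 0 = negative.
train_texts = [
    "love this great phone", "what a wonderful day", "really happy with it",
    "best movie ever", "great service and nice staff", "so good and fun",
    "hate this awful phone", "what a terrible day", "really sad about it",
    "worst movie ever", "bad service and rude staff", "so boring and dull",
]
train_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
test_texts = ["great day", "awful movie"]
test_labels = [1, 0]

pipeline = Pipeline([
    # binary=True records the presence of each unigram, not its count.
    ("vectorizer", CountVectorizer(binary=True)),
    ("svm", LinearSVC()),
])

# Grid search over C with 3-fold cross-validation; with the default
# refit=True the best setting is refitted on the entire training set.
search = GridSearchCV(pipeline, {"svm__C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
search.fit(train_texts, train_labels)

best_c = search.best_params_["svm__C"]
# Final check against the separate test set.
test_accuracy = search.score(test_texts, test_labels)
vocabulary = search.best_estimator_.named_steps["vectorizer"].vocabulary_
```

Keeping the test set out of the grid search means the reported accuracy is not influenced by the choice of C.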

Data • Training set 1 600 000 automatically classified tweets – w/ Keyword search – 2 classes: Negative & Positive • Test set 357 manually classified tweets Go, A., Bhayani, R., & Huang, L. Twitter sentiment classification using distant supervision. Tech. rep., Stanford University, 2009. • Sentiment lexicons: – Lexical affinity Kiritchenko, S., Zhu, X., Mohammad, S. Sentiment Analysis of Short Informal Texts. Journal of Artificial Intelligence Research, 2014

Result

Result • Using 1.6% of the training data (25600 samples): – 54981 features – > 12 hours of optimizing » Did not finish (DNF) – 1 hour final training – Sparse features => enormous RAM allocation
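A back-of-the-envelope calculation shows why the storage format matters at this size. The slides do not say how the feature matrix was stored, so this is only an illustration: it assumes 8-byte floats for a dense matrix, and for the sparse (CSR-style) estimate it assumes a hypothetical average of 15 non-zero features per tweet with 4-byte indices.

```python
n_samples = 25_600
n_features = 54_981

# Dense storage: one 8-byte float per (sample, feature) cell (assumed dtype).
dense_bytes = n_samples * n_features * 8
dense_gib = dense_bytes / 2**30  # roughly 10.5 GiB

# Sparse CSR-style storage: an 8-byte value and a 4-byte column index per
# non-zero entry, plus one 4-byte row pointer per row (and one extra).
avg_nonzeros_per_tweet = 15  # hypothetical; not taken from the slides
nnz = n_samples * avg_nonzeros_per_tweet
sparse_bytes = nnz * (8 + 4) + (n_samples + 1) * 4
sparse_mib = sparse_bytes / 2**20  # roughly 4.5 MiB
```

Under these assumptions a dense copy of the matrix would need over ten gigabytes, while the sparse form needs a few megabytes, so any step that densifies the features would dominate memory use.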

Result • Human test: ~80% • Expected: close to 79% • My baseline: ~65% • My Improved: ~75% – Might be higher

Tools • Python's scikit-learn • NLTK – for POS tagging (as features and to negate context)

What I have learned • Pitfalls of data collection • Handling LARGE amounts of data • Using popular machine learning tools • (SVM, its kernels and their parameters)
