@ColditzJB #SBM2016

Use of Twitter to Assess Sentiment toward Waterpipe Tobacco Smoking

Jason B. Colditz, MEd
Maharsi Naidu, Class of 2018
Noah A. Smith, PhD
Joel Welling, PhD
Brian A. Primack, MD, PhD

Goals
• Summarize known harms related to waterpipe tobacco smoking (WTS)
• List ways in which Twitter trends are currently being used in public health and medicine
• Define “machine learning” and describe how it can be used to automate large-scale data classification
• Compare Western and Eastern hemispheres with regard to overall sentiment toward WTS

Background: WTS
• Waterpipe Tobacco Smoking (WTS)
– Hookah, Shisha, Narghile [nar ‧ ghee ‧ leh]
• Head / Bowl:
– Flavored tobacco mixture
– Charcoal to maintain heat
• Base:
– Filled with water or flavored liquid
– Smoke is cooled as it bubbles through
• Hose / Mouthpiece:
– Shared by smokers
– Typically not filtered

Background: WTS & Health
• Typical toxicants from tobacco combustion
– Additional toxicants from charcoal
– Carbon monoxide and second-hand smoke
– High volume of smoke
• Addictive potential
– From social to habitual use
– Transitioning to other tobacco products

Background: WTS Epidemiology
• Traditional and widely prevalent in Eastern global cultures
– Widespread public health concerns of addiction and preventable disease
• Novel and gaining popularity in Western global cultures
– Fun social activity / cultural immersion
– Seen as relatively harmless vs. “smoking”

Background: Twitter & Health
• Twitter for “Big Data”
– Used by nearly a third of young adults
– Access to large-scale data via Twitter’s Application Programming Interface (API)
• Twitter for public health infodemiology:
– Natural disaster relief
– Foodborne illness / communicable diseases
– E-cigarette sentiment & marketing

Background: Twitter Data
• Characteristics
– 140 characters includes text, links, and...
• Hashtags: #SBM2016 #DataScience
• Emoji
– Basic location metadata:

Metadata                     Prevalence     Accuracy
Geo-location                 ~1%            Calculated & exact
Time zone                    Common         Self-reported & broad
Location from user profile   Very common    Self-reported & aberrant

Background: Machine Learning
Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.
• Computers are adept at discovering patterns in large sets of data.
• Researchers can train computers to look for particularly useful patterns.

Methods: Data Collection
• Twitter stream for 48 weekend hours:
– From Friday, 11/14/2014, 17:00 GMT through Sunday, 11/16/2014, 16:59 GMT
• Filters:
– English language
– Search terms: hookah, hooka, shisha, sheesha, narghile
• Tweets: N = 43,155
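The collection filters on this slide can be sketched in a few lines. This is only an illustration of the three conditions (language, time window, search terms), not the authors' collection code: the tweet dictionary keys (`lang`, `created_at`, `text`) and the sample tweets are hypothetical, and simple substring matching is an assumption about how the terms were applied.

```python
from datetime import datetime, timezone

# Search terms and 48-hour window as listed on the slide.
SEARCH_TERMS = ("hookah", "hooka", "shisha", "sheesha", "narghile")
WINDOW_START = datetime(2014, 11, 14, 17, 0, tzinfo=timezone.utc)
# Exclusive upper bound; the slide gives the last minute as 16:59 GMT.
WINDOW_END = datetime(2014, 11, 16, 17, 0, tzinfo=timezone.utc)


def keep_tweet(tweet):
    """Return True if a tweet passes the language, time, and keyword filters.

    `tweet` is a hypothetical dict with keys "lang", "created_at", "text".
    """
    if tweet["lang"] != "en":
        return False
    if not (WINDOW_START <= tweet["created_at"] < WINDOW_END):
        return False
    text = tweet["text"].lower()
    # Assumed matching rule: case-insensitive substring search.
    return any(term in text for term in SEARCH_TERMS)


# Made-up sample tweets, for illustration only.
inside = datetime(2014, 11, 15, 2, 30, tzinfo=timezone.utc)
outside = datetime(2014, 11, 17, 9, 0, tzinfo=timezone.utc)
sample = [
    {"lang": "en", "created_at": inside, "text": "Hookah night with friends"},
    {"lang": "es", "created_at": inside, "text": "shisha esta noche"},
    {"lang": "en", "created_at": outside, "text": "new shisha lounge"},
    {"lang": "en", "created_at": inside, "text": "coffee time"},
]
kept = [t for t in sample if keep_tweet(t)]
```

Only the first sample tweet passes all three filters; the others fail on language, time window, and keyword respectively.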

Methods: Human Coding
• Random subset of 2,000 tweets
– Independently double-coded
• Coding: Relevant?
– No: False positive (marijuana, marketing, pop-culture)
– Yes: WTS sentiment (Positive? Negative?)

Methods: Machine Learning
• Supervised learning
– Natural Language Toolkit (NLTK) for Python
– Human coding as gold standard
– Trained Naïve Bayes classifiers for WTS sentiment
– Unigram parameters: individual words and emoji
• Testing model’s Accuracy, Precision, and Recall
• 3:1 training to testing ratio for sentiment classification:
– Coded as WTS-relevant: n = 1,345
– Training data: n = 1,008 (75%)
– Testing data: n = 337 (25%)
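As a rough illustration of the approach on this slide, the sketch below splits items 3:1 and trains a small multinomial Naive Bayes model on unigram tokens. It is a hand-rolled stand-in written with the standard library only, not the NLTK classifier the authors used; the whitespace tokenizer, the Laplace smoothing, the label names, and the toy tweets are all assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict


def unigrams(text):
    """Lowercase and split on whitespace; words and emoji both become tokens."""
    return text.lower().split()


def split_3_to_1(items, seed=0):
    """Shuffle, then split into 75% training and 25% testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.75)
    return items[:cut], items[cut:]


def train_naive_bayes(labeled):
    """Count documents per label and token occurrences per label."""
    doc_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled:
        doc_counts[label] += 1
        for tok in unigrams(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return doc_counts, token_counts, vocab


def classify(model, text):
    """Return the label with the highest log prior plus log likelihood."""
    doc_counts, token_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        n_tokens = sum(token_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for tok in unigrams(text):
            if tok in vocab:  # tokens never seen in training are skipped
                # Laplace (add-one) smoothed P(token | label)
                score += math.log(
                    (token_counts[label][tok] + 1) / (n_tokens + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# The slide's 1,345 relevant tweets split 3:1 give 1,008 and 337.
train_ids, test_ids = split_3_to_1(range(1345))

# Toy, made-up training tweets (not from the study's data).
toy_tweets = [
    ("hookah tonight so chill", "positive"),
    ("love this shisha lounge", "positive"),
    ("chill night hookah with friends", "positive"),
    ("hookah is worse than cigarettes", "not_positive"),
    ("ban shisha smoke everywhere", "not_positive"),
    ("hookah tar is bad", "not_positive"),
]
model = train_naive_bayes(toy_tweets)
```

With this toy model, `classify(model, "so chill tonight")` returns `"positive"` and `classify(model, "cigarettes ban")` returns `"not_positive"`. A separate binary classifier per sentiment (positive vs. not positive, negative vs. not negative) would match the way the results are reported.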

Results: Human Coding
• 655 (33%) tweets excluded
– Not WTS related
– Marketing or pop-culture references
• 1,345 tweets considered relevant:
– 54% positive sentiment (Cohen’s κ = 0.74; agreement = 87%)
– 21% negative sentiment (Cohen’s κ = 0.71; agreement = 92%)
• Disagreements manually adjudicated by coders to provide overall consensus
[Chart legend: Pos., Neg., Neutral]
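The two inter-rater statistics reported here, percent agreement and Cohen's κ, can be computed from a pair of coders' label lists as follows. This is a minimal sketch; the two label lists are made-up toy data, not the study's codes.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items on which both coders gave the same label."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    # Chance agreement from each coder's marginal label frequencies.
    p_expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: 1 = "positive", 0 = "not positive" (hypothetical codes).
coder_a = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
coder_b = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
agreement = percent_agreement(coder_a, coder_b)  # 8 of 10 items match
kappa = cohens_kappa(coder_a, coder_b)
```

In the toy data the coders agree on 80% of items, chance agreement is 50%, and κ works out to 0.6, which shows why κ is always lower than raw agreement when chance agreement is nonzero.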

Results: Machine Learning
• Positive sentiment:
– Precision: 71% * & 76% †
– Recall: 84% * & 60% †
– Overall accuracy: 73%
• Exemplar predictive features (* Is positive; † Is not positive):
13.9, 13.7, “starter” 7.6, 12.9, “cigarettes” 5.9, “chill” 5.5, “hit” 4.8, 4.9, “lounges” 3.4, 3.5
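For reference, the per-class precision and recall and the overall accuracy reported above all derive from one 2×2 confusion matrix. The source does not give that matrix, so the counts below are hypothetical: they were chosen only to sum to the 337 test tweets and to land near the slide's percentages, and should not be read as the study's actual results.

```python
# Hypothetical confusion-matrix counts for "positive" vs. "not positive".
tp = 152  # positive tweets classified as positive
fn = 29   # positive tweets classified as not positive
fp = 62   # not-positive tweets classified as positive
tn = 94   # not-positive tweets classified as not positive

total = tp + fn + fp + tn

# Metrics for the "is positive" class (* on the slide).
precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)

# Metrics for the "is not positive" class († on the slide).
precision_not = tn / (tn + fn)
recall_not = tn / (tn + fp)

accuracy = (tp + tn) / total

for name, value in [
    ("precision *", precision_pos),
    ("recall *", recall_pos),
    ("precision †", precision_not),
    ("recall †", recall_not),
    ("accuracy", accuracy),
]:
    print(f"{name}: {value:.0%}")
```

The pattern visible in these numbers (high recall but lower precision for the target class) means the classifier catches most positive tweets at the cost of mislabeling a share of the others as positive.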

Results: Machine Learning
• Negative sentiment:
– Precision: 41% * & 75% †
– Recall: 93% * & 60% †
– Overall accuracy: 70%
• Exemplar predictive features (* Is negative; † Is not negative):
23.1, “cigarettes” 6.7, “lads” 20.1, “shit” 6.4, “tonight” 18.6, “tar” 8.7, “ban” 6.9

Results: Hemispheres
• 66% of coded WTS tweets had time zone data (n = 890)

Hemisphere   n     Positive   Negative
Western      727   56%*       24%
Eastern      163   31%*       23%

* χ² = 32.0, p < .001
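The hemisphere comparison can be approximated with a Pearson chi-square test on a 2×2 table (hemisphere by positive vs. not positive). The sketch below assumes that is the test behind the asterisk, and it rebuilds the cell counts from the rounded percentages on the slide, so its statistic should land near the reported χ² = 32.0 rather than match it exactly.

```python
import math

# Counts reconstructed from rounded slide percentages (an approximation).
western_n, eastern_n = 727, 163
western_pos = round(western_n * 0.56)
eastern_pos = round(eastern_n * 0.31)

# Rows: Western, Eastern. Columns: positive, not positive.
table = [
    [western_pos, western_n - western_pos],
    [eastern_pos, eastern_n - eastern_pos],
]


def chi_square_2x2(observed):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


chi2 = chi_square_2x2(table)
# With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
```

Using only the standard library keeps the sketch self-contained; a statistics package would normally be used for tables with more than one degree of freedom.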

Limitations / Considerations
• Twitter data biases
– English language
– Timeframes
• Keyword search parameters
– Broad terms like “smoke” increase recall (sensitivity), but decrease precision (positive predictive value)
• Classifier sophistication
– Unigrams vs. n-grams (bigrams, trigrams, etc.)
• Human coding is time and labor intensive
– Crowdsourcing (e.g., Mechanical Turk)
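The unigram vs. n-gram point can be made concrete with a short helper. This is a generic sketch with a made-up example sentence; it shows how bigrams keep word pairs such as a negation together, which single-word features cannot do.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "hookah is not bad".split()  # made-up example text

unigram_features = ngrams(tokens, 1)
bigram_features = ngrams(tokens, 2)
trigram_features = ngrams(tokens, 3)
```

Here the unigram `("bad",)` looks negative on its own, while the bigram `("not", "bad")` preserves the negation. The trade-off is a much larger and sparser feature set, which matters with only about a thousand training tweets.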

Discussion
• Waterpipe tobacco smoking (WTS) has serious health risks and is gaining popularity in the US
• Twitter provides opportunities for researchers and public health advocates to tap into online discourse and assess sentiment toward health behaviors
• Machine learning methods allow for infodemiology: large-scale data categorization using geographic metadata, words, and symbols (e.g., emoji)
• Initial appraisal of our Twitter data indicated proportionately higher positive sentiment toward WTS in the western hemisphere
– This warrants further investigation

Thank You!
Jason B. Colditz, M.Ed.
jbc28@pitt.edu
@ColditzJB
Center for Research on Media, Technology, and Health
@CRMTH_Pitt
