CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 3: SENTIMENT ANALYSIS

CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 3: SENTIMENT ANALYSIS Spring 2019 Marion Neumann

SENTIMENT ANALYSIS
…discover people’s opinions, emotions, feelings about a subject, topic, product, or service from text
• Step 1: Get the data
• Step 2: Process text into features
• Step 3: Infer sentiment

SENTIMENT ANALYSIS
Recap: Data Science Workflow
scientific, social, or business problem → data problem → collect & understand data → clean & format data → use data to create solution
• scientific, social, or business problem: improve movie recommender, or gauging brand perception
• data problem: sentiment analysis
• collect & understand data: scrape web/twitter
• clean & format data: working with text data
• use data to create solution: rule-based predictor or machine learning classifier

SENTIMENT ANALYSIS WORKFLOW
• processing steps shown: Negation Handling, Stemming, Feature Extraction
→ rule-based prediction
→ machine learning classifier
[Figure: example text "bad ping pong excluded rio 2016" becomes "bad ping pong exclude rio 2016" after stemming]

RULE-BASED APPROACH
→ Lab 3
DSFS p25: Control Flow
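A rule-based predictor can be sketched with plain control flow: every word gets a fixed score of +1, -1, or 0 from hand-written lists, and the sign of the total decides the label. This is a minimal sketch, not the Lab 3 solution. The word lists, the function names `word_score` and `rule_based_sentiment`, and the "neutral" label for a total of 0 are all assumptions made for illustration.

```python
# Hypothetical word lists chosen for illustration; a real lexicon is far larger.
POSITIVE_WORDS = {"great", "friendly", "good", "love"}
NEGATIVE_WORDS = {"bad", "horrible", "noisy", "hate"}


def word_score(word):
    """Return the fixed score of a single word: +1, -1, or 0."""
    if word in POSITIVE_WORDS:
        return 1
    if word in NEGATIVE_WORDS:
        return -1
    return 0


def rule_based_sentiment(text):
    """Sum the per-word scores and map the total to a label."""
    words = text.lower().split()
    # Strip common punctuation so that "service!" matches "service".
    total = sum(word_score(w.strip(".,!?")) for w in words)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"  # assumption: a total of 0 is reported as neutral


print(rule_based_sentiment("Great flavor and friendly service!"))
print(rule_based_sentiment("The food was horrible."))
```

Because the scores are fixed by hand, nothing here is learned from data. That is the main contrast with the machine learning classifier.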

TEXT DATA
• Data representation → strings
• four kinds of string data
  1) categorical data
  2) free strings (that can be semantically mapped to categories)
  3) structured string data
  4) free-form text data
→ What makes text different?

TEXT DATA
…is Big Data!

MACHINE LEARNING APPROACH
• Classification

FEATURES FOR TEXT DATA
• bag of words → does word occur in document yes / no → binary feature
• word counts → how often does word occur? → count feature
• more advanced: n-grams, TF-IDF
Example review (highlighted words: location, great, friends, small):
"Same great flavor and friendly service as in the S 18th street location. This location is not as small but it's hard to talk to friends. Thankfully there is great outdoor seating to escape the noise. …"
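The two basic feature types can be sketched on the example review from this slide. This is a minimal sketch under assumptions: the regular-expression tokenizer, the helper names `tokenize`, `word_counts`, and `bag_of_words`, and the five-word vocabulary (the four highlighted words plus "horrible") are illustrative choices, not part of the lecture.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_counts(text, vocabulary):
    """Count feature: how often does each vocabulary word occur?"""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def bag_of_words(text, vocabulary):
    """Binary feature: does each vocabulary word occur at all (1) or not (0)?"""
    return [1 if c > 0 else 0 for c in word_counts(text, vocabulary)]


review = (
    "Same great flavor and friendly service as in the S 18th street location. "
    "This location is not as small but it's hard to talk to friends. "
    "Thankfully there is great outdoor seating to escape the noise."
)
# Hypothetical mini-vocabulary: the highlighted words plus one absent word.
vocabulary = ["location", "great", "friends", "small", "horrible"]

print(word_counts(review, vocabulary))   # count features
print(bag_of_words(review, vocabulary))  # binary features
```

Both functions return one number per vocabulary word, in vocabulary order, so every review maps to a vector of the same length.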

FEATURE REPRESENTATION
• bag-of-words and word counts represent each review as a vector of binary features or counts
• one entry per dictionary word (e.g. great, horrible); dictionary of D words (170.000 in the slide annotation)
• extremely sparse features: many zeros since most words do not appear in a review
PDSH p38: Arrays
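The sparsity of these vectors is easy to see with NumPy arrays. The sketch below builds a count matrix with one row per review and one column per dictionary word, then measures the fraction of zero entries. The eight-word vocabulary, the two reviews, and the helper name `count_matrix` are made up for illustration. A real dictionary is far larger, which makes the vectors even sparser.

```python
import numpy as np


def count_matrix(reviews, vocabulary):
    """Return an array X where X[i, j] counts vocabulary[j] in reviews[i]."""
    index = {word: j for j, word in enumerate(vocabulary)}
    X = np.zeros((len(reviews), len(vocabulary)), dtype=int)
    for i, review in enumerate(reviews):
        for token in review.lower().split():
            j = index.get(token.strip(".,!?"))  # None if the word is not in the dictionary
            if j is not None:
                X[i, j] += 1
    return X


# Hypothetical dictionary and reviews.
vocabulary = ["great", "horrible", "service", "noise",
              "small", "friendly", "location", "seating"]
reviews = ["Great service, great seating!", "Horrible noise."]

X = count_matrix(reviews, vocabulary)  # word-count features
X_binary = (X > 0).astype(int)         # bag-of-words features
sparsity = np.mean(X == 0)             # fraction of entries that are zero

print(X)
print(X_binary)
print(sparsity)
```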

WHAT IS A CLASSIFIER?
• Rule-based
  • list of positive and negative words results in fixed score (+1, -1, or 0) for each word
• Classifier
  • no fixed lists of positive/negative words
  • each word gets a weight parameter $w$ assigned
  • classifier = parameterized model of the relationship between input and output/label
  • e.g. label $= \operatorname{sign}(w \cdot x + b)$ using a linear relationship
  • $w \cdot x$ is referred to as the dot product, inner product, or scalar product
  • classifier learns the weights from labeled training data
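A linear classifier of this form can be sketched in a few lines of NumPy. The slides do not say which learning algorithm is used, so the training loop below swaps in the classic perceptron update as one simple way to learn weights from labeled data. The four-word vocabulary, the toy training set, the function names, and the choice to map a score of exactly 0 to +1 are all assumptions.

```python
import numpy as np


def predict(w, b, x):
    """label = sign(w . x + b); a score of exactly 0 is mapped to +1 (a choice)."""
    return 1 if np.dot(w, x) + b >= 0 else -1


def train_perceptron(X, y, epochs=20):
    """Learn weights w and offset b with the perceptron rule (assumed algorithm)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if predict(w, b, x_i) != y_i:
                # Misclassified: move the weights toward the correct label.
                w += y_i * x_i
                b += y_i
    return w, b


# Hypothetical bag-of-words features; columns: great, friendly, horrible, noisy.
X = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]])
y = np.array([1, 1, -1, -1])  # +1 = positive review, -1 = negative review

w, b = train_perceptron(X, y)
print(w, b)
print([predict(w, b, x) for x in X])
```

On this toy data the learned weights come out positive for "great" and "friendly" and negative for "horrible" and "noisy", so each word's score is learned from the labels rather than fixed by hand.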

CLASSIFIER
• output (sentiment) is a binary class
Is this new review positive or negative?

EVALUATION
• Which approach (rule-based or machine learning) performs better?
→ How can we measure this?
• Measures:
  • error rate (or misclassification rate) $= \frac{\#\,\text{misclassified test points}}{\#\,\text{test points}}$
  • average accuracy $(= 1 - \text{error rate})$
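Both measures can be computed directly from the true and predicted labels of the test points. This is a minimal sketch; the function names and the four example labels are made up for illustration.

```python
import numpy as np


def error_rate(y_true, y_pred):
    """Fraction of test points whose predicted label differs from the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean(y_true != y_pred)


def accuracy(y_true, y_pred):
    """Accuracy = 1 - error rate."""
    return 1.0 - error_rate(y_true, y_pred)


# Hypothetical test labels: +1 = positive, -1 = negative.
y_true = [1, -1, 1, -1]
y_pred = [1, 1, 1, -1]  # one of the four test points is misclassified

print(error_rate(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

Running the same two functions on the predictions of the rule-based predictor and of the learned classifier, using the same test points, gives a direct comparison of the two approaches.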

SUMMARY & READING
• Sentiment Analysis automatically identifies, extracts, and analyzes emotions in text data.
• Text data needs to be preprocessed to get features that can be used for prediction and learning.
• Linear classification is used to predict binary or categorical targets.
• DSFS
  • Ch4: Linear Algebra → Vectors (p49-53)
    Do not use the implementations introduced in this chapter → use NumPy Arrays! (PDSH p38)
  • Ch9: Getting Data (p105-108, p114-120)
  • Ch20: Natural Language Processing (p239-244)
