Opinion Mining in GATE
Horacio Saggion & Adam Funk

• Opinion mining is interested in the opinion that a particular piece of discourse expresses
  – Opinions are subjective statements reflecting people’s sentiments or perceptions of entities or events
• There are various problems associated with opinion mining
  – Identify whether a piece of text is opinionated or not (factual news vs. editorial)
  – Identify the entity expressing the opinion
  – Identify the polarity and degree of the opinion (in favour vs. against)
  – Identify the theme of the opinion (opinion about what?)

• Extract Factual Data with Information Extraction from Company Web Site
• Extract Opinions using Opinion Mining from Web Fora

• Combine information extraction from the company Web site with OM findings
  – Given a review, find the company's web pages and extract factual information from them, including products and services
  – Associate the opinion with the information found
• Use information extraction to identify positive/negative phrases and the “object” of the opinion
  – Positive: correctly packed bulb, a totally free service, a very efficient management, …
  – Negative: the same disappointing experience, unscrupulous double glazing sales, do not buy a sofa from DFS Poole or DFS anywhere, the utter inefficiency, …

[Figure labels: sentiment; opinion]

[Figure labels: positive opinions; negative opinions; negative opinion, but less evident]

• Because we have access to documents which already have an associated class, we see OM as a classification problem
  – we consider our data “opinionated”
• We are interested in:
  – differentiating between positive and negative opinion
    • “customer service is diabolical”
    • “I have always been impressed with this company”
  – recognising fine-grained evaluative texts (1-star to 5-star classification)
    • “one of the easiest companies to order with” (5 stars)
    • “STAY AWAY FROM THIS SUPPLIER!!!” (1 star)
• We use a supervised learning approach (Support Vector Machines) that uses linguistic features; the system decides which features are most valuable for classification
• We use precision, recall, and F-score to assess classification accuracy
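The three evaluation measures named in the last bullet can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the experiments; the function name and the toy label lists are assumptions made for the example.

```python
def precision_recall_f1(gold, predicted, positive_label):
    """Return (precision, recall, F1) with one class treated as positive."""
    pairs = list(zip(gold, predicted))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    # False positives: predicted positive but actually another class.
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    # False negatives: actually positive but predicted as another class.
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical thumbs up/down labels, invented for illustration only.
gold = ["down", "down", "down", "up", "up", "down"]
predicted = ["down", "up", "down", "up", "down", "down"]
p, r, f = precision_recall_f1(gold, predicted, positive_label="down")
print(p, r, f)
```

F-score here is the balanced F1, the harmonic mean of precision and recall; whether the reported figures use that weighting is not stated on the slide.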

• We have a customisable crawling process to collect all texts from Web fora
• 92 texts from a Web consumer forum
  – Each text contains a review about a particular company/service/product and a thumbs up/down
  – texts are short (one or two paragraphs)
  – 67% negative and 33% positive
• 600 texts from another Web forum containing reviews on companies or products
  – Each text is short and is associated with a 1 to 5 star review
  – * ~ 8%; ** ~ 2%; *** ~ 3%; **** ~ 20%; ***** ~ 67%
• Each document is analysed to separate the commentary/review from the rest of the document and to associate a class with each review
• After this, the documents are processed with GATE processing resources:
  – tokenisation; sentence identification; part-of-speech tagging; morphological analysis; named entity recognition; and sentence parsing
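The class proportions reported for the two corpora determine how well a classifier does when it ignores the text and always predicts the most frequent class. A rough sketch of that calculation, assuming a label list built to mirror the reported 67% / 33% split of the 92 thumbs up/down texts (the exact counts 62 and 30 are an assumption, since only percentages are given):

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Assumed counts approximating 67% negative / 33% positive over 92 texts.
labels = ["down"] * 62 + ["up"] * 30
label, accuracy = majority_baseline(labels)
print(label, round(accuracy, 2))
```

Any learned classifier should be compared against this figure, because on a skewed corpus a high raw accuracy may reflect nothing more than the class imbalance.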

• Support Vector Machines (SVMs) are very good algorithms for classification and have also been used in information extraction
• Learning in SVM is treated as a binary classification problem, and a multiclass problem is transformed into a set of n binary classification problems
• Given a set of training examples, each is represented as a vector in a feature space, and the SVM tries to find a hyperplane which separates positive from negative instances
• Given a new instance, the SVM identifies on which side of the hyperplane the instance lies and produces the classification accordingly
• The distance from the hyperplane to the positive and negative instances is the margin, and we use the SVM with uneven margins available in GATE
• In order to use them, we need to specify how instances are represented and decide on a number of parameters, usually adjusted experimentally over training data
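The classification step described above (which side of the hyperplane an instance lies on) and the reduction of a multiclass problem to n binary problems can be sketched as below. This is not GATE's implementation and does not model uneven margins or training: the weights are hypothetical values standing in for a trained model, and choosing the class with the highest score is one common way to combine the binary classifiers, assumed here for illustration.

```python
def decision_value(weights, bias, features):
    """Signed score w.x + b for a sparse feature dictionary."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items()) + bias


def classify_binary(weights, bias, features):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(weights, bias, features) >= 0 else -1


def one_vs_rest_labels(labels, target):
    """Relabel a multiclass problem as 'target' (+1) against all other classes (-1)."""
    return [1 if label == target else -1 for label in labels]


def classify_multiclass(models, features):
    """models maps each class to (weights, bias); pick the highest-scoring class."""
    return max(models, key=lambda c: decision_value(*models[c], features))


# Hypothetical weights, standing in for a trained binary model.
weights = {"excellent": 1.0, "avoid": -1.5}
bias = 0.1
negative = classify_binary(weights, bias, {"avoid": 1, "not": 1})
positive = classify_binary(weights, bias, {"excellent": 1})

# One binary problem per star rating: here, 5 stars against the rest.
five_star_targets = one_vs_rest_labels([1, 5, 3, 5], target=5)

# Hypothetical per-class models for two of the five classes.
models = {"1*": ({"avoid": 1.0}, -0.5), "5*": ({"excellent": 1.0}, -0.5)}
best = classify_multiclass(models, {"excellent": 1})
print(negative, positive, five_star_targets, best)
```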

• We decided to start by investigating a very simple approach: a word-based or bag-of-words approach (which usually works very well in text classification)
  – the original word
  – the root or lemma of the word (for “running” we use “run”)
  – the part-of-speech category of the word (determiner, noun, verb, etc.)
  – the orthography of the word (all uppercase, lowercase, etc.)
• Each sentence/text is represented as a vector of features and values
  – we tried different combinations of features (different n-grams)
  – 10-fold cross-validation experiments were run over the corpus with binary classifications (up/down)
  – the combination of root and orthography (unigram) provides the best classifier
    • around 80% F-score
  – use of higher n-grams decreases performance of the classifier
  – use of more features does not necessarily improve performance
  – an uninformed classifier would have 67% accuracy
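To make the root-plus-orthography representation concrete, here is a minimal sketch of how a text might be turned into a sparse vector of n-gram counts. The lemma table, the orthography category names, and the `root|orthography` feature format are all assumptions for illustration; in the described setup these values come from GATE's morphological analyser and tokeniser.

```python
from collections import Counter

# Hypothetical toy lemma table; a real pipeline would use a morphological analyser.
TOY_LEMMAS = {"running": "run", "impressed": "impress", "companies": "company"}


def orthography(token):
    """Coarse orthographic category of a token (category names are assumptions)."""
    if token.isupper():
        return "allCaps"
    if token[0].isupper() and token[1:].islower():
        return "upperInitial"
    if token.islower():
        return "lowercase"
    return "other"


def root_orth_features(tokens, n=1):
    """Count n-grams over 'root|orthography' units for one text."""
    units = [f"{TOY_LEMMAS.get(t.lower(), t.lower())}|{orthography(t)}" for t in tokens]
    ngrams = [" ".join(units[i:i + n]) for i in range(len(units) - n + 1)]
    return Counter(ngrams)


tokens = "STAY AWAY from this supplier".split()
unigrams = root_orth_features(tokens)        # the unigram setting
bigrams = root_orth_features(tokens, n=2)    # a higher n-gram setting
print(unigrams)
```

Keeping orthography alongside the root lets a classifier distinguish an emphatic "STAY" from a neutral "stay", which plain lowercased roots would merge.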

• Same learning system used to produce the 5-star classification over the fine-grained dataset
• Same feature combinations were studied:
  – 74% overall classification accuracy using word root only
  – other combinations degrade performance
  – 1* classification accuracy = 80%; 5* classification accuracy = 75%
  – 2* = 2%; 3* = 3%; 4* = 19%
  – 2*, 3*, 4* are difficult to classify because they either share vocabulary with the extreme cases or are vague

• word-based binary classification
  – thumbs-down: !, not, that, will, …
  – thumbs-up: excellent, good, www, com, site, …
• word-based fine-grained classification
  – 1*: worst, not, cancelled, avoid, …
  – 2*: shirt, ball, waited, …
  – 3*: another, didn’t, improve, fine, wrong, …
  – 4*: ok, test, wasn’t, but, however, …
  – 5*: very, excellent, future, experience, always, great, …
