

  1. CS 4501 Machine Learning for NLP. Text Classification (I): Logistic Regression. Yangfeng Ji, Department of Computer Science, University of Virginia

  2. Overview
     1. Problem Definition
     2. Bag-of-Words Representation
     3. Case Study: Sentiment Analysis
     4. Logistic Regression
     5. L2 Regularization
     6. Demo Code

  3. Problem Definition

  4. Case I: Sentiment Analysis [Pang et al., 2002]

  5. Case II: Topic Classification
     Example topics:
     ◮ Business
     ◮ Arts
     ◮ Technology
     ◮ Sports
     ◮ ...

  6-7. Classification
     ◮ Input: a text x
       ◮ Example: a product review on Amazon
     ◮ Output: y ∈ Y, where Y is the predefined category set (sample space)
       ◮ Example: Y = {Positive, Negative}
     The pipeline of text classification:
       Text → Numeric Vector x → Classifier → Category y
     Footnote: in this course, we use x for both a text and its representation, with no distinction.

  8-9. Probabilistic Formulation
     With the conditional probability P(Y | X), the prediction on Y for a given text X = x is
       ŷ = argmax_{y ∈ Y} P(Y = y | X = x)    (1)
     Or, for simplicity,
       ŷ = argmax_{y ∈ Y} P(y | x)    (2)
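To make the prediction rule in Eq. (2) concrete, here is a minimal sketch (not from the slides); the conditional probabilities are made-up numbers for one hypothetical text.

```python
# Minimal sketch of y_hat = argmax_y P(y | x) over a binary label set.
# The probabilities are made-up values for one hypothetical text x.
probs = {"Positive": 0.73, "Negative": 0.27}   # P(y | x) for each y in Y

# Predict the label with the highest conditional probability.
y_hat = max(probs, key=probs.get)
print(y_hat)   # Positive
```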

  10-14. Key Questions
     Recall
     ◮ the formulation defined in the previous slides
       ŷ = argmax_{y ∈ Y} P(Y = y | X = x)    (3)
     ◮ the pipeline of text classification
       Text → Numeric Vector x → Classifier → Category y
     Building a text classifier is about answering the following two questions:
     1. How to represent a text as x?
       ◮ Bag-of-words representation
     2. How to estimate P(y | x)?
       ◮ Logistic regression models
       ◮ Neural network classifiers

  15. Bag-of-Words Representation

  16-18. Bag-of-Words Representation
     Example texts:
       Text 1: I love coffee.
       Text 2: I don't like tea.
     Step I: convert each text into a collection of tokens
       Tokenized text 1: I love coffee
       Tokenized text 2: I don t like tea
     Step II: build a dictionary/vocabulary
       Vocabulary: { I, love, coffee, don, t, like, tea }
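As a rough illustration of Steps I and II (a simple regex tokenizer is assumed here; the slides do not specify which tokenizer is used):

```python
import re

texts = ["I love coffee.", "I don't like tea."]

# Step I: tokenize. The assumed regex keeps only alphabetic runs, so "don't"
# splits into "don" and "t", matching the tokenized texts on the slide.
tokenized = [re.findall(r"[A-Za-z]+", text) for text in texts]
# [['I', 'love', 'coffee'], ['I', 'don', 't', 'like', 'tea']]

# Step II: build the vocabulary in order of first appearance.
vocab = []
for tokens in tokenized:
    for token in tokens:
        if token not in vocab:
            vocab.append(token)
print(vocab)   # ['I', 'love', 'coffee', 'don', 't', 'like', 'tea']
```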

  19-20. Bag-of-Words Representations
     Step III: based on the vocab, convert each text into a numeric representation
       vocab:   I  love  coffee  don  t  like  tea
       x^(1) = [1   1     1       0   0   0     0]^T
       x^(2) = [1   0     0       1   1   1     1]^T
     The pipeline of text classification:
       Text → Numeric Vector x (bag-of-words representation) → Classifier → Category y
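Step III can be sketched as follows, reusing the hypothetical `tokenized` and `vocab` from the previous snippet (a real pipeline would more likely use something like scikit-learn's CountVectorizer):

```python
# Step III: map a tokenized text to a vector of word counts over the vocabulary.
def bag_of_words(tokens, vocab):
    vector = [0] * len(vocab)
    for token in tokens:
        if token in vocab:
            vector[vocab.index(token)] += 1
    return vector

x1 = bag_of_words(tokenized[0], vocab)   # [1, 1, 1, 0, 0, 0, 0]
x2 = bag_of_words(tokenized[1], vocab)   # [1, 0, 0, 1, 1, 1, 1]
```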

  21-22. Preprocessing for Building Vocab
     1. Convert all characters to lowercase
        UVa, UVA → uva
     2. Map low-frequency words to a special token unk
        Zipf's law: f(w_t) ∝ 1 / r_t  (a word's frequency is inversely proportional to its frequency rank, so most word types are rare)
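A minimal sketch of both preprocessing steps (the frequency cutoff of 2 is an arbitrary assumption; the actual demo code may use a different threshold):

```python
from collections import Counter

def preprocess(tokenized_texts, min_count=2):
    # Step 1: lowercase every token, so "UVa" and "UVA" both become "uva".
    lowered = [[token.lower() for token in tokens] for tokens in tokenized_texts]

    # Step 2: map low-frequency tokens to the special token "unk".
    # By Zipf's law most word types are rare, so this keeps the vocabulary small.
    counts = Counter(token for tokens in lowered for token in tokens)
    return [[token if counts[token] >= min_count else "unk" for token in tokens]
            for tokens in lowered]
```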

  23. Information Embedded in BoW Representations
     It is critical to keep in mind what information is preserved in bag-of-words representations:
     ◮ Keep:
       ◮ the words that appear in a text
     ◮ Lose:
       ◮ word order
       ◮ sentence boundaries
       ◮ paragraph boundaries
       ◮ ...
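A small illustration of the word-order loss (the sentence pair is my own example, not from the slides):

```python
from collections import Counter

# Two sentences with opposite meanings receive identical bag-of-words counts,
# showing that word order is not preserved.
print(Counter("man bites dog".split()) == Counter("dog bites man".split()))   # True
```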

  24. Case Study: Sentiment Analysis

  25-28. A Simple Predictor
     Consider the following toy example
       Tokenized text 1: I love coffee
       Tokenized text 2: I don t like tea
     With the vocabulary (I, love, coffee, don, t, like, tea):
       x^(1) = [1 1 1 0 0 0 0]^T
       w_Pos = [0 1 0 0 0 1 0]^T
       w_Neg = [0 0 0 1 0 0 0]^T
     The prediction of sentiment polarity can be formulated as
       w_Pos^T x = 1 > w_Neg^T x = 0    (4)
     Essentially, this way of making predictions amounts to counting the positive and negative words in the text.
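A short sketch of the counting predictor from the toy example, with the vectors copied from the slide:

```python
import numpy as np

# Vocabulary order: I, love, coffee, don, t, like, tea
x1    = np.array([1, 1, 1, 0, 0, 0, 0])   # "I love coffee"
w_pos = np.array([0, 1, 0, 0, 0, 1, 0])   # fires on the positive words "love", "like"
w_neg = np.array([0, 0, 0, 1, 0, 0, 0])   # fires on the negative word "don" (don't)

# Predict Positive when the positive-word count beats the negative-word count.
label = "Positive" if w_pos @ x1 > w_neg @ x1 else "Negative"
print(w_pos @ x1, w_neg @ x1, label)   # 1 0 Positive
```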

  29-31. Another Example
     The limitation of word counting:
       x^(2) = [1 0 0 1 1 1 1]^T
       w_Pos = [0 1 0 0 0 1 0]^T
       w_Neg = [0 0 0 1 0 0 0]^T
     Here w_Pos^T x^(2) = w_Neg^T x^(2) = 1, so counting cannot tell that text 2 ("I don t like tea") is negative.
     ◮ Different words should contribute differently, e.g., not vs. dislike
     ◮ Sentiment word lists are not complete
       Example II (Positive): "Din Tai Fung, every time I go eat at anyone of the locations around the King County area, I keep being reminded on why I have to keep coming back to this restaurant. ..."

  32. Logistic Regression

  33-34. Log-linear Models
     Directly model a linear classifier as
       h_y(x) = w_y^T x + b_y    (5)
     with
     ◮ x ∈ N^V: vector, bag-of-words representation
     ◮ w_y ∈ R^V: vector, classification weights associated with label y
     ◮ b_y ∈ R: scalar, bias of label y in the training set
     About Label Bias
       Consider a case where we have 90 positive examples and 10 negative examples in the training set. With b_Pos > b_Neg, a classifier can get 90% of its predictions correct without even resorting to the texts.
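To see the label-bias remark in numbers, here is a tiny sketch (the zero-weight setup and the bias values are illustrative assumptions, not from the slides):

```python
# With zero weights, the score h_y(x) = b_y depends only on the label bias, so a
# classifier with b_Pos > b_Neg predicts Positive for every text. On a training
# set of 90 positive and 10 negative examples that already yields 90% accuracy.
b_pos, b_neg = 1.0, 0.0
labels = ["Positive"] * 90 + ["Negative"] * 10
predictions = ["Positive" if b_pos > b_neg else "Negative" for _ in labels]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)   # 0.9
```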

  35-36. Logistic Regression
     Rewrite the linear decision function in the log-probabilistic form
       log P(y | x) ∝ w_y^T x + b_y  (the right-hand side is h_y(x) from Eq. 5)    (6)
     or, in the probabilistic form,
       P(y | x) ∝ exp(w_y^T x + b_y)    (7)
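To connect Eq. (7) to code, here is a minimal sketch that exponentiates the linear scores and normalizes them into probabilities (a softmax); the weights and biases are made-up numbers for the 7-word toy vocabulary. In practice the parameters would be learned, e.g. with scikit-learn's LogisticRegression, whose default penalty is the L2 regularization listed in the overview.

```python
import numpy as np

# P(y | x) ∝ exp(w_y^T x + b_y), normalized over the label set {Positive, Negative}.
x = np.array([1, 0, 0, 1, 1, 1, 1])                     # bag-of-words for "I don t like tea"
W = np.array([[0.1, 1.2, 0.3, -0.2, 0.0, 0.8, 0.1],     # w_Pos (made-up weights)
              [0.1, -0.9, 0.0, 1.1, 0.4, -0.1, 0.2]])   # w_Neg (made-up weights)
b = np.array([0.05, -0.05])                             # label biases b_Pos, b_Neg

scores = W @ x + b                          # h_y(x) for each label y
probs = np.exp(scores - scores.max())       # subtract the max for numerical stability
probs /= probs.sum()                        # normalize so the probabilities sum to 1
print(dict(zip(["Positive", "Negative"], probs.round(3))))   # Negative is more probable
```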
