Natural Language Processing
CSCI 4152/6509, Lecture 12: Classifier Evaluation
Instructor: Vlado Keselj
Time and date: 09:35–10:25, 31-Jan-2020
Location: Dunn 135
CSCI 4152/6509, Vlado Keselj Lecture 12 1 / 29

Previous Lecture
IR evaluation measures
◮ Precision, Recall, F-measure
Precision-recall curves, example
Other evaluation measures
Text classification
◮ Text classification as a text mining problem
◮ Types of text classification

Evaluation Measures for Text Classification
Contingency table (confusion matrix) and Accuracy
Example (classes A, B, and C):

                          Gold standard
                          A     B     C   Total
Model            A        5     1     1     7
classification   B        3    10     2    15
                 C        0     2    10    12
                 Total    8    13    13    34

Accuracy: percentage of correct classifications; in the example, Accuracy = (5 + 10 + 10) / 34 = 25/34 ≈ 0.7353 = 73.53%
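The accuracy calculation above can be sketched in a few lines of Python. The names `confusion` and `accuracy` are illustrative assumptions, not part of the lecture; rows are the model's classification and columns are the gold standard.

```python
# Confusion matrix from the example: rows = model classification,
# columns = gold standard, both in the order A, B, C.
confusion = [
    [5, 1, 1],
    [3, 10, 2],
    [0, 2, 10],
]


def accuracy(matrix):
    """Fraction of objects on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


print(f"{accuracy(confusion):.4f}")  # prints 0.7353 (= 25/34)
```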

Per class: Precision, Recall, and F-measure
For each class: Yes = in class, No = not in class

                 Yes is correct   No is correct
Yes assigned           a                b
No assigned            c                d

precision $P = \frac{a}{a+b}$, recall $R = \frac{a}{a+c}$, fallout $= \frac{b}{b+d}$
F-measure: $F = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}$
If $\beta = 1$ ⇒ Precision and Recall treated equally
macro-averaging (equal weight to each class) and micro-averaging (equal weight to each object)
(2 × 2 contingency tables vs. one large contingency table)
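As a minimal sketch, the four per-class measures translate directly into Python. The function names are assumptions chosen for illustration, and the code assumes that no denominator is zero. The sample values a = 5, b = 2, c = 3, d = 24 are the counts for class A in the confusion matrix example.

```python
def precision(a, b):
    """a = correctly assigned Yes, b = incorrectly assigned Yes."""
    return a / (a + b)


def recall(a, c):
    """a = correctly assigned Yes, c = incorrectly assigned No."""
    return a / (a + c)


def fallout(b, d):
    """b = incorrectly assigned Yes, d = correctly assigned No."""
    return b / (b + d)


def f_measure(p, r, beta=1.0):
    """F = (beta^2 + 1) P R / (beta^2 P + R); beta = 1 weighs P and R equally."""
    return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)


# Class A from the example: a = 5, b = 2, c = 3, d = 24
p, r = precision(5, 2), recall(5, 3)
print(p, r, fallout(2, 24), f_measure(p, r))
```

With $\beta = 1$ the F-measure reduces to the harmonic mean of precision and recall, $2PR / (P + R)$.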

Example: Classification Results

                     Gold standard
                     A1    A2    A3   Total
System     A1         5     1     1     7
response   A2         3    10     2    15
           A3         0     2    10    12
           Total      8    13    13    34

Or, we can create contingency tables for each class separately:

            Gold standard
            A1   not A1  Total
A1           5      2      7
not A1       3     24     27
Total        8     26     34

            Gold standard
            A2   not A2  Total
A2          10      5     15
not A2       3     16     19
Total       13     21     34

            Gold standard
            A3   not A3  Total
A3          10      2     12
not A3       3     19     22
Total       13     21     34

The overall accuracy can be calculated using the overall table: Accuracy = $\frac{5 + 10 + 10}{34}$
Per-class precisions are: $P_{A1} = \frac{5}{7}$, $P_{A2} = \frac{10}{15}$, $P_{A3} = \frac{10}{12}$
Per-class recalls are: $R_{A1} = \frac{5}{8}$, $R_{A2} = \frac{10}{13}$, $R_{A3} = \frac{10}{13}$
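The per-class 2 × 2 tables can be derived mechanically from the overall table. The following Python sketch shows one way to do it; `per_class_table` is a hypothetical helper name, and the matrix layout (rows = system response, columns = gold standard) is taken from the example.

```python
# Overall table: rows = system response, columns = gold standard (A1, A2, A3).
confusion = [
    [5, 1, 1],
    [3, 10, 2],
    [0, 2, 10],
]
labels = ["A1", "A2", "A3"]


def per_class_table(matrix, k):
    """Return (a, b, c, d) of the 2 x 2 contingency table for class index k."""
    total = sum(sum(row) for row in matrix)
    a = matrix[k][k]                        # assigned k, gold k
    b = sum(matrix[k]) - a                  # assigned k, gold not k
    c = sum(row[k] for row in matrix) - a   # not assigned k, gold k
    d = total - a - b - c                   # not assigned k, gold not k
    return a, b, c, d


for k, name in enumerate(labels):
    a, b, c, d = per_class_table(confusion, k)
    print(name, (a, b, c, d), f"P = {a}/{a + b}", f"R = {a}/{a + c}")
```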

Macro-averaged precision, recall, and F-measure are:
$P_{macro} = \frac{5/7 + 10/15 + 10/12}{3}$, $R_{macro} = \frac{5/8 + 10/13 + 10/13}{3}$
$F_{macro} = \frac{2 \cdot P_{macro} \cdot R_{macro}}{P_{macro} + R_{macro}}$
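A short Python sketch of macro-averaging, under the definition given here: $F_{macro}$ is computed from $P_{macro}$ and $R_{macro}$, not as an average of per-class F-measures. The dictionary `tables` and the function name `macro_average` are illustrative assumptions; the (a, b, c, d) counts come from the per-class contingency tables of the example.

```python
# (a, b, c, d) for each class, from the per-class contingency tables.
tables = {
    "A1": (5, 2, 3, 24),
    "A2": (10, 5, 3, 16),
    "A3": (10, 2, 3, 19),
}


def macro_average(tables):
    """Average per-class precision and recall with equal weight per class."""
    precisions = [a / (a + b) for a, b, c, d in tables.values()]
    recalls = [a / (a + c) for a, b, c, d in tables.values()]
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    f = 2 * p * r / (p + r)
    return p, r, f


p_macro, r_macro, f_macro = macro_average(tables)
print(f"P_macro={p_macro:.4f} R_macro={r_macro:.4f} F_macro={f_macro:.4f}")
```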

To calculate micro-averaged precision, recall, and F-measure, we calculate the cumulative per-class table:

            Gold standard
            A    not A   Total
A           25      9     34
not A        9     59     68
Total       34     68    102

and then we calculate the micro-averaged measures:
$P_{micro} = \frac{25}{34}$, $R_{micro} = \frac{25}{34}$, $F_{micro} = \frac{2 \cdot P_{micro} \cdot R_{micro}}{P_{micro} + R_{micro}} = \frac{25}{34}$
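Micro-averaging can be sketched in Python by summing the per-class tables cell by cell and then applying the usual formulas. The names `tables` and `micro_average` are assumptions for illustration; the counts are those of the example.

```python
# (a, b, c, d) for classes A1, A2, A3, from the per-class contingency tables.
tables = [
    (5, 2, 3, 24),
    (10, 5, 3, 16),
    (10, 2, 3, 19),
]


def micro_average(tables):
    """Sum the 2 x 2 tables, then compute precision, recall, and F-measure."""
    a, b, c, d = (sum(t[i] for t in tables) for i in range(4))
    p = a / (a + b)
    r = a / (a + c)
    f = 2 * p * r / (p + r)
    return (a, b, c, d), p, r, f


cumulative, p_micro, r_micro, f_micro = micro_average(tables)
print(cumulative)                  # (25, 9, 9, 59)
print(p_micro, r_micro, f_micro)   # all three equal 25/34
```

In single-label classification every wrong assignment counts once as a false positive (for the assigned class) and once as a false negative (for the gold class), which is why the cumulative b and c are equal and the three micro-averaged measures coincide with accuracy.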

Evaluation Methods for Classification
General issues in classification
◮ Underfitting and Overfitting
Example with polynomial-based function learning
◮ Underfitting and Overfitting

Evaluation Methods for Text Classifiers
Training Error
Train and Test
N-fold Cross-validation

Train and Test
Labeled data is divided into training and testing data.
Typically, training data size : testing data size = 9 : 1, sometimes 2 : 1.
[Diagram: training data → training → classifier; testing data and classifier → evaluation]
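A minimal Python sketch of a 9 : 1 split follows. The function name `train_test_split`, the shuffling step, and the fixed seed are assumptions made for illustration; the slide only states that the labeled data is divided in a given ratio.

```python
import random


def train_test_split(items, test_fraction=0.1, seed=0):
    """Shuffle a copy of the labeled items and cut off a test portion.

    test_fraction=0.1 gives the 9 : 1 ratio; 1/3 gives 2 : 1.
    Shuffling (an assumption here) avoids a split that follows file order.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


labeled = [(f"document {i}", "A" if i % 2 else "B") for i in range(100)]
training, testing = train_test_split(labeled)
print(len(training), len(testing))  # 90 10
```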

N-fold Cross-Validation
[Diagram: the labeled data is divided into n folds. Classifier 1 is trained on folds 1 to n−1 and evaluated on fold n; classifier 2 is trained on all folds except fold n−1 and evaluated on fold n−1; and so on, up to classifier n, which is trained on folds 2 to n and evaluated on fold 1.]
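The fold rotation can be sketched in Python as follows. `n_fold_splits` and `cross_validate` are hypothetical names, `train` and `evaluate` stand for whatever classifier and evaluation measure are in use, and averaging the n scores is an assumption about how the individual evaluations are combined.

```python
def n_fold_splits(items, n):
    """Yield (training, held_out) pairs, holding out each of the n folds once."""
    folds = [items[i::n] for i in range(n)]  # items must be a list
    for held_out in range(n):
        training = [x for j, fold in enumerate(folds) if j != held_out
                    for x in fold]
        yield training, folds[held_out]


def cross_validate(items, n, train, evaluate):
    """Train n classifiers and average their scores (averaging is assumed).

    train(training_items) returns a model;
    evaluate(model, test_items) returns a number such as accuracy.
    """
    scores = [evaluate(train(training), held_out)
              for training, held_out in n_fold_splits(items, n)]
    return sum(scores) / len(scores)


for training, held_out in n_fold_splits(list(range(10)), 5):
    print(held_out, training)
```

Every labeled item is used for evaluation exactly once and for training n − 1 times.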

Text Clustering
Text clustering is an interesting text mining task.
It is relevant to the course, and a clustering task can be a project topic.
Since it is covered in some other courses, we will not cover it in much detail here.
Some notes are provided for your information.

Similarity-based Text Classification
Aggregate training text for each class into a profile.
Aggregate testing text into another profile.
Classify according to profile similarity.
If a profile is a vector, we can use different similarity measures; e.g.,
◮ cosine similarity,
◮ Euclidean similarity, or
◮ some other type of vector similarity
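As a rough illustration of the profile-and-similarity scheme, the Python sketch below uses word-count vectors as profiles and cosine similarity as the measure. Both choices, the function names, and the toy training texts are assumptions made for this example and are not taken from the lecture.

```python
import math
from collections import Counter


def profile(text):
    """A profile as a sparse vector of lowercase word counts (an assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(p, q):
    dot = sum(p[k] * q[k] for k in p if k in q)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def classify(test_text, training_texts):
    """training_texts maps a class label to a list of training documents."""
    class_profiles = {label: profile(" ".join(texts))
                      for label, texts in training_texts.items()}
    test_profile = profile(test_text)
    return max(class_profiles,
               key=lambda label: cosine_similarity(test_profile,
                                                   class_profiles[label]))


# Hypothetical toy data
training_texts = {
    "sports": ["the team won the game", "a great goal"],
    "cooking": ["bake the cake", "stir the soup slowly"],
}
print(classify("the team scored a goal", training_texts))  # sports
```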

CNG Method for Text Classification
A simple method, initially used for authorship attribution
Authorship attribution problem:

CNG Method Overview
Method based on character n-grams
Language independent
Based on creating n-gram based author profiles
Similarity based (a type of kNN method, k Nearest Neighbours)
Similarity measure:
$$ \sum_{g \in D_1 \cup D_2} \left( \frac{f_1(g) - f_2(g)}{\frac{f_1(g) + f_2(g)}{2}} \right)^2 = \sum_{g \in D_1 \cup D_2} \left( \frac{2 \cdot (f_1(g) - f_2(g))}{f_1(g) + f_2(g)} \right)^2 \qquad (1) $$
where $f_i(g) = 0$ if $g \notin D_i$.

Example of Creating an Author Profile
Preparing character n-gram profile (n=3, L=5)
Text: "Marley was dead: to begin with. There is no doubt whatever about that..." (from Christmas Carol by Charles Dickens)
n=3: extract the character 3-grams, with a space shown as "_": Mar, arl, rle, ley, ey_, y_w, _wa, was, ...
Sort by frequency:

  n-gram   frequency
  _th      0.015
  ___      0.013
  the      0.013
  he_      0.011
  and      0.007
  _an      0.007
  nd_      0.007
  ed_      0.006
  ...

L=5: the profile keeps the L = 5 most frequent n-grams.
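The profile construction can be sketched in Python as below. The function names are hypothetical, and two details are assumptions: spaces are replaced by "_" without any other normalization, and frequencies are normalized by the total number of n-grams in the text.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """All overlapping character n-grams, with spaces written as '_'."""
    text = text.replace(" ", "_")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_profile(text, n=3, L=5):
    """Map the L most frequent n-grams to their normalized frequencies."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.most_common(L)}


print(char_ngrams("Marley was", 3))
# ['Mar', 'arl', 'rle', 'ley', 'ey_', 'y_w', '_wa', 'was']
print(build_profile("Marley was dead: to begin with.", n=3, L=5))
```

On a sentence this short nearly every 3-gram occurs once, so the top five are not informative; frequencies like those in the table only emerge from a long text.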

How to measure profile similarity?

CNG Distance Measure
Euclidean-style distance with relative differences, rather than absolute
Example: instead of using 0.88 − 0.80 = 0.08, we say it is about a 10% difference, which is the same for 0.088 and 0.080
To be symmetric, divide by the arithmetic average:
$$ d(f_1, f_2) = \sum_{n \in \mathrm{dom}(f_1) \cup \mathrm{dom}(f_2)} \left( \frac{f_1(n) - f_2(n)}{\frac{f_1(n) + f_2(n)}{2}} \right)^2 $$
$\mathrm{dom}(f_i)$ is the domain of function $f_i$, i.e., of the profile $i$
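A direct Python transcription of the distance, as a sketch: profiles are assumed to be dictionaries from n-gram to a positive frequency, and an n-gram missing from a profile counts as frequency 0. The name `cng_distance` is an assumption.

```python
def cng_distance(f1, f2):
    """Sum of squared relative differences over the union of both domains."""
    total = 0.0
    for gram in set(f1) | set(f2):
        x = f1.get(gram, 0.0)  # frequency 0 if the n-gram is not in profile 1
        y = f2.get(gram, 0.0)  # frequency 0 if the n-gram is not in profile 2
        total += ((x - y) / ((x + y) / 2)) ** 2
    return total


# The relative difference is the same at both scales:
print(cng_distance({"the": 0.88}, {"the": 0.80}))
print(cng_distance({"the": 0.088}, {"the": 0.080}))
```

An n-gram that occurs in only one of the two profiles contributes $(f / (f/2))^2 = 4$, the maximum for a single term, regardless of its frequency.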

Classification using CNG
Create a profile for each class using training text
◮ done by merging all texts in each class into one long document
◮ another option: centroid of profiles of individual documents
Create a profile for the test document
Assign a class to the document according to the closest class profile, as measured by the CNG distance
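The steps above can be put together in a small end-to-end Python sketch, using the first option (merging all texts of a class into one document). All names, the default L=500, the use of "_" for spaces, and the toy training data are assumptions for illustration only.

```python
from collections import Counter


def build_profile(text, n=3, L=500):
    """Top-L character n-grams with normalized frequencies."""
    text = text.replace(" ", "_")
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.most_common(L)}


def cng_distance(f1, f2):
    """Squared relative differences, summed over the union of both profiles."""
    total = 0.0
    for gram in set(f1) | set(f2):
        x, y = f1.get(gram, 0.0), f2.get(gram, 0.0)
        total += ((x - y) / ((x + y) / 2)) ** 2
    return total


def classify(document, training_texts, n=3, L=500):
    """training_texts maps a class label to a list of training documents."""
    class_profiles = {label: build_profile(" ".join(texts), n, L)
                      for label, texts in training_texts.items()}
    doc_profile = build_profile(document, n, L)
    return min(class_profiles,
               key=lambda label: cng_distance(doc_profile,
                                              class_profiles[label]))


# Hypothetical toy data
training_texts = {
    "english": ["the cat sat on the mat"],
    "latin": ["lorem ipsum dolor sit amet"],
}
print(classify("the cat sat", training_texts))  # english
```

Because every n-gram found in only one profile adds 4 to the distance, profiles of very different sizes are not directly comparable; using the same L for all profiles, built from enough text to fill them, keeps the comparison fair.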
