Natural Language Processing
CSCI 4152/6509 — Lecture 12: Classifier Evaluation
Instructor: Vlado Keselj
Time and date: 09:35–10:25, 31-Jan-2020
Location: Dunn 135
CSCI 4152/6509, Vlado Keselj Lecture 12 1 / 29
Previous Lecture
- IR evaluation measures
  ◮ Precision, Recall, F-measure
- Precision-recall curves, example
- Other evaluation measures
- Text classification
  ◮ Text classification as a text mining problem
  ◮ Types of text classification
Evaluation Measures for Text Classification
Contingency table (confusion matrix) and Accuracy
Example (classes A, B, and C):

                        Gold standard
                         A    B    C
  Model            A     5    1    1 |  7
  classification   B     3   10    2 | 15
                   C     0    2   10 | 12
                         8   13   13 | 34

Accuracy: percentage of correct classifications; in the example,
Accuracy = 25/34 ≈ 0.7353 = 73.53%
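As a minimal sketch, accuracy can be computed directly from the confusion matrix in the example above (the variable names are illustrative, not from the slides):

```python
# Confusion matrix from the example: rows = model classification,
# columns = gold standard, for classes A, B, C.
confusion = [
    [5, 1, 1],   # model says A
    [3, 10, 2],  # model says B
    [0, 2, 10],  # model says C
]

# Accuracy = correct classifications (the diagonal) / all classifications.
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(f"Accuracy = {correct}/{total} = {accuracy:.4f}")  # 25/34 ≈ 0.7353
```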
Per Class: Precision, Recall, and F-measure
For each class: Yes = in class, No = not in class

                  Yes is correct   No is correct
  Yes assigned         a                b
  No assigned          c                d

precision P = a/(a+b), recall R = a/(a+c), fallout = b/(b+d)
F-measure: F = ((β² + 1)·P·R) / (β²·P + R)
If β = 1, Precision and Recall are treated equally.
- macro-averaging (equal weight to each class) vs. micro-averaging (equal weight to each object)
- (per-class 2×2 contingency tables vs. one large contingency table)
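These per-class definitions translate directly into code; a hedged sketch (the function name `prf` is my own) using the 2×2 table cells a, b, c as defined above:

```python
def prf(a, b, c, beta=1.0):
    """Precision, recall, and F-measure from a per-class 2x2 table:
    a = Yes assigned & Yes correct, b = Yes assigned & No correct,
    c = No assigned & Yes correct (d is not needed for these measures)."""
    precision = a / (a + b)
    recall = a / (a + c)
    f = (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f

# Class A1 from the example: a = 5, b = 2, c = 3.
p, r, f = prf(5, 2, 3)  # P = 5/7, R = 5/8
```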
Example: Classification Results

                     Gold standard
                     A1   A2   A3
  System       A1     5    1    1 |  7
  response     A2     3   10    2 | 15
               A3     0    2   10 | 12
                      8   13   13 | 34

Or, we can create contingency tables for each class separately:

            Gold standard                  Gold standard
            A1   not A1                    A2   not A2
  A1         5        2 |  7      A2       10        5 | 15
  not A1     3       24 | 27      not A2    3       16 | 19
             8       26 | 34               13       21 | 34
            Gold standard
            A3   not A3
  A3        10        2 | 12
  not A3     3       19 | 22
            13       21 | 34

The overall accuracy can be calculated using the overall table:
Accuracy = (5 + 10 + 10) / 34
Per-class precisions are: P_A1 = 5/7, P_A2 = 10/15, P_A3 = 10/12
Per-class recalls are: R_A1 = 5/8, R_A2 = 10/13, R_A3 = 10/13
Macro-averaged precision, recall, and F-measure are:
P_macro = (5/7 + 10/15 + 10/12) / 3
R_macro = (5/8 + 10/13 + 10/13) / 3
F_macro = 2 · P_macro · R_macro / (P_macro + R_macro)
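A short sketch of the macro-averaging computation, using the per-class values from the example (every class contributes equally regardless of its size):

```python
# Per-class precisions and recalls from the three-class example.
precisions = [5/7, 10/15, 10/12]
recalls = [5/8, 10/13, 10/13]

# Macro-averaging: unweighted mean over classes.
p_macro = sum(precisions) / len(precisions)
r_macro = sum(recalls) / len(recalls)
f_macro = 2 * p_macro * r_macro / (p_macro + r_macro)
```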
To calculate micro-averaged precision, recall, and F-measure, we first calculate the cumulative per-class table:

            Gold standard
            A    not A
  A        25        9 |  34
  not A     9       59 |  68
           34       68 | 102

and then we calculate the micro-averaged measures:
P_micro = 25/34, R_micro = 25/34,
F_micro = 2 · P_micro · R_micro / (P_micro + R_micro) = 25/34
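Micro-averaging pools the per-class 2×2 tables into one cumulative table before computing the measures; a sketch with the (a, b, c) counts from the example (the tuple layout is my own convention):

```python
# (a, b, c) cells of the per-class 2x2 tables for A1, A2, A3:
# a = correctly assigned, b = wrongly assigned, c = wrongly not assigned.
tables = [
    (5, 2, 3),    # A1
    (10, 5, 3),   # A2
    (10, 2, 3),   # A3
]

# Sum cells across classes to get the cumulative table.
a = sum(t[0] for t in tables)  # 25
b = sum(t[1] for t in tables)  # 9
c = sum(t[2] for t in tables)  # 9

p_micro = a / (a + b)  # 25/34
r_micro = a / (a + c)  # 25/34
f_micro = 2 * p_micro * r_micro / (p_micro + r_micro)  # also 25/34
```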
Evaluation Methods for Classification
- General issues in classification
  ◮ Underfitting and overfitting
- Example with polynomial-based function learning
Evaluation Methods for Text Classifiers
- Training error
- Train and test
- N-fold cross-validation
Train and Test
- Labeled data is divided into training and testing data
- Typically training data size : testing data size = 9 : 1, sometimes 2 : 1
[Figure: the training data is used to train the classifier, which is then evaluated on the testing data]
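A minimal sketch of the train-and-test split on hypothetical labeled data, using the 9 : 1 ratio mentioned above (the function and seed are illustrative):

```python
import random

def train_test_split(items, test_fraction=0.1, seed=0):
    """Shuffle the labeled data and hold out a fraction for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]  # (training data, testing data)

train, test = train_test_split(range(100))  # 90 training, 10 testing items
```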
N-fold Cross-Validation
[Figure: the labeled data is split into n folds; in round i, fold i is held out for evaluation while the remaining n−1 folds are used to train classifier i, so every fold serves as the test set exactly once]
Text Clustering
- Text clustering is an interesting text mining task
- It is relevant to the course, and a clustering task can be a project topic
- Since it is covered in some other courses, we will not cover it in much detail here
- Some notes are provided for your information
Similarity-based Text Classification
- Aggregate the training text for each class into a profile
- Aggregate the testing text into another profile
- Classify according to profile similarity
- If a profile is a vector, we can use different similarity measures; e.g.,
  ◮ cosine similarity,
  ◮ Euclidean similarity, or
  ◮ some other type of vector similarity
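For vector profiles, cosine similarity is one of the measures listed above; a hedged sketch assuming profiles are dicts mapping features (e.g. words or n-grams) to weights:

```python
import math

def cosine_similarity(p1, p2):
    """Cosine of the angle between two sparse profile vectors."""
    common = set(p1) & set(p2)
    dot = sum(p1[k] * p2[k] for k in common)
    norm1 = math.sqrt(sum(v * v for v in p1.values()))
    norm2 = math.sqrt(sum(v * v for v in p2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```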
CNG Method for Text Classification
- A simple method, initially used for authorship attribution
- Authorship attribution problem: given a text of unknown authorship, determine its author
CNG Method Overview
- Method based on character n-grams
- Language independent
- Based on creating n-gram based author profiles
- Similarity based (a type of kNN method, k Nearest Neighbours)
- Similarity measure:

  Σ_{g ∈ D1 ∪ D2} ( (f1(g) − f2(g)) / ((f1(g) + f2(g)) / 2) )²
    = Σ_{g ∈ D1 ∪ D2} ( 2 (f1(g) − f2(g)) / (f1(g) + f2(g)) )²    (1)

  where f_i(g) = 0 if g ∉ D_i.
Example of Creating an Author Profile
Preparing a character n-gram profile (n=3, L=5)

Text: "Marley was dead: to begin with. There is no doubt whatever about that..."
(from A Christmas Carol by Charles Dickens)

Sliding a window of n=3 characters over the text produces the n-grams
(spaces shown as underscores):

  Mar, arl, rle, ley, ey_, y_w, _wa, was, ...

Sorting the n-grams by frequency gives:

  _th 0.015
  ___ 0.013
  the 0.013
  he_ 0.011
  and 0.007
  _an 0.007
  nd_ 0.007
  ed_ 0.006
  ...

and the profile keeps only the L=5 most frequent n-grams.
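The profile-building procedure above can be sketched as follows (a minimal version; the function name and the exact text normalization are my own assumptions):

```python
from collections import Counter

def cng_profile(text, n=3, L=5):
    """Character n-gram profile: the L most frequent n-grams with their
    relative frequencies."""
    text = text.replace(" ", "_")  # the slides show spaces as underscores
    ngrams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(ngrams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(L)}

profile = cng_profile("Marley was dead: to begin with.")
```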
How to measure profile similarity?
CNG Distance Measure
- A Euclidean-style distance using relative differences rather than absolute ones
- Example: instead of using 0.88 − 0.80 = 0.08, we say it is about a 10% difference, which is the same for 0.088 and 0.080
- To be symmetric, divide by the arithmetic average:

  d(f1, f2) = Σ_{n ∈ dom(f1) ∪ dom(f2)} ( (f1(n) − f2(n)) / ((f1(n) + f2(n)) / 2) )²

  where dom(f_i) is the domain of function f_i, i.e., of profile i
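The distance formula above, sketched in code assuming profiles are dicts of n-gram relative frequencies (an n-gram absent from a profile contributes frequency 0):

```python
def cng_distance(f1, f2):
    """CNG distance: squared relative differences summed over the union
    of the two profiles' n-grams."""
    total = 0.0
    for g in set(f1) | set(f2):
        v1, v2 = f1.get(g, 0.0), f2.get(g, 0.0)
        total += ((v1 - v2) / ((v1 + v2) / 2)) ** 2
    return total
```

Each term lies in [0, 4]: identical frequencies contribute 0, and an n-gram present in only one profile contributes the maximum of 4.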
Classification using CNG
- Create a profile for each class using the training text
  ◮ done by merging all texts in each class into one long document
  ◮ another option: centroid of the profiles of individual documents
- Create a profile for the test document
- Assign to the document the class whose profile is closest according to the CNG distance
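Putting the steps above together, a self-contained sketch of CNG classification (the helper functions are minimal versions of the profile and distance computations described in these slides; all names are my own):

```python
from collections import Counter

def cng_profile(text, n=3, L=1000):
    """The L most frequent character n-grams with relative frequencies."""
    text = text.replace(" ", "_")
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(L)}

def cng_distance(f1, f2):
    """Sum of squared relative frequency differences over the union."""
    return sum(((f1.get(g, 0.0) - f2.get(g, 0.0))
                / ((f1.get(g, 0.0) + f2.get(g, 0.0)) / 2)) ** 2
               for g in set(f1) | set(f2))

def classify(test_text, training_text_by_class, n=3, L=1000):
    """Assign the class whose merged-text profile is closest to the
    test document's profile under the CNG distance."""
    test_profile = cng_profile(test_text, n, L)
    return min(training_text_by_class,
               key=lambda cls: cng_distance(
                   cng_profile(training_text_by_class[cls], n, L),
                   test_profile))

label = classify("aaaa aaaa", {"A": "aaaa aaa aaaa", "B": "zzzz zzz zzzz"})
```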