VECTOR SPACE CLASSIFICATION
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze,
Introduction to Information Retrieval, Cambridge University Press. Chapter 14.
Wei Wei, wwei@idi.ntnu.no
Lecture series
Vector Space Classification 1 TDT4215
RECALL: Naïve Bayes classifiers
• Classify based on the prior weight of the class and the conditional parameter for what each word says:

    c_NB = argmax_{c_j ∈ C} [ log P(c_j) + Σ_{i ∈ positions} log P(x_i | c_j) ]

• Training is done by counting and dividing:

    P(c_j) = N_{c_j} / N        P(x_k | c_j) = T_{c_j x_k} / Σ_{x_i ∈ V} T_{c_j x_i}

• Don't forget to smooth.
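The counting-and-dividing training above, with add-one (Laplace) smoothing, can be sketched as follows. The toy training set and the function names are illustrative, not from the slides; document vectors are just token lists here.

```python
from collections import Counter
from math import log

# Hypothetical toy training set: (tokens, class) pairs.
train = [
    (["chinese", "beijing", "chinese"], "china"),
    (["chinese", "chinese", "shanghai"], "china"),
    (["chinese", "macao"], "china"),
    (["tokyo", "japan", "chinese"], "japan"),
]

vocab = {t for tokens, _ in train for t in tokens}
classes = {c for _, c in train}

# Prior: P(c_j) = N_{c_j} / N
n = len(train)
prior = {c: sum(1 for _, cls in train if cls == c) / n for c in classes}

# Conditional parameters, smoothed:
# P(x_k | c_j) = (T_{c_j x_k} + 1) / (Σ_i T_{c_j x_i} + |V|)
counts = {c: Counter() for c in classes}
for tokens, c in train:
    counts[c].update(tokens)
cond = {}
for c in classes:
    total = sum(counts[c].values())
    cond[c] = {t: (counts[c][t] + 1) / (total + len(vocab)) for t in vocab}

def classify(tokens):
    # argmax_{c_j} [ log P(c_j) + Σ_i log P(x_i | c_j) ];
    # tokens outside the vocabulary are simply ignored.
    return max(
        classes,
        key=lambda c: log(prior[c]) + sum(log(cond[c][t]) for t in tokens if t in vocab),
    )
```

With this data, `classify(["chinese", "chinese", "chinese", "tokyo", "japan"])` returns "china": the three occurrences of "chinese" outweigh the two Japan-specific terms.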
Vector space text classification
• Today:
  – Vector space methods for text classification
    • Rocchio classification
    • k Nearest Neighbors
  – Linear classifiers and non-linear classifiers
  – Classification with more than two classes
Vector space text classification
Vector space methods for Text Classification
VECTOR SPACE CLASSIFICATION
• Vector Space Representation
  – Each document is a vector, one component for each term (= word).
  – Normally normalize vectors to unit length.
  – High-dimensional vector space:
    • Terms are axes
    • 10,000+ dimensions, or even 100,000+
    • Docs are vectors in this space
  – How can we do classification in this space?
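Normalizing to unit length, as mentioned above, can be sketched in a few lines; `unit` is a hypothetical helper name. After normalization, the dot product of two document vectors equals their cosine similarity.

```python
import math

def unit(vec):
    # Scale a term-weight vector to unit (Euclidean) length.
    norm = math.sqrt(sum(w * w for w in vec))
    return [w / norm for w in vec] if norm > 0 else vec

v = unit([3.0, 4.0])  # -> [0.6, 0.8], which has length 1
```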
VECTOR SPACE CLASSIFICATION
• As before, the training set is a set of documents, each labeled with its class (e.g., topic)
• In vector space classification, this set corresponds to a labeled set of points (or, equivalently, vectors) in the vector space
• Hypothesis 1: Documents in the same class form a contiguous region of space
• Hypothesis 2: Documents from different classes don't overlap
• We define surfaces to delineate classes in the space
Documents in a vector space
[Figure: documents plotted in a vector space, with classes Government, Science and Arts]
Test document: which class?
[Figure: the same space with an unlabeled test document added]
Test document = government
[Figure: the test document falls in the Government region]
Our main topic today is how to find good separators
Vector space text classification
Rocchio text classification
Rocchio text classification
• Rocchio Text Classification
  – Use standard tf-idf weighted vectors to represent text documents
  – For training documents in each category, compute a prototype vector by summing the vectors of the training documents in the category
    • Prototype = centroid of members of class
  – Assign test documents to the category with the closest prototype vector, based on cosine similarity
DEFINITION OF CENTROID

    μ(c) = (1/|D_c|) Σ_{d ∈ D_c} v(d)

• where D_c is the set of all documents that belong to class c, and v(d) is the vector space representation of d.
• Note that the centroid will in general not be a unit vector, even when the inputs are unit vectors.
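The centroid definition and the Rocchio rule around it can be sketched as below. The function names and the tiny 2-dimensional vectors are illustrative assumptions; real inputs would be tf-idf vectors over the full vocabulary.

```python
import math

def cosine(u, v):
    # Cosine similarity; insensitive to vector length, which is why the
    # centroid need not be re-normalized before classification.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vecs):
    # mu(c) = (1/|D_c|) * sum_{d in D_c} v(d)
    # Note: generally NOT unit length, even if every input is.
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def train_rocchio(labeled):
    # labeled: {class: [document vectors]}; one prototype per class.
    return {c: centroid(vecs) for c, vecs in labeled.items()}

def rocchio_classify(prototypes, x):
    # Assign x to the class whose prototype is most cosine-similar.
    return max(prototypes, key=lambda c: cosine(prototypes[c], x))
```

For example, with `{"a": [[1.0, 0.0], [0.9, 0.1]], "b": [[0.0, 1.0], [0.1, 0.9]]}`, a test vector like `[0.8, 0.2]` lands with class "a".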
ROCCHIO TEXT CLASSIFICATION
[Figure: training points r1, r2, r3 (class r) and b1, b2 (class b); the test point t is assigned to class b]
ROCCHIO PROPERTIES
• Forms a simple generalization of the examples in each class (a prototype).
• The prototype vector does not need to be averaged or otherwise normalized for length, since cosine similarity is insensitive to vector length.
• Classification is based on similarity to class prototypes.
• Does not guarantee that classifications are consistent with the given training data.
ROCCHIO ANOMALY
• Prototype models have problems with polymorphic (disjunctive) categories.
[Figure: class r split into two clusters (r1, r2 and r3, r4) with class b points b1, b2 between them; the test point t, whose true class is r, falls nearer the b prototype]
Vector space text classification
k Nearest Neighbor Classification
K NEAREST NEIGHBOR CLASSIFICATION
• kNN = k Nearest Neighbor
• To classify document d into class c:
  – Define the k-neighborhood N as the k nearest neighbors of d
  – Count the number of documents i in N that belong to c
  – Estimate P(c|d) as i/k
  – Choose as class argmax_c P(c|d)  [= majority class]
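The steps above can be sketched as follows; the function names, the dot-product similarity, and the toy training data are illustrative assumptions (with unit-length vectors, the dot product equals cosine similarity).

```python
from collections import Counter

def dot(u, v):
    # Similarity stand-in: dot product (== cosine for unit vectors).
    return sum(a * b for a, b in zip(u, v))

def knn_classify(train, x, k, sim):
    # train: list of (vector, class) pairs; sim: similarity function.
    # 1. Take the k-neighborhood N: the k most similar training docs.
    neighbors = sorted(train, key=lambda vc: sim(vc[0], x), reverse=True)[:k]
    # 2./3. Count class members in N, i.e. estimate P(c|x) as i/k,
    # and return the majority class (argmax_c of that estimate).
    votes = Counter(c for _, c in neighbors)
    return votes.most_common(1)[0][0]
```

Usage: with `train = [([1.0, 0.0], "r"), ([0.9, 0.2], "r"), ([0.0, 1.0], "b"), ([0.2, 0.9], "b")]`, the call `knn_classify(train, [0.1, 1.0], 3, dot)` picks the majority class among the three nearest neighbors, here "b".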
K NEAREST NEIGHBOR CLASSIFICATION
• Unlike Rocchio, kNN classification determines the decision boundary locally.
• For 1NN (k=1), we assign each document to the class of its closest neighbor.
• For kNN, we assign each document to the majority class of its k closest neighbors. k here is a parameter.
• The rationale of kNN: the contiguity hypothesis.
  – We expect a test document d to have the same label as the training documents located nearby.
kNN: k=1
[Figure: kNN classification with k=1]
kNN: k = 1, 5, 10
[Figure: kNN classification with k = 1, 5 and 10]
kNN: weighted-sum voting
[Figure: kNN with neighbors voting by similarity weight rather than a flat vote]
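Weighted-sum voting replaces each neighbor's flat vote of 1 with its similarity score, so near neighbors count more than distant ones. A minimal sketch, with hypothetical names and a dot-product similarity as above:

```python
def dot(u, v):
    # Similarity stand-in: dot product (== cosine for unit vectors).
    return sum(a * b for a, b in zip(u, v))

def knn_weighted(train, x, k, sim):
    # Each of the k nearest neighbors votes with weight sim(neighbor, x);
    # the class with the largest summed weight wins.
    neighbors = sorted(train, key=lambda vc: sim(vc[0], x), reverse=True)[:k]
    scores = {}
    for vec, c in neighbors:
        scores[c] = scores.get(c, 0.0) + sim(vec, x)
    return max(scores, key=scores.get)
```

This can overturn a flat majority: with `train = [([1.0, 0.0], "r"), ([0.3, 0.1], "b"), ([0.2, 0.1], "b")]` and `x = [1.0, 0.0]`, k=3, the two "b" neighbors outnumber "r" but their summed similarity (0.5) loses to the single close "r" neighbor (1.0).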
K NEAREST NEIGHBOR CLASSIFICATION
[Figure: a test document in the space, with classes Government, Science and Arts]
K NEAREST NEIGHBOR CLASSIFICATION
• Learning is just storing the representations of the training examples in D.
• Testing instance x (under 1NN):
  – Compute the similarity between x and all examples in D.
  – Assign x the category of the most similar example in D.
• Does not explicitly compute a generalization or category prototypes.
• Also called:
  – Case-based learning
  – Memory-based learning
  – Lazy learning
• Rationale of kNN: the contiguity hypothesis