(B) Distance-based systems
Michael Biehl
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence
University of Groningen
www.cs.rug.nl/biehl
Introduction:
• supervised learning, classification, regression
• machine learning “vs.” statistical modeling
Early (important!) systems
• linear threshold classifier, Rosenblatt’s Perceptron
• adaptive linear neuron, Widrow and Hoff’s Adaline
From Perceptron to Support Vector Machine
• large margin classification
• beyond linear separability
Distance-based systems
• prototypes: K-means and Vector Quantization
• from K-Nearest-Neighbors to Learning Vector Quantization
• adaptive distance measures and relevance learning
overview
Basic concepts of similarity / distance based classification
• prototype based systems: Vector Quantization, K-means
• (K) Nearest Neighbor classifier
• Learning Vector Quantization (LVQ)
Distance measures and Relevance Learning
• predefined distances, e.g. divergence based LVQ
• adaptive distances, e.g. Matrix Relevance LVQ
Distance-based classification
distance-based classifiers
a simple distance-based system: the (K)NN classifier
• store a set of labeled examples
• classify a query according to the label of its Nearest Neighbor (or the majority vote among its K nearest neighbors)
• piece-wise linear decision boundaries according to (e.g.) Euclidean distance from all examples in the N-dim. feature space
+ conceptually simple / - expensive (storage, computation)
+ no training phase / - sensitive to mislabeled data
+ only one parameter (K) / - overly complex decision boundaries
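A minimal Python sketch of the (K)NN rule just described, using squared Euclidean distance; the function name and signature are illustrative, not from the slides, and NumPy arrays are assumed.

```python
import numpy as np

def knn_classify(X_train, y_train, query, K=3):
    """Classify a query by majority vote among its K nearest
    stored examples (squared Euclidean distance)."""
    d = np.sum((X_train - query) ** 2, axis=1)  # distance to every stored example
    nearest = np.argsort(d)[:K]                 # indices of the K nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]            # majority label (K=1: plain NN)
```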
prototype-based classification
Learning Vector Quantization [Kohonen]
• represent the data by one or several prototypes per class
• classify a query according to the label of the nearest prototype (or alternative schemes)
• local decision boundaries according to (e.g.) Euclidean distances in the N-dim. feature space
+ robust, low storage needs, little computational effort
+ parameterization in feature space, interpretability
- model selection: number of prototypes per class, etc.
- requires training: placement of prototypes in feature space
Nearest Prototype Classifier
set of prototypes carrying class labels: $\{(w^k, c^k)\}_{k=1}^{M}$ with $w^k \in \mathbb{R}^N$
nearest prototype classifier (NPC): based on a dissimilarity/distance measure $d(w, x)$
given a query $x$:
- determine the winner $w^* = \mathrm{argmin}_{w^k}\, d(w^k, x)$
- assign $x$ to the class $c^*$ of the winner
reasonable requirements: $d(w, x) \geq 0$ and $d(w, w) = 0$
most prominent example: (squared) Euclidean distance $d(w, x) = (w - x)^2 = \sum_{j=1}^{N} (w_j - x_j)^2$
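A minimal sketch of the NPC with the squared Euclidean distance; names are illustrative assumptions.

```python
import numpy as np

def npc_classify(prototypes, proto_labels, x):
    """Nearest prototype classifier: assign x the label of the
    prototype w* that minimizes d(w, x), here squared Euclidean."""
    d = np.sum((prototypes - x) ** 2, axis=1)  # d(w^k, x) for all prototypes
    return proto_labels[np.argmin(d)]          # class of the winner
```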
Learning Vector Quantization
N-dimensional data, feature vectors $x^\mu \in \mathbb{R}^N$
• identification of prototype vectors from labeled example data
• distance-based classification (e.g. Euclidean)
competitive learning: LVQ1 [Kohonen]
• initialize prototype vectors for the different classes
• present a single example
• identify the winner (closest prototype)
• move the winner
  - closer towards the data (same class)
  - away from the data (different class)
Learning Vector Quantization
N-dimensional data, feature vectors $x^\mu \in \mathbb{R}^N$
• identification of prototype vectors from labeled example data
• distance-based classification [here: Euclidean distances]
• tessellation of feature space [piece-wise linear]
• aim: discrimination of classes (≠ vector quantization or density estimation)
• generalization ability: correct classification of new data
LVQ1
iterative training procedure:
randomized initialization of the prototypes $w^k$, e.g. close to the class-conditional means
sequential presentation of labelled examples
... the winner takes it all:
LVQ1 update step: $w^* \leftarrow w^* \pm \eta\,(x - w^*)$ ($+$ if the labels of winner and example agree, $-$ otherwise), with learning rate $\eta$
many heuristic variants/modifications:
- learning rate schedules $\eta_w(t)$
- update more than one prototype per step
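A compact sketch of the LVQ1 training loop described above; the hyperparameter values and function name are illustrative assumptions.

```python
import numpy as np

def lvq1_train(X, y, prototypes, proto_labels, eta=0.05, epochs=30, seed=0):
    """LVQ1: sequential single-example presentation; the winner is moved
    towards the sample (same class) or away from it (different class)."""
    rng = np.random.default_rng(seed)
    W = prototypes.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X)):        # randomized sequential presentation
            d = np.sum((W - X[i]) ** 2, axis=1)  # squared Euclidean distances
            j = np.argmin(d)                     # the winner takes it all
            sign = 1.0 if proto_labels[j] == y[i] else -1.0
            W[j] += sign * eta * (X[i] - W[j])   # attract or repel the winner
    return W
```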
LVQ1
LVQ1 update step: $w^* \leftarrow w^* \pm \eta\,(x - w^*)$
LVQ1-like update for a generalized distance: $w^* \leftarrow w^* \mp \eta\, \frac{\partial d(w^*, x)}{\partial w^*}$
additional requirement: the update decreases (increases) the distance if the classes coincide (are different)
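As a quick consistency check (added here, not on the original slide): for the squared Euclidean distance the gradient-based rule recovers the familiar LVQ1 step, up to a factor of 2 that can be absorbed into the learning rate.

```latex
\frac{\partial d(w^*, x)}{\partial w^*}
  = \frac{\partial}{\partial w^*}\,(w^* - x)^2
  = 2\,(w^* - x)
\quad \Rightarrow \quad
w^* \;\leftarrow\; w^* \mp 2\eta\,(w^* - x)
  \;=\; w^* \pm 2\eta\,(x - w^*)
```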
remark: the curse of dimensionality?
concentration of distances for large N ???
„distance based methods are bound to fail in high dimensions“
LVQ:
- prototypes are not just random data points, but carefully selected low-noise representatives of the data
- distances of a given data point to the prototypes are compared
→ projection to a non-trivial low-dimensional subspace!
cost function based LVQ
one example: Generalized LVQ (GLVQ) cost function [Sato & Yamada, 1995]
minimize $E = \sum_m \varphi\big(e(x^m)\big)$ with $e(x^m) = \dfrac{d(w_J, x^m) - d(w_K, x^m)}{d(w_J, x^m) + d(w_K, x^m)}$
two winning prototypes: $w_J$ = closest prototype with the same label as $x^m$, $w_K$ = closest prototype with a different label
E favors
- a small number of misclassifications, e.g. with $\varphi(e) = \Theta(e)$ (the Heaviside step function)
- large margins between classes
- small $d_J$, large $d_K$: class-typical prototypes
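A minimal sketch that evaluates the GLVQ cost for a labelled data set; the sigmoidal choice $\varphi = \tanh$ is an illustrative assumption.

```python
import numpy as np

def glvq_cost(X, y, W, proto_labels, phi=np.tanh):
    """GLVQ cost E = sum_m phi(e(x^m)) with e = (dJ - dK)/(dJ + dK);
    e < 0 exactly when the sample is classified correctly."""
    E = 0.0
    for x, c in zip(X, y):
        d = np.sum((W - x) ** 2, axis=1)   # distances to all prototypes
        dJ = np.min(d[proto_labels == c])  # closest prototype, correct class
        dK = np.min(d[proto_labels != c])  # closest prototype, wrong class
        E += phi((dJ - dK) / (dJ + dK))
    return E
```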
GLVQ
training = optimization with respect to the prototype positions, e.g.:
single example presentation, stochastic sequence of examples, update of the two winning prototypes per step,
based on a non-negative, differentiable distance
the update moves the prototypes towards / away from the sample, with prefactors derived from the cost function
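A sketch of where those prefactors come from (my reconstruction via the chain rule, for the squared Euclidean distance; constant factors are absorbed into the learning rate $\eta$):

```latex
\frac{\partial e}{\partial d_J} = \frac{2\,d_K}{(d_J + d_K)^2},
\qquad
\frac{\partial e}{\partial d_K} = \frac{-2\,d_J}{(d_J + d_K)^2}
\\[4pt]
\Delta w_J \propto +\,\eta\,\varphi'(e)\,\frac{d_K}{(d_J + d_K)^2}\,(x - w_J),
\qquad
\Delta w_K \propto -\,\eta\,\varphi'(e)\,\frac{d_J}{(d_J + d_K)^2}\,(x - w_K)
```

So the correct-class prototype $w_J$ is attracted to the sample and the wrong-class prototype $w_K$ is repelled, with strengths modulated by the relative distances.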
Alternative distance measures
fixed, pre-defined distance measures:
heuristic LVQ1 and GLVQ (or more general cost function based LVQ) can be based on general, differentiable distances,
e.g. Minkowski measures $d_p(w, x) = \Big( \sum_j |w_j - x_j|^p \Big)^{1/p}$
possible work-flow:
- select several distance measures according to prior knowledge, or by a data-driven choice in a preprocessing step
- compare the performance of the various measures
examples: kernelized distances, divergences (statistics)
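The Minkowski measure above in one line of Python (illustrative sketch; NumPy arrays assumed):

```python
import numpy as np

def minkowski_distance(w, x, p=2):
    """Minkowski measure d_p(w, x) = (sum_j |w_j - x_j|^p)^(1/p);
    p=2 is the Euclidean, p=1 the Manhattan distance."""
    return np.sum(np.abs(w - x) ** p) ** (1.0 / p)
```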
Kernelized distances
rewrite the squared Euclidean distance in terms of dot products: $d(w, x) = w^2 - 2\,w \cdot x + x^2$
analogous: distance measure associated with a general inner product or kernel function
$d_K(w, x) = K(w, w) - 2\,K(w, x) + K(x, x)$
e.g. the Gaussian kernel $K(x, y) = \exp\big( -(x - y)^2 / \sigma^2 \big)$ with kernel width $\sigma$
implicit mapping to a high-dimensional space for better separability of classes; similar: Support Vector Machine
Biehl, Hammer, Villmann: Distance measures for prototype-based classification (2014); Prototype-based models in machine learning (2016)
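A direct transcription of the two formulas on this slide; the function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel K(x, y) = exp(-(x - y)^2 / sigma^2)."""
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

def kernel_distance(w, x, K=gaussian_kernel):
    """Kernelized distance d_K(w, x) = K(w,w) - 2 K(w,x) + K(x,x):
    the squared Euclidean distance in the implicit feature space."""
    return K(w, w) - 2.0 * K(w, x) + K(x, x)
```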
Relevance Learning
elegant approach: Relevance Learning / adaptive distances
- employ a parameterized distance measure, with only the mathematical form fixed in advance
- optimize its parameters in the training process
- adaptive, data-driven dissimilarity
example: Matrix Relevance LVQ
- data-driven optimization of prototypes and relevance matrix in the same training process (≠ pre-processing)
Generalized Matrix Relevance LVQ: GMLVQ [Schneider, Biehl, Hammer, 2009]
generalized quadratic distance in LVQ:
$d(w, x) = (w - x)^\top \Lambda\,(w - x) = \big[\Omega\,(w - x)\big]^2$ with $\Lambda = \Omega^\top \Omega$
training: adaptation of prototypes and distance measure, guided by the GLVQ cost function
variants: one global, several local, class-wise relevance matrices
diagonal matrices: single feature weights [Hammer et al., 2002]
rectangular $\Omega$: low-dim. representation / visualization [Bunte et al., 2012]
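A minimal sketch of the generalized quadratic distance, written in the $\Omega$-parameterization so that non-negativity holds by construction:

```python
import numpy as np

def gmlvq_distance(w, x, Omega):
    """Generalized quadratic distance d(w, x) = [Omega (w - x)]^2
    = (w - x)^T Lambda (w - x), Lambda = Omega^T Omega, which
    guarantees d >= 0 for any real matrix Omega."""
    z = Omega @ (w - x)
    return float(z @ z)
```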
interpretation
after training:
prototypes represent typical class properties or subtypes
Relevance Matrix: $\Lambda_{ij}$ quantifies the contribution of the pair of features $(i, j)$ to the distance
the diagonal element $\Lambda_{ii}$ summarizes
• the contribution of the single dimension $i$
• the relevance of original feature $i$ in the classifier
Note: this interpretation implicitly assumes that the features are of equal order of magnitude, e.g. after a z-score transformation $x_j \to (x_j - \langle x_j \rangle)/\sigma_{x_j}$ (averages over the data set)
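A short sketch of the two practical steps mentioned here, z-scoring the features and reading off per-feature relevances (illustrative helper names, not from the slides):

```python
import numpy as np

def zscore(X):
    """z-score transform: zero mean, unit variance per feature, so that
    the diagonal relevances Lambda_ii become comparable across features."""
    return (X - X.mean(axis=0)) / X.std(axis=0)  # assumes non-constant features

def feature_relevances(Omega):
    """Per-feature relevances: the diagonal of Lambda = Omega^T Omega."""
    return np.diag(Omega.T @ Omega)
```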