Non-metric Methods
• We have focused on real-valued feature vectors, or discrete-valued numbers, with a natural measure of distance between vectors (a metric)
• Some classification problems describe a pattern by a list of attributes: a fruit may be described by the 4-tuple (red, shiny, sweet, small)
• How can we learn categories from non-metric data, where the distance between attributes cannot be measured?
• Decision tree, a.k.a. hierarchical classifier, multi-stage classification, rule-based methods
Data Type and Scale
• Data type: degree of quantization in the data
  – Binary feature: two values (yes-no response)
  – Discrete feature: small number of values (e.g., image gray values)
  – Continuous feature: real value in a fixed range
• Data scale: relative significance of the numbers
  – Qualitative scales
    • Nominal (categorical): numerical values are simply used as names; e.g., a (yes, no) response can be coded as (0, 1), (1, 0), or (50, 100)
    • Ordinal: numbers have meaning only in relation to one another (e.g., one value is larger than the other); e.g., the scales (1, 2, 3) and (10, 20, 30) are equivalent
  – Quantitative scales
    • Interval: the separation between values has meaning; equal differences on the scale represent equal differences in temperature, but a temperature of 30 degrees is not twice as warm as one of 15 degrees
    • Ratio: an absolute zero exists along with a unit of measurement, so the ratio between two numbers has meaning (e.g., height)
Properties of a Metric
• A metric D(·,·) is merely a function that gives a generalized scalar distance between its two argument patterns
• A metric must have four properties. For all vectors a, b, and c:
  – Non-negativity: D(a, b) >= 0
  – Reflexivity: D(a, b) = 0 if and only if a = b
  – Symmetry: D(a, b) = D(b, a)
  – Triangle inequality: D(a, b) + D(b, c) >= D(a, c)
• It is easy to verify that the Euclidean formula for distance in d dimensions possesses the properties of a metric:
  D(a, b) = \left( \sum_{k=1}^{d} (a_k - b_k)^2 \right)^{1/2}
General Class of Metrics
• Minkowski metric:
  L_k(a, b) = \left( \sum_{i=1}^{d} |a_i - b_i|^k \right)^{1/k}
• Manhattan distance (k = 1):
  L_1(a, b) = \sum_{i=1}^{d} |a_i - b_i|
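A minimal NumPy sketch of these metrics; the function name and the sample vectors are illustrative, not from the text:

```python
import numpy as np

def minkowski(a, b, k):
    """L_k(a, b) = (sum_i |a_i - b_i|^k)^(1/k)."""
    return np.sum(np.abs(a - b) ** k) ** (1.0 / k)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

print(minkowski(a, b, 1))  # Manhattan (city-block) distance L_1 = 5.0
print(minkowski(a, b, 2))  # Euclidean distance L_2 ~= 3.61
```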
Scaling the Data
• Although one can always compute the Euclidean distance between two vectors, the result may or may not be meaningful
• If the space is transformed by multiplying each coordinate by an arbitrary constant, Euclidean distances in the transformed space differ from the original distance relationships; such scale changes can have a major impact on nearest-neighbor (NN) classifiers, as sketched below
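A small sketch of this effect (the points and the scale factor are made up for illustration): rescaling one coordinate changes which candidate is the nearest neighbor.

```python
import numpy as np

x = np.array([0.0, 0.0])             # query point
p = np.array([1.0, 3.0])             # candidate neighbor 1
q = np.array([2.0, 0.5])             # candidate neighbor 2

dist = lambda u, v: np.linalg.norm(u - v)
print(dist(x, p), dist(x, q))        # 3.16 vs 2.06 -> q is the nearest neighbor

scale = np.array([1.0, 0.1])         # multiply the second coordinate by 0.1
print(dist(x * scale, p * scale),    # 1.04 ...
      dist(x * scale, q * scale))    # ... vs 2.00 -> now p is the nearest neighbor
```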
Decision Trees (Sections 8.1-8.4)
• Non-metric methods
• CART (Classification and Regression Trees)
• Number of splits
• Query selection and node impurity
• Multiway splits
• When to stop splitting?
• Pruning
• Assignment of leaf node labels
• Feature choice
• Multivariate decision trees
• Missing attributes
Decision Tree
• Seven-class, four-feature classification problem
• Apple = (green AND medium) OR (red AND medium) = (medium AND NOT yellow)
Advantages of Decision Trees
• A single-stage classifier assigns a test pattern X to one of C classes in a single step
• Limitations of a single-stage classifier:
  – A common feature set is used for distinguishing all C classes; it may not be the best for specific pairs of classes
  – Requires a large number of features when the number of classes is large
  – Does not perform well when classes are multimodal
  – Nominal data are not easy to handle
• Advantages of decision trees:
  – Classify patterns by a sequence of questions (as in the 20-questions game); the next question depends on the previous answer
  – Interpretability; rapid classification; high accuracy and speed
How to Grow a Tree?
• Given a set D of labeled training samples and a feature set
• How should the tests be organized into a tree? Each test or question involves a single feature or a subset of features
• A decision tree progressively splits the training set into smaller and smaller subsets
• Pure node: all the samples at that node have the same class label; there is no need to further split a pure node
• Recursive tree-growing: given the data at a node, either declare the node a leaf or find another feature with which to split it (see the sketch below)
• CART (Classification and Regression Trees)
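A high-level sketch of the recursive growing procedure, under stated assumptions: the `Node` structure and helper names below are illustrative, not CART's actual implementation, and the greedy `best_split` routine is passed in (one possibility is sketched after the impurity slides).

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None      # feature index tested at this node
    threshold: Optional[float] = None  # split value for "x[feature] <= threshold?"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None        # class label if this node is a leaf

def is_pure(y):
    return len(np.unique(y)) == 1

def majority_label(y):
    labels, counts = np.unique(y, return_counts=True)
    return labels[np.argmax(counts)]

def grow_tree(X, y, best_split, min_samples=5):
    """Recursively split the training data: declare a leaf when the node is pure
    (or too small), otherwise choose a greedy query and recurse on the two subsets."""
    if is_pure(y) or len(y) < min_samples:
        return Node(label=majority_label(y))
    feature, threshold = best_split(X, y)
    go_left = X[:, feature] <= threshold
    return Node(feature, threshold,
                left=grow_tree(X[go_left], y[go_left], best_split, min_samples),
                right=grow_tree(X[~go_left], y[~go_left], best_split, min_samples))
```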
Classification and Regression Trees (CART)
• Six design issues:
  – Binary or multivalued attributes (answers to questions)? How many splits at a node?
  – Which feature or feature combination should be tested at a node?
  – When should a node be declared a leaf node?
  – If the tree becomes "too large", can it be pruned?
  – If a leaf node is impure, how should it be assigned a category?
  – How should missing data be handled?
Number of Splits
• Binary tree: every decision can be represented using only binary outcomes; the tree of Fig. 8.1 can be equivalently written as a binary tree
Query Selection and Node Impurity
• Which attribute test or query should be performed at each node?
• Seek a query T at node N so that the descendent nodes are as pure as possible
• A query of the form x_i <= x_is leads to axis-parallel hyperplane decision boundaries (monothetic tree: one feature per node)
Query Selection and Node Impurity
• P(ω_j): fraction of the patterns at node N that are in category ω_j
• Node impurity is 0 when all patterns at a node are from the same category
• Impurity is maximum when all classes at node N are equally likely
• Entropy impurity is the most popular
• Gini impurity (Fig. 8.4):
  i(N) = \sum_{i \ne j} P(\omega_i) P(\omega_j) = \frac{1}{2}\left[ 1 - \sum_j P^2(\omega_j) \right]    (3)
• Misclassification impurity (all three measures are sketched in code below)
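The three impurity measures as a short sketch; the function names are illustrative, and the Gini form follows Eq. (3) above.

```python
import numpy as np

def class_probs(y):
    """P(w_j): fraction of the patterns at the node that belong to each class."""
    _, counts = np.unique(y, return_counts=True)
    return counts / counts.sum()

def entropy_impurity(y):
    p = class_probs(y)
    return -np.sum(p * np.log2(p))

def gini_impurity(y):                  # Eq. (3): 1/2 * [1 - sum_j P^2(w_j)]
    p = class_probs(y)
    return 0.5 * (1.0 - np.sum(p ** 2))

def misclassification_impurity(y):
    return 1.0 - np.max(class_probs(y))

print(entropy_impurity(np.array([0, 0, 1, 1])))   # equally likely classes -> maximal (1.0)
print(gini_impurity(np.array([0, 0, 0, 0])))      # pure node -> 0.0
```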
Query Selection and Node Impurity
• Given a partial tree down to node N, what query should be chosen?
• Choose the query at node N that decreases the impurity as much as possible
• The drop in impurity is defined as
  \Delta i(N) = i(N) - P_L \, i(N_L) - (1 - P_L) \, i(N_R)    (5)
  where P_L is the fraction of patterns going to the left descendent node N_L
• The best query value s for test T is the value that maximizes the drop in impurity (see the sketch below)
• The optimization in Eq. (5) is "greedy": it is carried out at a single node, so there is no guarantee of a global optimum of impurity
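A sketch of the greedy query selection, assuming an impurity function such as `gini_impurity` from the previous sketch is supplied; candidate thresholds are simply the observed feature values.

```python
import numpy as np

def impurity_drop(y, y_left, y_right, impurity):
    """Eq. (5): delta_i(N) = i(N) - P_L * i(N_L) - (1 - P_L) * i(N_R)."""
    p_left = len(y_left) / len(y)
    return impurity(y) - p_left * impurity(y_left) - (1 - p_left) * impurity(y_right)

def best_split(X, y, impurity):
    """For every feature and every candidate threshold, keep the split
    with the largest drop in impurity (a purely local, greedy choice)."""
    best_feature, best_threshold, best_drop = None, None, -np.inf
    for f in range(X.shape[1]):
        for s in np.unique(X[:, f])[:-1]:         # thresholds at observed values
            left = X[:, f] <= s
            drop = impurity_drop(y, y[left], y[~left], impurity)
            if drop > best_drop:
                best_feature, best_threshold, best_drop = f, s, drop
    return best_feature, best_threshold
```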
When to Stop Splitting?
• If the tree is grown until each leaf node has the lowest impurity, it overfits; in the limit, each leaf node will hold a single pattern!
• If splitting is stopped too early, the training set error will be high
• Validation and cross-validation:
  – Continue splitting until the error on the validation set is minimized
  – Cross-validation relies on several independently chosen subsets
• Stop splitting when the best candidate split at a node reduces the impurity by less than a preset amount (threshold)
• How to set the threshold? Stop when a node holds a small number of points, or some fixed percentage of the total training set (say 5%)
• Trade-off between tree complexity (size) and test set accuracy (an example with library hyperparameters follows)
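In practice these stopping rules correspond to standard hyperparameters; a hedged example with scikit-learn (assuming it is available), where `min_impurity_decrease` plays the role of the impurity-drop threshold and `min_samples_leaf` the minimum node size:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Stop splitting when the best split drops the impurity by less than a threshold,
# or when a leaf would hold fewer than ~5% of the training set.
tree = DecisionTreeClassifier(criterion="entropy",
                              min_impurity_decrease=0.01,
                              min_samples_leaf=max(1, len(X_train) // 20))
tree.fit(X_train, y_train)
print(tree.get_depth(), tree.score(X_val, y_val))  # tree size vs. validation accuracy
```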
Pruning
• Stopping tree splitting early may suffer from a lack of sufficient look-ahead
• Pruning is the inverse of splitting
• Grow the tree fully, until the leaf nodes have minimum impurity; then consider all pairs of leaf nodes (with a common antecedent node) for elimination
• Any pair whose elimination yields only a satisfactory (small) increase in impurity is eliminated, and the common antecedent node is declared a leaf node (a sketch follows)
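A bottom-up pruning sketch in the same spirit, reusing the illustrative `Node` and `majority_label` helpers from the growing sketch above; the threshold `max_increase` is an assumption, not a prescribed value.

```python
def prune(node, X, y, impurity, max_increase=0.01):
    """After the tree is fully grown, merge any pair of sibling leaves whose
    elimination increases the impurity by less than max_increase."""
    if node.label is not None:                       # already a leaf
        return node
    go_left = X[:, node.feature] <= node.threshold
    node.left = prune(node.left, X[go_left], y[go_left], impurity, max_increase)
    node.right = prune(node.right, X[~go_left], y[~go_left], impurity, max_increase)
    if node.left.label is not None and node.right.label is not None:
        p_left = go_left.mean()
        kept = p_left * impurity(y[go_left]) + (1 - p_left) * impurity(y[~go_left])
        if impurity(y) - kept < max_increase:        # merging costs little impurity
            return Node(label=majority_label(y))     # common antecedent becomes a leaf
    return node
```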
Example 1: A Simple Tree
Example 1: A Simple Tree
• Entropy impurity at the nonterminal nodes is shown in red; the impurity at each leaf node is 0
• Instability, or sensitivity of the tree to the training points: altering a single point can lead to a very different tree, due to the discrete and greedy nature of CART
Decision Tree
Choice of Features
• Using principal components (PCA) may be more effective than the original features!
Multivariate Decision Trees
• Allow splits that are not parallel to the feature axes (an illustrative sketch follows)
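A univariate (monothetic) query tests a single feature, so its boundary is axis-parallel; a multivariate split tests a linear combination of features. A tiny illustrative sketch, where the weight vector and threshold are made up:

```python
import numpy as np

w = np.array([0.6, -0.8])          # illustrative weight vector
s = 0.1                            # illustrative threshold

def oblique_query(x):
    """Multivariate query "w . x <= s?": a boundary at an arbitrary angle."""
    return float(np.dot(w, x)) <= s

print(oblique_query(np.array([1.0, 1.0])))   # -0.2 <= 0.1 -> True  (go left)
print(oblique_query(np.array([1.0, -1.0])))  #  1.4 <= 0.1 -> False (go right)
```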
Missing Attributes
• Some attributes of some patterns may be missing during training, during classification, or both
• Naive approach: delete any such deficient patterns
• Alternatively, calculate the impurities at a node N using only the attribute information that is present
Decision Tree – IRIS Data
• Used the first 25 samples from each category
• Two of the four features, x1 and x2, do not appear in the tree → feature selection capability
• Sethi and Sarvarayudu, IEEE Trans. PAMI, July 1982
Decision Tree for IRIS Data
• 2-D feature space representation of the decision boundaries in the x3 (petal length) vs. x4 (petal width) plane: splits at x3 = 2.6 and x3 = 4.95 and at x4 = 1.65 separate Setosa, Versicolor, and Virginica
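A hedged way to reproduce this experiment with scikit-learn (assuming it is installed); the learned thresholds will not match the published ones exactly, but the tree typically also splits only on the petal measurements x3 and x4.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()                 # samples are stored in blocks of 50 per class
idx = [i for c in range(3) for i in range(c * 50, c * 50 + 25)]   # first 25 of each class
X_train, y_train = iris.data[idx], iris.target[idx]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=list(iris.feature_names)))
# Usually only "petal length (cm)" and "petal width (cm)" appear in the splits.
```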
Random Forests
• Random forests, or random decision forests, are an ensemble learning method for classification and regression
• Construct multiple decision trees at training time and output the class that is the mode of the individual trees' classes (classification) or their mean prediction (regression)
• Random decision forests correct for the tendency of individual decision trees to overfit the training set
• How do you construct multiple decision trees? Random subspaces, bagging, and random selection of features (see the sketch below)
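A minimal sketch with scikit-learn's `RandomForestClassifier` (assuming scikit-learn is available), which combines bagging with random feature selection at each split:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Each tree is grown on a bootstrap sample (bagging) and considers only a random
# subset of the features at each split; the forest predicts the mode of the votes.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())   # mean 5-fold accuracy
```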
Decision Tree – Hand-printed Digits
• 160 7-dimensional patterns from 10 classes; 16 patterns per class
• Independent test set of 40 samples