Instance Based Learning
[Read Ch. 8]
- k-Nearest Neighbor
- Locally weighted regression
- Radial basis functions
- Case-based reasoning
- Lazy and eager learning
lecture slides for textbook Machine Learning, © Tom M. Mitchell, McGraw Hill, 1997
Instance-Based Learning
Key idea: just store all training examples $\langle x_i, f(x_i) \rangle$
Nearest neighbor:
- Given query instance $x_q$, first locate nearest training example $x_n$, then estimate $\hat{f}(x_q) \leftarrow f(x_n)$
k-Nearest neighbor:
- Given $x_q$, take vote among its k nearest nbrs (if discrete-valued target function)
- take mean of f values of k nearest nbrs (if real-valued)
$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}$$
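Both k-nearest-neighbor rules can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's code: it assumes instances are tuples of numbers compared with Euclidean distance, and that vote ties go to whichever label the `Counter` saw first. With `k=1` the classifier reduces to plain nearest neighbor.

```python
import math
from collections import Counter

def euclidean(a, b):
    # standard Euclidean distance between two points in R^n
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_nearest(examples, x_q, k):
    # examples: list of (x, f(x)) pairs; return the k pairs closest to x_q
    return sorted(examples, key=lambda ex: euclidean(ex[0], x_q))[:k]

def knn_classify(examples, x_q, k):
    # discrete-valued f: majority vote among the k nearest neighbors
    # (ties go to the label the Counter saw first; k=1 is plain nearest neighbor)
    votes = Counter(fx for _, fx in k_nearest(examples, x_q, k))
    return votes.most_common(1)[0][0]

def knn_regress(examples, x_q, k):
    # real-valued f: mean of f over the k nearest neighbors
    return sum(fx for _, fx in k_nearest(examples, x_q, k)) / k

if __name__ == "__main__":
    points = [((0, 0), "-"), ((0, 1), "-"), ((1, 0), "-"),
              ((5, 5), "+"), ((5, 6), "+"), ((6, 5), "+")]
    print(knn_classify(points, (0.5, 0.5), k=3))  # prints -
```

Note that all the work happens inside the query functions; "training" is just keeping the list of examples.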
When To Consider Nearest Neighbor
- Instances map to points in $\Re^n$
- Fewer than 20 attributes per instance
- Lots of training data
Advantages:
- Training is very fast
- Learn complex target functions
- Don't lose information
Disadvantages:
- Slow at query time
- Easily fooled by irrelevant attributes
Voronoi Diagram
[Figure: positive (+) and negative (−) training examples around a query point $x_q$]
Behavior in the Limit
Let $p(x)$ be the probability that instance $x$ will be labeled 1 (positive) versus 0 (negative).
Nearest neighbor:
- As number of training examples $\to \infty$, approaches Gibbs Algorithm
Gibbs: with probability $p(x)$ predict 1, else 0
k-Nearest neighbor:
- As number of training examples $\to \infty$ and k gets large, approaches Bayes optimal
Bayes optimal: if $p(x) > .5$ then predict 1, else 0
Note Gibbs has at most twice the expected error of Bayes optimal
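The factor of two can be checked at a single instance $x$ with $p = p(x)$. Gibbs predicts 1 with probability $p$ independently of the true label, so it errs with probability $2p(1-p)$; Bayes optimal errs with probability $m = \min(p, 1-p)$. Since $2p(1-p) = 2m(1-m) \le 2m$, the bound holds at every $x$ and therefore in expectation. The sketch below computes both error rates and adds a Monte Carlo estimate of the Gibbs rate; the function names are illustrative only.

```python
import random

def gibbs_error(p):
    # Gibbs guesses 1 with probability p, independently of the true label,
    # which is also 1 with probability p: it errs when the two disagree
    return p * (1 - p) + (1 - p) * p

def bayes_error(p):
    # Bayes optimal always predicts the more probable label
    return min(p, 1 - p)

def simulate_gibbs_error(p, trials=100_000, seed=0):
    # Monte Carlo estimate of the Gibbs error rate at a single instance x
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(trials):
        label = rng.random() < p
        guess = rng.random() < p
        mistakes += label != guess
    return mistakes / trials

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, gibbs_error(p), bayes_error(p), simulate_gibbs_error(p))
```

The ratio is exactly 2 only as $p \to 0$ or $p \to 1$; at $p = 0.5$ both rules err half the time.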
Distance-Weighted kNN
Might want to weight nearer neighbors more heavily...
$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$$
where
$$w_i \equiv \frac{1}{d(x_q, x_i)^2}$$
and $d(x_q, x_i)$ is distance between $x_q$ and $x_i$
Note now it makes sense to use all training examples instead of just k
→ Shepard's method
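A minimal sketch of the weighted estimate, assuming Euclidean distance and real-valued $f$. Passing `k=None` uses every training example, as in Shepard's method. When the query coincides with a training example, $w_i$ would be infinite, so the sketch returns that example's value directly (an assumed convention for the $d = 0$ case).

```python
import math

def distance_weighted_knn(examples, x_q, k=None):
    # examples: list of (x, f(x)) with x a tuple of floats
    # k=None uses all training examples (Shepard's method)
    dists = sorted(((math.dist(x, x_q), fx) for x, fx in examples), key=lambda t: t[0])
    if k is not None:
        dists = dists[:k]
    for d, fx in dists:
        if d == 0.0:
            return fx  # query equals a training instance: w_i = 1/d^2 is undefined
    weights = [1.0 / d ** 2 for d, _ in dists]
    return sum(w * fx for w, (_, fx) in zip(weights, dists)) / sum(weights)
```

For example, with training points $f(0) = 0$ and $f(2) = 10$, a query at 1 is equidistant and gets 5.0, while a query at 0.5 is pulled toward the closer point.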
Curse of Dimensionality
Imagine instances described by 20 attributes, but only 2 are relevant to target function
Curse of dimensionality: nearest nbr is easily misled when high-dimensional X
One approach:
- Stretch jth axis by weight $z_j$, where $z_1, \ldots, z_n$ chosen to minimize prediction error
- Use cross-validation to automatically choose weights $z_1, \ldots, z_n$
- Note setting $z_j$ to zero eliminates this dimension altogether
see [Moore and Lee, 1994]
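To see axis stretching and cross-validation together, the sketch below scores each candidate weight vector $z$ by leave-one-out 1-NN error and keeps the best one. It is a plain grid search over a few hand-picked candidates, a simple stand-in rather than the procedure of [Moore and Lee, 1994]. The toy data set is hypothetical: the first attribute determines the label, and the second is a large irrelevant attribute that misleads unweighted distance.

```python
import math

def weighted_distance(a, b, z):
    # Euclidean distance after stretching axis j by z[j]
    return math.sqrt(sum((zj * (aj - bj)) ** 2 for aj, bj, zj in zip(a, b, z)))

def loo_error(data, z):
    # leave-one-out error rate of 1-NN under axis weights z
    errors = 0
    for i, (x, label) in enumerate(data):
        others = [d for j, d in enumerate(data) if j != i]
        nearest = min(others, key=lambda d: weighted_distance(x, d[0], z))
        errors += nearest[1] != label
    return errors / len(data)

def choose_weights(data, candidates):
    # keep the candidate z with the lowest cross-validated error
    return min(candidates, key=lambda z: loo_error(data, z))

# hypothetical data: attribute 0 is relevant, attribute 1 is irrelevant noise
DATA = [((0.0, 0.0), 0), ((0.1, 50.0), 0), ((0.2, 100.0), 0),
        ((1.0, 1.0), 1), ((1.1, 51.0), 1), ((1.2, 101.0), 1)]
CANDIDATES = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

if __name__ == "__main__":
    for z in CANDIDATES:
        print(z, loo_error(DATA, z))
    print("chosen:", choose_weights(DATA, CANDIDATES))
```

With equal weights every point's nearest neighbor is in the other class; setting $z_2 = 0$ removes the irrelevant dimension and the error drops to zero.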
Locally Weighted Regression
Note kNN forms local approximation to f for each query point $x_q$
Why not form an explicit approximation $\hat{f}(x)$ for region surrounding $x_q$?
- Fit linear function to k nearest neighbors
- Fit quadratic, ...
- Produces "piecewise approximation" to f
Several choices of error to minimize:
- Squared error over k nearest neighbors
$$E_1(x_q) \equiv \frac{1}{2} \sum_{x \in \text{k nearest nbrs of } x_q} \left(f(x) - \hat{f}(x)\right)^2$$
- Distance-weighted squared error over all nbrs
$$E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} \left(f(x) - \hat{f}(x)\right)^2 K(d(x_q, x))$$
- ...
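As a sketch of minimizing $E_2$, the code below fits a line $\hat{f}(x) = a + bx$ to one-dimensional data by weighted least squares, solving the two normal equations in closed form. The Gaussian kernel $K(d) = e^{-d^2 / 2\tau^2}$ and the width parameter `tau` are assumptions; the slide leaves $K$ unspecified.

```python
import math

def gaussian_kernel(d, tau):
    # assumed K(d): weight decays with distance from the query; tau is the width
    return math.exp(-d ** 2 / (2 * tau ** 2))

def lwr_predict(xs, ys, x_q, tau):
    # minimize E_2(x_q) = 1/2 * sum_x K(d(x_q, x)) * (f(x) - (a + b*x))^2 over all of D
    w = [gaussian_kernel(abs(x - x_q), tau) for x in xs]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # normal equations: a*sw + b*sx = sy and a*sx + b*sxx = sxy
    b = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    a = (sy - b * sx) / sw
    return a + b * x_q  # the local line is used only at this query
```

On samples of $f(x) = x^2$ at $x = 0, \ldots, 6$, a narrow kernel gives a prediction near 9 at $x_q = 3$, while a very wide kernel degenerates to one global line and predicts 13.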
Radial Basis Function Networks
- Global approximation to target function, in terms of linear combination of local approximations
- Used, e.g., for image classification
- A different kind of neural network
- Closely related to distance-weighted regression, but "eager" instead of "lazy"
Radial Basis Function Networks
[Figure: network with input attributes $a_1(x), a_2(x), \ldots, a_n(x)$, a layer of k kernel units, and a linear output unit computing $f(x)$ with weights $w_0, w_1, \ldots, w_k$, where $w_0$ is attached to a constant input 1]
where $a_i(x)$ are the attributes describing instance $x$, and
$$f(x) = w_0 + \sum_{u=1}^{k} w_u K_u(d(x_u, x))$$
One common choice for $K_u(d(x_u, x))$ is
$$K_u(d(x_u, x)) = e^{-\frac{1}{2\sigma_u^2} d^2(x_u, x)}$$
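The output of such a network is a direct transcription of the two formulas above. This sketch assumes Euclidean $d$ and the Gaussian kernel; `centers` holds the $x_u$ and `sigmas` the $\sigma_u$.

```python
import math

def gaussian_rbf(x_u, x, sigma_u):
    # K_u(d(x_u, x)) = exp(-d^2(x_u, x) / (2 sigma_u^2)), with Euclidean d
    return math.exp(-math.dist(x_u, x) ** 2 / (2 * sigma_u ** 2))

def rbf_output(x, w0, weights, centers, sigmas):
    # f(x) = w_0 + sum_u w_u K_u(d(x_u, x))
    return w0 + sum(w_u * gaussian_rbf(x_u, x, s_u)
                    for w_u, x_u, s_u in zip(weights, centers, sigmas))
```

Each kernel contributes only near its center $x_u$, so far from every center the output falls back to $w_0$.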
Training Radial Basis Function Networks
Q1: What $x_u$ to use for each kernel function $K_u(d(x_u, x))$
- Scatter uniformly throughout instance space
- Or use training instances (reflects instance distribution)
Q2: How to train weights (assume here Gaussian $K_u$)
- First choose variance (and perhaps mean) for each $K_u$
  - e.g., use EM
- Then hold $K_u$ fixed, and train linear output layer
  - efficient methods to fit linear function
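The two-stage recipe can be sketched for one-dimensional inputs. Under stated simplifications, the centers $x_u$ are a hand-picked subset of the training instances (the second answer to Q1), and a single shared $\sigma$ is fixed by hand instead of being estimated with EM. With the kernels held fixed, the output weights come from ordinary least squares via the normal equations, solved here by Gaussian elimination so the block needs no external libraries.

```python
import math

def features(x, centers, sigma):
    # [1, K_1(d(x_1, x)), ..., K_k(d(x_k, x))]; the leading 1 multiplies w_0
    return [1.0] + [math.exp(-(x - c) ** 2 / (2 * sigma ** 2)) for c in centers]

def solve(A, b):
    # Gaussian elimination with partial pivoting; assumes A is square and non-singular
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def train_rbf(xs, ys, centers, sigma):
    # kernels held fixed; fit the linear output layer by least squares
    # using the normal equations (Phi^T Phi) w = Phi^T y
    phi = [features(x, centers, sigma) for x in xs]
    m = len(centers) + 1
    A = [[sum(row[i] * row[j] for row in phi) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(m)]
    return solve(A, b)

def predict(x, w, centers, sigma):
    # f(x) = w_0 + sum_u w_u K_u(d(x_u, x))
    return sum(wi * fi for wi, fi in zip(w, features(x, centers, sigma)))
```

Because only the output layer is trained, fitting reduces to a linear least-squares problem; the hard choices are the centers and widths made beforehand.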
Case-Based Reasoning
Can apply instance-based learning even when $X \neq \Re^n$
→ need different "distance" metric
Case-Based Reasoning is instance-based learning applied to instances with symbolic logic descriptions
((user-complaint error53-on-shutdown)
 (cpu-model PowerPC)
 (operating-system Windows)
 (network-connection PCIA)
 (memory 48meg)
 (installed-applications Excel Netscape VirusScan)
 (disk 1gig)
 (likely-cause ???))
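One way to get a "distance" for symbolic descriptions like the help-desk case is to count matching attribute values. The sketch below is only an assumption, not a metric from the source: each attribute with equal values scores 1, set-valued attributes score their Jaccard overlap, and the query takes the `likely-cause` of the most similar stored case. The two stored cases and their `cause-A`/`cause-B` labels are hypothetical placeholders.

```python
def similarity(a, b):
    # hypothetical overlap score: +1 per attribute with equal values;
    # set-valued attributes contribute their Jaccard overlap instead
    score = 0.0
    for attr in a.keys() & b.keys():
        va, vb = a[attr], b[attr]
        if isinstance(va, frozenset) and isinstance(vb, frozenset):
            union = va | vb
            score += len(va & vb) / len(union) if union else 1.0
        elif va == vb:
            score += 1.0
    return score

def retrieve(query, case_base, target="likely-cause"):
    # the query lacks the target attribute, so it never enters the similarity;
    # return the target value of the most similar stored case
    best = max(case_base, key=lambda case: similarity(query, case))
    return best[target]

QUERY = {
    "user-complaint": "error53-on-shutdown",
    "cpu-model": "PowerPC",
    "operating-system": "Windows",
    "network-connection": "PCIA",
    "memory": "48meg",
    "installed-applications": frozenset({"Excel", "Netscape", "VirusScan"}),
    "disk": "1gig",
}

# hypothetical stored cases; "cause-A" and "cause-B" are placeholder labels
CASE_BASE = [
    {"user-complaint": "error53-on-shutdown", "operating-system": "Windows",
     "installed-applications": frozenset({"Excel", "VirusScan"}),
     "likely-cause": "cause-A"},
    {"user-complaint": "slow-startup", "cpu-model": "PowerPC",
     "memory": "16meg", "likely-cause": "cause-B"},
]

if __name__ == "__main__":
    print(retrieve(QUERY, CASE_BASE))  # prints cause-A
```

Real case-based reasoners typically weight attributes and adapt retrieved cases rather than copying a single answer; this shows only the retrieval step.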
Case-Based Reasoning in CADET
CADET: 75 stored examples of mechanical devices
- each training example: $\langle$ qualitative function, mechanical structure $\rangle$
- new query: desired function
- target value: mechanical structure for this function
Distance metric: match qualitative function descriptions
Case-Based Reasoning in CADET
A stored case: T-junction pipe
[Figure: structure and function of the T-junction pipe; Q = waterflow, T = temperature; inflows $Q_1, T_1$ and $Q_2, T_2$ each have a positive (+) influence on the outflow $Q_3, T_3$]
A problem specification: Water faucet
[Figure: desired function of the water faucet, structure unknown (?); variables $C_t, C_f, Q_c, Q_h, Q_m, T_c, T_h, T_m$ linked by + and − influences]
Case-Based Reasoning in CADET
- Instances represented by rich structural descriptions
- Multiple cases retrieved (and combined) to form solution to new problem
- Tight coupling between case retrieval and problem solving
Bottom line:
- Simple matching of cases useful for tasks such as answering help-desk queries
- Area of ongoing research
Lazy and Eager Learning
Lazy: wait for query before generalizing
- k-Nearest Neighbor, Case based reasoning
Eager: generalize before seeing query
- Radial basis function networks, ID3, Backpropagation, NaiveBayes, ...
Does it matter?
- Eager learner must create global approximation
- Lazy learner can create many local approximations
- if they use same H, lazy can represent more complex fns (e.g., consider H = linear functions)
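The last point can be made concrete with H = linear functions on one-dimensional data. In the sketch below, the eager learner fits one least-squares line before any query, while the lazy learner fits a fresh line to the k nearest neighbors of each query. The target $f(x) = |x|$ sampled at $x = -3, \ldots, 3$ is an arbitrary illustration: no single line fits it, but each local line does.

```python
def fit_line(points):
    # ordinary least-squares line y = a + b*x through (x, y) points
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b

def eager_model(points):
    # eager: generalize once, before any query, into a single global line
    a, b = fit_line(points)
    return lambda x_q: a + b * x_q

def lazy_predict(points, x_q, k):
    # lazy: wait for the query, then fit a line to its k nearest neighbors
    nearest = sorted(points, key=lambda p: abs(p[0] - x_q))[:k]
    a, b = fit_line(nearest)
    return a + b * x_q

if __name__ == "__main__":
    pts = [(float(x), float(abs(x))) for x in range(-3, 4)]
    print(eager_model(pts)(2.0), lazy_predict(pts, 2.0, k=3))
```

Here the global line is flat at the mean value 12/7, so the eager prediction at $x = 2$ is about 1.71, while the lazy learner recovers 2.0 exactly.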