



  1. J. Vega. Asociación EURATOM/CIEMAT para Fusión. jesus.vega@ciemat.es. 7th FDPVA, Frascati (March 26-28, 2012)

  2.  Concepts
      Classification
      Regression
      Advanced methods

  3.  Technology: it gives stereotyped solutions to stereotyped problems
      Basic science: it is the accumulation of knowledge to explain a phenomenon
      Applied science: it is the application of scientific knowledge to a particular environment
     ◦ Machine learning can increase our comprehension of plasma physics

  4.  Learning does not mean ‘learning by heart’ (any computer can memorize)
      Learning means ‘generalization capability’: we learn with some samples and predict for other samples

  5.  The learning problem is the problem of finding a desired dependence (function) using a limited number of observations (training data)
     ◦ Classification: the function can represent the separation frontier between two classes
     ◦ Regression: the function can provide a fit to the data
     [Figure: scatter plots of the two settings, a class-separation frontier and a regression fit]
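A minimal sketch of the two settings in code, assuming NumPy is available; the data and the linear models are illustrative, not from the slides:

```python
import numpy as np

# Regression: find a dependence y = a*x + b from noisy observations.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=x.size)
a, b = np.polyfit(x, y, deg=1)  # least-squares fit to the training data

# Classification: use the same line as a separation frontier between two classes.
def classify(p):
    # p = (p1, p2); points above the line are class 'x', below are class 'o'
    return "x" if p[1] > a * p[0] + b else "o"

print(f"fit: y = {a:.2f}x + {b:.2f}; (0.5, 2.0) -> {classify((0.5, 2.0))}")
```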

  6.  The general model of learning from examples is described through three components:
     ◦ A generator of random vectors $\mathbf{x} \in \mathbb{R}^n$, drawn from a fixed but unknown probability distribution $p(\mathbf{x})$
     ◦ A supervisor that returns an output $y$ for every input $\mathbf{x}$, according to a conditional distribution $p(y|\mathbf{x})$ (fixed and unknown)
     ◦ A learning machine that implements a set of functions $f(\mathbf{x}, \alpha)$ and outputs $\hat{y} = f(\mathbf{x}, \alpha)$
     $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, N$: training samples
      The problem of learning is that of choosing, from the given set of functions $f(\mathbf{x}, \alpha)$, the one that best approximates the supervisor's response: $\hat{\mathbf{y}} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N)$ "close" to $\mathbf{y} = (y_1, y_2, \ldots, y_N)$
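A toy rendering of the three components, assuming NumPy; the distributions p(x), p(y|x) and the function family here are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def generator(n):
    # Generator: random vectors x drawn from a fixed but unknown p(x).
    return rng.uniform(-1.0, 1.0, size=n)

def supervisor(x):
    # Supervisor: returns y according to p(y|x) (here a noisy hidden rule).
    return np.where(x + rng.normal(scale=0.2, size=x.shape) > 0, 1, -1)

def f(x, alpha):
    # Learning machine: implements the set of functions f(x, alpha).
    return np.where(x > alpha, 1, -1)

# Training samples (x_i, y_i), i = 1..N, and a crude search for the best alpha.
x_train = generator(200)
y_train = supervisor(x_train)
alphas = np.linspace(-1.0, 1.0, 201)
best = min(alphas, key=lambda a: np.mean(f(x_train, a) != y_train))
print("chosen alpha:", best)
```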

  7.  Main hypothesis
     ◦ The training set, $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, N$, is made up of independent and identically distributed (iid) observations drawn according to $p(\mathbf{x}, y) = p(y|\mathbf{x})\,p(\mathbf{x})$
      Loss function: $L(y, f(\mathbf{x}, \alpha))$
     ◦ It measures the quality of the approximation performed by the learning algorithm, i.e. the discrepancy between the response $y$ of the supervisor and the response $f(\mathbf{x}, \alpha)$ of the learning machine. Its values are $\geq 0$
      Risk functional: $R(\alpha) = \int L(y, f(\mathbf{x}, \alpha))\, p(\mathbf{x}, y)\, d\mathbf{x}\, dy$
      The goal of a learning process is to find the function $f(\mathbf{x}, \alpha_0)$ that minimizes $R(\alpha)$ (over the class of functions $f(\mathbf{x}, \alpha)$) in the situation where $p(\mathbf{x}, y)$ is unknown and the only available information is contained in the training set
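Since p(x, y) is unknown, in practice R(α) is approximated by the empirical risk, the average loss over the training set. A minimal sketch; the model and loss below are placeholders:

```python
import numpy as np

def empirical_risk(loss, f, alpha, x_train, y_train):
    # Approximates R(alpha) by averaging the loss over the N training samples.
    return np.mean([loss(y, f(x, alpha)) for x, y in zip(x_train, y_train)])

# Example: squared loss with the 1-D linear family f(x, alpha) = alpha * x.
loss = lambda y, y_hat: (y - y_hat) ** 2
f = lambda x, alpha: alpha * x
print(empirical_risk(loss, f, 2.0, [1.0, 2.0], [2.1, 3.9]))
```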

  8.  $R(\alpha) = \int L(y, f(\mathbf{x}, \alpha))\, p(\mathbf{x}, y)\, d\mathbf{x}\, dy$
      Pattern recognition (or classification): $L(y, f(\mathbf{x}, \alpha)) = 0$ if $y = f(\mathbf{x}, \alpha)$; $L(y, f(\mathbf{x}, \alpha)) = 1$ if $y \neq f(\mathbf{x}, \alpha)$
      Regression estimation: $L(y, f(\mathbf{x}, \alpha)) = (y - f(\mathbf{x}, \alpha))^2$
      Density estimation: $L(p(\mathbf{x}, \alpha)) = -\log p(\mathbf{x}, \alpha)$
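The three losses written out as code, a direct transcription of the formulas above:

```python
import numpy as np

def zero_one_loss(y, y_hat):
    # Pattern recognition: 0 if the prediction equals the label, 1 otherwise.
    return 0.0 if y == y_hat else 1.0

def squared_loss(y, y_hat):
    # Regression estimation: squared discrepancy.
    return (y - y_hat) ** 2

def density_loss(p_x_alpha):
    # Density estimation: negative log of the modeled density at x.
    return -np.log(p_x_alpha)
```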

  9.  $f(\mathbf{x}, \alpha) = 0$ if $x_2 < a x_1 + b$; $f(\mathbf{x}, \alpha) = 1$ if $x_2 \geq a x_1 + b$; with $\alpha = (a, b)$
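The same family of linear frontiers in code; the sample point and parameters are invented:

```python
def f(x, alpha):
    # x = (x1, x2); alpha = (a, b) parameterizes the frontier x2 = a*x1 + b.
    a, b = alpha
    return 0 if x[1] < a * x[0] + b else 1

print(f((1.0, 3.5), alpha=(2.0, 1.0)))  # 3.5 >= 2*1 + 1, so class 1
```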

  10. Examples of sets of functions: $f_1(\mathbf{x}, \alpha_1)$ with $\alpha_1 = (A_1, f_1, \varphi_1)$; $f_2(\mathbf{x}, \alpha_2)$ with $\alpha_2 = (A_2, f_2, \varphi_2)$; $f_3(\mathbf{x}, \alpha_3)$ with $\alpha_3 = (m_1, b_1, \ldots, m_n, b_n)$

  11. [Figure]

  12. Dataset: $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_i, y_i), \ldots, (\mathbf{x}_N, y_N)$
     $\mathbf{x}_i \in \mathbb{R}^m$: features that are of distinctive nature (object description with attributes managed by computers)
     $y_i \in \{L_1, L_2, \ldots, L_K\}$: label of the sample $\mathbf{x}_i$
     Feature types:
     ◦ Quantitative (numerical)
       Continuous-valued (length, pressure)
       Discrete (total basketball score, number of citizens in a town)
     ◦ Qualitative (categorical)
       Ordinal (education degree)
       Nominal (profession, brand of a car)

  13. Dataset: $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_i, y_i), \ldots, (\mathbf{x}_N, y_N)$
     $\mathbf{x}_i \in \mathbb{R}^m$: features that are of distinctive nature (object description with attributes managed by computers)
     $y_i \in \{L_1, L_2, \ldots, L_K\}$: known label of the sample $\mathbf{x}_i$
     Objective: to determine a separating function between classes (generalization) in order to predict the labels of new samples with known feature vectors ($(\mathbf{x}_{N+1}, y_{N+1}), (\mathbf{x}_{N+2}, y_{N+2}), \ldots$)
     [Figure: two decision boundaries, one overfitted and one smooth]

  14.  How good is a classifier?
     Dataset: $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)$
     $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, J$: training set
     $(\mathbf{x}_i, y_i)$, $i = J+1, \ldots, N$: test set
      Training set: a model is created to make predictions. Given $\mathbf{x}_i$, the model predicts $y_i$
      Test set: model validation. The success rate is taken as the level of confidence and is assumed to be the same for all future samples
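A sketch of the split and of the success-rate estimate, assuming scikit-learn is available; the dataset and the choice of classifier are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset: N samples with m = 2 features and binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Training set: i = 1..J; test set: i = J+1..N.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = KNeighborsClassifier().fit(X_train, y_train)  # build the model
success_rate = model.score(X_test, y_test)            # validate on unseen data
print(f"success rate (level of confidence): {success_rate:.2f}")
```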

  15.  Multi-class problems: K > 2
     ◦ They can be tackled as K binary problems, comparing each class with the rest (one-versus-the-rest approach)
     [Figure: one-versus-the-rest frontiers for classes c1-c4, with an ambiguity region where no single class wins]
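A minimal one-versus-the-rest scheme over K binary scorers; each scorer below is a stand-in for a trained "class k versus the rest" classifier:

```python
import numpy as np

def one_versus_the_rest(scorers, x):
    # scorers: K functions, each returning a signed score for class k vs. rest.
    # The predicted class is the one with the largest score; samples where no
    # scorer is clearly positive fall in the ambiguity region.
    return int(np.argmax([s(x) for s in scorers]))

scorers = [
    lambda x: x[0] - x[1],       # c1 vs not c1
    lambda x: x[1] - x[0],       # c2 vs not c2
    lambda x: 0.5 - abs(x[0]),   # c3 vs not c3
]
print(one_versus_the_rest(scorers, np.array([2.0, 0.0])))  # -> 0 (class c1)
```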

  16.  Examples of feature vectors
     ◦ Disruptions
       $\mathbf{x}_1 = (I_p(t_s), n_e(t_s), \ldots)$, $y_1 \in \{D, ND\}$
       $\mathbf{x}_2 = (I_p(t_s - T), n_e(t_s - T), \ldots)$, $y_2 \in \{D, ND\}$
       $\mathbf{x}_3 = (I_p(t_s - 2T), n_e(t_s - 2T), \ldots)$, $y_3 \in \{D, ND\}$
     ◦ L/H transition: $\mathbf{x}_i \in \mathbb{R}^m$, $y_i \in \{L, H\}$
     ◦ Image classification: $\mathbf{x}_i$ is the set of pixels of an image, $y_i \in \{1, 2, 3\}$
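A sketch of how such time-window feature vectors could be assembled from diagnostic signals; the signal arrays, the sampling period and the step T below are placeholders:

```python
import numpy as np

def feature_vector(ip, ne, t, dt, offset):
    # Features sampled `offset` seconds before time t, from signals ip
    # (plasma current) and ne (electron density) sampled every dt seconds.
    idx = int(round((t - offset) / dt))
    return np.array([ip[idx], ne[idx]])

dt = 1e-3                           # 1 ms sampling period
ip = np.linspace(1.0, 0.0, 1000)    # toy decaying plasma current
ne = np.full(1000, 5.0)             # toy flat electron density
t_s, T = 0.9, 0.01
X = [feature_vector(ip, ne, t_s, dt, k * T) for k in range(3)]  # t_s, t_s-T, t_s-2T
print(X)  # labels y_k in {D, ND} would come from the shot database
```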

  17.  Single classifiers
     ◦ Support Vector Machines (SVM)
     ◦ Neural networks
     ◦ Bayes decision theory
       Parametric method
       Non-parametric method
     ◦ Classification trees
      Combining classifiers

  18. Binary classifier
      It finds the optimal separating hyperplane between classes
      Samples: $(\mathbf{x}_k, y_k)$, $\mathbf{x}_k \in \mathbb{R}^n$, $k = 1, \ldots, N$, $y_k \in \{C_{+1}, C_{-1}\}$
      Separating hyperplane: $D(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b = 0$, with $D(\mathbf{x}) > +1$ on the $C_{+1}$ side and $D(\mathbf{x}) < -1$ on the $C_{-1}$ side; $|D(\mathbf{x}_k)| / \|\mathbf{w}\|$ is the distance from $\mathbf{x}_k$ to the hyperplane
      Maximum margin $2\tau$: $\frac{y_k D(\mathbf{x}_k)}{\|\mathbf{w}\|} \geq \tau$, $y_k \in \{-1, +1\}$, $k = 1, \ldots, N$
      To find the optimal hyperplane it is necessary to determine the vector $\mathbf{w}$ that maximizes the margin $\tau$
      There are infinite solutions due to the presence of a scale factor. To avoid this, $\tau \|\mathbf{w}\| = 1$ is imposed; therefore, maximizing the margin is equivalent to minimizing $\|\mathbf{w}\|$
      Optimization problem: $\min J(\mathbf{w}) = \|\mathbf{w}\|^2$, subject to $y_k (\mathbf{w} \cdot \mathbf{x}_k + b) \geq 1$, $k = 1, \ldots, N$
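A minimal linear SVM fit, assuming scikit-learn is available; a very large C is used to approximate the hard-margin problem stated above:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data with labels in {+1, -1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

# kernel='linear' solves min ||w||^2 subject to y_k (w . x_k + b) >= 1,
# softened by C; a very large C approximates the hard margin.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, ", b =", b, ", margin =", 2.0 / np.linalg.norm(w))
```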

  19. Solution: $\mathbf{w}^* = \sum_{i=1}^{N} \alpha_i^* y_i \mathbf{x}_i$, where the $\alpha_i$ are the Lagrange multipliers
      Samples associated with $\alpha_i \neq 0$ are called “support vectors”; the rest of the training samples are irrelevant to classify new samples
      The constant $b$ is obtained from any Karush-Kuhn-Tucker condition: $\alpha_i \left[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] = 0$, $i = 1, \ldots, N$
      $D(\mathbf{x}) = \mathbf{w}^* \cdot \mathbf{x} + b^*$ is the distance (with sign) from $\mathbf{x}$ to the separating hyperplane
      Given $\mathbf{x}$ to classify: $\mathbf{x} \in C_{+1}$ if $\operatorname{sign}\left( \sum_{\text{support vectors}} \alpha_i^* y_i (\mathbf{x}_i \cdot \mathbf{x}) + b^* \right) > 0$; otherwise $\mathbf{x} \in C_{-1}$
     V. Cherkassky, F. Mulier. Learning from Data, 2nd edition. Wiley-Interscience.
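The support-vector expansion can be read off the model fitted in the previous sketch; scikit-learn exposes exactly these quantities:

```python
import numpy as np

# Continuing from the fitted clf above.
sv = clf.support_vectors_    # the x_i with alpha_i != 0
alpha_y = clf.dual_coef_[0]  # the products alpha_i* y_i
b_star = clf.intercept_[0]

def decision(x):
    # D(x) = sum over support vectors of alpha_i* y_i (x_i . x) + b*
    return np.dot(alpha_y, sv @ x) + b_star

x_new = np.array([1.0, 1.5])
print("class:", "C+1" if decision(x_new) > 0 else "C-1")
```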

  20.  Non-linearly separable case: the samples are mapped from the input space to a feature space, where inner products are computed by a kernel $H(\mathbf{x}, \mathbf{x}')$
      Kernels
     ◦ Linear: $H(\mathbf{x}, \mathbf{x}') = (\mathbf{x} \cdot \mathbf{x}')$
     ◦ Polynomial of degree q: $H(\mathbf{x}, \mathbf{x}') = [(\mathbf{x} \cdot \mathbf{x}') + 1]^q$
     ◦ Radial basis functions: $H(\mathbf{x}, \mathbf{x}') = \exp\left( -\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{\sigma^2} \right)$
     ◦ Neural network: $H(\mathbf{x}, \mathbf{x}') = \tanh(2 (\mathbf{x} \cdot \mathbf{x}') + 1)$
      Given $\mathbf{x}$ to classify: $\mathbf{x} \in C_{+1}$ if $\operatorname{sign}\left( \sum_{\text{support vectors}} \alpha_i^* y_i H(\mathbf{x}_i, \mathbf{x}) + b^* \right) > 0$; otherwise $\mathbf{x} \in C_{-1}$
     V. Cherkassky, F. Mulier. Learning from Data, 2nd edition. Wiley-Interscience.
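The four kernels written out, assuming NumPy; sigma and q are free parameters to be tuned:

```python
import numpy as np

def linear(x, xp):
    return np.dot(x, xp)

def polynomial(x, xp, q=3):
    return (np.dot(x, xp) + 1.0) ** q

def radial_basis(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / sigma**2)

def neural_network(x, xp):
    return np.tanh(2.0 * np.dot(x, xp) + 1.0)

x, xp = np.array([1.0, 0.0]), np.array([0.5, 0.5])
print(linear(x, xp), polynomial(x, xp), radial_basis(x, xp), neural_network(x, xp))
```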
