DM825 Introduction to Machine Learning Lecture 4 Model Assessment Generalized Linear Models Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark
Outline
1. Error Estimation Methods
2. Generalized Linear Models
Loss Function in Classification
$G \in \{1, \ldots, K\}$: class label
$p_k(\vec{x}) = \Pr(G = k \mid \vec{X} = \vec{x})$: the probability modeled
$\hat{G}(\vec{x}) = \arg\max_k \hat{p}_k(\vec{x})$: predicted class
0-1 loss: $L(G, \hat{G}(\vec{x})) = I(G \neq \hat{G}(\vec{x}))$
Entropy loss: $L(G, \hat{p}(\vec{x})) = -2 \sum_{k=1}^{K} I(G = k) \log_2 \hat{p}_k(\vec{x}) = -2 \log_2 \hat{p}_G(\vec{x})$
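The two losses can be compared on a single prediction. The following is a minimal sketch, not part of the original slides: the probability vector is made up, class labels are 0-based for convenience, and the function names are hypothetical.

```python
import math

def zero_one_loss(g, probs):
    """0-1 loss: 1 if the most probable class differs from the true class g."""
    g_hat = max(range(len(probs)), key=lambda k: probs[k])
    return int(g_hat != g)

def entropy_loss(g, probs):
    """-2 * log2 of the probability assigned to the true class g."""
    return -2.0 * math.log2(probs[g])

# Hypothetical predicted probabilities for K = 3 classes (labels 0, 1, 2).
probs = [0.5, 0.25, 0.25]

loss01_right = zero_one_loss(0, probs)   # true class is the predicted one
loss01_wrong = zero_one_loss(1, probs)   # true class is not the predicted one
ent_right = entropy_loss(0, probs)       # -2 * log2(0.5)
ent_wrong = entropy_loss(1, probs)       # -2 * log2(0.25)
```

The 0-1 loss only looks at the predicted label, while the entropy loss also penalizes how little probability was assigned to the true class.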
Akaike Information Criterion
$\mathrm{AIC} = \log p(D \mid \theta) - p$
Here $\theta$ is the maximum likelihood estimate and the subtracted $p$ is the number of free parameters of the model.
The maximized log-likelihood is adjusted to account for the different complexities of the models.
Choose the model with the largest AIC; it is computed on the training set only.
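As an illustration of the criterion, the sketch below compares two Gaussian models with unit variance on a made-up data set: one with the mean fixed at 0 (no free parameter) and one with the mean fitted by maximum likelihood (one free parameter). The data and the function names are assumptions introduced for this example.

```python
import math

def gaussian_loglik(data, mu, sigma2=1.0):
    """Log-likelihood of i.i.d. data under N(mu, sigma2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (y - mu) ** 2 / (2 * sigma2) for y in data)

def aic(loglik, n_params):
    """AIC in the 'larger is better' form used here: log-likelihood minus parameter count."""
    return loglik - n_params

# Made-up observations, for illustration only.
data = [1.8, 2.3, 1.9, 2.6, 2.1, 1.7, 2.4]

mu_ml = sum(data) / len(data)                       # ML estimate of the mean
aic_fixed = aic(gaussian_loglik(data, 0.0), 0)      # mean fixed at 0, no free parameter
aic_fitted = aic(gaussian_loglik(data, mu_ml), 1)   # mean fitted, one free parameter

best = "fitted mean" if aic_fitted > aic_fixed else "fixed mean"
```

On this data the gain in log-likelihood from fitting the mean far exceeds the penalty of one parameter, so the fitted model is selected.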
Methods to Estimate Error Curves
Model selection: estimate the performance of different models in order to choose the best one.
Model assessment: having selected a final model, estimate its prediction error on new data.
If there is plenty of data, divide the data randomly and use:
  50% for training
  25% for model selection (validation)
  25% for assessment
If there is less data:
  cross validation
  bootstrap method
Cross Validation
k-fold cross validation: split the data into k parts of m/k elements each; leave one part out and use the rest of the data to train the model (if k = m this is leave-one-out).
Cross validation estimates the extra-sample error $\mathrm{Err} = E[L(Y, h(\vec{X}))]$, where $(Y, \vec{X})$ is drawn from the joint distribution.
for i from 1 to k do
    take out the i-th part
    fit the model on the other k - 1 parts
    calculate the prediction error when predicting the i-th part
$\varphi : \{1, \ldots, m\} \to \{1, \ldots, k\}$ assigns each observation to a part by randomization.
$\hat{h}^{-i}(\vec{x})$ is the function fitted on the data with the i-th part removed.
$\mathrm{CV} = \frac{1}{m} \sum_{i=1}^{m} L(y_i, \hat{h}^{-\varphi(i)}(\vec{x}_i))$
Typical choices: k = 5, 10.
For a family of models indexed by $\theta$, search for the $\hat{\theta}$ that minimizes CV.
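The loop above can be sketched in a few lines. This is one possible implementation under stated assumptions, not the lecture's own code: the helper names (`cross_validate`, `fit_line`, `fit_constant`), the squared-error loss, and the noiseless toy data are all introduced here for illustration.

```python
import random

def cross_validate(xs, ys, fit, loss, k=5, seed=0):
    """k-fold CV estimate: (1/m) * sum_i L(y_i, h^{-phi(i)}(x_i))."""
    m = len(xs)
    order = list(range(m))
    random.Random(seed).shuffle(order)
    phi = [0] * m                      # phi[i] = part that observation i belongs to
    for pos, i in enumerate(order):
        phi[i] = pos % k
    total = 0.0
    for part in range(k):
        train = [i for i in range(m) if phi[i] != part]
        h = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum(loss(ys[i], h(xs[i])) for i in range(m) if phi[i] == part)
    return total / m

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns the fitted function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_constant(xs, ys):
    """Model that always predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def squared(y, pred):
    return (y - pred) ** 2

# Noiseless toy data on a straight line (an assumption made for this example).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

cv_line = cross_validate(xs, ys, fit_line, squared, k=5)
cv_const = cross_validate(xs, ys, fit_constant, squared, k=5)
```

Because the data lie exactly on a line, the linear model has a CV error of essentially zero while the constant model does not, so minimizing CV selects the linear model.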
Bootstrap Method
Training set: $\vec{z} = (z_1, z_2, \ldots, z_m)$ with $z_i = (x_i, y_i)$
Randomly draw data sets of size m with replacement from the training set:
repeat
    draw a data set
    fit the model
until B = 100 times
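Bootstrap Method (continued)
We can estimate any aspect of the distribution of a statistic $S(\vec{z})$, for example its variance:
$\widehat{\mathrm{Var}}[S(\vec{z})] = \frac{1}{B-1} \sum_{b=1}^{B} \left( S(z^{*b}) - \bar{S}^{*} \right)^2$
Bootstrap estimate of the prediction error:
$\widehat{\mathrm{Err}}_{\mathrm{boot}} = \frac{1}{B} \frac{1}{m} \sum_{b=1}^{B} \sum_{i=1}^{m} L(y_i, \hat{h}^{*b}(x_i))$
$\hat{h}^{*b}(x_i)$ is the value predicted at $x_i$ by the model fitted on the b-th bootstrap sample.
The training and test observations have observations in common. To avoid this:
$\widehat{\mathrm{Err}}_{\mathrm{boot}} = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{|C^{-i}|} \sum_{b \in C^{-i}} L(y_i, \hat{h}^{*b}(x_i))$
$C^{-i}$ is the set of indices of the bootstrap samples b that do not contain observation i.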
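Both estimates can be sketched as follows. This is an illustrative implementation under assumptions made here: the statistic S is the sample mean, the model is a constant predictor, the loss is squared error, and the data are synthetic. Observations that happen to appear in every bootstrap sample (empty $C^{-i}$) are skipped, which is a choice of this sketch.

```python
import random

def bootstrap_indices(m, B=100, seed=0):
    """Draw B index sets of size m with replacement from {0, ..., m-1}."""
    rng = random.Random(seed)
    return [[rng.randrange(m) for _ in range(m)] for _ in range(B)]

def bootstrap_variance(z, stat, samples):
    """Variance of stat over the bootstrap samples, with the 1/(B-1) factor."""
    values = [stat([z[i] for i in idx]) for idx in samples]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def loo_bootstrap_error(xs, ys, fit, loss, samples):
    """Average loss on observation i over the samples that do not contain i."""
    models = [fit([xs[i] for i in idx], [ys[i] for i in idx]) for idx in samples]
    members = [set(idx) for idx in samples]
    total, used = 0.0, 0
    for i in range(len(xs)):
        c = [b for b in range(len(samples)) if i not in members[b]]
        if not c:          # i is in every sample: no out-of-sample prediction available
            continue
        total += sum(loss(ys[i], models[b](xs[i])) for b in c) / len(c)
        used += 1
    return total / used

def fit_constant(xs, ys):
    mean = sum(ys) / len(ys)
    return lambda x: mean

# Synthetic data, for illustration only.
rng = random.Random(1)
xs = list(range(20))
ys = [rng.gauss(5.0, 2.0) for _ in xs]

samples = bootstrap_indices(len(ys), B=100)
var_mean = bootstrap_variance(ys, lambda s: sum(s) / len(s), samples)
err_loo = loo_bootstrap_error(xs, ys, fit_constant, lambda y, p: (y - p) ** 2, samples)
```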
Exponential Family of Distributions
We have seen:
  regression: $y \mid x; \theta \sim \mathcal{N}(\mu, \sigma^2)$
  classification: $y \mid x; \theta \sim \mathrm{Bern}(\mu)$
Both can be shown to belong to the same framework: GLM.
Exponential family:
$p(\vec{y} \mid \vec{\eta}) = c(\vec{y})\, g(\vec{\eta}) \exp\{\vec{\eta}^T \vec{u}(\vec{y})\} = b(\vec{y}) \exp\{\vec{\eta}^T T(\vec{y}) - a(\vec{\eta})\}$
$\vec{y}$: scalar or vector, discrete or continuous
$\vec{\eta}$: canonical or natural parameters
$\vec{u}(\vec{y})$: function of $\vec{y}$
$g(\vec{\eta})$: ensures that the distribution is normalized: $g(\vec{\eta}) \int c(\vec{y}) \exp\{\vec{\eta}^T \vec{u}(\vec{y})\}\, d\vec{y} = 1$
The two notations correspond through $c(y) = b(y)$, $u(y) = T(y)$ and $g(\eta) = 1 / \exp(a(\eta))$.
Exponential Family of Distributions: Gaussian distribution
Gaussian distribution with $\sigma^2 = 1$ as a member of the exponential family:
$p(y \mid \mu) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}(y - \mu)^2 \right\} = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} y^2 \right\} \exp\left\{ \mu y - \frac{1}{2}\mu^2 \right\}$
$\eta = \mu$
$u(y) = y$
$c(y) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} y^2 \right\}$
$g(\eta) = \exp\left\{ -\frac{\mu^2}{2} \right\} = \exp\left\{ -\frac{\eta^2}{2} \right\}$
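Exponential Family of Distributions: Gaussian distribution
Gaussian distribution with general $\sigma^2$ as a member of the exponential family:
$p(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}(y - \mu)^2 \right\} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} y^2 \right\} \exp\left\{ \frac{\mu y}{\sigma^2} - \frac{1}{2\sigma^2}\mu^2 \right\}$
$\vec{\eta} = \left( \frac{\mu}{\sigma^2},\; -\frac{1}{2\sigma^2} \right)^T$
$\vec{u}(y) = (y,\; y^2)^T$
$c(y) = \frac{1}{\sqrt{2\pi}}$
$g(\vec{\eta}) = \sqrt{-2\eta_2}\, \exp\left\{ \frac{\eta_1^2}{4\eta_2} \right\}$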
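A quick numerical check can confirm that the exponential-family factorization reproduces the usual Gaussian density. The sketch below is illustrative only; the function names and the test values of $\mu$, $\sigma^2$ and $y$ are arbitrary choices made here.

```python
import math

def gaussian_pdf(y, mu, sigma2):
    """Standard form of the N(mu, sigma2) density."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def gaussian_expfam(y, mu, sigma2):
    """Same density written as c(y) * g(eta) * exp(eta^T u(y))."""
    eta1 = mu / sigma2                 # first natural parameter
    eta2 = -1.0 / (2 * sigma2)         # second natural parameter
    c = 1.0 / math.sqrt(2 * math.pi)
    g = math.sqrt(-2 * eta2) * math.exp(eta1 ** 2 / (4 * eta2))
    return c * g * math.exp(eta1 * y + eta2 * y * y)   # u(y) = (y, y^2)

# Arbitrary evaluation points and parameters.
max_diff = max(abs(gaussian_pdf(y, 1.5, 0.7) - gaussian_expfam(y, 1.5, 0.7))
               for y in [-1.0, 0.0, 2.5])
```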
Exponential Family of Distributions: Bernoulli distribution
Bernoulli distribution as a member of the exponential family:
$p(y \mid \mu) = \mathrm{Bern}(y \mid \mu) = \mu^y (1 - \mu)^{1 - y}$
$= \exp\{ y \log \mu + (1 - y) \log(1 - \mu) \}$   (exponent of log)
$= \exp\{ y \log \mu + \log(1 - \mu) - y \log(1 - \mu) \}$
$= (1 - \mu) \exp\left\{ \log\left( \frac{\mu}{1 - \mu} \right) y \right\}$
Link function: $\eta = \log \frac{\mu}{1 - \mu}$
Response function: $\mu = \sigma(\eta) = \frac{1}{1 + \exp(-\eta)}$
Since $1 - \mu = 1 - \sigma(\eta)$ and $1 - \sigma(\eta) = \sigma(-\eta)$:
$p(y \mid \eta) = \sigma(-\eta) \exp(\eta y)$
$u(y) = y$
$c(y) = 1$
$g(\eta) = \sigma(-\eta)$
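The link and response functions are inverses of each other, and the exponential-family form returns $\mu$ and $1 - \mu$ for $y = 1$ and $y = 0$. A minimal sketch to check this numerically, with a value of $\mu$ chosen arbitrarily for the example:

```python
import math

def sigmoid(eta):
    """Response function: maps the natural parameter to the mean."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(mu):
    """Link function: maps the mean to the natural parameter."""
    return math.log(mu / (1.0 - mu))

def bernoulli_expfam(y, eta):
    """p(y | eta) = sigma(-eta) * exp(eta * y), with c(y) = 1 and u(y) = y."""
    return sigmoid(-eta) * math.exp(eta * y)

mu = 0.3                       # arbitrary mean used for the check
eta = logit(mu)
p1 = bernoulli_expfam(1, eta)  # should equal mu
p0 = bernoulli_expfam(0, eta)  # should equal 1 - mu
```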
Exponential Family of Distributions: Multinomial distribution
$y \in \{1, 2, \ldots, k\}$ is modeled as a multinomial variable:
$y \mid \theta \sim \mathrm{Multinomial}(\vec{\mu})$ with $\sum_{j=1}^{k} \mu_j = 1$
$\mu_1, \ldots, \mu_{k-1}$ are independent parameters:
$p(y = j \mid \vec{\mu}) = \mu_j$ and $p(y = k \mid \vec{\mu}) = \mu_k = 1 - \sum_{j=1}^{k-1} \mu_j$
With the encoding $\vec{y} = (y_1, \ldots, y_k)$, where $y_j = 1$ for the observed class and 0 otherwise:
$p(\vec{y} \mid \vec{\mu}) = \prod_{j=1}^{k} \mu_j^{y_j} = \exp\left\{ \sum_{j=1}^{k} y_j \ln \mu_j \right\}$
$p(\vec{y} \mid \vec{\eta}) = \exp(\vec{\eta}^T \vec{y})$
$\eta_j = \ln \mu_j$, $\vec{\eta} = (\eta_1, \ldots, \eta_k)$
$\vec{u}(\vec{y}) = \vec{y}$
$c(\vec{y}) = 1$
$g(\vec{\eta}) = 1$
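Exponential Family of Distributions: Multinomial distribution (continued)
Removing the constraint $\sum_{j=1}^{k} \mu_j = 1$:
$\exp\left\{ \sum_{j=1}^{k} y_j \ln \mu_j \right\} = \exp\left\{ \sum_{j=1}^{k-1} y_j \ln \mu_j + \left( 1 - \sum_{j=1}^{k-1} y_j \right) \ln\left( 1 - \sum_{j=1}^{k-1} \mu_j \right) \right\}$
$= \exp\left\{ \sum_{j=1}^{k-1} y_j \ln \frac{\mu_j}{1 - \sum_{l=1}^{k-1} \mu_l} + \ln\left( 1 - \sum_{j=1}^{k-1} \mu_j \right) \right\}$
$\ln \frac{\mu_j}{1 - \sum_{l=1}^{k-1} \mu_l} = \eta_j$
Softmax function: $\mu_j = \frac{\exp(\eta_j)}{1 + \sum_{l=1}^{k-1} \exp(\eta_l)}$
$p(\vec{y} \mid \vec{\eta}) = \frac{\exp(\vec{\eta}^T \vec{y})}{1 + \sum_{j=1}^{k-1} \exp(\eta_j)}$
$\vec{u}(\vec{y}) = \vec{y}$
$c(\vec{y}) = 1$
$g(\vec{\eta}) = \frac{1}{1 + \sum_{j=1}^{k-1} \exp(\eta_j)}$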
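The mapping between the $k - 1$ natural parameters and the $k$ class probabilities can be checked by a round trip. This is a small sketch with hypothetical function names and an arbitrary probability vector; it relies on $1 - \sum_{l=1}^{k-1} \mu_l = \mu_k$.

```python
import math

def natural_from_mu(mu):
    """eta_j = ln(mu_j / mu_k) for j = 1..k-1, where mu_k = 1 - sum of the others."""
    return [math.log(m / mu[-1]) for m in mu[:-1]]

def softmax_from_natural(eta):
    """mu_j = exp(eta_j) / (1 + sum_l exp(eta_l)); the last class gets 1 / (1 + sum)."""
    denom = 1.0 + sum(math.exp(e) for e in eta)
    mu = [math.exp(e) / denom for e in eta]
    mu.append(1.0 / denom)
    return mu

mu = [0.2, 0.5, 0.3]              # arbitrary probabilities for k = 3 classes
eta = natural_from_mu(mu)         # k - 1 = 2 natural parameters
mu_back = softmax_from_natural(eta)
```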
Exponential Family of Distributions
Other distributions:
  Poisson (for counting problems)
  gamma and exponential (for continuous nonnegative random variables, such as time intervals)
  beta and Dirichlet (for distributions over probabilities)
Maximum Likelihood
Estimate the parameter $\vec{\eta}$ in a general exponential family distribution.
Training data: $X = (\vec{x}_1, \ldots, \vec{x}_m)$
$p(X \mid \vec{\eta}) = \left( \prod_{i=1}^{m} h(\vec{x}_i) \right) g(\vec{\eta})^m \exp\left\{ \vec{\eta}^T \sum_{i=1}^{m} \vec{u}(\vec{x}_i) \right\}$
Setting the gradient of the log-likelihood to zero gives:
$-\nabla \log g(\vec{\eta}_{ML}) = \frac{1}{m} \sum_{i=1}^{m} \vec{u}(\vec{x}_i)$
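The maximum likelihood condition says that the solution depends on the data only through the average of $u$. As a hedged illustration, the sketch below uses the Bernoulli case, where $g(\eta) = \sigma(-\eta)$ and $u(x) = x$, so the condition reduces to $\sigma(\eta_{ML})$ being the sample mean. The data are made up, and the gradient is approximated by a central finite difference.

```python
import math

def log_g(eta):
    """log g(eta) for the Bernoulli: log sigma(-eta) = -log(1 + exp(eta))."""
    return -math.log(1.0 + math.exp(eta))

# Made-up binary observations.
data = [1, 0, 1, 1, 0, 1, 0, 1]

mean_u = sum(data) / len(data)               # (1/m) * sum_i u(x_i), with u(x) = x
eta_ml = math.log(mean_u / (1.0 - mean_u))   # solves sigma(eta) = mean_u

# Check the condition -d/d eta log g(eta_ml) = mean_u numerically.
eps = 1e-6
neg_grad = -(log_g(eta_ml + eps) - log_g(eta_ml - eps)) / (2 * eps)
```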
Conjugate Priors
We seek a prior that is conjugate to the likelihood function, so that the posterior has the same functional form as the prior:
$p(\vec{\eta} \mid \vec{\chi}, \nu) = f(\vec{\chi}, \nu)\, g(\vec{\eta})^{\nu} \exp\{ \nu\, \vec{\eta}^T \vec{\chi} \}$
Constructing GLM
Consider a classification or a regression problem $(y, \vec{x})$: predict $y$ as a function of $\vec{x}$ (e.g., predict the number of page views on our website based on features such as time of day, advertising, etc.).
Assumptions:
1. $y \mid \vec{x}; \theta \sim \mathrm{ExpFam}(\vec{\eta})$
2. Given $\vec{x}$, predict the expected value of $u(y)$: if $u(y) = y$ then $h(\vec{x}) = E[y \mid \vec{x}]$
3. $\vec{\eta}$ and the input $\vec{x}$ are related linearly (linear predictor): $\eta = \vec{\theta}^T \vec{x}$ (componentwise $\eta_i = \vec{\theta}_i^T \vec{x}$ when $\vec{\eta}$ is a vector)
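Under these three assumptions, prediction amounts to computing the linear predictor and passing it through the response function of the chosen family. The sketch below is an illustration, not the lecture's code: the `RESPONSE` table and `glm_predict` are hypothetical names, and the exponential response for the Poisson case (natural parameter $\eta = \log \lambda$) is a standard fact that the slides do not derive.

```python
import math

# Canonical response functions mapping the natural parameter eta to E[y | x].
RESPONSE = {
    "gaussian": lambda eta: eta,                            # identity
    "bernoulli": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logistic sigmoid
    "poisson": math.exp,                                    # e.g. counts such as page views
}

def glm_predict(theta, x, family):
    """h(x) = response(theta^T x) for the given exponential-family member."""
    eta = sum(t * xi for t, xi in zip(theta, x))   # linear predictor (assumption 3)
    return RESPONSE[family](eta)

# Arbitrary parameters and input, chosen so that theta^T x = 0.
theta = [0.5, -1.0]
x = [1.0, 0.5]

pred_gaussian = glm_predict(theta, x, "gaussian")
pred_bernoulli = glm_predict(theta, x, "bernoulli")
pred_poisson = glm_predict(theta, x, "poisson")
```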
Ordinary Least Squares
$y \mid \vec{x}; \theta \sim \mathcal{N}(\mu, \sigma^2)$
$h_{\vec{\theta}}(\vec{x}) = E[y \mid \vec{x}; \theta]$   (assumption 2)
$= \mu$   (because normal)
$= \eta$   (assumption 1 + what was shown before)
$= \vec{\theta}^T \vec{x}$   (assumption 3)
Logistic Regression
$y \mid \vec{x}; \theta \sim \mathrm{Bern}(\mu)$
$h_{\vec{\theta}}(\vec{x}) = E[y \mid \vec{x}; \theta]$   (assumption 2)
$= \mu$   (because Bernoulli)
$= \frac{1}{1 + \exp(-\eta)}$   (assumption 1 + what was shown before)
$= \frac{1}{1 + \exp(-\vec{\theta}^T \vec{x})}$   (assumption 3)
This also answers the question of why the logistic sigmoid function was chosen.
$g(\eta) = E[\vec{u}(\vec{y}); \eta]$: canonical response function
$g^{-1}$: canonical link function
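To make the hypothesis $h_{\vec{\theta}}(\vec{x}) = 1 / (1 + \exp(-\vec{\theta}^T \vec{x}))$ concrete, the sketch below fits $\vec{\theta}$ by batch gradient ascent on the Bernoulli log-likelihood. The fitting procedure, learning rate, iteration count, and toy data are all assumptions made for this example; the slides do not prescribe an optimizer.

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def predict(theta, x):
    """h_theta(x) = sigmoid(theta^T x)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient ascent on the average log-likelihood."""
    theta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = yi - predict(theta, xi)       # gradient term (y - h(x)) * x
            for j, v in enumerate(xi):
                grad[j] += err * v
        theta = [t + lr * g / len(X) for t, g in zip(theta, grad)]
    return theta

# Toy data: first feature is a constant 1 (intercept); labels are not perfectly separable.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
y = [0, 0, 1, 0, 1, 1]

theta = fit_logistic(X, y)
p_low = predict(theta, [1.0, 0.5])    # estimated P(y = 1) for a small input
p_high = predict(theta, [1.0, 4.0])   # estimated P(y = 1) for a large input
```

Since larger inputs are associated with more positive labels in this toy set, the fitted slope is positive and the predicted probability increases with the input.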