
Estimation Theory Overview
J. McNames, Portland State University, ECE 4/557, Estimation Theory, Ver. 1.26



Slide 1: Estimation Theory Overview
• Properties
• Bias, Variance, and Mean Square Error
• Cramér-Rao lower bound
• Maximum likelihood
• Consistency
• Confidence intervals
• Properties of the mean estimator

Slide 2: Introduction
• Up until now we have defined and discussed properties of random variables and processes
• In each case we started with some known property (e.g. autocorrelation) and derived other related properties (e.g. PSD)
• In practical problems we rarely know these properties a priori
• Instead, we must estimate what we wish to know from finite sets of measurements

Slide 3: Terminology
• Suppose we have $N$ independent, identically distributed (i.i.d.) observations $\{x_i\}_{i=1}^{N}$
• Ideally we would like to know the pdf of the data, $f(x;\theta)$, where $\theta \in \mathbb{R}^{p \times 1}$
• In probability theory, we think about the "likeliness" of $\{x_i\}_{i=1}^{N}$ given the pdf and $\theta$
• In inference, we are given $\{x_i\}_{i=1}^{N}$ and are interested in the "likeliness" of $\theta$
• We will use $\theta$ to denote the parameter (or vector of parameters) we wish to estimate
• This could be, for example, the process mean $\mu_x$

Slide 4: Estimators as Random Variables
• Our estimator $\hat{\theta}\big(\{x_i\}_{i=1}^{N}\big)$ is a function of the measurements
• It is therefore a random variable
• It will be different for every different set of observations
• Its distribution is called the sampling distribution
• It is called an estimate or, if $\theta$ is a scalar, a point estimate
• Of course we want $\hat{\theta}$ to be as close to the true $\theta$ as possible
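To make the "estimator as a random variable" idea concrete, here is a minimal numerical sketch (not from the slides): it assumes Gaussian observations with hypothetical parameter values and repeatedly computes the sample mean on independent data sets, so the spread of the resulting estimates gives an empirical view of the sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N i.i.d. Gaussian observations with assumed true mean
# mu = 2.0 and standard deviation sigma = 1.0 (values chosen for illustration).
mu, sigma, N = 2.0, 1.0, 25
n_datasets = 10_000

# Draw many independent data sets and compute the estimate on each one.
estimates = np.array([rng.normal(mu, sigma, N).mean() for _ in range(n_datasets)])

# The estimator is itself a random variable: its sampling distribution is
# centered near the true mu and its spread shrinks as N grows.
print("mean of estimates:", estimates.mean())   # ~ 2.0
print("std  of estimates:", estimates.std())    # ~ sigma / sqrt(N) = 0.2
```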

Slide 5: Natural Estimators

$$\hat{\mu}_x = \hat{\theta}\big(\{x_i\}_{i=1}^{N}\big) = \frac{1}{N}\sum_{i=1}^{N} x_i$$

• This is the obvious or "natural" estimator of the process mean
• Sometimes called the average or sample mean
• It will also turn out to be the "best" estimator
• I will define "best" shortly

$$\hat{\sigma}_x^2 = \hat{\theta}\big(\{x_i\}_{i=1}^{N}\big) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu}_x)^2$$

• This is the obvious or "natural" estimator of the process variance
• Not the "best"

Slide 6: Good Estimators
[Figure: sampling distribution $f_{\hat{\theta}}(\hat{\theta})$ plotted against $\hat{\theta}$, with the true value $\theta$ marked]
• Without loss of generality, let us consider a scalar parameter $\theta$ for the time being
• What is a "good" estimator?
  – Distribution of $\hat{\theta}$ should be centered at the true value
  – Want the distribution to be as narrow as possible
• Lower-order moments enable coarse measurements of "good"

Slide 7: Bias
The bias of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined as
$$B(\hat{\theta}) \triangleq E[\hat{\theta}] - \theta$$
• Unbiased: an estimator is said to be unbiased if $B(\hat{\theta}) = 0$
• This implies the pdf of the estimator is centered at the true value $\theta$
• The sample mean is unbiased
• The estimator of variance on the earlier slide is biased
• Unbiased estimators are generally good, but they are not always best (more later)

Slide 8: Variance
The variance of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined as
$$\operatorname{var}(\hat{\theta}) = \sigma_{\hat{\theta}}^2 \triangleq E\!\left[\big(\hat{\theta} - E[\hat{\theta}]\big)^2\right]$$
• A measure of the spread of $\hat{\theta}$ about its mean
• Would like the variance to be as small as possible
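The bias definitions above can be checked numerically. The following sketch (illustrative only, with assumed Gaussian data and arbitrary parameter values) estimates the bias of the sample mean and of the natural $1/N$ variance estimator by Monte Carlo; the latter should come out close to $-\sigma^2/N$, consistent with the claim that it is biased.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: Gaussian data with assumed true mean 0 and true variance 4.
mu, var, N = 0.0, 4.0, 10
n_datasets = 100_000

mean_est = np.empty(n_datasets)
var_est = np.empty(n_datasets)
for k in range(n_datasets):
    x = rng.normal(mu, np.sqrt(var), N)
    mean_est[k] = x.mean()                      # natural estimator of the mean
    var_est[k] = np.mean((x - x.mean()) ** 2)   # natural (1/N) estimator of the variance

# Empirical bias: E[estimate] - true value.
print("bias of sample mean:     ", mean_est.mean() - mu)   # ~ 0 (unbiased)
print("bias of natural variance:", var_est.mean() - var)   # ~ -var/N (biased low)
print("theoretical variance bias:", -var / N)              # E[var_hat] = (N-1)/N * var
```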

Slides 9–10: The Bias-Variance Tradeoff
[Figures: sampling distributions $f_{\hat{\theta}}(\hat{\theta})$ versus $\hat{\theta}$, with the true value $\theta$ marked, illustrating estimators with different bias and variance]
• Understanding of the bias-variance tradeoff is crucial to this course
• In many cases minimizing variance conflicts with minimizing bias
• Unbiased models are not always best
• Note that $\hat{\theta} \triangleq 0$ has zero variance, but is generally biased
• In these cases we must trade variance for bias (or vice versa)
• The methods we will use to estimate the model coefficients are biased
• But they may be more accurate, because they have less variance
• This idea applies to nonlinear models as well

Slide 11: Bias, Variance, and Modeling
$$y = g(x) + \varepsilon \qquad g = g(x) \qquad \hat{g} = \hat{g}(x) \qquad \hat{g}_e = E[\hat{g}(x)]$$
• In the modeling context, we are usually interested in estimating a function
• For a given input $x$, this function is a scalar
• We can define $\theta = g(x)$
• Thus, all of the ideas that apply to estimating parameters also apply to estimating functional relationships

Slide 12: Notation and Prediction Error
$$y(x) = g(x) + \varepsilon \qquad \hat{y}(x) = \hat{g}(x)$$
• Expectation is taken over the distribution of data sets used to construct $\hat{g}(x)$ and the distribution of the process noise $f(\varepsilon)$
• Everything is a function of $x$
• Recall that $\varepsilon$ is i.i.d. with zero mean
• We are treating $x$ as a fixed, non-random variable
• The dependence on $x$ is not shown to simplify notation

The prediction error for a new, given input is defined as
$$\operatorname{PE}(x) = E[(y - \hat{g})^2] = E[((g - \hat{g}) + \varepsilon)^2] = E[(g - \hat{g})^2] + 2E[(g - \hat{g})\varepsilon] + E[\varepsilon^2] = \operatorname{MSE}(x) + \sigma_\varepsilon^2$$
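The decomposition $\operatorname{PE}(x) = \operatorname{MSE}(x) + \sigma_\varepsilon^2$ can be illustrated with a small simulation. The sketch below is not from the slides: it assumes a particular true function $g$, Gaussian noise, and a polynomial fit standing in for $\hat{g}$, then compares the directly measured prediction error at a fixed test input with $\operatorname{MSE}(x_0) + \sigma_\varepsilon^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: true function g, Gaussian noise, and a low-order
# polynomial fit standing in for the model g_hat (all values assumed).
def g(x):
    return np.sin(2 * np.pi * x)

sigma_eps = 0.3          # noise standard deviation (assumed)
N, degree = 30, 3        # training set size and polynomial degree (assumed)
x0 = 0.25                # fixed test input
n_datasets = 5_000

g_hat_x0 = np.empty(n_datasets)
for k in range(n_datasets):
    x = rng.uniform(0, 1, N)
    y = g(x) + rng.normal(0, sigma_eps, N)
    coeffs = np.polyfit(x, y, degree)          # fit g_hat on this data set
    g_hat_x0[k] = np.polyval(coeffs, x0)       # evaluate it at the test input

mse_x0 = np.mean((g(x0) - g_hat_x0) ** 2)      # MSE(x0) over data sets
y_new = g(x0) + rng.normal(0, sigma_eps, n_datasets)
pe_x0 = np.mean((y_new - g_hat_x0) ** 2)       # prediction error at x0

print("PE(x0)            :", pe_x0)
print("MSE(x0) + sigma^2 :", mse_x0 + sigma_eps ** 2)   # should be close to PE(x0)
```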

Slide 13: The Bias-Variance Tradeoff Derivation
$$y = g(x) + \varepsilon \qquad g = g(x) \qquad \hat{g} = \hat{g}(x) \qquad \hat{g}_e = E[\hat{g}(x)]$$
• Only $\hat{g}$ is a random function
• Nothing else is dependent on the data set

$$\begin{aligned}
\operatorname{MSE}(x) &= E[(g - \hat{g})^2] \\
&= E\!\left[\{(g - \hat{g}_e) - (\hat{g} - \hat{g}_e)\}^2\right] \\
&= E\big[\underbrace{(g - \hat{g}_e)^2 - 2(g - \hat{g}_e)(\hat{g} - \hat{g}_e)}_{①} + \underbrace{(\hat{g} - \hat{g}_e)^2}_{②}\big]
\end{aligned}$$

Slide 14: Bias-Variance Tradeoff Derivation Continued
$$\begin{aligned}
① &= E\!\left[(g - \hat{g}_e)^2 - 2(g - \hat{g}_e)(\hat{g} - \hat{g}_e)\right] \\
&= E\!\left[g^2 - 2g\hat{g}_e + \hat{g}_e^2 - 2g\hat{g} + 2g\hat{g}_e + 2\hat{g}_e\hat{g} - 2\hat{g}_e^2\right] \\
&= g^2 - 2g\hat{g}_e + \hat{g}_e^2 - 2g\hat{g}_e + 2g\hat{g}_e + 2\hat{g}_e^2 - 2\hat{g}_e^2 \\
&= g^2 - 2g\hat{g}_e + \hat{g}_e^2 \\
&= (g - \hat{g}_e)^2
\end{aligned}$$
where the third line uses $E[\hat{g}] = \hat{g}_e$ and the fact that only $\hat{g}$ is random, and $② = E[(\hat{g} - \hat{g}_e)^2]$.

Thus
$$\operatorname{MSE}(x) = ① + ② = (g - \hat{g}_e)^2 + E[(\hat{g} - \hat{g}_e)^2] = (g - E[\hat{g}])^2 + E\!\left[(\hat{g} - E[\hat{g}])^2\right]$$

Slide 15: Bias-Variance Tradeoff Comments
$$\operatorname{MSE}(x) = (g - E[\hat{g}])^2 + E\!\left[(\hat{g} - E[\hat{g}])^2\right] = \text{Bias}^2 + \text{Variance}$$
• Large variance: the model is sensitive to small changes in the data set
• Large bias: if the model were compared to the true function on a large number of data sets, the expected value of the model $\hat{g}(x)$ would not be close to the true function $g(x)$
• If the model is sensitive to small changes in the data, a biased model may have smaller error (MSE) than an unbiased model
• If the data are strongly collinear, biased models can result in more accurate models!

Slide 16: Bias-Variance Tradeoff Comments Continued
$$\operatorname{MSE}(x) = (g - E[\hat{g}])^2 + E\!\left[(\hat{g} - E[\hat{g}])^2\right] = \text{Bias}^2 + \text{Variance}$$
• Large variance, small bias
  – If the model is too flexible, it can overfit the data
  – The model will change dramatically from one data set to another
  – In this case it has high variance, but potentially low bias
• Small variance, large bias
  – If the model is not very flexible, it may not capture the true relationship between the inputs and the output
  – It will not vary as much from one data set to another
  – In this case the model has low variance, but potentially high bias
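As a numerical companion to the comments above, the following sketch (a hypothetical setup, not from the slides) verifies MSE = Bias² + Variance for two estimators of a mean: the unbiased sample mean and a shrunk version $c\,\bar{x}$. With the assumed numbers the biased estimator trades a small squared bias for a larger reduction in variance and ends up with the lower MSE, which is exactly the tradeoff being described.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical illustration: compare the unbiased sample mean with a shrunk
# (biased) estimator c * mean, and check MSE = Bias^2 + Variance empirically.
mu, sigma, N = 1.0, 3.0, 5          # true mean, noise std, sample size (assumed)
c = 0.7                             # shrinkage factor (assumed)
n_datasets = 200_000

means = np.array([rng.normal(mu, sigma, N).mean() for _ in range(n_datasets)])
shrunk = c * means

for name, est in [("sample mean", means), ("shrunk mean", shrunk)]:
    bias = est.mean() - mu
    variance = est.var()
    mse = np.mean((est - mu) ** 2)
    # bias^2 + variance should match the directly computed MSE
    print(f"{name}: bias^2 + var = {bias**2 + variance:.4f}, MSE = {mse:.4f}")
```

With these assumed values the sample mean has MSE near $\sigma^2/N = 1.8$, while the shrunk estimator's MSE is roughly $1.0$ despite its nonzero bias.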
