Estimation Theory
J. McNames, Portland State University, ECE 538/638, Ver. 1.09

Overview
• Properties
• Bias, variance, and mean square error
• Cramér-Rao lower bound
• Maximum likelihood
• Consistency
• Confidence intervals
• Properties of the mean estimator
• Properties of the variance estimator
• Examples

Introduction
• Up until now we have defined and discussed properties of random variables and processes
• In each case we started with some known property (e.g. autocorrelation) and derived other related properties (e.g. PSD)
• In practical problems we rarely know these properties a priori
• Instead, we must estimate what we wish to know from finite sets of measurements

Terminology
• Suppose we have N observations {x(n)}|_0^{N-1} collected from a WSS stochastic process
• This is one realization of the random process {x(n, ζ)}, n = 0, ..., N-1
• Ideally we would like to know the joint pdf

    f(x_1, x_2, \ldots, x_N; \theta_1, \theta_2, \ldots, \theta_p)

• Here θ_1, ..., θ_p are unknown parameters of the joint pdf
• In probability theory, we think about the likeliness of {x(n)}|_0^{N-1} given the pdf and θ
• In inference, we are given {x(n)}|_0^{N-1} and are interested in the likeliness of θ
• We will use θ to denote a scalar parameter (or a vector of parameters) we wish to estimate

Estimators as Random Variables
• Our estimator is a function of the measurements,

    \hat{\theta} = \hat{\theta}\left(\{x(n)\}_{0}^{N-1}\right)

• It is therefore a random variable
• It will be different for every different set of observations
• Its value is called an estimate or, if θ is a scalar, a point estimate
• Of course we want θ̂ to be as close to the true θ as possible
• The distribution of θ̂ is called the sampling distribution (illustrated numerically in the sketch below)
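• A minimal numerical sketch of this idea (not from the lecture): each new record of N observations yields a different value of the sample-mean estimator, and repeating the experiment many times traces out its sampling distribution. The Gaussian white-noise model and the values μ = 1, σ = 2, N = 100 are illustrative assumptions.

    import numpy as np

    # Each trial draws a fresh record of N observations and computes the sample
    # mean; the collection of estimates approximates the sampling distribution.
    rng = np.random.default_rng(0)
    N, trials = 100, 5000
    estimates = np.array([rng.normal(1.0, 2.0, N).mean() for _ in range(trials)])

    print("mean of the estimates:", estimates.mean())  # clusters near the true mean 1.0
    print("spread (std dev)     :", estimates.std())   # roughly sigma/sqrt(N) = 0.2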
Natural Estimators

    \hat{\mu}_x = \hat{\theta}\left(\{x(n)\}_{0}^{N-1}\right) = \frac{1}{N}\sum_{n=0}^{N-1} x(n)

• This is the obvious or "natural" estimator of the process mean
• Sometimes called the average or sample mean
• It will also turn out to be the "best" estimator
• I will define "best" shortly

    \hat{\sigma}_x^2 = \hat{\theta}\left(\{x(n)\}_{0}^{N-1}\right) = \frac{1}{N}\sum_{n=0}^{N-1}\left[x(n) - \hat{\mu}_x\right]^2

• This is the obvious or "natural" estimator of the process variance
• Not the "best"

Good Estimators
(Figure: sampling pdf f_θ̂(θ̂) centered near the true value θ)
• What is a "good" estimator?
  – Distribution of θ̂ should be centered at the true value
  – Want the distribution to be as narrow as possible
• Lower-order moments enable coarse measurements of "goodness"

Bias
Bias of an estimator θ̂ of a parameter θ is defined as

    B(\hat{\theta}) \triangleq \mathrm{E}[\hat{\theta}] - \theta

Normalized bias of an estimator θ̂ of a non-negative parameter θ is defined as

    \varepsilon_b \triangleq \frac{B(\hat{\theta})}{\theta}

• Unbiased: an estimator is said to be unbiased if B(θ̂) = 0
• This implies the pdf of the estimator is centered at the true value θ
• The sample mean is unbiased
• The estimator of variance on the earlier slide is biased (both are checked numerically in the sketch below)
• Unbiased estimators are generally good, but they are not always best (more later)

Variance
Variance of an estimator θ̂ of a parameter θ is defined as

    \operatorname{var}(\hat{\theta}) = \sigma_{\hat{\theta}}^2 \triangleq \mathrm{E}\left[\left(\hat{\theta} - \mathrm{E}[\hat{\theta}]\right)^2\right]

Normalized standard deviation of an estimator θ̂ of a non-negative parameter θ is defined as

    \varepsilon_r \triangleq \frac{\sigma_{\hat{\theta}}}{\theta}

• A measure of the spread of θ̂ about its mean
• Would like the variance to be as small as possible
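• A minimal numerical sketch of these definitions (not from the lecture): with Gaussian white noise of true mean 0 and true variance 4, the sample mean should show essentially zero bias, while the "natural" 1/N variance estimator should show a bias of about −σ²_x/N = −0.4 for N = 10. The noise model and parameter values are illustrative assumptions.

    import numpy as np

    # Monte Carlo estimate of the bias of the sample mean and of the 1/N
    # ("natural") variance estimator.
    rng = np.random.default_rng(1)
    N, trials, mu, var = 10, 200_000, 0.0, 4.0

    mean_est = np.empty(trials)
    var_est = np.empty(trials)
    for t in range(trials):
        x = rng.normal(mu, np.sqrt(var), N)
        mean_est[t] = x.mean()                    # sample mean
        var_est[t] = np.mean((x - x.mean())**2)   # 1/N variance estimate

    print("bias of the sample mean  :", mean_est.mean() - mu)   # ~ 0 (unbiased)
    print("bias of the variance est.:", var_est.mean() - var)   # ~ -var/N = -0.4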
Bias-Variance Tradeoff
(Figure: two sampling pdfs f_θ̂(θ̂), one narrow but off-center, one centered on θ but wide)
• In many cases minimizing variance conflicts with minimizing bias
• Note that θ̂ ≜ 0 has zero variance, but is generally biased
• In these cases we must trade variance for bias (or vice versa)
• We will use the MSE as a global measure of estimator performance
• This criterion is convenient for building estimators
• Creating a problem we can solve

Mean Square Error
Mean square error of an estimator θ̂ of a parameter θ is defined as

    \mathrm{MSE}(\hat{\theta}) \triangleq \mathrm{E}\left[|\hat{\theta} - \theta|^2\right] = \sigma_{\hat{\theta}}^2 + |B(\hat{\theta})|^2

Normalized MSE of an estimator θ̂ of a parameter θ is defined as

    \varepsilon \triangleq \frac{\mathrm{MSE}(\hat{\theta})}{\theta}, \qquad \theta \neq 0

• The decomposition of the MSE into variance plus bias squared is very similar to the DC and AC decomposition of signal power
• Note that two different estimators may have the same MSE, but different bias and variance

Cramér-Rao Lower Bound

    \operatorname{var}(\hat{\theta}) \ge \frac{1}{\mathrm{E}\left[\left(\frac{\partial \ln f_{x;\theta}(x;\theta)}{\partial\theta}\right)^2\right]} = \frac{-1}{\mathrm{E}\left[\frac{\partial^2 \ln f_{x;\theta}(x;\theta)}{\partial\theta^2}\right]}

• Minimum Variance Unbiased (MVU): estimators that are both unbiased and have the smallest variance of all possible estimators
• Note that these do not necessarily achieve the minimum MSE
• The Cramér-Rao lower bound (CRLB) is a lower bound on the variance of unbiased estimators
• Derived in the text
• Log likelihood function of θ is ln f_{x;θ}(x;θ)
• Note that the pdf f_{x;θ}(x;θ) describes the distribution of the data (stochastic process), not the parameter
• Recall that θ is not a random variable; it is a parameter that defines the distribution

Cramér-Rao Lower Bound Comments
• Efficient estimator: an unbiased estimator that achieves the CRLB with equality
• If it exists, then the unique solution is given by

    \frac{\partial \ln f_{x;\theta}(x;\theta)}{\partial\theta} = 0

  where the pdf is evaluated at the observed outcome x(ζ)
• Maximum Likelihood (ML) estimate: an estimator that satisfies the equation above
• This can be generalized to vectors of parameters
• Limited use: f_{x;θ}(x;θ) is rarely known in practice (the sketch below treats the Gaussian case, where it is known)
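• A minimal numerical sketch of the CRLB (not from the lecture): for N i.i.d. Gaussian samples with known variance σ², the Fisher information for the mean is N/σ², so the bound is σ²/N; the sample mean is unbiased with exactly that variance, so it is efficient. The values μ = 3, σ = 2, N = 25 are illustrative assumptions.

    import numpy as np

    # CRLB check for estimating the mean of i.i.d. Gaussian data with known
    # variance: the bound is sigma^2 / N, and the sample mean attains it.
    rng = np.random.default_rng(2)
    N, trials, mu, sigma = 25, 100_000, 3.0, 2.0

    crlb = sigma**2 / N                                     # analytic bound = 0.16
    est = rng.normal(mu, sigma, (trials, N)).mean(axis=1)   # sample-mean estimates

    print("CRLB                    :", crlb)
    print("empirical var of mu_hat :", est.var())                # ~ 0.16 (bound met)
    print("empirical MSE of mu_hat :", np.mean((est - mu)**2))   # variance + bias^2 ~ 0.16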
Consistency
• Consistent estimator: an estimator such that

    \lim_{N\to\infty} \mathrm{MSE}(\hat{\theta}) = 0

• Implies the following as the sample size grows (N → ∞):
  – The estimator becomes unbiased
  – The variance approaches zero
  – The distribution f_θ̂(x) becomes an impulse centered at θ

Confidence Intervals
• Confidence interval: an interval, a ≤ θ ≤ b, that has a specified probability of covering the unknown true parameter value,

    \Pr\{a < \theta \le b\} = 1 - \alpha

• The interval is estimated from the data; therefore it is also a pair of random variables
• Confidence level: the coverage probability of a confidence interval, 1 − α
• The confidence interval is not uniquely defined by the confidence level
• More later

Properties of the Sample Mean

    \hat{\mu}_x \triangleq \frac{1}{N}\sum_{n=0}^{N-1} x(n), \qquad \mathrm{E}[\hat{\mu}_x] = \mu_x

    \operatorname{var}(\hat{\mu}_x) = \frac{1}{N}\sum_{\ell=-N}^{N}\left(1 - \frac{|\ell|}{N}\right)\gamma_x(\ell) \le \frac{1}{N}\sum_{\ell=-N}^{N}\gamma_x(\ell)

• If x(n) is white noise, then this reduces to var(μ̂_x) = σ²_x/N
• The estimator is unbiased
• If γ_x(ℓ) → 0 as ℓ → ∞, then var(μ̂_x) → 0 (the estimator is consistent)
• The variance increases as the correlation of x(n) increases
• In processes with long memory or heavy tails, it is harder to estimate the mean

Sample Mean Confidence Intervals

    f_{\hat{\mu}_x}(\hat{\mu}_x) = \frac{1}{\sqrt{2\pi}\,(\sigma_x/\sqrt{N})} \exp\left[-\frac{1}{2}\left(\frac{\hat{\mu}_x - \mu_x}{\sigma_x/\sqrt{N}}\right)^2\right]

    \Pr\left\{\mu_x - k\frac{\sigma_x}{\sqrt{N}} < \hat{\mu}_x < \mu_x + k\frac{\sigma_x}{\sqrt{N}}\right\} = \Pr\left\{\hat{\mu}_x - k\frac{\sigma_x}{\sqrt{N}} < \mu_x < \hat{\mu}_x + k\frac{\sigma_x}{\sqrt{N}}\right\} = 1 - \alpha

• In general, we don't know the pdf
• If we can assume the process is Gaussian and IID, we know the pdf (sampling distribution) of the estimator
• If N is large and the distribution doesn't have heavy tails, the distribution of μ̂_x is Gaussian by the Central Limit Theorem (CLT); a numerical sketch of the interval follows below
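• A minimal numerical sketch of the interval above (not from the lecture), assuming σ_x is known and μ̂_x is approximately Gaussian, so that k = 1.96 gives a 95% (α = 0.05) interval. The white Gaussian data with μ = 5, σ = 3, N = 400 are illustrative assumptions.

    import numpy as np

    # Build the interval mu_hat +/- k*sigma_x/sqrt(N) from one record of data.
    rng = np.random.default_rng(3)
    N, mu, sigma, k = 400, 5.0, 3.0, 1.96   # k = 1.96 <=> 95% coverage for a Gaussian

    x = rng.normal(mu, sigma, N)
    mu_hat = x.mean()
    half_width = k * sigma / np.sqrt(N)

    print("mu_hat            :", mu_hat)
    print("95% interval      :", (mu_hat - half_width, mu_hat + half_width))
    print("covers true mean? :", mu_hat - half_width < mu < mu_hat + half_width)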