Linear Models

J. McNames, Portland State University, ECE 4/557, Linear Models, Ver. 1.27

Overview
• Introduction & Model Assumptions
• Matrix Representation
• Coefficient Estimation
• Least Squares
• Properties
• Analysis of Variance
• Performance Coefficients
• Statistical Inferences
• Prediction Errors

Introduction & Justification
We are discussing linear models for the following reasons:
• In many practical situations, the assumptions underlying linear models do apply
• The theory and methods are very rich
• These simple models often work reasonably well on difficult problems
• They demonstrate the broad range of information that can be obtained from a model; we would like similar information from nonlinear models
• Linear models form the core of many nonlinear models
• "Many" relationships in data are simple (monotonic)

Notation
• The data set is of fixed size n
• Consider the ith set of input-output points in the data set
• Input variables
  – The inputs of the ith set can be collected into a vector x_i
  – x_i is a column vector with p − 1 elements, x_i ∈ R^{p−1}
  – Thus, there are p − 1 model inputs
  – Individual inputs of a vector of inputs will be written as x_{i,j}
• Notation for vectors
  – Sometimes x_i will represent a column vector of all the model inputs in the ith set of input-output pairs in the data set
  – Other times x_i will represent the ith point (scalar) in the input vector x
  – The latter may also be denoted x_{·,i}
  – The distinction will be clear from context and use of boldface for vectors and matrices

Notation Continued
• Output variables
  – Assume a single model output
  – This does not reduce the generality
  – Could create a separate linear model for each output
  – There is a more computationally efficient method (see me if interested)
  – y_i: output of the ith set of input-output points
• Will also use ambiguous notation for outputs
  – Sometimes y is a vector of all the outputs in the data set
  – Other times y is a scalar output due to a single input vector x
  – Will try to keep this clear by context and use of boldface for vectors and matrices
• Note that in this set of notes, random variables are not represented with capital letters (e.g., y, ε)
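To make the notation concrete, here is a minimal sketch (Python/NumPy is assumed here, rather than the MATLAB used elsewhere in these notes; the data values are arbitrary). Each row of the array X holds one input vector x_i with p − 1 entries, and y collects the n scalar outputs.

```python
import numpy as np

# Hypothetical data set: n = 4 points, p - 1 = 2 inputs per point
n, p_minus_1 = 4, 2
rng = np.random.default_rng(0)

X = rng.standard_normal((n, p_minus_1))  # row i is the input vector x_i
y = rng.standard_normal(n)               # y[i] is the scalar output y_i

x_2 = X[2, :]    # the input vector of the 3rd input-output pair (0-based)
x_2_1 = X[2, 1]  # an individual input x_{i,j} with i = 2, j = 1

print(X.shape)   # n rows, p - 1 columns
```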
Statistical Model for the Data: Assumptions

Statistical model:
  y = ω_0 + Σ_{j=1}^{p−1} ω_j x_j + ε

With linear models, we make the following key assumptions:
• The output variable is related to the input variables by the linear relationship shown above
• All of the inputs x_{i,j} are known exactly (non-random)
• ε is a random error term
  – ε_i has zero mean: E[ε_i] = 0
  – ε has constant variance: E[ε_i²] = E[ε_k²] = σ_ε² for all i, k
  – ε_i and ε_k are uncorrelated for i ≠ k
  – We will sometimes assume ε_i ~ N(0, σ_ε²) for all i

Linear Model Comments

Statistical model: y = ω_0 + Σ_{j=1}^{p−1} ω_j x_j + ε
Regression model: ŷ = w_0 + Σ_{j=1}^{p−1} w_j x_j

• The p model parameters (ω_j) are unknown
• Our goal: estimate y given an input vector x and a data set
• The noise term, ε, is independent of x and therefore unpredictable
  – Fortunately, the optimal estimator for the random-inputs case is the same
• E[y] = ω_0 + Σ_{j=1}^{p−1} ω_j x_j
• Our estimate (regression model) will be of the same form
• If w_j = ω_j, then ŷ = E[y]
• E[y] is the optimal model in the sense of minimum MSE
• Our goal is equivalent to estimating the model parameters ω_j

Where Does ε Come From?

[Diagram: a process with observed inputs x_1, …, x_{p−1}, unobserved variables z_1, …, z_{n_z}, and observed output y]

• In general, y = f(x_1, …, x_{p−1}, z_1, …, z_{n_z})
• We only have access to x_1, …, x_{p−1}
• For this topic, we are assuming that the output has the following relationship:
  y = ω_0 + Σ_{j=1}^{p−1} ω_j x_j + f_z(z_1, …, z_{n_z})
• Here ε = f_z(z_1, …, z_{n_z}) accounts for the effect of all these unknown variables
• If ε is of the form ε = Σ_{j=1}^{n_z} α_j z_j, the CLT applies and ε ~ N(0, σ_ε²)

Linear Model Performance Limits

Statistical model: y = ω_0 + Σ_{j=1}^{p−1} ω_j x_j + ε
Regression model: ŷ = w_0 + Σ_{j=1}^{p−1} w_j x_j

• Even in the best-case scenario w_j = ω_j, our model will not be perfect: y − E[y; x] = y − ŷ = ε
• The random component of y is not predictable
• ε represents the net contribution of the unmeasured effects on the output y
• ε is unpredictable given the data set and the model inputs
• Since ε is the only random variable in the expression for y, it is easy to show that var(y) = σ_ε²
• Recall that var(c + X) = var(X) where X is a random variable and c is a constant
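These properties can be checked empirically with a small simulation (a sketch in Python/NumPy; the coefficients and σ_ε chosen here are arbitrary). For a fixed, non-random input vector x, y varies only through ε, so the sample mean of y approaches E[y] = ω_0 + Σ_j ω_j x_j and the sample variance approaches σ_ε².

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary "true" process parameters, with p - 1 = 2 inputs
w0 = 1.0
w = np.array([2.0, -0.5])
sigma_eps = 0.3

x = np.array([0.4, 1.2])      # one fixed, non-random input vector
n_trials = 200_000
eps = rng.normal(0.0, sigma_eps, n_trials)
y = w0 + w @ x + eps          # y = omega_0 + sum_j omega_j x_j + eps

Ey = w0 + w @ x               # E[y] = omega_0 + sum_j omega_j x_j

print(abs(y.mean() - Ey))            # small: sample mean approaches E[y]
print(abs(y.var() - sigma_eps**2))   # small: var(y) = sigma_eps^2
```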
Geometric Interpretation

• The process parameter ω_i represents how much influence the model input x_i (scalar) has on the process output y:
  ∂y/∂x_i = ω_i
• This is useful information that many nonlinear models lack
• Geometrically, ŷ = f(x) represents a hyperplane
• This is also called the response surface
• For a single input, the response surface is a line

Example 1: Linear Surface

[Figure: "1-Dimensional Response Surface Example" — a scatter plot of the data set with the least-squares approximation and the true model overlaid; legend: Data Set, Approximation, True Model; axes: Input x, Output y]

Example 1: MATLAB Code

function [] = LinearSurface();
close all;
FigureSet(1,4.5,2.8);
N  = 20;
x  = rand(N,1);
y  = 2*x - 1 + randn(N,1);
xh = 0:0.01:1;
A  = [ones(N,1) x];
b  = y;
w  = pinv(A)*b;
yh = w(1) + w(2)*xh; % y_hat
yt = -1 + 2*xh;      % y_true
h  = plot(x,y,'ko',xh,yh,'b',xh,yt,'r');
set(h(1),'MarkerSize',2);
set(h(1),'MarkerFaceColor','k');
set(h(1),'MarkerEdgeColor','k');
set(h,'LineWidth',1.5);
xlabel('Input x');
ylabel('Output y');
title('1-Dimensional Response Surface Example');
set(gca,'Box','Off');
grid on;
AxisSet(8);
legend('Data Set','Approximation','True Model',2);
print -depsc LinearSurface;

Practical Application

• Often the assumptions we make will be invalid
• People (engineers, researchers, etc.) use these models profusely anyway
• In most ways, linear models are robust
• They usually still generate reasonable fits even if the assumptions are incorrect
• Nonlinear models make fewer assumptions, but have other problems
• There is no perfect method for modeling
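The same fit can be reproduced outside MATLAB; here is a sketch in Python/NumPy (an assumption for illustration: `np.linalg.lstsq` plays the role of `pinv(A)*b`, and the true model y = 2x − 1 matches the example above). A defining property of the least-squares solution is that the residual y − ŷ is orthogonal to the columns of the design matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
x = rng.uniform(0.0, 1.0, n)
y = 2.0 * x - 1.0 + rng.standard_normal(n)  # true model plus unit-variance noise

A = np.column_stack([np.ones(n), x])        # design matrix [1, x]
w, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares estimate of [w0, w1]

y_hat = A @ w                               # fitted values on the data set
resid = y - y_hat

# Normal equations: A^T (y - y_hat) = 0 for the least-squares fit
print(np.abs(A.T @ resid).max())  # numerically zero
```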
Process Diagram / Model Diagram

[Diagrams: the statistical model sums ω_0 and the weighted inputs ω_1 x_1, …, ω_{p−1} x_{p−1} plus ε to produce y; the regression model sums w_0 and w_1 x_1, …, w_{p−1} x_{p−1} to produce ŷ]

• Often pseudo block-diagrams are used to show the flow of computation in the statistical model or regression model
• The diagram above illustrates how the output of the statistical model is generated
• A similar diagram can be drawn for the regression model
• We want w_j ≈ ω_j so that ŷ ≈ E[y; x]
• We cannot estimate ε from knowledge of x
• Our primary goal is to estimate y
• An equivalent goal is to estimate the parameters ω_i, if the statistical model is accurate

Nominal Data & Classification

• Linear models can be used with nominal data as well as interval data
• Use a simple transformation
• Example:
  x_i = 1 if male, 0 if female
• Can also be used for classification
• For example, if the data set outputs are encoded as
  y = 1 if success, 0 if failure
  then the decision rule for predicted outputs during application of the model could be:
  If ŷ < 0.5, declare failure. Otherwise, declare success.

Efficient Nominal Data Encoding

• Suppose the process output (or input) is one of four colors: red, blue, yellow, or green
• The best way to encode this is to declare a model output for each category:

  Category   y_1   y_2   y_3   y_4
  Red        +1    −1    −1    −1
  Blue       −1    +1    −1    −1
  Yellow     −1    −1    +1    −1
  Green      −1    −1    −1    +1

• This example requires constructing 4 models
• For new inputs, the outputs of the models will not be binary
• For example, y_1 = 0.6, y_2 = −1.1, y_3 = 0.8, y_4 = 0
• How do you choose what the final nominal output is?
• This encoding yields additional information about the inputs
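The four-color encoding above can be sketched as follows (Python is assumed here for illustration). Taking the category whose model output is largest is one common way to resolve the non-binary outputs, consistent with the question posed above; the example uses the same hypothetical outputs y_1 = 0.6, y_2 = −1.1, y_3 = 0.8, y_4 = 0.

```python
import numpy as np

categories = ["Red", "Blue", "Yellow", "Green"]

# +1/-1 target encoding: one model output per category
targets = {c: np.where(np.arange(4) == k, 1.0, -1.0)
           for k, c in enumerate(categories)}

# Hypothetical outputs of the four trained models for one new input
y_out = np.array([0.6, -1.1, 0.8, 0.0])

# One common decision rule: pick the category with the largest model output
predicted = categories[int(np.argmax(y_out))]
print(predicted)  # "Yellow", since 0.8 is the largest output
```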