
Topic 5: Non-Linear Relationships and Non-Linear Least Squares


  1. Non-linear Relationships
     Many relationships between variables are non-linear. (Examples)
     OLS may not work when assumption A.1 (linearity in the parameters) fails: the estimator may be biased and inconsistent.
     In other situations, we may still be able to use OLS, either by approximating the non-linear relationship, or by appropriately transforming the population model.

  2. - The models we've worked with so far have been linear in the parameters.
     - They've been of the form: $\boldsymbol{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
     - Many models based on economic theory are actually non-linear in the parameters.
     - In general: $\boldsymbol{y} = f(\boldsymbol{\theta}; X) + \boldsymbol{\varepsilon}$, where $f$ is non-linear.
     - Note the linear model is a special case.

  3. Transforming a non-linear population model
     Cobb-Douglas production function:
     $$Y = A K^{\beta_2} L^{\beta_3} \varepsilon$$
     By taking logs, the Cobb-Douglas production function can be rewritten as:
     $$\log Y = \beta_1 + \beta_2 \log K + \beta_3 \log L + \log(\varepsilon)$$
     where $\beta_1 = \log A$. This model now satisfies A.1 (linear in the parameters). However, it is not advisable to estimate it by OLS in most cases.
     Silva and Tenreyro (2006) [1]: If $\log(\varepsilon)$ is heteroskedastic (it likely is), $X$ and $\boldsymbol{\varepsilon}$ are not independent!
     [1] Silva and Tenreyro (2006). The Log of Gravity. The Review of Economics and Statistics.
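
To see why heteroskedasticity matters here, the following is a sketch under an added assumption that is not in the slides: suppose $\varepsilon$ is log-normal with $E[\varepsilon \mid X] = 1$ and conditional variance $\sigma^2(X)$. The symbols $\mu(X)$, $s^2(X)$ and $\sigma^2(X)$ are introduced only for this illustration.

```latex
\log \varepsilon \mid X \sim N\!\left(\mu(X),\, s^2(X)\right), \qquad
E[\varepsilon \mid X] = e^{\mu(X) + s^2(X)/2} = 1
\;\Rightarrow\; \mu(X) = -\tfrac{1}{2}\, s^2(X)
\\[6pt]
\operatorname{Var}(\varepsilon \mid X) = e^{s^2(X)} - 1 = \sigma^2(X)
\;\Rightarrow\;
E[\log \varepsilon \mid X] = -\tfrac{1}{2} \log\!\left(1 + \sigma^2(X)\right)
```

Under this assumption the error of the log model has a conditional mean that moves with $X$ whenever $\sigma^2(X)$ does, which is the kind of dependence that makes OLS on the log model inconsistent.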

  4. "It may be surprising that the pattern of heteroscedasticity … can affect the consistency of an estimator, rather than just its efficiency. The reason is that the nonlinear transformation … changes the properties of the error term in a nontrivial way"
     Approximations
     Some mathematical properties may be exploited in order to approximate the function $f(\boldsymbol{\theta}; X)$.
     - Polynomials
     - Logarithms
     - Dummy variables

  5. Polynomial Regression Model
     One way to characterize the non-linear relationship between $y$ and $x$ is to say that the marginal effect of $x$ on $y$ depends on the value of $x$ itself.
     - Just include powers of the regressors on the right-hand side
     - Not a violation of A.2
     - e.g. $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \varepsilon$
     - Take the derivative: $\partial y / \partial x = \beta_1 + 2\beta_2 x + 3\beta_3 x^2 + \cdots$
     - Choosing $\boldsymbol{\beta}$ approximates the non-linear function $f$
     - The validity of the approximation is based on a Taylor-series expansion
     - The appropriate order of the polynomial may be determined through a series of t-tests
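
As a small illustration of how the marginal effect in a polynomial model changes with $x$, here is a minimal Python sketch. The function name `marginal_effect` and the coefficient values are hypothetical and chosen only for the example.

```python
def marginal_effect(beta, x):
    """Derivative of beta[0] + beta[1]*x + beta[2]*x**2 + ... with respect to x."""
    return sum(k * b * x ** (k - 1) for k, b in enumerate(beta) if k >= 1)

# Hypothetical quadratic: y = 1.0 + 0.5*x - 0.02*x**2 + error
beta = [1.0, 0.5, -0.02]

for x in (0.0, 5.0, 10.0):
    # The effect of x on y shrinks as x grows because beta[2] is negative.
    print(x, marginal_effect(beta, x))
```

With these made-up coefficients the marginal effect falls from 0.5 at $x = 0$ to roughly 0.1 at $x = 10$.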

  6. Logarithms
     Can take the logarithm of the LHS and/or RHS variables.
     - The $\beta$s have approximate percentage-change interpretations
     - log-lin
     - lin-log
     - log-log
     For example:
     $$\log wage = \beta_0 + \beta_1\, educ + \beta_2\, female + \cdots + \varepsilon$$
     - Take the derivative w.r.t. $educ$
     - A one-unit change in $educ$ leads to a multiplicative change of $\exp(\beta_1)$ in $wage$
     - This is approximately a $100\beta_1$% change (approximation based on the Taylor-series expansion of $\exp(x)$)
     - Females make $100[\exp(\beta_2) - 1]$% more than males
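
The quality of the $100\beta$% approximation can be checked numerically. The sketch below compares it with the exact figure $100[\exp(\beta) - 1]$ for a few illustrative coefficient values; the function name `exact_pct_change` is an assumption made for this example.

```python
import math

def exact_pct_change(beta):
    """Exact percentage change in the level of y implied by a log-lin coefficient beta."""
    return 100.0 * (math.exp(beta) - 1.0)

for beta in (0.01, 0.05, 0.10, 0.30):
    approx = 100.0 * beta  # first-order Taylor approximation of exp(beta) - 1
    print(beta, approx, exact_pct_change(beta))
```

The two columns agree closely for small coefficients and drift apart as the coefficient grows, which is why the exact formula is used for the dummy-variable case.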

  7. Dummy variables: Splines
     There may be a "break" in the model so that it is "piecewise" linear.
     - Example: wage before and after age = 18.
     - "knots" and dummy variables
     - [pictures and notes]
     - Nothing in the unrestricted estimators ensures that the two functions join at the knot
     - Use RLS
     - Multiple knots can be introduced
     - Location of the knots can be arbitrary, leading to nonparametric kernel regression
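
One common way to make the two segments join at the knot is to build the restriction into the regressors, using the term $D \cdot (age - 18)$ where $D$ is the dummy for being at or past the knot. The Python sketch below assumes that parameterization; the function names and coefficient values are hypothetical.

```python
def spline_regressors(age, knot=18.0):
    """Regressors for a linear spline that is continuous at the knot."""
    d = 1.0 if age >= knot else 0.0      # dummy: 1 at or after the knot
    return [1.0, age, d * (age - knot)]  # the last term is zero exactly at the knot

def predict(beta, age, knot=18.0):
    """Fitted value for a given coefficient vector beta = [b0, b1, b2]."""
    return sum(b * x for b, x in zip(beta, spline_regressors(age, knot)))

# Hypothetical coefficients: slope 0.1 before the knot, 0.1 + 0.4 after it.
beta = [2.0, 0.1, 0.4]
print(predict(beta, 17.0), predict(beta, 18.0), predict(beta, 19.0))
```

Because the extra regressor is zero at the knot, the fitted line changes slope there without jumping.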

  8. Non-linear population models
     There are many situations where transformations or approximations of the non-linear model are not desirable or possible, and the non-linear population model should be estimated directly.
     - CES Production function:
       $$Y_i = \gamma \left[ \delta K_i^{-\rho} + (1 - \delta) L_i^{-\rho} \right]^{-v/\rho} \exp(\varepsilon_i)$$
       or,
       $$\ln(Y_i) = \ln(\gamma) - \left( \frac{v}{\rho} \right) \ln\left[ \delta K_i^{-\rho} + (1 - \delta) L_i^{-\rho} \right] + \varepsilon_i$$
     - Linear Expenditure System (Stone, 1954):
       $$\text{Max. } U(\boldsymbol{q}) = \sum_i \beta_i \ln(q_i - \gamma_i) \quad \text{(Stone-Geary / Klein-Rubin)}$$
       $$\text{s.t. } \sum_i p_i q_i = M$$

  9. Yields the following system of demand equations:
     $$p_i q_i = \gamma_i p_i + \beta_i \left( M - \sum_j \gamma_j p_j \right); \quad i = 1, 2, \ldots, n$$
     The $\beta_i$'s are the Marginal Budget Shares. So, we require that $0 < \beta_i < 1$; $i = 1, 2, \ldots, n$.
     - Box-Cox transform (often applied to positive-valued variables)
     - "Limited dependent variables"
       - y must be positive (or negative)
       - y is a dummy
       - y is an integer

  10. In general, suppose we have a single non-linear equation:
      $$y_i = f(x_{i1}, x_{i2}, \ldots, x_{ik}; \theta_1, \theta_2, \ldots, \theta_p) + \varepsilon_i$$
      - We can still consider a "Least Squares" approach.
      - The Non-Linear Least Squares estimator is the vector, $\hat{\boldsymbol{\theta}}$, that minimizes the quantity:
        $$S(X, \boldsymbol{\theta}) = \sum_i \left[ y_i - f_i(X, \boldsymbol{\theta}) \right]^2$$
      - Clearly the usual LS estimator is just a special case of this.
      - To obtain the estimator, we differentiate $S$ with respect to each element of $\boldsymbol{\theta}$, set up the "p" first-order conditions, and solve.
      - Difficulty: usually, the first-order conditions are themselves non-linear in the unknowns (the parameters).
      - This means there is (generally) no exact, closed-form solution.
      - We can't write down an explicit formula for the estimators of the parameters.
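
The objective $S(X, \boldsymbol{\theta})$ can be written down directly for any regression function. The sketch below is illustrative only: the function names, the power-function model and the three data points are assumptions made for the example, not part of the slides.

```python
def sum_of_squares(f, theta, X, y):
    """S(X, theta): sum over observations of the squared residual y_i - f(x_i, theta)."""
    return sum((yi - f(xi, theta)) ** 2 for xi, yi in zip(X, y))

def linear(xi, theta):
    """Linear special case: f = x_i' theta."""
    return sum(t * v for t, v in zip(theta, xi))

def power(xi, theta):
    """Hypothetical non-linear model: f = theta[0] * x ** theta[1]."""
    return theta[0] * xi[0] ** theta[1]

# Made-up data generated exactly by y = 3 * x**2 (no error term).
X = [[1.0], [2.0], [4.0]]
y = [3.0, 12.0, 48.0]

print(sum_of_squares(power, [3.0, 2.0], X, y))  # zero at the generating parameters
print(sum_of_squares(linear, [3.0], X, y))      # positive for the linear model y = 3x
```

The NLS estimator is whichever value of `theta` makes the first of these quantities as small as possible; for the linear function the same objective reduces to ordinary least squares.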

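  11. Example
      $$y_i = \theta_1 + \theta_2 x_{i2} + \theta_3 x_{i3} + (\theta_2 \theta_3) x_{i4} + \varepsilon_i$$
      $$S = \sum_i \left[ y_i - \theta_1 - \theta_2 x_{i2} - \theta_3 x_{i3} - (\theta_2 \theta_3) x_{i4} \right]^2$$
      $$\frac{\partial S}{\partial \theta_1} = -2 \sum_i \left[ y_i - \theta_1 - \theta_2 x_{i2} - \theta_3 x_{i3} - (\theta_2 \theta_3) x_{i4} \right]$$
      $$\frac{\partial S}{\partial \theta_2} = -2 \sum_i \left[ (\theta_3 x_{i4} + x_{i2}) (y_i - \theta_1 - \theta_2 x_{i2} - \theta_3 x_{i3} - \theta_2 \theta_3 x_{i4}) \right]$$
      $$\frac{\partial S}{\partial \theta_3} = -2 \sum_i \left[ (\theta_2 x_{i4} + x_{i3}) (y_i - \theta_1 - \theta_2 x_{i2} - \theta_3 x_{i3} - \theta_2 \theta_3 x_{i4}) \right]$$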
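
The three derivatives in this example can be checked against finite differences. The sketch below codes the analytic gradient and compares it with a central-difference approximation; the data rows and the trial value of $\boldsymbol{\theta}$ are made up for illustration.

```python
def residual(theta, row):
    """y_i - theta1 - theta2*x_i2 - theta3*x_i3 - (theta2*theta3)*x_i4."""
    t1, t2, t3 = theta
    y, x2, x3, x4 = row
    return y - t1 - t2 * x2 - t3 * x3 - t2 * t3 * x4

def S(theta, data):
    """Sum of squared residuals."""
    return sum(residual(theta, row) ** 2 for row in data)

def gradient(theta, data):
    """Analytic first derivatives of S with respect to theta1, theta2, theta3."""
    _, t2, t3 = theta
    g1 = g2 = g3 = 0.0
    for row in data:
        _, x2, x3, x4 = row
        e = residual(theta, row)
        g1 += -2.0 * e
        g2 += -2.0 * (t3 * x4 + x2) * e
        g3 += -2.0 * (t2 * x4 + x3) * e
    return [g1, g2, g3]

def numerical_gradient(theta, data, h=1e-6):
    """Central-difference approximation, used only as a check."""
    out = []
    for k in range(3):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        out.append((S(up, data) - S(down, data)) / (2.0 * h))
    return out

# Made-up rows of (y, x2, x3, x4) and an arbitrary trial parameter vector.
data = [(1.0, 0.5, 1.2, 0.3), (2.0, 1.5, 0.7, 1.1),
        (0.5, 0.2, 0.4, 0.9), (3.0, 2.0, 1.0, 0.5)]
theta = [0.1, 0.5, -0.3]
print(gradient(theta, data))
print(numerical_gradient(theta, data))
```

The two printed vectors should agree to several decimal places, which is a quick way to catch sign or index mistakes before handing a gradient to an optimizer.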

  12. Setting these 3 equations to zero, we can't solve analytically for the estimators of the three parameters.
      - In situations such as this, we need to use a numerical algorithm to obtain a solution to the first-order conditions.
      - There are lots of methods for doing this. One possibility is Newton's algorithm (the Newton-Raphson algorithm).
      Methods of Descent
      $$\tilde{\boldsymbol{\theta}} = \boldsymbol{\theta}_0 + s\, \boldsymbol{d}(\boldsymbol{\theta}_0)$$
      $\boldsymbol{\theta}_0$ = initial (vector) value.
      $s$ = step-length (positive scalar).
      $\boldsymbol{d}(\cdot)$ = direction vector.
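
A minimal sketch of a descent update, repeated from each new point, is given below for a scalar parameter. The choice of direction (the negative gradient, i.e. steepest descent), the fixed step length and the test function $(\theta - 3)^2$ are all assumptions made for illustration.

```python
def descent(theta0, direction, step, n_iter):
    """Repeat the update theta_new = theta + step * direction(theta)."""
    theta = theta0
    for _ in range(n_iter):
        theta = theta + step * direction(theta)
    return theta

def g(theta):
    """Gradient of the illustrative function f(theta) = (theta - 3)**2."""
    return 2.0 * (theta - 3.0)

# Steepest descent: direction is minus the gradient, step length s = 0.25.
theta_hat = descent(0.0, lambda t: -g(t), 0.25, 60)
print(theta_hat)  # close to the minimizer, 3
```

With this step length each update halves the distance to the minimizer; a step length that is too large would make the iterates overshoot instead.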

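  13. - Usually, $\boldsymbol{d}(\cdot)$ depends on the gradient vector at $\boldsymbol{\theta}_0$.
      - It may also depend on the change in the gradient (the Hessian matrix) at $\boldsymbol{\theta}_0$.
      - Some specific algorithms in the "family" make the step-length a function of the Hessian.
      - One very useful, specific member of the family of "Descent Methods" is the Newton-Raphson algorithm:
      Suppose we want to minimize some function, $f(\boldsymbol{\theta})$.
      Approximate the function using a Taylor's series expansion about $\tilde{\boldsymbol{\theta}}$, the vector value that minimizes $f(\boldsymbol{\theta})$:
      $$f(\boldsymbol{\theta}) \cong f(\tilde{\boldsymbol{\theta}}) + (\boldsymbol{\theta} - \tilde{\boldsymbol{\theta}})' \left( \frac{\partial f}{\partial \boldsymbol{\theta}} \right)_{\tilde{\boldsymbol{\theta}}} + \frac{1}{2!} (\boldsymbol{\theta} - \tilde{\boldsymbol{\theta}})' \left[ \frac{\partial^2 f}{\partial \boldsymbol{\theta} \partial \boldsymbol{\theta}'} \right]_{\tilde{\boldsymbol{\theta}}} (\boldsymbol{\theta} - \tilde{\boldsymbol{\theta}})$$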

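  14. Or:
      $$f(\boldsymbol{\theta}) \cong f(\tilde{\boldsymbol{\theta}}) + (\boldsymbol{\theta} - \tilde{\boldsymbol{\theta}})' \boldsymbol{g}(\tilde{\boldsymbol{\theta}}) + \frac{1}{2!} (\boldsymbol{\theta} - \tilde{\boldsymbol{\theta}})' H(\tilde{\boldsymbol{\theta}}) (\boldsymbol{\theta} - \tilde{\boldsymbol{\theta}})$$
      So,
      $$\frac{\partial f(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \cong 0 + \boldsymbol{g}(\tilde{\boldsymbol{\theta}}) + \frac{1}{2!}\, 2 H(\tilde{\boldsymbol{\theta}}) (\boldsymbol{\theta} - \tilde{\boldsymbol{\theta}})$$
      However, $\boldsymbol{g}(\tilde{\boldsymbol{\theta}}) = 0$, as $\tilde{\boldsymbol{\theta}}$ locates a minimum.
      So,
      $$(\boldsymbol{\theta} - \tilde{\boldsymbol{\theta}}) \cong H^{-1}(\tilde{\boldsymbol{\theta}}) \left( \frac{\partial f(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right)$$
      or,
      $$\tilde{\boldsymbol{\theta}} \cong \boldsymbol{\theta} - H^{-1}(\tilde{\boldsymbol{\theta}})\, \boldsymbol{g}(\boldsymbol{\theta})$$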

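  15. This suggests a numerical algorithm:
      Set $\boldsymbol{\theta} = \boldsymbol{\theta}_0$ to begin, and then iterate:
      $$\boldsymbol{\theta}_1 = \boldsymbol{\theta}_0 - H^{-1}(\boldsymbol{\theta}_1)\, \boldsymbol{g}(\boldsymbol{\theta}_0)$$
      $$\boldsymbol{\theta}_2 = \boldsymbol{\theta}_1 - H^{-1}(\boldsymbol{\theta}_2)\, \boldsymbol{g}(\boldsymbol{\theta}_1)$$
      $$\vdots$$
      $$\boldsymbol{\theta}_{n+1} = \boldsymbol{\theta}_n - H^{-1}(\boldsymbol{\theta}_{n+1})\, \boldsymbol{g}(\boldsymbol{\theta}_n)$$
      or, approximately:
      $$\boldsymbol{\theta}_{n+1} = \boldsymbol{\theta}_n - H^{-1}(\boldsymbol{\theta}_n)\, \boldsymbol{g}(\boldsymbol{\theta}_n)$$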
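
The approximate iteration can be sketched in a few lines for a one-parameter NLS problem. Everything specific here is an assumption made for illustration: the model $y_i = \exp(\theta x_i) + \varepsilon_i$, the noise-free data generated with $\theta = 0.5$, the starting value, and the relative-change stopping tolerance.

```python
import math

# Illustrative, noise-free data generated with theta = 0.5.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.exp(0.5 * xi) for xi in x]

def g(theta):
    """Gradient of S(theta) = sum (y_i - exp(theta*x_i))**2."""
    return sum(-2.0 * xi * math.exp(theta * xi) * (yi - math.exp(theta * xi))
               for xi, yi in zip(x, y))

def H(theta):
    """Second derivative (scalar Hessian) of S(theta)."""
    return sum(2.0 * xi ** 2 * math.exp(theta * xi) * (2.0 * math.exp(theta * xi) - yi)
               for xi, yi in zip(x, y))

def newton_raphson(theta0, tol=1e-10, max_iter=50):
    """Iterate theta_{n+1} = theta_n - g(theta_n) / H(theta_n) until the relative change is small."""
    theta = theta0
    for n in range(max_iter):
        theta_new = theta - g(theta) / H(theta)
        if abs((theta_new - theta) / theta) < tol:
            return theta_new, n + 1
        theta = theta_new
    raise RuntimeError("no convergence within max_iter iterations")

theta_hat, n_iter = newton_raphson(0.3)
print(theta_hat, n_iter)
```

Starting from 0.3 the first step overshoots the minimizer and the later steps close in on it quickly; a different model or starting value may behave less well.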

  16. Stop if
      $$\left| \frac{\theta_{n+1}^{(i)} - \theta_n^{(i)}}{\theta_n^{(i)}} \right| < \varepsilon^{(i)}; \quad i = 1, 2, \ldots, p$$
      Note:
      1. $s = 1$.
      2. $\boldsymbol{d}(\boldsymbol{\theta}_n) = -H^{-1}(\boldsymbol{\theta}_n)\, \boldsymbol{g}(\boldsymbol{\theta}_n)$.
      3. Algorithm fails if $H$ ever becomes singular at any iteration.
      4. Achieve a minimum of $f(\cdot)$ if $H$ is positive definite.
      5. Algorithm may locate only a local minimum.
      6. Algorithm may oscillate.
      The algorithm can be given a nice geometric interpretation for scalar $\theta$.
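
The stopping rule and notes 3 and 4 translate into simple checks. The sketch below handles the two-parameter case by hand so that it needs no external library; the function names and the quadratic test function $f = \theta_1^2 + \theta_1\theta_2 + 2\theta_2^2$ are hypothetical. Note that the relative-change rule is undefined when an element of $\boldsymbol{\theta}_n$ is exactly zero.

```python
def converged(theta_new, theta_old, tol=1e-8):
    """Relative-change rule applied to every element (fails if an old element is 0)."""
    return all(abs((a - b) / b) < tol for a, b in zip(theta_new, theta_old))

def is_positive_definite_2x2(H):
    """Leading-principal-minor test for a symmetric 2x2 matrix."""
    (a, b), (c, d) = H
    return a > 0 and a * d - b * c > 0

def newton_step_2d(theta, g, H):
    """One step theta - H^{-1} g for a two-parameter problem."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("Hessian is singular; the Newton-Raphson step is undefined")
    s1 = (d * g[0] - b * g[1]) / det   # first element of H^{-1} g
    s2 = (-c * g[0] + a * g[1]) / det  # second element of H^{-1} g
    return [theta[0] - s1, theta[1] - s2]

# f = t1**2 + t1*t2 + 2*t2**2 has gradient (2*t1 + t2, t1 + 4*t2) and constant Hessian.
H = ((2.0, 1.0), (1.0, 4.0))
theta0 = [3.0, -2.0]
g0 = (2.0 * theta0[0] + theta0[1], theta0[0] + 4.0 * theta0[1])
print(is_positive_definite_2x2(H), newton_step_2d(theta0, g0, H))
```

In practice an optimizer would run these checks at every iteration rather than once.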

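  17. To find an extremum of $f(\cdot)$, solve
      $$\frac{\partial f(\theta)}{\partial \theta} = g(\theta) = 0$$
      [Figure: $g$ plotted against $\theta$, with $\theta_{min}$, $\theta_1$ and $\theta_0$ marked on the horizontal axis]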

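  18. [Figure: $g$ plotted against $\theta$, with $\theta_{max}$, $\theta_{min}$, $\theta_1$, $\theta_2$ and $\theta_0$ marked on the horizontal axis]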

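  19. $$H(\theta_0) = \frac{g(\theta_0)}{\theta_0 - \theta_1} \quad \Rightarrow \quad \theta_1 = \theta_0 - H^{-1}(\theta_0)\, g(\theta_0)$$
      $$\boldsymbol{\theta}_{n+1} = \boldsymbol{\theta}_n - H^{-1}(\boldsymbol{\theta}_n)\, \boldsymbol{g}(\boldsymbol{\theta}_n)$$
      [Figure: $g$ plotted against $\theta$, with $\theta_{min}$, $\theta_1$ and $\theta_0$ marked on the horizontal axis]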

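  20. If $f(\boldsymbol{\theta})$ is quadratic in $\boldsymbol{\theta}$, then the algorithm converges in one iteration.
      If the function is quadratic, then its gradient is linear:
      [Figure: a linear $g$ plotted against $\theta$, with $\theta_{min}$, $\theta_1$ and $\theta_0$ marked on the horizontal axis]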
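
The one-iteration result can be verified directly. The derivation below is a sketch for a generic quadratic, where the constant $c$, the vector $\boldsymbol{b}$ and the symmetric non-singular matrix $A$ are symbols introduced only for this illustration.

```latex
f(\boldsymbol{\theta}) = c + \boldsymbol{b}'\boldsymbol{\theta}
  + \tfrac{1}{2}\, \boldsymbol{\theta}' A\, \boldsymbol{\theta}
\;\Rightarrow\;
\boldsymbol{g}(\boldsymbol{\theta}) = \boldsymbol{b} + A\boldsymbol{\theta}, \qquad
H(\boldsymbol{\theta}) = A
\\[6pt]
\boldsymbol{\theta}_1 = \boldsymbol{\theta}_0 - A^{-1}\left(\boldsymbol{b} + A\boldsymbol{\theta}_0\right)
  = -A^{-1}\boldsymbol{b}
\;\Rightarrow\;
\boldsymbol{g}(\boldsymbol{\theta}_1) = \boldsymbol{b} - A A^{-1}\boldsymbol{b} = \boldsymbol{0}
```

The first step lands on the point where the gradient vanishes whatever the starting value $\boldsymbol{\theta}_0$, and that point is a minimum when $A$ is positive definite.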

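  21. In general, different choices of $\theta_0$ may lead to different solutions, or no solution at all.
      [Figure: $g$ plotted against $\theta$, crossing zero at two points labelled $\theta_{max}$ and two labelled $\theta_{min}$, with the starting value $\theta_0$ marked]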
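
The dependence on the starting value is easy to reproduce. The sketch below uses the illustrative function $f(\theta) = \theta^3/3 - \theta$, which has a local minimum at $\theta = 1$ and a local maximum at $\theta = -1$; the function, the tolerance and the three starting values are assumptions chosen for the example.

```python
def g(theta):
    """Gradient of f(theta) = theta**3 / 3 - theta."""
    return theta ** 2 - 1.0

def H(theta):
    """Second derivative of f."""
    return 2.0 * theta

def newton(theta0, tol=1e-10, max_iter=100):
    """Scalar Newton-Raphson; returns None if H is singular or there is no convergence."""
    theta = theta0
    for _ in range(max_iter):
        h = H(theta)
        if h == 0.0:
            return None  # the step is undefined when H is singular
        theta_new = theta - g(theta) / h
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return None

for start in (2.0, -2.0, 0.0):
    print(start, newton(start))
```

Starting at 2 the iterations reach the minimum at 1, starting at -2 they reach the maximum at -1 (where $H$ is negative), and starting at 0 the first step cannot be computed at all.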

