
Combining Forecasts – Forty Years Later

6th Eurostat Colloquium on Modern Tools for Business Cycle Analysis, Luxembourg, 26-29 September 2010. Kenneth F. Wallis, Emeritus Professor of Econometrics, University of Warwick


  1. 6th Eurostat Colloquium on Modern Tools for Business Cycle Analysis, Luxembourg, 26-29 September 2010

     Combining Forecasts – Forty Years Later

     Kenneth F. Wallis, Emeritus Professor of Econometrics, University of Warwick
     http://go.warwick.ac.uk/kfwallis

     This paper is dedicated to the memory of Clive Granger. It will appear in a special issue of Applied Financial Economics in his memory.

  2. 1. Introduction

     A major theme of “Combining forecasts – twenty years later” (Granger, 1989) is the relevance of information sets, discussion of which “is an essential feature of all forecast situations, including combining.” This paper returns to two of the topics in that article that are of much current interest, some forty years after Bates and Granger’s seminal article on “The combination of forecasts”:

     • the impact on the original point forecast combination result of forecasters having different information sets, with each forecaster using their information efficiently. (In the original result, the efficiency gain rests on the component forecasts being inefficient.)
     • the properties of different methods of combining (pooling) density forecasts, where the quantile-based methods proposed in “Twenty years later” have not been taken up, for reasons I have discussed in a previous paper (Wallis, 2005).

  3. 2. Point forecasts with different information sets

     Each one of $N$ forecasters has an information set with a public and a private component:

     • public: current and past values of variables $z_t$
     • private: … variables $x_{jt}$, $j = 1, \dots, N$; all components are mutually independent.

     The universal information set $U_t$ consists of all information available to all forecasters; if $U_t$ were known, the optimum (l.l.s.) one-step-ahead forecast of the target variable $y$ would be

     $$E_t = E(y_{t+1} \mid U_t) = \alpha(B) z_t + \sum_{j=1}^{N} \beta_j(B) x_{jt}. \tag{1}$$

     Individual forecasts:

     $$E_{jt} = \alpha(B) z_t + \beta_j(B) x_{jt}, \quad j = 1, \dots, N. \tag{2}$$

     Combined forecast:

     $$\bar{E}_t = \frac{1}{N} \sum_{j=1}^{N} E_{jt} = \alpha(B) z_t + \frac{1}{N} \sum_{j=1}^{N} \beta_j(B) x_{jt}. \tag{3}$$

     “… aggregating forecasts is not the same as aggregating information sets. $\bar{E}_t$ is based on all the information, but is not equal to $E_t$, as the information is not being used efficiently.”

  4. However, if yet another forecast is constructed, using only the public information, namely

     $$E_{0t} = \alpha(B) z_t, \tag{4}$$

     “then it is seen immediately that the optimal forecast $E_t$ can be achieved by

     $$E_t = \sum_{j=1}^{N} E_{jt} - (N - 1) E_{0t}. \tag{5}$$

     ” (Equations and quotations taken from “Twenty years later”.)

     The inefficiency-of-mean-forecasts result rests on a comparison of expressions for $\bar{E}_t$ and $E_t$. I then slightly modify Granger’s setting to obtain explicit expressions for the mean square error (MSE) of the various forecasts, with inequalities that deliver the inefficiency result. These inequalities match those given by Kim, Lim and Shaw (2001) in their independent rediscovery of the inefficiency-of-mean-forecasts result (their term). Forecasters in their model receive a public and a private signal, each equal to the target variable plus random noise. In my setting the source of random variation is the innovation in the data-generating process (DGP), as usual.
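
The recombination in equation (5) can be checked numerically. The following is a minimal sketch, assuming a static special case that is not in the slides: $\alpha(B) = \beta_j(B) = 1$, a target $y = z + \sum_j x_j + e$, and independent standard normal draws for every component. The function name `simulate_mse` and all parameter values are illustrative choices.

```python
import random


def simulate_mse(n_forecasters=3, n_draws=20000, seed=0):
    """Monte Carlo MSE of several point forecasts.

    Illustrative assumption (not from the slides):
    y = z + sum_j x_j + e, with all components iid N(0, 1).
    """
    rng = random.Random(seed)
    names = ("individual", "mean", "public_only", "recombined")
    sq_err = dict.fromkeys(names, 0.0)
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)                                   # public information
        x = [rng.gauss(0.0, 1.0) for _ in range(n_forecasters)]   # private information
        y = z + sum(x) + rng.gauss(0.0, 1.0)                      # target variable

        e_j = [z + x_j for x_j in x]                   # individual forecasts, eq. (2)
        e_bar = sum(e_j) / n_forecasters               # mean forecast, eq. (3)
        e_0 = z                                        # public-only forecast, eq. (4)
        e_opt = sum(e_j) - (n_forecasters - 1) * e_0   # recombination, eq. (5)

        forecasts = {
            "individual": e_j[0],
            "mean": e_bar,
            "public_only": e_0,
            "recombined": e_opt,
        }
        for name in names:
            sq_err[name] += (y - forecasts[name]) ** 2
    return {name: total / n_draws for name, total in sq_err.items()}


if __name__ == "__main__":
    for name, mse in simulate_mse().items():
        print(f"{name:12s} MSE = {mse:.3f}")
```

Under these assumptions with three forecasters, the theoretical MSEs are 4 for the public-only forecast, 3 for an individual forecast, 7/3 for the mean forecast and 1 for the recombined forecast, so the simulated values should show the same ordering: the mean forecast improves on each individual forecast but remains inefficient relative to equation (5).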

  5. However, Kim, Lim and Shaw do not consider the possibility of constructing a fully efficient forecast, since “neither … common nor private information is separately observable”. Instead they consider the exploitation of a sequence of signals and associated forecast revisions in a fixed-event forecasting context.

     Crowe (2010) adopts this framework to analyse the monthly forecasts for current-year and next-year economic growth collected by Consensus Economics in 38 countries. In fixed-event forecasting the public information includes previously published mean forecasts of the same target variable, and hence the successive revisions to those mean forecasts. Crowe finds that use of this information could have achieved efficiency gains of some 5% in RMSE in out-of-sample mean current-year forecasts for 2007 and 2008.

  6. 3. Linear and logarithmic combination of density forecasts

     Consider $N$ individual density forecasts $f_j(y)$, $j = 1, \dots, N$, at some time, horizon, etc.

     Linear combination:

     $$f_C(y) = \sum_{j=1}^{N} w_j f_j(y) \quad \text{with weights } w_j \ge 0,\ j = 1, \dots, N,\ \sum_j w_j = 1.$$

     If $f_j(y) = N(\mu_j, \sigma_j^2)$, $j = 1, \dots, N$, then $f_C(y)$ is a mixture of normals (Pearson, 1894).

     For any distributional forms, the combined density has mean and variance

     $$\mu_C = \sum_{j=1}^{N} w_j \mu_j \quad \text{and} \quad \sigma_C^2 = \sum_{j=1}^{N} w_j \sigma_j^2 + \sum_{j=1}^{N} w_j (\mu_j - \mu_C)^2.$$

     In the case of equal weights, $w_j = 1/N$, the variance equation can be interpreted as:

     aggregate uncertainty = average individual uncertainty + disagreement.
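
The mean and variance formulas for the linear combination translate directly into code. The sketch below is illustrative only; the function name `linear_pool_moments` and the example inputs are assumptions introduced here, not taken from the slides.

```python
def linear_pool_moments(means, variances, weights):
    """Mean and variance of the linear pool sum_j w_j f_j(y).

    Valid for any component densities with the given means and variances.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to one")
    mu_c = sum(w * m for w, m in zip(weights, means))
    # Average individual uncertainty: weighted mean of the component variances.
    avg_uncertainty = sum(w * v for w, v in zip(weights, variances))
    # Disagreement: weighted dispersion of the component means around mu_c.
    disagreement = sum(w * (m - mu_c) ** 2 for w, m in zip(weights, means))
    return mu_c, avg_uncertainty + disagreement


# Two forecasters with unit variances whose means differ by 2, equal weights.
mu_c, var_c = linear_pool_moments([0.0, 2.0], [1.0, 1.0], [0.5, 0.5])
print(mu_c, var_c)
```

In this example the average individual variance is 1 and the disagreement term is also 1, so the pooled variance of 2 is twice that of either component forecast.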

  7. Logarithmic combination:

     This is usually written in geometric form:

     $$f_G(y) = \frac{\prod_j f_j(y)^{w_j}}{\int \prod_j f_j(y)^{w_j} \, dy}.$$

     If $f_j(y) = N(\mu_j, \sigma_j^2)$, $j = 1, \dots, N$, then $f_G(y) = N(\mu_G, \sigma_G^2)$, where

     $$\frac{\mu_G}{\sigma_G^2} = \sum_{j=1}^{N} w_j \frac{\mu_j}{\sigma_j^2} \quad \text{and} \quad \frac{1}{\sigma_G^2} = \sum_{j=1}^{N} w_j \frac{1}{\sigma_j^2}.$$

     In the case of equal weights:

     • $\sigma_G^2$ is the harmonic mean of the individual variances; this is less than their arithmetic mean, which is less than the finite mixture variance $\sigma_C^2$;
     • $\mu_G$ is the linear combination of the $\mu_j$ with inverse variance weights.
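
For normal components the logarithmic combination reduces to a precision-weighted calculation. A minimal sketch follows, assuming weights that sum to one; the function name `log_pool_normal` and the example inputs are illustrative and not from the slides.

```python
def log_pool_normal(means, variances, weights):
    """Mean and variance of the logarithmic pool of normal densities.

    Assumes the weights sum to one, so the pooled precision is the
    weighted average of the component precisions 1 / sigma_j^2.
    """
    precision_g = sum(w / v for w, v in zip(weights, variances))
    var_g = 1.0 / precision_g
    mu_g = var_g * sum(w * m / v for w, m, v in zip(weights, means, variances))
    return mu_g, var_g


# Two forecasters with variances 1 and 4, equal weights.
mu_g, var_g = log_pool_normal([0.0, 2.0], [1.0, 4.0], [0.5, 0.5])
print(mu_g, var_g)
```

Here the pooled variance is the harmonic mean of 1 and 4, which is 1.6 and lies below their arithmetic mean of 2.5, and the pooled mean of 0.4 weights the two component means by 0.8 and 0.2, in proportion to their inverse variances.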

  8. Example

     We add a logarithmic combination to some of the competing forecasts that appear in the simulation study of Mitchell and Wallis (2010). The data-generating process is the Gaussian second-order autoregression:

     $$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_\varepsilon^2),$$

     $$\rho_1 = \frac{\phi_1}{1 - \phi_2}, \quad \rho_2 = \phi_1 \rho_1 + \phi_2, \quad \sigma_\varepsilon^2 = (1 - \phi_1 \rho_1 - \phi_2 \rho_2)\, \sigma_y^2.$$

     The universal information set comprises the observations $y_{t-1}$ and $y_{t-2}$, the model and its parameter values, so the optimal or “ideal” density forecast of $y_t$ is

     $$f_o(y_t) = N(\phi_1 y_{t-1} + \phi_2 y_{t-2},\ \sigma_\varepsilon^2).$$

     Two variant density forecasts, and their linear and logarithmic combinations (with equal weights), are constructed as follows:

  9. AR1 forecaster:

     $$f_1(y_t) = N(\rho_1 y_{t-1},\ \sigma_1^2), \quad \sigma_1^2 = (1 - \rho_1^2)\, \sigma_y^2$$

     AR2 (same, with data delay):

     $$f_2(y_t) = N(\rho_2 y_{t-2},\ \sigma_2^2), \quad \sigma_2^2 = (1 - \rho_2^2)\, \sigma_y^2$$

     Linear combination:

     $$f_C(y_t) = 0.5\, N(\rho_1 y_{t-1},\ \sigma_1^2) + 0.5\, N(\rho_2 y_{t-2},\ \sigma_2^2)$$

     Logarithmic combination: $f_G(y_t) = N(\mu_G, \sigma_G^2)$, with

     $$\mu_G = \frac{\sigma_2^2 \rho_1 y_{t-1} + \sigma_1^2 \rho_2 y_{t-2}}{\sigma_1^2 + \sigma_2^2}, \quad \sigma_G^2 = \frac{2 \sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}.$$

     The AR1 and AR2 forecasts are the correct conditional distributions with respect to their specific information sets. So we expect to find that they satisfy probabilistic calibration, but not complete calibration; that is, their probability integral transforms (PITs) are $U(0,1)$ but are not independent. The two combined forecasts satisfy neither of these calibration concepts.
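
The distinction between probabilistic and complete calibration can be illustrated by simulating the PITs of the AR1 forecaster. This is a rough sketch under assumptions introduced here: the parameter values $\phi_1 = 1.5$, $\phi_2 = -0.6$ with $\sigma_\varepsilon^2 = 1$, a zero start with a burn-in period, and the helper names `normal_cdf`, `ar1_forecast_pits` and `lag1_autocorrelation`. It is not the simulation design of Mitchell and Wallis (2010), and the lag-1 autocorrelation is only an informal diagnostic, not a formal test.

```python
import math
import random


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ar1_forecast_pits(phi1=1.5, phi2=-0.6, n_obs=5000, burn_in=200, seed=0):
    """PITs of the AR1 forecaster N(rho1 * y_{t-1}, sigma_1^2) on AR(2) data."""
    rng = random.Random(seed)
    rho1 = phi1 / (1.0 - phi2)
    rho2 = phi1 * rho1 + phi2
    var_y = 1.0 / (1.0 - phi1 * rho1 - phi2 * rho2)  # with sigma_eps^2 = 1
    sd_1 = math.sqrt((1.0 - rho1 ** 2) * var_y)

    y_lag2, y_lag1 = 0.0, 0.0  # arbitrary start, discarded by the burn-in
    pits = []
    for t in range(burn_in + n_obs):
        y = phi1 * y_lag1 + phi2 * y_lag2 + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            pits.append(normal_cdf((y - rho1 * y_lag1) / sd_1))
        y_lag2, y_lag1 = y_lag1, y
    return pits


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation coefficient."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    return sum(a * b for a, b in zip(dev[1:], dev[:-1])) / sum(d * d for d in dev)


if __name__ == "__main__":
    pits = ar1_forecast_pits()
    print("mean PIT:", round(sum(pits) / len(pits), 3))
    print("lag-1 autocorrelation:", round(lag1_autocorrelation(pits), 3))
```

With these parameter values one would expect the mean PIT to be close to 0.5, in line with uniformity, while the lag-1 autocorrelation is clearly positive, which is the sense in which the AR1 forecast is probabilistically but not completely calibrated.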

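  10. Table 1. Simulation design and density forecast variances ($\sigma_\varepsilon^2 = 1$ in all cases)

      | Case | $\phi_1$ | $\phi_2$ | $\rho_1$ | $\rho_2$ | $\sigma_1^2$ | $\sigma_2^2$ | $\sigma_C^2$ | $\sigma_G^2$ |
      |------|----------|----------|----------|----------|--------------|--------------|--------------|--------------|
      | (1)  | 1.5      | –0.6     | 0.94     | 0.81     | 1.56         | 4.52         | 3.40         | 2.32         |
      | (2)  | 0.15     | 0.2      | 0.19     | 0.23     | 1.04         | 1.02         | 1.05         | 1.03         |
      | (3)  | 0        | 0.95     | 0        | 0.95     | 10.26        | 1            | 7.94         | 1.82         |
      | (4)  | –0.5     | 0.3      | –0.71    | 0.66     | 1.10         | 1.27         | 1.34         | 1.18         |

      The first two columns are parameters, the next two are autocorrelations, and the last four are density forecast variances.

      • $\sigma_G^2$ is less than $\sigma_C^2$, and intermediate between $\sigma_1^2$ and $\sigma_2^2$, in all four cases
      • $\sigma_C^2$ exceeds both $\sigma_1^2$ and $\sigma_2^2$ in two cases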
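
The entries of Table 1 can be recomputed from the parameter values. The sketch below assumes that the reported $\sigma_C^2$ is the unconditional expectation of the mixture variance, that is, the average of $\sigma_1^2$ and $\sigma_2^2$ plus one quarter of $E[(\rho_1 y_{t-1} - \rho_2 y_{t-2})^2]$; this is an interpretation rather than something stated on the slides, although it does reproduce the tabulated values to two decimals. The function name `table1_row` is illustrative.

```python
def table1_row(phi1, phi2, var_eps=1.0):
    """Autocorrelations and density forecast variances for one AR(2) design."""
    rho1 = phi1 / (1.0 - phi2)
    rho2 = phi1 * rho1 + phi2
    var_y = var_eps / (1.0 - phi1 * rho1 - phi2 * rho2)
    var_1 = (1.0 - rho1 ** 2) * var_y  # AR1 forecaster
    var_2 = (1.0 - rho2 ** 2) * var_y  # AR2 forecaster (data delay)
    # E[(rho1*y_{t-1} - rho2*y_{t-2})^2], using Cov(y_{t-1}, y_{t-2}) = rho1 * var_y.
    mean_gap_sq = (rho1 ** 2 + rho2 ** 2 - 2.0 * rho1 ** 2 * rho2) * var_y
    # Equal-weight linear pool: average variance plus disagreement (assumed
    # here to be averaged unconditionally over the data).
    var_c = 0.5 * (var_1 + var_2) + 0.25 * mean_gap_sq
    # Equal-weight logarithmic pool: harmonic mean of the two variances.
    var_g = 2.0 * var_1 * var_2 / (var_1 + var_2)
    return rho1, rho2, var_1, var_2, var_c, var_g


if __name__ == "__main__":
    designs = [(1.5, -0.6), (0.15, 0.2), (0.0, 0.95), (-0.5, 0.3)]
    for case, (phi1, phi2) in enumerate(designs, start=1):
        row = table1_row(phi1, phi2)
        print(f"({case})", " ".join(f"{value:6.2f}" for value in row))
```

The disagreement term uses the fact that, with equal weights and two components, the weighted squared deviations of the means from $\mu_C$ sum to one quarter of the squared difference between the two means.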

  11. Figure 1. PIT histograms for the autoregressive example. Rows: cases (1)-(4). Columns: forecasts, respectively AR1, AR2, linear pool, log pool.

  12. Table 2. Tests of complete calibration and forecast performance: rejection percentages at nominal 5% level

      | Forecast  | Case (1) Bk | Case (1) KLIC | Case (2) Bk | Case (2) KLIC | Case (3) Bk | Case (3) KLIC | Case (4) Bk | Case (4) KLIC |
      |-----------|-------------|---------------|-------------|---------------|-------------|---------------|-------------|---------------|
      | AR1       | 100         | 98            | 17          | 25            | 99          | 100           | 62          | 46            |
      | AR2       | 100         | 100           | 30          | 15            | 5.6         | n.a.          | 97          | 93            |
      | Lin combn | 100         | 100           | 14          | 10            | 100         | 100           | 62          | 55            |
      | Log combn | 100         | 100           | 12          | 7             | 99          | 84            | 37          | 25            |

      Note: Bk is the likelihood ratio test of Berkowitz (2001); KLIC is a test of KLIC differences vs. the ideal forecast (Mitchell and Hall, 2005).

      • although AR1 and AR2 are probabilistically calibrated, no forecast is completely calibrated (except AR2 in case (3))
