
Fundamentals of Prequential Analysis
Philip Dawid
Statistical Laboratory, University of Cambridge

Outline: Forecasting (context and purpose; one-step forecasts; time development; some comments) · Forecasting systems · Absolute assessment · Comparative assessment · Prequential efficiency · Model choice · Conclusions


Some comments

Forecast type: pretty arbitrary, e.g. a point forecast, an action, or a probability distribution.

Black-box: not interested in the truth/beauty/... of any theory underlying our forecasts, only in their performance.

Close to the data: concerned only with realized data and forecasts, not with their provenance, what might have happened in other circumstances, hypothetical repetitions, ...

No peeping: the forecast of X_{i+1} is made before its value is observed, so assessment is unbiased.

Forecasting systems

Probability Forecasting Systems

A very general idea, e.g.:

No system: e.g. day-by-day weather forecasts.

Probability model: a fully specified joint distribution P for X (arbitrary dependence allowed). Prequential probability forecast: f_{i+1} = P(X_{i+1} | X^i = x^i).

Statistical model: a family P = {P_θ} of joint distributions for X. Forecast f_{i+1} = P*(X_{i+1} | X^i = x^i), where P* is formed from P by somehow estimating/eliminating θ, using the currently available data X^i = x^i.

Collection of models: e.g. forecast X_{i+1} using the model that has performed best up to time i (see the sketch below).
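
A minimal sketch of the last option, under assumptions not in the slides: binary outcomes, two hypothetical forecasters, and cumulative log loss as the performance measure.

```python
import math

# Two hypothetical probability forecasters for a binary sequence: each maps
# the past outcomes to a probability that the next outcome is 1.
def laplace(past):
    return (sum(past) + 1) / (len(past) + 2)   # Laplace's rule of succession

def constant_half(past):
    return 0.5

def follow_the_leader(x, forecasters):
    """Forecast each x_{i+1} using whichever forecaster has the smallest
    cumulative log loss on x_1, ..., x_i so far."""
    loss = [0.0] * len(forecasters)
    forecasts = []
    for i, xi in enumerate(x):
        best = min(range(len(forecasters)), key=lambda j: loss[j])
        forecasts.append(forecasters[best](x[:i]))    # made before seeing x_i
        for j, f in enumerate(forecasters):           # then score every model
            q = f(x[:i])
            loss[j] += -math.log(q if xi == 1 else 1 - q)
    return forecasts

print(follow_the_leader([1, 1, 0, 1, 1], [laplace, constant_half]))
```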

Statistical Forecasting Systems

Based on a statistical model P = {P_θ} for X.

Plug-in forecasting system: given the past data x^i, construct some estimate θ̂_i of θ (e.g. by maximum likelihood), and proceed as if this were the true value:

  P*_{i+1}(X_{i+1}) = P_{θ̂_i}(X_{i+1} | x^i).

NB: this requires re-estimating θ with each new observation! (See the sketch below.)

Bayesian forecasting system (BFS): let π(θ) be a prior density for θ, and π_i(θ) the posterior based on the past data x^i. Use this to mix the various θ-specific forecasts:

  P*_{i+1}(X_{i+1}) = ∫ P_θ(X_{i+1} | x^i) π_i(θ) dθ.

Other: e.g. the fiducial predictive distribution, ...
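
A minimal sketch of a plug-in system, assuming (not from the slides) an i.i.d. Bernoulli(θ) model, with θ̂ re-estimated by maximum likelihood before each forecast and clipped away from 0 and 1 to keep the log score finite:

```python
import math

def plugin_bernoulli(x, eps=1e-3):
    """Plug-in SFS for an i.i.d. Bernoulli(theta) model: before each outcome,
    re-estimate theta by maximum likelihood from the data seen so far and
    forecast with that value."""
    forecasts, cum_loss = [], 0.0
    for i, xi in enumerate(x):
        theta_hat = sum(x[:i]) / i if i > 0 else 0.5   # MLE; 0.5 before any data
        theta_hat = min(max(theta_hat, eps), 1 - eps)  # clip for a finite score
        forecasts.append(theta_hat)
        cum_loss += -math.log(theta_hat if xi == 1 else 1 - theta_hat)
    return forecasts, cum_loss
```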

Prequential consistency

Gaussian process: X_i ~ N(μ, σ²), corr(X_i, X_j) = ρ for i ≠ j.

MLEs:

  μ̂_n = X̄_n →_L N(μ, ρσ²)
  σ̂²_n = n⁻¹ Σ_{i=1}^n (X_i − X̄_n)² →_p (1 − ρ)σ²
  ρ̂_n = 0

These are not classically consistent.

But the estimated predictive distribution P̂_{n+1} = N(μ̂_n, σ̂²_n) does approximate the true predictive distribution P_{n+1}: normal with mean x̄_n + (1 − ρ)(μ − x̄_n)/{nρ + (1 − ρ)} and variance (1 − ρ)σ² + σ²/{nρ + (1 − ρ)}.
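
A quick simulation sketch of this, using the exchangeable representation X_i = μ + A + ε_i with A ~ N(0, ρσ²) and ε_i ~ N(0, (1 − ρ)σ²) (an assumption consistent with the stated covariance, not from the slides): μ̂_n settles at μ + A rather than μ, yet the plug-in predictive still tracks the true one.

```python
import random, statistics

mu, sigma, rho, n = 0.0, 1.0, 0.5, 10_000
A = random.gauss(0, (rho * sigma**2) ** 0.5)            # shared component
x = [mu + A + random.gauss(0, ((1 - rho) * sigma**2) ** 0.5) for _ in range(n)]

mu_hat = statistics.fmean(x)        # converges to mu + A, not to mu
var_hat = statistics.pvariance(x)   # converges to (1 - rho) * sigma^2
print(f"mu_hat  = {mu_hat:.3f}  vs mu + A           = {mu + A:.3f}")
print(f"var_hat = {var_hat:.3f}  vs (1-rho)*sigma^2  = {(1 - rho) * sigma**2:.3f}")
# For large n the true predictive is approximately N(mu + A, (1 - rho)*sigma^2),
# which is just what the "inconsistent" plug-in N(mu_hat, var_hat) delivers.
```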

Absolute assessment

Weak Prequential Principle

The assessment of the quality of a forecasting system in the light of a sequence of observed outcomes should depend only on the forecasts it in fact delivered for that sequence, and not, for example, on how it might have behaved for other sequences.

Calibration

Binary variables (X_i), realized values (x_i), emitted probability forecasts (p_i).

Want (??) the (p_i) and (x_i) to be close "on average":

  x̄_n − p̄_n → 0

where x̄_n is the average of all the (x_i) up to time n, etc.

Probability calibration: fix π ∈ [0, 1] and average over only those times i when p_i is "close to" π (see the sketch below):

  x̄′_n − π → 0
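
A minimal sketch of checking probability calibration empirically, with the "close to π" band width as an assumed tuning parameter:

```python
def calibration_at(x, p, pi, width=0.05):
    """Average realized outcome over the trials whose forecast was within
    `width` of pi; under probability calibration this approaches pi as the
    number of selected trials grows."""
    hits = [xi for xi, pf in zip(x, p) if abs(pf - pi) <= width]
    return sum(hits) / len(hits) if hits else None
```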

Example

[figure]

Calibration plot

[figure]

Computable calibration

Let σ be a computable strategy for selecting trials in the light of previous outcomes and forecasts, e.g. the third day following two successive rainy days, where the forecast is below 0.5.

Then require asymptotic equality of the averages, p̄_σ and x̄_σ, of the (p_i) and (x_i) over those trials selected by σ (see the sketch below).

Why? We can show the following. Let P be a distribution for X, and P_i := P(X_i = 1 | X^{i−1}). Then

  P̄_σ − X̄_σ → 0

P-almost surely, for any distribution P.
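
A minimal sketch with a hypothetical selection strategy of the kind just described (outcome 1 read as "rainy"):

```python
def sigma_select(x, p):
    """Select trial i when the two previous outcomes were rainy (1) and the
    forecast for day i is below 0.5 -- a computable strategy using only
    previous outcomes and the current forecast."""
    return [i for i in range(2, len(x))
            if x[i - 1] == 1 and x[i - 2] == 1 and p[i] < 0.5]

def sigma_averages(x, p):
    """Average outcome and average forecast over the selected trials;
    computable calibration requires these to agree asymptotically."""
    sel = sigma_select(x, p)
    if not sel:
        return None
    return sum(x[i] for i in sel) / len(sel), sum(p[i] for i in sel) / len(sel)
```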

Well-calibrated forecasts are essentially unique

Suppose p and q are computable forecast sequences, each computably calibrated for the same outcome sequence x. Then p_i − q_i → 0.

Significance test

Consider e.g.

  Z_n := Σ_{i=1}^n (X_i − P_i) / {Σ_{i=1}^n P_i(1 − P_i)}^{1/2}

where P_i = P(X_i = 1 | X^{i−1}).

Then

  Z_n →_L N(0, 1)

for (almost) any P.

So we can refer the value of Z_n to standard normal tables to test departure from calibration, even without knowing the generating distribution P: the "Strong Prequential Principle".
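
A direct implementation sketch of this statistic:

```python
import math

def calibration_z(x, p):
    """Z_n = sum(x_i - p_i) / sqrt(sum p_i (1 - p_i)).
    Approximately N(0, 1) under valid forecasts, so e.g. |Z_n| > 1.96
    signals departure from calibration at roughly the 5% level."""
    num = sum(xi - pi for xi, pi in zip(x, p))
    den = math.sqrt(sum(pi * (1 - pi) for _, pi in zip(x, p)))
    return num / den
```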

Recursive residuals

Suppose the X_i are continuous variables, and the forecast for X_i has the form of a continuous cumulative distribution function F_i(·).

If X ~ P, and the forecasts are obtained from P,

  F_i(x) := P(X_i ≤ x | X^{i−1} = x^{i−1}),

then, defining

  U_i := F_i(X_i),

we have U_i ~ U[0, 1], independently, for any P.

So we can apply various tests of uniformity and/or independence to the observed values

  u_i := F_i(x_i)

to test the validity of the forecasts made, again without needing to know the generating distribution P.
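
A sketch of the resulting check, using a hand-rolled Kolmogorov–Smirnov distance so it stays self-contained (any standard uniformity test would do):

```python
def pit_values(x, cdfs):
    """Recursive residuals u_i = F_i(x_i): each realized outcome passed
    through its own forecast CDF. Valid forecasts give i.i.d. U[0,1] values."""
    return [F(xi) for xi, F in zip(x, cdfs)]

def ks_uniform(u):
    """Kolmogorov-Smirnov distance between the empirical CDF of the u_i and
    the U[0,1] CDF; large values cast doubt on the forecasts."""
    u, n = sorted(u), len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))
```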

Comparative assessment

Loss function

Measure the inadequacy of a forecast f of an outcome x by a loss function L(x, f).

Then a measure of the overall inadequacy of a forecast sequence f^n for an outcome sequence x^n is the cumulative loss

  L_n = Σ_{i=1}^n L(x_i, f_i).

We can use this to compare different forecasting systems.

Examples

Squared error: f a point forecast of a real-valued X,

  L(x, f) = (x − f)².

Logarithmic score: f a probability density q(·) for X,

  L(x, q) = −log q(x).
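
A minimal sketch of comparing forecasting systems by cumulative loss under either example:

```python
import math

def cumulative_loss(x, forecasts, loss):
    """Cumulative loss L_n of a forecast sequence on an outcome sequence."""
    return sum(loss(xi, fi) for xi, fi in zip(x, forecasts))

def squared_error(x, f):
    return (x - f) ** 2

def log_score(x, q):
    return -math.log(q(x))   # q is the forecast density for this step
```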

Single distribution P

At time i, having observed x^i, the probability forecast for X_{i+1} is its conditional distribution P_{i+1}(X_{i+1}) := P(X_{i+1} | X^i = x^i).

When we then observe X_{i+1} = x_{i+1}, the associated logarithmic score is −log p(x_{i+1} | x^i).

So the cumulative score is

  L_n(P) = Σ_{i=0}^{n−1} −log p(x_{i+1} | x^i)
         = −log Π_{i=1}^n p(x_i | x^{i−1})
         = −log p(x^n)

where p(·) is the joint density of X under P.

Likelihood

L_n(P) is just the (negative) log-likelihood of the joint distribution P for the observed data sequence x^n.

If P and Q are alternative joint distributions, considered as forecasting systems, then the excess score of Q over P is just the log-likelihood ratio for comparing P to Q on the full data x^n.

This gives an interpretation of, and a use for, likelihood that does not rely on assuming the truth of any of the models considered.

Bayesian forecasting system

For a BFS:

  P*_{i+1}(X_{i+1}) = ∫ P_θ(X_{i+1} | x^i) π_i(θ) dθ
                    = P_B(X_{i+1} | x^i)

where P_B := ∫ P_θ π(θ) dθ is the Bayes mixture joint distribution.

This is equivalent to basing all forecasts on the single distribution P_B. The total logarithmic score is thus

  L_n(P*) = L_n(P_B)
          = −log p_B(x^n)
          = −log ∫ p_θ(x^n) π(θ) dθ.
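
A worked sketch under an assumed Bernoulli(θ) model with uniform prior π(θ) = 1: the posterior after i observations with k ones is Beta(k+1, i−k+1), the mixture forecast is Laplace's rule (k+1)/(i+2), and the prequential score reproduces −log p_B(x^n) with p_B(x^n) = k!(n−k)!/(n+1)!.

```python
import math

def bfs_bernoulli(x):
    """BFS forecasts for Bernoulli(theta) under a uniform prior, and a check
    that the prequential score equals the single-distribution score of P_B."""
    cum_score = 0.0
    for i, xi in enumerate(x):
        p = (sum(x[:i]) + 1) / (i + 2)            # Laplace's rule of succession
        cum_score += -math.log(p if xi == 1 else 1 - p)
    n, k = len(x), sum(x)
    direct = -math.log(math.factorial(k) * math.factorial(n - k)
                       / math.factorial(n + 1))   # -log p_B(x^n)
    return cum_score, direct                      # the two agree

print(bfs_bernoulli([1, 0, 1, 1, 0]))
```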

Plug-in SFS

For a plug-in system:

  L_n = −log Π_{i=0}^{n−1} p_{θ̂_i}(x_{i+1} | x^i).

The data (x_{i+1}) used to evaluate performance and the data (x^i) used to estimate θ do not overlap: "unbiased" assessments (like cross-validation).

If x_i is used to forecast x_j, then x_j is not used to forecast x_i: "uncorrelated" assessments (unlike cross-validation).

Both under- and over-fitting are automatically and appropriately penalized.

Prequential efficiency

Efficiency

Let P be an SFS. P is prequentially efficient for {P_θ} if, for any PFS Q,

  L_n(P) − L_n(Q) remains bounded above as n → ∞,

with P_θ-probability 1, for almost all θ.

[In particular, the losses of any two efficient SFSs differ by an amount that remains asymptotically bounded under almost all P_θ.]

A BFS with π(θ) > 0 is prequentially efficient.

A plug-in SFS based on a Fisher-efficient estimator sequence is prequentially efficient.

Model testing

Model: X ~ P_θ (θ ∈ Θ).

Let P be prequentially efficient for the model {P_θ}, and define:

  μ_i = E_P(X_i | X^{i−1})
  σ²_i = var_P(X_i | X^{i−1})
  Z_n = Σ_{i=1}^n (X_i − μ_i) / {Σ_{i=1}^n σ²_i}^{1/2}

Then Z_n →_L N(0, 1) under any P_θ in the model.

So refer Z_n to standard normal tables to test the model.
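
A direct sketch of the statistic, with the predictive means and variances supplied by whatever prequentially efficient system is in use:

```python
import math

def model_test_z(x, mu, var):
    """Z_n = sum(x_i - mu_i) / sqrt(sum var_i), where mu_i and var_i are the
    one-step predictive means and variances; approximately N(0, 1) when the
    model is adequate, so large |Z_n| discredits the model."""
    num = sum(xi - mi for xi, mi in zip(x, mu))
    return num / math.sqrt(sum(var))
```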

Model choice

Prequential consistency

Probability models: a collection C = {P_j : j = 1, 2, ...}.

Both a BFS and a (suitable) plug-in SFS are prequentially consistent: with probability 1 under any P_j ∈ C, their forecasts will come to agree with those made by P_j.

Parametric models: a collection C = {𝒫_j : j = 1, 2, ...}, where each 𝒫_j is itself a parametric model 𝒫_j = {P_{j,θ_j}}; these can have different dimensionalities.

Replace each 𝒫_j by a prequentially efficient single distribution P_j and proceed as above.

For each j, for almost all θ_j, with probability 1 under P_{j,θ_j}, the resulting forecasts will come to agree with those made by P_{j,θ_j}.

Out-of-model performance

Suppose we use a model P = {P_θ} for X, but the data are generated from a distribution Q ∉ P. For an observed data sequence x we have sequences of probability forecasts P_{θ,i} := P_θ(X_i | x^{i−1}), based on each P_θ ∈ P, and "true" predictive distributions Q_i := Q(X_i | x^{i−1}). The "best" value of θ for predicting x^n might be defined as

  θ^Q_n := arg min_θ Σ_{i=1}^n K(Q_i, P_{θ,i}),

with K the Kullback–Leibler divergence. NB: this typically depends on the observed data.

With θ̂_n the maximum likelihood estimate based on x^n, we can show that for any Q, with Q-probability 1,

  θ̂_n − θ^Q_n → 0.
