Higher order elicitability. Johanna F. Ziegel, University of Bern.


Higher order elicitability. Johanna F. Ziegel, University of Bern. Joint work with Fernando Fasciati, Tobias Fissler, Tilmann Gneiting, Alexander Jordan, Fabian Krüger and Natalia Nolde. 2 June 2017, Van Dantzig Seminar.


slide-1
SLIDE 1

Higher order elicitability

Johanna F. Ziegel

University of Bern

joint work with Fernando Fasciati, Tobias Fissler, Tilmann Gneiting, Alexander Jordan, Fabian Krüger and Natalia Nolde

2 June 2017, Van Dantzig Seminar

1 / 52

slide-2
SLIDE 2

Outline

1. Elicitability

◮ Definition and a simple example
◮ Risk measures
◮ k-Elicitability
◮ Osband’s principle

2. Evaluating forecasts of expected shortfall

◮ Absolute forecast evaluation
◮ Classical comparative forecast evaluation
◮ Comparative forecast evaluation with Murphy diagrams

3. Summary

2 / 52


slide-4
SLIDE 4

Elicitability

Let $\mathcal{P}$ be a class of probability measures on $O \subseteq \mathbb{R}^d$. Let $T : \mathcal{P} \to A$, $F \mapsto T(F)$, be a functional, where $A \subseteq \mathbb{R}$.

Definition

A scoring (or loss) function $S : A \times O \to \mathbb{R}$ is consistent for $T$ relative to $\mathcal{P}$ if
$$\mathbb{E}_F S(T(F), Y) \le \mathbb{E}_F S(x, Y), \quad F \in \mathcal{P},\ x \in A.$$
It is strictly consistent if “=” implies $x = T(F)$. The functional $T$ is called elicitable relative to $\mathcal{P}$ if there exists a scoring function $S$ that is strictly consistent for it. In other words,
$$T(F) = \arg\min_{x \in A} \mathbb{E}_F S(x, Y).$$

4 / 52

slide-5
SLIDE 5

A simple example – the mean

Let $Y$ be a random variable with distribution function $F$. Suppose that $\mathbb{E}_F Y^2 < \infty$. Then
$$\mathbb{E}_F Y = \arg\min_{x \in \mathbb{R}} \mathbb{E}_F (Y - x)^2.$$

◮ The mean is elicitable with respect to the class of all probability measures with finite second moment.
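The claim can be checked with a one-line decomposition, sketched here for completeness under the same assumption of a finite second moment:

```latex
\begin{align*}
\mathbb{E}_F (Y - x)^2
  &= \mathbb{E}_F (Y - \mathbb{E}_F Y)^2
     + 2(\mathbb{E}_F Y - x)\,\mathbb{E}_F (Y - \mathbb{E}_F Y)
     + (\mathbb{E}_F Y - x)^2 \\
  &= \operatorname{var}_F(Y) + (\mathbb{E}_F Y - x)^2 .
\end{align*}
```

The first term does not depend on $x$ and the second vanishes only at $x = \mathbb{E}_F Y$, so the squared error is strictly consistent for the mean on this class.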

5 / 52

slide-6
SLIDE 6

A simple example – the mean

Theorem (Savage, 1971)

Let $\mathcal{P}$ be a class of probability measures with finite first moments. Let $\phi$ be a (strictly) convex function such that $\mathbb{E}_F \phi(Y)$ exists and is finite for all $F \in \mathcal{P}$. Then
$$S(x, y) = \phi(y) - \phi(x) - \phi'(x)(y - x)$$
is (strictly) consistent for the mean.

◮ Under suitable assumptions on $\mathcal{P}$, the Bregman functions are the only consistent scoring functions for the mean.
◮ Choosing $\phi(y) = y^2/(1 + |y|)$ shows that the mean is elicitable with respect to the class of all probability measures with finite first moment.
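As a rough numerical illustration of the theorem, the sketch below evaluates the Bregman score for $\phi(y) = y^2/(1+|y|)$ on a heavy-tailed sample and checks that the sample mean attains a smaller average score than nearby reports. The Pareto sample, the seed and the helper names are assumptions made for this example only; they are not part of the talk.

```python
import random

def phi(y: float) -> float:
    """Convex function phi(y) = y^2 / (1 + |y|) from the slide."""
    return y * y / (1.0 + abs(y))

def dphi(x: float) -> float:
    """Derivative of phi: x * (2 + |x|) / (1 + |x|)^2."""
    return x * (2.0 + abs(x)) / (1.0 + abs(x)) ** 2

def bregman_score(x: float, y: float) -> float:
    """S(x, y) = phi(y) - phi(x) - phi'(x) * (y - x)."""
    return phi(y) - phi(x) - dphi(x) * (y - x)

def mean_score(x: float, ys: list[float]) -> float:
    """Average score of the constant report x over the sample ys."""
    return sum(bregman_score(x, y) for y in ys) / len(ys)

rng = random.Random(0)
# Illustrative heavy-tailed sample: Pareto with shape 1.5 has a finite mean
# but an infinite second moment.
ys = [rng.paretovariate(1.5) for _ in range(2000)]
sample_mean = sum(ys) / len(ys)

# Compare the average score at the sample mean with shifted reports.
candidates = [sample_mean + d for d in (-2.0, -0.5, -0.1, 0.1, 0.5, 2.0)]
best_alternative = min(mean_score(x, ys) for x in candidates)
score_at_mean = mean_score(sample_mean, ys)
print(score_at_mean, best_alternative)
```

The comparison uses the empirical distribution of the sample as $F$, for which the sample mean is exactly $\mathbb{E}_F Y$.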

6 / 52


slide-9
SLIDE 9

Why is elicitability interesting?

Generalized regression/M-estimation

Assume the model
$$T(\mathcal{L}(Y \mid Z)) = m(Z, \beta),$$
parametrized by $\beta \in \Theta$, and let $S$ be a strictly consistent scoring function for $T$. Suppose we have iid observations $(z_i, y_i)$, $i = 1, \dots, n$, from $(Z, Y)$. Then we can estimate $\beta$ by
$$\hat{\beta} = \arg\min_{\beta' \in \Theta} \frac{1}{n} \sum_{i=1}^n S(m(z_i, \beta'), y_i).$$

◮ Least squares regression
◮ Quantile regression
◮ Logistic regression
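A minimal sketch of the M-estimation idea for the simplest possible case, an intercept-only quantile model $m(z, \beta) = \beta$ with the pinball score. The simulated data, the level 0.1 and the function names are assumptions for illustration; a real quantile regression would use covariates and a proper optimiser.

```python
import random

def pinball(x: float, y: float, alpha: float) -> float:
    """Pinball (quantile) score S(x, y) = (1{y <= x} - alpha) * (x - y)."""
    return ((1.0 if y <= x else 0.0) - alpha) * (x - y)

def fit_constant_quantile(ys: list[float], alpha: float) -> float:
    """M-estimator for the intercept-only model m(z, beta) = beta.

    The empirical score is piecewise linear and convex in beta, so a
    minimiser is attained at one of the observations.
    """
    def empirical_score(beta: float) -> float:
        return sum(pinball(beta, y, alpha) for y in ys) / len(ys)
    return min(ys, key=empirical_score)

rng = random.Random(1)
ys = [rng.gauss(0.0, 1.0) for _ in range(501)]
beta_hat = fit_constant_quantile(ys, alpha=0.1)
print(beta_hat)  # an empirical 10% quantile of the sample
```

Replacing the pinball score by the squared error would give the sample mean instead, which is the least squares case in the list above.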

7 / 52


slide-11
SLIDE 11

Why is elicitability interesting?

Forecast comparison/Model selection

Suppose we have sequences of competing forecasts $x_1^A, \dots, x_n^A$ and $x_1^B, \dots, x_n^B$ for $T$, and observations $y_1, \dots, y_n$. Let $S$ be a strictly consistent scoring function for $T$. Then it is natural to prefer method A over method B if
$$\frac{1}{n} \sum_{i=1}^n S(x_i^A, y_i) < \frac{1}{n} \sum_{i=1}^n S(x_i^B, y_i).$$
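The comparison can be sketched for mean forecasts with the squared error as scoring function. The data-generating process below (a latent signal $\mu_t$ known only to method A) is a hypothetical setup chosen for illustration.

```python
import random

def squared_error(x: float, y: float) -> float:
    """Strictly consistent scoring function for the mean."""
    return (x - y) ** 2

rng = random.Random(2)
n = 5000
mus = [rng.gauss(0.0, 1.0) for _ in range(n)]    # latent signal mu_t
ys = [rng.gauss(mu, 1.0) for mu in mus]          # observation y_t ~ N(mu_t, 1)

forecasts_a = mus          # method A issues the conditional mean mu_t
forecasts_b = [0.0] * n    # method B issues the unconditional mean 0

mean_score_a = sum(squared_error(x, y) for x, y in zip(forecasts_a, ys)) / n
mean_score_b = sum(squared_error(x, y) for x, y in zip(forecasts_b, ys)) / n
print(mean_score_a, mean_score_b)
```

Both methods are calibrated in this setup; the average score still separates them because method A uses more information.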

8 / 52


slide-13
SLIDE 13

Risk measures

Let Y ∼ F be the single-period return of some financial asset.

◮ A risk measure assigns a real number to Y (interpreted as the risk of the asset).

Risk measures are used for

◮ external regulatory capital calculation
◮ management, optimization and decision making
◮ performance analysis
◮ capital allocation

9 / 52


slide-15
SLIDE 15

Value at Risk and expected shortfall

Let $Y \sim F$, $\alpha \in (0, 1)$.

Value at Risk (VaR)

$$\mathrm{VaR}_\alpha(Y) = q_\alpha(F) = \inf\{x \in \mathbb{R} : P(Y \le x) \ge \alpha\}.$$

Expected shortfall (ES)

$$\mathrm{ES}_\alpha(Y) = \frac{1}{\alpha} \int_0^\alpha \mathrm{VaR}_u(Y)\, du.$$

◮ Profits are positive.
◮ We consider α close to zero (α = 0.01, α = 0.025).
◮ Risky positions yield large negative values of VaRα and ESα.
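For a finite sample, both definitions can be evaluated directly on the empirical distribution. The following is a small sketch with hypothetical function names and toy returns; it uses the same sign convention as above (profits positive, lower tail).

```python
def empirical_var(ys: list[float], alpha: float) -> float:
    """inf{x : F_n(x) >= alpha} for the empirical distribution F_n of ys."""
    srt = sorted(ys)
    n = len(srt)
    for k in range(1, n + 1):
        if k / n >= alpha:
            return srt[k - 1]
    return srt[-1]

def empirical_es(ys: list[float], alpha: float) -> float:
    """(1/alpha) * integral_0^alpha VaR_u du for the empirical distribution.

    The empirical quantile function equals the k-th order statistic on
    ((k-1)/n, k/n], so the integral reduces to a finite weighted sum.
    """
    srt = sorted(ys)
    n = len(srt)
    total = 0.0
    for k in range(1, n + 1):
        weight = min(k / n, alpha) - (k - 1) / n
        if weight <= 0.0:
            break
        total += weight * srt[k - 1]
    return total / alpha

returns = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy data
var_20 = empirical_var(returns, 0.2)
es_20 = empirical_es(returns, 0.2)
print(var_20, es_20)
```

With these ten toy returns and α = 0.2, the VaR is the second smallest value and the ES is the average of the two smallest values.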

10 / 52

slide-16
SLIDE 16

Criticism of VaR as a risk measure

Lack of super-additivity

◮ Usually there are several $Y^{(1)}, Y^{(2)}, \dots$ to be considered with limited knowledge of their dependence.
◮ Goal: Bound on risk of the total $\sum_i Y^{(i)}$.
◮ VaR is not super-additive: There are $Y^{(1)}, Y^{(2)}$ such that
$$\mathrm{VaR}_\alpha(Y^{(1)} + Y^{(2)}) < \mathrm{VaR}_\alpha(Y^{(1)}) + \mathrm{VaR}_\alpha(Y^{(2)}).$$
◮ Problematic for risk aggregation.
◮ Counterintuitive to diversification.

It is just a quantile. . .

◮ VaRα does not take sizes of losses beyond the threshold α into account.
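A standard textbook-style counterexample makes the failure of super-additivity concrete. The two independent positions below, each losing 100 with probability 4%, are a hypothetical choice for illustration and do not come from the talk.

```python
from itertools import product

def value_at_risk(dist: dict[float, float], alpha: float) -> float:
    """VaR_alpha = inf{x : P(Y <= x) >= alpha} for a finite distribution,
    given as a mapping from outcome to probability."""
    cumulative = 0.0
    for outcome in sorted(dist):
        cumulative += dist[outcome]
        if cumulative >= alpha:
            return outcome
    return max(dist)

def independent_sum(d1: dict[float, float],
                    d2: dict[float, float]) -> dict[float, float]:
    """Distribution of Y1 + Y2 for independent Y1 ~ d1 and Y2 ~ d2."""
    out: dict[float, float] = {}
    for (y1, p1), (y2, p2) in product(d1.items(), d2.items()):
        out[y1 + y2] = out.get(y1 + y2, 0.0) + p1 * p2
    return out

alpha = 0.05
position = {-100.0: 0.04, 0.0: 0.96}   # hypothetical: 4% chance of losing 100
portfolio = independent_sum(position, position)

var_single = value_at_risk(position, alpha)
var_portfolio = value_at_risk(portfolio, alpha)
print(var_single, var_portfolio)
```

Each single position has a 5% VaR of 0 because its loss probability is below 5%, while the portfolio loses at least 100 with probability 7.84%, so its VaR is strictly below the sum of the individual values.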

11 / 52


slide-18
SLIDE 18

Expected shortfall

For continuous distributions, we have
$$\mathrm{ES}_\alpha(Y) = \mathbb{E}(Y \mid Y \le \mathrm{VaR}_\alpha(Y)).$$

◮ ESα is a coherent risk measure, so in particular super-additive.
◮ It takes the entire tail of the distribution into account.
◮ It is the largest coherent risk measure that is dominated by VaRα.
◮ It has a natural interpretation.

12 / 52

slide-19
SLIDE 19

Elicitable and non-elicitable functionals

Elicitable

◮ Mean, moments
◮ Median, quantiles/VaR
◮ Expectiles (Newey and Powell, 1987)

Not elicitable

◮ Variance
◮ Expected Shortfall (Weber, 2006, Gneiting, 2011)

Elicitable. . .

◮ coherent risk measures: Expectiles
◮ convex risk measures: Shortfall risk measures
◮ distortion risk measures: VaR and mean

(Weber, 2006, Z 2014, Bellini and Bignozzi, 2014, Delbaen et al. 2015, Kou and Peng 2014, Wang and Z 2015)

13 / 52

slide-20
SLIDE 20

k-Elicitability

Let $\mathcal{P}$ be a class of probability measures on $O \subseteq \mathbb{R}^d$. Let $T : \mathcal{P} \to A$, $F \mapsto T(F)$, be a functional, where $A \subseteq \mathbb{R}^k$.

Definition

A scoring (or loss) function $S : A \times O \to \mathbb{R}$ is $\mathcal{P}$-consistent for $T$ if
$$\mathbb{E}_F S(T(F), Y) \le \mathbb{E}_F S(x, Y), \quad F \in \mathcal{P},\ x \in A.$$
It is strictly $\mathcal{P}$-consistent if “=” implies $x = T(F)$. The functional $T$ is called k-elicitable relative to $\mathcal{P}$ if there exists a scoring function $S$ that is strictly consistent for it. In other words,
$$T(F) = \arg\min_{x \in A} \mathbb{E}_F S(x, Y).$$

14 / 52

slide-21
SLIDE 21

Elicitable functionals

1-Elicitable

◮ Mean, moments
◮ Median, quantiles/VaR
◮ Expectiles (Newey and Powell, 1987)

2-Elicitable

◮ Mean and variance
◮ Second moment and variance
◮ VaR and expected shortfall

(Acerbi and Szekely, 2014, Fissler and Z, 2016)

k-Elicitable

◮ Some spectral risk measures together with several VaRs at certain levels (Fissler and Z, 2016)

15 / 52

slide-22
SLIDE 22

T = (VaRα, ESα)

Theorem (Fissler and Z, 2016)

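Let $\alpha \in (0, 1)$ and $A_0 := \{x \in \mathbb{R}^2 : x_1 \ge x_2\}$. Let $\mathcal{P}$ be a class of probability measures on $\mathbb{R}$ with finite first moments and unique α-quantiles. Any scoring function $S : A_0 \times \mathbb{R} \to \mathbb{R}$ of the form
$$S(x_1, x_2, y) = \big(\mathbb{1}\{y \le x_1\} - \alpha\big) G_1(x_1) - \mathbb{1}\{y \le x_1\} G_1(y) + G_2(x_2) \Big(x_2 - x_1 + \frac{1}{\alpha} \mathbb{1}\{y \le x_1\}(x_1 - y)\Big) - \mathcal{G}_2(x_2),$$
with $\mathcal{G}_2' = G_2$, is $\mathcal{P}$-consistent for $T = (\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ if $\mathbb{1}_{(-\infty, x_1]} G_1$ is $\mathcal{P}$-integrable and

◮ $G_1$ is increasing and $\mathcal{G}_2$ is increasing and convex.

It is strictly $\mathcal{P}$-consistent if, additionally,

◮ $\mathcal{G}_2$ is strictly increasing and strictly convex.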
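To see the theorem at work numerically, the sketch below implements one member of this family, assuming $G_1(x) = x$ and $G_2 = \mathcal{G}_2 = \exp$, and checks on a simulated sample that the empirical (VaR, ES) pair attains a smaller average score than perturbed pairs inside $A_0$. The sample, the perturbation grid and all function names are assumptions for illustration.

```python
import math
import random

def fz_score(x1: float, x2: float, y: float, alpha: float) -> float:
    """Score of the displayed form with G1(x) = x and G2 = curly-G2 = exp."""
    hit = 1.0 if y <= x1 else 0.0
    g2 = math.exp(x2)
    return ((hit - alpha) * x1 - hit * y
            + g2 * (x2 - x1 + hit * (x1 - y) / alpha) - g2)

def empirical_var_es(ys: list[float], alpha: float) -> tuple[float, float]:
    """VaR and ES of the empirical distribution of ys."""
    srt = sorted(ys)
    n = len(srt)
    var = next(srt[k - 1] for k in range(1, n + 1) if k / n >= alpha)
    # Equivalent form of ES at a quantile q: q - E[(q - Y)^+] / alpha.
    es = var - sum(max(var - y, 0.0) for y in srt) / (n * alpha)
    return var, es

def mean_score(x1: float, x2: float, ys: list[float], alpha: float) -> float:
    return sum(fz_score(x1, x2, y, alpha) for y in ys) / len(ys)

alpha = 0.1
rng = random.Random(3)
ys = [rng.gauss(0.0, 1.0) for _ in range(1000)]
var, es = empirical_var_es(ys, alpha)
best = mean_score(var, es, ys, alpha)

# Perturb the forecast pair, staying inside A0 = {x1 >= x2}, and record
# the smallest competing average score.
shifts = (-0.5, -0.1, 0.0, 0.1, 0.5)
others = min(mean_score(var + a, es + b, ys, alpha)
             for a in shifts for b in shifts
             if (a, b) != (0.0, 0.0) and var + a >= es + b)
print(var, es, best, others)
```

Other admissible choices of $G_1$ and $\mathcal{G}_2$ can be swapped in by changing `fz_score` only.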

16 / 52

slide-23
SLIDE 23

T = (VaRα, ESα)

Theorem (Fissler and Z, 2016, Part 2)

If T(P) = A0, the class P is rich enough and S fulfils some smoothness conditions, all strictly P-consistent scoring functions for T are of the above form (up to equivalence).

Corollary

If the elements of P have finite first moment and unique α-quantiles, then the pair T = (VaRα, ESα): P → A0 is 2-elicitable.

17 / 52


slide-25
SLIDE 25

Osband’s Principle

◮ Osband’s principle originates from Osband (1985) and gives a necessary condition for strictly consistent scoring functions.
◮ It gives a connection between the partial derivatives of the expected score and an expected identification function.

Definition

A $\mathcal{P}$-identification function for a functional $T$ is a function $V : A \times \mathbb{R} \to \mathbb{R}^k$ such that
$$\mathbb{E}_F V(x, Y) = 0 \iff x = T(F)$$
for all $F \in \mathcal{P}$ and for all $x \in A$.

Examples:

◮ Mean: $V(x, y) = x - y$
◮ α-quantile: $V(x, y) = \mathbb{1}\{y \le x\} - \alpha$.
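The two example identification functions can be checked by simulation: their sample averages should be close to zero at the true functional value and away from zero otherwise. The standard normal sample and the level 0.1 are assumptions for this sketch.

```python
import random
from statistics import NormalDist

def v_mean(x: float, y: float) -> float:
    """Identification function for the mean: V(x, y) = x - y."""
    return x - y

def v_quantile(x: float, y: float, alpha: float) -> float:
    """Identification function for the alpha-quantile: 1{y <= x} - alpha."""
    return (1.0 if y <= x else 0.0) - alpha

alpha = 0.1
rng = random.Random(4)
ys = [rng.gauss(0.0, 1.0) for _ in range(20000)]
true_mean = 0.0
true_quantile = NormalDist().inv_cdf(alpha)

avg_v_mean = sum(v_mean(true_mean, y) for y in ys) / len(ys)
avg_v_quantile = sum(v_quantile(true_quantile, y, alpha) for y in ys) / len(ys)
# Reporting the median as if it were the 10% quantile is clearly detected.
avg_v_wrong = sum(v_quantile(0.0, y, alpha) for y in ys) / len(ys)
print(avg_v_mean, avg_v_quantile, avg_v_wrong)
```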

18 / 52


slide-27
SLIDE 27

Osband’s Principle

Theorem (Osband’s Principle; Fissler and Z (2016))

Let $T : \mathcal{P} \to A \subseteq \mathbb{R}^k$ be a surjective, elicitable and identifiable functional with $\mathcal{P}$-identification function $V : A \times \mathbb{R} \to \mathbb{R}^k$ and a strictly $\mathcal{P}$-consistent scoring function $S : A \times \mathbb{R} \to \mathbb{R}$. Under some assumptions, there exists a matrix-valued function $h : \mathrm{int}(A) \to \mathbb{R}^{k \times k}$ such that
$$\nabla_x \mathbb{E}_F S(x, Y) = h(x)\, \mathbb{E}_F V(x, Y)$$
for all $x \in \mathrm{int}(A)$ and $F \in \mathcal{P}$.

Key idea: Exploit the first order condition of the minimization problem: $\nabla_x \mathbb{E}_F S(x, Y) = 0$ for $x = T(F)$ for all $F \in \mathcal{P}$.

“The gradient ∇S is an identification function.”

19 / 52


slide-29
SLIDE 29

Osband’s Principle

(Under some smoothness conditions) Second order conditions for the minimization problem: The Hessian
$$\nabla_x^2 \big[\mathbb{E}_F S(x, Y)\big] \in \mathbb{R}^{k \times k}$$
must be symmetric for all $x \in A$, $F \in \mathcal{P}$, and positive semi-definite at $x = T(F)$.

◮ For k = 1 the necessary conditions of Osband’s principle directly lead to sufficient conditions: For an oriented identification function, choose some h > 0 and integrate.
◮ Harder for k > 1:
  ◮ Symmetry/positive semi-definiteness of the Hessian imposes (complicated) restrictions on the function h.
  ◮ Even if $x \mapsto \mathbb{E}_F S(x, Y)$ has only one critical point and the Hessian is positive definite there, we can only guarantee a local minimum!
◮ Generally, we must verify sufficient conditions on a case by case basis.
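For the mean, the k = 1 recipe can be carried out explicitly. The sketch below assumes a twice differentiable, strictly convex $\phi$ and takes $h = \phi''$; this parametrisation of $h$ is a choice made here to connect with the Bregman form in Savage's theorem earlier in the talk.

```latex
\begin{align*}
V(x, y) &= x - y, \qquad h(x) = \phi''(x) > 0, \\
\partial_x S(x, y) &= h(x)\, V(x, y) = \phi''(x)(x - y), \\
S(x, y) &= \int_y^x \phi''(z)(z - y)\, dz
         = \phi(y) - \phi(x) - \phi'(x)(y - x).
\end{align*}
```

The last equality follows from integration by parts, and the normalisation $S(y, y) = 0$ fixes the integration constant.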

20 / 52


slide-32
SLIDE 32

Application examples

In Fissler and Z (2016), we considered:

◮ Functionals with elicitable components (vectors of quantiles, expectiles, ratios of expectations, . . . )
◮ Spectral risk measures with finitely supported spectral measure
◮ In particular: (VaRα, ESα)

21 / 52

slide-33
SLIDE 33

Functionals with elicitable components – two examples

Vectors of quantiles

◮ Let $T(F) = (q_{\alpha_1}(F), \dots, q_{\alpha_k}(F))$ with pairwise different $\alpha_1, \dots, \alpha_k$.
◮ Strictly consistent scoring functions are of the form
$$S(x_1, \dots, x_k, y) = \sum_{m=1}^k S_m(x_m, y).$$

Vectors of expectations

◮ Let $T(F) = \mathbb{E}_F(p(Y))$ for some $p : \mathbb{R}^d \to \mathbb{R}^k$.
◮ Strictly consistent scoring functions are of the form
$$S(x, y) = \phi(y) - \phi(x) - \langle \nabla\phi(x), p(y) - x \rangle,$$
where $\phi$ is strictly convex.

22 / 52



slide-36
SLIDE 36

Evaluating forecasts of expected shortfall

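Filtration $\mathcal{F} = \{\mathcal{F}_t\}_{t \in \mathbb{N}}$

Prediction-observation triples $(Q_t, E_t, Y_t)_{t \in \mathbb{N}}$

$Q_t$: VaRα prediction for time point t, $\mathcal{F}_{t-1}$-measurable

$E_t$: ESα prediction for time point t, $\mathcal{F}_{t-1}$-measurable

$Y_t$: Realization at time point t, $\mathcal{F}_t$-measurable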

24 / 52

slide-37
SLIDE 37

Absolute evaluation: Model verification

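Let $V$ be an identification function for $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$, that is
$$\mathbb{E}(V(q, v, Y)) = 0 \iff (q, v) = (\mathrm{VaR}_\alpha(Y), \mathrm{ES}_\alpha(Y)).$$

Definition (Calibration)

The sequence of predictions $\{(Q_t, E_t)\}_{t \in \mathbb{N}}$ is conditionally calibrated for $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ if
$$\mathbb{E}(V(Q_t, E_t, Y_t) \mid \mathcal{F}_{t-1}) = 0 \quad \text{for all } t \in \mathbb{N}.$$

Compare Davis (2016).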

25 / 52

slide-38
SLIDE 38

Traditional backtesting

$H_0^C$: The sequence of predictions $\{(Q_t, E_t)\}_{t \in \mathbb{N}}$ is conditionally calibrated.

◮ Backtesting decision: If we do not reject $H_0^C$, the risk measure estimates are adequate.
◮ Most existing backtests can be described as a test for conditional calibration. (McNeil and Frey, 2000, Acerbi and Szekely 2014)
◮ Elicitability is not relevant.
◮ Does not give guidance for decision between methods.
◮ Does not respect increasing information sets.

26 / 52


slide-40
SLIDE 40

Comparative evaluation: Model selection

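Filtrations $\mathcal{F} = \{\mathcal{F}_t\}_{t \in \mathbb{N}}$ and $\mathcal{F}^* = \{\mathcal{F}^*_t\}_{t \in \mathbb{N}}$

$Q_t, Q_t^*$: VaRα predictions for time point t

$E_t, E_t^*$: ESα predictions for time point t

$Q_t, E_t$: internal model, $\mathcal{F}_{t-1}$-measurable

$Q_t^*, E_t^*$: standard model, $\mathcal{F}^*_{t-1}$-measurable

$Y_t$: Realization at time point t, $\mathcal{F}_t$-measurable and $\mathcal{F}^*_t$-measurable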

27 / 52


slide-43
SLIDE 43

Forecast dominance

Let S be a consistent scoring function for (VaRα, ESα).

Definition (S-Dominance)

The sequence of predictions $\{(Q_t, E_t)\}_{t \in \mathbb{N}}$ S-dominates $\{(Q_t^*, E_t^*)\}_{t \in \mathbb{N}}$ if
$$\mathbb{E}\big(S(Q_t, E_t, Y_t) - S(Q_t^*, E_t^*, Y_t)\big) \le 0 \quad \text{for all } t \in \mathbb{N}.$$

28 / 52

slide-44
SLIDE 44

Comparative backtesting

$$\lambda^* := \limsup_{n \to \infty} \frac{1}{n} \sum_{t=1}^n \mathbb{E}\big(S(Q_t, E_t, Y_t) - S(Q_t^*, E_t^*, Y_t)\big),$$
$$\lambda_* := \liminf_{n \to \infty} \frac{1}{n} \sum_{t=1}^n \mathbb{E}\big(S(Q_t, E_t, Y_t) - S(Q_t^*, E_t^*, Y_t)\big).$$

◮ S-dominance implies $\lambda_* \le \lambda^* \le 0$.
◮ $\lambda^* \le 0$: Internal model is at least as good as the standard model.
◮ $\lambda_* \ge 0$: Internal model predicts at most as well as the standard model.

29 / 52

slide-45
SLIDE 45

Comparative backtesting

$$H_0^- : \lambda^* \le 0, \qquad H_0^+ : \lambda_* \ge 0.$$

◮ $\Delta_n \bar{S} := \frac{1}{n} \sum_{t=1}^n \big(S(Q_t, E_t, Y_t) - S(Q_t^*, E_t^*, Y_t)\big)$.
◮ Under suitable assumptions on the process of score differences: Asymptotically normal test statistic
$$T_2 = \frac{\Delta_n \bar{S}}{\hat{\sigma}_n / \sqrt{n}},$$
where $\hat{\sigma}_n^2$ is an HAC estimator of $\sigma_n^2 = \mathrm{var}(\sqrt{n}\, \Delta_n \bar{S})$.
◮ Reject $H_0^-$ if $T_2$ is “too much” ≥ 0.
◮ Reject $H_0^+$ if $T_2$ is “too much” ≤ 0.

(Diebold and Mariano, 1995, Giacomini and White, 2006)
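A simplified sketch of the test statistic and the two one-sided decisions follows. Two assumptions are made loudly here: the plain sample standard deviation stands in for the HAC estimator (reasonable only for uncorrelated score differences), and the critical value 1.64 is an illustrative one-sided normal quantile at roughly the 5% level. The function names are hypothetical.

```python
import math

def dm_statistic(score_diffs: list[float]) -> float:
    """T = mean(d) / (sd(d) / sqrt(n)) for score differences d_t.

    Simplification: the sample standard deviation replaces the HAC
    estimator, which ignores any serial correlation in d_t.
    """
    n = len(score_diffs)
    mean = sum(score_diffs) / n
    var = sum((d - mean) ** 2 for d in score_diffs) / (n - 1)
    return mean / math.sqrt(var / n)

def decisions(t_stat: float, critical: float = 1.64) -> dict[str, bool]:
    """One-sided test decisions based on the asymptotic normal law."""
    return {
        "reject_H0_minus": t_stat > critical,    # internal model significantly worse
        "reject_H0_plus": t_stat < -critical,    # internal model significantly better
    }

# Toy score differences (internal minus standard), for illustration only.
diffs = [1.0, 2.0, 3.0, 4.0, 5.0]
t2 = dm_statistic(diffs)
print(t2, decisions(t2))
```

In practice the score differences would come from a consistent scoring function for (VaRα, ESα) evaluated on out-of-sample forecasts, and a HAC variance estimator would replace the sample variance.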

30 / 52

slide-46
SLIDE 46

Comparative backtesting

◮ Backtesting decision using $H_0^-$: If we do not reject $H_0^-$, the risk measure estimates are acceptable (compared to the standard).
◮ Backtesting decision using $H_0^+$: If we reject $H_0^+$, the risk measure estimates are acceptable (compared to the standard).
◮ Elicitability is crucial.
◮ Allows for sensible comparison between methods.
◮ Necessitates a standard reference model.
◮ Respects increasing information sets (Holzmann and Eulert, 2014).

31 / 52



slide-49
SLIDE 49

Three zone approaches

BIS three zone approach for VaRα

◮ Traditional backtest: One-sided binomial test.
◮ Backtesting decision:

          Red          Yellow             Green
p-value   very small   moderately small   sufficiently big

◮ Generalization of three zone approach for ESα by Costanzino and Curran (2015).

Three zone approach for comparative backtesting

[Diagram: axis of the mean score S̄ of the internal model, with the mean score S̄* of the standard model marked and a margin of 1.64 Σ_N on either side; rows H− and H+, each showing a pass region and a fail region.]

32 / 52

slide-50
SLIDE 50

A numerical illustration on nested information sets

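$(\mu_t)_{t=1,\dots,N}$ iid standard normal, $Y_t \sim N(\mu_t, 1)$, conditional on $\mu_t$.

Scenario A

$(v_t, e_t) = (\mathrm{VaR}_\alpha(N(\mu_t, 1)), \mathrm{ES}_\alpha(N(\mu_t, 1)))$

$(v_t^*, e_t^*) = (\mathrm{VaR}_\alpha(N(0, 2)), \mathrm{ES}_\alpha(N(0, 2)))$

Scenario B

$(v_t, e_t) = (\mathrm{VaR}_\alpha(N(0, 2)), \mathrm{ES}_\alpha(N(0, 2)))$

$(v_t^*, e_t^*) = (\mathrm{VaR}_\alpha(N(0, 2)), \mathrm{ES}_\alpha(N(0, 2)))$ with the roles exchanged, that is, $(v_t^*, e_t^*) = (\mathrm{VaR}_\alpha(N(\mu_t, 1)), \mathrm{ES}_\alpha(N(\mu_t, 1)))$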
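Both forecasters in this illustration issue VaR and ES of a normal distribution, which have closed forms. The sketch below uses the standard lower-tail formulas $\mathrm{VaR}_\alpha = \mu + \sigma z_\alpha$ and $\mathrm{ES}_\alpha = \mu - \sigma \varphi(z_\alpha)/\alpha$ with $z_\alpha = \Phi^{-1}(\alpha)$. It reads $N(0, 2)$ as variance 2, which is the unconditional law of $Y_t$ in this setup; the value $\mu_t = 0.3$ is an arbitrary example.

```python
from statistics import NormalDist

def normal_var_es(mu: float, sigma: float, alpha: float) -> tuple[float, float]:
    """Lower-tail VaR and ES of N(mu, sigma^2), with profits positive."""
    z = NormalDist().inv_cdf(alpha)
    var = mu + sigma * z
    es = mu - sigma * NormalDist().pdf(z) / alpha
    return var, es

alpha = 0.025
# Informed forecaster for one time point with (hypothetical) mu_t = 0.3.
v_cond, e_cond = normal_var_es(0.3, 1.0, alpha)
# Uninformed forecaster: unconditional law N(0, 2), standard deviation sqrt(2).
v_unc, e_unc = normal_var_es(0.0, 2.0 ** 0.5, alpha)
print(v_cond, e_cond, v_unc, e_unc)
```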

33 / 52

slide-51
SLIDE 51

A numerical illustration – cont’d

Scenario A             Green   Yellow   Red
Traditional VaR0.01    89.35   10.65    0.00
Traditional ES0.025    93.62    6.36    0.02
Comparative VaR0.01    88.23   11.77    0.00
Comparative ES0.025    87.22   12.78    0.00

Scenario B             Green   Yellow   Red
Traditional VaR0.01    89.33   10.67    0.00
Traditional ES0.025    93.80    6.18    0.02
Comparative VaR0.01     0.00   11.77   88.23
Comparative ES0.025     0.00   12.78   87.22

N = 250; 10’000 simulations

34 / 52

slide-52
SLIDE 52

Choice of a scoring function

◮ Fissler and Z (2016): $G_1(x_1) = x_1$, $G_2(x_2) = e^{x_2}$
◮ Fissler, Z and Gneiting (2016): $G_1(x_1) = x_1$, $G_2(x_2) = e^{x_2}/(1 + e^{x_2})$

A scoring function $S$ is called positively homogeneous of degree $b$ if
$$S(cx, cy) = c^b S(x, y) \quad \text{for all } c > 0.$$

◮ Important in regression; see Efron (1991).
◮ Important in forecast ranking; see Patton (2011).
◮ Implies “unit consistency”; see Acerbi and Szekely (2014).

35 / 52


slide-54
SLIDE 54

Homogeneous scores for T = (VaRα, ESα)

◮ For the action domain $A = \mathbb{R} \times (-\infty, 0)$, there are positively homogeneous strictly consistent scoring functions of degree $b \in (-\infty, 1) \setminus \{0\}$.
◮ There are strictly consistent scoring functions on $A = \mathbb{R} \times (-\infty, 0)$ such that the score differences are positively homogeneous of degree $b = 0$.
◮ For $b \ge 1$ positively homogeneous strictly consistent scoring functions can only be defined on smaller action domains $A = \{W x_1 < x_2\}$, compare the proposal by Acerbi and Szekely (2014).

Details can be found in Nolde and Z (2016).

36 / 52


slide-57
SLIDE 57

A larger simulation study on comparative backtesting

AR(1)-GARCH(1,1)-model:
$$Y_t = \mu_t + \varepsilon_t, \quad \mu_t = -0.05 + 0.3\, Y_{t-1}, \quad \varepsilon_t = \sigma_t Z_t, \quad \sigma_t^2 = 0.01 + 0.1\, \varepsilon_{t-1}^2 + 0.85\, \sigma_{t-1}^2,$$
$(Z_t)$ iid with skewed t distribution with shape = 5 and skewness = 1.5.

Estimation procedures:

◮ Fully parametric (n-FP, t-FP, st-FP)
◮ Filtered historical simulation (n-FHS, t-FHS, st-FHS)
◮ EVT based semi-parametric estimation (n-EVT, t-EVT, st-EVT)

Moving window of size 500

5000 out-of-sample verifying observations
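The recursion itself is easy to simulate. The sketch below deliberately swaps in standard normal innovations for the skewed t innovations of the study, because the Python standard library has no skewed t sampler; the burn-in length, the seed and the function name are further assumptions.

```python
import math
import random

def simulate_ar_garch(n: int, seed: int = 0, burn_in: int = 500):
    """Simulate the AR(1)-GARCH(1,1) recursion with Gaussian innovations.

    Note: Gaussian Z_t is a stand-in for the skewed t innovations.
    """
    rng = random.Random(seed)
    sigma2 = 0.01 / (1.0 - 0.1 - 0.85)   # start at the unconditional variance
    eps_prev, y_prev = 0.0, 0.0
    ys, sigma2s = [], []
    for t in range(n + burn_in):
        sigma2 = 0.01 + 0.1 * eps_prev ** 2 + 0.85 * sigma2
        mu = -0.05 + 0.3 * y_prev
        eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        y = mu + eps
        if t >= burn_in:                 # discard the burn-in period
            ys.append(y)
            sigma2s.append(sigma2)
        eps_prev, y_prev = eps, y
    return ys, sigma2s

ys, sigma2s = simulate_ar_garch(5000)
print(len(ys), min(sigma2s))
```

A rolling estimation exercise like the one described would refit a model on each window of 500 observations and forecast (VaRα, ESα) one step ahead.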

37 / 52

slide-58
SLIDE 58

P-values of traditional backtests for (VaRα, ESα)

α = 0.246   simple   general
n-FP        0.000    0.000
n-FHS       0.881    0.184
n-EVT       0.754    0.672
t-FP        0.086    0.006
t-FHS       0.936    0.512
t-EVT       0.880    0.475
st-FP       0.569    0.824
st-FHS      0.909    0.796
st-EVT      0.935    0.706
opt         0.401    0.337

α = 0.025   simple   general
n-FP        0.000    0.000
n-FHS       0.653    0.231
n-EVT       0.886    0.226
t-FP        0.000    0.000
t-FHS       0.697    0.717
t-EVT       0.995    0.498
st-FP       0.695    0.419
st-FHS      0.843    0.758
st-EVT      0.962    0.564
opt         0.131    0.571

38 / 52

slide-59
SLIDE 59

[Figure: matrices of comparative backtest results for all pairs of forecasting methods (n-FP, n-FHS, n-EVT, t-FP, t-FHS, t-EVT, st-FP, st-FHS, st-EVT, opt); axes: internal model and standard model; panels for ν = 0.754 and ν = 0.975.]

39 / 52

slide-60
SLIDE 60

Can we avoid the choice of a specific scoring function for forecast comparison?

40 / 52


slide-62
SLIDE 62

Forecast dominance

Definition (Dominance)

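The sequence of predictions $\{(Q_t, E_t)\}_{t \in \mathbb{N}}$ dominates $\{(Q_t^*, E_t^*)\}_{t \in \mathbb{N}}$ if
$$\mathbb{E}\big(S(Q_t, E_t, Y_t) - S(Q_t^*, E_t^*, Y_t)\big) \le 0$$
for all $t \in \mathbb{N}$, and for all consistent scoring functions $S$ for $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$.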

41 / 52

slide-63
SLIDE 63

Mixture representation

Proposition

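Let $\alpha \in (0, 1)$. For $v_1, v_2, y \in \mathbb{R}$, $(x_1, x_2) \in A$, we define
$$S_{v_1}(x_1, y) = \big(\mathbb{1}\{y \le x_1\} - \alpha\big)\big(\mathbb{1}\{v_1 \le x_1\} - \mathbb{1}\{v_1 \le y\}\big),$$
$$S_{v_2}(x_1, x_2, y) = \mathbb{1}\{v_2 \le x_2\} \Big(\frac{1}{\alpha} \mathbb{1}\{y \le x_1\}(x_1 - y) - (x_1 - v_2)\Big) + \mathbb{1}\{v_2 \le y\}(y - v_2).$$
All scoring functions for $(\mathrm{VaR}_\alpha, \mathrm{ES}_\alpha)$ can be written as
$$S(x_1, x_2, y) = \int S_{v_1}(x_1, y)\, dH_1(v_1) + \int S_{v_2}(x_1, x_2, y)\, dH_2(v_2),$$
where $H_1$ is a locally finite measure and $H_2$ is a measure that is finite on all intervals of the form $(-\infty, x]$, $x \in \mathbb{R}$.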

42 / 52

slide-64
SLIDE 64

Assessing forecast dominance

Corollary

The sequence of predictions $\{(Q_t, E_t)\}_{t \in \mathbb{N}}$ dominates $\{(Q_t^*, E_t^*)\}_{t \in \mathbb{N}}$ if
$$\mathbb{E}\big(S_{v_1}(Q_t, Y_t) - S_{v_1}(Q_t^*, Y_t)\big) \le 0$$
for all $t \in \mathbb{N}$, and
$$\mathbb{E}\big(S_{v_2}(Q_t, E_t, Y_t) - S_{v_2}(Q_t^*, E_t^*, Y_t)\big) \le 0$$
for all $t \in \mathbb{N}$, and for all $v_1, v_2 \in \mathbb{R}$.

◮ Forecast dominance can be assessed by considering a two-parameter family of consistent scoring functions, only.
◮ We are (primarily) interested in the ES forecast. Consider $S_{v_2}$ only.

43 / 52

slide-65
SLIDE 65

Murphy diagrams

Simplifying assumption

Assume that $(Q_t, E_t, Y_t)_{t \in \mathbb{N}}$, $(Q_t^*, E_t^*, Y_t)_{t \in \mathbb{N}}$ are stationary and ergodic.

Murphy diagram

Plot
$$v_2 \mapsto \frac{1}{n} \sum_{t=1}^n \big(S_{v_2}(Q_t, E_t, Y_t) - S_{v_2}(Q_t^*, E_t^*, Y_t)\big)$$
as an estimate of
$$v_2 \mapsto \mathbb{E}\big(S_{v_2}(Q_t, E_t, Y_t) - S_{v_2}(Q_t^*, E_t^*, Y_t)\big).$$

Idea of Murphy diagrams: Ehm et al. (2016, JRSSB).
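The curve behind a Murphy diagram can be computed directly from the elementary score $S_{v_2}$ of the mixture representation. The sketch below compares two constant forecasts on a simulated sample: the empirical (VaR, ES) pair of the sample and a shifted rival. The sample, the grid, the shift and the function names are assumptions for illustration; no plotting is done.

```python
import random

def elementary_es_score(v2: float, x1: float, x2: float,
                        y: float, alpha: float) -> float:
    """Elementary score S_{v2}(x1, x2, y) from the mixture representation."""
    first = 0.0
    if v2 <= x2:
        hit = 1.0 if y <= x1 else 0.0
        first = hit * (x1 - y) / alpha - (x1 - v2)
    second = (y - v2) if v2 <= y else 0.0
    return first + second

def murphy_curve(grid, forecasts, ys, alpha):
    """Average elementary score at each grid point v2."""
    n = len(ys)
    return [sum(elementary_es_score(v2, q, e, y, alpha)
                for (q, e), y in zip(forecasts, ys)) / n
            for v2 in grid]

alpha = 0.1
rng = random.Random(5)
ys = [rng.gauss(0.0, 1.0) for _ in range(500)]
srt = sorted(ys)
n = len(srt)
# Empirical VaR and ES of the sample act as the ideal constant forecast here.
var = next(srt[k - 1] for k in range(1, n + 1) if k / n >= alpha)
es = var - sum(max(var - y, 0.0) for y in srt) / (n * alpha)

grid = [-4.0 + 0.1 * i for i in range(51)]    # v2 from -4 to 1
ideal = murphy_curve(grid, [(var, es)] * n, ys, alpha)
rival = murphy_curve(grid, [(var - 0.3, es - 0.5)] * n, ys, alpha)
differences = [a - b for a, b in zip(ideal, rival)]
print(max(differences))
```

A difference curve that stays at or below zero over the whole grid is the empirical counterpart of dominance with respect to the $S_{v_2}$ family.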

44 / 52

slide-66
SLIDE 66

Comparison of parametric models, α = 0.025

[Figure: Murphy diagram with v2 on the horizontal axis and Score on the vertical axis; curves for n-FP, t-FP, st-FP and OPT.]

45 / 52

slide-67
SLIDE 67

Comparison of parametric models, α = 0.025

[Figure: grid of pairwise Murphy diagram panels with rows n-FP, t-FP, st-FP and columns n-FP, t-FP, st-FP, OPT; horizontal axis v2 in each panel.]

45 / 52

slide-68
SLIDE 68

Comparison of parametric models, α = 0.246

[Figure: Murphy diagram with v2 on the horizontal axis and Score on the vertical axis; curves for n-FP, t-FP, st-FP and OPT.]

46 / 52

slide-69
SLIDE 69

Comparison of parametric models, α = 0.246

[Figure: grid of pairwise Murphy diagram panels with rows n-FP, t-FP, st-FP and columns n-FP, t-FP, st-FP, OPT; horizontal axis v2 in each panel.]

46 / 52

slide-70
SLIDE 70

Influence of the filtering distribution, α = 0.025

[Figure: grid of pairwise Murphy diagram panels with rows n-FHS, t-FHS, st-FHS and columns n-FHS, t-FHS, st-FHS, OPT; horizontal axis v2 in each panel.]

47 / 52

slide-71
SLIDE 71

Influence of the filtering distribution, α = 0.246

[Figure: grid of pairwise Murphy diagram panels with rows n-FHS, t-FHS, st-FHS and columns n-FHS, t-FHS, st-FHS, OPT; horizontal axis v2 in each panel.]

48 / 52

slide-72
SLIDE 72

Formal tests for forecast dominance

◮ Formal tests for forecast dominance are possible.
◮ We have suggested the following procedure:
  ◮ Diebold-Mariano tests for each grid point v2.
  ◮ Adjust p-values for multiple testing by the Westfall-Young procedure.
◮ The test works well in simulation examples; its theoretical properties are not yet fully understood.

49 / 52


slide-74
SLIDE 74

Summary

◮ k-Elicitability makes it possible to find scoring functions for functionals that are not elicitable individually.
◮ A relevant example in banking and insurance is the non-elicitable risk measure ESα, which is 2-elicitable together with VaRα.
◮ Consistent scoring functions can be used for forecast comparison.
◮ Characterization results for consistent scoring functions may allow for Murphy diagrams. These can be used for forecast comparison without the choice of a specific scoring function.
◮ The scoring functions for (VaRα, ESα) allow for M-estimation (Zwingmann & Holzmann, 2016) and generalized regression (Bayer & Dimitriadis, 2017, Barendse, 2017).

51 / 52

slide-75
SLIDE 75

References

(Almost) all mentioned references can be found in:

  • T. Fissler and J. F. Ziegel (2016). Higher order elicitability and Osband’s principle. Annals of Statistics, 44:1680–1707.
  • T. Fissler, J. F. Ziegel and T. Gneiting (2016). Expected Shortfall is jointly elicitable with Value at Risk – Implications for backtesting. Risk Magazine, January 2016.
  • N. Nolde and J. F. Ziegel (2016). Elicitability and backtesting. Annals of Applied Statistics, to appear.
  • J. F. Ziegel, F. Krüger, A. Jordan and F. Fasciati (2017). Murphy Diagrams: Forecast Evaluation of Expected Shortfall. Preprint, arXiv:1705.04537.

Thank you for your attention!

52 / 52