Detecting mixtures in multivariate extremes
S.H.A. Tendijck
Lancaster University
January 31, 2020
Motivating application
Two types of waves
Swell versus wind waves.
Contents
1 Crash course in underlying theory
2 My model
Univariate extremes
[Figure: probability density plotted against x]
Multivariate extremes
Conditional extremes
Heffernan-Tawn model:
$$Y \mid (X > u) = \alpha X + X^{\beta} Z$$
for a high threshold u, (X, Y) on standard margins and Z a residual random variable, independent of X.
Pros:
● It can capture both asymptotic dependence (α = 1) and asymptotic independence (α < 1);
● Many bivariate distributions follow this structure asymptotically;
● Extends well to multivariate distributions.
Cons:
● It doesn't capture mixture structures;
● Data need to be on standard margins;
● Inconsistent in modelling X ∣ Y and Y ∣ X when both are large.
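A minimal simulation sketch of the Heffernan-Tawn structure may help fix ideas. It assumes standard Laplace margins (for which X − u | X > u is standard exponential when u > 0) and leaves the distribution of the residual Z to the caller. The function name simulate_ht and its arguments are hypothetical, not part of any package.

```python
import random


def simulate_ht(n, alpha, beta, u, z_sampler, rng=None):
    """Draw n pairs (X, Y) with Y | (X > u) = alpha * X + X**beta * Z.

    Assumes standard Laplace margins, so X - u | X > u is Exp(1) for u > 0,
    and Z is drawn independently of X by the caller-supplied z_sampler(rng).
    """
    if u <= 0:
        raise ValueError("this sketch assumes a positive threshold u")
    rng = rng if rng is not None else random.Random()
    pairs = []
    for _ in range(n):
        x = u + rng.expovariate(1.0)  # Laplace upper tail above u
        z = z_sampler(rng)            # residual, independent of X
        pairs.append((x, alpha * x + x ** beta * z))
    return pairs


def gaussian_residual(r):
    """Assumed residual distribution for the example: standard normal."""
    return r.gauss(0.0, 1.0)


# Example: alpha < 1 (asymptotic independence) with Gaussian residuals.
sample = simulate_ht(1000, alpha=0.8, beta=0.3, u=2.0,
                     z_sampler=gaussian_residual, rng=random.Random(1))
```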
Mixtures in extremes
[Figure: scatter plot of Y_L against X_L]
The Heffernan-Tawn model extends to
$$Y \mid (X > u) = \begin{cases} \alpha_1 X + X^{\beta_1} Z_1 & \text{with probability } p,\\ \alpha_2 X + X^{\beta_2} Z_2 & \text{with probability } 1 - p. \end{cases}$$
What we want:
● Fit the model;
● Estimate the number of mixture components;
● Estimate the mixture probabilities.
Methods:
1 Quantile-regression model;
2 Fitting a Heffernan-Tawn mixture model directly.
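The two-component extension can be simulated the same way by drawing a component label first. This is again a hypothetical sketch assuming Laplace margins; component 1 is used with probability p, as in the display above, and the labels are returned so that estimated mixture probabilities can be checked against the truth.

```python
import random


def simulate_ht_mixture(n, p, comp1, comp2, u, rng=None):
    """Draw n triples (X, Y, label) from a two-component HT mixture.

    comp1 and comp2 are (alpha, beta, z_sampler) tuples; component 1 is used
    with probability p and component 2 with probability 1 - p.
    Laplace margins are assumed, so X - u | X > u is Exp(1) for u > 0.
    """
    if u <= 0:
        raise ValueError("this sketch assumes a positive threshold u")
    rng = rng if rng is not None else random.Random()
    out = []
    for _ in range(n):
        x = u + rng.expovariate(1.0)             # exceedance of X
        label = 1 if rng.random() < p else 2     # mixture component
        alpha, beta, z_sampler = comp1 if label == 1 else comp2
        out.append((x, alpha * x + x ** beta * z_sampler(rng), label))
    return out


def gaussian_residual(r):
    """Assumed residual distribution for the example: standard normal."""
    return r.gauss(0.0, 1.0)


data = simulate_ht_mixture(5000, p=0.3,
                           comp1=(0.9, 0.2, gaussian_residual),
                           comp2=(0.1, 0.5, gaussian_residual),
                           u=2.0, rng=random.Random(7))
```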
Quantile Regression
How do we estimate the 90% conditional quantile of Y given X?
[Figure: scatter plot of Y against X]
Minimise an asymmetrically weighted L1 distance to the line (weight τ = 0.9 for points above it, 1 − τ = 0.1 for points below), so that about 90% of the points end up below the line. This is the check loss $\rho_\tau(r) = r\,(\tau - \mathbb{1}\{r < 0\})$ applied to the residuals r.
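The "keep 90% below" rule is the check (pinball) loss. The sketch below fits a straight-line τ-quantile by brute force, relying on the standard linear-programming fact that some optimal quantile-regression line interpolates as many data points as it has parameters (two here). It is only meant for small illustrative samples; practical fits use a linear-programming solver. The function names are hypothetical.

```python
from itertools import combinations


def check_loss(r, tau):
    """Pinball (check) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def fit_linear_quantile(xs, ys, tau):
    """Exact linear tau-quantile regression y = a + b * x for small samples.

    Brute force over pairs of points: some optimal line always passes through
    two data points (one per parameter), so checking every such line suffices.
    Cost is O(n^3), so this is for illustration only.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not a function of x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(check_loss(y - (a + b * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]
```

At the optimum, at most τn of the n points lie strictly below the line and at least τn lie on or below it, which is the precise form of "keeping 90% below".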
My model
We assume the Heffernan-Tawn model holds, i.e., $Y \mid (X > u) = \alpha X + X^{\beta} Z$. Our quantile regression model is given by
$$q_\tau(x) = c + \alpha x + x^{\beta} z_\tau,$$
where z_τ plays the role of the τ-quantile of Z. For stability, we fit simultaneously for τ = 0.05, 0.15, ..., 0.95. We get 13 estimated parameters: $\hat\alpha, \hat\beta, \hat c, (\hat z_1, \dots, \hat z_{10})$.
[Figure: Logistic model, scatter plot of Y against X]
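A sketch of the joint objective behind fitting simultaneously for τ = 0.05, 0.15, ..., 0.95: the check losses for all ten τ levels are summed, with α, β and c shared and one z_τ per level, which gives the 13 parameters. The parameter ordering and function names are assumptions for illustration, and the slides do not say which optimiser was used; any general-purpose minimiser (for example Nelder-Mead) could be applied to joint_loss.

```python
# tau levels from the slide: 0.05, 0.15, ..., 0.95 (ten levels)
TAUS = [round(0.05 + 0.1 * k, 2) for k in range(10)]


def check_loss(r, tau):
    """Pinball (check) loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def ht_quantile(x, k, params):
    """q_tau(x) = c + alpha * x + x**beta * z_k for the k-th tau level."""
    alpha, beta, c = params[0], params[1], params[2]
    return c + alpha * x + x ** beta * params[3 + k]


def joint_loss(params, xs, ys, taus=TAUS):
    """Check loss summed over all tau levels and all exceedances (x > u > 0).

    params = (alpha, beta, c, z_1, ..., z_10): 3 + 10 = 13 numbers.
    """
    if len(params) != 3 + len(taus):
        raise ValueError("expected 3 + len(taus) parameters")
    return sum(
        check_loss(y - ht_quantile(x, k, params), tau)
        for k, tau in enumerate(taus)
        for x, y in zip(xs, ys)
    )
```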
My model
We assume a mixture HT model holds, i.e.,
$$Y \mid (X > u) = \begin{cases} \alpha_1 X + X^{\beta_1} Z_1 & \text{with probability } 1 - p,\\ \alpha_2 X + X^{\beta_2} Z_2 & \text{with probability } p, \end{cases}$$
where α_1 > α_2. Our quantile regression model is given by
$$q_\tau(x) \sim \begin{cases} c_1 + \alpha_1 x + x^{\beta_1} z_\tau & \text{if } \tau > p,\\ c_2 + \alpha_2 x + x^{\beta_2} z_\tau & \text{if } \tau < p, \end{cases}$$
with the quantile terms z_τ shared by both components. For stability, we fit simultaneously for τ = 0.05, 0.15, ..., 0.95. We get 17 estimated parameters: $\hat p, \hat\alpha_1, \hat\alpha_2, \hat\beta_1, \hat\beta_2, \hat c_1, \hat c_2, (\hat z_1, \dots, \hat z_{10})$.
[Figure: Asymmetric logistic model, scatter plot of Y against X]
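The mixture version changes only the quantile function: the τ levels above p use component 1 and the rest use component 2, with the ten z_τ shared, giving 7 + 10 = 17 parameters. The ordering of params, the function names and the tie-breaking at τ = p are assumptions of this sketch.

```python
TAUS = [round(0.05 + 0.1 * k, 2) for k in range(10)]  # 0.05, 0.15, ..., 0.95


def check_loss(r, tau):
    """Pinball (check) loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def mixture_quantile(x, k, tau, params):
    """Piecewise quantile model: component 1 if tau > p, else component 2.

    params = (p, alpha1, alpha2, beta1, beta2, c1, c2, z_1, ..., z_10);
    the z_k are shared by both components. Ties tau == p go to component 2
    (an arbitrary choice in this sketch).
    """
    p, a1, a2, b1, b2, c1, c2 = params[:7]
    z = params[7 + k]
    if tau > p:
        return c1 + a1 * x + x ** b1 * z
    return c2 + a2 * x + x ** b2 * z


def mixture_joint_loss(params, xs, ys, taus=TAUS):
    """Check loss summed over all tau levels; 7 + 10 = 17 parameters."""
    if len(params) != 7 + len(taus):
        raise ValueError("expected 7 + len(taus) parameters")
    return sum(
        check_loss(y - mixture_quantile(x, k, tau, params), tau)
        for k, tau in enumerate(taus)
        for x, y in zip(xs, ys)
    )
```

Because p enters only through the comparison τ > p, this loss is piecewise constant in p: it changes only when p crosses one of the ten τ levels. One natural way to handle this (an assumption, not taken from the slides) is to fix p in each gap between consecutive levels, optimise the remaining 16 parameters, and keep the best gap.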
Estimating the number of components
[Figure: best fits with 1, 2 and 3 mixture components, each panel showing Y against X]
Question: how do we compare fits with different numbers of components?
Method: 10-fold cross-validation.
[Figure: cross-validation statistic (×10^4) against the number of components, 1 to 9]
[Figure: densities of the cross-validation statistics (×10^4) for 1 to 9 components]
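A generic 10-fold cross-validation sketch for choosing the number of components. The slides do not specify the cross-validation statistic; here it is assumed to be the total held-out loss (for the quantile models, the check loss summed over the τ levels would be the natural choice). The fitting functions passed in, one per candidate number of components, are hypothetical placeholders.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_statistic(xs, ys, fit, loss, k=10, seed=0):
    """Total held-out loss over k folds.

    fit(train_xs, train_ys) must return a predict(x) function, and
    loss(y, prediction) a non-negative number.
    """
    total = 0.0
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum(loss(ys[i], predict(xs[i])) for i in fold)
    return total


def choose_components(xs, ys, fits, loss, k=10, seed=0):
    """Return (number of components with the smallest statistic, all statistics).

    fits maps a number of components to a (hypothetical) fitting function.
    """
    stats = {m: cv_statistic(xs, ys, f, loss, k, seed) for m, f in fits.items()}
    return min(stats, key=stats.get), stats
```

Using the same seed for every candidate keeps the folds identical across models, so differences in the statistic reflect the models rather than the split.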
Assessing the model fit
[Figure: scatter plot of Y against X]
Method | p̂ | 95% confidence interval
Simulation | 1.11 × 10^−5 | (1.00, 1.21) × 10^−5
Quantile Regression | 0.90 × 10^−5 | ??
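The "Simulation" row reports an estimated probability with a 95% interval. How that interval was computed is not stated; the sketch below assumes a plain Monte Carlo proportion with a Wald (normal-approximation) interval. The sampler and event are caller-supplied, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def mc_probability(event_count, n_sims, level=0.95):
    """Monte Carlo estimate p_hat = count / n with a Wald (normal) interval."""
    p_hat = event_count / n_sims
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level 0.95
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n_sims)
    return p_hat, (p_hat - half_width, p_hat + half_width)


def simulate_probability(sampler, event, n_sims, rng=None, level=0.95):
    """Draw n_sims samples with sampler(rng) and count how often event holds."""
    rng = rng if rng is not None else random.Random()
    count = sum(1 for _ in range(n_sims) if event(sampler(rng)))
    return mc_probability(count, n_sims, level)
```

For probabilities as small as 10^−5 the Wald interval needs many observed events to be reliable; a Wilson or exact binomial interval is a safer choice when the event count is small.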
Problems
Is this method already perfect? No, there are just a couple of minor issues:
1 Cross-validation statistics are not necessarily convex in the number of components;
2 It is not trivial to fit this framework into a Bayesian setting.