Detecting seasonality changes in multivariate extremes of climatological time series
Philippe Naveau, Laboratoire des Sciences du Climat et de l'Environnement, France
Joint work with Sebastian Engelke (Geneva University) and Chen Zhou (Erasmus University Rotterdam)
Motivation: heavy rainfall in Brittany
[Map of the two study regions (longitude–latitude axes). Daily rainfall from 1976 to 2015.]
[Figure: rainfall (in mm) across the seasons of 1976–1977 at three stations: Brest, Lanvéoc and Quimper.]
Our climatological objectives
Do heavy rainfall dependence structures change from season to season?
Do extreme precipitation dependence structures differ from region to region?
Our statistical objective
Detecting changes in the dependence structure of multivariate time series of extremes
Statistical desiderata
Few assumptions and non-parametric models
Fast, simple and general tools
No complicated MEVT jargon for climatologists
Our tools
Strengthen links between the MEVT (Multivariate Extreme Value Theory) and information theory communities by revisiting the Kullback–Leibler divergence
Detect changes in the dependence structure of precipitation extremes
Why the Kullback–Leibler divergence?
Machine learning: supervised learning = minimizing a KL divergence objective
Proper scoring rules in forecasting: the logarithmic score
Information theory: information gain for comparing two distributions
Causality theory: Bayesian causal conditionals
Dynamical systems: statistical mechanics (Boltzmann thermodynamics)
Climate science: detection & attribution and compound events
Kullback–Leibler divergence
Definition and notation
Let X and Y be two random variables with pdfs f and g:
DKL(X||Y) = Ef log{f(X)} − Ef log{g(X)} and D(X, Y) = DKL(X||Y) + DKL(Y||X)
Properties
DKL(X||Y) ≥ 0
DKL(X||Y) = DKL(X1||Y1) + DKL(X2||Y2) if X1 ⊥ X2 and Y1 ⊥ Y2
DKL is convex
DKL is not a metric. Still, the total variation distance δ(X, Y) satisfies δ(X, Y) ≤ √(DKL(X||Y)/2) (Pinsker's inequality)
Kullback–Leibler divergence
A simple example
If X and Y are two Bernoulli random variables with p = P(X = 1) and q = P(Y = 1), then
D(X, Y) = (p − q) × log{ p(1 − q) / (q(1 − p)) }
[Figure: the Bernoulli Kullback–Leibler divergence as a function of (p, q), and as a function of the probability of success.]
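A quick numerical check of this closed form, as a minimal Python sketch (the values of p and q are arbitrary illustrations):

```python
import numpy as np

def kl_bernoulli(p, q):
    """DKL(X||Y) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def sym_kl_bernoulli(p, q):
    """Symmetrized divergence D(X, Y) = DKL(X||Y) + DKL(Y||X)."""
    return kl_bernoulli(p, q) + kl_bernoulli(q, p)

p, q = 0.7, 0.3
closed_form = (p - q) * np.log(p * (1 - q) / (q * (1 - p)))
print(sym_kl_bernoulli(p, q), closed_form)   # both are about 0.678
```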
Kullback–Leibler divergence
A more complicated example
If X and Y are two multinomial random variables with probabilities p1, p2, . . . , pK and q1, q2, . . . , qK, where q1 + · · · + qK = 1 = p1 + · · · + pK, then
D := D(p1, . . . , pK; q1, . . . , qK) = Σ_{j=1}^{K} (pj − qj)(log pj − log qj)
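The same symmetrized formula in code, as a minimal sketch of the population quantity (an empirical counterpart based on estimated cell probabilities appears later):

```python
import numpy as np

def sym_kl_multinomial(p, q):
    """Symmetric KL divergence D(p1,...,pK; q1,...,qK) between two probability vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum((p - q) * (np.log(p) - np.log(q)))

print(sym_kl_multinomial([0.5, 0.3, 0.2], [0.2, 0.3, 0.5]))   # about 0.55; zero iff p = q
```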
Kullback–Leibler divergence
Inference
Let X and Y be two random variables with pdfs f and g:
DKL(X||Y) = Ef log{ f(X) / g(X) }
Estimation hurdle
It seems that the densities f and g need to be known to estimate DKL(X||Y). This will be an issue for multivariate extremes.
Univariate case
Univariate regularly varying case
P(X > x) = F̄(x) = x^(−α) L_X(x) and P(Y > x) = Ḡ(x) = x^(−β) L_Y(x),
where x > 0, L_X and L_Y are slowly varying, and α, β > 0 are the tail indices.
Examples: Cauchy, t-distribution, α-stable, Pareto, ...
Exceedances above a high threshold u
Xu = (X/u | X > u) and Yu = (Y/u | Y > u), with respective densities fu and gu on [1, ∞)
Symmetric Kullback–Leibler divergence
D(Xu, Yu) = E log{ fu(Xu) / gu(Xu) } + E log{ gu(Yu) / fu(Yu) }
Extremes for univariate regularly varying functions
lim_{u→∞} D(Xu, Yu) = (α − β)² / (αβ)
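A Monte Carlo sanity check of this limit, as a minimal sketch assuming exact Pareto tails (L_X ≡ L_Y ≡ 1), so that the rescaled exceedances are again standard Pareto; the tail indices below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 2.0, 4.0, 200_000      # illustrative tail indices and sample size

# With exact Pareto tails, (X/u | X > u) is standard Pareto(alpha) on [1, inf),
# so the exceedances can be sampled directly.
xu = rng.pareto(alpha, n) + 1.0
yu = rng.pareto(beta, n) + 1.0

def log_pareto_pdf(z, a):
    """Log of the density a * z^(-a-1) on [1, inf)."""
    return np.log(a) - (a + 1.0) * np.log(z)

# Symmetric KL: E_f[log f/g] + E_g[log g/f], estimated by Monte Carlo.
d_hat = (np.mean(log_pareto_pdf(xu, alpha) - log_pareto_pdf(xu, beta))
         + np.mean(log_pareto_pdf(yu, beta) - log_pareto_pdf(yu, alpha)))

print(d_hat, (alpha - beta) ** 2 / (alpha * beta))   # both close to 0.5
```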
Survival functions versus densities?
PN, Guillou and Rietsch (2014, JRSSB)
The KL divergence has the representation
D(Xu, Yu) = −[ 2 + E log{ Ḡ(uXu) / Ḡ(u) } + E log{ F̄(uYu) / F̄(u) } ] + ∆(u) = L(Xu, Yu) + ∆(u),
where ∆(u) → 0 as u → ∞, under a second-order condition.
Example for the GPD distribution.
Inference only based on cdf's
For two independent samples X(1), . . . , X(n) ∼ F and Y(1), . . . , Y(n) ∼ G,
Ln(fu, gu) = −[ 2 + (1/Nn) Σ_{X(i)>u} log{ Ḡn(X(i)) / Ḡn(u) } + (1/Mn) Σ_{Y(i)>u} log{ F̄n(Y(i)) / F̄n(u) } ],
where F̄n and Ḡn are the empirical survival functions and Nn, Mn are the numbers of exceedances above u in each sample.
PN, Guillou and Rietsch (2014, JRSSB) + Engelke, PN, Zhou (2020+)
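A minimal Python sketch of this survival-function based estimator; the (k + 1)/(n + 1) convention for the empirical survival function (to avoid log 0) and the pooled-quantile threshold are illustrative choices rather than the paper's exact recipe, and exact Pareto samples are used only to check the value against (α − β)²/(αβ):

```python
import numpy as np

def ecsf(sample):
    """Empirical survival function, with a (k + 1)/(n + 1) adjustment so that it
    never returns exactly zero (one common convention; the paper's may differ)."""
    s = np.sort(sample)
    n = len(s)
    return lambda t: (n - np.searchsorted(s, t, side="right") + 1.0) / (n + 1.0)

def L_n(x, y, u):
    """Survival-function based estimate of the symmetric KL divergence between
    the exceedances of x and y above the threshold u."""
    Fbar, Gbar = ecsf(x), ecsf(y)
    x_exc = x[x > u]                 # the N_n exceedances from the X sample
    y_exc = y[y > u]                 # the M_n exceedances from the Y sample
    term_x = np.mean(np.log(Gbar(x_exc) / Gbar(u)))
    term_y = np.mean(np.log(Fbar(y_exc) / Fbar(u)))
    return -(2.0 + term_x + term_y)

rng = np.random.default_rng(1)
x = rng.pareto(2.0, 5000) + 1.0      # tail index alpha = 2
y = rng.pareto(4.0, 5000) + 1.0      # tail index beta = 4
u = np.quantile(np.concatenate([x, y]), 0.95)
print(L_n(x, y, u))                  # roughly (2 - 4)^2 / (2 * 4) = 0.5
```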
Multivariate case
Classical EVT trick
Let X be in R^d. We do not want to define extremes directly in the full multivariate space, so we condition on a one-dimensional summary r(X) in R+.
Definition of the tail region¹
Homogeneity condition of order one: r(tx) = t × r(x), for any scalar t > 0 and x ∈ R^d_+
Examples: r(x) = max(x1, . . . , xd), r(x) = x1 + · · · + xd, r(x) = min(x1, . . . , xd)
1. Dombry and Ribatet (2015, Statistics and its Interface)
Cutting the tail region into smaller regions
Partition: {x ∈ Rd : r(x) > 1} = ∪_{j=1}^{K} Aj
Our main assumption, under such a partitioning
lim_{u→∞} pj(u) := lim_{u→∞} Pr(X ∈ uAj) / Pr(r(X) > u) = pj ∈ (0, 1)
lim_{u→∞} qj(u) := lim_{u→∞} Pr(Y ∈ uAj) / Pr(r(Y) > u) = qj ∈ (0, 1)
A special case with r(x) = max(x1, 0), with X1 = X2 in distribution and Xi > 0
Pr(r(X) > u) = Pr(X1 > u) and Pr(X ∈ uA1) = Pr(X1 > u, X2 > u), with A1 = {min(x1, x2) > 1}
χ, the classical extremal dependence coefficient:
lim_{u→∞} p1(u) := lim_{u→∞} Pr(X2 > u | X1 > u) = χ and lim_{u→∞} p2(u) = 1 − χ
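An empirical, finite-threshold version of p1(u), as a minimal sketch (the helper name chi_hat and the toy samples are illustrative):

```python
import numpy as np

def chi_hat(x1, x2, u):
    """Empirical estimate of p1(u) = Pr(X2 > u | X1 > u), a finite-threshold
    version of the extremal dependence coefficient chi."""
    return np.mean(x2[x1 > u] > u)

rng = np.random.default_rng(2)
z = rng.pareto(1.0, 10_000) + 1.0
w = rng.pareto(1.0, 10_000) + 1.0
u = np.quantile(z, 0.95)
print(chi_hat(z, z, u))   # complete dependence (X1 = X2): close to 1
print(chi_hat(z, w, u))   # independent components: close to Pr(X2 > u), about 0.05
```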
Main objective: two-sample hypothesis testing
Given two independent samples of X in R^d and Y in R^d, we want to test
H0 : pj = qj, for all j,
with lim_{u→∞} Pr(X ∈ uAj) / Pr(r(X) > u) = pj ∈ (0, 1) and lim_{u→∞} Pr(Y ∈ uAj) / Pr(r(Y) > u) = qj ∈ (0, 1)
Back to the multinomial Kullback–Leibler divergence
Reminder: if X and Y are two multinomial random variables with probabilities p1, p2, . . . , pK and q1, q2, . . . , qK, where q1 + · · · + qK = 1 = p1 + · · · + pK, then
D := D(p1, . . . , pK; q1, . . . , qK) = Σ_{j=1}^{K} (pj − qj)(log pj − log qj)
Multinomial Kullback–Leibler divergence
Estimation
D̂(u, v) := Σ_{j=1}^{K} (p̂j(u) − q̂j(v))(log p̂j(u) − log q̂j(v)),
with p̂j(u) = Σ_{i=1}^{n} 1{Xi ∈ uAj} / Σ_{i=1}^{n} 1{r(Xi) > u} and q̂j(v) = Σ_{i=1}^{n} 1{Yi ∈ vAj} / Σ_{i=1}^{n} 1{r(Yi) > v}.
Our main result
Multinomial Kullback–Leibler divergence
Choose two sequences un and vn such that mn = n Pr(r(X) > un) = n Pr(r(Y) > vn) → ∞ and mn/n → 0, and assume some second-order conditions on the convergence of pj(u) to pj (and similarly for qj(u)).
Under H0 : pj = qj,
(mn/2) D̂(un, vn) → χ²(K − 1), as n ↑ ∞.
If pj ≠ qj for some j, then D is positive and
√mn (D̂(un, vn) − D) → N(0, σ²), as n ↑ ∞,
where σ² is an explicit function of the pj and qj.
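A minimal end-to-end sketch of the resulting two-sample test in the bivariate case, under several illustrative assumptions not prescribed by the slides: K = 2 with r(x) = max(x1, x2) and the partition A1 = {min > 1}, A2 = the rest; empirical-quantile thresholds; and a toy common-factor model for the data:

```python
import numpy as np
from scipy.stats import chi2

def region_probs(sample, u):
    """Empirical proportions (p1_hat, p2_hat) of exceedances of r(x) = max(x1, x2)
    falling in u*A1 = {both components > u} and u*A2 = {only one component > u}."""
    x1, x2 = sample[:, 0], sample[:, 1]
    exceed = np.maximum(x1, x2) > u
    both = (x1 > u) & (x2 > u)
    m = int(exceed.sum())
    p1 = both.sum() / m
    return np.array([p1, 1.0 - p1]), m

def kl_two_sample_test(x, y, prob_level=0.95):
    """Test H0: p_j = q_j via the symmetric multinomial KL divergence;
    (m_n / 2) * D_hat is compared with a chi^2(K - 1) distribution.
    Thresholds are per-sample empirical quantiles of r (an illustrative choice)."""
    ux = np.quantile(np.maximum(x[:, 0], x[:, 1]), prob_level)
    uy = np.quantile(np.maximum(y[:, 0], y[:, 1]), prob_level)
    p_hat, mx = region_probs(x, ux)
    q_hat, my = region_probs(y, uy)
    m_n = min(mx, my)                     # crude stand-in for equal exceedance numbers
    d_hat = np.sum((p_hat - q_hat) * (np.log(p_hat) - np.log(q_hat)))
    stat = 0.5 * m_n * d_hat
    return stat, chi2.sf(stat, df=len(p_hat) - 1)

def common_factor_sample(n, rng):
    """Asymptotically dependent toy model: X_i = Z * U_i with Z heavy tailed."""
    z = rng.pareto(1.0, n) + 1.0
    u = rng.uniform(0.0, 1.0, size=(n, 2))
    return z[:, None] * u

rng = np.random.default_rng(3)
x = common_factor_sample(5000, rng)
y = common_factor_sample(5000, rng)       # same dependence structure: H0 holds
print(kl_two_sample_test(x, y))           # under H0 the p-value is roughly uniform, so rarely small
```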
Dealing with marginals and random thresholds
In theory
We have mn = n Pr(r(X) > un) = n Pr(r(Y) > vn), and
p̂j(un) = Σ_{i=1}^{n} 1{Xi ∈ unAj} / Σ_{i=1}^{n} 1{r(Xi) > un} and q̂j(vn) = Σ_{i=1}^{n} 1{Yi ∈ vnAj} / Σ_{i=1}^{n} 1{r(Yi) > vn}
In practice: the distributions of r(X) and r(Y) are unknown
p̂j = (1/mn) Σ_{i=1}^{n} 1{Xi ∈ R^(X)_{n−mn,n} Aj} and q̂j = (1/mn) Σ_{i=1}^{n} 1{Yi ∈ R^(Y)_{n−mn,n} Aj},
where R^(X)_{n−mn,n} and R^(Y)_{n−mn,n} denote rank-based random thresholds (the (n − mn)-th order statistics of r(X1), . . . , r(Xn) and r(Y1), . . . , r(Yn)).
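A minimal sketch of the random-threshold idea only: the per-sample threshold is taken as an order statistic of r, so that exactly mn observations are treated as extreme. Marginal standardization (relevant for Cases A and B below) is not shown, and the partition and toy data are illustrative:

```python
import numpy as np

def rank_based_probs(sample, m_n):
    """Rank-based variant: replace the deterministic threshold u_n by the
    (n - m_n)-th order statistic of r(X_i), so exactly m_n points are extreme
    (illustrative convention, with partition A1 = {min > 1}, A2 = the rest)."""
    r = sample.max(axis=1)                       # r(x) = max(x1, ..., xd)
    threshold = np.sort(r)[len(r) - m_n - 1]     # data-driven, random threshold
    extreme = sample[r > threshold]              # the m_n largest observations
    p1 = np.mean(extreme.min(axis=1) > threshold)
    return np.array([p1, 1.0 - p1])

rng = np.random.default_rng(4)
z = rng.pareto(1.0, 2000) + 1.0
sample = z[:, None] * rng.uniform(0.0, 1.0, size=(2000, 2))   # common-factor toy model
print(rank_based_probs(sample, m_n=100))
```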
Effect of the marginals and number of sets
[Figure, known versus unknown margins: type-1 error of the test under weak and strong dependence, and power of the test, at a significance level of 5%, as a function of the number of sets K (from 2 to 9).]
Dealing with marginals and random thresholds
Case A (rainfall): unknown marginals with the same RV tail index
Under some second-order RV conditions on r(X) and r(Y), we still have (mn/2) D̂(un, vn) → χ²(K − 1) under H0
Is there seasonality in the dependence of Brittany rainfall extremes?
[Figure: KL divergence estimates versus the number of excesses (200, 150, 100, 50) for each pair of seasons in Brittany: Spring vs Winter, Summer vs Winter, Fall vs Winter, Fall vs Spring, Fall vs Summer, Summer vs Spring.]
Is there seasonality in the dependence of Eastern rainfall extremes?
[Figure: KL divergence estimates versus the number of excesses (200, 150, 100) for each pair of seasons in the East: Spring vs Winter, Summer vs Winter, Fall vs Winter, Fall vs Spring, Fall vs Summer, Summer vs Spring.]
Dealing with marginals and random thresholds
Case B (compound events): unknown marginals with different tails
For d = 2, r(x) = min(x1, x2) and under some second-order AI² (asymptotic independence) condition with η < 1, we still have (mn/2) D̂(un, vn) → χ²(K − 1) under H0.
For η = 1 (AD, asymptotic dependence), a more complex limiting result holds.
2. Ledford and Tawn (1996) and Draisma et al. (2004), with transformed standard Pareto marginals and corresponding partition Aj