

  1. Concentration of risk measures: A Wasserstein distance approach. Prashanth L. A. (IIT Madras), joint work with Sanjay P. Bhat (TCS Research). To appear in the proceedings of NeurIPS 2019.

  2. Introduction

  3. Risk criteria • Conditional Value-at-Risk (Rockafellar and Uryasev, 2000) • Spectral risk measures (Acerbi, 2002) • Cumulative prospect theory (Tversky and Kahneman, 1992)

  4. Open question: given i.i.d. samples from a distribution with unbounded support and an empirical version of the risk measure, obtain concentration bounds for each of the three risk measures. Idea: use finite-sample bounds on the Wasserstein distance between the empirical and true distributions.

  5. Empirical risk concentration: summary of contributions
     Goal: bound P[ | r̂_n − r(X) | > ε ], where r̂_n is the empirical risk formed from n i.i.d. samples and r(X) is the true risk.
     • Conditional Value-at-Risk: bounded support: [Brown et al.], [Gao et al.]; sub-Gaussian: our work
     • Spectral risk measures: our work
     • Cumulative prospect theory: bounded support: [Cheng et al. 2018]; sub-Gaussian: our work
     Unified approach: for each bound, the estimation error is related to the Wasserstein distance between the empirical and true distributions¹
     ¹ N. Fournier and A. Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 2015.

  6. Wasserstein Distance

  7. Wasserstein Distance
     The Wasserstein distance between two CDFs F_1 and F_2 on R is
     W_1(F_1, F_2) = inf ∫_{R²} |x − y| dF(x, y),
     where the infimum is over all joint distributions F having marginals F_1 and F_2.
     Related to the Kantorovich mass transference problem:
     • Ship mass around so that the initial mass distribution F_1 changes into F_2
     • Shipping plan: given by a joint distribution F with marginals F_1 and F_2, such that the amount of mass shipped from a neighborhood dx of x to the neighborhood dy of y is proportional to dF(x, y)
     • The integral above is then the total transportation distance under the shipping plan F
     • The Wasserstein distance between F_1 and F_2 is the transportation distance under the optimal shipping plan
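On the real line, W_1(F_1, F_2) is just the area between the two CDFs, and for two samples of equal size the optimal shipping plan matches sorted values. The sketch below (the helper name w1_sorted is ours, not from the slides) illustrates this and cross-checks against SciPy's implementation:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def w1_sorted(x, y):
    """W_1 between two equal-size empirical distributions on R:
    the optimal shipping plan pairs order statistics, so W_1 is the
    mean absolute difference of the sorted samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)
y = rng.normal(loc=0.5, scale=1.0, size=100_000)

print(w1_sorted(x, y))             # close to 0.5, the shift in the mean
print(wasserstein_distance(x, y))  # SciPy's 1-D implementation agrees
```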

  8. Wasserstein Distance: Concentration Bounds
     Let X be a r.v. with CDF F, and let F_n be the empirical CDF formed using n i.i.d. samples. Then²
     P( W_1(F_n, F) > ε ) ≤ B(n, ε), for any ε > 0, where:
     • Exponential moment bound: if ∃ β > 1 and γ > 0 such that E[ exp( γ |X − E(X)|^β ) ] < ∞, then
       B(n, ε) = C ( exp(−c n ε²) I{ε ≤ 1} + exp(−c n ε^β) I{ε > 1} )
     • Higher moment bound: if ∃ β > 2 such that E[ |X − E(X)|^β ] < ∞, then, for any η ∈ (0, β),
       B(n, ε) = C ( exp(−c n ε²) I{ε ≤ 1} + n (n ε)^{−(β−η)/p} I{ε > 1} )
     ² N. Fournier and A. Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 2015.
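A quick numerical illustration of the small-ε regime (the exp(−c n ε²) term suggests W_1(F_n, F) shrinks at roughly the n^{−1/2} scale): approximate W_1(F_n, F) = ∫ |F_n(x) − F(x)| dx on a grid and watch the average decay as n grows. The grid range, the number of repetitions and the Gaussian example below are our own arbitrary choices:

```python
import numpy as np
from scipy.stats import norm

def w1_empirical_vs_true(sample, true_cdf, grid):
    """Approximate W_1(F_n, F) as the integral of |F_n - F| over a uniform grid."""
    sample = np.sort(sample)
    emp_cdf = np.searchsorted(sample, grid, side="right") / len(sample)
    dx = grid[1] - grid[0]
    return np.sum(np.abs(emp_cdf - true_cdf(grid))) * dx

rng = np.random.default_rng(1)
grid = np.linspace(-6.0, 6.0, 4001)
for n in (100, 1_000, 10_000):
    avg = np.mean([w1_empirical_vs_true(rng.normal(size=n), norm.cdf, grid)
                   for _ in range(20)])
    print(n, avg)   # decays roughly like n**-0.5
```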

  9. Conditional Value-at-Risk

  10.–14. VaR and CVaR are Risk-Sensitive Metrics
     • Widely used in financial portfolio optimization, credit risk assessment and insurance
     • Let X be a continuous random variable
     • Fix a 'risk level' α ∈ (0, 1) (say α = 0.95)
     Value at Risk: v_α(X) = F_X^{−1}(α)
     Conditional Value at Risk: c_α(X) = E[ X | X > v_α(X) ] = v_α(X) + E[ (X − v_α(X))_+ ] / (1 − α)

  15. Defining CVaR
     Value at Risk: v_α(X) = F_X^{−1}(α)
     Conditional Value at Risk: c_α(X) = E[ X | X > v_α(X) ] = v_α(X) + E[ (X − v_α(X))_+ ] / (1 − α)
     For a general r.v. X,
     c_α(X) = inf_ξ { ξ + E[ (X − ξ)_+ ] / (1 − α) }, where (y)_+ = max(y, 0)
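The two expressions can be checked against each other by Monte Carlo: average the sample tail beyond the empirical α-quantile, and minimise ξ + E[(X − ξ)_+]/(1 − α) over ξ numerically. A sketch for a standard normal X (the setup and helper names are ours):

```python
import numpy as np
from scipy.optimize import minimize_scalar

alpha = 0.95
rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)

# Tail-average form: c_alpha(X) = E[X | X > v_alpha(X)]
var_hat = np.quantile(x, alpha)
cvar_tail = x[x > var_hat].mean()

# Infimum form: c_alpha(X) = inf_xi { xi + E[(X - xi)_+] / (1 - alpha) }
def objective(xi):
    return xi + np.mean(np.maximum(x - xi, 0.0)) / (1.0 - alpha)

cvar_inf = minimize_scalar(objective, bounds=(-10.0, 10.0), method="bounded").fun

print(cvar_tail, cvar_inf)   # both close to ~2.06 for a standard normal at alpha = 0.95
```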

  16.–17. CVaR is a Coherent Risk Metric³
     • Monotonicity: if X ≤ Y, then c(X) ≤ c(Y)
     • Sub-additivity: c(X + Y) ≤ c(X) + c(Y), i.e., diversification cannot lead to increased risk
     • Positive homogeneity: c(λX) = λ c(X) for any λ ≥ 0
     • Translation invariance: for deterministic a > 0, c(X + a) = c(X) − a
     Note: VaR is not sub-additive
     ³ P. Artzner et al. "Coherent measures of risk." Mathematical Finance 9.3 (1999).
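The failure of sub-additivity for VaR is easy to reproduce with the textbook two-loan example (our own illustration, not from the slides): two independent losses, each equal to 100 with probability 0.04 and 0 otherwise. At α = 0.95 each loss has VaR 0, yet their sum has VaR 100, while CVaR behaves as the coherence axioms promise:

```python
import numpy as np

def value_at_risk(x, alpha):
    """Empirical VaR: the alpha-quantile of the loss sample."""
    return np.quantile(x, alpha)

def cvar(x, alpha):
    """Empirical CVaR: the Rockafellar-Uryasev objective evaluated at xi = VaR."""
    v = value_at_risk(x, alpha)
    return v + np.mean(np.maximum(x - v, 0.0)) / (1.0 - alpha)

alpha = 0.95
rng = np.random.default_rng(3)
n = 2_000_000
x = 100.0 * (rng.random(n) < 0.04)   # loss on loan 1
y = 100.0 * (rng.random(n) < 0.04)   # independent loss on loan 2

# VaR: 0 + 0 for the individual loans versus 100 for the sum -- sub-additivity fails
print(value_at_risk(x, alpha) + value_at_risk(y, alpha), value_at_risk(x + y, alpha))
# CVaR: the sum's risk (about 103) stays below the total of the parts (about 160)
print(cvar(x, alpha) + cvar(y, alpha), cvar(x + y, alpha))
```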

  18.–20. Examples
     1. Exponential case: suppose X ∼ Exp(µ)
        • v_α(X) = (1/µ) ln( 1/(1 − α) )
        • c_α(X) = v_α(X) + 1/µ (memoryless!)
     2. Gaussian case: suppose X ∼ N(µ, σ²)
        • v_α(X) = µ − σ Q^{−1}(α), where Q is the standard Gaussian tail function
        • c_α(X) = µ + σ c_α(Z), Z ∼ N(0, 1), so estimating µ and σ would do
     For these distributions, no separate CVaR estimate is necessary
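Both closed forms are easy to verify against a plain Monte Carlo tail average; the sketch below assumes rate µ = 2 for the exponential and (µ, σ) = (1, 3) for the Gaussian (choices that are ours), and uses the standard fact c_α(Z) = φ(Φ^{−1}(α))/(1 − α) for Z ∼ N(0, 1):

```python
import numpy as np
from scipy.stats import norm

alpha = 0.95
rng = np.random.default_rng(4)

def mc_cvar(sample, alpha):
    """Monte Carlo CVaR: average of the sample beyond its alpha-quantile."""
    v = np.quantile(sample, alpha)
    return sample[sample > v].mean()

# Exponential with rate mu: v_alpha = ln(1/(1-alpha))/mu and c_alpha = v_alpha + 1/mu
mu = 2.0
closed_exp = np.log(1.0 / (1.0 - alpha)) / mu + 1.0 / mu
print(closed_exp, mc_cvar(rng.exponential(scale=1.0 / mu, size=1_000_000), alpha))

# Gaussian N(m, s^2): c_alpha = m + s * phi(Phi^{-1}(alpha)) / (1 - alpha)
m, s = 1.0, 3.0
closed_gauss = m + s * norm.pdf(norm.ppf(alpha)) / (1.0 - alpha)
print(closed_gauss, mc_cvar(rng.normal(m, s, size=1_000_000), alpha))
```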

  21. CVaR estimation: the problem
     Problem: given i.i.d. samples X_1, ..., X_n from the distribution F of r.v. X, estimate c_α(X) = E[ X | X > v_α(X) ]
     Nice to have: sample complexity O(1/ε²) for accuracy ε

  22.–23. Empirical distribution function (EDF): given samples X_1, ..., X_n from distribution F,
     F̂_n(x) = (1/n) Σ_{i=1}^n I{ X_i ≤ x }, x ∈ R
     Using the EDF and the order statistics X_[1] ≤ X_[2] ≤ ... ≤ X_[n], form the following estimates⁴:
     VaR estimate: v̂_{n,α} = inf{ x : F̂_n(x) ≥ α } = X_[⌈nα⌉]
     CVaR estimate: ĉ_{n,α} = v̂_{n,α} + (1/(n(1 − α))) Σ_{i=1}^n ( X_i − v̂_{n,α} )_+
     ⁴ Serfling, R. J. (2009). Approximation Theorems of Mathematical Statistics, volume 162. John Wiley & Sons.
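These estimators translate almost line for line into code. A minimal sketch (function name ours), exercised on standard normal samples where the true CVaR at α = 0.95 is about 2.06:

```python
import numpy as np

def var_cvar_estimates(x, alpha):
    """Order-statistic VaR estimate v_hat = X_[ceil(n*alpha)] and the CVaR
    estimate c_hat = v_hat + sum((X_i - v_hat)_+) / (n * (1 - alpha))."""
    x = np.sort(x)
    n = len(x)
    v_hat = x[int(np.ceil(n * alpha)) - 1]   # X_[ceil(n*alpha)], converted to 0-based indexing
    c_hat = v_hat + np.sum(np.maximum(x - v_hat, 0.0)) / (n * (1.0 - alpha))
    return v_hat, c_hat

rng = np.random.default_rng(5)
sample = rng.normal(size=100_000)
print(var_cvar_estimates(sample, alpha=0.95))   # roughly (1.645, 2.06)
```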

  24. Concentration bounds for CVaR estimation
     • Need to put some restrictions on the tail distribution to obtain exponential concentration
     • Our assumptions: either
       (C1) X satisfies an exponential moment bound, i.e., ∃ β > 0 and γ > 0 s.t. E[ exp( γ |X − µ|^β ) ] < ∞, where µ = E(X), or
       (C2) X satisfies a higher-moment bound, i.e., ∃ β > 0 such that E[ |X − µ|^β ] < ∞
     Sub-Gaussian r.v.s satisfy (C1), while sub-exponential r.v.s satisfy (C2)

  25. A random variable X is sub-Gaussian if ∃ σ > 0 s.t.
     E[ e^{λX} ] ≤ e^{σ²λ²/2}, ∀ λ ∈ R.
     Or equivalently, letting Z ∼ N(0, σ²), ∃ c > 0 s.t. P[ X > ε ] ≤ c P[ Z > ε ], ∀ ε > 0 (tail dominated by a Gaussian).
     A random variable X is sub-exponential if ∃ b > 0 s.t.
     E[ e^{λX} ] ≤ e^{b²λ²/2}, ∀ |λ| < 1/b.
     Or equivalently, ∃ c_1, c_2 > 0 s.t. P[ X > ε ] ≤ c_1 e^{−c_2 ε}, ∀ ε > 0 (tail dominated by an exponential r.v.).
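A quick way to see the gap between the two classes (our own illustration): an Exp(1) random variable has tail P[X > ε] = e^{−ε}, which a sub-exponential bound captures, but which eventually exceeds c P[Z > ε] for any Gaussian Z and any constant c, so it is not sub-Gaussian:

```python
import numpy as np
from scipy.stats import expon, norm

eps = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
sigma = 2.0   # any fixed sigma leads to the same conclusion

exp_tail = expon.sf(eps)                 # P[X > eps] = exp(-eps) for X ~ Exp(1)
gauss_tail = norm.sf(eps, scale=sigma)   # P[Z > eps] for Z ~ N(0, sigma^2)

# The ratio blows up with eps: no constant c gives exp_tail <= c * gauss_tail for all eps,
# so Exp(1) is sub-exponential (tail bounded by exp(-eps)) but not sub-Gaussian.
print(exp_tail / gauss_tail)
```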
