"RobExtremes" – Robust Extreme Value Statistics – a New Member in the RobASt-Family of R Packages
UseR! 2013

Nataliya Horbenko 1, Matthias Kohl 2, Peter Ruckdeschel 3,4

1 KPMG AG Wirtschaftsprüfungsgesellschaft, The SQUAIRE / Am Flughafen, 60549 Frankfurt/Main, Germany
2 Furtwangen University, Dept. of Medical and Life Sciences, Jakob-Kienzle-Straße 17, 78054 Villingen-Schwenningen, Germany
3 Fraunhofer ITWM, Dept. of Financial Mathematics, Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany
4 TU Kaiserslautern, Dept. of Mathematics, Erwin-Schrödinger-Straße, Geb 48, 67663 Kaiserslautern, Germany

Albacete, Spain, July 11, 2013

Outline
• Introduction
  – Project "Robust Risk Estimation"
• Infrastructure of R packages
  – Packages for distributions: distr family
  – Package for models: distrMod
• Our Robustification Approach
  – Contribution of Robust Statistics: Outliers
  – Packages for robust asymptotic statistics: RobASt family
  – Optimally robust estimation in R
  – Storing interpolation grids
• New package RobExtremes
  – Concept
  – GPD as parametric model
• Application to OpRisk quantification
  – Setup
  – Challenges
  – Optimally robust estimation in RobExtremes
  – Diagnostic plots

Who we are
Project funded by [logos not reproduced]

Introduction: Project "Robust Risk Estimation"
. . . aims at theoretical foundation, development and application of robust procedures for risk management of complex systems in the presence of extreme events
common base:
  <theory> robust statistics
  <implementation> R packages of the distr & RobASt families
Infrastructure of R packages
  – Packages for distributions: distr family
  – Package for models: distrMod

Available Infrastructure for Distributions: distr family

  Package name        Short description
  distr               S4 classes for distributions
  distrEx             Functionals for distributions
  distrMod            S4 classes for probability models
  distrEllipse        S4 classes for elliptically contoured distrs
  distrRmetrics       S4 classes for distrs from fBasics & fGarch
  distrSim            S4 classes for simulations
  distrTEst           S4 classes for estimation and testing
  distrTeach          Extensions for teaching
  distrDoc            Documentation for distr packages
  startupmsg          Utilities for start-up messages
  SweaveListingUtils  Utilities for Sweave

Distributions as Objects and Arithmetics

  ## Create a normal and a Poisson Distribution
  R> N <- Norm(mean = 0, sd = 3)
  R> P <- Pois(lambda = 2)
  ## identical calls for r (RNG), d (density),
  ## p (cdf), q (quantile fct)
  R> c(p(N)(.5), p(P)(2))
  [1] 0.5661838 0.6766764
  R> c(q(N)(.5), q(P)(.5))
  [1] 0 2
  ## Arithmetics
  R> X <- sin(N+P)
  R> c(p(X)(.5), q(X)(.5))
  [1] 0.6642434 0.008785884
  ## plotting density, cdf and quantile fct
  R> plot(X)

[Figure: output of plot(X) in three panels: "Density of AbscontDistribution" (d(x) against x), "CDF of AbscontDistribution" (p(q) against q), "Quantile function of AbscontDistribution" (q(p) against p)]

Functionals for Distributions – the E Operator (distrEx)

  ## Initialize GPD-object
  R> GPD <- GPareto(loc = 10, scale = 2, shape = 0.5)
  ## Returns analytical value
  R> E(GPD)
  [1] 14
  ## Classical integration of density
  R> E(as(GPD, "AbscontDistribution"))
  [1] 13.40216

i.e., numerically compute $E\,\mathrm{GPD} = \int_{-\infty}^{\infty} x \, d_{\mathrm{GPD}}(x) \, \lambda(dx)$

  ## Integration with probability integral transform
  R> E(GPD, fun=function(x){x})
  [1] 13.99747

i.e., numerically compute $E\,\mathrm{fun}(\mathrm{GPD}) = \int_{0}^{1} \mathrm{fun}(q_{\mathrm{GPD}}(x)) \, \lambda(dx)$

  ## Identical code for all distribution objects
  R> E(Pois(lambda = 10))
  [1] 10
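The probability integral transform behind the last E() call can be tried in base R alone. The following is a minimal sketch, not the distrEx implementation: the helper names q_upper and E_pit are hypothetical, and the GPD quantile is written in terms of the upper-tail probability v = 1 - u (an assumption made here for numerical convenience, so that the integrable singularity sits at 0).

```r
# GPD quantile expressed in the upper-tail probability v = 1 - u
# (hypothetical helper; defaults match GPareto(loc = 10, scale = 2, shape = 0.5))
q_upper <- function(v, loc = 10, scale = 2, shape = 0.5) {
  loc + scale / shape * (v^(-shape) - 1)
}

# E[fun(X)] = integral over (0, 1) of fun(q(u)) du; since U and 1 - U are both
# uniform on (0, 1), integrating fun(q_upper(v)) over v gives the same value
E_pit <- function(fun = identity, ...) {
  integrate(function(v) fun(q_upper(v, ...)), lower = 0, upper = 1)$value
}

mean_pit   <- E_pit()
mean_exact <- 10 + 2 / (1 - 0.5)  # loc + scale / (1 - shape), valid for shape < 1
c(numerical = mean_pit, exact = mean_exact)
```

The design point is that the integrand lives on the bounded interval (0, 1), so no truncation of an infinite range is needed, which matters for heavy-tailed distributions such as the GPD.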
Maximum Likelihood Estimation (distrMod)

[Figure: "Copper in wholemeal flour"; y-axis: Parts per million [ppm]; x-axis: Observation Index; legend: mean, 95%-CI of mean]

Maximum Likelihood Estimation (distrMod)

[Figure: two panels, "Operational risk data" and "Operational risk data zoomed"; y-axis: Density; x-axis: Amount of loss; legend: ML fit]

Our Robustification Approach
  – Contribution of Robust Statistics: Outliers
  – Packages for robust asymptotic statistics: RobASt family
  – Optimally robust estimation in R
  – Storing interpolation grids

Outliers
• What makes an observation an outlier?
  – happens rarely (5 % – 10 %)
  – uncontrollable, of unknown distribution, unpredictable
  – often: no error-free distinction from ideal obs.
  – outlier situation may change from obs. to obs.
• Despite outliers: realistic sample should be close to one from ideal setting
• neighborhoods: $\mathcal{U} = \{\, Q \mid Q = (1 - \varepsilon) P_\theta + \varepsilon H \,\}$
  note: no standard mixing model; $H$ may vary from obs. to obs.!
• accuracy as $\mathrm{maxMSE}(S_n) := \max_{\mathcal{U}} \mathrm{E}_Q \, |S_n - \theta|^2$
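To make the neighborhood idea concrete, one can simulate the MSE of two estimators at the ideal model and at a single member Q of the neighborhood. This is a sketch under stated assumptions: the ideal model is N(theta, 1), the contaminating H is a point mass at 20, and n, eps and the replication count are arbitrary illustrative choices. Evaluating one Q only gives a lower bound for maxMSE, not the maximum itself.

```r
set.seed(1)
theta <- 0     # true location in the ideal model N(theta, 1)
n     <- 50    # sample size (illustrative)
eps   <- 0.05  # contamination radius (illustrative)
reps  <- 2000  # Monte Carlo replications

# Draw one sample from Q = (1 - eps) * N(theta, 1) + eps * H
draw_sample <- function(eps) {
  x <- rnorm(n, mean = theta)
  bad <- runif(n) < eps  # each observation is replaced independently
  x[bad] <- 20           # assumed H: point mass at 20
  x
}

# Simulated MSE of mean and median at contamination level eps
mse <- function(eps) {
  est <- replicate(reps, {
    x <- draw_sample(eps)
    c(mean = mean(x), median = median(x))
  })
  rowMeans((est - theta)^2)
}

mse_ideal  <- mse(0)    # ideal model: the mean is the more accurate estimator
mse_contam <- mse(eps)  # one Q in the neighborhood: the mean's MSE blows up
rbind(ideal = mse_ideal, contaminated = mse_contam)
```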
Available Infrastructure for Robust Asymptotic Statistics: RobASt family

  Package name  Short description
  RandVar       Implementation of random variables
  RobAStBase    Robust Asymptotic Statistics
  ROptEst       Optimally robust estimation
  RobAStRDA     sysdata.rda for pkg's of RobASt-Family
  RobExtremes   Opt-rob. est'ors for extreme value distr's.
  RobLox        Opt-rob. ICs and est'ors for location and scale
  RobLoxBioC    Opt-rob. est'ors for preprocessing omics data
  ROptEstOld    Optimally robust estimation - old version
  ROptRegTS     Opt-rob. est'ors for regression-type models
  RobRex        Opt-rob. est'ors for regression and scale

Optimally robust estimation in R

  ## data: wholemeal flour (data set chem from package MASS)
  R> data(chem, package = "MASS")
  R> library(ROptEst)
  R> s0 <- c(median(chem), mad(chem))
  R> ROest1 <- roptest(chem, NormLocationScaleFamily())
  ## speed-up by interpolation:
  R> library(RobLox)
  R> ROest2 <- roblox(chem, returnIC = TRUE)
  R> rbind(estimate(ROest1),estimate(ROest2))
           mean        sd
  [1,] 3.163591 0.6613414
  [2,] 3.338290 0.6184967

• roptest computes estimator with min max (as)MSE
• w/o specifying outlier rate, roptest selects least favor. rate → RMXE
• roptest takes 28sec, roblox ∼ 0.1sec

[Figure: "Copper in wholemeal flour"; y-axis: Parts per million [ppm]; x-axis: Observation Index; legend: mean, 95%-CI of mean, RMXE, 95%-CI of RMXE]

Storing interpolation grids: some R insights
just seen: interpolation is useful; technique: not only store grids, but also interpolating fct's

Issues and solutions / lessons learnt
  issue     return values from approxfun and splinefun generated ≤ R-2.15.2 no longer valid ≥ R-3.0.0 and vice versa
  solution  store two sets of interpolators and switch acc. to R-version at run-time
  issue     with many models/procedures, pkg containing interpolators can get large → conflict with CRAN policies
  solution  delegate interpolators to separate (less frequently updated) package → RobAStRDA
  issue     conflicts with namespaces when modifying interpolators outside pkg
  solution  functions to generate/manipulate interpolators from within pkg namespace

New package RobExtremes
  – Concept
  – GPD as parametric model
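The first issue/solution pair (two interpolator sets, switched by R version at run-time) can be sketched as follows. All names here (grid, interpolators, get_interpolator, the keys "pre3" and "post3") are hypothetical, and this is not the storage layout actually used by RobAStRDA; it only illustrates the switching idea, with a fallback that regenerates the interpolator from the stored grid.

```r
# Stored grid of (x, y) values (toy example: y = x^2)
grid <- data.frame(x = seq(0, 2, by = 0.5))
grid$y <- grid$x^2

# Hypothetical store with one interpolator set per R-version family
interpolators <- list(
  pre3  = NULL,                                   # would hold functions saved under R <= 2.15.2
  post3 = list(f = approxfun(grid$x, grid$y))     # generated under R >= 3.0.0
)

# Pick the set that matches the running R version; if it is missing,
# rebuild the interpolating function from the grid
get_interpolator <- function(store, grid) {
  key <- if (getRversion() >= "3.0.0") "post3" else "pre3"
  set <- store[[key]]
  if (is.null(set)) list(f = approxfun(grid$x, grid$y)) else set
}

f <- get_interpolator(interpolators, grid)$f
f(0.75)  # linear interpolation between (0.5, 0.25) and (1, 1)
```

Keeping the grid next to the functions is what makes the fallback possible: the grid is plain data and survives version changes, whereas a saved closure may not.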
New package RobExtremes
• infrastructure for opt-rob. estimation for extreme value distributions / scale shape models, i.e.,
  – Gamma
  – Generalized Extreme Value D.
  – Weibull
  – Gumbel
  – Generalized Pareto D.
  – Pareto
• particular methods for expectations
• high breakdown starting estimators
  – scale functionals: Sn and Qn (Rousseeuw&Croux[93]), kMAD (asym. variant of mad, R.&Horbenko[12])
  – LDEstimators (Marazzi&Ruffieux[99]), in particular medkMAD, medSn, and medQn
  – Pickands' estimator (including asy. variance and IC)
  – Quantile estimator for Weibull (Boudt et al.[11])
• speed up by interpolation for opt-rob. estimators
• enhanced diagnostic plots (from RobAStBase)

Extreme Value Setup: GPD as parametric model
• Fisher-Tippett-Gnedenko Theorem: possible limit distributions of $\max(X_i)$ have cdf
  $H_\theta(x) = \exp\bigl(-(1 + \xi (x - \mu)/\beta)^{-1/\xi}\bigr)$  (GEVD)
• Pickands-Balkema-de Haan Theorem: linked to tails ∼ Generalized Pareto distribution (GPD), with cdf
  $F_\theta(x) = 1 - (1 + \xi (x - \mu)/\beta)^{-1/\xi}$
Parameter $\theta = (\xi, \beta, \mu)^\tau$:
• shape $\xi$ ($\geq 0$) (tail behavior)  [goal]
• scale $\beta$  [goal]
• location/threshold $\mu$ ($\leq x$)  [fixed]

[Figure: "GPD: $F_\theta(x)$" for $\xi = 0.7$, $\beta = 1$, $\mu = 0$; x-axis on log scale from 1e−04 to 1e+05; y-axis from 0 to 1]

Application to OpRisk quantification
  – Setup
  – Challenges
  – Optimally robust estimation in RobExtremes
  – Diagnostic plots

Application to OpRisk quantification: Setup
• OpRisk: risk of loss resulting from inadequate or failed internal processes, people and systems or from external events
• Basel II: standards for regulatory capital required to cover losses from OpRisk
• assessed by Loss Distribution Approach (LDA): model severity and frequency of losses separately and cell-wise in a matrix built by business lines and event types; OpRisk quantified as 99 %-OpVaR, i.e., 99 % quantile of resp. compound distr.
• involves parameter estimation in GPD

Data [source: Algorithmics, Inc. (IBM)]
• data: losses in business line Asset Management (AM in the sequel)
• collected from 2431 institutes in last 20yrs
• 600 observed damages > 1Mio USD
• frequency: λ = 0.012 / yr
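The Loss Distribution Approach can be illustrated by a small Monte Carlo sketch in base R: Poisson frequency, GPD severity above the threshold, and the 99 % quantile of the yearly total loss. The frequency λ = 0.012 / yr and the threshold of 1 Mio USD are taken from the slide; the severity parameters ξ = 0.7 and β = 1 are illustrative values only (they are NOT estimates from the data), and the helper names pgpd_sketch and qgpd_sketch are hypothetical.

```r
# GPD cdf and quantile function, F(x) = 1 - (1 + xi * (x - mu) / beta)^(-1 / xi), xi > 0
pgpd_sketch <- function(x, xi = 0.7, beta = 1, mu = 1) {
  1 - (1 + xi * pmax(x - mu, 0) / beta)^(-1 / xi)
}
qgpd_sketch <- function(u, xi = 0.7, beta = 1, mu = 1) {
  mu + beta / xi * ((1 - u)^(-xi) - 1)
}

set.seed(2013)
lambda <- 0.012  # yearly loss frequency per institute (from the slide)
reps   <- 2e5    # simulated institute-years (illustrative)

# Compound distribution: number of losses N ~ Pois(lambda), total = sum of N GPD losses
n_losses <- rpois(reps, lambda)
total <- vapply(
  n_losses,
  function(k) sum(qgpd_sketch(runif(k))),  # k = 0 gives an empty sum, i.e. 0
  numeric(1)
)

# 99 %-OpVaR in Mio USD: 99 % quantile of the simulated compound distribution
opvar99 <- quantile(total, probs = 0.99, names = FALSE)
opvar99
```

Because exp(-0.012) is below 0.99, more than 1 % of the simulated years contain at least one loss, so the 99 % quantile is driven by the fitted GPD; this is where the accuracy of the parameter estimates enters.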