Isotonic Distributional Regression (IDR)
Leveraging Monotonicity, Uniquely So!
Tilmann Gneiting, Heidelberg Institute for Theoretical Studies (HITS) and Karlsruhe Institute of Technology (KIT)
Alexander Henzi and Johanna F. Ziegel, Universität Bern
MMMS2, June 2020
Isotonic Distributional Regression (IDR)
1 What is Regression?
2 Mathematical Background
  2.1 Calibration and Sharpness
  2.2 Proper Scoring Rules
  2.3 Partial Orders
3 Isotonic Distributional Regression (IDR)
  3.1 Definition, Existence, and Universality
  3.2 Computing
  3.3 Synthetic Example
4 Case Study on Precipitation Forecasts
5 Discussion
Origins of Regression
regression originates from arguably the most notorious priority dispute in the history of mathematics and statistics, between Carl Friedrich Gauss (1777–1855) and Adrien-Marie Legendre (1752–1833), over the method of least squares
◮ Stigler (1981): "Gauss probably possessed the method well before Legendre, but [. . . ] was unsuccessful in communicating it to his contemporaries"
Current Views: Distributional Regression
Wikipedia notes that
◮ "commonly, regression analysis estimates the conditional expectation [. . . ] Less commonly, the focus is on a quantile [. . . ] of the conditional distribution [. . . ] In all cases, a function of the independent variables called the regression function is to be estimated"
◮ "it is also of interest to characterize the variation of the dependent variable around the prediction of the regression function using a probability distribution"
Hothorn, Kneib and Bühlmann (2014) argue forcefully that the
◮ "ultimate goal of regression analysis is to obtain information about the conditional distribution of a response given a set of explanatory variables"
in a nutshell, distributional regression
◮ uses training data {(x_i, y_i) ∈ X × ℝ : i = 1, . . . , n} to estimate the conditional distribution of the response variable, y ∈ ℝ, given the explanatory variables or covariates, x ∈ X
◮ isotonic distributional regression (IDR) uses monotonicity relations to find nonparametric conditional distributions
Isotonic Distributional Regression (IDR) . . . in Pictures
[sequence of figures: scatterplots of Y against X with successive regression overlays]
◮ bivariate point cloud — regression of Y on X
◮ linear ordinary least squares (OLS; L2) regression line
◮ linear L2 regression line with 80% prediction intervals
◮ linear L1 regression line — median regression
◮ linear quantile regression — levels 0.10, 0.30, 0.50, 0.70, 0.90
◮ linear quantile regression — zoom in
◮ linear quantile regression — beware quantile crossing
◮ linear quantile regression
◮ nonparametric isotonic mean (L2) regression
◮ nonparametric isotonic median (L1) regression
◮ nonparametric isotonic quantile regression
◮ isotonic distributional regression (IDR)
Isotonic Distributional Regression (IDR) . . . the Details
isotonic distributional regression (IDR)
◮ uses training data of the form {(x_i, y_i) ∈ X × ℝ : i = 1, . . . , n} to estimate a conditional distribution of the response variable or outcome, y ∈ ℝ, given the explanatory variables or covariates, x ∈ X
◮ takes advantage of known or assumed nonparametric monotonicity relations between the covariates, x, and the real-valued outcome, y
◮ has primary uses in prediction and forecasting, where we know the covariates x, but do not know the outcome y
a full understanding relies on a number of (partly, rather recent) mathematical concepts and developments, namely,
◮ calibration and sharpness,
◮ proper scoring rules, and
◮ partial orders
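The estimation idea above can be sketched in code. In the simplest setting of a single real covariate with Y assumed stochastically increasing in X, the IDR conditional CDF at each threshold t coincides with the antitonic least-squares fit of the indicators 1{y_i ≤ t}, computable by the pool-adjacent-violators algorithm (PAVA); this is a sketch only, and the gamma data-generating process below is an illustrative choice, not taken from the slides.

```python
import numpy as np

def pava_increasing(z):
    """Isotonic (nondecreasing) least-squares fit via pool-adjacent-violators."""
    means, counts = [], []
    for v in z:
        means.append(float(v))
        counts.append(1)
        # merge adjacent blocks while the monotonicity constraint is violated
        while len(means) > 1 and means[-2] > means[-1]:
            n = counts[-2] + counts[-1]
            m = (means[-2] * counts[-2] + means[-1] * counts[-1]) / n
            means.pop(); counts.pop()
            means[-1], counts[-1] = m, n
    return np.repeat(means, counts)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 300))   # covariates, sorted increasing
y = rng.gamma(shape=1.0 + x, scale=1.0)    # Y stochastically increasing in x

thresholds = np.unique(y)
# cdf[i, k] estimates F(thresholds[k] | x_i)
cdf = np.empty((x.size, thresholds.size))
for k, t in enumerate(thresholds):
    ind = (y <= t).astype(float)
    # antitonic fit (P(Y <= t | x) decreases in x): reverse, fit, reverse back
    cdf[:, k] = pava_increasing(ind[::-1])[::-1]

# each row is a genuine CDF: values in [0, 1], nondecreasing in t, reaching 1
assert cdf.min() >= 0.0 and cdf.max() <= 1.0
assert np.all(np.diff(cdf, axis=1) >= -1e-9)
assert np.allclose(cdf[:, -1], 1.0)
```

A pleasant consequence of fitting thresholdwise with PAVA is that the estimated CDFs never cross: the indicator data are pointwise ordered across thresholds, and isotonic regression preserves that order, so the quantile-crossing pathology of linear quantile regression cannot occur.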
Isotonic Distributional Regression (IDR) 1 What is Regression? 2 Mathematical Background 2.1 Calibration and Sharpness 2.2 Proper Scoring Rules 2.3 Partial Orders 3 Isotonic Distributional Regression (IDR) 3.1 Definition, Existence, and Universality 3.2 Computing 3.3 Synthetic Example 4 Case Study on Precipitation Forecasts 5 Discussion
What is the Goal in Distributional Regression?
the transition from classical regression to distributional regression poses unprecedented challenges, in that
◮ the regression functions are conditional predictive distributions, in the form of probability measures or, equivalently, cumulative distribution functions (CDFs)
◮ the outcomes are real numbers
◮ so, in order to evaluate distributional regression techniques, we need to compare apples and oranges!
guiding principle: the goal is to maximize the sharpness of the conditional predictive distributions subject to calibration
◮ calibration refers to the statistical compatibility between the conditional predictive CDFs and the outcomes
◮ essentially, the outcomes ought to be indistinguishable from random draws from the conditional predictive CDFs
◮ sharpness refers to the concentration of the conditional predictive distributions
◮ the more concentrated the better, subject to calibration
Probabilistic Framework
Setting: We consider a probability space (Ω, A, Q), where the members of the sample space Ω are tuples (X, F_X, Y, V), such that
◮ the random vector X takes values in the covariate space X (the explanatory variables or covariates),
◮ F_X is a CDF-valued random quantity that uses information based on X only (the conditional predictive distribution or regression function for Y, given X),
◮ Y is a real-valued random variable (the outcome), and
◮ V is uniformly distributed on the unit interval and independent of X and Y (a randomization device).
Definition: The CDF-valued regression function F_X is ideal if F_X = L(Y | X) almost surely.
Notions of Calibration
Definition: Let F_X be a CDF-valued regression function with probability integral transform (PIT)
Z = F_X(Y−) + V [F_X(Y) − F_X(Y−)].
Then F_X is
(a) probabilistically calibrated if Z is uniformly distributed,
(b) threshold calibrated if Q(Y ≤ y | F_X(y)) = F_X(y) almost surely for all y ∈ ℝ.
Theorem: An ideal regression function is both probabilistically calibrated and threshold calibrated.
Remark: In practice, calibration is assessed by plotting PIT histograms
◮ U-shaped PIT histograms indicate underdispersed forecasts with prediction intervals that are too narrow on average
◮ skewed PIT histograms indicate biased predictive distributions
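The randomized PIT formula can be checked by simulation. The small experiment below assumes a Poisson data-generating process (an illustrative choice, not from the slides) and an ideal forecaster: even though the predictive CDFs are discrete, the randomization term V [F_X(Y) − F_X(Y−)] makes the PIT uniform.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000
mu = rng.uniform(1.0, 5.0, n)    # covariate-dependent Poisson means
y = rng.poisson(mu)              # outcomes; the ideal forecast is Poisson(mu)

def pois_cdf(k, mu):
    """P(Poisson(mu) <= k) for integer k >= -1, via the pmf recurrence."""
    out = np.zeros_like(mu)
    term = np.exp(-mu)           # pmf at 0
    for j in range(int(k.max()) + 1):
        out += np.where(j <= k, term, 0.0)
        term = term * mu / (j + 1)
    return out

F_y = pois_cdf(y, mu)            # F_X(Y)
F_ym = pois_cdf(y - 1, mu)       # F_X(Y-), the left limit at Y
v = rng.uniform(size=n)          # randomization device V
z = F_ym + v * (F_y - F_ym)      # randomized PIT

# approximate uniformity: mean 1/2, variance 1/12
assert abs(z.mean() - 0.5) < 0.02
assert abs(z.var() - 1 / 12) < 0.01
```

Replacing the ideal forecaster by, say, Poisson(mu / 2) produces a visibly skewed PIT histogram, matching the diagnostic remarks on the slide.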
Isotonic Distributional Regression (IDR) 1 What is Regression? 2 Mathematical Background 2.1 Calibration and Sharpness 2.2 Proper Scoring Rules 2.3 Partial Orders 3 Isotonic Distributional Regression (IDR) 3.1 Definition, Existence, and Universality 3.2 Computing 3.3 Synthetic Example 4 Case Study on Precipitation Forecasts 5 Discussion
Scoring Rules
scoring rules seek to quantify predictive performance, assessing calibration and sharpness simultaneously
a scoring rule is a function S(F, y) that assigns a negatively oriented numerical score to each pair (F, y), where F is a probability distribution, represented by its cumulative distribution function (CDF), and y is the real-valued outcome
a scoring rule S is proper if
E_{Y∼G}[S(G, Y)] ≤ E_{Y∼G}[S(F, Y)] for all F, G,
and strictly proper if, furthermore, equality implies F = G
truth serum: under a proper scoring rule, truth telling is an optimal strategy in expectation
characterization results relate closely to convex analysis (Gneiting and Raftery 2007)
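The propriety inequality can be verified empirically. The sketch below uses the logarithmic score S(F, y) = −log f(y), one strictly proper example; the Gaussian forecast distributions are an illustrative choice, not from the slides. Forecasting the true distribution G attains the smallest average score.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 100000)   # outcomes drawn from G = N(0, 1)

def log_score(mu, sigma, y):
    """Negatively oriented logarithmic score of N(mu, sigma^2) at outcome y."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2)

s_true = log_score(0.0, 1.0, y).mean()      # truth telling: forecast G itself
s_biased = log_score(0.5, 1.0, y).mean()    # biased mean
s_overdisp = log_score(0.0, 2.0, y).mean()  # overdispersed

# propriety: the true distribution wins in expectation
assert s_true < s_biased and s_true < s_overdisp
```

The same experiment run with an improper score (e.g. the linear score f(y), with sign flipped to be negatively oriented) would reward overconfident forecasts, which is precisely why propriety matters for ranking distributional regression methods.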