Multi-Observation Elicitation
July 2017
Sebastian Casalaina-Martin (Colorado), Rafael Frongillo (Colorado), Tom Morgan (Harvard), Bo Waggoner (UPenn)
Background: Properties of distributions

Property or statistic of a probability distribution: Γ : ∆_Y → ℝ

Examples:
- Γ(p) = E_{Y∼p}[Y]  (mean)
- Γ(p) = Σ_y p(y) log(1/p(y))  (entropy)
- Γ(p) = argmax_y p(y)  (mode)
- Γ(p) = E_{Y∼p}[(Y − E[Y])²]  (variance)
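A minimal numerical sketch (not from the slides), assuming NumPy and a hypothetical finite distribution p, showing the four example properties Γ(p) computed directly from the probability vector:

```python
# Sketch (assumption: a hypothetical distribution p over Y = {0, 1, 2}).
import numpy as np

p = np.array([0.2, 0.5, 0.3])   # probability vector on Y = {0, 1, 2}
y = np.arange(len(p))           # support

mean     = np.sum(p * y)                      # Γ(p) = E[Y]
entropy  = np.sum(p * np.log(1.0 / p))        # Γ(p) = Σ_y p(y) log(1/p(y))
mode     = y[np.argmax(p)]                    # Γ(p) = argmax_y p(y)
variance = np.sum(p * (y - mean) ** 2)        # Γ(p) = E[(Y − E[Y])²]

print(mean, entropy, mode, variance)
```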
Background: Elicitation (1)

If we minimize expected loss under a distribution p, what property of p do we get?

  r* = argmin_{r ∈ R} E_{Y∼p} ℓ(r, Y)   (minimize loss)
  Γ(p) = ψ(r*)                          (link)

Motivation: statistically consistent losses.
- Finite property space: classification, ranking, ...
- Γ(p) ∈ ℝ^d: regression, ...
Background: Elicitation (2)

If we minimize expected loss under a distribution p, what property of p do we get?

  r* = argmin_{r ∈ R} E_{Y∼p} ℓ(r, Y)   (minimize loss)
  Γ(p) = ψ(r*)                          (link)

Examples:
- The mean is elicited by squared loss.
- Variance: elicit the mean and second moment, then link.
- Any property is a link from the whole distribution ... but the dimension of the prediction r is unbounded ...
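A small numerical sketch (not from the slides), assuming NumPy and a hypothetical sample, of both examples: the empirical minimizer of expected squared loss lands at the mean, and the variance is recovered by eliciting (mean, second moment) and applying the link m₂ − m₁²:

```python
# Sketch (assumption: a hypothetical exponential distribution as p).
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=2.0, size=100_000)

# Empirical minimizer of E[(r - Y)^2] over a grid of reports r.
grid = np.linspace(0.0, 5.0, 501)
expected_loss = [np.mean((r - samples) ** 2) for r in grid]
r_star = grid[np.argmin(expected_loss)]
print(r_star, samples.mean())          # r* ≈ E[Y]

# Variance via the two elicited moments plus a link.
m1, m2 = samples.mean(), np.mean(samples ** 2)
print(m2 - m1 ** 2, samples.var())     # link(m1, m2) ≈ Var(Y)
```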
This paper

What if the loss takes multiple i.i.d. observations?

  r* = argmin_{r ∈ R} E_{Y1,...,Ym∼p} ℓ(r, Y1, ..., Ym)

Examples:
- Var(p) = argmin_r E[(r − ½(Y1 − Y2)²)²]  (see the sketch below).
- 2-norm: unbounded dimension → 1 dimension, 2 observations!

Motivating applications:
- Crowd labeling
- Numerical simulations: climate science, engineering, ...
- Regression?
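A minimal numerical sketch (not from the slides), assuming NumPy, of the two-observation variance loss above. Squared loss to a fixed target is minimized by that target's expectation, and E[½(Y1 − Y2)²] = Var(p), so a single real-valued report with two i.i.d. observations suffices:

```python
# Sketch (assumption: a hypothetical normal distribution with variance 2.25 as p).
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.normal(loc=3.0, scale=1.5, size=100_000)   # first i.i.d. draw
y2 = rng.normal(loc=3.0, scale=1.5, size=100_000)   # second i.i.d. draw

targets = 0.5 * (y1 - y2) ** 2
# The empirical minimizer of sum_i (r - targets_i)^2 is the targets' average.
r_star = targets.mean()
print(r_star, 1.5 ** 2)   # ≈ true variance 2.25
```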
Key concepts from prior research

Elicitable properties have convex level sets, linear structure.

[Figure: probability simplex on Y = {1, 2, 3}; left panel shades all distributions with mode 3, right panel shades all distributions with mean 2. Panel labels: Mode, Mean.]
Results (1): Geometric approach

Summary: k-observation level sets ↔ zeros of degree-k polynomials
Results (2): Upper and lower bounds

Key example: the (integer) k-norm, ‖p‖_k = (Σ_y p(y)^k)^{1/k}.

Idea: 1[Y1 = ··· = Yk] is an unbiased estimator for ‖p‖_k^k.
- Loss(r, Y1, ..., Yk) = (r − 1[Y1 = ··· = Yk])².  Link(r) = r^{1/k}.  (Sketched below.)
- Similar approach for products of expectations.

Lower bound: the k-norm requires k observations.
- The lower bound approach is general (algebraic geometry).
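A minimal numerical sketch (not from the slides), assuming NumPy and a hypothetical finite distribution, of the k-norm construction: the report minimizing E[(r − 1[Y1 = ··· = Yk])²] is P[Y1 = ··· = Yk] = ‖p‖_k^k, and the link r ↦ r^{1/k} recovers ‖p‖_k:

```python
# Sketch (assumption: a hypothetical 3-outcome distribution and k = 3).
import numpy as np

rng = np.random.default_rng(0)
k = 3
p = np.array([0.5, 0.3, 0.2])
true_knorm = np.sum(p ** k) ** (1.0 / k)

# Draw many k-tuples of i.i.d. observations from p.
tuples = rng.choice(len(p), size=(200_000, k), p=p)
indicator = np.all(tuples == tuples[:, [0]], axis=1)   # 1[Y1 = ... = Yk]

r_star = indicator.mean()          # empirical minimizer of the squared loss
print(r_star ** (1.0 / k), true_knorm)   # link(r*) ≈ ||p||_k
```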
Why could this be useful?

Problem: Regress x vs. Var(y | x).

Old approach: Regress on the mean and second moment, then link.

[Figure: left panel shows data y vs. x with the fitted mean and fitted second moment; right panel shows the fitted variance from the old approach against the variance from the new approach, which matches the true curve.]

⟹ Requires good modeling and sufficient data for these (unimportant) proxies!
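A minimal numerical sketch (not from the slides) of the new approach, under the assumption that two i.i.d. labels per x are available (a hypothetical data-generating process, not the paper's experiment): regress directly on the two-observation targets ½(y1 − y2)², which are unbiased for Var(y | x), instead of fitting the mean and second moment first:

```python
# Sketch (assumptions: hypothetical E[y|x] and Var(y|x), two labels per x).
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
x = rng.uniform(0.0, 5.0, size=n)
true_var = 0.1 + 0.05 * x ** 2                # hypothetical Var(y | x)
mean_y = np.sin(x)                            # hypothetical E[y | x]
y1 = mean_y + rng.normal(size=n) * np.sqrt(true_var)
y2 = mean_y + rng.normal(size=n) * np.sqrt(true_var)

targets = 0.5 * (y1 - y2) ** 2                # unbiased for Var(y | x)
coeffs = np.polyfit(x, targets, deg=2)        # least squares = squared loss
fitted_var = np.polyval(coeffs, x)

# Average absolute error of the directly fitted variance curve.
print(np.abs(fitted_var - true_var).mean())
```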
Future directions

- Elicitation frontiers and (d, m)-elicitability (in paper: central moments)
- Regression (in paper: preliminary results)
- Additional useful examples, e.g. expected max of k draws; risk measures
- Lots of COLT questions for multi-observation losses!

Thanks!
Aside - comparison to property testing

Property testing:
- Algorithmic problem
- Distribution p is initially unknown
- Algorithm draws samples to estimate a property or test a hypothesis

Property elicitation:
- Existential questions, e.g. ...
  - ... does there exist a one-dimensional loss function eliciting variance? No.
  - ... a two-dimensional one? Yes.
  - ... describe all losses directly eliciting the mean: divergences.