Theme: Maximum Likelihood Estimation
Projected Gradient Descent on the Negative Log-Likelihood (NLL)
• Step 3: SGD recovers the true parameters!
• Ingredients:
  • Convexity always holds (not necessarily strong)
  • Guaranteed constant probability α of a sample falling into the survival set S
  • Efficient projection algorithm into the set of valid parameters (defined by α)
  • Strong convexity within the projection set: H ⪰ C · α⁴ · λ_m(T⁻¹) · I
  • Good initialization point (i.e., one that assigns constant mass to S)
• Result: Efficient algorithm for recovering parameters from truncated data!
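The ingredients above combine into a projected SGD loop. Below is a minimal, generic sketch of that loop; `sample_gradient` (an unbiased stochastic gradient of the NLL) and `project` (projection onto the valid parameter set defined by α) are hypothetical placeholders standing in for the paper's specific routines:

```python
import numpy as np

def projected_sgd(theta0, sample_gradient, project, steps=10_000, lr=0.1):
    """Projected SGD on the NLL: stochastic gradient step, then project
    back into the parameter set where strong convexity holds."""
    theta = project(theta0)  # good initialization: start inside the valid set
    for t in range(1, steps + 1):
        g = sample_gradient(theta)             # unbiased stochastic gradient of the NLL
        theta = theta - (lr / np.sqrt(t)) * g  # decaying step size
        theta = project(theta)                 # efficient projection keeps θ valid
    return theta
```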
Truncation bias in regression
• Goal: infer the effect of height x_i on basketball ability y_i
• Strategy: linear regression
• What we expect: [figure: a regression line fit through the full population of (height, ability) points]

Bias from truncation: an illustration
• What we get: [figure: height determines a latent ability z up to noise ε; if z is good enough for the NBA, we observe y_i, otherwise the player is unobserved]
• Truncation: we only observe data based on the value of y_i
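A quick simulation shows the resulting bias. This is an illustrative sketch; the slope, noise scale, and NBA threshold are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
height = rng.normal(0.0, 1.0, n)                   # standardized height x_i
ability = 1.0 * height + rng.normal(0.0, 1.0, n)   # true slope = 1.0, noise ε

# Truncation: we only observe players whose ability clears the NBA bar.
keep = ability > 1.5
x_obs, y_obs = height[keep], ability[keep]

full_slope = np.polyfit(height, ability, 1)[0]     # ≈ 1.0 on the full population
trunc_slope = np.polyfit(x_obs, y_obs, 1)[0]       # noticeably smaller on truncated data
print(f"full-population slope: {full_slope:.2f}, truncated slope: {trunc_slope:.2f}")
```

The truncated fit understates the effect of height: conditioning on clearing the bar makes tall players look no better than the shorter players who barely made it.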
Truncation in practice
Not a hypothetical problem (or a new one!)
• Fig 1 [Hausman and Wise 1977]: corrected previous findings about education (x) vs. income (y) that were affected by truncation on income (y)
• Table 1 [Lin et al 1999]: found bias in income (x) vs. child support (y) because the response rate differs based on y
• Has inspired lots of prior work in statistics/econometrics [Galton 1897; Pearson 1902; Lee 1914; Fisher 1931; Hotelling 1948; Tukey 1949; Tobin 1958; Amemiya 1973; Breen 1996; Balakrishnan, Cramer 2014]
• Our goal: a unified, efficient (polynomial in dimension) algorithm
Truncated regression and classification
The generative process, per sample:
1. Sample a covariate x ∼ D
2. Sample noise ε ∼ D_N and compute the latent z = h_θ*(x) + ε
3. With probability 1 − φ(z): throw away (x, z) and restart
4. With probability φ(z): project z to a label y := π(z) and add (x, y) to the training set, T ∪ {(x, y)}
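This process can be simulated directly. The sketch below assumes concrete illustrative choices (a linear h_θ, standard Gaussian noise, and a simple threshold survival rule φ) that are not fixed by the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_truncated(theta_star, phi, pi, d=5, n=1000):
    """Draw n observed pairs (x, y) from the truncated generative process."""
    X, Y = [], []
    while len(Y) < n:
        x = rng.normal(size=d)              # 1. covariate x ~ D
        z = theta_star @ x + rng.normal()   # 2. latent z = h_θ*(x) + ε, ε ~ N(0,1)
        if rng.random() < phi(z):           # 4. survives truncation w.p. φ(z)
            X.append(x)
            Y.append(pi(z))                 #    observed label y = π(z)
        # 3. otherwise (x, z) is thrown away and we restart
    return np.array(X), np.array(Y)

theta_star = np.ones(5)
phi = lambda z: float(z > 0.0)   # example survival rule: keep only z > 0
pi = lambda z: z                 # regression case: the label is the latent itself
X, y = sample_truncated(theta_star, phi, pi)
```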
Parameter estimation
• We have a model y_i ∼ π(h_θ*(x_i) + ε) where ε ∼ D_N, and want an estimate θ̂ of θ*
• Standard (non-truncated) approach: maximize the likelihood
  p(θ; x, y) = ∫_{z ∈ π⁻¹(y)} D_N(z − h_θ(x)) dz
  where π⁻¹(y) is the set of all possible latent variables corresponding to the label, and D_N(z − h_θ(x)) is the likelihood of the latent under the model
• Example: if h_θ is a linear function, then:
  • If ε ∼ 𝒩(0,1) and π(z) = z, the MLE is ordinary least squares regression
  • If ε ∼ 𝒩(0,1) and π(z) = 1_{z ≥ 0}, the MLE is probit regression
  • If ε ∼ Logistic(0,1) and π(z) = 1_{z ≥ 0}, the MLE is logistic regression
• What about the truncated case?
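To make the correspondence concrete, here are the per-example negative log-likelihoods these three cases reduce to, as a sketch assuming a linear model h_θ(x) = θ·x:

```python
import numpy as np
from scipy.stats import norm

def nll_ols(theta, x, y):
    # ε ~ N(0,1), π(z) = z: π⁻¹(y) = {y}, so the likelihood is the Gaussian
    # density at the residual, i.e., squared loss up to a constant
    return 0.5 * (y - theta @ x) ** 2 + 0.5 * np.log(2 * np.pi)

def nll_probit(theta, x, y):
    # ε ~ N(0,1), π(z) = 1[z ≥ 0]: integrating the Gaussian over π⁻¹(y)
    # gives P(z ≥ 0) = Φ(θ·x)
    p = norm.cdf(theta @ x)
    return -np.log(p if y == 1 else 1 - p)

def nll_logistic(theta, x, y):
    # ε ~ Logistic(0,1), π(z) = 1[z ≥ 0]: P(z ≥ 0) is the sigmoid of θ·x
    p = 1.0 / (1.0 + np.exp(-(theta @ x)))
    return -np.log(p if y == 1 else 1 - p)
```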
Parameter estimation from truncated data
Main idea: maximization of the truncated log-likelihood
• Truncated likelihood:
  p(θ; x, y) = [ ∫_{z ∈ π⁻¹(y)} D_N(z − h_θ(x)) φ(z) dz ] / [ ∫_z D_N(z − h_θ(x)) φ(z) dz ]
  (the numerator weights each latent by its survival probability; the denominator renormalizes over all latents that survive truncation)
• Again, we can compute a stochastic gradient of the log-likelihood with only oracle access to φ ⟹ leads to another SGD-based algorithm (see the sketch below)
• However: this time the loss can actually be non-convex
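As an illustration of why oracle access to φ suffices, here is a sketch for the regression case (π(z) = z, linear h_θ, 𝒩(0,1) noise). Under those assumptions the per-sample gradient of the truncated NLL works out to (z′ − y)·x in expectation, with z′ drawn from 𝒩(θ·x, 1) conditioned on surviving truncation, and a single rejection-sampled z′ gives an unbiased estimate. This is a worked example under those assumptions, not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

def stochastic_gradient(theta, x, y, phi):
    """One-sample estimate of the truncated-NLL gradient: (z' - y) * x,
    where z' ~ N(θ·x, 1) conditioned on surviving truncation."""
    while True:                        # rejection sampling needs only oracle access to φ
        z = theta @ x + rng.normal()
        if rng.random() < phi(z):
            return (z - y) * x

def truncated_sgd(X, y, phi, steps=5000, lr=0.01):
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        i = rng.integers(len(y))       # random observed training example
        theta -= lr * stochastic_gradient(theta, X[i], y[i], phi)
    return theta
```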