
An Interval Estimation Approach to Selection Bias in Observational Studies

Matt Tudball (1), with Rachael Hughes (1), Kate Tilling (1), Qingyuan Zhao (2) and Jack Bowden (1)

(1) MRC Integrative Epidemiology Unit, University of Bristol
(2) Department of Statistics, Wharton School, University of Pennsylvania

20 June, 2019

Matt Tudball (MRC IEU) Selection Bias in Obs. Studies 20 June, 2019

Motivating Problem

UK Biobank is a population cohort widely analysed by epidemiologists, health economists, clinicians and others. During recruitment, which began in 2006, only around 500,000 of the 9.2 million invited individuals enrolled in the cohort (a response rate of 5.5%). Follow-up studies show that participants tend to be better educated, earn more and have lower mortality than the UK population as a whole. This is called the ‘healthy volunteer’ effect.

Motivating Problem (continued)

If we knew, or could estimate, each individual’s probability of entering the sample, we could perform inverse probability weighting. This reweights the sample so that it is more representative of the population from which it is drawn. However, in UK Biobank we do not observe any individual-level data on people who did not enter the sample, so this approach is not possible.
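As a rough illustration of the inverse probability weighting idea above, the following Python sketch computes a weighted sample mean in which each observation is weighted by the reciprocal of its selection probability. The function name `ipw_mean` and the toy numbers are assumptions made for illustration and do not come from the talk; in UK Biobank the probabilities `p` are precisely what is unavailable.

```python
def ipw_mean(y, p):
    """Inverse probability weighted (normalised) sample mean.

    y: outcomes observed in the sample
    p: each sampled individual's probability of selection (assumed known here)
    """
    if len(y) != len(p) or len(y) == 0:
        raise ValueError("y and p must be non-empty and of equal length")
    if any(not 0.0 < pi <= 1.0 for pi in p):
        raise ValueError("selection probabilities must lie in (0, 1]")
    weights = [1.0 / pi for pi in p]  # rarely selected units count for more
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Hypothetical toy sample: the second individual was half as likely to be
# selected as the first, so it receives twice the weight.
y = [1.0, 2.0]
p = [0.5, 0.25]
estimate = ipw_mean(y, p)  # (2*1 + 4*2) / (2 + 4)
```

The weights are normalised by their sum, so only the relative sizes of the selection probabilities matter for the estimate.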

Existing Literature

Aronow and Lee (2013) (AL) propose a method which provides an interval of possible inverse probability weighted sample means in settings like this. The key assumption is that each individual’s probability of sample selection lies between two user-specified constants, a and b. For example, a = 1% and b = 90%. The method works by finding the configurations of individual-level weights which produce the largest and smallest weighted sample means, given the assumption above. A big advantage of this method is that it is fully non-parametric and allows selection on unobservables.
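The search over weight configurations described above can be sketched in a few lines of Python. This is an illustrative sketch rather than AL's own implementation: the function name `al_bounds` is hypothetical, and the code relies on the fact that a ratio of linear functions over a box of weights is extremised by a threshold rule (one extreme weight for the smallest outcomes, the other for the rest), so it simply tries every split point.

```python
def al_bounds(y, a, b):
    """Smallest and largest weighted mean of y when every selection
    probability lies in [a, b], i.e. every weight 1/p lies in [1/b, 1/a]."""
    if not 0.0 < a <= b <= 1.0:
        raise ValueError("need 0 < a <= b <= 1")
    if len(y) == 0:
        raise ValueError("y must be non-empty")
    ys = sorted(y)
    n = len(ys)
    w_min, w_max = 1.0 / b, 1.0 / a

    def weighted_mean(weights):
        return sum(w * v for w, v in zip(weights, ys)) / sum(weights)

    lower, upper = float("inf"), float("-inf")
    # Try every split point k: the k smallest outcomes share one extreme
    # weight and the remaining n - k outcomes share the other.
    for k in range(n + 1):
        lower = min(lower, weighted_mean([w_max] * k + [w_min] * (n - k)))
        upper = max(upper, weighted_mean([w_min] * k + [w_max] * (n - k)))
    return lower, upper


# Hypothetical toy sample with weights allowed to range between 2 and 4.
lo, hi = al_bounds([0.0, 1.0], a=0.25, b=0.5)
```

Each split is evaluated from scratch here for clarity, which costs quadratic time in the sample size; running sums would bring this down for larger samples.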

Limitations of AL

To our knowledge, no applied papers have been written which use the AL method. We believe there are 4 key reasons for this:

1) The AL method is limited to population means.
2) AL did not propose a procedure for conducting statistical inference. That is, there are no confidence intervals or hypothesis tests.
3) The bounds are often implausibly wide for reasonable choices of a and b, making interpretation difficult.
4) The method assumes no knowledge of the selection mechanism or of the population from which the sample is drawn.

Summary of our Method

Our method builds on the AL estimator but addresses its limitations.

1) Our method works for a wide variety of estimands, including OLS and IV.
2) We propose and validate two approaches to constructing confidence intervals and hypothesis tests: one based on the percentile bootstrap and one based on the asymptotic distribution of stochastic programs.
3) We show how to force the weights to be consistent with population-level information, thus tightening the bounds, sometimes significantly.

Summary of our Method: 3)

We consider 3 main types of population-level information:

1) Survey response rate: We can force the optimising weights to imply the response rate for the survey, which is typically known to researchers.
2) Population means: We can also force the optimising weights to imply known population means of variables in our sample. For example, we may want the weights to imply a male proportion of 50%.
3) Parametric assumptions: We can impose a parametric form on the weights and choose variables within our sample which we believe are predictive of selection. We then optimise over the parameters of the function.
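To give a feel for how the first type of information tightens the bounds, here is a minimal Python sketch covering the response-rate constraint only. It assumes that "implying the response rate r" means the weights in the sample must average 1/r; this encoding, the function name `bounds_with_response_rate` and the toy numbers are assumptions for illustration, not details taken from the talk. Once the total weight is fixed the objective is linear, so a greedy allocation (spend the spare weight on the largest outcomes for the upper bound, on the smallest for the lower bound) is enough. Adding population-mean constraints would call for a general linear programming solver instead.

```python
def bounds_with_response_rate(y, a, b, r):
    """Bounds on the weighted mean of y when each weight lies in
    [1/b, 1/a] and the weights average exactly 1/r."""
    n = len(y)
    w_min, w_max = 1.0 / b, 1.0 / a
    total = n / r  # assumed encoding: mean weight equals 1 / response rate
    if n == 0 or not n * w_min <= total <= n * w_max:
        raise ValueError("response rate is incompatible with [a, b]")

    def extreme(ordered):
        # Everyone starts at the minimum weight; the spare weight goes to
        # the outcomes at the front of `ordered` until it runs out.
        spare = total - n * w_min
        acc = 0.0
        for value in ordered:
            extra = min(w_max - w_min, spare)
            acc += (w_min + extra) * value
            spare -= extra
        return acc / total

    lower = extreme(sorted(y))
    upper = extreme(sorted(y, reverse=True))
    return lower, upper


# Hypothetical toy sample: weights between 2 and 4 that must average 2.5.
lo_r, hi_r = bounds_with_response_rate([0.0, 1.0], a=0.25, b=0.5, r=0.4)
```

In this toy case the interval is [0.4, 0.6]. Without the response-rate constraint the same weight range allows [1/3, 2/3], so fixing the average weight rules out the most extreme configurations.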

Applied Example: Education on Income

We estimate the effect of leaving school later than age 15 on the likelihood of earning more than £31,000 per annum in UK Biobank (Davies et al., 2018). We use the 1972 Raising of the School Leaving Age (ROSLA) as an instrumental variable. We use a 12-month bandwidth and control for sex and month-of-birth indicators. We assume a logit specification for the weights as a function of household income over £31,000, years of education, days of physical activity per week and sex.

Applied Example: Education on Income (continued)

We consider four specifications:

Constraint 1) a = 0.1% and b = 50%.
Constraint 2) Above plus constraining the direction of the selection effects. That is, we assume education, income and physical activity positively influence selection, while being male negatively influences selection.
Constraint 3) Above plus constraining the response rate to be 5.5%.
Constraint 4) All above plus constraining the proportion of males to be 49.5%.
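The effect of a direction constraint like the one in Constraint 2) can be illustrated with a deliberately crude Python sketch. It assumes a logit selection model in a single covariate and replaces proper optimisation with a coarse grid search over the intercept and slope; the function name `logit_bounds`, the grid and the toy data are all hypothetical. It also bounds a simple weighted mean rather than the IV estimand used in the applied example.

```python
import math


def logit_bounds(x, y, a, b, sign=None):
    """Grid-search bounds on the weighted mean of y when the selection
    probability is p_i = 1 / (1 + exp(-(b0 + b1 * x_i))) with a <= p_i <= b.
    sign=+1 keeps only b1 >= 0, sign=-1 only b1 <= 0, None keeps both."""
    intercepts = [k * 0.5 for k in range(-12, 1)]  # -6.0, -5.5, ..., 0.0
    slopes = [k * 0.5 for k in range(-6, 7)]       # -3.0, -2.5, ..., 3.0
    lower, upper = float("inf"), float("-inf")
    for b0 in intercepts:
        for b1 in slopes:
            if sign is not None and sign * b1 < 0:
                continue  # violates the assumed direction of selection
            p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
            if any(pi < a or pi > b for pi in p):
                continue  # outside the user-specified probability range
            w = [1.0 / pi for pi in p]
            mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
            lower, upper = min(lower, mean), max(upper, mean)
    return lower, upper


# Hypothetical toy data: x = 1 marks more education, y = 1 marks high income.
x = [0, 0, 1, 1]
y = [0, 1, 1, 1]
free = logit_bounds(x, y, a=0.001, b=0.5)
signed = logit_bounds(x, y, a=0.001, b=0.5, sign=+1)
```

With `sign=+1` the more educated group can only be down-weighted relative to the less educated group, so the upper bound cannot exceed the unweighted mean of 0.75, whereas the unrestricted search can push it higher. A grid search only approximates the true extremes over the parameter space.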

Applied Example: Education on Income (continued)

[Figure: interval estimates for the unweighted estimate and for Constraints 1) to 4). The axis is labelled "Interval Estimates" and runs from 0.0 to 0.8; the plot carries the label [0.001, 0.5].]

Conclusion

This is a flexible sensitivity analysis which works for a variety of estimands. The assumptions can be selected by the researcher, ranging from a fully non-parametric selection-on-unobservables model to a parametric selection-on-(within-sample) observables model. It also allows researchers to incorporate a suite of population-level information to tighten the bounds. Confidence intervals and hypothesis tests are available as well.

* Paper will be up on arXiv very soon!
