On Estimating the Size and Confidence of a Statistical Audit

Javed A. Aslam, College of Computer and Information Science, Northeastern University
Raluca A. Popa and Ronald L. Rivest, Computer Science and Artificial Intelligence Laboratory, M.I.T.

August 6, 2007, Electronic Voting Technology 2007

Outline  Motivation  Background  How Do We Audit?  The Problem  Analysis  Model  Sample Size  Bounds  Conclusions August 6, 2007 Electronic Voting Technology 2007 2

Motivation  There have been cases of electoral fraud (Gumbel’s Steal This Vote , Nation Books, 2005)  Would like to ensure confidence in elections  Auditing = comparing statistical sample of paper ballots to electronic tally  Provides confidence in a software independent manner August 6, 2007 Electronic Voting Technology 2007 3

How Do We Audit?  Proposed Legislation: Holt Bill (2007)  Voter-verified paper ballots  Manual auditing  Granularity: Machine, Precinct, County  Procedure  Determine u, # precincts to audit, from margin of victory  Sample u precincts randomly  Compare hand count of paper ballots to electronic tally in sampled precincts  If all are sufficiently close, declare electronic result final  If any are significantly different, investigate! August 6, 2007 Electronic Voting Technology 2007 4

How Do We Audit?  Proposed Legislation: Holt Bill (2007)  Voter-verified paper ballots  Manual auditing  Granularity: Machine, Precinct, County  Procedure  Determine u, # precincts to audit, from margin of victory  Sample u precincts randomly  Compare hand count of paper ballots to electronic tally in sampled precincts Our formulas are independent of the auditing procedure August 6, 2007 Electronic Voting Technology 2007 5

The Problem  How many precincts should one audit to ensure high confidence in an election result? August 6, 2007 Electronic Voting Technology 2007 6

Previous Work  Saltman (1975): The first to study auditing by sampling without replacement  Dopp and Stenger (2006): Choosing appropriate audit sizes  Alvarez et al. (2005): Study of real case auditing of punch-card machines August 6, 2007 Electronic Voting Technology 2007 7

Hypothesis Testing  Null hypothesis: The reported election outcome is incorrect (electronic tally indicates different winner than paper ballots)  Want to reject the null hypothesis  Need to sample enough precincts to ensure that, if no fraud is detected, the election outcome is correct with high confidence August 6, 2007 Electronic Voting Technology 2007 8

Model
- n precincts, b of them corrupted ("bad")
- Sample u precincts (without replacement)
- c = desired confidence
- Want: if there are ≥ b corrupted precincts, then the sample contains at least one with probability ≥ c
- Equivalently: if the sample contains no corrupted precincts, then the election outcome is correct with probability ≥ c
- Typical values: n = 400, b = 50, c = 95%

What is b ?  Minimum # of precincts adversary must corrupt to change election outcome  Derived from margin of victory b = (half margin of victory) · n margin [times 5 (Dopp and Stenger, 2006)]  Our formulas are independent of b ’s calculation August 6, 2007 Electronic Voting Technology 2007 10

Rule of Three  If we draw a sample of size ≥ 3n/b with replacement , then:  Expect to see at least three corrupted precincts  Will see at least one corrupted precinct with c ≥ 95%  In practice, we sample without replacement (no repeated precincts) August 6, 2007 Electronic Voting Technology 2007 11

Sample Size  Probability that no corrupted precinct is detected: n-b n Pr = ﴾ ﴿ / ﴾ ﴿ u u  Optimal Sample Size: Minimum u such that Pr ≤ 1- c Problem: Need a computer  Goal: Derive a simple and accurate upper bound that an election official can compute on a hand-held calculator August 6, 2007 Electronic Voting Technology 2007 12

Our Bounds  Intuition: How many different precincts are sampled by the Rule of Three?  Our without replacement upper bounds: A C C U R A C Y August 6, 2007 Electronic Voting Technology 2007 13

Our Bounds  Intuition: How many different precincts are sampled by the Rule of Three?  Our without replacement upper bounds:  Example: n = 400, b = 50 (margin=5%), c = 95% August 6, 2007 Electronic Voting Technology 2007 14

Our Bound  Conservative: provably an upper bound  Accurate:  For n ≤ 10,000, b ≤ n /2, c ≤ 0.99 (steps of 0.01):  99% is exact, 1% overestimates by 1 precinct  Analytically, it overestimates by at most –ln(1- c )/2, e.g. three precincts for c < 0.9975  Can be computed on a hand-held calculator August 6, 2007 Electronic Voting Technology 2007 15

Observations
- [Figure: precincts to audit versus margin of victory (1% to 20%), n = 400, c = 95%]
- A fixed level of auditing is not appropriate
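The figure's data can be roughly regenerated to show how strongly the required audit size depends on the margin. This is a sketch under stated assumptions: b is derived as (margin/2) · n · 5, following the "times 5" factor mentioned earlier in the talk, precincts are treated as equal in size, and the function names are hypothetical.

```python
from math import comb

def optimal_sample_size(n, b, c):
    """Exact minimum u with C(n-b, u) / C(n, u) <= 1 - c, found by search."""
    for u in range(n + 1):
        if comb(n - b, u) / comb(n, u) <= 1 - c:
            return u
    return n

def corrupted_precincts(n, margin_percent):
    """ASSUMPTION: b = (margin / 2) * n * 5, computed as an integer ceiling."""
    return -(-n * margin_percent * 5 // 200)

def audit_sizes(n, c, margins_percent):
    """Map each margin (in percent) to the number of precincts to audit."""
    return {m: optimal_sample_size(n, corrupted_precincts(n, m), c)
            for m in margins_percent}

for margin, u in audit_sizes(400, 0.95, [1, 2, 5, 10, 20]).items():
    print(f"margin {margin:>2}%: audit {u} of 400 precincts")
```

The required audit size shrinks as the margin grows, which is the point behind the slide's remark that a fixed audit level is not appropriate.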

Observations (cont'd)
- [Figure: precincts to audit versus margin of victory (1% to 20%), n = 400, c = 65%, compared with the Holt tiers]
- Holt Bill (2007): tiered auditing

Related Problems
- Inverse questions
  - Estimate the confidence level c from u, b, and n
  - Estimate the detectable fraud level b from u, c, and n
- Auditing with constraints
  - Holt Bill (2007): audit at least one precinct in each county
- Future work
  - Handling precincts of variable sizes (Stanislevic, 2006)
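Both inverse questions follow from the same without-replacement probability used for the sample size. A minimal sketch, with hypothetical function names and equal-size precincts assumed:

```python
from math import comb

def confidence(n, b, u):
    """Confidence c achieved by auditing u of n precincts when b are corrupted."""
    return 1 - comb(n - b, u) / comb(n, u)

def detectable_fraud_level(n, u, c):
    """Smallest b that a sample of u precincts detects with probability >= c."""
    for b in range(1, n + 1):
        if confidence(n, b, u) >= c:
            return b
    return None  # e.g. u = 0 can never reach a positive confidence

print(confidence(400, 50, 22))
print(detectable_fraud_level(400, 22, 0.95))
```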

Conclusions  We develop a formula for the sample size: that is:  Conservative (an upper bound)  Accurate  Simple, easy to compute on a pocket calculator  Applicable to different other settings August 6, 2007 Electronic Voting Technology 2007 19

Thank you!  Questions? August 6, 2007 Electronic Voting Technology 2007 20
