Discussion
Dean Foster, Amazon @ NYC
Differential privacy means, in statistics language: fit the world, not the data.
You shouldn't be able to tell which data set the experiment came from. (I expect Gelman will say how impossible this is later.)
More extreme: you should not be able to tell anything about the dataset even when given all but one person.
For most of the history of statistics this wouldn't matter. Regression, for example:
$EY_i = x_i^\top \beta$ with $\beta \in \Re^p$, $p \ll n$.
Once we have $\hat\beta$ we can estimate anything: the estimate of $E(g(Y))$ is simply $E(g(x^\top \hat\beta + \sigma Z))$.
For linear combinations we even have confidence intervals (Scheffé).
There wasn't all that much more in the data than in the model. In fact, $\hat\beta$ was "sufficient" to answer any question we could dream of asking.
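As a concrete illustration (mine, not the talk's): once $\hat\beta$ and $\hat\sigma$ are in hand, $E(g(Y))$ at a point $x$ can be estimated by simulating $g(x^\top \hat\beta + \hat\sigma Z)$. The toy data and the choice $g = \exp$ below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with n >> p and EY_i = x_i' beta
n, p = 500, 3
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

# OLS fit and residual standard deviation
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma_hat = np.sqrt(((y - X @ beta_hat) ** 2).sum() / (n - p))

def estimate_Eg(g, x_new, draws=100_000):
    """Monte Carlo estimate of E(g(Y)) at x_new via g(x' beta_hat + sigma_hat * Z)."""
    z = rng.normal(size=draws)
    return g(x_new @ beta_hat + sigma_hat * z).mean()

x_new = np.array([1.0, 1.0, 1.0])
print(estimate_Eg(np.exp, x_new))  # an estimate of E(exp(Y)) at x_new
```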
Stepwise regression changed all that.
Model: $Y_i \sim X_i^\top \beta + \sigma Z_i$.
Penalized regression:
$\hat\beta \equiv \arg\min_{\beta \in \Re^p} \sum_{i=1}^{n} (Y_i - X_i^\top \beta)^2 + 2\, q_{\hat\beta}\, \sigma^2 \log(p)$,
where $q_{\hat\beta}$ is the number of non-zeros in $\hat\beta$ (and $q$ is the number of non-zeros in $\beta$).
Need $q \ll n$, but $p$ could be large.
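A hedged sketch of what the criterion above computes: for tiny $p$ one can score every subset by RSS plus $2 q \sigma^2 \log(p)$ and keep the best. The function name and interface are mine; exhaustive search is included only to make the objective concrete (its infeasibility for large $p$ is exactly the complexity point below).

```python
from itertools import combinations
import numpy as np

def l0_penalized_fit(X, y, sigma2):
    """Exhaustively score every subset S by RSS(S) + 2 * |S| * sigma2 * log(p)."""
    n, p = X.shape
    best_subset, best_score = (), np.inf
    for q in range(p + 1):
        for S in combinations(range(p), q):
            if q == 0:
                rss = float((y ** 2).sum())
            else:
                cols = list(S)
                b, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
                rss = float(((y - X[:, cols] @ b) ** 2).sum())
            score = rss + 2 * q * sigma2 * np.log(p)
            if score < best_score:
                best_subset, best_score = S, score
    return best_subset, best_score

# Usage: subset, score = l0_penalized_fit(X, y, sigma2=1.0)  # only feasible for small p
```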
Sample of theory

Competitive ratios: risk inflation.
Prediction risk: $R(\hat\beta, \beta) = E_\beta |X\beta - X\hat\beta|^2$.
Target risk: $R(\hat\beta) = q\sigma^2$.
The L0-penalized regression is within a log factor of this target.

Theorem (Foster and George, 1994). For any orthogonal $X$ matrix, if $\Pi = 2\log(p)$, then the risk of $\hat\beta_\Pi$ is within a $2\log(p)$ factor of the target.

Complexity:

A success for stepwise regression. Theorem (Natarajan 1995): Stepwise regression will have a prediction accuracy of at most twice optimal using at most $\approx 18\,\|X^{+}\|_2^2\, q$ variables.

L0 regression is hard. Theorem (Zhang, Wainwright, Jordan 2014): There exists a design matrix $X$ such that no polynomial-time algorithm which outputs $q$ variables achieves a risk better than $R(\hat\theta) \gtrsim \gamma^2(X)\,\sigma^2\, q \log(p)$, where $\gamma(X)$ is the restricted eigenvalue (RE), a measure of co-linearity.

L0 regression is VERY hard. Theorem (Foster, Karloff, Thaler 2014): No algorithm exists which achieves all three of the following goals:
1. runs efficiently (i.e. in polynomial time);
2. runs accurately (i.e. risk inflation < p);
3. returns a sparse answer (i.e. $|\hat\beta|_0 \ll p$).

Bibliography: risk inflation and computational issues
- Foster, Dean, and Edward George. "The Risk Inflation Criterion for Multiple Regression." The Annals of Statistics, 22, 1994, 1947-1975.
- Donoho, David L., and Iain M. Johnstone. "Ideal spatial adaptation by wavelet shrinkage." Biometrika (1994): 425-455.
- Natarajan, B. K. (1995). "Sparse Approximate Solutions to Linear Systems." SIAM J. Comput., 24(2): 227-234.
- Zhang, Y., M. J. Wainwright, and M. I. Jordan. "Lower bounds on the performance of polynomial-time algorithms for sparse linear regression." arXiv preprint arXiv:1402.1918, 2014.
- Justin Thaler, Howard Karloff, and Dean Foster. "L-0 regression is hard."
- Moritz Hardt and Jonathan Ullman. "Preventing False Discovery in Interactive Data Analysis is Hard."
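A small simulation (my own, not from the slides) of the Foster-George setting in the simplest orthogonal case $X = I$, where the L0-penalized fit reduces to hard thresholding at $\sigma\sqrt{2\log p}$; it just checks that the simulated risk sits between the target $q\sigma^2$ and $2\log(p)$ times it.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, sigma = 10_000, 20, 1.0
beta = np.zeros(p)
beta[:q] = 10.0                        # q large non-zero coefficients

thresh = sigma * np.sqrt(2 * np.log(p))
risks = []
for _ in range(200):
    y = beta + sigma * rng.normal(size=p)            # orthogonal design: Y = beta + noise
    beta_hat = np.where(np.abs(y) > thresh, y, 0.0)  # L0 fit = hard thresholding here
    risks.append(((beta_hat - beta) ** 2).sum())

print("simulated risk:        ", np.mean(risks))
print("target q*sigma^2:      ", q * sigma ** 2)
print("allowed factor 2log(p):", 2 * np.log(p))
```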
Stepwise regression and beyond

The greedy search for a best model is called stepwise regression.
Bob Stine and I came up with alpha investing: an opportunistic search which doesn't worry about finding the best variable at each step. Try variables sequentially and keep each one if you like it.
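Not the talk's code: a minimal sketch of the greedy forward search, using the earlier penalty $2\sigma^2\log(p)$ as an illustrative stopping rule (that pairing is my assumption).

```python
import numpy as np

def forward_stepwise(X, y, sigma2):
    """Greedy forward selection: add the column that most reduces RSS,
    stop when the reduction no longer pays for the 2*sigma2*log(p) penalty."""
    n, p = X.shape
    selected, rss = [], float((y ** 2).sum())
    while len(selected) < min(n, p):
        best_j, best_rss = None, rss
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            b, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            r = float(((y - X[:, cols] @ b) ** 2).sum())
            if r < best_rss:
                best_j, best_rss = j, r
        # keep the greedy winner only if it beats the per-variable penalty
        if best_j is None or rss - best_rss < 2 * sigma2 * np.log(p):
            break
        selected.append(best_j)
        rss = best_rss
    return selected
```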
Properties of alpha investing

"Provides" mFDR protection (2008): mFDR for streaming feature selection.
Streaming feature selection was introduced in JMLR 2006 (with Zhou, Stine and Ungar). Let $W(j)$ be the "alpha wealth" at time $j$. Then for a series of p-values $p_j$, we can define:
$$ W(j) - W(j-1) = \begin{cases} \omega & \text{if } p_j \le \alpha_j, \\ -\alpha_j/(1-\alpha_j) & \text{if } p_j > \alpha_j. \end{cases} \qquad (1) $$
Theorem (Foster and Stine, 2008, JRSS-B). An alpha-investing rule governed by (1) with initial alpha-wealth $W(0) \le \alpha\eta$ and pay-out $\omega \le \alpha$ controls mFDR$_\eta$ at level $\alpha$.

Can be done really fast (2011): VIF regression.
Theorem (Foster, Dongyu Lin, 2011). VIF regression approximates a streaming feature selection method with speed $O(np)$.
[Figure: VIF speed comparison. Number of candidate variables handled vs. elapsed running time; approximate capacity: vif-regression 100,000; gps 6,000; stepwise 900; lasso 700; foba 600.]

Works well under sub-modularity (2013).
Theorem (Foster, Johnson, Stine, 2013). If the R-squared in a regression is submodular (aka subadditive), then a streaming feature selection algorithm will find an estimator whose out-of-sample risk is within a factor of $e/(e-1)$ of the optimal risk.

But it encourages dynamic variable selection.

Bibliography: streaming feature selection
- Foster, J. Zhou, L. Ungar and R. Stine. "Streaming Feature Selection using alpha investing." KDD 2005.
- Foster and R. Stine. "α-investing: A procedure for Sequential Control of Expected False Discoveries." JRSS-B, 70, 2008, pages 429-444.
- Foster, Dongyu Lin, and Lyle Ungar. "VIF Regression: A Fast Regression Algorithm for Large Data." JASA, 2011.
- Kory Johnson, Bob Stine, Dean Foster. "Submodularity in statistics."

Enter the dragon!
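A minimal sketch of the wealth update in (1). The bidding policy $\alpha_j = W(j-1)/(1+j)$ is my own illustrative choice (the rule leaves the bid schedule to the user); the pay-out $\omega$ and initial wealth follow the theorem's constraints $W(0) \le \alpha\eta$, $\omega \le \alpha$.

```python
def alpha_investing(p_values, alpha=0.05, eta=1.0):
    """Alpha-investing over a stream of p-values, using the wealth update (1)."""
    W = alpha * eta              # initial alpha-wealth W(0)
    omega = alpha                # pay-out earned for each rejection
    rejected = []
    for j, p in enumerate(p_values, start=1):
        if W <= 0:
            break
        alpha_j = W / (1 + j)            # illustrative bid: a fraction of current wealth
        if p <= alpha_j:                 # reject: earn the pay-out omega
            rejected.append(j)
            W += omega
        else:                            # fail to reject: pay alpha_j / (1 - alpha_j)
            W -= alpha_j / (1 - alpha_j)
    return rejected

# e.g. alpha_investing([0.001, 0.3, 0.04, 0.8, 0.0005])
```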
Sequential data collection

Picture = 1000 words. Talking points:
- A picture is worth a 1000 queries. The adage of "always graph your data" counts as doing many queries against the distribution.
- People can pick out several different possible patterns in one glance at a graph.
- Probably not worth 1000, more like 50.

Sequential data collection. Talking points:
- We want to grow the data set as we do more queries.
- It is still cheaper to collectively generate data rather than doing it fresh. In other words, the sample complexity of doing $k$ queries is $O(k)$ if each is done on a separate dataset but only $O(\sqrt{k})$ if each is done on one large dataset. (Thanks Jonathan!)

Biased questions: entropy vs. number of queries. Talking points:
- In variable selection, we mostly have very wide confidence intervals when we fail to reject the null. Can this be used to allow more queries?
- Can the bound be phrased in terms of the entropy of the number of yes/no questions?

Significant digits. Talking points:
- Never quote: "$\hat\beta = 3.2123245386703$".
- All I have had in the past to justify not giving all these extra digits was saying something like, "do you really believe it is ...703 and not ...704?" Now it is a theorem! You are leaking too much information and saying things about the data and not about the population. (Thanks Cynthia!)
- I've argued for using about a 1-SD scale for approximation (based on information theory). I think differential privacy asks for even cruder scales. Can this difference be closed?
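One way to act on the 1-SD reporting scale (purely illustrative; the helper name and the choice of rounding grid are mine, not the talk's): round each coefficient to a grid whose spacing is its standard error.

```python
import numpy as np

def round_to_se(beta_hat, se):
    """Round each estimate to a grid whose spacing is its standard error."""
    beta_hat, se = np.asarray(beta_hat), np.asarray(se)
    return np.round(beta_hat / se) * se

print(round_to_se([3.2123245386703], [0.8]))  # reports 3.2, not 3.2123245386703
```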
Thanks!