

Differentially Private Empirical Risk Minimization with Non-convex Loss Functions
Di Wang, Changyou Chen and Jinhui Xu
State University of New York at Buffalo
International Conference on Machine Learning (ICML) 2019

Outline
Introduction
Problem Description
Result 1
Result 2
Result 3


Empirical Risk Minimization (ERM)
Given: a dataset D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where each (x_i, y_i) ∈ R^d × R is drawn from a distribution P.
Regularization r(·): R^d → R; we use ℓ_2 regularization, r(w) = (λ/2) ‖w‖_2^2.
For a loss function ℓ, the (regularized) empirical risk is
    L̂^r(w; D) = (1/n) Σ_{i=1}^n ℓ(w; x_i, y_i) + r(w),
and the (regularized) population risk is
    L^r_P(w) = E_{(x, y) ∼ P}[ℓ(w; x, y)] + r(w).
Goal: find w that minimizes the empirical or the population risk.
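To make the notation concrete, here is a minimal Python sketch (not from the slides) of the regularized empirical risk for one particular choice of loss; the logistic loss, the synthetic data, and the regularization weight are illustrative assumptions.

```python
# Illustrative sketch: the regularized empirical risk L^r(w; D) for a
# logistic loss with l2 regularization r(w) = (lam/2) * ||w||_2^2.
import numpy as np

def regularized_empirical_risk(w, X, y, lam=0.1):
    """(1/n) * sum_i loss(w; x_i, y_i) + (lam/2) * ||w||^2."""
    margins = y * (X @ w)                    # y_i * <w, x_i>, with y_i in {-1, +1}
    losses = np.log1p(np.exp(-margins))      # logistic loss per example
    return losses.mean() + 0.5 * lam * np.dot(w, w)

# Example usage with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(rng.normal(size=100))
w = np.zeros(5)
print(regularized_empirical_risk(w, X, y))   # log(2) at w = 0
```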

(ε, δ)-Differential Privacy (DP) [Dwork et al., 2006]
Two datasets D and D′ are neighbors if they differ in only one entry, denoted D ∼ D′. A randomized algorithm A is (ε, δ)-differentially private if, for all neighboring datasets D, D′ and all events S in the output space of A,
    Pr(A(D) ∈ S) ≤ e^ε Pr(A(D′) ∈ S) + δ.
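As background on how (ε, δ)-DP is typically achieved (this is the standard Gaussian mechanism, not an algorithm specific to this work), the sketch below perturbs a statistic with noise scaled to its ℓ_2-sensitivity; the released statistic and the parameter values are illustrative assumptions.

```python
# Illustrative background: the Gaussian mechanism releases f(D) plus Gaussian
# noise calibrated to the l2-sensitivity of f, giving (eps, delta)-DP for eps < 1.
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, eps, delta, rng=None):
    rng = rng or np.random.default_rng()
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return value + rng.normal(scale=sigma, size=np.shape(value))

# Example: privately release the mean of n values in [0, 1]
# (changing one entry moves the mean by at most 1/n).
data = np.random.default_rng(1).uniform(size=1000)
private_mean = gaussian_mechanism(data.mean(), l2_sensitivity=1.0 / len(data),
                                  eps=0.5, delta=1e-5)
print(private_mean)
```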

DP-ERM
Determine a sample complexity n = n(1/ε, 1/δ, d, 1/α) such that there is an (ε, δ)-DP algorithm whose output w_priv achieves an α-error in the expected excess empirical risk,
    Err^r_D(w_priv) = E[L̂^r(w_priv; D)] − min_{w ∈ R^d} L̂^r(w; D) ≤ α,
or in the expected excess population risk,
    Err^r_P(w_priv) = E[L^r_P(w_priv)] − min_{w ∈ R^d} L^r_P(w) ≤ α.


Motivation
Previous work on DP-ERM mainly focuses on convex loss functions.
For non-convex loss functions, [Zhang et al., 2017] and [Wang and Xu, 2019] studied the problem and used, as the error measure, the ℓ_2 norm of the gradient at the private estimator, i.e., ‖∇L̂^r(w_priv; D)‖_2 and E_P[‖∇ℓ(w_priv; x, y)‖_2].
Main question: Can the excess empirical (population) risk be used to measure the error for non-convex loss functions in the differential privacy model?



Result 1
Theorem 1. If the loss function is L-Lipschitz, twice differentiable and M-smooth, then by using a private version of Gradient Langevin Dynamics (DP-GLD) the excess empirical (or population) risk is upper bounded by Õ(d log(1/δ) / (ε² log n)).
The proof is based on some recent developments in Bayesian learning and the analysis of GLD. Using a finer analysis of the time-average error of the underlying SDE, we show the following.
Theorem 2. For the excess empirical risk, there is an (ε, δ)-DP algorithm which satisfies
    lim_{T→∞} Err^r_D(w_T) ≤ Õ(C_0(d) log(1/δ) / (n^τ ε^τ)),
where C_0(d) is a function of d and 0 < τ < 1 is some constant.
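Result 1 builds on Gradient Langevin Dynamics. The sketch below shows only the plain, non-private GLD update w_{t+1} = w_t − η ∇L̂(w_t) + √(2η/β) N(0, I); the noise calibration that makes the paper's DP-GLD (ε, δ)-private, and its specific choices of step size, inverse temperature, and iteration count, are omitted, so the values here are illustrative assumptions.

```python
# Rough sketch of the (non-private) Gradient Langevin Dynamics update that
# DP-GLD builds on; the paper's privacy calibration of the noise is omitted.
import numpy as np

def gld(grad_fn, w0, step=1e-3, beta=100.0, iters=1000, rng=None):
    """w_{t+1} = w_t - step * grad(w_t) + sqrt(2 * step / beta) * N(0, I)."""
    rng = rng or np.random.default_rng(0)
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        noise = rng.normal(size=w.shape)
        w = w - step * grad_fn(w) + np.sqrt(2.0 * step / beta) * noise
    return w

# Example: iterates concentrate near the minimizer of a simple quadratic.
print(gld(lambda w: 2.0 * w, w0=np.ones(3)))
```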



Result 2
Are these bounds tight? Based on the exponential mechanism, we have the following.
Empirical risk: For any β < 1, there is an ε-differentially private algorithm whose output w_priv induces an excess empirical risk Err^r_D(w_priv) ≤ Õ(d/(nε)) with probability at least 1 − β.
Population risk: For generalized linear models and robust regression (whose loss functions are ℓ(w; x, y) = (σ(⟨w, x⟩) − y)² and ℓ(w; x, y) = Φ(⟨w, x⟩ − y), respectively), under some reasonable assumptions there is an (ε, δ)-DP algorithm whose excess population risk is upper bounded by
    Err_P(w_priv) ≤ O((d ln(1/δ))^{1/4} / (√n ε)).
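Result 2 is based on the exponential mechanism. The sketch below shows the generic exponential mechanism over a finite candidate set scored by negative empirical loss; the candidate grid, the score function, and the sensitivity bound are illustrative assumptions, not the construction analyzed in the paper.

```python
# Minimal sketch of the exponential mechanism over a finite candidate set:
# sample a candidate with probability proportional to
# exp(eps * score / (2 * sensitivity)). Grid, score, and sensitivity are
# illustrative assumptions.
import numpy as np

def exponential_mechanism(candidates, score_fn, sensitivity, eps, rng=None):
    rng = rng or np.random.default_rng(0)
    scores = np.array([score_fn(c) for c in candidates])
    logits = eps * (scores - scores.max()) / (2.0 * sensitivity)  # shift for stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Example: privately pick a 1-d parameter minimizing squared loss on data in [0, 1]
# (each per-example loss lies in [0, 1], so the mean changes by at most 1/n).
data = np.random.default_rng(2).uniform(size=50)
grid = np.linspace(0.0, 1.0, 101)
score = lambda w: -np.mean((data - w) ** 2)   # higher is better
print(exponential_mechanism(grid, score, sensitivity=1.0 / len(data), eps=1.0))
```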



Finding an Approximate Local Minimum Privately
Finding the global minimum of a non-convex function is challenging!
Recent research on deep learning and other non-convex problems shows that local minima, rather than arbitrary critical points, are sufficient.
But finding local minima is still NP-hard in general.
Fortunately, many non-convex functions are strict saddle. Thus, it is sufficient to find a second-order stationary point (an approximate local minimum).
Definition. w is an α-second-order stationary point (α-SOSP) if
    ‖∇F(w)‖_2 ≤ α and λ_min(∇²F(w)) ≥ −√(ρα).   (1)
Can we find an approximate local minimum that escapes saddle points while keeping the algorithm (ε, δ)-differentially private?
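As a concrete reading of the definition, the sketch below numerically checks condition (1) for a given point: the gradient norm must be at most α and the smallest Hessian eigenvalue at least −√(ρα). The test function, ρ, and α are illustrative assumptions.

```python
# Sketch: numerically check the alpha-SOSP condition (1) for a point w.
# The test function, rho, and alpha below are illustrative assumptions.
import numpy as np

def is_alpha_sosp(grad_fn, hess_fn, w, alpha, rho):
    grad_small = np.linalg.norm(grad_fn(w)) <= alpha
    lam_min = np.linalg.eigvalsh(hess_fn(w)).min()   # smallest Hessian eigenvalue
    almost_convex = lam_min >= -np.sqrt(rho * alpha)
    return grad_small and almost_convex

# Example: f(w) = w0^2 - w1^2 has a saddle at the origin, so the origin is a
# critical point but fails the second-order condition for small alpha.
grad = lambda w: np.array([2.0 * w[0], -2.0 * w[1]])
hess = lambda w: np.diag([2.0, -2.0])
print(is_alpha_sosp(grad, hess, np.zeros(2), alpha=0.01, rho=1.0))   # False
```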


Result 3
On one hand, (Ge et al., 2015) proposed an algorithm, noisy stochastic gradient descent, to find approximate local minima.
On the other hand, in the DP community a popular method for ERM is DP-SGD, which adds Gaussian noise in each iteration.
Using DP-GD, we can show:
Theorem 4. If the data size n is large enough that
    n ≥ Ω̃( √(d log(1/δ)) log(1/ζ) / (ε α²) ),   (2)
then with probability 1 − ζ, one of the outputs is an α-SOSP of the empirical risk L̂(·; D).
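To illustrate the flavor of gradient perturbation behind DP-GD and DP-SGD, the sketch below clips per-sample gradients, averages them, adds Gaussian noise, and takes a step. The clipping bound and noise scale are illustrative assumptions, not calibrated to a specific (ε, δ), and the sketch omits the paper's mechanism for escaping saddle points.

```python
# Minimal sketch of gradient perturbation (DP-GD flavor): at every iteration,
# clip per-sample gradients, average, add Gaussian noise, and take a step.
# Clipping bound and noise scale are illustrative, NOT calibrated to (eps, delta).
import numpy as np

def noisy_gd(per_sample_grad_fn, X, y, w0, step=0.1, clip=1.0, sigma=0.5,
             iters=200, rng=None):
    rng = rng or np.random.default_rng(0)
    w = np.array(w0, dtype=float)
    n = len(X)
    for _ in range(iters):
        grads = np.array([per_sample_grad_fn(w, X[i], y[i]) for i in range(n)])
        norms = np.maximum(np.linalg.norm(grads, axis=1, keepdims=True) / clip, 1.0)
        avg_grad = (grads / norms).mean(axis=0)          # clipped, averaged gradient
        noise = rng.normal(scale=sigma * clip / n, size=w.shape)
        w = w - step * (avg_grad + noise)
    return w

# Example: least squares with per-sample gradient of (x.w - y)^2 / 2.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
print(noisy_gd(lambda w, x, yi: (x @ w - yi) * x, X, y, w0=np.zeros(3)))
```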

Thank you!
