Differentially-Private Deep Learning from an Optimization Perspective
Presenter: Liyao Xiang
Shanghai Jiao Tong University
4/30/2019
Privacy Threat
‣ Personal information in the big data era
‣ Is anonymization sufficient to protect user privacy?
‣ Netflix recommendation challenge: personal identity information was removed and names were replaced with random numbers
‣ The Netflix database was de-anonymized using public information on IMDb
‣ De-anonymization works even on partial, distorted, or wrong data!
Side Information
[Figure labeled "newcomer"]
Side information hurts privacy!
Differential Privacy
Constraint on adjacent inputs $I$ and $I'$: $P[M(I) \in O] \le e^{\epsilon} P[M(I') \in O]$
[Figure: $P[M(7) \in O]$ and $P[M(8) \in O]$ plotted against the output $O$]
A smaller $\epsilon$ indicates higher privacy.
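A minimal numerical sketch of the constraint, assuming the Laplace mechanism (a standard ε-differentially-private mechanism chosen here only for illustration; the slide does not name a mechanism) applied to the adjacent inputs 7 and 8. The function names and the output grid are hypothetical.

```python
import numpy as np

def laplace_log_density(o, mean, scale):
    # Log-density of a Laplace distribution centred at `mean`.
    return -np.abs(o - mean) / scale - np.log(2.0 * scale)

def max_abs_log_ratio(i, i_adj, epsilon, sensitivity=1.0):
    # Assumed mechanism: M(I) = I + Laplace noise with scale sensitivity / epsilon.
    scale = sensitivity / epsilon
    outputs = np.linspace(-50.0, 50.0, 20001)  # grid of candidate outputs O
    log_ratio = (laplace_log_density(outputs, i, scale)
                 - laplace_log_density(outputs, i_adj, scale))
    return float(np.max(np.abs(log_ratio)))

epsilon = 0.5
worst_case = max_abs_log_ratio(7.0, 8.0, epsilon)
# The density ratio never exceeds e^epsilon in either direction.
print(worst_case <= epsilon + 1e-9)
```

Shrinking `epsilon` widens the noise and pulls the two output distributions closer together, which is the sense in which a smaller ε means higher privacy.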
Deep Learning with Differential Privacy
[Figure: perturbation turns the model θ = (θ_1, …, θ_n) into the differentially-private model ϑ = (ϑ_1, …, ϑ_n)]
Differentially-private stochastic gradient descent (DPSGD): add noise to the gradient $g_t$ in each update iteration.
Deep Learning with Differential Privacy
The recent work [Abadi et al., CCS '16] achieves only ~90% accuracy on MNIST, whereas training without privacy reaches over 99%. The result of [Shokri et al., CCS '15] is even worse.
[Figure: privacy versus accuracy]
[Figure: θ is perturbed into ϑ with more noise or less noise; does a higher privacy level mean lower accuracy?]
In previous works, the link between inserted noise and accuracy is broken.
Model Sensitivity
[Figure: the same amount of total noise added to θ yields models ϑ with different accuracy levels, 85% and 90%]
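Example
Adding noise to different parameters leads to different cost!
[Figure: a network with inputs X1 and X2, hidden units h1 and h2, output Y, biases b_1 to b_3, and weights θ_1 to θ_6; noise is shown on b_3 and on θ_6]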
Optimized Additive Noise Scheme
‣ Model sensitivity $w = (w_1, w_2, \ldots, w_d) \in \mathbb{R}^d$: the derivative vector of the cost over all training examples w.r.t. all parameters
‣ To keep the cost minimal, noise should be added along the least sensitive direction of the cost function
‣ Seek a probability distribution of the noise that minimizes the cost while meeting the differential privacy constraint!
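Optimized Additive Noise Scheme
Objective: $\underset{P}{\text{minimize}} \int_{z_d} \cdots \int_{z_1} \langle w, z \rangle \, P(\mathrm{d}z_1 \ldots \mathrm{d}z_d)$, where $w$ is the model sensitivity, $z$ is the additive noise, and $P$ is the distribution of the noise.
‣ The cost increases as $\theta_i$ increases ⇒ $w_i = \partial C / \partial \theta_i > 0$, which pushes $z_i$ to a direction where $z_i < 0$
‣ $\partial C / \partial \theta_i > \partial C / \partial \theta_j > 0$: the cost is more sensitive to changes of $\theta_i$ than of $\theta_j$ ⇒ less noise should be added to $\theta_i$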
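A small sketch of why the direction of the noise matters, assuming a toy two-parameter quadratic cost (hypothetical, not the cost used in the talk). It estimates the sensitivity vector w by central differences and compares the cost increase caused by noise of equal norm along a sensitive and an insensitive coordinate.

```python
import numpy as np

def toy_cost(theta):
    # Hypothetical cost: steep in theta[0], shallow in theta[1].
    return theta[0] ** 2 + 0.05 * theta[1] ** 2

def sensitivity(cost, theta, h=1e-5):
    # Central-difference estimate of w_i = dC / d theta_i.
    w = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        w[i] = (cost(theta + step) - cost(theta - step)) / (2.0 * h)
    return w

theta = np.array([1.0, 1.0])
w = sensitivity(toy_cost, theta)

# Two noise vectors with the same L2 norm but different directions.
z_sensitive = np.array([0.01, 0.0])
z_insensitive = np.array([0.0, 0.01])

# First-order cost change <w, z>, the integrand of the objective.
first_order_sensitive = float(np.dot(w, z_sensitive))
first_order_insensitive = float(np.dot(w, z_insensitive))

# Actual cost change, for comparison with the first-order estimate.
actual_sensitive = toy_cost(theta + z_sensitive) - toy_cost(theta)
actual_insensitive = toy_cost(theta + z_insensitive) - toy_cost(theta)
print(first_order_sensitive, first_order_insensitive)
```

With these assumed values the same amount of noise costs roughly twenty times more along the sensitive coordinate, which is the intuition behind adding less noise to $\theta_i$ when $\partial C / \partial \theta_i$ is larger.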
Optimized Additive Noise Scheme
Constraint
Global sensitivity on adjacent inputs (the L2-norm between the gradients): $\alpha = \sup_{\forall X, X' \text{ s.t. } d(X, X') = 1} \| g_t - g'_t \|$, where the training datasets $X$ and $X'$ differ by a single instance.
$\Pr[M(g_t) \in O] \le e^{\epsilon} \Pr[M(g'_t) \in O]$
$\Rightarrow \Pr[g_t + z \in O] \le e^{\epsilon} \Pr[g'_t + z \in O]$
$\Rightarrow \Pr[z \in O - g_t] \le e^{\epsilon} \Pr[z \in O - g'_t]$
$\Rightarrow \Pr[z \in O'] \le e^{\epsilon} \Pr[z \in O' + g_t - g'_t]$, where $O' = O - g_t$
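Optimized Additive Noise Scheme
$\underset{P}{\text{minimize}} \int_{z_d} \cdots \int_{z_1} \langle w, z \rangle \, P(\mathrm{d}z_1 \ldots \mathrm{d}z_d)$
s.t. $\Pr[z \in O'] \le e^{\epsilon} \Pr[z \in O' + \Delta], \quad \forall O' \subseteq \mathbb{R}^d, \ \|\Delta\| \le \alpha$
Rewritten in terms of the noise density $p$:
$\underset{p}{\text{minimize}} \int_{z \in \mathbb{R}^d} \| w \circ z \|_1 \, p(z) \, \mathrm{d}z$
s.t. $\ln \frac{p(z)}{p(z + \Delta)} \le \epsilon, \quad \forall \|\Delta\| \le \alpha, \ \Delta \in \mathbb{R}^d$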
Composition
‣ So far, we have only shown how to provide a privacy guarantee for a single update iteration
‣ In practice, SGD takes many iterations until convergence
‣ Iterative computation exposes the training data multiple times, degrading the privacy level!
‣ Our solution: advanced composition theorem for differential privacy + privacy amplification by sampling
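A hedged sketch of how the two tools combine, using the textbook statements of both results rather than the exact accounting used in the talk. Amplification by sampling with ratio q turns an (ε, δ) guarantee into (ln(1 + q(e^ε − 1)), qδ), and advanced composition bounds k adaptive (ε, δ) steps by (ε√(2k ln(1/δ′)) + kε(e^ε − 1), kδ + δ′). The function names and the bisection helper are hypothetical.

```python
import math

def amplify_by_sampling(epsilon, delta, q):
    # Guarantee of an (epsilon, delta)-DP step run on a random q-fraction of the data.
    return math.log(1.0 + q * (math.exp(epsilon) - 1.0)), q * delta

def advanced_composition(epsilon, delta, k, delta_slack):
    # Total guarantee after k adaptive (epsilon, delta)-DP steps.
    eps_total = (math.sqrt(2.0 * k * math.log(1.0 / delta_slack)) * epsilon
                 + k * epsilon * (math.exp(epsilon) - 1.0))
    return eps_total, k * delta + delta_slack

def per_iteration_epsilon(target_epsilon, k, delta_slack, steps=100):
    # Bisection for the largest per-step epsilon whose composed total stays
    # within target_epsilon (assumes per-step delta = 0 for simplicity).
    lo, hi = 0.0, target_epsilon
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if advanced_composition(mid, 0.0, k, delta_slack)[0] <= target_epsilon:
            lo = mid
        else:
            hi = mid
    return lo

eps_step = per_iteration_epsilon(target_epsilon=1.0, k=2000, delta_slack=1e-5)
eps_total, delta_total = advanced_composition(eps_step, 0.0, 2000, 1e-5)
print(eps_step, eps_total, delta_total)
```

The bisection returns the lower bracket, so the composed total never exceeds the target budget.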
Optimized Additive Noise Mechanism
1. Compute per-iteration privacy parameters according to the composition theorem
2. For each iteration:
   1. Compute the model sensitivity w
   2. Solve the optimization problem to find the noise distribution
   3. Sample a noise
   4. For each batch of training data: compute the gradient and clip it by the global sensitivity
   5. Compute the average gradient for the batch
   6. Add noise to the average gradient
   7. Update the model parameters
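The clip, average, perturb, and update steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the noise is drawn from a Gaussian as a stand-in, because the optimized noise distribution comes from solving the optimization problem, which is not reproduced here. The function names, `clip_norm`, `noise_std`, and `lr` are hypothetical.

```python
import numpy as np

def clip_by_l2_norm(per_example_grads, clip_norm):
    # Scale each row so its L2 norm is at most clip_norm (the global sensitivity bound).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return per_example_grads * factors

def noisy_sgd_step(theta, per_example_grads, clip_norm, noise_std, lr, rng):
    clipped = clip_by_l2_norm(per_example_grads, clip_norm)
    avg_grad = clipped.mean(axis=0)
    # Stand-in noise: Gaussian. The scheme in the slides would instead sample
    # from the distribution found by the optimization step.
    noise = rng.normal(0.0, noise_std, size=theta.shape)
    return theta - lr * (avg_grad + noise)

rng = np.random.default_rng(0)
theta = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # toy per-example gradients
theta_next = noisy_sgd_step(theta, grads, clip_norm=1.0, noise_std=0.1, lr=0.1, rng=rng)
print(theta_next)
```

Clipping before averaging is what bounds the influence of any single training example, so the noise scale can be calibrated to `clip_norm` rather than to the unbounded raw gradients.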
Implementation
Implemented the optimized noise generator (ours) and the Gaussian noise generator (the state of the art, Abadi et al.) on Keras and TensorFlow
Problem: computational challenges due to high dimensionality
‣ Solve the optimization problem using GPU operations
‣ NumPy noise generator
[Figure: accuracy (%) on MNIST and CIFAR-10, plotted against iteration no. and against ε. G: Gaussian noise; O: optimized noise (ours). Legend entries: G and O with δ = 1e-5 and δ = 1e-4; G and O with (ε = 0.3, δ = 1e-5) and (ε = 1, δ = 1e-5); unperturbed.]
Our scheme achieves higher accuracy than [Abadi et al., CCS '16] under the same privacy guarantee.
Thank you!