  1. Guarding User Privacy with Federated Learning and Differential Privacy. Brendan McMahan, mcmahan@google.com. DIMACS/Northeast Big Data Hub Workshop on Overcoming Barriers to Data Sharing including Privacy and Fairness, 2017.10.24.

  2. Our Goal (Federated Learning): imbue mobile devices with state-of-the-art machine learning systems, without centralizing data, and with privacy by default.

  3. A very personal computer. 2015: 79% of users are away from their phone ≤2 hours/day; 63% ≤1 hour/day; 25% can't remember being away from it at all (2015 Always Connected Research Report, IDC and Facebook). 2013: 72% of users are within 5 feet of their phone most of the time (2013 Mobile Consumer Habits Study, Jumio and Harris Interactive). A plethora of sensors; innumerable digital interactions.

  4. Deep Learning: non-convex; millions of parameters; complex structure (e.g., LSTMs).

  5. A distributed learning problem: horizontally partitioned data. Nodes: millions to billions. Dimensions: thousands to millions. Examples: millions to billions.

  6. Federated: decentralization with a facilitator.

  7. Deep Learning, the short short version. [Figure: a small neural network deciding "Is it 5?" for a handwritten digit.] f(input, parameters) = output

  8. Deep Learning, the short short version, continued. f(input, parameters) = output. loss(parameters) = (1/n) Σ_i difference(f(input_i, parameters), desired_i)

  9. Training: adjust the parameters to minimize the loss. loss(parameters) = (1/n) Σ_i difference(f(input_i, parameters), desired_i)

  10. Stochastic Gradient Descent. Stochastic: choose a random subset of the training data. Gradient: compute the "down" direction on the loss function. Descent: take a step in that direction. (Rinse & repeat.)
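
In code, that loop is short. A minimal sketch in Python/NumPy (illustrative only, not from the talk; a one-parameter linear model with a squared-error "difference" stands in for the deep network):

    import numpy as np

    def sgd(grad_fn, params, data, lr=0.1, batch_size=32, steps=1000):
        rng = np.random.default_rng(0)
        for _ in range(steps):
            # Stochastic: choose a random subset of the training data.
            batch = data[rng.choice(len(data), size=batch_size, replace=False)]
            # Gradient: compute the "down" direction on the loss function.
            g = grad_fn(params, batch)
            # Descent: take a step in that direction. (Rinse & repeat.)
            params = params - lr * g
        return params

    # Toy usage: fit y = w*x with squared error; the true w is 3.0.
    xs = np.linspace(0.1, 1.0, 100)
    data = np.stack([xs, 3.0 * xs], axis=1)               # columns: input_i, desired_i
    grad = lambda w, b: np.mean(2 * (w * b[:, 0] - b[:, 1]) * b[:, 0])
    w = sgd(grad, params=0.0, data=data)                  # converges to w ~ 3.0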

  11. Cloud-centric ML for Mobile

  12. The model lives in the cloud (the current model parameters are held server-side).

  13. We train models in the cloud, on centrally gathered training data.

  14. [Diagram: a mobile device joins the picture, alongside the cloud's current model parameters.]

  15. Make predictions in the cloud: the device sends a prediction request to the server.

  16. Gather training data in the cloud, alongside the prediction requests.

  17. And use that training data to make the models better.

  18. On-Device Predictions (Inference)

  19. Instead of making predictions in the cloud ...

  20. Distribute the model, make predictions on device.

  21. 1: On-Device Inference.
      User advantages: low latency; longer battery life; less wireless data transfer; better offline experience; less data sent to the cloud.
      Developer advantages: data is already localized; new product opportunities.
      World advantages: raise privacy expectations for the industry.

  22. On-Device Training: bringing model training onto mobile devices.
      User advantages: low latency; longer battery life; less wireless data transfer; better offline experience; less data sent to the cloud (training data stays on device).
      Developer advantages: data is already localized; new product opportunities; straightforward personalization; simple access to rich user context.
      World advantages: raise privacy expectations for the industry.

  23. 2: Federated Learning. On-device training, with the same user, developer, and world advantages as above.

  24. Federated Learning

  25. Federated Learning is the problem of training a shared global model, under the coordination of a central server, from a federation of participating devices which maintain control of their own data.

  26. [Diagram: Mobile Device with Local Training Data and a Data Provider; Cloud Service with Current Model Parameters.]

  27. Many devices will be offline at any given time.

  28. 1. The server selects a sample of, e.g., 100 online devices.

  30. 2. Selected devices download the current model parameters.

  31. 3. Users compute an update using their local training data.

  32. 4. The server aggregates users' updates into a new model. Repeat until convergence.

  33. Applications of federated learning.
      What makes a good application? On-device data is more relevant than server-side proxy data; on-device data is privacy sensitive or large; labels can be inferred naturally from user interaction.
      Example applications: language modeling (e.g., next-word prediction) for mobile keyboards; image classification for predicting which photos people will share; ...

  34. Challenges of Federated Learning ... or, why this isn't just "standard" distributed optimization.
      Massively distributed: training data is stored across a very large number of devices.
      Limited communication: only a handful of rounds of unreliable communication with each device.
      Unbalanced data: some devices have few examples, some have orders of magnitude more.
      Highly non-IID data: data on each device reflects one individual's usage pattern.
      Unreliable compute nodes: devices go offline unexpectedly; expect faults and adversaries.
      Dynamic data availability: the subset of data available is non-constant, e.g. time-of-day vs. country.

  35. The Federated Averaging algorithm.
      Server, until converged: 1. Select a random subset (e.g. 100) of the (online) clients. 2. In parallel, send the current parameters θ_t to those clients. 3. Set θ_{t+1} = θ_t + the data-weighted average of the client updates.
      Selected client k: 1. Receive θ_t from the server. 2. Run some number of minibatch SGD steps, producing θ'. 3. Return θ' - θ_t to the server.
      H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017.
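
A compact NumPy sketch of the algorithm as stated on the slide (hedged: grad, the learning rate, the step counts, and the sequential stand-in for "in parallel" are illustrative assumptions; theta is an ndarray of model parameters, and grad(params, batch) is the model's minibatch loss gradient, as in the SGD sketch above):

    import numpy as np

    def client_update(theta, local_data, grad, lr=0.1, local_steps=10, batch_size=32):
        # Selected client k: receive theta_t, run minibatch SGD, return theta' - theta_t.
        rng = np.random.default_rng()
        theta_prime = theta.copy()
        for _ in range(local_steps):
            idx = rng.choice(len(local_data), size=min(batch_size, len(local_data)), replace=False)
            theta_prime = theta_prime - lr * grad(theta_prime, local_data[idx])
        return theta_prime - theta                      # only the update leaves the device

    def federated_averaging(theta, clients, grad, rounds=100, clients_per_round=100):
        # Server loop; a fixed round count stands in for "until converged".
        rng = np.random.default_rng(0)
        for _ in range(rounds):
            # 1. Select a random subset of the (online) clients.
            sample = rng.choice(len(clients), size=min(clients_per_round, len(clients)), replace=False)
            # 2. Send theta_t to those clients; each returns its update.
            updates = [client_update(theta, clients[k], grad) for k in sample]
            weights = [len(clients[k]) for k in sample]  # weight by amount of local data
            # 3. theta_{t+1} = theta_t + data-weighted average of client updates.
            theta = theta + np.average(updates, axis=0, weights=weights)
        return theta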

  36. Large-scale LSTM for next-word prediction.
      Rounds to reach 10.5% accuracy: FedSGD: 820; FedAvg: 35. A 23x decrease in communication rounds.
      Model details: 1.35M parameters; 10K-word dictionary; embeddings ∈ ℝ^96; state ∈ ℝ^256; corpus: Reddit posts, partitioned by author.

  37. CIFAR-10 convolutional model.
      Updates to reach 82% accuracy: SGD: 31,000; FedSGD: 6,600; FedAvg: 630. A 49x decrease in communication (updates) vs. SGD (IID and balanced data).

  38. Federated Learning & Privacy

  39. Recall step 4 of federated learning: the server aggregates users' updates into a new model, and the process repeats until convergence.

  40. Might these updates contain privacy-sensitive data?

  42. 1. The updates are ephemeral.

  43. 2. The updates are focused. Improve privacy & security by minimizing the "attack surface".

  44. 3. The updates are only seen in aggregate.

  45. Wouldn't it be even better if ... Google aggregated users' updates, but could not inspect the individual updates?

  46. A novel, practical protocol makes this possible: Google aggregates users' updates, but cannot inspect the individual updates. K. Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
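
The core idea of that protocol, in a deliberately oversimplified sketch: each pair of clients derives a shared random mask; one adds it and the other subtracts it, so every individual upload looks random but the masks cancel in the server's sum. (The actual CCS 2017 protocol adds key agreement, secret sharing to tolerate dropouts, and finite-field arithmetic; seed_for_pair below is a stand-in for the pairwise shared secret.)

    import numpy as np

    def masked_upload(k, update, num_clients, seed_for_pair):
        # For each pair (i, j), both clients derive the same mask from their
        # shared seed; the lower-indexed client adds it, the higher-indexed
        # client subtracts it, so the masks cancel in the server's sum.
        masked = update.astype(float)
        for j in range(num_clients):
            if j == k:
                continue
            mask = np.random.default_rng(seed_for_pair(min(k, j), max(k, j))).normal(size=update.shape)
            masked += mask if k < j else -mask
        return masked

    # Toy demo: 3 clients, 4-dimensional updates.
    updates = [np.full(4, float(k)) for k in range(3)]
    seed_for_pair = lambda i, j: 1000 * i + j     # stand-in for a shared pairwise secret
    uploads = [masked_upload(k, u, 3, seed_for_pair) for k, u in enumerate(updates)]
    print(np.sum(uploads, axis=0))                # ~[3. 3. 3. 3.]: the raw sum, masks cancel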

  47. Might the final model memorize a user's data? 1. Ephemeral. 2. Focused. 3. Only in aggregate. 4. Differentially private.

  48. Differential Privacy.

  49. Differential Privacy with a trusted aggregator: the server adds noise to the aggregated updates (∑ + noise).

  50. Recall the Federated Averaging algorithm.
      Server, until converged: 1. Select a random subset (e.g. C=100) of the (online) clients. 2. In parallel, send the current parameters θ_t to those clients. 3. Set θ_{t+1} = θ_t + the data-weighted average of the client updates.
      Selected client k: 1. Receive θ_t from the server. 2. Run some number of minibatch SGD steps, producing θ'. 3. Return θ' - θ_t to the server.
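
For the trusted-aggregator setting, a hedged sketch of the differentially private aggregation step, following the standard recipe of clipping each client update to bound any one user's influence and then adding Gaussian noise to the average (the function name and parameter values are illustrative, not from the talk):

    import numpy as np

    def dp_average(updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
        # Clip: rescale each update so its L2 norm is at most clip_norm,
        # bounding the sensitivity of the average to any single user.
        clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12)) for u in updates]
        avg = np.mean(clipped, axis=0)
        # Noise: Gaussian noise scaled to the per-user sensitivity (clip_norm / n).
        sigma = noise_multiplier * clip_norm / len(updates)
        return avg + np.random.default_rng(seed).normal(scale=sigma, size=avg.shape)

    # The server step 3 then becomes: theta_{t+1} = theta_t + dp_average(updates)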
