

  1. Privacy & Security in Machine Learning / Optimization Nitin Vaidya University of Illinois at Urbana-Champaign disc.ece.illinois.edu

  2. Research Interests  Distributed algorithms  Distributed shared memory systems  Distributed computations over wireless networks  Distributed optimization

  3. Privacy and Security for Machine Learning / Optimization

  4. Privacy and Security for Machine Learning / Optimization

  5. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  6. Example – Image Classification CIFAR-10 dataset

  7. Deep Neural Networks  [diagram: inputs x1, x2, x3 feed layer-1 neurons 1-3, producing activations a1, a2, a3, a4; edge weights labeled W111 and W132]

  8. [diagram: inputs x1, x2, x3 pass through layer-1 parameters (W111, …, W132) to produce output scores: dog 0.8, cat 0.1, ship 0.09, car 0.01]

  9. Deep Neural Networks  [diagram: inputs x1, x2, x3 and layer-1 activations a1-a4, with weight W132 highlighted]

  10. [diagram: neuron 3 in layer 1 combines inputs x2 and x3 with weights W131, W132 and bias b13 to compute σ(x2 W131 + x3 W132 + b13)]

  11. [plot: activation σ(z) vs. z]  The activation σ is the Rectified Linear Unit, σ(z) = max(0, z), applied to z = x2 W131 + x3 W132 + b13
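
A minimal sketch (not from the slides) of the single-neuron computation above, using the slide's weight names W131, W132 and bias b13 as plain Python variables with made-up values; σ is the ReLU from slide 11.

```python
import numpy as np

def relu(z):
    # Rectified Linear Unit: sigma(z) = max(0, z)
    return np.maximum(0.0, z)

# Inputs x2, x3 and the weights/bias named on the slide (values are made up).
x2, x3 = 0.5, -1.2
W131, W132, b13 = 0.8, 0.3, 0.1

# Neuron output a = sigma(x2*W131 + x3*W132 + b13)
z = x2 * W131 + x3 * W132 + b13
print(relu(z))  # prints ~0.14
```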

  12. How to train your network  Given a machine structure  Parameters are the only free variables  Choose parameters that maximize accuracy

  13. How to train your network  Given a machine structure  Parameters are the only free variables  Choose parameters to maximize accuracy  Optimize a suitably defined cost function h(w) to find the right parameter vector w

  14. How to train your network  [diagram: the network with its parameters w highlighted]  Optimize a suitably defined cost function h(w) to find the right parameter vector w

  15. Cost Function h(w)  Consider input x  True classification y(x)  Machine classification a(x,w) using parameters w  Cost for input x = ||y(x) - a(x,w)||²  Total cost h(w) = Σ_x ||y(x) - a(x,w)||²
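
A hedged sketch of the total cost on slide 15, squared error summed over training inputs; the linear "machine" a(x, w) = Wx and the tiny dataset here are placeholders, not the network from the slides.

```python
import numpy as np

def machine(x, W):
    # Placeholder classifier a(x, w); the slides use a deep network instead.
    return W @ x

def total_cost(X, Y, W):
    # h(w) = sum over inputs x of || y(x) - a(x, w) ||^2
    return sum(np.sum((y - machine(x, W)) ** 2) for x, y in zip(X, Y))

# Tiny made-up dataset: inputs with their true label vectors y(x).
X = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
Y = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
W = np.eye(2)

print(total_cost(X, Y, W))  # 0.5 for this toy data
```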

  16. Convex Optimization  [plot: a convex cost surface h(W) over coordinates w131, w132, with W = (w131, w132, …); image credit: Wikipedia]

  17. Convex Optimization  [plot: starting point W[0] on the cost surface, W = (w131, w132, …); first iterate W[1] = (4, 3, ...)]

  18. Convex Optimization  [plot: iterates W[0], W[1] on the cost surface; W[1] = (4, 3, ...), W[2] = (3, 2, …)]

  19. Convex Optimization  [plot: iterates W[0], W[1], W[2] descending on the cost surface]

  20. Convex Optimization  [plot: iterates W[0], W[1], W[2], W[3] descending toward the minimum]
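
The iterates W[0], W[1], W[2], … pictured on slides 16-20 come from gradient descent, W[k+1] = W[k] - 𝛃 ∇h(W[k]). A minimal sketch, assuming a simple convex quadratic h(W) = ||W||² (the slides do not specify h) and treating the slide's W[1] = (4, 3, ...) as just an illustrative starting point:

```python
import numpy as np

def grad_h(W):
    # Gradient of the assumed cost h(W) = ||W||^2, i.e. 2W.
    return 2.0 * W

W = np.array([4.0, 3.0])   # illustrative starting iterate
beta = 0.25                # step size

for k in range(5):
    # W[k+1] = W[k] - beta * grad h(W[k])
    W = W - beta * grad_h(W)
    print(k + 1, W)        # iterates shrink toward the minimizer W = 0
```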

  21. So far …  [diagram: training data → optimize cost function h(w) → machine parameters w]

  22. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  23. Distributed Machine Learning  Data is distributed across different agents  Mobile users  Hospitals  Competing vendors

  24. Distributed Machine Learning  Data is distributed across different agents  Mobile users  Hospitals  Competing vendors  [diagram: Agent 1, Agent 2, Agent 3, Agent 4]

  25. Distributed Machine Learning  Data is distributed across different agents  Collaborate to learn

  26. Distributed Machine Learning  Data is distributed across different agents  Collaborate to learn  [diagram: agents with local costs h1(w), h2(w), h3(w), h4(w); training → optimize cost function Σi hi(w) → machine parameters w]

  27. Distributed Optimization  30+ years of work  Recent interest due to machine learning applications

  28. Distributed Optimization  Different architectures  Peer-to-peer  [diagram: agents with local costs h1(w), h2(w), h3(w)]

  29. Distributed Optimization  Different architectures  Peer-to-peer  [diagram: agents with h1(w), h2(w), h3(w)]  Parameter server  [diagram: parameter server connected to agents with h1(w), h2(w), h3(w)]

  30. Distributed Gradient Method  [diagram: agents with local costs h1(w), h2(w), h3(w) hold iterates W1[0], W2[0], W3[0]]

  31. Distributed Gradient Method  [diagram: agents exchange their current iterates W1[0], W2[0], W3[0] with neighbors]

  32. Distributed Gradient Method  [diagram: agent 3 forms the weighted average T = ½ W3[0] + ¼ W1[0] + ¼ W2[0] of its own and received iterates]

  33. Distributed Gradient Method  [diagram: as above, T = ½ W3[0] + ¼ W1[0] + ¼ W2[0]]  Agent 3 then takes a local gradient step: W3[1] = T - 𝛃 ∇h3(T)

  34. Works in incomplete networks too!  [diagram: the same update on a network where not all agents are directly connected: T = ½ W3[0] + ¼ W1[0] + ¼ W2[0], W3[1] = T - 𝛃 ∇h3(T)]
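
A hedged sketch of one iteration of the distributed gradient method above, from agent 3's point of view: average the received iterates using the slide's weights (½, ¼, ¼), then take a step along the local gradient. The quadratic local costs hi(w) = ||w - ci||² are made-up stand-ins for each agent's data.

```python
import numpy as np

beta = 0.1  # step size

# Made-up local costs h_i(w) = ||w - c_i||^2, so grad h_i(w) = 2 (w - c_i).
centers = {1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0]), 3: np.array([1.0, 1.0])}
def grad_h(i, w):
    return 2.0 * (w - centers[i])

# Current iterates W_i[0] at the three agents (arbitrary starting points).
W = {1: np.array([4.0, 3.0]), 2: np.array([2.0, 2.0]), 3: np.array([0.0, 0.0])}

# Agent 3: weighted average of its own iterate and the ones it received ...
T = 0.5 * W[3] + 0.25 * W[1] + 0.25 * W[2]
# ... followed by a local gradient step: W_3[1] = T - beta * grad h_3(T)
W3_next = T - beta * grad_h(3, T)
print(W3_next)  # [1.4 1.2]
```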

  35. Parameter Server Architecture  [diagram: agents with h1(w), h2(w), h3(w) send gradients ∇h1(W[0]), ∇h2(W[0]), … to the parameter server, which holds W[0]]  Server update: W[1] = W[0] - 𝛃 ∑i ∇hi(W[0])
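
A minimal sketch of the parameter-server update on slide 35: each agent computes its local gradient at the current W and the server applies W[k+1] = W[k] - 𝛃 ∑i ∇hi(W[k]). The quadratic local costs are again assumed placeholders, not the slides' network cost.

```python
import numpy as np

beta = 0.1
# Assumed local costs h_i(w) = ||w - c_i||^2 held by the three agents.
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]

W = np.array([4.0, 3.0])  # W[0] at the parameter server

for k in range(3):
    # Each agent i evaluates grad h_i(W[k]) on its own data and sends it up.
    grads = [2.0 * (W - c) for c in centers]
    # Server update: W[k+1] = W[k] - beta * sum_i grad h_i(W[k])
    W = W - beta * sum(grads)
    print(k + 1, W)  # moves toward the minimizer of sum_i h_i (the mean of the c_i)
```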

  36. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  37. Privacy Challenge  Peers may learn each other’s data  Parameter server may learn data  [diagrams: peer-to-peer and parameter-server architectures with local costs h1(w), h2(w), h3(w)]

  38. Privacy-Preserving Optimization  Can agents collaboratively learn, and yet protect their own data?  Optimize cost function Σi hi(w)

  39. Peer-to-Peer Architecture  [diagram: agents with h1(w), h2(w), h3(w) exchange their iterates W1[0], W2[0], W3[0]]

  40. Add Inter-Dependent Noise  [diagram: agents transmit perturbed iterates W1[0]+n1, W2[0]+n2, W3[0]+n3 instead of W1[0], W2[0], W3[0]]

  41. Add Inter-Dependent Noise  [diagram: as above, with the noise terms satisfying n1 + n2 + n3 = 0]

  42. Key Idea  Add correlated noise to the information exchanged between agents  The noise “cancels” over the network  But it can prevent a coalition of bad agents from learning information about others
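
A toy sketch of the zero-sum noise structure: each agent's transmitted iterate is perturbed, the perturbations sum to zero, so an aggregate over all agents is unaffected while any single message only reveals a noisy value. This only illustrates the cancellation, not the actual protocol or its privacy guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

# True iterates W_i[0] held by three agents (made-up values).
W = np.array([[4.0, 3.0], [2.0, 2.0], [0.0, 0.0]])

# Noise n_1, n_2, n_3 with n_1 + n_2 + n_3 = 0: draw the first two freely,
# set the last to cancel them.
n = rng.normal(size=(3, 2))
n[2] = -(n[0] + n[1])

shared = W + n                # what the agents actually transmit
print(shared.mean(axis=0))    # equals W.mean(axis=0): the noise cancels
print(W.mean(axis=0))
```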

  43. Privacy-Preserving Optimization  Can agents collaboratively learn, and yet protect their own data?  Yes!*  Optimize cost function Σi hi(w)  (* conditions apply)

  44. Privacy-Preserving Optimization  Can agents collaboratively learn, and yet protect their own data?  Yes!*  Optimize cost function Σi hi(w)  (* conditions apply)

  45. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  46. Adversarial Agents  Adversarial agents may send bogus information  Learned parameters are impacted  [diagrams: peer-to-peer and parameter-server architectures with h1(w), h2(w), h3(w)]

  47. Adversarial Agents  Can good agents learn despite bad agents?  [diagrams: peer-to-peer and parameter-server architectures]

  48. Adversarial Agents  Can good agents learn despite bad agents?  Yes!*  [diagrams: peer-to-peer and parameter-server architectures]

  49. Key Idea  Need to filter bad information  Define “outliers” appropriately
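
One common way to filter bad information is robust aggregation at the server, for example a coordinate-wise trimmed mean of the received gradients instead of a plain average. A generic sketch of that idea, not necessarily the specific outlier filter used in this work:

```python
import numpy as np

def trimmed_mean(gradients, trim=1):
    # Coordinate-wise trimmed mean: sort each coordinate across agents and
    # drop the `trim` largest and `trim` smallest values before averaging.
    G = np.sort(np.asarray(gradients), axis=0)
    return G[trim:len(gradients) - trim].mean(axis=0)

# Three honest gradients plus one bogus one from an adversarial agent.
grads = [np.array([1.0, 1.0]), np.array([1.1, 0.9]),
         np.array([0.9, 1.1]), np.array([100.0, -100.0])]

print(np.mean(grads, axis=0))  # plain average is dragged away by the outlier
print(trimmed_mean(grads))     # trimmed mean stays near the honest gradients
```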

  50. Outline  Motivation – distributed machine learning  Research problems – Privacy-preserving distributed optimization – Adversarial learning – Robustness to adversarial samples

  51. Adversarial Samples  Machine learning seems to work well  If it seems too good to be true …

  52. Adversarial Samples  Several researchers have shown that it is easy to fool a machine

  54. [images: original sample vs. adversarial sample]

  55. Can we solve the problem? Maybe … or not  Some interesting ideas that seem promising in early evaluations … but not mature enough to report yet

  56. Summary  Achieving privacy/security in learning is non-trivial  Some promising progress  Plenty to keep us busy for a while … disc.ece.illinois.edu

  57. Collaborators  Lili Su (Ph.D. candidate)  Shripad Gade (Ph.D. candidate)  Nishad Phadke (BS thesis)  Brian Wang (BS thesis)  Professor Jungmin So (on sabbatical)

  58. Collaborators  Lili Su (Ph.D. candidate)  Shripad Gade (Ph.D. candidate)  Nishad Phadke (BS thesis)  Brian Wang (BS thesis)  Professor Jungmin So (on sabbatical)  Other related effort -- fault-tolerant control  Professor Aranya Chakrabortty (on sabbatical)

  59. Summary  Achieving privacy/security in learning is non-trivial  Some promising progress  Plenty to keep us busy for a while … disc.ece.illinois.edu

  64. Parameter Server Architecture  Distributed gradient method  [diagram: parameter server holding W[0], connected to agents with h1(w), h2(w), h3(w)]

  65. Distributed Optimization  [diagram: parameter server broadcasts W[0] to the agents with h1(w), h2(w), h3(w)]

  66. Distributed Optimization  [diagram: agents send gradients ∇h1(W[0]), ∇h2(W[0]), … back to the parameter server]
