Privacy & Security in Machine Learning / Optimization
Nitin Vaidya
University of Illinois at Urbana-Champaign
disc.ece.illinois.edu
Research Interests
Distributed algorithms
Distributed shared memory systems
Distributed computations over wireless networks
Distributed optimization
Privacy and Security for Machine Learning / Optimization
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Example – Image Classification
CIFAR-10 dataset
Deep Neural Networks
[Figure: a feedforward network. Inputs x1, x2, x3 feed the neurons of layer 1, producing activations a1–a4; the edges carry the parameters (weights) such as W111 and W132. The output layer scores the classes, e.g., 0.8 dog, 0.1 cat, 0.09 ship, 0.01 car.]
Each neuron applies an activation function s to a weighted sum of its inputs; for neuron 3 of layer 1:
s(x2 W131 + x3 W132 + b13)
[Figure: plot of the activation s(z) vs. z for the Rectified Linear Unit, s(z) = max(0, z)]
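A minimal sketch of this computation in NumPy (illustrative values; the names x2, x3, W131, W132, b13 follow the slide's notation and are not from the talk's code):

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: s(z) = max(0, z)."""
    return np.maximum(0.0, z)

# Illustrative inputs and parameters for neuron 3 of layer 1 (values are made up).
x2, x3 = 0.5, -1.2
W131, W132 = 0.8, 0.3
b13 = 0.1

# Weighted sum followed by the activation, as on the slide.
a13 = relu(x2 * W131 + x3 * W132 + b13)
print(a13)
```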
How to train your network
Given a machine structure
Parameters are the only free variables
Choose parameters to maximize accuracy
Optimize a suitably defined cost function h(w) to find the right parameter vector w
How to train your network
[Figure: the network with inputs x1, x2, x3, activations a1–a4, and weights such as W132 as the parameters]
Optimize a suitably defined cost function h(w) to find the right parameter vector w
Cost Function h(w)
Consider input x
True classification y(x)
Machine classification a(x, w) using parameters w
Cost for input x: ||y(x) − a(x, w)||²
Total cost: h(w) = ∑x ||y(x) − a(x, w)||²
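A rough sketch of this cost in Python (the linear "machine" a(x, w) and the fixed labels below are placeholder assumptions, not the talk's model):

```python
import numpy as np

def total_cost(w, inputs, y, a):
    """h(w) = sum over inputs x of || y(x) - a(x, w) ||^2."""
    return sum(np.sum((y(x) - a(x, w)) ** 2) for x in inputs)

# Toy setup (assumed): a linear machine a(x, w) = w @ x and fixed labels.
a = lambda x, w: w @ x
y = lambda x: np.array([1.0, 0.0])
inputs = [np.array([0.5, 1.0]), np.array([2.0, -1.0])]
w = np.zeros((2, 2))
print(total_cost(w, inputs, y, a))
```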
Convex Optimization
[Figure: surface plot of a convex cost h(W) over the parameters W = (w131, w132, …); image from Wikipedia]
Convex Optimization
[Figure: gradient descent on the surface of h(W), W = (w131, w132, …): successive iterates W[0], W[1] = (4, 3, …), W[2] = (3, 2, …), W[3] move toward the minimum]
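The iterates W[0], W[1], W[2], … come from gradient descent on h; a generic sketch (assumed form, since the talk shows only the picture; the cost and step size are illustrative):

```python
import numpy as np

def gradient_descent(grad_h, w0, beta=0.1, iters=100):
    """Iterate W[t+1] = W[t] - beta * grad_h(W[t])."""
    w = np.asarray(w0, dtype=float)
    for _ in range(iters):
        w = w - beta * grad_h(w)
    return w

# Example: h(w) = ||w||^2 has gradient 2w; starting from (4, 3)
# the iterates shrink toward the minimizer w = 0.
print(gradient_descent(lambda w: 2 * w, w0=[4.0, 3.0]))
```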
So far …
[Diagram: training optimizes the cost function h(w); the resulting parameters w configure the Machine]
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Distributed Machine Learning
Data is distributed across different agents
– Mobile users
– Hospitals
– Competing vendors
Collaborate to learn
[Figure: four agents (Agent 1–Agent 4) with local cost functions h1(w), h2(w), h3(w), h4(w); training optimizes the cost function ∑i hi(w) to find the parameters w]
Distributed Optimization
30+ years of work
Recent interest due to machine learning applications
Distributed Optimization
Different architectures
– Peer-to-peer: agents with local costs h1(w), h2(w), h3(w) exchange messages directly
– Parameter server: a central server coordinates agents with local costs h1(w), h2(w), h3(w)
Distributed Gradient Method
[Figure: agents with local costs h1(w), h2(w), h3(w) hold current estimates W1[0], W2[0], W3[0] and exchange them with their neighbors]
Each agent forms a weighted average of the estimates it receives and takes a gradient step on its local cost. For agent 3:
T = ½ W3[0] + ¼ W1[0] + ¼ W2[0]
W3[1] = T − β ∇h3(T)
Works in incomplete networks too !!
[Figure: the same averaging + gradient update, over a network that is not fully connected]
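A minimal sketch of this update for agent 3 in Python, with the mixing weights shown above (½, ¼, ¼); the quadratic local cost and the step size β = 0.1 are illustrative assumptions, not from the talk:

```python
import numpy as np

def agent3_update(W1, W2, W3, grad_h3, beta=0.1):
    """One step for agent 3: average neighbors' estimates, then a local gradient step.

    T      = 1/2 * W3 + 1/4 * W1 + 1/4 * W2   (weighted average, as on the slide)
    W3_new = T - beta * grad_h3(T)            (gradient step on the local cost h3)
    """
    T = 0.5 * W3 + 0.25 * W1 + 0.25 * W2
    return T - beta * grad_h3(T)

# Example with an assumed quadratic local cost h3(w) = ||w - c||^2, gradient 2(w - c).
c = np.array([1.0, -1.0])
W3_new = agent3_update(np.array([4.0, 3.0]), np.array([3.0, 2.0]),
                       np.array([2.0, 2.0]), lambda w: 2 * (w - c))
print(W3_new)
```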
Parameter Server Architecture
[Figure: the parameter server sends W[0] to the agents; each agent i returns its gradient ∇hi(W[0]) computed on its local cost hi(w)]
Server update: W[1] = W[0] − β ∑i ∇hi(W[0])
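A rough sketch of one round of this parameter-server update (illustrative Python, not the talk's implementation; the quadratic agent costs are assumed for the demo):

```python
import numpy as np

def server_round(W, agent_grads, beta=0.1):
    """One round: W[1] = W[0] - beta * sum_i grad h_i(W[0]).

    agent_grads is a list of callables, one per agent, each returning
    the gradient of that agent's local cost at W.
    """
    total = sum(g(W) for g in agent_grads)
    return W - beta * total

# Assumed demo costs: agent i holds h_i(w) = ||w - c_i||^2.
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
grads = [lambda W, c=c: 2 * (W - c) for c in centers]
W = np.zeros(2)
for _ in range(200):
    W = server_round(W, grads)
print(W)   # approaches the average of the c_i, which minimizes sum_i h_i
```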
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Privacy Challenge
Peers may learn each other's data
Parameter server may learn data
[Figure: peer-to-peer and parameter-server architectures, agents holding local costs h1(w), h2(w), h3(w)]
Privacy-Preserving Optimization
Can agents collaboratively learn, and yet protect their own data?
Optimize cost function ∑i hi(w)
Peer-to-Peer Architecture
[Figure: agents with local costs h1(w), h2(w), h3(w) exchange their current estimates W1[0], W2[0], W3[0]]
Add Inter-Dependent Noise
[Figure: each agent perturbs the estimate it shares: agent 1 sends W1[0]+n1, agent 2 sends W2[0]+n2, agent 3 sends W3[0]+n3]
n1 + n2 + n3 = 0
Key Idea
Add correlated noise in information exchanged between agents
Noise "cancels" over the network
But can prevent a coalition of bad agents from learning information about others
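One simple way to realize noise that cancels over the network, sketched below: each pair of agents shares a random value that one adds and the other subtracts, so the noises sum to zero. This illustrates the idea only and is not necessarily the exact mechanism in the underlying work:

```python
import numpy as np

def zero_sum_noise(num_agents, dim, scale=1.0, rng=None):
    """Return noise vectors n_1, ..., n_k with n_1 + ... + n_k = 0.

    Built from pairwise shared randomness: each pair (i, j) draws a random
    vector; agent i adds it to its noise and agent j subtracts it.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = [np.zeros(dim) for _ in range(num_agents)]
    for i in range(num_agents):
        for j in range(i + 1, num_agents):
            r = scale * rng.standard_normal(dim)
            noise[i] += r
            noise[j] -= r
    return noise

n1, n2, n3 = zero_sum_noise(num_agents=3, dim=2)
print(n1 + n2 + n3)   # ~0 (up to floating-point error): the noise cancels
```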
Privacy-Preserving Optimization
Can agents collaboratively learn, and yet protect their own data? Yes!*
Optimize cost function ∑i hi(w)
* conditions apply
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Adversarial Agents
Adversarial agents may send bogus information
Learned parameters impacted
[Figure: parameter-server architecture, agents holding local costs h1(w), h2(w), h3(w)]
Adversarial Agents
Can good agents learn despite bad agents? Yes!*
[Figure: parameter-server architecture, agents holding local costs h1(w), h2(w), h3(w)]
Key Idea
Need to filter bad information
Define "outliers" appropriately
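As an illustration of one common way to define and filter outliers (the talk does not specify the exact rule), the server can take a coordinate-wise trimmed mean of the agents' gradients, dropping the f largest and f smallest values in each coordinate:

```python
import numpy as np

def trimmed_mean(gradients, f):
    """Coordinate-wise trimmed mean: in each coordinate, drop the f largest
    and f smallest values, then average the rest.  Intended to tolerate up
    to f adversarial agents (requires len(gradients) > 2 * f)."""
    G = np.sort(np.stack(gradients), axis=0)
    return G[f:len(gradients) - f].mean(axis=0)

# Two honest gradients and one bogus one: the outlier is dropped per coordinate.
grads = [np.array([1.0, 1.0]), np.array([1.2, 0.9]), np.array([100.0, -50.0])]
print(trimmed_mean(grads, f=1))
```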
Outline
Motivation – distributed machine learning
Research problems
– Privacy-preserving distributed optimization
– Adversarial learning
– Robustness to adversarial samples
Adversarial Samples
Machine learning seems to work well
If it seems too good to be true …
Adversarial Samples
Several researchers have shown that it is easy to fool a machine
[Figure: an original sample and an adversarial sample]
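For illustration only (the talk does not say which attack produced this sample), a standard construction is the fast gradient sign method: perturb the input slightly in the direction that increases the classifier's loss. A toy version for a linear classifier:

```python
import numpy as np

def fgsm_linear(x, w, b, y, eps):
    """Fast gradient sign method for a linear score f(x) = w.x + b with
    label y in {-1, +1}: step each input coordinate by eps in the
    direction that decreases y * f(x)."""
    grad_x = -y * w              # gradient of the loss -y * (w.x + b) w.r.t. x
    return x + eps * np.sign(grad_x)

w, b = np.array([5.0, -4.0]), 0.0
x, y = np.array([1.0, 0.5]), +1          # correctly classified: w.x + b = 3 > 0
x_adv = fgsm_linear(x, w, b, y, eps=0.5)
print(w @ x + b, w @ x_adv + b)          # the perturbed input is now scored negative
```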
Can we solve the problem?
Maybe … or not
Some interesting ideas that seem promising in early evaluations … but not mature enough to report yet
Summary
Achieving privacy/security in learning is non-trivial
Some promising progress
Plenty to keep us busy for a while …
disc.ece.illinois.edu
Collaborators
Lili Su (Ph.D. candidate)
Shripad Gade (Ph.D. candidate)
Nishad Phadke (BS thesis)
Brian Wang (BS thesis)
Professor Jungmin So (on sabbatical)
Other related effort – fault-tolerant control
Professor Aranya Chakrabortty (on sabbatical)
Parameter Server Architecture – Distributed gradient method
[Figure: the parameter server holds W[0] and sends it to the agents; each agent i computes its gradient ∇hi(W[0]) on its local cost hi(w) and returns it to the server]