Oblivious Neural Network Predictions via MiniONN Transformations
Presented by: Sherif Abdelfattah
Liu, J., Juuti, M., Lu, Y., & Asokan, N. (2017, October). Oblivious neural network predictions via MiniONN transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (pp. 619-631). ACM. (121 citations)
Machine Learning as a Service
[Diagram: the client sends its input to the server and receives predictions in return.]
• Because the server sees the client's raw input, this approach violates the clients' privacy.
Running predictions on client-side
• A naive solution is to have clients download the model and run the prediction phase on the client side.
[Diagram: the model is shipped to the client.]
• It becomes more difficult for service providers to update their models.
• For security applications (e.g., spam or malware detection services), an adversary can use the model as an oracle to develop strategies for evading detection.
• If the training data contains sensitive information (such as patient records from a hospital), revealing the model may compromise the privacy of the training data.
Oblivious Neural Networks (ONN)
The solution is to make the neural network oblivious:
• The server learns nothing about the client's input.
• The client learns nothing about the model.
MiniONN
• MiniONN: Minimizing the Overhead for Oblivious Neural Networks
[Diagram: blinded input → oblivious protocols → blinded predictions.]
• Low overhead: latency of roughly 1 s
• Works with all neural networks
How does it work?

$X = \begin{pmatrix} x_{1,1} & x_{1,2} \\ x_{2,1} & x_{2,2} \end{pmatrix}, \quad X' = \begin{pmatrix} x'_{1,1} & x'_{1,2} \\ x'_{2,1} & x'_{2,2} \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \quad c = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}, \quad c' = \begin{pmatrix} c'_1 \\ c'_2 \end{pmatrix}$

• $z = X \cdot Y + c$ represents a linear transformation.
• $Y' = g(z)$ represents a non-linear transformation (the activation function).
• The prediction is $a = X' \cdot Y' + c' = X' \cdot g(X \cdot Y + c) + c'$.
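To make the notation concrete, here is a minimal NumPy sketch of this two-layer pipeline in the clear (no privacy yet); the concrete matrices and the choice of ReLU for $g$ are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Toy two-layer network: a = X' . g(X . Y + c) + c'
# All concrete values below are illustrative.
X  = np.array([[1.0, 2.0], [3.0, 4.0]])   # first-layer weights X
c  = np.array([0.5, -0.5])                # first-layer bias c
Xp = np.array([[1.0, -1.0], [0.5, 2.0]])  # second-layer weights X'
cp = np.array([0.1, 0.2])                 # second-layer bias c'
Y  = np.array([1.0, -2.0])                # client input Y

g = lambda z: np.maximum(z, 0.0)          # ReLU as the activation g

z  = X @ Y + c        # linear transformation
Yp = g(z)             # non-linear transformation Y' = g(z)
a  = Xp @ Yp + cp     # prediction a = X' . Y' + c'
print(a)
```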
Core Idea
• The core idea is to use additive secret sharing for oblivious computation: every value in the pipeline is split between the client and the server.
• Input $Y$: shares $y^d + y^t = Y$.
• Linear transformation ($X \cdot Y + c$): shares $z^d + z^t = z$.
• Activation $g(z)$: shares $y'^d + y'^t = Y'$.
• Next linear transformation ($X' \cdot Y' + c'$): shares $z'^d + z'^t = z'$, from which the final result $a$ is reconstructed.
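A minimal Python sketch of the additive secret sharing this builds on; the modulus $N$ and the helper names are illustrative assumptions:

```python
import secrets

N = 2**64  # illustrative modulus; MiniONN works in Z_N for some fixed N

def share(y: int) -> tuple[int, int]:
    """Split y into two additive shares with y_d + y_t = y (mod N)."""
    y_d = secrets.randbelow(N)   # uniformly random share
    y_t = (y - y_d) % N          # the other share is determined by y
    return y_d, y_t

def reconstruct(y_d: int, y_t: int) -> int:
    """Recombine the two shares."""
    return (y_d + y_t) % N

y = 42
y_d, y_t = share(y)              # client keeps y_d, server gets y_t
assert reconstruct(y_d, y_t) == y
# Each share alone is uniformly random, so neither party learns y.
```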
Secret sharing the input
• A random share $y^d$ is chosen uniformly from $\mathbb{Z}_N$ for every input value.
• The client computes the matching shares of its input, e.g. $y_2^t = y_2 - y_2^d$ and $y_3^t = y_3 - y_3^d$.
• $y^d$ is independent of $y$, so it can be pre-chosen.
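Because $y^d$ is input-independent, it can be drawn in an offline phase before the input is known; a small sketch of that split (all values and names are illustrative assumptions):

```python
import secrets

N = 2**64  # same illustrative modulus as above

# Offline phase: y_d is independent of the input, so draw it in advance.
y_d = [secrets.randbelow(N) for _ in range(2)]

# Online phase: the client learns its input y and blinds it.
y = [7, 13]                                    # illustrative input values
y_t = [(y[i] - y_d[i]) % N for i in range(2)]  # sent to the server
# The client keeps y_d; each y_t alone looks uniformly random to the server.
```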
Oblivious linear transformation $X \cdot Y + c$

$X \cdot Y + c = \begin{pmatrix} x_{1,1} & x_{1,2} \\ x_{2,1} & x_{2,2} \end{pmatrix} \cdot \begin{pmatrix} y_1^t + y_1^d \\ y_2^t + y_2^d \end{pmatrix} + \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} x_{1,1}(y_1^t + y_1^d) + x_{1,2}(y_2^t + y_2^d) + c_1 \\ x_{2,1}(y_1^t + y_1^d) + x_{2,2}(y_2^t + y_2^d) + c_2 \end{pmatrix}$

$= \underbrace{\begin{pmatrix} x_{1,1} y_1^t + x_{1,2} y_2^t + c_1 \\ x_{2,1} y_1^t + x_{2,2} y_2^t + c_2 \end{pmatrix}}_{\text{computed locally by the server}} + \underbrace{\begin{pmatrix} x_{1,1} y_1^d + x_{1,2} y_2^d \\ x_{2,1} y_1^d + x_{2,2} y_2^d \end{pmatrix}}_{\text{oblivious dot-product}}$
Oblivious linear transformation (dot-product)
Homomorphic encryption with SIMD¹

• Server → client: the encrypted weights $E(x_{1,1}), E(x_{1,2}), E(x_{2,1}), E(x_{2,2})$.
• The client picks $s_{1,1}, s_{1,2}, s_{2,1}, s_{2,2}$ at random from $\mathbb{Z}_N$ and computes homomorphically:
$d_{1,1} = E(x_{1,1} \cdot y_1^d - s_{1,1}), \quad d_{1,2} = E(x_{1,2} \cdot y_2^d - s_{1,2}), \quad d_{2,1} = E(x_{2,1} \cdot y_1^d - s_{2,1}), \quad d_{2,2} = E(x_{2,2} \cdot y_2^d - s_{2,2})$
• Client → server: $d_{1,1}, d_{1,2}, d_{2,1}, d_{2,2}$.
• The server decrypts and sums:
$v_1 = D(d_{1,1}) + D(d_{1,2}) = x_{1,1} y_1^d + x_{1,2} y_2^d - (s_{1,1} + s_{1,2})$
$v_2 = D(d_{2,1}) + D(d_{2,2}) = x_{2,1} y_1^d + x_{2,2} y_2^d - (s_{2,1} + s_{2,2})$
• The client keeps $w_1 = s_{1,1} + s_{1,2}$ and $w_2 = s_{2,1} + s_{2,2}$, so $v_i + w_i$ are additive shares of $X \cdot y^d$.

¹ Single instruction multiple data (SIMD): a batching technique used to reduce the memory of the circuit and improve the evaluation time.
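A sketch of this blinded dot-product using Paillier encryption from the python-phe library as a stand-in; the paper's implementation uses an additively homomorphic scheme with SIMD batching, so the library choice and all concrete values here are assumptions:

```python
from phe import paillier  # pip install phe; stand-in for the paper's SIMD scheme
import secrets

# Server side: key pair and plaintext weights x.
pub, priv = paillier.generate_paillier_keypair()
x = [[2, 3], [5, 7]]                                  # illustrative weights
enc_x = [[pub.encrypt(v) for v in row] for row in x]  # sent to the client

# Client side: its share y_d and fresh blinding values s.
y_d = [4, 9]                                          # illustrative share of Y
s = [[secrets.randbelow(2**32) for _ in range(2)] for _ in range(2)]
# Homomorphically compute d_ij = E(x_ij * y_j^d - s_ij).
d = [[enc_x[i][j] * y_d[j] - s[i][j] for j in range(2)] for i in range(2)]

# Server side: decrypt and sum to get its share v_i of X . y_d.
v = [priv.decrypt(d[i][0]) + priv.decrypt(d[i][1]) for i in range(2)]
# Client side: its share w_i is just the sum of its blinding values.
w = [s[i][0] + s[i][1] for i in range(2)]

# Check: v_i + w_i reconstructs the dot-product X . y_d.
for i in range(2):
    assert v[i] + w[i] == x[i][0] * y_d[0] + x[i][1] * y_d[1]
```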
Oblivious linear transformation $X \cdot Y + c$

$X \cdot Y + c = \begin{pmatrix} x_{1,1} y_1^t + x_{1,2} y_2^t + c_1 + x_{1,1} y_1^d + x_{1,2} y_2^d \\ x_{2,1} y_1^t + x_{2,2} y_2^t + c_2 + x_{2,1} y_1^d + x_{2,2} y_2^d \end{pmatrix}$

$= \begin{pmatrix} x_{1,1} y_1^t + x_{1,2} y_2^t + c_1 + v_1 \\ x_{2,1} y_1^t + x_{2,2} y_2^t + c_2 + v_2 \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} z_1^t + z_1^d \\ z_2^t + z_2^d \end{pmatrix}$

The server's share is $z_i^t = x_{i,1} y_1^t + x_{i,2} y_2^t + c_i + v_i$; the client's share is $z_i^d = w_i$.
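Putting the last three slides together, here is a plaintext simulation of one oblivious linear layer over $\mathbb{Z}_N$, where the homomorphic protocol is replaced by its ideal outcome $v + w = X \cdot y^d$; all concrete values are illustrative assumptions:

```python
import secrets

N = 2**64                      # illustrative modulus
x = [[2, 3], [5, 7]]           # server's weights X
c = [1, 2]                     # server's bias c
y = [10, 20]                   # client's input Y

# Input sharing: the client keeps y_d and sends y_t = y - y_d (mod N).
y_d = [secrets.randbelow(N) for _ in range(2)]
y_t = [(y[j] - y_d[j]) % N for j in range(2)]

# Ideal outcome of the homomorphic dot-product: v + w = X . y_d (mod N).
s = [[secrets.randbelow(N) for _ in range(2)] for _ in range(2)]
v = [sum(x[i][j] * y_d[j] - s[i][j] for j in range(2)) % N for i in range(2)]
w = [sum(s[i]) % N for i in range(2)]

# Server's share z_t (local part plus v); client's share z_d is just w.
z_t = [(sum(x[i][j] * y_t[j] for j in range(2)) + c[i] + v[i]) % N for i in range(2)]
z_d = w

# The shares recombine to the plaintext linear layer X . Y + c.
for i in range(2):
    assert (z_t[i] + z_d[i]) % N == (x[i][0] * y[0] + x[i][1] * y[1] + c[i]) % N
```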
Oblivious Activation Functions $g(z)$: piecewise linear functions
• For example, ReLU: $y = \max(z, 0)$, computed with a comparison against 0.
• Oblivious ReLU: $y^t + y^d = \max(z^t + z^d, 0)$.
• Computed obliviously by a garbled circuit².

² Garbled circuits: a two-party computation (2PC) technique that allows two parties to jointly compute a function without learning each other's inputs.
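The garbled circuit itself is out of scope for a slide, but the functionality it evaluates can be sketched as follows; this trusted-party emulation is purely illustrative and not a secure implementation, and the signed encoding of $\mathbb{Z}_N$ is an assumption:

```python
import secrets

N = 2**64

def relu_functionality(z_t: int, z_d: int) -> tuple[int, int]:
    """Emulates what the garbled circuit computes: fresh additive
    shares of ReLU(z) from shares of z. In MiniONN the two parties
    evaluate this jointly without revealing z or the output."""
    z = (z_t + z_d) % N
    # Interpret values in [N/2, N) as negative (two's-complement style).
    signed = z if z < N // 2 else z - N
    y = max(signed, 0)
    y_d = secrets.randbelow(N)   # client's fresh output share
    y_t = (y - y_d) % N          # server's fresh output share
    return y_t, y_d
```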
Oblivious Activation Functions $g(z)$: smooth functions
• For example, Sigmoid: $y = 1 / (1 + e^{-z})$.
• Oblivious Sigmoid: $y^t + y^d = 1 / (1 + e^{-(z^t + z^d)})$.
• Approximated by a piecewise linear function.
• Computed obliviously by a garbled circuit.
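A sketch of the kind of piecewise linear approximation that replaces the sigmoid inside the garbled circuit; the breakpoints below are illustrative assumptions, not the segments chosen by the paper:

```python
def sigmoid_piecewise(z: float) -> float:
    """Illustrative piecewise linear approximation of 1/(1 + e^-z):
    clamped to 0 and 1 outside [-4, 4], linear segments inside."""
    # Illustrative breakpoints; the paper picks its segments to bound
    # the approximation error, which these do not claim to do.
    if z <= -4.0:
        return 0.0
    if z >= 4.0:
        return 1.0
    if z <= -2.0:
        return 0.0625 * (z + 4.0)          # from 0 at -4 to 0.125 at -2
    if z < 2.0:
        return 0.125 + 0.1875 * (z + 2.0)  # from 0.125 at -2 to 0.875 at 2
    return 0.875 + 0.0625 * (z - 2.0)      # from 0.875 at 2 to 1 at 4

# Only additions, multiplications by public constants, and comparisons
# remain, which is exactly what garbled circuits handle efficiently.
```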
The final result
• The server sends its shares $z_1^t, z_2^t$ of the last layer to the client.
• The client reconstructs the predictions: $z_1 = z_1^d + z_1^t$ and $z_2 = z_2^d + z_2^t$.
Performance
Evaluated on three models:
1. MNIST (60 000 training images and 10 000 test images)
   • Handwriting recognition
   • CNN model
   • ReLU activation function
2. CIFAR-10 (50 000 training images and 10 000 test images)
   • Image classification
   • CNN model
   • ReLU activation function
3. Penn Treebank (PTB) (929 000 training words, 73 000 validation words, and 82 000 test words)
   • Language modeling: predicting the next word given the previous words
   • Long Short-Term Memory (LSTM), commonly used for language modeling
   • Sigmoid activation function
Performance
• Comparison between MiniONN and CryptoNets (MNIST/Square/CNN):

| Model | Offline latency (s) | Online latency (s) | Offline msg size (MB) | Online msg size (MB) | Accuracy (%) |
|---|---|---|---|---|---|
| CryptoNets | 0 | 297.5 | 0 | 372.2 | 98.95 |
| MiniONN | 0.88 | 0.4 | 3.6 | 44 | 98.95 |
Performance
• For a single query:

| Model | Offline latency (s) | Online latency (s) | Offline msg size (MB) | Online msg size (MB) | Accuracy |
|---|---|---|---|---|---|
| MNIST/ReLU/CNN | 3.58 | 5.74 | 20.9 | 20.9 | 99.0% |
| CIFAR-10/ReLU/CNN | 472 | 72 | 3046 | 6226 | 81.61% |
| PTB/Sigmoidal/LSTM | 13.9 | 4.39 | 86.7 | 474 | cross-entropy loss: 4.79 |
Thank You