Walls Have Ears: Traffic-based Side-channel Attack in Video Streaming
Jiaxi Gu∗, Jiliang Wang†, Zhiwen Yu∗, Kele Shen†
∗ Northwestern Polytechnical University, P.R. China
† Tsinghua University, P.R. China
Outline
๏ Background & Motivation
๏ Objective
๏ Methodology
๏ Experiments
๏ Conclusion
Background & Motivation
Booming Video Industry
“Globally, IP video traffic will be 82 percent of all consumer Internet traffic by 2021, up from 73 percent in 2016.”
Source: Cisco VNI, “Forecast and methodology, 2016-2021, white paper” (2017).
“Walls have ears”
[Figure: a server (stores video data) streams video to a client (fetches video data); an attacker, who has video fingerprints, monitors the network traffic.]
Why does it matter?
1. Users’ watchlists can be obtained by malicious adversaries.
2. ISPs or enterprise administrators can spy on their customers or employees.
3. The profits of companies that provide streaming services can be damaged.
What makes it worse!
1. Traffic-based video identification is broadly applicable despite data encryption.
2. Video streaming has a longer life cycle than other online services, e.g., web browsing.
3. Variable bitrate encoding and segmentation make video streams identifiable.
Objective
Objective
[Figure: an eavesdropped traffic trace (normally covering a period of time during video streaming) yields a traffic pattern; downloaded video files yield fingerprints of videos; the two are compared by shape matching, i.e. by calculating a distance (similarity).]
VBR (Variable Bit-Rate encoding)
The data amount per time slot changes owing to VBR.
The bitrate variation trends show similar patterns across different quality levels.
DASH (Dynamic Adaptive Streaming over HTTP)
๏ Encoding: Videos are encoded in multiple quality levels.
๏ Segmenting: Video copies of multiple qualities are chunked into segments.
๏ Streaming: Video segments in adaptive quality levels are transmitted in order.
DASH
[Figure: the server replies with segments at quality levels of 500, 1000, and 1500 kbps; the client requests segments over time, and the quality of the downloaded segments follows the available bandwidth (high or low).]
๏ 😋 Transmitted segments are fixed-length and in order.
๏ 😕 The quality level is switched adaptively while streaming.
Methodology
Network Traffic of DASH
[Figure: network traffic of streaming 3 different videos.]
๏ Video segments are transmitted in order.
๏ Video segment length is fixed while streaming.
๏ The traffic pattern owing to VBR is preserved.
Segment Aggregation
[Figure: network traffic data amount per second; axes are throughput (Mbps) against time (s). Annotations: bits per second b_t, bits per period p_i, "exceed maximum period time τ", "data amount less than ε".]
๏ One segment may take several seconds to transmit.
๏ There are gaps between video segment transmissions.
๏ Noise is removed by a threshold.
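The aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: per-second samples below ε are treated as gaps or noise that close the current period, and a period is also closed once it has lasted τ seconds. The function name `aggregate_segments` and the exact closing rules are hypothetical.

```python
def aggregate_segments(bits_per_second, eps, tau):
    """Group per-second data amounts b_t into per-period amounts p_i.

    Assumptions (not confirmed by the slides):
    - a sample below `eps` is a gap or noise and ends the open period;
    - a period is force-closed after `tau` consecutive seconds.
    """
    periods = []
    current, length = 0.0, 0
    for b in bits_per_second:
        if b < eps:
            # Gap or noise: close the open period, if there is one.
            if length > 0:
                periods.append(current)
                current, length = 0.0, 0
            continue
        current += b
        length += 1
        if length >= tau:
            # Maximum period time reached: close the period.
            periods.append(current)
            current, length = 0.0, 0
    if length > 0:
        periods.append(current)
    return periods
```

Each returned value approximates the size of one transmitted segment, which is the input to the differential step.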
Bitrate Differential
$s'_i = \frac{s_i - s_{i-1}}{s_{i-1}}$
[Figure: an example of the data amount sequence of video segments, shown at quality levels of 500, 1000, 1500, and 2000 kbps.]
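A small sketch of the differential, assuming it is the relative change between consecutive segment sizes, (s_i − s_{i−1}) / s_{i−1}. The function name `differential` is illustrative only.

```python
def differential(segment_sizes):
    """Relative change between consecutive segment sizes.

    Assumes every size is non-zero, since each value is divided by
    its predecessor.
    """
    return [
        (cur - prev) / prev
        for prev, cur in zip(segment_sizes, segment_sizes[1:])
    ]
```

Under this definition, scaling every segment size by the same factor leaves the result unchanged (`differential([100, 150, 75])` and `differential([200, 300, 150])` agree), which is consistent with the observation that bitrate variation trends look similar across quality levels.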
Normalization
Sigmoid-normalization: $S(x_i) = \frac{1}{1 + e^{-x_i}}$
Min-max-normalization: $M(x_i) = \frac{x_i - \min(x)}{\max(x) - \min(x)}$
Z-normalization: $Z(x_i) = \frac{x_i - \mu}{\sigma}$
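The three normalizations can be written directly from their formulas. This is a plain-Python sketch; the function names are illustrative, and using the population standard deviation for σ is an assumption, since the slide does not say which estimator is meant.

```python
import math
from statistics import mean, pstdev


def sigmoid_norm(xs):
    """S(x_i) = 1 / (1 + exp(-x_i)); maps each value into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]


def min_max_norm(xs):
    """M(x_i) = (x_i - min(x)) / (max(x) - min(x)).

    Assumes the sequence is not constant (max != min).
    """
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_norm(xs):
    """Z(x_i) = (x_i - mu) / sigma, with sigma as the population
    standard deviation (an assumption). Requires sigma != 0."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]
```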
Video Fingerprinting
Segmentation → Aggregation → Differential → Normalization
Distance Calculation
[Figure: video fingerprints compared with a traffic pattern.]
๏ Eavesdropping can hardly start from the beginning of the video.
๏ It is time-consuming to eavesdrop on the entire video.
Dynamic Time Warping (DTW)
[Figure: step pattern and DTW matrix over Sequence X 1..M and Sequence Y 1..N; cell d(i, j) is reached from d(i-1, j), d(i-1, j-1), or d(i, j-1), with steps labelled insertion, match, and deletion.]
$d(i, j) = \lVert X_i - Y_j \rVert + \min\{d(i, j-1),\ d(i-1, j-1),\ d(i-1, j)\}$
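The recurrence above corresponds to the classic dynamic-programming form of DTW. The sketch below assumes one-dimensional sequences, so the norm reduces to an absolute difference; the function name `dtw` is illustrative.

```python
def dtw(x, y):
    """Classic DTW distance between sequences x (length M) and y (length N).

    d[i][j] = |x_i - y_j| + min(d[i][j-1], d[i-1][j-1], d[i-1][j]),
    with both sequences aligned from their first to their last element.
    """
    m, n = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0  # the empty prefixes align at zero cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i][j - 1], d[i - 1][j - 1], d[i - 1][j])
    return d[m][n]
```

Because every element of both sequences must be matched, this form penalizes a short traffic trace compared against a full-length fingerprint, which motivates the partial variant.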
Partial Sequence Problem
We need to relax the constraint of matching each pair of elements to support partial matching between sequences.
P(artial)-DTW
• Query sequence (traffic pattern): $X = (X_1, X_2, \ldots, X_M)$
• Template sequence (video fingerprints): $Y = (Y_1, Y_2, \ldots, Y_N)$
$f_{p\text{-}dtw}(X_{1..M}, Y_{1..N}) = \min_{1 \le p \le q \le N} D(X_{1..M}, Y_{p..q})$
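One common way to evaluate this minimum without enumerating every (p, q) pair is subsequence DTW: the query may start at any template position at zero cost, and the result is the smallest value in the last row. The sketch below uses that standard technique with the classic step pattern and one-dimensional sequences; it is not necessarily how the authors compute P-DTW, and the name `partial_dtw` is hypothetical.

```python
def partial_dtw(x, y):
    """Minimum DTW distance between the whole query x and any
    contiguous subsequence y[p..q] of the template y."""
    m, n = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        d[0][j] = 0.0  # free start: the match may begin at any y position
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i][j - 1], d[i - 1][j - 1], d[i - 1][j])
    # Free end: the match may finish at any y position.
    return min(d[m][1:])
```

For example, the query `[2, 3]` matches the middle of the template `[9, 1, 2, 3, 4, 9]` exactly, so the partial distance is 0 even though the full sequences differ.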
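Normalized Distance
[Figure: step pattern over Sequence X 1..M and Sequence Y p..q; cell d(i, j) with predecessor cells d(i-1, j), d(i-1, j-1), and d(i-1, j-2); two transitions are crossed out (❎); the distance is normalized as d / M.]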
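One possible reading of this slide is an asymmetric step pattern in which every step advances the query by exactly one element, so each warping path has M steps and dividing by M gives a per-element distance. The sketch below follows that reading with a free start and end on the template. Both the step pattern and the function name `normalized_partial_dtw` are assumptions drawn from the figure labels, not a confirmed description of the authors' method.

```python
def normalized_partial_dtw(x, y):
    """Partial DTW with predecessors d(i-1, j), d(i-1, j-1), d(i-1, j-2),
    normalized by the query length M (assumed interpretation)."""
    m, n = len(x), len(y)
    inf = float("inf")
    prev = [0.0] * (n + 1)  # row 0: free start anywhere on the template
    for i in range(1, m + 1):
        cur = [inf] * (n + 1)
        for j in range(1, n + 1):
            best = min(prev[j], prev[j - 1])
            if j >= 2:
                best = min(best, prev[j - 2])
            cur[j] = abs(x[i - 1] - y[j - 1]) + best
        prev = cur
    # Free end on the template, then normalize by the query length.
    return min(prev[1:]) / m
```

Normalizing by M makes distances comparable across traces of different eavesdropping durations, which a single fixed threshold requires.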
Experiments
Experimental Settings
๏ 200 videos for fingerprinting.
๏ 12 out of 200 videos for streaming.
๏ Quality levels: 500, 1000, 1500, 2000 kbps.
๏ Segment lengths: 4, 6, 8 seconds.
Discriminability
[Figure: P-DTW distance (dist) for matched and unmatched videos, plotted against eavesdropping time (60 to 180 s), segment length (4, 6, 8 s), and video index (1 to 12).]
Distance calculation by P-DTW has good discriminability across multiple variables.
Distance Threshold
[Figure: two panels. One plots the false rate (false positive and false negative, 0.0 to 0.8) against the threshold (0.010 to 0.040). The other plots the normalized similarity distance (0.00 to 0.10) of matched and unmatched videos for each method: DTW, MVM, and P-DTW.]
๏ P-DTW shows more discriminability.
๏ The threshold is calculated accordingly.
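How a threshold trades false positives against false negatives can be sketched as below. The selection rule (minimize the sum of both rates over a list of candidates) is an assumption; the slides only say the threshold is calculated from the distance distributions. Function names are illustrative.

```python
def false_rates(matched, unmatched, threshold):
    """Return (false_positive_rate, false_negative_rate).

    A pair is declared a match when its distance is <= threshold.
    `matched` holds distances of true matches, `unmatched` of non-matches.
    """
    fn = sum(d > threshold for d in matched) / len(matched)
    fp = sum(d <= threshold for d in unmatched) / len(unmatched)
    return fp, fn


def pick_threshold(matched, unmatched, candidates):
    """Pick the candidate with the smallest FP + FN (assumed criterion)."""
    return min(
        candidates,
        key=lambda t: sum(false_rates(matched, unmatched, t)),
    )
```

The better a distance measure separates matched from unmatched videos, the wider the range of thresholds for which both rates stay low.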
Accuracy
๏ Accuracy is greatly influenced by segment length.
๏ Video quality level has a limited impact.
Different Videos
[Figure: final performance of our method when streaming 12 videos with DASH.]
Conclusion
Conclusion
Contributions
๏ A differential bitrate pattern extraction method.
๏ An effective shape matching method for identifying videos.
๏ Considerable accuracy with enough eavesdropping.
Future Work
๏ More work needs to be done with various encoders and DASH strategies.
๏ Countermeasures considering network efficiency and video QoE are worth studying.
Fin.
gujiaxi@mail.nwpu.edu.cn