Synesthesia
The problem • Many colleagues appear blandly disengaged during crucial video-conference calls
The challenge • Telling what they are actually doing … [Figure: two images compared, “VS.”]
Idea: “hear” the screen? [Diagram: Victim (evil colleague appearing aloof and disengaged) → Voice over IP → Attacker (you)]
Acoustic noise?
Acoustic leakage from screens is dangerous • Microphones are ubiquitous • Audio is commonly shared and stored • Acoustic leakage is highly available compared to electromagnetic leakage [Eck’85][Kuh’04] • …conveying on-screen content?
Detecting leakage: “see a Zebra” • Display alternating black/white stripes (a “Zebra”) • 66 stripes × 60 refreshes per second ≈ 4k black/white pixel-color transitions per second → leakage expected at ≈ 4 kHz [Spectrogram: time vs. frequency]
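The arithmetic behind the expected tone can be checked in a few lines. This is a minimal sketch that assumes, as the slide's estimate does, one black/white transition cycle per stripe per refresh; the function name `zebra_tone_hz` is hypothetical.

```python
def zebra_tone_hz(stripes, refresh_hz):
    """Expected leakage frequency for a striped ("Zebra") image.

    Assumption: each stripe contributes one black/white transition
    cycle per screen refresh, as in the slide's estimate.
    """
    return stripes * refresh_hz


if __name__ == "__main__":
    print(zebra_tone_hz(66, 60))   # 3960, i.e. roughly 4 kHz
    print(zebra_tone_hz(132, 60))  # 7920: twice the stripes, twice the frequency
```

Under the same assumption, changing the stripe count (that is, the stripe width) should move the tone proportionally.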
Changing stripe width [Spectrogram: time vs. frequency]
Leakage pattern consistent across makes/models • U3011t • 920NW • 170S4 • ZR30w
Leakage pattern consistent across many makes/models
Whence acoustic leakage?
Whence acoustic leakage? • Display control board • Power supply • vs. acoustic leakage of CPU computation [GST’14]
So far: lab conditions
Victim’s environment • Record using commodity equipment? Webcam microphone (close to screen) • Codec-encoded audio? [Diagram: Victim (evil colleague appearing aloof and disengaged) → Voice over IP → Attacker (you)]
Codec-encoded VoIP (Google Hangouts)
Recordings uploaded to the cloud • Leakage still detectable in cloud-archived recordings!
Smart phone
Attack at a distance (using a parabolic dish)
What can an attacker do? • Activity/website distinguishing • On-screen keyboard snooping • Text extraction
How? 1. denoising 2. ML-based attacks • Website distinguishing • On-screen keyboard snoop • Text extraction
Observation (1): amplitude modulation • Pixel-line intensity is modulated on a 32 kHz carrier [Plot: amplitude vs. time]
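To make the amplitude-modulation observation concrete, here is a toy envelope detector. It is only a sketch under stated assumptions: the 192 kHz sample rate, the function names, and the rectify-then-average detector are illustrative choices, not the method used in the talk. A real recording would also need band-pass filtering around the carrier and a detector that does not rely on phase-aligned windows.

```python
import math

FS = 192_000         # sample rate in Hz (assumed; must exceed 2 x carrier)
CARRIER_HZ = 32_000  # carrier frequency stated on the slide


def modulate(envelope):
    """Amplitude-modulate a non-negative envelope onto the carrier."""
    return [a * math.sin(2 * math.pi * CARRIER_HZ * n / FS)
            for n, a in enumerate(envelope)]


def demodulate(signal):
    """Envelope detection: rectify, then average over whole carrier periods.

    Returns one envelope estimate per carrier period (non-overlapping windows).
    """
    period = FS // CARRIER_HZ  # 6 samples per carrier cycle with these values
    # Mean of |sin| over one sampled period, used to undo the scaling.
    gain = sum(abs(math.sin(2 * math.pi * k / period))
               for k in range(period)) / period
    out = []
    for start in range(0, len(signal) - period + 1, period):
        window = signal[start:start + period]
        out.append(sum(abs(x) for x in window) / period / gain)
    return out


if __name__ == "__main__":
    # Toy "pixel line intensity": dark, bright, then medium.
    intensity = [0.2] * 60 + [1.0] * 60 + [0.5] * 60
    recovered = demodulate(modulate(intensity))
    print([round(v, 3) for v in recovered])
```

The recovered sequence follows the intensity steps of the toy input, which is the sense in which the carrier "conveys" pixel-line intensity.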
Observation (2): signal redundancy • Screen refreshes every ~1/60 seconds → the signal is extremely redundant! • Chop and average? [Figure: chops starting at 0, 1/60, 2/60, 3/60, 4/60 sec] • Average: high SNR!
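The chop-and-average idea can be illustrated on synthetic data. The sketch below assumes a perfectly periodic signal with independent Gaussian noise and an exactly known period (no drift or jitter); the toy pattern, noise level, and function names are all hypothetical.

```python
import random


def chop_and_average(signal, period):
    """Split `signal` into consecutive chops of `period` samples and average them."""
    n_chops = len(signal) // period
    avg = [0.0] * period
    for c in range(n_chops):
        for i in range(period):
            avg[i] += signal[c * period + i]
    return [v / n_chops for v in avg]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    period = 200
    # Toy per-refresh pattern standing in for one screen frame.
    pattern = [1.0 if (i // 20) % 2 else 0.0 for i in range(period)]
    n_refreshes = 300  # about five seconds at 60 refreshes per second
    noisy = [pattern[i % period] + rng.gauss(0, 1.0)
             for i in range(period * n_refreshes)]
    print("single chop error:  ", rms_error(noisy[:period], pattern))
    print("averaged chop error:", rms_error(chop_and_average(noisy, period), pattern))
```

With independent noise, averaging N chops reduces the noise standard deviation by about a factor of √N, so the averaged chop is much closer to the underlying pattern than any single chop.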
Leveraging redundancy: challenges • Drift: chops start at 0, 1/60+𝜗, 2/60+2𝜗, 3/60+3𝜗, 4/60+4𝜗 sec • Jitter (+ anomalous refresh cycles): chops start at 0, 1/60+𝜗, ??, ??+1/60+𝜗 sec
Leveraging redundancy: our approach • Naïve approaches do not work • High-level idea: – Choose a “master” chop that correlates well with its consecutive one – Extract chops chronologically, starting with the master – Automatically account for minor drift on-the-fly using a correlation test – If correlation becomes very low (indicating jitter encountered), re-synchronize with master chop via correlation analysis [Figure: our approach vs. ground truth]
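The high-level idea above can be sketched as follows. This is an illustrative reading of the bullets, not the authors' implementation. For brevity the master chop is simply the first `period` samples (the slide instead picks a master that correlates well with its successor), and the parameter names `drift_radius` and `min_corr` and their defaults are assumptions.

```python
import math


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def best_start(signal, master, center, radius):
    """Start index within `radius` of `center` whose chop best matches `master`."""
    p = len(master)
    best, best_corr = None, -2.0
    lo, hi = max(0, center - radius), min(len(signal) - p, center + radius)
    for s in range(lo, hi + 1):
        c = correlation(signal[s:s + p], master)
        if c > best_corr:
            best, best_corr = s, c
    return best, best_corr


def extract_chops(signal, period, drift_radius=2, min_corr=0.5):
    """Return chop start indices, tracking drift and re-synchronizing on jitter."""
    master = signal[:period]  # simplification: first chop acts as the master
    starts = [0]
    while starts[-1] + 2 * period <= len(signal):
        expected = starts[-1] + period
        # Minor drift: look only a few samples around the expected start.
        s, c = best_start(signal, master, expected, drift_radius)
        if c < min_corr:
            # Very low correlation suggests jitter: search a wider range.
            s, c = best_start(signal, master, expected, period // 2)
        if s is None or c < min_corr:
            break  # could not re-synchronize; stop rather than guess
        starts.append(s)
    return starts
```

Once the start indices are known, the aligned chops `signal[s:s + period]` can be averaged sample by sample to obtain the denoised trace.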
How? 1. denoising 2. ML-based attacks • Website distinguishing • On-screen keyboard snoop • Text extraction
ML-based attacker: website distinguishing • Off-line phase: display different websites on the attacker’s screen (simulate attack with known websites) → denoise → training traces → neural network training • Attack time: victim’s screen → denoise → victim’s trace → inference → victim’s website
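The two phases can be sketched with a deliberately simple stand-in classifier. The talk trains a neural network; the nearest-centroid classifier below is swapped in only to keep the example short and dependency-free, and the site names and toy traces are hypothetical.

```python
def train(traces_by_site):
    """Off-line phase: average the denoised training traces of each website."""
    centroids = {}
    for site, traces in traces_by_site.items():
        n = len(traces)
        centroids[site] = [sum(column) / n for column in zip(*traces)]
    return centroids


def infer(centroids, trace):
    """Attack time: return the website whose centroid is closest to the trace."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda site: dist2(centroids[site], trace))


if __name__ == "__main__":
    # Toy denoised traces recorded from the attacker's own screen (hypothetical).
    training = {
        "a.example": [[0.0, 0.0, 1.0], [0.0, 0.2, 0.9]],
        "b.example": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    }
    model = train(training)
    print(infer(model, [0.1, 0.0, 0.8]))  # a.example
```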
Website distinguishing: results (columns: attacker, accuracy, websites, traces per website; attacker column shown as images) • 97%, 97 websites, 100x5s • 90%, 97 websites, 100x5s • 91%, 97 websites, 100x5s • 99.4%, 10 sites + Hangouts window, 300x6s (video-chat window vs. surfing the Web)
How? 1. denoising 2. ML-based attacks • Website distinguishing • On-screen keyboard snoop • Text extraction
On-screen keyboards • Considered “safe” against audio-recording attacks on physical keyboards [AA’04, BWY’06, VP’09, HS’12, BCV’08, HS’15, ZZT’09, CCLT’17] • Sometimes required for security, e.g., by online banking websites
Attack time: victim’s screen → denoise → victim’s trace → inference → victim’s key (instead of victim’s website)
Results: keyboard snooping 1 (columns: attacker, screen layout, key accuracy, key top-3 accuracy) • 40.8% accuracy, 71.9% top-3 accuracy • 96.4% accuracy, 99.6% top-3 accuracy • Extract whole words with high accuracy?
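For readers unfamiliar with the metric, top-3 accuracy counts a prediction as correct when the true key is among the classifier's three highest-scoring candidates. The sketch below uses made-up scores for three key presses; the function name and data are hypothetical.

```python
def top_k_accuracy(scores, truths, k):
    """Fraction of samples whose true label is among the k highest-scoring labels.

    scores: one dict per sample mapping label -> classifier score.
    truths: the true label of each sample.
    """
    hits = 0
    for per_label, truth in zip(scores, truths):
        ranked = sorted(per_label, key=per_label.get, reverse=True)
        hits += truth in ranked[:k]
    return hits / len(truths)


if __name__ == "__main__":
    scores = [
        {"q": 0.5, "w": 0.3, "e": 0.2},
        {"q": 0.1, "w": 0.2, "e": 0.7},
        {"q": 0.4, "w": 0.35, "e": 0.25},
    ]
    truths = ["q", "w", "e"]
    print(top_k_accuracy(scores, truths, 1))
    print(top_k_accuracy(scores, truths, 2))
```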
Results: keyboard snooping 2 (grouping horizontally-aligned keys) (columns: attacker, screen layout, word contained in small “prediction set”) • 94% • 98%
How? 1. denoising 2. ML-based attacks • Website distinguishing • On-screen keyboard snoop • Text extraction
ML-based attacker: text extraction • Attack time: victim’s screen → denoise → victim’s trace → inference → ??? (instead of victim’s website) • “Open-world” domain, cannot directly apply classifier
Extracting on-screen text • Idea: 1. Train separate classifier for each character location → Up to 98% per-character accuracy 2. Error-correction exploiting natural language redundancy → Exact word extracted with probability >1/2 • Some limitations: large monospace font, known layout …
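One simple way to picture the error-correction step is a dictionary lookup over the per-location classifier outputs. The slide does not say how the correction is done, so everything here is an assumption: the word list, the probability floor, and the scoring by summed log-probabilities are illustrative only.

```python
import math


def correct_word(char_probs, dictionary, floor=1e-6):
    """Pick the dictionary word most likely under per-location character probabilities.

    char_probs: one dict per character location mapping character -> probability
                (the output of that location's classifier).
    floor: probability assigned to characters a classifier did not score.
    """
    best_word, best_score = None, -math.inf
    for word in dictionary:
        if len(word) != len(char_probs):
            continue
        score = sum(math.log(probs.get(ch, floor))
                    for ch, probs in zip(word, char_probs))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


if __name__ == "__main__":
    # The per-location best guesses spell "cbt", which is not a word.
    char_probs = [
        {"c": 0.9, "e": 0.1},
        {"b": 0.6, "a": 0.4},
        {"t": 0.8, "l": 0.2},
    ]
    print(correct_word(char_probs, ["cat", "eel", "dog", "cats"]))  # cat
```

The point of the sketch is that a wrong guess at one character location can be overridden when only one real word is consistent with the remaining locations.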
Cross-screen train-test • Can we train on one screen and attack another screen? • Off-line phase: display different websites on the attacker’s screen (simulate attack with known websites) → denoise → training traces → neural network training • Attack time: victim’s screen → denoise → victim’s trace → inference → victim’s website
Are traces from different screens similar? [Plot: amplitude vs. T (sec) for screens S1 and S2]
Learning from multiple screens • Challenge: overfitting to training screen • Idea: learn from multiple screens • Trend: more training screens → higher accuracy • Up to 94% accuracy (distinguishing between 25 websites, training on up to 10 screens)
cs.tau.ac.il/~tromer/synesthesia • Microphones are ubiquitous • Audio is commonly shared and stored • It conveys on-screen content • A thousand words are worth a picture