Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer
Authors: Zhongjie Ba, Tianhang Zheng, Xinyu Zhang, Zhan Qin, Baochun Li, Xue Liu and Kui Ren
Presenter: Shiqing Luo
Smartphone Sensors
• Permission required
  • Voice Sensor – Microphone
  • Image Sensor – Camera
• No permission needed
  • Motion Sensor – Gyroscope, Accelerometer
  • Magnetic Sensor – Magnetometer
Motion Sensor Threat to Speech Privacy
• A smartphone gyroscope can pick up surface vibrations incurred by an independent loudspeaker placed on the same table (Michalevsky et al., Usenix 2014).
• Gyroscopes are (lousy but still) microphones.
  • Very low signal-to-noise ratio
  • Low sampling frequency

Recognition results reported by Michalevsky et al.:
Speaker            | Speaker Identification | Digit Recognition
Mixed Female/Male  | 50%                    | 17%
Female speakers    | 45%                    | 26%
Male speakers      | 65%                    | 23%
Motion Sensor Threat to Speech Privacy
• Only loudspeaker-rendered speech signals traveling through a solid surface can create noticeable impacts on motion sensors (Anand et al., S&P 2018).
[Figure: gyroscope and accelerometer responses, through a shared surface vs. through air]
• The threat does not go beyond the Loudspeaker-Same-Surface setup studied by Michalevsky et al.
Commonly Believed Limitations
• Can only pick up a narrow band of speech signals
  • Android has a sampling ceiling of 200 Hz
  • iOS has a sampling ceiling of 100 Hz
  • Fundamental frequency range of human speech: 85-180 Hz (adult males), 165-255 Hz (adult females)
• Does not go beyond the Loudspeaker-Same-Surface setup
• Very low SNR (Signal-to-Noise Ratio)
• Sensitive to sound angle of arrival
<latexit sha1_base64="FsdzMpQszuC+0z1Y2gmbgrn9ioY=">ADinicfZLdbtMwFMfdBNhaBuvgkhuLiobqiRt103cTIDUCjGpMLoNVXlOG5rzbEjxwFK1nfhmbjbThpw0TbsRNF+euc3/mwc4JY8MQ4zu+SZd+7/2Bnt1x5uPfo8X714Ml5olJN2YAqofRlQBImuGQDw41gl7FmJAoEuwiu3ubxi69MJ1zJz2Yes1FEpJPOCUGXOD0s+yH7Apl5khQSqIXmTX4prCs6iUy/4srwviVIVM4Dr+woiGzxmJYDg5xZ+IYdj31GjcLdVx57jHuK6zi492ODgfwkhfT3zTuxD13cbeMCwXvNqaXkm+M49N8kO/bN1r39H17GjFdaBmy8vBda6r1FQw3Of4czNFQspLXcbLbr3oUpfqxu07fwXQ4K+Dq6PqnPZHjzP8bVmtNwloa3hVuIGiqsP67+8kNF04hJQwVJkqHrxGaUEW04FWxR8dOExYRekSkbgpQkYskoW67SAr8AT4gnSsMrDV56/83I4A6TeRQAGREzSzZjufO2DA1k6NRxmWcGibpqtEkFRiWJd9LHLNqBFzEIRqDrNiOiOaUAPbW4FLcDePvC3OvYbLQ/tmonb4r2EXP0HP0Ermog05QD/XRAFrx3plHVode8/27GP79Qq1SkXOU7Rm9rs/jM74yw=</latexit> <latexit sha1_base64="yYOIwdWODfxZEK63X1Gki8GTDu8=">AC+HicdZLbtNAFIbHLpcSoKSwZDMiomIV2YW2LOvSlCK1NDRJWxRH0fHkJBl1PLZmxkipmydhwKE2PIo7HgbxonFpQ3HGun3Od8/lzMTpYJr43k/HXfpxs1bt5fvVO7eu7/yoLr68EQnmWLYlI1FkEGgWX2DHcCDxLFUIcCTyNzl8V9dMPqDRPZNtMUuzFMJ8yBkYm+qvOithCMucwNRJkBN80txyew3rYTjYtIK3UBE3qUFgZN18r/NdqC2O5QjugxGKRhWKGlIw+NobuNg+B92H97dHwYHEwtvu5NC78G3T/wtIL4M6bOWg5i3n/5V4Hh42C3Cxn9DfrWxami+m9oNVutNqFoeSDvaD5hw1RDn4fv1+teXVvFvS68EtRI2U0+9Uf4SBhWYzSMAFad30vNb0clOFMoO1ipjEFdg4j7FopIUbdy2cXN6VPbWZAh4myQxo6y/7tyCHWehJHlozBjPXVWpFcVOtmZviyl3OZgYlmy80zAQ1CS1eAR1whcyIiRXAFLd7pWwMCpixb6Vim+BfPfJ1cbJe95/XN969qG3vlO1YJo/JE/KM+GSLbJN90iQdwpzM+eh8dr64F+4n96v7bY6Tul5RP4J9/svAufiTA=</latexit> Our Observations: Sampling Frequency • The actual sampling rates of motion sensors are determined by the performance of the smartphone. • Accelerometers on recent smartphones can cover almost the entire fundamental frequency band (85-255Hz) of adult speech. Sampling frequencies supported by Android [1] Delay Options Delay Sampling Rate Model Year Sampling Rate 200 ms 5 Hz Moto G4 2016 100 Hz DELAY NORMAL 20 ms 50 Hz DELAY UI Samsung J3 2016 100 Hz 60 ms 16.7 Hz DELAY GAME LG G5 2016 200 Hz 0 ms AFAP DELAY FASTEST Huawei Mate 9 2016 250 Hz Samsung S8 2017 420 Hz The 200 Hz sampling ceiling Google Pixel 3 2018 410 Hz no longer exists Huawei P20 Pro 2018 500 Hz Huawei Mate 20 2018 500 Hz [1] “Sensor Overview,” https://developer.android.com/guide/topics/sensors/sensors_overview.
Our Observations: New Setup
• Employs a smartphone's accelerometer to eavesdrop on the speaker in the same smartphone.
• Much Higher SNR
• Sound always arrives from the same direction
[Figure: accelerometer response along the x-axis vs. the z-axis]
Our Observations: New Setup • Employs a smartphone’s accelerometer to eavesdrop on the speaker in the same smartphone. • Much Higher SNR • Sound always arrives from the same direction • A smartphone speaker is more likely to reveal sensitive information than an independent loudspeaker.
Threat Model
• Handheld setting
• Table setting
Accelerometer-based Smartphone Eavesdropping
• Preprocessing: convert acceleration signals into spectrograms.
• Speech Recognition: convert spectrograms to text.
• Speech Reconstruction: reconstruct voice signals from spectrograms.
Preprocessing
• Problems in Raw Acceleration Signals
  • Raw accelerometer measurements are not sampled at a fixed interval.
  • Raw accelerometer measurements can be distorted by human movement.
  • Raw accelerometer measurements may capture multiple digits and need to be segmented.

Time (ms) | x-axis (m/s²) | y-axis (m/s²) | z-axis (m/s²)
1         | -0.2130       | -0.1410       | 10.0020
2         | -0.1870       | -0.1440       | 9.9970
3         | -0.2110       | -0.1510       | 9.9970
5         | -0.2110       | -0.1410       | 10.0070
8         | -0.2080       | -0.1340       | 10.0120
10        | -0.2150       | -0.1320       | 10.0070
Step 1: Generate Sanitized Single-word Signals
• Interpolation
  • Upsample accelerometer signals to 1000 Hz using linear interpolation.

Raw measurements (before interpolation):
Time (ms) | x-axis (m/s²) | y-axis (m/s²) | z-axis (m/s²)
1         | -0.2130       | -0.1410       | 10.0020
2         | -0.1870       | -0.1440       | 9.9970
3         | -0.2110       | -0.1510       | 9.9970
5         | -0.2110       | -0.1410       | 10.0070
8         | -0.2080       | -0.1340       | 10.0120
10        | -0.2150       | -0.1320       | 10.0070
Step 1: Generate Sanitized Single-word Signals
• Interpolation (see the sketch below)
  • Upsample accelerometer signals to 1000 Hz using linear interpolation.

Interpolated measurements (missing samples filled in):
Time (ms) | x-axis (m/s²) | y-axis (m/s²) | z-axis (m/s²)
1         | -0.2130       | -0.1410       | 10.0020
2         | -0.1870       | -0.1440       | 9.9970
3         | -0.2110       | -0.1510       | 9.9970
4         | -0.2110       | -0.1460       | 10.0020
5         | -0.2110       | -0.1410       | 10.0070
6         | -0.2100       | -0.1387       | 10.0087
7         | -0.2090       | -0.1363       | 10.0103
8         | -0.2080       | -0.1340       | 10.0120
9         | -0.2115       | -0.1330       | 10.0095
10        | -0.2150       | -0.1320       | 10.0070
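The interpolation step can be reproduced with a few lines of NumPy. The sketch below is an illustration under stated assumptions (single axis, timestamps in milliseconds, target grid of 1 ms, i.e. 1000 Hz); function and variable names are mine, not the paper's.

```python
# Minimal sketch of the interpolation step, assuming `signal` holds one
# accelerometer axis and `t_ms` its (non-uniform) timestamps in milliseconds.
import numpy as np

def upsample_to_1khz(t_ms, signal):
    """Resample one accelerometer axis onto a uniform 1000 Hz (1 ms) grid."""
    t_uniform = np.arange(t_ms[0], t_ms[-1] + 1)           # 1 ms spacing -> 1000 Hz
    return t_uniform, np.interp(t_uniform, t_ms, signal)   # linear interpolation

# Toy example with the z-axis values from the table above
t = np.array([1, 2, 3, 5, 8, 10])
z = np.array([10.0020, 9.9970, 9.9970, 10.0070, 10.0120, 10.0070])
t_u, z_u = upsample_to_1khz(t, z)
print(z_u)  # values at t = 4, 6, 7, 9 ms are filled in by interpolation
```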
Step 1: Generate Sanitized Single-word Signals
• Interpolation
  • Upsample accelerometer signals to 1000 Hz using linear interpolation.
• High-pass filter (see the sketch below)
  • Convert the acceleration signal along each axis to the frequency domain and eliminate frequency components below 80 Hz.
  • Fundamental frequency range of human speech: 85-180 Hz (adult males), 165-255 Hz (adult females)
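One straightforward way to realize the described high-pass filter is to zero the low-frequency FFT bins and transform back. The sketch below assumes a single-axis signal already resampled to 1000 Hz; the exact filtering method used by the authors may differ.

```python
# Minimal sketch of the high-pass filtering step, assuming x is a single-axis
# acceleration signal resampled to fs = 1000 Hz. Frequency components below
# 80 Hz are discarded in the frequency domain, as described on the slide.
import numpy as np

def highpass_80hz(x, fs=1000, cutoff=80.0):
    """Zero all frequency components below `cutoff` Hz and return the filtered signal."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs < cutoff] = 0.0           # remove gravity / human-movement components
    return np.fft.irfft(spectrum, n=len(x))
```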
Step 1: Generate Sanitized Single-word Signals
• Interpolation
  • Upsample accelerometer signals to 1000 Hz using linear interpolation.
• High-pass filter
  • Convert the acceleration signal along each axis to the frequency domain and eliminate frequency components below 80 Hz.
[Figure: filtered signals in the table setting and the handheld setting]
Step 1: Generate Sanitized Single-word Signals
• Interpolation
  • Upsample accelerometer signals to 1000 Hz using linear interpolation.
• High-pass filter
  • Convert the acceleration signal along each axis to the frequency domain and eliminate frequency components below 80 Hz.
• Segmentation (see the sketch below)
  • Calculate the magnitude of the acceleration signal and smooth the obtained magnitude sequence with a moving average.
  • Locate all regions with magnitudes higher than a threshold.
[Figure: segmented signals in the table setting and the handheld setting]
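A minimal sketch of the segmentation logic described above, assuming an (N, 3) array of filtered samples at 1000 Hz; the window length and threshold are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of the segmentation step: smooth the per-sample magnitude with
# a moving average and keep the regions that exceed a threshold.
import numpy as np

def segment_words(acc, win=100, threshold=0.02):
    """Return (start, end) sample indices of regions whose smoothed magnitude exceeds `threshold`."""
    magnitude = np.linalg.norm(acc, axis=1)                 # per-sample magnitude
    kernel = np.ones(win) / win
    smoothed = np.convolve(magnitude, kernel, mode="same")  # moving average
    active = smoothed > threshold                           # candidate single-word regions
    edges = np.diff(active.astype(int))
    starts = np.where(edges == 1)[0] + 1
    ends = np.where(edges == -1)[0] + 1
    if active[0]:
        starts = np.r_[0, starts]
    if active[-1]:
        ends = np.r_[ends, len(active)]
    return list(zip(starts, ends))
```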
Step 2: Generate Spectrogram Images
• Signal-to-spectrogram conversion (see the sketch below)
  • Divide the signal into multiple short segments with a fixed overlap.
  • Window each segment with a Hamming window and calculate its spectrum through the STFT (Short-Time Fourier Transform).
  • Three spectrograms can be obtained for each single-word signal.
[Figure: spectrograms in the table setting and the handheld setting]
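The per-axis spectrograms can be computed with SciPy's STFT. The sketch below assumes a 1000 Hz single-word signal; the segment length and overlap are illustrative and may not match the paper's settings.

```python
# Minimal sketch of the signal-to-spectrogram conversion for one axis of one
# single-word signal, using a Hamming window and overlapping segments.
import numpy as np
from scipy.signal import stft

def axis_spectrogram(x, fs=1000, nperseg=128, noverlap=120):
    """Hamming-windowed STFT magnitude spectrogram of one acceleration axis."""
    freqs, times, Z = stft(x, fs=fs, window="hamming",
                           nperseg=nperseg, noverlap=noverlap)
    return np.abs(Z)  # m x n magnitude spectrogram (frequency bins x time frames)
```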
Step 2: Generate Spectrogram Images
• Signal-to-spectrogram conversion
  • Divide the signal into multiple short segments with a fixed overlap.
  • Window each segment with a Hamming window and calculate its spectrum through the STFT (Short-Time Fourier Transform).
  • Three spectrograms can be obtained for each single-word signal.
• Generate Spectrogram-Images (see the sketch below)
  • Fit the three m x n spectrograms into one m x n x 3 tensor.
  • Take the square root of all the elements in the tensor and map the obtained values to integers between 0 and 255.
  • Export the m x n x 3 tensor as an image in PNG format.
[Figure: spectrogram images in the table setting and the handheld setting]
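A minimal sketch of the spectrogram-image generation, assuming the three per-axis magnitude spectrograms are already available; Pillow is used here for the PNG export, which is an implementation choice rather than something stated in the slides.

```python
# Minimal sketch: pack the three per-axis spectrograms of one word into a
# single m x n x 3 tensor, rescale it to [0, 255], and export it as a PNG.
import numpy as np
from PIL import Image

def spectrograms_to_png(spec_x, spec_y, spec_z, path="word.png"):
    """Fit three m x n spectrograms into one m x n x 3 tensor and export it as a PNG image."""
    tensor = np.stack([spec_x, spec_y, spec_z], axis=-1)  # m x n x 3
    tensor = np.sqrt(tensor)                              # compress the dynamic range
    tensor = 255.0 * tensor / tensor.max()                # map values into [0, 255]
    Image.fromarray(tensor.astype(np.uint8)).save(path)
```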
<latexit sha1_base64="gr8gdQU1LkMOStjg+tgd8EtEyJ8=">ACRHicbVDLSgMxFM3UVx1fVZdugkWoUIcZH+hGKLrpsoJ9QDuUTJpQzMPkoxYhvk4N36AO7/AjQtF3IqZaYU+vBy7jnkpvjhIwKaZqvWm5peWV1Lb+ub2xube8UdvcaIog4JnUcsIC3HCQIoz6pSyoZaYWcIM9hpOkMb1O9+UC4oIF/L0chsT3U96lLMZK6hbaHQ/JgePGj0mXwWv412Y3RiyuJkoTdvMpzqrLJhGDNMzE6sxD7W9W6haBpmVnARWBNQBJOqdQsvnV6AI4/4EjMkRNsyQ2nHiEuKGUn0TiRIiPAQ9UlbQR95RNhxFkICjxTg27A1fElzNjpiRh5Qow8RznTXcW8lpL/ae1Iuld2TP0wksTH4fciEZwDR2KOcYMlGCiDMqdoV4gHiCEuVexqCNf/lRdA4Nawz4+LuvFi5mcSRBwfgEJSABS5BVRBDdQBk/gDXyAT+1Ze9e+tO+xNadNZvbBTGk/v83dsTI=</latexit> Speech Recognition x l = H l ([ x 0 , x 1 , ..., x l − 1 ]) • DenseNet: • Direct connections between each layer • Fewer nodes and parameters • Comparable performance with VGG & ResNet Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition . 2017.