Metamorph: Injecting Inaudible Commands into Over-the-air Voice Controlled Systems
Tao Chen (1), Longfei Shangguan (2), Zhenjiang Li (1), Kyle Jamieson (3)
(1) City University of Hong Kong, (2) Microsoft, (3) Princeton University
Voice Assistants in Smart Home
111.8 million people in the U.S. use voice assistants and related services!
Source: https://www.emarketer.com/content/us-voice-assistant-users-2019
Are they safe enough?
How to attack the voice assistant?
Voice assistants rely on neural-network Speech Recognition (SR) models.
Audio clip I: “This is for you”, with transcript T = SR(I)
Perturbation: δ
Adversarial example: I + δ, with target transcript T′: “Open the door”
Audio Adversarial Attack:
$\text{minimize } dB_I(\delta) \quad \text{such that } SR(I) = T,\; SR(I + \delta) = T'$
Nicholas Carlini et al. Audio Adversarial Examples, Deep Learning and Security Workshop, 2018
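The distortion term dB_I(δ) in the attack above can be sketched as follows. This is a minimal illustration, assuming the peak-level definition used in the cited work, dB(x) = max_i 20·log10|x_i| and dB_I(δ) = dB(δ) − dB(I); the function names and sample values are hypothetical and not taken from the talk.

```python
import numpy as np

def db(x: np.ndarray) -> float:
    """Peak level of a waveform in decibels: max_i 20*log10(|x_i|)."""
    return float(20.0 * np.log10(np.max(np.abs(x))))

def db_relative(clip: np.ndarray, delta: np.ndarray) -> float:
    """Loudness of the perturbation relative to the clip, dB_I(delta)."""
    return db(delta) - db(clip)

# Hypothetical toy waveforms: a clip with peak 1.0 and a perturbation with peak 0.01.
clip = np.array([0.0, 0.5, -1.0, 0.25])
delta = np.array([0.01, -0.005, 0.0, 0.002])

relative_db = db_relative(clip, delta)  # more negative means quieter perturbation
```

A more negative value means the perturbation is quieter relative to the clip, which is what the minimization pushes toward.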
Is it a real threat? Yes!
But the adversarial example fails over the air!
Challenge
Channel effect: multi-path, attenuation, hardware heterogeneity
SR(I + δ) vs. SR(H(I + δ)): H is unknown in advance!
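To make the gap between SR(I + δ) and SR(H(I + δ)) concrete, the sketch below models H as a linear filter, i.e. convolution with a channel impulse response (CIR). That modeling choice, the function name and the CIR taps are assumptions for illustration only. It shows that the perturbation arriving at the microphone is H(δ), not the δ that was designed.

```python
import numpy as np

def apply_channel(signal: np.ndarray, cir: np.ndarray) -> np.ndarray:
    """Model the channel H as convolution with an impulse response."""
    return np.convolve(signal, cir)

rng = np.random.default_rng(0)
clip = rng.standard_normal(64)          # stand-in for the audio clip I
delta = 0.01 * rng.standard_normal(64)  # stand-in for the perturbation

# Hypothetical CIR: a direct path plus two weaker, delayed reflections.
cir = np.zeros(16)
cir[0], cir[5], cir[11] = 1.0, 0.4, -0.2

received = apply_channel(clip + delta, cir)
received_delta = apply_channel(delta, cir)

# Linearity: H(I + delta) = H(I) + H(delta).
is_linear = np.allclose(received, apply_channel(clip, cir) + received_delta)

# Relative change of the perturbation caused by the channel.
distortion = np.linalg.norm(received_delta[:64] - delta) / np.linalg.norm(delta)
```

Because the attacker cannot measure `cir` for the victim's room in advance, a perturbation tuned for the identity channel is not the one the recognizer hears.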
Understand Over-the-air Attack
Channel effect: multi-path, attenuation, hardware heterogeneity
Attenuation
Attenuated signal → normalization → “Open the door”
No frequency selectivity, so attenuation doesn’t matter at all!
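A small sketch of why frequency-flat attenuation is harmless: scaling every sample by the same factor is undone by normalization. Peak normalization is used here as a stand-in for whatever normalization the recognizer applies, and the test signal and attenuation factor are hypothetical.

```python
import numpy as np

def peak_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a waveform so that its largest absolute sample equals 1."""
    return x / np.max(np.abs(x))

# Hypothetical two-tone test signal.
t = np.linspace(0.0, 1.0, 1600, endpoint=False)
clip = 0.8 * np.sin(2 * np.pi * 5 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)

# Frequency-flat attenuation: every frequency is scaled by the same factor.
attenuated = 0.05 * clip

restored = peak_normalize(attenuated)
same_after_normalization = np.allclose(restored, peak_normalize(clip))
```

A frequency-selective channel would not cancel this way, which is why multi-path is treated separately in the slides.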
Understand Over-the-air Attack
Channel effect: noise, multi-path, attenuation, hardware heterogeneity
Hardware Heterogeneity
Anechoic chamber testing: transmitter and receiver surrounded by anechoic materials
Not strong; an inherent feature of the device; compensable!
[Figure: Character Successful Rate (CSR)]
Static, predictable and compensable!
Multi-path
Setup: HIVI M200MK3 speaker → over-the-air channel → SAMSUNG S7, with the distance measured by a ruler
Multi-path: Near range
Environments: office, corridor, home
Tx to Rx: from 0.5 m to 8 m
[Figure: I/Q diagram showing the LOS path, Reflection1, Reflection2 and the superimposed signal]
Also not strong and similar!
Multi-path: Long range
Environments: office, corridor, home
Tx to Rx: from 0.5 m to 8 m
[Figure: I/Q diagram showing the LOS path, Reflection1, Reflection2 and the superimposed signal]
Stronger and unpredictable!
[Figure: Character Successful Rate (CSR)]
Highly unpredictable!
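The I/Q picture of a LOS path plus reflections can be sketched numerically: each path contributes a phasor a_k·exp(−j2πf·τ_k), and the received gain at frequency f is the magnitude of their sum. The path lengths and amplitudes below are hypothetical values chosen only to contrast weak reflections with strong ones; they are not measurements from the talk.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

def superimposed_gain(freqs_hz, path_lengths_m, path_amplitudes):
    """Magnitude of the sum of per-path phasors at each frequency."""
    freqs = np.asarray(freqs_hz, dtype=float)[:, None]
    delays = np.asarray(path_lengths_m, dtype=float)[None, :] / SPEED_OF_SOUND
    amps = np.asarray(path_amplitudes, dtype=float)[None, :]
    phasors = amps * np.exp(-2j * np.pi * freqs * delays)
    return np.abs(phasors.sum(axis=1))

freqs = np.linspace(100.0, 8000.0, 400)

# Near range (hypothetical): the LOS path dominates, reflections are weak.
near = superimposed_gain(freqs, [0.5, 3.1, 4.7], [1.0, 0.10, 0.05])
# Long range (hypothetical): reflections are comparable to the LOS path.
far = superimposed_gain(freqs, [8.0, 9.3, 10.9], [1.0, 0.70, 0.50])

near_spread = near.max() - near.min()  # small: almost frequency-flat
far_spread = far.max() - far.min()     # large: strongly frequency-selective
```

With weak reflections the gain stays close to 1 across the band, whereas strong reflections produce deep, position-dependent notches, consistent with the "stronger and unpredictable" observation.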
Design Inspiration
“Open the door”: I + δ → SR(H(I + δ))
H is unknown, but channels share similarity!
H: public acoustic CIR datasets
$\arg\min_{\delta} \; \alpha \cdot dB_I(\delta) + \frac{1}{M} \sum_{i=1}^{M} \mathrm{Loss}\big(SR(H_i(I + \delta)),\, T'\big)$
[Figure: Transcript and Character Successful Rate]
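The averaged objective above can be sketched as a plain function. This is a minimal illustration under stated assumptions: each H_i is modeled as convolution with one CIR, dB_I(δ) uses a peak-level definition, and `loss_fn` is a hypothetical callable standing in for Loss(SR(·), T′), since a real recognizer is out of scope here.

```python
import numpy as np
from typing import Callable, Sequence

def db(x: np.ndarray) -> float:
    """Peak level in decibels (assumed definition of dB)."""
    return float(20.0 * np.log10(np.max(np.abs(x))))

def robust_objective(
    clip: np.ndarray,
    delta: np.ndarray,
    cirs: Sequence[np.ndarray],
    loss_fn: Callable[[np.ndarray], float],
    alpha: float,
) -> float:
    """alpha * dB_I(delta) + mean over CIRs of loss_fn(H_i(clip + delta))."""
    distortion = db(delta) - db(clip)
    losses = [loss_fn(np.convolve(clip + delta, h)) for h in cirs]
    return alpha * distortion + float(np.mean(losses))

# Toy stand-in loss (hypothetical): mean signal power instead of a real SR loss.
def toy_loss(received: np.ndarray) -> float:
    return float(np.mean(received ** 2))

clip = np.array([1.0, 0.0, 0.0, 0.0])
delta = np.array([0.1, 0.0, 0.0, 0.0])
cirs = [np.array([1.0]), np.array([0.5])]  # two hypothetical one-tap channels

value = robust_objective(clip, delta, cirs, toy_loss, alpha=0.01)
```

In an actual attack the loss would be differentiable with respect to δ and minimized by gradient descent; the sketch only shows how the M channels enter the objective.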
With H drawn from public acoustic CIR datasets, domain (environment-specific) information dominates!
Metamorph: Meta-Enha
Clean domain information:
$\arg\min_{\delta} \; \alpha \cdot dB_I(\delta) + \frac{1}{M} \sum_{i=1}^{M} \mathrm{Loss}\big(SR(H_i(I + \delta)),\, T'\big) - \beta \cdot L_d$
Metamorph: Meta-Qual
• Acoustic Graffiti: distance(δ, N)
• Reducing Perturbation’s Coverage: L1/L2 regularization
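The two Meta-Qual terms can be sketched as simple penalties. The slide does not define the distance or N, so this sketch assumes an L2 distance and treats N as a reference sound template; the function names, weights and values are hypothetical.

```python
import numpy as np

def graffiti_distance(delta: np.ndarray, template: np.ndarray) -> float:
    """Assumed form of distance(delta, N): L2 distance to a reference template."""
    return float(np.linalg.norm(delta - template))

def coverage_penalty(delta: np.ndarray, l1_weight: float, l2_weight: float) -> float:
    """L1/L2 regularization on the perturbation samples."""
    l1 = np.sum(np.abs(delta))
    l2 = np.sum(delta ** 2)
    return float(l1_weight * l1 + l2_weight * l2)

# Hypothetical perturbation that touches only two of four samples.
delta = np.array([0.0, 0.3, 0.0, -0.4])
template = np.zeros(4)

distance = graffiti_distance(delta, template)
penalty = coverage_penalty(delta, l1_weight=1.0, l2_weight=1.0)
```

The L1 term tends to drive many samples of δ to exactly zero, which is one way to shrink the portion of the clip that the perturbation covers.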
Evaluation: Audio Quality
• Examples
Classical music
  Original: [no transcription]
  Meta-Enha: “hello world”
  Meta-Qual: “hello world”
Human speech
  Original: “your son went to serve at a distant place and became a centurion”
  Meta-Enha: “open the door”
  Meta-Qual: “open the door”
Evaluation: Attack Successful Rate
• Attack Target: “DeepSpeech” (White-Box)
• Environment: a multi-path prevalent office
Evaluation: Attack Successful Rate
• Line-of-Sight (LOS) Attack
[Figures: Character Successful Rate; Transcript Successful Rate]
Meta-Enha: > 90% attack successful rate
Evaluation: Attack Successful Rate
• No-Line-of-Sight (NLOS) Attack
[Figures: Character Successful Rate; Transcript Successful Rate]
Meta-Enha: over 85% attack successful rate at 11 of 20 NLOS locations!
Conclusion
1. Investigate over-the-air audio adversarial attacks systematically.
2. Propose a “generate-and-clean” two-phase design and improve the audio quality.
3. Develop a prototype and conduct extensive evaluations.
Visit acoustic-metamorph-system.github.io for more information!