Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations
C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson

Talk by Goran Doychev
Selected Topics in Information Security and Cryptography Seminar
Overview

1 How does VoIP work?
2 Recognizing previously seen phrases
3 Recognizing phrases without example utterances
4 Evaluation
Part 1: How does VoIP work?
How does VoIP work?

• Control channel: SIP, XMPP, Skype
  • negotiates IP ports, supported codecs, etc.
• Voice data: RTP over UDP
• Speech codec: GSM, G.728, iSAC, Speex
Operation of a Codec

audio stream → sampling at 8000 or 16000 samples per second (Hz) → n most recent samples compressed into a packet (usually 20 ms)

Example:
• 16 kHz audio source: n = 320 samples per packet
• 8 kHz audio source: n = 160 samples per packet
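The packet-size arithmetic above can be sketched as follows; the 20 ms frame length is the usual value from the slide:

```python
# Samples per packet = sampling rate (Hz) * frame length (s).
def samples_per_packet(rate_hz: int, frame_ms: int = 20) -> int:
    return rate_hz * frame_ms // 1000

print(samples_per_packet(16000))  # 320
print(samples_per_packet(8000))   # 160
```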
Operation of a Codec (2)

• brute-force search over the entries in a codebook of audio vectors
• find the entry that most closely reproduces the audio packet
• transmit the index of that entry, not the packet itself
  (e.g., audio packet 01001110 → codebook entry 01001110 → output index 0111)
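The codebook lookup can be sketched as a brute-force nearest-entry search. The codebook contents and the use of Hamming distance on bit strings are illustrative stand-ins for a real codec's comparison of audio vectors:

```python
# Toy codebook: audio-vector bit pattern -> transmitted index.
codebook = {
    "01001010": "0110",
    "01001110": "0111",
    "01011001": "1000",
    "01011010": "1001",
}

def hamming(a: str, b: str) -> int:
    # number of differing bit positions
    return sum(x != y for x, y in zip(a, b))

def encode(frame: str) -> str:
    # brute-force search for the closest codebook entry
    best = min(codebook, key=lambda entry: hamming(entry, frame))
    return codebook[best]

print(encode("01001110"))  # exact match in the codebook -> "0111"
```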
Operation of a Codec (3)

• Quality of sound depends on the number of entries in the codebook
• Classification of coders according to bit rate:

  Category            Bit-rate range
  High bit-rate       > 15 kbps
  Medium bit-rate     5 to 15 kbps
  Low bit-rate        2 to 5 kbps
  Very low bit-rate   < 2 kbps
Variable Bit Rate

• Variable bit rate (VBR): adaptively choose the bit rate for each packet
• Balances audio quality against bandwidth
• Pays off: in a two-way conversation, a speaker is silent about 63% of the time
Variable Bit Rate (2)

LEAKAGE:
• The bit rate depends on the data being encoded
• e.g., Speex encodes vowel sounds (aa, aw) at a higher bit rate than fricative sounds (f, s)
Part 2: Recognizing previously seen phrases
Problem Description

Given:
• utterances of n phrases
• the packet sizes of one of the phrases: (5k, 7k, 3k, 8k, 12k, 2k, 1k)

Goal:
• recognize the phrase: (5k, 7k, 3k, 8k, 12k, 2k, 1k) → "the phrase"
Profile Hidden Markov Model (HMM)

• Match states - expected distribution of packet sizes at each position in the sequence
• Insert states - emit packets according to some (uniform) distribution; allow "insertion" of additional packets
• Delete states - silent states; allow "omitting" packets
Building a Profile HMM

Initially:
• set Match state emission probabilities to the uniform distribution
• transition probabilities: make Match the most likely transition

Train the HMM using example utterances:
• Apply the Baum-Welch algorithm, which iteratively improves the probability of the training sequences
• Baum-Welch finds only a locally optimal set of parameters ⇒ apply simulated annealing to escape local optima
• Apply Viterbi training to further refine the parameters
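The annealing step can be illustrated with a generic simulated-annealing loop; the bumpy 1-D objective below is a toy stand-in for the HMM likelihood, not the authors' actual training code:

```python
import math
import random

def anneal(score, state, neighbor, t0=1.0, cooling=0.95, steps=200):
    """Maximize score() by annealing: always accept improvements,
    accept worse moves with a temperature-dependent probability."""
    random.seed(0)  # deterministic for the example
    best, best_score = state, score(state)
    cur, cur_score, t = state, best_score, t0
    for _ in range(steps):
        cand = neighbor(cur)
        cand_score = score(cand)
        if cand_score > cur_score or random.random() < math.exp((cand_score - cur_score) / t):
            cur, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = cur, cur_score
        t *= cooling  # lower the temperature
    return best

# Stand-in objective with many local optima; global maximum near x = 2.
f = lambda x: -(x - 2.0) ** 2 + 0.5 * math.cos(8 * x)
x = anneal(f, 0.0, lambda x: x + random.uniform(-0.5, 0.5))
print(round(x, 1))
```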
Searching for a Phrase

Changes to the model:
• Random state - emits packets according to a uniform distribution; matches packets that are not part of the phrase of interest
• Profile Start/End states - match the start/end of the phrase
• from Profile Start: the transition to the first Match state is the most likely
Searching for a Phrase (2)

• Apply the Viterbi algorithm - find the most likely sequence of states to explain the observed packet sizes
• A "hit": a subsequence of states that belong to the profile part of the model
• Evaluate the hit's goodness: for the packet lengths l_i, ..., l_j of the phrase of interest,

  score_{i,j} = log ( Pr[l_i, ..., l_j | Profile] / Pr[l_i, ..., l_j | Random] )

• Discard hits whose score falls below a threshold
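The log-odds score can be sketched as follows, assuming (for simplicity) that the two models assign independent per-length probabilities; both probability tables are made up for illustration:

```python
import math

# Made-up per-packet-length probabilities under the two models.
profile = {3: 0.5, 5: 0.3, 7: 0.2}       # phrase-of-interest model
background = {3: 1/3, 5: 1/3, 7: 1/3}    # "Random" (uniform) model

def log_odds(lengths):
    # log of Pr[lengths | Profile] / Pr[lengths | Random]
    return sum(math.log(profile[l] / background[l]) for l in lengths)

hit = [5, 3, 3]
print(log_odds(hit) > 0.0)  # True: fits the profile better than background
```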
Part 3: Recognizing phrases without example utterances
Phrase Models from Phonemes

• Phonemes - sounds like b, ch, t, s, aa, aw (English has 40 to 60 phonemes)
• Idea: words are built up from concatenated phonemes ⇒ model phonemes instead of whole phrases

Advantages:
• Flexibility: any phrase can be assembled from phoneme models
• Cheaper: no example utterances of the target phrase are needed
Problem Description

Given:
• recordings of all phonemes: aa, ae, ah, ao, aw, ay, b, ch, d, dh, eh, er, ey, f, g, hh, etc.
• the packet sizes of a phrase: (5k, 7k, 3k, 8k, 12k, 2k, 1k)

Goal:
• recognize the phrase: (5k, 7k, 3k, 8k, 12k, 2k, 1k) → "the phrase"
Phrase Models from Phonemes (2)

Straightforward method:
1 build HMMs for phonemes
2 concatenate them to build word HMMs
3 concatenate word HMMs into a phrase HMM
American English: "the phrase"

(5k, 7k, 1k, 8k, 12k, 2k, 1k)
  ↓
(dh, ah), (f, r, ey, z)
  ↓
("the"), ("phrase")
  ↓
"the phrase"
Scottish English: "the phrase"

(5k, 7k, 1k, 8k, 10k, 2k, 1k)
  ↓
(dh, ah), (f, r, eh, z)
  ↓
("the"), ("frese"?)
  ↓
?
Problem Description

Given:
• recordings of all phonemes: aa, ae, ah, ao, aw, ay, b, ch, d, dh, eh, er, ey, f, g, hh, etc.
• the packet sizes of a phrase: (5k, 7k, 3k, 8k, 12k, 2k, 1k)
• a phonetic pronunciation dictionary

Goal:
• recognize the phrase: (5k, 7k, 3k, 8k, 12k, 2k, 1k) → "the phrase"
Phrase Models from Phonemes (3)

Advanced method:
• build the initial profile HMM for the phrase (as before)
• train it using a synthetic training set
• search for the phrase (as before)

Synthetic training set:
• phrase: "the phrase"
• split into words: "the" "phrase"
• create the list of phonemes: "dh ah" "f r ey z"
• replace phonemes with packet sizes: "9k 20k" "5k 8k 14k 3k"

Improved model: use diphones and triphones instead of words
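Synthetic training-set generation can be sketched as a pronunciation-dictionary lookup followed by sampling a packet size per phoneme; the dictionary entries and size tables below are made-up stand-ins for the real recordings:

```python
import random

# Pronunciation dictionary: word -> phoneme list (illustrative).
pron_dict = {"the": ["dh", "ah"], "phrase": ["f", "r", "ey", "z"]}

# Observed packet sizes per phoneme (illustrative values).
phoneme_sizes = {
    "dh": [9, 10], "ah": [20, 19],
    "f": [5, 6], "r": [8, 8], "ey": [14, 15], "z": [3, 4],
}

def synthesize(phrase, rng):
    """One synthetic packet-size sequence for the phrase."""
    sizes = []
    for word in phrase.split():
        for ph in pron_dict[word]:
            sizes.append(rng.choice(phoneme_sizes[ph]))
    return sizes

rng = random.Random(0)
training_set = [synthesize("the phrase", rng) for _ in range(3)]
print(training_set)  # three synthetic sequences, 6 packets each
```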
Part 4: Evaluation
Experimental Setup

• Use the TIMIT continuous speech corpus
• Concatenate sentences into a "conversation"
• Training of the HMM uses:
  • the TIMIT pronunciation dictionary ("proper" American English)
  • the PRONLEX pronunciation dictionary (more colloquial English)
Evaluation Metrics

• recall: probability that the algorithm finds the phrase
• precision: probability that a reported match is correct
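Both metrics reduce to ratios of counts of true positives (TP), false positives (FP), and false negatives (FN); the example counts below are hypothetical:

```python
def recall(tp: int, fn: int) -> float:
    # fraction of actual phrase occurrences that were found
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    # fraction of reported matches that were correct
    return tp / (tp + fp)

# e.g., 51 of 100 occurrences found; 50 of 100 reports correct
print(recall(51, 49))     # 0.51
print(precision(50, 50))  # 0.5
```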
Results of the Experiment

  recall   precision
  51%      50%

• Some phrases were found with high accuracy: "Young children should avoid exposure to contagious diseases." (recall = 0.99, precision = 1)
• Results vary widely across individual speakers
Robustness to Noise

Using pink noise:
• energy logarithmically distributed across the range of human hearing
• harder for noise-removal algorithms to filter out

  sound   noise   recall   precision
  100%    -       .51      .50
  90%     10%     .39      .40
  75%     25%     .23      .22
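Mixing speech and noise at a given proportion can be sketched as a weighted sum of samples; the signals here are toy random sequences, not real audio, and the mixing rule is an assumption about how the conditions in the table were produced:

```python
import random

def mix(speech, noise, speech_frac):
    """Weighted sum of two equal-length sample sequences."""
    assert len(speech) == len(noise)
    return [speech_frac * s + (1 - speech_frac) * n
            for s, n in zip(speech, noise)]

rng = random.Random(0)
speech = [rng.uniform(-1, 1) for _ in range(160)]  # one 20 ms frame at 8 kHz
noise = [rng.uniform(-1, 1) for _ in range(160)]
mixed = mix(speech, noise, 0.75)  # the "75% sound / 25% noise" condition
print(len(mixed))  # 160
```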