

  1. No Training Hurdles: Fast Training-Agnostic Attacks to Infer Your Typing. Song Fang*, Ian Markwood†, Yao Liu†, Shangqing Zhao†, Zhuo Lu†, Haojin Zhu‡ (*University of Oklahoma, †University of South Florida, ‡Shanghai Jiaotong University)

  2. Background • Typing via a keyboard plays a very important role in our daily life, and an eavesdropper may well want to know what you are typing.

  3. Existing Non-invasive Attacks • Unlike software- or hardware-based keyloggers, non-invasive attacks need no access to the target machine. • General principle: pressing a key causes subtle environmental impacts unique to that key.

  4. Example Attacks • Exploited disturbances: vibration patterns, acoustic features, wireless signal distortions. • Two phases: a training phase builds a model from labeled training data; an attack phase checks unknown environmental disturbances against the trained model to recover keystrokes.

  5. Why Is Training a Hurdle • Training requires knowledge of the pressed keys, yet the attacker has no physical control of the keyboard. • A user may also change typing behaviors over time.

  6. Statistical Methods • Frequency analysis: match the frequencies of observed disturbances against the letter frequency distribution in English (e, t, a, o, i, n, ...). • Drawback: it requires a large amount of text. [Figure: letter frequency distribution in English]

  7. Question: Is it possible to develop a non-invasive keystroke eavesdropping attack that succeeds within a shorter time? • Idea: the self-contained probabilistic structures of words let the attacker make statistical sense of typing disturbances without per-key training.

  8. Wireless Signal Based Attacks • Advantages: ubiquitous deployment of wireless infrastructures; the invisible nature of radio signals; elimination of the line-of-sight requirement. • CSI (channel state information) quantifies the disturbances: H(f, t) = Y(f, t) / X(f, t), where X(f, t) is the signal sent by a public transmitter (Tx) and Y(f, t) is the signal observed at the receiver (Rx). A minimal sketch of this estimate follows.
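As a hedged illustration (not the paper's implementation), the per-subcarrier CSI estimate from a known pilot can be written in a few lines of NumPy; the array names are hypothetical:

```python
import numpy as np

# Hypothetical sketch: per-subcarrier CSI estimate from a known pilot.
# x_pilot:    known transmitted pilot symbols, shape (subcarriers,)
# y_received: symbols observed at the receiver, shape (subcarriers,)
def estimate_csi(x_pilot: np.ndarray, y_received: np.ndarray) -> np.ndarray:
    """One snapshot of H(f) = Y(f) / X(f)."""
    return y_received / x_pilot

# Stacking snapshots over time yields the CSI time series H(f, t), whose
# fluctuations reflect keystroke-induced channel disturbances.
```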

  9. Outline • Motivation • Attack Design • Experiment Results • Conclusion

  10. System Overview • Pipeline: signal → channel estimation → CSI time series → pre-processing (noise removal, signal reduction) → segmentation → CSI samples → CSI word group generation → dictionary demodulation → alphabet matching → keystrokes. • A CSI sample refers to an individual segment corresponding to the action of pressing a key.

  11. CSI Word Group Generation • Pipeline: CSI samples → classification → sorting → segmentation → CSI word groups. • A CSI word group refers to a group of CSI samples comprising each typed word.

  12. Word Classification (classification → sorting → segmentation) • Classification: compute pairwise similarity between CSI samples and place similar samples into the same set (Set 1, Set 2, ...).

  13. Word Classification (classification → sorting → segmentation) • Sorting: sort the resulting sets (Set 1, ..., Set i, ..., Set N) by size, as sketched below.
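A hedged sketch of the classification and sorting steps; `similarity` is a placeholder for the CSI-sample similarity measure the slides assume:

```python
# Greedily group samples whose similarity to a set's representative exceeds
# a threshold, then sort the sets by size (largest first).
def classify_and_sort(samples, similarity, threshold=0.9):
    sets = []
    for s in samples:
        for group in sets:
            if similarity(group[0], s) >= threshold:  # join a similar set
                group.append(s)
                break
        else:
            sets.append([s])                          # start a new set
    return sorted(sets, key=len, reverse=True)

# Toy usage with exact-match "similarity" over characters:
print(classify_and_sort("abcabca", lambda x, y: float(x == y)))
# [['a', 'a', 'a'], ['b', 'b'], ['c', 'c']]
```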

  14. Word Classification (classification → sorting → segmentation) • Segmentation: space-associated CSI samples split the sample stream over time into CSI word groups, which are passed on to dictionary demodulation; the non-space-associated samples in between form the words themselves (see the sketch below).
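A hedged sketch of segmentation over a keystroke sample stream; `is_space` stands in for however space-associated samples are identified:

```python
# Split the per-keystroke sample stream into CSI word groups at
# space-associated samples.
def split_into_word_groups(samples, is_space):
    groups, current = [], []
    for s in samples:
        if is_space(s):
            if current:
                groups.append(current)
            current = []
        else:
            current.append(s)
    if current:
        groups.append(current)
    return groups

print(split_into_word_groups(list("apple pen"), lambda s: s == " "))
# [['a', 'p', 'p', 'l', 'e'], ['p', 'e', 'n']]
```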

  15. Dictionary Demodulation (DD) • DD takes CSI word groups and a list of English words, and comprises feature extraction, joint demodulation, error tolerance (e.g., typos), and handling of non-alphabetical impact.

  16. Feature Extraction • Length L: the number of constituent letters. • Repetition {L, (t_1, ..., t_r)}: r is the number of distinct letters that repeat, and t_i denotes how many times the corresponding letter repeats. • Inter-element relationship matrix M: M_{ij} = 1 if x_i and x_j are the same or similar, and M_{ij} = 0 otherwise.
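To make the three features concrete, here is a hedged sketch computing them on plain words, as one would for the dictionary side; on the CSI side, character equality would be replaced by a sample-similarity test. The function names are illustrative:

```python
from collections import Counter

def length_feature(word: str) -> int:
    return len(word)

def repetition_feature(word: str):
    # (L, sorted tuple of repeat counts t_i for each repeating letter)
    counts = Counter(word)
    repeats = tuple(sorted(t for t in counts.values() if t > 1))
    return (len(word), repeats)

def relationship_matrix(word: str):
    # M[i][j] = 1 if the i-th and j-th letters are the same, else 0.
    return tuple(tuple(int(a == b) for b in word) for a in word)

print(repetition_feature("apple"))   # (5, (2,)): 'p' repeats twice
print(relationship_matrix("pen"))    # identity pattern: no repeated letters
```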

  17. Feature Extraction (Cont’d) • Dictionary: the top 1,500 most frequently used words [1]. • Each selected English word is mapped to a feature (length, repetition, or relationship matrix), partitioning the dictionary into sets (Set 1, Set 2, ...). [1] Mark Davies. “Word frequency data from the Corpus of Contemporary American English (COCA),” http://www.wordfrequency.info/free.asp.

  18. Feature Extraction (Cont’d) • Uniqueness rate = p / T, where p is the number of sets obtained and T is the number of considered words; a higher uniqueness rate means better partitioning (distinguishability). • Results on the dictionary (feature: uniqueness rate, average set cardinality): Length: 0.009, 107. Repetition: 0.042, 24. Relationship matrix: 0.225, 4.
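A hedged sketch of how such numbers could be reproduced for a given word list; `feature` must return a hashable key (e.g., `len`, or a tuple-based relationship matrix):

```python
from collections import defaultdict

# Partition words by a feature; return (uniqueness rate p/T,
# average set cardinality T/p).
def partition_stats(dictionary, feature):
    sets = defaultdict(list)
    for w in dictionary:
        sets[feature(w)].append(w)
    p, T = len(sets), len(dictionary)
    return p / T, T / p

rate, avg = partition_stats(["pen", "hat", "apple", "offer"], len)
print(rate, avg)  # 0.5 2.0 -- two length-sets over four words
```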

  19. Joint Demodulation • Example: o A dictionary W = {‘among’, ‘apple’, ‘are’, ‘hat’, ‘honey’, ‘hope’, ‘old’, ‘offer’, ‘pen’}. o The user types two words: “apple” and “pen”. 1) Compute R_1, the relationship matrix of the first CSI word group. 2) Compute the relationship matrix for each word in W and compare each with R_1. Candidates: “apple” and “offer”.

  20. Joint Demodulation (Cont’d) 3) Likewise, the candidates for the second CSI word group are {“hat”, “old”, “are”, “pen”}. 4) Concatenate the two CSI word groups and compute the relationship matrix R_new of the combined sequence. 5) The candidate set T of the two-word sequence is {“apple||hat”, “apple||old”, “apple||are”, “apple||pen”, “offer||hat”, “offer||old”, “offer||are”, “offer||pen”}. 6) Generate the relationship matrix for each new candidate in T and compare it with R_new. Final result: “apple||pen”.
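A hedged, self-contained reconstruction of this worked example over plain strings; on real CSI word groups, the equality test would be a sample-similarity test rather than character equality:

```python
from itertools import product

def rel_matrix(word: str):
    return tuple(tuple(int(a == b) for b in word) for a in word)

W = ["among", "apple", "are", "hat", "honey", "hope", "old", "offer", "pen"]
typed = ["apple", "pen"]  # stands in for the two observed CSI word groups

# Steps 1-3: per-word candidates by relationship matrix.
cands = [[w for w in W if rel_matrix(w) == rel_matrix(t)] for t in typed]
print(cands)   # [['apple', 'offer'], ['are', 'hat', 'old', 'pen']]

# Steps 4-6: joint matching on the concatenated sequence.
target = rel_matrix("".join(typed))
final = [pair for pair in product(*cands)
         if rel_matrix("".join(pair)) == target]
print(final)   # [('apple', 'pen')]
```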

  21. Joint Demodulation (Cont’d) • Input: m CSI word groups S = {S_1, S_2, ..., S_m}; a dictionary with q words W = {W_1, W_2, ..., W_q}. • Output: a corresponding phrase of m words. • Observation: each CSI word group maps to multiple candidate words, and each candidate yields <CSI sample, letter> mapping information.

  22. Joint Demodulation (Cont’d) • Step 1: find initial candidate words for each CSI word group. Compare the relationship matrix R of the CSI word group with that of each dictionary word: on a match, add the word as a candidate; if no word matches, add the CSI word group to the “undemodulated set” U.

  23. Joint Demodulation (Cont’d) • Step 2 (iteratively), as sketched below: (a) T_i: the concatenation of the first i-1 demodulated CSI word groups, with candidates {T_i1, T_i2, ..., T_ip}. (b) S_i: the i-th CSI word group, with candidates {S_i1, S_i2, ..., S_iq} (from Step 1). (c) Find new candidates for the concatenated CSI word groups: compare the relationship matrix of T_i || S_i with that of each T_ij || S_ik (1 ≤ j ≤ p, 1 ≤ k ≤ q). On a match, add T_ij || S_ik as a candidate for T_{i+1}; if nothing matches, add S_i to U and skip to S_{i+1}.
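A hedged sketch of Steps 1 and 2 as one loop, again over plain strings standing in for CSI word groups (real matching compares CSI-sample similarity, not characters):

```python
from itertools import product

def rel_matrix(seq: str):
    return tuple(tuple(int(a == b) for b in seq) for a in seq)

def joint_demodulate(word_groups, dictionary):
    prefix = ""          # concatenation of demodulated groups so far (T_i)
    candidates = [""]    # joint candidates for T_i
    undemodulated = []   # the set U
    for group in word_groups:
        # Step 1: per-group candidates by relationship matrix.
        step1 = [w for w in dictionary if rel_matrix(w) == rel_matrix(group)]
        # Step 2(c): keep concatenations whose joint matrix still matches.
        joint = [c + w for c, w in product(candidates, step1)
                 if rel_matrix(c + w) == rel_matrix(prefix + group)]
        if joint:
            prefix, candidates = prefix + group, joint
        else:
            undemodulated.append(group)  # add S_i to U, skip to S_{i+1}
    return candidates, undemodulated

print(joint_demodulate(["apple", "pen"],
                       ["among", "apple", "are", "hat", "honey",
                        "hope", "old", "offer", "pen"]))
# (['applepen'], [])
```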

  24. Joint Demodulation (Cont’d) • Alphabet matching: the <CSI sample, letter> mapping established during demodulation can be applied to the remaining CSI word groups and to those in U. • Example: if the user types “deed” || “would” after the mapping is established, the already-mapped samples reveal the letters directly.

  25. Error / Non-Alphabetical Character Tolerance • Abnormal situations: o CSI classification errors: a CSI sample for one letter is wrongly placed in the set of CSI samples for another letter. o Typos and non-alphabetical characters: the affected CSI word groups match invalid words or have no candidates at all. • Consequence: such a CSI word group is added to the set U; left unhandled, this causes cascading discovery failures.

  26. Outline • Motivation • Attack Design • Experiment Results • Conclusion

  27. Experiment Results • Attack system: o a wireless transmitter + a receiver (each a USRP connected to a PC); o the channel estimation algorithm runs at the receiver to extract the CSI for key inference; o dictionary: the top 1,500 most frequently used words. • Target user: o a desktop computer with a Dell SK-8115 USB wired standard keyboard.

  28. Example Recovery Process • Randomly select 5 sentences from the representative English sentences in the Harvard sentences [2]. Input paragraph: The boy was there when the sun rose. A rod is used to catch pink salmon. The source of the huge river is the clear spring. Kick the ball straight and follow through. Help the woman get back to her feet. Step 1, searching results: The boy/box was there when the sun rose. A *** is used to catch **** *****. The source of the huge river is the clear spring. **** the ball straight and follow through. Help the woman get back to her ****. Step 2, recovering words not in the dictionary: (1) rod; (2) pink; (3) salmon; (4) Kick; (5) feet. [2] IEEE Subcommittee on Subjective Measurements. “IEEE Recommended Practice for Speech Quality Measurements,” IEEE Transactions on Audio and Electroacoustics, vol. 17, no. 3 (Sep 1969), pp. 227–246.

  29. Eavesdropping Accuracy • Word recovery ratio = (# of successfully recovered words) / (total # of input words). • Single article recovery (typing a piece of CNN news): the word recovery ratio climbs toward 1 as the number of typed words grows. [Figure: word recovery ratio vs. number of typed words (0–100)]

  30. Impact of CSI Sample Classification Errors • We artificially introduce errors into the groupings. [Figure: word recovery ratio vs. classification success rate (0.4–1.0) for 1500-, 1000-, and 500-word dictionaries]

  31. Overall Recovery Accuracy • L_{WRR>x} denotes the required number of typed words from each article to satisfy the recovery ratio x. [Figure: empirical CDFs of P(L_{WRR>0.8} < L) and P(L_{WRR>0.9} < L) vs. the number L of typed words (0–60)]

  32. Time Complexity Analysis • The comparison of relationship matrices is the dominant part of the demodulation phase. [Figure: number of new matrix comparisons (log scale, 10^0 to 10^5) vs. number of words (0–50) for 1500-, 1000-, and 500-word dictionaries]

  33. Password Entropy Reduction • The higher the entropy, the more the randomness. • 2012 Yahoo! Voices hack [3]: of 342,508 leaked passwords, 98.42% are 12 characters or fewer. [Figure: ratio of letters by key length (6–12 characters)] [3] 2012 Yahoo! Voices hack. https://en.wikipedia.org/wiki/2012_Yahoo!_Voices_hack

  34. Password Entropy Reduction (Cont’d) • Breaking a 9-character password is reduced to guessing 1–5 non-letter characters.
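As a back-of-the-envelope illustration with our own assumed numbers (not from the paper): assume passwords are drawn uniformly from the 95 printable ASCII characters, 52 of which are letters, leaving 43 non-letter characters. If the attack recovers every letter position, only the k unknown non-letter positions contribute entropy:

```latex
% Illustrative figures under the stated assumptions (95 printable ASCII
% characters, 43 of them non-letters), not taken from the paper.
H_{\text{full}} = 9 \log_2 95 \approx 59.1 \text{ bits}, \qquad
H_{\text{residual}} = k \log_2 43 \approx 5.4\,k \text{ bits}, \quad 1 \le k \le 5.
```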

  35. Outline • Motivation • Attack Design • Experiment Results • Conclusion
