
Beyond Credential Stuffing: Password Similarity Models using Neural Networks

  1. Beyond Credential Stuffing: Password Similarity Models using Neural Networks
     Bijeeta Pal*, Tal Daniel+, Rahul Chatterjee*, and Thomas Ristenpart*
     *Cornell Tech  +Technion

  2. Password breaches
     Millions of passwords are leaked every year. In the first half of 2018 alone, about 4.5 billion records were exposed [1].
     [1] "Data breaches compromised 4.5bn records in half year 2018 – Gemalto," The Citizen, October 17, 2018.

  3. Implication of breaches
     Figure: an attacker takes the pair (mark, jicDfba1) from a leaked dataset (mark/jicDfba1, julia/password, tom/abc123, …) and submits it to a server whose authentication database (mark/jicDfba1, charlie/123456, amelie/y567dty56, …) holds the same credentials.
     Prior work: 40% of users reuse passwords [2]. This credential stuffing attack accounts for 90% of login traffic and is the most prevalent form of account compromise [3].
     [2] S. Pearman et al., "Let's go in for a closer look: Observing passwords in their natural habitat," ACM CCS 2017, pp. 295–310.
     [3] Shape Security, "2017 Credential spill report," http://info.shapesecurity.com/rs/935-ZAM-778/images/Shape-2017-Credential-Spill-Report.pdf/, 2018.

  4. Countermeasures
     Figure: a breach notification service finds mark's credentials in the leaked dataset, and the server tells mark to reset the password ("Reset Password!").

  5. Countermeasures (continued)
     Figure: after the reset, the authentication database stores (mark, jicDfba123), a small variation of the leaked jicDfba1, so replaying (mark, jicDfba1) no longer works.

  6. Credential tweaking attacks
     Figure: instead of replaying the leaked pair, the attacker submits variants of it, such as (mark, JicDfba) and (mark, jicDfba123); the latter matches mark's new password.

  7. Our contributions
     Defense: personalized password strength meters (PPSMs)
       • Built using neural-network-based embedding models
       • Robust against all known attacks
       • Fast and lightweight (3 MB)
     Attack: the most damaging credential tweaking attack to date
       • Built using a state-of-the-art deep learning framework
       • Compromises 16% of accounts in fewer than 1,000 guesses
       • Evaluated on real user accounts of a large university

  8. Starting point: breach data
     The dataset, first discovered by 4iQ on the Dark Web [4], contains:
       • 1.4 billion (email, password) pairs
       • 1.1 billion unique emails
       • 463 million unique passwords
       • More than 150 million users with 2 or more passwords
     Example user/password lists: mark: jicDfba1, jicDfba123; julia: password, 123456, 1234567; tom: abcd123, abcd; …
     Users' passwords are often similar: around 10% of distinct password pairs of the same user are within an edit distance of 1.
     [4] J. Casal, "1.4 Billion Clear Text Credentials Discovered in a Single Database," https://medium.com/4iqdelvedeep/1-4-billion-clear-text-credentials-discovered-in-a-single-database-3131d0a1ae14, Dec. 2017.
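The "within an edit distance of 1" statistic can be made concrete with Levenshtein distance. A minimal sketch, assuming unit costs for insertions, deletions, and substitutions and counting each unordered pair of a user's distinct passwords once; `edit_distance` and `fraction_within_one_edit` are hypothetical names, not code from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def fraction_within_one_edit(password_lists: list[list[str]]) -> float:
    """Share of distinct same-user password pairs at edit distance <= 1."""
    close = total = 0
    for passwords in password_lists:
        distinct = sorted(set(passwords))
        for i in range(len(distinct)):
            for j in range(i + 1, len(distinct)):
                total += 1
                close += edit_distance(distinct[i], distinct[j]) <= 1
    return close / total if total else 0.0


# Toy lists from the slide
lists = [["jicDfba1", "jicDfba123"],
         ["password", "123456", "1234567"],
         ["abcd123", "abcd"]]
print(fraction_within_one_edit(lists))  # 0.2
```

On these toy lists only 123456/1234567 is one edit apart (jicDfba1/jicDfba123 needs two insertions), so one of the five pairs qualifies.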

  9. Prior work: manually chosen transformation rules
     Previous work [5][6] guesses a user's new password by applying hand-picked rules to the user's leaked passwords. Limitations:
       • It can't generate new guesses once the rules are exhausted.
       • It might miss similarity patterns, e.g. markFacebook → mark@facebook or markSuperman → marcSuperman.
     [5] A. Das et al., "The tangled web of password reuse," in NDSS, vol. 14, 2014, pp. 23–26.
     [6] D. Wang et al., "Targeted online password guessing: An underestimated threat," in ACM CCS, 2016, pp. 1242–1254.
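To illustrate why a fixed rule list runs dry, here is a minimal sketch of a rule-based guesser. The four rules (toggle first-letter case, append a common suffix, drop trailing digits, leetspeak substitution) are hypothetical examples chosen for illustration, not the rule sets used in [5] or [6].

```python
def rule_based_guesses(leaked: str, limit: int = 10) -> list[str]:
    """Guesses from a small, fixed set of hand-written tweak rules.

    The rules below are hypothetical examples, not the rule sets of [5] or [6].
    """
    guesses: list[str] = []

    def add(guess: str) -> None:
        # Skip empty strings, the leaked password itself, and duplicates
        if guess and guess != leaked and guess not in guesses:
            guesses.append(guess)

    if leaked:
        add(leaked[0].swapcase() + leaked[1:])      # toggle case of first char
    for suffix in ("1", "12", "123", "!"):
        add(leaked + suffix)                        # append a common suffix
    add(leaked.rstrip("0123456789"))                # drop trailing digits
    leet = str.maketrans({"a": "@", "o": "0", "e": "3", "s": "$"})
    add(leaked.translate(leet))                     # leetspeak substitutions
    return guesses[:limit]


print(rule_based_guesses("jicDfba1"))
# ['JicDfba1', 'jicDfba11', 'jicDfba112', 'jicDfba1123', 'jicDfba1!', 'jicDfba', 'jicDfb@1']
```

With these particular rules the guesser stops after seven candidates, and none of them is jicDfba123; a longer guess budget does not help once the rules are exhausted.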

  10. Data-driven approach for learning similarity
      Apply machine learning to users' password lists to build a similarity model $Q(x' \mid x)$, which models the probability that a user selects password $x'$ given their old password $x$.
      Goal: build credential tweaking attacks using $Q(x' \mid x)$.
      Example for $x$ = jicDfba1: $P(x' \mid x)$ is 0.6 for jicDfba123, 0.2 for jicDfba, and 0.1 for JicDfba1.
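Given a model $Q(x' \mid x)$, an online attacker with a budget of q guesses should submit the q most probable variants. A minimal sketch, assuming the model is available as a table of scores; `TOY_Q`, `top_guesses`, and `attack_succeeds` are hypothetical names, and a neural model such as Pass2Path would generate candidates rather than look them up.

```python
import heapq

# Toy model: Q(x' | x) for x = "jicDfba1", using the example probabilities
# from the slide. A real model (e.g. Pass2Path) would generate these scores.
TOY_Q: dict[str, dict[str, float]] = {
    "jicDfba1": {"jicDfba123": 0.6, "jicDfba": 0.2, "JicDfba1": 0.1},
}


def top_guesses(q_model: dict[str, dict[str, float]], leaked: str, q: int) -> list[str]:
    """The q candidates x' with the highest Q(x' | leaked), most likely first."""
    candidates = q_model.get(leaked, {})
    return heapq.nlargest(q, candidates, key=candidates.get)


def attack_succeeds(q_model, leaked: str, target: str, q: int) -> bool:
    """True if the user's new password is among the attacker's q guesses."""
    return target in top_guesses(q_model, leaked, q)


print(top_guesses(TOY_Q, "jicDfba1", 2))                     # ['jicDfba123', 'jicDfba']
print(attack_succeeds(TOY_Q, "jicDfba1", "JicDfba1", q=2))   # False
```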

  11. Training generative similarity models
      Pass2Path: an encoder-decoder architecture built using character-level recurrent neural networks (RNNs). The encoder RNN reads the key-press representation of the old password (jicDfba1) into a vector (e.g. 0.2, -0.1, -0.4, 0.1); the decoder RNN outputs edit operations with probabilities, e.g. <add,2,-1> (0.4) and <add,3,-1> (0.3), which turn jicDfba1 into jicDfba123.
       • Trained on 144 million password pairs
       • Training took 2 days on an Nvidia GTX 1080 GPU and an Intel Core i9 processor
       • The model has 2.4 million parameters and takes 60 MB of space

  12. Simulation-based evaluation
      The user/password lists are split into training data for Pass2Path (144 million (w, w') pairs) and test data (100,000 password pairs, e.g. tom: abc123, ftgKdu45).
      Online credential tweaking attack setting:
       • Given the leaked password $w$, guess $w'$ within $r$ attempts, $r \le 1000$.
       • Report the fraction of passwords guessed.
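The evaluation protocol can be sketched as a loop over test pairs with a fixed guess budget. `fraction_guessed` and the baseline `append_number` guesser are hypothetical; in a real run, `guess_fn` would be Pass2Path or one of the prior-work attacks.

```python
from typing import Callable, Iterable

Guesser = Callable[[str, int], list[str]]


def fraction_guessed(test_pairs: Iterable[tuple[str, str]],
                     guess_fn: Guesser, r: int = 1000) -> float:
    """Fraction of (leaked w, target w') pairs where w' is among r guesses made from w."""
    hits = total = 0
    for leaked, target in test_pairs:
        total += 1
        guesses = guess_fn(leaked, r)[:r]  # never exceed the online budget r
        hits += target in guesses
    return hits / total if total else 0.0


def append_number(leaked: str, r: int) -> list[str]:
    """Hypothetical baseline guesser: leaked + "0", leaked + "1", ..."""
    return [leaked + str(i) for i in range(r)]


pairs = [("jicDfba1", "jicDfba123"), ("abc123", "ftgKdu45")]
print(fraction_guessed(pairs, append_number, r=1000))  # 0.5
print(fraction_guessed(pairs, append_number, r=10))    # 0.0
```

The budget matters: the baseline reaches jicDfba123 only at its 24th guess, so it succeeds with r = 1000 but not with r = 10, while an unrelated pair such as abc123/ftgKdu45 is never guessed.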

  13. Credential tweaking attacks: results
      Figure: percentage of passwords cracked given one leaked password of the user, with $q \le 10$ and $q \le 1000$ guesses, for Pass2Path (our algorithm), Wang et al., and Das et al. The chart annotates increases of 53% and 23% for Pass2Path over the prior attacks; Pass2Path compromises almost 16% of accounts.
      Using multiple leaked passwords, $P(x' \mid x_1, x_2, \ldots)$, a Pass2Path-based attack compromises 23% of accounts (see paper).

  14. Credential tweaking in practice
      There had been no real-world evaluation of credential tweaking attacks. We partnered with Cornell University IT Security (ITSO), whose large-scale authentication system has about 500,000 accounts, uses credential stuffing defenses, and enforces password rules.
       • 19,868 Cornell emails appear in the leaked dataset.
       • We ran our attack on these accounts as an audit: 1,374 active accounts in total are vulnerable.
       • ITSO put the vulnerable accounts under a watchlist.

  15. Defense against these attacks
      To date there are no defenses against credential tweaking attacks:
       • Strength meters such as zxcvbn only consider the population-wide password distribution; zxcvbn rates 71% of vulnerable passwords as strong.
       • Running audits with credential tweaking attacks is expensive.
      Our solution: a personalized password strength meter (PPSM) that warns users when their passwords are vulnerable to credential tweaking attacks.

  16. Personalized password strength meter (PPSM)
      Figure: the breach notification service finds (mark, jicDfba1) in the leaked dataset (mark/jicDfba1, julia/password, …), and the server, whose authentication database holds mark/jicDfba1, charlie/123456, …, sends mark a reset notification.

  17. Personalized password strength meter (PPSM)
      Figure: when mark chooses a new password, the PPSM compares it with mark's leaked password jicDfba1. The similar password jicDfba123 is rejected as weak, while DioWs@194 is accepted.
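The accept/reject flow above can be sketched as a check of the new password against each of the user's leaked passwords. This is a minimal sketch, assuming a similarity score in [0, 1] and a hypothetical threshold of 0.6; it uses Python's `difflib.SequenceMatcher` ratio purely as a stand-in, whereas the PPSM described here relies on a learned password embedding model. `similarity` and `ppsm_verdict` are hypothetical names.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT the paper's embedding model."""
    return SequenceMatcher(None, a, b).ratio()


def ppsm_verdict(new_password: str, leaked_passwords: list[str],
                 threshold: float = 0.6) -> str:
    """Reject a new password that is too similar to any of the user's leaked passwords.

    The 0.6 threshold is a hypothetical value for illustration.
    """
    for old in leaked_passwords:
        if new_password == old or similarity(new_password, old) >= threshold:
            return "weak"
    return "accepted"


print(ppsm_verdict("jicDfba123", ["jicDfba1"]))  # weak
print(ppsm_verdict("DioWs@194", ["jicDfba1"]))   # accepted
```

Unlike population-wide meters, the verdict depends on the user's own leaked passwords: jicDfba123 might look reasonably strong in isolation but is rejected here because it shares the prefix jicDfba1.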

  18. Building PPSMs
      Pass2Path is too big and slow for a PPSM. Instead, a password embedding model (a feed-forward neural network) maps passwords to vectors; the figure places similar passwords near each other (qwerty, QWERTY1, QWERTY; jicDfba1, jicDfba123), apart from others such as Qfhjs3$4fg4 and 123456.
       • The compressed model detects 96% of vulnerable passwords.
       • Easy to deploy (3 MB) and fast (0.3 ms).
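The key idea is that the meter compares fixed-size vectors instead of generating guesses. A minimal sketch of that comparison step, assuming cosine similarity between embeddings; `embed` is a hypothetical stand-in that hashes character trigrams with `zlib.crc32` into a 1,024-dimensional vector, not the trained feed-forward network described above.

```python
import math
import zlib
from collections import Counter


def embed(password: str, n: int = 3, dim: int = 1024) -> list[float]:
    """Hypothetical stand-in for a learned embedding: hashed character n-gram
    counts, L2-normalized. Not the paper's feed-forward network."""
    padded = f"<{password}>"  # boundary markers give prefixes/suffixes own n-grams
    grams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    vec = [0.0] * dim
    for gram, count in grams.items():
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += count  # deterministic hash
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two unit-length vectors is just their dot product."""
    return sum(a * b for a, b in zip(u, v))


old = embed("jicDfba1")
print(round(cosine(old, embed("jicDfba123")), 2))
print(round(cosine(old, embed("DioWs@194")), 2))
```

jicDfba1 and jicDfba123 share seven of their trigrams, so their cosine similarity is high, while DioWs@194 shares none with jicDfba1. A meter in this style embeds the new password once and needs only a dot product per leaked password; the n-gram size, dimension, and any decision threshold are assumptions.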

  19. Beyond credential stuffing
      Modeling the similarity of human-chosen passwords lets us build both a damaging tweaking attack and the first-ever defense against it.
      Attack:
       • Data-driven, using state-of-the-art deep learning
       • Outperforms the best previous attacks
       • 1,374 active user accounts at Cornell University vulnerable
      Defense:
       • PPSM using a password embedding model
       • Prevents credential tweaking attacks
       • Fast and lean (3 MB)
      Thank you!
      Email: bp397@cornell.edu
      Website: cs.cornell.edu/~bijeeta/
      GitHub: github.com/Bijeeta/credtweak
