Investigating the Distribution of Password Choices




  1. Investigating the Distribution of Password Choices David Malone and Kevin Maher, Hamilton Institute, NUI Maynooth. 19 April 2012

  2. How to Guess a Password? Passwords are everywhere. If you don't know the password, can you guess it? 1. Make a list of passwords. 2. Assess the probability that each was used. 3. Guess from most likely to least likely. This is a dictionary attack, but with optimal ordering. (Applies to computers and keys too.)

  3. How long will that take? Suppose we knew the probability $P_i$ of the $i$th password. Rank the passwords from 1 (most likely) to $N$ (least likely). The average number of guesses is then $G = \sum_{i=1}^{N} i P_i$. Note that this is not the same as entropy (Massey ’94, Arikan ’96). Does this $P_i$ really make sense? Is there a distribution with which passwords are chosen?
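
As a rough illustration of the guesswork sum on this slide, the following Python sketch ranks a list of observed passwords by frequency and computes the mean number of guesses under optimal ordering. The function name `guesswork` and the toy sample are assumptions made for illustration; they are not taken from the talk.

```python
from collections import Counter


def guesswork(passwords):
    """Mean number of guesses G = sum_i i * P_i, guessing most likely first.

    `passwords` is one entry per user; P_i is estimated as the empirical
    frequency of the i-th most common password.
    """
    counts = Counter(passwords)
    total = sum(counts.values())
    # Rank 1 is the most common password, rank N the least common.
    ranked = sorted(counts.values(), reverse=True)
    return sum(rank * count / total for rank, count in enumerate(ranked, start=1))


# Toy sample (made up): 10 users, 3 distinct passwords.
sample = ["123456"] * 5 + ["password"] * 3 + ["qwerty"] * 2

# 1*0.5 + 2*0.3 + 3*0.2 = 1.7
print(round(guesswork(sample), 2))
```

For a uniform distribution over $N$ passwords the same sum gives $(N+1)/2$, which is a convenient baseline when reading the guesswork figures.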

  4. Outline • Is there a password distribution? Is knowing it better than a crude guess? • Are there any general features? Do different user groups behave in a similar way? • Some distributions are better than others. Can we help users make better decisions?

  5. Getting data Want a collection of passwords to study the distribution. Asked Yahoo, Google . . . Crackers eventually obliged. • 2006: flirtlife, 98930 users, 43936 passwords, 0.44 passwords per user. • 2009: hotmail, 7300 users, 6670 passwords, 0.91 passwords per user. • 2009: computerbits, 1795 users, 1656 passwords, 0.92 passwords per user. • 2009: rockyou, 32603043 users, 14344386 passwords, 0.44 passwords per user. Good: cleartext! Bad: had to clean up the data.

  6. Top Ten

     Rank  hotmail    #users  flirtlife  #users  computerbits  #users  rockyou    #users
     1     123456     48      123456     1432    password      20      123456     290729
     2     123456789  15      ficken     407     computerbits  10      12345      79076
     3     111111     10      12345      365     123456        7       123456789  76789
     4     12345678   9       hallo      348     dublin        6       password   59462
     5     tequiero   8       123456789  258     letmein       5       iloveyou   49952
     6     000000     7       schatz     230     qwerty        4       princess   33291
     7     alejandro  7       12345678   223     ireland       4       1234567    21725
     8     sebastian  6       daniel     185     1234567       3       rockyou    20901
     9     estrella   6       1234       175     liverpool     3       12345678   20553
     10    1234567    6       askim      171     munster       3       abc123     16648

     (cf. Imperva analysis of Rockyou data, 2010)

  7. Distribution? [Figure: four log-log panels of frequency against binned rank, each with a least-squares fit: hotmail (s=0.44), flirtlife (s=0.69), computerbits (s=0.45), rockyou (s=0.78).]

  8. Zipf? • A straight line on a log-log plot points towards a heavy tail. • Zipf? $P_r \propto 1/r^s$. • The slope gives $s$. • Can check p-values (Clauset ’09). • $s$ is small, less than 1.
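
To make the "slope gives s" step concrete, here is a minimal Python sketch of an ordinary least-squares fit of log frequency against log rank. It is a simplification: the plots in the talk use binned ranks, and this sketch neither bins nor performs the p-value check. The name `zipf_exponent` is hypothetical.

```python
import math


def zipf_exponent(frequencies):
    """Estimate s in P_r ~ 1/r^s by least squares on a log-log scale.

    `frequencies` holds one positive count per distinct password (at least
    two). Returns the negated slope of log(frequency) against log(rank).
    """
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # A falling line has negative slope; Zipf's s is its magnitude.
    return -sxy / sxx


# Synthetic check: frequencies generated with s = 0.5 should fit back to 0.5.
synthetic = [1000.0 * rank ** -0.5 for rank in range(1, 201)]
print(round(zipf_exponent(synthetic), 3))
```

Least squares on raw ranks gives heavy weight to the many rare passwords in the tail, which is presumably one reason the plots bin the ranks first.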

  9. Guesswork Predictions [Figure: four panels (hotmail, flirtlife, computerbits, rockyou) showing the number of guesses for guesswork and 0.85-guesswork under a uniform model, the real data, and a Zipf model.]

  10. Who cares? 1. Algorithm design: exploit the heavy tail? 2. Can we get close to an optimal dictionary attack? 3. Can we make dictionary attacks less effective? Questions 2 and 3 address common behavior and helping users.

  11. Dictionary Attack Suppose we use one dataset as a dictionary to attack another. [Figure: hotmail panel, log-log plot of probability / fraction of users (and number of users) against number of guesses, with curves for optimal guessing and for guesses from rockyou, flirtlife and computerbits, plus a 40% reference line.]
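
The experiment on this slide can be sketched in Python as follows: rank one dataset's passwords by popularity, guess them in that order against a second dataset, and track the fraction of target users covered after each guess. The helper names `rank_by_popularity` and `fraction_cracked` and the toy lists are assumptions for illustration only, not the authors' code or data.

```python
from collections import Counter


def rank_by_popularity(passwords):
    """Distinct passwords, most common first (one input entry per user)."""
    return [pw for pw, _ in Counter(passwords).most_common()]


def fraction_cracked(guess_order, target_passwords, max_guesses=None):
    """Cumulative fraction of target users guessed after each guess.

    `guess_order` must contain each candidate at most once, as produced by
    rank_by_popularity, otherwise users would be counted twice.
    """
    target = Counter(target_passwords)
    total = sum(target.values())
    cracked = 0
    curve = []
    for guess in guess_order[:max_guesses]:
        cracked += target.get(guess, 0)
        curve.append(cracked / total)
    return curve


# Toy datasets (made up), one entry per user.
dictionary_users = ["123456", "123456", "password", "letmein"]
target_users = ["password", "password", "123456", "dublin"]

from_dictionary = fraction_cracked(rank_by_popularity(dictionary_users), target_users)
optimal = fraction_cracked(rank_by_popularity(target_users), target_users)
print(from_dictionary)  # guesses ordered by the other dataset
print(optimal)          # guesses ordered by the target's own distribution
```

Comparing the two curves corresponds to comparing a "guesses from ..." line with the "optimal guessing" line in the figure.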

  12. Dictionary Attack: Same Story [Figure: four log-log panels (hotmail, flirtlife, computerbits, rockyou) of probability / fraction of users (and number of users) against number of guesses. Each panel shows optimal guessing and guesses drawn from the other three datasets, plus a 40% reference line.]

  13. Dictionary Attack: Gawker December 2010, Gawker, 748090 DES hashes, well salted. [Figure: log-log plot of probability / fraction of users (and number of users) against number of guesses, with curves for guesses from rockyou, flirtlife, hotmail, computerbits and a dictionary, plus a 40% reference line.] Results in paper for % passwords. Dell’Amico ’10 reviews smart generators. This looks ×10!

  14. Helping Users If users select passwords ‘randomly’, can we make them a better generator? • Banned list (e.g. Twitter). • Password rules (e.g. numbers and letters). • Act like a cracker (e.g. cracklib). • Cap the peak of the password distribution (e.g. Schechter ’10). • Aim for uniform? The Metropolis-Hastings algorithm takes a bad random number generator and makes it good.

  15. Metropolis-Hastings for Uniform Passwords Keep a frequency table $F(x)$ for requests to use password $x$. 1. Uniformly choose $x$ from all previously seen passwords. 2. Ask user for a new password $x'$. 3. Generate a uniform real number $u$ in the range $[0, F(x')]$ and then increment $F(x')$. If $u \le F(x)$ go to step 4 (accept), otherwise return to step 2 (reject). 4. Accept use of $x'$ as password.
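
The four steps on this slide can be sketched in Python roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `UniformPasswordChooser`, the `propose` callback standing in for "ask the user", the rule that the very first request is always accepted (the slide does not say what happens when no password has been seen yet), and the simulated users in `simulate` are all hypothetical.

```python
import random


class UniformPasswordChooser:
    def __init__(self, rng=None):
        self.freq = {}   # F(x): number of requests seen for each password
        self.seen = []   # distinct passwords, for uniform sampling in step 1
        self.rng = rng or random.Random()

    def choose(self, propose):
        """Run steps 1-4; `propose()` returns the user's next candidate.

        Returns (accepted_password, number_of_tries).
        """
        # Step 1: uniformly choose x from all previously seen passwords.
        x = self.rng.choice(self.seen) if self.seen else None
        tries = 0
        while True:
            candidate = propose()                      # step 2
            tries += 1
            f_new = self.freq.get(candidate, 0)
            u = self.rng.uniform(0, f_new)             # step 3: u in [0, F(x')]
            if candidate not in self.freq:
                self.seen.append(candidate)
            self.freq[candidate] = f_new + 1           # then increment F(x')
            # Assumption: with an empty table there is no x, so accept.
            if x is None or u <= self.freq[x]:
                return candidate, tries                # step 4: accept


def simulate(num_users, seed=1):
    """Hypothetical users: 90% propose "123456", 10% propose "qwerty"."""
    user_rng = random.Random(seed)
    chooser = UniformPasswordChooser(rng=random.Random(seed + 1))

    def propose():
        return "123456" if user_rng.random() < 0.9 else "qwerty"

    accepted = [chooser.choose(propose) for _ in range(num_users)]
    return chooser, accepted


chooser, accepted = simulate(2000)
share = sum(1 for pw, _ in accepted if pw == "123456") / len(accepted)
mean_tries = sum(tries for _, tries in accepted) / len(accepted)
print(f"share of most popular password: {share:.2f}, mean tries: {mean_tries:.2f}")
```

Note that the table counts every request, including rejected ones, so a popular password becomes progressively harder to get accepted whenever the sampled $x$ is a rare one.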

  16. How does it do? [Figure: log-log plot of frequency against rank comparing three schemes: First Choice, Metropolis-Hastings Scheme, and Hard limit 1/1000000.] Rockyou-based test, 1000000 users, mean tries 1.28, variance 0.61. Could be implemented using a count-min sketch. Doesn’t store actual use frequencies. No parameters, aims to flatten the whole distribution.
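
The sketch data structure mentioned on this slide (commonly known as a count-min sketch) could stand in for the frequency table along the following lines. This is a generic, minimal Python sketch of that data structure, not the authors' implementation; the class name, the default `width` and `depth`, and the use of SHA-256 to derive row hashes are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: never underestimates, may overestimate."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One hashed column per row; the row number acts as the hash seed.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # Collisions only ever add to a cell, so the minimum is the best bound.
        return min(self.rows[row][col] for row, col in self._cells(item))


sketch = CountMinSketch()
for _ in range(5):
    sketch.add("123456")
sketch.add("qwerty", 2)
print(sketch.estimate("123456"), sketch.estimate("qwerty"))
```

Only hashed counters are kept, so neither the passwords themselves nor their exact frequencies are stored, which matches the point made on the slide.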

  17. Conclusions • Idea of a distribution of password choices seems useful. • Zipf is OK, but not a perfect match. • Different user groups have a lot in common (though not at the peak). • Dictionaries are not great for dictionary attacks. • Treat users as random password generators? • Future: Generalise beyond web passwords? • Future: Field test of Metropolis-Hastings? • Future: What does an optimal banned list look like?

  18. From Reviews • So much cool literature from (at least) 1979–2012. • In security, passwords are the gift that keeps on giving.
