
A Natural Language Approach to Automated Cryptanalysis of Two-time Pads

Joshua Mason, Kathryn Watkins, Jason Eisner, Adam Stubblefield


  1. A Natural Language Approach to Automated Cryptanalysis of Two-time Pads. Joshua Mason, Kathryn Watkins, Jason Eisner, Adam Stubblefield

  2. The Two Time Pad Problem

  3. Attack at Dawn ⊕ doQvYcSWIPyXaC

  4. Attack at Dawn ⊕ doQvYcSWIPyXaC; Take the Beach ⊕ doQvYcSWIPyXaC

  5. (Attack at Dawn ⊕ doQvYcSWIPyXaC) ⊕ (Take the Beach ⊕ doQvYcSWIPyXaC)

  6. (Attack at Dawn ⊕ doQvYcSWIPyXaC) ⊕ (Take the Beach ⊕ doQvYcSWIPyXaC) = Attack at Dawn ⊕ doQvYcSWIPyXaC ⊕ Take the Beach ⊕ doQvYcSWIPyXaC

  7. (Attack at Dawn ⊕ doQvYcSWIPyXaC) ⊕ (Take the Beach ⊕ doQvYcSWIPyXaC) = Attack at Dawn ⊕ doQvYcSWIPyXaC ⊕ Take the Beach ⊕ doQvYcSWIPyXaC

  8. (Attack at Dawn ⊕ doQvYcSWIPyXaC) ⊕ (Take the Beach ⊕ doQvYcSWIPyXaC) = Attack at Dawn ⊕ Take the Beach

  9. Attack at Dawn ⊕ Take the Beach = 15 15 1f 04 43 1f 48 04 54 62 21 00 14 06
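
The cancellation on slides 3 to 9 can be checked with a minimal Python sketch. The helper name `xor_bytes` is an assumption made for illustration, and the pad is taken to be the ASCII string shown on the slides.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position (truncates to the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))

pad = b"doQvYcSWIPyXaC"   # the reused pad from the slides
p0 = b"Attack at Dawn"
p1 = b"Take the Beach"

c0 = xor_bytes(p0, pad)   # first ciphertext
c1 = xor_bytes(p1, pad)   # second ciphertext, same pad

# XORing the two ciphertexts cancels the pad and leaves p0 XOR p1.
stream = xor_bytes(c0, c1)
hex_stream = " ".join(f"{b:02x}" for b in stream)
print(hex_stream)
```

The attacker never needs the pad: the two ciphertexts alone yield the XOR of the two plaintexts.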

  10. OJNcDfoMncXzYwwQQZRXYWORT190LP

  11. OJNcDfoMncXzYwwQQZRXYWORT190LP ⊕ the

  12. OJNcDfoMncXzYwwQQZRXYWORT190LP ⊕ the → QpL

  13. OJNcDfoMncXzYwwQQZRXYWORT190LP ⊕ the

  14. OJNcDfoMncXzYwwQQZRXYWORT190LP ⊕ the → Man
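
Slides 10 to 14 appear to show a guessed word being tried against the XOR stream at different offsets, with an implausible result (QpL) rejected and a plausible one (Man) kept. A rough sketch of that idea follows. The function names and the "letters and spaces only" plausibility test are assumptions for illustration, and the stream is rebuilt from the two example messages rather than from the string on the slide.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def is_plausible(fragment: bytes) -> bool:
    """Hypothetical filter: accept only ASCII letters and spaces."""
    return all(65 <= c <= 90 or 97 <= c <= 122 or c == 32 for c in fragment)

def crib_drag(stream: bytes, crib: bytes):
    """Try the guessed word at every offset; keep offsets whose result looks like text."""
    hits = []
    for i in range(len(stream) - len(crib) + 1):
        revealed = xor_bytes(stream[i:i + len(crib)], crib)
        if is_plausible(revealed):
            hits.append((i, revealed))
    return hits

stream = xor_bytes(b"Attack at Dawn", b"Take the Beach")
for offset, revealed in crib_drag(stream, b"the"):
    print(offset, revealed)
```

A hit only says the guess is consistent with readable text at that offset. Deciding which hits are real is the manual step that the rest of the talk automates.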

  15. Formalized by F. Rubin in 1978; automated by E. Dawson and L. Nielson in 1996

  16. Assumptions • Uppercase English characters and space • Space is always the most frequent character

  17. P0 ⊕ P1 = 6e 71 00 6f 79 61

  18. P0 ⊕ P1 = 6e 71 00 6f 79 61

  19. P0 ⊕ P1 = 6e 71 6f 79 61

  20. P0 ⊕ P1 = 6e 71 6f 79 61

  21. P1 ⊕ P2 = 67 82 00 00 00 00 00 34

  22. P1 ⊕ P2 = 67 82 00 00 00 00 00 34

  23. P1 ⊕ P2 = 67 82 00 00 00 34
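
Under the two assumptions on slide 16, each byte of the XOR stream falls into one of three cases in ASCII: 00 means both plaintexts hold the same character (most likely a space, given the frequency assumption), a value in 61..7a means one side is a space and the other an uppercase letter, and anything else means two different letters. The slides do not spell out the exact rules used by Dawson and Nielson, so the sketch below is an assumption based on ASCII arithmetic only. The function name `classify` is hypothetical.

```python
def classify(stream: bytes):
    """Label each XOR byte, assuming plaintexts use only A-Z and space."""
    labels = []
    for d in stream:
        if d == 0x00:
            # Identical characters; the heuristic would guess space/space.
            labels.append("same")
        elif 0x41 <= (d ^ 0x20) <= 0x5A:
            # Space XOR uppercase letter; which side holds the space is unknown.
            labels.append("space/" + chr(d ^ 0x20))
        else:
            # Two different uppercase letters.
            labels.append("letter/letter")
    return labels

print(classify(bytes.fromhex("6e71006f7961")))
```

Applied to the bytes on slide 17, every nonzero byte decodes as a space paired with a letter, which is why this heuristic can recover some text but has no way to tell which plaintext the letter belongs to.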

  24. Testing Methodology • Trained on the first 600K characters of the Bible • Attempted recovery of passages from the first 600K characters of the Bible

  25. Percentage Correctly Recovered (Dawson & Nielson) • P0 ⊕ P1: 62.7% • P1 ⊕ P2: 61.5% • P0 ⊕ P1: 62.6%

  26. Percentage Correctly Recovered (Dawson & Nielson / Our Technique) • P0 ⊕ P1: 62.7% / 100% • P1 ⊕ P2: 61.5% / 99.99% • P0 ⊕ P1: 62.6% / 99.96%

  27. Our Assumptions • Plaintext has some structure • Plaintext is in a language we know

  28. n-gram count 2 a 2 p 2 l 1 e 2
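
The n-gram counts on this slide feed the conditional probabilities, such as p(p|a), used on later slides. A small sketch of counting character n-grams and estimating one such probability is given below. The function names are hypothetical, and the estimate is unsmoothed, which a real model trained on billions of characters would not be.

```python
from collections import Counter

def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cond_prob(text: str, context: str, nxt: str) -> float:
    """Unsmoothed estimate of p(nxt | context) from raw counts."""
    counts = ngram_counts(text, len(context) + 1)
    total = sum(c for gram, c in counts.items() if gram.startswith(context))
    return counts[context + nxt] / total if total else 0.0

print(ngram_counts("apple", 1))
print(cond_prob("apple apple", "p", "p"))
```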

  29. 7 billion characters

  30. 7 billion characters; 450 million characters

  31. 7 billion characters; 450 million characters; 4 billion characters

  32. apple orange

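  33. P0 ⊕ P1 = 0e 02 11 02. Lattice: start → a / o

  34. P0 ⊕ P1 = 0e 02 11 02. Lattice: start → a / o, scored p(a) p(o)

  35. P0 ⊕ P1 = 0e 02 11 02. Lattice: start → a / o, scored p(a) p(o); mirrored path start → o / a, scored p(o) p(a)

  36. P0 ⊕ P1 = 0e 02 11 02. Lattice: start → a / o → ap / or, scored p(a) p(o), then p(p|a) p(r|o); mirrored path start → o / a → or / ap, scored p(o) p(a), then p(r|o) p(p|a)

  37. P0 ⊕ P1 = 0e 02 11 02. Lattice: start → a / o → ap / or → app / ora, scored p(a) p(o), then p(p|a) p(r|o), then p(p|ap) p(a|or); mirrored path start → o / a → or / ap → ora / app, scored p(o) p(a), then p(r|o) p(p|a), then p(a|or) p(p|ap)

  38. P0 ⊕ P1 = 0e 02 0e 02. Lattice: start → a / o → ap / or, scored p(a) p(o), then p(p|a) p(r|o); mirrored path start → o / a → or / ap, scored p(o) p(a), then p(r|o) p(p|a)

  39. P0 ⊕ P1 = 0e 02 0e 02. Lattice: start → a / o → ap / or → apa / oro, scored p(a) p(o), then p(p|a) p(r|o), then p(a|ap) p(o|or); mirrored path start → o / a → or / ap → oro / apa, scored p(o) p(a), then p(r|o) p(p|a), then p(o|or) p(a|ap)

  40. P0 ⊕ P1 = 0e 02 0e 02. Lattice: start → a / o → ap / or → apa / oro, scored p(a) p(o), then p(p|a) p(r|o), then p(a|ap) p(o|or); mirrored path start → o / a → or / ap → oro / apa, scored p(o) p(a), then p(r|o) p(p|a), then p(o|or) p(a|ap)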
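
The lattice on slides 33 to 40 extends pairs of plaintext prefixes one character at a time, keeping only pairs whose XOR matches the stream and scoring each pair with a language model for each plaintext. The sketch below is a simplified stand-in, not necessarily the authors' procedure: it uses toy bigram models with add-one smoothing and a plain beam search, and every name in it (`train_bigram`, `decode`, `ALPHABET`, the `beam` parameter) is an assumption for illustration.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def train_bigram(corpus: str):
    """Return a function giving log p(next | prev), add-one smoothed over ALPHABET."""
    pair_counts = Counter(zip(corpus, corpus[1:]))
    prev_counts = Counter(corpus[:-1])
    def logp(prev: str, nxt: str) -> float:
        return math.log((pair_counts[(prev, nxt)] + 1) /
                        (prev_counts[prev] + len(ALPHABET)))
    return logp

def decode(stream: bytes, logp0, logp1, beam: int = 50):
    """Beam search over plaintext pairs (t0, t1) whose XOR equals stream."""
    hyps = [(0.0, " ", " ")]          # both texts are assumed to start after a space
    for d in stream:
        extended = []
        for score, t0, t1 in hyps:
            for a in ALPHABET:
                b = chr(ord(a) ^ d)   # the XOR constraint fixes b once a is chosen
                if b in ALPHABET:
                    step = logp0(t0[-1], a) + logp1(t1[-1], b)
                    extended.append((score + step, t0 + a, t1 + b))
        hyps = sorted(extended, reverse=True)[:beam]   # prune to the best hypotheses
    return [(t0[1:], t1[1:]) for _, t0, t1 in hyps]

logp_a = train_bigram(" apple apple ")
logp_o = train_bigram(" orange orange ")
best = decode(bytes.fromhex("0e021102"), logp_a, logp_o)[0]
print(best)
```

Training the two models on different toy text is what lets the search tell the two plaintexts apart here. With identical models, each path and its mirror image score the same, as the mirrored paths on the slides suggest.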

  41. Memory/Computation

  42. a start b c P 2 ⊕ P 3 01 00 02 02

  43. start b c c P 2 ⊕ P 3 01 00 02 02

  44. start b c c b P 2 ⊕ P 3 01 00 02 02

  45. ba ca bb cb bc cc start b c ca ba cb bb c b cc bc P 2 ⊕ P 3 01 00 02 02

  46. start b c p(b) p(c) c b P 2 ⊕ P 3 01 00 02 02 p(c) p(b)

  47. b c p(b) p(c) c b P 2 ⊕ P 3 01 00 02 02 p(c) p(b)

  48. ba ca bb cb bc cc b c ca ba p(b) p(c) cb bb c b cc bc P 2 ⊕ P 3 01 00 02 02 p(c) p(b)

  49. ba ca p(a|b) p(a|c) bb cb p(b|b) p(b|c) bc cc p(c|b) p(c|c) b c p(a|c) p(a|b) ca ba p(b) p(c) p(b|c) p(b|b) cb bb p(c|c) p(c|b) c b cc bc P 2 ⊕ P 3 01 00 02 02 p(c) p(b)

  50. ba ca p(a|b) p(a|c) bb cb p(b|b) p(b|c) bc cc p(c|b) p(c|c) p(a|c) p(a|b) ca ba p(b|c) p(b|b) cb bb p(c|c) p(c|b) cc bc P 2 ⊕ P 3 01 00 02 02

  51. ba ca bb cb bc cc ca ba cb bb cc bc P 2 ⊕ P 3 01 00 02 02

  52. ba ca ca ba cc bc P 2 ⊕ P 3 01 00 02 02

  53. ba ca ... ca ba cc bc P 2 ⊕ P 3 01 00 02 02

  54. ... P 2 ⊕ P 3 01 00 02 02

  55. ... END P 2 ⊕ P 3 01 00 02 02

  56. END P 2 ⊕ P 3 01 00 02 02

  57. ... END b c ba ca P 2 ⊕ P 3 01 00 02 02

  58. Commodity Hardware: Dual Core System, Pentium 3 GHz; Memory 8 GB; Storage 1.2 TB

  59. Model Build Time ~12 hours; Runtime 200 ms per byte; Memory Usage ~2 GB

  60. Our testing methodology

  61. 402,590 Files, 98,699 Files, 520,931 Files

  62. 402,590 Files, 98,699 Files, 520,931 Files; 2,590 Files, 8,699 Files, 20,931 Files

  63. 402,590 Files, 98,699 Files, 520,931 Files; 2,590 Files, 8,699 Files, 20,931 Files; 50 Files, 50 Files, 50 Files

  64. HTML: Small 90.64% • E-mail: Small 82.29% • Documents: Small 53.84%

  65. HTML: Small 90.64%, Medium 92.78% • E-mail: Small 82.29%, Medium 89.04% • Documents: Small 53.84%, Medium 53.05%

  66. HTML: Small 90.64%, Medium 92.78%, Large 93.79% • E-mail: Small 82.29%, Medium 89.04%, Large 90.85% • Documents: Small 53.84%, Medium 53.05%, Large 52.72%

  67. The Switching Problem

  68. I want to remind you about our All-Employee Meeting this Tuesday, Oct. 23, at 10 a.m. Houston time at the Hyatt Regency. We obviously have a lot to talk about. Last week Well I hope you have Dad doing some of the cleaning! You know how he always has an opinion but yet no participation. Anyway I hope you're doing fine. I'm fine

  69. I want to remind you about our All-Employee Meeting this Tuesday, Oct. 23, at 10 a.m. Houston time at the Hyatt Regency participation. Anyway I hope you're doing fine. I'm fine and about to Well I hope you have Dad doing some of the cleaning! You know how he always has an opinion but yet no. We obviously have a lot to talk about. Last week we reported third quarter earnings. We
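
One way to read the two slides above: XOR is symmetric, so swapping the two plaintexts from some position onward leaves P0 ⊕ P1 unchanged, and the stream alone cannot say which recovered segment belongs to which message. The short sketch below illustrates that property. The split point `k` and the variable names are arbitrary choices for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

p0 = b"Attack at Dawn"
p1 = b"Take the Beach"

k = 7                         # arbitrary position where the two texts switch
q0 = p0[:k] + p1[k:]          # starts as p0, continues as p1
q1 = p1[:k] + p0[k:]          # starts as p1, continues as p0

# Both pairs produce exactly the same XOR stream.
print(xor_bytes(p0, p1) == xor_bytes(q0, q1))
```

Only the language models can break the tie, by preferring continuations that fit the text recovered so far.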

  70. Wu showed that Word 2002 reuses a one-time pad
