Reverse Engineering CAPTCHAs
WCRE 2008
Abram Hindle, Michael W. Godfrey, Richard C. Holt
Software Architecture Group
David R. Cheriton School of Computer Science
University of Waterloo, Canada
http://swag.uwaterloo.ca/
{ahindle,migod,holt}@cs.uwaterloo.ca
Motivation
• How can we solve a given CAPTCHA?
• How was a CAPTCHA made?
Why Reverse Engineer?
• If we can reverse engineer a CAPTCHA, we can:
  – leverage its weaknesses
  – re-implement the CAPTCHA
    ∗ The more we understand, the easier it is to defeat
    ∗ We can solve by cloning
CAPTCHA Properties
Common Properties
• Readable: the captcha must be easily read and decoded by humans.
• Unguessable: the captcha message cannot be guessed at random with any real confidence.
• Orderable: characters are read left to right, top to bottom (exceptions could include Hebrew or Arabic captchas). If a captcha is readable, its character ordering should be apparent.
Bitmap fonts and placement
Backgrounds
Noise
Linear Transformations
Non-Linear Transformations
Dripping and Fuzzy Text
CAPTCHA Breaking
Layering
Text Pixel Identification and Image Cleanup
Erosion and Dilation
Thresholding
Edge Detection
Segmentation
Weight Segmentation
Box Segmenter
Shrinking and K-Means segmentation
Flood Fill
Normalization and Character Matching
PCA of A
PCA of F
Skeletonization
CAPTCHA Solving
• Character Database
• Normalization of Characters
  – PCA etc.
• Matching
  – Nearest Neighbor
  – Shape Matching
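
Nearest-neighbor matching against a character database could be sketched as follows. This is an illustration only, not the authors' implementation: it assumes glyphs are already segmented and normalized to equally sized binary bitmaps, and the names `hamming` and `nearest_neighbor`, the dictionary layout of the database, and the choice of Hamming distance are all hypothetical.

```python
from typing import Dict, List, Sequence

# Assumption: a glyph is a binary bitmap stored as rows of 0/1 pixels,
# and every glyph has been normalized to the same dimensions.
Glyph = Sequence[Sequence[int]]


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def nearest_neighbor(glyph: Glyph, database: Dict[str, List[Glyph]]) -> str:
    """Return the label of the stored example closest to `glyph`."""
    best_label = None
    best_dist = None
    for label, examples in database.items():
        for example in examples:
            dist = hamming(glyph, example)
            if best_dist is None or dist < best_dist:
                best_label, best_dist = label, dist
    if best_label is None:
        raise ValueError("character database is empty")
    return best_label


# Tiny hypothetical database with one 3x3 example per character.
DATABASE = {
    "I": [[[0, 1, 0], [0, 1, 0], [0, 1, 0]]],
    "L": [[[1, 0, 0], [1, 0, 0], [1, 1, 1]]],
}
```

A segmented glyph with one damaged pixel, such as `[[1, 0, 0], [1, 0, 0], [1, 1, 0]]`, is still closer to the stored "L" than to the stored "I" under this distance.
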
Piratebay Database
Digg Database
Reverse Engineering
• Layering
• Background
• Noise
• Text
• Transforms
Captcha Solving Summary
• Image Clean Up
• Text Pixel Identification
• Segmentation
• Character Matching
  – Normalization
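
Two of the stages above, text pixel identification and segmentation, might look like the sketch below. It is not taken from the talk: it assumes dark text on a light grayscale background, a fixed cutoff of 128, and 4-connected flood fill, and the names `threshold` and `flood_fill_segments` are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # assumption: grayscale rows, 0 = black, 255 = white


def threshold(img: Image, cutoff: int = 128) -> List[List[bool]]:
    """Mark every pixel darker than `cutoff` as a text pixel."""
    return [[pixel < cutoff for pixel in row] for row in img]


def flood_fill_segments(mask: List[List[bool]]) -> List[List[Tuple[int, int]]]:
    """Group 4-connected text pixels into segments, ordered left to right."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    segments = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited text pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            segment = []
            while queue:
                cy, cx = queue.popleft()
                segment.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width \
                            and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            segments.append(segment)
    # Characters are read left to right, so sort by leftmost column.
    segments.sort(key=lambda seg: min(col for _, col in seg))
    return segments
```

Each returned segment is a list of `(row, column)` pixels. Characters that touch or overlap end up in one segment, which is why flood fill alone fails on captchas with overlapping letters.
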
Solving by Cloning
• Reverse Engineer captcha
• Preprocess the captcha
• Parameterize
• Generate candidates
  – Search through the captchas
  – Find best match
  – Repeat
Watercap demo
• Provided with a captcha of “WCREWCRE” and the code to generate such captchas
• Algorithm
  – For each column we iterate through each character,
    ∗ generating a captcha for each prefix and character,
      · keeping the best match.
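
The prefix search described above can be sketched as a greedy loop. This is a toy illustration under stated assumptions, not the Watercap code: `toy_render` is a hypothetical stand-in for the real generator (it emits 8 pixels per character from the bits of its code point), `mismatch` is a hypothetical pixel-difference score, and the message length is assumed to be known.

```python
import string
from typing import Callable, List, Sequence


def toy_render(text: str) -> List[int]:
    """Hypothetical stand-in generator: 8 pixels per character,
    taken from the bits of the character's code point."""
    return [(ord(ch) >> bit) & 1 for ch in text for bit in range(8)]


def mismatch(candidate: Sequence[int], target: Sequence[int]) -> int:
    """Count differing pixels over the region the candidate covers."""
    return sum(c != t for c, t in zip(candidate, target))


def solve_by_cloning(
    target: Sequence[int],
    length: int,
    render: Callable[[str], List[int]] = toy_render,
    alphabet: str = string.ascii_uppercase,
) -> str:
    """Greedily extend a prefix one character at a time, keeping the
    character whose rendered captcha best matches the target."""
    prefix = ""
    for _ in range(length):
        best = min(alphabet, key=lambda ch: mismatch(render(prefix + ch), target))
        prefix += best
    return prefix
```

For example, `solve_by_cloning(toy_render("WCREWCRE"), 8)` regenerates a candidate for every prefix and character, so it makes `length × len(alphabet)` calls to the generator. A real generator with randomized transforms would need the parameterization step as well, which this sketch omits.
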
CAPTCHA      Accuracy
Digg         30%
PHPBB        99%
Piratebay    61%
Watercap     27% / 93%
Rogers       95%
Minimum accuracy of our captcha breakers
How to improve captcha implementations
• Non-linear transformations
• Non-flood-fillable letters
• Use more characters
• Limit captcha access
• Text similar to the background
• Non-continuous and overlapping characters
Ethics
• Spammers
• Visually Impaired
• Poor security
• Options:
  – Telephone Confirmation
  – Credit Cards
  – Web of trust
Reverse Engineering Lessons
• RE can be interpretative
• Some outputs have properties that allow us to reverse engineer the software that created them
  – In this case, 2D image generation has many common patterns
• Absence of code still allows RE
Future Work
• Better Breakers
• Layer recognition
• Audio captchas
Conclusion
• Reverse engineering captchas highlights techniques that have weaknesses.
• Captcha generation follows certain patterns which are recoverable and leverageable.
• Captchas have been defeated
  – Even “good” captchas from Microsoft, Yahoo and Google have been defeated.