
UnnaturalNets

UnnaturalNets • Work by Joshua Campbell, Eddie Antonio Santos, Nelson J Amaral, and Abram Hindle • Joshua is on the postdoc market! • Sept 2018 • Introduction • Exploring bimodal program analysis via a naturalness lens • Syntax error detection


  1. The Story of Ada • 100,000s of crash reports per day • PartyCrasher • Crash database

  2. Example: Mozilla • More than 2 million crash reports per week! • Manual bucketing @ 1 crash/minute: ◦ 913 full-time employees!
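
The staffing estimate on this slide is simple arithmetic, sketched below. The function name `triagers_needed` and the 40-hour default work week are assumptions for illustration; the deck does not state the work week behind its figure of 913, which would correspond to roughly 36.5 hours per week (or to slightly more than 2 million crashes).

```python
def triagers_needed(crashes_per_week, minutes_per_crash=1.0, hours_per_week=40.0):
    """Full-time people needed to bucket every crash by hand.

    hours_per_week is an assumption; the slide does not say what it used.
    """
    minutes_of_work = crashes_per_week * minutes_per_crash
    minutes_per_person = hours_per_week * 60
    return minutes_of_work / minutes_per_person


if __name__ == "__main__":
    # 2 million crashes per week at one crash per minute
    print(triagers_needed(2_000_000))                       # 40-hour week
    print(triagers_needed(2_000_000, hours_per_week=36.5))  # close to the slide's figure
```

Either way, the headcount is in the high hundreds, which is the point of the slide.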

  3. What We Want • Goal: group the crashes together in buckets by what caused them [Figure: crashes labeled oopsie, bug, annoyance, whoops, random crash]

  4. Realism! [Figure: crashes labeled oopsie, bug, annoyance, whoops, regression, random crash, plotted along a Time axis]

  5. How good is a solution? • How do we measure correctness? • BCubed precision and recall! • Why not just normal precision and recall? • The solutions just put crashes together in buckets; ◦ they don’t say what bugs exist (or even how many bugs exist)
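
A minimal sketch of BCubed precision and recall as they are commonly defined, to make the metric concrete. For each crash, precision is the fraction of its bucket that shares its true cause, and recall is the fraction of crashes with its true cause that landed in its bucket; both are then averaged over all crashes. The names `bcubed`, `buckets`, and `causes` are hypothetical and not taken from the deck.

```python
from collections import Counter


def bcubed(buckets, causes):
    """Return (precision, recall), each averaged over all crashes.

    buckets[i] is the bucket a solution assigned to crash i;
    causes[i] is the true cause (gold label) of crash i.
    """
    if len(buckets) != len(causes) or not buckets:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(buckets)
    bucket_size = Counter(buckets)            # crashes per assigned bucket
    cause_size = Counter(causes)              # crashes per true cause
    overlap = Counter(zip(buckets, causes))   # crashes sharing bucket AND cause
    pairs = list(zip(buckets, causes))
    precision = sum(overlap[(b, c)] / bucket_size[b] for b, c in pairs) / n
    recall = sum(overlap[(b, c)] / cause_size[c] for b, c in pairs) / n
    return precision, recall


if __name__ == "__main__":
    causes = ["a", "a", "a", "b"]
    print(bcubed([0, 0, 1, 1], causes))  # a mixed bucketing
    print(bcubed([0, 1, 2, 3], causes))  # one crash per bucket: perfect precision
    print(bcubed([0, 0, 0, 0], causes))  # one big bucket: perfect recall
```

Because the score is computed per crash, it never needs to know which bucket "is" which bug, only whether two crashes that share a bucket also share a cause.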

  6. High BCubed Precision [Figure: crashes labeled oopsie, bug, annoyance, whoops, regression, random crash, plotted along a Time axis]

  7. High BCubed Recall [Figure: crashes labeled oopsie, bug, annoyance, whoops, regression, random crash, plotted along a Time axis]

  8. Balanced BCubed P/R [Figure: crashes labeled oopsie, bug, annoyance, whoops, regression, random crash, plotted along a Time axis]

  9. But does it scale? • We want it now! ◦ (n log n total time, or log n time per crash) ◦ Classical clustering algorithms are n² total time ◦ 2 million/week
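
To give a feel for the gap between the two growth rates at the volume quoted on this slide, here is a rough operation count. It ignores constant factors and assumes base-2 logarithms; the function name `operation_counts` is hypothetical.

```python
import math


def operation_counts(n):
    """Return (quadratic, linearithmic) operation counts for n crashes."""
    quadratic = n * n                 # classical clustering: compare every pair
    linearithmic = n * math.log2(n)   # about log n work per incoming crash
    return quadratic, linearithmic


if __name__ == "__main__":
    quadratic, linearithmic = operation_counts(2_000_000)  # one week of crashes
    print(f"n^2       = {quadratic:.3e}")
    print(f"n log2 n  = {linearithmic:.3e}")
    print(f"ratio     = {quadratic / linearithmic:,.0f}x")
```

At 2 million crashes the quadratic count is tens of thousands of times larger, which is why per-crash logarithmic cost matters here.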

  10. Online [Figure: timeline running from Past to Future along a Time axis, with crashes labeled oopsie, bug, annoyance, whoops, regression, random crash]

  11. Don’t want to hire devs • Doesn’t require developers to categorize crashes ◦ unsupervised

  12. Non-stationary • Increase crash rate? • New bucket? [Figure: timeline running from Past to Future along a Time axis, with crashes labeled oopsie, bug, whoops, random crash]

  13. In Practice: Mozilla • “Signature Generation” • Fast! • Accurate?

  14. In Practice: Others • Mozilla, Microsoft (WER), Apple, Google... • Typically involve LOTS of hand-written rules

  15. In Literature • A bunch of methods that are n² time complexity (or worse) ◦ take at least time proportional to n to sort one crash

  16. In Literature • Lerch, et al. ◦ Not designed for crash report deduplication! ◦ Uses the Lucene search engine to find similar documents (bugs)

  17. Lucene search • Based on a standard textbook IR technique called TF-IDF, plus some adjustments • ↑↑↑ words in this document (crash) ↑↑↑ • ↓↓↓ words in every document (crash) ↓↓↓ ◦ the, be, to, of, and, a, in ...
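
The idea on this slide, weighting words up when they are frequent in one crash and down when they appear in every crash, can be sketched with the plain textbook formula. This is not Lucene's actual scoring, which adds adjustments the slide only alludes to; the function name `tf_idf` and the sample tokens are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Textbook TF-IDF: one {token: weight} dict per tokenized document.

    weight = (count of token in this document) * log(N / documents containing token)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per document
    weighted = []
    for tokens in docs:
        term_freq = Counter(tokens)
        weighted.append({
            token: count * math.log(n_docs / doc_freq[token])
            for token, count in term_freq.items()
        })
    return weighted


if __name__ == "__main__":
    crashes = [
        ["crashed", "in", "cairo_transform"],
        ["crashed", "in", "malloc"],
        ["crashed", "in", "free"],
    ]
    for weights in tf_idf(crashes):
        print(weights)
```

Tokens that occur in every crash ("crashed", "in") get weight zero, while a token unique to one crash keeps a positive weight, so rare stack-frame names dominate the comparison.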

  18. In Literature • Lerch, et al. ◦ Let’s try that, but instead of trying to group bugs together, let’s group crashes!
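
One way a "find the most similar document" search could be turned into online bucketing is sketched below. Everything specific here is an assumption for illustration: Jaccard similarity stands in for the search engine's score, the `threshold` parameter and the names `jaccard` and `bucket_online` are hypothetical, and the linear scan over earlier crashes is far slower than an inverted index would be.

```python
def jaccard(a, b):
    """Set overlap between two token lists, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bucket_online(crashes, threshold=0.5):
    """Assign each tokenized crash a bucket id, seeing crashes one at a time.

    A crash joins the bucket of its most similar earlier crash when the
    similarity reaches the (assumed) threshold; otherwise it opens a new bucket.
    """
    seen = []          # (tokens, bucket id) of every earlier crash
    assignments = []
    next_bucket = 0
    for tokens in crashes:
        best_score, best_bucket = 0.0, None
        for earlier_tokens, earlier_bucket in seen:
            score = jaccard(tokens, earlier_tokens)
            if score > best_score:
                best_score, best_bucket = score, earlier_bucket
        if best_bucket is not None and best_score >= threshold:
            bucket = best_bucket
        else:
            bucket = next_bucket
            next_bucket += 1
        seen.append((tokens, bucket))
        assignments.append(bucket)
    return assignments


if __name__ == "__main__":
    stream = [
        ["cairo_transform", "setDefaultCTM"],
        ["malloc", "free"],
        ["cairo_transform", "setDefaultCTM", "main"],
    ]
    print(bucket_online(stream))
```

No labels are needed at any point, and each crash is bucketed as it arrives using only what came before it.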

  19. Let’s Add Context evince crashed with SIGSEGV in cairo_transform() This happens immediately when trying to mark text with the mouse. ProblemType: Crash Architecture: amd64 DistroRelease: Ubuntu 7.10 ExecutablePath: /usr/bin/evince Package: evince 0.9.0-1ubuntu4 PackageArchitecture: amd64 ProcCmdline: evince ./expenses-uds-sevilla.pdf Signal: 11 SourcePackage: evince Uname: Linux donald 2.6.20-15-generic #2 SMP

  20. In Literature • Lerch, et al. ◦ Requires breaking up things (bugs, crashes) into “words”

  21. Tokenization: Lerch evince crashed with SIGSEGV in cairo_transform() #0 0x00002b34461e4dd1 in cairo_transform () from /usr/lib/libcairo.so.2 #1 0x00002b344498a150 in CairoOutputDev::setDefaultCTM () from /usr/lib/libpoppler-glib.so.1 #2 0x00002b344ae2cefc in TextSelectionPainter::TextSelectionPainter () from /usr/lib/libpoppler.so.1

  22. Tokenization: Space evince crashed with SIGSEGV in cairo_transform() #0 0x00002b34461e4dd1 in cairo_transform () from /usr/lib/libcairo.so.2 #1 0x00002b344498a150 in CairoOutputDev::setDefaultCTM () from /usr/lib/libpoppler-glib.so.1 #2 0x00002b344ae2cefc in TextSelectionPainter::TextSelectionPainter () from /usr/lib/libpoppler.so.1

  23. Tokenization: CamelCase evince crashed with SIGSEGV in cairo_transform() #0 0x00002b34461e4dd1 in cairo_transform () from /usr/lib/libcairo.so.2 #1 0x00002b344498a150 in CairoOutputDev::setDefaultCTM () from /usr/lib/libpoppler-glib.so.1 #2 0x00002b344ae2cefc in TextSelectionPainter::TextSelectionPainter () from /usr/lib/libpoppler.so.1

  24. Tokenization • Lerch: 0x00002b344498a150 cairooutputdev setdefaultctm from libpoppler glib • Space: #1 0x00002b344498a150 in CairoOutputDev::setDefaultCTM () from /usr/lib/libpoppler-glib.so.1 • Camel: 1 0 x 00002 b 344498 a 150 in Cairo Output Dev set Default CTM from usr lib libpoppler glib so 1
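
The three token sets on this slide can be reproduced for stack frame #1 with the regular expressions below. These patterns are assumptions chosen to match the tokens shown on the slide, not the authors' actual implementation, and the function names are hypothetical.

```python
import re

FRAME = ("#1 0x00002b344498a150 in CairoOutputDev::setDefaultCTM () "
         "from /usr/lib/libpoppler-glib.so.1")


def tokenize_lerch(text):
    """Lowercase, then keep alphanumeric runs of four or more characters."""
    return re.findall(r"[a-z0-9_]{4,}", text.lower())


def tokenize_space(text):
    """Split on whitespace only; punctuation stays attached."""
    return text.split()


def tokenize_camel(text):
    """Split at case changes and at letter/digit boundaries."""
    # acronym run | optionally capitalized lowercase word | digit run
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", text)


if __name__ == "__main__":
    for tokenizer in (tokenize_lerch, tokenize_space, tokenize_camel):
        print(tokenizer.__name__, tokenizer(FRAME))
```

The finer the split, the more tokens two unrelated frames share (digits, "lib", "so"), which trades precision for recall when crashes are compared.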

  25. Results • Ok so who won?

  26. [Figure: results chart comparing CamelC, Lerch, SpaceC, 1Frame, 2Frame, 3Frame, 1Mod, 1Addr, and 1File, with annotations marking Best F1, Best Recall, and Best Precision]
