Performance and Concurrency Bug Detection Tools for Java Programs
Shan Lu, University of Chicago
Fighting software bugs is crucial
• Software is everywhere
  – http://en.wikipedia.org/wiki/List_of_software_bugs
• Software bugs are widespread and costly
  – They cause 40% of system downtime [Blueprints 2000]
  – They cost $312 billion per year [Cambridge 2013]
Fighting bugs in cloud systems is crucial
Different aspects of fighting bugs
• In-house bug detection
• In-field failure recovery
• In-field failure diagnosis
• In-house bug fixing
Work from my group (local systems)
In-house bug detection | In-field failure recovery | In-field failure diagnosis | In-house bug fixing
• Concurrency bugs: [OOPSLA10]; [ASPLOS06]; [SOSP07]; [PLDI11]; [ASPLOS13a] [ASPLOS13b]; [ASPLOS09]; [ASPLOS10]; [OSDI12]; [FSE14] [ASPLOS14]; [ASPLOS11]; [OOPSLA13]; [FSE16] [ICSE17a] [OOPSLA16]
• Performance bugs: [PLDI12]; [OOPSLA14]; [CAV13]; n/a [ICSE13] [ICSE17b] [ICSE15]
Work from my group (cloud systems)
In-house bug detection | In-field failure recovery | In-field failure diagnosis | In-house bug fixing
• Concurrency bugs: [ASPLOS16]; On-going n/a On-going [ASPLOS17]
• Performance bugs: [CIKM’17]; [SOSP’15]; [SOCC’17] On-going [OSDI’16] On-going
Our bug tools for Java programs
Empirical bug studies | In-house bug detection | In-field failure diagnosis | In-house bug fixing
• Distributed concurrency bugs: [ASPLOS16] [ASPLOS17]
• Performance bugs: [OOPSLA14]; [PLDI12] [ICSE13] [ICSE15] [ICSE17b]
Empirical Bug Studies
• Understanding and detecting real-world performance bugs [PLDI '12]
• TaxDC: A Comprehensive Taxonomy of Non-Deterministic Concurrency Bugs in Cloud Distributed Systems [ASPLOS '16]
Performance bug studies
• Why did we do this study?
• The most cited paper in PLDI 2012
Benchmark Suite
Application | Software type | Language | MLOC | Bug DB history | Tags | # Bugs
Apache | Command-line utility + server + library | C/Java | 0.45 | 13 y | N/A | 25
Chrome | GUI application | C/C++ | 14.0 | 4 y | N/A | 10
GCC | Compiler | C/C++ | 5.7 | 13 y | Compile-time-hog | 10
Mozilla | GUI application | C++/JS | 4.7 | 14 y | perf | 36
MySQL | Server software | C/C++/C# | 1.3 | 10 y | S5 | 28
What/How did we study? (all manual)
What we studied:
• Bug root causes
• Bug locations
• Bug triggering conditions
• Bug fix strategies
• Bug symptoms
• Bug-related inputs
How we studied them:
• Read online discussions
• Check patches
• Check source code
What could have been better? The same aspects were studied with the same methods as above, all manually.
Distributed concurrency bug studies
• Why did we do this study?
Benchmark Suite
Application | Software type | Language | MLOC | Bug DB history | # Bugs
Cassandra | Distributed key-value store | Java | 0.06 | 9 y | 19
Hadoop | Distributed computing | Java | 1.2 | 12 y | 36
HBase | Distributed key-value store | Java | 0.2 | 10 y | 30
Zookeeper | Distributed synch. service | Java | 0.1 | 10 y | 19
What/How did we study? (all manual)
What we studied:
• Triggering conditions
• Errors & failures
• Fix strategies
How we studied them:
• Read online discussions
• Check patches
• Check source code
• Read software tutorials
What could have been better? The same aspects were studied with the same methods as above, all manually.
Bug Detection & Fixing
• DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems [ASPLOS '17]
• Toddler: Detecting Performance Problems via Similar Memory-Access Patterns [ICSE '13]
• CARAMEL: Detecting and Fixing Performance Problems That Have Non-Intrusive Fixes [ICSE '15]
Dynamic DCbug Detection
• DCbugs: bugs caused by improper timing among distributed operations
• Why not apply single-machine detectors?
  – Distributed triggering
  – Different happens-before model
  – Distributed error propagation
  – Much larger scale
DCatch Tool: Trace → HB (happens-before) analysis → Triage → Trigger
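The HB stage can be illustrated with a much-simplified model. This sketch is hypothetical and is not DCatch's implementation: it assumes a merged trace in which each event carries a node/thread name, models only two causality sources (program order within a thread, and message send → receive across nodes), and reports pairs of accesses to the same shared state, at least one a write, that neither event can reach in the HB graph. The class, the event kinds, and the naming scheme (`DistributedHb`, `Kind`, `"nodeA/main"`) are made up for illustration; DCatch's real HB model covers more distributed causality sources.

```java
import java.util.*;

/**
 * Hypothetical, simplified happens-before (HB) graph over a merged multi-node trace.
 * Edges: program order within a thread, and message send -> receive across nodes.
 */
class DistributedHb {
    enum Kind { READ, WRITE, SEND, RECV }

    static final class Event {
        final String thread; // node + thread, e.g. "nodeA/main" (hypothetical naming)
        final Kind kind;
        final String target; // shared-state name for READ/WRITE, message id for SEND/RECV

        Event(String thread, Kind kind, String target) {
            this.thread = thread;
            this.kind = kind;
            this.target = target;
        }

        boolean isAccess() { return kind == Kind.READ || kind == Kind.WRITE; }
    }

    private final List<Event> events = new ArrayList<>();
    private final List<List<Integer>> succ = new ArrayList<>();
    private final Map<String, Integer> lastInThread = new HashMap<>();
    private final Map<String, Integer> sendOf = new HashMap<>();

    /** Appends an event; assumes each SEND appears in the trace before its RECV. */
    int add(String thread, Kind kind, String target) {
        int id = events.size();
        events.add(new Event(thread, kind, target));
        succ.add(new ArrayList<>());
        Integer prev = lastInThread.put(thread, id);
        if (prev != null) succ.get(prev).add(id);            // program-order edge
        if (kind == Kind.SEND) sendOf.put(target, id);
        if (kind == Kind.RECV && sendOf.containsKey(target)) {
            succ.get(sendOf.get(target)).add(id);             // message send -> receive edge
        }
        return id;
    }

    /** a happens before b iff b is reachable from a in the HB graph. */
    boolean happensBefore(int a, int b) {
        Deque<Integer> work = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        work.push(a);
        while (!work.isEmpty()) {
            for (int next : succ.get(work.pop())) {
                if (next == b) return true;
                if (seen.add(next)) work.push(next);
            }
        }
        return false;
    }

    /** Candidate pairs: same shared state, at least one write, unordered by HB. */
    List<int[]> candidates() {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            for (int j = i + 1; j < events.size(); j++) {
                Event x = events.get(i), y = events.get(j);
                if (!x.isAccess() || !y.isAccess() || !x.target.equals(y.target)) continue;
                if (x.kind != Kind.WRITE && y.kind != Kind.WRITE) continue;
                if (!happensBefore(i, j) && !happensBefore(j, i)) out.add(new int[] {i, j});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DistributedHb hb = new DistributedHb();
        int w = hb.add("nodeA/main", Kind.WRITE, "meta");   // node A updates shared metadata
        hb.add("nodeA/main", Kind.SEND, "m1");              // ...then notifies node B
        hb.add("nodeB/rpc", Kind.RECV, "m1");
        int r1 = hb.add("nodeB/rpc", Kind.READ, "meta");    // causally after w via m1
        hb.add("nodeB/timer", Kind.READ, "meta");           // no causal link to w
        System.out.println(hb.happensBefore(w, r1));        // true
        for (int[] p : hb.candidates()) {
            System.out.println("candidate: " + p[0] + " vs " + p[1]); // candidate: 0 vs 4
        }
    }
}
```

In this toy trace the read on `nodeB/rpc` is ordered after the write by message `m1`, so only the read on `nodeB/timer` is reported. Per-pair graph search is only meant to show the idea; a tool handling large traces would likely need a more scalable HB representation.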
DCatch Tool Implementation: Trace (Javassist) → HB analysis → Triage (WALA) → Trigger (Javassist)
Benchmark Suite
Application | Software type | Workload | # Bugs
Cassandra | Distributed key-value store | startup | 1
Hadoop | Distributed computing | wordcount | 2
HBase | Distributed key-value store | enable, split, alter | 2
Zookeeper | Distributed synch. service | startup | 2
Dynamic PerfBug Detection
• Loop inefficiency bugs
  – Inefficient data structures
  – Redundant computation
• Why?
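As a hypothetical illustration of a loop-inefficiency bug caused by an inefficient data structure (not one of the benchmark bugs), the slow method hides an inner loop inside `List.contains`, rescanning `b` on every outer iteration; the fix builds a `HashSet` once. The class and method names are made up for this example.

```java
import java.util.*;

/** Illustrative loop-inefficiency pattern (hypothetical example). */
class LoopInefficiency {
    /** Slow: List.contains rescans b for every element of a, i.e. O(|a| * |b|) reads. */
    static List<String> commonSlow(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        for (String s : a) {
            if (b.contains(s)) out.add(s); // inner loop hidden inside contains()
        }
        return out;
    }

    /** Fix: a HashSet built once gives expected O(1) lookups, O(|a| + |b|) overall. */
    static List<String> commonFast(List<String> a, List<String> b) {
        Set<String> lookup = new HashSet<>(b);
        List<String> out = new ArrayList<>();
        for (String s : a) {
            if (lookup.contains(s)) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = List.of("x", "y", "z");
        List<String> b = List.of("z", "x");
        System.out.println(commonSlow(a, b) + " " + commonFast(a, b)); // [x, z] [x, z]
    }
}
```

The slow version reads the same elements of `b` again and again across outer iterations; that repeated memory-access pattern is what a dynamic detector can observe at run time.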
Toddler Tool: Trace → Access-value analysis
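A much-simplified, hypothetical version of an access-value check: for one read instruction inside an inner loop, record the sequence of values it reads during each outer-loop iteration, then flag the instruction if consecutive iterations mostly read the same values. The prefix-based similarity test and the 0.8 threshold are assumptions for illustration; Toddler's actual similarity measure and thresholds are defined in the ICSE '13 paper and are not reproduced here.

```java
import java.util.*;

/** Hypothetical, simplified similar-access-pattern check (not Toddler's real metric). */
class AccessPatternCheck {
    /** True if the shorter value sequence is a non-empty prefix of the longer one. */
    static <T> boolean prefixSimilar(List<T> x, List<T> y) {
        List<T> shorter = x.size() <= y.size() ? x : y;
        List<T> longer = x.size() <= y.size() ? y : x;
        return !shorter.isEmpty() && longer.subList(0, shorter.size()).equals(shorter);
    }

    /**
     * perIteration.get(i) holds the values one inner-loop read instruction saw during
     * outer iteration i. Flags the instruction if at least `threshold` of consecutive
     * iteration pairs read repeated values.
     */
    static <T> boolean looksRedundant(List<List<T>> perIteration, double threshold) {
        if (perIteration.size() < 2) return false;
        int similar = 0;
        for (int i = 1; i < perIteration.size(); i++) {
            if (prefixSimilar(perIteration.get(i - 1), perIteration.get(i))) similar++;
        }
        return (double) similar / (perIteration.size() - 1) >= threshold;
    }

    public static void main(String[] args) {
        // e.g. elements of a list b read by b.contains(s) on each outer iteration
        List<List<String>> scans = List.of(List.of("z", "x"), List.of("z", "x"), List.of("z"));
        // values that change completely from one outer iteration to the next
        List<List<Integer>> distinct = List.of(List.of(1, 2), List.of(3, 4), List.of(5, 6));
        System.out.println(looksRedundant(scans, 0.8) + " " + looksRedundant(distinct, 0.8)); // true false
    }
}
```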
Toddler Tool Implementation: Trace (Soot) → Access-value analysis
Benchmark Suite
Application | Software type | Language | KLOC | Known bugs | New bugs
Ant | Build tool | Java | 110 | 1 | 8
Apache Col. | Collections library | Java | 51 | 1 | 20
Groovy | Dynamic language | Java | 137 | 1 | 0
Ggl Core Lib | Collections library | Java | 156 | 2 | 10
JFreeChart | Chart framework | Java | 64 | 1 | 8
JMeter | Load testing tool | Java | 86 | 1 | 1
Lucene | Text search engine | Java | 321 | 2 | 0
PDFBox | PDF framework | Java | 78 | 1 | 0
Solr | Search server | Java | 373 | 1 | 0
How do we get inputs?
Static PerfBug Detection & Fixing
• Missing-break bugs
How is computation wasted? \ Where is computation wasted? | Every iteration | Late iterations | Early iterations
No-result | Type 1 | Type 2 | Type Y
Useless-result | Type X | Type 3 | Type 4
• Why?
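A hypothetical missing-break example (not taken from the benchmark suite): once a negative value has been found, later iterations cannot change the result, so they only waste computation. The fixed version adds a single conditional break and leaves the rest of the loop untouched, which is the general shape of a non-intrusive fix; the exact break conditions CARAMEL suggests come from its static analysis and are not modeled here.

```java
/** Illustrative missing-break pattern (hypothetical example). */
class MissingBreak {
    /** Before: after the first negative value, remaining iterations cannot change the result. */
    static boolean hasNegativeSlow(int[] xs) {
        boolean found = false;
        for (int x : xs) {
            if (x < 0) found = true;
        }
        return found;
    }

    /** After: one added conditional break; the original loop body is unchanged. */
    static boolean hasNegativeFixed(int[] xs) {
        boolean found = false;
        for (int x : xs) {
            if (x < 0) found = true;
            if (found) break; // exit as soon as the result is determined
        }
        return found;
    }

    public static void main(String[] args) {
        int[] data = {3, -1, 4, 1, 5};
        System.out.println(hasNegativeSlow(data) + " " + hasNegativeFixed(data)); // true true
    }
}
```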
Caramel Tool: Side-effect instructions → Side-effect conditions → Side-effect data flow → Fix suggestion
Caramel Tool Implementation: Side-effect instructions (WALA) → Side-effect conditions (WALA) → Side-effect data flow (WALA) → Fix suggestion (WALA)
Benchmark Suite (1)
Application | Software type | Language | KLOC | New bugs
Ant | Build tool | Java | 110 | 1
Apache Col. | Collections library | Java | 51 | 20
Groovy | Dynamic language | Java | 137 | 9
Ggl Core Lib | Collections library | Java | 156 | 10
JFreeChart | Chart framework | Java | 64 | 8
JMeter | Load testing tool | Java | 86 | 4
Lucene | Text search engine | Java | 321 | 14
PDFBox | PDF framework | Java | 78 | 10
Solr | Search server | Java | 373 | 2
Benchmark Suite (2)
Application | Software type | Language | KLOC | New bugs
Log4J | Logging framework | Java | 52 | 6
Sling | Web app. framework | Java | 202 | 6
Struts | Web app. framework | Java | 175 | 4
Tika | Content extraction | Java | 50 | 1
Tomcat | Web server | Java | 295 | 4
No input requirements
Failure Diagnosis
• Performance Diagnosis for Inefficient Loops [ICSE '17]
Conclusion
• 2 bug studies, 3 bug detection tools
• 18 Java benchmark applications (4 distributed)
• Workloads
  – Functionality & performance
• Analysis mechanisms
  – Human
  – WALA, Soot, Javassist
• Others
  – Bug-tracking systems
Thanks!
Shan Lu, University of Chicago
shanlu@uchicago.edu