Why do big data and cloud systems slow down and stop? Shan Lu
What are big data and cloud systems? Why do they slow down and stop?
Big data & cloud systems
● DB-backed web applications
● Cloud services
DB-backed web applications
[Diagram: HTTP requests reach the application server, which sends database queries to the DBMS.]
Performance is critical for web applications
● Low latency is critical
  ○ Nearly half of users expect a site to load in less than 2 seconds
  ○ A 1-second delay in page load: 11% fewer page views, 16% less customer satisfaction, 7% loss in profit
● Low latency is challenging given the data size
Cloud services
Reliability is critical for cloud services
Outline
● What slows down (big data) web applications? [ICSE’18]
  ○ What can we do about it? [CIKM’17, FSE’18, ICSE’19, CIDR’20] 1000+ bugs found
● What stops cloud systems? [HotOS’19]
  ○ What can we do about it? [ASPLOS’16, ASPLOS’17, ASPLOS’18, PLDI’19, SOSP’19] 1000+ bugs found
What Slowed Down Database-Backed Web Applications
Shan Lu, hyperloop.cs.uchicago.edu
● View-Centric Performance Optimization for Database-Backed Web Applications. ICSE’19.
● How not to structure your database-backed web applications: a study of performance bugs in the wild. ICSE’18.
● PowerStation: Automatically detecting and fixing inefficiencies of database-backed web applications in IDE. FSE’18.
Common Web-app Architecture
[Diagram: HTTP requests reach the application server, which sends database queries to the DBMS.]
Common Web-app Architecture
A user requests http://www.xxx.com/blogs/index. The HTTP request reaches the application server, where the controller calls the model, the query translator turns the model call into SQL for the DBMS, and the view renders the result.

Controller:
    class BlogsController < ApplicationController
      def index
        user_id = 1
        # Instance variable, so that the view can read it
        @myblogs = Blog.retrieve(user_id)
      end
    end

Model:
    class Blog < ApplicationRecord
      # Class method, since the controller calls Blog.retrieve
      def self.retrieve(user_id)
        Blog.where(uid: user_id)
      end
    end

Query translator output:
    SELECT * FROM blogs WHERE uid = ?

View (app/views/blogs/index.html.erb):
    <% @myblogs.each do |blog| %>
      <%= blog.content %><br/>
    <% end %>

DBMS table blogs: columns uid, contents.
Rendered page (http://blogs/index): 1001 unread blogs, including "Arriving at Zurich", "Stopping by Bern", "One day at Luzern", "Love love Berner Oberland", "Love Berner Oberland", "Back to Lausanne".
Potential sources of inefficiencies
● MVC design pattern: Controller, View (app/views/blogs/index.html.erb loops over @myblogs and prints blog.content), and Model (class Blog)
● Object Relational Mapping framework: Blog.where(uid: user_id) is translated into SELECT * FROM blogs WHERE uid = ? against the blogs table (uid, contents) in the DBMS
Outline
● How severe is the problem? Profile 12 apps from 6 common categories: 64 issues in 40 pages
● What are the common inefficiency patterns? Build performance-bug taxonomy: 9 anti-patterns
● How to solve the problem? Design automated bug detection & fixing: 1000+ bugs
Outline, step 1: Profile 12 apps from 6 common categories (64 issues in 40 pages)
Profiling methodology
● Top 2 apps in each of 6 popular categories
● Synthesize DB content based on real-world website statistics
Profiling: end-to-end page time
[Chart: end-to-end page time per app, measured with 20000 records.]
● 6 apps have pages > 3s
● 11 apps have pages > 2s
● 40 problematic pages
● Server takes most of the time
Why is it slow? There are inefficiency bugs!
● We manually fixed the 64 issues we found across 39 pages
[Chart: lines of code changed vs. speedup; surviving labels: 80%, 60%.]
Outline, step 2: Build performance-bug taxonomy (9 anti-patterns)
Common Performance Anti-patterns
● 64 performance issues from profiling
● 140 performance issues from bug tracking systems
● Result: 9 anti-patterns
Common Performance Anti-patterns
● ORM API Misuse: 106 issues across 12 apps
● Application Design Tradeoff: 47 issues across 12 apps
● Database Design: 41 issues across 10 apps
ORM API Misuse
● Inefficient Computation (IC): 26 issues across 8 apps
● Unnecessary Computation (UC): 22 issues across 10 apps
● Inefficient Rendering (IR): 5 issues across 4 apps
● Inefficient Data Access (ID): 44 issues across 11 apps
● Unnecessary Data Retrieval (UD): 9 issues across 4 apps
ORM API Misuse: inefficient computation
● Inefficient: project.issues.count > 0
    SELECT COUNT(*) FROM issues WHERE project_id = ?
● Inefficient: project.issues.any?
    SELECT COUNT(*) FROM issues WHERE project_id = ?
● Efficient: project.issues.exists?
    SELECT 1 AS ONE FROM issues WHERE project_id = ? LIMIT 1
2X speedup
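The difference between counting every matching row and stopping at the first one can be mimicked in plain Ruby, without Rails or a database. This is only an analogy under stated assumptions: FakeIssuesTable, count_for, exists_for?, and the rows_scanned counter are hypothetical names invented for this sketch, not part of ActiveRecord or of the study.

```ruby
# Hypothetical in-memory stand-in for an "issues" table (not ActiveRecord).
class FakeIssuesTable
  attr_reader :rows_scanned

  def initialize(rows)
    @rows = rows
    @rows_scanned = 0
  end

  def reset_counter
    @rows_scanned = 0
  end

  # Analogue of SELECT COUNT(*) ... WHERE project_id = ?
  # Every row is visited before the count is known.
  def count_for(project_id)
    matching(project_id).count
  end

  # Analogue of SELECT 1 ... WHERE project_id = ? LIMIT 1
  # Enumeration stops at the first matching row.
  def exists_for?(project_id)
    !matching(project_id).first.nil?
  end

  private

  # Lazily yields matching rows and counts every row it touches.
  def matching(project_id)
    Enumerator.new do |out|
      @rows.each do |row|
        @rows_scanned += 1
        out << row if row[:project_id] == project_id
      end
    end
  end
end

table = FakeIssuesTable.new(Array.new(10_000) { |i| { id: i, project_id: 1 } })

has_issues_slow = table.count_for(1) > 0
scanned_by_count = table.rows_scanned

table.reset_counter
has_issues_fast = table.exists_for?(1)
scanned_by_exists = table.rows_scanned

puts "count > 0: #{has_issues_slow} after touching #{scanned_by_count} rows"
puts "exists?:   #{has_issues_fast} after touching #{scanned_by_exists} rows"
```

Both checks give the same answer, but the second touches far fewer rows. A real DBMS may narrow the gap with an index, so the sketch shows the direction of the effect, not its size.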
ORM API Misuse: unnecessary computation
    + rans = u.issues
      values.each do |value|
    -   u.issues.include? value
    +   rans.include? value
      end
20X speedup
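The effect of hoisting the query out of the loop can be checked with a small stand-in object that counts how often it is asked for its issues. FakeUser and query_count are hypothetical names for this sketch; the sketch assumes, as the slide does, that every call to issues costs one query. It uses select instead of each only so that the two versions can be compared by their results.

```ruby
# Hypothetical stand-in for a model whose association runs a query on each call.
class FakeUser
  attr_reader :query_count

  def initialize(issue_ids)
    @issue_ids = issue_ids
    @query_count = 0
  end

  # Pretend that each call goes to the database.
  def issues
    @query_count += 1
    @issue_ids.dup
  end
end

values = (1..100).to_a

# Before: the query sits inside the loop and runs once per value.
u = FakeUser.new([3, 5, 8])
hits_before = values.select { |value| u.issues.include?(value) }
queries_before = u.query_count

# After: the loop-invariant query is hoisted out of the loop.
u = FakeUser.new([3, 5, 8])
rans = u.issues
hits_after = values.select { |value| rans.include?(value) }
queries_after = u.query_count

puts "before: #{queries_before} queries, after: #{queries_after} query"
```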
ORM API misuses that affect memory consumption
● map(&:id) vs. pluck(:id)
● pluck(:size).sum vs. sum(:size)
● pluck + pluck vs. SQL UNION
● …
How to tackle API misuses?
● Why can't existing compilers handle this?
● Can we extend compilers to
  ○ Understand ORM APIs and queries?
  ○ Detect the problem?
  ○ Solve the problem?
Database-aware PDG
(a) Ruby code
    values.reject do |val|
      u.issues.include? val
    end
Intermediate form:
    v1 = u
    v2 = values
    v2.do |val|
      v3 = v1.issues
      v3.include? val
    end
(b) PDG
    Copy: v1 = u
    Copy: v2 = values
    val = v2[]
    Call: v3 = v1.issues (query node; SQL: SELECT * FROM issues WHERE user_id = ?)
    Call: v3.include? val
    Legend: query node, data edge, control edge
Detect and Fix
[PDG with nodes Copy: v1 = u, Copy: v2 = values, val = v2[], Call: v3 = v1.issues, and Call: v3.include? val, connected by data and control edges. The query node Call: v3 = v1.issues is marked as a loop-invariant query.]
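One way to picture the detection step is a toy dependence graph in which a query node inside a loop is flagged when none of its data dependencies comes from inside the loop. This is a minimal sketch under that assumption, not PowerStation's actual analysis: PdgNode, its fields, the "check" node name, and the extra node v4 are all hypothetical. The rule is deliberately conservative, since any dependency on a value defined in the loop body disqualifies a query.

```ruby
# Hypothetical, heavily simplified PDG node: name, kind, whether it sits inside
# the loop body, and the names of the nodes it receives data edges from.
PdgNode = Struct.new(:name, :kind, :in_loop, :deps)

# True if the node transitively depends on a value defined inside the loop.
def depends_on_loop_value?(node, by_name, seen = {})
  return false if seen[node.name]
  seen[node.name] = true
  node.deps.any? do |dep_name|
    dep = by_name.fetch(dep_name)
    dep.in_loop || depends_on_loop_value?(dep, by_name, seen)
  end
end

# Names of query nodes that sit in the loop but use no loop-defined value.
def loop_invariant_queries(nodes)
  by_name = nodes.map { |n| [n.name, n] }.to_h
  nodes.select { |n| n.kind == :query && n.in_loop && !depends_on_loop_value?(n, by_name) }
       .map(&:name)
end

pdg = [
  PdgNode.new("v1",    :copy,     false, []),            # Copy: v1 = u
  PdgNode.new("v2",    :copy,     false, []),            # Copy: v2 = values
  PdgNode.new("val",   :loop_var, true,  ["v2"]),        # val = v2[]
  PdgNode.new("v3",    :query,    true,  ["v1"]),        # Call: v3 = v1.issues
  PdgNode.new("check", :call,     true,  ["v3", "val"]), # Call: v3.include? val
  PdgNode.new("v4",    :query,    true,  ["val"])        # hypothetical query that uses val
]

invariant = loop_invariant_queries(pdg)
puts "loop-invariant queries: #{invariant.inspect}"
```

Only v3 is reported: it depends on v1 alone, which is defined before the loop. The made-up v4 depends on the loop variable and so has to stay inside the loop.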
PowerStation (integrated with RubyMine)
[Screenshot: the PowerStation panel in RubyMine, with Whole App and Single Action modes, issue tabs (DS, RD, LI, IA, CS, IR), and an issue list with FIX buttons. Example entry for blogs_controller.rb 4: "run_query is a loop invariant query. Fix: move it out of the loop."]
Try our PowerStation!
• 12 real-world apps
• 1221 inefficiencies found
Common Performance Anti-patterns: Database Design (41 issues across 10 apps)
Database Design Problem
● Missing fields (8 issues across 5 apps): fields derivable from other fields and not persistently stored
  ○ Example columns: id, longitude, latitude, location (2X)
● Missing index (33 issues across 10 apps)
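Why a missing index hurts can be sketched in plain Ruby by comparing a full scan with a lookup in a prebuilt hash. This is an analogy only: the rows, scan_lookup, and index_lookup are hypothetical, and a real DBMS index is a different data structure. In a Rails app the corresponding fix would typically be a migration that adds an index on the queried column, for example with add_index.

```ruby
# Hypothetical rows of an "issues" table, spread over 500 projects.
rows = Array.new(50_000) { |i| { id: i, project_id: i % 500 } }

# Without an index: every lookup walks all rows.
def scan_lookup(rows, project_id)
  rows.select { |row| row[:project_id] == project_id }
end

# With an index: one pass builds a project_id => rows map up front.
index = rows.group_by { |row| row[:project_id] }

# Lookups are then a single hash probe.
def index_lookup(index, project_id)
  index.fetch(project_id, [])
end

by_scan  = scan_lookup(rows, 42)
by_index = index_lookup(index, 42)

puts "scan found #{by_scan.size} rows, index found #{by_index.size} rows"
```

The index costs one extra pass and extra memory, which mirrors the usual tradeoff: lookups get faster while writes and storage get somewhat more expensive.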