
Why do big data and cloud systems slow down and stop? Shan Lu


  1. Why do big data and cloud systems slow down and stop? Shan Lu

  2. What are? Why do big data and cloud systems slow down and stop?

  3. Big data & cloud systems

  4. Big data & cloud systems
     ● DB-backed web applications
     ● Cloud services

  5. DB-backed web applications: HTTP request → Application server → Database query → DBMS

  6. Performance is critical for web applications
     ● Low latency is critical
       ○ Nearly half of users expect a site to load in less than 2 seconds
       ○ A 1-second delay in page load: 11% fewer page views, 16% less customer satisfaction, 7% loss in profit
     ● Low latency is challenging given the data size

  7. Cloud services

  10. Reliability is critical for cloud services

  12. Outline
      ● What slows down (big data) web applications? [ICSE’18]
        ○ What can we do about it? [CIKM’17, FSE’18, ICSE’19, CIDR’20] (1000+ bugs found)
      ● What stops cloud systems? [HotOS’19]
        ○ What can we do about it? [ASPLOS’16, ASPLOS’17, ASPLOS’18, PLDI’19, SOSP’19] (1000+ bugs found)

  13. What Slowed Down Database-Backed Web Applications
      Shan Lu, hyperloop.cs.uchicago.edu
      ● View-Centric Performance Optimization for Database-Backed Web Applications. ICSE’19.
      ● How not to structure your database-backed web applications: a study of performance bugs in the wild. ICSE’18.
      ● PowerStation: Automatically detecting and fixing inefficiencies of database-backed web applications in IDE. FSE’18.

  14. Common Web-app Architecture: HTTP request → Application server → Database query → DBMS

  15. Common Web-app Architecture
      ● The user sends an HTTP request for http://www.xxx.com/blogs/index to the application server
      ● Controller:
          class BlogsController
            def index
              user_id = 1
              @myblogs = Blog.retrieve(user_id)
            end
          end
      ● Model:
          class Blog
            def self.retrieve(user_id)
              Blog.where(uid: user_id)
            end
          end
      ● The query translator sends the query to the DBMS: SELECT * FROM blogs WHERE uid = id

  16. Common Web-app Architecture
      ● The user sends an HTTP request for http://www.xxx.com/blogs/index to the application server
      ● Controller:
          class BlogsController
            def index
              user_id = 1
              @myblogs = Blog.retrieve(user_id)
            end
          end
      ● Model: reads table blogs (columns: uid, contents) in the DBMS through the query translator
      ● View (app/views/blogs/index.html.erb):
          <% @myblogs.each do |blog| %>
            <%= blog.content %><br/>
          <% end %>
      ● Rendered page (http://blogs/index): "1001 unread blogs", followed by entries such as "Arriving at Zurich", "Stopping by Bern", "One day at Luzern", "Love love Berner Oberland", "Love Berner Oberland", "Back to Lausanne"

  17. Potential sources of inefficiencies
      ● Object Relational Mapping Framework: the model call Blog.where(uid: user_id) in class Blog is translated to SELECT * FROM blogs WHERE uid = id on table blogs (columns: uid, contents) in the DBMS

  18. Potential sources of inefficiencies
      ● MVC Design Pattern: Controller, Model (class Blog), and View (app/views/blogs/index.html.erb, which loops with @myblogs.each do |blog| and renders blog.content<br/>)
      ● Object Relational Mapping Framework: the model call Blog.where(uid: user_id) is translated to SELECT * FROM blogs WHERE uid = id on table blogs (columns: uid, contents) in the DBMS

  19. Outline
      ● How severe is the problem? Profile 12 apps from 6 common categories (64 issues in 40 pages)
      ● What are the common inefficiency patterns? Build performance-bug taxonomy (9 anti-patterns)
      ● How to solve the problem? Design automated bug detection & fixing (1000+ bugs)

  20. Outline
      ● Profile 12 apps from 6 common categories (64 issues in 40 pages)
      ● Build performance-bug taxonomy
      ● Design automated bug detection & fixing

  21. Profiling methodology
      ● Synthesize DB content based on real-world website statistics
      ● Top 2 apps in 6 popular categories
      How not to structure your database-backed web applications: a study of performance bugs in the wild. ICSE’18.

  22. Profiling End-to-end Page Time
      ● 6 apps have pages > 3s
      ● 11 apps have pages > 2s
      ● 40 problematic pages
      ● Server takes most time
      ● 20000 records

  23. Why is it slow? There are inefficiency bugs!

  24. Why is it slow? There are bugs!
      ● We manually fixed the 64 issues we found across 39 pages
      [Chart: LoC changed and speedup; the values 80% and 60% are marked]

  25. Outline
      ● Profile 12 apps from 6 common categories
      ● Build performance-bug taxonomy (9 anti-patterns)
      ● Design automated bug detection & fixing

  26. Common Performance Anti-patterns
      ● 64 performance issues from profiling
      ● 140 performance issues from bug tracking system
      ● 9 anti-patterns

  27. Common Performance Anti-patterns
      ● ORM API Misuse: 106 issues across 12 apps
      ● Application Design Tradeoff: 47 issues across 12 apps
      ● Database Design: 41 issues across 10 apps

  28. ORM API Misuse
      ● Inefficient Computation (IC): 26 issues across 8 apps
      ● Unnecessary Computation (UC): 22 issues across 10 apps
      ● Inefficient Rendering (IR): 5 issues across 4 apps
      ● Inefficient Data Access (ID): 44 issues across 11 apps
      ● Unnecessary Data Retrieval (UD): 9 issues across 4 apps

  30. ORM API Misuse: inefficient computation
      ● Inefficient: project.issues.count > 0 translates to SELECT COUNT(*) FROM issues WHERE project_id = ?
      ● Inefficient: project.issues.any? translates to SELECT COUNT(*) FROM issues WHERE project_id = ?
      ● Efficient: project.issues.exists? translates to SELECT 1 AS ONE FROM issues WHERE project_id = ? LIMIT 1 (2X speedup)
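
The gap between counting and probing for existence can be sketched in plain Ruby. This is only a toy model: `FakeRelation` and its `rows_visited` counter are hypothetical stand-ins, not the ActiveRecord API, and a real DBMS may answer either query from an index rather than by visiting rows.

```ruby
# Hypothetical stand-in for an ORM relation (NOT ActiveRecord).
# It records how many rows each call has to visit.
class FakeRelation
  attr_reader :rows_visited

  def initialize(rows)
    @rows = rows
    @rows_visited = 0
  end

  # Models SELECT COUNT(*): every matching row is visited.
  def count
    @rows.each { @rows_visited += 1 }
    @rows.size
  end

  # Models SELECT 1 ... LIMIT 1: stop at the first matching row.
  def exists?
    @rows.each do
      @rows_visited += 1
      return true
    end
    false
  end
end

issues = Array.new(20_000) { |i| { id: i } }

counting = FakeRelation.new(issues)
has_issues_by_count = counting.count > 0

probing = FakeRelation.new(issues)
has_issues_by_exists = probing.exists?

puts "count > 0: #{has_issues_by_count}, rows visited: #{counting.rows_visited}"
puts "exists?:   #{has_issues_by_exists}, rows visited: #{probing.rows_visited}"
```

Both checks give the same answer, but in this model the counting version touches every row while the existence probe stops after the first.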

  31. ORM API Misuse: unnecessary computation
        values.each do |value|
          u.issues.include? value
        end

  32. ORM API Misuse: unnecessary computation (20X speedup)
        + rans = u.issues
          values.each do |value|
        -   u.issues.include? value
        +   rans.include? value
          end
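
A minimal, self-contained sketch of this fix follows. `FakeUser` is a hypothetical class whose `issues` method counts one simulated query per call; it does not reproduce how a real ORM association caches or queries. The sketch uses `select` instead of the slide's `each` so the results of the two versions can be compared.

```ruby
# Hypothetical user model; `issues` stands in for an ORM association
# that runs one query per call. Not a real Rails class.
class FakeUser
  attr_reader :query_count

  def initialize(issue_ids)
    @issue_ids = issue_ids
    @query_count = 0
  end

  def issues
    @query_count += 1 # each call models one SELECT
    @issue_ids.dup
  end
end

values = (1..100).to_a

# Before: the query is re-issued on every iteration.
slow_user = FakeUser.new([3, 5, 8])
slow_hits = values.select { |value| slow_user.issues.include?(value) }

# After: the loop-invariant query is hoisted out of the loop.
fast_user = FakeUser.new([3, 5, 8])
rans = fast_user.issues
fast_hits = values.select { |value| rans.include?(value) }

puts "before: #{slow_user.query_count} queries, after: #{fast_user.query_count} query"
```

The two versions return the same matches; only the number of simulated queries changes, from one per loop iteration to one in total.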

  33. ORM API misuses that affect memory consumption
      ● map(&:id) VS pluck(:id)
      ● pluck(:size).sum VS sum(:size)
      ● pluck + pluck VS SQL UNION
      ● …

  34. How to tackle API Misuses?
      ● Why can't existing compilers handle this?
      ● Can we extend compilers to
        ○ Understand ORM APIs and queries?
        ○ Detect the problem?
        ○ Solve the problem?
      PowerStation: Automatically detecting and fixing inefficiencies of database-backed web applications in IDE. FSE’18

  35. Database-aware PDG
      (a) Ruby code:
            values.reject do |val|
              u.issues.include? val
            end
          The same loop rewritten with temporaries:
            v1 = u
            v2 = values
            v2.do |val|
              v3 = v1.issues
              v3.include? val
            end
      (b) PDG nodes: Copy: v1 = u; Copy: v2 = values; val = v2[]; Call: v3 = v1.issues (query node, SQL: SELECT * FROM issues WHERE user_id = ?); Call: v3.include? val
          Legend: query node, data edge, control edge

  36. Detect and Fix
      ● PDG nodes: Copy: v1 = u; Copy: v2 = values; val = v2[]; Call: v3 = v1.issues; Call: v3.include? val
      ● Call: v3 = v1.issues is flagged as a loop-invariant query
      ● Legend: query node, data edge, control edge
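
One way to picture the loop-invariant check is a small dependence-graph walk. The sketch below is an assumption-laden illustration, not PowerStation's implementation: the `Node` struct, the `in_loop` and `query` flags, and the rule itself are invented here, and only the node labels come from the slide. The rule is conservative: a query inside the loop is reported only if none of its transitive data dependencies is computed inside the loop.

```ruby
# Toy dependence graph for the slide's example. Node labels follow the slide;
# the Node struct and the detection rule are assumptions for illustration.
Node = Struct.new(:name, :in_loop, :query, :data_deps, keyword_init: true)

NODES = {
  v1:   Node.new(name: "v1 = u",           in_loop: false, query: false, data_deps: []),
  v2:   Node.new(name: "v2 = values",      in_loop: false, query: false, data_deps: []),
  val:  Node.new(name: "val = v2[]",       in_loop: true,  query: false, data_deps: [:v2]),
  v3:   Node.new(name: "v3 = v1.issues",   in_loop: true,  query: true,  data_deps: [:v1]),
  call: Node.new(name: "v3.include?(val)", in_loop: true,  query: false, data_deps: [:v3, :val])
}.freeze

# True if the node transitively reads a value computed inside the loop.
def depends_on_loop_value?(key, nodes, seen = {})
  return false if seen[key]

  seen[key] = true
  nodes.fetch(key).data_deps.any? do |dep|
    nodes.fetch(dep).in_loop || depends_on_loop_value?(dep, nodes, seen)
  end
end

# Query nodes inside the loop whose inputs all come from outside the loop.
def loop_invariant_queries(nodes)
  nodes.select { |key, node| node.in_loop && node.query && !depends_on_loop_value?(key, nodes) }
       .values
       .map(&:name)
end

puts loop_invariant_queries(NODES).inspect
```

On this graph the only reported node is the `v1.issues` call, which matches the node the slide marks as a loop-invariant query.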

  37. PowerStation (Integrated with RubyMine)
      [Screenshot: the PowerStation menu offers "Whole App" and "Single Action" analysis; the issue list has tabs DS, RD, LI, IA, CS, IR and FIX buttons. Example entry, blogs_controller.rb 4: "run_query is a loop invariant query. Fix: move it out of the loop"]

  38. Try our PowerStation!
      • 12 real world apps
      • 1221 inefficiencies found

  39. Common Performance Anti-patterns
      ● ORM API Misuse: 106 issues across 12 apps
      ● Application Design Tradeoff: 47 issues across 12 apps
      ● Database Design: 41 issues across 10 apps

  40. Database Design Problem
      ● Missing fields (8 issues across 5 apps): fields derivable from other fields and not persistently stored
        ○ Example columns: id, longitude, latitude, location (2X)
      ● Missing index (33 issues across 10 apps)
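
The cost of a missing index can be illustrated with an in-memory analogy. This is a rough sketch under stated assumptions: the table, the `project_id` column, and the `scan` and `lookup` helpers are hypothetical, and a Ruby Hash only approximates what a database index does.

```ruby
# Toy table; the column names are illustrative, not taken from the studied apps.
rows = Array.new(10_000) { |i| { id: i, project_id: i % 100 } }

# Without an index: every lookup scans the whole table.
def scan(rows, project_id)
  rows.select { |row| row[:project_id] == project_id }
end

# With an index: build a project_id => rows map once, then look up directly.
index = rows.group_by { |row| row[:project_id] }

def lookup(index, project_id)
  index.fetch(project_id, [])
end

puts scan(rows, 42).size
puts lookup(index, 42).size
```

Both calls return the same rows; the indexed version pays a one-time build cost and then avoids rescanning the table on each lookup.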
