analysis of derby performance
play

Analysis of Derby Performance Staff Engineer Olav Sandst Senior - PowerPoint PPT Presentation

Analysis of Derby Performance Staff Engineer Olav Sandst Senior Engineer Dyre Tjeldvoll Sun Microsystems Database Technology Group This is a draft version that is subject to change. The authors can be contacted at Olav.Sandstaa@sun.com or


  1. Analysis of Derby Performance Staff Engineer Olav Sandstå Senior Engineer Dyre Tjeldvoll Sun Microsystems Database Technology Group This is a draft version that is subject to change. The authors can be contacted at Olav.Sandstaa@sun.com or Dyre.Tjeldvoll@sun.com 1

  2. Overview ● Introduction – What is a database (DBMS)? – Derby Architecture – What is performance? ● Performance Evaluation of Derby ● Performance Tips ● Comparison with MySQL and PostgreSQL 2

  3. Introduction 3

  4. Derby Architecture ● Embedded ● Network Server Appl. Network Appl. JDBC server JDBC JDBC SQL SQL Access Access Storage Storage 4

  5. What Is “Performance”? ● How do you measure database performance? – Throughput – Response time ● Average? ● Max? ● Median? ● Out-of-the-box or carefully tuned? ● How to compare database systems with different tuning possibilities? ● Zero administration? 5

  6. Performance Evaluation of Derby 6

  7. Performance Evaluation of Derby ● Evaluation of disk and file system configurations ● Comparing Embedded and Network Server – Throughput and response times – Resource usage (CPU, network) ● The effects of adjusting the database buffer size (page cache size) ● The effect of keeping the log on a separate disk 7

  8. Derby Performance Evaluation: Derby, OS and Hardware Configuration ● Derby configuration: – “Out of the box” – “Main memory” database – “Disk” database ● Log device: – Same as data device – Separate log disk ● Disk write cache: – Enabled – Disabled 8

  9. Disk and File System Configurations ● Hard disk: – Write cache ● File system: – Logging/journaling in UFS – Direct I/O (partially evaluated) – File meta-data update (not evaluated) 9

  10. Disk Write Cache: Throughput and Response Time 10

  11. Disk and File System Configurations: Conclusions ● Disk Write Cache: Do users want high throughput and low response times, or durability/recoverability? Who are we optimizing for? ● Without write cache on disks a higher number of clients are needed to get maximum throughput ● Things to evaluate: – UFS logging – Java NIO – Direct I/O (not available from Java) – Avoid update of file meta-data 11

  12. Comparing Embedded with Network Server: TPC-B: Throughput and Response Time 12

  13. Comparing Embedded with Network Server: Single-record SELECT 13

  14. Comparing Embedded with Network Server: Resource Usage Derby Embedded: Derby Network Server: CPU? Network? Appl. Network Appl. JDBC server JDBC JDBC CPU? SQL SQL Network? Access Access Storage Storage 14

  15. Comparing Embedded and Network Server: CPU Usage CPU usage per transaction [ms] 1.4 1.3 1.2 1.1 1 0.9 0.8 User CPU 0.7 System CPU 0.6 0.5 0.4 0.3 0.2 0.1 0 TPC- TPC- Insert Insert Se- Se- Up- Up- B B CS Emb CS lect lect date date Emb Emb Cs Emb CS 15

  16. Comparing Embedded and Network Server: CPU Usage, cont. Increase in CPU usage compared to Embedded: System CPU User CPU Total TPC-B 0.09 64% 0.25 32% 36% Insert 0.04 100% 0.05 21% 33% Select 0.04 133% 0.06 52% 65% Update 0.03 75% 0.04 21% 30% Message handling The Network (TCP/IP) Server Code 16

  17. Comparing Embedded with Network Server: Conclusions ● Reduction in throughput of Network Server compared to Embedded: – Update operations: 5% – Select operations: 30-40% ● Response time: – Small increase of about 0.5 ms per SQL operation ● Utilizing the write cache on the disks increases the difference for update operations ● What can be done? – Profile to find CPU intensive parts in the Network Server code – Optimize 17

  18. Comparing Embedded with Network Server: Conclusions ● The Network Server adds 30%-60% to the CPU usage compared to the Embedded version ● The main causes for increased CPU usage: – System CPU usage: ● Message sending and receiving – User CPU usage: ● Message parsing (Strings) ● Character set conversions 18

  19. Performance Tips 19

  20. Performance Tips ● Programming: – Prepared statements ● Database schema design: – Indices ● Derby configuration: – durability=test , if recovery is not important 20

  21. Prepared Statements ● Compilation is expensive, especially in Derby which uses Java byte code generation ● Prepared statements virtually eliminates this cost ● Easy to fall into this trap for beginners (string concatenation is easy) ● <Example> 21

  22. Indices ● Use indices to optimize much used access paths ● <Example> 22

  23. Relaxed Durability ● Durability is expensive – Log must be written to disk prior to commit – This becomes a major bottleneck (cf. write cache, log on separate device) ● Durability can be disabled – Setting durability=test disables log flushing at commit – Dramatically increases throughput... – ... but database may be corrupt after a crash – Would be better if Derby could recover to a consistent state (with some transaction loss) – An alternative to a main memory database ● <Example> 23

  24. Comparing the Performance of MySQL, PostgreSQL and Derby 24

  25. Performance evaluation: MySQL, PostgreSQL and Derby Evaluated performance of: ● MySQL/InnoDB ● PostgreSQL ● Derby Embedded ● Derby Client/Server 25

  26. What is a Database (DBMS)? ● Transactions ● ACID – Atomicity – Consistency – Isolation – Durability Database Atomicity Consistency Isolation Durability Derby Yes Yes Yes Yes PostgreSQL Yes Yes Yes Yes MySQL, Innodb Yes Yes Yes Yes MySQL, MyIsam No ? Table lock Yes HSQLDB No ? ? No 26

  27. Performance Experiments ● Two configurations: – “Small” main-memory database: 10 MB data/50MB database buffer – “Big” disk-based database: 50 MB data/10MB database buffer ● Tests: – TPC-B – Single-tuple operations: select ● Load: – 1-100 concurrent clients 27

  28. Throughput: TPC-B “Small” db “Big” db 1600 500 Derby embedded Derby embedded Derby client/server Derby client/server MySQL (InnoDB) MySQL (InnoDB) 450 1400 PostgreSQL PostgreSQL 400 1200 350 Transactions per second Transactions per second 1000 300 800 250 200 600 150 400 100 200 50 0 0 0 20 40 60 80 100 0 20 40 60 80 100 Number of clients Number of clients 28

  29. Throughput: Select (“small” db) 25000 Derby embedded Derby client/server MySQL (InnoDB) PostgreSQL 20000 Transactions per second 15000 10000 5000 0 0 20 40 60 80 100 29 Number of clients

  30. Conclusions ● Derby outperforms MySQL on large databases ● MySQL performs better on small main-memory databases ● No significant performance loss with client/server, except for SELECT operations ● Need to identify, and reduce the CPU overhead that limits throughput in CPU-bound configurations 30

  31. Disk IO: “Small” Databases 2.5 2.25 2 1.75 1.5 Db writes per transaction 1.25 Log writes per transaction 1 0.75 0.5 0.25 0 Derby em- Derby cli- MySQL PostgreSQL bedded ent/server 31

  32. Disk IO: “Big” Databases Why does Derby out-perform MySQL for disk-based databases? 2 1.8 1.6 - TPC-B like load - 50 MB database 1.4 - 10 MB buffer 1.2 Db writes per 1 transaction Log writes per transaction 0.8 0.6 0.4 0.2 0 Derby client/server MySQL 32

  33. JDBC Driver: CPU Usage CPU usage per TPC-B transaction [ms]: 0.16 0.15 0.14 0.13 0.12 0.11 0.1 0.09 0.08 User System 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0 33 Derby MySQL PostgreSQL

  34. Conclusions: Resource Usage ● MySQL performs better than Derby when: – The database is small and fits in the database buffer – Throughput becomes CPU-bound ● Derby performs better than MySQL when: – The database is large and does not fit in the database buffer – Throughput becomes IO-bound ● Derby has focused on maintaining a low footprint 34

Recommend


More recommend