understanding the role of io as a bottleneck
play

Understanding the Role of IO as a Bottleneck Morgan Tocker - PowerPoint PPT Presentation

Understanding the Role of IO as a Bottleneck Morgan Tocker firstname @percona.com 1 Wednesday, April 14, 2010 "Disks are the root cause of all database performance problems" - Anonymous 2 Wednesday, April 14, 2010 Is that true?


  1. Understanding the Role of IO as a Bottleneck Morgan Tocker firstname @percona.com 1 Wednesday, April 14, 2010

  2. "Disks are the root cause of all database performance problems" - Anonymous 2 Wednesday, April 14, 2010

  3. Is that true? ★ That statement really the theme of this talk. ✦ I hate to talk in tautologies -so I won’t actually answer if it is true or false . ✦ What I will do is share the background theory so you can answer it for your specific case. 3 Wednesday, April 14, 2010

  4. Let’s start with some context: L1 cache reference 0.5 ns Branch mispredict 5 ns L2 cache reference 7 ns Mutex lock/unlock 25 ns Main memory reference 100 ns Compress 1K bytes with Zippy 3,000 ns Send 2K bytes over 1 Gbps network 20,000 ns Read 1 MB sequentially from memory 250,000 ns Round trip within same datacenter 500,000 ns Disk seek 10,000,000 ns Read 1 MB sequentially from disk 20,000,000 ns Send packet CA->Netherlands->CA 150,000,000 ns See: http://www.linux-mag.com/cache/7589/1.html and Google http:// 4 www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf Wednesday, April 14, 2010

  5. Mental Math.. ★ 10,000,000 ns = 10ms = 100 operations/second. ✦ The figure Google quoted here is about the average for a 7200RPM drive. ✦ When we talk about our storage devices, we most commonly measure them in IOPS (IO operations per second). ✦ So a 7200RPM drive can do approximately 100IOPS. 5 Wednesday, April 14, 2010

  6. Why count “operations”? ★ Because there’s not much difference between doing one small request versus one slightly larger request. Again; Disk seek 10,000,000 ns Read 1 MB sequentially from disk 20,000,000 ns 6 Wednesday, April 14, 2010

  7. Yeah, there’a gap: ★ For each disk operation: ✦ Millions of CPU operations can be done. ✦ Hundreds of thousands of memory operations can be done. 7 Wednesday, April 14, 2010

  8. [os default] Everything is buffered! ★ When you write to a file, here’s what happens in the Operating System: Block 9, 10, 1, 4, 200, 5. Block 1, 4, 5, 9, 10, 200 8 Wednesday, April 14, 2010

  9. [os default] Everything is buffered! ★ When you write to a file, here’s what happens in the Operating System: Block 9, 10, 1, 4, 200, 5. Block 1, 4, 5, 9, 10, 200 What happens to this buffer if we loose power? 8 Wednesday, April 14, 2010

  10. The OS provides a way! ★ $ man fsync Hint: MyISAM just writes to the OS buffer and has no durability. Synopsis #include <unistd.h> int fsync(int fd); int fdatasync(int fd); Description fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)). 9 http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ Wednesday, April 14, 2010

  11. Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010

  12. Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010

  13. Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010

  14. Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010

  15. Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010

  16. Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010

  17. Basic Operation (cont.) Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010

  18. Basic Operation (cont.) Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010

  19. Basic Operation (cont.) Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010

  20. Basic Operation (cont.) Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010

  21. Basic Operation (cont.) 01010 Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010

  22. Basic Operation (cont.) 01010 Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010

  23. Basic Operation (cont.) 01010 Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010

  24. Basic Operation (cont.) 01010 Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010

  25. Why does InnoDB work like...? ★ Log files are sequential IO. ★ Writing down “dirty pages” is reordered to be sequential IO instead of order of the operation. ★ Many database systems actually work this way. 12 Wednesday, April 14, 2010

  26. So now we’ve established... ★ That: ✦ Disks are slow. ✦ We try and write our algorithms around them wherever we can possibly do so. ★ The next question becomes: ✦ When is touching disks unavoidable? 13 Wednesday, April 14, 2010

  27. For Reads: ★ I like to think of this as a game of pong, and it depends how big our paddle is as to how many ‘physical reads’ we need: 14 Wednesday, April 14, 2010

  28. For Reads (cont.) ★ If the paddle is big enough to cover all our data - you will have no physical (disk) reads. ✦ If it’s not big enough, it depends how much of your data needs to be in your ‘working set’. 15 Wednesday, April 14, 2010

  29. For Writes: ★ Writes don’t really get to play pong. They always go down to disk - this is because we need persistence after crashes. ✦ Think D urability in ACID. 16 Wednesday, April 14, 2010

  30. Interpreting OS commands ★ [On Linux] You’re best friend here is iostat. ✦ vmstat has some info. Just not always enough. ★ Let’s try and interpret a very simple example to start with.... 17 Wednesday, April 14, 2010

Recommend


More recommend