Understanding the Role of IO as a Bottleneck Morgan Tocker firstname @percona.com 1 Wednesday, April 14, 2010
"Disks are the root cause of all database performance problems" - Anonymous 2 Wednesday, April 14, 2010
Is that true? ★ That statement really the theme of this talk. ✦ I hate to talk in tautologies -so I won’t actually answer if it is true or false . ✦ What I will do is share the background theory so you can answer it for your specific case. 3 Wednesday, April 14, 2010
Let’s start with some context: L1 cache reference 0.5 ns Branch mispredict 5 ns L2 cache reference 7 ns Mutex lock/unlock 25 ns Main memory reference 100 ns Compress 1K bytes with Zippy 3,000 ns Send 2K bytes over 1 Gbps network 20,000 ns Read 1 MB sequentially from memory 250,000 ns Round trip within same datacenter 500,000 ns Disk seek 10,000,000 ns Read 1 MB sequentially from disk 20,000,000 ns Send packet CA->Netherlands->CA 150,000,000 ns See: http://www.linux-mag.com/cache/7589/1.html and Google http:// 4 www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf Wednesday, April 14, 2010
Mental Math.. ★ 10,000,000 ns = 10ms = 100 operations/second. ✦ The figure Google quoted here is about the average for a 7200RPM drive. ✦ When we talk about our storage devices, we most commonly measure them in IOPS (IO operations per second). ✦ So a 7200RPM drive can do approximately 100IOPS. 5 Wednesday, April 14, 2010
Why count “operations”? ★ Because there’s not much difference between doing one small request versus one slightly larger request. Again; Disk seek 10,000,000 ns Read 1 MB sequentially from disk 20,000,000 ns 6 Wednesday, April 14, 2010
Yeah, there’a gap: ★ For each disk operation: ✦ Millions of CPU operations can be done. ✦ Hundreds of thousands of memory operations can be done. 7 Wednesday, April 14, 2010
[os default] Everything is buffered! ★ When you write to a file, here’s what happens in the Operating System: Block 9, 10, 1, 4, 200, 5. Block 1, 4, 5, 9, 10, 200 8 Wednesday, April 14, 2010
[os default] Everything is buffered! ★ When you write to a file, here’s what happens in the Operating System: Block 9, 10, 1, 4, 200, 5. Block 1, 4, 5, 9, 10, 200 What happens to this buffer if we loose power? 8 Wednesday, April 14, 2010
The OS provides a way! ★ $ man fsync Hint: MyISAM just writes to the OS buffer and has no durability. Synopsis #include <unistd.h> int fsync(int fd); int fdatasync(int fd); Description fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)). 9 http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ Wednesday, April 14, 2010
Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010
Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010
Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010
Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010
Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010
Basic Operation (High Level) Log Files SELECT * FROM City WHERE CountryCode= ʼ AUS ʼ Tablespace Buffer Pool 10 Wednesday, April 14, 2010
Basic Operation (cont.) Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010
Basic Operation (cont.) Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010
Basic Operation (cont.) Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010
Basic Operation (cont.) Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010
Basic Operation (cont.) 01010 Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010
Basic Operation (cont.) 01010 Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010
Basic Operation (cont.) 01010 Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010
Basic Operation (cont.) 01010 Log Files UPDATE City SET name = 'Morgansville' WHERE name = 'Brisbane' AND CountryCode='AUS' Tablespace Buffer Pool 11 Wednesday, April 14, 2010
Why does InnoDB work like...? ★ Log files are sequential IO. ★ Writing down “dirty pages” is reordered to be sequential IO instead of order of the operation. ★ Many database systems actually work this way. 12 Wednesday, April 14, 2010
So now we’ve established... ★ That: ✦ Disks are slow. ✦ We try and write our algorithms around them wherever we can possibly do so. ★ The next question becomes: ✦ When is touching disks unavoidable? 13 Wednesday, April 14, 2010
For Reads: ★ I like to think of this as a game of pong, and it depends how big our paddle is as to how many ‘physical reads’ we need: 14 Wednesday, April 14, 2010
For Reads (cont.) ★ If the paddle is big enough to cover all our data - you will have no physical (disk) reads. ✦ If it’s not big enough, it depends how much of your data needs to be in your ‘working set’. 15 Wednesday, April 14, 2010
For Writes: ★ Writes don’t really get to play pong. They always go down to disk - this is because we need persistence after crashes. ✦ Think D urability in ACID. 16 Wednesday, April 14, 2010
Interpreting OS commands ★ [On Linux] You’re best friend here is iostat. ✦ vmstat has some info. Just not always enough. ★ Let’s try and interpret a very simple example to start with.... 17 Wednesday, April 14, 2010
Recommend
More recommend