Caching and Layered Disk Devices CS 4411 Spring 2020
Announcements
• EGOS Code Updated
• Impending Cornell shutdown
  • Next lecture will be conducted via Zoom
  • Office hours will be Zoom meetings hosted by TAs
  • Links will be posted to Piazza

Outline for Today
• Block Cache Design
  • Memory hierarchy
  • Disk blocks and block cache
  • Write-Through vs. Write-Back
• The EGOS storage system
  • Block devices
  • Layering
  • Code details
Memory Hierarchy
Level                 Access Time          Size
L1 Cache              4 cycles (1 ns)      64 KB
L2 Cache              12 cycles (4 ns)     256 KB
L3 Cache              42 cycles (14 ns)    9 MB
Main Memory           75 ns                16 GB
Hard Disk (SSD)       100 μs               256 GB
Hard Disk (Spinning)  10 ms                1 TB
Hard Disk Abstraction
• Disk drivers provide read/write operations in units of blocks
  • Usually 512 bytes, based on sector size of a spinning disk
• File system stores files in groups of blocks
[Figure: the disk drawn as a row of numbered blocks (Block # 0 through 15, …), with the example call write(5, 128, &buf).]
Operations to Read from a File
1. Read file's inode into memory
2. Get location of data blocks from inode
3. Read each data block in range of read request into memory
4. Respond to file read request
[Figure: the file system process issues read(542) to the disk and receives block 542 (the foo.txt inode) 10 ms later, then issues read(544) and read(545) for foo.txt data 1 and data 2. Disk blocks 540-548 are shown; block 542 holds the foo.txt inode and blocks 544-546 hold foo.txt data 1-3.]
Operations to Read from a File
• How long does it take to read 1000 bytes from the file?
• What happens when we get another read request for the same file?
• Why is this inefficient?
[Figure: same disk-read sequence as on the previous slide: read(542), then read(544) and read(545), each answered by the disk.]
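A rough worked answer to the first question can be sketched in C. The 512-byte block size comes from the earlier slide; the 10 ms cost per disk access, the read starting at byte offset 0, and the helper names `blocks_touched` and `uncached_read_ms` are assumptions made only for this illustration.

```c
#include <assert.h>

#define BLOCK_SIZE 512u      /* bytes per block, as on the earlier slide */
#define DISK_ACCESS_MS 10u   /* assumed cost of one spinning-disk access */

/* Number of data blocks touched by reading `len` bytes starting at byte `off`. */
unsigned blocks_touched(unsigned off, unsigned len) {
    assert(len > 0);
    unsigned first = off / BLOCK_SIZE;
    unsigned last = (off + len - 1) / BLOCK_SIZE;
    return last - first + 1;
}

/* Uncached cost: one disk access for the inode block plus one per data block. */
unsigned uncached_read_ms(unsigned off, unsigned len) {
    return (1 + blocks_touched(off, len)) * DISK_ACCESS_MS;
}
```

Under these assumptions, `uncached_read_ms(0, 1000)` is 30 ms: one inode read plus two data-block reads, matching the three disk reads drawn on the slide. A second identical request pays the same 30 ms again, which is the inefficiency the last question points at.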
The Block Cache
• Store recently-used disk blocks in memory
• Cache entry metadata indicates which block it caches (if any)
• Reading a cached block is a memory access, not a disk access
[Figure: a block cache (in memory) drawn next to the disk (blocks 540-548). Each cache slot carries metadata (block #, valid); the slots shown hold the foo.txt inode and foo.txt data 1, plus an empty slot.]
Using the Block Cache
• After a cache miss, put the requested block in the cache
[Figure: the file system process issues read(542) and read(544); both are served from the block cache (in memory), with block 542 returned 14-75 ns later. read(545) misses, is forwarded to the disk, and block 545 (foo.txt data 2) comes back 10 ms later and is placed in the cache.]
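The hit and miss paths can be sketched as follows. This is a minimal illustration, not EGOS code: the in-memory `disk` array, the `disk_reads` counter, the slot layout, and the function names are all assumptions, and a full cache simply skips caching here because eviction is a separate topic.

```c
#include <assert.h>

#define BLOCK_SIZE 512
#define NSLOTS 4
#define NDISK 16

typedef struct { char bytes[BLOCK_SIZE]; } block_t;

/* Stand-in for the disk: an in-memory array plus a counter of disk reads. */
block_t disk[NDISK];
unsigned disk_reads;

/* One cache slot: metadata (block number, valid flag) plus the cached data. */
struct slot { unsigned block_no; int valid; block_t data; };
struct slot cache[NSLOTS];

void disk_read(unsigned block_no, block_t *out) {
    assert(block_no < NDISK);
    *out = disk[block_no];
    disk_reads++;
}

/* Read through the cache. Returns 1 on a hit, 0 on a miss. */
int cache_read(unsigned block_no, block_t *out) {
    int free_slot = -1;
    for (int i = 0; i < NSLOTS; i++) {
        if (cache[i].valid && cache[i].block_no == block_no) {
            *out = cache[i].data;        /* hit: memory access only */
            return 1;
        }
        if (!cache[i].valid && free_slot < 0)
            free_slot = i;               /* remember the first empty slot */
    }
    disk_read(block_no, out);            /* miss: go to the disk */
    if (free_slot >= 0) {                /* then keep a copy if there is room */
        cache[free_slot].block_no = block_no;
        cache[free_slot].valid = 1;
        cache[free_slot].data = *out;
    }
    return 0;
}
```

Calling `cache_read` twice for the same block touches the stand-in disk only on the first call; the second call is answered from the slot filled after the miss.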
Cache Eviction
• What if the cache is full and a process needs to read a new block?
• Choose a block to evict based on an eviction algorithm
  • LRU, LFU, CLOCK, etc.
  • Block cache service must keep state for this algorithm
  • This assignment: CLOCK
[Figure: a full block cache holding foo.txt and bar.txt blocks while a process issues read(548).]
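As a reminder of what CLOCK keeps as state, here is a minimal sketch of the textbook algorithm: one reference bit per slot and a hand that sweeps the slots. It tracks only which block each slot holds. The struct layout, the name `clock_access`, and the choice to set the reference bit when a block is first loaded are assumptions of this sketch, not the assignment's required design.

```c
#include <assert.h>

#define NSLOTS 3

struct clock_slot { unsigned block_no; int valid; int referenced; };

struct clock_cache {
    struct clock_slot slots[NSLOTS];
    unsigned hand;              /* next slot the clock hand will inspect */
};

/* Record an access to block_no; return the index of the slot now holding it. */
unsigned clock_access(struct clock_cache *c, unsigned block_no) {
    assert(c != 0);
    for (unsigned i = 0; i < NSLOTS; i++) {
        if (c->slots[i].valid && c->slots[i].block_no == block_no) {
            c->slots[i].referenced = 1;   /* hit: earn a second chance */
            return i;
        }
    }
    /* Miss: sweep the hand until it finds an empty or unreferenced slot. */
    for (;;) {
        unsigned idx = c->hand;
        struct clock_slot *s = &c->slots[idx];
        c->hand = (c->hand + 1) % NSLOTS;
        if (s->valid && s->referenced) {
            s->referenced = 0;            /* spend the second chance */
            continue;
        }
        s->block_no = block_no;           /* victim found: reuse this slot */
        s->valid = 1;
        s->referenced = 1;
        return idx;
    }
}
```

The sweep always terminates: after at most one full revolution every reference bit has been cleared, so the next slot under the hand becomes the victim. A real cache layer would also write back a dirty victim before reusing its slot.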
Handling Writes
• What if a process writes to a block that's in the cache?
• Opt. 1: Forward the write to the disk now
  • Write-Through cache
[Figure: the file system process issues write(544) with new foo.txt data 1; the block cache updates its copy and forwards the write to disk block 544; "Done" is returned 10 ms later.]
Handling Writes
• Opt. 2: Write to the cache, then sync to disk later
  • Write-Back cache
  • Mark cache slots as "dirty" when they need syncing
[Figure: write(544) updates the cached copy of foo.txt data 1 and sets Dirty = true on that slot before "Done" is returned; a sync process writes the block to disk later (10 ms).]
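The two options can be compared side by side in a small sketch. Everything here is illustrative: the in-memory `disk` array, the `disk_writes` counter, and the function names are assumptions, and a write-back to a block that is not cached just returns -1 instead of loading the block first.

```c
#include <assert.h>

#define NSLOTS 4
#define NDISK 16

typedef struct { char bytes[512]; } block_t;

block_t disk[NDISK];        /* stand-in for the disk */
unsigned disk_writes;       /* number of (slow) disk writes performed */

struct slot { unsigned block_no; int valid; int dirty; block_t data; };
struct slot cache[NSLOTS];

void disk_write(unsigned block_no, const block_t *in) {
    assert(block_no < NDISK);
    disk[block_no] = *in;
    disk_writes++;
}

/* Return the index of the slot caching block_no, or -1 if it is not cached. */
int find_slot(unsigned block_no) {
    for (int i = 0; i < NSLOTS; i++)
        if (cache[i].valid && cache[i].block_no == block_no)
            return i;
    return -1;
}

/* Opt. 1, write-through: update the cached copy and the disk before returning. */
void write_through(unsigned block_no, const block_t *in) {
    int i = find_slot(block_no);
    if (i >= 0)
        cache[i].data = *in;
    disk_write(block_no, in);
}

/* Opt. 2, write-back: update only the cached copy and mark the slot dirty. */
int write_back(unsigned block_no, const block_t *in) {
    int i = find_slot(block_no);
    if (i < 0)
        return -1;           /* not cached: out of scope for this sketch */
    cache[i].data = *in;
    cache[i].dirty = 1;
    return 0;
}

/* Sync: flush every dirty slot to the disk and clear its dirty flag. */
void cache_sync(void) {
    for (int i = 0; i < NSLOTS; i++) {
        if (cache[i].valid && cache[i].dirty) {
            disk_write(cache[i].block_no, &cache[i].data);
            cache[i].dirty = 0;
        }
    }
}
```

With write-back, repeated writes to the same cached block cost a single disk write at sync time, at the price of losing those writes if the machine fails before the sync. Write-through pays for a disk write on every call but never leaves the disk stale.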
Storage in EGOS
• Disk server (disksvr.c) reads and writes blocks to HW
• Block server (blocksvr.c) also reads and writes blocks
  • Forwards requests to disk server (eventually)
  • Blocks are grouped by inode #
• File server (bfs.c) stores files in sequences of blocks
  • Each file has an inode for its blocks
[Figure: user proc -> BFS server (req: read file) -> block server (req: read blocks) -> disk server (read block); replies flow back along the same path (reply: block).]
Block Service Layering
• Within the block server, a stack of block stores
• Each block store has the same interface
• Block server sends requests to top of stack
• Each block store knows the block store below it, can "pass through" read and write operations
[Figure: Block Server -> TreeDisk block store -> ClockDisk block store -> ProtDisk (relay) block store -> disk server. read(inode, block) is passed down each layer and block data is returned back up; the bottom layer issues read(block) to the disk server.]
Block Service Layering
• This is how EGOS adds a block cache: it's a block store layer!
• Reads don't have to be forwarded if the block is in the cache
[Figure: same block store stack as on the previous slide, with ClockDisk as the cache layer.]
Block Service Layering
• Important: Each block store can have its own interpretation of inode numbers
• In TreeDisk, inodes track groups of blocks belonging to the same file
• In ProtDisk, inodes represent disk partitions on the underlying disk server
  • Right now there's only one, so all ProtDisk ops have inode = 0
[Figure: same block store stack as on the previous slides.]
Adding "Objects" to C
• A block store is a struct full of function pointers
• Each FP is a "member function" whose first argument is "this"
• Also a pointer to some other struct containing the block store's state (private member variables)

    typedef struct block_store {
        void *state;
        int (*getninodes)(struct block_store *this_bs);
        int (*getsize)(struct block_store *this_bs, unsigned int ino);
        int (*setsize)(struct block_store *this_bs, unsigned int ino,
                       block_no newsize);
        int (*read)(struct block_store *this_bs, unsigned int ino,
                    block_no offset, block_t *block);
        int (*write)(struct block_store *this_bs, unsigned int ino,
                     block_no offset, block_t *block);
        void (*release)(struct block_store *this_bs);
        int (*sync)(struct block_store *this_bs, unsigned int ino);
    } block_store_t;

    typedef block_store_t *block_if;
Adding "Objects" to C
• Each block store "class" can inherit this "interface" by providing functions matching the FP types

    int clockdisk_getninodes(block_store_t *this_bs);
    int clockdisk_getsize(block_store_t *this_bs, unsigned int ino);
    int clockdisk_setsize(block_store_t *this_bs, unsigned int ino,
                          block_no newsize);
    int clockdisk_read(block_store_t *this_bs, unsigned int ino,
                       block_no offset, block_t *block);
    int clockdisk_write(block_store_t *this_bs, unsigned int ino,
                        block_no offset, block_t *block);
    void clockdisk_release(block_store_t *this_bs);
    int clockdisk_sync(block_store_t *this_bs, unsigned int ino);

• Each can define its own state struct

    struct clockdisk_state {
        block_if below;
        block_t* blocks;
        block_no nblocks;
    };
Adding "Objects" to C
• Each block store class has a "constructor" that returns a block_if

    block_if clockdisk_init(block_if below, block_t *blocks,
                            block_no nblocks) {
        /* Initialize this object's state */
        struct clockdisk_state *cs = new_alloc(struct clockdisk_state);
        cs->below = below;
        cs->blocks = blocks;
        cs->nblocks = nblocks;

        /* Assign the function pointers in block_store_t to this
           class's implementation of those functions */
        block_if this_bs = new_alloc(block_store_t);
        this_bs->state = cs;
        this_bs->getninodes = clockdisk_getninodes;
        this_bs->getsize = clockdisk_getsize;
        this_bs->setsize = clockdisk_setsize;
        this_bs->read = clockdisk_read;
        this_bs->write = clockdisk_write;
        this_bs->release = clockdisk_release;
        this_bs->sync = clockdisk_sync;
        return this_bs;
    }
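To see the pattern end to end, here is a self-contained sketch of two stacked block stores. It is not EGOS code: the interface is cut down to `read` and `write`, `malloc` stands in for `new_alloc`, the `block_no` and `block_t` definitions are guesses, and the `ramdisk` and `countdisk` layers are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int block_no;                  /* assumed definition */
typedef struct { char bytes[512]; } block_t;    /* assumed definition */

/* Cut-down interface: only read and write are kept for this sketch. */
typedef struct block_store {
    void *state;
    int (*read)(struct block_store *this_bs, unsigned int ino,
                block_no offset, block_t *block);
    int (*write)(struct block_store *this_bs, unsigned int ino,
                 block_no offset, block_t *block);
} block_store_t;
typedef block_store_t *block_if;

/* Bottom layer (hypothetical): blocks live in a caller-supplied array. */
struct ramdisk_state { block_t *blocks; block_no nblocks; };

static int ramdisk_read(block_if this_bs, unsigned int ino,
                        block_no offset, block_t *block) {
    struct ramdisk_state *rs = this_bs->state;
    if (ino != 0 || offset >= rs->nblocks) return -1;
    *block = rs->blocks[offset];
    return 0;
}

static int ramdisk_write(block_if this_bs, unsigned int ino,
                         block_no offset, block_t *block) {
    struct ramdisk_state *rs = this_bs->state;
    if (ino != 0 || offset >= rs->nblocks) return -1;
    rs->blocks[offset] = *block;
    return 0;
}

block_if ramdisk_init(block_t *blocks, block_no nblocks) {
    struct ramdisk_state *rs = malloc(sizeof *rs);
    block_if this_bs = malloc(sizeof *this_bs);
    assert(rs != NULL && this_bs != NULL);
    rs->blocks = blocks;
    rs->nblocks = nblocks;
    this_bs->state = rs;
    this_bs->read = ramdisk_read;
    this_bs->write = ramdisk_write;
    return this_bs;
}

/* Pass-through layer (hypothetical): counts reads, then forwards below. */
struct countdisk_state { block_if below; unsigned nreads; };

static int countdisk_read(block_if this_bs, unsigned int ino,
                          block_no offset, block_t *block) {
    struct countdisk_state *cs = this_bs->state;
    cs->nreads++;
    return cs->below->read(cs->below, ino, offset, block);
}

static int countdisk_write(block_if this_bs, unsigned int ino,
                           block_no offset, block_t *block) {
    struct countdisk_state *cs = this_bs->state;
    return cs->below->write(cs->below, ino, offset, block);
}

block_if countdisk_init(block_if below) {
    struct countdisk_state *cs = malloc(sizeof *cs);
    block_if this_bs = malloc(sizeof *this_bs);
    assert(cs != NULL && this_bs != NULL);
    cs->below = below;
    cs->nreads = 0;
    this_bs->state = cs;
    this_bs->read = countdisk_read;
    this_bs->write = countdisk_write;
    return this_bs;
}
```

A caller builds the stack bottom-up, for example `block_if top = countdisk_init(ramdisk_init(backing, 4));`, and then only ever calls `top->read(top, ...)` and `top->write(top, ...)`. The caller never needs to know which layers sit underneath, which is what lets a cache layer be slotted into the stack.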