redo logging (finish) / distributed systems 1

last time (1): block groups keep related data+metadata in one part of disk (preference, not requirement; exceptions can span multiple block groups); divide up block/inode indices between block …


  1. idempotency: logged operations should be okay to do twice (= idempotent). good example: overwrite inode number X with new value (as long as last committed inode value in log is right…); good example: overwrite data block with new value; good example: set inode link count to 4; bad example: allocate new inode with particular contents; bad example: increment inode link count; bad example: append data to last used block of file
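
The difference between the good and bad link-count examples can be sketched as below. This is a minimal illustration, not code from any real filesystem: `struct inode` and all function names are hypothetical stand-ins, and "replaying twice" models a log entry being applied again after a crash.

```c
/* Hypothetical in-memory stand-in for an on-disk inode. */
struct inode {
    int link_count;
};

/* Idempotent log record: "set inode link count to N". */
void apply_set_link_count(struct inode *ip, int new_count) {
    ip->link_count = new_count;
}

/* Non-idempotent log record: "increment inode link count". */
void apply_increment_link_count(struct inode *ip) {
    ip->link_count += 1;
}

/* Apply the "set" record twice and return the resulting link count. */
int replay_set_twice(int start, int new_count) {
    struct inode ino = { start };
    apply_set_link_count(&ino, new_count);
    apply_set_link_count(&ino, new_count);
    return ino.link_count;
}

/* Apply the "increment" record twice and return the resulting link count. */
int replay_increment_twice(int start) {
    struct inode ino = { start };
    apply_increment_link_count(&ino);
    apply_increment_link_count(&ino);
    return ino.link_count;
}
```

Starting from a link count of 3, replaying "set to 4" twice still leaves 4, while replaying "increment" twice overshoots to 5.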

  2. redo logging summary: write intended operation to the log before ever touching ‘real’ data, in a format that’s safe to do twice; write marker to commit to the log (if the marker exists, the operation will be done eventually); then actually update the real data

  3. redo logging and filesystems: filesystems that do redo logging are called journalling filesystems

  4. exercise (1): suppose OS is performing the operation of appending 100KB to a 100KB file X in directory Y and uses redo logging, on an ext2-like filesystem with 1KB blocks and 4B block pointers. part 1: what’s modified? [A] free block map [B] data blocks for file [C] indirect blocks for file [D] data blocks for directory [E] inode for file [F] inode for directory [G] the log

  5. exercise (2): suppose OS is performing the operation of appending 100KB to a 100KB file X in directory Y and uses redo logging. part 2: crash happens after writing: log entries for entire operation; free block map changes; indirect blocks for file. …what is written after restart as part of this operation? [A] free block map [B] data blocks for file [C] indirect blocks for file [D] data blocks for directory [E] inode for file [F] inode for directory [G] the log

  6. lots of writing? entire log can be written sequentially: ideal for hard disk performance, also pretty good for SSDs; no waiting for ‘real’ updates: application can proceed while updates are happening, files will be updated even if system crashes; often better for performance!

  7. degrees of consistency: not all journalling filesystems use redo logging for everything; some use it only for metadata operations; some use it for both metadata and user data. only metadata: avoids lots of duplicate writing; metadata+user data: integrity of user data guaranteed

  8. distributed systems: multiple machines working together to perform a single task, called a distributed system

  9. some distributed systems models: client/server (figure: client 1, client 2, …, client N-1, client N, and one server); peer-to-peer (figure: node 1 through node 7)

  10. client/server model (figure: client sends “GET /index.html”; server replies “index.html’s contents are …”). client(s): “sometimes on”; sends requests to server(s); needs to know how to contact server. server(s): “always on”; responds to client requests; never initiates contact with a client

  13. layers of servers? (figure: web client, web server, application server, database server, ad server); web server is also application server’s client

  14. example: Wikipedia architecture: image by Timo Tijhof, via https://commons.wikimedia.org/wiki/File:Wikipedia_webrequest_flow_2015-10.png

  15. example: Wikipedia architecture (zoom): image by Timo Tijhof, via https://commons.wikimedia.org/wiki/File:Wikipedia_webrequest_flow_2015-10.png

  16. peer-to-peer: no always-on server everyone knows about; hopefully, no one bottleneck (“scalability”); any machine can contact any other machine; every machine plays an approx. equal role?; set of machines may change over time

  17. why distributed? multiple machine owners collaborating; delegation of responsibility to other entity; put (part of) service “in the cloud”; combine many cheap machines to replace expensive machine; easier to add incrementally; redundancy: one machine can fail and system still works?

  18. exercise: which are likely advantages of client/server model over peer-to-peer? [A] easier to make whole system work despite failure of any machine [B] easier to handle most machines being offline a majority of the time [C] better suited to a mix of a few very big/high-performance and many small/low-performance machines

  19. mailbox model: mailbox abstraction: send/receive messages. A machine: sending program calls Send(B, “Hello”); the message B: “Hello” sits in a queue of messages waiting to be sent. the network: knows how to get message to B. B machine: the message B: “Hello” sits in a queue of messages not yet received by the receiving program, until the receiving program calls Recv() = “Hello”

  23. what about servers? client/server model: server wants to reply to clients; might want to send/receive multiple messages. can build this with mailbox idea: send a ‘return address’; need to track related messages. common abstraction that does this: the connection

  25. extension: connections: two-way channel for messages; extra operations: connect, accept. A machine: Conn = Connect(B); Send(Conn, “2 + 2 = ?”); “4” = Recv(Conn). B machine: Conn = Accept(); “2 + 2 = ?” = Recv(Conn); Send(Conn, “4”). messages on the network, in order: B: open connection to A?; A: connection to B OK!; B: (A, “2 + 2 = ?”); A: (B, “4”)

  26. connections versus pipes: connections look kinda like two-direction pipes; in fact, in POSIX they will have the same API: each end gets a file descriptor representing the connection; can use read() and write()
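
A minimal sketch of the "two-direction pipe" idea, assuming a POSIX system. It uses socketpair() to get two already-connected descriptors in one process, then sends a question one way and an answer back the other way with plain read() and write(). The function name `two_way_demo` is made up for this illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends "2 + 2 = ?" from end 0 to end 1, then "4" from end 1 back to end 0.
   Stores the reply as a C string in reply_out. Returns 0 on success, -1 on error. */
int two_way_demo(char *reply_out, size_t reply_size) {
    int fds[2];
    char question[32];
    ssize_t n;
    int result = -1;

    if (reply_size < 2 || socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    /* Direction 1: end 0 writes, end 1 reads. */
    if (write(fds[0], "2 + 2 = ?", 9) == 9 &&
        read(fds[1], question, sizeof question) == 9 &&
        memcmp(question, "2 + 2 = ?", 9) == 0 &&
        /* Direction 2: same two descriptors, opposite direction. */
        write(fds[1], "4", 1) == 1) {
        n = read(fds[0], reply_out, reply_size - 1);
        if (n > 0) {
            reply_out[n] = '\0';
            result = 0;
        }
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Unlike a pipe(), each of the two descriptors here can be both read from and written to.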

  27. connections over mailboxes: real Internet: mailbox-style communication; send packets to particular mailboxes; no guarantee on order or on when received; no relationship between …; connections implemented on top of this. full details: take networking (CS/ECE 4457)

  28. connection missing pieces? how to specify the machine? multiple programs on one machine? who gets the message?

  29. names and addresses: name (logical identifier) → address (location/how to locate): hostname www.virginia.edu → IPv4 address 128.143.22.36; hostname mail.google.com → IPv4 address 216.58.217.69; hostname mail.google.com → IPv6 address 2607:f8b0:4004:80b::2005; filename /home/cr4bd/NOTES.txt → inode# 120800873 and device 0x2eh / 0x46d; variable counter → memory address 0x7FFF9430; service name https → port number 443

  30. hostnames: typically use domain name system (DNS) to find machine names; maps logical names like www.virginia.edu (chosen for humans; hierarchy of names) …to addresses the network can use to move messages (numbers; ranges of numbers assigned to different parts of the network; network routers know “send this range of numbers this way”)

  31. DNS: distributed database (figure labels: my machine; ISP’s DNS server, whose address was sent to my machine when it connected to the network; root DNS server; .edu DNS server; virginia.edu DNS server; cs.virginia.edu). query: www.cs.virginia.edu? referral: try .edu server at …; answer: www.cs.virginia.edu = 128.143.67.11. optimization: cache the .edu server’s address, since the .edu server doesn’t change much; check for updated version once in a while
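
From a program's point of view, the name-to-address step is usually one library call. Below is a minimal sketch using the POSIX getaddrinfo() function; the helper name `first_ipv4_address` is made up for this illustration, and it only looks at the first IPv4 result.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Looks up host and writes its first IPv4 address, in dotted form, into out.
   Returns 0 on success, nonzero on failure. */
int first_ipv4_address(const char *host, char *out, socklen_t out_size) {
    struct addrinfo hints;
    struct addrinfo *results;
    int rv;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;

    rv = getaddrinfo(host, NULL, &hints, &results);
    if (rv != 0)
        return rv;                    /* gai_strerror(rv) describes the error */

    /* results is a linked list; this sketch uses only the first entry. */
    struct sockaddr_in *addr = (struct sockaddr_in *) results->ai_addr;
    const char *text = inet_ntop(AF_INET, &addr->sin_addr, out, out_size);
    freeaddrinfo(results);
    return text == NULL ? -1 : 0;
}
```

Passing a hostname such as www.virginia.edu triggers a lookup through the system's configured resolver; passing a string that is already a numeric address just parses it, with no query sent.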

  36. connection missing pieces? how to specify the machine? multiple programs on one machine? who gets the message?

  37. IPv4 addresses: 32-bit numbers, typically written like 128.143.67.11: four 8-bit decimal values separated by dots; first part is most significant; same as 128 · 256^3 + 143 · 256^2 + 67 · 256 + 11 = 2 156 872 459. organizations get blocks of IPs: e.g. UVa has 128.143.0.0–128.143.255.255; e.g. Google has 216.58.192.0–216.58.223.255 and 74.125.0.0–74.125.255.255 and 35.192.0.0–35.207.255.255
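
The "four 8-bit values, first part most significant" rule can be sketched as below. The function names are made up for this illustration; the shifts by 24, 16, and 8 bits are the same as multiplying by 256^3, 256^2, and 256.

```c
#include <stdint.h>

/* Combine the four parts of a.b.c.d into one 32-bit number. */
uint32_t ipv4_to_u32(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t) a << 24) | ((uint32_t) b << 16) |
           ((uint32_t) c << 8)  |  (uint32_t) d;
}

/* True if addr lies in the block first..last (inclusive),
   e.g. the range of addresses assigned to one organization. */
int ipv4_in_block(uint32_t addr, uint32_t first, uint32_t last) {
    return first <= addr && addr <= last;
}
```

Because a block of IPs is a contiguous range of these numbers, checking whether an address belongs to an organization is just a pair of comparisons.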

  38. selected special IPv4 addresses: 127.0.0.0–127.255.255.255: localhost AKA loopback, the machine we’re on; typically only 127.0.0.1 is used. 192.168.0.0–192.168.255.255 and 10.0.0.0–10.255.255.255 and 172.16.0.0–172.31.255.255: “private” IP addresses, not used on the Internet, commonly connected to Internet with network address translation; also 100.64.0.0–100.127.255.255 (but with restrictions). 169.254.0.0–169.254.255.255: link-local addresses, ‘never’ forwarded by routers

  39. network address translation: IPv4 addresses are kinda scarce; solution: convert many private addrs. to one public addr.; locally: use private IP addresses for machines; outside: private IP addresses become a single public one; commonly how home networks work (and some ISPs)

  40. IPv6 addresses: IPv6 is like IPv4, but with 128-bit numbers; written in hex, 16-bit parts, separated by colons ( : ); strings of 0s represented by double-colons ( :: ); typically given to users in blocks of 2^80 or 2^64 addresses; no need for address translation? example: 2607:f8b0:400d:c00::6a = 2607:f8b0:400d:0c00:0000:0000:0000:006a = 2607f8b0400d0c00000000000000006a (base sixteen)
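
The double-colon shorthand can be undone mechanically. A minimal sketch, assuming a POSIX system: inet_pton() parses the text into 16 bytes, and the helper (the name `ipv6_expand` is made up here) prints all eight 16-bit parts with leading zeros.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Writes the full 8-part form of an IPv6 address into out (needs 40 bytes).
   Returns 0 on success, -1 if text is not a valid IPv6 address or out is too small. */
int ipv6_expand(const char *text, char *out, size_t out_size) {
    struct in6_addr addr;
    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;

    const unsigned char *b = addr.s6_addr;   /* 16 bytes, most significant first */
    int n = snprintf(out, out_size,
        "%02x%02x:%02x%02x:%02x%02x:%02x%02x:%02x%02x:%02x%02x:%02x%02x:%02x%02x",
        b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
        b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
    return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```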

  41. selected special IPv6 addresses: ::1 = localhost; anything starting with fe80 = link-local addresses, never forwarded by routers

  42. IPv4 addresses and routing tables (figure: a router connected to network 1, network 2, and network 3). routing table, “if I receive data for…” → “send it to…”: 128.143.0.0–128.143.255.255 → network 1; 192.107.102.0–192.107.102.255 → network 1; …; 4.0.0.0–7.255.255.255 → network 2; 64.8.0.0–64.15.255.255 → network 2; …; anything else → network 3
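
The table lookup on this slide can be sketched as a scan over address ranges. This is a simplified illustration using only the four ranges shown; the struct and function names are made up, and real routers typically store prefixes and pick the longest matching one rather than scanning a list of ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a.b.c.d into a 32-bit number (first part most significant). */
#define IP(a, b, c, d) \
    (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | ((uint32_t) (c) << 8) | (uint32_t) (d))

struct route {
    uint32_t first, last;   /* inclusive address range */
    int network;            /* where to send matching data */
};

static const struct route routes[] = {
    { IP(128, 143, 0, 0),   IP(128, 143, 255, 255), 1 },
    { IP(192, 107, 102, 0), IP(192, 107, 102, 255), 1 },
    { IP(4, 0, 0, 0),       IP(7, 255, 255, 255),   2 },
    { IP(64, 8, 0, 0),      IP(64, 15, 255, 255),   2 },
};

#define DEFAULT_NETWORK 3   /* the "anything else" row */

/* Returns the network that data for dest should be sent to. */
int next_network(uint32_t dest) {
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        if (routes[i].first <= dest && dest <= routes[i].last)
            return routes[i].network;
    }
    return DEFAULT_NETWORK;
}
```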

  43. connection missing pieces? how to specify the machine? multiple programs on one machine? who gets the message?

  44. port numbers: we run multiple programs on a machine, so IP addresses identifying the machine are not enough; so, add 16-bit port numbers; think: multiple PO boxes at address. 0–49151: typically assigned for particular services: 80 = http, 443 = https, 22 = ssh, …; 49152–65535: allocated on demand; default “return address” for client connecting to server

  47. protocols: protocol = agreement on how to communicate. syntax (format of messages, etc.): e.g. mailbox model: where does address go? e.g. connection: where does return address go? semantics (meaning of messages: actions to take, etc.): e.g. connection: when to consider connection created?

  48. human protocol: telephone: caller: pick up phone; caller: check for service; caller: dial; caller: wait for ringing; callee: “Hello?”; caller: “Hi, it’s Casey…”; callee: “Hi, so how about …”; caller: “Sure, …”; …; callee: “Bye!”; caller: “Bye!”; both hang up

  49. layered protocols: IP: protocol for sending data by IP addresses; mailbox model; limited message size. UDP: send datagrams, built on IP; still mailbox model, but with port numbers. TCP: reliable connections, built on IP; adds port numbers; adds resending data if error occurs; splits big amounts of data into many messages. HTTP: protocol for sending files, etc.; built on TCP

  50. other notable protocols (transport layer): TLS: Transport Layer Security, built on TCP; like TCP, but adds encryption + authentication. SSH: secure shell (remote login), built on TCP. SCP/SFTP: secure copy/secure file transfer, built on SSH. HTTPS: HTTP, but over TLS instead of TCP. FTP: file transfer protocol. …

  51. sockets: socket: POSIX abstraction of network I/O queue; a kind of file descriptor; any kind of network; can also be used between processes on same machine

  52. connected sockets: sockets can represent a connection; act like bidirectional pipe. client and server: (setup connection / get fds); then one side calls write(fd, buffer, size) and the other calls read(fd, buffer, size), and likewise in the opposite direction

  53. echo client/server (MAX_SIZE and prompt_for_input() are not defined on the slide; {...error?...} marks error handling the slide leaves open):

        void client_for_connection(int socket_fd) {
            int n;
            char send_buf[MAX_SIZE];
            char recv_buf[MAX_SIZE];
            while (prompt_for_input(send_buf, MAX_SIZE)) {
                /* send one line of input, then wait for the echo */
                n = write(socket_fd, send_buf, strlen(send_buf));
                if (n != strlen(send_buf)) {...error?...}
                n = read(socket_fd, recv_buf, MAX_SIZE);
                if (n <= 0) return; // error or EOF
                write(STDOUT_FILENO, recv_buf, n);
            }
        }

        void server_for_connection(int socket_fd) {
            int read_count, write_count;
            char request_buf[MAX_SIZE];
            while (1) {
                /* send back whatever was just received */
                read_count = read(socket_fd, request_buf, MAX_SIZE);
                if (read_count <= 0) return; // error or EOF
                write_count = write(socket_fd, request_buf, read_count);
                if (read_count != write_count) {...error?...}
            }
        }
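
One reason the echo code checks its return values: on a stream connection, read() and write() are allowed to transfer fewer bytes than requested, so a single call may not move a whole message. A common way to handle this is to loop until everything has been transferred. The sketch below is one possible version; the names `read_fully` and `write_fully` are made up for this illustration and are not part of POSIX.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads until count bytes have arrived, or EOF/error comes first.
   Returns the number of bytes read (less than count only at EOF), or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t count) {
    size_t total = 0;
    while (total < count) {
        ssize_t n = read(fd, (char *) buf + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                  /* EOF: other side closed */
        total += (size_t) n;
    }
    return (ssize_t) total;
}

/* Writes all count bytes unless an error occurs. Returns count, or -1 on error. */
ssize_t write_fully(int fd, const void *buf, size_t count) {
    size_t total = 0;
    while (total < count) {
        ssize_t n = write(fd, (const char *) buf + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        total += (size_t) n;
    }
    return (ssize_t) total;
}
```

These helpers work on any file descriptor, so they can be tried out on a pipe before being used on a socket.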

  56. sockets and server sockets: socket() function: create socket fd; listen(): turn socket into server socket (still has a file descriptor, but … can only accept()); accept(): create normal socket for the connection. server: ss_fd = socket(…); …; bind(ss_fd, addr, …); listen(ss_fd, …); fd = accept(ss_fd, …). client: fd = socket(…); connect(fd, addr, …) (request connection)
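
The calls named on this slide can be sketched with the arguments filled in as below. This is a minimal IPv4-only illustration on the loopback address; the helper names are made up, and binding to port 0 (letting the OS pick a free port, then asking which one with getsockname()) is a choice made here so the sketch does not collide with a real service.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: socket(), bind(), listen(). Listens on 127.0.0.1 at an
   OS-chosen port and stores that port in *port_out. Returns ss_fd or -1. */
int make_server_socket(unsigned short *port_out) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int ss_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (ss_fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);               /* 0 = let the OS choose */

    if (bind(ss_fd, (struct sockaddr *) &addr, sizeof addr) < 0 ||
        listen(ss_fd, 8) < 0 ||
        getsockname(ss_fd, (struct sockaddr *) &addr, &len) < 0) {
        close(ss_fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return ss_fd;
}

/* Client side: socket(), connect(). Returns the connected fd or -1. */
int connect_to_local_port(unsigned short port) {
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (connect(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The server then calls accept(ss_fd, NULL, NULL) to get a normal connected socket, which it can read() and write() like the client's fd.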

  61. connections in TCP/IP: on network: connection identified by 5-tuple (protocol=TCP, local IP addr., local port, remote IP addr., remote port), used by OS to look up “where is the file descriptor?”; both ends always have an address+port. what is the IP address, port number? set with bind() function; typically always done for servers, not done for clients; system will choose default if you don’t

  62. connections on my desktop: output of netstat --inet --inet6 --numeric (table “Active Internet connections (w/o servers)” with columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State): tcp connections in state ESTABLISHED or TIME_WAIT, involving 128.143.67.91, 128.143.67.236, 172.27.98.20, and 127.0.0.1, on ports including 22, 111, 631, and 2049

  63. client/server flow (one connection at a time): server: create server socket; bind to host:port; start listening for connections; accept a new connection (get connection socket); read request from connection socket; write response to connection socket; close connection socket. client: create client socket; connect socket to server hostname:port (gets assigned local host:port); write request; read response; close socket. phases: create+configure server socket; setup pair of connection sockets (fd’s); communicate; close connection. shown here: client writes first, client/server takes turns. real world? varies between protocols

  70. client/server flow (multiple connections): server: create server socket; bind to host:port; start listening for connections; accept a new connection (get connection socket); spawn new process (fork) or thread per connection, which will: read request from connection socket; write response to connection socket; close connection socket. client: create client socket; connect socket to server hostname:port (gets assigned local host:port); write request; read response; close socket
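
A minimal sketch of the process-per-connection variant, assuming an echo-style protocol. The helper names are made up for this illustration, `ss_fd` is assumed to be a server socket that is already bound and listening, and a production server would need more careful error and signal handling.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Echo everything received on conn_fd back to the sender until EOF. */
static void echo_until_eof(int conn_fd) {
    char buf[1024];
    ssize_t n;
    while ((n = read(conn_fd, buf, sizeof buf)) > 0) {
        if (write(conn_fd, buf, (size_t) n) != n)
            break;
    }
}

/* Hands one connection to a new child process.
   Pass ss_fd = -1 if there is no server socket for the child to close.
   Returns the child's pid, or -1 if fork() failed. */
pid_t handle_in_child(int conn_fd, int ss_fd) {
    pid_t pid = fork();
    if (pid == 0) {
        if (ss_fd >= 0)
            close(ss_fd);       /* the child never calls accept() */
        echo_until_eof(conn_fd);
        close(conn_fd);
        _exit(0);
    }
    close(conn_fd);             /* parent: the child has its own copy */
    return pid;
}

/* Accept loop: the parent only accepts; each connection gets its own process. */
void serve_connections(int ss_fd) {
    for (;;) {
        int conn_fd = accept(ss_fd, NULL, NULL);
        if (conn_fd < 0)
            return;             /* a real server would inspect errno here */
        handle_in_child(conn_fd, ss_fd);
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                   /* reap children that have finished */
    }
}
```

Because the parent goes straight back to accept(), a slow client only delays its own child process, not other connections.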

  71. backup slides

  78. the xv6 journal: xv6 log (one transaction): log header (one sector) holding the number of blocks (non-0: committed; otherwise: not committed or no transaction), location for first block, location for second block, …; followed by first block (log copy), second block (log copy), …; elsewhere on disk: non-log blocks. transaction steps: start: num blocks = 0; 1. write changed blocks; 2. write log header with number of blocks = N (commits transaction); 3. write data (redone on recovery if number of blocks ≠ 0); 4. clear log header (number of blocks = 0); ready for next transaction
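
The four steps and the recovery rule can be sketched against a tiny in-memory "disk". This is a simplified model written for illustration, not xv6's actual code: the block size, the log layout constants, the function names, and the `crash_after_step` parameter (used to simulate a crash partway through) are all assumptions made here.

```c
#include <string.h>

#define BLOCK_SIZE 64
#define NUM_BLOCKS 32
#define LOG_HEADER 1    /* block number holding the log header */
#define LOG_DATA   2    /* first block holding logged copies */
#define LOG_MAX    4    /* most blocks in one transaction */

static unsigned char disk[NUM_BLOCKS][BLOCK_SIZE];   /* simulated disk */

struct log_header {
    int n;                  /* 0: no committed transaction; non-0: committed */
    int location[LOG_MAX];  /* home block number for each logged block */
};

static void read_header(struct log_header *h) {
    memcpy(h, disk[LOG_HEADER], sizeof *h);
}

static void write_header(const struct log_header *h) {
    memcpy(disk[LOG_HEADER], h, sizeof *h);
}

/* Step 3, also used by recovery: copy log copies to their home locations.
   Overwriting whole blocks is idempotent, so repeating this is safe. */
static void install_from_log(const struct log_header *h) {
    for (int i = 0; i < h->n; i++)
        memcpy(disk[h->location[i]], disk[LOG_DATA + i], BLOCK_SIZE);
}

/* crash_after_step: 0 = run to completion; 1, 2, or 3 = stop after that step. */
void commit_transaction(int count, const int *locations,
                        unsigned char new_data[][BLOCK_SIZE],
                        int crash_after_step) {
    struct log_header h;
    if (count < 0 || count > LOG_MAX)
        return;
    memset(&h, 0, sizeof h);
    for (int i = 0; i < count; i++)             /* 1: write changed blocks to log */
        memcpy(disk[LOG_DATA + i], new_data[i], BLOCK_SIZE);
    if (crash_after_step == 1) return;
    h.n = count;
    for (int i = 0; i < count; i++)
        h.location[i] = locations[i];
    write_header(&h);                           /* 2: commit point */
    if (crash_after_step == 2) return;
    install_from_log(&h);                       /* 3: write real data */
    if (crash_after_step == 3) return;
    h.n = 0;
    write_header(&h);                           /* 4: clear log header */
}

/* Run at boot: redo a committed transaction, if there is one. */
void recover(void) {
    struct log_header h;
    read_header(&h);
    if (h.n != 0) {
        install_from_log(&h);
        h.n = 0;
        write_header(&h);
    }
}
```

A crash after step 1 leaves the header at 0, so recovery ignores the partial log; a crash after step 2 or 3 leaves a non-zero header, so recovery redoes the copy.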

  79. what is a transaction? so far: each file update? faster to do batch of updates together: one log write finishes lots of things; don’t wait to write. xv6 solution: combine lots of updates into one transaction; only commit when… no active file operation, or not enough room left in log for more operations

  81. redo logging problems: doesn’t the log get infinitely big? writing everything twice?

  83. limiting log size: once transaction is written to real data, can discard; sometimes called “garbage collecting” the log. may sometimes need to block to free up log space: perform logged updates before adding more to log. hope: usually log cleanup happens “in the background”

  84. redo logging problems: doesn’t the log get infinitely big? writing everything twice?
