
Optimistic Crash Consistency



  1. Optimistic Crash Consistency. Vijay Chidambaram, Thanumalayan Sankaranarayana Pillai, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau.

  2. Crash Consistency Problem. A single file-system operation updates multiple on-disk data structures. The system may crash in the middle of the updates, leaving the file system partially (incorrectly) updated. SOSP 13

  3. Performance OR Consistency. Crash-consistency solutions degrade performance, so users are forced to choose between high performance and strong consistency; performance differs by 10x for some workloads. Many users choose performance: the ext3 default configuration did not guarantee crash consistency for many years, and Mac OS X fsync() does not ensure data is safe. “The Fast drives out the Slow even if the Fast is wrong” (Kahan).

  4. Ordering and Durability. Crash consistency is built upon ordered writes, but file systems conflate ordering and durability. Ideal: {A, B} -> {C}, with the writes made durable later. Current scenario: {A, B} is made durable, then {C} is made durable. This is inefficient when only ordering is required.

  5. Can a file system provide both high performance and strong consistency? Is there a middle ground between high performance with no consistency and strong consistency with low performance?

  6. Our solution: the Optimistic File System (OptFS), a journaling file system that provides performance and consistency by decoupling ordering and durability. Such decoupling allows OptFS to trade freshness for performance while maintaining crash consistency.

  7. Results. Techniques: checksums, delayed writes, etc. OptFS provides strong consistency, equivalent to ext4 data journaling. OptFS improves performance significantly: 10x better than ext4 on some workloads. A new primitive, osync(), provides ordering among writes at high performance.

  8. Outline: Introduction; Ordering and Durability in Journaling; Optimistic File System; Results; Conclusion.

  9. Outline: Introduction; Ordering and Durability in Journaling (Journaling Overview, Realizing Ordering on Disks, Journaling without Ordering); Optimistic File System; Results; Conclusion.

  10. Journaling Overview. Before updating the file system, write a note describing the update. Make sure the note is safely on disk. Once the note is safe, update the file system. If interrupted, read the note and redo the updates.
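
The write-note, then-update, then-redo-on-crash idea from the slide above can be sketched in a few lines. This is a minimal illustration, not OptFS or ext4 code: the in-memory `disk` dictionary, the `journal` list, and the `crash_before_checkpoint` flag are all hypothetical stand-ins, and the step that waits for the note to become durable is only marked by a comment.

```python
# Minimal redo-journal sketch. All names here are hypothetical illustrations.

def apply_update(disk, journal, update, crash_before_checkpoint=False):
    """Journal `update` (a dict of block -> value), then apply it in place."""
    journal.append(dict(update))   # 1. write a note describing the update
    # 2. a real file system would wait here until the note is safely on disk
    if crash_before_checkpoint:
        return                     # simulated crash: file system not yet updated
    disk.update(update)            # 3. update the file system in place
    journal.clear()                # the note is no longer needed


def recover(disk, journal):
    """After a crash, read each surviving note and redo its update."""
    for note in journal:
        disk.update(note)
    journal.clear()


disk, journal = {"inode": "old", "bitmap": "old"}, []
apply_update(disk, journal, {"inode": "new", "bitmap": "new"},
             crash_before_checkpoint=True)
state_after_crash = dict(disk)     # still the old, but consistent, state
recover(disk, journal)             # redo brings both structures forward together
print(state_after_crash, disk)
```

Because the note is complete before any in-place update starts, recovery either finds no note (nothing to do) or a full note it can replay, so the two structures never end up half-updated.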

  11. Journaling Overview. Workload: creating and writing to a file. Journaling protocol (ordered journaling). [Diagram: application, file system, and disk layers; the disk holds data, metadata, and journal regions.]

  12. Journaling Overview. Workload: creating and writing to a file. Journaling protocol (ordered journaling): data write (D). [Diagram: D at the file system.]

  13. Journaling Overview. Workload: creating and writing to a file. Journaling protocol (ordered journaling): data write (D). [Diagram: D on disk.]

  14. Journaling Overview. Workload: creating and writing to a file. Journaling protocol (ordered journaling): data write (D); logging metadata (J_M). [Diagram: J_M at the file system; D on disk.]

  15. Journaling Overview. Workload: creating and writing to a file. Journaling protocol (ordered journaling): data write (D); logging metadata (J_M). [Diagram: D and J_M on disk.]

  16. Journaling Overview. Workload: creating and writing to a file. Journaling protocol (ordered journaling): data write (D); logging metadata (J_M); logging commit (J_C). [Diagram: J_C at the file system; D and J_M on disk.]

  17. Journaling Overview. Workload: creating and writing to a file. Journaling protocol (ordered journaling): data write (D); logging metadata (J_M); logging commit (J_C). [Diagram: D, J_M, and J_C on disk.]

  18. Journaling Overview. Workload: creating and writing to a file. Journaling protocol (ordered journaling): data write (D); logging metadata (J_M); logging commit (J_C); checkpointing (M). [Diagram: M at the file system; D, J_M, and J_C on disk.]

  19. Journaling Overview. Workload: creating and writing to a file. Journaling protocol (ordered journaling): data write (D); logging metadata (J_M); logging commit (J_C); checkpointing (M). [Diagram: D, M, J_M, and J_C on disk.]

  20. Outline: Introduction; Ordering and Durability in Journaling (Journaling Overview, Realizing Ordering on Disks, Journaling without Ordering); Optimistic File System; Results; Conclusion.

  21. How Writes are Ordered. [Figure: two panels, "Original Disks" and "Disks with Write Buffers", each showing writes A and B; labels: Flush, Disk, Disk Cache, Disk Platter.]

  22. Journaling with Flushes. Journaling protocol: data write (D). [Diagram: application, file system, disk cache, and disk platter layers; the platter holds data, metadata, and journal regions.]

  23. Journaling with Flushes. Journaling protocol: data write (D). [Diagram: D at the file system.]

  24. Journaling with Flushes. Journaling protocol: data write (D). [Diagram: D in the disk cache.]

  25. Journaling with Flushes. Journaling protocol: data write (D); logging metadata (J_M). [Diagram: J_M at the file system; D in the disk cache.]

  26. Journaling with Flushes. Journaling protocol: data write (D); logging metadata (J_M). [Diagram: D and J_M in the disk cache.]

  27. Journaling with Flushes. Journaling protocol: data write (D); logging metadata (J_M). [Diagram: D and J_M in the disk cache; a flush is issued.]

  28. Journaling with Flushes. Journaling protocol: data write (D); logging metadata (J_M). [Diagram: after the flush, D and J_M on the disk platter.]

  29. Journaling with Flushes. Journaling protocol: data write (D); logging metadata (J_M); logging commit (J_C). [Diagram: J_C at the file system; D and J_M on the disk platter.]

  30. Journaling with Flushes. Journaling protocol: data write (D); logging metadata (J_M); logging commit (J_C). [Diagram: J_C in the disk cache; D and J_M on the disk platter.]

  31. Journaling with Flushes. Journaling protocol: data write (D); logging metadata (J_M); logging commit (J_C). [Diagram: J_C in the disk cache; a second flush is issued; D and J_M on the disk platter.]

  32. Journaling with Flushes. Journaling protocol: data write (D); logging metadata (J_M); logging commit (J_C). [Diagram: after the second flush, D, J_M, and J_C on the disk platter.]

  33. Journaling with Flushes. Journaling protocol: data write (D); logging metadata (J_M); logging commit (J_C); checkpointing (M). [Diagram: M at the file system; D, J_M, and J_C on the disk platter.]
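
The flush-based sequence walked through on these slides can be modeled with a toy disk that has a volatile cache and a durable platter. This is only a sketch under simplifying assumptions: the `Disk` class and `journal_with_flushes` function are hypothetical, and a real drive may write the cached blocks within one flush in any order, whereas the model simply keeps issue order.

```python
class Disk:
    """Toy disk: writes land in a volatile cache; flush() forces them to the platter."""

    def __init__(self):
        self.cache = []     # buffered writes that are not yet durable
        self.platter = []   # durable writes, in the order they became durable

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        # Everything buffered so far becomes durable before flush() returns.
        self.platter.extend(self.cache)
        self.cache.clear()


def journal_with_flushes(disk):
    disk.write("D")      # data write
    disk.write("J_M")    # log metadata to the journal
    disk.flush()         # D and J_M are durable before the commit is issued
    disk.write("J_C")    # log the commit block
    disk.flush()         # the commit is durable before checkpointing starts
    disk.write("M")      # checkpoint: metadata written in place


disk = Disk()
journal_with_flushes(disk)
print(disk.platter)  # ['D', 'J_M', 'J_C']
print(disk.cache)    # ['M']
```

The two flushes are what guarantee that J_C can never be durable ahead of D and J_M, and that M is never issued before the commit is durable; they also force durability at points where only ordering is needed.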

  34. Outline: Introduction; Ordering and Durability in Journaling (Journaling Overview, Realizing Ordering on Disks, Journaling without Ordering); Optimistic File System; Results; Conclusion.

  35. Journaling without Ordering. Practitioners turn off flushes due to performance degradation; for example, ext3 by default did not enable flushes for many years. They observe that crashes do not cause inconsistency for some workloads. We term this probabilistic crash consistency (studied in detail).

  36. Journaling without Ordering. [Diagram: D, M, J_M, and J_C at the file system; both flushes still shown between the disk cache and the disk platter.]

  37. Journaling without Ordering. [Diagram: D, M, J_M, and J_C at the file system; flushes removed.]

  38. Journaling without Ordering. Without flushes, blocks may be reordered. [Diagram: D, M, J_M, and J_C in the disk cache.]

  39. Journaling without Ordering. Without flushes, blocks may be reordered. Example: J_C and J_M are written first because the disk head is near the journal. [Diagram: D and M in the disk cache; J_M and J_C on the disk platter.]

  40. Journaling without Ordering. Without flushes, blocks may be reordered. Example: J_C and J_M are written first because the disk head is near the journal. [Diagram: D, M, J_M, and J_C on the disk platter.]
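
To see why reordering matters, one can enumerate which subsets of the four blocks might be on the platter at the moment of a crash and check each against the ordering the journaling protocol expects. The sketch below assumes the requirements implied by the protocol steps (D and J_M before J_C, and J_C before M); the `MUST_PRECEDE` table and the function names are illustrative and not taken from the talk.

```python
from itertools import combinations

BLOCKS = ["D", "J_M", "J_C", "M"]

# Assumed ordering requirements: each block maps to the set of blocks
# that must already be durable when it becomes durable.
MUST_PRECEDE = {"J_C": {"D", "J_M"}, "M": {"J_C"}}


def is_consistent(on_platter):
    """True if every durable block has all of its prerequisites durable too."""
    return all(MUST_PRECEDE.get(b, set()) <= on_platter for b in on_platter)


def crash_states():
    """Every subset of blocks a reordering disk cache could have persisted."""
    for r in range(len(BLOCKS) + 1):
        for subset in combinations(BLOCKS, r):
            yield frozenset(subset)


states = list(crash_states())
bad = [s for s in states if not is_consistent(s)]
print(len(states), len(bad))                      # 16 10
print(is_consistent(frozenset({"J_M", "J_C"})))   # False
```

The last line is the case from the slide: J_M and J_C reach the platter first, so a crash at that point leaves a committed transaction whose data block D was never written.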

  41. Probabilistic Crash Consistency. [Diagram: timeline with D, M, J_M, and J_C in memory and nothing yet on disk.]

  42. Probabilistic Crash Consistency. [Diagram: timeline with D, M, J_M, and J_C in memory; J_C has reached the disk.]

  43. Probabilistic Crash Consistency. [Diagram: timeline with D, M, J_M, and J_C in memory; D, M, J_C, and J_M have reached the disk.]

  44. Probabilistic Crash Consistency. Re-ordering leads to windows of vulnerability. [Diagram: timeline with D, M, J_M, and J_C in memory and D, M, J_C, and J_M on disk, marking the window and the total I/O time.] P-inconsistency = Time in window(s) / Total I/O Time.
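
The P-inconsistency ratio is simple arithmetic and can be sketched directly. The function name, the representation of windows as `(start, end)` pairs, and the sample numbers are all hypothetical; the sketch also assumes the windows do not overlap.

```python
def p_inconsistency(windows, total_io_time):
    """Fraction of I/O time during which a crash would leave an inconsistent state.

    windows: list of (start, end) times of the windows of vulnerability,
             assumed not to overlap.
    """
    if total_io_time <= 0:
        raise ValueError("total_io_time must be positive")
    time_in_windows = sum(end - start for start, end in windows)
    return time_in_windows / total_io_time


# Made-up example: two windows (1.0 s and 0.5 s long) over 10 s of I/O.
p = p_inconsistency([(2.0, 3.0), (7.0, 7.5)], 10.0)
print(p)  # 0.15
```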
