Data Management Systems
Storage Management
• The Memory hierarchy
• Memory hierarchy
• Capacity and latencies
• Segments and file storage
• Locality and replacement policies
• Database buffer cache
• Hardware evolution
• Storage techniques in context
Gustavo Alonso
Institute of Computing Platforms
Department of Computer Science
ETH Zürich
Storage - Memory Hierarchy 1
In an ideal world …
The database should have an unlimited amount of memory that offers plenty of bandwidth for sequential and concurrent access, very low latency for random accesses, persistence over time, and low cost.
Instead, databases provide the illusion of a large memory capacity and rely on complex architectures and optimizations to hide the performance problems that arise from trying to provide all these desirable properties.
The memory wall
• Main memory suffers from several issues:
  • There is never enough of it (applications keep growing)
  • Memory outside the CPU chip (DRAM) is much slower than memory located on the CPU => memory wall
  • Processor-memory gap: processor speeds increased much faster than memory speeds
  • Price becomes a problem in the context of data management (DRAM is expensive)
  • Main memory is not persistent
• Over time, a complex hierarchy evolved to address all these issues
The memory hierarchy, from top to bottom:
• CPU registers
• Caches
• Main memory (DRAM)
• External storage (local persistent storage)
• External storage (remote persistent storage)
• Archive storage
Looking at the memory hierarchy
• The memory hierarchy is a rather complex construct affected by many parameters:
  • Capacity
  • Cost
  • Latency
  • Bandwidth
• It keeps evolving as the parameters of each component change over time
• It keeps evolving as new technology becomes available
• Disclaimer: the numbers given are only a reference (they vary a lot)
Capacity (64-bit architecture)
• CPU registers: 16 × 64-bit general purpose, 32 × 512-bit AVX
• Caches: L1i 32 KB, L1d 32 KB, L2 256 KB to 1 MB, L3 8 MB to 45 MB
• Main memory (DRAM): 1 to 1000 GB
• External storage (local persistent storage): a few terabytes
• External storage (remote persistent storage): many terabytes
• Archive storage: petabytes
Latency
• CPU registers: sub-nanosecond (1 cycle)
• Caches: L1 0.5-1 ns, L2 4-8 ns, L3 15-30 ns
• Main memory (DRAM): 100 ns
• External storage (local persistent storage): microseconds (SSD), milliseconds (HDD)
• External storage (remote persistent storage): milliseconds
• Archive storage: seconds to minutes
Access
• CPU registers, caches, main memory (DRAM): byte addressable, random access
• External storage (local and remote persistent storage), archive storage: block addressable, sequential access
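The access slide separates byte-addressable layers from block-addressable ones. As a minimal sketch of what that means in practice, the hypothetical `BlockDevice` below only supports whole-block reads, so even a single-byte request transfers a full block. The 4096-byte block size and the names `BlockDevice` and `read_byte` are assumptions for illustration, not the API of any real storage stack.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes; real devices and file systems vary


class BlockDevice:
    """Hypothetical block-addressable device: data can only be read in whole blocks."""

    def __init__(self, data: bytes, block_size: int = BLOCK_SIZE):
        self.data = data
        self.block_size = block_size
        self.blocks_read = 0  # number of block transfers performed

    def read_block(self, block_no: int) -> bytes:
        self.blocks_read += 1
        start = block_no * self.block_size
        return self.data[start:start + self.block_size]


def read_byte(dev: BlockDevice, addr: int) -> int:
    """Emulate a byte-addressed read on top of a block-addressed device."""
    block_no, offset = divmod(addr, dev.block_size)
    block = dev.read_block(block_no)  # the whole block is transferred...
    return block[offset]              # ...even though only one byte is used


dev = BlockDevice(bytes(range(256)) * 64)  # 16 KiB of sample data, i.e. 4 blocks
value = read_byte(dev, 5000)               # byte 5000 lives in block 1, offset 904
print(value, dev.blocks_read, dev.block_size)  # prints: 136 1 4096
```

Here one useful byte costs 4096 bytes of transfer, which is why block-addressable layers reward sequential access.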
What does this all mean?
• The performance gaps between layers are huge (difficult to imagine at human scales)
• We process ever larger amounts of data, putting even more pressure on the memory system
• Data movement is one of the major sources of energy consumption and inefficiency in modern computers (and data centers)
• Performance and efficiency are largely determined by how well the database manages the movement of data across the hierarchy
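One way to make the gaps tangible is to stretch time so that 1 ns becomes 1 s. The sketch below does this for the latency figures. The cache and DRAM values are the upper ends of the ranges given for latency; the SSD (100 µs) and HDD (10 ms) values are assumed representatives of "microseconds" and "milliseconds", not measurements.

```python
# Latencies in nanoseconds. The cache and DRAM values are the upper ends of the
# ranges on the latency slide; the SSD and HDD values are assumed representatives
# for "microseconds" and "milliseconds", not measurements.
LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 8,
    "L3 cache": 30,
    "DRAM": 100,
    "SSD (assumed 100 us)": 100_000,
    "HDD (assumed 10 ms)": 10_000_000,
}


def human_scale(latency_ns: float) -> str:
    """Stretch time so that 1 ns becomes 1 s, then pick a readable unit."""
    seconds = latency_ns  # 1 ns -> 1 s: the number stays the same, only the unit changes
    for unit, size in (("days", 86_400), ("hours", 3_600), ("min", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} s"


for layer, ns in LATENCY_NS.items():
    print(f"{layer:<22} {human_scale(ns)}")
```

At this scale an L1 hit takes about a second, a DRAM access under two minutes, and, under the assumed figure, an HDD access almost four months.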
Locality (spatial and temporal)
• The unit of transfer between layers in the memory hierarchy is typically fixed
• To improve performance, it is important to exploit:
  • Spatial locality (put together what belongs together)
  • Temporal locality (do at the same time things that require the same data)
• Managing the hierarchy amounts to improving spatial and temporal locality
(Figure: the queries SELECT * FROM T WHERE X > 10, SELECT * FROM T, and SELECT * FROM T WHERE Y = 20 accessing a table with attributes A, B, C, D, E that is moved in fixed-size transfer units)
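Because the transfer unit is fixed, how attributes are grouped into units decides how much useless data a query drags along. The sketch below counts transfer units for a table with attributes A to E under a row layout (all attributes of a tuple together) and a column layout (each attribute stored contiguously). The 8-byte values, 4096-byte transfer unit, and 10,000 rows are assumptions, and the row-versus-column comparison is a standard illustration of spatial locality rather than necessarily the exact layout in the figure.

```python
import math

VALUE_SIZE = 8     # assumed size of one attribute value in bytes
BLOCK_SIZE = 4096  # assumed size of the transfer unit in bytes


def blocks_row_layout(num_rows: int, num_attrs: int) -> int:
    """Row layout: all attributes of a tuple are stored together, so a scan
    transfers every block of the table no matter how many attributes it needs."""
    rows_per_block = BLOCK_SIZE // (VALUE_SIZE * num_attrs)
    return math.ceil(num_rows / rows_per_block)


def blocks_column_layout(num_rows: int, attrs_needed: int) -> int:
    """Column layout: each attribute is stored contiguously, so a scan only
    transfers the blocks of the columns it actually needs."""
    values_per_block = BLOCK_SIZE // VALUE_SIZE
    return attrs_needed * math.ceil(num_rows / values_per_block)


ROWS, ATTRS = 10_000, 5  # a table T with attributes A, B, C, D, E
# A scan that only needs one attribute (e.g. to evaluate a predicate on it):
print(blocks_row_layout(ROWS, ATTRS), blocks_column_layout(ROWS, 1))      # 99 20
# SELECT * needs every attribute:
print(blocks_row_layout(ROWS, ATTRS), blocks_column_layout(ROWS, ATTRS))  # 99 100
```

Neither layout wins everywhere: grouping by column helps the single-attribute scan, while grouping by row is slightly better for SELECT *, which is one reason managing locality is workload dependent.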
What needs to be done?
• Enhance temporal and spatial locality (data organization, query scheduling)
• Make sure the data is available at the layer where it is needed, to hide the latency of fetching it from a lower layer (pre-fetching)
• Be clever about what to keep at each layer (caching strategies, replacement strategies)
• Keep track of modifications and write them back to the lower layers (all the way to persistent storage) when needed
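Two of these tasks, choosing what to keep and writing modifications back, can be sketched as a tiny buffer cache. The version below is a minimal sketch that assumes LRU replacement and a write-back policy (dirty pages reach "disk" only on eviction or an explicit flush); a Python dict stands in for persistent storage, and the class name `BufferCache` is hypothetical. Pre-fetching is not shown.

```python
from collections import OrderedDict


class BufferCache:
    """Minimal LRU buffer cache over a slower 'disk' (a dict standing in for
    persistent storage). Dirty pages are written back only on eviction or flush."""

    def __init__(self, disk: dict, capacity: int):
        self.disk = disk
        self.capacity = capacity
        self.frames = OrderedDict()  # page_id -> [data, dirty]; order = recency
        self.reads = 0   # pages fetched from disk
        self.writes = 0  # pages written back to disk

    def _fetch(self, page_id: int) -> list:
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # hit: mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            victim_id, (data, dirty) = self.frames.popitem(last=False)  # evict LRU page
            if dirty:
                self.disk[victim_id] = data  # write back the modified page
                self.writes += 1
        self.reads += 1
        frame = [self.disk[page_id], False]
        self.frames[page_id] = frame
        return frame

    def read(self, page_id: int):
        return self._fetch(page_id)[0]

    def write(self, page_id: int, data) -> None:
        frame = self._fetch(page_id)
        frame[0], frame[1] = data, True  # modify in memory and mark dirty

    def flush(self) -> None:
        """Write all dirty pages back, e.g. at a checkpoint."""
        for page_id, frame in self.frames.items():
            if frame[1]:
                self.disk[page_id] = frame[0]
                self.writes += 1
                frame[1] = False


disk = {i: f"page{i}" for i in range(4)}
cache = BufferCache(disk, capacity=2)
cache.read(0)
cache.write(1, "new1")  # page 1 is now dirty, but disk still holds "page1"
cache.read(0)           # hit: page 0 becomes most recently used
cache.read(2)           # miss: evicts page 1 (LRU), which is written back
print(disk[1], cache.reads, cache.writes)
```

Re-reading page 0 before page 2 arrives keeps it cached, a small instance of temporal locality; a different replacement policy or workload would change which page is evicted.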
Reality is complex and getting even more so
• Managing the memory hierarchy was never easy:
  • No perfect solution
  • Workload dependent
  • Many compromises needed
• The problem is becoming far more involved due to architectural developments:
  • Multicore and NUMA
  • Non-Volatile Memory
  • Cloud computing and economies of scale
  • Network attached storage
  • Hardware acceleration
Multicore and NUMA
(Figure: AMD Bulldozer)
Non-Volatile Memory (NVM)
Non-volatile memory is a new form of memory combining characteristics of DRAM and persistent storage:
• Cheaper than DRAM
• Byte addressable
• Random access
• Persistent
• Faster than disks
• Can be used as:
  • Memory
  • Local disk
  • Network attached storage
(Figure: the memory hierarchy with NVM placed between main memory (DRAM) and external storage)
Cloud computing
• The ephemeral nature of the computing infrastructure forces a separation of compute and storage
• Gives more flexibility to the cloud provider
• Has changed the nature of “disk” and “storage” in fundamental ways
• Crucial for cloud native databases
(Figure: a compute layer and a storage layer connected by the network)
Network attached storage
• Storage devices offer limited bandwidth and relatively high latencies
• Motivated by cloud designs, networks are becoming faster and offer more bandwidth
• The round-trip time in a data center is shorter than a seek operation on an HDD
• RDMA (Remote Direct Memory Access) reduces latency by removing OS-related inefficiencies
• Eventually, it might be faster to get data from the memory of a remote machine or from a remote storage device than from a local disk
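The claim that remote memory can beat a local disk follows from a simple cost model: access time is roughly a fixed latency plus size divided by bandwidth. The sketch below applies that model to one 4 KiB page. All parameters (5 ms HDD seek plus rotation at 200 MB/s, 3 µs RDMA round trip over a 100 Gbit/s link) are assumed ballpark values chosen for illustration, not measurements of any specific hardware.

```python
def access_time_us(latency_us: float, size_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Simple model: fixed access latency plus size / bandwidth, in microseconds."""
    return latency_us + size_bytes / bandwidth_bytes_per_s * 1e6


PAGE = 4096  # bytes, assumed page size

# Assumed ballpark parameters, not measurements:
# local HDD: ~5 ms for seek plus rotation, ~200 MB/s sequential bandwidth
hdd = access_time_us(latency_us=5_000, size_bytes=PAGE, bandwidth_bytes_per_s=200e6)
# remote DRAM over RDMA: ~3 us round trip, 100 Gbit/s link (100e9 / 8 bytes/s)
remote = access_time_us(latency_us=3, size_bytes=PAGE, bandwidth_bytes_per_s=100e9 / 8)

print(f"local HDD:          {hdd:10.2f} us")
print(f"remote DRAM (RDMA): {remote:10.2f} us")
print(f"ratio:              {hdd / remote:10.0f}x")
```

For small pages the latency term dominates both results, so the comparison is essentially round-trip time versus seek time, which is the point made above.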
Hardware Acceleration
(Figure: Oracle M7 SPARC processor)
Summary
• Dealing with the memory hierarchy is a key aspect of the architecture of data management systems
• A very old problem that is still relevant
• Many fundamental concepts remain applicable today because of the way systems are evolving