MEMORY HIERARCHY
RANDOM ACCESS MEMORY
Key features:
▸ RAM is traditionally packaged as a chip.
▸ The basic storage unit is normally a cell (one bit per cell).
▸ Multiple RAM chips form a memory.
RAM comes in two varieties:
▸ SRAM (Static RAM): very expensive but fast (~1 ns)
▹ Used for registers and caches (1–3 MB)
▸ DRAM (Dynamic RAM): cheap but slow (~12 ns for DDR3)
▹ Used for main memory (4–32 GB)
OUTSIDE OF THE CPU CHIP
A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices.
MEMORY OPERATION
The CPU places address A on the memory bus.
MEMORY OPERATION
Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
HARD DRIVE STRUCTURE
DISK GEOMETRY
▸ Disks consist of platters, each with two surfaces.
▸ Each surface consists of concentric rings called tracks.
▸ Each track consists of sectors separated by gaps.
DISK GEOMETRY
Aligned tracks form a cylinder.
RECORDING ZONES
Modern disks partition tracks into disjoint subsets called recording zones.
▸ Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
▸ Each zone has a different number of sectors/track; outer zones have more sectors/track than inner zones.
▸ So we use the average number of sectors/track when computing capacity.
COMPUTING DISK CAPACITY
Capacity: the maximum amount of data that can be stored.
Capacity = (# bytes/sector) × (avg. # sectors/track) × (# tracks/surface) × (# surfaces/platter) × (# platters/disk)
Example:
▸ 512 bytes/sector
▸ 300 sectors/track (on average)
▸ 20,000 tracks/surface
▸ 2 surfaces/platter
▸ 5 platters/disk
Capacity = 512 × 300 × 20,000 × 2 × 5 = 30,720,000,000 bytes = 30.72 GB
DISK OPERATION
DISK ACCESS
▸ Surface organized into tracks
▸ Tracks divided into sectors
▸ A sector is the minimum unit that can be accessed by the head
Head in position above a track
DISK ACCESS TIME
The average time to access a target sector is approximated by:
T_access = T_avg_seek + T_avg_rotation + T_avg_transfer
Seek time (T_avg_seek):
▸ Time to position the heads over the cylinder containing the target sector.
▸ Typical T_avg_seek is 3–9 ms.
Rotational latency (T_avg_rotation):
▸ Time waiting for the first bit of the target sector to pass under the r/w head.
▸ T_avg_rotation = 1/2 × 1/RPM × 60 secs/1 min
▸ A typical rotational rate is 7,200 RPM.
Transfer time (T_avg_transfer):
▸ Time to read the bits in the target sector.
▸ T_avg_transfer = 1/RPM × 1/(avg # sectors/track) × 60 secs/1 min
EXAMPLE
A hard disk has the following parameters:
▸ Rotational rate = 7,200 RPM
▸ Average seek time = 9 ms
▸ Avg # sectors/track = 400
Compute the disk access time to read one sector.
Answer:
▸ T_avg_rotation = 1/2 × (60 secs/7,200 RPM) × 1,000 ms/sec ≈ 4 ms
▸ T_avg_transfer = (60 secs/7,200 RPM) × (1 track/400 sectors) × 1,000 ms/sec ≈ 0.02 ms
▸ T_access = 9 ms + 4 ms + 0.02 ms = 13.02 ms
DISK ACCESS ISSUES
Access time is dominated by seek time and rotational latency.
▸ The first bit in a sector is the most expensive; the rest are (essentially) free.
SRAM access time is about 4 ns/doubleword; DRAM is about 60 ns.
▸ Disk is about 40,000 times slower than SRAM,
▸ and about 2,500 times slower than DRAM.
LOGICAL BLOCK ADDRESSING
Modern disks present a simpler abstract view of the complex sector geometry:
▸ The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)
Mapping between logical blocks and actual (physical) sectors:
▸ Maintained by a hardware/firmware device called the disk controller
▸ Converts requests for logical blocks into (surface, track, sector) triples
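The controller's job can be illustrated with the classic logical-block-to-geometry conversion. This sketch assumes a uniform geometry (the same number of sectors on every track), which real zoned disks do not have; the controller's actual mapping is more complex and hidden from software:

```c
/* Hypothetical illustration of mapping a logical block number onto a
   (cylinder, head, sector) triple under an assumed uniform geometry. */
struct chs { int cylinder, head, sector; };

struct chs lba_to_chs(int lba, int heads, int sectors_per_track)
{
    struct chs g;
    g.cylinder = lba / (heads * sectors_per_track); /* which aligned ring   */
    g.head     = (lba / sectors_per_track) % heads; /* which surface        */
    g.sector   = lba % sectors_per_track + 1;       /* sectors are 1-based  */
    return g;
}
```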
SOLID STATE DISKS (SSD)
▸ Pages: 512 B to 4 KB; blocks: 32 to 128 pages
▸ Data is read/written in units of pages
▸ A page can be written only after its block has been erased
▸ A block wears out after about 100,000 repeated writes
SSD PERFORMANCE
Sequential read throughput: 550 MB/s    Sequential write throughput: 470 MB/s
Random read throughput: 365 MB/s        Random write throughput: 303 MB/s
Average sequential read time: 50 µs     Average sequential write time: 60 µs
Sequential access is faster than random access
▸ A common theme in the memory hierarchy
Random writes are somewhat slower
▸ Erasing a block takes a long time (~1 ms)
▸ Modifying a page requires all other pages in its block to be copied to a new block
▸ In earlier SSDs, the read/write gap was much larger.
SSD TRADEOFFS
Advantages:
▸ No moving parts
▸ Faster
▸ Less power
Disadvantages:
▸ Have the potential to wear out
▹ Mitigated by "wear leveling logic" in the flash translation layer
▹ E.g., the Intel SSD 730 guarantees 128 petabytes (128 × 10^15 bytes) of writes before it wears out
▸ In 2015, about 30 times more expensive per byte
Due to cost and manufacturing limitations, SSDs will not replace hard disk drives entirely in the foreseeable future!
THE CPU-MEMORY GAP
LOCALITY PRINCIPLE
Programs tend to use data and instructions with addresses near or equal to those they have used recently.
▸ Temporal locality: recently referenced items are likely to be referenced again in the near future.
▸ Spatial locality: items with nearby addresses tend to be referenced close together in time.
LOCALITY PRINCIPLE
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;
Data references:
▸ Array elements are referenced in succession (spatial locality)
▸ Variable sum is referenced each iteration (temporal locality)
Instruction references:
▸ Instructions are referenced in sequence (spatial locality)
▸ The loop is cycled through repeatedly (temporal locality)
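Spatial locality also explains why traversal order matters for 2-D arrays. In this sketch both functions compute the same sum, but the row-wise version follows C's row-major memory layout with stride-1 accesses, while the column-wise version strides through memory; on a real machine with caches, the row-wise version is typically faster:

```c
#define N 4

/* Good spatial locality: visits a[i][0], a[i][1], ... consecutively
   in memory (stride-1 accesses). */
int sum_rowwise(int a[N][N])
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Poor spatial locality: consecutive accesses are N ints apart
   in memory (stride-N accesses). */
int sum_colwise(int a[N][N])
{
    int sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}
```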
CACHE INTRODUCTION Caches exploit the locality principle A cache is a small amount of fast, expensive memory Goes between the processor and the slower, dynamic ▸ main memory Keeps a copy of the most frequently used data from ▸ the main memory Memory access speed increases overall, because we’ve made the common case faster Reads and writes to the most frequently used ▸ addresses will be serviced by the cache We only need to access the slower main memory for ▸ less frequently used data 32
EXAMPLE MEMORY HIERARCHY
CACHE CONCEPTS
CACHE CONCEPTS: HIT
CACHE CONCEPTS: MISS
MEASURING CACHE PERFORMANCE
The hit time is how long it takes data to be sent from the cache to the processor. This is usually fast, on the order of 1–3 clock cycles.
The miss penalty is the time to copy data from main memory to the cache. This often requires dozens of clock cycles (at least).
▸ Multiple caches organized in levels reduce the miss penalty (the memory hierarchy).
The miss rate is the percentage of accesses that miss.
▸ Typical caches have a hit rate of 95% or higher.
AVERAGE MEMORY ACCESS TIME (AMAT)
The average memory access time, or AMAT, can be computed as follows:
AMAT = Hit time + (Miss rate × Miss penalty)
How can we improve the average memory access time of a system?
▸ Obviously, a lower AMAT is better.
▸ Miss penalties are usually much greater than hit times, so the best way to lower AMAT is to reduce the miss penalty or the miss rate.
However, AMAT should only be used as a general guideline. Execution time is still the best performance metric.
AMAT EXAMPLE
Computer X has one cache (L1) with a hit time of 1 cycle and a hit ratio of 97%. Its DRAM has an access time of 20 cycles. What is the average memory access time?
AMAT = 1 cycle + (1 − 0.97) × 20 cycles
AMAT = 1.6 cycles