CSci 2021: Final Exam Review Lecture
Stephen McCamant
University of Minnesota, Computer Science & Engineering

Outline
  - Layered course overview
  - Final exam and other logistics
  - Post midterm 2 topics: caches
  - Post midterm 2 topics: memory
  - Post midterm 2 topics: optimization
  - Post midterm 2 topics: allocation
  - Post midterm 2 topics: linking

Abstraction layers (in one slide)
  [Layer diagram; labels, roughly top to bottom:]
  - CSci 1133, 1933, etc.
  - C
  - Optimization (Ch. 5), Machine Code (Ch. 3, 8), Memory Allocators (Ch. 9), Linking (Ch. 7)
  - x86-64
  - Caches (Ch. 6), Data Representation (Ch. 2), Virtual Memory
  - Y86-64
  - CPU architecture (Ch. 4)
  - HCL
  - Logic design (Ch. 4)
  - (Electrical Engineering)

Implementing high-level code (1)
  - Machine-level code representation
      - Instructions, operands, flags
      - Branches and loops
      - Procedures and calling conventions
      - Arrays, structs, unions
      - Buffer overflow attacks
  - Code optimization
      - Machine-independent techniques
      - Instruction-level parallelism

Implementing high-level code (2)
  - Linking
      - Symbols, local and global
      - Libraries and static linking
  - Dynamic memory allocation
      - Heap layout and algorithms
      - Garbage collection
      - C memory-usage mistakes

What hardware does
  - Number representation
      - Bits and bitwise operators
      - Unsigned and signed integers
      - Floating point numbers
  - Memory hierarchy and caches
      - Disk and memory technologies
      - Locality and how to use it
      - Cache parameters and operation
      - Optimizing cache usage
  - Virtual memory
      - Page tables and TLBs
      - Memory permissions and sharing
Building hardware
  - Logic design
      - Boolean functions and combinational circuits
      - Registers and sequential circuits
  - CPU architecture
      - Y86-64 instructions
      - Control logic and HCL
      - Sequential Y86-64
      - Pipelined Y86-64

Final exam coordinates
  - Wednesday, May 13th (in 8.5 days)
  - 8:00am - 10:00am (2 hours)
  - Test on Canvas + Zoom attendance
  - Longer than midterms, but not twice as long
  - Topic coverage is comprehensive
      - Slightly more than 1/3 on topics after midterm 2
      - Expect questions that integrate ideas

Exam rules
  - Begins promptly at 8:00, ends promptly at 10:00
  - Open-book, open-notes, any paper materials OK
  - Change from midterms: electronic resources OK
      - eTextbook, electronic notes, web searches, compiler, disassembler
      - But designed not to need them
  - Still no communication with other students allowed during the exam

Why are course evaluations important?
  - Help us do a better job next time
  - What worked well, what not so well?
  - If you were running the course, what activities would you spend more or less time on?
  - I will read your written comments, after grades submitted
  - https://srt.umn.edu/blue/
RAM technologies
  - SRAM: several (e.g. 6) transistors per bit
      - Faster
      - More expensive, less dense
      - Used for caches
  - DRAM: one capacitor and transistor per bit
      - Must be periodically refreshed
      - Cheaper, more dense
      - Slower
      - Used for main memory

Disks and SSDs
  - (Spinning) hard drives
      - Highest capacity
      - Random access time limited by seek and rotation latencies
      - Always read or write an entire sector at a time
  - Solid-state (flash) drives
      - Technology descended from EEPROMs
      - Random-access reads are very fast
      - Can only rewrite by erasing large blocks
      - Random-access writes require recopying, slower

Spatial and temporal locality
  - Spatial locality: memory accesses are close together in location
      - Best case: sequential accesses
  - Temporal locality: the same location is accessed repeatedly close together in time
      - Set of locations being used is called the working set
  - Because of locality, different locations have very different chances of being accessed next

Memory hierarchy
  - Devices have trade-off between access time and capacity
      - Differences of many orders of magnitude
  - Combine small+fast devices with big+slow ones in a hierarchy
      - Because of locality, most uses are in small+fast device
  - Must move data between levels
      - Keeping a copy at a higher level is called caching
      - First example: caches between CPU core and memory

Cache parameters
  - Data is moved in blocks of size $B = 2^b$
  - Organize cache into $S = 2^s$ sets of lines
  - A set contains $E = 2^e$ lines, each of which can contain one of the same blocks
      - $E = 1$: direct mapped
      - $E > 1$: $E$-way set associative
      - $S = 1$: fully associative
  - Total capacity $C = S \cdot E \cdot B$
  - $b$ and $s$ also give a division of addresses into $m = t + s + b$ bits

Cache operations: read
  - Use $s$ bits as an index to choose a set
  - Check all lines in the set (hardware: in parallel), to see if any is valid and has a matching tag
  - If yes, it's a hit: block offset indicates which bytes desired
  - If not present, it's a miss
      - Fetch data from lower level (e.g., main memory)
      - Insert newly read data, usually evicting another block
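As a minimal sketch of the address division under "Cache parameters": an address splits into a tag (the upper t bits), a set index (the middle s bits), and a block offset (the low b bits). The struct and function names here are hypothetical, and a 64-bit address type is assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields of an address as a cache sees it. */
struct cache_addr {
    uint64_t tag;     /* upper t = m - s - b bits, compared against stored tags */
    uint64_t set;     /* middle s bits: which set to search */
    uint64_t offset;  /* low b bits: which byte within the block */
};

/* Split addr for a cache with S = 2^s sets and B = 2^b bytes per block. */
struct cache_addr split_address(uint64_t addr, unsigned s, unsigned b)
{
    assert(s + b < 64);  /* keep every shift amount below the word width */
    struct cache_addr f;
    f.offset = addr & ((UINT64_C(1) << b) - 1);
    f.set    = (addr >> b) & ((UINT64_C(1) << s) - 1);
    f.tag    = addr >> (s + b);
    return f;
}

/* Total capacity in bytes: C = S * E * B. */
uint64_t cache_capacity(unsigned s, uint64_t E, unsigned b)
{
    assert(s + b < 64);
    return (UINT64_C(1) << s) * E * (UINT64_C(1) << b);
}
```

For example, with s = 6 and b = 6, address 0x12345 has block offset 5, set index 13, and tag 0x12; with E = 8 the same cache holds 64 * 8 * 64 = 32768 bytes.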
Cache operations: write
  - Look for a matching line as for a read
  - If a hit, update contents of cache block
      - Write-back policy: do not copy to lower levels until evicted (opposite is write-through)
  - If a miss, the common write-allocate policy copies the block into the cache
      - Exploits locality in write-only accesses

Cache usage optimizations
  - Overall goals: maximize locality, minimize working set
  - Use more compact data representations
  - Prefer stride-1 data accesses
      - E.g., for a matrix, iterate over indexes in outer-to-inner order
  - Temporally group accesses to the same data values
      - For 2-D data, group by blocks (tiles) instead of rows

Virtual memory structures
  - Pages are units of data transfer (e.g., 4KB)
      - Can be in RAM or on disk
  - Page table maps virtual addresses to physical pages
      - For efficiency, use multiple levels
  - A TLB is a cache for page-table entries

Virtual memory uses
  - Avoid capacity limits on RAM
  - Cache data from disk for speed
      - Demand paging of code
  - Implement isolation between processes
      - Separate page tables
      - User/kernel protections
  - Share reused data
      - Executable code, shared libraries
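Returning to the stride-1 advice under "Cache usage optimizations": C stores 2-D arrays in row-major order, so the first index belongs in the outer loop and the last index in the innermost loop. The sketch below contrasts the two loop orders; the array sizes and function names are illustrative assumptions, not from the slides.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 8 };  /* illustrative sizes only */

/* Stride-1: the inner loop walks j, so consecutive accesses are adjacent
 * in memory and each cache block is fully used before moving on. */
long sum_row_major(int a[ROWS][COLS])
{
    assert(a != NULL);
    long sum = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Stride-COLS: the inner loop walks i, so consecutive accesses are
 * COLS * sizeof(int) bytes apart. Same result, worse spatial locality. */
long sum_col_major(int a[ROWS][COLS])
{
    assert(a != NULL);
    long sum = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}
```

Both functions return the same sum; the difference only shows up in running time, and only once the array is large relative to the cache.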
Principles of optimization
  - Concentrate on the program parts that run the most
      - Amdahl's law bounds possible speedup
      - Array-style programs: concentrate on inner loops
      - Complex programs: use a profiler
  - Know what the compiler can and can't do
      - Compiler can be smart, but is careful about correctness
      - Functions and pointers (aliasing) block optimization
  - Watch out for algorithmic problems

Machine-independent optimizations
  - Move computations out of loops
  - Avoid abstract functions in time-critical code
  - Use temporary variables to reduce memory operations
  - Unroll loops to reduce bookkeeping overhead
  - Avoid unpredictable branching

Instruction-level parallelism
  - Modern processors are super-scalar
      - Can do more than one thing at once
  - And out-of-order
      - In a different sequence than the original instructions
  - Multiple functional units, each with different throughput and latency

Exposing loop parallelism
  - To reduce latency, avoid a long critical path
  - Functional unit throughput is an ultimate limit
  - Unroll to allow optimization between iterations
  - Techniques to shorten the critical path:
      - Re-associate associative operators
      - Replace a single accumulator with multiple parallel accumulators

Implementing malloc
  - Data structures to represent the heap
      - Boundary tags and the implicit list
      - Explicit free list(s)
  - Algorithms for heap management
      - First fit vs. best fit
      - Size segregation
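Returning to "Exposing loop parallelism": the last technique listed, replacing a single accumulator with multiple parallel accumulators, can be sketched as below. The function names are hypothetical, and integer addition is used so that the two versions give exactly the same result. With floating point, re-association can change rounding, and the payoff is typically larger for operations with multi-cycle latency.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one accumulator. Each add depends on the previous one,
 * so the adds form a single critical path of length n. */
long sum_single(const long *v, size_t n)
{
    assert(v != NULL || n == 0);
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Unrolled by 2 with two accumulators. The two dependency chains are
 * independent, so the critical path is about half as long. */
long sum_two_acc(const long *v, size_t n)
{
    assert(v != NULL || n == 0);
    long acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += v[i];
        acc1 += v[i + 1];
    }
    for (; i < n; i++)  /* leftover element when n is odd */
        acc0 += v[i];
    return acc0 + acc1;
}
```

Combining `acc0 + acc1` only at the end relies on addition being associative, which is the re-association step named on the slide.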
Linking mechanics
  - Symbols include functions and variables
      - Some are file-local, stack variables not even considered
  - Symbols are resolved to the correct definition
      - At most one strong definition, or one of many weak ones
  - Code is relocated so it runs correctly at its final address
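A small sketch of the symbol categories under "Linking mechanics", all in one source file. The names are hypothetical, and the weak definition uses the GCC/Clang `__attribute__((weak))` extension as an assumption for illustration; it is not necessarily the mechanism for weak symbols used in the course.

```c
#include <assert.h>

/* Global symbol with a strong definition: visible to other object files. */
int shared_total = 0;

/* File-local symbol: `static` keeps it out of other files' symbol resolution. */
static int call_count = 0;

/* Weak definition (GCC/Clang extension, assumed here). If another object
 * file supplies a strong scale_factor, the linker resolves calls to that
 * one instead; with no strong definition, this one is used. */
__attribute__((weak)) int scale_factor(void)
{
    return 1;
}

int add_sample(int x)
{
    assert(x >= 0);                   /* illustrative precondition only */
    int scaled = x * scale_factor();  /* stack variable: no symbol at all */
    call_count++;
    shared_total += scaled;
    return shared_total;
}

/* Accessor, since call_count cannot be referenced from other files. */
int samples_seen(void)
{
    return call_count;
}
```

On a typical Linux toolchain, running `nm` on the compiled object file should show the global, file-local, and weak symbols with different type letters, while `scaled` does not appear.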