7b.1
EE 457 Unit 7b
Main Memory Organization

7b.2
Motivation
• Organize main memory to
  – Facilitate byte-addressability while maintaining…
  – Efficient fetching of the words in a cache block
• __________________________ helps us achieve this

7b.3
Interleaving Analogy
• Consider a journal consisting of 1000 pages (000-999) bound in
  – 10 volumes (0-9) of
  – 100 pages each (00-99)

Method I (consecutive pages in a volume):
  Volume 0: 000, 001, …, 099
  Volume 1: 100, 101, …, 199
  …
  Volume 9: 900, …, 999

Method II (consecutive pages in consecutive volumes):
  Volume 0: 000, 010, …, 990
  Volume 1: 001, 011, …, 991
  …
  Volume 9: 009, 019, …, 999

7b.4
Interleaving Analogy
• Example: Say article 73 runs from page 730-739
  – In Method I: Article 73 is _______________________
  – In Method II: The _____ page of _______ volume form article 73 as shown below
• Which do you prefer?
  – If reading the article you may say method I
  – If you have to make a copy of the article and you have 10 photocopy machines with 10 friends to help you might say ____________
• Back to the scenario of reading the article, given those same 10 friends they could _____________________ for you so that you can still read in a continuous manner

Low Order Interleaving:
  Page 730 is page 73 of volume 0
  Page 731 is page 73 of volume 1
  …
  Page 739 is page 73 of volume 9
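The two binding methods in the analogy amount to two ways of splitting a page number into a volume and a page within that volume. The sketch below is illustrative only; the function names `method_one` and `method_two` are not from the slides.

```python
def method_one(page):
    """Method I: consecutive pages stay in one volume.
    The high-order digit picks the volume, the low two digits the page."""
    volume, page_in_volume = divmod(page, 100)
    return volume, page_in_volume


def method_two(page):
    """Method II (low-order interleaving): consecutive pages go to
    consecutive volumes. The low-order digit picks the volume."""
    page_in_volume, volume = divmod(page, 10)
    return volume, page_in_volume


if __name__ == "__main__":
    for page in range(730, 740):
        print(page, "Method I:", method_one(page), "Method II:", method_two(page))
```

Under Method II, each of the ten pages of article 73 lands in a different volume, which is what lets ten helpers work on them at the same time.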
7b.5
Byte Addressability
1. Intel 8085: 16-bit addr., 8-bit data, byte addressable processor.
   Memory space: 2^16 = 64KB, A15-A0, D7-D0
   [Figure: one 64K x 8 memory addressed by A15-A0]
2. Intel 8086: 20-bit addr., 16-bit data, byte addressable, little-endian proc.
   Memory space: 2^20 = 1MB, A19-A0
   [A19-A1, BHE (BE1), A0 (BE0)], D15-D0
   [Figure: two ½ MB x 8 banks addressed by A19-A1, enabled by BHE=0 and A0=0; Byte 41 on D[15:8], Byte 40 on D[7:0] = Word 40]
3. Intel 80386: 32-bit addr., 32-bit data, byte addressable, little-endian proc.
   Memory space: 2^32 = 4GB, A31-A0
   [A31-A2, BE3, BE2, BE1, BE0], D31-D0
   [Figure: four 1 GB x 8 banks addressed by A31-A2, enabled by BE3, BE2, BE1, BE0; Byte 43, Byte 42, Byte 41, Byte 40 on D[31:24] … D[7:0] = Word 40]

7b.6
Byte Addressability
4. Intel 80386: 32-bit addr., 32-bit data, byte addressable, big-endian proc.
   Memory space: 2^32 = 4GB, A31-A0
   [A31-A2, BE3, BE2, BE1, BE0], D31-D0
   [Figure: four 1 GB x 8 banks addressed by A31-A2, enabled by BE_, BE_, BE_, BE_; Byte 40, Byte 41, Byte 42, Byte 43 on D[31:24] … D[7:0] = Word 40]
5. Little-Endian system, ______________ system: 32-bit addr., 32-bit data, byte addressable
   (Narrow, 32-bit data bus b/w mem. and cache)
   Memory space: 2^32 = 4GB, A31-A0
   [A31-A2, BE3, BE2, BE1, BE0], D31-D0
   [Figure: address lines A31-A_ and _____ (blanks on the slide); two banks, each of four ½ GB x 8 chips enabled by BE3-BE0 on D[31:24] … D[7:0]; one XCVR per bank onto the narrow bus D[31:0]]
6. Same as 5 above, but __________________
   [Figure: address lines A31-A_; four banks selected by A3,A2 = __, __, __, __; each bank of four ¼ GB x 8 chips enabled by BE3-BE0; one XCVR per bank onto D[31:0]]

7b.7
2-Way L.O.I.
• System address bus uses
  – A1:A0 and size info to generate /BE3../BE0 (Byte Enables)
  – Lower order bits to select a “bank”
  – Upper bits connect to each memory chip
• In a 32-bit data bus, we need 2 address bits to produce the 4 BE’s
• In a 64-bit data bus, we would need ___ address bits to produce ___ BE’s
• Only 1 address bit, A2, to select one of 2 banks
• Each memory chip is just a collection of ½ GB requiring 29 address bits…we can connect appropriate 29 bits
[Figure: Bank 1 (A2=1) and Bank 0 (A2=0); system A31-A3 connects to chip pins A28-A0 (shift of 3 bits in address connections); each bank has four ½ GB x 8 chips enabled by BE3-BE0 on D[31:24] … D[7:0]; one XCVR per bank onto the narrow bus D[31:0]]

7b.8
4-Way L.O.I.
• System address bus uses
  – A1:A0 and size info to generate /BEi (Byte Enables)
  – Lower order bits to select a “bank”
  – Upper bits connect to each memory chip
[Figure: Bank 3, Bank 2, Bank 1, Bank 0 selected by A3,A2 = 11, 10, 01, 00; system A31-A4 connects to chip pins A27-0 (shift of 4 bits in address connections); each bank has four ¼ GB x 8 chips; one XCVR per bank onto D[31:0]]
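The address split described for 2-way and 4-way low-order interleaving can be sketched as follows. This is a minimal model, assuming a 32-bit data bus, a power-of-two number of banks, and accesses that do not cross a word boundary. The names `byte_enables` and `bank_and_chip_address` are illustrative, and the enables are modeled as booleans (True = asserted) rather than as the active-low /BE signals on the slides.

```python
def byte_enables(addr, size):
    """Return [BE0, BE1, BE2, BE3] for an access of `size` bytes.
    A1:A0 give the first byte lane; `size` (1, 2 or 4) gives the lane count."""
    offset = addr & 0x3                      # A1:A0
    if size not in (1, 2, 4) or offset + size > 4:
        raise ValueError("access must fit inside one 32-bit word")
    return [offset <= lane < offset + size for lane in range(4)]


def bank_and_chip_address(addr, num_banks):
    """Split a byte address for low-order interleaving on a 32-bit bus.
    The bits just above A1:A0 select the bank; the remaining upper bits
    are shifted down and wired to every memory chip."""
    bank_bits = num_banks.bit_length() - 1   # 1 bit for 2 banks, 2 for 4
    bank = (addr >> 2) & (num_banks - 1)     # A2 (2-way) or A3:A2 (4-way)
    chip_addr = addr >> (2 + bank_bits)      # A31-A3 or A31-A4
    return bank, chip_addr


if __name__ == "__main__":
    print(byte_enables(0x41, 1))             # a single byte at offset 1
    print(bank_and_chip_address(0x4C, 2))    # 2-way L.O.I.
    print(bank_and_chip_address(0x4C, 4))    # 4-way L.O.I.
```

The shift by `2 + bank_bits` corresponds to the "shift of 3 bits" and "shift of 4 bits" in the address connections noted on the two slides.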
7b.9
Organization Options
a.) One-word-wide memory organization: CPU, Cache, Bus, Memory
b.) Wide memory organization: CPU, Multiplexer, Cache, Bus, Memory
c.) Interleaved memory organization: CPU, Cache, Bus, Mem. Bank 0, Mem. Bank 1, Mem. Bank 2, Mem. Bank 3

7b.10
Organization Comparison
• Assume following latencies
  Send address to MM: 1 clock
  MM (DRAM) Access Time: 15 clocks
  Transfer time for one word: 1 clock
• Find time to access a cache line of 4-words
  a. Narrow Memory ____________________________ (assume mem. controller will auto-increment address)
  b. Wide Memory
  c. Interleaved Memory

7b.11
Example
• Consider a set-associative mapping and physical organization of main memory, cache data RAMs, and cache tag RAMs.
• Specs:
  – 32-bit physical address, byte-addressable system
  – Cache Size = 64KB
  – Block Size = 4 words (16 bytes)
  – Set Size = 4 blocks (64 bytes)
# of MM Blocks = _____________
# of Cache Blocks = _____________
# of Sets = _____________________________
# of Groups = ___________________________
Address fields: TAG | SET | WORD | BYTE (A1 – A0, used to generate /BE3 - /BE0)
[Figure label: Member]

7b.12
Tag RAM Example
[Figure: four Tag RAMs, one each holding Tags & Valid Bits for Way 0, Way 1, Way 2 and Way 3. Each Tag RAM has an address input A (______, Set), a data input DI (________ + V-bit, Tag) and a data output DO feeding a comparator (=) that produces Hit/Miss]
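The arithmetic behind slides 7b.10 and 7b.11 can be checked with a short script. This is a sketch under stated assumptions, not the slide's worked answer: the address is sent once per line fill (the controller auto-increments), the narrow memory performs its four accesses one after another, the wide memory and bus are a full line wide, and the interleaved banks overlap their access times but share a one-word bus. The function names are illustrative.

```python
def line_fill_clocks(words=4, send=1, access=15, transfer=1):
    """Clocks to fetch one cache line under the three organizations."""
    return {
        # one address, then `words` serialized accesses and transfers
        "narrow": send + words * access + words * transfer,
        # one address, one access, one line-wide transfer
        "wide": send + access + transfer,
        # one address, overlapped bank accesses, one transfer per word
        "interleaved": send + access + words * transfer,
    }


def cache_geometry(addr_bits=32, cache_bytes=64 * 1024,
                   block_bytes=16, ways=4, word_bytes=4):
    """Block counts and address field widths for the 7b.11 specs."""
    cache_blocks = cache_bytes // block_bytes
    sets = cache_blocks // ways
    byte_bits = word_bytes.bit_length() - 1
    word_bits = (block_bytes // word_bytes).bit_length() - 1
    set_bits = sets.bit_length() - 1
    return {
        "mm_blocks": 2 ** addr_bits // block_bytes,
        "cache_blocks": cache_blocks,
        "sets": sets,
        "byte_bits": byte_bits,
        "word_bits": word_bits,
        "set_bits": set_bits,
        "tag_bits": addr_bits - set_bits - word_bits - byte_bits,
    }


if __name__ == "__main__":
    print(line_fill_clocks())
    print(cache_geometry())
```

With these specs the set and word fields together span 12 bits, which matches the A13-A2 address shown on the cache data RAMs in the following slide.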
7b.13
MM & Data RAM Example
[Figure, main memory: 80386 + Buffers drive A31-A4 to four banks selected by A3,A2 = 11, 10, 01, 00. Each bank holds four 256MB byte-wide chips (16 chips of 256MB in total) behind its own 32-bit Bidirectional XCVR.]
[Figure, cache data RAMs: from A31-A2, the Set + Word bits A13-A2 address every data RAM. Way 0, Way 1, Way 2 and Way 3 each consist of four 4 KB chips on D[31:24], D[23:16], D[15:8], D[7:0], enabled by /BE3 - /BE0.]

7b.14
Main memory organization
DRAM TECHNOLOGIES

7b.15
Memory Chip Organization
• Memory technologies share the same layout but differ in their cell implementation
  – ___________
  – ___________
• Memories require the row bits be sent first and are used to select one row (aka “____ line")
  – Uses a hardware component known as a decoder
• All cells in the selected row access their data bits and output them on their respective “___________"
• The column address is sent next and used to select the desired 8 bit lines (i.e. 1 byte)
  – Uses a hardware component known as a mux
[Figure: a 10-bit Row Addr feeds the Row Addr. Decoder, which drives word lines WL[0] … WL[1023]; the cell array has 1K Bit Lines BL[0] … BL[1023]; the Column Addr feeds the Amplifiers & Column Mux; Data[7:0] in/out]
SRAM and DRAM differ in how each cell is made, but the organization is roughly the same.

7b.16
Memory Module Organization
• Memory module is designed to always access data in chunks the size of the data bus (64-bit data bus = 64-bit accesses)
• Parallelizes memory access by accessing the byte at the same location in all (8) memory chips at once
• Only the desired portion will be forwarded to the registers
• Note the difference between system processor address and local memory chip addresses
• Each chip on the module reads 1 byte and outputs it to form a collectively larger word on the data bus (i.e. 8-bytes = 64-bits)
[Figure: Processor with 64-bit Data Bus reading 1 DWord at address 0x000c: A[31:0] = 0000…1100. A[31:3] = 0000..01 is the byte address from the individual chip perspective; A[2:0] + SIZE select the byte lanes (XXX 000 marks the low bits of the byte address from the system/processor perspective). Byte lanes 7 … 0 on D[63:56] … D[7:0] carry 50 98 a7 fb c6 13 8A 57 from chip-local address 0x1, and the Processor Core / Registers receive 0x5098a7fb. Bytes E4, F8 and 2C are shown at chip-local address 0x0.]
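The difference between the processor's byte address and the chip-local address on slide 7b.16 can be sketched as below. This is a minimal model, assuming eight byte-wide chips on a 64-bit bus, little-endian byte order, and accesses that stay within one 8-byte row. The function names are illustrative, and the chip contents are only the bytes readable from the slide's figure.

```python
def split_module_address(addr, size):
    """Split a processor byte address for an 8-chip, 64-bit module.
    A[31:3] is the address sent to every chip; A[2:0] plus the access
    size pick which byte lanes are forwarded to the registers."""
    chip_addr = addr >> 3
    first_lane = addr & 0x7
    if first_lane + size > 8:
        raise ValueError("access must fit inside one 64-bit row")
    return chip_addr, list(range(first_lane, first_lane + size))


def read(chips, addr, size):
    """Every chip reads the byte at chip_addr; only the selected lanes
    are assembled (little-endian) into the returned value."""
    chip_addr, lanes = split_module_address(addr, size)
    value = 0
    for position, lane in enumerate(lanes):
        value |= chips[lane][chip_addr] << (8 * position)
    return value


# Bytes at chip-local address 0x1, lane 0 first (from the slide's figure).
ROW_1 = [0x57, 0x8A, 0x13, 0xC6, 0xFB, 0xA7, 0x98, 0x50]
CHIPS = [{0x1: byte} for byte in ROW_1]

if __name__ == "__main__":
    print(split_module_address(0x000C, 4))
    print(hex(read(CHIPS, 0x000C, 4)))
```

All eight chips see the same local address 0x1, yet only lanes 4 through 7 contribute to the DWord the processor asked for.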