
A Memory System Design Framework: Creating Smart Memories



  1. A Memory System Design Framework: Creating Smart Memories
     Amin Firoozshahian, Alex Solomatnikov (Hicamp Systems Inc.)
     Ofer Shacham, Zain Asgar, Stephen Richardson, Christos Kozyrakis, Mark Horowitz (Stanford University)
     http://www.c2s2.org

  2. An Era of Chip-Multiprocessors…
     - Single-thread performance scaling has stopped
     - More processor cores on the same die
     - Claim:
       - Scale performance
       - Keep design complexity constant
     [Images: IBM Cell, Sun Rock, Intel Nehalem]

  3. Looking a Little More Closely
     [Image: Sun Rock]

  4. Reality…
     - Replicated cores
     - Incredibly complicated memory system
     - Large amounts of logic
     - Innovation is in the memory system
     - Transactions, streaming, fast synchronization, security, etc.
     - Never exactly the same
     - Where all the bugs are!

  5. ISA for Memory Systems
     - Can we regularize the memory system hardware?
     - “Program” it rather than “design” it?
     - Benefits:
       - Reduce design time
       - Patch errors
       - Run-time tuning
     - How can we do this?

  6. Shared Memory System
     - Resources:
       - Local memory (data, state bits)
       - Interconnect
       - Controllers
     - Operations:
       - Probing state bits
       - Track requests
       - Communication
       - Data movements (spill / refill)
     [Diagram: two processors (Proc), each with a cache ($) and a cache controller, connected through an interconnect to memory; arrows labeled "miss" and "Msg"]

  7. Streaming Memory System
     - Resources:
       - Local memory
       - Interconnect
       - Controllers
     - Operations:
       - Communication
       - Data movements
       - Track outstanding transfers
     [Diagram: processors (Proc), each with a local memory (Local Mem) and a DMA engine, connected through an interconnect to memory]

  8. Transactional Memory System
     - Resources:
       - Local memory (more state bits)
       - Interconnect
       - Controllers
     - Operations:
       - Data movements
       - State checks / updates
       - Communication
     [Diagram: two processors (Proc), each with a cache ($), an address FIFO (Addr. FIFO) and a commit controller, connected through an interconnect to memory]

  9. Commonalities
     - Same resources and operations
     - Different in:
       - How the operations are sequenced
       - Interpretation of state bits
     - We need:
       - Flexible local storage and interconnect
       - Programmable controllers

  10. Local Memories
      - Programmable memory mat:
        - Data array
        - State bits
        - PLA logic
        - Comparator
      - Accessed by: address, opcode
      - Returns: data, state, compare result
      [Diagram: memory mat with State and Data arrays, a comparator (Cmp) and state Update logic; inputs: Opcode, Address]
      [K. Mai et al., “Architecture and Circuit Techniques for a Reconfigurable Memory Block,” IEEE International Solid-State Circuits Conference, February 2004]
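The mat interface listed above (address and opcode in; data, state, and compare result out) can be sketched as a small software model. This is only an illustration: the `Opcode` values, the `MatResult` record, and letting the caller supply the new state (standing in for the PLA logic) are assumptions, not details from the slides or the cited paper.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Opcode(Enum):
    # Hypothetical opcode set; the slides do not enumerate the real one.
    READ = auto()
    WRITE = auto()
    COMPARE = auto()


@dataclass(frozen=True)
class MatResult:
    data: int    # data word stored at the address after the access
    state: int   # state bits stored at the address after the access
    match: bool  # comparator output: stored word equalled the supplied value


class MemoryMat:
    """Toy model of a memory mat: a data array plus per-entry state bits."""

    def __init__(self, entries: int) -> None:
        self.data = [0] * entries
        self.state = [0] * entries

    def access(self, address: int, opcode: Opcode, value: int = 0,
               new_state: Optional[int] = None) -> MatResult:
        # Comparator: check the stored word against the supplied value
        # before any write takes effect.
        match = self.data[address] == value
        if opcode is Opcode.WRITE:
            self.data[address] = value
        # Stand-in for the PLA: the caller supplies the updated state bits.
        if new_state is not None:
            self.state[address] = new_state
        return MatResult(self.data[address], self.state[address], match)


mat = MemoryMat(entries=4)
mat.access(1, Opcode.WRITE, value=0xAB, new_state=1)
probe = mat.access(1, Opcode.COMPARE, value=0xAB)
```

Using the same mat as a cache tag array or as plain scratchpad storage would then differ only in which opcodes are issued and how the returned state bits are interpreted.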

  11. Programmable Controllers
      - Use an off-the-shelf processor?
        - FLASH, Typhoon, etc.
        - Too slow
        - All the way to the L1 cache interface
      - Our approach:
        - Micro-coded engines (functional units)
        - Each class of operations in a separate engine

  12. Programming
      - A set of subroutines
        - A set of basic operations
        - Executed in a functional unit
        - Each one calls next
      - Link subroutines to each other
      [Diagram: Unit 1, Unit 2 and Unit 3 passing messages (Msg) to one another]
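The idea of subroutines that run in one functional unit and then call the next can be sketched as message passing between units. Everything named here (`FunctionalUnit`, `run_chain`, the `lookup`/`allocate`/`reply` subroutines, and the request fields) is hypothetical and chosen only to mirror the Unit 1 / Unit 2 / Unit 3 picture.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

# (unit name, subroutine label) of the next subroutine to call, or None to stop.
NextCall = Optional[Tuple[str, str]]
Subroutine = Callable[[dict], NextCall]


class FunctionalUnit:
    """Holds the subroutines loaded into one unit's program memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.program: Dict[str, Subroutine] = {}

    def load(self, label: str, routine: Subroutine) -> None:
        self.program[label] = routine


def run_chain(units: Dict[str, FunctionalUnit], first: Tuple[str, str],
              request: dict) -> List[str]:
    """Deliver messages between units until a subroutine returns None."""
    trace: List[str] = []
    pending: Deque[Tuple[str, str]] = deque([first])
    while pending:
        unit_name, label = pending.popleft()
        trace.append(f"{unit_name}.{label}")
        next_call = units[unit_name].program[label](request)
        if next_call is not None:
            pending.append(next_call)
    return trace


# Hypothetical subroutines, one per unit.
def lookup(request: dict) -> NextCall:
    request["hit"] = request["address"] in request["cached"]
    return None if request["hit"] else ("unit2", "allocate")


def allocate(request: dict) -> NextCall:
    request["cached"].add(request["address"])
    return ("unit3", "reply")


def reply(request: dict) -> NextCall:
    request["done"] = True
    return None


units = {name: FunctionalUnit(name) for name in ("unit1", "unit2", "unit3")}
units["unit1"].load("lookup", lookup)
units["unit2"].load("allocate", allocate)
units["unit3"].load("reply", reply)

miss_request = {"address": 0x40, "cached": set()}
miss_trace = run_chain(units, ("unit1", "lookup"), miss_request)
```

Changing the protocol in this model means loading different subroutines or relinking which one each subroutine calls next; the units themselves stay the same.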

  13. Microarchitecture
      - A small pipeline
      - Configuration (“program”) memories
      - Horizontal micro-code
        - Decide what to do
        - Decide how to proceed
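A horizontal micro-word can be pictured as a wide record with one group of fields that says what to do in this step and another that says how to proceed. The sketch below assumes a made-up three-bit action field and a two-way branch; the real field layout is not given in the slides.

```python
from dataclasses import dataclass
from typing import List, Sequence

HALT = -1  # hypothetical "no next micro-word" marker

ACTION_FIELDS = ("access_tags", "access_data", "send_message")


@dataclass(frozen=True)
class MicroWord:
    # "What to do": one enable bit per resource, all applied in the same step.
    access_tags: bool = False
    access_data: bool = False
    send_message: bool = False
    # "How to proceed": next micro-address, selected by a condition input.
    next_if_true: int = HALT
    next_if_false: int = HALT


def run_microprogram(program: Sequence[MicroWord],
                     conditions: Sequence[bool]) -> List[List[str]]:
    """Step through micro-words; conditions[i] steers the branch of step i."""
    pc = 0
    actions: List[List[str]] = []
    for condition in conditions:
        if pc == HALT:
            break
        word = program[pc]
        actions.append([name for name in ACTION_FIELDS if getattr(word, name)])
        pc = word.next_if_true if condition else word.next_if_false
    return actions


# Word 0 probes the tags; a hit goes to word 1 (read data), a miss to word 2.
program = [
    MicroWord(access_tags=True, next_if_true=1, next_if_false=2),
    MicroWord(access_data=True),
    MicroWord(send_message=True),
]

hit_actions = run_microprogram(program, [True, True])
miss_actions = run_microprogram(program, [False, False])
```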

  14. Organization
      [Block diagram: Tracking, State Update and Data Movement units; DMA engines; Line Buffers; MSHR; USHR; Interrupt; Processor Interface (to/from processors); Network Interface (to/from network); connections to/from local storages]

  15. Read Miss Example
      [Block diagram: the organization from the previous slide, annotated with the steps of a read miss; labels on the diagram include Access Tags, Access Data, Read Miss, WB / Evict, Line Read, Spill and Miss]

  16. Programming Complexity
      - Cache coherence
        - Message types received by controller: 6
          - From processor: Cache miss, Upgrade miss, Prefetch
          - From network: Coherence request, Refill, Upgrade
        - Subroutine types in Tracking unit: 11
      - Streaming
        - Message types: 5
          - Direct access, Gather, Scatter, Gather reply, Scatter ack.
        - Subroutine types in Tracking unit: 9
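The message types listed above are few enough to write out in full. The sketch below records them as enumerations; the message names and their sources come from the slide, while the class names and the `from_source` helper are illustrative assumptions.

```python
from enum import Enum
from typing import List


class Source(Enum):
    PROCESSOR = "processor"
    NETWORK = "network"


class CoherenceMessage(Enum):
    # Each member carries its slide label and where the message comes from.
    CACHE_MISS = ("Cache miss", Source.PROCESSOR)
    UPGRADE_MISS = ("Upgrade miss", Source.PROCESSOR)
    PREFETCH = ("Prefetch", Source.PROCESSOR)
    COHERENCE_REQUEST = ("Coherence request", Source.NETWORK)
    REFILL = ("Refill", Source.NETWORK)
    UPGRADE = ("Upgrade", Source.NETWORK)

    def __init__(self, label: str, source: Source) -> None:
        self.label = label
        self.source = source


class StreamingMessage(Enum):
    # The slide does not say which side sends each streaming message.
    DIRECT_ACCESS = "Direct access"
    GATHER = "Gather"
    SCATTER = "Scatter"
    GATHER_REPLY = "Gather reply"
    SCATTER_ACK = "Scatter ack."


def from_source(source: Source) -> List[CoherenceMessage]:
    """Return the coherence messages that arrive from the given source."""
    return [message for message in CoherenceMessage if message.source is source]
```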

  17. Smart Memories
      - 8-core CMP system
      - ST 90nm-GP CMOS technology
      - 5.5 ns cycle time (181 MHz)
      - 2.9M gates, 55M transistors
      [Die photo, labeled 7.77 mm × 7.77 mm]
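The quoted clock frequency follows from the cycle time; the slide's 181 MHz appears to be the truncated value of the result below.

```latex
f = \frac{1}{T_{\text{cycle}}} = \frac{1}{5.5 \times 10^{-9}\ \text{s}} \approx 181.8\ \text{MHz}
```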

  18. Status
      - System bring-up
      - System configuration
      - JTAG tests
      - Coherent shared memory tests
      - Transactional tests (TCC)
      - Streaming tests
      - More testing in progress
      - Planning for a 32-processor system
      [Photo: test chip]

  19. Evaluation
      - Comparison with a hardwired controller
        - But which one? You would claim I am cheating!
      - Compare with an “ideal” controller
        - Assume controller actions occur in zero time
        - Account for external actions
          - Data read/write
          - Message send/receive
        - Gives an upper bound
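The ideal-controller comparison can be illustrated with a small latency model: controller actions cost zero cycles in the ideal case, while external actions (data read/write, message send/receive) keep their cost. The action names and cycle counts below are invented for illustration, and expressing overhead relative to the ideal latency is an assumption about how the comparison is normalized.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Action:
    name: str
    cycles: int
    external: bool  # True for data read/write and message send/receive


def latency(actions: Sequence[Action], ideal: bool = False) -> int:
    """Total cycles; with ideal=True, controller-internal actions take zero time."""
    return sum(a.cycles for a in actions if a.external or not ideal)


def overhead_percent(actions: Sequence[Action]) -> float:
    """Extra time of the real controller, as a percentage of the ideal one."""
    real = latency(actions)
    best = latency(actions, ideal=True)
    return 100.0 * (real - best) / best


# Hypothetical request: cycle counts are made up, not measured.
request = [
    Action("decode request", cycles=1, external=False),
    Action("read tags", cycles=2, external=True),
    Action("select next subroutine", cycles=1, external=False),
    Action("send refill message", cycles=4, external=True),
]

real_cycles = latency(request)
ideal_cycles = latency(request, ideal=True)
```

Because no hardwired controller can finish its internal actions in less than zero time, the ideal latency is a lower bound on any real design, which makes the measured overhead an upper bound.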

  20. Average Read Latency
      [Bar chart: "Average Read Latency - 32 processor system"; y-axis: Cycles (0 to 9); series: Real Controllers, Ideal Controllers; x-axis benchmarks in order: FFT, Barnes, FMM, 179.art, Bitonic Sort, Barnes, MP3D, MPEG2 Enc, MPEG2 Enc; x-axis groups: Coherent Shared Memory, Streaming, Transactions]

  21. Execution Time
      - Total average overhead: 15%
      [Bar chart: "Average Overhead (%)"; y-axis: Overhead (%) (0 to 30); bar values shown: 24.29, 20.03, 14.51, 14.14, 10.64, 8.33, 7.58, 6.93, 1.88; x-axis groups: Coherent Shared Memory, Streaming, Transactions]

  22. Conclusion
      - Strong similarity between memory systems
        - Common resources and operations
      - A framework for memory systems design
        - Generate specific “instances”
      - Modest performance overhead
        - Compared to ideal systems
