IEE5008 Autumn 2012 Memory Systems PIPELINED SRAM Pranav Arya


  1. IEE5008 – Autumn 2012 Memory Systems PIPELINED SRAM Pranav Arya EECS Int’l Graduate Program National Chiao Tung University pranav_arya7@yahoo.co.in Pranav Arya 2012

  2. Outline: Introduction; Cache organization; Cache implementation; Pipelined SRAM; Wave pipeline cache; Pipelined-burst cache; Conclusion; Reference

  3. Introduction: Processors keep getting faster (parallelism: ILP and thread-level parallelism; multicore architectures). Memory has not kept pace (more varieties, but relatively less speed improvement).

  4. Introduction (contd.): Development trends, processor vs. memory [1]. Figure 1: Comparison of performance improvement of processors and memory over the years [1]

  5. Introduction (contd.): Cache is the memory nearest to the processing unit and the fastest; it relies on the principle of locality of reference. Cache parameters: organization, content management, consistency management.

  6. Introduction: Cache Organization. Three ways to organize a cache: direct mapped, fully associative, set associative. Figure 2: Various cache organizations [1]
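
The three organizations differ mainly in how many sets the cache lines are grouped into. A minimal sketch of the address split, assuming power-of-two sizes; the function name `split_address` and the 1 KiB / 16-byte geometry are illustrative assumptions, not taken from the slides:

```python
def split_address(addr, cache_bytes, line_bytes, ways):
    """Split a byte address into (tag, set_index, offset).

    ways == 1 models a direct-mapped cache; ways == total number of lines
    models a fully associative cache (one set, so the index is always 0).
    All sizes are assumed to be powers of two.
    """
    num_lines = cache_bytes // line_bytes
    num_sets = num_lines // ways
    offset = addr % line_bytes      # byte within the cache line
    block = addr // line_bytes      # memory block number
    set_index = block % num_sets    # which set the block maps to
    tag = block // num_sets         # remaining bits identify the block
    return tag, set_index, offset


# Hypothetical 1 KiB cache with 16-byte lines (64 lines in total).
addr = 0x1234
direct = split_address(addr, 1024, 16, ways=1)    # 64 sets of 1 line
two_way = split_address(addr, 1024, 16, ways=2)   # 32 sets of 2 lines
fully = split_address(addr, 1024, 16, ways=64)    # 1 set of 64 lines
print(direct, two_way, fully)
```

As associativity grows, index bits move into the tag: the same address yields a longer tag and a shorter index, until the fully associative case has no index at all.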

  7. Cache Organization (contd.): Multilevel cache hierarchy (L1, L2 and L3). L1: on chip. L2: off chip. L3: off chip; shared. Figure 3: Cache hierarchy. Figure 4: 8-core Nehalem processor and its cache hierarchy [7]

  8. Cache Implementation: Organization; Physical implementation; Control and timing

  9. Physical Implementation. Figure 5: Different implementations of SRAM cell [1]

  10. Control and Timing: SRAM control signals and timing signals; timing operations. Two types based on timing: asynchronous SRAM and synchronous SRAM.

  11. Asynchronous Operation: ATD (address transition detection) based operation. Figure 6: 2-way set associative asynchronous SRAM and its timing diagram [1]

  12. Synchronous SRAM: Clock-based operation: completely synchronized (single clock) or partially synchronized (two clocks). Pipelined operation: wave pipeline mode or pipelined-burst mode.

  13. Synchronous SRAM: Wave Pipeline Cache. Early implementation model; high capacity, high speed. Operation based on clock signal: internal clock (for internal circuitry); external clock (for addressing); combined (external clock for addressing, internal clock for the SRAM core).

  14. Wave Pipeline Cache: Example 1. Fully pipelined 512-kb SRAM, 2-ns cycle time. Figure 7: Block diagram of 512-kb CMOS pipelined SRAM [2]

  15. Wave Pipeline: Example 1 (contd.). 8-stage pipeline synchronized to a clock signal. Figure 8: Pipelined operation for the 512-kb SRAM [2]
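
To see why a short cycle time matters more than access time for back-to-back reads, here is a rough sketch under an idealized model: the first result appears after one access time and each later result follows one cycle after the previous one. The 2 ns cycle is from the slide and the 3.8 ns access time is from the title of [2]; the function name and the count of 100 reads are assumptions for illustration.

```python
def total_time_ps(num_reads, access_ps, cycle_ps, pipelined):
    """Time in picoseconds to complete num_reads back-to-back reads."""
    if num_reads == 0:
        return 0
    if pipelined:
        # First result after one access time, then one result per cycle.
        return access_ps + (num_reads - 1) * cycle_ps
    # Without pipelining, each read finishes before the next one starts.
    return num_reads * access_ps


ACCESS_PS = 3800  # 3.8 ns access time, per the title of [2]
CYCLE_PS = 2000   # 2 ns cycle time, per the slide

pipelined = total_time_ps(100, ACCESS_PS, CYCLE_PS, pipelined=True)
unpipelined = total_time_ps(100, ACCESS_PS, CYCLE_PS, pipelined=False)
print(pipelined, unpipelined)  # 201800 380000
```

Under this model the latency of a single read is unchanged; the gain comes entirely from overlapping successive accesses.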

  16. Wave Pipeline: Example 2. Need for the SRAM to connect directly to a high-frequency CPU bus line. Two-stage wave pipeline: first stage, clock-triggered asynchronous SRAM core operation; second stage, clock-triggered synchronous data output.

  17. Wave Pipeline: Example 2 (contd.). Figure 9: Block diagram of a 16 Mb synchronous SRAM and its wave pipeline operation [3]

  18. Wave Pipeline: Improvements. Issue with early designs: synchronizing the system clock with the output data at high frequency, where successive data waves can overlap. Reason: sensitivity of access time to variations in voltage, temperature and process. Solution to the synchronization issue: dual sensing latches.
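
The overlap problem can be illustrated with the usual simplified wave-pipelining constraint: the cycle time must exceed the spread between the slowest and fastest access delays, plus some latching margin. This is a textbook-style sketch, not the analysis from [4]; the function names and all delay values below are hypothetical.

```python
import math


def min_wave_cycle_ps(t_max_ps, t_min_ps, margin_ps=0):
    """Smallest cycle time that keeps consecutive data waves apart.

    A new wave launched one cycle later must not catch up with the
    slowest part of the previous wave before it has been latched.
    """
    return (t_max_ps - t_min_ps) + margin_ps


def waves_in_flight(t_max_ps, cycle_ps):
    """Number of data waves travelling through the SRAM at once."""
    return math.ceil(t_max_ps / cycle_ps)


# Hypothetical delays: slowest path 5.0 ns, fastest 3.5 ns, 0.5 ns margin.
nominal = min_wave_cycle_ps(5000, 3500, margin_ps=500)
# If voltage, temperature or process variation widens the spread to 2.0 ns:
degraded = min_wave_cycle_ps(5000, 3000, margin_ps=500)
print(nominal, degraded, waves_in_flight(5000, nominal))
```

The achievable cycle time depends on the delay spread rather than on the absolute access time, which is why sensitivity to voltage, temperature and process directly limits the clock frequency.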

  19. Wave Pipeline: Example 3, Sensing Latches. Figure 10: Dual-sensing scheme [4]. Figure 11: Dual-sensing latch circuit diagram [4]

  20. Example 3, Sensing Latches (contd.). Use of two clocking signals: Clock for addressing and driving the internal circuit; Clock’ to mux and latch out the data. Figure 12: Data wave diagram after latching [4]. Figure 13: Dependence of cycle time and maximum access time [4]

  21. Pipelined-Burst SRAM: Used in most modern SRAM architectures; burst-mode read and write operations; X-1-1-1 operation (the first access of a burst takes X cycles, each of the next three takes one cycle).

  22. Pipelined-Burst SRAM: Example 1. Features: 4-1-1-1 pipelined-burst scheme; burst read of four 32-bit words; data prefetched for write operation. Figure 14: Synchronous pipelined-burst SRAM block diagram [5]
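
The benefit of a 4-1-1-1 burst can be worked out with simple cycle counting. The sketch below assumes the first word of a burst costs 4 cycles and each following word 1 cycle, with 32-bit (4-byte) words as on the slide; the function names are illustrative.

```python
def burst_cycles(first_access_cycles, burst_length):
    """Cycles for one burst: full latency once, then one word per cycle."""
    return first_access_cycles + (burst_length - 1)


def bytes_per_cycle(first_access_cycles, burst_length, word_bytes):
    """Average transfer rate over one burst, in bytes per clock cycle."""
    cycles = burst_cycles(first_access_cycles, burst_length)
    return burst_length * word_bytes / cycles


burst = burst_cycles(4, 4)        # 4-1-1-1: 4 + 1 + 1 + 1 cycles
separate = 4 * burst_cycles(4, 1) # four independent 4-cycle accesses
rate = bytes_per_cycle(4, 4, word_bytes=4)
print(burst, separate, round(rate, 2))
```

Four words arrive in 7 cycles instead of 16, so the initial latency is paid once per burst rather than once per word.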

  23. Example 1 (contd.). Read and write in bursts; idle cycles in RAW (read-after-write) and WAR (write-after-read) conditions. Figure 15: Timing diagram for the pipelined-burst SRAM shown in Figure 14 [5]
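
The cost of idle cycles when the access direction changes can be sketched with a simple counter. This is only an illustrative model: the 4-cycle burst slot and the one-cycle turnaround penalty are assumed values, not figures taken from [5].

```python
def count_cycles(ops, burst_cycles=4, turnaround_idle=1):
    """Total cycles for a string of 'R' and 'W' bursts issued back to back.

    An idle penalty is added whenever a read follows a write or a write
    follows a read (hypothetical penalty of turnaround_idle cycles).
    """
    total = 0
    previous = None
    for op in ops:
        if previous is not None and op != previous:
            total += turnaround_idle  # direction change: bus sits idle
        total += burst_cycles
        previous = op
    return total


same_direction = count_cycles("RRRR")  # no direction changes
grouped = count_cycles("RRWW")         # one direction change
alternating = count_cycles("RWRW")     # three direction changes
print(same_direction, grouped, alternating)  # 16 17 19
```

Under these assumptions, alternating reads and writes lose the most cycles, which is the kind of overhead a write-buffering scheme aims to hide.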

  24. Example 2: Some Improvements in Design. Added double-late address-data buffers (DLWBs). Figure 16: Synchronous pipelined-burst SRAM block diagram using DLWBs [5]

  25. Example 2 (contd.). Figure 17: Synchronous pipelined-burst SRAM block diagram using DLWBs [5]

  26. Example 2 (contd.). Figure 18: Timing diagram for pipelined-burst SRAM using DLWBs [5]

  27. Conclusion: SRAM performance improvements are achievable through pipelining, and various pipelining schemes are available. Wave pipelining shows variable performance because of clock synchronization issues. Pipelined-burst SRAM performs better because data reads and writes occur in bursts, giving faster data operations on SRAM blocks.

  28. Reference
      [1] B. Jacob, S. W. Ng, D. T. Wang. Memory Systems: Cache, DRAM, Disk.
      [2] Terry I. Chappell, Barbara A. Chappell, Stanley E. Schuster, James W. Allan, Stephen P. Klepner, Rajiv V. Joshi, and Robert L. Franch. A 2-ns Cycle, 3.8-ns Access 512-kb CMOS ECL SRAM with a Fully Pipelined Architecture. IEEE Journal of Solid-State Circuits, Vol. 26, No. 11, November 1991.
      [3] Kazuyuki Nakamura, Shigeru Kuhara, Tohru Kimura, Masahide Takada, Hisamitsu Suzuki, Hiroshi Yoshida, and Tohru Yamazaki. A 220-MHz Pipelined 16-Mb BiCMOS SRAM with PLL Proportional Self-Timing Generator. IEEE Journal of Solid-State Circuits, Vol. 29, No. 11, November 1994.
      [4] Suguru Tachibana, Hisayuki Higuchi, Koichi Takasugi, Katsuro Sasaki, Toshiaki Yamanaka, and Yoshinobu Nakagome. A 2.6-ns Wave-Pipelined CMOS SRAM with Dual-Sensing-Latch Circuits. IEEE Journal of Solid-State Circuits, Vol. 30, No. 4, April 1995.
      [5] Kazuyuki Nakamura, Koichi Takeda, Hideo Toyoshima, Kenji Noda, Hiroaki Ohkubo, Tetsuya Uchida, Toshiyuki Shimizu, Toshiro Itani, Ken Tokashiki, and Koji Kishimoto. A 500-MHz 4-Mb CMOS Pipeline-Burst Cache SRAM with Point-to-Point Noise Reduction Coding I/O. IEEE Journal of Solid-State Circuits, Vol. 32, No. 11, November 1997.
      [6] Cangsang Zhao, Uddalak Bhattacharya, Martin Denham, Jim Kolousek, Yi Lu, Yong-Gee Ng, Novat Nintunze, Kamal Sarkez, and Hemmige D. Varadarajan. An 18-Mb, 12.3-GB/s CMOS Pipeline-Burst Cache SRAM with 1.54 Gb/s/pin. IEEE Journal of Solid-State Circuits, Vol. 34, No. 11, November 1999.
      [7] D. Molka, D. Hackenberg, R. Schöne, and M. S. Müller. Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System. 18th International Conference on Parallel Architectures and Compilation Techniques, September 2009.

  29. THANK YOU
