IEE5008 Autumn 2012 Memory Systems: 3D Stacking SRAM

IEE5008 – Autumn 2012 Memory Systems
3D Stacking SRAM
Anwar, Hossameldin
Department of Electronics Engineering, National Chiao Tung University
Eng_hossam123@yahoo.com

Outline  Introduction  3D Technology Process  Physical Characteristics of d2d Vias  Planar SRAM Components  Planar SRAM design Techniques  3D implementations of Banked SRAM Arrays  3D implementations of Multiported SRAM arrays  Bank and Array-Stacked 3D SRAM Benefits  Multiported 3D SRAM Benefits  Conclusion  References Anwar,Hossameldin 2

Introduction  The semiconductor industry faces number of challenges. 1 .Poor Scaling of RC delays. 2 .Power Consumption. 3 .Manufacturing challenges.  3D integration has the potential to address these challenges.  3D integration can reap the advances in traditional planar processes such as d ouble-gate transistors, T ri-gate transistors, f inFETs, s trained Silicon and m etal gates. Anwar,Hossameldin 3

• 3D fabrication involves stacking two or more die connected by high-density, low-latency die-to-die (d2d) vias.
• The increased density and the ability to place and route in 3D provide new opportunities for microarchitecture design.
• The dense d2d vias allow 3D SRAM components to be partitioned at the level of individual wordlines or bitlines.
• The benefits are:
  1. Shorter wires within the SRAM arrays, which reduces latency and energy simultaneously.
  2. A smaller area footprint, which reduces the wire required for global routing.

3D Technology Process
• Several methods have been proposed for 3D integration, such as:
  • Multilayer buried structures (MLBS)
  • Die bonding

Multilayer buried structures (MLBS)
Structure
• Multiple device layers are fabricated sequentially in a stacked fashion.
• Layer-to-layer connections are made with interlayer vias or with direct source-drain/drain-source contacts.
• Local polysilicon wires are used for connections.
Advantage
• Vertical 3D vias can potentially scale down with feature size.

Die bonding
Structure
• Uses conventional planar fabrication processes and metal vias to bond planar die vertically.
• Vias are deposited on the top metal layers of each of the two die and/or etched through the backside of the die; the two die are then aligned and bonded together.

There are several organizations for bonding multiple die:
• Face-to-Face (F2F) bonding.
• Face-to-Back (F2B) bonding.
• Back-to-Back (B2B) bonding.

Physical Characteristics of d2d Vias
• Thinning the die reduces the distance that a d2d via must cross to connect the two die.
• A d2d via is much smaller than a planar interconnect, so it has lower resistance and capacitance.
• As a result, the signal propagation delay between the two die is reduced.

Planar SRAM Components
Caches
Basic design parameters
• Cache size.
• Block size.
• Associativity.
Features
• Large capacity.
  1. Caches are organized as banks to increase bandwidth and decrease power consumption.
  2. Caches are subbanked to save power by sharing sense amplifier circuitry among subbanks.
• Require both tag and data arrays.

Register files
Features
• Lower capacity.
• Do not have a tag array.
• Consist of a regular array of 6T SRAM cells.
• Typically multiported, with multiple read ports and multiple write ports to satisfy the bandwidth required for data processing.

Features of planar SRAM array-based components
• Consist of a regular array of memory cells.
• Easy to partition across multiple die.
• An SRAM array is viewed as a set of wordlines (horizontal) and a set of bitlines (vertical).
• The row decoder drives the wordlines, which control the access transistors of the data storage cells.
• The bitlines are read by sense amplifiers at the bottom of the array.

Planar SRAM Design Techniques
• These techniques are used to increase the performance and reduce the power consumption of SRAM arrays:
  • Memory Banking Technique
  • Memory Subbanking Technique
  • Hierarchical Wordline Technique

Memory Banking Technique
Power saving
• Divides the memory array into multiple modules (banks).
• Only the bank that contains the required data is accessed.
Bandwidth enhancement
• If the requested data values are located in different banks, they can be read out of multiple banks simultaneously.
• This mimics the effect of a multiported memory array.
• However, if multiple addresses target the same bank, a bank conflict occurs.
• A buffer mechanism is therefore needed to store and reissue the conflicting requests, so that the target bank provides the requested data values in later clock cycles.
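The buffer-and-reissue behavior for bank conflicts can be sketched as a toy cycle-level model. This is only an illustration under stated assumptions, not the design from the slides: the class name `BankedMemory`, the use of low-order bits to pick the bank, and the FIFO retry order are all hypothetical choices.

```python
from collections import deque


class BankedMemory:
    """Toy model: each bank can serve at most one request per cycle."""

    def __init__(self, num_banks):
        self.num_banks = num_banks
        self.pending = deque()  # requests deferred by a bank conflict

    def bank_of(self, addr):
        # Assumption: the low-order address bits select the bank.
        return addr % self.num_banks

    def cycle(self, new_addrs):
        """Issue buffered requests first, then new ones.

        Returns the list of addresses served in this cycle.
        """
        busy = set()
        served = []
        deferred = deque()
        for addr in list(self.pending) + list(new_addrs):
            bank = self.bank_of(addr)
            if bank in busy:
                deferred.append(addr)  # conflict: reissue in a later cycle
            else:
                busy.add(bank)
                served.append(addr)
        self.pending = deferred
        return served


mem = BankedMemory(num_banks=2)
first = mem.cycle([0, 1])   # different banks: both served in one cycle
second = mem.cycle([2, 4])  # same bank: one request is buffered
third = mem.cycle([])       # the buffered request is reissued here
```

In this sketch the two requests to different banks complete together, while the conflicting pair takes two cycles, which is the cost the buffer mechanism hides from the requester.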

Example  Higher order interleaving technique • Divides the memory array into banks based on the higher order address bits. • If the array contains 2^N locations, • One bank contains addresses from 0 to (2^(N-1))-1. • The other bank contains addresses from 2^(N-1) to (2^N)-1.  Lower order interleaving technique • Uses the lower order address bits to identify the banks(odd and even addresses). • If the requested data is located into only one bank, no need to access other banks. • So, it does not consume dynamic power. Anwar,Hossameldin 15

Memory Subbanking Technique
Features
• A cache block is divided into a number of subbanks.
• The required word is chosen using the offset bits in the address.
• The subbank selector selects among the subbanks and feeds the data from only one subbank into the sense amplifier circuitry.
• A common set of sense amplifiers can therefore be shared across the subbanks.
• Data are read out from only one subbank at a time, which cuts down on cache power.
• Bitline precharge power is also saved, because only the selected subbank needs to be precharged.

Hierarchical Wordline Technique (HWL)
Problems
• Wordlines are heavily loaded by the access transistors (two per SRAM cell) across the whole row of SRAM cells.
• Wordlines contribute to the overall delay of an SRAM access.
HWL structure (solution)
• Uses global wordlines (GWL) to drive multiple shorter subwordlines.
• The decoder output is used as the global wordline.
• The wordline loading and the latency of driving the wordlines are therefore reduced.
Disadvantage
• Worsens the wiring complexity: the wiring requirement of the wordlines is doubled.
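The loading argument can be made concrete by counting transistor gates per wire. This is a rough sketch, not a circuit model: it assumes each subwordline has one local driver whose input is the only load on the global wordline, and the row width of 256 cells and the 8 segments are hypothetical numbers.

```python
def flat_wordline_load(cells_per_row):
    """Access-transistor gates on a single flat wordline (two per cell)."""
    return 2 * cells_per_row


def hierarchical_loads(cells_per_row, segments):
    """Return (loads on the global wordline, loads on each subwordline)."""
    if cells_per_row % segments:
        raise ValueError("segments must divide cells_per_row")
    global_load = segments  # assumption: one local driver input per segment
    local_load = 2 * (cells_per_row // segments)
    return global_load, local_load


flat = flat_wordline_load(256)
glob, local = hierarchical_loads(256, segments=8)
```

Under these assumptions the flat wordline drives 512 gates, while the hierarchical version drives 8 gates on the global wordline and 64 on the selected subwordline. The price is the extra layer of wordline wiring noted above.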

3D Implementations of Banked SRAM Arrays
• One option for 3D-integrated SRAM array design is to stack banks on top of each other.
• Another option is to split the arrays across multiple layers.
• Long metal wires are used to route global signals in banked SRAM arrays.
The two approaches are:
• 3D Bank Stacking
• 3D Array Splitting

3D Bank Stacking
There are two possible orientations for bank stacking:
• Left-to-Right Stacking
• Top-to-Down Stacking
Notes
• X is the bank width and Y is the bank height.
• Assuming that X = Y:
  • The horizontal component of the wiring to and from the banks is reduced by 67%.
  • The vertical component of the bank wiring is unaffected.
• The reduction in wire length translates into a reduction in power and delay.

3D Array Splitting
Features
• Partitions individual rows and columns of the SRAM arrays within a bank and stacks them upon themselves.
• Can reduce the length of either the wordlines or the bitlines, depending on the orientation of the split.

The First Array-Split Configuration
• Stacks columns on columns.
• Each single long wordline is replaced by a pair of shorter parallel wordlines.
• The decoder must drive the wordlines on both die, which requires one d2d via per wordline.
• At the bottom of the array, the column select multiplexers are split across the two die, which requires additional d2d vias.
• Latency and power are reduced because the wordlines are shorter.

The Second Array-Split Configuration
• Stacks rows on rows.
• The row decoder must be partitioned across the two die.
• The 1-to-n decoder is decomposed into a 1-to-2 decoder and two 1-to-n/2 decoders.
• The two 1-to-n/2 decoders are stacked on top of each other.
• The 1-to-2 decoder activates only one of the two stacked decoders at a time, which avoids stacking thermally active components.
• The length of the bitlines is reduced by half.
• Latency and power are reduced because wires are shorter at both the array and bank levels.
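The decoder decomposition can be checked with a small behavioral model. The sketch below is an assumption-laden illustration: it takes n to be even, uses the top address bit as the 1-to-2 decoder input, and places the lower half of the rows on one die and the upper half on the other. The function names are hypothetical.

```python
def flat_decode(addr, n):
    """1-to-n decoder: one-hot list with only the addressed row set."""
    return [int(i == addr) for i in range(n)]


def split_decode(addr, n):
    """1-to-2 decoder picks a die; only that die's 1-to-n/2 decoder fires.

    Returns a pair of one-hot lists, one per die.
    """
    half = n // 2
    die = addr // half   # 1-to-2 decoder output (top address bit)
    row = addr % half    # row index within the selected die
    dies = [[0] * half, [0] * half]
    dies[die][row] = 1   # the unselected die stays all zero (idle)
    return dies


n = 8
equivalent = all(
    split_decode(a, n)[0] + split_decode(a, n)[1] == flat_decode(a, n)
    for a in range(n)
)
```

For every address, concatenating the outputs of the two half-size decoders reproduces the output of the flat 1-to-n decoder, and exactly one of the two stacked decoders has an active output.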

3D Implementation of Multiported SRAM Arrays
• There are several possible designs for multiported SRAM arrays in 3D integration technology:
  • Register Partitioning (RP)
  • Bit Partitioning (BP)
  • Port Splitting (PS)
