Addressing the System-on-a-Chip Interconnect Woes Through Communication-Based Design
M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, J. Rabaey, A. Sangiovanni-Vincentelli
University of California, Berkeley and Princeton University
The SOC Interconnect Challenge
The SOC Interconnect Challenge
[Figure: “Femme se coiffant” (Woman Dressing Her Hair), Pablo Ruiz Picasso, 1940]
The SOC Interconnect Challenge
[Figure: ad-hoc approach: CPU, DSP, DMA, memory controller, MPEG, bridge and I/O blocks tied together by a system bus, a peripheral bus, custom interfaces and control wires]
The SOC Interconnect Challenge
[Figure: the ad-hoc interconnect of CPU, DSP, DMA, memory controller, MPEG, bridge and I/O blocks over a system bus, a peripheral bus, custom interfaces and control wires]
Alternative: a disciplined SOC interconnect design approach that
• addresses the reliability, predictability, performance and power-dissipation concerns caused by deep-submicron effects and design complexity
• exploits advanced communication techniques
The Network-on-a-Chip (NOC) Approach
[Figure: memory sub-system, embedded processors, programmable baseband processing, configurable accelerators and protocol stack attached to a shared interconnect backplane]
Communication-based Design
• Orthogonalizes function and communication
• Builds on well-known models of computation and a correct-by-construction synthesis flow
• Parallels the layered approach exploited by the communications community
How Does the Communication Network World Deal with These Problems?
[Figure: massive cluster, clusters, Gigabit Ethernet]
• Scalable clusters of heterogeneous networks
• Wide range of data units at different levels of abstraction (streams, packets, bits)
• Varying throughput, latency and reliability requirements
Central tenet: a layered approach, standardized as the ISO-OSI Reference Model
The ISO Protocol Stack
[Figure: layers from top to bottom: Presentation/Application, Session, Transport, Network, Data Link, Physical]
• Reference model for wired and wireless protocol design; also a useful guide for the conception and decomposition of NOCs
• The layered approach allows orthogonalization of concerns and decomposition of constraints
• Not all layers of the stack need to be implemented; this depends on application needs and technology
• The layered structure need not be maintained in the final implementation; e.g., multiple layers can be merged during implementation optimization
The ISO Protocol Stack: Physical Layer
Transmits bits over the physical interconnect medium (signal waveform, voltages, timing, synchronization)
Example: synchronous reduced-swing pulse-based signaling
The ISO Protocol Stack: Data Link Layer
Reliable transmission over the physical link plus media access control (MAC): error detection and coding, multiple-access scheme, arbitration
Example: bus
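To make the arbitration part of the MAC concrete, here is a minimal sketch of a round-robin bus arbiter in C. The function name `rr_arbitrate` and its interface are hypothetical; round-robin is one common policy, not necessarily the one any particular SOC bus uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical round-robin arbiter for a shared on-chip bus.
 * request[i] is true if requester i wants the bus; `last` is the
 * requester granted most recently (-1 before the first grant).
 * Returns the requester granted next, or -1 if nobody is requesting. */
int rr_arbitrate(const bool request[], int n, int last)
{
    assert(n > 0 && last >= -1 && last < n);
    /* Start the search just after the previous winner, so every
     * requester is served within n grants (no starvation). */
    for (int offset = 1; offset <= n; offset++) {
        int candidate = (last + offset) % n;
        if (request[candidate])
            return candidate;
    }
    return -1;
}
```

With requesters 0, 2 and 3 all asserting their request, successive calls grant 0, 2, 3 and then wrap back to 0, so priority rotates instead of being fixed.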
The ISO Protocol Stack: Network Layer
Topology-independent end-to-end communication over multiple data links (routing, bridging, repeaters)
Example: the statically configured mesh network of an FPGA
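The slide's example is a statically configured (circuit-like) mesh. For a packet-switched mesh, routing is often done with dimension-order (XY) routing, which is deterministic and deadlock-free on a mesh. The C sketch below shows that swapped-in technique; the names `port_t`, `xy_route` and `xy_hops` are hypothetical and not part of the original design.

```c
#include <assert.h>

/* Output ports of a hypothetical 2-D mesh router. */
typedef enum { PORT_LOCAL, PORT_EAST, PORT_WEST, PORT_NORTH, PORT_SOUTH } port_t;

/* Dimension-order (XY) routing: first correct the X coordinate, then Y.
 * x grows eastward, y grows northward. Returns the port to take from
 * (cur_x, cur_y) toward (dst_x, dst_y). */
port_t xy_route(int cur_x, int cur_y, int dst_x, int dst_y)
{
    assert(cur_x >= 0 && cur_y >= 0 && dst_x >= 0 && dst_y >= 0);
    if (dst_x > cur_x) return PORT_EAST;
    if (dst_x < cur_x) return PORT_WEST;
    if (dst_y > cur_y) return PORT_NORTH;
    if (dst_y < cur_y) return PORT_SOUTH;
    return PORT_LOCAL; /* arrived: deliver to the attached module */
}

/* Walk a packet hop by hop and return the number of links traversed. */
int xy_hops(int sx, int sy, int dx, int dy)
{
    int hops = 0;
    for (;;) {
        port_t p = xy_route(sx, sy, dx, dy);
        if (p == PORT_LOCAL)
            return hops;
        if (p == PORT_EAST)       sx++;
        else if (p == PORT_WEST)  sx--;
        else if (p == PORT_NORTH) sy++;
        else                      sy--;
        hops++;
    }
}
```

Because each router decides only from the destination address, the same code works for any mesh size, which reflects the topology independence the network layer is meant to provide.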
The ISO Protocol Stack: Transport Layer
Establishes and maintains end-to-end communication (flow control, message reordering, packet segmentation and reassembly)
Example: establishing, maintaining and ripping up connections in dynamically reconfigurable SOCs
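Reordering and reassembly can be illustrated with a small receive-side buffer indexed by sequence number. This is a minimal sketch, assuming fixed-size segments that carry a sequence number and a payload length; `segment_t`, `reasm_t` and the `reasm_*` functions, as well as the sizes, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SEG_PAYLOAD 4   /* payload bytes per segment (illustrative) */
#define MAX_SEGS    16  /* largest message, in segments (illustrative) */

/* One segment as delivered by a hypothetical lower layer. */
typedef struct {
    int  seq;                 /* position of this segment in the message */
    int  len;                 /* payload bytes actually used */
    char data[SEG_PAYLOAD];
} segment_t;

/* Reassembly buffer: segments may arrive out of order or duplicated. */
typedef struct {
    int       total;          /* segments in the message */
    int       received;       /* distinct segments seen so far */
    bool      have[MAX_SEGS];
    segment_t slot[MAX_SEGS];
} reasm_t;

void reasm_init(reasm_t *r, int total)
{
    assert(total > 0 && total <= MAX_SEGS);
    memset(r, 0, sizeof *r);
    r->total = total;
}

/* Store a segment in its slot; duplicates are ignored.
 * Returns true once every segment has arrived. */
bool reasm_push(reasm_t *r, const segment_t *s)
{
    assert(s->seq >= 0 && s->seq < r->total);
    assert(s->len >= 0 && s->len <= SEG_PAYLOAD);
    if (!r->have[s->seq]) {
        r->slot[s->seq] = *s;
        r->have[s->seq] = true;
        r->received++;
    }
    return r->received == r->total;
}

/* Copy the complete message into out in sequence order; returns its length. */
int reasm_collect(const reasm_t *r, char *out)
{
    assert(r->received == r->total);
    int n = 0;
    for (int i = 0; i < r->total; i++) {
        memcpy(out + n, r->slot[i].data, (size_t)r->slot[i].len);
        n += r->slot[i].len;
    }
    return n;
}
```

Keying the buffer by sequence number makes reordering and duplicate suppression fall out of the same data structure; flow control (not shown) would bound how many segments the sender may have outstanding.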
The ISO Protocol Stack: Session Layer
Adds state to the end-to-end connection provided by the lower layers of the stack
Example: synchronous messaging, which requires sender and receiver to rendezvous using a semaphore
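A rendezvous of this kind can be sketched with two POSIX semaphores: the sender announces a value and then blocks until the receiver has taken it. This assumes one sender and one receiver per channel and a platform with unnamed POSIX semaphores, such as Linux (compile with `-pthread`); `sync_chan_t`, `chan_send`, `chan_recv` and `rendezvous_demo` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical single-slot synchronous channel (one sender, one receiver). */
typedef struct {
    int   value;
    sem_t full;    /* posted by the sender when value is valid  */
    sem_t taken;   /* posted by the receiver once value is read */
} sync_chan_t;

void chan_init(sync_chan_t *c)
{
    int rc = sem_init(&c->full, 0, 0);
    assert(rc == 0);
    rc = sem_init(&c->taken, 0, 0);
    assert(rc == 0);
    (void)rc;
}

/* Blocks until the receiver has consumed v: sender and receiver rendezvous. */
void chan_send(sync_chan_t *c, int v)
{
    c->value = v;
    sem_post(&c->full);    /* announce the value ...                 */
    sem_wait(&c->taken);   /* ... and wait until the receiver has it */
}

/* Blocks until a sender arrives, then releases it. */
int chan_recv(sync_chan_t *c)
{
    sem_wait(&c->full);
    int v = c->value;
    sem_post(&c->taken);
    return v;
}

static void *producer(void *arg)
{
    sync_chan_t *c = arg;
    for (int i = 1; i <= 5; i++)
        chan_send(c, i * i);
    return NULL;
}

/* Runs a producer thread against the calling thread; returns the sum received. */
int rendezvous_demo(void)
{
    sync_chan_t c;
    chan_init(&c);
    pthread_t t;
    int rc = pthread_create(&t, NULL, producer, &c);
    assert(rc == 0);
    (void)rc;
    int sum = 0;
    for (int i = 0; i < 5; i++)
        sum += chan_recv(&c);
    pthread_join(t, NULL);
    sem_destroy(&c.full);
    sem_destroy(&c.taken);
    return sum;
}
```

The second semaphore is what turns an ordinary buffered send into a synchronous one: the sender cannot proceed past `chan_send` until the receiver has reached the matching `chan_recv`.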
The ISO Protocol Stack: Presentation/Application Layer
Exports the communication architecture to the system and performs data formatting and conversion
Example: changing the byte ordering of data to ensure compatibility
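The byte-ordering example amounts to a word-wise byte swap between a little-endian and a big-endian component. A minimal sketch, assuming 32-bit words; `swap32` and `swap_buffer` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word (little-endian <-> big-endian). */
uint32_t swap32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) <<  8) |
           ((w & 0x00FF0000u) >>  8) |
           ((w & 0xFF000000u) >> 24);
}

/* Convert a buffer of words in place, e.g. before passing data from a
 * little-endian core to a big-endian one (hypothetical scenario). */
void swap_buffer(uint32_t *buf, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        buf[i] = swap32(buf[i]);
}
```

Since the swap is its own inverse, the same routine serves both directions of the conversion.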
Example: The Pleiades Network-on-a-Chip
[Figure: embedded processor, address generators, memories, dedicated arithmetic units, arithmetic processors, an FPGA and a network interface connected by a reconfigurable interconnect network, with a separate configuration bus]
• Programmable/configurable platform intended for low-energy communication and signal-processing applications (wireless, media)
• Allows dynamic task-level reconfiguration of large-granularity modules into dedicated “data-flow” accelerators [Zhang, ISSCC 00]
Maia: Reconfigurable Baseband Processor for Wireless
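A Session-level Perspective
[Figure: the embedded processor executes the program; at the start of the code segment it “configures” the modules (AddrGen, MEM: in, MPY, MPY, ALU, ALU, MEM: phi) and sets up the connections between them]
```c
/* Code segment from the slide (function name and element type assumed);
 * it maps onto AddrGen, MEM: in, MPY, ALU and MEM: phi modules. */
void cov_update(int L, int NP, int NA, double phi[L + 1][L + 1], const double in[])
{
    for (int i = 1; i <= L; i++)
        for (int k = i; k <= L; k++)
            phi[i][k] = phi[i - 1][k - 1]
                      + in[NP - i] * in[NP - k]
                      - in[NA - 1 - i] * in[NA - 1 - k];
}
```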
The Network Layer
[Figure: hierarchical reconfigurable mesh network: clusters, universal switchboxes and hierarchical switchboxes organized into a level-1 mesh and a level-2 mesh]
• The network is statically configured at the start of a session and ripped up at the end
• The structured approach reduces interconnect energy by a factor of 7 compared with a straightforward crossbar
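The configure-at-start, rip-up-at-end discipline can be pictured as a per-switchbox connection table. This is a minimal sketch under stated assumptions: a switchbox with a fixed number of ports where each output is driven by at most one input; `switchbox_t`, `sb_connect`, `sb_rip_up`, `sb_count` and `SB_PORTS` are hypothetical, not taken from the Pleiades implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define SB_PORTS  8     /* ports per switchbox (illustrative) */
#define SB_UNUSED (-1)

/* Hypothetical switchbox: src[out] is the input port driving output `out`. */
typedef struct {
    int src[SB_PORTS];
} switchbox_t;

/* Rip up every connection (end of a session, or initial reset). */
void sb_rip_up(switchbox_t *sb)
{
    for (int p = 0; p < SB_PORTS; p++)
        sb->src[p] = SB_UNUSED;
}

/* Add one static connection at session start. An input may fan out to
 * several outputs, but an output that is already driven is refused,
 * since two drivers on one wire would conflict. */
bool sb_connect(switchbox_t *sb, int in, int out)
{
    assert(in >= 0 && in < SB_PORTS && out >= 0 && out < SB_PORTS);
    if (sb->src[out] != SB_UNUSED)
        return false;
    sb->src[out] = in;
    return true;
}

/* Number of outputs currently driven. */
int sb_count(const switchbox_t *sb)
{
    int n = 0;
    for (int p = 0; p < SB_PORTS; p++)
        if (sb->src[p] != SB_UNUSED)
            n++;
    return n;
}
```

Because the table only changes at session boundaries, no per-word routing decision is needed while the session runs.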
The Physical Layer
[Figure: a co-processor module (µProc, ALU, MPY, SRAM…) connected to the globally asynchronous reconfigurable network by d in/req in/ack in and d out/req out/ack out signals]
• 2-phase self-timed handshaking protocol
• Allows individual modules to trade off performance for energy efficiency dynamically
The Physical Layer
[Figure: globally asynchronous, locally synchronous: a physical layer interface module sits between the globally asynchronous reconfigurable network (d in, req in, ack in, d out, req out, ack out) and a locally synchronous co-processor module (ALU, MPY, SRAM…), exchanging d in, d out, clk and done signals with the module]
The Physical Layer
• Reduced voltage swing on the interconnect reduces energy by a factor of 3.4
[Figure: network signals (d in, req in, d out, req out, ack in, ack out) swing 0.4 V while the co-processor module (ALU, MPY, SRAM…) runs at 1 V; level converters in the physical layer interface module translate between the two]
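A first-order model of dynamic wire energy puts the factor of 3.4 in context. Assume a wire capacitance $C$, a full swing of $V_{DD} = 1\,\text{V}$ and a reduced swing of $V_{\text{sw}} = 0.4\,\text{V}$; the energy drawn from the supply per low-to-high transition is $C\,V_{\text{sw}}\,V_{\text{supply}}$. Variant (a) assumes the driver has its own 0.4 V supply, and variant (b) assumes it swings only 0.4 V while drawing charge from the 1 V supply:

```latex
\begin{aligned}
E_{\text{full}} &= C\,V_{DD}^{2} = C\,(1\,\text{V})^{2} \\
E_{\text{sw}}^{(a)} &= C\,V_{\text{sw}}^{2} = C\,(0.4\,\text{V})^{2}
  &&\Rightarrow\ E_{\text{full}} / E_{\text{sw}}^{(a)} = 6.25 \\
E_{\text{sw}}^{(b)} &= C\,V_{\text{sw}}\,V_{DD} = C\,(0.4\,\text{V})(1\,\text{V})
  &&\Rightarrow\ E_{\text{full}} / E_{\text{sw}}^{(b)} = 2.5
\end{aligned}
```

The reported factor of 3.4 lies between these two idealized bounds; this simple model ignores level-converter and driver overheads.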
Metropolis Design Methodology
• Orthogonalization of concerns: separation of communication and computation
• Formal system representation (supporting multiple Models of Computation)
• Formal methodology for communication refinement: a sequence of adaptation steps between objects (processes and channels) with incompatible behaviors
[Figure: processes P and P are refined step by step into P1’ and P2’, then P1” and P2”, with adapters A inserted between them]
Metropolis Design Methodology
Behavior Adapter: adapts communicating processes with incompatible behaviors
[Figure: P1 produces 3 kb messages; P2 accepts 1 kb messages]
Metropolis Design Methodology
Behavior Adapter: adapts communicating processes with incompatible behaviors
[Figure: a behavior adapter (BA) performing segmentation is inserted between P1 (3 kb messages) and P2 (1 kb messages)]
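A segmentation adapter of this kind can be sketched in C as a function that splits each message from P1 into pieces no larger than P2 accepts and hands them over in order. Lengths are counted in bytes purely for illustration; `segment_adapter`, `consumer_fn`, `p2_stats_t` and `p2_accept` are hypothetical names, with `p2_accept` standing in for P2.

```c
#include <assert.h>
#include <stddef.h>

/* Callback through which the adapter hands one chunk to the consumer (P2). */
typedef void (*consumer_fn)(const unsigned char *chunk, size_t len, void *ctx);

/* Split msg into chunks of at most max_chunk units and deliver them in order.
 * Returns the number of chunks delivered. */
size_t segment_adapter(const unsigned char *msg, size_t len, size_t max_chunk,
                       consumer_fn deliver, void *ctx)
{
    assert(max_chunk > 0);
    size_t chunks = 0;
    for (size_t off = 0; off < len; off += max_chunk) {
        size_t n = (len - off < max_chunk) ? len - off : max_chunk;
        deliver(msg + off, n, ctx);
        chunks++;
    }
    return chunks;
}

/* Stand-in for P2: records total units accepted and the largest chunk seen. */
typedef struct {
    size_t bytes;
    size_t largest;
} p2_stats_t;

void p2_accept(const unsigned char *chunk, size_t len, void *ctx)
{
    p2_stats_t *s = ctx;
    (void)chunk;               /* a real consumer would process the data */
    s->bytes += len;
    if (len > s->largest)
        s->largest = len;
}
```

Neither process changes: P1 still emits whole messages and P2 still sees only chunks it can accept, which is the point of inserting an adapter rather than rewriting either side.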
Metropolis Design Methodology
Behavior Adapter: adapts communicating processes with incompatible behaviors
[Figure: after adaptation, P1’ and P2’ exchange 1 kb messages]
Metropolis Design Methodology
Channel Selection: select a (non-ideal) channel that physically transports messages
[Figure: P1’ and P2’ communicate over a wireless channel (BER = 10^-3); Globally Asynchronous Locally Synchronous model]
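A quick calculation shows why a non-ideal channel matters. Assuming independent bit errors with probability $p$, no error coding, and 1 kb taken as 1000 bits, the probability that an $n$-bit message arrives with at least one error is:

```latex
\begin{aligned}
P_{\text{err}}(n) &= 1 - (1 - p)^{n} \\
P_{\text{err}}(1000) &= 1 - (1 - 10^{-3})^{1000} \approx 1 - e^{-1} \approx 0.63 \\
P_{\text{err}}(3000) &= 1 - (1 - 10^{-3})^{3000} \approx 1 - e^{-3} \approx 0.95
\end{aligned}
```

Under these assumptions most 1 kb segments would be corrupted over the BER = 10^-3 channel, so a further refinement step would need error detection and coding of the kind listed for the data link layer.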