System-on-Chip Communication Architecture
Dr.-Ing. Mohammad Abdullah Al Faruque
Chair for Embedded Systems (CES), Karlsruhe Institute of Technology
http://ces.univ-karlsruhe.de/
Columns of Embedded System Design
1. Embedded processor architectures
   - General-purpose computer architectures are rarely appropriate for embedded systems (ES): they offer a fair compromise between many constraints, but they cannot be adapted to the specific needs of ES.
2. Electronic System-Level (ESL) design methodologies
   - The rising complexity of systems-on-chip (SoC) requires design methodologies at a higher level of abstraction.
   - The large design space has to be explored efficiently.
3. Embedded software
   - Software engineering: MDA (Model-Driven Architecture), ...
4. Technology of integrated circuits
   - New technologies offer new possibilities for ES design.
   - Example: reconfigurable computing, enabled by advances in FPGA technology.
Chair for Embedded Systems Mohammad Abdullah Al Faruque WS09/10
Architectures: General-purpose processor, ASIP, ASIC
[Figure: "efficiency" ($/MIPS, mW/MHz, MIPS/area, ...) plotted against flexibility (1/time-to-market, ...)]
- ASIC ("hardware solution"): non-programmable, highly specialized
- ASIP (extensible processor): instruction extension/definition, parameterization, inclusion/non-inclusion of functionality/devices
- General-purpose processor ("software solution")
Trends: Crisis of Complexity
[Figure: available gates vs. used gates in millions of gates, 1990 to 2006; the widening difference between the two curves is the design productivity gap. Source: Gartner/Dataquest]
- The prediction holds for the case that no ESL tools are used.
- However: the red curve will apply and lead to SoCs with 100s to 1000s of PEs per chip.
Next Generation Handheld Devices
- Example scenario: file download, phone, TV channel, games, sensor nodes, navigation, etc. are active when a video call comes in.
- The download and the TV continue when the incoming call is accepted.
- This requires huge computational power and application concurrency, under varying requirements.
- MPSoC: delivers the computational power by exploiting application parallelism.
Maya (Rabaey’00)
The Cell Processor
- Clock frequency: > 4 GHz
- Memory bandwidth: 25.6 GBytes per second
- I/O bandwidth: 76.8 GBytes per second
- Performance at 4 GHz:
  - 256 GFLOPS (single precision)
  - 256 GOPS (integer)
  - 25 GFLOPS (double precision)
- Die area: 235 square mm
- 235 million transistors
- Power consumption: estimated at 60 - 80 W @ 4 GHz
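As a rough plausibility check of the single-precision figure, a peak rate can be decomposed into cores x SIMD lanes x operations per lane per cycle x clock frequency. The sketch below assumes 8 synergistic processing elements, 4 single-precision lanes each, and a fused multiply-add counted as 2 operations. None of these three numbers is stated on the slide, and the function name is purely illustrative.

```python
def peak_gflops(cores: int, simd_lanes: int,
                flops_per_lane_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOPS: every lane of every core busy in every cycle."""
    return cores * simd_lanes * flops_per_lane_per_cycle * clock_ghz


# Assumed decomposition (not given on the slide): 8 cores, 4 lanes, 2 flops (FMA).
cell_sp_peak = peak_gflops(cores=8, simd_lanes=4,
                           flops_per_lane_per_cycle=2, clock_ghz=4.0)
print(cell_sp_peak)  # 256.0
```

Under these assumptions the product matches the 256 GFLOPS quoted above. It is an upper bound, not a sustained rate.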
Cell’s Element Interconnect Bus
From the trenches: D. Krolak, IBM
“Well, in the beginning, early in the development process, several people were pushing for a crossbar switch, and the way the bus is architected, you could actually pull out the EIB and put in a crossbar switch if you were willing to devote more silicon space on the chip to wiring. We had to find a balance between connectivity and area, and there just wasn't enough room to put a full crossbar switch in. So we came up with this ring structure which we think is very interesting. It fits within the area constraints and still has very impressive bandwidth.”
Cell’s Element Interconnect Bus
- 4 rings (2 clockwise + 2 counter-clockwise)
- Not token rings: access is still granted through request/grant arbitration
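The combination of directional rings and request/grant arbitration can be sketched as follows. This is a simplified model, not IBM's arbiter: it assumes a hypothetical node count of 12, at most one transfer per ring at a time, and a policy that only ever uses the shorter direction. The class and method names are illustrative.

```python
from typing import Optional, Tuple


class RingArbiter:
    """Toy request/grant arbiter for a set of directional rings."""

    def __init__(self, n_nodes: int, rings_per_direction: int = 2) -> None:
        self.n_nodes = n_nodes
        # Number of currently free rings per direction.
        self.free = {"cw": rings_per_direction, "ccw": rings_per_direction}

    def shortest_direction(self, src: int, dst: int) -> Tuple[str, int]:
        """Return the direction with fewer hops and its hop count."""
        cw = (dst - src) % self.n_nodes
        ccw = (src - dst) % self.n_nodes
        return ("cw", cw) if cw <= ccw else ("ccw", ccw)

    def request(self, src: int, dst: int) -> Optional[str]:
        """Grant a ring in the shorter direction, or None if all are busy."""
        direction, _ = self.shortest_direction(src, dst)
        if self.free[direction] > 0:
            self.free[direction] -= 1
            return direction
        return None  # requester has to retry later

    def release(self, direction: str) -> None:
        """Return a ring to the pool once the transfer has finished."""
        self.free[direction] += 1


arbiter = RingArbiter(n_nodes=12)
grants = [arbiter.request(0, 3), arbiter.request(1, 4), arbiter.request(2, 5)]
print(grants)  # ['cw', 'cw', None]
```

The third request is refused because both clockwise rings are already granted. After a `release("cw")` it would succeed.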
Very long wires
[Figure: two points A and B on a chip, shown for 2005 and 2010]
- Year 2005: clock period 1 ns (1 GHz)
- Year 2010: clock period 0.1 ns (10 GHz)
Bus pros and cons
Cons:
- Every unit attached adds parasitic capacitance, therefore electrical performance degrades with growth.
- Bus timing is difficult in a deep submicron process.
- Bus arbiter delay grows with the number of masters. The arbiter is also instance-specific.
- Bandwidth is limited and shared by all units attached.
Pros:
- Bus latency is zero once the arbiter has granted control.
- The silicon cost of a bus is near zero.
- Any bus is almost directly compatible with most available IPs, including software running on CPUs.
- The concepts are simple and well understood.
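The point that bus bandwidth is shared by all attached units can be illustrated with a minimal arbitration sketch. Round-robin arbitration and masters that request in every cycle are assumptions made for the example; the slide does not name an arbitration policy, and the function names are illustrative.

```python
from typing import List


def round_robin_grants(n_masters: int, n_cycles: int) -> List[int]:
    """Count bus grants per master when every master requests in every cycle."""
    grants = [0] * n_masters
    next_master = 0
    for _ in range(n_cycles):
        grants[next_master] += 1          # exactly one master owns the bus per cycle
        next_master = (next_master + 1) % n_masters
    return grants


def per_master_bandwidth(bus_bandwidth: float, n_masters: int) -> float:
    """Fair share of a single shared bus under saturation."""
    return bus_bandwidth / n_masters


print(round_robin_grants(n_masters=4, n_cycles=100))  # [25, 25, 25, 25]
print(per_master_bandwidth(800.0, 4))                 # 200.0
```

Doubling the number of masters halves each master's share, since the total stays fixed.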
What are NoCs?
According to Wikipedia:
“Network-on-a-chip (NoC) is a new paradigm for System-on-Chip (SoC) design. NoC based-systems accommodate multiple asynchronous clocking that many of today's complex SoC designs use. The NoC solution brings a networking method to on-chip communications and claims roughly a threefold performance increase over conventional bus systems.”
Imprecise…
Main differences between NoC and bus
Bus-based system [Figure: PE1 to PE4 attached to a shared bus]
- Buses do not scale!
- A single bus does not provide concurrent transmissions.
- Large bus lengths are prohibitive, since geometrically large SoCs plus high frequencies (~10 GHz by end of decade) lead to non-manageable clock skews.
NoC-based system [Figure: PE1 to PE4 connected through switches (S)]
- Packets are transmitted, not words.
- Transactions can be executed in parallel.
- Routers in the network provide decoupling -> no clock skew concerns.
- Routing of wires is more structured through tiling -> less complex routing.
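Why transactions can proceed in parallel on a NoC but not on a single bus can be sketched with a small mesh model. The example assumes a 2D mesh with dimension-ordered (XY) routing and tiles addressed by (x, y) coordinates. Both are illustrative choices, not something the slide prescribes.

```python
from typing import List, Set, Tuple

Tile = Tuple[int, int]
Link = Tuple[Tile, Tile]


def xy_route(src: Tile, dst: Tile) -> List[Link]:
    """Directed links used by XY routing: travel along x first, then along y."""
    x, y = src
    links: List[Link] = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def can_run_concurrently(transfers: List[Tuple[Tile, Tile]]) -> bool:
    """True if no directed link is needed by more than one transfer."""
    used: Set[Link] = set()
    for src, dst in transfers:
        for link in xy_route(src, dst):
            if link in used:
                return False
            used.add(link)
    return True


# Two transfers on a 2x2 mesh that use disjoint links.
parallel = can_run_concurrently([((0, 0), (1, 0)), ((0, 1), (1, 1))])
# Two transfers that both need the link (0,0) -> (1,0).
conflict = can_run_concurrently([((0, 0), (1, 1)), ((0, 0), (1, 0))])
print(parallel, conflict)  # True False
```

On a single shared bus both pairs would be serialized, because every transfer occupies the one shared medium.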
Power consumption and NoCs
- Power is not an issue for large-scale networks. For NoCs, it is:
  - An increased number of PEs per chip leads to increased wiring.
  - Communication may also increase, since NoCs open new application areas (e.g. embedded multimedia).
- In principle, a network-based system is still more power-efficient than a bus-based system
  -> because a bus-based system broadcasts the information to every possible recipient, whereas in a NoC-based system the information (packet) is only sent to the actual recipients.
- Still, due to the trend towards communication-centric design styles, the NoC may be a major power consumer of an SoC.
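The broadcast-versus-unicast argument can be made concrete with a deliberately crude energy model. It assumes that a bus transfer switches one wire segment per attached unit, while a NoC transfer switches only the links and routers on its path. The energy constants are arbitrary illustrative units, not measurements, and the function names are hypothetical.

```python
def bus_transfer_energy(n_units: int, e_segment: float) -> float:
    """Broadcast: the whole bus, i.e. one segment per attached unit, is driven."""
    return n_units * e_segment


def noc_transfer_energy(hops: int, e_link: float, e_router: float) -> float:
    """Unicast: only `hops` links and the `hops + 1` routers on the path are active."""
    return hops * e_link + (hops + 1) * e_router


# Illustrative numbers only: 16 units on the bus, a 3-hop path in the NoC.
bus = bus_transfer_energy(n_units=16, e_segment=1.0)
noc = noc_transfer_energy(hops=3, e_link=1.0, e_router=0.5)
print(bus, noc)  # 16.0 5.0
```

The model also shows the caveat from the last bullet: router energy is paid on every hop, so long paths or heavy traffic can make the NoC a large consumer.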
Overview
- Motivation
- NoC design challenges
- State-of-the-art: Xpipe NoC architecture
- Quality-of-Service (QoS) architectures
NoC: Good news
- Only point-to-point one-way wires are used, for all network sizes.
- Aggregated bandwidth scales with the network size.
- Routing decisions are distributed and the same router is re-instantiated, for all network sizes.
- NoCs increase wire utilization (as opposed to ad-hoc p2p wires).
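How aggregated bandwidth grows with network size can be sketched by counting links. The example assumes a 2D mesh topology in which neighbouring routers are connected by one one-way wire bundle per direction. The mesh and the per-link bandwidth value are assumptions chosen for illustration.

```python
def mesh_one_way_links(rows: int, cols: int) -> int:
    """Number of one-way router-to-router links in a rows x cols 2D mesh."""
    horizontal = rows * (cols - 1)
    vertical = cols * (rows - 1)
    return 2 * (horizontal + vertical)   # one link per direction


def aggregate_bandwidth(rows: int, cols: int, link_bandwidth: float) -> float:
    """Upper bound on total bandwidth if every link is busy at once."""
    return mesh_one_way_links(rows, cols) * link_bandwidth


for n in (2, 4, 8):
    print(n, mesh_one_way_links(n, n), aggregate_bandwidth(n, n, 1.0))
```

For an n x n mesh the link count is 4n(n-1), so the bound grows roughly in proportion to the number of tiles, whereas a single bus offers a fixed total.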
There’s no free lunch…
- Internal network contention causes (often unpredictable) latency.
- The network has a significant silicon area.
- Bus-oriented IPs need smart wrappers.
- Software needs clean synchronization in multiprocessor systems.
- System designers need reeducation for new concepts.
The Communication Task Graph
[Figure: MPEG core graph. Nodes: vu, au, med cpu, rast, idct etc., sdram, sram1, sram2, up samp, bab, risc, adsp. Edges carry weights (values in the figure: 0.5, 32, 40, 60, 173, 190, 250, 500, 600, 670, 910).]
Application Pull
Open Research Problems
- Communication architecture: topology, channel width, buffer size, floor planning
- Communication paradigm: mapping, scheduling, switching, routing
- Testing: prototyping, benchmarking
Parameters to be Configured for Networks-on-Chip
- Parameters related to communication paradigm: routing algorithm, selection of switching scheme
- Parameters related to application mapping: task scheduling, task-to-IP mapping
- Parameters related to router architecture: topology customization, buffer size customization, floorplanning customization, bandwidth customization
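The task-to-IP mapping parameter can be illustrated with a small cost function over a communication graph. The sketch assumes a 2D mesh, a cost of communication volume times Manhattan hop distance, and an exhaustive search that is only feasible for tiny examples. The task names and volumes are hypothetical and are not taken from the MPEG core graph.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Tile = Tuple[int, int]
Edges = Dict[Tuple[str, str], float]


def manhattan(a: Tile, b: Tile) -> int:
    """Hop distance between two tiles of a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(edges: Edges, placement: Dict[str, Tile]) -> float:
    """Sum over all edges of communication volume times hop distance."""
    return sum(volume * manhattan(placement[src], placement[dst])
               for (src, dst), volume in edges.items())


def best_mapping(tasks: List[str], tiles: List[Tile],
                 edges: Edges) -> Tuple[float, Dict[str, Tile]]:
    """Exhaustive search over all placements; only viable for very small graphs."""
    candidates = (dict(zip(tasks, perm)) for perm in permutations(tiles, len(tasks)))
    best = min(candidates, key=lambda placement: mapping_cost(edges, placement))
    return mapping_cost(edges, best), best


# Hypothetical 4-task graph on a 2x2 mesh.
tasks = ["cpu", "mem", "dsp", "io"]
tiles = [(0, 0), (1, 0), (0, 1), (1, 1)]
edges = {("cpu", "mem"): 100.0, ("dsp", "mem"): 50.0, ("cpu", "io"): 10.0}

naive = {"cpu": (0, 0), "mem": (1, 1), "dsp": (0, 1), "io": (1, 0)}
naive_cost = mapping_cost(edges, naive)
best_cost, best_placement = best_mapping(tasks, tiles, edges)
print(naive_cost, best_cost)  # 260.0 160.0
```

Placing the heaviest communicating pair on adjacent tiles lowers the cost from 260 to 160 here. For realistic graphs a heuristic would replace the exhaustive search.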