Impact of VLSI Scaling and Synthesis on Multimedia Processor Cores

Tomás Bautista and Antonio Núñez
CAD Division, IUMA – Applied Microelectronics Research Institute. University of Las Palmas de Gran Canaria. E-35017 Las Palmas de Gran Canaria, Canary Islands, Spain. E-mail: bautista@cma.ulpgc.es

Abstract: In this paper we present experimental results obtained during the modelling, design and implementation of a full set of versions of a SPARC v8 Integer Unit core aimed at embedded applications in digital media products. VHDL has been the description language, Synopsys tools have been used for logic synthesis, and Duet Technologies' Epoch has been used for the physical layout of the final circuits. They have been mapped to 0.50 and 0.35 µm, three-metal-layer processes in order to study the impact of VLSI scaling on SPARC microarchitectural features. The quantitative results given characterize suitable points in the design space. They show how much microarchitecture, design, datapath granularity and module decisions affect performance and cost functions. Design space exploration down to physical layouts is made possible by modelling techniques based on configurable VHDL descriptions.

I. INTRODUCTION

As the feature size of technological processes approaches deep submicron dimensions, and as metal layers are no longer a bottleneck, the integration density available on chip is becoming extremely high. The natural trend is to take this density for free. However, deeper submicron technologies also bring along new problems, especially related to wire delay and power consumption.

A new design paradigm has emerged: the synthesis of large cores (in-house proprietary or outsourced intellectual property), ultimately building very large and complete systems on a chip. This paradigm also calls for a synthesis approach relying on architectural, logic and layout synthesis tools. This is in contrast to mainstream design approaches relying on full-custom cores.

Complexity issues in processor architectures related to feature size evolution have been studied, among others, by Palacharla et al. [fro]. They have studied the tradeoff between hardware and clock speed from an architectural point of view by using key pieces of full-custom layouts suitable for estimating the clock cycle in superscalar processors, for geometries ranging from 0.8 µm to 0.18 µm. The layouts for the 0.35 and 0.18 µm processes were obtained by appropriately shrinking the layouts for the 0.80 µm process.

We set the goal of conducting a similar study, but under a "synthesis-based approach" to the design rather than a "full-custom-based approach", and of also analyzing the effect of the different synthesis steps on the various levels of description and design options of an architecture. In our case we developed complete processor layouts (over one hundred implementations) for 0.50 and 0.35 µm technologies. A study including the 0.18 µm process is also underway.

One of the industrial fields demanding this dense design paradigm is digital media processing, especially for medium and low bit-rate video decoding. In the digital media domain, processor workload is dominated by video processing tasks [Pir98], [Ack94]. In order to cope with this load, in particular for high bit-rate video coding, high-end architectures are being conceived and developed using superscalar, vector, and parallel processors [RS98], [Ses98], [Pur98], [RK96].

A. Limits of specialization

The vector-microprocessor paradigm has been explored in depth in [Asa98] as a result of assessing the vectorizability of SPECint programs. A quantitative analysis of extending the short-vector microprocessor approach to long-vector microprocessors has been given recently in [LS98], demonstrating a clear performance advantage for multimedia applications over simple scalar and superscalar processors, up to a three-fold improvement factor. It also shows layout-area costs which can become up to one order of magnitude higher compared to simple scalar processors with multimedia extensions.

Another related architectural trend is represented by VLIW approaches aimed at finding and automatically generating efficient architectures through processor specialization. Relying on the power of the highly optimizing HP Labs Cambridge C Compiler, quantitative results reported in [FFD96] show performance gains for high-cost, tightly targeted VLIW architectures, but also show dramatic performance losses in low and medium cost VLIW architectures if too narrow-scope custom-fit processors are defined from the application.

After running 5730 experiments with 191 VLIW architectures tailored to fit, in a wide range, 10 multimedia benchmarks, the authors conclude: "If and when the cost of individual chip design becomes very much lower than it is today, it will make a lot of sense to build chips for the narrowest of embedded applications. Today, that seems like a dangerous route to attempt".

In recent years the advantages of standard, mainstream, programmable solutions have also been highlighted. These solutions rely on standard processors available as cores for embedded systems. This approach helps in software development since these cores are based on well established processor architectures and efficient optimising compilers. Process technology advances are also bringing these processors to speed marks that make software solutions ever more attractive.
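
As a side note on the layout shrinking mentioned in the Introduction, the arithmetic of an ideal geometric shrink can be sketched as follows. This is only an illustrative assumption, not the method used in this paper or in the cited study: it treats every layout dimension as scaling by the ratio of feature sizes, so area scales by the square of that ratio. The function names are hypothetical, and synthesized layouts will generally deviate from these ideal figures.

```python
def shrink_factor(old_um: float, new_um: float) -> float:
    """Linear scale factor of an ideal geometric shrink (assumption:
    every layout dimension scales with the feature size)."""
    if old_um <= 0 or new_um <= 0:
        raise ValueError("feature sizes must be positive")
    return new_um / old_um


def ideal_area_ratio(old_um: float, new_um: float) -> float:
    """Area of the shrunk layout relative to the original one.
    Area is two-dimensional, so it scales with the square of the
    linear factor."""
    s = shrink_factor(old_um, new_um)
    return s * s


if __name__ == "__main__":
    # Feature-size pairs taken from the processes named in the text.
    for old, new in [(0.80, 0.35), (0.80, 0.18), (0.50, 0.35)]:
        print(f"{old:.2f} um -> {new:.2f} um: "
              f"linear x{shrink_factor(old, new):.3f}, "
              f"area x{ideal_area_ratio(old, new):.3f}")
```

Under this idealization, moving from 0.50 to 0.35 µm would roughly halve the layout area; comparing such ideal figures against measured areas of synthesized layouts is one way to see how far a synthesis-based flow departs from a pure shrink.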