



The Energy Efficiency of CMP vs. SMT for Multimedia Workloads∗

Ruchira Sasanka and Sarita V. Adve
Department of Computer Science, University of Illinois at Urbana-Champaign
{sasanka, sadve}@cs.uiuc.edu

Yen-Kuang Chen and Eric Debes
Architecture Research Labs, Intel Corporation
{yen-kuang.chen, eric.debes}@intel.com

UIUC CS Technical Report UIUCDCS-R-2003-2325, March 2003
Intel Technical Report 130581, March 2003

∗ This work is supported in part by an equipment donation from AMD Corp., a gift from Intel Corp., and the National Science Foundation under Grant No. EIA-0103645, CCR-0209198, CCR-0205638, EIA-0224453, and CCR-0313286. Sarita V. Adve was also supported by an Alfred P. Sloan Research Fellowship. Ruchira Sasanka was supported by an Intel graduate fellowship and began this work as a summer intern at Intel.

Abstract

This paper compares the energy efficiency of chip multiprocessing (CMP) and simultaneous multithreading (SMT) on modern out-of-order processors for the increasingly important multimedia applications. Since performance is an important metric for real-time multimedia applications, we compare configurations at equal performance. We perform this comparison for a large number of performance points derived using different processor architectures and frequencies/voltages.

We find that, for the design space explored, for each workload, at each performance point, CMP is more energy efficient than SMT. The difference is small for two-thread systems, but large (18% to 44%) for four-thread systems. We also find that the best SMT and the best CMP configurations for a given performance target differ in architecture and frequency/voltage. Therefore, their relative energy efficiency depends on a subtle interplay between various factors such as capacitance, voltage, IPC, frequency, and the level of clock gating, as well as workload features. We perform a detailed analysis considering these factors and develop a mathematical model to explain these results.

Although CMP shows a clear energy advantage for four-thread (and higher) workloads, it comes at the cost of increased silicon area. We therefore investigate a hybrid solution where a CMP is built out of SMT cores, and find it to be an effective compromise. Finally, we find that we can reduce energy further for CMP with a straightforward application of previously proposed techniques of adaptive architectures and dynamic voltage/frequency scaling.

1 Introduction

This paper compares the energy efficiency of chip multiprocessing (CMP) [10] and simultaneous multithreading (SMT) [19] for multimedia applications on modern out-of-order general-purpose processors (GPPs). Multimedia applications are becoming increasingly important for GPPs in a variety of systems including desktops, laptops, tablet PCs, and likely future handheld devices. GPPs have begun to support multithreading for improved throughput, using either CMP or SMT. These techniques are a good match for multimedia applications, which are inherently multithreaded. However, multimedia applications often run on portable systems facing strict energy constraints. It is therefore important to study the energy efficiency of general-purpose CMP and SMT architectures for multimedia applications.

SMT allows multiple application threads to run at the same time within the same processor, potentially increasing utilization of the processor resources. Specifically, current wide-issue out-of-order processors are often unable to utilize the full supported fetch/decode/issue width for a single thread. SMT utilizes these otherwise wasted resources for other threads, potentially improving total throughput with little additional hardware. CMP, on the other hand, improves throughput by adding processors rather than by improving their utilization.

At first glance, SMT may appear to be inherently more energy efficient than CMP since it potentially uses its resources more effectively: SMT can get more IPC (instructions per cycle) from less hardware. However, in reality, the comparison is more complex, both in the analysis needed to understand the experimental results and in the methodology needed to generate the right results.

Sources of complexity and our solutions. For real-time multimedia applications, performance is a key constraint. A fair comparison of energy must therefore also consider performance. As a result, we compare the energy of SMT and
