Acceleration Targets: A Study of Popular Benchmark Suites
Lisa Wu and Martha A. Kim, Department of Computer Science, Columbia University
{lisa,martha}@cs.columbia.edu

1 Introduction

Using dark silicon [14, 3] to deploy specialized accelerators is an idea that is gaining traction in the architecture community [4, 5, 6, 1]. The underlying rationale is that specialized hardware and its attendant efficiency is the most effective way to draw performance in the anticipated power-limited scenarios. Given the cost associated with designing, verifying, and deploying an accelerator, conventional wisdom dictates that a particular operation becomes an economical and realistic acceleration target when it is used across a range of applications.

In this study, we survey a set of popular benchmark suites, assessing the potential of several acceleration targets within them. In particular, we explore the following three questions:

• Do the benchmarks exhibit any common functionality at or above the function level?
• What impact does the language or programming environment have on the potential acceleration of a suite of applications?
• How many unique accelerators would be required to see benefits across a particular benchmark suite? Does this change across suites and source programming languages?

2 Methodology

To explore these questions, we profile four benchmark suites: SPEC2006 (C) [11], SPECJVM (Java) [12], Dacapo (Java) [2], and Unladen-Swallow (Python) [13]. Each source language provides a slightly different set of potential acceleration targets. For example, SPEC2006 is written in C and offers two target granularities: individual functions or entire applications. In contrast, a Java benchmark offers three granularities: methods, classes (i.e., all of the methods for a particular class), and entire applications. We classify each of these potential targets as fine, medium, or coarse granularity according to Table 1.

                   Granularity
Benchmark Suite    fine       medium   coarse
SPEC2006           function   –        application
SPECJVM            method     class    package
DACAPO             method     class    package
UNLADEN-SWALLOW    function   –        object

Table 1: Acceleration Targets for Each Suite

For each class of acceleration targets, we sort the targets by decreasing execution time across the entire benchmark suite. Assuming that building an accelerator for a particular target (1) provides infinite speedup of the target, and (2) incurs no data or control transfer overhead upon invocation or return, we compute an upper bound on the speedup of the overall suite for the most costly target(s). We repeat this analysis for each target granularity in each benchmark suite, as outlined in Table 1.

3 Results and Analysis

Our results show that popular benchmark suites exhibit minimal functional level commonality. For example, it would take 500 unique, idealized accelerators to gain a 48X speedup across the SPEC2006 benchmark suite. The C code is simply not modular for acceleration, and few function accelerators can be re-used across a range of applications. For benchmarks written in Java, however, we see more commonality as language level constructs such as classes encapsulate operations for easy re-use. The question remains whether building 20 accelerators for SpecJVM or 50 accelerators for Dacapo is worth the investment for the 10X speedups to be had. In the particular Python benchmark suite we used, we found that the applications made minimal use of the built-ins (e.g., dict or file) resulting in very minimal opportunity for acceleration beyond the methods themselves. Our intuition is that this may be an artifact of a computationally-oriented performance benchmark suite, and is likely not reflective of the overall space of Python workloads.

4 Conclusion

Our analyses of SPEC2006 confirm what C-cores [14], ECO-cores [10], and DYSER [5] also found: that when accelerating unstructured C code, the best targets are large swaths of highly-application-specific code. Our Java analyses indicate some hope for common acceleration targets in classes, though the advantage of targeting classes over individual methods appears modest. Across the board, our data show that filling dark silicon with specialized accelerators will require systems containing tens or even hundreds of accelerators. In light of this, we believe the infrastructure associated with these accelerators (e.g., networks, memory models [7, 9, 8], and toolchains [14]) will only increase in importance.
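The upper-bound computation described in the Methodology section can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name max_speedup_curve and the profile numbers are hypothetical, and the sketch assumes that each target's execution time is disjoint from the others. Under the two idealizing assumptions (infinite speedup, no transfer overhead), accelerating the k hottest targets simply removes their time from the suite total.

```python
from itertools import accumulate


def max_speedup_curve(target_times, total_time):
    """Upper bound on suite speedup after accelerating the k hottest targets.

    target_times: time attributed to each candidate target (hypothetical data).
    total_time:   execution time of the whole suite, in the same unit.
    Entry k-1 of the result is the bound when k accelerators are built.
    """
    hottest_first = sorted(target_times, reverse=True)
    curve = []
    # accumulate() yields the running total of time removed by accelerators.
    for removed in accumulate(hottest_first):
        remaining = total_time - removed
        # If nothing remains, the idealized bound is unbounded.
        curve.append(float("inf") if remaining <= 0 else total_time / remaining)
    return curve


# Made-up profile: four targets cover 95 of 100 time units.
profile = [15.0, 50.0, 5.0, 25.0]
curve = max_speedup_curve(profile, total_time=100.0)
print(curve)
```

Plotting such a curve against the number of unique accelerators, once per target granularity, gives the kind of view that Figure 1 summarizes.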
Figure 1: Max speedup of benchmark suite for {fine, medium, and coarse}-granular acceleration targets.

References

[1] C. Cascaval et al. A taxonomy of accelerator architectures and their programming models. IBM Journal of Research and Development, 54(5):1–10, 2010.
[2] The Dacapo Benchmark Suite. http://dacapobench.org/.
[3] H. Esmaeilzadeh et al. Dark silicon and the end of multicore scaling. In ISCA, pages 365–376, 2011.
[4] N. Goulding-Hotta et al. GreenDroid: A mobile application processor for a future of dark silicon. IEEE Micro, 31(2):86–95, 2011.
[5] V. Govindaraju et al. Dynamically specialized datapaths for energy efficient computing. In HPCA, pages 503–514, 2011.
[6] R. Hameed et al. Understanding sources of inefficiency in general-purpose chips. In ISCA, pages 37–47, June 2010.
[7] J. Kelm et al. Cohesion: a hybrid memory model for accelerators. In ISCA, pages 429–440, June 2010.
[8] M. Lyons et al. The accelerator store framework for high-performance, low-power accelerator-based systems. IEEE Computer Architecture Letters, 9(2):53–56, Feb. 2010.
[9] B. Saha et al. Programming model for a heterogeneous x86 platform. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 2009.
[10] J. Sampson et al. Efficient complex operators for irregular codes. In Proceedings of the 17th International Symposium on High Performance Computer Architecture (HPCA), pages 491–502. ACM, Feb. 2011.
[11] Standard Performance Evaluation Corporation. http://www.spec.org/cpu2006/.
[12] Standard Performance Evaluation Corporation. http://www.spec.org/jvm2008/.
[13] Unladen Swallow Benchmarks. http://code.google.com/p/unladen-swallow/wiki/Benchmarks.
[14] G. Venkatesh et al. Conservation cores: reducing the energy of mature computations. In ASPLOS, pages 205–218, Mar. 2010.
Acceleration Targets: A Study of Popular Benchmark Suites
Lisa Wu and Martha A. Kim
Columbia University
June 10, 2012
Tuesday, June 19, 2012
The story starts like this: Princess Ruruna and her helper Cain have a problem: to face dark silicon head on, they want to find applications that have acceleration potential. But what can they do to tackle the problem? Let’s see what Tico the fairy has to say...

But what should we accelerate?
Let’s start by looking at some popular benchmark suites.

1.
• Do the benchmarks exhibit any common functionality?
• If so, is it at or above the function level?
I will profile SPEC2006 and see if I can answer this question...

If the hottest function runs lightning fast, how much faster would the suite be?

To get a 10X speedup, we need to accelerate over 189 unique functions!

[Plots: "SPEC2006" and "SPEC2006 Zoomed In". x-axis: Unique Accelerators; y-axis: Max Speedup of Suite; series: function.]
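The question on this slide (how many hot functions must be accelerated to reach a given suite speedup) can be worked through with a short sketch. The helper name accelerators_needed and the timing data are hypothetical and purely illustrative; the sketch assumes idealized accelerators, so that an accelerated function contributes no time at all, and assumes the per-function times do not overlap.

```python
def accelerators_needed(target_times, total_time, goal):
    """Smallest number of hottest targets that reaches a `goal`-fold speedup.

    Assumes each accelerated target costs zero time (idealized upper bound).
    Returns None if even accelerating every listed target falls short.
    """
    remaining = total_time
    hottest_first = sorted(target_times, reverse=True)
    for count, time_spent in enumerate(hottest_first, start=1):
        remaining -= time_spent
        if remaining <= 0 or total_time / remaining >= goal:
            return count
    return None


# Made-up profile: four functions cover 95 of 100 time units.
times = [40.0, 30.0, 20.0, 5.0]
needed = accelerators_needed(times, total_time=100.0, goal=10.0)
print(needed)
```

When execution time is spread thinly over many functions, as the slide reports for SPEC2006, the returned count grows quickly with the speedup goal.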
What if we accelerated a bigger target?

Hmm... Good! It only takes 21! Oh wait... we need to accelerate 21 different applications for a 12X speedup?!

[Plots: "SPEC2006" and "SPEC2006 Zoomed In". x-axis: Unique Accelerators; y-axis: Max Speedup of Suite; series: function, application. Annotation: 21 different applications.]
It seems that SPEC2006 cannot be accelerated easily... How about other benchmark suites?

2.
• What about benchmark suites that are not written in C?
• What impact does the language or programming environment have on acceleration potential?
Each source language provides a slightly different set of potential acceleration targets.

                   Granularity
Benchmark Suite    fine       medium   coarse
SPEC2006           function   –        application
SPECJVM            method     class    package
DACAPO             method     class    package
UNLADEN-SWALLOW    function   –        object

Table 1: Acceleration Targets for Each Suite
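To make the granularities in Table 1 concrete, the following sketch groups a Java-style profile into fine (method), medium (class), and coarse (package) targets. Everything here is an assumption made for illustration: the PROFILE rows, the DEPTH mapping, and the function targets_by_granularity are hypothetical and do not come from the study's data or tools.

```python
from collections import defaultdict

# Hypothetical profile rows: (package, class, method, seconds).
PROFILE = [
    ("util", "HashMap", "get", 4.0),
    ("util", "HashMap", "put", 3.0),
    ("util", "ArrayList", "add", 2.0),
    ("io", "Reader", "read", 1.0),
]

# How many leading name components identify a target at each granularity.
DEPTH = {"coarse": 1, "medium": 2, "fine": 3}


def targets_by_granularity(rows, granularity):
    """Sum time per target at the given granularity, hottest target first."""
    depth = DEPTH[granularity]
    totals = defaultdict(float)
    for *names, seconds in rows:
        totals[".".join(names[:depth])] += seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for level in ("fine", "medium", "coarse"):
    print(level, targets_by_granularity(PROFILE, level))
```

A class-level target folds together all of its methods, so fewer accelerators cover the same time, at the cost of each accelerator having to implement more functionality.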