Acceleration Targets: A Study of Popular Benchmark Suites Lisa Wu - PDF document

Acceleration Targets: A Study of Popular Benchmark Suites
Lisa Wu and Martha A. Kim, Department of Computer Science, Columbia University
{lisa,martha}@cs.columbia.edu

1 Introduction

Using dark silicon [14, 3] to deploy specialized accelerators is an idea that is gaining traction in the architecture community [4, 5, 6, 1]. The underlying rationale is that specialized hardware and its attendant efficiency is the most effective way to draw performance in the anticipated power-limited scenarios. Given the cost associated with designing, verifying, and deploying an accelerator, conventional wisdom dictates that a particular operation becomes an economical and realistic acceleration target when it is used across a range of applications.

In this study, we survey a set of popular benchmark suites, assessing the potential of several acceleration targets within them. In particular, we explore the following three questions:

• Do the benchmarks exhibit any common functionality at or above the function level?
• What impact does the language or programming environment have on the potential acceleration of a suite of applications?
• How many unique accelerators would be required to see benefits across a particular benchmark suite? Does this change across suites and source programming languages?

2 Methodology

To explore these questions, we profile four benchmark suites: SPEC2006 (C) [11], SPECJVM (Java) [12], Dacapo (Java) [2], and Unladen-Swallow (Python) [13]. Each source language provides a slightly different set of potential acceleration targets. For example, SPEC2006 is written in C and offers two target granularities: individual functions or entire applications. In contrast, a Java benchmark offers three granularities: methods, classes (i.e., all of the methods for a particular class), and entire applications. We classify each of these potential targets as fine, medium, or coarse granularity according to Table 1.

Table 1: Acceleration Targets for Each Suite

Benchmark Suite    Granularity
                   fine       medium   coarse
SPEC2006           function   –        application
SPECJVM            method     class    package
DACAPO             method     class    package
UNLADEN-SWALLOW    function   –        object

For each class of acceleration targets, we sort the targets by decreasing execution time across the entire benchmark suite. Assuming that building an accelerator for a particular target (1) provides infinite speedup of the target, and (2) incurs no data or control transfer overhead upon invocation or return, we compute an upper bound on the speedup of the overall suite for the most costly target(s). We repeat this analysis for each target granularity in each benchmark suite, as outlined in Table 1.

3 Results and Analysis

Our results show that popular benchmark suites exhibit minimal functional level commonality. For example, it would take 500 unique, idealized accelerators to gain a 48X speedup across the SPEC2006 benchmark suite. The C code is simply not modular for acceleration, and few function accelerators can be re-used across a range of applications. For benchmarks written in Java, however, we see more commonality as language level constructs such as classes encapsulate operations for easy re-use. The question remains whether building 20 accelerators for SpecJVM or 50 accelerators for Dacapo is worth the investment for the 10X speedups to be had. In the particular Python benchmark suite we used, we found that the applications made minimal use of the built-ins (e.g., dict or file) resulting in very minimal opportunity for acceleration beyond the methods themselves. Our intuition is that this may be an artifact of a computationally-oriented performance benchmark suite, and is likely not reflective of the overall space of Python workloads.

4 Conclusion

Our analyses of SPEC2006 confirm what C-cores [14], ECO-cores [10], and DYSER [5] also found: that when accelerating unstructured C code, the best targets are large swaths of highly-application-specific code. Our Java analyses indicate some hope for common acceleration targets in classes, though the advantage of targeting classes over individual methods appears modest. Across the board, our data show that filling dark silicon with specialized accelerators will require systems containing tens or even hundreds of accelerators. In light of this, we believe the infrastructure associated with these accelerators (e.g., networks, memory models [7, 9, 8], and toolchains [14]) will only increase in importance.
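The upper bound described in the Methodology section can be written out explicitly. This is only a sketch: the symbols T (total execution time of the suite), t_i (execution time attributed to the i-th most costly target, summed across the suite), and S(k) (bound after accelerating the k most costly targets) are notation introduced here, and the formula assumes the targets' execution times do not overlap.

```latex
S(k) \;=\; \frac{T}{\,T - \sum_{i=1}^{k} t_i\,},
\qquad t_1 \ge t_2 \ge \dots \ge t_n
```

Read this way, a bound of S(k) = 10 means the k accelerated targets cover 1 - 1/10 = 90% of the suite's execution time, and a bound of 48 means they cover 1 - 1/48, or roughly 97.9%.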

Figure 1: Max speedup of benchmark suite for {fine, medium, and coarse}-granular acceleration targets.

References

[1] C. Cascaval et al. A taxonomy of accelerator architectures and their programming models. IBM Journal of Research and Development, 54(5):1–10, 2010.
[2] The Dacapo Benchmark Suite. http://dacapobench.org/.
[3] H. Esmaeilzadeh et al. Dark silicon and the end of multicore scaling. In ISCA, pages 365–376, 2011.
[4] N. Goulding-Hotta et al. GreenDroid: A mobile application processor for a future of dark silicon. IEEE Micro, 31(2):86–95, 2011.
[5] V. Govindaraju et al. Dynamically specialized datapaths for energy efficient computing. In HPCA, pages 503–514, 2011.
[6] R. Hameed et al. Understanding sources of inefficiency in general-purpose chips. In ISCA, pages 37–47, June 2010.
[7] J. Kelm et al. Cohesion: a hybrid memory model for accelerators. In ISCA, pages 429–440, June 2010.
[8] M. Lyons et al. The accelerator store framework for high-performance, low-power accelerator-based systems. IEEE Computer Architecture Letters, 9(2):53–56, Feb. 2010.
[9] B. Saha et al. Programming model for a heterogeneous x86 platform. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 2009.
[10] J. Sampson et al. Efficient complex operators for irregular codes. In Proceedings of the 17th International Symposium on High Performance Computer Architecture (HPCA), pages 491–502. ACM, Feb 2011.
[11] Standard Performance Evaluation Corporation. http://www.spec.org/cpu2006/.
[12] Standard Performance Evaluation Corporation. http://www.spec.org/jvm2008/.
[13] Unladen Swallow Benchmarks. http://code.google.com/p/unladen-swallow/wiki/Benchmarks.
[14] G. Venkatesh et al. Conservation cores: reducing the energy of mature computations. In ASPLOS, pages 205–218, Mar. 2010.

Acceleration Targets: A Study of Popular Benchmark Suites Lisa Wu and Martha A. Kim Columbia University June 10, 2012 Tuesday, June 19, 2012

The story starts like this: Princess Ruruna and her helper Cain have a problem: To face dark silicon head on, they want to find applications that have acceleration potential. But what can they do to tackle the problem? Let’s see what Tico the fairy has to say... But what should we accelerate?

Let’s start by looking at some popular benchmark suites.

Question 1:
• Do the benchmarks exhibit any common functionality?
• If so, is it at or above the function level?

I will profile SPEC2006 and see if I can answer this question...

If the hottest function runs lightning fast, how much faster would the suite be?

To get a 10X speedup, we need to accelerate over 189 unique functions!

[Figure: two plots, "SPEC2006" and "SPEC2006 Zoomed In". x-axis: Unique Accelerators; y-axis: Max Speedup of Suite; series: function.]
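The question on this slide can be answered with a few lines of arithmetic. The sketch below is an illustration, not the authors' profiling infrastructure: the function names `speedup_curve` and `accelerators_needed` and the sample times are hypothetical, and it assumes each target's execution time is exclusive (no overlap between targets), that an accelerated target takes zero time, and that invocation costs nothing.

```python
from itertools import accumulate


def speedup_curve(target_times, total_time):
    """Upper-bound suite speedup after accelerating the k hottest targets.

    Returns a list whose entry k-1 is the bound for k accelerators.
    Assumes target times are exclusive and accelerated targets take zero time.
    """
    ranked = sorted(target_times, reverse=True)  # hottest target first
    curve = []
    for covered in accumulate(ranked):  # running sum of accelerated time
        remaining = total_time - covered
        curve.append(float("inf") if remaining <= 0 else total_time / remaining)
    return curve


def accelerators_needed(target_times, total_time, goal):
    """Smallest number of accelerators whose bound reaches `goal`, or None."""
    for k, bound in enumerate(speedup_curve(target_times, total_time), start=1):
        if bound >= goal:
            return k
    return None


# Hypothetical profile: four targets in a suite that runs for 100 time units.
times = [30, 50, 5, 10]
print(speedup_curve(times, 100))
print(accelerators_needed(times, 100, goal=10))
```

Plotting the output of `speedup_curve` against the accelerator count gives a curve of the same kind as the "Max Speedup of Suite" plots on this slide.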

Hmm... What if we accelerated a bigger target?

Good! It only takes 21! Oh wait... we need to accelerate 21 different applications for a 12X speedup?!

[Figure: two plots, "SPEC2006" and "SPEC2006 Zoomed In". x-axis: Unique Accelerators; y-axis: Max Speedup of Suite; series: function, application.]

It seems that SPEC2006 cannot be accelerated easily... How about other benchmark suites?

Question 2:
• What about benchmark suites that are not written in C?
• What impact does the language or programming environment have on acceleration potential?

Each source language provides a slightly different set of potential acceleration targets. Granularities: fine, medium, coarse.

Table 1: Acceleration Targets for Each Suite

Benchmark Suite    Granularity
                   fine       medium   coarse
SPEC2006           function   –        application
SPECJVM            method     class    package
DACAPO             method     class    package
UNLADEN-SWALLOW    function   –        object
