Profiling a warehouse-scale computer
Svilen Kanev (Harvard University), Juan Pablo Darago (Universidad de Buenos Aires), Kim Hazelwood (Yahoo Labs), Parthasarathy Ranganathan and Tipp Moseley (Google Inc.), Gu-Yeon Wei and David Brooks (Harvard University)
The cloud is here to stay [http://google.com/trends, 2015]
Warehouse-scale computers (of yore)
Datacenters built around a few "killer workloads": problem sizes >> 1 machine; distributed, but tightly interconnected services; communication through remote-procedure calls (RPCs).
Now "the datacenter is the computer" (the WSC model has caught on)
"Microservice architecture": thousands of services are "one RPC away"
"... about a hundred of services that comprise Siri's backend ..." [Apple, Mesos meetup 2015]
How do modern WSC applications interact with hardware? And what does that imply for future server processors?
Traditional profiling: load testing
Isolate a service; find representative inputs; find a representative operating point; profile / optimize; repeat.
Live datacenter-scale profiling (Google-wide profiling)
Select random production machines (~20,000 / day); profile each one for a while, without isolation, while it runs live traffic for billions of users; aggregate days, weeks, years worth of execution in the GWP database. [Ren et al., Google-Wide Profiling, 2010]
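The collection loop can be pictured roughly as follows (a minimal sketch, assuming a flat machine-inventory file and ssh access to run Linux perf on each host; the actual GWP collectors and database are not public and are not modeled here):

```python
# Minimal sketch of GWP-style fleet sampling (assumptions: machine_inventory.txt
# lists hostnames, passwordless ssh works, and Linux perf is installed on hosts).
import random
import subprocess

MACHINES = open("machine_inventory.txt").read().split()  # hypothetical inventory file
DAILY_SAMPLE = 20_000                                     # ~20,000 machines / day

def profile_machine(host: str, seconds: int = 30) -> bytes:
    """Collect a short whole-machine, call-graph profile with Linux perf."""
    remote = (f"perf record -a -g -o /tmp/gwp.data -- sleep {seconds} && "
              "perf report --stdio -i /tmp/gwp.data")
    return subprocess.run(["ssh", host, remote],
                          capture_output=True, check=True).stdout

def daily_collection() -> list[bytes]:
    """Profile a random slice of the fleet; reports are later aggregated centrally."""
    sample = random.sample(MACHINES, min(DAILY_SAMPLE, len(MACHINES)))
    return [profile_machine(host) for host in sample]
```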
Live WSC profiling insights
Where are cycles spent in a datacenter? Are there really no killer applications? How do WSC applications interact with instruction caches? How much ILP is there? Big / small cores? DRAM latency vs. bandwidth? Hyperthreading?
Where are WSC cycles spent?
No "killer" application to optimize for [1 week of sampled WSC cycles]. Instead: a long tail of many different services.
Ongoing application diversification [~3 years of sampled WSC cycles]. Optimizing hardware one application at a time has diminishing returns (see the sketch below).
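One way to make the "long tail" concrete is to rank binaries by their share of sampled cycles and look at cumulative coverage (a sketch; the binary names and percentages are made up for illustration, not numbers from the study):

```python
# A sketch of the aggregation behind the "long tail": rank binaries by the cycle
# samples attributed to them and look at cumulative coverage. The binary names and
# percentage shares below are hypothetical, not data from the study.
cycle_share = {
    "websearch_leaf": 9.1, "bigtable": 5.3, "ads_backend": 4.2,   # hypothetical shares (%)
    "video_transcode": 3.8, "mapreduce_worker": 3.1,
    # ... hundreds of smaller services make up the remainder ...
}

def cumulative_share(shares: dict[str, float], top_n: int) -> float:
    """Fraction of all sampled cycles covered by the top_n hottest binaries."""
    total = sum(shares.values())
    top = sorted(shares.values(), reverse=True)[:top_n]
    return sum(top) / total

# With a genuine long tail, cumulative_share() climbs slowly as top_n grows, which
# is what makes one-application-at-a-time hardware tuning pay off less and less.
```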
Within applications: no hotspots [search leaf node; 1 week of cycles]. Corollary: hunting for per-application hotspots is not justified.
Hotspots across applications: "datacenter tax"
Shared low-level routines, typical for larger-than-one-server problems. Only 6 self-contained routines account for ~30% of WSC cycles; prime candidates for accelerators in server SoCs.
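A rough way to measure such a tax from symbolized profiles is to bucket samples into shared low-level categories and sum their share (a sketch; the category list approximates the paper's tax components of allocation, memory movement, RPC, protobuf handling, hashing, and compression, and the matching patterns are illustrative, not the ones used in the study):

```python
# Sketch of estimating the "datacenter tax": classify symbolized cycle samples
# into shared low-level categories and sum their share of all cycles.
# The regexes below are illustrative placeholders, not the study's classifier.
import re

TAX_PATTERNS = {
    "allocation":  re.compile(r"tcmalloc|operator new|malloc"),
    "memmove":     re.compile(r"memcpy|memmove"),
    "rpc":         re.compile(r"::rpc::|Stub::"),
    "protobuf":    re.compile(r"proto2?::"),
    "hashing":     re.compile(r"[Hh]ash|crc32"),
    "compression": re.compile(r"zlib|snappy|[Cc]ompress"),
}

def tax_fraction(samples: dict[str, float]) -> float:
    """samples maps a symbol name to its cycle count; returns the tax share."""
    total = sum(samples.values())
    tax = sum(count for sym, count in samples.items()
              if any(pat.search(sym) for pat in TAX_PATTERNS.values()))
    return tax / total if total else 0.0
```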
Live WSC profiling insights
Where are cycles spent in a datacenter? Everywhere. Are there really no killer applications? Datacenter tax. How do WSC applications interact with instruction caches? How much ILP is there? Big / small cores? DRAM latency vs. bandwidth? Hyperthreading?
Microarchitecture: WSC i-cache pressure
Severe instruction cache bottlenecks [20,000 Intel Ivy Bridge servers; 2 days of samples; Top-Down analysis [Yasin 2014]]
15-30% of core cycles wasted on instruction-supply stalls, often fetching instructions all the way from L3 caches.
Very high i-cache miss rates: 10x the highest in SPEC; 50% higher than CloudSuite.
Lots of lukewarm code: 100s of MBs of instructions per binary; no hotspots.
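For reference, the Top-Down frontend-bound fraction behind the "instruction-supply stalls" number can be computed from two raw counters (a sketch following Yasin's formulation for a 4-wide core; the perf event name is the one Linux exposes on Ivy Bridge, and the counter values are placeholders, not measurements from the study):

```python
# Yasin's Top-Down "frontend bound" fraction for a 4-wide core: issue slots where
# the frontend delivered no uop, divided by all issue slots.
# Counter values would come from e.g.
#   perf stat -e cycles,idq_uops_not_delivered.core -a -- sleep 10
# The numbers in the example call below are placeholders.

ISSUE_WIDTH = 4  # uops per cycle a 4-wide core can issue

def frontend_bound(idq_uops_not_delivered: int, cycles: int) -> float:
    """Fraction of issue slots starved by the instruction-supply path."""
    total_slots = ISSUE_WIDTH * cycles
    return idq_uops_not_delivered / total_slots

# Example with placeholder counts: ~22% of slots frontend-bound.
print(frontend_bound(idq_uops_not_delivered=880_000_000, cycles=1_000_000_000))
```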
A problem in the making
I-cache working sets are 4-5x larger than the largest in SPEC and growing almost 30% / year, significantly faster than i-caches. One solution: L2 i/d partitioning.
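To see how quickly that growth compounds, here is a back-of-the-envelope projection (an illustrative calculation assuming a hypothetical 1 MB starting working set and a constant 30% yearly growth rate; not data from the study):

```python
# Illustrative compound-growth projection: a working set growing ~30% / year
# doubles roughly every 2.6 years, while per-core L1 i-cache capacity has stayed
# essentially flat (~32 KB) across many server generations.
import math

GROWTH = 1.30                                    # ~30% per year (from the profile trend)
doubling_time = math.log(2) / math.log(GROWTH)   # ≈ 2.6 years

working_set_kb = 1024.0                          # hypothetical starting working set (1 MB)
for year in range(6):
    print(f"year {year}: ~{working_set_kb:.0f} KB")
    working_set_kb *= GROWTH
```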
Live WSC profiling insights
Where are cycles spent in a datacenter? Everywhere. Are there really no killer applications? Datacenter tax. How do WSC applications interact with instruction caches? Poorly. How much ILP is there? Big / small cores? Bimodal. DRAM latency vs. bandwidth? Latency. Hyperthreading? Yes.
To sum up
A growing number of programs cover "the world's WSC cycles". There is no "killer application", and hand-optimizing each program is suboptimal.
Low-level routines (the datacenter tax) are a surprisingly high fraction of cycles; good candidates for accelerators in future server processors.
Common microarchitectural footprint: working sets too large for i-caches; many d-cache stalls; generally low IPC; bimodal ILP; low memory bandwidth utilization.