Balance Principles for Algorithm-Architecture Co-design
Kent Czechowski, Casey Battaglino, Chris McClanahan, Aparna Chandramowlishwaran, Richard Vuduc (Georgia Tech)
May 31, 2011
Position: Principles (i.e., "theory") informing practice (co-design). Hardware/Software Co-design? Algorithm-Architecture Co-design?
Position: For some computation to scale efficiently on a future parallel processor: 1. How should cores be allocated? 2. How should cache be allocated? 3. How must latency/bandwidth increase to compensate? Or, alternatively: given a particular parallel architecture, what classes of computations will perform efficiently?
Why theoretical models? The best alternative (and perhaps the "status quo") in co-design is to put together a model of your chip and simulate your algorithm. This is very accurate, but by that point you have already invested a lot of time and effort in a specific design.
Why theoretical models? We advocate a more principled approach that models the performance of a processor from a few of its high-level characteristics known to be the main bottlenecks (communication, parallel scalability). Such a model can be refined and extended as needed, e.g., based on cache characteristics or heterogeneity of the cores.
Balance: We define balance as: for some algorithm, T_mem ≤ T_comp (similar to classical notions of balance: [Kung 1986], [Callahan et al. 1988], [McCalpin 1995]). For principled analysis, we need theoretical models for T_mem and T_comp. To be relevant for current and future processors, these models must integrate: 1. Parallelism 2. Cache/Memory Locality
Why Balance? Importance of considering balance: 1. There is an inevitable trend towards imbalance: peak flops are outpacing the memory hierarchy. 2. Imbalance can be nonintuitive: for a particular algorithm, improving one aspect of a chip may not help unless other aspects also improve to compensate.
Why Balance? Balance is a particularly powerful lens for maintaining realistic expectations for performance. Processor makers present raw figures for performance: peak flops and memory specs are very one-dimensional on their own (e.g., the CPU vs. GPU wars). Balance marries the two in a way that lets parallel scalability also enter the picture, and it recognizes that not all architectures are suitable for all applications.
Assumptions: For our particular "principled" approach we use two models: T_mem uses the External Memory Model (I/O Model); T_comp uses the Parallel DAG Model / Work-Depth Model. For these models alone to be expressive, we make some assumptions:
1. We are modeling work on a single socket, and n is large enough that the problem does not fit completely in the outermost level of cache.
2. For our algorithm, we can easily deduce the structure of a dependency DAG for any n.
3. The developer can overlap computation and communication arbitrarily well.
4. Communication costs are dominated by misses between cache and RAM (so T_mem ∝ cache misses = Q(n)).
Parallel DAG Model for T_comp (T_mem ≤ T_comp): Inherent parallelism is W(n)/D(n), which spans a spectrum between embarrassingly parallel and inherently sequential (application: CPA). Desired: work optimality and maximum parallelism. [Source: Blelloch, Parallel Algorithms]
Parallel DAG Model for T_comp (T_mem ≤ T_comp): Brent's Theorem [1974] maps the DAG model to the PRAM model: $T_p(n) = O\!\left(D(n) + \frac{W(n)}{p}\right)$
Parallel DAG Model for T_comp (T_mem ≤ T_comp): We model T_comp with
$T_{\mathrm{comp}}(n; p, C_0) = \left(D(n) + \frac{W(n)}{p}\right)\cdot\frac{1}{C_0}$
This gives us a lower bound that an optimally crafted algorithm could theoretically achieve.
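As a rough illustration of this lower bound, here is a minimal Python sketch. The choices W(n) = 2n^3 and D(n) = log^2 n (a dense matrix multiply with a low-depth schedule) and the machine numbers are assumptions made for the example, not values from these slides.

    # Minimal sketch (illustrative only): evaluate the T_comp lower bound
    # T_comp(n; p, C0) = (D(n) + W(n)/p) / C0 for an assumed kernel.
    import math

    def t_comp(n, p, C0, W=lambda n: 2.0 * n**3, D=lambda n: math.log2(n)**2):
        # W(n), D(n) here model dense matrix multiply with a low-depth
        # schedule -- assumed for the example, not prescribed by the slides.
        return (D(n) + W(n) / p) / C0

    # Hypothetical machine: 512 cores at 2 GHz (1 flop/cycle).
    print(t_comp(8192, p=512, C0=2e9), "seconds (lower bound)")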
I/O Model for T_mem (T_mem ≤ T_comp): Q(n; Z, L) is the number of cache misses. Thus, the volume of data transferred is Q(n; Z, L) × L.
I/O Model for T_mem (T_mem ≤ T_comp): Our intensity is thus $\frac{W(n)}{Q(n; Z, L)\cdot L}$. Desired: minimize work (work-optimality) while maximizing intensity (by minimizing cache complexity). Intensity on its own is very descriptive: intuitively we know that high-intensity operations such as matrix multiply perform well on GPUs, whereas low-intensity vector operations perform poorly. "W" and "Q" underlie this behavior.
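A small sketch of that intuition, using textbook cache complexities as assumptions (Q ≈ n³/(L√Z) for blocked matrix multiply, Q ≈ 3n/L for a vector update) rather than anything derived on this slide:

    # Illustrative intensity W / (Q * L) for two kernels; the Q formulas
    # below are standard textbook estimates assumed for this example.
    import math

    def intensity(W, Q, L):
        return W / (Q * L)

    n, Z, L = 8192, 32 * 2**20, 64  # hypothetical 32 MiB cache, 64 B lines

    # Blocked matrix multiply: W = 2n^3, Q ~ n^3 / (L * sqrt(Z))  -> high intensity
    print(intensity(2.0 * n**3, n**3 / (L * math.sqrt(Z)), L))

    # Vector update (y += a*x): W = 2n, Q ~ 3n / L                -> low intensity
    print(intensity(2.0 * n, 3.0 * n / L, L))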
I/O Model: Matrix Multiply (figure)
I/O Model for T_mem (T_mem ≤ T_comp): We model T_mem with
$T_{\mathrm{mem}}(n; p, Z, L, \alpha, \beta) = \alpha \cdot D(n) + \frac{Q_p(n; Z, L) \cdot L}{\beta}$
where Q is the number of cache misses, C_0 is the number of cycles per second, p is the number of cores, Z is the cache size (bytes), L is the line size (bytes), α is the latency (s), and β is the bandwidth (bytes/s).
I/O Model for T_mem (T_mem ≤ T_comp): We model T_mem with
$T_{\mathrm{mem}}(n; p, Z, L, \alpha, \beta) = \alpha \cdot D(n) + \frac{Q_p(n; Z, L) \cdot L}{\beta}$
Q_1, the sequential cache complexity, is well known for most algorithms. Q_p, the parallel cache complexity, must be separately derived, but can be obtained directly from Q_1 if certain scheduling principles are followed.
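One bound of this flavor, for randomized work stealing with private caches (Acar, Blelloch, and Blumofe), is quoted here only as an illustration of how Q_p can follow from Q_1; it is an assumption that this is the kind of scheduling principle meant on the slide:

$$Q_p(n; Z, L) \;\le\; Q_1(n; Z, L) + O\!\left(p \cdot D(n) \cdot \frac{Z}{L}\right)$$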
I/O Model for T_mem (T_mem ≤ T_comp): We model T_mem with
$T_{\mathrm{mem}}(n; p, Z, L, \alpha, \beta) = \alpha \cdot D(n) + \frac{Q_p(n; Z, L) \cdot L}{\beta}$
[Blelloch, Gibbons, Simhadri (2010). Low-depth cache-oblivious algorithms.]
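A minimal sketch of evaluating this model, with an assumed matrix-multiply-like parallel cache complexity Q_p ≈ n³/(L·√(Z/p)) and hypothetical memory parameters; none of these numbers come from the slides:

    # Minimal sketch of T_mem(n; p, Z, L, alpha, beta) = alpha*D(n) + Q_p*L/beta.
    import math

    def t_mem(n, p, Z, L, alpha, beta, D, Q_p):
        return alpha * D(n) + Q_p(n, p, Z, L) * L / beta

    t = t_mem(8192, p=512, Z=32 * 2**20, L=64,
              alpha=100e-9, beta=200e9,      # hypothetical latency/bandwidth
              D=lambda n: math.log2(n)**2,   # assumed low-depth schedule
              # assumed parallel cache complexity, matmul-like:
              Q_p=lambda n, p, Z, L: n**3 / (L * math.sqrt(Z / p)))
    print(t, "seconds")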
T_comp, T_mem: Setting T_mem ≤ T_comp and doing some algebra yields the balance condition (as shown below).
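One way to carry out that algebra, under the simplification of dropping the latency term α·D(n) and the depth term D(n)/C_0 (reasonable for large n given the assumptions above), is:

$$\frac{Q_p(n; Z, L)\cdot L}{\beta} \;\le\; \frac{W(n)}{p \, C_0}
\quad\Longleftrightarrow\quad
\frac{W(n)}{Q_p(n; Z, L)\cdot L} \;\ge\; \frac{p \, C_0}{\beta}$$

That is, the algorithm's intensity must be at least the machine's balance (aggregate flop rate divided by memory bandwidth).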
Projections — Irony et al.: parallel matrix multiply bound:
$\frac{W(n)}{Q_p(n; Z, L)} \;\ge\; 2\,L\sqrt{\frac{Z}{p}}$
∴ we can project how the machine parameters must scale to keep matrix multiply balanced (sketched below).
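A sketch of what such a projection might look like under the simplified balance condition above (intensity ≥ p·C_0/β) combined with the matrix-multiply intensity W/(Q_p·L) ≈ 2√(Z/p); all machine numbers here are hypothetical:

    # Projection sketch: minimum bandwidth beta for matrix multiply to stay
    # balanced, i.e. 2*sqrt(Z/p) >= p*C0/beta  =>  beta >= p*C0 / (2*sqrt(Z/p)).
    import math

    def min_beta_matmul(p, Z, C0):
        # Z follows the slide glossary's convention; the intensity expression
        # follows the matmul bound stated above.
        intensity = 2.0 * math.sqrt(Z / p)
        return p * C0 / intensity            # bytes/s required for balance

    for p in (64, 256, 1024):
        beta = min_beta_matmul(p, Z=32 * 2**20, C0=2e9)   # hypothetical chip
        print(p, "cores ->", beta / 1e9, "GB/s")          # grows ~ p^(3/2)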