PACE: Power-Aware Computing Engines
Krste Asanovic, Saman Amarasinghe, Martin Rinard
Computer Architecture Group, MIT Laboratory for Computer Science
http://www.cag.lcs.mit.edu/
PACE Approach: Rethink the Hardware-Software Interface for Power-Aware Computing
• Energy-Conscious Compilers
• Energy-Exposed Architectures
Conventional Architectures Only Expose Performance
• Current RISC/VLIW ISAs only expose the hardware features that affect the critical path through a computation.
Energy Consumption is Hidden
• Most energy is consumed in microarchitectural operations that are hidden from software!
Energy-Exposed Instruction Sets
• Reward compile-time knowledge with run-time energy savings:
  – hardware provides mechanisms to disable microarchitectural activity, a software power grid (a minimal sketch of the idea follows below)
  – compile-time analysis determines which pieces of the microarchitecture can be disabled for a given application
⇒ Co-develop energy-exposed architectures and energy-conscious compilers
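The slides do not give a concrete ISA encoding for these mechanisms, so the following is only a minimal C++ sketch of the idea: the compiler attaches hints it has proven safe (the flag names such as skip_tag_check and narrow_16b are hypothetical), and the hardware, modeled here as a simple energy accumulator, charges less for operations whose hints disable part of the microarchitecture. All names and per-event energies are illustrative assumptions, not the actual SCALE interface.

```cpp
// Sketch only: hypothetical energy-exposed instruction hints.
// Flag names and per-event energies are illustrative assumptions.
#include <cstdio>
#include <vector>

struct Inst {
    bool is_load;
    bool skip_tag_check;   // compiler proved the cache access needs no tag check
    bool narrow_16b;       // compiler proved operands fit in 16 bits
};

// Toy per-event energies in picojoules (assumed values).
constexpr double E_ALU_32B   = 10.0;
constexpr double E_ALU_16B   = 6.0;   // only a narrow datapath slice toggles
constexpr double E_TAG_CHECK = 8.0;
constexpr double E_DATA_READ = 12.0;

double energy_of(const Inst& i) {
    double e = i.narrow_16b ? E_ALU_16B : E_ALU_32B;
    if (i.is_load) {
        e += E_DATA_READ;
        if (!i.skip_tag_check) e += E_TAG_CHECK;  // hint disables the tag array
    }
    return e;
}

int main() {
    // Same instruction sequence, without and with compile-time hints.
    std::vector<Inst> plain  = {{true, false, false}, {false, false, false}};
    std::vector<Inst> hinted = {{true, true,  false}, {false, false, true }};
    double e0 = 0, e1 = 0;
    for (const auto& i : plain)  e0 += energy_of(i);
    for (const auto& i : hinted) e1 += energy_of(i);
    std::printf("baseline %.1f pJ, hinted %.1f pJ\n", e0, e1);
}
```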
Energy Management Layers
• Application
• Algorithm
• Source Code
• Compiler
• Run-Time/O.S.
• Instruction Set
• Microarchitecture
• Circuit Design
• Fabrication Technology
PACE focus areas: the middle layers, from the compiler through the microarchitecture
SCALE Strawman Processor
• 32 processing tiles
• Fast on-chip data network
• 128 x 32b FLOPs/cycle total
• 4096 x 8b ops/cycle total
• 128MB on-chip DRAM / 16MB SRAM
• I/O: external DRAM interface, chip-to-chip interconnect channels
• 20x20 mm² in 0.1 µm CMOS
[Figure: chip floorplan showing processing tiles (control/address/data units with SRAM/cache), bulk embedded DRAM/SRAM, the data network, and the off-chip DRAM interface]
SCALE Processor Tile Details
[Figure: tile block diagram. Control unit: VLIW instruction fetch & decode, instruction buffer, PC, CALU with 16x32b C registers, BALU with 8x32b B (branch) registers. Address unit: AALU0/AALU1 with 16x32b A registers, memory management, cache tags. Data unit: four clusters (DALU0-DALU3, each with a 64x64b D register bank) plus shared FP multiplier and FP adder. 32KB local SRAM (16 banks x 256 words x 64 bits), store buffer, address/data interconnect, and data network port.]
SCALE Supports All Forms of Parallelism
• Vector parallelism
  – most streaming applications are highly vectorizable
  – vectors reduce instruction fetch/decode energy by up to 20-60x, depending on vector length (see the arithmetic sketch below)
  – mature programming and compilation model
  ⇒ SCALE supports vectors in hardware
  – address and data units optimized for vectors
  – hardware vector control logic
• VLIW (instruction-level) parallelism
  – exploit instruction-level parallelism for non-vectorizable applications
  – superscalar ILP is expensive in hardware
  ⇒ SCALE supports VLIW-style ILP
  – reuse address and data unit datapath resources
  – expose datapath control lines
  – a single wide instruction = a configuration
  – provide a control/configuration cache distributed along the datapaths
• Thread parallelism
  – run separate threads on different tiles
  – any mix of vector or VLIW across tiles
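The 20-60x fetch/decode saving is essentially instruction-count arithmetic. The sketch below makes that reasoning explicit under assumed numbers (per-instruction fetch/decode energy and a 4-instruction loop body are illustrative assumptions, not SCALE measurements).

```cpp
// Sketch: why vectors cut instruction fetch/decode energy roughly in
// proportion to the vector length. Energies and loop shape are assumptions.
#include <cstdio>

int main() {
    const double e_fetch_decode = 5.0;  // pJ per instruction issued (assumed)
    const int n = 1024;                 // loop trip count

    // Scalar loop: ~4 instructions per element (load, op, store, branch).
    const double scalar = 4.0 * n * e_fetch_decode;

    for (int vl : {16, 32, 64}) {       // candidate vector lengths
        // Vectorized loop: ~4 vector instructions per strip of vl elements.
        const int strips = (n + vl - 1) / vl;
        const double vec = 4.0 * strips * e_fetch_decode;
        std::printf("VL=%2d  fetch/decode energy ratio = %.0fx\n",
                    vl, scalar / vec);
    }
    // Prints 16x, 32x, 64x: the saving scales with vector length, consistent
    // with the slide's "20-60x, depends on vector length".
}
```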
SCALE Exposes Locality at Multiple Levels
• 2D tile and DRAM layout
  – software maps computation to minimize network hops (a cost-function sketch follows below)
• Local SRAM within a tile
  – software split between instruction/data/unified storage
  – software scratchpad RAMs or hardware-managed caches
• Distributed cached control state within a tile
  – control unit: instruction buffer
  – data/address units: vector instructions or VLIW/configuration cache
• Distributed register file and ALU clusters within a tile
  – control unit: scalar (C) registers versus branch (B) registers
  – address unit: address (A) registers
  – data unit: four clusters of data registers (D0-D3)
  – accumulators and sneak paths to bypass register files
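The slides do not elaborate how software "maps computation to minimize network hops"; the sketch below shows the kind of cost function such a mapper could minimize, assuming an 8x4 tile mesh and Manhattan-distance routing (both are illustrative assumptions).

```cpp
// Sketch: hop-count cost of mapping communicating tasks onto a 2D tile mesh.
// The 8x4 mesh and Manhattan routing are assumptions for illustration.
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Edge { int src, dst, traffic; };   // task-graph edge (words/iteration)

int hops(int tileA, int tileB, int cols) {
    int ax = tileA % cols, ay = tileA / cols;
    int bx = tileB % cols, by = tileB / cols;
    return std::abs(ax - bx) + std::abs(ay - by);
}

// Total hop-weighted traffic for a given task -> tile placement.
long cost(const std::vector<Edge>& g, const std::vector<int>& place, int cols) {
    long c = 0;
    for (const auto& e : g)
        c += static_cast<long>(e.traffic) * hops(place[e.src], place[e.dst], cols);
    return c;
}

int main() {
    const int cols = 8;                           // 8x4 = 32 tiles
    std::vector<Edge> pipeline = {{0, 1, 100}, {1, 2, 100}, {2, 3, 100}};
    std::vector<int> scattered = {0, 31, 7, 24};  // far-apart tiles
    std::vector<int> adjacent  = {0, 1, 2, 3};    // neighbouring tiles
    std::printf("scattered placement: %ld hop-words\n",
                cost(pipeline, scattered, cols));
    std::printf("adjacent placement:  %ld hop-words\n",
                cost(pipeline, adjacent, cols));
}
```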
SCALE Software Power Grid (a hypothetical configuration-descriptor sketch follows below)
• Turn off unused register banks and ALUs
• Reduce datapath width
  – set the width separately for each unit in a tile (e.g., 32b in the control unit, 16b in the address unit, 64b in the data unit)
• Turn off individual local memory banks
• Configure the memory addressing model
  – from hardware cache coherence to local scratchpad RAM
• Turn off idle tiles and idle inter-tile network segments
• Turn off refresh to unused DRAM banks
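A hedged sketch of what a per-tile "software power grid" configuration descriptor might look like; every field name, width, and encoding here is a hypothetical illustration of the knobs listed above, not the actual SCALE configuration interface.

```cpp
// Sketch: hypothetical per-tile power-grid configuration descriptor.
// Field names, widths, and encodings are assumptions for illustration only.
#include <bitset>
#include <cstdint>
#include <cstdio>

enum class MemMode : uint8_t { HardwareCache, Scratchpad };

struct TilePowerConfig {
    uint8_t  ctrl_width;        // datapath width in the control unit (bits)
    uint8_t  addr_width;        // datapath width in the address unit (bits)
    uint8_t  data_width;        // datapath width in the data unit (bits)
    uint8_t  dreg_bank_enable;  // bitmask: which D register banks stay on
    uint16_t sram_bank_enable;  // bitmask: which of 16 SRAM banks stay on
    MemMode  mem_mode;          // cache-coherent vs. local scratchpad
    bool     tile_enabled;      // whole tile (and its net segments) on/off
};

int main() {
    // Example matching the slide: 32b control, 16b address, 64b data,
    // two data register banks, half the SRAM banks, scratchpad addressing.
    TilePowerConfig cfg = {32, 16, 64, 0b0011, 0x00FF,
                           MemMode::Scratchpad, true};
    std::printf("data unit %ub wide, %d SRAM banks on\n",
                static_cast<unsigned>(cfg.data_width),
                static_cast<int>(std::bitset<16>(cfg.sram_bank_enable).count()));
}
```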
Existing Infrastructure
• RAW compiler technology
  – SUIF-based C/FORTRAN compiler for tiled arrays
  – SPAN pointer analysis
  – Bitwise bitwidth analysis
  – superword-level parallelism
  – space/time scheduling
  – MAPS compiler-managed memory system
• Pekoe low-power microprocessor library cells
  – full-custom processor blocks in a 0.25 µm CMOS process
  – designed for voltage-scaled operation
• SyCHOSys energy-performance simulator
  – fast, multi-level compiled simulation
  – energy models for Pekoe processor blocks
Bitwidth Analysis
• Compile-time detection of the minimum bitwidth required for each variable at every static location in the program
• A collection of techniques (two of them are sketched below):
  – arithmetic operations
  – Boolean operations
  – bitmask operations
  – loop induction variable bounding
  – clamping optimization
  – type promotion
  – back propagation
  – array index optimization
• Value-range propagation using data-flow analysis
• Loop analysis
• Incorporates pointer alias analysis
• Paper in PLDI'00
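The sketch below illustrates two of the listed techniques, loop induction variable bounding and bitmask operations, via simple value-range propagation. It is a toy stand-in for the published Bitwise analysis (PLDI'00); the helper names are hypothetical.

```cpp
// Sketch: minimum-bitwidth inference by value-range propagation.
// Toy stand-in for the Bitwise analysis; helper names are hypothetical.
#include <cstdint>
#include <cstdio>

struct Range { int64_t lo, hi; };

// Bits needed to represent every value in [lo, hi] (unsigned case).
int bits_needed(Range r) {
    int bits = 0;
    for (uint64_t v = static_cast<uint64_t>(r.hi); v; v >>= 1) ++bits;
    return bits ? bits : 1;
}

int main() {
    // for (i = 0; i < 100; i++) ...      -> induction variable bounding
    Range i = {0, 99};
    // x = input & 0xFF;                  -> bitmask operation
    Range x = {0, 0xFF};
    // sum = i + x;                       -> range propagation through '+'
    Range sum = {i.lo + x.lo, i.hi + x.hi};

    std::printf("i needs %d bits, x needs %d bits, i+x needs %d bits\n",
                bits_needed(i), bits_needed(x), bits_needed(sum));
    // Prints 7, 8, and 9 bits: far less than the declared 32-bit ints,
    // which is the slack that bitwidth-aware hardware can exploit.
}
```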
Bitwidth Power Savings (C ⇒ ASIC Synthesis)
• Methodology
  – C → RTL
  – RTL simulation gives switching activity
  – synthesis tool reports dynamic power
  – IBM SA27E process, 0.15 µm drawn, 200 MHz
[Chart: average dynamic power (mW), base case vs. bitwidth analysis, for bubblesort, histogram, jacobi, and pmatch]
SyCHOSys Energy-Performance Simulation
• SyCHOSys compiles a custom cycle simulator from a structural machine description
  – supports gate level to behavioral level, or any mixture
  – behavior specified in C++, compiles to a C++ object
• Can selectively compile in transition counting on nets
  – automatically factors out common counts for faster simulation
• Arbitrary energy models for functional units/memories
  – capacitances extracted from circuit layout or estimated
  – uses fast bit-parallel structural energy models, much faster than table lookups (a minimal sketch follows below)
• Paper in the Workshop on Complexity-Effective Design, ISCA'00
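A minimal sketch of the transition-counting idea, assuming a simple capacitive switching model: XOR each net's previous and current value, popcount the result to get the number of toggled wires in one bit-parallel step, and accumulate E ≈ ½·C·V²·toggles. The capacitance and voltage values are placeholders, not Pekoe data.

```cpp
// Sketch: bit-parallel transition counting on a 32-bit bus, with a simple
// capacitive energy model. Capacitance/voltage numbers are placeholders.
#include <bitset>
#include <cstdint>
#include <cstdio>

struct BusEnergy {
    uint32_t prev = 0;          // bus value on the previous cycle
    double   cap_per_bit;       // effective capacitance per wire (farads)
    double   vdd;               // supply voltage (volts)
    double   energy = 0.0;      // accumulated switching energy (joules)

    void clock(uint32_t value) {
        // One XOR + popcount counts all 32 wires' transitions at once.
        int toggles = static_cast<int>(std::bitset<32>(value ^ prev).count());
        energy += 0.5 * cap_per_bit * vdd * vdd * toggles;
        prev = value;
    }
};

int main() {
    BusEnergy bus{0, 50e-15, 1.2};          // 50 fF/bit at 1.2 V (assumed)
    for (uint32_t v : {0x0000FFFFu, 0xFFFF0000u, 0xFFFF0000u})
        bus.clock(v);                       // third value causes no toggles
    std::printf("switching energy = %.3g J\n", bus.energy);
}
```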
SyCHOSys Evaluation
• GCD circuit benchmark
  – full-custom datapath layout (0.25 µm TSMC CMOS process)
  – mixture of static and precharged blocks

  Simulator                        Simulation speed (Hz)   Error in power prediction
  C-Behavioral (gcc)                     109,000,000        N/A
  Verilog-Behavioral (VCS)                   544,000        N/A
  Verilog-Structural (VCS)                   341,000        N/A
  SyCHOSys-Structural                      8,000,000        N/A
  SyCHOSys-Power                             195,000        0.5% - 8.2%
  PowerMill (extracted layout)                  0.73        7.2% - 13.7%
  Star-Hspice (extracted layout)                0.01        0% (reference)
SyCHOSys Processor Model
• Five-stage pipelined MIPS RISC processor + caches
• User/kernel mode, precise interrupts; validated with an architectural test suite plus random test programs
• Runs SPECint95 benchmarks
• Simulation speeds (Sun Ultra-5, 333 MHz workstation)
  – ISA-level interpreter: 3 MHz
  – behavioral RTL: 400 kHz
  – structural model: 40 kHz
  – energy model: 16 kHz
• At 16 kHz that is roughly a megacycle per CPU-minute (16,000 x 60 ≈ 10^6 cycles) or a gigacycle per CPU-day (16,000 x 86,400 ≈ 1.4 x 10^9 cycles), with better accuracy than PowerMill
PACE Milestones
• Year 2000: baseline design
  – baseline SCALE architecture definition
  – RAW compiler generating code for the baseline SCALE design
  – baseline SCALE architecture energy-performance simulator
• Year 2001: single tile
  – energy-exposed SCALE tile architecture definition
  – energy-conscious compiler passes for the SCALE tile
  – energy-exposed SCALE tile energy-performance simulator
  – evaluation of the energy-exposed SCALE tile
• Year 2002: multi-tile
  – energy-exposed SCALE multi-tile architecture definition
  – multi-tile energy-performance simulator
  – multi-tile energy-conscious compiler passes
  – evaluation of the multi-tile SCALE processor
  – (option: fabricate a SCALE prototype)