LMC: Automatic Resource-Aware Program-Optimized Memory Partitioning
Hsin-Jung Yang†, Kermin E. Fleming‡, Michael Adler‡, Felix Winterstein§, and Joel Emer†
†Massachusetts Institute of Technology, ‡Intel Corporation, §Imperial College London
FPGA 2016, February 22nd
Motivation
• Moore's Law continues
  – More transistors and more memory controllers on modern FPGAs
  – Examples: Xilinx VC709: two 4GB DDR3 memories; Nallatech 510T: eight 4GB DDR4 memories + 2GB HMC; Xeon + FPGA: three memory channels
• It is difficult to fully utilize DRAM bandwidth
  – Co-optimizing application cores and memory systems is hard
  – So is porting an existing design to a new platform (smaller FPGA -> larger FPGA, single FPGA -> multiple FPGAs)
• Goal: automatically optimize the memory system to efficiently utilize the increased DRAM bandwidth
Utilizing Multiple DRAMs
• How should computational engines be connected to DRAMs in order to maximize program performance?
  – Network topology: latency, bandwidth
  – On-chip caching
  – Area constraints
  – High design complexity overall
• Applications have different memory behavior: some clients need far more bandwidth than others
• We need a memory compiler
Automatic Construction of Program-Optimized Memories
• A clearly defined, generic memory abstraction
  – Separates the user program from the memory-system implementation
• Program introspection
  – To understand the program's memory behavior
• A resource-aware, feedback-driven memory compiler
  – Uses introspection results as feedback to automatically construct the "best" memory system for the target program and platform
Abstraction
• Abstraction hides implementation details and provides good programmability
• [Figure: abstraction stacks. Processor: C/Python application -> operating system -> instruction set architecture -> hardware (CPU, memory, I/O). FPGA: user program -> memory and communication abstractions -> hardware.]
• Below the abstraction, compilers and system developers can optimize the hardware for the target application and platform
LEAP Memory Abstraction
• Each user engine connects to a LEAP memory block through a simple interface, the same as for block RAMs
  – Arbitrary data size
  – Private address space
  – "Unlimited" storage
  – Automatic caching
• Interface (Bluespec SystemVerilog):
    interface MEM_IFC#(type t_ADDR, type t_DATA);
        method Action readReq(t_ADDR addr);
        method Action write(t_ADDR addr, t_DATA din);
        method ActionValue#(t_DATA) readResp();
    endinterface
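The read interface is split-transaction: readReq and readResp are separate methods, so requests and responses are decoupled and a client can have several reads in flight. Below is a minimal Python model of that behavior; it is purely illustrative (the name MemModel is made up and this is not LEAP code).

    from collections import deque

    class MemModel:
        """Toy model of the MEM_IFC split-transaction interface:
        readReq enqueues an address, readResp later returns its data,
        write updates the private address space."""
        def __init__(self):
            self.store = {}          # sparse "unlimited" private address space
            self.pending = deque()   # outstanding read requests, in order

        def read_req(self, addr):
            self.pending.append(addr)

        def write(self, addr, data):
            self.store[addr] = data

        def read_resp(self):
            addr = self.pending.popleft()
            return self.store.get(addr, 0)

    # A client can issue several reads before collecting responses:
    m = MemModel()
    m.write(3, 42)
    m.read_req(3)
    m.read_req(7)
    print(m.read_resp(), m.read_resp())   # 42 0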
LEAP Private Memory
• [Figure: on the FPGA, each client in the user program talks to the LEAP memory interface, which is backed by on-chip SRAM and then by on-board DRAM on the platform side; this mirrors a processor hierarchy of application -> L1 cache -> L2 cache -> main memory.]
• M. Adler et al., "LEAP Scratchpads," in FPGA, 2011.
LEAP Memory with Multiple DRAMs
• Naïve solution: unified memory with multiple DRAM banks
  – Advantages: simplicity, more capacity, higher bandwidth
  – Difficulty: performance is limited by serialized requests and by the long latency of large ring networks
• Can we do better?
LEAP Memory with Multiple DRAMs
• Better: distributed central caches and memory controllers
  – Open question: how should memory clients be connected to the distributed controllers?
Private Cache Network Partitioning
• Program introspection
  – To understand programs' memory behavior
  – Statistics counters track, e.g., cache misses, outstanding requests, and queueing delays
  – Counters are dumped to a statistics file (e.g., Client A: 100, Client B: 10, Client C: 50, Client D: 20); the sketch below shows how such a file could feed the partitioner
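The statistics file gives the partitioner a per-client traffic estimate. Below is a hypothetical sketch of turning such a dump into traffic weights; the file name and the "Client X: value" line format are assumptions for illustration, not the actual LEAP statistics format.

    def read_client_traffic(path):
        """Parse lines like 'Client A: 100' into a {client: traffic} map.
        Lines that do not look like counters are ignored."""
        traffic = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if ":" not in line:
                    continue
                name, value = line.rsplit(":", 1)
                try:
                    traffic[name.strip()] = traffic.get(name.strip(), 0) + int(value)
                except ValueError:
                    continue
        return traffic

    # e.g. read_client_traffic("program.stats")
    # -> {"Client A": 100, "Client B": 10, "Client C": 50, "Client D": 20}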
Private Cache Network Partitioning
• Case 1: memory clients with homogeneous behavior
  – All clients generate similar traffic, so they can simply be spread evenly across the memory controllers
Private Cache Network Partitioning
• Case 2: memory clients with heterogeneous behavior
  – Example traffic: 100, 10, 50, 20; the heaviest client needs much more bandwidth than the others
Private Cache Network Partitioning
• Case 2: memory clients with heterogeneous behavior
  – Load-balanced partitioning: a classical minimum-makespan scheduling problem
  – m controllers, n clients, client j with traffic t_j
  – x_ij = 1 if client j is mapped to controller i, 0 otherwise
  – ILP formulation:
        minimize T
        s.t.  Σ_{j=1..n} x_ij · t_j ≤ T    for i = 1, …, m
              Σ_{i=1..m} x_ij = 1          for j = 1, …, n
              x_ij ∈ {0, 1}                for all i, j
  – Approximation: longest-processing-time (LPT) algorithm (see the sketch below)
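The LPT heuristic sorts clients by descending traffic and greedily assigns each one to the currently least-loaded controller. A minimal Python sketch follows, using the traffic numbers from the slides; the function name and data layout are illustrative, not the LMC implementation.

    import heapq

    def lpt_partition(traffic, num_controllers):
        """Longest-processing-time assignment: heaviest client first,
        each client goes to the least-loaded controller so far."""
        heap = [(0, i) for i in range(num_controllers)]   # (load, controller)
        heapq.heapify(heap)
        assignment = {}
        for client, t in sorted(traffic.items(), key=lambda kv: -kv[1]):
            load, ctrl = heapq.heappop(heap)
            assignment[client] = ctrl
            heapq.heappush(heap, (load + t, ctrl))
        return assignment

    # Two controllers, slide traffic: A=100, B=10, C=50, D=20
    print(lpt_partition({"A": 100, "B": 10, "C": 50, "D": 20}, 2))
    # {'A': 0, 'C': 1, 'D': 1, 'B': 1}  -> loads 100 vs. 80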
Private Cache Network Partitioning
• Case 3: fractional load-balancing
  – A client's traffic may be split across controllers, so x_ij becomes the fraction of client j's traffic served by controller i and the ILP relaxes to an LP (see the sketch below):
        minimize T
        s.t.  Σ_{j=1..n} x_ij · t_j ≤ T    for i = 1, …, m
              Σ_{i=1..m} x_ij = 1          for j = 1, …, n
              0 ≤ x_ij ≤ 1
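As a sanity check of the relaxation, the LP can be solved directly with an off-the-shelf solver. The sketch below uses scipy's linprog (not part of LMC); the traffic numbers follow the slide's example, and the variable packing (all x_ij followed by T) is an arbitrary choice for illustration.

    import numpy as np
    from scipy.optimize import linprog

    def fractional_partition(traffic, m):
        """LP relaxation: x[i][j] = fraction of client j's traffic on
        controller i, T = makespan. Variables are packed as
        [x_00 .. x_0(n-1), x_10 .., ..., T]."""
        t = np.array(traffic, dtype=float)
        n = len(t)
        num_x = m * n
        c = np.zeros(num_x + 1)
        c[-1] = 1.0                                 # minimize T

        # Controller load: sum_j t_j * x_ij - T <= 0
        A_ub = np.zeros((m, num_x + 1))
        for i in range(m):
            A_ub[i, i * n:(i + 1) * n] = t
            A_ub[i, -1] = -1.0
        b_ub = np.zeros(m)

        # Each client fully assigned: sum_i x_ij = 1
        A_eq = np.zeros((n, num_x + 1))
        for j in range(n):
            A_eq[j, j:num_x:n] = 1.0
        b_eq = np.ones(n)

        bounds = [(0.0, 1.0)] * num_x + [(0.0, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        return res.x[:num_x].reshape(m, n), res.x[-1]

    # Slide traffic on two controllers: total 180, so the best fractional
    # makespan is 90 (each controller carries half the traffic).
    fractions, makespan = fractional_partition([100, 10, 50, 20], 2)
    print(round(makespan, 1))   # 90.0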