M3: INTEGRATING ARBITRARY COMPUTE UNITS AS FIRST-CLASS CITIZENS

Faculty of Computer Science, Institute for System Architecture, Operating Systems Group
M3: INTEGRATING ARBITRARY COMPUTE UNITS AS FIRST-CLASS CITIZENS
OS: Nils Asmussen, Hermann Härtig, Marcus Völp
EE: Benedikt Nöthen, Gerhard Fettweis
Dagstuhl Seminar, 02/09/2017

Why?
• FPGA-based memcached is 16x better in performance per watt than an Atom CPU [1]
• Machine learning accelerator is 20% faster than FPGA and requires 128 times less energy [2]
• ...
[1] Thin servers with smart pipes: Designing SoC accelerators for memcached, ISCA’13
[2] PuDianNao: A polyvalent machine learning accelerator, ASPLOS’15
Nils Asmussen, Slide 2 of 16

The Problem for OSes
[Figure, shown in four build steps: a heterogeneous system with ARM big and ARM LITTLE cores, two Intel Xeon cores, two DSPs, an FPGA, and an audio decoder. "Kernel" labels are added step by step; the last step shows only the ARM big, ARM LITTLE, and Intel Xeon cores, each with a kernel.]

The Goal
Treat all compute units (CUs) as first-class citizens:
1. Run untrusted code without causing harm
2. Access operating system services
3. Interact as the master with other CUs

First-class Citizenship as Enabler
• Pipe communication between arbitrary CUs
• Use parallelism on GPUs for FS operations
• Direct access to accelerators from the network
• ...

M3 Approach: Hardware and Software
[Figure, shown in four build steps: eight compute units (ARM big, ARM LITTLE, two Intel Xeon cores, two DSPs, an FPGA, and an audio decoder), each with its own memory (Mem). Hardware steps: a DTU is added next to each unit, and each unit together with its memory and DTU is labeled a PE. Software step: one PE is labeled "Kernel" and the remaining PEs are labeled "App".]
Asmussen et al.: M3: A Hardware/OS Co-Design to Tame Heterogeneous Manycores, ASPLOS’16

Data Transfer Unit
• Supports memory access and message passing
• Provides a number of endpoints
• Each endpoint can be configured for:
  1. Accessing memory (contiguous range, byte granular)
  2. Receiving messages into a receive buffer
  3. Sending messages to a receiving endpoint
• Configuration only by kernel, usage by application
• Credit system to prevent DoS attacks
• Direct reply on received messages
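The endpoint rules above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the DTU's hardware interface or the M3 API: the names `DtuModel`, `Endpoint`, `make_memory`, `make_receive`, `make_send`, and `demo_credits` are hypothetical, both endpoints live in a single model object (in the real design they sit in different PEs), and the cost of one credit per message is an assumption made for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of DTU endpoints; names are illustrative only.
enum class EpKind { Invalid, Memory, Receive, Send };

struct Endpoint {
    EpKind kind;
    std::size_t base, size;           // Memory: contiguous, byte-granular range
    std::size_t target;               // Send: index of the receiving endpoint
    unsigned credits;                 // Send: remaining credits
    std::vector<std::string> buffer;  // Receive: receive buffer
};

Endpoint make_memory(std::size_t base, std::size_t size) {
    return Endpoint{EpKind::Memory, base, size, 0, 0, {}};
}
Endpoint make_receive() {
    return Endpoint{EpKind::Receive, 0, 0, 0, 0, {}};
}
Endpoint make_send(std::size_t target, unsigned credits) {
    return Endpoint{EpKind::Send, 0, 0, target, credits, {}};
}

class DtuModel {
public:
    explicit DtuModel(std::size_t n)
        : eps_(n, Endpoint{EpKind::Invalid, 0, 0, 0, 0, {}}) {}

    // Configuration path: reserved for the kernel in the described design.
    void configure(std::size_t ep, Endpoint cfg) { eps_.at(ep) = std::move(cfg); }

    // Usage path: the application sends via a send endpoint.
    // Assumption: each message consumes one credit; no credits, no send.
    bool send(std::size_t ep, const std::string &msg) {
        Endpoint &s = eps_.at(ep);
        if (s.kind != EpKind::Send || s.credits == 0) return false;
        Endpoint &r = eps_.at(s.target);
        if (r.kind != EpKind::Receive) return false;
        --s.credits;
        r.buffer.push_back(msg);
        return true;
    }

    // A memory endpoint permits access only inside its configured range.
    bool can_access(std::size_t ep, std::size_t addr, std::size_t len) const {
        const Endpoint &m = eps_.at(ep);
        return m.kind == EpKind::Memory && addr >= m.base &&
               len <= m.size && addr - m.base <= m.size - len;
    }

private:
    std::vector<Endpoint> eps_;
};

// Usage: a sender with two credits tries to send five messages.
int demo_credits() {
    DtuModel dtu(4);
    dtu.configure(0, make_receive());
    dtu.configure(1, make_send(0, 2));
    int delivered = 0;
    for (int i = 0; i < 5; ++i)
        if (dtu.send(1, "ping")) ++delivered;
    return delivered;  // only the first two sends succeed
}
```

In this model a flooding sender is stopped as soon as its credits run out, which is the DoS-prevention idea the credit bullet refers to. How credits are replenished is left out of the sketch.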

M3 System Call
[Figure: the kernel PE and an application PE, each with memory and a DTU; a send endpoint (S) on one DTU is connected to a receive endpoint (R) on the other.]

Prototype Platform: Tomahawk 2
[Figure: PEs and a memory controller (Mem Ctrl.) with DRAM, each attached to a router (R). The PE detail shows an Xtensa LX4 core, instruction SPM, data SPM, and a DTU.]
PEs have no OS support:
• No privileged mode
• No MMU
• No caches, but SPM

Prototype Platform: gem5
[Figure: block diagram of the simulated platform. Labels include x86 PEs, a hash accelerator (Hash Accel), a DTU per PE, SPM, L1 and L2 caches, VM, a DRAM controller (DRAM Ctl), and DRAM.]

M3
• Microkernel-based system for heterogeneous manycores
• Mechanisms for PEs, memory and communication
• Drivers, filesystems, ... are implemented on top
• Kernel manages permissions
• DTU enforces permissions (communication, memory access)
• Kernel is independent of other CUs in the system

Virtual PEs
• Comparable to a process with 0/1 threads
• Creating a VPE yields a VPE capability and a memory capability
• Library provides primitives like fork and exec

Execute function on different PE:

    VPE vpe;
    vpe.run_async([]() {
        Serial::get() << "Hello World!\n";
        return 0;
    });
    int exitcode = vpe.wait();

Virtual PEs
• VPE with 0 threads for HW accelerators
• Allows direct access for applications
• Time-multiplexed by the kernel

Access an accelerator:

    VPE vpe(VPEDesc::HASH_ACCEL);
    SendGate sg(vpe);
    GateIStream reply = send_receive_vmsg(sg, 1, 2, 3);
    int res;
    reply >> res;
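The accelerator example passes a variable number of values to `send_receive_vmsg` and reads the answer with `reply >> res`. The general pattern behind such an interface, variadic marshalling into a byte buffer and stream-style extraction, might look like the sketch below. It is not M3's implementation: `OStream`, `IStream`, `marshal`, `make_vmsg`, and `demo_roundtrip` are hypothetical names, no DTU or accelerator is involved, and the byte layout (raw copies in argument order) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical marshalling helpers; only suitable for trivially copyable types.
class OStream {
public:
    template<typename T>
    OStream &operator<<(const T &v) {
        const unsigned char *p = reinterpret_cast<const unsigned char *>(&v);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));  // append raw bytes
        return *this;
    }
    const std::vector<unsigned char> &bytes() const { return bytes_; }
private:
    std::vector<unsigned char> bytes_;
};

class IStream {
public:
    explicit IStream(std::vector<unsigned char> bytes)
        : bytes_(std::move(bytes)), pos_(0) {}

    template<typename T>
    IStream &operator>>(T &v) {
        assert(pos_ + sizeof(T) <= bytes_.size());  // do not read past the message
        std::memcpy(&v, bytes_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return *this;
    }
private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_;
};

// Base case: nothing left to marshal.
inline void marshal(OStream &) {}

// Recursive case: write the first argument, then the rest.
template<typename T, typename... Rest>
void marshal(OStream &os, const T &first, const Rest &... rest) {
    os << first;
    marshal(os, rest...);
}

// Packs all arguments into one message buffer.
template<typename... Args>
std::vector<unsigned char> make_vmsg(const Args &... args) {
    OStream os;
    marshal(os, args...);
    return os.bytes();
}

// Usage: pack three ints, then extract them in the same order.
int demo_roundtrip() {
    IStream is(make_vmsg(1, 2, 3));
    int a = 0, b = 0, c = 0;
    is >> a >> b >> c;
    return a * 100 + b * 10 + c;
}
```

Sender and receiver must agree on the order and types of the values, since the buffer carries no type information in this sketch.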

Filesystem: m3fs
[Figure, shown in two build steps: kernel, application, and m3fs PEs, each with memory and a DTU, plus DRAM. The DTUs carry send (S) and receive (R) endpoints; the second step adds a memory endpoint (M).]

Performance Comparison
[Figure: bar chart of time in M cycles (axis 0 to 7) for M3 and Lx on the tar, untar, find, and sqlite benchmarks, with each bar split into App, Xfers, and OS.]

Summary
• M3 uses a HW/SW co-design
• DTU creates a common interface for all CUs
• M3 kernel controls DTUs remotely
• This allows all CUs to be treated as first-class citizens
