Ray - PowerPoint PPT Presentation

Ray: A Distributed Framework for Emerging AI Applications
R. Nishihara, P. Moritz, et al.
University of California, Berkeley
Presented by: Devin Taylor
October 17, 2018

Table of contents
1. Introduction: Problem Statement, Background, Related Work
2. Methodology: Overview, Programming model, Architecture
3. Analysis: Results, Critical Analysis
4. Conclusion

Introduction

Problem Statement
There is a need for a computation framework that supports heterogeneous and dynamic computation graphs while handling millions of tasks per second with millisecond-level latencies.

Background
• High-performance, distributed execution framework for Python
• Key features include:
  • Heterogeneous, concurrent computations
  • Dynamic task graphs
  • High-throughput and low-latency scheduling
  • Transparent fault tolerance
  • Task-parallel and actor programming models
  • Horizontal scalability
• Applications:
  • Reinforcement learning
  • Hyperparameter tuning
  • Distributed training

Related Work
• CIEL [1], Dask [2]
  • Support dynamic task graphs
  • Centralized scheduling architecture
  • No actor abstraction
• MapReduce [3]
  • Implements the bulk synchronous parallel (BSP) execution model
  • No actor abstraction
  • Centralized scheduling architecture
• TensorFlow Fold [4], MXNet [5]
  • Cannot modify the DAG in response to task progress, task completion times, or faults

Methodology

Overview
Goal:
• Implement a distributed framework suitable for modern AI applications
Requirements:
• Flexibility: in functionality, task duration, and resource types
• Performance: high-throughput, low-latency scheduling
• Ease of development

Methodology - Programming model
• Remote functions return futures, which are resolved with get() and wait()
• Resource requirements for remote functions can be specified at run time
• Nested remote functions are supported: a remote function can invoke other remote functions
• Actor abstraction: adds stateful edges to the computation graph, alongside its data and control edges
Figure 1: Nested remote functions
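To make the programming model concrete, the sketch below imitates it in plain Python with `concurrent.futures`. It is a toy, not Ray's implementation: `remote`, `get`, `wait`, `square`, `sum_of_squares`, and `Counter` are hypothetical names that loosely mirror Ray's real `@ray.remote`, `f.remote(...)`, `ray.get`, and `ray.wait`. The `Counter` actor runs its methods on a single-thread executor, which reproduces the ordering that stateful edges capture between successive calls on one actor.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor
from concurrent.futures import wait as cf_wait

# Toy runtime: one shared thread pool stands in for a cluster of workers.
# A bounded pool can deadlock if nesting is deeper than max_workers allows.
_POOL = ThreadPoolExecutor(max_workers=8)


def remote(fn):
    """Attach fn.remote(...), which submits fn to the pool and returns a future."""
    def submit(*args, **kwargs):
        return _POOL.submit(fn, *args, **kwargs)
    fn.remote = submit
    return fn


def get(futures):
    """Block until the future (or list of futures) resolves; return the value(s)."""
    if isinstance(futures, list):
        return [f.result() for f in futures]
    return futures.result()


def wait(futures, num_returns=1):
    """Block until at least num_returns futures finish; return (ready, not_ready)."""
    done, pending = set(), set(futures)
    while len(done) < num_returns and pending:
        finished, pending = cf_wait(pending, return_when=FIRST_COMPLETED)
        done |= finished
    ready = [f for f in futures if f in done]
    not_ready = [f for f in futures if f not in done]
    return ready, not_ready


@remote
def square(x):
    return x * x


@remote
def sum_of_squares(xs):
    # Nested remote functions: a task submits child tasks and waits on them.
    return sum(get([square.remote(x) for x in xs]))


class Counter:
    """Toy actor: its state lives in one object, and a single-thread executor
    runs method calls one at a time in submission order, which is the ordering
    that stateful edges record between successive calls on the same actor."""

    def __init__(self):
        self._value = 0
        self._executor = ThreadPoolExecutor(max_workers=1)

    def _increment(self, by):
        self._value += by
        return self._value

    def increment(self, by=1):
        return self._executor.submit(self._increment, by)


if __name__ == "__main__":
    print(get(sum_of_squares.remote([1, 2, 3])))          # 14
    counter = Counter()
    print(get([counter.increment() for _ in range(3)]))    # [1, 2, 3]
    ready, not_ready = wait([square.remote(i) for i in range(4)], num_returns=2)
    print(len(ready) >= 2)                                 # True
```

Unlike this toy, Ray places tasks on separate processes and machines and passes results through its object store; the sketch only shows the shape of the API: calls return futures immediately, and values are fetched explicitly.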

Methodology - Architecture
• Application layer
  • Driver: executes the user program
  • Worker: executes remote functions
  • Actor: executes the methods it exposes
• System layer
  • Global Control Store (GCS)
  • Bottom-up distributed scheduler
  • In-memory distributed object store (Apache Arrow)
Figure 2: Architecture overview

Architecture - Global Control Store (GCS)
• Stores all metadata and state information
• Provides a pub-sub infrastructure for internal communication
• Lets the other system components be stateless, which makes horizontal scaling easy
• Scales through sharding
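The GCS bullets describe components that keep no state of their own and instead read and write a sharded store that also pushes updates to subscribers. A minimal sketch of that idea follows, assuming hash-partitioning by key and in-process callbacks for pub-sub; `ToyGCS`, `Shard`, `put`, `get`, and `subscribe` are hypothetical names, and replication is left out entirely.

```python
import hashlib
from collections import defaultdict
from typing import Any, Callable, Dict, List

Callback = Callable[[str, Any], None]


class Shard:
    """One shard of the toy control store: a key-value table plus subscribers."""

    def __init__(self) -> None:
        self.table: Dict[str, Any] = {}
        self.subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def put(self, key: str, value: Any) -> None:
        self.table[key] = value
        # Pub-sub: push the update to every component watching this key.
        for callback in self.subscribers[key]:
            callback(key, value)


class ToyGCS:
    """Hypothetical sharded control store; replication is deliberately omitted."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard_for(self, key: str) -> Shard:
        # Stable hash (not Python's salted hash()) so every client agrees on placement.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key: str, value: Any) -> None:
        self._shard_for(key).put(key, value)

    def get(self, key: str) -> Any:
        return self._shard_for(key).table.get(key)

    def subscribe(self, key: str, callback: Callback) -> None:
        self._shard_for(key).subscribers[key].append(callback)


if __name__ == "__main__":
    gcs = ToyGCS(num_shards=4)
    gcs.subscribe("object:42", lambda k, v: print("update", k, v))
    gcs.put("object:42", ["node-1"])   # prints: update object:42 ['node-1']
    print(gcs.get("object:42"))        # ['node-1']
```

Because a restarted scheduler or worker could rebuild its view by reading the store and re-subscribing, the components themselves need not hold durable state, and adding shards spreads keys across more servers.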

Architecture - Bottom-up distributed scheduler
• A global scheduler plus per-node local schedulers
• Tasks are submitted to the node's local scheduler first
• The global scheduler is invoked when:
  • the local node is overloaded,
  • the local node cannot satisfy the task's requirements, or
  • the task's inputs are remote
Figure 3: Bottom-up distributed scheduler
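The bottom-up policy can be sketched as a local-first decision with a global fallback. This is a toy under stated assumptions, not Ray's scheduler: the overload test (`max_queue`), reading "inputs are remote" as "this node holds none of the task's inputs", and the global tie-break (prefer a node holding the inputs, then the shortest queue) are hypothetical simplifications, and `Task`, `Node`, `local_schedule`, `global_schedule`, and `submit` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    name: str
    resources: Dict[str, float]                              # e.g. {"CPU": 1, "GPU": 1}
    input_locations: Set[str] = field(default_factory=set)   # nodes holding its inputs


@dataclass
class Node:
    name: str
    capacity: Dict[str, float]
    queue: List[Task] = field(default_factory=list)
    max_queue: int = 4                                       # hypothetical overload threshold

    def can_satisfy(self, task: Task) -> bool:
        return all(self.capacity.get(r, 0) >= amount for r, amount in task.resources.items())

    def overloaded(self) -> bool:
        return len(self.queue) >= self.max_queue


def local_schedule(node: Node, task: Task) -> bool:
    """Local scheduler: keep the task unless one of the forwarding conditions holds."""
    inputs_remote = bool(task.input_locations) and node.name not in task.input_locations
    if node.overloaded() or not node.can_satisfy(task) or inputs_remote:
        return False                                         # forward to the global scheduler
    node.queue.append(task)
    return True


def global_schedule(nodes: List[Node], task: Task) -> Optional[Node]:
    """Global scheduler: prefer a node holding the inputs, then the shortest queue."""
    candidates = [n for n in nodes if n.can_satisfy(task) and not n.overloaded()]
    if not candidates:
        return None                                          # no feasible node right now
    best = min(candidates, key=lambda n: (n.name not in task.input_locations, len(n.queue)))
    best.queue.append(task)
    return best


def submit(nodes: List[Node], local: Node, task: Task) -> str:
    """Bottom-up submission: try the submitting node's local scheduler first."""
    if local_schedule(local, task):
        return local.name
    chosen = global_schedule(nodes, task)
    return chosen.name if chosen else "pending"


if __name__ == "__main__":
    a, b = Node("A", {"CPU": 4}), Node("B", {"CPU": 4, "GPU": 1})
    nodes = [a, b]
    print(submit(nodes, a, Task("t1", {"CPU": 1})))          # A: handled locally
    print(submit(nodes, a, Task("t2", {"GPU": 1})))          # B: A has no GPU
    print(submit(nodes, a, Task("t3", {"CPU": 1}, {"B"})))   # B: input lives on B
```

Resources are treated as static capacities and are never reserved, so the sketch only shows where a task would be placed. The design point it illustrates is that most tasks never reach the global scheduler, which keeps it off the critical path.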

Architecture - Overview
Figure 4: Overview of task execution
Figure 5: Overview of result retrieval

Analysis

Results - System
Figure 6: End-to-end scalability
• Linear scaling
• 1.8M tasks per second
Figure 7: Object store performance
• Peak throughput > 15 GB/s
• Peak IOPS: 18K (56 µs per operation)
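The two small-object figures describe the same measurement from two angles. Assuming a single client issues operations back to back, its throughput is the reciprocal of the per-operation latency:

```latex
\text{IOPS} \approx \frac{1}{56\ \mu\text{s}}
  = \frac{1}{56 \times 10^{-6}\ \text{s}}
  \approx 17{,}857\ \text{operations/s}
  \approx 18\text{K}
```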

Results - RL Application
• Evolution Strategies (ES), Humanoid-v1 task
  • Scaled to 8192 cores, vs. 1024 for the reference implementation
  • 3.7 minutes vs. 10 minutes
• Proximal Policy Optimization (PPO)
  • Benefits from the ability to specify per-task resource requirements
Figure 8: ES implementation
Figure 9: PPO application
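ES scales well because each iteration is an embarrassingly parallel batch: many randomly perturbed copies of the parameters are scored independently, then combined into one update. A minimal single-machine sketch of the basic ES gradient estimator follows. It assumes a toy quadratic `fitness` in place of the Humanoid-v1 return, a thread pool in place of Ray tasks, and standardized returns rather than any particular fitness-shaping scheme; all names and hyperparameters are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def fitness(theta: List[float]) -> float:
    # Hypothetical stand-in for an episode return; its maximum is at (3, -2).
    return -((theta[0] - 3.0) ** 2 + (theta[1] + 2.0) ** 2)


def evaluate(theta: List[float], sigma: float, seed: int) -> Tuple[List[float], float]:
    """One worker task: rebuild Gaussian noise from a seed, perturb theta, score it."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in theta]
    return eps, fitness([t + sigma * e for t, e in zip(theta, eps)])


def evolution_strategies(theta, iterations=300, population=40, sigma=0.1,
                         lr=0.005, workers=8, seed=0):
    master = random.Random(seed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            seeds = [master.randrange(2**31) for _ in range(population)]
            # Evaluate the whole population in parallel (the part Ray distributes).
            results = list(pool.map(lambda s: evaluate(theta, sigma, s), seeds))
            scores = [score for _, score in results]
            mean = sum(scores) / population
            # Standardize returns so the step size does not depend on the reward scale.
            std = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5 or 1.0
            # ES gradient estimate: noise vectors weighted by standardized return.
            grad = [sum(eps[i] * (score - mean) / std for eps, score in results)
                    / (population * sigma) for i in range(len(theta))]
            theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    print(evolution_strategies([0.0, 0.0]))   # close to [3.0, -2.0]
```

Passing seeds instead of noise vectors keeps the messages between the driver and workers small, since each worker can regenerate the same noise locally.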

Critical Analysis
• Fault tolerance: potentially redundant, given the statistical properties of most AI algorithms
• Specifying resource requirements: users may not always understand their tasks' requirements correctly
• Replication of the GCS: the GCS is a single point of failure, so it must be replicated for fault tolerance

Conclusion

Conclusion
• Dynamic task graphs, the GCS, the bottom-up distributed scheduler, and the actor programming model make Ray a unique contribution
• Its scalability and performance make Ray useful for modern AI applications
• Minor criticisms concern potentially redundant architectural features

References
[1] Derek G. Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy, and Steven Hand. CIEL: a universal execution engine for distributed data-flow computing. In Proc. 8th ACM/USENIX Symposium on Networked Systems Design and Implementation, pages 113–126, 2011.
[2] Matthew Rocklin. Dask: Parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference, pages 130–136, 2015.

[3] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[4] Moshe Looks, Marcello Herreshoff, DeLesley Hutchins, and Peter Norvig. Deep learning with dynamic computation graphs. arXiv preprint arXiv:1702.02181, 2017.
[5] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274, 2015.
