Steady-state scheduling on CELL
Mathias Jacquelin, joint work with Matthieu Gallet, Loris Marchal and Yves Robert
INRIA GRAAL project-team, LIP (ENS-Lyon, CNRS, INRIA), École Normale Supérieure de Lyon, France
"Scheduling for large-scale systems" workshop, Knoxville, May 14, 2009

Outline
◮ Introduction
  ◮ Steady-state scheduling
  ◮ CELL
◮ Platform and Application Modeling
◮ Mapping the Application
◮ Practical Steady-State on CELL
  ◮ Preprocessing of the schedule
  ◮ State machine of the application
◮ Preliminary results
◮ Conclusion and Future works

Motivation
◮ Multicore architectures: a new opportunity to test the scheduling strategies designed in the GRAAL team
◮ Our trademark: efficient scheduling on heterogeneous platforms
◮ Most multicore architectures are homogeneous and regular
◮ Need for tailored algorithms (linear algebra, ...)
◮ Emerging heterogeneous multicore:
  ◮ Dedicated processing units on GPUs
  ◮ Mixed systems: processor + accelerator
◮ This study: steady-state scheduling on CELL (bounded heterogeneity) to demonstrate the usefulness of complex (static) scheduling techniques
◮ Ongoing work: only preliminary results

Introduction: Steady-state Scheduling
Rationale:
◮ A pipelined application:
  ◮ Simple chain
  ◮ More complex application (Directed Acyclic Graph)
◮ Objective: optimize the throughput of the application (number of input files processed per second)
◮ Today: simple case where each task has to be mapped on one single resource
[Figure: a chain of tasks T1, T2, T3 and a task graph T1, ..., T9 whose tasks are mapped onto processors P1, ..., P4]

CELL brief introduction
◮ Multicore heterogeneous processor
◮ Accelerator extension to Power architecture
[Figure: PPE 0, SPE 0 to SPE 7 and the memory, all connected through the EIB]
◮ 1 PPE core
  ◮ VMX unit
  ◮ L1, L2 cache
  ◮ 2-way SMT
◮ 8 SPEs
  ◮ 128-bit SIMD instruction set
  ◮ Local store: 256 KB
  ◮ Dedicated asynchronous DMA engine
◮ Element Interconnect Bus (EIB)
  ◮ 200 GB/s bandwidth
◮ 25 GB/s bandwidth

Platform modeling
Simple CELL modeling:
◮ 1 PPE and 8 SPEs: 9 processing elements $P_1, \dots, P_9$, with unrelated speeds
◮ Each processing element accesses the communication bus with a (bidirectional) bandwidth $b = 25$ GB/s
◮ The bus is able to route all concurrent communications without contention (in a first step)
◮ Due to the limited size of the DMA stack on each SPE:
  ◮ Each SPE can perform at most 16 simultaneous DMA operations
  ◮ The PPE can perform at most 8 simultaneous DMA operations to/from a given SPE
◮ Linear cost communication model: a data item of size $S$ is sent or received in time $S/b$
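
The linear cost model and the two DMA limits above can be written down in a few lines of Python. This is only a sketch: the function names and the dictionaries used to count concurrent transfers are illustrative assumptions, not part of the talk, and 1 GB is taken to be 10^9 bytes.

```python
# Bandwidth of each processing element's link to the bus (from the model above).
BANDWIDTH_BYTES_PER_S = 25e9  # assumption: 1 GB = 10**9 bytes

MAX_DMA_PER_SPE = 16      # simultaneous DMA operations performed by one SPE
MAX_PPE_DMA_PER_SPE = 8   # simultaneous PPE DMA operations to/from one SPE


def transfer_time(size_bytes, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Linear cost model: a data item of size S takes S / b seconds."""
    return size_bytes / bandwidth


def dma_limits_respected(spe_dma_counts, ppe_dma_counts):
    """Check both DMA limits of the model.

    spe_dma_counts: {spe_name: simultaneous DMA operations performed by that SPE}
    ppe_dma_counts: {spe_name: simultaneous PPE DMA operations to/from that SPE}
    (Both dictionaries are a hypothetical bookkeeping format.)
    """
    spe_ok = all(n <= MAX_DMA_PER_SPE for n in spe_dma_counts.values())
    ppe_ok = all(n <= MAX_PPE_DMA_PER_SPE for n in ppe_dma_counts.values())
    return spe_ok and ppe_ok
```

With these values, a 1 MB file crosses one link in `transfer_time(1e6)` seconds, that is 40 microseconds, and a schedule step in which one SPE would issue 17 concurrent DMA operations is rejected by `dma_limits_respected`.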

Application modeling
The application is described by a directed acyclic graph:
◮ Tasks $T_1, \dots, T_n$
◮ Processing time of task $T_k$ on $P_i$ is $t_i(k)$
◮ If there is a dependency $T_k \to T_l$, $\mathit{data}_{k,l}$ is the size of the file produced by $T_k$ and needed by $T_l$
◮ If $T_k$ is an input task, it reads $\mathit{read}_k$ bytes from main memory
◮ If $T_k$ is an output task, it writes $\mathit{write}_k$ bytes to main memory
[Figure: example task graph with tasks T1, ..., T9]
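
To make the notation concrete, the sketch below stores the quantities $t_i(k)$, $\mathit{data}_{k,l}$, $\mathit{read}_k$ and $\mathit{write}_k$ in plain dictionaries and estimates the time spent per input file for a given mapping. The `period` function and its inputs are hypothetical. It relies on assumptions that are not stated in the talk: computation, incoming transfers and outgoing transfers overlap perfectly on each processing element; two tasks on the same processing element exchange data at no cost; main-memory reads and writes use the same 25 GB/s link; and the local store size and DMA limits are ignored.

```python
from collections import defaultdict

B = 25e9  # bytes per second in each direction, assuming 1 GB = 10**9 bytes


def period(mapping, t, data, read=None, write=None, b=B):
    """Estimate the time per input file for a task-to-PE mapping.

    mapping: {task: pe}          each task runs on exactly one PE
    t:       {(pe, task): secs}  processing time t_i(k) of task k on PE i
    data:    {(k, l): bytes}     data_{k,l} for each dependency k -> l
    read:    {task: bytes}       read_k for input tasks
    write:   {task: bytes}       write_k for output tasks
    """
    read, write = read or {}, write or {}
    compute = defaultdict(float)   # seconds of computation per PE
    incoming = defaultdict(float)  # bytes received per PE
    outgoing = defaultdict(float)  # bytes sent per PE

    for task, pe in mapping.items():
        compute[pe] += t[(pe, task)]
        incoming[pe] += read.get(task, 0)
        outgoing[pe] += write.get(task, 0)

    for (k, l), size in data.items():
        src, dst = mapping[k], mapping[l]
        if src != dst:  # assumption: same-PE exchanges are free
            outgoing[src] += size
            incoming[dst] += size

    # Assumption: full overlap, so each PE is bound by its busiest resource.
    return max(max(compute[pe], incoming[pe] / b, outgoing[pe] / b)
               for pe in set(mapping.values()))


# Hypothetical three-task chain T1 -> T2 -> T3 with made-up costs.
mapping = {"T1": "PPE", "T2": "SPE0", "T3": "SPE0"}
t = {("PPE", "T1"): 0.002, ("SPE0", "T2"): 0.001, ("SPE0", "T3"): 0.003}
data = {("T1", "T2"): 25e6, ("T2", "T3"): 1e6}
p = period(mapping, t, data, read={"T1": 50e6}, write={"T3": 1e6})
throughput = 1.0 / p  # input files per second under the assumptions above
```

In this made-up example SPE0 is the bottleneck: it computes for about 4 ms per file, so the estimated throughput is about 250 files per second. Moving T2 to another SPE would change both the compute load and the amount of data crossing the bus, which is the trade-off a mapping has to balance.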
