Application Model

◦ Directed Acyclic Task Graph
◦ Mono-rate (or at least harmonic rates)
◦ Fixed mapping and execution order

Each task τ_i has:
◦ a Processor Demand and a Memory Demand
◦ a release date (rel_i) and a response time (R_i)

[Figure: example task graph (τ_1 ... τ_6) mapped on nodes N_A ... N_F, and a Gantt chart over [0, 160] showing rel_i, R_i, and the interference suffered by a task]

⇒ Find R_i (including the interference)
⇒ Find rel_i respecting the precedence constraints

8 / 21
Outline

1 Motivation and Context
2 Models Definition
    Architecture Model
    Execution Model
    Application Model
3 Multicore Response Time Analysis of SDF Programs
4 Evaluation
5 Conclusion and Future Work

9 / 21
Response Time Analysis

R = PD + I_BUS(R) + I_PROC(R) + I_DRAM(R)

◦ R: Response Time
◦ PD: Processor Demand
◦ I_BUS: bus interference (given a model of the bus arbiter)
◦ I_PROC: interference from preempting tasks (no preemption, so I_PROC = 0)
◦ I_DRAM: interference from DRAM refreshes (out of scope, so I_DRAM = 0)

◦ Recursive formula ⇒ fixed-point algorithm.
◦ Multiple shared resources (memory banks):

    I_BUS(R) = Σ_{b ∈ B} I_BUS_b(R),   where B is the set of memory banks

⇒ Requires a model of the bus arbiter

10 / 21
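As a companion to the recursive formula, a minimal sketch of the fixed-point computation for a single task. The interference function `ibus` and the divergence bound are placeholders for illustration, not the actual MPPA bus model (that model is detailed on the next slide).

    # Minimal sketch (Python) of the fixed-point response-time computation
    # for one task. `ibus(window)` stands for any monotonic bus-interference
    # bound; `bound` is an arbitrary cut-off against divergence.
    def response_time(pd, ibus, bound=10**9):
        r = pd                       # R^0: demand in isolation
        while True:
            r_next = pd + ibus(r)    # R^{l+1} = PD + I_BUS(R^l)
            if r_next == r:          # fixed point reached
                return r
            if r_next > bound:       # give up: not schedulable within bound
                return None
            r = r_next

    # Hypothetical usage, with a made-up interference function:
    # response_time(pd=40, ibus=lambda r: 10 * min(4, r // 40))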
Model of the MPPA Bus

[Figure: multi-level arbiter in front of a shared memory bank — the 16 cores P_0 ... P_15 (group G1) go through a 16 → 1 round-robin, the RM and DSU (group G2) through a 2 → 1 round-robin, and the NoC Rx/Tx (group G3) enter a final 3 → 1 fixed-priority stage, with G3 having the higher priority. The task of interest runs on P_0.]

I_BUS: delay caused by the task's own accesses plus the concurrent ones.

◦ S_i^b: number of accesses of task τ_i to bank b (the Memory Demand to bank b)
◦ A_i^{y,b}: number of concurrent accesses from core y to bank b
  ◦ only the accesses overlapping the task of interest count
  ◦ A_i^{y,b} depends on rel_i and R_i

    Lv1 = S_i^b
    Lv2 = Lv1 + Σ_{y=1}^{15} min(A_i^{y,b}, Lv1)
    Lv3 = Lv2 + min(A_i^{G2,b}, Lv2)
    Lv4 = Lv3 + A_i^{G3,b}
    I_BUS_b = Lv4 × BusDelay

11 / 21
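A small sketch of how the per-bank bound above could be evaluated. The argument names are assumptions made for illustration; the default bus delay of 10 cycles follows the assumption stated later in the talk.

    # Sketch (Python) of the per-bank bound I_BUS_b from the level equations.
    # s_b        : S_i^b, the task's own accesses to bank b (Memory Demand)
    # a_cores    : list of A_i^{y,b} for the 15 other cores (G1)
    # a_g2, a_g3 : concurrent accesses of G2 (RM, DSU) and G3 (NoC Rx/Tx)
    def ibus_bank(s_b, a_cores, a_g2, a_g3, bus_delay=10):
        lv1 = s_b
        lv2 = lv1 + sum(min(a, lv1) for a in a_cores)  # 16 -> 1 round-robin
        lv3 = lv2 + min(a_g2, lv2)                     # 2 -> 1 round-robin
        lv4 = lv3 + a_g3                               # G3 has the higher priority
        return lv4 * bus_delay

    # The total bus interference is the sum over all banks (previous slide):
    # I_BUS = sum of ibus_bank(...) over b in B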
Response Time Analysis with Dependencies

[Figure: example mapping — τ_0, τ_1, τ_2 on PE_0; τ_3 on PE_1; τ_4, τ_5 on PE_2]

1 Start with initial release dates rel_i^0.
2 Compute the response times (WCRT analysis, first fixed point):
      for all i do
          R_i^{l+1} ← PD_i + I_BUS(R_i^l, rel_i)
      end for
      ... repeated while R_i^{l+1} ≠ R_i^l, until a fixed point is reached.
3 Update the release dates:
      for all i do
          rel_i ← latest finish time of all the dependencies of τ_i
      end for
4 Repeat from step 2 with the new rel_i until no release date changes
  (another fixed-point iteration), then return (rel_i, R_i).

A sketch of the whole procedure is given below.

12 / 21
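A hedged sketch of the double fixed point, assuming the task data is given as plain dictionaries (pd, preds, rel0) and that ibus(i, R, rel) implements the bus model of the previous slide; this illustrates the algorithm's structure, not the authors' implementation.

    # Sketch (Python) of the double fixed-point algorithm.
    # pd[i]    : processor demand of task i
    # preds[i] : set of tasks that i depends on
    # rel0[i]  : initial release date of i
    # ibus     : interference bound, e.g. built from ibus_bank above
    def analyse(tasks, pd, preds, rel0, ibus):
        rel = dict(rel0)                          # 1) initial release dates
        while True:
            R = {}
            for i in tasks:                       # 2) inner fixed point (WCRT)
                r = pd[i]
                while True:
                    r_next = pd[i] + ibus(i, r, rel)
                    if r_next == r:
                        break
                    r = r_next
                R[i] = r
            new_rel = {i: max([rel0[i]] +         # 3) latest finish time of
                              [rel[j] + R[j] for j in preds[i]])  # all deps
                       for i in tasks}
            if new_rel == rel:                    # 4) outer fixed point
                return rel, R
            rel = new_rel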
Convergence Toward a Fixed-point

◦ Convergence of the 1st fixed-point iteration (response times):
  ◦ monotonic and bounded ✓
◦ Convergence of the 2nd fixed-point iteration (release dates):
  ◦ no monotonicity: R_i and rel_i may grow or shrink at each iteration ?

Theorem
At each iteration, at least one task finds its final release date.

Full proof in our technical report: http://www-verimag.imag.fr/TR/TR-2016-1.pdf

13 / 21
Outline

1 Motivation and Context
2 Models Definition
    Architecture Model
    Execution Model
    Application Model
3 Multicore Response Time Analysis of SDF Programs
4 Evaluation
5 Conclusion and Future Work

14 / 21
Evaluation: ROSACE Case Study¹

[Figure: ROSACE task graph — sensor inputs (h, az, vz, q, va) at 200 Hz feed filters at 100 Hz, which feed the altitude, vz_control, and va_control tasks at 50 Hz — and its schedule over one hyper-period on cores P_0 ... P_4, together with the NoC Rx (receive, 200 Hz) and Tx (transmit, 50 Hz) tasks]

◦ Flight management system controller
◦ Receives from sensors and transmits to actuators
◦ Assumptions:
  ◦ Tasks are mapped on 5 cores
  ◦ Debug Support Unit is disabled
  ◦ Context switches are over-approximated constants

¹ Pagetti et al., RTAS 2014

15 / 21
Evaluation: ROSACE Case Study

Task         Processor Demand (cycles)   Memory Demand (accesses)
altitude     275                         22
az_filter    274                         22
h_filter     326                         24
va_control   303                         24
va_filter    301                         23
vz_control   320                         25
vz_filter    334                         25

Table: Task profiles of the FMS controller

◦ Profiles obtained from measurements
◦ Memory Demand: data and instruction cache misses + communications
◦ Moreover:
  ◦ NoC Rx: writes 5 words
  ◦ NoC Tx: reads 2 words

⇒ Experiments: find the smallest schedulable hyper-period

16 / 21
Evaluation: Experiments

[Plot: smallest schedulable hyper-period (processor cycles, 0–16000) for the MPPA and RR bus policies, with 1 and 5 memory banks, under five configurations — E5: Pessimistic, E4: 1-Phase (w/o release), E3: 2-Phase (w/o release), E2: 1-Phase, E1: 2-Phase]

◦ Pessimistic assumption: high-priority tasks are bounded by 1 access per bank
◦ 1-Phase and 2-Phase refer to two memory access pattern models; phases are modeled as sub-tasks
◦ E5: all accesses interfere
◦ E4, E3: the release dates are not used
◦ E2, E1: our approach, using the release dates

17 / 21
Evaluation: Experiments

Taking the memory banks into account improves the analysis by a factor in [1.77, 2.52].

[Plot: smallest schedulable hyper-period (processor cycles) for the MPPA and RR bus policies, with 1 and 5 memory banks, under configurations E1–E5]

Speedup factors:

        E5/E1   E5/E2   E3/E1   E4/E2   E2/E1   E4/E3
MPPA    4.15    4.12    1.68    1.29    ~1.01   0.77
RR      3.3     3.29    1.24    1.13    ~1.01   0.91

18 / 21
Outline

1 Motivation and Context
2 Models Definition
    Architecture Model
    Execution Model
    Application Model
3 Multicore Response Time Analysis of SDF Programs
4 Evaluation
5 Conclusion and Future Work

19 / 21
Conclusion

◦ A response time analysis of SDF on the Kalray MPPA-256
◦ Given:
  ◦ Task profiles
  ◦ Mapping of tasks
  ◦ Execution order
◦ We compute, using a model of the multi-level arbiter and a double fixed-point algorithm:
  ◦ Tight response times taking the interference into account
  ◦ Release dates respecting the dependency constraints
◦ Not restricted to SDF

20 / 21
Future Work

[Figure: MPPA compute cluster — 16 cores P_0 ... P_15, the Resource Manager (RM), the Debug Support Unit (DSU), the NoC Rx/Tx interfaces, and 2 × 8 shared memory banks]

◦ Model of the Resource Manager: tighter estimation of context switches and other interrupts
◦ Model of the NoC accesses: use the output of any NoC analysis
◦ Memory access pipelining (current assumption: a bus delay of 10 cycles per access)
◦ Model blocking and non-blocking accesses (reads are blocking, writes are non-blocking)

Questions?

21 / 21
BACKUP
Multicore Response Time Analysis¹

Example: fixed-priority bus arbiter, PE1 > PE0, bus access delay = 10.

[Figure: timeline over [0, 160] — T_1 runs on PE1 (4 instances, 2 accesses each); the task of interest T_0 runs on PE0 with 3 accesses]

◦ Task of interest running on PE0:
  ◦ R^0 = 10 + 3 × 10 (response time in isolation)
  ◦ R^1 = 10 + 3 × 10 + 2 × 10 = 60

¹ Altmeyer et al., RTNS 2015
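One possible reading of the two values above as iterations of the fixed point from slide 10 (the continuation beyond R^1 is an assumption, not shown on the slide):

    R^0 = 10 + 3 × 10 = 40    the processor demand plus the task's own 3 accesses (isolation)
    R^1 = 40 + 2 × 10 = 60    one instance of T_1, i.e. 2 higher-priority accesses, overlaps the window [0, 40]
    ... and the iteration keeps enlarging the window until R^{l+1} = R^l.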