Parallelism in Dijkstra’s algorithm?
[Figure: task graph of tasks A, C, B, E, D, E, B, D on a 0 to 8 axis, with data dependences marked; order = distance from source node]
[Figure: "Dijkstra on USA-E", speedup (1 to 512) versus core count (1c to 256c), showing non-speculative Dijkstra performance]
[Figure: a valid out-of-order schedule of the tasks over time]
HARMONIZING SPECULATIVE AND NON-SPECULATIVE EXECUTION IN ARCHITECTURES FOR ORDERED PARALLELISM
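As a point of reference for the task graph, the following is a minimal serial sketch in which each task is a (distance, vertex) pair and tasks are processed strictly in increasing distance order. The `Graph` layout and the name `dijkstraSerial` are assumptions made for this illustration and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical adjacency list: graph[u] holds (neighbor, edge weight) pairs.
using Graph = std::vector<std::vector<std::pair<int, uint64_t>>>;

constexpr uint64_t UNSET = std::numeric_limits<uint64_t>::max();

// Processes (distance, vertex) tasks one at a time, smallest distance first.
std::vector<uint64_t> dijkstraSerial(const Graph& graph, int source) {
    assert(source >= 0 && static_cast<std::size_t>(source) < graph.size());
    std::vector<uint64_t> distance(graph.size(), UNSET);

    using Task = std::pair<uint64_t, int>;  // (order = distance, vertex)
    std::priority_queue<Task, std::vector<Task>, std::greater<Task>> tasks;
    tasks.push({0, source});

    while (!tasks.empty()) {
        auto [dist, v] = tasks.top();
        tasks.pop();
        if (distance[v] != UNSET) continue;  // vertex already reached by an earlier task
        distance[v] = dist;
        for (auto [n, w] : graph[v]) tasks.push({dist + w, n});  // one child task per neighbor
    }
    return distance;
}
```

A vertex can receive several tasks (one per incoming edge that gets relaxed); only the earliest one sets its distance, and the later ones do nothing. The serial loop never overlaps two tasks, which is the restriction the slide questions.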
Dijkstra as a Swarm program [MICRO’15]
void dijkstraTask(Timestamp dist, Vertex* v) {
  if (v->distance == UNSET) {
    v->distance = dist;
    for (Vertex* n : v->neighbors) {
      Timestamp nDist = dist + weight(v, n);
      // swarm::enqueue takes a function pointer, a timestamp, and the task arguments
      swarm::enqueue(dijkstraTask, nDist, n);
    }
  }
}
swarm::enqueue(dijkstraTask, 0, sourceVertex);
swarm::run();
Implicit parallelism
No explicit synchronization
Conveys new work to hardware as soon as possible
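To try the programming model without Swarm hardware, a software stand-in can run tasks one at a time in timestamp order. The sketch below is not the Swarm runtime: the namespace `mock_swarm`, the `Vertex` layout with an `edges` field, and the rule that children never carry an earlier timestamp than the running task are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

namespace mock_swarm {  // hypothetical serial stand-in, not the real Swarm API

using Timestamp = uint64_t;

struct Task {
    Timestamp ts;
    uint64_t seq;  // tie-breaker: equal timestamps run in creation order
    std::function<void()> body;
};

struct Later {
    bool operator()(const Task& a, const Task& b) const {
        return a.ts != b.ts ? a.ts > b.ts : a.seq > b.seq;
    }
};

inline std::priority_queue<Task, std::vector<Task>, Later> pending;
inline uint64_t nextSeq = 0;

// Mirrors the slide's argument order: function pointer, timestamp, arguments.
template <typename Fn, typename... Args>
void enqueue(Fn fn, Timestamp ts, Args... args) {
    pending.push(Task{ts, nextSeq++, [=] { fn(ts, args...); }});
}

// Runs every task, earliest timestamp first; tasks may enqueue more tasks.
inline void run() {
    Timestamp last = 0;
    while (!pending.empty()) {
        Task t = pending.top();
        pending.pop();
        assert(t.ts >= last);  // assumed rule: children are never earlier than their parent
        last = t.ts;
        t.body();
    }
}

}  // namespace mock_swarm

constexpr uint64_t UNSET = std::numeric_limits<uint64_t>::max();

struct Vertex {
    uint64_t distance = UNSET;
    std::vector<std::pair<Vertex*, uint64_t>> edges;  // (neighbor, weight), assumed layout
};

void dijkstraTask(mock_swarm::Timestamp dist, Vertex* v) {
    if (v->distance == UNSET) {
        v->distance = dist;
        for (auto& [n, w] : v->edges)
            mock_swarm::enqueue(dijkstraTask, dist + w, n);
    }
}
```

Calling `mock_swarm::enqueue(dijkstraTask, 0, &source)` followed by `mock_swarm::run()` fills in every reachable vertex's distance. The mock only reproduces the ordering semantics; it has no parallelism and no speculation.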
Swarm microarchitecture [MICRO’15]
Swarm executes all tasks speculatively and out of order
Large hardware task queues
Scalable ordered speculation
Scalable ordered commits
Efficiently supports thousands of tiny speculative tasks
[Figure: 64-tile, 256-core chip with Mem / IO blocks; tile organization: four cores, each with L1I/D caches, plus an L2, an L3 slice, a router, and a task unit]
Dijkstra’s algorithm has speculative parallelism
[Figure: task graph of tasks A, C, B, E, D, E, B, D on a 0 to 8 axis; order = distance from source node; inset schedule of the tasks over time]
[Figure: speedup versus core count (1c to 256c) for "Dijkstra on USA-E" (y-axis up to 512) and "Dijkstra on cage14" (y-axis up to 256), comparing non-speculative and all-speculative [MICRO’15] Dijkstra performance; a 20% annotation appears alongside the cage14 plot]
All-or-nothing speculation unduly burdens programmers
Dijkstra’s algorithm needs a hybrid strategy
[Figure: task graph on a 0 to 7 axis; order = distance from source node; legend: to be processed, finished, running non-speculatively, running speculatively]
Run tasks non-speculatively when possible
Keep cores busy with speculative ordered parallelism
Each task must be runnable in either mode
Tasks in both modes must coordinate on shared data
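The hybrid strategy can be sketched as a mode assignment over the pending tasks. The sketch assumes a task is safe to run non-speculatively only when no task with an earlier timestamp is still unfinished, and it ignores data conflicts among tasks that share a timestamp. The names `Mode` and `assignModes` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Mode { NonSpeculative, Speculative, Waiting };

// Assigns a mode to each unfinished task, given their timestamps in
// ascending order and the number of available cores.
std::vector<Mode> assignModes(const std::vector<uint64_t>& sortedTimestamps,
                              std::size_t cores) {
    assert(std::is_sorted(sortedTimestamps.begin(), sortedTimestamps.end()));
    std::vector<Mode> modes(sortedTimestamps.size(), Mode::Waiting);
    for (std::size_t i = 0; i < sortedTimestamps.size() && i < cores; ++i) {
        // Assumption: only tasks at the earliest timestamp have no unfinished
        // predecessor, so only they can skip speculation.
        const bool earliest = sortedTimestamps[i] == sortedTimestamps.front();
        modes[i] = earliest ? Mode::NonSpeculative : Mode::Speculative;
    }
    return modes;
}
```

With timestamps {2, 2, 3, 5, 7} and four cores, the two tasks at timestamp 2 run non-speculatively, the tasks at 3 and 5 keep the remaining cores busy speculatively, and the task at 7 waits for a free core.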
Espresso reaps the benefits of non-speculative and speculative parallelism
[Figure: speedup versus core count (1c to 256c) for "Dijkstra on USA" (y-axis up to 512) and "Dijkstra on cage14" (y-axis up to 256), comparing Espresso, all-speculative, and non-speculative]
Espresso avoids pathologies and scales best
Espresso: COORDINATING SPECULATIVE AND NON-SPECULATIVE PARALLELISM
Espresso execution model
Programs consist of tasks that run speculatively or non-speculatively
void dijkstraTask(Timestamp dist, Vertex* v) {
  if (v->distance == UNSET) {
    v->distance = dist;
    for (Vertex* n : v->neighbors)
      espresso::create(
          dijkstraTask,         // function pointer
          dist + weight(v, n),  // timestamp
          n->id,                // locale
          n);                   // arguments
  }
}
Role of each argument by mode:
            Non-Spec.   Spec.
Timestamp   barrier     ordered commits
Locale      mutex       reduce conflicts
Tasks in either mode can coordinate access to shared data
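For the non-speculative column, the two roles can be combined into one admission check. The sketch below assumes that a timestamp acting as a barrier means only tasks at the earliest pending timestamp may start, and that a locale acting as a mutex means at most one of them may run per locale at a time. `TaskDesc` and `nonSpecBatch` are hypothetical names, not part of the Espresso interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

struct TaskDesc {
    uint64_t timestamp;
    uint64_t locale;
};

// Returns the indices of pending tasks that may run together
// non-speculatively under the assumed barrier and mutex rules.
std::vector<std::size_t> nonSpecBatch(const std::vector<TaskDesc>& pending) {
    std::vector<std::size_t> batch;
    if (pending.empty()) return batch;

    const uint64_t earliest =
        std::min_element(pending.begin(), pending.end(),
                         [](const TaskDesc& a, const TaskDesc& b) {
                             return a.timestamp < b.timestamp;
                         })->timestamp;

    std::unordered_set<uint64_t> heldLocales;
    for (std::size_t i = 0; i < pending.size(); ++i) {
        if (pending[i].timestamp != earliest) continue;  // timestamp as barrier
        if (heldLocales.insert(pending[i].locale).second) {
            batch.push_back(i);  // locale as mutex: first holder wins
        }
    }
    assert(!batch.empty());  // the first earliest task is always admitted
    return batch;
}
```

For pending tasks (3, locale 10), (3, locale 11), (3, locale 10), (4, locale 12), the batch holds the first two: the third shares locale 10 with the first, and the fourth sits behind the timestamp-3 barrier.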
Espresso task dispatch
Espresso supports three task types that control speculation
void dijkstraTask(Timestamp dist, Vertex* v) {
  if (v->distance == UNSET) {
    v->distance = dist;
    for (Vertex* n : v->neighbors)
      espresso::create<type>(
          dijkstraTask,
          dist + weight(v, n),
          n->id,
          n);
  }
}
[Figure: a tile with two cores and its dispatch candidates, shown for three successive examples]
Timestamp   Example 1   Example 2   Example 3
7           SPEC        NONSPEC     MAYSPEC
9           SPEC        SPEC        SPEC
10          SPEC        NONSPEC     NONSPEC
MAYSPEC lets the system decide whether to speculate
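One way to read the three task types is as a per-task dispatch rule. The sketch below assumes that non-speculative execution is preferred whenever it is safe, and that "safe" means the candidate holds the earliest timestamp among the candidates. The hardware's actual policy is described in the paper, so `decide` and `dispatchAll` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

enum class TaskType { SPEC, NONSPEC, MAYSPEC };
enum class Dispatch { Speculative, NonSpeculative, Wait };

struct Candidate {
    uint64_t timestamp;
    TaskType type;
};

// Assumed policy: SPEC always speculates, NONSPEC never does (it waits),
// and MAYSPEC takes whichever mode is available.
Dispatch decide(TaskType type, bool safeNonSpec) {
    switch (type) {
        case TaskType::SPEC:
            return Dispatch::Speculative;
        case TaskType::NONSPEC:
            return safeNonSpec ? Dispatch::NonSpeculative : Dispatch::Wait;
        case TaskType::MAYSPEC:
            return safeNonSpec ? Dispatch::NonSpeculative : Dispatch::Speculative;
    }
    return Dispatch::Wait;  // not reached for valid enum values
}

std::vector<Dispatch> dispatchAll(const std::vector<Candidate>& candidates) {
    assert(!candidates.empty());
    const uint64_t earliest =
        std::min_element(candidates.begin(), candidates.end(),
                         [](const Candidate& a, const Candidate& b) {
                             return a.timestamp < b.timestamp;
                         })->timestamp;
    std::vector<Dispatch> out;
    for (const Candidate& c : candidates) {
        out.push_back(decide(c.type, c.timestamp == earliest));
    }
    return out;
}
```

Applied to the slide's third example (7 MAYSPEC, 9 SPEC, 10 NONSPEC), this rule runs task 7 non-speculatively, task 9 speculatively, and holds task 10 until it becomes the earliest.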
Espresso improves efficiency and programmability
MAYSPEC allows programmers to exploit the best of speculative and non-speculative parallelism
[Figure: speedup versus core count (1c to 256c) for sssp-cage, sssp-usa, cf, triangle, genome, kmeans, color, bfs, mis, astar, and des, comparing MAYSPEC, Swarm, and NONSPEC; annotations: 2.5x, 22%, 6.9x]
gmean speedup: MAYSPEC 198x, Swarm 162x, NONSPEC 29x
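The summary figures are geometric means across benchmarks. The helper below is a small sketch for recomputing such a summary from per-benchmark speedups, which the slides do not list; `geometricMean` and `ratio` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of strictly positive speedups, computed in log space.
double geometricMean(const std::vector<double>& speedups) {
    assert(!speedups.empty());
    double logSum = 0.0;
    for (double s : speedups) {
        assert(s > 0.0);
        logSum += std::log(s);
    }
    return std::exp(logSum / static_cast<double>(speedups.size()));
}

// How many times faster configuration a is than configuration b.
double ratio(double a, double b) {
    assert(b > 0.0);
    return a / b;
}
```

From the rounded gmean values on the slide, `ratio(198, 162)` is about 1.22, consistent with the 22% annotation, and `ratio(198, 29)` is about 6.8, close to the 6.9x annotation; the small difference is presumably due to rounding of the plotted values.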
Please see the paper for more details!
Microarchitectural details
Interactions between speculative and non-speculative tasks:
◦ How are conflicts detected and resolved?
◦ How do timestamps-as-barriers affect the ordered commit protocol?
Espresso exception model
Additional results analysis
Capsules: ENABLING SOFTWARE-MANAGED SPECULATION WITH ORDERED PARALLELISM
Some actions should bypass HW speculation
Discrete event simulation (DES) needs speculation to scale
DES also allocates memory within tasks
[Figure: memory holding blocks A, B, C, D; two cores, labeled A and D, read and write it]
[Figure: "DES", speedup (1 to 256) versus core count (1c to 256c), comparing an ideal allocator with a free list]
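A plausible reading of the free-list curve is that every allocation and release touches the same list head, so speculative tasks that allocate memory all read and write one shared location. The fixed-size pool below is a generic textbook free list written to make that shared access visible; it is not the allocator used in the talk, and `FreeListPool` and its 64-byte block size are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-size block pool with an intrusive singly linked free list.
class FreeListPool {
public:
    explicit FreeListPool(std::size_t blocks) : storage_(blocks) {
        for (Node& node : storage_) {  // thread every block onto the list
            node.next = head_;
            head_ = &node;
        }
    }

    // Pops a block; returns nullptr when the pool is exhausted.
    void* allocate() {
        ++headAccesses_;  // every allocation reads (and usually writes) head_
        if (head_ == nullptr) return nullptr;
        Node* block = head_;
        head_ = block->next;
        return block;
    }

    // Pushes a block obtained from allocate() back onto the list.
    void release(void* p) {
        assert(p != nullptr);
        ++headAccesses_;  // every release reads and writes head_
        Node* block = static_cast<Node*>(p);
        block->next = head_;
        head_ = block;
    }

    std::size_t headAccesses() const { return headAccesses_; }

private:
    union Node {
        Node* next;                 // valid while the block is free
        unsigned char payload[64];  // valid while the block is allocated
    };
    std::vector<Node> storage_;
    Node* head_ = nullptr;
    std::size_t headAccesses_ = 0;
};
```

The counter only instruments the sketch: it grows by one on every call, which shows that no two allocating tasks could avoid touching `head_`. Under hardware conflict detection, such accesses from concurrent speculative tasks would appear as read-write conflicts even when the tasks are otherwise independent.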