Executing a Program on the MIT Tagged Token Dataflow Architecture

  1. Executing a Program on the MIT Tagged Token Dataflow Architecture Arvind and Nikhil

  2. Notes on the paper • This is a “Big A” Architecture paper • It’s a PL, an ISA, and an execution model • and a dash of hardware

  3. Execution Models: Von Neumann • Von Neumann (CMP) • Program counter • Centralized • Sequential • Two serialization points • Instruction fetch • Memory access

  4. Execution Model: Dataflow • Not a new idea [Dennis, ISCA’75] • Programs are dataflow graphs • Instructions fire when data arrives • Instructions act independently • All ready instructions can fire at once • Massive parallelism

  8. Von Neumann example
     Source: A[j + i*i] = i; b = A[i*j];
     Instructions:
       Mul   t1 ← i, j
       Mul   t2 ← i, i
       Add   t3 ← A, t1
       Add   t4 ← j, t2
       Add   t5 ← A, t4
       Store (t5) ← i
       Load  b ← (t3)

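  9. Dataflow example • Source: A[j + i*i] = i; b = A[i*j]; • [Figure: dataflow graph in which inputs i, A, and j feed two multiplies and three adds, ending in a Store and in a Load that produces b]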
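The firing rule from the slides can be sketched in a few lines of Python. This is only an illustration, not the TTDA's mechanism: the `GRAPH` encoding, the `run_dataflow` helper, and the input values are assumptions made for the example, and the node names reuse the temporaries from the von Neumann listing. Only the address arithmetic is modeled; the Load, the Store, and any ordering between them are left out.

```python
import operator

# Hypothetical encoding of the slide's graph: node -> (operation, input names).
GRAPH = {
    "t1": (operator.mul, ("i", "j")),   # i * j
    "t2": (operator.mul, ("i", "i")),   # i * i
    "t3": (operator.add, ("A", "t1")),  # address of A[i*j]
    "t4": (operator.add, ("j", "t2")),  # j + i*i
    "t5": (operator.add, ("A", "t4")),  # address of A[j + i*i]
}


def run_dataflow(graph, inputs):
    """Fire every node whose inputs have all arrived, one wave at a time."""
    values = dict(inputs)
    pending = dict(graph)
    waves = []
    while pending:
        # A node is ready when every one of its inputs already has a value.
        ready = sorted(
            name for name, (_, sources) in pending.items()
            if all(src in values for src in sources)
        )
        if not ready:
            raise ValueError("some nodes never receive all of their inputs")
        results = {}
        for name in ready:
            op, sources = pending.pop(name)
            results[name] = op(*(values[src] for src in sources))
        # Results become visible only after the whole wave has fired.
        values.update(results)
        waves.append(ready)
    return values, waves


# Assumed inputs: i = 3, j = 2, and A = 100 as the array's base address.
values, waves = run_dataflow(GRAPH, {"i": 3, "j": 2, "A": 100})
print(waves)  # [['t1', 't2'], ['t3', 't4'], ['t5']]
```

Seven sequential instructions on the von Neumann side become three waves of address arithmetic here, because both multiplies fire together and so do the two adds that depend on them.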

  15. Conditionals • Use a switch operator: no wasted work; natural correspondence to if-then; can build loops • Use a gated phi function (à la SSA): more parallelism (defer predicate computation); not suitable for loops; computing the predicate is tricky (but solved) • [Figure: a Switch node steered by predicate P with T and F outputs, and a phi node gated by P with T and F inputs]
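The difference between the two styles can be sketched with an absolute-value example. Everything here is illustrative: the function names and the `fired` log are assumptions, and ordinary Python control flow stands in for token routing.

```python
def switch(predicate, value):
    """Steer the token to the T or F output; the other output stays empty."""
    return {"T": value} if predicate else {"F": value}


def gated_phi(predicate, then_value, else_value):
    """Select between two values that have both already been computed."""
    return then_value if predicate else else_value


def abs_with_switch(x, fired):
    # Only the side that receives the token does any work.
    out = switch(x < 0, x)
    if "T" in out:
        fired.append("negate")
        return -out["T"]
    fired.append("pass")
    return out["F"]


def abs_with_phi(x, fired):
    # Both sides run before the predicate is consulted.
    fired.append("negate")
    negated = -x
    fired.append("pass")
    unchanged = x
    return gated_phi(x < 0, negated, unchanged)


switch_log, phi_log = [], []
print(abs_with_switch(-5, switch_log), switch_log)  # 5 ['negate']
print(abs_with_phi(-5, phi_log), phi_log)           # 5 ['negate', 'pass']
```

The switch version fires one arm and wastes no work; the phi version fires both arms, which is what lets the predicate be computed late.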

  16. Conditionals • Use a “steering” operator.

  17. Loops

  18. Managing parallelism: Static dataflow • At most one input on each dataflow arc at one time • Finite state (~ the size of the dataflow graph) • Scheduling is easy • Parallelism limited by dataflow graph size (i.e., static instruction count) • No loop parallelism.

  19. Managing Parallelism: Dynamic dataflow • Multiple inputs on an arc at one time • Loop parallelism is possible: pipeline iterations through the loop’s graph • Unbounded state • Circulation speed mismatch leads to mismatched inputs • Tags are required. • [Figure: an adder whose inputs A and B receive tagged tokens in the order 1:A, 2:B, 3:A, 1:B, 2:A, 3:B and which produces tagged sums 1:S, 2:S, 3:S]
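Tag matching can be sketched as a small waiting store keyed by tag and instruction. This is a minimal sketch, not the TTDA hardware: `MatchingStore`, its `arrive` method, and the token values are hypothetical names chosen for the example, and the arrival order mirrors the figure on the slide.

```python
class MatchingStore:
    """Hold tokens until every operand with the same tag has arrived."""

    def __init__(self, arity=2):
        self.arity = arity
        self.waiting = {}  # (tag, node) -> {port: value}

    def arrive(self, tag, node, port, value):
        """Record one token; return (tag, operands) once the set is complete."""
        key = (tag, node)
        slot = self.waiting.setdefault(key, {})
        slot[port] = value
        if len(slot) < self.arity:
            return None  # still waiting for a partner token
        del self.waiting[key]
        return tag, [slot[p] for p in sorted(slot)]


# Tokens for three loop iterations reach the adder out of order.
arrivals = [
    (1, "A", 10), (2, "B", 200), (3, "A", 30),
    (1, "B", 100), (2, "A", 20), (3, "B", 300),
]

store = MatchingStore()
sums = {}
for tag, port, value in arrivals:
    fired = store.arrive(tag, "add", port, value)
    if fired is not None:
        fired_tag, (a, b) = fired
        sums[fired_tag] = a + b

print(sums)  # {1: 110, 2: 220, 3: 330}
```

Without the tag in the key, the first two arrivals (1:A and 2:B) would be paired and summed, which is the mismatched-input problem the slide describes.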

  20. Dataflow tags • Tags distinguish between different dynamic instances of the same value • Tag management in TTDA • Tags are the address of an activation record (aka stack frame) • A dynamic instance of an “instruction block” has a tag. • A central manager allocates/reclaims them.
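A central manager that hands out and reclaims tags can be sketched as a free list of frame addresses. The class and method names below are assumptions for illustration; the slides do not say how the TTDA manager is actually organized.

```python
class TagManager:
    """Hypothetical central allocator: each tag is a free frame address."""

    def __init__(self, n_frames):
        self.free = list(range(n_frames))
        self.live = set()

    def allocate(self):
        """Give a new instruction-block instance its own tag."""
        if not self.free:
            raise RuntimeError("no free tags: too many live activations")
        tag = self.free.pop()
        self.live.add(tag)
        return tag

    def release(self, tag):
        """Reclaim the tag once the instance has finished."""
        self.live.remove(tag)
        self.free.append(tag)


manager = TagManager(n_frames=2)
first = manager.allocate()
second = manager.allocate()
manager.release(first)
reused = manager.allocate()  # the reclaimed tag is handed out again
```

Reuse is only safe once no token carrying the old tag is still in flight, which is one reason detecting completion matters.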

  21. Dataflow Granularity • How big should the threads that “fire” be? • Fine-grain • In the limit, each instruction is a thread • Maximum parallelism • Lots of synchronization overhead. • Bounded # of inputs • Coarse-grain • Potentially less parallelism (in practice?) • Less synchronization overhead and variable inputs • It’s hard to beat straight-line code on a pipelined machine. • 5 stages == 5-way parallelism • Pretty good for short threads

  22. Challenges in Dataflow Execution • Building well-formed graphs. • In von Neumann ISAs any sequence of instructions is valid • Complex rules for well-formed dataflow graphs • Detecting completion • It is hard to tell when a fully distributed system is “finished” • Preventing tag explosion • k-loop bounding et al. • Executing “normal” languages.
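The idea behind loop bounding can be sketched as a throttle that lets at most k iterations be live at once, so the number of tags in use stays bounded. This is a simplified sequential model under stated assumptions, not the actual mechanism: `run_bounded_loop` is a hypothetical helper, and iterations are assumed to retire in the order they started.

```python
from collections import deque


def run_bounded_loop(n_iterations, k):
    """Issue loop iterations while keeping at most k of them in flight."""
    if k < 1:
        raise ValueError("k must be at least 1")
    in_flight = deque()
    completed = []
    peak = 0
    next_iteration = 0
    while next_iteration < n_iterations or in_flight:
        # Start new iterations only while fewer than k tags are live.
        while next_iteration < n_iterations and len(in_flight) < k:
            in_flight.append(next_iteration)
            next_iteration += 1
        peak = max(peak, len(in_flight))
        # Retire the oldest iteration, which frees its tag for reuse.
        completed.append(in_flight.popleft())
    return peak, completed


peak, completed = run_bounded_loop(n_iterations=10, k=3)
print(peak)  # 3
```

A larger k exposes more loop parallelism at the cost of more live tags and waiting tokens.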

  23. • j will probably run ahead of s. • Token pile-up! • But it might not. • Tokens out of order!

  24. Id • Elegant • Determinate • Functional • Non-strict • Implicit parallelism. • I-structures • Non-strictness is the least intuitive property • Exposes enormous parallelism. • Leads to mind-bending code.

  25. I-structures • A sort of dataflow-enabled storage element • Simple rules • Write/initialize once. • Read from an uninitialized I-structure blocks. • Read from an initialized I-structure returns. • Write to an uninitialized I-structure unblocks reads. • Write to an initialized I-structure is an error. • Implementation is tricky: you need a queue for blocked reads.
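The rules above map onto a single-cell sketch in which a blocked read is a callback parked on a queue. The `IStructureCell` class and its callback interface are assumptions made for this example; a real I-structure memory would hold many cells and return values as tokens.

```python
class IStructureCell:
    """One write-once cell; reads that arrive early are deferred."""

    def __init__(self):
        self.full = False
        self.value = None
        self.deferred = []  # queue of readers blocked on this cell

    def read(self, reader):
        """Deliver the value now if present, otherwise park the reader."""
        if self.full:
            reader(self.value)
        else:
            self.deferred.append(reader)

    def write(self, value):
        """Fill the cell once and release every deferred reader."""
        if self.full:
            raise RuntimeError("I-structure cell written twice")
        self.full = True
        self.value = value
        waiting, self.deferred = self.deferred, []
        for reader in waiting:
            reader(value)


cell = IStructureCell()
seen = []
cell.read(seen.append)   # blocks: nothing has been written yet
assert seen == []
cell.write(7)            # unblocks the deferred read
cell.read(seen.append)   # returns immediately
print(seen)  # [7, 7]
```

The deferred list is the queue the last bullet refers to: its length is not known in advance, which is part of what makes a hardware implementation tricky.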

  26. In context • Id never really went anywhere • This paper is a good snapshot of late-’80s dataflow thinking. • Eventually gave rise to OOO execution (à la HPS) • Excellent example of vertical co-design. • They rethought the whole system • Almost always impractical • Often yields great ideas.

  27. Bits from your summaries • How do you execute normal languages? • How do you multitask? • How does function linking work? • Top-to-bottom design. • Where’s the data? • Would I-structures be useful today?
