Compositional Dataflow Circuits

Stephen A. Edwards, Richard Townsend, Martha A. Kim. Columbia University. MEMOCODE, Vienna, Austria, October 1, 2017.


  1. Compositional Dataflow Circuits. Stephen A. Edwards, Richard Townsend, Martha A. Kim. Columbia University. MEMOCODE, Vienna, Austria, October 1, 2017.

  2. gcd(a, b) =
       if a = b
         a
       else if a < b
         gcd(a, b − a)
       else
         gcd(a − b, b)
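The recursive definition on this slide can be transcribed almost directly. The following is a minimal Python sketch, not code from the talk: the name `gcd_sub` is an illustrative choice, and positive integer inputs are assumed because the subtraction form does not terminate otherwise.

```python
def gcd_sub(a: int, b: int) -> int:
    """Subtraction-based GCD, following the slide's recursive definition."""
    if a <= 0 or b <= 0:
        raise ValueError("this sketch assumes positive integers")
    # Both recursive calls are tail calls, so a loop is equivalent.
    while a != b:
        if a < b:
            b = b - a   # corresponds to gcd(a, b - a)
        else:
            a = a - b   # corresponds to gcd(a - b, b)
    return a            # the a = b case returns a
```

For example, `gcd_sub(100, 2)` repeatedly subtracts 2 from 100 until both values are equal.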

  3. [Circuit diagram, first build: inputs a and b. The gcd(a, b) definition from slide 2 stays on screen through slide 9.]

  4. [Build adds labels: initial token (1); mux, with ports labeled 1 and 0; an "=" comparator.]

  5. [Build adds labels: fork; demux, with ports labeled 1 and 0; discard; the output gcd(a, b).]

  6. [Build adds: a "<" comparator.]

  7. [Build adds: a "−" subtractor and more ports labeled 1 and 0.]

  8. [Build adds: more ports labeled 1 and 0.]

  9. [Complete circuit, with a second "−" subtractor. Townsend et al., CC 2017.]

  10. Patience Through Handshaking. Want patient blocks to handle delays from: full buffers; memory systems; shared resources; data-dependent computations; busy computational units.

  11. Patience Through Handshaking. Want patient blocks to handle delays from: full buffers; memory systems; shared resources; data-dependent computations; busy computational units.

      [Diagram: an upstream block sends data and valid to a downstream block, which sends ready back.]

      valid  ready  Meaning
      1      1      Token transferred
      1      0      Token valid; held
      0      −      No token to transfer

      Latency-insensitive Design (Carloni et al.); Elastic Circuits (Cortadella et al.); FIFOs with backpressure.
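The valid/ready table on this slide can be read as a small decision function. The sketch below is illustrative only: `handshake` and `transfer_trace` are hypothetical names, and the per-cycle loop is an assumed software model of a channel, not the authors' hardware.

```python
def handshake(valid: int, ready: int) -> str:
    """Classify one cycle on a valid/ready channel, per the slide's table."""
    if not valid:
        return "No token to transfer"   # ready is a don't-care here
    if ready:
        return "Token transferred"
    return "Token valid; held"


def transfer_trace(tokens, ready_pattern):
    """Send tokens over one channel; upstream holds a token until ready is 1."""
    received, i = [], 0
    for ready in ready_pattern:
        valid = 1 if i < len(tokens) else 0
        if valid and ready:
            received.append(tokens[i])  # token transferred this cycle
            i += 1
        # otherwise the token (if any) is held for the next cycle
    return received
```

With `ready_pattern = [0, 1, 0, 0, 1, 1]`, two tokens are delivered in order despite the stalls, which is the "patience" the slide asks for.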

  12. Combinational Function Block. Strict/Unit Rate: all input tokens required to produce an output. [Diagram: block f with inputs in0, in1 and output out.] Datapath: combinational function ignores flow control.

  13. Combinational Function Block. Strict/Unit Rate: all input tokens required to produce an output. [Diagram: block f with inputs in0, in1 and output out.] Valid network: output valid if both inputs are valid.

  14. Combinational Function Block. Strict/Unit Rate: all input tokens required to produce an output. [Diagram: block f with inputs in0, in1 and output out.] Ready network: input tokens consumed if output token is consumed (output is valid and ready).
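The three networks described on slides 12 to 14 (datapath, valid, ready) fit in a few lines. This is a hedged sketch of a two-input strict block: `comb_block` and its argument names are assumptions made for illustration, and the guard around `f` exists only because Python, unlike a combinational circuit, cannot evaluate `f` on missing data.

```python
def comb_block(f, in0_valid, in0_data, in1_valid, in1_data, out_ready):
    """One combinational evaluation of a strict two-input function block.

    Returns (out_valid, out_data, in0_ready, in1_ready).
    """
    # Valid network: output valid if both inputs are valid.
    out_valid = in0_valid and in1_valid
    # Datapath: f itself knows nothing about flow control.
    out_data = f(in0_data, in1_data) if out_valid else None
    # Ready network: inputs are consumed exactly when the output token is
    # consumed, that is, when the output is valid and ready.
    consumed = out_valid and out_ready
    return out_valid, out_data, consumed, consumed
```

Note that the input ready signals depend on the input valid signals, which is a valid-to-ready path.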

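  15. Multiplexer Block. [Diagram, shown twice (the second with a decoder on select): inputs in0, in1, in2 and select; output out.]

  16. Demultiplexer Block. [Diagram, shown twice (the second with a decoder on select): inputs in and select; outputs out0, out1, out2.]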

  17. Buffering a Linear Pipeline (Point 1/4). Combinational block.

  18. Buffering a Linear Pipeline (Point 1/4). Long combinational path (data + valid).

  19. Buffering a Linear Pipeline (Point 1/4). Data buffer: pipeline register with valid, enable.

  20. Buffering a Linear Pipeline (Point 1/4). Long combinational path (ready).

  21. Buffering a Linear Pipeline (Point 1/4). Control buffer: register diverts token when downstream suddenly stops. Cao et al., MEMOCODE 2015. Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007).
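The control buffer's behavior (a register that catches the token in flight when downstream suddenly stops) can be sketched at cycle level. The model below is an assumption in the style of a common one-entry skid register, not the circuit from the talk; `ControlBuffer` and `run` are hypothetical names.

```python
class ControlBuffer:
    """Cycle-level model of a one-entry skid register (an assumed design)."""

    def __init__(self):
        self.skid_valid = False   # True while a diverted token is parked
        self.skid_data = None

    def in_ready(self):
        # Upstream ready comes from a register only, so the combinational
        # ready path from downstream is cut here.
        return not self.skid_valid

    def outputs(self, in_valid, in_data):
        # A parked token goes first; otherwise the input passes through.
        if self.skid_valid:
            return True, self.skid_data
        return in_valid, in_data

    def tick(self, in_valid, in_data, out_ready):
        # Clock edge: update the skid register.
        if self.skid_valid:
            if out_ready:
                self.skid_valid = False   # parked token was taken
        elif in_valid and not out_ready:
            self.skid_valid = True        # downstream stopped: divert token
            self.skid_data = in_data


def run(tokens, ready_pattern):
    """Drive tokens through the buffer against a downstream ready pattern."""
    buf, sent, got = ControlBuffer(), 0, []
    for out_ready in ready_pattern:
        in_valid = sent < len(tokens)
        in_data = tokens[sent] if in_valid else None
        in_ready = buf.in_ready()
        out_valid, out_data = buf.outputs(in_valid, in_data)
        if out_valid and out_ready:
            got.append(out_data)          # downstream takes a token
        if in_valid and in_ready:
            sent += 1                     # upstream sees its token accepted
        buf.tick(in_valid, in_data, out_ready)
    return got
```

In this model no token is lost or reordered when `out_ready` drops for a few cycles; the data buffer of slide 19 would be the complementary register on the data and valid path.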

  22. The Problem with Fork. Combinational block: inputs ready when both valid & output ready.

  23. The Problem with Fork. Combinational block: inputs ready when both valid & output ready.

  24. The Problem with Fork. Fork: outputs valid only when all are ready.

  25. The Problem with Fork. Fork: outputs valid only when all are ready.

  26. The Problem with Fork. Fork: outputs valid only when all are ready. Oops: combinational cycle. This is not compositional.
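The cycle on this slide can be made concrete by writing down which signal combinationally depends on which. The sketch below is illustrative and assumes a specific wiring: a two-output fork whose outputs feed the two inputs of one combinational block. The signal names (`v0`, `r0`, and so on) and the function `has_combinational_cycle` are hypothetical.

```python
def has_combinational_cycle(deps):
    """Depth-first search for a cycle in a signal dependency graph.

    deps maps each signal to the signals it combinationally depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(sig):
        color[sig] = GRAY
        for d in deps.get(sig, []):
            c = color.get(d, WHITE)
            if c == GRAY:                 # back edge: a cycle
                return True
            if c == WHITE and visit(d):
                return True
        color[sig] = BLACK
        return False

    return any(color.get(s, WHITE) == WHITE and visit(s) for s in deps)


# Combinational block: each input ready depends on both valids and out ready.
block = {"r0": ["v0", "v1", "f_out_ready"],
         "r1": ["v0", "v1", "f_out_ready"]}

# Naive fork: each output valid waits for every output to be ready.
naive_fork = {"v0": ["fork_in_valid", "r0", "r1"],
              "v1": ["fork_in_valid", "r0", "r1"]}

# Assumed alternative: valids that do not look at ready at all.
ready_free_fork = {"v0": ["fork_in_valid"],
                   "v1": ["fork_in_valid"]}
```

Merging `block` with `naive_fork` yields the loop v0 -> r0 -> v0, while merging it with `ready_free_fork` yields an acyclic graph.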

  27. The Solution to Combinational Loops (Point 2/4). [Diagram: valid and ready signals.]

  28. The Solution to Combinational Loops (Point 2/4). [Diagram: valid and ready signals.]

  29. The Solution to Combinational Loops (Point 2/4). Allowed: combinational paths from valid to ready.

  30. The Solution to Combinational Loops (Point 2/4). Allowed: combinational paths from valid to ready. Prohibited: combinational paths from ready to valid.

  31. The Solution to Fork: A Little State (Point 3/4). Valid out ignores ready of other outputs. [Diagram: fork with input in and outputs out0, out1, out2.]

  32. The Solution to Fork: A Little State (Point 3/4). Valid out ignores ready of other outputs. Flip-flop set after token sent suppresses duplicates. [Diagram: fork with input in and outputs out0, out1, out2.]

  33. The Solution to Fork: A Little State (Point 3/4). Valid out ignores ready of other outputs. Flip-flop set after token sent suppresses duplicates. Input consumed once one token sent on every output. [Diagram: fork with input in and outputs out0, out1, out2.]
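The three rules on slides 31 to 33 suggest a fork with one "sent" bit per output. The following is a minimal cycle-level sketch under that reading; the class `Fork`, its method names, and the exact reset policy for the bits are assumptions rather than the authors' circuit.

```python
class Fork:
    """Fork with one 'sent' flip-flop per output (an assumed model)."""

    def __init__(self, n_outputs):
        self.sent = [False] * n_outputs

    def out_valid(self, in_valid):
        # Valid depends only on the input valid and local state, never on
        # any ready signal, so there is no ready-to-valid path.
        return [in_valid and not s for s in self.sent]

    def step(self, in_valid, out_ready):
        """Advance one cycle. Returns (fired, consumed)."""
        valid = self.out_valid(in_valid)
        # An output fires when it offers the token and its consumer is ready.
        fired = [v and r for v, r in zip(valid, out_ready)]
        done = [s or f for s, f in zip(self.sent, fired)]
        # Input is consumed once a token has been sent on every output.
        consumed = in_valid and all(done)
        # Flip-flops remember what was sent, and clear on consumption.
        self.sent = [False] * len(done) if consumed else done
        return fired, consumed
```

An output whose consumer is ready early receives its copy immediately and is then suppressed until the slower outputs catch up, so no duplicate is sent.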

  34. Nondeterministic Merge (Point 4/4). Share with merge/demux. [Diagram labels: merge, select, f (four times), demux.]

  35. Two-Way Nondeterministic Merge Block w/ Select. “Two-way fork with multiplexed output selected by an arbiter.” [Diagram labels: in0, in1, Arbiter, out, sel; ports labeled 0 and 1.]

  36. Experiments: Random Buffer Placement. [Three plots of completion time (µs) against number of buffer pairs. Panel titles: GCD(100,2), 21-way Conveyor, BSN. Panel annotations: (7 buffers), (80 buffers), (96 buffers).]

  37. Best Buffering for GCD (Manually Obtained). Each loop has one of each buffer. [Diagram legend: Data Buffer, Control Buffer.]

  38. Summary. Compositional Dataflow Networks as an IR. Patient dataflow blocks with valid/ready handshaking.
      1. Break downstream, upstream paths w/ two buffer types
      2. Avoid comb. cycles: prohibit ready-to-valid paths
      3. Add one state bit per output so forks may “race ahead”
      4. Tame nondeterministic merge with a select output
      Random buffer placement experiments show it works.
