Resource Allocation for Hardware Implementations of Map. Richard Townsend, Martha A. Kim, Stephen A. Edwards. Columbia University. ASBD Workshop, June 15, 2014.
Functional Programs to Functional Hardware. Flow: functional program (Haskell), then compiler, then circuit (Verilog).
Functional Programs to Functional Hardware. This talk: map f [x1, x2, ..., xn] over an ordered list. Order-dependent operations (scan, fold): ?
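The difference between map and the order-dependent operations can be sketched in a few lines. The example below uses Python rather than the Haskell the compiler consumes, and the function `square` and the sample list are illustrative assumptions, not taken from the talk. Each output of map depends on exactly one input, so elements may be computed in any order; fold and scan thread an accumulator through the list.

```python
from functools import reduce
from itertools import accumulate

def square(x):
    # Hypothetical per-element function standing in for f.
    return x * x

xs = [1, 2, 3, 4]

# map: output i depends only on input i.
mapped = [square(x) for x in xs]

# Computing the elements in a different order and storing each result at
# its original position yields the same list.
out_of_order = [None] * len(xs)
for i in reversed(range(len(xs))):
    out_of_order[i] = square(xs[i])

# fold: every step needs the accumulator produced by the previous step.
folded = reduce(lambda acc, x: acc + x, xs, 0)

# scan: like fold, but keeps every intermediate accumulator.
scanned = list(accumulate(xs, lambda acc, x: acc + x))
```

The independence of the map outputs is what allows several copies of f to run side by side; the accumulator in fold and scan is the dependency that the slide marks with a question mark.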
Functional Map vs. MapReduce. Figure: 4 × 4 grid of elements indexed (0,0) through (3,3). Labels: Ordered, Unordered.
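One way to read the Ordered label is that a functional map must deliver results in the order of the input list, even when the individual computations finish in a different order. The sketch below is a hypothetical reorder buffer written for illustration; the name `in_order` and the (index, value) representation are assumptions, not part of the talk.

```python
def in_order(completions):
    """Yield values in index order from (index, value) pairs that arrive
    in completion order."""
    pending = {}      # finished results that cannot be released yet
    next_index = 0    # index of the next result allowed to leave
    for index, value in completions:
        pending[index] = value
        # Release every result that is now contiguous with the output.
        while next_index in pending:
            yield pending.pop(next_index)
            next_index += 1

# Results arrive as 1, 0, 3, 2 but leave as 0, 1, 2, 3.
arrivals = [(1, "b"), (0, "a"), (3, "d"), (2, "c")]
ordered = list(in_order(arrivals))
```

An unordered consumer could forward each pair as it arrives and would need no `pending` storage at all; the storage is the price of preserving order.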
Structure of a Hardware Implementation. Figure (animation): five parallel functional units f draw inputs x1 through x10 from a queue and produce results such as f(x1), f(x3), and f(x7). Annotations on the later frames: still processing; stuck in buffer; holding and stalling. Final frame: More Functional Units (Parallelism), More Buffers (Utilization).
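The stalling behavior in the animation can be reproduced with a small cycle-by-cycle model. This is a toy sketch under stated assumptions, not the authors' simulator: inputs are handed to any free unit in index order, durations are positive integers, each unit may hold one finished result itself (stalling while it does so) plus `buffer_slots` results in its buffer, and any number of results may leave per cycle as long as they leave in input order. All names are hypothetical.

```python
def simulate(durations, num_units, buffer_slots):
    """Return the cycle in which the last result leaves, in input order."""
    n = len(durations)
    current = [None] * num_units  # input index each unit is working on
    left = [0] * num_units        # cycles of work left on that input
    waiting = [0] * num_units     # finished, not yet emitted results per unit
    owner = [None] * n            # unit that computed each result
    done = [False] * n
    next_in = next_out = cycle = 0
    while next_out < n:
        cycle += 1
        for u in range(num_units):
            # A unit accepts new work only if its finished results fit in
            # its buffer; otherwise it is holding a result and stalling.
            if current[u] is None and next_in < n and waiting[u] <= buffer_slots:
                current[u] = next_in
                left[u] = durations[next_in]
                owner[next_in] = u
                next_in += 1
            if current[u] is not None:
                left[u] -= 1
                if left[u] <= 0:
                    done[current[u]] = True
                    waiting[u] += 1
                    current[u] = None
        # Emit strictly in input order; a slow early input blocks later ones.
        while next_out < n and done[next_out]:
            waiting[owner[next_out]] -= 1
            next_out += 1
    return cycle

no_buffers = simulate([5, 1, 1, 5], num_units=2, buffer_slots=0)
two_slots = simulate([5, 1, 1, 5], num_units=2, buffer_slots=2)
```

With the workload `[5, 1, 1, 5]` on two units, the second unit finishes its first input in one cycle and then stalls behind the slow first input when it has no buffer; with two buffer slots it keeps working, so `two_slots` is smaller than `no_buffers`.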
Multiple Possible Configurations...Which to Choose? Area = 15, with each buffer 50% the size of a functional unit. Configurations that fill the area: 15 functional units and no buffers (area 15); 1 functional unit and 28 buffers (1 + 28 × 1/2 = 15); 3 functional units and 24 buffers (3 + 24 × 1/2 = 15); 5 functional units and 20 buffers (5 + 20 × 1/2 = 15).
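The area arithmetic on this slide can be checked with a short sketch. It assumes that every functional unit gets the same number of buffer slots, that a functional unit costs one area unit, and that a buffer slot costs half of that, as stated on the slide; the function names are hypothetical.

```python
def units_for(area, slots_per_unit, slots_per_area=2):
    """Largest number of functional units that fits in `area`.

    A functional unit costs 1 area unit and a buffer slot costs
    1 / slots_per_area (1/2 on the slide). Working in multiples of
    1 / slots_per_area keeps the arithmetic in integers.
    """
    cost_per_unit = slots_per_area + slots_per_unit
    return (area * slots_per_area) // cost_per_unit

def configuration(area, slots_per_unit, slots_per_area=2):
    """Return (functional units, total buffer slots) for one design point."""
    units = units_for(area, slots_per_unit, slots_per_area)
    return units, units * slots_per_unit

# The four design points shown for Area = 15.
points = {b: configuration(15, b) for b in (0, 4, 8, 28)}
```

Here 0, 4, 8, and 28 slots per unit give 15, 5, 3, and 1 functional units, matching the slide. Slot counts between these values round the number of units down and leave some area unused.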
Workload Structure. Figure: task timelines (time axis) across functional units for three workloads labeled Best-case, Average-case (?), and Worst-case.
Optimal Resource Allocation: Simulator Results. Plot: completion time (0% to 100%) versus buffer slots per functional unit (0 to 250). Annotations: Maximizing Functional Units, 2x speedup; Maximizing Buffers, 3x speedup; Fewer Buffers, Slower.
Why Are There Multiple Optima? Plot: completion time (0% to 100%) with a horizontal axis from 0 to 30. Annotated values, in slide order: 12, 5, 7, 10 and 11, 6, 6, 11.
Performance Scales with Area. Plot: completion time (0% to 100%) versus total area (0 to 60), with curves labeled Minimum and Ideal. Companion panels plot buffer slots per functional unit (0 to 30) and number of functional units against total area.
Conclusions. Area trade-off is important... and non-obvious. Model helps explore design space. Synthesize efficient hardware implementation of map. Enhance our abstraction: map, fold, scan.