Lecture 2 (I): Pipelining & Retiming


  1. Lecture 2 (I): Pipelining & Retiming
     張錫嘉 Hsie-Chia Chang, E-mail: hcchang@mail.nctu.edu.tw, Fall 2006

  2. Outline
     - Pipelining of FIR Digital Filters
       - Data-Broadcast Structures
       - Fine-Grain Pipelining
     - Parallel Processing
     - Pipelining and Parallel Processing for Low Power
     - Retiming
       - Definitions and Properties
       - Solving Systems of Inequalities
       - Retiming Techniques
         - Cutset Retiming & Pipelining
         - Retiming for Clock Period Minimization
         - Retiming for Register Minimization

     Optimized Application-Specific Integrated Systems

  3. Introduction
     - If a real-time application requires a faster input rate, the critical path can be reduced by either pipelining or parallel processing.

  4. Pipelining & Parallel Processing (1/2)
     - Pipelining ($T_{\text{sample}} = T_{\text{CLK}}$)
       - Reduces the effective critical path by introducing pipelining latches along the critical datapath
       - Without any pipelining latches, the critical path can be reduced by [remainder missing in source]
     - Parallel processing ($T_{\text{sample}} \neq T_{\text{CLK}}$)
       - Increases the sampling rate by replicating hardware so that several inputs are processed in parallel and several outputs are produced at the same time
     - These techniques are applied to non-recursive computations

  5. Pipelining & Parallel Processing (2/2)
     - Example 2:

  6. Pipelining of FIR Digital Filters
     - $T_{\text{critical}} = T_M + T_A$
     - [Table: schedule of events in the pipelined FIR filter]

  7. Cutset Pipelining (1/2)
     - The speed is limited by the longest path between
       - any two latches
       - an input and a latch
       - a latch and an output
       - the input and the output
     - 2-level pipelined structure
       - The longest path can be reduced by suitably placing the pipelining latches in the architecture
       - In this system, at any time, 2 consecutive outputs are computed in an interleaved manner
       - Drawbacks: [items missing in source]
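
The longest-path rule above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the helper `critical_path`, the node names, and the timings `T_M = 10`, `T_A = 2` are all hypothetical, and the graph models a 3-tap direct-form FIR filter before and after one pipelining latch is placed between the two adders.

```python
from functools import lru_cache

def critical_path(node_time, edges):
    """Longest total computation time along any path that crosses no latch.

    node_time: dict mapping node name -> computation time
    edges: iterable of (src, dst, delays); only edges with delays == 0
           chain combinational logic together.
    """
    succ = {n: [] for n in node_time}
    for src, dst, delays in edges:
        if delays == 0:
            succ[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Time of this node plus the slowest zero-delay continuation.
        return node_time[node] + max(
            (longest_from(s) for s in succ[node]), default=0
        )

    return max(longest_from(n) for n in node_time)

# Hypothetical timings (assumption): multiplier 10 units, adder 2 units.
T_M, T_A = 10, 2
nodes = {"m_a": T_M, "m_b": T_M, "m_c": T_M, "add1": T_A, "add2": T_A}

# Direct form: both adders sit on one combinational path.
direct_edges = [("m_a", "add1", 0), ("m_b", "add1", 0),
                ("add1", "add2", 0), ("m_c", "add2", 0)]

# Pipelined: one latch on the edge between the two adders.
pipelined_edges = [("m_a", "add1", 0), ("m_b", "add1", 0),
                   ("add1", "add2", 1), ("m_c", "add2", 0)]

print(critical_path(nodes, direct_edges))     # T_M + 2*T_A
print(critical_path(nodes, pipelined_edges))  # T_M + T_A
```

The sketch assumes the zero-delay part of the graph is acyclic, which holds for any realizable circuit because a loop without a latch cannot be clocked.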

  8. Cutset Pipelining (2/2)
     - Cutset
     - Feed-forward cutset
       - We can arbitrarily place latches on a feed-forward cutset of any FIR filter structure without affecting the functionality of the algorithm
       - [Figure: subgraphs G1 and G2 separated by a cutset, with k delays (+kD) added on each cutset edge]
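
As a rough check of the feed-forward cutset claim, the following Python sketch simulates a 3-tap FIR filter with and without one latch on each cutset edge. The function names, register names, and coefficient values are assumptions made for illustration; the point is only that the pipelined output equals the original output delayed by one cycle.

```python
def fir_direct(x, a, b, c):
    """Unpipelined 3-tap FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)."""
    x1 = x2 = 0  # delay-line registers holding x(n-1) and x(n-2)
    out = []
    for xn in x:
        out.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn
    return out

def fir_pipelined(x, a, b, c):
    """Same filter with one latch on each edge of a feed-forward cutset."""
    x1 = x2 = 0
    partial_reg = 0  # latch on the edge between the two adders
    x2_reg = 0       # latch on the input path of the third multiplier
    out = []
    for xn in x:
        # Stage 2 consumes the values latched in the previous cycle.
        out.append(partial_reg + c * x2_reg)
        # Stage 1 computes the partial sum for the current sample.
        partial_reg = a * xn + b * x1
        x2_reg = x2
        x2, x1 = x1, xn
    return out

samples = [1, 4, -2, 7, 0, 3, 5]  # arbitrary test input (assumption)
y_ref = fir_direct(samples, 2, 3, 5)
y_pipe = fir_pipelined(samples, 2, 3, 5)
# The pipelined filter produces the same sequence one cycle later.
print(y_pipe[1:] == y_ref[:-1])
```

The extra cycle of latency is the price paid for the shorter combinational path in each stage.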

  9. Example 3.2.1

  10. Data-Broadcast Structures

  11. Fine-Grain Pipelining

  12. Parallel Processing
     - Parallel processing is also referred to as block processing
       - Block size = number of inputs processed in a clock cycle
       - For a 3-tap FIR filter $y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$, the duplicated hardware computes:
         $$\begin{aligned}
         y(3k)   &= a\,x(3k)   + b\,x(3k-1) + c\,x(3k-2) \\
         y(3k+1) &= a\,x(3k+1) + b\,x(3k)   + c\,x(3k-1) \\
         y(3k+2) &= a\,x(3k+2) + b\,x(3k+1) + c\,x(3k)
         \end{aligned}$$
       - [Figure labels: block delay, delay]
     - In MIMO, [remainder missing in source]
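
The three block equations can be checked against the serial filter with a short Python sketch. The function names and test values are assumptions made for illustration, and samples before time zero are taken to be zero.

```python
def fir_serial(x, a, b, c):
    """y(n) = a*x(n) + b*x(n-1) + c*x(n-2), with x(n) = 0 for n < 0."""
    def at(n):
        return x[n] if n >= 0 else 0
    return [a * at(n) + b * at(n - 1) + c * at(n - 2) for n in range(len(x))]

def fir_3_parallel(x, a, b, c):
    """Process a block of three inputs per clock cycle k."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of the block size 3")

    def at(n):
        return x[n] if n >= 0 else 0

    y = []
    for k in range(len(x) // 3):
        # The three outputs of block k, one per replicated datapath.
        y0 = a * at(3 * k) + b * at(3 * k - 1) + c * at(3 * k - 2)
        y1 = a * at(3 * k + 1) + b * at(3 * k) + c * at(3 * k - 1)
        y2 = a * at(3 * k + 2) + b * at(3 * k + 1) + c * at(3 * k)
        y.extend([y0, y1, y2])
    return y

data = [3, 1, 4, 1, 5, 9, 2, 6, 5]  # arbitrary test input (assumption)
print(fir_3_parallel(data, 2, 3, 5) == fir_serial(data, 2, 3, 5))
```

Each loop iteration stands for one clock cycle of the parallel system, so nine samples are handled in three cycles instead of nine.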

  13. Complete Parallel Processing Systems
     - A serial-to-parallel converter
     - A parallel-to-serial converter

  14. Why Use Parallel Processing?
     - Communication bounded
       - When the critical path is shorter than $T_{\text{communication}}$, the I/O bound dominates and the system is communication bounded.
       - Pipelining can be used only until the critical path is limited by the communication bound.
       - Once this bound is reached, pipelining can no longer increase the speed.

  15. Combined Pipelining & Parallel Processing
     - After combining M-level pipelining and L-level parallel processing, [remainder missing in source]

  16. CMOS Power Consumption (1/2)
     - $P_{\text{total}} = P_{\text{dynamic}} + P_{\text{short-circuit}} + P_{\text{static}}$
     - Short circuit: current spikes
     - Static power: leakage current

  17. CMOS Power Consumption (2/2)
     - Based on a simple approximation and first-order analysis
       - Propagation delay:
         $$T_{pd} = \frac{C_{\text{charge}} V_0}{k (V_0 - V_t)^2}$$
         - $C_{\text{charge}}$: the capacitance to be charged or discharged in a single clock cycle (along the critical path)
         - $V_0$, $V_t$: the supply voltage and the threshold voltage
         - $k$: a function of technology parameters
       - Power consumption:
         $$P = C_{\text{total}} V_0^2 f$$
         - $C_{\text{total}}$: the total capacitance of the CMOS circuit
         - $f$: the clock frequency of the circuit
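
The two first-order formulas can be written as small Python helpers to see the trade-off they imply. The function names and all numeric values (5 V and 3 V supplies, a 0.6 V threshold, unit capacitance and frequency) are assumptions chosen for illustration only.

```python
def propagation_delay(c_charge, v0, vt, k=1.0):
    """First-order delay: T_pd = C_charge * V0 / (k * (V0 - Vt)^2)."""
    if v0 <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return c_charge * v0 / (k * (v0 - vt) ** 2)

def dynamic_power(c_total, v0, f):
    """Power consumption: P = C_total * V0^2 * f."""
    return c_total * v0 ** 2 * f

# Hypothetical operating points (assumption): scale the supply from 5 V to 3 V.
V0, V_LOW, VT = 5.0, 3.0, 0.6
power_ratio = dynamic_power(1.0, V_LOW, 1.0) / dynamic_power(1.0, V0, 1.0)
delay_ratio = propagation_delay(1.0, V_LOW, VT) / propagation_delay(1.0, V0, VT)
print(f"power x{power_ratio:.2f}, delay x{delay_ratio:.2f}")
```

Lowering the supply voltage cuts power quadratically but lengthens the propagation delay, which is why the slack created by pipelining or parallel processing is needed before the voltage can be reduced.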

  18. Low Power Design
     - To reduce
       - Capacitances
         - Transistor/gate capacitance
         - Load capacitance
         - Interconnects
         - External
       - Activity
       - Frequency
       - Power supply
     - Other issues
       - Off-chip connections have a high capacitive load
       - System integration

  19. Pipelining for Low Power (1/2)
     - For an M-level pipelined architecture,
       - the critical path is reduced to 1/M, and the capacitance to be charged or discharged in a single cycle ($C_{\text{charge}}$) is also reduced to 1/M
     - If the same clock speed is maintained ($f = 1/T_{pd}$),
       - only 1/M of the non-pipelined capacitance has to be charged or discharged per cycle, which suggests that the supply voltage can be reduced
       - Suppose the voltage can be reduced to $\beta V_0$; the power consumption becomes
         $$P_{\text{pipelined}} = C_{\text{total}} (\beta V_0)^2 f = \beta^2 P_{\text{non-pipelined}}$$

  20. Pipelining for Low Power (2/2)
     - Propagation delay of the original architecture:
       $$T_{pd} = \frac{C_{\text{charge}} V_0}{k (V_0 - V_t)^2}$$
     - Propagation delay of the pipelined architecture, with $C_{\text{charge}}/M$ charged at the reduced voltage $\beta V_0$:
       $$T_{pd} = \frac{(C_{\text{charge}}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$$
     - Setting the two expressions equal gives the following quadratic equation, which is solved for $\beta$:
       $$M (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$
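
A minimal Python sketch of solving this quadratic for β follows. The function name and the numbers used (M = 2, V0 = 5 V, Vt = 0.6 V) are assumptions for illustration and are not taken from the slides. Of the two roots, only the one that keeps the scaled supply βV0 above the threshold Vt is physically meaningful.

```python
import math

def pipelined_voltage_scale(m, v0, vt):
    """Solve M*(beta*V0 - Vt)^2 = beta*(V0 - Vt)^2 for the usable beta."""
    # Expanded form:
    # (M*V0^2)*beta^2 - (2*M*V0*Vt + (V0 - Vt)^2)*beta + M*Vt^2 = 0
    qa = m * v0 ** 2
    qb = -(2 * m * v0 * vt + (v0 - vt) ** 2)
    qc = m * vt ** 2
    root_disc = math.sqrt(qb ** 2 - 4 * qa * qc)
    roots = [(-qb + root_disc) / (2 * qa), (-qb - root_disc) / (2 * qa)]
    # Keep only roots where the scaled supply still exceeds the threshold.
    valid = [r for r in roots if r * v0 > vt]
    return max(valid)

# Hypothetical values (assumption): 2-level pipelining, 5 V supply, 0.6 V threshold.
beta = pipelined_voltage_scale(2, 5.0, 0.6)
print(f"beta = {beta:.4f}, supply = {beta * 5.0:.2f} V, power ratio = {beta ** 2:.3f}")
```

The power ratio printed at the end is β², following the relation between pipelined and non-pipelined power given earlier.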

  21. Example 3.4.1: Reduce Power by Pipelining
     - Consider the following two FIR filters.
       - [Figure: two FIR filters with input x(n), output y(n), delay elements D, and multiplier sections m1 and m2]
       - What is the supply voltage of the pipelined architecture if the clock periods are identical?
       - What is the relative power consumption?

  22. Solution

  23. Parallel Processing for Low Power (1/2)
     - For an L-parallel architecture,
       - the charging capacitance ($C_{\text{charge}}$) remains the same, but the total capacitance ($C_{\text{total}}$) is increased L times
     - To maintain the same sample rate,
       - the clock frequency is reduced to $f/L = 1/(L\,T_{pd})$, which means $C_{\text{charge}}$ has L times longer to be charged or discharged
       - the supply voltage can therefore be reduced to $\beta V_0$, and the power consumption becomes
         $$P_{\text{parallel}} = (L\, C_{\text{total}}) (\beta V_0)^2 \frac{f}{L} = \beta^2 P_{\text{non-parallel}}$$

  24. Parallel Processing for Low Power (2/2)
     - Propagation delay of the original architecture:
       $$T_{pd} = \frac{C_{\text{charge}} V_0}{k (V_0 - V_t)^2}$$
     - Propagation delay of the parallel architecture, which may be L times longer at the reduced voltage $\beta V_0$:
       $$L\, T_{pd} = \frac{C_{\text{charge}}\, \beta V_0}{k (\beta V_0 - V_t)^2}$$
     - Combining the two expressions gives the following quadratic equation, which is solved for $\beta$:
       $$L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$
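
For reference, the quadratic can be expanded into standard form and solved in closed form. The shorthand $B$ is introduced here only for compactness and is not a symbol from the slides. The root with the plus sign is the usable one, since it is the only root that satisfies $\beta V_0 > V_t$.

```latex
\begin{align*}
L(\beta V_0 - V_t)^2 &= \beta (V_0 - V_t)^2 \\
L V_0^2\, \beta^2 - \left[ 2 L V_0 V_t + (V_0 - V_t)^2 \right] \beta + L V_t^2 &= 0 \\
\beta &= \frac{B + \sqrt{B^2 - 4 L^2 V_0^2 V_t^2}}{2 L V_0^2},
\qquad B = 2 L V_0 V_t + (V_0 - V_t)^2
\end{align*}
```

The relative power consumption of the parallel architecture then follows as $\beta^2$.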

  25. Example 3.4.2: Reduce Power by Parallel Processing
     - Consider the following two FIR filters, with the critical paths marked by dashed lines.
       - [Figure: the original FIR filter with input x(n) and output y(n), and a 2-parallel version with inputs x(2k), x(2k+1) and outputs y(2k), y(2k+1)]
       - What is the supply voltage of the parallel architecture?
       - What is the relative power consumption?

  26. Solution

  27. Example 3.4.3
     - Area-efficient architecture

  28. Summary
     - In pipelining & parallel processing
       - M-level pipelining
       - L-level parallel processing
       - Combining M-level pipelining & L-level parallel processing
     - For low power design
       - Pipelining
       - Parallel processing
       - Combining pipelining and parallel processing
