Efficient Structural Adder Pipelining in Transposed Form FIR Filters
International Conference on Digital Signal Processing, 23 July 2015
Mathias Faust†, Martin Kumm*, Chip-Hong Chang† and Peter Zipf*
† Nanyang Technological University, Singapore
* University of Kassel, Germany
CONTENTS
1. Pipelining FIR Filters
2. Proposed Architecture
3. Results
FIR FILTERS IN TRANSPOSED FORM
HIGH SPEED FIR FILTERS
- For fixed coefficients, the multiplier block is realized by shift-and-add networks.
- Efficient pipelining of the multiplier block is well understood [Aksoy et al. 2010], [Kumm et al. 2012].
- The critical path is often found in the structural adders, which have the largest word size.
- Pipelining the structural adders is resource-expensive.
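To show where the multiplier block and the structural adders sit in the transposed form, here is a minimal behavioural sketch in Python, assuming integer coefficients. The names `shift_add_mul`, `transposed_fir` and `direct_fir` are hypothetical. The multiplier block is modelled by a plain binary shift-and-add decomposition of each coefficient, not by an optimized shared adder graph such as those in [Aksoy et al. 2010] or [Kumm et al. 2012]; each `regs[i]` stands for the register behind one structural adder.

```python
def shift_add_mul(x, c):
    """Multiply x by the fixed constant c using only shifts and additions.

    Plain binary decomposition of |c|; an optimized multiplier block
    would share intermediate terms between coefficients instead.
    """
    acc, mag, shift = 0, abs(c), 0
    while mag:
        if mag & 1:
            acc += x << shift
        mag >>= 1
        shift += 1
    return -acc if c < 0 else acc


def transposed_fir(h, xs):
    """Cycle-by-cycle model of a transposed-form FIR filter.

    regs[i] is the register behind structural adder i; each structural
    adder adds one product from the multiplier block to the next register.
    """
    n = len(h)
    regs = [0] * (n - 1)
    ys = []
    for x in xs:
        p = [shift_add_mul(x, c) for c in h]           # multiplier block
        ys.append(p[0] + (regs[0] if regs else 0))     # last structural adder -> output
        regs = [p[i + 1] + (regs[i + 1] if i + 1 < n - 1 else 0)
                for i in range(n - 1)]                 # remaining structural adders
    return ys


def direct_fir(h, xs):
    """Reference convolution: y[m] = sum_k h[k] * x[m - k]."""
    return [sum(h[k] * xs[m - k] for k in range(len(h)) if m - k >= 0)
            for m in range(len(xs))]


if __name__ == "__main__":
    h = [3, -5, 7, 2]
    xs = [1, 4, -2, 0, 5, -3]
    print(transposed_fir(h, xs) == direct_fir(h, xs))  # True
```

Every structural adder in this model works on the full accumulated word, which is why these adders, rather than the multiplier block, tend to set the critical path once the word size grows.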
STRUCTURAL ADDER PIPELINING
PIPELINED RIPPLE-CARRY ADDER
STRUCTURAL ADDER PIPELINING
- Structural adder pipelining is simple…
- …but balancing the pipeline is very costly in flip-flops (FFs)…
- …and it heavily increases the latency.
- An alternative for speeding up is to use carry-save adders ➯ this doubles the algorithmic delays (each delay register must hold both a sum and a carry vector).
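To make the flip-flop cost of balancing concrete, here is a minimal counting sketch under a simple model that is an assumption, not taken from the slides: a standalone `width`-bit ripple-carry adder is cut into `k`-bit stages, stage i starts one clock after stage i-1, its operand slices are delayed by i cycles, and its finished sum bits are delayed until the last stage is done. The function name `pipelined_rca_cost` is hypothetical.

```python
from math import ceil


def pipelined_rca_cost(width, k):
    """Register count for a width-bit RCA cut into k-bit pipeline stages.

    Counting model (an assumption for illustration):
    - stage i (0-based) adds bits [i*k, (i+1)*k) one clock after stage i-1,
    - both operand slices of stage i are delayed by i cycles (input skew),
    - the sum slice of stage i is delayed by (stages - 1 - i) cycles so that
      all sum bits leave the adder in the same cycle (output deskew),
    - one carry FF sits between neighbouring stages.
    """
    stages = ceil(width / k)
    balancing = 0
    for i in range(stages):
        w_i = min(k, width - i * k)            # top stage may be narrower
        balancing += 2 * w_i * i               # delay both operand slices
        balancing += w_i * (stages - 1 - i)    # delay the finished sum slice
    return {
        "stages": stages,
        "extra_latency": stages - 1,           # cycles added vs. one-cycle adder
        "carry_ffs": stages - 1,
        "balancing_ffs": balancing,
    }


if __name__ == "__main__":
    for k in (25, 12, 8, 4, 2):
        print(k, pipelined_rca_cost(25, k))
```

In this model the balancing FFs grow roughly quadratically with the number of stages while the carry FFs grow only linearly, which matches the slide's point that balancing, not the carry registers themselves, dominates the cost.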
NON-PIPELINED ARCHITECTURE
PROPOSED ARCHITECTURE
[Figure: proposed architecture; labels: partially redundant number representation, pipelined RCA]
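The slides name the two ingredients (a partially redundant number representation and a pipelined RCA), but the figure itself is not recoverable here. The following minimal Python sketch shows one plausible reading, under assumptions that go beyond the slides: each structural adder word is split into `k`-bit chunks (`k` playing the role of the CPA word size), every chunk is a short carry-propagate adder, and the carry out of each chunk is saved in one extra register bit instead of rippling into the next chunk. Only the final output then needs a full-width carry-propagate addition. The names `split`, `redundant_add` and `resolve` are hypothetical, and unsigned arithmetic modulo `2**width` stands in for two's complement.

```python
def split(value, width, k):
    """Cut a width-bit unsigned value into k-bit chunks, LSB chunk first."""
    return [(value >> j) & ((1 << min(k, width - j)) - 1) for j in range(0, width, k)]


def redundant_add(s, c, a, width, k):
    """Structural adder: add a non-redundant operand a to the partially
    redundant value (s, c) and return a new partially redundant value.

    s[j] holds the sum bits of chunk j, c[j] the saved carry into chunk j.
    Each chunk is a short k-bit carry-propagate adder whose carry-out is
    stored for the next adder instead of rippling on, so the longest carry
    chain is k bits rather than width bits.
    """
    a_chunks = split(a, width, k)
    new_s = []
    new_c = [0] * len(s)
    for j, (sj, aj) in enumerate(zip(s, a_chunks)):
        cw = min(k, width - j * k)          # chunk width (top chunk may be narrower)
        t = sj + aj + c[j]                  # at most 2**(cw + 1) - 1
        new_s.append(t & ((1 << cw) - 1))
        if j + 1 < len(s):
            new_c[j + 1] = t >> cw          # single saved carry bit
        # the carry out of the top chunk is dropped: arithmetic is modulo 2**width
    return new_s, new_c


def resolve(s, c, width, k):
    """Final carry-propagate addition back to a plain binary word."""
    total = sum((sj + cj) << (j * k) for j, (sj, cj) in enumerate(zip(s, c)))
    return total % (1 << width)


if __name__ == "__main__":
    width, k = 25, 4                        # 25-bit word, 4-bit CPA chunks
    n_chunks = -(-width // k)               # ceil(width / k)
    s, c = [0] * n_chunks, [0] * n_chunks
    values = [123456, 9999999, 31, 2**24 + 7, 2**25 - 1]
    for v in values:
        s, c = redundant_add(s, c, v, width, k)
    print(resolve(s, c, width, k) == sum(values) % (1 << width))  # True
```

In this model, `k = width` degenerates to an ordinary non-redundant adder and `k = 1` to full carry-save; intermediate values trade `ceil(width / k) - 1` extra carry registers per adder against the length of the carry chain.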
EXPERIMENTAL RESULTS
- A VHDL code generator was implemented.
- Filter 1 of [Lim, Parker 1983] was analyzed, with the following properties: 121 taps, 8-bit input and 25-bit output word length.
- The word length of the CPAs was varied from 2 to 24 bits.
- Synthesis: Synopsys Design Compiler with TSMC 0.18 µm technology.
SYNTHESIS RESULTS
[Figure: logic area, register area and total area (µm²) and delay (ns) versus CPA word size (TC, then 24 down to 2); design goal: minimum area]
SYNTHESIS RESULTS
      Min. area                Min. delay       Delay goal 1 ns
CPA   Area     Inc.     Delay  Area     Delay  Area     Delay
size  [µm²]    [%]      [ns]   [µm²]    [ns]   [µm²]    [ns]
2     491056   33.66    1.22   706697   0.84   507343   1.00
3     448645   22.12    1.43   648691   0.86   465506   1.00
4     425586   15.84    1.78   640342   0.84   492716   1.00
5     406812   10.73    1.91   626507   0.88   485262   1.00
6     405422   10.35    2.33   640392   0.91   509322   1.00
7     395605   7.68     2.47   622097   0.96   513486   1.00
8     394770   7.46     2.88   601832   0.96   531505   1.00
9     388114   5.64     3.05   628031   0.97   553144   1.00
10    387210   5.40     3.35   597678   1.02   557621   1.01
11    383793   4.47     3.59   622303   1.02   580999   1.03
12    387532   5.49     3.99   622010   1.03   583953   1.06
13    380876   3.67     4.20   625786   1.04   577197   1.10
14    377756   2.82     4.17   642181   1.04   599354   1.06
15    378381   2.99     4.65   647610   1.05   582792   1.10
16    378415   3.00     4.80   620234   1.08   593862   1.10
17    379389   3.27     5.16   627419   1.11   602847   1.11
18    378561   3.04     5.39   612194   1.12   594315   1.12
19    378288   2.97     5.67   621727   1.14   594694   1.14
20    377270   2.69     5.93   633673   1.13   604567   1.14
21    376073   2.37     6.19   613671   1.17   616372   1.15
22    374609   1.97     6.47   601875   1.18   597877   1.18
23    372653   1.44     6.70   637528   1.18   619662   1.16
24    374213   1.86     7.04   642730   1.18   625629   1.17
TC    367381   –        6.94   620314   1.20   606629   1.20
CPA size: CPA word size in bits. Inc.: total-area increase over TC at min. area.
Highlights from the table:
- 7x speed for 26.7% area overhead (CPA word size 3 with 1 ns delay goal vs. TC at min. area)
- Same delay, but 26% more area (TC at min. delay vs. CPA word size 2 at min. area)
- 2x speed for 5.4% area overhead (CPA word size 10 vs. TC, both at min. area)
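The highlighted ratios can be re-derived from the synthesis table. The short Python check below is only an arithmetic aid: speedup is taken as the ratio of delays and overhead as the relative increase in total area, which reproduces the table's Inc. column. The constant names and the `compare` helper are hypothetical; the numbers are copied from the table.

```python
# (total area [um^2], delay [ns]) copied from the synthesis table
TC_MIN_AREA = (367381, 6.94)     # TC, min. area
TC_MIN_DELAY = (620314, 1.20)    # TC, min. delay
CPA10_MIN_AREA = (387210, 3.35)  # CPA word size 10, min. area
CPA3_GOAL_1NS = (465506, 1.00)   # CPA word size 3, delay goal 1 ns
CPA2_MIN_AREA = (491056, 1.22)   # CPA word size 2, min. area


def compare(design, reference):
    """Return (speedup, area overhead in percent) of design vs. reference."""
    area, delay = design
    ref_area, ref_delay = reference
    return ref_delay / delay, 100.0 * (area / ref_area - 1.0)


if __name__ == "__main__":
    for label, design, ref in [
        ("CPA 10 vs. TC (min. area)", CPA10_MIN_AREA, TC_MIN_AREA),
        ("CPA 3 @ 1 ns vs. TC (min. area)", CPA3_GOAL_1NS, TC_MIN_AREA),
        ("TC (min. delay) vs. CPA 2 (min. area)", TC_MIN_DELAY, CPA2_MIN_AREA),
    ]:
        speedup, overhead = compare(design, ref)
        print(f"{label}: {speedup:.2f}x speed, {overhead:+.1f}% area")
```

This prints about 2.07x with +5.4%, 6.94x with +26.7%, and 1.02x with +26.3%, consistent with the 2x, 7x and "same delay, 26% more area" highlights.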
CONCLUSION
- Drastic delay reductions are possible at a small area overhead.
- Experiments showed:
  - 2x speedup with 5.4% area overhead
  - 7x speedup with 26.7% area overhead
- The latency overhead is very small compared to conventional pipelining (only a single pipelined RCA).
THANK YOU!

LITERATURE
[Aksoy et al. 2010]: L. Aksoy, E. Costa, P. Flores, and J. Monteiro, “Optimization of Area and Delay at Gate-Level in Multiple Constant Multiplications,” Euromicro Conference on Digital System Design, 2010.
[Kumm et al. 2012]: M. Kumm, P. Zipf, M. Faust, and C.-H. Chang, “Pipelined Adder Graph Optimization for High Speed Multiple Constant Multiplication,” IEEE International Symposium on Circuits and Systems (ISCAS), 2012.
[Lim, Parker 1983]: Y. Lim and S. Parker, “Discrete coefficient FIR digital filter design based upon an LMS criteria,” IEEE Transactions on Circuits and Systems, vol. 30, no. 10, pp. 723–739, Oct. 1983.