Efficient VLSI architectures for baseband signal processing in wireless base-station receivers
Sridhar Rajagopal
Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang
This work is supported by Nokia, TI, TATP and NSF

Motivation
Computationally complex algorithms for base-stations
– multiple users, high data rates
– matrix inversions, floating point accuracy needed
– DSP solutions infeasible for real-time [S.Das’99]
Real-time implementations for baseband receiver?
– multiuser channel estimation
* S. Das et al., “Arithmetic Acceleration Techniques for Wireless Base-station Receivers”, Asilomar 1999

Contributions
New estimation scheme
– designed from an implementation perspective
– bit-streaming, fixed-point architecture
– reduced complexity, same error rate performance
Real-time architecture design
– exploit bit-level parallelism
– area-constrained, time-constrained
– real-time with minimum area

Baseband signal processing
[Figure: base-station receiver block diagram. Multiple users reach the antenna; multiuser channel estimation (training, tracking) feeds multiuser detection, followed by decoding to information bits.]

Channel estimation
[Figure: User 1 and User 2 reach the base station over a direct path and a reflected path, with noise + MAI added.]
Estimates unknown fading amplitudes and asynchronous delays.

Need for multiuser channel estimation
Detector performance depends on estimation accuracy
Best estimator : Maximum Likelihood
=> jointly estimate parameters for all users
=> Multiuser channel estimation
Single-user sliding correlator used for implementation

Multiuser channel estimation algorithm
$R_{bb} A = R_{br}$
$R_{bb} = \sum_{i=1}^{L} b_i b_i^T$, with $R_{bb} \in \mathbb{R}^{2K \times 2K}$
$R_{br} = \sum_{i=1}^{L} b_i r_i^H$, with $R_{br} \in \mathbb{C}^{2K \times N}$
$b_i \in \{-1, 1\}^{2K}$, $r_i \in \mathbb{C}^{N}$, $A \in \mathbb{C}^{2K \times N}$
b_i - Training/Tracking bits
r_i - Received signal
N - Spreading gain (typically fixed, e.g. 32)
K - Number of users (variable, <= N)
A - Maximum Likelihood channel estimate
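A short derivation may help connect the linear system to the maximum likelihood claim. It assumes a signal model that the slide does not state: each received vector is a linear function of the bit vector plus white Gaussian noise $\eta_i$ (a hypothetical symbol introduced here). Under that assumption the maximum likelihood estimate is the least-squares solution, and its normal equations are exactly $R_{bb} A = R_{br}$.

```latex
% Assumed signal model (not stated on the slide): white Gaussian noise \eta_i
r_i = A^H b_i + \eta_i, \qquad i = 1, \dots, L

% Maximum likelihood then reduces to least squares over the window
\hat{A} = \arg\min_A \sum_{i=1}^{L} \left\| r_i - A^H b_i \right\|^2

% Setting the gradient with respect to A^* to zero
\sum_{i=1}^{L} b_i b_i^T A - \sum_{i=1}^{L} b_i r_i^H = 0
\quad\Longrightarrow\quad R_{bb} A = R_{br}
```

The gradient $R_{bb} A - R_{br}$ is the same term that the iterative scheme in this deck scales by $\mu$ and subtracts from the previous estimate.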

Outline
Background
Channel Estimation - An implementation perspective
VLSI architectures
– Area-constrained, Time-constrained, Area-Time efficient
DSP Comparisons and Conclusions

Iterative scheme for channel estimation
$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$
$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$
$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$
Bit-streaming, method of gradient descent
Stable convergence behavior with µ
Simple fixed-point architecture
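The three updates can be tried out in a few lines of floating-point Python. This is a minimal sketch under stated assumptions, not the fixed-point design from the slides: the true channel `A_true`, the noise-free model r = A^H b, the independent random bit vectors, the small sizes (K = 2, N = 4, L = 32) and the function names are all hypothetical. The step size is taken as a power of two, µ = 2^-7, which is one reading of the right-shift block in the architecture diagrams. With 2K = 4 and L = 32 this gives µ = 1 / trace(R_bb), so every eigenvalue of I - µ R_bb lies in [0, 1] and the iteration cannot diverge.

```python
import random
from collections import deque

def iterative_channel_estimate(K=2, N=4, L=32, shift=7, num_bits=2000, seed=1):
    """Sliding-window gradient descent; returns (A_estimate, A_true)."""
    rng = random.Random(seed)
    rows = 2 * K
    mu = 2.0 ** -shift  # step size as a power of two (assumption)

    # Hypothetical true channel, used only to synthesize the received signal.
    A_true = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)]
              for _ in range(rows)]

    R_bb = [[0.0] * rows for _ in range(rows)]
    R_br = [[0j] * N for _ in range(rows)]
    A = [[0j] * N for _ in range(rows)]
    window = deque()  # holds the (b, r^H) pairs currently inside the window

    def accumulate(b, r_h, sign):
        # R_bb += sign * b b^T  and  R_br += sign * b r^H
        for j in range(rows):
            for k in range(rows):
                R_bb[j][k] += sign * b[j] * b[k]
            for n in range(N):
                R_br[j][n] += sign * b[j] * r_h[n]

    for _ in range(num_bits):
        b_new = [rng.choice((-1, 1)) for _ in range(rows)]
        # Noise-free model r = A_true^H b, so r^H = b^T A_true.
        r_h_new = [sum(b_new[k] * A_true[k][n] for k in range(rows))
                   for n in range(N)]

        # Add the newest bit (b_L) and drop the oldest one (b_0).
        accumulate(b_new, r_h_new, +1)
        window.append((b_new, r_h_new))
        if len(window) > L:
            b_old, r_h_old = window.popleft()
            accumulate(b_old, r_h_old, -1)

        # A <- A - mu * (R_bb A - R_br)
        grad = [[sum(R_bb[j][k] * A[k][n] for k in range(rows)) - R_br[j][n]
                 for n in range(N)] for j in range(rows)]
        A = [[A[j][n] - mu * grad[j][n] for n in range(N)] for j in range(rows)]

    return A, A_true

def max_abs_error(A, B):
    """Largest entry-wise magnitude of A - B."""
    return max(abs(a - b) for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

if __name__ == "__main__":
    estimate, truth = iterative_channel_estimate()
    print("max |A - A_true| =", max_abs_error(estimate, truth))
```

Each received bit costs one rank-one add, one rank-one subtract and one matrix product, with no matrix inversion anywhere, which is the property the slide highlights.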

Simulations - Static multipath channel
[Figure: Comparison of Bit Error Rates (BER). BER (log scale, 10^-1 to 10^-3) versus Signal to Noise Ratio (SNR, 4 to 12) for the iterative channel estimate, O(K^2 N), and the original channel estimate, O(K^3 + K^2 N).]
SINR = 0 dB, Paths = 3, Training = 150 bits, Spreading N = 31, Users K = 15

Design specifications
32 Users (K)
32 spreading code length (N)
Target = 128 Kbps
– 4000 cycles available at 500 MHz
Single cycle addition/multiplication
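The cycle budget follows from dividing the clock rate by the target bit rate. The sketch below assumes that 128 Kbps means 128,000 bits per second; the function name is hypothetical. The exact quotient is a little under the rounded 4000 cycles quoted on the slide.

```python
def cycles_per_bit(clock_hz, bit_rate_bps):
    """Clock cycles available to process one bit in real time."""
    return clock_hz / bit_rate_bps

budget = cycles_per_bit(500_000_000, 128_000)
print(budget)  # 3906.25
```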

Task decomposition
[Figure: timeline over the tracking window L. Per bit, the inputs b_0, b_L (2K,1) and r_0, r_L (N,8) update the correlation matrices R_bb, O(2K^2, 8), and R_br, O(2KN, 8); the iteration then produces the channel estimate A, O(4K^2 N, 8), which goes to the detector.]

Architecture design
$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$
– XNOR gates, UP/DOWN counters
$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$
– 8-bit adders
$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$
– 8-bit multipliers [Schulte’93]
* Schulte, Swartzlander, “Truncated Multiplication with Correction Constant”, Workshop on VLSI Signal Processing, 1993
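The mapping from the first two updates to XNOR gates, up/down counters and adders can be sketched in software. The bit encoding (1 for +1, 0 for -1) and all function names below are assumptions made for illustration. Under that encoding the product of two ±1 values is +1 exactly when the two bits are equal, which is the XNOR, and multiplying a received sample by a ±1 bit is just a choice between adding and subtracting it.

```python
def xnor(x, y):
    """XNOR of two bits (0/1): 1 when the bits are equal."""
    return 1 - (x ^ y)

def to_pm1(bit):
    """Assumed encoding: bit 1 -> +1, bit 0 -> -1."""
    return 2 * bit - 1

def update_rbb(counters, new_bits, old_bits):
    """R_bb += b_L b_L^T - b_0 b_0^T using XNOR and count up/down only."""
    size = len(new_bits)
    for j in range(size):
        for k in range(size):
            # Entering bit vector: count up on XNOR = 1, down otherwise.
            counters[j][k] += 1 if xnor(new_bits[j], new_bits[k]) else -1
            # Leaving bit vector: count in the opposite direction.
            counters[j][k] += -1 if xnor(old_bits[j], old_bits[k]) else 1

def update_rbr(R_br, new_bits, r_new_h, old_bits, r_old_h):
    """R_br += b_L r_L^H - b_0 r_0^H using add/subtract only."""
    for j in range(len(new_bits)):
        for n in range(len(r_new_h)):
            # The bit selects add or subtract; no multiplier is needed.
            R_br[j][n] += r_new_h[n] if new_bits[j] else -r_new_h[n]
            R_br[j][n] -= r_old_h[n] if old_bits[j] else -r_old_h[n]

if __name__ == "__main__":
    counters = [[0] * 4 for _ in range(4)]
    update_rbb(counters, [1, 0, 1, 1], [0, 0, 1, 0])
    print(counters)
```

Only the update of A needs true multiplications, which is why the multiplier count is the quantity traded off in the three architectures.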

Area-constrained : Min. area, not real-time
$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$
$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$
$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$
[Figure: serial datapath for the channel estimate, built from one up/down (U/D) counter for R_bb, Add/Sub units for R_br, one MAC, Subtract units and a right shift (>>), with MUX/DEMUX and Load/Store of A^(i) and A^(i-1); inputs are b_0, b_L, r_0, r_L and the signals are 1, 8 and 16 bits wide.]

Area-constrained : Hardware used
Blocks       Quantity      Full Adder Cells   Complex   Total
Counter      1*8           8                  -         8
Multiplier   1*8           64                 *2        128
Adders       3*8 + 2*16    56                 *2        112
Total Area : 248 FA cells
Total Time : 4K^2 N cycles (128,000 for N=K=32)

Time-constrained : Real time, large area
$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$
$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$
$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$
[Figure: fully parallel datapath for the channel estimate. The products b_L*b_L^T and b_0*b_0^T pass through a MUX to update R_bb; r_L and r_0 pass through a MUX to update R_br; Mult, Subtract and right-shift (>>) stages produce A. Signal widths shown include 2K*1, K(2K-1)*1, 2K^2*8, N*8, 2KN*8 and 2KN*16.]

Time-constrained : Hardware used
Blocks       Quantity                       Full Adder Cells   Complex   Total
Counter      2K^2*8                         16K^2              -         16K^2
Multiplier   4K^2N*8                        256K^2N            *2        512K^2N
Adders       2KN*16 + 2KN*8 + 4K^2N*16      48KN + 64K^2N      *2        96KN + 128K^2N
Total Area : 20,000,000 FA cells (N=K=32)
Total Time : log2(2K) cycles (6 for K=32)

Area-Time efficient architecture design
Area-constrained
– single 8-bit multiplier
– 4K^2 N cycles (128,000) [3.81 Kbps, 248 FA Cells]
Time-constrained
– 4K^2 N 8-bit multipliers
– log2(2K) cycles (6) [83.33 Mbps, 20,000,000 FA Cells]
Goal : real-time with minimum area
Different parallelism levels for multipliers

Area-Time efficient : Real-time, min. area
$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$
$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$
$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$
[Figure: partially parallel datapath for the channel estimate. Counters update R_bb from b_L*b_L^T and b_0*b_0^T; an Adder updates R_br from r_L and r_0; Mult, Subtract and right-shift (>>) stages compute A^(i) from A^(i-1), with MUX/DEMUX and Load/Store. Signal widths shown include 2K*1, 2K*8, 1*1, 1*8, 1*16 and N*8.]

Area-Time efficient : Hardware used
Blocks       Quantity                Full Adder Cells   Complex   Total
Counter      2K*8                    16K                -         16K
Multiplier   2K*8                    128K               *2        256K
Adders       2K*16 + 2*8 + 1*16      32K + 32           *2        64K + 64
Total Area : 10,000 FA cells (N=K=32)
Total Time : 2KN cycles (2,000 for N=K=32)

DSP comparisons
DSPs unable to exploit bit-level parallelism
Inefficient storage of bits
Unable to replace bit-multiplications by add/sub.
Implementation   Clock Rate   Full Adder Cells   Data Rates
C67 DSP          166 MHz      -                  1.02 Kbps
Area             500 MHz      248                3.81 Kbps
:                :            :                  :
Area-Time        500 MHz      10^4               256 Kbps
:                :            :                  :
Time             500 MHz      2x10^7             83.33 Mbps
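The area and data-rate columns for the three VLSI designs can be recomputed from the formulas in the hardware tables. The sketch below assumes that one bit is produced per "Total Time" cycle count, so the data rate is the clock rate divided by that count; the function names are hypothetical. The slides quote rounded figures, so the exact values differ slightly. For N = K = 32 the area-time design comes out at 2048 cycles and about 244 Kbps, against the quoted 2,000 cycles and 256 Kbps.

```python
import math

CLOCK_HZ = 500e6  # clock rate quoted in the design specifications

def area_constrained(K, N):
    """(full adder cells, cycles per bit) for the single-multiplier design."""
    cells = 8 + 2 * 64 + 2 * (3 * 8 + 2 * 16)
    cycles = 4 * K * K * N
    return cells, cycles

def time_constrained(K, N):
    """(full adder cells, cycles per bit) for the fully parallel design."""
    cells = 16 * K**2 + 512 * K**2 * N + 96 * K * N + 128 * K**2 * N
    cycles = math.ceil(math.log2(2 * K))
    return cells, cycles

def area_time(K, N):
    """(full adder cells, cycles per bit) for the area-time efficient design."""
    cells = 16 * K + 256 * K + 64 * K + 64
    cycles = 2 * K * N
    return cells, cycles

def data_rate_bps(cycles, clock_hz=CLOCK_HZ):
    """Bits per second, assuming one bit per `cycles` clock cycles."""
    return clock_hz / cycles

if __name__ == "__main__":
    for name, design in [("Area", area_constrained),
                         ("Area-Time", area_time),
                         ("Time", time_constrained)]:
        cells, cycles = design(32, 32)
        print(f"{name:10s} {cells:>10d} FA cells {cycles:>7d} cycles "
              f"{data_rate_bps(cycles) / 1e3:12.2f} Kbps")
```

Of the three, only the area-time and time-constrained designs fit within the roughly 4000 cycles per bit that the 128 Kbps target allows.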

Scalability of architectures
Design for maximum number of users in the system
Fewer users
– turn off functional units to reduce power
– reconfigure hardware for higher data rates (FPGA)
Investigating K-user design using K/2-user designs.
Investigating DSP extensions

Conclusions
New estimation scheme
– designed from an implementation perspective
– bit-streaming, fixed-point architecture
– reduced complexity, same error rate performance
Real-time architecture designs
– exploit bit-level parallelism
– area-constrained, time-constrained
– real-time with minimum area
=> Real-time architectures for base-band signal processing
