
MIMO OFDM Transceiver for a Many-Core Computing Fabric A Nucleus - PowerPoint PPT Presentation



  1. MIMO OFDM Transceiver for a Many-Core Computing Fabric: A Nucleus-Based Implementation
     T. Kempf, D. Günther, A. Ishaque, G. Ascheid
     ISS (Chair of Integrated Signal Processing Systems), Institute for Communication Technologies and Embedded Systems

  2. Outline
     - Introduction
     - Nucleus Methodology
     - MIMO OFDM Transceiver Implementation
       - Application Analysis: Nuclei Identification
       - Efficient Nuclei Implementations on HW Platform (Flavor)
       - Algorithmic Performance Evaluation
       - Application-to-Architecture Mapping
     - Summary & Outlook

  3. Software Defined Radio Vision
     [Figure: today's mobile phone (e.g. GSM, UMTS) next to a future SDR mobile phone with a flexible SDR; the freed area gives cost savings or room for new functionality. Source: Infineon Technologies.]

  4. Software Defined Radio Vision
     The three key properties:
     - Portability: software is portable onto different platforms
       Standard.exe → Device_1, ..., Device_n
     - Interoperability: different devices configured for the same standard interoperate
       Standard_1/Device_1 ↔ Standard_1/Device_2
     - Loadability: a platform is capable of running different standards
       Device ← Standard_1.exe, ..., Standard_n.exe
     But we must not forget:
     - Efficiency: the power consumption of a flexible SDR must be close to that of a dedicated device (battery driven!)
     [Figure: today's mobile phone (e.g. GSM, Bluetooth) next to a future SDR mobile phone with a flexible SDR; the freed area gives cost savings or room for new functionality. Source: Infineon Technologies.]

  5. Software Defined Radio Vision
     On-the-fly configuration: GSM.exe, UMTS.exe, LTE.exe
     The three key properties:
     - Portability: software is portable onto different platforms
       Standard.exe → Device_1, ..., Device_n
     - Interoperability: different devices configured for the same standard interoperate
       Standard_1/Device_1 ↔ Standard_1/Device_2
     - Loadability: a platform is capable of running different standards
       Device ← Standard_1.exe, ..., Standard_n.exe
     But we must not forget:
     - Efficiency: the power consumption of a flexible SDR must be close to that of a dedicated device (battery driven!)
     Contradicting requirements: flexibility (programmability) vs. energy efficiency


  7. Nucleus Methodology
     [Figure: a transceiver description composed of nuclei (N1, N2, N5, N7) and non-nucleus tasks (NN); a nucleus library holding the nuclei; a HW platform with PE 1 (DSP), PE 2 (ASIP), PE 3 (rASIP), PE 4 (GPP), PE 5 (FPGA), memory (MEM) and a communication architecture.]
     Nucleus:
     - Critical, demanding algorithmic kernel
     - Kernel is common among different waveforms
     - Neither waveform nor hardware specific

  8. Nucleus Methodology
     [Figure: transceiver description (nuclei N1, N2, N5, N7 and non-nucleus tasks NN) and nucleus library, now with a Flavor layer: blocks labeled NI attached to the processing elements PE 1 to PE 5 of the HW platform (DSP, ASIP, rASIP, GPP, FPGA, MEM, communication architecture).]

  9. Nucleus Methodology
     [Figure: complete flow. The transceiver description (nuclei N1, N2, N5, N7 and non-nucleus tasks NN) and the nucleus library feed the steps Mapping and Compile & Evaluation; flavors (blocks labeled NI) and a board support package connect them to the processing elements PE 1 to PE 5 of the HW platform (DSP, ASIP, rASIP, GPP, FPGA, MEM, communication architecture).]


  11. Nuclei Identification: Transceiver Structure (IEEE 802.11n)
      - Outer modem
        - Channel (de-)coding
        - (De-)interleaving
      - Inner modem (RX)
        - RX OFDM processing
        - Channel estimation
        - Spatial equalizing: mitigate channel impact on payload
        - Soft demapping: calculate soft bits (LLRs) for BPSK, 4QAM, 16QAM
      [Figure: transceiver block diagram, processing per OFDM slot.]

  12. Nuclei Identification: Kernel Identification
      - Analyze different algorithmic choices within RX blocks
      - Identify computational kernels
        - Recurring tasks
        - Operate on data with certain alignment
      - Build the application as a composition of kernels

  13. Nuclei Identification: Kernel Identification (Example)
      - LMMSE MIMO equalizer with QRD
      - Basic transmission equation: $y = Hx + n$
      - Linear MMSE equalization: $\hat{x} = G y$, with $G = \left( \hat{H}^H \hat{H} + \frac{\sigma_n^2}{E_s} I \right)^{-1} \hat{H}^H$
      - Regularized QRD: $\underline{H} = \begin{pmatrix} \hat{H} \\ \frac{\sigma_n}{\sqrt{E_s}} I \end{pmatrix} = Q R = \begin{pmatrix} Q_a \\ Q_b \end{pmatrix} R$
      - Rewrite $G$ using $Q_a$ and $Q_b$: $G = \frac{\sqrt{E_s}}{\sigma_n} Q_b Q_a^H$
      - Computational kernels
        - Regularized QR decomposition
        - Matrix-matrix multiplication
        - Matrix-vector multiplication
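The QRD form of the LMMSE filter on this slide can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the random 4x4 channel, and the values of `sigma_n` and `E_s` are assumptions made for illustration. It stacks the channel estimate on top of $(\sigma_n/\sqrt{E_s}) I$, takes a reduced QR decomposition, and compares $(\sqrt{E_s}/\sigma_n) Q_b Q_a^H$ with the direct matrix inverse.

```python
import numpy as np

def lmmse_direct(H, sigma_n, E_s):
    """Direct form: G = (H^H H + sigma_n^2/E_s * I)^-1 H^H."""
    n_tx = H.shape[1]
    A = H.conj().T @ H + (sigma_n**2 / E_s) * np.eye(n_tx)
    return np.linalg.solve(A, H.conj().T)

def lmmse_qrd(H, sigma_n, E_s):
    """QRD form: G = sqrt(E_s)/sigma_n * Q_b Q_a^H from the regularized QRD."""
    n_rx, n_tx = H.shape
    alpha = sigma_n / np.sqrt(E_s)
    # Extended channel matrix: H stacked on top of alpha * I.
    H_ext = np.vstack([H, alpha * np.eye(n_tx)])
    Q, _ = np.linalg.qr(H_ext)        # reduced QR: Q is (n_rx + n_tx) x n_tx
    Q_a, Q_b = Q[:n_rx], Q[n_rx:]     # upper and lower blocks of Q
    return (1.0 / alpha) * Q_b @ Q_a.conj().T

# Hypothetical test channel and noise level (illustrative values only).
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
G_direct = lmmse_direct(H, sigma_n=0.1, E_s=1.0)
G_qrd = lmmse_qrd(H, sigma_n=0.1, E_s=1.0)
max_diff = np.max(np.abs(G_direct - G_qrd))
```

The two results agree up to rounding because the lower block gives $\frac{\sigma_n}{\sqrt{E_s}} I = Q_b R$, so $R^{-1} = \frac{\sqrt{E_s}}{\sigma_n} Q_b$ and no explicit inverse or back-substitution is needed.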

  14. Nuclei Identification: Kernel Overview
      - Application variants consist of only a few kernels!


  16. Application Implementation: P2012 Platform (STMicroelectronics)
      - SoC platform with a maximum of 32 clusters
      - One cluster provides
        - Max. 16 RISC cores (STxP70) @ 600 MHz
        - VECx vector extension (SIMD)
          - 128-bit vector registers
          - 8x16-bit or 4x32-bit operations
        - Hardware synchronizer for inter-core signaling
        - Interface for hardware accelerators (ASICs)

  17. Application Implementation: Kernel Overview
      - For 2x2 and 4x4 MIMO use cases
      - Cycles for execution on a single STxP70 processor core including the VECx unit
      - Corresponding time for 600 MHz clock frequency
      - In the range of …
        - Competing solutions
        - IEEE 802.11n real time (4 µs per OFDM slot)
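The relation between cycle counts and execution time on this slide is simple arithmetic. A small sketch, assuming only the two figures the slide states (600 MHz clock, 4 µs per OFDM slot); the helper names are hypothetical:

```python
F_CLK_MHZ = 600.0   # STxP70 clock frequency stated on the slide
SLOT_US = 4.0       # IEEE 802.11n real-time budget per OFDM slot

def cycles_to_us(cycles, f_clk_mhz=F_CLK_MHZ):
    """Execution time in microseconds for a given cycle count."""
    return cycles / f_clk_mhz

def us_to_cycles(time_us, f_clk_mhz=F_CLK_MHZ):
    """Cycle count (rounded) that fits into a given time in microseconds."""
    return round(time_us * f_clk_mhz)

# Cycle budget of one core for one OFDM slot.
slot_budget_cycles = us_to_cycles(SLOT_US)
```

At 600 MHz a single core has 2400 cycles per 4 µs slot, so any per-slot kernel that needs more than that on one core has to be split across several cores.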


  19. Algorithm Performance Evaluation: Investigated Algorithmic Choices
      - A wide variety of algorithms is implemented
        - Channel estimation, spatial equalizer, channel coding
      - Determine the superior choice by error correction performance
      - Channel simulation
        - Fading: i.i.d. Rayleigh fading
        - Power delay profile: exponential, 20 dB drop along 150 ns
        - Noise: AWGN
        - 4x4 MIMO system
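The channel model listed on this slide can be sketched as follows. This is an illustrative NumPy sketch, not the simulation setup used by the authors. The tap spacing of 50 ns (four taps over 150 ns), the unit-power normalization, and all function names are assumptions; only the i.i.d. Rayleigh fading, the exponential profile with a 20 dB drop along 150 ns, the AWGN, and the 4x4 antenna configuration come from the slide.

```python
import numpy as np

def exponential_pdp(n_taps=4, tap_spacing_ns=50.0, drop_db=20.0, span_ns=150.0):
    """Exponential power delay profile: linear decay in dB, normalized to unit power."""
    delays_ns = np.arange(n_taps) * tap_spacing_ns    # assumed tap grid
    power_db = -drop_db * delays_ns / span_ns         # 0 dB ... -20 dB at 150 ns
    power = 10.0 ** (power_db / 10.0)
    return power / power.sum()

def rayleigh_mimo_taps(pdp, n_rx=4, n_tx=4, rng=None):
    """i.i.d. complex Gaussian taps (Rayleigh magnitude), scaled by the profile."""
    rng = np.random.default_rng() if rng is None else rng
    shape = (len(pdp), n_rx, n_tx)
    g = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return np.sqrt(pdp)[:, None, None] * g

def awgn(shape, sigma_n, rng=None):
    """Complex white Gaussian noise with variance sigma_n^2 per sample."""
    rng = np.random.default_rng() if rng is None else rng
    return sigma_n * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

pdp = exponential_pdp()
taps = rayleigh_mimo_taps(pdp, rng=np.random.default_rng(0))
noise = awgn((4,), sigma_n=0.1, rng=np.random.default_rng(1))
```

Each of the four taps is an independent 4x4 matrix, so `taps` has shape `(4, 4, 4)`; the per-subcarrier channel matrices follow from an FFT along the first axis.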

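  20. Algorithm Performance Evaluation: ZF vs. MMSE MIMO Equalization
      4x4 MIMO, $r = 1/2$, $g_1 = (133)_8$, $g_2 = (171)_8$, $n_\mathrm{conv} = 6144$ bit, $n_\mathrm{ldpc} = 1944$ bit
      [Plot: mutual information between C and a second variable whose symbol is illegible in the source, defined as H(C) minus the conditional entropy of C given that variable; 4QAM and 16QAM regions are marked.]
      - MMSE equalizer: better performance at little computational cost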
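The generator polynomials $g_1 = (133)_8$ and $g_2 = (171)_8$ with rate 1/2 describe a convolutional code with constraint length 7. A minimal encoder sketch under stated assumptions: the register convention (newest bit in the most significant position), the zero initial state, and the function name are choices made here for illustration and are not taken from the slides.

```python
G1 = 0o133  # generator polynomial g1 in octal (binary 1011011)
G2 = 0o171  # generator polynomial g2 in octal (binary 1111001)
K = 7       # constraint length: both polynomials are 7 bits wide

def conv_encode(bits):
    """Rate-1/2 convolutional encoder; returns interleaved outputs [a0, b0, a1, b1, ...]."""
    state = 0  # 7-bit window: current input bit plus the 6 previous bits
    out = []
    for b in bits:
        # Shift the window and place the newest bit in the MSB position.
        state = ((b & 1) << (K - 1)) | (state >> 1)
        # Each output bit is the parity of the window masked by a polynomial.
        out.append(bin(state & G1).count("1") & 1)
        out.append(bin(state & G2).count("1") & 1)
    return out

# Impulse response: a single 1 followed by zeros reads out both polynomials.
impulse = conv_encode([1, 0, 0, 0, 0, 0, 0])
```

Feeding a single 1 followed by six zeros returns the two polynomials bit by bit on the even and odd output positions, which is a quick way to confirm that the tap convention is the intended one.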

  21. Frame Error Rate of 4x4 MIMO System (Short Frames)
      - Fixed-point issues at low FERs when using MMSE-QRD and SIC-MMSE

  22. Frame Error Rate of 4x4 MIMO System (Short Frames)
      - For the investigated algorithms, MMSE-DS-QRD is a viable trade-off between algorithmic performance and implementation complexity

  23. Frame Error Rate of 4x4 MIMO System for Different Frame Sizes
      - Algorithmic performance comparable to results found in the literature


  25. Application-to-Platform Mapping: Identify Parallelism
      - Parallelizable dimensions of the OFDM receiver application
        - Space (RX antennas)
        - Frequency (subcarriers)
        - Time (OFDM slots)
          - Preamble
          - Data payload
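Parallelism along the frequency dimension can be illustrated with per-subcarrier equalization, where every subcarrier is processed independently. The sketch below is an assumption-laden illustration, not the mapping used on the platform: the number of subcarriers (52), the random filter matrices, and the split into four chunks are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sc, n_rx, n_tx = 52, 4, 4   # 52 data subcarriers is an assumed value

def crandn(*shape):
    """Complex standard normal samples (helper for illustrative data)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

G = crandn(n_sc, n_tx, n_rx)  # one equalizer matrix per subcarrier
y = crandn(n_sc, n_rx)        # one received vector per subcarrier

# Reference: sequential loop over subcarriers.
x_loop = np.stack([G[k] @ y[k] for k in range(n_sc)])

# Same computation as one batched matrix-vector product.
x_batched = np.einsum("kij,kj->ki", G, y)

# Splitting the subcarrier index range into chunks, e.g. one chunk per core.
chunks = np.array_split(np.arange(n_sc), 4)
x_chunked = np.concatenate([np.einsum("kij,kj->ki", G[c], y[c]) for c in chunks])
```

Because no subcarrier depends on another, the chunked result is identical to the sequential one, which is what makes the frequency dimension a natural candidate for distributing work across cores.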

  26. Application-to-Platform Mapping: Assign Cores to PGs
      - Given: single-core timing requirements
      - Goal: assign cores to match real-time constraints (4 µs per slot)

      Task                                 Time (µs)   #Cores
      Preprocessing (per OFDM frame)
        LS Channel Estimation                  17.47        4
        Equalizer Preprocessing               215.31        4
      Actual Processing (per OFDM slot)
        OFDM Demodulation (mem. realign)        6.83        2
        Equalizer (Actual Detection)            6.08        4
        Soft Demapping (16 QAM)                 2.84        2
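The core assignment can be checked against the 4 µs slot deadline. The sketch below assumes ideal linear speedup when a task is spread over several cores, which real platforms only approximate; the core counts are read from the flattened table on this slide and should be treated as an interpretation of it. All names are hypothetical.

```python
import math

SLOT_US = 4.0  # real-time constraint per OFDM slot

# Per-slot tasks: (single-core time in µs, assigned cores as read from the slide).
per_slot_tasks = {
    "OFDM demodulation": (6.83, 2),
    "Equalizer (detection)": (6.08, 4),
    "Soft demapping (16QAM)": (2.84, 2),
}

def min_cores(time_us, deadline_us=SLOT_US):
    """Smallest core count that meets the deadline under ideal speedup."""
    return math.ceil(time_us / deadline_us)

def meets_deadline(time_us, cores, deadline_us=SLOT_US):
    """True if the task fits the deadline when split evenly over the cores."""
    return time_us / cores <= deadline_us

report = {
    name: {"min_cores": min_cores(t), "per_core_us": t / c, "ok": meets_deadline(t, c)}
    for name, (t, c) in per_slot_tasks.items()
}
total_cores = sum(c for _, c in per_slot_tasks.values())
```

Under this idealized model every per-slot task fits into 4 µs with the assigned cores, and the three groups add up to 8 cores. The equalizer and demapper receive more cores than the idealized minimum; one plausible reading is that the equalizer group also carries the per-frame preprocessing, but the slides do not state the reason.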

  27. Application-to-Platform Mapping: Assign Cores
      - Final mapping
        - Partitioning of components into processing groups (PGs)
        - Number of cores per group
      - 8 cores enable real-time operation
      [Figure: PG 1, Modulation, 2 PEs; PG 2, PP&EQ, 4 PEs; PG 3, Demapping, 2 PEs.]
