Sora: High Performance Software Radio using General Purpose Multi-Core Processors


  1. Sora: High Performance Software Radio using General Purpose Multi-Core Processors
     Kun Tan†, Jiansong Zhang†, Ji Fang‡, He Liu§, Yusheng Ye§, Shen Wang§, Yongguang Zhang†, Haitao Wu†, Wei Wang†, Geoffrey M. Voelker◊
     † Microsoft Research Asia; ‡ Tsinghua University, Beijing, China; § Beijing Jiaotong University, Beijing, China; ◊ UCSD, La Jolla, USA
     NSDI 2009, Boston, USA

  2. Software Radio
     [Figure: a general RF front-end plus software; radio labels: Bluetooth, GPS, 3G, WiFi, CDMA, WiMAX, GSM]
     Standards listed: Bluetooth, WiFi, WiMAX, GSM, CDMA, 3G, LTE, …
     Benefits
     • Promise of universal connectivity and cost saving
     • Programmability => faster development cycle, faster to market
     • Open platform for wireless research

  3. Fundamental Challenges
     • Large volume of high-fidelity digital signals
       – Require a high-speed system I/O
       – 1.2Gbps for 802.11 (20MHz channel, 16b A/D, 4x)
       – Up to ~5Gbps for 11n (4x4 MIMO); over 10Gbps for future high-speed wireless
     [Figure: antenna, RF front-end and A/D, D/A (hardware) exchanging digital samples with the processor (software)]
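The I/O figures on this slide can be reproduced with simple arithmetic. The sketch below is an illustration only: it treats the slide's "4x" as a plain multiplier and assumes that each 11n antenna chain carries the same stream as a single 802.11 channel. Neither assumption is spelled out on the slide, and the function name is hypothetical.

```python
def sample_stream_bps(bandwidth_hz, bits_per_sample, multiplier, chains=1):
    """Raw digital sample rate in bits per second.

    'multiplier' stands for the slide's "4x"; how it breaks down is an assumption.
    'chains' is the number of antenna chains (assumed independent, equal streams).
    """
    return bandwidth_hz * bits_per_sample * multiplier * chains


single = sample_stream_bps(20_000_000, 16, 4)           # one 20 MHz channel
mimo_4x4 = sample_stream_bps(20_000_000, 16, 4, chains=4)

print(single / 1e9)    # Gbps, rounded to "1.2Gbps" on the slide
print(mimo_4x4 / 1e9)  # Gbps, in line with "~5 Gbps for 11n (4x4 MIMO)"
```

Under these assumptions the single-channel rate comes out at 1.28 Gbps and the four-chain rate at 5.12 Gbps, which is consistent with the rounded numbers on the slide.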

  4. Fundamental Challenges
     • Large volume of high-fidelity digital signals
       – Require a high-speed system I/O
     • Computation-intensive signal processing
     [Figure: PHY processing pipeline with the data rate after each stage]
     Transmitter, From MAC (bits @24Mbps) to RF:
       Scramble → bits @24Mbps
       Convolutional encoder → bits @48Mbps
       Interleaving → bits @48Mbps
       QAM Mod → samples @384Mbps
       IFFT → samples @512Mbps
       GI Addition → samples @640Mbps
       Symbol Wave Shaping → samples @1.28Gbps
     Receiver, From RF (samples @1.28Gbps) to MAC:
       Decimation → samples @640Mbps
       Remove GI → samples @512Mbps
       FFT → samples @384Mbps
       Demod + Interleaving → bits @48Mbps
       Viterbi decoding → bits @24Mbps
       Descramble → bits @24Mbps
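The rates in the transmitter pipeline follow from the frame structure of the 24 Mbps 802.11a mode. The sketch below recomputes them as a sanity check. Every constant is an assumption taken from the 802.11a standard or inferred from the numbers (32-bit complex samples, 2x oversampling in wave shaping), not something the slide states, and the function name is hypothetical.

```python
# Assumed parameters for the 802.11a 24 Mbps mode (not stated on the slide).
SYMBOLS_PER_SEC = 250_000     # one OFDM symbol every 4 microseconds
DATA_BITS_PER_SYMBOL = 96     # 24 Mbps / 250k symbols per second
CODE_EXPANSION = 2            # rate-1/2 convolutional code doubles the bit count
DATA_SUBCARRIERS = 48         # QAM symbols carrying data in each OFDM symbol
FFT_SIZE = 64                 # time-domain samples produced by the IFFT
GI_SAMPLES = 16               # guard interval added to each symbol
BITS_PER_SAMPLE = 32          # assumed: 16-bit I plus 16-bit Q
OVERSAMPLING = 2              # assumed upsampling factor in wave shaping


def tx_rates_mbps():
    """Data rate (Mbps) leaving each transmitter stage, under the assumptions above."""
    mac = DATA_BITS_PER_SYMBOL * SYMBOLS_PER_SEC
    coded = mac * CODE_EXPANSION
    qam = DATA_SUBCARRIERS * BITS_PER_SAMPLE * SYMBOLS_PER_SEC
    ifft = FFT_SIZE * BITS_PER_SAMPLE * SYMBOLS_PER_SEC
    gi = (FFT_SIZE + GI_SAMPLES) * BITS_PER_SAMPLE * SYMBOLS_PER_SEC
    shaped = gi * OVERSAMPLING
    stages = {
        "scramble": mac,
        "conv_encoder": coded,
        "interleaving": coded,
        "qam_mod": qam,
        "ifft": ifft,
        "gi_addition": gi,
        "wave_shaping": shaped,
    }
    return {name: bps // 1_000_000 for name, bps in stages.items()}


for stage, rate in tx_rates_mbps().items():
    print(f"{stage:>13}: {rate} Mbps")
```

The jump from 48 Mbps to 384 Mbps at the modulator is where bits turn into wide complex samples, which is why most of the I/O and computation load sits on the sample side of the pipeline.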

  5. Fundamental Challenges
     • Large volume of high-fidelity digital signals
       – Require a high-speed system I/O
     • Computation-intensive signal processing
       – Raw computation power required: 802.11b => 10Gops, 802.11a => 40Gops!
       – (A server-class CPU currently runs at a 3GHz clock)
     [Figure: the transmitter and receiver pipelines of slide 4, with per-stage data rates]

  6. Fundamental Challenges
     • Large volume of high-fidelity digital signals
       – Require a high-speed system I/O
     • Computation-intensive signal processing
     • Hard deadline and accurate timing control
       – 802.11 MAC requires a response within a few μs
       – Event trigger timing accuracy at the μs level

  7. Approaches
     [Figure: SDR platforms plotted by performance (low to high) against programmability (low to high)]
     • Programmable hardware (FPGA), embedded DSP: high performance, low programmability
       – Example: Rice WARP, TI SFF-SDR
     • Low-performance GPP-based SDR: high programmability, low performance
       – Example: GNU Radio/USRP (v1 & 2)
       – Interface USB/GbE: <1Gbps, >1ms
       – Achievable wireless throughput: ~100Kbps
     • Sora: resolving the SDR platform dilemma
       – Commodity PC with C programs
       – High performance
       – System throughput: 10Gbps; ~μs latency
       – Target wireless throughput: 10M~1Gbps

  8. Sora Approach
     • New PCIe-based interface card => high system throughput
     • New optimizations to implement PHY algorithms and streamline processing on multi-core CPUs => efficient PHY processing
     • Core dedication => real-time support

  9. Sora Architecture
     [Figure: Sora hardware (RF front-ends with A/D and D/A, attached to the RCB with on-board memory) connected over the PCIe bus to a multi-core CPU running the Sora soft-radio stack and applications; digital samples flow at multiple Gbps]
     • General radio front-end: 700M/1.8G/2.4G/5GHz

  10. Radio Control Board
      [Figure: Sora architecture as on slide 9, highlighting the RCB]
      PCIe-based high-speed interface card
      • PCIe is a commodity interface in most modern PCs
      • High throughput: 16Gbps at PCIe-8x
      • Low latency: ~1 μs
      • Separated from other I/O devices
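The 16Gbps figure for PCIe-8x matches what first-generation PCIe signaling gives per direction. The sketch below works that out, assuming PCIe 1.x parameters (2.5 GT/s per lane with 8b/10b line coding). The slide does not name the PCIe generation, so this is an assumption, and the function name is hypothetical.

```python
def pcie_payload_bps(lanes, transfers_per_sec=2_500_000_000,
                     payload_bits=8, line_bits=10):
    """Per-direction bandwidth after line coding, before protocol overhead.

    Defaults are assumed PCIe 1.x values: 2.5 GT/s per lane, 8b/10b coding.
    """
    per_lane = transfers_per_sec * payload_bits // line_bits
    return lanes * per_lane


print(pcie_payload_bps(8) / 1e9)  # Gbps for an x8 link under these assumptions
```

Packet headers and flow control reduce the usable rate further, so a measured throughput somewhat below this ceiling is expected.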

  11. RCB Details
      • PCIe-8x interface: up to 16Gbps throughput
      • Versatile RF interface: up to 8 channels (8x8 MIMO)

  12. RCB Details
      [Figure: RCB block diagram. An FPGA holding the RF controller, A/D and D/A FIFOs, DMA controller, PCIe controller, SDRAM controller and registers sits between the RF front-end (RF circuit and antenna), the on-board DDR SDRAM and the PCIe bus]
      • Buffered data path: bridges the synchronous operations at the RF side and the asynchronous processing at the CPU (12.3Gbps measured)
      • Low-latency control path for software (0.36 μs measured)
      • Versatile RF interface: up to 8 channels (8x8 MIMO)

  13. Sora Software
      [Figure: Sora architecture as on slide 9, highlighting the soft-radio stack on the multi-core CPU]
      High-performance SDR processing with key software techniques
      • Efficient PHY implementation using SIMD and LUTs
      • Speed up PHY using multi-core streamline processing
      • Core dedication for real-time support

  14. Efficient PHY Implementation
      • Exploit large high-speed cache memory
        – Extensive use of lookup tables (LUTs): trade memory for calculation; the tables still fit well into the L2 cache
        – Applicable to more than half of the common algorithms; speedup ranges from 1.5x to 22x
      • Example: convolutional encoder
        – Direct implementation: 8 ops per bit
        – LUT implementation: 2 lookup ops for 8 bits (LUT size 32KB)
      [Figure: convolutional encoder drawn as a chain of T_b delay elements with two adders producing Output Data A and Output Data B]
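The LUT idea for the convolutional encoder can be sketched as follows. This is an illustration under stated assumptions, not Sora's actual code: it assumes the 802.11a rate-1/2 encoder (constraint length 7, generators 133 and 171 octal), indexes the table by 6 state bits plus 8 input bits, and uses one table for the 16 output bits and a second for the next state as one possible reading of "2 lookup ops for 8 bits". All function and variable names are hypothetical.

```python
G0, G1 = 0o133, 0o171  # assumed generator polynomials (802.11a, K = 7)


def parity(x):
    """XOR of all bits in x."""
    return bin(x).count("1") & 1


def encode_bit(state, bit):
    """Direct encoder step: shift one bit in, return (next_state, out_a, out_b)."""
    reg = (bit << 6) | state          # newest bit in bit 6, oldest in bit 0
    return reg >> 1, parity(reg & G0), parity(reg & G1)


def encode_direct(bits, state=0):
    """Reference implementation: several operations for every input bit."""
    out = []
    for bit in bits:
        state, a, b = encode_bit(state, bit)
        out += [a, b]
    return out, state


def build_luts():
    """Precompute outputs and next states for every (state, input byte) pair."""
    out_lut = [0] * (1 << 14)
    state_lut = [0] * (1 << 14)
    for state in range(64):
        for byte in range(256):
            s, word = state, 0
            for i in range(8):                       # input bits, LSB first
                s, a, b = encode_bit(s, (byte >> i) & 1)
                word |= (a << (2 * i)) | (b << (2 * i + 1))
            out_lut[(state << 8) | byte] = word      # 16 coded bits
            state_lut[(state << 8) | byte] = s
    return out_lut, state_lut


OUT_LUT, STATE_LUT = build_luts()


def encode_lut(data, state=0):
    """Encode a byte string with two table lookups per input byte."""
    words = []
    for byte in data:
        idx = (state << 8) | byte
        words.append(OUT_LUT[idx])
        state = STATE_LUT[idx]
    return words, state
```

With a 14-bit index and 16-bit entries, the output table holds 2^14 × 2 bytes = 32 KB, which agrees with the table size quoted on the slide. A usage check is to encode the same bytes with `encode_direct` and `encode_lut` and compare the unpacked bit streams.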

  15. Efficient PHY Implementation
      • Exploit data parallelism in PHY
        – Utilize the wide-vector SIMD extensions of the CPU
        – Applicable to many PHY algorithms with significant speedups (1.6x ~ 50x)
        – Example: (I)FFT

  16. Speed up PHY using multi-core streamline processing
      • Efficiently partition and schedule the PHY processing across cores
        – Interconnect sub-pipelines with lightweight, synchronized FIFOs
        – Static scheduling of processing modules in the PHY pipeline
      [Figure: the receiver pipeline (Decimation, Remove GI, FFT, Demod + Interleaving, Viterbi decoding, Descramble) split between Core 1 and Core 2, joined by a synchronized FIFO]
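The structure on this slide, two statically scheduled sub-pipelines joined by a synchronized FIFO, can be sketched with threads and a bounded queue. This is only a structural illustration: Python threads do not run on dedicated cores, Sora's FIFOs are its own lightweight design rather than `queue.Queue`, and the stage functions passed in are placeholders. All names here are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def run_stage(funcs, inbox, outbox):
    """Run a fixed (statically scheduled) list of modules on each incoming block."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        for func in funcs:
            item = func(item)
        outbox.put(item)


def run_pipeline(blocks, front_modules, back_modules, fifo_depth=4):
    """Process blocks through two sub-pipelines linked by a bounded FIFO."""
    source = queue.Queue()
    fifo = queue.Queue(maxsize=fifo_depth)   # the synchronized FIFO between stages
    sink = queue.Queue()
    workers = [
        threading.Thread(target=run_stage, args=(front_modules, source, fifo)),
        threading.Thread(target=run_stage, args=(back_modules, fifo, sink)),
    ]
    for worker in workers:
        worker.start()
    for block in blocks:
        source.put(block)
    source.put(DONE)
    for worker in workers:
        worker.join()
    results = []
    while True:
        item = sink.get()
        if item is DONE:
            return results
        results.append(item)
```

A call such as `run_pipeline(range(5), [lambda x: x + 1], [lambda x: x * 2])` pushes five blocks through both stages in order. The bounded FIFO makes the front stage wait when the back stage falls behind, which is the synchronization role the slide assigns to the FIFO.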

  17. Core Dedication for Real-time Support
      • Exclusively allocate enough cores for SDR processing in multi-core systems
        – Guarantee the CPU, cache and memory bandwidth resources for predictable performance
        – Achieve μs-level timing control
        – Simple abstraction, and easier to implement in standard OSes than an RT scheduler
      • Implemented in WinXP without modifications to the kernel

  18. Implementation
      • Sora software platform on Win XP
        – 14K lines of C code, including PCIe driver framework, memory management, FIFO management, etc.
      • SoftWiFi: full implementation of IEEE 802.11a/b/g PHY and DCF MAC
        – 9K lines of C code; 4 man-months for development and testing
        – DSSS 1, 2, 5.5, 11Mbps for 11b; OFDM 6, 9, 12, 18, 24, 36, 48, 54Mbps for 11a/g

  19. Results: PHY Processing
      [Figure: two bar charts of required computation (giga cycles per second, axis 0 to 10) per modulation mode: 802.11b at 1M, 2M, 5.5M, 11M and 802.11a/g at 6M, 24M, 54M. Off-scale bar labels: 11.6, 11.7, 11.7, 11.8, 18.3, 60.4, 132.4. Legend: After Sora Optimization. Callouts: >30x speedup, ~10x speedup]

  20. Results: PHY Processing
      [Figure: the required-computation bar charts of slide 19]
      • Sora enables software implementation of today's high-speed wireless systems on a standard PC with a few cores

  21. Results: End-to-end Throughput
      Communicating with a commercial 802.11a/b/g card
      [Figure: throughput (Mbps, axis 0 to 25) per modulation mode (1M, 2M, 5.5M, 11M, 6M, 24M, 54M) for three pairings: Sora-Commercial, Commercial-Commercial, Commercial-Sora]

  22. Results: End-to-end Throughput
      Communicating with a commercial 802.11a/b/g card
      [Figure: the throughput chart of slide 21]
      Seamlessly interoperates with commercial WiFi
      • Correctness of all PHY algorithms
      • Satisfies the timing requirements of the standards
      • Commercial-equivalent performance

  23. Extensions
      • TDMA MAC
      • Jumbo frames in 802.11

  24. Extensions: New Applications

  25. Conclusion
      • Sora is a fully programmable software radio platform on commodity PC architecture
        – Easy C programming on multi-core CPUs
        – High performance: high processing speed, low latency, and performance guarantees
      • Confirmed by SoftWiFi, the first fully interoperable IEEE 802.11 implementation (PHY and MAC) on general-purpose processors
      • Plan to release the Sora SDK to the research community
        – H/W: RCB + 2.4G RF front-end set (~$2K USD)
