Sora: High Performance Software Radio using General Purpose Multi-Core Processors
Kun Tan †, Jiansong Zhang †, Ji Fang ‡, He Liu §, Yusheng Ye §, Shen Wang §, Yongguang Zhang †, Haitao Wu †, Wei Wang †, Geoffrey M. Voelker ◊
† Microsoft Research Asia; ‡ Tsinghua University, Beijing, China; § Beijing Jiaotong University, Beijing, China; ◊ UCSD, La Jolla, USA
NSDI 2009, Boston, USA
Software Radio
[Figure: a general RF front-end with software implementing Bluetooth, GPS, WiFi, WiMAX, GSM, CDMA, 3G, LTE, …]
Benefits
• Promise of universal connectivity and cost saving
• Programmability => faster development cycle, faster to market
• Open platform for wireless research
Fundamental Challenges
• Large volume of high-fidelity digital signals
– Require a high-speed system I/O
– 1.2Gbps for 802.11 (20MHz channel, 16b A/D, 4x)
– Up to ~5Gbps for 11n (4x4 MIMO); over 10Gbps for future high-speed wireless
[Figure: antenna → RF front-end → A/D and D/A → digital samples → processor; hardware on the radio side, software on the processor side]
Fundamental Challenges
• Large volume of high-fidelity digital signals
– Require a high-speed system I/O
• Computation-intensive signal processing
Transmitter: From MAC → (bits @24Mbps) Scramble → (bits @24Mbps) Convolutional encoder → (bits @48Mbps) Interleaving → (bits @48Mbps) QAM Mod → (samples @384Mbps) IFFT → (samples @512Mbps) GI Addition → (samples @640Mbps) Symbol Wave Shaping → (samples @1.28Gbps) To RF
Receiver: From RF → (samples @1.28Gbps) Decimation → (samples @640Mbps) Remove GI → (samples @512Mbps) FFT → (samples @384Mbps) Demod + Interleaving → (bits @48Mbps) Viterbi decoding → (bits @24Mbps) Descramble → (bits @24Mbps) To MAC
Fundamental Challenges
• Large volume of high-fidelity digital signals
– Require a high-speed system I/O
• Computation-intensive signal processing
– Raw computation power required: 802.11b => 10Gops, 802.11a => 40Gops! (a server-class CPU now runs at a 3GHz clock)
[Figure: transmitter and receiver processing pipelines with per-stage data rates, as on the previous slide]
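The per-stage rates in the pipeline figure can be reproduced with simple arithmetic. The sketch below is one reading that matches the slide's labels, not a statement of how the authors derived them. The OFDM parameters (48 data subcarriers, 64-point FFT, 16-sample guard interval, 4 µs symbol) are standard 802.11a values; the 16-QAM rate-1/2 mode, the 32-bit complex sample (16-bit I plus 16-bit Q), and the 2x oversampling at the RF interface are assumptions chosen to fit the 24Mbps to 1.28Gbps labels.

```python
# Assumed parameters (see the text above): 802.11a OFDM, 16-QAM, code rate 1/2.
SYMBOL_TIME_US = 4        # OFDM symbol duration including guard interval, in microseconds
DATA_SUBCARRIERS = 48     # data-carrying subcarriers per OFDM symbol
FFT_SIZE = 64             # points in the (I)FFT
GUARD_SAMPLES = 16        # guard-interval samples prepended to each symbol
BITS_PER_SUBCARRIER = 4   # 16-QAM carries 4 coded bits per subcarrier
BITS_PER_SAMPLE = 32      # assumption: 16-bit I + 16-bit Q per complex sample
OVERSAMPLING = 2          # assumption: 2x oversampling at the RF interface


def per_symbol_to_mbps(bits_per_symbol):
    """Convert bits per OFDM symbol to Mbps (bits per microsecond equals Mbps)."""
    return bits_per_symbol / SYMBOL_TIME_US


def tx_stage_rates_mbps():
    """Output rate of each transmitter stage, in Mbps."""
    coded_bits = DATA_SUBCARRIERS * BITS_PER_SUBCARRIER   # 192 coded bits per symbol
    data_bits = coded_bits // 2                           # rate-1/2 code: 96 data bits
    gi_rate = per_symbol_to_mbps((FFT_SIZE + GUARD_SAMPLES) * BITS_PER_SAMPLE)
    return {
        "scramble": per_symbol_to_mbps(data_bits),
        "convolutional_encoder": per_symbol_to_mbps(coded_bits),
        "interleaving": per_symbol_to_mbps(coded_bits),
        "qam_mod": per_symbol_to_mbps(DATA_SUBCARRIERS * BITS_PER_SAMPLE),
        "ifft": per_symbol_to_mbps(FFT_SIZE * BITS_PER_SAMPLE),
        "gi_addition": gi_rate,
        "symbol_wave_shaping": gi_rate * OVERSAMPLING,
    }
```

Under these assumptions the dictionary reproduces the slide's sequence 24, 48, 48, 384, 512, 640 and 1280 Mbps, which shows where the jump from bits to samples inflates the data volume by roughly an order of magnitude.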
Fundamental Challenges
• Large volume of high-fidelity digital signals
– Require a high-speed system I/O
• Computation-intensive signal processing
• Hard deadline and accurate timing control
– 802.11 MAC requires response within a few µs
– Event trigger timing accuracy at µs level
Approaches
[Figure: SDR platforms plotted by performance (low to high) against programmability (low to high)]
• High performance, low programmability: programmable hardware (FPGA), embedded DSP
– Example: Rice WARP, TI SFF-SDR
• Low performance, high programmability: GPP-based SDR
– Example: GNU Radio/USRP (v1 & v2)
– Interface USB/GbE: <1Gbps, >1ms
– Achievable wireless throughput: ~100Kbps
• Sora: resolving the SDR platform dilemma
– Commodity PC w/ C program
– High performance
– System throughput: 10Gbps; ~µs latency
– Target wireless throughput: 10M~1Gbps
Sora Approach
• New PCIe-based interface card => high system throughput
• New optimizations to implement PHY algorithms and streamline processing on multi-core CPU => efficient PHY processing
• Core dedication => real-time support
Sora Architecture
[Figure: RF front-ends (A/D, D/A) → RCB → PCIe bus → memory and multi-core CPU, carrying digital samples at multiple Gbps; the Sora hardware feeds the Sora soft-radio stack and applications (APP) on the PC]
General radio front-end: 700M/1.8G/2.4G/5GHz
Radio Control Board
[Figure: Sora architecture diagram, as on the previous slide]
PCIe-based high-speed interface card
• PCIe is commodity in most modern PCs
• High throughput: 16Gbps at PCIe-8x
• Low latency: ~1 µs
• Separated from other I/O devices
RCB Details
• PCIe-8x interface: up to 16Gbps throughput
• Versatile RF interface: up to 8 channels (8x8 MIMO)
RCB Details
[Figure: RCB block diagram. The FPGA holds the DMA controller, RF controller, FIFOs, PCIe controller, SDRAM controller, and registers; DDR SDRAM sits on the board; the RF front-end (A/D, D/A, RF circuit, antenna) attaches on one side and the PCIe bus on the other]
• Buffered data path: bridging the synchronous ops at RF and asynchronous processing at CPU (12.3Gbps measured)
• Low-latency control path for software (0.36 µs measured)
• Versatile RF interface: up to 8 channels (8x8 MIMO)
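The 16Gbps figure quoted for PCIe-8x on the RCB slides is consistent with first-generation PCIe signalling, and the 12.3Gbps measurement can be read as a fraction of it. The sketch below assumes PCIe 1.x (2.5 GT/s per lane with 8b/10b line coding); the slides do not name the PCIe generation, so treat that as an assumption, and the function names are illustrative only.

```python
PCIE_GEN1_TRANSFERS_PER_SEC = 2_500_000_000  # assumption: PCIe 1.x, 2.5 GT/s per lane


def pcie_gen1_payload_bps(lanes):
    """Per-direction bit rate after 8b/10b coding (8 payload bits per 10 line bits)."""
    return lanes * PCIE_GEN1_TRANSFERS_PER_SEC * 8 // 10


def link_efficiency(measured_bps, lanes):
    """Fraction of the coded link rate that a measured throughput achieves."""
    return measured_bps / pcie_gen1_payload_bps(lanes)
```

With `lanes=8` the first function gives 16,000,000,000 bits per second, and `link_efficiency(12.3e9, 8)` comes out at about 0.77, so under this assumption the measured data path uses roughly three quarters of the link rate.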
Sora Software
[Figure: Sora architecture diagram, as on the Sora Architecture slide]
High-performance SDR processing w/ key software techniques
• Efficient PHY implementation using SIMD and LUTs
• Speed up PHY using multi-core streamline processing
• Core dedication for real-time support
Efficient PHY Implementation
• Exploit large high-speed cache memory
– Extensive use of lookup tables (LUT): trade memory for calculation; the tables still fit well into L2 cache
– Applicable to more than half of the common algorithms; speedup ranges from 1.5x to 22x
Ex: Convolutional encoder
[Figure: shift register of six delay elements (T_b) with XOR taps producing Output Data A and Output Data B]
– Direct impl.: 8 ops per bit
– LUT impl.: 2 look-up ops for 8 bits! (size 32KB)
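The trade of memory for calculation can be sketched with a rate-1/2 convolutional encoder that has six delay elements, as in the slide's figure. This is a minimal illustration, not Sora's code: the generator polynomials (133 and 171 octal) are the standard 802.11a ones and are assumed here, as is the LSB-first bit order within each byte. The table is indexed by the 6 state bits plus 8 input bits and stores 16 output bits per entry, which gives 2^14 entries of 2 bytes, consistent with the 32KB size quoted on the slide.

```python
from array import array

G0 = 0o133  # taps for Output Data A (assumed: 802.11a generator polynomial)
G1 = 0o171  # taps for Output Data B (assumed: 802.11a generator polynomial)


def _parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def encode_direct(bits, state=0):
    """Bit-at-a-time encoder. `state` holds the six previous input bits,
    most recent in bit 5. Returns ([A0, B0, A1, B1, ...], new_state)."""
    out = []
    for b in bits:
        reg = (b << 6) | state      # 7-bit window: current bit plus six delayed bits
        out.append(_parity(reg & G0))
        out.append(_parity(reg & G1))
        state = reg >> 1            # shift the window, dropping the oldest bit
    return out, state


def build_lut():
    """Table indexed by (state << 8) | byte; each entry packs 16 output bits."""
    lut = array("H", [0]) * (1 << 14)
    for state in range(64):
        for byte in range(256):
            bits = [(byte >> i) & 1 for i in range(8)]  # LSB first (assumption)
            out, _ = encode_direct(bits, state)
            lut[(state << 8) | byte] = sum(bit << i for i, bit in enumerate(out))
    return lut


LUT = build_lut()


def encode_lut(data, state=0):
    """Byte-at-a-time encoder: one table lookup per 8 input bits."""
    words = []
    for byte in data:
        words.append(LUT[(state << 8) | byte])
        state = byte >> 2           # the last six input bits become the new state
    return words, state
```

Calling `encode_lut(bytes([0xA5, 0x3C]))` returns two 16-bit words whose bits, read LSB first, equal the output of `encode_direct` on the same 16 input bits. The table is built once; after that each input byte costs a single indexed load and a shift instead of sixteen parity computations.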
Efficient PHY Implementation
• Exploit data parallelism in PHY
– Utilize wide-vector SIMD extension in CPU
– Applicable to many PHY algorithms with significant speedups (1.6x ~ 50x)
Ex. (I)FFT
Speed up PHY using multi-core streamline processing
• Efficiently partition and schedule the PHY processing across cores
– Interconnecting sub-pipelines with light-weight, synchronized FIFOs
– Static scheduling of processing modules in PHY pipeline
[Figure: receiver pipeline (Decimation → Remove GI → FFT → Demod + Interleaving → Viterbi decoding → Descramble) split across Core 1 and Core 2, joined by a synchronized FIFO]
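The idea of two statically assigned sub-pipelines joined by a light-weight synchronized FIFO can be sketched as follows. This is an illustration under stated assumptions, not Sora's implementation: `SpscFifo`, `run_pipeline`, and the `front`/`back` stage functions are hypothetical names, Python threads stand in for dedicated cores, and the ring buffer relies on CPython's interpreter lock for the ordering of the slot write and the flag write. Each slot carries its own "full" flag, so the single producer and single consumer never contend on a shared lock.

```python
import threading
import time

_DONE = object()  # sentinel that marks the end of the stream


class SpscFifo:
    """Single-producer/single-consumer ring buffer with a per-slot full flag."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._slots = [None] * capacity
        self._full = [False] * capacity
        self._head = 0  # next slot to read; touched only by the consumer
        self._tail = 0  # next slot to write; touched only by the producer

    def put(self, item):
        i = self._tail
        while self._full[i]:        # spin until the consumer frees this slot
            time.sleep(0)
        self._slots[i] = item
        self._full[i] = True        # publish only after the data is written
        self._tail = (i + 1) % self._capacity

    def get(self):
        i = self._head
        while not self._full[i]:    # spin until the producer fills this slot
            time.sleep(0)
        item = self._slots[i]
        self._full[i] = False       # hand the slot back to the producer
        self._head = (i + 1) % self._capacity
        return item


def run_pipeline(inputs, front, back, capacity=4):
    """Run `front` and `back` as two fixed sub-pipelines joined by one FIFO."""
    fifo = SpscFifo(capacity)
    results = []

    def front_worker():             # stands in for the stages placed on core 1
        for x in inputs:
            fifo.put(front(x))
        fifo.put(_DONE)

    def back_worker():              # stands in for the stages placed on core 2
        while True:
            item = fifo.get()
            if item is _DONE:
                break
            results.append(back(item))

    workers = [threading.Thread(target=front_worker),
               threading.Thread(target=back_worker)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

For example, `run_pipeline(range(5), lambda x: x * 2, lambda x: x + 1)` passes each item through both stages in order. The assignment of stages to workers is fixed before the run, which mirrors the static scheduling named on the slide; a real implementation would pin each worker to its own core.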
Core Dedication for Real-time Support
• Exclusively allocate enough cores for SDR processing in multi-core systems
– Guarantee the CPU, cache and memory bandwidth resources for predictable performance
– Achieve µs-level timing control
– Simple abstraction, and easier to implement in standard OSes than an RT scheduler
• Implemented in WinXP without modifications to the kernel
Implementation
• Sora software platform on Win XP
– 14K lines of C code, including PCIe driver framework, memory management, FIFO management, etc.
• SoftWiFi: full implementation of IEEE 802.11a/b/g PHY and DCF MAC
– 9K lines of C code; 4 man-months for dev & test
– DSSS 1, 2, 5.5, 11Mbps for 11b; OFDM 6, 9, 12, 18, 24, 36, 48, 54Mbps for 11a/g
Results: PHY Processing
[Figure: two bar charts of required computation (giga cycles per second, axis 0 to 10) by rate: 1M, 2M, 5.5M, 11M for 802.11b and 6M, 24M, 54M for 802.11a/g. Bar value labels: 11.6, 11.7, 11.7, 11.8, 18.3, 60.4, 132.4. The second chart is titled "After Sora Optimization". Annotations: >30x speedup, ~10x speedup]
Results: PHY Processing
[Figure: required computation (giga cycles per second) by rate for 802.11b and 802.11a/g, before and after Sora optimization, as on the previous slide]
Sora enables software implementation of today's high-speed wireless systems in a standard PC with a few cores
Results: End-to-end Throughput
Communicating with commercial 802.11a/b/g card
[Figure: bar chart of throughput (Mbps, axis 0 to 25) by modulation mode (1M, 2M, 5.5M, 11M, 6M, 24M, 54M) for three pairings: Sora-Commercial, Commercial-Commercial, Commercial-Sora]
Results: End-to-end Throughput
Communicating with commercial 802.11a/b/g card
[Figure: throughput (Mbps) by modulation mode for Sora-Commercial, Commercial-Commercial, and Commercial-Sora, as on the previous slide]
Seamlessly interoperate with commercial WiFi
• Correctness of all PHY algorithms
• Satisfying timing requirements of standards
• Commercial-equivalent performance
Extensions
• TDMA MAC
• Jumbo frames in 802.11
Extensions: New Applications
Conclusion
• Sora is a fully programmable software radio platform on commodity PC architecture
– Easy C programming on multi-core CPU
– High performance: high processing speed, low latency, and performance guarantee
• Confirmed by SoftWiFi, the first fully interoperable IEEE 802.11 (PHY and MAC) on general purpose processors
• Plan to release Sora SDK to research community
– H/W: RCB + 2.4G RF front-end set (~$2K USD)