A RISC-V ISA EXTENSION FOR ULTRA-LOW POWER IOT WIRELESS SIGNAL - PowerPoint PPT Presentation

A RISC-V ISA EXTENSION FOR ULTRA-LOW POWER IOT WIRELESS SIGNAL PROCESSING
Carolynn Bernier, Hela Belhadj Amor, Zdenĕk Přikryl
Oct 1, 2019

ULP WIRELESS DESIGN @ LETI
[Figure: timeline of ULP wireless designs at Leti (2003, 2005, 2010, today). Labels: RFID, Atmel-Starchip, VHBR 65nm, Digbee, ULP RF SoC, Letibee, Foxy, UWB, UWB Impulse, LDR-TCR radio, Hybrid UWB / RFID, UMETAG, Wake-up radio. Block-diagram labels: LC FILTER, CONFIG, BASEBAND, INPUT & OUTPUT, PLL, RX, MN, TX.]
C. Bernier | October 1, 2019 | 2

SOFTWARE RADIO FOR ULP IOT
Motivation: a software-defined "smart" wireless transceiver for IoT
• PHY-agnostic solution for LPWA-IoT
• Address "multi-mode" markets and lower hardware bug fix costs
• Offer future-proofed designs to our clients
• Our clients' advanced prototypes have evolving needs: satellite-IoT, ultra-wideband localization, LPWA-IoT.
• A new experimental platform
• Design new "RF software sensors"
• Use lightweight ML algorithms to extract information from the RF signal
[Figure label: Software-Defined Transceiver]

SOFTWARE RADIO FOR ULP IOT
• Bottleneck: existing software-defined radio (SDR) solutions are NOT ULP!
  • High cost (200 - 5K USD)
  • General purpose → high power [Akeela, 2018]

SOFTWARE RADIO FOR ULP IOT
Solution: design of a ULP-SDR
• SDR-based IoT node
  • Heterogeneous multi-core platform
• Similar requirements in most IoT transceivers (BW < 5 MHz)
• Differing requirements in most IoT transceivers
• Challenge: target mW-level power consumption
[Figure: block diagram of the SDR-based IoT node. Labels: wideband RF for 2.4 GHz (ISM) or 1.6 GHz (satellite) or UWB or sub-GHz (ISM) …; configurable DFE; ULP SDR; MCU (application, protocol stack); MEM; PMU; sensor I/F.]

SYSTEM REQUIREMENTS
• Target architecture
  • A very small and fast core (signoff ~300 MHz) coupled with a TCPM and a TCDM
  • Software DSP limited to decimated sample streams
  • DFE includes easily configurable, common HW operators: FIR filters, down-converters, AGC…
• Real-time processing of complex samples
  • Samples are temporarily stored in a sample buffer and processed in blocks
  • Integer processing only
• Limit size of memory → big impact on power → configurable in size
  • TCPM (high-speed, non-volatile)
  • TCDM (stack usage!)
  • Sample buffer
• Limit reads/writes to TCM
• Single-cycle sleep
  • Wait for next block of samples
  • Radio = OFF/ON

COMPUTING FOR WIRELESS DSP
• Wireless DSP requires linearity and low distortion
  • Operators MUST NOT saturate
  • Operators MUST NOT overflow → but checking for overflows is too costly
• Wireless DSP must conserve dynamic range (DR)
  • The useful signal is often contained in the least significant bits
  • Beware of quantization noise → take care when rescaling the signal!
• Most wireless signals are complex: i(t) + j*q(t)
  • Frequent use of MUL, ADD, SUB, MAG, SHIFT, … instructions on 8/16/32-bit complex data
  • Demodulation/compensation algorithms are mostly based on correlations → i.e. multiplication
• Input signal stream is typically <= 8 bits
  • i.e. data streams are typically 8 / 16 / 32 bits
  • → fits well on a 32-bit machine
[Figure: complex multiplier with inputs I1, Q1, I2, Q2: four multipliers (X) feeding one subtractor (-) and one adder (+).]
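The points above about overflow and dynamic range can be illustrated with a small sketch. It is not taken from the slides: the function names cmul_int, signed_bits_needed and rescale are hypothetical, and round-half-up rescaling is just one possible choice. It shows that the product of two 8-bit complex samples needs at most 17 signed bits, which is why such data sits comfortably in a 32-bit word without overflow checks.

```python
def cmul_int(i1, q1, i2, q2):
    """Integer complex multiply: (i1 + j*q1) * (i2 + j*q2) -> (real, imag)."""
    real = i1 * i2 - q1 * q2
    imag = i1 * q2 + q1 * i2
    return real, imag


def signed_bits_needed(value):
    """Smallest two's-complement width able to hold `value`."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def rescale(value, shift):
    """Arithmetic right shift with round-half-up (an assumed rounding policy).

    Plain truncation would add a systematic negative bias, which matters
    when the useful signal lives in the least significant bits.
    """
    if shift == 0:
        return value
    return (value + (1 << (shift - 1))) >> shift


# Worst case for signed 8-bit inputs: all four components equal to -128.
real, imag = cmul_int(-128, -128, -128, -128)
print(real, imag, signed_bits_needed(imag))
```

The worst-case imaginary part is 32768, one more than a signed 16-bit value can hold, so the extra headroom of a 32-bit register is what lets the operators avoid both saturation and overflow checks.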

WHICH PROCESSOR FOR OUR SDR?
• Academic: dedicated processors
  • Custom SIMD [Chen, HPCA16]
    • Promising power consumption
    • Dedicated architectures → difficult to program
    • No software toolchains
  • Custom MCU [Wu, GlobalSIP16]
    • Low-frequency clock → large surface overheads
    • Inefficient use of advanced CMOS nodes
• Commercial: GP processors, DSP
  • Previous work: M3/M0+ vs. RISCY [Belhadj, DATE19]
    • Lessons learned: a GP processor can rival dedicated SoA processor architectures (with additional benefits)
    • Lessons learned: size of register file has a huge impact on cycle count → RISC-V advantage!
    • Lessons learned: post-increment, HW loop, SIMD → not important in our test benches (mix of DSP computing and control)

PROCESSOR CUSTOMIZATION
• RISC-V-based acceleration?
  • Extend the RISC-V ISA using dedicated instructions
• Codasip Studio: → an easy task?
  • Instruction Accurate (IA) model of new instructions
  • Dedicated to RF DSP computing
  • "Zero cost" hardware implementation
[Figure: Codasip Studio toolset. Labels: HDK (CA): automatic RTL generation, powerful high-level synthesis, Verilog, VHDL; SDK (IA/CA): automatic toolchain generation, standards-based tools & models; verification automation and processor validation; VSP: virtual prototypes.]

EXPLORING THE INSTRUCTION JUNGLE
• Wanted
  • Minimal set of USEFUL instructions.
  • Only 32-bit opcodes for low decoding complexity.
• Opportunities
  • Wide opcodes mean up to 5 operands!
  • First operation on 8-bit data is ALWAYS a complex multiplication
  • Advanced CMOS allows single-cycle operators
  • Tiny relative cost of ALU operators
• REJECTED
  • More general solution preferred:
    • Halving variants (e.g. RADD)
  • Not clearly indispensable:
    • CSMUL (complex-scalar multiply)
  • Useless:
    • Saturating instructions, MIN/MAX, 8-bit SIMD, CONJ
[Figure: data for 45 nm, 0.9 V, from [M. Horowitz, ISSCC 2014].]

PROPOSED EXTENSION
• 15 instructions using 3 major opcodes
• "Zero-cost"
  → Reconfigurable HW
  → Systematic output DR adjust
• "Low-cost"
  → 4 output / 2 input port register file
  → Duplicated ALU
• "Higher-cost"
  → 3 more 32-bit multipliers

WIRELESS DSP TESTBENCHES
• Testbench 1: FSK demodulation
• Testbench 2: LoRa preamble synchronization
  • Spreading Factor (SF) = 7, 11
• Testbench 3: 16 and 32-bit FFT
  • Radix-4 decimation-in-frequency, complex FFT with bit-reversed outputs, N = 128, 2048
  • Based on source code from a port of the ARM CMSIS DSP library to RISC-V
• Testbench 4: CORDIC algorithm
  • 10-iteration CORDIC algorithm applied to 32-bit complex input data.
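For readers unfamiliar with Testbench 4, here is a minimal integer-only CORDIC sketch. The slides do not say which CORDIC mode the testbench uses, so vectoring mode (magnitude and phase of a complex sample) is an assumption here, as are the function name cordic_vectoring, the 16 fractional bits used for the angle, and the initial quadrant pre-rotation.

```python
import math


def cordic_vectoring(i, q, iterations=10, angle_frac_bits=16):
    """Rotate the integer vector (i, q) onto the positive I axis.

    Uses only shifts and adds inside the loop. Returns (magnitude, angle):
    the magnitude is scaled by the CORDIC gain (about 1.6468 for 10
    iterations), and the angle is in radians with `angle_frac_bits`
    fractional bits.
    """
    scale = 1 << angle_frac_bits
    # Table of arctan(2^-k); in hardware this would be a small ROM.
    atan_table = [round(math.atan(2.0 ** -k) * scale) for k in range(iterations)]
    half_pi = round(math.pi / 2 * scale)

    # Pre-rotate by +/-90 degrees so the vector starts in the right half-plane.
    angle = 0
    if i < 0:
        if q >= 0:
            i, q, angle = q, -i, half_pi
        else:
            i, q, angle = -q, i, -half_pi

    for k in range(iterations):
        if q >= 0:
            # Rotate clockwise by arctan(2^-k).
            i, q, angle = i + (q >> k), q - (i >> k), angle + atan_table[k]
        else:
            # Rotate counter-clockwise by arctan(2^-k).
            i, q, angle = i - (q >> k), q + (i >> k), angle - atan_table[k]
    return i, angle


mag, ang = cordic_vectoring(-3 << 16, 4 << 16)
print(mag / (5 << 16), ang / 65536)
```

With 10 iterations the residual angle error is bounded by roughly arctan(2^-9), about 0.002 rad, which is the usual trade-off between iteration count and precision in this family of algorithms.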

RESULTS
Power model:
  Instruction class                Baseline   +Extensions
  All instr. except NOP and MUL    1          1.05
  MUL                              1.14       1.14
  MULC16-32 / MULC16               -          1.3
  MULC32                           -          1.59
• Expect at least ~50% power reductions with reduced clock and VDD.
  Testbench                      Cycle count improvement (IA model)   Energy improvement (est.)
  FSK Demod                      22 %
  LoRa, SF=7                     49 %                                 46 %
  LoRa, SF=11                    52 %                                 50 %
  16-bit FFT, N=128              55 %                                 53 %
  16-bit FFT, N=2048             57 %                                 55 %
  32-bit FFT, N=128              34 %                                 32 %
  32-bit FFT, N=2048             34 %                                 30 %
  32-bit CORDIC, 10 iteration    28 %

FUTURE WORK
• Finish CA model & run power/area analysis in 22 nm
• Reconfigurable hardware blocks designed in CodAL. Ex: 32-bit multiplication
[Figure: reconfigurable 32-bit multiplier. Inputs: src1[15:0], src1[31:16], src2[15:0], src2[31:16]. Partial products: p00[31:0], p10[31:0], p01[31:0], p11[31:0].
 CASE: 32-bit integer multiplication. Labels: p10[31:0], p01[31:0], px[32:0], […00, px, 00..], [p11[31:0], p00[31:0]], P[63:0].
 CASE: 16-bit complex multiplication. Labels: p10[31:0], p01[31:0], p11[31:0], p00[31:0], two's compl., p real[31:0], p mag[31:0].]

Special thanks to: Hela Belhadj Amor, Zdenĕk Přikryl, Jerry Ardizzone
And: Ivan Miro Panades, Yves Durand, Henri-Pierre Charles, Simone Bacles-Min, Romain Lemaire
… and all of LISAN!
Leti, technology research institute
Commissariat à l’énergie atomique et aux énergies alternatives
Minatec Campus | 17 rue des Martyrs | 38054 Grenoble Cedex | France
www.leti.fr

PROCESSOR CUSTOMIZATION
• Step 1: ISA exploration using IA model

  element opc_name
  {
      use instance_data_type as name of instances;               (used by IA and CA models)
      assembler { textual form of the instruction };             (used by IA and CA models)
      binary { the instruction's binary coding };                (used by IA and CA models)
      semantics                                                  (used by IA model)
      {
          The instruction's behavior is described using a subset of the ANSI C language.
          Call to memory interface if_ldst
      };
  };

RECONFIGURABLE MULTIPLIER (8 BIT EXAMPLE HERE)
State 1: the block performs 8-bit integer multiplication: a[7:0] * b[7:0] = P[15:0]
  p00[7:0] = a[3:0] * b[3:0]
  p10[7:0] = a[7:4] * b[3:0]
  p01[7:0] = a[3:0] * b[7:4]
  p11[7:0] = a[7:4] * b[7:4]
  P[15:0] = p00[7:0] + (p10[7:0] << 4) + (p01[7:0] << 4) + (p11[7:0] << 8)
[Figure: the four partial products p00, p10, p01, p11 aligned and summed into P[15:0].]
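The State 1 decomposition can be checked with a short sketch. It assumes unsigned 8-bit operands (the slide does not state signedness), and the function names partial_products and mul8_from_nibbles are hypothetical. The p11 term is weighted by 2^8 because both of its factors are upper nibbles.

```python
def partial_products(a, b):
    """Split two unsigned 8-bit operands into nibbles and form the four 4x4 products."""
    a_lo, a_hi = a & 0xF, (a >> 4) & 0xF
    b_lo, b_hi = b & 0xF, (b >> 4) & 0xF
    return {
        "p00": a_lo * b_lo,  # a[3:0] * b[3:0]
        "p10": a_hi * b_lo,  # a[7:4] * b[3:0]
        "p01": a_lo * b_hi,  # a[3:0] * b[7:4]
        "p11": a_hi * b_hi,  # a[7:4] * b[7:4]
    }


def mul8_from_nibbles(a, b):
    """Recombine the partial products into the full 16-bit product."""
    p = partial_products(a, b)
    return p["p00"] + (p["p10"] << 4) + (p["p01"] << 4) + (p["p11"] << 8)


# Exhaustive check over all unsigned 8-bit operand pairs.
print(all(mul8_from_nibbles(a, b) == a * b for a in range(256) for b in range(256)))
```

The same four small multipliers are reused in the complex mode, which is what makes the block reconfigurable rather than duplicated.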

RECONFIGURABLE MULTIPLIER (8 BIT EXAMPLE HERE)
State 2: the block performs a 4-bit complex integer multiplication: (I1 + j*Q1) * (I2 + j*Q2) = P_real + j*P_imag
Input is redefined:
  I1[3:0] = a[3:0]
  Q1[3:0] = a[7:4]
  I2[3:0] = b[3:0]
  Q2[3:0] = b[7:4]
[Figure: operands packed as Q1 | I1 and Q2 | I2, feeding the partial products p00[7:0], p10[7:0], p01[7:0], p11[7:0].]

RECONFIGURABLE MULTIPLIER (8 BIT EXAMPLE HERE)
State 2: the block performs a 4-bit complex integer multiplication: (I1 + j*Q1) * (I2 + j*Q2) = P_real + j*P_imag
  P_real = I1*I2 - Q1*Q2 = p00[7:0] - p11[7:0]
  P_imag = I1*Q2 + Q1*I2 = p01[7:0] + p10[7:0]
[Figure: operands packed as Q1 | I1 and Q2 | I2; partial products p00[7:0], p10[7:0], p01[7:0], p11[7:0] combined into P_real and P_imag.]
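A sketch of State 2, reusing the same four partial products as the integer mode. It assumes the 4-bit I and Q fields are signed two's-complement values (the slide does not say), and the helper names sign4, pack and cmul4_packed are hypothetical.

```python
def sign4(nibble):
    """Interpret the low 4 bits as a signed two's-complement value (-8..7)."""
    nibble &= 0xF
    return nibble - 16 if nibble & 0x8 else nibble


def pack(i, q):
    """Pack a 4-bit complex sample into one byte: Q in bits [7:4], I in bits [3:0]."""
    return ((q & 0xF) << 4) | (i & 0xF)


def cmul4_packed(a, b):
    """Complex multiply of two packed 4-bit samples via the four partial products."""
    i1, q1 = sign4(a), sign4(a >> 4)
    i2, q2 = sign4(b), sign4(b >> 4)
    p00 = i1 * i2  # a[3:0] * b[3:0]
    p10 = q1 * i2  # a[7:4] * b[3:0]
    p01 = i1 * q2  # a[3:0] * b[7:4]
    p11 = q1 * q2  # a[7:4] * b[7:4]
    return p00 - p11, p01 + p10  # (P_real, P_imag)


# (3 - 2j) * (-1 + 4j) = 5 + 14j
print(cmul4_packed(pack(3, -2), pack(-1, 4)))
```

Compared with the integer mode, only the recombination stage changes: the shifts are dropped, p11 is subtracted, and p01 and p10 are added without alignment. Under the signed assumption made here, the small multipliers would also need to handle signed operands.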
