Towards Benchmarking AIOT Device based on MCU
Dong Li
Seaway Technology Inc.; ICT, CAS
2019-11-15

Outline
1. MCU-based AIOT Device and Benchmarking
2. SeawayRTOS Intro. & Auditing Kernel
3. Early Experiments for Benchmarking
4. Benchmarking Goal and Method
Bench19 Seaway tech.

01 MCU-based AIOT Device and Benchmarking

MCU-based AIOT Device
1. Tiny smart devices with computing ability are already cheap and everywhere.
2. The future of machine learning will be tiny.

MCUs and Sensors Are Already in the Milliwatt Range
Source: ARM & Princeton [arXiv:1905.12107]
Typical component power:
- 6-inch display: 400 mW
- 4G cell radio: 800 mW
- Low-power BLE 4.0 & WIFI: 100 mW
- Gyroscope sensor: 130 mW
- GPS: 180 mW
- 1/4 CMOS camera: 300 mW

Deep Learning Works Well and Energy-Efficiently on MCUs
1. ARM CMSIS-5 for Cortex-M
   - CMSIS-NN
   - uTensor
2. TensorFlow Lite for MCU
   - Person detection
   - Speech keyword spotting
   - Classifying physical gestures
3. Microsoft Embedded Learning Library (ELL)
Boards shown: Nordic nRF52840 (BLE), ESP32 SoC (WIFI and BLE), STM32F746 Discovery kit, SparkFun Edge with Apollo3

Existing MCUs and New AIOT Low-Power Processors
Existing MCU/DSP
1. MCU 40~200 MHz
2. RAM (SDRAM) 32 KB ~ 512 KB
3. ROM (Flash) 512 KB ~ 1 MB
4. Energy ~100 uA/MHz (1.2 V - 5 V)
Example: ESP32 running TFLite for face recognition
New AIOT Processor (MCU/DSP+NPU)
1. MCU+NPU by ARM or RISC-V
2. MCU+DSP+special NN accelerator by ARM/RISC-V/FPGA
3. MCU+PIM (Process in Memory) chip
Example: ICT RISC-V MCU+NPU FPGA board

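As a rough worked example of the "~100 uA/MHz" figure above, active power can be estimated as current per MHz times clock frequency times supply voltage. The function name and the sample operating point (80 MHz at 3.3 V) are illustrative assumptions, not values from the slides.

```python
def active_power_mw(ua_per_mhz: float, clock_mhz: float, supply_v: float) -> float:
    """Estimate active power in milliwatts from a uA/MHz rating."""
    current_ma = ua_per_mhz * clock_mhz / 1000.0  # uA -> mA
    return current_ma * supply_v                  # mA * V = mW


# Assumed operating point: 100 uA/MHz, 80 MHz, 3.3 V -> about 26.4 mW
estimate = active_power_mw(100, 80, 3.3)
print(f"{estimate:.1f} mW")
```

Under these assumptions the core alone sits well below the display and cell-radio figures quoted earlier in the deck, which is the point of the milliwatt argument.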
Benchmarking Goal: The Best Shape
[Radar chart axes: Computing Performance, Energy Consumption, Max ROM, Max RAM, Cost, Accuracy]
- A spindle shape is the best shape.
- Energy is measured in picojoules per op.

02 SeawayRTOS Intro. & Auditing Kernel

SeawayRTOS for AIOT Devices
1. KB-level Seaway RTOS Kernel
   - Auditing kernel
   - Active sleep mode
   - ROM < 10K, RAM < 1K, TCB < 10B
   - Task fail rate < 0.1%
2. Seaway Runtime (KB-level runtime AIOT framework)
   - Online AIOT App Store
   - Support for JavaScript and Python
   - ROM < 100K, RAM < 2K
3. Seaway EdgeStack (KB-level EdgeStack)
   - Function migration
   - Support for MQTT, CoAP and HTTP
   - WIFI, BLE, LoRA, NB-IOT and Zigbee
   - ROM < 32K, RAM < 2K
   - Response to request < 200 ms
[Architecture diagram labels: App (x4), Seaway Runtime, Seaway EdgeStack, Energy Opt., Files, Kernel, HAL & BSP, AI core, Big Core, Little Core, OS, Sensor Hub, Inference, Data/Ins. Bus, I/O Bus, Memory, Comm., Controller (x2), Sensors, Actuators]

Seaway Runtime Technical Features
1. AIOT App Store
   - Diskless execution method for AIOT Apps (apps run without being written to storage)
   - Quasi-single-machine programming for the edge domain
2. AIOT Runtime Development
   - On Kernel: native C/C++
   - On Runtime: JavaScript/Python
   - Dynamic task allocation and execution
3. Less code than a traditional embedded program
Experiment result (by ECMA-262 benchmark)
Evaluation index     WebletScript   JerryScript   Duktape   Espruino
Compatibility (%)    58.6           99.7          99.4      66.5
Footprint (KB)       80             168           184       231

Seaway EdgeSuite
Developers now need only one application for the whole end-edge-cloud system.
- End AIOT device: Seaway RTOS
- Edge AIOT device: Seaway Edge
- Cloud: Seaway Cloud

Auditing Kernel Design: Design Goals
- Enable kernel information monitoring for an event-driven RTOS. The monitoring should be in the kernel.
- A lightweight resource auditing tool: less than 1 KB ROM and 1 KB RAM.
- Early security warning when an abnormal resource usage pattern is captured.

Auditing Kernel Design
- Process
  - Confirm the execution entity of a task
  - Locate the executable code segment
- Event
  - The event statistics data of a task in the kernel
  - Identify abnormal event usage
- Hardware resource usage
  - Quantity and pattern of the consumption of hardware resources, including processor, memory, radio and sensors

Seaway Resource Auditing Overview
1. The Resource Auditor Module collects the running information and generates the log data of an AIoT device.
2. Seaway analyzes the log data on Edge devices according to the corresponding resource usage model.
3. The AIoT devices receive the performance status.

Kernel Auditing Architecture
- Data Hook
  - Process-Event Model
  - Hardware Time-Base Model
- Data Processing Module
- Warning Handle Module
- Kernel inner loop function
  - The entity of a task
  - The executable code segment
  - Set up hooks in basic kernel functions such as do_poll / do_event
  - Save the data in the local file system
  - Or send it out to the gateway for analysis

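The data-hook idea above can be modeled in a few lines. This is only a sketch of the data flow, written in Python for readability; the real hooks would live in the C kernel. The names do_poll and do_event come from the slide, while AuditHook, its fields, and the sink callback are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

EventCounts = Dict[Tuple[str, str], int]


class AuditHook:
    """Hypothetical auditor: counts (task, event) pairs per period."""

    def __init__(self, sink: Callable[[int, EventCounts], None]) -> None:
        self.sink = sink          # e.g. write to local file system or send to gateway
        self.period = 0
        self.counts: Counter = Counter()

    def record(self, task: str, event: str) -> None:
        self.counts[(task, event)] += 1

    def end_period(self) -> None:
        # Hand the finished period to the sink, then start a new one.
        self.sink(self.period, dict(self.counts))
        self.counts.clear()
        self.period += 1


def do_event(hook: AuditHook, task: str, event: str) -> None:
    hook.record(task, event)      # data hook runs before normal delivery
    # ... the kernel would deliver the event to the task here


def do_poll(hook: AuditHook, task: str) -> None:
    hook.record(task, "poll")
    # ... the kernel would run the task's poll handler here


log = []
hook = AuditHook(lambda period, counts: log.append((period, counts)))
do_event(hook, "net", "tcp_send")
do_event(hook, "net", "tcp_send")
do_poll(hook, "net")
hook.end_period()
```

Keeping the hook to a counter update is one way to stay within a small ROM/RAM budget; the heavier analysis is left to whatever consumes the sink.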
Capture the Kernel Data for Hardware Resources
- Hardware resource scheduling
  - Quantity and pattern of the consumption of events and tasks
Category                Component               Parameter                Kernel Events
Network                 WiFi                    wifi_init_result         WiFi init
                                                wifi_mode                WiFi set_mode
                                                wifi_state               WiFi On/Off
                        Network Data Package    source                   source IP
                                                destination              destination IP
                                                                         package_transfer
System Information      Task Scheduling Data    taskID                   xTaskCreate
                                                task_running_frequency   portYIELD, xPortSysTickHandler
Hardware Module Usage   CPU                     CPU_Frequency            CPU frequency switch
                        Sensors                 enviroment_data          sensor_get_data
                                                Sensors_Frequency        sensor frequency switch

03 Experiments for getting bench score

Experiment Setup
- SeawayRTOS
  - An event-driven scheduling system
  - Multi-threaded
  - Lightweight threading technology: Protothreads
  - File system (Coffee)
  - Network support: LwIP
  - OTA
- CC2538 + ESP32
  - An ARM Cortex-M3 with up to 32 MHz clock speed
  - 32 KB of RAM
  - 256 KB flash
  - Zigbee in CC2538
  - WIFI/BLE in ESP32

Evaluation
We capture the kernel event and process information of a benchmark task running on SeawayRTOS.

The Process-Event Analysis Result
- Periods 1056 and 1057 contain operations that differ from the base behavior of this benchmarking task.
- The system is using the radio to send data in these periods.
- A warning is generated.
[Figure: the analysis result of the TCP/IP experiment with the Process-Event Model]

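One simple way to express this kind of check is to compare each period's event counts with a baseline profile and flag periods that contain extra or unknown events. The sketch below is an assumption about how such a comparison could look, not the analysis actually used in the experiment; the function name, event names, and counts are made up for illustration.

```python
from typing import Dict, List


def abnormal_periods(baseline: Dict[str, int],
                     periods: Dict[int, Dict[str, int]],
                     tolerance: int = 0) -> List[int]:
    """Return periods where some event count exceeds baseline + tolerance.

    Events missing from the baseline are treated as having a count of 0,
    so any unexpected event (for example a radio send) is flagged.
    """
    flagged = []
    for period, counts in sorted(periods.items()):
        if any(n > baseline.get(event, 0) + tolerance
               for event, n in counts.items()):
            flagged.append(period)
    return flagged


# Made-up data: period 2 sends on the radio, period 3 has one extra timer event.
baseline = {"timer": 4, "sensor_read": 1}
observed = {
    1: {"timer": 4, "sensor_read": 1},
    2: {"timer": 4, "radio_send": 3},
    3: {"timer": 5},
}
print(abnormal_periods(baseline, observed))
```

A non-zero tolerance trades sensitivity for fewer false warnings on tasks whose event counts jitter from period to period.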
The Time-Base Analysis Result
- We obtained the working state information of CPU, memory, radio and sensors.
- Periods 5 and 6 contain suspicious operations compared with the normal behavior of this application.
- The system is using the radio to listen for other data in these periods.
- A warning is generated, and the task should be suspended until the administrator decides.
[Figure: the analysis result of the Time-Base Model]

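A time-base check can be sketched as a comparison of per-period usage fractions against a normal profile for each hardware resource. This is a minimal illustration under assumed names and numbers (suspicious_usage, the 0.10 threshold, and the sample fractions are all hypothetical), not the model used in the experiment.

```python
from typing import Dict, List, Tuple


def suspicious_usage(normal: Dict[str, float],
                     observed: Dict[int, Dict[str, float]],
                     max_delta: float = 0.10) -> List[Tuple[int, str]]:
    """Return (period, resource) pairs whose active-time fraction
    deviates from the normal profile by more than max_delta."""
    warnings = []
    for period in sorted(observed):
        for resource, fraction in observed[period].items():
            if abs(fraction - normal.get(resource, 0.0)) > max_delta:
                warnings.append((period, resource))
    return warnings


# Made-up fractions of each period during which a resource was active.
normal_profile = {"cpu": 0.20, "radio": 0.05}
usage = {
    4: {"cpu": 0.22, "radio": 0.05},
    5: {"cpu": 0.25, "radio": 0.40},  # radio active far longer than normal
}
print(suspicious_usage(normal_profile, usage))
```

Each returned pair could then drive the warning step, for example suspending the task until an administrator reviews it.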
04 Benchmarking Goal and Method

1. An Open-Source Testbed Board with Sensors and Radios
- A: Low-power BLE/WIFI module
- B: MIC
- C: Accelerometers
- D: Temperature & humidity
- E: multi-threaded Protothreads
- F: CMOS image sensor
- G: PIR (motion) sensor
- H: GPS
- The main processor

Run the Benchmark Tasks on Datasets
- Chars74K dataset: character recognition
- CIFAR-10: object classification
- MNIST database of handwritten digits
We can provide some baseline results on these datasets with our own implementation on STM32 and ESP32.
- Wechat Audio 100: keyword spotting (by Seaway Tech.)
- Band Accelerator Data, 100 hours, and Band Heart Rate, 100 hours: pattern recognition for DL and SVM algorithms (by Seaway Tech.)

Benchmark Design
First satisfy:
1. Benchmark algorithm accuracy > baseline
2. Max ROM < baseline
3. Max RAM < baseline
4. Processor cost
Then compare: how much energy a single benchmark task costs, given in picojoules per op.

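The two-stage design above (gate on the baseline, then rank by energy) can be sketched as follows. This is one possible reading, not the benchmark's official scoring: the Result fields, function names, and all numbers are illustrative assumptions, and the processor-cost gate is left out because the slide does not state how cost is compared.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    accuracy: float      # fraction correct on the benchmark dataset
    max_rom_kb: float
    max_ram_kb: float
    ops_per_task: float  # operations needed for one benchmark task
    pj_per_op: float     # measured energy per operation, in picojoules


def passes_gates(r: Result, base: Result) -> bool:
    """Stage 1: accuracy above baseline, ROM and RAM below baseline."""
    return (r.accuracy > base.accuracy
            and r.max_rom_kb < base.max_rom_kb
            and r.max_ram_kb < base.max_ram_kb)


def task_energy_uj(r: Result) -> float:
    """Stage 2 metric: energy of one task in microjoules."""
    return r.ops_per_task * r.pj_per_op / 1e6  # pJ -> uJ


def score(r: Result, base: Result) -> Optional[float]:
    """Energy per task if the gates pass, otherwise None (disqualified)."""
    return task_energy_uj(r) if passes_gates(r, base) else None


# Made-up baseline and candidates, for illustration only.
base = Result(0.90, 512, 64, 0, 0)
good = Result(0.92, 256, 32, 2e6, 5.0)
bad = Result(0.85, 256, 32, 1e6, 5.0)
print(score(good, base), score(bad, base))
```

Candidates that pass the gates would then be ranked by the returned energy, lower being better.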
Thanks
Dong Li
Seaway Technology Inc.
lidong@haiwei.tech

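Comparison
Item                                   AliOS Things                    Amazon FreeRTOS               Microsoft ThreadX             Seaway
License                                Community edition open source   Small part open source        Closed source                 Community edition open source
Base kernel footprint                  8 KB                            8 KB                          10 KB                         8 KB
Device-side application-layer stack    Separate protocols, 80K         MQTT stack, 20K               Proprietary protocol, 80K     MCH integrated stack, 32 KB
ML inference model support             -                               Supported                     Supported                     Supported
Low-power control                      -                               -                             Supported                     Supported (< 0.1 W)
Edge computing support                 -                               Supported                     Supported                     Supported
Native security mechanism              -                               -                             -                             Supported
Third-party application support        Device and cloud separate       Device-cloud integrated       Device-cloud integrated       End-edge-cloud integrated
IOT cloud service                      Bound to Alibaba Cloud          Bound to AWS                  Bound to Azure                Free choice
AI math library support                -                               To Cortex-A class             -                             To Cortex-M class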