RoboX: An End-to-End Solution to Accelerate Autonomous Control in Robotics


  • RoboX: An End-to-End Solution to Accelerate Autonomous Control in Robotics. Jacob Sacks, Divya Mahajan, Richard C. Lawson, Hadi Esmaeilzadeh†. Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology; †University of California, San Diego. ISCA ’18, Los Angeles, California.

  • Challenges in Autonomous Robotics: many diverse applications; compute-intensive; battery constraints; limited power budget.

  • Challenges in Autonomous Robotics [Figure labels: Mobile CPU Processor, Flight Time]

  • Challenges in Autonomous Robotics [Figure labels: Flight Time, Power]

  • Accelerating Planning and Control Model Predictive Control

  • RoboX Workflow: Domain-Specific Language → Program Translator → Macro Dataflow Graph → Controller Compiler → Statically-Scheduled Instructions (Computation Schedule, Communication Schedule, Memory Schedule). The DSL gives a concise mathematical description, the Program Translator automatically synthesizes the DFG, and the Controller Compiler statically schedules it on the accelerator. Example DSL program: System Quadrotor( ) { state position[3], angle[3]; input torque[4]; ... Task takeOff() { penalty target_height; constraint max_height; ... } }

  • Background: System Models [Figure: vehicle with four thrusts f_1 to f_4 and attitude angles roll (ψ), pitch (θ), and yaw (ɸ)]

  • Background: Dynamics and Constraints. General nonlinear dynamics: $\dot{x} = f(x, u)$, where $\dot{x}$ is the time derivative of the states $x$ and $u$ are the inputs. State and input constraints: $\underline{x} \le x \le \overline{x}$ and $\underline{u} \le u \le \overline{u}$.

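  • Background: Objective Function. The objective is the sum of a terminal cost and a running cost: $J = \underbrace{\phi(x_N)}_{\text{terminal cost}} + \underbrace{\sum_{k=0}^{N-1} \ell(x_k, u_k)}_{\text{running cost}}$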

  • Components of MPC: Model Predictive Control combines an objective function, dynamics, state constraints, and input constraints.
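To make the four components concrete, here is a minimal receding-horizon sketch in plain Python. It is an illustration only, not the solver RoboX uses: the scalar integrator model, the discretized input set, the cost weights, and the names `plan` and `run_mpc` are all assumptions chosen to keep the example small, and the planner simply enumerates every input sequence over a short horizon.

```python
import itertools

DT = 0.1                    # time step (assumed)
HORIZON = 3                 # planning horizon in steps (assumed)
INPUTS = (-1.0, 0.0, 1.0)   # input constraint: u restricted to a bounded discrete set
X_MAX = 5.0                 # state constraint: |x| <= X_MAX
TERMINAL_WEIGHT = 10.0      # weight on the terminal cost (assumed)


def dynamics(x, u):
    """Dynamics: a scalar integrator, x_next = x + DT * u."""
    return x + DT * u


def plan(x0, target):
    """Return the feasible input sequence with the lowest objective value."""
    best_cost, best_seq = float("inf"), None
    for seq in itertools.product(INPUTS, repeat=HORIZON):
        x, cost, feasible = x0, 0.0, True
        for u in seq:
            x = dynamics(x, u)
            if abs(x) > X_MAX:            # reject sequences that violate the state constraint
                feasible = False
                break
            cost += (x - target) ** 2     # running cost
        if not feasible:
            continue
        cost += TERMINAL_WEIGHT * (x - target) ** 2   # terminal cost
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq


def run_mpc(x0, target, steps):
    """Receding horizon: re-plan at every step and apply only the first input."""
    x = x0
    for _ in range(steps):
        x = dynamics(x, plan(x, target)[0])
    return x
```

Calling `run_mpc(0.0, 1.0, 30)` drives the state toward the target and holds it there. Exhaustive enumeration is only workable for tiny problems, which is one reason compute cost matters for realistic systems.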

  • Domain-Specific Language. Aims of the RoboX DSL: distill MPC into modular components (System, Task); remain close to mathematical expressions (symbolic expressions); stay independent of implementation (group operations).

  • DSL: System Component. System MobileRobot( ) { state pos[2]; state angle; input vel; input ang_vel; … } [Figure: mobile robot at (pos[0], pos[1]) in x-y-z axes with angle (θ), vel (v), and ang_vel (⍵)]

  • DSL: System Component. System MobileRobot( ) { state pos[2]; state angle; input vel; input ang_vel; pos[0].dt = vel * cos(angle); pos[1].dt = vel * sin(angle); angle.dt = ang_vel; … } [Figure: mobile robot at (pos[0], pos[1]) in x-y-z axes with angle (θ), vel (v), and ang_vel (⍵)]
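The three `.dt` lines in the MobileRobot system define the time derivatives of the states. As a rough sketch of what they mean numerically, the same equations can be written in plain Python and integrated with a forward Euler step. The function names and the choice of Euler integration are assumptions for illustration; the slides do not say how RoboX discretizes the dynamics.

```python
import math


def mobile_robot_dynamics(state, inputs):
    """Time derivatives of (pos[0], pos[1], angle) for inputs (vel, ang_vel)."""
    angle = state[2]
    vel, ang_vel = inputs
    return (
        vel * math.cos(angle),  # pos[0].dt
        vel * math.sin(angle),  # pos[1].dt
        ang_vel,                # angle.dt
    )


def euler_step(state, inputs, dt):
    """Advance the state by one forward Euler step of length dt (assumed integrator)."""
    derivs = mobile_robot_dynamics(state, inputs)
    return tuple(s + dt * d for s, d in zip(state, derivs))
```

For example, `euler_step((0.0, 0.0, 0.0), (1.0, 0.0), 0.1)` moves the robot 0.1 units along x and leaves pos[1] and the angle unchanged.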

  • DSL: Task Component System MobileRobot(...) { Task moveTo(…) { penalty target_x, target_y; target_x.running = pos[0] - desired_x; target_y.running = pos[1] - desired_y; …}}

  • DSL: Task Component. System MobileRobot(...) { Task moveTo(…) { penalty target_x, target_y; target_x.running = pos[0] - desired_x; target_y.running = pos[1] - desired_y; constraint pos_bound; pos_bound.running = sqrt(pos[0]^2 + pos[1]^2); pos_bound.upper_bound <= radius; } }
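The moveTo task can be read as two penalty terms plus one bounded constraint. The sketch below evaluates them in plain Python for a single time step. Combining the penalties as a sum of squares is an assumption made here for illustration (the slide does not state how penalties enter the objective), and the function names are hypothetical.

```python
import math


def move_to_penalties(pos, desired):
    """Running penalties target_x and target_y from the Task definition."""
    target_x = pos[0] - desired[0]
    target_y = pos[1] - desired[1]
    return target_x, target_y


def running_cost(pos, desired):
    """Assumed combination: sum of squared penalties (not stated on the slide)."""
    return sum(p ** 2 for p in move_to_penalties(pos, desired))


def pos_bound(pos):
    """Constraint expression: distance of the robot from the origin."""
    return math.sqrt(pos[0] ** 2 + pos[1] ** 2)


def pos_bound_satisfied(pos, radius):
    """The constraint holds when the expression stays at or below its upper bound."""
    return pos_bound(pos) <= radius
```

With `pos = (3.0, 4.0)` and a desired position at the origin, the penalties are 3.0 and 4.0, the constraint expression evaluates to 5.0, and the constraint holds for any radius of at least 5.0.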

  • RoboX Accelerator Architecture: a flexible dataflow architecture organized as a two-level hierarchy to handle a large number of data dependencies. [Figure labels: Compute Clusters 1 to N, each with several compute units (CUs); µCode Bus; Global µCode Buffer; Global LD/ST Buffer; Programmable Memory Access Engine; Shifter; Memory]

  • RoboX Accelerator Architecture: a compute-enabled interconnect performs simple operations on in-transit data.

  • RoboX Accelerator Architecture: each compute cluster executes separate compute and communication microprograms and can operate in a SIMD mode. [Figure labels: compute cluster with CU 0, CU 1, ..., CU N; µCode; µCode Bus]

  • RoboX Accelerator Architecture: compute units do not initiate communication requests but consume data from single-hop connections and a shared bus. [Figure labels: CU 0, CU 1, ..., CU N]

  • RoboX Accelerator Architecture: the compute unit is a three-stage pipeline and divides its memory into separate buffers to simplify communication scheduling. [Figure labels: Neighbor (Left), Neighbor (Right), Nonlinear State Buffer, Nonlinear Input Buffer, Gradient Buffer, Hessian Buffer, Interm Buffer]

  • RoboX Accelerator Architecture: the programmable memory access engine prefetches instructions and data according to its own statically-scheduled microprogram. [Figure labels: Global µCode Buffer, Global LD/ST Buffer, Programmable Memory Access Engine, Shifter, Memory, µCode Bus]

  • Instruction Set Architecture. Compute instructions: Scalar, SIMD. Communication instructions: Data Transfer, In-Network. Memory instructions: Load, Store.

  • Program Translator: takes the states and inputs, dynamics function, and objective function from the Domain-Specific Language and fills in a parameterized solver template, using automatic differentiation for the necessary gradients.
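As an illustration of what automatic differentiation provides, the sketch below uses forward-mode dual numbers in plain Python to differentiate the MobileRobot expression `pos[0].dt = vel * cos(angle)`. This is a generic technique chosen for brevity; the slides do not say which differentiation method the Program Translator uses, and the `Dual` class and `partial` helper are hypothetical names.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one chosen variable."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants with zero derivative.
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def cos(x):
    """cos with the chain rule applied to the derivative part."""
    return Dual(math.cos(x.value), -math.sin(x.value) * x.deriv)


def pos0_dt(vel, angle):
    """The DSL dynamics line pos[0].dt = vel * cos(angle)."""
    return vel * cos(angle)


def partial(f, args, index):
    """Partial derivative of f with respect to args[index], evaluated at args."""
    duals = [Dual(a, 1.0 if i == index else 0.0) for i, a in enumerate(args)]
    return f(*duals).deriv
```

Here `partial(pos0_dt, (2.0, 0.5), 1)` gives the derivative with respect to `angle`, which matches the analytic value `-vel * sin(angle)`, and index 0 gives `cos(angle)`.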

  • Controller Compiler: mapping and scheduling produce a computation instruction schedule, a communication instruction schedule, and a memory instruction schedule for the accelerator. [Figure labels: Compute Clusters 1 to N with CUs, Global µCode Buffer, Global LD/ST Buffer, Programmable Memory Access Engine, Shifter, Memory, µCode Bus, µCode Decode]
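Static scheduling of a dataflow graph can be sketched with a simple list scheduler: in each cycle, issue as many ready operations as there are compute units. The code below is a generic illustration, not the RoboX compiler's algorithm (which the slides do not describe). It assumes every operation takes one cycle, ignores communication cost, and uses a small hypothetical graph built from the MobileRobot dynamics.

```python
def list_schedule(deps, num_units):
    """Assign operations to cycles so that each runs after all of its dependencies.

    deps maps each operation name to the set of operations it depends on.
    Returns a list of cycles, each a list of operations issued in that cycle.
    """
    remaining = {node: set(d) for node, d in deps.items()}
    done = set()
    schedule = []
    while remaining:
        # An operation is ready once all of its dependencies have completed.
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle in dataflow graph")
        issued = ready[:num_units]  # at most one operation per compute unit
        schedule.append(issued)
        done.update(issued)
        for node in issued:
            del remaining[node]
    return schedule


# Hypothetical dataflow graph for pos[0].dt and pos[1].dt.
DFG = {
    "cos_angle": set(),
    "sin_angle": set(),
    "pos0_dt": {"cos_angle"},  # vel * cos(angle)
    "pos1_dt": {"sin_angle"},  # vel * sin(angle)
}
```

With two compute units this graph finishes in two cycles, while a single unit needs four, which shows how independent operations in the graph translate into parallelism.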

  • Benchmarks (Name: System, Task)
      MobileRobot: Two-Wheel Mobile Robot, Trajectory Tracking
      Manipulator: Two-Link Manipulator, Reaching
      AutoVehicle: Four-Wheel Vehicle, High-Speed Racing
      MicroSat: Miniature Satellite, Orbit Control
      Quadrotor: Four-Rotor Micro UAV, Motion Planning
      Hexacopter: Six-Rotor Micro UAV, Attitude Control

  • Platforms. CPUs: ARM Cortex A57 (low power), Intel Xeon E3 (high performance). GPUs: Tegra X2 (low power), GTX 650 Ti (desktop class), Tesla K40 (high performance).

  • Evaluation [Figure: speedup of RoboX over the ARM and Xeon CPUs for MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, and the geomean; two off-scale bars are labeled 79X and 65X]. On average, RoboX achieves a 29.4X and 7.3X speedup over the ARM A57 and Xeon E3, respectively.

  • Evaluation [Figure: speedup of RoboX relative to the GTX 650 Ti, Tegra X2, and Tesla K40 for MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, and the geomean]. On average, RoboX achieves a 2.0X and 3.5X speedup over the GTX and Tegra, respectively, and is 1.3X slower than the Tesla.

  • Evaluation [Figure: performance-per-watt of RoboX relative to the GTX 650 Ti, Tegra X2, and Tesla K40 for MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, and the geomean]. On average, RoboX achieves a 65.5X, 7.9X, and 71.8X performance-per-watt improvement over the GTX, Tegra, and Tesla, respectively.
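The average speedup and performance-per-watt figures can be combined to estimate how much less power RoboX draws than each GPU. The sketch below assumes that the performance-per-watt improvement equals the speedup multiplied by the ratio of baseline power to RoboX power, and that both averages are geometric means over the same benchmarks (in which case the ratio of the means equals the mean of the per-benchmark ratios). The resulting power ratios are derived estimates, not numbers reported on the slides.

```python
# Average speedups over each GPU, as reported on the slides.
# "1.3X slower than the Tesla" corresponds to a speedup of 1 / 1.3.
speedup = {
    "GTX 650 Ti": 2.0,
    "Tegra X2": 3.5,
    "Tesla K40": 1.0 / 1.3,
}

# Average performance-per-watt improvements, as reported on the slides.
perf_per_watt = {
    "GTX 650 Ti": 65.5,
    "Tegra X2": 7.9,
    "Tesla K40": 71.8,
}

# Assumed relation: perf_per_watt = speedup * (baseline power / RoboX power),
# so the implied power ratio is perf_per_watt / speedup.
implied_power_ratio = {
    name: perf_per_watt[name] / speedup[name] for name in speedup
}

for name, ratio in implied_power_ratio.items():
    print(f"{name}: baseline draws about {ratio:.1f}x the power of RoboX")
```

Under these assumptions, the implied power ratios are roughly 33x for the GTX 650 Ti, 2.3x for the Tegra X2, and 93x for the Tesla K40, which suggests that the Tesla's speed advantage comes at a much higher power cost.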

  • Conclusion: a domain-general acceleration solution that leverages algorithmic understanding of robotics; delivers significant performance and energy gains while abstracting away details of controls, optimization, and hardware; a first step towards enabling full-stack solutions for robotics from high-level mathematical specifications.