RoboX: An End-to-End Solution to Accelerate Autonomous Control in Robotics


  1. RoboX: An End-to-End Solution to Accelerate Autonomous Control in Robotics. Jacob Sacks, Divya Mahajan, Richard C. Lawson, Hadi Esmaeilzadeh†. Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology; †University of California, San Diego. ISCA ’18, Los Angeles, California.

  2. Challenges in Autonomous Robotics. Many diverse applications; compute-intensive; battery constraints; limited power budget.

  3. Challenges in Autonomous Robotics. [Figure labels: Mobile CPU, Mobile Processor, Flight Time]

  4. Challenges in Autonomous Robotics. [Figure labels: Flight Time, Power]

  5. Accelerating Planning and Control Model Predictive Control

  6. RoboX Workflow. Domain-Specific Language (concise mathematical description) → Program Translator → Macro Dataflow Graph (automatically synthesize DFG) → Controller Compiler → Statically-Scheduled Instructions (statically schedule on accelerator): Computation Schedule, Communication Schedule, Memory Schedule. Example program: System Quadrotor( ) { state position[3], angle[3]; input torque[4]; ... Task takeOff() { penalty target_height; constraint max_height; ... } }

  7. Background: System Models. [Figure: quadrotor with rotor thrusts f1, f2, f3, f4 and rotation axes roll (ψ), pitch (θ), yaw (ɸ)]

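  8. Background: Dynamics and Constraints. General nonlinear dynamics: $\dot{x} = f(x, u)$, where $\dot{x}$ is the time derivative of the states $x$ and $u$ are the inputs. State and input constraints: $\underline{x} \le x \le \overline{x}$, $\underline{u} \le u \le \overline{u}$. [Figure: quadrotor with rotor thrusts f1, f2, f3, f4 and rotation axes roll (ψ), pitch (θ), yaw (ɸ)]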

  9. Background: Objective Function

  10. Background: Objective Function

  11. Background: Objective Function. [Equation: the objective is the sum of a terminal cost and a running cost]

  12. Components of MPC. Model Predictive Control: Input Constraints, Objective Function, Dynamics, State Constraints.
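The four components listed on this slide fit together as one constrained optimization problem. The block below is a sketch of a common continuous-time MPC formulation, not necessarily the exact one used in RoboX: the symbols $x$ (states), $u$ (inputs), $f$ (dynamics), $\phi$ (terminal cost), $\ell$ (running cost), $T$ (horizon length), and the underline/overline bounds are all assumed notation.

```latex
\begin{aligned}
\min_{u(\cdot)} \quad & J = \phi\big(x(T)\big) + \int_{0}^{T} \ell\big(x(t), u(t)\big)\,dt
  && \text{(objective function)} \\
\text{subject to} \quad & \dot{x}(t) = f\big(x(t), u(t)\big)
  && \text{(dynamics)} \\
& \underline{x} \le x(t) \le \overline{x}
  && \text{(state constraints)} \\
& \underline{u} \le u(t) \le \overline{u}
  && \text{(input constraints)}
\end{aligned}
```

At each control step the problem is solved over the horizon, the first input is applied, and the horizon shifts forward.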

  13. Domain-Specific Language. Aims of RoboX DSL: distill MPC into modular components (System, Task); remain close to mathematical expressions (symbolic expressions); independent of implementation (group operations).

  14. DSL: System Component. System MobileRobot( ) { state pos[2]; state angle; input vel; input ang_vel; … } [Figure: mobile robot at (pos[0], pos[1]) on x, y, z axes with heading angle (θ), velocity vel (v), and angular velocity ang_vel (ω)]

  15. DSL: System Component. System MobileRobot( ) { state pos[2]; state angle; input vel; input ang_vel; pos[0].dt = vel * cos(angle); pos[1].dt = vel * sin(angle); angle.dt = ang_vel; … } [Figure: mobile robot at (pos[0], pos[1]) on x, y, z axes with heading angle (θ), velocity vel (v), and angular velocity ang_vel (ω)]
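To make the `.dt` declarations in the System component concrete, here is a minimal Python sketch of the same MobileRobot dynamics. The function names and the forward-Euler integrator are assumptions for illustration; the slides do not say how RoboX discretizes the dynamics.

```python
import math

def mobile_robot_dynamics(state, inputs):
    """Time derivatives of (pos[0], pos[1], angle), as declared in the DSL."""
    _, _, angle = state
    vel, ang_vel = inputs
    return (
        vel * math.cos(angle),  # pos[0].dt
        vel * math.sin(angle),  # pos[1].dt
        ang_vel,                # angle.dt
    )

def euler_step(state, inputs, dt):
    """One forward-Euler step; the integration scheme is an assumption."""
    derivs = mobile_robot_dynamics(state, inputs)
    return tuple(s + dt * d for s, d in zip(state, derivs))

# Drive straight along x at 1 unit/s for ten steps of 0.1 s.
state = (0.0, 0.0, 0.0)
for _ in range(10):
    state = euler_step(state, (1.0, 0.0), 0.1)
print(state)
```

With zero heading and zero angular velocity the robot only advances along x, which gives a quick sanity check on the equations.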

  16. DSL: Task Component System MobileRobot(...) { Task moveTo(…) { penalty target_x, target_y; target_x.running = pos[0] - desired_x; target_y.running = pos[1] - desired_y; …}}

  17. DSL: Task Component. System MobileRobot(...) { Task moveTo(…) { penalty target_x, target_y; target_x.running = pos[0] - desired_x; target_y.running = pos[1] - desired_y; constraint pos_bound; pos_bound.running = sqrt(pos[0]^2 + pos[1]^2); pos_bound.upper_bound <= radius;}}
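The moveTo task can be read as two residuals to drive toward zero plus one bound on the distance from the origin. The Python sketch below evaluates them for a single state. How RoboX combines penalties into a scalar cost is not shown on the slides, so the unweighted half-sum-of-squares in `running_cost` is an assumption, as are all function names.

```python
import math

def move_to_penalties(pos, desired):
    """Running penalties target_x and target_y from the Task component."""
    return (pos[0] - desired[0], pos[1] - desired[1])

def running_cost(pos, desired):
    """Assumed cost: half the sum of squared penalties, with no weights."""
    return 0.5 * sum(p * p for p in move_to_penalties(pos, desired))

def pos_bound_satisfied(pos, radius):
    """Constraint pos_bound: sqrt(pos[0]^2 + pos[1]^2) must not exceed radius."""
    return math.hypot(pos[0], pos[1]) <= radius

print(running_cost((3.0, 4.0), (0.0, 0.0)))
print(pos_bound_satisfied((3.0, 4.0), 5.0))
```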

  18. RoboX Accelerator Architecture. Flexible dataflow architecture organized as a two-level hierarchy to handle a large amount of data dependencies. [Figure: Compute Clusters 1 through N, each with multiple compute units (CUs); Global µCode Buffer; Global LD/ST Buffer; Programmable Memory Access Engine; Shifter; Memory µCode; Bus µCode]

  19. RoboX Accelerator Architecture. Compute-enabled interconnect to perform simple operations on in-transit data.

  20. RoboX Accelerator Architecture. Each compute cluster executes separate compute and communication microprograms and can operate in a SIMD mode. [Figure: compute cluster with CU 0 through CU N, Comp µCode, and Bus µCode]

  21. RoboX Accelerator Architecture. Compute units do not initiate communication requests but consume data from single-hop connections and a shared bus. [Figure: CU 0 through CU N]

  22. RoboX Accelerator Architecture. The compute unit is a three-stage pipeline and divides its memory into separate buffers to simplify communication scheduling. [Figure: compute unit with links to its left and right neighbors and separate Nonlinear State, Nonlinear Input, Gradient, Hessian, and Interm buffers]

  23. RoboX Accelerator Architecture. The programmable memory access engine prefetches instructions and data according to its own statically-scheduled microprogram. [Figure: Global µCode Buffer, Global LD/ST Buffer, Programmable Memory Access Engine, Shifter, Memory µCode, Bus µCode]

  24. Instruction Set Architecture. Compute Instructions: Scalar, SIMD. Communication Instructions: Data Transfer, In-Network. Memory Instructions: Load, Store.

  25. Program Translator. Inputs from the Domain-Specific Language: states and inputs, dynamics function, objective function. Parameterized Solver Template. Automatic differentiation for necessary gradients.
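As an illustration of what automatic differentiation of the dynamics function involves, the sketch below uses forward-mode dual numbers in plain Python to compute the Jacobian of the MobileRobot dynamics with respect to its states and inputs. This is a generic textbook technique chosen for the example; the slides do not say which differentiation method the Program Translator uses, and every name here (`Dual`, `jacobian`, `dynamics`) is hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A value paired with its derivative along one seed direction."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a*b)' = a'*b + a*b'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def _lift(x):
    """Wrap plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))

def cos(x):
    return Dual(math.cos(x.val), -math.sin(x.val) * x.dot)

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def dynamics(pos0, pos1, angle, vel, ang_vel):
    """MobileRobot dynamics from the DSL slides: [pos[0].dt, pos[1].dt, angle.dt]."""
    return [vel * cos(angle), vel * sin(angle), ang_vel]

def jacobian(f, args):
    """Seed one argument at a time; returns rows = outputs, columns = arguments."""
    cols = []
    for i in range(len(args)):
        duals = [Dual(a, 1.0 if j == i else 0.0) for j, a in enumerate(args)]
        cols.append([_lift(out).dot for out in f(*duals)])
    return [list(row) for row in zip(*cols)]

J = jacobian(dynamics, [0.0, 0.0, 0.0, 2.0, 0.5])
for row in J:
    print(row)
```

Each pass through `f` yields one Jacobian column, so the cost grows with the number of states and inputs; reverse-mode differentiation trades this for a cost that grows with the number of outputs.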

  26. Controller Compiler. Mapping and scheduling produce three static schedules: the Computation Instruction Schedule, the Communication Instruction Schedule, and the Memory Instruction Schedule. [Figure: accelerator block diagram with Compute Clusters 1 through N, Global µCode Buffer, Global LD/ST Buffer, Programmable Memory Access Engine, Shifter, Decode, Memory µCode, and Bus µCode]
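To give a feel for static scheduling of a dataflow graph, here is a deliberately simple list scheduler in Python. It is a generic sketch, not the RoboX compiler's algorithm: it assumes every operation takes one cycle, ignores communication and memory scheduling entirely, and uses a hypothetical four-node graph for the first two MobileRobot dynamics equations.

```python
def static_schedule(deps, num_units):
    """Assign each op a (cycle, unit) slot so that all of its dependencies
    finish in an earlier cycle. deps maps op -> list of ops it depends on."""
    schedule = {}
    remaining = dict(deps)
    cycle = 0
    while remaining:
        # Ops whose dependencies were all placed in earlier cycles.
        ready = sorted(op for op, d in remaining.items()
                       if all(p in schedule for p in d))
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        # Fill the available compute units; leftovers wait for the next cycle.
        for unit, op in enumerate(ready[:num_units]):
            schedule[op] = (cycle, unit)
            del remaining[op]
        cycle += 1
    return schedule

# Hypothetical DFG for pos[0].dt = vel * cos(angle), pos[1].dt = vel * sin(angle).
dfg = {"cos": [], "sin": [], "mul_x": ["cos"], "mul_y": ["sin"]}
two_units = static_schedule(dfg, num_units=2)
one_unit = static_schedule(dfg, num_units=1)
print(two_units)
print(one_unit)
```

Because the whole schedule is fixed ahead of time, the hardware needs no runtime dependency tracking, which is the property the statically-scheduled microprograms rely on.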

  27. Benchmarks (Name: System, Task). MobileRobot: Two-Wheel Mobile Robot, Trajectory Tracking. Manipulator: Two-Link Manipulator, Reaching. AutoVehicle: Four-Wheel Vehicle, High-Speed Racing. MicroSat: Miniature Satellite, Orbit Control. Quadrotor: Four-Rotor Micro UAV, Motion Planning. Hexacopter: Six-Rotor Micro UAV, Attitude Control.

  28. Platforms. CPU: ARM Cortex A57 (Low Power), Intel Xeon E3 (High Performance). GPU: Tegra X2 (Low Power), GTX 650 Ti (Desktop Class), Tesla K40 (High Performance).

  29. Evaluation. [Figure: speedup of RoboX over the ARM A57 and Xeon E3 for MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, and the geomean; two off-scale bars are labeled 79X and 65X] On average, RoboX achieves a 29.4X and 7.3X speedup over the ARM A57 and Xeon E3, respectively.

  30. Evaluation. [Figure: speedup of RoboX relative to the GTX 650 Ti, Tegra X2, and Tesla K40 for MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, and the geomean] On average, RoboX achieves a 2.0X and 3.5X speedup over the GTX and Tegra, respectively, and is 1.3X slower than the Tesla.

  31. Evaluation. [Figure: performance-per-watt of RoboX relative to the GTX 650 Ti, Tegra X2, and Tesla K40, on a logarithmic axis, for MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, and the geomean] On average, RoboX achieves a 65.5X, 7.9X, and 71.8X performance-per-watt improvement over the GTX, Tegra, and Tesla, respectively.

  32. Conclusion. Domain-general acceleration solution by leveraging algorithmic understanding of robotics. Deliver significant performance and energy gains while abstracting away details of controls, optimization, and hardware. First step towards enabling full-stack solutions for robotics from high-level mathematical specifications.
