RoboX: An End-to-End Solution to Accelerate Autonomous Control in Robotics. Jacob Sacks, Divya Mahajan, Richard C. Lawson, Hadi Esmaeilzadeh†. Alternative Computing Technologies (ACT) Lab, Georgia Institute of Technology; †University of California, San Diego. ISCA ’18, Los Angeles, California.
Challenges in Autonomous Robotics: Many diverse applications; Compute-intensive; Battery constraints; Limited power budget
Challenges in Autonomous Robotics [Figure: labels Mobile CPU, Processor, Flight Time, Power]
Accelerating Planning and Control: Model Predictive Control
RoboX Workflow: Domain-Specific Language (concise mathematical description) -> Program Translator (automatically synthesize DFG) -> Macro Dataflow Graph -> Controller Compiler (statically schedule on accelerator) -> Statically-Scheduled Instructions: Computation Schedule, Communication Schedule, Memory Schedule
System Quadrotor( ) {
  state position[3], angle[3];
  input torque[4];
  ...
  Task takeOff() {
    penalty target_height;
    constraint max_height;
    ...
  }
}
Background: System Models [Figure: four-rotor vehicle with thrusts f_1 through f_4 and rotation angles roll (ψ), pitch (θ), yaw (ɸ)]
Background: Dynamics and Constraints. General nonlinear dynamics: $\dot{x} = f(x, u)$, where $\dot{x}$ is the time derivative of the states $x$ and $u$ are the inputs. State and input constraints: $\underline{x} \le x \le \overline{x}$, $\underline{u} \le u \le \overline{u}$
Background: Objective Function: objective = terminal cost + running cost
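The split into a terminal cost and a running cost can be sketched numerically. The following is a minimal illustration, assuming a time-discretized trajectory and a rectangle-rule sum for the running cost; the function name `objective`, the quadratic cost terms, and the sample trajectory are hypothetical and are not taken from the slides.

```python
def objective(states, inputs, running_cost, terminal_cost, dt):
    """Discretized objective: terminal cost plus the summed running cost."""
    # Running cost accumulated over the horizon (rectangle-rule approximation).
    running = sum(running_cost(x, u) * dt for x, u in zip(states[:-1], inputs))
    # Terminal cost is evaluated on the final state only.
    return terminal_cost(states[-1]) + running


# Hypothetical 1-D example: drive x toward 1.0 while penalizing input effort.
states = [0.0, 0.5, 1.0]   # x_0, x_1, x_2
inputs = [1.0, 1.0]        # u_0, u_1
J = objective(
    states,
    inputs,
    running_cost=lambda x, u: (x - 1.0) ** 2 + 0.1 * u ** 2,
    terminal_cost=lambda x: 10.0 * (x - 1.0) ** 2,
    dt=0.5,
)
```

Here the terminal term vanishes because the last state already equals the target, so `J` is made up of the running cost alone.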
Components of MPC: Model Predictive Control combines Dynamics, Objective Function, State Constraints, and Input Constraints
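To show how these four components interact, here is a minimal receding-horizon sketch. It assumes a hypothetical 1-D point mass, a small set of candidate inputs, and a brute-force search over input sequences; it is not the solver RoboX uses, and every name and constant below is an illustrative assumption.

```python
import itertools

DT = 0.1                        # time step (assumed)
HORIZON = 4                     # prediction horizon in steps (assumed)
CANDIDATES = (-1.0, 0.0, 1.0)   # input constraint: u limited to these values
V_MAX = 2.0                     # state constraint: |velocity| <= V_MAX


def dynamics(state, u):
    """Dynamics: 1-D point mass, state = (position, velocity), u = acceleration."""
    p, v = state
    return (p + DT * v, v + DT * u)


def cost(trajectory, inputs, target):
    """Objective: running tracking and effort cost plus a heavier terminal cost."""
    running = sum((p - target) ** 2 + 0.01 * u ** 2
                  for (p, _), u in zip(trajectory[:-1], inputs))
    terminal = 10.0 * (trajectory[-1][0] - target) ** 2
    return running + terminal


def mpc_step(state, target):
    """Return the first input of the cheapest feasible input sequence."""
    best_u, best_cost = 0.0, float("inf")
    for seq in itertools.product(CANDIDATES, repeat=HORIZON):
        traj = [state]
        for u in seq:
            traj.append(dynamics(traj[-1], u))
        if any(abs(v) > V_MAX for _, v in traj):
            continue  # sequence violates the state constraint
        c = cost(traj, seq, target)
        if c < best_cost:
            best_u, best_cost = seq[0], c
    return best_u


# Receding horizon: re-plan at every step and apply only the first input.
state = (0.0, 0.0)
for _ in range(50):
    state = dynamics(state, mpc_step(state, target=1.0))
```

Exhaustive search is only workable for tiny problems like this one; it is used here because it keeps the roles of the dynamics, objective, and constraints easy to see.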
Domain-Specific Language. Aims of RoboX DSL: Distill MPC into modular components; Remain close to mathematical expressions; Independent of implementation. Constructs: System, Task, Symbolic expressions, Group operations
DSL: System Component
System MobileRobot( ) {
  state pos[2];
  state angle;
  input vel;
  input ang_vel;
  pos[0].dt = vel * cos(angle);
  pos[1].dt = vel * sin(angle);
  angle.dt = ang_vel;
  …
}
[Figure: mobile robot at (pos[0], pos[1]) on x, y, z axes with heading angle (θ), velocity vel (v), and angular velocity ang_vel (ω)]
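The `.dt` assignments in the System component define the time derivative of each state. A minimal Python sketch of the same model follows, assuming a forward-Euler integration step; the integrator, the function names, and the step size are assumptions made for illustration, not part of the RoboX DSL.

```python
import math


def mobile_robot_dynamics(state, control):
    """Time derivatives of (pos[0], pos[1], angle) for inputs (vel, ang_vel)."""
    _, _, angle = state
    vel, ang_vel = control
    return (vel * math.cos(angle), vel * math.sin(angle), ang_vel)


def euler_step(state, control, dt):
    """Advance the state by one forward-Euler step of length dt (assumed scheme)."""
    derivs = mobile_robot_dynamics(state, control)
    return tuple(s + dt * d for s, d in zip(state, derivs))


# Drive straight along x at unit speed for ten steps of 0.1 s.
state = (0.0, 0.0, 0.0)
for _ in range(10):
    state = euler_step(state, (1.0, 0.0), dt=0.1)
```

With zero angular velocity the heading never changes, so the robot moves only along the x axis.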
DSL: Task Component
System MobileRobot(...) {
  Task moveTo(…) {
    penalty target_x, target_y;
    target_x.running = pos[0] - desired_x;
    target_y.running = pos[1] - desired_y;
    constraint pos_bound;
    pos_bound.running = sqrt(pos[0]^2 + pos[1]^2);
    pos_bound.upper_bound <= radius;
  }
}
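As a rough illustration of what the `moveTo` task expresses, the sketch below evaluates the two running penalties and the position constraint for a single state. It assumes that `pos_bound.upper_bound <= radius` means the constrained quantity must not exceed `radius`; the function name and return layout are hypothetical.

```python
import math


def move_to_terms(pos, desired, radius):
    """Evaluate the moveTo running penalties and the pos_bound constraint."""
    target_x = pos[0] - desired[0]                     # target_x.running
    target_y = pos[1] - desired[1]                     # target_y.running
    pos_bound = math.sqrt(pos[0] ** 2 + pos[1] ** 2)   # pos_bound.running
    return {
        "penalties": (target_x, target_y),
        "pos_bound": pos_bound,
        "feasible": pos_bound <= radius,               # assumed reading of upper_bound
    }


terms = move_to_terms(pos=(3.0, 4.0), desired=(1.0, 1.0), radius=6.0)
```

A controller would drive the penalties toward zero while keeping `feasible` true along the whole trajectory.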
RoboX Accelerator Architecture: Flexible dataflow architecture organized as a two-level hierarchy to handle a large number of data dependencies [Figure: Compute Clusters 1 through N of compute units (CUs), Global µCode Buffer, Global LD/ST Buffer, Programmable Memory Access Engine, Shifter, Memory, µCode Bus]
RoboX Accelerator Architecture: Compute-enabled interconnect to perform simple operations on in-transit data
RoboX Accelerator Architecture: Each compute cluster executes separate compute and communication microprograms and can operate in a SIMD mode [Figure: compute cluster with CUs 0 through N, a bus, and µCode]
RoboX Accelerator Architecture: Compute units do not initiate communication requests but consume data from single-hop connections and a shared bus [Figure: CUs 0 through N]
RoboX Accelerator Architecture: The compute unit is a three-stage pipeline and divides its memory into separate buffers to simplify communication scheduling [Figure: compute unit with left and right neighbor connections and Nonlinear State, Nonlinear Input, Gradient, Hessian, and Interm buffers]
RoboX Accelerator Architecture: Programmable memory access engine prefetches instructions and data according to its own statically-scheduled microprogram [Figure: Programmable Memory Access Engine with Global µCode Buffer, Global LD/ST Buffer, Shifter, Memory, µCode Bus]
Instruction Set Architecture. Compute Instructions: Scalar, SIMD. Communication Instructions: Data Transfer, In-Network. Memory Instructions: Load, Store
Program Translator. Domain-Specific Language: States and inputs, Dynamics function, Objective function. Parameterized Solver Template. Automatic differentiation for necessary gradients
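Automatic differentiation can be illustrated with forward-mode dual numbers. The sketch below differentiates the MobileRobot dynamics from the DSL slides with respect to `angle`; it is a generic textbook technique chosen for illustration, and nothing here is claimed to match how the RoboX Program Translator is implemented. The `Dual` class and helper names are hypothetical.

```python
import math


class Dual:
    """Number carrying a value and its derivative with respect to one variable."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a * b)' = a' * b + a * b'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def cos(x):
    return Dual(math.cos(x.val), -math.sin(x.val) * x.dot)  # chain rule


def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)   # chain rule


def dynamics(angle, vel, ang_vel):
    """MobileRobot dynamics: (pos[0].dt, pos[1].dt, angle.dt)."""
    return (vel * cos(angle), vel * sin(angle), ang_vel)


# Seed angle with derivative 1 to get d(outputs)/d(angle) at angle=0, vel=2.
out = dynamics(Dual(0.0, 1.0), Dual(2.0), Dual(0.5))
d_dangle = [o.dot for o in out]
```

Seeding a different argument with `dot=1.0` yields the corresponding column of the Jacobian, so one pass per state or input recovers the full matrix.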
Controller Compiler: Mapping and Scheduling produces the Computation Instruction Schedule, Communication Instruction Schedule, and Memory Instruction Schedule [Figure: accelerator block diagram with Compute Clusters 1 through N, Programmable Memory Access Engine, Global µCode Buffer, Global LD/ST Buffer, Shifter, Memory, µCode Bus, Decode]
Benchmarks
Name         System                  Task                 # States
MobileRobot  Two-Wheel Mobile Robot  Trajectory Tracking
Manipulator  Two-Link Manipulator    Reaching
AutoVehicle  Four-Wheel Vehicle      High-Speed Racing
MicroSat     Miniature Satellite     Orbit Control
Quadrotor    Four-Rotor Micro UAV    Motion Planning
Hexacopter   Six-Rotor Micro UAV     Attitude Control
Platforms. CPU: ARM Cortex A57 (Low Power), Intel Xeon E3 (High Performance). GPU: Tegra X2 (Low Power), GTX 650 Ti (Desktop Class), Tesla K40 (High Performance)
Evaluation [Figure: RoboX speedup over ARM and Xeon for MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, and Geomean; y-axis Speedup from 0.0X to 40.0X, with off-scale bars labeled 79X and 65X] On average, RoboX achieves a 29.4X and 7.3X speedup over the ARM A57 and Xeon E3, respectively
Evaluation [Figure: speedup for GTX 650 Ti, Tegra X2, Tesla K40, and RoboX on MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, and Geomean; y-axis Speedup from 0.0X to 4.0X] On average, RoboX achieves a 2.0X and 3.5X speedup over the GTX and Tegra, respectively, and is 1.3X slower than the Tesla
Evaluation [Figure: performance-per-watt for GTX 650 Ti, Tegra X2, Tesla K40, and RoboX on MobileRobot, AutoVehicle, MicroSat, Quadrotor, Manipulator, Hexacopter, and Geomean; y-axis Performance-per-Watt from 0.1X to 100.0X] On average, RoboX achieves a 65.5X, 7.9X, and 71.8X performance-per-watt improvement over the GTX, Tegra, and Tesla, respectively
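The speedup and performance-per-watt averages quoted on the GPU evaluation slides can be combined into a back-of-the-envelope power comparison. The sketch below assumes that performance-per-watt gain equals speedup times the ratio of baseline power to RoboX power, and that both averages cover the same benchmarks; the resulting ratios are derived estimates, not numbers reported on the slides.

```python
# Average figures quoted on the evaluation slides (RoboX relative to each GPU).
# "1.3X slower than the Tesla" is expressed as a speedup of 1 / 1.3.
speedup = {"GTX 650 Ti": 2.0, "Tegra X2": 3.5, "Tesla K40": 1 / 1.3}
perf_per_watt = {"GTX 650 Ti": 65.5, "Tegra X2": 7.9, "Tesla K40": 71.8}

# perf/W gain = speedup * (baseline power / RoboX power), so dividing the two
# gives a rough implied power ratio. This is an estimate, not a reported value.
implied_power_ratio = {
    name: perf_per_watt[name] / speedup[name] for name in speedup
}
```

Under these assumptions the Tesla K40 comparison is dominated by power rather than speed: RoboX is slower on average, yet the implied power ratio is the largest of the three.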
Conclusion: Domain-general acceleration solution by leveraging algorithmic understanding of robotics. Deliver significant performance and energy gains while abstracting away details of controls, optimization, and hardware. First step towards enabling full-stack solutions for robotics from high-level mathematical specifications.