Formalizing Connections Between Motion Planning and Machine Learning. Siddhartha Srinivasa, Boeing Endowed Professor, University of Washington
Problems I Want You to Solve So I Can Retire. Siddhartha Srinivasa, Retired Boeing Endowed Professor, University of Washington
Motion Planning
Motion Planning is a technology
10-100X Improvement
The Piano Movers’ Problem. On the Piano Movers’ Problem I-III, Schwartz and Sharir, Comm. on Pure and Applied Math., 1983.
Roadmaps: Build Roadmap, then Plan on Roadmap. Probabilistic roadmaps for path planning in high-dimensional configuration spaces, Kavraki et al., IEEE Transactions on Robotics and Automation, 1996.
A* Search
A* Search OPTIMAL!! Is it optimal over something we care about?
A* Search: A Personal Journey [Slide shown: "Search for Optimal Solutions: the Heart of Heuristic Search is Still Beating", Ariel Felner, ISE Department, Ben-Gurion University, Israel, felner@bgu.ac.il]
A* Search: A Personal Journey
A* Search: Amoebas! Optimal Substructure: $f(a) < f(b) \implies f(a \circ x) < f(b \circ x) \;\; \forall x$. You will never catch up. Bellman Condition: $f^*(a) = \min_{x \in \mathrm{succ}(a)} \{ c(a, x) + f^*(x) \}$. Be best, locally. Bacteria Vectors by Vecteezy
A* Search: Favoritism. Optimism in the Face of Uncertainty (OFU): $\min_{x \in \mathrm{open}} \; g(x) + h(x)$. Always be optimistic under uncertainty. You’ll either be correct, or learn something important if you’re wrong. R-MAX: A general polynomial time algorithm for near-optimal reinforcement learning, Brafman and Tennenholtz, JMLR, 2002.
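The OFU rule above can be sketched as a small A* loop: always expand the open vertex with the smallest optimistic estimate g(x) + h(x). This is a minimal sketch, not the talk's implementation; the dictionary-based graph format and the names `a_star`, `graph`, and `h` are assumptions made for illustration.

```python
import heapq

def a_star(graph, start, goal, h):
    """A* over graph = {vertex: {neighbor: edge_cost}} with heuristic h(vertex).

    Returns (path, cost, number_of_expanded_vertices).
    The graph format and this signature are illustrative assumptions.
    """
    open_heap = [(h(start), 0.0, start)]  # entries are (f = g + h, g, vertex)
    best_g = {start: 0.0}
    parent = {start: None}
    expanded = 0
    while open_heap:
        # OFU: pop the open vertex with the smallest optimistic estimate g + h.
        _, g, u = heapq.heappop(open_heap)
        if g > best_g.get(u, float("inf")):
            continue  # stale heap entry, a cheaper route to u was found later
        expanded += 1
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1], g, expanded
        for v, cost in graph[u].items():
            new_g = g + cost
            if new_g < best_g.get(v, float("inf")):
                best_g[v] = new_g
                parent[v] = u
                heapq.heappush(open_heap, (new_g + h(v), new_g, v))
    return None, float("inf"), expanded
```

With an admissible heuristic (one that never overestimates the remaining cost), the optimistic estimate is either right or is corrected when the vertex is expanded, which is the "correct, or learn something" trade described on the slide.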
A* Search is Optimal … Expands the Fewest Number of Vertices. But is this what we really want in Motion Planning?
Edge Evaluation Dominates Planning Time [Chart: planning time split into Edge Evaluations and Other] Amoebas are Cheap. Slime is Expensive. Lazy collision checking in asymptotically-optimal motion planning, Hauser, ICRA 2015.
Is there a Search Algorithm that Minimizes the Number of Edge Evaluations? I don’t care about amoebas. What algorithm minimizes slime? LazySP, ICAPS 2018 [Best Conference Paper Award Winner]: First Provably Edge-Optimal A*-like Search Algorithm. The Provable Virtue of Laziness in Motion Planning, Haghtalab et al., ICAPS 2018.
LazySP: Greedy Best-first Search over Paths. To find the shortest path, eliminate all shorter paths!
LazySP: OFU on Steroids! Input: graph, start, goal, lazy estimates. Loop: lazy search for shortest path P, then evaluate path P; if P is collision free, return P, otherwise update the graph and search again.
LazySP OFU on Steroids! Send out the Ghost Amoebas
LazySP OFU on Steroids! Only Slime Known Shortest Paths
LazySP OFU on Steroids! Optimal Slime!
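The LazySP loop on these slides (lazy search, evaluate, update the graph, repeat until the shortest path is collision free) can be sketched as follows. This is a minimal sketch under stated assumptions: edges are directed `(u, v)` tuples, a colliding edge gets infinite cost, and the names `lazy_sp`, `shortest_path`, `forward_selector`, `lazy_cost`, and `true_cost` are hypothetical, not taken from any LazySP codebase.

```python
import heapq
import math

def shortest_path(edges, weight, start, goal):
    """Dijkstra over directed edges (u, v); returns a list of edges or None."""
    adj = {}
    for (u, v) in edges:
        adj.setdefault(u, []).append(v)
    dist, parent, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj.get(u, []):
            w = weight((u, v))
            if math.isinf(w):
                continue  # edge known to be in collision
            if d + w < dist.get(v, math.inf):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if goal not in dist:
        return None
    path, v = [], goal
    while v != start:
        path.append((parent[v], v))
        v = parent[v]
    return path[::-1]

def forward_selector(path, evaluated):
    """Pick the first unevaluated edge on the path."""
    return next(e for e in path if e not in evaluated)

def lazy_sp(edges, lazy_cost, true_cost, start, goal, select_edge=forward_selector):
    """Returns (path, evaluated); true_cost(e) is the expensive evaluation."""
    evaluated = {}  # edge -> true cost (math.inf if in collision)
    while True:
        # Lazy search: evaluated edges use true costs, the rest use lazy estimates.
        path = shortest_path(edges, lambda e: evaluated.get(e, lazy_cost[e]), start, goal)
        if path is None:
            return None, evaluated
        if all(e in evaluated for e in path):
            return path, evaluated  # every edge checked, so the path is valid
        e = select_edge(path, evaluated)
        evaluated[e] = true_cost(e)  # update the graph with one expensive check
```

The number of entries in `evaluated` at termination is the quantity the slides call "slime": only edges on a current candidate shortest path are ever checked.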
Edge Selectors • Forward (first unevaluated edge) • Reverse (last unevaluated edge) • Alternate (alternate Forward and Reverse) • Bisect (unevaluated edge furthest from an evaluated edge)
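The four selectors can be sketched as small functions that take the candidate path and the set of already evaluated edges. This is an illustrative sketch: the function names are hypothetical, Bisect is read here as "the unevaluated edge furthest from any evaluated edge", and treating the two path ends as evaluated is an assumption made so that the first pick lands in the middle.

```python
def forward_selector(path, evaluated):
    """First unevaluated edge along the path."""
    return next(e for e in path if e not in evaluated)

def reverse_selector(path, evaluated):
    """Last unevaluated edge along the path."""
    return next(e for e in reversed(path) if e not in evaluated)

def make_alternate_selector():
    """Returns a selector that alternates Forward and Reverse on successive calls."""
    state = {"use_forward": True}
    def alternate_selector(path, evaluated):
        pick = forward_selector if state["use_forward"] else reverse_selector
        state["use_forward"] = not state["use_forward"]
        return pick(path, evaluated)
    return alternate_selector

def bisect_selector(path, evaluated):
    """Unevaluated edge whose index is furthest from any evaluated edge.

    Assumption: the positions just before and just after the path count as
    evaluated, so an entirely unevaluated path is split in the middle.
    """
    known = [-1] + [i for i, e in enumerate(path) if e in evaluated] + [len(path)]
    candidates = [i for i, e in enumerate(path) if e not in evaluated]
    best = max(candidates, key=lambda i: min(abs(i - k) for k in known))
    return path[best]
```

Each selector assumes the path still contains at least one unevaluated edge, which holds whenever LazySP calls it.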
The Realizability Assumption: Can we Learn to Imitate the Oracle? [Diagram: the hypothesis class of all LazySP selectors, containing Forward, Alternate, and the Oracle] The Oracle is a LazySP Selector! Leveraging experience in lazy search, Bhardwaj et al., RSS 2019. The Provable Virtue of Laziness in Motion Planning, Haghtalab et al., ICAPS 2018.
Is there a Search Algorithm that Minimizes the Number of Edge Evaluations? LazySP ICAPS 2018 [Best Conference Paper Award Winner] First Provably Edge-Optimal A*-like Search Algorithm
Anytime Motion Planning [Plot: solution cost vs. computation time, from the first feasible path toward the shortest path]
Will it converge to the shortest path? [Plot: solution cost vs. computation time]
Beyond Asymptotic Optimality [Plot: solution cost vs. computation time, annotated with time to initial path, time budget, and suboptimality gap]
We formalize anytime search as Bayesian Reinforcement Learning. Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges, Hou et al., ICRA 2020.
Bayesian Anytime Motion Planning • Evaluating edges uncovers shorter paths • Anytime Objective: cumulative path lengths • Given prior on collision statuses • Bayesian Anytime Objective: expected cumulative path length under that prior • Bayesian planning algorithm uses edge evaluation history to compute collision posterior
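One way to write the two objectives on this slide is sketched below. The slide's own formula did not survive extraction, so every symbol here is an assumption introduced for illustration and may differ from the notation in Hou et al.: $\phi$ is the vector of edge collision statuses, $P(\phi)$ the prior over it, $\xi_t$ the best path the planner holds after $t$ edge evaluations, $c(\xi_t; \phi)$ its length, and $T$ the evaluation budget.

```latex
% Assumed notation (not from the slides): \phi, P(\phi), \xi_t, c(\cdot;\phi), T.
\begin{align}
  J(\phi) &= \sum_{t=1}^{T} c(\xi_t; \phi)
    && \text{(anytime objective: cumulative path lengths)} \\
  \bar{J} &= \mathbb{E}_{\phi \sim P(\phi)} \left[ \sum_{t=1}^{T} c(\xi_t; \phi) \right]
    && \text{(Bayesian anytime objective)}
\end{align}
```

Under this reading, a planner that finds short paths early accumulates less cost than one that reaches the same final path late, which is what distinguishes the anytime objective from asymptotic optimality alone.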
The Experienced Piano Movers’ Problem New Piano. New House. Same Mover.
Bayesian Anytime Motion Planning as Bayesian Reinforcement Learning • Equivalence to episodic Bayesian RL [Osband et al., 2013] • Infer unknown MDP through repeated episodes. Minimizing Bayesian regret is equivalent to minimizing the Bayesian anytime planning objective! “No regret” is equivalent to asymptotic optimality. Extending rapidly-exploring random trees for asymptotically optimal anytime motion planning, Abbasi-Yadkori et al., IROS 2010.
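Experienced Lazy Path Search [Diagram: the Proposer draws on the Posterior and sends a Path to the Validator; the Validator outputs a Feasible Path and feeds Evaluated edge statuses back into the Posterior]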
The Posterior Sampling Proposer [Diagram: Proposer, Posterior, Validator] • Posterior Sampling for Motion Planning (PSMP): propose paths according to the probability that they are optimal • Idea from multi-armed bandits (as Thompson sampling) and Posterior Sampling for RL [Osband et al., 2013] • First anytime motion planning algorithm with Bayesian regret bounds • Analysis adapts [Osband et al., 2013] for deterministic MDPs • Regret bound matches known lower bounds • Solves one shortest path problem per proposal. (More) efficient reinforcement learning via posterior sampling, Osband et al., NeurIPS 2013.
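The proposer and validator loop can be sketched as follows: sample one world from the posterior, solve a single shortest path problem in it, validate the proposed path with real edge checks, and fold the results back into the posterior. This is a rough sketch, not the PSMP implementation. In particular, the posterior here is an independent per-edge Bernoulli model (`p_free`), which is a simplifying assumption, and all function and parameter names are hypothetical.

```python
import heapq
import math
import random

def shortest_path(edges, length, usable, start, goal):
    """Dijkstra over directed edges (u, v) with usable[(u, v)] true; list of edges or None."""
    adj = {}
    for (u, v) in edges:
        if usable[(u, v)]:
            adj.setdefault(u, []).append(v)
    dist, parent, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj.get(u, []):
            nd = d + length[(u, v)]
            if nd < dist.get(v, math.inf):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    path, v = [], goal
    while v != start:
        path.append((parent[v], v))
        v = parent[v]
    return path[::-1]

def psmp(edges, length, p_free, is_free, start, goal, num_proposals, rng):
    """Posterior-sampling proposer with a lazy validator (illustrative sketch)."""
    known = {}  # edge -> True (free) / False (in collision), from real checks
    best_path, best_cost = None, math.inf
    for _ in range(num_proposals):
        # Proposer: sample a world from the posterior. Checked edges keep their
        # status; unchecked edges are drawn independently (assumed model).
        world = {e: known[e] if e in known else rng.random() < p_free[e] for e in edges}
        path = shortest_path(edges, length, world, start, goal)  # one solve per proposal
        if path is None:
            continue
        # Validator: check edges in order, stopping at the first collision.
        feasible = True
        for e in path:
            if e not in known:
                known[e] = is_free(e)  # expensive edge evaluation
            if not known[e]:
                feasible = False
                break
        if feasible:
            cost = sum(length[e] for e in path)
            if cost < best_cost:
                best_path, best_cost = path, cost
    return best_path, best_cost, known
```

Because a path is proposed exactly when it is shortest in the sampled world, paths are proposed with the probability that they are optimal under the current posterior, which is the Thompson sampling idea named on the slide.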