Robotics
(c) 2003 Thomas G. Dietterich

RoboCup Challenges
• Simulation League
• Small League
• Medium-Sized League (less interest)
• SONY Legged League
• Humanoid League
Small League
• Overhead camera
• Central controlling computer for each team
• Fast and agile
  – Winning teams have had the best hardware

Small League: CMU vs. Cornell @ American Open (2003)
Medium-Sized League
• Largest fully-autonomous robots
• Has been plagued by hardware challenges

Humanoid League
• Still demonstrating technology and skills (kicking, vision, localization)
SONY Legged League
CMU vs. New South Wales (1999)
CMU vs. New South Wales (2002)
Special Purpose Vision
• Bright Light vs. Dim Light (image panels)

What the Dog Sees
Making Sense of Sensing
• P(Image_t | CameraPose_t, World_t)
• P(CameraPose_t | BodyPose_t)
• P(BodyPose_t | BodyPose_{t-1}, Action_t)
• P(World_t | World_{t-1})
• Writing I_t = Image, C_t = CameraPose, B_t = BodyPose, W_t = World, A_t = Action (the nodes of the slide's dynamic Bayesian network):
  argmax_{W_t} P(W_t | I_t, A_t)
    = argmax_{W_t} Σ_{W_{t-1}, C_t, B_t, B_{t-1}} P(I_t | W_t, C_t) · P(W_t | W_{t-1}) · P(C_t | B_t) · P(B_t | B_{t-1}, A_t)
    = argmax_{W_t} Σ_{C_t} P(I_t | W_t, C_t) · Σ_{W_{t-1}} P(W_t | W_{t-1}) · Σ_{B_t} P(C_t | B_t) · Σ_{B_{t-1}} P(B_t | B_{t-1}, A_t)

The World
• Locations (and orientations and velocities) of
  – self
  – ball
  – other players on same team
  – players on other team
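A small numerical sketch of the factored sum on the "Making Sense of Sensing" slide above, using NumPy with random placeholder tables (the domain sizes, table values, and all variable names are invented; P_I_given_WC holds the likelihood of the single observed image for each world-state/camera-pose pair). It eliminates variables innermost-first, exactly as the slide's factorization suggests.

import numpy as np

nW, nC, nB = 4, 3, 3                       # sizes of the world, camera-pose, and body-pose domains
rng = np.random.default_rng(0)

def conditional(shape):
    """Random conditional probability table, normalized over its first axis."""
    t = rng.random(shape)
    return t / t.sum(axis=0, keepdims=True)

P_I_given_WC  = rng.random((nW, nC))       # likelihood of the observed image I_t for each (W_t, C_t)
P_W_given_Wp  = conditional((nW, nW))      # P(W_t | W_{t-1})
P_C_given_B   = conditional((nC, nB))      # P(C_t | B_t)
P_B_given_BpA = conditional((nB, nB))      # P(B_t | B_{t-1}, A_t), for the action A_t actually taken

# Eliminate variables innermost-first, mirroring the slide's factored sum.
sum_Bprev = P_B_given_BpA.sum(axis=1)      # sum over B_{t-1} -> vector indexed by B_t
sum_B     = P_C_given_B @ sum_Bprev        # sum over B_t     -> vector indexed by C_t
sum_Wprev = P_W_given_Wp.sum(axis=1)       # sum over W_{t-1} -> vector indexed by W_t
score     = (P_I_given_WC @ sum_B) * sum_Wprev   # sum over C_t, times the world-transition term
print("most likely world state index:", int(np.argmax(score)))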
Actions
• Actions can be described at many levels of detail
  – low-level actions: moving body joints
  – intermediate-level actions: walking gaits, shooting and passing motions, localization motions, celebration dances
    • learned or programmed prior to the game
  – higher-level actions: "shoot on goal", "pass to X", "keep away from Y"
    • decisions are made at this level during the game

Choosing Actions to Maximize Utility
• Markov Decision Process
  – Set of states X
  – Set of actions A
  – State transition function: P(X_t | X_{t-1}, A_t)
  – Reward function: R(X_{t-1}, A_t, X_t)
  – Discount factor γ
  – Policy π: X → A
    • maps from states to actions
• Value of a policy:
  – E[R_1 + γ·R_2 + γ²·R_3 + ⋯]
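To make the MDP ingredients concrete, here is a minimal sketch in Python: a tiny invented soccer-flavored MDP (all states, actions, probabilities, and rewards are made up for illustration) whose policy value E[R_1 + γ·R_2 + γ²·R_3 + ⋯] is estimated by averaging discounted returns over simulated episodes.

import random

gamma = 0.9
states = ["HaveBall", "LostBall"]
actions = ["shoot", "chase"]

# P[(state, action)] -> list of (next_state, probability)
P = {
    ("HaveBall", "shoot"): [("HaveBall", 0.2), ("LostBall", 0.8)],
    ("HaveBall", "chase"): [("HaveBall", 0.9), ("LostBall", 0.1)],
    ("LostBall", "shoot"): [("LostBall", 1.0)],
    ("LostBall", "chase"): [("HaveBall", 0.4), ("LostBall", 0.6)],
}

def R(x, a, x_next):
    # reward is "delayed": nonzero only when a shot keeps possession (a stand-in for scoring)
    return 1.0 if a == "shoot" and x_next == "HaveBall" else 0.0

policy = {"HaveBall": "shoot", "LostBall": "chase"}

def rollout(start, horizon=50):
    """One simulated episode; returns its discounted return R_1 + gamma*R_2 + ..."""
    x, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        a = policy[x]
        next_states, probs = zip(*P[(x, a)])
        x_next = random.choices(next_states, weights=probs)[0]
        total += discount * R(x, a, x_next)
        discount *= gamma
        x = x_next
    return total

n = 5000
print("estimated V^pi(HaveBall) ≈", round(sum(rollout("HaveBall") for _ in range(n)) / n, 2))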
The Reward Function
• Overall reward function R(X)
  – reward received when entering state X
  – example: scoring a goal: R = +1
  – example: opponent scores: R = -1
  – reward is zero most of the time; we say that reward is "delayed"

The Value Function
• V^π(X) is the expected discounted reward of being in state X and executing policy π.
• V^π(X) = Σ_{X'} P(X'|X, π(X)) · [R(X, π(X), X') + γ·V^π(X')]
• Example (from the slide's transition diagram, γ = 0.9): π chooses action a_2 in state X, which leads to X_2 (value -2.0) with probability 0.3 and to X_3 (value 1.5) with probability 0.7, with zero immediate reward. (The diagram also shows an alternative action a_1 reaching X_1, value 4.0, with probability 0.6 and X_2 with probability 0.4.)
  V^π(X) = 0.3 · γ · (-2) + 0.7 · γ · (1.5) = 0.405
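A quick check of the worked example above, assuming (as the diagram suggests) zero immediate reward on both transitions; the successor values and probabilities are taken directly from the slide.

gamma = 0.9

# Under policy pi, state X uses action a_2: successors as (probability, immediate reward, V_pi of successor)
successors = [
    (0.3, 0.0, -2.0),   # -> X_2
    (0.7, 0.0,  1.5),   # -> X_3
]

V_X = sum(p * (r + gamma * v_next) for p, r, v_next in successors)
print(round(V_X, 3))    # 0.405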
Computing the Optimal Policy by Computing its Value Function
• Let V*(X) denote the expected discounted reward of following the optimal policy π*, starting in state X:
  V*(X) = max_a Σ_{X'} P(X'|X,a) [R(X,a,X') + γ·V*(X')]
• Value Iteration:
  Initialize V(X) = 0 in all states X
  repeat until V converges:
    for each state X, compute
      V(X) := max_a Σ_{X'} P(X'|X,a) [R(X,a,X') + γ·V(X')]

Computing the Optimal Policy from V*
  π*(X) := argmax_a Σ_{X'} P(X'|X,a) [R(X,a,X') + γ·V*(X')]
• Perform a one-step lookahead, evaluate the resulting states X' using V*, and choose the best action.
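A minimal sketch of value iteration and the one-step-lookahead policy extraction, on a tiny invented three-state MDP (the state names, transition probabilities, and goal reward are illustrative, not from the slides).

gamma = 0.9
states = ["A", "B", "C"]
actions = ["left", "right"]

# P[(x, a)] -> list of (x_next, probability)
P = {
    ("A", "left"):  [("A", 1.0)],
    ("A", "right"): [("B", 0.8), ("A", 0.2)],
    ("B", "left"):  [("A", 1.0)],
    ("B", "right"): [("C", 0.9), ("B", 0.1)],
    ("C", "left"):  [("B", 1.0)],
    ("C", "right"): [("C", 1.0)],
}

def R(x, a, x_next):
    return 1.0 if x_next == "C" and x != "C" else 0.0   # reward for reaching the goal state C

def backup(V, x, a):
    """One-step lookahead value of taking action a in state x."""
    return sum(p * (R(x, a, x2) + gamma * V[x2]) for x2, p in P[(x, a)])

# Value iteration: repeat the Bellman optimality backup until V stops changing.
V = {x: 0.0 for x in states}
while True:
    V_new = {x: max(backup(V, x, a) for a in actions) for x in states}
    if max(abs(V_new[x] - V[x]) for x in states) < 1e-8:
        break
    V = V_new

# Extract the greedy (optimal) policy from V*.
pi = {x: max(actions, key=lambda a: backup(V, x, a)) for x in states}
print(V)
print(pi)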
Scale-up Problems
• Value Iteration
  – Requires O(|X| |A| B) time per iteration, where B is the branching factor (the number of states that can result from an action)
  – Not practical for more than about 30,000 states
  – Not practical for continuous state spaces
• Where do the probability distributions come from?

Reinforcement Learning
• Learn the transition function and the reward function by experimenting with the environment
• Perform value iteration to compute π*
• Other methods compute V* or π* directly, without learning P(X'|X,A) or R(X,A,X'):
  – Q learning
  – SARSA(λ)
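For reference, a minimal sketch of tabular Q-learning, one of the model-free methods named above that learns without an explicit P(X'|X,A) or R(X,A,X'). The env.reset()/env.step() interface, the hyperparameter values, and the ε-greedy exploration scheme are assumptions made for illustration.

import random

alpha, gamma, epsilon = 0.1, 0.9, 0.1

def q_learning(env, states, actions, episodes=1000, horizon=100):
    """Learn Q(x, a) from interaction; env.step(a) is assumed to return (next_state, reward, done)."""
    Q = {(x, a): 0.0 for x in states for a in actions}
    for _ in range(episodes):
        x = env.reset()
        for _ in range(horizon):
            # epsilon-greedy exploration
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(x, act)])
            x_next, r, done = env.step(a)
            # Q-learning update: move Q(x,a) toward r + gamma * max_a' Q(x', a')
            target = r + gamma * max(Q[(x_next, act)] for act in actions)
            Q[(x, a)] += alpha * (target - Q[(x, a)])
            x = x_next
            if done:
                break
    return Q   # the greedy policy is then pi(x) = argmax_a Q[(x, a)]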
Scaling Methods
• Value Function Approximation
  – Compact parameterizations of value functions (e.g., as linear, polynomial, or non-linear functions)
• Policy Approximation
  – Compact representation of the policy
  – Gradient descent in "policy space"

Multiple Agents
• The Markov Decision Process models only a single agent, but RoboCup involves multiple cooperative and competitive agents
• There is a separate reward function for each agent, but it depends on the actions of all of the other agents:
  – R_1(X, a_1, …, a_N, X')
  – R_2(X, a_1, …, a_N, X')
  – …
  – R_N(X, a_1, …, a_N, X')
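Returning to value-function approximation on the "Scaling Methods" slide above: a minimal sketch of a linear parameterization V(X) ≈ w · φ(X), updated by semi-gradient TD(0) steps toward one-step targets (the feature vector, state encoding, and step sizes are invented for illustration).

import numpy as np

def phi(state):
    """Hypothetical feature vector for a soccer-like state: (distance to ball, distance to goal, bias)."""
    dist_ball, dist_goal = state
    return np.array([dist_ball, dist_goal, 1.0])

w = np.zeros(3)
alpha, gamma = 0.01, 0.9

def td_update(state, reward, next_state):
    """One semi-gradient TD(0) step on the linear value estimate."""
    global w
    v, v_next = w @ phi(state), w @ phi(next_state)
    w += alpha * (reward + gamma * v_next - v) * phi(state)

# Example: a single transition where the agent got closer to the goal and received no reward.
td_update(state=(2.0, 10.0), reward=0.0, next_state=(1.0, 9.0))
print(w)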
Game Theory
• Each agent ("player") has a policy for choosing actions
• The combination of policies results in a value function for each player
• Each player seeks to optimize his/her own value function
• Stable solutions: Nash Equilibrium
  – Each player's current policy is a local optimum if all of the other players' policies are kept fixed
  – No player has an incentive to change
• Computing Nash equilibria in general is a research problem, although there are special cases where solutions are known.

Stochastic Policies
• In games, the optimal policy may be stochastic (i.e., actions are chosen according to a probability distribution)
  – π(X, A) = probability of choosing action A in state X
• Example: Rock, Paper, Scissors
  – Nash equilibrium: choose uniformly at random among the three actions
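A small numerical check of the Rock-Paper-Scissors claim: against the uniform mixture, every pure action has the same expected payoff, so no player can gain by deviating from it.

import numpy as np

# payoff[i, j] = payoff to the row player when row plays i and column plays j
# (order: rock, paper, scissors; win = +1, loss = -1, tie = 0)
payoff = np.array([
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
])

uniform = np.array([1/3, 1/3, 1/3])
print(payoff @ uniform)   # [0. 0. 0.] -- no pure action does better than any other against the mixture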
How to choose actions when you don't know your opponent's policy
• Consider one or more policies that your opponent is likely to play
• Design a policy that works well against all of them

The Segway League?
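Returning to the previous slide's idea of designing a policy that works well against a set of likely opponent policies, one way to make it concrete (all policy names and numbers are invented): score each candidate policy against the opponent policies you consider likely, then pick the candidate with the best worst case.

# value[my_policy][opponent_policy] = expected value of that matchup (e.g., goal difference)
value = {
    "aggressive": {"defensive_opp": 0.5, "counterattack_opp": -1.0},
    "balanced":   {"defensive_opp": 0.2, "counterattack_opp":  0.1},
    "defensive":  {"defensive_opp": 0.0, "counterattack_opp":  0.3},
}

# Maximin choice: best worst-case value over the considered opponents.
best = max(value, key=lambda mine: min(value[mine].values()))
print(best)   # "balanced": its worst case (0.1) beats the other candidates' worst cases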
Non-Mobile Robot Motion Planning
• Industrial robot arms
  – Degrees of freedom: one for each independent direction in which a robot or one of its effectors can move
  – How many degrees of freedom does this arm have? (The slide's diagram labels five revolute (R) joints and one prismatic (P) joint marked "internal".)

Kinematics and Dynamics
• Kinematic State
  – joint angle of each joint
• Dynamic State
  – kinematic state plus the velocities and accelerations of each joint
Holonomic vs. Non-Holonomic
• Automobile (on a plane)
  – 3 degrees of freedom (x, y, θ)
  – only 2 controllable degrees of freedom
    • wheels and steering
• Holonomic: number of degrees of freedom = number of controllable degrees of freedom
  – easier to control, often more expensive
• Non-Holonomic: degrees of freedom > controllable degrees of freedom

Path Planning
• Want to move the robot arm from one location (conf-1) to another (conf-2)
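Returning to the non-holonomic automobile above, a minimal sketch using the standard kinematic bicycle model (wheelbase, time step, and control values are illustrative): the configuration (x, y, θ) has three degrees of freedom, but the only controls are forward speed v and steering angle φ, so sideways motion can only be produced indirectly.

import math

def step(x, y, theta, v, phi, L=2.5, dt=0.1):
    """Integrate the kinematic bicycle model for one time step of length dt."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += (v / L) * math.tan(phi) * dt
    return x, y, theta

# The car cannot translate sideways directly: to change y while ending with theta = 0,
# it must combine forward motion and steering (as in a parallel-parking maneuver).
x, y, theta = 0.0, 0.0, 0.0
for _ in range(20):
    x, y, theta = step(x, y, theta, v=1.0, phi=0.3)
print(round(x, 2), round(y, 2), round(theta, 2))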
Two Different Coordinate Systems
• Locations can be specified in two different coordinate systems
  – Workspace Coordinates
    • position of the end effector (x, y, z) and possibly its orientation (roll, pitch, yaw)
  – Joint Coordinates
    • angle of each joint

Configuration Space (C-Space)
Forward and Inverse Kinematics
• Forward Kinematics
  – Given joint angles, compute workspace coordinates
    • easy
• Inverse Kinematics
  – Given workspace coordinates, compute joint angles
    • hard: multiple solutions may exist (often infinitely many)
• Path planning involves
  – finding a path
    • easy to do in joint-angle space
  – avoiding obstacles
    • easy to do in the workspace

Computing Obstacle Representations in C-Space
• Must convert each obstacle from a region of the workspace to a region in configuration space
• Often done by sampling
  – generate a grid of points in C-space
  – test whether the corresponding point is occupied by an obstacle
• An interesting computational geometry challenge
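A minimal sketch combining the two slides above for an invented two-link planar arm: forward kinematics maps joint angles to the end-effector position, and a C-space obstacle grid is built by sampling joint angles and testing the resulting workspace point against a hypothetical disc obstacle. (Only the end effector is checked here; a real planner would test the whole arm against the obstacle.)

import math

L1, L2 = 1.0, 0.8   # link lengths (illustrative)

def forward_kinematics(theta1, theta2):
    """End-effector (x, y) of a 2-link planar arm with joint angles theta1, theta2."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

def in_obstacle(x, y):
    """Hypothetical workspace obstacle: a disc of radius 0.3 centered at (1.2, 0.5)."""
    return (x - 1.2) ** 2 + (y - 0.5) ** 2 <= 0.3 ** 2

# Sample a grid of points in C-space and mark which cells are blocked.
N = 60
cspace_blocked = [[False] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        t1 = -math.pi + 2 * math.pi * i / N
        t2 = -math.pi + 2 * math.pi * j / N
        cspace_blocked[i][j] = in_obstacle(*forward_kinematics(t1, t2))

print(sum(sum(row) for row in cspace_blocked), "of", N * N, "sampled C-space cells are blocked")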
Path Planning in Configuration Space
• Cell Decomposition Methods
• Potential Field Methods
• Voronoi Graph Methods
• Probabilistic Roadmap Methods
• Key problem: C-space is continuous!

Cell Decomposition
• Define a grid of cells for free space
  – A path consists of a sequence of cells
  – Legal moves: go from the center of one cell to the center of one of its 8 neighboring cells
• Converts path planning to a discrete search problem (use A* or Value Iteration)
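A minimal sketch of the cell-decomposition approach: an invented occupancy grid with 8-connected moves, searched with A* (diagonal moves cost √2; the Chebyshev-distance heuristic is admissible for these costs).

import heapq

grid = [
    "........",
    "..####..",
    "..#..#..",
    "..#..#..",
    "......#.",
    ".####.#.",
    "........",
]   # '#' = cell blocked by an obstacle, '.' = free cell
rows, cols = len(grid), len(grid[0])

def neighbors(cell):
    """Yield (neighbor, move cost) for the free cells among the 8 neighbors."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == ".":
                    yield (nr, nc), (2 ** 0.5 if dr and dc else 1.0)

def heuristic(a, b):
    # Chebyshev distance: each move costs at least 1 and shortens each coordinate gap by at most 1.
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def astar(start, goal):
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        for nxt, cost in neighbors(cell):
            g2 = g + cost
            if g2 < best.get(nxt, float("inf")):
                best[nxt] = g2
                heapq.heappush(frontier, (g2 + heuristic(nxt, goal), g2, nxt, path + [nxt]))
    return None   # no sequence of free cells connects start to goal

print(astar((0, 0), (6, 7)))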