An Architecture for Action Selection in Robotic Soccer
Peter Stone
Joint work with David McAllester
RoboCup
An international AI and Robotics research initiative
• Use soccer as a rich and realistic test-bed
• Multiple teammates with a common goal
• Multiple adversaries, not known in advance
Research challenges
• Real-time decision making necessary
• Noisy sensors and actuators
• Enormous state-space
Slide # 2
CMUnited-99
• Stone, Riley, Veloso
• 1999 simulator league world champions
• 37-team field; Total score: 110–0 (8 games)
• Learned low-level behaviors
• Heuristic high-level action decision
  – Dribble; Shoot; Hold; Clear; Pass (10)
Here: Improvements over CMUnited-99
Slide # 3
Outline
• RoboCup simulator
• Action Selection Architecture
• Leading Passes
• Force Field Control for Off-Ball Motion
• Results
Slide # 4
RoboCup Simulator
• Distributed: each player a separate client
• Server models dynamics and kinematics
• Clients receive sensations, send actions
• Parametric actions: dash, turn, kick, say
[Figure: Client 1 and Client 2 exchanging messages with the Server over cycles t-1, t, t+1, t+2]
• Abstract, noisy sensors, hidden state
  – Hear sounds from limited distance
  – See relative distance, angle to objects ahead
• > $(10^9)^{23}$ states
• Limited resources: stamina
• Play occurs in real time (≈ human parameters)
Slide # 5
Outline
• RoboCup simulator
• Action Selection Architecture
• Leading Passes
• Force Field Control for Off-Ball Motion
• Results
Slide # 6
Motivation
Decisions based on a Value Function
• $v(s)$: expected reward from state $s$ (RL)
• $P(s' \mid s, a)$: probability of outcome $s'$ when selecting option (action) $a$ from $s$
• Select option with highest $\sum_{s'} P(s' \mid s, a)\, v(s')$
Slide # 7
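The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state names, probabilities, and values below are hypothetical toy numbers, not taken from the talk.

```python
from typing import Dict


def expected_value(outcome_probs: Dict[str, float], value: Dict[str, float]) -> float:
    """Return sum over s' of P(s' | s, a) * v(s') for one option a."""
    return sum(p * value[s_next] for s_next, p in outcome_probs.items())


def select_option(options: Dict[str, Dict[str, float]], value: Dict[str, float]) -> str:
    """Pick the option whose outcome distribution has the highest expected value."""
    return max(options, key=lambda a: expected_value(options[a], value))


# Hypothetical value function v(s') over a few abstract outcome states.
value = {"teammate_has_ball": 0.8, "still_have_ball": 0.5, "opponent_has_ball": 0.0}

# Hypothetical outcome distributions P(s' | s, a) for two options.
options = {
    "pass": {"teammate_has_ball": 0.7, "opponent_has_ball": 0.3},
    "hold": {"still_have_ball": 0.9, "opponent_has_ball": 0.1},
}

best = select_option(options, value)
```

With these toy numbers the pass has the higher expected value, so `best` is `"pass"`.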
Options
An option can be scored and executed
• Execute the option with the highest score
• Scoring:
  – $p_s$: probability of success
  – $v_s$, $v_f$: values of succeeding, failing
  – Score: $p_s v_s + (1 - p_s) v_f$
• Value function currently hand-written
• Scoring across options must be comparable
Slide # 8
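A minimal sketch of the two-outcome scoring rule, assuming each option is represented by a small record holding $p_s$, $v_s$, and $v_f$. The `Option` class and the candidate numbers are hypothetical and chosen only to show that scores on a common scale can be compared directly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    name: str
    p_success: float        # p_s: probability of success
    v_success: float        # v_s: value if the option succeeds
    v_failure: float = 0.0  # v_f: value if the option fails

    def score(self) -> float:
        """Score = p_s * v_s + (1 - p_s) * v_f."""
        return self.p_success * self.v_success + (1.0 - self.p_success) * self.v_failure


def best_option(options: List[Option]) -> Option:
    """Return the option with the highest score."""
    return max(options, key=lambda o: o.score())


# Hypothetical candidates; the values are illustrative, not from the talk.
candidates = [
    Option("pass", p_success=0.9, v_success=0.6),
    Option("shot", p_success=0.2, v_success=1.0),
    Option("clear", p_success=0.95, v_success=0.3),
]
chosen = best_option(candidates)
```

Because every option reduces to a single number, adding a new option type only requires a way to estimate its three quantities.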
Aside: Soft Boolean Expressions
• $x <_\delta y \in [0, 1]$ (continuous)
  – Avoid discontinuities
  – $x = y \Rightarrow x <_\delta y = 1/2$
  – $x \ll y \Rightarrow x <_\delta y \approx 1$
  – $x \gg y \Rightarrow x <_\delta y \approx 0$
• $\mathrm{if}(p, x, y)$ assumes $p \in [0, 1]$
  – $\mathrm{if}(p, x, y) \equiv p x + (1 - p) y$
• Often write $\mathrm{if}(x <_\delta y, z, w)$.
Slide # 9
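One way to realize these soft expressions is sketched below. The slide fixes only the behavior at $x = y$ and in the two limits; the logistic shape used here (and the role of $\delta$ as its width) is an assumption, and the function names are hypothetical.

```python
import math


def soft_less(x: float, y: float, delta: float) -> float:
    """Soft version of x < y, returning a value in [0, 1].

    Assumed form: a logistic sigmoid of (y - x) / delta, written with tanh
    so that large arguments do not overflow.
    """
    return 0.5 * (1.0 + math.tanh((y - x) / (2.0 * delta)))


def soft_if(p: float, x: float, y: float) -> float:
    """if(p, x, y) = p * x + (1 - p) * y, assuming p is in [0, 1]."""
    return p * x + (1.0 - p) * y


# Combined form if(x <_delta y, z, w): blends z and w smoothly.
blended = soft_if(soft_less(3.0, 3.0, delta=5.0), 10.0, 0.0)
```

At $x = y$ the soft comparison is exactly one half, so `blended` lands midway between the two branches instead of jumping from one to the other.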
Pass Option
• Consider hundreds of passes:
  – angle increments of 4°
  – speed increments of 0.2 m/sec
• $I_t$ ($I_o$): teammate (opponent) interception time
  – Approximate, fast computation
• Score: larger margin ⇒ larger $p_s$
  – $p_s = \mathrm{if}(I_t <_5 I_o,\; 0.9,\; 0)$
  – $v_s$ based on ball's predicted location after pass
  – $v_f = 0$
Slide # 10
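The pass scoring above can be sketched as follows. Several pieces are assumptions made only for illustration: the logistic form of the soft comparison, the full 360° sweep, and the `max_speed` cap are hypothetical, and the interception times are taken as given inputs because the slide does not describe how they are computed.

```python
import math
from typing import List, Tuple


def soft_less(x: float, y: float, delta: float) -> float:
    """Soft x < y in [0, 1]; the logistic form is an assumption."""
    return 0.5 * (1.0 + math.tanh((y - x) / (2.0 * delta)))


def pass_success_probability(i_t: float, i_o: float, delta: float = 5.0) -> float:
    """p_s = if(I_t <_5 I_o, 0.9, 0): grows with the teammate's interception margin."""
    p = soft_less(i_t, i_o, delta)
    return p * 0.9 + (1.0 - p) * 0.0


def pass_score(i_t: float, i_o: float, v_s: float) -> float:
    """Score = p_s * v_s, since v_f = 0 for a pass."""
    return pass_success_probability(i_t, i_o) * v_s


def candidate_passes(angle_step_deg: int = 4,
                     speed_step: float = 0.2,
                     max_speed: float = 2.0) -> List[Tuple[int, float]]:
    """Enumerate (angle in degrees, speed in m/sec) pairs on the slide's grid.

    max_speed and the full-circle sweep are hypothetical choices.
    """
    n_speeds = round(max_speed / speed_step)
    return [(angle, k * speed_step)
            for angle in range(0, 360, angle_step_deg)
            for k in range(1, n_speeds + 1)]
```

Under these assumed bounds the grid yields 90 angles times 10 speeds, which is in line with the "hundreds of passes" on the slide.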
Other Options
Shot Option: kick towards a point in the goal
• $p_s$ related only to $I_o$
• $v_s \gg 0$
• $v_f = 0$
Clear Option: kick the ball down the field
• $p_s$ related only to $I_o$
• $v_s > 0$
• $v_f = 0$
Others: dribble, send, hold, cross, ...
• Difficult to calibrate many
Slide # 11
Leading Passes
CMUnited-99: only direct passes
Now: hundreds considered
• Usually a pass option is selected
• Many leading passes seen
Movement without the ball is also crucial
CMUnited-99: SPAR
• Forces over limited regions
• Boundaries treated as hard constraints
Slide # 12
Outline
• RoboCup simulator
• Action Selection Architecture
• Leading Passes
• Force Field Control for Off-Ball Motion
• Results
Slide # 13
Movement Off the Ball
In principle: derivative of value function
Here: vector sum of force fields
[Figure: field diagram showing teammates, opponents, the offsides line, and the force vectors B, O, S, T, C]
• $d_b$: distance of the player to the ball
• $F \equiv B + O + \mathrm{if}(d_b <_{10} 20,\; T + C,\; S)$
Slide # 14
Force Fields
[Figure: field diagram showing teammates, opponents, the offsides line, and the force vectors B, O, S, T, C]
Bounds-Repellent (B): Stay on the field
Offsides-Repellent (O): Stay on-sides
Strategic (S): Stay about 20m from teammates
Tactical (T): But not too close
Get-clear (C): Move away from "key" defender
Slide # 15
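The combination rule $F \equiv B + O + \mathrm{if}(d_b <_{10} 20, T + C, S)$ can be sketched as a blend of 2-D vectors. This is an illustration only: the individual fields are passed in as precomputed vectors because the slides do not give their formulas, the logistic form of the soft comparison is an assumption, and all function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def add(*vectors: Vec) -> Vec:
    """Component-wise sum of 2-D vectors."""
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))


def scale(k: float, v: Vec) -> Vec:
    """Multiply a 2-D vector by a scalar."""
    return (k * v[0], k * v[1])


def soft_less(x: float, y: float, delta: float) -> float:
    """Soft x < y in [0, 1]; the logistic form is an assumption."""
    return 0.5 * (1.0 + math.tanh((y - x) / (2.0 * delta)))


def resultant_force(d_b: float, B: Vec, O: Vec, T: Vec, C: Vec, S: Vec,
                    threshold: float = 20.0, delta: float = 10.0) -> Vec:
    """F = B + O + if(d_b <_delta threshold, T + C, S).

    Near the ball the tactical and get-clear forces dominate; far from it
    the strategic force dominates, with a smooth transition in between.
    """
    p = soft_less(d_b, threshold, delta)
    blended = add(scale(p, add(T, C)), scale(1.0 - p, S))
    return add(B, O, blended)


# Hypothetical force vectors for a player exactly 20 m from the ball.
F = resultant_force(20.0, B=(0.0, 0.0), O=(0.0, 0.0),
                    T=(1.0, 0.0), C=(1.0, 0.0), S=(0.0, 2.0))
```

At the threshold distance the two branches are weighted equally, so `F` is the average of `T + C` and `S`.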
Results
• Keepaway vs. CMUnited-99
  – Goal: maintain possession
  – No offensive or defensive reasoning
• Possession time in 95% confidence intervals
Program        Possession Time    Mean Ball x Position
CMUnited-99    5.7-6.6 sec        -19.5
New Team       16.9-18.7 sec      -33.6
Very insensitive to most parameters
Slide # 16
Varying S
• $S_b$: Force of unit magnitude towards the ball
• $S_d$: Force downfield
• $S'$: $S + S_b$, $S + S_d$, or $S + S_b + S_d$
• $F \equiv B + O + \mathrm{if}(d_b <_{10} 20,\; T + C,\; S')$
Program              Possession Time (sec)    Mean Ball x Position
CMUnited             5.7-6.6                  -19.5
$S$                  16.9-18.7                -33.6
$S + S_b$            24.8-27.9                -35.9
$S + S_d$            22.2-25.2                25.7
$S + S_b + S_d$      23.7-26.8                26.6
Slide # 17
Overall Results
• CMUnited-99 vs. CMUnited-99: 0.3 – 0.3
• New Team vs. CMUnited-99: 2.5 – 0.3
RoboCup-2000 Competition
• ATT-CMUnited-2000: 3rd place
• Stone, Riley, McAllester, Veloso
• Also included dynamic set plays [Riley & Veloso, 2001]
• 35-team field; Total score: 26–11 (8 games)
Slide # 18
Summary
• An option-based action-selection architecture
• Leading Passes in RoboCup soccer
• Force Field Control for Off-Ball Motion
Related Work
• Samba [Riekki & Röning, '98]: force fields for action selection
• SPAR [Veloso et al., '99]: limited regions, hard constraints
Future Work
• Learn the option value functions using RL
Slide # 19