Introduction



  1. Adversarial Search
CS 486/686, University of Waterloo, May 19, 2005
CS486/686 Lecture Slides (c) 2005 K. Larson and P. Poupart

Introduction
• So far we have studied environments where there is only a single agent
• Today we look at what happens if we are in a setting where there are multiple agents planning against each other
  – Game theory: zero-sum games

Outline
• Games
• Minimax search
• Evaluation functions
• Alpha-beta pruning
• Coping with chance
• Game programs

Games
• Games are one of the oldest, most well-studied domains in AI
• Why?
  – They are fun
  – Games are usually easy to represent and the rules are clear
  – State spaces can be very large (so more challenging than "toy problems")
    • In chess the search tree has ~10^154 nodes
  – Like the "real world" in that decisions have to be made and time is vitally important
  – Easy to determine when a program is doing well
    • i.e. it wins

Types of games
• Perfect vs imperfect information
  – Perfect info means that you can see the entire state of the game
  – Chess, checkers, othello, go, …
  – Imperfect info games include scrabble, poker, most card games
• Deterministic vs stochastic
  – Chess is deterministic
  – Backgammon is stochastic

Games as search problems
• Consider a 2-player perfect information game
  – State: board configuration plus the player whose turn it is to move
  – Successor function: given a state, returns a list of (move, state) pairs, indicating a legal move and the resulting board
  – Terminal state: states where there is a win/loss/draw
  – Utility function: assigns a numerical value to terminal states (e.g. in chess +1 for a win, -1 for a loss, 0 for a draw)
  – Solution: a strategy (way of picking moves) that wins the game
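The four ingredients of "Games as search problems" can be sketched for tic-tac-toe. This is only an illustration: the board encoding (a tuple of nine cells), the move encoding (a cell index) and all function names are assumptions made for this sketch, not part of the slides.

```python
from typing import List, Optional, Tuple

# Assumed encoding: a board is a tuple of 9 cells ("X", "O" or " "),
# and a state is (board, player whose turn it is to move).
Board = Tuple[str, ...]
State = Tuple[Board, str]

# The eight winning lines, as index triples into the board tuple.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def initial_state() -> State:
    """Empty board, X (MAX) moves first."""
    return (tuple(" " * 9), "X")


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: list of (move, state) pairs for every legal move."""
    board, player = state
    next_player = "O" if player == "X" else "X"
    result = []
    for i, cell in enumerate(board):
        if cell == " ":
            new_board = board[:i] + (player,) + board[i + 1:]
            result.append((i, (new_board, next_player)))
    return result


def is_terminal(state: State) -> bool:
    """Terminal state: someone has won, or the board is full (draw)."""
    board, _ = state
    return winner(board) is not None or " " not in board


def utility(state: State) -> int:
    """Utility from MAX's (X's) point of view: +1 win, -1 loss, 0 draw."""
    w = winner(state[0])
    return 1 if w == "X" else -1 if w == "O" else 0
```

A "solution" in the sense of the slide is then a rule that picks one entry of `successors(state)` for every state MAX may face.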

  2. Example: Tic-Tac-Toe
[Figure: partial game tree for tic-tac-toe. Levels alternate between MAX (X) and MIN (O), from the empty board down to TERMINAL states with utility −1, 0 or +1.]
MAX's job is to use the search tree to determine the best move.

Game search challenge
• What makes game search challenging?
  – There is an opponent!
  – The opponent is malicious: it wants to win (i.e. it is trying to make you lose)
  – We need to take this into account when choosing moves
    • Simulate the opponent's behaviour in our search
• Notation: one player is called MAX (who wants to maximize its utility) and the other is called MIN (who wants to minimize MAX's utility)

Optimal strategies
• In standard search the optimal solution is a sequence of moves leading to a winning terminal state
• But MIN has something to say about this
• Strategy (from MAX's perspective):
  – Specify a move for the initial state, specify a move for all possible states arising from MIN's response, then all possible responses to all of MIN's responses to MAX's previous move, …
• Want to find the optimal strategy
  – One that leads to outcomes at least as good as any other strategy, given that MIN is playing optimally
  – Equilibrium (game theory)
  – Zero-sum games of perfect information are "easy games" from a game theoretic perspective

Minimax Value
\[
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{Utility}(n) & \text{if } n \text{ is a terminal state} \\
\max_{s \in \text{Succ}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{Succ}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
\]

Minimax algorithm
• Returns the action corresponding to the best possible move
[Figure: game tree. MAX root A (value 3) has actions a1, a2, a3 leading to MIN nodes B (value 3), C (value 2) and D (value 2). Leaves under B (b1, b2, b3): 3, 12, 8. Leaves under C (c1, c2, c3): 2, 4, 6. Leaves under D (d1, d2, d3): 14, 5, 2.]
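The minimax value definition can be checked against the example tree from the "Minimax algorithm" slide. The sketch below is a minimal illustration: the nested-dictionary tree encoding and the function names are assumptions for this example, while the action labels and leaf utilities are taken from the slide's figure.

```python
from typing import Dict, Union

# Assumed encoding: an int is a terminal utility, a dict maps action
# labels to subtrees.
Tree = Union[int, Dict[str, "Tree"]]

GAME_TREE: Tree = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},   # MIN node B
    "a2": {"c1": 2, "c2": 4, "c3": 6},    # MIN node C
    "a3": {"d1": 14, "d2": 5, "d3": 2},   # MIN node D
}


def minimax_value(node: Tree, is_max: bool) -> int:
    """Utility at terminals, max over successors at MAX nodes,
    min over successors at MIN nodes."""
    if isinstance(node, int):
        return node
    values = [minimax_value(child, not is_max) for child in node.values()]
    return max(values) if is_max else min(values)


def minimax_decision(root: Dict[str, Tree]) -> str:
    """Action at the MAX root whose MIN successor has the highest value."""
    return max(root, key=lambda action: minimax_value(root[action], False))


print(minimax_value(GAME_TREE, True))   # 3
print(minimax_decision(GAME_TREE))      # a1
```

The MIN nodes B, C and D evaluate to 3, 2 and 2, so MAX's best move is a1 with value 3, matching the figure.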

  3. Properties of Minimax
• Complete if tree is finite
• Time complexity: O(b^m)
• Space complexity: O(bm) (it is DFS)
• Optimal against an optimal opponent
  – If MIN does not play optimally then we might be able to do better following a different strategy
(b is the branching factor; m is the depth of the tree)

Minimax and multi-player games
[Figure: three-player game tree with players A, B and C to move in turn. Each node carries a utility vector (one entry per player). Leaves, left to right: (1, 2, 6), (4, 2, 3), (6, 1, 2), (7, 4, −1), (5, −1, −1), (−1, 5, 2), (7, 7, −1), (5, 4, 5). Backed-up values at the C level: (1, 2, 6), (6, 1, 2), (−1, 5, 2), (5, 4, 5); at the B level: (1, 2, 6), (−1, 5, 2); at the root A: (1, 2, 6).]
• Cannot handle alliances, side payments, …

• Can we now write a program that will play chess reasonably well?
• No!
  – For chess b~35 and m~100
  – Do we really need to look at all those nodes?

Alpha-Beta Pruning
• If we are smart (and lucky) we can do pruning
  – Eliminate large parts of the tree from consideration
• Alpha-Beta pruning applied to a minimax tree
  – Returns the same decision as minimax
  – Prunes branches that cannot influence the final decision
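For the multi-player tree on the "Minimax and multi-player games" slide, the backed-up vectors are consistent with each player picking the child that maximizes its own component of the utility vector. The sketch below illustrates that rule. It assumes a binary tree with players A, B, C moving at depths 0, 1, 2 (inferred from the order of the leaves in the figure), and the function name is hypothetical.

```python
from typing import List, Tuple, Union

Vector = Tuple[int, ...]
Node = Union[Vector, List["Node"]]  # a tuple is a leaf, a list is an internal node

# Assumed shape: A moves at the root, B at depth 1, C at depth 2.
TREE: Node = [
    [[(1, 2, 6), (4, 2, 3)], [(6, 1, 2), (7, 4, -1)]],
    [[(5, -1, -1), (-1, 5, 2)], [(7, 7, -1), (5, 4, 5)]],
]


def multiplayer_value(node: Node, depth: int = 0, num_players: int = 3) -> Vector:
    """Back up utility vectors: the player to move keeps the child vector
    with the largest entry in that player's own position."""
    if isinstance(node, tuple):
        return node
    player = depth % num_players  # 0 = A, 1 = B, 2 = C
    children = (multiplayer_value(c, depth + 1, num_players) for c in node)
    return max(children, key=lambda v: v[player])


print(multiplayer_value(TREE))  # (1, 2, 6)
```

The result (1, 2, 6) matches the root label in the figure. Because each player is assumed to act purely on its own component, this rule has no way to represent alliances or side payments, which is the limitation the slide points out.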

  4. Alpha-Beta Pruning
• Alpha:
  – Value of best (highest value) choice we have found so far on the path for MAX
• Beta:
  – Value of best (lowest value) choice we have found so far on the path for MIN
• Update alpha and beta as search continues
• Prune as soon as the value of the current node is known to be worse than the current alpha or beta values for MAX or MIN

Alpha-Beta example
[Figure sequence: a MAX root with three MIN children; each node is labelled with the range of values it can still take.]
• Start: root MAX [-inf, inf]. The first MIN node sees leaf 3 and becomes [-inf, 3]
• Leaf 12: the first MIN node stays at [-inf, 3]
• Leaf 8: the first MIN node is now [3, 3], so the root becomes [3, inf]
• The second MIN node sees leaf 2 and becomes [-inf, 2]
• Prune the remaining children of the second MIN node: its value is at most 2, which cannot beat the 3 that MAX already has
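The alpha and beta bookkeeping described on this page can be sketched as follows. This is a minimal illustration rather than the course's reference code: the nested-list tree encoding, the function name and the `visited` list (used only to show which leaves are examined) are assumptions. The leaf values follow the example, with 4 and 6 for the two pruned leaves taken from the earlier minimax figure.

```python
import math
from typing import List, Union

Node = Union[int, List["Node"]]  # an int is a terminal utility

# MAX root with three MIN children, leaves as in the example.
TREE: Node = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]


def alphabeta(node: Node, alpha: float, beta: float,
              is_max: bool, visited: List[int]) -> float:
    """Minimax value of node, skipping branches that cannot change the result.

    alpha: best (highest) value found so far on the path for MAX.
    beta:  best (lowest) value found so far on the path for MIN.
    """
    if isinstance(node, int):
        visited.append(node)  # record every leaf that is actually examined
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:  # MIN above will never let play reach here
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:  # MAX above already has something at least as good
            break
    return value


visited: List[int] = []
root_value = alphabeta(TREE, -math.inf, math.inf, True, visited)
print(root_value)  # 3
print(visited)     # [3, 12, 8, 2, 14, 5, 2]
```

The leaves 4 and 6 under the second MIN node are never examined, and the root value is the same 3 that plain minimax returns. The third MIN node is searched in full because its leaves arrive in the unfavourable order 14, 5, 2.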

  5. Alpha-Beta example
• The third MIN node sees leaf 14 and becomes [-inf, 14]; the root becomes [3, 14]
• Leaf 5: the third MIN node becomes [-inf, 5]; the root becomes [3, 5]
• Leaf 2: the third MIN node is now [2, 2]; the root is [3, 3], so MAX's best value is 3

Properties of Alpha-Beta
• Pruning does not affect the final result
  – You prune parts of the tree that you would never reach in actual play
• The order in which moves are evaluated is important
  – With bad move ordering, alpha-beta will prune nothing
  – With perfect node ordering, it can reduce time complexity to O(b^(m/2))

Real-time decisions
• Alpha-beta can be a huge improvement over minimax
  – Still not good enough, as we need to search all the way to terminal states for at least part of the search space
  – Need to make a decision about a move quickly
• Heuristic evaluation function + cutoff test

Evaluation functions
• Apply an evaluation function to a state
  – If terminal state, function returns actual utility
  – If non-terminal, function returns an estimate of the expected utility (i.e. the chance of winning from that state)
  – Function must be fast to compute
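The combination "heuristic evaluation function + cutoff test" can be sketched on a toy game. Everything here is an assumption made for illustration: the game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins), the depth-based cutoff test, the heuristic in `evaluate`, and all function names. None of it comes from the slides.

```python
from typing import List, Tuple

State = Tuple[int, bool]  # (stones left, True if MAX is to move)


def successors(state: State) -> List[State]:
    """The player to move removes 1, 2 or 3 stones."""
    stones, max_to_move = state
    return [(stones - k, not max_to_move) for k in (1, 2, 3) if k <= stones]


def is_terminal(state: State) -> bool:
    return state[0] == 0


def utility(state: State) -> float:
    """Actual utility for MAX: the player who just moved took the last stone."""
    return -1.0 if state[1] else 1.0


def evaluate(state: State) -> float:
    """Hypothetical heuristic: a cheap estimate in [-1, 1] for non-terminal
    states, guessing that the mover is ahead unless stones % 4 == 0."""
    stones, max_to_move = state
    mover_score = 0.5 if stones % 4 != 0 else -0.5
    return mover_score if max_to_move else -mover_score


def h_minimax(state: State, depth: int, cutoff_depth: int) -> float:
    """Minimax that stops at cutoff_depth and falls back on evaluate()."""
    if is_terminal(state):
        return utility(state)       # actual utility at terminal states
    if depth >= cutoff_depth:
        return evaluate(state)      # cutoff test passed: use the estimate
    values = [h_minimax(s, depth + 1, cutoff_depth) for s in successors(state)]
    return max(values) if state[1] else min(values)


print(h_minimax((5, True), 0, cutoff_depth=1))   # 0.5 (estimate)
print(h_minimax((5, True), 0, cutoff_depth=10))  # 1.0 (true minimax value)
```

With a shallow cutoff the search returns only an estimate (0.5), while a cutoff deeper than the game returns the true value (1.0). The quality of a real game program depends on how well `evaluate` tracks the chance of winning and on how cheap it is to compute.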
