Reinforcement learning with restrictions on the action set. Mario Bravo, Universidad de Chile. Joint work with Mathieu Faure (AMSE-GREQAM).
Introduction

Outline:
1. Introduction
2. The Model
3. Main Result
4. Examples
5. Sketch of the Proof
Motivation

The most debated and studied learning procedure in game theory: fictitious play [Brown 51].

Consider an N-player normal form game which is repeated in discrete time. At each stage, each player computes a best response to her opponents' empirical average play. The idea is to study the asymptotic behavior of the empirical frequency of play of player i, $v^i_n$.

Running example, Rock-Scissors-Paper (payoffs of the row player):

      R    S    P
R     0    1   -1
S    -1    0    1
P     1   -1    0
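To fix ideas, here is a minimal sketch of discrete-time fictitious play for the Rock-Scissors-Paper game above. The variable names, the deterministic tie-breaking in the argmax, and the number of stages are illustrative choices, not part of the talk.

```python
import numpy as np

# Minimal sketch of two-player fictitious play on Rock-Scissors-Paper.
A = np.array([[0, 1, -1],
              [-1, 0, 1],
              [1, -1, 0]])          # row player's payoffs
B = -A                              # zero-sum game: column player's payoffs

counts = [np.ones(3), np.ones(3)]   # counts of past actions (started at 1 to avoid 0/0)

def best_response(payoff, opponent_freq):
    """Pure best response to the opponent's empirical frequency of play."""
    return int(np.argmax(payoff @ opponent_freq))

for n in range(5000):
    freq = [c / c.sum() for c in counts]      # empirical frequencies v_n^i
    a1 = best_response(A, freq[1])            # player 1 best-responds to player 2's history
    a2 = best_response(B.T, freq[0])          # player 2 best-responds to player 1's history
    counts[0][a1] += 1
    counts[1][a2] += 1

print([c / c.sum() for c in counts])  # in this zero-sum game, should approach (1/3, 1/3, 1/3)
```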
Motivation

A large body of literature is devoted to identifying classes of games where the empirical frequencies of play converge to the set of Nash equilibria of the underlying game:
- Zero-sum games [Robinson 51]
- General (non-degenerate) 2 × 2 games [Miyasawa 61]
- Potential games [Monderer and Shapley 96]
Recall that a game $G = (N, (S^i)_{i \in N}, (G^i)_{i \in N})$ is a potential game if there exists a function $\Phi : \prod_{k=1}^{N} S^k \to \mathbb{R}$ such that
$$G^i(s^i, s^{-i}) - G^i(r^i, s^{-i}) = \Phi(s^i, s^{-i}) - \Phi(r^i, s^{-i}),$$
for all $s^i, r^i \in S^i$ and $s^{-i} \in S^{-i}$. Primary example: congestion games [Rosenthal 73].
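As a small illustration, the exact potential condition can be checked numerically for a finite two-player game. The helper below and the common-interest example (where Φ equals the common payoff matrix) are illustrative assumptions, not taken from the talk.

```python
import itertools
import numpy as np

def is_exact_potential(G1, G2, Phi):
    """Check the exact potential condition for a finite two-player game.

    G1[s1, s2] and G2[s1, s2] are the players' payoffs; Phi is the candidate potential.
    """
    n1, n2 = Phi.shape
    # Unilateral deviations of player 1 must change G1 and Phi by the same amount.
    for s1, r1, s2 in itertools.product(range(n1), range(n1), range(n2)):
        if not np.isclose(G1[s1, s2] - G1[r1, s2], Phi[s1, s2] - Phi[r1, s2]):
            return False
    # Same for unilateral deviations of player 2.
    for s1, s2, r2 in itertools.product(range(n1), range(n2), range(n2)):
        if not np.isclose(G2[s1, s2] - G2[s1, r2], Phi[s1, s2] - Phi[s1, r2]):
            return False
    return True

# A common-interest game is trivially a potential game, with Phi equal to the payoffs.
G = np.array([[2.0, 0.0],
              [0.0, 1.0]])
print(is_exact_potential(G, G, G))  # True
```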
Further classes and extensions:
- 2-player games where one of the players has only two actions [Berger 05]
- New proofs and generalizations using stochastic approximation techniques [Benaim et al. 05, Hofbauer and Sorin 06]
- Several variations and applications in many domains (transportation, telecommunications, etc.)
Problem

Players need a lot of information! Three main assumptions are made here:
(i) Each player knows the structure of the game, i.e. she knows her own payoff function, so she can compute a best response.
(ii) Each player is informed of the actions selected by her opponents at each stage, so she can compute the empirical frequencies of play.
(iii) Each player is allowed to choose any action at each stage, so she can actually play a best response.
Dropping (i) and (ii)

One approach (among many others) is to assume that agents observe only their realized payoff at each stage; payoff functions are unknown. This is the minimal-information framework of the so-called reinforcement learning procedures [Borgers and Sarin 97, Erev and Roth 98]. Most work in this direction proceeds as follows:
a) construct a sequence of mixed strategies which is updated taking into account the payoffs the agents receive (the only information they have access to);
b) study the convergence (or non-convergence) of this sequence.
Example: after four stages, the row player has played R, S, S, P and received payoffs 1, -1, 2, -10; the column player has played D, C, B, C and received payoffs -1, 1, -2, 10. Each player only observes her own actions and realized payoffs, so the row player's knowledge of her payoff matrix is:

      A    B    C    D    E
R     ?    ?    ?    1    ?
S     ?    2   -1    ?    ?
P     ?    ?  -10    ?    ?
How do players use the available information? Typically, it is supposed that players are given a rule of behavior (a choice rule) which depends on a state variable constructed from the aggregate information they gather.
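For concreteness, here is a minimal sketch of one classical choice rule of this kind, cumulative-payoff reinforcement in the spirit of Erev and Roth: the state variable is a vector of cumulative scores, one per own action, and the mixed strategy is its normalization. The initial weights, the positivity assumption on payoffs, and the names are illustrative assumptions; this is not the specific procedure studied in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

cum = np.ones(3)   # state variable: one cumulative (positive) score per own action

def choose(scores):
    """Choice rule: play each action with probability proportional to its score."""
    probs = scores / scores.sum()
    action = int(rng.choice(len(scores), p=probs))
    return action, probs

def update(scores, action, realized_payoff):
    """Only the realized payoff of the action actually played is observed and used."""
    scores[action] += realized_payoff
    return scores

# One stage of the procedure; the realized payoff is a placeholder returned by
# the unknown environment (i.e. by the opponents' unobserved actions).
action, probs = choose(cum)
realized = 1.0
cum = update(cum, action, realized)
```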
Dropping (iii)

Players have restrictions on their action sets, due to limited computational capacity or even to physical constraints. Some hypotheses are therefore needed regarding players' ability to explore their action sets. For example, in the Rock-Scissors-Paper game above, only some of the actions R, S, P may be available at a given stage [diagram over the actions R, S, P in the original slides]. This kind of restriction was introduced recently by [Benaim and Raimond 10] in the fictitious play information framework.
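As an illustration of what such a restriction could look like, the sketch below lets the set of available actions at the next stage depend on the action currently played, through a neighborhood structure. The graph and the proportional selection rule are illustrative assumptions, not the procedure of Benaim and Raimond.

```python
import numpy as np

rng = np.random.default_rng(1)

# Actions R = 0, S = 1, P = 2; from each action only its "neighbors" are available.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # assumed restriction graph

def restricted_choice(current_action, scores):
    """Choose among the currently available actions, proportionally to their scores."""
    available = neighbors[current_action]
    weights = np.array([scores[a] for a in available], dtype=float)
    probs = weights / weights.sum()
    return available[int(rng.choice(len(available), p=probs))]

scores = np.ones(3)   # any state variable maintained by the learning procedure
action = 0
for _ in range(5):
    action = restricted_choice(action, scores)
print(action)
```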
Our contribution

In this work, we drop all three assumptions.
The Model

Outline:
1. Introduction
2. The Model
3. Main Result
4. Examples
5. Sketch of the Proof
Setting

Let $G = (N, (S^i)_{i \in N}, (G^i)_{i \in N})$ be a given finite normal form game.
- $S^i$ is the action set of player i, and $S = \prod_{i} S^i$ is the set of action profiles.
- $\Delta(S^i)$ is the mixed action set of player i, i.e.
$$\Delta(S^i) = \Big\{ \sigma^i \in \mathbb{R}^{|S^i|} : \sum_{s^i \in S^i} \sigma^i(s^i) = 1,\ \sigma^i(s^i) \ge 0 \ \forall s^i \in S^i \Big\},$$
and $\Delta = \prod_i \Delta(S^i)$.
- As usual, the superscript $-i$ excludes player i: $S^{-i}$ denotes $\prod_{j \ne i} S^j$ and $\Delta^{-i}$ the set $\prod_{j \ne i} \Delta(S^j)$.
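In code, a mixed action is just a probability vector over $S^i$; the sketch below checks membership in $\Delta(S^i)$ and samples a pure action from it. The names and the tolerance are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def in_simplex(sigma, tol=1e-9):
    """Check that sigma belongs to Delta(S^i): nonnegative entries summing to one."""
    sigma = np.asarray(sigma, dtype=float)
    return bool(np.all(sigma >= -tol) and abs(sigma.sum() - 1.0) < tol)

def sample_action(sigma):
    """Draw a pure action s^i in S^i according to the mixed action sigma."""
    return int(rng.choice(len(sigma), p=sigma))

sigma = np.array([0.5, 0.25, 0.25])   # a mixed action over three pure actions
assert in_simplex(sigma)
print(sample_action(sigma))
```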