Multi-agent learning: Emergence of Conventions. Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands. Last modified on March 29th, 2011 at 11:53. Slide 1
Motivation (Slide 2)
Simple example of a Markov process (Slide 3)
• Return probabilities are usually omitted in diagrams.
• In this case it can be derived that, on average, $P(\text{Sun}) = 6/7$ and $P(\text{Rain}) = 1/7$.
• How? We'll see . . .
Plan for today (Slide 4)
1. Markov processes. (Ergodic process, communicating states/class, transient state/class, recurrent state/class, periodic state/class, absorbing state, irreducible process, stationary distribution.) Compute stationary distributions:
   • Solve $n$ linear equations.
   • Compare $n$ so-called $z$-trees (Freidlin and Wentzell, 1984).
2. Perturbed Markov processes. (Regular perturbed Markov process, punctuated equilibrium, stochastically stable state.) Compute stochastically stable states:
   • Compare $k$ so-called $z$-trees, where $k$ is the number of so-called recurrent classes (Peyton Young, 1993).
Plan for today (Slide 5)
3. Applications.
   • Emergence of a currency standard.
   • Competing technologies: operating system A vs. operating system B.
   • Competing technologies: cell phone company A vs. cell phone company B. (If time allows.)
   • Schelling's model of segregation (1969).
Part 1: Markov processes (Slide 6)
State transitions (Slide 7)
Communication classes (Slide 8)
Start state matters (Slide 9)
Start state matters . . . but here it does not (Slide 10)
The stationary distribution (and computing one) (Slide 11)
$P(A) = P(A \mid A') P(A') + P(A \mid B') P(B') + P(A \mid C') P(C') + P(A \mid D') P(D')$
Let us assume that visiting probabilities are stationary ($A = A'$, $B = B'$, . . .):
$P(A) = P(A \mid A) P(A) + P(A \mid B) P(B) + P(A \mid C) P(C) + P(A \mid D) P(D)$
$\phantom{P(A)} = 0 \cdot P(A) + 0 \cdot P(B) + 1 \cdot P(C) + 0 \cdot P(D)$
$\phantom{P(A)} = P(C)$
Let us write this as $A = C$. Similarly, $B = 0.8A$, $C = D$, and $D = 0.2A + B$. Four equations with four unknowns. (Always regular, i.e. $\det \neq 0$?)
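The four balance equations can be checked numerically. The sketch below reads the transition probabilities off those equations (A goes to B with 0.8 and to D with 0.2; B goes to D, D to C, and C to A with probability 1) and repeatedly applies them to a starting distribution. Power iteration is an assumption of this sketch, not a method named on the slide. Note also that the balance equations are homogeneous, so on their own they only fix the probabilities up to a common factor; the extra condition that the probabilities sum to 1 is what pins the solution down. The function names are illustrative.

```python
# Transition probabilities implied by the slide's equations:
# P[s][t] is the probability of moving from state s to state t.
P = {
    "A": {"B": 0.8, "D": 0.2},
    "B": {"D": 1.0},
    "C": {"A": 1.0},
    "D": {"C": 1.0},
}

def step(dist):
    """Apply one transition to a distribution over states."""
    new = {s: 0.0 for s in P}
    for s, row in P.items():
        for t, p in row.items():
            new[t] += dist[s] * p
    return new

def stationary(iterations=2000):
    """Approximate the stationary distribution by power iteration."""
    dist = {s: 1.0 / len(P) for s in P}  # start uniform, sums to 1
    for _ in range(iterations):
        dist = step(dist)
    return dist

pi = stationary()
for state in sorted(pi):
    print(state, round(pi[state], 6))
```

Solving by hand gives the same result: with $A = C = D$ and $B = 0.8A$, normalisation yields $3.8A = 1$, so $A = C = D = 5/19$ and $B = 4/19$.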
Theory of discrete Markov processes (Slide 12)
Definitions:
• Stationary distribution: fixed point of transition probabilities.
• Empirical distribution: long run normalised frequency of visits.
• Limit distribution: long run probability to visit a node.
• Process is path-dependent: empirical distribution depends on start state. Ergodic otherwise.
• Class is recurrent: process cannot escape. Transient otherwise.
• Process is irreducible: all states can reach each other.
Facts:
• Node is recurrent: process will return to it a.s.
• If finite number of states:
   – At least one recurrence class.
   – If precisely one recurrence class then ergodic, and conversely.
• Stationary distribution always exists. Unique iff ergodic. In that case, stationary distr. ≡ empirical distr.
• If ergodic and a-periodic, then stationary distr. ≡ limit distr.
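The role of a-periodicity in the last fact can be seen on a small case. The sketch below uses a hypothetical two-state chain (not taken from the slides) that flips state at every step, so it has period 2. Its stationary distribution is (1/2, 1/2) and the time-averaged distribution approaches it, but the distribution at step n keeps alternating and has no limit.

```python
from fractions import Fraction

# Hypothetical two-state chain that switches state at every step (period 2).
P = [[Fraction(0), Fraction(1)],
     [Fraction(1), Fraction(0)]]

def step(dist):
    """Apply one transition to a distribution given as a list."""
    n = len(P)
    return [sum(dist[s] * P[s][t] for s in range(n)) for t in range(n)]

def run(start, steps):
    """Return the distributions at times 0, 1, ..., steps."""
    dist = list(start)
    history = [dist]
    for _ in range(steps):
        dist = step(dist)
        history.append(dist)
    return history

history = run([Fraction(1), Fraction(0)], 9)  # 10 distributions in total

# Time average of the distributions over the 10 recorded steps.
empirical = [sum(d[t] for d in history) / len(history) for t in range(2)]

print("last two distributions:", history[-2], history[-1])
print("time average:", empirical)
```

The uniform distribution is a fixed point of `step`, which is what makes it stationary, yet no single-step distribution started from state 0 ever equals it.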
Finding stationary distributions with many states is difficult (Slide 13)
• Solve $n$ equations in $n$ unknowns. What if $S$ is large?
$$
\begin{pmatrix}
0.1 & 0.2 & 0.0 & 0.1 & 0.0 & 0.1 & 0.0 & 0.3 & 0.0 & 0.2 \\
0.1 & 0.2 & 0.0 & 0.1 & 0.0 & 0.1 & 0.0 & 0.3 & 0.0 & 0.2 \\
0.1 & 0.2 & 0.0 & 0.1 & 0.0 & 0.1 & 0.0 & 0.3 & 0.0 & 0.2 \\
0.0 & 0.1 & 0.1 & 0.2 & 0.0 & 0.1 & 0.0 & 0.3 & 0.0 & 0.2 \\
0.5 & 0.2 & 0.0 & 0.1 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.2 \\
0.1 & 0.2 & 0.0 & 0.1 & 0.0 & 0.1 & 0.0 & 0.3 & 0.0 & 0.2 \\
0.0 & 0.1 & 0.1 & 0.2 & 0.0 & 0.1 & 0.0 & 0.3 & 0.0 & 0.2 \\
0.1 & 0.2 & 0.0 & 0.1 & 0.0 & 0.1 & 0.0 & 0.3 & 0.0 & 0.2 \\
0.3 & 0.1 & 0.2 & 0.0 & 0.1 & 0.0 & 0.0 & 0.0 & 0.3 & 0.0 \\
0.1 & 0.2 & 0.0 & 0.1 & 0.0 & 0.1 & 0.0 & 0.3 & 0.0 & 0.2
\end{pmatrix}
$$
• Freidlin & Wentzell (1984): only look at so-called state trees.
An irreducible (and finite) Markov process (Slide 14)
One possible $A$-tree (Slide 15)
Another possible $A$-tree (Slide 16)
A perhaps easier way to compute the stationary distribution (Slide 17)
• An $s$-tree, $T_s$, is a complete collection of disjoint paths from states $\neq s$ to $s$.
• The likelihood of an $s$-tree $T_s$, written $\ell(T_s)$, $=_{\text{Def}}$ the product of its edge probabilities.
• The likelihood of a state $s$, written $\ell(s)$, $=_{\text{Def}}$ the sum of the likelihoods of all $s$-trees.
Theorem (Freidlin & Wentzell, 1984). Let $P$ be an irreducible finite Markov process. Then, for all states, the likelihood of that state is proportional to the stationary probability of that state.
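The theorem can be tried out by brute force on a small chain. The sketch below reuses the four-state chain implied by the balance equations $A = C$, $B = 0.8A$, $C = D$, $D = 0.2A + B$. For each root state it lets every other state pick one outgoing edge, keeps only the choices in which every state reaches the root (these are the $s$-trees), and sums the products of the edge probabilities. Exhaustive enumeration is an assumption made for illustration and only works for very small state spaces; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Chain implied by the balance equations A = C, B = 0.8A, C = D, D = 0.2A + B.
P = {
    "A": {"B": Fraction(4, 5), "D": Fraction(1, 5)},
    "B": {"D": Fraction(1)},
    "C": {"A": Fraction(1)},
    "D": {"C": Fraction(1)},
}

def reaches_root(state, root, successor):
    """Follow the chosen edges from state; True if the walk ends at root."""
    seen = set()
    while state != root:
        if state in seen:  # ran into a cycle that avoids the root
            return False
        seen.add(state)
        state = successor[state]
    return True

def likelihood(root):
    """Sum of the likelihoods of all root-trees."""
    others = [s for s in P if s != root]
    total = Fraction(0)
    # Every non-root state picks exactly one outgoing edge.
    for choice in product(*(P[s].items() for s in others)):
        successor = {s: t for s, (t, _) in zip(others, choice)}
        if all(reaches_root(s, root, successor) for s in others):
            weight = Fraction(1)
            for _, p in choice:
                weight *= p
            total += weight
    return total

def tree_distribution():
    """Normalise the state likelihoods so that they sum to 1."""
    ell = {s: likelihood(s) for s in P}
    norm = sum(ell.values())
    return {s: ell[s] / norm for s in P}

print(tree_distribution())
```

For this chain the likelihoods are 1, 4/5, 1, 1 for A, B, C, D, which normalise to 5/19, 4/19, 5/19, 5/19, the same values obtained by solving the balance equations directly.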
Counting $s$-trees with Freidlin & Wentzell: example (Slide 18)
Freidlin & Wentzell (1984): $\mu(s) = \dfrac{v(s)}{\sum_{t \in S} v(t)}$, where $v(s) =_{\text{Def}} \sum_{T \in \mathcal{T}_s} \ell(T)$.
The unique $C$-tree is coloured red. Computing $\ell(T_C) = 10\epsilon \cdot 1/4 \cdot \ldots = 5\epsilon^3/12$. Similarly:
State: A, B, C, D, E, F, G
Distribution (values in the order they appear on the slide; apart from $C = 5\epsilon^3/12$, their pairing with the states is not recoverable from the slide text): $\epsilon^2/24$, $5\epsilon^3/9$, $5\epsilon^3/12$, $5\epsilon^2/24$, $\epsilon^2/24$, $\epsilon/48$, $\epsilon/32$
Note what happens if $\epsilon \to 0$.
Part 2: Perturbed Markov processes (Slide 19)
Motivation (Slide 20)
Most Markov processes are path-dependent (non-ergodic) (Slide 21)
Make them ergodic by perturbing with $\epsilon^{r(s, s')}$ here and there (Slide 22)
Compute $s$-trees from $P^0$-recurrent classes only (!) (Slide 23)
Compute $s$-trees from $P^0$-recurrent classes only (!) (Slide 24)
Class $\{B, D, E\}$ possesses lowest stochastic potential, viz. 4. (Slide 25)
Example of $P^0$ and $P^\epsilon$ (Slide 26)
$$
\begin{pmatrix}
0.0 & 0.2 & 0.2 & 0.1 & 0.5 \\
0.3 & 0.0 & 0.1 & 0.1 & 0.5 \\
0.1 & 0.2 & 0.2 & 0.0 & 0.5 \\
0.7 & 0.1 & 0.2 & 0.0 & 0.0 \\
0.1 & 0.2 & 0.2 & 0.0 & 0.5 \\
0.0 & 0.0 & 0.1 & 0.0 & 0.9
\end{pmatrix}
= \lim_{\epsilon \to 0}
\begin{pmatrix}
0.0 & 0.2 & 0.2 & 0.1 & 0.5 \\
0.3 & \epsilon^7 & 0.1 & 0.1 & 0.5 - \epsilon^7 \\
0.1 & 0.2 & 0.2 & 0.0 & 0.5 \\
0.7 & 0.1 & 0.2 & 0.0 & 0.0 \\
0.1 & 0.2 & 0.2 - \epsilon^2/2 & \epsilon^2 & 0.5 - \epsilon^2/2 \\
0.0 & 0.0 & 0.1 & 0.0 & 0.9
\end{pmatrix}
$$
• Notice that some $P^0$-positive probabilities "have to give way" in order to perturb $P^0$-zero probabilities with $\epsilon$. (Because row probabilities must add up to 1.)
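The "give way" remark can be checked on the two perturbed rows. The sketch below assumes one particular reading of the slide's matrix layout: $\epsilon^7$ replaces a zero entry in the second row and is taken from that row's 0.5 entry, and $\epsilon^2$ replaces a zero entry in the fifth row and is taken in equal halves from the 0.2 and 0.5 entries. Under that assumption every perturbed row still sums to 1 and converges to the unperturbed row as $\epsilon \to 0$. The function names are illustrative.

```python
from fractions import Fraction as F

# Unperturbed rows as read from the slide (the layout is an assumption).
P0_ROW_2 = [F(3, 10), F(0), F(1, 10), F(1, 10), F(1, 2)]
P0_ROW_5 = [F(1, 10), F(2, 10), F(2, 10), F(0), F(5, 10)]

def perturbed_row_2(eps):
    """eps**7 moves from the last entry to the zero entry."""
    return [F(3, 10), eps**7, F(1, 10), F(1, 10), F(1, 2) - eps**7]

def perturbed_row_5(eps):
    """eps**2 is taken in equal halves from two positive entries."""
    return [F(1, 10), F(2, 10), F(2, 10) - eps**2 / 2, eps**2,
            F(1, 2) - eps**2 / 2]

def max_gap(row, base):
    """Largest entrywise distance between a perturbed row and its base row."""
    return max(abs(x - y) for x, y in zip(row, base))

for eps in (F(1, 2), F(1, 10), F(1, 100)):
    row2, row5 = perturbed_row_2(eps), perturbed_row_5(eps)
    print(eps, sum(row2), sum(row5),
          max_gap(row2, P0_ROW_2), max_gap(row5, P0_ROW_5))
```

Exact fractions are used so that the row sums come out as exactly 1 rather than approximately 1.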