[Figure: the Markov decision process as an agent-environment loop. At step k the agent observes state s_k and reward r_k and selects action a_k; the environment responds with the next reward r_{k+1} and next state s_{k+1}.]
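The agent-environment loop above can be sketched in a few lines. The two-state dynamics below are a hypothetical stand-in for the environment, just to make the step structure (s_k, a_k) -> (r_k, s_{k+1}) concrete:

```python
# Minimal sketch of the MDP interaction loop: at step k the agent
# observes state s_k, picks action a_k, and the environment returns
# reward r_k and the next state s_{k+1}. The dynamics are a toy example.
def step(state, action):
    # Toy dynamics: action 1 moves toward state 1, which pays reward 1.
    next_state = 1 if action == 1 else 0
    reward = 1.0 if next_state == 1 else 0.0
    return reward, next_state

def run_episode(policy, s0=0, horizon=5):
    trajectory, s = [], s0
    for k in range(horizon):
        a = policy(s)               # agent: a_k = pi(s_k)
        r, s_next = step(s, a)      # environment: (r_k, s_{k+1})
        trajectory.append((s, a, r))
        s = s_next
    return trajectory

traj = run_episode(lambda s: 1)     # policy that always chooses action 1
```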
[Figure: actor-critic architecture. The critic maintains a value function over states s_k and turns reward r_k into a TD error; the TD error updates both the critic and the actor's policy, which selects actions a_k in the environment.]
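The critic's temporal-difference update can be sketched in tabular form. The TD error delta_k = r_k + gamma * V(s_{k+1}) - V(s_k) is the quantity the figure routes to both the critic and the actor; the learning rate and discount below are illustrative choices:

```python
# Tabular TD update for the critic in an actor-critic scheme.
def td_update(V, s, r, s_next, gamma=0.9, alpha=0.1):
    delta = r + gamma * V[s_next] - V[s]   # TD error delta_k
    V[s] += alpha * delta                  # critic update
    return delta                           # the actor would also adjust pi(s) using delta

V = {0: 0.0, 1: 0.0}
delta = td_update(V, s=0, r=1.0, s_next=1)
```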
L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, Mar. 2008.
[Figure: multiagent RL lies at the intersection of temporal-difference RL, game theory, and direct policy search.]
[Figure: a taxonomy of multiagent RL algorithms by task type and agent awareness.]

Agent awareness \ Task type | Cooperative           | Competitive          | Mixed
Independent                 | Coordination-free     | Opponent-independent | Agent-independent
Tracking                    | Coordination-based    | -                    | Agent-tracking
Aware                       | Indirect coordination | Opponent-aware       | Agent-aware
[Figure: a fully cooperative coordination task. Two agents approach an obstacle; each can swerve left (L_i), go straight (S_i), or swerve right (R_i). Joint Q-values:]

Q  | L2  | S2  | R2
L1 | 10  | -5  | 0
S1 | -5  | -10 | -5
R1 | -10 | -5  | 10
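The cooperative Q-table above has two maximizing joint actions, so even with identical interests the agents face a coordination problem: each must guess which maximum the other will play. A small sketch makes this explicit (values transcribed from the table):

```python
# Joint Q-values for the two-agent obstacle-avoidance example.
Q = {
    ("L1", "L2"): 10,  ("L1", "S2"): -5,  ("L1", "R2"): 0,
    ("S1", "L2"): -5,  ("S1", "S2"): -10, ("S1", "R2"): -5,
    ("R1", "L2"): -10, ("R1", "S2"): -5,  ("R1", "R2"): 10,
}
best_value = max(Q.values())
# Both (L1, L2) and (R1, R2) attain the maximum: the agents must
# coordinate on one of the two optimal joint actions.
best_joint = sorted(a for a, q in Q.items() if q == best_value)
```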
C. Guestrin, M. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proc. Int’l Conf. Machine Learning (ICML-02), Jul. 2002.

[Figure: a coordination graph over four agents; the global Q-function decomposes into local components Q1-Q4, each depending only on the actions of neighboring agents.]
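The idea behind coordinated RL is that the global Q-function is a sum of local terms over graph edges, so the joint maximization can exploit the graph structure. The figure's exact edges are not recoverable here, so the pairwise terms below are illustrative, and brute-force enumeration stands in for the variable elimination used by Guestrin et al.:

```python
from itertools import product

# Hypothetical local Q-components on a 4-agent coordination graph;
# each term depends only on two agents' (binary) actions.
def q1(a1, a2): return 1.0 if a1 == a2 else 0.0
def q2(a2, a3): return 1.0 if a2 != a3 else 0.0
def q3(a3, a4): return 1.0 if a3 == a4 else 0.0
def q4(a4, a1): return 0.5   # constant edge, for illustration

def global_q(a):
    a1, a2, a3, a4 = a
    return q1(a1, a2) + q2(a2, a3) + q3(a3, a4) + q4(a4, a1)

# Brute force over the 2^4 joint actions; variable elimination would
# find the same maximizer without enumerating all of them.
joint_actions = list(product([0, 1], repeat=4))
best = max(joint_actions, key=global_q)
```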
[Figure: a fully competitive (zero-sum) game between two agents, Q2 = -Q1:]

Q1 | L2  | R2
L1 | 0   | 1
R1 | -10 | 10

Q2 | L2  | R2
L1 | 0   | -1
R1 | 10  | -10
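Because Q2 = -Q1, this game is zero-sum, and for these particular payoffs a pure saddle point happens to exist, so the maximin solution can be checked without mixed strategies (in general, minimax-Q solves a linear program at each state):

```python
# Agent 1's payoff matrix; rows are L1, R1 and columns are L2, R2.
Q1 = [[0, 1],
      [-10, 10]]

# Agent 1 maximizes its guaranteed payoff; agent 2 minimizes Q1.
maximin = max(min(row) for row in Q1)
minimax = min(max(Q1[i][j] for i in range(2)) for j in range(2))
# maximin == minimax means a pure saddle point exists, here at (L1, L2).
saddle = maximin == minimax
```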
M. L. Littman, “Markov games as a framework for multi-agent reinforcement learning,” in Proc. Int’l Conf. Machine Learning (ICML-94), Jul. 1994.
[Figure: a mixed (general-sum) game. Two agents each choose between the left and right room, with payoff tables:]

Q1 | L2 | R2
L1 | 0  | 3
R1 | 2  | 0

Q2 | L2 | R2
L1 | 0  | 2
R1 | 3  | 0
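These tables are neither identical (cooperative) nor opposed (competitive), which is what makes the game mixed. Pure Nash equilibria can be found by checking, for each joint action, that neither agent gains by deviating unilaterally; values are transcribed from the tables above:

```python
from itertools import product

# Payoff tables; rows are L1, R1 and columns are L2, R2.
Q1 = [[0, 3], [2, 0]]
Q2 = [[0, 2], [3, 0]]

def is_nash(i, j):
    best1 = all(Q1[i][j] >= Q1[k][j] for k in range(2))  # agent 1 cannot improve
    best2 = all(Q2[i][j] >= Q2[i][k] for k in range(2))  # agent 2 cannot improve
    return best1 and best2

# Two pure equilibria: (L1, R2) with payoffs (3, 2) and (R1, L2) with (2, 3).
equilibria = [(i, j) for i, j in product(range(2), repeat=2) if is_nash(i, j)]
```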