
Learning for Agent-Based Systems - PowerPoint PPT Presentation


  1. Learning for Agent-Based Systems
     Sławomir Nowaczyk
     Computer Science Lab, Department of Automatics
     AGH University of Science and Technology, Kraków, Poland
     April 27, 2009

     Agent-Based Systems
     Agent: autonomous
     Environment: fully, partially or not observable; deterministic, stochastic or strategic actions; static or dynamic; stationary or non-stationary; episodic or sequential; discrete or continuous
     "An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to affect what it senses in the future."

     Why Agents?
     Information integration & knowledge sharing
     Coordination & cooperative problem-solving
     Autonomous mobile robots
     Believable agents & artificial life

     Types of Agents
     Reactive: systems that respond in a timely fashion to various changes in the environment
     Goal-oriented: pro-active & purposeful
     Socially communicative: able to communicate with other agents, including people
     For each possible percept sequence, an ideal rational agent should choose the action that is expected to maximise its performance measure, on the basis of the evidence provided by the percept sequence and whatever built-in knowledge the agent has.
     The agent needs a performance measure: domain- and task-specific, often non-trivial to design and/or evaluate
     Omniscient vs rational agents: limits on available perceptual history
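
     The rational-agent definition above maps whole percept sequences to actions. As a toy illustration only, a table-driven agent makes that mapping literal; the following Python sketch is not from the slides, and the percepts, actions and table contents are hypothetical:

         # A minimal sketch of a table-driven agent: it stores the full percept
         # sequence and looks the action up in a table. Impractical in general,
         # since the table grows with every possible percept sequence.

         class TableDrivenAgent:
             def __init__(self, table):
                 self.table = table            # maps percept sequences to actions
                 self.percepts = []            # perceptual history so far

             def act(self, percept):
                 self.percepts.append(percept)
                 # Fall back to a default action for sequences not in the table.
                 return self.table.get(tuple(self.percepts), "noop")

         # Hypothetical vacuum-world table.
         table = {
             ("clean",): "move",
             ("dirty",): "clean",
             ("clean", "dirty"): "clean",
         }
         agent = TableDrivenAgent(table)
         print(agent.act("clean"))   # -> move
         print(agent.act("dirty"))   # -> clean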

  2. Agent Implementation
     Architecture
         computational structures for encoding, representing and manipulating knowledge, and for producing actions in pursuit of goals
         like a specialised programming language
         often embodies a specific theory of intelligent behaviour
     Agent program
         content that is processed by the architectural computational structures
         corresponding to a particular problem domain
         reflecting a particular selection of algorithms

     Behaviour of an Agent
     while True:
         Observe_Environment()
         Update_Memory()
         Choose_Best_Action()
         Update_Memory()
         Execute_Action()
     [Diagram: an agent receives percepts from the environment through sensors and acts on the environment through actuators]

     Reflex Agent
     [Diagram: sensors determine "what the world is like now"; condition-action rules select "what action I should do now"; actuators execute it]

     Stateful Agent
     [Diagram: as the reflex agent, but with internal state maintained using knowledge of "how the world evolves" and "what my actions do"]
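
     The agent loop and the reflex-agent diagram combine naturally. Below is a minimal runnable Python sketch of a simple reflex agent; the rules and the stand-in environment are hypothetical, and the memory updates from the slide's loop are omitted because a pure reflex agent keeps no state:

         import random

         RULES = {                            # condition-action rules (hypothetical)
             "obstacle ahead": "turn",
             "goal visible": "approach",
         }

         def observe_environment():
             # Stand-in for real sensors: returns a random percept.
             return random.choice(["obstacle ahead", "goal visible", "nothing"])

         def choose_best_action(percept):
             # "What the world is like now" -> "what action I should do now"
             return RULES.get(percept, "wander")

         def execute_action(action):
             print("executing:", action)

         for _ in range(5):                   # the slide's `while True`, bounded here
             percept = observe_environment()
             execute_action(choose_best_action(percept))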

  3. Goal Based Agent
     [Diagram: as the stateful agent, but the agent also predicts "what it will be like if I do action A" and compares predicted states against its goals to decide "what action I should do now"]

     Utility Based Agent
     [Diagram: as the goal-based agent, but goals are replaced by a utility function estimating "how happy I will be in such a state"]

     Learning Agent
     [Diagram: a critic compares sensor feedback with a performance standard; the learning element uses this feedback, together with knowledge and learning goals, to make changes to the performance element; a problem generator proposes exploratory actions]

     SOAR Architecture
     [Diagram only]
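
     The goal-based and utility-based diagrams amount to one-step lookahead: predict "what it will be like if I do action A" for every action, then pick the action whose predicted state scores best. A minimal sketch, assuming a hypothetical deterministic predict function and utility function:

         # Utility-based action selection by one-step lookahead. `predict`
         # ("what my actions do") and `utility` ("how happy I will be in such
         # a state") are hypothetical; a goal-based agent would instead test
         # whether a predicted state satisfies its goals.

         def choose_action(state, actions, predict, utility):
             return max(actions, key=lambda a: utility(predict(state, a)))

         # Hypothetical 1-D world: the agent wants to stand at position 10.
         actions = [-1, 0, +1]
         predict = lambda s, a: s + a
         utility = lambda s: -abs(10 - s)

         print(choose_action(3, actions, predict, utility))   # -> 1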

  4. SOAR Architecture
     [Diagram only]

     Reinforcement Learning
     Learning from interactions with the environment
         no teacher to supply the "right answers"
     Trial-and-error search
         perform an action
         evaluate the response of the environment
     Delayed rewards
         some actions yield immediate rewards
         others simply lead to "better" states
     Some similarities to a baby at play
         cause-effect relationships
         the Markov property

     Reinforcement Learning
     Learning a mapping from situations to actions, in order to maximise a scalar reward value
     Actions are selected based on past experiences
     Exploitation
         try previously well-rewarded actions, expecting similar results
     Exploration
         try new sequences of actions; they may turn out to be even better
     Proper balancing is difficult, especially in stochastic or non-stationary environments

     Reinforcement Learning
     [Diagram: at time t the agent, given state s_t and reward r_t, performs action a_t; the environment responds with state s_{t+1} and reward r_{t+1}]
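
     The interaction diagram defines a simple protocol: the agent sees s_t and r_t, emits a_t, and the environment answers with s_{t+1} and r_{t+1}. A minimal sketch of that loop, with a hypothetical toy environment and a blind random policy:

         import random

         def env_step(state, action):
             next_state = (state + action) % 5          # hypothetical toy dynamics
             reward = 1.0 if next_state == 0 else 0.0
             return next_state, reward

         def policy(state):
             return random.choice([0, 1])               # no learning yet

         s = 0
         for t in range(10):
             a = policy(s)                              # agent emits a_t
             s, r = env_step(s, a)                      # environment returns s_{t+1}, r_{t+1}
             print(f"t={t}: a={a} -> s'={s}, r={r}")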

  5. Policy
     In situation s_t the agent chooses action a
     the world changes to s_{t+1}
     the agent perceives s_{t+1} and receives reward r_{t+1}
     Policy: π(s, a) = Pr{a_t = a | s_t = s}
         the probability that the agent will choose action a given that the current state is s

     n-armed Bandit
     n actions to choose from
         each one yields a stochastic reward
         the exact reward distribution is unknown
     maximise long-term profit

     ε-greedy Policy
     The agent can estimate the payoff of each arm, based on past action executions
         such an estimate is called a Q value
     Obvious solution: the greedy policy
         always choose the action with the highest Q value
     But this completely ignores exploration
     ε-greedy policy: choose a random action every now and then
         π(s, a*) = 1 − ε + ε/|A|   for a* = argmax_a Q(a)
         π(s, a)  = ε/|A|           for a ≠ argmax_a Q(a)

     Value Function
     The n-armed bandit environment is episodic
         the agent's actions do not change the world state
         we only care about immediate reward
     In most interesting environments, however, some states are better than others
         the agent should think in a longer perspective
     Reward function
         immediate payoff for executing action a
     Value function
         expected future reward from a given state
         the long-term perspective
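
     The ε-greedy formulas translate directly into code. A minimal sketch for the n-armed bandit, using incremental sample averages for the Q values (the averaging rule is standard practice, not spelled out on the slide), with hypothetical arm payoffs:

         import random

         n, eps = 4, 0.1
         true_means = [0.2, 0.5, 0.8, 0.4]     # unknown to the agent (hypothetical)
         Q = [0.0] * n                         # payoff estimate per arm
         counts = [0] * n

         def pull(a):
             return random.gauss(true_means[a], 1.0)    # stochastic reward

         for step in range(10000):
             if random.random() < eps:
                 a = random.randrange(n)                # explore: any arm, uniformly
             else:
                 a = max(range(n), key=lambda i: Q[i])  # exploit: greedy arm
             r = pull(a)
             counts[a] += 1
             Q[a] += (r - Q[a]) / counts[a]             # incremental sample average

         # Each estimate approaches its arm's true mean; the greedy arm is
         # chosen with probability 1 − ε + ε/|A|, matching the formula above.
         print([round(q, 2) for q in Q])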

  6. General Reinforcement Learning Algorithm
     Initialise the agent's internal state
         Q values, V values, policy π, etc.
     while not Good_Enough():
         choose action a using policy π
         execute action a
         observe immediate reward r
         observe new world state s'
         update internal state based on s, a, r, s'
     Output the resulting policy π

     Problem Specification
     A decision on what constitutes the internal state
         the representation of the agent's knowledge
     A decision on what constitutes a world state
         as complete as possible
     A means of sensing a world state
     An action-choice mechanism
         the policy: an evaluation function of the current world and internal state
     A means of executing the action
     A way of updating the internal state

     The Environment
     The definition of the environment must consist of
         a state transition function: the probability that executing action a in state s will transform the world into state s'
         a reward function: how much reward the agent gets for carrying out particular actions or ending in particular states
     This is often called a model of the environment
     If acting in the real world, the transition function is given; in a simulator, it must be programmed
     The reward function is always specified explicitly
         always make sure you measure the right thing

     Tic-Tac-Toe
     Play against an imperfect opponent
     Reward is 1 for a win, −1 for a loss or a draw, and 0 for every other move
     V(s) is the estimated probability of winning from state s
         V(s) = 1 for states with three X's in a row
         V(s) = 0 for states with three O's in a row
         V(s) = 0.5 initially for all other states
     Example position:
         X | O | O
         O | X | X
         X |   |
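
     The general loop above leaves the internal-state update unspecified. One standard instantiation (not named on the slides) is tabular Q-learning; the sketch below runs it on a hypothetical five-state chain world:

         import random
         from collections import defaultdict

         alpha, gamma, eps = 0.1, 0.9, 0.1
         Q = defaultdict(float)                   # internal state: Q values

         def env_step(s, a):
             # Hypothetical chain world: action 1 moves right, 0 moves left;
             # reaching state 4 pays off.
             s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
             return s2, (1.0 if s2 == 4 else 0.0)

         def choose(s):                           # epsilon-greedy policy π
             if random.random() < eps:
                 return random.choice([0, 1])
             return max([0, 1], key=lambda a: Q[(s, a)])

         for episode in range(500):               # "while not Good_Enough()"
             s = 0
             for _ in range(20):
                 a = choose(s)
                 s2, r = env_step(s, a)           # execute a, observe r and s'
                 best = max(Q[(s2, 0)], Q[(s2, 1)])
                 Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])  # update
                 s = s2

         print({s: round(max(Q[(s, 0)], Q[(s, 1)]), 2) for s in range(5)})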

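     For the tic-tac-toe example, the classic way to improve the V(s) estimates (this update follows Sutton and Barto's well-known treatment; the rule itself is not on the slide) is a temporal-difference step after each move, V(s) ← V(s) + α(V(s') − V(s)). A minimal sketch, with states as hypothetical hashable board encodings:

         alpha = 0.1
         V = {}                                     # state -> estimated win probability

         def value(s, terminal=None):
             if terminal is not None:               # 1 for three X's, 0 for three O's
                 V[s] = terminal
             return V.setdefault(s, 0.5)            # unseen states start at 0.5

         def td_update(s, s_next, terminal=None):
             # Move the estimate for s toward the estimate for its successor.
             target = value(s_next, terminal)
             V[s] = value(s) + alpha * (target - V[s])

         # Hypothetical fragment of one game: X plays from state "s1" to "s2",
         # then to a winning state "s3".
         td_update("s1", "s2")
         td_update("s2", "s3", terminal=1.0)
         print(V)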