TE: Discrete-Continuous
Discrete task environments
§ State of environment is discrete
§ Time of environment is discrete
§ Percepts and/or actions are discrete
• E.g. chess has discrete state, percepts, and actions
Continuous task environments
§ State of environment is continuous
§ Time of environment is continuous
§ Percepts and/or actions are continuous
• E.g. taxi driving has continuous state, time, percepts, and actions
Types of Task Environments

Task Env         Observable  Agents  Deterministic  Episodic    Static   Discrete
Crossword        Fully       Single  Deterministic  Sequential  Static   Discrete
Chess (clock)    Fully       Multi   Deterministic  Sequential  Semi     Discrete
Poker            Partially   Multi   Stochastic     Sequential  Static   Discrete
Backgammon       Fully       Multi   Stochastic     Sequential  Static   Discrete
Taxi-driving     Partially   Multi   Stochastic     Sequential  Dynamic  Continuous
Medical-diag     Partially   Single  Stochastic     Sequential  Dynamic  Continuous
Image-analysis   Fully       Single  Deterministic  Episodic    Semi     Continuous
Part-pick-robot  Partially   Single  Stochastic     Episodic    Dynamic  Continuous
Refinery ctrl    Partially   Single  Stochastic     Sequential  Dynamic  Continuous
English tutor    Partially   Multi   Stochastic     Sequential  Dynamic  Discrete
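The rows of the table can be encoded as records so the properties can be queried in code. A minimal sketch; the names and record layout are illustrative, not from the slides:

```python
# Each task environment as a tuple over the six dimensions of the table.
ENVIRONMENTS = {
    "crossword":    ("fully", "single", "deterministic", "sequential", "static", "discrete"),
    "chess-clock":  ("fully", "multi", "deterministic", "sequential", "semi", "discrete"),
    "poker":        ("partially", "multi", "stochastic", "sequential", "static", "discrete"),
    "backgammon":   ("fully", "multi", "stochastic", "sequential", "static", "discrete"),
    "taxi-driving": ("partially", "multi", "stochastic", "sequential", "dynamic", "continuous"),
}

DIMENSIONS = ("observable", "agents", "determinism", "episodic", "static", "discrete")

def properties(env):
    """Map each dimension name to the environment's value for it."""
    return dict(zip(DIMENSIONS, ENVIRONMENTS[env]))

print(properties("taxi-driving")["observable"])  # partially
```

Taxi driving sits at the hard end of every dimension (partially observable, multiagent, stochastic, sequential, dynamic, continuous), which is why it recurs as the running example below.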
Simple Reflex Agent (Episodic)
[Diagram: percepts enter through Sensors; "what the world is like now" (current internal state, background information) feeds condition-action rules, which yield "what action I should do now"; actions leave through Actuators]

function Simple-Reflex-Agent(percept) returns action
  persistent: rules, a set of condition-action rules
  state ← Interpret-Input(percept)
  rule ← Rule-Match(state, rules)
  return rule.Action
Simple Reflex Agent (Episodic)
[Diagram: Sensors → "what the world is like now" → condition-action rules → "what action I should do now" → Actuators]
• Example: the car in front is braking
• Very simple and, as a consequence, very fast
• However, one can do better by learning the front car's behavior
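The pseudocode above can be sketched for the braking example. This is a minimal sketch; the percept format and rule names are illustrative assumptions:

```python
# A simple reflex agent: maps the current percept directly to an action
# via condition-action rules, with no internal state, so it only suits
# episodic settings.

def interpret_input(percept):
    """Abstract the raw percept into a condition label (hypothetical)."""
    if percept.get("front_car_brake_lights"):
        return "car-in-front-is-braking"
    return "road-clear"

# Condition-action rules.
RULES = {
    "car-in-front-is-braking": "initiate-braking",
    "road-clear": "keep-driving",
}

def simple_reflex_agent(percept):
    state = interpret_input(percept)
    return RULES[state]

print(simple_reflex_agent({"front_car_brake_lights": True}))  # initiate-braking
```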
Model-Based Reflex Agent (Sequential)
[Diagram: Sensors → percepts; an internal model ("how the world evolves", "what my actions do") combines the previous state and action with the percept into the next state, "what the world is like now"; condition-action rules then give "what action I should do now" → Actuators]
• The most effective way to handle partial observability
• Keep track of the part of the world the agent can't see now
• The agent should maintain the previous state and action, which depend on the percept history
• Braking car: 1-2 frames suffice
Model-Based Reflex Agent (Sequential)
State-based description of an agent (white box):

function Model-Based-Reflex-Agent(percept) returns action
  persistent: state, action, model, rules
  state ← Update-State(state, action, percept, model)
  rule ← Rule-Match(state, rules)
  action ← rule.Action
  return action
Model-Based Reflex Agent (Sequential)
Input-output description of an agent (black box):

function Model-Based-Reflex-Agent(percepts) returns actions

This is what is observable and on which the performance is measured!
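The white-box pseudocode can be sketched for the braking-car example. The transition model here (detecting braking from the gap shrinking across two frames) and all names are illustrative assumptions:

```python
# A model-based reflex agent: keeps the previous state so it can detect
# braking from a change across 1-2 consecutive frames, which a stateless
# simple reflex agent cannot do from brake lights alone.

class ModelBasedReflexAgent:
    def __init__(self, rules):
        self.state = None     # persistent internal state (last gap seen)
        self.action = None    # persistent last action
        self.rules = rules    # condition-action rules

    def update_state(self, percept):
        """Hypothetical model: the car ahead is braking when the gap to
        it shrinks between two consecutive frames."""
        prev_gap = self.state
        self.state = percept["gap_to_front_car"]
        if prev_gap is not None and self.state < prev_gap:
            return "car-in-front-is-braking"
        return "road-clear"

    def __call__(self, percept):
        condition = self.update_state(percept)
        self.action = self.rules[condition]
        return self.action

agent = ModelBasedReflexAgent({"car-in-front-is-braking": "initiate-braking",
                               "road-clear": "keep-driving"})
print(agent({"gap_to_front_car": 30.0}))  # keep-driving
print(agent({"gap_to_front_car": 25.0}))  # initiate-braking
```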
Model-Based Goal-Based Agent
[Diagram: Sensors → "what the world is like now"; the model ("how the world evolves", "what my actions do") predicts "what it will be like if I do action A" (consideration of the future); goal information ("what are my goals") determines "what action I should do now" → Actuators]
• Knowing the internal state is not always enough to decide what to do
• At a road junction a car can turn left, turn right, or go straight
• The correct decision depends on where the car wants to go
• Search and planning involve consideration of the future
Model-Based Goal-Based Agent
Goal-based agent versus model-based reflex agent:
• Less efficient, but more flexible, as knowledge is explicitly represented
• Goals alone are not sufficient, as they do not consider performance
• Taxi driving: faster, cheaper, more reliable, safer
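The junction example can be sketched as a one-step lookahead: simulate each action with the transition model and pick one whose predicted state satisfies the goal. The states, actions, and transition table are made-up illustrations:

```python
# Goal-based action selection by simulating "what it will be like if I
# do action A" with the transition model ("what my actions do").

TRANSITIONS = {  # (state, action) -> predicted next state
    ("junction", "turn-left"): "west-road",
    ("junction", "turn-right"): "east-road",
    ("junction", "go-straight"): "north-road",
}

def goal_based_agent(state, goal):
    for action in ("turn-left", "turn-right", "go-straight"):
        if TRANSITIONS.get((state, action)) == goal:
            return action
    return None  # no single action reaches the goal; deeper search needed

print(goal_based_agent("junction", "east-road"))  # turn-right
```

A deeper lookahead over action sequences is exactly where search and planning come in.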
Model-Based Utility-Based Agent
[Diagram: as the goal-based agent, but a utility function ("how happy I will be in this state") internalizes the performance measure and determines "what action I should do now"]
• Goals provide only a crude binary distinction: happy vs. unhappy
• Utilities provide a more general internalization of the performance measure
• Is it that simple? Just build agents maximizing expected utility?
• Keeping track of the environment requires perception, modeling, reasoning, and learning
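Maximizing expected utility can be sketched concretely: with a stochastic transition model P(s'|s,a) and a utility over states, choose the action with the highest probability-weighted utility. The states, probabilities, and utilities below are invented for illustration:

```python
# Expected-utility action selection under a stochastic transition model.

P = {  # P(s' | s, a)
    ("wet-road", "brake"):  {"safe-stop": 0.9, "skid": 0.1},
    ("wet-road", "swerve"): {"safe-stop": 0.5, "skid": 0.5},
}
UTILITY = {"safe-stop": 10.0, "skid": -100.0}  # "how happy I will be"

def expected_utility(state, action):
    """Sum utility over successor states, weighted by P(s' | s, a)."""
    return sum(p * UTILITY[s2] for s2, p in P[(state, action)].items())

def utility_based_agent(state, actions):
    return max(actions, key=lambda a: expected_utility(state, a))

print(utility_based_agent("wet-road", ["brake", "swerve"]))  # brake
```

Unlike a binary goal, the utility trades off the outcomes (a 10% skid risk is acceptable; a 50% one is not), which is what lets the taxi weigh faster against safer.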
Summary: Rational Agent
[Diagram: the environment (plant, a CPS) emits percepts e through sensors; a filter maintains the belief b via P(b | e, a), using the models P(s' | s, a) and P(e | s) ("how the world evolves", "what my actions do"); a controller/planner applies the utility R(b) ("what utility in b?") and the policy π(b) ("what action in b?") to send actions through the actuators; "what state am I in if I see e?" is answered by the belief b, and the reference/performance standard sets the overall goal]
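The filter box above computes the belief b from the models P(s'|s,a) and P(e|s). A minimal discrete Bayes-filter sketch, with a made-up two-state world (all numbers and names are illustrative): predict with the transition model, then weight by the observation model and normalize.

```python
# One step of a discrete Bayes filter: b' = normalize(P(e|s') * sum_s P(s'|s,a) b(s))

STATES = ("clear", "blocked")
T = {("clear", "go"):   {"clear": 0.8, "blocked": 0.2},   # P(s' | s, a)
     ("blocked", "go"): {"clear": 0.3, "blocked": 0.7}}
O = {"see-clear": {"clear": 0.9, "blocked": 0.2}}          # P(e | s)

def bayes_filter(belief, action, evidence):
    # Predict: push the belief through the transition model.
    predicted = {s2: sum(belief[s] * T[(s, action)][s2] for s in STATES)
                 for s2 in STATES}
    # Update: weight by the observation likelihood and normalize.
    unnorm = {s: O[evidence][s] * predicted[s] for s in STATES}
    z = sum(unnorm.values())
    return {s: unnorm[s] / z for s in STATES}

b = bayes_filter({"clear": 0.5, "blocked": 0.5}, "go", "see-clear")
print(b)  # belief shifts toward "clear"
```

The utility R(b) and policy π(b) are then defined over this belief b rather than over the hidden state itself.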
Model-Based Utility-Based Agent
• How does one develop such agents?
• Turing: building them manually is too tedious; one should learn them
General Learning Agent
[Diagram: a critic compares percepts against a fixed performance standard and sends feedback to the learning element; the learning element exchanges knowledge and learning goals with the performance element and makes changes to it; a problem generator proposes exploratory actions; the performance element drives the actuators]
Turing proposes to build learning machines and teach them. Four components:
• Performance element: responsible for selecting external actions
• Learning element: responsible for making improvements
• Critic: tells how the performance element should be changed to do better
• Problem generator: suggests actions that will lead to new and informative experiences
Preferred method of creating agents in many areas of AI
• Advantage: allows the agent to operate in initially unknown environments
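The four components can be sketched as a skeleton. Everything here is illustrative; in particular the "learning rule" is a deliberately trivial stand-in, not a real algorithm:

```python
# Skeleton of the general learning agent: performance element, critic,
# learning element, and problem generator.

class LearningAgent:
    def __init__(self, rules, performance_standard):
        self.rules = rules                    # knowledge used by the performance element
        self.standard = performance_standard  # fixed external standard used by the critic

    def performance_element(self, percept):
        """Select an external action from current knowledge."""
        return self.rules.get(percept, "explore")

    def critic(self, percept, action):
        """Score the action against the fixed performance standard."""
        return self.standard(percept, action)

    def learning_element(self, percept, action, feedback):
        """Change the performance element when the critic's feedback is
        negative (a crude illustrative update)."""
        if feedback < 0:
            self.rules[percept] = "explore"

    def problem_generator(self):
        """Suggest an exploratory action for informative experience."""
        return "try-something-new"
```

The key structural point is that the standard is fixed and external, while the rules (and hence the performance element's behavior) change over time.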