
  1. RL LECTURE 3: LEARNING FROM INTERACTION
     - with environment
     - to achieve some goal
     * Baby playing. No teacher. Sensorimotor connection to environment.
       - Cause – effect
       - Action – consequences
       - How to achieve goals
     * Learning to drive a car, hold a conversation, etc.
       - The environment's response affects our subsequent actions
       - We find out the effects of our actions later

  2. SIMPLE LEARNING TAXONOMY
     * Supervised Learning
       - "Teacher" provides the required response to inputs. Desired behaviour known. "Costly"
     * Unsupervised Learning
       - Learner looks for patterns in the inputs. No "right" answer
     * Reinforcement Learning
       - Learner not told which actions to take, but gets reward/punishment from the environment and adjusts/learns the action to pick next time.

  3. REINFORCEMENT LEARNING
     * Learning a mapping from situations to actions in order to maximise a scalar reward/reinforcement signal
     * HOW? Try out actions to learn which produces the highest reward
       - trial-and-error search
     * Actions affect the immediate reward, the next situation, and all subsequent rewards
       - delayed effects, delayed reward
     * Situations, Actions, Goals
       - Sense situations, choose actions to achieve goals
       - Environment uncertain

  4. EXPLORATION/EXPLOITATION TRADE-OFF
     * High rewards from trying previously-well-rewarded actions – EXPLOITATION
     * BUT which actions are best? Must try ones not tried before – EXPLORATION
     * MUST DO BOTH
     * Especially if the task is stochastic, try each action many times per situation to get a reliable estimate of its reward.
     * Gradually prefer those actions that prove to lead to high reward.
     * (Doesn't arise in supervised learning)

  5. EXAMPLES
     * Animal learning to find food and avoid predators
     * Robot trying to learn how to dock with a charging station
     * Backgammon player learning to beat an opponent
     * Football team trying to find strategies to score goals
     * Infant learning to feed itself with a spoon
     * Cornet player learning to produce beautiful sounds
     * Temperature controller keeping FH warm while minimising fuel consumption

  6. FRAMEWORK

     [Figure: agent-environment loop in which the AGENT receives state/situation s_t and reward r_t and emits action a_t; the ENVIRONMENT returns r_{t+1} and s_{t+1}.]

     * The agent in situation s_t chooses action a_t
     * One tick later, in situation s_{t+1}, it gets reward r_{t+1}

     POLICY
       $\pi(s, a) = \Pr\{ a_t = a \mid s_t = s \}$
     Given that the situation at time $t$ is $s$, the policy gives the probability that the agent's action will be $a$.

     Reinforcement learning: get/find/learn the policy
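
A minimal sketch of this interaction loop in Python. The env object with reset()/step() methods and the dictionary-based stochastic policy are illustrative assumptions, not something specified on the slide:

    import random

    def sample_action(policy, state, actions):
        # Sample a with probability pi(state, a); policy is a dict keyed by (state, action).
        weights = [policy[(state, a)] for a in actions]
        return random.choices(actions, weights=weights, k=1)[0]

    def run_episode(env, policy, actions, steps=100):
        # Agent-environment loop: observe s_t, choose a_t from the policy,
        # then the environment returns s_{t+1} and r_{t+1}.
        state = env.reset()
        total_reward = 0.0
        for _ in range(steps):
            action = sample_action(policy, state, actions)
            state, reward = env.step(action)      # s_{t+1}, r_{t+1}
            total_reward += reward
        return total_reward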

  7. EXAMPLE POLICIES

     Find the coffee machine
     [Figure: grid of four rooms (1-4), start in room 1.]
     The policy specifies, for each room, which action to take and with what probability: turn left, straight on, turn right, go through the door, etc.

     Bandit problem
     10 arms; a Q table gives the Q value for each arm.
     $\epsilon$-greedy policy:
       $a_t = \arg\max_a Q_t(a)$   with probability $1 - \epsilon$
       $a_t =$ a randomly chosen arm   otherwise
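
A minimal sketch of the $\epsilon$-greedy bandit policy just described; the sample-average Q update and the specific constants are illustrative assumptions:

    import random

    N_ARMS = 10
    EPSILON = 0.1                 # assumed exploration rate
    Q = [0.0] * N_ARMS            # Q table: estimated value of each arm
    counts = [0] * N_ARMS         # pulls per arm, for the sample-average update

    def epsilon_greedy_action():
        # With probability 1 - epsilon exploit the best arm; otherwise explore.
        if random.random() < EPSILON:
            return random.randrange(N_ARMS)
        return max(range(N_ARMS), key=lambda a: Q[a])

    def update(arm, reward):
        # Incremental sample-average estimate of the arm's reward.
        counts[arm] += 1
        Q[arm] += (reward - Q[arm]) / counts[arm]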

  8. JARGON
     * Policy
       Decision on what action $a$ to do in state $s$: $\pi(s, a)$
     * Reward function
       Defines the goal, and good and bad experience for the learner
     * Value function
       Predicts reward. Estimate of total future reward
     * Model of the environment
       Maps states and actions onto states: if in state $s_t$ we take action $a_t$, it predicts $s_{t+1}$ (and sometimes the reward $r_{t+1}$). Not all agents use models.

     The reward function and the environmental model are fixed, external to the agent. The policy, value function, and estimate of the model are adjusted during learning.
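
Purely as an illustration of these four ingredients, a hypothetical set of containers tied to the coffee-machine example; every name and number here is an assumption:

    # Hypothetical containers for the four ingredients above.
    policy = {("room 1", "turn left"): 0.25}      # pi(s, a): probability of action a in state s
    value = {"room 1": 0.0}                       # V(s): estimate of total future reward from s
    model = {("room 1", "turn left"): "room 2"}   # (s_t, a_t) -> predicted s_{t+1}

    def reward_fn(state, action, next_state):
        # Reward function: fixed by the environment; it defines the goal.
        return 1.0 if next_state == "coffee machine" else 0.0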

  9. VALUE FUNCTIONS
     * How desirable is it to be in a certain state? What is its value?
       Value is (an estimate of) the expected future reward from that state
     * Value vs. reward
       Long-term vs. immediate. Want actions that lead to states of high value, not necessarily high immediate reward
     * Learn the policy via learning values: when we know the values of states we can choose to go to states of high value
       cf. GA/GP, which discover a policy directly
     * Genotypical vs. phenotypical learning? (GA/GP vs. RL)
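
As a small sketch of "choose to go to states of high value", assuming model and value lookups of the kind described on the jargon slide:

    def greedy_over_values(state, actions, model, V):
        # Pick the action whose predicted next state has the highest value V(s').
        return max(actions, key=lambda a: V[model[(state, a)]])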

  10. GENERAL RL ALGORITHM
      1. Initialise the learner's internal state (e.g. Q values, other statistics)
      2. Do for a long time:
         * Observe the current world state $s$
         * Choose action $a$ using the policy
         * Execute the action
         * Let $r$ be the immediate reward and $s'$ the new world state
         * Update the internal state based on $s$, $a$, $r$, $s'$ and the previous internal state
      3. Output a policy based on, e.g., the learnt Q values, and follow it

      We need:
      * A decision on what constitutes an internal state
      * A decision on what constitutes a world state
      * Sensing of a world state
      * An action-choice mechanism (policy), usually based on an evaluation function (of the current world and internal state)
      * A means of executing the action
      * A way of updating the internal state
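
A minimal sketch of this general loop. The slide leaves the update rule open; tabular Q-learning with an $\epsilon$-greedy policy is one common concrete choice, and env is an assumed simulator with reset()/step() methods:

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1      # assumed step size, discount, exploration rate
    Q = defaultdict(float)                     # internal state: (world state, action) -> value

    def choose_action(state, actions):
        # Action-choice mechanism (policy) based on the evaluation Q.
        if random.random() < EPSILON:
            return random.choice(actions)                      # explore
        return max(actions, key=lambda a: Q[(state, a)])       # exploit

    def learn(env, actions, steps=10000):
        state = env.reset()                                    # sense a world state
        for _ in range(steps):
            action = choose_action(state, actions)
            next_state, reward = env.step(action)              # execute; observe r and s'
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state
        # Output a policy based on the learnt Q values.
        states = {s for s, _ in Q}
        return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}

Here the Q table plays the role of the learner's internal state, and the greedy read-out at the end is the output policy of step 3.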

  11. The environment (simulator?) provides
      * Transitions between world states, i.e. a model
      * A reward function
      But of course the learner has to discover what these are while exploring the world.

  12. EXAMPLE - 0 AND X
      See Sutton and Barto, Section 1.4 and Figure 1.1.

  13. EXAMPLE
      Construct a player to play against an imperfect opponent.
      For each board state $s$, set up $V(s)$ – an estimate of the probability of winning from that state:
      * states with three X's in a row: $V = 1$
      * states with three O's in a row: $V = 0$
      * all other states: $V = 0.5$ initially
      Play many games.
      Move selection:
      * mostly pick the move leading to the state with the highest $V$
      * sometimes explore
      Value adjustment:
      * back up the value of the state reached after each non-exploratory move to the state preceding the move
      * e.g. $V(s) \leftarrow V(s) + \alpha \, [\, V(s') - V(s) \,]$
      Reduce $\alpha$ over time; the values converge to the probabilities of winning – the optimal policy.
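
A minimal sketch of this value-table scheme; the board encoding, exploration rate, and helper names are assumptions:

    import random

    ALPHA = 0.1          # step size; the slide reduces this over time
    EPSILON = 0.1        # assumed exploration rate
    V = {}               # board state (e.g. a string encoding) -> estimated P(win)

    def value(state, is_win=False, is_loss=False):
        # Look up V(state), initialising to 1 for wins, 0 for losses, 0.5 otherwise.
        if state not in V:
            V[state] = 1.0 if is_win else (0.0 if is_loss else 0.5)
        return V[state]

    def select_move(state, candidate_states):
        # Mostly greedy on V; sometimes explore.
        if random.random() < EPSILON:
            return random.choice(candidate_states)     # exploratory move (not backed up)
        return max(candidate_states, key=value)

    def back_up(prev_state, next_state):
        # V(s) <- V(s) + alpha * (V(s') - V(s)), applied after non-exploratory moves.
        V[prev_state] = value(prev_state) + ALPHA * (value(next_state) - value(prev_state))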
