Slide 1: Class #21: Markov Decision Processes as Models for Learning
Machine Learning (COMP 135): M. Allen, 18 Nov. 2019

Slide 2: What Do We Want AI and ML to Do?
- Short answer: Lots of things!
- Intelligent robot and vehicle navigation
- Better web search
- Automated personal assistants
- Scheduling for delivery vehicles, air traffic control, industrial processes, ...
- Simulated agents in video games
- Automated translation systems

Slide 3: What Do We Need?
- AI systems must be able to handle complex, uncertain worlds, and come up with plans that are useful to us over extended periods of time
- Uncertainty: requires something like probability theory
- Value-based planning: we want to maximize expected utility over time, as in decision theory
- Planning over time: we need some sort of temporal model of how the world can change as we go about our business

Slide 4: Markov Decision Processes
- Markov Decision Processes (MDPs) combine various ideas from probability theory and decision theory
- A useful model for doing full planning, and for representing environments where agents can learn what to do
- Basic idea: a world made up of states, changing based on the actions of an AI agent, who is trying to maximize its long-term reward as it does so
- One technical detail: change happens probabilistically (under the Markov assumption)
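The Markov assumption mentioned on slide 4 can be stated formally; this statement is an addition for clarity, not from the slides. The distribution over the next state depends only on the current state and action, not on any earlier history:

    P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, ..., s_0, a_0) = P(s_{t+1} | s_t, a_t)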

Slide 5: Formal Definition of an MDP
- An MDP has several components: M = <S, A, P, R, T>
  1. S = a set of states of the world
  2. A = a set of actions an agent can take
  3. P = a state-transition function: P(s, a, s') is the probability of ending up in state s' if you start in state s and take action a: P(s' | s, a)
  4. R = a reward function: R(s, a, s') is the one-step reward you get if you go from state s to state s' after taking action a
  5. T = a time horizon (how many steps): we assume that every state transition, following a single action, takes a single unit of time

Slide 6: An Example: Maze Navigation
- Suppose we have a robot in a maze, looking for the exit
- The robot can see where it currently is, and where the surrounding walls are, but doesn't know anything else
- We would like it to be able to learn the shortest route out of the maze, no matter where it starts
- How can we formulate this problem as an MDP?

Slide 7: MDP for the Maze Problem
[Figure: a small grid maze with states s1-s4]
- States: each state is simply the robot's current location (imagine the map is a grid), including nearby walls
- Actions: the robot can move in one of the four directions (UP, DOWN, LEFT, RIGHT)

Slide 8: Action Transitions
- We can use the transition function to represent important features of the maze problem domain
- For instance, the robot cannot move through walls
- For example, if the robot starts in the corner (s1) and tries to go DOWN, nothing happens: P(s1, DOWN, s1) = 1.0
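To make the definition concrete, here is a minimal Python sketch of how the maze MDP's components might be written down, using the wall example from slide 8. The four-state grid, the dictionary layout, and the function names are illustrative assumptions, not something given in the slides.

    # A rough sketch of the maze MDP components M = <S, A, P, R, T>.
    # The states s1..s4 and the wall placement are assumptions based on the
    # small grid pictured on slides 7-8.
    S = ["s1", "s2", "s3", "s4"]            # states: grid locations
    A = ["UP", "DOWN", "LEFT", "RIGHT"]     # actions
    T = 20                                  # time horizon (steps), chosen arbitrarily here

    # State-transition function: P[s][a] maps each possible next state s'
    # to its probability, i.e. P(s' | s, a).
    P = {
        "s1": {
            # From slide 8: a wall blocks DOWN from the corner state s1,
            # so the robot stays put with probability 1.
            "DOWN": {"s1": 1.0},
            # ... entries for the other actions and states would go here
        },
    }

    def transition_prob(s, a, s_next):
        """Return P(s' | s, a), treating unlisted outcomes as probability 0."""
        return P.get(s, {}).get(a, {}).get(s_next, 0.0)

    print(transition_prob("s1", "DOWN", "s1"))   # prints 1.0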

Slides 9-11: Action Transitions, II
- Similarly, we can model uncertain action outcomes using the transition model
- Suppose the robot is a little unstable, and occasionally goes in the wrong direction
- Thus, if it starts in state s1 and tries to go UP to s2:
  1. 80% of the time it works: P(s1, UP, s2) = 0.8
  2. But it may slip and miss: P(s1, UP, s3) = 0.2

Slide 12: Rewards in the Maze
[Figure: the maze near the exit, with goal state G reachable from s1 (DOWN), s2 (LEFT), and s3 (UP)]
- If G is our goal (exit) state, we can "encourage" the robot by giving any action that gets to G a positive reward:
  R(s1, DOWN, G) = +100
  R(s2, LEFT, G) = +100
  R(s3, UP, G) = +100
- Further, we can reward quicker solutions by making all other movements have negative reward, e.g.:
  R(s1, RIGHT, s') = -1
  R(s2, UP, s') = -1
  etc.
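Continuing the hypothetical Python sketch from above, the slip probabilities from slides 9-11 and the reward scheme from slide 12 could be encoded as follows. The goal-state name G and the simple reward rule are assumptions chosen to match the numbers on the slides.

    import random

    # Uncertain outcomes (slides 9-11): trying UP from s1 succeeds 80% of the
    # time and slips to s3 the other 20%. This extends the earlier sketch of P.
    P = {
        "s1": {
            "UP":   {"s2": 0.8, "s3": 0.2},   # intended move vs. slip
            "DOWN": {"s1": 1.0},              # blocked by a wall (slide 8)
        },
        # ... the remaining states and actions would be filled in similarly
    }

    GOAL = "G"   # assumed name for the exit state

    def reward(s, a, s_next):
        """One-step reward R(s, a, s'): +100 for reaching the goal, -1 otherwise."""
        return 100.0 if s_next == GOAL else -1.0

    def step(s, a):
        """Sample one transition and return (next_state, one-step reward)."""
        outcomes = P[s][a]
        s_next = random.choices(list(outcomes), weights=list(outcomes.values()))[0]
        return s_next, reward(s, a, s_next)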

Slide 13: Solving the Maze
[Figure: a larger maze with states s1-s8 and goal state G]
- A solution to our problem takes the form of a policy of action, π
- At each state, it tells the agent the best thing to do:
  π(s1) = DOWN
  π(s2) = LEFT
  Similarly for all other states...

Slide 14: Planning and Learning
- How do we find policies?
- If we know the entire problem, we plan
  - e.g., if we already know the whole maze, and know all the MDP dynamics, we can solve it to find the best policy of action (even if we have to take into account the probability that some movements fail some of the time)
- If we don't know it all ahead of time, we learn
  - Reinforcement Learning: use the positive and negative feedback from the one-step reward in an MDP to figure out a policy that gives us long-term value

Slide 15: Maximizing Expected Return
- If we are solving a planning problem like an MDP, we want our plan to give us maximum expected reward over time
- In a finite-time problem, the total reward we get at some time-step t is just the sum of future rewards (up to our time limit T):
  R_t = r_{t+1} + r_{t+2} + ... + r_T
- The optimal policy would make this sum as large as possible, taking into account any probabilistic outcomes (e.g., robot moves that go the wrong way by accident)

Slide 16: The Infinite (Indefinite) Case
- Unfortunately, this simple idea doesn't really work for problems with indefinite time horizons
- In such problems, our agent can keep on acting, and we have no known upper bound on how long this may continue
- In such cases, we treat the upper bound as if it is infinite: T = ∞
- If the time horizon T is infinite, then the sum of rewards R_t = r_{t+1} + r_{t+2} + ... + r_T can be infinitely large (or infinitely small), too!
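As an illustration of what "maximum expected reward over time" means in the finite-horizon case, the sketch below estimates the expected return R_t = r_{t+1} + ... + r_T for a fixed policy by Monte Carlo simulation, reusing the hypothetical step() function and GOAL name from the earlier sketch. The policy entries and episode count are placeholders; this is not an algorithm from the slides.

    # Estimate the expected finite-horizon return of a fixed policy by
    # simulating many episodes with the (partially specified) model above.
    policy = {"s1": "DOWN", "s2": "LEFT", "s3": "UP", "s4": "LEFT"}   # pi(s) -> action

    def rollout_return(start, policy, horizon):
        """Sum of one-step rewards over at most `horizon` steps, stopping at the goal."""
        s, total = start, 0.0
        for _ in range(horizon):
            s_next, r = step(s, policy[s])   # requires P to cover every (s, a) visited
            total += r
            if s_next == GOAL:
                break
            s = s_next
        return total

    def expected_return(start, policy, horizon, n_episodes=10_000):
        """Monte Carlo estimate of the expected return from `start`."""
        total = sum(rollout_return(start, policy, horizon) for _ in range(n_episodes))
        return total / n_episodes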
