AI Basics Heechul Yun Acknowledgement: Many slides are adapted from Berkeley’s CS188 AI slide deck.
Markov Decision Process (MDP) • In MDP, “Markov” means action outcomes depend only on the current state Andrey Markov (1856-1922)
Markov Decision Process (MDP) • Markov decision process – Set of states S – Start state s0 – Set of actions A – Transitions P(s’|s,a) (or T(s,a,s’)) – Rewards R(s,a,s’) (and discount γ) • Policy – Choice of action for each state • Utility – Sum of (discounted) rewards
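The components above are enough to solve a small MDP exactly. The sketch below runs value iteration on a hypothetical two-state "racing" MDP (states, actions, transition probabilities, and rewards are illustrative choices, not from the slides) and extracts the greedy policy:

```python
# Value iteration on a toy two-state MDP (hypothetical example).
# States: "cool", "warm"; actions: "slow", "fast".
# T[(s, a)] is a list of (probability, next_state, reward) triples,
# encoding P(s'|s,a) and R(s,a,s') together.
GAMMA = 0.9  # discount factor

T = {
    ("cool", "slow"): [(1.0, "cool", 1.0)],
    ("cool", "fast"): [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    ("warm", "slow"): [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
    ("warm", "fast"): [(1.0, "warm", -10.0)],
}
states = ["cool", "warm"]
actions = ["slow", "fast"]

# Repeated Bellman updates: V(s) = max_a sum_s' P(s'|s,a)[R + gamma V(s')]
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {
        s: max(
            sum(p * (r + GAMMA * V[s2]) for p, s2, r in T[(s, a)])
            for a in actions
        )
        for s in states
    }

# Greedy policy: pick the action with the best one-step lookahead value.
policy = {
    s: max(
        actions,
        key=lambda a: sum(p * (r + GAMMA * V[s2]) for p, s2, r in T[(s, a)]),
    )
    for s in states
}
print(policy)  # {'cool': 'fast', 'warm': 'slow'}
```

Driving fast is worth the risk only while cool: the optimal policy trades the higher reward against the chance of getting stuck in the low-value warm state.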
Q-Learning • Q-Learning: sample-based Q-value iteration • Learn Q(s,a) values as you go – Receive a sample (s,a,s’,r) – Consider your old estimate: Q(s,a) – Consider your new sample estimate: sample = r + γ max_{a’} Q(s’,a’) – Incorporate the new estimate into a running average: Q(s,a) ← (1−α) Q(s,a) + α · sample
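The update rule above fits in a few lines of code. A minimal sketch of the tabular update (the state/action names in the usage example are illustrative):

```python
from collections import defaultdict

ALPHA = 0.5   # learning rate: weight of the new sample in the running average
GAMMA = 0.9   # discount factor

Q = defaultdict(float)  # Q[(s, a)] table, all entries start at 0

def q_update(s, a, s2, r, actions):
    """Incorporate one sample (s, a, s', r) into the running average."""
    # New sample estimate: immediate reward plus discounted best next Q-value.
    sample = r + GAMMA * max(Q[(s2, a2)] for a2 in actions)
    # Running average of old estimate and new sample.
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * sample

# One sample with reward 1.0 moves Q from 0 halfway toward the sample value.
q_update("s0", "a0", "s1", 1.0, ["a0", "a1"])
print(Q[("s0", "a0")])  # 0.5
```

Note that the update uses max over next actions regardless of which action is actually taken next; this is what makes Q-learning an off-policy method.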
Q-learning • Problems – The Q(s,a) table grows exponentially with the number of state variables – Cannot deal with complex, high-dimensional problems
A Neuron Image credit: Lex Fridman, MIT
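An artificial neuron computes a weighted sum of its inputs plus a bias, then applies a nonlinear activation. A minimal sketch with a sigmoid activation (the input values in the example are arbitrary):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# With zero weights and zero bias, the sigmoid of 0 is exactly 0.5.
print(neuron([1.0, 2.0], [0.0, 0.0], 0.0))  # 0.5
```

A neural network layer is just many such neurons sharing the same inputs; stacking layers gives the networks on the next slide.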
Neural Network • Regular neural network • Convolutional neural network
DQN • Q function is estimated with a neural network • Plus a few “tricks” – Experience replay to improve robustness
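Experience replay stores past transitions and trains the network on random minibatches instead of consecutive samples, which breaks the correlation between successive experiences. A minimal sketch of such a buffer (the class name and capacity are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (s, a, r, s', done) transitions.
    Sampling random minibatches decorrelates training data,
    improving the robustness of DQN training."""

    def __init__(self, capacity):
        # deque with maxlen: oldest transitions are dropped automatically
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # uniform random minibatch without replacement
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a DQN training loop, each environment step pushes one transition, and each gradient step samples a minibatch from the buffer to fit the Q-network.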
DQN Implementations • Plenty on the web
OpenAI Gym https://gym.openai.com/
OpenAI Gym • Easy record & replay [link] https://gym.openai.com/
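Gym environments share a common reset()/step() interface, so the agent-environment loop looks the same for every task. The sketch below uses a hypothetical stand-in environment (so it runs without Gym installed); with Gym, `env = gym.make("CartPole-v0")` would slot into the same loop:

```python
import random

class ToyEnv:
    """Hypothetical stand-in for a Gym environment: the episode
    simply ends after 10 steps, each worth reward 1.0. Real Gym
    environments expose the same reset()/step() interface."""
    action_space = [0, 1]

    def reset(self):
        self.t = 0
        return self.t  # initial observation

    def step(self, action):
        self.t += 1
        obs, reward, done = self.t, 1.0, self.t >= 10
        return obs, reward, done, {}  # info dict, as in Gym

env = ToyEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:  # the standard agent-environment interaction loop
    action = random.choice(env.action_space)  # random policy for illustration
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)  # 10.0
```

Replacing the random action choice with an argmax over learned Q-values turns this loop into the evaluation loop for Q-learning or DQN.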
OpenAI Universe https://universe.openai.com/
Udacity Self-Driving Car Simulator [link]
Resources • OpenAI – https://gym.openai.com/ – https://universe.openai.com/ • Udacity self-driving car simulator – https://github.com/udacity/self-driving-car-sim • AI lectures for autonomous systems – http://ai.berkeley.edu/ – http://rll.berkeley.edu/deeprlcourse/ – http://selfdrivingcars.mit.edu/ – David Silver, Tutorial: Deep Reinforcement Learning