Approximate Q-Learning Update

  1. Approximate Q-Learning Update
     Initialize the weight for each feature to 0. Every time we take an action, perform this update:
     $w_i \leftarrow w_i + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] f_i(s, a)$
     The Q-value estimate for (s, a) is the weighted sum of its features:
     $Q(s, a) = \sum_{i=1}^{n} f_i(s, a)\, w_i$
     (A Python sketch of this update appears after this list.)

  2. AQL Update Details
     • The weighted sum of features is equivalent to a dot product between the feature and weight vectors:
       $Q(s, a) = \sum_{i=1}^{n} f_i(s, a)\, w_i = \vec{w} \cdot \vec{f}(s, a)$
     • The correction term is the same for all features.
     • The correction to each weight is scaled by how “active” its feature was.

  3. Exercise: Feature Q-Update
     Reward for eating food: +10     Reward for losing: -500
     Discount: 0.95     Learning rate: 0.3
     • Suppose PacMan is considering the up action.
     • Old weights: w_bias = 1, w_ghosts = -20, w_food = 2, w_eats = 4
     • Features: closest-food, bias, eats-food, #-of-ghosts-1-step-away
     (A worked setup for this exercise, with placeholder feature values, appears after this list.)

  4. Notes on Approximate Q-Learning
     • Learns weights for a tiny number of features.
     • Every feature’s weight is updated every step.
     • No longer tracking values for individual (s, a) pairs.
     • (s, a) value estimates are calculated from features.
     • The weight update is a form of gradient descent. (A short derivation of this view appears after this list.)
     • We’ve seen this before.
     • We’re performing a variant of linear regression.
     • Feature extraction is a type of basis change.
     • We’ll see these again.
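
A minimal Python sketch of the update from slides 1 and 2, assuming features and weights are represented as dictionaries keyed by feature name. The dictionary representation and the function names (q_value, aql_update) are illustrative choices, not something specified in the slides.

from typing import Dict

Features = Dict[str, float]  # feature name -> value; weight vectors use the same shape


def q_value(weights: Features, features: Features) -> float:
    """Q(s, a) = sum_i f_i(s, a) * w_i, i.e. the dot product w . f(s, a)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def aql_update(weights: Features, features_sa: Features, reward: float,
               max_next_q: float, alpha: float, gamma: float) -> Features:
    """One approximate Q-learning weight update.

    The correction (TD error) is the same for every feature; each weight
    moves in proportion to how active its feature was in (s, a).
    Missing weights default to 0, matching "initialize weight for each feature to 0".
    """
    correction = reward + gamma * max_next_q - q_value(weights, features_sa)
    new_weights = dict(weights)
    for name, value in features_sa.items():
        new_weights[name] = new_weights.get(name, 0.0) + alpha * correction * value
    return new_weights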
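
The exercise on slide 3 gives the rewards, discount, learning rate, and old weights, but the feature values for the up action and the value of the successor state are not in the transcript. The feature values and max_next_q below are placeholder assumptions, and the pairing of each weight with a feature name follows the obvious naming; the printed result shows how the update is applied but is not the answer to the exercise.

# Quantities taken from slide 3.
old_weights = {"bias": 1.0, "#-of-ghosts-1-step-away": -20.0,
               "closest-food": 2.0, "eats-food": 4.0}
alpha, gamma = 0.3, 0.95

# Placeholder assumptions (not from the slide).
features_up = {"bias": 1.0, "#-of-ghosts-1-step-away": 1.0,
               "closest-food": 0.5, "eats-food": 1.0}
reward = 10.0        # PacMan eats food on this step
max_next_q = 0.0     # assumed estimate of max_a' Q(s', a')

q_up = sum(old_weights[name] * value for name, value in features_up.items())
correction = reward + gamma * max_next_q - q_up
new_weights = {name: old_weights[name] + alpha * correction * value
               for name, value in features_up.items()}
print(f"Q(s, up) = {q_up}, correction = {correction}")
print(new_weights)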
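
The gradient-descent reading in slide 4 can be made explicit with a short derivation. This is the standard linear-regression view rather than text taken from the slides: hold the target $y = r + \gamma \max_{a'} Q(s', a')$ fixed and treat $Q_w(s, a) = \vec{w} \cdot \vec{f}(s, a)$ as a linear prediction of it.

\[
  E(\vec{w}) = \tfrac{1}{2}\bigl(y - \vec{w} \cdot \vec{f}(s, a)\bigr)^{2},
  \qquad
  \frac{\partial E}{\partial w_i} = -\bigl(y - Q_w(s, a)\bigr)\, f_i(s, a)
\]
\[
  w_i \leftarrow w_i - \alpha \frac{\partial E}{\partial w_i}
      = w_i + \alpha \bigl(y - Q_w(s, a)\bigr)\, f_i(s, a)
\]

One gradient step on this per-sample squared error with step size $\alpha$ is exactly the weight update from slide 1, which is why the slides describe it as a variant of linear regression.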
