A Direct Policy-Search Algorithm for Relational Reinforcement Learning

Samuel Sarjant, Bernhard Pfahringer, Kurt Driessens, Tony Smith
Department of Computer Science, University of Waikato, New Zealand
29th August, 2013

Outline: Introduction · CERRLA · Evaluation · Conclusion and Remarks
Introduction

◮ Relational Reinforcement Learning (RRL) is a representational generalisation of Reinforcement Learning.
◮ A policy selects actions from state observations in order to maximise reward.
◮ Value-based RRL is affected by the number of states and may require predefined abstractions or expert guidance.
◮ Direct policy-search only needs to encode the ideal action: hypothesis-driven learning.
◮ We use the Cross-Entropy Method (CEM) to learn policies.
Cross-Entropy Method

◮ In broad terms, the Cross-Entropy Method consists of these phases:
  ◮ Generate samples x(1), ..., x(n) from a generator and evaluate them: f(x(1)), ..., f(x(n)).
  ◮ Alter the generator so that it is more likely to produce the highest-valued samples again.
  ◮ Repeat until converged.
◮ Starts no worse than random search, then improves iteratively.
◮ Multiple generators can be combined to produce combinatorial samples.
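To make the three phases concrete, here is a minimal, self-contained sketch of the generic CEM on a toy one-dimensional objective; the Gaussian generator, the objective f, and all parameter values are illustrative assumptions and not part of CERRLA itself.

```python
import random
import statistics

def f(x):
    # Toy objective: higher is better, maximised at x = 3.
    return -(x - 3.0) ** 2

mu, sigma = 0.0, 5.0          # parameters of the sample generator
n_samples, n_elite = 50, 5    # samples per iteration, size of the elite subset

for iteration in range(30):
    # 1. Generate samples from the generator and evaluate them.
    samples = [random.gauss(mu, sigma) for _ in range(n_samples)]
    ranked = sorted(samples, key=f, reverse=True)

    # 2. Alter the generator so that the highest-valued samples
    #    become more likely to be produced again.
    elites = ranked[:n_elite]
    mu = statistics.mean(elites)
    sigma = statistics.stdev(elites) + 1e-6

    # 3. Repeat until the generator has effectively converged.
    if sigma < 1e-3:
        break

print(round(mu, 2))   # converges to roughly 3.0
```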
CERRLA

◮ The Cross-Entropy Relational Reinforcement Learning Agent (CERRLA) applies the CEM to RRL.
◮ The CEM generator consists of multiple distributions of condition-action rules.
◮ A sample is a decision list (policy) of rules, for example:
    clear(A), clear(B), block(A) → move(A, B)
    above(X, B), clear(X), floor(Y) → move(X, Y)
    above(X, A), clear(X), floor(Y) → move(X, Y)
◮ The generator is altered to produce the rules used in the highest-valued policies more often.
◮ Two parts to CERRLA: Rule Discovery and Probability Optimisation.
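As an illustration of how a sampled decision list selects actions, the hypothetical sketch below tries the rules in order and fires the first one whose conditions hold in the current state; the set-of-tuples state encoding, the hand-written matchers, and treating the floor as a single constant are simplifying assumptions.

```python
def rule_move_a_onto_b(state):
    # clear(a), clear(b), block(a) -> move(a, b)
    if {("clear", "a"), ("clear", "b"), ("block", "a")} <= state:
        return ("move", "a", "b")
    return None

def rule_unstack_above_b(state):
    # above(X, b), clear(X), floor(Y) -> move(X, Y); the floor is a constant here.
    for fact in state:
        if fact[0] == "above" and fact[2] == "b" and ("clear", fact[1]) in state:
            return ("move", fact[1], "floor")
    return None

policy = [rule_move_a_onto_b, rule_unstack_above_b]   # the sampled decision list

def act(state):
    for rule in policy:
        action = rule(state)
        if action is not None:     # the first matching rule fires
            return action
    return None                    # no rule fired

state = {("block", "a"), ("block", "b"), ("block", "c"),
         ("clear", "a"), ("clear", "c"), ("above", "c", "b")}
print(act(state))                  # ('move', 'c', 'floor')
```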
Rule Discovery

◮ Rules are created by first identifying pseudo-RLGG rules for each action.
◮ Each rule can then produce more specialised rules by:
  ◮ Adding a single literal to the rule conditions.
  ◮ Replacing a variable with a goal variable.
  ◮ Splitting numerical ranges into smaller partitions.
◮ All information makes use of a lossy inverse substitution.

Example
· The RLGG for the Blocks World move action is: clear(X), clear(Y), block(X) → move(X, Y)
· Specialisations include: highest(X), floor(Y), X/A, ...
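The sketch below illustrates the first two specialisation operators (adding a literal and binding a goal variable) on the Blocks World RLGG rule; the string-based rule encoding, the candidate literals, and the naive textual substitution are illustrative assumptions.

```python
rlgg_rule = (["clear(X)", "clear(Y)", "block(X)"], "move(X, Y)")
candidate_literals = ["highest(X)", "floor(Y)"]
goal_bindings = {"X": "A"}   # e.g. replace variable X by the goal term A

def add_literal(rule, literal):
    conditions, action = rule
    return (conditions + [literal], action)

def bind_goal_variable(rule, variable, goal_term):
    # Naive textual substitution; a real implementation would substitute
    # over a parsed term representation rather than raw strings.
    conditions, action = rule
    swap = lambda s: s.replace(variable, goal_term)
    return ([swap(c) for c in conditions], swap(action))

specialisations = [add_literal(rlgg_rule, lit) for lit in candidate_literals]
specialisations += [bind_goal_variable(rlgg_rule, v, g) for v, g in goal_bindings.items()]

for conditions, action in specialisations:
    print(", ".join(conditions), "->", action)
# clear(X), clear(Y), block(X), highest(X) -> move(X, Y)
# clear(X), clear(Y), block(X), floor(Y) -> move(X, Y)
# clear(A), clear(Y), block(A) -> move(A, Y)
```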
Relative Least General Generalisation Rules*

For the moveTo action:
1. edible(g1), ghost(g1), distance(g1, 5), thing(g1) → moveTo(g1, 5)
2. edible(g2), ghost(g2), distance(g2, 8), thing(g2) → moveTo(g2, 8)
RLGG₁,₂: edible(X), ghost(X), distance(X, (5.0 ≤ D ≤ 8.0)), thing(X) → moveTo(X, D)
3. distance(d3, 14), dot(d3), thing(d3) → moveTo(d3, 14)
RLGG₁,₂,₃: distance(X, (5.0 ≤ D ≤ 14.0)), thing(X) → moveTo(X, D)
  (edible(X) and ghost(X) are dropped because example 3 does not contain them; the numeric range widens to cover 14.0)

* Closer to LGG, as background knowledge is explicitly known.
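A simplified sketch of the incremental pseudo-RLGG step: conditions not shared by the next example are dropped, and numeric arguments are merged into ranges. The dictionary encoding of examples (predicate name mapped to an optional numeric argument) is an assumption made for brevity.

```python
def generalise(rlgg, example):
    """Merge a new ground example into the current generalised rule.

    Both arguments map predicate names to either None (no numeric argument)
    or a single number / (low, high) range for the numeric argument.
    """
    merged = {}
    for predicate, value in rlgg.items():
        if predicate not in example:
            continue                          # condition not shared: drop it
        if value is None:
            merged[predicate] = None          # purely relational condition
        else:
            lo, hi = value if isinstance(value, tuple) else (value, value)
            v = example[predicate]
            merged[predicate] = (min(lo, v), max(hi, v))
    return merged

ex1 = {"edible": None, "ghost": None, "distance": 5.0, "thing": None}
ex2 = {"edible": None, "ghost": None, "distance": 8.0, "thing": None}
ex3 = {"dot": None, "distance": 14.0, "thing": None}

rlgg = generalise(ex1, ex2)
print(rlgg)   # {'edible': None, 'ghost': None, 'distance': (5.0, 8.0), 'thing': None}
rlgg = generalise(rlgg, ex3)
print(rlgg)   # {'distance': (5.0, 14.0), 'thing': None}
```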
Simplification Rules

◮ Simplification rules are also inferred from the environment.
◮ They are used to remove redundant conditions and to identify illegal combinations.
◮ They use the same RLGG process, but only over state facts.
◮ The set of untrue conditions for a state (in variable form) can also be inferred, allowing negated terms to be used in simplification rules.

Example
· When on(X, Y) is true, above(X, Y) is true
· on(X, Y) ⇒ above(X, Y)
· block(X) ⇔ not(floor(X))
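The following sketch shows how such simplification rules might be applied to a rule's condition list: implied conditions are removed as redundant, and mutually exclusive pairs mark the combination as illegal. Predicate arguments are ignored for brevity; the encodings are illustrative assumptions.

```python
implications = {"on": {"above"}}               # on(X, Y) => above(X, Y)
mutually_exclusive = [{"block", "floor"}]      # block(X) <=> not(floor(X))

def simplify(conditions):
    conds = set(conditions)
    # Remove conditions that are implied by another condition in the rule.
    for c in list(conds):
        conds -= implications.get(c, set())
    # Detect illegal combinations of mutually exclusive conditions.
    for pair in mutually_exclusive:
        if pair <= conds:
            return None                        # illegal rule: discard it
    return conds

print(sorted(simplify(["on", "above", "clear"])))   # ['clear', 'on']
print(simplify(["block", "floor", "clear"]))        # None
```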
Initial Rule Distributions

◮ Initial rule distributions consist of the RLGG rules and all immediate specialisations, e.g. for moveTo:
    RLGG → moveTo(X)
    RLGG + edible(X) → moveTo(X)
    RLGG + blinking(X) → moveTo(X)
    RLGG + ghost(X) → moveTo(X)
    RLGG + ¬edible(X) → moveTo(X)
    RLGG + dot(X) → moveTo(X)
Probability Optimisation

◮ A policy consists of multiple rules.
◮ Each rule comes from a separate distribution.
◮ Rule usage and position are determined by CEM-controlled probabilities.
◮ Each policy is tested three times.

    Distribution A    Distribution B    Distribution C
    a1: 0.6           b1: 0.33          c1: 0.7
    a2: 0.2           b2: 0.33          c2: 0.05
    a3: 0.15          b3: 0.33          c3: 0.05
    p(D_A) = 1.0      p(D_B) = 0.5      p(D_C) = 0.3
    q(D_A) = 0.0      q(D_B) = 0.5      q(D_C) = 0.8

    Example policy: a1, b3, c1
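A sketch of one plausible way to sample a policy from the distributions above: each distribution contributes a rule with probability p(D), the rule is drawn according to the distribution's internal rule probabilities, and the chosen rules are ordered by q(D) (smaller values appear earlier in the decision list). The exact sampling scheme used by CERRLA may differ; this is an assumption for illustration.

```python
import random

distributions = {
    "A": {"rules": {"a1": 0.6, "a2": 0.2, "a3": 0.15}, "p": 1.0, "q": 0.0},
    "B": {"rules": {"b1": 0.33, "b2": 0.33, "b3": 0.33}, "p": 0.5, "q": 0.5},
    "C": {"rules": {"c1": 0.7, "c2": 0.05, "c3": 0.05}, "p": 0.3, "q": 0.8},
}

def sample_policy(distributions):
    chosen = []
    for name, d in distributions.items():
        if random.random() < d["p"]:                      # use this distribution?
            rules, weights = zip(*d["rules"].items())
            rule = random.choices(rules, weights=weights)[0]
            chosen.append((d["q"], rule))
    chosen.sort(key=lambda pair: pair[0])                 # order by position q(D)
    return [rule for _, rule in chosen]

print(sample_policy(distributions))   # e.g. ['a1', 'b3', 'c1']
```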
Updating Probabilities

◮ A subset of samples makes up the floating elite samples.
◮ The observed distribution is the distribution of rules in the elites:
  ◮ Observed rule probability equals the frequency of the rule in the elites.
  ◮ Observed p(D) equals the proportion of elite policies using D.
  ◮ Observed q(D) equals the average relative position in [0, 1].
◮ Probabilities are updated in a stepwise fashion towards the observed distribution:

    pᵢ ← α · p′ᵢ + (1 − α) · pᵢ
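A small worked example of the stepwise update above, applied to a single rule distribution; the step size α = 0.6 and the observed frequencies are illustrative assumptions.

```python
alpha = 0.6
current = {"a1": 0.6, "a2": 0.2, "a3": 0.2}      # current rule probabilities
observed = {"a1": 0.8, "a2": 0.2, "a3": 0.0}     # frequencies in the elite policies

# p_i <- alpha * p'_i + (1 - alpha) * p_i
updated = {rule: alpha * observed[rule] + (1 - alpha) * current[rule]
           for rule in current}

print({rule: round(value, 3) for rule, value in updated.items()})
# {'a1': 0.72, 'a2': 0.2, 'a3': 0.08}
```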
Updating Probabilities, Contd.

◮ When a rule is sufficiently probable, it branches, seeding a new candidate rule distribution.
◮ More and more specialised rules are created until further branches are not useful.
◮ Stopping condition: a seed rule cannot branch again.
◮ Convergence occurs when each distribution converges (no significant updates).
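A sketch of the branching step: once a rule's probability passes a threshold, it seeds a new candidate distribution containing its own specialisations, and a rule with no further specialisations cannot branch again (the stopping condition). The threshold value and the specialise helper are hypothetical.

```python
BRANCH_THRESHOLD = 0.5   # illustrative value, not CERRLA's actual setting

def maybe_branch(distribution, specialise):
    """Return new seed distributions for any sufficiently probable rule."""
    new_distributions = []
    for rule, probability in distribution.items():
        if probability >= BRANCH_THRESHOLD:
            children = specialise(rule)
            if children:                           # no children: cannot branch again
                uniform = 1.0 / len(children)
                new_distributions.append({child: uniform for child in children})
    return new_distributions

# Example: the rule 'RLGG + edible(X)' has become dominant and branches.
specialise = lambda rule: [rule + " + ghost(X)", rule + " + blinking(X)"]
print(maybe_branch({"RLGG + edible(X)": 0.7, "RLGG + dot(X)": 0.1}, specialise))
# [{'RLGG + edible(X) + ghost(X)': 0.5, 'RLGG + edible(X) + blinking(X)': 0.5}]
```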
Summary

    Initialise the distribution set D
    repeat
        Generate a policy π from D
        Evaluate π, receiving average reward R
        Update elite samples E with sample π and value R
        Update D using E
        Specialise rules (if D is ready)
    until D has converged