Learning Novel Policies For Tasks
Yunbo Zhang, Wenhao Yu, Greg Turk
Motivation
• Want more than one solution (i.e., novel solutions) to a problem.
• E.g., different locomotion styles for legged robots.
(Figure: three locomotion styles, labeled Style 1, Style 2, Style 3)
Key Aspects
• Novelty measurement function
  • Measures the novelty of a trajectory compared with trajectories from other policies
• Policy gradient update
  • Make sure the final gradient compromises between task and novelty
  • Task-Novelty Bisector (TNB)
Method Overview
• Define a separate novelty reward function apart from the task reward.
• Train a policy using the Task-Novelty Bisector (TNB) to balance the optimization of task and novelty.
• Update the novelty measurement function.
• Repeat. (A high-level sketch of this loop follows below.)
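A minimal Python outline of the loop above, assuming hypothetical callables `train_policy_with_tnb`, `collect_state_sequences`, and `fit_autoencoder`; this is a sketch of the procedure described on this slide, not the authors' implementation:

```python
def train_novel_policies(num_policies, train_policy_with_tnb,
                         collect_state_sequences, fit_autoencoder):
    """Outer loop: each iteration produces one policy that is novel
    with respect to all previously trained policies."""
    autoencoders = []   # novelty measurement: one autoencoder per past policy
    policies = []
    for _ in range(num_policies):
        # 1. Train a policy with TNB, trading off the task reward against the
        #    novelty reward defined by the current set of autoencoders.
        policy = train_policy_with_tnb(autoencoders)
        # 2. Update the novelty measurement: fit a new autoencoder on state
        #    sequences rolled out by the freshly trained policy.
        seqs = collect_state_sequences(policy)
        autoencoders.append(fit_autoencoder(seqs))
        policies.append(policy)
    return policies
```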
Novelty Measurement
• Use the autoencoder reconstruction error of state sequences to compute novelty.
• One autoencoder for each policy.
• For the set of autoencoders $\boldsymbol{E} = \{E_1, \dots, E_n\}$ and a state sequence $\boldsymbol{t}$, the novelty reward function is
  $x_{\mathrm{novel}} = \min_{E_i \in \boldsymbol{E}} \lVert E_i(\boldsymbol{t}) - \boldsymbol{t} \rVert$, $\quad s_{\mathrm{novel}} = -\exp(-x_{\mathrm{novel}})$
  (see the code sketch below).
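A minimal PyTorch sketch of this reward, assuming a simple MLP autoencoder over flattened state sequences; the network sizes, class name, and function name are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class StateSeqAutoencoder(nn.Module):
    """Autoencoder E_i fit on state sequences from one trained policy."""
    def __init__(self, seq_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(seq_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, seq_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def novelty_reward(state_seq, autoencoders):
    """state_seq: flattened state sequence t, shape (seq_dim,).
    autoencoders: list of trained StateSeqAutoencoder, E = {E_1, ..., E_n}."""
    if not autoencoders:
        return 0.0  # first policy: no previous policies to be novel against
    with torch.no_grad():
        # Reconstruction error against each autoencoder; take the minimum,
        # i.e. compare against the most similar previously seen behavior.
        errors = [torch.norm(E(state_seq) - state_seq) for E in autoencoders]
        x_novel = torch.min(torch.stack(errors))
    # s_novel = -exp(-x_novel): bounded in (-1, 0), approaching 0 as the
    # sequence becomes harder for all previous autoencoders to reconstruct.
    return -torch.exp(-x_novel).item()

# Toy usage: two (untrained) autoencoders, one flattened 10-step, 4-D sequence.
aes = [StateSeqAutoencoder(seq_dim=40) for _ in range(2)]
print(novelty_reward(torch.randn(40), aes))
```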
Task-Novelty Bisector (TNB)
• Compute policy gradients for the task reward and the novelty reward:
  $g_{\mathrm{task}} = \partial J_{\mathrm{task}} / \partial \theta$, $\quad g_{\mathrm{novel}} = \partial J_{\mathrm{novel}} / \partial \theta$
• Compute the final policy gradient by combining the two along their angular bisector, so the update ascends both the task and the novelty objectives (see the sketch below).
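A small NumPy sketch of the bisector idea: normalize both gradients, follow their angular bisector, and (as an assumption made here) set the magnitude to the task gradient's component along that direction. The paper's exact TNB rule, including its alternative cases, may differ from this simplified version.

```python
import numpy as np

def tnb_gradient(g_task, g_novel, eps=1e-8):
    """Combine task and novelty policy gradients along their angular bisector."""
    g_task = np.asarray(g_task, dtype=np.float64)
    g_novel = np.asarray(g_novel, dtype=np.float64)
    # Unit vectors of the two gradients.
    u_task = g_task / (np.linalg.norm(g_task) + eps)
    u_novel = g_novel / (np.linalg.norm(g_novel) + eps)
    # Direction of the angular bisector between them.
    bisector = u_task + u_novel
    bisector /= np.linalg.norm(bisector) + eps
    # Magnitude: projection of the task gradient onto the bisector
    # (one plausible choice, assumed for this sketch).
    return np.dot(g_task, bisector) * bisector

# Example: gradients pointing in different directions get averaged in angle.
print(tnb_gradient([1.0, 0.0], [0.0, 2.0]))  # ~[0.5, 0.5]
```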
Multiple Solutions
(Figure: PPO policy; labels: End-Effector, Target)
Multiple Solutions
(Figure: TNB policies)
Deceptive Reward Problems
• Our method can be further extended to solve tasks with deceptive reward signals.
• E.g., Deceptive Reacher
(Figure: Deceptive Reacher; labels: Target, End-Effector)
Deceptive Reward Problems
(Figure: TNB policies)
Thank You! Poster: Pacific Ballroom #37