Learning from a Learner

Aug 22, 2022

Learning from a Learner
Alexis Jacq (1,2), Matthieu Geist (1), Ana Paiva (2), Olivier Pietquin (1)
(1) Google Research, Brain team
(2) Instituto Superior Técnico, University of Lisbon
Goal: You want to learn an optimal behaviour by watching others learning.
[Figure: the learner's improvements from t=0 to t=20; the learner's rewards are inferred from these improvements; the observer at t=0, after training with the inferred reward.]
Applications:
- You can observe an agent that learns through RL but do not see its reward
- You can observe somebody training but have limited access to the environment
- You were able to build increasingly good policies for your task but can’t tell why
Assume the learner is optimizing a regularized objective:
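The objective itself did not survive on this slide. A common form, assuming entropy regularization as in the soft actor-critic paper cited on the next slide, is sketched below. The temperature \alpha, the discount \gamma and the entropy \mathcal{H} are assumed notation, not taken from the slide.

```latex
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t \ge 0} \gamma^{t} \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s) \ln \pi(a \mid s)
```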
The value of a state-action pair is given by the fixed point of the (regularized) Bellman equation, and one can show that the softmax of these values is an improvement of the policy.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. ICML, 2018.
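The two formulas referred to here are missing from the extracted slide. Under the assumption of entropy regularization with temperature \alpha (the setting of the cited soft actor-critic paper), they can be written as follows. The symbols \alpha, \gamma, P and \pi' are assumed notation.

```latex
% Regularized (soft) Bellman equation: Q^{\pi} is its fixed point
Q^{\pi}(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\Big[ \mathbb{E}_{a' \sim \pi(\cdot \mid s')}\big[ Q^{\pi}(s',a') - \alpha \ln \pi(a' \mid s') \big] \Big]

% Soft policy improvement: softmax of the soft Q-values
\pi'(a \mid s) = \frac{\exp\big(Q^{\pi}(s,a)/\alpha\big)}{\sum_{b} \exp\big(Q^{\pi}(s,b)/\alpha\big)}
```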
Given two consecutive policies, one can recover the reward function, up to a shaping that does not modify the optimal policy of the regularized Markov Decision Process.
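The recovery formula is not present in the extracted text. The following derivation is a sketch of how it can be obtained, assuming entropy regularization with temperature \alpha and that the second policy \pi_2 is the exact softmax improvement of the first policy \pi_1. The shaping function f is notation introduced here for illustration.

```latex
% Define the log-partition function of the softmax:
f(s) = \alpha \ln \sum_{b} \exp\big(Q^{\pi_1}(s,b)/\alpha\big)

% Since \pi_2 is the softmax of Q^{\pi_1}/\alpha:
\alpha \ln \pi_2(a \mid s) = Q^{\pi_1}(s,a) - f(s)

% Substituting this into the soft Bellman equation for Q^{\pi_1}:
Q^{\pi_1}(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s'}\Big[ f(s') - \alpha\, \mathrm{KL}\big(\pi_1(\cdot \mid s') \,\|\, \pi_2(\cdot \mid s')\big) \Big]

% Solving for the reward:
r(s,a) = \underbrace{\alpha \ln \pi_2(a \mid s) + \alpha \gamma\, \mathbb{E}_{s'}\Big[ \mathrm{KL}\big(\pi_1(\cdot \mid s') \,\|\, \pi_2(\cdot \mid s')\big) \Big]}_{\text{computable from the two policies}}
       + \underbrace{f(s) - \gamma\, \mathbb{E}_{s'}\big[ f(s') \big]}_{\text{shaping}}
```

The first bracket depends only on the two observed policies and the dynamics. The second has the form of a potential-based shaping term, which is why it leaves the optimal policy unchanged.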
Result with exact soft policy improvements in gridworld:
- Ground truth reward.
- Recovered reward function by inverting soft policy improvement.
- Knowing the reward is state-only.
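The inversion of an exact soft policy improvement can be illustrated on a small tabular problem. The sketch below is not the authors' gridworld experiment: it uses a random MDP, assumes entropy regularization with temperature `alpha`, and all function and variable names are hypothetical. It checks numerically that the recovered reward differs from the true one only by a shaping term.

```python
import numpy as np

def soft_q(pi, r, P, gamma, alpha):
    """Exact soft Q-values of policy pi (shape S x A) in a tabular MDP."""
    n_states = r.shape[0]
    entropy = -(pi * np.log(pi)).sum(axis=1)        # H(pi(.|s)) per state
    r_pi = (pi * r).sum(axis=1) + alpha * entropy   # regularized one-step reward
    P_pi = np.einsum("sa,sat->st", pi, P)           # state-to-state kernel under pi
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return r + gamma * (P @ v)                      # Q(s,a) = r + gamma * E[V(s')]

def soft_improve(q, alpha):
    """Softmax policy pi'(a|s) proportional to exp(Q(s,a) / alpha)."""
    z = (q - q.max(axis=1, keepdims=True)) / alpha  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def recover_reward(pi1, pi2, P, gamma, alpha):
    """alpha * ln pi2(a|s) + alpha * gamma * E_{s'}[KL(pi1(.|s') || pi2(.|s'))]."""
    kl = (pi1 * (np.log(pi1) - np.log(pi2))).sum(axis=1)
    return alpha * np.log(pi2) + alpha * gamma * (P @ kl)

# Hypothetical random MDP: 5 states, 3 actions (assumed sizes and constants).
rng = np.random.default_rng(0)
S, A, gamma, alpha = 5, 3, 0.9, 0.5
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)                   # P[s, a, s'] sums to 1 over s'
r_true = rng.normal(size=(S, A))
pi1 = rng.random((S, A))
pi1 /= pi1.sum(axis=1, keepdims=True)               # arbitrary initial learner policy

q1 = soft_q(pi1, r_true, P, gamma, alpha)
pi2 = soft_improve(q1, alpha)                       # one exact soft improvement
r_rec = recover_reward(pi1, pi2, P, gamma, alpha)

# The remaining difference should be the shaping f(s) - gamma * E[f(s')],
# with f(s) = alpha * logsumexp(Q(s, .) / alpha).
f = alpha * np.log(np.exp(q1 / alpha).sum(axis=1))
shaping = f[:, None] - gamma * (P @ f)
gap = np.abs(r_true - r_rec - shaping).max()
print("max deviation from pure shaping:", gap)
```

The sketch solves the soft Bellman equation with a linear solve rather than by iteration, which is only practical for small state spaces.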
Result with MuJoCo and proximal policy iterations: (Red) Evolution of the learner's score during its observed improvements. (Blue) Evolution of the observer's score when training on the same environment and using the recovered reward function.
Poster: 06:30 to 09:00 PM, Room Pacific Ballroom
