  1. Towards a Unified Framework for Learning from Observation Santiago Ontañón (IIIA-CSIC, Spain) José L. Montaña (Universidad de Cantabria, Spain) Avelino J. Gonzalez (University of Central Florida, USA)

  2. Motivation • Many disconnected approaches in the literature • Lack of a common framework to compare

  3. Outline • Learning from Observation • A Unified Framework • Levels of Difficulty of LFO • Statistical Formulation • Conclusions

  4. Outline • Learning from Observation • A Unified Framework • Levels of Difficulty of LFO • Statistical Formulation • Conclusions

  5. Learning from Observation • Learn to perform a task solely by observing the external behavior of another agent

  6. Learning from Observation • Supervised learning: learning a mapping from input variables to output variables • LFO: learning a control function (which might have internal state)

  7. Many Approaches • Can be traced back to 1979, with different names: • Learning from Observation • Learning from Demonstration • Imitation Learning • Apprenticeship Learning • Programming by Demonstration

  8. Many Approaches • Reinforcement Learning Techniques • Case-based Reasoning • Decision Trees, Neural Networks, etc. • Genetic Algorithms • Inductive Logic Programming • Cognitive Architectures (SOAR, etc.) • etc. [Argall et al. 2009] “A survey of robot learning from demonstration”

  9. Applications • Domains with complex behaviors: • Robotics • Computer games • Training and simulation • Automated programming • etc.

  10. Related Problems • Inverse Reinforcement Learning: • Given behavior (optimal policy, or trajectories), learn the reward function • Workflow reconstruction / Automata discovery

  11. Outline • Learning from Observation • A Unified Framework • Levels of Difficulty of LFO • Statistical Formulation • Conclusions

  12. Vocabulary • An environment E • An expert (or actor) C • A task T • A learning agent A • [Diagram: the expert C and the learning agent A interact with the environment E through perceptions and actions]

  13. Learning Traces • The learning agent A can only observe the interaction of the expert C with the environment E, not the internal state of C: • perceptions (the state of E as observed by A): X • actions: Y • LT = [(t1, x1, y1), ..., (tn, xn, yn)]
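
As a minimal sketch of this trace representation (all names and types here are assumptions for illustration, not from the paper), a learning trace is just a list of timestamped perception/action entries:

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class TraceEntry:
    t: float  # timestamp
    x: Any    # perception: the state of E as observed by A
    y: Any    # action taken by the expert C

# LT = [(t1, x1, y1), ..., (tn, xn, yn)]
LearningTrace = List[TraceEntry]

# a toy two-step trace (hypothetical perceptions and actions)
trace: LearningTrace = [
    TraceEntry(t=0.0, x="enemy_far", y="move_forward"),
    TraceEntry(t=0.1, x="enemy_near", y="fire"),
]
```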

  14. LFO Task • Given: • A set of learning traces LT 1 , ..., LT k • An environment E (characterized by a set of input variables X, and a set of control variables Y) • Optionally, a description of the task T • Learn: • A behavior B that “behaves like” C in achieving task T in E

  15. “Behaves like” • If no T is specified: • LFO is equivalent to learning to predict C’s actions • If T is specified: • LFO’s performance must take into account both predicting C’s actions and accomplishing T

  16. Measuring Performance • In traditional ML, performance is measured by leaving some examples out of the training set: the test set • In LFO, the test set would be a set of traces • Comparing traces is not trivial • Achievement of task T must be taken into account

  17. Measuring Performance • Evaluate performance: how well T is achieved • Evaluate output: how well the model predicts the expert’s actions (as in traditional ML) • Evaluate model: inspect the learned model (typically by human inspection)
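
A hedged sketch of the “evaluate output” option, reusing the TraceEntry representation above and assuming a learned model with a predict(x) method (an illustrative interface, not one defined in the paper):

```python
def action_prediction_accuracy(model, test_traces):
    """Fraction of held-out trace entries where the learned behavior
    predicts the same action the expert took. This covers 'evaluate
    output' only; task achievement needs a separate measure."""
    correct = total = 0
    for trace in test_traces:
        for entry in trace:
            if model.predict(entry.x) == entry.y:
                correct += 1
            total += 1
    return correct / total if total else 0.0
```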

  18. Outline • Learning from Observation • A Unified Framework • Levels of Difficulty of LFO • Statistical Formulation • Conclusions

  19. Types of LFO Problems • Not all LFO algorithms work for all LFO problems • Common differences: • Continuous/discrete variables • Observable environment or not • etc.

  20. Types of LFO Problems • LFO problems can be characterized depending on whether: • They require generalization • They require planning • A model of the environment is available

  21. Types of LFO Problems
      Generalization? | Planning? | Known Env.? | Level
      no              | no        | -           | Level 1: Strict Imitation
      yes             | no        | -           | Level 2: Reactive Behavior
      yes             | yes       | yes         | Level 3: Tactical Behavior
      yes             | yes       | no          | Level 4: Tactical Behavior in unknown environment

  22. Level 1: Strict Imitation • No feedback required from environment • No need for generalization nor planning • The learned behavior is a strict function of time • Algorithms required: pure memorization • Example: robots in factories
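
A minimal sketch of Level 1 learning, again reusing the TraceEntry representation above: the learner memorizes the expert’s action sequence and replays it as a strict function of time (the act interface is an assumption for illustration):

```python
class StrictImitator:
    """Level 1: pure memorization. The learned behavior is a strict
    function of time; perceptions are ignored at execution time."""
    def __init__(self, trace):
        # memorize only the action sequence [y1, ..., yn]
        self.actions = [entry.y for entry in trace]

    def act(self, step):
        # replay the expert's action for this time step verbatim
        return self.actions[step]
```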

  23. Level 2: Reactive Behavior • Behavior is a “perception-to-action mapping” • No need for planning • Standard (classification/regression) machine learning algorithms can be used at this level • Example: simple complete-information games like Pong or Space Invaders
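
Since a reactive behavior is a perception-to-action mapping, each trace entry becomes one supervised training example. A sketch using a scikit-learn decision tree, assuming perceptions have already been encoded as numeric feature vectors (the paper does not prescribe any particular learner):

```python
from sklearn.tree import DecisionTreeClassifier

def learn_reactive_behavior(traces):
    # every (x, y) entry of every trace is one training example
    X = [entry.x for trace in traces for entry in trace]
    Y = [entry.y for trace in traces for entry in trace]
    model = DecisionTreeClassifier()
    model.fit(X, Y)  # learns the perception-to-action mapping directly
    return model
```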

  24. Level 3: Tactical Behavior • Perception is not enough to determine behavior: • Behavior to be learned has internal state • Standard (classification/regression) machine learning algorithms cannot be used directly • Example: driving a car, or complex games (e.g. Stratego)

  25. Outline • Learning from Observation • A Unified Framework • Levels of Difficulty of LFO • Statistical Formulation • Conclusions

  26. Statistical Formulation of LFO • Behavior as a stochastic process: I = {I1, ..., In}, where Ik = (Xk, Yk) • LFO consists of estimating the probability distribution of the stochastic process: ρ(Yk | xk, ik−1, ..., i1)
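
In this view, executing a behavior means sampling an action from ρ given the current perception and the interaction history. A minimal sketch of that interface (the class and its methods are assumptions for illustration):

```python
import random

class StochasticBehavior:
    """Executes a behavior defined by rho(Yk | xk, i(k-1), ..., i1):
    a distribution over actions given the current perception and the
    interaction history so far."""
    def __init__(self, rho):
        self.rho = rho      # rho(x_k, history) -> {action: probability}
        self.history = []   # past interactions i1, ..., i(k-1)

    def act(self, x_k):
        dist = self.rho(x_k, self.history)
        actions, probs = zip(*dist.items())
        y_k = random.choices(actions, weights=probs)[0]
        self.history.append((x_k, y_k))  # record interaction ik = (xk, yk)
        return y_k
```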

  27. Level 1: Strict Imitation • Only the sequence of actions in the training trace has non-zero probability: ρ(I1 = (x1, y1), ..., In = (xn, yn)) = 1 • The learned behavior is the memorized trace BT = [(x1, y1), ..., (xn, yn)]

  28. Level 2: Reactive Behavior • A reactive behavior depends only on the current perception: ρ(Yk | xk, ik−1, ..., i1) = ρ(Yk | xk) • In this case, LFO is equivalent to the traditional supervised learning problem, and each entry in a trace is one training example
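
For discrete, hashable perceptions, ρ(Yk | xk) can also be estimated directly from the traces by counting; a sketch complementing the classifier-based sketch above:

```python
from collections import Counter, defaultdict

def estimate_reactive_rho(traces):
    """Empirical estimate of rho(Yk | xk) for discrete perceptions:
    the relative frequency of each expert action in each observed state."""
    counts = defaultdict(Counter)
    for trace in traces:
        for entry in trace:
            counts[entry.x][entry.y] += 1
    return {
        x: {y: n / sum(c.values()) for y, n in c.items()}
        for x, c in counts.items()
    }
```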

  29. Level 3: Tactical Behavior • The behavior needs some internal state (i.e. memory). Assuming only a finite amount of memory is required to learn a task: ρ(Yk | xk, ik−1, ..., i1) = ρ(Yk | xk, ik−1, ..., ik−l) • where l plays a role similar to the order of a Markov process

  30. Level 3: Tactical Behavior • Given a fixed l: • A Markov process of order l can be reduced to one of order 1 • We could then use standard supervised learning algorithms • At the cost of an explosion in the set of input features
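
A sketch of that reduction, reusing the TraceEntry representation above: the last l interactions are stacked into the input features, after which any standard supervised learner applies (at the cost of the feature blow-up the slide mentions):

```python
def windowed_examples(trace, l):
    """Reduce an order-l process to order 1: each example's input is the
    current perception plus the previous l interactions, and its label
    is the expert's action."""
    examples = []
    for k in range(l, len(trace)):
        window = tuple((e.x, e.y) for e in trace[k - l:k])  # i(k-l)..i(k-1)
        examples.append(((trace[k].x, window), trace[k].y))
    return examples
```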

  31. Outline • Learning from Observation • A Unified Framework • Levels of Difficulty of LFO • Statistical Formulation • Conclusions

  32. Conclusions • There is a large amount of existing work on LFO • Each author uses a different framework and vocabulary • Unification is needed for easy comparison of research and results

  33. Conclusions • We presented a proposal for a unified vocabulary • A classification of LFO tasks into a series of levels • Our goal was to classify the types of algorithms needed for different types of tasks

  34. Future Work • Performance evaluation methodology • Standard testbeds for comparison: • E.g. computer games?

  35. Thank you!
