

1. A Bayesian Approach to Generative Adversarial Imitation Learning
   NeurIPS 2018
   Presenter: Wonseok Jeon @ KAIST
   Joint work with Seokin Seo @ KAIST and Kee-Eung Kim @ KAIST & PROWLER.io

2. Imitation Learning
   • A Markov decision process (MDP) M = (S, A, P, γ) without a cost function
   • A policy π(a|s)

3. Imitation Learning
   • A Markov decision process (MDP) without a cost function, and a policy π(a|s)
   • Instead, there is a set of expert demonstrations D = {τ_1, …, τ_N}
   • Goal: learn a policy that mimics the expert well.

4. Generative Adversarial Imitation Learning (GAIL)
   • Use generative adversarial networks (GANs) for imitation learning:
     1. Sample trajectories using the current policy and the expert demonstrations.
     2. Train the discriminator.
     3. Update the policy using reinforcement learning (RL), e.g., TRPO or PPO.
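The three steps above can be sketched as a tiny adversarial imitation loop. This is a minimal illustration only, assuming a tabular contextual-bandit setting (episodes of length one), a per-pair logistic discriminator, and a plain REINFORCE policy update; none of these choices come from the paper.

```python
import numpy as np

# Toy setting (illustrative, not the paper's experiments): 2 states,
# 2 actions, and an "expert" that always picks action 0.
rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
expert_pairs = [(s, 0) for s in range(n_states) for _ in range(50)]

policy_logits = np.zeros((n_states, n_actions))  # softmax policy
disc_logits = np.zeros((n_states, n_actions))    # logistic discriminator

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for it in range(500):
    # Step 1: sample agent "trajectories" (here, single (s, a) pairs).
    states = rng.integers(n_states, size=50)
    probs = softmax(policy_logits)[states]
    actions = np.array([rng.choice(n_actions, p=p) for p in probs])

    # Step 2: train the discriminator, where D(s, a) = sigmoid(logit)
    # estimates the probability that (s, a) came from the expert.
    lr_d = 0.1
    for s, a in expert_pairs:
        disc_logits[s, a] += lr_d * (1.0 - sigmoid(disc_logits[s, a]))
    for s, a in zip(states, actions):
        disc_logits[s, a] -= lr_d * sigmoid(disc_logits[s, a])

    # Step 3: update the policy with a REINFORCE step, using log D(s, a)
    # as the pseudo-reward so the agent is pushed toward expert-like pairs.
    lr_p = 0.05
    for s, a in zip(states, actions):
        reward = np.log(sigmoid(disc_logits[s, a]) + 1e-8)
        grad = -softmax(policy_logits[s])
        grad[a] += 1.0
        policy_logits[s] += lr_p * reward * grad
```

After training, the policy concentrates on the expert's action in both states; the discriminator for those pairs drifts back toward 0.5, the usual adversarial equilibrium.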

5. Generative Adversarial Imitation Learning (GAIL)
   ("I don't want to move a lot…")
   • GAIL requires model-free RL inner loops, so an environment simulator is required.
   • Sample-efficiency issues: obtaining trajectory samples from the environment is often very costly, e.g., physical robots in the real world.

6. Generative Adversarial Imitation Learning (GAIL)
   • GAIL requires model-free RL inner loops, so an environment simulator is required.
   • Sample-efficiency issues: obtaining trajectory samples from the environment is often very costly, e.g., physical robots in the real world.
   • Motivation
     • At each iteration, the discriminator is updated using minibatches.
     • How about using Bayesian classification to train the discriminator?
     • This is expected to yield a more refined cost function for imitation learning!

7. Bayesian Framework for GAIL
   • Probabilistic model for trajectories
   • Each trajectory τ = (s_0, a_0, s_1, a_1, …) is a sequence of state-action pairs satisfying the Markov property:
     p(τ) = p(s_0) Π_t π(a_t|s_t) P(s_{t+1}|s_t, a_t)

8. Bayesian Framework for GAIL
   • Probabilistic model for trajectories
   • Each trajectory τ = (s_0, a_0, s_1, a_1, …) is a sequence of state-action pairs satisfying the Markov property:
     p(τ) = p(s_0) Π_t π(a_t|s_t) P(s_{t+1}|s_t, a_t)
   • Two policies: the agent's policy π and the expert's policy π_E, each inducing its own trajectory distribution.
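The Markov factorization above can be computed directly. A small sketch, assuming tabular distributions chosen purely for illustration (uniform initial state, policy, and dynamics):

```python
import numpy as np

# log p(tau) under p(tau) = p(s_0) * prod_t pi(a_t|s_t) * P(s_{t+1}|s_t, a_t).
n_states, n_actions = 3, 2
p0 = np.full(n_states, 1.0 / n_states)                        # p(s_0)
pi = np.full((n_states, n_actions), 0.5)                      # pi(a|s)
P = np.full((n_states, n_actions, n_states), 1.0 / n_states)  # P(s'|s, a)

def trajectory_log_prob(states, actions):
    """Log-probability of tau = (s_0, a_0, ..., a_{T-1}, s_T)."""
    logp = np.log(p0[states[0]])
    for t, a in enumerate(actions):
        logp += np.log(pi[states[t], a])          # policy term
        logp += np.log(P[states[t], a, states[t + 1]])  # dynamics term
    return logp

# Example: a length-2 trajectory.
val = trajectory_log_prob([0, 1, 2], [1, 0])  # = 3*log(1/3) + 2*log(1/2)
```

Swapping in the expert's policy π_E for `pi` gives the expert's trajectory distribution under the same dynamics, which is exactly the "two policies" view on this slide.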

9. Bayesian Framework for GAIL
   • Role of the discriminator
   • The discriminator models the probability that a state-action pair in a trajectory comes from the expert rather than the agent.

10. Bayesian Framework for GAIL
   • Posterior distributions
     • Posterior for the discriminator (conditioned on perfect trajectory discrimination)
     • Posterior for the policy (conditioned on preventing perfect discrimination)

11. Bayesian Framework for GAIL
   • Posterior distributions
     • Posterior for the discriminator (conditioned on perfect trajectory discrimination)
     • Posterior for the policy (conditioned on preventing perfect discrimination)
   • Note: GAIL uses maximum likelihood estimation (MLE) for both the policy and discriminator updates!

12. Bayesian GAIL: GAIL with Posterior-Predictive Cost
   • The objective is reinforcement learning with a posterior-predictive cost: the cost used for the policy update is averaged over the discriminator's posterior rather than taken from a single point-estimate discriminator.
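The posterior-predictive idea can be sketched with Monte Carlo averaging over discriminator parameters. Everything concrete here is an assumption for illustration: a linear-logistic discriminator, a Gaussian stand-in for the posterior, and the `-log D` sign convention; the paper's exact posterior and cost construction differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def posterior_predictive_cost(features, phi_mean, phi_cov, n_samples=100):
    """Monte Carlo estimate of E_{phi ~ posterior}[ -log D_phi(s, a) ],
    with an assumed linear-logistic discriminator
    D_phi(s, a) = sigmoid(phi . f(s, a))."""
    # Draw discriminator parameters from the (hypothetical) posterior.
    phis = rng.multivariate_normal(phi_mean, phi_cov, size=n_samples)
    d = sigmoid(phis @ features)  # D_phi(s, a) for each posterior sample
    return np.mean(-np.log(d + 1e-8))

# Example: a 2-d feature vector for one (s, a) pair.
f_sa = np.array([1.0, -0.5])
phi_mean, phi_cov = np.zeros(2), 0.5 * np.eye(2)
cost = posterior_predictive_cost(f_sa, phi_mean, phi_cov)
```

Compared with plugging in a single MLE discriminator, averaging over posterior samples smooths the cost where the discriminator is uncertain, which is the refinement the earlier motivation slide asks for.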

13. Bayesian GAIL: GAIL with Posterior-Predictive Cost
   • The objective is reinforcement learning with a posterior-predictive cost.
   • Learning curves for 5 MuJoCo tasks!

14. Bayesian GAIL: GAIL with Posterior-Predictive Cost
   • The objective is reinforcement learning with a posterior-predictive cost.
   • Learning curves for 5 MuJoCo tasks!
   • For more information, please come to our poster session!
     Wed, Dec 5th, 5-7 PM @ Room 210 & 230 AB, #129
   Thanks!
