Learning to Collaborate in Markov Decision Processes
Goran Radanovic, Rati Devidze, David C. Parkes, Adish Singla
Motivation: Human-AI Collaboration
• Example setting (Helper-AI): AI agent A1 collaborates with human agent A2 on a task
  – A1 commits to a policy π₁
  – A2 (best) responds to π₁
• Behavioral differences: agents have different models of the world [Dimitrakakis et al., NIPS 2017]
Motivation: Human-AI Collaboration
• Humans change/adapt their behavior over time: while A1 commits to policy π₁, A2's policy π₂ changes over time
• Can we use learning to adopt a good policy for A1, despite the changing behavior of A2, without modeling A2's learning dynamics?
Formal Model: Two-agent MDP
• Episodic two-agent MDP with commitments (a minimal protocol sketch follows below)
• From A1's perspective, rewards and transitions are non-stationary, because A2's policy changes across episodes
• Goal: design a learning algorithm for A1 that achieves sublinear regret
  – Implies near-optimality for smooth MDPs
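Below is a minimal sketch (not from the paper) of how such an episodic two-agent MDP with commitments can be simulated: A1 acts with its committed policy, A2 acts with its own (possibly changing) policy, and both influence the shared state and joint reward. All names here (TwoAgentMDP, run_episode, the array layouts) are illustrative assumptions.

```python
# Illustrative only: a tiny episodic two-agent MDP with commitments.
import numpy as np

class TwoAgentMDP:
    def __init__(self, n_states, n_actions, horizon, transition, reward, seed=0):
        self.n_states = n_states      # |S|
        self.n_actions = n_actions    # actions per agent (same size assumed for brevity)
        self.horizon = horizon        # episode length H
        self.transition = transition  # transition[s, a1, a2] = distribution over next states
        self.reward = reward          # reward[s, a1, a2] = joint (shared) reward
        self.rng = np.random.default_rng(seed)

    def run_episode(self, policy1, policy2, s0=0):
        """One episode: A1 plays its committed policy1, A2 plays its current policy2."""
        s, total = s0, 0.0
        for _ in range(self.horizon):
            a1 = self.rng.choice(self.n_actions, p=policy1[s])  # A1's committed policy
            a2 = self.rng.choice(self.n_actions, p=policy2[s])  # A2's (possibly drifting) policy
            total += self.reward[s, a1, a2]
            s = self.rng.choice(self.n_states, p=self.transition[s, a1, a2])
        return total
```

Holding policy2 fixed induces an ordinary single-agent MDP for A1; when policy2 drifts from episode to episode, A1 effectively faces the non-stationary rewards and transitions mentioned above.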
Experts with Double Recency Bias
• Based on experts in MDPs:
  – Assign an experts algorithm to each state
  – Use Q-values as the experts' losses [Even-Dar et al., NIPS 2005]
• Introduce a double recency bias (sketched in code below): the loss fed to the experts algorithm in episode t is
    q̂_t(s,a) = (1/W) · Σ_{τ = t−W}^{t−1} Γ^(t−1−τ) · q_τ(s,a)
  – Recency windowing: only the last W episodes enter the average
  – Recency modulation: older episodes within the window are discounted by Γ
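Below is a rough sketch of the two recency mechanisms, assuming a per-state exponential-weights experts learner and per-action Q-value loss vectors observed after each episode; the exact form of ExpDRBias is in the paper, and the helper names and the W, Γ, η values here are illustrative.

```python
# Illustrative only: the two recency mechanisms behind the "double recency bias".
import numpy as np

def biased_loss(past_losses, W, Gamma):
    """Average the most recent W per-action loss vectors, geometrically discounted."""
    window = past_losses[-W:]                                # recency windowing
    discounts = Gamma ** np.arange(len(window) - 1, -1, -1)  # recency modulation (newest gets Gamma^0)
    return (discounts[:, None] * np.asarray(window)).sum(axis=0) / len(window)

def exp_weights_update(weights, loss, eta):
    """Standard exponential-weights step for one state's experts learner."""
    weights = weights * np.exp(-eta * loss)
    return weights / weights.sum()

# Per-state usage in episode t (illustrative values):
#   loss_t = biased_loss(q_losses[s], W=20, Gamma=0.9)  # q_losses[s]: per-episode Q-value loss vectors
#   w[s]   = exp_weights_update(w[s], loss_t, eta=0.1)  # w[s] defines A1's randomized action at state s
```

The window length W trades off reacting to A2's drift against averaging out noise, while Γ further down-weights stale episodes inside the window.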
Main Results (Informally)
Theorem: The regret of ExpDRBias decays as O(T^max{1 − (3/7)·α, 3/4}), provided that the magnitude of change of A2's policy between episodes is O(T^−α).
Theorem: Assume that the magnitude of change of A2's policy is Ω(1). Then achieving sublinear regret is at least as hard as learning parity with noise.
Thank you!
• Visit me at the poster session!