LEARNING IN GAMES WITH NOISY PAYOFF OBSERVATIONS

Mario Bravo (Universidad de Santiago de Chile)
Panayotis Mertikopoulos (CNRS – Laboratoire d’Informatique de Grenoble)

ADGO 2016 – Santiago, January 28, 2016
Outline

▸ Background and motivation
▸ Preliminaries
▸ The core scheme
▸ Learning with noisy feedback
Learning in Games

When does the agents’ learning process lead to a “reasonable” outcome?

The basic context:
▸ Decision-making: agents choose actions, each seeking to optimize some objective.
  Example: a trader chooses asset proportions in an investment portfolio.
▸ Payoffs: rewards are determined by the decisions of all interacting agents.
  Example: asset placements determine returns.
▸ Learning: the agents adjust their decisions and the process continues.
  Example: change asset proportions based on performance.
Motivation

▸ In many applications, decisions are taken at very fast time-scales.
  Example: in high-frequency trading (HFT), decision times ≈ µs.
▸ Regulations and physical constraints limit changes in decisions.
  Example: the SEC requires small differences in HFT orders to reduce volatility.
▸ Fast time-scales have adverse effects on the quality of feedback.
  Example: volatility estimates are highly inaccurate at the µs time-scale.
The Flash Crash of 2010

A trillion-dollar NYSE crash (and partial rebound) that lasted 35 minutes (14:32–15:07).

Aggressive selling due to imperfect volatility estimates induced a huge drop in liquidity and precipitated the crash (Vuorenmaa and Wang, 2014).
What this talk is about:

Examine the robustness of a class of continuous-time learning schemes with noisy feedback.
Game setup

Throughout this talk, we focus on finite games:

▸ Finite set of players: N = {1, …, N}
▸ Finite set of actions per player: A_k = {α_{k,1}, α_{k,2}, …}
▸ Reward of player k determined by the corresponding payoff function u_k : ∏_k A_k → R,
    (α_1, …, α_N) ↦ u_k(α_1, …, α_N)
▸ Mixed strategies x_k ∈ X_k ≡ ∆(A_k) yield expected payoffs
    u_k(x_1, …, x_N) = ∑_{α_1} ⋯ ∑_{α_N} x_{1,α_1} ⋯ x_{N,α_N} u_k(α_1, …, α_N)
▸ Strategy profiles: x = (x_1, …, x_N) ∈ X ≡ ∏_k X_k
▸ Payoff vector of player k: v_k(x) = (v_{kα}(x))_{α ∈ A_k},
    where v_{kα}(x) = u_k(α; x_{−k}) is the payoff to the α-th action of player k in the mixed strategy profile x ∈ X
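To make these definitions concrete, here is a minimal numerical sketch (illustrative, not from the talk) of the expected payoff and payoff vector in a two-player game; the payoff matrices and strategies below are hypothetical.

import numpy as np

# Hypothetical two-player game: U1[a1, a2] = u_1(a1, a2), U2[a1, a2] = u_2(a1, a2).
U1 = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies, player 1
U2 = -U1                                    # zero-sum: u_2 = -u_1

x1 = np.array([0.7, 0.3])                   # mixed strategy x_1 ∈ ∆(A_1)
x2 = np.array([0.5, 0.5])                   # mixed strategy x_2 ∈ ∆(A_2)

# Expected payoff: u_1(x) = ∑_{α_1} ∑_{α_2} x_{1,α_1} x_{2,α_2} u_1(α_1, α_2)
u1 = x1 @ U1 @ x2

# Payoff vector v_1(x): entry α is u_1(α; x_{−1}), the payoff to pure action α against x2.
v1 = U1 @ x2

assert np.isclose(u1, x1 @ v1)              # u_1(x) = ⟨x_1, v_1(x)⟩

The identity u_k(x) = ⟨x_k, v_k(x)⟩ checked by the assertion is also what the regret computation on the next slide exploits.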
Regret

Suppose players follow a trajectory of play x(t) (based on some learning/adjustment rule, to be discussed later). How does x_k(t) compare on average to the “best possible” action α_k ∈ A_k?

    max_{α ∈ A_k} (1/t) ∫_0^t [u_k(α; x_{−k}(s)) − u_k(x(s))] ds

Definition: x(t) leads to no regret if

    lim sup_{t→∞} max_{α ∈ A_k} (1/t) ∫_0^t [u_k(α; x_{−k}(s)) − u_k(x(s))] ds ≤ 0   for all k ∈ N,

i.e. if every player’s average regret is non-positive in the long run.

NB: unilateral definition, no need for a game.
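The time-averaged regret above can be approximated numerically along a discretized trajectory. The following is a minimal sketch for player 1 in a two-player game, assuming uniformly sampled play; all names, the payoff matrix, and the step size dt are hypothetical.

import numpy as np

U1 = np.array([[1.0, -1.0], [-1.0, 1.0]])     # player 1's payoff matrix (hypothetical)

def average_regret(x1_traj, x2_traj, dt):
    # Riemann-sum approximation of
    #   max_{α} (1/t) ∫_0^t [u_1(α; x_{−1}(s)) − u_1(x(s))] ds
    # x1_traj, x2_traj: arrays of shape (num_steps, num_actions).
    v1 = x2_traj @ U1.T                       # v1[s, α] = u_1(α; x_2(s))
    u1 = np.sum(v1 * x1_traj, axis=1)         # u_1(x(s)) = ⟨x_1(s), v_1(x(s))⟩
    t = len(x1_traj) * dt
    return np.max(dt * np.sum(v1 - u1[:, None], axis=0)) / t

# Example: constant uniform play in matching pennies gives zero average regret
# (every pure action earns 0 against a uniform opponent).
x1_traj = np.tile([0.5, 0.5], (1000, 1))
x2_traj = np.tile([0.5, 0.5], (1000, 1))
print(average_regret(x1_traj, x2_traj, dt=0.01))   # ≈ 0.0

Along a no-regret trajectory, this quantity stays non-positive (up to discretization error) as the horizon grows.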