Oliver Schulte Zeyu Zhao Kurt Routley Tim Schwartz Computing - PowerPoint PPT Presentation

Oliver Schulte, Zeyu Zhao, Kurt Routley, Tim Schwartz
Computing Science/Statistics, Simon Fraser University, Burnaby-Vancouver, Canada

• Reinforcement Learning: major branch of Artificial Intelligence (not psychology).
• Studies sequential decision-making under uncertainty.
• Studied since the 1950s.
  ◦ Many models, theorems, algorithms, software.

[Diagram: Sports Analytics and Reinforcement Learning]

On-line intro text by Sutton and Barto.

• Fundamental model type in reinforcement learning: Markov Decision Process.
• Multi-agent version: Markov Game.
• Models dynamics: e.g. given the current state of a match, what event is likely to occur next?
• Application in this paper:
  1. Value actions.
  2. Compute player rankings.

Markov Game Dynamics Example

Home = Colorado, Away = St. Louis. Differential = Home - Away.

Initial state: Goal Differential (GD) = 0, Manpower Differential (MD) = 2, Period (P) = 1, written 0,2,1.

Time in sequence (sec), event, and resulting state:
• 0 sec: Alexander Steen wins face-off in Colorado's offensive zone. State: 0,2,1 [face-off(Home,Off.)]
• 16 sec: Matt Duchene shoots. State: 0,2,1 [face-off(Home,Off.), Shot(Away,Off.)]
• 22 sec: Alex Pietrangelo shoots. State: 0,2,1 [face-off(Home,Off.), Shot(Away,Off.), Shot(Away,Off.)]
• 41 sec: Tyson Barrie shoots. State: 0,2,1 [face-off(Home,Off.), Shot(Away,Off.), Shot(Away,Off.), Shot(Home,Off.)]
• 42 sec: sequence ends. State: 0,2,1 [face-off(Home,Off.), Shot(Away,Off.), Shot(Away,Off.), Shot(Home,Off.), Stoppage]

Annotation on the diagram: P(Away goal) = 32%.
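The dynamics question raised earlier (given the current state of a match, what event is likely to occur next?) can be illustrated by counting observed transitions. The following is a minimal sketch, not the authors' implementation: the function name `estimate_transitions`, the event strings, and the toy sequences are all hypothetical and are not taken from the NHL dataset.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate P(next event | context, event history) from raw counts.

    `sequences` is a list of (context, events) pairs, where context is a
    (goal differential, manpower differential, period) tuple.
    """
    counts = defaultdict(Counter)
    for context, events in sequences:
        history = ()
        for event in events:
            # A state is the context plus the events seen so far in the sequence.
            counts[(context, history)][event] += 1
            history = history + (event,)
    probs = {}
    for state, counter in counts.items():
        total = sum(counter.values())
        probs[state] = {event: n / total for event, n in counter.items()}
    return probs


# Hypothetical toy data (made up for illustration only).
context = (0, 2, 1)  # GD = 0, MD = 2, P = 1
toy_sequences = [
    (context, ["faceoff(Home,Off)", "shot(Away,Off)", "stoppage"]),
    (context, ["faceoff(Home,Off)", "shot(Away,Off)", "goal(Away)"]),
    (context, ["faceoff(Home,Off)", "shot(Home,Off)", "stoppage"]),
    (context, ["faceoff(Home,Off)", "shot(Away,Off)", "stoppage"]),
]
transition_probs = estimate_transitions(toy_sequences)
after_faceoff = transition_probs[(context, ("faceoff(Home,Off)",))]
```

In this toy data, three of the four sequences continue with an away shot after the face-off, so `after_faceoff` assigns that event a probability of 0.75.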

• Two agents, Home and Away.
• Zero-sum: if Home earns a reward of r, then Away receives -r.
• Rewards can be:
  ◦ win match
  ◦ score goal
  ◦ receive penalty (cost).

Big Data: Play-by-play 2007-2015

Table 1: Size of Dataset

| Quantity | Count |
|---|---|
| Number of Teams | 32 |
| Number of Players | 1,951 |
| Number of Games | 9,220 |
| Number of Sequences | 590,924 |
| Number of Events | 2,827,467 |

Big Model: 1.3 M states

Player Performance Evaluation

• Key quantity in Markov game models: the total expected reward for a player given the current game state.
  ◦ Written V(s).
• Looks ahead over all possible game continuations.

[Equation, computed by dynamic programming, with labeled terms: total expected reward of state s; transition probabilities]
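A generic dynamic-programming sketch of how V(s) can look ahead over game continuations is given below. It assumes the standard recursion V(s) = R(s) + Σ P(s'|s) · V(s'), which is not necessarily the exact update used in the paper; the function name `evaluate_states`, the state names, and all probabilities are hypothetical.

```python
def evaluate_states(transitions, rewards, n_iterations=100):
    """Iterate V(s) = R(s) + sum over s' of P(s'|s) * V(s'), starting from V = 0."""
    values = {s: 0.0 for s in transitions}
    for _ in range(n_iterations):
        values = {
            s: rewards.get(s, 0.0)
            + sum(p * values[s_next] for s_next, p in transitions[s].items())
            for s in transitions
        }
    return values


# Hypothetical toy chain: a face-off leads to a shot or a stoppage,
# and a shot leads to a goal or a stoppage.
toy_transitions = {
    "faceoff": {"shot": 0.5, "stoppage": 0.5},
    "shot": {"goal": 0.2, "stoppage": 0.8},
    "goal": {},      # absorbing: no successors
    "stoppage": {},  # absorbing: no successors
}
toy_rewards = {"goal": 1.0}  # reward only for scoring
state_values = evaluate_states(toy_transitions, toy_rewards)
```

With a reward of 1 for a goal, V(s) in this toy chain is the probability that the sequence ends in a goal: 0.2 from the shot state and 0.1 from the face-off state.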

• Q(s,a) = the expected total reward if action a is executed in state s.
• The action-value function.

impact(s, a) = Q(s, a) − V(s)

where Q(s, a) is the expected reward after the action and V(s) is the expected reward before the action.
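As a small worked instance of the impact formula, the sketch below evaluates Q(s, a) − V(s) for two actions in one state. The state name, action names, and numeric values are hypothetical placeholders, not values from the model.

```python
def impact(q_values, state_values, state, action):
    """Return impact(s, a) = Q(s, a) - V(s)."""
    return q_values[(state, action)] - state_values[state]


# Hypothetical values: V is the expected reward before acting,
# Q is the expected reward after taking the given action.
example_v = {"s0": 0.30}
example_q = {("s0", "shot"): 0.36, ("s0", "pass"): 0.28}

shot_impact = impact(example_q, example_v, "s0", "shot")
pass_impact = impact(example_q, example_v, "s0", "pass")
```

Here the shot raises the expected reward (positive impact of about 0.06), while the pass lowers it (negative impact of about 0.02).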

Q-value Ticker

Q-value = P(that away team scores next goal)

[Figure: the face-off sequence in context 0,2,1 (GD = 0, MD = 2, P = 1), with each state annotated by its Q-value.]

Time (sec) and events:
• 0 sec: Alexander Steen wins face-off.
• 16 sec: Matt Duchene shoots.
• 22 sec: Alex Pietrangelo shoots.
• 41 sec: Tyson Barrie shoots.
• 42 sec: sequence ends.

P(Away goal) values shown on the ticker: 36%, 32%, 35%, 32%, 28%, 32%.

• Context-aware:
  ◦ e.g. goals are more valuable in ties than when ahead.
• Look-ahead:
  ◦ e.g. penalties → powerplay → goals, but not immediately.

1. From the Q-function, compute impact values of state-action pairs.
2. For each action that a player takes in a game state, find its impact value.
3. Sum player action impacts over all games in a season (like +/-).
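The three steps above can be sketched as a single aggregation over an action log. This is an illustrative sketch under stated assumptions: the function name `rank_players`, the log format (player, state, action), and every number are hypothetical.

```python
from collections import defaultdict


def rank_players(action_log, q_values, state_values):
    """Sum impact(s, a) = Q(s, a) - V(s) per player and sort descending."""
    totals = defaultdict(float)
    for player, state, action in action_log:
        totals[player] += q_values[(state, action)] - state_values[state]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical model values and a hypothetical season log.
season_v = {"s0": 0.30, "s1": 0.40}
season_q = {
    ("s0", "shot"): 0.36,
    ("s1", "shot"): 0.45,
    ("s1", "penalty"): 0.30,
}
season_log = [
    ("Player A", "s0", "shot"),
    ("Player B", "s1", "penalty"),
    ("Player A", "s1", "shot"),
]
ranking = rank_players(season_log, season_q, season_v)
```

In this toy log, Player A accumulates a positive total from two shots and ranks first, while Player B's penalty gives a negative total.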

• The Blues' STL line comes out very well.
• Tarasenko is undervalued; St. Louis increased his salary 7-fold.

| Name | Goal Impact | Points | +/- | Salary |
|---|---|---|---|---|
| Jason Spezza | 29.64 | 66 | -26 | $5,000,000 |
| Jonathan Toews | 28.75 | 67 | 25 | $6,500,000 |
| Joe Pavelski | 27.20 | 79 | 23 | $4,000,000 |
| Marian Hossa | 26.12 | 57 | 26 | $7,900,000 |
| Patrick Sharp | 24.43 | 77 | 12 | $6,500,000 |
| Sidney Crosby | 24.23 | 104 | 18 | $12,000,000 |
| Claude Giroux | 23.89 | 86 | 7 | $5,000,000 |
| Tyler Seguin | 23.89 | 84 | 16 | $4,500,000 |

Jason Spezza: high goal impact, low +/-.
• Plays very well on a poor team (Ottawa Senators).
• Requested transfer for the 2014-2015 season.
