Generative Adversarial User Model for Reinforcement Learning Based Recommendation System
Xinshi Chen¹, Shuang Li¹, Hui Li², Shaohua Jiang², Yuan Qi², Le Song¹,²
¹ Georgia Tech, ² Ant Financial
ICML 2019
RL for Recommendation System

[Figure: at each step the system displays items, the user makes a choice, and her state evolves from t to t+1 to t+2.]

• A user's interest evolves over time based on what she observes.
• The recommender's actions can significantly influence this evolution.
• An RL-based recommender can take the user's long-term interest into account.
Challenges

[Figure: the same interaction loop, but the reward at each step is unknown (reward = ?).]

Training an RL policy requires:
1. Lots of interactions with users (the user is the environment).
   e.g. (1) AlphaGo Zero generated 4.9 million games of self-play for training.
        (2) RL for Atari games takes more than 50 hours on a GPU to train.
2. The reward function (a user's interest) is unknown.
Our solution

• We propose a Generative Adversarial User Model
  - to model the user's actions,
  - to recover the user's reward.
• Use the GAN user model as a simulated environment to pre-train the RL policy offline (see the sketch below).

[Figure: RL policy ↔ simulated interactions ↔ GAN user model (simulated environment).]
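A minimal sketch of this offline pre-training loop. It assumes a learned user model with hypothetical methods (initial_state, sample_click, reward, next_state) and a policy with select_items / update; none of these names come from the paper's code.

class UserModelEnv:
    """Wraps a learned generative user model as a simulated environment."""
    def __init__(self, user_model, catalog, k):
        self.user_model = user_model   # learned behavior + recovered reward
        self.catalog = catalog         # all K available items
        self.k = k                     # number of items displayed per step
        self.state = None

    def reset(self):
        self.state = self.user_model.initial_state()
        return self.state

    def step(self, displayed_items):
        # The user model predicts which displayed item the simulated user clicks,
        # and the recovered reward of that click.
        click = self.user_model.sample_click(self.state, displayed_items)
        reward = self.user_model.reward(self.state, click)
        self.state = self.user_model.next_state(self.state, click)
        return self.state, reward

def pretrain(policy, env, episodes=1000, horizon=20):
    """Pre-train the recommendation policy against the simulated environment."""
    for _ in range(episodes):
        state = env.reset()
        for _ in range(horizon):
            items = policy.select_items(state, env.catalog, env.k)   # k-item action
            next_state, reward = env.step(items)
            policy.update(state, items, reward, next_state)          # e.g. a Q-learning update
            state = next_state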
Generative Adversarial User Model

Two components:

• User's reward r(s_t, a_t)
  - a_t is the clicked item.
  - s_t is the user's experience (state).

• User's behavior φ(s_t, A_t)
  - A_t contains the items displayed by the system.
  - The user chooses a_t ∼ φ to maximize her expected reward:

        φ*(s_t, A_t) = argmax_φ  E_φ[ r(s_t, a_t) ] − R(φ)/η

    where R(φ) is a regularizer and η a temperature.
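For intuition, a minimal sketch of the resulting choice model, assuming the regularizer R is negative Shannon entropy; in that case the optimal φ* is a softmax over the rewards of the displayed items. reward_fn is a placeholder for the learned reward function, not the paper's code.

import numpy as np

def user_choice_probs(state, displayed_items, reward_fn, eta=1.0):
    """φ*(s_t, A_t): probability that the user clicks each displayed item."""
    rewards = np.array([reward_fn(state, item) for item in displayed_items])
    logits = eta * rewards            # higher reward -> higher click probability
    logits -= logits.max()            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def sample_click(state, displayed_items, reward_fn, eta=1.0, rng=np.random):
    """Draw the simulated user's click a_t ~ φ*(s_t, A_t)."""
    probs = user_choice_probs(state, displayed_items, reward_fn, eta)
    return displayed_items[rng.choice(len(displayed_items), p=probs)]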
Generative Adversarial Training

In analogy to GAN:
• φ (behavior) acts as a generator.
• r (reward) acts as a discriminator.

Jointly learned via a mini-max formulation:

    min_r max_φ  ( E_φ[ Σ_{t=1}^T r(s_t, a_t) ] − R(φ)/η ) − Σ_{t=1}^T r(s_t^true, a_t^true)
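A minimal sketch of one alternating update of this mini-max objective (per-sample version), assuming PyTorch and two placeholder networks: reward_net(states, displayed) returning a [B, K] reward matrix (the discriminator r) and behavior_net(states, displayed) returning [B, K] click logits (the generator φ). Shapes and names are illustrative, not the paper's implementation.

import torch
import torch.nn.functional as F

def adversarial_step(reward_net, behavior_net, opt_r, opt_phi, batch, eta=1.0):
    # batch: user states [B, d_s], displayed item features [B, K, d_a],
    # and the index of the item the real user actually clicked [B]
    states, displayed, clicked = batch

    def objective():
        r_all = reward_net(states, displayed)              # [B, K] rewards of displayed items
        logits = behavior_net(states, displayed)           # [B, K] generator's click logits
        phi = F.softmax(logits, dim=-1)
        entropy = -(phi * torch.log(phi + 1e-8)).sum(-1)   # with R = neg. entropy, -R(φ)/η = +H(φ)/η
        expected_r = (phi * r_all).sum(-1)
        true_r = r_all.gather(1, clicked.unsqueeze(1)).squeeze(1)
        return (expected_r + entropy / eta - true_r).mean()

    # max over φ: the generator ascends the objective
    opt_phi.zero_grad()
    (-objective()).backward()
    opt_phi.step()

    # min over r: the discriminator descends the objective
    opt_r.zero_grad()
    objective().backward()
    opt_r.step()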
Model Parameterization

Two architectures for aggregating historical information (i.e. the state s_t):

(1) LSTM: run an LSTM over the embeddings f*_{t−m}, …, f*_{t−1} of the user's recently clicked items and use the hidden state h_{t−1} as s_t.

(2) Position Weight: multiply the stacked embeddings of the last m clicked items by a learned position weight matrix

        W = [ w_11 … w_1n ]
            [  ⋮       ⋮  ]
            [ w_m1 … w_mn ]

    and concatenate the resulting columns to form s_t (see the sketch below).
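A minimal sketch of the position-weight aggregation, assuming the last m clicked items are given as d-dimensional embeddings; the function name and toy shapes are illustrative only.

import numpy as np

def position_weight_state(clicked_embeddings, W):
    """clicked_embeddings: [m, d] embeddings of the m most recent clicks.
    W: [m, n] learned position-weight matrix.
    Returns the state s_t as a flat vector of length n*d."""
    mixed = W.T @ clicked_embeddings   # [n, d]: n position-weighted mixtures of the history
    return mixed.reshape(-1)           # concatenate the mixtures into the state vector

# toy usage
m, d, n = 5, 8, 3
rng = np.random.default_rng(0)
s_t = position_weight_state(rng.normal(size=(m, d)), rng.normal(size=(m, n)))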
Set Recommendation RL Policy

At each step the policy displays k items out of all K available items:

    (a_1*, a_2*, …, a_k*) = argmax_{a_1, …, a_k} Q(s_t, a_1, a_2, …, a_k)

Combinatorial action space of size (K choose k) → intractable computation!
Set Recommendation RL Policy

We design a cascading Q network to compute the optimal action with linear complexity:

    (a_1*, …, a_k*) = argmax_{a_1, …, a_k} Q(s_t, a_1, a_2, …, a_k)

decomposes into

    a_1* = argmax_{a_1} Q^{1*}(s_t, a_1),            where Q^{1*}(s_t, a_1) := max_{a_{2:k}} Q(s_t, a_1, a_{2:k})
    a_2* = argmax_{a_2} Q^{2*}(s_t, a_1*, a_2),      where Q^{2*}(s_t, a_1*, a_2) := max_{a_{3:k}} Q(s_t, a_1*, a_2, a_{3:k})
    …
    a_k* = argmax_{a_k} Q^{k*}(s_t, a_1*, …, a_{k−1}*, a_k)
Set Recommendation RL Policy: Cascading DQN

[Figure: the state s_t feeds k stacked Q networks; Q^1(s, a_1; θ^1) → argmax → a_1*, Q^2(s, a_1*, a_2; θ^2) → argmax → a_2*, …, Q^k(s, a*_{1:k−1}, a_k; θ^k) → argmax → a_k*.]
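A minimal sketch of the cascading selection step, assuming q_nets[j] is the (j+1)-th learned Q network; the names are illustrative. In the paper the k networks are also trained so that their maxima stay consistent with the full Q; the sketch only shows how the k-item action is chosen.

def cascading_select(state, candidates, q_nets, k):
    """Select k items greedily: q_nets[j](state, chosen, a) scores item a for the
    (j+1)-th slot given the items already chosen. Cost is O(k*K) network
    evaluations instead of enumerating all (K choose k) item sets."""
    chosen = []
    remaining = list(candidates)
    for j in range(k):
        best = max(remaining, key=lambda a: q_nets[j](state, tuple(chosen), a))
        chosen.append(best)
        remaining.remove(best)
    return chosen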
Experiments

• Predictive performance of the user model.
• Recommendation policy based on the user model.
Experiments

The cascading-DQN policy pre-trained with the GAN user model can quickly achieve a high CTR even when applied to a new set of users.
Thanks!
Poster: Pacific Ballroom #252, Tue, 06:30 PM
Contact: xinshi.chen@gatech.edu