ConQUR: Mitigating Delusional Bias in Deep Q-Learning
DiJia (Andy) Su (Princeton), Jayden Ooi (Google), Tyler Lu (Google), Dale Schuurmans (Google), Craig Boutilier (Google)
Search: Q-Regressors
Proprietary & Confidential
[Diagram sequence, slides 14-23. Recoverable panel labels, in order of appearance: Batch #1; Generate & fit Q-labels; Q-Label Set #1 and Q-Label Set #2; Batch #2; ...]
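The diagram sequence above (Batch #1, Generate & fit Q-labels, Q-Label Set #1 and #2, Batch #2) suggests a tree-shaped search in which each Q-regressor spawns several children per batch, one per Q-label set. The following is only a rough sketch of that shape, not the authors' implementation. Everything in it is an assumption made for illustration: the linear regressor, the way alternative label sets are produced (one greedy, the rest from random next-state action choices), the discount value, and all names (`expand`, `search`, `make_batch`).

```python
import numpy as np

GAMMA = 0.9  # discount factor; an assumed value, not from the slides


def fit_linear(feats, labels):
    """Least-squares fit of weights w so that feats @ w approximates labels."""
    w, *_ = np.linalg.lstsq(feats, labels, rcond=None)
    return w


def expand(parent_w, batch, n_label_sets, rng):
    """Generate several Q-label sets from one parent regressor, fit a child to each."""
    feats = batch["feats"]            # (n, d) features of (state, action) pairs
    rewards = batch["rewards"]        # (n,)
    next_feats = batch["next_feats"]  # (n, n_actions, d) features at the next state
    n, n_actions, _ = next_feats.shape
    next_q = next_feats @ parent_w    # (n, n_actions) parent's next-state Q-values
    children = []
    for k in range(n_label_sets):
        if k == 0:
            actions = next_q.argmax(axis=1)              # greedy label set
        else:
            actions = rng.integers(n_actions, size=n)    # assumed alternative label set
        labels = rewards + GAMMA * next_q[np.arange(n), actions]
        children.append(fit_linear(feats, labels))
    return children


def search(batches, init_w, n_label_sets=2, seed=0):
    """Grow the frontier of regressors by one tree level per batch (no pruning)."""
    rng = np.random.default_rng(seed)
    frontier = [init_w]
    for batch in batches:
        frontier = [c for w in frontier for c in expand(w, batch, n_label_sets, rng)]
    return frontier


def make_batch(rng, n, n_actions, d):
    """Hypothetical random batch, used only to exercise the sketch."""
    return {
        "feats": rng.normal(size=(n, d)),
        "rewards": rng.normal(size=n),
        "next_feats": rng.normal(size=(n, n_actions, d)),
    }
```

With two batches and two label sets per node, `search` returns four leaf regressors. The frontier grows exponentially with the number of batches, so any practical variant would need to score and prune nodes; that step is left out here because the surviving slide text does not describe it.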