Bandits
Advanced Econometrics 2, Hilary term 2021 Multi-armed bandits
Maximilian Kasy
Department of Economics, Oxford University
Bandits Agenda
Thus far: supervised machine learning — the data are given.
Next: active data collection — the algorithm chooses which data to gather, trading off learning about the arms (exploration) against payoffs (exploitation).
Bandits The multi-armed bandit
Setup: in each period t = 1, 2, ..., choose a treatment ("arm") D_t ∈ {1, ..., k} and observe the reward of the chosen arm,
Y_t = Y^{D_t}_t,   Y^d_t ∼ F^d,   E[Y^d_t] = θ^d.
Rewards are i.i.d. across periods; only the chosen arm's reward is observed.
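As a running illustration, this setup can be sketched in code. The following is a minimal simulation of a bandit environment with Bernoulli rewards; the class and variable names (`Bandit`, `pull`, `thetas`) are my own, not from the slides.

```python
import random

class Bandit:
    """Stochastic multi-armed bandit: arm d yields Y^d_t ~ Ber(theta^d)."""

    def __init__(self, thetas, seed=0):
        self.thetas = thetas              # mean rewards theta^d, one per arm
        self.rng = random.Random(seed)

    def pull(self, d):
        """Play arm d; return a Bernoulli reward with mean theta^d."""
        return 1 if self.rng.random() < self.thetas[d] else 0

# A two-armed example: arm 1 is best, so theta* = 0.7 and Delta^0 = 0.2.
env = Bandit([0.5, 0.7])
rewards = [env.pull(0) for _ in range(10)]   # ten plays of arm 0
```

Only the reward of the pulled arm is ever generated, mirroring the partial-feedback structure of the problem.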
Bandits The multi-armed bandit
Notation: θ* = max_d θ^d is the best mean reward, and ∆^d = θ* − θ^d the gap of arm d. Possible objectives:
• finite-horizon average reward, (1/T) ∑_{1≤t≤T} E[Y_t];
• discounted reward, ∑_{t≥1} β^t E[Y_t], for a discount factor β < 1.
Bandits The multi-armed bandit
Equivalent formulation: minimize regret, the expected shortfall relative to always playing the best arm. In the finite-horizon case,
R_T = T θ* − ∑_{1≤t≤T} E[Y_t] = ∑_{1≤t≤T} E[∆^{D_t}];
in the discounted case, ∑_{t≥1} β^t (θ* − E[Y_t]).
Bandits Two popular algorithms
Upper confidence bound (UCB) algorithm. Track the number of pulls and the running mean reward of each arm,
T^d_t = ∑_{1≤s≤t} 1(D_s = d),   θ̄^d_t = (1/T^d_t) ∑_{1≤s≤t} 1(D_s = d) · Y_s,
and a confidence width that shrinks with the number of pulls,
B^d_t = B(T^d_t).
Optimism in the face of uncertainty: play the arm with the highest upper confidence bound,
D_{t+1} = argmax_d θ̄^d_t + B^d_t.
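A minimal sketch of this rule for Bernoulli arms, using the Hoeffding-style width B^d_t = sqrt(α log(t) / (2 T^d_t)) (the case ψ*(ε) = 2ε² for bounded rewards); the function name and the choice α = 3 are my own, and each arm is played once for initialization:

```python
import math
import random

def ucb(thetas, T, alpha=3.0, seed=0):
    """UCB with width B^d_t = sqrt(alpha * log(t) / (2 * T^d_t))."""
    rng = random.Random(seed)
    k = len(thetas)
    pulls = [0] * k       # T^d_t, number of pulls per arm
    means = [0.0] * k     # theta-bar^d_t, running mean per arm
    for t in range(1, T + 1):
        if t <= k:
            d = t - 1     # play each arm once to initialize
        else:
            d = max(range(k), key=lambda a: means[a]
                    + math.sqrt(alpha * math.log(t) / (2 * pulls[a])))
        y = 1 if rng.random() < thetas[d] else 0
        pulls[d] += 1
        means[d] += (y - means[d]) / pulls[d]   # incremental mean update
    return pulls, means

pulls, means = ucb([0.3, 0.7], T=5000)
```

As the means concentrate, the optimistic bonus of the suboptimal arm shrinks too slowly to keep it competitive, so it is pulled only rarely.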
Bandits Two popular algorithms
Thompson sampling. Maintain a Bayesian posterior over each θ^d. Each period, take one draw θ̂^d_t from each arm's posterior and play the arm with the highest draw,
D_{t+1} = argmax_d θ̂^d_t.
This is probability matching: each arm is played with the posterior probability that it is the best arm.
Bandits Two popular algorithms
Thompson sampling with binary rewards, Y^d_t ∼ Ber(θ^d). Under a uniform prior, the posterior for θ^d after period t is Beta with parameters
α^d_t = 1 + T^d_t · θ̄^d_t,   β^d_t = 1 + T^d_t · (1 − θ̄^d_t)
(one plus the number of successes and failures of arm d). Draw
θ̂^d_t ∼ Beta(α^d_t, β^d_t)
and play D_{t+1} = argmax_d θ̂^d_t.
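A sketch of Beta-Bernoulli Thompson sampling, tracking successes and failures directly (so that 1 + successes and 1 + failures are the Beta parameters above); names are mine:

```python
import random

def thompson(thetas, T, seed=0):
    """Beta-Bernoulli Thompson sampling: posterior for arm d is
    Beta(1 + successes, 1 + failures); play the arm with the highest draw."""
    rng = random.Random(seed)
    k = len(thetas)
    succ = [0] * k   # number of ones observed on arm d
    fail = [0] * k   # number of zeros observed on arm d
    pulls = [0] * k
    for _ in range(T):
        draws = [rng.betavariate(1 + succ[d], 1 + fail[d]) for d in range(k)]
        d = max(range(k), key=lambda a: draws[a])
        y = 1 if rng.random() < thetas[d] else 0
        succ[d] += y
        fail[d] += 1 - y
        pulls[d] += 1
    return pulls

pulls = thompson([0.3, 0.7], T=5000)
```

No explicit confidence width is needed: posterior uncertainty itself drives exploration, and arms that are probably suboptimal are sampled less and less often.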
Bandits Regret bounds
Consider finite-horizon expected regret. Since playing arm d in any period adds ∆^d to regret, regret decomposes across arms:
E[R_T] = ∑_{1≤t≤T} E[∆^{D_t}] = ∑_d E[T^d_T] · ∆^d.
So regret is small iff E[T^d_T] is small when ∆^d > 0.
We next bound E[T^d_T] for the UCB algorithm.
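The decomposition holds path by path for the realized pseudo-regret ∑_t ∆^{D_t}: grouping periods by the arm played gives ∑_d T^d_T ∆^d. A small sketch verifying this identity on a simulated choice sequence (a uniformly random policy, purely for illustration; all names mine):

```python
import random

rng = random.Random(0)
thetas = [0.3, 0.5, 0.7]
theta_star = max(thetas)
gaps = [theta_star - th for th in thetas]   # Delta^d

# Any choice sequence works; use uniformly random arms for illustration.
T = 1000
choices = [rng.randrange(len(thetas)) for _ in range(T)]

# Per-period pseudo-regret sums the gap of the chosen arm ...
regret_by_period = sum(gaps[d] for d in choices)
# ... which equals the per-arm decomposition sum_d T^d_T * Delta^d.
counts = [choices.count(d) for d in range(len(thetas))]
regret_by_arm = sum(counts[d] * gaps[d] for d in range(len(thetas)))
```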
Bandits Regret bounds
Key probabilistic tool: Chernoff bounds for the sample mean (1/T) ∑_{1≤t≤T} Y_t for i.i.d. Y_t with E[Y_t] = θ. Let ψ(λ) = log E[exp(λ(Y_t − θ))] denote the cumulant generating function and ψ*(ε) = sup_λ (λε − ψ(λ)) its Legendre transform. Then
P( (1/T) ∑_{1≤t≤T} Y_t − θ > ε ) ≤ exp(−T · ψ*(ε)).
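For rewards in [0, 1], Hoeffding's lemma gives ψ*(ε) ≥ 2ε², so the bound specializes to exp(−2Tε²). A sketch checking this special case empirically for a Bernoulli mean (parameter values are my own):

```python
import math
import random

rng = random.Random(0)
theta, T, eps, reps = 0.5, 100, 0.1, 20000

# Empirical frequency of a large deviation of the sample mean.
exceed = 0
for _ in range(reps):
    mean = sum(rng.random() < theta for _ in range(T)) / T
    exceed += mean - theta > eps
empirical = exceed / reps

# Hoeffding bound exp(-2 T eps^2), a special case of exp(-T psi*(eps)).
bound = math.exp(-2 * T * eps**2)
```

The empirical tail frequency should fall below the bound; the bound is loose but, crucially for the regret analysis, decays exponentially in T.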
Bandits Regret bounds
For the UCB algorithm, with θ̄^d_t = (1/T^d_t) ∑_{1≤s≤t} 1(D_s = d) · Y_s as before, choose the width
B^d_t = (ψ*)^{−1}( α log(t) / T^d_t )
for some α > 0. The Chernoff bound then implies
P(θ̄^d_t − θ^d > B^d_t) ≤ exp(−T^d_t · ψ*(B^d_t)) = t^{−α},
and symmetrically P(θ̄^d_t − θ^d < −B^d_t) ≤ t^{−α}.
Bandits Regret bounds
Key lemma: if the UCB algorithm plays a suboptimal arm, D_{t+1} = d with ∆^d > 0, then at least one of three events holds:
1. the confidence bound fails for the best arm, θ̄^{d*}_t + B^{d*}_t < θ^{d*};
2. the mean of arm d is overestimated, θ̄^d_t − B^d_t > θ^d;
3. the confidence width of arm d is still large, 2 B^d_t > ∆^d.
(If all three fail, then θ̄^{d*}_t + B^{d*}_t ≥ θ^{d*} = θ^d + ∆^d ≥ θ^d + 2 B^d_t ≥ θ̄^d_t + B^d_t, so d would not be chosen.)
Events 1 and 2 are unlikely: each has probability at most t^{−α}. Event 3 only happens while T^d_t is small. By definition of B^d_t, 3 happens iff
T^d_t < α log(t) / ψ*(∆^d / 2).
Bandits Regret bounds
Let T̃^d_T = ⌈α log(T) / ψ*(∆^d/2)⌉, the number of pulls after which event 3 can no longer occur by period T. Decompose the expected number of pulls of arm d:
E[T^d_T] = ∑_{1≤t≤T} P(D_t = d)
≤ T̃^d_T + ∑_{T̃^d_T < t ≤ T} P(D_t = d and T^d_{t−1} ≥ T̃^d_T)
≤ T̃^d_T + ∑_{T̃^d_T < t ≤ T} [P(event 1 at t) + P(event 2 at t)]
≤ T̃^d_T + ∑_{T̃^d_T < t ≤ T} 2 t^{1−α}
≤ T̃^d_T + α/(α − 2) for α > 2,
where the t^{1−α} terms come from a union bound over the at most t possible values of the pull counts at time t.
Bandits Regret bounds
Combining with the regret decomposition yields the theorem: for α > 2, the expected regret of the UCB algorithm satisfies
E[R_T] ≤ α log(T) ∑_d ∆^d / ψ*(∆^d/2) + (α/(α − 2)) ∑_d ∆^d,
where the sums run over suboptimal arms (∆^d > 0). Regret thus grows only logarithmically in T.
Bandits Regret bounds
This logarithmic rate is essentially optimal (the Lai–Robbins lower bound). Suppose an algorithm is uniformly good, E[R_t] = o(t^a) for all a > 0. Then for every arm d with ∆^d > 0,
lim inf_{t→∞} E[T^d_t] / log(t) ≥ 1 / KL(F^d, F^{d*}),
and consequently
lim inf_{t→∞} E[R_t] / log(t) ≥ ∑_{d: ∆^d > 0} ∆^d / KL(F^d, F^{d*}),
where KL denotes the Kullback–Leibler divergence between reward distributions.
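In the Bernoulli case the divergence has a closed form, so the lower-bound constant is easy to compute. A sketch for a two-arm example (function name and parameter values are mine):

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence between Ber(p) and Ber(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Lai-Robbins: a uniformly good algorithm must pull a suboptimal arm
# at least (1 / KL(F^d, F^{d*})) * log(t) times asymptotically.
theta_d, theta_star = 0.5, 0.7
pulls_per_log_t = 1 / kl_bernoulli(theta_d, theta_star)
```

The closer θ^d is to θ*, the smaller the divergence and the more forced exploration the lower bound demands, even though each such pull costs less regret.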
Bandits Gittins index
For the discounted infinite-horizon problem, an exactly optimal policy exists, based on an index computed separately for each arm. Thought experiment: suppose you must pay a fixed charge every period you play arm d. Define the Gittins index G^d_t as the charge which would make you indifferent between playing or not playing the arm.
Bandits Gittins index
Equivalently, maximizing over stopping times τ,
G^d_t = sup_τ E[ ∑_{1≤s≤τ} β^s Y^d_{t+s} ] / E[ ∑_{1≤s≤τ} β^s ],
the largest attainable ratio of expected discounted rewards to expected discounted time, given the information on arm d available at t.
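The indifference-charge characterization suggests a way to compute the index numerically: for a Bayesian Bernoulli arm with a Beta(a, b) posterior, bisect on the charge λ at which the optimal "play or retire" value is zero. The sketch below uses finite-horizon value iteration as an approximation (the truncation depth, discount factor, and all names are my own choices, not from the slides):

```python
from functools import lru_cache

BETA = 0.9      # discount factor
HORIZON = 80    # truncation depth (approximation to the infinite horizon)

def value(a, b, lam):
    """Optimal discounted value of 'pay charge lam and play, or retire'
    for an arm with Beta(a, b) posterior, truncated at HORIZON steps."""
    @lru_cache(maxsize=None)
    def v(a, b, depth):
        if depth == HORIZON:
            return 0.0
        p = a / (a + b)                      # posterior mean of theta
        play = p - lam + BETA * (p * v(a + 1, b, depth + 1)
                                 + (1 - p) * v(a, b + 1, depth + 1))
        return max(0.0, play)                # retiring is worth 0
    return v(a, b, 0)

def gittins(a, b, tol=1e-4):
    """Bisect for the charge that makes the agent indifferent (value = 0)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if value(a, b, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

g = gittins(1, 1)   # index under a uniform prior
```

The index exceeds the myopic posterior mean (0.5 here), reflecting the option value of learning: playing the arm is worth more than its current expected reward.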
Bandits Gittins index
Theorem (Gittins). In the discounted multi-armed bandit problem, it is optimal to always play the arm with the currently highest index,
D_{t+1} = argmax_d G^d_t.
This reduces the k-dimensional dynamic optimization problem to k separate one-armed problems.
Bandits Contextual bandits
Extension: in each period a covariate ("context") X_t is observed before choosing D_t, and mean rewards θ^d(x) = E[Y^d_t | X_t = x] depend on the context. The goal is then to learn the best arm as a function of the context.