Correlated bandits or: How to minimize mean-squared error online
V. Praneeth Boda (1) and Prashanth L. A. (2)
(1) LinkedIn Corp. (2) Indian Institute of Technology Madras.
A portion of this work was done while the authors were at University of Maryland, College Park.

Centrality among Bandits
Aim: Find the arm with the highest information about the other arms.
▶ Placement of sensors used for measuring temperature in a region.
▶ Best set of towers which approximate the whole network.

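Minimum Mean Squared Error Estimation
▶ Jointly Gaussian arms $X_{\mathcal{M}} = (X_1, \ldots, X_K)$, with zero mean and covariance matrix $\Sigma \triangleq \mathbb{E}[X_{\mathcal{M}}^T X_{\mathcal{M}}]$.
MMSE:
$$\mathcal{E}_i \triangleq \min_{g} \mathbb{E}\left[\left(X_{\mathcal{M}} - g(X_i)\right)^T \left(X_{\mathcal{M}} - g(X_i)\right)\right] = \sum_{\substack{j=1 \\ j \neq i}}^{K} \mathbb{E}\left[\left(X_j - \mathbb{E}[X_j \mid X_i]\right)^2\right] = \sum_{\substack{j=1 \\ j \neq i}}^{K} \sigma_j^2 \left(1 - \rho_{ij}^2\right)$$
The optimal $g$:
$$g^*(X_i) = \mathbb{E}[X_{\mathcal{M}} \mid X_i] = \left[\mathbb{E}[X_1 \mid X_i] \;\ldots\; \mathbb{E}[X_K \mid X_i]\right]^T, \quad \text{with} \quad \mathbb{E}[X_j \mid X_i] = \frac{\mathbb{E}[X_j X_i]}{\mathbb{E}[X_i^2]} X_i = \frac{\rho_{ij}\,\sigma_j}{\sigma_i} X_i.$$
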
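As a small numerical illustration of the MMSE formula on this slide, the sketch below evaluates the per-arm MSE, the sum over j ≠ i of σ_j²(1 − ρ_ij²), from a known covariance matrix and picks the minimizing arm. The function name `mmse_per_arm` and the 3×3 covariance matrix are made up for illustration and do not come from the slides; arms are indexed from 0 in the code.

```python
def mmse_per_arm(cov):
    """Return [E_0, ..., E_{K-1}] with E_i = sum over j != i of sigma_j^2 * (1 - rho_ij^2).

    cov is a K x K covariance matrix given as a list of lists (zero-mean arms assumed).
    """
    k = len(cov)
    mse = []
    for i in range(k):
        total = 0.0
        for j in range(k):
            if j == i:
                continue
            # Squared correlation: rho_ij^2 = cov_ij^2 / (sigma_i^2 * sigma_j^2).
            rho_sq = cov[i][j] ** 2 / (cov[i][i] * cov[j][j])
            total += cov[j][j] * (1.0 - rho_sq)
        mse.append(total)
    return mse


# Hypothetical covariance matrix with unit variances (illustration only).
cov = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.6],
    [0.5, 0.6, 1.0],
]
mse = mmse_per_arm(cov)
# The arm whose observation best predicts all the others has the smallest MSE.
best_arm = min(range(len(mse)), key=mse.__getitem__)
```

With unit variances, each arm's MSE is simply the sum of 1 − ρ_ij² over the other arms, so the arm most strongly correlated with the rest is the one selected.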
Correlated Bandits
Input: set of arm-pairs $S \triangleq \{(i, j) \mid i, j = 1, \ldots, K,\ i < j\}$, number of rounds $n$
For $t = 1, 2, \ldots, n$ do
    Select a pair $(i_t, j_t) \in S$ (necessary for estimating correlation structure)
    Observe a sample from the bivariate distribution corresponding to the arms $i_t, j_t$
endfor
Output an arm $\hat{A}_n$ based on sample-based MSE-value estimates so that $P(\hat{A}_n \neq i^*)$ is minimized. Here $i^* = \arg\min_{i \in \mathcal{M}} \mathcal{E}_i$.

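The interaction loop on this slide can be simulated as follows. This is a minimal sketch under stated assumptions: pairs are selected round-robin (a uniform baseline, not the selection rule studied in the talk), and the helper names `sample_pair` and `run_uniform` as well as the covariance matrix are hypothetical.

```python
import itertools
import math
import random


def sample_pair(cov, i, j, rng):
    """Draw one sample (x_i, x_j) from the zero-mean bivariate Gaussian of arms i and j."""
    s_i = math.sqrt(cov[i][i])
    s_j = math.sqrt(cov[j][j])
    rho = cov[i][j] / (s_i * s_j)
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # Standard construction: mix two independent normals to get correlation rho.
    return s_i * z1, s_j * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)


def run_uniform(cov, n, seed=0):
    """Play n rounds, cycling through all arm-pairs (i, j) with i < j."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(len(cov)), 2))
    samples = {pair: [] for pair in pairs}
    for t in range(n):
        i, j = pairs[t % len(pairs)]  # assumption: round-robin pair selection
        samples[(i, j)].append(sample_pair(cov, i, j, rng))
    return samples


# Hypothetical covariance matrix (illustration only).
cov = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.6],
    [0.5, 0.6, 1.0],
]
samples = run_uniform(cov, n=30)
```

Each round yields one joint draw of two arms, which is what makes the pairwise correlations estimable; single-arm pulls would only reveal variances.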
MSE Estimation and Concentration
Based on samples of the Gaussian arms:
▶ Sample variance $\hat{\sigma}_j^2$
▶ Sample correlation $\hat{\rho}_{ij}$
▶ MSE of arm $i$: $\hat{\mathcal{E}}_i \triangleq \sum_{j \neq i} \hat{\sigma}_j^2 \left(1 - \hat{\rho}_{ij}^2\right)$.
MSE Concentration: Assume $\sigma_i^2 \le 1$, $i = 1, \ldots, K$. Then, for any $i = 1, \ldots, K$, and for any $\epsilon \in [0, 2K]$, we have
$$P\left(\left|\hat{\mathcal{E}}_i - \mathcal{E}_i\right| > \epsilon\right) \le 14K \exp\left(-\frac{n l^2 \epsilon^2}{cK}\right),$$
where $c$ is a universal constant, and $0 < l = \min_i \sigma_i^2$.

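The plug-in MSE estimate on this slide can be sketched as below. The assumptions are mine, not the slides': the arms are treated as zero mean (so no centering is done), and σ_j² is estimated separately from the draws of each pair (i, j) rather than pooled across all pairs containing arm j. The function name `estimate_mse` and the toy draws are hypothetical and only exercise the arithmetic.

```python
def estimate_mse(samples, k):
    """Plug-in estimate of E_i = sum over j != i of sigma_j^2 * (1 - rho_ij^2).

    samples maps (i, j) with i < j to a non-empty list of (x_i, x_j) draws.
    """
    est = [0.0] * k
    for (i, j), draws in samples.items():
        n = len(draws)
        var_i = sum(x * x for x, _ in draws) / n   # sample variance of arm i (zero mean)
        var_j = sum(y * y for _, y in draws) / n   # sample variance of arm j (zero mean)
        cross = sum(x * y for x, y in draws) / n   # sample cross-moment
        rho_sq = cross * cross / (var_i * var_j)   # squared sample correlation
        est[i] += var_j * (1.0 - rho_sq)           # term j in the sum for arm i
        est[j] += var_i * (1.0 - rho_sq)           # term i in the sum for arm j
    return est


# Hand-made toy draws (not real data).
toy = {
    (0, 1): [(1.0, 1.0), (-1.0, -1.0)],                            # perfectly correlated
    (0, 2): [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)],  # uncorrelated
    (1, 2): [(1.0, 1.0), (1.0, 1.0), (1.0, -1.0),
             (-1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0)],             # sample correlation 1/3
}
est = estimate_mse(toy, k=3)
chosen = min(range(3), key=est.__getitem__)
```

A pair whose sample variance is exactly zero would cause a division by zero; a fuller implementation would need to guard against that case.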
SR algorithm: Illustration of arm-pair elimination
▶ Maintain active arms and arm-pairs
[Figure: the ten arm-pairs for K = 5 arms, (1,2) (1,3) (1,4) (1,5) (2,3) (2,4) (2,5) (3,4) (3,5) (4,5), shown in two panels: "Active arm-pairs after arms 4, 5 are eliminated" and "Active arm-pairs after arms 3, 4, 5 are eliminated".]

Successive Rejects: An algorithm to find the best arm
Initialization: $A_1$ = all arm pairs, $B_1 = \{1, \ldots, K\}$,
$$n_k = \left\lceil \frac{1}{C(K)} \cdot \frac{n - \binom{K}{2}}{K + 1 - k} \right\rceil, \quad C(K) \approx K \log K.$$
Phase 1: Pull each pair in $A_1$, $n_1$ times; eliminate one arm.
Phase 2: Play each arm pair in $A_2$, $n_2 - n_1$ times; eliminate one arm.
. . .
Phase $K-1$: Play the remaining two arm pairs $n_{K-1} - n_{K-2}$ times; select the arm with lowest MSE.
After the elimination in phase $k$, set $B_{k+1} = B_k \setminus \{\text{eliminated arm}\}$.
▶ One arm pair played $n_1$ times, . . . , another two played $n_2$ times
▶ $k$ arms played $n_{k+1}$ times
▶ $\sum_{k=1}^{K-1} (k-1)\, n_k + (K-1)\, n_{K-1} < n$
▶ $n_k$ increases with $k$
▶ Adaptive exploration: better than uniform (= play each arm-pair $n / \binom{K}{2}$ times)

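To get a feel for the phase lengths, the sketch below evaluates n_k for k = 1, ..., K−1 and compares them with the uniform baseline. It assumes the phase-length formula reads n_k = ⌈(n − C(K,2)) / (C(K)·(K+1−k))⌉, where C(K,2) is the number of arm pairs, and it takes C(K) to be exactly K log K (the slide only gives this as an approximation). The function name `phase_lengths` and the values n = 1000, K = 5 are hypothetical.

```python
import math


def phase_lengths(n, k_arms, c_k=None):
    """Return [n_1, ..., n_{K-1}] under the assumed formula
    n_k = ceil((n - C(K, 2)) / (C(K) * (K + 1 - k)))."""
    if c_k is None:
        c_k = k_arms * math.log(k_arms)  # assumption: C(K) taken as exactly K log K
    num_pairs = math.comb(k_arms, 2)     # number of arm pairs (i, j) with i < j
    return [
        math.ceil((n - num_pairs) / (c_k * (k_arms + 1 - k)))
        for k in range(1, k_arms)
    ]


# Hypothetical budget and number of arms.
lengths = phase_lengths(n=1000, k_arms=5)
# Uniform baseline: every arm pair gets the same share of the budget.
uniform_per_pair = 1000 // math.comb(5, 2)
```

Because the denominator K + 1 − k shrinks as k grows, the phase lengths increase with k, so pairs that survive longer accumulate more samples than pairs removed early.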
Thanks. Questions?