Data Poisoning Attacks on Stochastic Bandits
Fang Liu and Ness Shroff
Outline
• Background
  § What are bandits?
  § Motivations
• Data poisoning attacks on stochastic bandits
  § Offline model
  § Online model
  § Simulation results
• Conclusions and discussions
What are bandits?
• Repeated game between an agent and an environment
[Figure: agent-environment loop (the agent sends an action, the environment returns a reward); diagram labels: Reinforcement learning, Online learning, Stochastic, MAB]
What are bandits?
• Model
  § At each (discrete) time $t$, the agent plays an action $A_t$ from a set of $K$ actions
  § The agent receives reward $Y_{A_t,t}$, drawn from the unknown distribution of action $A_t$
• Performance measure
  § Regret (loss): $R(T) = \mathbb{E}\left[ \max_{i \in [K]} \sum_{t=1}^{T} Y_{i,t} - \sum_{t=1}^{T} Y_{A_t,t} \right]$
  § Minimize regret = maximize total reward
• Regret lower bounds
  § Problem-dependent: $\Omega\left( \sum_{i:\, \mu_i < \mu^*} \frac{\mu^* - \mu_i}{\mathrm{KL}(\mu_i, \mu^*)} \log T \right)$, where $\mu_i$ is the expected reward of action $i$ and $\mu^* = \max_i \mu_i$
  § Problem-independent: $\Omega\left( \sqrt{KT} \right)$
• Popular algorithms
  § Upper Confidence Bounds (UCB), Thompson Sampling, epsilon-greedy
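The model and regret definition above can be illustrated with a minimal simulation sketch. It makes several assumptions that are not in the slides: rewards are Bernoulli, the agent runs the standard UCB1 index (empirical mean plus $\sqrt{2 \ln t / n_i}$), and the quantity tracked is the pseudo-regret $\sum_t (\mu^* - \mu_{A_t})$, a common proxy for $R(T)$ rather than the exact expression on the slide. The function name `ucb1_pseudo_regret` and its parameters are hypothetical.

```python
import math
import random


def ucb1_pseudo_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (an assumed reward model).

    Returns (pseudo_regret, counts), where pseudo_regret is
    sum_t (mu* - mu_{A_t}) and counts[i] is the number of pulls of arm i.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # n_i: number of times arm i was played
    sums = [0.0] * k   # total reward collected from arm i
    best = max(means)  # mu*: mean of the optimal arm
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: play every arm once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Draw the reward Y_{A_t,t} from the (unknown to the agent) arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]

    return regret, counts


if __name__ == "__main__":
    regret, counts = ucb1_pseudo_regret([0.2, 0.5, 0.8], horizon=2000, seed=1)
    print("pseudo-regret:", regret)
    print("pulls per arm:", counts)
```

With well-separated means, the pull counts should concentrate on the best arm, so the pseudo-regret grows much more slowly than the horizon. That is the behavior the logarithmic problem-dependent bound describes.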
Motivations
• Adversarial learning is well studied in deep learning
• How robust are bandits?
• Many applications
  § Clinical trials
  § Recommendation systems
  § Ad placement
  § A/B testing
  § A component of game-playing algorithms (MCTS), e.g. AlphaGo
  § Resource allocation
• A stealthy attack is hard to detect because the agent observes only limited feedback