Safe Exploration for Interactive Machine Learning Matteo Turchetta, Felix Berkenkamp, Andreas Krause
Interactive Machine Learning
• Agent can query noisy values of an unknown function: a query x returns f(x) + w
• Use data to make informed queries
• Available queries may depend on previous ones: model the dependency with a directed graph
• Includes: Bayesian optimization, active learning, and exploration of deterministic Markov decision processes
Icon made by Freepik, Good Ware from www.flaticon.com
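A minimal sketch of the setting on this slide, assuming Gaussian noise and a small hand-made directed graph. The names `make_noisy_oracle`, `available_queries`, and `graph`, the noise level, and the rule for picking the next query are all hypothetical illustrations, not taken from the talk.

```python
import random


def make_noisy_oracle(f, noise_std=0.1, seed=0):
    """Return a function that answers a query x with f(x) + w, w ~ N(0, noise_std^2)."""
    rng = random.Random(seed)

    def oracle(x):
        return f(x) + rng.gauss(0.0, noise_std)

    return oracle


def available_queries(graph, visited):
    """Unvisited nodes that are direct successors of an already-visited node."""
    successors = set()
    for node in visited:
        successors.update(graph.get(node, ()))
    return successors - set(visited)


# Hypothetical chain 0 -> 1 -> 2 -> 3, e.g. states of a deterministic MDP.
graph = {0: [1], 1: [2], 2: [3], 3: []}
oracle = make_noisy_oracle(lambda x: -(x - 2) ** 2)

visited = [0]
observations = {0: oracle(0)}
for _ in range(3):
    candidates = available_queries(graph, visited)
    if not candidates:
        break
    x = min(candidates)  # placeholder rule; an informed agent would use the data
    observations[x] = oracle(x)
    visited.append(x)
```

With an edge from every node to every other node, every query is always available, which corresponds to the Bayesian optimization and active learning cases listed above.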
Safety-constrained interactive machine learning
• A query x returns f(x) + w
• Unknown safety constraint q(x) ≥ 0 that must be satisfied at all times; queries with q(x) < 0 are unsafe
• Encompasses many problems: therapy design, Mars exploration, model-free RL
[Sui et al. 2015], [Turchetta et al. 2016], [Berkenkamp et al. 2016], [Sui et al. 2018], [Wachi et al. 2018]
Icon made by Smashicons from www.flaticon.com
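One way to write the requirement on this slide compactly, as a sketch: the iteration index t and the observation symbol y_t are notation assumed here for illustration and do not appear on the slide.

```latex
\begin{align}
  y_t &= f(x_t) + w_t, \\
  q(x_t) &\ge 0 \quad \text{for all } t \ge 1 .
\end{align}
```

The second line is what "at all times" means: the constraint applies to every query the agent makes while learning, not only to the final decision.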
Existing approaches
• Build a conservative estimate S_t of the decisions that are safe to evaluate
• Uniformly reduce uncertainty on the boundary G_t of this region
• Treating the expansion of the safe set as a proxy objective can be wasteful
Example: 1D optimization task, StageOPT [Sui et al. 2018]
• Many unnecessary samples when the optimum has already been found
[Figure: 1D example with f(s) ≡ q(s) and safety threshold q(s) ≥ 0; labels: domain D, S_t, G_t, # evaluations]
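The two ingredients named above, a conservative safe set and its boundary, can be sketched on a 1D grid. This is a simplified illustration under an assumed Lipschitz constant and an assumed pessimistic lower bound on q at known-safe points. It is not the StageOPT implementation, and the names `conservative_safe_set`, `boundary`, `lower_bound`, and `lipschitz` are hypothetical.

```python
def conservative_safe_set(domain, lower_bound, safe_set, lipschitz):
    """One expansion step of the safe set.

    A point x is certified safe if some already-safe point s guarantees
    q(x) >= lower_bound[s] - lipschitz * |x - s| >= 0.
    """
    new_safe = set(safe_set)
    for x in domain:
        for s in safe_set:
            if lower_bound[s] - lipschitz * abs(x - s) >= 0:
                new_safe.add(x)
                break
    return new_safe


def boundary(domain, safe_set):
    """Safe points with at least one neighbouring grid point outside the safe set."""
    ordered = sorted(domain)
    index = {x: i for i, x in enumerate(ordered)}
    edge = set()
    for x in safe_set:
        i = index[x]
        neighbours = ordered[max(i - 1, 0): i + 2]
        if any(n not in safe_set for n in neighbours):
            edge.add(x)
    return edge


domain = [0, 1, 2, 3, 4, 5]
lower_bound = {2: 1.0}  # assumed pessimistic bound on q at the safe seed s = 2
S = conservative_safe_set(domain, lower_bound, {2}, lipschitz=1.0)
G = boundary(domain, S)
```

Here the seed at 2 certifies its neighbours 1 and 3, and those two points form the boundary. Sampling every boundary point until the set stops growing, regardless of where the optimum lies, is the behaviour the slide describes as potentially wasteful.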