Online Meta-Learning Chelsea Finn*, Aravind Rajeswaran*, Sham Kakade, Sergey Levine
Deep networks + large datasets =
In many practical situations: learn a new task with only a few datapoints.

Meta-Learning (Schmidhuber et al. '87, Bengio et al. '92): given an i.i.d. task distribution, learn a new task efficiently — slow learning across tasks, rapid learning within each new task.

More realistically: tasks arrive sequentially over time.
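The "slow learning / rapid learning" split above is the idea behind gradient-based meta-learning such as MAML (Finn et al. '17): an outer loop slowly learns an initialization from which a few inner gradient steps rapidly adapt to each new task. A minimal sketch on toy 1-D regression, using a first-order (Reptile-style) outer update for brevity; the task family, step sizes, and iteration counts are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(theta, X, y):
    """Gradient of mean-squared error for a linear model y ~ X @ theta."""
    return 2 * X.T @ (X @ theta - y) / len(y)

def sample_task():
    """Toy task family (assumed): targets y = a * x with a random slope a."""
    a = rng.uniform(-2, 2)
    X = rng.normal(size=(10, 1))
    return X, a * X[:, 0]

def adapt(theta, X, y, inner_lr=0.1, steps=3):
    """Rapid learning: a few gradient steps from the meta-learned init."""
    for _ in range(steps):
        theta = theta - inner_lr * loss_grad(theta, X, y)
    return theta

# Slow learning: outer loop moves the initialization toward each task's
# adapted solution (first-order / Reptile-style update, for brevity).
theta = np.zeros(1)
for _ in range(500):
    X, y = sample_task()
    adapted = adapt(theta, X, y)
    theta = theta + 0.1 * (adapted - theta)
```

After meta-training, `adapt` only needs the 10 datapoints of a new task to improve on the shared initialization, which is the "few datapoints" regime the slide describes.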
Online Learning (Hannan '57, Zinkevich '03): perform a sequence of tasks over time while minimizing static regret; each round is judged on zero-shot performance.

Online Meta-Learning (this work): efficiently learn a sequence of tasks from a non-stationary distribution; each round is judged on performance after seeing a small amount of data.
The Online Meta-Learning Setting
Space of parameters $\theta \in \Theta \subseteq \mathbb{R}^d$ and loss functions $\ell : \Theta \rightarrow \mathbb{R}$.

For round $t \in \{1, 2, \ldots, \infty\}$:
1. World picks a loss function $\ell_t(\cdot)$
2. Agent picks $\theta_t$ without knowledge of $\ell_t$
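The round-based protocol can be sketched as a simple interaction loop. The quadratic losses and the follow-the-leader agent below are placeholder choices to make the setting concrete — the paper itself proposes a follow-the-meta-leader style algorithm that meta-trains on all previously seen tasks — and all names and constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 2, 100

centers = rng.normal(size=(T, d))  # world's hidden per-round task parameters
theta = np.zeros(d)                # agent's initial choice
losses, past = [], []

for t in range(T):
    # 2. Agent picks theta_t WITHOUT knowledge of loss_t.
    #    Follow-the-leader on past quadratic losses has a closed form:
    #    the mean of the previously revealed centers.
    if past:
        theta = np.mean(past, axis=0)
    # 1. World picks loss_t(theta) = ||theta - c_t||^2 and reveals it.
    c = centers[t]
    losses.append(float(np.sum((theta - c) ** 2)))
    past.append(c)

# Static regret: cumulative loss minus that of the best fixed theta
# in hindsight (for these losses, the mean of all centers).
best = np.mean(centers, axis=0)
regret = sum(losses) - float(np.sum((centers - best) ** 2))
```

With i.i.d. rounds this regret grows slowly; the online meta-learning setting drops the i.i.d. assumption and asks the agent to adapt from a small amount of per-round data instead of playing a single fixed $\theta_t$.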