
Overview of Robot Decision Making. Prof. Yuke Zhu, Fall 2020, CS391R



  1. Overview of Robot Decision Making, Prof. Yuke Zhu, Fall 2020. CS391R: Robot Learning (Fall 2020)

  2. Today’s Agenda
     ● What is Robot Decision Making?
     ● Mathematical Framework of Sequential Decision Making
     ● Learning for Decision Making
       ○ reinforcement learning (model-free vs. model-based)
       ○ imitation learning (behavior cloning, DAgger, IRL, and adversarial learning)
     ● Research Frontiers
       ○ compositionality, learning to learn, …

  3. Robot learning aims to close the perception-action loop.
     [Figure: three examples, each labeled Perceive and Act]
     [Sa et al. IROS 2014] [Levine et al. JMLR 2016] [Bohg et al. ICRA 2018]

  4. What is Robot Decision Making?
     Choosing the action a robot should perform in the physical world…
     ● Assistive Robots (Companions)
     ● Outer Space (Explorers)
     ● Autonomous Driving (Transporters)

  5. What is Robot Decision Making?
     Choosing the action a robot should perform in the physical world…
     • Behaviors can’t be easily programmed
     • Imperfect sensing and actuation
     • Safety and robustness under uncertainty
     [Source: Boston Dynamics]

  6. Robot Decision Making vs. Playing Games
     Robot decision making is embodied, active, and environmentally situated.
     [Source: Boston Dynamics] [Source: DeepMind’s AlphaGo]

  7. Before We Dive In…
     ● This lecture is intended to provide a high-level, bird's-eye view of (robot) decision making.
     ● The goal is not to go through all technical details:
       ○ We will revisit them through paper reading in the following weeks.
       ○ Study the parts that you are less familiar with using online resources.
     ● Take related courses and read textbooks to learn this subject in depth (see the last slide).

  8. Mathematical Framework: Markov Decision Processes
     A Markov Decision Process is defined by a tuple $\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$
     ● $\mathcal{S}$: state space ($s_t \in \mathcal{S}$)
     ● $\mathcal{A}$: action space ($a_t \in \mathcal{A}$)
     ● $\mathcal{P}$: transition probability, $\mathcal{P}^{a}_{ss'} = \Pr[s_{t+1} \mid s_t, a_t]$
     ● $\mathcal{R}$: reward function, $r(s, a) = \mathbb{E}[r_{t+1} \mid s = s_t, a = a_t]$
     ● $\gamma$: a discount factor, $\gamma \in [0, 1]$
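
The five-element tuple above maps naturally onto a small data structure. The following is a minimal Python sketch under stated assumptions, not code from the lecture: the class name `MDP`, the helpers `rollout` and `discounted_return`, and the toy two-state robot ("far"/"near" from a target, with "wait"/"move" actions) are all hypothetical names and numbers chosen for illustration. The reward table stores the expected reward r(s, a), so `step` returns that expectation rather than a sampled reward.

```python
import math
import random
from dataclasses import dataclass
from itertools import product


@dataclass
class MDP:
    """Finite MDP <S, A, P, R, gamma> (illustrative container, not from the slides)."""
    states: tuple      # S: state space
    actions: tuple     # A: action space
    transition: dict   # P: (s, a) -> {s_next: Pr[s_next | s, a]}
    reward: dict       # R: (s, a) -> expected reward r(s, a)
    gamma: float       # discount factor in [0, 1]

    def validate(self):
        """Check that gamma is in range and each P(. | s, a) sums to 1."""
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        for s, a in product(self.states, self.actions):
            if (s, a) not in self.reward:
                raise ValueError(f"missing reward for {(s, a)}")
            if not math.isclose(sum(self.transition[(s, a)].values()), 1.0):
                raise ValueError(f"P(. | {s}, {a}) does not sum to 1")

    def step(self, s, a, rng):
        """Sample s_next from P(. | s, a) and return it with r(s, a)."""
        dist = self.transition[(s, a)]
        s_next = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        return s_next, self.reward[(s, a)]


def rollout(mdp, policy, s0, horizon, rng):
    """Follow a deterministic policy (dict: state -> action) for `horizon` steps."""
    rewards, s = [], s0
    for _ in range(horizon):
        s, r = mdp.step(s, policy[s], rng)
        rewards.append(r)
    return rewards


def discounted_return(rewards, gamma):
    """Compute sum_k gamma**k * rewards[k] by backward accumulation."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def make_toy_mdp():
    """Hypothetical toy problem: a robot is 'far' from or 'near' a target."""
    return MDP(
        states=("far", "near"),
        actions=("wait", "move"),
        transition={
            ("far", "wait"): {"far": 1.0},
            ("far", "move"): {"near": 0.8, "far": 0.2},   # motion can fail
            ("near", "wait"): {"near": 0.9, "far": 0.1},  # robot may drift away
            ("near", "move"): {"near": 1.0},
        },
        reward={
            ("far", "wait"): 0.0,
            ("far", "move"): -1.0,   # moving costs energy
            ("near", "wait"): 1.0,   # staying near the target is rewarded
            ("near", "move"): -1.0,
        },
        gamma=0.9,
    )


if __name__ == "__main__":
    mdp = make_toy_mdp()
    mdp.validate()
    policy = {"far": "move", "near": "wait"}
    rewards = rollout(mdp, policy, "far", horizon=10, rng=random.Random(0))
    print(rewards, discounted_return(rewards, mdp.gamma))
```

Storing the transition model as a dictionary keyed by (s, a) keeps the sketch close to the notation on the slide; for larger state spaces an array indexed by state and action would likely be a better fit.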
