fractional underdamped langevin dynamics

FRACTIONAL UNDERDAMPED LANGEVIN DYNAMICS: Umut Şimşekli, LTCI, Télécom Paris



  1. FRACTIONAL UNDERDAMPED LANGEVIN DYNAMICS: RETARGETING SGD WITH MOMENTUM UNDER HEAVY-TAILED GRADIENT NOISE. Umut Şimşekli* (LTCI, Télécom Paris, Institut Polytechnique de Paris), Lingjiong Zhu* (Florida State University), Yee Whye Teh (University of Oxford), Mert Gürbüzbalaban (Rutgers University). *equal contribution. International Conference on Machine Learning, 2020

  2. DEEP LEARNING & SGD-MOMENTUM
     § Deep learning (in general): $x^\star = \arg\min_{x \in \mathbb{R}^d} \left\{ f(x) \triangleq \frac{1}{n} \sum_{i=1}^{n} f^{(i)}(x) \right\}$, where $x$ are the network weights, $f$ is the non-convex cost function, and $n$ is the number of data points.
     § Optimization algorithm: Stochastic Gradient Descent with momentum (SGD-m):
       $\tilde{v}_{k+1} = \tilde{\gamma}\,\tilde{v}_k - \tilde{\eta}\,\nabla \tilde{f}_{k+1}(x_k)$, $x_{k+1} = x_k + \tilde{v}_{k+1}$,
       with stochastic gradient $\nabla \tilde{f}_k(x) \triangleq \frac{1}{b} \sum_{i \in \Omega_k} \nabla f^{(i)}(x)$.
       Here $\tilde{\eta}$ is the step-size (learning rate), $\tilde{\gamma}$ the momentum decay, $\tilde{v}_k$ the velocity (momentum), $\Omega_k$ the minibatch, and $b$ the minibatch size.

  3. UNDERSTANDING SGD-M
     § Theory: better established for convex problems; still in an early phase for non-convex problems.
     § Useful approach for analysis → Stochastic Differential Equations (SDE). Gradient noise: $U_k(x) \triangleq \nabla \tilde{f}_k(x) - \nabla f(x)$.
     § If we assume $\mathbb{E}\|U_k(x)\|^2 < \infty$ and invoke the CLT: $U_k \sim$ Gaussian.
     § SGD-m → Euler-Maruyama discretization of the SDE:
       $\mathrm{d}v_t = -(\gamma v_t + \nabla f(x_t))\,\mathrm{d}t + \sqrt{2\gamma/\beta}\,\mathrm{d}B_t$, $\mathrm{d}x_t = v_t\,\mathrm{d}t$,
       where $\gamma$ is the friction, $\beta$ the inverse temperature, and $B_t$ a Brownian motion. This is the underdamped (a.k.a. kinetic) Langevin dynamics (Dalalyan & Riou-Durand '18; Gao et al. '18).
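The link between the SGD-m recursion on slide 2 and the underdamped Langevin SDE on slide 3 can be checked numerically. The sketch below is illustrative and not taken from the slides. It assumes a toy quadratic objective (`grad_f`), a step size `h`, and a discretization in which the position update uses the freshly updated velocity (a semi-implicit variant of Euler-Maruyama, chosen here because it mirrors the SGD-m update order). Under the substitution $\tilde{v} = h v$, $\tilde{\gamma} = 1 - h\gamma$, $\tilde{\eta} = h^2$, and Gaussian gradient noise $U_k = -\sqrt{2\gamma/(\beta h)}\,\xi_k$, the two recursions should produce the same iterates. All function names are hypothetical.

```python
import math
import random


def grad_f(x):
    # Assumed toy objective f(x) = 0.5 * ||x||^2, so grad f(x) = x.
    return list(x)


def sgdm_step(x, v_tilde, grad, gamma_tilde, eta_tilde):
    """One SGD-with-momentum update: v <- gamma~ * v - eta~ * g, then x <- x + v."""
    v_new = [gamma_tilde * vi - eta_tilde * gi for vi, gi in zip(v_tilde, grad)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new


def uld_step(x, v, h, gamma, beta, z):
    """One discretization step of underdamped Langevin dynamics with step h.

    z holds standard normal draws; sqrt(h) * z plays the role of the
    Brownian increment. The position uses the updated velocity
    (an assumption made to match the SGD-m update order).
    """
    scale = math.sqrt(2.0 * gamma * h / beta)
    g = grad_f(x)
    v_new = [vi - h * (gamma * vi + gi) + scale * zi
             for vi, gi, zi in zip(v, g, z)]
    x_new = [xi + h * vi for xi, vi in zip(x, v_new)]
    return x_new, v_new


def max_gap(steps=5, h=0.1, gamma=1.0, beta=10.0, seed=0):
    """Run both recursions on shared noise and return the largest iterate gap."""
    rng = random.Random(seed)
    x_a, v = [1.0, -2.0], [0.0, 0.0]
    x_b, v_tilde = list(x_a), [h * vi for vi in v]
    scale = math.sqrt(2.0 * gamma * h / beta)
    gap = 0.0
    for _ in range(steps):
        z = [rng.gauss(0.0, 1.0) for _ in x_a]
        # Stochastic gradient = true gradient + Gaussian noise U_k.
        noisy_grad = [gi - (scale / h) * zi
                      for gi, zi in zip(grad_f(x_b), z)]
        x_a, v = uld_step(x_a, v, h, gamma, beta, z)
        x_b, v_tilde = sgdm_step(x_b, v_tilde, noisy_grad,
                                 1.0 - h * gamma, h * h)
        gap = max(gap, max(abs(a - b) for a, b in zip(x_a, x_b)))
    return gap
```

Calling `max_gap()` should return a value at the level of floating-point round-off, which is the sense in which SGD-m with Gaussian gradient noise can be read as a discretization of the SDE. The equivalence relies on the finite-variance (Gaussian) noise assumption stated on the slide.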
