Multi-Dimensional LSTM Networks for Video Prediction

Wonmin Byeon
NVIDIA Research
March 29, 2018

Convolutional Neural Networks
• Semantic Segmentation: SegNet [Badrinarayanan16]
• Biomedical Image Segmentation: U-Net [Ronneberger15]
• Page Segmentation: Fully-CNN [Wick18]
• Fluid Simulation / Pressure Solve: CNN [Tompson17]

Convolutional Neural Networks
• Requires a lot of computation
  - Each window can be computed in parallel
  - An efficient GPU implementation is possible
• Has a fixed receptive field size
• Perceives only a small local context around each pixel
Solutions?
(Images from Zheng's ECCV16 tutorial)
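The fixed receptive field can be made concrete with the standard receptive-field recurrence for a stack of convolutions. The sketch below is a minimal illustration, not taken from the talk; the function name `receptive_field` and the `(kernel_size, stride, dilation)` description of each layer are assumptions made for this example.

```python
def receptive_field(layers):
    """Receptive field, in input pixels along one axis, of a stack of conv layers.

    layers: list of (kernel_size, stride, dilation) tuples, first layer first.
    """
    rf = 1    # one output unit of an empty stack sees exactly one input pixel
    jump = 1  # input-pixel distance between neighbouring units of the current layer
    for kernel_size, stride, dilation in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Three stacked 3x3 convolutions with stride 1 see only a 7x7 patch.
print(receptive_field([(3, 1, 1)] * 3))  # 7
# A stride of 2 in the first layer doubles what each later layer adds.
print(receptive_field([(3, 2, 1), (3, 1, 1), (3, 1, 1)]))  # 11
```

With stride 1, every 3x3 layer adds only 2 pixels, so 256 such layers would be needed before a single output could see across a 512-pixel-wide image.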

Convolutional Neural Networks: Solutions?
• Up-pooling (deconvolution): DeconvNet [Noh16]
• Adding a Conditional Random Field (CRF): DeepLab [Chen16]
• Using dilated/atrous convolutions: Dilated Convolutions [Yu15], DeepLab V2 [Chen16]
(Animation from https://github.com/vdumoulin/conv_arithmetic)
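Dilated (atrous) convolutions widen the receptive field without adding weights by spacing the kernel taps `dilation` pixels apart. A minimal 1-D sketch in plain Python, assuming the "valid" cross-correlation form that most deep learning libraries call convolution; `dilated_conv1d` is a hypothetical helper, not an API from the cited works:

```python
def dilated_conv1d(x, w, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation form) of sequence x with kernel w."""
    span = (len(w) - 1) * dilation + 1  # input pixels covered by a single output
    return [
        sum(w[k] * x[i + k * dilation] for k in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


x = list(range(10))
w = [1, 1, 1]
print(dilated_conv1d(x, w, dilation=1))  # [3, 6, 9, 12, 15, 18, 21, 24]
print(dilated_conv1d(x, w, dilation=2))  # [6, 9, 12, 15, 18, 21]
```

The same three weights cover 3 input pixels at dilation 1 and 5 input pixels at dilation 2.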

Convolutional Neural Networks: Solutions?
• Using dilated convolutions and going deeper: DeepLab V3 [Chen17]
• Fusing multiple resolutions: RefineNet [Lin16]
• Adopting large kernels: Global-CNN [Peng17]
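Combining dilation with depth is what lets the receptive field grow quickly. Under the simplifying assumption that every layer is a stride-1 convolution with kernel size k, each layer adds (k - 1) * d pixels, so doubling the dilation rate at every layer grows the receptive field exponentially with depth instead of linearly. The schedule 1, 2, 4, 8 below is an illustrative choice, not the configuration of DeepLab V3, and `receptive_field_stride1` is a hypothetical helper:

```python
def receptive_field_stride1(kernel_size, dilations):
    # Each stride-1 layer with dilation d widens the receptive field by (k - 1) * d.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


plain = receptive_field_stride1(3, [1, 1, 1, 1])    # four ordinary 3x3 layers
dilated = receptive_field_stride1(3, [1, 2, 4, 8])  # same depth and weights, doubling dilation
print(plain, dilated)  # 9 31
```

In general, L doubling layers of 3x3 kernels reach 2^(L+1) - 1 pixels, while L ordinary layers reach only 2L + 1, with the same number of weights in both cases.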

How can we efficiently capture global/long-range context?
(Image from http://staffwww.dcs.shef.ac.uk/people/H.Lu/feeler.html)

Long Short-Term Memory Recurrent Networks

LSTM Networks for Sequence Learning
• Speech [Graves05, Graves06]
• Handwriting [Liwicki07, Graves09]

Sequence Classification Task with Dependencies
• Learn a mapping $F: X^* \to Y^*$ from an input sequence $x = x_1 x_2 \ldots \in X^*$ to an output sequence $y = y_1 y_2 \ldots \in Y^*$
[Figure: input sequence x_1 … x_4 mapped through hidden states h_1, … to output sequence y_1 … y_4]
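The key property of this task is that each output y_t may depend on all earlier inputs, which a recurrent hidden state h_t provides. The NumPy sketch below uses a plain tanh recurrent layer (deliberately simpler than an LSTM) with random weights; the sizes, the weight scale, and the function name `label_sequence` are arbitrary assumptions for illustration:

```python
import numpy as np


def label_sequence(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Map inputs x_1..x_T (rows of xs) to outputs y_1..y_T with a simple recurrent layer."""
    h = np.zeros(W_hh.shape[0])
    ys = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # h_t summarises x_1..x_t
        ys.append(W_hy @ h + b_y)               # y_t is read out from h_t
    return np.stack(ys)


rng = np.random.default_rng(0)
D, H, K, T = 3, 5, 2, 4  # input size, hidden size, output size, sequence length
params = (0.5 * rng.normal(size=(H, D)), 0.5 * rng.normal(size=(H, H)),
          0.5 * rng.normal(size=(K, H)), np.zeros(H), np.zeros(K))
xs = rng.normal(size=(T, D))
ys = label_sequence(xs, *params)
print(ys.shape)  # (4, 2)

# Changing x_1 alters the last output y_4; changing x_4 cannot alter y_1.
xs_first = xs.copy()
xs_first[0] += 1.0
xs_last = xs.copy()
xs_last[-1] += 1.0
print(np.allclose(label_sequence(xs_first, *params)[-1], ys[-1]))
print(np.array_equal(label_sequence(xs_last, *params)[0], ys[0]))  # True
```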

1-Dimensional LSTM Networks
• Standard LSTM [Hochreiter97, Gers99]
• Bidirectional LSTM [Graves05, Chen05]
[Figure: in the standard LSTM, inputs x_{t-1}, x_t, x_{t+1} pass through a recurrent LSTM hidden layer (state h_{t-1} carried forward) to outputs y_t; the bidirectional LSTM adds a second LSTM layer running backward (state h_{t+1}), and both layers feed each output y_t]
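Both variants can be sketched in a few lines of NumPy. The sketch assumes the common LSTM formulation with input, forget, and output gates and no peephole connections; the stacked gate order, the weight scale, and the helper names (`init_lstm`, `lstm_forward`, `blstm_forward`) are illustrative choices, not the exact models of the cited papers. The bidirectional layer runs a second LSTM over the reversed sequence and concatenates the two hidden states at each step:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(input_size, hidden_size, rng):
    # One weight matrix for all four gate blocks (input, forget, output, candidate) on [x_t, h_{t-1}].
    W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
    b = np.zeros(4 * hidden_size)
    return W, b


def lstm_forward(xs, W, b):
    """Run an LSTM with forget gate over xs of shape (T, D); return h_1..h_T as (T, H)."""
    H = W.shape[0] // 4
    h, c = np.zeros(H), np.zeros(H)
    hs = []
    for x in xs:
        z = W @ np.concatenate([x, h]) + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g       # cell state: keep part of the old memory, write new content
        h = o * np.tanh(c)      # hidden state passed to the next step and the output layer
        hs.append(h)
    return np.stack(hs)


def blstm_forward(xs, fwd, bwd):
    """Bidirectional LSTM: concatenate a left-to-right and a right-to-left pass at each step."""
    h_fwd = lstm_forward(xs, *fwd)
    h_bwd = lstm_forward(xs[::-1], *bwd)[::-1]  # re-reverse so row t belongs to x_t
    return np.concatenate([h_fwd, h_bwd], axis=1)


rng = np.random.default_rng(0)
T, D, H = 6, 3, 4
fwd, bwd = init_lstm(D, H, rng), init_lstm(D, H, rng)
xs = rng.normal(size=(T, D))
out = blstm_forward(xs, fwd, bwd)
print(out.shape)  # (6, 8)
```

Because the backward LSTM reads the sequence from the end, the output at the first step already depends on the last input, which a unidirectional LSTM cannot provide.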

Multi-Dimensional LSTM Networks
Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]

2-Dimensional LSTM Networks for Images
[Figure: red marks the current pixel]
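In a 2D LSTM sweep from the top-left corner, the current pixel (i, j) receives the states of its top neighbour (i-1, j) and left neighbour (i, j-1), so by induction it has seen every pixel above and to the left of it. The pure-Python sketch below counts that context for one sweep and for all four corner-to-corner sweeps; the function `sweep_context` and the 5x5 grid are hypothetical, chosen only to illustrate the scanning order:

```python
from itertools import product


def sweep_context(i, j, H, W, di, dj):
    """Pixels whose information reaches (i, j) in one 2-D LSTM sweep.

    A sweep moving in direction (di, dj) (each +1 or -1) updates (i, j) from its
    neighbours (i - di, j) and (i, j - dj), so (i, j) sees every pixel behind it in both axes.
    """
    rows = range(0, i + 1) if di == 1 else range(i, H)
    cols = range(0, j + 1) if dj == 1 else range(j, W)
    return set(product(rows, cols))


H, W = 5, 5
i, j = 2, 2
top_left = sweep_context(i, j, H, W, 1, 1)
print(len(top_left))  # 9
all_four = set().union(*(sweep_context(i, j, H, W, di, dj) for di in (1, -1) for dj in (1, -1)))
print(len(all_four))  # 25
```

One sweep covers only a quadrant of the image, but the union of the four sweeps covers every pixel.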

2-Dimensional LSTM Networks for Images
• Perceives the entire spatio-temporal context of each pixel in a few sweeps through all pixels
• Requires fewer parameters to take both local and global context into account
• End-to-end learning: no pre- or post-processing
[Figure: Input I_k → LSTM layer (four LSTM blocks, labeled 3x1x1) → Hidden layer → Output]
Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]
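The bullets above can be illustrated with a minimal NumPy sketch of a 2D LSTM layer. It assumes the usual multi-dimensional LSTM recurrence (one forget gate per spatial neighbour) and assumes the four directional sweeps are combined by summation; the gate layout, the random weights, and the names `lstm2d_sweep` and `lstm2d_four_directions` are hypothetical, and the sketch omits parts of [Byeon15] such as the hidden layer and training:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm2d_sweep(img, W, b):
    """One 2-D LSTM sweep starting at the top-left corner.

    img: array of shape (rows, cols, D). Position (i, j) combines its input with the
    hidden and cell states of the top (i-1, j) and left (i, j-1) neighbours, with one
    forget gate per neighbour. Returns hidden states of shape (rows, cols, N).
    """
    rows, cols, _ = img.shape
    N = W.shape[0] // 5  # gates: input, forget-top, forget-left, output, candidate
    h = np.zeros((rows + 1, cols + 1, N))  # index 0 is zero padding at the borders
    c = np.zeros((rows + 1, cols + 1, N))
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            z = W @ np.concatenate([img[i - 1, j - 1], h[i - 1, j], h[i, j - 1]]) + b
            in_g, f_top, f_left, out_g = (sigmoid(z[k * N:(k + 1) * N]) for k in range(4))
            cand = np.tanh(z[4 * N:])
            c[i, j] = f_top * c[i - 1, j] + f_left * c[i, j - 1] + in_g * cand
            h[i, j] = out_g * np.tanh(c[i, j])
    return h[1:, 1:]


def lstm2d_four_directions(img, params):
    """Sweep from all four corners (by flipping the image) and sum the aligned outputs."""
    total = 0.0
    flips = [(False, False), (False, True), (True, False), (True, True)]
    for (flip_rows, flip_cols), (W, b) in zip(flips, params):
        x = img[::-1] if flip_rows else img
        x = x[:, ::-1] if flip_cols else x
        h = lstm2d_sweep(x, W, b)
        h = h[::-1] if flip_rows else h  # undo the flips so h[i, j] belongs to pixel (i, j)
        h = h[:, ::-1] if flip_cols else h
        total = total + h
    return total


rng = np.random.default_rng(0)
rows, cols, D, N = 4, 5, 3, 2
params = [(rng.normal(scale=0.5, size=(5 * N, D + 2 * N)), np.zeros(5 * N)) for _ in range(4)]
img = rng.normal(size=(rows, cols, D))
out = lstm2d_four_directions(img, params)
print(out.shape)  # (4, 5, 2)
```

Perturbing the bottom-right pixel of `img` and rerunning shows the first bullet at work: the four-direction output at the top-left pixel changes, whereas the output of a single top-left sweep at that pixel stays exactly the same.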
