Multi-Dimensional LSTM Networks for Video Prediction
Wonmin Byeon
NVIDIA Research
March 29, 2018

Convolutional Neural Networks

Applications:
• Semantic segmentation: SegNet [Badrinarayanan16]
• Biomedical image segmentation: U-Net [Ronneberger15]
• Page segmentation: Fully-CNN [Wick18]
• Fluid simulation / pressure solve: CNN [Tompson17]

Properties:
• Needs a lot of computation
  Each window can be computed in parallel
  An efficient GPU implementation is possible
• Has a fixed-size receptive field
• Perceives only a small local context around each pixel

Solutions?

Images from Zheng’s ECCV16 tutorial

Convolutional Neural Networks: solutions?
• Up-pooling (deconvolution): DeconvNet [Noh16]
• Adding a Conditional Random Field (CRF): DeepLab [Chen16]
• Using dilated/atrous convolutions: Dilated Convolutions [Yu15], DeepLab V2 [Chen16]
  Animation from https://github.com/vdumoulin/conv_arithmetic
• Using dilated convolutions and going deeper: DeepLab V3 [Chen17]
• Fusing multiple resolutions: RefineNet [Lin16]
• Adopting large kernels: Global-CNN [Peng17]
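
To make the receptive-field argument concrete, here is a minimal sketch of the standard receptive-field arithmetic for a stack of convolution layers. The function name `receptive_field` and the two example layer stacks are illustrative assumptions, not taken from the cited networks. It compares four plain 3x3 layers with four 3x3 layers whose dilation doubles at each layer, the pattern commonly used with dilated convolutions.

```python
def receptive_field(layers):
    """Receptive field (input pixels along one axis) of stacked conv layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples, first layer first.
    """
    rf = 1    # a single output unit initially covers one input pixel
    jump = 1  # distance in input pixels between neighbouring units of the current layer
    for kernel, stride, dilation in layers:
        # A dilated kernel spans (kernel - 1) * dilation steps of the current grid.
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical stacks, chosen only for illustration.
plain = [(3, 1, 1)] * 4                      # four ordinary 3x3 layers
dilated = [(3, 1, d) for d in (1, 2, 4, 8)]  # dilation doubles per layer

print(receptive_field(plain))    # 9
print(receptive_field(dilated))  # 31
```

With the same number of layers and parameters, dilation widens the context from 9 to 31 pixels per axis, but the extent is still fixed by the architecture rather than by the image.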

How can we efficiently capture global/long-range context?

Image from http://staffwww.dcs.shef.ac.uk/people/H.Lu/feeler.html

Long Short-Term Memory Recurrent Networks

LSTM Networks for Sequence Learning
• Speech [Graves05, Graves06]
• Handwriting [Liwicki07, Graves09]

Sequence Classification Task with Dependencies
Mapping an input sequence x* = x_1 x_2 ... to an output sequence y* = y_1 y_2 ...
F : x ∈ X* → y ∈ Y*
[Figure: input sequence x_1 ... x_4 mapped to output sequence y_1 ... y_4 through a hidden state h_1 ...]

1-Dimensional LSTM Networks
• Standard LSTM [Hochreiter97, Gers99]
• Bidirectional LSTM [Graves05, Chen05]
[Figure: inputs x_{t-1}, x_t, x_{t+1} feed an LSTM hidden layer that produces the output y_t. The standard LSTM receives the previous hidden state h_{t-1}; the bidirectional LSTM adds a second LSTM that receives h_{t+1}.]
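
As a rough illustration of the two 1D variants, the sketch below implements a single-unit LSTM step (with a forget gate, no peephole connections) and runs it forward and backward over a sequence. The names `lstm_step`, `run_lstm`, `run_bidirectional`, the scalar weight layout, and the weight values are assumptions made for this example; real networks use vector states and learned weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM.

    `w` maps a gate name ("i", "f", "o", "g") to (w_x, w_h, bias).
    """
    def pre(gate):
        w_x, w_h, bias = w[gate]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre("i"))    # input gate
    f = sigmoid(pre("f"))    # forget gate
    o = sigmoid(pre("o"))    # output gate
    g = math.tanh(pre("g"))  # candidate cell update
    c = f * c_prev + i * g   # new cell state
    h = o * math.tanh(c)     # new hidden state
    return h, c


def run_lstm(xs, w):
    """Hidden state after each step, starting from zero state."""
    h, c, hs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        hs.append(h)
    return hs


def run_bidirectional(xs, w_fwd, w_bwd):
    """Pair each position's forward state (past context) with its backward state (future context)."""
    fwd = run_lstm(xs, w_fwd)
    bwd = run_lstm(xs[::-1], w_bwd)[::-1]
    return list(zip(fwd, bwd))


# Hypothetical weights and inputs, for illustration only.
w = {gate: (0.5, 0.5, 0.0) for gate in "ifog"}
xs = [1.0, -1.0, 0.5]
for t, (h_f, h_b) in enumerate(run_bidirectional(xs, w, w)):
    print(t, round(h_f, 4), round(h_b, 4))
```

At position 0 the forward state has seen only the first input, while the backward state has already seen the whole sequence; a layer on top of both therefore has full context at every position.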

Multi-Dimensional LSTM Networks
Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]

2-Dimensional LSTM Networks for images
[Animation: the network sweeps across the image; red marks the current pixel]

2-Dimensional LSTM Networks for images
• Perceives the entire spatio-temporal context of each pixel in a few sweeps through all pixels
• Requires fewer parameters to take both local and global context into account
• End-to-end learning, with no pre- or post-processing
[Figure: input I_k, LSTM layer with four LSTM blocks, hidden layer, output. From Scene Labeling with LSTM Recurrent Neural Networks [Byeon15]]
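
The "few sweeps" idea can be sketched as follows. This is a simplified single-unit 2D LSTM, assuming one forget gate per spatial direction and scalar weights; the names `sweep_2d` and `four_sweeps`, the gate layout, and the shared weights across directions are assumptions for illustration and not the exact formulation of [Byeon15]. Each sweep visits the pixels in one of four scan orders, and every pixel reads the states of its two already-visited neighbours.

```python
import math

GATES = ("i", "f_row", "f_col", "o", "g")


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sweep_2d(image, w, flip_rows=False, flip_cols=False):
    """One directional sweep; returns the hidden state at every pixel.

    `w` maps a gate name to (w_x, w_row, w_col, bias).
    """
    rows, cols = len(image), len(image[0])
    row_order = list(range(rows))[::-1] if flip_rows else list(range(rows))
    col_order = list(range(cols))[::-1] if flip_cols else list(range(cols))
    d_row = 1 if flip_rows else -1  # offset to the already-visited row neighbour
    d_col = 1 if flip_cols else -1  # offset to the already-visited column neighbour
    h = [[0.0] * cols for _ in range(rows)]
    c = [[0.0] * cols for _ in range(rows)]

    def at(grid, r, q):
        # States outside the image are treated as zero.
        return grid[r][q] if 0 <= r < rows and 0 <= q < cols else 0.0

    for r in row_order:
        for q in col_order:
            h_row, c_row = at(h, r + d_row, q), at(c, r + d_row, q)
            h_col, c_col = at(h, r, q + d_col), at(c, r, q + d_col)
            pre = {}
            for gate in GATES:
                w_x, w_row, w_col, bias = w[gate]
                pre[gate] = w_x * image[r][q] + w_row * h_row + w_col * h_col + bias
            i, o = sigmoid(pre["i"]), sigmoid(pre["o"])
            f_row, f_col = sigmoid(pre["f_row"]), sigmoid(pre["f_col"])
            c[r][q] = f_row * c_row + f_col * c_col + i * math.tanh(pre["g"])
            h[r][q] = o * math.tanh(c[r][q])
    return h


def four_sweeps(image, w):
    """Run all four scan directions (weights shared here only to keep the sketch short)."""
    return [sweep_2d(image, w, fr, fc) for fr in (False, True) for fc in (False, True)]


# Hypothetical weights and a 3x3 toy image with a single non-zero pixel.
w = {gate: (0.5, 0.5, 0.5, 0.0) for gate in GATES}
blank = [[0.0] * 3 for _ in range(3)]
marked = [row[:] for row in blank]
marked[0][0] = 1.0

forward = sweep_2d(marked, w)                   # top-left to bottom-right
backward = sweep_2d(marked, w, True, True)      # bottom-right to top-left
print(forward[2][2] != sweep_2d(blank, w)[2][2])  # True: the far corner sees pixel (0, 0)
print(backward[2][2])                             # 0.0: this sweep has not reached (0, 0) yet
```

A single sweep gives each pixel the context of one quadrant only; combining the four directional outputs in the following layer is what gives every pixel access to the whole image.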