
Lecture 13: Segmentation and Attention
Fei-Fei Li & Andrej Karpathy & Justin Johnson
24 Feb 2016

Administrative: Assignment 3 due


  1. Semantic Segmentation: Upsampling. Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015

  2. Semantic Segmentation: Upsampling. Learnable upsampling!

  3. Semantic Segmentation: Upsampling.

  4. Semantic Segmentation: Upsampling. “Skip connections.”

  5. Semantic Segmentation: Upsampling. “Skip connections.” Skip connections = better results.
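The downsample-then-upsample idea with skip connections can be sketched in a few lines of NumPy. This is a toy illustration, not the FCN implementation: class-score maps and shapes are made up, and nearest-neighbor upsampling stands in for learnable upsampling.

```python
import numpy as np

def upsample2x(scores):
    # Nearest-neighbor 2x upsampling of a (C, H, W) score map.
    return scores.repeat(2, axis=1).repeat(2, axis=2)

# Toy score maps: a coarse layer (pool5-like) and a finer skip branch (pool4-like).
rng = np.random.default_rng(0)
coarse = rng.standard_normal((21, 4, 4))   # 21 classes, 4x4 grid
fine   = rng.standard_normal((21, 8, 8))   # skip branch at 8x8

# FCN-16s-style fusion: upsample the coarse scores, then add the finer scores.
fused = upsample2x(coarse) + fine
print(fused.shape)  # (21, 8, 8)
```

The skip branch contributes higher-resolution detail that the heavily downsampled path has already thrown away, which is why fusing them gives better segmentation boundaries.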

  6. Learnable Upsampling: “Deconvolution.” Typical 3 x 3 convolution, stride 1, pad 1. Input: 4 x 4. Output: 4 x 4.

  7. Learnable Upsampling: “Deconvolution.” Typical 3 x 3 convolution, stride 1, pad 1. Dot product between filter and input. Input: 4 x 4. Output: 4 x 4.

  8. Learnable Upsampling: “Deconvolution.” Typical 3 x 3 convolution, stride 1, pad 1. Dot product between filter and input. Input: 4 x 4. Output: 4 x 4.

  9. Learnable Upsampling: “Deconvolution.” Typical 3 x 3 convolution, stride 2, pad 1. Input: 4 x 4. Output: 2 x 2.

  10. Learnable Upsampling: “Deconvolution.” Typical 3 x 3 convolution, stride 2, pad 1. Dot product between filter and input. Input: 4 x 4. Output: 2 x 2.

  11. Learnable Upsampling: “Deconvolution.” Typical 3 x 3 convolution, stride 2, pad 1. Dot product between filter and input. Input: 4 x 4. Output: 2 x 2.

  12. Learnable Upsampling: “Deconvolution.” 3 x 3 “deconvolution”, stride 2, pad 1. Input: 2 x 2. Output: 4 x 4.

  13. Learnable Upsampling: “Deconvolution.” 3 x 3 “deconvolution”, stride 2, pad 1. Input gives weight for filter. Input: 2 x 2. Output: 4 x 4.

  14. Learnable Upsampling: “Deconvolution.” 3 x 3 “deconvolution”, stride 2, pad 1. Input gives weight for filter. Input: 2 x 2. Output: 4 x 4.

  15. Learnable Upsampling: “Deconvolution.” 3 x 3 “deconvolution”, stride 2, pad 1. Input gives weight for filter; sum where outputs overlap. Input: 2 x 2. Output: 4 x 4.

  16. Learnable Upsampling: “Deconvolution.” 3 x 3 “deconvolution”, stride 2, pad 1. Input gives weight for filter; sum where outputs overlap. Same as the backward pass for normal convolution!

  17. Learnable Upsampling: “Deconvolution.” “Deconvolution” is a bad name; it is already defined as “inverse of convolution.” Better names: convolution transpose, backward strided convolution, 1/2 strided convolution, upconvolution. Input: 2 x 2. Output: 4 x 4.

  18. Learnable Upsampling: “Deconvolution.” “Deconvolution” is a bad name; better names: convolution transpose, backward strided convolution, 1/2 strided convolution, upconvolution. Im et al, “Generating Images with Recurrent Adversarial Networks”, arXiv 2016; Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016

  19. Learnable Upsampling: “Deconvolution.” Great explanation in appendix.
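One way to see why “convolution transpose” is the better name: write the strided convolution as a matrix multiply, and upsampling is literally multiplication by the transpose of that matrix, which is exactly the backward pass of the forward convolution. A NumPy sketch (the filter values and sizes are arbitrary, chosen to match the 4x4 → 2x2 example on the slides):

```python
import numpy as np

def conv_matrix(filt, in_size=4, k=3, stride=2, pad=1):
    # Build W such that conv(x) == W @ x.flatten() for a single-channel input.
    out_size = (in_size + 2 * pad - k) // stride + 1
    W = np.zeros((out_size * out_size, in_size * in_size))
    for oy in range(out_size):
        for ox in range(out_size):
            for ky in range(k):
                for kx in range(k):
                    iy = oy * stride + ky - pad   # input row this tap touches
                    ix = ox * stride + kx - pad   # input col this tap touches
                    if 0 <= iy < in_size and 0 <= ix < in_size:
                        W[oy * out_size + ox, iy * in_size + ix] = filt[ky, kx]
    return W

filt = np.arange(9, dtype=float).reshape(3, 3)
W = conv_matrix(filt)        # (4, 16): flattened 4x4 input -> flattened 2x2 output
x = np.arange(16, dtype=float)
y = W @ x                    # 3x3 conv, stride 2, pad 1: 4x4 -> 2x2
up = W.T @ y                 # convolution transpose: 2x2 back up to 4x4
```

Each column of `W.T` is a (shifted) copy of the filter scaled by one input value, and rows touched by several copies get summed, which is exactly the “input gives weight for filter, sum where outputs overlap” picture above.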

  20. Semantic Segmentation: Upsampling. Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

  21. Semantic Segmentation: Upsampling. Normal VGG + “upside down” VGG. 6 days of training on a Titan X…

  22. Instance Segmentation

  23. Instance Segmentation. Detect instances, give category, label pixels: “simultaneous detection and segmentation” (SDS). Lots of recent work (MS-COCO). Figure credit: Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015

  24. Instance Segmentation. Similar to R-CNN, but with segments. Hariharan et al, “Simultaneous Detection and Segmentation”, ECCV 2014

  25. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals.

  26. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals.

  27. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals. Mask out background with mean image.

  28. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals. Mask out background with mean image.

  29. Instance Segmentation. Similar to R-CNN, but with segments. External segment proposals. Mask out background with mean image.
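The “mask out background with mean image” step can be sketched as: within a proposal's crop, replace pixels outside the segment with the dataset mean image, so that after the usual mean subtraction the background reads as zero to the CNN. A toy NumPy sketch; the shapes and pixel values here are made up for illustration:

```python
import numpy as np

def mask_background(crop, segment_mask, mean_image):
    # crop: (H, W, 3) box crop; segment_mask: (H, W) bool foreground mask;
    # mean_image: (H, W, 3) dataset mean resized to the crop.
    # Background pixels become the mean image, i.e. zero after mean subtraction.
    return np.where(segment_mask[..., None], crop, mean_image)

crop = np.full((4, 4, 3), 200.0)              # fake image crop
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                         # fake segment proposal
mean = np.full((4, 4, 3), 120.0)              # fake dataset mean
masked = mask_background(crop, mask, mean)
```

The masked crop is then fed to a second CNN alongside the unmasked box crop, so the network sees both the object in context and the object alone.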

  30. Instance Segmentation: Hypercolumns. Hariharan et al, “Hypercolumns for Object Segmentation and Fine-grained Localization”, CVPR 2015

  31. Instance Segmentation: Hypercolumns.
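A hypercolumn stacks, for each pixel, the activations of several conv layers upsampled to a common resolution. A toy NumPy sketch; the channel counts and grid sizes are illustrative, and nearest-neighbor upsampling stands in for the interpolation a real implementation would use:

```python
import numpy as np

def upsample_to(f, size):
    # Nearest-neighbor upsample a (C, h, w) map to (C, size, size).
    # Assumes size is a multiple of h and w.
    return f.repeat(size // f.shape[1], axis=1).repeat(size // f.shape[2], axis=2)

rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 8, 8)),    # conv feature maps at decreasing
          rng.standard_normal((128, 4, 4)),   # spatial resolution (shapes are
          rng.standard_normal((256, 2, 2))]   # made up for this sketch)

# Upsample every layer to the finest grid and stack along channels:
hyper = np.concatenate([upsample_to(f, 8) for f in layers], axis=0)
print(hyper.shape)  # (448, 8, 8): one 64+128+256-dim hypercolumn per pixel
```

Each pixel's 448-dim column mixes fine, localized features from early layers with coarse, semantic features from late layers, which is what makes it useful for fine-grained localization.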

  32. Instance Segmentation: Cascades. Similar to Faster R-CNN. Won the COCO 2015 challenge (with ResNet). Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015

  33. Instance Segmentation: Cascades. Similar to Faster R-CNN. Won the COCO 2015 challenge (with ResNet).

  34. Instance Segmentation: Cascades. Region proposal network (RPN).

  35. Instance Segmentation: Cascades. Region proposal network (RPN). Reshape boxes to fixed size; figure/ground logistic regression.

  36. Instance Segmentation: Cascades. Region proposal network (RPN). Reshape boxes to fixed size; figure/ground logistic regression. Mask out background, predict object class.

  37. Instance Segmentation: Cascades. Region proposal network (RPN). Reshape boxes to fixed size; figure/ground logistic regression. Mask out background, predict object class. Learn the entire model end-to-end!

  38. Instance Segmentation: Cascades. Predictions vs. ground truth.

  39. Segmentation Overview
  ● Semantic segmentation
  ○ Classify all pixels
  ○ Fully convolutional models, downsample then upsample
  ○ Learnable upsampling: fractionally strided convolution
  ○ Skip connections can help
  ● Instance segmentation
  ○ Detect instances, generate masks
  ○ Similar pipelines to object detection

  40. Attention Models

  41. Recall: RNN for Captioning. Image: H x W x 3.

  42. Recall: RNN for Captioning. Image: H x W x 3 → CNN → Features: D.

  43. Recall: RNN for Captioning. Image: H x W x 3 → CNN → Features: D → Hidden state h0: H.

  44. Recall: RNN for Captioning. h0 → h1; first word y1 in, distribution over vocab d1 out.

  45. Recall: RNN for Captioning. h0 → h1 → h2; first word y1 and second word y2 in, distributions over vocab d1 and d2 out.

  46. Recall: RNN for Captioning. The RNN only looks at the whole image, once.

  47. Recall: RNN for Captioning. The RNN only looks at the whole image, once. What if the RNN looks at different parts of the image at each timestep?
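The “looks at the whole image once” recap can be written as a vanilla RNN step where the CNN feature vector v enters only at initialization. A minimal NumPy sketch; the weight names, sizes, and the tanh nonlinearity are assumptions in the spirit of the earlier RNN lectures, not the exact captioning model:

```python
import numpy as np

D, H, V = 512, 256, 1000   # feature, hidden, and vocab sizes (arbitrary)
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((H, V)) * 0.01   # word -> hidden
Whh = rng.standard_normal((H, H)) * 0.01   # hidden -> hidden
Wih = rng.standard_normal((H, D)) * 0.01   # image features -> hidden
Why = rng.standard_normal((V, H)) * 0.01   # hidden -> vocab scores

v = rng.standard_normal(D)                 # CNN features, computed once
h = np.tanh(Wih @ v)                       # h0 comes from the image...
y = np.zeros(V); y[0] = 1.0                # one-hot first word y1
h = np.tanh(Wxh @ y + Whh @ h)             # ...but h1 never re-reads the image
d = np.exp(Why @ h); d /= d.sum()          # d1: distribution over vocab
```

Every later timestep sees the image only through the hidden state, which motivates the question on the slide: let the model re-attend to different image regions at each step instead.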

  48. Soft Attention for Captioning. Image: H x W x 3 → CNN → Features: L x D. Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015

  49. Soft Attention for Captioning. Image: H x W x 3 → CNN → Features: L x D → h0.

  50. Soft Attention for Captioning. h0 → a1: distribution over L locations.

  51. Soft Attention for Captioning. a1 → z1: weighted combination of features (weighted features: D).

  52. Soft Attention for Captioning. z1 and first word y1 feed h1.

  53. Soft Attention for Captioning. h1 → a2 (distribution over L locations) and d1 (distribution over vocab).

  54. Soft Attention for Captioning. a2 → weighted features z2.

  55. Soft Attention for Captioning. z2 and second word y2 feed h2.

  56. Soft Attention for Captioning. h2 → a3 (distribution over L locations) and d2 (distribution over vocab).

  57. Soft Attention for Captioning. Guess which framework was used to implement this?

  58. Soft Attention for Captioning. Guess which framework was used to implement this? Crazy RNN = Theano.
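One step of the attention loop above can be sketched in NumPy. The scoring function here is a simple bilinear form chosen for brevity; Xu et al. use a small learned network, so treat `Wa` and the exact score as placeholders:

```python
import numpy as np

def soft_attention_step(features, h, Wa):
    # features: (L, D) grid of CNN features; h: (H,) RNN hidden state.
    scores = features @ (Wa @ h)            # (L,) one score per location
    a = np.exp(scores - scores.max())
    a /= a.sum()                            # softmax: distribution over L locations
    z = a @ features                        # weighted combination of features, (D,)
    return a, z

rng = np.random.default_rng(0)
L, D, H = 49, 512, 256                      # e.g. a 7x7 grid of D-dim features
features = rng.standard_normal((L, D))
h = rng.standard_normal(H)
Wa = rng.standard_normal((D, H)) * 0.01
a, z = soft_attention_step(features, h, Wa) # a feeds the next a_t, z feeds h_t
```

Because `z` is a smooth function of the scores, gradients flow through the attention weights and the whole loop trains with ordinary backprop.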

  59. Soft vs Hard Attention. Image: H x W x 3 → CNN → grid of features a, b, c, d (each D-dimensional). From RNN: distribution over grid locations p_a, p_b, p_c, p_d, with p_a + p_b + p_c + p_d = 1. Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015

  60. Soft vs Hard Attention. Same setup, plus the context vector z (D-dimensional).
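The soft/hard distinction in code: soft attention takes the expectation over the grid (differentiable), hard attention samples a single cell (not differentiable, so training needs something like REINFORCE). A toy sketch over the 2x2 grid from the slide, with made-up feature values:

```python
import numpy as np

rng = np.random.default_rng(0)
grid = rng.standard_normal((4, 3))       # features a, b, c, d (D = 3 here)
p = np.array([0.1, 0.6, 0.2, 0.1])      # from the RNN; sums to 1

z_soft = p @ grid                        # soft: expected feature, backprop-friendly
i = rng.choice(4, p=p)                   # hard: sample ONE location
z_hard = grid[i]                         # gradient cannot flow through the sample
```

Soft attention is the variant trained end-to-end with plain gradient descent; hard attention attends to exactly one region per step but requires stochastic-gradient estimators.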
