Deep Transfer Learning for Visual Analysis — Yu-Chiang Frank Wang, Associate Professor, Dept. of Electrical Engineering, National Taiwan University, Taipei, Taiwan. 2018/5/19, 2nd AII Workshop
Trends of Deep Learning 2
Transfer Learning: What, When, and Why? (cont’d) • A practical example https://techcrunch.com/2017/02/08/udacity-open-sources-its-self-driving-car-simulator-for-anyone-to-use/ https://googleblog.blogspot.tw/2014/04/the-latest-chapter-for-self-driving-car.html 3
Recent Research Focuses on Transfer Learning • CVPR 2018 Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation • AAAI 2018 Order-Free RNN with Visual Attention for Multi-Label Classification • CVPR 2018 Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • CVPRW 2018 Unsupervised Deep Transfer Learning for Person Re-Identification 4
Detach & Adapt – Beyond Image Style Transfer • FaceApp – putting a smile on your face! • Deep learning for representation disentanglement • Interpretable deep feature representation (Figure: input photo of Mr. Takeshi Kaneshiro with a synthesized smile) 5
Detach & Adapt – Beyond Image Style Transfer • Cross-domain image synthesis, manipulation & translation: disentangle an attribute (e.g., smile) in the supervised photo domain and transfer it to the cartoon domain without supervision. Y.-C. F. Wang et al., Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation, CVPR 2018 6
Detach & Adapt – Beyond Image Style Transfer • Cross-domain image synthesis, manipulation & translation [CVPR’18] (Figure: attribute supervision in the source domain, no supervision in the target domain) Y.-C. F. Wang et al., Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation, CVPR 2018 7
Example Results • Face: conditional synthesis w/o label supervision • Photo & Sketch: unsupervised image translation w/o label supervision, trained on unpaired data. Y.-C. F. Wang et al., Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation, CVPR 2018 8
Comparisons (cross-domain image translation vs. representation disentanglement) • Pix2pix, CycleGAN, StarGAN, UNIT, and DTN translate images across domains, with varying support for unpaired training data, multiple domains, and bi-directional translation, but cannot disentangle the image representation. • infoGAN and AC-GAN learn (partially) interpretable disentangled factors but cannot translate images across domains. • CDRD (Ours) supports unpaired training data, multiple domains, bi-directional translation, unsupervised learning in the target domain, a joint representation, and interpretability of the disentangled factor. 9
Recent Research Focuses on Transfer Learning • CVPR 2018 Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation • AAAI 2018 Order-Free RNN with Visual Attention for Multi-Label Classification • CVPR 2018 Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • CVPRW 2018 Unsupervised Deep Transfer Learning for Person Re-Identification 10
Multi-Label Classification for Image Analysis • Prediction of multiple object labels from an image • Learning across image and semantics domains • No object detectors available • Desirable to exploit label co-occurrence information Labels: Person Table Sofa Chair TV Lights Carpet … 11
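To make the task concrete, here is a minimal sketch of the prediction step (not code from any of the papers above; the function name and threshold are illustrative): each label receives an independent sigmoid score, and every label above the threshold is predicted.

```python
import numpy as np

def predict_labels(logits, label_names, threshold=0.5):
    """Multi-label prediction: each label gets an independent
    sigmoid score; all labels above the threshold are returned."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [name for name, p in zip(label_names, probs) if p > threshold]

labels = ["Person", "Table", "Sofa", "Chair", "TV"]
# Hypothetical logits from a CNN for one image.
print(predict_labels([2.0, 1.1, -0.3, 0.7, -1.5], labels))
# → ['Person', 'Table', 'Chair']
```

Treating labels independently like this ignores co-occurrence (e.g., Table and Chair often appear together), which is exactly what the methods below try to exploit.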
DNN for Multi-Label Classification • Canonical-Correlated Autoencoder (C2AE) [Wang et al., AAAI 2017] • Unique integration of autoencoder & deep canonical correlation analysis (DCCA) • Autoencoder: label embedding + label recovery + label co-occurrence • DCCA: joint feature & label embedding • Can handle missing labels during learning (Figure: feature space and label space are jointly embedded into a shared latent space; example labels: Clouds, Lake, Ocean, Water, Sky, Sun, Sunset) Y.-C. F. Wang et al., Learning Deep Latent Spaces for Multi-Label Classification, AAAI 2017 12
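A toy version of the C2AE idea, assuming simple squared-error terms (the actual paper uses a DCCA-style correlation objective and a pairwise ranking loss; all names here are illustrative): the feature embedding Fx(x) is aligned with the label embedding Fe(y) in the shared latent space, while the label decoder must recover the original label vector.

```python
import numpy as np

def c2ae_style_loss(Fx_x, Fe_y, y_true, y_decoded, alpha=1.0):
    """Toy C2AE-style objective: latent alignment between the
    feature embedding and the label embedding, plus label recovery
    through the decoder. Both terms are plain squared errors here."""
    align = np.sum((np.asarray(Fx_x, dtype=float)
                    - np.asarray(Fe_y, dtype=float)) ** 2)
    recover = np.sum((np.asarray(y_true, dtype=float)
                      - np.asarray(y_decoded, dtype=float)) ** 2)
    return align + alpha * recover
```

Because the label branch is an autoencoder, the latent code must preserve label co-occurrence structure, which is how the model shares information across labels.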
Order-Free RNN with Visual Attention for Multi-Label Classification [AAAI’18] • Visual Attention for MLC [Wang et al., AAAI’18] Y.-C. F. Wang et al. , Order-Free RNN with Visual Attention for Multi-Label Classification, AAAI 2018 13
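The visual-attention mechanism can be sketched as standard soft attention (a simplified stand-in, not the paper's exact formulation): per-region scores are normalized with a softmax, and the attended context fed to the RNN decoder is the weighted sum of region features.

```python
import numpy as np

def soft_attention(region_feats, scores):
    """Soft visual attention: softmax over per-region scores,
    then an attention-weighted sum of region features (the
    'context' used at each RNN decoding step)."""
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    context = weights @ np.asarray(region_feats, dtype=float)
    return weights, context
```

At each decoding step the scores depend on the RNN state, so the model attends to different image regions for different labels, without committing to a fixed label order.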
Order-Free RNN with Visual Attention for Multi-Label Classification • Experiments • NUS-WIDE: 269,648 images with 81 labels • MS-COCO: 82,783 images with 80 labels • Quantitative evaluation on MS-COCO and NUS-WIDE Y.-C. F. Wang et al., Order-Free RNN with Visual Attention for Multi-Label Classification, AAAI 2018 14
Order-Free RNN with Visual Attention for Multi-Label Classification • Qualitative Evaluation Example images in MS-COCO with the associated attention maps Incorrect predictions with reasonable visual attention Y.-C. F. Wang et al. , Order-Free RNN with Visual Attention for Multi-Label Classification, AAAI 2018 15
Multi-Label Zero-Shot Learning with Structured Knowledge Graphs [CVPR’18] • Utilizing structured knowledge graphs for modeling label dependency 16
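The core idea of propagating information over the label graph can be sketched with one step of simple message passing (a much-simplified stand-in for the gated graph-network updates in the paper; the adjacency values are hypothetical): a seen label's belief flows to related unseen labels through the knowledge graph.

```python
import numpy as np

def propagate_beliefs(adj, beliefs, steps=1):
    """One or more steps of message passing over a label graph:
    each label's belief is updated by averaging over itself and
    its neighbours (self-loops assumed in `adj`)."""
    adj = np.asarray(adj, dtype=float)
    b = np.asarray(beliefs, dtype=float)
    norm = adj.sum(axis=1)        # degree (incl. self-loop) per node
    for _ in range(steps):
        b = (adj @ b) / norm
    return b

# Toy graph: labels [dog, cat, animal]; 'animal' links to both.
adj = [[1, 0, 1],
       [0, 1, 1],
       [1, 1, 1]]
print(propagate_beliefs(adj, [1.0, 0.0, 0.0]))
```

After one step, the unseen label "animal" gains belief from the detected label "dog", which is the mechanism that makes zero-shot multi-label prediction possible.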
• Our Proposed Network 17
Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • Experiments • NUS-WIDE: 269,648 images with 1,000 labels • MS-COCO: 82,783 images with 80 labels • Quantitative Evaluation • ML vs. ML-ZSL vs. Generalized ML-ZSL 19
Recent Research Focuses on Transfer Learning • CVPR 2018 Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation • AAAI 2018 Order-Free RNN with Visual Attention for Multi-Label Classification • CVPR 2018 Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • CVPRW 2018 Unsupervised Deep Transfer Learning for Person Re-Identification 20
Introduction: Person Re-Identification (Figure: views from Cameras #1–#4) Person re-identification: the system matches the appearance of a person of interest across non-overlapping camera views. 21
Adaptation & Re-ID Network (Figure: a shared latent encoder/decoder maps the source dataset, with identity labels, and the target dataset, without labels, into a common latent space; training combines image-reconstruction losses in both domains, adversarial losses that align the two domains, and an identity-classification loss on the labeled source data) 22
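The domain-alignment part of such a network can be sketched with a binary cross-entropy adversarial game (an illustrative simplification, not the paper's exact losses): a discriminator tries to tell source latents (label 1) from target latents (label 0), while the encoder is trained with the flipped objective so the two domains become indistinguishable.

```python
import numpy as np

def domain_losses(p_source_is_source, p_target_is_source):
    """Adversarial domain alignment via binary cross-entropy.
    Returns (discriminator loss, encoder/generator loss) for one
    source latent and one target latent, given the discriminator's
    probability that each came from the source domain."""
    eps = 1e-12
    d_loss = -(np.log(p_source_is_source + eps)
               + np.log(1.0 - p_target_is_source + eps))
    # Encoder tries to make target latents look like source ones.
    g_loss = -np.log(p_target_is_source + eps)
    return d_loss, g_loss
```

Once the latent spaces are aligned, the identity classifier trained on the labeled source dataset can transfer to the unlabeled target cameras.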
Testing Scenario 23
Comparisons with Recent Re-ID Methods 24
Recent Research Focuses on Transfer Learning • AAAI 2018 Order-Free RNN with Visual Attention for Multi-Label Classification • CVPR 2018 Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation • CVPR 2018 Multi-Label Zero-Shot Learning with Structured Knowledge Graphs • CVPRW 2018 Unsupervised Deep Transfer Learning for Person Re-Identification 25
Other Ongoing Research Topics • Take a Deep Look from a Single Image • Single-Image 3D Object Model Prediction • Completing Videos from a Deep Glimpse 26
3D Shape Estimation from a Single 2D Image • Recovering Shape from a Single Image • Supervised Setting • Input image and its ground-truth 3D voxel model available for training 27
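In this supervised setting, predicted occupancy grids are typically scored against the ground-truth voxels with intersection-over-union; a minimal sketch (function name and threshold are illustrative):

```python
import numpy as np

def voxel_iou(pred, gt, threshold=0.5):
    """Intersection-over-Union between a predicted occupancy grid
    and the ground-truth voxel model, after binarizing both at the
    given threshold."""
    p = np.asarray(pred) > threshold
    g = np.asarray(gt) > threshold
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union > 0 else 1.0
```

The semi-supervised setting on the next slide replaces the 3D target with a 2D mask, so the supervision signal is a projection of the shape rather than the shape itself.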
3D Shape Estimation from a Single 2D Image • Recovering Shape from a Single Image • Semi-Supervised Setting • Input image and its ground-truth 2D mask available for training 28
3D Shape Estimation from a Single 2D Image • Example Results 29
3D Shape Estimation from a Single 2D Image • Example Results (chair category, shown under varying poses) 30
Recent Research Focuses • Take a Deep Look from a Single Image • Single-Image 3D Object Model Prediction • Completing Videos from a Deep Glimpse 31
What’s Video Completion? 32
From Video Synthesis to Completion • Our Proposed Network: a Stochastic & Recurrent Conditional GAN (SR-cGAN) combining a variational autoencoder, recurrent neural nets, and a GAN • Input: non-consecutive frames of interest; Output: a video sequence (more than one possible output) • Architecture: temporal encoder + temporal generator, with a real/fake discriminator on synthesized vs. real videos • Three Stages in Learning: 1. learning frame-based representation; 2. learning video-based representation; 3. learning video representation conditioned on input anchor frames 33
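The stochastic part of the model rests on the VAE reparameterization trick; a minimal sketch (names and shapes are illustrative): latent codes are sampled as z = mu + sigma * eps with eps ~ N(0, I), so sampling stays differentiable with respect to mu and log-variance, and different eps draws give different-but-plausible completions of the same anchor frames.

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    """VAE reparameterization: z = mu + sigma * eps, eps ~ N(0, I).
    Keeps the sampling step differentiable w.r.t. mu and log_var."""
    rng = rng or np.random.default_rng(0)
    mu = np.asarray(mu, dtype=float)
    sigma = np.exp(0.5 * np.asarray(log_var, dtype=float))
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps
```

As the log-variance shrinks, samples collapse onto the mean; with non-zero variance, repeated draws produce the diverse motions shown in the stochasticity results below.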
Video Synthesis • Results on the KTH, Shape Motion, and MUG datasets 34
Video Completion – Example Results • Shape Motion: input anchor frames at t = 6, 7, 11, 12, 14, 15; output synthesized video (GIF) • KTH: input anchor frames at t = 2, 3, 7, 9, 12, 14; output synthesized video (GIF) 35
Video Completion – Stochasticity • Input anchor frames at t = 3, 5, 8, 12, 13, 14; different samples yield synthesized videos with different motion (GIF) 36
Video Interpolation & Prediction • Interpolation • Input: 2 anchor frames, fixed at t = 1 and t = 8 • Output: 8 frames • Prediction • Input: 6 anchor frames, fixed at t = 1–6 • Output: 16 frames 37
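For contrast with the interpolation setting above, here is the naive baseline a learned model should beat (purely illustrative, not from the paper): blending linearly between the two anchor frames, which cross-fades pixels instead of synthesizing plausible intermediate motion.

```python
import numpy as np

def linear_interp(frame_a, frame_b, n_between):
    """Naive interpolation baseline: linear blend between two
    anchor frames, returning both anchors plus n_between
    intermediate frames."""
    a = np.asarray(frame_a, dtype=float)
    b = np.asarray(frame_b, dtype=float)
    ts = np.linspace(0.0, 1.0, n_between + 2)  # includes both anchors
    return [(1.0 - t) * a + t * b for t in ts]
```

A generative model conditioned on the anchors can instead produce moving content (e.g., a walking person in KTH) rather than ghosted averages of the two endpoints.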
Summary • Deep Transfer Learning for Visual Analysis • Multi-Label Classification for Image Analysis • Detach and Adapt – Beyond Image Style Transfer • Single-Image 3D Object Model Prediction • Completing Videos from a Deep Glimpse 38
For More Information… • Vision and Learning Lab at NTUEE (http://vllab.ee.ntu.edu.tw/) 39
Thank You! 40