Recognize, Describe, and Generate: Introduction of Recent Work at MIL
The University of Tokyo, NVAIL Partner
Yoshitaka Ushiku
MIL: Machine Intelligence Laboratory
Beyond Human Intelligence Based on Cyber-Physical Systems
Members:
• One Professor (Prof. Harada)
• One Lecturer (me)
• One Assistant Professor
• One Postdoc
• Two Office Administrators
• 11 Ph.D. students
• 23 Master students
• 8 Bachelor students
• 5 Interns
Varying research topics: ICCV, CVPR, ECCV, ICML, NIPS, ICASSP, SIGdial, ACM Multimedia, ICME, ICRA, IROS, etc.
The most important thing: We are hiring!
Journalist Robot
• Born in 2006
• Objective: publishing news automatically
– Recognize: objects, people, actions
– Describe: what is happening
– Generate: content as humans do
Outline
• Journalist Robot: the ancestor of current work in MIL; our research originates with this robot
– Recognize
• Basic: framework for DL, domain adaptation
• Classification: single modality, multiple modalities
– Describe
• Image captioning
• Video captioning
– Generate
• Image reconstruction
• Video generation
Recognize
MILJS: JavaScript × Deep Learning [Hidaka+, ICLR Workshop 2017]
MILJS: JavaScript × Deep Learning [Hidaka+, ICLR Workshop 2017]
• Supports both learning and inference
• Supports nodes with GPGPUs
– Currently WebCL is utilized
– Now working on WebGPU
• Supports nodes w/o GPGPUs
• No software installation required
– Even a ResNet with 152 layers can be trained
Let me show you a preliminary demonstration using MNIST!
Asymmetric Tri-training for Domain Adaptation [Saito+, submitted to ICML 2017]
• Unsupervised domain adaptation: does a model trained on MNIST work on SVHN?
– Ground-truth labels are associated with the source (MNIST)
– However, there are no labels for the target (SVHN)
Asymmetric Tri-training for Domain Adaptation [Saito+, submitted to ICML 2017]
• Asymmetric tri-training: assigns pseudo labels to the target domain
Asymmetric Tri-training for Domain Adaptation [Saito+, submitted to ICML 2017]
1st round: training on MNIST → add pseudo labels to easy target samples (e.g., "eight", "nine")
2nd round onward: training on MNIST + pseudo-labeled target samples → add more pseudo labels (see the sketch below)
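As a rough illustration, here is a minimal PyTorch-style sketch of the pseudo-label selection step, assuming a shared feature encoder and two source-trained classifiers f1 and f2. All names, the agreement-plus-confidence rule, and the threshold are illustrative; the paper's exact training procedure differs.

```python
import torch
import torch.nn.functional as F

def assign_pseudo_labels(encoder, f1, f2, target_loader, threshold=0.9):
    """Sketch of the pseudo-label selection in asymmetric tri-training:
    keep target samples on which both source-trained classifiers agree
    with high confidence ("easy" samples)."""
    pseudo_x, pseudo_y = [], []
    with torch.no_grad():
        for x in target_loader:              # target labels are unavailable
            z = encoder(x)
            p1 = F.softmax(f1(z), dim=1)
            p2 = F.softmax(f2(z), dim=1)
            conf1, y1 = p1.max(dim=1)
            conf2, y2 = p2.max(dim=1)
            # keep samples where both classifiers agree and are confident
            keep = (y1 == y2) & (torch.minimum(conf1, conf2) > threshold)
            pseudo_x.append(x[keep])
            pseudo_y.append(y1[keep])
    return torch.cat(pseudo_x), torch.cat(pseudo_y)
```

The target-specific classifier would then be trained on the returned pseudo-labeled pairs, and the loop repeats with more samples added each round, as on the slide above.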
End-to-end learning for environmental sound classification [Tokozume+, ICASSP 2017]
Existing methods for speech / sound recognition:
① Feature extraction: Fourier transformation (log-mel features)
② Classification: CNN on the extracted feature map
Log-mel features are suitable for human speech; but for environmental sounds…?
End-to-end learning for environmental sound classification [Tokozume+, ICASSP 2017]
Proposed approach (EnvNet): a single CNN performs both ① feature map extraction and ② classification
(Figure: the “feature map” extracted by the first layers)
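To make the idea concrete, here is a minimal sketch of an end-to-end 1-D CNN over raw waveforms. Layer counts and sizes are illustrative, not the published EnvNet configuration.

```python
import torch.nn as nn

class RawWaveformCNN(nn.Module):
    """Toy end-to-end classifier in the spirit of EnvNet: the first 1-D
    convolutions replace hand-crafted log-mel feature extraction."""
    def __init__(self, n_classes=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 40, kernel_size=8), nn.ReLU(),   # learned filterbank
            nn.Conv1d(40, 40, kernel_size=8), nn.ReLU(),
            nn.MaxPool1d(160),            # pool over time -> "feature map"
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(32), nn.Flatten(),
            nn.Linear(40 * 32, n_classes),
        )

    def forward(self, wave):              # wave: (batch, 1, n_samples)
        return self.classifier(self.features(wave))
```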
End-to-end learning for environmental sound classification [Tokozume+, ICASSP 2017]
Comparison of accuracy [%] on ESC-50 [Piczak, ACM MM 2015]:
• log-mel feature + CNN [Piczak, MLSP 2015]: 64.5
• End-to-end CNN (Ours): 64.0
• End-to-end CNN & log-mel feature + CNN (Ours): 71.0
EnvNet can extract discriminative features for environmental sounds
Visual Question Answering (VQA) [Saito+, ICME 2017]
Question answering system for
• an associated image
• a question in natural language
Q: Is it going to rain soon? Ground Truth A: yes
Q: Why is there snow on one side of the stream and clear grass on the other? Ground Truth A: shade
Visual Question Answering (VQA) [Saito+, ICME 2017]
VQA = multi-class classification
• Image → image feature f_I
• Question (e.g., “What objects are found on the bed?”) → question feature f_Q
• Integrated vector f_IQ → answer (e.g., “bed sheets, pillow”)
After integrating into f_IQ: usual classification
Visual Question Answering [Saito+, ICME 2017]
Current advancement: improving how to integrate f_I and f_Q into f_IQ
• Concatenation: f_IQ = [f_I; f_Q], e.g.) [Antol+, ICCV 2015]
• Summation: f_IQ = f_I + f_Q (image feature with attention + question feature), e.g.) [Xu+Saenko, ECCV 2016]
• Multiplication: f_IQ = f_I ⊙ f_Q, e.g.) bilinear multiplication [Fukui+, EMNLP 2016]
• This work: DualNet does summation, multiplication, and concatenation
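A minimal sketch of this kind of multi-way fusion, with illustrative feature dimensions and answer vocabulary size (not the exact DualNet architecture):

```python
import torch
import torch.nn as nn

class MultiWayFusionVQA(nn.Module):
    """Sketch: fuse image and question features by summation,
    elementwise multiplication, and concatenation, then classify
    over a fixed answer vocabulary."""
    def __init__(self, dim=1024, n_answers=3000):
        super().__init__()
        # sum (dim) + product (dim) + concatenation (2*dim) = 4*dim
        self.classifier = nn.Linear(4 * dim, n_answers)

    def forward(self, f_i, f_q):          # f_i, f_q: (batch, dim)
        f_sum = f_i + f_q                 # summation
        f_mul = f_i * f_q                 # elementwise multiplication
        f_cat = torch.cat([f_i, f_q], dim=1)   # concatenation
        f_iq = torch.cat([f_sum, f_mul, f_cat], dim=1)
        return self.classifier(f_iq)      # logits over answers
```

The fused vector f_IQ then feeds the usual multi-class classifier, exactly as in the formulation on the previous slide.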
Visual Question Answering (VQA) [Saito+, ICME 2017]
VQA Challenge 2016 (at CVPR 2016): won the 1st place on abstract images w/o attention mechanism
Q: What fruit is yellow and brown? A: banana
Q: How many screens are there? A: 2
Q: What is the boy playing with? A: teddy bear
Q: Are there any animals swimming in the pond? A: no
Describe
Automatic Image Captioning [Ushiku+, ACM MM 2011]
(Figure) Retrieval-based captioning: a query image is matched against the training dataset, and the nearest captions are transferred.
Training dataset captions include: “A small white dog wearing a flannel warmer.”, “A white van parked in an empty lot.”, “A small gray dog on a leash.”, “A white cat rests head on a stone.”, “A small white dog standing on a leash.”, “A black dog standing in a grassy area.”, “White and gray kitten lying on its side.”, “Silver car parked on side of road.”, “A woman posing on a red scooter.”
Nearest captions for the input image: “A small white dog wearing a flannel warmer.”, “A small gray dog on a leash.”, “A black dog standing in a grassy area.”
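A toy sketch of the retrieval step, assuming precomputed image features. Cosine similarity here is a stand-in; the actual ACM MM 2011 system builds captions at the phrase level rather than simply copying whole nearest captions.

```python
import numpy as np

def nearest_captions(query_feat, train_feats, train_captions, k=3):
    """Rank training images by cosine similarity of their features
    to the query image, and return the captions of the top-k."""
    sims = train_feats @ query_feat
    sims = sims / (np.linalg.norm(train_feats, axis=1)
                   * np.linalg.norm(query_feat))
    top = np.argsort(-sims)[:k]           # indices of most similar images
    return [train_captions[i] for i in top]
```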
Automatic Image Captioning [ACM MM 2012, ICCV 2015]
Generated captions: “Group of people sitting at a table with a dinner.” / “Tourists are standing on the middle of a flat desert.”
Image Captioning + Sentiment Terms [Andrew+, BMVC 2016]
“A confused man in a blue shirt is sitting on a bench.”
“A man in a blue shirt and blue jeans is standing in the dirty overlooked water.”
“A zebra standing in a field with a tree in the background.”
Image Captioning + Sentiment Terms [Andrew+, BMVC 2016]
Two steps for adding a sentiment term:
1. Usual image captioning using CNN+RNN; the most probable noun is memorized
Image Captioning + Sentiment Terms [Andrew+, BMVC 2016]
Two steps for adding a sentiment term:
1. Usual image captioning using CNN+RNN
2. The model is forced to predict a sentiment term before the memorized noun (see the sketch below)
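A toy sketch of step 2, where `sentiment_model` and its `best_adjective` method are hypothetical stand-ins for the trained sentiment predictor:

```python
def add_sentiment(caption_tokens, noun, sentiment_model):
    """Insert a sentiment term right before the memorized
    most-probable noun of the base caption."""
    i = caption_tokens.index(noun)
    # hypothetical model: scores candidate sentiment adjectives in context
    adjective = sentiment_model.best_adjective(caption_tokens, position=i)
    return caption_tokens[:i] + [adjective] + caption_tokens[i:]

# e.g., ["a", "man", "in", "a", "blue", "shirt", ...] with noun "man"
# could become ["a", "confused", "man", "in", "a", "blue", "shirt", ...]
```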
Beyond Caption to Narrative [Andrew+, ICIP 2016] A man is holding a box of doughnuts. Then he and a woman are standing next each other. Then she is holding a plate of food.
Beyond Caption to Narrative [Andrew+, ICIP 2016]
Narrative: “A man is holding a box of doughnuts.” → “he and a woman are standing next each other.” → “she is holding a plate of food.”
Beyond Caption to Narrative [Andrew+, ICIP 2016] A boat is floating on the water near a mountain. And a man riding a wave on top of a surfboard. Then he on the surfboard in the water.
Generate
Image Reconstruction [Kato+, CVPR 2014]
Traditional pipeline for image classification:
Collecting images → Extracting local descriptors d_1, …, d_N → Calculating a global feature from p(d; θ) → Classifying (e.g., “Camera”, “Cat”)
Image Reconstruction [Kato+, CVPR 2014]
Inverse problem: image reconstruction from a label (e.g., “Pot”), i.e., running the pipeline backward from the class label to local descriptors and then to an image
Image Reconstruction [Kato+, CVPR 2014]
“Pot”: optimized arrangement of local descriptors using global location cost + adjacency cost (see the sketch below)
Other examples: cat (bombay), camera, grand piano, gramophone, headphone, pyramid, joshua tree, wheel chair
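A loose sketch of such an objective, where `location_model` and `adjacency_model` are hypothetical stand-ins for the class-conditional models learned in the paper:

```python
import numpy as np

def arrangement_cost(positions, descriptors, location_model, adjacency_model):
    """Score an arrangement of local descriptors as the sum of:
    - a global location cost: is descriptor d_j plausible at position x_j
      for this class?
    - an adjacency cost: are spatially neighbouring descriptors compatible?
    positions: (N, 2) array; descriptors: list of N descriptor vectors."""
    loc = sum(location_model(d, x) for d, x in zip(descriptors, positions))
    adj = 0.0
    for j, x in enumerate(positions):
        # penalize incompatible neighbouring descriptor pairs
        nearest = np.argsort(np.linalg.norm(positions - x, axis=1))[1]
        adj += adjacency_model(descriptors[j], descriptors[nearest])
    return loc + adj
```

Minimizing this cost over the positions yields the reconstructed descriptor layout, which is then rendered as an image.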
Video Generation [Yamamoto+, ACMMM 2016]
• Image generation is still challenging; it is only successful in controlled settings:
– Human faces, e.g., BEGAN [Berthelot+, 2017 Mar.]
– Birds and flowers, e.g., StackGAN [Zhang+, 2016 Dec.]
• Video generation is…
– Additionally requiring temporal consistency
– Extremely challenging [Vondrick+, NIPS 2016]
Video Generation [Yamamoto+, ACMMM 2016]
• This work: generating easy videos
– C3D (3D convolutional neural network) for conditional generation from an input label
– tempCAE (temporal convolutional auto-encoder) for regularizing the video to improve its naturalness
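For intuition, a minimal sketch of a label-conditional video generator built from 3-D transposed convolutions. The architecture and sizes are illustrative; the real system's C3D generator and tempCAE regularizer differ.

```python
import torch
import torch.nn as nn

class ConditionalVideoGenerator(nn.Module):
    """Toy label-conditional generator: a noise vector concatenated
    with a label embedding is upsampled by 3-D transposed convolutions
    into a short video clip of shape (batch, 3, T, H, W)."""
    def __init__(self, n_labels=10, z_dim=100):
        super().__init__()
        self.embed = nn.Embedding(n_labels, z_dim)
        self.net = nn.Sequential(
            nn.ConvTranspose3d(2 * z_dim, 128, kernel_size=(2, 4, 4)),
            nn.ReLU(),
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose3d(64, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),                    # pixel values in [-1, 1]
        )

    def forward(self, z, label):          # z: (batch, z_dim), label: (batch,)
        cond = torch.cat([z, self.embed(label)], dim=1)
        # reshape to a 1x1x1 "volume" and upsample to an 8-frame 16x16 clip
        return self.net(cond[:, :, None, None, None])
```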
Video Generation [Yamamoto+, ACMMM 2016]
(Figure) Generated samples for the labels “Car runs to left” and “Rocket flies up”: Ours (C3D+tempCAE) vs. Only C3D
Conclusion
• MIL: Machine Intelligence Laboratory — Beyond Human Intelligence Based on Cyber-Physical Systems
• This talk introduced some of our current research
– Recognize
• Basic: framework for DL, domain adaptation
• Classification: single modality, multiple modalities
– Describe
• Image captioning, video captioning
– Generate
• Image reconstruction, video generation