Slide Credits: Agrawal - PowerPoint PPT Presentation

  1. Slide Credits: Agrawal

  2. Slide Credits: Agrawal

  3. Slide Credits: Agrawal

  4. Kolmogorov-Smirnov test: p(Captions vs. (Q+A)) < 0.001

  5. LSTM: one hidden layer, output size 1024, each word size 300. Deeper LSTM: two hidden layers, output: 2048 > fc+tanh > 1024. MLP: 2-hidden-layer fc network, 1000 units, tanh, dropout(0.5). Input vocabulary: all question words. End-to-end learning, cross-entropy.

  6. 2-Channel VQA Model. Image channel: Convolution Layer + Non-Linearity > Pooling Layer > Convolution Layer + Non-Linearity > Pooling Layer > Fully-Connected MLP > Image Embedding (4096-dim). Question channel: “How many horses are in this image?” > Question Embedding (1024-dim). Both embeddings > Neural Network > Softmax over top K answers. Slide Credits: Agrawal

  7. Ablation #1: Language-alone. Question: “How many horses are in this image?” > Question Embedding (1024-dim) > Neural Network > Softmax over top K answers (1k output units). The image channel (Convolution Layer + Non-Linearity > Pooling Layer > Convolution Layer + Non-Linearity > Pooling Layer > Fully-Connected MLP > Image Embedding) is not used. Slide Credits: Agrawal

  8. Ablation #2: Vision-alone. Image channel: Convolution Layer + Non-Linearity > Pooling Layer > Convolution Layer + Non-Linearity > Pooling Layer > Fully-Connected MLP > Image Embedding (4096-dim) > Neural Network > Softmax over top K answers. The question channel (“How many horses are in this image?” > Question Embedding) is not used. Slide Credits: Agrawal

  9. Slide Credits: Agrawal

  10. Slide Credits: Agrawal

  11. Current Leaderboard

  12. Questions & Discussion & Demo
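
The model on slides 5 to 8 (image and question embeddings feeding a 2-hidden-layer tanh MLP with a softmax over the top K answers, trained with cross-entropy, plus the language-alone and vision-alone ablations) might be sketched roughly as below. This is only an illustration under stated assumptions, not the presenters' implementation: the slides do not say how the two embeddings are combined, so concatenation is assumed here; the dimensions are shrunk so the pure-Python code runs quickly (the slides give 4096, 1024, 1000 units, and 1k outputs); the weights are random and untrained; dropout(0.5) is a training-time step and is left out; and all function names (`fuse`, `build_mlp`, `predict`) are hypothetical.

```python
import math
import random

# Toy sizes for a quick run; the slides list 4096-dim image and 1024-dim
# question embeddings, 1000 hidden units, and 1k output units.
IMAGE_DIM, QUESTION_DIM, HIDDEN, K = 8, 4, 6, 5


def init_layer(rng, n_in, n_out):
    """Random weights (one row per output unit) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, biases):
    """Fully-connected layer: one dot product per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Numerically stable softmax over the answer scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-probability of the correct answer index."""
    return -math.log(probs[target])


def fuse(image_emb, question_emb, mode):
    """Select the MLP input for the full model or one of the ablations.

    Concatenation for the 2-channel case is an assumption of this sketch.
    """
    if mode == "language-alone":
        return list(question_emb)
    if mode == "vision-alone":
        return list(image_emb)
    if mode == "2-channel":
        return list(image_emb) + list(question_emb)
    raise ValueError(f"unknown mode: {mode}")


def build_mlp(rng, n_in, n_hidden, n_answers):
    """Two hidden layers followed by an output layer over the K answers."""
    return [init_layer(rng, n_in, n_hidden),
            init_layer(rng, n_hidden, n_hidden),
            init_layer(rng, n_hidden, n_answers)]


def predict(mlp, x):
    """Forward pass: tanh hidden layers, then softmax (no dropout here)."""
    (w1, b1), (w2, b2), (w3, b3) = mlp
    hidden = [math.tanh(v) for v in linear(x, w1, b1)]
    hidden = [math.tanh(v) for v in linear(hidden, w2, b2)]
    return softmax(linear(hidden, w3, b3))


rng = random.Random(0)
image_emb = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)]        # stand-in for CNN features
question_emb = [rng.gauss(0.0, 1.0) for _ in range(QUESTION_DIM)]  # stand-in for LSTM output
for mode in ("2-channel", "language-alone", "vision-alone"):
    x = fuse(image_emb, question_emb, mode)
    probs = predict(build_mlp(rng, len(x), HIDDEN, K), x)
    print(mode, "input size:", len(x), "loss for answer 0:", cross_entropy(probs, 0))
```

Each ablation only changes which embedding reaches the MLP, so the MLP input size changes while the softmax over K answers stays the same.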

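Slide 4 reports a two-sample Kolmogorov-Smirnov result (p < 0.001 for Captions vs. (Q+A)). As a rough sketch of what such a test computes, the code below finds the largest gap between two empirical distribution functions and converts it to a p-value with the standard asymptotic series. The sample values are made up for illustration and are not the data behind the slide; the function names are hypothetical; and the asymptotic p-value is only an approximation that becomes unreliable for small samples.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sided p-value: 2 * sum((-1)^(k-1) * exp(-2 k^2 lam^2))."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here and the p-value is essentially 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Made-up samples standing in for two sets of measurements.
group_a = [0.12, 0.15, 0.18, 0.22, 0.25, 0.27, 0.31, 0.35]
group_b = [0.30, 0.34, 0.41, 0.44, 0.48, 0.52, 0.57, 0.63]
d = ks_statistic(group_a, group_b)
print("D =", d, "p ~", ks_p_value(d, len(group_a), len(group_b)))
```

A small p-value means the two samples are unlikely to come from the same distribution, which is how the slide's result reads for captions versus questions plus answers.
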