
Convolutional Neural Network Architectures for Matching - PowerPoint PPT Presentation



  1. Outline
     • Hu, NIPS’14: Convolutional Neural Network Architectures for Matching Natural Language Sentences
       - Convolutional Sentence Model
       - Convolutional Matching Models
       - Experiments
     • Irsoy, NIPS’14: Deep Recursive Neural Networks for Compositionality in Language
       - Deep Recursive Neural Networks
       - Experiments
     LU Yangyang (luyy11@sei.pku.edu.cn), Jan. 14, 2015

  2. Outline (repeat of slide 1)

  3. Authors
     Convolutional Neural Network Architectures for Matching Natural Language Sentences. NIPS’14.
     Baotian Hu¹, Zhengdong Lu², Hang Li², and Qingcai Chen¹
     ¹ Harbin Institute of Technology, Shenzhen Graduate School
     ² Noah’s Ark Lab, Huawei Technologies Co. Ltd.

  4. Introduction
     Matching two potentially heterogeneous language objects:
     • models the correspondence between “linguistic objects” of different natures at different levels of abstraction
     • generalizes the conventional notion of similarity or relevance
     • related tasks: top-k re-ranking in machine translation, dialogue, paraphrase identification

  5. Introduction (cont.)
     Natural language sentences have complicated structures, both sequential and hierarchical. Sentence matching must therefore capture:
     • the internal structures of sentences
     • the rich patterns in their interactions

  6. Introduction (cont.)
     → Adapt the convolutional strategy to natural language:
     • hierarchical composition for sentences
     • simple-to-comprehensive fusion of matching patterns

  7. Convolutional Sentence Model
     Convolution: given an input sentence x, the convolution unit for the feature map of type f (among F_l of them) on layer l is

         z_i^{(l,f)}(x) = \sigma\left( w^{(l,f)} \hat{z}_i^{(l-1)} + b^{(l,f)} \right)

     where z_i^{(l,f)}(x) is the output of feature map f at location i in layer l, w^{(l,f)} are the parameters for f on layer l, \sigma(\cdot) is the activation function (sigmoid or ReLU), and \hat{z}_i^{(l-1)} is the segment of layer (l−1) used for the convolution at location i.
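A minimal NumPy sketch of this convolution unit (the names and shapes are my own illustration, not the paper’s code; a ReLU activation is assumed):

```python
import numpy as np

def conv_unit(z_prev, W, b):
    """One convolution layer: slide a k-word window over layer l-1 and
    apply every feature map (one per row of W) at each location.

    z_prev: (n_locations, d)        outputs of layer l-1
    W:      (n_feature_maps, k*d)   one filter per feature map
    b:      (n_feature_maps,)       biases
    returns (n_locations - k + 1, n_feature_maps)
    """
    k = W.shape[1] // z_prev.shape[1]
    outputs = []
    for i in range(z_prev.shape[0] - k + 1):
        z_hat = z_prev[i:i + k].ravel()                 # segment ẑ_i^{(l-1)}
        outputs.append(np.maximum(0.0, W @ z_hat + b))  # ReLU activation
    return np.stack(outputs)
```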

  8. Convolutional Sentence Model (cont.)
     Max-pooling: taken in every two-unit window, for every feature map f,

         z_i^{(l,f)} = \max\left( z_{2i-1}^{(l-1,f)},\ z_{2i}^{(l-1,f)} \right)

     • shrinks the size of the representation by half, so it quickly absorbs differences in sentence length
     • filters out undesirable compositions of words
     Length variability:
     • all-zero padding vectors are appended after the last word of the sentence, up to the maximum length
     • to eliminate the boundary effect caused by the great variability of sentence lengths, a gate g(v) is added to the convolutional unit that sets the output vector to all-zeros if the input is all zeros
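A sketch of the two-unit max-pooling and the zero-padding gate, again my own NumPy illustration of the slide’s description:

```python
import numpy as np

def max_pool_two(z):
    """Max-pooling in non-overlapping two-unit windows: halves the length."""
    n = z.shape[0] // 2 * 2                 # ignore a trailing odd unit
    return np.maximum(z[0:n:2], z[1:n:2])

def gate(v, activation):
    """g(v): zero the convolution output when its input segment v is all
    zero padding, removing the boundary effect of padded positions."""
    return activation if np.any(v) else np.zeros_like(activation)
```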

  9. Some Analysis on the Convolutional Architecture
     The convolutional unit + max-pooling act as a compositional operator with a local selection mechanism, as in the recursive autoencoder.

  10. Some Analysis on the Convolutional Architecture (cont.)
     Compared to recursive models, the architecture:
     • does not take a single path of word/phrase composition (chosen by a separate gating function, an external parser, or just the natural sequential order)
     • takes multiple choices of composition via a large feature map, and leaves it to the subsequent pooling to pick the more appropriate segments for each composition
     • limitation: a fixed depth, which bounds the level of composition it can perform
     Relation to “shallow” convolutional models:
     • a SENNA-type architecture has one convolution layer (local) and one max-pooling layer (global), losing the sentence-level sequential order
     • this architecture is a superset of SENNA-type architectures

  11. Architecture-I (ARC-I)¹
     Convolutional Matching Models:
     • find the representation of each sentence
     • compare the representations of the two sentences with a multi-layer perceptron (MLP)
     ¹ The Siamese architecture: A. Bordes et al. A semantic matching energy function for learning with multi-relational data. Machine Learning, 94(2):233–259, 2014.
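A high-level sketch of the ARC-I flow just described; `encode` (the convolutional sentence model) and `mlp` are hypothetical callables standing in for the trained components:

```python
import numpy as np

def arc1_score(sent_x, sent_y, encode, mlp):
    """ARC-I: encode each sentence independently, then match with an MLP.

    encode: convolutional sentence model (conv + max-pool stack)
    mlp:    matching multi-layer perceptron, returns a scalar score
    """
    rep_x = encode(sent_x)   # formed without any knowledge of sent_y
    rep_y = encode(sent_y)   # formed without any knowledge of sent_x
    return mlp(np.concatenate([rep_x, rep_y]))
```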

  12. Architecture-I (ARC-I) (cont.)
     The drawback of ARC-I:
     • it defers the interaction between the two sentences until their individual representations mature, and so runs the risk of losing details important for the matching task while representing the sentences
     • the representation of each sentence is formed without knowledge of the other, and this cannot be adequately circumvented in the backward phase (learning)

  13. Architecture-II (ARC-II)
     Convolutional Matching Models: ARC-II is built directly on the interaction space between the two sentences:
     • lets the two sentences meet before their own high-level representations mature
     • still retains the space for the individual development of abstraction of each sentence
     Layer 1: “one-dimensional” (1D) convolutions; for segment i on S_X and segment j on S_Y,

         z_{i,j}^{(1,f)} = \sigma\left( w^{(1,f)} [\, x_{i:i+k_1-1};\ y_{j:j+k_1-1} \,] + b^{(1,f)} \right)

     Layer 2: 2D max-pooling in non-overlapping 2 × 2 windows
     Layer 3: 2D convolution on k_3 × k_3 windows of the output from Layer 2
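A NumPy sketch of layers 1 and 2 under this description (illustrative shapes of my own choosing; `k` is the segment width and the rows of W are the layer-1 feature maps):

```python
import numpy as np

def arc2_layer1(x, y, W, b, k=3):
    """ARC-II layer 1: a "1D" convolution over every pair of k-word
    segments, one from each sentence, producing a 2D interaction map.

    x: (len_x, d), y: (len_y, d)  word embeddings of the two sentences
    W: (n_feature_maps, 2*k*d), b: (n_feature_maps,)
    returns (len_x - k + 1, len_y - k + 1, n_feature_maps)
    """
    nx, ny = x.shape[0] - k + 1, y.shape[0] - k + 1
    out = np.zeros((nx, ny, W.shape[0]))
    for i in range(nx):
        for j in range(ny):
            seg = np.concatenate([x[i:i + k].ravel(), y[j:j + k].ravel()])
            out[i, j] = np.maximum(0.0, W @ seg + b)   # ReLU
    return out

def pool_2x2(z):
    """ARC-II layer 2: non-overlapping 2x2 max-pooling on the map."""
    nx, ny = z.shape[0] // 2 * 2, z.shape[1] // 2 * 2
    z = z[:nx, :ny]
    return np.maximum.reduce([z[0:nx:2, 0:ny:2], z[1:nx:2, 0:ny:2],
                              z[0:nx:2, 1:ny:2], z[1:nx:2, 1:ny:2]])
```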

  14. Some Analysis on ARC-II
     Order preservation:
     • both the convolution and the pooling operations in ARC-II have the order-preserving property
     • generally, z_{i,j}^{(l)} contains information about the words in S_X before those in z_{i+1,j}^{(l)}, although the two may be generated from slightly different segments in S_Y, due to the 2D pooling
     Model generality:
     • ARC-II subsumes ARC-I as a special case

  15. Training
     Objective: negative sampling + a large margin
     • stochastic gradient descent with mini-batches (100–200 in size)
     • regularization:
       - early stopping: enough for medium-sized models and large training sets (over 500K instances)
       - early stopping + dropout: for small datasets (fewer than 10K training instances)
     • input initialization: 50-dimensional word embeddings trained with Word2Vec
       - English: learned on Wikipedia (~1B words)
       - Chinese: learned on Weibo data (~300M words)
     • convolution:
       - 3-word windows throughout all experiments
       - various numbers of feature maps tested (typically 200 to 500)
     • architecture:
       - ARC-II for all tasks: 8 layers (3 convolutions + 3 poolings + 2 MLPs)
       - ARC-I: fewer layers (2 convolutions + 2 poolings + 2 MLPs) and more hidden nodes
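The negative-sampling, large-margin objective is a pairwise ranking hinge: given a matched pair (x, y⁺) and a sampled mismatched y⁻, push the score s(x, y⁺) above s(x, y⁻) by a margin. A minimal sketch (a margin of 1 is assumed):

```python
import numpy as np

def ranking_loss(score_pos, score_neg, margin=1.0):
    """Large-margin loss with a sampled negative: zero once the
    positive pair outscores the negative by at least `margin`."""
    return np.maximum(0.0, margin + score_neg - score_pos)
```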

  16. Tasks & Competitor Methods
     Three tasks:
     • matching language objects of heterogeneous natures:
       I. Sentence Completion
       II. Tweet-Response Matching
     • matching homogeneous objects:
       III. Paraphrase Identification
     Competitor methods:
     • WordEmbed: represent each short text as the sum of the embeddings of the words it contains, and match two texts with an MLP (see the sketch after this list)
     • DeepMatch²: 3 hidden layers and 1,000 hidden nodes in the first hidden layer
     • uRAE+MLP³: unfolding RAE, with each sentence represented as a 100-dimensional vector
     • SENNA+MLP/sim: the SENNA-type sentence model
     • SenMLP: takes the whole sentence as input and uses an MLP to obtain the coherence score
     ² Z. Lu and H. Li. A deep architecture for matching short texts. In Advances in NIPS, 2013.
     ³ R. Socher, E. H. Huang, and A. Y. Ng. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in NIPS, 2011.
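The WordEmbed baseline is simple enough to sketch; `emb` (a word-to-vector mapping) and `mlp` are hypothetical stand-ins for the learned components:

```python
import numpy as np

def word_embed_score(words_x, words_y, emb, mlp):
    """WordEmbed baseline: each short text is the sum of its word
    embeddings; an MLP scores the concatenated pair."""
    vec_x = np.sum([emb[w] for w in words_x], axis=0)
    vec_y = np.sum([emb[w] for w in words_y], axis=0)
    return mlp(np.concatenate([vec_x, vec_y]))
```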
