One-Shot Learning: Language Acquisition for Machines
SS16 Computational Linguistics for Low-Resource Languages
Mayumi Ohta
July 6, 2016
Institute for Computational Linguistics, Heidelberg University
Table of contents
1. Introduction
2. Language Acquisition for Humans
3. Language Acquisition for Machines
   Zero-shot learning
   One-shot learning
   Application to Low-Resource Languages
4. Summary
Introduction
My Interest
Our focus: How can CL/NLP support documenting low-resource languages? (collection, transcription, translation, annotation, etc.)
Implicit assumption: only humans can produce primary language resources, i.e. primary language resources must be produced by humans only.
What if a machine could learn a language? ... of course, it is still a fantasy, but ...
Big breakthrough: Deep Learning (2010 ∼) → no need for manual feature design
Impact of Deep Learning
Example 1. Neural Network Language Model [Mikolov et al. 2011]
"... Princess Mary was easier, fed in had oftened him. Pierre asking his soul came to the packs and drove up his father-in-law women."
generated by an LSTM-RNN language model trained on Leo Tolstoy's "War and Peace"
Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
"Colorless green ideas sleep furiously." (Noam Chomsky)
It looks as if such models know "syntax" (3rd person singular, tense, etc.).
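To make the generation step concrete, the following is a minimal sketch (not Karpathy's char-rnn and not the model behind the quote above) of how a character-level LSTM language model produces text: each predicted character is sampled from the softmax output and fed back as the next input. The tiny vocabulary, the dimensions, and the untrained weights are purely illustrative; a trained model would produce Tolstoy-like text instead of noise.

import torch
import torch.nn as nn

chars = list("abcdefghijklmnopqrstuvwxyz .,'")      # toy character vocabulary
stoi = {c: i for i, c in enumerate(chars)}

class CharLM(nn.Module):
    def __init__(self, vocab, emb=32, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.lstm(self.emb(x), state)
        return self.out(h), state

model = CharLM(len(chars))

# Sample one character at a time, feeding each prediction back in.
x = torch.tensor([[stoi['t']]])
state, generated = None, "t"
with torch.no_grad():
    for _ in range(40):
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1], dim=-1)
        idx = torch.multinomial(probs, 1).item()
        generated += chars[idx]
        x = torch.tensor([[idx]])
print(generated)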
Impact of Deep Learning
Example 2. word2vec [Mikolov et al. 2013a]
KING − MAN + WOMAN = QUEEN
Source: https://www.tensorflow.org/versions/master/tutorials/word2vec/index.html
Intuitive characteristics of "semantics" are (somehow!) embedded in the vector space.
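The analogy above can be reproduced with any pretrained word2vec model (e.g. via gensim's most_similar with positive=['king', 'woman'] and negative=['man']). The toy sketch below uses hand-made 2-dimensional vectors, not real word2vec embeddings, purely to illustrate the vector arithmetic and nearest-neighbour lookup behind KING − MAN + WOMAN = QUEEN.

import numpy as np

# hand-made 2-d embeddings on (royalty, gender) axes; purely illustrative
emb = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def nearest(vec, exclude):
    """Return the vocabulary word with the highest cosine similarity to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(emb[w], vec))

# KING - MAN + WOMAN lands on (1, -1), the vector assigned to "queen"
query = emb["king"] - emb["man"] + emb["woman"]
print(nearest(query, exclude={"king", "man", "woman"}))   # -> queen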
Language Acquisition for Humans
First Language Acquisition
Vocabulary explosion ... what happened?
(Figure: Kobayashi et al. 2012, modified)
Helen Keller (1880 – 1968)
"w-a-t-e-r"
Image source: http://en.wikipedia.org/wiki/Helen_Keller
Language acquisition ... to simplify the problem: the "Everything has a name" model
Language acquisition → vocabulary acquisition → mapping between concepts and words (main focus: nouns)
[image of water] ↔ "water"
Image source: https://de.wikipedia.org/wiki/Wasser
Machine vs. Human
A machine learns:
1. relationships between words (e.g. word2vec)
2. from manually defined features (e.g. SVM, CRF, ...)
3. from a large quantity of training examples
4. iteratively (e.g. SGD)
Human kids learn:
1. relationships between words and concepts
2. from raw data
3. from just one or a few examples
4. immediately (repetition is not necessarily needed)
→ "fast mapping"
Language Acquisition for Machines
Two directions
A machine learning approach inspired by "fast mapping"?
[Diagram: concept (rabbit image) ↔ word "rabbit"; the zero-shot arrow maps concept → word, the one-shot arrow maps word → concept]
Zero-shot learning: unknown concept → known word
One-shot learning: unknown word → known concept
Image source: https://en.wikipedia.org/wiki/Rabbit
Zero-shot learning
Zero-shot learning: Overview
Example: Image Classification Task
[Figure: training images labeled dog, dog, cat, cat, rabbit; query image: (dog|cat|rabbit)?]
Traditional supervised setting
• train a model with labeled image data
• assign a known label to an unseen image
Image source: https://en.wikipedia.org/
Zero-shot learning
• train a model with labeled image data
• assign a known but unseen label to an unseen image
→ no training examples for the classes of the test examples
Image source: https://en.wikipedia.org/
Zero-shot learning: Core idea
Core idea: project image features onto word embeddings
(Figure: Socher et al. 2013, modified)
Zero-shot learning: Formulation [Socher et al. 2013]
Method: multi-layer neural network (backpropagation)
Objective function:

J(\Theta) = \sum_{y \in Y} \sum_{x^{(i)} \in X_y} \left\| \omega_y - \theta^{(2)} f\!\left(\theta^{(1)} x^{(i)}\right) \right\|^2

where
f(·): non-linear activation function such as tanh(·)
θ^(1): weights for the first layer
θ^(2): weights for the second layer
ω_y: word embedding of the known label y ∈ Y
x^(i) ∈ X_y: image features of the input data labeled y
→ update the weights so that the projected image features move close to the word embedding of their label
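Below is a minimal NumPy sketch of this objective, not the authors' code; the dimensions and initialization are made up for illustration. It shows the two-layer projection, the per-example squared loss from J(Θ), and how a test image would be classified by nearest word embedding; the training loop with backpropagation (SGD on J) is omitted.

import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_EMB, HIDDEN = 4096, 50, 100                     # hypothetical sizes
theta1 = rng.normal(scale=0.01, size=(HIDDEN, D_IMG))    # first-layer weights
theta2 = rng.normal(scale=0.01, size=(D_EMB, HIDDEN))    # second-layer weights

def project(x):
    """Map an image feature vector into the word-embedding space."""
    return theta2 @ np.tanh(theta1 @ x)

def loss(x, w_y):
    """Squared distance between the projected image and its label's embedding."""
    return np.sum((w_y - project(x)) ** 2)

def classify(x, label_embeddings):
    """At test time, pick the label (seen or unseen) whose word embedding
    is nearest to the projected image; label_embeddings: dict label -> vector."""
    p = project(x)
    return min(label_embeddings,
               key=lambda y: np.sum((label_embeddings[y] - p) ** 2))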
One-shot learning
One-shot learning: Overview
Example: Automatic Speech Synthesis
Traditional supervised setting
• train a model with labeled audio data (pipelined: segment → cluster → learn transition probabilities)
• generate audio for a given concept
One-shot learning
• jointly train a model with labeled audio data
• generate audio for a concept heard only once before
One-shot learning: Formulation [Lake et al. 2014]
Method: Hierarchical Bayesian (parametric or non-parametric)

\arg\max \Pr(X_{\text{test}} \mid X_{\text{train}}) = \arg\max \frac{\Pr(X_{\text{train}} \mid X_{\text{test}})}{\Pr(X_{\text{train}})}   (1)

(Bayes' rule: \Pr(X_{\text{test}}) is constant with respect to the arg max over candidate training words, so it drops out.)

\Pr(X_{\text{test}} \mid X_{\text{train}}) \approx \sum_{i=1}^{L} \Pr\!\left(X_{\text{test}} \mid Z_{\text{train}}^{(i)}\right) \frac{\Pr\!\left(X_{\text{train}} \mid Z_{\text{train}}^{(i)}\right) \Pr\!\left(Z_{\text{train}}^{(i)}\right)}{\sum_{j=1}^{L} \Pr\!\left(X_{\text{train}} \mid Z_{\text{train}}^{(j)}\right) \Pr\!\left(Z_{\text{train}}^{(j)}\right)}   (2)

\Pr(X_{\text{train}}) \approx \sum_{i=1}^{L} \Pr\!\left(X_{\text{train}} \mid Z_{\text{train}}^{(i)}\right) \Pr\!\left(Z_{\text{train}}^{(i)}\right)   (3)

where
X_train, X_test: sequences of (acoustic) features
Z_train: acoustic segments (units)
L: length (number of units)
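The following is a small sketch of how equations (2) and (3) could be evaluated in log space, assuming the hierarchical Bayesian acoustic model has already produced L sampled unit sequences Z_train^(i) together with their log-likelihoods and log-priors. That acoustic model itself, the core of Lake et al. 2014, is not implemented here, and all function and variable names are illustrative.

import numpy as np
from scipy.special import logsumexp

def log_pr_test_given_train(log_lik_test, log_lik_train, log_prior):
    """Approximate log Pr(X_test | X_train) as in eq. (2).

    Each argument is an array of length L, one entry per sampled Z^(i):
    log Pr(X_test | Z^(i)), log Pr(X_train | Z^(i)), log Pr(Z^(i)).
    """
    log_w = log_lik_train + log_prior      # log [Pr(X_train|Z) Pr(Z)]
    log_w -= logsumexp(log_w)              # normalize: denominator of eq. (2),
                                           # i.e. the eq. (3) estimate of Pr(X_train)
    return logsumexp(log_lik_test + log_w)

def classify(candidates):
    """One-shot classification: pick the training word whose model best
    explains X_test; candidates: dict label -> (log_lik_test, log_lik_train, log_prior)."""
    return max(candidates, key=lambda c: log_pr_test_given_train(*candidates[c]))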