Neural Networks for Natural Language Processing
Alexandre Allauzen
Université Paris-Sud / LIMSI-CNRS
19/01/2017
Outline
1 Introduction
2 The language modeling and tagging tasks
3 Neural network language model
4 Character-based models for sequence tagging
5 Conclusion
“Successful” applications of Natural Language Processing (NLP) [figures: example applications]
Some NLP tasks
- Machine Translation: [Chinese source] → “The 13th Shanghai Film Festival ...”
- Word Sense Disambiguation: “I need new batteries for my mouse”
- Paraphrase: “XYZ acquired ABC yesterday” ↔ “ABC has been taken over by XYZ”
- Syntactic Parsing: “I see him with a telescope”
- Summarization: “The Dow Jones is up”, “The S&P 500 jumped”, “Housing prices rose” → “Economy is good”
- Spam detection: “Let’s go to Agra!” vs “Buy V1AGRA ...”
- Part-of-Speech (POS) tagging: “Colorless green ideas sleep furiously” → ADJ ADJ NOUN VERB ADV
- Coreference resolution: “Carter told Mubarak he shouldn’t run again”
- Dialog / Question Answering: “Where is A Bug’s Life playing?” → “Sept Parnassiens at 7:30”
Why is NLP so hard? Ambiguous, noisy, and with great variability
- Named entities and idioms: “Where is A Bug’s Life playing (...)”, “Let It Be was recorded (...)”, “push the daisies”, “lose face”
- Non-canonical language: “Great job @justinbieber! Were SOO PROUD of what youve done! U taught us 2 #neversaynever & you yourself should never give up either”
- Neologisms: unfriend, retweet, bromance, +1, ...
- World knowledge: “Mary and Sue are sisters” vs “Mary and Sue are mothers”
- Ambiguous headlines: “Hospitals are Sued by 7 Foot Doctors”, “Kids Make Nutritious Snacks”, “Iraqi Head Seeks Arms”
Statistical NLP: a very successful approach, indeed
From Peter Norvig (http://norvig.com/chomsky.html):
- Search engines: 100% of major players are trained and probabilistic.
- Speech recognition: 100% of major systems ...
- Machine translation: 100% of top competitors ...
- Question answering: the IBM Watson system ...
Today, add neural networks / deep learning.
What is statistical NLP? Using statistical techniques to infer structures from text, based on statistical language modeling.
Statistical NLP: a (very) brief history
1970-1983: Early successes in speech recognition
- Hidden Markov models for acoustic modeling
- The first notion of language modeling as a Markov chain
1983-: Dominance of empiricism and statistical methods
- Incorporate probabilities into most language processing
- Use large corpora for training and evaluation
2003-: Neural networks
- As a component at the beginning ... to end-to-end systems today
NLP: statistical issues
Data sparsity in high dimension
- For most NLP tasks: model structured data with very peculiar and sparse distributions, over a large set of possible outcomes (see the sketch below).
Ambiguity and variability
- The context is essential. Language is difficult to “interpret”, even for humans.
→ Learning to efficiently represent language data
→ Neural networks have renewed the research perspectives
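To make the sparsity point concrete, here is a minimal sketch on hypothetical toy corpora (the sentences and the `ngrams` helper are illustrative, not from the slides): even when individual words overlap heavily between two texts, most trigrams in the new text were never seen before.

```python
# Toy illustration of data sparsity: shared words, unseen trigrams.
from itertools import islice

def ngrams(tokens, n):
    """Yield all n-grams of a token list as tuples."""
    return zip(*(islice(tokens, i, None) for i in range(n)))

train = "the cat sat on the mat and the dog sat on the rug".split()
test = "the dog sat on the mat and the cat slept".split()

seen = set(ngrams(train, 3))
test_tri = list(ngrams(test, 3))
unseen = [t for t in test_tri if t not in seen]
print(f"{len(unseen)}/{len(test_tri)} test trigrams never occur in training")
```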
Is it so important? It is decisive!
- Machine translation issue [figure]
- Opinion mining and stock prediction: confusing “A. Hathaway” with “Berkshire Hathaway”
2 The language modeling and tagging tasks
n-gram language model
Applications: Automatic Speech Recognition, Machine Translation, OCR, ...
The goal: estimate the (non-zero) probability of a word sequence for a given vocabulary V.
The n-gram assumption:
$P(w_{1:L}) = \prod_{i=1}^{L} P(w_i \mid w_{i-n+1:i-1}), \quad \forall i,\ w_i \in V$
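As a sketch of how this factorization is used in practice, the snippet below scores a sentence under an assumed conditional model `cond_prob` (a stand-in name; any estimated model such as a smoothed trigram table would do), padding the left context with `<s>` markers for the first words:

```python
# Score a sentence as sum_i log P(w_i | previous n-1 words).
import math

def sequence_log_prob(sentence, cond_prob, n=3):
    """Apply the n-gram factorization with <s> padding on the left."""
    tokens = ["<s>"] * (n - 1) + sentence.split()
    logp = 0.0
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        logp += math.log(cond_prob(tokens[i], context))
    return logp

# Toy stand-in model: uniform over an assumed 10k-word vocabulary.
uniform = lambda w, ctx: 1.0 / 10000
print(sequence_log_prob("time goes by slowly", uniform))
```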
Discrete n-gram model (conventional)
A word given its context, n = 4: $P(w_i = ?\mid w_{i-3}, w_{i-2}, w_{i-1})$
Example: “time goes by → ?”, with one parameter per vocabulary entry: θ_the for “the”, θ_fastly for “fastly”, ..., θ_slowly for “slowly” (|V| outcomes).
|V|^4 parameters, Maximum Likelihood Estimate.
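A minimal sketch of the maximum-likelihood estimate by counting, on a toy corpus (the corpus and function names are illustrative; a real model would also need smoothing for unseen n-grams, which is exactly the sparsity problem above):

```python
# MLE for a trigram model: P(word | context) = c(context, word) / c(context).
from collections import Counter

corpus = "time goes by slowly and time goes by fast".split()
n = 3
ctx_counts, ngram_counts = Counter(), Counter()
for i in range(len(corpus) - n + 1):
    ngram = tuple(corpus[i:i + n])
    ngram_counts[ngram] += 1
    ctx_counts[ngram[:-1]] += 1

def mle(word, context):
    """Relative frequency of `word` after `context`."""
    return ngram_counts[context + (word,)] / ctx_counts[context]

print(mle("by", ("time", "goes")))    # 1.0: "time goes" is always followed by "by"
print(mle("slowly", ("goes", "by")))  # 0.5: "goes by" is followed by slowly/fast
```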
The Zipf law (for French): frequency ∝ 1/rank
[figure: log-log plot of word frequency vs. rank on a French corpus, roughly linear with slope -1; very frequent words (de, la, l’, des, que, ne, même) at the top, down through place, enfants, recherche, cent, prévues, cherchent, prestigieuse, reconnaisse, stimulants, mélopée, to rare words (Hirano, Rainville, Kande) and hapaxes such as Bures-sur-Yvette]
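The shape of this plot is easy to reproduce. The sketch below (assuming a plain-text corpus file, `corpus.txt`, which is hypothetical) ranks words by frequency and compares each observed frequency to the Zipf prediction C/rank:

```python
# Empirical check of Zipf's law: freq(rank) ~ C / rank.
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:  # assumed corpus file
    counts = Counter(f.read().lower().split())

ranked = counts.most_common()
c = ranked[0][1]  # Zipf constant, taken as the frequency of the top word
for rank, (word, freq) in enumerate(ranked[:10], start=1):
    print(f"{rank:3d} {word:15s} freq={freq:8d} zipf~{c / rank:10.1f}")
```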