Neural Networks for Natural Language Processing
Alexandre Allauzen
Université Paris-Sud / LIMSI-CNRS
19/01/2017
Outline
1 Introduction
2 The language modeling and tagging tasks
3 Neural network language model
4 Character-based models for sequence tagging
5 Conclusion
“Successful” applications of Natural Language Processing (NLP) [figures: example applications]
Some NLP tasks
- Machine Translation: [Chinese source] → “The 13th Shanghai Film Festival ...”
- Word Sense Disambiguation: “I need new batteries for my mouse”
- Paraphrase: “XYZ acquired ABC yesterday” ↔ “ABC has been taken over by XYZ”
- Syntactic Parsing: “I see him with a telescope”
- Summarization: “The Dow Jones is up”, “The S&P 500 jumped”, “Housing prices rose” → “Economy is good”
- Spam detection: “Let’s go to Agra!” vs “Buy V1AGRA ...”
- Part-of-Speech (POS) tagging: “Colorless green ideas sleep furiously” → ADJ ADJ NOUN VERB ADV
- Coreference resolution: “Carter told Mubarak he shouldn’t run again”
- Dialog / Question Answering: “Where is A Bug’s Life playing?” → “Sept Parnassiens at 7:30”
Why is NLP so hard? Ambiguous, noisy, and with great variability
- Named entities and idioms: “Where is A Bug’s Life playing (...)”, “Let It Be was recorded (...)”, “push the daisies”, “lose face”
- Non-canonical language: “Great job @justinbieber! Were SOO PROUD of what youve done! U taught us 2 #neversaynever & you yourself should never give up either”
- Neologisms: unfriend, retweet, bromance, +1, ...
- World knowledge: “Mary and Sue are sisters” vs “Mary and Sue are mothers”
- Ambiguous headlines: “Hospitals are Sued by 7 Foot Doctors”, “Kids Make Nutritious Snacks”, “Iraqi Head Seeks Arms”
Statistical NLP: a very successful approach, indeed
From Peter Norvig (http://norvig.com/chomsky.html):
- Search engines: 100% of major players are trained and probabilistic.
- Speech recognition: 100% of major systems ...
- Machine translation: 100% of top competitors ...
- Question answering: the IBM Watson system ...
Today, add neural networks / deep learning.
What is statistical NLP? Using statistical techniques to infer structures from text, based on statistical language modeling.
Statistical NLP: a (very) brief history
1970-1983: Early successes in speech recognition
- Hidden Markov models for acoustic modeling
- The first notion of language modeling as a Markov chain
1983-: Dominance of empiricism and statistical methods
- Incorporate probabilities into most language processing
- Use large corpora for training and evaluation
2003-: Neural networks
- As a component at the beginning ... to end-to-end systems today
NLP: statistical issues
Data sparsity in high dimension
- For most NLP tasks: model structured data with very peculiar and sparse distributions, over a large set of possible outcomes (see the sketch below).
Ambiguity and variability
- The context is essential. Language is difficult to “interpret”, even for humans.
→ Learning to efficiently represent language data
→ Neural networks have renewed the research perspectives
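To make the sparsity point concrete, here is a minimal sketch on hypothetical toy corpora (the sentences and the `ngrams` helper are illustrative, not from the slides): even when individual words overlap heavily between two texts, most trigrams in the new text were never seen before.

```python
# Toy illustration of data sparsity: shared words, unseen trigrams.
from itertools import islice

def ngrams(tokens, n):
    """Yield all n-grams of a token list as tuples."""
    return zip(*(islice(tokens, i, None) for i in range(n)))

train = "the cat sat on the mat and the dog sat on the rug".split()
test = "the dog sat on the mat and the cat slept".split()

seen = set(ngrams(train, 3))
test_tri = list(ngrams(test, 3))
unseen = [t for t in test_tri if t not in seen]
print(f"{len(unseen)}/{len(test_tri)} test trigrams never occur in training")
```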
Is it so important? It is decisive!
- Machine translation issue [figure]
- Opinion mining and stock prediction: confusing “A. Hathaway” with “Berkshire Hathaway”
2 The language modeling and tagging tasks
n-gram language model
Applications: Automatic Speech Recognition, Machine Translation, OCR, ...
The goal: estimate the (non-zero) probability of a word sequence for a given vocabulary V.
The n-gram assumption:
$P(w_{1:L}) = \prod_{i=1}^{L} P(w_i \mid w_{i-n+1:i-1}), \quad \forall i,\ w_i \in V$
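As a sketch of how this factorization is used in practice, the snippet below scores a sentence under an assumed conditional model `cond_prob` (a stand-in name; any estimated model such as a smoothed trigram table would do), padding the left context with `<s>` markers for the first words:

```python
# Score a sentence as sum_i log P(w_i | previous n-1 words).
import math

def sequence_log_prob(sentence, cond_prob, n=3):
    """Apply the n-gram factorization with <s> padding on the left."""
    tokens = ["<s>"] * (n - 1) + sentence.split()
    logp = 0.0
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        logp += math.log(cond_prob(tokens[i], context))
    return logp

# Toy stand-in model: uniform over an assumed 10k-word vocabulary.
uniform = lambda w, ctx: 1.0 / 10000
print(sequence_log_prob("time goes by slowly", uniform))
```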
Discrete n-gram model (conventional)
A word given its context, n = 4: $P(w_i = ?\mid w_{i-3}, w_{i-2}, w_{i-1})$
Example: “time goes by → ?”, with one parameter per vocabulary entry: θ_the for “the”, θ_fastly for “fastly”, ..., θ_slowly for “slowly” (|V| outcomes).
|V|^4 parameters, Maximum Likelihood Estimate.
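A minimal sketch of the maximum-likelihood estimate by counting, on a toy corpus (the corpus and function names are illustrative; a real model would also need smoothing for unseen n-grams, which is exactly the sparsity problem above):

```python
# MLE for a trigram model: P(word | context) = c(context, word) / c(context).
from collections import Counter

corpus = "time goes by slowly and time goes by fast".split()
n = 3
ctx_counts, ngram_counts = Counter(), Counter()
for i in range(len(corpus) - n + 1):
    ngram = tuple(corpus[i:i + n])
    ngram_counts[ngram] += 1
    ctx_counts[ngram[:-1]] += 1

def mle(word, context):
    """Relative frequency of `word` after `context`."""
    return ngram_counts[context + (word,)] / ctx_counts[context]

print(mle("by", ("time", "goes")))    # 1.0: "time goes" is always followed by "by"
print(mle("slowly", ("goes", "by")))  # 0.5: "goes by" is followed by slowly/fast
```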
The Zipf law (for French): frequency ∝ 1/rank
[figure: log-log plot of word frequency vs. rank on a French corpus, roughly linear with slope -1; very frequent words (de, la, l’, des, que, ne, même) at the top, down through place, enfants, recherche, cent, prévues, cherchent, prestigieuse, reconnaisse, stimulants, mélopée, to rare words (Hirano, Rainville, Kande) and hapaxes such as Bures-sur-Yvette]
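The shape of this plot is easy to reproduce. The sketch below (assuming a plain-text corpus file, `corpus.txt`, which is hypothetical) ranks words by frequency and compares each observed frequency to the Zipf prediction C/rank:

```python
# Empirical check of Zipf's law: freq(rank) ~ C / rank.
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:  # assumed corpus file
    counts = Counter(f.read().lower().split())

ranked = counts.most_common()
c = ranked[0][1]  # Zipf constant, taken as the frequency of the top word
for rank, (word, freq) in enumerate(ranked[:10], start=1):
    print(f"{rank:3d} {word:15s} freq={freq:8d} zipf~{c / rank:10.1f}")
```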