Lecture 18: Natural Language Processing
Marco Chiarandini
Department of Mathematics & Computer Science, University of Southern Denmark
(Slides by Dan Klein, UC Berkeley)
Course Overview
✔ Introduction
  ✔ Artificial Intelligence
  ✔ Intelligent Agents
✔ Search
  ✔ Uninformed Search
  ✔ Heuristic Search
✔ Uncertain knowledge and Reasoning
  ✔ Probability and Bayesian approach
  ✔ Bayesian Networks
  ✔ Hidden Markov Chains
  ✔ Kalman Filters
✔ Learning
  ✔ Supervised: Decision Trees, Neural Networks, Learning Bayesian Networks
  ✔ Unsupervised: EM Algorithm
  ✔ Reinforcement Learning
◮ Games and Adversarial Search
  ◮ Minimax search and Alpha-beta pruning
  ◮ Multiagent search
◮ Knowledge representation and Reasoning
  ◮ Propositional logic
  ◮ First order logic
  ◮ Inference
◮ Planning
Outline
1. Recap
2. Speech Recognition
3. Machine Translation
   ◮ Statistical MT
   ◮ Rule-based MT
Recap: Sequential data [figure]
Recap: Filtering [figure]
Recap: State Trellis
◮ State trellis: graph of states and transitions over time
◮ Each arc represents a transition $x_{t-1} \to x_t$
◮ Each arc has weight $\Pr(x_t \mid x_{t-1}) \Pr(e_t \mid x_t)$
◮ Each path is a sequence of states
◮ The product of the weights on a path is that state sequence's probability
◮ The Forward algorithm computes the sum over all paths in this graph; the Viterbi algorithm computes the best path
Recap: Forward/Viterbi [figure]
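As a concrete reminder, here is a minimal sketch of the Forward algorithm as one matrix-vector product per trellis step. The matrix conventions and the toy umbrella-world numbers in the usage example are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def forward(prior, trans, emit, evidence):
    """Forward algorithm on a state trellis.

    prior[x]   = Pr(X_0 = x)
    trans[x,y] = Pr(X_t = y | X_{t-1} = x)
    emit[x,e]  = Pr(E_t = e | X_t = x)
    evidence   = observed symbols e_1..e_T (as indices)

    Sums the weights of all trellis paths ending in each state,
    normalizing along the way, to get Pr(X_T | e_{1:T}).
    """
    belief = prior.copy()
    for e in evidence:
        belief = emit[:, e] * (trans.T @ belief)  # one trellis step
        belief /= belief.sum()                    # keep it a distribution
    return belief

# Toy usage: two states (rain, sun), umbrella observed on two days.
prior = np.array([0.5, 0.5])
trans = np.array([[0.7, 0.3],
                  [0.3, 0.7]])
emit  = np.array([[0.9, 0.1],   # Pr(umbrella | rain), Pr(no umbrella | rain)
                  [0.2, 0.8]])
print(forward(prior, trans, emit, [0, 0]))  # belief in rain goes up
```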
Recap: Particle Filtering
Particles: track samples of states rather than an explicit distribution [figure]
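The elapse-time / observe / resample cycle can be sketched in a few lines. The `transition_sample` and `emission_prob` callbacks below are hypothetical stand-ins for whatever dynamics and sensor model the filter is tracking.

```python
import numpy as np

def particle_filter_step(particles, transition_sample, emission_prob, evidence, rng):
    """One particle-filter update approximating Pr(X_t | e_{1:t}).

    particles         : array of state samples for Pr(X_{t-1} | e_{1:t-1})
    transition_sample : maps the sample set through the transition model
    emission_prob     : returns Pr(evidence | state) for each particle
    """
    particles = transition_sample(particles, rng)   # elapse time
    weights = emission_prob(particles, evidence)    # observe: weight by likelihood
    weights = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]                           # resample a fresh, unweighted set
```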
Natural Language
◮ 100,000 years ago humans started to speak
◮ 7,000 years ago humans started to write
Machines process natural language to:
◮ acquire information
◮ communicate with humans
Natural Language Processing
◮ Speech technologies
  ◮ Automatic speech recognition (ASR)
  ◮ Text-to-speech synthesis (TTS)
  ◮ Dialog systems
◮ Language processing technologies
  ◮ Machine translation
  ◮ Information extraction
  ◮ Web search, question answering
  ◮ Text classification, spam filtering, etc.
Outline
1. Recap
2. Speech Recognition
3. Machine Translation
   ◮ Statistical MT
   ◮ Rule-based MT
Digitizing Speech
Speech input is an acoustic waveform [figure]
Spectral Analysis [figure]
Acoustic Feature Sequence [figure]
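To make the pipeline concrete, here is a rough sketch of how a digitized waveform becomes a sequence of acoustic feature vectors. The 25 ms windows every 10 ms at 16 kHz are typical conventions, and real recognizers use MFCCs rather than the raw log spectra computed here; both choices are illustrative assumptions.

```python
import numpy as np

def log_spectral_frames(signal, frame_len=400, hop=160):
    """Slice a waveform into overlapping windows (400 samples = 25 ms,
    hop 160 = 10 ms at 16 kHz) and return one log-magnitude spectrum
    per frame -- a stand-in for MFCC-style acoustic feature vectors."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        spectrum = np.abs(np.fft.rfft(frame))
        frames.append(np.log(spectrum + 1e-10))  # avoid log(0)
    return np.array(frames)  # shape: (num_frames, frame_len // 2 + 1)
```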
State Space
◮ $\Pr(E \mid X)$ encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)
◮ $\Pr(X \mid X')$ encodes how sounds can be strung together
◮ We will have one state for each sound in each word
◮ From some state x, we can only:
  ◮ stay in the same state (e.g. speaking slowly)
  ◮ move to the next position in the word
  ◮ at the end of the word, move to the start of the next word
◮ We build a little state graph for each word and chain them together to form our state space X (see the sketch below)
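One way to write down such a "little state graph" for a single word is as a list of weighted arcs. The phoneme names and the 0.5 self-loop probability below are made-up illustrative values.

```python
def word_hmm_arcs(phonemes, p_stay=0.5):
    """Left-to-right state graph for one word: each phoneme is a state
    that either loops on itself (slow speech) or advances to the next
    position in the word. Returns (from_state, to_state, prob) arcs."""
    arcs = []
    for i, ph in enumerate(phonemes):
        arcs.append(((ph, i), (ph, i), p_stay))   # stay in the same sound
        if i + 1 < len(phonemes):
            arcs.append(((ph, i), (phonemes[i + 1], i + 1), 1 - p_stay))
    return arcs

# e.g. the word "yes": word_hmm_arcs(["y", "eh", "s"])
# Chaining word graphs (end of word -> start of next word) builds X.
```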
HMM for speech [figure]
Transitions with Bigrams [figure]
Decoding
◮ While there are some practical issues, finding the words given the acoustics is an HMM inference problem
◮ We want the state sequence $x_{1:T}$ that is most likely given the evidence $e_{1:T}$:
$$x^*_{1:T} = \arg\max_{x_{1:T}} \Pr(x_{1:T} \mid e_{1:T})$$
◮ From the sequence x, we can simply read off the words
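This argmax is exactly what the Viterbi algorithm computes. A minimal sketch, using the same matrix conventions as the Forward sketch earlier (assumed, not from the slides):

```python
import numpy as np

def viterbi(prior, trans, emit, evidence):
    """Most likely state sequence argmax Pr(x_{1:T} | e_{1:T}).
    Same trellis as the Forward algorithm, with max replacing sum,
    plus backpointers to read off the best path at the end."""
    T, S = len(evidence), len(prior)
    score = np.log(prior) + np.log(emit[:, evidence[0]])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + np.log(trans)  # cand[x, y]: best path into y via x
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + np.log(emit[:, evidence[t]])
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # follow backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```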
Outline
1. Recap
2. Speech Recognition
3. Machine Translation
   ◮ Statistical MT
   ◮ Rule-based MT
Machine Translation
◮ Fundamental goal: analyze and process human language, broadly, robustly, accurately...
◮ End systems that we want to build:
  ◮ Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering...
  ◮ Modest: spelling correction, text categorization, language recognition, genre classification.
Language Models
◮ A language is defined by a set of strings and the rules that generate them, called a grammar.
◮ Formal languages also need semantics that define meaning.
◮ Natural languages are:
  1. not definitive: there is disagreement about the grammar rules, e.g. "Not to be invited is sad" vs. "To be not invited is sad"
  2. ambiguous: "Entire store 25% off"; "I will bring my bike tomorrow if it looks nice in the morning."
  3. large and constantly changing
◮ n-gram: a sequence of n characters (or of n words, syllables, ...)
◮ n-gram models define probability distributions over these sequences
◮ An n-gram model is a Markov chain of order n − 1. For a character trigram:
$$\Pr(c_i \mid c_{1:i-1}) = \Pr(c_i \mid c_{i-2:i-1})$$
$$\Pr(c_{1:N}) = \prod_{i=1}^{N} \Pr(c_i \mid c_{1:i-1}) = \prod_{i=1}^{N} \Pr(c_i \mid c_{i-2:i-1})$$
◮ With an alphabet of 100 characters, the trigram table already has $100^3 = 10^6$ entries; with words it is even worse
◮ Corpus: a body of text from which the model is estimated
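A counting-based character trigram model fits in a few lines. The add-alpha smoothing is my own choice to keep unseen trigrams from getting probability zero; the slides do not specify a smoothing scheme.

```python
from collections import Counter

def train_trigram(corpus, alpha=1.0):
    """Estimate Pr(c_i | c_{i-2:i-1}) from a string by counting,
    with add-alpha smoothing over the observed character vocabulary."""
    vocab = sorted(set(corpus))
    tri, bi = Counter(), Counter()
    for i in range(2, len(corpus)):
        tri[corpus[i - 2:i + 1]] += 1   # trigram c_{i-2} c_{i-1} c_i
        bi[corpus[i - 2:i]] += 1        # its bigram context

    def prob(c, context):               # context = previous two characters
        return (tri[context + c] + alpha) / (bi[context] + alpha * len(vocab))

    return prob
```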
Language identification
Learned from a corpus for each language l: $\Pr(c_i \mid c_{i-2:i-1}, l)$
Most probable language:
$$l^* = \arg\max_l \Pr(l \mid c_{1:N}) = \arg\max_l \Pr(l)\,\Pr(c_{1:N} \mid l) \quad \text{(Bayes)}$$
$$= \arg\max_l \Pr(l) \prod_{i=1}^{N} \Pr(c_i \mid c_{i-2:i-1}, l) \quad \text{(Markov property)}$$
Computers can reach 99% accuracy on this task.
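Given one trigram model per language (e.g. trained with `train_trigram` above), the argmax is a straightforward log-space comparison. The `models` and `priors` dictionaries are assumed inputs.

```python
import math

def identify(text, models, priors):
    """argmax_l Pr(l) * prod_i Pr(c_i | c_{i-2:i-1}, l), in log space.

    models : language -> trigram prob function, as from train_trigram
    priors : language -> Pr(l)
    """
    def log_likelihood(prob):
        return sum(math.log(prob(text[i], text[i - 2:i]))
                   for i in range(2, len(text)))
    return max(models, key=lambda l: math.log(priors[l]) + log_likelihood(models[l]))
```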
Machine Translation
◮ Rough translation: gives the main point but contains errors
◮ Pre-edited translation: the original text is written in a constrained language that is easier to translate automatically
◮ Restricted-source translation: fully automatic, but only on narrow technical content such as weather forecasts
Machine Translation Systems
Very much simplified, there are three types of machine translation:
◮ Statistical machine translation (SMT): learns relational dependencies between features such as n-grams, lemmas, etc.
  • Requires large data sets
  • Example: Google Translate
  • Relatively easy to implement
◮ Rule-based machine translation (RBMT): uses grammatical rules and language constructions to analyze syntax and semantics
  • Uses moderate-size data sets
  • Long development time, requires expertise
◮ Hybrid machine translation: either constructs a translation with RBMT and uses SMT to post-process and optimize the result, or uses grammatical rules to derive further features that are then fed into the statistical learning machine
  • A new direction of research
Brief History [figure]
◮ Interlingual model: the source language (the text to be translated) is transformed into an interlingua, an abstract language-independent representation. The target language is then generated from the interlingua.
◮ Transfer model: the source language is transformed into an abstract, less language-specific representation. Linguistic rules specific to the language pair then transform this into an abstract target-language representation, from which the target sentence is generated.
◮ Direct model: words are translated directly without passing through an additional representation.
Levels of Transfer: the Vauquois pyramid [figure]
◮ Interlingua: Attraction(NamedJohn, NamedMary, High)
◮ English Semantics: Loves(John, Mary) ↔ French Semantics: Aime(Jean, Marie)
◮ English Syntax: S(NP(John), VP(loves, NP(Mary))) ↔ French Syntax: S(NP(Jean), VP(aime, NP(Marie)))
◮ English Words: "John loves Mary" ↔ French Words: "Jean aime Marie"
Levels of Transfer [figure]
The problem with dictionary lookups [figure]
Statistical Machine Translation
Data-driven MT
◮ e: a sentence (sequence of strings) in English
◮ f: a sentence in French
$$f^* = \arg\max_f \Pr(f \mid e) = \arg\max_f \Pr(e \mid f)\,\Pr(f)$$
◮ $\Pr(e \mid f)$ is learned from a bilingual (parallel) corpus made of phrases seen before
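In code, the noisy-channel choice is a one-line argmax once the two models exist. Everything named below (`candidates`, the translation-model and language-model scorers) is a hypothetical stand-in; real decoders search the candidate space rather than enumerating it.

```python
def translate(e, candidates, tm_logprob, lm_logprob):
    """Noisy-channel decoding f* = argmax_f Pr(e | f) Pr(f), as a
    brute-force argmax over an assumed list of candidate sentences.

    tm_logprob(e, f) : log Pr(e | f), from the bilingual corpus
    lm_logprob(f)    : log Pr(f), from a French language model
    """
    return max(candidates, key=lambda f: tm_logprob(e, f) + lm_logprob(f))
```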
[figure: phrase alignment of "There is a smelly wumpus sleeping in 2 2" with "Il y a un wumpus malodorant qui dort à 2 2"; English phrases $e_1, \dots, e_5$ map to French phrases $f_1, \dots, f_5$ with distortions $d_1 = 0$, $d_2 = +1$, $d_3 = -2$, $d_4 = +1$, $d_5 = 0$]

Given an English sentence e, find the French sentence $f^*$:
1. break e into phrases $e_1, \dots, e_n$
2. for each $e_i$, choose a French phrase $f_i$ with probability $\Pr(f_i \mid e_i)$
3. choose a permutation of the phrases $f_1, \dots, f_n$: for each $f_i$, choose a distortion $d_i$, the number of words that phrase $f_i$ has moved with respect to $f_{i-1}$

$$\Pr(f, d \mid e) = \prod_{i=1}^{n} \Pr(f_i \mid e_i) \Pr(d_i)$$

With 100 candidate French phrases for each English phrase, a 5-phrase sentence already has $100^5$ different phrase choices and 5! reorderings.
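Scoring one candidate (segmentation, phrase choice, reordering) under this model is a sum of log probabilities. The exponential-decay distortion model below is a made-up illustration; the slides only say that $\Pr(d_i)$ exists.

```python
import math

def score_translation(e_phrases, f_phrases, distortions,
                      phrase_logprob, distortion_logprob):
    """log Pr(f, d | e) = sum_i [ log Pr(f_i | e_i) + log Pr(d_i) ]
    for one segmentation, phrase choice, and reordering."""
    return sum(phrase_logprob(f, e) + distortion_logprob(d)
               for e, f, d in zip(e_phrases, f_phrases, distortions))

def distortion_logprob(d, penalty=1.0):
    """Toy distortion model Pr(d) proportional to exp(-|d|): long jumps
    are penalized. Unnormalized, which is fine inside an argmax."""
    return -penalty * abs(d)
```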