Statistical Natural Language Processing

Çağrı Çöltekin / tʃaːɾˈɯ tʃœltecˈɪn /
Seminar für Sprachwissenschaft
University of Tübingen
ccoltekin@sfs.uni-tuebingen.de
Summer Semester 2018

Why study (statistical) NLP

• (Most of) you are studying in a ‘computational linguistics’ program
• Many practical applications
• Investigating basic questions in linguistics and cognitive science (and more)

Application examples

For profit (engineering):
• Machine translation
• Question answering
• Information retrieval
• Dialog systems
• Summarization
• Text classification
• Text mining/analytics
• Sentiment analysis
• Speech recognition/synthesis
• Automatic grading
• Forensic linguistics

For fun (research):
• Modeling cognitive/social behavior
• Authorship attribution
• Investigating language change through time and space
• (Automatic) corpus annotation for linguistic research

Layers of linguistic analysis

[Figure: analysis and generation across the layers phonetics/phonology, morphology, syntax, semantics and discourse; analysis runs from speech recognition and morphological analysis through parsing to semantic and discourse analysis, generation from sentence planning down to speech synthesis]

Annotation layers: example

From   the   AP     comes    this      story   :
ADP    DET   PROPN  VERB     DET       NOUN    PUNCT
       Def   Sing   3s,Pres  Sing,Dem  Sing
case   det   obl    root     det       nsubj   punct

Typical NLP pipeline

• Text processing / normalization
• Word/sentence tokenization
• POS tagging
• Morphological analysis
• Syntactic parsing
• Semantic parsing
• Named entity recognition
• Coreference resolution

[Figure: pipeline in which the output of each stage (tokens, POS tags, morphology, syntax, …) is passed on to the next]

Do we need a pipeline?

• Most “traditional” NLP architectures are based on a pipeline approach:
  – tasks are done individually, and results are passed to the next level up
• Joint learning (e.g., POS tagging and syntax) often improves the results
• End-to-end learning (without intermediate layers) is another (recent/trending) approach

On the word ‘statistical’

“But it must be recognized that the notion ’probability of a sentence’ is an entirely useless one, under any known interpretation of this term.” (Chomsky, 1968)

• Some linguistic traditions emphasize(d) the use of ‘symbolic’, rule-based methods
• Some NLP systems are rule-based (esp. those from the ’80s and ’90s)
• Virtually all modern NLP systems include some sort of statistical component
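The pipeline architecture described in these slides can be sketched as a chain of functions, each consuming the previous stage's output. This is a minimal illustration, not the course's implementation: the lexicon `TOY_POS` and the helpers `tokenize`, `pos_tag` and `run_pipeline` are hypothetical, and only the first two pipeline stages are modeled, using the "From the AP comes this story :" example.

```python
from typing import Any, Callable

# Hypothetical toy lexicon covering only the example sentence; a real tagger
# would be learned from annotated data rather than listed by hand.
TOY_POS = {
    "From": "ADP", "the": "DET", "AP": "PROPN", "comes": "VERB",
    "this": "DET", "story": "NOUN", ":": "PUNCT",
}


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer (real tokenizers also split off punctuation)."""
    return text.split()


def pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Dictionary-lookup tagger; unknown tokens get the placeholder tag 'X'."""
    return [(tok, TOY_POS.get(tok, "X")) for tok in tokens]


def run_pipeline(data: Any, stages: list[Callable[[Any], Any]]) -> Any:
    """Apply each stage to the output of the previous one, in order."""
    for stage in stages:
        data = stage(data)
    return data


tagged = run_pipeline("From the AP comes this story :", [tokenize, pos_tag])
print(tagged)
```

Because each stage only sees its predecessor's output, errors propagate: `run_pipeline("this story:", [tokenize, pos_tag])` leaves `story:` unsplit, and the tagger can only label it `X`. Joint and end-to-end learning are two ways of avoiding this kind of cascading error.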
What is difficult with NLP?

• Ambiguity
• Data sparseness
• Combinatorial problems: computational complexity

Fun with newspaper headlines

• FARMER BILL DIES IN HOUSE
• TEACHER STRIKES IDLE KIDS
• SQUAD HELPS DOG BITE VICTIM
• BAN ON NUDE DANCING ON GOVERNOR’S DESK
• PROSTITUTES APPEAL TO POPE
• KIDS MAKE NUTRITIOUS SNACKS
• DRUNK GETS NINE MONTHS IN VIOLIN CASE
• MINERS REFUSE TO WORK AFTER DEATH

More ambiguities

We do not recognize many of them at first read:
• Time flies like an arrow; fruit flies like a banana.
• Outside of a dog, a book is a man’s best friend; inside, it’s too hard to read.
• One morning I shot an elephant in my pajamas. How he got in my pajamas, I don’t know.
• Hearing voices? Then you’re not alone!
• No parking on both sides.
• They are canning peas.
• My job was keeping him alive.
• We watched another fly.
• Double job pay.
• He fed her cat food.
• Don’t eat the pizza with knife and fork; the one with anchovies is better.

Even more ambiguities (with pretty pictures)

[Cartoon: Theories of Linguistics, SpecGram Vol CLIII, No 4, 2008. http://specgram.com/CLIII.4/school.gif]

NLP and ambiguity

• Statistical methods (machine learning) are the best way we know to deal with ambiguities
• Even for rule-based approaches, a statistical disambiguation component is necessary

Statistical methods and data sparsity

• Machine learning methods require (annotated) data
• But …

Languages are full of rare events

[Figure: word frequencies in a small corpus, relative frequency plotted against rank; a few frequent words, then a long tail follows …]

NLP and computational complexity

• How many possible parses can a sentence have?
• How many ways can you align two (parallel) sentences?
• How do we calculate the probability of a sentence based on the probabilities of the words in it?
• Many similar questions we deal with have an exponential search space
• Naive approaches are often computationally intractable

What is in this course

• Quick introduction / refreshers on important prerequisites
• The computational linguist’s toolbox: basic methods and tools in NLP
• Some applications of NLP
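The “word frequencies in a small corpus” plot and the point that languages are full of rare events can be reproduced in a few lines: count the words, sort them by frequency, and check how many word types occur only once (hapax legomena). A minimal sketch; the toy corpus is made up for illustration, so its numbers show only the shape of the distribution, not real statistics.

```python
from collections import Counter


def rank_frequencies(tokens: list[str]) -> list[tuple[int, str, float]]:
    """Return (rank, word, relative frequency) triples, most frequent first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # most_common() sorts by count; ties keep first-seen order.
    return [(rank, word, n / total)
            for rank, (word, n) in enumerate(counts.most_common(), start=1)]


def hapax_legomena(tokens: list[str]) -> list[str]:
    """Words that occur exactly once in the corpus."""
    return sorted(w for w, n in Counter(tokens).items() if n == 1)


# Hypothetical toy corpus, made up for illustration only.
corpus = ("the cat sat on the mat and the dog sat on the log "
          "while a bird watched the cat").split()

for rank, word, freq in rank_frequencies(corpus)[:4]:
    print(rank, word, round(freq, 3))
print(len(hapax_legomena(corpus)), "of", len(set(corpus)), "word types occur once")
# → 8 of 12 word types occur once
```

Even in 19 tokens, two thirds of the word types are seen only once; a model trained on such data has almost no evidence about most of the words it knows.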
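For the question of how many parses a sentence can have, one simplified answer shows the exponential search space: if we ignore grammar and labels and count every possible binary-branching tree over n words, the count is the Catalan number C(n-1). The sketch below assumes unlabeled binary trees only (a real grammar rules some out, while node labels multiply the rest); it counts them by recursion over the root's split point and checks the closed form.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_binary_trees(n_words: int) -> int:
    """Number of distinct binary bracketings of a sequence of n_words words."""
    if n_words == 1:
        return 1
    # The root splits the words into a left part (k words) and a right part
    # (n_words - k words); each part can be bracketed independently.
    return sum(count_binary_trees(k) * count_binary_trees(n_words - k)
               for k in range(1, n_words))


def catalan(n: int) -> int:
    """Closed form of the n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


for n in (5, 10, 20, 40):
    print(f"{n:>2} words: {count_binary_trees(n):,} binary trees")
```

A 5-word sentence has 14 such trees, but a 20-word sentence already has 1,767,263,190, so enumerating all analyses and scoring each one is intractable. The memoization in `count_binary_trees` (reusing results for sub-spans) is the same idea that makes chart-parsing algorithms polynomial.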
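The simplest answer to how a sentence's probability could be computed from the probabilities of its words is to assume the words are independent and multiply their probabilities (a unigram model, estimated here by relative frequency). This is a sketch of that one assumption, not necessarily the model used later in the course; the training text and the helper names `train_unigram` and `sentence_logprob` are hypothetical. Log probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow on long sentences.

```python
import math
from collections import Counter


def train_unigram(tokens: list[str]) -> dict[str, float]:
    """Maximum-likelihood unigram estimates: P(w) = count(w) / N."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def sentence_logprob(model: dict[str, float], sentence: list[str]) -> float:
    """Log probability of a sentence, assuming words are independent.

    Returns -inf if any word was never seen in training.
    """
    logp = 0.0
    for w in sentence:
        p = model.get(w, 0.0)
        if p == 0.0:
            return float("-inf")
        logp += math.log(p)
    return logp


# Hypothetical toy training data.
train = "the cat sat on the mat the dog sat on the log".split()
model = train_unigram(train)
print(sentence_logprob(model, "the cat sat".split()))   # ≈ -5.375 (= log 1/216)
print(sentence_logprob(model, "the bird sat".split()))  # unseen "bird": -inf
```

The sketch exposes two problems discussed in these slides. It ignores word order, so “sat cat the” scores the same as “the cat sat”. And because of data sparsity, a single unseen word gives the whole sentence probability zero; smoothing techniques (not shown here) address this.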