
Outline of today’s lecture: Natural Language Processing, Lecture 1



  1. Natural Language Processing
     Lecture 1: Introduction
     Simone Teufel
     Computer Laboratory, University of Cambridge
     January 2012
     Lecture Materials created by Ann Copestake

     Outline of today’s lecture
       Overview of the course
       Why NLP is hard
       Scope of NLP
       A sample application: sentiment classification
       More NLP applications
       NLP components

     Overview of the course: NLP and linguistics
     NLP: the computational modelling of human language.
     1. Morphology: the structure of words (lecture 2).
     2. Syntax: the way words are used to form phrases (lectures 3, 4 and 5).
     3. Semantics
        ◮ Compositional semantics: the construction of meaning based on syntax (lecture 6).
        ◮ Lexical semantics: the meaning of individual words (lecture 6).
     4. Pragmatics: meaning in context (lecture 7).

     Overview of the course: Also note
     ◮ Exercises: pre-lecture and post-lecture
     ◮ Glossary
     ◮ Recommended Book: Jurafsky and Martin (2008).

  2. Why NLP is hard: Querying a knowledge base
     User query:
     ◮ Has my order number 4291 been shipped yet?
     Database:
       ORDER
       Order number   Date ordered   Date shipped
       4290           2/2/09         2/2/09
       4291           2/2/09         2/2/09
       4292           2/2/09
     USER: Has my order number 4291 been shipped yet?
     DB QUERY: order(number=4291,date_shipped=?)
     RESPONSE: Order number 4291 was shipped on 2/2/09

     Why NLP is hard: Why is this difficult?
     Similar strings mean different things, different strings mean the same thing:
     1. How fast is the TZ?
     2. How fast will my TZ arrive?
     3. Please tell me when I can expect the TZ I ordered.
     Ambiguity:
     ◮ Do you sell Sony laptops and disk drives?
     ◮ Do you sell (Sony (laptops and disk drives))?
     ◮ Do you sell ((Sony laptops) and disk drives)?
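
The USER / DB QUERY / RESPONSE exchange on the knowledge-base slide can be sketched roughly as below. This is a minimal illustration, not the lecture's method: the in-memory `ORDERS` table, the single regular expression `SHIPPED_QUERY` and the function `answer` are all assumptions made here.

```python
import re

# Toy ORDER table from the slide: order number -> (date ordered, date shipped).
# None marks an order that has not been shipped yet.
ORDERS = {
    4290: ("2/2/09", "2/2/09"),
    4291: ("2/2/09", "2/2/09"),
    4292: ("2/2/09", None),
}

# Hypothetical pattern: it only covers queries phrased like the slide's example.
SHIPPED_QUERY = re.compile(
    r"has my order number (\d+) been shipped yet\?", re.IGNORECASE
)


def answer(user_query: str) -> str:
    """Map a user query to a lookup on ORDERS and render a response."""
    match = SHIPPED_QUERY.search(user_query)
    if match is None:
        return "Sorry, I did not understand the question."
    number = int(match.group(1))
    if number not in ORDERS:
        return f"Order number {number} was not found"
    _, date_shipped = ORDERS[number]
    if date_shipped is None:
        return f"Order number {number} has not been shipped yet"
    return f"Order number {number} was shipped on {date_shipped}"
```

Here `answer("Has my order number 4291 been shipped yet?")` returns "Order number 4291 was shipped on 2/2/09", while any rephrasing (for example "Did order 4291 ship?") falls through to the fallback message. That brittleness is the point of the slide: different strings can mean the same thing, and a fixed pattern does not capture that.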

  3. Why NLP is hard: Wouldn’t it be better if . . . ?
     The properties which make natural language difficult to process are essential to human communication:
     ◮ Flexible
     ◮ Learnable but compact
     ◮ Emergent, evolving systems
     Synonymy and ambiguity go along with these properties.
     Natural language communication can be indefinitely precise:
     ◮ Ambiguity is mostly local (for humans)
     ◮ Semi-formal additions and conventions for different genres

  4. Scope of NLP: Some NLP applications
     ◮ spelling and grammar checking
     ◮ optical character recognition (OCR)
     ◮ screen readers
     ◮ augmentative and alternative communication
     ◮ machine aided translation
     ◮ lexicographers’ tools
     ◮ information retrieval
     ◮ document classification
     ◮ document clustering
     ◮ information extraction
     ◮ question answering
     ◮ summarization
     ◮ text segmentation
     ◮ exam marking
     ◮ report generation
     ◮ machine translation
     ◮ natural language interfaces to databases
     ◮ email understanding
     ◮ dialogue systems

     A sample application: Sentiment classification: finding out what people think about you
     ◮ Task: scan documents for positive and negative opinions on people, products etc.
     ◮ Find all references to entity in some document collection: list as positive, negative (possibly with strength) or neutral.
     ◮ Summaries plus text snippets.
     ◮ Fine-grained classification: e.g., for phone, opinions about: overall design, keypad, camera.
     ◮ Still often done by humans . . .

     A sample application: Motorola KRZR (from the Guardian)
     Motorola has struggled to come up with a worthy successor to the RAZR, arguably the most influential phone of the past few years. Its latest attempt is the KRZR, which has the same clamshell design but has some additional features. It has a striking blue finish on the front and the back of the handset is very tactile brushed rubber. Like its predecessors, the KRZR has a laser-etched keypad, but in this instance Motorola has included ridges to make it easier to use.
     . . .
     Overall there’s not much to dislike about the phone, but its slightly quirky design means that it probably won’t be as huge or as hot as the RAZR.
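
The step "find all references to entity, list as positive, negative or neutral, with text snippets" could be approximated along the following lines. This is only a sketch under loud assumptions: the word lists `POSITIVE` and `NEGATIVE`, the naive sentence splitter and the function `classify_mentions` are invented here for illustration and are not from the lecture.

```python
import re

# Tiny hand-made word lists, purely illustrative (an assumption, not from the slides).
POSITIVE = {"striking", "worthy", "influential", "easier"}
NEGATIVE = {"struggled", "dislike", "quirky"}


def classify_mentions(text: str, entity: str) -> list[tuple[str, str]]:
    """Return (label, snippet) for each sentence that mentions the entity."""
    results = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if entity.lower() not in sentence.lower():
            continue
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        # Score = number of distinct positive words minus distinct negative words.
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        results.append((label, sentence))
    return results
```

Run on the closing sentence of the Guardian review with `entity="phone"`, this sketch labels it negative because of "dislike" and "quirky", although "not much to dislike" is mild praise. Word lists alone miss that kind of context, which is one reason the task is still often done by humans.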

  5. A sample application: Sentiment classification: the research task
     ◮ Full task: information retrieval, cleaning up text structure, named entity recognition, identification of relevant parts of text. Evaluation by humans.
     ◮ Research task: preclassified documents, topic known, opinion in text along with some straightforwardly extractable score.
     ◮ Movie review corpus, with ratings.

     A sample application: IMDb: An American Werewolf in London (1981)
     Rating: 9/10
     Ooooo. Scary.
     The old adage of the simplest ideas being the best is once again demonstrated in this, one of the most entertaining films of the early 80’s, and almost certainly Jon Landis’ best work to date. The script is light and witty, the visuals are great and the atmosphere is top class. Plus there are some great freeze-frame moments to enjoy again and again. Not forgetting, of course, the great transformation scene which still impresses to this day.
     In Summary: Top banana

     A sample application: Bag of words technique
     ◮ Treat the reviews as collections of individual words.
     ◮ Classify reviews according to positive or negative words.
     ◮ Could use word lists prepared by humans, but machine learning based on a portion of the corpus (training set) is preferable.
     ◮ Use star rankings for training and evaluation.
     ◮ Pang et al, 2002: Chance success is 50% (movie database was artificially balanced), bag-of-words gives 80%.

     A sample application: Sentiment words
     [Figure: plot for the word “thanks”]
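
The bag of words technique with machine learning on a training set might look like the sketch below. The slides do not say which learner to use; a Naive Bayes classifier with add-one smoothing is assumed here as one simple choice, and the miniature `TRAINING` set is made up for illustration (a real experiment would train on rated reviews, using the star rankings as labels).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens; word order is not used."""
    return re.findall(r"[a-z']+", text.lower())


def train(reviews: list[tuple[str, str]]):
    """Count words per label from (text, label) training pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training reviews
    for text, label in reviews:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text: str, model) -> str:
    """Pick the label with the highest add-one-smoothed log probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up miniature training set (hypothetical data, not from any corpus).
TRAINING = [
    ("light and witty, great visuals", "pos"),
    ("entertaining and great fun", "pos"),
    ("dull script and weak visuals", "neg"),
    ("boring, weak and slow", "neg"),
]
model = train(TRAINING)
```

With this toy data, `classify("great and witty", model)` returns "pos" and `classify("weak and boring script", model)` returns "neg". Evaluation in the spirit of the slide would hold out part of the rated corpus and compare predicted labels against the star rankings.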

  6. A sample application: Sentiment words
     [Figures: plots for the words “thanks”, “never” and “quite”, from Potts and Schwarz (2008)]
