Introduction to Artificial Intelligence: Natural Language Processing
Janyl Jumadinova
November 14, 2016
Credit: NLP Stanford

Question Answering: IBM’s Watson 2/25

Information Extraction 3/25

Sentiment Extraction 4/25
Source: Washington Post

Machine Translation 5/25

Language Technology 6/25

Ambiguity makes NLP hard 7/25
◮ Teacher Strikes Idle Kids
◮ Red Tape Holds Up New Bridges
◮ Juvenile Court to Try Shooting Defendant
◮ Local High School Dropouts Cut in Half

Other NLP Difficulties 8/25

Progress 9/25
◮ What tools do we need?
  ◮ Knowledge about language
  ◮ Knowledge about the world
  ◮ A way to combine knowledge sources
◮ How we generally do this:
  ◮ Probabilistic models built from language data
  ◮ P(“maison” → “house”) → high
  ◮ P(“L’avocat général” → “the general avocado”) → low
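One simple way to build such a probabilistic model is a relative-frequency estimate over aligned phrase pairs. The sketch below is only an illustration: the `aligned_pairs` data is invented for the example, and `translation_prob` is a hypothetical helper name, not something from the slides.

```python
from collections import Counter

# Invented toy data: (source phrase, observed translation) pairs.
# A real model would be estimated from a large parallel corpus.
aligned_pairs = [
    ("maison", "house"), ("maison", "house"), ("maison", "house"),
    ("maison", "home"),
    ("l'avocat général", "the attorney general"),
    ("l'avocat général", "the attorney general"),
]

def translation_prob(source, target, pairs):
    """Relative frequency of `target` among all translations seen for `source`."""
    pair_counts = Counter(pairs)
    source_total = sum(n for (s, _), n in pair_counts.items() if s == source)
    if source_total == 0:
        return 0.0  # source phrase never observed
    return pair_counts[(source, target)] / source_total

p_house = translation_prob("maison", "house", aligned_pairs)
p_avocado = translation_prob("l'avocat général", "the general avocado", aligned_pairs)
```

With this toy data, `p_house` is 0.75 and `p_avocado` is 0.0, because the avocado reading never occurs in the observed pairs.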

Basic Text Processing: Regular Expressions 10/25
◮ A formal language for specifying text strings.
◮ How can we search for any of these?
  ◮ woodchuck
  ◮ woodchucks
  ◮ Woodchuck
  ◮ Woodchucks
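One possible answer, sketched with Python's standard `re` module: a bracketed set covers the upper- or lower-case first letter, and an optional final `s` covers the plural. The sample sentence is made up for the example.

```python
import re

# [wW] matches either case of the first letter; s? makes the plural optional.
pattern = re.compile(r"[wW]oodchucks?")

text = "Woodchucks dig. A woodchuck digs. Two woodchucks and one Woodchuck."
matches = pattern.findall(text)
# matches == ["Woodchucks", "woodchuck", "woodchucks", "Woodchuck"]
```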

Regular Expressions: Disjunctions 11/25

Regular Expressions: Negation in Disjunction 12/25
◮ Negations: [^Ss]
◮ Caret means negation only when first in []
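A short Python sketch of both cases: a caret in first position negates the set, while a caret anywhere else in the brackets is an ordinary character. The test strings are invented for illustration.

```python
import re

# Caret first inside []: match any character that is NOT S or s.
not_s = re.findall(r"[^Ss]", "Sass")
# not_s == ["a"]

# Caret not first inside []: it is a literal, so this matches 'e' or '^'.
e_or_caret = re.findall(r"[e^]", "Look up ^ here")
# e_or_caret == ["^", "e", "e"]
```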

Regular Expressions: More Disjunction 13/25
◮ Woodchuck is another name for groundhog!
◮ The pipe | for disjunction
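A minimal Python sketch of the pipe, alone and combined with bracketed sets; the sentences are invented for the example.

```python
import re

# The pipe matches either whole alternative.
either = re.findall(r"groundhog|woodchuck",
                    "The groundhog, or woodchuck, is a rodent.")
# either == ["groundhog", "woodchuck"]

# Brackets and the pipe can be combined to allow either capitalization.
any_case = re.findall(r"[gG]roundhog|[wW]oodchuck", "Woodchuck and Groundhog")
# any_case == ["Woodchuck", "Groundhog"]
```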

Regular Expressions: ? * + . 14/25
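The slide's table is not reproduced here, so the patterns below are illustrative choices rather than the slide's own: `?` makes the previous character optional, `*` allows zero or more of it, `+` requires one or more, and `.` matches any single character.

```python
import re

# ?  : previous character is optional
optional = re.findall(r"colou?r", "color colour colr")
# optional == ["color", "colour"]

# *  : zero or more of the previous character
star = re.findall(r"oo*h!", "oh! ooh! oooh! h!")
# star == ["oh!", "ooh!", "oooh!"]

# +  : one or more of the previous character
plus = re.findall(r"o+h!", "oh! ooh! oooh! h!")
# plus == ["oh!", "ooh!", "oooh!"]

# .  : any single character
dot = re.findall(r"beg.n", "begin begun began beg3n bean")
# dot == ["begin", "begun", "began", "beg3n"]
```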

Regular Expressions: Example 15/25
Find all instances of the word “the” in a text
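A sketch of how this search might be refined in Python. The naive pattern both misses the capitalized form and matches inside longer words; allowing either case and adding word boundaries fixes both problems. The sample sentence is invented for the example.

```python
import re

text = "The other day the theology student read there."

# Naive pattern: misses "The" and also matches inside "other",
# "theology" and "there".
naive = re.findall(r"the", text)

# Refined pattern: either case for the first letter, and \b word
# boundaries so that only the standalone word matches.
refined = re.findall(r"\b[tT]he\b", text)
# refined == ["The", "the"]
```

The naive pattern returns four matches here, only one of which is the actual word, which illustrates the two kinds of error to fix: false negatives (missing "The") and false positives (matching "other").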

Basic Text Processing: Word tokenization 16/25
Every NLP task needs to do text normalization:
1. Segmenting/tokenizing words in running text
2. Normalizing word formats
3. Segmenting sentences in running text

How Many Words? 17/25

Simple Tokenization in UNIX 18/25
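The slide's shell commands are not reproduced here. As a rough Python analogue of a split-into-words, sort, and count pipeline, the sketch below treats every maximal run of letters as a token and counts each distinct form; `word_counts` is a hypothetical helper name. It also shows the difference between tokens (running words) and types (distinct words).

```python
import re
from collections import Counter

def word_counts(text):
    """Count each distinct alphabetic token; everything else is a separator."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(tokens)

counts = word_counts("the cat saw The cat")
num_tokens = sum(counts.values())  # running words
num_types = len(counts)            # distinct word forms
# num_tokens == 5, num_types == 4 ("the" and "The" count separately)
```

Case is left untouched on purpose, so "the" and "The" are different types until a normalization step folds them together.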

Basic Text Processing: Normalization 19/25
Every NLP task needs to do text normalization:
1. Segmenting/tokenizing words in running text
2. Normalizing word formats
3. Segmenting sentences in running text

Issues in Tokenization 20/25
◮ Finland’s capital → Finland? Finlands? Finland’s?
◮ what’re, I’m, isn’t → What are, I am, is not
◮ Hewlett-Packard → Hewlett Packard?
◮ state-of-the-art → state of the art?
◮ Lowercase → lower-case? lowercase? lower case?
◮ San Francisco → one token or two?
◮ Language issues: French, German, Japanese, Chinese, ...
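The sketch below makes one set of choices for the English cases above, purely for illustration: it lowercases, expands a small hand-written contraction table, keeps possessives as one token, and leaves hyphen splitting as a flag. The `CONTRACTIONS` table and the `tokenize` name are assumptions for the example, not a standard tokenizer.

```python
import re

# Tiny hand-written table covering only the slide's three contractions.
CONTRACTIONS = {"what're": "what are", "i'm": "i am", "isn't": "is not"}

def tokenize(text, split_hyphens=True):
    """Lowercase, expand known contractions, optionally split on hyphens."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)
    if split_hyphens:
        text = text.replace("-", " ")
    # Letters, optionally joined by internal apostrophes or hyphens.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text)

split_version = tokenize("I'm sure Hewlett-Packard isn't state-of-the-art")
kept_version = tokenize("state-of-the-art", split_hyphens=False)
possessive = tokenize("Finland\u2019s capital")
```

Here `kept_version` is the single token `state-of-the-art`, and `possessive` keeps `finland's` intact. Multiword names such as San Francisco still come out as two tokens, and none of this addresses languages written without spaces.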

Basic Text Processing: Stemming 21/25
Every NLP task needs to do text normalization:
1. Segmenting/tokenizing words in running text
2. Normalizing word formats
3. Segmenting sentences in running text

Stemming 22/25
◮ Reduce terms to their stems in information retrieval
◮ Stemming is crude chopping of affixes
  ◮ language dependent
◮ Example: automate(s), automatic, automation all reduced to automat.
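To make "crude chopping of affixes" concrete, here is a deliberately naive suffix stripper. The suffix list is hand-picked so that the slide's example works; it is a toy for illustration, not Porter's algorithm or any published stemmer.

```python
# Hand-picked English suffixes, listed longest first so that the longest
# match wins. Chosen only to cover the slide's example.
SUFFIXES = ("ion", "ic", "es", "e", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [crude_stem(w) for w in ("automate", "automates", "automatic", "automation")]
# stems == ["automat", "automat", "automat", "automat"]
```

The suffix list is English-specific, which is the sense in which stemming is language dependent.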

Porter’s Algorithm 23/25
◮ Most common English stemmer.
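The slide does not list the rules, so as a small taste the sketch below implements only Step 1a of the Porter stemmer, which handles plural endings (sses → ss, ies → i, ss → ss, s → nothing). The full algorithm has several further steps with conditions on the remaining stem; `porter_step_1a` is a name chosen for this example.

```python
def porter_step_1a(word):
    """Apply only Step 1a of the Porter stemmer (plural endings).

    Rules are tried in order and the first match wins:
    sses -> ss, ies -> i, ss -> ss, s -> (removed).
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word

results = [porter_step_1a(w) for w in ("caresses", "ponies", "caress", "cats")]
# results == ["caress", "poni", "caress", "cat"]
```

Note that the output need not be a dictionary word ("poni"); a stemmer only has to map related forms to the same string.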

Sentence Segmentation 24/25
◮ !, ? are relatively unambiguous
◮ Period “.” is quite ambiguous
  - Sentence boundary
  - Abbreviations like Inc. or Dr.
  - Numbers like .02 or 4.3
◮ Build a binary classifier
  - Classifiers: hand-written rules, regular expressions, or machine-learning
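A minimal hand-written-rules version of such a binary classifier might look like the sketch below. The abbreviation list contains only the slide's two examples, and the function names are assumptions for illustration; a practical system would need a much longer list or a learned model.

```python
# Only the two abbreviations named on the slide; a real list is far longer.
ABBREVIATIONS = {"inc.", "dr."}

def is_sentence_end(token):
    """Binary decision: does this whitespace-delimited token end a sentence?"""
    if token.endswith(("!", "?")):
        return True               # relatively unambiguous
    if not token.endswith("."):
        return False              # covers 4.3 and .02: the period is not final
    return token.lower() not in ABBREVIATIONS

def split_sentences(text):
    """Group tokens into sentences using is_sentence_end."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if is_sentence_end(token):
            sentences.append(" ".join(current))
            current = []
    if current:                   # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences

sentences = split_sentences(
    "Dr. Smith paid 4.3 million to Acme Inc. last year. Was it worth it? Yes!"
)
```

On this input the result is three sentences. The rules fail when an abbreviation really does end a sentence, which is one reason to consider a machine-learned classifier instead.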

Determining if a word is end-of-sentence: a Decision Tree 25/25
