Something Old, Something New: A Talk about NLP for the Curious
@EVANAHARI, YOW! AUSTRALIA 2016
Jabberwocky
“`Twas brillig, and the slithy toves Did gyre and gimble in the wabe: All mimsy were the borogoves, And the mome raths outgrabe.” – Lewis Carroll from Through the Looking-Glass and What Alice Found There, 1871
Why are these monkeys following me? Arrfff! LOL
Challenges
• Mistakes
• Slang & sparse words
• Ambiguity types
  • Lexical
  • Syntax level
  • Referential
Human Language
• The cortical speech center is unique to humans
• Evolution over hundreds of thousands of years
  • Vocabulary
  • Grammar
  • Speed
• An advanced processing unit
  • Sounds
  • Meaning of words
  • Grammar constructs
  • Match against a knowledge base
  • Understanding context and humor!
Human Language Processing
• Phonology − organization of sounds
• Morphology − construction of words
• Syntax − creation of valid sentences/phrases and identifying the structural roles of words in them
• Semantics − finding the meaning of words/phrases/sentences
• Pragmatics − situational meaning of sentences
• Discourse − order of sentences affecting interpretation
• World knowledge − mapping to general world knowledge
• Context awareness − the hardest part…?
Natural Language Processing
• Computers generating language
• Computers understanding human language
  • Lexical analysis
  • Syntactic analysis
  • Semantic analysis
  • Discourse integration
  • Pragmatic analysis
“You shall know a word by the company it keeps.” – J. R. Firth, 1957
Language Models
• Represent language in a mathematical way
  • A language model is a function that captures the statistical characteristics of the word-sequence distribution in a language
• Dimensionality challenge
  • A 10-word sequence from a 100 000-word vocabulary -> 10^50 possible sequences
• Large sample set vs processing time & cost vs accuracy
Bag-of-words
Vocabulary (one-hot): Happy = [1 0 0 0 0 0], birthday = [0 1 0 0 0 0], to = [0 0 1 0 0 0], you = [0 0 0 1 0 0], dear = [0 0 0 0 1 0], “name” = [0 0 0 0 0 1]
Sample text:
• “Happy birthday to you” = [1 1 1 1 0 0]
• “Happy birthday to you” = [1 1 1 1 0 0]
• “Happy birthday dear ‘name’” = [1 1 0 0 1 1]
• “Happy birthday to you” = [1 1 1 1 0 0]
Term frequency over the whole sample = [4 4 3 3 1 1]
• Not suited for huge vocabularies
• Semantics are not considered
• Order of words is lost
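A minimal sketch of the same idea in Python (the vocabulary and sample lines are the ones above, lower-cased; the helper names are mine):

```python
from collections import Counter

# Vocabulary from the slide, in a fixed order so vector positions are stable.
vocabulary = ["happy", "birthday", "to", "you", "dear", "name"]

samples = [
    "happy birthday to you",
    "happy birthday to you",
    "happy birthday dear name",
    "happy birthday to you",
]

def bag_of_words(text, vocab):
    """Binary bag-of-words vector: 1 if the vocabulary word occurs in the text."""
    words = set(text.split())
    return [1 if w in words else 0 for w in vocab]

def term_frequency(texts, vocab):
    """Corpus-wide term counts, in vocabulary order."""
    counts = Counter(w for t in texts for w in t.split())
    return [counts[w] for w in vocab]

for s in samples:
    print(s, "->", bag_of_words(s, vocabulary))
print("term frequency:", term_frequency(samples, vocabulary))  # [4, 4, 3, 3, 1, 1]
```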
n-grams
“Hello everyone who is eager to learn NLP!”
• “gram”: a unit, e.g. letter, phoneme, word, …
• uni-gram: Hello, everyone, who, is, …
• bi-gram: Hello-everyone, everyone-who, who-is, …
• n-gram: n-length sequences of units
• k-skip-gram: skip k units
• bi-skip-tri-gram: Hello-is-learn, everyone-eager-NLP (see the sketch below)
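A small sketch of generating these units in Python (punctuation stripped; the skip-gram helper uses the slide's "skip exactly k units between elements" reading, while the literature usually allows up to k skips in total; function names are mine):

```python
def ngrams(tokens, n):
    """All contiguous n-length sequences of units."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skip_grams(tokens, n, k):
    """n-grams that skip exactly k units between consecutive elements,
    matching the slide's bi-skip-tri-gram example."""
    stride = k + 1
    span = (n - 1) * stride
    return [tuple(tokens[i:i + span + 1:stride])
            for i in range(len(tokens) - span)]

tokens = "Hello everyone who is eager to learn NLP".split()
print(ngrams(tokens, 1)[:3])     # [('Hello',), ('everyone',), ('who',)]
print(ngrams(tokens, 2)[:3])     # [('Hello', 'everyone'), ('everyone', 'who'), ('who', 'is')]
print(skip_grams(tokens, 3, 2))  # [('Hello', 'is', 'learn'), ('everyone', 'eager', 'NLP')]
```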
n-gram Probabilistic Model
• Given a sequence of words, what is the likelihood of the next one?
• Using counts of n-grams extracted from a training data set, we can predict the next word x_i from probabilities:

  P(x_i | x_{i-(n-1)}, …, x_{i-1}) = count(x_{i-(n-1)}, …, x_{i-1}, x_i) / count(x_{i-(n-1)}, …, x_{i-1})

• Simple: only the previous n-1 words determine the probability
• Difficult to handle infrequent words and expressions
  • Smoothing (e.g. Good-Turing, Katz back-off, etc.)
  • Use additional sampling (bi-grams, tri-grams, skip-grams)
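As a concrete illustration, here is the unsmoothed maximum-likelihood estimate above for bigrams (n = 2); smoothing is deliberately left out, and the helper names and toy corpus are mine:

```python
from collections import Counter

def train_bigram_model(corpus):
    """Count unigrams and bigrams from a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """P(word | prev) = count(prev, word) / count(prev); zero if prev is unseen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

corpus = [
    "happy birthday to you".split(),
    "happy birthday dear name".split(),
    "happy birthday to you".split(),
]
uni, bi = train_bigram_model(corpus)
print(bigram_prob(uni, bi, "birthday", "to"))    # 2/3 ≈ 0.67
print(bigram_prob(uni, bi, "birthday", "dear"))  # 1/3 ≈ 0.33
```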
Example Use: Named Entity Recognition (NER)
Examples:
• Grammar based: “…live in <city>”
• Co-occurrence based: “new+york”, “san+francisco”, …
Common pattern: inference by combining the output of several such models
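A toy illustration of the two patterns above; the regular expression and the tiny gazetteer are made up for illustration, not taken from the talk:

```python
import re

# Tiny hypothetical gazetteer of known city bigrams for the co-occurrence style.
known_cities = {("new", "york"), ("san", "francisco")}

def grammar_based_cities(text):
    """Very rough grammar pattern: capitalized tokens following 'live in'."""
    return [m.strip() for m in re.findall(r"live in ((?:[A-Z]\w+\s?)+)", text)]

def cooccurrence_cities(text):
    """Mark adjacent token pairs that match the gazetteer."""
    tokens = text.lower().split()
    return [pair for pair in zip(tokens, tokens[1:]) if pair in known_cities]

text = "I live in New York but I used to live in San Francisco"
print(grammar_based_cities(text))  # ['New York', 'San Francisco']
print(cooccurrence_cities(text))   # [('new', 'york'), ('san', 'francisco')]
```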
Naive Bayes Probabilistic Model
[Diagram: the class “Apple” linked to observed features Round, Red, HasLeaf]
Example Use: Text Classification

Sample data (14 fruit texts: color mentioned / is the text about an apple?):
Red/No, Green/Yes, Yellow/Yes, Red/Yes, Red/Yes, Green/Yes, Yellow/No, Yellow/No, Red/Yes, Yellow/Yes, Red/No, Green/Yes, Green/Yes, Yellow/No

Frequency table:
Feature       No   Yes   Total
Green          0    4    4/14 = 0.29
Yellow         3    2    5/14 = 0.36
Red            2    3    5/14 = 0.36
Grand Total    5    9
               5/14 = 0.36   9/14 = 0.64

Incoming fruit text says “red” - is it about an apple?
P(Yes | Red) = P(Red | Yes) * P(Yes) / P(Red)
P(Red | Yes) = 3/9 = 0.33
P(Yes) = 9/14 = 0.64
P(Red) = 5/14 = 0.36
P(Yes | Red) = 0.33 * 0.64 / 0.36 ≈ 0.60
60% chance it’s about an apple!
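The same calculation in a few lines of Python, using the 14 samples from the table above (the code layout and names are mine):

```python
from collections import Counter

# (color mentioned in the text, is the text about an apple?) -- the 14 samples above
samples = [
    ("red", "no"), ("green", "yes"), ("yellow", "yes"), ("red", "yes"),
    ("red", "yes"), ("green", "yes"), ("yellow", "no"), ("yellow", "no"),
    ("red", "yes"), ("yellow", "yes"), ("red", "no"), ("green", "yes"),
    ("green", "yes"), ("yellow", "no"),
]

total = len(samples)
label_counts = Counter(label for _, label in samples)  # {'yes': 9, 'no': 5}

p_red_given_yes = sum(1 for c, l in samples if c == "red" and l == "yes") / label_counts["yes"]
p_yes = label_counts["yes"] / total
p_red = sum(1 for c, _ in samples if c == "red") / total

# Bayes' theorem: P(Yes | Red) = P(Red | Yes) * P(Yes) / P(Red)
p_yes_given_red = p_red_given_yes * p_yes / p_red
print(round(p_yes_given_red, 2))  # 0.6
```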
Naive Bayes Things to Consider:
• Easy and fast; good for multi-class problems and often beats more sophisticated classifiers
• Does not handle unseen feature values well; needs smoothing
• Needs relatively little training data, but it must be representative
• Assumes the attributes are truly independent (rarely the case in practice)
Combining Models Things to Consider:
• How many models can you afford?
• How good are your models (i.e. your training data)?
• Latency vs accuracy?
Bag of Words (sparse, one-hot vectors): e.g. [0 0 0 1], [0 1 0 0]
Continuous Bag of Words (Embeddings): dense, real-valued vectors, e.g. [2 3 8 1], [7 5 6 2]
Distributed Representation
• A word is a point in a multi-dimensional vector space, where each dimension is a feature of the word
• Who decides the features?
  • HUMAN: hand-picked features such as gender, plurality, semantic characteristics
  • COMPUTER: the features are learned, as continuous values
Neural Net Language Model
• A language model built on the capabilities of a neural network (NN) is an NNLM
• Relies on the NN to discover the features of a distributed representation
• Extrapolation to unseen word sequences makes it possible to keep a dense model, even for very large data sets
Mikolov et al.’s CBOW vs Continuous Skip-gram
• CBOW: predict a term based on its context (near terms)
  • w-2, w-1, w+1, w+2 -> w
  • fast to train
  • higher accuracy for frequent words
  • conditioning on context needs larger data sets
• Continuous Skip-gram: predict the context (near terms) based on a word
  • w -> w-2, w-1, w+1, w+2
  • k-skip-n-gram: k and n determine complexity (training time vs accuracy)
  • helps create more samples from a smaller data set (data sparsity, rare terms)
Both setups are sketched below.
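A sketch of how the two setups slice the same sentence into (input, target) training pairs with a window of 2; the helper names and example sentence are mine, and a real word2vec implementation adds negative sampling, subsampling and the actual neural training on top of this:

```python
def cbow_pairs(tokens, window=2):
    """CBOW: predict the centre word from its surrounding context words."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each surrounding context word from the centre word."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

tokens = "the quick brown fox jumps".split()
print(cbow_pairs(tokens)[2])       # (['the', 'quick', 'fox', 'jumps'], 'brown')
print(skipgram_pairs(tokens)[:4])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...]
```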
Diagram borrowed from Mikolov et al.’s paper
NN-based Probabilistic Prediction Model
1. Probability of the next term, via the chain rule of probability:
   P(w_1, w_2, …, w_t) = P(w_1) P(w_2 | w_1) P(w_3 | w_1, w_2) … P(w_t | w_1, …, w_{t-1})
   Approximate the full history with the previous n-1 words to keep the simplicity of n-grams.
2. Each context word w_{t-i} has a d-dimensional feature vector C_{w_{t-i}} (its entry in parameter matrix C); the network input is their concatenation:
   x = (C_{w_{t-n+1}, 1}, …, C_{w_{t-n+1}, d}, C_{w_{t-n+2}, 1}, …, C_{w_{t-1}, 1}, …, C_{w_{t-1}, d})
   C_k contains the learned features for word k.
3. Use a standard NN for probabilistic classification (softmax over the vocabulary):
   P(w_t = k | w_{t-n+1}, …, w_{t-1}) = e^{a_k} / Σ_{i=1..N} e^{a_i}
   where a_k = b_k + Σ_{i=1..h} W_{ki} tanh(c_i + Σ_{j=1..(n-1)d} V_{ij} x_j)
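A numpy sketch of that forward pass with randomly initialised parameters, just to make the shapes concrete; the dimensions and variable names are illustrative, not taken from Bengio et al.’s experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: vocabulary size, feature dim d, hidden units h, n-gram order n.
vocab, d, h, n = 1000, 30, 50, 4

C = rng.normal(size=(vocab, d))        # word feature matrix: entry k holds the learned features of word k
V = rng.normal(size=(h, (n - 1) * d))  # hidden-layer weights (V_ij in the formula above)
c = np.zeros(h)                        # hidden-layer bias (c_i)
W = rng.normal(size=(vocab, h))        # output weights (W_ki)
b = np.zeros(vocab)                    # output bias (b_k)

def next_word_probs(context_ids):
    """P(w_t = k | w_{t-n+1}, ..., w_{t-1}) for every word k in the vocabulary."""
    x = C[context_ids].reshape(-1)   # concatenate the n-1 feature vectors into one input vector
    a = b + W @ np.tanh(c + V @ x)   # one tanh hidden layer, linear output scores a_k
    e = np.exp(a - a.max())          # softmax, shifted for numerical stability
    return e / e.sum()

probs = next_word_probs([12, 7, 993])  # ids of the n-1 = 3 previous words
print(probs.shape, probs.sum())        # (1000,) ~1.0
```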
Diagram borrowed from Bengio et al.’s paper
NLP is not New … ABBYY, Angoss, Attensity, AUTINDEX, Autonomy, Averbis, Basis Technology, Clarabridge, Complete Discovery Source, Endeca Technologies, Expert System S.p.A., FICO Score, General Sentiment, IBM LanguageWare, IBM SPSS, Insight, LanguageWare, Language Computer Corporation, Lexalytics, LexisNexis, Luminoso, Mathematica, MeaningCloud, Medallia, Megaputer Intelligence, NetOwl, RapidMiner, SAS Text Miner and Teragram, Semantria, Smartlogic, StatSoft, Sysomos, WordStat, Xpresso, …
…but Getting Hot (Again)
• Big text data sets available
• Distributed processing tech & capacity cheaper
• ML-based training economically possible (and more accurate)
• Open source movement
• Large upswing potential…
(No animals were harmed during this photo shoot)
Cheat Sheet
• openNLP - Java, Apache, familiar, easier, older
• coreNLP - Java, Stanford, popular, good span of tools
• NLTK - Python, rich in resources, easiest
• spaCy - up and coming, Python, promising…
• fastText - nothing new..?
• Spark - “ML framework”, custom implementation, large scale
• Deeplearning4j - word2vec (Java, Scala)
• TensorFlow (SyntaxNet) - separated optimization & more tuning knobs, better syntax-parsing model, very recently large scale too
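For instance, a first NLTK session might look like this (assuming NLTK is installed and the tokenizer/tagger models are downloaded; the printed output is indicative):

```python
import nltk

# One-time model downloads (names as of NLTK 3.x).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

text = "Hello everyone who is eager to learn NLP!"
tokens = nltk.word_tokenize(text)
print(tokens)                # ['Hello', 'everyone', 'who', 'is', 'eager', 'to', 'learn', 'NLP', '!']
print(nltk.pos_tag(tokens))  # [('Hello', 'NNP'), ('everyone', 'NN'), ...]
```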
Summary and Questions
• Language is key to our species’ success
• Our multi-step language processing is complex, and our brains are forgiving
• A language model represents word-sequence distributions within a language
• Bag-of-words and n-grams are common representations
• Naive Bayes is common for probabilistic models
• Distributed representations are dense and powerful
• NNLMs are based on learned word features
• Positive NLP trends: more open source tools, frameworks, and pre-generated distributed representations available to all
Jabberwocky Vote! @EVANAHARI, YOW! AUSTRALIA 2016