Fast & Effective: Natural Language Understanding Mike Conover, Ph.D. Principal Data Scientist
SkipFlag • Smart Knowledge Base • Instant Answers • Expert Identification • Intelligent Bot
Smart Knowledge Base • Entity Graph • Projects & Jargon • Relevant Articles • Documentation • Source Code
Prototype Rapidly: Or how to solve open research problems in a production environment on deadline.
Reflections Exercise is good for you.
Reflections Start with the model the state of the art claims to beat and implement that.
Containers & Model Deployment
Tiered Metadata Architecture • Compute-local data access • Memory-constrained environments • Fast bulk writes
Language in the Wild
• Wikipedia: linked, structured, taxonomic
• Twitter: a cornucopia of malformed text
• Common Crawl: petabyte-scale web crawl, available for free on S3
Word Embeddings
“All models are wrong, but some are useful.” George Box
Who Needs Grammar, Anyway?
• Sparse term vectors (Azimuth, Declination, Percolate, ..): millions of dimensions, mostly zeros (e.g. weights .5 / .9 / .01)
• Dense vectors via LSA / LDA, etc. (topic-like dimensions such as Orienteering, Physics): hundreds of dimensions (e.g. weights .9 / 0.1)
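The sparse-to-dense move on this slide can be sketched in a few lines with scikit-learn: TF-IDF gives the vocabulary-sized sparse vectors, and truncated SVD (the core of LSA) projects them into a small dense space. The corpus here is a made-up toy, just to show the shapes.

```python
# LSA sketch: project sparse bag-of-words vectors (vocabulary-sized,
# potentially millions of dimensions) into a small dense space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "azimuth and declination locate a star in the sky",
    "orienteering with a compass uses azimuth bearings",
    "water can percolate slowly through porous rock",
]

tfidf = TfidfVectorizer()
sparse = tfidf.fit_transform(docs)        # shape: (3, vocab_size), sparse

svd = TruncatedSVD(n_components=2, random_state=0)
dense = svd.fit_transform(sparse)         # shape: (3, 2), dense
print(dense.shape)
```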
Targets of Interest Document Clusters Ranking Feature EDA Classification Feature Engineering
Semantic Structure
• Gender: Man / Woman, King / Queen
• Geography: Rome / Italy, Tokyo / Japan
• Superlatives: Good / Better / Best
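The regularity behind these clusters is what makes king − man + woman ≈ queen work. A toy sketch with hand-made 3-d vectors (the numbers are illustrative, not real embeddings):

```python
import numpy as np

# Toy 3-d "embeddings": dimensions loosely encode (royalty, gender, country).
vectors = {
    "king":  np.array([0.9,  0.9, 0.0]),
    "queen": np.array([0.9, -0.9, 0.0]),
    "man":   np.array([0.1,  0.9, 0.0]),
    "woman": np.array([0.1, -0.9, 0.0]),
    "italy": np.array([0.0,  0.0, 0.9]),
}

def nearest(target, exclude):
    """Return the vocabulary word whose vector is closest (cosine) to target."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vectors if w not in exclude),
               key=lambda w: cos(vectors[w], target))

analogy = vectors["king"] - vectors["man"] + vectors["woman"]
print(nearest(analogy, exclude={"king", "man", "woman"}))  # queen
```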
Embedding Vectors: each token ("The", "sky", "above", "channel", ..) maps to a row of weights, one column per embedding dimension; example sentence: "The sky above the port was the color of television, tuned to a dead channel." The word vectors compose into a document vector.
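The simplest way to get that document vector is to average the word vectors, a cheap baseline that works surprisingly well. A minimal sketch, with random vectors standing in for a real embedding table:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
sentence = "the sky above the port was the color of television".split()

# Stand-in embedding table: one dim-sized vector per unique token.
embeddings = {tok: rng.normal(size=dim) for tok in set(sentence)}

# Document vector = mean of the token vectors.
doc_vector = np.mean([embeddings[tok] for tok in sentence], axis=0)
print(doc_vector.shape)  # (50,)
```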
Word Embeddings: consider corpus, casing, dimensionality, and size.
GloVe vectors (https://nlp.stanford.edu/projects/glove/):
• Wikipedia 2014 + Gigaword 5: 6B tokens, 400K vocab, uncased, 50d / 100d / 200d / 300d vectors, 822 MB
• Common Crawl: 42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB
• Common Crawl: 840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB
• Twitter: 2B tweets, 27B tokens, 1.2M vocab, uncased, 25d / 50d / 100d / 200d vectors, 1.42 GB
Word2Vec vectors:
• Google News: 100B tokens, 3M vocab, 300d
• Freebase: 100B words, 1.4M vocab, 300d
Build Your Own Embeddings: Out of the Box
• Word2Vec
• Doc2Vec
• Poincaré Embeddings
• LDA / LSA
TensorFlow Embedding Projector: Text, Images, Music .. Get Crazy
Compositional Embeddings Domain Specific Corpora Initialize with Pre-trained Embeddings
Cut to the Chase: FastText
• Multiclass Classification
• Subword Embeddings (e.g. "trichlorodifluorene" decomposes into character n-grams like "trichl" and "fluor")
https://github.com/facebookresearch/fastText
Bojanowski, Piotr, et al. "Enriching word vectors with subword information." arXiv (2016)
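The subword trick: a word's vector is the sum of vectors for its character n-grams, so rare or unseen words still get sensible representations. Extracting the n-grams (with fastText's `<`/`>` word-boundary markers) is plain Python:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams in the fastText style, with < > boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

grams = char_ngrams("where", n_min=3, n_max=3)
print(grams)  # ['<wh', 'whe', 'her', 'ere', 're>']
```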
Embed All the Things! StarSpace ‒ Text Classification ‒ Graph Embeddings ‒ Similarity / Ranking ‒ Image Classification https://github.com/facebookresearch/StarSpace Wu, Ledell, et al. "StarSpace: Embed All The Things!" arXiv (2017)
Fine-Grained Structure (displaCy)
Breakdown (displaCy) www.sadtrombone.com
Piece by Piece Keyphrase Extraction ‒ RAKE Algorithm ‒ Segphrase / Autophrase graham_askew | a | biomechanics_professor | at the | university_of_leeds | in | england | leads research | to | understand | better | how | the | chambered_nautilus | moves F. Diaz. "Query expansion with locally-trained word embeddings." arXiv (2016)
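The RAKE idea fits in a few lines: split text on stopwords and punctuation to get candidate phrases, score each word by degree/frequency, and score each phrase by the sum of its word scores. A minimal sketch (the tiny stopword list here is hand-picked for the example, not the real RAKE list):

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "at", "the", "in", "to", "how", "of", "leads", "better"}

def rake(text):
    # 1. Candidate phrases: maximal runs of non-stopword tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    phrases, current = [], []
    for tok in tokens:
        if tok in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(current)

    # 2. Word scores: degree(word) / frequency(word).
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence degree (incl. self)
    score = {w: degree[w] / freq[w] for w in freq}

    # 3. Phrase score: sum of member word scores, best first.
    return sorted(((sum(score[w] for w in p), " ".join(p)) for p in phrases),
                  reverse=True)

text = ("Graham Askew, a biomechanics professor at the University of Leeds "
        "in England, leads research to understand better how the chambered "
        "nautilus moves")
ranked = rake(text)
for s, phrase in ranked[:3]:
    print(f"{s:.1f}  {phrase}")  # top phrase: "chambered nautilus moves"
```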
Taking Sentences Apart (displaCy) Zeroth Law: This only works in practice, never in theory.
Learning to Rank with Neural Nets Sometimes Good Enough Isn’t Good Enough Severyn, Aliaksei, and Alessandro Moschitti. "Learning to rank short text pairs with convolutional deep neural networks." SIGIR 2015. http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
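The learning-to-rank setup boils down to scoring (query, text) pairs and training so relevant texts outscore irrelevant ones. A pairwise hinge loss sketch in numpy, standing in for the convolutional scorer in the cited paper:

```python
import numpy as np

def pairwise_hinge(pos_scores, neg_scores, margin=1.0):
    """Mean hinge loss over all (relevant, irrelevant) pairs: penalize
    whenever a relevant doc does not beat an irrelevant doc by `margin`."""
    diffs = margin - (pos_scores[:, None] - neg_scores[None, :])
    return np.maximum(0.0, diffs).mean()

pos = np.array([2.0, 1.5])   # model scores for relevant documents
neg = np.array([0.2, 1.8])   # model scores for irrelevant documents

loss = pairwise_hinge(pos, neg)
print(loss)  # 0.525
```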
Pete Skomoroch • Sam Shah • Scott Blackburn • Matt Hayes
Cut to the Chase: Emoji Space https://github.com/facebookresearch/fastText Bojanowski, Piotr, et al. "Enriching word vectors with subword information." arXiv (2016)
Build Your Own Embeddings Paragraph Vectors (Doc2Vec) Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." International Conference on Machine Learning . 2014.
Ship It!
Instant Answers