Introduction to spaCy (ADVANCED NLP WITH SPACY)

Introduction to spaCy
ADVANCED NLP WITH SPACY
Ines Montani, spaCy core developer

The nlp object

    # Import the English language class
    from spacy.lang.en import English

    # Create the nlp object
    nlp = English()

contains the processing pipeline
includes language-specific rules for tokenization etc.

The Doc object

    # Created by processing a string of text with the nlp object
    doc = nlp("Hello world!")

    # Iterate over tokens in a Doc
    for token in doc:
        print(token.text)

    Hello
    world
    !

The Token object

    doc = nlp("Hello world!")

    # Index into the Doc to get a single Token
    token = doc[1]

    # Get the token text via the .text attribute
    print(token.text)

    world

The Span object

    doc = nlp("Hello world!")

    # A slice from the Doc is a Span object
    # (the Doc has three tokens, so the slice ends at index 3)
    span = doc[1:3]

    # Get the span text via the .text attribute
    print(span.text)

    world!
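One way to see why the span text prints as "world!" rather than "world !" is to model a document as a list of token texts plus a flag recording whether each token is followed by a space. The sketch below is a toy stand-in written in plain Python, not spaCy's implementation. `ToyDoc`, `token_text` and `span_text` are hypothetical names, and the three-token split of "Hello world!" is taken from the Doc slide output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDoc:
    """Toy stand-in for a Doc: token texts plus trailing-space flags."""
    words: List[str]
    spaces: List[bool]  # True if the token is followed by a space

    def token_text(self, i: int) -> str:
        # Indexing returns the text of a single token
        return self.words[i]

    def span_text(self, start: int, end: int) -> str:
        # A span is the half-open token range [start, end)
        pieces = []
        for i in range(start, end):
            pieces.append(self.words[i])
            # Re-insert whitespace between tokens, but not after the last one
            if self.spaces[i] and i < end - 1:
                pieces.append(" ")
        return "".join(pieces)


doc = ToyDoc(words=["Hello", "world", "!"], spaces=[True, False, False])
print(doc.token_text(1))    # world
print(doc.span_text(1, 3))  # world!
print(doc.span_text(0, 3))  # Hello world!
```

The span stores only a start and an end index, so it behaves like a view onto the document's tokens rather than a copy of them.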

Lexical attributes

    doc = nlp("It costs $5.")

    print('Index:   ', [token.i for token in doc])
    print('Text:    ', [token.text for token in doc])
    print('is_alpha:', [token.is_alpha for token in doc])
    print('is_punct:', [token.is_punct for token in doc])
    print('like_num:', [token.like_num for token in doc])

    Index:    [0, 1, 2, 3, 4]
    Text:     ['It', 'costs', '$', '5', '.']
    is_alpha: [True, True, False, False, False]
    is_punct: [False, False, False, False, True]
    like_num: [False, False, False, True, False]
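These attributes depend only on the token text, not on the surrounding context, so they can be approximated with the standard library. The sketch below is a simplified approximation under that assumption, not spaCy's own code: the three function names are hypothetical helpers, and spaCy's real checks cover more cases than the digit test used here for `like_num`.

```python
import unicodedata


def is_alpha(text: str) -> bool:
    # True if the token consists only of alphabetic characters
    return text.isalpha()


def is_punct(text: str) -> bool:
    # True if every character is in a Unicode punctuation category (P*)
    return bool(text) and all(
        unicodedata.category(ch).startswith("P") for ch in text
    )


def like_num(text: str) -> bool:
    # Simplified: digits, optionally with "," or "." separators
    stripped = text.replace(",", "").replace(".", "")
    return stripped.isdigit()


# Token texts copied from the slide output for "It costs $5."
tokens = ["It", "costs", "$", "5", "."]
print("is_alpha:", [is_alpha(t) for t in tokens])
print("is_punct:", [is_punct(t) for t in tokens])
print("like_num:", [like_num(t) for t in tokens])
```

For these five tokens the three lists agree with the slide output. Note that "$" is a currency symbol rather than punctuation in Unicode, which is why `is_punct` is False for it.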

Let's practice!

Statistical Models

What are statistical models?

Enable spaCy to predict linguistic attributes in context
    Part-of-speech tags
    Syntactic dependencies
    Named entities
Trained on labeled example texts
Can be updated with more examples to fine-tune predictions

Model Packages

    import spacy

    nlp = spacy.load('en_core_web_sm')

Binary weights
Vocabulary
Meta information (language, pipeline)

Predicting Part-of-speech Tags

    import spacy

    # Load the small English model
    nlp = spacy.load('en_core_web_sm')

    # Process a text
    doc = nlp("She ate the pizza")

    # Iterate over the tokens
    for token in doc:
        # Print the text and the predicted part-of-speech tag
        print(token.text, token.pos_)

    She PRON
    ate VERB
    the DET
    pizza NOUN

Predicting Syntactic Dependencies

    for token in doc:
        print(token.text, token.pos_, token.dep_, token.head.text)

    She PRON nsubj ate
    ate VERB ROOT ate
    the DET det pizza
    pizza NOUN dobj ate

Label    Description             Example
nsubj    nominal subject         She
dobj     direct object           pizza
det      determiner (article)    the
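The dependency rows can be read as a small tree in which every token points to its head. As a rough illustration, the sketch below copies the four rows from the slide output for "She ate the pizza" and recovers the subject and direct object of the root verb. `Row` and `children_with_label` are hypothetical names, and heads are identified by their text, which only works here because every word in the sentence is unique.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    text: str
    pos: str
    dep: str
    head: str  # text of the token's syntactic head


# Rows copied from the slide output for "She ate the pizza"
rows = [
    Row("She", "PRON", "nsubj", "ate"),
    Row("ate", "VERB", "ROOT", "ate"),
    Row("the", "DET", "det", "pizza"),
    Row("pizza", "NOUN", "dobj", "ate"),
]


def children_with_label(rows: List[Row], head: str, dep: str) -> List[str]:
    # Collect tokens attached to `head` with the given dependency label
    return [r.text for r in rows if r.head == head and r.dep == dep]


# The root is the token whose dependency label is ROOT
root = next(r.text for r in rows if r.dep == "ROOT")
subjects = children_with_label(rows, root, "nsubj")
objects = children_with_label(rows, root, "dobj")
print(root, subjects, objects)  # ate ['She'] ['pizza']
```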

Predicting Named Entities

    # Process a text
    doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion")

    # Iterate over the predicted entities
    for ent in doc.ents:
        # Print the entity text and its label
        print(ent.text, ent.label_)

    Apple ORG
    U.K. GPE
    $1 billion MONEY

Tip: the explain method

Get quick definitions of the most common tags and labels.

    spacy.explain('GPE')
    'Countries, cities, states'

    spacy.explain('NNP')
    'noun, proper singular'

    spacy.explain('dobj')
    'direct object'

Rule-based Matching

Why not just regular expressions?

Match on Doc objects, not just strings
Match on tokens and token attributes
Use the model's predictions
Example: "duck" (verb) vs. "duck" (noun)
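The "duck" example can be made concrete with a small comparison. In the sketch below, the sentence and its part-of-speech tags are hand-written assumptions standing in for a model's predictions, and no spaCy call is involved. It only illustrates why matching on token attributes can separate cases that a regular expression over the raw string cannot.

```python
import re

# Hypothetical example sentence containing both uses of "duck"
text = "I saw a duck and had to duck"

# A regular expression only sees characters, so both uses look identical
regex_hits = re.findall(r"\bduck\b", text)
print(len(regex_hits))  # 2

# Hand-written (text, POS) pairs standing in for a model's predictions
tagged = [
    ("I", "PRON"), ("saw", "VERB"), ("a", "DET"), ("duck", "NOUN"),
    ("and", "CCONJ"), ("had", "VERB"), ("to", "PART"), ("duck", "VERB"),
]

# Matching on token attributes can tell the two uses apart
noun_uses = [i for i, (tok, pos) in enumerate(tagged)
             if tok == "duck" and pos == "NOUN"]
verb_uses = [i for i, (tok, pos) in enumerate(tagged)
             if tok == "duck" and pos == "VERB"]
print(noun_uses, verb_uses)  # [3] [7]
```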

Match patterns

Lists of dictionaries, one per token

Match exact token texts

    [{'ORTH': 'iPhone'}, {'ORTH': 'X'}]

Match lexical attributes

    [{'LOWER': 'iphone'}, {'LOWER': 'x'}]

Match any token attributes

    [{'LEMMA': 'buy'}, {'POS': 'NOUN'}]
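To make the "one dictionary per token" idea concrete, here is a minimal toy matcher in plain Python. It is a sketch under stated assumptions rather than spaCy's Matcher: `make_tokens`, `token_matches` and `find_matches` are hypothetical helpers, tokenization is a plain whitespace split, and only the ORTH and LOWER attributes are modelled.

```python
from typing import Dict, List, Tuple

Token = Dict[str, object]


def make_tokens(text: str) -> List[Token]:
    # Toy tokenizer: whitespace split only (an assumption, not spaCy's tokenizer)
    return [{"ORTH": w, "LOWER": w.lower()} for w in text.split()]


def token_matches(token: Token, spec: Dict[str, object]) -> bool:
    # Every attribute in the pattern dictionary must equal the token's value
    return all(token.get(attr) == value for attr, value in spec.items())


def find_matches(tokens: List[Token],
                 pattern: List[Dict[str, object]]) -> List[Tuple[int, int]]:
    # Slide a window of len(pattern) tokens over the document
    n = len(pattern)
    return [
        (start, start + n)
        for start in range(len(tokens) - n + 1)
        if all(token_matches(tokens[start + k], pattern[k]) for k in range(n))
    ]


tokens = make_tokens("New iPhone X release date leaked")
exact = [{"ORTH": "iPhone"}, {"ORTH": "X"}]
lower = [{"LOWER": "iphone"}, {"LOWER": "x"}]

print(find_matches(tokens, exact))                       # [(1, 3)]
print(find_matches(tokens, lower))                       # [(1, 3)]
print(find_matches(make_tokens("new IPHONE x"), exact))  # []
print(find_matches(make_tokens("new IPHONE x"), lower))  # [(1, 3)]
```

The last two calls show the difference between the two pattern styles: the exact-text pattern is case-sensitive, while the lowercase pattern matches any capitalization.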

Using the Matcher (1)

    import spacy

    # Import the Matcher
    from spacy.matcher import Matcher

    # Load a model and create the nlp object
    nlp = spacy.load('en_core_web_sm')

    # Initialize the matcher with the shared vocab
    matcher = Matcher(nlp.vocab)

    # Add the pattern to the matcher
    pattern = [{'ORTH': 'iPhone'}, {'ORTH': 'X'}]
    matcher.add('IPHONE_PATTERN', None, pattern)

    # Process some text
    doc = nlp("New iPhone X release date leaked")

    # Call the matcher on the doc
    matches = matcher(doc)

Using the Matcher (2)

    # Call the matcher on the doc
    doc = nlp("New iPhone X release date leaked")
    matches = matcher(doc)

    # Iterate over the matches
    for match_id, start, end in matches:
        # Get the matched span
        matched_span = doc[start:end]
        print(matched_span.text)

    iPhone X

match_id: hash value of the pattern name
start: start index of matched span
end: end index of matched span

Matching lexical attributes

    pattern = [
        {'IS_DIGIT': True},
        {'LOWER': 'fifa'},
        {'LOWER': 'world'},
        {'LOWER': 'cup'},
        {'IS_PUNCT': True}
    ]

    doc = nlp("2018 FIFA World Cup: France won!")

    2018 FIFA World Cup:

Matching other token attributes

    pattern = [
        {'LEMMA': 'love', 'POS': 'VERB'},
        {'POS': 'NOUN'}
    ]

    doc = nlp("I loved dogs but now I love cats more.")

    loved dogs
    love cats

Using operators and quantifiers (1)

    pattern = [
        {'LEMMA': 'buy'},
        {'POS': 'DET', 'OP': '?'},  # optional: match 0 or 1 times
        {'POS': 'NOUN'}
    ]

    doc = nlp("I bought a smartphone. Now I'm buying apps.")

    bought a smartphone
    buying apps

Using operators and quantifiers (2)

               Description
{'OP': '!'}    Negation: match 0 times
{'OP': '?'}    Optional: match 0 or 1 times
{'OP': '+'}    Match 1 or more times
{'OP': '*'}    Match 0 or more times
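The quantifiers can be understood as choices the matcher makes at each pattern element: skip it, consume one token, or keep consuming. The sketch below is a toy backtracking matcher in plain Python that supports `?`, `+` and `*` only. It is an illustration under stated assumptions, not spaCy's algorithm: `match_ends` and `find_matches` are hypothetical names, the `!` operator is left out, and the LEMMA and POS values are hand-written stand-ins for a model's predictions.

```python
from typing import Dict, List, Set, Tuple

Token = Dict[str, str]


def token_matches(token: Token, spec: Dict[str, str]) -> bool:
    # Compare every attribute except the quantifier key "OP"
    return all(token.get(k) == v for k, v in spec.items() if k != "OP")


def match_ends(tokens: List[Token], pattern: List[Dict[str, str]],
               i: int, p: int) -> Set[int]:
    """Return every end index at which pattern[p:] can finish from token i."""
    if p == len(pattern):
        return {i}
    spec = pattern[p]
    op = spec.get("OP", "1")  # "1" is this sketch's marker for "exactly once"
    ends: Set[int] = set()
    # "?" and "*" may match zero tokens: skip this pattern element
    if op in ("?", "*"):
        ends |= match_ends(tokens, pattern, i, p + 1)
    if i < len(tokens) and token_matches(tokens[i], spec):
        # Consume one token and move on to the next pattern element
        ends |= match_ends(tokens, pattern, i + 1, p + 1)
        # "+" and "*" may keep consuming tokens with the same element
        if op in ("+", "*"):
            ends |= match_ends(tokens, pattern, i + 1, p)
    return ends


def find_matches(tokens: List[Token],
                 pattern: List[Dict[str, str]]) -> List[Tuple[int, int]]:
    # Try every start position and keep non-empty matches
    return sorted(
        (start, end)
        for start in range(len(tokens))
        for end in match_ends(tokens, pattern, start, 0)
        if end > start
    )


# Hand-annotated tokens for "I bought a smartphone. Now I'm buying apps."
annotated = [
    ("I", "I", "PRON"), ("bought", "buy", "VERB"), ("a", "a", "DET"),
    ("smartphone", "smartphone", "NOUN"), (".", ".", "PUNCT"),
    ("Now", "now", "ADV"), ("I", "I", "PRON"), ("'m", "be", "AUX"),
    ("buying", "buy", "VERB"), ("apps", "app", "NOUN"), (".", ".", "PUNCT"),
]
tokens = [{"TEXT": t, "LEMMA": l, "POS": p} for t, l, p in annotated]

pattern = [{"LEMMA": "buy"}, {"POS": "DET", "OP": "?"}, {"POS": "NOUN"}]
matches = find_matches(tokens, pattern)
texts = [" ".join(tok["TEXT"] for tok in tokens[s:e]) for s, e in matches]
print(matches)  # [(1, 4), (8, 10)]
print(texts)    # ['bought a smartphone', 'buying apps']
```

With the optional determiner, the same pattern covers both "bought a smartphone" and "buying apps", in line with the output shown on the first operators slide.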
