Knowledge Graph Construction from Text
AAAI 2017
Jay Pujara, Sameer Singh, Bhavana Dalvi
Tutorial Overview
Part 1: Knowledge Graphs
Part 2: Knowledge Extraction
Part 3: Graph Construction
Part 4: Critical Analysis
Tutorial Outline
1. Knowledge Graph Primer [Jay]
2. Knowledge Extraction from Text
   a. NLP Fundamentals [Sameer]
   b. Information Extraction [Bhavana]
Coffee Break
3. Knowledge Graph Construction
   a. Probabilistic Models [Jay]
   b. Embedding Techniques [Sameer]
4. Critical Overview and Conclusion [Bhavana]
What is Knowledge Extraction?
Text: John was born in Liverpool, to Julia and Alfred Lennon.
Literal Facts (graph):
• John Lennon, childOf, Alfred Lennon
• John Lennon, childOf, Julia Lennon
• John Lennon, birthplace, Liverpool
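A minimal sketch of how the literal facts on this slide might be stored as subject-relation-object triples and indexed as a small graph. The `Triple` type, the helper names, and the choice to put the child in the subject position of `childOf` are assumptions made for illustration; the slide only shows the graph drawing.

```python
from collections import defaultdict, namedtuple

# Hypothetical triple type: one edge of the knowledge graph.
Triple = namedtuple("Triple", ["subject", "relation", "object"])

# Literal facts read off the example sentence (edge direction is an assumption).
facts = [
    Triple("John Lennon", "birthplace", "Liverpool"),
    Triple("John Lennon", "childOf", "Julia Lennon"),
    Triple("John Lennon", "childOf", "Alfred Lennon"),
]

def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.relation, t.object))
    return graph

def objects_of(graph, subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [obj for rel, obj in graph.get(subject, []) if rel == relation]

graph = build_graph(facts)
print(objects_of(graph, "John Lennon", "childOf"))
```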
NLP Fundamentals EXTRACTING STRUCTURES FROM LANGUAGE
What is NLP?
Unstructured text -> NLP -> “Knowledge”
Unstructured:
• Ambiguous
• Lots and lots of it!
• Humans can read them, but
  • … very slowly
  • … can’t remember all
  • … can’t answer questions
Structured:
• Precise, Actionable
• Specific to the task
• Can be used for downstream applications, such as creating Knowledge Graphs!
What is NLP?
Input: John was born in Liverpool, to Julia and Alfred Lennon.
Natural Language Processing produces:
• Coreference mentions: John Lennon..., he, .. his mother .., his father, Mrs. Lennon.., Alfred, Lennon.., the Pool
• Entity types: John (Person), Liverpool (Location), Julia (Person), Alfred Lennon (Person)
• POS tags: John/NNP was/VBD born/VBD in/IN Liverpool/NNP to/TO Julia/NNP and/CC Alfred/NNP Lennon/NNP
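What is Information Extraction?
Input (NLP annotations): John was born in Liverpool, to Julia and Alfred Lennon.
• Coreference mentions: John Lennon..., he, .. his mother .., his father, Mrs. Lennon.., Alfred, Lennon.., the Pool
• Entity types: John (Person), Liverpool (Location), Julia (Person), Alfred Lennon (Person)
• POS tags: John/NNP was/VBD born/VBD in/IN Liverpool/NNP to/TO Julia/NNP and/CC Alfred/NNP Lennon/NNP
Information Extraction produces (graph):
• John Lennon, childOf, Alfred Lennon
• John Lennon, childOf, Julia Lennon
• John Lennon, birthplace, Liverpool
• Alfred Lennon, spouse, Julia Lennon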
Breaking it Down
Information Extraction (output graph: childOf, spouse, birthplace edges among John Lennon, Alfred Lennon, Julia Lennon, Liverpool):
• Entity resolution, Entity linking, Relation extraction…
Document (mentions: John Lennon..., he, his mother, his father, Mrs. Lennon, Alfred, Lennon, the Pool):
• Coreference Resolution...
Sentence (entity types and POS tags over "John was born in Liverpool, to Julia and Alfred Lennon."):
• Dependency Parsing, Part of speech tagging, Named entity recognition…
Tokenization & Sentence Splitting
“Mr. Bob Dobolina is thinkin' of a master plan. Why doesn't he quit?”
[Mr.] [Bob] [Dobolina] [is] [thinkin’] [of] [a] [master] [plan] [.] [Why] [doesn't] [he] [quit] [?]
How it is done:
• Regular expressions, but not trivial
  • Mr., Yahoo!, lower-case
• For non-English, incredibly difficult!
  • Chinese: no “space” character
• Non-trivial for some domains…
  • What is a “token” in BioNLP?
Uses in KG Construction:
• Strictly constrains other NLP tasks
  • Parts of Speech
  • Dependency Parsing
• Directly affects KG nodes/edges
  • Mention boundaries
  • Relations within sentences
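To make the "regular expressions, but not trivial" point concrete, here is a minimal sketch of a regex tokenizer and sentence splitter for the example on this slide. The abbreviation list, the token pattern, and the function names are assumptions chosen for this one sentence; real tokenizers handle many more cases (numbers, URLs, "Yahoo!", other languages).

```python
import re

# Hypothetical abbreviation list; a real system would need a much larger one.
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "Inc."}

# A word (optionally with an apostrophe part and a trailing period),
# or any other single non-space character such as "?" or ",".
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]*)?\.?|\S")

def tokenize(text):
    """Split text into tokens, keeping the period on known abbreviations."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        if tok.endswith(".") and len(tok) > 1 and tok not in ABBREVIATIONS:
            tokens.extend([tok[:-1], "."])  # sentence-final period is its own token
        else:
            tokens.append(tok)
    return tokens

def split_sentences(tokens):
    """Group tokens into sentences, closing one at each '.', '?' or '!' token."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in {".", "?", "!"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences

tokens = tokenize("Mr. Bob Dobolina is thinkin' of a master plan. Why doesn't he quit?")
for sentence in split_sentences(tokens):
    print(sentence)
```

Removing "Mr." from the abbreviation list splits the first sentence after "Mr", which is the kind of error that then propagates to mention boundaries downstream.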
Tagging the Parts of Speech
John/NNP was/VBD born/VBD in/IN Liverpool/NNP, to/TO Julia/NNP and/CC Alfred/NNP Lennon/NNP.
How it is done:
• Context is important!
  • run, table, bar, …
• Label whole sentence together
  • “Structured prediction”
  • Conditional Random Fields, ..
• Now: CNNs, LSTMs, …
Uses in KG Construction:
• Entities appear as nouns
• Verbs are very useful
  • For identifying relations
  • For identifying entity types
• Important for downstream NLP
  • NER, Dependency Parsing, …
Detecting Named Entities
John (Person) was born in Liverpool (Location), to Julia (Person) and Alfred Lennon (Person).
How it is done:
• Context is important!
  • Georgia, Washington, …
  • John Deere, Thomas Cook, …
  • Princeton, Amazon, …
• Label whole sentence together
  • Structured prediction again
Uses in KG Construction:
• Mentions describe the nodes
• Types are incredibly important!
  • Often restrict relations
• Fine-grained types are informative!
  • Brooklyn: city
  • Sanders: politician, senator
NER: Entity Types
Stanford CoreNLP:
• 3 class: Location, Person, Organization
• 4 class: Location, Person, Organization, Misc
• 7 class: Location, Person, Organization, Money, Percent, Date, Time
spaCy.io:
• PERSON: People, including fictional.
• NORP: Nationalities or religious or political groups.
• FACILITY: Buildings, airports, highways, bridges, etc.
• ORG: Companies, agencies, institutions, etc.
• GPE: Countries, cities, states.
• LOC: Non-GPE locations, mountain ranges, bodies of water.
• PRODUCT: Objects, vehicles, foods, etc. (Not services.)
• EVENT: Named hurricanes, battles, wars, sports events, etc.
• WORK_OF_ART: Titles of books, songs, etc.
• LANGUAGE: Any named language.
From Stanford CoreNLP (http://nlp.stanford.edu/software/CRF-NER.shtml)
NER: Entity Types
Fine-grained Types
• More on this later…
From Ling & Weld. AAAI 2012 (http://aiweb.cs.washington.edu/ai/pubs/ling-aaai12.pdf)
Dependency Parsing
How it is done:
• Model: score trees using features
  • Lexical: words, POS, …
  • Structure: distance, …
• Prediction: Search over trees
  • greedy, spanning tree, belief propagation, dynamic programming, …
Uses in KG Construction:
• Incredibly useful for relations!
  • What verb is attached?
  • Relation to which mention?
• Incredibly useful for attributes!
  • Appositives: “X, the CEO, …”
• Paths are used as surface relations
Using http://nlp.stanford.edu:8080/corenlp/process
Dependency Paths
John was born in Liverpool, to Julia and Alfred Lennon.
Dependency edges shown: nmod:in, nmod:to, case, case
Text Patterns vs. Dependency Paths:
• John, Liverpool: text pattern “was born in”; dependency path “was born in”
• John, Julia: text pattern “was born in Liverpool, to”; dependency path “was born to”
• John, Alfred Lennon: text pattern “was born in Liverpool, to Julia and”; dependency path “was born to”
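A minimal sketch of how a dependency path between two mentions could be read off a parse: treat the tree as an undirected graph and run a breadth-first search. The edge list below is hand-written for the example sentence and is an assumption (only `nmod:in`, `nmod:to`, and `case` appear on the slide); using the word itself as the node identifier also assumes every word in the sentence is unique.

```python
from collections import deque

# Hand-written (head, label, dependent) edges for the example sentence.
# This parse is an assumption for illustration, not parser output.
EDGES = [
    ("born", "nsubjpass", "John"),
    ("born", "auxpass", "was"),
    ("born", "nmod:in", "Liverpool"),
    ("Liverpool", "case", "in"),
    ("born", "nmod:to", "Julia"),
    ("Julia", "case", "to"),
    ("Julia", "cc", "and"),
    ("Julia", "conj", "Lennon"),
    ("Lennon", "compound", "Alfred"),
]

def build_adjacency(edges):
    """Undirected view of the tree; arrows record the original edge direction."""
    adj = {}
    for head, label, dep in edges:
        adj.setdefault(head, []).append((dep, "-" + label + "->"))
        adj.setdefault(dep, []).append((head, "<-" + label + "-"))
    return adj

def dependency_path(edges, source, target):
    """Breadth-first search for the path between two words in the tree."""
    adj = build_adjacency(edges)
    queue = deque([(source, [source])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for neighbor, arrow in adj.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [arrow, neighbor]))
    return None  # no path: the words are not connected

print(" ".join(dependency_path(EDGES, "John", "Liverpool")))
print(" ".join(dependency_path(EDGES, "John", "Julia")))
```

Unlike the text pattern between John and Julia, the path skips "in Liverpool," entirely, which is why the slide's dependency path stays short while the text pattern grows.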
Within-document Coreference
John was born in Liverpool, to Julia and Alfred Lennon.
Mentions: John Lennon..., he, He…, .. his mother .., his father, Mrs. Lennon.., Alfred, Lennon.., the Pool
How it is done:
• Model: score pairwise links
  • dep path, similarity, types, …
  • “representative mention”
• Prediction: Search over clusterings
  • greedy (left to right), ILP, belief propagation, MCMC, …
Uses in KG Construction:
• More context for each entity!
• Many relations occur on pronouns
  • “He is married to her”
• Coref can be used for types
  • Nominals: The president, …
• Difficult, so often ignored
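A minimal sketch of the "score pairwise links, then search greedily left to right" recipe. The scoring function here is a toy stand-in (type and gender compatibility plus shared words), the mention list and its gender annotations are hypothetical, and the threshold is arbitrary; real systems learn the pairwise scorer from data and use far richer features.

```python
# Hypothetical mentions in document order, with hand-assigned type and gender.
MENTIONS = [
    {"text": "John", "type": "PERSON", "gender": "m"},
    {"text": "Liverpool", "type": "LOCATION", "gender": None},
    {"text": "Julia", "type": "PERSON", "gender": "f"},
    {"text": "Alfred Lennon", "type": "PERSON", "gender": "m"},
    {"text": "Alfred", "type": "PERSON", "gender": "m"},
    {"text": "Julia Lennon", "type": "PERSON", "gender": "f"},
]

def link_score(antecedent, mention):
    """Toy pairwise score: 0 if incompatible, else the number of shared words."""
    if antecedent["type"] != mention["type"]:
        return 0.0
    g1, g2 = antecedent["gender"], mention["gender"]
    if g1 is not None and g2 is not None and g1 != g2:
        return 0.0
    shared = set(antecedent["text"].lower().split()) & set(mention["text"].lower().split())
    return float(len(shared))

def greedy_coref(mentions, threshold=1.0):
    """Left-to-right pass: join the best-scoring earlier mention or start a new cluster."""
    cluster_of = []  # cluster id assigned to each mention so far
    clusters = []    # list of clusters, each a list of mention texts
    for i, mention in enumerate(mentions):
        best_j, best_score = None, 0.0
        for j in range(i):
            score = link_score(mentions[j], mention)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            cid = cluster_of[best_j]
        else:
            cid = len(clusters)
            clusters.append([])
        cluster_of.append(cid)
        clusters[cid].append(mention["text"])
    return clusters

for cluster in greedy_coref(MENTIONS):
    print(cluster)
```

Word overlap alone cannot link a pronoun such as "he" or a nickname such as "the Pool" to its antecedent, which is one reason the slide calls coreference difficult.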
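Information Extraction
Input (NLP annotations): John was born in Liverpool, to Julia and Alfred Lennon.
• Coreference mentions: John Lennon..., he, .. his mother .., his father, Mrs. Lennon.., Alfred, Lennon.., the Pool
• Entity types: John (Person), Liverpool (Location), Julia (Person), Alfred Lennon (Person)
• POS tags: John/NNP was/VBD born/VBD in/IN Liverpool/NNP to/TO Julia/NNP and/CC Alfred/NNP Lennon/NNP
Information Extraction produces (graph):
• John Lennon, childOf, Alfred Lennon
• John Lennon, childOf, Julia Lennon
• John Lennon, birthplace, Liverpool
• Alfred Lennon, spouse, Julia Lennon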
Surface Patterns
Combine tokens, dependency paths, and entity types to define rules.
Pattern: Argument 1 (Person) , DT CEO of Argument 2 (Organization)
Dependency edges in the pattern: appos, nmod, case, det
Matching sentences:
• Bill Gates, the CEO of Microsoft, said …
• Mr. Jobs, the brilliant and charming CEO of Apple Inc., said …
• … announced by Steve Jobs, the CEO of Apple.
• … announced by Bill Gates, the director and CEO of Microsoft.
• … mused Bill, a former CEO of Microsoft.
and many other possible instantiations…
Rule-Based Extraction
Rule: Argument 1 (Person) , DT CEO of Argument 2 (Organization), with dependency edges appos, nmod, case, det
Implies: Argument 1 headOf Argument 2
Use a collection of rules as the system itself
• High precision: when it fires, it’s correct
• Easy to explain predictions
• Easy to fix mistakes
However…
• Only work when the rules fire
• Poor recall: Do not generalize!
Variations
Source:
• Manually specified
• Learned from Data
Multiple Rules:
• Attach priorities/precedence
• Attach probabilities (more later)
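A minimal sketch of a rule collection with priorities. The slide's rule is defined over dependency edges and entity types; the version below swaps in plain regular expressions, with capitalized word sequences standing in for Person and Organization mentions. The `affiliatedWith` rule, the priority values, and the last two test sentences are hypothetical additions used only to show precedence and poor recall.

```python
import re

# Capitalized word sequences approximate typed mentions (an assumption).
ARG1 = r"(?P<arg1>[A-Z][a-z]+\.?(?: [A-Z][a-z]+)*)"
ARG2 = r"(?P<arg2>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+\.?)*)"

# (priority, relation, pattern); the higher-priority rule wins when several fire.
RULES = [
    (2, "headOf", re.compile(ARG1 + r", (?:the|a) (?:[a-z]+ )*CEO of " + ARG2)),
    (1, "affiliatedWith", re.compile(ARG1 + r", (?:the|a) (?:[A-Za-z]+ )+of " + ARG2)),
]

def extract(sentence):
    """Return (arg1, relation, arg2) from the highest-priority rule that fires."""
    for _priority, relation, pattern in sorted(RULES, key=lambda rule: -rule[0]):
        match = pattern.search(sentence)
        if match:
            return (match.group("arg1"), relation, match.group("arg2"))
    return None  # no rule fired

SENTENCES = [
    "Bill Gates, the CEO of Microsoft, said ...",
    "Mr. Jobs, the brilliant and charming CEO of Apple Inc., said ...",
    "... announced by Bill Gates, the director and CEO of Microsoft.",
    "... mused Bill, a former CEO of Microsoft.",
    "Jane Doe, the spokesperson of Acme, said ...",   # hypothetical sentence
    "Microsoft's chief executive, Bill Gates, said ...",  # hypothetical sentence
]
for sentence in SENTENCES:
    print(extract(sentence))
```

The first sentence matches both rules and the priority picks `headOf`; the last sentence expresses the same fact but matches neither rule, which is the recall problem noted on the slide.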
Supervised Extraction
Machine Learning: hopefully, generalizes the labels in the right way
Use all of NLP as features: words, POS, NER, dependencies, embeddings …
Pipeline: John was born in Liverpool, to Julia and Alfred Lennon. -> Feature Engineering (POS, NER, Dep Path, Text in b/w, embeddings) -> Classifier -> P(birthplace) = 0.75
However
• Usually, a lot of labeled data is needed, which is expensive & time consuming.
• Requires a lot of feature engineering!
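A minimal sketch of the feature-engineering and classification steps for one mention pair. The feature names, the logistic scoring function, and especially the weights are assumptions: the weights are set by hand here purely to show the computation, whereas a supervised extractor would learn them from labeled data. The POS tags are copied from the slides and punctuation tokens are omitted so that the lists align.

```python
import math

TOKENS = ["John", "was", "born", "in", "Liverpool", "to", "Julia", "and", "Alfred", "Lennon"]
POS = ["NNP", "VBD", "VBD", "IN", "NNP", "TO", "NNP", "CC", "NNP", "NNP"]
NER = ["PERSON", "O", "O", "O", "LOCATION", "O", "PERSON", "O", "PERSON", "PERSON"]

def pair_features(tokens, pos_tags, ner_tags, i, j):
    """Features for the mention pair at token indices i < j."""
    return {
        "text_between=" + "_".join(tokens[i + 1:j]): 1.0,
        "pos_between=" + "_".join(pos_tags[i + 1:j]): 1.0,
        "ner_pair=" + ner_tags[i] + "_" + ner_tags[j]: 1.0,
        "distance": float(j - i),
    }

def predict_proba(features, weights, bias=0.0):
    """Logistic model: sigmoid of the weighted feature sum."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-set weights for a "birthplace" classifier.
BIRTHPLACE_WEIGHTS = {
    "text_between=was_born_in": 1.5,
    "ner_pair=PERSON_LOCATION": 0.5,
    "distance": -0.1,
}

features = pair_features(TOKENS, POS, NER, 0, 4)  # (John, Liverpool)
print(features)
print(predict_proba(features, BIRTHPLACE_WEIGHTS))
```

Features the model has no weight for contribute nothing, so the pair (John, Julia) scores lower than (John, Liverpool) under these weights.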
Entity Resolution & Linking
• ...during the late 60's and early 70's, Kevin Smith worked with several local...
• ...the term hip-hop is attributed to Lovebug Starski. What does it actually mean...
• Like Back in 2008, the Lions drafted Kevin Smith, even though Smith was badly...
• ... backfield in the wake of Kevin Smith's knee injury, and the addition of Haynesworth...
• The filmmaker Kevin Smith returns to the role of Silent Bob...
• Nothing could be more irrelevant to Kevin Smith's audacious ''Dogma'' than ticking off...
• ... The Physiological Basis of Politics,” by Kevin Smith, Douglas Oxley, Matthew Hibbing...
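A minimal sketch of one simple way to link an ambiguous "Kevin Smith" mention to a candidate entity: compare the words around the mention with a short description of each candidate. The candidate names and keyword descriptions below are hypothetical (built from words that appear in the snippets), and Jaccard overlap is a stand-in for the learned similarity a real entity linker would use.

```python
import re

# Hypothetical candidate entities with hand-written keyword descriptions.
CANDIDATES = {
    "Kevin Smith (filmmaker)": "filmmaker director film dogma silent bob",
    "Kevin Smith (football player)": "football running back lions drafted knee injury",
    "Kevin Smith (political scientist)": "politics political science physiological basis",
}

def bag(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def link(context, candidates):
    """Pick the candidate whose description has the highest Jaccard overlap with the context."""
    ctx = bag(context)
    scores = {}
    for name, description in candidates.items():
        desc = bag(description)
        scores[name] = len(ctx & desc) / len(ctx | desc)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None: no candidate is supported

SNIPPETS = [
    "The filmmaker Kevin Smith returns to the role of Silent Bob",
    "Back in 2008, the Lions drafted Kevin Smith, even though Smith was badly",
    "during the late 60's and early 70's, Kevin Smith worked with several local",
]
for snippet in SNIPPETS:
    print(link(snippet, CANDIDATES))
```

The third snippet shares no words with any description, so the sketch returns `None` rather than guess, leaving the mention unlinked.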