BOA: Bootstrapping the Linked Data Web

Daniel Gerber, Axel-Cyrille Ngonga Ngomo • AKSW, Universität Leipzig • http://aksw.org/files/boa.pdf • http://www.volunteer-conservation-peru.org

Motivation • most knowledge bases are extracted from (semi-)structured data • the Linked Data Cloud keeps growing • BUT: only 15-20% of the information on the Web is structured • How can we extract data from the document-oriented Web?

Idea • start with triples from the Data Web • extract natural-language patterns that express the predicates found in these triples • combine the patterns with NLP to find pairs of labels that stand in the relation expressed by a predicate • generate RDF and feed it back into the Data Web

Related Work • NLP & RDF: Fox, Extractiv, Alchemy, OpenCalais • NELL [CAR+10] • initial ontology: 100+ categories/relations • PROSPERA [NDA+11] • harvesting of n-gram-itemset patterns ➤ generalisation without adding noise • [JUR+10] M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for relation extraction without labeled data. In ACL, pages 1003–1011, 2009.

The BOA approach

Corpus extraction • Crawler • starts from seed pages, removes HTML • Cleaner • sentence boundary detection (SBD), UTF-8 filters to remove noise • Indexer • indexes the sentences
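The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BOA's implementation: the slides do not name the crawler, the SBD tool or the index, so the regex-based tag stripping and sentence splitting, the noise filter (printable characters and at least three tokens, standing in for the UTF-8 filters) and the in-memory inverted index are hypothetical choices.

```python
import re
import unicodedata
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")        # crude HTML tag matcher (assumption)
SBD_RE = re.compile(r"(?<=[.!?])\s+")  # naive SBD: end mark followed by whitespace
TOKEN_RE = re.compile(r"\w+")


def strip_html(page):
    """Crawler step (sketch): remove HTML tags from a fetched page."""
    return TAG_RE.sub(" ", page)


def clean_sentences(text, min_tokens=3):
    """Cleaner step (sketch): split into sentences and drop noisy ones."""
    text = unicodedata.normalize("NFC", text)
    sentences = []
    for raw in SBD_RE.split(text):
        sentence = " ".join(raw.split())  # collapse runs of whitespace
        # Assumed noise filter: printable characters only and a minimum length.
        if sentence.isprintable() and len(TOKEN_RE.findall(sentence)) >= min_tokens:
            sentences.append(sentence)
    return sentences


def build_index(sentences):
    """Indexer step (sketch): lower-cased token -> ids of the sentences containing it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for token in TOKEN_RE.findall(sentence.lower()):
            index[token].add(sid)
    return index


page = ("<html><body><p>Larry Page is a co-founder of Google. Hi!</p>"
        "<p>Google is based in Mountain View.</p></body></html>")
sentences = clean_sentences(strip_html(page))  # "Hi!" is dropped as too short
index = build_index(sentences)
print(index["google"])  # {0, 1}
```

Intersecting posting sets such as `index["larry"] & index["google"]` narrows down the sentences that have to be scanned when looking for two labels; a production setup would more likely rely on a full-text search engine.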

Knowledge acquisition • a class C that serves as the rdfs:domain or the rdfs:range of a predicate p • a knowledge base provides the background knowledge • extract the statements whose subject or object is an entity of rdf:type C • e.g. db:Place, db:Person, db:Organisation
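A sketch of the knowledge-acquisition step over an in-memory list of triples. Against a real knowledge base this would typically be a SPARQL query; the `Triple` type, the `db:`/`dbr:` prefixes and the toy statements below are illustrative assumptions (the domain/range of the two predicates follow the Examples table), and no rdfs:subClassOf reasoning is done.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def predicates_for_class(kb, cls):
    """Predicates p whose rdfs:domain or rdfs:range is the class C."""
    return {t.s for t in kb if t.p in ("rdfs:domain", "rdfs:range") and t.o == cls}


def statements_for_class(kb, cls):
    """Statements using such a predicate whose subject or object has rdf:type C."""
    predicates = predicates_for_class(kb, cls)
    instances = {t.s for t in kb if t.p == "rdf:type" and t.o == cls}
    return [t for t in kb
            if t.p in predicates and (t.s in instances or t.o in instances)]


# Toy background knowledge (hypothetical excerpt, DBpedia-style names).
KB = [
    Triple("db:birthPlace", "rdfs:domain", "db:Person"),
    Triple("db:birthPlace", "rdfs:range", "db:PopulatedPlace"),
    Triple("db:foundationPerson", "rdfs:domain", "db:Organisation"),
    Triple("db:foundationPerson", "rdfs:range", "db:Person"),
    Triple("dbr:Larry_Page", "rdf:type", "db:Person"),
    Triple("dbr:Google", "rdf:type", "db:Organisation"),
    Triple("dbr:East_Lansing", "rdf:type", "db:PopulatedPlace"),
    Triple("dbr:Larry_Page", "db:birthPlace", "dbr:East_Lansing"),
    Triple("dbr:Google", "db:foundationPerson", "dbr:Larry_Page"),
]

for statement in statements_for_class(KB, "db:Person"):
    print(statement)
```

For db:Person both predicates qualify: db:birthPlace through its domain and db:foundationPerson through its range.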

Pattern Search • take the set of entity pairs s and o connected through p • find sentences that contain the labels of both s and o, keep only the text between them • replace the labels with variables (?D? for the domain entity, ?R? for the range entity) • A BOA pattern is a pair P = (μ(p), θ), where μ(p) is p's URI and θ is a natural-language representation of p. • A BOA pattern mapping is a function M such that M(p) = S, where S is the set of natural-language representations of p. • stored for each pattern: its occurrences, the sentences and labels it is learned from, and the number of occurrences for each label combination
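The pattern search and the mapping M can be illustrated as follows. The sketch assumes that each label is located by its first occurrence in a sentence (via `str.find`), that the subject of a triple fills ?D? and the object fills ?R?, and that candidate sentences have already been retrieved; the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def extract_pattern(sentence, s_label, o_label):
    """Return θ for one sentence: the text between the two labels, with the
    subject label replaced by ?D? and the object label by ?R? (None if no match)."""
    i, j = sentence.find(s_label), sentence.find(o_label)
    if i < 0 or j < 0:
        return None
    if i < j:  # subject label comes first
        start, end, left, right = i + len(s_label), j, "?D?", "?R?"
    else:      # object label comes first
        start, end, left, right = j + len(o_label), i, "?R?", "?D?"
    middle = sentence[start:end].strip()  # everything outside the labels is dropped
    if not middle:  # adjacent or overlapping labels
        return None
    return f"{left} {middle} {right}"


def learn_patterns(triples, labels, sentences):
    """Build M (predicate -> occurrence count per pattern θ) and, for each
    pattern P = (p, θ), the label pairs it was learned from."""
    mapping = defaultdict(Counter)
    learned_from = defaultdict(Counter)
    for s, p, o in triples:
        s_label, o_label = labels[s], labels[o]
        for sentence in sentences:
            theta = extract_pattern(sentence, s_label, o_label)
            if theta is not None:
                mapping[p][theta] += 1
                learned_from[(p, theta)][(s_label, o_label)] += 1
    return mapping, learned_from


triples = [("dbr:Google", "db:foundationPerson", "dbr:Larry_Page"),
           ("dbr:Google", "db:foundationPerson", "dbr:Sergey_Brin")]
labels = {"dbr:Google": "Google", "dbr:Larry_Page": "Larry Page",
          "dbr:Sergey_Brin": "Sergey Brin"}
sentences = ["Larry Page, co-founder of Google, spoke today.",
             "Sergey Brin, co-founder of Google, was also there.",
             "Google was founded in 1998."]
M, learned_from = learn_patterns(triples, labels, sentences)
print(set(M["db:foundationPerson"]))  # {'?R? , co-founder of ?D?'}
```

Because the labels are replaced by variables, both triples collapse onto the same pattern, which is what later lets Support reward patterns seen with many label pairs.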

Pattern Scoring • Pattern filtering: length, stop words, occurrence • Support: the pattern is used across several triples of the background knowledge • Typicity: ?D? and ?R? can be mapped to entities whose rdf:type is the domain/range of p • Specificity: the pattern is used exclusively to express p (IDF adapted to patterns) • (Similarity: how similar a pattern is to the label of the predicate) • combine Support, Typicity and Specificity to calculate the local maximum
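The scoring step below is only a sketch of how the named measures could fit together. The slides name the filters and measures but give neither thresholds nor formulas, so the stop-word list, the thresholds, the logarithmic Support, the product used to combine the measures and the division by the best score of p (one possible reading of "local maximum") are assumptions; only Specificity follows the stated idea of IDF adapted to patterns. Typicity is taken as given counts, since computing it requires NER-tagging sampled sentences.

```python
import math

# Hypothetical stop-word list and thresholds; the slides do not give values.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "to", ","}
VARIABLES = {"?D?", "?R?"}


def passes_filters(theta, occurrences, max_tokens=10, min_occurrences=2):
    """Pattern filtering on length, stop words and number of occurrences."""
    words = [w for w in theta.split() if w not in VARIABLES]
    if not words or len(words) > max_tokens:
        return False
    if all(w.lower() in STOP_WORDS for w in words):
        return False
    return occurrences >= min_occurrences


def support(label_pairs):
    """Assumed form: grows with the number of distinct triples behind the pattern."""
    return math.log(1 + len(label_pairs))


def typicity(well_typed, sampled):
    """Share of sampled sentences whose ?D?/?R? fillers have p's domain/range type."""
    return well_typed / sampled if sampled else 0.0


def specificity(theta, mapping):
    """IDF adapted to patterns: log(#predicates / #predicates whose M(p) contains θ)."""
    df = sum(1 for patterns in mapping.values() if theta in patterns)
    return math.log(len(mapping) / df) if df else 0.0


def score_patterns(p, mapping, learned_from, typing):
    """Product of the three measures (assumption), divided by the best score for p."""
    raw = {}
    for theta, occurrences in mapping[p].items():
        if not passes_filters(theta, occurrences):
            continue
        well_typed, sampled = typing.get((p, theta), (0, 0))
        raw[theta] = (support(learned_from.get((p, theta), ()))
                      * typicity(well_typed, sampled)
                      * specificity(theta, mapping))
    best = max(raw.values(), default=0.0)
    return {theta: (score / best if best > 0 else 0.0) for theta, score in raw.items()}


# Toy inputs (hypothetical counts).
mapping = {
    "db:foundationPerson": {"?R? , co-founder of ?D?": 5, "?R? , founder of ?D?": 3,
                            "?D? and ?R?": 4},
    "db:birthPlace": {"?D? was born in ?R?": 7, "?D? and ?R?": 2},
}
learned_from = {
    ("db:foundationPerson", "?R? , co-founder of ?D?"): {("Google", "Larry Page"),
                                                        ("Google", "Sergey Brin")},
    ("db:foundationPerson", "?R? , founder of ?D?"): {("Google", "Larry Page")},
}
typing = {("db:foundationPerson", "?R? , co-founder of ?D?"): (9, 10),
          ("db:foundationPerson", "?R? , founder of ?D?"): (8, 10)}
scores = score_patterns("db:foundationPerson", mapping, learned_from, typing)
print(scores)
```

The stop-word-only pattern "?D? and ?R?" is removed by the filter, and it would also score low on Specificity because it appears for both predicates.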

RDF Generation • use the top-n patterns for each relation • find sentences that contain a pattern • NER-tag the sentence • look for tokens whose classes match the domain/range of the relation • extract their labels • URI retrieval: if an existing URI scores above a threshold, reuse it instead of creating a new URI
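RDF generation for a single pattern and sentence might look like this. The NER tagger is stubbed by a dict from entity strings to tags, and the tag-to-class mapping, the `difflib` similarity measure, the threshold of 0.8 and the example.org namespace for new URIs are hypothetical; in practice the top-n patterns would be applied to every matching sentence.

```python
import difflib
import re

# Hypothetical mapping from NER tags to the classes used as rdfs:domain/rdfs:range.
NER_TO_CLASS = {"PERSON": "db:Person", "ORGANIZATION": "db:Organisation",
                "LOCATION": "db:Place"}


def apply_pattern(theta, sentence, entities, domain, range_):
    """Return (domain label, range label) pairs linked by pattern θ in one sentence.
    `entities` maps entity strings to NER tags (stand-in for a real tagger)."""
    first, *middle_words, last = theta.split()
    middle = " ".join(middle_words)
    wanted = {"?D?": domain, "?R?": range_}
    pairs = []
    for a, a_tag in entities.items():
        for b, b_tag in entities.items():
            if a == b:
                continue
            # The tokens' classes must match domain/range in the pattern's order.
            if (NER_TO_CLASS.get(a_tag), NER_TO_CLASS.get(b_tag)) != (wanted[first], wanted[last]):
                continue
            regex = re.escape(a) + r"\s*" + re.escape(middle) + r"\s*" + re.escape(b)
            if re.search(regex, sentence):
                pairs.append((a, b) if first == "?D?" else (b, a))
    return pairs


def retrieve_uri(label, known, threshold=0.8, namespace="http://example.org/boa/"):
    """Reuse the most similar known URI if its similarity is above the threshold;
    otherwise mint a new URI in a (hypothetical) namespace."""
    best_uri, best_score = None, 0.0
    for uri, known_label in known.items():
        score = difflib.SequenceMatcher(None, label.lower(), known_label.lower()).ratio()
        if score > best_score:
            best_uri, best_score = uri, score
    if best_uri is not None and best_score > threshold:
        return best_uri
    return namespace + label.replace(" ", "_")


def generate_triples(p, theta, sentence, entities, domain, range_, known):
    """Turn every label pair found by θ into a (subject URI, p, object URI) triple."""
    return [(retrieve_uri(d, known), p, retrieve_uri(r, known))
            for d, r in apply_pattern(theta, sentence, entities, domain, range_)]


known = {"http://dbpedia.org/resource/Google": "Google",
         "http://dbpedia.org/resource/Larry_Page": "Larry Page"}
triples = generate_triples("db:foundationPerson", "?R? , co-founder of ?D?",
                           "Sergey Brin, co-founder of Google, gave a talk.",
                           {"Sergey Brin": "PERSON", "Google": "ORGANIZATION"},
                           "db:Organisation", "db:Person", known)
print(triples)
```

Here "Google" is matched to its existing URI, while "Sergey Brin" has no sufficiently similar known label and therefore gets a new URI.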

http://139.18.2.164:8080/boa

Evaluation • Corpora • en-wiki (44.7M), en-news (256.1M) • Background knowledge • Organisation, Place and Person (283 relations with between 1 and 471,920 triples each) • Parameters • top-1 and top-2 patterns, kappa, 500 sentences for Typicity, 100 example sentences for 12 different KBs

Results

Examples (D and R stand for the ?D? and ?R? labels)

| Relation URI | Top-2 patterns (en-wiki) | Top-2 patterns (en-news) | Domain/Range |
|---|---|---|---|
| foundationPerson | 1. R , co-founder of D; 2. R , founder of D | 1. R, the co-founder of D; 2. R, founder of the D | Organisation/Person |
| subsidiary | 1. R , a subsidiary of D; 2. D's acquisition of R | 1. R, a division of D; 2. - (R , a division of D) | Organisation/Organisation |
| birthPlace | 1. D was born in R; 2. - (D , the mayor of R) | 1. D has been named in the R; 2. D, MP for R | Person/PopulatedPlace |

Discussion • patterns learned from en-wiki can be used for every corpus • we create many new triples • the triples we create are correct • one iteration takes 15 minutes • Q1 & Q2 answered with YES

Future Work • Train NER on DBpedia classes • further iterations (1+) • Human feedback • Pattern generalization • rdf:type extractor • more languages/corpora • Web services

Thank you! Questions?

References
• [NDA+11] Ndapandula Nakashole, Martin Theobald, and Gerhard Weikum. Scalable knowledge harvesting with high precision and high recall. In WSDM, pages 227–236, Hong Kong, 2011.
• [CAR+10] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. Toward an architecture for never-ending language learning. In AAAI, 2010.
