The Impact of Distributional Metrics in the Quality of Relational Triples
Hernani Costa, Hugo Gonçalo Oliveira¹, Paulo Gomes
hpcosta@student.dei.uc.pt, {hroliv,pgomes}@dei.uc.pt
Cognitive & Media Systems Group, CISUC, University of Coimbra
LaTeCH 2010, Lisbon, August 16, 2010
¹ supported by FCT scholarship grant SFRH/BD/44955/2008

Outline
1 Introduction
▶ Information Extraction
▶ Information Retrieval
▶ Research Goals
2 Approach
3 Experimentation
▶ Set-up
▶ Metrics adaptation
▶ Results
▶ Additional experimentation
4 Concluding remarks

Introduction
Knowledge bases (e.g. WordNet) are useful resources for NLP
Their creation and maintenance involve intensive human effort
Automatic creation/enrichment from textual resources is an alternative
▶ Higher coverage, easier update, but...
▶ Precision is lower
▶ Evaluation once again requires intensive human labour!

Introduction: Information Extraction
Information extraction (IE): automatic extraction of structured information from natural language.
“Car is a vehicle with 4 wheels and an engine, used for carrying a small number of passengers.”
▶ vehicle HYPERNYM OF car
▶ wheel PART OF car
▶ engine PART OF car
▶ carrying people PURPOSE OF car
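The triples extracted from the example sentence can be held in a small record type. This is a minimal sketch, assuming a hypothetical `Triple` layout (first argument, relation name, second argument); the class and field names are illustrative and not taken from the authors' system.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A relational triple 'arg1 RELATION arg2' (hypothetical layout)."""
    arg1: str
    relation: str
    arg2: str

    def __str__(self) -> str:
        return f"{self.arg1} {self.relation} {self.arg2}"


# The four triples listed for the "Car is a vehicle..." sentence.
car_triples = [
    Triple("vehicle", "HYPERNYM_OF", "car"),
    Triple("wheel", "PART_OF", "car"),
    Triple("engine", "PART_OF", "car"),
    Triple("carrying people", "PURPOSE_OF", "car"),
]

# Select the parts of "car" by filtering on the relation name.
car_parts = [t.arg1 for t in car_triples if t.relation == "PART_OF"]
```

Keeping the relation as an explicit field makes it straightforward to evaluate each relation type separately.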

Introduction: Information Retrieval
Information retrieval (IR): locating specific information in natural language resources.
Approaches based on the occurrence of words in documents.
Distributional similarity metrics
▶ Cocitation (Small (1973))
▶ LSA (Deerwester et al. (1990))
▶ Lin’s (Lin (1998))
▶ PMI-IR (Turney (2001))
▶ σ (Kozima and Furugori (1993))
▶ ...

Introduction: Research Goals
Goals
1 Use IR metrics to improve IE precision
▶ Adapt distributional metrics to determine word similarity
▶ Wandmacher et al. (2007) and Cederberg and Widdows (2003) used LSA to weight hypernymy triples
▶ What about other semantic relations?
▶ What metrics should be used?
▶ New combined metrics?
2 Help manual evaluation

Approach
IE system (diagram): a Corpus and Grammars feed the following stages
▶ Extraction of relational triples
▶ Removal of triples with stopwords
▶ Lemmatisation
▶ Additional extraction of triples
▶ Metrics application
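Two of the IE system stages, removal of triples with stopwords and lemmatisation, can be sketched as follows. This is an illustrative sketch only: the stopword set, the lemma dictionary and the function names are hypothetical, and it assumes that a triple is discarded when either argument contains a stopword. The grammar-based extraction and the additional extraction of triples are not modelled.

```python
# Hypothetical toy resources; a real system would use a full stopword list
# and a proper lemmatiser for the corpus language.
STOPWORDS = {"the", "a", "of", "this", "it"}
LEMMAS = {"wheels": "wheel", "engines": "engine", "cars": "car"}


def has_stopword(argument: str) -> bool:
    """True if any token of the argument is a stopword."""
    return any(token in STOPWORDS for token in argument.lower().split())


def lemmatise(argument: str) -> str:
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return " ".join(LEMMAS.get(t, t) for t in argument.lower().split())


def clean(triples):
    """Drop triples with a stopword argument, then lemmatise the rest."""
    kept = [(a, r, b) for a, r, b in triples
            if not (has_stopword(a) or has_stopword(b))]
    return [(lemmatise(a), r, lemmatise(b)) for a, r, b in kept]


raw = [
    ("wheels", "PART_OF", "cars"),
    ("it", "PART_OF", "car"),        # discarded: "it" is a stopword
    ("engine", "PART_OF", "car"),
]
cleaned = clean(raw)
```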

Experimentation: Set-up
CETEMPúblico² corpus (annotated version)
▶ 28,000 documents
▶ 30,100 unique context words (nouns, verbs and adjectives)
▶ term-document matrix
Triples obtained
▶ Extracted: 20,308
▶ Discarded: 5,844
▶ Inferred: 2,492
▶ Final triple set: 16,956
² http://www.linguateca.pt/cetempublico/
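The triple counts are consistent with a simple reading in which discarded triples are subtracted from the extracted ones and inferred triples are then added. That reading is an assumption, checked here against the reported totals; the variable names are illustrative.

```python
# Counts as reported in the set-up.
extracted = 20_308
discarded = 5_844
inferred = 2_492

# Assumed relation: final = extracted - discarded + inferred.
final_set = extracted - discarded + inferred
print(final_set)  # 16956, matching the reported final triple set
```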

Experimentation: Metrics adaptation
Similarity between two documents
For instance, Cocitation: first presented as a similarity metric between scientific papers (Small (1973))
$$\mathrm{Cocitation}(d_i, d_j) = \frac{P(d_i \cap d_j)}{P(d_i \cup d_j)} \tag{1}$$
▶ $d_i$, $d_j$ represent two documents
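The Cocitation ratio in equation (1) can be sketched in code. The slide does not spell out P, so this sketch assumes that P(d_i ∩ d_j) counts the items citing both documents and P(d_i ∪ d_j) counts the items citing at least one of them, which gives a Jaccard-style ratio. The function name and the toy data are hypothetical.

```python
def cocitation(citing_i: set, citing_j: set) -> float:
    """Ratio of |intersection| to |union| of two sets of citing items.

    Assumption: P(.) in Eq. (1) is read as a count over the sets of
    papers citing d_i and d_j respectively.
    """
    union = citing_i | citing_j
    if not union:
        return 0.0  # neither document is cited; define similarity as 0
    return len(citing_i & citing_j) / len(union)


# Toy data: ids of papers that cite each document (hypothetical).
citing_d1 = {"p1", "p2", "p3"}
citing_d2 = {"p2", "p3", "p4"}
score = cocitation(citing_d1, citing_d2)  # 2 shared / 4 in total
```

For word similarity, one plausible adaptation (again an assumption) is to pass the sets of corpus documents in which each word occurs, i.e. the non-zero entries of each word's row in the term-document matrix.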
