Representing a concept by the distribution of names of its instances

Existing data/model we use
● The Instantiation dataset (Boleda, Gupta, and Padó, 2017, EACL):
  – e.g., <Emmy Noether, scientist>, <Edinburgh, capital>
  – derived from WordNet’s ‘instance hyponym’ relation.
● We focus on the 159 categories that have at least 5 entities.
● As distributional semantic (DS) representations of the entities’ names and the categories’ predicates we use the Google News embeddings (Mikolov, Sutskever, et al., 2013, ANIPS).
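The two kinds of representation can be illustrated with a minimal sketch. It assumes that a Name-based category vector is the plain average of the embeddings of its instances' names, which the slides do not state explicitly, and that a Predicate-based vector is simply the embedding of the category word. The 3-dimensional vectors, and the extra instances Marie_Curie and Paris, are hypothetical toy data, not Google News embeddings or dataset entries.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: stand-ins for real word vectors).
embeddings = {
    "scientist": [0.9, 0.1, 0.2],
    "capital": [0.1, 0.8, 0.3],
    "Emmy_Noether": [0.8, 0.2, 0.1],
    "Marie_Curie": [0.7, 0.1, 0.3],
    "Edinburgh": [0.2, 0.9, 0.2],
    "Paris": [0.1, 0.7, 0.4],
}

# Hypothetical category -> instance names mapping, in the style of
# <Emmy Noether, scientist> and <Edinburgh, capital>.
instances = {
    "scientist": ["Emmy_Noether", "Marie_Curie"],
    "capital": ["Edinburgh", "Paris"],
}


def predicate_based(category):
    """Predicate-based: the embedding of the category word itself."""
    return embeddings[category]


def name_based(category):
    """Name-based (assumed here): mean embedding of the instances' names."""
    return centroid([embeddings[name] for name in instances[category]])


sim_pred = cosine(predicate_based("scientist"), predicate_based("capital"))
sim_name = cosine(name_based("scientist"), name_based("capital"))
print(f"Predicate-based similarity: {sim_pred:.3f}")
print(f"Name-based similarity:      {sim_name:.3f}")
```

With real embeddings, the lookup table would be replaced by a pretrained model, and only categories with at least 5 entities would be kept.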

Evaluation: gathering human judgments
Following Bruni, Tran and Baroni’s MEN benchmark (2012, JAIR):
● We semi-randomly sampled 1000 category pairs (out of 12.5K).
● ‘Comparative’ task: which of two pairs of categories is more related?
● Aggregated ‘relatedness’ scores are computed the same way as in MEN.
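The figure of 12.5K pairs matches the number of unordered pairs of 159 categories, and the aggregation step can be sketched as follows. The sketch assumes that a pair's score is the fraction of comparisons in which annotators picked it as the more related one; the function name `aggregate_relatedness` and the toy judgments are hypothetical.

```python
import math
from collections import Counter

# Unordered pairs of the 159 categories: 159 * 158 / 2.
n_pairs = math.comb(159, 2)  # 12561, i.e. roughly 12.5K


def aggregate_relatedness(judgments):
    """Turn comparative judgments into one score per category pair.

    judgments: iterable of (pair_a, pair_b, winner), where winner is
    whichever of pair_a / pair_b was judged more related.
    Returns {pair: wins / comparisons} (assumed aggregation scheme).
    """
    wins = Counter()
    comparisons = Counter()
    for pair_a, pair_b, winner in judgments:
        comparisons[pair_a] += 1
        comparisons[pair_b] += 1
        wins[winner] += 1
    return {pair: wins[pair] / comparisons[pair] for pair in comparisons}


# Hypothetical toy judgments (not actual crowdsourced data).
judgments = [
    (("surgeon", "scientist"), ("surgeon", "siege"), ("surgeon", "scientist")),
    (("surgeon", "scientist"), ("capital", "siege"), ("surgeon", "scientist")),
    (("surgeon", "siege"), ("capital", "siege"), ("capital", "siege")),
]
scores = aggregate_relatedness(judgments)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```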

Crowdsource task

Main result
● Spearman (ranking) correlations between:
  – cosine similarities from Name-based / Predicate-based and
  – aggregate scores from our human judgments
● Result:
  – Predicate-based: 0.56
  – Name-based: 0.74
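For reference, Spearman correlation is the Pearson correlation of the two rankings. The following is a minimal pure-Python sketch of that computation; the similarity and judgment values are hypothetical toy numbers, not the data behind the 0.56 and 0.74 results.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values for four category pairs.
model_cosines = [0.10, 0.35, 0.80, 0.55]
human_scores = [0.05, 0.40, 0.90, 0.30]
rho = spearman(model_cosines, human_scores)
print(f"Spearman correlation: {rho:.2f}")
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead of a hand-written version.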

Artist’s impression

How many names do we need?
Surprisingly few!
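The slides do not say how this question was tested. One plausible way to probe it, sketched here purely as an assumption, is to build the category vector from a random subsample of k names and measure how far it drifts from the vector built from all names. The helper `subsampled_centroid` and the toy vectors are hypothetical.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def subsampled_centroid(name_vectors, k, seed=0):
    """Centroid of k name vectors drawn without replacement (seeded)."""
    rng = random.Random(seed)
    return centroid(rng.sample(name_vectors, k))


# Hypothetical toy name embeddings for one category (not real data).
toy_name_vectors = [
    [0.8, 0.2, 0.1],
    [0.7, 0.1, 0.3],
    [0.9, 0.3, 0.2],
    [0.6, 0.2, 0.2],
    [0.8, 0.1, 0.4],
    [0.7, 0.3, 0.1],
]

full = centroid(toy_name_vectors)
for k in range(1, len(toy_name_vectors) + 1):
    drift = euclidean(subsampled_centroid(toy_name_vectors, k), full)
    print(f"k={k}: distance to full centroid = {drift:.3f}")
```

A fuller version of this idea would recompute the correlation with human judgments at each k rather than the distance to the full centroid.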

Entities need to be representative
● E.g., the Name-based model overestimates surgeon ~ siege ...
● Instances of surgeon in the Instantiation dataset:
  – William Cowper
  – James Parkinson
  – Alexis Carrel
  – Walter Reed
  – William Beaumont
  – Joseph Lister
Callouts beside the list: Wrote “the siege of chester” (?); Involved in WW1; Members of US military corps

Discussion
● Main finding:
  – Name-based representations of category concepts align better with ‘the world’ than Predicate-based representations.
  – Even a small number of (representative) names can be enough.
● Outlook:
  – Not every category has named instances...
  – NLP relevance? Vs. sense disambiguation? Contextualized word embeddings (ELMo, BERT, …)?
  – Cognitive relevance? E.g., prototype theory?

Acknowledgments
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 715154). This paper reflects the authors’ view only, and the EU is not responsible for any use that may be made of the information it contains.

Image sources
https://ui-ex.com/explore/whale-transparent-dark/
https://commons.wikimedia.org/wiki/File:Cowicon.svg
https://commons.wikimedia.org/wiki/File:Bird_1010720_drawing.svg
https://commons.wikimedia.org/wiki/File:Dog_silhouette.svg
https://commons.wikimedia.org/wiki/File:Cat_silhouette_darkgray.svg
https://commons.wikimedia.org/wiki/File:Frog_(example).svg
https://commons.wikimedia.org/wiki/File:PeregrineFalconSilhouettes.svg
https://commons.wikimedia.org/wiki/File:Common_goldfish_silhouette.svg
https://commons.wikimedia.org/wiki/File:Six_weeks_old_cat_(aka).jpg
https://nl.m.wikipedia.org/wiki/Bestand:Kooikerhondje_puppy.jpg
https://nl.m.wikipedia.org/wiki/Bestand:Golden_Retriever_eating_crust_of_pizza.jpg
https://commons.wikimedia.org/wiki/File:Cat-eating-prey.jpg

Where are predicates and names, anyway?
(Figure labels: name, predicate)

Crowdsource task instructions

Why definitions?
● The same words can often be used to denote various categories.
● To properly evaluate the Name-based approach, the human judgments should be about the categories as intended by the Instantiation dataset we use.
● (Would be good practice more generally – e.g., vs. the good subject effect.)
● This may give the Predicate-based approach a disadvantage…
  – but this disadvantage is not an unfair one.

A closer look per ontological domain
Predicate-based:

Representing a concept by the distribution of names of its instances
Matthijs Westera, Gemma Boleda and Sebastian Padó; Abhijeet Gupta & Matthijs …
