Exploration of a Threshold for Similarity based on Uncertainty in Word Embedding
Navid Rekabsaz, Mihai Lupu, Allan Hanbury
@NRekabsaz | rekabsaz@ifs.tuwien.ac.at
European Conference on Information Retrieval (ECIR), Aberdeen, April 2017
Word Embedding
Nearest neighbors and their cosine similarities:
• journalist: reporter (0.78), freelance_journalist (0.74), investigative_journalist (0.74), photojournalist (0.73), correspondent (0.71), investigative_reporter (0.68), writer (0.64), freelance_reporter (0.63), newsman (0.61)
• dwarfish: corpulent (0.44), hideous (0.43), unintelligent (0.42), wizened (0.42), catoblepas (0.42), creature (0.42), humanoid (0.41), grotesquely (0.41), tomtar (0.41)
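As a minimal illustration of where such neighbor lists come from, the sketch below queries the nearest neighbors of a term from a pre-trained embedding model; the use of gensim and the model file name are assumptions, not part of the slides.

```python
# Minimal sketch of retrieving nearest neighbors by cosine similarity.
# "embeddings.bin" is a placeholder for any word2vec-format embedding file.
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

# Top cosine-similarity neighbors, e.g. for "journalist"
for term, sim in model.most_similar("journalist", topn=10):
    print(f"{term}\t{sim:.2f}")
```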
Uncertainty
• Similarity values of neural word embedding models are uncertain: retraining with the same configuration yields different values.
Similarity Probability Distribution
• Similarity between two terms is treated as a probability distribution rather than a single value.
• A normal distribution is fitted on the similarities observed across 5 'identical' models (same configuration, different random initialization); see the sketch below.
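A minimal sketch of estimating that distribution, assuming five word2vec models trained with identical settings but different random seeds; the file names are placeholders.

```python
# Sketch: estimate the similarity distribution between two terms from
# several models trained with identical settings but different random
# initializations. File names are placeholders.
import numpy as np
from gensim.models import KeyedVectors

model_paths = ["run1.bin", "run2.bin", "run3.bin", "run4.bin", "run5.bin"]
models = [KeyedVectors.load_word2vec_format(p, binary=True) for p in model_paths]

def similarity_distribution(w1, w2):
    """Mean and standard deviation of cosine similarity across the runs."""
    sims = np.array([m.similarity(w1, w2) for m in models])
    return sims.mean(), sims.std()

mu, sigma = similarity_distribution("journalist", "reporter")
print(f"similarity(journalist, reporter) ~ N({mu:.3f}, {sigma:.3f}^2)")
```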
Cumulative Similarity Distributions
[Figure: cumulative similarity distributions. Y axis: expected number of neighbors at or above a similarity value, averaged over 100 terms.]
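A sketch of how such a cumulative curve can be computed; the term sample and the candidate pool size are illustrative choices, not taken from the slides.

```python
# Sketch: average number of neighbors per term with similarity >= t, for a
# range of thresholds t. The slides average over 100 terms; the small
# sample and candidate pool size (topn) here are illustrative.
import numpy as np
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("run1.bin", binary=True)  # placeholder path
sample_terms = ["journalist", "reporter", "writer"]  # in practice, e.g. 100 random vocabulary terms

def expected_neighbors(model, terms, thresholds, topn=1000):
    """Cumulative curve: mean number of neighbors with similarity >= each threshold."""
    counts = np.zeros(len(thresholds))
    for term in terms:
        sims = np.array([s for _, s in model.most_similar(term, topn=topn)])
        counts += np.array([(sims >= t).sum() for t in thresholds])
    return counts / len(terms)

thresholds = np.linspace(0.3, 1.0, 71)
curve = expected_neighbors(model, sample_terms, thresholds)
```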
Filtering Neighbors
• What is the best threshold for filtering the related terms?
• Hypothesis: it can be estimated from the average number of synonyms per term.
• What is the expected number of synonyms for a word in English?
  • Number of terms: 147,306
  • Average number of synonyms per term: 1.6
  • Standard deviation: 3.1
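The slide does not name the synonym resource; the hedged sketch below assumes WordNet accessed via NLTK, so the resulting counts may differ somewhat from the figures above.

```python
# Hedged sketch: average number of synonyms per term, assuming the counts
# come from WordNet (the slide does not name the resource), via NLTK.
# Requires nltk.download('wordnet'). Results may differ slightly from the
# slide's 147,306 / 1.6 / 3.1.
import numpy as np
from nltk.corpus import wordnet as wn

counts = []
for lemma_name in wn.all_lemma_names():
    # Synonyms of a lemma: all other lemma names sharing one of its synsets.
    synonyms = {l.name() for s in wn.synsets(lemma_name) for l in s.lemmas()}
    synonyms.discard(lemma_name)
    counts.append(len(synonyms))

counts = np.array(counts)
print(f"terms: {len(counts)}  mean synonyms: {counts.mean():.1f}  std: {counts.std():.1f}")
```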
Threshold
• Proposed threshold: the similarity value at which the cumulative expected number of neighbors equals 1.6 (the average number of synonyms per term); see the sketch below.
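Continuing the cumulative-curve sketch above, the proposed threshold can be read off where the curve crosses 1.6; the linear scan below is just one simple way to do that.

```python
# Sketch: read the proposed threshold off the cumulative curve from the
# earlier expected_neighbors() sketch -- the smallest similarity at which
# the average neighbor count drops to 1.6 or below.
def proposed_threshold(thresholds, curve, expected_synonyms=1.6):
    for t, c in zip(thresholds, curve):
        if c <= expected_synonyms:
            return t
    return thresholds[-1]

threshold = proposed_threshold(thresholds, curve)
```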
Integrating Similarity in IR Models
• Generalizing Translation Models in the Probabilistic Relevance Framework (Rekabsaz et al., CIKM 2016)
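As a rough illustration of this kind of integration (not the exact formulation of the CIKM 2016 paper), the sketch below adds similarity-weighted occurrences of the thresholded related terms to a query term's frequency before applying a standard BM25 weight.

```python
# Hedged sketch of one way to inject term similarities into BM25: the raw
# term frequency is extended with similarity-weighted occurrences of the
# related terms selected by the threshold. This shows the general idea
# only; the exact formulation in the CIKM 2016 paper may differ.

def extended_tf(term, doc_tf, related):
    """doc_tf: {term: raw tf in doc}; related: {related_term: sim(term, related_term)}."""
    return doc_tf.get(term, 0) + sum(
        sim * doc_tf.get(t, 0) for t, sim in related.items()
    )

def bm25_term_weight(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    """Standard BM25 term weight, here applied to the extended term frequency."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)
```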
Experiment Results
• Gain in MAP over standard BM25, averaged over the collections.
• The optimal threshold is either identical to the proposed threshold or lies within its confidence interval.
Take Home Message
WE OBSERVED
• Uncertainty in the similarity values of neural word embedding models:
  • depends on the similarity range
  • depends on the dimensionality
WE PROPOSE
• A threshold to filter the most similar terms:
  • the proposed threshold performs as well as the optimal threshold
Come for a chat! @NRekabsaz rekabsaz@ifs.tuwien.ac.at
Threshold vs. TopN
• Conclusion 2: threshold-based filtering outperforms TopN filtering.
[Figure: comparison of threshold-based vs. TopN filtering.]
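For completeness, a small sketch contrasting the two selection strategies; the gensim model, candidate pool size, and default values are assumptions.

```python
# Sketch contrasting the two ways of selecting related terms: a fixed TopN
# cut versus the similarity threshold derived above.
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)  # placeholder path

def related_topn(model, term, n=5):
    """Keep the n most similar terms, regardless of how similar they are."""
    return dict(model.most_similar(term, topn=n))

def related_by_threshold(model, term, threshold, candidates=1000):
    """Keep every candidate whose similarity reaches the threshold."""
    return {t: s for t, s in model.most_similar(term, topn=candidates) if s >= threshold}
```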