
Ranking Segmented Documents Using Data Fusion
by Hamed Rezanejad

Outline
• Description of the problem
• Motivation/Importance
• Methodology
• Experimental results
• Demo
• Conclusion/future work

Description
[Figure: a query is run against a text collection; a ranking function orders the documents and returns a ranked result list: 1. Document 1, 2. Document 2, …, N. Document N.]
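The slide leaves the ranking function abstract. As one concrete possibility, here is a minimal sketch of query-likelihood ranking with Dirichlet smoothing, a standard statistical language-modeling approach. It is an illustrative stand-in, not the function used in the talk; the name `rank_documents`, the toy collection, and the value of `mu` are assumptions.

```python
import math
from collections import Counter


def rank_documents(query, collection, mu=2000.0):
    """Rank documents by Dirichlet-smoothed query likelihood.

    query: list of query terms.
    collection: dict mapping a document id to its list of tokens.
    mu: Dirichlet smoothing parameter (an illustrative value).
    Returns (doc_id, log_score) pairs, most relevant first.
    """
    # Background (collection) language model statistics.
    coll_counts = Counter()
    for tokens in collection.values():
        coll_counts.update(tokens)
    coll_len = sum(coll_counts.values())

    results = []
    for doc_id, tokens in collection.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query:
            p_coll = coll_counts[term] / coll_len if coll_len else 0.0
            if p_coll == 0.0:
                continue  # term never occurs in the collection: ignore it
            # log P(term | document) with Dirichlet smoothing
            score += math.log((tf[term] + mu * p_coll) / (len(tokens) + mu))
        results.append((doc_id, score))
    # Higher log-likelihood means more relevant; ties broken by doc id.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


if __name__ == "__main__":
    docs = {
        "d1": "statistical language models for retrieval".split(),
        "d2": "cooking recipes for pasta".split(),
        "d3": "language models and language modeling".split(),
    }
    for doc_id, score in rank_documents(["language", "models"], docs):
        print(doc_id, round(score, 3))
```

With this toy collection the ranked list is d3, d1, d2: d3 mentions "language" twice, d1 once, and d2 not at all.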

Description
• The order of the retrieved documents is very important.
• Documents generally differ in size from one another.
• Each document has different segments that discuss different issues.
• Using these segments can help us produce a better ordering of the retrieved documents.

Motivation/Importance
• Passage Retrieval
  – The unit of retrieval is a block of text (a passage) from a stored document.
  – Current IR systems are used to index a great variety of documents.
  – For large documents, standard whole-document ranking is of little value.
  – Tracking topics in information feeds is a case that standard ranking does not address.
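To make "blocks of text" concrete, a common way to obtain passages is to slide a fixed-size window over the words of a document. The sketch below does that; the function name `split_into_passages` and the window `size` and `overlap` values are hypothetical choices, since the slides do not say how passages were formed.

```python
def split_into_passages(text, size=50, overlap=25):
    """Split a document into fixed-size, overlapping windows of words.

    size and overlap are illustrative choices, not values from the talk.
    Returns (begin, end, passage_text) triples; offsets count words and
    end is exclusive.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    passages = []
    begin = 0
    while True:
        end = min(begin + size, len(words))
        passages.append((begin, end, " ".join(words[begin:end])))
        if end == len(words):
            break  # the last window reached the end of the document
        begin += step
    return passages
```

For a 120-word document and the defaults, this yields windows covering words 0–50, 25–75, 50–100 and 75–120. Overlapping windows keep a relevant stretch of text from being cut in half at a passage boundary.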

Motivation/Importance
• Data Fusion
  – Accepts two or more ranked lists and merges them into a single ranked list.
  – Aims of data fusion:
    1. Providing better effectiveness than every individual system used in the fusion.
    2. Grouping existing search services under one umbrella.
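The slides do not name a particular fusion method. Reciprocal Rank Fusion (RRF) is one well-known way to merge ranked lists, and the sketch below shows it only as an example of the idea; the function name and the constant `k=60` (the value commonly used with RRF) are assumptions here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists into one with Reciprocal Rank Fusion.

    Each item receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Items missing from a list get nothing from it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

For example, fusing `["a", "b", "c"]`, `["b", "a", "d"]` and `["b", "c", "a"]` puts "b" first, because two of the three systems rank it at the top. RRF uses only ranks, so it needs no score normalization across systems whose scores are on different scales.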

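Methodology
[Figure: for a query, a document is split into passages (Passage 1 … Passage M). The relevance of each passage is measured with several different IR systems (IRS 1 … IRS n), producing results R(1,1) … R(n,M), with R(i, j) the result of IRS i for passage j. Data fusion combines these results into a rank score for the document.]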
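The pipeline in the figure can be sketched as follows: each IR system ranks the passages, the rankings are fused into one passage ranking, and each document collects the fused ranks of its own passages. The talk does not say which fusion method it uses; a Borda count is swapped in here as a simple rank-based stand-in, and the function name and `(doc_id, passage_no)` representation are hypothetical.

```python
from collections import defaultdict


def fuse_passage_rankings(irs_rankings):
    """Fuse passage rankings from several IR systems with a Borda count.

    irs_rankings: one ranked list per IR system; each entry is a
    (doc_id, passage_no) pair, best first.
    Returns {doc_id: ascending fused ranks of that document's passages}.
    """
    points = defaultdict(float)
    for ranking in irs_rankings:
        n = len(ranking)
        for position, passage in enumerate(ranking):
            points[passage] += n - position  # top passage earns n points
    # Single fused ranking of all passages: most points first.
    fused = sorted(points, key=lambda p: (-points[p], p))
    ranks_by_doc = defaultdict(list)
    for rank, (doc_id, _passage_no) in enumerate(fused, start=1):
        ranks_by_doc[doc_id].append(rank)
    return dict(ranks_by_doc)
```

The per-document rank lists it returns have the same shape as the "Ranks of passages" column in the example table on the next slide.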

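Methodology

Document   # Passages   Ranks of passages   Final rank
1          2            1, 3                1.58
2          3            2, 6, 7             4.033
3          2            9, 10               6.49
4          4            4, 5, 8, 11         5.39

$$\text{Final Rank} = \frac{\sum \log(\text{rank})}{\log(\#\text{passages})}$$

where the sum runs over the ranks of the document's passages in the single ranking of all passages.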
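The formula is easy to check against the table. The short sketch below implements it as written on the slide (the function name `final_rank` is hypothetical) and recomputes the four example documents. Because log(1) = 0, the formula is undefined for a document with a single passage, so the sketch rejects that case rather than guessing how the author handled it.

```python
import math


def final_rank(passage_ranks):
    """Final Rank = sum(log(rank)) / log(#passages), as on the slide.

    passage_ranks: ranks (starting at 1) of one document's passages in
    the ranking of all passages. In the example, smaller values go to
    documents whose passages sit nearer the top.
    """
    n = len(passage_ranks)
    if n < 2:
        raise ValueError("formula needs at least two passages (log 1 = 0)")
    return sum(math.log(r) for r in passage_ranks) / math.log(n)


# Passage ranks from the example table.
table = {1: [1, 3], 2: [2, 6, 7], 3: [9, 10], 4: [4, 5, 8, 11]}
for doc, ranks in table.items():
    print(doc, round(final_rank(ranks), 3))
# prints: 1 1.585 / 2 4.033 / 3 6.492 / 4 5.391
```

The results match the table up to rounding, and sorting by the final rank orders the documents 1, 2, 4, 3. The log base does not matter, since it cancels in the ratio.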

Experimental Results
• I used Indri from the Lemur Project.
• The project's first product was the Lemur Toolkit, a collection of software tools and search engines designed to support research on using statistical language models for information retrieval tasks.
• The project later added the Indri search engine for large-scale search.
• I used TREC Vol. 4 as the dataset.

Experimental Results
• Indri provides the QueryEnvironment and IndexEnvironment classes, which can be used from C++, Java, C# or PHP.
• QueryEnvironment lets you run queries and retrieve a ranked list of results.
• IndexEnvironment understands many different file types:
  – TREC-formatted documents, HTML documents, text documents, PDF files, …
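For readers unfamiliar with the TREC format: a TREC-formatted file holds many documents, each wrapped in `<DOC>` … `</DOC>` and identified by a `<DOCNO>` element, with the body usually in `<TEXT>`. The minimal reader below only illustrates that layout; it is not how Indri's IndexEnvironment parses files, and the function name and sample document ids are hypothetical.

```python
import re

# <DOC> does not match <DOCNO>, because the tag name must end right at ">".
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def read_trec_documents(raw):
    """Yield (docno, text) pairs from TREC-formatted input."""
    for match in DOC_RE.finditer(raw):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        if docno is None:
            continue  # skip malformed documents that lack an identifier
        text = TEXT_RE.search(body)
        yield docno.group(1), (text.group(1).strip() if text else "")


if __name__ == "__main__":
    sample = (
        "<DOC>\n<DOCNO> DOC-0001 </DOCNO>\n"
        "<TEXT>\nData fusion merges ranked lists.\n</TEXT>\n</DOC>\n"
    )
    print(list(read_trec_documents(sample)))
```

Real TREC collections carry extra fields (headlines, dates, and so on) that a production indexer would also handle.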

Demo & Future Works
1. Treat each section extent as a “document”.
2. Score each “document” according to the query.
3. Return a ranked list of extents.

Example document (each section's score is shown after it):
<document>
<section><head>Introduction</head> Statistical language modeling allows formal methods to be applied to information retrieval. ... </section>   0.15
<section><head>Multinomial Model</head> Here we provide a quick review of multinomial language models. ... </section>   0.50
<section><head>Multiple-Bernoulli Model</head> We now examine two formal methods for statistically modeling documents and queries based on the multiple-Bernoulli distribution. ... </section>   0.05
…
</document>

Ranked list of extents:
SCORE   DOCID    BEGIN   END
0.50    IR-352   51      205
0.35    IR-352   405     548
0.15    IR-352   0       50
…       …        …       …
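The three steps can be sketched in a few lines: extract each `<section>` as an extent with begin/end offsets, score it as if it were a standalone document, and return rows shaped like the SCORE / DOCID / BEGIN / END list. This is a minimal sketch under stated assumptions: offsets here are word counts across sections (not Indri's internal offsets), and the scoring function (share of a section's words that are query terms) is a deliberately simple placeholder for a real retrieval model. All names are hypothetical.

```python
import re

SECTION_RE = re.compile(r"<section>(.*?)</section>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[\w-]+")


def section_extents(xml_text):
    """Return (begin, end, words) for each <section>, in document order.

    Offsets count words across all sections (end exclusive); this
    numbering scheme is an assumption made for the sketch.
    """
    extents, offset = [], 0
    for match in SECTION_RE.finditer(xml_text):
        plain = TAG_RE.sub(" ", match.group(1))  # drop <head> and other tags
        words = WORD_RE.findall(plain.lower())
        extents.append((offset, offset + len(words), words))
        offset += len(words)
    return extents


def rank_extents(doc_id, xml_text, query):
    """Score every section extent as if it were a document, best first.

    Returns (score, doc_id, begin, end) rows; the score is a placeholder
    (fraction of the section's words that are query terms).
    """
    terms = {term.lower() for term in query}
    rows = []
    for begin, end, words in section_extents(xml_text):
        hits = sum(1 for word in words if word in terms)
        score = hits / len(words) if words else 0.0
        rows.append((score, doc_id, begin, end))
    rows.sort(key=lambda row: (-row[0], row[2]))
    return rows
```

Run on the three example sections with the query "multinomial language models", the Multinomial Model section comes out on top, as it does in the slide's ranked list; the absolute scores and offsets differ because the scoring function is only a placeholder.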
