
Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML

Toshiyuki Shimizu (Kyoto University)
Norimasa Terada (Nagoya University)
Masatoshi Yoshikawa (Kyoto University)

ICADL 2006, 29th November

Outline
• Background
• Summary of our Contribution
• Kikori-KS
  • User Interfaces
  • Implementation of Keyword Search on Relational Databases
  • Ranking Model
• Experiments
• Conclusions, Future Works

Background (1/2)
• Large number of documents in digital libraries are now structured in XML
• Growing demand for XML Information Retrieval (XML-IR) Systems
• We can identify meaningful document fragments by encoding documents in XML
  • ex) Sections, subsections and paragraphs in scholarly articles
  • Browsing only document fragments relevant to a certain topic
• Keyword search on XML documents
  • Simple, intuitively understandable, yet useful form of queries, especially for unskilled end-users
  • We do not need to understand XML query languages and XML schema

Background (2/2)
• For the keyword "database"
[Figure: XML tree of an example article, distinguishing element nodes from text values. Element nodes: article, with children transaction, title, and body; body has two section children, each with a title and a p. Text values: "database" (transaction), "XML Index" (title), "Introduction" and "…XML database" (first section), "XML Labeling" and "Query processing…" (second section).]

Summary of our Contribution
• We have developed Kikori-KS
  • A prototype system for XML-IR
  • Under Kikori Project
  • Accepts Keyword Set as a query
• User-friendly interface
  • FetchHighlight interface
• Storage schema on RDB
  • The database schema is carefully designed
  • Acceptable search time

Overview of Kikori-KS
[Figure: system architecture with the components End User, User Interface Module, SQL Translation Module, Storage Module, RDB, and XML Documents; labeled flows: Set of Keywords, Ranked relevant elements, Search Results.]

User Interfaces
• Search results of XML-IR are document fragments, which may be nested
• INEX 2005 project* defined three strategies for element retrieval
• Three strategies of INEX are not necessarily intended to be used in designing user interfaces
[Figure: four result-list layouts side by side: Thorough, Focussed (Ei does not overlap with Ej), FetchBrowse, FetchHighlight.]
* http://inex.is.informatik.uni-duisburg.de/2005/

Retrieval Strategy of INEX (1/3)
• Thorough
  • Relevant elements are retrieved in descending order of their scores
Result list for the example:
  E1-3 (0.6)
  E1-8 (0.5)
  E1-10 (0.4)
  E2-10 (0.4)
  :

Retrieval Strategy of INEX (2/3)
• Focussed
  • The system retrieves only focussed elements (i.e. non-overlapping elements)
  • Ranked in relevance order
Result list for the example:
  E1-3 (0.6)
  E1-8 (0.5)
  E2-10 (0.4)
  :

[Figure: example trees for documents D1 and D2, annotated with element scores.]
  Element scores in D1: E1-1 0.2, E1-2 0, E1-3 0.6, E1-4 0.1, E1-5 0, E1-6 0, E1-7 0, E1-8 0.5, E1-9 0.3, E1-10 0.4
  Element scores in D2: E2-1 0.1, E2-2 0, E2-3 0, E2-4 0.2, E2-5 0, E2-6 0, E2-7 0, E2-8 0.3, E2-9 0, E2-10 0.4
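
The difference between the Thorough and Focussed result lists can be sketched in a few lines of Python. This is only an illustration, not the INEX definition or the Kikori-KS implementation: the element scores are the ones from the example, while the tree shape in `PARENT` is an assumption (it mirrors the ten-element article tree used elsewhere in the slides), and the greedy "highest score first, skip anything that overlaps" rule is one simple way to obtain non-overlapping elements.

```python
# Assumed tree shape (hypothetical): child elemID -> parent elemID,
# taken to be the same for both example documents.
PARENT = {1: None, 2: 1, 3: 1, 4: 1, 5: 4, 6: 5, 7: 5, 8: 4, 9: 8, 10: 8}

# Element scores from the example: (docID, elemID) -> score.
SCORES = {
    (1, 1): 0.2, (1, 2): 0.0, (1, 3): 0.6, (1, 4): 0.1, (1, 5): 0.0,
    (1, 6): 0.0, (1, 7): 0.0, (1, 8): 0.5, (1, 9): 0.3, (1, 10): 0.4,
    (2, 1): 0.1, (2, 2): 0.0, (2, 3): 0.0, (2, 4): 0.2, (2, 5): 0.0,
    (2, 6): 0.0, (2, 7): 0.0, (2, 8): 0.3, (2, 9): 0.0, (2, 10): 0.4,
}


def ancestors(elem):
    """Return the set of ancestor elemIDs of elem in the assumed tree."""
    found = set()
    parent = PARENT[elem]
    while parent is not None:
        found.add(parent)
        parent = PARENT[parent]
    return found


def overlaps(a, b):
    """Two elements overlap if they are the same element or one contains the other."""
    if a[0] != b[0]:  # different documents never overlap
        return False
    return a[1] == b[1] or a[1] in ancestors(b[1]) or b[1] in ancestors(a[1])


def thorough(scores):
    """All elements with a positive score, in descending score order."""
    hits = [(key, s) for key, s in scores.items() if s > 0]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def focussed(scores):
    """Greedy selection: walk the Thorough list, keep only non-overlapping elements."""
    chosen = []
    for key, s in thorough(scores):
        if not any(overlaps(key, kept) for kept, _ in chosen):
            chosen.append((key, s))
    return chosen
```

Under these assumptions, `thorough(SCORES)` starts with E1-3, E1-8, E1-10, E2-10, and `focussed(SCORES)` keeps E1-3, E1-8, E2-10, because E1-10 lies inside E1-8 in the assumed tree.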

Retrieval Strategy of INEX (3/3)
• FetchBrowse
  • Fetching Phase
    • The system first identifies relevant documents and ranks them in relevance order
  • Browsing Phase
    • Within a fetched document, the system identifies relevant elements and ranks them in relevance order
Result list for the example:
  D1 (0.2)
    E1-3 (0.6)
    E1-8 (0.5)
    E1-10 (0.4)
    :
  D2 (0.1)
    E2-10 (0.4)
    :
[Figure: example trees for D1 (document score 0.2) and D2 (document score 0.1), annotated with element scores.]

FetchHighlight
• Displaying search result elements aggregated by XML documents is effective
  • FetchBrowse is of that style
• Displaying search result elements in their document order is useful
• FetchHighlight
  • XML documents are first sorted in their relevance order
  • Relevant elements within the XML document are displayed in document order
  • Elements are indented in accordance with their depth in the XML tree
[Figure: FetchHighlight result layout. Under D1: E11, then E111 and E112 indented below it, then E12; under D2: E21.]

FetchHighlight Interface
[Screenshot of the result list, with these annotations:]
• Results are aggregated by document
• Elements are listed in document order
• Outline elements are displayed
• Elements with a high score are displayed in a larger font

Browsing Document Fragment
[Screenshot of a document view, with these annotations:]
• The selected document fragment is highlighted
• Search words are highlighted

Features of the FetchHighlight Interface
• Focussed elements are easily identified
• Users can also recognize the parts of a document where many highly relevant elements are clustered
• Outline elements
  • Displayed even if the score is 0
  • Elements carrying particular structural information
    • ex) sections and subsections
  • Useful for browsing
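
The FetchHighlight ordering rules (documents by relevance, elements in document order, indentation by depth, outline elements shown even with score 0) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Kikori-KS code: the tree shape in `PARENT` is hypothetical, elemIDs are assumed to follow document order, section elements 5 and 8 are taken as the outline elements, and every element with a positive score is assumed to be listed.

```python
# Assumed tree shape (hypothetical): child elemID -> parent elemID.
PARENT = {1: None, 2: 1, 3: 1, 4: 1, 5: 4, 6: 5, 7: 5, 8: 4, 9: 8, 10: 8}

# Document and element scores from the example (zero scores omitted).
DOC_SCORES = {1: 0.2, 2: 0.1}
ELEM_SCORES = {
    (1, 1): 0.2, (1, 3): 0.6, (1, 4): 0.1, (1, 8): 0.5, (1, 9): 0.3, (1, 10): 0.4,
    (2, 1): 0.1, (2, 4): 0.2, (2, 8): 0.3, (2, 10): 0.4,
}

# Assumed outline elements: the two section elements of each document.
OUTLINE = {5, 8}


def depth(elem):
    """Number of ancestors of elem in the assumed tree (root has depth 0)."""
    d = 0
    while PARENT[elem] is not None:
        elem = PARENT[elem]
        d += 1
    return d


def fetch_highlight(doc_scores, elem_scores, outline):
    """Return [(docID, [(elemID, depth, score), ...]), ...] in display order."""
    result = []
    # Documents are sorted by relevance, best first.
    for doc in sorted(doc_scores, key=doc_scores.get, reverse=True):
        relevant = {e for (d, e), s in elem_scores.items() if d == doc and s > 0}
        # Outline elements are shown even when their score is 0;
        # sorting by elemID stands in for document order here.
        shown = sorted(relevant | outline)
        rows = [(e, depth(e), elem_scores.get((doc, e), 0.0)) for e in shown]
        result.append((doc, rows))
    return result


def render(result):
    """Plain-text rendering: one line per element, indented by tree depth."""
    lines = []
    for doc, rows in result:
        lines.append(f"D{doc}")
        for elem, dep, score in rows:
            lines.append("  " * (dep + 1) + f"E{doc}-{elem} ({score})")
    return "\n".join(lines)
```

Calling `print(render(fetch_highlight(DOC_SCORES, ELEM_SCORES, OUTLINE)))` lists D1 before D2 and, within each document, walks the elements top to bottom. A real interface would map the score to a font size rather than printing it.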

Storing XML documents into RDB
• A huge number of document fragments have to be handled efficiently
  • ex) There are 16,080,830 document fragments (elements) against 16,819 documents in the INEX 1.9 collection used in our experiments
• Storage schema based on XRel
  • Independent of the logical structure of XML documents
• Conceptual Database Design
  • Element (docID, elemID, pathID, start, end, label)
  • Path (pathID, pathexp)
  • Term (term, docID, elemID, tfipf)
* label: short text representing the element
* tfipf: term weight in the element

Conceptual Database Design

Path
  pathID  pathexp
  1       /article
  2       /article/transaction
  3       /article/title
  :       :

Element
  docID  elemID  pathID  start  end  label
  1      1       1       1      236  XML Index
  1      2       2       10     44   database
  1      3       3       45     68   XML Index
  :      :       :       :      :    :

Term
  term      docID  elemID  tfipf
  database  1      1       0.3
  database  1      2       0.1
  :         :      :       :
  XML       1      1       0.3
  XML       1      3       0.4
  :         :      :       :

[Figure: the example article tree labeled with elemIDs. 1 article; 2 transaction ("database"), 3 title ("XML Index"), 4 body; 5 and 8 section; 6 title ("Introduction"), 7 p ("We explain XML database"); 9 title ("XML Labeling"), 10 p ("Query processing").]

Schema Refinement (1/4)
• Materialized view
  • Join Element table, Path table, and Term table
• Partitioning the Term table by term
  • Term_xyz (docID, elemID, tfipf, start, end, label, pathexp)
• Selecting outline elements and constructing an Outline table in advance
  • The system designer has to predefine the outline elements
  • Outline (docID, elemID, start, end, label, pathexp)

Schema Refinement (2/4)
• Materialized view
  • Join Element table, Path table, and Term table

Join of Term, Element, and Path
  term      docID  elemID  tfipf  start  end  label      pathexp
  database  1      1       0.3    1      236  XML Index  /article
  database  1      2       0.1    10     44   database   /article/transaction
  :         :      :       :      :      :    :          :
  XML       1      1       0.3    1      236  XML Index  /article
  XML       1      3       0.4    45     68   XML Index  /article/title
  :         :      :       :      :      :    :          :

Schema Refinement (3/4)
• Partitioning the table by terms
  • Term_xyz (docID, elemID, tfipf, start, end, label, pathexp)

Term_database
  docID  elemID  tfipf  start  end  label      pathexp
  1      1       0.3    1      236  XML Index  /article
  1      2       0.1    10     44   database   /article/transaction
  :      :       :      :      :    :          :

Term_XML
  docID  elemID  tfipf  start  end  label      pathexp
  1      1       0.3    1      236  XML Index  /article
  1      3       0.4    45     68   XML Index  /article/title
  :      :       :      :      :    :          :

Schema Refinement (4/4)
• Selecting outline elements and constructing an Outline table in advance
  • The system designer predefines the outline elements
  • Outline (docID, elemID, start, end, label, pathexp)

Outline
  docID  elemID  start  end  label         pathexp
  1      5       75     143  Introduction  /article/body/section
  1      8       144    219  XML Labeling  /article/body/section

[Figure: the example article tree again, with the two section elements (elemID 5 and 8) marked as outline elements.]
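
The schema refinement steps can be tried out on the example rows with an in-memory SQLite database. The sketch below is an illustration under stated assumptions, not the system's actual DDL: SQLite has no materialized views, so the joined rows are simply copied into per-term tables; the pathID 4 for /article/body/section and the helper names `build_db`, `partition_by_term`, and `build_outline` are hypothetical. Note that `end` is an SQL keyword and must be quoted.

```python
import sqlite3

TERM_COLS = ('docID INTEGER, elemID INTEGER, tfipf REAL, start INTEGER, '
             '"end" INTEGER, label TEXT, pathexp TEXT')


def build_db():
    """Create the three conceptual tables and load the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Path (pathID INTEGER PRIMARY KEY, pathexp TEXT);
        CREATE TABLE Element (docID INTEGER, elemID INTEGER, pathID INTEGER,
                              start INTEGER, "end" INTEGER, label TEXT);
        CREATE TABLE Term (term TEXT, docID INTEGER, elemID INTEGER, tfipf REAL);
    """)
    conn.executemany("INSERT INTO Path VALUES (?, ?)", [
        (1, "/article"), (2, "/article/transaction"), (3, "/article/title"),
        (4, "/article/body/section"),  # pathID 4 is an assumed value
    ])
    conn.executemany("INSERT INTO Element VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 236, "XML Index"), (1, 2, 2, 10, 44, "database"),
        (1, 3, 3, 45, 68, "XML Index"), (1, 5, 4, 75, 143, "Introduction"),
        (1, 8, 4, 144, 219, "XML Labeling"),
    ])
    conn.executemany("INSERT INTO Term VALUES (?, ?, ?, ?)", [
        ("database", 1, 1, 0.3), ("database", 1, 2, 0.1),
        ("XML", 1, 1, 0.3), ("XML", 1, 3, 0.4),
    ])
    return conn


def partition_by_term(conn):
    """Join Term, Element, and Path, and store the rows in one table per term."""
    terms = [row[0] for row in
             conn.execute("SELECT DISTINCT term FROM Term ORDER BY term")]
    for term in terms:
        if not term.isalnum():  # the term becomes part of a table name
            raise ValueError(f"unsupported term: {term!r}")
        table = f"Term_{term}"
        conn.execute(f'CREATE TABLE "{table}" ({TERM_COLS})')
        conn.execute(f"""
            INSERT INTO "{table}"
            SELECT t.docID, t.elemID, t.tfipf, e.start, e."end", e.label, p.pathexp
            FROM Term t
            JOIN Element e ON e.docID = t.docID AND e.elemID = t.elemID
            JOIN Path p ON p.pathID = e.pathID
            WHERE t.term = ?
            ORDER BY t.docID, t.elemID""", (term,))
    return terms


def build_outline(conn, outline_paths):
    """Precompute the Outline table from the designer-chosen path expressions."""
    conn.execute('CREATE TABLE Outline (docID INTEGER, elemID INTEGER, '
                 'start INTEGER, "end" INTEGER, label TEXT, pathexp TEXT)')
    marks = ",".join("?" * len(outline_paths))
    conn.execute(f"""
        INSERT INTO Outline
        SELECT e.docID, e.elemID, e.start, e."end", e.label, p.pathexp
        FROM Element e JOIN Path p ON p.pathID = e.pathID
        WHERE p.pathexp IN ({marks})
        ORDER BY e.docID, e.elemID""", list(outline_paths))
```

With this layout, looking up one keyword touches only its own table, for example `SELECT * FROM "Term_database"`, with no join at query time. How Kikori-KS combines several keywords and ranks the results is not covered by this sketch.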
