Towards Structure-Aware Information Retrieval. Petr Sojka et al., Masaryk University, Faculty of Informatics, Brno, Czech Republic <https://mir.fi.muni.cz/>. Informatics Colloquium, Faculty of Informatics, MU, Brno, Czech Republic, October 25th, 2014. Illustrations by Jiří Franek.
Talk topics and take-home message
[Mind map: Future of Information Retrieval. Node labels: Structured Search; Semantics; Semantics search; Formulae; Representation; Distributional; Searching; Evaluation; gensim; MSC; Similarity; Entailment. Names in parentheses: (Martin), (Michal), (Partha), (???).]
Petr Sojka, Informatics Colloquium, Faculty of Informatics, Brno, CZ, October 25th, 2014: Towards Structure-Aware Information Retrieval
Outline
1 Motivation
2 Searching: MIaS
3 MIaS at NTCIR
4 Similarity
5 Entailment
6 Summary and future work
Dependency on Information Retrieval: Information Society Now!
Scholarly STEM Communication via Digital Math Libraries
[Figure: publication workflow with the stages Markup, Design, Typesetting, Proofreading, Preprint, Print, Distribution; the formula $E = mc^2$ appears at several stages.]
History of information retrieval: gradual speedup of changes
Search: A gate to knowledge
Querying and searching for similar structures is becoming more and more important.
Structures: math formulae, syntactic or sentence dependency trees, compositional named entity terms, knowledge base terms.
<http://google.cz/search?q=Kovacik+Rakosnik>
$L^{p(x)}$ https://www.google.cz/search?q=”L^{p(x)}” + without quotes or figures :-).
Workshop series Towards a Digital Mathematics Library founded to tackle numerous challenges identified during the DML-CZ project.
Nature 454, 263 (2008) | doi:10.1038/454263b
Starting small but adding up: a free maths archive
A small group of researchers is meeting in Birmingham, UK, later this month to plan a free digital library of mathematics.
All the mathematical literature ever published runs to more than 50 million pages, with around 75,000 articles added each year. Over the past decade there have been several attempts to make this prodigious body of work accessible in a single digital archive, but so far none has succeeded.
A group of mathematicians intends to change this. They have started small, with a handful of digitization projects in Poland, Russia, Serbia and the Czech Republic. In a few years they hope to unite these repositories with their western European counterparts in an archive to be hosted by the European Union, according to the organizer, Petr Sojka, an informatics scientist at Masaryk University in Brno in the Czech Republic. Eventually this pan-European archive could be expanded globally, he says.
To make such an archive easier to search, researchers have found ways to guess the subject of a paper on the basis of the frequency of symbols in it. But there will be many more-practical challenges, such as finding the funds to scan millions of old papers and striking deals with publishers who hold rights to them.
It may already be too late to build a single free mathematical archive, according to John Ewing, head of the American Mathematical Society, which maintains a list of more than 1,500 journals whose archives have already been digitized. “A few years ago, this model had the potential to change the mathematics journal literature in profound ways,” he says. But most publishers have rushed to scan their own archives in order to lock them up and sell them to libraries.
“While the effort to digitize the smaller collections is admirable, and it’s certainly worthwhile, it’s unlikely to effect a larger change,” says Ewing.
Jascha Hoffman
DML workshop series archived in DML-CZ
EuDML <http://eudml.org>
Aggregation of data from building bricks of regional repositories:
• 14 data and technology providers plus associated partners such as ZMath, Göttingen library,…
• DML content providers serve mostly publishers' or regional, more or less established DML repositories: DML-CZ (The Czech Digital Mathematics Library), NUMDAM, DML-PL, DML-PT, DML-GR, DML-BG, DML-ES,…
• Aggregation via the standard OAI-PMH protocol (OAI servers run by data providers).
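To make the aggregation step concrete, here is a minimal Python sketch of how a harvester might page through an OAI-PMH `ListRecords` response. The verb, the `metadataPrefix` and `resumptionToken` parameters, and the XML namespace come from the OAI-PMH 2.0 standard; the endpoint `https://example.org/oai`, the record identifiers, and the function names are made up for illustration and do not describe the actual EuDML harvester.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 standard.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    token_text = token.text if token is not None and token.text else None
    return ids, token_text


# Hypothetical response from a data provider (identifiers are invented).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A real harvester would fetch `next_url`, parse again, and stop once no resumption token is returned.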
Math-aware Search and Indexing
• Conventional searching approaches are not applicable to math
• Usage of existing mathematical search engines (MathDex, EgoMath, LaTeXSearch, LeActiveMath, MathWebSearch) is problematic
• New Math Indexer and Searcher (MIaS) developed at MU
MIR systems comparison
[Table; the assignment of cells to rows is not recoverable from the extraction. Rows: MathDex, LeActiveMath, LaTeXSearch, MathWebSearch, EgoMath. Columns: Input documents; Internal representation; Used converters; Approach; α-eq.; Query language; Queries; Indexing core. Values appearing in the table: document and representation formats HTML, TeX/LaTeX, Word, PDF, Presentation MathML, Content MathML, OpenMath, OMDoc, with the notes "(text)" and "(substitution trees)"; converters jtidy, blahtex, LaTeXML, Hermes, Infty, Word+MathType, pdf2tiff->Infty; approaches syntactic, semantic, mixed; query languages LaTeX, OpenMath, QMath, LaTeX styles, Mathematica, Maple, Maxima, Yacas, with "(palette editor)" noted for some; query types text, math, mixed, titles, DOI; indexing cores Apache Lucene, Apache Lucene (for text only), EgoThor; α-eq. column: one ✔ and four ✕.]
Math Indexer and Searcher (MIaS): features
• Inspired mostly by MathDex and EgoMath
• Presentation and now also Content MathML
• Allows similarity (not only exact match) between query and matched term, distributional representation of formulae
• Commutativity
• Unification of variables and constants
• Subformulae matching
• Level of similarity calculation for expressions
• Mixed mathematical-textual queries
• Based on the state-of-the-art full-text Apache Lucene core
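The commutativity and unification features can be illustrated with a toy Python sketch. It assumes formulae are given as nested tuples `(operator, operand, ...)` with string leaves; the placeholder names `id1`, `id2`, `const`, the set of commutative operators, and all function names are assumptions made for this example, not the actual MIaS implementation (which works on MathML).

```python
COMMUTATIVE = {"+", "*"}  # operators whose operand order is ignored (assumed set)


def order(expr):
    """Sort operands of commutative operators so a+b and b+a coincide."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    args = [order(a) for a in args]
    if op in COMMUTATIVE:
        args.sort(key=repr)  # any fixed total order will do
    return (op, *args)


def unify_variables(expr, names=None):
    """Rename variables to id1, id2, ... in order of first appearance."""
    names = {} if names is None else names
    if isinstance(expr, str):
        if expr.isalpha():
            return names.setdefault(expr, f"id{len(names) + 1}")
        return expr
    op, *args = expr
    return (op, *[unify_variables(a, names) for a in args])


def unify_constants(expr):
    """Replace every numeric leaf by the placeholder 'const'."""
    if isinstance(expr, str):
        return "const" if expr.isdigit() else expr
    op, *args = expr
    return (op, *[unify_constants(a) for a in args])


def normalize(expr):
    """Ordering, then variable unification, then constant unification."""
    return unify_constants(unify_variables(order(expr)))


# a + b*2 and 3*y + x share one normalized form in this toy model.
left = normalize(("+", "a", ("*", "b", "2")))
right = normalize(("+", ("*", "3", "y"), "x"))
```

Both `left` and `right` normalize to the same tuple, so a query for one would match a document containing the other. Sorting by variable name before renaming is a simplification and does not catch every equivalent pair.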
Math Indexer and Searcher: Design
Math Indexer and Searcher: Design II
Formula processing weighting example
Math formulae indexing processing
[Diagram labels: indexing; searching; math processing: ordering, canonicalization, tokenization, weighting, variables unification, constants unification.]
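The tokenization and weighting steps named on this slide can be sketched as follows, assuming formulae are nested tuples `(operator, operand, ...)` with string leaves. Each formula is split into all of its subformulae, and a subformula nested deeper gets a lower weight. The damping factor `LEVEL_COEF = 0.7`, the in-memory index, and the function names are hypothetical choices for illustration; MIaS itself stores its terms in Apache Lucene and its actual weights are not given here.

```python
from collections import defaultdict

LEVEL_COEF = 0.7  # hypothetical damping factor applied per nesting level


def subformulae(expr, weight=1.0):
    """Yield (subformula, weight) for expr and every nested subexpression."""
    yield expr, weight
    if not isinstance(expr, str):
        for arg in expr[1:]:
            yield from subformulae(arg, weight * LEVEL_COEF)


def build_index(docs):
    """Map each subformula to {doc_id: best weight} (toy inverted index)."""
    index = defaultdict(dict)
    for doc_id, formula in docs.items():
        for term, w in subformulae(formula):
            index[term][doc_id] = max(w, index[term].get(doc_id, 0.0))
    return index


def search(index, query):
    """Return (doc_id, weight) pairs for an exact subformula hit, best first."""
    hits = index.get(query, {})
    return sorted(hits.items(), key=lambda kv: -kv[1])


docs = {
    "d1": ("+", "a", "b"),              # a + b as the whole formula
    "d2": ("=", "c", ("+", "a", "b")),  # c = a + b, a + b nested one level
}
index = build_index(docs)
results = search(index, ("+", "a", "b"))
```

Here `d1` ranks above `d2` because the query matches the whole formula in `d1` but only a nested subformula in `d2`.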