
The Art of Mathematics Retrieval

Petr Sojka et al., Masaryk University, Faculty of Informatics, Brno, Czech Republic <sojka@fi.muni.cz>
Informatics Colloquium, FI MU, Brno, Czech Republic, November 8th, 2011


  1. The Art of Mathematics Retrieval Petr Sojka et al. Masaryk University, Faculty of Informatics, Brno, Czech Republic <sojka@fi.muni.cz> Informatics Colloquium, FI MU, Brno, Czech Republic November 8th, 2011

  2. Why Math Retrieval (TeX math search)?
     Talk outline: Why Math Retrieval (TeX Math Search)? · Existing Approaches · Math Indexer and Searcher · Evaluation · Conclusions
     Searching is a crucial part of the accessibility of the great stuff you all create, usually with a lot of mathematics, formulae and equations.
     How to pose questions about mathematics? By similarity, as in MUFIN (pictures) or Sketch Engine (text attributes)? With math in TeX notation?
     • Compact and logical expression of formulae, the quickest way of entering them into a query or a document.
     • A picture is worth a thousand words; "a mathematical formula is worth a hundred words" (Ross Moore).

  3. Why TeX math search is more relevant now than ever?
     • Because of G? (G as in Google, Globalization, …).
     • The vast treasure of mathematical papers: 140,000 new papers in Zentralblatt MATH expected this year, most of them authored in TeX math notation. All mathematics ever published is estimated at 100,000,000 pages (3,500,000 articles).
     • Search is a crucial part (access to data); search is a gate to this knowledge. A Digital Mathematics Library (DML) without math-aware search is an oxymoron.
     • Text and keyword based search? No problem (Google, review databases): a success.
     • Mathematics formulae search? It is a problem (either in Google or in the review databases): more or less a failure so far.

  4. Motivation for MSE (DML panel discussion)
     Q: "What functionality and incentives would make a working mathematician log in and use a modern DML such as EuDML?"
     A: "Math formulae search."
     Reply by Prof. James Davenport, CEIC member and MKM 2011 PC chair, on the panel at the DML 2011 workshop in Bertinoro.

  5. Motivation for using a MSE (including formulae) – cont.
     • Allowing formulae in queries helps to disambiguate and narrow the search.
     • Sometimes the only difference among a set of notions or keywords is in a math formula.
     • Compare google://Einstein with a math-aware search for "Einstein $E=mc^2$" over arXiv.
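
A mixed query such as "Einstein $E=mc^2$" has to be separated into its textual and mathematical parts before each can be processed. The sketch below shows one way this could be done; it assumes formulae are delimited by single dollar signs, and the function name `split_query` is hypothetical, not taken from MIaS.

```python
import re

def split_query(query):
    """Split a mixed query into plain-text keywords and TeX formulae.

    Assumption: every formula is wrapped in single dollar signs.
    """
    # Collect the contents of each $...$ span.
    formulae = re.findall(r"\$([^$]+)\$", query)
    # Whatever remains after removing the formulae is treated as text.
    text = re.sub(r"\$[^$]+\$", " ", query)
    keywords = text.split()
    return keywords, formulae

keywords, formulae = split_query("Einstein $E=mc^2$")
print(keywords, formulae)
```

The two lists can then be handed to separate text and math processing steps and recombined at search time.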

  6. Motivation for using a MSE (search examples) – cont.
     • Search problem formulation: given a query containing text and formulae, find the most relevant documents.
     • Example 1: knowing the solution of a partial differential equation in $L^1(C^3)$, is there one in $L^2(C^5)$?
     • Example 2: historians may want to follow the history of a (class of) formula(s) across languages and vocabularies (e.g. the same objects studied or used by physicists and mathematicians under different names).
     • Imagine your favourite math textbook as an ebook being TeX-search aware, e.g. your search application supports math formulae search.

  7. Take-home message from this talk
     Yes, you can! (in our MIaS system)
     The rest of the talk: how is it actually done, how are the formulae indexed, and how is the search performed so that it is usable on a digital library (DL) with hundreds of millions of formulae?

  8. Towards math search engine (MSE) – existing players
     • A niche market for big players (such as Google); attempts to solve it by publishers (LaTeXSearch by Springer).
     • Many challenges: heterogeneity of math representation, notation, semantics handling; no established and accepted user interface and query language.
     • Numerous attempts to solve the problem: MathDex, EgoMath, LaTeXSearch, LeActiveMath, DLMF equation search, MathWebSearch, but none accepted by the community as the MSE.

  9. Existing systems – pros and cons
     • EgoMath and EgoMath2: based on the full-text web search system Egothor
       * presentation MathML for indexing
       * idea of formulae augmentation, α-equivalence algorithms and relevance calculation
     • MathDex: formerly MathFind
       * seven-digit NSF grant by Design Science (Robert Miner)
       * Lucene based, indexing n-grams of presentation MathML
       * pioneering conversion effort
     • LaTeXSearch: MSE offered by Springer
       * closed source
       * only for LaTeX math strings
       * approximate match based on strings
       * no formulae structure matching
       * small database: 3 M formulae from 'random' sources (cf. 200 M in arXiv)
     • LeActiveMath: indexing string tokens from OMDoc with OpenMath semantic notation
       * only for documents authored for the LeActiveMath learning environment
     • DLMF: only for documents authored for DLMF in special markup
       * equation search
     • MathWebSearch: semantic approach, uses substitution trees, not based on full-text searching
       * supports Content MathML and OpenMath
       * problem with acquiring semantic data
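
To make "indexing n-grams of presentation MathML" concrete, the sketch below flattens a small presentation MathML fragment into a token sequence and builds bigrams from it. The token format and the helper names `flatten` and `ngrams` are assumptions made for illustration; the slides do not describe how MathDex actually builds its n-grams.

```python
import xml.etree.ElementTree as ET

# Presentation MathML for x^2 + 1 (no namespace, to keep the tokens short).
MATHML = "<math><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></math>"

def flatten(element):
    """Depth-first token list: 'tag' for containers, 'tag:text' for leaves."""
    tokens = []
    for node in element.iter():
        text = (node.text or "").strip()
        tokens.append(f"{node.tag}:{text}" if text else node.tag)
    return tokens

def ngrams(tokens, n):
    """All contiguous runs of n tokens, as tuples usable as index terms."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = flatten(ET.fromstring(MATHML))
bigrams = ngrams(tokens, 2)
print(tokens)
print(bigrams)
```

Each n-gram can be stored in a full-text index like an ordinary term, which is what makes a Lucene-style engine reusable for formulae.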

  10. MIaS – Math Indexer and Searcher
      • Math-aware, full-text based search engine.
      • Joins textual and mathematical querying.
      • MathML or TeX input.

  11. How to ask and how to index – dual world of TeX and MathML
      • Math for people: TeX notation wins and is used by people (AMS-LaTeX mostly fits most needs): → TeX notation for querying.
      • Math for software applications: MathML wins and is used by most computer algebra systems, browsers, and in the workflow of DTP systems: → MathML for indexing.

  12. Dual world of querying and indexing languages
      • In text retrieval: indexing word stems only, instead of word forms.
      • The TeXbook's concert invitation example: the index contains the name of the Czech composer of a song even though the name does not appear in the invitation.
      • From text to math: the same idea explored for math (e.g. putting multiple representations of a formula, with different 'near synonyms', in the index).
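
The analogy between stemming and multiple formula representations can be sketched with a toy inverted index. Everything here is an assumption made for illustration: the crude `naive_stem` stands in for a real stemmer, formulae are plain strings with single-letter variables, and the only 'near synonym' generated is a copy with variables renamed to `id1`, `id2`, and so on.

```python
import re
from collections import defaultdict

index = defaultdict(set)  # term -> set of document ids

def naive_stem(word):
    """Very crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def unify_variables(formula):
    """Rename single-letter variables to id1, id2, ... in order of first use."""
    names = {}
    def rename(match):
        return names.setdefault(match.group(0), f"id{len(names) + 1}")
    return re.sub(r"[a-zA-Z]", rename, formula)

def add_text(doc_id, text):
    # Index the stem, not the surface form.
    for word in text.lower().split():
        index[naive_stem(word)].add(doc_id)

def add_formula(doc_id, formula):
    # Index the formula itself and its variable-unified variant.
    compact = formula.replace(" ", "")
    for variant in {compact, unify_variables(compact)}:
        index[variant].add(doc_id)

add_text("doc1", "searching formulae")
add_formula("doc1", "a+b")
add_formula("doc2", "x+y")

# A query formula with yet other variable names still finds both documents.
print(sorted(index[unify_variables("p+q")]))
```

Keeping the exact string next to the generalized one lets an exact match be preferred while the generalized form still provides recall.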

  13. MSE overall design
      [Diagram of the indexing and searching pipelines built around Lucene Core. Indexing labels: input document, document handler, preprocessing into canonicalized presentation MathML, text and math streams, math processing (tokenization, unification), indexer, terms, index. Searching labels: input query, text and math parts, math processing (canonicalization), searcher, results.]

  14. Math indexing design
      [Diagram refining the previous one. Indexing labels: input document, document handler, text and math streams, math processing, Lucene indexer, terms, index. Searching labels: input query, text and math parts, math processing, searcher, results. Math processing steps named in the diagram: canonicalization, tokenization, ordering, variables unification, constants unification, weighting.]
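
Some of the math-processing steps named in the two design diagrams (tokenization, constants unification, weighting) can be illustrated on a tiny expression tree. This is a minimal sketch under stated assumptions, not the MIaS implementation: formulae are nested tuples rather than MathML, the weight halves at each nesting level, and unified terms are scaled by 0.8. All function names and numeric values are hypothetical, and ordering and variables unification are left out.

```python
def to_string(node):
    """Serialize an expression tree (nested tuples) to a flat index term."""
    if isinstance(node, tuple):
        op, *args = node
        return op + "(" + ",".join(to_string(a) for a in args) + ")"
    return node

def unify_constants(node):
    """Replace every numeric leaf by the placeholder 'const'."""
    if isinstance(node, tuple):
        op, *args = node
        return (op, *(unify_constants(a) for a in args))
    return "const" if node.isdigit() else node

def tokenize(node, weight=1.0, decay=0.5):
    """Yield (term, weight) for the formula and each of its subformulae."""
    yield to_string(node), weight
    if isinstance(node, tuple):
        for child in node[1:]:
            yield from tokenize(child, weight * decay, decay)

def index_terms(tree, unified_penalty=0.8):
    """Collect weighted terms for the original and the constant-unified tree."""
    terms = {}
    for term, w in tokenize(tree):
        terms[term] = max(terms.get(term, 0.0), w)
    for term, w in tokenize(unify_constants(tree)):
        terms[term] = max(terms.get(term, 0.0), w * unified_penalty)
    return terms

# x^2 + 1 as a tree: plus(power(x, 2), 1)
tree = ("plus", ("power", "x", "2"), "1")
for term, weight in sorted(index_terms(tree).items()):
    print(f"{weight:.2f}  {term}")
```

Indexing subformulae lets a query such as `power(x,2)` match inside larger formulae, while the lower weights keep whole-formula and exact matches ranked first.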
