formula search



  1. Natural Language Processing Techniques for Mathematical Formula Search. Akiko Aizawa (aizawa@nii.ac.jp), National Institute of Informatics

  2. About myself (Akiko Aizawa). Affiliation • National Institute of Informatics (Digital Content Media Science Division) • Research Center for Knowledge Media and Content Science • The University of Tokyo (Computer Science Department) • Graduate School for Advanced Studies (Department of Informatics) Keywords • Text Processing • Natural Language Processing • Information Retrieval • Knowledge Engineering

  3. CICM 2018 (3) Today’s Talk • Introduction: Math Formula in Scientific Literature • Mathematical Formula in Natural Language Text • NTCIR: Math Information Retrieval Task • Technical Challenges in NTCIR Math IR • Math Understanding as AI Challenges

  4. Introduction: Math Formula in Scientific Literature

  5. Searching Math Formulae on the Web: Pythagorean theorem, $BC^2 + AC^2 = AB^2$, $c = \sqrt{a^2 + b^2}$

  6. Searching Math Formulae on the Web: Pythagorean theorem, $BC^2 + AC^2 = AB^2$, $c = \sqrt{a^2 + b^2}$. “Given a mathematical expression, the problem is to find expressions that match it.” Shahab Kamali and Frank Wm. Tompa. 2010. A new mathematics retrieval system. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010).
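One very simple reading of "find expressions that match it" can be sketched as follows. This is only an illustration and not the retrieval method of Kamali and Tompa: it assumes formulae are given as plain strings and treats two formulae as matching when they are identical up to a consistent renaming of variables. The function names are hypothetical.

```python
import re

# Variable names, integers, or any single non-space symbol.
TOKEN = re.compile(r"[A-Za-z]+|\d+|\S")

def canonical_tokens(formula):
    """Tokenize a formula and rename variables by order of first appearance."""
    names = {}
    tokens = []
    for token in TOKEN.findall(formula):
        if token.isalpha():
            # The first distinct variable becomes v0, the second v1, and so on.
            token = names.setdefault(token, f"v{len(names)}")
        tokens.append(token)
    return tokens

def matches(query, candidate):
    """True if the two formulae are equal up to variable renaming."""
    return canonical_tokens(query) == canonical_tokens(candidate)

print(matches("a^2 + b^2 = c^2", "x^2 + y^2 = z^2"))  # same structure
print(matches("a^2 + b^2 = c^2", "a^2 + b^2 = a^2"))  # different structure
```

A matcher this naive already fails on the first formula of the slide, $BC^2 + AC^2 = AB^2$, where the side names are two-letter labels rather than independent variables, which hints at why formula search needs more than string comparison.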

  7. 3R’s for Math Formula Search • Requirement: Are there many users who need math formulae? • Representations: Is there a standard representation for math formulae? • Resources: Are there many documents with math formulae?

  8. 3R’s for Math Formula Search • Requirement: Q. Is mathematics related to your research? (Strongly related / Related / Somewhat related) 77% of researchers across a diversity of disciplines answered ‘YES’. Source: NISTEP Policy Study. • Representations: MathML (W3C recommendation), web-browsable XML: <math xmlns='http://www.w3.org/1998/Math/MathML' mathematica:form='TraditionalForm' xmlns:mathematica='http://www.wolfram.com/XML/'> <semantics> <mrow> <mrow> <mrow> <mrow> <mi> log </mi> <mo> &#8289; </mo> <mo> ( </mo> <msub> <mi> z </mi> <mn> 1 </mn> </msub> … XML for math semantics: <annotation-xml encoding='MathML-Content'> <apply> <ci> Condition </ci> <apply> <eq /> <apply> <plus /> <apply> <ln /> <apply> <ci> Subscript </ci> <ci> z </ci> <cn type='integer'> 1 </cn> </apply> </apply> <apply> <ln /> <apply> <ci> Subscript </ci> … • Resources: NIST Digital Library of Mathematical Functions; Wikipedia (16,962 math articles assessed by WikiProject Mathematics); Wolfram Functions Site (307,409 formulas); many, many scientific articles.

  9. 3R’s for Math Formula Search • Requirement: Are there many users who need math formulae? • Representations: Is there a standard representation for math formulae? • Resources: Are there many documents with math formulae? Still, math formula search is a tough problem.

  10. Why is math formula search difficult? Math Formula Search; Math Formula Understanding

  11. Three Representation Levels of Math Formulae • Presentation level • Content level • Computation level

  12. X squared, presentation level (for printing/displaying): $X^2$. Formats: LaTeX, MathML Presentation Markup, PDF, …

  13. X squared, content level (for “semantics”): <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"> <apply> <power/> <ci>X</ci> <cn>2</cn> </apply> </math> (SnuggleTeX, https://www2.ph.ed.ac.uk/snuggletex/UpConversionDemo)

  14. X squared, computation level (for “computing”): power(X, 2), X ** 2, math.pow(X, 2), X * X, …
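The step from the content level to the computation level can be sketched in a few lines of Python. This is a minimal illustration, not a MathML implementation: it assumes the Content MathML fragment for X squared and supports only the `power`, `plus`, and `times` operators. The `evaluate` helper and the variable binding `X = 3.0` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Content MathML for "X squared".
CONTENT_MATHML = """
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <apply> <power/> <ci>X</ci> <cn>2</cn> </apply>
</math>
"""

def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return element.tag.split("}", 1)[-1]

def evaluate(element, env):
    """Recursively evaluate a small subset of Content MathML."""
    name = local_name(element)
    if name == "math":
        return evaluate(element[0], env)
    if name == "ci":  # identifier: look up its value in the environment
        return env[element.text.strip()]
    if name == "cn":  # numeric constant
        return float(element.text.strip())
    if name == "apply":  # first child is the operator, the rest are arguments
        operator = local_name(element[0])
        args = [evaluate(child, env) for child in element[1:]]
        if operator == "power":
            return args[0] ** args[1]
        if operator == "plus":
            return sum(args)
        if operator == "times":
            result = 1.0
            for value in args:
                result *= value
            return result
        raise ValueError(f"unsupported operator: {operator}")
    raise ValueError(f"unsupported element: {name}")

root = ET.fromstring(CONTENT_MATHML)
print(evaluate(root, {"X": 3.0}))  # 9.0
```

The presentation-level markup for the same formula carries no `power` operator, only layout, which is why this kind of direct evaluation is not possible there without first recovering the content level.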

  15. Computer understanding of math formulae • Presentation level • Content level • Computation level

  16. Mathematical Formulae in Natural Language Text

  17. Math in Scientific Research: The Cultural Difference • Math for mathematics • Math for many other research fields

  18. Math in Scientific Research: The Cultural Difference. Pythagorean theorem: $a^2 + b^2 = c^2$ • A hypothesis that needs to be proven: “True” or “False”? • Description of a real world (which is assumed to be true): What is $c$ when $a = 3$ and $b = 4$?

  19. Math in Scientific Research: The Cultural Difference • Math for mathematics: Hypothesis that needs to be proven. Natural language text is a complement for a mathematical proof. • Math for many other research fields: Description of a real world (which is assumed to be true). Math formulae are a part of natural language semantics.

  20. Math and NLP: Observations • Inseparability: A math formula often appears as an indispensable component of a sentence. • Ambiguity: Just like natural language text, math formulae in a document carry ambiguity that should be resolved using context information. • Translatability: Math formulae can be translated into natural language sentences and, sometimes, vice versa.

  21. Typical page layout of a scientific paper (diagram regions: figure, figure caption, textbox, section title, page number)

  22. Typical Scientific “Text”. Source: Puyang Xu, Asela Gunawardana, Sanjeev Khudanpur: Efficient Subsampling for Training Complex Language Models, EMNLP 2011

  23. Math formulae in a sentence: “Denoting the concatenated d-dimensional word representations MATH-w-6-5-0-32, we have the following probability definition: MATH-p-6-6-0 where MATH-w-6-7-0-1 denotes the hidden layer size, MATH-w-6-7-0-8 and MATH-w-6-7-0-10 are the bias vectors for the output nodes and hidden nodes respectively.” Source: Puyang Xu, Asela Gunawardana, Sanjeev Khudanpur: Efficient Subsampling for Training Complex Language Models, EMNLP 2011
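The placeholder tokens in the sentence above suggest a preprocessing step in which each formula is replaced by an identifier so that ordinary NLP tools can process the sentence. A minimal sketch of such masking follows. It assumes formulae are delimited by `$...$` in the input, and the `MATH-<n>` identifiers, the `mask_formulae` helper, and the example sentence with its symbols are all hypothetical; the identifier scheme used on the slide is not specified.

```python
import re

# Non-greedy match of inline math delimited by dollar signs.
INLINE_MATH = re.compile(r"\$(.+?)\$")

def mask_formulae(sentence):
    """Replace each inline formula with a placeholder and keep a lookup table."""
    formulas = {}

    def replace(match):
        key = f"MATH-{len(formulas)}"
        formulas[key] = match.group(1)
        return key

    masked = INLINE_MATH.sub(replace, sentence)
    return masked, formulas

# Made-up input; the symbols are illustrative and not taken from the cited paper.
sentence = "where $h$ denotes the hidden layer size, $b$ and $c$ are the bias vectors"
masked, formulas = mask_formulae(sentence)
print(masked)
print(formulas)
```

Keeping the lookup table makes the step reversible, so the formulae can be restored or analyzed separately after the surrounding text has been parsed.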

  24. Math and NLP: Observations • Inseparability: A math formula often appears as an indispensable component of a sentence. • Ambiguity: Just like natural language text, math formulae in a document carry ambiguity that should be resolved using context information. • Translatability: Math formulae can be translated into natural language sentences and, sometimes, vice versa.

  25. Ambiguity of mathematical formulae: is $f$ a variable or a function? • $f(a+b)$: “For which of the following functions is f(a+b) = f(a) + f(b) for all positive numbers a and b?” • $1/f$: “... where f is a frequency.” • $f(x)$: “Intuitively, if z is a function g of y and y is a function f of x, then z is a function of x.” • $2\pi f$: “Can anyone explain to me how angular frequency (w) = 2pi(f)?” Context information becomes crucial for semantic disambiguation.

  26. Ambiguity of mathematical formulae: Corinne’s Shibboleth. Suppose the temperature on a rectangular slab of metal is given by $T(x, y) = k(x^2 + y^2)$, where $k$ is a constant. What is $T(r, \theta)$? A: $T(r, \theta) = kr^2$ (physicists). B: $T(r, \theta) = k(r^2 + \theta^2)$ (mathematicians). Sometimes, “commonsense” is also important for disambiguation. Dray, T. & Manogue, C. (2002). Vector calculus bridge project website, http://www.math.oregonstate.edu/bridge/ideas/functions. Redish, E. F., & Kuo, E. (2015). Language of physics, language of math: Disciplinary culture and dynamic epistemology. Science & Education, 24(5-6), 561-590.
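The two answers can be made explicit with a short worked derivation. It writes the temperature as $T(x, y) = k(x^2 + y^2)$ and assumes the usual polar convention $x = r\cos\theta$, $y = r\sin\theta$, which the slide does not state. Reading $T$ as a physical quantity attached to a point on the slab gives answer A; reading $T$ as a rule acting on its two argument slots gives answer B.

```latex
% Answer A: T is the temperature at a point, so change coordinates first.
\begin{align*}
T &= k\,(x^2 + y^2) \\
  &= k\,(r^2\cos^2\theta + r^2\sin^2\theta) \\
  &= k\,r^2 .
\end{align*}

% Answer B: T is the map (u, v) -> k(u^2 + v^2), so substitute u = r, v = \theta.
\begin{align*}
T(r, \theta) &= k\,(r^2 + \theta^2) .
\end{align*}
```

Both derivations are internally correct; they differ only in what the symbol $T$ is taken to denote, which is exactly the information a formula search system would have to recover from context.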

  27. Math and NLP: Observations • Inseparability: A math formula often appears as an indispensable component of a sentence. • Ambiguity: Just like natural language text, math formulae in a document carry ambiguity that should be resolved using context information. • Translatability: Math formulae can be translated into natural language sentences and, sometimes, vice versa.

  28. Math to NLP Translation • ChattyInfty: reading software for mathematical documents (Prof. Masakazu Suzuki, Nonprofit Organization Science Accessibility Net), http://www.sciaccess.net/en/ChattyInfty/ • Braille Mathematics Notation: RNIB, Braille Mathematics Notation 1987. 1987, Peterborough: Royal National Institute for the Blind.

  29. Math formula as a sentence: “Y equals X squared.” $Y = X^2$.
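The translation from a formula to a sentence can be sketched as a recursive walk over an expression tree. This is an illustrative toy, not the approach of any system named in the talk: it assumes the formula is already available at the content level as nested tuples, and the `verbalize` function and the tuple encoding are hypothetical.

```python
def verbalize(expr):
    """Render a small expression tree as English words."""
    if isinstance(expr, str):  # a leaf: variable name or number
        return expr
    operator, *args = expr
    if operator == "eq":
        return f"{verbalize(args[0])} equals {verbalize(args[1])}"
    if operator == "plus":
        return " plus ".join(verbalize(arg) for arg in args)
    if operator == "power":
        base, exponent = args
        if exponent == "2":  # common special case with its own wording
            return f"{verbalize(base)} squared"
        return f"{verbalize(base)} to the power of {verbalize(exponent)}"
    raise ValueError(f"unsupported operator: {operator}")

# Tuple encoding of the formula Y = X^2.
formula = ("eq", "Y", ("power", "X", "2"))
sentence = verbalize(formula) + "."
print(sentence)  # Y equals X squared.
```

The reverse direction, from a sentence back to a formula, is considerably harder because the sentence has to be parsed first, which is consistent with the "sometimes, vice versa" qualification in the talk.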

  30. Math and NLP: Observations • Inseparability: A math formula often appears as an indispensable component of a sentence. • Ambiguity: Just like natural language text, math formulae in a document carry ambiguity that should be resolved using context information. • Translatability: Math formulae can be translated into natural language sentences and, sometimes, vice versa. Semantic analysis of math formulae can be considered a variation of NLP semantic parsing.

  31. NTCIR: Math Information Retrieval Tasks
