Accessibility Issues in Digital Mathematical Libraries
Petr Sojka, Michal Růžička, Maroš Kucbel, and Martin Jarmar
Masaryk University, Faculty of Informatics, Brno, Czech Republic
<sojka@fi.muni.cz>, {mruzicka, kocka, 172981}@mail.muni.cz
Universal Learning Design, Brno, 13th February 2013

Outline
1 Introduction
2 PDF Processing
3 MathML Processing
4 Summary

Introduction
• Digital mathematics libraries are on the rise.
  • The European Digital Mathematics Library (EuDML, <https://eudml.org/>).
  • The Czech Digital Mathematics Library (DML-CZ, <http://dml.cz/>).
• They serve not only metadata but also full texts with mathematical formulae.
  • PDF.
  • MathML.
  • *TeX.

PDF, TeX/LaTeX, MathML
• Thanks to pdfTeX, PDF is the de facto standard output format of modern TeX distributions.
• LaTeX mathematical notation is well known and effective.
  • It is used not only in LaTeX documents but also in a variety of other projects such as Wikipedia.
  • LaTeX source code is usually a good choice for a plain-text representation of mathematical expressions.
• MathML is often used as both a machine-readable and a human-readable language for describing mathematical notation.

2 PDF Processing
  • PDF Processing
  • PDF Enhancement

MaxTract
• A command-line tool that reads a PDF and returns various types of enriched output:
  • LaTeX for use with Tralics.
  • LaTeX for a layered PDF with LaTeX and text layers.
  • LaTeX for an annotated PDF with LaTeX annotations.
  • A simple text file.
  • A text file with math in LaTeX.
• Under development by the Scientific Document Analysis Group at the School of Computer Science, University of Birmingham, UK.
• Homepage: <http://www.cs.bham.ac.uk/research/groupings/reasoning/sdag/maxtract.php>

MaxTract (cont.)
• For successful analysis, the PDF file must use only Type 1 fonts with embedded encodings.
• MaxTract is written in OCaml and uses pdftk to decompress PDF files.

InftyReader OCR
• Old documents are often available in paper form only.
• They must be scanned and processed by Optical Character Recognition (OCR) software.
• The InftyReader OCR software has the unique feature of recognizing mathematical expressions in scanned documents.
• InftyReader is part of the InftyProject (<http://www.inftyproject.org/>), developed by Masakazu Suzuki's research and development group in Japan.

InftyReader OCR (cont.)
• InftyReader reads and writes various formats.
  input: TIFF, BMP, GIF, PNG, PDF.
  output: LaTeX, XHTML+MathML, various XML formats.
• The quality and resolution of the scans are crucial.

CopyMath
• The ActualText command of the PDF language is used to mark the region of the mathematical expression inside the PDF document.
• We want the package to be as user-friendly as possible: users should not be forced to modify their mathematical expressions in any way; \usepackage{copymath} should cater for all their needs.
• The implementation is not easy.
  • It requires nonstandard modifications of the LaTeX mathematical environments.

Standard PDF document
LaTeX source code:
    Text $\Pi(x) = \pi(x) + \frac{1}{2}\pi(x^{1/2}) + \frac{1}{3}\pi(x^{1/3}) + \cdots$ text.

Standard PDF document
PDF code:
    BT
    /F16 9.9626 Tf 148.712 707.125 Td [(T)83(ext)]TJ
    /F17 9.9626 Tf 23.247 0 Td [(\005\050)]TJ
    /F20 9.9626 Tf 11.346 0 Td [(x)]TJ
    /F17 9.9626 Tf 5.694 0 Td [(\051)-278(=)]TJ
    /F20 9.9626 Tf 17.158 0 Td [(\031)]TJ
    /F17 9.9626 Tf 6.036 0 Td [(\050)]TJ
    /F20 9.9626 Tf 3.875 0 Td [(x)]TJ
    /F17 9.9626 Tf 5.694 0 Td [(\051)-222(+)]TJ
    /F18 6.9738 Tf 17.247 3.923 Td [(1)]TJ
    ET

Standard PDF document
Text obtained using the Copy & Paste function of a PDF reader:
    Text Π(𝑥) = 𝜋(𝑥) + 1 2 𝜋(𝑥 1/2) + 1 3 𝜋(𝑥 1/3) + · · · text.
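
A minimal sketch of why plain copy and paste loses the formula, using the first operators of the PDF code on the previous slide. The content stream holds only per-font character codes (the strings shown by TJ) and positioning offsets (the Td operands); the fraction and superscript structure of the LaTeX source is not recorded anywhere. The Python below is a regex-based illustration, not a real PDF parser, and the name glyph_runs is hypothetical.

```python
import re

# Either a font selection ("/F17 9.9626 Tf") or a text-showing array ("[...]TJ").
TOKEN = re.compile(r"/(\w+)\s+[\d.]+\s+Tf|\[(.*?)\]\s*TJ", re.S)
# A PDF literal string "(...)" whose content may contain backslash escapes.
STRING = re.compile(r"\(((?:\\.|[^\\()])*)\)", re.S)
# Octal escapes such as \005; other PDF escapes are not handled in this sketch.
OCTAL = re.compile(r"\\([0-7]{1,3})")


def glyph_runs(content_stream):
    """Return (font name, character codes) for every TJ operator, in order."""
    runs = []
    font = None
    for match in TOKEN.finditer(content_stream):
        if match.group(1) is not None:
            font = match.group(1)  # remember the currently selected font
            continue
        # Keep the strings of the TJ array; kerning numbers between them are dropped.
        pieces = STRING.findall(match.group(2))
        text = OCTAL.sub(lambda m: chr(int(m.group(1), 8)), "".join(pieces))
        runs.append((font, text.encode("latin-1")))
    return runs


# First three text-showing operators from the slide's PDF code.
SAMPLE = r"""BT /F16 9.9626 Tf 148.712 707.125 Td [(T)83(ext)]TJ
/F17 9.9626 Tf 23.247 0 Td [(\005\050)]TJ
/F20 9.9626 Tf 11.346 0 Td [(x)]TJ ET"""

for font_name, codes in glyph_runs(SAMPLE):
    print(font_name, list(codes))
```

In the slide's example, code \005 in font F17 is the glyph drawn for \Pi, so what a PDF reader pastes for it depends entirely on that font's encoding.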

CopyMath-enabled PDF document
LaTeX source code:
    Text $\Pi(x) = \pi(x) + \frac{1}{2}\pi(x^{1/2}) + \frac{1}{3}\pi(x^{1/3}) + \cdots$ text.

CopyMath-enabled PDF document
PDF code:
    BT
    /F16 9.9626 Tf 148.712 707.125 Td [(T)83(ext)]TJ
    ET
    1 0 0 1 171.959 707.125 cm
    /Span << /ActualText<245C506920287829203D205C706920287829202B205C66726163207B317D7B32
    7D5C70692028785E7B312F327D29202B205C66726163207B317D7B337D5C70692028785E7B31
    2F337D29202B205C63646F74732024> >> BDC
    1 0 0 1 -171.959 -707.125 cm
    BT
    /F17 9.9626 Tf 171.959 707.125 Td [(\005\050)]TJ
    /F20 9.9626 Tf 11.346 0 Td [(x)]TJ
    /F17 9.9626 Tf 5.694 0 Td [(\051)-278(=)]TJ
    /F20 9.9626 Tf 17.158 0 Td [(\031)]TJ
    /F17 9.9626 Tf 6.036 0 Td [(\050)]TJ
    /F20 9.9626 Tf 3.875 0 Td [(x)]TJ
    /F17 9.9626 Tf 5.694 0 Td [(\051)-222(+)]TJ
    /F18 6.9738 Tf 17.247 3.923 Td [(1)]TJ
    ET

CopyMath-enabled PDF document
Text obtained using the Copy & Paste function of a PDF reader:
    Text $\Pi (x) = \pi (x) + \frac {1}{2}\pi (x^{1/2}) + \frac {1}{3}\pi (x^{1/3}) + \cdots $ text.
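
The pasted text is the hexadecimal /ActualText string from the PDF code, decoded byte by byte. The sketch below checks this on the /Span operator copied from the slide. The function name decode_actualtext is hypothetical, the regular expression is an illustration rather than a PDF parser, and Latin-1 is used as an approximation of PDFDocEncoding, which is safe here because the string is pure ASCII.

```python
import re

# Hex string that follows /ActualText; PDF allows whitespace between hex digits.
ACTUALTEXT = re.compile(r"/ActualText\s*<([0-9A-Fa-f\s]*)>")


def decode_actualtext(content_stream):
    """Return the decoded replacement text of every /ActualText hex string."""
    texts = []
    for match in ACTUALTEXT.finditer(content_stream):
        digits = "".join(match.group(1).split())
        if len(digits) % 2:
            digits += "0"  # PDF pads an odd-length hex string with a final 0
        raw = bytes.fromhex(digits)
        if raw.startswith(b"\xfe\xff"):
            # A byte-order mark signals UTF-16BE text.
            texts.append(raw[2:].decode("utf-16-be"))
        else:
            # Assumption: Latin-1 stands in for PDFDocEncoding.
            texts.append(raw.decode("latin-1"))
    return texts


# The /Span operator from the slide, with its original line breaks.
SPAN = """/Span << /ActualText<245C506920287829203D205C706920287829202B205C66726163207B317D7B32
7D5C70692028785E7B312F327D29202B205C66726163207B317D7B337D5C70692028785E7B31
2F337D29202B205C63646F74732024> >> BDC"""

print(decode_actualtext(SPAN)[0])
```

The decoded string has a space after each control sequence (for example "\Pi (x)" and "\frac {1}"), just as the pasted text does. The pasted text therefore matches the LaTeX source up to spacing.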

CopyMath Implementation
• We need to add \pdfliteral at the beginning and end of every mathematical environment.
  • The dollar sign ($) is made active and redefined.
  • It is necessary to keep track of nested mathematical environments.
  • Simple redefinition of AMS-LaTeX mathematical environments is not possible.
• Still experimental.
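
As a rough model of what the two \pdfliteral insertions have to emit, the sketch below builds the marked-content operators around a formula from its LaTeX source. It only models the string construction, not the TeX macro programming that the package requires. The helper names actualtext_open and wrap_formula are hypothetical. EMC is the standard PDF operator that closes a marked-content sequence opened by BDC; it is not visible in the excerpt shown on the slides.

```python
def actualtext_open(latex_source):
    """Build the operator that opens a /Span carrying the LaTeX source.

    Assumption: the source is plain ASCII, as in the slide's example;
    non-ASCII input raises UnicodeEncodeError in this sketch.
    """
    hex_text = latex_source.encode("ascii").hex().upper()
    return f"/Span << /ActualText<{hex_text}> >> BDC"


# Standard PDF operator that ends a marked-content sequence.
ACTUALTEXT_CLOSE = "EMC"


def wrap_formula(latex_source, drawing_ops):
    """Surround the formula's drawing operators with the /Span open and close."""
    return "\n".join([actualtext_open(latex_source), drawing_ops, ACTUALTEXT_CLOSE])


print(wrap_formula("$x$", "BT /F20 9.9626 Tf [(x)]TJ ET"))
```

Nested mathematical environments matter here because each opening operator needs exactly one matching close.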

3 MathML Processing
  • Making Maths Accessible
  • MathML Processing