Detection and Segmentation of Touching Characters in Mathematical Expressions


  1. Detection and Segmentation of Touching Characters in Mathematical Expressions. A. Nomura, K. Michishita, S. Uchida, M. Suzuki. Kyushu University, Japan

  2. Introduction. Suzuki Lab. Kyushu Univ.

  3. OCR for mathematical documents
     • Aim: recognition of ordinary text and mathematical expressions
     • Merits:
       • Storage size reduction: bitmap image → ASCII codes
       • Search services: theorem search, definition search, …
       • Format conversion: from scanned image to LaTeX, XML, Mathematica Notebook, Braille, …

  4. Hurdles
     • Large number of categories (> 500): alphabets, numerals, Greek letters, operators, parentheses, big symbols (e.g., “Σ”), …
     • Various fonts: roman, italic, calligraphic, …
     • Various sizes and positions: sub-/superscripts, fractions, …
     • Touching characters: 50% or more of the misrecognitions were due to touching characters

  5. Touching characters in math. expressions: characters touch not only horizontally but also diagonally, so conventional segmentation techniques will fail.

  6. Our purpose
     • Development of a novel segmentation technique for touching characters in mathematical expressions
     • And higher recognition accuracy

  7. Outline of the proposed technique

  8. Outline of the proposed technique (1)
     • Input: all connected components in the document (image data with initial recognition result)
     • Clustering based on shape similarity
       • computational efficiency
       • neglect of trivial shape differences
     • Output: centroids (about 1/10; compression)
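The slide does not say which clustering algorithm or similarity measure is used. As a minimal sketch only, the following assumes size-normalized binary images, a pixel-difference measure, and greedy "leader" clustering in which the first member of each cluster stands in for its centroid. The names `pixel_difference`, `leader_cluster`, and the `max_diff` threshold are hypothetical.

```python
def pixel_difference(a, b):
    """Fraction of pixels that differ between two equally sized binary images."""
    total = sum(len(row) for row in a)
    differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return differing / total


def leader_cluster(images, max_diff=0.1):
    """Greedy clustering (an assumed stand-in for the slide's clustering step).

    An image joins the first cluster whose representative differs from it in
    at most `max_diff` of the pixels; otherwise it starts a new cluster.
    """
    representatives = []  # first member of each cluster, used as its "centroid"
    labels = []
    for img in images:
        for k, rep in enumerate(representatives):
            if pixel_difference(img, rep) <= max_diff:
                labels.append(k)
                break
        else:
            representatives.append(img)
            labels.append(len(representatives) - 1)
    return representatives, labels


# Toy 4x4 components: a vertical bar, the same bar with one noisy pixel, a dash.
bar = [[1, 0, 0, 0] for _ in range(4)]
noisy_bar = [row[:] for row in bar]
noisy_bar[0][1] = 1
dash = [[1, 1, 1, 1]] + [[0, 0, 0, 0] for _ in range(3)]

representatives, labels = leader_cluster([bar, noisy_bar, dash])
print(labels)  # [0, 0, 1]
```

The noisy bar is absorbed into the bar's cluster, which illustrates the "neglect of trivial shape differences" point: later stages only need to look at one representative per cluster.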

  9. Outline of the proposed technique (2): centroids → detection of touching characters → candidates of touching chars → segmentation of touching characters → single (separated) characters

  10. Detail of detection procedure
     • Input: a connected component with its initial recognition result (e.g., “Γ”)
     • Extraction of two shape features from the component
     • Comparison with the feature values of the standard “Γ”
     • Difference > threshold? yes → candidate of touching chars; no → single character
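The detection test can be sketched as follows. The slide does not name the two shape features or the threshold, so the aspect ratio, the ink density, and the value `0.3` below are assumptions chosen purely for illustration, as are the function names.

```python
def shape_features(img):
    """Two illustrative features (assumed): aspect ratio and ink density."""
    height, width = len(img), len(img[0])
    ink = sum(sum(row) for row in img)
    return (width / height, ink / (width * height))


def is_touching_candidate(img, standard_features, threshold=0.3):
    """Flag the component if any feature deviates from the standard by more than threshold."""
    features = shape_features(img)
    return any(abs(f - s) > threshold for f, s in zip(features, standard_features))


# A toy "Γ" used as the standard shape for its class.
gamma = [[1, 1, 1],
         [1, 0, 0],
         [1, 0, 0],
         [1, 0, 0]]
standard = shape_features(gamma)

# A component that was also recognized as "Γ" but is twice as wide,
# as it would be if a second symbol were attached to it.
wide = [[1, 1, 1, 1, 1, 1],
        [1, 0, 0, 1, 0, 1],
        [1, 0, 0, 1, 0, 1],
        [1, 0, 0, 1, 1, 1]]

print(is_touching_candidate(gamma, standard))  # False
print(is_touching_candidate(wide, standard))   # True
```

A loose threshold flags more single characters by mistake, which the slides accept because false positives can still be rejected during segmentation.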

  11. Detail of segmentation procedure
     • Inputs: candidates of touching chars; single characters
     • Finding the 1st char: match against the single characters
     • Thickening → residual
     • Finding the 2nd char: match the residual against the single characters
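One possible reading of this flow is sketched below with exact template matching on pixel sets. The matching criterion, the one-pixel thickening, the largest-template-first order, and all function names are assumptions; the slides do not specify them, and a practical matcher would need to tolerate noise.

```python
def ink(img):
    """Set of (row, col) coordinates of the black pixels of a binary image."""
    return {(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v}


def shift(pixels, dr, dc):
    """Translate a pixel set by (dr, dc)."""
    return {(r + dr, c + dc) for r, c in pixels}


def thicken(pixels):
    """Dilate a pixel set by one pixel in all eight directions."""
    return {(r + dr, c + dc) for r, c in pixels
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)}


def placements(template, component):
    """All translations of the template within the component's bounding box."""
    rows = max(r for r, _ in component) + 1
    cols = max(c for _, c in component) + 1
    for dr in range(rows):
        for dc in range(cols):
            yield shift(template, dr, dc)


def split_touching(component, templates):
    """Return (first, second) character names, or None to reject the candidate."""
    # Try larger templates first so a small stroke does not match inside a big one.
    ordered = sorted(templates.items(), key=lambda kv: -len(kv[1]))
    for name1, tpl1 in ordered:
        for placed1 in placements(tpl1, component):
            if not placed1 <= component:
                continue  # the 1st character must lie entirely inside the component
            hidden = thicken(placed1)
            residual = component - hidden
            if not residual:
                continue  # nothing left over: no second character here
            for name2, tpl2 in ordered:
                for placed2 in placements(tpl2, component):
                    # The 2nd character must cover the residual; any extra pixels
                    # it has must lie under the thickened 1st character.
                    if residual <= placed2 <= residual | hidden:
                        return name1, name2
    return None


# Single characters found elsewhere in the same (toy) document.
templates = {"l": ink([[1], [1], [1], [1], [1]]), "-": ink([[1, 1, 1]])}

# A vertical stroke with a dash attached to its middle.
touching = ink([[1, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 1, 1, 1],
                [1, 0, 0, 0],
                [1, 0, 0, 0]])

print(split_touching(touching, templates))        # ('l', '-')
print(split_touching(templates["l"], templates))  # None
```

The second call shows the rejection path: a candidate that is really a single character leaves no residual that matches, so it is returned as a single character rather than being forced apart.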

  12. Notes
     1. Single characters from the same document are utilized
        • Components of touching characters are usually found as single characters in the document
        • Font-style/size adaptive separation is realized
     2. Recognition result is provided
     3. Tolerant to false positives
        • If no match is found, the candidate is rejected as a single character

  13. Experimental results

  14. Database
     • 391 pages from 21 math. documents
     • 140,000 characters in math. expressions
       • ground truth was manually attached to each character
       • characters in ordinary text parts were excluded in our experiment
     • 2,978 touching char images (~6,000 chars)
       • 4.2% of all 140,000 characters

  15. Detection result
     • All characters in math. expressions (about 140,000 characters): touching 2,978; single 134,375
     • Detection result: touching 2,864; single detected as touching (false positive) 10,002; false negative 50
     • False negatives were few (1.6%)
     • False positives were many (but will be rejected in the segmentation procedure)
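The rates on this slide can be roughly checked from its counts. The choice of denominators below is an assumption about how the rates were computed. The detected (2,864) and missed (50) counts do not add up to 2,978 in this transcript, so the sketch uses only the two totals as denominators.

```python
touching_total = 2978    # touching-character images
single_total = 134375    # single characters
false_negatives = 50     # touching images that were not flagged
false_positives = 10002  # single characters flagged as touching

fn_rate = false_negatives / touching_total
fp_rate = false_positives / single_total

print(f"false-negative rate: {fn_rate:.1%}")  # false-negative rate: 1.7%
print(f"false-positive rate: {fp_rate:.1%}")  # false-positive rate: 7.4%
```

With these denominators the false-negative rate comes out near the quoted 1.6%, and roughly one single character in thirteen is passed on to the segmentation step as a candidate.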

  16. Segmentation result
     • Detected as touching, truly touching (2,864): success 1,468; failure 1,396
     • Detected as touching, actually single (false positive, 10,002): success (rejected as single) 9,984; failure (forced separation) 18
     • 50% of touching chars were successfully separated
     • Forced separations were very few
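A short check of the counts on this slide, using only its numbers and the 2,978 touching images from the database slide. Which denominator lies behind the "50%" figure is not stated, so both are shown.

```python
detected_touching = 2864
separated, failed = 1468, 1396
false_positives = 10002
rejected_as_single, forced_separation = 9984, 18
all_touching = 2978

# Each detection outcome is fully accounted for by the segmentation outcomes.
assert separated + failed == detected_touching
assert rejected_as_single + forced_separation == false_positives

print(f"{separated / detected_touching:.1%}")        # 51.3%
print(f"{separated / all_touching:.1%}")             # 49.3%
print(f"{forced_separation / false_positives:.2%}")  # 0.18%
```

Either way the success rate is close to one half, and fewer than 0.2% of the falsely flagged single characters are wrongly split.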

  17. Segmentation result

  18. Effect on total recognition rate
     • Recognition rate over all 140,000 characters: 92.9% → 95.1% (up 2.2 percentage points)
     • Number of misrecognitions: 9,710 → 6,792 (30% reduction)
     • The proposed technique is very meaningful!
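These figures can be reproduced from the misrecognition counts, assuming the exact total is the sum of the touching (2,978) and single (134,375) counts from the detection slide rather than the rounded 140,000. That choice of total is an assumption.

```python
total_chars = 2978 + 134375  # 137,353; assumed exact total behind "140,000"
errors_before = 9710
errors_after = 6792

rate_before = 1 - errors_before / total_chars
rate_after = 1 - errors_after / total_chars
reduction = (errors_before - errors_after) / errors_before

print(f"{rate_before:.1%} -> {rate_after:.1%}")  # 92.9% -> 95.1%
print(f"error reduction: {reduction:.1%}")       # error reduction: 30.1%
```

The gain is 2.2 percentage points in recognition rate, which corresponds to removing about 30% of the errors.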

  19. Conclusion
     • Detection and segmentation procedures for touching characters in math. expressions were investigated
     • About 50% of touching characters were successfully detected and separated
     • Total character recognition rate was improved from 92.9% to 95.1%
