
Natural Language Processing: Historical Document Transcription

Dan Klein, UC Berkeley. Joint work with Taylor Berg-Kirkpatrick and Greg Durrett [ACL 2013].


  1. Natural Language Processing: Historical Document Transcription. Dan Klein, UC Berkeley. Joint work with Taylor Berg-Kirkpatrick and Greg Durrett [ACL 2013]

  2. Historical Document

  3. Historical Document: Old Bailey Court Proceedings, 1775

  4. Transcription: Document Image

  5. Transcription: Document Image → Transcription (Google Tesseract): and Ch’: priftmer anhc bar. Jacob Lazarus and his IHP1 uh: prifoner. were both together when! rcccivcd lhczn. I fold eievén pair of than for xiirce guincas, and dclivcrcd the rcll'l:.in- d:r hack lo :11: prifuner. 1 fold ftvcn pairof filk to Mark Simpcr : nncpuir of mixcd. and. mo pair of Ifircad to lhz: foolnun, and on: pair of zhrzad to lh: barber. ' Q: What is the foolmarfs name? Fraum Mgfzr. I dun’: know. Hairy Hzrvir. l was flandingar the Camp Icr waizin far the thcrrilfs ufliceruo employ in: : Mo 3‘: daughter came for me to 0 am! take the prifoncr. 1 Wm! to |hc Old aailcy

  6-10. Pipelined Approach

  11. Pipelined Approach m

  12. Pipelined Approach m o

  13. Pipelined Approach m o d

  14. Historical Document

  15-18. Unknown Fonts

  19. Unknown Fonts: long s glyph

  20-23. Wandering Baseline

  24-27. Uneven Inking

  28. Various Historical Documents: 1725, 1875, 1823, 1883

  29-32. Our Approach

  33-36. Generative Model: example word "prisoner"

  37. Generative Model: Language Model; example word "prisoner"

  38-51. Generative Model: Typesetting Model; example word "prisoner"

  52-54. Generative Model: Rendering Model; example word "prisoner"

  55. Generative Model: example word "prisoner"

  56. Generative Model: example word "prisoner". Language Model: $E \sim P(E)$

  57. Generative Model: example word "prisoner". Language Model: $E \sim P(E)$. Typesetting Model: $T \sim P(T \mid E)$

  58. Generative Model: example word "prisoner". Language Model: $E \sim P(E)$. Typesetting Model: $T \sim P(T \mid E)$. Rendering Model: $X \sim P(X \mid E, T)$

  59. Generative Model: $P(E, T, X) = P(E) \cdot P(T \mid E) \cdot P(X \mid E, T)$, with Language Model $P(E)$, Typesetting Model $P(T \mid E)$, and Rendering Model $P(X \mid E, T)$
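
The factorization on slide 59 can be read as a noisy-channel model. The block below is a sketch of how a transcription could be chosen under it; the decoding rule is an assumption here, since the slides show only the three factors and not the inference procedure.

```latex
\begin{align}
P(E, T, X) &= P(E)\, P(T \mid E)\, P(X \mid E, T) \\
\hat{E} &= \arg\max_{E} P(E \mid X)
         = \arg\max_{E} \sum_{T} P(E)\, P(T \mid E)\, P(X \mid E, T)
\end{align}
```

The second line follows because $P(E \mid X) \propto P(E, X) = \sum_T P(E, T, X)$ for a fixed image $X$. Replacing the sum over $T$ with a joint maximum over $(E, T)$ is a common approximation, named here only as an option.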

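  60-61. Language Model: $E$

  62. Language Model: $E$; consecutive characters $e_{i-1}$, $e_i$, $e_{i+1}$ (example characters "r a t")

  63. Language Model: $E$; consecutive characters $e_{i-1}$, $e_i$, $e_{i+1}$ (example characters "r a t"); Kneser-Ney smoothed character 6-gram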
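
To make the smoothing on slide 63 concrete, here is a minimal sketch of interpolated Kneser-Ney over characters. It is deliberately simplified to order 2 (a bigram) rather than the 6-gram named on the slide, and the function name `train_kn_bigram`, the discount of 0.75, and the start marker `^` are illustrative assumptions, not details from the talk.

```python
from collections import Counter, defaultdict


def train_kn_bigram(text, discount=0.75, bos="^"):
    """Return prob(cur, prev) for an interpolated Kneser-Ney character bigram.

    Simplification: order 2 instead of the 6-gram on the slide.
    `bos` is a hypothetical start-of-text marker.
    """
    if not text:
        raise ValueError("training text must be non-empty")
    padded = bos + text
    bigram_counts = Counter(zip(padded, padded[1:]))
    context_counts = Counter()    # c(prev): how often prev is a context
    followers = defaultdict(set)  # distinct characters seen after prev
    preceders = defaultdict(set)  # distinct characters seen before cur
    for (prev, cur), n in bigram_counts.items():
        context_counts[prev] += n
        followers[prev].add(cur)
        preceders[cur].add(prev)
    n_bigram_types = len(bigram_counts)

    def prob(cur, prev):
        # Continuation probability: share of bigram types ending in cur.
        p_cont = len(preceders.get(cur, ())) / n_bigram_types
        total = context_counts.get(prev, 0)
        if total == 0:
            return p_cont  # unseen context: use the lower-order estimate
        seen = bigram_counts.get((prev, cur), 0)
        discounted = max(seen - discount, 0.0) / total
        # Mass removed by discounting, redistributed via p_cont.
        backoff_mass = discount * len(followers[prev]) / total
        return discounted + backoff_mass * p_cont

    return prob


prob = train_kn_bigram("the prisoner and the barber")
print(prob("h", "t"), prob("e", "t"))
```

In this sketch, characters never seen in training get probability zero; a real system would reserve mass for them or fix the alphabet in advance.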

  64. Typesetting Model: character $e_i$ (example glyph "a")

  65. Typesetting Model: character $e_i$ (example glyph "a"); typesetting variables $T$

  66. Typesetting Model: $e_i$, $T$; left pad width $l_i$ (axis ticks 1 and 5)

  67. Typesetting Model: $e_i$, $T$; left pad width $l_i$ (axis ticks 1 and 5); glyph box width $g_i$ (axis ticks 1 and 30)

  68-70. Typesetting Model: $e_i$, $T$; left pad width $l_i$ (axis ticks 1 and 5); glyph box width $g_i$ (axis ticks 1 and 30); right pad width $r_i$ (axis ticks 1 and 5)

  71. Typesetting Model: $e_i$, $T$; left pad width $l_i$; glyph box width $g_i$; right pad width $r_i$; vertical offset $v_i$

  72. Typesetting Model: $e_i$, $T$; left pad width $l_i$; glyph box width $g_i$; right pad width $r_i$; vertical offset $v_i$; inking level $d_i$
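
The typesetting variables on slides 64 to 72 can be pictured as one small record per character. The sketch below samples such records; everything beyond the five variable names is an assumption made for illustration: the supports (pads 1 to 5, glyph width 1 to 30, read off the axis ticks), the offset and inking values, the uniform distributions standing in for learned ones, and the independence between characters.

```python
import random
from dataclasses import dataclass


@dataclass
class GlyphLayout:
    char: str         # e_i
    left_pad: int     # l_i
    glyph_width: int  # g_i
    right_pad: int    # r_i
    v_offset: int     # v_i
    inking: int       # d_i


# Hypothetical supports; the slides only show axis ticks 1/5 and 1/30.
PAD_WIDTHS = range(1, 6)
GLYPH_WIDTHS = range(1, 31)
V_OFFSETS = (-1, 0, 1)   # assumed values
INK_LEVELS = (0, 1, 2)   # assumed values


def sample_typesetting(text, rng):
    """Sample layout variables T for each character of E (uniform placeholders)."""
    return [
        GlyphLayout(
            char=ch,
            left_pad=rng.choice(PAD_WIDTHS),
            glyph_width=rng.choice(GLYPH_WIDTHS),
            right_pad=rng.choice(PAD_WIDTHS),
            v_offset=rng.choice(V_OFFSETS),
            inking=rng.choice(INK_LEVELS),
        )
        for ch in text
    ]


def line_width(layouts):
    """Total horizontal extent in pixels: pads plus glyph boxes."""
    return sum(g.left_pad + g.glyph_width + g.right_pad for g in layouts)


layouts = sample_typesetting("prisoner", random.Random(0))
print(line_width(layouts))
```

A learned model would replace each `rng.choice` with a draw from a per-character distribution, so that, for example, "i" tends to get a narrower glyph box than "m".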

  73-74. Rendering Model

  75. Rendering Model: glyph box

  76. Rendering Model: glyph box; glyph box width $g_i$

  77. Rendering Model: glyph box; glyph box width $g_i$, vertical offset $v_i$

  78. Rendering Model: glyph box; glyph box width $g_i$, vertical offset $v_i$, inking level $d_i$

  79. Rendering Model: glyph box $X$; glyph box width $g_i$, vertical offset $v_i$, inking level $d_i$

  80-81. Rendering Model: glyph box $X$; glyph box width $g_i$, vertical offset $v_i$, inking level $d_i$; glyph shape parameters

  82. Rendering Model: glyph box $X$; glyph box width $g_i$, vertical offset $v_i$, inking level $d_i$; glyph shape parameters → Bernoulli pixel probs

  83-97. Rendering Model: glyph box $X$; glyph box width $g_i$, vertical offset $v_i$, inking level $d_i$; glyph shape parameters → Bernoulli pixel probs → sample pixels
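
The last step of the rendering model, sampling pixels from Bernoulli pixel probabilities, can be sketched as follows. The grid `theta` is taken as given; how it depends on glyph box width, vertical offset, and inking level is not modeled in this block, and the function names are illustrative assumptions.

```python
import math
import random


def sample_pixels(theta, rng):
    """Draw one binary glyph image; theta[row][col] is P(pixel is inked)."""
    return [[1 if rng.random() < p else 0 for p in row] for row in theta]


def log_likelihood(pixels, theta):
    """Bernoulli log-likelihood of a binary image; needs 0 < theta < 1."""
    total = 0.0
    for pixel_row, theta_row in zip(pixels, theta):
        for x, p in zip(pixel_row, theta_row):
            total += math.log(p if x else 1.0 - p)
    return total


# Toy 3x3 grid of pixel probabilities: a vertical stroke in the middle column.
theta = [
    [0.1, 0.9, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.9, 0.1],
]
image = sample_pixels(theta, random.Random(0))
print(image, log_likelihood(image, theta))
```

The same `log_likelihood` is what scores an observed glyph image against a candidate character during transcription: images that match the stroke pattern score higher.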

  98. Log-linear Interpolation

  99-100. Log-linear Interpolation: glyph shape parameters φ → Bernoulli pixel probs θ
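
One plausible reading of the φ → θ step is sketched below: the shape parameters φ are stored at a fixed width, linearly interpolated across columns to the requested glyph box width, and passed through a logistic function so that the interpolation happens in log-odds space. The slides name only "log-linear interpolation", so the interpolation weights, the additive `ink_bias`, and all function names are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def interpolation_weights(target_width, source_width):
    """Linear-interpolation weights from source columns to each target column."""
    weights = []
    for k in range(target_width):
        # Position of target column k measured in source-column units.
        if target_width == 1:
            pos = 0.0
        else:
            pos = k * (source_width - 1) / (target_width - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, source_width - 1)
        frac = pos - lo
        row = [0.0] * source_width
        row[lo] += 1.0 - frac
        row[hi] += frac
        weights.append(row)
    return weights


def pixel_probs(phi, target_width, ink_bias=0.0):
    """Map shape parameters phi (rows x source_width) to Bernoulli probs theta."""
    weights = interpolation_weights(target_width, len(phi[0]))
    return [
        [
            sigmoid(sum(a * p for a, p in zip(weights[k], phi_row)) + ink_bias)
            for k in range(target_width)
        ]
        for phi_row in phi
    ]


phi = [[-2.0, 0.0, 2.0]]          # one row, three source columns
theta = pixel_probs(phi, target_width=5)
print([round(t, 3) for t in theta[0]])
```

Sharing one φ across all widths means a single set of shape parameters per character can explain both narrow and wide renderings of the same glyph.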
