Efficient Binarization for Historical Document Analysis
Florian Westphal, Håkan Grahn, Niklas Lavesson
Blekinge Institute of Technology, Karlskrona, Sweden
flw@bth.se
2016-02-02
F. Westphal, H. Grahn, N. Lavesson (BTH) Efficient Binarization 2016-02-02 1 / 20
Outline
1 Document Readability
2 Howe’s Binarization Algorithm
3 Heterogeneous Computing
4 Binarization Pipeline
BTH & ArkivDigital
BTH: Swedish university, established in 1989. Over 6000 registered students. BigData@BTH.
ArkivDigital: Swedish company, established in 2004. Provides access to almost 60 million images: church books, court records, military records, census records, ...
Document Readability
Approach
Approach - Demo
Howe’s Binarization Algorithm
Howe’s Binarization Algorithm (Cont.)
Heterogeneous Computing
Binarization Pipeline
Binarization Pipeline (Cont.)
Step   I     II    III   IV    V     VI    VII   VIII
1      CPU   GPU   CPU   GPU   CPU   GPU   CPU   GPU
2      CPU   CPU   GPU   GPU   CPU   CPU   GPU   GPU
3      CPU   CPU   CPU   CPU   GPU   GPU   GPU   GPU
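The eight configurations are all possible CPU/GPU assignments for the three binarization steps. As a minimal sketch, the table can be regenerated in Python. The function name `configurations` and the `ROMAN` list are hypothetical and used only for illustration; they are not taken from the authors' implementation.

```python
from itertools import product

# Roman numerals in the order the configurations appear in the table.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]


def configurations():
    """Map each configuration numeral to its (step 1, step 2, step 3) devices."""
    configs = {}
    # product() varies its last element fastest. The table varies step 1
    # fastest, so the tuples are read as (step 3, step 2, step 1) and reordered.
    for name, (s3, s2, s1) in zip(ROMAN, product(("CPU", "GPU"), repeat=3)):
        configs[name] = (s1, s2, s3)
    return configs


if __name__ == "__main__":
    for name, devices in configurations().items():
        print(f"{name:>4}: " + "  ".join(devices))
```

Read this way, configuration IV runs steps 1 and 2 on the GPU and step 3 on the CPU, and configuration VIII runs every step on the GPU.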
Preliminary Results
[Figure panels: Reference Implementation; Configuration IV]
Preliminary Results (Cont.)
[Figure panels: Reference Implementation; Configuration VIII]
Preliminary Results - Binarization Performance
[Figure: Pseudo F-Measure in % (axis 75 to 100) and DRD (axis 2 to 12), plotted for the categories C and G]
Preliminary Results - Time
[Figure: H-DIBCO 2014 Benchmark. Time in Seconds (axis 5 to 11) for configurations I to VIII, each labeled with its C/G assignment per step; series: Own, Reference]
Preliminary Results - Time (Cont.)
[Figure: High Resolution Image. Time in Seconds (axis 10 to 40) for configurations I to VIII, each labeled with its C/G assignment per step; series: Own, Reference]
Preliminary Results - Time per Step
Time taken in each binarization step for the high-resolution image used.
Step   1        2        3
CPU    2.27 s   0.17 s   28.76 s
GPU    0.39 s   0.11 s   14.54 s
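The per-step times above allow a rough estimate of the total time for any CPU/GPU assignment. The sketch below assumes that step times simply add up and that host-to-device transfers and other overheads are negligible. That is an assumption made here for illustration, not a claim from the slides, so measured totals may differ. The names `STEP_TIMES`, `estimated_total`, and `step_speedups` are hypothetical.

```python
# Per-step times in seconds for the high-resolution image (steps 1, 2, 3),
# copied from the slide.
STEP_TIMES = {
    "CPU": (2.27, 0.17, 28.76),
    "GPU": (0.39, 0.11, 14.54),
}


def estimated_total(devices):
    """Sum the per-step times for a (step 1, step 2, step 3) device tuple.

    Assumption: steps run one after another and transfer costs are ignored.
    """
    return sum(STEP_TIMES[dev][step] for step, dev in enumerate(devices))


def step_speedups():
    """Return the CPU time divided by the GPU time for each step."""
    return tuple(c / g for c, g in zip(STEP_TIMES["CPU"], STEP_TIMES["GPU"]))


if __name__ == "__main__":
    for devices in (("CPU", "CPU", "CPU"), ("GPU", "GPU", "CPU"), ("GPU", "GPU", "GPU")):
        print(devices, f"{estimated_total(devices):.2f} s")
    print("speedup per step:", [f"{s:.2f}" for s in step_speedups()])
```

Under this additive assumption, the all-CPU assignment sums to 31.20 s and the all-GPU assignment to 15.04 s. Step 3 dominates in both cases, so moving it to the GPU accounts for most of the estimated gain, while step 1 shows the largest relative speedup (about 5.8 times).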
Next Steps
Revision of the implementation
Implementation of the binarization pipeline
Acknowledgements
We would like to thank ArkivDigital for providing us with access to their image database. This work is part of the research project "Scalable resource-efficient systems for big data analytics" funded by the Knowledge Foundation (grant: 20140032) in Sweden.