Retrieving OCR Text: A Survey of Current Approaches - PowerPoint PPT Presentation

Mar 20, 2023

Retrieving OCR Text: A Survey of Current Approaches
Information Retrieval Lab
Illinois Institute of Technology
S. Beitzel, E. Jensen, D. Grossman
{steve, ej, grossman}@ir.iit.edu

Overview
• Models for OCR Text
• Processing OCR Text for Categorization
• Auto-correction of OCR Errors

Models for OCR Text
• Mittendorf, Schauble, and Sheridan (1995, 1996)
  – Incorporate probabilities of typical OCR errors
• Harding, Croft, and Weir (1997)
  – Addition of character-based n-grams to the model
  – Ex: Environment
    • _en env nvi vir iro ron onm nme men ent (3-grams)

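The character n-gram decomposition in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the indexing code used by Harding, Croft, and Weir: the function name `char_ngrams`, the lowercasing, and the use of a single `_` pad character on both ends of the term are assumptions made for the example.

```python
def char_ngrams(term: str, n: int = 3, pad: str = "_") -> list[str]:
    """Return the overlapping character n-grams of a term.

    Assumption: the term is lowercased and padded with one `pad`
    character on each side, so word boundaries yield their own n-grams.
    """
    padded = f"{pad}{term.lower()}{pad}"
    # Slide a window of width n across the padded term.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    print(" ".join(char_ngrams("Environment")))
```

Because each n-gram covers only a few characters, a single misrecognized character changes at most n of a term's n-grams and leaves the rest available for matching.
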
Auto-Correction of OCR Errors
• Liu (1991)
  – Classify each type of error
  – Use dictionary lookup to identify candidate terms
• Taghva, Borsack, and Condit (1994)
  – Clustering to group misspellings with their correctly spelled terms

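As a rough illustration of dictionary lookup for candidate terms, the sketch below uses `difflib.get_close_matches` from the Python standard library. It is a stand-in for the general idea only and is not the method of Liu (1991): the tiny `LEXICON`, the function name `candidate_terms`, and the similarity cutoff of 0.75 are all assumptions chosen for the example.

```python
import difflib

# Hypothetical lexicon; a real system would load a full dictionary.
LEXICON = ["environment", "retrieval", "recognition", "document", "character"]


def candidate_terms(token, lexicon=LEXICON, max_candidates=3, cutoff=0.75):
    """Return lexicon entries that may be the intended form of an OCR token."""
    token = token.lower()
    if token in lexicon:
        # The token is already a known term, so no correction is needed.
        return [token]
    # Rank lexicon entries by string similarity and keep the closest ones.
    return difflib.get_close_matches(token, lexicon, n=max_candidates, cutoff=cutoff)


if __name__ == "__main__":
    # "rn" misread for "m" is a commonly cited kind of OCR confusion.
    print(candidate_terms("docurnent"))
```

A lower cutoff returns more candidates at the cost of more false matches, so the threshold would need tuning against real OCR output.
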
OCR Text for Categorization
• Hoch (1994)
  – Applied a categorizer to OCR text; performance degraded on OCR data
• Junker and Hoch (1997)
  – N-grams showed some improvement on OCR text [Junk97]

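One way to see why n-gram features might degrade more gracefully than whole-word features on OCR text is to compare a term with a corrupted variant. The sketch below is an illustration under stated assumptions, not the feature set used by Junker and Hoch: the helper names `trigram_set` and `dice` are hypothetical, and the Dice coefficient is chosen here only as a simple overlap measure.

```python
def trigram_set(term: str) -> set[str]:
    """Return the set of character 3-grams in a term (no padding)."""
    term = term.lower()
    return {term[i:i + 3] for i in range(len(term) - 2)}


def dice(a: str, b: str) -> float:
    """Dice coefficient between the trigram sets of two terms."""
    grams_a, grams_b = trigram_set(a), trigram_set(b)
    if not grams_a and not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    clean, noisy = "environment", "envirorment"  # one misrecognized character
    print("exact word match:", clean == noisy)
    print("trigram overlap: ", round(dice(clean, noisy), 3))
```

The two strings share no whole-word feature, yet most of their trigrams coincide, so a categorizer built on n-gram features still receives much of the same evidence.
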
Summary
• Models exist for OCR retrieval
• N-grams have been shown to have some success
• No large standard test collection of OCR data exists; small collections exist with some early TREC data

