


  1. Research in the Acquisition Pipeline. David W. Embley; Christopher Almquist, Bill Barrett, Alan Cannaday, Robert Clawson, Jake Gehring, Doug Kennard, Tae Woo Kim, Steve Liddle, Peter Lindes, Deryle Lonsdale, Thomas Packer, Joseph Park, Pat Schone, Scott Woodfield

  2. Acquisition Pipeline: Strategic Planning • Field Negotiations • Field Capture • HQ Image & Metadata Ingest • Image Auditing • Cataloging • Collection Treatment • Waypointing • Book Scanning • Oral History Recording • Indexing • Post-Processing/Quality Control • Load to Search Engine, Publish

  3. Field Capture: Blur Detection (Alan Cannaday)

  4. Field Capture: Blur Detection: Sharp/Focused • Out of Focus • Motion Blur

  5. Field Capture: Blur Detection: Sharp/Focused • Out of Focus: “More than two transitional pixels between the edge of a high contrast line and the background.” • Motion Blur: “Transitional pixels in a single direction exceed one pixel.”
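
The out-of-focus criterion quoted on this slide (more than two transitional pixels between a high-contrast line and the background) can be illustrated on a single row of gray levels crossing one stroke edge. This is only a minimal sketch, not the detector from the talk: the function names, the gray-level cutoffs `dark_max` and `light_min`, and the sample profiles are all assumptions made for illustration, and the motion-blur criterion is not covered.

```python
def count_transitional(profile, dark_max=60, light_min=200):
    """Count pixels whose gray level lies strictly between ink and background.

    dark_max and light_min are assumed cutoffs, not values from the slides.
    """
    return sum(1 for value in profile if dark_max < value < light_min)


def looks_out_of_focus(profile, max_transitional=2):
    """Apply the 'more than two transitional pixels' rule to one edge crossing."""
    return count_transitional(profile) > max_transitional


# Hypothetical gray levels (0 = black ink, 255 = white paper) across one edge.
sharp_edge = [20, 22, 25, 130, 235, 240]
soft_edge = [20, 35, 80, 120, 160, 205, 240]

print(looks_out_of_focus(sharp_edge))
print(looks_out_of_focus(soft_edge))
```

A real detector would need to locate edges first and aggregate over many of them; the sketch only shows how the per-edge threshold reads in code.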

  6. Field Capture: Blur Detection [results figure: Pass; Failed, Passed: 82.0%, 83.5%; plotted series: Fail (smoothed), Blur (smoothed), Out of Focus (smoothed)]

  7. Load to Search Engine: Constraint Satisfaction (Scott Woodfield)

  8. Load to Search Engine: Constraint Satisfaction (Scott Woodfield). Existing assertions: blood type of father, mother, and child all A-. New assertion: child’s blood type B-. Conclusions: Probability = 0.0. (1) Parentage wrong (2) One or more blood types wrong
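
The blood-type example on this slide can be reproduced with a small feasibility check. The sketch below assumes the standard textbook ABO/Rh inheritance model (A and B codominant, O recessive, Rh-negative recessive) and ignores rare exceptions; it only answers "possible or impossible" rather than computing probabilities, and every function name in it is hypothetical rather than taken from the talk.

```python
from itertools import product

# Possible allele pairs behind each observed ABO type (textbook model).
GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}


def abo_type(allele1, allele2):
    """Map an allele pair to the ABO type it expresses."""
    alleles = {allele1, allele2}
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


def possible_child_abo(father, mother):
    """All ABO types a child of these two parents could have."""
    types = set()
    for father_genes, mother_genes in product(GENOTYPES[father], GENOTYPES[mother]):
        for from_father, from_mother in product(father_genes, mother_genes):
            types.add(abo_type(from_father, from_mother))
    return types


def possible_child_rh(father, mother):
    """Two Rh-negative parents can only have Rh-negative children."""
    return {"-"} if father == "-" and mother == "-" else {"+", "-"}


def consistent(father, mother, child):
    """Check a child assertion such as 'B-' against both parents' types."""
    abo_ok = child[:-1] in possible_child_abo(father[:-1], mother[:-1])
    rh_ok = child[-1] in possible_child_rh(father[-1], mother[-1])
    return abo_ok and rh_ok


print(consistent("A-", "A-", "A-"))  # existing assertions
print(consistent("A-", "A-", "B-"))  # new assertion from the slide
```

When the check fails, the assertions cannot all hold together, which corresponds to the two conclusions listed on the slide: either the parentage or at least one recorded blood type is wrong.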

  9. Automated “Green” Indexing. “Green”: improves with use, learns from user interaction • Intelligent Indexing • “Click” Annotator • GreenFIE-HD • Obituaries (100M+) – FROntIER – Machine Learning • Scanned Books (100K+) – ListReader – FormReader/TableReader – OntoSoar • GreenFIE-HD ++

  10. “Green” Intelligent Indexing (Robert Clawson, Doug Kennard, …, Bill Barrett)

  11. “Green” Intelligent Indexing

  12. “Green” Intelligent Indexing

  13. “Green” Intelligent Indexing

  14. Annotator (Christopher Almquist, …, Steve Liddle)

  15. Annotator (Christopher Almquist, …, Steve Liddle)

  16. GreenFIE-HD (Tae Woo Kim): “Green” Form-based Information Extraction for Historical Documents

  17. GreenFIE-HD: Extraction Rule Creation: \d{1}\.\s([A-Z][a-z]{2,6})\s([A-Z][a-z]{4,10}),\sb\.\s(\d{4}),\sd\.\s(\d{4})\.
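
To see what the extraction rule on this slide captures, it can be run as an ordinary regular expression. The sketch below uses Python's `re` module; the sample line and the field labels (`given_name`, `surname`, `birth_year`, `death_year`) are assumptions chosen for illustration, since the slide text does not say which document the rule was derived from or how the groups are named.

```python
import re

# The rule exactly as it appears on the slide.
RULE = re.compile(
    r"\d{1}\.\s([A-Z][a-z]{2,6})\s([A-Z][a-z]{4,10}),\sb\.\s(\d{4}),\sd\.\s(\d{4})\."
)

# Hypothetical labels for the four capture groups.
FIELDS = ("given_name", "surname", "birth_year", "death_year")


def extract(line):
    """Return the labeled fields for a matching line, or None if no match."""
    match = RULE.search(line)
    return dict(zip(FIELDS, match.groups())) if match else None


# Hypothetical entry in the numbered-list style the rule expects.
record = extract("1. John Smith, b. 1820, d. 1885.")
print(record)
```

Because the quantifiers are tight (for example `{2,6}` lowercase letters after the first capital), a line such as "1. Al Smith, b. 1820, d. 1885." would not match, which is the kind of miss the following slides address.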

  18. GreenFIE-HD: Recall Error Resolution. Unmatched text: i860. Rule before: \d{1}\.\s([A-Z][a-z]{2,8})\s([A-Z][a-z]{1,8}),\sb\.\s(\d{4})(\.|,\sd\.\s(\d{4})) Rule after: \d{1}\.\s([A-Z][a-z]{2,8})\s([A-Z][a-z]{1,8}),\sb\.\s(\d{4}|i\d{3})(\.|,\sd\.\s(\d{4}))
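
The effect of the recall fix can be checked by running both versions of the rule against a line containing "i860". This is a sketch under assumptions: the sample lines are hypothetical, and the spaces that appear inside the second rule on the slide are treated as extraction residue and removed.

```python
import re

# Rule before the fix: the birth year must be four digits.
BEFORE = re.compile(
    r"\d{1}\.\s([A-Z][a-z]{2,8})\s([A-Z][a-z]{1,8}),\sb\.\s(\d{4})(\.|,\sd\.\s(\d{4}))"
)

# Rule after the fix: also accepts a year whose leading digit was read as 'i'.
AFTER = re.compile(
    r"\d{1}\.\s([A-Z][a-z]{2,8})\s([A-Z][a-z]{1,8}),\sb\.\s(\d{4}|i\d{3})(\.|,\sd\.\s(\d{4}))"
)

noisy_line = "2. Mary Jones, b. i860."            # hypothetical line with 'i860'
clean_line = "2. Mary Jones, b. 1860, d. 1901."   # hypothetical clean line

print(BEFORE.search(noisy_line))            # the original rule misses the entry
print(AFTER.search(noisy_line).group(3))    # the relaxed rule captures the year text
print(AFTER.search(clean_line).group(3, 5))
```

The relaxed rule still matches well-formed entries, so the change improves recall without dropping lines the earlier rule already handled.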

  19. GreenFIE-HD: Precision Error Resolution. Rule before: \.\s([A-Z][a-z]{2,8})\s([A-Z][a-z]{1,8}), Rule after: \d{1}\.\s([A-Z][a-z]{2,8})\s([A-Z][a-z]{1,8}),\sb\.\s
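
The precision fix adds context on both sides of the name pattern (a leading entry number and a trailing "b." marker). The sketch below compares the two rules on a hypothetical passage in which a place name happens to look like a person's name; the passage is invented for illustration, and the spaces inside the second rule on the slide are treated as extraction residue.

```python
import re

# Rule before the fix: any two capitalized words after ". " and before ",".
LOOSE = re.compile(r"\.\s([A-Z][a-z]{2,8})\s([A-Z][a-z]{1,8}),")

# Rule after the fix: requires the entry number before and "b." after.
TIGHT = re.compile(r"\d{1}\.\s([A-Z][a-z]{2,8})\s([A-Z][a-z]{1,8}),\sb\.\s")

# Hypothetical text: one real entry plus a place name in running prose.
text = "3. Emma Brown, b. 1842. She lived in Salem. New Haven, Conn., was her last home."

loose_hits = LOOSE.findall(text)
tight_hits = TIGHT.findall(text)

print(loose_hits)  # includes the place name as a false positive
print(tight_hits)  # only the genuine entry remains
```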

  20. GreenFIE-HD: Principles • Look-ahead: automatic extraction • Look-behind: rule derivation and adjustment • “Green”: improves with use

  21. Obituaries with FROntIER (Joseph Park) (Fact Recognizer for Ontologies with Inference and Entity Resolution)

  22. Obituaries with FROntIER

  23. Obituaries with FROntIER

  24. Obituaries with FROntIER

  25. Obituaries with FROntIER

  26. Obituaries with FROntIER: Jordan Frost • Travis Frost • Michael Brian Frost • Bryce Frost • Alex Reed Frost • Brian Fielding & Susan Fox Frost • Kenneth Wesley & Ellen Frost • Dale & Anne Frost Elkins • Kent & Sally Frost Britton • Donald Glade & Lynn Frost • Donald Fielding Frost & Helen Glade Frost

  27. Obituaries with Machine Learning (Pat Schone)

  28. Obituaries with GreenFIE-HD

  29. ListReader (Thomas Packer)

  30. ListReader

  31. ListReader

  32. ListReader: (([\n])([\d]{1})(\.[ \n])(([A-Z]+[a-z]+|[A-Z]+[a-z]+[A-Z]+[a-z]+)))((,)([ \n])([\d]{4}))((\.)([\n])) Label Fields
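
One way to read the "Label Fields" step is as attaching names to the capture groups of the pattern shown on this slide. The sketch below does that with Python's `re` module. The label names, the choice of which groups carry content (3, 6, and 10, counted by opening parenthesis), and the sample entries are all assumptions for illustration; the slide does not specify them.

```python
import re

# The list-entry pattern exactly as it appears on the slide.
LIST_ENTRY = re.compile(
    r"(([\n])([\d]{1})(\.[ \n])(([A-Z]+[a-z]+|[A-Z]+[a-z]+[A-Z]+[a-z]+)))"
    r"((,)([ \n])([\d]{4}))((\.)([\n]))"
)

# Hypothetical labels for the groups that hold field content.
FIELD_LABELS = {3: "entry_number", 6: "name", 10: "year"}


def label_fields(entry_text):
    """Return a labeled record for one list entry, or None if it does not match."""
    match = LIST_ENTRY.search(entry_text)
    if match is None:
        return None
    return {label: match.group(index) for index, label in FIELD_LABELS.items()}


# Hypothetical entries; the pattern expects a newline on both sides.
first = label_fields("\n1. Andy, 1816.\n")
second = label_fields("\n2. McBride, 1820.\n")
print(first)
print(second)
```

The second alternative in the name group is what lets a name with an internal capital, such as the hypothetical "McBride", match. Note that the pattern consumes the newline on each side, so adjacent entries separated by a single newline would need a scanning strategy other than a plain left-to-right search.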

  33. ListReader (Thomas Packer)

  34. FormReader

  35. ChartReader, Table Reader, …: Jordan Frost • Travis Frost • Michael Brian Frost • Bryce Frost • Alex Reed Frost • Brian Fielding & Susan Fox Frost • Kenneth Wesley & Ellen Frost • Dale & Anne Frost Elkins • Kent & Sally Frost Britton • Donald Glade & Lynn Frost • Donald Fielding Frost & Helen Glade Frost

  36. OntoSoar (Peter Lindes, Deryle Lonsdale) 4/7/2014 BYU CS Colloquium

  37. OntoSoar (Peter Lindes, Deryle Lonsdale). Parse (links Xp, Wd, Ss, MVp, IN): ^ Mary died.v in 1853 . Soar: in(died,N4), 1853(N4), Mary(N2), died(N2). OntoES (ontology): Person(…), Name(…), DeathDate(…), Person(…) has Name(…), Person(…) died on DeathDate(…). OntoES (populated): Person(X1), Name(X2,"Mary"), DeathDate(X3,"1853"), Person(X1) has Name(X2), Person(X1) died on DeathDate(X3)

  38. OntoSoar (Peter Lindes, Deryle Lonsdale)

  39. OntoSoar (Peter Lindes, Deryle Lonsdale). Parse (links Xp, Ost, Js, Wd, Ss, A, Mp, DG): ^ Emma was.v official.a historian.n of the NYCDAR . Soar: “of”(x1,x2), “NYCDAR”(x2), “Emma”(x1), “historian”(x1), “official”(x1). OntoES: Name(“Emma”), Officer(“historian”), Organization(“NYCDAR”), Person-Name(y1,“Emma”), Person-Officer-Organization(y1,“official historian”,“NYCDAR”)

  40. GreenFIE-HD ++: FROntIER • ListReader • GreenFIE-HD • OntoSoar. Ever learning & improving

  41. Research Wish List (shown alongside the Acquisition Pipeline stage list) • OCR alignment with images across fonts and typesetting layouts • Automated extraction from filled-in forms, tables and ahnentafel templates • Semantic OCR error correction

  42. Research Wish List (Jake Gehring) (shown alongside the Acquisition Pipeline stage list) • Facial recognition based on labeled faces in other photos • Social/collaborative indexing environments • Snippet indexing on mobile devices

  43. Research Wish List (Jake Gehring) (shown alongside the Acquisition Pipeline stage list) • Extraction of lineage-linked data in register-style tables • Handwriting recognition • Search results clustering based on kinship networks

  44. Research Wish List (Jake Gehring) (shown alongside the Acquisition Pipeline stage list) • Extraction of lineage-linked data from text • Newspaper scanning, zoning, article concatenation • Records hinting for historical collections

  45. Research Wish List (Jake Gehring) (shown alongside the Acquisition Pipeline stage list) • Automatic document classification • Image capture software to eliminate blur and focus issues • Efficient routing of work to volunteers

  46. Summary. Streamline the Pipeline: Research Opportunities. “turn … the heart of the children to their fathers”
