Extending an atomistic Fedora Commons object model to facilitate image segmentation and enhance discovery. David Lacy (david.lacy@villanova.edu), Villanova University. Open Repositories 2013, Prince Edward Island, July 11th, 2013.


  1. Extending an atomistic Fedora Commons object model to facilitate image segmentation and enhance discovery. David Lacy (david.lacy@villanova.edu), Villanova University. Open Repositories 2013, Prince Edward Island, July 11th, 2013.

  2. digital.library.villanova.edu ● Our repository has large amounts of scanned/paginated resources – Books – Manuscripts – Newspapers – Theses – Scrapbooks – etc

  3. Topics ● Existing Model, Hierarchy and View ● Extensions – Image Segmentation – Page Level Search Results

  4. Basic Model (diagram labels: Collection, Core, Data)

  5. Enhanced Model (diagram labels, in source order: Folder, Folder, Resource, Collection, List, Core, Image, Data, Document, Audio, Video)

  6. Object Hierarchy (rel:isMemberOf) ● Dime Novel Collection (Folder) – Bride of the Tomb (Resource): Page 1 (Image), Page 2 (Image), Page 3 (Image)

  7. Hierarchy with multiple relationships (1) (rel:isMemberOf) ● Dime Novel Collection (Folder) ● Series List (Folder) ● Buffalo Bill (Folder) ● Fiction (Folder)

  8. Hierarchy with multiple relationships (2) (rel:isMemberOf) ● Dime Novel Collection (Folder) – Bride of the Tomb (Resource) ● Page Images (List) – Page 1 (Image) – Page 2 (Image) – Page 3 (Image) ● Chapters (List) – Chapter 1 (List) – Chapter 2 (List) ● Chapter members shown in the diagram – Page 33 (Image) – Page 34 (Image) – Page 35 (Image)

  9. Basic Object Hierarchy in Solr ● Objects included in Solr – Resource Objects – Folder Objects ● Each Solr Record includes parent record ID(s) – Facilitates browsing collections
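
A minimal sketch of how parent record IDs could be collected for a Solr record by walking rel:isMemberOf relationships, including objects with more than one parent. The PIDs, the `IS_MEMBER_OF` table, and the field names `parent_ids` and `ancestor_ids` are all hypothetical; the slides do not show the actual indexing code or schema.

```python
from collections import deque

# Hypothetical rel:isMemberOf edges: child PID -> list of parent PIDs.
# Every PID here is invented for illustration.
IS_MEMBER_OF = {
    "demo:page1": ["demo:bride"],
    "demo:bride": ["demo:dime", "demo:fiction"],
    "demo:fiction": ["demo:dime"],
    "demo:dime": [],
}


def ancestor_ids(pid, edges):
    """Return every ancestor of pid, nearest first, without duplicates."""
    seen = set()
    order = []
    queue = deque(edges.get(pid, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # already reached through another relationship
        seen.add(parent)
        order.append(parent)
        queue.extend(edges.get(parent, []))
    return order


def solr_record(pid, model, edges):
    """Build a dict shaped like a Solr document (field names are assumed)."""
    return {
        "id": pid,
        "model": model,
        "parent_ids": list(edges.get(pid, [])),
        "ancestor_ids": ancestor_ids(pid, edges),
    }


record = solr_record("demo:bride", "Resource", IS_MEMBER_OF)
print(record)
```

Storing the direct parents separately from the full ancestor list is one possible design: the first supports a browse tree, the second supports "everything under this collection" filters.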

  10. Browse Hierarchy

  11. Browse Hierarchy

  12. Browse Hierarchy Tree

  13. Search Resources and Folders

  14. Moving forward... We have a large amount of scanned pages

  15. That is, we have lots of stuff that looks like this

  16. We want to expose this

  17. But I want to work on this instead

  18. The Plan ● Define segments of Images and extract to create new objects ● Create new Article Resources from these new images

  19. Image Object ● Composed using Fedora's “mix-in” approach, combining the following models: – Core Model – Data Model – Image Model

  20. Core Model ● Datastreams – THUMBNAIL – PARENT-LIST ● Methods – getThumb – generateParentList

  21. Data Model ● Datastreams – MASTER – MASTER-MD ● Methods – generateMetadata

  22. Image Data Model ● Datastreams – LARGE – MEDIUM – OCR-DIRTY ● Methods – generateDerivative – generateOCR

  23. Image Object ● Datastreams – THUMBNAIL – PARENT-LIST – MASTER – MASTER-MD – MEDIUM – LARGE – OCR-DIRTY ● Methods – getThumb – generateParentList – generateMetadata – generateDerivative – generateOCR
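
The way an Image Object picks up datastreams and methods from several models can be illustrated with a small sketch. This is a plain-Python analogy, not Fedora's API: the classes and the `combine` helper are hypothetical, and only the datastream and method names come from the slides.

```python
class CoreModel:
    datastreams = ("THUMBNAIL", "PARENT-LIST")
    methods = ("getThumb", "generateParentList")


class DataModel:
    datastreams = ("MASTER", "MASTER-MD")
    methods = ("generateMetadata",)


class ImageModel:
    datastreams = ("LARGE", "MEDIUM", "OCR-DIRTY")
    methods = ("generateDerivative", "generateOCR")


def combine(*models):
    """Merge datastream and method names from each model, keeping order."""
    datastreams = []
    methods = []
    for model in models:
        datastreams += [d for d in model.datastreams if d not in datastreams]
        methods += [m for m in model.methods if m not in methods]
    return {"datastreams": datastreams, "methods": methods}


# An Image Object is described as Core + Data + Image.
image_object = combine(CoreModel, DataModel, ImageModel)
print(image_object["datastreams"])
print(image_object["methods"])
```

Adding a fourth model to the `combine` call is all it takes to extend the object in this sketch, which mirrors how the Segment Model is layered on later in the talk.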

  24. Segment Image: Extension of Image Object ● Composed using Fedora's “mix-in” approach, combining the following models: – Core Model – Data Model – Image Model – Segment Model

  25. Segment Image Model – Part 1: New elements ● Datastreams – COORDINATES ● Methods – generateSegment

  26. Segment Object ● Datastreams – THUMBNAIL – PARENT-LIST – MASTER – MASTER-MD – MEDIUM – LARGE – OCR-DIRTY – COORDINATES ● Methods – getThumb – generateParentList – generateMetadata – generateDerivative – generateOCR – generateSegment

  27. Segment Image Model – Part 2: New relationship – rel:isPartOf ● Article Segment 1 (Segment) rel:isPartOf Page 1 (Image)

  28. Hierarchy of Segmented Images ● March 2003 (Resource) – Page List (List) – Page 1 (Image) ● rel:isPartOf Page 1 – Article A (Segment) – Article B (Segment)

  29. Segment Image Model – Part 3: Creating a new MASTER datastream ● Article Segment 1 (Segment) rel:isPartOf Page 1 (Image) ● generateSegment takes the MASTER of Page 1 and the COORDINATES of Article Segment 1 and produces the Segment's MASTER
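
A minimal sketch of what a generateSegment step might do: crop a rectangle out of the parent image's MASTER to produce the segment's MASTER. The slides do not specify how COORDINATES are stored, so the `(x, y, width, height)` tuple is an assumption, and the image is modeled as a plain 2-D list of pixel values rather than a real image file.

```python
def generate_segment(master, coords):
    """Crop a rectangular region from a 2-D pixel grid.

    master: list of rows, each a list of pixel values.
    coords: (x, y, width, height), an assumed COORDINATES layout.
    """
    x, y, w, h = coords
    if w <= 0 or h <= 0:
        raise ValueError("width and height must be positive")
    rows = len(master)
    cols = len(master[0]) if master else 0
    if x < 0 or y < 0 or x + w > cols or y + h > rows:
        raise ValueError("coordinates fall outside the parent image")
    # Slice the wanted rows first, then the wanted columns in each row.
    return [row[x:x + w] for row in master[y:y + h]]


# A tiny 3-row by 4-column "page" whose pixel values encode their position.
page_master = [[r * 10 + c for c in range(4)] for r in range(3)]
segment_master = generate_segment(page_master, (1, 1, 2, 2))
print(segment_master)
```

Rejecting out-of-bounds coordinates up front keeps a bad COORDINATES value from silently yielding a truncated segment.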

  30. Interface for generating COORDS

  31. Image MASTER and Segment MASTER

  32. Segment Object ● Datastreams – THUMBNAIL – PARENT-LIST – MASTER – MASTER-MD – MEDIUM – LARGE – OCR-DIRTY – COORDINATES

  33. Segments within a Resource (rel:isMemberOf) ● Taj Mahal Interview (Resource) – Segment List (List): Part 1 (Segment), Part 2 (Segment), Part 3 (Segment)

  34. Complex Object Hierarchy ● March 2003 (Folder) – Page List (List): Page 1 (Image), Page 2 (Image), Page 3 (Image) – Article List (List): Taj Mahal Interview (Resource) ● Taj Mahal Interview (Resource) – Segment List (List): Part 1 (Segment), Part 2 (Segment) ● Segments are linked to page Images by rel:isPartOf

  35. Resource with multiple List Objects

  36. Article List Expanded

  37. Pages List Expanded

  38. Front End / Solr

  39. Current Solr Result Set: Folders and Resources ● Record: PID = Resource ● Record: PID = Resource ● Record: PID = Folder ● Record: PID = Resource

  40. Front End: Existing Results

  41. Front End: Existing Results

  42. This works, but, as mentioned before, matching text on page 30 will return the entire Resource

  43. Expose page-specific matches by ingesting data objects too

  44. Total Objects ● 18,000+ Resource Objects ● 600+ Folder Objects ● 220,000+ Data objects

  45. Solr Field Collapsing ● Group results based on a shared Solr field – <parentGroup/> ● Data Objects – <parentGroup/> = Parent Resource ● Folders and Resources – <parentGroup/> = Self
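
The grouping rule above can be sketched as follows. The record layout (`type`, `parent_resource`) and the PIDs are hypothetical. The query parameters `group`, `group.field`, and `group.limit` are Solr's standard Result Grouping parameters; whether the presented system used exactly these is an assumption.

```python
from urllib.parse import urlencode


def parent_group(record):
    """Data Objects group under their parent Resource; others under themselves."""
    if record["type"] == "data":
        return record["parent_resource"]
    return record["id"]


def collapse(records):
    """Group record IDs by their parentGroup value, keeping hit order."""
    groups = {}
    for record in records:
        groups.setdefault(parent_group(record), []).append(record["id"])
    return groups


def grouped_query(text, limit=5):
    """Build a query string that asks Solr to group hits by parentGroup."""
    return urlencode({
        "q": text,
        "group": "true",
        "group.field": "parentGroup",
        "group.limit": limit,
    })


hits = [
    {"id": "demo:page27", "type": "data", "parent_resource": "demo:march"},
    {"id": "demo:march", "type": "resource"},
    {"id": "demo:page30", "type": "data", "parent_resource": "demo:march"},
    {"id": "demo:dime", "type": "folder"},
]
print(collapse(hits))
print(grouped_query("taj mahal"))
```

With this rule a Resource and its matching pages land in the same group, so the front end can show one entry per Resource and still link to the specific pages that matched.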

  46. Collapsed Solr Result Set: Folders, Resources, and Data Objects ● Display Groups as search results instead of Records ● Records within Groups can direct patrons to specific pages within Resources ● Diagram: Groups, each with PID = Resource, containing Record / Image and Record / Resource entries

  47. Advanced Solr Results

  48. Taj Mahal Interview

  49. Taj Mahal Interview

  50. March Issue, page 27

  51. Lists in Accordion

  52. Lists in Accordion

  53. Hangups ● Null Resource hit on query ● Multiple collection memberships in Solr – Cannot sort on a multi-value field

  54. Acknowledgments ● Demian Katz, Villanova University ● Chris Hallberg, Villanova University ● Eoghan Ó Carragáin, National Library of Ireland
