
Preserving Geospatial Data: The National Geospatial Digital Archive’s Approach



  1. Preserving Geospatial Data: The National Geospatial Digital Archive’s Approach Greg Janée UC Santa Barbara

  2. NGDA genesis • One of eight initial NDIIPP partners • Members – UCSB, Stanford, UT Knoxville, Vanderbilt • Goal – How to preserve geospatial data, on a national scale, for future generations? Archiving 2009 • 2009-05-05 2

  3. Three questions • What’s special about geospatial? • Are there any design principles that can last a century? • Can we define a useful, implementable, minimal level of preservation?

  4. Geospatial data • Representations of Earth’s surface – remote-sensing imagery – aerial photography – maps – sensor data – GIS data [Diagram labels: georeferenced (geotagged photos, documents); geospatial]

  5. Challenges • No uniform data model – vector, raster, topological, discrete, continuous, … • Proprietary formats ⇒ Many barriers to data mobility [Diagram labels: formats, tools]

  6. Challenges (cont.) • Multiple granule sizes – features – layers – databases – projects – cartographic end products • Relational data – geodatabases [Example file listing: a0000004d.gdbindexes a0000004d.gdbtable a0000004d.gdbtablx a0000004e.blk_key_index.atx a0000004e.col_index.atx a0000004e.gdbindexes a0000004e.gdbtable a0000004e.gdbtablx a0000004e.row_index.atx a0000004f.gdbindexes a0000004f.gdbtable a0000004f.gdbtablx a00000050.gdbtable a00000050.gdbtable.sdc a00000050.gdbtable.sdc.prj a00000050.gdbtable.sdi …]

  7. Challenges (cont.) • Large extent – storage – time • Extensive context • Implicit context • Dynamic data [Quoted example: “Visit the USGS Landsat website for important information regarding: • ground station facts, • Landsat calibration parameter file details, • satellite ephemeris information, • satellite anomaly investigations, • data acquisition information, • image processing particulars, • data product guidance, • SLC-off data product details, • and sample data products.” http://landsat.gsfc.nasa.gov/data/tech_details.html]

  8. Ocean color example [Diagram: surface radiance (SeaWiFS, MODIS, ...) feeding a semianalytic model* that yields chlorophyll, ...] * S. Maritorena, D. Siegel (2005), Consistent merging of satellite ocean color data sets using a bio-optical model, Remote Sens. Env. 94 (4):429–440, doi:10.1016/j.rse.2004.08.014

  9. User’s view [Diagram: surface radiance (SeaWiFS, MODIS, ...) feeding a semianalytic model* that yields chlorophyll, ...; highlighted for the user: metadata, data format (HDF)]

  10. Preservation of use (only) [Diagram: surface radiance (SeaWiFS, MODIS, ...) feeding a semianalytic model* that yields chlorophyll, ...] • metadata, data format (HDF): preserve & migrate

  11. The curse of reprocessing • SeaWiFS * – Reprocessing 5.2 - Completed July 12, 2007 – Reprocessing 5.1 - Completed July 5, 2005 – Reprocessing 5 - Completed March 18, 2005 – Reprocessing 4.1 - Completed May 24, 2004 – Reprocessing 4 - Completed July 25, 2002 – Reprocessing 3 - Completed May 24, 2000 • Calibration Update - December 1, 2000 • Calibration Update - April 10, 2001 – Reprocessing 2 - August, 1998 – Reprocessing 1 - January, 1998 [Callout beside Reprocessing 4 and 3: new atmospheric, solar irradiance models] * http://oceancolor.gsfc.nasa.gov/REPROCESSING/

  12. Preservation of functionality [Diagram: surface radiance (SeaWiFS, MODIS, ...) feeding a semianalytic model* that yields chlorophyll, ...; links labeled lineage and dependency] • algorithms, software, calibration, ..., metadata, data format (HDF): preserve, migrate, reprocess, revalidate
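The lineage and dependency links on this slide can be illustrated with a small sketch. It assumes a hypothetical lineage table (the object names and the `affected_by` helper are invented for illustration and are not part of NGDA) and shows how recorded dependencies make it possible to find what needs reprocessing and revalidation when one input, such as a calibration table, changes.

```python
from collections import defaultdict, deque

# Hypothetical lineage table: each derived object lists the inputs it depends on.
LINEAGE = {
    "surface_radiance": ["seawifs_raw", "calibration_table"],
    "chlorophyll": ["surface_radiance", "semianalytic_model"],
}


def affected_by(changed, lineage):
    """Return every object that depends, directly or transitively, on `changed`."""
    # Invert the table: input -> products derived from it.
    dependents = defaultdict(list)
    for product, inputs in lineage.items():
        for item in inputs:
            dependents[item].append(product)

    # Breadth-first walk over the inverted edges.
    seen, queue = [], deque([changed])
    while queue:
        current = queue.popleft()
        for product in dependents[current]:
            if product not in seen:
                seen.append(product)
                queue.append(product)
    return seen


print(affected_by("calibration_table", LINEAGE))
```

Without the lineage table, a calibration update gives no way to tell which archived products have become stale, which is one reading of why the slide lists algorithms, software, and calibration alongside the data itself.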

  13. Ozone reprocessing requirements • Calibration artifacts – data – analysis tools – tables – logs – notebooks – instrument design • Databases • Software (source code) • xDRs • Delivered IPs • Engineering data (incl. C3S data if not in RDRs) • Upload files • All project documentation • All scientific papers • All reports [Source: Mike Linda, “OMPS Aggregation and Packaging,” 2006 CLASS Users’ Workshop]

  14. Challenges: conclusion • NGDA archive design requirements: – compound objects – aggregations and inter-object relationships – extensive context – equal treatment of data, context • Unmet challenges: – storage size – proprietary formats – relational data

  15. Relay principle • A preservation system should support its own migration [Diagram: a chain of systems (system, system, ..., system) along a timeline from now to 100 years]

  16. Fallback principle [Diagram: two stacks, each an archive system over a storage system, connected by export and ingest]

  17. Fallback principle • A preservation system should support some form of handoff of its content even if the system itself is no longer functional. [Diagram: two stacks, each an archive system over a storage system]
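One way to picture the handoff described above is the following minimal sketch. It assumes an invented on-disk layout (a directory per object holding `data.bin` and a `metadata.json` sidecar) and invented helper names; it is not the NGDA implementation. The point is that `ingest_all` rebuilds a catalog from the storage tree alone, so the handoff still works if the exporting software no longer runs.

```python
import json
import tempfile
from pathlib import Path


def export_object(root, identifier, payload, metadata):
    """Write one object as plain files: the data plus a JSON metadata sidecar."""
    obj_dir = Path(root) / identifier
    obj_dir.mkdir(parents=True, exist_ok=True)
    (obj_dir / "data.bin").write_bytes(payload)
    (obj_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return obj_dir


def ingest_all(root):
    """Rebuild a catalog by reading the storage tree only, with no exporter involved."""
    catalog = {}
    for meta_path in sorted(Path(root).glob("*/metadata.json")):
        obj_dir = meta_path.parent
        catalog[obj_dir.name] = {
            "metadata": json.loads(meta_path.read_text(encoding="utf-8")),
            "size": (obj_dir / "data.bin").stat().st_size,
        }
    return catalog


with tempfile.TemporaryDirectory() as storage:
    # Hypothetical identifier and metadata, for illustration only.
    export_object(storage, "obj-0001", b"example bytes", {"title": "hypothetical map"})
    recovered = ingest_all(storage)

print(recovered)
```

The design choice being illustrated is that everything needed for recovery lives in ordinary files next to the data, not in a database that only the original system can read.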

  18. iPhoto example [Directory listing: iPhoto Library/ 2008/ 11/ DSC_0035.jpg DSC_0036.jpg 12/ DSC_0042.jpg ... AlbumData.xml Dir.data Library.data …] [Callouts beside the listing: • all metadata • self-describing schema]

  19. Resurrection principle • A preservation system should allow archived information to lapse out of usability, but at all times should support future resurrection of full use of the information. [Diagram labels: fully curated, somewhat usable, resurrectable, along a timeline from now to 100 years]

  20. NGDA archive system [Layered diagram, top to bottom] • archive – custom software – management, policies, services, access • logical data model – instantiation of OAIS – standard packaging of data, semantics • physical data model – filesystems, files, XML – survivable, vendor-neutral representation of above • storage virtualization layer – Logistical Networking – seamless movement, reliability, redundancy

  21. Physical data model • object structure • fixity metadata • inter- and intra-object relationships [Directory listing, with the pathname labeled “identifier”: ...pathname/ manifest.xml cnty24k97.xml data/ source/ cnty24k97.shp cnty24k97.dbf ... cnty24k97.png]
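To make the fixity idea on this slide concrete, here is a minimal sketch of writing and checking a manifest. The slide does not show the contents of NGDA's `manifest.xml`, so the element and attribute names below (`manifest`, `file`, `path`, `sha256`) are assumptions made for illustration, as are the helper names; the sketch covers object structure and fixity only, not relationships.

```python
import hashlib
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_manifest(obj_dir):
    """Record each file's relative path and digest in manifest.xml (invented layout)."""
    obj_dir = Path(obj_dir)
    root = ET.Element("manifest", {"identifier": obj_dir.name})
    for path in sorted(p for p in obj_dir.rglob("*") if p.is_file()):
        if path.name == "manifest.xml":
            continue  # the manifest does not list itself
        ET.SubElement(root, "file", {
            "path": path.relative_to(obj_dir).as_posix(),
            "sha256": sha256_of(path),
        })
    ET.ElementTree(root).write(
        str(obj_dir / "manifest.xml"), encoding="utf-8", xml_declaration=True
    )


def verify_manifest(obj_dir):
    """Return relative paths whose current digest no longer matches the manifest."""
    obj_dir = Path(obj_dir)
    root = ET.parse(str(obj_dir / "manifest.xml")).getroot()
    return [
        entry.get("path")
        for entry in root.iter("file")
        if sha256_of(obj_dir / entry.get("path")) != entry.get("sha256")
    ]


with tempfile.TemporaryDirectory() as tmp:
    obj = Path(tmp) / "example-object"
    (obj / "data").mkdir(parents=True)
    shp = obj / "data" / "cnty24k97.shp"
    shp.write_bytes(b"placeholder bytes, not a real shapefile")
    write_manifest(obj)
    intact = verify_manifest(obj)      # nothing has changed yet
    shp.write_bytes(b"silently corrupted")
    damaged = verify_manifest(obj)     # the altered file is reported

print(intact, damaged)
```

Because the manifest is a plain XML file stored beside the data, the check needs nothing beyond a filesystem and a hash function, which fits the "survivable, vendor-neutral" goal stated for the physical data model.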

  22. Defining context • Community-related problems – distributed, implicit, inscrutable to outsiders – “known well to those that know it well” • Semantic problems – formal semantics are too hard – multiple, conflicting, informal specifications – multiple software implementations • Conclusion – context defined by community of practice

  23. Capturing context [Diagram: sources of context (software, project wikis, metadata, documentation, scientific literature), a “?”, and an archive holding AIPs]

  24. NGDA format registry [Diagram labels: community, wiki page + templated uploads; automatic synchronization, curator mediation; repository, archival object; curators]

  25. Acknowledgements • UC Santa Barbara – James Frew – Catherine Masi – Justin Mathena • Stanford – Nancy Hoebelheinrich – Keith Johnson – Julie Sweetkind-Singer • UT Knoxville – Micah Beck – Terry Moore – Adam Ross • NCSU – Steve Morris • EDINA – Guy McGarva
