Prioritizing use over perfection: a risk management approach to digital preservation
Matthew Mihalik, George Washington University
Rachel Trent, Library of Congress (formerly George Washington University)
Introduction
GW has committed to a transparent digital stewardship strategy that prioritizes access to digital assets over adherence to digital preservation standards. Our current strategy is informed by our past experience, our failures, and our users' needs.
Distant history
We invested 1.5 years developing a custom storage environment, an ID system, and custom inventory and audit tools.
Features: file system inventory, web management UI, synchronization between storage environments, and built-in checksum auditing.
Outcome: the project became paralyzed by its complexity and was never adopted.
Recent history
● No articulated commitments for our digital preservation work
● Disconnected storage environments, split between access and preservation
● No active auditing or inventorying of digital assets
● Patrons could access only a subset of our digital collections
Initial risk landscape
● No available inventory of digital assets on storage
● No access controls to preservation servers
● No auditing of digital assets on preservation servers
● Limited redundant copies of digital assets
● No clearly articulated policy of commitments to our digital content
● Unclear roles and responsibilities between GW units for management of digital assets
What did we decide to do about it?
We defined a set of principles that our stakeholders could commit to, and we defined our minimum viable product. Our stakeholders decided that we should first provide access to more of our digital collections, and then ensure that we know what we have, where we have it, and whether it has changed.
New mission statement
GW Libraries’ Digital Stewardship Services provides long-term preservation of selected unique, rare, and institutionally-created digital materials, such as student and faculty research products, University records of enduring value, and specialized cultural heritage collections. These include born-digital and digitized materials.
Transparent commitments
As part of our digital stewardship initiatives, we committed to being transparent with our stakeholders and users about what GW is and is not doing for our digital assets. GW has committed to preserving and providing access to this carefully selected set of digital materials over the long term. These commitments are the result of strategic resource planning that balances the benefits of providing engaging, rich access for today’s users against key investments that support access for future users.
Figure: tiers of digital stewardship commitments (Tier 2, Tier 1, Tier 0)
Current infrastructure
● Storage environment composed of Linux servers
  ○ Current file systems mirror the legacy storage file systems
  ○ Offsite backups of these servers are accepted as our “second” copy
● Access environment built on Hyrax
● Simple Audit Tool
● Amazon Web Services for offsite copies
  ○ Reserved for a selective subset of materials
Digital stewardship group
The stakeholder group at GW responsible for digital stewardship decision-making and resourcing. Membership includes associate deans, IT staff, developers, scholarly communication staff, and digital services staff.
Digital services team needs
● Know if something has been added to a filesystem
● Know if something has been removed from a filesystem
● Know if an asset on the filesystem has changed
● Know who performed actions on the filesystem
● Ability to schedule audits and to run them ad hoc
● Email reports with results
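The first three needs amount to comparing two snapshots of a filesystem. Below is a minimal Python sketch, assuming each snapshot is a dictionary mapping a relative path to a checksum; `diff_inventories` and `InventoryDiff` are hypothetical names, not part of the Simple Audit Tool. Knowing who performed an action cannot be derived from checksums and would need operating system audit logs.

```python
"""Sketch: detect added, removed, and changed files between two inventories.

Each inventory is a hypothetical dict of relative path -> checksum.
This is an illustration, not the Simple Audit Tool's actual code.
"""
from typing import Dict, List, NamedTuple


class InventoryDiff(NamedTuple):
    added: List[str]    # paths present now but absent from the previous inventory
    removed: List[str]  # paths in the previous inventory that are now missing
    changed: List[str]  # paths present in both whose checksum differs


def diff_inventories(previous: Dict[str, str], current: Dict[str, str]) -> InventoryDiff:
    """Compare two path -> checksum mappings and classify every difference."""
    prev_paths = set(previous)
    curr_paths = set(current)
    added = sorted(curr_paths - prev_paths)
    removed = sorted(prev_paths - curr_paths)
    # A file is "changed" if it exists in both snapshots with a different checksum.
    changed = sorted(
        path for path in prev_paths & curr_paths if previous[path] != current[path]
    )
    return InventoryDiff(added, removed, changed)


if __name__ == "__main__":
    before = {"coll1/a.tif": "aa11", "coll1/b.tif": "bb22", "coll2/c.pdf": "cc33"}
    after = {"coll1/a.tif": "aa11", "coll1/b.tif": "ffff", "coll2/d.pdf": "dd44"}
    print(diff_inventories(before, after))
    # InventoryDiff(added=['coll2/d.pdf'], removed=['coll2/c.pdf'], changed=['coll1/b.tif'])
```

Working on plain dictionaries keeps the comparison independent of where the snapshots come from, so the same logic applies to a live filesystem walk or a saved inventory file.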
Simple audit tool
Developed to meet our digital services team’s basic needs: knowing where assets are stored, what we have, and whether anything has changed. A command-line tool written in Python that can be run manually or via a cron job. Available on GitHub: gwu-libraries/audit-tool
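A tool that runs either by hand or from cron mostly needs a non-interactive entry point. The sketch below shows one way such a command could build a checksum inventory with the Python standard library (`argparse`, `hashlib`, `os.walk`); the positional arguments and JSON layout are assumptions, not the interface of gwu-libraries/audit-tool.

```python
"""Sketch of a cron-friendly inventory command (hypothetical, not audit-tool's CLI).

Walks a storage root, computes a SHA-256 checksum for every file, and writes
a JSON inventory of relative path -> checksum.
"""
import argparse
import hashlib
import json
import os
import sys
from typing import Dict, Optional, Sequence


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: str) -> Dict[str, str]:
    """Return {relative/posix/path: sha256} for every regular file under root."""
    inventory: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):  # skip broken symlinks, sockets, etc.
                continue
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            inventory[rel] = sha256_of(full)
    return inventory


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Write a checksum inventory.")
    parser.add_argument("root", help="storage directory to inventory")
    parser.add_argument("output", help="path of the JSON inventory to write")
    args = parser.parse_args(argv)
    inventory = build_inventory(args.root)
    with open(args.output, "w", encoding="utf-8") as handle:
        json.dump(inventory, handle, indent=2, sort_keys=True)
    print(f"Inventoried {len(inventory)} files under {args.root}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Saved as a hypothetical `inventory.py`, it could be scheduled nightly with a crontab entry such as `0 2 * * * python3 /opt/audit/inventory.py /mnt/storage /var/audit/inventory.json` (placeholder paths), or run ad hoc with the same arguments. Hashing in 1 MiB chunks keeps memory use flat regardless of file size.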
Excel report of files missing from inventory
Excel report results summary
JSON report sample
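A report like the JSON sample can be produced from comparison results with the standard `json` module. This sketch assumes a hypothetical schema (a summary of counts followed by detail sections) rather than the tool's actual output format; `build_report` is an illustrative name.

```python
"""Sketch: serialize audit results as a JSON report with a summary section.

The field names are a hypothetical schema, not the format emitted by
gwu-libraries/audit-tool.
"""
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def build_report(
    root: str,
    added: List[str],
    removed: Dict[str, str],
    changed: Dict[str, Dict[str, str]],
    generated: Optional[datetime] = None,
) -> str:
    """Return the report as a JSON string.

    removed: path -> last known checksum
    changed: path -> {"expected": old checksum, "actual": new checksum}
    """
    generated = generated or datetime.now(timezone.utc)
    report = {
        "root": root,
        "generated": generated.isoformat(),
        # Counts first, so a reader (or an email body) shows the outcome at a glance.
        "summary": {"added": len(added), "removed": len(removed), "changed": len(changed)},
        "added": sorted(added),
        # Expected checksums are retained so a later restore step can verify copies.
        "removed": dict(sorted(removed.items())),
        "changed": dict(sorted(changed.items())),
    }
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    print(build_report("/storage/coll1", ["new.tif"], {"gone.tif": "ab12"},
                       {"edited.tif": {"expected": "cd34", "actual": "ef56"}}))
```

Keeping the last known checksums of removed and changed files in the report is a design choice that lets the report double as input to an automated restore step.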
Current risk landscape
● Inventory of digital assets on storage, tracking adds, deletes, and changes
● Limited access control in place on preservation servers
● Proactive and ad-hoc auditing of digital assets on preservation servers, with routine reporting
● Redundant copies of selective digital assets
● Clearly articulated policy of commitments to our digital content
● Clear roles and responsibilities between GW units for management of digital assets
What’s next? pt. 1
● Implement an administrative web interface to facilitate searching for items on storage servers by filename
● Enhance our support for Tier 1 content
● Explore automated restoration of files using JSON report outputs
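Automated restoration could treat a JSON report as a work list. A minimal sketch, assuming a hypothetical report in which `removed` maps each missing path to its last known SHA-256 and `changed` maps each altered path to an object with an `expected` checksum, and assuming the second copy mirrors the primary storage's relative paths; `restore_from_report` is a hypothetical helper.

```python
"""Sketch: restore missing or altered files from a second copy using a JSON report.

Assumes a hypothetical report schema:
  {"removed": {path: sha256, ...},
   "changed": {path: {"expected": sha256, "actual": sha256}, ...}}
and a backup copy laid out with the same relative paths as primary storage.
"""
import hashlib
import json
import os
import shutil
from typing import Dict, List


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_from_report(report_path: str, primary_root: str,
                        backup_root: str) -> Dict[str, List[str]]:
    """Copy each flagged file back from backup, verifying checksums before and after."""
    with open(report_path, encoding="utf-8") as handle:
        report = json.load(handle)

    # Collect every path that should be restored along with its expected checksum.
    targets = dict(report.get("removed", {}))
    targets.update({p: info["expected"] for p, info in report.get("changed", {}).items()})

    outcome: Dict[str, List[str]] = {"restored": [], "skipped": []}
    for rel_path, expected in sorted(targets.items()):
        source = os.path.join(backup_root, rel_path)
        dest = os.path.join(primary_root, rel_path)
        # Never restore from a backup copy that is itself missing or damaged.
        if not os.path.isfile(source) or sha256_of(source) != expected:
            outcome["skipped"].append(rel_path)
            continue
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        shutil.copy2(source, dest)
        # Confirm the restored file matches before reporting success.
        if sha256_of(dest) == expected:
            outcome["restored"].append(rel_path)
        else:
            outcome["skipped"].append(rel_path)
    return outcome
```

Verifying the backup before copying means a damaged second copy is skipped rather than propagated. A production version would also reject report paths that escape the storage roots (for example, paths containing `..`) and log every restoration.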
What’s next? pt. 2
● Develop a collection management policy for born-digital content
● Evaluate restructuring our storage server filesystems from legacy paths to a modern storage hierarchy
● Annually reassess our risk management strategy
What’s next? pt. 3
● Explore integrating MetaArchive as a storage location within our infrastructure
● Look at leveraging Simple Audit Tool with items stored in MetaArchive for consistency
● Update our digital services catalog to reflect this new endpoint
More recommend