storage and preservation

Storage and Preservation: Week 3, LBSC 671 Creating Information Infrastructures



  1. Storage and Preservation Week 3 LBSC 671 Creating Information Infrastructures

  2. Physical Storage • Segregate by: – Users (e.g., Chemistry library) – Type (e.g., audiovisual materials) – Usage frequency (e.g., offsite storage) – Size (e.g., folios) • Arrange in a way that facilitates access – Topical shelf order (e.g., Dewey Decimal System) • Foster preservation – Environment (temperature, humidity, light) – Access controls (closed stacks, gloves, …)

  3. High-Density Shelving http://www.kmhsystems.com/high-density-storage.html

  4. Compact Storage Robot Kyushu University, Japan

  5. Closed Stacks University of Education, Ghana

  6. Preservation c. 3000 BCE

  7. Organic Decay • Rag paper: 300-2,000 years • Acidic paper: 25-50 years • Acetate film: 40 years • Nitrate film: 40-100 years • Sources: ISO 11799:2003; Image Permanence Institute, 2012

  8. Threats to Physical Collections • Organic decay • Intentional actions – Pilferage and vandalism – Official acts • Disasters – Natural disasters • Flood, tornado, earthquake, … – Accidents • Fire, sprinkler malfunction, … – Armed conflict

  9. Disaster Mitigation Examples • Flood: – Know where you can vacuum freeze dry • Decide quickly what to freeze • Air dry or dehumidify the rest – Immerse wet or muddy tape or film in water • Then air dry or dehumidify – Replace wet archival boxes immediately • Fire: – Handle as fragile, wrap in clean paper – Pack between cardboard to stiffen http://matrix.msu.edu/~disaster/balcplan.php

  10. Digital Preservation • Preservation of born-digital materials – Preserving appearance and interpretability – Preserving behavior • Digitization for preservation – Scanning (of paper, of microfilm) – Audio digitization – Video digitization – Volumetric imaging • Digital holography, computational tomography

  11. Binary Data Representation
      Example: American Standard Code for Information Interchange (ASCII)
      01000001 = A    01100001 = a
      01000010 = B    01100010 = b
      01000011 = C    01100011 = c
      01000100 = D    01100100 = d
      01000101 = E    01100101 = e
      01000110 = F    01100110 = f
      01000111 = G    01100111 = g
      01001000 = H    01101000 = h
      01001001 = I    01101001 = i
      01001010 = J    01101010 = j
      01001011 = K    01101011 = k
      01001100 = L    01101100 = l
      01001101 = M    01101101 = m
      01001110 = N    01101110 = n
      01001111 = O    01101111 = o
      01010000 = P    01110000 = p
      01010001 = Q    01110001 = q
      …               …
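
The ASCII mapping on this slide can be reproduced in a few lines. This is a minimal sketch; the function names `to_ascii_bits` and `from_ascii_bits` are illustrative and not part of the original slides.

```python
def to_ascii_bits(text: str) -> list[str]:
    """Return the 8-bit binary string for each ASCII character in text."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


def from_ascii_bits(bits: list[str]) -> str:
    """Rebuild the text from a list of 8-bit binary strings."""
    return bytes(int(b, 2) for b in bits).decode("ascii")


# Print the first row of the slide's table: "A" and "a".
for ch, bits in zip("Aa", to_ascii_bits("Aa")):
    print(bits, "=", ch)
```

Upper- and lower-case letters differ only in one bit (the third from the left), which is visible when the two columns of the table are compared row by row.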

  12. Units of Size
      Unit      Abbreviation  Size (bytes)
      bit       b             1/8
      byte      B             1
      kilobyte  KB            2^10 = 1,024
      megabyte  MB            2^20 = 1,048,576
      gigabyte  GB            2^30 = 1,073,741,824
      terabyte  TB            2^40 = 1,099,511,627,776
      petabyte  PB            2^50 = 1,125,899,906,842,624
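
The powers of two in this table can be checked directly. The sketch below assumes the slide's binary convention (1 KB = 1,024 bytes); `unit_size` and `human_readable` are hypothetical helper names introduced only for illustration.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB"]


def unit_size(abbrev: str) -> int:
    """Size in bytes of a unit, using the slide's convention of 2**10 per step."""
    return 2 ** (10 * (UNITS.index(abbrev) + 1))


def human_readable(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    size = float(n_bytes)
    unit = "B"
    for name in UNITS:
        if size < 1024:
            break
        size /= 1024
        unit = name
    return f"{size:.1f} {unit}"


for abbrev in UNITS:
    print(abbrev, "=", f"{unit_size(abbrev):,}", "bytes")
```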

  13. Nothing new… Georges Seurat, A Sunday Afternoon on the Island of La Grande Jatte

  14. Basic Audio Coding • Sample at twice the highest frequency – 8 bits or 16 bits per sample • Speech (0-4 kHz) requires 8 kB/s – Standard telephone channel (1-byte samples) • Music (0-22 kHz) requires 172 kB/s – Standard for CD-quality audio (2-byte samples, two channels)
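
The two data rates on this slide follow from the sampling rule. The sketch below is an illustration under stated assumptions: the music figure appears to assume stereo (two channels) at a 44.1 kHz sampling rate, i.e. a highest frequency of 22.05 kHz, and kilobytes of 1,024 bytes. The function name `pcm_rate_bytes_per_second` is hypothetical.

```python
def pcm_rate_bytes_per_second(highest_hz: int, bytes_per_sample: int,
                              channels: int = 1) -> int:
    """Uncompressed data rate when sampling at twice the highest frequency."""
    sample_rate = 2 * highest_hz
    return sample_rate * bytes_per_sample * channels


# Telephone speech: 0-4 kHz, 1-byte samples, one channel.
speech = pcm_rate_bytes_per_second(4_000, 1)

# CD-quality music: assumed 22.05 kHz top frequency, 2-byte samples, stereo.
music = pcm_rate_bytes_per_second(22_050, 2, channels=2)

print(f"speech: {speech} bytes/s")
print(f"music:  {music} bytes/s, about {music / 1024:.0f} KB/s")
```

With a single channel the music rate would be half as large, so the 172 kB/s figure only works out if both stereo channels are counted.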

  15. MPEG Encoding
      Frame sequence: I1 B1 B2 B3 P1 B4 B5 B6 P2 B7 B8 B9 I2
      Frame types:
      • I (Intra): Encode complete image, similar to JPEG
      • P (Forward Predicted): Motion relative to previous I and P’s
      • B (Backward Predicted): Motion relative to previous & future I’s & P’s
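
The dependencies between the frame types can be made concrete with a small sketch. It is a simplification, not a description of any particular encoder: it assumes each P frame is predicted from the nearest earlier I or P frame, and each B frame from the nearest I or P frame on either side. The function name `reference_frames` is hypothetical.

```python
def reference_frames(display_order: list[str]) -> dict[str, list[str]]:
    """Map each frame to the anchor (I or P) frames it is predicted from."""
    anchors = [(i, f) for i, f in enumerate(display_order) if f[0] in "IP"]
    refs: dict[str, list[str]] = {}
    for i, frame in enumerate(display_order):
        earlier = [f for j, f in anchors if j < i]
        later = [f for j, f in anchors if j > i]
        if frame[0] == "I":
            refs[frame] = []                         # self-contained image
        elif frame[0] == "P":
            refs[frame] = earlier[-1:]               # nearest earlier anchor
        else:
            refs[frame] = earlier[-1:] + later[:1]   # anchors on both sides
    return refs


sequence = ["I1", "B1", "B2", "B3", "P1", "B4", "B5", "B6",
            "P2", "B7", "B8", "B9", "I2"]
for frame, needed in reference_frames(sequence).items():
    print(frame, "<-", ", ".join(needed) or "(none)")
```

One consequence for preservation is that damage to an I frame affects every P and B frame that depends on it, directly or through a chain of P frames, until the next I frame.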

  16. Volumetric Imaging

  17. Rotating Storage Media • Fixed magnetic disk – Hard drives • Removable magnetic disk – Floppy disk • Removable optical disc – CD, DVD, Blu-ray

  18. Magnetic Disk (Hard Drive) Shelly, Cashman and Vermatt, Discovering Computers, 2004

  19. Optical Disc

  20. Optical Disk Technologies: near infrared, red, violet

  21. Magnetic Tape • Tapes store data sequentially – Fast transfer, but no practical “random access” • Used only for low-use storage – Disaster recovery, offline storage

  22. Solid-State Memory • ROM – Does not require power to retain content – Used for “Basic Input/Output System” (BIOS) • RAM – Cheap and fast, but works only while power is on • Flash memory (Solid State Disk, memory sticks) – Much faster “random access” than rotating disk • ~10,000 times faster, but ~10 times more expensive per bit – Limited number of lifetime write operations (~5,000) • But Zipf’s law permits “wear leveling”

  23. Threats to Digital Collections • Business decisions – Termination of service – Termination of infrastructure support • e.g., reading Amiga files, displaying WordPerfect • Malfunctions – Hardware failure, operator error, software bugs, … • Vandalism (hackers) • Disasters – Physical risks to servers – Electromagnetic pulse

  24. http://www.crashplan.com/medialifespan/

  25. Media Migration • What format should old tapes be converted to? – Newer tape – Rotating media – Solid state disks • How often must we “refresh” these media?

  26. Risk Management • Redundancy drives down uncorrelated risk – Let p be the probability of loss of one copy – Then p*p*p is the chance of losing all 3 copies when they are kept at 3 independent sites – Example: if p=0.01 then p*p*p=0.000001 • Two fundamental problems: – Unanticipated correlation • For example, an operating system bug – Underestimated “black swan” probabilities
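
The arithmetic on this slide, and the way correlation undermines it, can be sketched as follows. The independent case is the slide's own formula. The common-cause model is an assumption added for illustration only: with probability `c` a single event (for example, a shared operating system bug) destroys every copy at once. Both function names are hypothetical.

```python
def loss_probability(p: float, copies: int) -> float:
    """Chance of losing every copy when losses are independent."""
    return p ** copies


def loss_with_common_cause(p: float, copies: int, c: float) -> float:
    """Illustrative model: a common-cause event with probability c destroys
    all copies; otherwise copies fail independently with probability p."""
    return c + (1 - c) * p ** copies


independent = loss_probability(0.01, 3)
correlated = loss_with_common_cause(0.01, 3, c=0.001)

print(f"independent copies:      {independent:.6f}")
print(f"with 0.1% common cause:  {correlated:.6f}")
```

Under this assumed model, even a 0.1% chance of a shared failure dominates the total risk, which is why the later slides stress distributed storage and air gaps.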

  27. Layered Defense • Good storage practices – Offline: Media migration – Online: uninterruptible power, RAID, backups • Distributed storage – Storage Resource Broker (SRB), LOCKSS, … • Air gaps – Interrupt unexpected correlation

  28. Data Centers Source: Wikipedia

  29. Shared Data Center Locations http://www.datacentermap.com/usa/datacenters.html

  30. Data Center Electricity Use (USA) 2010 Jonathan Koomey, Analytics Press, 2010

  31. Digital Federal Depository Library http://lockss-usdocs.stanford.edu

  32. LOCKSS Distributed Repair

  33. ITHAKA • JSTOR digitization – Back runs of journals – Recently expanded to books • Portico preservation – Centralized management, originally for journals • Release triggers: discontinuation, loss of access – Also service for books and datasets

  34. HathiTrust • Centralized repository for digitized books – Google Books digitization (via owning libraries) – Microsoft book search (ran from 2006 to 2008) – Internet Archive • Million Book Project, Project Gutenberg, contributions, … – Cooperative digitization • As of August 13, 2010: 6,549,680 total volumes; 3,798,116 book titles; 153,311 serial titles; 1,300,896 public domain

  35. Jeremy York, IFLA 2010

  36. Indiana University Digitization

  37. Preserving Behavior • Word processors – Formatting, track changes, undo deleted text • Spreadsheets – Formulas, visualizations • Databases – Queries, forms, derived values • Computer-Aided Design (CAD) – Display, modification, manufacturing • Software – Simulation, games, embedded systems, …

  38. Behavior Preservation Strategies • Format migration – For example, convert WordPerfect to PDF • Emulation – Allows running old software on newer systems

  39. Apollo Guidance Computer Emulation http://www.ibiblio.org/apollo/

  40. An Integrated Strategy • Delay decay of organic materials to buy time • Balance quality and scale – For future access, quantity has a quality all its own • Rescue high-value at-risk collections • Design diversity into the process – Technologies, risk exposure, institutions • Adequately resource the process

  41. Before You Go! • On a sheet of paper (no names), answer the following question: What was the muddiest point in today’s class?
