
Rapid Replication of Multi-Petabyte File Systems, Justin Sybrandt

Rapid Replication of Multi-Petabyte File Systems. Justin Sybrandt, Jason Hick (NSF award number 1157075). NERSC stores PB of scientific data and needed to replicate whole file systems. The problem: need to freshen a stale copy.


  1. Rapid Replication of Multi-Petabyte File Systems ● Justin Sybrandt ● Jason Hick ● (NSF award number 1157075)

  2. NERSC ● Stores PB of scientific data. ● Needed to replicate whole file systems.

  3. The Problem ● Need to freshen a stale copy. ○ File system backups. ○ Disaster recovery. ○ Moving locations.

  4. DistSync ● Quickly determines the changes between two file systems. ● Follows the Master-Slave Paradigm. ● Similar to Shift, but streamlined for large synchronizations. (Diagram labels: NODE, Out-Of-Date GPFS, Up-To-Date GPFS.)

  5. Job Generation ● Policy scans produce lists of all files. ● Generator creates job files in linear time.
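The slide states only that the generator works from scan lists in linear time; it does not say how. One way to get linear time, sketched below as an assumption rather than as DistSync's actual method, is a single merge pass over two scan lists that are already sorted by path with the same ordering. The record layout `(path, size, mtime)` and the action names `copy` and `delete` are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical scan record: (path, size, mtime). Real policy-scan output differs.
Entry = Tuple[str, int, int]


def diff_sorted_scans(
    source: Iterable[Entry], target: Iterable[Entry]
) -> Iterator[Tuple[str, str]]:
    """Yield (action, path) pairs by walking two path-sorted scans once.

    `source` is the up-to-date scan, `target` is the stale one.
    Each entry is visited exactly once, so the pass is O(len(source) + len(target)).
    """
    src, dst = iter(source), iter(target)
    s: Optional[Entry] = next(src, None)
    d: Optional[Entry] = next(dst, None)
    while s is not None or d is not None:
        if d is None or (s is not None and s[0] < d[0]):
            yield ("copy", s[0])          # path exists only on the up-to-date side
            s = next(src, None)
        elif s is None or d[0] < s[0]:
            yield ("delete", d[0])        # path exists only on the stale side
            d = next(dst, None)
        else:
            if s[1:] != d[1:]:
                yield ("copy", s[0])      # same path, different size or mtime
            s, d = next(src, None), next(dst, None)
```

Because both inputs are consumed as iterators, the scan lists never need to be held in memory at once, which matters when a scan covers a multi-petabyte file system.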

  6. Job File ● Contains a list of file paths. ● Limited in size. (when possible) ● Type implies action.
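A job file as described here could be as simple as a text file of paths whose name carries the job type. The sketch below assumes that layout; the `.job` extension, the `<type>-<index>` naming scheme, and the default limit of 1000 paths are illustrative choices, not details from the presentation.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def chunk(paths: Iterable[str], limit: int) -> Iterator[List[str]]:
    """Split a stream of paths into lists of at most `limit` entries."""
    it = iter(paths)
    while True:
        batch = list(islice(it, limit))
        if not batch:
            return
        yield batch


def write_job_files(
    job_type: str, paths: Iterable[str], out_dir: Path, limit: int = 1000
) -> List[Path]:
    """Write size-limited job files; the job type is encoded in the file name."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, batch in enumerate(chunk(paths, limit)):
        job_path = out_dir / f"{job_type}-{index:06d}.job"
        job_path.write_text("\n".join(batch) + "\n")  # one file path per line
        written.append(job_path)
    return written
```

Keeping each file small bounds the work a single worker takes on, so a slow job delays the overall run less.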

  7. Job Scheduling ● Manager ensures that jobs are completed in the right order.
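The slide does not say what the right order is. As a minimal sketch, assume the order depends on job type, for example that directories must exist before files are copied into them and that deletions run last. The phase names and their sequence below are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed phase order: create directories, then copy files, then delete.
PHASES = ["mkdir", "copy", "delete"]


def schedule(jobs: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group (job_type, job_name) pairs into phases released in PHASES order.

    A manager would hand out all jobs of one phase and wait for them to
    finish before releasing the next phase.
    """
    by_type: Dict[str, List[str]] = defaultdict(list)
    for job_type, name in jobs:
        by_type[job_type].append(name)
    unknown = set(by_type) - set(PHASES)
    if unknown:
        raise ValueError(f"unknown job types: {sorted(unknown)}")
    return [(phase, by_type[phase]) for phase in PHASES if phase in by_type]
```

Jobs within a phase have no ordering constraint in this sketch, so they can be distributed to workers freely.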

  8. Job Completion ● Workers start processing jobs in parallel. ● Utilizes system commands.
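A minimal sketch of parallel job processing with system commands, using a thread pool and `subprocess.run` from the Python standard library. The presentation does not name the commands DistSync invokes, so the command is supplied by the caller through a hypothetical `build_command` function.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical hook: maps one file path to the argv of a system command.
CommandBuilder = Callable[[str], List[str]]


def run_job(paths: Sequence[str], build_command: CommandBuilder) -> int:
    """Run one system command per path and return the number of failures."""
    failures = 0
    for path in paths:
        result = subprocess.run(build_command(path), capture_output=True)
        if result.returncode != 0:
            failures += 1
    return failures


def run_jobs_in_parallel(
    jobs: Sequence[Sequence[str]], build_command: CommandBuilder, workers: int = 4
) -> List[int]:
    """Process jobs concurrently; results are returned in job order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: run_job(job, build_command), jobs))
```

Threads are sufficient here because each worker spends its time waiting on an external process. In a multi-node deployment the same pattern would run on each node, with the manager distributing job files between them.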

  9. Micro Benchmark

  10. Conclusions ● DistSync: ○ Processes file system scans. ○ Creates job files. ○ Maximises file system bandwidth. ● Frequent syncs lead to faster syncs.

  11. Thank You! Questions?
