BatchFS: Scaling the File System Control Plane with Client-Funded Metadata Servers


  1. BatchFS: Scaling the File System Control Plane with Client-Funded Metadata Servers [vision paper]. Qing Zheng, Kai Ren, Garth Gibson, Carnegie Mellon University. 9th Parallel Data Storage Workshop, SC 2014.

  2. File System Architecture. [Diagram: applications (APP) send metadata operations to the Metadata Service and I/O operations to the Shared Object Storage Infrastructure, which consists of OSDs.] The data path is parallel, but the metadata path is not necessarily. Parallel Data Lab - http://www.pdl.cmu.edu/ PDSW14

  3. Reality: Data scales; METADATA is hard to scale, especially in HPC data centers. Programmers like POSIX SEMANTICS, which limits linear scalability.

  4. How We Scale the Metadata. [SC14, Tue, 2:30pm, Room 393-94-95]: two orders of magnitude faster than Lustre/PVFS. BatchFS: scale another order of magnitude.

  5. BATCH APPLICATION

  6. Batch Applications. [Diagram: four MPI batch clients, each with its own input and output, plus chpfile files.] Batch apps are self-coordinated by MPI and workflow engines.

  7. Key Observation: Batch apps DON'T need the FS to communicate. SYNCHRONOUS and SERIALIZED metadata management is OVERKILL for batch apps.

  8. Introducing BatchFS: deep batching for high throughput. [Diagram: four batch apps issue metadata operations (mknod, remove, chmod, mkdir) to BatchFS, which passes them in batches to the Shared Underlying Storage Infrastructure.]

  9. BatchFS Philosophy: from per-op to per-batch synchronization; from server-side to mostly client-side processing → CLIENT-FUNDED metadata architecture.

  10. BATCHFS BACKGROUND

  11. Background: from IndexFS to BatchFS. BatchFS is designed as an extension of IndexFS [SC14, Tue, 2:30pm, Room 393-94-95], inheriting its metadata representation to enable high-performance metadata processing.

  12. Metadata Representation: log-structured and indexed data structure (LSM tree) [SSTable/LevelDB]. [Diagram: a mkdir in IndexFS becomes key-value pairs [(k,v), (k,v), ..., (k,v)] in an in-memory buffer on servers/clients; the key-value store holds SSTable 1, SSTable 2, SSTable 3.]
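The log-structured representation on slide 12 can be illustrated with a toy model. This is a minimal sketch, not IndexFS or LevelDB code: the class name `TinyLSM`, the `buffer_limit` parameter, and the use of plain path strings as keys are all assumptions made for illustration.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory buffer flushed into immutable sorted tables."""

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit  # hypothetical flush threshold
        self.buffer = {}                  # in-memory buffer of recent writes
        self.sstables = []                # sorted [(key, value), ...], oldest first

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        """Write the buffer out as one sorted, immutable table."""
        if self.buffer:
            self.sstables.append(sorted(self.buffer.items()))
            self.buffer = {}

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for table in reversed(self.sstables):       # newest table wins
            i = bisect.bisect_left(table, (key,))   # binary search by key
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

Writes only ever touch the buffer or append a new table, which is what makes it cheap to attach a table that was built elsewhere.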

  13. Client-Server Interaction (traditional, non-batched IndexFS). [Diagram: a file system client sends mkdir/chmod to the IndexFS server; the server's metadata storage holds SST 1 to SST 4, which form the global namespace on the Shared Underlying Storage Infrastructure.]

  14. Metadata Bulk Insertion. [Diagram: in traditional, non-batched IndexFS, a file system client sends mkdir/chmod to the IndexFS server. With bulk insertion, the client performs localized/batched mkdir/chmod under a subtree in a local lease-protected namespace, producing SST' 1 and SST' 2, which are bulk-inserted into the server's metadata storage (SST 1 to SST 4, the global namespace) on the Shared Underlying Storage Infrastructure.]
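The difference between the per-operation path and bulk insertion on slide 14 can be sketched by counting RPCs. The classes `MetadataServer` and `BatchClient` and their methods are hypothetical stand-ins, not the IndexFS interface, and lease handling is omitted entirely.

```python
class MetadataServer:
    """Hypothetical stand-in for a metadata server; counts the RPCs it receives."""

    def __init__(self):
        self.buffer = {}      # entries written one operation at a time
        self.sstables = []    # sorted tables attached by bulk insertion
        self.rpc_count = 0

    def create(self, path, attrs):
        """Non-batched path: one RPC per metadata operation."""
        self.rpc_count += 1
        self.buffer[path] = attrs

    def bulk_insert(self, sstable):
        """Batched path: attach a whole client-built sorted table in one RPC."""
        self.rpc_count += 1
        self.sstables.append(sstable)


class BatchClient:
    """Hypothetical client that buffers creations under its own subtree."""

    def __init__(self, subtree):
        self.subtree = subtree
        self.pending = {}

    def mknod(self, name, mode=0o644):
        # Executed locally; nothing is sent to the server yet.
        self.pending[f"{self.subtree}/{name}"] = {"type": "file", "mode": mode}

    def commit(self, server):
        # Sort locally, then hand the finished table over in a single call.
        table = sorted(self.pending.items())
        server.bulk_insert(table)
        self.pending = {}
        return len(table)
```

In this model, creating 1000 files costs 1000 RPCs through `create` and one RPC through `commit`.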

  15. Preliminary Results: a prototype of BatchFS as an IndexFS [SC14] feature: metadata bulk insertion (batching). Each node has 2 CPUs, 8 GB RAM, 1 SATA HDD, and one 1 Gb Ethernet port. [Diagram: 8+1 node HDFS cluster with 1 Name Node and 8 Data Nodes.]

  16. Workload: Each client process creates 1 private directory (8-64 client processes on a fixed 8 nodes). Clients insert empty files into their own directories (in total, 1 million * #servers files).

  17. Experiment Setup. [Diagram: 8 nodes, each running 1-8 IndexFS clients; 1 IndexFS Server; underneath, HDFS with a Name Node and Data Nodes backed by disks.]

  18. Experiment Setup. [Diagram: 8 nodes, each running 1-8 IndexFS clients; 2 IndexFS Servers; underneath, HDFS with a Name Node and Data Nodes backed by disks.]

  19. Experiment Setup. [Diagram: 8 nodes, each running 1-8 IndexFS clients; 3 IndexFS Server boxes shown; underneath, HDFS with a Name Node and Data Nodes backed by disks.]

  20. Experiment Setup. [Diagram: 8 nodes, each running 1-8 Batch clients; 1 IndexFS Server; underneath, HDFS with a Name Node and Data Nodes backed by disks.]

  21. 8x-360x Performance Improvements. [Bar chart: throughput (K op/s) versus total number of client processes (8, 16, 32, 64) for five configurations: HDFS Baseline, Single IndexFS Server, Dual IndexFS Servers, Full IndexFS Servers, and Client-Side Bulk Insertion. Data labels range from 0.6 to 216 K op/s; annotated speedups are 8x, 8-18x, and 360x.]

  22. BATCHFS DESIGN

  23. Deep Metadata Batch. Lazy namespace synchronization: a batch client calls snapshot(…) on the global namespace, pre-executes metadata ops at client-side (mkdir(…), chmod(…)) in a client-local namespace, and then calls bulk_insert(…). [Diagram: the SSTs of the global namespace and of the client-local namespace make up the file system history.]

  24. Deep Metadata Batch. Lazy namespace synchronization: a batch client calls snapshot(…) on the global namespace, pre-executes metadata ops at client-side (mkdir(…), chmod(…)) in a client-local namespace, and then calls bulk_insert(…), while another client issues its own mkdir(…) and chmod(…). Lazy semantics enforcement: delayed until synchronization is eventually needed. Questions checked at that point: ill-formatted? permission violations? concurrent conflicts? [Diagram: the SSTs of the global namespace and of the client-local namespace make up the file system history.]
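The snapshot, pre-execute, and bulk-insert cycle on slides 23 and 24 can be sketched as follows. This is an illustrative model under stated assumptions, not the BatchFS design: namespaces are plain dictionaries keyed by path, the helper names mirror the slide's `snapshot`, `mkdir`, `chmod`, and `bulk_insert` but their signatures are invented, and permission checks are left out.

```python
import copy


def snapshot(global_ns, subtree):
    """Copy the entries at or under `subtree` for client-local work."""
    prefix = subtree.rstrip("/") + "/"
    return {p: copy.deepcopy(a) for p, a in global_ns.items()
            if p == subtree or p.startswith(prefix)}


class LocalNamespace:
    """Client-local namespace: operations run here with no server round trip."""

    def __init__(self, base):
        self.base = base                 # snapshot taken at batch start
        self.view = copy.deepcopy(base)  # state after local pre-execution

    def mkdir(self, path, mode=0o755):
        self.view[path] = {"type": "dir", "mode": mode}

    def chmod(self, path, mode):
        self.view[path]["mode"] = mode

    def changes(self):
        return {p: a for p, a in self.view.items() if self.base.get(p) != a}


def bulk_insert(global_ns, local):
    """Run the deferred checks, then merge. Returns (merged, rejected)."""
    merged, rejected = [], []
    # Sorted order visits a parent directory before its children.
    for path, attrs in sorted(local.changes().items()):
        parent = path.rsplit("/", 1)[0] or "/"
        if "//" in path or not path.startswith("/"):
            rejected.append((path, "ill-formatted"))
        elif parent not in global_ns:
            rejected.append((path, "missing parent"))
        elif global_ns.get(path) != local.base.get(path):
            rejected.append((path, "concurrent conflict"))
        else:
            global_ns[path] = attrs
            merged.append(path)
    return merged, rejected
```

All checking happens inside `bulk_insert`, so the client pays no synchronization cost while it builds its batch; a conflict only surfaces when the batch is merged.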

  25. BatchFS [PDSW14] versus [SC14]: snapshot of a subtree vs. empty subtree; concurrent access vs. exclusive access; optimistic concurrency control vs. protected by server-issued leases; no timeout vs. lease expires; snapshot reads w/ access control vs. empty subtree.

  26. Client-Funded Metadata Processing. [Diagram labels: Server Resources (Primary MDS) and Client Resources (Private MDS); namespace states: Global Namespace, Snapshot Copy, Modified Namespace, Unchecked Namespace, Merged Namespace.]

  27. Client-Funded Metadata Verification. [Diagram labels: Server Resources (Primary MDS) and Client Resources (Private MDS, Auxiliary MDS); namespace states: Global Namespace, Snapshot Copy, Modified Namespace, Unchecked Namespace, Accepted Namespace, Merged Namespace.]

  28. FUTURE WORK

  29. Conflict Resolution: Who is responsible? What are the semantics? A) DB-like: read/write sets, transactional. B) Bayou-like: auto resolution, domain rules. C) Coda-like: resolved by a human.
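Option A on slide 29 (DB-like, read/write sets) can be illustrated with a small validation routine. This is one possible reading of that option, offered as a sketch: the per-path version counters, the batch dictionary layout, and the function names are assumptions, not something the slides specify.

```python
def validate(batch, committed_versions):
    """A batch is valid only if every path it read is still at the version it saw."""
    for path, seen_version in batch["read_set"].items():
        if committed_versions.get(path, 0) != seen_version:
            return False
    return True


def commit(batch, committed_versions):
    """Validate, then bump the version of each written path. Returns success."""
    if not validate(batch, committed_versions):
        return False
    for path in batch["write_set"]:
        committed_versions[path] = committed_versions.get(path, 0) + 1
    return True
```

Two batches that both observed a path as absent (version 0) and both write it cannot both commit: the first bumps the version and the second fails validation.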

  30. Self-provable Metadata: for clients to generate proofs of the correctness of their namespace mutations. A) operation log (possibly compressed). B) logic-based proof (proof-carrying code).
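Option A on slide 30 (an operation log as proof) suggests a verifier that replays the client's log against the snapshot and compares the outcome with the namespace the client claims. The sketch below assumes a log of `(kind, path, arg)` tuples and dictionary namespaces; both are illustrative choices, and log compression is not modeled.

```python
def apply_op(ns, op):
    """Apply one logged operation; raises if the operation is not legal."""
    kind, path, arg = op
    if kind == "mkdir":
        if path in ns:
            raise ValueError(f"{path} already exists")
        ns[path] = {"type": "dir", "mode": arg}
    elif kind == "chmod":
        ns[path]["mode"] = arg   # KeyError if the path does not exist
    elif kind == "remove":
        del ns[path]             # KeyError if the path does not exist
    else:
        raise ValueError(f"unknown operation {kind}")


def verify(snapshot_ns, op_log, claimed_ns):
    """Replay the log on a copy of the snapshot and compare with the claim."""
    replay = {p: dict(a) for p, a in snapshot_ns.items()}
    try:
        for op in op_log:
            apply_op(replay, op)
    except (KeyError, ValueError):
        return False
    return replay == claimed_ns
```

Replay costs the verifier as much work as the original execution, which is presumably why the slide also lists a logic-based proof as an alternative.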

  31. BatchFS Conclusion: at least one RPC per operation; inefficient metadata representation; pessimistic concurrency control; synchronous metadata interface; dedicated authorization service.

  32. BatchFS Architecture. [Diagram: Fixed Server Nodes run Primary MDSs; Client-Provisioned Metadata Computing Nodes run Private MDSs and Auxiliary MDSs; both sit on a Fast Parallel Storage Infrastructure.] BatchFS scales with the number of client nodes.

  33. Reference: BatchFS: Scaling the File System Control Plane with Client-Funded Metadata Servers (PDSW14); Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion (SC14).

  34. QUESTIONS, IDEAS, AND FEEDBACK

  35. BACKUP SLIDES

  36. Access & Quota Control: access enforced by OSD; no quota control for metadata; quota control on data provided by OSD.
