Bonsai: Balanced Lineage Authentication, by Ashish Gehani


  1. Bonsai: Balanced Lineage Authentication Ashish Gehani Bonsai:Balanced Lineage Authentication – p. 1/19

  2. What is data lineage?
     (Diagram: (a) a primitive operation taking Input 1 ... Input n and producing an Output; (b) a compound operation tree.)

  3. Why track lineage?
     - GIS: data origins
     - Material science: component pedigree
     - Biology: experiment reproducibility
     - Grid: debugging

  4. Why certify lineage?
     - Reproduction is costly:
       - PDB: $200,000 / protein
       - Fermilab Collision Detector: 1 month, multiple TB / datum
     - Reliability
     - Accreditation
     - Ownership
     - Auditability

  5. What’s been done?
     - LFS: inputs, outputs, options → SQL
     - PASS: runtime environment → Berkeley DB
     - Trio: tracks data accuracy using lineage
     - CMCS: chemistry toolkit → WebDAV
     - Chimera: workflow scripts
     - myGrid: biology Grid workflows
     - Vesta: incremental builds
     - ESSW: Earth Science data management

  6. What’s the problem?
     - Single trust domain: Chimera, myGrid, Vesta, ESSW
     - Centralized service: LFS, PASS, Trio, CMCS
     - No assurance: unsigned, incomplete

  7. What granularity?
     - What to audit? Processes, system calls, file system?
     - Fine grain → high overhead
     - Coarse grain → false positives
     - File system:
       - Pro: intermediate complexity
       - Pro: captures most persistent change
       - Con: can’t track data from network, keyboard, pipes, memory maps

  8. Certification approach?
     - No global TCB
     - Require commitments
     - Check agreement of: producer output, consumer input
     - Trusted user in subtree / path → tampering detectable
     (Diagram: Producer → Output = Input → Consumer.)

  9. Metadata generation
     - Intercede on calls for: exec(), fork(), exit(), open(), close(), read(), write()
     - Maintain process table entries for: accessed, modified files
     (Diagram: timeline of a process execution by its owner, showing open() and close() calls on File 1, File 2, and File 3, labeled with reads and a write.)
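The bookkeeping on slide 9 can be illustrated with a minimal sketch. Everything named here (`ProcessEntry`, `ProcessTable`, the `on_*` handlers, and the shape of the emitted record) is hypothetical and not taken from the slides. The sketch also assumes that a lineage record is produced when a modified file is closed, and that a forked child inherits its parent's open files and read history.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessEntry:
    owner: int                                      # user ID running the process
    open_files: dict = field(default_factory=dict)  # fd -> path
    accessed: set = field(default_factory=set)      # paths read so far
    modified: set = field(default_factory=set)      # paths written so far


class ProcessTable:
    """Hypothetical table updated from intercepted system calls."""

    def __init__(self):
        self.entries = {}  # pid -> ProcessEntry

    def on_exec(self, pid, owner):
        self.entries[pid] = ProcessEntry(owner=owner)

    def on_fork(self, parent_pid, child_pid):
        parent = self.entries[parent_pid]
        # Assumption: the child inherits open files and read history.
        self.entries[child_pid] = ProcessEntry(
            owner=parent.owner,
            open_files=dict(parent.open_files),
            accessed=set(parent.accessed),
        )

    def on_open(self, pid, fd, path):
        self.entries[pid].open_files[fd] = path

    def on_read(self, pid, fd):
        entry = self.entries[pid]
        entry.accessed.add(entry.open_files[fd])

    def on_write(self, pid, fd):
        entry = self.entries[pid]
        entry.modified.add(entry.open_files[fd])

    def on_close(self, pid, fd):
        """Return a lineage record if the closed file was modified, else None."""
        entry = self.entries[pid]
        path = entry.open_files.pop(fd)
        if path not in entry.modified:
            return None
        inputs = sorted(entry.accessed - {path})
        return {"executor": entry.owner, "output": path, "inputs": inputs}

    def on_exit(self, pid):
        self.entries.pop(pid, None)


def demo():
    # One process reads file1 and file2, then writes file3.
    table = ProcessTable()
    table.on_exec(pid=100, owner=1000)
    for fd, path in [(3, "file1"), (4, "file2"), (5, "file3")]:
        table.on_open(100, fd, path)
    table.on_read(100, 3)
    table.on_read(100, 4)
    table.on_write(100, 5)
    table.on_close(100, 3)
    table.on_close(100, 4)
    record = table.on_close(100, 5)
    table.on_exit(100)
    return record
```

In this sketch, closing a file that was only read yields nothing, while closing `file3` yields a record naming `file1` and `file2` as its inputs.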

  10. Minimal representation
      (Diagram: record layout of Executor | Signature | Output | Input 1 ... Input n; each file is Net Address | Inode | Time.)
      - Executor: 32 bit IPv4 address, 32 bit user ID
      - Signature: 160 bits [SIGN_K_E(E, O, I_1, ..., I_n)]
      - Input / Output File: 32 bit IPv4 address, 32 bit inode, 32 bit time (seconds since 1/1/70)
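The field widths on slide 10 fix the size of a record, which a short sketch can make concrete. The function names, the network byte order, and the field ordering (executor, signature, output, then inputs) are assumptions made for illustration; the slide gives only the field widths. The signature is treated as an opaque 20-byte value.

```python
import socket
import struct

EXECUTOR_FMT = "!4sI"  # IPv4 address + user ID: 2 x 32 bits = 8 bytes
FILE_FMT = "!4sII"     # IPv4 address + inode + time: 3 x 32 bits = 12 bytes
SIGNATURE_LEN = 20     # 160 bits


def pack_file(ip, inode, timestamp):
    """Pack one input or output file identifier into 12 bytes."""
    return struct.pack(FILE_FMT, socket.inet_aton(ip), inode, timestamp)


def pack_record(executor_ip, uid, signature, output, inputs):
    """Concatenate executor, signature, output, and input identifiers."""
    if len(signature) != SIGNATURE_LEN:
        raise ValueError("signature must be 160 bits (20 bytes)")
    parts = [
        struct.pack(EXECUTOR_FMT, socket.inet_aton(executor_ip), uid),
        signature,
        pack_file(*output),
    ]
    parts.extend(pack_file(*f) for f in inputs)
    return b"".join(parts)


def record_size(n_inputs):
    """Size in bytes of a record with n_inputs input files."""
    return (struct.calcsize(EXECUTOR_FMT) + SIGNATURE_LEN
            + struct.calcsize(FILE_FMT) * (1 + n_inputs))
```

Under these assumptions a record costs 40 bytes plus 12 bytes per input, so `record_size(2)` is 64 bytes.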

  11. Workload
      - Berkeley NOW file system traces
      - Month of activity
      - Access patterns stable
      - Instruction: 20 workstations in teaching lab
      - Research: 13 desktops of research group
      - Web: 1 web server running Postgres
      - Windows: 8 Windows desktops

  12. Cumulative lineage
      - Current paradigm: entire tree migrates with data
      - Metadata grows rapidly:

      | Workload    | 1 step | 2 steps | 3 steps | 4 steps | 5 steps |
      |-------------|--------|---------|---------|---------|---------|
      | Instruction | 0.4 KB | 3 KB    | 31 KB   | 253 KB  | 2 MB    |
      | Research    | 0.2 KB | 0.8 KB  | 2 KB    | 8 KB    | 29 KB   |
      | Web         | 1 KB   | 39 KB   | 1 MB    | 29 MB   | 813 MB  |
      | Windows     | 0.2 KB | 0.8 KB  | 2 KB    | 9 KB    | 30 KB   |

  13. Operational impact
      Time (in ms) to read tree in open():

      | Workload    | 1 step | 2 steps | 3 steps | 4 steps |
      |-------------|--------|---------|---------|---------|
      | Instruction | 0.04   | 0.05    | 0.11    | 1.72    |
      | Research    | 0.05   | 0.05    | 0.04    | 0.04    |
      | Web         | 0.06   | 0.13    | 6.42    | 997.5   |
      | Windows     | 0.07   | 0.04    | 0.04    | 0.04    |

      Time (in ms) to write tree in close():

      | Workload    | 1 step | 2 steps | 3 steps | 4 steps |
      |-------------|--------|---------|---------|---------|
      | Instruction | 0.20   | 0.28    | 0.32    | 0.84    |
      | Research    | 0.16   | 0.19    | 2.39    | 3.1     |
      | Web         | 0.16   | 0.24    | 4.82    | 579.14  |
      | Windows     | 0.16   | 0.50    | 5.34    | 3.17    |

  14. In actu
      - Larger representation, unless certification is available for:
        - DHCP bindings
        - inode mappings
        - Clock synchronization

  15. Decentralized lineage
      - Proposed paradigm: remote pointers replace branches
      - Metadata remains small:

      | Workload    | Storage |
      |-------------|---------|
      | Instruction | 0.4 KB  |
      | Research    | 0.2 KB  |
      | Web         | 1 KB    |
      | Windows     | 0.2 KB  |
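The contrast between slides 12 and 15 can be illustrated with a toy size model. It is an illustration only, not the measurement method behind the tables: it assumes every operation has the same number of inputs (`fan_in`), uses the field widths from slide 10, and assumes the input file identifiers already stored in a record serve as the remote pointers. All function names are hypothetical.

```python
HEADER_BYTES = 8 + 20  # executor (64 bits) + signature (160 bits), per slide 10
FILE_BYTES = 12        # address, inode, time (32 bits each), per slide 10


def record_bytes(fan_in):
    """One record: header, one output identifier, fan_in input identifiers."""
    return HEADER_BYTES + FILE_BYTES * (1 + fan_in)


def cumulative_bytes(fan_in, steps):
    """Whole tree travels with the data: 1 + f + f^2 + ... records."""
    records = sum(fan_in ** level for level in range(steps))
    return records * record_bytes(fan_in)


def decentralized_bytes(fan_in):
    """Only the root record travels; inputs are reached via remote pointers."""
    return record_bytes(fan_in)
```

With three inputs per operation, `cumulative_bytes(3, 3)` is 988 bytes against 76 bytes for `decentralized_bytes(3)`. The cumulative figure grows geometrically with the number of steps while the decentralized one stays constant.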

  16. Verifying lineage

      Algorithm: CHECKLINEAGE(D)
        {E, S, O, I_1, ..., I_n} ← GETROOT(D)
        OUTPUT(E)
        P_E ← PKILOOKUP(E)
        if I_1, ..., I_n = {}
          then Result ← VERIFY(P_E, S, E, O)
               if Result = FALSE
                 then CheckFailed
          else Result ← VERIFY(P_E, S, E, O | I_1 | ... | I_n)
               if Result = TRUE
                 then for i ← 1 to n
                        do CHECKLINEAGE(I_i)
                 else CheckFailed

      (Slide annotation on the recursive step: reliability drops.)
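The recursion on slide 16 can be sketched as runnable code under some loudly stated substitutions. The slide verifies a public-key signature with a key obtained from a PKI; the Python standard library has no public-key signatures, so this sketch swaps in HMAC-SHA1 (which also produces 160 bits) with a shared key per executor. `KEYS`, `STORE`, `sign`, `make_record`, and the `|` message encoding are hypothetical stand-ins, and the two branches of the slide's algorithm are merged because a record without inputs simply signs `E` and `O`.

```python
import hashlib
import hmac

KEYS = {}   # executor -> key; stands in for PKILOOKUP (assumption)
STORE = {}  # file id -> lineage record; stands in for GETROOT (assumption)


def sign(key, executor, output, inputs):
    """Stand-in for SIGN: HMAC-SHA1 over E | O | I_1 | ... | I_n."""
    message = "|".join([executor, output, *inputs]).encode()
    return hmac.new(key, message, hashlib.sha1).digest()


def make_record(executor, output, inputs=()):
    """Create and store a signed record for `output`."""
    inputs = tuple(inputs)
    signature = sign(KEYS[executor], executor, output, inputs)
    STORE[output] = {"E": executor, "S": signature, "O": output, "I": inputs}


def check_lineage(d, seen_executors):
    """Return True if every record reachable from d verifies."""
    record = STORE.get(d)
    if record is None:
        return False                      # record unavailable
    e, s, o, inputs = record["E"], record["S"], record["O"], record["I"]
    seen_executors.append(e)              # OUTPUT(E)
    key = KEYS.get(e)                     # PKILOOKUP(E)
    if key is None:
        return False
    if not hmac.compare_digest(s, sign(key, e, o, inputs)):
        return False                      # CheckFailed
    # Recurse into each input; all() stops at the first failure.
    return all(check_lineage(i, seen_executors) for i in inputs)
```

A usage sketch: after registering keys for two executors, calling `make_record("alice", "raw.dat")` and `make_record("bob", "out.dat", ["raw.dat"])`, the call `check_lineage("out.dat", [])` succeeds, and it fails once any field of either record is altered. Each recursive step depends on another record being retrievable, which is one way to read the slide's note that reliability drops with depth.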

  17. Increasing availability
      - Traditional strategy:
        - Form virtual topology
        - Flood neighbors
      - Inefficient use of storage

  18. Bonsai
      - Prune lineage tree
      (Diagram: λ levels of the tree are stored locally; the pruned part must be recovered from a remote node.)

  19. Simplest pruning
      - Trade verification reliability for storage
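The pruning idea on slides 18 and 19 can be sketched as follows, reading the diagram as "keep λ levels of the tree locally and replace deeper branches with pointers to remote nodes". That reading, the nested-dictionary tree, and the names `build`, `prune`, `local_records`, and `remote_fetches` are assumptions made for illustration rather than the presentation's actual scheme.

```python
def build(name, depth, fan_in=2):
    """Build a uniform lineage tree with `depth` levels (toy data)."""
    children = []
    if depth > 1:
        children = [build(f"{name}.{i}", depth - 1, fan_in) for i in range(fan_in)]
    return {"output": name, "inputs": children}


def prune(node, levels):
    """Keep `levels` levels locally; deeper subtrees become remote pointers."""
    if levels <= 0:
        return {"pointer": node["output"]}
    return {
        "output": node["output"],
        "inputs": [prune(child, levels - 1) for child in node["inputs"]],
    }


def local_records(node):
    """Number of records stored with the data after pruning."""
    if "pointer" in node:
        return 0
    return 1 + sum(local_records(child) for child in node["inputs"])


def remote_fetches(node):
    """Number of pruned subtrees a verifier must fetch from remote nodes."""
    if "pointer" in node:
        return 1
    return sum(remote_fetches(child) for child in node["inputs"])
```

For a binary tree with three levels (seven records), `prune(tree, 2)` keeps three records locally and leaves four subtrees to fetch, while `prune(tree, 1)` keeps one record and leaves two. A smaller λ saves storage but makes verification depend on more remote nodes being reachable.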