Bonsai: Balanced Lineage Authentication
Ashish Gehani
Bonsai:Balanced Lineage Authentication – p. 1/19
What is data lineage?
[Figure: (a) Primitive operation: Input 1, ..., Input n → Operation → Output; (b) Compound operation tree]
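A minimal sketch of the two shapes in the figure, assuming a lineage node is either an original file (just a name) or the operation that produced an output. Nesting primitive operations gives the compound operation tree. The names `Operation`, `depth` and `sources` are hypothetical and not taken from Bonsai.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Operation:
    """Primitive operation: produces one output from inputs 1..n."""
    output: str
    # An input is either an original file (a name) or the Operation that produced it;
    # nesting Operations this way yields the compound operation tree.
    inputs: list[Union[str, Operation]] = field(default_factory=list)


Node = Union[str, Operation]


def depth(node: Node) -> int:
    """Number of operation levels above the original files (0 for an original file)."""
    if isinstance(node, str):
        return 0
    return 1 + max((depth(i) for i in node.inputs), default=0)


def sources(node: Node) -> set[str]:
    """Original files that an output ultimately derives from."""
    if isinstance(node, str):
        return {node}
    return set().union(*(sources(i) for i in node.inputs))
```

For example, compiling `b.c` and `b.h` into `b.o` and then linking `b.o` with `a.o` into `prog` gives a tree of depth 2 whose sources are `b.c`, `b.h` and `a.o`.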
Why track lineage?
GIS: data origins
Material science: component pedigree
Biology: experiment reproducibility
Grid: debugging
Why certify lineage?
Reproduction is costly:
  PDB: $200,000 per protein
  Fermilab Collision Detector: 1 month and multiple TB per datum
Reliability
Accreditation
Ownership
Auditability
What’s been done?
LFS: inputs, outputs, options → SQL
PASS: runtime environment → Berkeley DB
Trio: tracks data accuracy using lineage
CMCS: chemistry toolkit → WebDAV
Chimera: workflow scripts
myGrid: biology Grid workflows
Vesta: incremental builds
ESSW: Earth science data management
What’s the problem?
Single trust domain: Chimera, myGrid, Vesta, ESSW
Centralized service: LFS, PASS, Trio, CMCS
No assurance: lineage is unsigned and incomplete
What granularity?
What to audit: processes, system calls, or the file system?
Fine grain → high overhead
Coarse grain → false positives
File system:
  Pro: intermediate complexity
  Pro: captures most persistent changes
  Con: cannot track data from the network, keyboard, pipes, or memory maps
Certification approach?
No global TCB (trusted computing base)
Require commitments
Consumer checks agreement of the producer's output with the consumer's input (Output = Input)
Trusted user in a subtree / path → tampering detectable
[Figure: Producer (Input → Output) feeding Consumer (Input → Output); the producer's output must equal the consumer's input]
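The agreement check between producer and consumer can be sketched as follows, assuming each party commits to a file by hashing its contents. Here a bare SHA-1 digest stands in for a commitment; a real deployment would also need the commitment bound to the producer's identity (for example by a signature). `commit` and `consumer_accepts` are hypothetical names.

```python
import hashlib
import hmac


def commit(data: bytes) -> bytes:
    """Commitment to file contents; a bare SHA-1 digest stands in for a signed commitment."""
    return hashlib.sha1(data).digest()


def consumer_accepts(producer_commitment: bytes, consumer_input: bytes) -> bool:
    """Consumer-side check that its input agrees with the producer's committed output."""
    # compare_digest avoids leaking where the two digests first differ.
    return hmac.compare_digest(producer_commitment, commit(consumer_input))
```

If the data is altered anywhere between the producer writing it and the consumer reading it, the digests disagree and the tampering is detected at that edge.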
Metadata generation
Intercede on calls for: exec(), fork(), exit(), open(), close(), read(), write()
Maintain process table entries for accessed and modified files
[Figure: timeline of one process's execution (with its owner): File 1 and File 2 are each opened, read and closed; File 3 is opened, written and closed]
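A minimal sketch of the process-table bookkeeping, assuming a hypothetical trace format of `(pid, call, arg)` tuples. Only read(), write(), fork() and exit() are modeled; open(), close() and exec() are left out. The rule that every modified file depends on every file the process read, and that a forked child inherits its parent's inputs, are simplifying assumptions.

```python
from collections import defaultdict


def lineage_records(events):
    """Turn a trace of intercepted calls into (pid, output, inputs) records.

    `events` is a hypothetical trace format: (pid, call, arg) tuples where call is
    "read" or "write" (arg = file path), "fork" (arg = child pid) or "exit" (arg = None).
    """
    accessed = defaultdict(set)  # process table: files each process has read
    modified = defaultdict(set)  # process table: files each process has written
    records = []
    for pid, call, arg in events:
        if call == "read":
            accessed[pid].add(arg)
        elif call == "write":
            modified[pid].add(arg)
        elif call == "fork":
            # Assumption: the child starts from the parent's memory, so it inherits its inputs.
            accessed[arg] = set(accessed[pid])
        elif call == "exit":
            # Simplification: each modified file depends on every file the process read.
            inputs = frozenset(accessed.pop(pid, set()))
            for path in sorted(modified.pop(pid, set())):
                records.append((pid, path, inputs))
    return records
```

Replaying the figure's timeline (read File 1, read File 2, write File 3, exit) yields a single record: File 3 derived from File 1 and File 2.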
Minimal representation
Record layout: Executor | Signature | Output | Input 1 | ... | Input n
File layout: Net Address | Inode | Time
Executor: 32-bit IPv4 address, 32-bit user ID
Signature: 160 bits, $[\mathrm{Sign}_{K_E}(E, O, I_1, \ldots, I_n)]$
Input / output file: 32-bit IPv4 address, 32-bit inode, 32-bit time (seconds since 1/1/1970)
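The field sizes above give a record of 8 + 20 + 12(n + 1) bytes for n inputs. The sketch below packs one record in network byte order. It is an assumption-laden illustration: HMAC-SHA1 is used only because it also produces 160 bits; it is a shared-key MAC, not the public-key signature the scheme calls for. `FileId`, `pack_record` and `record_size` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import hmac
import socket
import struct
from dataclasses import dataclass

EXECUTOR_FMT = "!4sI"  # 32-bit IPv4 address, 32-bit user ID
FILE_FMT = "!4sII"     # 32-bit IPv4 address, 32-bit inode, 32-bit time (s since 1/1/1970)
SIGNATURE_BYTES = 20   # 160 bits


@dataclass(frozen=True)
class FileId:
    ip: str
    inode: int
    time: int

    def pack(self) -> bytes:
        return struct.pack(FILE_FMT, socket.inet_aton(self.ip), self.inode, self.time)


def pack_record(ip: str, uid: int, output: FileId, inputs: list[FileId], key: bytes) -> bytes:
    """Serialize one lineage record as Executor | Signature | Output | Input 1..n."""
    executor = struct.pack(EXECUTOR_FMT, socket.inet_aton(ip), uid)
    files = output.pack() + b"".join(f.pack() for f in inputs)
    # Stand-in for Sign_{K_E}(E, O, I_1, ..., I_n): HMAC-SHA1 has the same 160-bit size.
    signature = hmac.new(key, executor + files, hashlib.sha1).digest()
    return executor + signature + files


def record_size(n_inputs: int) -> int:
    """Bytes in a record with n_inputs inputs."""
    return struct.calcsize(EXECUTOR_FMT) + SIGNATURE_BYTES + struct.calcsize(FILE_FMT) * (n_inputs + 1)
```

A record with one input is therefore 52 bytes, and one with no inputs is 40 bytes.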
Workload
Berkeley NOW file system traces
A month of activity; access patterns are stable
Instruction: 20 workstations in a teaching lab
Research: 13 desktops of a research group
Web: 1 web server running Postgres
Windows: 8 Windows desktops
Cumulative lineage
Current paradigm: the entire tree migrates with the data
Metadata grows rapidly:
Workload      Step 1   Step 2   Step 3   Step 4   Step 5
Instruction   0.4 KB   3 KB     31 KB    253 KB   2 MB
Research      0.2 KB   0.8 KB   2 KB     8 KB     29 KB
Web           1 KB     39 KB    1 MB     29 MB    813 MB
Windows       0.2 KB   0.8 KB   2 KB     9 KB     30 KB
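Why the cumulative paradigm grows so quickly can be seen in an idealized model, assuming every operation has the same fan-in f and every input carries a full tree of equal depth. A tree with k levels then holds 1 + f + ... + f^(k-1) records, each of 8 + 20 + 12(f + 1) bytes (the minimal representation). This model is an assumption for illustration only; the measured traces above do not have uniform fan-in, and the function names are hypothetical.

```python
def record_bytes(fan_in: int) -> int:
    """One minimal record: executor (8 B), signature (20 B), a 12-byte file ID
    for the output and for each input."""
    return 8 + 20 + 12 * (fan_in + 1)


def cumulative_bytes(fan_in: int, steps: int) -> int:
    """Lineage carried with a file after `steps` levels of derivation when the
    whole tree migrates with the data (idealized uniform fan-in)."""
    nodes = sum(fan_in ** level for level in range(steps))
    return nodes * record_bytes(fan_in)
```

With fan-in 1 the size grows linearly (5 steps cost 5 × 52 = 260 bytes); with any larger fan-in it grows geometrically, e.g. 3 levels at fan-in 2 already hold 7 records of 64 bytes.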
Operational impact
Time (ms) to read the tree in open():
Workload      Step 1   Step 2   Step 3   Step 4
Instruction   0.04     0.05     0.11     1.72
Research      0.05     0.05     0.04     0.04
Web           0.06     0.13     6.42     997.5
Windows       0.07     0.04     0.04     0.04
Time (ms) to write the tree in close():
Workload      Step 1   Step 2   Step 3   Step 4
Instruction   0.20     0.28     0.32     0.84
Research      0.16     0.19     2.39     3.1
Web           0.16     0.24     4.82     579.14
Windows       0.16     0.50     5.34     3.17
In practice
A larger representation is needed unless certification is available for:
  DHCP bindings
  inode mappings
  clock synchronization
Decentralized lineage
Proposed paradigm: remote pointers replace branches
Metadata remains small:
Workload      Storage
Instruction   0.4 KB
Research      0.2 KB
Web           1 KB
Windows       0.2 KB
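A minimal sketch of "remote pointers replace branches", assuming each file keeps only its own record whose inputs are identifiers, and that a dictionary stands in for the remote hosts holding the other records. The local cost is one fixed-size record regardless of lineage depth; the full tree is rebuilt only on demand. `Record`, `local_bytes` and `expand` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    output: str
    inputs: tuple[str, ...] = ()  # remote pointers to the inputs' own records


def local_bytes(rec: Record) -> int:
    """Bytes stored with the file: one minimal record, whatever the lineage depth."""
    return 8 + 20 + 12 * (len(rec.inputs) + 1)


def expand(directory: dict[str, Record], file_id: str) -> dict:
    """Rebuild the full lineage tree on demand by dereferencing remote pointers.

    `directory` stands in for the remote hosts that keep each file's record.
    """
    rec = directory.get(file_id)
    if rec is None:  # an original file with no recorded lineage
        return {"file": file_id, "inputs": []}
    return {"file": file_id, "inputs": [expand(directory, i) for i in rec.inputs]}
```

In a chain f0 → f1 → f2 → f3, the record stored with f3 is the same 52 bytes as the one stored with f1, even though f3's lineage is three levels deep.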
Verifying lineage
Algorithm: CheckLineage(D)
  {E, S, O, I_1, ..., I_n} ← GetRoot(D)
  Output(E)
  P_E ← PKILookup(E)
  if {I_1, ..., I_n} = {}
    then Result ← Verify(P_E, S, E, O)
         if Result = False
           then CheckFailed
    else Result ← Verify(P_E, S, E, O | I_1 | ... | I_n)
         if Result = True
           then for i ← 1 to n
                  do CheckLineage(I_i)    ← reliability drops
           else CheckFailed
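A runnable sketch of CheckLineage under stated assumptions: `get_root` and `pki_lookup` are caller-supplied stand-ins for GetRoot and PKILookup, Output(E) appends to an audit list, and HMAC-SHA1 over E|O|I_1|...|I_n replaces the public-key Verify (so the "public key" here is really a shared key). Because the signed message with no inputs is just E|O, the pseudocode's two branches collapse into one check. All names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Root:
    executor: str             # E: who ran the operation
    signature: bytes          # S
    output: str               # O: identifier of the output file
    inputs: tuple[str, ...]   # I_1..I_n: identifiers of the input files


class CheckFailed(Exception):
    """Raised when a signature in the lineage tree does not verify."""


def sign(key: bytes, executor: str, output: str, inputs: tuple[str, ...] = ()) -> bytes:
    # Stand-in for Sign_{K_E}(E, O, I_1, ..., I_n): HMAC-SHA1 (160 bits) over E|O|I_1|...|I_n.
    message = "|".join((executor, output) + tuple(inputs)).encode()
    return hmac.new(key, message, hashlib.sha1).digest()


def check_lineage(d, get_root, pki_lookup, audit_log):
    root = get_root(d)               # {E, S, O, I_1..I_n} <- GetRoot(D)
    audit_log.append(root.executor)  # Output(E)
    key = pki_lookup(root.executor)  # P_E <- PKILookup(E)
    expected = sign(key, root.executor, root.output, root.inputs)
    if not hmac.compare_digest(expected, root.signature):
        raise CheckFailed(d)
    for i in root.inputs:            # each recursive lookup is another point of failure
        check_lineage(i, get_root, pki_lookup, audit_log)
```

Each recursive call needs another remote lookup, which is why reliability drops as the check descends the tree.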
Increasing availability
Traditional strategy:
  Form a virtual topology
  Flood neighbors
Inefficient use of storage
Bonsai
Prune the lineage tree:
  Top λ levels: stored locally
  Pruned levels: must be recovered from a remote node
[Figure: lineage tree whose top λ levels are stored locally and whose pruned subtrees must be fetched remotely]
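A minimal sketch of pruning to λ levels, assuming a node below the cut is replaced by a bare pointer (its file identifier, flagged as remote) rather than dropped. `Node`, `prune` and `count_local` are hypothetical names, not Bonsai's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    file_id: str
    children: list[Node] = field(default_factory=list)
    remote: bool = False  # True: only a pointer is kept; the subtree lives on a remote node


def prune(node: Node, levels: int) -> Node:
    """Keep the top `levels` (lambda) levels of the lineage tree locally and
    replace everything below with remote pointers."""
    if levels <= 0:
        return Node(node.file_id, remote=True)
    return Node(node.file_id, [prune(c, levels - 1) for c in node.children])


def count_local(node: Node) -> int:
    """Number of records still stored locally."""
    if node.remote:
        return 0
    return 1 + sum(count_local(c) for c in node.children)
```

For a complete binary tree with three levels (7 records), λ = 1 keeps 1 record locally, λ = 2 keeps 3, and λ = 3 keeps all 7.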
Simplest pruning
Trade verification reliability for storage
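The trade-off can be made concrete with a simple model. This is an assumption for illustration, not a result from the traces: a linear lineage chain of a given depth, the top λ records kept locally at 52 bytes each (a one-input minimal record), and every remaining record fetched remotely, each fetch succeeding independently with probability p. `chain_tradeoff` is a hypothetical name.

```python
def chain_tradeoff(depth: int, local_levels: int, p_fetch: float, record_bytes: int = 52):
    """Storage vs. verification reliability for a linear lineage chain.

    Assumes only levels beyond `local_levels` must be fetched remotely and each
    fetch succeeds independently with probability p_fetch.
    Returns (bytes stored locally, probability the whole chain can be verified).
    """
    kept = min(local_levels, depth)
    remote = depth - kept
    return kept * record_bytes, p_fetch ** remote
```

With a depth-4 chain and p = 0.5, keeping 2 levels costs 104 bytes and verifies with probability 0.25, while keeping all 4 costs 208 bytes and always verifies.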