
Building an Extensible File System via Policy-based Data Management

Building an Extensible File System via Policy-based Data Management Hao Xu Jewel H. Ward Mike Conway Arcot Rajasekar Reagan W. Moore (iRODS


  1. Building an Extensible File System via Policy-based Data Management
     Hao Xu, Jewel H. Ward, Mike Conway, Arcot Rajasekar, Reagan W. Moore
     (iRODS Consortium, http://irods.org)

  2. File System q Essential Functions: § Ingest, Store, Access q Modern File Systems are built on top of traditional file systems: § Google File System, Amazon S3, Hadoop Distributed File System § Driven by the need of a target application § Customized toward the target application domain

  3. Data Management Needs in Archive and Scientific Communities
     • Discoverability
     • Complex Metadata
     • Workflow Management
     • Data Sharing
     • Provenance
     • Long Term Preservation
     • Technology Migration
     • Interoperability Between Infrastructures

  4. Challenges
     Can generic infrastructure meet the needs of a diverse set of data management domains?

  5. Flexibility to Define a Wide Range of Application Domain Policies
     • User Community → Policies
     • File ingest operations:
       - Authentication
       - Authorization
       - Storage Quota
       - Aggregation
       - Resource Selection
       - Replication
       - File Retention
       - Metadata

  6. Infrastructure Support For Non-standard Application Domain Operations
     • Standard file system operations have robust support:
       - Metadata
       - Auditing
       - Access Control List
     • Non-standard operations that are implemented as a library do not have direct support from the file system. Examples:
       - Preservation – OAIS: SIP, AIP, DIP packages
       - Digital library – Provenance & discovery metadata
       - Processing pipeline – Format transformation

  7. Interoperability with Other Infrastructures
     • Emergent scalability mechanisms:
       - Organization change
         - List → Tree → Graph (Internet) → Search
       - Data structure change
         - Files, tables, streams
       - Property enforcement expectations
         - Reproducible data-driven research
     • Separation of how files are stored, accessed, and manipulated

  8. Policy-based Data Management

  9. Policy = Metadata + Procedure
     • Purpose: Reason a collection is assembled
     • Properties: Attributes needed to ensure the purpose
     • Policies: Controls for enforcing desired properties
       - Procedural Policy: Example: When an object is ingested, run workflow
       - Assertional Policy: Example: A file has three or more replicas
     • Metadata: Persistent state
       - State information (consistency in a distributed environment)
       - Generated through application of procedures
     • Procedures: Operations performed within the system
       - What to run: Functions that implement the policies
       - How to verify: Validation that metadata conforms to the desired purpose
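
The two policy kinds on this slide can be illustrated with a minimal Python sketch. It assumes an in-memory dictionary stands in for the persistent metadata, and every name in it (`catalog`, `replicate`, `on_ingest`, `has_enough_replicas`, the resource names) is hypothetical rather than part of iRODS.

```python
# Hypothetical in-memory model of "Policy = Metadata + Procedure"; not the iRODS API.
from typing import Dict, List

# Metadata: persistent state recorded per object (here, replica locations).
catalog: Dict[str, List[str]] = {}


def replicate(path: str, resource: str) -> None:
    """Procedure: record a new replica, which updates the persistent state."""
    replicas = catalog.setdefault(path, [])
    if resource not in replicas:
        replicas.append(resource)


def on_ingest(path: str) -> None:
    """Procedural policy: when an object is ingested, run a workflow."""
    for resource in ("disk-a", "disk-b", "tape-c"):  # assumed resource names
        replicate(path, resource)


def has_enough_replicas(path: str, minimum: int = 3) -> bool:
    """Assertional policy: a file has `minimum` or more replicas."""
    return len(catalog.get(path, [])) >= minimum


on_ingest("/zone/home/alice/data.csv")
print(has_enough_replicas("/zone/home/alice/data.csv"))
```

The procedural policy changes state, while the assertional policy only reads it, which is what makes the latter suitable for later verification.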

  10. Policy-based Data Management
      [Diagram: concept graph. Node labels: Purpose, Collection, Property, Policy, Procedure, Metadata, Periodic Assessment Criteria Policy. Edge labels: Defines, Updates, Controls, SubType.]

  11. Policy-based Data Management - Collection
      [Diagram: the concept graph from slide 10, extended with the nodes Digital Object and Attribute and the edge labels Has and Isa.]

  12. Policy-based Data Management – Collection Properties
      [Diagram: the concept graph from slide 11, extended with the properties Integrity, Authenticity, and Access control (Isa) and the features Completeness, Correctness, Consensus, and Consistency (HasFeature).]

  13. Policy-based Data Management – Collection Policies
      [Diagram: the concept graph from slide 12, extended with Replication Policy, Checksum Policy, Quota Policy, and Data Type Policy (Isa).]

  14. Policy-based Data Management – Collection Procedures
      [Diagram: the concept graph from slide 13, extended with the nodes Workflow, Function, and Operation, the edge label Chains, and the procedure names GetUserACL, SetDataType, SetQuota, DataObjRepl, and SysChksumDataObj (Isa).]

  15. Policy-based Data Management – Persistent State
      [Diagram: the concept graph from slide 14, extended with the persistent state attributes DATA_ID, DATA_REPL_NUM, and DATA_CHECKSUM (Isa).]
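
The persistent state attributes named on this slide (DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM) suggest how a periodic integrity assessment might work: recompute each replica's checksum and compare it with the recorded one. The sketch below is an illustration under assumptions, not the iRODS implementation. The `Replica` class, the `assess_integrity` function, and the choice of SHA-256 are all hypothetical, and replica content is held in memory instead of on storage.

```python
# Hypothetical periodic integrity assessment; field names mirror the slide's attributes.
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Replica:
    data_id: int        # DATA_ID
    data_repl_num: int  # DATA_REPL_NUM
    data_checksum: str  # DATA_CHECKSUM recorded at ingest
    content: bytes      # stands in for the bytes held on storage


def checksum(content: bytes) -> str:
    """Assumed checksum algorithm (SHA-256); the slides do not name one."""
    return hashlib.sha256(content).hexdigest()


def assess_integrity(replicas: List[Replica]) -> List[Tuple[int, int]]:
    """Return (DATA_ID, DATA_REPL_NUM) for replicas whose content no longer matches."""
    return [
        (r.data_id, r.data_repl_num)
        for r in replicas
        if checksum(r.content) != r.data_checksum
    ]


good = b"observations"
replicas = [
    Replica(1, 0, checksum(good), good),
    Replica(1, 1, checksum(good), b"observations (corrupted)"),
]
print(assess_integrity(replicas))
```

Only the second replica is reported, because its stored bytes differ from what was checksummed at ingest.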

  16. Policy-based Data Management – Policy Enforcement
      [Diagram: the concept graph from slide 15, extended with the nodes Policy Enforcement Point and Client Action and the edge label Invokes.]

  17. Example of Policy-based Data Management

  18. Policy-based Infrastructure: integrated Rule-Oriented Data System (iRODS)
      • Biology
        - Cognitive Science: Temporal Dynamics of Learning Center
        - Human genome: Broad Institute, Wellcome Trust Sanger Institute, NGS
        - Medicine: Sick Kids Hospital
        - Neuroscience: International Neuroinformatics Coordinating Facility
        - Plant genome: the iPlant Collaborative
        - Phylogenetics: Phylogenetics at CC IN2P3
      • Computer Science
        - Network research: GENI experimental network
      • Earth Sciences
        - Atmospheric science: NASA Langley Atmospheric Sciences Center
        - Climate: NOAA National Climatic Data Center; NASA Center for Climate Simulations
        - Ecology: CEED Caveat Emptor Ecological Data
        - Hydrology: Institute for the Environment, UNC-CH; Hydroshare
        - Oceanography: Ocean Observatories Initiative
        - Seismology: Southern California Earthquake Center
      • Engineering
        - Education repository: CIBER-U
      • Physics
        - Astrophysics: Auger supernova search
        - Cosmic Ray: AMS experiment on the International Space Station
        - Dark Matter Physics: Edelweiss II
        - High Energy Physics: BaBar / Stanford Linear Accelerator
        - Neutrino Physics: T2K and dChooz neutrino experiments
        - Optical Astronomy: National Optical Astronomy Observatory
        - Particle Physics: Indra multi-detector collaboration at IN2P3
        - Quantum Chromodynamics: IN2P3
        - Radio Astronomy: Cyber Square Kilometer Array, TREND, BAOradio
      • Social Science: Odum, TerraPop

  19. Policy Applications
      • Pre-process policy
        - Applied before an operation is done
      • Operation
        - May be policy controlled
      • Post-process policy
        - Applied after the operation is done
      • Are these sufficient to handle the wide diversity of data management applications?
      • Does this minimize the number of required operations?
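
The pre-process / operation / post-process pattern on this slide can be sketched as a wrapper that runs one policy before an operation and another after it. This is an illustration under assumptions: `enforce`, `check_quota`, `record_ingest`, `put`, and the quota value are hypothetical names, not iRODS policy enforcement points.

```python
# Hypothetical sketch of pre-/post-process policies around one operation.
from typing import Callable, List

audit_log: List[str] = []  # stands in for persistent state
QUOTA_BYTES = 1_000        # assumed per-file limit for this example


def enforce(pre: Callable[..., None], post: Callable[..., None]):
    """Wrap an operation so `pre` runs before it and `post` runs after it."""
    def decorator(operation):
        def wrapped(*args):
            pre(*args)                 # pre-process policy; may veto by raising
            result = operation(*args)  # the operation itself
            post(*args)                # post-process policy
            return result
        return wrapped
    return decorator


def check_quota(path: str, data: bytes) -> None:
    """Pre-process policy: reject data larger than the assumed quota."""
    if len(data) > QUOTA_BYTES:
        raise PermissionError(f"quota exceeded for {path}")


def record_ingest(path: str, data: bytes) -> None:
    """Post-process policy: record what was ingested."""
    audit_log.append(f"ingested {path} ({len(data)} bytes)")


@enforce(pre=check_quota, post=record_ingest)
def put(path: str, data: bytes) -> int:
    return len(data)  # a real system would write to storage here


put("/zone/a.txt", b"hello")
print(audit_log)
```

Because the pre-process policy raises before the operation runs, a rejected request leaves no audit entry and performs no write.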

  20. Policy (Workflow) in Hydrology
      RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework, and full, initial system state.
      For each box, create a micro-service to automate the task, and chain into a workflow.
      [Diagram box labels: Choose gauge or outlet (HIS); Extract drainage area (NHDPlus); Digital Elevation Model (DEM); Slope; Aspect; Streams (NHD); Roads (DOT); Land Use NLCD (EPA); Leaf Area Index Landsat TM; Phenology MODIS; Soil Data USDA; Nested watershed structure; Strata; Patch; Hillslope; Basin; Stream network; Soil and vegetation parameter files; Worldfile; Flowtable; RHESSys.]
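
The slide's instruction to create a micro-service for each box and chain them into a workflow can be sketched as function composition over a shared state. The Python below is a hedged illustration only: the `chain` helper, the three step functions, and the placeholder values they produce are hypothetical and do not reproduce RHESSys or iRODS micro-services.

```python
# Hypothetical sketch: each "box" is a small function; a workflow is their chain.
from functools import reduce
from typing import Callable, Dict

State = Dict[str, object]
MicroService = Callable[[State], State]


def chain(*steps: MicroService) -> MicroService:
    """Compose micro-services so each one receives the previous one's output."""
    def workflow(state: State) -> State:
        return reduce(lambda current, step: step(current), steps, state)
    return workflow


def choose_outlet(state: State) -> State:
    return {**state, "outlet": "gauge-001"}  # placeholder value


def extract_drainage_area(state: State) -> State:
    return {**state, "drainage_area": f"area-for-{state['outlet']}"}


def build_worldfile(state: State) -> State:
    return {**state, "worldfile": [state["outlet"], state["drainage_area"]]}


build = chain(choose_outlet, extract_drainage_area, build_worldfile)
result = build({})
print(result["worldfile"])
```

Each step returns a new state instead of mutating its input, so steps can be reordered or reused in other chains without hidden side effects.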

  21. Policies in Software Defined Networking
      Control selection of network paths
      [Diagram labels: Rule Engine, Network Data, GraphDB, iCAT, Policies, OF Controller, and three iRODS Servers.]

  22. Policy in Data Storage
      Aggregation / Caching / Replication
      Queen Mary University of London
      Source: Di Lodovico et al.
