Cloud Filesystem
Jeff Darcy, for BBLISA, October 2011
What is a Filesystem?
• “The thing every OS and language knows”
• Directories, files, file descriptors
• Directories within directories
• Operate on single record (POSIX: single byte) within a file
• Built-in permissions model (e.g. UID, GID, ugo·rwx)
• Defined concurrency behaviors (e.g. fsync)
• Extras: symlinks, ACLs, xattrs
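As a concrete illustration of the byte-granular API, permissions model, and durability behavior listed above, here is a minimal C sketch of my own (not from the talk); the file name example.dat and the byte offset are arbitrary.

    /* Minimal sketch: byte-granular POSIX I/O with an explicit permission
     * mode and explicit durability via fsync. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/stat.h>

    int main(void)
    {
        /* Create a file: owner read/write, group/other read (the ugo model). */
        int fd = open("example.dat", O_CREAT | O_RDWR,
                      S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
        if (fd < 0) { perror("open"); return 1; }

        /* Operate on a single record at an arbitrary byte offset. */
        const char rec[] = "record";
        if (pwrite(fd, rec, sizeof(rec) - 1, 4096) < 0) { perror("pwrite"); return 1; }

        /* Defined durability behavior: force the data to stable storage. */
        if (fsync(fd) < 0) { perror("fsync"); return 1; }

        close(fd);
        return 0;
    }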
Are Filesystems Relevant?
• Supported by every language and OS natively
• Shared data with rich semantics
• Graceful and efficient handling of multi-GB objects
• Permission model missing in some alternatives
• Polyglot storage, e.g. DB to index data in FS
Network Filesystems
• Extend filesystem to multiple clients
• Awesome idea so long as total required capacity/performance doesn't exceed a single server
  o ...otherwise you get server sprawl
• Plenty of commercial vendors, community experience
• Making NFS highly available brings extra headaches
Distributed Filesystems
• Aggregate capacity/performance across servers
• Built-in redundancy
  o ...but watch out: not all deal with HA transparently
• Among the most notoriously difficult kinds of software to set up, tune and maintain
  o Anyone want to see my Lustre scars?
• Performance profile can be surprising
• Result: seen as specialized solution (esp. HPC)
Example: NFS4.1/pNFS
• pNFS distributes data access across servers
• Referrals etc. offload some metadata
• Only a protocol, not an implementation
  o OSS clients, proprietary servers
• Does not address metadata scaling at all
• Conclusion: partial solution, good for compatibility, full solution might layer on top of something else
Example: Ceph
• Two-layer architecture
• Object layer (RADOS) is self-organizing
  o can be used alone for block storage via RBD (librados sketch after the diagram)
• Metadata layer provides POSIX file semantics on top of RADOS objects
• Full-kernel implementation
• Great architecture; some day it will be a great implementation
Ceph Diagram: client, metadata layer, and data objects layered on top of the Ceph RADOS layer
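To make the "object layer can be used alone" point concrete, here is a hedged librados sketch (not from the talk) that writes one object directly into RADOS with no filesystem layer involved; the pool name "data", the object name "greeting", and the config path are assumptions.

    /* Hedged sketch: talk to the RADOS object layer directly via librados.
     * Link with -lrados. Pool/object names and config path are made up. */
    #include <rados/librados.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;

        if (rados_create(&cluster, NULL) < 0) { fprintf(stderr, "rados_create failed\n"); return 1; }
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");   /* assumed config path */
        if (rados_connect(cluster) < 0) { fprintf(stderr, "connect failed\n"); return 1; }

        /* Write an object straight into the object layer; no POSIX metadata in sight. */
        if (rados_ioctx_create(cluster, "data", &io) < 0) { fprintf(stderr, "no such pool\n"); return 1; }
        const char *payload = "hello, object world";
        rados_write(io, "greeting", payload, strlen(payload), 0);

        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return 0;
    }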
Example: GlusterFS
• Single-layer architecture
  o sharding instead of layering
  o one type of server – data and metadata
• Servers are dumb, smart behavior driven by clients (toy sketch after the diagram)
• FUSE implementation
• Native, NFSv3, UFO, Hadoop
GlusterFS Diagram: client talks directly to bricks A–D, each holding both data and metadata
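The toy sketch below is my own illustration of the client-driven placement idea: the client hashes a file name and picks a brick itself, so no central metadata server is consulted. Real GlusterFS assigns per-directory hash ranges stored in xattrs and uses a different hash; the brick list and FNV-1a hash here are simplifying assumptions.

    /* Toy illustration of client-side placement: hash the file name, pick a
     * brick. Not GlusterFS code; it only shows why the servers can stay dumb. */
    #include <stdio.h>
    #include <stdint.h>

    static const char *bricks[] = { "brickA", "brickB", "brickC", "brickD" };

    /* FNV-1a, a stand-in for GlusterFS's actual hash. */
    static uint32_t hash_name(const char *name)
    {
        uint32_t h = 2166136261u;
        for (; *name; name++) {
            h ^= (uint8_t)*name;
            h *= 16777619u;
        }
        return h;
    }

    int main(void)
    {
        const char *files[] = { "notes.txt", "photo.jpg", "build.log" };
        for (int i = 0; i < 3; i++)
            printf("%s -> %s\n", files[i], bricks[hash_name(files[i]) % 4]);
        return 0;
    }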
OK, What About HekaFS?
• Don't blame me for the name
  o trademark issues are a distraction from real work
• Existing DFSes solve many problems already
  o sharding, replication, striping
• What they don't address is cloud-specific deployment
  o lack of trust (user/user and user/provider)
  o location transparency
  o operationalization
Why Start With GlusterFS?
• Not going to write my own from scratch
  o been there, done that
  o leverage existing code, community, user base
• Modular architecture allows adding functionality via an API
  o separate licensing, distribution, support
• By far the best configuration/management
• OK, so it's FUSE
  o not as bad as people think
    + add more servers
HekaFS Current Features
• Directory isolation
• ID isolation
  o “virtualize” between server ID space and tenants'
• SSL
  o encryption useful on its own
  o authentication is needed by other features
• At-rest encryption
  o keys ONLY on clients
  o AES-256 through AES-1024, “ESSIV-like”
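For the "ESSIV-like" bullet, here is a sketch of the generic ESSIV construction (as used by dm-crypt), not HekaFS's actual code: hash the data key to get a salt, then encrypt the block number under that salt to produce a per-block IV. The demo key and the use of OpenSSL are my assumptions; build with -lcrypto.

    /* Sketch of the ESSIV idea: IV = E_{H(key)}(block number).
     * Illustration only; not taken from HekaFS. */
    #include <openssl/aes.h>
    #include <openssl/sha.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static void essiv_iv(const unsigned char *key, size_t keylen,
                         uint64_t block_no, unsigned char iv[16])
    {
        unsigned char salt[SHA256_DIGEST_LENGTH];
        unsigned char plain[16] = { 0 };
        AES_KEY essiv_key;

        SHA256(key, keylen, salt);                  /* salt = H(key)          */
        AES_set_encrypt_key(salt, 256, &essiv_key); /* key the IV cipher      */
        memcpy(plain, &block_no, sizeof(block_no)); /* block number as input  */
        AES_encrypt(plain, iv, &essiv_key);         /* IV = E_salt(block#)    */
    }

    int main(void)
    {
        unsigned char key[32] = "demo key, 32 bytes of material!"; /* demo key only */
        unsigned char iv[16];

        essiv_iv(key, sizeof(key), 42, iv);
        for (int i = 0; i < 16; i++)
            printf("%02x", iv[i]);
        printf("\n");
        return 0;
    }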
HekaFS Future Features
• Enough of multi-tenancy, now for other stuff
• Improved (local/sync) replication
  o lower latency, faster repair
• Namespace (and small-file?) caching
• Improved data integrity
• Improved distribution
  o higher server counts, smoother reconfiguration
• Erasure codes?
HekaFS Global Replication
• Multi-site asynchronous
• Arbitrary number of sites
• Write from any site, even during partition
  o ordered, eventually consistent with conflict resolution
• Caching is just a special case of replication
  o interest expressed (and withdrawn), not assumed
• Some infrastructure being done early for local replication
Project Status
• All open source
  o code hosted by Fedora, bugzilla by Red Hat
  o Red Hat also pays me (and others) to work on it
• Close collaboration with Gluster
  o they do most of the work
  o they're open-source folks too
  o completely support their business model
• “current” = Fedora 16
• “future” = Fedora 17+ and Red Hat product
Contact Info
• Project
  o http://hekafs.org
  o jdarcy@redhat.com
• Personal
  o http://pl.atyp.us
  o jeff@pl.atyp.us