SLIDE 1

Linux Open Source Distributed Filesystem

Ceph at SURFsara
Remco van Vugt
July 2, 2013

SLIDE 2

Agenda

◮ Ceph internal workings
  ◮ Ceph components
  ◮ CephFS
  ◮ Ceph OSD
◮ Research project results
  ◮ Stability
  ◮ Performance
  ◮ Scalability
  ◮ Maintenance
  ◮ Conclusion
◮ Questions

SLIDE 3

Ceph components

SLIDE 4

CephFS

◮ Fairly new, under heavy development
◮ POSIX compliant
◮ Can be mounted through FUSE in userspace, or by kernel driver
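Both mount paths from the last bullet can be scripted; a minimal sketch below, where the monitor address, secret file and mount point are illustrative values, not taken from the slides.

import subprocess

MON = "192.168.0.1:6789"      # illustrative monitor address
MNT = "/mnt/cephfs"           # illustrative mount point

# Kernel driver: the in-kernel client mounts CephFS directly.
subprocess.run(
    ["mount", "-t", "ceph", MON + ":/", MNT,
     "-o", "name=admin,secretfile=/etc/ceph/admin.secret"],
    check=True)

# FUSE alternative: ceph-fuse runs the filesystem client in userspace.
# subprocess.run(["ceph-fuse", "-m", MON, MNT], check=True)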

SLIDE 5

CephFS (2)

Figure: Ceph state of development

SLIDE 6

CephFS (3)

Figure: Dynamic subtree partitioning

SLIDE 7

Ceph OSD

◮ Stores object data in flat files in the underlying filesystem (XFS, BTRFS)
◮ Multiple OSDs on a single node (usually one per disk)
◮ 'Intelligent daemon': handles replication, redundancy and consistency
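To make "object data in flat files" concrete, a minimal librados sketch, assuming the python-rados bindings, a readable /etc/ceph/ceph.conf with an admin keyring, and an existing pool named data (all illustrative). The object written here ends up as a flat file inside its placement group's directory on one OSD's local XFS/BTRFS filesystem.

import rados

# Connect using the local cluster configuration and admin keyring.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

# Write one object into the (illustrative) 'data' pool and read it back.
ioctx = cluster.open_ioctx("data")
ioctx.write_full("demo-object", b"stored as a flat file on some OSD\n")
print(ioctx.read("demo-object"))
ioctx.close()
cluster.shutdown()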

SLIDE 8

CRUSH

◮ Cluster map
◮ Object placement is calculated, not indexed
◮ Objects are grouped into Placement Groups (PGs)
◮ Clients interact directly with OSDs
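A toy illustration of "calculated, not indexed" placement. This is not the real CRUSH algorithm (which uses rjenkins hashing and a hierarchical cluster map); it only shows that any client can compute an object's location without consulting a central index.

import hashlib

def place(obj_name, pg_num, osds, replicas=2):
    # Object name -> placement group (real Ceph: rjenkins hash mod pg_num).
    pg = int(hashlib.md5(obj_name.encode()).hexdigest(), 16) % pg_num
    # PG -> deterministic, pseudo-random set of OSDs (real Ceph: CRUSH walk).
    start = pg % len(osds)
    return pg, [osds[(start + i) % len(osds)] for i in range(replicas)]

pg, acting = place("my-object", pg_num=1200, osds=list(range(24)))
print("PG", pg, "-> OSDs", acting)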

SLIDE 9

Placement group

Figure: Placement groups

SLIDE 10

Failure domains

Figure: CRUSH algorithm

SLIDE 11

Replication

Figure: Replication

SLIDE 12

Monitoring

◮ OSDs peer with, and report on, each other
◮ An OSD is either up or down
◮ An OSD is either in or out of the cluster
◮ The MONs keep the overview and distribute cluster map changes
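The up/down and in/out counts that the MONs track can also be queried from a client; a small sketch using the python-rados mon_command interface, assuming admin credentials (the JSON field names may differ between Ceph releases).

import json
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

# Ask the monitors for the OSD map summary ('ceph osd stat' equivalent).
ret, outbuf, errs = cluster.mon_command(
    json.dumps({"prefix": "osd stat", "format": "json"}), b"")
if ret == 0:
    stat = json.loads(outbuf)
    print("OSDs:", stat.get("num_osds"),
          "up:", stat.get("num_up_osds"),
          "in:", stat.get("num_in_osds"))
cluster.shutdown()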

SLIDE 13

OSD fault recovery

◮ OSD down: I/O continues to the secondary (or tertiary) OSD assigned to the PG (active+degraded)
◮ OSD down longer than the configured timeout: OSD is marked down and out (kicked out of the cluster)
◮ PG data is remapped to other OSDs and re-replicated in the background
◮ PGs can be down if all copies are down
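A toy model of this flow (not Ceph code; the 300-second value merely stands in for the real 'mon osd down out interval' setting):

DOWN_OUT_INTERVAL = 300  # illustrative stand-in for 'mon osd down out interval'

def recover(pg_acting, down_osd, seconds_down, spare_osds):
    # Drop the failed OSD; the remaining replicas keep serving I/O.
    acting = [o for o in pg_acting if o != down_osd]
    if not acting:                          # all copies down -> PG down
        return acting, "down"
    if seconds_down <= DOWN_OUT_INTERVAL:   # still only 'down', not 'out'
        return acting, "active+degraded"
    # OSD marked out: remap the PG and re-replicate in the background.
    replacement = next(o for o in spare_osds if o not in acting)
    return acting + [replacement], "active+remapped (backfilling)"

print(recover([4, 11, 23], down_osd=4, seconds_down=60,  spare_osds=range(36)))
print(recover([4, 11, 23], down_osd=4, seconds_down=900, spare_osds=range(36)))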

SLIDE 14

Rebalancing

SLIDE 15

Research

SLIDE 16

Research questions

◮ Research question
  ◮ Is the current version of CephFS (0.61.3) production-ready for use as a distributed filesystem in a multi-petabyte environment, in terms of stability, scalability, performance and manageability?
◮ Sub-questions
  ◮ Is Ceph, and in particular the CephFS component, stable enough for production use at SURFsara?
  ◮ What are the scaling limits in CephFS, in terms of capacity and performance?
  ◮ Does Ceph(FS) meet the maintenance requirements for the environment at SURFsara?

SLIDE 17

Stability

◮ Various tests performed, including:
  ◮ Cut power to OSD, MON and MDS nodes
  ◮ Pulled disks from OSD nodes (within a failure domain)
  ◮ Corrupted underlying storage files on an OSD
  ◮ Killed daemon processes
◮ No serious problems encountered, except with multi-MDS
◮ Never encountered data loss

SLIDE 18

Performance

◮ Benchmarked RADOS and CephFS
  ◮ Bonnie++
  ◮ RADOS bench
◮ Tested under various conditions:
  ◮ Normal
  ◮ Degraded
  ◮ Rebuilding
  ◮ Rebalancing
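For reference, a minimal sketch of driving a RADOS bench run from Python; the pool name, duration and thread count are illustrative, not the parameters actually used in the project.

import subprocess

POOL, SECONDS, THREADS = "benchpool", "60", "16"   # illustrative values

# Sequential-write benchmark; keep the objects so the read test has data.
subprocess.run(["rados", "-p", POOL, "bench", SECONDS, "write",
                "-t", THREADS, "--no-cleanup"], check=True)

# Sequential-read benchmark over the objects written above.
subprocess.run(["rados", "-p", POOL, "bench", SECONDS, "seq",
                "-t", THREADS], check=True)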

SLIDE 19

RADOS Performance

SLIDE 20

CephFS Performance

SLIDE 21

CephFS MDS Scalability

◮ Tested metadata performance using mdtest
◮ Various POSIX operations, using 1000, 2000, 4000, 8000 and 16000 files per directory
◮ Tested 1 and 3 MDS setups
◮ Tested single and multiple directories
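A hypothetical reconstruction of one such sweep; the mdtest flags shown are standard, but the MPI rank count and target path are assumptions, since the slides do not give the exact invocation.

import subprocess

# Sweep the file counts from the slide against a CephFS mount with mdtest.
for files_per_dir in (1000, 2000, 4000, 8000, 16000):
    subprocess.run(
        ["mpirun", "-np", "4", "mdtest",
         "-n", str(files_per_dir),        # items created per MPI rank
         "-d", "/mnt/cephfs/mdtest"],     # assumed CephFS test directory
        check=True)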

SLIDE 22

CephFS MDS Scalability (2)

◮ Results:
  ◮ Did not multi-thread properly
  ◮ Scaled over multiple MDSs
  ◮ Scaled over multiple directories
  ◮ However...

SLIDE 23

CephFS MDS Scalability (3)

SLIDE 24

Ceph OSD Scalability

◮ Two options for scaling:
  ◮ Horizontal: adding more OSD nodes
  ◮ Vertical: adding more disks to OSD nodes
◮ But how far can we scale?

SLIDE 25

Scaling horizontal

Number of OSDs | PGs  | MB/sec | Max (MB/sec) | Overhead %
            24 | 1200 |    586 |          768 |         24
            36 | 1800 |    908 |         1152 |         22
            48 | 2400 |   1267 |         1500 |         16
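A quick sanity check of the Overhead column, assuming it compares measured throughput against the theoretical maximum in the preceding column (this interpretation is inferred, not stated on the slide):

# (OSDs, PGs, measured MB/s, theoretical max MB/s) from the table above
rows = [(24, 1200, 586, 768), (36, 1800, 908, 1152), (48, 2400, 1267, 1500)]
for osds, pgs, measured, maximum in rows:
    print(f"{osds:2d} OSDs: {(1 - measured / maximum) * 100:.0f}% overhead")
# Prints roughly 24%, 21% and 16%, within a point of the table's values.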

SLIDE 26

Scaling vertical

◮ OSD scaling
  ◮ Add more disks, possibly using external SAS enclosures
  ◮ But each disk adds overhead (CPU, I/O subsystem)

SLIDE 27

Scaling vertical (2)

SLIDE 28

Scaling vertical (3)

SLIDE 29

Scaling OSDs

◮ Scaling horizontally seems to be no problem
◮ Scaling vertically has its limits
  ◮ Possibly tunable
  ◮ Jumbo frames?

SLIDE 30

Maintenance

◮ Built-in tools are sufficient
◮ Deployment
  ◮ Crowbar
  ◮ Chef
  ◮ ceph-deploy
◮ Configuration
  ◮ Puppet

SLIDE 31

Research (2)

◮ Research question
  ◮ Is the current version of CephFS (0.61.3) production-ready for use as a distributed filesystem in a multi-petabyte environment, in terms of stability, scalability, performance and manageability?
◮ Sub-questions
  ◮ Is Ceph, and in particular the CephFS component, stable enough for production use at SURFsara?
  ◮ What are the scaling limits in CephFS, in terms of capacity and performance?
  ◮ Does Ceph(FS) meet the maintenance requirements for the environment at SURFsara?

SLIDE 32

Conclusion

◮ Ceph is stable and scalable
  ◮ RADOS storage backend
  ◮ Possibly also RBD and object storage, but outside scope
◮ However: CephFS is not yet production-ready
  ◮ Scaling is a problem
  ◮ MDS failover was not smooth
  ◮ Multi-MDS not yet stable
  ◮ Let alone directory sharding
◮ However: developer attention is back on CephFS

SLIDE 33

Conclusion (2)

◮ Maintenance
  ◮ Extensive tooling available
  ◮ Integration into existing toolset possible
  ◮ Self-healing, low maintenance possible

SLIDE 34

Questions?
