CEPH DATA SERVICES IN A MULTI- AND HYBRID CLOUD WORLD
Sage Weil - Red Hat
OpenStack Summit - 2018.11.15
OUTLINE
● Ceph
● Data services
● Block
● File
● Object
● Edge
● Future
UNIFIED STORAGE PLATFORM
● OBJECT: RGW - S3 and Swift object storage with a robust feature set
● BLOCK: RBD - virtual block device
● FILE: CEPHFS - distributed network file system
● LIBRADOS - low-level storage API
● RADOS - reliable, elastic, highly-available distributed storage layer with replication and erasure coding
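A minimal sketch of that layering using the librados Python bindings (the pool and object names are illustrative): everything above this API - RGW, RBD, CephFS - is ultimately stored as RADOS objects like this one.

```python
import rados

# Connect to the cluster using the local ceph.conf and client keyring.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Open an I/O context on a pool (pool name is illustrative).
ioctx = cluster.open_ioctx('mypool')
try:
    # Write and read a raw RADOS object through the low-level API.
    ioctx.write_full('hello-object', b'Hello, RADOS!')
    print(ioctx.read('hello-object'))
finally:
    ioctx.close()
    cluster.shutdown()
```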
RELEASE SCHEDULE
[Timeline: Luminous (Aug 2017, 12.2.z) → Mimic (May 2018, 13.2.z) → WE ARE HERE → Nautilus (Feb 2019, 14.2.z) → Octopus (Nov 2019, 15.2.z)]
● Stable, named release every 9 months
● Backports for 2 releases
● Upgrade up to 2 releases at a time (e.g., Luminous → Nautilus, Mimic → Octopus)
FOUR CEPH PRIORITIES
● Usability and management
● Container platforms
● Performance
● Multi- and hybrid cloud
MOTIVATION - DATA SERVICES
A CLOUDY FUTURE
● IT organizations today
  ○ Multiple private data centers
  ○ Multiple public cloud services
● It’s getting cloudier
  ○ “On premise” → private cloud
  ○ Self-service IT resources, provisioned on demand by developers and business units
● The next generation of cloud-native applications will span clouds
● “Stateless microservices” are great, but real applications have state
DATA SERVICES
● Data placement and portability
  ○ Where should I store this data?
  ○ How can I move this data set to a new tier or new site?
  ○ Seamlessly, without interrupting applications?
● Introspection
  ○ What data am I storing? For whom? Where? For how long?
  ○ Search, metrics, insights
● Policy-driven data management
  ○ Lifecycle management
  ○ Conformance: constrain placement, retention, etc. (e.g., HIPAA, GDPR)
  ○ Optimize placement based on cost or performance
  ○ Automation
MORE THAN JUST DATA
● Data sets are tied to applications
  ○ When the data moves, the application often should (or must) move too
● Container platforms are key
  ○ Automated application (re)provisioning
  ○ “Operators” to manage coordinated migration of state and the applications that consume it
DATA USE SCENARIOS
● Multi-tier
  ○ Different storage for different data
● Mobility
  ○ Move an application and its data between sites with minimal (or no) availability interruption
  ○ Maybe an entire site, but usually a small piece of a site
● Disaster recovery
  ○ Tolerate a site-wide failure; reinstantiate data and app in a new site quickly
  ○ Point-in-time consistency with bounded latency (bounded data loss)
● Stretch
  ○ Tolerate a site outage without compromising data availability
  ○ Synchronous replication (no data loss) or async replication (different consistency model)
● Edge
  ○ Small (e.g., telco POP) and/or semi-connected sites (e.g., autonomous vehicles)
BLOCK STORAGE
HOW WE USE BLOCK
● Virtual disk device
● Exclusive access by nature (with few exceptions)
● Strong consistency required
● Performance sensitive
● Basic feature set
  ○ Read, write, flush, maybe resize
  ○ Snapshots (read-only) or clones (read/write)
    ■ Point-in-time consistent
● Often self-service provisioning
  ○ via Cinder in OpenStack
  ○ via the Persistent Volume (PV) abstraction in Kubernetes
[Diagram: applications → file system (XFS, ext4, whatever) → block device]
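A minimal sketch of that basic feature set via the rbd Python bindings (pool and image names are illustrative): create an image, write to it, take a point-in-time snapshot, and clone the snapshot into a writable child.

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')                     # pool name is illustrative
try:
    rbd_inst = rbd.RBD()
    rbd_inst.create(ioctx, 'vm-disk', 10 * 1024**3)   # 10 GiB image

    with rbd.Image(ioctx, 'vm-disk') as image:
        image.write(b'hello', 0)                      # ordinary block I/O
        image.create_snap('base')                     # read-only, point-in-time snapshot
        image.protect_snap('base')                    # required before cloning

    # Writable, copy-on-write clone backed by the protected snapshot.
    rbd_inst.clone(ioctx, 'vm-disk', 'base', ioctx, 'vm-disk-clone')
finally:
    ioctx.close()
    cluster.shutdown()
```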
RBD - TIERING WITH RADOS POOLS
Scenarios: Multi-tier ✓ | Mobility ❏ | DR ❏ | Stretch ❏ | Edge ❏
[Diagram: KVM guest (librbd) and host file system (krbd) clients; images placed in SSD 2x, HDD 3x, or SSD EC 6+3 pools within one Ceph storage cluster]
RBD - LIVE IMAGE MIGRATION
Scenarios: Multi-tier ✓ | Mobility ✓ | DR ❏ | Stretch ❏ | Edge ❏
● New in Nautilus
[Diagram: KVM guest (librbd) and host file system (krbd) clients; an image migrates live between the SSD 2x, HDD 3x, and SSD EC 6+3 pools within one Ceph storage cluster]
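A sketch of what live image migration could look like through the rbd Python bindings, assuming the Nautilus-era migration_prepare / migration_execute / migration_commit calls; the pool and image names are illustrative, and clients can keep using the image while the data is copied.

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
src = cluster.open_ioctx('ssd-pool')        # illustrative source pool
dst = cluster.open_ioctx('hdd-pool')        # illustrative destination pool
try:
    r = rbd.RBD()
    # Link the source image to the new target; clients transparently
    # follow the image while migration is in progress.
    r.migration_prepare(src, 'vm-disk', dst, 'vm-disk')
    r.migration_execute(src, 'vm-disk')     # copy the data in the background
    r.migration_commit(src, 'vm-disk')      # finalize and remove the source
finally:
    src.close()
    dst.close()
    cluster.shutdown()
```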
RBD - STRETCH
Scenarios: Multi-tier ❏ | Mobility ❏ | DR ✓ | Stretch ✓ | Edge ❏
● Apps can move
● Data can’t - it’s everywhere
● Performance is compromised
  ○ Need fat, low-latency pipes
[Diagram: file system client via krbd; a stretch pool in a stretch Ceph storage cluster spanning SITE A and SITE B over a WAN link]
RBD - STRETCH WITH TIERS
Scenarios: Multi-tier ✓ | Mobility ❏ | DR ✓ | Stretch ✓ | Edge ❏
● Create site-local pools for performance-sensitive apps
[Diagram: site-local pools A and B plus a stretch pool in a stretch Ceph storage cluster spanning SITE A and SITE B over a WAN link]
RBD - STRETCH WITH MIGRATION
Scenarios: Multi-tier ✓ | Mobility ✓ | DR ✓ | Stretch ✓ | Edge ❏
● Live migrate images between pools
● Maybe even live migrate your app VM?
[Diagram: site-local pools A and B plus a stretch pool in a stretch Ceph storage cluster spanning SITE A and SITE B over a WAN link]
STRETCH IS SKETCH
● Network latency is critical
  ○ Low latency needed for performance
  ○ Requires nearby sites, limiting usefulness
● Bandwidth too
  ○ Must be able to sustain rebuild data rates
● Relatively inflexible
  ○ Single cluster spans all locations
  ○ Cannot “join” existing clusters
● High level of coupling
  ○ Single (software) failure domain for all sites
RBD ASYNC MIRRORING
● Asynchronously mirror writes
● Small performance overhead at the primary
  ○ Mitigate with an SSD pool for the RBD journal
● Configurable time delay for the backup
[Diagram: KVM/FS client via librbd writes to the PRIMARY (SSD 3x pool, Ceph cluster A), asynchronously mirrored over a WAN link to the BACKUP (HDD 3x pool, Ceph cluster B)]
RBD ASYNC MIRRORING
Scenarios: Multi-tier ❏ | Mobility ❏ | DR ✓ | Stretch ❏ | Edge ❏
● On primary failure
  ○ Backup is point-in-time consistent
  ○ Lose only the last few seconds of writes
  ○ VM can restart in the new site
● If the primary recovers
  ○ Option to resync and “fail back”
[Diagram: divergent primary (SSD 3x pool, Ceph cluster A) and promoted backup (HDD 3x pool, Ceph cluster B) linked by asynchronous mirroring over a WAN link; KVM/FS client via librbd]
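A sketch of enabling per-image mirroring from the rbd Python bindings, assuming journal-based mirroring and an rbd-mirror daemon already running against the backup cluster; the pool and image names are illustrative.

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')                      # pool name is illustrative
try:
    # Mirror only selected images in this pool (rather than every image).
    rbd.RBD().mirror_mode_set(ioctx, rbd.RBD_MIRROR_MODE_IMAGE)

    with rbd.Image(ioctx, 'vm-disk') as image:
        # Journal-based mirroring requires the journaling feature on the image.
        image.update_features(rbd.RBD_FEATURE_JOURNALING, True)
        image.mirror_image_enable()                    # start mirroring this image
finally:
    ioctx.close()
    cluster.shutdown()
```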
RBD MIRRORING IN CINDER
● Ocata
  ○ Cinder RBD replication driver
● Queens
  ○ ceph-ansible deployment of rbd-mirror via TripleO
● Rocky
  ○ Failover and fail-back operations
● Gaps
  ○ Deployment and configuration tooling
  ○ Cannot replicate multi-attach volumes
  ○ Nova attachments are lost on failover
MISSING LINK: APPLICATION ORCHESTRATION
● Hard for the IaaS layer to reprovision an app in a new site
● The storage layer can’t solve it on its own either
● Need an automated, declarative, structured specification for the entire app stack...
FILE STORAGE
CEPHFS STATUS
Scenarios: Multi-tier ✓ | Mobility ❏ | DR ❏ | Stretch ❏ | Edge ❏
● Stable since Kraken
● Multi-MDS stable since Luminous
● Snapshots stable since Mimic
● Support for multiple RADOS data pools
● Provisioning via OpenStack Manila and Kubernetes
● Fully awesome
CEPHFS - STRETCH?
Scenarios: Multi-tier ❏ | Mobility ❏ | DR ✓ | Stretch ✓ | Edge ❏
● We can stretch CephFS just like RBD pools
● It has the same limitations as RBD
  ○ Latency → lower performance
  ○ Limited by geography
  ○ Big (software) failure domain
● Also,
  ○ MDS latency is critical for file workloads
  ○ ceph-mds daemons must be running in one site or another
● What can we do with CephFS across multiple clusters?
CEPHFS - SNAP MIRRORING
Scenarios: Multi-tier ❏ | Mobility ❏ | DR ✓ | Stretch ❏ | Edge ❏
● CephFS snapshots provide
  ○ point-in-time consistency
  ○ granularity (any directory in the system)
● CephFS rstats provide
  ○ rctime to efficiently find changes
● rsync provides
  ○ efficient file transfer
● Time bounds on the order of minutes
● Gaps and TODO
  ○ “rstat flush” coming in Nautilus
    ■ Xuehan Xu @ Qihoo 360
  ○ rsync support for CephFS rstats
  ○ scripting / tooling
[Diagram: mirroring loop between SITE A and SITE B over time - 1. A: create snap S1, 2. rsync A→B, 3. B: create snap S1, 4. A: create snap S2, 5. rsync A→B, 6. B: create S2, 7. A: create snap S3, 8. rsync A→B, 9. B: create S3]
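A minimal sketch of that loop in Python, assuming CephFS is mounted at illustrative paths on both sites and using the usual mkdir-under-.snap convention to create snapshots; the site names and schedule are hypothetical.

```python
import os
import subprocess
import time

SRC = '/mnt/cephfs-a/projects'                 # illustrative site A mount
DST = 'siteb:/mnt/cephfs-b/projects'           # illustrative site B rsync target
DST_SNAP = '/mnt/cephfs-b/projects/.snap'      # snapshot directory on site B

def mirror_once(seq):
    snap = 'S%d' % seq
    # A: create snap (CephFS snapshots are created by mkdir under .snap).
    os.mkdir(os.path.join(SRC, '.snap', snap))
    # rsync A -> B: transfer the frozen snapshot contents.
    subprocess.run(['rsync', '-a', '--delete',
                    os.path.join(SRC, '.snap', snap) + '/', DST + '/'],
                   check=True)
    # B: create the matching snap so site B holds a point-in-time consistent copy.
    subprocess.run(['ssh', 'siteb', 'mkdir', DST_SNAP + '/' + snap], check=True)

seq = 1
while True:
    mirror_once(seq)      # steps 1-3 of the loop on the slide
    seq += 1
    time.sleep(300)       # time bounds on the order of minutes
```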
DO WE NEED POINT-IN-TIME FOR FILE?
● Yes.
● Sometimes.
● Some geo-replication DR features are built on rsync...
  ○ Consistent view of individual files
  ○ Lack point-in-time consistency between files
● Some (many?) applications are not picky about cross-file consistency...
  ○ Content stores
  ○ Casual usage without multi-site modification of the same files
CASE IN POINT: HUMANS
● Many humans love Dropbox / NextCloud / etc.
  ○ Ad hoc replication of directories to any computer
  ○ Archive of past revisions of every file
  ○ Offline access to files is extremely convenient and fast
● Disconnected operation and asynchronous replication lead to conflicts
  ○ Usually a pop-up in the GUI
● Automated conflict resolution is usually good enough
  ○ e.g., newest timestamp wins
  ○ Humans are happy if they can roll back to archived revisions when necessary
● A possible future direction:
  ○ Focus less on avoiding/preventing conflicts…
  ○ Focus instead on the ability to roll back to past revisions…
BACK TO APPLICATIONS
● Do we need point-in-time consistency for file systems?
● Where does the consistency requirement come in?
MIGRATION: STOP, MOVE, START
Scenarios: Multi-tier ❏ | Mobility ✓ | DR ❏ | Stretch ❏ | Edge ❏
● App runs in site A
● Stop app in site A
● Copy data A→B
● Start app in site B
● App maintains exclusive access
● Long service disruption
[Diagram: timeline of the app and its data moving from SITE A to SITE B]
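A bare-bones sketch of the stop/copy/start sequence, assuming a hypothetical systemd-managed app on hosts site-a and site-b with data pushed by rsync; in practice an orchestrator would drive these steps.

```python
import subprocess

APP = 'myapp.service'        # hypothetical app unit
DATA = '/var/lib/myapp/'     # hypothetical data path

def run(*cmd):
    subprocess.run(cmd, check=True)

# 1. Stop the app in site A so it releases exclusive access to its data.
run('ssh', 'site-a', 'systemctl', 'stop', APP)

# 2. Copy the data A -> B (pushed from site A); the app is down for the whole transfer.
run('ssh', 'site-a', 'rsync', '-a', '--delete', DATA, 'site-b:' + DATA)

# 3. Start the app in site B against the copied data.
run('ssh', 'site-b', 'systemctl', 'start', APP)
```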