„DRBD 9“ Linux Storage Replication
Lars Ellenberg
LINBIT HA Solutions GmbH, Vienna, Austria
What this talk is about
• What is replication
• Why replication
• Why block level replication
• What do we have to deal with
• How we are dealing with it now
• Where development is headed
Linux Storage Replication
• Replication Basics
• DRBD 8 Overview
• DM-Replicator
• DRBD 9
• Other Ideas
Standalone Servers
• No System Level Redundancy
• Vulnerable to Failures
[Diagram: important systems running on standalone Node 1, Node 2, Node 3]
Application Level Replication
• Special Purpose Solution
• Difficult to add to an application after the fact
[Diagram: important systems, with the application replicating its own data between Node 1 and Node 3]
Filesystem Level Replication
• Special Filesystem
• Complex
• Replicate on dirty? ... on writeout? ... on close?
• What about metadata?
• Resilience?
[Diagram: important systems, with a replicating filesystem on Node 1 and Node 3]
Shared Storage (SAN)
• No Storage Redundancy
[Diagram: important systems on Node 1, Node 2, Node 3 accessing shared data on a SAN via FC or iSCSI]
Replication capable SAN
• Application agnostic
• Expensive Hardware
• Expensive License costs
[Diagram: important systems on Node 1, Node 2, Node 3; the shared data on one SAN is replicated to a second SAN]
Block Level Replication
• Storage Redundancy
• Application Agnostic
• Generic
• Flexible
[Diagram: a DRBD cluster of Node 1 and Node 2]
SAN Replacement: Storage Cluster
• Storage Redundancy
• Application Agnostic
• Generic
• Flexible
[Diagram: important systems on Node 1, Node 2, Node 3 access, via iSCSI, a two-node DRBD storage cluster]
Linux Storage Replication
• Replication Basics
• DRBD 8 Overview
• DM-Replicator
• DRBD 9
• Other Ideas
How it works: Normal operation
[Diagram: the application issues read and write I/O on the Primary Node; reads are served from the local data blocks, writes go to the local data blocks and are replicated to the Secondary Node, which writes them and acknowledges]
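To make the write path above concrete, here is a minimal Python sketch of fully synchronous mirroring in the spirit of the diagram: the write is reported as complete only after both the local write and the peer's acknowledgement. This is an illustration only, not DRBD code; `local_disk` and the `peer.replicate()` call are hypothetical stand-ins.

    import threading

    class SyncReplicatedDevice:
        def __init__(self, local_disk, peer):
            self.local_disk = local_disk   # e.g. a dict mapping block number -> bytes
            self.peer = peer               # hypothetical remote mirror exposing replicate()

        def write(self, block_no, data):
            # replicate to the peer concurrently with the local write
            t = threading.Thread(target=self.peer.replicate, args=(block_no, data))
            t.start()
            self.local_disk[block_no] = data   # local write
            t.join()                           # wait for the peer's acknowledgement
            # only now is the write reported as completed to the application

        def read(self, block_no):
            # reads are served from the local copy on the primary
            return self.local_disk.get(block_no)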
How it works: Primary Node Failure
[Diagram: when the Primary Node fails, the Secondary Node is promoted to Primary and the application continues its read and write I/O against that node's data blocks]
How it works: Secondary Node Failure
[Diagram: while the Secondary Node is offline, the application continues read and write I/O on the Primary Node using the local data blocks]
How it works: Secondary Node Recovery
[Diagram: when the Secondary Node returns, the Primary Node resyncs data blocks to it and receives acknowledgements, while the application continues its I/O on the Primary]
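A rough sketch of the bitmap-style change tracking that makes this quick resync possible: while the peer is away, only a dirty bit per changed block is kept, and on reconnect exactly those blocks are shipped. The class and the `peer.replicate()` call are invented for illustration; this is not DRBD's actual activity-log/bitmap implementation.

    class DirtyBitmap:
        def __init__(self):
            self.dirty = set()            # block numbers changed while disconnected

        def mark(self, block_no):
            self.dirty.add(block_no)

        def resync(self, local_disk, peer):
            # push only the out-of-date blocks; the peer acknowledges each one
            for block_no in sorted(self.dirty):
                peer.replicate(block_no, local_disk[block_no])
            self.dirty.clear()            # peer is up to date again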
What if ...
• We want an additional replica for disaster recovery
  - we can stack DRBD
• The latency to the remote site is too high
  - stack DRBD for local redundancy, run the high-latency link in asynchronous mode, add buffering and compression with DRBD Proxy
• The primary node/site fails during resync
  - snapshot before becoming sync target
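As a sketch of the "asynchronous mode plus buffering and compression" idea for high-latency links, the following toy sender queues blocks locally and compresses them before they cross the slow link. It only illustrates the concept behind DRBD Proxy; the names and the `remote.send()` call are made up.

    import queue, threading, zlib

    class AsyncBufferedSender:
        def __init__(self, remote, buffer_blocks=65536):
            self.q = queue.Queue(maxsize=buffer_blocks)   # bounded send buffer
            self.remote = remote                          # hypothetical remote endpoint
            threading.Thread(target=self._drain, daemon=True).start()

        def submit(self, block_no, data):
            # the local write completes independently of the slow link
            self.q.put((block_no, data))

        def _drain(self):
            while True:
                block_no, data = self.q.get()
                # compress before the WAN hop
                self.remote.send(block_no, zlib.compress(data))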
It Works.
• Though it may be ugly.
• Can we do better?
Linux Storage Replication
• Replication Basics
• DRBD 8 Overview
• DM-Replicator
• DRBD 9
• Other Ideas
Generic Replication Framework
• Track data changes
  - persistent (on-disk) data journal
  - “global” write ordering over multiple volumes
  - fallback to bitmap-based change tracking
• Multi-node
  - many “site links” feed from the journal
• Flexible policy
  - when to report completion to upper layers
  - (when to) fall back to bitmap
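A toy model of the framework idea: one journal records writes to several volumes in a single global order, and when it cannot keep up it degrades to per-volume dirty bitmaps. This is a conceptual sketch with invented names, not dm-replicator's actual ring-buffer format or policy hooks.

    class ReplicationLog:
        def __init__(self, capacity):
            self.capacity = capacity
            self.journal = []            # (seq, volume, block_no, data), one global order
            self.bitmaps = {}            # volume -> set of dirty blocks (fallback mode)
            self.seq = 0

        def record(self, volume, block_no, data):
            self.seq += 1
            if self.bitmaps or len(self.journal) >= self.capacity:
                # fallback: write ordering is lost, remember only *what* changed
                self.bitmaps.setdefault(volume, set()).add(block_no)
                for _seq, vol, blk, _data in self.journal:
                    self.bitmaps.setdefault(vol, set()).add(blk)
                self.journal.clear()
            else:
                # normal case: site links later replay the journal in order
                self.journal.append((self.seq, volume, block_no, data))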
Current „default“ reference implementation
• Only talks to “dumb” block devices
• “Software RAID1”, allowing some legs to lag behind
• No concept of “data generation”
• Cannot communicate metadata
• Not directly suitable for failover solutions
• Primary objective: cut down on “hardware” replication licence costs; replicate SAN LUNs in software to disaster recovery sites
DRBD 9
• Replication Basics
• DRBD 8 Overview
• DM-Replicator
• DRBD 9
• Other Ideas
Replicating smarter, asynchronous
• Detect and discard overwrites
  - shipped batches must be atomic
• Compress
• XOR-diff
• Side effects
  - can be undone
  - checkpointing of generic block data
  - point-in-time recovery
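A small sketch of how such a batch could be built: later writes to the same block replace earlier ones (which is why the batch must be applied atomically), and each block is XOR-ed against its previous contents so the result compresses well. Function and variable names are invented; this is not the actual wire format.

    import zlib

    def build_batch(writes, old_contents):
        """writes: list of (block_no, new_data); old_contents: block_no -> previous data.
        Assumes fixed-size blocks, so new and old data have equal length."""
        batch = {}
        for block_no, data in writes:
            batch[block_no] = data        # a later write simply overwrites the earlier one
        payload = {}
        for block_no, data in batch.items():
            old = old_contents.get(block_no, bytes(len(data)))
            xor = bytes(a ^ b for a, b in zip(data, old))   # XOR-diff vs. previous version
            payload[block_no] = zlib.compress(xor)          # mostly zeros, compresses well
        return payload                    # shipped and applied as one atomic batch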
Replicating smarter, synchronous
• Identify a certain Data Set Version
  - start from scratch
  - plus a continuous stream of changes
• Data Generation Tags, “dagtag”
  - which clone (node name)
  - which volume (label)
  - who modified it last (committer)
  - modification date (position in the change stream)
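As a plain data structure, a dagtag built from the four components above might look like the following; the field names are guesses for illustration, not DRBD 9's actual representation.

    from collections import namedtuple

    # clone: node name, volume: volume label, committer: node that modified it last,
    # position: position in that committer's change stream (a monotonic counter)
    DagTag = namedtuple("DagTag", "clone volume committer position")

    tag = DagTag(clone="node-a", volume="vol0", committer="node-a", position=123456)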
Colorful Replication Stream
[Diagram: replication streams fed from the Primary Node's changes, illustrating data set divergence, atomic batches, and the discarding of overwrites]
Advantages of the Data Generation Tag scheme
• On handshake, exchange dagtags
  - trivially see who has the best data, even on primary site failure with multiple secondaries possibly lagging behind
• Communicate dagtags with atomic (compressed, xor-diff) batches
  - allows for daisy chaining
• Keep dagtag and batch payload
  - checkpointing: just store the dagtag
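A sketch of the handshake comparison: if all copies of a volume were last fed from the same committer's change stream, picking the best copy is just a matter of comparing stream positions. Simplified illustration with invented names, assuming directly comparable positions.

    def best_copy(dagtags):
        """dagtags: node name -> (committer, position) of that node's copy of one volume."""
        return max(dagtags.items(), key=lambda item: item[1][1])

    # two secondaries lagging behind differently after a primary site failure:
    print(best_copy({"node-b": ("node-a", 9000),
                     "node-c": ("node-a", 8500)}))   # node-b has the best data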
DRBD 9
• Replication Basics
• DRBD 8 Overview
• DM-Replicator
• DRBD 9
• Other Ideas
Stretched cluster file systems?
• Multiple branch offices
• One cluster filesystem
• Latency would make it unusable
• But when
  - keeping leases, and
  - inserting lock requests into the replication data stream,
  - while having mostly self-contained access in the branch offices,
• it may feel like low latency most of the time, with occasional longer delays on access
• Tell me why I'm wrong :-)
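Purely to make the idea concrete, here is a toy rendering: the replication stream interleaves data records and lock-request records, so a branch office grants a lock only after it has applied every write that preceded the request in the stream. All names are invented, and this is as speculative as the slide itself.

    def apply_stream(stream, disk, held_locks):
        """stream: iterable of ("data", block_no, bytes) or ("lock", lock_id, node)."""
        for record in stream:
            if record[0] == "data":
                _, block_no, data = record
                disk[block_no] = data          # ordinary replicated write
            else:
                _, lock_id, node = record
                held_locks[lock_id] = node     # lock granted in stream order, i.e.
                                               # only after all earlier writes applied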
Comments?
lars@linbit.com
http://www.linbit.com
http://www.drbd.org
If you think you can help, we are hiring!