Scaling Your Storage Using Ceph Wido den Hollander #CCCEU
Who am I?
● Wido den Hollander (1986)
  – CTO at PCextreme B.V., a Dutch hosting provider
  – Ceph trainer and consultant at 42on B.V.
  – Part of the Ceph community since late 2010
● Wrote the PHP and Java bindings
● Wrote the CloudStack integration
  – Including libvirt storage pool support
CloudStack primary storage #CCCEU
CloudStack primary storage
● A set of hypervisors with storage
  – NFS, iSCSI or Fibre Channel
  – Usually one NAS or SAN per cluster
● Local network for low latency and high bandwidth
CloudStack primary storage
● Scaling is a problem, however:
  – Number of disks
  – Network connections/bandwidth
  – CPU power
  – Protocols
    ● NFS and iSCSI do not scale
Scaling NFS or iSCSI
● NFS and iSCSI expect the server to always be available
  – Vendors implement all kinds of tricks
    ● Virtual IPs
    ● ARP spoofing
  – This is a fundamental problem when it comes to large scale
Black boxes
● Black boxes:
  – EMC, EqualLogic and NetApp all provide you a black box
  – Vendor lock-in
  – End-of-Life determined by the vendor
Ceph #CCCEU
What is Ceph?
● Ceph is a distributed object store and file system designed to provide:
  – excellent performance
  – reliability
  – scalability
Design principles
● Data is replicated in the Ceph cluster
  – User specified: 2x, 3x (recommended), 4x, etc. (see the sketch below)
● Hardware failure is the rule
  – Not the exception!
● Software defined storage
  – Fully hardware agnostic
● Consistency takes precedence over availability
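The replication factor is a per-pool setting. A minimal sketch of reading and raising a pool's replica count through the python-rados bindings; the pool name 'rbd' and the ceph.conf path are assumptions, not something from the slides:

```python
import json
import rados

# Connect using a local ceph.conf (path is an assumption)
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Ask the monitors how many replicas the 'rbd' pool keeps
cmd = json.dumps({"prefix": "osd pool get", "pool": "rbd", "var": "size"})
ret, out, errs = cluster.mon_command(cmd, b'')
print(out.decode())  # e.g. "size: 2"

# Raise it to the recommended 3 replicas
cmd = json.dumps({"prefix": "osd pool set", "pool": "rbd",
                  "var": "size", "val": "3"})
cluster.mon_command(cmd, b'')

cluster.shutdown()
```

On the command line the equivalent is simply `ceph osd pool get rbd size` and `ceph osd pool set rbd size 3`.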
How does it work?
● Clients are aware of the cluster status
  – The client calculates where objects are
  – They connect directly to the nodes using TCP
● Ceph nodes are intelligent
  – They take care of replication and recovery
● Block devices are striped over 4MB objects (see the sketch below)
  – These objects are replicated by the nodes
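A short sketch of that striping using the python-rbd bindings; the pool and image names are made up for illustration. The stat() call exposes the 4MB object size mentioned above:

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')            # pool name is an assumption

# Create a 10 GiB block device; librbd stripes it over many RADOS objects
rbd.RBD().create(ioctx, 'vm-disk-1', 10 * 1024 ** 3)

image = rbd.Image(ioctx, 'vm-disk-1')
info = image.stat()
print(info['obj_size'])   # 4194304 by default: the 4MB objects
print(info['num_objs'])   # how many objects back this one block device
image.close()

ioctx.close()
cluster.shutdown()
```

Each of those objects is placed and replicated independently, which is why a single block device ends up using (and benefiting from) the whole cluster.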
How does it perform?
● Ceph performs great with parallel I/O
  – Cloud workloads are parallel
  – Do not expect 10k IOPS from a single disk
● Each node adds I/O, RAM and CPU
  – And thus adds performance
● Latency is mainly influenced by the network
  – The lower the latency, the better the performance
How does it perform?
● Network latency is key
  – The difference between 10GbE and 1GbE is big
● 8k packet round trip (a rough calculation follows below):
  – 1GbE: 0.8ms
  – 10GbE: 0.2ms
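A back-of-the-envelope illustration of why that matters (my own arithmetic, assuming a single outstanding synchronous write, so every write pays one full round trip):

```python
# Round-trip times taken from the slide, in milliseconds
for name, rtt_ms in (("1GbE", 0.8), ("10GbE", 0.2)):
    # With a queue depth of 1 each write waits for one network round trip,
    # so the round trip alone caps the achievable write rate.
    print(f"{name}: at most ~{1000 / rtt_ms:.0f} serialized writes per second")
```

Real workloads overlap many requests, but the per-operation latency still sets the floor for how fast a single guest sees its writes complete.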
Failure is the rule!
● Ceph is designed for failure!
● Take out any machine you want (see the maintenance sketch below)
  – Kernel upgrades
  – Firmware upgrades
  – Replacement of hardware
● 1000-day uptimes are no longer cool
  – Upgrade and reboot those machines!
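For planned maintenance it is common to tell the cluster not to start rebalancing while a node is briefly down. A minimal sketch (ceph.conf path assumed), equivalent to `ceph osd set noout` / `ceph osd unset noout`:

```python
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Tell the cluster not to rebalance while a node is down for maintenance
cluster.mon_command(json.dumps({"prefix": "osd set", "key": "noout"}), b'')

# ... reboot the node, upgrade its kernel or firmware ...

# Let the cluster return to normal recovery behaviour afterwards
cluster.mon_command(json.dumps({"prefix": "osd unset", "key": "noout"}), b'')

cluster.shutdown()
```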
Scaling Ceph
● Ceph is designed to scale
  – Start with 10TB and scale to 10PB
● No downtime or manual data migration is required
  – Never have to babysit rsync or scp again
● Migration is proportional to the change
  – Expand by 10% and about 10% of the data migrates
Scaling Ceph
● During expansion data migrates automatically to the new nodes
● New nodes add additional performance
● Mix different types of hardware
  – 2TB and 4TB drives, for example
Designing a Ceph cluster
● Use small(er) nodes
  – 2U machines with 8 drives
● More nodes mean less impact when a node fails
  – 'Failure' could be maintenance!
● Start with at least 10 nodes
Ceph as Primary Storage #CCCEU
Ceph as Primary Storage
● Ceph block devices can be used as primary storage
● KVM is currently the only supported hypervisor (see the libvirt sketch below)
  – Ubuntu works best
  – CentOS works with a patched libvirt
● All operations are supported
  – Templates, snapshots, resizing
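Under the hood this goes through the libvirt RBD storage pool support mentioned in the introduction; CloudStack normally defines the pool itself when RBD primary storage is added. Purely as an illustration (the monitor host and pool name are invented, and cephx authentication is left out), a pool definition looks roughly like this with libvirt-python:

```python
import libvirt

# Hypothetical monitor host and pool name; with cephx enabled the <source>
# element additionally needs an <auth> element pointing at a libvirt secret.
pool_xml = """
<pool type='rbd'>
  <name>cloudstack</name>
  <source>
    <name>cloudstack</name>
    <host name='mon1.example.com' port='6789'/>
  </source>
</pool>
"""

conn = libvirt.open('qemu:///system')
pool = conn.storagePoolDefineXML(pool_xml, 0)
pool.create()          # start the pool so VM disks can be created in it
print(pool.info())     # [state, capacity, allocation, available]
conn.close()
```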
Future plans
● Incremental backups to another Ceph cluster
  – This is on Ceph's roadmap
● Snapshots without a copy to Secondary Storage
● Xen support
Ceph at PCextreme #CCCEU
Ceph at PCextreme
● We use Ceph as Primary Storage behind CloudStack
  – KVM hypervisor on Ubuntu
● We have two Service Offerings:
  – Agile: local storage on SSD
  – Stamina: Ceph RBD storage
● Only available in Amsterdam
Ceph at PCextreme
● 500TB of storage
  – 39 hosts
  – 3 racks
● Replicas spread out over the racks (see the CRUSH sketch below)
● 258 disks
  – 96 SSDs
  – 162 HDDs
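Spreading replicas over racks is expressed as a CRUSH rule. A hedged sketch of creating such a rule through python-rados (the rule name is made up; the pool is then pointed at the rule, on current releases with `ceph osd pool set <pool> crush_rule replicated-racks`):

```python
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

def mon(cmd):
    # Send a monitor command and return its text output
    ret, out, errs = cluster.mon_command(json.dumps(cmd), b'')
    return out.decode()

# A rule that picks each replica from a different rack under the default root
mon({"prefix": "osd crush rule create-simple",
     "name": "replicated-racks", "root": "default", "type": "rack"})

# Inspect the generated rule: its chooseleaf step selects by rack
print(mon({"prefix": "osd crush rule dump", "name": "replicated-racks"}))

cluster.shutdown()
```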
Ceph at PCextreme
● 20,000 IOPS on average
● SuperMicro hardware
  – A mix of Samsung, Intel, Seagate and Western Digital SSDs/HDDs
● Running IPv6-only
  – There is NO IPv4 in the Ceph cluster
  – Publicly routed IPv6 (with a firewall)
Ceph at PCextreme
● HEALTH_WARN (cluster status shown on the slide)
Ceph at PCextreme
● We are updating the whole Ceph cluster
  – Using bcache
  – Updating to Ubuntu 14.04
  – Updating Ceph
Ceph at PCextreme
● During office hours :-)
Ceph at PCextreme
● We don't do Ceph maintenance at night anymore
● That's how Ceph should be used
Thanks!
Find me @widodh on Skype and Twitter
wido@42on.com
#CCCEU