  1. MySQL and Ceph
     Yves Trudeau, Principal Architect, Percona
     Santa Clara, California | April 24th – 27th, 2017

  2. Who am I?
     • Physicist (by training)
     • MySQL (2007-2008)
     • Sun Microsystems (2008-2009)
     • Percona since 2009
     • Principal architect → HA and distributed systems

  3. Plan
     • A Ceph 101 intro
     • What Ceph brings to MySQL
     • MySQL/Rados

  4. A Ceph 101 intro

  5. Ceph 101: general
     • An object store (on steroids!!)
     • Distributed to 1000+ nodes
     • Scalable to multiple PB and 100k+ IOPS
     • Highly available
     • Multiple APIs
     • Very popular OpenStack Cinder backend

  6. Ceph 101: versions
     • Named releases are “stable”
     • In alphabetical order
     • Dumpling → Emperor → Firefly → …
     • One release per ~6 months
     • Current is Kraken (January 2017)

  7. Ceph 101: nodes/processes
     • OSD → storage
     • MON → cluster manager and map
     • MDS → filesystem
     • + protocol proxies

  8. Ceph 101: OSD nodes
     • The storage node
     • As you guess… stores and retrieves data
     • Performs replication (writes copies to other OSDs)
     • Atomic operations
     • Typically one per disk
     • NVMe may have more than one

  9. Ceph 101: MON
     • The MONitor nodes
     • Should be many, and an odd number (3, 5, etc.)
     • Paxos protocol for quorum
     • Provides the cluster maps to the clients
     • Monitors the OSDs
     • Performs maintenance tasks
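The MON quorum described above can be inspected from any host with admin credentials; a minimal sketch, assuming a running cluster and the standard `ceph` CLI:

```shell
# List the monitors and show which ones are currently in quorum
ceph mon stat

# Detailed quorum state, including the elected leader, as pretty JSON
ceph quorum_status --format json-pretty
```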

  10. Ceph 101: MDS
     • The MetaData Server
     • Optional, needed only for CephFS
     • Only one active, but the MONs can promote a standby automatically

  11. Ceph 101: how objects are stored
      [diagram: a client fetches the cluster map from a MON, then computes each object’s
      placement on OSD.1, OSD.2 and OSD.3 through client-side CRUSH map calculations]
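That client-side CRUSH placement can be observed directly; a sketch assuming a running cluster with a pool named `mypool` (the object name is hypothetical, nothing is written):

```shell
# Ask the cluster where CRUSH would place an object named "myobject";
# the mapping is purely computed from the CRUSH map, the object need not exist.
# The output shows the placement group and the ordered set of OSDs holding it.
ceph osd map mypool myobject
```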

  12. Ceph 101: pools
     • Objects are stored in pools
     • Pools can have different configurations
     • Pools can use different sets of OSDs
     • Pools support snapshots
     • Replication is at the pool level
     • Access rights are at the pool level → CephX
     • Pools are sharded by Placement Groups
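Creating and configuring such a pool is a one-liner each; a minimal sketch, assuming a running cluster (the pool name and placement-group count are arbitrary choices):

```shell
# Create a pool sharded into 128 placement groups
ceph osd pool create mypool 128

# Replication is set per pool: keep 3 copies of every object
ceph osd pool set mypool size 3

# Pool-level snapshot
ceph osd pool mksnap mypool before-upgrade
```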

  13. Ceph 101: access
     ● Linux NBD
     ● iSCSI
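For hosts without the rbd kernel module, an RBD image can be exposed as an NBD device through the `rbd-nbd` tool; a sketch assuming a pool `mypool` containing an image `mydisk` (both names are assumptions):

```shell
# Attach the RBD image via the NBD kernel driver;
# prints the allocated device, e.g. /dev/nbd0
rbd-nbd map mypool/mydisk

# Detach when done
rbd-nbd unmap /dev/nbd0
```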

  14. Ceph 101: rados
     • An object store like S3 → radosgw for the S3 API
     • Objects can be edited in place
     • “rados” command line tool
     :~$ rados -p Mypool put ibdata1 /var/lib/mysql/ibdata1
     :~$ rados -p Mypool ls
     ibdata1
     :~$ rados -p Mypool get ibdata1 /tmp/ibdata1
     :~$ rados -p Mypool rm ibdata1

  15. Ceph 101: rbd
     • rbd stands for Rados Block Device (disks)
     • Mount using librbd (KVM) or the kernel module
     • Snapshots, clones, resizing, thin provisioning, etc.
     # rbd -p Mypool create mydisk -s 2G --image-format 2 \
         --image-feature layering
     # rbd -p Mypool map mydisk
     /dev/rbd0
     # mkfs.xfs /dev/rbd0 && mount /dev/rbd0 /var/lib/mysql

  16. Ceph 101: CephFS
     • A distributed POSIX filesystem
     • FUSE and kernel module clients
     • Can be mounted by multiple clients
     # mount -t ceph 10.2.2.20:6789/ /mnt/ceph \
         -o name=admin,secret=AQAaV2Qda...==
     # df -h /mnt/ceph
     …
     10.2.2.20:6789/  7.2T  1.2T  6.0T  17%  /mnt/ceph

  17. Ceph 101: deploying
     • ceph-deploy, the deployment script used by the documentation
     • Have a look at ceph-ansible (in the Ceph GitHub)
       – Still challenging though, many options
       – Very good for large deployments
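A minimal ceph-deploy run looks roughly like this; the hostnames are hypothetical, and the `host:disk` OSD syntax shown is the form current in the Kraken era:

```shell
# Define a new cluster with node1 as the initial monitor
ceph-deploy new node1

# Install the Ceph packages on all nodes
ceph-deploy install node1 node2 node3

# Deploy the initial monitor(s) and gather the keys
ceph-deploy mon create-initial

# Create an OSD backed by /dev/sdb on node1
ceph-deploy osd create node1:sdb
```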

  18. Ceph 101: My home Ceph cluster

  19. What Ceph brings to MySQL

  20. Ceph brings to MySQL: the basics…
     • Scalable storage
     • The possibility to leverage multiple disks to scale IOPS
     • Efficient backups
     • Non-local storage, allowing services to move easily
     • Like an iSCSI SAN, right?

  21. Ceph brings to MySQL: efficient storage
     • Thin provisioned slaves
     • Thin provisioned PXC nodes
     • No need for a full copy per node

  22. Ceph brings to MySQL: thin prov. slaves

  23. Ceph brings to MySQL: thin prov. slaves (2)
     Example:
     • 3 servers with 1TB of disk each, running MySQL+OSD+MON
     • Dataset of 500GB
     • Normally, each node stores 500GB, so 1.5TB total
     • Here, the master uses 1TB (replicated)
     • Slaves store… only the delta since their last snapshot
     • Slaves can use non-replicated and even localized pools (tricky)

  24. Ceph brings to MySQL: thin prov. slaves (3)
     • But the delta will grow over time!!
     • Just re-provision the slaves
     • Demo!!!
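Re-provisioning a slave from a fresh master snapshot can be sketched with rbd layering; the pool and image names are hypothetical, and a consistent snapshot (e.g. taken under a backup lock) is assumed:

```shell
# Snapshot the master's data volume and protect it so it can be cloned
rbd snap create mypool/master-data@resync
rbd snap protect mypool/master-data@resync

# Thin clone for the slave: only the blocks it later modifies are stored
rbd clone mypool/master-data@resync mypool/slave1-data

# On the slave host: map the clone (prints the device, e.g. /dev/rbd0)
# and mount it as the MySQL datadir
rbd map mypool/slave1-data
mount /dev/rbd0 /var/lib/mysql
```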

  25. Ceph brings to MySQL: thin prov. PXC
     • Very similar to the thin provisioned slaves
     • Need to be careful with non-replicated pools
     • wsrep-sst-ceph script

  26. Can we go further? MySQL/Rados

  27. MySQL/Rados: the idea
     • InnoDB pages are basically objects
     • Rados is an object store
     • What about modifying InnoDB to store pages in Rados?
     • With Galera or group replication, servers could share the same pages
     • An open-source Aurora?

  28. MySQL/Rados: RadosFS
     • A CERN project
     • A simple filesystem over Rados
     • Can write in chunks → atomic ops
     • Allows us not to care too much about filesystem details

  29. MySQL/Rados: modifying InnoDB
     • Mostly in os0file.cc
     • Initialization in srv0start.cc
     • Some changes for temporary InnoDB files

  30. MySQL/Rados: status
     • POC coded
     • Compiles OK
     • Strange SEGV deep in librados
     • Need help!
     • https://github.com/y-trudeau/percona-server-rados
     • tools: https://github.com/y-trudeau/ceph-related-tools

  31. Questions?

  32. Rate My Session
