

  1. The Challenge of Large-Scale Ceph Cluster (China Railway)

  2. BUSINESS SITUATION: China Railway is a solely state-owned enterprise and the main artery of the national economy. MAIN BUSINESS: passenger and freight transport service. BUSINESS FEATURE: large scale, wide coverage, uninterrupted operation. ENTERPRISE GOAL: to be a world-class modern logistics enterprise. Growth from 2010 to 2018: operating mileage 91k km to 131k km; passenger volume 1.67B to 3.37B; freight volume 3.64Bt to 4.02Bt; computer room area 690 ㎡ to 3200 ㎡; equipment quantity 800 to 12,000.

  3. Cloud Computing Development in China Railway. Cloud computing started in 2014: the cloud data center was built in stages, and production applications were gradually migrated to the cloud environment. The OpenStack-based cloud now powers the data center: it has reached a scale of thousands of physical machines and runs more than 280 deployed applications, including passenger transport, freight, scheduling, locomotive, and the public infrastructure platform. The newly built Data Center Hub is expected to reach a scale of over 15,000 physical machines by the end of 2020.

  4. SRCloud Architecture and Components. The platform is layered as follows:
     - Cloud Application System: transportation resource management, implementation management, integrated management, strategic decision, development and production management, and management & collaboration applications.
     - PaaS of China Railway Cloud: Big Data Service, Middleware Service, Container Service, Database Service, Storage & Back-up Service.
     - IaaS of China Railway Cloud: block storage, object storage, file storage, computing, image, and global network management; secured authentication, resource orchestration, resource metering, bare-metal management, physical machine monitoring, load balance, HA of virtual machines, flexible resource expansion, and network configuration.
     - Resource pools on top of the physical devices: computing resource pool (KVM/Xen, VMware, PowerVM), network resource pool (virtual switch, virtual router, VLAN, VXLAN), storage resource pool (distributed storage, centralized storage).
     - Cross-cutting: Cloud Security Management (access, authority, host, system, data, application, and asset security management; security monitoring) and Operation & Maintenance Management (monitoring, log, optimization).

  5. Storage Usage in China Railway. The figure shows the current storage usage of the China Railway data center: centralized storage serves the database service, while both centralized and distributed storage serve as back-end storage for VMware and OpenStack.
     Resource type        Storage type
     Critical database    Centralized storage resource pool
     VMware based         Centralized & distributed storage resource pool
     KVM based            Centralized & distributed storage resource pool

  6. Sharing of Experience in Distributed Storage. Production Ceph clusters:
     OpenStack nodes                     Ceph Monitor nodes   Storage nodes   OSDs   Capacity
     3 controller + 202 compute nodes    5                    84              1512   2.15P
     3 controller + 161 compute nodes    5                    68              1224   1.71P
     3 controller + 94 compute nodes     3                    40              720    1.02P
     ……

  7. Deployment Architecture

  8. Challenges in the Deployment of a Ceph Cluster with 1512 OSDs. 1. How many Ceph Mon nodes should be deployed? Based on our large-scale testing experience (600 compute nodes, over 1000 OSDs), five Ceph Mon nodes are necessary to ensure the stability of the Ceph cluster.
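     A minimal sketch of what the monitor section of ceph.conf could look like for such a deployment; the hostnames and addresses below are placeholders, not the actual China Railway configuration:

        [global]
        # five monitors keep quorum even if two monitor nodes fail at once
        mon initial members = mon1, mon2, mon3, mon4, mon5
        mon host = 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4, 10.0.0.5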

  9. Challenges in the Deployment of a Ceph Cluster with 1512 OSDs. 2. How to configure the failure domain? Available capacity is the same whether replicas are separated across hosts or across racks, but the fault-tolerance levels differ: one tolerates host-level faults, the other rack-level faults. We separate replicas across racks to tolerate rack-level faults.
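     A sketch of how rack-level separation can be expressed in CRUSH, assuming the default root and a hypothetical pool named "volumes" (the OSD hosts must already be placed under rack buckets in the CRUSH map):

        # create a replicated rule whose failure domain is the rack
        ceph osd crush rule create-replicated replicated_rack default rack
        # point the pool at the new rule
        ceph osd pool set volumes crush_rule replicated_rack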

  10. Challenges in the Deployment of a Ceph Cluster with 1512 OSDs. 3. How to configure the network? 2 x 10G for the public network and 2 x 10G for the cluster network.
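     This split typically maps to the public/cluster network options in ceph.conf; the subnets below are placeholders:

        [global]
        # client and monitor traffic (2 x 10G bonded)
        public network  = 10.10.0.0/16
        # replication and recovery traffic (2 x 10G bonded)
        cluster network = 10.20.0.0/16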

  11. Challenges in the Deployment of a Ceph Cluster with 1512 OSDs. 4. Stop Ceph services before adjusting the network configuration. Cluster errors and recovery failures occurred when we adjusted the configuration of network equipment while the Ceph cluster was running. If you need to adjust the physical network, stop all Ceph cluster services first.
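     One common shutdown sequence, assuming systemd-managed Ceph daemons:

        ceph osd set noout          # keep stopped OSDs from being marked out and rebalanced
        systemctl stop ceph.target  # run on every Ceph node
        # ... adjust the physical network ...
        systemctl start ceph.target
        ceph osd unset noout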

  12. Challenges in the Deployment of a Ceph Cluster with 1512 OSDs. 5. Creating 100 incremental VMs in batches has a high failure rate. This is mainly caused by incremental snapshots: concurrent creation performs incremental clone operations at the RBD layer. RBD uses a distributed exclusive lock, and during concurrent cloning librbd must acquire that lock before performing any subsequent operation, so under high concurrency the lock contention causes failures. Eventually we disabled the exclusive lock.
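     For reference, the feature can be turned off per image or avoided by default for new images; the pool/image names below are placeholders:

        # disable exclusive-lock on an existing image (object-map, fast-diff and
        # journaling depend on it and must be disabled first if enabled)
        rbd feature disable volumes/volume-0001 exclusive-lock
        # or create new images with the layering feature only:
        # (ceph.conf)  rbd default features = 1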

  13. Challenges in the Deployment of a Ceph Cluster with 1512 OSDs. 6. Creating 100 incremental VMs in batches has a high failure rate when 1500 VMs already exist. In Cinder's code, every volume created from an image first creates a snapshot, and the RBD client must query the full snapshot list of the target volume to create one. When the snapshot list grows too long, the interface call times out and creation fails. We modified the Cinder code to create a single snapshot per image; volumes later created from that image reuse it and create no new snapshots.
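     A sketch of the snapshot-reuse idea at the rbd level, with hypothetical pool/image names; every clone is a copy-on-write child of the same protected per-image snapshot:

        rbd snap create images/image-uuid@cinder-base
        rbd snap protect images/image-uuid@cinder-base
        rbd clone images/image-uuid@cinder-base volumes/volume-0001
        rbd clone images/image-uuid@cinder-base volumes/volume-0002

     Because only one snapshot ever exists on the image, the snapshot list stays short no matter how many volumes are cloned from it.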

  14. Challenges in the Deployment of a Ceph Cluster with 1512 OSDs. 7. Restarting a Ceph Mon service. Due to operation and maintenance requirements, a Ceph Mon service was restarted. Afterwards, 87% of PGs were in the unknown state, dozens of slow requests appeared, and recovery was triggered for a while. Analysis showed the cause was inconsistent clocks between nodes.
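     Useful checks for this failure mode, assuming chrony is used for time synchronization:

        ceph time-sync-status              # per-monitor clock skew as seen by the quorum
        ceph health detail | grep -i clock
        chronyc tracking                   # local NTP sync state on the Mon node

     Monitors tolerate only a small drift by default (mon_clock_drift_allowed, 0.05 s), so even modest skew can destabilize the quorum after a restart.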

  15. Challenges in the Deployment of a Ceph Cluster with 1512 OSDs. 8. Testing the impact of latency on Ceph. tc was used to add 1000 ms of latency on one data network card of an OSD node, which caused some instances on the platform to fail reads and writes. After the latency was removed, the virtual machines returned to normal. To discover network problems in time, we have strengthened monitoring of the network.
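     The injection itself is a standard tc/netem one-liner; the interface name is a placeholder:

        # add 1000 ms of latency on an OSD data NIC
        tc qdisc add dev eth1 root netem delay 1000ms
        # remove it again
        tc qdisc del dev eth1 root netem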

  16. Performance of Ceph Cluster with 1512 OSDs (1344 HDD OSDs, 168 PCIe OSDs). [Bar charts, HDD vs PCIe OSDs: 4K random W/R IOPS (chart values 219k, 241.4k, 442.7k, 3270k) and 512K sequential W/R bandwidth (chart values 30,465.5, 30,821, 34,694, 90,784.5 MB/s).]
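     Numbers of this kind are typically collected with fio's rbd engine; the pool, image, and client names below are placeholders:

        # 4K random write directly against an RBD image
        fio --name=4k-randwrite --ioengine=rbd --clientname=admin \
            --pool=rbd --rbdname=bench --rw=randwrite --bs=4k \
            --iodepth=32 --direct=1 --runtime=60 --time_based
        # 512K sequential tests: --rw=write or --rw=read with --bs=512k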

  17. HA Test Cases of Ceph Cluster with 1512 OSDs.
     Testing scenario                                                                                Result
     Consistency of data read/write when two Ceph Mon nodes restart                                  Pass
     Consistency of data read/write when two failure domains restart                                 Pass
     VM I/O pause time under 10 s when a sysrq kernel exception occurs on one OSD node               Failed
     VM I/O pause time under 10 s when one OSD node reboots                                          Pass
     VM I/O pause time under 10 s when one disk of an OSD node cannot be read or written             Pass
     VM I/O pause time under 10 s when one disk of an OSD node is disconnected                       Pass
     VM I/O requests remain continuous when latency occurs on a network card of an OSD node         Failed
     VM I/O requests remain continuous when packet loss occurs on a network card of an OSD node     Pass
     VM I/O pause time under 10 s when one OSD node is disconnected                                  Failed
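     Faults of the kinds listed above can be injected with standard Linux commands, sketched below with placeholder device and interface names (all destructive, test clusters only):

        echo c > /proc/sysrq-trigger                   # sysrq kernel crash on an OSD node
        echo offline > /sys/block/sdb/device/state     # take one OSD disk offline
        tc qdisc add dev eth1 root netem loss 30%      # packet loss on an OSD NIC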

  18. THANK YOU
