Continuous availability: from the shift paradigm to unmanned operation. Pietro Tiberi 17 January 2018 – TIPS Contact Group
Agenda
1 Introduction
2 Continuous Availability
3 Results
4 Conclusions and perspective
Introduction: TIPS non-functional requirements
Reliability / Availability: 99.9%
o RPO = 0 (transactions lost)
o RTO = 15 minutes (downtime)
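As a rough illustration of what these targets imply, the sketch below converts an availability percentage into a downtime budget. It assumes the 99.9% figure is measured over a non-leap calendar year, which the slide does not state; the function names are illustrative only.

```python
# Downtime budget implied by an availability target (illustrative arithmetic).
# Assumption: availability is measured over a non-leap calendar year.
HOURS_PER_YEAR = 365 * 24


def downtime_budget_minutes(availability: float, hours: float = HOURS_PER_YEAR) -> float:
    """Minutes of downtime allowed over `hours` at the given availability."""
    return (1.0 - availability) * hours * 60.0


def max_incidents(availability: float, rto_minutes: float) -> int:
    """Number of outages that fit in the yearly budget if each lasts the full RTO."""
    return int(downtime_budget_minutes(availability) // rto_minutes)


budget = downtime_budget_minutes(0.999)
print(f"{budget:.1f} minutes per year")
print(max_incidents(0.999, 15), "incidents at full RTO")
```

Under that assumption, 99.9% leaves about 525.6 minutes (8.76 hours) of downtime per year, which is room for 35 incidents that each consume the full 15-minute RTO.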
Introduction: Datacenter Operations
o Human based (on shifts)
o Unmanned
CONTINUOUS OPERATION
Continuous Availability: From high availability to continuous availability
High availability:
o Redundancy
o Fault Tolerance
o Clustering
o Active Active configuration
Continuous availability:
o Proactive monitoring
o Continuous delivery
o Automatic remediation
o Dynamic capacity management
Continuous Availability: Proactive Monitoring
o Application monitoring
o Infrastructure monitoring
o Detect events before failures
o Analyze the event
o Trigger automatic actions
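The detect, analyze, trigger sequence can be sketched as a small rolling-average monitor. This is a minimal illustration and not the tooling used in TIPS: the class name, the warning level of 80, the window of 3 samples and the sample values are all hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Callable, Deque, List


class ThresholdMonitor:
    """Fires an action when the rolling mean of a metric crosses a warning
    level, which is meant to sit below the level at which failures occur."""

    def __init__(self, warn_level: float, window: int, action: Callable[[float], None]):
        self.warn_level = warn_level
        self.window = window
        self.samples: Deque[float] = deque(maxlen=window)
        self.action = action
        self.alerting = False

    def observe(self, value: float) -> None:
        self.samples.append(value)
        if len(self.samples) < self.window:
            return  # not enough data to analyze yet
        avg = mean(self.samples)
        if avg > self.warn_level and not self.alerting:
            self.alerting = True
            self.action(avg)  # trigger the automatic action once per episode
        elif avg <= self.warn_level:
            self.alerting = False  # re-arm after the metric recovers


# Hypothetical vCPU load samples (percent); the action just records the event.
triggered: List[float] = []
monitor = ThresholdMonitor(warn_level=80.0, window=3, action=triggered.append)
for cpu in [50, 60, 70, 85, 95, 99, 40, 30, 20]:
    monitor.observe(cpu)
print(triggered)
```

Averaging over a window rather than reacting to single samples is one way to avoid triggering remediation on short spikes; the action fires once per episode and re-arms only after the average drops back below the warning level.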
Continuous Availability: IT Automation
Continuous Availability: From Agile to DevOps
Continuous Availability: DevOps - Everything as Code
[Diagram: Code, Virtual Infrastructure]
Continuous Availability: Dynamic Capacity Management
o Resource utilization rate optimization
o Consumption trend analysis
o What if scenarios
o Predict future requirements and trends
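Trend analysis and what-if scenarios can be illustrated with a simple linear extrapolation. The sketch below is an assumption-laden example, not the method used in the project: it fits a least-squares line to equally spaced utilization samples and estimates how many periods remain before a limit is reached. The utilization figures, the limit of 80 and the 1.5x load factor are invented for illustration.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit y = slope * t + intercept for t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, y_mean - slope * t_mean


def periods_until(ys: Sequence[float], limit: float) -> Optional[float]:
    """Periods after the last sample until the trend line reaches `limit`.
    Returns None if the trend is flat or falling, 0.0 if already reached."""
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None
    t_cross = (limit - intercept) / slope
    return max(0.0, t_cross - (len(ys) - 1))


# Hypothetical utilization samples (percent), one per period.
usage = [40.0, 45.0, 50.0, 55.0, 60.0]
print(periods_until(usage, 80.0))

# What-if scenario: the same history with 50% more load.
what_if = [u * 1.5 for u in usage]
print(periods_until(what_if, 80.0))
```

A straight line is the simplest possible model; real consumption often has seasonality or step changes, so the result is best read as a first estimate.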
Test Plant Architecture
[Diagram: User A and User B exchange messages with the platform (put / get)]
o Application Layer: Message Router, Message Processor, Message Router
o Message Layer: Kafka Broker (read / write)
o Database Layer: Aerospike Database (store)
Results: Test Architecture
o Specific tests to verify the relevant domain functions
o Executed on a common simulation layer that reproduces the real operational environment
Results: Simulation – continuous delivery (1)
SIMUL.APP.01: message latency (1 sec average)
o Normal traffic condition (500 msg/s), timeout = 10,000 ms
o Kafka cluster rolling update
o 0 messages lost
o 0 timeouts expired
Results: Simulation – continuous delivery (2)
SIMUL.APP.02: message latency (1 sec average)
o Heavy traffic condition (2000 msg/s), timeout = 10,000 ms
o Kafka cluster rolling update
o 0 messages lost
o Some timeouts expired
07 November 2017 – CMG Impact 2017
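The metric reported in these simulations, message latency averaged over 1-second intervals with a 10,000 ms timeout, could be computed along the following lines. This is only a sketch: the sample data are invented, the function name is hypothetical, and whether the original measurement included timed-out messages in the average is not stated in the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TIMEOUT_MS = 10_000  # timeout value quoted for both simulations


def summarize(samples: Iterable[Tuple[float, float]]) -> Tuple[Dict[int, float], int]:
    """samples: (send_time_seconds, latency_ms) pairs.
    Returns the average latency per 1-second bucket and the number of
    messages whose latency exceeded the timeout."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    expired = 0
    for sent_at, latency_ms in samples:
        if latency_ms > TIMEOUT_MS:
            expired += 1
        # Assumption: timed-out messages still count towards the average.
        buckets[int(sent_at)].append(latency_ms)
    averages = {sec: sum(v) / len(v) for sec, v in sorted(buckets.items())}
    return averages, expired


# Invented samples: two messages in second 0, two in second 1 (one times out).
samples = [(0.1, 40.0), (0.6, 50.0), (1.2, 12_000.0), (1.8, 60.0)]
averages, expired = summarize(samples)
print(averages, expired)
```

Bucketing by send time makes a disturbance such as a rolling update show up as a bump in the per-second curve rather than being diluted in a global average.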
Results: Simulation – proactive monitoring
o Normal traffic condition (500 msg/s), average E2E processing time = 45 ms
o High vCPU load added to Message Processor nodes
o T0-T1: below threshold
o T2-T3: threshold exceeded
Conclusions and perspective Phased Bi-modal Approach Data Center Tool
Thanks for your attention
Pietro Tiberi (pietro.tiberi@bancaditalia.it)