HIGH AVAILABILITY AND DISASTER RECOVERY FOR IMDG
Vladimir Komarov, Mikhail Gorelov — Sberbank of Russia
ABOUT SPEAKERS
Vladimir Komarov — Enterprise IT Architect, vikomarov@sberbank.ru
In Sberbank since 2010. He implemented the concepts of an operational data store (ODS) and a retail risk data mart as part of the enterprise data warehouse. In 2015 he tested more than ten distributed in-memory platforms for transaction processing. Now responsible for the grid-based core banking infrastructure architecture, including high availability and disaster recovery.
Mikhail Gorelov — Operations expert & manager, magorelov@sberbank.ru
In Sberbank since 2012. He is responsible for building the infrastructure landscape for major mission-critical applications such as core banking and card processing, including the new grid-based banking platform. Now he acts as both expert and project manager in the “18+” core banking transformation program.
ABOUT SBERBANK
The largest bank in the Russian Federation
• 16K+ offices in Russia, 11 time zones
• 110M+ retail clients
• 1M+ corporate clients
• 90K+ ATMs & POS terminals
• 50M+ active web & mobile banking users
OUR GOALS
Availability = (Total time − Downtime) / Total time × 100 %

Availability    Yearly downtime
99 %            3d 15:39:29.5
99.9 %          8:45:57.0
99.99 %         0:52:35.7   ← target for 2018
99.999 %        0:05:15.6
99.9999 %       0:00:31.6
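The downtime column follows directly from the formula. A worked example for the 99.99 % row, assuming the table uses a mean Gregorian year of 365.2425 days (which matches the figures shown):

```latex
(1 - 0.9999) \times 365.2425 \times 86\,400\ \mathrm{s}
  \approx 3\,155.7\ \mathrm{s}
  \approx 0\!:\!52\!:\!35.7
```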
OUR METHODS
• additional control and checking tools;
• monitoring improvement:
  • new metrics design;
  • new visualizations;
• continuous testing:
  • operational acceptance tests;
  • performance tests;
  • 45+ scenarios of destructive testing;
• keeping the incident response plan up-to-date.
THREATS AND FACILITIES
Threats: datacenter loss; DC interconnect failure; application bugs and admin errors; user data corruption; HW/OS/JVM failures.
Facilities that mitigate them: on-disk data persistence; data redundancy; distributed cluster; data snapshots; point-in-time recovery; health self-check; data replication.
THE LEGACY GRID-ENABLED ARCHITECTURE
Layers: application servers (compute) → in-memory data grid (caching & temporary storage) → relational DBMS (persistence & compute).
Strengths
• Robust and stable persistence layer.
• The grid does not have to be highly available.
Weaknesses
• The write performance is limited by the database.
• The persistence layer is not horizontally scalable.
• Data needs to be converted from object representation to the relational model.
• The database and the grid can become inconsistent if data is changed directly in the database.
• The database requires high-end hardware.
SBERBANK CORE BANKING PLATFORM ARCHITECTURE
Layers: application servers (compute) → in-memory data grid (compute & data persistence).
Opportunities
• Fully horizontally scalable architecture on commodity hardware.
• The data is stored as objects, no conversion required.
• The grid holds the only instance of the data.
Challenges
• The grid has to persist the data.
• The grid has to be fault tolerant.
SERVICE CONTINUITY THREATS
Continuity threats (tree from the slide):
• Errors — data corruption due to user/admin action; application failures due to admin errors.
• Service jobs — software update: application upgrade, firmware/OS/JVM upgrade, platform upgrade.
• Cluster topology change — local failures (hardware/OS/JVM failures); disasters (datacenter loss); cluster breakdown (datacenter and/or network interconnect loss).
Notes:
• The above tree does not consider security issues.
• Application and user issues cannot be solved at the platform level.
• Let’s focus on system issues!
THE CONCEPT OF SERVICE PROVIDER INTERFACE (SPI)
API vs. SPI:

                  API                         SPI
Defined by        Platform                    Platform
Implemented by    Platform                    Custom code (system software)
Called by         Application (custom code)   Platform

The application calls GridGain IMDG through its API; GridGain in turn calls the custom service implementation through its SPIs. Sberbank implements two GridGain SPIs (wiring sketched below):
• TopologyValidator
• AffinityFunction
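To illustrate the SPI pattern: custom implementations are not called by the application, they are handed to the platform through configuration. A minimal wiring sketch using the open-source Apache Ignite API that GridGain extends; the cache name, backup count and the two custom classes (sketched on the following slides) are illustrative assumptions, not Sberbank's actual configuration:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SpiWiring {
    public static void main(String[] args) {
        // Cache for account data; the platform, not the application, will call the SPIs below.
        CacheConfiguration<Long, byte[]> accounts = new CacheConfiguration<>("accounts");
        accounts.setBackups(3);                                             // master copy + 3 backups
        accounts.setAffinity(new ModuloAffinityFunction());                 // custom AffinityFunction SPI (next slides)
        accounts.setTopologyValidator(new DcAwareTopologyValidator(4, 2));  // custom TopologyValidator SPI (later slide)

        IgniteConfiguration cfg = new IgniteConfiguration().setCacheConfiguration(accounts);
        Ignition.start(cfg);
    }
}
```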
THE CONCEPT OF AFFINITY
• nodeFilter — a property of the cache that defines the set of nodes where the cache's data can reside (e.g. data area 1 for clients, data area 2 for accounting).
• AffinityFunction SPI (sketched below):
  • partition() — a fast, simple and deterministic function, usually a division remainder, mapping an object to a partition (chunk);
  • assignPartitions() — the function distributing partitions (chunks) across the nodes of the data/compute grid.
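For reference, a minimal sketch of the AffinityFunction SPI as exposed by Apache Ignite (on which GridGain is built). The division-remainder partition() matches the slide; the round-robin assignPartitions() body is only a placeholder for a real placement policy such as the cell-based one on the next slide:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

import org.apache.ignite.cache.affinity.AffinityFunction;
import org.apache.ignite.cache.affinity.AffinityFunctionContext;
import org.apache.ignite.cluster.ClusterNode;

public class ModuloAffinityFunction implements AffinityFunction, Serializable {
    private static final int PARTS = 1024;                    // number of partitions (chunks)

    @Override public int partitions() { return PARTS; }

    // Fast, simple and deterministic: division remainder of the key's hash.
    @Override public int partition(Object key) {
        return (key.hashCode() & Integer.MAX_VALUE) % PARTS;  // mask keeps the value non-negative
    }

    // Naive placement for illustration only: spread the copies round-robin.
    @Override public List<List<ClusterNode>> assignPartitions(AffinityFunctionContext ctx) {
        List<ClusterNode> nodes = new ArrayList<>(ctx.currentTopologySnapshot());
        List<List<ClusterNode>> assignment = new ArrayList<>(PARTS);

        if (nodes.isEmpty()) {
            for (int p = 0; p < PARTS; p++) assignment.add(Collections.emptyList());
            return assignment;
        }

        int copies = Math.min(ctx.backups(), nodes.size() - 1) + 1;   // primary + backups

        for (int p = 0; p < PARTS; p++) {
            List<ClusterNode> owners = new ArrayList<>(copies);
            for (int i = 0; i < copies; i++)
                owners.add(nodes.get((p + i) % nodes.size()));        // distinct consecutive nodes
            assignment.add(owners);
        }
        return assignment;
    }

    @Override public void removeNode(UUID nodeId) { /* no per-node state to clean up */ }

    @Override public void reset() { /* no cached state */ }
}
```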
THE CONCEPT OF A CELL; NEW AFFINITY FUNCTION
Why cells? Two opposing effects must be balanced:
• Broken node: more nodes in the cluster → faster recovery.
• Semi-broken node: more linked nodes → stronger performance impact.
Find a balance!
Sberbank's affinity implementation (see the sketch below):
• The grid is distributed across 2 datacenters; both datacenters are active.
• Data connectivity is limited to 8 nodes (a cell).
• Every partition has the master copy and 3 backups.
• Each datacenter has 2 copies of a partition.
[Diagram: Datacenter 1 / Datacenter 2, partitions rotated across the 8 nodes of a cell.]
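Sberbank's actual affinity function is not shown in the deck; below is a deliberately simplified, hedged illustration of the cell idea. Nodes are assumed to carry user attributes "CELL" and "DC" (both names hypothetical); for a given partition the helper returns the four copies (master + 3 backups), all inside one cell and split two per datacenter:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.ignite.cluster.ClusterNode;

public final class CellPlacement {

    /** Pick the owners of one partition: all 4 copies stay inside a single cell, 2 per datacenter. */
    public static List<ClusterNode> ownersFor(int partition, List<ClusterNode> topology) {
        // 1. Group the topology into cells by the (hypothetical) CELL attribute.
        Map<Object, List<ClusterNode>> cells = topology.stream()
            .collect(Collectors.groupingBy(n -> (Object) n.attribute("CELL")));

        // 2. Deterministically map the partition to one cell.
        List<Object> cellIds = new ArrayList<>(cells.keySet());
        cellIds.sort(Comparator.comparing(Object::toString));
        List<ClusterNode> cell = cells.get(cellIds.get(partition % cellIds.size()));

        // 3. Split the cell's nodes by datacenter (hypothetical DC attribute: "DC1"/"DC2").
        Map<Object, List<ClusterNode>> byDc = cell.stream()
            .collect(Collectors.groupingBy(n -> (Object) n.attribute("DC")));
        List<Object> dcIds = new ArrayList<>(byDc.keySet());
        dcIds.sort(Comparator.comparing(Object::toString));

        // 4. Take 2 copies from each datacenter; the first node picked acts as the master copy.
        List<ClusterNode> owners = new ArrayList<>(4);
        for (Object dc : dcIds) {
            List<ClusterNode> dcNodes = new ArrayList<>(byDc.get(dc));
            dcNodes.sort(Comparator.comparing(ClusterNode::id));
            for (int i = 0; i < 2 && i < dcNodes.size(); i++)
                owners.add(dcNodes.get((partition + i) % dcNodes.size()));
        }
        return owners;
    }

    private CellPlacement() { }
}
```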
SBERBANK CORE BANKING INFRASTRUCTURE
DC1 / DC2
• Nodes of a cell reside in different racks.
• A Clos network provides stable high-speed connectivity.
• A doubled datacenter interconnect reduces split-brain probability.
• Every server contains NVMe flash and HDDs.
LET’S SPEAK ABOUT NETWORK FRAGMENTATION…
[Diagram: six two-datacenter topologies]
• Regular operation
• Datacenter loss
• DC interconnect loss
• Fragmentation type 1
• Fragmentation type 2
• Fragmentation type 3
HOW DOES GRIDGAIN RECOVER A BROKEN CLUSTER?
1. End (commit or roll back) all the active transactions.
2. Choose a new cluster coordinator.
3. Call TopologyValidator.validate():
   • true (the default) → continue normal operation;
   • false → go to read-only mode.
LET’S OVERRIDE THE DEFAULT TOPOLOGY VALIDATOR!
Possible decisions:
• RW (read-write): continue normal operation;
• AW (admin wait): freeze the cluster and wait for admin interaction.

First check whether the previous topology was valid and either new nodes appeared or not more than N nodes were lost; if so → RW.

Otherwise check:
1. whether there are nodes from DC1 (All / Partial / None);
2. whether there are nodes from DC2 (All / Partial / None);
3. whether data is integral, i.e. no partition loss happened (Yes / No);
and take the decision from a table over these three inputs, e.g. (DC1 / DC2): All / Partial → RW; All / None → RW; Partial / All → AW (sketched below).
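A hedged sketch of such a datacenter-aware validator using the Apache Ignite TopologyValidator interface. This is not Sberbank's implementation: the "DC" node attribute, the expected node count per datacenter, the threshold N and the partial decision table encoded below are illustrative assumptions taken from this slide:

```java
import java.util.Collection;

import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.configuration.TopologyValidator;

public class DcAwareTopologyValidator implements TopologyValidator {

    enum Presence { ALL, PARTIAL, NONE }

    private final int expectedPerDc;            // nodes normally running in each datacenter
    private final int maxLostNodes;             // "not more than N nodes lost"
    private volatile int lastValidSize = -1;    // size of the last topology we accepted

    public DcAwareTopologyValidator(int expectedPerDc, int maxLostNodes) {
        this.expectedPerDc = expectedPerDc;
        this.maxLostNodes = maxLostNodes;
    }

    @Override public boolean validate(Collection<ClusterNode> nodes) {
        // Step 1: previous topology was valid and either new nodes appeared
        // or no more than N nodes were lost -> stay in read-write mode.
        if (lastValidSize >= 0 &&
            (nodes.size() >= lastValidSize || lastValidSize - nodes.size() <= maxLostNodes))
            return accept(nodes);

        // Step 2: classify each datacenter and consult the (partial, illustrative) decision table:
        // All/All, All/Partial, All/None with integral data -> RW; everything else -> AW.
        Presence dc1 = presence(nodes, "DC1");
        Presence dc2 = presence(nodes, "DC2");

        if (dc1 == Presence.ALL && dataIsIntegral(nodes))
            return accept(nodes);   // true = RW (keep working)

        return false;               // false = AW (freeze writes, wait for the administrator)
    }

    private boolean accept(Collection<ClusterNode> nodes) {
        lastValidSize = nodes.size();
        return true;
    }

    private Presence presence(Collection<ClusterNode> nodes, String dc) {
        long n = nodes.stream().filter(node -> dc.equals(node.attribute("DC"))).count();
        return n == 0 ? Presence.NONE : n >= expectedPerDc ? Presence.ALL : Presence.PARTIAL;
    }

    // Placeholder for "data is integral (no partition loss)"; a real check would
    // consult the partition map, which is beyond this sketch.
    private boolean dataIsIntegral(Collection<ClusterNode> nodes) {
        return true;
    }
}
```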
DECISION AUTOMATION USING A QUORUM NODE
[Diagram: Datacenter 1 / Datacenter 2, each fragment marked RW or STOP, with a quorum node in a separate quorum datacenter.]
When the two datacenters lose each other, the fragment that still sees the quorum node continues in read-write mode (RW); the other fragment stops.
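One possible way to feed that decision into the topology validator is to mark the tie-breaker node with a user attribute and have each fragment check whether it can still see it. A minimal sketch; the "QUORUM" attribute name is an assumption:

```java
import java.util.Collection;

import org.apache.ignite.cluster.ClusterNode;

// Quorum check for the split-brain case: the fragment that still contains the
// quorum node keeps working (RW), the other one stops.
public final class QuorumCheck {

    public static boolean seesQuorumNode(Collection<ClusterNode> nodes) {
        return nodes.stream()
            .anyMatch(n -> "true".equals(String.valueOf(n.attribute("QUORUM"))));
    }

    private QuorumCheck() { }
}
```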
LOCAL FILE STORE (LFS)
• Transaction processing writes synchronously to paged memory (RAM) and to the write-ahead log (HDD).
• Pages are flushed asynchronously from RAM to the paged disk storage (files) on HDD.
(Configuration sketched below.)
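The same write path can be expressed in configuration terms. A minimal sketch using the Apache Ignite 2.x native-persistence API, which exposes this design in the open-source code base; the paths and region name are illustrative, and this is not claimed to be Sberbank's actual configuration:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;

public class LfsConfig {
    public static void main(String[] args) {
        DataStorageConfiguration storage = new DataStorageConfiguration()
            // Paged memory backed by page files on disk.
            .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                .setName("lfs")
                .setPersistenceEnabled(true))
            // Synchronous write-ahead logging, as on the slide.
            .setWalMode(WALMode.FSYNC)
            // Illustrative paths: page files on HDD, WAL on faster media (cf. the hardware slide).
            .setStoragePath("/data/lfs/pages")
            .setWalPath("/data/lfs/wal");

        IgniteConfiguration cfg = new IgniteConfiguration().setDataStorageConfiguration(storage);

        // A cluster with persistence starts inactive and must be activated explicitly.
        Ignition.start(cfg).cluster().active(true);
    }
}
```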
BACKUP SUBSYSTEM
Current:
• snapshot to local disk (full/incremental/differential);
• snapshot catalog inside the data grid;
• copying to NAS using NFS;
• restoring on an arbitrary grid topology.
Future:
• point-in-time recovery using snapshot and WAL;
• external backup catalog in a relational DBMS;
• copying to SDS using S3/Swift;
• ...and more!
THANK YOU!
Vladimir Komarov <vikomarov@sberbank.ru>
Mikhail Gorelov <magorelov@sberbank.ru>