
Huawei's story of leveraging GridGain as a distributed caching - PowerPoint PPT Presentation

Huawei's story of leveraging GridGain as a distributed caching service on its public cloud environment. Paul Chen, Chief Architect, Cloud Services Research and Development, Huawei Technologies Canada Lab.


  1. Huawei's story of leveraging GridGain as a distributed caching service on its public cloud environment. Paul Chen, Chief Architect, Cloud Services Research and Development, Huawei Technologies Canada Lab

  2. Agenda • Huawei Public Cloud Overview • DCS Caching Architecture & Usage Patterns • Caching Engines & Use Cases • Public Cloud Caching Performance/Latency Summary • Current Challenges • Hybrid & Private Cloud Use Cases and Challenges • Things to Explore

  3. Huawei Public Cloud Overview

  4. Huawei Public Cloud Overview • Solutions (60+): Digital, Smart City, Internet finance, e-Government, manufacturing, SAP, Web & Mobile, IoT, HPC, Cloud Communication, Cloud Office. Generic-specific solutions adapt to … industry business and optimize services. 860+ solution partners for business co-innovation and 2900+ service partners for operation; E2E services including consultancy, deployment and O&M • Dedicated Cloud: FCS, migration, IT hosting, DR. High-performance ECSs and BMSs guarantee cloudification of critical businesses; heterogeneous computing capacity supports artificial intelligence applications • Enterprise Cloud (14 categories, 100+ services): Computing, Storage, Network, Security, Database, Application services, EI, DevCloud, Video, IoT, communication services, management and deployment, development and testing. Enterprise-class storage, DB, and data analysis services dig deeply into the value of data. Security: Anti-DDoS, WAF, and DBSS guarantee business security • Software & Hardware: Chip, Server, Network, Storage, Software. Atlas heterogeneous hardware, HPC, AI, and the latest GPU and FPGA improve computing capability; customized CPU, NVMe SSD card, smart NIC, RDMA, InfiniBand network and security chipset

  5. Huawei Cloud Services [Diagram; most labels were lost in extraction. Legible labels: Solutions; SaaS (Enterprise Management & Deployment, Enterprise Cloud Apps, IoT, Comm.); PaaS (Application Builder, DevOps, DevCloud, Distributed Caching Services, Data Analysis, Database, BigData, Security); IaaS (Computing, Storage, Network); System, Network, Storage, …]

  6. Architecture & Usage Patterns

  7. DCS Architecture • App developers: "Manage my caching instance" • DMZ: Caching Service Dashboard • PRV: Caching Service Broker (manager) • Caching service providers: resource scheduling & deployment; provision service instances • Caching engines: GridGain, Redis • Resources are isolated per tenant (tenant resources vs. shared resources) • Uses horizontal scale on demand • Runs on bare-metal servers and VMs (x86/ARM) • Apps serve app users
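The broker role on the architecture slide can be illustrated with a minimal in-memory sketch. It assumes a broker that provisions instances, scales them out on demand, and refuses cross-tenant access; the names `CachingServiceBroker`, `CacheInstance`, `provision` and `scale_out` are hypothetical and are not the DCS API.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class CacheInstance:
    instance_id: int
    tenant: str
    engine: str      # "gridgain" or "redis", the two engines named on the slide
    nodes: int = 1


class CachingServiceBroker:
    """Hypothetical broker: provisions instances and keeps tenants isolated."""

    ENGINES = {"gridgain", "redis"}

    def __init__(self):
        self._ids = count(1)
        self._by_tenant = {}   # tenant -> {instance_id: CacheInstance}

    def provision(self, tenant, engine, nodes=1):
        if engine not in self.ENGINES:
            raise ValueError(f"unknown engine: {engine}")
        inst = CacheInstance(next(self._ids), tenant, engine, nodes)
        self._by_tenant.setdefault(tenant, {})[inst.instance_id] = inst
        return inst

    def scale_out(self, tenant, instance_id, extra_nodes):
        # Horizontal scale on demand: add nodes to an existing instance.
        inst = self._lookup(tenant, instance_id)
        inst.nodes += extra_nodes
        return inst

    def _lookup(self, tenant, instance_id):
        # A tenant can only see instances registered under its own name.
        try:
            return self._by_tenant[tenant][instance_id]
        except KeyError:
            raise PermissionError("instance not visible to this tenant") from None
```

Looking instances up through the tenant key first is one simple way to express "resources are isolated per tenant"; a real service would enforce isolation at the network and resource level as well.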

  8. DCS 2.0 Released • Faster, more flexible and more secure • 8 seconds to create a caching instance • Caching operations 300% faster (leveraging seamless HW/SW/OS integration) • Scale on demand (add new caching capacity dynamically) • Strong security: strong multi-tenant isolation; SLA warranty via caching overflow, cache persistence and alert/notification

  9. Caching Usage Patterns • 01 Side Cache • 02 HTTP Session Replication • 03 Change Data Capturing • 04 Write-through/Write-behind/Map-reduce • 05 SQL-like Query
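The difference between the write-through and write-behind patterns named above can be shown with a small sketch. It assumes a plain dictionary as a stand-in for the backing database; the class names are hypothetical and do not correspond to a GridGain or Redis API.

```python
class WriteThroughCache:
    """Every put is written to the backing store before it returns."""

    def __init__(self, store):
        self.store = store     # stand-in for a database (assumption)
        self.cache = {}

    def put(self, key, value):
        self.store[key] = value    # synchronous write to the store
        self.cache[key] = value


class WriteBehindCache:
    """Puts update the cache at once; the store is updated later in a batch."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self._dirty = {}       # pending writes; the last value per key wins

    def put(self, key, value):
        self.cache[key] = value
        self._dirty[key] = value

    def flush(self):
        # Push all pending writes to the store and report how many were sent.
        written = len(self._dirty)
        self.store.update(self._dirty)
        self._dirty.clear()
        return written
```

Write-through keeps the store consistent at the cost of write latency, while write-behind coalesces repeated writes to the same key but can lose pending updates if the cache fails before a flush.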

  10. DCS Caching Usage: Side Cache • Cache engine: e-commerce & websites; public services; social media (e.g. feeds); network games; search engines • Objects/classes (e.g. POJO): SQL-like queries; transaction controls; locking strategy control; multi-language support; customization & serialization • Simple key/value: cloud-native client; Redis/Memcached interfaces; Redis objects (e.g. MSET)
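A side cache is commonly used in the cache-aside style: the application checks the cache first and falls back to the system of record on a miss. The sketch below is a minimal in-memory illustration; `SideCache`, the `loader` callback and the TTL value are assumptions of this example, not part of DCS.

```python
import time


class SideCache:
    """Cache-aside helper: read from the cache, load from the source on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # hypothetical callback that reads the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: load from the source and repopulate the cache.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call after the source record changes so the next read reloads it.
        self._entries.pop(key, None)
```

In a deployment like the one described, the dictionary would be replaced by calls to the caching instance, and the loader by a database query.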

  11. DCS Caching Usage: HTTP Session • Session persistence ("HTTP session objects cached on DCS"): user/login profile and session, user objects, session data (shopping cart, store catalogues, browsing histories ..) • Session failover ("app instances were down or restarted"): sessions survive an instance restart; fast warm-up time • Diagram: an HTTP server in front of a web app layer of three app server instances, each running the web app with a filter (client plugin) that talks to the DCS caching cluster; database and cache behind
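The failover behaviour on this slide can be sketched as follows: each app instance keeps no session state of its own and reads and writes sessions through a shared store, so a different instance can continue a session after a restart. `SessionStore` and `AppInstance` are hypothetical stand-ins for the caching cluster and the web app filter.

```python
import copy
import uuid


class SessionStore:
    """In-memory stand-in for the shared caching cluster (assumption)."""

    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Return a copy so callers never mutate stored state by accident.
        return copy.deepcopy(self._sessions.get(session_id, {}))

    def save(self, session_id, data):
        self._sessions[session_id] = copy.deepcopy(data)


class AppInstance:
    """Stateless app server: all session data lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id=None, item=None):
        # Filter-like step: load the session, apply the request, save it back.
        sid = session_id or uuid.uuid4().hex
        session = self.store.load(sid)
        cart = session.setdefault("cart", [])
        if item is not None:
            cart.append(item)
        self.store.save(sid, session)
        return sid, list(cart)
```

If instance 1 goes down after the first request, instance 2 can serve the next request with the same session id and still see the shopping cart.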

  12. DCS Caching Usage: Data Grid • Oracle GoldenGate • IBM Data Capture
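Change data capture tools such as those listed above emit a stream of row-level changes that can be applied to a cache to keep it in step with the database. The sketch below assumes a simplified event shape; `ChangeEvent`, its `lsn` field and `apply_changes` are hypothetical and do not reflect the actual GoldenGate or IBM record formats.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    op: str                      # "insert", "update" or "delete"
    key: str
    value: Optional[Any] = None
    lsn: int = 0                 # hypothetical log sequence number


def apply_changes(cache, events, applied_lsn=0):
    """Apply events in log order and return the last sequence number applied."""
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= applied_lsn:
            continue             # already applied, so a replay is harmless
        if ev.op in ("insert", "update"):
            cache[ev.key] = ev.value
        elif ev.op == "delete":
            cache.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
        applied_lsn = ev.lsn
    return applied_lsn
```

Tracking the last applied sequence number makes redelivery of the same batch idempotent, which matters when the capture tool retries after a failure.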

  13. Engines & Use Cases

  14. DCS Use Case 1 • A public service agency (app deployed on Huawei public cloud) • Concurrency above 50,000, so the database became a bottleneck • DB latency significantly impacted the business during request peaks • Diagram components: ELB, web servers, messaging service, RDS, DCS • After leveraging DCS caching: performance and concurrency improved 10 times

  15. DCS Use Case 2 • A search engine provider (in Asia Pacific) • A huge amount of business data to collect and analyze (e.g. news, social media, blogs, chat groups, online forums…), increasing exponentially • A large amount of the collected data was redundant, which significantly increased processing, modeling and analysis time and made the pipeline slow and inefficient • Diagram components: info sources (a huge amount of redundant data/objects/messages), messaging service, DCS cluster (synchronized instances), info after removing the redundancy, analytic engine, info metrics, business management platforms • After leveraging DCS caching: 70% deployment cost savings; double the data processing efficiency
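One common way to remove redundant messages before analysis is to keep a set of content fingerprints in the cache and drop anything already seen. The slides do not say how redundancy was removed in this deployment, so the sketch below is only an illustration; `Deduplicator` and its normalization rule are assumptions.

```python
import hashlib


class Deduplicator:
    """Tracks fingerprints of messages that have already been seen."""

    def __init__(self):
        self._seen = set()   # stand-in for a key set held in the caching cluster

    @staticmethod
    def fingerprint(message):
        # Normalize case and whitespace so trivial variants collapse together.
        normalized = " ".join(message.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def is_new(self, message):
        fp = self.fingerprint(message)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


def unique_messages(messages, dedup):
    """Keep only the first occurrence of each distinct message, in order."""
    return [m for m in messages if dedup.is_new(m)]
```

Only the surviving messages would then be passed on to the analytic engine, which is where the reduction in processing time would come from.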

  16. DCS GridGain & Redis Engine Performance/Latency • Test setup: clustered nodes; 1 full async replica; 9 million requests; 1 K per object or value; > 200 connections • Note: the following results are for reference only, not for comparison, because the two tests differ in test tools (Yardstick vs. memtier), cached objects measured (Java objects vs. MSET), and heap requirements (Java vs. non-Java) • GridGain Engine (Enterprise v8.4.1), Use Case 1 (2 clients, 1 replica, increasing the number of nodes and client connections). Reported columns: CPU usage % (driver, server), MEM usage (driver, server), performance (average per node), network (Mbps), latency (msec), heap, nodes, replica, threads, requests. Values in extracted order: 1, 360, 8G, 9,000,000, 498, 252, 1.56G, 5.72G, 60, 2.01, 95417, 9 • Redis Engine (v4.0.11), Use Case 1 (2 clients, 1 replica, increasing the number of nodes and client connections). Reported columns: performance (average per node), network (Mbps), latency (msec), heap, nodes, replica, threads, requests. Values in extracted order: 1, 320, 64G, 1,000,000, 1.5, 91795, 8

  17. Challenges

  18. Challenges • IMDG (in-memory data grid) ecosystem buildup on public cloud • Enterprise cloud transformation (private -> hybrid, private -> public cloud) • Migration across different cloud providers • Smart cache (more reliable, predictable, intelligent, interoperable), e.g. the user doesn't care which caching engine is used; it is picked elastically by intelligence behind the scenes based on the use case (Redis engine ↔ GridGain engine) • Hardware optimization (FPGA, AEP …, cache offload)
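The "smart cache" idea, where the engine is chosen for the user from the use case, could start from a rule as simple as the sketch below. It assumes that workloads needing SQL-like queries, transactions or object caching go to the GridGain engine and simple key/value workloads go to the Redis engine; that mapping is an assumption of this example drawn loosely from the Side Cache slide, and `Workload` and `pick_engine` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    needs_sql: bool = False            # SQL-like queries over cached data
    needs_transactions: bool = False   # transaction or locking control
    caches_objects: bool = False       # object caching, e.g. POJO-style classes


def pick_engine(workload):
    """Return the engine name for a workload (assumed mapping, see lead-in)."""
    if workload.needs_sql or workload.needs_transactions or workload.caches_objects:
        return "gridgain"
    return "redis"   # simple key/value access
```

A production selector would presumably also weigh data size, latency targets and cost, and would need a way to migrate data when the choice changes.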

  19. Things to Explore

  20. Things to Explore • Write-through/Write-behind • Change data capturing • Smart cache (OLAP, ... caching streaming data and real-time data analytics) • Migrate caching services seamlessly from one cloud provider to another • AEP (non-volatile memory (NVM) technology)
