Pronto Elasticsearch Extension Practice in eBay
Donggeng Yu
12/07/2019, Pronto, eBay

Agenda
1 Overview of Elasticsearch in eBay
2 Use Cases & Challenges
3 Tools Extension for Clusters Management
4 Service Extension for Clusters Capability

Elastic Stack
• ELKB
  ‒ Elasticsearch: Search & Aggregation
  ‒ Logstash: ETL
  ‒ Kibana: Visualization
  ‒ Beats: Data Shipper
• X-Pack
  ‒ Security, alerting, monitoring, reporting, machine learning, etc.
• Use Cases & OOTB Solutions
  ‒ Logs / Metrics
  ‒ APM / Uptime
  ‒ SIEM / Endpoint Security
  ‒ Site Search / App Search / Enterprise
  ‒ Maps

Pronto Ecosystem in eBay
• 100+ clusters
• 6k+ nodes
• VM (OpenStack) / Container (K8s)

Use Cases in eBay
• Use Cases:
  ‒ Near real-time search / aggregation
    ‒ Vertical Shop / Tire Installation
    ‒ Terapeak / SEO On-Site Traffic
  ‒ Metrics & Logs
    ‒ UFES / Ceilometer / SRE / UMP
    ‒ More than 20T/day for a single cluster

Vertical Shop & Tire Installation

Terapeak - eCommerce Data Insights
• Terapeak
  ‒ SaaS-based tool that provides ecommerce data insights to online sellers
  ‒ Acquired by eBay
• Tech Stack
  ‒ From RDBMS + Solr to ELK
  ‒ S3 and Hadoop for data staging
  ‒ Spark for data ETL
  ‒ Kafka for data queue
  ‒ Postgres for Data Warehouse
  ‒ Elasticsearch for indexing and search
  ‒ ReactJS for front-end application

UFES - Anomaly Detection for SLB
• Goal
  ‒ Unified Front-End Services (UFES): move eBay closer to users so that the world shops first on eBay. The UFES team built out 8 new Internet Points of Presence (PoPs) across the globe
  ‒ Need to route traffic via UFES PoPs by replacing the NetScaler hardware SEO load balancers with Envoy Proxy based software load balancers (SLB)
• Elastic Stack
  ‒ Filebeat + Kafka + Elasticsearch clusters
  ‒ Dashboard for monitoring and comparison
  ‒ Anomaly detection for SLB

Ceilometer - IT Operation Analytics

Challenges of Managing Cluster Fleets at Scale
[Diagram labels: Performance, HA, Cost, Onboarding, Integration]
• Integrate with eBay's platform & follow the standards
  ‒ Configuration management & change management
  ‒ Full lifecycle management
• Easy onboarding and integration
  ‒ Elasticsearch as a Service
  ‒ How to free customers to focus on their domain business
• Performance & High Availability
  ‒ Search: site-facing application response time should be less than 100 ms
  ‒ Ingesting: 20T per day for a single cluster
  ‒ Different deployments, like cross-region deployment
• Cost Control
  ‒ Hardware cost
  ‒ License fee (supports some features like security, alerting and ML)
  ‒ Human resources
  ‒ Support (7*24 on-call support & on-site support, etc.)

Solutions for Challenges: Cluster Provision & Management
[Challenge labels: Performance, Cost, HA, Onboarding, Integration]
• From VM to Container
  ‒ VM (OpenStack)
    ‒ Fixed flavor
    ‒ Puppet Foreman infrastructure
    ‒ Puppet module for Elasticsearch
  ‒ Container (K8s)
    ‒ Flexible flavor (request/limit)
    ‒ Operator Pattern
    ‒ Deployment + StatefulSet + Service
• Best practices & Different deployments
  ‒ Important System Configuration & Best practices
  ‒ Anti-Affinity (High availability)
  ‒ Cross region deployment (High availability)
  ‒ Flavor chosen by traffic (Cost saving)
  ‒ Hot-warm architecture (Cost saving)
  ‒ LB for write / read

Solutions for Challenges: Tooling and Service Extension
[Challenge labels: Performance, Cost, HA, Onboarding, Integration]

Use Case Onboarding
• Capacity planning
  ‒ What's the use case and use scenarios
    ‒ Data retention / active period
  ‒ Performance
    ‒ Index rate / search rate
    ‒ Document & bulk size
  ‒ Deployment & Cost
    ‒ How many nodes?
    ‒ What's the hardware configuration?
    ‒ What kind of deployment should be used?
  ‒ Best practices
    ‒ Software configuration
    ‒ Deployment in different regions
    ‒ Keep a margin so that traffic can grow without performance issues

Node               Storage   Memory    CPU       Network
Master             Low       Low       Low       Low
Data               Extreme   High      High      Medium
Ingest             Low       Medium    High      Medium
Coordinator        Low       Medium    Medium    Medium
Machine Learning   Low       Extreme   Extreme   Medium

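The capacity-planning questions above (retention, index rate, node count, margin) can be combined into a rough first estimate. The following is a minimal sketch only, not the Pronto sizing tool: the function name, its parameters, and the default 30% headroom are assumptions made for illustration.

```python
import math


def estimate_data_nodes(daily_ingest_gb, retention_days, replicas,
                        disk_per_node_gb, headroom=0.3):
    """Rough data-node count from ingest volume, retention and replica count.

    Hypothetical helper: `headroom` is the fraction of each disk kept free
    as margin for traffic growth (an assumed default, not an eBay figure).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Primary data that stays on disk during the retention window.
    primary_gb = daily_ingest_gb * retention_days
    # Every replica stores one more full copy of the primaries.
    total_gb = primary_gb * (1 + replicas)
    usable_per_node_gb = disk_per_node_gb * (1 - headroom)
    return max(1, math.ceil(total_gb / usable_per_node_gb))
```

For instance, 1000 GB/day kept for 7 days with one replica on 4000 GB disks and 50% headroom works out to 14000 GB over 2000 GB usable per node, i.e. 7 data nodes. Memory, CPU and network needs per node role still have to be checked against the table above.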
Onboarding Self-Service and Sizing Tool
[Challenge labels: Onboarding, Integration]

Customer Support
[Challenge labels: Onboarding, Cost, Integration]
• Support model
  ‒ Different SLA for different use cases
    ‒ Search response time should be less than 100 ms
    ‒ Cluster should NOT be in RED
  ‒ 7*24 support for Site-facing or Tier 2 above
    ‒ SEC call / PagerDuty
• Support case
  ‒ Cluster in RED
    ‒ Node missing and replica is 0
    ‒ Dangling index
  ‒ Response time
    ‒ Full GC because of Machine check error (MCE)
    ‒ Too many shards and fields

Data Ingestion Pipeline
[Challenge labels: Onboarding, Integration]
• Added value for customers
  ‒ Self-service, no coding/testing
  ‒ No onboarding required
• Shared cluster
  ‒ 30+ use cases / 3T per day
• Shared data assets
  ‒ Partition by application name
• Shared dashboard
  ‒ 30+ Dashboards
  ‒ 300+ Charts/Visualizations

Simple Steps - service onboarding a new use case
  ‒ pom.xml
  ‒ web.xml

Data Management & Optimization
[Challenge labels: Performance, Onboarding, Cost, Integration]
• Backup & Restore
  ‒ Snapshot lifecycle management (Swift as the repository)
• Time series data
  ‒ Benefits of using time-based indices
    ‒ Deleting an index is faster than delete by query
    ‒ Use hot-warm architecture
    ‒ Close indices or force-merge read-only indices
  ‒ Time series data
    ‒ Terapeak vs. UFES (different needs)
• Lifecycle management
  ‒ Central policy management / Web UI / OOTB policies

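The point about time-based indices can be illustrated with a retention check: once data is split into one index per day, expiring old data is just a matter of selecting whole indices to delete. This is a sketch under assumptions, not the Pronto lifecycle tool: the `<prefix>YYYY.MM.DD` naming scheme and the function name are hypothetical.

```python
from datetime import date, timedelta


def expired_indices(index_names, prefix, today, retention_days):
    """Return daily indices named '<prefix>YYYY.MM.DD' older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            year, month, day = (int(part) for part in name[len(prefix):].split("."))
            index_day = date(year, month, day)
        except ValueError:
            continue  # not a daily index in the assumed naming scheme
        if index_day < cutoff:
            expired.append(name)
    return sorted(expired)
```

Each returned name can then be dropped with a single delete-index call, which removes whole shards instead of marking individual documents as deleted the way delete by query does.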
Index Management Tool vs. Curator vs. ILM

Function                Curator   Pronto Index Mgmt. Tool   Elastic ILM
High Availability       N/A       YES                       YES
Web UI                  N/A       YES                       YES
Version Compatibility   N/A       2.x/5.x/6.x/7.x           6.8+
Multi-Clusters          N/A       YES                       N/A

Diagnostic Tool
[Challenge labels: Performance, Cost]
• Features
  ‒ Find improper settings or usage
  ‒ Job scheduler & diagnostic report for potential issues
• Rules
  ‒ Too many indices / Too many shards / Index has too many fields
  ‒ Shard size check (20GB to 40GB)
  ‒ Imbalanced shards
  ‒ Replica number should be greater than 0
  ‒ Node missing / Rack ID attribute missing / Minimum master
  ‒ Machine check error / Server disk full
  ‒ Alias & index template checking

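A few of the rules above can be sketched as simple checks over per-index statistics. This is an illustration only, not the Pronto diagnostic tool: the function name, its inputs, and the 1000-field threshold are assumptions; only the 20GB to 40GB shard size range and the replica rule come from the slide.

```python
def diagnose_index(primary_shards, replicas, primary_store_gb, field_count,
                   min_shard_gb=20, max_shard_gb=40, max_fields=1000):
    """Return a list of findings for one index (hypothetical rule set)."""
    if primary_shards < 1:
        raise ValueError("an index has at least one primary shard")
    findings = []
    # Rule: replica number should be greater than 0.
    if replicas < 1:
        findings.append("replica number should be greater than 0")
    # Rule: shard size check against the 20GB to 40GB range.
    avg_shard_gb = primary_store_gb / primary_shards
    if avg_shard_gb > max_shard_gb:
        findings.append("shards larger than %dGB" % max_shard_gb)
    elif avg_shard_gb < min_shard_gb and primary_shards > 1:
        findings.append("too many shards for the data size")
    # Rule: index has too many fields (threshold is an assumed value).
    if field_count > max_fields:
        findings.append("index has too many fields")
    return findings
```

A scheduler could run such checks across all clusters and collect the findings into the diagnostic report; rules such as node missing or disk full would need node-level inputs that this sketch leaves out.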
Performance & User Scenarios
• Many Factors:
  ‒ Index / Shard
  ‒ Query / Scripting
  ‒ Mapping / Setting

Behavior       Use Cases
Index heavy    Logging / Metrics / Security / APM
Search heavy   App Search / Site Search / Analytics
Update heavy   Caching / Systems of Record

Performance Issues & Optimization
• Wildcard search
  ‒ Customers use patterns beginning with * and ?
  ‒ Avoid using * or ?
• Stopwords & Shard Size
  ‒ Reindex with the stop words
  ‒ Use more shards to improve the throughput
• Too many indices / shards / fields
  ‒ Close or delete the unused indices
  ‒ Improve the document modeling
  ‒ Disable the dynamic mapping
• Performance Optimization
  ‒ Disable swapping & give memory to the file system cache
  ‒ Unset or increase the refresh interval
  ‒ Disable refresh and replicas for initial loads
  ‒ Use auto-generated IDs
  ‒ Disable the features you do not need
  ‒ Don't use default dynamic string mapping
  ‒ Watch your shard size / shrink index
  ‒ Force merge
  ‒ Pre-index data
  ‒ Avoid scripts
  ‒ Force-merge read-only indices
  ‒ Warm up global ordinals
  ‒ Replicas might help with throughput, but not always

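Two of the items above lend themselves to a short sketch: disabling refresh and replicas for an initial load, and spotting search patterns that begin with a wildcard. The helper names are hypothetical. The dictionaries use the standard `refresh_interval` and `number_of_replicas` index settings and are meant as request bodies for the update index settings API; no client call is shown.

```python
def bulk_load_settings():
    """Settings body to apply before an initial load: no refresh, no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval, replicas):
    """Settings body to apply after the load; pass the values the index had before."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def has_leading_wildcard(pattern):
    """True if a wildcard pattern starts with * or ?, which is the costly case."""
    return pattern[:1] in ("*", "?")
```

A load job would send `bulk_load_settings()` to the index, run the bulk requests, then send `restore_settings(...)` with the previous values. A query gateway or diagnostic job could use `has_leading_wildcard` to flag patterns such as `*error` while leaving `error*` alone.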
Performance Testing Tool
[Challenge labels: Performance]
• Performance testing
  ‒ Testing data
  ‒ Testing scripts
  ‒ Test report for analysis
• Web-based tool
  ‒ Built on Gatling
  ‒ Web UI to select the testing scripts and testing data
  ‒ Test report for analysis

Solution and security plugin for Elasticsearch
[Challenge labels: Cost]
• Pronto Security Plugin
  ‒ TLS for encrypted communications
  ‒ Cluster / Index level RBAC control
  ‒ Follow eBay's standard
    ‒ API Key for Application
    ‒ 2FA for user login
    ‒ Audit logs
• Security Consideration
  ‒ Authentication / RBAC
  ‒ Certificate retention
  ‒ Firewall / IP whitelist
  ‒ Vulnerability management

X-Pack Subscription
[Challenge labels: Cost]
• License cost
  ‒ License fee is based on the node count
• How to Extend
  ‒ Develop the Kibana Application
  ‒ Integrate with the alerting and anomaly detection service