The Return of OpenStack Telemetry and the 10,000 Instances (and on to 20,000)
Telemetry Project Update
Alex Krzos and Julien Danjou
8 November 2017
Introductions
Alex Krzos - Senior Performance Engineer @ Red Hat - akrzos@redhat.com - IRC: akrzos
Julien Danjou - Principal Software Engineer @ Red Hat - jdanjou@redhat.com - IRC: jd_
Let's talk about Telemetry and Scaling...
● Why scale test?
● Telemetry Architecture
● Gnocchi Architecture
● The Road to 10,000 Instances
● Scale and Performance Test Results
● Conclusion
Why Scale Test?
● Determine capacity and limits
● Develop good defaults and recommendations
● Characterize resource utilization
Telemetry must scale, as the number of metrics collected will only increase.
Telemetry Architecture
Gnocchi Architecture
The Road to 10,000 Instances
● Ocata struggled to reach 5,000 instances, even with heavily tuned parameters and a reduced workload.
● Goal: achieve 10,000 instances with less tuning than Ocata and a more demanding workload.
● Extra credit: go beyond 10,000 with the same hardware.
Workloads
● Boot Persisting Instances with Network: 500 at a time, then quiesce
● Boot Persisting Instances: 1,000 at a time, then quiesce
● Measure Gnocchi API Responsiveness:
  ● Metric Create/Delete
  ● Resource Create/Delete
  ● Get Measures
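To make the API-responsiveness part of the workload concrete, here is a minimal sketch of timing those three Gnocchi operations with python-gnocchiclient. This is not the exact harness used for the results in this deck; the Keystone endpoint, credentials, archive policy name ("low"), and the metric UUID are placeholder assumptions, and the exact client call signatures may vary by gnocchiclient release.

```python
# Hedged sketch: time the three Gnocchi API operations used in the workload.
# Endpoint, credentials, archive policy and METRIC-UUID are placeholders.
import time
import uuid

from gnocchiclient.v1 import client as gnocchi_client
from keystoneauth1 import session
from keystoneauth1.identity import v3

auth = v3.Password(auth_url="http://keystone.example.com:5000/v3",
                   username="admin", password="secret", project_name="admin",
                   user_domain_id="default", project_domain_id="default")
gnocchi = gnocchi_client.Client(session=session.Session(auth=auth))


def timed(label, func, *args, **kwargs):
    """Run one API call and print how long the round-trip took."""
    start = time.time()
    result = func(*args, **kwargs)
    print("%s: %.3fs" % (label, time.time() - start))
    return result


# Metric create/delete
metric = timed("metric create", gnocchi.metric.create,
               {"name": "bench.metric", "archive_policy_name": "low"})
timed("metric delete", gnocchi.metric.delete, metric["id"])

# Resource create/delete
resource_id = str(uuid.uuid4())
timed("resource create", gnocchi.resource.create, "generic", {"id": resource_id})
timed("resource delete", gnocchi.resource.delete, resource_id)

# Get measures for an existing metric (placeholder UUID)
timed("get measures", gnocchi.metric.get_measures, "METRIC-UUID")
```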
Hardware
3 Controllers
● 2 x E5-2683 v3 - 28 Cores / 56 Threads
● 128GiB Memory
● 2 x 1TB 7.2K SATA in RAID 1
12 Ceph Storage Nodes
● 2 x E5-2650 v3 - 20 Cores / 40 Threads
● 128GiB Memory
● 18 x 500GB 7.2K SAS (2 in RAID 1 for OS, 16 OSDs), 1 NVMe Journal
59 Compute Nodes
● 2 x E5-2620 v2 - 12 Cores / 24 Threads
● 128GiB / 64GiB Memory
● 2 x 1TB 7.2K SATA in RAID 1
Network Topology
10,000 Instances with NICs Test
Workload (20 iterations)
● 500 instances with attached network booted every 30 minutes
Gnocchi Settings
● metricd workers per Controller = 18
● api workers per Controller = 24
Ceilometer Settings
● notification_workers = 3
● rabbit_qos_prefetch_count = 128
● 300s polling interval
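A hedged illustration of where these settings land on disk, using only the Python standard library. The file paths and section names are assumptions for a Pike-era deployment (in TripleO they would normally be applied through Heat/puppet parameters rather than edited directly), and the 300s polling interval lives in Ceilometer's polling.yaml rather than ceilometer.conf.

```python
# Illustrative only: merge the 10k-test settings into gnocchi.conf / ceilometer.conf.
# Section and option names are assumptions; the Gnocchi API worker count (24) is
# normally the number of httpd/uwsgi processes rather than a gnocchi.conf option,
# so it is omitted here.
import configparser


def set_options(path, options):
    """Merge a {section: {option: value}} mapping into an INI file."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    for section, values in options.items():
        if section != "DEFAULT" and not cfg.has_section(section):
            cfg.add_section(section)
        for key, value in values.items():
            cfg.set(section, key, str(value))
    with open(path, "w") as fp:
        cfg.write(fp)


set_options("/etc/gnocchi/gnocchi.conf", {
    "metricd": {"workers": 18},       # metricd workers per controller
})
set_options("/etc/ceilometer/ceilometer.conf", {
    "notification": {"workers": 3},   # shown as notification_workers on the slide
    "oslo_messaging_rabbit": {"rabbit_qos_prefetch_count": 128},
})
```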
Pike Results - 10k Test: Gnocchi Backlog
Pike Results - 10k Test: CPU on Controllers
Pike Results - 10k Test: Memory on All Hosts
Pike Results - 10k Test: Disks on Controllers
Pike Results - 10k Test: Disks on CephStorage
20,000 Instances Test
Workload (20 iterations)
● 1000 instances booted
● 5000 get measures
● 1000 metric and resource creates/deletes
Gnocchi
● metricd workers per Controller = 36
● api processes per Controller = 24
Ceilometer
● notification_workers = 5
● rabbit_qos_prefetch_count = 128
● 300s polling interval
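The following slides track the Gnocchi backlog (measures waiting to be aggregated) during this test. One simple way to sample that number yourself is the Gnocchi status endpoint; the sketch below reuses the same placeholder credentials as the earlier client example and assumes the Gnocchi 4.x layout of the status response.

```python
# Hedged sketch: sample the Gnocchi processing backlog once a minute via /v1/status.
# Credentials are placeholders; the response layout shown is the Gnocchi 4.x one and
# may differ in other releases.
import time

from gnocchiclient.v1 import client as gnocchi_client
from keystoneauth1 import session
from keystoneauth1.identity import v3

auth = v3.Password(auth_url="http://keystone.example.com:5000/v3",
                   username="admin", password="secret", project_name="admin",
                   user_domain_id="default", project_domain_id="default")
gnocchi = gnocchi_client.Client(session=session.Session(auth=auth))

while True:
    status = gnocchi.status.get()
    # Typically: {"storage": {"summary": {"measures": <backlog>, "metrics": <count>}}}
    summary = status.get("storage", {}).get("summary", {})
    print("backlog: %s measures across %s metrics"
          % (summary.get("measures"), summary.get("metrics")))
    time.sleep(60)
```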
Ocata Results
Ocata Results - Not in Pike
Pike Results - 20k Test: Gnocchi Backlog
Pike Results - 20k Test: CPU on Controllers
Pike Results - 20k Test: Memory on All Hosts
Pike Results - 20k Test: Disks on Controllers
Pike Results - 20k Test: Disks on CephStorage
Pike Results - 20k Test: Network Controllers Em1
Pike Results - 20k Test: Network Controllers Em2
API Get Measures - 20k Test
API Create/Delete Metrics - 20k Test
API Create/Delete Resources - 20k Test
Tuning - Gnocchi
Some differences between versions (Newton, Ocata, Pike):
Pike (Gnocchi v4)
● metricd/api workers
● Incoming storage driver (Redis is currently preferred)
Ocata / Newton (Gnocchi 3.1 / 3.0)
● metricd/api workers
● tasks_per_worker / metric_processing_delay
● Check scheduler (use the latest version of Gnocchi)
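For the Pike row above, the incoming-driver switch is the setting that matters most. Below is a hedged sketch of the relevant gnocchi.conf options; the Redis URL and worker count are placeholders, and the section/option names ([incoming] driver / redis_url, [metricd] workers) reflect Gnocchi 4 as we understand it.

```python
# Hedged sketch: point Gnocchi 4's incoming (pre-aggregation) storage at Redis and
# size metricd, using stdlib configparser. Values and the Redis URL are placeholders.
import configparser

CONF = "/etc/gnocchi/gnocchi.conf"
cfg = configparser.ConfigParser()
cfg.read(CONF)

for section in ("incoming", "metricd"):
    if not cfg.has_section(section):
        cfg.add_section(section)

cfg.set("incoming", "driver", "redis")                       # Redis incoming driver
cfg.set("incoming", "redis_url", "redis://controller:6379")  # placeholder endpoint
cfg.set("metricd", "workers", "36")                          # 36 per controller (20k test)

with open(CONF, "w") as fp:
    cfg.write(fp)
```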
Tuning - Ceilometer
Always avoid overwhelming the Gnocchi backlog (collect what you need/use)
● Check rabbit_qos_prefetch_count - monitor RabbitMQ too
● Pike: agent-notification workers
● Ocata: publish directly to Gnocchi (disable the collector)
● Newton: collector workers
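"Collect what you need/use" mostly means trimming polling down to the meters you actually consume and backing off the polling interval. Below is a hedged sketch that generates a trimmed polling.yaml; the meter list is purely an example and the file path assumes a Pike-era Ceilometer.

```python
# Hedged sketch: write a trimmed polling.yaml so Ceilometer only polls the meters
# actually consumed, at a 300s interval. The meter list here is just an example.
import yaml  # PyYAML

polling = {
    "sources": [{
        "name": "reduced_pollsters",
        "interval": 300,                 # 300s polling interval from the test setup
        "meters": [
            "cpu",
            "memory.usage",
            "disk.read.bytes",
            "disk.write.bytes",
        ],
    }]
}

with open("/etc/ceilometer/polling.yaml", "w") as fp:
    yaml.safe_dump(polling, fp, default_flow_style=False)
```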
Conclusion
● OpenStack Telemetry is now proven to the 10,000 instance mark and beyond in Pike.
● Minimal degradation in API response times as more and more metrics are collected.
Of course, there is still room for improvement:
● Reduce the load on the archival storage
● Spikes in API timings (frontend API vs backend API)
● Performance testing with other storage drivers (Swift, File)
THANK YOU
plus.google.com/+RedHat
facebook.com/redhatinc
linkedin.com/company/red-hat
twitter.com/RedHatNews
youtube.com/user/RedHatVideos