Automation + Machine Learning = Hands Free NFV: A Word On Automation

01.11.2017 Automation + Machine Learning = Hands Free NFV: A Word On "Automation through ML for OpenStack NFV". Prakash Ramchandran, Michael Tien, Jayanthi A Gokhale.


  1. 01.11.2017 Automation + Machine Learning = Hands Free NFV: A Word On “Automation through ML for OpenStack NFV”. Prakash Ramchandran, Michael Tien, Jayanthi A Gokhale

  2. Agenda
     • NFV Automation Challenges? Automation of NFV Service & NFV Infrastructure
     • What’s new in Standards? ZSM, the new NFV Zero Touch; evolving standards
     • What can ML bring to automation?
     • Practical Viewpoint from Dell Labs? Redfish, Swordfish
     • What’s next?
     The industry is moving towards E2E orchestration, and from manual to ML-driven intelligent automation management.

  3. NFV Automation Challenges

  4. NFV Adoption Challenge
     • Lack of end-to-end automation
     • Lack of interoperability (NFVI/VNF and VNF/VNF)
     • Unpredictable datacenter planning
     • Lack of service awareness
     • APIs that are not easy to consume
     • Limited service-to-service programmability
     • Not zero-touch
     • Requires various resources to maintain (IT, DevOps, Operations)
     • Requires multiple POCs

  5. Today’s Datacenter Challenge. Automation: the key to unlocking future efficiencies

  6. What’s new in Standards

  7. ETSI ISG ZSM

  8. Zero Touch Networking Service and Management
     Zero-touch NFV provides true next-generation NFVaaS or VNFaaS:
     • M2M communications, provisioning, and management
     • Dynamic service chain data mapping
     • Dynamic policy enhancement and enforcement
     • Continuous data collector and analytics
     • Auto reactive + proactive self-healing
     • Real-time datacenter capacity scheduler
     • Autonomous end-to-end orchestration lifecycle management
     • Intelligent service-state awareness optimization
     • Smart API

  9. What is Machine Learning?
     Machine learning is a way for infrastructure or platforms to learn progressively from input data, validate models of system behaviour, and use them to attain desirable outcomes (e.g. addressing FCAPS concerns, in telco terms: fault, configuration, accounting, performance, security).
     Why does automation need ML?
     In our case the goal is FCAPS-based anomaly detection across systems, networking, and network functions. This can be done with supervised, unsupervised, or dynamic learning. The basic requirement is a closed-loop control mechanism.
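As a minimal sketch of the closed-loop idea on slide 9 (not the presenters' implementation), the Python below flags anomalies in a metric stream with a rolling z-score, which is one simple unsupervised approach, and calls a remediation hook for each one. The window size, threshold, sample values, and the `remediate` callback are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Callable, Deque, Iterable, List


def closed_loop(samples: Iterable[float],
                remediate: Callable[[int, float], None],
                window: int = 5,
                threshold: float = 3.0) -> List[int]:
    """Monitor -> analyze -> act: return indices of samples flagged as anomalous."""
    history: Deque[float] = deque(maxlen=window)
    flagged: List[int] = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            # Flag the sample when it deviates strongly from the recent baseline.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append(i)
                remediate(i, value)  # act: hypothetical self-healing hook
                continue             # keep the outlier out of the baseline
        history.append(value)
    return flagged


# Hypothetical CPU-utilisation samples (percent) with one spike at index 6.
cpu = [40.0, 42.0, 41.0, 39.0, 43.0, 41.0, 95.0, 40.0, 42.0]
actions = []
anomalies = closed_loop(cpu, lambda i, v: actions.append((i, v)))
print(anomalies)  # [6]
```

In a real deployment the hook would trigger a policy action (restart, scale-out, migration) and the detector would likely be a trained model rather than a fixed threshold.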

  10. What’s new in Standards / Open Source for NFV Stack: self-healing within layers (local policy) and ML for cross-layer (global policy)
     [Diagram of the NFV stack. Labels: Connected Vehicle; TR 188 004; Service / OSS/BSS; Open Policy Agent; Application; Network Slicing; ONAP, SDNC; NS; NFVO/SDNO; Container workload; EMS, VNF SDK; VNF; VNFM; Kata, CNI, NVMe; Containerized OS; VM, VN, VS; VIM]

  11. What can ML Bring to Automation (Jayanthi Gokhale)

  12. Traditional Manual Deploy Cycle
     The changing landscape of infrastructure:
     • Bare metal
     • Hypervisor
     • VM: booting, secure booting, booting from volume
     • Container: CRI & CNI
     • Lightweight VM: Kata (Intel)
     • VM in container: unnamed yet (Red Hat)

  13. Why does NFV automation need innovation?
     • NFV and SDN integrated clouds are growing from centralized into geographically scattered, massively distributed clouds.
     • Orchestration, management, and maintenance have therefore become more challenging and require more attention to distributed, hybrid clouds; the need of the hour is accelerated service velocity.
     • Automation is a prime solution for provisioning and maintaining the complete environment.
     • With a mix of intelligent infrastructure and machine learning it is possible to target dynamic cloud management.
     • We focus here on service and infrastructure management automation.
     • We share our experience dealing with compute (Redfish), storage (Swordfish), and networking (SDN-WAN), how we add closed loops, and the benefit derived from data collection and analysis with ML.
     • This leads to Hands Free or Zero Touch NFV.

  14. Some Statistics
     • 80% of outages impacting mission-critical applications are caused by people and process issues
     • 50% of these are caused by change, configuration, hand-off, release integration, re-deployable application services, etc.
     • Although the number of downtime hours has fallen, the cost of downtime is now 50X
     • Automate the deployment process, intelligently

  15. The Learning Input Points

  16. Intelligent Infrastructure Deployment
     • Identify smart ways to create, manage, and orchestrate federation environments.
     • ML can be used to train AI systems to recognize demand and deployment patterns in the context of various Service Level Objective metrics, called dimensions, such as:
        • Number of VM instances
        • Network demand
        • Migration metrics
        • Latency measures
        • SLA parameters of throughput
        • Number and type of SLA violations
        • Cluster sizes
        • AZs
     • ML can be used to devise optimized containers, container sizing, and planning of microservices.
     • Results in true agile infrastructure provisioning.

  17. Intelligent Infrastructure
     • Service providers can easily and efficiently accommodate the demands of mixed workloads from a single platform.
     • Leveraging QoS capabilities, policies can be provisioned and enforced to isolate each workload while the workloads run simultaneously within a shared infrastructure.
     • ML needs vast amounts of real-time performance data, generated by a QoS monitor and network telemetry, to recognize developing performance issues early, before they negatively impact human experience. The ML output provides the information needed to fine-tune or redeploy the infrastructure to optimize the QoS metrics.

  18. Automated Deployment Process
     • Static deployment
        • Templated. Flavours can be selected based on requirements.
     • Dynamic deployment
        • Dynamically determine the deployment context and deployment parameters, then define the deployment plan. Once defined, it remains static.
     • Smart / intelligent deployment
        • ML- and AI-driven deployment to optimise the Service Level Objectives. The deployment plan is predicted, evaluated, customized, and optimized.
     • A TOSCA document describes the services and applications to be deployed on the cloud; it serves as the deployment description.
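To make the static and dynamic modes on slide 18 concrete, here is a small Python sketch. The flavour catalogue, the `select_flavor` and `plan_from_context` helpers, and the sizing rule are all hypothetical; they only illustrate the difference between picking from a template and deriving parameters from context.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flavor:
    name: str
    vcpus: int
    ram_gb: int


# Hypothetical flavour catalogue, ordered from smallest to largest.
CATALOGUE: List[Flavor] = [
    Flavor("small", 2, 4),
    Flavor("medium", 4, 8),
    Flavor("large", 8, 16),
]


def select_flavor(vcpus: int, ram_gb: int) -> Optional[Flavor]:
    """Static step: pick the smallest flavour that meets the requirements."""
    for flavor in CATALOGUE:
        if flavor.vcpus >= vcpus and flavor.ram_gb >= ram_gb:
            return flavor
    return None  # nothing in the catalogue is big enough


def plan_from_context(expected_sessions: int) -> Optional[Flavor]:
    """Dynamic step: derive requirements from the deployment context.

    The sizing rule (1 vCPU and 2 GB RAM per 1000 sessions) is made up.
    """
    units = max(1, -(-expected_sessions // 1000))  # ceiling division
    return select_flavor(vcpus=units, ram_gb=2 * units)


print(select_flavor(3, 6).name)       # medium
print(plan_from_context(5500).name)   # large
```

A smart deployment, in the slide's terms, would replace the fixed sizing rule with a prediction from a trained model and re-evaluate the plan as metrics arrive.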

  19. Advantages
     • Eliminate manual intervention from the deployment process (application and infrastructure)
     • Reduce complexity: major and minor driving factors can now be considered when strategising the deployment plan
     • Global and local optimization is possible

  20. Automating the process

  21. TOSCA
     • Topology and Orchestration Specification for Cloud Applications
     • Standardised language that:
        • Details the application and infrastructure in a portable manner
        • Defines the structure and composition of applications and their infrastructure
        • Defines the relationships
        • Specifies state and behaviour (deploy, shutdown, restart, etc.)
        • Relates this to the cloud infrastructure management policies (and associated SLAs)
     • Model that specifies applications and virtual and physical infrastructure.
     • Stores the information in a ‘service template’ in YAML, which is processed at deploy time to perform virtual and physical deployment.
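The sketch below illustrates how the relationships in a service template can drive deployment. It is an assumption-laden example, not the presenters' tooling: the template is written as a Python dict that mirrors TOSCA Simple Profile YAML, the node names are hypothetical, and only the short `requirement: target` form of requirements is handled.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# A service template as a Python dict mirroring TOSCA YAML.
# Node names are hypothetical; the node types are TOSCA normative types.
service_template = {
    "tosca_definitions_version": "tosca_simple_yaml_1_0",
    "topology_template": {
        "node_templates": {
            "app_server": {
                "type": "tosca.nodes.Compute",
                "capabilities": {
                    "host": {"properties": {"num_cpus": 2, "mem_size": "4 GB"}}
                },
            },
            "web_server": {
                "type": "tosca.nodes.WebServer",
                "requirements": [{"host": "app_server"}],
            },
            "web_app": {
                "type": "tosca.nodes.WebApplication",
                "requirements": [{"host": "web_server"}],
            },
        }
    },
}


def deployment_order(template: dict) -> list:
    """Order nodes so each one is deployed after the nodes it requires."""
    nodes = template["topology_template"]["node_templates"]
    graph = {}
    for name, node in nodes.items():
        # Each requirement is a one-entry mapping: {requirement_name: target_node}.
        targets = {t for req in node.get("requirements", []) for t in req.values()}
        graph[name] = targets
    return list(TopologicalSorter(graph).static_order())


print(deployment_order(service_template))
# ['app_server', 'web_server', 'web_app']
```

A real orchestrator would parse the YAML with a TOSCA parser and also handle the extended requirement syntax, relationship types, and lifecycle operations.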

  22. Application Topology
     • Defined at 3 levels:
        • Infrastructure (cloud and DC objects)
        • Platform / middleware (app containers)
        • Application modules and their configuration

  23. Service Orchestration
     • Should address:
        • Cloud infrastructure orchestration
        • Container orchestration
        • Network orchestration
        • Application orchestration (including legacy applications)

  24. TOSCA supported ML Models
     [Diagram of a closed loop. Labels: Template; Conductor; Build & Update Model; Candidate models; Select; Modify Params; ML (Meteos); Revise & Redeploy; Gather Metrics; Ceilometer; Logs]

  25. Training System
     • Pruned decision tree
     • Neural network
     • Hyperparameter optimization using cross-validation (random forests)
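Hyperparameter optimization with cross-validation, as listed on slide 25, can be sketched without any ML library. Note the substitution: to stay dependency-free this example tunes `k` for a tiny k-nearest-neighbour classifier rather than a random forest, and the latency samples (including one deliberately noisy label) are made up.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, class label)


def knn_predict(train: Sequence[Sample], x: float, k: int) -> int:
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data: Sequence[Sample], k: int, folds: int = 4) -> float:
    """Held-out accuracy over interleaved folds (sample i goes to fold i % folds)."""
    correct = 0
    for f in range(folds):
        test = [s for i, s in enumerate(data) if i % folds == f]
        train = [s for i, s in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def best_k(data: Sequence[Sample], candidates: List[int]) -> int:
    """Pick the candidate with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cv_accuracy(data, k))


# Hypothetical latency samples (ms): label 0 = normal, 1 = SLA violation.
# The point (13.4, 1) is a deliberately noisy label.
data: List[Sample] = (
    [(10.0, 0), (11.0, 0), (12.0, 0), (13.4, 1),
     (14.0, 0), (15.0, 0), (16.0, 0), (17.0, 0)]
    + [(float(x), 1) for x in range(50, 58)]
)

print(cv_accuracy(data, 1))        # 0.875
print(cv_accuracy(data, 3))        # 0.9375
print(best_k(data, [1, 3, 5]))     # 3
```

With `k = 1` the noisy point misleads its neighbour, so cross-validation prefers a larger `k`; the same selection loop applies to tree depth or forest size.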

  26. Metrics, a few examples
     • Number of instances
     • Instance size
     • Demand of load
     • Inter-arrival request time
     • Delay time / latency to service a request
     • Workload latency
     • Throughput time for service
     • Telemetry data
     • Network demand
     • Number of SLA violations
     • Number of containers
     • Cost of number of replication sets

  27. Technology Stack
     • Apache Kafka
     • WEKA
     • Scala
     • Python & Java languages
     • Docker
     • Kubernetes
     • Kata

  28. Practical Viewpoint from Dell Labs (Michael Tien)

  29. Redfish: the next-generation systems management standard for an evolving IT environment
     • The DMTF Scalable Platform Management Forum has created an open industry standard specification and schema for simple, modern, and secure management of scalable platform hardware
     • A secure, multi-node, RESTful management interface built upon HTTPS, in JSON format, based upon OData v4
     • Schema-based but human-readable; usable by client applications and browser-based GUIs
     • Covers key use cases and customer requirements

  30. What can Redfish do today?
     Provides a common interface across platforms and vendors, supporting:
     ▪ Reset, reboot, and power control of servers
     ▪ Inventory of server hardware and firmware versions
     ▪ Monitoring of server health status
     ▪ Access to system logs
     ▪ Alerts on server health status changes
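Two of the capabilities on slide 30, power control and health monitoring, can be sketched against the standard Redfish resource paths. This is an illustrative sketch only: the controller address and system id are hypothetical, the sample payload is trimmed and made up, authentication is omitted, and the request is built but never sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://bmc.example.com"  # hypothetical management controller address


def reset_request(system_id: str, reset_type: str = "ForceRestart") -> Request:
    """Build (but do not send) a Redfish ComputerSystem.Reset action request."""
    url = f"{BASE_URL}/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset"
    body = json.dumps({"ResetType": reset_type}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def summarize_system(payload: dict) -> dict:
    """Pull power state and health out of a ComputerSystem resource."""
    return {
        "power": payload.get("PowerState"),
        "health": payload.get("Status", {}).get("Health"),
    }


# Trimmed, made-up example of a response to GET /redfish/v1/Systems/<id>.
sample = json.loads(
    '{"Id": "1", "PowerState": "On", "Status": {"State": "Enabled", "Health": "OK"}}'
)

req = reset_request("1")
print(req.get_method(), req.full_url)
print(summarize_system(sample))  # {'power': 'On', 'health': 'OK'}
```

Sending the request would additionally need credentials or a session token and TLS verification settings appropriate to the controller.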

  31. Delivering the benefits of Redfish - 14G iDRAC9 with Lifecycle Controller
