01.11.2017 · Automation + Machine Learning = Hands-Free NFV: A Word on “Automation through ML for OpenStack NFV” · Prakash Ramchandran, Michael Tien, Jayanthi A Gokhale
Agenda • NFV Automation Challenges • What’s New in Standards? ZSM, the new NFV Zero Touch; evolving standards • What Can ML Bring to Automation? Automation of NFV Service & NFV Infrastructure • Practical Viewpoint from Dell Labs: Redfish, Swordfish • What’s Next? The industry is moving towards E2E orchestration and from manual to ML-driven intelligent automation management
NFV Automation Challenges Dell openstack @OpenStack openstack OpenStackFoundation
NFV Adoption Challenges • Lack of end-to-end automation • Lack of interoperability between NFVI/VNF and VNF/VNF • Unpredictable datacenter planning • Lack of service awareness • APIs are not easily consumable • Limited service-to-service programmability • Not zero-touch • Requires various resources to maintain (IT, DevOps, Operations) • Requires multiple POCs
Today’s Datacenter Challenge Automation - the key to unlocking future efficiencies
What’s new in Standards
ETSI ISG ZSM (Industry Specification Group on Zero-touch network and Service Management)
Zero Touch Networking Service and Management: Zero-touch NFV provides true next-generation NFVaaS or VNFaaS • M2M communications, provisioning, and management • Dynamic service chain data mapping • Dynamic policy enhancement and enforcement • Continuous data collector and analytics • Auto reactive + proactive self-healing • Real-time datacenter capacity scheduler • Autonomous end-to-end orchestration lifecycle management • Intelligent service-state awareness optimization • Smart API
What is Machine Learning? Machine learning is a way for infrastructure or platforms to progressively learn from input data, validating models of the system's behavior in order to attain desirable outcomes (e.g., overcoming FCAPS issues in telco terms, where FCAPS stands for Fault, Configuration, Accounting, Performance, Security). Why Does Automation Need ML? In our case, the goal is anomaly detection in systems, networking, and network functions, based on FCAPS. This can be done by supervised, unsupervised, or dynamic learning. The basic requirement for this is a closed-loop control mechanism.
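The closed-loop idea can be sketched as a monitor, analyse, act cycle. This is a minimal illustration, not the authors' implementation: it assumes a single numeric metric stream (e.g. CPU %), uses a rolling z-score as a simple unsupervised detector, and treats `remediate` as a hypothetical hook into whatever self-healing action the orchestrator provides.

```python
from collections import deque
from statistics import mean, pstdev


class ClosedLoopDetector:
    """Rolling z-score anomaly detector whose detections trigger an action.

    Hypothetical sketch: `remediate(value, z)` stands in for a real
    self-healing hook (restart a VNF, migrate a VM, open a ticket, ...).
    """

    def __init__(self, window=20, threshold=3.0, remediate=None):
        self.window = deque(maxlen=window)  # recent "normal" samples (baseline)
        self.threshold = threshold          # |z| above this counts as anomalous
        self.remediate = remediate or (lambda value, z: None)
        self.anomalies = []

    def observe(self, value):
        """Monitor -> analyse -> act. Returns True if `value` was anomalous."""
        if len(self.window) >= 5:  # wait for a minimal baseline first
            mu = mean(self.window)
            sigma = pstdev(self.window) or 1e-9  # avoid division by zero
            z = (value - mu) / sigma
            if abs(z) > self.threshold:
                self.anomalies.append(value)
                self.remediate(value, z)  # close the loop
                return True  # anomalous samples stay out of the baseline
        self.window.append(value)
        return False
```

For example, feeding `[40, 42, 41, 39, 40, 43, 41, 95, 42]` flags only `95` and calls `remediate` once. Keeping anomalous samples out of the baseline is a deliberate choice so that a fault does not quietly become the new normal.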
Self-healing within Layers (Local Policy) & ML for Cross-layer (Global Policy) [Figure: What’s New in Standards / Open Source for the NFV Stack; labels include Connected Vehicle, TR 188 004, Service / OSS/BSS, Open Policy Agent, Application, Network Slicing, ONAP, SDNC, NS, NFVO/SDNO, EMS, VNF SDK, VNF, VNFM, Container workload, Kata, CNI, NVMe, Containerized OS, VM, VN, VS, VIM]
What Can ML Bring to Automation? Jayanthi Gokhale
Traditional Manual Deploy Cycle. The changing landscape of infrastructure: • Bare metal • Hypervisor • VM – booting, secure booting, booting from volume • Container – CRI & CNI • Lightweight VM – Kata (Intel) • VM in container – not yet named (Red Hat)
Why Does NFV Automation Need Innovation? • NFV and SDN integrated clouds are growing from centralized to geographically scattered and massively distributed clouds. • Orchestration, management, and maintenance have therefore become more challenging and require more attention to distributed, hybrid clouds; the need of the hour is to accelerate service velocity. • Automation is a prime solution to provision and maintain the complete environment. • With a mix of intelligent infrastructure and machine learning, it is possible to target dynamic cloud management. • We focus here on service and infrastructure management automation. • We share our experience dealing with compute (Redfish), storage (Swordfish), and networking (SDN-WAN), how we add closed loops, and the benefits derived from data collection and analysis with ML. • This leads to hands-free or zero-touch NFV.
Some Statistics • 80% of outages impacting mission-critical applications are caused by people and process issues • 50% of these are caused by change, configuration, handoff, release integration, redeployable application services, etc. • Though the number of downtime hours has been reduced, the cost of downtime is now 50X higher • Automate the Deployment Process, Intelligently
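Reading the second statistic as a fraction of the first, roughly 40% of all outages affecting mission-critical applications trace back to change, configuration, and related deployment activity, which is exactly the part that automated deployment targets:

```latex
P(\text{outage caused by change/config}) = 0.80 \times 0.50 = 0.40
```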
The Learning Input Points
Intelligent Infrastructure Deployment • Identify smart ways to create, manage and orchestrate federated environments. • ML can be utilized to train AI systems to recognize demand and deployment patterns in the context of various Service Level Objective (SLO) metrics, called dimensions, such as • Number of VM instances • Network demand • Migration metrics • Latency measures • SLA throughput parameters • Number and type of SLA violations • Cluster sizes • Availability zones (AZs) • ML can be used to devise optimized containers, size containers, and plan microservices. • The result is truly agile infrastructure provisioning.
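As a toy illustration of turning a recognized demand pattern into a provisioning decision, the sketch below fits a least-squares trend to recent demand and sizes the instance count for the next period. It is an assumption-laden stand-in for the trained models the slide describes: a linear trend replaces the ML model, and `capacity_per_instance` and the 20% `headroom` are hypothetical parameters.

```python
import math


def linear_forecast(history, steps_ahead=1):
    """Least-squares line through (t, demand), extrapolated steps_ahead periods."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx if sxx else 0.0  # a single sample gives a flat forecast
    return y_mean + slope * (n - 1 + steps_ahead - x_mean)


def plan_instances(history, capacity_per_instance, headroom=1.2, min_instances=1):
    """Instances needed so forecast demand (plus headroom) fits the capacity."""
    forecast = max(0.0, linear_forecast(history))
    return max(min_instances, math.ceil(forecast * headroom / capacity_per_instance))
```

With demand `[100, 120, 140, 160]` requests/s the trend forecasts 180 for the next period; at a hypothetical 50 requests/s per instance and 20% headroom, `plan_instances` asks for 5 instances.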
Intelligent Infrastructure • Service providers can easily and efficiently accommodate the demands of mixed workloads from a single platform. • Leveraging QoS capabilities, policies can be provisioned and enforced to isolate each workload while the workloads run simultaneously on a shared infrastructure. • ML needs vast amounts of real-time performance data generated by a QoS monitor, together with network telemetry data; this enables early recognition of developing performance issues before they negatively impact the user experience. The ML output provides information to fine-tune or redeploy the infrastructure to optimize the QoS metrics.
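"Early recognition of developing performance issues" can be sketched as trend forecasting on QoS telemetry. The example below uses Holt's linear (double exponential) smoothing on latency samples and raises an early warning when latency is still within a hypothetical SLO but is projected to breach it within `horizon` samples. The smoothing constants and SLO value are illustrative assumptions, not values from the deck.

```python
def holt_forecast(samples, alpha=0.5, beta=0.3, horizon=3):
    """Holt's linear smoothing: track level and trend, project `horizon` steps."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a trend")
    level, trend = samples[0], samples[1] - samples[0]
    for y in samples[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


def early_warning(samples, slo_ms, **kwargs):
    """True if latency is still within the SLO but is projected to breach it."""
    return samples[-1] <= slo_ms and holt_forecast(samples, **kwargs) > slo_ms
```

A steadily rising series such as `[10, 12, 14, 16, 18]` ms against a 20 ms SLO triggers the warning while the SLO still holds, leaving time to fine-tune or redeploy; a series that has already breached the SLO is an alarm rather than an early warning, so it returns `False` here.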
Automated Deployment Process • Static deployment • Templated: flavours can be selected based on requirements. • Dynamic deployment • The deployment context and deployment parameters are determined dynamically to define the deployment plan; once defined, the plan remains static. • Smart / intelligent deployment • ML- and AI-driven deployment that optimises the Service Level Objectives: the deployment plan is predicted, evaluated, customized, and optimized. • A TOSCA document serves as the deployment description of the services and applications to be deployed on the cloud.
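The static, templated mode amounts to picking a flavour from a fixed catalogue. The sketch below selects the smallest flavour that satisfies the requested resources; the catalogue is illustrative, modelled on the classic OpenStack default `m1.*` flavours. In the dynamic or smart modes, the requirements passed in would come from a computed or predicted plan instead of a hand-written template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavour:
    name: str
    vcpus: int
    ram_mb: int
    disk_gb: int


# Illustrative catalogue modelled on the classic OpenStack default flavours.
FLAVOURS = [
    Flavour("m1.tiny", 1, 512, 1),
    Flavour("m1.small", 1, 2048, 20),
    Flavour("m1.medium", 2, 4096, 40),
    Flavour("m1.large", 4, 8192, 80),
    Flavour("m1.xlarge", 8, 16384, 160),
]


def select_flavour(vcpus, ram_mb, disk_gb, flavours=FLAVOURS):
    """Static/templated mode: smallest flavour that meets every requirement."""
    fits = [f for f in flavours
            if f.vcpus >= vcpus and f.ram_mb >= ram_mb and f.disk_gb >= disk_gb]
    if not fits:
        raise ValueError("no flavour satisfies the requested resources")
    return min(fits, key=lambda f: (f.vcpus, f.ram_mb, f.disk_gb))
```

For instance, a request for 2 vCPUs, 3000 MB RAM and 30 GB disk resolves to `m1.medium`.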
Advantages • Eliminate manual intervention from the deployment process (application and infrastructure) • Reduce complexity: major and minor driving factors can now be considered when strategising the deployment plan • Global and local optimization become possible
Automating the process
TOSCA • Topology and Orchestration Specification for Cloud Applications • A standardised language to describe the application and infrastructure in a portable manner • Defines the structure and composition of applications and their infrastructure • Defines the relationships between components • Specifies state and behaviour (deploy, shutdown, restart, etc.) • Relates these to the cloud infrastructure management policies (and associated SLAs) • A model that specifies applications and virtual and physical infrastructure • Stores the information in a ‘service template’ in YAML, which is processed at deploy time to perform the virtual and physical deployment
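To show how the relationships in a service template drive deploy-time processing, here is a small TOSCA Simple Profile-style template expressed as a Python dict (the same structure would normally be written in YAML), plus a function that derives a deployment order from the `host` requirements. The node names and property values are hypothetical; the normative node types are from the TOSCA Simple Profile. `graphlib` needs Python 3.9 or later, and a real orchestrator does much more than this ordering step.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical three-level topology: infrastructure, middleware, application.
SERVICE_TEMPLATE = {
    "tosca_definitions_version": "tosca_simple_yaml_1_0",
    "description": "Illustrative web application on a single compute node",
    "topology_template": {
        "node_templates": {
            "server": {
                "type": "tosca.nodes.Compute",
                "capabilities": {"host": {"properties": {
                    "num_cpus": 2, "mem_size": "4 GB", "disk_size": "40 GB"}}},
            },
            "web_server": {"type": "tosca.nodes.WebServer",
                           "requirements": [{"host": "server"}]},
            "web_app": {"type": "tosca.nodes.WebApplication",
                        "requirements": [{"host": "web_server"}]},
        }
    },
}


def deployment_order(template):
    """Order node templates so every requirement target is deployed first."""
    nodes = template["topology_template"]["node_templates"]
    graph = {}
    for name, node in nodes.items():
        deps = set()
        for requirement in node.get("requirements", []):
            for target in requirement.values():
                if isinstance(target, dict):  # extended notation: {"node": ...}
                    target = target["node"]
                if target not in nodes:
                    raise ValueError(f"{name} requires unknown node {target!r}")
                deps.add(target)
        graph[name] = deps
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(json.dumps(SERVICE_TEMPLATE, indent=2))  # JSON is also valid YAML
    print(deployment_order(SERVICE_TEMPLATE))
```

Running it prints the template and then `['server', 'web_server', 'web_app']`: the compute node first, then the middleware it hosts, then the application.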
Application Topology • Defined at 3 levels • Infrastructure (cloud and DC objects) • Platform / Middleware (App Containers) • Application modules and their configuration
Service Orchestration • Should address • Cloud Infra Orchestration • Container Orchestration • Network Orchestration • Application Orchestration (including Legacy Applications)
TOSCA-supported ML Models [Figure: closed-loop cycle; labels include Gather Metrics (Ceilometer, Logs), Build & Update Candidate Models, Model Params, ML (Meteos), Select, Modify Template, Conductor, Revise & Re-deploy]
Training System • Pruned Decision Tree • Neural Network • Hyperparameter optimization using cross-validation (Random Forests)
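The hyperparameter-optimization step can be illustrated with plain k-fold cross-validation. To keep the sketch dependency-free, it swaps in a k-nearest-neighbour regressor (predicting latency from load) in place of the decision trees, neural networks and random forests listed on the slide; the cross-validation loop that picks the hyperparameter is the same idea. The candidate values and the interleaved fold split are illustrative assumptions.

```python
from statistics import mean


def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k training points closest in x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)


def cv_error(data, k, folds=5):
    """Mean squared error of k-NN under interleaved k-fold cross-validation."""
    errors = []
    for fold in range(folds):
        held_out = data[fold::folds]
        train = [p for i, p in enumerate(data) if i % folds != fold]
        errors.extend((knn_predict(train, x, k) - y) ** 2 for x, y in held_out)
    return mean(errors)


def select_k(data, candidates=(1, 3, 5, 7), folds=5):
    """Hyperparameter search: keep the k with the lowest cross-validated error."""
    return min(candidates, key=lambda k: cv_error(data, k, folds))
```

Here `data` is a list of `(load, latency)` pairs; every candidate `k` must not exceed the size of a training fold.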
Metrics, a few examples • Number of instances • Instance size • Load demand • Request inter-arrival time • Delay time / latency to service a request • Workload latency • Service throughput • Telemetry data • Network demand • Number of SLA violations • Number of containers • Cost of the number of replication sets
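Several of these metrics can be derived directly from a request log. The sketch below assumes a hypothetical log of `(arrival time, service latency)` records and a hypothetical SLA latency bound; it computes mean inter-arrival time, the matching arrival rate, mean latency, and the number of SLA violations, which are the kind of features a training system would consume.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    arrival_s: float   # arrival timestamp in seconds (hypothetical log field)
    latency_ms: float  # time taken to service the request


def summarize(requests, sla_latency_ms):
    """Derive a few of the listed metrics from a non-empty request log."""
    reqs = sorted(requests, key=lambda r: r.arrival_s)
    gaps = [b.arrival_s - a.arrival_s for a, b in zip(reqs, reqs[1:])]
    span = reqs[-1].arrival_s - reqs[0].arrival_s
    return {
        "requests": len(reqs),
        "mean_inter_arrival_s": mean(gaps) if gaps else 0.0,
        # (n - 1) / span is the reciprocal of the mean inter-arrival time
        "arrival_rate_rps": (len(reqs) - 1) / span if span > 0 else 0.0,
        "mean_latency_ms": mean(r.latency_ms for r in reqs),
        "sla_violations": sum(r.latency_ms > sla_latency_ms for r in reqs),
    }
```

For four requests arriving every 0.5 s with latencies of 10, 20, 30 and 200 ms against a 100 ms bound, this yields a 0.5 s mean inter-arrival time, an arrival rate of 2 requests/s, a mean latency of 65 ms, and one SLA violation.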
Technology Stack • Apache Kafka • WEKA • Scala • Python & Java languages • Docker • Kubernetes • Kata
Practical Viewpoint from Dell Labs Michael Tien
Redfish – the next-generation systems management standard for an evolving IT environment • The DMTF Scalable Platforms Management Forum (SPMF) has created an open industry-standard specification and schema for simple, modern, and secure management of scalable platform hardware • A secure, multi-node, RESTful management interface built on HTTPS, with JSON payloads based on OData v4 • Schema-based but human-readable; usable by client applications and browser-based GUIs • Covers key use cases and customer requirements
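Because Redfish is plain RESTful JSON over HTTPS, a client needs little more than an HTTP library. The sketch below (standard library only) fetches the service root at `/redfish/v1/`, follows its `Systems` link, and lists member URIs from the collection's `@odata.id` entries. The BMC address and credentials are hypothetical placeholders; BMCs such as iDRAC often ship self-signed certificates, which should be handled with a proper CA bundle rather than by disabling verification.

```python
import base64
import json
import urllib.request


def redfish_get(base_url, path, user, password, timeout=10):
    """GET a Redfish resource with HTTP Basic auth and decode its JSON body."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def member_uris(collection):
    """Extract member links (@odata.id) from a Redfish collection resource."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


def list_systems(base_url, user, password):
    """Service root -> Systems collection -> member URIs."""
    root = redfish_get(base_url, "/redfish/v1/", user, password)
    systems_path = root["Systems"]["@odata.id"]
    return member_uris(redfish_get(base_url, systems_path, user, password))
```

Against a hypothetical `https://bmc.example.com`, `list_systems` would typically return a single URI such as `/redfish/v1/Systems/System.Embedded.1` on an iDRAC.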
What Can Redfish Do Today? Provides a common interface across platforms and vendors, supporting ▪ Reset, reboot, and power control of servers ▪ Inventory of server hardware and firmware versions ▪ Monitoring of server health status ▪ Access to system logs ▪ Alerts on server health status changes
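Two of these capabilities, health monitoring and power control, map onto a `ComputerSystem` resource: its `Status`/`PowerState` properties and its `#ComputerSystem.Reset` action. The sketch below summarizes health from an already-fetched resource and builds (but does not send) the reset POST, checking the requested `ResetType` against the allowable values the service advertises. The base URL is a placeholder, and the caller is assumed to add an `Authorization` header before passing the request to `urllib.request.urlopen`.

```python
import json
import urllib.request


def health_summary(system):
    """Pick the health-related fields out of a Redfish ComputerSystem resource."""
    status = system.get("Status", {})
    return {
        "id": system.get("Id"),
        "power": system.get("PowerState"),
        "health": status.get("Health"),
        "state": status.get("State"),
    }


def build_reset_request(base_url, system, reset_type="GracefulRestart"):
    """Build the POST for the system's advertised ComputerSystem.Reset action."""
    action = system["Actions"]["#ComputerSystem.Reset"]
    allowed = action.get("ResetType@Redfish.AllowableValues")
    if allowed is not None and reset_type not in allowed:
        raise ValueError(f"{reset_type!r} not in allowable values {allowed}")
    return urllib.request.Request(
        base_url.rstrip("/") + action["target"],
        data=json.dumps({"ResetType": reset_type}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Reading the action target from the resource, rather than hard-coding the URI, keeps the client portable across vendors, which is the common-interface benefit the slide highlights.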
Delivering the benefits of Redfish - 14G iDRAC9 with Lifecycle Controller
Recommendations
More Recommendations