Service Quality Management for multi-domain network services

Pavle Vuletić, AMRES. eduPERT videoconference, 20 July 2015


  1. Service Quality Management for multi-domain network services. Pavle Vuletić, AMRES. eduPERT videoconference, 20 July 2015

  2. What is Service Quality Management?
     - Resource Performance Management (RPM) provides insight into the performance and behavior of the network and its elements (e.g. the status of an interface, the amount of traffic passing through it, CPU load or similar).
     - Service Quality Management (SQM) correlates network measurement data with service information and so gives the data a service-specific meaning.
     - SQM covers the processes related to SLA verification and assurance. It is used to check and verify key SLA parameters and the customer's experience.
     - SQM is increasingly important with the development of virtualized environments, where multiple service instances share the same physical infrastructure.
     Connect | Communicate | Collaborate

  3. SQM supporting tool: main goals
     - Support multi-domain, multi-instance (and multi-point) network services. MDVPN is the primary target, but multi-domain circuits and other multi-domain services should also be able to use the system.
     - Aim: monitor the service end-to-end, capture the user's experience of the service and verify contractual obligations (SLA). This is Service Quality Management (SQM).
     - Tool users: service operators, who continuously monitor the KPIs of the service, and knowledgeable service users, who gain insight into SLA verification.
     - Allow dynamic service paths: do not depend on the service path or on access rights to network elements along the path.
     - Scalability: monitoring m instances of a service end-to-end, each with n end points spread over multiple domains, requires m·n measurement agents in a straightforward solution. Reduce this number!
     - Simple to use (simple configuration).
     - Accuracy and reliability.
     - Always prefer reusing or integrating existing tools over developing new components.
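
The scalability goal can be illustrated with a small count. This is only a sketch: the instance and site names are invented, and it assumes that a single agent placed at a site can serve every service instance present there, which is what the multi-homing design decision later in the talk aims for.

```python
# Hypothetical service instances and the sites where each has an end point.
instances = {
    "vpn-a": {"site1", "site2", "site3"},
    "vpn-b": {"site2", "site3", "site4"},
}

def agents_per_endpoint(instances):
    """One dedicated agent per end point of every instance (the m*n case)."""
    return sum(len(sites) for sites in instances.values())

def agents_per_site(instances):
    """One shared agent per site, serving every instance present there."""
    return len(set().union(*instances.values()))

print(agents_per_endpoint(instances))  # 6
print(agents_per_site(instances))      # 4
```

The saving grows with the overlap between instances; with no shared sites the two counts are equal.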

  4. MDVPN, an example of a multi-domain, multi-instance, multi-point service

  5. What is specified by the SLA?
     - The service is offered to the end user, so the SLA should capture the user's demands and expected service experience.
     - ITU-T Y.1540 and Y.1541 define the specific performance metrics and KPIs in detail.
     - The SLA parameters are packet latency, latency variation (jitter) and packet loss rate (PLR).
     - Different services and applications need guarantees on different performance metrics to satisfy the user's perception of the service. Real-time applications demand guarantees for all three metrics, whereas applications like file transfer and web browsing only need guarantees for PLR.
     - Bandwidth measurements (capacity, available bandwidth, TCP throughput and similar) are not part of the SLA. They are typically used before the service is put into production.
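
A minimal sketch of how the three SLA parameters might be checked against thresholds. The class, the function and the threshold values are hypothetical illustrations, not taken from the talk or from Y.1541. A threshold left as `None` stands for a metric that the SLA does not guarantee, as in the file transfer case above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SlaThresholds:
    # None means the SLA gives no guarantee for that metric.
    max_delay_ms: Optional[float] = None
    max_jitter_ms: Optional[float] = None
    max_plr: Optional[float] = None

def violations(delay_ms, jitter_ms, plr, sla):
    """Return the names of the SLA parameters whose threshold is exceeded."""
    checks = [
        ("delay", delay_ms, sla.max_delay_ms),
        ("jitter", jitter_ms, sla.max_jitter_ms),
        ("plr", plr, sla.max_plr),
    ]
    return [name for name, value, limit in checks
            if limit is not None and value > limit]

# Illustrative thresholds only, not values taken from Y.1541.
real_time = SlaThresholds(max_delay_ms=100.0, max_jitter_ms=50.0, max_plr=0.001)
file_transfer = SlaThresholds(max_plr=0.001)

print(violations(120.0, 10.0, 0.0005, real_time))      # ['delay']
print(violations(120.0, 10.0, 0.0005, file_transfer))  # []
```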

  6. SLA models for multi-point services
     - The SLA model depends on the type of the service:
       - Point to point
       - Hub and spoke (point to multiple points)
       - Multipoint (mesh SLA)
     - MDVPN offers a very general set of services to end users, so all SLA models are applicable to MDVPN services.
     - Problem with multipoint services: SLA scalability.
     - MEF 10.2 and Y.1563 propose aggregated SLA metrics for multipoint services (e.g. the average delay or jitter over all combinations of paths between the end points of a service instance).
     - However, even to measure the aggregated metrics properly in the multipoint case, the number of measurement agents equals the number of end points, and the number of measurements is on the order of $O(n^2)$.
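
The quadratic growth and the idea of an aggregated metric can be sketched as follows. The end-point names and delay values are made up, and the plain arithmetic mean is only one possible aggregate; the exact definitions in MEF 10.2 and Y.1563 are not reproduced here.

```python
from itertools import permutations
from statistics import mean

def ordered_pairs(end_points):
    """All directed end-point pairs of a full mesh: n * (n - 1) of them."""
    return list(permutations(end_points, 2))

def aggregated_delay(delay_ms):
    """Average one-way delay over every measured (source, destination) pair."""
    return mean(delay_ms.values())

end_points = ["A", "B", "C", "D"]   # hypothetical end points
pairs = ordered_pairs(end_points)
print(len(pairs))                   # 12

# Made-up delays: 10 ms on every path except 22 ms from A to B.
delay_ms = {pair: 10.0 for pair in pairs}
delay_ms[("A", "B")] = 22.0
print(aggregated_delay(delay_ms))   # 11.0
```

Note how a single degraded path moves the aggregate only slightly, which is one reason an aggregated SLA is easier to report but less sensitive than per-path thresholds.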

  7. SLA verification in multi-domain environments: strategies
     - End-to-end measurements: passive and active methods are used to measure delay, jitter and packet loss between service end points.
     - Problem: if the service has m service instances, where service instance x has $n_x$ end points, the total number of measurement agents is $N_{ma} = \sum_{x=1}^{m} n_x$.
     - Metric composition: described in several standards documents (RFC 5644, RFC 5835, RFC 6049, ITU-T Y.1541). Key measurement parameters are measured in each domain, and the end-to-end metrics are then estimated from the per-domain measurements.
     - The total number of measurement agents in this case can be $N_{ma} = \sum_{x=1}^{d} b_x$, where $b_x$ is the number of cross-border connections of domain x and d is the number of domains.
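
A small sketch of both ideas on this slide: the agent count as a plain sum, and a simple spatial composition in the spirit of RFC 6049, where per-domain mean delays are added and per-domain loss ratios are combined under the assumption that losses in different domains are independent. The function names and sample numbers are illustrative. Jitter is left out on purpose, since it does not compose by simple addition.

```python
import math

def total_agents(counts):
    """Sum of per-instance end points (n_x) or per-domain border links (b_x)."""
    return sum(counts)

def compose_delay(per_domain_delay_ms):
    """End-to-end mean delay estimated as the sum of per-domain mean delays."""
    return sum(per_domain_delay_ms)

def compose_loss(per_domain_plr):
    """End-to-end loss ratio, assuming losses in the domains are independent."""
    return 1.0 - math.prod(1.0 - p for p in per_domain_plr)

print(total_agents([4, 3, 5]))               # 12
print(compose_delay([5.0, 12.0, 3.0]))       # 20.0
print(round(compose_loss([0.01, 0.02]), 6))  # 0.0298
```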

  8. Metric composition - issues
     - This methodology is more scalable, but inherently less accurate, especially for jitter measurements [Douardo et al.], because of several issues: the measurement of the border link, double measurements on MA links, time synchronization of the measurements, etc.
     - There are also issues with exposing the per-domain data to the central device that gathers the measurements and calculates the metrics.
     - In-instance measurements, and changes in service instance topology due to routing table changes.
     [Figure: Domain A and Domain B connected by a link, each domain with measurement agents (MA) measuring delay, jitter and loss. Annotations: "Who measures this link?", "What if this is measured 1 min after Domain A?", "This link can be measured twice if MA is not on the network element".]

  9. Measurement methodologies
     - Active and passive monitoring: delay and jitter have to be measured actively, while packet loss can be inferred from passive measurements, although this requires a complex methodology and large resources.
     - What can serve as active probes for SLA verification?
       - Separate measurement points (perfSONAR MP, Atlas probes, ...)
       - Network element features (Cisco SLA, Juniper RPM), which are not compatible with each other
     - MPLS VPNs have no standardized method for performance measurement (features like MPLS BFD are not compatible between different vendors because of incomplete implementations).
     - The recently concluded IETF l3vpn WG aimed to propose standards for MPLS and MPLS VPN performance monitoring as an extension of RFC 6374:
       - draft-zheng-l3vpn-pm-analysis-03 (expired), July 2014
       - draft-dong-l3vpn-pm-framework-02 (expired), January 2014
       - draft-ni-l3vpn-pm-bgp-ext-01 (expired), February 13, 2014
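
As an illustration of what an active probe session yields, the sketch below derives the three SLA parameters from a list of per-probe one-way delays. The input format is hypothetical and is not the output format of any particular tool. Jitter is taken here as the mean absolute delay difference between consecutive received probes, which is one common convention; an SLA may define it differently.

```python
from statistics import mean

def summarize(one_way_delays_ms):
    """Summarize one probe session; None marks a probe that never arrived."""
    if not one_way_delays_ms:
        raise ValueError("no probes were sent")
    received = [d for d in one_way_delays_ms if d is not None]
    plr = 1.0 - len(received) / len(one_way_delays_ms)
    # Delay variation between consecutive received probes (simplification:
    # pairs that straddle a lost probe are still compared).
    variation = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "delay_ms": mean(received) if received else None,
        "jitter_ms": mean(variation) if variation else None,
        "plr": plr,
    }

print(summarize([10.0, 12.0, None, 11.0]))
# {'delay_ms': 11.0, 'jitter_ms': 1.5, 'plr': 0.25}
```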

  10. MPLS and MPLS VPN monitoring challenges
      - The problem is LSP aggregation, especially when PHP is used.
      - The drafts propose a new concept, the "VRF-to-VRF Tunnel" (VT). In this concept, each PE router needs to allocate MPLS labels (called VT labels) that identify the VRF-to-VRF tunnel between the local VRF and the remote VRFs.
      - This functionality is likely to become a feature of PE routers, but it does not exist now.

  11. SQM - High level design decisions - 1
      - Uses a standard active measurement architecture, like perfSONAR or IETF LMAP (measurement agent + measurement collector + measurement controller).
      - Measure and monitor only the SLA parameters: delay, jitter, loss. No heavyweight, intrusive and unreliable capacity or available bandwidth estimations.
      - Relies on reliable and standardized active measurements (owamp): no dependence on the service path or on access rights to network elements along the service instance path.
      - Accuracy: use end-to-end measurements instead of metric composition strategies.
      - External devices are needed, as there are no interoperable solutions on network elements.

  12. High level design decisions - 2
      - Scalability and simplicity:
        - Small/zero footprint measurement agents (SBCs)
        - Measurement results are not collected on agents
        - "One-click" configuration of measurement agents
        - Multi-homing measurement agents (one measurement agent can serve multiple service instances with overlapping address spaces)

  13. SQM architecture and the prototype
      - Based on the IETF LMAP architecture.
      - Main new components:
        - Service/SLA inventory
        - SQM processing tool
        - OWAMP based measurement agents (not zero footprint at the moment)
      - Reporting and alarming are not part of the short-term goals.
      [Figure: architecture diagram. Box labels: User Interface, Alarm management system, Service quality reports, Trouble ticket system, Service Quality Management, Service/SLA Inventory, controller, collector, five MAs, Resource Performance Management.]

  14. Service/SLA inventory
      - Stores the relevant data about service instances (both transport service and MDVPN services).
      - Stores SLA parameters and thresholds.
      - The data model is inspired by TMF SID.
      - Based on the MDVPN data model for service requests.
      - The first implementation was an extension of perfSONAR SLS; the current version is built from scratch.
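
To make the role of the inventory concrete, here is a minimal in-memory sketch. It is not the TMF SID model or the MDVPN data model, and every class, field and value in it is a hypothetical placeholder; it only shows the kind of record the slide describes: a service instance with its end points and SLA thresholds.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceInstance:
    name: str
    service_type: str                                # e.g. "MDVPN" or "transport"
    end_points: list = field(default_factory=list)
    thresholds: dict = field(default_factory=dict)   # SLA parameter -> limit

class Inventory:
    """In-memory stand-in for the Service/SLA inventory."""

    def __init__(self):
        self._instances = {}

    def add(self, instance):
        self._instances[instance.name] = instance

    def thresholds_for(self, name):
        return self._instances[name].thresholds

inventory = Inventory()
inventory.add(ServiceInstance(
    name="vpn-a",
    service_type="MDVPN",
    end_points=["10.0.0.1", "10.0.1.1"],
    thresholds={"delay_ms": 100.0, "jitter_ms": 50.0, "plr": 0.001},
))
print(inventory.thresholds_for("vpn-a")["plr"])  # 0.001
```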

  15. SQM component
      - Gathers SLA data from the inventory.
      - Gathers measurement data.
      - Distinguishes between measurement data belonging to different service instances.
      - Displays SLA data.
      - Displays temporal graphs of the main SLA parameters.
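
The step of telling service instances apart can be sketched as a grouping problem. Because different instances may use overlapping address spaces, addresses alone are not a safe key, so the sketch groups by instance identifier as well. The record layout and all values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up records: the same address pair appears in two service instances.
records = [
    {"instance": "vpn-a", "src": "10.0.0.1", "dst": "10.0.1.1", "delay_ms": 20.0},
    {"instance": "vpn-b", "src": "10.0.0.1", "dst": "10.0.1.1", "delay_ms": 80.0},
    {"instance": "vpn-a", "src": "10.0.0.1", "dst": "10.0.1.1", "delay_ms": 30.0},
]

def mean_delay_per_path(records):
    """Group by (instance, src, dst) so overlapping addresses stay separate."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["instance"], r["src"], r["dst"])].append(r["delay_ms"])
    return {key: mean(values) for key, values in groups.items()}

result = mean_delay_per_path(records)
print(result[("vpn-a", "10.0.0.1", "10.0.1.1")])  # 25.0
print(result[("vpn-b", "10.0.0.1", "10.0.1.1")])  # 80.0
```

Grouping by address pair only would have merged the two instances into a single, misleading average.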

  16. SQM prototype

  17. Prototype setup

  18. Service inventory / configuration

  19. SLA/Service inventory front page

  20. Creating new service instances

  21. List of NRENs subscribed to the service

  22. Adding new Measurement Agent
