Ethernet OAM Victor Olifer (JANET/GEANT JRA1 Task 1) JRA1/TERENA workshop, Copenhagen, 20 November 2012 connect • communicate • collaborate 1
Agenda
- Ethernet Service Assurance & Monitoring overview
  - Monitoring standards
  - Service assurance standards
- Service assurance lab trials
- CFM/Y.1731 trial
  - Multi-domain testbed
  - OAM agent boxes
  - CyPortal
- JRA1 & JRA2 trial (Year 4 extension)
  - Multi-segment connections
  - Diverse equipment
  - perfSONAR extensions
Wide-area point-to-point Ethernet connections
[Figure: connection segments labelled Ethernet over MPLS, Ethernet over Transport, Ethernet]
Multi-segment multi-domain connection with:
- Ethernet UNI (a must);
- segments of pure Ethernet (optional);
- segments where Ethernet is tunnelled over some other technology, e.g. TDM (SDH, OTN) or MPLS (optional)
Where can we find such connections?
- GEANT Plus, JANET Lightpath: demand comes from big projects and large scientific centres
- Inter-router connections
- Commercial provider offerings: revenue grew 20% in 2010 over 2009. Mobile backhaul and multi-site corporates are the major users; the reasons are price and flexibility
- New demand for academic providers might arise from areas such as cloud services, data centres, HD videoconferencing and multi-site university connections
Problems with managing Ethernet connections
- Until recently Ethernet had no OAM tools (hence the cheapest equipment), so there was no way to check, monitor and troubleshoot connectivity and performance end-to-end (a customer view) or within a domain (a provider view).
- Compared with IP: no ping, traceroute or ICMP diagnostic messages are available.
- Partial solution: we can use MPLS or SDH/OTN OAM to manage tunnels.
- Good news: Ethernet OAM functions have been developed and implemented in equipment since 2007-8.
- Bad news: we (JANET) don't have much experience of using Ethernet OAM. The situation is the same in other NRENs (as far as I know from GEANT3 participants).
Three areas of emerging Ethernet OAM standards
• Service assurance: checks whether a connection performs to its specs, e.g. up to CIR and EIR, after service configuration and activation.
• Service monitoring: periodic checks of connection connectivity (continuity) and performance (delay, loss, throughput, availability).
• Service troubleshooting: when monitoring shows a fault, one needs to locate the faulty point along the path and the possible reason(s) for the failure.
Service Assurance (1)
1. Service definitions (topology, e.g. point-to-point; bandwidth profile: CIR, EIR for several CoS):
• MEF 10.2
• ITU-T G.8011
Very important, as this is often a cause of confusion: e.g. CIR might be measured for UDP payload or for Ethernet frames, which gives very different figures for the same data flow.
2. Service performance parameters (delay, loss, throughput, availability):
• MEF 10.2.1
• Y.1563
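To illustrate why the measurement layer matters for CIR, here is a minimal arithmetic sketch. It assumes IPv4 without options, a single 802.1Q tag, and a hypothetical flow of 100-byte UDP payloads at 100 000 frames per second; none of these numbers come from the trial.

```python
# Per-frame overheads in bytes (standard header sizes; IPv4 without options).
UDP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER_FCS = 18   # 14-byte MAC header + 4-byte FCS, untagged
VLAN_TAG = 4          # one 802.1Q tag
PREAMBLE_IFG = 20     # 8-byte preamble/SFD + 12-byte inter-frame gap


def rates_mbps(udp_payload_bytes, frames_per_second, vlan_tagged=True):
    """Return the bit rate of one flow as seen at three different layers."""
    frame = udp_payload_bytes + UDP_HEADER + IPV4_HEADER + ETH_HEADER_FCS
    if vlan_tagged:
        frame += VLAN_TAG
    on_wire = frame + PREAMBLE_IFG

    def to_mbps(size_bytes):
        return size_bytes * 8 * frames_per_second / 1e6

    return {
        "udp_payload": to_mbps(udp_payload_bytes),
        "ethernet_frame": to_mbps(frame),
        "line": to_mbps(on_wire),
    }


if __name__ == "__main__":
    # Hypothetical flow: 100-byte UDP payloads at 100 000 frames/s.
    for layer, mbps in rates_mbps(100, 100_000).items():
        print(f"{layer:15s} {mbps:7.1f} Mbit/s")
```

For this hypothetical flow the same traffic reads as 80 Mbit/s of UDP payload, 120 Mbit/s of Ethernet frames and 136 Mbit/s on the line, so a CIR figure is only meaningful together with the layer it refers to.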
Service Assurance (2)
3. Service verification
Relatively new (Summer 2011) ITU-T spec Y.1564 "Ethernet service activation test methodology"
• Defines a simple, disruptive, on-demand procedure that tests connectivity and throughput up to CIR, EIR and the policing limit by injecting traffic into a connection
• More suitable for Ethernet than the complex and IP-centric RFC 2544; implemented in many traffic generators and boxes
Service Assurance trials
JANET lab trial of the SunRise RxT tester
Positive impression: it works according to the standard and looks worth trying in wide-area tests
[Figure: Tester connected to Box; rate labels CIR, PIR, PIR = CIR + EIR]
Just one problem: Y.1564 doesn't give an opportunity to detect the situation when the real PIR value is set lower than expected (not a box bug, just the intention of the standard)
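The PIR blind spot can be sketched as follows. This is one reading of the pass rule for the EIR step, not a quotation from Y.1564: it assumes that traffic above CIR carries no delivery guarantee, so the step passes whenever the received rate lies between CIR and CIR + EIR. The function names, the idealised policer and all rate values are hypothetical.

```python
def received_rate(offered_mbps, policer_pir_mbps):
    """Idealised policer: everything above the configured PIR is dropped."""
    return min(offered_mbps, policer_pir_mbps)


def eir_step_passes(received_mbps, cir_mbps, eir_mbps, flr_limit=0.0):
    """Assumed pass rule for the EIR step (offered load is CIR + EIR).

    The step passes when the received rate is at least CIR (less the
    allowed frame loss ratio) and at most CIR + EIR.
    """
    lower = cir_mbps * (1.0 - flr_limit)
    upper = cir_mbps + eir_mbps
    return lower <= received_mbps <= upper


if __name__ == "__main__":
    cir, eir = 100.0, 50.0            # hypothetical service: PIR should be 150
    offered = cir + eir
    for configured_pir in (150.0, 120.0, 80.0):
        rx = received_rate(offered, configured_pir)
        verdict = "pass" if eir_step_passes(rx, cir, eir) else "fail"
        print(f"configured PIR {configured_pir:5.1f} -> received {rx:5.1f}: {verdict}")
```

Under this assumed rule a policer configured with PIR = 120 instead of 150 still passes, and only a PIR below CIR is caught, which matches the behaviour observed in the lab trial.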
Service Monitoring
IEEE 802.1ag Connectivity Fault Management (CFM) (ratified in 2007):
- Hierarchical sessions of heartbeat messages (Continuity Check Messages, CCM) -> up/down status check
- VLAN-aware
- MEP (End) and MIP (Intermediate) maintenance points
ITU-T Y.1731 (ratified in 2008): same as CFM + performance monitoring (delay, loss, throughput)
[Figure: nested maintenance sessions]
- Customer maintenance session: level 7
- Service provider maintenance session: level 5
- Operator maintenance sessions: level 3
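The CCM up/down check can be sketched as a simple timeout. The sketch assumes the usual 802.1ag loss threshold of 3.5 CCM intervals; the function names and the arrival times in the example are hypothetical and are not taken from any agent box.

```python
# A remote MEP is treated as lost when no CCM has arrived for
# 3.5 transmission intervals (the loss threshold used by 802.1ag).
CCM_LOSS_MULTIPLIER = 3.5


def remote_mep_is_up(last_ccm_rx_s, now_s, ccm_interval_s):
    """Return True while the last CCM is still within the loss threshold."""
    return (now_s - last_ccm_rx_s) < CCM_LOSS_MULTIPLIER * ccm_interval_s


def connection_states(ccm_arrivals_s, poll_times_s, ccm_interval_s):
    """Evaluate up/down status at each poll time from CCM arrival times."""
    states = []
    for now in poll_times_s:
        seen = [t for t in ccm_arrivals_s if t <= now]
        up = bool(seen) and remote_mep_is_up(max(seen), now, ccm_interval_s)
        states.append((now, "up" if up else "down"))
    return states


if __name__ == "__main__":
    # Hypothetical session: 1 s CCM interval, CCMs stop arriving after t = 3 s.
    arrivals = [0.0, 1.0, 2.0, 3.0]
    for t, state in connection_states(arrivals, [2.5, 5.0, 7.0], 1.0):
        print(f"t = {t:4.1f} s: {state}")
```

With a 1 s interval the session is still reported up 2 s after the last CCM and goes down once the silence exceeds 3.5 s.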
Service Troubleshooting
CFM:
- Linktrace (analogous to IP traceroute)
- Loopback (analogous to IP ping)
- RDI (Remote Defect Indication)
Y.1731: same as CFM + performance monitoring (loss, delay, throughput) + a richer set of diagnostic messages:
- Alarm Indication Signal (AIS)
- Lock Signal
- …
Service monitoring trials
JRA1 Task 1 Ethernet OAM trial (2011):
- 5 NRENs, 5 connections monitored for 6 months
- Small Y.1731 agent boxes from Overture
- CyPortal from Cyan Optics for storing and visualising monitoring data
Positive results, but only for single-segment connections
Combined JRA1 Task 1 & JRA2 Task 3 Service Assurance & Monitoring trial, GN3 Year 4 (2012-2013): ongoing
JRA1 Ethernet OAM trial (2011) objectives
- Test CFM/Y.1731 functions in a multi-domain and multi-vendor environment (5 connections)
- Evaluate Y.1731 agent boxes
- Evaluate the OAM data visualisation system (CyPortal)
[Figure: testbed topology. Sites: Essex Uni, JANET LH, NORDUnet, SURFnet, CESNET, PIONIER (PSNC). Legend: OAM agent (Overture ISG24), equipment under test, monitored VLAN connections, Collector, data from Collector, Cyan OAM portal (OAM cloud service)]
OAM agent options
- Dedicated extra network switch with advanced OAM capabilities
  Pros: uniform, rich OAM functionality and a consistent source of monitoring data
  Cons: overheads of extra boxes (added complexity, cost, especially for high-speed links, maintenance etc.)
- OAM capabilities of existing network boxes: routers, switches, muxes
  Pros: no extra equipment, ability to test internal segments
  Cons: some vendor-specific features, e.g. in CFM MIBs; a diverse environment with possible incompatibilities
- Software OAM agent on a dedicated server (e.g. 'dot1ag-utils', developed by SARA and presented by Ronald van der Pol at NORDUnet 2011)
  Pros: end users can ping and trace network elements; no switches needed
  Cons: currently limited to Down MEP functionality; performance depends on the server's performance; time precision might be an issue
ISG24 OAM agent box trial
- Compact 4-port GE demarcation box, low cost (~$1000)
- 2 copper GE and 2 SFP ports (there is a 10GE version)
- Web GUI
- OAM functions:
  - CFM
  - Y.1731 D(elay)MM and L(oss)M
  - RFC 2544
  - PAA: proprietary analogue of Y.1731
  - Ethernet in the First Mile (802.3ah)
ISG24 CCM (continuity) tests
Positive results: the Up/Down state of all 5 connections was properly detected by permanent monitoring over 6 months
[Screenshots: compact web form; detailed web form]
ISG24 DMM (performance) tests
Mostly positive results: CFM and PAA Delay Measurement sessions showed One Way and Two Way delay and jitter results that were stable and close to those expected (from other sources)
[Screenshots: JANET-NORDUnet PAA results; PSNC-CESNET CFM DMM results]
We experienced some problems with CFM One Way Delay measurements on two connections; these are covered after the CyPortal slides
CyPortal: monitoring data storage and visualisation
- Detailed monitoring data are collected from ISG24 agent boxes and stored in a cloud-based database
- Web GUI provides a map of all services; services whose current parameters violate the SLD are shown in red
CyPortal: per-service data
- Historical graphical presentation of all monitored parameters
- Zooming into a selected time period
- Setting of SLA limits
- Flexible reports
Problems encountered
1. Saw-tooth shape of delay between JANET LH and Essex Uni
[Figure labels: Level 5 DMM session; Level 3 DMM session]
There was no reason for the saw-tooth shape of the Two Way Delay, with peaks of about 1 sec, shown by the Level 5 MEP (ISG24 box)
Capturing and analysing traffic before and after the Level 3 MEP (Ciena 311v box) revealed the 'guilty' box: the Level 3 MEP time-stamped the Level 5 MEP's packets instead of forwarding them transparently, which is definitely a bug in the box software
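The forwarding behaviour expected from the Level 3 MEP can be sketched as a level comparison. This is a simplified model of CFM level handling with a hypothetical function name, not the logic of any particular box.

```python
def mep_action(frame_md_level, mep_md_level):
    """Decide what a MEP should do with a received CFM frame.

    Simplified model of CFM level handling:
    - frames of a higher level belong to an outer session and must be
      forwarded untouched;
    - frames of the MEP's own level are processed;
    - frames of a lower level must not leak through and are dropped.
    """
    if frame_md_level > mep_md_level:
        return "forward"
    if frame_md_level == mep_md_level:
        return "process"
    return "drop"


if __name__ == "__main__":
    # A Level 5 DMM arriving at a Level 3 MEP, as in the JANET LH to Essex case.
    print(mep_action(frame_md_level=5, mep_md_level=3))
```

In these terms the observed bug amounts to the Level 3 MEP treating Level 5 delay measurement frames as "process" rather than "forward".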
Problems encountered (cont.)
2. Inability of ISG boxes to measure CFM One Way Delay on some connections (LH-Copenhagen, LH-Essex)
PAA: OWD = 10.903, TWD = 23.004
CFM DMM: OWD = ----, TWD = 23.004
ISG vendor's explanation: synchronisation is too poor to calculate CFM OWD
This seems not to be true: why is it good enough for the proprietary PAA?
Needs further investigation!
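Why clock synchronisation matters for One Way Delay but not for Two Way Delay can be seen from the four timestamps carried by a DMM/DMR exchange. The sketch below uses the usual Y.1731 arithmetic; the simulation helper, the delays and the clock offset are hypothetical and are not the trial's measurements.

```python
def two_way_delay(t1, t2, t3, t4):
    """Two-way delay from DMM/DMR timestamps.

    t1: DMM sent (initiator clock)    t2: DMM received (responder clock)
    t3: DMR sent (responder clock)    t4: DMR received (initiator clock)
    The responder's processing time (t3 - t2) is subtracted, and any
    offset between the two clocks cancels out.
    """
    return (t4 - t1) - (t3 - t2)


def one_way_delay_forward(t1, t2):
    """Forward one-way delay; only valid when both clocks are synchronised."""
    return t2 - t1


def simulate(forward_ms, backward_ms, responder_hold_ms, clock_offset_ms):
    """Produce the four timestamps for given true delays and clock offset."""
    t1 = 0.0
    t2 = t1 + forward_ms + clock_offset_ms          # responder clock
    t3 = t2 + responder_hold_ms                     # responder clock
    t4 = t1 + forward_ms + responder_hold_ms + backward_ms  # initiator clock
    return t1, t2, t3, t4


if __name__ == "__main__":
    for offset in (0.0, 3.0):
        t1, t2, t3, t4 = simulate(11.0, 12.0, 0.5, offset)
        print(f"offset {offset:3.1f} ms: "
              f"OWD = {one_way_delay_forward(t1, t2):5.1f} ms, "
              f"TWD = {two_way_delay(t1, t2, t3, t4):5.1f} ms")
```

In this model a 3 ms clock offset shifts the computed OWD by 3 ms while the TWD is unchanged, so any method that reports OWD, proprietary or standard, depends on the same synchronisation.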