NREN NOC
TF-NOC preparation meeting, Copenhagen, May 3, 2010
Håvard Kusslid, NOC manager, UNINETT, hk@uninett.no
UNINETT NOC, history
UNINETT: the Norwegian research network. Before 2002, the helpdesk (1st level) was outsourced. Two UNINETT technicians acted on tickets/requests, and no engineer was on duty outside office hours. Customers felt at a distance, "in the dark", and often bypassed the established channels by calling their favourite contact directly. This gave unsatisfactory working conditions for key personnel. UNINETT has run a 24/7 in-house NOC since March 2002.
UNINETT NOC - 24/7
Standard NOC hours: Monday - Friday, 08:00-16:00. Full monitoring of network and services. Two persons on daily duty, drawn from a pool of 20 engineers: a mix of network engineers and system support/development engineers. Initially, a minimum of one network engineer on each shift. Extended hours/weekends: a duty engineer is on call for priority issues, periodically checks the status of network and services, and responds to monitoring system alarms. Requirements for the duty engineer: in-depth network knowledge and basic to fair system knowledge. A pool of 12 duty engineers, each doing one-week periods.
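The one-week on-call rotation for the pool of 12 duty engineers can be illustrated with a simple round-robin roster. This is a minimal sketch, assuming back-to-back one-week periods and a fixed rotation order; the engineer names and the function `duty_roster` are hypothetical and not part of UNINETT's actual staffing tool.

```python
from datetime import date, timedelta


def duty_roster(engineers: list[str], start: date, weeks: int) -> list[tuple[date, str]]:
    """Assign one-week on-call periods in round-robin order.

    Assumption: periods run back to back from `start`; names are placeholders.
    """
    if not engineers:
        raise ValueError("need at least one engineer")
    roster = []
    for week in range(weeks):
        period_start = start + timedelta(weeks=week)
        # Cycle through the pool so each engineer gets every len(engineers)-th week.
        roster.append((period_start, engineers[week % len(engineers)]))
    return roster


if __name__ == "__main__":
    pool = [f"engineer{i:02d}" for i in range(1, 13)]  # hypothetical pool of 12
    for period_start, who in duty_roster(pool, date(2010, 5, 3), weeks=14):
        print(period_start.isoformat(), who)
```

With 12 engineers, each person is on call one week in twelve; a real roster tool would also need to handle swaps, holidays and absences.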
NOC operators
All our network engineers participate as NOC operators, including the NOC manager and our director of Network and Services. Staff rotate; there are no designated NOC operators. (Until recently we had two permanent operators, alternating on 4-week duty with one operator from the "pool".) We recently extended the pool of NOC operators with people from a system background (the system and network departments have merged). One "veteran" NOC operator is always on duty until the level of experience is more uniform.
NOC layout
NOC centre with 3 workstations (two manned). Home-grown monitoring environment for status display of network and services. 5 dual-screen PCs (Ubuntu), plus space for a personal laptop. One overhead monitor with a permanent live view. One large-screen monitor for visualisations/show-off. Standalone power environment (UPS with backup generator). Dedicated access switches in the NOC bypass the floor patch room and local network, connecting directly to the core switch in the main server facility in the basement.
NOC tasks, overview
Problem management and change management. Monitors network equipment and status. Monitors services (in-house and external systems). Coordinates the distribution and updating of software. Coordinates planned work and outages with customers, circuit providers and NORDUnet. Router configuration services, for our own and customers' routers. Allocates and manages IP addresses. DNS and registry services.
Problem management
The NOC performs the usual steps when an outage or problem occurs: problem identification, troubleshooting, notification, escalation (if necessary), problem resolution, and confirming status with customers. Problems and events are logged in a daily watch log; if follow-up is needed, a ticket is logged. We do not currently ticket everything (but need to ticket more). A ticket system/knowledge base is being researched. Problem management statistics are not currently produced, but are also "on the horizon".
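The problem-management steps and the watch-log/ticket rule above can be expressed as a small workflow. This is a minimal sketch, assuming every event goes into the daily watch log and only events needing follow-up also get a ticket; the `Incident` class and `handle` function are hypothetical illustrations, not UNINETT's tooling.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    IDENTIFICATION = 1
    TROUBLESHOOTING = 2
    NOTIFICATION = 3
    ESCALATION = 4
    RESOLUTION = 5
    CONFIRMATION = 6  # confirming status with customers


@dataclass
class Incident:
    summary: str
    needs_escalation: bool = False
    needs_follow_up: bool = False
    history: list[Step] = field(default_factory=list)


def handle(incident: Incident, watch_log: list[str], tickets: list[str]) -> None:
    """Walk through the steps in order, skipping escalation when not needed."""
    for step in Step:  # Enum iteration follows definition order
        if step is Step.ESCALATION and not incident.needs_escalation:
            continue
        incident.history.append(step)
    watch_log.append(incident.summary)  # every problem/event goes in the watch log
    if incident.needs_follow_up:
        tickets.append(incident.summary)  # a ticket only when follow-up is needed
```

Keeping the watch log and the ticket queue as separate outputs mirrors the slide's point that not everything is ticketed today.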
Change management
The NOC coordinates network installations and maintenance, and assists staff in the field. We also assist customers with configuration changes when asked. Change review, approval, scheduling, and notification: changes that will have a considerable effect on the network or topology are reviewed by the section responsible for the work being done. Scheduled maintenance is notified in advance. If the changes meet certain criteria, they are performed during set service windows. Emergency changes are handled on a case-by-case basis. Changes that need immediate attention are notified, but may be subject to short-notice or "post-event" notification.
Supporting facilities
Test lab: we have testing facilities for new hardware and software before incorporating it in the network, and for simulation/troubleshooting. The lab is not run by the NOC. Inventory system: "next business day" agreements for larger equipment (Cisco 6500-series and larger, and the larger Juniper routers). In-house stock of smaller routers/switches, so we can recreate and ship replacement equipment from our own premises. We also collaborate with the universities and can use their spare equipment stock for emergency replacement. We have a basic home-grown inventory system with threshold notifications for GBICs, fiber patch cables with different connectors, etc.
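A threshold notification in an inventory system like the one described can be reduced to comparing stock counts against per-item minimums. This is a minimal sketch, assuming a notification fires when stock is at or below its threshold and that items missing from the stock count are treated as zero; the item names and `below_threshold` are hypothetical.

```python
def below_threshold(stock: dict[str, int], thresholds: dict[str, int]) -> list[str]:
    """Return the items whose stock is at or below their threshold, sorted by name.

    Assumption: an item absent from `stock` counts as zero on hand.
    """
    return sorted(
        item for item, minimum in thresholds.items() if stock.get(item, 0) <= minimum
    )


if __name__ == "__main__":
    stock = {"GBIC 1000BASE-LX": 3, "patch LC-LC 2m": 25, "patch LC-SC 2m": 4}
    thresholds = {"GBIC 1000BASE-LX": 5, "patch LC-LC 2m": 10, "patch LC-SC 2m": 4}
    for item in below_threshold(stock, thresholds):
        print(f"Reorder: {item} ({stock.get(item, 0)} left)")
```

Whether "at the threshold" should already trigger a warning is a policy choice; using `<` instead of `<=` would delay the notification by one unit.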
NOC tools
Home-grown/open source monitoring environment for network and services: ZINO for network monitoring (SNMP-based); Hobbit for system monitoring. Home-grown traffic engineering and simulation tool: Pymetric (metric adjustments, simulating the effects of changes, outages and router failures). Home-grown calendaring and staffing tool for duty roster planning and notification of scheduled duty. Open source ticketing systems with some modifications (RT, RT-IR). Home-grown CMS: inventory, circuits, service catalogue, customers/vendors, service agreements, operator instructions, etc.: KIND (web front end, database).
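The idea behind a metric-simulation tool such as Pymetric (adjust link metrics, then see how paths change after an outage) can be illustrated with Dijkstra's shortest-path algorithm. This is a minimal sketch and not Pymetric's code or API; the topology, router names and metrics are hypothetical, and links are assumed symmetric.

```python
import heapq

Graph = dict[str, dict[str, int]]


def shortest_paths(graph: Graph, source: str) -> dict[str, int]:
    """Dijkstra: lowest total metric from `source` to every reachable node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, metric in graph.get(node, {}).items():
            nd = d + metric
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist


def copy_graph(graph: Graph) -> Graph:
    return {node: dict(adj) for node, adj in graph.items()}


def without_link(graph: Graph, a: str, b: str) -> Graph:
    """Simulate an outage of the link a-b without modifying the input."""
    g = copy_graph(graph)
    g.get(a, {}).pop(b, None)
    g.get(b, {}).pop(a, None)
    return g


def with_metric(graph: Graph, a: str, b: str, metric: int) -> Graph:
    """Simulate a metric adjustment on the (symmetric) link a-b."""
    g = copy_graph(graph)
    g[a][b] = metric
    g[b][a] = metric
    return g


# Hypothetical example topology with IGP-style metrics.
topology: Graph = {
    "oslo": {"bergen": 10, "trondheim": 10},
    "bergen": {"oslo": 10, "trondheim": 30},
    "trondheim": {"oslo": 10, "bergen": 30, "tromso": 20},
    "tromso": {"trondheim": 20},
}
```

Comparing `shortest_paths(topology, "bergen")` with the result on `without_link(topology, "bergen", "oslo")` shows which destinations become more expensive when that link fails.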
NOC «on campus» tools
Campus toolbox: a network monitoring server, physically placed on the campus it is set to manage (30+ deployed). The servers run Debian Linux. Main features: network management system (NAV); netflow analysis tool (NfSen, including nfdump); service monitor (Hobbit); TFTP with RCS for the switch and router configuration archive; RADIUS service for routers and switches; syslog server; measuring beacon server; DNS/mail/all-purpose server. The UNINETT NOC has operational responsibility for the servers and keeps spare servers in case of a breakdown. Cfengine and backups enable us to get a replacement server back in operation fairly quickly.
What the NOC does not run
We have separate helpdesks for: internal IT (though the personnel partly overlap); registry operation (support for NORID/.no); the UNINETT FAS helpdesk (FAS coordinates/develops systems for billing, administrative systems, purchasing, wages and personnel administration for the research and educational sector); abuse/CERT (but the same people are involved, coordinating between the NOC and CERT); sales and the process of connecting new customers (though we assist with support/troubleshooting).
NOC (re-)organizing
The network and service departments have merged. Within this new department we are organized in cross-department sections (expert groups). The internal helpdesk and the NOC are standalone sections. Coordination happens through department meetings, and by key people being members of several sections. We are establishing trouble-ticket queues per section. Tickets that are not resolved by the NOC are distributed to the responsible expert groups. Handover from the NOC queue to another section's queue would be triggered by complexity, lacking or obsolete documentation, or repeating issues.
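The handover triggers listed above (complexity, lacking or obsolete documentation, repeating issues) can be written as a simple routing rule. This is a minimal sketch; the `Ticket` fields, the queue name `"noc"` and the cut-off `REPEAT_LIMIT` are hypothetical, since the slide does not say how "repeating" is measured.

```python
from dataclasses import dataclass

REPEAT_LIMIT = 3  # hypothetical: occurrences before an issue counts as "repeating"


@dataclass
class Ticket:
    subject: str
    section: str  # hypothetical: expert group responsible for the affected system
    complex_issue: bool = False
    docs_missing_or_obsolete: bool = False
    repeat_count: int = 0


def target_queue(ticket: Ticket) -> str:
    """Keep the ticket in the NOC queue unless a handover trigger applies."""
    if (
        ticket.complex_issue
        or ticket.docs_missing_or_obsolete
        or ticket.repeat_count >= REPEAT_LIMIT
    ):
        return ticket.section  # hand over to the responsible expert group
    return "noc"
```

Any single trigger is enough to move the ticket, so the NOC keeps only routine, well-documented, one-off issues.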
UNINETT NOC - future
Thus far we have not ticketed everything, and we have no SLAs. Best-effort policies and a good track record have kept us out of trouble so far. There is demand from the authorities/government for contingency plans and quality systems, since this is critical national infrastructure. 15 of our larger customers (university colleges) have joined resources to implement FreeTIL ("ITIL light"); we are participating in this project as observers. This development is expected to further raise demands and expectations towards our NOC. If we are not explicitly implementing ITIL, we need a system that addresses the same needs and that supports and documents what we do and why we choose to do it this way: a quality control system.