FEDERICA – Beauty and the NOC Tomasz Sroczynski, PSNC 7th TF-NOC meeting, Poznan, 12.12.12
What is FEDERICA? – main idea and purposes
FEDERICA – Federated E-infrastructure Dedicated to European Researchers Innovating in Computing Network Architectures
FEDERICA gives researchers (NRENs as well as individuals, e.g. PhD students) their own testbed for network R&D purposes. Future Internet research activities (such as new routing protocols) are given priority. Each researcher has their own "slice" of isolated (VLAN, MPLS), virtualised computing and network infrastructure, which makes even disruptive experiments possible. Research on layers 2-7 of the ISO OSI model is possible. FEDERICA has been operational since 2009; it is currently supported as part of GN3 SA1 T3 (with GARR as coordinator). www.fp7-federica.eu
FEDERICA vs Clouds
Is FEDERICA a Cloud?
Comparison of FEDERICA with IaaS:
• Complexity. FEDERICA: complex topologies, custom routing, up to 100 VLAN IDs for the user. IaaS: simple clusters, one VLAN ID, NAT.
• User's choice of network topology. FEDERICA: yes. IaaS: no.
• Special purpose hardware. FEDERICA: high performance MX LR instances. IaaS: no (except Amazon, with special purpose hardware for GPU processing).
• Computing/storage capabilities. FEDERICA: limited. IaaS: high.
• Reproducibility. FEDERICA: yes (e.g. QoS). IaaS: no.
• Purposes. FEDERICA: research on the Internet or networks in general, down to L2 of the ISO OSI model. IaaS: usually providing extra storage space in servers/data centers.
It may be, if TaaS is considered a part of Cloud services.
Topology
Core PoPs – full mesh topology:
- Juniper MX480: dual CPU, 1 line card with 32 GbE ports, virtual and logical routing, VLAN, MPLS, IPv4, IPv6;
- 2x Sun Fire X2200 M2: 2x quad core AMD@2GHz, 32GB RAM, 8x 1000/100/10 Ethernet NIC, 2x 500GB HDD, VMware ESXi 5.0
Non-Core PoPs:
- Juniper EX3200;
- 1x Sun Fire X2200 M2
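As a rough illustration of what "full mesh" means for the core PoPs, the sketch below enumerates the links between every pair of sites. The PoP labels are hypothetical placeholders, since the slide does not name the core PoPs or say how many there are.

```python
from itertools import combinations

def full_mesh_links(pops):
    """Return every unordered pair of PoPs, i.e. the links of a full mesh."""
    return list(combinations(pops, 2))

# Hypothetical PoP labels; the slide does not list the core PoPs by name.
core_pops = ["PoP-A", "PoP-B", "PoP-C", "PoP-D"]
links = full_mesh_links(core_pops)

for a, b in links:
    print(f"{a} <-> {b}")

# A full mesh of n nodes has n * (n - 1) / 2 links.
print(len(links), "links for", len(core_pops), "PoPs")
```

The link count grows quadratically with the number of sites, which is one plausible reason to keep the full mesh to the core PoPs only.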
Use cases (projects) and their slices
• IIDS – Intelligent Intrusion Detection System
• VMSR – Virtual Multi-Stage Routing
• NOVI – Networking innovations Over Virtualized Infrastructures
• BonFIRE – Building service testbeds for Future Internet Research and Experimentation
• CONFINE IP – Community Networks Testbed for the Future Internet
FEDERICA NOC
Currently (since 2011) the FEDERICA NOC is run entirely by Poznan Supercomputing and Networking Center:
• Monitoring of the FEDERICA infrastructure 24 hours a day, 365 days a year;
• Substrate management – administration, maintenance and provisioning of the FEDERICA resources;
• User support – network slice provisioning, user helpdesk.
Not a typical NOC:
• Virtual infrastructures run over the substrate;
• Maintenance is not limited to networking.
NOC general routine and procedures
The shape of the FEDERICA NOC is still evolving. However, we have developed a list of routine NOC duties:
• constant substrate monitoring (Nagios, G3, vSphere, SSH),
• constant slice monitoring (Nagios, G3, vSphere),
• regular device health checks (vSphere, SSH),
• regular (or after crucial changes have been made) Juniper equipment configuration backups (SSH),
• regular (or before risky maintenance work) VM backups (vSphere).
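The backup rule in the list above ("regular, or after crucial changes") can be sketched as a small decision function. This is only an illustration: the 7-day interval and the function name are assumptions, as the slides do not state how often backups are taken or how they are triggered.

```python
from datetime import datetime, timedelta

# Assumed interval; the slides say "regular" without giving a period.
BACKUP_INTERVAL = timedelta(days=7)

def backup_due(last_backup, now, changed_since_backup,
               interval=BACKUP_INTERVAL):
    """Return True if a configuration backup should be taken now.

    A backup is due when the configuration changed after the last
    backup, or when the regular interval has elapsed.
    """
    return changed_since_backup or (now - last_backup) >= interval

now = datetime(2012, 12, 12, 9, 0)
# Recent backup, no change: nothing to do.
print(backup_due(now - timedelta(days=1), now, False))
# Recent backup, but a crucial change was made: back up again.
print(backup_due(now - timedelta(days=1), now, True))
```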
NOC recent activities
In the last few months our activities mostly involved:
• V-Node and Juniper equipment cleanup,
• FEDERICA Wiki update and cleanup,
• major and minor fixes, including restoration of one PoP switch.
One of the main tasks was migrating the V-Nodes from VMware ESXi 3.5 to version 5.0.
In collaboration with the PSNC NOC:
• PSNC NOC procedures for supporting FEDERICA in substrate monitoring,
• a single mailing list for issue reporting,
• a new TTS group with access for FEDERICA NOC members, including relevant people outside PSNC (from other participating NRENs).
We have also developed a new release of the FEDERICA User FAQ, along with an internal knowledge base for the present and future FEDERICA NOC: information, procedures, JunOS configuration examples, troubleshooting advice.
Our tools
Several tools are obvious choices: Nagios, RT, SSH, eLOMs …
And some others: FEDERICA Wiki, vSphere.
Our tools (continued) G3 (developed and maintained by CESNET) monitors all substrate elements: links, routers, interfaces, virtualization servers and every single VM etc. (over 50,000 entities monitored).
Our tools (continued)
G3 produces graphs covering periods from one hour up to one year. It monitors, e.g., memory activity, network data I/O rate, and CPU usage of both hosts and VMs.
Issues met (example 1)
Case: eLOM interfaces for all the servers (two V-Nodes and the User Access Server) at PSNC were unavailable.
Investigation: no ping replies, "CRITICAL" in Nagios … but all three interfaces went down simultaneously, within one minute!
Diagnosis: the patch cord from the switch between the MX and the V-Nodes had been accidentally unplugged …
This case taught us not to rule out even causes that sound silly.
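The reasoning in example 1 (several independent interfaces failing within one minute suggests a shared component, such as a common switch or cable) can be sketched as a simple time-window grouping of DOWN events. This is a minimal illustration, not a description of how Nagios or G3 correlate alerts; the host names, timestamps and function name are hypothetical.

```python
from datetime import datetime, timedelta

def simultaneous_failures(down_events, window=timedelta(minutes=1)):
    """Group hosts whose DOWN events fall within `window` of the first
    event of the group; return only groups with more than one host."""
    groups, current, start = [], [], None
    for host, ts in sorted(down_events, key=lambda e: e[1]):
        if current and ts - start <= window:
            current.append(host)
        else:
            if len(current) > 1:
                groups.append(current)
            current, start = [host], ts
    if len(current) > 1:
        groups.append(current)
    return groups

t0 = datetime(2012, 1, 1, 10, 0, 0)  # hypothetical timestamp
events = [
    ("vnode1-elom", t0),
    ("vnode2-elom", t0 + timedelta(seconds=20)),
    ("uas-elom", t0 + timedelta(seconds=40)),
    ("other-host", t0 + timedelta(hours=3)),
]
for group in simultaneous_failures(events):
    print("Check shared upstream component for:", ", ".join(group))
```

A group with several hosts is a hint to inspect what they have in common (switch, patch cord, power) before debugging each host separately.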
Issues met (example 2)
Case: several interfaces on the REDIRIS V-Node were reported by G3 as being down.
Investigation: in vSphere the relevant vmnics were up, along with the corresponding EX interfaces, and traffic was flowing normally.
Diagnosis: the MIB had to be updated with the "set interfaces ge-x/y/z unit 0 bandwidth 1g" configuration statement.
In general, the G3 monitoring system provided by CESNET along with Nagios does the job and is a very helpful tool for the NOC.
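When the fix from example 2 has to be applied to several interfaces, the statements can be generated rather than typed by hand. The sketch below only builds the text of the statement quoted in the diagnosis; the interface names are hypothetical, and applying the statements to a device is left out.

```python
def bandwidth_statements(interfaces, bandwidth="1g", unit=0):
    """Build one 'set interfaces ... bandwidth ...' line per interface,
    following the statement quoted in the diagnosis."""
    return [
        f"set interfaces {ifname} unit {unit} bandwidth {bandwidth}"
        for ifname in interfaces
    ]

# Hypothetical interface names; the slide only gives the pattern ge-x/y/z.
affected = ["ge-0/0/1", "ge-0/0/2"]
for line in bandwidth_statements(affected):
    print(line)
```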
Next steps (TODOs) FEDERICA NOC is going to perform operations which will help to keep things in order and thus operate on the substrate faster and easier: • solving any other remaining issues, • rearrangement of FEDERICA address pool usage, • ACL update.
Q&A Thank you for your attention!