Towards a High Quality Path-oriented Network Measurement and Storage System

David Johnson, Daniel Gebhardt, Jay Lepreau
School of Computing, University of Utah
www.emulab.net

Different Goals for our NMS
• Many uses for Internet-scale path measurements:
  – Discover network trends, find paths
  – Build network models
  – Run experiments using models and data
• A different design point on the NMS spectrum:
  – Obtain highly accurate measurements
  – … from a resource-constrained, unreliable network
  – … for multiple simultaneous users
  – … sometimes at high frequency
  – … and return results fast and reliably

Flexlab: a Motivating Use Case
• Problem: real Internet conditions matter, but can make controlled experiments difficult
• Flexlab [NSDI 07]: integrate network models into emulation testbeds (i.e., Emulab)
  – Example: network models derived from PlanetLab
• How it works:
  – Measure Internet paths in real time
  – Clone conditions in Emulab

Requirements
• Shareable
  – Anticipate multiple users
  – Frequent simultaneous probing can cause self-interference and increase cost
  – Amortize cost of measurements by removing probe duplication across users
• Reliable
  – Reliably buffer, transmit, and store measurements
  – Probing & storage should continue when network partitions disrupt the control plane

Requirements, cont'd
• Accurate
  – Need the best possible measurements for models
• Safe
  – Protect resource-constrained networks and nodes from probing tools, and vice versa
  – Limit bandwidth usage, reduce probe tool CPU overhead
• Adaptive & controllable
  – Function smoothly despite unreliable nodes
  – Modify parameters of executing probes

Hard System To Build!
• End-to-end reliability
  – Data transfer and storage, control
  – PlanetLab: overloaded nodes, scheduling delays
• Measurement accuracy vs. resource limits
• And yet support high-frequency measurements
• => We're not all the way there yet
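The "shareable" requirement above — amortizing cost by removing probe duplication across users — can be sketched roughly as follows. This is an illustrative sketch, not Flexmon's actual code; the schedule structure and function name are invented for the example. Overlapping requests for the same path and tool collapse into one probe stream, scheduled at the highest requested frequency, so every user gets at least the rate they asked for.

```python
# Illustrative sketch (not Flexmon's implementation) of amortizing
# measurement cost across users: one probe stream per (path, tool).
def merge_request(schedule, path, tool, freq_hz):
    """schedule maps (path, tool) -> probe frequency in Hz.
    Returns the frequency the shared stream will actually run at."""
    key = (path, tool)
    current = schedule.get(key, 0.0)
    # A single stream serves all users; just raise its rate if needed.
    schedule[key] = max(current, freq_hz)
    return schedule[key]
```

A second user asking for a slower rate on an already-probed path adds no new traffic; only a faster request raises the shared rate.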
Flexmon
• A measurement service providing shared, accurate, safe, reliable wide-area path-oriented measurements
  – Reliable probing and results transfer & storage atop unreliable networks and nodes
  – Accurate, high-frequency measurements for multiple users despite network resource limits
  – Transfers and exports results quickly and safely
• Not perfect, but a good start
• Deployed on an unreliable network, PlanetLab, for 2+ years
• Nearly 1 billion measurements
• Data available publicly via multiple query interfaces and the Web

Flexmon Overview
[Architecture diagram: Manager Clients and an Auto-Manager Client submit requests through an XML-RPC Server to the Manager; the Manager drives Path Probers on the nodes; probers report to a Data Collector, which writes to a Write-back Cache that flushes to the Datapository]

User Interface
• Authentication through Emulab
• Users request probes through manager clients
  – Type of probe, set of nodes, frequency and duration, and other tool-specific arguments
  – Users can "edit" currently executing probes to change parameters
• Get measurements from the caching DB via SQL

Central Management
• Manager applies safety checks to client probe requests:
  – Reject if probe request is over frequency and duration thresholds
  – Can reject if expected bandwidth usage would violate global or per-user limits
• Estimates future probe bandwidth usage based on past results in the write-back cache

Background Measurements
• The Auto-Manager Client requests all-pairs probing for one node at each PlanetLab site
  – Assumption: all nodes at a site exhibit "identical" path characteristics to other sites
  – Chooses the least-loaded node at each site to avoid latencies in process scheduling on PlanetLab
• Assesses node liveness and adjusts the node set
• Uses a low probe duty cycle to leave bandwidth for high-frequency user probing

Probing
• A Path Prober on each node receives probe commands from the Manager
• Spawns probe tools at requested intervals
  – Newer (early) generic tool support, although safety not generalized
• Multiple probe modes to reduce overhead
  – One-shot: tool is executed once per interval, returns one result
  – Continuous: tool is executed once; returns periodic results
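The manager's admission logic described above — reject requests over frequency or duration thresholds, and reject when estimated bandwidth would breach a limit — might look like the following. This is a hedged sketch: the threshold values, request fields, and cost estimator are placeholders, not Flexmon's real parameters.

```python
# Sketch of the manager's safety checks; limits are assumed, not Flexmon's.
MAX_FREQ_HZ = 1.0        # per-request probe frequency ceiling (assumed)
MAX_DURATION_S = 3600    # per-request duration ceiling (assumed)
GLOBAL_BW_LIMIT = 1e6    # bytes/sec across all active probes (assumed)

def admit(request, current_bw, estimate_bw):
    """Return True iff the probe request passes the safety checks.
    estimate_bw predicts the request's bandwidth usage, e.g. from
    past results held in the write-back cache."""
    if request["freq_hz"] > MAX_FREQ_HZ or request["duration_s"] > MAX_DURATION_S:
        return False  # over frequency or duration threshold
    # Reject if the estimated usage would violate the global limit.
    return current_bw + estimate_bw(request) <= GLOBAL_BW_LIMIT
```

Per-user limits would add a second bookkeeping table keyed by user, checked the same way.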
Probing, cont'd
• Probers maintain a queue of probe commands for each probe type and path, ordered by frequency
  – Serially execute the highest-frequency probe
  – All users get at least what they asked for, maybe more
• Trust model: only allow execution of approved probing tools with sanity-checked parameters
• Currently use two tools
  – fping measures latency; attempts to distinguish loss/restoration of connectivity from heavy packet loss by increasing probing frequency
  – Modified iperf estimates available bandwidth (ABW)

Collecting & Storing Measurements
• Probers send results to a central data collector over UDP
  – Stable commit protocol on both sides
  – Collector drops duplicate results from retransmits
• Not perfectly reliable, e.g., cannot handle node disk failures
• Use a write-back cache SQL DB for performance
• Newest results in the write-back cache are flushed hourly to long-term storage in the Datapository
  – Fast stable commit

Searching the Data
• "Write-back cache" SQL DB
  – Available to Emulab users by default
  – Fast but limited scope
• Datapository containing all measurements
  – Access upon request
  – Weekly data dumps to the Web
• XML-RPC server
  – Can query both DBs over specific time periods
  – More expressive query power (e.g., FullyConnectedSet, data filtering)

Deployment & Status
• Probers run in an Emulab experiment, using Emulab's portal to PlanetLab
• Managers, clients, and data collectors run on a central Emulab server
  – Use the secure event system for management
• Running on PlanetLab for over 2 years
  – Some architecture updates, but largely unchanged over the past year
  – Some system "hiccups", e.g., our slice has been bandwidth-capped by PlanetLab
  – Set of monitored nodes changes over time

Measurement Summary
• Many measurements of pairwise latency and bandwidth
• Latency measurements are 89% of the total
  – 17% are failures (timeouts, name resolution failures, ICMP unreachable)
• Available bandwidth estimates are 11%
  – Of these, 11% are failures (mostly timeouts)

PlanetLab Sites
• Logfile snapshot of a 100-day period
• Median of 151 sites
• System "restart" is the big drop
[Figure: Site Availability Over Time — availability (sites) vs. time (days)]
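The collector's duplicate-dropping behavior described above can be sketched as follows. This is an assumed shape, not Flexmon's wire protocol: probers retransmit a result until it is acknowledged, so the collector must commit each (prober, sequence) pair exactly once and re-ACK duplicates in case the first ACK was lost.

```python
# Hedged sketch of a collector that commits each result once and
# tolerates UDP retransmits. Class and field names are illustrative.
class Collector:
    def __init__(self):
        self.seen = set()   # (prober_id, seq) pairs already committed
        self.stored = []    # stand-in for the stable commit (disk/DB write)

    def receive(self, prober_id, seq, result):
        """Commit a result exactly once; always ACK, so a prober whose
        earlier ACK was lost can still stop retransmitting."""
        if (prober_id, seq) not in self.seen:
            self.seen.add((prober_id, seq))
            self.stored.append(result)  # commit before acknowledging
        return ("ACK", prober_id, seq)
```

Committing before acknowledging is what makes the commit "stable": a crash between the two leaves a duplicate on the wire, never a lost result.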
Node Churn
• Typically 250-325 nodes in the slice
• Churn: number of newly unresponsive nodes at each periodic liveness check
[Figure: Node Churn Over Time — # of nodes leaving system vs. time (days)]

Brief Look at Some Data
• 24-hour snapshot from February
  – 100k+ ABW samples; 1M+ latency samples
• Latency vs. bandwidth: curve approximates the bandwidth-delay product (BDP)
  – Outliers due to method

Related Work
• S3: scalable, generic probing framework; data aggregation support
  – We need a fast & reliable results path
• Scriptroute: probe scripts executed in a safe environment, in a custom language
  – No node-local storage, limited data output facilities
• Others lack shareability or a reliable storage path; see paper

More To Be Done…
• More safety
  – LD_PRELOAD, libpcap to track usage tool-agnostically at probe nodes
  – Need support to limit probe requests when necessary: distributed rate limiting [SIGCOMM '07]; scale probe frequency depending on use
  – Also need adaptability for background measurements
• Add another user data retrieval interface (pubsub would be nice)
• Increase native capabilities of clients
  – Adaptability, liveness

Conclusion
• Developed an accurate, shareable, safe, reliable system
• Deployed on PlanetLab for 2+ years
• Accumulated lots of publicly available data

Data!
• http://utah.datapository.net/flexmon
  – Weekly data dumps and statistical summaries
• Write-back cache DB available to Emulab users
• SQL Datapository access upon request; ask testbed-ops@flux.utah.edu
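The latency-vs-bandwidth curve noted in the data overview approximates the bandwidth-delay product: with a fixed sender window, achievable throughput is bounded by one window per round-trip time. A small sketch with illustrative numbers:

```python
# The BDP relation behind the latency-vs-bandwidth envelope:
# a fixed window W bounds throughput at W / RTT.
def bdp_limited_throughput(window_bytes, rtt_s):
    """Throughput ceiling (bytes/sec) for a fixed window over a path
    with round-trip time rtt_s: at most one window per RTT."""
    return window_bytes / rtt_s

# e.g., a 64 KiB window over a 100 ms path caps throughput at
# 65536 / 0.1 = 655360 bytes/sec (~640 KiB/s)
```

This is why a bandwidth estimator with a fixed window traces a hyperbola on a latency-vs-bandwidth scatter plot, with points off the curve plausibly attributable to the measurement method.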