
Data Needs for Sampling the Internet to Measure Performance, Juana Sanchez - PowerPoint PPT Presentation



  1. 1 Data Needs for Sampling the Internet to Measure Performance
  Juana Sanchez, UCLA Statistics
  In this talk, I will give a brief survey of the work that statisticians are doing to model the Internet with statistical models. The objective of my interest in Internet data analysis is to introduce undergraduate and graduate students in Statistics courses to the field and motivate them to propose ideas and solutions.
  ISMA Data Catalog 2004 Workshop, UCSD, Thursday, June 3rd, 2004

  2. 2 Outline
  1. Probabilistic modeling
  2. Single node data analysis
  3. Network tomography
  4. Network topology identification
  5. Sampling
  6. Other
  7. Conclusions

  3. 3 1. Probabilistic modeling
  • Assume a probability model for the process, e.g., packet counts follow a mixture of Poisson distributions with parameters λ_i, i = 1, ..., k, or byte counts follow an infinite-source Poisson model.
  • Use a random sample to estimate the parameter λ.
  • Attach a standard error to the estimate and express a degree of confidence in it.
  • Properties of the estimators are usually large-sample properties.
  • Most attempts to model the Internet so far are tied to known probability models, no matter how complex the process, and rely to a large extent on independence assumptions.
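A minimal sketch of the estimate-plus-standard-error workflow in the bullets above, assuming a plain Poisson model for packet counts per interval; the rate, sample size, and all names are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: packet counts per interval, assumed Poisson(lambda).
# The true rate of 120 packets/interval is invented for illustration.
counts = rng.poisson(lam=120, size=500)

# MLE of lambda for a Poisson sample is the sample mean.
lam_hat = counts.mean()

# Large-sample standard error of the MLE: sqrt(lambda_hat / n).
se = np.sqrt(lam_hat / len(counts))

# Approximate 95% confidence interval (normal approximation).
ci = (lam_hat - 1.96 * se, lam_hat + 1.96 * se)
print(f"lambda_hat = {lam_hat:.2f}, SE = {se:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```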

  4. 4 2. Single node (link) data analysis
  • SAMSI (Statistical and Applied Mathematical Sciences Institute) program: Network Modeling for the Internet, 2002-2004. Internet Statistics and Research Consortium (forthcoming).
  ⋆ Probabilists (heavy-traffic queueing theory and fluid models), Internet measurers, statisticians.
  ⋆ Statistical characterization of traces: the long-range dependence property and scale invariance; estimate the Hurst parameter. What causes the burstiness of traces? Synthetic and real traces. Byte and packet counts. Effect of time scales. Which traces are similar?
  ⋆ Wavelet spectrum of byte counts ≠ spectrum of packet counts. SiZer helps see how wavelet features correspond to trace features; for example, does a burst correspond to a bump in the spectrum?
  ⋆ Experiments: see whether changing the parameters of a synthetic network changes the burstiness.
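One common way to estimate the Hurst parameter mentioned above is the aggregated-variance method; the sketch below applies it to synthetic counts rather than the SAMSI traces, and the block sizes are arbitrary choices.

```python
import numpy as np

def hurst_aggregated_variance(x, block_sizes):
    """Estimate the Hurst parameter of a series x by the aggregated-variance
    method: for self-similar traffic, Var(block means of size m) ~ m^(2H - 2)."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        block_means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(block_means.var()))
    # Slope of log-variance against log-block-size is 2H - 2.
    slope, _ = np.polyfit(log_m, log_var, 1)
    return 1 + slope / 2

# Illustrative use on synthetic, independent counts; a real input would be
# byte or packet counts per small time bin from a trace.
rng = np.random.default_rng(1)
x = rng.poisson(100, size=100_000).astype(float)
print(hurst_aggregated_variance(x, block_sizes=[10, 30, 100, 300, 1000]))
# Independent data should give H near 0.5; long-range-dependent traffic
# gives H closer to 1.
```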

  5. 5 ⋆ Study trace-driven queues: effects of different utilization or buffer-size scales on packet loss and queue length. The trace is the stream of customers arriving at rate r(t). Many problems with assumptions: traces that look similar under various statistical measures (such as the Hurst index) can exhibit rather different behavior under queueing simulation.
  • Streaming data graphics (Wegman, E. et al., 2003).
  • Long tradition of research in this direction: Taqqu, Willinger, Vexson, etc.
  • CAIDA, Broido et al.: new traces show signs of Poisson behavior. Are we back to old queueing models for networks?
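A rough sketch of a trace-driven queue in the spirit of the first bullet: a packet trace is pushed through a single FIFO link and loss and backlog are recorded. The trace here is synthetic, and the link rate and buffer size are invented parameters, not values from the SAMSI experiments.

```python
import numpy as np

def trace_driven_queue(arrival_times, sizes_bytes, link_rate_bps, buffer_bytes):
    """Feed a packet trace through a single FIFO queue; return the loss
    fraction and mean backlog in bytes. All parameters are illustrative."""
    backlog = 0.0                      # bytes currently in the buffer
    last_t = arrival_times[0]
    lost = 0
    backlog_samples = []
    for t, size in zip(arrival_times, sizes_bytes):
        # Drain the buffer during the inter-arrival gap.
        backlog = max(0.0, backlog - (t - last_t) * link_rate_bps / 8.0)
        last_t = t
        if backlog + size > buffer_bytes:
            lost += 1                  # tail drop when the buffer would overflow
        else:
            backlog += size
        backlog_samples.append(backlog)
    return lost / len(sizes_bytes), float(np.mean(backlog_samples))

# Synthetic trace: Poisson arrivals, exponential packet sizes (illustrative).
rng = np.random.default_rng(2)
arrivals = np.cumsum(rng.exponential(1e-4, size=50_000))  # roughly 10k pkts/s
sizes = rng.exponential(500, size=50_000)                  # mean 500 bytes
loss, mean_q = trace_driven_queue(arrivals, sizes,
                                  link_rate_bps=45e6, buffer_bytes=64_000)
print(f"loss fraction = {loss:.4f}, mean backlog = {mean_q:.0f} bytes")
```

Rerunning the same simulation with a long-range-dependent trace of the same mean rate is exactly the kind of comparison the slide describes: matched summary statistics, very different loss and queue-length behavior.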

  6. 6 3. General network tomography models
  • Objective: "Estimating source-destination traffic intensities from link data (i.e., counts)" (Vardi, 1996, JASA 91(433), pp. 365-377).
  • Inference of the internal link delay distribution through multicast end-to-end measurements.
  • Estimate packet loss.
  • There are many possible combinations of internal link delays; the point is to estimate the most likely combination.
  • Pseudo-maximum-likelihood estimation of the intermediate paths.
  • Origin-destination (OD) matrix inference from link-based counts.
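To make the multicast loss-tomography bullet concrete, here is the textbook estimator for the smallest possible case, a two-receiver multicast tree; the pass probabilities and probe count are made-up numbers, and this is only a sketch of the idea, not the estimators discussed in the talk.

```python
import numpy as np

rng = np.random.default_rng(3)
n_probes = 100_000

# Hypothetical true pass probabilities: shared root link, then two leaf links.
a0_true, a1_true, a2_true = 0.98, 0.95, 0.90

# Simulate multicast probes: each probe must survive the root link, then
# independently each leaf link, to reach the corresponding receiver.
root = rng.random(n_probes) < a0_true
rx1 = root & (rng.random(n_probes) < a1_true)   # probe seen at receiver 1
rx2 = root & (rng.random(n_probes) < a2_true)   # probe seen at receiver 2

# End-to-end observables available from the receivers alone.
p1, p2, p12 = rx1.mean(), rx2.mean(), (rx1 & rx2).mean()

# Method-of-moments estimates for the two-leaf tree:
# p1 = a0*a1, p2 = a0*a2, p12 = a0*a1*a2  =>  a0 = p1*p2/p12, a1 = p12/p2, a2 = p12/p1.
a0_hat = p1 * p2 / p12
a1_hat = p12 / p2
a2_hat = p12 / p1
print(f"a0: {a0_hat:.3f}, a1: {a1_hat:.3f}, a2: {a2_hat:.3f}")
```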

  7. 7 • Intricate details of network transport are ignored.
  • Bin Yu and associates at Berkeley (Sprint Europe data to compare with AT&T data sets; Lucent Technologies network, 4 nodes). They use a pseudo-likelihood approach (a likelihood for smaller subparts, ignoring dependence between the subparts); the multicast tree is broken into parts.
  ⋆ Fixed routing matrix: Y = AX, where Y and A are observed, X is unknown, and A holds the routing parameters.
  ⋆ Normal models with variance a function of the mean (to mimic Poisson behavior).
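A toy illustration of the Y = AX setup, assuming an invented three-link, three-flow network where the system happens to be identifiable; real OD-matrix problems have far more flows than links, which is why the normal model with mean-dependent variance and EM-style estimation is needed rather than the plain least squares used here.

```python
import numpy as np

# Toy routing matrix A (3 links x 3 OD flows): A[l, f] = 1 if flow f uses link l.
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]], dtype=float)

# Hypothetical true OD flow intensities (packets per measurement interval).
x_true = np.array([200.0, 50.0, 120.0])

# Observed link counts: Y = A X with Poisson-like measurement variability.
rng = np.random.default_rng(4)
y = rng.poisson(A @ x_true).astype(float)

# Least-squares reconstruction of the OD flows from the link counts.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print("estimated OD flows:", np.round(x_hat, 1))
```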

  8. 8 4. Network topology identification
  • Characterizing the structure of the network.
  • The network structure determines the delays, so from the end-to-end delays they do agglomerative cluster analysis and recover the link tree structure in the middle.
  • Object similarities rather than object features determine the clusters.
  • Probing non-TCP traffic in a small university network.
  • Rui Castro and R. Nowak, Rice University; IEEE Transactions papers with Coates.
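A sketch of the clustering idea on this slide: pairwise end-to-end similarities (for example, covariance of delays, which is larger for destinations sharing more internal links) feed an agglomerative clustering whose merge order mirrors the logical tree. The similarity matrix below is invented, not measured.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical similarity between four destinations, e.g. end-to-end delay
# covariance: d1/d2 share a deep internal link, as do d3/d4.
sim = np.array([
    [1.0, 0.8, 0.2, 0.2],   # d1
    [0.8, 1.0, 0.2, 0.2],   # d2
    [0.2, 0.2, 1.0, 0.7],   # d3
    [0.2, 0.2, 0.7, 1.0],   # d4
])

# Convert similarities to distances and run average-linkage clustering;
# the resulting merge sequence suggests the logical tree structure.
dist = squareform(1.0 - sim, checks=False)
Z = linkage(dist, method="average")
print(Z)   # each row: which clusters merged, at what distance, cluster size
```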

  9. 9 5. Sampling
  • Provision of information about a specific characteristic of the parent population at a lower cost than a full census would demand. Could use filtering (mask/match, hash-based) or sampling algorithms such as systematic sampling, random sampling, probabilistic sampling, etc.
  • Sampling is already used in routers, but it is important to infer about the unsampled flows using what we know about the sampled ones.
  • Crucial to determine the needed type of information and the desired degree of accuracy in advance. E.g., what kind of metric: number of packets, packet size distribution?
  • Internet Engineering Task Force (Duffield, N.G. et al., AT&T Labs); K.C. Claffy (ISMA).
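The inversion point above, sketched for the simplest scheme: sample packets independently with probability p and rescale sampled totals by 1/p (a Horvitz-Thompson-style estimator). The synthetic trace and the value of p are illustrative, not from any router deployment.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic "trace": packet sizes in bytes with a heavy-ish tail (illustrative).
packet_sizes = rng.pareto(1.5, size=200_000) * 40 + 40

# Sample each packet independently with probability p (1-in-100 on average).
p = 0.01
sampled = packet_sizes[rng.random(packet_sizes.size) < p]

# Invert the sampling: scale sampled totals by 1/p to estimate trace totals.
est_packets = sampled.size / p
est_bytes = sampled.sum() / p

print(f"true packets {packet_sizes.size}, estimated {est_packets:.0f}")
print(f"true bytes   {packet_sizes.sum():.3e}, estimated {est_bytes:.3e}")
```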

  10. 10 6. Other
  • Computer intrusion detection: Marchette et al.
  • Visualization tools for streaming data from Internet packet headers; evolutionary graphics that discard data as it is analyzed because it cannot be stored (Wegman et al.).
  • Kolaczyk et al. (Boston University): principal components analysis of a complete set of OD flow time series from Sprint-Europe and Abilene. They find small dimensionality and suggest decomposing the series into common periodic trends, short-lived bursts, and noise.
  • Other work at CAIDA.
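The low-dimensionality finding can be illustrated with a PCA on a matrix of OD flow time series. The flows below are synthetic, built from a couple of shared periodic "eigenflows" plus noise, which is what makes the leading components dominate; none of this uses the Sprint or Abilene data.

```python
import numpy as np

rng = np.random.default_rng(6)
n_times, n_flows = 1000, 60
t = np.arange(n_times)

# Synthetic OD flows: mixtures of two shared periodic patterns plus noise,
# so the ensemble has low effective dimensionality by construction.
basis = np.vstack([np.sin(2 * np.pi * t / 288), np.cos(2 * np.pi * t / 288)])
weights = rng.gamma(2.0, 1.0, size=(2, n_flows))
flows = basis.T @ weights + 0.2 * rng.standard_normal((n_times, n_flows))

# PCA via the singular value spectrum of the centered time x flow matrix.
centered = flows - flows.mean(axis=0)
eigvals = np.linalg.svd(centered, compute_uv=False) ** 2
explained = eigvals / eigvals.sum()
print("variance explained by first 5 components:", np.round(explained[:5], 3))
```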

  11. 11 7. Conclusions
  • Statisticians have studied relatively few "whole network" data sets.
  • Estimation of network-wide characteristics is usually done in collaboration with engineers and computer scientists.
  • More network-wide data sets, similar to those already studied, would help validate existing probability models.
  • Perhaps more joint work between statisticians and engineers/computer scientists would lead to more useful probability models.
  • A good way to start, maybe: if we fit the same models to wider network data, what would happen?
  • Having the priority questions clear helps determine data needs and useful sampling.


