CPN for Traffic Engineering

[Figure: CPN test-bed topology. CPN nodes a10, a20, a30, a40 and a100 interconnect an HTTP client and an HTTP server over subnets 10.0.10.0–10.0.17.0, with two WAN routers (ISP) providing external connectivity via 192.168.1.0 and 192.168.2.0.]
CPN Learns Paths with Minimum Hops

[Figure: Average path length (hops) versus traffic rate (Mbps) for four levels of background traffic (none, 1.6 Mbps, 3.2 Mbps and 6.4 Mbps), comparing the delay, hops and hops+delay goal functions.]
[Figure: Route length (hops) per packet over 2000 packets, and percentage of packets using each of the 12 discovered routes.]
• 12 routes were discovered, 4 of which are shortest paths
• Only about 51% of dumb packets (DPs) use these 4 shortest paths
Delay with different background traffic

[Figure: Dumb packet delay (ms) versus traffic rate (Mbps) with no, 3.2 Mbps (medium) and 6.4 Mbps (high) background traffic, comparing the delay, hops and hops+delay goal functions.]
Genetic Algorithms for CPN
• Exploit an analogy between "genotypes" and network paths
• The fitness of an individual is the QoS of the corresponding path
• A selection function chooses paths for reproduction based on their QoS
• The genetic operation used to create new individuals is crossover
Crossover

[Figure: Crossover example. Two parent paths between source S and destination D that share a common intermediate node exchange their sub-paths at that node, producing two new offspring paths.]
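A minimal sketch of this path-crossover idea (an illustration, not the authors' implementation): given two parent paths between the same source and destination, pick a node they share and splice the prefix of one with the suffix of the other. All node names below are made up.

```python
import random

def crossover(parent_a, parent_b):
    """Single-point crossover of two paths sharing source and destination.

    parent_a, parent_b: lists of node identifiers, e.g. ['S', 'x', 'y', 'D'].
    Returns two offspring paths, or the parents unchanged if the paths
    share no intermediate node.
    """
    # Candidate crossover points: nodes common to both parents,
    # excluding the shared endpoints.
    common = set(parent_a[1:-1]) & set(parent_b[1:-1])
    if not common:
        return parent_a, parent_b

    node = random.choice(sorted(common))
    ia, ib = parent_a.index(node), parent_b.index(node)

    # Splice the prefix of one parent with the suffix of the other at the shared node.
    child_1 = parent_a[:ia] + parent_b[ib:]
    child_2 = parent_b[:ib] + parent_a[ia:]
    return child_1, child_2

# Example: two parent paths from S to D sharing node 'y'.
p1 = ['S', 'x', 'f', 'e', 'y', 'D']
p2 = ['S', 'z', 'c', 'a', 'y', 'b', 'D']
print(crossover(p1, p2))
```

A real implementation would also remove any loops introduced in the offspring and verify that the spliced links actually exist in the network.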
[Figure: Forward delay (ms) and loss rate (%) versus packet input rate (100–800), GA-enabled versus GA-disabled, without background traffic and with about 1.6 Mbps of background traffic.]
[Figure: Forward delay (ms) and loss rate (%) versus packet input rate, GA-enabled versus GA-disabled, with about 4 Mbps and about 5.6 Mbps of background traffic.]
[Figure: Forward delay (ms) and loss rate (%) versus packet input rate, GA-enabled versus GA-disabled, with about 8 Mbps of background traffic.]
Experiments with Energy Savings: feasibility measurements using our 46-node Laboratory Packet Network Test-Bed.

E. Gelenbe and S. Silvestri, "Optimisation of Power Consumption in Wired Packet Networks," Proc. QShine'09, 22 (12), 717-728, LNICST, Springer Verlag, 2009.
Power Measurement on Routers
Example of Measured Router Power Profile
Experiments with a Self-Aware Approach: Minimise Power subject to an End-to-End Delay (80 ms) Constraint

[10] E. Gelenbe, "Steps Toward Self-Aware Networks," Comm. ACM, 52 (7), pp. 66-75, July 2009.
[15] E. Gelenbe and T. Mahmoodi, "Energy aware routing in the Cognitive Packet Network", presented at NGI/Co July 2010, submitted for publication.

Measuring average power over all routers vs average traffic per router (see the formulation sketched below).
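One way to state this objective formally (an illustrative formulation with assumed notation, not taken verbatim from [15]): minimise the total router power subject to an average end-to-end delay bound, where the routing variables determine both the load on each router and the resulting delay.

```latex
% Illustrative formulation (assumed notation):
% x_{f,P} = fraction of flow f routed on path P, \lambda_f = rate of flow f,
% P_i(\cdot) = power drawn by router i as a function of its load,
% D_f(x)   = resulting average end-to-end delay of flow f.
\min_{x}\; \sum_{i \in \mathcal{R}} P_i\!\Big(\sum_{f}\sum_{P \ni i} x_{f,P}\,\lambda_f\Big)
\quad \text{s.t.}\quad D_f(x) \le 80\,\text{ms}\;\;\forall f,\qquad
\sum_{P} x_{f,P} = 1,\; x_{f,P} \ge 0 .
```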
Power and Delay with EARP (Energy Aware Routing Protocol)
Power Savings and QoS using EARP
Wireless Power-Aware Adhoc CPN Test-Bed

[Figure: Wireless ad hoc CPN test-bed. Nodes 192.168.2.1, .20, .30, .40, .101, .102 and .103 form the ad hoc network; a Cisco router performs Network Address Translation towards the Computer Science Network (132.170.108.X) and the Internet.]
Monitoring Architecture
∗ Agent-based monitoring solution, consisting of
  ∗ monitoring managers (MMs), and
  ∗ monitoring agents (MAs).
∗ Monitoring agent
  ∗ Software agent collecting local monitoring metrics
  ∗ Resides on nodes (hosts and VMs)
  ∗ Interfaces with the monitoring manager
  ∗ Uses collectd and RRDtool
∗ Monitoring manager
  ∗ Controls and configures monitoring via the agents
  ∗ Provides an interface to the monitoring system
The Monitoring Agent
• The controller configures and controls probing of local resources by the collectd daemon.
• The controller starts, stops and restarts collectd as necessary.
• collectdmon monitors the collectd process and automatically restarts it if it fails.
• Periodic measurements from probes are written to a local DB via RRDtool.
• The controller reads measurements from the local DB via RRDtool to perform computations and make them available for reporting, acting as a mailbox.
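As an illustration of the last step, a controller could read the most recent samples that collectd has written to an RRD file using the standard `rrdtool fetch` command. The file path, consolidation function and time window below are assumptions, not the project's actual layout.

```python
import subprocess

def read_recent_samples(rrd_path, consolidation="AVERAGE", window_s=300):
    """Fetch the last `window_s` seconds of data from an RRD file written by collectd.

    rrd_path is hypothetical, e.g. '/var/lib/collectd/rrd/host1/load/load.rrd'.
    Returns a list of (timestamp, values) tuples; missing samples become None.
    """
    out = subprocess.run(
        ["rrdtool", "fetch", rrd_path, consolidation, "--start", f"-{window_s}"],
        capture_output=True, text=True, check=True,
    ).stdout

    samples = []
    for line in out.splitlines():
        if ":" not in line:          # skip the header line listing data sources
            continue
        ts, _, rest = line.partition(":")
        values = [None if "nan" in v.lower() else float(v) for v in rest.split()]
        samples.append((int(ts), values))
    return samples

# Example (path is hypothetical):
# print(read_recent_samples("/var/lib/collectd/rrd/host1/load/load.rrd"))
```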
Monitored Metrics
• System metrics
  – CPU, memory, disk, network, load, processes
• Application metrics
  – SQL databases, Apache server, memcached
• Network metrics
  – Latency, jitter, loss

[Figure: example time series of CPU utilisation and Apache throughput (bytes/sec).]
Collection of Monitored Data
• Monitoring agents (MAs) record periodic measurements of multiple metrics to a local repository.
• The monitoring manager collects the data from the MAs via smart reports (SRs), dumb reports (DRs) and acknowledgements (ACKs).
  – Smart reports are sent periodically by the MM, and collect the monitoring data from each hop (MA) they visit (via the corresponding ACK).
  – Smart reports are routed probabilistically, based on the random neural network [2], at each MM and MA, thereby prioritizing the parts of the cloud from which to collect data.
  – The ACK sent by the last MA in the chain carries the monitoring data back to the MM and updates the RNNs.

[2] E. Gelenbe. Random neural networks with negative and positive signals and product form solution. Neural Computation, 1(4): 502-510, 1989.
Collection of Monitored Data
• MMs and MAs have an RNN for each QoS goal, as related to the given requirements for monitoring.
• The users of the monitoring data provide the QoS goals, such as
  – system-related goals (e.g. free memory must be > 20% of total memory),
  – network-related goals (e.g. loss rate between VM 1 and VM 2 must be < 0.1%),
  – application-related goals (e.g. memory and CPU used by the Apache server must be < 80% of total system resources), and
  – monitoring-related goals (e.g. monitoring data from a host must not be older than 10 minutes).
• Each RNN has a neuron representing each resource (MA) connected to the MM or MA.
• The weights of the neurons are used to decide which parts of the cloud are prioritized for monitoring.
  – Smart reports are sent probabilistically based on the weights of the neurons (see the sketch below).
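A minimal sketch of how such a probabilistic choice could be made from neuron excitation levels (illustrative only; the real system uses the RNN of [2], whose excitation probabilities are computed from its positive and negative weights rather than the simplified scores used here).

```python
import random

def choose_next_agent(excitation):
    """Pick the next monitoring agent to visit, in proportion to the
    excitation level of the neuron representing each agent.

    excitation: dict mapping agent id -> non-negative excitation score
    (assumed already computed by the RNN for the relevant QoS goal).
    """
    total = sum(excitation.values())
    if total == 0:
        return random.choice(list(excitation))   # no information yet: explore uniformly
    r = random.uniform(0, total)
    acc = 0.0
    for agent, level in excitation.items():
        acc += level
        if r <= acc:
            return agent
    return agent  # fall-through for floating-point edge cases

# Example with hypothetical agents and scores:
print(choose_next_agent({"ma-1": 0.7, "ma-2": 0.2, "ma-3": 0.1}))
```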
Machine Learning Based Overlays: SMART

Proxy
1. Monitors the quality of the Internet paths by sending probe packets to other proxies.
2. Routes each incoming packet sent by a local application to its destination.

Transmission/Reception Agents
1. The TA intercepts the packets of the application and forwards them to the local Proxy.
2. The RA receives packets from the Proxy and delivers them to the local application.
Packet Routing/Forwarding
Main challenge: scalability
• All-pairs probing: monitoring the quality of all overlay links is excessively costly and impairs scalability.
• SMART: routing decisions are made online at the source proxy, based on adaptive learning techniques using a random neural network (RNN).
  – Probing does not cover all possible paths, but only a few paths which have been observed to be of good quality, while also exploring at random other paths whose quality might have improved recently (see the sketch below).
• SMART uses a limited monitoring effort, but achieves asymptotically the same average (per-round) end-to-end latency as the best path.
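The explore/exploit behaviour described above can be sketched as follows (an illustrative epsilon-greedy stand-in for SMART's RNN-based decision, with EWMA-smoothed latency estimates; all parameter values are assumptions).

```python
import random

EXPLORE_PROB = 0.1   # assumed probability of probing a random alternative path
EWMA_ALPHA = 0.3     # assumed smoothing factor for latency estimates

latency_estimate = {}   # path id -> smoothed observed latency (ms)

def pick_path(paths):
    """Mostly use the best-known path; occasionally explore another one."""
    unknown = [p for p in paths if p not in latency_estimate]
    if unknown:
        return random.choice(unknown)            # probe unseen paths first
    if random.random() < EXPLORE_PROB:
        return random.choice(paths)              # random exploration
    return min(paths, key=lambda p: latency_estimate[p])  # exploitation

def record_probe(path, measured_ms):
    """Update the EWMA latency estimate of a probed path."""
    old = latency_estimate.get(path, measured_ms)
    latency_estimate[path] = (1 - EWMA_ALPHA) * old + EWMA_ALPHA * measured_ms

# Example with hypothetical overlay paths:
for _ in range(5):
    p = pick_path(["direct", "via-proxy-A", "via-proxy-B"])
    record_probe(p, random.uniform(50, 200))     # stand-in for a real probe
print(latency_estimate)
```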
Minimising Latency

OD pair                 IP route   SMART
Melbourne/Gibraltar       390.0    274.6
Narita/Santiago           406.7    253.8
Moscow/Dublin             179.9     81.7
Hong Kong/Calgary         267.1    130.7
Singapore/Paris           322.3    154.1
Tokyo/Haifa               322.6    180.8

Experiment with 20 nodes. Average RTT (ms) for some of the NLNog ring pathological OD pairs.
Latency Minimization

[Figure: RTT Japan/Chile over 5 days, with a zoomed view.]
Throughput Maximization

OD pair               IP route   SMART
Dublin/Sydney             11.5    37.5
Singapore/Sao Paulo       12.8    42.0
Sydney/Virginia            8.5    52.3
Virginia/Singapore         7.4    33.8
Virginia/Sydney            6.9    35.0
Virginia/Tokyo            10.3    39.7

Experiment with Amazon EC2. Average throughput (Mbps) for some pathological OD pairs.
Throughput Maximization

[Figure: Throughput from Virginia to Sydney over 4 days, with a zoomed view.]
SMART: offline validation
• Carried out using two approaches (simulation and emulation),
  – based on the interaction between NEST and SMART,
  – aimed at enabling large-scale experimentation in a controlled environment.
• The simulation approach relies on the ability of
  – the NEST kernel to simulate complex underlay networks and evaluate the relevant key performance indicators,
  – SMART to optimize overlay routing based on the computed performance indicators.
• The emulation approach is built using an emulated network (by CORE), real applications (VMs) connected to it, and the SMART system. The simulation kernel drives this emulation by
  – generating stochastic traffic, injecting adverse events and computing overlay link performance,
  – changing link parameters within the emulation environment.
SMART: offline validation
• We also developed an overlay Hypervisor, a GUI for the SMART system.
• This hypervisor makes it possible to follow, at each time instant,
  – the routing decisions, and
  – the evolution of the metrics for each overlay path.
• The Hypervisor can be used either
  – in real time, displaying the current health status of an application overlay,
  – or to replay the evolution of key metrics within a specific time frame, based on historical data.
SMART: offline validation
• We performed several measurement campaigns, each lasting 24 h
  – The results presented are a zoom over 3 h periods
  – Tests used either delay or bandwidth probes
• Typically, SMART enhances network performance
• As an example, for downloading a 10 MB file, SMART achieves
  – a 124% improvement in available bandwidth
  – a 70.5% improvement in the user's total delay for the session:
    – on the IP path: 1m52s (737.4 Kb/s)
    – over SMART paths: 33s (2.42 Mb/s)
SDN Based Network Control: H2020 SerIoT Research & Innovation Project
SDN Based Network Control: SerIoT Project
• Use CPN and Smart Packets to actively collect measurement data in the IoT network
• Measurements concern security, QoS and energy
• Define relevant goal functions and a path sensitivity function
• The data reaches the SDN routers, which take decisions based on the above
SDN based Network Control

The goal function for security is $r(f,e)$, the "rejection factor" of node $e$ for path $f$:

$$r(f,e) = \left[\, S(f,e) - T(f,e) \,\right]^{+}$$

where $S(f,e)$ is estimated from attack detection and $T(f,e)$ is a tolerance threshold. For a path $P$:

$$r(f,P) = \sum_{e \in P} r(f,e), \quad \text{or} \quad r(f,P) = \max_{e \in P} r(f,e)$$

The overall goal function combines security, QoS and energy:

$$G(f,P) = a\,r(f,P) + b\,Q(f,P) + c\,E(f,P)$$
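As a small usage illustration of these formulas (not SerIoT code; the weights and numbers below are made up), the per-path rejection factor and the overall goal can be computed directly from per-node scores.

```python
def rejection(S, T):
    """Per-node rejection factor r(f,e) = [S(f,e) - T(f,e)]^+."""
    return max(S - T, 0.0)

def goal(per_node_scores, Q, E, a=1.0, b=1.0, c=1.0, aggregate=sum):
    """Overall goal G(f,P) = a*r(f,P) + b*Q(f,P) + c*E(f,P).

    per_node_scores: list of (S, T) pairs along path P.
    aggregate: sum or max, the two aggregations given above.
    """
    r_path = aggregate(rejection(S, T) for (S, T) in per_node_scores)
    return a * r_path + b * Q + c * E

# Hypothetical 3-hop path: attack scores S vs tolerance T per node,
# plus QoS and energy terms for the path.
print(goal([(0.9, 0.5), (0.2, 0.5), (0.6, 0.5)], Q=12.0, E=3.0, a=10.0))
```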
SDN based Network Control Experiments
SDN Network Control: Reinforcement Learning and SDN Reaction Time to QoS
SDN Reaction Time to Security Alert
Self-Aware Cloud Task Allocation Platform

[Figure: RNN-based task allocation platform dispatching jobs to hosts under heavy, medium and light load.]
ML Task Allocation for the Cloud
• An online QoS-driven task allocation platform for the cloud:
  – Performs autonomous task-to-resource allocations based on given QoS goals for the tasks
  – Uses concepts similar to those presented for the pervasive monitoring system and SMART, i.e. learning from online measurements using random neural networks
  – Aims to optimize the performance related to the QoS requirements requested by end users, e.g. job response time, economic cost, etc.
• The approach is to:
  – Make online measurements related to the given QoS goal(s) of tasks that need to be allocated to nodes (hosts, VMs) on the cloud
  – Adaptively learn the "best nodes" to which tasks should be allocated, depending on their QoS goals
  – Allocate incoming tasks to nodes based on the learned information, and continue to learn from the allocation decisions by monitoring the QoS received by the tasks
QoS Metrics

Metric category     Metric               Relevance: web servers,   Relevance: batch jobs,
                                         search engines            MapReduce jobs
Application level   Response time        High                      Low
                    Throughput           High                      High
System level        CPU utilization      Low                       High
                    Memory utilization   Low                       High
Other               Economic cost        Medium to High            Medium to High
                    Energy consumption   Medium to High            Medium to High
Allocation Strategies
• RNN with reinforcement learning [3]
  – Each RNN corresponds to a job class with a specified QoS goal.
  – Each neuron corresponds to the choice of a host in a cluster.
  – Smart reports are sent to the most excited neuron.
  – Jobs are dispatched to the host of the most excited neuron, thereby minimizing the QoS goal metric.
• Sensible decisions [4]
  – Smart reports are sent probabilistically based on the history of measurements.
  – No RNN: the history is kept as an exponentially weighted moving average (EWMA) per QoS goal.
  – Jobs are dispatched probabilistically using the EWMA metric for their QoS goals (see the sketch below).
• Round robin

[3] E. Gelenbe, K.-F. Hussain. Learning in the multiple class random neural network. IEEE Trans. Neural Networks, 13(6): 1257-1267, Nov. 2002.
[4] E. Gelenbe. Sensible decisions based on QoS. Computational Management Science, 1(1): 1-14, Dec. 2003.
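A minimal sketch of the "sensible decision" idea (an illustration inspired by [4], not the platform's actual code): each host keeps an EWMA of the measured QoS metric (e.g. response time), and jobs are dispatched with probability inversely related to that estimate, so all hosts still receive occasional traffic and fresh measurements.

```python
import random

EWMA_ALPHA = 0.2   # assumed smoothing factor

ewma = {"host-1": 1.0, "host-2": 1.0, "host-3": 1.0}   # measured QoS metric (e.g. seconds)

def sensible_dispatch():
    """Choose a host with probability proportional to the inverse of its
    EWMA QoS metric, so better-performing hosts receive more jobs."""
    weights = {h: 1.0 / m for h, m in ewma.items()}
    total = sum(weights.values())
    r = random.uniform(0, total)
    acc = 0.0
    for host, w in weights.items():
        acc += w
        if r <= acc:
            return host
    return host

def record_measurement(host, observed):
    """Update the host's EWMA with the QoS observed for a completed job."""
    ewma[host] = (1 - EWMA_ALPHA) * ewma[host] + EWMA_ALPHA * observed

# Example: dispatch a job and record a (made-up) response time.
h = sensible_dispatch()
record_measurement(h, random.uniform(0.5, 3.0))
print(h, ewma)
```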
Experimental Setup
∗ One job controller
∗ Three hosts
  ∗ The hosts have different processing speeds and I/O capacities.
∗ Software components:
  ∗ Job generator (clients)
  ∗ Job dispatcher (controller)
  ∗ Job executor (hosts)
Experimental Results

Scenario: hosts with distinct processing speeds

[Figure: RNN vs. Sensible Allocation vs. Round Robin, with a detail taken from the figure on the right.]
Experimental Results
∗ Scenario:
  ∗ Different service (job) classes in terms of resource requirements
    ∗ CPU-intensive services and I/O-bound services
  ∗ Different background load on the hosts, stressing CPU and I/O differently
SLA Penalties

[Figure: SLA penalty functions. Penalty (up to 10000) versus job execution time: for Job1 over 0–300 seconds, and for Job2 over 0–3000 seconds.]
Experimental Results for Penalties
Experimental Results

Scenario: hosts with distinct power consumption and speed

Measured job dispatching probabilities that minimise the Energy-QoS goal function, for different relative importance attributed to response time (a) and energy consumption (b), versus load in jobs/sec.
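The Energy-QoS goal function is not spelled out on this slide; a plausible form, consistent with the weights a and b mentioned above (an assumption, not a quotation), is a weighted combination of the measured response time and the energy consumed per job:

```latex
% Assumed form: for host i, W_i = measured job response time,
% J_i = energy consumed per job, with relative-importance weights a and b.
G_i = a\, W_i + b\, J_i , \qquad a, b \ge 0 ,
```

with the dispatching probabilities then chosen so as to minimise this composite goal.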
Experimental Results

Measured energy consumption per job (in Joules) and job response time at the hosts (in seconds), averaged over the three hosts, as a function of the job arrival rate λ.
Task Allocation across Multiple Clouds
∗ A TAP is deployed in each cloud.
∗ The routing overlay, SMART, includes a set of proxies installed at different cloud servers, or in other servers over wide area networks.
∗ Allocating the incoming jobs to the local or remote clouds depends on
  ∗ the cloud processing delay, and
  ∗ the network delay for job transfer.
Experimental Results

Weighted average network delay over time on the connections to the three remote clouds, derived from measurements of the round-trip delay and loss via pinging. The network performance is optimized using the routing overlay SMART across the clouds.
Experimental Results

The measured response time for each web request as time elapses; the different colours represent the different clouds where the requests are allocated and processed. As long as the local cloud's response time is low, tasks are processed locally. When the local response time increases significantly, tasks are sent to the remote cloud; when things improve at the local cloud, they are again executed locally.
Distributed DoS
Issues that we have addressed
• Detection
  – Pattern detection
  – Anomaly detection
  – Hybrid detection
  – Third-party detection
• Response
  – Agent identification
  – Rate-limiting
  – Filtering
  – Reconfiguration
CPN DDoS Defence Scheme
• The CPN architecture traces flows using smart and ACK packets
• A DoS attack produces QoS degradation
• The user(s) and victim(s) detect the attack and inform the nodes upstream from the victim(s) using ACK packets
• These nodes drop possible DoS packets
• The detection scheme is never perfect (false alarms & detection failures)
Mathematical model (1)
• Analyses the impact of DDoS protection on overall network performance
• Measures traffic rates in relation to service rates and detection probabilities
Mathematical model

Arrival rate of normal flow $n$ at the $j$-th node $n_j$ on its path:

$$I_{n_j,n} = \lambda_n \prod_{l=1}^{j-1} \big((1 - L_{n_l,n})(1 - f_{n_l,n})\big)$$

Arrival rate of DoS flow $d$ at the $j$-th node $d_j$ on its path:

$$I_{d_j,d} = \lambda_d \prod_{l=1}^{j-1} \big((1 - L_{d_l,d})(1 - d_{d_l,d})\big)$$

Loss probability at node $i$ with buffer size $B_i$ and utilisation $\rho_i$:

$$L_i = \rho_i^{B_i} \cdot \frac{1 - \rho_i}{1 - \rho_i^{B_i + 1}}$$

Utilisation of node $i$ with service time $s_i$:

$$\rho_i = s_i \left( \sum_{n} I_{i,n}\,(1 - f_{i,n}) + \sum_{d} I_{i,d}\,(1 - d_{i,d}) \right)$$
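To make the formulas concrete, here is a small numerical sketch (illustrative values only, interpreting rates in packets/s and s_i as a mean service time in seconds) that evaluates the utilisation and loss probability of a single node carrying one normal and one attack flow.

```python
def loss_probability(rho, buffer_size):
    """Finite-buffer loss probability L_i = rho^B * (1 - rho) / (1 - rho^(B+1))."""
    if rho == 1.0:                       # limit case of the formula
        return 1.0 / (buffer_size + 1)
    return rho ** buffer_size * (1 - rho) / (1 - rho ** (buffer_size + 1))

def utilisation(service_time, normal_in, false_alarm, attack_in, detection):
    """rho_i = s_i * (I_normal*(1 - f) + I_attack*(1 - d)) for one flow of each type."""
    return service_time * (normal_in * (1 - false_alarm) + attack_in * (1 - detection))

# Hypothetical numbers: 1 ms service time, 100 packets/s of normal traffic,
# 500 packets/s of attack traffic, 5% false alarms, 90% detection, buffer of 50.
rho = utilisation(0.001, normal_in=100, false_alarm=0.05, attack_in=500, detection=0.90)
print("utilisation:", rho)
print("loss probability:", loss_probability(rho, buffer_size=50))
```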
Illustration on an Experimental CPN Test-Bed
Predictions of the Model
Experiments
• 2.4 GHz P4 PCs, Linux kernel 2.4.26, CPN
• Different QoS protocols for normal and attack traffic
• 60 sec
Trace 1 --- Attack Traffic

Averaged Likelihood:  False Alarms: 2.8%   Correct Detections: 88%
Feedforward RNN:      False Alarms: 11%    Correct Detections: 96%
Recurrent RNN:        False Alarms: 11%    Correct Detections: 96%
Trace 2 --- Attack Traffic

Averaged Likelihood:  False Alarms: 0%     Correct Detections: 76%
Feedforward RNN:      False Alarms: 8.3%   Correct Detections: 84%
Recurrent RNN:        False Alarms: 2.8%   Correct Detections: 80%
Stable Networks in the Presence of Failures
• A node failure is emulated by disabling some or all of the Ethernet interfaces of a node (so that no traffic can go through that node, just as in a real breakdown of a machine), and is restored by re-enabling all the interfaces.
• Failures are considered to be similar to worm attacks, and their spread is modelled according to the AAWP (Analytical Active Worm Propagation) model. This is a discrete-time, continuous-state deterministic approximation model, proposed by Chen et al. [1] to model the spread of active worms that employ random scanning.
• The AAWP is a discrete-time model, which is more realistic since a node must be completely infected before it starts infecting other nodes, so that the speed of the spread is tied to the number of nodes that can be infected.
• AAWP uses the random-scanning mechanism, in which it is assumed that every node is just as likely to infect, or be infected by, others. Other scanning mechanisms can be used, such as local subnet, permutation and topological scanning. A sketch of the AAWP iteration is given below.
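For illustration, the AAWP recurrence of Chen et al. can be written as follows (a sketch based on the standard form of the model; the number of vulnerable hosts, scan rate and address-space size are assumed example values, not figures from these experiments).

```python
def aawp(vulnerable, scan_rate, initially_infected, address_space=2**32, ticks=20):
    """Discrete-time AAWP recurrence for a randomly scanning worm:
    m_{t+1} = m_t + (N - m_t) * (1 - (1 - 1/T)^(s * m_t))
    where N = vulnerable hosts, T = scanned address space, s = scans per tick.
    Returns the number of infected hosts after each tick."""
    m = float(initially_infected)
    history = [m]
    for _ in range(ticks):
        # Probability that a given host is hit by at least one scan in this tick.
        p_hit = 1 - (1 - 1 / address_space) ** (scan_rate * m)
        m = m + (vulnerable - m) * p_hit
        history.append(m)
    return history

# Hypothetical example: 350,000 vulnerable hosts, 100 scans per tick, 1 initial infection.
print([round(x) for x in aawp(vulnerable=350_000, scan_rate=100, initially_infected=1)])
```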
Experiments
DoS on a streaming video