1st IEEE Workshop on High-Performance Interconnection Networks towards the Exascale and Big-Data Era
Chicago, 8 September 2015

Modeling a Large Data-Acquisition Network in a Simulation Framework

Tommaso Colombo 1,2 • Holger Fröning 2 • Pedro Javier García 3 • Wainer Vandelli 1
1 Physics Department, CERN
2 Institut für Technische Informatik, Universität Heidelberg
3 Departamento de Sistemas Informáticos, Universidad de Castilla-La Mancha
Data-acquisition systems
● In a scientific experiment, a data-acquisition (DAQ) system handles the experimental signals
● Main functions:
  – Signal processing (e.g. analog-to-digital conversion)
  – Data gathering (collection of signals from different devices)
  – Filter / Trigger (discarding faulty / uninteresting data)
  – Storage
● Usually implemented as a mix of custom hardware and software running on commodity hardware
Data-acquisition systems
● Key requirement: DAQ efficiency
  – Fraction of correctly acquired experimental data
  – Ideally 100%: experimental data is precious!
  – An inefficient DAQ might introduce bias in the data
  ⬇
● Stringent requirements on:
  – System availability
  – Buffer depth
  – Latency
Data-acquisition systems
● Systematically studying the performance envelope of a DAQ system is difficult:
  – A DAQ system is a mission-critical component of an experiment
  – System availability for performance studies is limited
  – Hardware or system software modifications are usually not possible
● Simulation models give more freedom
  – Must be accurate enough in reproducing the system's behavior
  – Must be reasonably fast
Case study: the ATLAS experiment
Large-scale machine built to discover and study rare particle physics phenomena
Case study: the ATLAS experiment
Observes proton collisions delivered by the LHC accelerator at CERN
Case study: the ATLAS experiment
● Basic parameters:
  – LHC delivers a collision “event” every 25 ns (40 MHz)
  – Each event is separately detected and measured
  – An event corresponds to 1-2 MB
● The data-acquisition system incorporates a data filtering component:
  – If all collision events were acquired, ATLAS would produce up to 80 TB/s and hundreds of EB per year!
  – After two filtering stages, ~1/10000 events survive
  – Data is recorded at 1-4 GB/s
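A quick sanity check of these numbers (a minimal Python sketch; the ~10^7 s of LHC live time per year is an assumption, and the ~1 kHz final output rate is taken from a later slide):

```python
# Sanity check of the ATLAS data rates quoted above.
EVENT_RATE_HZ = 40e6           # one collision "event" every 25 ns
EVENT_SIZE_MB = 2.0            # upper end of the 1-2 MB range
LIVE_SECONDS_PER_YEAR = 1e7    # assumed LHC live time per year (not stated on the slide)

raw_tb_per_s = EVENT_RATE_HZ * EVENT_SIZE_MB / 1e6       # MB/s -> TB/s
yearly_eb = raw_tb_per_s * LIVE_SECONDS_PER_YEAR / 1e6    # TB -> EB

# The two filtering stages reduce the 40 MHz input to roughly 1 kHz (later slides);
# 1-2 MB events at ~1 kHz give the 1-4 GB/s recording rate quoted here.
FINAL_RATE_HZ = 1e3
recorded_gb_per_s = FINAL_RATE_HZ * EVENT_SIZE_MB / 1e3   # MB/s -> GB/s

print(f"raw rate:      {raw_tb_per_s:.0f} TB/s")      # -> 80 TB/s
print(f"yearly volume: {yearly_eb:.0f} EB")           # -> ~800 EB, i.e. hundreds of EB
print(f"recorded rate: {recorded_gb_per_s:.0f} GB/s") # -> ~2 GB/s, within 1-4 GB/s
```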
First stage: custom hardware
● Synchronous, pipelined electronics
● Selects and acquires 1/400 events
● 40 MHz input, 100 kHz output
● ~80 million input channels, aggregated into ~2000 outputs (“Event fragments”)
● Output is striped over ~100 “Readout” nodes with deep buffers
[Diagram: ATLAS DAQ/HLT data flow — readout channels (~80 million) → readout drivers (~1800) → Level-1 trigger → readout systems / readout buffers (~100) → High-Level Trigger farm (~2000 worker nodes with an HLT supervisor, per-node Data Collection Manager, and HLT processing units) → data loggers (~10) → permanent storage]
Second stage: distributed software
● Commodity hardware: ~10000 CPU cores in ~2000 worker nodes
● Events are processed in parallel, as soon as acquired by the first stage
● 100 kHz input, ~1 kHz output
[Diagram: same ATLAS DAQ/HLT data-flow diagram as on the previous slide]
Second stage: distributed software
● A “Supervisor” assigns events to free cores (“Processing Units”)
● Each Unit handles a different event:
  – Retrieves the event fragments
  – Decides if the event is to be kept
  – Avg time per event: 50 ms
● I/O is mediated by a per-node “Manager”
[Diagram: same ATLAS DAQ/HLT data-flow diagram as on the previous slide]
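A minimal sketch of the per-event flow described above; `supervisor`, `manager`, and `filter_event` are hypothetical stand-in interfaces, not the actual ATLAS software:

```python
# Sketch of one High-Level Trigger Processing Unit.
def processing_unit_loop(supervisor, manager, filter_event):
    """Handle one event at a time, as described on this slide."""
    while True:
        event_id = supervisor.next_event()      # Supervisor assigns an event to this free core
        fragments = manager.collect(event_id)   # per-node Manager retrieves the event fragments
        accept = filter_event(fragments)        # decide if the event is to be kept (~50 ms on average)
        supervisor.done(event_id, accept)       # report the decision; rejected events are cleared upstream
```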
Second stage: commodity hardware
● Datacenter technologies
● Two large 10GbE routers
  – Several hundred ports each
● Readout buffer nodes
  – 2x 10GbE links to each router
● Worker nodes organized in racks of 40 nodes each
  – One switch per rack
  – GbE links from nodes to switch
  – 10GbE links from switch to each core router
[Diagram: network topology — ~98 Readout Systems connected by 10 Gbps links (196 per router) to two core routers; 50 HLT racks of 40 nodes each, with 1 Gbps node-to-switch links and a 10 Gbps uplink from each rack switch to each router; the HLT Supervisor and ~10 Data Loggers are also attached to the routers]
DAQ traffic pattern
● Need to aggregate data from different instruments
  ➡ Communication pattern: many-to-one
● Data transfers are driven by the experimental conditions
  ➡ Bursty traffic
● In ATLAS:
  – Event fragments are striped over all the readout nodes
  – A processing unit needs fragments from multiple nodes at the same time
  – Many nodes will start sending fragments at the same time to the same destination, creating instantaneous network congestion
[Diagram: data funneling from the Readout Systems through a core router (10 Gbps links) into a rack switch and down a 1 Gbps link to an HLT node — a bandwidth mismatch at the last hop]
DAQ traffic pattern
● On a lossy network such as Ethernet, the DAQ traffic pattern leads to the incast pathology:
  – A client (the worker node) simultaneously receives short bursts of data from multiple sources (the readout nodes)
  – The switch buffers overflow
  – All the packets from one source are dropped
  – TCP congestion control mechanisms cannot prevent this
● A dramatic increase in data transfer latency is observed
  – Incast triggers slow TCP timeout-based retransmission
  – Causes under-utilization of computing power
  – Can lead to violating DAQ latency requirements
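A back-of-envelope sketch of the buffer overflow: a minimal model assuming a whole ~2 MB event arrives as a single synchronized burst over the 10 Gbps path and is drained at 1 Gbps (the buffer sizes are the ones quoted in the test setup later in this talk):

```python
# Why a synchronized fragment burst overflows a per-port switch buffer.
BURST_BYTES       = 2e6      # ~2 MB: all fragments of one event sent at once (simplification)
UPSTREAM_BW_BPS   = 10e9     # the burst reaches the rack switch at up to 10 Gbps
DOWNSTREAM_BW_BPS = 1e9      # it is drained towards the worker node at only 1 Gbps

# While the burst is arriving, the switch buffer fills at the rate difference.
backlog_bytes = BURST_BYTES * (1 - DOWNSTREAM_BW_BPS / UPSTREAM_BW_BPS)   # ~1.8 MB

PER_PORT_BUFFER_BYTES = 600e3   # per-port buffered switch (see the test setup below)
SHARED_BUFFER_BYTES   = 10e6    # shared-buffer switch

print(f"peak backlog: ~{backlog_bytes / 1e6:.1f} MB")
print("per-port buffer (600 kB):", "overflows -> drops" if backlog_bytes > PER_PORT_BUFFER_BYTES else "fits")
print("shared buffer (10 MB):   ", "overflows -> drops" if backlog_bytes > SHARED_BUFFER_BYTES else "fits")

# Dropped packets are recovered only by a TCP retransmission timeout (typically
# hundreds of ms), far longer than the ~16 ms needed to drain 2 MB at 1 Gbps --
# hence the dramatic latency increase.
```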
DAQ traffic pattern
● Simple incast mitigation strategy: client-side traffic shaping
  – Smoothing the rate of data requests limits the maximum size of the traffic bursts
● Key metric: data collection time
  – Time required to gather all fragments of an event
● Implementation in ATLAS:
  – Each worker node has a fixed number of credits available
  – Each requested fragment “costs” one credit
● Results:
  – Few traffic shaping credits: data collection time grows because the worker nodes cannot fully utilize the network bandwidth
  – Many traffic shaping credits: high latency due to incast
  – Optimal working point must be found manually
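A minimal sketch of such a credit-based shaper (illustrative only: the asyncio framing and the `request_fragment` callable are assumptions, not the ATLAS implementation):

```python
import asyncio

class CreditShaper:
    """Client-side traffic shaping: at most `credits` fragment requests in flight."""
    def __init__(self, credits: int):
        self._credits = asyncio.Semaphore(credits)   # fixed per-node pool of credits

    async def fetch(self, request_fragment, readout_node, event_id):
        async with self._credits:                    # each outstanding request costs one credit
            return await request_fragment(readout_node, event_id)

async def collect_event(shaper, request_fragment, readout_nodes, event_id):
    """Gather all fragments of one event and report the data collection time."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    fragments = await asyncio.gather(
        *(shaper.fetch(request_fragment, node, event_id) for node in readout_nodes))
    return fragments, loop.time() - start
```

Too few credits leave the 1 Gbps link idle; too many recreate the synchronized burst, which is why the working point has to be tuned by hand.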
Quantifying the problem
● Measurements in test system: one worker rack
● Synthetic traffic:
  – 2.1 MB events, assigned to Processing Units at 750 Hz
  ➡ 1.6 GB/s input
● Core routers have huge buffers ➡ no drops
● Two worker rack switches tested:
  – Per-port buffers (600 kB each)
  – Shared buffers (2x 10 MB)