LHCONE NETWORK SERVICES: GETTING SDN TO DEV-OPS IN ATLAS
Shawn McKee/Univ. of Michigan
LHCONE/LHCOPN Meeting, Taipei, Taiwan
March 14th, 2016

Context for this Presentation
• Within LHCONE we have had a point-to-point service effort for quite a while.
  – It has been challenging to make progress beyond a few limited demonstrations.
• Within the LHC experiments there has been interest in what might be possible with networking, especially in how a future production-quality software-defined networking (SDN) capability would fit with the way the experiments manage, operate and orchestrate their globally distributed resources.
  – Network device support for SDN has not really been “production quality”. It is hard to interest the experiments in even testing because of problems getting anything enabled between sites of interest.
• How best to make some progress?

Challenges Getting SDN into LHC Production Systems
• While we have dabbled as a community for years with various SDN capabilities, we have never managed to effectively bridge the gap into the core LHC experiment middleware and workflow systems. Why?
  – The experiments have their own “stove-pipes” of effort and there hasn’t been much interaction with networking.
  – The experiments have focused on what they perceive as the bigger problems they must face.
    • We have helped ensure the network has been the most reliable and capable component of their distributed infrastructure.
  – Our test implementations are typically one-offs designed to demonstrate features and capabilities, and they are not easily translated into use with existing production systems.
  – SDN itself (both software and hardware) has not been near “production quality” to date.
    • This is improving rapidly: new hardware/chipsets are much more capable, and software usability is improving.

Getting SDN to the Ends using Dev-Ops
• To make progress with SDN capabilities for LHC we need to start focusing on enabling new SDN features in production instances, blending production and development.
  – The software and technology community calls this “dev-ops” (Development and Operations).
• One shortcoming in the P2P effort to date has been the significant challenge in getting all the way to the ends: to the servers that source and sink our data.
  – We have been able to create WAN circuits, but it then gets “messy” how those are actually used to carry the right traffic for production activities.
• We now have an interesting option to help us: Open vSwitch (openvswitch.org).
  – This is well-tested, supported software for creating virtual switches on Linux (and other OSs), with traffic control and shaping and with OpenFlow and OVSDB support.

Details on Deploying Open vSwitch (OVS)
• The wiki page below documents both the creation of RPMs for Red Hat/CentOS/SL 6.x and their deployment onto existing hosts:
  – https://www.aglt2.org/wiki/bin/view/AGLT2/InstallOpenvSwitch
  – This page will soon provide detailed and tested configuration examples for implementing OVS on hosts with various types of network configuration (bonded, VLANs, multiple interfaces, etc.).
• The idea is to move your system’s IP addresses off their existing physical (or virtual OS) NICs and onto the OVS bridge you will bring up.
• OVS can be installed and turned on without any impact on the running system (install RPM, activate service).
  – It is moving the IP that is potentially disruptive and must be done with some care. The URL above has details.
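The IP move described on this slide can be sketched as an ordered command plan. This is a minimal sketch under stated assumptions, not the procedure from the wiki page: the interface name `eth0`, the bridge name `br0`, and the addresses are hypothetical, and the function only builds the `ovs-vsctl` and `ip` commands instead of running them, because the move itself is the disruptive step.

```python
def plan_ip_move(nic, bridge, cidr, gateway=None):
    """Return the ordered commands (as argv lists) that move an IP
    address from a NIC onto an OVS bridge. Nothing is executed here."""
    plan = [
        # Create the bridge; existing traffic is not affected yet.
        ["ovs-vsctl", "--may-exist", "add-br", bridge],
        # From this point connectivity is at risk until the plan completes.
        ["ovs-vsctl", "--may-exist", "add-port", bridge, nic],
        ["ip", "addr", "del", cidr, "dev", nic],
        ["ip", "addr", "add", cidr, "dev", bridge],
        ["ip", "link", "set", bridge, "up"],
    ]
    if gateway is not None:
        # The default route must follow the address onto the bridge.
        plan.append(
            ["ip", "route", "replace", "default", "via", gateway, "dev", bridge]
        )
    return plan


if __name__ == "__main__":
    # Hypothetical host: eth0 holds 192.0.2.10/24, gateway 192.0.2.1.
    for argv in plan_ip_move("eth0", "br0", "192.0.2.10/24", gateway="192.0.2.1"):
        print(" ".join(argv))
```

Printing the plan first lets an operator review it and then run it from a console session or as one script, so that a dropped SSH connection does not leave the host half-configured. Bonded or VLAN-tagged hosts would need a different plan.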

Advantages of OVS on Production Instances
• By getting OVS in place on LHC production storage systems we immediately gain visibility and control all the way to the sources and sinks of data flows for LHC.
• We have verified that OVS has almost no measurable impact when shaping traffic on 10G NICs (see Ramiro’s presentation at the last LHCONE meeting: https://indico.cern.ch/event/401680/contribution/16/attachments/1178611/1705261/LHCONE-AM_SDN_OVS_rv1.pdf).
• Having OVS running on production systems with the IPs moved to the OVS bridge allows us to continue to operate all production services exactly as they were operated prior to installation and configuration.
  – The big win is that we can start to do simple tests incorporating specific flows or sets of servers into end-to-end circuits.
  – Gradually, we can verify the impact of using such capabilities with LHC production systems and, if positive, it makes a strong argument for other sites to begin joining the effort.
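As an illustration of the traffic shaping mentioned on this slide, the sketch below builds an `ovs-vsctl` command that attaches a `linux-htb` QoS record with one default queue to a port. It is an assumption-laden example, not the configuration used in the cited tests: the port name `eth0` and the 5 Gbit/s cap are hypothetical, and the command is only constructed, not executed.

```python
def shaping_command(port, max_rate_gbps):
    """Build an ovs-vsctl command that caps egress on `port` using a
    linux-htb QoS record with a single default queue (queue 0)."""
    if max_rate_gbps <= 0:
        raise ValueError("max_rate_gbps must be positive")
    # OVS takes max-rate in bits per second.
    bps = int(max_rate_gbps * 1_000_000_000)
    return [
        "ovs-vsctl", "set", "port", port, "qos=@newqos",
        "--", "--id=@newqos", "create", "qos", "type=linux-htb",
        f"other-config:max-rate={bps}", "queues:0=@q0",
        "--", "--id=@q0", "create", "queue",
        f"other-config:max-rate={bps}",
    ]


if __name__ == "__main__":
    # Hypothetical example: cap eth0 at 5 Gbit/s on a 10G NIC.
    print(" ".join(shaping_command("eth0", 5)))
```

Building the command as an argv list keeps the rate conversion in one place and makes the result easy to log or review before it is applied.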

Diagram of Possible Future SDN Dev-Ops Testbed Interfaces
[Diagram labels: PanDA/DaTri; Agent; NSA_1, NSA_N; OVS agents at Site A and Site B; Control Plane; Data Plane; LHCONE p-t-p Multi-domain Fabric; STP A, STP B; Transfer Node (OVS+FDT/GridFTP) at each site; OVS tail (site dependent); Start transfer. Legend: “In development” and “Currently in place”.]
Steps shown:
1) Request WAN circuit
2) Integrate circuit with OVS
3) Transfer on new E2E path
Original slide from Ramiro/Azher, Caltech
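The three numbered steps in the diagram can be sketched as a small workflow. Everything here is illustrative: `request_circuit` is a hypothetical placeholder for whatever circuit service an agent would call, the VLAN ID, subnet, bridge name and port number are made up, and the OpenFlow rule is one possible way to steer traffic onto a circuit, not the method the testbed uses.

```python
from dataclasses import dataclass


@dataclass
class Circuit:
    vlan_id: int        # VLAN handed off by the WAN circuit (hypothetical)
    remote_subnet: str  # storage subnet at the far site (hypothetical)


def request_circuit(site_a, site_b):
    """Step 1 placeholder: a real agent would call the circuit service."""
    return Circuit(vlan_id=3001, remote_subnet="198.51.100.0/24")


def steering_flow(circuit, uplink_port):
    """Step 2: OpenFlow rule that tags traffic for the remote subnet
    with the circuit VLAN and sends it out the uplink port."""
    return (
        f"priority=100,ip,nw_dst={circuit.remote_subnet},"
        f"actions=mod_vlan_vid:{circuit.vlan_id},output:{uplink_port}"
    )


def run_workflow(site_a, site_b, bridge="br0", uplink_port=1):
    """Return the three steps in order; nothing is executed."""
    circuit = request_circuit(site_a, site_b)
    flow = steering_flow(circuit, uplink_port)
    return [
        ("request", f"circuit {site_a} -> {site_b} on VLAN {circuit.vlan_id}"),
        ("integrate", ["ovs-ofctl", "add-flow", bridge, flow]),
        ("transfer", f"start transfer to {circuit.remote_subnet} on new E2E path"),
    ]


if __name__ == "__main__":
    for name, action in run_workflow("Site A", "Site B"):
        print(name, action)
```

Keeping the steps as data rather than side effects makes the ordering explicit: the flow rule is only meaningful once the circuit exists, and the transfer should start only after the rule is in place.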

Challenges
• While having OVS “at the ends” will be a huge step forward for our point-to-point work, a number of challenges remain.
• The primary challenge is integrating existing circuit creation systems with OVS as a participant.
  – How can we incorporate the OVS-enabled end-systems seamlessly into the end-to-end circuit?
• How best to use the many OVS features to improve the overall performance of the circuit?
• The main “meta-question”: How can SDN capabilities improve the LHC experiments’ ability to manage, utilize and optimize their global infrastructure?
  – There is a lot of work to do to investigate this. Getting SDN “in-line” with production LHC work is our first step!

Next Steps
• Finalize testing of OVS configurations to support various network configurations.
• AGLT2 (Michigan, Michigan State) and MWT2 (Illinois, Indiana and University of Chicago) have agreed to deploy OVS onto their ATLAS dCache storage systems.
  – Total of 8.7 petabytes of storage between the two
  – Most systems are dual 10G connected; the sites have 80 Gbit/s to the WAN
  – This will provide an example to experiment with SDN end-to-end using real ATLAS production traffic
• We want to expand as soon as is feasible. Interest from:
  – DE-KIT
  – Possible Canadian participation
  – Seeking additional sites with real use cases (at least one more in North America and in Europe)
• Timescale: April-May 2016 for initial tests (assumes documentation and initial OVS configurations are written and tested by end of March).
• Email Shawn McKee if your site is interested in participating.

QUESTIONS & COMMENTS
Shawn McKee
smckee@umich.edu
