Business Continuity at DESY: a collection of themes and thoughts

Business Continuity at DESY: a collection of themes and thoughts, covering among others measures, procedures and dependencies
Peter van der Reest, Yves Kemp, DESY IT
HEPiX Spring 2014, 21.05.2014

General DESY risk assessment
> DESY performs a general, yearly risk assessment
  - This is a formal process
  - Risks from all possible fields, including financial and other external ones
  - Also covers IT
> Risk assessment performed by separate DESY entities
  - E.g. administration, machine control, …
  - Not always a formal process
  - Written/oral reports from units to the directorate after incidents
> “DESY is an experiment-oriented laboratory” translates into “IT is second in priority for e.g. power and cooling after accelerators and experiments”
  - Does not mean that IT is neglected!
Yves Kemp | Business Continuity at DESY | 21.5.2014 | Page 2

ISO 27001 certification
> Background: the DESY project management office is asked by funding agencies to certify that its procedures and infrastructures conform to ISO 27001
  - Includes IT … which is most of central IT
> An external consultant is first evaluating the status and estimating the work and costs of such a certification
> So far, interviews with all relevant groups within IT
> First impression: many requirements concerning setup and workflows are met, but formal documentation of processes should be enforced

Network and IDS
> Scanning networks and testing ports
  - Get to learn who does what, e.g. “Who is running an HTTPS server?” (Heartbleed)
  - See differences, e.g. when malware listens on ports
> Efforts to separate different networks
  - Or define relations between networks
  - Incident containment
> Investigations into flow monitoring
  - Checking for unusual patterns in network traffic
> Network interventions and glitches have huge impact
> Linux: dedicated intrusion detection software on (most) systems
> Windows: no dedicated IDS; anti-virus also catches some intrusions
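The idea of "seeing differences" between port scans can be illustrated with a minimal Python sketch. It assumes scan results are already available as a mapping from host name to the set of open TCP ports; that snapshot format, the host names, and the function names are hypothetical and not taken from the talk, which does not describe the tooling actually used.

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot format (an assumption): host name -> set of open TCP ports.
Snapshot = Dict[str, Set[int]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, Tuple[Set[int], Set[int]]]:
    """Return, per host, the ports (opened, closed) between two scans.

    Hosts whose open ports did not change are left out of the result.
    """
    changes: Dict[str, Tuple[Set[int], Set[int]]] = {}
    for host in sorted(set(old) | set(new)):
        before = old.get(host, set())
        after = new.get(host, set())
        opened = after - before   # newly listening ports, worth a closer look
        closed = before - after   # services that disappeared
        if opened or closed:
            changes[host] = (opened, closed)
    return changes


def hosts_with_port(snapshot: Snapshot, port: int) -> list:
    """Answer questions like "who is running an HTTPS server?" (port 443)."""
    return sorted(host for host, ports in snapshot.items() if port in ports)


if __name__ == "__main__":
    # Illustrative data only; these hosts do not exist.
    yesterday = {"web01": {22, 443}, "db01": {22}}
    today = {"web01": {22, 443}, "db01": {22, 31337}}
    print(diff_snapshots(yesterday, today))
    print(hosts_with_port(today, 443))
```

A newly opened port on a host that should only offer SSH is the kind of difference the slide refers to; whether it is malware or a legitimate new service still needs a human check.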

CC operation and communication to users
> Operational aspects
  - Control room, workdays 8:00-20:00 with operator-on-duty
  - On-call operator at all other times
> User Consulting Office (UCO)
  - Generates user documentation
  - Handles first-level requests and troubleshooting
  - Organizes communication with users in disaster situations, e.g. also by pinning paper notices about network outages to the entry doors of buildings

Computing Center and Power
> Three independent power lines to the HH campus, two used by IT in rooms 1 & 2 (same building)
> These two lines are shared with other groups on campus
> Two independent lines with generally good and stable quality
> Have battery-powered UPS, but mainly to flatten out voltage fluctuations or bridge very short interruptions (~20 minutes)
> ~2 years ago, we had disturbances in the internal power distribution system: complete blackout … other independent power feeds would not have helped

Cooling
> Climate (also in CC) not under IT control
  - ... The same for power distribution
  - More communication with infrastructure groups needed to make them understand our needs for separation and decoupling (which is more expensive)
> Cooling redundancy: cold water ring
  - On HH campus, 8.4 MW total, 2 MW for IT
  - Two inputs: overhauled HERA cooling and new highly efficient PETRA III cooling
  - Currently ring not closed, more like a bus
> Cooling redundancy: distribution in the CC
  - Recent incident: work on increasing redundancy of in air cooling for room 1 resulted in a cascade of short circuits that stopped cooling of water-cooled racks in room 2
  - (Some) water-cooled racks react very fast to cooling disturbances because of the small amount of air

General comments on cooling and power
> IT depends on other DESY departments for climate and power
  - … recall “DESY is an experiment-oriented laboratory”
  - Generally good service and fast reaction
> Climate and power: historically grown infrastructure
> Chasing single points of failure?
  - We will discover unknown single points of failure
  - Probably better to accept this fact and concentrate on optimizing reaction handling

One event we failed to prepare against (7/2013)
> One of our two lines was cut
> Transformer on second line overheated
> On batteries for ~20 minutes … power came back at the last second
> No set procedures, but the whole crew reacted well: we survived!
> … and we were lucky: the helium line above was not in use …

High-Availability, server and service redundancy
> Whenever possible, set up systems in high-availability mode
> Using VMware + Cisco UCS to build infrastructure for mission-critical applications
  - … spread over Computer Rooms 1&3 (~500m apart)
  - … e.g. for EDMS, person management systems, mail, …
> Classic cold/warm/hot standby
> Load balancer with fail-over: F5 & Poise (own development, advanced metrics)
> Fail-over cluster etc. whenever necessary and possible
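The fail-over principle behind a hot-standby setup can be sketched in a few lines of Python. This is a generic illustration and not how F5 or Poise work internally: the backend names, the `pick_backend` function, and the health-check callable are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence


def pick_backend(
    backends: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy backend in priority order, or None.

    `backends` lists the primary first, followed by its standbys.
    `is_healthy` is a caller-supplied check (for instance a TCP connect
    or an HTTP probe); it is kept abstract here on purpose.
    """
    for backend in backends:
        if is_healthy(backend):
            return backend
    return None  # nothing left to fail over to


if __name__ == "__main__":
    # Hypothetical names: one server per computer room.
    servers = ["mail-room1", "mail-room3"]
    down = {"mail-room1"}  # simulate an outage in room 1
    print(pick_backend(servers, lambda name: name not in down))
```

Keeping the health check as a parameter makes the selection logic testable without a network, which also helps when documenting and rehearsing fail-over procedures.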

Configuration management
> General tendency towards common and widespread tools
  - WDS/WSUS for Windows well established
  - Migration to Puppet for Linux (actually consolidation of Quattor/Salad+WBOOM/FAI)
> Introducing version control management in configurations with Puppet
  - Enables rollback, auditing, …
> Automate configuration as much as possible
  - Fast reinstall with guaranteed results
> Make secret handling processes (passwords, keys, certs, …) auditable
  - See Sven’s talk
> Using vanilla distributions with only minimal changes
  - E.g. discontinue HEP ENV / HEP X11

Backup & Archive & Tapes
> Backup & Archive & Tapes
  - For TSM backups, data is saved redundantly in two locations (HH and ZN)
  - For selected archive data sets two copies are held: one online in the silo, the other offline in a former atomic shelter
  - Other methods of redundant data keeping are considered, e.g. cloud storage syncing: although this is not backup, it might help users with broken notebooks
> Disaster recovery
  - of notebooks & desktops: TSM backup methods are sufficient (or not needed: $HOME on network FS)
  - of RAID arrays without copy/backup: very rare, rapid escalation to external data rescue experts … costly but usually successful
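Holding two copies of a data set only helps if both copies are known to be identical. The following Python sketch shows one generic way to check that with SHA-256 checksums. It is an illustration under stated assumptions and not part of TSM or of DESY's archive workflow; the function names and file paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks
    so that large archive files do not have to fit into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary: Path, secondary: Path) -> bool:
    """Return True if both copies have the same SHA-256 checksum."""
    return sha256_of(primary) == sha256_of(secondary)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical paths): python verify_copies.py /copy1/file /copy2/file
    first, second = Path(sys.argv[1]), Path(sys.argv[2])
    print("match" if copies_match(first, second) else "MISMATCH")
```

For an offline copy, the checksum would typically be recorded when the copy is written and compared again whenever the medium is read back.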

Human Continuity (1)
> As workload is high, for some services we do not have n+1 (n=1) redundancy
  - even when desirable, budgets won’t allow for it
> Absence or exit of colleagues can leave holes
  - illness
  - leaving DESY, usually before new recruitment has finished
  - spreading tasks over remaining staff will only work for a limited time
> Standardization, use of widespread tools and products
  - Allows for hiring external fire-fighters

Human Continuity (2)
> Past cases have raised awareness of the importance of up-to-date documentation
  - In disaster situations
  - Knowledge transfer after changes in personnel
> and even more of the independent check that this documentation is understandable and complete
  - many minor details are taken as common knowledge (by the author…)
> Unfortunately, this also increases workload
  - but can well be built into operating procedures

… being a Scientific Computing Center
> In the end, our mission is to serve scientists and enable science
Need to find a balance between
> Stable, well documented infrastructures and workflows
> Flexible environment to, ad hoc,
  - Deploy non-standard hardware and software
  - Bypass procedures in case of needs from scientists
  - … and later include in standardization and documentation
This is what distinguishes us from commercial hosters
