Centre de Calcul de l’Institut National de Physique Nucléaire et de Physique des Particules FJPPL Computing Workshop Operational experience with second machine room at CC-IN2P3 Xavier Canehan
Introduction
• 2 computing rooms at CC-IN2P3 since 2011
• Critical design choices led to substantial advantages
• Adaptability remains mandatory
• Monitoring and testing, even the building
• Drawbacks
FJPPL Computing Workshop – Operational Experience – Xavier Canehan 10/03/2015
2 computing rooms
Square feet and high power consumption
Vil-2 initial objectives
• +10 year perspective
• Hot water
• Modernity
• Modularity
• Ease of deployment
• Multi-tier architecture
High quality results of initial conception
• Modern computing room (details shown during visit)
• Initial plan for 2011 - 2019
• Multi-tier by design
• Target: 3 phase deployment
  ◦ first one dedicated to computing farm
  ◦ relying upon regular budget

Phase   Racks   Power    Tier
2011    50      0.6 MW   Tier II
2015    125     1.5 MW   Tier III
2019    240     3.6 MW   Tier III-IV
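The three-phase plan implies a rising power density per rack. As a minimal sketch, using only the rack counts and power figures from the plan, the average power per rack for each phase can be computed as follows (the variable names are illustrative):

```python
# Phase targets from the slide: year -> (racks, power in kW).
phases = {
    2011: (50, 600.0),
    2015: (125, 1500.0),
    2019: (240, 3600.0),
}

# Average power per rack implied by each phase target.
kw_per_rack = {year: power_kw / racks for year, (racks, power_kw) in phases.items()}

for year, density in kw_per_rack.items():
    print(f"{year}: {density:.1f} kW per rack on average")
```

These are averages over the whole room. The slides do not state how power was expected to be spread across individual racks.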
First phase: Tier II over 2 lines
• 28 InRow Cooling units, 18 – 20 kW each
• One 2 MVA UPS chain of 4 × 500 kVA UPS
• 2 transformers of 1600 kVA
• 3 chilling units for 2.4 MW, only one distribution circuit. Backup through a 24 m³ water tank.
• 2 power lines: dedicated main up to 9 MW, 2 MW reservation on backup line
• ⅓ of the floor space used at each level
[Floor plan: Tier II area]
Site mean power usage, PUE advantage to Vil-2
• Site power consumption around 1.1 MW
• Mean power usage, IT against total:

Room    kW IT   kW Total     PUE
Vil-1   320     720 (-130)   1.84
Vil-2   300     440          1.46

[Bar chart: kW IT and kW Total for Vil-1 and Vil-2, axis in kW]
• Best PUE in Vil-2
• Moving from Vil-1 to Vil-2 gains ~20% of power cost
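The PUE figures above can be checked with a few lines of arithmetic. This is a sketch under one assumption: the "(-130)" next to the Vil-1 total is read as 130 kW excluded from the total before computing PUE, which is the reading that reproduces the quoted 1.84. The slide does not say what that load is.

```python
def pue(total_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_kw / it_kw

# Mean power figures from the slide, in kW.
vil1_it, vil1_total = 320, 720
vil1_excluded = 130  # assumption: the "(-130)" is subtracted from the total
vil2_it, vil2_total = 300, 440

pue_vil1 = pue(vil1_total - vil1_excluded, vil1_it)
pue_vil2 = pue(vil2_total, vil2_it)

# For the same IT load, total power scales with PUE.
relative_saving = 1 - pue_vil2 / pue_vil1

print(f"Vil-1 PUE: {pue_vil1:.2f}")
print(f"Vil-2 PUE: {pue_vil2:.2f}")
print(f"Total power saved for the same IT load: {relative_saving:.0%}")
```

The computed Vil-2 value rounds to 1.47 rather than the quoted 1.46, presumably a truncation on the slide. The resulting saving is consistent with the stated gain of about 20%.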
Entry ticket cost vs fully functional Vil-1
• PUE is not linear: it evolves by steps and its intercept is not zero
  ◦ Besides investment costs
  ◦ Operational cost of the electrical infrastructure must be taken into account, e.g. 1 UPS consumes up to 3 kW
• Other investment costs
  ◦ Water cooled racks value
  ◦ PDU and rack power protection costs
• Vil-1 is fully redundant and deals with hygrometry
• Vil-2 initial target was deliberately limited
• No point in ditching Vil-1
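One way to picture "works by step, intercept is not null" is a toy model in which each infrastructure module adds a fixed overhead regardless of load. The sketch below is an illustration, not the site's actual model: the 3 kW fixed cost per module echoes the UPS figure on the slide, while the 500 kW module capacity and the 40% proportional overhead are hypothetical values chosen for the example.

```python
import math

def modelled_pue(it_kw: float,
                 module_capacity_kw: float = 500.0,    # hypothetical module size
                 fixed_kw_per_module: float = 3.0,     # "1 UPS consumes up to 3 kW"
                 proportional_overhead: float = 0.4    # hypothetical cooling/losses share
                 ) -> float:
    """Toy PUE model: proportional overhead plus a fixed cost per active module."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    modules = math.ceil(it_kw / module_capacity_kw)
    total_kw = it_kw * (1 + proportional_overhead) + modules * fixed_kw_per_module
    return total_kw / it_kw

for load in (50, 250, 500, 501, 1000):
    print(f"{load:5d} kW IT -> modelled PUE {modelled_pue(load):.3f}")
```

In this model the fixed overhead weighs most at low load, and PUE jumps slightly each time a new module has to be switched on, which is the step behaviour the slide describes.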
Moving grounds
Adaptation remains mandatory
Have to evolve
• Environment changes
• Needs and perspectives clarify and evolve
• IT technology is volatile
• Infrastructure pace stays slower
• Monitoring everything
Environment – IT densification effects upon racks
• Floor space is no longer a problem
• kW/m² increase
• IT densification: 14 Dell PowerEdge C6200 per rack will
  ◦ approach InRow Cooling unit capacity (18 kW)
  ◦ need 45 sockets, 3 phases PDU
  ◦ (dedicated PDU development)
• At rack limits
• Partially filled racks to limit constraints
[Chart: number of power sockets per rack]
Environment – Power costs evolution
• 5% to 7% annual power cost increase
[Charts: annual power consumption; annual power cost, without tax; kWh cost, in € cents]
• Looking for IT efficiency
• Fine tuned power contract helps to minimize costs
• Seek the most efficient IT hardware in the most efficient room
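To get a feel for what a 5% to 7% annual increase means over a planning horizon, here is a minimal sketch. It assumes the increase compounds every year and stays constant, which the slide does not claim. The five-year horizon and the function name are illustrative.

```python
def projected_cost(base_cost: float, annual_increase: float, years: int) -> float:
    """Cost after `years` of compound growth at `annual_increase` (0.05 means 5%)."""
    return base_cost * (1 + annual_increase) ** years

YEARS = 5  # illustrative horizon, not taken from the slides

for rate in (0.05, 0.07):
    factor = projected_cost(1.0, rate, YEARS)
    print(f"{rate:.0%} per year over {YEARS} years multiplies the bill by {factor:.2f}")
```

Under these assumptions the same consumption costs roughly 28% to 40% more after five years, which is why efficiency gains in both hardware and room matter.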
Power efficiency, ASHRAE recommendations
• Current hardware tolerates 35°C all year long
• Increasing room temperature lowers overall power consumption
  ◦ 25°C server inlet setpoint
  ◦ -40% fan activity for cooling units
  ◦ 11 kW gain
• Less noise: from 95-110 dB down to 85-95 dB
• Hot corridor temperature also increased: 36°C-40°C
• What will be the next setpoint?
Planning to cope with scientific needs in the foreseeable future
• What if all new hardware goes into Vil-2?
[Charts: estimated number of racks and estimated IT power [kW], 2014 to 2019]
• Estimations from LHC and LSST/Euclid/CTA figures
• Data modified with current densification factor
• Need Vil-2 adaptation to host storage systems
Initial plans revision
• Previous power and cooling distribution systems implied evolution by pairs of hot corridors
• Infrastructure cost for each new lane: ~1.5 M€
• Minimizing costs by reusing existing infrastructure hardware
• Adapting the existing, least used Tier II lane
• 2 phase plan, less than 350 k€ per year
• 2017 aim:
  ◦ 80 racks, 1 MW IT
  ◦ 1 lane Tier II and 1 lane Tier III
Introducing Tier III – Phase 1, power redundancy
[Floor plan: Tier III on hot aisle C/D (used space and 2015 extension); Tier II on hot aisle A/B (used space and 2015 extension)]
• Details to be seen during visit
Drawbacks and limits
From details to major drawbacks
Infrastructure limits are easily forgotten
• Coping with interdependent limits
  ◦ Cooling capacity or cooling redundancy
  ◦ Power capacity and power redundancy
  ◦ Per rack, group of racks, aisle, distribution line
• Multi-tier ability adds an order of complexity
  ◦ Even more if you mix tiers in a single line
• Strict deployment plans needed
• Monitoring is mandatory
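A deployment plan can be checked against limits at several levels at once. The sketch below validates a plan against a per-rack limit and per-aisle limits only. All rack names, loads, and aisle limits are hypothetical, and the 18 kW per-rack figure merely borrows the InRow unit capacity quoted earlier as a stand-in. A real check would also cover redundancy and distribution lines.

```python
from collections import defaultdict

def check_plan(rack_kw, rack_aisle, rack_limit_kw, aisle_limit_kw):
    """Return a list of human-readable limit violations for a deployment plan."""
    problems = []
    aisle_total = defaultdict(float)
    for rack, kw in rack_kw.items():
        if kw > rack_limit_kw:
            problems.append(f"rack {rack}: {kw} kW exceeds {rack_limit_kw} kW")
        aisle_total[rack_aisle[rack]] += kw
    for aisle, kw in sorted(aisle_total.items()):
        if kw > aisle_limit_kw[aisle]:
            problems.append(f"aisle {aisle}: {kw} kW exceeds {aisle_limit_kw[aisle]} kW")
    return problems

# Hypothetical plan: planned load per rack (kW) and the aisle each rack sits in.
plan = {"A01": 12.0, "A02": 19.0, "C01": 10.0, "C02": 10.0}
aisles = {"A01": "A/B", "A02": "A/B", "C01": "C/D", "C02": "C/D"}
problems = check_plan(plan, aisles, rack_limit_kw=18.0,
                      aisle_limit_kw={"A/B": 40.0, "C/D": 15.0})

for line in problems:
    print(line)
```

Here one rack breaks its own limit while its aisle is fine, and another aisle is overloaded although each of its racks is within limits, which is the kind of interdependence the slide warns about.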
Dealing with a cooling failure
• Stopping InRow cooling units for 20 minutes → +15°C
  ◦ Corridor temperature rises to around 49°C
  ◦ Front temperature rises to 43°C
• Need an efficient shutdown system
• Our water tank provides a 20 min delay
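The test result above gives a rough rate of temperature rise, from which a time budget can be estimated. This is a back-of-the-envelope sketch: it assumes the rise is linear (+15°C in 20 minutes, so 0.75°C per minute), that the front temperature started near 28°C (43°C minus 15°C), and it uses the 35°C hardware tolerance from the ASHRAE slide as the limit. None of these assumptions is confirmed by the slides.

```python
def minutes_until(limit_c: float, current_c: float, rise_c_per_min: float) -> float:
    """Minutes before `limit_c` is reached, assuming a constant rate of rise."""
    if rise_c_per_min <= 0:
        raise ValueError("rate of rise must be positive")
    return max(0.0, (limit_c - current_c) / rise_c_per_min)

RISE_C_PER_MIN = 15 / 20  # from the test: +15 °C in 20 minutes, assumed linear

# Assumed starting front temperature (43 - 15) and assumed 35 °C limit.
budget = minutes_until(limit_c=35.0, current_c=28.0, rise_c_per_min=RISE_C_PER_MIN)
print(f"About {budget:.1f} minutes before the front temperature reaches 35 °C")
```

Under these assumptions the margin is under ten minutes once cooling stops completely, which underlines the need for a fast shutdown mechanism.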
Smart versus dumb shutdown systems
• Smart shutdown relies upon IPMI
  ◦ detects a slow continuous slope
  ◦ or a fast temperature change
• IPMI needs the network
• If network switches shut down before servers, IPMI is useless
• Need a dumb backup power cut system
• We do not want to repeat a bad experience with a water leak
• -20°C on roof, +65°C in hot corridor
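The two triggers named above, a slow continuous slope and a fast temperature change, can be sketched as a pure function over temperature samples. How the samples are collected (for example over IPMI) is deliberately left out, and all thresholds below are hypothetical values, not the ones used on site.

```python
def should_shut_down(samples,
                     fast_rise_c_per_min=1.0,   # hypothetical threshold
                     slow_rise_c_per_min=0.2,   # hypothetical threshold
                     slow_window_min=10.0):     # hypothetical window
    """samples: list of (minute, temperature_c) tuples, oldest first."""
    if len(samples) < 2:
        return False
    # Fast change: compare the two most recent samples.
    (t0, c0), (t1, c1) = samples[-2], samples[-1]
    if t1 > t0 and (c1 - c0) / (t1 - t0) >= fast_rise_c_per_min:
        return True
    # Slow continuous slope: every step in the window rises, and the
    # average rise over the full window exceeds the slow threshold.
    window = [s for s in samples if s[0] >= t1 - slow_window_min]
    if len(window) < 3 or window[-1][0] - window[0][0] < slow_window_min:
        return False
    rising = all(b[1] > a[1] for a, b in zip(window, window[1:]))
    mean_slope = (window[-1][1] - window[0][1]) / (window[-1][0] - window[0][0])
    return rising and mean_slope >= slow_rise_c_per_min

stable = [(m, 25.0) for m in range(12)]
fast = [(0, 25.0), (1, 25.1), (2, 27.0)]
slow = [(m, 25.0 + 0.3 * m) for m in range(12)]
print(should_shut_down(stable), should_shut_down(fast), should_shut_down(slow))
```

Such logic only helps while the monitoring path is alive, which is the slide's argument for keeping a dumb power cut as a last resort.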
Cooling technology: hot water choice
Improving global Campus Energy Reuse Efficiency?
• Land procurement
  ◦ Agreement to provide hot water on campus
• Hot water need
  ◦ Very efficient chillers, allowing reuse of hot water
  ◦ Silent hardware
• Fixed technology
  ◦ Campus is late
  ◦ 3 years spent
No humans: no window, no faucet?
Cooling technology: improving efficiency
• Won’t ever be able to use direct Free Cooling
• But new cooling technologies are available
• New cooling technologies → change IT procurement
Very valuable modularity outcomes
• Ceiling rails
• Preset pipes
• Movable separation wall between used and free space
• Roof as technical level
Questions? Thank you!