XRootD Monitoring Report

  1. XRootD Monitoring Report A.Beche D.Giordano

  2. Outline (A. Beche, Federated Workshop, 10 April 2014)
     - Talk 1: XRootD Monitoring Dashboard
       - Context
       - Dataflow and deployment model
       - Database: storage & aggregation
       - User interface & use cases
       - Open issues & future work
       - Summary
     - Talk 2: Beyond XRootD monitoring
       - HTTP/WebDAV integration
       - Integration in the WLCG Transfers Dashboard

  3. XRootD federation monitoring
     - Activity started during summer 2012
       - 4 sites for FAX, 11 for AAA
     - Monitoring data increased accordingly
       - AAA: 606k (July 2012) to 43M (March 2014)
       - FAX: 15k (July 2012) to 22M (March 2014)
     - [Chart: number of sites reporting; y-axis "# sites", 0 to 45]

  4. Why monitoring?
     - Understand data flows to estimate data traffic
     - Provide information for efficient operations
     - Identify access patterns and propose data placement strategies

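  5. XRootD monitoring dataflow
     [Diagram: the Federation sends UDP to the GLED Collector, which publishes over STOMP to ActiveMQ; a Consumer reads over STOMP and stores Raw data; Stats are computed every 10 minutes and served through a Web API to the Dashboard UI and external applications. The two parts of the chain are labelled "real time" and "asynchronous".]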

  6. GLED deployment model
     - Collectors (labels as shown on the diagram):
       - FAX EU: CERN (1 site)
       - FAX US: SLAC (9 Hz)
       - EOS: CERN (150 Hz)
       - AAA: UCSD (16 Hz)
       - EOS: CERN (10 Hz)
     - AMQ @ CERN: shared cluster, 5 nodes
     - [Charts: federation monitoring data rate (Hz, 0 to 20); EOS monitoring data rate (Hz, 0 to 200)]

  7. Consolidated dataflow
     - Two uses of the raw data:
       - Dashboard monitoring
       - XRootD popularity
     - Both now share the same database:
       - Storage optimization
       - Consistency guaranteed

  8. Database
     - Database usage growth (indexes excluded):
       - AAA: ~300 GB, ~1B records
       - FAX: ~600 GB, ~2B records
       - Daily insert: 2 GB / 6M rows
     - Storage
       - Raw, statistics, metadata
       - Tables partitioned daily, no global indexes
     - [Chart: database usage growth in GB, 0 to 700]
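
As a rough consistency check on the figures on this slide, the daily insert volume and the table totals imply a similar average row size. The following is only a minimal sketch: it assumes decimal gigabytes (1 GB = 10^9 bytes) and treats the rounded slide figures as exact, and `bytes_per_row` is a hypothetical helper, not part of the dashboard.

```python
GB = 10**9  # assumption: decimal gigabytes


def bytes_per_row(size_gb, rows):
    """Average stored bytes per row for a given volume and row count."""
    return size_gb * GB / rows


# Figures taken from the slide (rounded there, so results are approximate).
daily = bytes_per_row(2, 6_000_000)        # daily insert: 2 GB / 6M rows
aaa = bytes_per_row(300, 1_000_000_000)    # AAA: ~300 GB, ~1B records
fax = bytes_per_row(600, 2_000_000_000)    # FAX: ~600 GB, ~2B records

print(f"daily insert: ~{daily:.0f} B/row, AAA: ~{aaa:.0f} B/row, FAX: ~{fax:.0f} B/row")
```

Under these assumptions all three figures land around 300 bytes per row, which suggests the quoted totals and the daily insert rate describe the same kind of record.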

  9. Database
     - Raw data aggregation:
       - Done using PL/SQL procedures
       - Events are unordered
       - Stateless: full re-computation of touched bins each time
     - Compute stats from raw data in 10 min bins
     - Aggregate 10 min stats in daily bins
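
The stateless scheme on this slide can be illustrated outside the database. The sketch below is in Python rather than PL/SQL and keeps everything in memory; the event fields (`ts`, `bytes`) and the function names are assumptions made for illustration, not the dashboard's actual schema or procedures. Because events may arrive out of order, each batch only identifies which 10 minute bins it touches, and those bins are recomputed from all raw events rather than incremented.

```python
BIN_SECONDS = 600    # 10 minute bins, as on the slide
DAY_SECONDS = 86400


def bin_start(ts):
    """Start of the 10 minute bin containing timestamp ts (seconds)."""
    return ts - ts % BIN_SECONDS


def recompute_touched_bins(raw_events, new_events, stats):
    """Store new_events, then fully recompute every bin they touch.

    raw_events: list of all stored events (hypothetical dicts with 'ts', 'bytes')
    stats: dict mapping bin start -> {'transfers', 'bytes'}, updated in place
    """
    raw_events.extend(new_events)
    touched = {bin_start(e["ts"]) for e in new_events}
    for b in touched:
        # Stateless: ignore the previous value and rebuild from raw data.
        in_bin = [e for e in raw_events if bin_start(e["ts"]) == b]
        stats[b] = {
            "transfers": len(in_bin),
            "bytes": sum(e["bytes"] for e in in_bin),
        }
    return touched


def daily_rollup(stats):
    """Aggregate the 10 minute stats into daily bins."""
    days = {}
    for b, s in stats.items():
        day = b - b % DAY_SECONDS
        agg = days.setdefault(day, {"transfers": 0, "bytes": 0})
        agg["transfers"] += s["transfers"]
        agg["bytes"] += s["bytes"]
    return days


raw, stats = [], {}
recompute_touched_bins(raw, [{"ts": 650, "bytes": 10}, {"ts": 100, "bytes": 5}], stats)
# A late event for an already computed bin triggers a full recompute of that bin.
recompute_touched_bins(raw, [{"ts": 30, "bytes": 7}], stats)
```

Recomputing a bin from scratch costs more than incrementing it, but re-delivered or late batches cannot leave the statistics inconsistent with the raw data.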

  10. Aggregation methods
      - Time bins: 2pm-3pm, 3pm-4pm, 4pm-5pm, 5pm-6pm, 6pm-7pm
      - Easy method:
        - Transfers: 1, 0, 0, 2, 1
        - Bytes: 10, 0, 0, 15, 20

  11. Aggregation methods
      - Time bins: 2pm-3pm, 3pm-4pm, 4pm-5pm, 5pm-6pm, 6pm-7pm
      - Easy method:
        - Transfers: 1, 0, 0, 2, 1
        - Bytes: 10, 0, 0, 15, 20
      - Adopted method (easy-method transfer counts in parentheses):
        - Transfers: 1 (1), 1 (0), 2 (0), 3 (2), 1 (1)
        - Bytes: 8, 1, 14 (9+6), 15 (1+9+5), 5
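
The slides do not spell out the two aggregation rules, so the sketch below rests on an assumed reading: the easy method credits a whole transfer to the single bin in which it ends, while the adopted method counts a transfer in every bin it overlaps and splits its bytes in proportion to the time spent in each bin. The tuple layout `(start, end, nbytes)` and both function names are hypothetical.

```python
from collections import defaultdict

BIN = 3600  # hourly bins, matching the 2pm..7pm example


def easy_method(transfers, bin_width=BIN):
    """Assumed 'easy' rule: credit each transfer to the bin holding its end time."""
    bins = defaultdict(lambda: {"transfers": 0, "bytes": 0.0})
    for start, end, nbytes in transfers:
        b = (end // bin_width) * bin_width
        bins[b]["transfers"] += 1
        bins[b]["bytes"] += nbytes
    return dict(bins)


def spread_method(transfers, bin_width=BIN):
    """Assumed 'adopted' rule: spread bytes over all overlapped bins by overlap time."""
    bins = defaultdict(lambda: {"transfers": 0, "bytes": 0.0})
    for start, end, nbytes in transfers:
        duration = end - start
        b = (start // bin_width) * bin_width
        if duration <= 0:
            # Zero-length transfer: everything goes to the starting bin.
            bins[b]["transfers"] += 1
            bins[b]["bytes"] += nbytes
            continue
        while b < end:
            overlap = min(end, b + bin_width) - max(start, b)
            bins[b]["transfers"] += 1  # active in this bin
            bins[b]["bytes"] += nbytes * overlap / duration
            b += bin_width
    return dict(bins)


# One 20-byte transfer running from 00:30 to 02:30 (times in seconds).
example = [(1800, 9000, 20)]
print(easy_method(example))
print(spread_method(example))
```

With the spreading rule the per-bin byte counts follow the time the transfer was actually active, at the price of one transfer being counted as active in several bins, which matches the slide's higher transfer counts for the adopted method.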

  12. Visualization Interface

  13. Pre-defined set of views

  14. Use case example: understand site access patterns
      1. Which sites are reading from FNAL
      2. Zoom to a specific site to understand which users are reading
      3. Understand which files are read by a user

  15. Data popularity
      - XRootD monitoring provides information about file access patterns:
        - Including non-official collections (i.e. user files)
        - Helps simplify the use of disk resources and make it more efficient
      - Popularity data analytics built on this information:
        - Already adopted for CMS-EOS
        - Will be extended to the full AAA

  16. Archive recommendation for CMS-EOS
      - Helps manage the disk space of EOS, including user space
        - No central bookkeeping system
      - Unused files: created more than 4 months ago, with no access in the last 3 months
        - ~500 TB of space occupied and not used, i.e. 30% of the total for these areas
      - [Chart: axes in % and TB]
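
The "unused file" rule on this slide is simple enough to state as a predicate. This is a minimal sketch, not the CMS-EOS implementation: it approximates a month as 30 days, and the function name, the parameters and the treatment of a never-accessed file (`last_access=None`) as idle are all assumptions.

```python
from datetime import datetime, timedelta


def is_archive_candidate(created, last_access, now,
                         min_age_days=120, idle_days=90):
    """True if the file is older than ~4 months and idle for ~3 months.

    Months are approximated as 30 days (assumption).
    last_access=None means no access was ever recorded (assumed idle).
    """
    old_enough = created < now - timedelta(days=min_age_days)
    idle = last_access is None or last_access < now - timedelta(days=idle_days)
    return old_enough and idle


now = datetime(2014, 4, 10)
# Old file, never read: candidate for archiving.
print(is_archive_candidate(datetime(2013, 10, 1), None, now))
# Old file, but read about six weeks ago: kept.
print(is_archive_candidate(datetime(2013, 10, 1), datetime(2014, 3, 1), now))
```

Summing the sizes of the files for which the predicate holds would give the kind of figure quoted on the slide.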

  17. Open issues
      - Missing servers:
        - dCache sites
      - Servers should provide their site name:
        - CMS: only 5 sites: anon, BUDAPEST, Hephy-Vienna, T2_US_USCD, UKI-LT2-Brunel
        - Naming convention is not coherent
        - ATLAS: GLED RPM to be deployed
      - GLED Collector improvements:
        - Reliability of the service:
          - Recovery time can be long due to the time difference
          - GLED should be operated as a production service
        - Scalability:
          - To be fixed with automatic reconnection soon

  18. Future work
      - Strong requirement from ATLAS to understand efficiency:
        - Need the concept of error / failure
        - How could the XRootD server be instrumented to report it?
      - European GLED collector is up and running:
        - Only 1 pilot site is reporting to it (CNAF)
        - Should we keep it?
      - Data mining activity (not started yet):
        - Almost 2 years of raw data (1 TB)

  19. Data Mining
      - Extract further knowledge from the data…
        - Detect inefficiencies
        - Propose deletion strategies
        - Define data placement
      - … by
        - Understanding access patterns and data usage
        - Correlating data traffic and data access performance
      - Possibility to automate some operations

  20. Application usage
      [Charts: FAX, AAA]

  21. Summary
      - Monitoring federations is a challenge
        - High rate of traffic and information
        - Challenge met by data aggregation and scalable technologies
      - Dashboard is not actively used
        - Fewer than 10 daily users (FAX), fewer than 15 (AAA)
        - Are any functionalities missing?
      - Improvement work is ongoing
        - New requests are coming
      - XRootD monitoring is one piece of the entire data transfers puzzle
        - See next talk

  22. Beyond XRootD monitoring A.Beche D.Giordano

  23. Outline
      - Talk 1: XRootD Monitoring Dashboard
        - Context
        - Dataflow and deployment model
        - Database: storage & aggregation
        - User interface & use cases
        - Open issues & future work
        - Summary
      - Talk 2: Beyond XRootD monitoring
        - HTTP/WebDAV integration
        - Integration in the WLCG Transfers Dashboard

  24. HTTP Federation is coming
      - The HTTP protocol will be used in the future
        - XRootD servers can be accessed over HTTP
        - See Fabrizio's presentation on xrdhttp
      - Two kinds of access:
        - Pure HTTP access (through Apache)
        - HTTP gate to an XRootD server
      - They cannot be monitored in the same way

  25. Monitoring XRootD access protocol
      - XRootD 4 will now report the user protocol:
        - The whole monitoring chain needs to be updated
        - Dashboard DB and UI are fully ready
      - [Chart legend: HTTP, XRootD]

  26. HTTP/WebDAV federation monitoring
      [Diagram: XRootD Federation; sites with an XRootD SE and a job; GLED collector; ActiveMQ]

  27. HTTP/WebDAV federation monitoring
      [Diagram: XRootD Federation and HTTP Federation; sites with an XRootD SE and a job; GLED collector; ActiveMQ]

  28. HTTP/WebDAV federation monitoring (slide footer: 29 November 2013, Alexandre Beche, ITTF)
      [Diagram: XRootD Federation and HTTP Federation; sites with an XRootD SE; Xrd job and HTTP job; GLED collector; ActiveMQ]

  29. HTTP/WebDAV federation monitoring
      [Diagram: XRootD Federation and HTTP Federation; sites with an XRootD SE and Apache; Xrd job and HTTP jobs; GLED collector; ActiveMQ]

  30. HTTP/WebDAV federation monitoring
      [Diagram: XRootD Federation and HTTP Federation; sites with an XRootD SE and Apache; Xrd job and HTTP jobs; GLED collector; a "?" marking an undetermined collector; ActiveMQ]

  31. How to compare data from different applications?

  32. Data transfer and access monitoring tools
      [Diagram: separate web API/UI front ends, one per monitoring application; labels: WLCG, FTS, FAX, AAA, EOS]

  33. WLCG Transfers Dashboard: federated approach
      [Diagram: a single WLCG Transfers Dashboard API/UI on top of the per-application web API/UI front ends; labels: FAX, AAA, FTS, EOS]

  34. Some plots
      [Plots: FTS and XRootD; ATLAS, LHCb, CMS, ALICE]

  35. Summary
      - A lot of effort has been put into the XRootD monitoring workflow and dashboard in the last 2 years
        - Reliable system achieved
        - Many use cases covered
      - HTTP monitoring has already started
        - Will require a lot of effort to reach the level of XRootD monitoring
      - New WLCG Transfers Dashboard architecture
        - Highly extensible system
        - Cross-VO or cross-technology analysis

  36. Credits
      - Andreeva Julia
      - Cons Lionel
      - Giordano Domenico
      - Saiz Pablo
      - Tadel Matevz
      - Tuckett David
      - Vukotic Ilija
      - The AAA and FAX deployment team
      - …
