
ARDA: Technology for Distributed Analysis http://cern.ch/arda



  1. ARDA: Technology for Distributed Analysis
     ISGC 2004, Taipei
     http://cern.ch/arda
     Jakub Moscicki, ARDA Project, Jakub.Moscicki@cern.ch
     www.eu-egee.org, cern.ch/lcg
     EGEE is a project funded by the European Union under contract IST-2003-508833

  2. Contents
     • Context of ARDA Project
     • Technology Discussions and ARDA Prototypes
       ◦ Service Interaction and Architecture
       ◦ Data Management
       ◦ Files on the GRID
       ◦ Metadata Catalogs
       ◦ GRID Services and Databases
       ◦ Interactivity on the GRID
       ◦ Connectivity & Protocols
     • Outlook

  3. Evolution of ARDA
     • November 2003: ARDA RTAG Report
       ◦ ARDA Blueprint = Architectural Roadmap for Distributed Analysis
         • Set of collaborating Grid services and their interfaces
     • January 2004: ARDA Workshop 1
       ◦ ARDA Project = A Realisation of Distributed Analysis
         • Coordination and early integration between Generic Middleware (EGEE) and LHC Experiments' Software
     • May 2004: EGEE prototype
       ◦ gLite = new-generation generic middleware
         • Very first prototype available internally to ARDA group
     • June 2004: ARDA Workshop 2
       ◦ Hands-on meeting: “First 30 days of ARDA prototype”
     • Fall 2004: ARDA Workshop 3
       ◦ Summary of the first phase of ARDA prototype

  4. Software Activities in ARDA
     [Diagram labels: experiments Alice, Atlas, CMS, LHCb; ARDA-Alice, ARDA-Atlas, ARDA-CMS, ARDA-LHCb; ARDA; LCG (POOL, SEAL, ROOT); Regional Centers (Taiwan, Russia); gLite / EGEE; ARDA RTAG; Bio Science]

  5. EGEE Middleware vs ARDA
     ARDA Prototype(s)
     • Interfacing middleware to the experiment frameworks
     • Early deployment of (a series of) end-to-end prototypes to ensure functionality and coherence
       ◦ Middleware as a building block
       ◦ Validation of the design
       ◦ Feedback, discussions
     EGEE Middleware
     • Identification of common/generic components
     • New service decomposition
       ◦ Strong influence of the AliEn system
         • the Grid system developed by the ALICE experiment and used by a wide scientific community (not only HEP)
     • Role of experience, existing technology…
       ◦ Web service framework

  6. GRID Service Framework

  7. Service Interaction
     • WSDL vs Client API
       ◦ Regular Services (Metadata Catalog, ...)
         • The API expected by clients may not be implementable in a performant way if mapped one-to-one to the WSDL
           – e.g. paging of query results
         • May require a client-side library (stubs)
           – language bindings, platform compatibility, ...
       ◦ “Interoperability Service”
         • Requires dynamic interface discovery and composition
     [Diagram labels: Client, Service, WSDL, API]
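
The paging remark on this slide can be sketched as follows. This is only an illustration under assumptions: the operation `query(expr, offset, limit)` and the helper `make_fake_service` are hypothetical stand-ins for a generated WSDL stub, not part of any ARDA or gLite interface. The point is that the client-side library, not the wire interface, offers the convenient iterator.

```python
from typing import Callable, Iterator, List

# Hypothetical stateless service operation: one call returns one page of rows.
# In a real deployment this would be a stub generated from the WSDL.
def make_fake_service(rows: List[str]) -> Callable[[str, int, int], List[str]]:
    def query(expr: str, offset: int, limit: int) -> List[str]:
        matching = [r for r in rows if expr in r]
        return matching[offset:offset + limit]
    return query

def iter_results(query: Callable[[str, int, int], List[str]],
                 expr: str, page_size: int = 100) -> Iterator[str]:
    """Client-side API: hides the page-by-page calls behind a plain iterator."""
    offset = 0
    while True:
        page = query(expr, offset, page_size)
        yield from page
        if len(page) < page_size:  # short or empty page: nothing left to fetch
            return
        offset += page_size

service = make_fake_service([f"run{i:04d}.root" for i in range(250)])
files = list(iter_results(service, "run", page_size=100))
print(len(files))
```

The caller sees a single sequence, while the service only ever answers bounded requests.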

  8. Grid Access Service
     • Grid Access Service
       ◦ Client Proxy
         • All client operations are channeled via a user-based GAS instance
         • GAS instance exports the interface based on the user's certificate
           – authorisation to access different services
       ◦ Entry Point (Bootstrapping)
         • Resolves references to other services and gives back the pointers
     [Diagram: client → GAS → File Catalog, Metadata, Replica, Workload]
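
A minimal sketch of the two GAS roles named on this slide (client proxy and entry point), assuming an in-memory registry and an authorisation table keyed by certificate subject. The class name, the tables, the endpoint URLs and the subjects are all hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical registry: service name -> endpoint reference.
REGISTRY: Dict[str, str] = {
    "file_catalog": "https://example.org/catalog",
    "metadata": "https://example.org/metadata",
    "replica": "https://example.org/replica",
    "workload": "https://example.org/workload",
}

# Hypothetical authorisation table: certificate subject -> services it may use.
ACL: Dict[str, Set[str]] = {
    "/O=Grid/CN=analysis user": {"file_catalog", "metadata"},
    "/O=Grid/CN=production manager": set(REGISTRY),
}

class GridAccessService:
    """One instance per user: exports only what that user is authorised for."""

    def __init__(self, subject: str) -> None:
        self.subject = subject
        self._allowed = ACL.get(subject, set())

    def exported_interface(self) -> List[str]:
        return sorted(self._allowed)

    def resolve(self, service_name: str) -> str:
        """Entry point: hand back a reference to another service."""
        if service_name not in self._allowed:
            raise PermissionError(f"{self.subject} may not use {service_name}")
        return REGISTRY[service_name]

gas = GridAccessService("/O=Grid/CN=analysis user")
print(gas.exported_interface())
print(gas.resolve("metadata"))
```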

  9. Grid Access Library: ARDA-Alice
     • Client Proxy: Grid Access Library
     • Service Factory (OGSA) vs Pool of Services
     [Diagram labels: File Access; I/O Services (via catalogue); libgliteIO; libgliteUI; gshell; shell commands; ROOT module; UI; Authentication (GSI, PAM); gLite SHM-Bus session; *Ssl-poolserver and *Mtpoolserver instances, each with Embedded Perl (C2PERL, gLite.pl); gLite/AliEn Services]

  10. Modularity & Interfaces
      • Architectural Issues
        ◦ Ability to replace implementation of components
          • No backdoor interfaces
          • Integration with middleware developed by experiments
      [Diagram labels: Client, Interface, Service; one connection marked X]

  11. Files and Metadata

  12. Files on the GRID
      • Files on the GRID
        ◦ Categories of files
          • 90% of files are WORM
          • Read/Write Files:
            – File GUID lifecycle and management
            – Versioning
        ◦ Role of LFN:
          • LFN is mutable and may point to different GUIDs over time
          • Files without LFNs?
          • High-level, VO specific optimization based on LFN?
        ◦ Persistent references
          • To other files (via GUID, LFN)
          • To objects inside files
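
The difference between referencing a file by LFN and by GUID can be shown with a toy catalogue. This is a sketch under assumptions: the `FileCatalog` class, its methods and the example paths are invented for illustration and do not reproduce any real file catalogue API.

```python
import uuid
from typing import Dict, List

class FileCatalog:
    """Toy catalogue: LFN -> GUID is mutable, GUID -> replicas is stable."""

    def __init__(self) -> None:
        self._lfn_to_guid: Dict[str, str] = {}
        self._replicas: Dict[str, List[str]] = {}

    def register(self, lfn: str, location: str) -> str:
        guid = str(uuid.uuid4())          # each new file gets a fresh GUID
        self._replicas[guid] = [location]
        self._lfn_to_guid[lfn] = guid     # rebinding an existing LFN is allowed
        return guid

    def resolve(self, lfn: str) -> str:
        return self._lfn_to_guid[lfn]

    def locations(self, guid: str) -> List[str]:
        return list(self._replicas[guid])

catalog = FileCatalog()
lfn = "/grid/vo/analysis/histos.root"
guid_v1 = catalog.register(lfn, "srm://site-a/data/f1")
guid_v2 = catalog.register(lfn, "srm://site-b/data/f2")  # same LFN, new file

# A reference stored as a GUID still reaches the first file;
# a reference stored as an LFN now reaches the second one.
print(catalog.locations(guid_v1))
print(catalog.locations(catalog.resolve(lfn)))
```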

  13. Metadata
      • Metadata Flavours
        ◦ Filesystem Metadata: property of File Catalog
        ◦ Physics Metadata: in a separate database
      • Physics Metadata
        ◦ Many parallel developments in LHC experiments
          • AMI, RefDB, BKDB, ...
          • gLite should (?) provide mechanisms to plug in these components
        ◦ Generic metadata service
          • “Annotate any objects with key-value lists of metadata”
            – any objects = datasets, applications, users, ...
          • Isn't it purely a database problem?
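
The "annotate any objects with key-value lists" idea, and the question of whether it is purely a database problem, can be made concrete with a single generic table. The sketch below assumes SQLite as a stand-in database; the table layout and the functions `annotate` and `find` are hypothetical.

```python
import sqlite3
from typing import List

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE annotation ("
    " obj_type TEXT, obj_id TEXT, key TEXT, value TEXT,"
    " PRIMARY KEY (obj_type, obj_id, key))"
)

def annotate(obj_type: str, obj_id: str, **meta: object) -> None:
    """Attach key-value metadata to any kind of object."""
    db.executemany(
        "INSERT OR REPLACE INTO annotation VALUES (?, ?, ?, ?)",
        [(obj_type, obj_id, k, str(v)) for k, v in meta.items()],
    )

def find(obj_type: str, key: str, value: object) -> List[str]:
    """Return ids of objects of one type carrying the given key-value pair."""
    rows = db.execute(
        "SELECT obj_id FROM annotation"
        " WHERE obj_type = ? AND key = ? AND value = ? ORDER BY obj_id",
        (obj_type, key, str(value)),
    )
    return [r[0] for r in rows]

annotate("dataset", "ds-001", generator="pythia", energy_gev=14000)
annotate("dataset", "ds-002", generator="herwig", energy_gev=14000)
annotate("user", "jdoe", group="higgs")

print(find("dataset", "energy_gev", 14000))
print(find("user", "group", "higgs"))
```

A schema-independent table like this is flexible, at the cost of storing every value as text and needing one row per attribute.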

  14. Data Management

  15. File Transfer
      • Data management transfer service needed
        ◦ Reliable file transfer
        ◦ Misuse protection
        ◦ Example: TMDB (CMS)
      [Diagram: CMS DC04 production. Labels: RefDB; McRunjob; reconstruction instructions; reconstruction jobs; T0 worker nodes; summaries of successful jobs; reconstructed data; transfer agent (checks what has arrived); GDB castor pool; tapes; updates; RLS; TMDB; export buffers]

  16. GridFTP
      • GridFTP (GT 3.3)
      • 1GB in 5-10 s: for files > 100MB
        – 2x PIII, 2.40GHz, 1GB RAM
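
The quoted transfer time translates into a throughput range with simple arithmetic. The sketch assumes a decimal gigabyte (10^9 bytes), which the slide does not specify.

```python
GB = 10**9  # assumption: 1 GB = 10^9 bytes (the slide does not say which)

def throughput_mb_s(size_bytes: int, seconds: float) -> float:
    """Average throughput in megabytes per second."""
    return size_bytes / seconds / 10**6

for seconds in (5, 10):
    mb_s = throughput_mb_s(GB, seconds)
    gbit_s = mb_s * 8 / 1000
    print(f"{seconds:>2} s -> {mb_s:.0f} MB/s = {gbit_s:.1f} Gbit/s")
```

Under that assumption, 1 GB in 5 to 10 s corresponds to roughly 100 to 200 MB/s, or 0.8 to 1.6 Gbit/s.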

  17. Protocol Stack: SOAP
      • Control layer vs Transport Layer
        ◦ Example: RFT (Reliable File Transfer, GT3)
      [Diagram: two services, each behind an interface; SOAP connects the interfaces, GridFTP connects the services]

  18. SOAP Encoding
      • SOAP problems
        ◦ encoding of binary data and complex structures: overhead 10x
        ◦ floating point representation: parser dependent
        ◦ SOAP/https encryption: overhead 30%
      • Example:
        ◦ Grid Access Library (ARDA-Alice)
          • ~1500 (plain) / 800 (encrypted) calls per second
      [Diagram: grid command encoding + encryption. Labels: Call XY with Arg1 = '...'; Arg2 = '...'; Arg3 = '...'; Serializer; UU encoding; CODEC / symmetric cipher; Session ID; Session Crypto(t); SOAP String; SOAP Body; SSL encryption]
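
The size overhead of text encodings can be measured directly. The sketch below compares a packed binary array of doubles with the same values written as one typed XML element each and with a Base64 rendering of the binary form. The element name and `xsi:type` attribute are illustrative rather than taken from a specific SOAP toolkit, and the ratio obtained depends on the data, so it should not be read as a reproduction of the 10x figure on the slide.

```python
import base64
import random
import struct

random.seed(0)
values = [random.random() for _ in range(1000)]

# Binary form: 8 bytes per IEEE 754 double, little-endian.
binary = struct.pack(f"<{len(values)}d", *values)

# Text form in the style of a SOAP array: one typed element per value.
xml_items = "".join(
    f'<item xsi:type="xsd:double">{v!r}</item>' for v in values
)
xml_size = len(xml_items.encode("utf-8"))

# Base64 is the usual way to carry opaque binary inside XML.
b64_size = len(base64.b64encode(binary))

print(f"binary : {len(binary)} bytes")
print(f"xml    : {xml_size} bytes ({xml_size / len(binary):.1f}x)")
print(f"base64 : {b64_size} bytes ({b64_size / len(binary):.2f}x)")

# Python's repr() round-trips doubles exactly; other serializers may print
# fewer digits, which is one source of parser-dependent results.
assert all(float(repr(v)) == v for v in values)
```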

  19. Database Access

  20. DB / SOAP: AMI
      • AMI: Atlas File Metadata Catalogue:
        ◦ Simulation/Reconstruction-Software Version
        ◦ Does not contain physical filenames
      • Implementation Features:
        ◦ dynamic schema versioning
        ◦ Java/AXIS Web Service Framework + MySQL Backend
      [Diagram: Users → SOAP-Proxy → Meta-Data (MySQL)]

  21. AMI Performance
      AMI behaviour using many concurrent clients
      [Plots: time to completion against number of rows selected, for different numbers of concurrent clients]
      • Large network traffic overhead due to schema-independent tables
      • SOAP Web Services proxy supposed to provide DB access
        – Note that Web Services are “stateless” (no automatic handles for the concept of session, transaction, etc.): 1 query = 1 (full) response
      • Large queries might crash the server
      • Shall the SOAP front-end proxy re-implement all the database functionality?

  22. DB / PHP: RefDB
      • RefDB
        ◦ CMS Metadata and Bookkeeping Database
      • Implementation Features:
        ◦ MySQL backend
        ◦ PHP script frontend
          • User query encoded in URL
          • DB query logic in the PHP scripts
        ◦ Result handling:
          • small queries: XML over HTTP
          • large queries: cached in a file to be fetched again
      [Diagram: Users → PHP proxy → Meta-Data (MySQL)]

  23. RefDB Performance

  24. DB / XML-RPC: BKDB
      • LHCb Bookkeeping DB
      • Implementation Features:
        ◦ Oracle backend
        ◦ XML-RPC frontend
          • Python client for tests
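
The general shape of a database behind an XML-RPC front-end with a Python test client can be sketched with the standard library alone. This is not the BKDB code: SQLite stands in for the Oracle backend, and the table, the method name `select_files` and the sample rows are all invented for the example.

```python
import sqlite3
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Stand-in backend (the slide names Oracle; SQLite is used here for brevity).
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE files (name TEXT, events INTEGER)")
db.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [(f"file{i:03d}.dst", 500) for i in range(200)],
)
db.commit()

def select_files(max_rows):
    """Hypothetical front-end method: return at most max_rows rows."""
    cur = db.execute(
        "SELECT name, events FROM files ORDER BY name LIMIT ?", (max_rows,)
    )
    return [list(row) for row in cur]

# Front-end: bind to a free local port and serve in a background thread.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(select_files)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Test client: each call is one self-contained request/response.
host, port = server.server_address
client = ServerProxy(f"http://{host}:{port}/")
rows = client.select_files(5)
print(rows[0])

server.shutdown()
server.server_close()
```

Every row is serialised to XML on the way out and parsed again by the client, which is the conversion layer whose cost the measurements on the following slide look at.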

  25. CERN/Taiwan tests on LHCb metadata catalogue
      [Plots: time to completion (sec) against number of rows selected, for 1, 5, 10, 20, 30, 40, 50 and 100 clients (annotation: 200 rows = 180KB); response time (s) against the number of clients. Panel labels: “Unoptimized SQL access”, “XML-RPC access”]

  26. Database Access Summary
      • Service front-end to a DB
        ◦ problems:
          • reproducing DB capabilities is very hard
            – ACID
            – transactions
            – timeouts
            – large queries
          • additional conversion layer: significant performance drop
        ◦ benefits
          • high-level query support: client is abstracted from the internal DB optimization techniques
            – simple user query may map to a complex ‘optimized’ internal query

  27. Connectivity and Interactivity

  28. Connectivity
      • Asynchronous Messaging
        ◦ SOAP over Jabber (IM)
      • “Agent Model”
        ◦ Outbound Connectivity Required
          • direct/indirect
        ◦ Examples
          • DIRAC, PROOF
        ◦ Security models
          • Open ports for trusted inbound traffic
          • Parachute trusted agent which will initiate outbound connection
      • Connection Management
        ◦ SOAP over http is connectionless
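
The agent model can be illustrated with an in-process simulation: the central service never contacts a worker, and each agent initiates every exchange itself, as it would over an outbound connection. The class `TaskQueueService`, its methods and the squaring "job" are hypothetical; this is not how DIRAC or PROOF are implemented.

```python
import queue
import threading
from typing import Dict, Iterable, Optional

class TaskQueueService:
    """Central service: it only answers requests, it never calls a worker."""

    def __init__(self, tasks: Iterable[int]) -> None:
        self._tasks: "queue.Queue[int]" = queue.Queue()
        for task in tasks:
            self._tasks.put(task)
        self.results: Dict[int, int] = {}
        self._lock = threading.Lock()

    def fetch_task(self) -> Optional[int]:
        """Called by an agent; returns None when no work is left."""
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None

    def report(self, task: int, result: int) -> None:
        with self._lock:
            self.results[task] = result

def agent(service: TaskQueueService) -> None:
    """Runs on the worker side and pulls work until the queue is empty."""
    while True:
        task = service.fetch_task()
        if task is None:
            return
        service.report(task, task * task)  # placeholder for the real job

service = TaskQueueService(range(10))
agents = [threading.Thread(target=agent, args=(service,)) for _ in range(3)]
for a in agents:
    a.start()
for a in agents:
    a.join()
print(sorted(service.results.items()))
```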
