Queueing in dCache. Paul Millar, Berlin, 2013.05.28


  1. Queueing in dCache Paul Millar Berlin, 2013.05.28

  2. Mythical self-organising users? Credit: Florenz Kley @flickr

  3. Queues: dealing with uneven loads Credit: toronto_pcu @flickr

  4. Always fast-enough: too expensive Credit: 黒忍者 @flickr

  5. Threads and ThreadPools
     ● A thread is the smallest unit of independent CPU activity supported by Java (and most OSes)
     ● Creating a thread is (relatively) expensive
     ● A thread-pool allows a thread to be used for multiple tasks
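A minimal sketch of the thread-reuse point, using the standard `java.util.concurrent` executors rather than anything dCache-specific. The class name `ThreadPoolSketch` and the pool and task counts are assumptions chosen for illustration. It runs many small tasks on a small pool and counts how many distinct threads did the work.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadPoolSketch {
    /** Runs taskCount trivial tasks on poolSize threads and returns how many distinct threads were used. */
    static int distinctThreadsUsed(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < taskCount; i++) {
            // Each task only records the name of the thread that ran it.
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 20 tasks share at most 4 threads.
        System.out.println("20 tasks ran on " + distinctThreadsUsed(4, 20) + " thread(s)");
    }
}
```

However many tasks are submitted, the number of threads created never exceeds the pool size, so the thread-creation cost is paid a bounded number of times.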

  6. Queue + ThreadPool Credit: mimi anderson @flickr
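A sketch of a bounded queue in front of a thread pool, again with plain `java.util.concurrent` classes and not dCache code. `BoundedQueueSketch` and all sizes are hypothetical. Tasks are held on a latch so that the workers stay busy, the queue fills, and further submissions overflow.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueueSketch {
    /** Submits `submitted` tasks that all block until released; returns how many were rejected. */
    static int countRejected(int threads, int queueSize, int submitted) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < submitted; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a slow task
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // all threads busy and the queue is full
            }
        }
        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 2 workers, room for 3 waiting tasks, 10 submissions.
        System.out.println("rejected: " + countRejected(2, 3, 10));
    }
}
```

With 2 workers and a queue of 3, only 5 of the 10 tasks can be accepted while the workers are blocked; the rest are turned away by the executor's default rejection policy.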

  7. Queues and dCache Credit: Sarah Macmillan @flickr

  8. Overflowing queues are a symptom Credit: Alexandre Duret-Lutz @flickr

  9. Queues and dCache
     ● Impossible to give a comprehensive talk on this subject
     ● It's far, far too big a topic!
     ● Instead, take a worked example:
       ● Uploading a file using SRM and FTP
       ● Client chooses to write the file into a specific space-reservation
       ● File is custodial/nearline, so it is flushed to tape
     ● With the following limitations:
       ● Everything runs smoothly (no errors)
       ● Only presenting some of the interactions between components; skipping over some details when they are unenlightening
       ● No advice on tuning
       ● Many details are specific to this worked example

  10. Helicopter view
     ● Client connects to...
       ● SRM and issues a prepareToPut
       ● GridFTP door, preparing dCache for upload
       ● Pool, delivering data
       ● SRM and issues a putDone command
     ● Independently of the putDone, the pool flushes the file to tape

  11. The communication: srmPrepareToPut
     Participants: client, SRM, gPlazma, PnfsManager, Space Manager
     Messages, in order: prepareToPut(); login; login OK; Does file already exist?; No; request parent directory metadata; <metadata>; Check user can write; mark space in use; Select door; Increase READY turl count; <turl>

  12. Communication: FTP (part 1)
     Participants: client, FTP door, gPlazma, PnfsManager, Space Manager, PoolManager, Pool
     Messages, in order: USER; 200 User logged in; PUT; create file; <PnfsId>,<SI>; SelectWritePool; select LinkGroup; SelectWritePool; <pool>; store <PnfsId> <pool>; PoolAcceptFile; lookup AL & RP; PoolAcceptFile; Update load-model; PoolAcceptFile; <moverId>; <moverId>; update status; <moverId>

  13. Communication: FTP (part 2)
     Participants: client, FTP door, gPlazma, PnfsManager, Space Manager, PoolManager, Pool, Billing
     Messages, in order: TransferStarted; 127 PORT; Connect & send data; Last byte & close connection; <size>,<AL>,<RP>,<checksum>; Log transfer; TransferFinished; Update load-model; TransferFinished; update status; TransferFinished; Log transfer; 226 Transfer complete.; QUIT; 221 Goodbye

  14. Communication: srmPutDone
     Participants: client, SRM, gPlazma, PnfsManager
     Messages, in order: prepareToPut(); login; login OK; file exists?; yes; decrease READY TURL count; SUCCESS

  15. Communication: pool-flush
     Participants: Pool, PnfsManager, Space Manager
     Steps, in order: Time to flush file to tape; File exists?; Yes; Run tape integration script; FileFlushed; Update storage info; FileFlushed; Free space

  16. Queues: messages

  17. Messaging Domain Cell Cell Thread Thread

  18. Messages: generating quick reply Domain Cell Cell Thread Thread Generate quick Reply Message Queue

  19. Messages: slower replies Domain Cell Cell Thread Thread Thread Slower Reply Message Queue Another queue

  20. Messages: replying (blocking) Domain Cell Cell Thread WAIT Thread Generate quick Reply Message Queue

  21. Messages: registered call-back Domain Cell Cell Thread Thread Generate quick Reply Thread Message Queue Callback Queue

  22. Messages: pure asynchronous Domain Cell Cell Thread Thread Generate quick Reply Thread Message Queue Message Queue
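The difference between a sender that blocks for its reply and one that registers a call-back can be sketched with `CompletableFuture`. This is not dCache's cell messaging API: `ReplyStylesSketch`, `send`, and the two executors standing in for a cell's threads and a call-back thread are all hypothetical names used only to illustrate the two styles.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class ReplyStylesSketch {
    // Hypothetical stand-in for a remote cell: it handles the message on one of its own threads.
    static CompletableFuture<String> send(ExecutorService cell, String message) {
        return CompletableFuture.supplyAsync(() -> "reply to " + message, cell);
    }

    // Blocking style: the sending thread waits until the reply arrives.
    static String blockingReply(ExecutorService cell) {
        return send(cell, "ping").join();
    }

    // Call-back style: the sender registers a handler; a separate call-back thread runs it later.
    static String callbackReply(ExecutorService cell, ExecutorService callbackThreads) {
        AtomicReference<String> seen = new AtomicReference<>();
        CompletableFuture<Void> handled =
                send(cell, "ping").thenAcceptAsync(seen::set, callbackThreads);
        // The sender could do other work here; join() is only so this demo can return the value.
        handled.join();
        return seen.get();
    }

    static String[] runBoth() {
        ExecutorService cell = Executors.newSingleThreadExecutor();
        ExecutorService callbackThreads = Executors.newSingleThreadExecutor();
        try {
            return new String[] { blockingReply(cell), callbackReply(cell, callbackThreads) };
        } finally {
            cell.shutdown();
            callbackThreads.shutdown();
        }
    }

    public static void main(String[] args) {
        for (String reply : runBoth()) {
            System.out.println(reply);
        }
    }
}
```

In the blocking style a thread is occupied for the whole round trip; in the call-back style the waiting happens in a queue, which is one more queue to keep in mind.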

  23. Messages: tunnel Tunnel Cell Thread Write Message Queue TCP socket Read Thread

  24. SRM: srmPrepareToPut

  25. srmPrepareToPut: Jetty server
     Parameters shown: srmJettyThreadsMaxQueued (500), srmJettyConnectorBackLog (1024), srmJettyThreadsMax (500), srmJettyThreadsIdleTime (30s), srmJettyConnectorAcceptors (1), srmJettyThreadsMin (10)
     Diagram labels: TCP Backlog, Connection Queue, Thread
     Each thread:
     1. Wait for request: disconnect if client takes too long
     2. Parse request
     3. Run SRM code to build reply
     4. Send reply
     5. If client requested it or HTTP/1.0: disconnect
     6. Loop

  26. srmPrepareToPut: generating reply
     Parameters shown: srmPutReqThreadQueueSize (10,000), srmPutReqThreadPoolSize (250), srmPutReqReadyQueueSize (250), srmPutReqMaxReadyRequests (1,000)
     Diagram labels: Priority ThreadQueue, ThreadPool, ThreadManager (unbounded, 200), ReadyQueue, SURL, TURL
     Steps, in order: gPlazma: login OK; PnfsManager: Does file exist? No; PnfsManager: Get parent dir metadata, <metadata>; SpaceManager: use space, Done

  27. SRM and databases
     Parameters shown: srmMaxNumberOfJdbcTasksInQueue (1,000), srmJdbcExecutionThreadNum (5)
     DB Connection Pool: max (50), min (0), Never Expires
     Diagram labels: Jdbc Queue, Thread, DB Query

  28. gPlazma
     Parameter shown: gPlazmaNumberOfSimultaneousRequests (30)
     Diagram: MessageQueue; Thread: Process login attempt, Deliver login result

  29. PnfsManager
     Parameters shown: pnfsNumberOfThreads (4), PnfsQueueMaxSize (0), pnfsNumberOfThreadGroups (1)
     Diagram labels: MessageQueue, Single Thread, Thread

  30. PnfsManager PnfsThread DB Connection Pool max (90) idle (4 hours) min(30) Thread Send Result Folding

  31. SpaceManager ThreadManager DB Connection Pool MessageQueue max (30) idle (4 hours) Unbounded min(0) 200 Thread Thread Send Result

  32. FTP upload and pool flush

  33. FTP door Command Queue (50) Unbounded FTP Cmd Single Create Thread Thread TCP Backlog Thread Process Thread Command Create Thread MessageQueue Thread React on notification

  34. PoolManager MessageQueue SelectWrite Pool Thread Create Thread Reply with result Thread Forward message AcceptFile

  35. Pool: AcceptFile Mover Queue MessageQueue (1,000) maxActive max=50 Movers Idle=1 min min=5 Thread Thread (50) Reply Open socket TransferStarted TCP Backlog Notify: new precious file PnfsManager: update Billing:Info Door:TransferFinished

  36. Pool: flush StorageClass Queue Storage Queue Thread Notify: new precious file Thread PnfsManager:file exists? Yes run script PnfsManager:FileFlushed SpaceManager:FileFlushed

  37. Billing MessageQueue Thread Write to file Write to db

  38. SRM srmPutDone

  39. srmPutDone: Jetty server
     Parameters shown: srmJettyThreadsMaxQueued (500), srmJettyConnectorBackLog (1024), srmJettyThreadsMax (500), srmJettyThreadsIdleTime (30s), srmJettyConnectorAcceptors (1), srmJettyThreadsMin (10)
     Diagram labels: TCP Backlog, Connection Queue, Thread
     Each thread:
     1. Wait for request: disconnect if client takes too long
     2. Parse request
     3. Run SRM code to build reply
     4. Send reply
     5. If client requested it or HTTP/1.0: disconnect
     6. Loop

  40. srmPutDone: generating reply Thread gPlazma: login OK PnfsManager: Does file exist? Yes Decrease READY count

  41. Summary
     ● Queues are not evil
     ● Queues should be tuned for good performance, even under heavy load
       ● Simply increasing queues isn't always optimal
     ● Quite a number of queues in dCache: not all are obvious, not all are tunable
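One way to see why a longer queue is not automatically better is to estimate how long the last request in a full queue waits. The sketch below assumes requests leave the queue at a steady rate; the rate of 50 requests per second is a made-up figure for illustration, not a dCache measurement, and `QueueWaitSketch` is a hypothetical name.

```java
public class QueueWaitSketch {
    /** Seconds the last request in a full queue waits, assuming a steady service rate. */
    static double worstCaseWaitSeconds(int queueCapacity, double servedPerSecond) {
        if (servedPerSecond <= 0) {
            throw new IllegalArgumentException("service rate must be positive");
        }
        return queueCapacity / servedPerSecond;
    }

    public static void main(String[] args) {
        double servedPerSecond = 50.0; // hypothetical steady throughput
        for (int capacity : new int[] {500, 10_000}) {
            System.out.printf("queue=%d -> up to %.0f s waiting%n",
                    capacity, worstCaseWaitSeconds(capacity, servedPerSecond));
        }
    }
}
```

Under that assumption, growing the queue from 500 to 10,000 entries raises the worst-case wait from 10 s to 200 s, so a client with a shorter timeout may give up on a request that is still queued.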
