Predicting Intermediate Storage Performance for Workflow Applications

  1. Predicting Intermediate Storage Performance for Workflow Applications. Lauro B. Costa, Samer Al-Kiswany, Abmar Barros*, Hao Yang, and Matei Ripeanu. University of British Columbia; *UFCG, Brazil. PDSW'13, Nov. 18th, co-located with SC'13.

  2. Storage System. Many compute nodes with a storage system co-deployed offer high aggregated bandwidth; the backend storage (e.g., NFS, GPFS) runs on one or a few servers. Co-deployment avoids the backend storage becoming a bottleneck and creates the opportunity to configure the storage per application.

  3. Storage System Configuration. Different storage parameters (e.g., data placement, number of storage nodes, chunk size) benefit different workloads (e.g., data sharing, I/O intensive, read/write size). The proper choice of parameters depends on the workload.
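
As a hedged illustration of this parameter-to-workload match, the sketch below contrasts two hypothetical per-application configurations; every key and value is invented for this example rather than taken from the system's actual configuration format.

```python
# Hypothetical per-application storage knobs; names and values are
# illustrative, not the system's real configuration keys.

# An I/O-intensive workload with large reads/writes might favor striping
# across many storage nodes with large chunks for aggregate bandwidth:
io_intensive_config = {
    "data_placement": "striped",
    "storage_nodes": 16,
    "chunk_size_mb": 4,
}

# A workload with heavy data sharing and small reads might instead favor
# replicated placement so readers hit nearby copies:
data_sharing_config = {
    "data_placement": "replicated",
    "storage_nodes": 4,
    "chunk_size_mb": 1,
}
```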

  4. BLAST Example (figure).

  5. How to support the intermediate storage configuration?

  6. Configuration Loop: identify parameters, define a target performance, run the application, analyze system activity, and repeat. Running the application at each iteration makes the loop costly.

  7. Automating the Configuration Loop. A what-if evaluation engine takes the application trace, a platform description, and seeding benchmark results, answers "what if...?" questions about candidate configurations, and the application then executes with the desired configuration.
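
A hedged sketch of how such an automated loop could query a what-if engine over candidate configurations; the function names and the toy cost formula are illustrative stand-ins, not the talk's actual engine or model.

```python
import itertools

def predict_runtime(trace, platform, config):
    # Toy placeholder, NOT the talk's model: a real what-if engine would
    # replay the trace against the seeded storage-system model. Here we
    # just divide bytes moved by an assumed aggregate bandwidth.
    bytes_moved = sum(op["size"] for op in trace)
    node_bw = platform["node_bandwidth_bytes_per_s"]
    return bytes_moved / (node_bw * config["storage_nodes"])

def best_configuration(trace, platform):
    # The parameter space the talk explores: number of storage nodes
    # and chunk size.
    candidates = [
        {"storage_nodes": n, "chunk_size_mb": c}
        for n, c in itertools.product([1, 2, 4, 8, 16], [1, 4, 16])
    ]
    # Ask "what if?" per candidate instead of running the application
    # once per configuration.
    return min(candidates, key=lambda c: predict_runtime(trace, platform, c))
```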

  8. Predictor Requirements: accuracy, response time/resource usage, usability.

  9. Storage System Model. Focus at a high level – manager, storage nodes, clients – with no low-level details (e.g., CPU). Simple seeding.

  10. Storage System Model (figure).

  11. Seeding the Model. No monitoring changes to the system – use coarse-level measurements and infer the services' times. Small deployment – one instance of each component.
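
A hedged sketch of this seeding style: time whole operations from the outside and infer per-service times by subtraction. The `client` object and its methods are hypothetical, introduced only for illustration.

```python
import time
import statistics

def coarse_time(op, repeats=20):
    # External wall-clock timing around whole requests; nothing inside
    # the storage system is instrumented or modified.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        op()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# On a small deployment (one instance of each component), per-service
# times can be inferred by subtraction. `client` is hypothetical:
# t_manager = coarse_time(lambda: client.stat("seed_file"))  # metadata only
# t_total   = coarse_time(lambda: client.read("seed_file"))  # metadata + data
# t_storage = t_total - t_manager   # inferred storage-node service time
```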

  12. Evaluation. Metrics – accuracy, response time. Workload – a synthetic benchmark and an application. Testbed: a cluster of 20 machines.

  13. An Application: BLAST. A DNA database file; several queries (tasks) over the file. Evaluate different parameters: number of storage nodes, number of clients, chunk size.

  14. BLAST Results. Performance varies across configurations (~2x difference); the predictor's accuracy allows good decisions while using ~3000x less resources than running the application.

  15. Concluding Remarks. Non-intrusive seeding process (system identification); low runtime; accuracy allows good decisions; the predictor can support development.

  16. Future Work. Automate parameter exploration – prune the space by preprocessing the input, induce placement based on task dependency. Add applications. Increase scale. Add metrics – cost; energy is challenging; data transferred is already accurate.

  17. Concluding Remarks. Non-intrusive seeding process (system identification); low runtime; accuracy allows good decisions; the predictor can support development.

  18. Workflow Applications. A DAG represents task dependency; a scheduler controls dependencies and task execution on a cluster; tasks communicate via files (see the sketch below).
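
A minimal, self-contained sketch of this execution model, with invented task and file names: tasks form a DAG, edges are files, and a toy scheduler runs a task once its input files exist.

```python
# Toy model of the slide's picture: a workflow is a DAG whose nodes are
# tasks and whose edges are files; a task becomes runnable once the
# files it reads have been produced.

workflow = {
    # task: (input files, output files)
    "split":   ([],               ["part1", "part2"]),
    "filter1": (["part1"],        ["out1"]),
    "filter2": (["part2"],        ["out2"]),
    "merge":   (["out1", "out2"], ["result"]),
}

def runnable(task, produced):
    inputs, _ = workflow[task]
    return all(f in produced for f in inputs)

# Toy scheduler loop: dispatch whatever is ready until every task is done.
produced, done = set(), set()
while len(done) < len(workflow):
    for task in workflow:
        if task not in done and runnable(task, produced):
            print("running", task)  # a real scheduler would dispatch to a node
            produced.update(workflow[task][1])
            done.add(task)
```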

  19. Synthetic Benchmarks. Stress the system – I/O only, tend to create contention. Based on workflow patterns – evaluate different data placements.

  20. Workflow Patterns (figure).

  21. Synthetic Benchmarks. For the pipeline, reduce, and broadcast patterns, accuracy can support the configuration decision at ~2000x less resources.

  22. Related Work • Storage-enclosure focused • Detailed models and seeding (requiring monitoring changes) • Lack of prediction of the total execution time for workflow applications • Machine learning

  23. Workload Description. An I/O trace per task – read, write, size, offset – plus the task dependency graph.
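
A hedged sketch of what such a workload description could look like; the record fields and names are illustrative, not the actual trace format used in the talk.

```python
# Illustrative workload description: field and task names are invented.

# I/O trace per task: one record per operation, with size and offset.
io_trace = {
    "task_7": [
        {"op": "read",  "file": "db.fasta", "offset": 0, "size": 1 << 20},
        {"op": "write", "file": "hits.out", "offset": 0, "size": 64 << 10},
    ],
}

# Task dependency graph: task -> the tasks it must wait for.
dependencies = {
    "task_7": ["task_0"],
}
```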

  24. BLAST: CPU hours (figure).

  25. Platform Example – Argonne BlueGene/P (figure): 160K cores, 2.5K I/O nodes, a high-speed network (850 MBps per node, 2.5 GBps per 64 nodes), a 10 Gb/s switch complex, and GPFS on 24 servers with an I/O rate of 8 GBps, i.e., roughly 51 KBps per core. Nodes are dedicated to an application, so the storage system can be coupled with the application's execution.

  26. Tuning is Hard. Defining target values can be hard; understanding distributed systems, the application, or the application's workloads is complex; the workload or the infrastructure can change; tuning is time-consuming.

  27. Storage System (figure).

  28. Montage Example. Tasks communicate via shared files.

  29. Storage System: meta-data manager, storage module, client module (see the read-path sketch below).
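
A minimal sketch of how a read could flow through these three components; `lookup_chunks` and `get_chunk` are assumed method names for illustration, not the system's real API.

```python
# Sketch of a read flowing through the three components the slide names.

def client_read(manager, storage_nodes, filename):
    # 1. The client module asks the meta-data manager where the chunks are.
    placements = manager.lookup_chunks(filename)  # [(chunk_id, node_id), ...]
    # 2. It then fetches each chunk from the storage module that holds it.
    data = bytearray()
    for chunk_id, node_id in placements:
        data += storage_nodes[node_id].get_chunk(chunk_id)
    return bytes(data)
```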

  30. Configuration Loop. Identify parameters, define a target performance, analyze system activity; the performance predictor replaces the costly "run application" step.

  31. Intermediate Storage System. A storage system co-deployed with the application avoids the backend storage becoming a bottleneck and offers the opportunity to configure the storage per application.
