AI and Predictive Analytics in Data-Center Environments: Performance & Executing Experiments


  1. AI and Predictive Analytics in Data-Center Environments: Performance & Executing Experiments • Josep Ll. Berral @BSC • Intel Academic Education Mindshare Initiative for AI

  2. Introduction • “We have to choose where/how to run AI algorithms & experiments”

  3. Introduction • Algorithms have computing/data requirements • Computation Resources: machines to run our algorithms • CPU, Memory, GPUs, accelerators, storage, ... • Time to run • Train models, infer new data, … • Data: data to feed our algorithms • What we are modeling and imitating

  4. Resources • Algorithms have computing requirements [Figure: algorithms placed on a machine's resources (CPU, Mem, Disk) over Time]

  5. Resources • Algorithms also have data requirements • Data in memory • Data in disks • Data from users, over the network [Figure: algorithms reading data from Mem, Disk, and Network]

  6. Environment We need a COMPUTING ENVIRONMENT!

  7. Environment • Local Machines • Own computer • Workstations at work • ...

  8. Environment • Cluster Machines • Data-center at work • Data-center at labs • Scientific grids • ... [Figure: submission from your machine, execution on the data-center / cluster]

  9. Environment • The Cloud • Data-centers from resource providers [Figure: experiments sent to a resource provider's data-centers]

  10. Environment • Choosing the environment [Figure: candidate machines with different sizes, from 1 to 64 CPUs and from 2 GB to 256 GB of memory]

  11. Environment • Choosing the environment [Figure: a wider set of candidate machines, from 1 to 128 CPUs and from 2 GB to 1 TB of memory, plus an idealized option with ∞ CPUs and ∞ GB of memory]

  12. Performance • “Work x Time” • “Capacity to Progress”

  13. Performance • Performance is (usually) linked to Resources [Figure: Performance as a function of Resources (CPU, Mem, Disk)]

  14. Performance • Performance is (usually) linked to Resources • Up to a point • Not always • Not necessarily linear • etc... [Figure: Performance as a function of Resources (CPU, Mem, Disk)]

  15. Performance • Computing Environment • Pool of Resources • Algorithms/Apps/Experiments • Require resources [Figure: a request for 1 CPU and 32 GB of memory matched against the pool of CPU, Mem, and Disk resources: OK!]
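The matching step on this slide can be sketched in a few lines of Python. This is only an illustration: the `Machine` class, the `machines_that_fit` helper, and the machine pool are hypothetical names and values, and the request (1 CPU, 32 GB of memory) is one reading of the slide's figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    cpus: int
    mem_gb: int


def machines_that_fit(pool, need_cpus, need_mem_gb):
    """Return the machines whose CPU count and memory both cover the request."""
    return [m for m in pool if m.cpus >= need_cpus and m.mem_gb >= need_mem_gb]


# Hypothetical pool; these CPU/memory pairings are illustrative only.
pool = [
    Machine("small", cpus=2, mem_gb=2),
    Machine("medium", cpus=16, mem_gb=16),
    Machine("large", cpus=64, mem_gb=128),
]

# An experiment assumed to need 1 CPU and 32 GB of memory.
fits = machines_that_fit(pool, need_cpus=1, need_mem_gb=32)
print([m.name for m in fits])
```

Only the machine with enough memory qualifies here, even though every machine has enough CPUs: a request must fit on every resource dimension at once.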

  16. Some Theory: Little’s Law • Little’s Law: L = λ W • “Arrival rate x Dedicated time per input = Average load”

  17. Some Theory: Little’s Law • Relation between • Received load • Resources/Time required • Average load // Required resources [Figure: resource icons (CPU, Mem, Disk)]

  18. Little’s Law: L = λ W • Example • We submit λ = 100 experiments / hour • Exps. take an average of W = 0.5 hours [with 1 CPU per exp.] • Average number of exps. on our system: L = 50 exps [avg. 50 CPUs in use] • What we expect (what we need): 50 CPUs busy

  19. Little’s Law Demo!
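The original demo is not part of this transcript. As a stand-in, here is a minimal Python sketch of the arithmetic from the example slide; the function name `average_load` and the variable names are assumptions made for illustration.

```python
def average_load(arrival_rate, time_in_system):
    """Little's Law: L = lambda * W."""
    return arrival_rate * time_in_system


arrival_rate = 100   # lambda: experiments submitted per hour
time_in_system = 0.5 # W: hours each experiment takes
cpus_per_exp = 1     # each experiment uses one CPU

avg_exps = average_load(arrival_rate, time_in_system)
avg_cpus = avg_exps * cpus_per_exp

print(avg_exps)  # average experiments in the system
print(avg_cpus)  # average CPUs in use
```

With the slide's numbers this gives L = 50 experiments, and therefore an average of 50 CPUs in use.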

  20. Resource Limits • 100 CPUs • Arrival rate: λ = 100 experiments / hour • Exps. take: W = 0.5 hours [1 CPU per exp.] • Average exps. in system: L = 50 exps [Timeline, 100 jobs arriving each hour: minutes 0 to 30, 100 jobs running, 100 CPUs busy; minutes 30 to 60, 0 jobs running, 0 CPUs busy]

  21. Resource Limits • 50 CPUs (what Little’s Law indicated) • Arrival rate: λ = 100 experiments / hour • Exps. take: W = 0.5 hours [1 CPU per exp.] • Average exps. in system: L = 50 exps [Timeline, 100 jobs arriving each hour: minutes 0 to 30, 50 jobs running, 50 CPUs busy, 50 jobs in queue; minutes 30 to 60, 50 jobs running, 50 CPUs busy, 0 jobs in queue]

  22. Resource Limits • 25 CPUs • Less than needed! • Arrival rate: λ = 100 experiments / hour • Exps. take: W = 0.5 hours [1 CPU per exp.] • Average exps. in system: L = 50 exps [Timeline, 100 jobs arriving each hour: minutes 0 to 30, 25 jobs running, 25 CPUs busy, 75 jobs in queue; minutes 30 to 60, 25 jobs running, 25 CPUs busy, 50 jobs in queue]

  23. Resource Limits • 25 CPUs • Less than needed! • Arrival rate: λ = 100 experiments / hour • Exps. take: W = 0.5 hours [1 CPU per exp.] • Average exps. in system: L = 50 exps [Timeline, 100 jobs arriving each hour: minutes 0 to 30, 25 jobs running, 25 CPUs busy, 75 jobs in queue; minutes 30 to 60, 25 jobs running, 25 CPUs busy, 50 jobs in queue; minutes 60 to 90, after the next 100 jobs arrive, 25 jobs running, 25 CPUs busy, 125 jobs in queue]
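The timelines on the Resource Limits slides can be reproduced with a small simulation. This is a simplified sketch, not the presenter's demo: it assumes jobs arrive in batches of 100 at the start of each hour, every job takes exactly 30 minutes on one CPU, and the system is observed in 30-minute slots. The function `simulate` and its parameters are hypothetical names.

```python
def simulate(cpus, batch=100, arrive_every=60, job_minutes=30, horizon=90):
    """Return a list of (start_minute, jobs_running, jobs_queued) per slot.

    Assumes arrive_every is a multiple of job_minutes and that all jobs
    started in a slot finish exactly at the end of that slot.
    """
    queued = 0
    timeline = []
    for t in range(0, horizon, job_minutes):
        if t % arrive_every == 0:
            queued += batch           # a new batch of jobs arrives
        running = min(cpus, queued)   # one CPU per job, limited by free CPUs
        queued -= running             # the rest keeps waiting in the queue
        timeline.append((t, running, queued))
    return timeline


for cpus in (100, 50, 25):
    print(cpus, simulate(cpus))
```

With 25 CPUs the queue goes 75, 50, 125 across the three slots: only 50 jobs finish per hour while 100 arrive, so the backlog grows by 50 jobs every hour and never drains.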

  24. Throughput • Throughput: outcome per time unit • E.g. experiments finished per hour • E.g. data-sets processed per minute • E.g. data points trained per second [Figure: Throughput as a function of Load, with the Resources Limit marked]
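A rough way to see how throughput relates to load and the resource limit is to cap the load at the system's capacity. The sketch below assumes an idealized model (no overheads, one CPU per experiment, fixed run time), and the function name `throughput` and its parameters are illustrative, not from the slides.

```python
def throughput(load_per_hour, cpus, hours_per_exp=0.5, cpus_per_exp=1):
    """Experiments finished per hour: the load, capped by the resource limit."""
    # Capacity: how many experiments the CPUs can finish in one hour.
    capacity_per_hour = (cpus / cpus_per_exp) / hours_per_exp
    return min(load_per_hour, capacity_per_hour)


for cpus in (100, 50, 25):
    print(cpus, throughput(100, cpus))
```

Under these assumptions, 100 or 50 CPUs keep up with a load of 100 experiments per hour, while 25 CPUs cap throughput at 50 experiments per hour regardless of how much load is submitted.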

  25. Resource Competition • Systems do not always have “queues” • Processes (applications) compete in the system • For using the CPU • For getting some memory • For accessing the disk and network (I/O)
