

  1. Introduction to Autonomic Computing Johan Tordsson Department of Computing Science www.cloudresearch.org

  2. About me
     • MSc (Civ.Ing) Computer Science (2004)
     • PhD Umeå, Grid computing (2009)
     • Postdoc in Madrid, Spain (2009), OpenNebula
     • Architect and other roles in several EC projects (2009-2013)
     • Associate professor (2014 - now)
     • Research
       – Autonomic cloud and data center management
       – How to make clouds run themselves faster/better/cheaper?
     • Spare-time job:
       – CTO & co-founder of Elastisys (UMU cloud research spinoff)
       – Evangelizing that computers (will) beat humans at IT operations

  3. Outline
     • Why – do we need autonomic computing?
     • What – are autonomic systems?
     • How – to build these autonomic systems?
     • When – will they happen?
     • Who – will build them?

  4. Motivation: software complexity

  5. Motivation: scale
     • Enormous buildings with servers, storage equipment, networking, and cooling
     • A factory for IT services

  6. Motivation: faults
     • Question: what is the probability of a hard drive failure?
     • In my laptop?
       – Will happen every few years, hopefully not right now…
     • In a large supercomputer or data center with more than 100k nodes?
       – Will happen during this talk!
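A rough back-of-the-envelope version of this question, assuming drives fail independently at a constant rate. The 2% annual failure rate is a hypothetical placeholder, not a number from the slides; the point is how quickly the probability of *some* failure grows with the number of drives:

```python
import math

HOURS_PER_YEAR = 24 * 365


def p_any_failure(num_drives: int, annual_failure_rate: float, hours: float) -> float:
    """P(at least one of num_drives fails within `hours`).

    Assumes independent drives with a constant failure rate (exponential model).
    """
    # Convert the annual failure probability into a constant hourly hazard rate.
    hazard_per_hour = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    # Probability that one drive survives the whole window.
    p_survive_one = math.exp(-hazard_per_hour * hours)
    # "No failure" requires every drive to survive; take the complement.
    return 1.0 - p_survive_one ** num_drives


if __name__ == "__main__":
    AFR = 0.02  # hypothetical 2% annual failure rate per drive
    print(f"1 drive,     1-hour talk: {p_any_failure(1, AFR, 1):.6f}")
    print(f"1 drive,     3 years:     {p_any_failure(1, AFR, 3 * HOURS_PER_YEAR):.3f}")
    print(f"100k drives, 1-hour talk: {p_any_failure(100_000, AFR, 1):.3f}")
    print(f"100k drives, 1 day:       {p_any_failure(100_000, AFR, 24):.3f}")
```

Under these assumptions a single drive has only a few percent chance of failing in three years, while across 100k drives a failure within a day is close to certain; nodes with several drives, or a higher failure rate, push the per-talk probability up further.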

  7. Motivation: costs
     • Question: How many servers can be handled by one system administrator?
     • A very old question…
     • Some numbers:
       – 10: very complex systems
       – ~300: standard large-scale organization
       – Several 1000s: virtualized data center
       – 26k: Facebook (2013)
     • Higher-level management and better abstractions are needed
       – Alternative: an exponential increase in the need for systems management

  8. Autonomic option
     • Autonomic computing
       – Named after the autonomic nervous system
       – Systems manage themselves according to admin goals
       – Self-governing operation of the entire system, not just parts of it
       – New components integrate effortlessly, as a new cell establishes itself in the body

  9. Autonomic Computing
     • IBM initiative in the early 2000s
     • Landmark paper published 2003 in IEEE Computer by Kephart and Chess @ IBM
     • Active research field since then; during 2003-2013:
       – 200 conferences/workshops
       – 8000+ papers
     • Lots of funding
       – EC FP6, FP7, H2020
       – WASP…
     • Industry uptake
       – Many big IT vendors & startups
     • Key point
       – Self-management of IT systems

  10. Self-management (1/3)
     • Self-management in the face of
       – Changing components
       – External conditions
       – Hardware/software failures
     • Ex. component upgrade
       – Continually check for component upgrades
       – Download and install
       – Reconfigure itself
       – Run a regression test
       – When it detects errors, revert to the older version
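The upgrade example can be sketched as one pass of a loop that a scheduler would call periodically ("continually check"). Everything here is hypothetical: `ManagedComponent` stands in for a real component, and the upgrade check, reconfiguration, and regression test are passed in as plain functions rather than tied to any real package manager or test framework:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ManagedComponent:
    """Hypothetical stand-in for an upgradable software component."""
    version: str
    previous: List[str] = field(default_factory=list)

    def install(self, new_version: str) -> None:
        self.previous.append(self.version)  # keep the old version for rollback
        self.version = new_version

    def revert(self) -> None:
        self.version = self.previous.pop()


def upgrade_step(component: ManagedComponent,
                 check_for_upgrade: Callable[[], Optional[str]],
                 reconfigure: Callable[[ManagedComponent], None],
                 regression_test: Callable[[ManagedComponent], bool]) -> str:
    """One pass of the self-upgrade cycle; call it periodically."""
    new_version = check_for_upgrade()             # check for an upgrade
    if new_version is None or new_version == component.version:
        return "up to date"
    component.install(new_version)                # download and install
    reconfigure(component)                        # reconfigure itself
    if regression_test(component):                # run a regression test
        return f"upgraded to {new_version}"
    component.revert()                            # errors detected: revert
    reconfigure(component)                        # restore the old configuration
    return f"reverted to {component.version}"


def no_op(_component: ManagedComponent) -> None:
    pass


if __name__ == "__main__":
    db = ManagedComponent("1.0")
    print(upgrade_step(db, lambda: "1.1", no_op, lambda c: True))   # upgraded to 1.1
    print(upgrade_step(db, lambda: "1.2", no_op, lambda c: False))  # reverted to 1.1
```

Passing the steps in as functions keeps the control flow (install, test, roll back) separate from how a particular system actually fetches or tests software.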

  11. Self-management (2/3)
     • Four aspects of self-management
       – Self-configuration
         • Configure themselves automatically
         • High-level policies (what is desired, not how)
       – Self-optimization
         • Continually seek ways to improve their operation
         • Hundreds of tunable parameters
       – Self-healing
         • Handle faults and errors
         • Analyze information from logs and monitors
       – Self-protection
         • Malicious attacks
         • Cascading failures
         • Admin mistakes

  12. Self-management (3/3)
     [Images: HAL 9000 (2001: A Space Odyssey); the Terminator]
     • Is autonomic computing achievable without self-awareness?
       – That is, without hard artificial intelligence
     • (Hollywood) misconception: machines will take over all human tasks
       – AI could be a “real danger” (S. Hawking)
       – Unemployment?
     • Actual idea: machines will free people to manage systems at a higher level

  13. Autonomic elements
     [Figure: autonomic element consisting of an autonomic manager (Monitor, Analyze, Plan, Execute, Knowledge) and a managed element]
     • Fundamental atom of the architecture
       – Managed element(s)
         • Server, database, storage system, etc.
       – Autonomic manager
     • Responsible for:
       – Providing its service
       – Managing behavior according to goals
       – Interacting with other autonomic elements

  14. Autonomic element details
     [Figure: autonomic manager (Monitor, Analyze, Plan, Execute, Knowledge) and managed element, each exposing sensors and effectors]
     • Sensors: monitor the environment
     • Effectors: tune the managed element
     • MAPE loop:
       – Process for self-management of an autonomic element

  15. The MAPE loop
     1. Monitor:
        – Collect information about the state of the system
        – Lots of metrics available
        – Which ones to gather?
        – How often to monitor?
     4. Execute:
        – Turn the “knobs” of the managed element
        – Interactions between knobs?
          • Unknown, even to human operators
          • At Google, 238 knobs in each managed entity

  16. The MAPE loop (cont.)
     2. Analyze:
        – Estimate the current state based on monitoring data
        – Commonly uses a model of the world for this
          • “All models are wrong, but some are useful”
          • What part of the system to model? How?
          • Correlations?
     3. Plan:
        – Select action(s), i.e., which knobs to turn?
        – Can be formulated as an optimization problem
        – Reactive vs. predictive/proactive methods
     • Knowledge management
       – Update model dynamically (monitoring)
       – Evaluate effects of actions (execution)
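A minimal sketch of one MAPE-K iteration, assuming a toy managed element whose only knob is its number of servers. The capacity per server, the 70% target utilization, and the simple capacity model used in Analyze are hypothetical choices for illustration. The loop is purely reactive (it sizes for the load just observed), and Knowledge is reduced to the goal plus a history of measurements:

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ManagedService:
    """Toy managed element: its single knob is the number of servers."""
    servers: int = 1
    capacity_per_server: float = 100.0  # requests/s per server (hypothetical)
    load: float = 0.0                   # offered request rate, set by the environment

    def read_utilization(self) -> float:    # sensor
        return self.load / (self.servers * self.capacity_per_server)

    def set_servers(self, n: int) -> None:   # effector
        self.servers = n


@dataclass
class Knowledge:
    target_utilization: float = 0.7     # admin goal (hypothetical value)
    history: List[float] = field(default_factory=list)


class AutonomicManager:
    def __init__(self, element: ManagedService, knowledge: Knowledge):
        self.element = element
        self.knowledge = knowledge

    def monitor(self) -> float:
        utilization = self.element.read_utilization()
        self.knowledge.history.append(utilization)  # update knowledge
        return utilization

    def analyze(self, utilization: float) -> float:
        # Model of the world: load = utilization * total capacity.
        return utilization * self.element.servers * self.element.capacity_per_server

    def plan(self, estimated_load: float) -> int:
        # Optimization: fewest servers that keep utilization at or below the target.
        per_server = self.element.capacity_per_server * self.knowledge.target_utilization
        return max(1, math.ceil(estimated_load / per_server))

    def execute(self, servers: int) -> None:
        if servers != self.element.servers:
            self.element.set_servers(servers)  # turn the knob

    def run_once(self) -> None:
        self.execute(self.plan(self.analyze(self.monitor())))


if __name__ == "__main__":
    service = ManagedService(servers=2, load=500.0)
    manager = AutonomicManager(service, Knowledge())
    manager.run_once()
    print(service.servers)  # 8: 500 req/s needs ceil(500 / 70) servers
    service.load = 100.0
    manager.run_once()
    print(service.servers)  # 2: ceil(100 / 70)
```

A predictive/proactive variant would replace `analyze` with a forecast of future load drawn from `knowledge.history` instead of the last measurement.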

  17. Engineering challenges (1/3)
     • Life cycle of an autonomic element
       – Design, test, and verification
         • Testing autonomic elements is a challenge
       – Installation and configuration
         • Element registers itself in a directory service
       – Monitoring and problem determination
         • Elements will continually monitor themselves
         • Adaptation, optimization, reconfiguration
       – Upgrading
       – Uninstallation or replacement

  18. Engineering challenges (2/3)
     • Relationships among autonomic elements
       – Specification
         • Set of output/input services of autonomic elements
         • Expressed in a standard format
         • Description syntax and semantics
       – Location
         • Find input services that the autonomic element needs
       – Negotiation
       – Provision
       – Operation
         • Autonomic manager oversees the operation
       – Termination
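The registration and location steps (an element registering itself in a directory service, then looking up the input services it needs) can be sketched as an in-memory registry. The class and method names are hypothetical, and a real deployment would use a distributed service directory with standardized service descriptions rather than plain strings. Negotiation and provisioning are left out:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ElementSpec:
    """Specification of an autonomic element: its output and input services."""
    name: str
    provides: Set[str] = field(default_factory=set)  # output services
    requires: Set[str] = field(default_factory=set)  # input services


class Directory:
    """Hypothetical in-memory directory service."""

    def __init__(self) -> None:
        self._elements: Dict[str, ElementSpec] = {}

    def register(self, spec: ElementSpec) -> None:
        self._elements[spec.name] = spec

    def deregister(self, name: str) -> None:  # uninstallation / termination
        self._elements.pop(name, None)

    def providers_of(self, service: str) -> List[str]:
        return sorted(name for name, spec in self._elements.items()
                      if service in spec.provides)

    def locate_inputs(self, name: str) -> Dict[str, List[str]]:
        """For each input service `name` needs, list the elements offering it."""
        spec = self._elements[name]
        return {service: [p for p in self.providers_of(service) if p != name]
                for service in sorted(spec.requires)}


if __name__ == "__main__":
    directory = Directory()
    directory.register(ElementSpec("db-1", provides={"sql"}))
    directory.register(ElementSpec("db-2", provides={"sql"}))
    directory.register(ElementSpec("web", provides={"http"}, requires={"sql", "cache"}))
    print(directory.locate_inputs("web"))  # {'cache': [], 'sql': ['db-1', 'db-2']}
```

An empty provider list (here, `cache`) marks an input that the later negotiation and provisioning steps would have to resolve.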

  19. Engineering challenges (3/3)
     • System-wide issues
       – Authentication, encryption, signing
       – Autonomic elements can identify themselves
       – Autonomic system must be robust against insidious forms of attack
     • Goal specification
       – Humans provide the goals and constraints
       – Ensure that goals are specified correctly in the first place
       – Autonomic systems need to protect themselves from bad input goals:
         • Inconsistent, implausible, dangerous, or unrealizable
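As a sketch of screening bad input goals, the checks below map each of the four categories to one concrete test on a goal for an autoscaled service. The `Goal` fields and every check are hypothetical examples chosen for illustration, not a standard goal format:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Goal:
    """Hypothetical goal specification for an autoscaled service."""
    min_servers: int
    max_servers: int
    target_response_time_s: float
    budget_per_hour: float
    cost_per_server_hour: float


def validate_goal(goal: Goal) -> List[str]:
    """Return a list of problems; an empty list means the goal is accepted."""
    problems = []
    if goal.min_servers > goal.max_servers:
        problems.append("inconsistent: min_servers > max_servers")
    if goal.target_response_time_s <= 0:
        problems.append("implausible: non-positive response-time target")
    if goal.min_servers < 1:
        problems.append("dangerous: allows scaling the service down to zero servers")
    if goal.min_servers * goal.cost_per_server_hour > goal.budget_per_hour:
        problems.append("unrealizable: budget cannot pay for min_servers")
    return problems


if __name__ == "__main__":
    goal = Goal(min_servers=5, max_servers=2, target_response_time_s=0.2,
                budget_per_hour=3.0, cost_per_server_hour=1.0)
    for problem in validate_goal(goal):
        print(problem)
```

Rejecting a goal before it reaches the planner is cheaper than discovering at run time that no action can satisfy it.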

  20. Specifying goals (1/3)
     • Rules
       – Often simple condition-action pairs
         • If something happens, do this
         • If something else happens, do that
         • …
       – Can use more complex languages to express states, context, etc.
       – Explicit enumeration is tedious
       – Very limited ability to express complex actions
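A minimal sketch of rule-based goals as condition-action pairs; the thresholds and the `cpu`/`servers` state keys are hypothetical. All conditions are evaluated against the same observed state before any action runs, so one rule's action cannot change whether another rule fires in the same round:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    name: str
    condition: Callable[[State], bool]  # "if something happens..."
    action: Callable[[State], None]     # "...do this"


def apply_rules(rules: List[Rule], state: State) -> List[str]:
    """Fire every rule whose condition holds; return the names of fired rules."""
    matching = [rule for rule in rules if rule.condition(state)]
    for rule in matching:
        rule.action(state)
    return [rule.name for rule in matching]


def scale_out(state: State) -> None:
    state["servers"] += 1


def scale_in(state: State) -> None:
    state["servers"] -= 1


RULES = [
    Rule("scale-out", lambda s: s["cpu"] > 0.8, scale_out),
    Rule("scale-in", lambda s: s["cpu"] < 0.2 and s["servers"] > 1, scale_in),
]

if __name__ == "__main__":
    state = {"cpu": 0.93, "servers": 4}
    print(apply_rules(RULES, state), state)  # ['scale-out'] {'cpu': 0.93, 'servers': 5}
```

Every new situation needs another explicit rule, which is the enumeration problem noted on the slide.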

  21. Specifying goals (2/3)
     • Utility functions
       – Mathematical expressions
       – Map system state to a scalar value
       – Represent high-level objectives
       – What parts of the system state to include?
       – What should the function look like?
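A sketch of a utility function and its use, under hypothetical assumptions: the state is reduced to mean response time and server count, response time is estimated with an M/M/1 queue per server with load split evenly, and the value of meeting a 0.5 s target and the per-server cost are made-up numbers. The manager then simply picks the server count with the highest utility:

```python
import math
from typing import Dict

State = Dict[str, float]


def utility(state: State, target_s: float = 0.5,
            value_if_met: float = 100.0, cost_per_server: float = 2.0) -> float:
    """Map a system state to a scalar: reward for meeting the target minus cost."""
    met = state["response_time_s"] <= target_s
    return (value_if_met if met else 0.0) - cost_per_server * state["servers"]


def predicted_response_time(arrival_rate: float, servers: int, service_rate: float) -> float:
    """M/M/1 mean response time per server: 1 / (mu - lambda_per_server)."""
    per_server = arrival_rate / servers
    if per_server >= service_rate:
        return math.inf  # overloaded: the queue grows without bound
    return 1.0 / (service_rate - per_server)


def best_server_count(arrival_rate: float, service_rate: float, max_servers: int = 50) -> int:
    """Choose the server count whose predicted state has the highest utility."""
    def utility_of(n: int) -> float:
        rt = predicted_response_time(arrival_rate, n, service_rate)
        return utility({"response_time_s": rt, "servers": n})
    return max(range(1, max_servers + 1), key=utility_of)


if __name__ == "__main__":
    # 90 req/s offered; each server handles 10 req/s on average.
    print(best_server_count(90.0, 10.0))  # 12
```

Changing the objective means changing `utility` only, not the search; that separation is what makes utility functions a higher-level way to state goals than rules.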

  22. Specifying goals (3/3)
     • Policies
       – (Higher-level) descriptions of goals and constraints for operation
       – How to map them to lower-level behavior?
       – Composition of multiple policies
       – What high-level language to use?
         • Turing-complete?
         • No widely used languages available today
     • Human operators are used to explicit steering
       – Not used to indirect goal specification

  23. Autonomic management techniques: requirements
     • Robustness
       – Avoid oscillations or sudden behavioral changes
     • Scalability
       – Internet scale: millions of servers and networks, even more autonomic agents (50 billion devices?)
     • Adaptive to changing workloads
       – Some methods are reliable for certain load patterns but become unstable once the load or system dynamics change
     • Performance
       – Need to make decisions fast enough to react in time
       – Optimal solutions vs. approximations
     • Simplicity
       – Key to adoption
       – Complex models vs. model-free?
       – Learning phase required before deployment?

  24. Autonomic management: sample techniques
     • Heuristic frameworks
       – Fast and simple, rules of thumb
     • Control theory
       – Used to steer, e.g., industrial plants, embedded systems, etc.
       – Discretization for data packet flows (queuing theory)
     • Machine learning
       – Evolve behavior based on empirical (monitoring) data
       – Examples: neural networks, genetic algorithms, reinforcement learning

  25. Heuristics
     • Rules of thumb
       – Often lack a theoretical background
     • Often used to handle very complex (NP-hard) problems
       – Scalable, find solutions fast
     • Greedy:
       – Local decisions that make sense right here/now
       – May not result in an optimal solution
     • Hill climbing
       – Steer the search (here: the management of the system) towards the steepest slope
     • Often no upper bound on solution quality
       – Not possible to know the distance from the optimal solution
       – “The O-word…”
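Hill climbing is itself a greedy local search: at each step it moves to whichever neighbouring configuration looks best right now. The sketch below tunes integer knobs that way. The two-knob `throughput` objective is a hypothetical stand-in for a measured metric, and the second example shows the method stopping at a local optimum without any indication of how far away the global optimum is:

```python
from typing import Callable, Tuple

Config = Tuple[int, ...]


def hill_climb(objective: Callable[[Config], float], start: Config,
               lower: int, upper: int, max_steps: int = 1000) -> Config:
    """Steepest-ascent hill climbing over integer knob settings.

    Each step moves to the best neighbour (one knob changed by +/-1) and stops
    when no neighbour improves the objective, which may be only a local optimum.
    """
    current = start
    current_value = objective(current)
    for _ in range(max_steps):
        neighbours = []
        for i in range(len(current)):
            for delta in (-1, 1):
                value = current[i] + delta
                if lower <= value <= upper:
                    neighbours.append(current[:i] + (value,) + current[i + 1:])
        best = max(neighbours, key=objective, default=current)
        best_value = objective(best)
        if best_value <= current_value:
            break  # no improving neighbour: local optimum
        current, current_value = best, best_value
    return current


def throughput(cfg: Config) -> float:
    """Hypothetical objective with a single peak at threads=7, cache=3."""
    threads, cache = cfg
    return -(threads - 7) ** 2 - (cache - 3) ** 2


if __name__ == "__main__":
    print(hill_climb(throughput, (0, 0), 0, 20))  # (7, 3)
    # Two peaks: the climb from x=0 stops at x=1 although x=4 is better.
    values = [0, 2, 1, 0, 5]
    print(hill_climb(lambda c: values[c[0]], (0,), 0, 4))  # (1,)
```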

  26. Control theory
     • Mathematical models to monitor and steer dynamic systems
       – Real-time allocation of CPU, memory, etc.
     • Some simple examples:
       – Proportional control
         • Adjust the control signal in proportion to the error
       – PID (Proportional Integral Derivative) control:
         • Integral: adjustment w.r.t. the error accumulated over time
         • Derivative: adjustment w.r.t. the error trend
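A textbook discrete-time PID controller, simulated against a hypothetical first-order plant y[k+1] = 0.8*y[k] + 0.2*u[k] (think of y as a resource metric and u as an allocation). The gains are illustrative, not tuned for any real system. With proportional control alone the output settles short of the setpoint; adding the integral term removes that steady-state error:

```python
from typing import Optional


class PID:
    """Discrete-time PID controller with sample time dt."""

    def __init__(self, kp: float, ki: float = 0.0, kd: float = 0.0, dt: float = 1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float) -> float:
        error = setpoint - measured
        # Integral: error accumulated over time (removes steady-state offset).
        self.integral += error * self.dt
        # Derivative: error trend; zero on the first sample.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller: PID, setpoint: float, steps: int = 100) -> float:
    """Drive the hypothetical plant y[k+1] = 0.8*y[k] + 0.2*u[k] from y = 0."""
    y = 0.0
    for _ in range(steps):
        u = controller.update(setpoint, y)
        y = 0.8 * y + 0.2 * u
    return y


if __name__ == "__main__":
    print(round(simulate(PID(kp=0.5), 10.0), 3))          # 3.333: P-only leaves an offset
    print(round(simulate(PID(kp=0.5, ki=0.3), 10.0), 3))  # 10.0: integral term removes it
```

For this plant the P-only loop settles where y = 0.5*(10 - y), i.e. y = 10/3, which is the proportional offset the integral term is there to eliminate.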

  27. Neural networks & deep learning
     • Mimic the brain’s neuron systems
     • Input/hidden/output layers of neurons:
       – Neurons in hidden layers: activation functions map the input signal to an output signal
       – Connection weights are tuned based on the error in the output layer (errors are propagated back for tuning)
     • Often used to capture multi-dimensional problems that are hard to model with other techniques
     • Hard to train (needs representative training data)
     • Hard to understand cause/effect (hidden layers)
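To make the forward pass and error backpropagation concrete, here is a deliberately tiny pure-Python network (2 inputs, 4 sigmoid hidden neurons, 1 output) trained on XOR, a problem that needs a hidden layer. The layer sizes, learning rate, seed, and epoch count are arbitrary illustrative choices:

```python
import math
import random

XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """n_in inputs, n_hidden sigmoid hidden neurons, one sigmoid output."""

    def __init__(self, n_in: int = 2, n_hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden layer: each activation function maps its weighted input to an output.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target: float, lr: float) -> None:
        h, y = self.forward(x)
        # Error at the output layer: gradient of 0.5*(y - target)^2 through the sigmoid.
        delta_out = (y - target) * y * (1 - y)
        # Propagate the error back to the hidden layer (uses w2 before updating it).
        delta_hidden = [delta_out * w * hj * (1 - hj) for w, hj in zip(self.w2, h)]
        # Tune the weights and biases (not the activation functions) by gradient descent.
        for j, hj in enumerate(h):
            self.w2[j] -= lr * delta_out * hj
        self.b2 -= lr * delta_out
        for j, dj in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


def mse(net: TinyMLP, data) -> float:
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


if __name__ == "__main__":
    net = TinyMLP(seed=0)
    before = mse(net, XOR)
    for _ in range(5000):
        for x, t in XOR:
            net.train_step(x, t, lr=0.5)
    print(f"MSE before: {before:.3f}, after: {mse(net, XOR):.3f}")
```

Even at this size, the trained weights say little about *why* a given input produces a given output, which is the cause/effect difficulty noted on the slide.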
