Introduction to Autonomic Computing Johan Tordsson Department of Computing Science www.cloudresearch.org

About me • MSc (Civ.Ing) Computer Science (2004) • PhD Umeå, Grid computing (2009) • Postdoc in Madrid Spain (2009), OpenNebula • Architect etc. in misc. EC projects (2009-2013) • Associate professor (2014 - now) • Research – Autonomic cloud and data center management – How to make clouds run themselves faster/better/cheaper? • Spare time job: – CTO & co-founder for Elastisys (UMU cloud research spinoff) – Evangelizing that computers (will) beat humans at IT operations

Outline • Why – do we need autonomic computing? • What – are autonomic systems? • How – to build these autonomic systems? • When – will they happen? • Who – will build them?

Motivation: software complexity

Motivation: scale • Enormous buildings filled with servers, storage equipment, networking, cooling • A factory for IT services

Motivation: faults Question: what is the probability of a hard drive failure? In my laptop? Will happen every few years, hopefully not right now… In a large supercomputer or data center? More than 100k nodes Will happen during this talk!
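The laptop versus data center contrast can be made concrete with a back-of-the-envelope calculation. The sketch below assumes independent drive failures at a constant rate. The 2% annual failure rate, one drive per node, and a 45-minute talk are illustrative assumptions, not figures from the slides.

```python
import math

HOURS_PER_YEAR = 365 * 24

def prob_at_least_one_failure(num_drives: int, annual_failure_rate: float,
                              hours: float) -> float:
    """Probability that at least one of num_drives fails within `hours`.

    Assumes independent failures with a constant (exponential) failure rate.
    """
    # Convert the annual failure probability into a per-hour hazard rate.
    rate_per_hour = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    # P(no drive fails) = exp(-rate * hours * num_drives)
    return 1.0 - math.exp(-rate_per_hour * hours * num_drives)

# Assumed values (hypothetical): 2% annual failure rate, 0.75 h talk.
laptop = prob_at_least_one_failure(1, 0.02, 0.75)
data_center = prob_at_least_one_failure(100_000, 0.02, 0.75)
print(f"laptop: {laptop:.2e}  data center: {data_center:.2f}")
```

Under these assumptions the two probabilities differ by several orders of magnitude, and the data center figure moves toward 1 as the number of drives per node or the assumed failure rate grows.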

Motivation: costs • Question: How many servers can one system administrator handle? • Very old question… • Some numbers: – 10 - very complex systems – ~300 - standard large-scale organization – Several 1000s - virtualized data center – 26k (Facebook 2013) • Higher-level management and better abstractions are needed – Alternative: exponential increase in need for systems management

Autonomic option • Autonomic computing – Named after autonomic nervous system – Systems manage themselves according to admin goals – Self-governing operation of entire system, not just parts of it – New components integrate effortlessly - as a new cell establishes itself in the body

Autonomic Computing • IBM initiative in early 2000’s • Landmark paper published 2003 in IEEE Computer by Kephart and Chess @ IBM • Active research field since, during 2003-2013: – 200 conferences/workshops – 8000+ papers • Lots of funding – EC FP6, FP7, H2020 – WASP… • Industry uptake – Many big IT vendors & startups • Key point – Self-management of IT systems

Self-management (1/3) • Self-management – Changing components – External conditions – Hardware/software failures • Ex. component upgrade – Continually check for component upgrades – Download and install – Reconfigure itself – Run a regression test – When it detects errors, revert to the older version
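The component upgrade example can be sketched as a small install, test, and revert routine. This is a minimal illustration only: `Component`, `self_upgrade`, and the `regression_test` callback are hypothetical names, and "download and install" is reduced to setting a version string.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Component:
    """Hypothetical managed component that tracks its installed version."""
    version: str
    history: List[str] = field(default_factory=list)

def self_upgrade(component: Component, available: str,
                 regression_test: Callable[[str], bool]) -> bool:
    """Install `available`, test it, and roll back if the test fails.

    Returns True if the upgrade was kept, False if nothing changed or
    the component reverted to its previous version.
    """
    if available == component.version:
        return False                          # no upgrade found
    previous = component.version
    component.version = available             # stands in for download + install
    component.history.append(f"installed {available}")
    if regression_test(component.version):    # run the regression test
        return True
    component.version = previous              # errors detected: revert
    component.history.append(f"reverted to {previous}")
    return False

good = Component(version="1.0")
kept = self_upgrade(good, "1.1", regression_test=lambda v: True)

bad = Component(version="1.0")
reverted = self_upgrade(bad, "1.1", regression_test=lambda v: False)
```

Keeping the previous version until the regression test passes is what makes the revert step cheap. A real element would also need to handle failures during installation itself.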

Self-management (2/3) • Four aspects of self-management – Self-configuration • Configure themselves automatically • High-level policies ( what is desired, not how ) – Self-optimization • Continually seek ways to improve their operation • Hundreds of tunable parameters – Self-healing • Handle faults and errors • Analyze information from logs and monitors – Self-protection • Malicious attacks • Cascading failures • Admin mistakes

Self-management (3/3) [Images: HAL 9000 (2001), Terminator] • Autonomic computing achievable without self-awareness? – Without hard artificial intelligence • (Hollywood) Misconception: machines will take over all human tasks – AI could be a “real danger” (S. Hawking) – Unemployment? • Actual idea: Machines will free people to manage systems at higher level

Autonomic elements [Figure: autonomic manager (Monitor, Analyze, Plan, Execute, Knowledge) on top of a managed element] • Fundamental atom of the architecture – Managed element(s) • Server, database, storage system, etc. – Autonomic manager • Responsible for: – Providing its service – Managing behavior according to goals – Interacting with other autonomic elements

Autonomic element details [Figure: autonomic manager (Monitor, Analyze, Plan, Execute, Knowledge) and managed element, each with sensors and effectors] • Sensors: monitor environment • Effectors: tune managed element • MAPE loop: – Process for self-management of autonomic element

The MAPE loop 1. Monitor: – Collect information about state of system – Lots of metrics available – Which ones to gather? – How often to monitor? 4. Execute – Turn the “knobs” of the managed element – Interactions between knobs? • Unknown, even to human operators • At Google, 238 knobs in each managed entity

The MAPE loop (cont.) 2. Analyze – Estimate current state based on monitoring data – Commonly use model of the world for this • “All models are wrong, but some are useful” • What part of system to model? How? • Correlations? 3. Plan – Select action(s), i.e., which knobs to turn? – Can be formulated as optimization problem – Reactive vs. Predictive/Proactive methods • Knowledge management – Update model dynamically (monitoring) – Evaluate effects of actions (execution)
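The four MAPE steps and the shared knowledge can be sketched as one small control loop. This is a minimal sketch under stated assumptions: the managed element is a hypothetical server pool with a single knob (`replicas`), the goal is a target load per replica, and all class and field names are invented for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ServerPool:
    """Hypothetical managed element with one knob: the replica count."""
    replicas: int = 2
    load: float = 150.0                        # requests/s offered to the pool

    def sense(self) -> Dict[str, float]:       # sensor
        return {"load": self.load, "replicas": float(self.replicas)}

    def actuate(self, replicas: int) -> None:  # effector
        self.replicas = replicas

@dataclass
class AutonomicManager:
    element: ServerPool
    target_per_replica: float = 50.0           # goal set by the administrator
    knowledge: List[Dict[str, float]] = field(default_factory=list)

    def monitor(self) -> Dict[str, float]:
        metrics = self.element.sense()
        self.knowledge.append(metrics)         # keep history as knowledge
        return metrics

    def analyze(self, metrics: Dict[str, float]) -> Dict[str, float]:
        # Estimate the current state from the raw monitoring data.
        state = dict(metrics)
        state["load_per_replica"] = metrics["load"] / metrics["replicas"]
        return state

    def plan(self, state: Dict[str, float]) -> Optional[int]:
        # Smallest replica count that keeps load per replica within target.
        needed = max(1, math.ceil(state["load"] / self.target_per_replica))
        return None if needed == int(state["replicas"]) else needed

    def execute(self, replicas: Optional[int]) -> None:
        if replicas is not None:               # None means "no change needed"
            self.element.actuate(replicas)

    def step(self) -> None:
        self.execute(self.plan(self.analyze(self.monitor())))

pool = ServerPool()
manager = AutonomicManager(pool)
manager.step()          # 150 req/s at 50 per replica
pool.load = 400.0       # workload changes
manager.step()
```

This planner is purely reactive. A predictive variant would use the history kept in `knowledge` to forecast load before choosing an action.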

Engineering challenges (1/3) • Life cycle of an autonomic element – Design, test, and verification • Testing autonomic elements a challenge – Installation and configuration • Element registers itself in a directory service – Monitoring and problem determination • Elements will continually monitor themselves • Adaptation, optimization, reconfiguration – Upgrading – Uninstallation or replacement

Engineering challenges (2/3) • Relationships among autonomic elements – Specification • Set of output/input services of autonomic elements • Expressed in a standard format • Description syntax and semantics – Location • Find input services that autonomic element needs – Negotiation – Provision – Operation • Autonomic manager oversees the operation – Termination

Engineering challenges (3/3) • System-wide issues – Authentication, encryption, signing – Autonomic elements can identify themselves – Autonomic system must be robust against insidious forms of attack • Goal specification – Humans provide the goals and constraints – Ensure that goals are specified correctly in the first place – Autonomic systems need to protect themselves from bad input goals: • Inconsistent, implausible, dangerous, or unrealizable

Specifying goals (1/3) • Rules – Often simple condition-action pairs • If something happens, do this • If something else happens, do that • … – Can use more complex languages to express states, context, etc. – Explicit enumeration tedious – Very limited ability to express complex actions
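Condition-action rules can be sketched as a list of predicate and action pairs evaluated against the current state. The metric names, thresholds, and action names below are hypothetical and chosen only to illustrate the pattern.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Rule = Tuple[Callable[[State], bool], str]

# Hypothetical rules: each pairs a condition on the state with an action name.
RULES: List[Rule] = [
    (lambda s: s["cpu"] > 0.80, "add_server"),
    (lambda s: s["cpu"] < 0.20 and s["servers"] > 1, "remove_server"),
    (lambda s: s["disk_free"] < 0.10, "purge_logs"),
]

def matching_actions(state: State, rules: List[Rule] = RULES) -> List[str]:
    """Return the actions of all rules whose condition holds, in rule order."""
    return [action for condition, action in rules if condition(state)]

overloaded = matching_actions({"cpu": 0.9, "servers": 3, "disk_free": 0.05})
idle = matching_actions({"cpu": 0.1, "servers": 1, "disk_free": 0.5})
```

Every situation has to be listed explicitly, which illustrates why enumeration becomes tedious as the state space grows.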

Specifying goals (2/3) • Utility functions – Mathematical expressions – Maps system state to scalar value – Represents high-level objectives – What parts of system state to include? – What should function look like?
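A utility function can be sketched as a plain function from a system state to a number, which the manager then maximizes. The shape used here (service value minus resource cost) and all parameter values are assumptions made for illustration.

```python
from typing import Iterable, Tuple

def utility(response_time_ms: float, servers: int,
            target_ms: float = 200.0, revenue: float = 20.0,
            cost_per_server: float = 1.0) -> float:
    """Map a system state to a scalar: service value minus resource cost."""
    # Full value at or below the target response time, decaying above it.
    service_value = revenue * min(1.0, target_ms / response_time_ms)
    return service_value - cost_per_server * servers

def best_state(candidates: Iterable[Tuple[float, int]]) -> Tuple[float, int]:
    """Pick the (response_time_ms, servers) state with the highest utility."""
    return max(candidates, key=lambda state: utility(*state))

# Hypothetical candidate states: (response time in ms, number of servers).
CANDIDATES = [(800.0, 2), (250.0, 5), (180.0, 8), (150.0, 12)]
chosen = best_state(CANDIDATES)
```

Unlike a rule set, the function ranks every state, so trade-offs such as "slightly slower but much cheaper" are resolved by the numbers rather than by an explicit rule.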

Specifying goals (3/3) • Policies – (higher-level) descriptions of goals and constraints for operation – How to map to lower-level behavior? – Composition of multiple policies – What high-level language to use? • Turing-complete? • No widely used languages available today • Human operators used to explicit steering – Not used to indirect goal specification

Autonomic management techniques - requirements • Robustness – Avoid oscillations or behavioral changes • Scalability – Internet-scale: millions of servers and networks, even more autonomic agents (50 billion devices?) • Adaptive to changing workloads – Some methods reliable for certain load patterns, but unstable once the load or system dynamics change • Performance – Need to make decisions fast enough to react in time – Optimal solutions vs. approximations • Simplicity – Key to adoption – Complex models vs. model-free? – Learning phase required before deployment?

Autonomic management - sample techniques • Heuristic frameworks – Fast and simple, rules of thumb • Control theory – Used to steer, e.g., industrial plants, embedded systems, etc. – Discretization for data packet flows (queuing theory) • Machine learning – Evolve behavior based on empirical (monitor) data – Examples: Neural networks, genetic algorithms, reinforcement learning

Heuristics • Rules of thumb – Often lack theoretical background • Often used to handle very complex (NP-hard) problems – Scalable, find fast solutions – Greedy • Local decisions that make sense right here/now • May not result in optimal solution – Hill climbing • Steer search (manage system in this case) towards steepest slope – Often no upper bound • Not possible to know distance from optimal solution – “The O-word…”
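Hill climbing over a single integer knob can be sketched in a few lines. The throughput table below is made-up data with a local peak at 4 threads and a global peak at 12, chosen to show how the result depends on the starting point.

```python
from typing import Callable

def hill_climb(score: Callable[[int], float], start: int,
               lower: int, upper: int) -> int:
    """Step to the better neighbour until no neighbour is strictly better."""
    current = start
    while True:
        neighbours = [n for n in (current - 1, current + 1)
                      if lower <= n <= upper]
        if not neighbours:
            return current
        best = max(neighbours, key=score)
        if score(best) <= score(current):
            return current                  # local optimum reached
        current = best

# Hypothetical throughput per thread-pool size (illustrative numbers only).
THROUGHPUT = {1: 2, 2: 5, 3: 8, 4: 10, 5: 9, 6: 7, 7: 6,
              8: 8, 9: 11, 10: 13, 11: 14, 12: 15, 13: 12, 14: 9}

from_low = hill_climb(THROUGHPUT.__getitem__, start=2, lower=1, upper=14)
from_mid = hill_climb(THROUGHPUT.__getitem__, start=8, lower=1, upper=14)
```

Starting at 2 the search stops at the local peak, while starting at 8 it reaches the global one. The search itself cannot tell which case it is in.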

Control theory • Mathematical models to monitor and steer dynamic systems – Real-time allocation of CPU, memory, etc. • Some simple examples: – Proportional control • Adjust signal proportionally to compensate error – PID (Proportional Integral Derivative) control: • Integral: adjustment w.r.t. error over time • Derivative: adjustment w.r.t. error trend
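A discrete-time PID controller can be sketched as follows. The gains and the toy plant in `simulate` (the measured value simply moves by a fixed fraction of the control signal each step) are assumptions for illustration and do not model a real system.

```python
from dataclasses import dataclass

@dataclass
class PIDController:
    kp: float                     # proportional gain
    ki: float                     # integral gain
    kd: float                     # derivative gain
    setpoint: float               # desired value of the measured variable
    integral: float = 0.0
    previous_error: float = 0.0

    def update(self, measurement: float, dt: float = 1.0) -> float:
        """Return the control signal for one sampling interval."""
        error = self.setpoint - measurement
        self.integral += error * dt                       # error over time
        derivative = (error - self.previous_error) / dt   # error trend
        self.previous_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

def simulate(controller: PIDController, steps: int = 60,
             plant_gain: float = 0.5) -> float:
    """Drive a hypothetical plant: y moves by plant_gain * control signal."""
    y = 0.0
    for _ in range(steps):
        y += plant_gain * controller.update(y)
    return y

final_p = simulate(PIDController(kp=0.8, ki=0.0, kd=0.0, setpoint=10.0))
final_pid = simulate(PIDController(kp=0.8, ki=0.2, kd=0.1, setpoint=10.0))
```

Setting `ki` and `kd` to zero gives plain proportional control. Different gains or a different plant can make the same controller oscillate or diverge, which is why tuning matters.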

Neural networks & Deep learning • Mimic the brain’s neuron systems • Input/hidden/output layers of neurons: – Neurons in hidden layer: activation functions map input signal to output signal – Connection weights tuned based on error at the output layer (errors are propagated back for tuning) • Often used to capture multi-dimensional problems that are hard to model with other techniques • Hard to train (need representative training data) • Hard to understand cause/effect (hidden layers)
