artificial intelligence database performance tuning
play

Artificial Intelligence Database Performance Tuning Roel Van de - PowerPoint PPT Presentation

Artificial Intelligence Database Performance Tuning Roel Van de Paar Percona Agenda GA: How it works, terminology, variables, example Database Tuning & Surrounding thoughts gaai POC Results 2 Define: GA A Genetic


  1. Artificial Intelligence Database Performance Tuning Roel Van de Paar Percona

  2. Agenda ● GA: How it works, terminology, variables, example ● Database Tuning & Surrounding thoughts ● gaai ● POC ● Results 2

  3. Define: GA A Genetic Algorithm (GA) is an lightweight Artificial Intelligence (AI) evolutionary algorithm which mimics Darwin’s theory of natural evolution. 3

  4. GA Terminology Population (inc. any offspring) v Chromosomes (“ Individuals ”) v Genes (“Chromosome Length”) 4

  5. GA: it’s all about the genes 2 Parents v Children Children may get; ● Genes mixed from parents (“crossover”) ● Modified (“mutated”) genes 5

  6. GA Loop Population is created ‘randomly’ (can be pre-populated) v [loop]> Population is evaluated (i.e. each individual receive a fitness value) v A new population is created: Population can be sorted / kept or discarded in part (“ selection ”) Crossover, gene mutations etc. v possible intermediary re-eval <[loop] 6

  7. GA Fitness A fitness value is the result of a chosen fitness function As a rather simple/limited example: FITNESS = RAND(A) + RAND(B) + RAND(C) where A,B,C are 0-1000: Highest fitness value=3000, lowest=0 Optimize towards a negative (lowest value=best) calculated fitness; FITNESS=-FITNESS i.e. -3000 becomes 3000 so the lowest value becomes best Optimize towards %: 1/FITNESS or 1-(1/FITNESS) etc. Basically; anything that can be optimized towards a best result can be GA’ed 7

  8. GA Variables GA Variables are often binary (or represented in binary) where a single bit is a single gene But they do not need to be! One step further is variables as genes, where all variables are alike a=0-100 with step 1, b=0-100 with step 1, c=0-100 with step 1 The most advanced is variables that are in disparate ranges a=-1 to 1 with step 0.01, b=0-100 with step 0.5, etc. 8

  9. GA Dev TIP: GA Value-store Genes Everyone working with GA can benefit from this hack/approach Example: significant genes (used in fitness): a,b,c non-significant genes (not used in fitness): d i.e. sub-eval (think sub-total) data can be stored in another gene where such gene is never set/updated/mutated, but only used for tracking certain calculations, results, statuses, etc. This optimizes (though not in all cases) the number of calculations 9

  10. So why do we need GA’s? To optimize... everything 10

  11. How many persons can we fit... 11

  12. GA Application Domains #1 Bayesian inference links to particle methods in Bayesian statistics and hidden Markov chain models Artificial creativity Chemical kinetics (gas and solid phases) Calculation of bound states and local-density approximations Code-breaking, using the GA to search large solution spaces of ciphers for the one correct decryption. Computer architecture: using GA to find out weak links in approximate computing such as lookahead. Configuration applications, particularly physics applications of optimal molecule configurations for particular systems like C60 (buckyballs) Construction of facial composites of suspects by eyewitnesses in forensic science. Data Center/Server Farm. Distributed computer network topologies Electronic circuit design, known as evolvable hardware Feature selection for Machine Learning Feynman-Kac models File allocation for a distributed system Filtering and signal processing Finding hardware bugs. Game theory equilibrium resolution Genetic Algorithm for Rule Set Production Scheduling applications, including job-shop scheduling and scheduling in printed circuit board assembly. Learning robot behavior using genetic algorithms Image processing: Dense pixel matching Learning fuzzy rule base using genetic algorithms Molecular structure optimization (chemistry) Optimisation of data compression systems, for example using wavelets. Power electronics design. SOURCE: https://en.wikipedia.org/wiki/List_of_genetic_algorithm_applications Traveling salesman problem and its applications 12

  13. GA Application Domains #2 Climatology: Estimation of heat flux between the atmosphere and sea ice Climatology: Modelling global temperature changes Design of water resource systems Groundwater monitoring networks Design of anti-terrorism systems Linguistic analysis, including grammar induction and other aspects of Natural language processing (NLP) such as word sense disambiguation. Automated design of sophisticated trading systems in the financial sector Representing rational agents in economic models such as the cobweb model Real options valuation Audio watermark insertion/detection Airlines revenue management Automated design of mechatronic systems using bond graphs and genetic programming (NSF) Automated design = computer-automated design Automated design of industrial equipment using catalogs of exemplar lever patterns Automated design, including research on composite material design and multi-objective design of automotive components for crashworthiness, weight savings, and other characteristics Container loading optimization Control engineering, Marketing mix analysis Mechanical engineering Mobile communications infrastructure optimization.Plant floor layout Pop music record production Quality control Timetabling problems, such as designing a non-conflicting class timetable for a large university Vehicle routing problem Optimal bearing placement 13

  14. GA Application Domains #3 Computer-automated design Bioinformatics Multiple Sequence Alignment Bioinformatics: RNA structure prediction Bioinformatics: Motif Discovery Biology and computational chemistry Building phylogenetic trees. Gene expression profiling analysis. Medicine: Clinical decision support in ophthalmology Computational Neuroscience: finding values for the maximal conductances of ion channels in biophysically detailed neuron models Protein folding and protein/ligand docking Selection of optimal mathematical model to describe biological systems Operon prediction. Neural Networks; particularly recurrent neural networks Training artificial neural networks when pre-classified training examples are not readily obtainable (neuroevolution) Clustering, using genetic algorithms to optimize a wide range of different fit-functions. Multidimensional systems Multimodal Optimization Multiple criteria production scheduling Multiple population topologies and interchange methodologies Mutation testing Parallelization of GAs/GPs including use of hierarchical decomposition of problem domains and design spaces nesting of irregular shapes using feature matching and GAs. Rare event analysis Solving the machine-component grouping problem required for cellular manufacturing systems Stochastic optimization Tactical asset allocation and international equity strategies Wireless sensor/ad-hoc networks. 14

  15. Simple GA Example @ https://github.com/Percona-QA/gaai/blob/master/ga_example/ga_example.lua git clone https://github.com/Percona-QA/gaai.git cd ga_example lua ga_example.lua Polation: 100, Genes: 10, Generations: 100 This GA simply takes sum(rand(0,9999999/10),rand(idem),...rand(n)) i.e. a random number between 0 and 9999999 divided by the number of genes * the number of genes (max fitness=9999999) 15

  16. A bit of (MySQL) database tuning history ● Past: very poor defaults/templates, settings tuning a must ● Current: more optimized/increased defaults, settings tuning may still be recommended for high-use production systems ● Future: automatically adjusting settings (GA or logic based) Past: MANUAL > Future: AUTOMATED Automated systems are less error prone and can be optimized over time! 16

  17. GA Database Tuning: a new concept / mindset ● It does not really matter which workload GA optimizes ○ i.e. there is no “right”, “wrong”, “common” or “specific” one ○ GA will be able to optimize any of them ● This is dissimilar to past performance benchmarking, which is usually tuned towards/optimized for a specific load (or set of loads) ● It matters less here how much effective % is gained using a specific set of options for a specific semi-synthetic workload ● It matters much more here how much overall improvement is seen over time as the workload changes (real production workloads) 17

  18. Thoughts on Database Tuning #1 ● R/O variable optimization require restart: not suitable for production systems ● Sysbench load is uniform/synthetic (easier to optimize), though I expect that actual user loads will achieve similar (i.e. 80%) similar ROI’s, unless the data being processed is highly random ● Tuning various memory buffers can be complex and requires surrounding “safety” code calculations (or value ranges) to avoid OOM ● Things may change over time, for ex. the number of client connections 18

  19. Thoughts on Database Tuning #2 ● It would be good to cover for special events like checkpoints (Longer sample runtimes may be sufficient to cover this) ● Not all mysqld variables automatically lend themselves to “pure performance tuning” as some variables are features - setting them changes the performance, but only because the server functionality was modified also - i.e. the performance offset may be expected (credit: Laurynas Biveinis) ● Some vars require longer runtime to sample (e.g. InnoDB buffer pool) 19

Recommend


More recommend