AI and Predictive Analytics in Data-Center Environments Introduction to Machine Learning Josep Ll. Berral @BSC Intel Academic Education Mindshare Initiative for AI
Introduction “Let the machine automate the analysis for you”
Introduction • Machine Learning: 1. Algorithms and methods … 2. … to automatically learn/model a system … 3. … from some observations
Machine Learning Example of “Supervised Learning”
[Figure: supervised-learning pipeline in three stages]
1. Obtaining Data and Pre-processing: collected data → “a-posteriori” labelling of samples (“Yep! It’s Versicolor”) → labelled data
2. Training a Model: labelled data → ML method → model for labelling
3. Inferring New Data: new sample → model for labelling → prediction (“It’s Versicolor”)
Learning Example Let’s see that with Real Data
Learning Example • “Recognizing iris flowers” • Iris setosa? Iris versicolor? Iris virginica?
Learning Example
• The “Iris Data Set”
• People measured and classified the flowers
1. x1: sepal length in cm
2. x2: sepal width in cm
3. x3: petal length in cm
4. x4: petal width in cm
5. class: [Iris Setosa, Iris Versicolor, Iris Virginica]
*R.A. Fisher (1936). Source: https://archive.ics.uci.edu/ml/datasets/Iris
Learning Example
• We have labeled samples:
sepal length  sepal width  petal length  petal width  class
5.1           3.5          1.4           0.2          Setosa
7.0           3.2          4.7           1.4          Versicolor
5.8           2.7          5.1           1.9          Virginica
...           ...          ...           ...          ...
Learning Example
• Given any iris, we want to know to which class it belongs
sepal length  sepal width  petal length  petal width  class
6.1           3.2          5.0           1.3          ???
Learning Example
• Find a function:
  f(sepal length, sepal width, petal length, petal width) → class
• That function can be linear/non-linear/tree/set-of-rules/...
• The ML algorithm will produce that Model (here a formula)
• The Model can predict new samples
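As a minimal sketch of what such a function f could look like, the snippet below uses a 1-nearest-neighbour rule: a new flower gets the class of the closest labelled sample. This is only one of many possible model types and not a method the slides prescribe. The names `TRAINING_SAMPLES`, `squared_distance` and `predict` are illustrative, and the training set is assumed to be just the three labelled rows listed in the example.

```python
# Three labelled samples:
# (sepal length, sepal width, petal length, petal width) -> class
TRAINING_SAMPLES = [
    ((5.1, 3.5, 1.4, 0.2), "Setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Versicolor"),
    ((5.8, 2.7, 5.1, 1.9), "Virginica"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two measurement tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(sample):
    """Return the class of the labelled sample closest to `sample` (1-NN)."""
    _, label = min(
        TRAINING_SAMPLES,
        key=lambda row: squared_distance(row[0], sample),
    )
    return label


# The unlabelled flower from the example.
new_flower = (6.1, 3.2, 5.0, 1.3)
print(predict(new_flower))
```

With only three training samples the answer (here `Virginica`, the nearest row) says little about the real flower; a model trained on the full data set could decide differently.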
Learning Example Another example
Another Example
• Algorithm to detect spam e-mails
• A Bayes-based (Naïve Bayes) approach
• Counting the word “diamonds” in e-mails classified as spam or not spam:
              spam   ¬spam   total
diamonds       130       5     135
¬diamonds      987     300    1287
total         1117     305    1422
Another Example
• The NB algorithm concludes from emails that:
              spam   ¬spam   total
diamonds       130       5     135
¬diamonds      987     300    1287
total         1117     305    1422
P(spam) = 1117/1422 = 0.786
P(¬spam) = 305/1422 = 0.214
P(diamonds) = 135/1422 = 0.095
P(¬diamonds) = 1287/1422 = 0.905
P(diamonds & spam) = 130/1422 = 0.091
P(diamonds & ¬spam) = 5/1422 = 0.0035
• These stats will define the model
Another Example
• The model for “diamonds” becomes:
P(spam) = 1117/1422 = 0.786
P(¬spam) = 305/1422 = 0.214
P(diamonds) = 135/1422 = 0.095
P(¬diamonds) = 1287/1422 = 0.905
P(diamonds & spam) = 130/1422 = 0.091
P(diamonds & ¬spam) = 5/1422 = 0.0035
P(spam | diamonds) ← P(diamonds & spam) / P(diamonds)
P(¬spam | diamonds) ← P(diamonds & ¬spam) / P(diamonds)
• Given a new mail with “diamonds”:
P(spam | diamonds) = 130/135 = 0.963
P(¬spam | diamonds) = 5/135 = 0.037
• The mail is classified as “spam”
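The arithmetic of this example can be reproduced with a short sketch. It assumes only the four counts from the “diamonds” table; the dictionary layout and variable names are illustrative. With a single word as the only feature, the calculation reduces to plain Bayes' rule.

```python
# Counts from the "diamonds" contingency table.
counts = {
    ("diamonds", "spam"): 130,
    ("diamonds", "not_spam"): 5,
    ("no_diamonds", "spam"): 987,
    ("no_diamonds", "not_spam"): 300,
}
total = sum(counts.values())  # 1422 e-mails in total

# Marginal and joint probabilities, as listed in the example.
p_diamonds = (
    counts[("diamonds", "spam")] + counts[("diamonds", "not_spam")]
) / total
p_diamonds_and_spam = counts[("diamonds", "spam")] / total
p_diamonds_and_not_spam = counts[("diamonds", "not_spam")] / total

# Conditional probabilities:
# P(class | diamonds) = P(diamonds & class) / P(diamonds)
p_spam_given_diamonds = p_diamonds_and_spam / p_diamonds
p_not_spam_given_diamonds = p_diamonds_and_not_spam / p_diamonds

# Pick the more probable class for a mail containing "diamonds".
if p_spam_given_diamonds > p_not_spam_given_diamonds:
    label = "spam"
else:
    label = "not spam"

print(f"P(spam | diamonds)  = {p_spam_given_diamonds:.3f}")
print(f"P(¬spam | diamonds) = {p_not_spam_given_diamonds:.3f}")
print(label)
```

Computed from the unrounded counts, the two conditionals are 130/135 and 5/135, so they sum to 1.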
ML Capabilities
• Models are based on statistical properties of data
• ML algorithms “automate” such modeling
• Uses:
1. Estimate values (regression)
2. Predict classes or categories (classification)
3. Find similarities in data (clustering)
4. Make recommendations
5. Display properties of the modeled system
• There are lots of different algorithms, with different properties
Interpretability
• Interpreting Models
• Know the mechanism behind data!
[Figure: data and the model derived from it, e.g. the rule A → B]
Interpretability
• Models are usually interpretable
• Read the model
• View the statistical properties of data
[Figure: three data/model pairs: a rule A → B; a linear formula C = Ax + B; a threshold rule A > 5 → C, A ≤ 5 → B]
Interpretability
• E.g. Regressions: f(x) = 102 + speed * 10.34 + weight * 5.14
• E.g. Decision trees
• E.g. Naive Bayes: P(diamonds | spam) = 130/1117 = 0.116, P(spam) = 0.786
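To illustrate what “reading” a regression model means, the sketch below writes the example formula as a function and recovers the effect of each input by changing it by one unit. The function name `predict` and the probing approach are illustrative assumptions; the slides give no units or context for speed and weight.

```python
def predict(speed, weight):
    """Linear model from the example: intercept 102, one coefficient per input."""
    return 102 + speed * 10.34 + weight * 5.14


# The intercept is the prediction when every input is zero.
baseline = predict(speed=0, weight=0)

# Raising one input by one unit changes the output by that input's coefficient.
effect_of_speed = predict(speed=1, weight=0) - baseline
effect_of_weight = predict(speed=0, weight=1) - baseline

print(baseline, round(effect_of_speed, 2), round(effect_of_weight, 2))
```

Because the model is a plain sum, each coefficient can be read in isolation: one extra unit of speed adds about 10.34 to the output, one extra unit of weight about 5.14.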
Summary
• Time to decide which algorithm to use
• Choose those that are expected to fit your problem best
• Look for the statistical analysis you would perform
• And which question you want to answer
• According to
  • … the data you have
  • … the problem you are solving
  • … what you know about the problem
  • … how you are going to use the model
• Some people…
  • … have a favorite set of methods
  • … just try a bunch of them, and see which one works best
  • … select the method according to their constraints