  1. Outils Statistiques pour Data Science Part I : Supervised Learning Massih-Reza Amini Université Grenoble Alpes Laboratoire d’Informatique de Grenoble Massih-Reza.Amini@imag.fr

  2. Organization ❑ Automatic Classification (Massih R Amini, Georgios Balikas) ❑ Clustering (Massih R Amini, Georgios Balikas) ❑ Representation and indexing of a document (Massih R Amini, Georgios Balikas) ❑ Latent topic discovery (Marianne Clausel, Georgios Balikas) ❑ Visualisation (Marianne Clausel, Georgios Balikas)

  3. Learning and Inference The process of inference is done in three steps: 1. Observe a phenomenon, 2. Construct a model of the phenomenon, 3. Make predictions. “All that is necessary to reduce the whole of nature to laws similar to those which Newton discovered with the aid of calculus is to have a sufficient number of observations and a mathematics that is complex enough.” (Marquis de Condorcet, 1785) ❑ These steps are involved in more or less all natural sciences! ❑ The aim of learning is to automate this process, ❑ The aim of learning theory is to formalize it.

  4. Induction vs. deduction ❑ Induction is the process of deriving general principles from particular facts or instances. ❑ Deduction is, on the other hand, the process of reasoning in which a conclusion follows necessarily from the stated premises; it is an inference by reasoning from the general to the specific. This is how mathematicians prove theorems from axioms.

  5. Pattern recognition If we consider the context of supervised learning for pattern recognition: ❑ The data consist of pairs of examples (vector representation of an observation, class label), ❑ Class labels are often Y = {1, ..., K} with K large (but in the theory of ML we consider the binary classification case Y = {−1, +1}), ❑ The learning algorithm constructs an association between the vector representation of an observation → class label, ❑ Aim: make few errors on unseen examples.

  6. Pattern recognition (Example) IRIS classification, Ronald Fisher (1936): Iris Setosa, Iris Versicolor, Iris Virginica.
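As an aside, Fisher's Iris data ships with scikit-learn; here is a minimal sketch of what the raw observations and class labels look like (the library and its bundled dataset are an assumption, not part of the course material):

    # Peek at Fisher's Iris data: 150 observations, 4 features, 3 classes.
    from sklearn.datasets import load_iris

    iris = load_iris()
    print(iris.feature_names)            # sepal/petal length and width, in cm
    print(iris.target_names)             # ['setosa' 'versicolor' 'virginica']
    print(iris.data.shape)               # (150, 4): m = 150, d = 4
    print(iris.data[0], iris.target[0])  # first vectorised observation, its label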

  7. Pattern recognition (Example) ❑ The first step is to formalize the perception of the flowers with relevant common characteristics, which constitute the features of their vector representations. ❑ This usually requires expert knowledge.

  8. Pattern recognition (Example) ❑ If observations are from a field of irises, then they become vectorised observations (one feature vector per flower). ❑ The constitution of vectorised observations and their associated labels is generally time consuming. ❑ Many studies are now focused on representation learning using deep neural networks. ❑ Second step: learning then translates into the search for a function that maps vectorised observations (inputs) to their associated outputs, as the sketch below shows.
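To make the two steps concrete, a hedged sketch continuing the Iris example (scikit-learn assumed available; LogisticRegression is just one convenient choice of function class, not the course's prescribed method):

    # Learn a function from Iris feature vectors to labels, then measure
    # errors on unseen examples held out from training.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    f = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("error rate on unseen examples:", (f.predict(X_test) != y_test).mean())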

  9. Pattern recognition (pipeline diagram) 0. Training set, 1. Representation vectors, 2. Find the separators, 3. New examples, 5. Predict the labels of the new examples.

  10. Approximation - Interpolation It is always possible to construct a function that exactly fits the data. Is it reasonable?
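A minimal numpy sketch on hypothetical noisy data: with m = 10 points, a degree-9 polynomial fits them exactly, while a lower-degree one does not:

    # A degree m-1 polynomial always interpolates m points exactly,
    # but is it a reasonable model of the phenomenon?
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 10)                                  # m = 10 inputs
    y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(10)  # noisy outputs

    exact = np.polynomial.Polynomial.fit(x, y, deg=9)   # interpolates the data
    simple = np.polynomial.Polynomial.fit(x, y, deg=3)  # smoother approximation

    print("max residual, degree 9:", np.abs(exact(x) - y).max())   # ~0
    print("max residual, degree 3:", np.abs(simple(x) - y).max())  # > 0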

  11. Occam's razor Idea: search for regularities (or repetitions) in the observed phenomenon; generalization is done from the past observations to the new future ones ⇒ take the simplest model. But how to measure simplicity? 1. Number of constants, 2. Number of parameters, 3. ...

  12. Basic Hypotheses Two types of hypotheses: ❑ Past observations are related to the future ones → the phenomenon is stationary, ❑ Observations are independently generated from a source → notion of independence.

  13. Aims → How can one make predictions from past data? What are the hypotheses? ❑ Give a formal definition of learning, generalization, overfitting, ❑ Characterize the performance of learning algorithms, ❑ Construct better algorithms.

  14. Probabilistic model Relations between the past and future observations: ❑ Independence: each new observation provides a maximum of individual information, ❑ Identically distributed: observations provide information on the phenomenon which generates the observations.

  15. Formally We consider an input space X ⊆ R^d and an output space Y. Assumption: example pairs (x, y) ∈ X × Y are identically and independently distributed (i.i.d.) with respect to an unknown but fixed probability distribution D. Samples: we observe a sequence of m pairs of examples (x_i, y_i) generated i.i.d. from D. Aim: construct a prediction function f : X → Y which predicts an output y for a given new x with a minimum probability of error.
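A hedged sketch of this setting with a hypothetical, made-up distribution D over X × Y = R^2 × {−1, +1}; in practice D is unknown and only the sample S is observed:

    # Simulate the i.i.d. assumption: a fixed D we can sample from.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_D(m):
        # Draw m i.i.d. pairs (x, y): y is a fair coin flip over {-1, +1},
        # x = y + Gaussian noise in R^2 (a toy stand-in for the unknown D).
        y = rng.choice([-1, +1], size=m)
        x = y[:, None] + rng.standard_normal((m, 2))
        return x, y

    x_train, y_train = sample_D(100)   # the observed training sample, m = 100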

  16. Supervised Learning ❑ Discriminant models directly find a classification function f : X → Y from a given class of functions F; ❑ The function found should be the one having the lowest probability of error. The risk function considered in classification is usually the misclassification error: R(f) = E_{(x,y)∼D} L(f(x), y) = ∫_{X×Y} L(f(x), y) dD(x, y), where L : Y × Y → R_+ is the loss defined by ∀(x, y); L(f(x), y) = [[f(x) ≠ y]], and [[π]] is equal to 1 if the predicate π is true and 0 otherwise.
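A minimal sketch of the misclassification loss and a Monte Carlo view of R(f), using the same hypothetical toy D as above (estimating the risk this way is only possible because our toy D is known):

    # Zero-one loss and an empirical estimate of the true risk R(f).
    import numpy as np

    rng = np.random.default_rng(0)
    m = 100_000
    y = rng.choice([-1, +1], size=m)              # i.i.d. labels from toy D
    x = y[:, None] + rng.standard_normal((m, 2))  # class-conditional inputs

    def zero_one_loss(y_pred, y_true):
        # [[f(x) != y]]: 1 on a misclassification, 0 otherwise.
        return (y_pred != y_true).astype(float)

    f = lambda z: np.where(z.sum(axis=1) >= 0.0, +1, -1)  # a hypothetical classifier
    print("Monte Carlo estimate of R(f):", zero_one_loss(f(x), y).mean())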

  17. Empirical risk minimization (ERM) principle ❑ As the probability distribution D is unknown, the analytic form of the true risk cannot be derived, so the prediction function cannot be found directly by minimizing R(f). ❑ Empirical risk minimization (ERM) principle: find f by minimizing the unbiased estimator of R on a given training set S = (x_i, y_i)_{i=1}^{m}: R̂_m(f, S) = (1/m) ∑_{i=1}^{m} L(f(x_i), y_i). ❑ However, without restricting the class of functions, this is not the right way of proceeding (Occam's razor), as the next slide illustrates.
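A hedged sketch of ERM over a hypothetical finite class F of one-dimensional threshold classifiers (data generated from a toy distribution chosen for the example):

    # ERM: pick the function in F with the smallest empirical risk on S.
    import numpy as np

    rng = np.random.default_rng(0)
    m = 100
    y_train = rng.choice([-1, +1], size=m)      # i.i.d. labels
    x_train = y_train + rng.standard_normal(m)  # 1-d inputs, class means -1 / +1

    def empirical_risk(f, x, y):
        # R_hat_m(f, S) = (1/m) * sum_i [[f(x_i) != y_i]]
        return (f(x) != y).mean()

    # F: threshold classifiers f_t(x) = +1 if x >= t, -1 otherwise.
    F = [lambda x, t=t: np.where(x >= t, +1, -1) for t in np.linspace(-3, 3, 61)]
    f_erm = min(F, key=lambda f: empirical_risk(f, x_train, y_train))
    print("empirical risk of the ERM solution:",
          empirical_risk(f_erm, x_train, y_train))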

  18. ERM principle: a problem Suppose that the input dimension is d = 1, let the input space X be the interval [a, b] ⊂ R where a and b are real values such that a < b, and suppose that the output space is {−1, +1}. Moreover, suppose that the distribution D generating the examples (x, y) is the uniform distribution over [a, b] × {−1}. Consider now a learning algorithm which minimizes the empirical risk by choosing a function in the class F = {f : [a, b] → {−1, +1}} (also denoted F = {−1, +1}^{[a, b]}) in the following way: after reviewing a training set S = {(x_1, y_1), ..., (x_m, y_m)}, the algorithm outputs the prediction function f_S such that f_S(x) = −1 if x ∈ {x_1, ..., x_m}, and +1 otherwise. Its empirical risk is zero, yet under D every example has label −1 while f_S predicts +1 everywhere outside the finite training set, so its true risk is R(f_S) = 1: the function memorizes the sample and does not generalize.
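A minimal numerical illustration of this failure, assuming for concreteness a = 0 and b = 1:

    # Memorization: zero empirical risk, true risk ~1 under D.
    import numpy as np

    rng = np.random.default_rng(0)
    a, b, m = 0.0, 1.0, 50
    x_S = rng.uniform(a, b, size=m)   # training inputs; every label is -1 under D
    y_S = -np.ones(m)

    def f_S(x):
        # Predict -1 exactly on the memorized training points, +1 elsewhere.
        return np.where(np.isin(x, x_S), -1.0, +1.0)

    print("empirical risk:", (f_S(x_S) != y_S).mean())          # 0.0
    x_new = rng.uniform(a, b, size=100_000)                     # fresh draws from D
    print("estimated true risk:", (f_S(x_new) != -1.0).mean())  # ~1.0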
