An Introduction to Probabilistic Modeling
Oliver Stegle and Karsten Borgwardt, Machine Learning and Computational Biology Research Group, Max Planck Institute for Biological Cybernetics and Max Planck Institute for Developmental Biology, Tübingen


1. An Introduction to Probabilistic Modeling. Oliver Stegle and Karsten Borgwardt, Machine Learning and Computational Biology Research Group, Max Planck Institute for Biological Cybernetics and Max Planck Institute for Developmental Biology, Tübingen.

2. Motivation: Why probabilistic modeling?
◮ Inferences from data are intrinsically uncertain.
◮ Probability theory: model uncertainty instead of ignoring it!
◮ Applications: machine learning, data mining, pattern recognition, etc.
◮ Goals of this part of the course:
  ◮ An overview of probabilistic modeling
  ◮ Key concepts
  ◮ A focus on applications in bioinformatics


5. Motivation: Further reading and useful material
◮ Christopher M. Bishop: Pattern Recognition and Machine Learning.
  ◮ Good background; covers most of the course material and much more.
  ◮ Substantial parts of this tutorial borrow figures and ideas from this book.
◮ David J. C. MacKay: Information Theory, Inference, and Learning Algorithms.
  ◮ Very worthwhile reading, though it does not overlap with the lecture synopsis quite as closely.
  ◮ Freely available online.

6. Motivation: Lecture overview
1. An introduction to probabilistic modeling
2. Applications: linear models, hypothesis testing
3. An introduction to Gaussian processes
4. Applications: time series, model comparison
5. Applications: continued

7. Outline

8. Prerequisites: Outline
◮ Motivation
◮ Prerequisites
◮ Probability Theory
◮ Parameter Inference for the Gaussian
◮ Summary

9. Prerequisites: Key concepts (Data)
◮ Let D denote a dataset consisting of N datapoints, D = {(x_n, y_n)}_{n=1}^N, where the x_n are inputs and the y_n are outputs.
◮ Typical setting (this course):
  ◮ x = {x_1, ..., x_D} is multivariate, spanning D features for each observation (nodes in a graph, etc.).
  ◮ y is univariate (fitness, expression level, etc.).
◮ Notation:
  ◮ Scalars are printed as y.
  ◮ Vectors are printed in bold: x.
  ◮ Matrices are printed in capital bold: Σ.

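To make the dataset notation concrete, here is a minimal sketch in Python; the sizes and values are purely illustrative and not part of the lecture. The N inputs are stored as rows of a matrix X and the outputs as a vector y.

```python
# Minimal sketch of the dataset D = {(x_n, y_n)}_{n=1}^N (illustrative values only).
import numpy as np

N, D = 5, 3                      # number of datapoints and number of features (toy sizes)
rng = np.random.default_rng(0)

X = rng.normal(size=(N, D))      # inputs: row n is the feature vector x_n
y = rng.normal(size=N)           # outputs: entry n is the scalar y_n

dataset = list(zip(X, y))        # the paired datapoints (x_n, y_n)
print(len(dataset), dataset[0])
```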

12. Prerequisites: Key concepts (Predictions)
◮ Observed dataset D = {(x_n, y_n)}_{n=1}^N (inputs x_n, outputs y_n).
◮ Given D, what can we say about y⋆ at an unseen test input x⋆?


14. Prerequisites: Key concepts (Model)
◮ Observed dataset D = {(x_n, y_n)}_{n=1}^N (inputs x_n, outputs y_n).
◮ Given D, what can we say about y⋆ at an unseen test input x⋆?
◮ To make predictions we need to make assumptions.
◮ A model H encodes these assumptions and often depends on some parameters θ.
◮ Curve fitting: the model relates x to y, for example a linear model y = f(x | θ) = θ_0 + θ_1 · x.

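As a concrete illustration of the curve-fitting example above, the following sketch fits the linear model y = θ_0 + θ_1 · x by ordinary least squares and evaluates it at an unseen input x⋆. The data, noise level, and parameter values are made up for illustration; the lecture has not yet specified how θ should be estimated.

```python
# Minimal sketch: fit y = f(x | theta) = theta0 + theta1 * x by least squares (toy data).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = 0.5 + 2.0 * x + rng.normal(scale=0.1, size=x.size)   # noisy observations

A = np.column_stack([np.ones_like(x), x])                # design matrix [1, x]
theta, *_ = np.linalg.lstsq(A, y, rcond=None)            # least-squares estimate of (theta0, theta1)

x_star = 1.5                                             # an unseen test input x*
y_star = theta[0] + theta[1] * x_star                    # the model's prediction y*
print(theta, y_star)
```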

16. Prerequisites: Key concepts (Uncertainty)
◮ Virtually all steps involve uncertainty:
  ◮ Measurement uncertainty (D)
  ◮ Parameter uncertainty (θ)
  ◮ Uncertainty regarding the correct model (H)
◮ Uncertainty can occur in both inputs and outputs.
◮ How to represent uncertainty?

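One common way to represent the measurement uncertainty mentioned above is to model each output as the value of an underlying function plus random noise. The following is a minimal sketch assuming additive Gaussian noise with a made-up standard deviation; it anticipates the later material on the Gaussian and is not prescribed by the slide itself.

```python
# Minimal sketch: represent measurement uncertainty as additive Gaussian noise,
# y_n = f(x_n) + eps_n with eps_n ~ N(0, sigma^2). All values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.2                              # assumed noise standard deviation
x = np.linspace(0.0, 1.0, 10)
f = 0.5 + 2.0 * x                        # noise-free model values
y_observed = f + rng.normal(scale=sigma, size=x.size)

print(np.c_[f, y_observed])              # noise-free values alongside noisy measurements
```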

19. Probability Theory: Outline
◮ Motivation
◮ Prerequisites
◮ Probability Theory
◮ Parameter Inference for the Gaussian
◮ Summary

20. Probability Theory: Probabilities
◮ Let X be a random variable defined over a set X (or a measurable space).
◮ P(X = x) denotes the probability that X takes the value x, written p(x) for short.
◮ Probabilities are non-negative: P(X = x) ≥ 0.
◮ Probabilities sum (or integrate) to one: Σ_{x ∈ X} p(x) = 1 in the discrete case, ∫_{x ∈ X} p(x) dx = 1 in the continuous case.
◮ Special case of no uncertainty: p(x) = δ(x − x̂).
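A small numerical check of these requirements for a discrete random variable; the outcomes and probabilities below are arbitrary illustrative values.

```python
# Minimal sketch: non-negativity and normalization for a discrete distribution,
# plus the "no uncertainty" special case (all mass on a single value x_hat).
import numpy as np

values = np.array([0, 1, 2])                         # the possible outcomes of X
p = np.array([0.2, 0.5, 0.3])                        # p(x) for each outcome

assert np.all(p >= 0) and np.isclose(p.sum(), 1.0)   # p(x) >= 0 and sum_x p(x) = 1

x_hat = 1                                            # the certain outcome
p_certain = (values == x_hat).astype(float)          # discrete analogue of delta(x - x_hat)
print(p_certain)                                     # [0. 1. 0.]
```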

21. Probability Theory
Consider two random variables X and Y, where Y takes one of L values y_1, ..., y_L, n_{i,j} is the number of observations with X = x_i and Y = y_j, c_i = Σ_j n_{i,j} is the total count for X = x_i, and N is the total number of observations.
◮ Marginal probability: P(X = x_i) = c_i / N
◮ Conditional probability: P(Y = y_j | X = x_i) = n_{i,j} / c_i
◮ Joint probability: P(X = x_i, Y = y_j) = n_{i,j} / N
(C. M. Bishop, Pattern Recognition and Machine Learning)

22. Probability Theory
◮ Marginal probability: P(X = x_i) = c_i / N
◮ Conditional probability: P(Y = y_j | X = x_i) = n_{i,j} / c_i
◮ Product rule: P(X = x_i, Y = y_j) = n_{i,j} / N = (n_{i,j} / c_i) · (c_i / N) = P(Y = y_j | X = x_i) P(X = x_i)
(C. M. Bishop, Pattern Recognition and Machine Learning)

23. Probability Theory
◮ Sum rule: P(X = x_i) = c_i / N = (1/N) Σ_{j=1}^{L} n_{i,j} = Σ_j P(X = x_i, Y = y_j)
◮ Product rule: P(X = x_i, Y = y_j) = n_{i,j} / N = (n_{i,j} / c_i) · (c_i / N) = P(Y = y_j | X = x_i) P(X = x_i)
(C. M. Bishop, Pattern Recognition and Machine Learning)
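The count-based definitions and the sum and product rules can be checked numerically on a small table of counts n_{i,j}. The counts below are made up for illustration; the indexing follows the convention above, with i indexing values of X and j indexing values of Y.

```python
# Minimal sketch: marginal, conditional and joint probabilities from a table of counts,
# and a numerical check of the sum and product rules. Counts are illustrative.
import numpy as np

n = np.array([[3, 1, 2],          # n[i, j]: number of points with X = x_i and Y = y_j
              [2, 4, 3]])
N = n.sum()                       # total number of observations
c = n.sum(axis=1)                 # c_i = sum_j n[i, j]

joint = n / N                     # P(X = x_i, Y = y_j) = n_ij / N
marginal_x = c / N                # P(X = x_i) = c_i / N
conditional = n / c[:, None]      # P(Y = y_j | X = x_i) = n_ij / c_i

# Sum rule: P(X = x_i) = sum_j P(X = x_i, Y = y_j)
assert np.allclose(marginal_x, joint.sum(axis=1))
# Product rule: P(X = x_i, Y = y_j) = P(Y = y_j | X = x_i) P(X = x_i)
assert np.allclose(joint, conditional * marginal_x[:, None])
print(joint)
```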
