FINAL EXAM REVIEW

Will cover:
• All content from the course (Units 1-5)
• Most points concentrated on Units 3-5 (mixture models, HMMs, MCMC)

Logistics
• Take-home exam, maximum 2 hour time limit
• Exam released late afternoon Fri 5/1
• Exam due at noon (11:59am ET) on Fri 5/8
• Can use: any notes, any textbook, any Python code (run locally)
• Cannot use: the internet to search for answers, or other people
• We will provide most needed formulas or give a textbook reference
Takeaway Messages
1) When uncertain about a variable, don't condition on it, integrate it away!
2) Model performance is only as good as your fitting algorithm, initialization, and hyperparameter selection.
3) MCMC is a powerful way to estimate posterior distributions (and resulting expectations) even when the model is not analytically tractable.
Takeaway 1
When uncertain about a parameter, it is better to INTEGRATE IT AWAY than to CONDITION ON it.

OK: use a point estimate
p(x_* | \hat{w})

BETTER: integrate away w via the sum rule
p(x_* | X) = \int_w p(x_*, w | X) dw
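To make Takeaway 1 concrete, here is a minimal Python sketch for a Beta-Bernoulli model, where the sum-rule integral happens to have a closed form; the Beta(a, b) prior values and the counts below are illustrative, not from the slides.

# Contrast the plug-in estimate p(x*=1 | w_hat) with the integrated
# predictive p(x*=1 | X) under a Beta-Bernoulli model.
a, b = 1.0, 1.0          # Beta prior pseudo-counts (illustrative)
n_heads, n_tails = 5, 3  # observed data X (illustrative)

# Point estimate: MAP of w under the Beta(a + n_heads, b + n_tails) posterior
w_map = (n_heads + a - 1) / (n_heads + n_tails + a + b - 2)
p_plugin = w_map                      # p(x*=1 | w_hat) = w_hat

# Integrate w away: p(x*=1 | X) = \int_w p(x*=1 | w) p(w | X) dw
p_predictive = (n_heads + a) / (n_heads + n_tails + a + b)

print(p_plugin, p_predictive)         # 0.625 vs 0.6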
Takeaway 2
• Initialization: remember CP3 (GMMs), as well as CP5 (coming!)
• Algorithm: remember the difference between L-BFGS and EM in CP3
• Hyperparameter: remember the poor performance in CP2

Figure note: the difference between the purple and blue curves is 0.01 on the per-pixel log scale. Normalized over 400 pixels (20x20) per image, this means the purple model says the average validation set image is exp(0.01 * 400) ≈ 54.6 times more likely than under the blue model.
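A one-line check of the arithmetic in the figure note above:

import math
per_pixel_gap = 0.01
n_pixels = 20 * 20
print(math.exp(per_pixel_gap * n_pixels))   # exp(4) ≈ 54.6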
Takeaway 3
• Can use MCMC to do the posterior predictive:

p(x_* | X) = \int_w p(x_*, w | X) dw
           = \int_w p(x_* | w) p(w | X) dw
           \approx \frac{1}{S} \sum_{s=1}^{S} p(x_* | w^s),  with w^s drawn iid from p(w | X)
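A minimal sketch of the Monte Carlo approximation above. It assumes you already have S posterior samples w_samples[s] ~ p(w | X) (e.g. from an MCMC chain after burn-in) and a function likelihood(x_star, w) returning p(x* | w); both names are illustrative.

import numpy as np

def posterior_predictive(x_star, w_samples, likelihood):
    # (1/S) * sum_s p(x* | w^s)
    vals = np.array([likelihood(x_star, w_s) for w_s in w_samples])
    return vals.mean()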
You are capable of so many things now!
Given a proposed probabilistic model, you can do:
• ML estimation of parameters
• MAP estimation of parameters
• EM to estimate parameters
• MCMC estimation of the posterior
• Heldout likelihood computation
• Hyperparameter selection via CV
• Hyperparameter selection via evidence
Unit 1

Optimization Skills
• Finding extrema by zeros of first derivative
• Handling constraints via Lagrange multipliers

Probabilistic Analysis Skills
• Discrete and continuous r.v.
• Sum rule and product rule
• Bayes rule (derived from above)
• Expectations
• Independence

Distributions
• Bernoulli distribution
• Beta distribution
• Gamma function
• Dirichlet distribution

Data analysis
• Beta-Bernoulli for binary data
  • ML estimation of "proba. heads"
  • MAP estimation of "proba. heads"
  • Estimating the posterior
  • Predicting new data
• Dirichlet-Categorical for discrete data
  • ML estimation of unigram probas
  • MAP estimation of unigram probas
  • Estimating the posterior
  • Predicting new data
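A compact recap of the Dirichlet-Categorical estimators listed above (standard results; here N_v is the count of word v, N = \sum_v N_v, and the prior over the unigram probabilities \mu is Dirichlet(\alpha_1, ..., \alpha_V)):

\hat{\mu}_v^{ML} = \frac{N_v}{N}, \qquad
\hat{\mu}_v^{MAP} = \frac{N_v + \alpha_v - 1}{N + \sum_u \alpha_u - V}

p(\mu | X) = Dir(\mu | \alpha_1 + N_1, ..., \alpha_V + N_V), \qquad
p(x_* = v | X) = \frac{N_v + \alpha_v}{N + \sum_u \alpha_u}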
Example Unit 1 Question
1) True or False: Bayes Rule can be proved using the Sum Rule and Product Rule.
2) You're modeling the wins/losses of your favorite sports team with a Beta-Bernoulli model.
   a) You assume each game's binary outcome (win=1 / loss=0) is iid.
   b) You observe in preseason play: 5 wins and 3 losses.
   c) Suggest a prior to use for the win probability.
   d) Identify 2 or more assumptions about this model that may not be valid in the real world (with concrete reasons).
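For review, a minimal Python sketch of part 2, assuming a uniform Beta(1, 1) prior on the win probability; this prior is an illustrative choice, not the required answer.

from scipy import stats

a, b = 1.0, 1.0        # Beta prior on the win probability (illustrative)
wins, losses = 5, 3    # preseason observations

posterior = stats.beta(a + wins, b + losses)   # p(rho | data) = Beta(6, 4)
print(posterior.mean())                        # posterior mean = 0.6
print(posterior.interval(0.95))                # 95% credible interval for rho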
Example Unit 1 Answer
Unit 2

Optimization Skills
• Convexity and second derivatives
• Finding extrema by zeros of first derivative
• First and second order gradient descent

Probabilistic Analysis Skills
• Joints, conditionals, marginals
• Covariance matrices (pos. definite, symmetric)
• Gaussian conjugacy rules

Linear Algebra Skills
• Determinants
• Positive definite
• Invertibility

Distributions
• Univariate Gaussian distribution
• Multivariate Gaussian distribution

Data analysis
• Gaussian-Gaussian for regression
  • ML estimation of weights
  • MAP estimation of weights
  • Estimating the posterior over weights
  • Predicting new data
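One standard form of the Gaussian-Gaussian regression results above, assuming a prior p(w) = N(0, \alpha^{-1} I), noise variance \sigma^2, and design matrix \Phi with rows \phi(x_n)^T (the symbols \alpha and \Phi are notation choices, not from the slide):

p(w | \Phi, t) = N(w | m_N, S_N), \quad S_N^{-1} = \alpha I + \sigma^{-2} \Phi^T \Phi, \quad m_N = \sigma^{-2} S_N \Phi^T t

p(t_* | x_*, \Phi, t) = N(t_* | m_N^T \phi(x_*), \; \sigma^2 + \phi(x_*)^T S_N \phi(x_*))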
Example Unit 2 Question
You are doing regression with the following model:
• Normal prior on the weights
• Normal likelihood: p(t_n | x_n) = NormPDF(w * x_n, \sigma^2)

a. Consider the following two estimators for t_*. What's the difference?
   \hat{t}_* = w_{MAP} x_*
   \tilde{t}_* = E_{t \sim p(t | x_*, X)}[t]
b. Suggest at least 2 ways to pick a value for the hyperparameter \sigma.
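A minimal 1-D sketch of the model in this question, assuming a N(0, \alpha^{-1}) prior on the scalar weight w; the values of alpha, sigma2, and the toy data are illustrative.

import numpy as np

alpha, sigma2 = 1.0, 0.25              # prior precision, noise variance
x = np.array([0.5, 1.0, 1.5, 2.0])     # toy inputs
t = np.array([0.4, 1.1, 1.4, 2.1])     # toy targets

# Posterior over w is Gaussian: precision = alpha + (sum_n x_n^2) / sigma2
post_prec = alpha + x @ x / sigma2
post_mean = (x @ t / sigma2) / post_prec   # posterior mean = mode = w_MAP here

x_star = 3.0
t_hat = post_mean * x_star                 # plug-in prediction w_MAP * x_*
pred_var = sigma2 + x_star**2 / post_prec  # predictive variance under p(t | x_*, X)
print(t_hat, np.sqrt(pred_var), np.sqrt(sigma2))

The two point predictions coincide in this Gaussian case because the posterior mean equals its mode, but the predictive variance is larger than the plug-in \sigma^2.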
Example Unit 2 Answer
Unit 3: K-Means and Mixture Models

Distributions
• Mixtures of Gaussians (GMMs)
• Mixtures in general
  • Can use any likelihood (not just Gauss)
  • Complete likelihood: p(x, z | \theta)
  • Incomplete likelihood: p(x | \theta)
  • Expectations of complete likelihood

Optimization Skills
• K-means objective and algorithm
• Coordinate ascent / descent algorithms
• Optimization objectives with hidden vars
• Expectation-Maximization algorithm
  • Lower bound objective
  • What E-step does
  • What M-step does

Numerical Methods
• logsumexp
  • How to derive it
  • Why it is important

Data analysis
• K-means or GMM for a dataset
• How to pick K hyperparameter
• Why multiple inits matter
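A minimal sketch of logsumexp and why it matters: subtracting the max before exponentiating avoids underflow when all log values are very negative (e.g. per-cluster log joint probabilities in a GMM E-step). The function name is an illustrative choice.

import numpy as np

def logsumexp(log_vals):
    m = np.max(log_vals)
    return m + np.log(np.sum(np.exp(log_vals - m)))

log_vals = np.array([-1000.0, -1001.0, -1002.0])
print(logsumexp(log_vals))                # ≈ -999.59, finite
print(np.log(np.sum(np.exp(log_vals))))   # naive version underflows to -inf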
Example Unit 3 Question
Consider two possible models for clustering 1-dim. data:
• K-Means
• Gaussian mixtures

Name ways that the GMM is more flexible as a model:
• How is the GMM's treatment of assignments more flexible?
• How is the GMM's parameterization of a "cluster" more flexible?

Under what limit does the GMM likelihood reduce to the K-means objective?
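For the limiting case in the last part, one standard way to see it (assuming equal mixture weights and a shared spherical covariance \sigma^2 I; a sketch of the argument, not the only acceptable phrasing): as \sigma^2 \to 0 the E-step responsibilities harden,

\gamma_{nk} = \frac{\pi_k N(x_n | \mu_k, \sigma^2 I)}{\sum_j \pi_j N(x_n | \mu_j, \sigma^2 I)} \to \begin{cases} 1 & k = \arg\min_j \|x_n - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}

and the expected complete log-likelihood is, up to additive constants, -\frac{1}{2\sigma^2} \sum_n \sum_k \gamma_{nk} \|x_n - \mu_k\|^2, so maximizing it is equivalent to minimizing the K-means cost \sum_n \min_k \|x_n - \mu_k\|^2.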
Example Unit 3 Answer
Unit 4: Markov models and HMMs

Probabilistic Analysis Skills
• Markov conditional independence
• Stationary distributions
• Deriving independence properties
  • Like HW4 problem 1

Algorithm Skills
• Forward algorithm
• Backward algorithm
• Viterbi algorithm
(all examples of dynamic programming)

Linear Algebra Skills
• Eigenvectors/values for stationary distributions

Optimization Skills
• EM for HMMs
  • E-step
  • M-step

Distributions
• Discrete Markov models
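For reference, the forward recursion above in one common notation (\pi is the initial state distribution, A the transition matrix with A_{jk} = p(z_t = k | z_{t-1} = j), and p(x_t | z_t = k) the emission density; the notation is a choice, not from the slide):

\alpha_1(k) = \pi_k \, p(x_1 | z_1 = k), \qquad
\alpha_t(k) = p(x_t | z_t = k) \sum_j \alpha_{t-1}(j) A_{jk}

where \alpha_t(k) = p(x_{1:t}, z_t = k), so the incomplete likelihood is p(x_{1:T}) = \sum_k \alpha_T(k).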
Example Unit 4 Question
• Describe how the Viterbi algorithm is an instance of dynamic programming.
Identify all the key parts:
• What is the fundamental problem being solved?
• How is the final solution built from solutions to smaller problems?
• How can all the solutions be described as a big "table" that should be filled in?
• What is the "base case" update (the simplest subproblem)?
• What is the recursive update?
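A minimal log-space Viterbi sketch matching the parts of this question; the input names are illustrative: log_pi[k] is the initial log probability, log_A[j, k] the transition log probability, and log_lik[t, k] = log p(x_t | z_t = k).

import numpy as np

def viterbi(log_pi, log_A, log_lik):
    T, K = log_lik.shape
    # The "table" of subproblem solutions: best[t, k] is the best log prob
    # of any state path ending in state k at time t; back stores backpointers.
    best = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)

    best[0] = log_pi + log_lik[0]                  # base case: t = 0
    for t in range(1, T):                          # recursive update
        scores = best[t - 1][:, None] + log_A      # scores[j, k]
        back[t] = np.argmax(scores, axis=0)
        best[t] = np.max(scores, axis=0) + log_lik[t]

    # Final solution: trace the backpointers from the best last state.
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(best[-1])
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path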
Example Unit 4 Answer