CSCE 970 Lecture 8: Structured Prediction




  1. CSCE 970 Lecture 8: Structured Prediction
     Stephen Scott and Vinod Variyam
     (Adapted from Sebastian Nowozin and Christoph H. Lampert)
     sscott@cse.unl.edu

  2. Introduction: Out with the old ...
     - We now know how to answer the question: Does this picture contain a cat?
     - E.g., convolutional layers feeding fully connected layers feeding a softmax
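
As a point of contrast with the structured case, here is a minimal sketch of such a classifier (assuming PyTorch; the layer sizes and input resolution are illustrative, not taken from the slides):

```python
import torch
import torch.nn as nn

# A minimal "is there a cat?" classifier: convolutional layers feeding
# fully connected layers feeding a softmax over {cat, no cat}.
class CatClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 2),  # two classes: cat / no cat
        )

    def forward(self, x):                     # x: (batch, 3, 64, 64)
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)   # class probabilities

probs = CatClassifier()(torch.rand(1, 3, 64, 64))  # shape (1, 2)
```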

  3. Introduction: ... and in with the new
     - What we want to know now is: Where are the cats?
     - No longer a classification problem; we need a more sophisticated (structured) output

  4. Outline
     - Definitions
     - Applications
     - Graphical modeling of probability distributions
     - Training models
     - Inference

  5. Definitions: Structured Outputs
     - Most machine learning approaches learn a function f : X → R
       - Inputs X are any kind of objects
       - Output y is a real number (classification, regression, density estimation, etc.)
     - Structured output learning approaches learn a function f : X → Y
       - Inputs X are any kind of objects
       - Outputs y ∈ Y are complex (structured) objects (images, text, audio, etc.)

  6. Definitions: Structured Outputs (2)
     - Can think of structured data as consisting of parts: each part carries information, and so does the way the parts fit together
       - Text: word sequence matters
       - Hypertext: links between documents matter
       - Chemical structures: relative positions of molecules matter
       - Images: relative positions of pixels matter

  7. Applications: Image Processing
     - Semantic image segmentation: f : {images} → {masks}, where an m × n RGB image lies in {0, ..., 255}^(3(m × n)) and a mask lies in {0, 1}^(m × n)
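
To make the input and output spaces concrete, a small sketch (the `segment` function here is a hypothetical placeholder, not a learned model):

```python
import numpy as np

def segment(image: np.ndarray) -> np.ndarray:
    """Placeholder segmenter: maps an (m, n, 3) uint8 image to an (m, n) binary mask.
    A real model would be learned; here we just threshold brightness to show the shapes."""
    assert image.dtype == np.uint8 and image.ndim == 3 and image.shape[2] == 3
    return (image.mean(axis=2) > 127).astype(np.uint8)   # mask values in {0, 1}

mask = segment(np.random.randint(0, 256, size=(4, 6, 3), dtype=np.uint8))
print(mask.shape)  # (4, 6)
```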

  8. Applications: Image Processing (2)
     - Pose estimation: f : {images} → {K positions & angles}, where images lie in {0, ..., 255}^(3(m × n)) and the K positions and angles lie in R^(3K)

  9. Applications: Image Processing (3)
     - Point matching: f : {image pairs} → {mappings between images}

  10. Applications: Image Processing (4)
     - Object localization: f : {images} → {bounding box coordinates}

  11. Applications: Others
     - Natural language processing (e.g., translation; output is sentences)
     - Bioinformatics (e.g., structure prediction; output is graphs)
     - Speech processing (e.g., recognition; output is sentences)
     - Robotics (e.g., planning; output is an action plan)
     - Image denoising (output is a "clean" version of the image)

  12. Graphical Models: Probabilistic Modeling
     - To represent structured outputs, we will often employ probabilistic modeling
       - Joint distributions (e.g., P(A, B, C))
       - Conditional distributions (e.g., P(A | B, C))
     - Can estimate joint and conditional probabilities by counting and normalizing, but we have to be careful about representation

  13. Graphical Models: Probabilistic Modeling (2)
     - E.g., I have a coin with unknown probability p of heads
     - I want to estimate the probability of flipping it ten times and getting the sequence HHTTHHTTTT
     - One way of representing this joint distribution is a single, big lookup table:
       - Each experiment consists of ten coin flips
       - For each outcome, increment its counter
       - After n experiments, divide HHTTHHTTTT's counter by n to get the estimate

       Outcome      Count
       TTHHTTHHTH   1
       HHHTHTTTHH   0
       HTTTTTHHHT   0
       TTHTHTHHTT   1
       ...          ...

     - Will this work?
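
A minimal sketch of this counting approach (the true p and the number of experiments are made-up values, used only to illustrate the idea):

```python
import random
from collections import Counter

def table_estimate(target="HHTTHHTTTT", p=0.5, n_experiments=1_000, seed=0):
    """Estimate Pr[target] by counting how often the exact 10-flip sequence occurs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_experiments):
        outcome = "".join("H" if rng.random() < p else "T" for _ in range(len(target)))
        counts[outcome] += 1
    return counts[target] / n_experiments

# With 2^10 = 1024 possible outcomes and only 1000 experiments,
# most outcomes are never seen, so the estimate is very noisy.
print(table_estimate())
```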

  14. Graphical Models: Probabilistic Modeling (3)
     - Problem: the number of possible outcomes grows exponentially with the number of variables (flips)
       ⇒ Most outcomes will have count 0, a few will have count 1, and probably none will have more
       ⇒ Lousy probability estimates
     - Ten flips is bad enough, but consider 100
     - How would you solve this problem?
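
To put numbers on the blow-up (a quick check, not on the original slide): ten flips give 2^10 = 1024 possible outcomes, while 100 flips give 2^100 ≈ 1.27 × 10^30, so no feasible number of experiments can populate the table.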

  15. Graphical Models: Factoring a Distribution
     - Of course, we recognize that all flips are independent, so
       Pr[HHTTHHTTTT] = p^4 (1 − p)^6
     - So we can count n coin flips to estimate p and use the formula above
     - I.e., we factor the joint distribution into independent components and multiply the results:
       Pr[HHTTHHTTTT] = Pr[f_1 = H] Pr[f_2 = H] Pr[f_3 = T] ··· Pr[f_10 = T]
     - We greatly reduce the number of parameters to estimate
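
A sketch of the factored estimate (the flip data below is synthetic, used only to show the two-step recipe: estimate p from individual flips, then score the sequence):

```python
def factored_estimate(flips, target="HHTTHHTTTT"):
    """Estimate p from individual flips, then score the target as p^(#H) * (1 - p)^(#T)."""
    p_hat = flips.count("H") / len(flips)
    h, t = target.count("H"), target.count("T")
    return p_hat ** h * (1 - p_hat) ** t

flips = "HTHHTTTHHT" * 100          # 1000 flips; only one parameter (p) to estimate
print(factored_estimate(flips))     # p_hat = 0.5, so estimate = 0.5^10 ≈ 0.00098
```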

  16. Graphical Models: Factoring a Distribution (2)
     - Another example: a relay racing team (Alice, then Bob, then Carol)
     - Let t_A = Alice's finish time (in seconds), t_B = Bob's, t_C = Carol's
     - Want to model the joint distribution Pr[t_A, t_B, t_C]
     - Let t_A, t_B, t_C ∈ {1, ..., 1000}
     - How large would the table for Pr[t_A, t_B, t_C] be?
     - How many races must they run to populate the table?

  17. Graphical Models: Factoring a Distribution (3)
     - But we can factor this distribution by observing that t_A is independent of t_B and t_C
       ⇒ Can estimate t_A on its own
     - Also, t_B directly depends on t_A, but is independent of t_C
     - t_C directly depends on t_B, and indirectly on t_A
     - Can display this graphically as the chain t_A → t_B → t_C

  18. Graphical Models: Factoring a Distribution (4)
     - This directed graphical model (often called a Bayesian network or Bayes net) represents conditional dependencies among variables
     - Makes factoring easy:
       Pr[t_A, t_B, t_C] = Pr[t_A] Pr[t_B | t_A] Pr[t_C | t_B]
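
A tiny sketch of using this factorization (toy conditional tables over just two coarse time values, not real race data):

```python
# Hypothetical CPTs over two coarse time values ("fast", "slow"); purely illustrative.
P_A = {"fast": 0.6, "slow": 0.4}                                          # Pr[t_A]
P_B_given_A = {"fast": {"fast": 0.7, "slow": 0.3},
               "slow": {"fast": 0.2, "slow": 0.8}}                        # Pr[t_B | t_A]
P_C_given_B = {"fast": {"fast": 0.5, "slow": 0.5},
               "slow": {"fast": 0.1, "slow": 0.9}}                        # Pr[t_C | t_B]

def joint(t_a, t_b, t_c):
    """Pr[t_A, t_B, t_C] = Pr[t_A] * Pr[t_B | t_A] * Pr[t_C | t_B]"""
    return P_A[t_a] * P_B_given_A[t_a][t_b] * P_C_given_B[t_b][t_c]

print(joint("fast", "fast", "slow"))  # 0.6 * 0.7 * 0.5 = 0.21
```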

  19. Graphical Models: Factoring a Distribution (5)
     - Pr[t_A, t_B, t_C] = Pr[t_A] Pr[t_B | t_A] Pr[t_C | t_B]
     - The table for Pr[t_A] requires¹ 1000 entries, while Pr[t_B | t_A] requires 10^6, as does Pr[t_C | t_B]
       ⇒ Total 2.001 × 10^6 entries, versus 10^9 for the full joint table
     - The idea easily extends to continuous distributions by changing the discrete probability Pr[·] to a pdf p(·)

     ¹ Technically, we only need 999 entries, since the value of the last one is implied because probabilities must sum to one. However, then the analysis requires the use of a lot of "9"s, and that's not something I'm willing to take on at this point in my life.

  20. Directed Models: Conditional Independence
     - Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if
       (∀ x_i, y_j, z_k) Pr[X = x_i | Y = y_j, Z = z_k] = Pr[X = x_i | Z = z_k]
     - More compactly, we write Pr[X | Y, Z] = Pr[X | Z]
     - Example: Thunder is conditionally independent of Rain, given Lightning:
       Pr[Thunder | Rain, Lightning] = Pr[Thunder | Lightning]
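
A small numerical check of this definition on a made-up joint distribution over (Thunder, Rain, Lightning), constructed so that both Thunder and Rain depend only on Lightning:

```python
from itertools import product

# Hypothetical CPTs: Thunder and Rain each depend only on Lightning.
P_L = {1: 0.2, 0: 0.8}                                            # Pr[Lightning]
P_R_given_L = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.3, 0: 0.7}}          # Pr[Rain | Lightning]
P_T_given_L = {1: {1: 0.95, 0: 0.05}, 0: {1: 0.01, 0: 0.99}}      # Pr[Thunder | Lightning]

# Joint table Pr[Thunder, Rain, Lightning]
joint = {(t, r, l): P_L[l] * P_R_given_L[l][r] * P_T_given_L[l][t]
         for t, r, l in product([0, 1], repeat=3)}

def cond(thunder, rain=None, lightning=None):
    """Pr[Thunder = thunder | ...]; rain/lightning left as None are not conditioned on."""
    match = lambda r, l: (rain is None or r == rain) and (lightning is None or l == lightning)
    den = sum(p for (t, r, l), p in joint.items() if match(r, l))
    num = sum(p for (t, r, l), p in joint.items() if t == thunder and match(r, l))
    return num / den

# Conditioning on Rain changes nothing once Lightning is known:
print(cond(1, rain=1, lightning=1), cond(1, lightning=1))  # both ≈ 0.95
print(cond(1, rain=0, lightning=0), cond(1, lightning=0))  # both ≈ 0.01
```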
