CSCE 496/896 Lecture 11: Structured Prediction and Probabilistic Graphical Models

Stephen Scott and Vinod Variyam
sscott@cse.unl.edu
(Adapted from Sebastian Nowozin and Christoph H. Lampert)
Introduction: Out with the old ...

We've long known how to answer the question: Does this picture contain a cat?

E.g., convolutional layers feeding fully connected layers feeding a softmax.
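As a minimal sketch (my illustration, not the lecture's model), that pipeline might look like the following; the layer widths and the 224×224 RGB input size are my assumptions:

```python
import torch
import torch.nn as nn

# Conv layers feeding fully connected layers feeding a softmax
# over {cat, no cat}. Sizes here are illustrative assumptions.
cat_classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                     # 224 -> 112
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                     # 112 -> 56
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 64), nn.ReLU(),
    nn.Linear(64, 2),
    nn.Softmax(dim=1),                   # P(cat), P(no cat)
)

probs = cat_classifier(torch.randn(1, 3, 224, 224))
print(probs)  # two probabilities summing to 1
```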
Introduction: ... and in with the new

What we want to know now is: Where are the cats?

No longer a classification problem; we need a more sophisticated (structured) output.
Outline

- Definitions
- Applications
- Graphical modeling of probability distributions
- Training models
- Inference
Definitions: Structured Outputs

- Most machine learning approaches learn a function f : X → R
  - Inputs x ∈ X are any kind of objects
  - Output y is a real number (classification, regression, density estimation, etc.)
- Structured output learning approaches learn a function f : X → Y
  - Inputs x ∈ X are any kind of objects
  - Outputs y ∈ Y are complex (structured) objects (images, text, audio, etc.)
Definitions: Structured Outputs (2)

Can think of structured data as consisting of parts, where each part carries information and how the parts fit together also matters:

- Text: word sequence matters
- Hypertext: links between documents matter
- Chemical structures: relative positions of molecules matter
- Images: relative positions of pixels matter
Applications: Image Processing

Semantic image segmentation:

    f : {images} → {masks}, i.e., f : {0, ..., 255}^(3(m×n)) → {0, 1}^(m×n)
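A shape-level sketch of this mapping (a toy thresholding rule of my own, standing in for a trained model): an RGB image goes in, a binary mask of the same height and width comes out.

```python
import numpy as np

# Input: RGB image in {0,...,255}^(3(m x n)), laid out (channels, m, n).
# Output: binary mask in {0,1}^(m x n).
def segment(image: np.ndarray) -> np.ndarray:
    brightness = image.mean(axis=0)              # average over the 3 channels
    return (brightness > 128).astype(np.uint8)   # (m, n) mask of 0s and 1s

image = np.random.randint(0, 256, size=(3, 64, 64))
mask = segment(image)
print(image.shape, "->", mask.shape, mask.min(), mask.max())  # (3, 64, 64) -> (64, 64) 0 1
```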
Applications: Image Processing (2)

Pose estimation:

    f : {images} → {K positions & angles}, i.e., f : {0, ..., 255}^(3(m×n)) → R^(3K)
Applications: Image Processing (3)

Point matching:

    f : {image pairs} → {mappings between images}
Applications: Image Processing (4)

Object localization:

    f : {images} → {bounding box coordinates}
Applications: Others

- Natural language processing (e.g., translation; output is sentences)
- Bioinformatics (e.g., structure prediction; output is graphs)
- Speech processing (e.g., recognition; output is sentences)
- Robotics (e.g., planning; output is an action plan)
- Image denoising (output is a "clean" version of the image)
Graphical Models: Probabilistic Modeling

To represent structured outputs, we will often employ probabilistic modeling:

- Joint distributions (e.g., P(A, B, C))
- Conditional distributions (e.g., P(A | B, C))

We can estimate joint and conditional probabilities by counting and normalizing, but we have to be careful about representation.
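A tiny sketch (toy data of my own) of what "counting and normalizing" means: estimating P(A, B) and P(A | B) for two binary variables from samples.

```python
from collections import Counter

samples = [(1, 1), (1, 0), (0, 1), (1, 1), (0, 0), (1, 1)]

joint = Counter(samples)                        # counts of (a, b) pairs
p_joint = {ab: c / len(samples) for ab, c in joint.items()}

b_counts = Counter(b for _, b in samples)       # counts of b alone
p_a_given_b = {(a, b): joint[(a, b)] / b_counts[b] for (a, b) in joint}

print(p_joint[(1, 1)])      # 3/6 = 0.5
print(p_a_given_b[(1, 1)])  # 3/4 = 0.75
```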
Graphical Models: Probabilistic Modeling (2)

- E.g., I have a coin with unknown probability p of heads
- I want to estimate the probability of flipping it ten times and getting the sequence HHTTHHTTTT
- One way of representing this joint distribution is a single, big lookup table:
  - Each experiment consists of ten coin flips
  - For each outcome, increment its counter
  - After n experiments, divide HHTTHHTTTT's counter by n to get the estimate

      Outcome       Count
      TTHHTTHHTH    1
      HHHTHTTTHH    0
      HTTTTTHHHT    0
      TTHTHTHHTT    1
      ...           ...

- Will this work?
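A simulation of the lookup-table idea (my toy code, not the lecture's): run n ten-flip experiments, count each full outcome sequence, and divide the target sequence's count by n.

```python
import random
from collections import Counter

random.seed(0)
p, n = 0.5, 10_000
counts = Counter(
    "".join(random.choices("HT", weights=[p, 1 - p], k=10))
    for _ in range(n)
)
print(counts["HHTTHHTTTT"] / n)  # noisy estimate of ~1/1024 for a fair coin
```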
Graphical Models: Probabilistic Modeling (3)

- Problem: The number of possible outcomes grows exponentially with the number of variables (flips)
- ⇒ Most outcomes will have count = 0, a few will have count 1, and probably none will have more
- ⇒ Lousy probability estimates
- Ten flips is bad enough (2^10 = 1024 possible outcomes), but consider 100 flips (2^100 ≈ 1.27 × 10^30)
- How would you solve this problem?
Graphical Models: Factoring a Distribution

- Of course, we recognize that all flips are independent, so

      Pr[HHTTHHTTTT] = p^4 (1 − p)^6

- So we can count heads in n coin flips to estimate p and use the formula above
- I.e., we factor the joint distribution into independent components and multiply the results:

      Pr[HHTTHHTTTT] = Pr[f_1 = H] Pr[f_2 = H] Pr[f_3 = T] ··· Pr[f_10 = T]

- We greatly reduce the number of parameters to estimate
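The factored estimator as a sketch (my toy simulation): estimate the single parameter p from individual flips, then score the sequence with the formula above.

```python
import random

random.seed(0)
true_p, n = 0.5, 10_000
heads = sum(random.random() < true_p for _ in range(n))
p_hat = heads / n  # one parameter instead of a 2^10-entry table

seq = "HHTTHHTTTT"
prob = p_hat ** seq.count("H") * (1 - p_hat) ** seq.count("T")
print(prob)  # close to (1/2)^10 ~ 0.000977
```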
Graphical Models: Factoring a Distribution (2)

- Another example: a relay racing team of Alice, then Bob, then Carol
- Let t_A = Alice's finish time (in seconds), t_B = Bob's, t_C = Carol's
- Want to model the joint distribution Pr[t_A, t_B, t_C]
- Let t_A, t_B, t_C ∈ {1, ..., 1000}
- How large would the table be for Pr[t_A, t_B, t_C]? (See the quick arithmetic below.)
- How many races must they run to populate the table?
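Quick arithmetic for the first question (my numbers, not the slides'): the full joint table needs one entry per (t_A, t_B, t_C) triple.

```python
# One probability entry per (t_A, t_B, t_C) combination.
num_values = 1000
joint_table_size = num_values ** 3
print(f"{joint_table_size:,} entries")  # 1,000,000,000
```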
Graphical Models: Factoring a Distribution (3)

- But we can factor this distribution by observing that t_A does not depend on t_B or t_C
  ⇒ Can estimate t_A on its own
- Also, t_B directly depends on t_A, but not on t_C
- t_C directly depends on t_B, and indirectly on t_A
- Can display this graphically:

      t_A → t_B → t_C
Graphical Models: Factoring a Distribution (4)

- This directed graphical model (often called a Bayesian network or Bayes net) represents conditional dependencies among variables
- It makes factoring easy:

      Pr[t_A, t_B, t_C] = Pr[t_A] Pr[t_B | t_A] Pr[t_C | t_B]
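A minimal sketch of estimating this factored model by counting and normalizing; the data format (one (t_A, t_B, t_C) triple per race) and the toy race times are my assumptions.

```python
from collections import Counter, defaultdict

races = [(50, 110, 170), (52, 110, 168), (50, 108, 170), (52, 112, 172)]

count_a = Counter(ta for ta, tb, tc in races)
count_b_given_a = defaultdict(Counter)   # count_b_given_a[ta][tb]
count_c_given_b = defaultdict(Counter)   # count_c_given_b[tb][tc]
for ta, tb, tc in races:
    count_b_given_a[ta][tb] += 1
    count_c_given_b[tb][tc] += 1

def joint_prob(ta, tb, tc):
    """Pr[t_A, t_B, t_C] = Pr[t_A] Pr[t_B | t_A] Pr[t_C | t_B]."""
    if not count_a[ta] or not count_c_given_b[tb]:
        return 0.0
    p_a = count_a[ta] / len(races)
    p_b = count_b_given_a[ta][tb] / sum(count_b_given_a[ta].values())
    p_c = count_c_given_b[tb][tc] / sum(count_c_given_b[tb].values())
    return p_a * p_b * p_c

print(joint_prob(50, 110, 170))  # (2/4) * (1/2) * (1/2) = 0.125
```

Each factor's table is only as large as one or two variables' ranges, instead of all three combined.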