Models for Structured Data
Linear Chains
• If we take a person's blood pressure (BP) every five minutes over a 24-hour period, there is significant dependence between successive values
• How do we model this dependence?
• The same structure occurs in protein sequences, time series (measurements ordered in time), and image data (measurements defined on a spatial grid)
First Order Markov Model
• The structure of the data suggests a natural structure for the models we build
• T data points observed sequentially: y_1, .., y_T

p(y_1, \ldots, y_T) = p(y_1) \prod_{t=2}^{T} p(y_t \mid y_{t-1})
Generative interpretation of Markov Model

p(y_1, \ldots, y_T) = p(y_1) \prod_{t=2}^{T} p(y_t \mid y_{t-1})

(Note: y's here instead of the x's used earlier.)
• The first value y_1 is drawn randomly according to the initial distribution p(y_1)
• The value at time t = 2 is drawn according to the conditional density p(y_2 | y_1)
• y_3 is then generated according to p(y_3 | y_2), and so on (a sampling sketch follows below)
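A minimal sketch of this generative sampling process for a discrete-valued first-order Markov chain; the two-state initial distribution pi and transition matrix P below are illustrative assumptions, not values from the slides:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed): a 2-state chain where
# P[i, j] = p(y_t = j | y_{t-1} = i).
pi = np.array([0.6, 0.4])
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

T = 10
y = np.empty(T, dtype=int)
y[0] = rng.choice(2, p=pi)             # draw y_1 from the initial distribution p(y_1)
for t in range(1, T):
    y[t] = rng.choice(2, p=P[y[t-1]])  # draw y_t from p(y_t | y_{t-1})
print(y)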
Markov model limitation
• The influence of the past is completely summarized by the value of Y at time t-1
• Y does not have any long-range dependencies
• This model may not be accurate in many situations
– In modeling English text, where Y takes on values such as verb, adjective, noun, etc., deciding whether a verb is singular or plural depends on the subject of the verb, which may be much further back than one word
Real-valued Y
• The Markov model is specified as a conditional Normal distribution:

p(y_t \mid y_{t-1}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left( \frac{y_t - g(y_{t-1})}{\sigma} \right)^2 \right\}

• Here g is the deterministic function linking the past y_{t-1} to the present y_t, and σ captures the noise in the model
• If g is chosen to be a linear function of y_{t-1}, g(y_{t-1}) = \alpha_0 + \alpha_1 y_{t-1}, this leads to the first-order autoregressive (AR(1)) model, simulated in the sketch below:

y_t = \alpha_0 + \alpha_1 y_{t-1} + e_t
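A minimal sketch of simulating the AR(1) model above; the coefficients alpha0, alpha1 and the noise scale sigma are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(1)

# Illustrative AR(1) parameters (assumed values):
alpha0, alpha1, sigma = 0.5, 0.8, 0.2

T = 200
y = np.empty(T)
y[0] = alpha0 / (1 - alpha1)   # start at the stationary mean (valid for |alpha1| < 1)
for t in range(1, T):
    # y_t = alpha_0 + alpha_1 * y_{t-1} + e_t, with e_t ~ N(0, sigma^2)
    y[t] = alpha0 + alpha1 * y[t-1] + sigma * rng.standard_normal()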
Hidden State Variable
• The notion of a hidden state for sequential and spatial models is prevalent in engineering and the sciences
• Examples include HMMs and Kalman filters
Graphical Model of HMM
[Figure: a chain of hidden state variables, each with an attached observation variable]
Generative view of HMM
• Observations are generated by moving from left to right along the chain
• The hidden state variable X is categorical (corresponding to m discrete states) and is first-order Markov
• Thus x_t is generated by sampling a value from the conditional distribution p(x_t | x_{t-1}), which is specified by an m × m transition matrix
• Once the state at time t is generated (with value x_t), an observation is generated with probability p(y_t | x_t), as in the sketch below
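A minimal sketch of this generative process for an HMM with m = 2 hidden states and Gaussian emissions; every parameter value here is an illustrative assumption:

import numpy as np

rng = np.random.default_rng(2)

pi = np.array([0.5, 0.5])        # initial state distribution p(x_1)
A = np.array([[0.95, 0.05],      # A[i, j] = p(x_t = j | x_{t-1} = i)
              [0.10, 0.90]])
means, sd = np.array([0.0, 3.0]), 1.0   # emission model p(y_t | x_t) = N(means[x_t], sd^2)

T = 50
x = np.empty(T, dtype=int)
y = np.empty(T)
x[0] = rng.choice(2, p=pi)
y[0] = rng.normal(means[x[0]], sd)
for t in range(1, T):
    x[t] = rng.choice(2, p=A[x[t-1]])   # hidden state transition
    y[t] = rng.normal(means[x[t]], sd)  # observation generated given the state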
View of HMM as a Mixture Model
• m different density functions for the Y variable, with added Markov dependence between "adjacent" mixture components x_t and x_{t+1}
• The joint probability of an observed sequence and any particular state sequence is

p(y_1, \ldots, y_T, x_1, \ldots, x_T) = p(x_1) p(y_1 \mid x_1) \prod_{t=2}^{T} p(y_t \mid x_t) p(x_t \mid x_{t-1})

• To calculate p(y_1, .., y_T), the likelihood of the observed data, one has to sum this joint probability over the m^T possible state sequences
– This appears to involve a sum over an exponential number of terms
– A dynamic-programming recursion (the forward algorithm, a close relative of the Viterbi algorithm, sketched below) performs the calculation in time proportional to O(m^2 T)
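A minimal sketch of that O(m^2 T) recursion, assuming the same Gaussian-emission parameterization as the sampling sketch above; the per-step rescaling is a standard device to avoid numerical underflow:

import numpy as np
from scipy.stats import norm

def hmm_log_likelihood(y, pi, A, means, sd):
    # Scaled forward algorithm: returns log p(y_1, .., y_T) in O(m^2 T) time.
    alpha = pi * norm.pdf(y[0], means, sd)   # alpha_1(x) = p(x_1) p(y_1 | x_1)
    c = alpha.sum()
    log_lik = np.log(c)
    alpha /= c
    for t in range(1, len(y)):
        # alpha_t(x) = p(y_t | x) * sum_{x'} alpha_{t-1}(x') p(x | x')
        alpha = (alpha @ A) * norm.pdf(y[t], means, sd)
        c = alpha.sum()
        log_lik += np.log(c)
        alpha /= c
    return log_lik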
Generalizations of HMMs
• kth order Markov model
– x_t depends on the previous k states
• The dependence of the y's can also be generalized
– y_t depends on the k previous y's
Generalizations of HMMs
• Kalman Filters
– Hidden states are real-valued
– E.g., the unknown velocity or momentum of a vehicle
– The independence structure is the same as for the HMM (see the sketch below)
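A minimal sketch of the linear-Gaussian state-space model underlying the Kalman filter, in one dimension; the parameters a, q, r are illustrative assumptions. The structure matches the HMM sketch above, except that the hidden state is now real-valued:

import numpy as np

rng = np.random.default_rng(3)

# Illustrative 1-D linear-Gaussian model (assumed values):
#   x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)   (real-valued hidden state)
#   y_t = x_t + v_t,          v_t ~ N(0, r)   (observation)
a, q, r = 0.99, 0.1, 0.5

T = 100
x = np.empty(T)
y = np.empty(T)
x[0] = rng.normal(0.0, 1.0)
y[0] = x[0] + rng.normal(0.0, np.sqrt(r))
for t in range(1, T):
    x[t] = a * x[t-1] + rng.normal(0.0, np.sqrt(q))
    y[t] = x[t] + rng.normal(0.0, np.sqrt(r))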
Relationship to Finite State Machines
• A first-order HMM is directly equivalent to a stochastic finite state machine (FSM) with m states
– The choice of the next state is governed by p(x_t | x_{t-1})
• FSMs are simple forms of regular grammars
• The next level up is context-free grammars
– Augment the FSM with a stack
– To remember long-range dependencies such as closing parentheses
– The models become more expressive but much more difficult to fit to data
• Although simple in structure, HMMs have dominated in practice because of the difficulty of fitting these more expressive models to data
Markov Random Fields
• Instead of the Y's existing in an ordered sequence, MRFs allow more general data dependencies
– Such as data on a two-dimensional grid
• MRFs are multidimensional analogs of Markov chains (in two dimensions, a grid structure instead of a chain)