CSE 527 " Markov Models and Hidden Markov Models ! - PowerPoint PPT Presentation

CSE 527 " Markov Models and Hidden Markov Models ! http://upload.wikimedia.org/wikipedia/commons/b/ba/Calico_cat ! Dosage Compensation Reminder: Proteins “Read” DNA ! and X-Inactivation ! E.g.: ! 2 copies (mom/dad) of each chromosome 1-23 ! Mostly, both copies of each gene are expressed ! E.g., A B O blood group defined by 2 alleles of 1 gene ! Women (XX) get double dose of X genes (vs XY) ? ! So, early in embryogenesis: ! • One X randomly inactivated in each cell ! How? ! Helix-Turn-Helix ! ! ! ! Leucine Zipper ! • Choice maintained in daughter cells ! Calico: major coat color gene is on X !

Down DNA Methylation ! in the Groove ! CH 3 ! CpG - 2 adjacent nts, same strand " (not Watson-Crick pair; “p” mnemonic for the " Different phosphodiester bond of the DNA backbone) ! patterns of C of CpG is often (70-80%) methylated in " hydrophobic cytosine ! methyls, mammals i.e., CH 3 group added (both strands) ! potential H Why? Generally silences transcription. (Epigenetics) " bonds, etc. at X-inactivation, imprinting, repression of mobile elements, " edges of different base some cancers, aging, and developmental differentiation ! pairs. They’re How? DNA methyltransferases convert hemi- to fully- accessible, methylated ! esp. in major Major exception: promoters of housekeeping genes ! groove CH 3 ! DNA Methylation–Why ! Same Pairing ! CH 3 ! In vertebrates, it generally silences transcription ! (Epigenetics) X-inactivation, imprinting, repression of mobile " Methyl-C elements, cancers, aging, and developmental differentiation ! alters major E.g., if a stem cell divides, one daughter fated " groove cytosine ! to be liver, other kidney, need to ! CH 3 ! profile ( ∴ TF (a) turn off liver genes in kidney & vice versa, ! (b) remember that through subsequent divisions ! binding), but How? ! not base- (a) Methylate genes, esp. promoters, to silence them ! pairing, (b) after ÷, DNA methyltransferases convert hemi- to fully-methylated " (& deletion of methyltransferse is embrionic-lethal in mice) ! transcription Major exception: promoters of housekeeping genes ! or replication !

“CpG Islands” ! CpG Islands ! CH 3 ! Methyl-C mutates to T relatively easily ! CpG Islands ! Net: CpG is less common than " expected genome-wide: " More CpG than elsewhere (say, CpG/GpC>50%) ! cytosine ! f(CpG) < f(C)*f(G) ! More C & G than elsewhere, too (say, C+G>50%) ! Typical length: few 100 to few 1000 bp ! BUT in some regions (e.g. active NH 3 ! Questions ! promoters), CpG remain CH 3 ! unmethylated, so CpG → TpG less Is a short sequence (say, 200 bp) a CpG island or not? ! likely there: makes “CpG Islands”; Given long sequence (say, 10-100kb), find CpG islands? ! often mark gene-rich regions ! thymine ! Markov & Hidden Independence ! Markov Models ! References (see also online reading page): ! Eddy, "What is a hidden Markov model?" Nature Biotechnology, 22, #10 (2004) 1315-6. ! A key issue: Previous models we’ve talked about Durbin, Eddy, Krogh and Mitchison, “Biological assume independence of nucleotides in different Sequence Analysis”, Cambridge, 1998 (esp. chs 3, 5) ! positions - definitely unrealistic. ! Rabiner, "A Tutorial on Hidden Markov Models and Selected Application in Speech Recognition," Proceedings of the IEEE, v 77 #2,Feb 1989, 257-286

Markov Chains ! A Markov Model (1st order) ! A sequence of random variables is a k-th order Markov chain if, for all i , i th value is independent of all but the previous k values: " } ! 0th " Example 1: Uniform random ACGT ! States: ! A,C,G,T ! order ! Example 2: Weight matrix model ! Emissions: ! corresponding letter ! } ! 1st ! Transitions: ! a st = P(x i = t | x i- 1 = s) ! Example 3: ACGT, but ↓ Pr(G following C) ! 1st order ! order ! A Markov Model (1st order) ! Pr of emitting sequence x ! States: ! A,C,G,T ! Emissions: ! corresponding letter ! Transitions: ! a st = P(x i = t | x i- 1 = s) ! B egin/ E nd states !

Training ! Discrimination/Classification ! Max likelihood estimates for transition Log likelihood ratio of CpG model vs background model ! probabilities are just the frequencies of transitions when emitting the training sequences ! E.g., from 48 CpG islands in 60k bp: ! From DEKM ! From DEKM ! CpG Island Scores ! What does a 2nd order CpG islands ! Markov Model look like? " Non-CpG ! 3rd order? ! Figure 3.2 Histogram of length-normalized scores. ! From DEKM !

Questions ! Combined Model ! Q1: Given a short sequence, is it more likely from CpG + " } ! model ! feature model or background model? Above ! Q2: Given a long sequence, where are the features in it (if any) ! Approach 1: score 100 bp (e.g.) windows ! CpG – " } ! Pro: simple ! model ! Con: arbitrary, fixed length, inflexible ! Approach 2: combine +/- models. ! Emphasis is “Which (hidden) state?” not “Which model?” ! The Occasionally Hidden Markov Models " Dishonest Casino ! (HMMs; Claude Shannon, 1948) ! 1 fair die, 1 “loaded” die, occasionally swapped !

Rolls � 315116246446644245311321631164152133625144543631656626566666 � Inferring hidden stuff ! Die � FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLL � Viterbi � FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLL � Rolls � 651166453132651245636664631636663162326455236266666625151631 � Die � LLLLLLFFFFFFFFFFFFLLLLLLLLLLLLLLLLFFFLLLLLLLLLLLLLLFFFFFFFFF � Viterbi � LLLLLLFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLFFFFFFFF � Joint probability of a given path ! & emission sequence x: " Rolls � 222555441666566563564324364131513465146353411126414626253356 � Die � FFFFFFFFLLLLLLLLLLLLLFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLL � Viterbi � FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFL � Rolls � 366163666466232534413661661163252562462255265252266435353336 � Die � LLLLLLLLFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF � But ! is hidden; what to do? Some alternatives: ! Viterbi � LLLLLLLLLLLLFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF � Rolls � 233121625364414432335163243633665562466662632666612355245242 � Most probable single path ! Die � FFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLLLLFFFFFFFFFFF � Viterbi � FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLFFFFFFFFFFF � Sequence of most probable states ! Figure 3.5 ! Rolls: Visible data–300 rolls of a die as described above. ! Die: Hidden data–which die was actually used for that roll (F = fair, L = loaded). ! Viterbi: the prediction by the Viterbi algorithm is shown. ! From DEKM ! The Viterbi Algorithm: " Unrolling an HMM ! The most probable path ! 3 6 6 2 ... ! Viterbi finds: ! L ! L ! L ! L ! ... ! Possibly there are 10 99 paths of prob 10 -99 ! F ! F ! F ! F ! ... ! More commonly, one path (+ slight variants) dominate others. " t=0 t=1 t=2 t=3 ! (If not, other approaches may be preferable.) ! Key problem: exponentially many paths !" Conceptually, sometimes convenient ! Note exponentially many paths !

Viterbi ! HMM Casino Example ! ! probability of the most probable path ! ! emitting and ending in state l ! Initialize : ! General case : ! (Excel spreadsheet on web; download & play…) ! HMM Casino Example ! Viterbi Traceback ! Above finds probability of best path ! To find the path itself, trace backward to the state k attaining the max at each stage ! (Excel spreadsheet on web; download & play…) !

Most probable path ≠ Sequence Rolls � 315116246446644245311321631164152133625144543631656626566666 � Die � FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLL � Viterbi � FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLL � of most probable states ! Rolls � 651166453132651245636664631636663162326455236266666625151631 � Die � LLLLLLFFFFFFFFFFFFLLLLLLLLLLLLLLLLFFFLLLLLLLLLLLLLLFFFFFFFFF � Viterbi � LLLLLLFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLFFFFFFFF � Another example, based on casino dice again ! Rolls � 222555441666566563564324364131513465146353411126414626253356 � Suppose p(fair ↔ loaded) transitions are 10 -99 and Die � FFFFFFFFLLLLLLLLLLLLLFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLL � Viterbi � FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFL � roll sequence is 11111…66666; then fair state is Rolls � 366163666466232534413661661163252562462255265252266435353336 � more likely all through 1’s & well into the run of Die � LLLLLLLLFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF � Viterbi � LLLLLLLLLLLLFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF � 6’s, but eventually loaded wins, and the Rolls � 233121625364414432335163243633665562466662632666612355245242 � improbable F → L transitions make Viterbi = all L. ! Die � FFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLLLLFFFFFFFFFFF � Viterbi � FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLFFFFFFFFFFF � 1 ! 1 ! 1 ! 1 ! 1 ! 6 ! 6 ! 6 ! 6 ! 6 ! Figure 3.5 ! * = max prob ! L ! * ! * ! Rolls: Visible data–300 rolls of a die as described above. ! Die: Hidden data–which die was actually used for that roll (F = fair, L = loaded). ! * = Viterbi ! * ! * ! * ! * ! * ! * ! * ! * ! F ! Viterbi: the prediction by the Viterbi algorithm is shown. ! From DEKM ! An HMM (unrolled) ! Is Viterbi “best”? ! States ! Viterbi finds ! x 1 ! x 2 ! x 3 ! x 4 !! Emissions/sequence positions ! Most probable (Viterbi) path goes through 5, but most probable state at 2nd step is 6 " (I.e., Viterbi is not the only interesting answer.) !

CSE 527 " Markov Models and Hidden Markov Models ! - PowerPoint PPT Presentation

CSE 527 " Markov Models and Hidden Markov Models ! http://upload.wikimedia.org/wikipedia/commons/b/ba/Calico_cat ! Dosage Compensation Reminder: Proteins Read DNA ! and X-Inactivation ! E.g.: ! 2 copies (mom/dad) of each chromosome

Rigid Geometric Transformations COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

Camera Calibration COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Camera

Training Neural Nets COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Training

Tracking Feature Windows COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview & Bio

CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview & Bio

CSE 527, Additional notes on MLE & EM Based on earlier notes by C. Grant & M. Narasimhan

Image Motion COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Image Motion 1 /

HW2o Image Differentiation COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

Correlation, Convolution, Filtering COMPSCI 527 Computer Vision COMPSCI 527 Computer

Rigid Geometric Transformations COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

Image Pyramids COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Image Pyramids 1

The Eight-Point Algorithm COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision The

The Singular Value Decomposition COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

Image Motion COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Image Motion 1 /

The Epipolar Geometry COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision The

AGENDA 1. Where are we now with COVID-19? 2. Phase 3 Back on Track Plan 3. Tips and Guidelines

Who We Are Our mission at CFED is to make it possible for

Effective symbolic dynamics E. Jeandel Montpellier, France, World, Universe March 4, 2012 E.

YEAR I R IN R N REVI VIEW EW 8: 8:30 30-9: 9:00 00 Congratulations 5 Year North Veterans

4/19/16 Some general and specific points for midterm papers: April

TRANSFORMATION Priya Patra | Program Manager | Capgemini Katrina Collins | Direct of Events |

An introduction to R: Basics of Algorithmics in R No emie Becker, Sonja Grath & Dirk

Computability Theory at Work: Factoring Polynomials and Finding Roots Russell Miller Queens

CSE 527 " Markov Models and Hidden Markov Models ! - PowerPoint PPT Presentation

CSE 527 " Markov Models and Hidden Markov Models ! http://upload.wikimedia.org/wikipedia/commons/b/ba/Calico_cat ! Dosage Compensation Reminder: Proteins Read DNA ! and X-Inactivation ! E.g.: ! 2 copies (mom/dad) of each chromosome

Rigid Geometric Transformations COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

Camera Calibration COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Camera

Training Neural Nets COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Training

Tracking Feature Windows COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview &amp; Bio

CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview &amp; Bio

CSE 527, Additional notes on MLE &amp; EM Based on earlier notes by C. Grant &amp; M. Narasimhan

Image Motion COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Image Motion 1 /

HW2o Image Differentiation COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

Correlation, Convolution, Filtering COMPSCI 527 Computer Vision COMPSCI 527 Computer

Rigid Geometric Transformations COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

Image Pyramids COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Image Pyramids 1

The Eight-Point Algorithm COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision The

The Singular Value Decomposition COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision

Image Motion COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision Image Motion 1 /

The Epipolar Geometry COMPSCI 527 Computer Vision COMPSCI 527 Computer Vision The

AGENDA 1. Where are we now with COVID-19? 2. Phase 3 Back on Track Plan 3. Tips and Guidelines

Who We Are Our mission at CFED is to make it possible for

Effective symbolic dynamics E. Jeandel Montpellier, France, World, Universe March 4, 2012 E.

YEAR I R IN R N REVI VIEW EW 8: 8:30 30-9: 9:00 00 Congratulations 5 Year North Veterans

4/19/16 Some general and specific points for midterm papers: April

TRANSFORMATION Priya Patra | Program Manager | Capgemini Katrina Collins | Direct of Events |

An introduction to R: Basics of Algorithmics in R No emie Becker, Sonja Grath &amp; Dirk

Computability Theory at Work: Factoring Polynomials and Finding Roots Russell Miller Queens

CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview & Bio

CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview & Bio

CSE 527, Additional notes on MLE & EM Based on earlier notes by C. Grant & M. Narasimhan

An introduction to R: Basics of Algorithmics in R No emie Becker, Sonja Grath & Dirk