CS481: Bioinformatics Algorithms
Can Alkan EA224 calkan@cs.bilkent.edu.tr
http://www.cs.bilkent.edu.tr/~calkan/teaching/cs481/
HMM for Fair Bet Casino (cont'd)
HMM model for the Fair Bet Casino Problem
A path $\pi = \pi_1 \ldots \pi_n$ in the HMM is defined as a sequence of states.
Consider path $\pi$ = FFFBBBBBFFF and sequence $x$ = 01011101001
x               0     1     0     1     1     1     0     1     0     0     1
π               F     F     F     B     B     B     B     B     F     F     F
P(xi|πi)        ½     ½     ½     ¾     ¾     ¾     ¼     ¾     ½     ½     ½
P(πi-1 → πi)    ½     9/10  9/10  1/10  9/10  9/10  9/10  9/10  1/10  9/10  9/10
$P(\pi_{i-1} \to \pi_i)$: transition probability from state $\pi_{i-1}$ to state $\pi_i$
$P(x_i|\pi_i)$: probability that $x_i$ was emitted from state $\pi_i$
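$P(x|\pi) = P(\pi_0 \to \pi_1) \cdot \prod_{i=1}^{n} P(x_i|\pi_i) \cdot P(\pi_i \to \pi_{i+1}) = a_{\pi_0,\pi_1} \cdot \prod_{i=1}^{n} e_{\pi_i}(x_i) \cdot a_{\pi_i,\pi_{i+1}}$

where $\pi_0$ and $\pi_{n+1}$ denote the begin and end states.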
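The product above can be checked numerically for the example path. The following is a minimal sketch, assuming the values read off the table (fair coin: 1/2 for either outcome; biased coin: 3/4 for 1 and 1/4 for 0; probability 1/10 of switching coins; begin transition 1/2) and leaving out the transition into the end state, which the table does not list. The names `EMIT`, `TRANS`, `START` and `path_probability` are illustrative only.

```python
from fractions import Fraction

# Emission probabilities e_k(symbol); F = fair coin, B = biased coin.
EMIT = {
    "F": {"0": Fraction(1, 2), "1": Fraction(1, 2)},
    "B": {"0": Fraction(1, 4), "1": Fraction(3, 4)},
}
# Transition probabilities a_kl between the two coins.
TRANS = {
    ("F", "F"): Fraction(9, 10), ("F", "B"): Fraction(1, 10),
    ("B", "B"): Fraction(9, 10), ("B", "F"): Fraction(1, 10),
}
# Transition from the begin state into the first state.
START = {"F": Fraction(1, 2), "B": Fraction(1, 2)}


def path_probability(x, path):
    """Multiply the begin transition, every emission, and every state-to-state transition."""
    prob = START[path[0]]
    for i, (symbol, state) in enumerate(zip(x, path)):
        prob *= EMIT[state][symbol]
        if i + 1 < len(path):  # no end-state transition (assumption)
            prob *= TRANS[(state, path[i + 1])]
    return prob


p = path_probability("01011101001", "FFFBBBBBFFF")
print(p, float(p))
```

Exact fractions are used so that the result can be compared directly with the product of the two table rows.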
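Goal: Find an optimal hidden path of states given the observations.
Input: A sequence of observations $x = x_1 \ldots x_n$ generated by an HMM.
Output: A path $\pi$ that maximizes $P(x|\pi)$ over all possible paths.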
Andrew Viterbi used the Manhattan grid
Every choice of $\pi = \pi_1 \ldots \pi_n$ corresponds to a path in the graph.
The only valid direction in the graph is eastward (from column $i$ to column $i+1$).
This graph has $|Q|^2(n-1)$ edges.
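The edge from $(k, i)$ to $(l, i+1)$ has weight $w$, chosen so that the product of the edge weights along a path equals $P(x|\pi)$. Leaving the final transition into the end state to the termination step:

$P(x|\pi) = \prod_{i=0}^{n-1} e_{\pi_{i+1}}(x_{i+1}) \cdot a_{\pi_i,\pi_{i+1}}$

The $i$-th term is $e_{\pi_{i+1}}(x_{i+1}) \cdot a_{\pi_i,\pi_{i+1}} = e_l(x_{i+1}) \cdot a_{kl}$ for $\pi_i = k$, $\pi_{i+1} = l$, so

$w = e_l(x_{i+1}) \cdot a_{kl}$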
$s_{l,i+1} = \max_{k \in Q} \{ s_{k,i} \cdot \text{weight of edge between } (k,i) \text{ and } (l,i+1) \}$
$= \max_{k \in Q} \{ s_{k,i} \cdot a_{kl} \cdot e_l(x_{i+1}) \}$
$= e_l(x_{i+1}) \cdot \max_{k \in Q} \{ s_{k,i} \cdot a_{kl} \}$
Initialization:
$s_{begin,0} = 1$
$s_{k,0} = 0$ for $k \neq begin$

Let $\pi^*$ be the optimal path. Then
$P(x|\pi^*) = \max_{k \in Q} \{ s_{k,n} \cdot a_{k,end} \}$
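The recurrence, initialization and termination can be put together as a short dynamic program. This is a sketch under stated assumptions, not the course's reference implementation: the Fair Bet Casino parameters are the ones from the earlier table, there is no explicit end state (so $a_{k,end}$ is treated as equal for all $k$), and the names `STATES`, `START`, `TRANS`, `EMIT` and `viterbi` are illustrative.

```python
from fractions import Fraction

STATES = "FB"                                                  # F = fair, B = biased
START = {"F": Fraction(1, 2), "B": Fraction(1, 2)}             # a_{begin,k}
TRANS = {"F": {"F": Fraction(9, 10), "B": Fraction(1, 10)},    # a_kl
         "B": {"F": Fraction(1, 10), "B": Fraction(9, 10)}}
EMIT = {"F": {"0": Fraction(1, 2), "1": Fraction(1, 2)},       # e_k(symbol)
        "B": {"0": Fraction(1, 4), "1": Fraction(3, 4)}}


def viterbi(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return (most probable state path, its probability) for a non-empty sequence x."""
    # First column: s_{begin,0} = 1, so s_{k,1} = e_k(x_1) * a_{begin,k}.
    s = {k: emit[k][x[0]] * start[k] for k in states}
    pointers = []  # one dict per step: best predecessor k for each state l
    for symbol in x[1:]:
        ptr = {l: max(states, key=lambda k: s[k] * trans[k][l]) for l in states}
        # s_{l,i+1} = e_l(x_{i+1}) * max_k { s_{k,i} * a_kl }
        s = {l: emit[l][symbol] * s[ptr[l]] * trans[ptr[l]][l] for l in states}
        pointers.append(ptr)
    # Termination without an explicit end state (assumption): pick the best final state.
    last = max(states, key=lambda k: s[k])
    path = [last]
    for ptr in reversed(pointers):  # follow back-pointers from right to left
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), s[last]


best_path, best_prob = viterbi("01011101001")
print(best_path, float(best_prob))
```

Keeping only the previous column of scores is enough for the maximization; the back-pointers are what allow the path itself to be recovered.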
Is there a problem here?
The value of the product can become extremely small, which leads to numerical underflow.
To avoid underflow, use log values instead.
$s_{l,i+1} = \log e_l(x_{i+1}) + \max_{k \in Q} \{ s_{k,i} + \log(a_{kl}) \}$
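A small experiment illustrates why logarithms help. The sketch below first multiplies a hypothetical per-position factor (0.45, roughly one transition times one emission) 2000 times, then computes the best-path score with the log-space recurrence. It assumes all probabilities are strictly positive, since `math.log(0)` raises an error; the model values and function name are illustrative.

```python
import math

# Part 1: a long product of probabilities collapses to 0.0 in floating point,
# while the sum of logs stays finite.
p_step = 0.45   # hypothetical factor, e.g. a_kl * e_l(x) = 9/10 * 1/2
n = 2000
product = 1.0
log_sum = 0.0
for _ in range(n):
    product *= p_step
    log_sum += math.log(p_step)
print(product, log_sum)

# Part 2: the same recurrence in log space.
STATES = "FB"
START = {"F": 0.5, "B": 0.5}
TRANS = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
EMIT = {"F": {"0": 0.5, "1": 0.5}, "B": {"0": 0.25, "1": 0.75}}


def log_viterbi_score(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Log probability of the best path: log e_l(x_{i+1}) + max_k { s_{k,i} + log a_kl }."""
    s = {k: math.log(start[k]) + math.log(emit[k][x[0]]) for k in states}
    for symbol in x[1:]:
        s = {l: math.log(emit[l][symbol])
                + max(s[k] + math.log(trans[k][l]) for k in states)
             for l in states}
    return max(s.values())


long_score = log_viterbi_score("1" * 5000)
print(long_score)
```

The scores are now negative numbers of moderate size, so sequences of any practical length can be handled.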
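Define $f_{k,i}$ (forward probability) as the probability of emitting the prefix $x_1 \ldots x_i$ and reaching the state $\pi_i = k$.
The recurrence for the forward algorithm:
$f_{k,i} = e_k(x_i) \cdot \sum_{l \in Q} f_{l,i-1} \cdot a_{lk}$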
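The forward recurrence has the same shape as the Viterbi recurrence, with the maximum replaced by a sum. A minimal sketch follows, assuming the Fair Bet Casino parameters used earlier and no explicit end state, so that $P(x)$ is taken as the sum of the last column. The names are illustrative.

```python
from fractions import Fraction

STATES = "FB"
START = {"F": Fraction(1, 2), "B": Fraction(1, 2)}             # a_{begin,k}
TRANS = {"F": {"F": Fraction(9, 10), "B": Fraction(1, 10)},    # a_lk
         "B": {"F": Fraction(1, 10), "B": Fraction(9, 10)}}
EMIT = {"F": {"0": Fraction(1, 2), "1": Fraction(1, 2)},       # e_k(symbol)
        "B": {"0": Fraction(1, 4), "1": Fraction(3, 4)}}


def forward(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return P(x), the probability of x summed over all state paths."""
    # First column: f_{k,1} = e_k(x_1) * a_{begin,k}.
    f = {k: emit[k][x[0]] * start[k] for k in states}
    for symbol in x[1:]:
        # f_{k,i} = e_k(x_i) * sum_l f_{l,i-1} * a_lk
        f = {k: emit[k][symbol] * sum(f[l] * trans[l][k] for l in states)
             for k in states}
    # No explicit end state (assumption): P(x) is the sum over the final column.
    return sum(f.values())


print(forward("01011101001"))
```

A quick sanity check is that the values of `forward` over all sequences of a fixed length add up to 1.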
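However, the forward probability is not the only factor affecting $P(\pi_i = k|x)$.
The sequence of transitions and emissions that the HMM undergoes between $\pi_{i+1}$ and $\pi_n$ also affects $P(\pi_i = k|x)$.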
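Define the backward probability $b_{k,i}$ as the probability of being in state $\pi_i = k$ and emitting the suffix $x_{i+1} \ldots x_n$.
The recurrence for the backward algorithm:
$b_{k,i} = \sum_{l \in Q} e_l(x_{i+1}) \cdot b_{l,i+1} \cdot a_{kl}$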
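The backward recurrence fills the table from right to left. The sketch below assumes the same Fair Bet Casino parameters as before and, because no end state is modelled, starts from $b_{k,n} = 1$ for every state. Closing the recursion through the begin state yields $P(x)$ again, which gives a convenient check. The names are illustrative.

```python
from fractions import Fraction

STATES = "FB"
START = {"F": Fraction(1, 2), "B": Fraction(1, 2)}             # a_{begin,k}
TRANS = {"F": {"F": Fraction(9, 10), "B": Fraction(1, 10)},    # a_kl
         "B": {"F": Fraction(1, 10), "B": Fraction(9, 10)}}
EMIT = {"F": {"0": Fraction(1, 2), "1": Fraction(1, 2)},       # e_l(symbol)
        "B": {"0": Fraction(1, 4), "1": Fraction(3, 4)}}


def backward(x, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return P(x) computed from the backward probabilities."""
    # Assumption: no explicit end state, so b_{k,n} = 1 for every state k.
    b = {k: Fraction(1) for k in states}
    for symbol in reversed(x[1:]):
        # b_{k,i} = sum_l e_l(x_{i+1}) * b_{l,i+1} * a_kl
        b = {k: sum(emit[l][symbol] * b[l] * trans[k][l] for l in states)
             for k in states}
    # Close the recursion through the begin state to obtain P(x).
    return sum(start[k] * emit[k][x[0]] * b[k] for k in states)


print(backward("01011101001"))
```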
The probability that the dealer used a biased coin at any moment $i$ is:
$P(\pi_i = k | x) = \frac{P(x, \pi_i = k)}{P(x)} = \frac{f_{k,i} \cdot b_{k,i}}{P(x)}$
$P(x)$ is the sum of $P(x, \pi_i = k)$ over all $k$.
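Combining the two tables is a one-line computation per position. The sketch below takes forward and backward tables as input; the example tables were worked out by hand for the short sequence x = "11" under the Fair Bet Casino parameters (switch probability 1/10, biased coin showing 1 with probability 3/4) and with $b_{k,n} = 1$, which is an assumption made because no end state is modelled. The function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(f, b):
    """Turn forward/backward tables (one dict per position) into P(pi_i = k | x)."""
    result = []
    for f_i, b_i in zip(f, b):
        joint = {k: f_i[k] * b_i[k] for k in f_i}   # P(x, pi_i = k)
        p_x = sum(joint.values())                    # P(x): sum over all k
        result.append({k: joint[k] / p_x for k in joint})
    return result


# Hand-computed tables for x = "11" (illustrative values, see lead-in).
f = [{"F": Fraction(1, 4), "B": Fraction(3, 8)},
     {"F": Fraction(21, 160), "B": Fraction(87, 320)}]
b = [{"F": Fraction(21, 40), "B": Fraction(29, 40)},
     {"F": Fraction(1), "B": Fraction(1)}]

post = posterior(f, b)
for i, column in enumerate(post, start=1):
    print(i, column)
```

Note that the normalizing sum comes out the same at every position, as the statement about $P(x)$ requires.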
A distant cousin of functionally related sequences in
However, they may have weak similarities with
The goal is to align a sequence to
Family of related proteins can be represented by
HMMs can also be used for aligning a
A
Multiple alignment of a protein family shows
Example: after aligning many globin proteins,
A Profile HMM is a probabilistic
A given multiple alignment (of a protein
This model then may be used to find and
Multiple alignment is used to construct the HMM model.
Assign each column to a Match state in the HMM. Add Insertion and Deletion states.
Estimate the emission probabilities according to amino acid counts in each column. Different positions in the protein will have different emission probabilities.
Estimate the transition probabilities between Match, Deletion and Insertion states.
The HMM model is then trained to derive the optimal parameters.
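The emission-estimation step can be sketched as counting residues per alignment column. This is a simplified illustration, not the full construction: every column is treated as a Match state, gaps (`-`) are skipped, and a pseudocount of 1 per symbol is added so that unseen residues do not get probability zero. The pseudocount, the toy alignment and the four-letter alphabet (used instead of the 20 amino acids for brevity) are all assumptions.

```python
from collections import Counter


def match_emissions(alignment, alphabet, pseudocount=1.0):
    """Estimate one emission distribution per alignment column from residue counts."""
    profile = []
    for column in zip(*alignment):                    # iterate over columns
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return profile


alignment = ["ACG", "ACT", "A-G"]    # hypothetical toy alignment
profile = match_emissions(alignment, "ACGT")
for j, column in enumerate(profile, start=1):
    print(j, column)
```

Each column ends up with its own distribution, which is the sense in which different positions have different emission probabilities.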
Match states
Insertion states
Deletion states
$\log(a_{MI}) + \log(a_{IM})$ = gap initiation penalty
$\log(a_{II})$ = gap extension penalty
$e_{I_j}(a) = p(a)$
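Define $v^M_j(i)$ as the log-likelihood score of the best path for matching $x_1 \ldots x_i$ to the profile HMM, ending with $x_i$ emitted by the state $M_j$.
$v^I_j(i)$ and $v^D_j(i)$ are defined similarly.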
$v^M_j(i) = \log\frac{e_{M_j}(x_i)}{p(x_i)} + \max \left\{ \begin{array}{l} v^M_{j-1}(i-1) + \log(a_{M_{j-1},M_j}) \\ v^I_{j-1}(i-1) + \log(a_{I_{j-1},M_j}) \\ v^D_{j-1}(i-1) + \log(a_{D_{j-1},M_j}) \end{array} \right.$

$v^I_j(i) = \log\frac{e_{I_j}(x_i)}{p(x_i)} + \max \left\{ \begin{array}{l} v^M_j(i-1) + \log(a_{M_j,I_j}) \\ v^I_j(i-1) + \log(a_{I_j,I_j}) \\ v^D_j(i-1) + \log(a_{D_j,I_j}) \end{array} \right.$
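Both recurrences share one pattern: a log-odds emission term plus the best of three predecessor scores, each with its log transition probability added. The sketch below captures a single cell update only; it is not a full profile HMM aligner, and all numeric values as well as the name `cell_score` are hypothetical. For a Match cell the predecessors sit at column $j-1$, position $i-1$; for an Insertion cell they sit at column $j$, position $i-1$.

```python
import math


def cell_score(emission, background, predecessors):
    """log(e(x_i) / p(x_i)) + max over (previous score + log transition probability)."""
    return math.log(emission / background) + max(
        score + log_a for score, log_a in predecessors
    )


# One Match cell v^M_j(i) with made-up inputs.
v_match = cell_score(
    emission=0.5, background=0.25,
    predecessors=[(-1.0, math.log(0.9)),    # v^M_{j-1}(i-1), log a_{M_{j-1},M_j}
                  (-3.0, math.log(0.05)),   # v^I_{j-1}(i-1), log a_{I_{j-1},M_j}
                  (-2.0, math.log(0.05))],  # v^D_{j-1}(i-1), log a_{D_{j-1},M_j}
)

# One Insertion cell v^I_j(i); with e_{I_j}(a) = p(a) the log-odds term is 0.
v_insert = cell_score(
    emission=0.25, background=0.25,
    predecessors=[(-1.0, math.log(0.05)),   # v^M_j(i-1), log a_{M_j,I_j}
                  (-3.0, math.log(0.5)),    # v^I_j(i-1), log a_{I_j,I_j}
                  (-2.0, math.log(0.05))],  # v^D_j(i-1), log a_{D_j,I_j}
)
print(v_match, v_insert)
```

Unreachable predecessors can be represented with `float("-inf")`, which `max` and addition handle without special cases.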