→ What comes next? Example of a hardness result: the alignment problem cross × plain → cross with 'all operations' is Max-SNP-hard (i.e., without the restriction w_a = w_r + w_b). S.Will, 18.417, Fall 2011
→ Max-Cut-3
[Figure: example graph on nodes v_1, ..., v_6 with a maximum cut.]
• formal: let G = (V, E) be a graph
• a cut in G is the set of edges crossing some partition V_1 ⊎ V_2 = V, i.e., the edges with one endpoint in V_1 and the other in V_2.
• Max-Cut-3: given a graph G with degree ≤ 3, find a cut of maximal cardinality.
Theorem: Max-Cut-3 is Max-SNP-hard.
Remark: If an optimization problem is Max-SNP-hard, then it has no PTAS (Polynomial Time Approximation Scheme) unless P = NP.
A PTAS is an algorithm that takes an instance of a maximization problem and a parameter ε > 0 and, in time polynomial in the instance size for every fixed ε, produces a solution whose value is at least (1 − ε) times the maximum.
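To make the definition of a cut concrete, here is a minimal brute-force sketch that enumerates all partitions V_1 ⊎ V_2 and counts crossing edges. It is exponential in n, so it only illustrates the problem statement (it is not an approximation algorithm). The function name `max_cut` and the example graph (a triangular prism, which has degree 3 everywhere) are hypothetical choices, not taken from the slide's figure.

```python
from itertools import product

def max_cut(n, edges):
    """Brute-force maximum cut of a graph with nodes 0..n-1.

    Tries all 2^n partitions V_1 ⊎ V_2 (only feasible for small n).
    Returns (cut size, side assignment) where side[v] is 1 or 2.
    """
    best_k, best_side = -1, None
    for side in product((1, 2), repeat=n):
        # count edges whose endpoints lie on different sides of the partition
        k = sum(1 for (u, v) in edges if side[u] != side[v])
        if k > best_k:
            best_k, best_side = k, side
    return best_k, best_side

# Hypothetical degree-3 example: the triangular prism (two triangles joined by a matching)
prism = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (0, 3), (1, 4), (2, 5)]
k, side = max_cut(6, prism)
print(k)  # 7: each triangle must keep at least one of its edges uncut
```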
→ Reduction of Max-Cut-3 to cross × plain → cross
Reduction idea: represent a Max-Cut-3 instance as an alignment problem cross × plain → cross such that an optimal alignment corresponds to a maximum cut.
→ if Max-Cut-3 can be solved (and approximated) via the alignment problem, then the alignment problem must also be Max-SNP-hard.
Plan
• show how to represent the graph G as input of the alignment problem (here: sequences S_1, S_2 and a structure P_1 for S_1)
• show how an optimal alignment corresponds to a maximum cut of G.
→ Representation of Graph G as Alignment Problem (Example)
[Figure: example graph G on nodes v_1, ..., v_6; each node v_i is represented by one segment AAAUUU in S_1 and one segment UUUAAA in S_2.]
→ Representation of Graph G as Alignment Problem (formally)
• given G = (V, E) with nodes v_1, ..., v_n of degree ≤ 3
• sequences
  • S_1 = (AAAUUU C^c)^{n−1} AAAUUU, and
  • S_2 = (UUUAAA C^c)^{n−1} UUUAAA.
• the i-th segment AAAUUU in S_1 and the i-th segment UUUAAA in S_2 correspond to node v_i
• each edge (v_i, v_j) of G corresponds to two arcs in P_1: one connecting an A of the i-th segment with a U of the j-th segment, and one connecting a U of the i-th segment with an A of the j-th segment.
• the Cs are used to avoid alignment of different segments; their number c depends on the ratio min(w_b, w_a, w_r) / w_d (arc changes vs. base deletion).
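The construction above can be sketched as follows, assuming nodes are numbered 0..n−1 and positions in S_1 are 0-based. The slide does not say which A or U of a segment an edge uses, so this sketch simply takes the first unused one; that choice, the function name `build_instance`, and the value of c are hypothetical. A node of degree > 3 would run out of free bases (the `pop` raises `IndexError`), matching the degree bound of Max-Cut-3.

```python
def build_instance(n, edges, c):
    """Hypothetical construction of S_1, S_2 and the arc set P_1 from a graph.

    Nodes are 0..n-1 with degree <= 3; c is the number of Cs between segments.
    Which A/U of a segment an edge uses is an arbitrary choice (first free one).
    """
    S1 = ("C" * c).join(["AAAUUU"] * n)    # (AAAUUU C^c)^(n-1) AAAUUU
    S2 = ("C" * c).join(["UUUAAA"] * n)    # (UUUAAA C^c)^(n-1) UUUAAA
    stride = 6 + c                          # segment length plus separator
    free_a = [[v * stride + o for o in (0, 1, 2)] for v in range(n)]  # A positions of segment v
    free_u = [[v * stride + o for o in (3, 4, 5)] for v in range(n)]  # U positions of segment v
    arcs = []
    for (i, j) in edges:
        # arc 1: an A of segment i with a U of segment j
        p, q = free_a[i].pop(0), free_u[j].pop(0)
        arcs.append((min(p, q), max(p, q)))
        # arc 2: a U of segment i with an A of segment j
        p, q = free_u[i].pop(0), free_a[j].pop(0)
        arcs.append((min(p, q), max(p, q)))
    return S1, S2, sorted(arcs)

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # hypothetical 3-regular graph
S1, S2, P1 = build_instance(4, k4, c=2)
print(S1)
print(len(P1))
```

Every arc pairs an A with a U, and for a 3-regular graph every base of every segment ends up as an arc endpoint; the resulting arcs may cross, as allowed for a "cross" structure.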
→ Correspondence of Optimal Alignment and Max Cut
Properties of optimal alignment
• we choose c such that every optimal alignment must match all Cs
• we choose a scoring with w_m > w_d and 2w_a > w_b + w_r.
• w_m > w_d implies no base mismatch: aligning AAAUUU directly against UUUAAA is more expensive than the shifted alignments below.
• two alignment types for each node v_i:
  • A-type (the As are matched):
      ---AAAUUU
      UUUAAA---
  • U-type (the Us are matched):
      AAAUUU---
      ---UUUAAA
• A-type ⇔ node in V_1; U-type ⇔ node in V_2.
• cost for each edge of the cut (v_i and v_j have different types): one arc is broken and one arc is removed; cost: w_b + w_r
→ Correspondence of Optimal Alignment and Max Cut
• cost for each edge that is not in the cut (v_i and v_j have the same type): both arcs are altered; cost: 2w_a
• total cost for the alignment:
  • V_1 = all A-type nodes
  • V_2 = all U-type nodes
  • n nodes, each of degree 3 ⇒ 3n/2 edges
  • k := |cut(V_1, V_2)|
$$C = k\,(w_b + w_r) + \left(\tfrac{3n}{2} - k\right) 2w_a + 3n\,w_d \;=\; 3n\,(w_a + w_d) - k\,(2w_a - w_b - w_r)$$
  • assumption: 2w_a > w_b + w_r > 0, so the coefficient 2w_a − w_b − w_r of k is positive
• ⇒ C minimal ≡ k maximal
• ⇒ maximal cut ≡ minimal edit distance.
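The cost argument can be checked numerically with a small sketch that takes the per-edge costs from these slides as given (w_b + w_r for a cut edge, 2w_a otherwise, plus 3n·w_d) and brute-forces all A/U type assignments. It does not compute an actual arc-annotated alignment; the weights and the prism graph are hypothetical values chosen to satisfy 2w_a > w_b + w_r > 0.

```python
from itertools import product

def alignment_cost(n, edges, types, w_a, w_b, w_r, w_d):
    """Cost C of the alignment in which node v has type types[v] ('A' or 'U'),
    using the per-edge costs of the reduction: w_b + w_r for a cut edge
    (different types), 2*w_a otherwise, plus 3*w_d per node as on the slide."""
    cost = 3 * n * w_d
    for (u, v) in edges:
        cost += (w_b + w_r) if types[u] != types[v] else 2 * w_a
    return cost

def best_alignment(n, edges, w_a, w_b, w_r, w_d):
    # enumerate all A/U type assignments (only feasible for small n)
    return min(product("AU", repeat=n),
               key=lambda t: alignment_cost(n, edges, t, w_a, w_b, w_r, w_d))

prism = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (0, 3), (1, 4), (2, 5)]
w = dict(w_a=3, w_b=1, w_r=2, w_d=1)   # hypothetical weights, 2w_a > w_b + w_r > 0
types = best_alignment(6, prism, **w)
k = sum(types[u] != types[v] for u, v in prism)   # A-type nodes form V_1
print(k, alignment_cost(6, prism, types, **w))    # 7 51
```

The minimum-cost assignment cuts 7 edges, the maximum cut of this graph, and 51 = 3·6·(3 + 1) − 7·(6 − 3) agrees with the closed form above.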
→ Approaches for Alignments of RNAs
[Figure, adopted from Gardner & Giegerich, BMC 2004; input: two sequences A and B, output: a consensus structure.
• Plan A: ALIGN the single sequences, then FOLD the alignment
• Plan B: simultaneously ALIGN and FOLD [Sankoff 85], i.e., alignment of sequence AND structure
• Plan C: FOLD the single sequences, then ALIGN the structures]
→ Simultaneous Alignment and Folding: Sankoff (1985)
• What do we want? What does folding into a common structure mean?
• First idea: preserve the "shape" ≡ branching structure
• Formally: let i_1 < i_2 < ... < i_v in a and j_1 < j_2 < ... < j_w in b be the positions in base pairs that bound multiloops or are external (branching configuration). Then the structures are equivalent (according to branching) iff v = w and (i_f, i_g) ∈ P_a if and only if (j_f, j_g) ∈ P_b.
• finding good equivalent structures is not sufficient (the sequences must also align well)
• Hence: minimize edit distance + energies (of the 2 equivalent structures)
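The equivalence test can be sketched in a few lines. This sketch assumes non-crossing structures given in dot-bracket notation and reads "pairs that bound multiloops or are external" as: external pairs, closing pairs of multiloops, and pairs directly inside a multiloop. That reading and all function names are assumptions for illustration.

```python
def parse_dot_bracket(s):
    """Base pairs (i, j), 0-based, of a non-crossing dot-bracket structure."""
    stack, pairs = [], []
    for pos, ch in enumerate(s):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return pairs

def branching_positions(pairs):
    """Sorted positions of base pairs that are external or bound a multiloop."""
    # innermost enclosing pair of each pair (None if the pair is external)
    parent = {}
    for p in pairs:
        enclosing = [q for q in pairs if q[0] < p[0] and p[1] < q[1]]
        parent[p] = max(enclosing) if enclosing else None
    n_children = {p: 0 for p in pairs}
    for p in pairs:
        if parent[p] is not None:
            n_children[parent[p]] += 1

    def closes_multiloop(p):
        return n_children[p] >= 2

    selected = [p for p in pairs
                if parent[p] is None                 # external pair
                or closes_multiloop(p)               # closing pair of a multiloop
                or closes_multiloop(parent[p])]      # pair directly inside a multiloop
    return sorted(pos for p in selected for pos in p)

def branching_equivalent(pairs_a, pairs_b):
    pos_a, pos_b = branching_positions(pairs_a), branching_positions(pairs_b)
    if len(pos_a) != len(pos_b):                     # requires v = w
        return False

    def rank_pattern(pairs, pos):
        # (i_f, i_g) in P  ->  (f, g), using ranks among the branching positions
        rank = {x: r for r, x in enumerate(pos)}
        return {(rank[i], rank[j]) for (i, j) in pairs if i in rank}

    return rank_pattern(pairs_a, pos_a) == rank_pattern(pairs_b, pos_b)

a = parse_dot_bracket("((..))((..))")
b = parse_dot_bracket("(...)(....)")
print(branching_equivalent(a, b))   # True: both are two external helices
```

Stem lengths and hairpin sizes do not matter, only the branching pattern, which is why the first idea alone does not pin down a good common structure.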
→ Sankoff Problem Definition
• Idea: Sankoff = Zuker folding + Needleman/Wunsch alignment
• IN: two sequences a and b
• find two equivalent structures P_a and P_b and a compatible alignment A of a and b such that Energy(a, P_a) + Energy(b, P_b) + EditDistance(A) is minimal
• where Energy yields the (loop-based) Turner free energy and EditDistance is the edit distance (base mismatch cost x, indel cost y)
• what does compatible mean? The alignment must be "consistent" with the branching structure; formally: the base pairs (i_f, i_g) ∈ P_a and (j_f, j_g) ∈ P_b (from the definition of equivalent) must be aligned to each other
→ Constraints
We want to find the optimal structures and alignment under the following constraints.
Constraints on the predicted structures:
• must be equivalent (intuitively: same kind of multiloops)
Constraints on the alignment:
• multiloops must be aligned to their equivalent partner
• hairpin loops must be aligned to their equivalent partner
• each 2-loop (stacking, bulge, or interior loop) must be aligned to exactly one other 2-loop or must be entirely aligned to a gap.
→ Edit distance of sub-sequences
• distance-based score: x = base mismatch, y = base deletion/insertion
• D(i, j; h, k): minimum sequence alignment cost between the sequences a_i ... a_j and b_h ... b_k
• Recursion:
$$D(i,j;h,k) = \min \begin{cases} D(i,j-1;h,k-1) + x & \text{if } a_j \neq b_k \\ D(i,j-1;h,k-1) & \text{if } a_j = b_k \\ D(i,j-1;h,k) + y \\ D(i,j;h,k-1) + y \end{cases} = \min \begin{cases} D(i+1,j;h+1,k) + x & \text{if } a_i \neq b_h \\ D(i+1,j;h+1,k) & \text{if } a_i = b_h \\ D(i+1,j;h,k) + y \\ D(i,j;h+1,k) + y \end{cases}$$
• Initialization:
$$D(i,i;h,h) = \begin{cases} x & \text{if } a_i \neq b_h \\ 0 & \text{else} \end{cases}$$
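A minimal sketch of D(i, j; h, k), using the first (right-end) form of the recursion on the two sub-sequences. It uses 1-based inclusive indices as on the slide and treats an empty range (i > j or h > k) as costing y per base of the other range, which is also how the Sankoff slides later extend D. The function name and default costs are illustrative assumptions.

```python
def sub_edit_distance(a, b, i, j, h, k, x=1.0, y=1.0):
    """D(i, j; h, k): minimum alignment cost of a_i..a_j and b_h..b_k.

    Indices are 1-based and inclusive, as on the slides. An empty range
    (i > j or h > k) costs y per base of the other range.
    """
    s, t = a[i - 1:j], b[h - 1:k]          # the two sub-sequences
    m, n = len(s), len(t)
    # T[p][q] = cost of aligning s[:p] with t[:q]
    T = [[0.0] * (n + 1) for _ in range(m + 1)]
    for p in range(1, m + 1):
        T[p][0] = p * y                    # delete s[:p]
    for q in range(1, n + 1):
        T[0][q] = q * y                    # insert t[:q]
    for p in range(1, m + 1):
        for q in range(1, n + 1):
            sub = 0.0 if s[p - 1] == t[q - 1] else x
            T[p][q] = min(T[p - 1][q - 1] + sub,   # match / mismatch of last bases
                          T[p - 1][q] + y,         # delete last base of s
                          T[p][q - 1] + y)         # insert last base of t
    return T[m][n]

print(sub_edit_distance("ACGU", "AGU", 1, 4, 1, 3))   # 1.0 (delete C)
```

For two single bases this returns min(x, 2y), so it agrees with the slide's initialization D(i, i; h, h) whenever x ≤ 2y.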
→ Recall Zuker
• Energies: e(s), where s is a k-loop (or s = φ for the empty structure)
• F(i, j) "free": minimum energy for the subsequence a_i ... a_j
• C(i, j) "closed": minimum energy for the subsequence a_i ... a_j under the condition (i, j) ∈ P
• Zuker recursion:
• Problem: (6) requires time proportional to n^{2K}, where K is the maximum k in k-loops
→ Usual Simplification
• e(s) for k-loops with k ≥ 3 (multiloops): e(s) = A + (k − 1)P + uQ, where u is the number of unpaired bases in the loop
• New matrix: G(i, j) for multiloops
• Recursion:
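The linear multiloop model is simple arithmetic once k and u are known. This sketch assumes the loop is given by its closing pair and the k − 1 pairs directly enclosed by it; the parameter values A = 3.0, P = 0.5, Q = 0.2 are hypothetical placeholders, not Turner parameters.

```python
def multiloop_energy(closing, inner, A, P, Q):
    """Linear multiloop energy e(s) = A + (k - 1) * P + u * Q.

    closing: the closing base pair (i, j) of the loop;
    inner:   the k - 1 base pairs directly enclosed by it (k >= 3 for a multiloop);
    u:       number of unpaired bases of the loop.
    """
    i, j = closing
    k = len(inner) + 1
    if k < 3:
        raise ValueError("not a multiloop")
    # bases strictly inside (i, j) minus those spanned by the (disjoint) inner pairs
    covered = sum(q - p + 1 for (p, q) in inner)
    u = (j - i - 1) - covered
    return A + (k - 1) * P + u * Q

# closing pair (0, 20), branches (2, 8) and (10, 18): k = 3, unpaired bases 1, 9, 19 -> u = 3
print(multiloop_energy((0, 20), [(2, 8), (10, 18)], A=3.0, P=0.5, Q=0.2))
```

Because the energy is linear in k and u, each branch and each unpaired base can be charged separately, which is what lets the multiloop matrix G add P per branch and Q per unpaired base.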
→ Simultaneous Alignment and Folding
• Extend the definition of D(i_1, j_1; i_2, j_2):
  • if i_1 > j_1, then it is the cost for deleting b_{i_2} ... b_{j_2};
  • if i_2 > j_2, then it is the cost for deleting a_{i_1} ... a_{j_1}.
• F(i_1, j_1; i_2, j_2): minimum cost (sum of alignment cost and free energy) for a_{i_1} ... a_{j_1} and b_{i_2} ... b_{j_2}.
• C(i_1, j_1; i_2, j_2): minimum cost for a_{i_1+1} ... a_{j_1−1} and b_{i_2+1} ... b_{j_2−1} under the condition (i_1, j_1) ∈ P_a and (i_2, j_2) ∈ P_b
→ Simultaneous Alignment and Folding: "Closed"
→ Simultaneous Alignment and Folding: Multiloop
• G(i_1, j_1; i_2, j_2): matrix for multiloop alignment
• Recursion for G:
$$G(i_1,j_1;i_2,j_2) = \min \begin{cases} C(i_1,j_1;i_2,j_2) + 2P + \underbrace{D(i_1,i_1;i_2,i_2)}_{\text{match } i_1 \text{ and } i_2} + \underbrace{D(j_1,j_1;j_2,j_2)}_{\text{match } j_1 \text{ and } j_2} \\[1ex] \displaystyle\min_{\substack{i_1 < h_1 < j_1 \\ i_2 < h_2 < j_2}} \begin{cases} G(i_1,h_1;i_2,h_2) + (j_1 - h_1 + j_2 - h_2)\,Q + D(h_1+1,j_1;h_2+1,j_2) \\ G(i_1,h_1;i_2,h_2) + G(h_1+1,j_1;h_2+1,j_2) \\ (h_1 - i_1 + 1 + h_2 - i_2 + 1)\,Q + D(i_1,h_1;i_2,h_2) + G(h_1+1,j_1;h_2+1,j_2) \end{cases} \end{cases}$$
→ Simultaneous Alignment and Folding: "free"
• Recursion for F:
$$F(i_1,j_1;i_2,j_2) = \min \begin{cases} C(i_1,j_1;i_2,j_2) + D(i_1,i_1;i_2,i_2) + D(j_1,j_1;j_2,j_2) \\ \displaystyle\min_{\substack{i_1 < h_1 < j_1 \\ i_2 < h_2 < j_2}} F(i_1,h_1;i_2,h_2) + F(h_1+1,j_1;h_2+1,j_2) \\ D(i_1,j_1;i_2,j_2) \end{cases}$$
• with initial conditions C(i_1, i_1; i_2, i_2) = ∞ and G(i_1, i_1; i_2, j_2) = G(i_1, j_1; i_2, i_2) = ∞
→ Complexity
Space complexity O(n^4)
• constant number of matrices (C, D, F, and G)
• each of them has O(n^4) entries
Time complexity O(n^6)
• each entry of matrix D requires constant time
• each entry of F, C, and G requires O(n^2) time (minimize over all h_1, h_2)
• hence: n^4 · n^2 = n^6
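The bounds can be made concrete by counting, assuming both sequences have length n. Index tuples with i_1 ≤ j_1 and i_2 ≤ j_2 number (n(n+1)/2)^2 ≈ n^4/4, and the split points i_1 < h_1 < j_1, i_2 < h_2 < j_2 summed over all entries number C(n,3)^2 ≈ n^6/36, since for one sequence every triple i < h < j is counted once. The function names are illustrative.

```python
from math import comb

def dp_entries(n):
    """Number of index tuples (i1, j1, i2, j2) with i1 <= j1 and i2 <= j2 (both lengths n)."""
    return (n * (n + 1) // 2) ** 2

def split_points(n):
    """Total number of (h1, h2) with i1 < h1 < j1 and i2 < h2 < j2, summed over all entries.

    For one sequence the triples i < h < j number comb(n, 3); two sequences give its square.
    """
    return comb(n, 3) ** 2

def split_points_brute(n):
    # direct count over all (i, j) of one sequence, for checking the closed form on small n
    one = sum(max(0, j - i - 1) for i in range(n) for j in range(i, n))
    return one * one

print(dp_entries(100), split_points(100))  # 25502500 26146890000
```

For n = 100 this is about 2.6 · 10^7 entries per matrix and about 2.6 · 10^10 inner minimization steps for each of F and G, which is why the full Sankoff recursion is rarely run without further restrictions.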