Insertions Yielding Equivalent Double Occurrence Words
Daniel A. Cruz, Margherita Maria Ferrari, Nataša Jonoska, Lukas Nabergall, and Masahico Saito
University of South Florida
mmferrari@usf.edu
19 May 2019
Motivation: Analysis of DNA Scrambling in Ciliates
[Figure: gene segments M_1 to M_5 with junction labels 1, 12, 23, 34, 4 ⇒ w = 11223434]
Prescott, D. M. Genome Gymnastics: Unique Modes of DNA Evolution and Processing in Ciliates. Nature Reviews Genetics 1 (3) (2000) pp. 191-198.
Preliminaries
Given an alphabet Σ, e.g. ℕ,
◮ w = 15164443 is a word over Σ
◮ The length of w is 8, written |w| = 8
◮ The set of symbols used in w is Σ[w] = {1, 3, 4, 5, 6}
◮ w^R = 34446151 is the reverse of w
The set of all words over Σ is denoted Σ* and includes the empty word ε.
A word w is a double occurrence word (DOW) if each symbol in Σ appears 0 or 2 times in w. The set of all DOWs is Σ_DOW.
◮ 11, 1221, 11223434 ∈ Σ_DOW
◮ The size of a DOW w is |w|/2
Single occurrence words (SOWs) are defined similarly: each symbol in Σ appears 0 or 1 times.
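The definitions on this slide can be checked mechanically. The following is a minimal sketch, assuming words are stored as Python strings of single-character symbols; the helper names `is_dow` and `is_sow` are illustrative and not taken from the talk.

```python
from collections import Counter

def is_dow(w):
    """Double occurrence word: every symbol that appears, appears exactly twice."""
    return all(count == 2 for count in Counter(w).values())

def is_sow(w):
    """Single occurrence word: every symbol that appears, appears exactly once."""
    return all(count == 1 for count in Counter(w).values())

w = "15164443"
print(len(w))            # length |w|
print(sorted(set(w)))    # symbols used, Σ[w]
print(w[::-1])           # reverse w^R
print([is_dow(v) for v in ("11", "1221", "11223434", w)])
print(len("11223434") // 2)  # size of the DOW 11223434
```

Note that the empty word passes both checks, which matches the "0 or 2 times" and "0 or 1 times" wording.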
Definition: Equivalence
Words v, w ∈ Σ* are equivalent if there exists a bijection f : Σ → Σ such that f(v) = w; in this case, we write v ∼ w.
◮ Equivalent words: 123123 ∼ 321321 via 1 ↦ 3, 2 ↦ 2, 3 ↦ 1
◮ Non-equivalent words: 1234562345617887 and 1232314567887654
A word w = a_1 ⋯ a_n is in ascending order if:
◮ a_1 = 1
◮ when i appears for the first time, it is preceded by 1, 2, ..., i − 1
For example, 123123 is in ascending order while 131232 is not.
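One way to test equivalence in practice is to relabel each word by order of first appearance and compare the results. The sketch below assumes words are strings of digit symbols; `ascending`, `equivalent` and `is_ascending` are hypothetical helper names. It relies on the observation that two words are related by a symbol bijection exactly when their relabelled forms coincide.

```python
def ascending(w):
    """Relabel the symbols of w as 1, 2, 3, ... in order of first appearance."""
    labels = {}
    return [labels.setdefault(a, len(labels) + 1) for a in w]

def equivalent(v, w):
    """v ∼ w iff both words have the same ascending-order form."""
    return ascending(v) == ascending(w)

def is_ascending(w):
    """True if w (a string of digits) is already in ascending order."""
    return ascending(w) == [int(a) for a in w]

print(equivalent("123123", "321321"))
print(equivalent("1234562345617887", "1232314567887654"))
print(is_ascending("123123"), is_ascending("131232"))
```

Every equivalence class then has exactly one representative in ascending order, which is why the later definitions can assume w is given in that form.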
Definition: Repeat and Return Words
Given w ∈ Σ* and a SOW u ∈ Σ⁺ = Σ* \ {ε},
◮ the word uu is a repeat word in w if w = z_1 u z_2 u z_3 for some z_1, z_2, z_3 ∈ Σ*
◮ the word uu^R is a return word in w if w = z_1 u z_2 u^R z_3 for some z_1, z_2, z_3 ∈ Σ*
For w = 1123455234678876:
◮ Repeat words: 234234, 2323, 88, etc.
◮ Return words: 678876, 6776, 22, etc.
A repeat word uu or return word uu^R is trivial if |u| = 1.
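The two definitions translate directly into a search for a factor u followed later by a non-overlapping copy of u or u^R. This is a minimal sketch under the assumption that words are Python strings; the function names are illustrative only.

```python
def is_sow(u):
    """Single occurrence word: no symbol repeats."""
    return len(set(u)) == len(u)

def _occurs(w, u, v):
    """True if w = z1 u z2 v z3 for some (possibly empty) z1, z2, z3."""
    n = len(u)
    return any(
        w[i:i + n] == u and w.find(v, i + n) != -1
        for i in range(len(w) - 2 * n + 1)
    )

def is_repeat_word(w, u):
    """True if uu is a repeat word in w."""
    return bool(u) and is_sow(u) and _occurs(w, u, u)

def is_return_word(w, u):
    """True if u u^R is a return word in w."""
    return bool(u) and is_sow(u) and _occurs(w, u, u[::-1])

w = "1123455234678876"
print([is_repeat_word(w, u) for u in ("234", "23", "8")])
print([is_return_word(w, u) for u in ("678", "67", "2")])
```

For |u| = 1 the two notions coincide, which is why 22 can be listed as a return word and 88 as a repeat word on the slide.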
Repeat and Return Words in Ciliate DNA
[Figure: segments in the order M_6 M_7 M_8 M_9 M_11 M_1 M_3 M_10 M_2 M_4 M_5 M_12 M_13, with labels 56, 67, 78, 89, ab, 1, 23, 9a, 12, 34, 45, bc, c]
w_0 = 56677889ab1239a123445bcc
w_1 = 59ab1239a1235b
w_2 = 5b5b
w_3 = ε
Nested appearances of repeat and return words explain over 95% of the scrambled MIC genome of Oxytricha trifallax.
Burns, J. et al. Recurring patterns among scrambled genes in the encrypted genome of the ciliate Oxytricha trifallax. Journal of Theoretical Biology 410 (2016) pp. 171-180.
Definition: Repeat and Return Insertions
Given w = a_1 ⋯ a_n ∈ Σ_DOW in ascending order,
◮ let 1 ≤ k ≤ ℓ ≤ n + 1,
◮ let u be a SOW over Σ \ Σ[w] in ascending order, where |u| = ν.
Then I(ν, k, ℓ) is an insertion into w which acts as follows:
w ⋆ I(ν, k, ℓ) = a_1 ⋯ a_{k−1} u a_k ⋯ a_{ℓ−1} u′ a_ℓ ⋯ a_n
where u′ = u for a repeat insertion (I = ρ) and u′ = u^R for a return insertion (I = τ).
Examples:
◮ 1232314554 ⋆ ρ(3, 4, 6) = 1236782367814554
◮ 1232314554 ⋆ τ(3, 7, 11) = 1232316784554876
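The insertion operation can be sketched as list slicing. The code below assumes words are lists of positive integers and that u consists of the next ν unused integers, which matches the slide's examples (u = 678) but is otherwise an assumption; `word` and `insert` are hypothetical names.

```python
def word(s):
    """Parse a string of digits into a list of integer symbols."""
    return [int(c) for c in s]

def insert(w, kind, nu, k, l):
    """Return w ⋆ I(nu, k, l), with kind 'rho' (repeat) or 'tau' (return).

    Positions k and l are 1-indexed as on the slide.
    Assumption: u is the next nu integers not used in w.
    """
    n = len(w)
    if not (1 <= k <= l <= n + 1):
        raise ValueError("need 1 <= k <= l <= n + 1")
    if kind not in ("rho", "tau"):
        raise ValueError("kind must be 'rho' or 'tau'")
    start = max(w, default=0) + 1
    u = list(range(start, start + nu))
    u_prime = u if kind == "rho" else u[::-1]
    return w[:k - 1] + u + w[k - 1:l - 1] + u_prime + w[l - 1:]

w = word("1232314554")
print(insert(w, "rho", 3, 4, 6))
print(insert(w, "tau", 3, 7, 11))
```

The result is a DOW but need not be in ascending order, so it has to be relabelled before another insertion of this kind is applied to it in the sense of the definition.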
Insertions Yielding Equivalent DOWs
The following insertions yield equivalent DOWs:
◮ 1221 ⋆ τ(2, 3, 3) = 12344321
◮ 1221 ⋆ τ(2, 1, 5) = 34122143 ∼ 12344321 via 1 ↦ 3, 2 ↦ 4, 3 ↦ 1, 4 ↦ 2
If w_1 = w ⋆ I_1(ν_1, k_1, ℓ_1) ∼ w ⋆ I_2(ν_2, k_2, ℓ_2) = w_2, what can we say about w if I_1 and I_2 are "distinct" (i.e. (k_1, ℓ_1) ≠ (k_2, ℓ_2))?
If w_1 ∼ w_2, then ν_1 = ν_2 = ν, since equivalent words have the same length.
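The question on this slide can be explored by brute force for small words. The sketch below assumes integer-list words with u taken as the next unused integers; it checks the slide's example and then groups all return insertions with ν = 2 into 1221 by the equivalence class of the result. The helper names are illustrative.

```python
def ascending(w):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    labels = {}
    return [labels.setdefault(a, len(labels) + 1) for a in w]

def insert(w, kind, nu, k, l):
    """w ⋆ I(nu, k, l); assumption: u is the next nu unused integers."""
    start = max(w, default=0) + 1
    u = list(range(start, start + nu))
    u_prime = u if kind == "rho" else u[::-1]
    return w[:k - 1] + u + w[k - 1:l - 1] + u_prime + w[l - 1:]

w = [1, 2, 2, 1]
a = insert(w, "tau", 2, 3, 3)
b = insert(w, "tau", 2, 1, 5)
print(a, b, ascending(a) == ascending(b))

# Group every (k, l) by the ascending-order form of the resulting word.
classes = {}
n = len(w)
for k in range(1, n + 2):
    for l in range(k, n + 2):
        key = tuple(ascending(insert(w, "tau", 2, k, l)))
        classes.setdefault(key, []).append((k, l))
collisions = {key: pairs for key, pairs in classes.items() if len(pairs) > 1}
print(collisions)
```

Any entry of `collisions` is a set of distinct insertions yielding equivalent DOWs, the situation the rest of the talk classifies.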
Insertions Yielding Equivalent DOWs
Without loss of generality, we take k_1 ≤ k_2.
Suppose that k_1 = k_2:
[Figure: w_1 and w_2 aligned, with the first copy of u at the same position in both, followed by u′_1 ∈ {u, u^R} in w_1 and u′_2 ∈ {u, u^R} in w_2]
Thus, k_1 ≠ k_2; similarly ℓ_1 ≠ ℓ_2. We have three cases:
◮ Interleaving (k_1 < k_2 ≤ ℓ_1 < ℓ_2)
◮ Nested (k_1 < k_2 ≤ ℓ_2 < ℓ_1)
◮ Sequential (k_1 ≤ ℓ_1 < k_2 ≤ ℓ_2)
Interleaving Insertions (k_1 < k_2 ≤ ℓ_1 < ℓ_2)
We consider two repeat insertions to start.
[Figure: w_1 and w_2 aligned]
w_1: u z_1 z_2 u z_3
w_2: z_1 u z_2 z_3 u
Note that u z_1 ∼ z_1 u.
Interleaving Insertions (k_1 < k_2 ≤ ℓ_1 < ℓ_2)
[Figure: alignment of w_1 and w_2 under f, with blocks u, f(u), f^2(u), ..., f^h(u) on either side of z_2]
We adapt a result by Lyndon and Schützenberger:
Lemma. If xz = zy and x ≠ ε, then x = st, z = (st)^h s, and y = ts for some s, t ∈ Σ* and h ≥ 0.
Lyndon, R. C., and Schützenberger, M.-P. "The equation a^M = b^N c^P in a free group." The Michigan Mathematical Journal 9:4 (1962) pp. 289-298.
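The lemma is constructive: writing |z| = h·|x| + r with 0 ≤ r < |x|, one can take s to be the first r symbols of x and t the rest. The following sketch illustrates this for words given as Python strings; `decompose` is a hypothetical helper, and the internal assertion re-checks the three identities rather than assuming them.

```python
from itertools import product

def decompose(x, z, y):
    """Given xz = zy with x nonempty, return (s, t, h) with
    x = st, z = (st)^h s and y = ts."""
    if not x or x + z != z + y:
        raise ValueError("need x nonempty and xz = zy")
    h, r = divmod(len(z), len(x))
    s, t = x[:r], x[r:]
    assert x == s + t and z == (s + t) * h + s and y == t + s
    return s, t, h

print(decompose("ab", "aba", "ba"))  # ('a', 'b', 1)

# Exhaustive check over short binary words: whenever xz = zy, decompose succeeds.
for lx in range(1, 4):
    for lz in range(0, 5):
        for x in map("".join, product("ab", repeat=lx)):
            for z in map("".join, product("ab", repeat=lz)):
                if (x + z).startswith(z):
                    decompose(x, z, (x + z)[len(z):])
```

The exhaustive loop is only a sanity check on small cases, not a proof.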
Interleaving Insertions (k_1 < k_2 ≤ ℓ_1 < ℓ_2)
Proposition (Interleaving)
◮ For repeat insertions, z_1 z_3 is a repeat word.
◮ For return insertions, z_1 z_3 ∼ Int((k_2 − k_1)/ν, ν).
Int(h, q) = x_1 x_2 ⋯ x_h x_1^R x_2^R ⋯ x_h^R, where each x_i x_i^R is a return word and |x_i| = q for 1 ≤ i ≤ h.
Int(h, q) can be obtained recursively:
x_1 x_1^R
x_1 x_2 x_1^R x_2^R
⋮
x_1 x_2 ⋯ x_h x_1^R x_2^R ⋯ x_h^R
For example, Int(2, 2) = 12342143 where x_1 = 12 and x_2 = 34.
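Int(h, q) can be generated directly from its definition. The sketch below builds it in ascending order from consecutive integer blocks and then checks a single small instance of two interleaving return insertions, with z_2 = ε, z_1 z_3 = Int(1, 2) = 1221, ν = 2, k_1 = 1, ℓ_1 = k_2 = 3 and ℓ_2 = 5. The function names, and the choice of u as the next unused integers, are assumptions made for illustration.

```python
def interleaved(h, q):
    """Int(h, q): x_1 ... x_h x_1^R ... x_h^R with |x_i| = q."""
    blocks = [list(range(i * q + 1, (i + 1) * q + 1)) for i in range(h)]
    forward = [a for x in blocks for a in x]
    backward = [a for x in blocks for a in reversed(x)]
    return forward + backward

def ascending(w):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    labels = {}
    return [labels.setdefault(a, len(labels) + 1) for a in w]

def return_insert(w, nu, k, l):
    """w ⋆ τ(nu, k, l); assumption: u is the next nu unused integers."""
    start = max(w, default=0) + 1
    u = list(range(start, start + nu))
    return w[:k - 1] + u + w[k - 1:l - 1] + u[::-1] + w[l - 1:]

print(interleaved(2, 2))  # [1, 2, 3, 4, 2, 1, 4, 3], i.e. 12342143

w = interleaved(1, 2)             # 1221
w1 = return_insert(w, 2, 1, 3)    # first insertion
w2 = return_insert(w, 2, 3, 5)    # second insertion
print(ascending(w1) == ascending(w2))
```

This is one instance consistent with the proposition, not a verification of it in general.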
Nested Insertions (k_1 < k_2 ≤ ℓ_2 < ℓ_1)
Proposition (Nested)
◮ For repeat insertions, z_1 z_3 ∼ Nes((k_2 − k_1)/ν, ν).
◮ For return insertions, z_1 z_3 is a return word.
Nes(h, q) = x_1 x_2 ⋯ x_{h−1} x_h x_h x_{h−1} ⋯ x_2 x_1, where each x_i x_i is a repeat word and |x_i| = q for 1 ≤ i ≤ h.
Nes(h, q) can be obtained recursively:
x_1 x_1
x_1 x_2 x_2 x_1
⋮
x_1 x_2 ⋯ x_{h−1} x_h x_h x_{h−1} ⋯ x_2 x_1
For example, Nes(2, 2) = 12343412 where x_1 = 12 and x_2 = 34.
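Nes(h, q) can be built the same way as a list of integer blocks followed by the same blocks in reverse block order. The sketch below also checks one small instance of two nested repeat insertions, with z_2 = ε, z_1 z_3 = Nes(1, 2) = 1212, ν = 2, k_1 = 1, ℓ_1 = 5 and k_2 = ℓ_2 = 3. Names and the choice of u as the next unused integers are assumptions for illustration.

```python
def nested(h, q):
    """Nes(h, q): x_1 ... x_h x_h ... x_1 with |x_i| = q."""
    blocks = [list(range(i * q + 1, (i + 1) * q + 1)) for i in range(h)]
    forward = [a for x in blocks for a in x]
    backward = [a for x in reversed(blocks) for a in x]
    return forward + backward

def ascending(w):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    labels = {}
    return [labels.setdefault(a, len(labels) + 1) for a in w]

def repeat_insert(w, nu, k, l):
    """w ⋆ ρ(nu, k, l); assumption: u is the next nu unused integers."""
    start = max(w, default=0) + 1
    u = list(range(start, start + nu))
    return w[:k - 1] + u + w[k - 1:l - 1] + u + w[l - 1:]

print(nested(2, 2))  # [1, 2, 3, 4, 3, 4, 1, 2], i.e. 12343412

w = nested(1, 2)                 # 1212
w1 = repeat_insert(w, 2, 1, 5)   # outer insertion
w2 = repeat_insert(w, 2, 3, 3)   # inner insertion
print(ascending(w1) == ascending(w2))
```

As with the interleaving case, this only illustrates one instance that is consistent with the proposition.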
Sequential Insertions (k_1 ≤ ℓ_1 < k_2 ≤ ℓ_2)
Consider the following words:
v_0 = 123123, with |v_0| = 2 · 3 = 6
v_1 = 1234512345 = v_0 ⋆ ρ(2, |v_0| − 2, |v_0| + 1)
v_2 = 12345126734567 = v_1 ⋆ ρ(2, |v_1| − 2, |v_1| + 1)
v_3 = 123451267348956789 = v_2 ⋆ ρ(2, |v_2| − 2, |v_2| + 1)
The word v_j is a ρ-tangled cord at level j, denoted T_ρ(ν, m, j), here with m = 3 and ν = 2.
The τ-tangled cord at level j, T_τ(ν, m, j), is defined similarly.
Tangled cords, T_ρ(1, 1, i), were introduced in:
Burns, J. et al. Four-regular graphs with rigid vertices associated to DNA recombination. Discrete Applied Mathematics 161:10-11 (2013) pp. 1378-1394.
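The construction of v_0, ..., v_3 can be reproduced programmatically. The sketch below assumes that the general rule is v_j = v_{j−1} ⋆ ρ(ν, |v_{j−1}| − m + 1, |v_{j−1}| + 1); this agrees with the slide for m = 3 (where |v| − m + 1 = |v| − 2) but is a guess for other values of m. The function names are hypothetical.

```python
def repeat_insert(w, nu, k, l):
    """w ⋆ ρ(nu, k, l); assumption: u is the next nu unused integers."""
    start = max(w, default=0) + 1
    u = list(range(start, start + nu))
    return w[:k - 1] + u + w[k - 1:l - 1] + u + w[l - 1:]

def tangled_cord_rho(nu, m, j):
    """Level-j word built from the repeat word 12...m12...m.

    Assumption: each level inserts at k = |v| - m + 1 and l = |v| + 1.
    """
    v = list(range(1, m + 1)) * 2
    for _ in range(j):
        n = len(v)
        v = repeat_insert(v, nu, n - m + 1, n + 1)
    return v

for j in range(4):
    print("".join(map(str, tangled_cord_rho(2, 3, j))))
```

With ν = 2 and m = 3 this prints the four words listed on the slide.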
Sequential Insertions (k_1 ≤ ℓ_1 < k_2 ≤ ℓ_2)
Proposition (Sequential)
◮ For repeat insertions, z_1 z_2 z_3 ∼ T_ρ(ν, ℓ_1 − k_1, (k_2 − ℓ_1)/(2ν)).
◮ For return insertions, z_1 z_2 z_3 ∼ T_τ(ν, ℓ_1 − k_1, (k_2 − ℓ_1)/(2ν)).
Proposition
Every T_I(ν, m, j) is a palindrome, where I ∈ {ρ, τ}.
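Both statements can be spot-checked on the words v_0, ..., v_3 with m = 3 and ν = 2. The sketch below reads "palindrome" as "equivalent to its own reverse", which is an interpretation since the slides do not define the term. It also checks that the sequential repeat insertions ρ(2, 1, 4) and ρ(2, 8, 11) into v_1 = 1234512345 (so ℓ_1 − k_1 = 3 and k_2 − ℓ_1 = 4) yield equivalent words. Helper names and the choice of u as the next unused integers are assumptions.

```python
def ascending(w):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    labels = {}
    return [labels.setdefault(a, len(labels) + 1) for a in w]

def repeat_insert(w, nu, k, l):
    """w ⋆ ρ(nu, k, l); assumption: u is the next nu unused integers."""
    start = max(w, default=0) + 1
    u = list(range(start, start + nu))
    return w[:k - 1] + u + w[k - 1:l - 1] + u + w[l - 1:]

def is_palindrome(w):
    """Interpretation: w is equivalent to its reverse, w ∼ w^R."""
    return ascending(w) == ascending(w[::-1])

cords = ["123123", "1234512345", "12345126734567", "123451267348956789"]
for v in cords:
    print(v, is_palindrome([int(c) for c in v]))

v1 = [int(c) for c in "1234512345"]
a = repeat_insert(v1, 2, 1, 4)
b = repeat_insert(v1, 2, 8, 11)
print(ascending(a) == ascending(b))
```

Here the second insertion reproduces v_2 directly, and the first reproduces it up to relabelling.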