Output of ABA: [Figure: graph rendering of the ABA output; node and edge labels omitted.] Applied to only 2 species. Rendering takes a long time. Hard to interpret (manually). Whole Genome Comparison: Project Presentations F. Heeger, M. Homilius, I. Kel, S. Krakau, S. Specovius, J. Wiedenhoeft
Parsing output of ABA: Applied to 4 species. Reconstructed the A-Bruijn graph from the ABA output.
Distribution of edge multiplicity: High-weight edges point to conserved and repeated elements, within and across proteins. (The girth parameter did not seem to work.)
Distribution of edge multiplicity (filtered): Filtered distribution of the multiplicity of edges (length > 40).
Comparison with PFAM-Annotation: Hidden Markov models learned from multiple sequence alignments.
Comparison with PFAM-Annotation: Annotated all proteins with PFAM/HMMER. Detected 561 domains (not unique).
Distribution of edges with domains: ≈ 210 edges of multiplicity 1. ≈ 150 edges of multiplicity 16.
Repeated domains: Domains seem to share edges in the ABA graph.
Repeated domains:
Domain             Average multiplicity
DUF1679            21.0
Elf1               21.0
DUF1825            21.0
Fib alpha          21.0
ZZ                 17.7
Otopetrin          17.0
CDK5 activator     17.0
...                ...
RFX DNA binding    1.0
zfC5HC2            1.0
DUF1542            1.0
Rep N              1.0
DUF3619            1.0
TIP49              1.0
HTH Mga            1.0
What's next? Do ABA edges correlate with the domains found? Apply a real null model. Significance tests. Can ABA be used to complement the domains found with HMMER?
Non-Collinear Alignment: Reannotation of genomes. Carsonella ruddii: an interesting case. An unclassified γ-proteobacterium (like, e.g., E. coli). Sequenced in 2006.
Carsonella ruddii, what is it? Smallest bacterial genome known: 160 kb (!). E. coli has about 4.5 Mb. Smallest genome known before Carsonella: Buchnera aphidicola BCc, with 362 protein-coding genes.
Carsonella ruddii, what is it? GC content: very low (16%); E. coli: 50%. GC content (guanine-cytosine content), in molecular biology, is the percentage of bases on a DNA molecule that are either guanine or cytosine.
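The definition of GC content above can be sketched as a short function. This is only an illustration: the function name `gc_content` is hypothetical, and the choice to ignore ambiguous bases such as N is an assumption, not something stated on the slide.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a DNA string."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    # Assumption: only unambiguous bases count towards the total (N etc. are ignored).
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy sequence, not real genome data.
    print(gc_content("ATGCATAT"))
```

Applied to a whole genome sequence read from a FASTA file, the same function would give values comparable to the percentages quoted on the slide.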
Carsonella ruddii, what is it? First annotation: 213 genes; E. coli: 4400 genes. Minimal set of genes for life: Moya A. et al. proposed in 2003 that the minimal gene set for endosymbiotic life is close to 313.
Interesting question: The DNA replication and repair system is strongly degraded. The transcription machinery is reduced to the core subunits of RNA polymerase (no promoter recognition). The translation machinery is highly reduced (the three essential rRNAs are present). No Shine-Dalgarno sequence is present (the way it is defined). 16S rRNA and Shine-Dalgarno sequence: Shine-Dalgarno (SD) is a regulatory sequence strongly involved in the translation of bacterial polycistronic mRNAs.
Interesting question: Is Carsonella ruddii a living cell? 9 aminoacyl-tRNA synthetases and 15 out of 50 essential ribosomal proteins are missing or degraded. Two different theories: (1) C. ruddii is a bacterium undergoing the transition to an endosymbiont. (2) C. ruddii is a former primary endosymbiont that is being driven towards extinction and replacement by a new symbiont.
Current Annotation: What has been done until now: 2006: first annotation (213 genes); 2007: second annotation. Both teams used well-known gene-prediction algorithms plus collinear alignment. Problem: over-annotation of gene function. Many genes that are believed to be orthologous are much shorter and therefore differ in their function. My goal: use a non-collinear alignment algorithm to reannotate the whole genome of C. ruddii.
Reannotation, Algorithms: SuperMap + S-LAGAN; A-Bruijn Alignment (ABA).
S-LAGAN, Species used:
Carsonella ruddii PV: 160 kb (213 genes)
Buchnera aphidicola BCc (Cc) (+ a plasmid): 450 kb (397 genes)
Candidatus Blochmannia floridanus: 705 kb (631 genes)
Wigglesworthia glossinidia (+ a plasmid): 698 kb (651 genes)
Baumannia cicadellinicola str. Hc: 686 kb (651 genes)
S-LAGAN plus SuperMap: A guide tree (evolutionary tree) was built from the 16S rRNAs of the species: a neighbor-joining tree and a maximum likelihood tree.
Trees of 16S-rRNA sequence: [Figure: two trees built from the 16S rRNA alignment, a PHYLIP neighbor-joining tree (outtree_phylip_nj) and a PhyML tree (all_16S_rRNA_alignment_phylip_format.faa_phyml_tree.txt); scale bars 0.02 and 0.05; leaves: Carsonella, Baumannia, Wigglesworthia, Buchnera, Blochmannia.]
ABA using “my” 5 species: Species 0 and 5: Wigglesworthia; 1 and 6: Buchnera aphidicola; 2 and 7: Carsonella ruddii; 3 and 8: Blochmannia; 4 and 9: Baumannia.
ABA using “Moya’s” species: Species 0 and 6: Buchnera aphidicola str. Cc; 1 and 7: Buchnera aphidicola str. Bp; 2 and 8: Buchnera aphidicola str. Sg; 3 and 9: Buchnera aphidicola str. APS; 4 and 10: Carsonella ruddii; 5 and 11: E. coli.
ABA using only Carsonella and E. coli: 2 species (Carsonella and E. coli) produce the same alignment as the 6 species from the Moya paper.
ABA gene prediction: 7 of 213 genes were cut by the prediction in C. ruddii. 22 of 4494 genes were cut by the prediction in E. coli.
Example:
region 0 - 46219: 56 genes
region 46219 - 47795: 0 genes
region 47795 - 53155: 10 genes
region 53155 - 53218: 0 genes
region 53218 - 54412: 4 genes
region 54412 - 56011: 0 genes
region 56011 - 58258: 4 genes
region 58258 - 59412: 0 genes
region 59412 - 65459: 8 genes
region 65459 - 67041: 1 gene
region 67041 - 70177: 4 genes
A possible future: There are still at least 29 genes with no assigned function. Insights into the possibility of creating symbiotic life.
Project: Reimplementation of S-LAGAN Using SeqAn. F. Heeger, S. Specovius. 1 Introduction to S-LAGAN; 2 Implementation and Problems; 3 Results.
S-LAGAN (Shuffle-Limited Area Global Alignment of Nucleotides): S-LAGAN computes glocal alignments of 2 sequences, i.e. a set of local alignments which cover the whole sequence. S-LAGAN is able to handle rearrangements.
S-LAGAN, Rearrangements: No rearrangements; translocation; inversion; duplication.
S-LAGAN, Overview: 1 Computation of local alignments; 2 Chaining; 3 Realignment of consistent subchains.
S-LAGAN, 1. Computation of local alignments: S-LAGAN uses CHAOS for this step. CHAOS is applied twice: sequence 1 with sequence 2, and sequence 1 with the reverse complement of sequence 2.
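The two alignment runs can be prepared as in the following sketch. It only builds the two input pairs; the local aligner itself (CHAOS in S-LAGAN) is not reproduced here, and the helper names `reverse_complement` and `alignment_inputs` are hypothetical.

```python
# Translation table for the four DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def alignment_inputs(seq1: str, seq2: str):
    """Return the two (seq1, other, strand) pairs handed to the local aligner."""
    return [
        (seq1, seq2, "+"),                      # forward strand of sequence 2
        (seq1, reverse_complement(seq2), "-"),  # reverse complement of sequence 2
    ]


if __name__ == "__main__":
    for pair in alignment_inputs("ATGCC", "GGCAT"):
        print(pair)
```

Matches found in the second run correspond to inverted regions, which is what allows inversions to be detected later.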
S-LAGAN, 2. Chaining: 1-monotonic.
S-LAGAN, 3. Realignment of consistent subchains: Consistent (collinear) subchains are globally aligned. S-LAGAN uses LAGAN for this step.
Implementation and Problems, Goal: Implementation in SeqAn. Extract CHAOS from the SeqAn implementation of LAGAN. Implement 1-monotonic chaining. Use the existing SeqAn implementation of LAGAN.
Implementation and Problems, Local Alignments: Find seeds with a q-gram index. Merge overlapping seeds. Chain seeds with the CHAOS algorithm. Problems: segmentation fault on certain data; only gap-free local matches.
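The first two steps (seed finding with a q-gram index, merging of overlapping seeds) can be sketched in plain Python. This is not the SeqAn code of the project: a dictionary stands in for the q-gram index, seeds are hypothetical `(start1, start2, length)` tuples, and only seeds on the same diagonal are merged.

```python
from collections import defaultdict


def qgram_seeds(seq1: str, seq2: str, q: int):
    """Find all exact q-gram matches as (start1, start2, length) tuples."""
    index = defaultdict(list)  # q-gram -> start positions in seq1
    for i in range(len(seq1) - q + 1):
        index[seq1[i:i + q]].append(i)
    seeds = []
    for j in range(len(seq2) - q + 1):
        for i in index.get(seq2[j:j + q], ()):
            seeds.append((i, j, q))
    return seeds


def merge_overlapping(seeds):
    """Merge seeds that overlap or touch on the same diagonal (start1 - start2)."""
    by_diagonal = defaultdict(list)
    for i, j, length in seeds:
        by_diagonal[i - j].append((i, j, length))
    merged = []
    for diagonal in sorted(by_diagonal):
        current = None
        for i, j, length in sorted(by_diagonal[diagonal]):
            if current is not None and i <= current[0] + current[2]:
                end = max(current[0] + current[2], i + length)
                current = (current[0], current[1], end - current[0])
            else:
                if current is not None:
                    merged.append(current)
                current = (i, j, length)
        merged.append(current)
    return merged


if __name__ == "__main__":
    print(merge_overlapping(qgram_seeds("TTACGTAGG", "CCACGTACC", 3)))
```

Because the merged seeds are exact matches on one diagonal, the result contains only gap-free local matches, which is the limitation noted on the slide.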
Implementation and Problems, Chaining: Graph with nodes representing local matches. Edges to all matches which can be chained 1-monotonically. Heaviest path (Bellman-Ford algorithm): O(n^3).
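A minimal sketch of heaviest-path chaining follows. It makes several assumptions that are not on the slide: matches are hypothetical `(start1, end1, start2, end2, weight)` tuples of positive length, a match may follow another if it starts in sequence 1 at or after the end of the previous one (positions in sequence 2 are unrestricted), and no gap or rearrangement penalties are applied. Instead of Bellman-Ford it uses a simple dynamic program over the matches sorted by their start in sequence 1, which works here because the chaining graph is acyclic and takes O(n^2) time.

```python
def heaviest_1monotonic_chain(matches):
    """Return the heaviest chain of matches that is monotonic in sequence 1 only."""
    order = sorted(range(len(matches)), key=lambda k: matches[k][0])
    best = {}  # best[k]: weight of the heaviest chain ending in match k
    prev = {}  # prev[k]: predecessor of match k in that chain
    for pos, k in enumerate(order):
        best[k] = matches[k][4]
        prev[k] = None
        for p in order[:pos]:
            # p can precede k if it ends in sequence 1 before k starts.
            if matches[p][1] <= matches[k][0] and best[p] + matches[k][4] > best[k]:
                best[k] = best[p] + matches[k][4]
                prev[k] = p
    if not best:
        return []
    k = max(best, key=best.get)
    chain = []
    while k is not None:
        chain.append(matches[k])
        k = prev[k]
    return chain[::-1]


if __name__ == "__main__":
    toy = [(0, 10, 50, 60, 5), (12, 20, 5, 15, 4), (8, 18, 61, 70, 6), (21, 30, 16, 25, 3)]
    print(heaviest_1monotonic_chain(toy))
```

In the toy input the second match of the chain lies before the first one in sequence 2, which a fully monotonic chain would not allow; this is how translocations stay in the chain.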
Implementation and Problems, Realign Consistent Subchains: Find consistent subchains. Align them with a global alignment algorithm. LAGAN runs into an endless loop on certain data, so the Needleman-Wunsch algorithm is used instead.
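For reference, a compact Needleman-Wunsch sketch with a linear gap penalty is given below. The scoring values (match 1, mismatch -1, gap -1) are placeholders chosen for the example, not the parameters used in the project, and the code is plain Python rather than SeqAn.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The full dynamic programming matrix needs time and memory proportional to the product of the two sequence lengths, which fits the observation that the implementation is only usable on small data.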
Results: Our implementation is very slow, can be used on small data such as virus genomes (∼5000 bp), and finds manually inserted rearrangements.
Introduction OSL, Motivation: Assume there are two assemblies obtained from different assemblers.
Introduction OSL, Whole Genome Shotgun Approach (WGS): Aim: assemble a genome sequence from given reads. Reads: a collection of short sequences obtained from an automated sequencer; their orientation is not known.
Introduction OSL, Whole Genome Shotgun Approach (WGS): Assemble overlapping reads to obtain contigs. Contigs: large, contiguous fragments of assembled reads.
Introduction OSL, Assembly Layout Problem: The order and orientation of the contigs are unknown, so we search for a good assembly layout!
OSL: Optimal Syntenic Layout of unfinished assemblies.
OSL, Idea: Maximize the number of extended local diagonals: permute and flip the contigs of assembly A, then switch the roles of A and B. Independence in constructing the layouts of A and B!
The OSL Problem, Basics: Assemblies A = (a_1, ..., a_p) and B = (b_1, ..., b_q). Set of matches M = (m_1, ..., m_r).
The OSL Problem, Layout: Local diagonal extension: c and c′ form a local diagonal extension iff y ∼ y′ and o = o′. Weight of the extension: w + w′ − |y − y′|.
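The extension weight can be illustrated with a small sketch. The slides do not define c, y, o, w or the relation ∼ in detail, so everything here is an assumption made for illustration: c is modelled as a hypothetical `Connector` record with a coordinate `y`, an orientation `o` and a weight `w`, and y ∼ y′ is read as "the two coordinates are at most `max_distance` apart".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Connector:
    y: int  # coordinate compared by the relation y ~ y' (assumed meaning)
    o: int  # orientation, e.g. +1 or -1
    w: int  # weight


def extension_weight(c: Connector, c2: Connector, max_distance: int) -> Optional[int]:
    """Return w + w' - |y - y'| if c and c2 form an extension, else None."""
    # Assumption: y ~ y' means the coordinates differ by at most max_distance.
    if c.o != c2.o or abs(c.y - c2.y) > max_distance:
        return None
    return c.w + c2.w - abs(c.y - c2.y)


if __name__ == "__main__":
    print(extension_weight(Connector(100, 1, 40), Connector(105, 1, 30), max_distance=10))
```

The subtraction of |y − y′| means that two connectors contribute less the further apart they lie.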
Project: Assembly Comparison, Goal: 1 Assemble a set of reads with two different assemblers. 2 Compare the results using layout software (OSLay).
Project: Assembly Comparison: 1 Assemble a set of reads with two different assemblers. Reads of chromosome 21. Assemblers: Mira and Celera (WGS).
Project: Assembly Comparison, Problems: The WGS assembler does not work with the given reads. Plan B: take the given sequence of chr. 21 and create artificial contigs.
Project: Assembly Comparison: Create artificial contigs.
Project: Assembly Comparison, BLAST: The assemblies are from the same sequence, so Megablast is used.
Project: Assembly Comparison, OSLay: OSLay is the implementation of the OSL algorithm. Input: target assembly, reference assembly, matches (e.g. BLAST). Output: original layout, new layout.
Project: Assembly Comparison, OSLay: Problem: input too large for OSLay (chr. 21 ∼ 34 MB). Plan B: segment of 210 KB.
Project: Assembly Comparison, OSLay: Assembly A: sequence divided by 100. Assembly B: sequence divided by 19.
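One way to produce such artificial assemblies is sketched below, assuming that "divided by 100" means cutting the sequence into 100 contiguous pieces of nearly equal length. The slides do not say how the cut points were chosen or whether contigs were shuffled or flipped afterwards, so the function name `split_into_contigs` and the equal-length cutting are assumptions.

```python
def split_into_contigs(sequence: str, n_contigs: int):
    """Cut a sequence into n_contigs contiguous pieces of nearly equal length."""
    if n_contigs < 1 or n_contigs > len(sequence):
        raise ValueError("n_contigs must be between 1 and the sequence length")
    base, extra = divmod(len(sequence), n_contigs)
    contigs = []
    start = 0
    for k in range(n_contigs):
        # The first `extra` contigs get one additional base.
        end = start + base + (1 if k < extra else 0)
        contigs.append(sequence[start:end])
        start = end
    return contigs


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # toy sequence, not chromosome 21
    print(split_into_contigs(toy, 3))
```

Using two different piece counts gives two assemblies of the same sequence whose contig boundaries rarely coincide, so most contigs of one assembly overlap several contigs of the other.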
Project: Assembly Comparison, OSLay: False connections: