Results of the Burrows-Wheeler Algorithm Zichuan Wang (Jack) - PowerPoint PPT Presentation


  1. Results of the Burrows-Wheeler Algorithm Zichuan Wang (Jack) Nankai University, China

  2. Content
     ➢ Hamlet, Shakespeare
     ➢ The Mathematician, John von Neumann
     ➢ The Other Tiger, Jorge Luis Borges
     ➢ Luceafărul (Evening Star), Mihai Eminescu
     ➢ The Raven, Edgar Allan Poe
     ➢ DNA Sequences-TP53
     ➢ DNA Sequences-Promoter
     ➢ Summary

  3. Hamlet, Shakespeare
     Act 1, Scene 1
     FRANCISCO at his post. Enter to him BERNARDO
     BERNARDO Who's there?
     FRANCISCO Nay, answer me: stand, and unfold yourself.
     BERNARDO Long live the king!
     FRANCISCO Bernardo?
     BERNARDO He.
     FRANCISCO You come most carefully upon your hour.
     BERNARDO 'Tis now struck twelve; get thee to bed, Francisco.
     FRANCISCO For this relief much thanks: 'tis bitter cold, And I am sick at heart.
     BERNARDO Have you had quiet guard?
     FRANCISCO Not a mouse stirring.
     BERNARDO Well, good night. If you do meet Horatio and Marcellus, The rivals of my watch, bid them make haste.
     FRANCISCO I think I hear them. Stand, ho! Who's there?
     Enter HORATIO and MARCELLUS
     ...
     Text: http://shakespeare.mit.edu/hamlet/index.html
     Image: https://viewpoints.iu.edu/art-at-iu/feed/atom/

  4. Preprocessing
     ➢ Letters only
     ➢ Lower case only
     ➢ First 1001 letters (with $)
     bernardowhostherefrancisconayanswermestandandunfoldyourselfbernardolonglivetheking
     franciscobernardobernardohefranciscoyoucomemostcarefullyuponyourhourbernardotisno
     wstrucktwelvegettheetobedfranciscofranciscoforthisreliefmuchthankstisbittercoldandiamsic
     katheartbernardohaveyouhadquietguardfrancisconotamousestirringbernardowellgoodnigh
     tifyoudomeethoratioandmarcellustherivalsofmywatchbidthemmakehastefranciscoithinkihea
     rthemstandhowhosthereenterhoratioandmarcellushoratiofriendstothisgroundmarcellusandl
     iegementothedanefranciscogiveyougoodnightmarcellusofarewellhonestsoldierwhohathrelie
     vedyoufranciscobernardohasmyplacegiveyougoodnightexitmarcellushollabernardobernard
     osaywhatishoratiotherehoratioapieceofhimbernardowelcomehoratiowelcomegoodmarcellu
     smarcelluswhathasthisthingappeardagaintonightbernardoihaveseennothingmarcellushorati
     osaystisbutourfantasyandwillnotletbelieftakeholdofhimtouchingthisdreadedsighttwiceseeno
     fusthereforeihaveentreatedhimalongwithustowatchtheminutesofthisnightthatifagainth$
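
The three preprocessing steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the presenter's code: the function name `preprocess` and its parameters are hypothetical, and it assumes that "1001 letters (with $)" means 1000 letters followed by the end marker `$`, which matches the letter frequencies on the next slide summing to 1000.

```python
def preprocess(text: str, n_letters: int = 1000, sentinel: str = "$") -> str:
    """Keep ASCII letters only, lower-case them, truncate, and append the end marker.

    Assumption: the slide's "1001 letters (with $)" is 1000 letters plus "$".
    """
    # Drop spaces, punctuation, digits, and non-ASCII characters.
    letters = [ch.lower() for ch in text if ch.isascii() and ch.isalpha()]
    # Keep the first n_letters letters and terminate with the sentinel.
    return "".join(letters[:n_letters]) + sentinel


if __name__ == "__main__":
    print(preprocess("BERNARDO Who's there?", n_letters=16))
```

How speaker names and stage directions were selected before this step is not stated on the slides, so the sketch simply processes whatever text it is given.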

  5. Frequency of Letters
     Frequency of Letters-Original Text
     Rank  Letter  Freq      Rank  Letter  Freq
     1     e       103       14    m       27
     2     o       91        15    f       26
     3     a       88        16    g       25
     4     r       78        17    b       17
     5     t       78        18    w       17
     6     i       70        19    y       15
     7     h       65        20    v       9
     8     n       64        21    k       7
     9     s       55        22    p       5
     10    l       44        23    q       1
     11    d       42        24    x       1
     12    c       40        25    j       0
     13    u       32        26    z       0
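
A ranking like the one in this table can be produced with a short Python sketch. The function name `letter_ranks` is hypothetical, and breaking ties alphabetically is an assumption; it agrees with the tied rows on the slide (r before t, b before w) but the slides do not state the rule.

```python
from collections import Counter
import string


def letter_ranks(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, letter, frequency) rows for all 26 lower-case letters."""
    # Count only a-z, so the "$" end marker is ignored.
    counts = Counter(ch for ch in text if ch in string.ascii_lowercase)
    # Sort by descending frequency; ties are broken alphabetically (assumption).
    ordered = sorted(string.ascii_lowercase, key=lambda c: (-counts[c], c))
    return [(rank, c, counts[c]) for rank, c in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for row in letter_ranks("banana$")[:4]:
        print(row)
```

Letters that never occur are still listed with frequency 0, as j and z are in the table.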

  6. Transformation
     139 letters repeat 529 times in total
     Max number of repeats: 14 (a and e)
     red: some continuous repeated sequences
     yellow: sequences with the max number of repeats
     hllehfdggmtmvtirrrrrrrrrttdsoodydhyfogmmmmmmmmeunnnnnnnnnnnncfeehhhtwwe
     hkhhrrrrrrrhhhhnssotoaototformg$hsstarrrrrrrreituutnnnnnnnnniussssssrllusssrlneaere
     nnlnnnonooorrlrrrrrurrrrraseninnelrphhrihbtdvssrvmhirntrhirivcmkkrmrhwwsrbrwwcccc
     ccccwghhmhieeemecthhhhthwbbbbbbbbbbbbivctmsnliveegirtvvvirolooeoneoguodee
     edeoeoeoiaanneeniiiiiieonneulusntntutoetwotwiioctittotttttttttdffttcttttttweslrtessseww
     rdtggccggggtdwsbplllrdulttnnnnnseokwhhhhrkhhhaamttttttttttccccccccchhththhowxbr
     lggcaanenclpeeooootelleedegoeeieeeeeeeeuloaallllllllelmidtdgdtdsiooooereeeaaeifsfr
     rrrrrrrrrrroaaaaaaaaaaaaauaaeaaaouiiioiioodddsiaeeonlsaaeiieeioiiitccddoooosedsccis
     ncdhdddccshfhddcccchlltcpgggghhhhhhhffidmhhntintdntyyyyyyrhtyymtiddhdncpayua
     dfffffffffooooooouaaaaaaaaeaaaaaaaaaaaaaadteeeaeoshaueufreeeeeeeeeeeeeegiuaa
     oteuooiiiiiiiiiiiieeruiuuuimduaiiuletiemoaououiaekyduwenuafossnerhsaaastnuhenhatao
     tehdrsssossoigofrseaiahaaaaaaasssaooihennsmusnssihehktgomrooooooqfodyoooololl
     lllflhlnbiealaiiiayooooetssyrootdgoesaofdeeendmalma
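
For readers who want to reproduce a string like the one above, here is a minimal sketch of the Burrows-Wheeler transform using sorted rotations. It is the textbook construction, not necessarily the implementation behind these slides; the name `bwt` is hypothetical, and it assumes `$` sorts before every letter, which holds for Python's default string ordering.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (simple, not memory-efficient)."""
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single '$' end marker")
    n = len(text)
    # Build every cyclic rotation and sort them lexicographically.
    # "$" has a lower code point than a-z, so it sorts first.
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


if __name__ == "__main__":
    print(bwt("banana$"))  # annb$aa
```

Building all rotations costs memory quadratic in the text length, which is acceptable for the 1001 to 6001 symbol inputs discussed here; longer inputs would normally use a suffix array instead.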

  7. Distribution of Splits
     Only splits with more than one letter in a row are considered.

  8. Distribution of Splits
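
The distribution shown on these two slides can be tabulated with a sketch like the following. It assumes a "split" means a run of identical consecutive letters in the transformed text, which the slides do not define explicitly; the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def split_distribution(text: str) -> Counter:
    """Map run length -> number of runs, keeping only runs longer than one letter."""
    run_lengths = (len(list(group)) for _, group in groupby(text))
    return Counter(n for n in run_lengths if n > 1)


def split_totals(dist: Counter) -> tuple[int, int]:
    """Return (number of repeated runs, total letters covered by those runs)."""
    runs = sum(dist.values())
    letters = sum(length * count for length, count in dist.items())
    return runs, letters


if __name__ == "__main__":
    dist = split_distribution("aaabccddde")
    print(dict(dist), split_totals(dist))
```

Under that reading, `split_totals` applied to the transformed Hamlet text would correspond to the "139 letters repeat 529 times in total" figure on the Transformation slide.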

  9. Compressed Text
     # letters: 611
     Compression factor: 61.04%
     String:
     hlehfdgmtmvtirtdsodydhyfogmeuncfehtwehkhrhns
     otoaototformg$hstareitutniusrlusrlneaerenlnonorlr
     uraseninelrphrihbtdvsrvmhirntrhirivcmkrmrhwsrbr
     wcwghmhiemecththwbivctmsnlivegirtviroloeoneog
     uodedeoeoeoianenieoneulusntntutoetwotwioctito
     tdftctweslrtesewrdtgcgtdwsbplrdultnseokwhrkham
     tchththowxbrlgcanenclpeoteledegoeieuloalelmidtd
     gdtdsioereaeifsfroauaeaouioiodsiaeonlsaeieioitcdo
     sedscisncdhdcshfhdchltcpghfidmhntintdntyrhtymti
     dhdncpayuadfouaeadteaeoshaueufregiuaoteuoier
     uiuimduaiuletiemoaououiaekyduwenuafosnerhsast
     nuhenhataotehdrsosoigofrseaiahasaoihensmusnsi
     hehktgomroqfodyololflhlnbiealaiayoetsyrotdgoesa
     ofdendmalma
     # repeats:
     [1,2,1,1,1,1,2,1,1,1,1,1,1,9,2,1,1,2,1,1,1,1,1,1,1,1,8,1,1,12,1,1,
     2,3,1,2,1,1,1,2,7,4,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,8,1,1,
     1,2,1,9,1,1,6,1,2,1,3,1,1,1,1,1,1,1,1,2,1,3,1,1,3,2,1,5,1,5,1,1,1,
     1,1,2,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,
     1,1,1,1,2,1,1,1,1,2,8,1,1,2,1,1,1,3,1,1,1,1,4,1,1,1,12,1,1,1,1,1,1,
     1,1,1,1,2,1,1,1,1,3,1,1,1,1,2,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,
     2,2,2,1,6,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,2,1,
     9,1,2,2,1,6,1,1,1,1,1,1,1,3,1,2,1,1,1,2,2,4,1,1,1,1,1,1,3,1,1,1,1,
     2,5,1,1,1,1,1,4,1,1,3,2,1,10,9,2,1,1,1,2,1,1,1,1,1,1,2,1,2,1,1,1,1,
     1,1,2,4,1,1,2,2,1,1,1,1,2,1,8,1,1,1,2,8,1,1,1,1,1,1,1,1,1,1,1,1,1,
     4,1,1,3,2,1,1,1,1,1,12,1,13,1,2,1,3,1,1,3,1,2,2,3,1,1,1,2,1,1,1,1,
     2,1,2,2,1,1,3,1,2,2,4,1,1,1,1,2,1,1,1,1,1,1,3,2,1,1,1,1,2,4,1,2,1,
     1,1,4,7,2,1,1,1,2,1,1,1,1,1,1,1,1,6,1,1,1,2,1,1,1,2,1,1,1,1,1,1,1,
     1,1,1,9,7,1,8,1,14,1,1,3,1,1,1,1,1,1,1,1,1,1,1,14,1,1,1,2,1,1,1,1,
     2,12,2,1,1,1,3,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
     1,1,1,1,1,1,1,2,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,
     2,1,1,1,1,1,1,1,1,1,1,1,1,7,3,1,2,1,1,1,2,1,1,1,1,1,2,1,1,1,1,1,1,
     1,1,1,1,6,1,1,1,1,1,4,1,1,5,1,1,1,1,1,1,1,1,1,1,1,3,1,1,4,1,1,2,1,
     1,2,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1]
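
The string and the "# repeats" list on this slide look like the two outputs of a run-length encoding of the transformed text. A minimal sketch under that assumption follows; the function names are hypothetical, and the compression factor is taken to be compressed length divided by original length (with `$`), which reproduces 611 / 1001 = 61.04%.

```python
from itertools import groupby


def run_length_encode(text: str) -> tuple[str, list[int]]:
    """Collapse each run of identical letters to one letter plus its run length."""
    letters, counts = [], []
    for ch, group in groupby(text):
        letters.append(ch)
        counts.append(len(list(group)))
    return "".join(letters), counts


def compression_factor(compressed_len: int, original_len: int) -> float:
    """Compressed length as a percentage of the original length (assumed definition)."""
    return 100 * compressed_len / original_len


if __name__ == "__main__":
    print(run_length_encode("hllehfdgg"))           # start of the transformed text
    print(round(compression_factor(611, 1001), 2))  # 61.04
```

Note that this factor counts only the letters of the compressed string and leaves out the space needed to store the repeat counts.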

  10. Histogram
      Number of Repeats for the First 80 Letters in the Compressed Text

  11. Manhattan Plot
      Number of Repeats for All Letters in the Compressed Text
      * All 6039 letters in the text are used.

  12. Frequency of Letters
      Frequency of Letters-Compressed Text
      Rank  Letter  Freq  Change in Rank      Rank  Letter  Freq  Change in Rank
      1     e       64    =                   14    c       18    ↓ 2
      2     o       57    =                   15    f       16    =
      3     t       52    ↑ 2                 16    g       16    =
      4     i       44    ↑ 2                 17    w       13    ↑ 1
      5     h       40    ↑ 2                 18    y       9     ↑ 1
      6     a       38    ↓ 3                 19    v       7     ↑ 1
      7     r       37    ↓ 3                 20    b       6     ↓ 3
      8     d       35    ↑ 3                 21    k       6     =
      9     s       35    =                   22    p       5     =
      10    n       34    ↓ 2                 23    q       1     =
      11    u       29    ↑ 2                 24    x       1     =
      12    l       27    ↓ 2                 25    j       0     =
      13    m       20    ↑ 1                 26    z       0     =

  13. Complete Expression
      # symbols (letters + digits): 758
      Complete compression factor: 75.72%
      h,l,2,e,h,f,d,g,2,m,t,m,v,t,i,r,9,t,2,d,s,o,2,d,y,d,h,y,f,o,g,m,8,e,u,n,12,c,f,e,2,h,3,t,w,2,e,h,k,h,2,r,7,h,4,n,s,2,o,t,o,a,o,t,
      o,t,f,o,r,m,g,$,h,s,2,t,a,r,8,e,i,t,u,2,t,n,9,i,u,s,6,r,l,2,u,s,3,r,l,n,e,a,e,r,e,n,2,l,n,3,o,n,o,3,r,2,l,r,5,u,r,5,a,s,e,n,i,n,2,e,l,r,p,
      h,2,r,i,h,b,t,d,v,s,2,r,v,m,h,i,r,n,t,r,h,i,r,i,v,c,m,k,2,r,m,r,h,w,2,s,r,b,r,w,2,c,8,w,g,h,2,m,h,i,e,3,m,e,c,t,h,4,t,h,w,b,12,i,v,
      c,t,m,s,n,l,i,v,e,2,g,i,r,t,v,3,i,r,o,l,o,2,e,o,n,e,o,g,u,o,d,e,3,d,e,o,e,o,e,o,i,a,2,n,2,e,2,n,i,6,e,o,n,2,e,u,l,u,s,n,t,n,t,u,t,o,e,
      t,w,o,t,w,i,2,o,c,t,i,t,2,o,t,9,d,f,2,t,2,c,t,6,w,e,s,l,r,t,e,s,3,e,w,2,r,d,t,g,2,c,2,g,4,t,d,w,s,b,p,l,3,r,d,u,l,t,2,n,5,s,e,o,k,w,h,4,
      r,k,h,3,a,2,m,t,10,c,9,h,2,t,h,t,h,2,o,w,x,b,r,l,g,2,c,a,2,n,e,n,c,l,p,e,2,o,4,t,e,l,2,e,2,d,e,g,o,e,2,i,e,8,u,l,o,a,2,l,8,e,l,m,i,
      d,t,d,g,d,t,d,s,i,o,4,e,r,e,3,a,2,e,i,f,s,f,r,12,o,a,13,u,a,2,e,a,3,o,u,i,3,o,i,2,o,2,d,3,s,i,a,e,2,o,n,l,s,a,2,e,i,2,e,2,i,o,i,3,t,c,2,
      d,2,o,4,s,e,d,s,c,2,i,s,n,c,d,h,d,3,c,2,s,h,f,h,d,2,c,4,h,l,2,t,c,p,g,4,h,7,f,2,i,d,m,h,2,n,t,i,n,t,d,n,t,y,6,r,h,t,y,2,m,t,i,d,2,h,d,
      n,c,p,a,y,u,a,d,f,9,o,7,u,a,8,e,a,14,d,t,e,3,a,e,o,s,h,a,u,e,u,f,r,e,14,g,i,u,a,2,o,t,e,u,o,2,i,12,e,2,r,u,i,u,3,i,m,d,u,a,i,2,u,l,
      e,t,i,e,m,o,a,o,u,o,u,i,a,e,k,y,d,u,w,e,n,u,a,f,o,s,2,n,e,r,h,s,a,3,s,t,n,u,h,e,n,h,a,t,a,o,t,e,h,d,r,s,3,o,s,2,o,i,g,o,f,r,s,e,a,i,a,
      h,a,7,s,3,a,o,2,i,h,e,n,2,s,m,u,s,n,s,2,i,h,e,h,k,t,g,o,m,r,o,6,q,f,o,d,y,o,4,l,o,l,5,f,l,h,l,n,b,i,e,a,l,a,i,3,a,y,o,4,e,t,s,2,y,r,o,
      2,t,d,g,o,e,s,a,o,f,d,e,3,n,d,m,a,l,m,a
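
The sequence above appears to write each letter once and follow it with its repeat count only when the count is greater than 1, with every digit counted as one symbol. A sketch under that reading is given below; the function names are hypothetical. The slide totals are consistent with this rule: 611 letters plus 139 counts, 8 of which take two digits, gives 758 symbols, and 758 / 1001 = 75.72%.

```python
from itertools import groupby


def complete_expression(text: str) -> list[str]:
    """Emit each run as its letter, followed by the run length when it exceeds 1."""
    symbols = []
    for ch, group in groupby(text):
        n = len(list(group))
        symbols.append(ch)
        if n > 1:
            symbols.append(str(n))
    return symbols


def symbol_count(symbols: list[str]) -> int:
    """Count letters and digits; a two-digit count such as "12" contributes 2."""
    return sum(len(s) for s in symbols)


if __name__ == "__main__":
    start = "hllehfdggmtmvti" + "r" * 9 + "tt"  # start of the transformed text
    expr = complete_expression(start)
    print(",".join(expr), symbol_count(expr))
```

Because a count is stored after the letter and singletons carry no count, decoding only works when the text itself contains no digits, which the letters-only preprocessing guarantees.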

  14. Complete Expression
      Length of Text (with $)    Complete Compression Factor (%)
      1001                       75.72
      2001                       76.96
      3001                       80.51
      4001                       82.05
      5001                       82.76
      6001                       83.17
      * A shorter text is the prefix of a longer one.
