Normalized maximum likelihood models in genomics, Ioan Tabus

  1. Normalized maximum likelihood models in genomics. Ioan Tabus, Department of Signal Processing, Tampere University of Technology. (Department of Signal Processing, NML models in genomics, 8.7.2008)

  2. Universal distributions. Optimality of the Normalized Maximum Likelihood model
      The universal distribution q(·) is the solution of two minimax problems:
      1. Best distribution for the worst string:
         $$\min_{q(\cdot)} \max_{x_1,\ldots,x_n} \log \frac{P(x_1,\ldots,x_n;\hat\theta(x_1,\ldots,x_n))}{q(x_1,\ldots,x_n)}$$
      2. Best distribution q(·) for the average regret of the worst generating distribution g(·):
         $$\min_{q(\cdot)} \max_{g(\cdot)} E_g \log \frac{P(x_1,\ldots,x_n;\hat\theta(x_1,\ldots,x_n))}{q(x_1,\ldots,x_n)}$$
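
For reference, a sketch of why the NML distribution solves the first problem: the standard argument due to Shtarkov, written in the notation above with $x^n = (x_1,\ldots,x_n)$ and assuming a finite alphabet.

$$
\begin{aligned}
\hat q(x^n) &= \frac{P(x^n;\hat\theta(x^n))}{C_n}, \qquad C_n = \sum_{y^n} P(y^n;\hat\theta(y^n)),\\
\log \frac{P(x^n;\hat\theta(x^n))}{\hat q(x^n)} &= \log C_n \quad \text{for every } x^n .
\end{aligned}
$$

The regret of $\hat q$ is the same for every string. Any other $q \neq \hat q$ also sums to one, so $q(x^n) < \hat q(x^n)$ for some $x^n$, and on that string its regret exceeds $\log C_n$; hence $\hat q$ attains the minimax value $\log C_n$ of the first problem. For the memoryless binary (Bernoulli) class, the $k$-th term below collects the $\binom{n}{k}$ strings containing $k$ ones, with the convention $0^0 = 1$:

$$
C_n = \sum_{k=0}^{n} \binom{n}{k} \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k}, \qquad \text{e.g. } C_2 = 1 + 2\cdot\tfrac14 + 1 = 2.5 .
$$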

  3. Two related goals: DNA compression and DNA modelling

  4. Finding the regressor in the past

  5. The NML model for approximate matching

  6. Memoryless model

  7. The NML for memoryless discrete regression

  9. Memory model

  10. NML for the class of memory models

  11. NML-1 Encoding algorithm

  13. Compression of the human genome (average 1.45 bits/base)

  14. Compression ratio in bits/base when compressing the human genome (A, C, G, T alphabet only) with a 10 MB window size. Blue: bzip2; red: GeNML.
      [Plot: bits/base (vertical axis, 0 to 2.5) versus chromosome (horizontal axis)]

  15. Approximate matching in DNA analysis
      Use a universal code for the binary mask obtained by matching two candidate sequences, built from normalized maximum likelihood models:
      • for memoryless sources (Bernoulli);
      • for sources with memory.
      Example: the DNA locus HUMGHCSA (about 65,000 bases):
      • GH-1 and GH-2 are human growth hormone genes;
      • CS-5, CS-1 and CS-2 are chorionic somatomammotropin genes;
      • these genes are expressed either in the pituitary gland or in the placenta.
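
A minimal sketch of the memoryless (Bernoulli) case, assuming the two candidate sequences are already aligned position by position with no gaps and that the mask stores 1 for a match and 0 for a mismatch. The function names (`match_mask`, `log2_nml_normalizer`, `nml0_code_length`, `brute_force_normalizer`) are hypothetical, and this is not the GeNML implementation: it only computes the ideal NML code length of a mask, without pointer costs or actual arithmetic coding.

```python
import itertools
import math


def match_mask(a: str, b: str) -> list[int]:
    """Binary mask of two aligned, gap-free sequences: 1 = match, 0 = mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [1 if x == y else 0 for x, y in zip(a, b)]


def log2_nml_normalizer(n: int) -> float:
    """log2 of the Bernoulli normalizer C_n, computed in the log domain.

    C_n = sum_k binom(n, k) (k/n)^k ((n-k)/n)^(n-k), with 0^0 = 1.
    Logarithms avoid float overflow for long masks (tens of thousands of bases).
    """
    if n == 0:
        return 0.0
    log_terms = []
    for k in range(n + 1):
        t = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        if k > 0:
            t += k * math.log(k / n)
        if k < n:
            t += (n - k) * math.log((n - k) / n)
        log_terms.append(t)
    m = max(log_terms)  # log-sum-exp for numerical stability
    return (m + math.log(sum(math.exp(t - m) for t in log_terms))) / math.log(2)


def nml0_code_length(mask: list[int]) -> float:
    """Ideal code length in bits of a 0/1 mask under the memoryless NML model."""
    n, k = len(mask), sum(mask)
    neg_log2_ml = 0.0  # -log2 P(mask; theta_hat); zero when the mask is constant
    if 0 < k < n:
        neg_log2_ml = -(k * math.log2(k / n) + (n - k) * math.log2((n - k) / n))
    return neg_log2_ml + log2_nml_normalizer(n)


def brute_force_normalizer(n: int) -> float:
    """C_n by direct summation over all 2^n masks (small n only, for checking)."""
    if n == 0:
        return 1.0
    total = 0.0
    for mask in itertools.product((0, 1), repeat=n):
        k = sum(mask)
        p = k / n  # maximum likelihood estimate of the match probability
        total += p ** k * (1 - p) ** (n - k)
    return total


if __name__ == "__main__":
    mask = match_mask("ACGTACGTAC", "ACGTTCGTAA")
    print("Hamming distance:", mask.count(0))
    print("NML-0 code length (bits):", round(nml0_code_length(mask), 2))
```

The memoryless length depends on the mask only through its length n and its number of matches k, while the Hamming distance is n − k. Unlike the raw Hamming distance, the NML cost therefore also reflects the length of the compared segments, but it cannot tell clustered mismatches from scattered ones, which a model with memory can exploit.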

  16. DNA duplications and their role in evolution and disease

  17. Traditional approach to gene duplication

  18. Hamming versus NML 0 (memoryless)

  19. Optimizing the overall cost for duplication analysis

  20. Encoding the pointers and the mask

  21. Dynamic programming problem

  23. NML with memory (orders 1 and 2)

  25. Summarizing the duplication scenarios

  26. NML Encoding

  27. Efficient search of regressor

  29. Renormalization to account for contiguous matches

  30. Accounting for contiguous perfect matches
      • Unconstrained case
      • Constrained case

  31. Accounting for contiguous perfect matches: efficient approach

  32. Comparison of NML 1 and NML 2

  33. Open avenues
      • Universal models provide efficient representation tools for genomic sequences.
      • More refined model order selection procedures may better account for non-stationarity along sequences.
      • The techniques are easy to extend to more adaptive tools.
      • Derive the exact NML model for more structured classes of models with memory.
