Using Disco and MapReduce to study mRNA complexity




  1. Using Disco and MapReduce to study mRNA complexity Dan Williams SciPy 2011 Lightning Talk 7/14/2011 | Life Technologies Proprietary & Confidential | 1

  2. Disco: a MapReduce framework written in Python and Erlang, useful for dealing with massive data. Users specify map and reduce operations as Python functions, then chain them together to get stuff done.
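
The idea of writing map and reduce steps as ordinary Python functions and chaining them can be sketched without a cluster. What follows is a minimal sketch in plain Python, not Disco itself: `run_local` is a hypothetical single-process stand-in for a job runner, the nucleotide-counting task is chosen only for illustration, and the `(record, params)` function shape is an assumption loosely modeled on Disco's classic word-count style.

```python
from itertools import groupby
from operator import itemgetter


def map_nucleotides(line, params):
    """Map step: emit (nucleotide, 1) for every base in one input line."""
    for base in line.strip().upper():
        yield base, 1


def reduce_counts(pairs, params):
    """Reduce step: sum the values that share a key."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def keep_frequent(pair, params):
    """Second map step: keep keys whose count reaches params['min']."""
    key, count = pair
    if count >= params["min"]:
        yield key, count


def identity_reduce(pairs, params):
    """Reduce step that only sorts its input, closing the second stage."""
    yield from sorted(pairs)


def run_local(records, map_fn, reduce_fn, params=None):
    """Hypothetical local runner: map every record, then reduce the result."""
    mapped = (pair for record in records for pair in map_fn(record, params))
    return list(reduce_fn(mapped, params))


# The output of stage 1 is the input of stage 2: that is the chaining.
counts = run_local(["AAAC", "GGTA"], map_nucleotides, reduce_counts)
frequent = run_local(counts, keep_frequent, identity_reduce, {"min": 2})
print(counts)
print(frequent)
```

In a real Disco job the framework would distribute the map and reduce calls across worker nodes; here everything runs in one process so the data flow is easy to follow.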

  3. mRNA molecules contain three distinct regions: AAATGACGACAACGGTGAGGGTTCTCGGGCGGGGCCTGGGACAGGCAGCTCCGGGGTCCGCGGTTTCACATCGGAAACAAAACAGCGG CTGGTCTGGAAGGAACCTGAGCTACGAGCCGCGGCGGCAGCGGGGCGGCGGGGAAGCGTATACCTAATCTGGGAGCCTGCAAGTGACA ACAGCCTTTGCGGTCCTTAGACAGCTTGGCCTGGAGGAGAACACATGAAAGAAAGAACCTCAAGAGGCTTTGTTTTCTGTGAAACAGT ATTTCTATACAGTTGCTCCAATGACAGAGTTACCTGCACCGTTGTCCTACTTCCAGAATGCACAGATGTCTGAGGACAACCACCTGAG CAATACTGTACGTAGCCAGAATGACAATAGAGAACGGCAGGAGCACAACGACAGACGGAGCCTTGGCCACCCTGAGCCATTATCTAAT GGACGACCCCAGGGTAACTCCCGGCAGGTGGTGGAGCAAGATGAGGAAGAAGATGAGGAGCTGACATTGAAATATGGCGCCAAGCATG TGATCATGCTCTTTGTCCCTGTGACTCTCTGCATGGTGGTGGTCGTGGCTACCATTAAGTCAGTCAGCTTTTATACCCGGAAGGATGG GCAGCTAATCTATACCCCATTCACAGAAGATACCGAGACTGTGGGCCAGAGAGCCCTGCACTCAATTCTGAATGCTGCCATCATGATC AGTGTCATTGTTGTCATGACTATCCTCCTGGTGGTTCTGTATAAATACAGGTGCTATAAGGTCATCCATGCCTGGCTTATTATATCAT CTCTATTGTTGCTGTTCTTTTTTTCATTCATTTACTTGGGGGAAGTGTTTAAAACCTATAACGTTGCTGTGGACTACATTACTGTTGC ACTCCTGATCTGGAATTTTGGTGTGGTGGGAATGATTTCCATTCACTGGAAAGGTCCACTTCGACTCCAGCAGGCATATCTCATTATG ATTAGTGCCCTCATGGCCCTGGTGTTTATCAAGTACCTCCCTGAATGGACTGCGTGGCTCATCTTGGCTGTGATTTCAGTATATGATT TAGTGGCTGTTTTGTGTCCGAAAGGTCCACTTCGTATGCTGGTTGAAACAGCTCAGGAGAGAAATGAAACGCTTTTTCCAGCTCTCAT TTACTCCTCAACAATGGTGTGGTTGGTGAATATGGCAGAAGGAGACCCGGAAGCTCAAAGGAGAGTATCCAAAAATTCCAAGTATAAT GCAGAAAGCACAGAAAGGGAGTCACAAGACACTGTTGCAGAGAATGATGATGGCGGGTTCAGTGAGGAATGGGAAGCCCAGAGGGACA GTCATCTAGGGCCTCATCGCTCTACACCTGAGTCACGAGCTGCTGTCCAGGAACTTTCCAGCAGTATCCTCGCTGGTGAAGACCCAGA GGAAAGGGGAGTAAAACTTGGATTGGGAGATTTCATTTTCTACAGTGTTCTGGTTGGTAAAGCCTCAGCAACAGCCAGTGGAGACTGG AACACAACCATAGCCTGTTTCGTAGCCATATTAATTGGTTTGTGCCTTACATTATTACTCCTTGCCATTTTCAAGAAAGCATTGCCAG CTCTTCCAATCTCCATCACCTTTGGG Research question: Do the three mRNA regions generally differ in information content?

  4. Method: Calculate the Shannon entropy of each 21-nucleotide segment of each mRNA from a well-known database. Group results by region and compare. MapReduce with Disco speeds the computation (across ~30k mRNA sequences).
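
The per-segment calculation can be sketched as follows. The slide does not say which distribution the entropy is taken over, so this sketch assumes single-nucleotide frequencies within each window, H = Σ p·log2(1/p), and overlapping windows that advance one nucleotide at a time. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Entropy in bits of the nucleotide frequencies within `segment`."""
    counts = Counter(segment)
    total = len(segment)
    # p * log2(1/p) summed over the observed nucleotides.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def segments(sequence, k=21):
    """Yield every overlapping k-nucleotide window of `sequence`."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


# First 30 nucleotides of the example sequence shown on slide 3.
example = "AAATGACGACAACGGTGAGGGTTCTCGGGC"
entropies = [shannon_entropy(s) for s in segments(example)]
print(len(entropies), min(entropies), max(entropies))
```

With a four-letter alphabet the value ranges from 0 bits (a single repeated nucleotide) to 2 bits (all four nucleotides equally frequent).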

  5. (1) Map 21-mer segments and regions to 1; (2) Reduce to remove duplicates; (3) Map Shannon entropy of 21-mer segment to region; (4) Reduce to get a boxplot for each region.
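
The four stages on this slide can be sketched in plain Python, assuming they run in the order map, reduce, map, reduce. This is an illustration under stated assumptions rather than the speaker's code: the input format (a sequence plus half-open region boundaries), the placeholder region names, the rule that a segment belongs to a region only if it lies entirely inside it, and the use of a five-number summary as the data behind a boxplot are all hypothetical.

```python
import math
import statistics
from collections import Counter, defaultdict

K = 21  # segment length from the slides


def shannon_entropy(segment):
    """Entropy in bits of the nucleotide frequencies within `segment`."""
    total = len(segment)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(segment).values())


def map_segments(record):
    """Stage 1 map: emit ((segment, region), 1) for each 21-mer in a region."""
    sequence, regions = record
    for name, start, end in regions:  # half-open interval [start, end)
        for i in range(start, end - K + 1):
            yield (sequence[i:i + K], name), 1


def reduce_unique(pairs):
    """Stage 2 reduce: collapse duplicate (segment, region) keys."""
    yield from sorted({key for key, _ in pairs})


def map_entropy(key):
    """Stage 3 map: emit (region, entropy of the segment)."""
    segment, region = key
    yield region, shannon_entropy(segment)


def reduce_boxplot(pairs):
    """Stage 4 reduce: (min, Q1, median, Q3, max) of entropies per region."""
    by_region = defaultdict(list)
    for region, entropy in pairs:
        by_region[region].append(entropy)
    for region, values in sorted(by_region.items()):
        values.sort()
        if len(values) > 1:
            q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
        else:
            q1 = med = q3 = values[0]
        yield region, (values[0], q1, med, q3, values[-1])


# Toy input: made-up sequences and region boundaries, not real annotations.
toy = ("A" * 25 + "ACGT" * 6, [("region_1", 0, 25), ("region_2", 25, 49)])
records = [toy, toy]  # the repeated record produces duplicates for stage 2

stage1 = [pair for record in records for pair in map_segments(record)]
stage2 = list(reduce_unique(stage1))
stage3 = [pair for key in stage2 for pair in map_entropy(key)]
summary = dict(reduce_boxplot(stage3))
print(summary)
```

Deduplicating before computing entropy means a 21-mer that recurs within the same region contributes once to that region's distribution, which is one plausible reading of the "remove duplicates" stage.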


  7. Thank you!
