JOBIM 3 July 2012
Chondrichthyans Teleostomi
Genome sequencing

Scyliorhinus canicula (dogfish)
• Ongoing project started with Génoscope
• 3.5 Gbases, Illumina paired-end sequencing, 32x
• Draft assembly: 3,449,662 contigs, N50: 1,292 bp

Callorhinchus milii (elephant shark)
• 910 Mbases, Sanger + 454, 1.4x
• Draft assembly: 633,833 contigs, N50: 1,466 bp

Leucoraja erinacea (little skate)
• 3.42 Gbases, Illumina paired-end, 26x
• Draft assembly: 2,962,365 contigs, N50: 665 bp
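The N50 values quoted for the three draft assemblies summarise how fragmented each assembly is. As a reminder of the usual definition, here is a minimal sketch: N50 is the contig length L such that contigs of length at least L contain half of the assembled bases. The function name and the toy contig lengths are hypothetical and are not data from the talk.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (illustration only).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

A low N50 relative to the genome size, as in the three drafts listed here, means that most bases sit in short contigs, so a pre-miRNA hairpin may be split across contigs or interrupted by runs of N.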
Transcriptome project

Peptisan project, sequencing done by Génoscope

Libraries for mRNA
• Two normalised libraries (non-directional / directional)
• Illumina paired-end sequencing (~412 M, ~316 M)
• Poster on the transcriptome assembly (Pierre Pericard)

Two small RNA libraries
• Adult and Embryo libraries
• Illumina paired-end sequencing, 51 nt long
• To identify miRNA: de novo identification
• Small non-coding RNA
• Post-transcriptional regulators of mRNA transcripts
• Discovery of lin-4 in C. elegans in 1993

Pre-miRNA structure (miRNA* and miRNA strands)

     GAGUAAA UA UA GA U
5'   CCUUG G GCAGCACA AUGGUUUGUG UU U
     ||||| | |||||||| |||||||||| || G
3'   GGAAC C CGUCGUGU UACCGGACGU AA A
     AUAAAAA UC UA GG A

miRNA conservation: miR-143 (regions marked on the slide: miRNA*, loop, miRNA)

Zebrafish  .....GAUCUACAGUCGUCUGGCCCGCGGUGCAGUGCUGCAUCUCUGGUCAACUGGGAGUC UGAGAUGAAGCACUGUAGCUC GGGAGGACAACACUGUCAGCUC.....
Medaka     UGGUUCUGGUCCAUCUCUGCUGCCCAUGGUGCAGUGCUGCAUCUCUGGUCAGUUGAUAGUC UGAGAUGAAGCACUGUAGCUC GGGACGGAGGGCAGGAGUCUCAGUCUG
Xenopus    ............UGUCUCCCAGCCCAAGGUGCAGUGCUGCAUCUCUGGUCAGUUGUGAGUC UGAGAUGAAGCACUGUAGCUC GGGAAGGGGGAAU..............
Human      .GCGCAGCGCCCUGUCUCCCAGCCUGA GGUGCAGUGCUGCAUCUCUGGU CAGUUGGGAGUC UGAGAUGAAGCACUGUAGCUC AGGAAGAGAGAAGUUGUUCUGCAGC..
Mouse      ......................CCUGA GGUGCAGUGCUGCAUCUCUGG UCAGUUGGGAGUC UGAGAUGAAGCACUGUAGCUC AGG........................
Rat        .GCGGAGCGCC.UGUCUCCCAGCCUGA GGUGCAGUGCUGCAUCUCUGG UCAGUUGGGAGUC UGAGAUGAAGCACUGUAGCUC AGGAAGGGAGAAGAUGUUCUGCAGC..
Cow        ......GCGUCCUGUCUCCCAGCCUGAGGUGCAGUGCUGCAUCUCUGGUCAGUUGGGAGUC UGAGAUGAAGCACUGUAGCUC GGGAAGGGAGAAGUUGUUCUGCAGC..
Pig        .............GUCCCCCAGCCGGA GGUGCAGUGCUGCAUCUCUGG UCAGCUGGGAGUC UGAGAUGAAGCACUGUAGCUC GGGAAGGGAGA................
Opossum    ......................CCCGAGGUGCAGUGCUGCAUCUCUGGUCAGUUGUGAGUC UGAGAUGAAGCACUGUAGCUC GGG........................
Lizard     ...........AUGUCUCCCAGCCCAA GGUGCAGUGCUGCAUCUCUGG UCAGUUGUGAGUC UGAGAUGAAGCACUGUAGCUC GGGAAGGGAGGAAC.............
Analysis pipeline

Input: Illumina paired-end sequencing of the Adult and Embryo small RNA libraries

Cleaning
• Data cleaning with PRINSEQ, Flash and cutadapt (no adaptors)
• Sequences < 17 nt or > 27 nt discarded
• Rfam: rRNA, tRNA, ncRNA
• High-quality sequences, 17–27 nt

Prediction
• miRDeep2, with the S. canicula draft genome and miRBase 18.0
• Putative miRNA: mature, star, pre-miRNA

Validation
• Conservation: C. milii genome, L. erinacea genome
• MIReNA, CIDmiRNA, Triplet-SVM, randfold, PHDcleav, miRNA SVM, miRNAPred
• MFE
Cleaning

Illumina paired-end sequencing (Adult, Embryo), data cleaning with PRINSEQ, Flash and cutadapt, Rfam (rRNA, tRNA, ncRNA); sequences < 17 nt or > 27 nt discarded; high-quality sequences of 17–27 nt kept.

Example read pairs (insert followed by adapter sequence):

@PHOSPHORE_0144:8:1101:1512:2663#GGCUAC
  /1  UUCCCAAGACUGUGAAACCCUU UGGAAUUCUCGGGUGCCAAGGAACUCCAG
  /2  AAGGGUUUCACAGUCUUGGGAA GAUCGUCGGACUGUAGAACUCUGAACGUG

@PHOSPHORE_0144:8:1101:1699:2666#GGCUAC
  /1  AGGGCCCGGAUAGCUCAGUCGGUAG UGGAAUUCUCGGGUGCCAAGGAACUC
  /2  CUACCGACUGAGCUAUCCGGGCCCU GAUCGUCGGACUGUAGAACUCUGAAC

@PHOSPHORE_0144:8:1101:1503:2691#GGCUAC
  /1  GAAUACCAGGUGCAGUAGGCUU UGGAAUUCUCGGGUGCCAAGGAACUCCAG
  /2  AAGCCUACUGCCCCUGGUAUUC GAUCGUCGGACUGUAGAACUCUGAACGUG

Read /1 compared with the reverse complement of read /2:

  /1                 UUCCCAAGACUGUGAAACCCUU UGGAAUUCUCGGGUGCCAAGGAACUCCAG
  /2 (rev. compl.)   CACGUUCAGAGUUCUACAGUCCGACGAUC UUCCCAAGACUGUGAAACCCUU

  /1                 AGGGCCCGGAUAGCUCAGUCGGUAG UGGAAUUCUCGGGUGCCAAGGAACUC
  /2 (rev. compl.)   GUUCAGAGUUCUACAGUCCGACGAUC AGGGCCCGGAUAGCUCAGUCGGUAG

  /1                 GAAUACCAGGUGCAGUAGGCUU UGGAAUUCUCGGGUGCCAAGGAACUCCAG
  /2 (rev. compl.)   CACGUUCAGAGUUCUACAGUCCGACGAUC GAAUACCAGGGGCAGUAGGCUU

• PRINSEQ (Schmieder and Edwards 2011, Bioinformatics)
• Cutadapt (Martin 2011, EMBnet.journal)
• Flash (Magoč and Salzberg 2011, Bioinformatics)
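The example read pairs on this slide illustrate the idea behind the cleaning step: after the adapter is removed, read /1 should equal the reverse complement of read /2. The sketch below checks this on two of the pairs shown. It is not how PRINSEQ, cutadapt or Flash work internally; the function names are hypothetical, the adapter is matched exactly with no error tolerance, and the two adapter strings are simply the prefixes visible in the example reads.

```python
# RNA alphabet, as used on the slide (U instead of T).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def trim_adapter(read, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


def pair_agrees(read1, read2, adapter1, adapter2):
    """True when trimmed read /1 equals the reverse complement of trimmed read /2."""
    return trim_adapter(read1, adapter1) == revcomp(trim_adapter(read2, adapter2))


# Assumption: adapter prefixes copied from the example reads on the slide.
ADAPTER_R1 = "UGGAAUUCUCGGGUGCCAAGG"
ADAPTER_R2 = "GAUCGUCGGACUGUAGAACUCUGAAC"

first_pair = (
    "UUCCCAAGACUGUGAAACCCUU" + "UGGAAUUCUCGGGUGCCAAGGAACUCCAG",
    "AAGGGUUUCACAGUCUUGGGAA" + "GAUCGUCGGACUGUAGAACUCUGAACGUG",
)
third_pair = (
    "GAAUACCAGGUGCAGUAGGCUU" + "UGGAAUUCUCGGGUGCCAAGGAACUCCAG",
    "AAGCCUACUGCCCCUGGUAUUC" + "GAUCGUCGGACUGUAGAACUCUGAACGUG",
)

for name, (r1, r2) in {"first": first_pair, "third": third_pair}.items():
    print(name, pair_agrees(r1, r2, ADAPTER_R1, ADAPTER_R2))
```

In the third pair the two reads disagree at one base of the insert (U in read /1, G in the reverse complement of read /2), so the exact comparison fails. A real merger has to decide how to resolve such a position, for example from base qualities.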
                 Embryo        Adult         All
Initial reads    89,766,100    81,179,402    170,945,502
Cleaned reads    82,325,424    65,651,400    147,976,824

Plot labels: Frequency; miR-143-3p
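The read counts above can be turned into the fraction of reads that survive cleaning in each library. This is a small arithmetic sketch using only the numbers from the table; the variable and function names are hypothetical.

```python
# Read counts copied from the table (initial vs. cleaned reads).
initial = {"Embryo": 89_766_100, "Adult": 81_179_402}
cleaned = {"Embryo": 82_325_424, "Adult": 65_651_400}


def retained_fraction(before, after):
    """Fraction of reads kept after cleaning."""
    return after / before


for library in initial:
    frac = retained_fraction(initial[library], cleaned[library])
    print(f"{library}: {frac:.1%} of reads kept")

# Totals across both libraries (the "All" column).
total_initial = sum(initial.values())
total_cleaned = sum(cleaned.values())
print(f"All: {retained_fraction(total_initial, total_cleaned):.1%} of reads kept")
```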
Prediction

• Input: high-quality sequences (17–27 nt) from the cleaning step, S. canicula draft genome, miRBase 18.0
• miRDeep2 output: putative miRNA (mature, star, pre-miRNA)
• miRDeep2: Friedländer et al. 2008, Nature Biotechnology
Pre-miRNA

Structural information:

miRNA and miRNA* information:
• Both miRNA and miRNA*
• Overexpression of the miRNA vs miRNA*
• Overhang (around 2 nt)

Sequence conservation
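The overhang criterion can be made concrete with a small sketch. Given a hairpin in dot-bracket notation and the coordinates of the two duplex strands, it computes each strand's 3' overhang from the base-pair table. This is not miRDeep2's scoring code. The function names, the 0-based inclusive coordinate convention and the toy hairpin are assumptions for illustration, and the sketch only handles the case where the 5' end of each strand is base-paired.

```python
def pair_table(dot_bracket):
    """Map every paired position to its partner (0-based) in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def three_prime_overhangs(dot_bracket, five_p, three_p):
    """Return (overhang of the 5p strand, overhang of the 3p strand).

    five_p and three_p are (start, end) coordinates, 0-based and inclusive.
    A strand's 3' overhang is how far its 3' end extends past the partner
    of the other strand's 5' end. Returns None if either 5' end is unpaired.
    """
    pairs = pair_table(dot_bracket)
    (s5, e5), (s3, e3) = five_p, three_p
    if s5 not in pairs or s3 not in pairs:
        return None
    return e5 - pairs[s3], e3 - pairs[s5]


# Hypothetical toy hairpin: a perfect 24 bp stem closed by a 6 nt loop.
structure = "(" * 24 + "." * 6 + ")" * 24
print(three_prime_overhangs(structure, (2, 23), (32, 53)))
```

On real precursors the stem contains bulges and mismatches, so a practical check would accept overhangs close to 2 nt rather than exactly 2, in line with the "around 2 nt" wording on the slide.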
Modification to miRDeep2
• Variability of the miRDeep2 output related to randfold

Putative new miRNA
• 2445 new miRNA with score >= 0
• 1103 new miRNA with score >= 5, with 10% expected false positives
Conserved miRNA
• 170 miRNA identified as similar to other species
• 15 rejected after manual inspection (2 with score > 5)
• 155 good known miRNA (21 with score < 5)

Alignment: contig_452580_14256 and oan-mir-181a (Ornithorhynch)

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNAACAUUCAACGCUGUCGGUGAGUNNNNNNNNNNNNNNNNNACCAUCGACCGUUGAUUGUACC
NNNNNNNNNNNNNNNNNNNNGUUUCAGGGAACAUUCAACGCUGUCGGUGAGUUUGAUGCUAUUGGAGAAACCAUCGACCGUUGAUUGUACCUUGUAGC
GAAUUCUGCUUCGAAUGGUUGCUUCAGUGAACAUUCAACGCUGUCGGUGAGUUUGGAAUUAAAGUAGAAACCAUCGACCGUUGAUUGUACCCUGCGGCAACCACCGUCCU
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNAACAUUCAACGCUGUCGGUGAGUNNNNNNNNNNNNNNNNNACCAUCGACCGUUGAUUGUACC

Hairpin structures (two diagrams side by side on the slide):

NNNUNNNNNANNNUNNNNNNCUNNNNNNNANNNNGANGNU GCUU AA U U A U CU A GGAAU
GUUNCAGGGNACANUCAACGNNGUCGGUGNGUUUNNUNCNA CG UGGUUGCU CAG G ACA UCAACG GUCGGUG GUUU U
|||N|||||N|||N||||||NN|||||||N||||NN|N| || |||||||| ||| | ||| |||||| ||||||| |||| A
CGANGUUCCNUGUNAGUUGCNNCAGCUACNCAAANNANGNU GC ACCAACGG GUC C UGU AGUUGC CAGCUAC CAAA A
NNNUNNNNNANNNUNNNNNN--NNNNNNN-NNNNG-NGNU UCCU -C C C A U -- - GAUGA
Comparison of conserved miRNA with other species

C. milii (elephant shark) and L. erinacea (little skate)
• 131 identified in C. milii, 152 identified in L. erinacea, 154 altogether

Previously identified chondrichthyan miRNA (Heimberg et al. 2011)
• 104 S. canicula miRNA mapped on C. milii scaffolds
• All 104 miRNAs identified in S. canicula

mir-301 alignment (columns: 5' flank, miRNA*, loop, miRNA, 3' flank)

sca-mir-301    UGUCGGAG GCUCUGACGAUAUUGCACUACU GUACUCACAGU-UAAG CAGUGCAAUAGUAUUGUCAAAGC GUCAGGCACC
cmi-mir-301    UGUCGGAG GCUCUGACGAUAUUGCACUACU GUCCUCACCGU-UAAG CAGUGCAAUAGUAUUGUCAAAGC GUCAGGCAAC
ler-mir-301    UGUCGGGC GCUCUGACGAUAUUGCACUACU GUCCGCACAGCUAAAG CAGUGCAAUAGUAUUGUCAAAGC GUCAGGCACC
hsa-mir-301a   ACUGCUAACGAAU GCUCUGACUUUAUUGCACUACU GUACUUUACAG-CUAG CAGUGCAAUAGUAUUGUCAAAGC AUCUGAAAGCAGG
mmu-mir-301a   CCUGCUAACGGCU GCUCUGACUUUAUUGCACUACU GUACUUUACAG-CGAG CAGUGCAAUAGUAUUGUCAAAGC AUCCGCGAGCAGG
pma-mir-301a   CUUGCAAGCCCCUGCUGGAG GCUCUGACACCAUUGCACUACU GUACGCAAUGG-UGAG CAGUGCAAUUGUAUUGUCAAAGC UUCCGUCGGUGAGCCCA

Hairpin structure:

G G C --- A GU U
UGUC GA GCU UGACGAUAU UGCACU CU AC C
|||| || ||| ||||||||| |||||| || || A
ACGG CU CGA ACUGUUAUG ACGUGA GA UG C
A G A AUA C AU A
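The mir-301 alignment can be checked programmatically. The sketch below compares the mature miRNA column of each species against the S. canicula sequence and reports the mismatch positions. The sequences are copied from the alignment on this slide. The function name is hypothetical, and the comparison is a plain position-by-position one that assumes equal lengths and no gaps.

```python
# Mature miRNA column of the mir-301 alignment, copied from the slide.
mature = {
    "sca-mir-301": "CAGUGCAAUAGUAUUGUCAAAGC",
    "cmi-mir-301": "CAGUGCAAUAGUAUUGUCAAAGC",
    "ler-mir-301": "CAGUGCAAUAGUAUUGUCAAAGC",
    "hsa-mir-301a": "CAGUGCAAUAGUAUUGUCAAAGC",
    "mmu-mir-301a": "CAGUGCAAUAGUAUUGUCAAAGC",
    "pma-mir-301a": "CAGUGCAAUUGUAUUGUCAAAGC",
}


def mismatch_positions(a, b):
    """0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = mature["sca-mir-301"]
diffs = {name: mismatch_positions(reference, seq) for name, seq in mature.items()}

for name, positions in diffs.items():
    print(name, positions)
```

In this alignment only the pma-mir-301a sequence differs from the others, at a single position of the mature miRNA, while the loop and flanking regions vary much more between species.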
miRBase miRNA not in data set
• blastn of all miRBase miRNA against genome assembly
• 24 potential new conserved miRNA
• 2 identified by miRDeep2 but not identified as conserved

Example 1 (columns: 5' flank, miRNA*, loop, miRNA, 3' flank)

                       23444                      522851
                       AAAG-UUCUGUCAUACACUCAGGCU  UCAGUGCAUCACAGAACUUUGA
contig_3412856_61753   CUCGAGCU AAAG-UUCUGUCAUACACUCAGGCU GCAGAUACACA-AGG UCAGUGCAUCACAGAACUUUGA UUCGGG
rno-mir-148b           UUGAGGU GAAG-UUCUGUUAUACACUCAGG CUGUGGCU-CUGA-AAG UCAGUGCAUCACAGAACUUUGU CUCG
cmi                    CCCAAGCU GAAG-UUCUGUCAUACACUCAGGCU GUAGCUAAUGG-AAG UCAGUGCAUCACAGAACUUUGA CUCGAGAU
ler                    CUCAAGCC AAAGGUUCUGUCAUACACUUUGGCU CUGUCGCUGGG-AAG UCAGUGCAUGACAGAACUUUG

C C A CA GCAGA
CUCGAG UAAAGUUCUGU AU CACU GGCU U
|||||| ||||||||||| || |||| ||||
GGGCUU GUUUCAAGACA UA GUGA CUGG A
A C C -- AACAC

Example 2 (same columns)

                       1425623                19236
                       UGAGAACUGAAUUCCAUGGGC  UCCAUAGUAGACAGUUCUCCAG
contig_2512524_51750   UUCCCAGCUA UGAGAACUGAAUUCCAUGGGC UGGUUGCACACUUUAUUUC-UCAG UCCAUAGUAGACAGUUCUCCAG CUUGGCUGCU
gga-mir-146c-1         UUCCCAGCUC UGAGAACUGAAUUCCAUGGAC UGGUUUCAAUUCCAUGCGU-UCAG UCCAUGGUAUUCAGUUCUCUAG CUUGGCUGC
cmi                    CCAGCUG UGAGAACUGAAUUCCAUGGGC UGGUCACGCAGUUUUCUUCCUCAG UCCAUAGUAGUCAGUUCUUCCG UUUGGCUGCU
ler                    UUCCUGGCUC UGAGAACUGAAUUCCAUGGGC UGGUUGUUCACAUUAUUUC-UCAG UCCAUAGUAG-CAGUUCUCCGG CUUGGCUGCU

---UUCCCA AU AAUUCC UUGCACA
GCU GAGAACUG AUGGGCUGG C
||| |||||||| |||||||||
CGA CUCUUGAC UACCUGACU U
UCGUCGGUU C- AGAUGA CUUUAUU