
Extracting relevant information from UHTS data: analysis pipelines



  1. Extracting relevant information from UHTS data: analysis pipelines (smallRNA). Patricia Otten, 3rd July 2012, JOBIM, Rennes, France.

  2. Fasteris SA: Illumina sequencing
     - founded in 2003 by L. FARINELLI and M. OSTERAS
     - 2012: about 20 collaborators
     - capillary and UHTS sequencing + bioinformatics
     - private and academic labs
     - no business plan, no external investors, no sales force

  3. Illumina sequencing. Key technology based on the concept of DNA colonies, invented in 1996 at GlaxoWellcome's Geneva Biomedical Research Institute (Mayer P., Farinelli L. and Kawashima E., 1997, Patent application WO 98/44151).

  4. Illumina sequencing, step 1: library preparation (smallRNA protocol)
     - 3 µg total RNA
     - selection of small RNAs (20-30 nt), acrylamide gel purification
     - single-stranded ligation of the 3' adapter (P7)
     - single-stranded ligation of the 5' adapter (P5)
     - reverse transcription, PCR, index addition, gel purification
     - indexed library

  5. Illumina sequencing, step 2: flowcell preparation. Templates are hybridized to a surface (flowcell) and amplified in situ (bridge amplification) to form DNA colonies.
     - each colony produces one read
     - all colonies are sequenced in parallel
     - ~150 million pass-filter reads per lane

  6. Illumina sequencing, step 3: sequencing. Incorporation of reversible-terminator nucleotides labeled with fluorescent dyes.
     - base-by-base sequencing (50 or 100 cycles, single-read (SR) or paired-end (PE))
     - laser excitation and image capture; release of dye
     - intensity extraction and base calling by the RTA software
     - 1x100 run: 1 week; 1.5 TB of intensities; 200 GB of sequences

  7. Trimming (smallRNAs): adapter trimming
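As an illustration of the adapter-trimming step, here is a minimal Python sketch. The slide gives no implementation details, so the function name `trim_adapter`, the `min_overlap` parameter and the adapter sequence used in the example are all assumptions. Only exact matches are handled; a production trimmer would also tolerate sequencing errors.

```python
from typing import Optional


def trim_adapter(read: str, adapter: str, min_overlap: int = 5) -> Optional[str]:
    """Return the insert preceding the 3' adapter, or None if no adapter is found.

    Hypothetical helper: exact matching only, no mismatch tolerance.
    """
    # Case 1: the full adapter is present somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends inside the adapter, so only an adapter prefix is visible.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return None


# Placeholder adapter sequence, chosen for illustration only.
ADAPTER = "TGGAATTCTCGG"
insert = "ACGTACGTACGTACGTACGTAC"  # a 22 nt insert

print(trim_adapter(insert + ADAPTER + "AAAA", ADAPTER))  # full adapter in the read
print(trim_adapter(insert + ADAPTER[:7], ADAPTER))       # partial adapter at the 3' end
print(trim_adapter("ACGTACGT", ADAPTER))                 # no adapter found
```

Reads for which no adapter is found return `None`, so the caller can decide whether to discard them or keep them untrimmed.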

  8. Introduction to smallRNAs. [Table of smallRNA classes; recoverable entries: transposon silencing; poorly conserved; downregulation of genes; highly conserved; chemical modifications of other RNAs, mainly rRNAs, tRNAs and snRNAs; RNA splicing; guides for telomere elongation; from dsRNA; translation downregulation.] Expression analysis. Virus assembly. Sequencing by siRNA: a novel generic tool for virus discovery. Kreuze et al. (2009) Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of viruses. Virology 388: 1-7.

  9. Introduction to smallRNAs

  10. Pipelines and automation. [Diagram labels: time, resources, produce meaningful data; image: www.photo-dictionary.com]
     → automation + checks
     → handle unexpected issues, keep time for the client
     → a pipeline is a set of predetermined tasks that have to be executed to complete a specific analysis

  11. Pipelines and automation. Example: comparison of libraries in terms of miRNA coverage.
     - modules: mapping (ref. genome), miRNA insert selection, coverage, normalization, library comparison, visualization
     - each module may involve one or several processes; for mapping (ref. genome): (1) reference, (2) indexing, (3) mapping, (4) format conversion, (5) reporting
     - tools: Makefiles, Bash scripts, R scripts
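The idea of a pipeline as a set of predetermined tasks with automated checks can be sketched in a few lines of Python. This is only an illustration: the slides mention Makefiles, Bash scripts and R scripts, and the `run_pipeline` function and the toy `mapping`, `coverage` and `normalization` steps below are hypothetical stand-ins, not the actual modules.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], Dict]]


def run_pipeline(steps: List[Step], data: Dict) -> Tuple[Dict, List[str]]:
    """Run predetermined steps in order, checking each result before moving on."""
    done: List[str] = []
    for name, func in steps:
        data = func(data)
        # Automated check: stop early instead of propagating a broken result.
        if not isinstance(data, dict):
            raise RuntimeError(f"step '{name}' returned no usable data")
        done.append(name)
    return data, done


# Toy stand-ins for modules named on the slide (hypothetical logic).
def mapping(d: Dict) -> Dict:
    # "Mapping" here is a plain substring search against a toy reference.
    return {**d, "mapped": [s for s in d["inserts"] if s in d["reference"]]}


def coverage(d: Dict) -> Dict:
    return {**d, "coverage": len(d["mapped"])}


def normalization(d: Dict) -> Dict:
    return {**d, "rpm": d["coverage"] / len(d["inserts"]) * 1e6}


result, done = run_pipeline(
    [("mapping", mapping), ("coverage", coverage), ("normalization", normalization)],
    {"inserts": ["ACGT", "TTTT"], "reference": "GGACGTGG"},
)
print(done, result["rpm"])
```

Keeping the step list as data makes the order of tasks explicit and lets the same runner serve several analyses.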

  12. Expression pipelines. [Workflow diagram] Stages: reads; inserts; mapping (ref. genome); mapping (sequence db); sequence profile; annotated features; peak detection; coverage; post-processing. Tools: Perl, Bash, BWA, Seqmonk, Bedtools, Bash/R, R. Resources: iGenome, PMRD, mirBase.

  15. Expression (workflow diagram as in slide 12). [Plot: size profile, BDGP5.25, library LIB-1; x axis: insert size (23 to 29); y axis: RPM.]
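A size profile of this kind can be computed from the trimmed inserts as sketched below. The slide does not state the normalization denominator, so this sketch assumes RPM means counts per million of all inserts in the library; the function name `size_profile_rpm` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def size_profile_rpm(inserts: Iterable[str]) -> Dict[int, float]:
    """Count inserts per length and scale to reads per million (RPM).

    Assumption: the denominator is the total number of inserts supplied.
    """
    lengths = Counter(len(seq) for seq in inserts)
    total = sum(lengths.values())
    if total == 0:
        return {}
    return {size: count / total * 1e6 for size, count in sorted(lengths.items())}


# Toy library: three 21 nt inserts and one 22 nt insert.
profile = size_profile_rpm(["A" * 21] * 3 + ["A" * 22])
print(profile)
```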

  19. Expression (workflow diagram as in slide 12). $\mathrm{RPKM} = \dfrac{\text{count}}{\text{insertNb [M]} \times \text{probeLength [K]}}$
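The RPKM formula on this slide translates directly into code. In the sketch below, the function name `rpkm` and its argument names are illustrative; `insert_nb` is taken as the raw number of inserts (converted to millions) and `probe_length` as the feature length in nucleotides (converted to kilobases).

```python
def rpkm(count: int, insert_nb: int, probe_length: int) -> float:
    """RPKM = count / (insertNb in millions * probeLength in kilobases)."""
    if insert_nb <= 0 or probe_length <= 0:
        raise ValueError("insert_nb and probe_length must be positive")
    return count / ((insert_nb / 1e6) * (probe_length / 1e3))


# 500 inserts on a 1000 nt feature in a library of 2 million inserts.
print(rpkm(500, 2_000_000, 1000))
```

Dividing by both the library size and the feature length makes counts comparable across libraries of different depth and across features of different length.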

  20. Expression (workflow diagram as in slide 12). Comparison scores between pairs of libraries: $n_1, n_2 \sim$ binomial distribution with the same probability of event, $p = (n_1/N_1 + n_2/N_2)/2$; score $\sim P(\text{observing a count} < n_1 \text{ or } > n_2)$. [Plot contour labels: 0.8, 0.9, 0.95, 0.99]
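The slide gives only the outline of the comparison score, so the following Python sketch is one possible reading rather than the method actually used. Assumptions: N1 and N2 are the total insert counts of the two libraries, the shared probability is p = (n1/N1 + n2/N2)/2, the two tail probabilities are summed, and the score is reported as one minus that sum so that values close to 1 indicate a large difference (in line with the 0.8 to 0.99 labels on the plot). All function names are hypothetical.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at k, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return min(1.0, sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k + 1)))


def comparison_score(n1: int, N1: int, n2: int, N2: int) -> float:
    """Assumed reading: 1 - [P(count < n_low) + P(count > n_high)] under a shared p."""
    p = (n1 / N1 + n2 / N2) / 2
    if p <= 0.0 or p >= 1.0:
        return 0.0  # degenerate case: nothing to compare
    # Put the library with the lower rate first.
    if n1 / N1 > n2 / N2:
        n1, N1, n2, N2 = n2, N2, n1, N1
    tail_low = binom_cdf(n1 - 1, N1, p)      # P(count < n1) in the lower library
    tail_high = 1.0 - binom_cdf(n2, N2, p)   # P(count > n2) in the higher library
    return 1.0 - min(1.0, tail_low + tail_high)


print(comparison_score(50, 1000, 50, 1000))   # identical rates: low score
print(comparison_score(10, 1000, 100, 1000))  # tenfold difference: score near 1
```

The direct summation is only practical for small counts; for real library sizes a statistics library would be the better choice.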

  22. Virus identification. Sequencing by siRNA: a novel generic tool for virus discovery. Kreuze et al. (2009) Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of viruses. Virology 388: 1-7.
     siRNAs:
     - class of dsRNAs of 20-25 nt
     - involved in post-transcriptional gene silencing
     - endogenous or exogenous
     → synthetic dsRNA introduced into cells can induce silencing of specific genes of interest
     → viral infection: the presence of viral dsRNA leads to siRNAs that participate in the cell's antiviral response

  23. Virus assembly pipelines. [Workflow diagram] Inputs: reads (infected sample); reads (control sample). Stages: inserts; mapping (host ref. genome); de novo assembly; mapping (host contigs); de novo assembly (unmapped inserts); mapping (viral db). Tools: Perl/R, BWA, Velvet+Oases, Blast/BWA/Mummer. Resources: iGenome, RefSeq.
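The diagram passes the inserts that do not map to the host on to de novo assembly. Assuming the mapper output is available as SAM text (BWA can produce SAM), that selection can be sketched as follows; the function name `unmapped_inserts` and the toy records are hypothetical. The only format fact relied on is that bit 0x4 of the SAM FLAG field marks an unmapped read.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_inserts(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for every unmapped record in SAM-formatted lines."""
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield name, seq


# Toy SAM content: one mapped and one unmapped insert.
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t37\t21M\t*\t0\t0\tACGTACGTACGTACGTACGTA\tIIIIIIIIIIIIIIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCGATTGACCGATTGAC\tIIIIIIIIIIIIIIIIIIIII",
]
leftover = list(unmapped_inserts(sam))
print(leftover)
```

The resulting sequences would then be written out (for example as FASTA) and handed to the assembler.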

  28. Thank you for your attention
