Normalization of kazak RNA-seq data using TMM and DESeq - Ying Sha, Lu Wang


  1. Normalization of kazak RNA-seq data using TMM and DESeq - Ying Sha, Lu Wang

  2. Extremely low library size of two samples
     [Figure: density of log2 (raw count), before filtering and after filtering]
     • SEN_treated_1 and SEN_untreated_1 are the two samples with the lowest raw read counts.
     • These two samples also show two abnormal peaks (red box). For this reason, genes are filtered by the criterion that the raw read count must be more than 3 in every library.

  3. [Figure: log2 observed ERCC counts vs. log2 expected ERCC counts]
     • This also supports filtering genes by the criterion that the raw read count must be more than 3 in every library (the dashed line marks three read counts).
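
The filtering criterion from slides 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `filter_genes`, the dict-of-lists layout, and the toy counts are assumptions made for the example.

```python
def filter_genes(counts, min_count=3):
    """Keep genes whose raw count is more than min_count in every library.

    counts: dict mapping a gene ID to a list of raw counts, one per library.
    """
    return {
        gene: row
        for gene, row in counts.items()
        if all(c > min_count for c in row)
    }


# Toy data, invented for illustration: 3 genes x 4 libraries.
raw = {
    "geneA": [10, 25, 7, 12],
    "geneB": [0, 3, 40, 18],  # dropped: 0 and 3 are not more than 3
    "geneC": [4, 4, 5, 9],
}

kept = filter_genes(raw)
print(sorted(kept))
```

Requiring the threshold in every library (rather than on average) means that a single very small library can remove many genes, which is worth keeping in mind given the two low-count samples on slide 2.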

  4. Choice of normalization methods
     • TMM and DESeq were chosen because they performed best in a comprehensive comparison study of normalization methods (Dillies, Marie-Agnès, et al. "A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis." Briefings in Bioinformatics 14.6 (2013): 671-683).
     • Both methods assume that most genes are not differentially expressed.
     • The TMM method uses a weighted trimmed mean of the log expression ratios (trimmed mean of M values, TMM); it is implemented in edgeR.
     • DESeq estimates each library's normalization factor as the median of scaled counts, where each gene's count is scaled by its geometric mean across libraries; it is implemented in the DESeq package.
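
To make the two ideas on slide 4 concrete, here is a rough Python sketch. It is not the edgeR or DESeq code: `deseq_size_factors` follows the median-of-ratios idea, while `tmm_factor_simplified` is a deliberately simplified, unweighted trimmed mean of M values (edgeR additionally trims on absolute expression and applies precision weights). All names and the toy counts are assumptions for illustration.

```python
import math
from statistics import median


def deseq_size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of gene rows, each row a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Use only genes with nonzero counts everywhere, so every log is defined.
    rows = [r for r in counts if all(c > 0 for c in r)]
    log_geo_means = [sum(math.log(c) for c in r) / n_samples for r in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(r[j]) - g for r, g in zip(rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def tmm_factor_simplified(sample, ref, trim=0.3):
    """Unweighted trimmed mean of M values of `sample` against `ref`.

    Simplification: no trimming on A values and no precision weights.
    """
    n_s, n_r = sum(sample), sum(ref)
    m_values = sorted(
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, ref)
        if s > 0 and r > 0
    )
    k = int(len(m_values) * trim)  # number of values trimmed from each tail
    kept = m_values[k:len(m_values) - k]
    return 2 ** (sum(kept) / len(kept))


# Toy data, invented: five genes, the last one 10x higher in sample 2.
ref = [100, 100, 100, 100, 100]
sample = [100, 100, 100, 100, 1000]

size_factors = deseq_size_factors([list(pair) for pair in zip(ref, sample)])
tmm = tmm_factor_simplified(sample, ref)

print(size_factors)
print(tmm, tmm * sum(sample))  # factor and effective library size
```

In this toy case both approaches ignore the single outlying gene: the median of ratios is unaffected by it, and the trimmed mean discards it, so the four unchanged genes end up on the same scale in both samples. This is the practical meaning of the shared assumption that most genes are not differentially expressed.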

  5. Boxplot: distribution of read counts
     [Figure: boxplots of log2 (normalized counts), w/ ERCC. Panels: A, raw counts (filtered); B, RPKM normalization; C, TMM normalization; D, DESeq normalization; E, LOESS normalization; F, RPKM + LOESS normalization]

  6. Density plot: distribution of read counts
     [Figure: density of log2 counts, w/ ERCC. Panels: A, raw counts (filtered); B, RPKM normalization; C, TMM normalization; D, DESeq normalization; E, LOESS normalization; F, RPKM + LOESS normalization]

  7. • RPKM normalization failed to reduce the variation between the SEN samples.
     • TMM normalization and DESeq normalization (panels C and D) give similarly good results.
     • LOESS normalization and RPKM + LOESS normalization (panels E and F) were performed using ERCC counts.
     • The difference between the two is that LOESS was run on the raw counts, whereas RPKM + LOESS ran the LOESS normalization on top of the RPKM-normalized counts.
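
For reference, RPKM (reads per kilobase of transcript per million mapped reads) is computed as 10^9 × C / (N × L), with C the read count of the gene, N the total mapped reads in the library, and L the gene length in base pairs. A minimal Python sketch follows; the function name and the toy numbers are assumptions for illustration, not the values used in the slides.

```python
def rpkm(counts, lengths_bp):
    """RPKM for one library.

    counts: raw read counts per gene.
    lengths_bp: gene lengths in base pairs, in the same order.
    The library total is taken as the sum of the counts given here.
    """
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


# Toy library, invented: three genes, 10,000 mapped reads in total.
counts = [500, 1500, 8000]
lengths = [1000, 3000, 2000]

values = rpkm(counts, lengths)
print(values)
```

Because the scaling uses the library total, a few very highly expressed genes can shift every other gene's RPKM value in that library. TMM and DESeq instead derive the scaling from the bulk of the genes.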

  8. Evaluation (1): ERCC sequences
     [Figure: log2 (normalized counts) of ERCC sequences, w/ ERCC. Panels: A, raw counts; B, RPKM normalization; C, TMM normalization; D, DESeq normalization; E, LOESS normalization; F, RPKM + LOESS normalization]

  9. Evaluation (1): ERCC sequences
     [Figure: density of log2 counts of ERCC sequences, w/ ERCC. Panels: A, raw counts; B, RPKM normalization; C, TMM normalization; D, DESeq normalization; E, LOESS normalization; F, RPKM + LOESS normalization]

  10. Evaluation (1): ERCC sequences
      [Figure: log2 normalized ERCC counts vs. log2 expected ERCC counts, for TMM normalization and for LOESS normalization w/ ERCC]
      Neither normalization method, with or without ERCC, changed the linear relationship between the expected ERCC counts/concentrations and the normalized counts.
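
One way to check the linearity described on slide 10 is to fit a straight line to log2 observed versus log2 expected ERCC values and compare slope and R² before and after normalization. The sketch below uses ordinary least squares in plain Python; the function name and the toy values are assumptions for illustration, and the slides do not say which fitting procedure was actually used.

```python
import math


def fit_log2_line(expected, observed):
    """Least-squares fit of log2(observed) against log2(expected).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log2(e) for e in expected]
    ys = [math.log2(o) for o in observed]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Toy spike-in data, invented: observed counts are 8x the expected values.
expected = [1, 2, 4, 8]
observed = [8, 16, 32, 64]

before = fit_log2_line(expected, observed)
# A global scaling factor (here 2.0, also invented) divides every count.
after = fit_log2_line(expected, [o / 2.0 for o in observed])

print(before)
print(after)
```

A single scaling factor per library, as in TMM or DESeq, only shifts the intercept on the log2 scale and leaves slope and R² untouched. LOESS is a non-linear adjustment, so the same outcome is not guaranteed in general and has to be checked on the data.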

  11. Evaluation (2): housekeeping genes
      [Figure: normalized counts, panels A-H; labels: not normalized, w/ ERCC]
      • TMM and DESeq normalization did not adjust the housekeeping genes very well.
      • LOESS with ERCC actually increased the variation.

  12. Evaluation (2): housekeeping genes
      [Figure: normalized counts, panels A-D; labels: not normalized, w/ ERCC]
      • TMM and DESeq normalization did not adjust the housekeeping genes very well.

  13. Evaluation (3): new housekeeping genes
      • Eisenberg et al. defined a new set of housekeeping genes based on RNA-seq data across different human tissue types from the Human BodyMap (HBM) 2.0 Project.
      • Eisenberg, E., & Levanon, E. Y. (2013). Human housekeeping genes, revisited. Trends in Genetics, 29(10), 569-574.
      • This set contains 3804 housekeeping genes and no longer includes GAPDH and ACTB.
      • Evaluation (3) was based on a few of the top 11 new housekeeping genes recommended for calibration.

  14. Evaluation (3): new housekeeping genes
      [Figure: normalized counts, panels A-H; labels: not normalized, w/ ERCC]

  15. Evaluation (3): new housekeeping genes
      [Figure: normalized counts, panels A-H; labels: not normalized, w/ ERCC]
      • TMM and DESeq normalization reduced the variation for the new housekeeping genes.

  16. Evaluation (3): new housekeeping genes
      [Figure: normalized counts, panels A-H; labels: not normalized, w/ ERCC]
      • TMM and DESeq normalization reduced the variation for the new housekeeping genes.
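
The slides do not state how "variation" of a housekeeping gene was measured. One common choice, assumed here purely for illustration, is the coefficient of variation (standard deviation divided by mean) of the gene's counts across libraries, compared before and after normalization. The counts and scaling factors below are invented.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Invented counts of one housekeeping gene across four libraries.
raw_counts = [200, 410, 95, 820]
# Invented per-library scaling factors (e.g. from a TMM-like method).
scaling_factors = [0.5, 1.0, 0.25, 2.0]

normalized = [c / f for c, f in zip(raw_counts, scaling_factors)]

cv_before = coefficient_of_variation(raw_counts)
cv_after = coefficient_of_variation(normalized)

print(cv_before, cv_after)
```

For a gene that is truly stable across conditions, a successful normalization should give a lower coefficient of variation than the raw counts, which is the pattern reported for the new housekeeping genes.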

  17. Evaluation (4): genes of interest
      [Figure: normalized counts, panels A-H; labels: not normalized, w/ ERCC]
      • TMM- and DESeq-normalized data are consistent with our previous findings about these genes of interest.

  18. Combining replicates for differential analysis
      - What we know:
        - According to Aibek's notes, replicates such as SR_treated_1 and SR_treated_2 were not sequenced on the same day.
        - Before normalization, the replicates of a given condition differed significantly in variance from one another.
        - After TMM normalization, the variation was reduced in general, and some replicates of a given condition (for example SR_treated_1 and SR_treated_2) showed a smaller difference in variance. However, other replicates, such as SEN_treated_1 and SEN_treated_2, still show a large difference in variance.
        - The ACTB example shows this: the ACTB expression level was reduced in all conditions after TMM normalization, but the error bars for SEN_treated and SEN_untreated are still large.
      - What we do not know:
        - Whether replicates such as SR_treated_1 and SR_treated_2 came from the same patient.

  19. Discussion
      - How to treat replicates such as SEN_treated_1 and SEN_treated_2 if they still show large variation after normalization.
      - This is important because it will determine:
        - How to choose the library, or combined libraries, for the two conditions between which we run the differential expression analysis.
        - How we should prepare the UCSC browser tracks for each condition, for example whether we use SEN_treated_2 (the larger library) to represent SEN_treated, or the combination of SEN_treated_1 and SEN_treated_2.
      - Possible solutions:
        - Keep all libraries, including the smaller ones, and apply TMM or DESeq normalization.
          - Then treat each replicate as a separate sample (do not combine).
          - Or combine the replicates of each condition, for example combine SEN_treated_1 and SEN_treated_2.
        - Discard the smaller libraries.
          - Then normalize and compare the with-ERCC and without-ERCC results.
          - We will lose some statistical power without some replicates, but most differential expression tools can now deal with missing replicates.
