Differential expression analysis of SR and SEN cells before and after IL-2 treatment
Lu Wang and Ying Sha
9/18/2014
Updates since the 9/15/2014 slides:
1. Standardized the p-value cutoff (P < 0.05) across all DE analysis tools (except GFOLD, to which a p-value does not apply)
2. Updated all related plots
3. Added a differential expression comparison between treated SR and treated SEN cells
4. Added GFOLD results with the two small SEN libraries dropped and the SR replicates kept
5. Added validation on genes with known behavior
Previously, we evaluated five different normalization methods (with and without ERCC normalization) and decided, together with Victoria, on TMM normalization (without ERCC).
In this presentation, we address two issues:
1. How best to treat the technical replicates (in particular the low-abundance samples)
2. Which differential expression method to use (conditional on how the technical replicates are treated)
• Three ways of combining datasets
  • Technical Replicates (Tech): Treat sequencing runs for the same cell and condition as technical replicates
  • Drop Small Libraries (Drop): Treat sequencing runs for the same cell and condition as technical replicates, but drop the smaller libraries/replicates
  • Combine All Replicates (Combine): Combine sequencing runs/technical replicates for the same cell and condition
• Five differential expression calling tools
  • DESeq: Frequently used DE analysis tool when replicates are present; does not work well without replicates
  • edgeR: Frequently used DE analysis tool when replicates are present; does not work without replicates
  • GFOLD: Works well without replicates
  • NOISeq: Looks at dRPKM and fold change at the same time; works with or without replicates
  • DEGseq: Works with or without replicates
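The three dataset treatments listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample names, run IDs, gene names, counts, and the `min_fraction` threshold are all hypothetical, and the slides do not say how a "small" library was defined.

```python
# Hypothetical toy data: {sample: {run_id: {gene: raw_count}}}.
runs = {
    "SEN_untreated": {
        "run1": {"geneA": 120, "geneB": 30},
        "run2": {"geneA": 4, "geneB": 1},  # a much smaller library
    },
    "SR_untreated": {
        "run1": {"geneA": 90, "geneB": 60},
        "run2": {"geneA": 80, "geneB": 55},
    },
}


def library_size(counts):
    """Total number of reads assigned to genes in one run."""
    return sum(counts.values())


def tech(runs):
    """Tech: keep every run as its own technical replicate."""
    return {sample: list(reps.values()) for sample, reps in runs.items()}


def drop(runs, min_fraction=0.1):
    """Drop: discard runs much smaller than the largest run of the sample.

    min_fraction is an assumed, illustrative threshold.
    """
    kept = {}
    for sample, reps in runs.items():
        largest = max(library_size(c) for c in reps.values())
        kept[sample] = [
            c for c in reps.values()
            if library_size(c) >= min_fraction * largest
        ]
    return kept


def combine(runs):
    """Combine: sum counts gene by gene across all runs of a sample."""
    merged = {}
    for sample, reps in runs.items():
        total = {}
        for counts in reps.values():
            for gene, n in counts.items():
                total[gene] = total.get(gene, 0) + n
        merged[sample] = [total]  # a single library, so no replicates remain
    return merged
```

With this toy input, Drop removes the small SEN run but keeps both SR runs, while Combine leaves every sample with exactly one library, which is why replicate-dependent tools are affected by that choice.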
DESeq edgeR GFold NOISeq DEGSeq Tech Drop X Combine
X = method will not work
Distribution of Raw Read Counts
[Figure: three panels, Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine); axis: log2 read counts]
• Drop and Combine give more similar distributions across samples before normalization
Distribution of Raw Read Counts
[Figure: three panels, Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine); axis: density]
• Drop and Combine give more similar distributions across samples before normalization
Distribution of TMM Normalized Counts
[Figure: three panels, Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine)]
• After normalization, all three methods (Tech, Drop, and Combine) give similar distributions across samples
Distribution of TMM Normalized Counts
[Figure: three panels, Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine); axis: density]
• After normalization, all three methods (Tech, Drop, and Combine) give similar distributions across samples
For each combination of dataset (Tech, Drop, Combine) and tool (DESeq, edgeR, GFOLD, NOISeq, DEGseq), there are four differential expression comparisons:
• IL-2 treated SR cells vs. untreated SR cells
• IL-2 treated SEN cells vs. untreated SEN cells
• Untreated SEN cells vs. untreated SR cells
• IL-2 treated SEN cells vs. IL-2 treated SR cells
Comparison #1
DESeq edgeR GFold NOISeq DEGSeq Tech Drop X Combine
X = method will not work
edgeR vs. DESeq
[Figure: overlap among the edgeR up, edgeR down, DESeq up, and DESeq down gene lists; panels: SR treated vs. SR untreated, SEN treated vs. SEN untreated, SEN untreated vs. SR untreated; figure note: "No data from edgeR"]
• Overlap exists between opposite DEG lists.
• DEG lists have high overlap. DESeq identified 46.1% of total transcripts as DEGs.
• Each condition contains a low-library-size replicate.
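The "overlap between opposite DEG lists" noted above can be checked directly with set operations. The following is a minimal sketch, assuming each tool's results have already been reduced to plain sets of gene IDs; the gene names and list contents are hypothetical.

```python
def conflicting_calls(up_a, down_a, up_b, down_b):
    """Genes called up by one tool and down by the other."""
    up_a, down_a = set(up_a), set(down_a)
    up_b, down_b = set(up_b), set(down_b)
    return (up_a & down_b) | (down_a & up_b)


# Hypothetical DEG lists for one comparison.
edger_up = {"g1", "g2", "g3"}
edger_down = {"g4", "g5"}
deseq_up = {"g1", "g5"}
deseq_down = {"g3", "g6"}

conflicts = conflicting_calls(edger_up, edger_down, deseq_up, deseq_down)
print(sorted(conflicts))
```

A non-empty result means the two tools disagree on the direction of change for those genes, which is one sign that the underlying calls are unreliable.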
Comparison #1 - Conclusion
• Treating technical replicates separately does not yield reliable results.
• Therefore, either the low-abundance samples need to be dropped or the technical replicates need to be combined.
Comparison #1 - Conclusion
DESeq edgeR GFold NOISeq DEGseq X X Tech Drop X Combine
X = method will not work
Comparison #2
DESeq edgeR GFold NOISeq DEGSeq X X Tech Drop X Combine
Comparison #2
[Figure: overlap among the edgeR up, edgeR down, DESeq up, and DESeq down gene lists; panels: SR treated vs. SR untreated, SEN treated vs. SEN untreated]
• Overall, DESeq and edgeR do not share a high proportion of DE genes, except for SEN untreated vs. SR untreated, where the difference between the two conditions is relatively large.
• This could result from partially missing replicates in the SEN samples.
Comparison #2
[Figure: overlap among the edgeR up, edgeR down, DESeq up, and DESeq down gene lists; panels: SEN treated vs. SR treated, SEN untreated vs. SR untreated]
• Overall, DESeq and edgeR do not share a high proportion of DE genes, except for SEN untreated vs. SR untreated, where the difference between the two conditions is relatively large.
• This could result from partially missing replicates in the SEN samples.
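One way to put a number on "proportion of shared DE genes" is sketched below. The slides do not say which measure was used, so both functions are assumptions chosen for illustration, and the gene sets are hypothetical.

```python
def jaccard(a, b):
    """Shared genes divided by all genes called by either tool."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_fraction_of_smaller(a, b):
    """Shared genes divided by the size of the shorter list."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical "up" lists from two tools for the same comparison.
edger_up = {"g1", "g2", "g3", "g4"}
deseq_up = {"g3", "g4", "g5"}

print(jaccard(edger_up, deseq_up))
print(shared_fraction_of_smaller(edger_up, deseq_up))
```

The second measure is less sensitive to one tool calling far more genes than the other, which matters when one list is nearly contained in a much longer one.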
Comparison #2 - Conclusion
• Dropping low-abundance samples results in low overlap between the genes identified as differentially expressed (IL-2 up or IL-2 down) by different analysis methods.
• Therefore, the technical replicates (including the low-abundance samples) need to be combined.
Comparison #2 - Conclusion
DESeq edgeR GFold NOISeq DEGseq X X Tech X X Drop X Combine
X = method will not work
Comparison #3
DESeq edgeR GFold NOISeq DEGSeq X X Tech X X Drop X Combine
X = method will not work
Comparison #3
[Figure: overlap among the NOISeq up, NOISeq down, DESeq up, and DESeq down gene lists; panels: SR treated vs. SR untreated, SEN treated vs. SEN untreated]
• Overall, NOISeq failed to identify a reasonable number of differentially expressed genes from the combined data at an FDR-adjusted p-value < 0.05.
Comparison #3
[Figure: overlap among the NOISeq up, NOISeq down, DESeq up, and DESeq down gene lists; panels: SEN treated vs. SR treated, SEN untreated vs. SR untreated]
• Overall, NOISeq failed to identify a reasonable number of differentially expressed genes from the combined data at an FDR-adjusted p-value < 0.05.
Comparison #3 - Conclusion
• Overall, NOISeq failed to identify any differentially expressed genes from the combined data with the same confidence as DESeq.
• DESeq also yields a very low number of differentially expressed genes. This is likely because combining samples leaves no replicates, which DESeq is sensitive to.
Comparison #3 - Conclusion
DESeq edgeR GFold NOISeq DEGseq X X Tech X X Drop ? X X Combine
X = method will not work
? = still not sure which is best
Comparison #4
DESeq edgeR GFold NOISeq DEGSeq X X Tech X X Drop ? X X Combine
X = method will not work
? = still not sure which is best
Comparison #4
[Figure: overlap among the Combine up, Combine down, Drop up, and Drop down gene lists from DESeq; panels: SR treated vs. SR untreated, SEN treated vs. SEN untreated]
• Overall, the DE genes detected by DESeq with the Combine method were subsets of those detected with the Drop method.
• This is because the Drop method kept the SR technical replicates, while the Combine method does not have any replicates.
• This shows that DESeq tends to give a more conservative set of DE genes when replicates are missing.
Comparison #4
[Figure: overlap among the Combine up, Combine down, Drop up, and Drop down gene lists from DESeq; panels: SEN treated vs. SR treated, SEN untreated vs. SR untreated]
• Overall, the DE genes detected by DESeq with the Combine method were subsets of those detected with the Drop method.
• This is because the Drop method kept the SR technical replicates, while the Combine method does not have any replicates.
• This shows that DESeq tends to give a more conservative set of DE genes when replicates are missing.
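The subset relationship described above is straightforward to verify once both DE gene lists are available as sets. The sketch below assumes hypothetical gene IDs and is not taken from the original analysis.

```python
def compare_nested(combine_de, drop_de):
    """Report whether the Combine list is contained in the Drop list."""
    combine_de, drop_de = set(combine_de), set(drop_de)
    return {
        "is_subset": combine_de <= drop_de,
        "only_in_combine": sorted(combine_de - drop_de),
        "only_in_drop": sorted(drop_de - combine_de),
    }


# Hypothetical DESeq "up" lists for one comparison.
combine_up = {"g1", "g2"}
drop_up = {"g1", "g2", "g3", "g4"}

report = compare_nested(combine_up, drop_up)
print(report)
```

Listing the genes unique to each side, rather than only the yes/no answer, makes it easier to see how far a near-subset deviates from a strict one.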
Comparison #4 - Conclusion
• DESeq does indeed identify many more differentially expressed genes using the Drop method (whereby low-abundance technical replicates are removed) than using the Combine method (whereby technical replicates are combined into a single sample).
• This confirms the dependence of DESeq on replicates and indicates that it should not be used here.
Comparison #4 - Conclusion
DESeq edgeR GFold NOISeq DEGSeq X X Tech X X Drop ? X X Combine
X = method will not work
? = still not sure which is best
Comparison #5
DESeq edgeR GFold NOISeq DEGSeq X X Tech X X Drop ? X X Combine
X = method will not work
? = still not sure which is best
Comparison #5
[Figure: overlap among the GFOLD up, GFOLD down, DEGseq up, and DEGseq down gene lists; panels: SR treated vs. SR untreated, SEN treated vs. SEN untreated]
• GFOLD and DEGseq perform similarly in terms of the number of DE genes identified.
• Most of the DE genes identified are shared between the two tools.