ComiR: A New Efficient Tool for Predicting Multiple miRNA Targets Claudia Coronnello, PhD Dept. of Computational and System Biology, UPMC Mentor: Panayiotis (Takis) Benos, AP
Agenda • Introduction to miRNAs and to existing miRNA target prediction tools • Why develop ComiR? • ComiR: the algorithm • ComiR: training and validation • Next steps • Conclusion
miRNA • Mature miRNA: 20-24 nucleotides • Function: translation regulation • RISC assembly: Argonaute protein & mature miRNA & mRNA target • Binding sites are mostly located in 3’UTR regions • One miRNA regulates many genes and vice versa
miRNA target prediction tools
Tool: Features
• PITA: Considers the difference between the free energy gained from the formation of the miRNA-target duplex and the energetic cost of unpairing the target to make it accessible to the miRNA.
• miRanda: Sequence binding, thermodynamics-based miRNA-mRNA duplex prediction and comparative sequence analysis.
• TargetScan: Thermodynamics-based miRNA-mRNA duplex prediction and comparative sequence analysis. Focus on seed region.
• mirSVR: Regression method that predicts the likelihood of target mRNA down-regulation from sequence and structure features of predicted microRNA/mRNA target sites.
All the existing tools focus on single-miRNA target prediction.
Why develop ComiR?
• Endogenously expressed miRNAs typically number in the tens.
• Biological question: which genes are regulated by a set of miRNAs?
• With a naïve application of existing tools, the set of predicted targets is not informative (too big or too small).
• The sets of targets predicted by different tools often agree poorly.
• ComiR is a target prediction tool designed to predict the targets of a set of miRNAs.
ComiR: the algorithm
Input example: list of miRNAs and expression of miRNAs (m)
miRNA        m
hsa-miR-1    0.5
hsa-miR-2    0.01
hsa-miR-3    0.25
…            …
ComiR: the algorithm
Input: list of miRNAs, expression of miRNAs
Single miR target pred (pre-computed values):
• miRanda: E_ijk
• PITA: ΔG_ijk
• TargetScan: N_ik
• mirSVR: FC_ijk
Target predictions are computed separately for each single miRNA i of the input list.
Indices: i … miRNAs; k … genes; j … multiple miRNA-gene binding sites
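ComiR: the algorithm
Input (list of miRNAs, expression of miRNAs) → Single miR target pred → Score combination
• miRanda, PITA: FD
S_k = \sum_{i \in \mathrm{miRs}} \sum_{j=1}^{N_{ik}} \frac{1}{1 + e^{(S_{ijk} - RT \ln m_i)/RT}}
• TargetScan: w-sum
S_k = \sum_{i \in \mathrm{miRs}} m_i N_{ik}
• mirSVR: w-sum
S_k = \sum_{i \in \mathrm{miRs}} \sum_{j=1}^{N_{ik}} m_i \, FC_{ijk}
Here S_{ijk} is the single-site score of the tool (E_ijk for miRanda, ΔG_ijk for PITA), m_i is the expression of miRNA i, and N_{ik} is the number of predicted binding sites of miRNA i in gene k.
FD: Fermi-Dirac model (1)
w-sum: weighted sum
(1) Zhao, Y., D. Granas, and G.D. Stormo, Inferring binding energies from selected binding sites. PLoS Comput Biol, 2009. 5 (12): p. e1000590.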
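The score combination step can be illustrated with a minimal Python sketch. It is not the ComiR implementation: the function names, the dictionary layout, and the value of RT (about 0.593 kcal/mol at 25 °C) are assumptions made for illustration, and the chemical potential is taken as RT ln m_i, following the Fermi-Dirac form of Zhao et al.

```python
import math

# Assumed value: RT in kcal/mol at roughly 25 degrees C (not given in the slides).
RT = 0.593


def fermi_dirac_score(site_energies, expression, rt=RT):
    """Combine per-site energies for one gene with the Fermi-Dirac model.

    site_energies: {miRNA name: [energy of each predicted site in the gene]}
    expression:    {miRNA name: expression level m_i (must be > 0)}
    """
    total = 0.0
    for mirna, energies in site_energies.items():
        # Assumed chemical potential of miRNA i: mu_i = RT * ln(m_i)
        mu = rt * math.log(expression[mirna])
        for energy in energies:
            # Occupancy probability of a single site
            total += 1.0 / (1.0 + math.exp((energy - mu) / rt))
    return total


def weighted_site_count(site_counts, expression):
    """Weighted sum for count-based tools: sum_i m_i * N_ik."""
    return sum(expression[m] * n for m, n in site_counts.items())


def weighted_score_sum(site_scores, expression):
    """Weighted sum for per-site scores: sum_i sum_j m_i * FC_ijk."""
    return sum(expression[m] * s
               for m, scores in site_scores.items()
               for s in scores)
```

Each function returns the combined score S_k of a single gene k; calling it once per gene with the same expression dictionary gives one score per tool and gene, which is the input expected by the integration step.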
ComiR: the algorithm
Input (list of miRNAs, expression of miRNAs) → Single miR target pred → Score combination → Tools integration
• miRanda: FD
• PITA: FD
• TargetScan: w-sum
• mirSVR: w-sum
Tools integration: SVM
• Support Vector Machine classifier
• Training
ComiR: the algorithm
Input (list of miRNAs, expression of miRNAs) → Single miR target pred → Score combination → Tools integration → Output
• miRanda: FD
• PITA: FD
• TargetScan: w-sum
• mirSVR: w-sum
Tools integration: SVM
Output: gene ranking
ComiR: the algorithm
ComiR = Input (list of miRNAs, expression of miRNAs) → Single miR target pred (miRanda, PITA, TargetScan, mirSVR) → Score combination (FD, FD, w-sum, w-sum) → Tools integration (SVM) → Output (gene ranking)
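The integration and ranking steps could look roughly like the sketch below. The slides do not state the SVM kernel, library, or training procedure, so this uses a linear SVM trained by Pegasos-style subgradient descent purely as a stand-in. All function names, the regularization constant, and the toy feature values are hypothetical.

```python
def train_linear_svm(features, labels, lam=0.01, epochs=50):
    """Train a linear SVM (hinge loss + L2 penalty) by subgradient descent.

    features: list of per-gene score vectors (one combined score per tool)
    labels:   +1 for targets (positive set), -1 for non-targets (negative set)
    Returns the weight vector; the last entry is the bias term, which this
    simple variant regularizes together with the other weights.
    """
    w = [0.0] * (len(features[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            t += 1
            eta = 1.0 / (lam * t)          # decreasing step size
            xb = list(x) + [1.0]           # append constant bias feature
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            w = [(1.0 - eta * lam) * wi for wi in w]   # shrink (L2 penalty)
            if margin < 1.0:               # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def decision_value(w, x):
    """Signed distance-like score; larger means more target-like."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def rank_genes(w, gene_features):
    """Return gene names sorted from most to least target-like."""
    return sorted(gene_features,
                  key=lambda g: decision_value(w, gene_features[g]),
                  reverse=True)


# Toy data: four normalized scores per gene (hypothetical values).
train_x = [[1.5, 2.0, 1.0, 1.2], [2.0, 1.0, 1.8, 1.1],
           [-1.0, -1.5, -2.0, -1.2], [-1.3, -1.0, -1.1, -2.0]]
train_y = [1, 1, -1, -1]
weights = train_linear_svm(train_x, train_y)
ranking = rank_genes(weights, {"geneA": train_x[0], "geneB": train_x[2]})
```

Ranking by the decision value rather than by the predicted class is what turns the classifier into a gene ranking, which also makes threshold-based evaluation possible.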
ComiR: training
• Input: set of miRNAs
• Known output:
– Targets -> Positive set
– No-Targets -> Negative set
ComiR: training set
AGO1 IP experiment in D. melanogaster S2 cells:
• 28 miRNAs (known expression level)
• 142 mRNAs over-expressed in IP and up-regulated after AGO1 depletion (POSITIVE SET)
• 142 mRNAs not over-expressed in IP and not up-regulated after AGO1 depletion (NEGATIVE SET)
[Table: mRNA counts by AGO1 IP (enriched / not-enriched) and AGO1 depletion (down / up): 142, 287, 949, 4907]
Hong, X., et al., Immunopurification of Ago1 miRNPs selects for a distinct class of microRNA targets. Proceedings of the National Academy of Sciences of the United States of America, 2009. 106 (35)
ComiR: validation
• Test set
– Set of miRNAs
– Targets -> Positive set
– No-Targets -> Negative set
• Calculate sensitivity (SN) and specificity (SP)
SN = \frac{TP}{TP + FN}, \qquad SP = \frac{TN}{TN + FP}
TP: number of true positives
TN: number of true negatives
FN: number of false negatives
FP: number of false positives
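As a small worked example of the two definitions above, the following Python sketch counts TP, TN, FP and FN from predicted and known labels. The function name and the toy labels are illustrative only.

```python
def sensitivity_specificity(predicted, actual):
    """Return (SN, SP) from parallel lists of booleans.

    predicted[i] is True if gene i is called a target;
    actual[i] is True if gene i belongs to the positive set.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 3 known targets, 2 known non-targets.
sn, sp = sensitivity_specificity(
    predicted=[True, True, False, False, True],
    actual=[True, False, False, True, True],
)
```

In this toy case TP = 2, FN = 1, TN = 1 and FP = 1, so SN = 2/3 and SP = 1/2.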
ComiR: validation
• Compare the tools' performance by ROC curve analysis.
• Calculate SN and SP at different threshold levels.
• Calculate the area under the resulting curve (AUC).
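The threshold sweep and the area calculation can be sketched in a few lines of Python. This is a generic illustration and not the evaluation code behind the slides. It assumes the curve plots SN against 1 − SP and that the area is obtained with the trapezoidal rule; the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (1 - SP, SN), lowering the threshold step by step.

    scores: prediction score per gene (higher means more target-like)
    labels: True for genes in the positive set, False for the negative set
    Genes with identical scores are added in one step so ties are handled.
    """
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: one negative gene is ranked above one positive gene.
toy_auc = auc(roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False]))
```

An AUC of 1 corresponds to a perfect ranking and 0.5 to a random one, which is why the AUC is a convenient single number for comparing tools.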
ComiR: validation
Algorithm     AUC (ROC – self training set)   AUC (ROC – extended set)
ComiR         0.864                           0.724
PITA          0.713                           0.647
MIRANDA       0.694                           0.616
mirSVR        0.663                           0.801
Targetscan    0.792                           0.644
Plot labels: Self-test; 142 top expressed
ComiR: training and validation Training set Test sets Normalization
ComiR: validation
H. sapiens, HEK293 cells, PAR-CLIP protocol, AGO1 IP
• 27 top expressed miRNAs
• Positive test set: 2083 genes with a CCR (crosslink-centered region) matching the top 27 miRs in the 3’ UTR sequence
• Negative test set: the 2083 genes with the highest average expression among those without a CCR matching the top 27 miRs
ComiR: validation
H. sapiens, HEK293 cells, PAR-CLIP protocol, AGO1 IP
• 27 top expressed miRNAs
• Positive test set: 2083 genes with CCR matching the top 27 miRs in 3’ UTR sequence
• Negative test set: 2083 genes without CCR matching the top 27 miRs, with the highest average expression
ROC – H. sapiens test set
Algorithm     AUC
ComiR         0.774
PITA          0.649
MIRANDA       0.626
mirSVR        0.65
Targetscan    0.601
Conclusion
• ComiR outperforms existing tools in predicting the targets of a set of miRNAs, as validated by IP or PAR-CLIP experiments.
• ComiR outperforms existing tools in predicting single-miRNA targets, as tested on experimentally validated miRNA-mRNA pairs.
Next steps
• Develop a web interface to run custom ComiR target predictions on any set of miRNAs.
• Improve ComiR predictions by including information on the location of the predicted target site in the mRNA sequence.
Acknowledgment N. Kaminski, MD Benos lab L. Huleihel K. Pandit, PhD M. Butterworth, PhD G. Stormo, PhD I. Rakova, PhD
Questions