Effects of Sequencing Errors on geno2pheno [coreceptor]
Alejandro Pironti, Saleta Sierra, Rolf Kaiser, Thomas Lengauer and Nico Pfeifer
Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics
April 18, 2013

Motivation
• How safe is a geno2pheno [coreceptor] prediction?
• What happens if the submitted sequence contains (editing) errors?
• Do sequence errors have the same influence on X4 and R5 predictions?
• What is the influence of cut-offs in this context?

Materials and Methods
• 70,644 HIV-1 nucleotide sequences:
– Non-duplicated V3 regions of the ENV gene
– Los Alamos National Laboratory Sequence Database
• Dataset for in-silico experiment 1:
– All sequences in a dataset
• Datasets for in-silico experiment 2:
– Build 6 datasets containing 1000 sequences each
– Choose sequences at random

In-silico Experiment 1: Exchange in-silico each position in each sequence in the dataset.
– Replace original nucleotide by another nucleotide or IUPAC ambiguity code
– Evaluate with geno2pheno [coreceptor]

In-silico Experiment 2: Introduce one, two or three random changes in each sequence.
– Position(s) chosen at random
– Differentiate between nucleotides and ambiguity codes

In both experiments, sequences with alignment errors are discarded.

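The two in-silico experiments described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, sequences are assumed to be uppercase strings, and it is an assumption that any of the 14 other IUPAC symbols may replace the original one (the slides do not say whether an ambiguity code must include the original base). Evaluation with geno2pheno [coreceptor] is not shown.

```python
import random

NUCLEOTIDES = "ACGT"
# IUPAC ambiguity codes covering two, three, or four possible bases.
AMBIGUITY_CODES = "RYSWKMBDHVN"


def single_position_variants(seq):
    """Experiment 1 sketch: yield (position, replacement, variant) for
    every exchange of one symbol by a different nucleotide or code."""
    for pos, original in enumerate(seq):
        for symbol in NUCLEOTIDES + AMBIGUITY_CODES:
            if symbol != original:
                yield pos, symbol, seq[:pos] + symbol + seq[pos + 1:]


def random_changes(seq, n_changes, alphabet, rng):
    """Experiment 2 sketch: replace n_changes distinct, randomly chosen
    positions with a different symbol drawn from alphabet."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_changes):
        choices = [s for s in alphabet if s != chars[pos]]
        chars[pos] = rng.choice(choices)
    return "".join(chars)
```

Under these assumptions, a sequence of length n yields 14·n single-position variants. Passing `NUCLEOTIDES` or `AMBIGUITY_CODES` as `alphabet` keeps the two kinds of change separate, as in experiment 2.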
Evaluation of Unchanged Sequences
• 65,309 sequences aligned correctly.
• Average FPR: 44.65 (SD = 33)
[Figure: Histogram of the original FPRs.]
[Figure: Logo of the 5 most frequent amino acids. Letter height is proportional to frequency. Color key: average FPR, scale 0 to 100.]

In-silico Experiment 1: Altered Sequence FPRs
• Original average FPR: 44.65 (SD = 33)
• Altered average FPR: 42.18 (SD = 34)
[Figure: Comparison of the FPR histograms for the unchanged and the altered sequences.]

In-silico Experiment 1: Mean FPR Shifts by Position
[Figure: Mean FPR shift by position, with amino acid positions 11 and 25 marked.]
On average:
• 64 positions lower the FPR
• 41 positions increase the FPR

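A mean FPR shift per position can be computed along these lines. This is a sketch under assumptions: the record layout `(position, original_fpr, altered_fpr)` and the function name are hypothetical, and the shift is taken as altered minus original, so negative values mean the change lowered the FPR.

```python
from collections import defaultdict


def mean_shift_by_position(records):
    """Average (altered FPR - original FPR) over all records per position."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for position, original_fpr, altered_fpr in records:
        totals[position] += altered_fpr - original_fpr
        counts[position] += 1
    return {pos: totals[pos] / counts[pos] for pos in totals}
```

Counting the positions with a negative versus a positive mean shift then gives a tally of the kind reported on the slide.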
In-silico Experiment 1: Mean FPR Shifts by Nucleotide

In-silico Experiment 1: Effect of Cut-Offs on Predicted Tropism

Cut-offs: FPR ≤ 5: X4, 5 < FPR < 15: Intermediate, FPR ≥ 15: R5

Data                 X4                 Intermediate       R5
Original Sequences   10,484 (16%)       6,157 (9%)         48,668 (75%)
Altered Sequences    16,538,450 (17%)   12,109,891 (13%)   66,396,041 (70%)

Cut-offs: FPR < 10: X4, FPR ≥ 10: R5

Data                 X4                 R5
Original Sequences   13,181 (20%)       52,128 (80%)
Altered Sequences    23,625,020 (25%)   71,419,362 (75%)

Cut-offs: FPR < 20: X4, FPR ≥ 20: R5

Data                 X4                 R5
Original Sequences   20,385 (31%)       44,924 (69%)
Altered Sequences    34,047,123 (36%)   60,997,259 (64%)

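The three cut-off schemes on this slide translate directly into classification rules. The sketch below uses hypothetical function names; only the thresholds come from the slide.

```python
from collections import Counter


def classify_three_way(fpr):
    """FPR <= 5: X4, 5 < FPR < 15: Intermediate, FPR >= 15: R5."""
    if fpr <= 5:
        return "X4"
    if fpr < 15:
        return "Intermediate"
    return "R5"


def classify_two_way(fpr, cutoff):
    """FPR < cutoff: X4, FPR >= cutoff: R5 (cutoff is 10 or 20 on the slide)."""
    return "X4" if fpr < cutoff else "R5"


def tropism_counts(fprs, classify):
    """Tally predicted tropism labels for an iterable of FPR values."""
    return Counter(classify(fpr) for fpr in fprs)
```

For example, `tropism_counts(fprs, lambda f: classify_two_way(f, 20))` tallies labels under the FPR < 20 scheme.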
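In-silico Experiment 1: Effect of Cut-Offs on Predicted Tropism

[Figure: Transition diagrams between predicted tropism states, one per cut-off scheme:
– FPR ≤ 5: X4, 5 < FPR < 15: Intermediate, FPR ≥ 15: R5 (states X4, Int, R5)
– FPR < 10: X4, FPR ≥ 10: R5 (states X4, R5)
– FPR < 20: X4, FPR ≥ 20: R5 (states X4, R5)
Edge probabilities, in the order extracted from the slide: 0.02, 0.98, 0.94, 0.00, 0.06, 0.96, 0.01, 0.92, 0.07, 0.04, 0.14, 0.16, 0.07, 0.93, 0.90, 0.10, 0.70.]

Each edge gives the probability of a predicted tropism after the change, conditional on the originally predicted tropism:

$$P(T_s = \text{switch} \mid T_o = \text{original}) = \frac{P(T_s = \text{switch} \cap T_o = \text{original})}{P(T_o = \text{original})}$$

Here T_o denotes the tropism predicted for the original sequence and T_s the tropism predicted for the altered sequence.
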
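The conditional probability of a predicted tropism after a change, given the originally predicted tropism, can be estimated from paired labels. The sketch below assumes a hypothetical input of `(original_label, altered_label)` pairs, one per altered sequence. Dividing the joint count by the count of the original label is equivalent to dividing the joint probability by the marginal probability.

```python
from collections import Counter


def switch_probabilities(pairs):
    """Estimate P(altered label | original label) from label pairs."""
    pairs = list(pairs)
    joint = Counter(pairs)                        # counts of (original, altered)
    marginal = Counter(orig for orig, _ in pairs)  # counts of original label
    return {
        (orig, alt): n / marginal[orig]
        for (orig, alt), n in joint.items()
    }
```

For each original label, the returned probabilities sum to 1, matching the outgoing edges of one state in a transition diagram.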
In-silico Experiment 2: Results
Cut-offs: FPR < 20: X4, FPR ≥ 20: R5
[Figure: Bar chart of % switches (axis from 0 to 10) for six experiments: 1 Nucleotide Change, 2 Nucleotide Changes, 3 Nucleotide Changes, 1 Ambiguity Change, 2 Ambiguity Changes, 3 Ambiguity Changes. Two bars per experiment: change in predicted tropism to X4 and change in predicted tropism to R5.]
Each pair of bars is one experiment with 1000 sequences. One, two or three changes were introduced into each sequence at random. Changes were either nucleotides or ambiguity codes.

Conclusions
• Prescribing MVC (maraviroc) on the basis of genotypic tropism determination with geno2pheno [coreceptor] is safe (European coreceptor proficiency panel)
• Changes in predicted tropism from R5 to X4 are more frequent than changes from X4 to R5
• FPR shifts vary with nucleotide position
• Changes to unique nucleotides cause larger shifts than changes to ambiguity codes
• These results underline the importance of accurate base calling
