Predicting virus mutations through relational learning (AIMM 2012)

SLIDE 1

Outline: Intro | A Relational Learning Approach | HIV RT Drug Resistance | Learning from mutations | Learning from mutants | Conclusion

Predicting virus mutations through relational learning

AIMM 2012, September 9th, 2012

E Cilia (1), S Teso (2), S Ammendola (3), T Lenaerts (1), and A Passerini (2)

1 - Département d'Informatique, FS, Université Libre de Bruxelles
2 - Department of Computer Science and Information Engineering, University of Trento
3 - Ambiotec sas

SLIDE 2

Motivations

Mining relevant features from protein mutation data
- understanding the properties of functional sites
- developing novel proteins with useful/relevant function

Rational Design
- engineering technique modifying existing proteins by site-directed mutagenesis
- assumes knowledge (or intuition) about the effects of specific mutations
- involves extensive trial-and-error experiments
- also serves to improve understanding of protein function

SLIDE 3

Introduction

An artificial system mimicking rational design

Goal: to build an artificial system mimicking the rational design process.

A relational learning approach to:

1. mine rules from mutation data describing mutations relevant to a certain behavior
2. use the rules to infer novel mutations that may induce a similar behavior

SLIDE 4

A Relational Learning Approach

[Diagram with the labels: background knowledge; dataset of mutations / mutants; hypothesis; rank of novel relevant mutations]

SLIDE 5

Step 1: Relational Learning Phase

Learning in First Order Logic

Data D, background knowledge B and features induced during learning are represented in first order logic:

res_against(M,nnrti) ← mut(M,P) AND close_to_site(P)
(head: res_against(M,nnrti); body: mut(M,P) AND close_to_site(P))

Learning searches for a set of clauses (hypothesis) covering all or most positive examples, and none or few negative ones.

Advantages
- expressivity and interpretability of the learned model
- possibility to make use of specific background knowledge
- ability to learn rules from descriptions of complex, structured entities
- the learnt rules constrain the rational design space
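The notion of a clause covering examples can be illustrated with a minimal Python sketch. It is not the learning system used in the talk; the fact sets, mutation identifiers and positions below are hypothetical, chosen only to show how the example clause would be checked against positive and negative examples.

```python
# Toy facts; every identifier and position here is a hypothetical illustration.
mut = {("m1", 190), ("m2", 12), ("m3", 181)}   # mut(M, P): mutation M occurs at position P
close_to_site = {181, 190}                      # close_to_site(P)


def res_against_nnrti(m):
    """Body of the example clause: some P exists with mut(m, P) and close_to_site(P)."""
    return any(mid == m and pos in close_to_site for mid, pos in mut)


def coverage(clause, positives, negatives):
    """Return how many positive and how many negative examples the clause covers."""
    covered_pos = sum(1 for e in positives if clause(e))
    covered_neg = sum(1 for e in negatives if clause(e))
    return covered_pos, covered_neg


positives = ["m1", "m3"]   # assumed resistant examples
negatives = ["m2"]         # assumed non-resistant example
covered = coverage(res_against_nnrti, positives, negatives)
print(covered)
```

A good clause, in the sense of the slide, covers many positives and few negatives; a hypothesis is a set of such clauses.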

SLIDE 6

Step 2: Generative Phase

Mutation Generation Algorithm

Algorithm Mutation generation

input: background knowledge B, learned model H, k
output: rank of the most relevant mutations R

procedure GenerateMutations(B, H, k)
    Initialize DM ← ∅
    M ← find all mutations m that satisfy at least one clause ci ∈ H
    for m ∈ M do
        score ← SM(m)    ▷ number of clauses ci satisfied by m
        DM ← DM ∪ {(m, score)}
    end for
    R ← RankMuts(DM, B, H, k)    ▷ rank relevant mutations
    return R
end procedure
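The generation step can be sketched in Python under some loudly stated assumptions: candidate mutations are all single substitutions of a wild-type string, learned clauses are stood in for by plain Python predicates, and ranking is simply by descending score. The slides do not specify how RankMuts works, so the ordering used here is an assumption, and the names `generate_mutations` and `clauses` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def generate_mutations(wild_type, clauses, k):
    """Enumerate single substitutions, score each by the number of satisfied
    clauses, keep those satisfying at least one, and return the top k."""
    scored = []
    for pos, wt_aa in enumerate(wild_type, start=1):
        for new_aa in AMINO_ACIDS:
            if new_aa == wt_aa:
                continue
            m = (wt_aa, pos, new_aa)
            score = sum(1 for clause in clauses if clause(m))
            if score > 0:
                scored.append((m, score))
    # Assumed ranking: highest score first, ties broken by position and residue.
    scored.sort(key=lambda item: (-item[1], item[0][1], item[0][2]))
    return scored[:k]


# Hypothetical stand-ins for learned clauses.
clauses = [
    lambda m: m[1] == 2,                    # any mutation at position 2
    lambda m: m[1] == 2 and m[2] in "ST",   # ... towards S or T
]
top = generate_mutations("MKV", clauses, k=3)
print(top)
```

Scoring by the number of satisfied clauses mirrors the score SM(m) in the pseudocode; any background-knowledge filtering performed by the real system is left out of this sketch.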

SLIDE 7

HIV-1 RT Drug Resistance

Mining rules from HIV mutation data
- understand the virus adaptation mechanism
- design drugs that effectively counter potentially resistant mutants

Datasets

1. Reverse Transcriptase (RT) mutations from the Los Alamos National Laboratories HIV resistance database
   NRTI → 95 mutations
   NNRTI → 56 mutations
2. RT mutants from the Stanford HIV drug resistance database
   NRTI → 639 mutants
   NNRTI → 747 mutants

SLIDE 8

Learning settings

Learning from mutations

Mutation-based learning

Input examples: single amino-acid mutations conferring resistance to a class of drugs

aa(Pos,AA)
mut(MutationID,AA,Pos,AA1)

Target concept: a model (i.e. set of rules) describing a mutation conferring resistance to a certain class of drugs

res_against(MutationID,Drug)

Learning setting: learn from positive examples only (annotation on mutations NOT conferring resistance is scarce)

Output: generated resistance mutations

SLIDE 9

Background Knowledge

Background Knowledge Predicates (excerpt)

typeaa(T,AA)
same_type_aa(R1,R2,T)
same_type_mut_t(MutID,Pos,T)
close_to_site(Pos)
location(L,Pos)
catalytic_propensity(AA,CP)

(Betts and Russell, 2003)

Background Knowledge Rules (example)

same_type_aa(R1,R2,T) ← typeaa(T,R1) AND typeaa(T,R2)
different_type_mut_t(MutID,Pos) ← mut(MutID,R1,Pos,R2) AND NOT same_type_aa(R1,R2,T)
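The two example rules can be mirrored in Python as a rough sketch. The type table below is a small illustrative subset and an assumption, not the background knowledge actually used; the negation is read as "no type T is shared by both residues", which is one plausible reading of NOT with an unbound T.

```python
# Illustrative subset of amino-acid types (assumption, not the authors' table).
TYPEAA = {
    "aromatic": set("FWYH"),
    "aliphatic": set("ILV"),
    "negative": set("DE"),
    "positive": set("KRH"),
}


def same_type_aa(r1, r2, t):
    """same_type_aa(R1,R2,T) <- typeaa(T,R1) AND typeaa(T,R2)."""
    return r1 in TYPEAA[t] and r2 in TYPEAA[t]


def different_type_mut_t(mutation):
    """True when the wild-type and mutated residues share no type in TYPEAA."""
    _mut_id, r1, _pos, r2 = mutation
    return not any(same_type_aa(r1, r2, t) for t in TYPEAA)


# mut(MutID, R1, Pos, R2) facts; identifiers are hypothetical.
conservative = ("m1", "I", 50, "V")   # both aliphatic
radical = ("m2", "K", 103, "N")       # N has no type in this subset
print(different_type_mut_t(conservative), different_type_mut_t(radical))
```

With a fuller type table the second example could change, since the result depends entirely on which types are listed.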

SLIDE 10

Learned Hypothesis

Model for the resistance to NNRTI

>wt ...AGLKKKKSVTVLDVG...YQYMDDLYVG...WETWWTEY...WIPEWEFVN...
[Figure: wild-type sequence with markers at positions 98, 112, 181, 190, 398, 405, 410 and 418; the letters D, DD, W, W appear beneath the markers]

mut(A,B,C,D) AND position(C,190)
mut(A,B,C,D) AND position(C,190) AND typeaa(polar,D)
mut(A,y,C,D) AND typeaa(aliphatic,D)
mut(A,B,C,a) AND position(C,106)

SLIDE 11

Experimental Setting

- Aleph ILP system (one-class classification setting)
- 30 random training/test set splits (70/30) for each of the 2 learning tasks
- enrichment in the test mutations (recall)
- comparison against the random generator
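The evaluation protocol can be sketched as follows, assuming recall means the fraction of held-out test mutations that reappear among the generated ones. The helper names `split_70_30` and `recall` are hypothetical, and the rounding used for the 70/30 cut is an assumption.

```python
import random


def split_70_30(examples, rng):
    """Shuffle the examples and cut them into a 70% training and 30% test part."""
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def recall(generated, test_set):
    """Fraction of held-out test mutations found among the generated mutations."""
    test_set = set(test_set)
    if not test_set:
        return 0.0
    return len(test_set & set(generated)) / len(test_set)


examples = [f"m{i}" for i in range(10)]          # placeholder mutation identifiers
train, test = split_70_30(examples, random.Random(0))
generated = test[:2] + ["unseen"]                # pretend generator output
r = recall(generated, test)
print(len(train), len(test), r)
```

Repeating this over 30 random splits and averaging gives a mean recall; the same measure computed for randomly generated mutations provides the baseline for comparison.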

SLIDE 12

Experimental Results

Mean recall % on 30 splits

            Algorithm    Random Generator
NNRTI       86 •         58
NRTI        55 •         46

            Mean n. generated mutations    n. test mutations
NNRTI       5201                           17
NRTI        5548                           28

(•) significant improvement evaluated with a paired Wilcoxon test (α=0.01)

[Plot: mean recall % (0 to 100) against the number of satisfied clauses per generated mutation (1 to 10); series: NNRTI, NNRTI (rand), NRTI, NRTI (rand)]

SLIDE 13

Learning settings

Learning from mutants

Mutant-based learning

Input examples: mutants resistant or not to a class of drugs

aa(Pos,AA)
mut(MutantID,AA,Pos,AA1)

Target concept: a model (i.e. set of rules) describing a mutant resistant to a certain class of drugs

res_against(MutantID,Drug)

Learning setting: binary classification setting

Output: generated resistant mutants with a single amino acid mutation

SLIDE 14

Experimental Setting

- Aleph ILP system (binary classification setting)
- 30 random training/test set splits for each of the 2 learning tasks
- enrichment in test set mutations as performance measure (recall)
- comparison against the random generator

SLIDE 15

Experimental Results

Mean recall % on 30 splits

            Algorithm    Random Generator
NNRTI       17 •         1
NRTI        7 •          3

            Mean n. generated mutations    Mean n. test mutations
NNRTI       236                            26
NRTI        420                            40

(•) significant improvement evaluated with a paired Wilcoxon test (α=0.01)

[Plot: mean recall % (0 to 18) against the number of satisfied clauses per generated mutation (1 to 2); series: NNRTI, NNRTI (rand), NRTI, NRTI (rand)]

SLIDE 16

Results

NNRTI rules (excerpt)

res_against(A,nnrti) ← mut(A,B,C,D) AND position(C,177) AND catalytic_propensity(D,medium) AND same_type_mut_t(A,C,polar)
res_against(A,nnrti) ← mut(A,B,C,D) AND catalytic_propensity(D,high) AND typeaa(aromatic,B) AND same_type_aa(D,B,neutral)

NRTI rules (excerpt)

res_against(A,nrti) ← mut(A,B,C,D) AND position(C,33)
res_against(A,nrti) ← mut(A,B,C,r) AND typeaa(tiny,B) AND typeaa(polar,B)

NNRTI prediction highlights
- Identified resistance surveillance mutations (53%): 103N, 106A, 181C, 181I, 181V, 188C, 188H, 190A, 190E, 190S
- Other identified resistance mutations (29% of Dataset 1): 98G, 227C, 190C, 190Q, 190T, 190V
- Other identified mutations (from the literature): 238N
- Other key positions from the rules: 177
- Highly scored but not reported as resistance mutations: 181N, 181D, 318C, 232C

NRTI prediction highlights
- Identified resistance surveillance mutations (18%): 67E, 67G, 67N, 116Y, 184V, 184I
- Other identified resistance mutations (18% of Dataset 1): 44D, 62V, 67A, 67S, 69R, 184T
- Other identified mutations (from the literature): 219H
- Other key positions from the rules: 33, 194, 218

SLIDE 17

Summary

Relational learning approach mimicking the rational design process: HIV RT mutations/mutants
- we built a relational knowledge base
- we mined relevant relational features for modeling resistance mutations/mutants
- we generated candidate mutations satisfying the learned rules

Promising results, both in the mutation-based and in the mutant-based learning settings, suggest a potential in guiding mutant engineering or predicting virus evolution.

SLIDE 18

Future Work

Work in progress

- extend the background knowledge

  single_nucleotide_change(a,d).
  aamutations_single(R1,R2) ← mut(M,R1,P,R2) AND (single_nucleotide_change(R1,R2) OR single_nucleotide_change(R2,R1))

- post-processing involving mutant evaluation by statistical learning approaches and stability predictors, or search against HIV genome databases
- generalize the approach to jointly generate sets of related mutations (mutants with multiple mutations)
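One way to derive facts such as single_nucleotide_change(a,d) is to check, with the standard genetic code, whether some codon of the first amino acid differs from some codon of the second by exactly one nucleotide. The sketch below does this in Python; it is an illustration of the idea, not the authors' implementation, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def single_nucleotide_change(r1, r2):
    """True if some codon of r1 and some codon of r2 differ at exactly one base."""
    return any(
        sum(x != y for x, y in zip(c1, c2)) == 1
        for c1 in codons_for(r1.upper())
        for c2 in codons_for(r2.upper())
    )


print(single_nucleotide_change("a", "d"))  # alanine to aspartate, as in the fact above
```

Because the check is symmetric in its two arguments, a single call already covers both orderings that the aamutations_single rule tests with OR.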

SLIDE 19

Future Work

From single to multiple amino acid mutations

Observations
- multiple mutations are often required in order to affect protein function
- neutral network theory claims that neutral mutations are required as intermediate steps to effective ones (debated)

...EYIQAKVQM...LDNLLNIEVAY...
...EYIQAKVQM...LDNLLDIEVAY...
...EYIQAKVQM...LENLLDIEVAY...
...EYIQAKVQM...LENLLNIEVAY...

SLIDE 20

Future Work

From single to multiple amino acid mutations

Predicting multiple mutations
- predicting single mutations does not consider the joint effect of multiple mutations
- trying all possible combinations is computationally infeasible (and not enough data)

...EYIQAKVQM...LDNLLNIEVAY...
...EYIQAKVQM...LDNLLDIEVAY...
...EYIQAKVQM...LENLLDIEVAY...
...EYIQAKVQM...LENLLNIEVAY...
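The infeasibility of exhaustive enumeration follows from simple counting: a sequence of length L has C(L, k) * 19^k variants with exactly k substitutions. The short Python sketch below evaluates this; the sequence length of 560 is an assumed value used only for illustration.

```python
from math import comb


def count_mutants(length, k, alphabet=20):
    """Number of sequences differing from the wild type at exactly k positions."""
    return comb(length, k) * (alphabet - 1) ** k


ASSUMED_LENGTH = 560  # assumed protein length, for illustration only
for k in range(1, 5):
    print(k, count_mutants(ASSUMED_LENGTH, k))
```

The count grows by several orders of magnitude with each additional simultaneous mutation, which is why a rule-guided search is attractive.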

SLIDE 21

Predicting multiple mutations

SLIDE 22

Predicting multiple mutations

>m542   PISPIET   FAIKKKSSS   PLDKDFRKY   ELREHLLKWGFY   EIQKQGPGQWT   IVGAETF
>wt     PISPIET...FAIKKKDST...PLDEDFRKY...ELRQHLLRWGFT...EIQKQGQGQWT...IVGAETF
>m2012  PISPIET   FAIKKKDST   PLDESFRKY   KLREHLLRWGFT   EVQKQGPDQWT   IPGAETY
        *******   ******.*:   ***:.****   :**:***:***    *:**** .***   * ****:
Marked positions: 67, 123, 207, 334

mut(A,B,C,p),pos(C,334),correlated_mut(A,D,E),pos(D,207),typeaa(A,E,negative).

>m2006  PMSPIET   FAIKKKDST   PLHEDFRKY   ELREHLLKWGLT   EVQKQGPDQWT   IAGAETY
>wt     PISPIET...FAIKKKDST...PLDEDFRKY...ELRQHLLRWGFT...EIQKQGQGQWT...IVGAETF
>m1288  PISPIDT   FAIKKKNSD   PLDESFRKY   ELREHLLKWGFF   EIQKQGPGQWT   IPGAETY
        *:***:*   ******:*    ** *.****   ***:***:**:    *:**** .***   * ****:
Marked positions: 67, 121, 123, 207, 334

SLIDE 23

Predicting multiple mutations

>m542   PISPIET   FAIKKKSSS   PLDKDFRKY   ELREHLLKWGFY   EIQKQGPGQWT   IVGAETF
>wt     PISPIET...FAIKKKDST...PLDEDFRKY...ELRQHLLRWGFT...EIQKQGQGQWT...IVGAETF
>m2012  PISPIET   FAIKKKDST   PLDESFRKY   KLREHLLRWGFT   EVQKQGPDQWT   IPGAETY
        *******   ******.*:   ***:.****   :**:***:***    *:**** .***   * ****:
Marked positions: 67, 123, 207, 334

mut(A,B,C,p),pos(C,334),correlated_mut(A,D,E),pos(D,207),typeaa(A,E,negative).

Marked positions: 67, 122, 328

PISPIET...FAIKKKDST...PLDNDFRKY...ELREHLLRWGFT...EIQKQGPGQWT...IVGAETF
PISPIET...FAIKKKDST...PLDEDFRKY...ELRDHLLRWGFT...QIQKQGPGQWT...IVGAETF
PISPIET...FAIKKKSST...PLDEDFRKY...ELRDHLLRWGFT...EIQKQGPGQWT...IVGAETF

>m2006  PMSPIET   FAIKKKDST   PLHEDFRKY   ELREHLLKWGLT   EVQKQGPDQWT   IAGAETY
>wt     PISPIET...FAIKKKDST...PLDEDFRKY...ELRQHLLRWGFT...EIQKQGQGQWT...IVGAETF
>m1288  PISPIDT   FAIKKKNSD   PLDESFRKY   ELREHLLKWGFF   EIQKQGPGQWT   IPGAETY
        *:***:*   ******:*    ** *.****   ***:***:**:    *:**** .***   * ****:
Marked positions: 67, 121, 123, 207, 334

SLIDE 24

Thank you. Questions?

Elisa Cilia

ecilia@ulb.ac.be
