uncovering interactions with random forests



  1. Uncovering interactions with Random Forests Jake Michaelson Marit Ackermann Andreas Beyer

  2. Random Forests >> ensembles of decision trees >> diverse trees trying to solve the same problem >> used frequently for: >> prediction (knowledge of model less important) >> feature selection (prediction less important)

  3. RF interactions: prior art >> online official RF manual >> Lunetta, et al. (2004) >> Bureau, et al. (2005) >> pairwise permutation importance >> Mao and Mao (2008) >> Jiang, et al. (2009) >> selection with RF Gini importance, conventional (LM-based) interaction test (up to 3-way)

  4. a typical problem [figure: two panels plotting selection frequency (0.000 to 0.020) against predictors (0 to 1000)]

  5. a typical problem [same figure]

  6. a typical problem [same figure]

  7. a typical problem [same figure, with a "?" annotation]

  8. split symmetry [diagram: nodes labeled A and B]

  9. split asymmetry [diagram: nodes labeled A and B]

  10. testing split symmetry >> independence of predictors A and B: >> expect B as left daughter 50% of the time >> expect B as right daughter 50% of the time >> the prior (a beta density) is centered around 0.5
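The left/right daughter counts that the symmetry test relies on can be tallied with a short tree walk. This is a minimal sketch, not the authors' implementation: the tuple encoding of a tree (`(split_variable, left_subtree, right_subtree)`, with `None` for a leaf), the function name `count_daughters`, and the toy forest are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical tree encoding: (split_variable, left_subtree, right_subtree).
# None marks a terminal node (leaf).
def count_daughters(tree, counts=None):
    """Tally (parent, daughter, side) for every split whose daughter is also a split."""
    if counts is None:
        counts = Counter()
    if tree is None:
        return counts
    parent, left, right = tree
    for side, daughter in (("left", left), ("right", right)):
        if daughter is not None:
            counts[(parent, daughter[0], side)] += 1
            count_daughters(daughter, counts)
    return counts

# Toy forest of three small trees (illustrative only).
forest = [
    ("A", ("B", None, None), None),
    ("A", ("B", None, None), ("C", None, None)),
    ("A", None, ("B", None, None)),
]

counts = Counter()
for tree in forest:
    count_daughters(tree, counts)

ab_left = counts[("A", "B", "left")]    # times B split the left daughter of A
ab_right = counts[("A", "B", "right")]  # times B split the right daughter of A
print(ab_left, ab_right)
```

Under independence of A and B, the two counts would be expected to be roughly equal across a large forest.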

  11. testing split symmetry [figure: density (0 to 20) against proportion (0.0 to 1.0)]

  12. testing split symmetry >> we update the prior density parameters with the observed left/right daughter counts: >> a_posterior = a_prior + AB_left >> b_posterior = b_prior + AB_right >> ... and take the posterior/prior density ratio at 0.5 >> this is the Bayes factor
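The update and density ratio described on this slide can be sketched with the standard library alone. The prior parameters `a_prior = b_prior = 5` are an assumption (the slides only say the prior is a beta density centered around 0.5), and the function names are hypothetical. A density ratio at the point hypothesis is commonly known as the Savage-Dickey ratio.

```python
import math

def beta_logpdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)

def symmetry_bayes_factor(n_left, n_right, a_prior=5.0, b_prior=5.0):
    """Posterior/prior density ratio at 0.5 (prior parameters are assumed values)."""
    a_post = a_prior + n_left    # a_posterior = a_prior + AB_left
    b_post = b_prior + n_right   # b_posterior = b_prior + AB_right
    log_ratio = beta_logpdf(0.5, a_post, b_post) - beta_logpdf(0.5, a_prior, b_prior)
    return math.exp(log_ratio)

bf_balanced = symmetry_bayes_factor(50, 50)  # counts consistent with symmetry
bf_skewed = symmetry_bayes_factor(90, 10)    # strongly asymmetric counts
print(bf_balanced, bf_skewed)
```

Balanced counts concentrate the posterior at 0.5 and give a ratio above 1; skewed counts move mass away from 0.5 and give a ratio below 1.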

  13. testing split symmetry [figure: density (0 to 20) against proportion (0.0 to 1.0)]

  14. building a graph >> using the Bayes factor from each pair of predictors, we calculate the posterior probability of symmetry >> i.e. that the true proportion is 0.5 >> we use a high prior probability of the hypothesis (e.g. p_h = 0.999999)
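One standard way to turn a Bayes factor and a prior probability into a posterior probability is through odds: posterior odds = Bayes factor × prior odds. The sketch below assumes the Bayes factor is expressed in favor of symmetry; the function name is hypothetical, and the default prior is the example value from the slide.

```python
def posterior_symmetry_probability(bayes_factor, p_h=0.999999):
    """Posterior probability of symmetry from a Bayes factor in favor of symmetry."""
    prior_odds = p_h / (1.0 - p_h)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

p_neutral = posterior_symmetry_probability(1.0)   # no evidence: stays at the prior
p_strong = posterior_symmetry_probability(1e-9)   # strong evidence against symmetry
print(p_neutral, p_strong)
```

With a prior this close to 1, only pairs with very small Bayes factors end up with a low posterior probability of symmetry, which keeps the resulting graph sparse.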

  15. building a graph >> posterior probabilities:

          A      B      C      D
     A    1      0.001  0.001  0.3
     B    0.8    1      0.99   0.2
     C    0.99   0.3    1      0.003
     D    1      0.89   0.99   1

     >> adjacency matrix:

          A  B  C  D
     A    0  1  1  0
     B    0  0  0  0
     C    0  0  0  1
     D    0  0  0  0

     >> graph: nodes A, B, C, D with the edges given by the adjacency matrix (A, B), (A, C), (C, D)
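The step from posterior probabilities to an adjacency matrix on slide 15 can be sketched as a simple threshold. The cutoff of 0.01 is an assumption: the slide does not state one, but any value between 0.003 and 0.2 reproduces the adjacency matrix shown there.

```python
labels = ["A", "B", "C", "D"]

# Posterior probabilities of symmetry, as given on slide 15.
posterior = [
    [1.0, 0.001, 0.001, 0.3],
    [0.8, 1.0, 0.99, 0.2],
    [0.99, 0.3, 1.0, 0.003],
    [1.0, 0.89, 0.99, 1.0],
]

THRESHOLD = 0.01  # assumed cutoff, not stated in the slides

# An edge is drawn where the posterior probability of symmetry is very low.
adjacency = [
    [1 if i != j and p < THRESHOLD else 0 for j, p in enumerate(row)]
    for i, row in enumerate(posterior)
]

edges = [
    (labels[i], labels[j])
    for i in range(len(labels))
    for j in range(len(labels))
    if adjacency[i][j]
]
print(edges)
```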

  16. simulations >> 1000 binary predictor variables, 200 observations >> 3 - 4 predictors participate in true model >> tested ability of the method to recover the true topology of the simulated model >> recorded TP, FP while varying mtry and ntree
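A simulation of this shape, together with the TP/FP bookkeeping, might look like the sketch below. The generating model (an interaction between predictors 0 and 1 plus an independent effect of predictor 2), the noise level, and the function names are assumptions for illustration; the slides do not give the exact simulated models in text form.

```python
import random

def simulate(n_obs=200, n_pred=1000, seed=0):
    """Binary predictor matrix and a response driven by a hypothetical true model."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_pred)] for _ in range(n_obs)]
    # Assumed model: predictors 0 and 1 interact, predictor 2 acts independently.
    y = [row[0] * row[1] + row[2] + rng.gauss(0.0, 0.1) for row in X]
    return X, y

def score_edges(recovered, truth):
    """Count true-positive and false-positive edges against the simulated topology."""
    recovered, truth = set(recovered), set(truth)
    tp = len(recovered & truth)
    fp = len(recovered - truth)
    return tp, fp

X, y = simulate()
true_edges = {(0, 1)}                 # the only edge in the assumed model
recovered_edges = {(0, 1), (5, 7)}    # made-up output of an edge-recovery method
tp, fp = score_edges(recovered_edges, true_edges)
print(tp, fp)
```

Repeating the scoring over a grid of mtry and ntree values would give the kind of TP and FP summaries the following slides report.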

  17. test models >> 3 independent effects A, B, C (i.e. no edges) [figure: TP and FP panels over a grid of mtry and ntree values]

  18. test models >> 3-way unordered interaction among A, B, C [figure: TP and FP panels over a grid of mtry and ntree values]

  19. test models >> one main effect, one ordered 3-way interaction, one ordered 2-way interaction (nodes A, B, C, D) [figure: TP and FP panels over a grid of mtry and ntree values]

  20. test models >> two independent, ordered two-way interactions (nodes A, B, C, D) [figure: TP and FP panels over a grid of mtry and ntree values]

  21. real world >> Gabrb3 >> neurotransmitter receptor subunit >> absence (or misexpression) yields autism-like behavior >> what mechanisms influence Gabrb3 expression? Livet, et al. (2007)

  22. regulation of Gabrb3 >> grow an RF that regresses hippocampal Gabrb3 expression on the genotypes (m=3,794) of the same population of mice, then extract the interaction graph [figure: interaction graph with nodes gnf07.050.858, rs13479274, rs13479276, CEL.7_46763479, rs3693478, rs6166250, rs13481641, CEL.2_73370728, rs6375622, rs4220193, rs8274734, rs3164054, rs4221305, rs13478123]

  23. regulation of Gabrb3 [same text and graph, with locus label L1 added]

  24. regulation of Gabrb3 [same text and graph, with locus labels L1 and L2]

  25. regulation of Gabrb3 [same text and graph, with locus labels L1, L2, and L3]

  26. regulation of Gabrb3 [same text and labeled graph, plus a schematic labeled L1, L2, L3, genomic variation, Gabrb3 expression]

  27. regulation of Gabrb3 [same text, labeled graph, and schematic]

  28. regulation of Gabrb3 >> L1: Gabrb3 (cis effect) >> L2: Dscam (axon guidance) >> L3: Magi2 (synaptic scaffolding) [figure: interaction graph and schematic as on the previous slide]

  29. the context

  30. the context [figure labels: Dscam, Magi2, Gabrb3]

  31. conclusion >> (a)symmetry of transitions between subsequently selected variables can give us clues about the degree of dependence between them >> constructing a graph of these dependencies can illustrate the emergent dependency structure of the predictors in light of the response

  32. forthcoming... >> does this work for continuous and categorical predictors? >> what about correlated predictors? >> strategy for choosing optimal mtry and ntree?

  33. RF is an example of a tool that is useful in doing analyses of scientific data. But the cleverest algorithms are no substitute for human intelligence and knowledge of the data in the problem. Take the output of random forests not as absolute truth, but as smart computer generated guesses that may be helpful in leading to a deeper understanding of the problem. - Breiman & Cutler

  34. Thanks! jacob.michaelson@biotec.tu-dresden.de
