Balancing Selection and Beyond: Machine learning approaches for determining selection scenarios in a complex parameter space
Thursday, 14th February 2019
Kaileigh Ahlquist, Ramachandran Lab, Brown University
Balancing selection maintains malaria resistance and sickle cell anemia
[Figure: under malaria disease pressure, genotype AA is malaria susceptible, AT is malaria resistant, and TT causes sickle cell anemia]
Pauling, Linus, et al. 1949. Science
Ingram, V.M. 1957. Nature
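The heterozygote-advantage logic can be made concrete with a minimal deterministic sketch of viability selection at one biallelic locus, using the allele labels A and T from the slide. The fitness parameters s and t and the function names are hypothetical illustration choices, not estimates for the sickle cell locus. With fitnesses w_AA = 1 - s, w_AT = 1 and w_TT = 1 - t, neither allele is lost: the frequency of A settles at the stable equilibrium p* = t/(s + t).

```python
def next_freq(p: float, w_AA: float, w_AT: float, w_TT: float) -> float:
    """One generation of viability selection under random mating.

    p is the frequency of allele A; q = 1 - p is the frequency of allele T.
    """
    q = 1.0 - p
    mean_w = p * p * w_AA + 2 * p * q * w_AT + q * q * w_TT
    # AA survivors carry two copies of A, AT survivors carry one.
    return (p * p * w_AA + p * q * w_AT) / mean_w


def equilibrium(s: float, t: float) -> float:
    """Stable frequency of A when w_AA = 1 - s, w_AT = 1, w_TT = 1 - t (hypothetical values)."""
    return t / (s + t)


def iterate(p0: float, s: float, t: float, generations: int = 2000) -> float:
    """Apply the recursion repeatedly from starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, 1.0 - s, 1.0, 1.0 - t)
    return p


print(iterate(0.01, 0.1, 0.8), equilibrium(0.1, 0.8))
```

Starting from either a rare or a common A allele, the recursion approaches t/(s + t); with the illustrative s = 0.1 and t = 0.8 that is 0.8/0.9, about 0.889.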
Balancing selection creates diversity in self-incompatibility systems
[Figure: plant reproductive system; a plant carrying S-alleles S1 and S2 is incompatible with pollen carrying S1 or S2 and compatible with pollen carrying S3 or S4]
Uyenoyama, M.K., and E. Newbigin. 2000. Plant Cell
Kamau, Esther, and Deborah Charlesworth. 2005. Current Biology
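The compatibility rule in the figure can be written as a one-line check. This sketch assumes gametophytic self-incompatibility (the pollen's single S-allele is matched against both S-alleles of the pistil), which the slide does not specify; the function names and the four-plant population are hypothetical. The second function illustrates why rare S-alleles are favored: a pollen allele that few plants carry is rejected by few pistils.

```python
from typing import Iterable, Tuple

Genotype = Tuple[str, str]


def is_compatible(pollen_allele: str, pistil: Genotype) -> bool:
    """Assumed gametophytic rule: reject pollen whose S-allele matches either pistil allele."""
    return pollen_allele not in pistil


def compatible_fraction(pollen_allele: str, pistils: Iterable[Genotype]) -> float:
    """Share of plants in a population that this pollen allele can fertilize."""
    pistils = list(pistils)
    hits = sum(is_compatible(pollen_allele, g) for g in pistils)
    return hits / len(pistils)


# Slide example: an S1S2 plant rejects S1 and S2 pollen and accepts S3 and S4 pollen.
plant = ("S1", "S2")
for allele in ["S1", "S2", "S3", "S4"]:
    print(allele, "compatible" if is_compatible(allele, plant) else "incompatible")

# Hypothetical population in which S1 is common and S4 is absent.
population = [("S1", "S2"), ("S1", "S3"), ("S2", "S3"), ("S1", "S2")]
print(compatible_fraction("S1", population), compatible_fraction("S4", population))
```

In this toy population, S1 pollen can fertilize only a quarter of the plants while S4 pollen can fertilize all of them, which is the rare-allele advantage that keeps many S-alleles in circulation.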
Balanced sites in the genome are important, and there is potential for more discovery
Problem 1: Distinguishing multiple modes of selection
[Figure: human chromosome 6, Major Histocompatibility Complex (MHC)]
“The MHC is one of the most prominent examples of balancing selection in the vertebrate genome” (Lenz, Tobias L., et al. 2016. Molecular Biology and Evolution)
“…several alleles within the MHC region show evidence for recent selective sweeps.” (De Bakker, et al. 2006. Nature Genetics)
“Searching human and chimpanzee gene sequences for trans-specific polymorphism, we uncovered little evidence for long-term balancing selection” (Charlesworth, Deborah. 2006. PLoS Genetics)
There are multiple types of balancing selection
Problem 2: Identifying multiple types of balancing selection
[Figure: example of heterozygote advantage/overdominance; example of negative frequency-dependent selection]
Problem 3: Variable selection parameters and detection limits
Core problems in balancing selection
Problem 1: Distinguishing multiple modes of selection
• Example: positive selection, background selection, balancing selection
Problem 2: Identifying multiple types of balancing selection
• Example: overdominance, frequency-dependent selection
Problem 3: Variable selection parameters and detection limits
• Example: age of mutation, selection strength, overlapping events
Methods to detect balancing selection focus on older events and polymorphic sites
“…specifically tailored to uncover regions of long-term balancing selection” (Cheng, Xiaoheng, and Michael DeGiorgio. 2018. bioRxiv preprint)
NCD statistic
“…the new methods have limited power to detect young balanced polymorphisms” (DeGiorgio, Michael, Kirk E. Lohmueller, and Rasmus Nielsen. 2014. Molecular Biology and Evolution)
T1, T2 BALLET statistics
Methods to detect positive selection tend to focus on sweeps
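The slide names the NCD statistic without defining it. As we understand its NCD1 form, it is the root-mean-square distance between the minor-allele frequencies of the polymorphic sites in a window and a target frequency tf, so low values indicate frequencies clustered near tf, the pattern expected under long-term balancing selection. The sketch below follows that reading; the function name and the default tf = 0.5 are assumptions, and the original definition should be checked before relying on it.

```python
import numpy as np


def ncd1(freqs, tf=0.5):
    """NCD1-style statistic (sketch): RMS distance of minor-allele frequencies from tf.

    freqs holds allele frequencies for the sites in one genomic window.
    """
    f = np.asarray(freqs, dtype=float)
    f = np.minimum(f, 1.0 - f)   # fold each frequency to the minor allele frequency
    f = f[f > 0.0]               # keep only polymorphic (informative) sites
    if f.size == 0:
        return float("nan")      # undefined when the window has no polymorphic site
    return float(np.sqrt(np.mean((f - tf) ** 2)))


print(ncd1([0.5, 0.45, 0.55]))   # frequencies near 0.5: small value
print(ncd1([0.1, 0.9, 0.05]))    # frequencies far from 0.5: larger value
print(ncd1([0.0, 1.0]))          # no polymorphic sites: nan
```

Returning NaN for windows without polymorphic sites mirrors the "undefined" statistic values that a combined classifier has to tolerate.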
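Training and testing sites classified with one statistic versus combined statistics
Data | Known Balanced Site (training) | Known Sweep Site (training) | Unknown 1 (testing) | Unknown 2 (testing) | Unknown 3 (testing) | Unknown 4 (testing)
Balancing detection statistic | 91 | undefined | 89 | 77 | undefined | 27
Sweep detection statistic | 54 | 96 | 56 | 98 | 94 | 71
Classification: balancing statistic only | Balanced | ? | Balanced | Balanced | ? | ?
Classification: combined statistics | Balanced | Sweep | Balanced | Sweep | Sweep | ?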
SWIF(r) is a machine learning approach that uses multiple statistics, handles missing data, and can compare multiple selection scenarios
SWIF(r) = SWeep Inference Framework (controlling for correlation)
• “can be run without imputing undefined statistics”
• “explicitly learns pairwise joint distributions of selection statistics, which gives substantial gains in power”
• “computes the per-site calibrated probability of selective sweep, which is immediately interpretable and does not require comparison with a genome-wide distribution”
• …any selective phenomena with training examples
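The properties quoted above (pairwise joint distributions, no imputation of undefined statistics, per-site class probabilities) can be illustrated with a toy averaged one-dependence estimator (AODE), the family of classifier SWIF(r) builds on. This is not SWIF(r)'s code: single Gaussians stand in for its density estimates, the class ToyAODE and its methods are hypothetical, and undefined statistics are encoded as NaN and dropped from every density that involves them.

```python
import numpy as np


def _log_norm1(x, mean, var):
    # Log density of a univariate Gaussian.
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def _log_norm2(v, mean, cov):
    # Log density of a bivariate Gaussian.
    diff = v - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (2 * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


class ToyAODE:
    """Toy AODE with Gaussian densities; NaN statistics are skipped, never imputed."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes = sorted(set(y.tolist()))
        self.log_prior, self.uni, self.pair = {}, {}, {}
        d = X.shape[1]
        for c in self.classes:
            Xc = X[y == c]
            self.log_prior[c] = np.log(len(Xc) / len(X))
            for i in range(d):
                col = Xc[:, i][~np.isnan(Xc[:, i])]
                self.uni[c, i] = (col.mean(), col.var() + 1e-6)
                for j in range(d):
                    if j != i:
                        # Pairwise joint distribution of statistics i and j within class c.
                        xy = Xc[:, [i, j]]
                        xy = xy[~np.isnan(xy).any(axis=1)]
                        self.pair[c, i, j] = (xy.mean(axis=0), np.cov(xy.T) + 1e-6 * np.eye(2))
        return self

    def predict_proba(self, x):
        x = np.asarray(x, dtype=float)
        observed = [i for i in range(len(x)) if not np.isnan(x[i])]
        scores = {}
        for c in self.classes:
            if not observed:
                scores[c] = self.log_prior[c]
                continue
            terms = []
            for i in observed:  # each observed statistic takes a turn as the "parent"
                mean_i, var_i = self.uni[c, i]
                t = self.log_prior[c] + _log_norm1(x[i], mean_i, var_i)
                for j in observed:
                    if j != i:
                        mean, cov = self.pair[c, i, j]
                        # log P(x_j | x_i, c) = log P(x_i, x_j | c) - log P(x_i | c)
                        t += _log_norm2(x[[i, j]], mean, cov) - _log_norm1(x[i], mean[0], cov[0, 0])
                terms.append(t)
            m = max(terms)
            scores[c] = m + np.log(np.mean(np.exp(np.array(terms) - m)))  # log of the average
        top = max(scores.values())
        unnorm = {c: np.exp(s - top) for c, s in scores.items()}
        total = sum(unnorm.values())
        return {c: float(v / total) for c, v in unnorm.items()}
```

For example, after fitting on simulated "neutral" and "sweep" examples, `predict_proba([3.0, np.nan])` still returns a probability for every class even though the second statistic is undefined; averaging over parents is what lets each statistic condition the others without a full joint model.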
SWIF(r) joint distributions gain power over individual statistics
[Figure: frequency histograms of Statistic 1 and of Statistic 2 alone are hard to separate; the joint distribution of Statistic 1 and Statistic 2 separates the classes]
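The point of this slide, that two statistics can be hard to separate one at a time yet separable jointly, can be reproduced with synthetic data. In this sketch the two hypothetical classes share the same marginal distribution for each statistic but differ in the sign of their correlation, so no single-statistic threshold does much better than a coin flip, while the joint rule stat1 × stat2 > 0 separates them well. The data, seed and noise scale are illustrative only.

```python
import numpy as np


def best_single_threshold_accuracy(stat, labels):
    """Best accuracy from thresholding one statistic, trying both directions."""
    best = 0.0
    for thr in np.quantile(stat, np.linspace(0.01, 0.99, 99)):
        acc = np.mean((stat > thr) == labels)
        best = max(best, acc, 1.0 - acc)
    return best


rng = np.random.default_rng(1)
n = 5000
# Class A: positively correlated statistics; class B: negatively correlated.
s1_a = rng.normal(size=n)
s2_a = s1_a + 0.2 * rng.normal(size=n)
s1_b = rng.normal(size=n)
s2_b = -s1_b + 0.2 * rng.normal(size=n)

stat1 = np.concatenate([s1_a, s1_b])
stat2 = np.concatenate([s2_a, s2_b])
labels = np.concatenate([np.ones(n, dtype=bool), np.zeros(n, dtype=bool)])  # True = class A

acc1 = best_single_threshold_accuracy(stat1, labels)
acc2 = best_single_threshold_accuracy(stat2, labels)
joint_acc = np.mean((stat1 * stat2 > 0) == labels)  # uses both statistics at once
print(f"statistic 1 alone: {acc1:.3f}, statistic 2 alone: {acc2:.3f}, joint: {joint_acc:.3f}")
```

Real selection statistics are rarely this extreme, but the same effect is why learning pairwise joint distributions can add power over scoring each statistic separately.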
SWIF(r) is trained on simulated data, which provides many instances where the selection scenario is known
[Figure: simulation pipeline with a population experiencing selection, a neutrally evolving population, and a control neutrally evolving population]
Haller, Benjamin C., and Philipp W. Messer. 2017. “SLiM 2: Flexible, Interactive Forward Genetic Simulations.” Molecular Biology and Evolution
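The pipeline above uses SLiM 2. As a much smaller stand-in, this toy diploid Wright-Fisher simulation in Python produces labeled training examples from a neutrally evolving population and from a population under symmetric overdominance. The population size, number of generations, selection coefficient, function names, and the use of final allele frequency as the only "statistic" are hypothetical simplifications, not the settings used in this work.

```python
import numpy as np


def wright_fisher(p0, n_diploid, generations, w_AA, w_Aa, w_aa, rng):
    """Toy diploid Wright-Fisher model: deterministic viability selection,
    then binomial sampling of 2N gene copies each generation."""
    p = p0
    for _ in range(generations):
        q = 1.0 - p
        mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
        p_sel = (p * p * w_AA + p * q * w_Aa) / mean_w  # frequency after selection
        p = rng.binomial(2 * n_diploid, p_sel) / (2 * n_diploid)  # genetic drift
        if p in (0.0, 1.0):
            break  # allele lost or fixed
    return p


def simulate_training_set(n_reps, rng, n_diploid=500, generations=200, s=0.1):
    """Return (final allele frequency, label) pairs for two known scenarios."""
    examples = []
    for _ in range(n_reps):
        examples.append((wright_fisher(0.5, n_diploid, generations, 1.0, 1.0, 1.0, rng), "neutral"))
        examples.append((wright_fisher(0.5, n_diploid, generations, 1 - s, 1.0, 1 - s, rng), "overdominance"))
    return examples


rng = np.random.default_rng(2)
training = simulate_training_set(100, rng)
for label in ("neutral", "overdominance"):
    dev = [abs(p - 0.5) for p, lab in training if lab == label]
    print(label, "mean |p - 0.5| =", round(float(np.mean(dev)), 3))
```

Because every example carries the label of the scenario that generated it, a classifier can be trained on these records and later applied to sites whose history is unknown.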
SWIF(r) classification with multiple modes of selection
[Figure: panels showing pairwise joint distributions of Statistics 1, 2, 3 and 4]
SWIF(r) can usefully express ambiguity
Problem 1: Distinguishing multiple modes of selection
[Figure: heat map of true classification (Neutral, Sweep, Balanced) against SWIF(r) classification by highest probability; color scale from 0 to 100]
Example per-site probabilities of class “Sweep”: 0.472015870602, 0.524628895006, 0.661572464304, 0.62060257772, 0.577741418065, 0.58133184056, 0.509290518833, 0.537009355657, 0.599232903772, 0.773543482346, ...
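The heat map summarizes, for each true class, the percentage of simulated sites assigned to each class by highest probability, while the listed values show how close some "Sweep" probabilities sit to 0.5. Both summaries can be sketched as follows; the class order, the 0.6 ambiguity cutoff, and the function names are hypothetical.

```python
import numpy as np

CLASSES = ["Neutral", "Sweep", "Balanced"]  # assumed row/column order


def confusion_percent(true_idx, posteriors):
    """Rows: true class; columns: highest-probability class; each non-empty row sums to 100."""
    posteriors = np.asarray(posteriors, dtype=float)
    pred = np.argmax(posteriors, axis=1)  # ties go to the first class in CLASSES
    k = posteriors.shape[1]
    counts = np.zeros((k, k))
    np.add.at(counts, (np.asarray(true_idx), pred), 1)
    row_totals = counts.sum(axis=1, keepdims=True)
    return 100.0 * counts / np.where(row_totals == 0, 1, row_totals)


def ambiguous(posteriors, min_prob=0.6):
    """Flag sites whose best class probability falls below a hypothetical cutoff."""
    return np.asarray(posteriors, dtype=float).max(axis=1) < min_prob


# Hypothetical posteriors for four simulated sites (columns follow CLASSES).
post = np.array([[0.8, 0.1, 0.1],
                 [0.2, 0.7, 0.1],
                 [0.1, 0.45, 0.45],
                 [0.3, 0.3, 0.4]])
true = np.array([0, 1, 2, 2])
print(confusion_percent(true, post))
print(ambiguous(post))
```

Reporting the probabilities themselves, rather than only the argmax, keeps a 0.51 call distinguishable from a 0.95 call.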
Finding detection limits with similar selection scenarios
• Heterozygote advantage/overdominance: heterozygote fitness > homozygote fitness; 2 alleles present; stable at 50%
• Negative frequency-dependent selection: fitness adjusted in each generation depending on allele frequency
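Both scenarios above keep two alleles at an intermediate frequency, which is one reason they can be hard to tell apart. This deterministic sketch compares symmetric overdominance with negative frequency-dependent selection. The linear fitness rule for the frequency-dependent case, the value s = 0.2, and the function names are hypothetical, since the slide only says that fitness is adjusted each generation from allele frequency.

```python
def nfds_step(p, s=0.2):
    """Negative frequency dependence (hypothetical linear form): an allele's
    fitness drops as it becomes more common."""
    q = 1.0 - p
    w_A = 1.0 + s * (q - p)  # A is favored while it is rarer than a
    w_a = 1.0 + s * (p - q)
    return p * w_A / (p * w_A + q * w_a)


def overdominance_step(p, s=0.2):
    """Symmetric heterozygote advantage: w_AA = w_aa = 1 - s, w_Aa = 1."""
    q = 1.0 - p
    mean_w = p * p * (1 - s) + 2 * p * q + q * q * (1 - s)
    return (p * p * (1 - s) + p * q) / mean_w


def run(step, p0, generations=500):
    """Iterate one selection model from starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = step(p)
    return p


for step in (nfds_step, overdominance_step):
    print(step.__name__, run(step, 0.1), run(step, 0.9))
```

In this toy both models converge to 50%, so the equilibrium frequency alone cannot separate them; any distinguishing signal has to come from other features, such as how fast each scenario returns to equilibrium or how it shapes linked variation.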
SWIF(r) can identify ambiguity between similar modes of selection
[Figure: joint distributions for Neutral, Overdominance (b01) and Frequency-Dependent (b02) simulations; panels plot Statistic 2 against Statistic 1 and against Statistic 3]
SWIF(r) can identify ambiguity between similar modes of selection
Problem 2: Identifying multiple types of balancing selection
Problem 3: Variable selection parameters and detection limits
[Figure: heat map of true classification (Neutral, Overdominance, Frequency-Dependent) against SWIF(r) classification by highest probability (Neutral, O, FD); color scale from 0 to 100]
Addressing core problems in balancing selection
Problem 1: Distinguishing multiple modes of selection
• With SWIF(r), we can compare multiple modes of selection, even when some data are missing, and obtain a probability for each mode
Problem 2: Identifying multiple types of balancing selection
• With SWIF(r), we can measure different types of balancing selection and determine how similar or distinct they are
Problem 3: Variable selection parameters and detection limits
• We can use SWIF(r) to find detection limits as selection parameters vary
Applications and Future Directions
• Understanding ambiguity:
  • Allows us to assess claims made in the literature more accurately
  • Encourages targeted development of new methods
• Accurately classifying sites:
  • Identifies targets for experimentation, modification, or gene therapy
Acknowledgements
Sohini Ramachandran, Wei Cheng, Lauren Alpert Sugden, Priyanka Nakka, Katherine Brunson, Sahar Shahamatdar, Michael Turchin, Sam Smith
Committee Members: Mark Johnson, David Rand, Daniel Weinreich
Molecular Biology, Cell Biology and Biochemistry Graduate Program