Computational Systems Biology: Deep Learning in the Life Sciences (6.802 / 6.874 / 20.390 / 20.490 / HST.506). David Gifford, Lecture 19, April 16, 2020. Predicting genome editing outcomes with machine learning methods. http://mit6874.github.io
Poll warm-up: Where are you located today? How do you prefer to receive lectures?
Predicting the outcomes of genome editing • CRISPR (Cas9) genome editing in detail • Assays to detect off-target cutting • Machine learning models to predict off-target cutting • Discovering the necessary genome for Tdgf1 • Machine learning models of on-target cutting • The limitations of base editing
CRISPR (clustered regularly interspaced short palindromic repeats) editing mechanics
Cas9 nuclease engaged in cutting. The required PAM sequence (NGG for Cas9) limits the available cut sites, and a 17-24 nucleotide RNA spacer is complementary to the target DNA sequence.
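To make the PAM constraint concrete, here is a minimal sketch that scans the forward strand of a DNA string for NGG PAMs and reports the protospacer upstream of each one, together with the blunt cut position 3 bp 5' of the PAM, which is where SpCas9 typically cuts. The helper name `find_cas9_sites`, the fixed 20-nt spacer (one point in the 17-24 nt range above), and forward-strand-only scanning are illustrative assumptions; a real design tool would also scan the reverse complement.

```python
import re
from typing import List, NamedTuple


class CutSite(NamedTuple):
    protospacer: str  # DNA matched by the gRNA spacer (5'->3')
    pam: str          # NGG PAM immediately 3' of the protospacer
    cut: int          # index of the first base to the right of the blunt cut


def find_cas9_sites(seq: str, spacer_len: int = 20) -> List[CutSite]:
    """Hypothetical helper: list forward-strand SpCas9 sites with an NGG PAM."""
    seq = seq.upper()
    sites = []
    # Zero-width lookahead so overlapping PAMs (e.g. "AGGG") are all reported.
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = m.start()
        if pam_start < spacer_len:
            continue  # not enough upstream sequence for a full spacer
        protospacer = seq[pam_start - spacer_len:pam_start]
        # SpCas9 cuts 3 bp upstream (5') of the PAM.
        sites.append(CutSite(protospacer, m.group(1), pam_start - 3))
    return sites
```

For example, `find_cas9_sites("A" * 20 + "TGG" + "CCC")` returns a single site with PAM `TGG` and cut index 17.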
Genome cuts resolve in two ways (figure label: desired outcome sequence)
CRISPR is relevant as a therapeutic tool https://www.hindawi.com/journals/bmri/2019/1369682/
CRISPR derivatives can implement many functions
How well does CRISPR find its way to a specific site? Characterizing off-target effects with a genome-wide assay
GUIDE-seq incorporates a 34-bp phosphorothioated double-stranded DNA oligonucleotide (dsODN) into cut sites
GUIDE-seq identifies off-target cuts
CIRCLE-seq reveals CRISPR cut sites genome-wide https://www-nature-com/articles/nmeth.4278.pdf
CIRCLE-seq reveals CRISPR cut sites genome-wide (the arrow marks the intended cut site)
How can we predict the off-target activity of a CRISPR-based enzyme?
Recall our biological model: what features should we use to predict off-target effects? The required PAM sequence limits the available cut sites, and a 17-24 nucleotide RNA spacer is complementary to the target DNA sequence.
Example CRISPR features for off-target prediction • CROP-IT grades gRNA sequences by dividing the 23-bp target into three regions with different weights, and adds penalty scores for consecutive mismatched sites • CCTOP and the MIT score consider the positions and counts of mismatches • CFD (cutting frequency determination) evaluates a large number of single-base mismatch, deletion, and insertion variants of the gRNA and scores them relative to validated gRNAs in a cellular assay
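As an illustration of the mismatch-position idea behind CCTOP- and MIT-style scores, the sketch below computes a toy off-target score in which each gRNA/target mismatch multiplies the score by a penalty that grows toward the PAM-proximal end. The linear weight ramp and `max_penalty` value are hypothetical choices for illustration, not the published CCTOP, MIT, or CFD parameters.

```python
from typing import List


def mismatch_positions(guide: str, target: str) -> List[int]:
    """0-based positions (5'->3', PAM at the 3' end) where guide and target differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must be aligned and equal length")
    return [i for i, (g, t) in enumerate(zip(guide.upper(), target.upper())) if g != t]


def toy_offtarget_score(guide: str, target: str, max_penalty: float = 0.9) -> float:
    """Hypothetical score in [0, 1]; 1.0 means a perfect match.

    A mismatch at position i keeps a fraction (1 - w_i) of the score, where w_i
    rises linearly from 0 at the PAM-distal 5' end to max_penalty next to the PAM,
    mimicking the lower tolerance for PAM-proximal (seed) mismatches.
    """
    n = len(guide)
    score = 1.0
    for i in mismatch_positions(guide, target):
        w = max_penalty * i / (n - 1) if n > 1 else max_penalty
        score *= 1.0 - w
    return score
```

With this scheme a single mismatch far from the PAM barely lowers the score, while the same mismatch next to the PAM removes most of it.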
A deep neural network for classifying CRISPR recognition of a genomic site (figure: a convolutional layer with 10 filters each of size 4x1, 4x2, 4x3, and 4x5, followed by ReLU activation and an output layer)
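A minimal sketch of the first layer described in the figure: 10 filters each of width 1, 2, 3, and 5 sliding along a 4-row nucleotide encoding, followed by ReLU. The pair encoding (one-hot gRNA and one-hot target OR-ed per position, so a mismatch lights two channels) is an assumption, as is the random, untrained initialization; a usable classifier would also need downstream layers and trained weights.

```python
import random
from typing import Dict, List, Tuple

BASES = "ACGT"


def encode_pair(grna: str, target: str) -> List[List[int]]:
    """Encode an aligned gRNA/target pair as a 4 x L matrix (assumed OR encoding)."""
    if len(grna) != len(target):
        raise ValueError("sequences must be aligned and equal length")
    mat = [[0] * len(grna) for _ in BASES]
    for j, (a, b) in enumerate(zip(grna.upper(), target.upper())):
        mat[BASES.index(a)][j] = 1  # gRNA base
        mat[BASES.index(b)][j] = 1  # target base (same cell if they match)
    return mat


def conv_relu(x: List[List[int]], kernel: List[List[float]]) -> List[float]:
    """Slide a 4 x k kernel along the sequence (no padding), then apply ReLU."""
    rows, width, k = len(x), len(x[0]), len(kernel[0])
    out = []
    for start in range(width - k + 1):
        s = sum(kernel[r][c] * x[r][start + c] for r in range(rows) for c in range(k))
        out.append(max(0.0, s))
    return out


def conv_layer(x: List[List[int]], widths: Tuple[int, ...] = (1, 2, 3, 5),
               n_filters: int = 10, seed: int = 0) -> Dict[int, List[List[float]]]:
    """Apply n_filters random 4 x k filters for each width k (untrained, illustration only)."""
    rng = random.Random(seed)
    feats: Dict[int, List[List[float]]] = {}
    for k in widths:
        feats[k] = []
        for _ in range(n_filters):
            kernel = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in BASES]
            feats[k].append(conv_relu(x, kernel))
    return feats
```

For a 23-nt input (an assumed 20-nt protospacer plus PAM), each width-5 filter yields 19 activations.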
Performance of different architectures under 5-fold cross-validation on the CRISPOR dataset
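5-fold cross-validation partitions the labeled sites into five folds, trains on four, tests on the fifth, and rotates. A minimal stdlib sketch of that index bookkeeping, assuming plain random folds; grouping folds by gRNA, so near-identical off-target sites of one guide never straddle train and test, is a common refinement not shown here.

```python
import random
from typing import List, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and split them into k (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin keeps fold sizes within 1
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits
```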
Train on the CRISPOR dataset, test on GUIDE-seq
What bases are necessary for genome function?
Genome editing allows us to change the genome sequence and observe the function of each base in a selected cellular context
What are the key genomic elements that are necessary for gene expression? (figure labels: enhancer region, promoter region, coding sequence)
Necessary elements will depend upon cell type: the binding sites used by a given factor can depend upon cell type (here Tcf7l2). Binding sites change across time. (figure labels: ~50,000 binding sites for a typical TF; ~650,000 TF motifs)
An annotation of potential Tdgf1 cis-regulation (tracks: histone modifications, DNase I hypersensitivity, TF density). Predictions of regulatory function based on indirect epigenomic measurements
Idea: break parts of the genome to see what is essential for gene expression. Green fluorescent protein (GFP) lights up the cell when the gene is expressed (figure: Cell 1, Cell 2, ..., Cell n) - Native context measurement - High-throughput - Directly observe expression of the target gene via GFP - Controlled delivery of only 1 gRNA per cell
Idea: determine which parts of your computer are necessary for Zoom by breaking parts
Refinement: we need fine-grained resolution in what we break
We can use CRISPR genome editing to make localized genome alterations addressed by a guide RNA (gRNA) • often inefficient • efficient • designable • random indels • undesirable byproducts • highly heterogeneous • impractical beyond gene disruption • “genome vandalism”
Multiplexed Editing Regulatory Assay (MERA) experimental flow: 1. Put one gRNA that targets a location of interest into each cell 2. Use CRISPR to ablate the targeted location in each cell 3. Sort cells by GFP expression 4. Sequence the gRNAs in each population to determine which locations are necessary, and which are not, for GFP expression
Distribution of the log10 ratio of GFP-negative to bulk reads for all integrated gRNAs for Tdgf1
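gRNAs that destroy an element required for Tdgf1 expression should be over-represented among GFP-negative cells, so their log10 ratio is high. A minimal sketch of that statistic from read counts; normalizing each population by its library size and adding a pseudocount of 1 are assumptions for illustration, and the gRNA names in the usage example are made up.

```python
import math
from typing import Dict


def log10_enrichment(neg_counts: Dict[str, int], bulk_counts: Dict[str, int],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log10(GFP-negative read fraction / bulk read fraction) for each gRNA in bulk.

    Library-size normalization makes the two sorts comparable; the pseudocount
    keeps gRNAs with zero GFP-negative reads finite. Both are assumed choices.
    """
    neg_total = sum(neg_counts.values())
    bulk_total = sum(bulk_counts.values())
    ratios = {}
    for g in bulk_counts:
        neg = (neg_counts.get(g, 0) + pseudocount) / neg_total
        bulk = (bulk_counts[g] + pseudocount) / bulk_total
        ratios[g] = math.log10(neg / bulk)
    return ratios
```

For example, with `neg = {"g1": 99, "g2": 9}` and `bulk = {"g1": 9, "g2": 99}`, `g1` scores about +1 (enriched in GFP-negative cells) and `g2` about -1.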
MERA enables systematic identification of required cis-regulatory elements for Tdgf1
Testing of individual gRNAs supports the required cis-regulatory elements for Tdgf1
Necessary genome goes beyond known annotations (Tdgf1) (figure: genomic regions sorted by importance)
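One simple way to turn per-gRNA scores into a ranking of genomic regions is to bin gRNAs by cut position and average their scores per bin. The fixed window size and mean aggregation in this sketch are illustrative assumptions, not necessarily how the ranking in the figure was computed.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def rank_regions(guides: List[Tuple[int, float]], window: int = 500) -> List[Tuple[int, float]]:
    """Group gRNA scores into fixed windows by cut position and rank the windows.

    guides: (genomic cut position, per-gRNA enrichment score) pairs.
    Returns (window start, mean score), from most to least important.
    """
    bins: Dict[int, List[float]] = defaultdict(list)
    for pos, score in guides:
        bins[(pos // window) * window].append(score)
    return sorted(((start, mean(scores)) for start, scores in bins.items()),
                  key=lambda item: item[1], reverse=True)
```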
How can we predict the genotypes of on-target CRISPR cuts?
The state of CRISPR genome editing • often inefficient • efficient • designable • random indels • undesirable byproducts • highly heterogeneous • impractical beyond gene disruption • “genome vandalism”
The state of CRISPR genome editing • often inefficient • efficient • designable • predictable indels • predictable byproducts • can be homogeneous • practical: repair of pathogenic alleles to wild-type • “genome art”
High-throughput genome-integrated assay of Cas9-mediated DNA repair • 96 target sites in the largest previous study • Designed 1,872 target sites (55 bp each) based on the human genome • Observed 1,262 unique genotypes per target site. Cas9 editing without a homology template is predictable, can be precise, and can be practical for disease correction.
Cas9 primarily causes microhomology deletions in genome-integrated and endogenous settings (mESC). A microhomology deletion is a deletion with multiple equal-scoring alignments.
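The definition above can be checked by brute force: enumerate every deletion that touches the cut, group identical products, and keep those reachable from more than one start position. Such products are flanked by repeated sequence (microhomology), so the alignments tie. This is an illustrative sketch with hypothetical function names, not inDelphi's code.

```python
from collections import defaultdict
from typing import Dict, List


def deletion_genotypes(seq: str, cut: int, max_del: int) -> Dict[str, List[int]]:
    """Map each deletion product to every start position that produces it.

    Only deletions of length 1..max_del that touch the cut (between seq[cut-1]
    and seq[cut]) are considered, i.e. start <= cut <= start + length.
    """
    products: Dict[str, List[int]] = defaultdict(list)
    for length in range(1, max_del + 1):
        for start in range(max(0, cut - length), min(cut, len(seq) - length) + 1):
            products[seq[:start] + seq[start + length:]].append(start)
    return dict(products)


def microhomology_deletions(seq: str, cut: int, max_del: int) -> Dict[str, List[int]]:
    """Deletion products with more than one equal-scoring alignment."""
    return {g: s for g, s in deletion_genotypes(seq, cut, max_del).items() if len(s) > 1}
```

For `seq = "CCCAGTTAGCCC"` cut after `CCCAG`, the 2-bp microhomology `AG` makes the 4-bp deletion product `CCCAGCCC` reachable from starts 3, 4, and 5.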
Cas9 primarily causes microhomology deletions in genome-integrated and endogenous settings (mESC)
The majority of repair products arise from microhomology-mediated end-joining (MMEJ) (mESC)
inDelphi predicts 90% of repair products from 3 major repair classes (mESC)
1-bp insertions copy the adjacent nucleotide
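A minimal sketch of the templated 1-bp insertion product. We assume the copied base is the nucleotide immediately 5' of the cut (on the PAM-distal side for a forward-strand site); the helper name is hypothetical.

```python
def templated_insertion(seq: str, cut: int) -> str:
    """Predicted 1-bp insertion product: duplicate the base just 5' of the cut.

    Assumption: the inserted base copies seq[cut - 1], the nucleotide adjacent
    to the cut on its 5' side.
    """
    if not 0 < cut < len(seq):
        raise ValueError("cut must fall strictly inside the sequence")
    return seq[:cut] + seq[cut - 1] + seq[cut:]
```

For example, `templated_insertion("ACGTACGT", 4)` duplicates the `T` before the cut and returns `"ACGTTACGT"`.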
1-bp insertion frequency depends on local sequence context
inDelphi accurately predicts nearly all repair outcomes. Input: sequence and cut site • Predicts 90% of observed repair outcomes • 70% at single-base resolution • Training and testing on held-out cell types: median r = 0.87 on genotype prediction, median r = 0.84 on indel length prediction
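The "median r" figures summarize many per-site correlations. A minimal sketch of one way to compute such a number, assuming r is the Pearson correlation between predicted and observed genotype frequencies at each target site, with genotypes missing from one side counted as frequency 0, and the median taken over sites.

```python
import math
from statistics import median
from typing import Dict, List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def median_site_r(pred: Dict[str, Dict[str, float]],
                  obs: Dict[str, Dict[str, float]]) -> float:
    """Median over target sites of r between predicted and observed genotype frequencies."""
    rs = []
    for site, p in pred.items():
        genotypes = sorted(set(p) | set(obs[site]))
        rs.append(pearson_r([p.get(g, 0.0) for g in genotypes],
                            [obs[site].get(g, 0.0) for g in genotypes]))
    return median(rs)
```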
inDelphi accurately predicts frameshifts
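A frameshift occurs whenever the net indel length is not a multiple of 3, so a predicted indel-length distribution maps directly to a frameshift frequency. A minimal sketch, assuming the input holds only edited outcomes (no length-0 entry) keyed by signed indel length.

```python
from typing import Dict


def frameshift_fraction(indel_freqs: Dict[int, float]) -> float:
    """Fraction of edited outcomes whose indel length is not a multiple of 3.

    indel_freqs maps signed indel length (negative = deletion, positive =
    insertion) to predicted frequency; frequencies are renormalized to sum to 1.
    """
    total = sum(indel_freqs.values())
    shifted = sum(f for length, f in indel_freqs.items() if length % 3 != 0)
    return shifted / total
```

For example, `{-1: 0.5, -3: 0.25, 1: 0.25}` gives 0.75, because only the 3-bp deletion preserves the reading frame.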
Target sites yielding a single deletion repair genotype >50% of the time
Target sites yielding a single insertion repair genotype >50% of the time (figure labels: weak microhomology, local sequence context)
How can we use DNA cuts to restore function?
inDelphi predicts that 5% of gRNAs yield a single repair genotype the majority of the time
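"The majority of the time" means the top predicted genotype has frequency above 0.5. A minimal sketch of how a fraction like the 5% above could be tallied from per-gRNA predicted genotype distributions; the function name and the example gRNAs are hypothetical.

```python
from typing import Dict


def precise_fraction(predictions: Dict[str, Dict[str, float]], threshold: float = 0.5) -> float:
    """Fraction of gRNAs whose most frequent predicted genotype exceeds threshold."""
    precise = sum(1 for freqs in predictions.values() if max(freqs.values()) > threshold)
    return precise / len(predictions)
```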
Pathogenic microduplications are efficiently repairable to wild-type with simple Cas9 cutting
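Why cutting alone can correct a tandem microduplication: if Cas9 cuts between the two copies, the duplication itself is a perfect microhomology, so every deletion of the duplicated length that touches the cut yields the wild-type sequence. The sketch checks this by brute force. Placing the cut exactly between the copies is an assumption for illustration; in practice the cut position is fixed by where a PAM is available.

```python
from typing import List


def wild_type_alignments(prefix: str, dup: str, suffix: str) -> List[int]:
    """Start positions of len(dup)-bp deletions at the cut that restore wild type.

    Hypothetical setup: allele = prefix + dup + dup + suffix (a tandem
    microduplication) and Cas9 cuts between the two copies.
    """
    wild_type = prefix + dup + suffix
    allele = prefix + dup + dup + suffix
    cut = len(prefix) + len(dup)  # assumed cut between the two copies
    d = len(dup)
    # Deletions of length d that touch the cut: start <= cut <= start + d.
    starts = range(max(0, cut - d), min(cut, len(allele) - d) + 1)
    return [s for s in starts if allele[:s] + allele[s + d:] == wild_type]
```

For `prefix="GGA"`, `dup="TCA"`, `suffix="CCG"`, all four candidate alignments (starts 3 through 6) give the wild-type sequence `GGATCACCG`.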
Recommend
More recommend