
Genotype likelihoods
Anders Albrechtsen, The Bioinformatics Centre, Copenhagen University
February 13, 2018

Mapped reads
My definitions (the literature is not consistent):
• Depth: the number of reads that map to a position
• Counts: the number of each of the different alleles mapped to a position
• Coverage: the fraction of the genome (or region) with data
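To make the three definitions concrete, here is a small Python sketch over a toy pileup. The pileup data and the function names `depth`, `counts` and `coverage` are hypothetical illustrations, and coverage is taken here to mean the fraction of positions with at least one read.

```python
from collections import Counter

# Hypothetical toy pileup: for each position in a region, the bases of the
# reads that map there. An empty list means the position has no data.
pileup = [
    ["T", "C", "C", "T"],
    ["A", "A"],
    [],
    ["G"],
]

def depth(bases):
    """Depth: the number of reads that map to the position."""
    return len(bases)

def counts(bases):
    """Counts: how many times each allele (A, C, G, T) is observed."""
    c = Counter(bases)
    return {allele: c[allele] for allele in "ACGT"}

def coverage(pileup):
    """Coverage: fraction of positions in the region with at least one read."""
    covered = sum(1 for bases in pileup if len(bases) > 0)
    return covered / len(pileup)

print([depth(b) for b in pileup])  # [4, 2, 0, 1]
print(counts(pileup[0]))           # {'A': 0, 'C': 2, 'G': 0, 'T': 2}
print(coverage(pileup))            # 0.75
```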

Why don't we have genotypes?
This is not like Sanger sequencing:
• Sanger: both alleles are amplified and sequenced at the same time.
• NGS: each allele is sequenced separately, and the alleles are sampled with replacement.
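The sampling point can be illustrated with a small simulation. This is a sketch under simplifying assumptions: `sample_reads` is a hypothetical helper, each read comes from either allele with probability 1/2, and an error replaces the true base with one of the other three bases uniformly at random.

```python
import random

def sample_reads(genotype, depth, error_rate, rng):
    """Draw `depth` reads from a diploid genotype such as ("C", "T").

    Each read picks one of the two alleles uniformly at random (sampling
    with replacement). With probability `error_rate` the base is then
    replaced by one of the three other bases (an assumed error model).
    """
    reads = []
    for _ in range(depth):
        base = rng.choice(genotype)
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        reads.append(base)
    return "".join(reads)

rng = random.Random(1)
for _ in range(5):
    print(sample_reads(("C", "T"), depth=11, error_rate=0.01, rng=rng))
```

Because each read picks an allele at random, the number of Cs and Ts varies from one simulated sample to the next, which is why a handful of reads does not pin down the genotype.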

Why don't we have genotypes?
Question: assuming an error rate of 1%,
• Is the individual heterozygous C/T?

What do we expect?
P(2 or fewer minor bases | heterozygous) = 0.065
[Figure: probability of observing 0 to 11 Ts, assuming the individual is heterozygous]
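The 0.065 can be checked with a binomial calculation. A minimal sketch, assuming a depth of 11 (the x-axis runs from 0 to 11), no sequencing errors, and that "minor bases" means whichever of C or T is observed less often; `prob_at_most_k_minor_het` is a hypothetical name. Under these assumptions the result is 134/2048, about 0.0654.

```python
from math import comb

def prob_at_most_k_minor_het(depth, k):
    """P(the less frequent allele is seen at most k times | heterozygous).

    Assumes each read comes from either allele with probability 1/2 and
    ignores sequencing errors. For 2k < depth the events "allele 1 seen
    <= k times" and "allele 2 seen <= k times" are disjoint, so the
    one-sided binomial tail is simply doubled.
    """
    if 2 * k >= depth:
        raise ValueError("requires 2 * k < depth")
    one_tail = sum(comb(depth, j) for j in range(k + 1)) / 2**depth
    return 2 * one_tail

print(round(prob_at_most_k_minor_het(depth=11, k=2), 4))  # 0.0654
```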

What do we expect?
P(2 or more errors | homozygous) = 0.00015
[Figure: probability of observing 0 to 11 errors, assuming the individual is homozygous]
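The homozygous case is a binomial tail over the number of erroneous reads. The slide does not state the depth or error model behind 0.00015, so the depth of 11, the 1% error rate and the rule that any error counts are assumptions here, and the printed value need not equal the slide's; `prob_at_least_k_errors` is a hypothetical name.

```python
from math import comb

def prob_at_least_k_errors(depth, k, error_rate):
    """P(at least k of `depth` reads are sequencing errors | homozygous).

    Assumes each read is an error independently with probability
    `error_rate`, so the number of errors is binomial(depth, error_rate).
    """
    below = sum(
        comb(depth, j) * error_rate**j * (1 - error_rate) ** (depth - j)
        for j in range(k)
    )
    return 1 - below

print(prob_at_least_k_errors(depth=11, k=2, error_rate=0.01))
```

Passing `error_rate / 3` instead models only errors that turn the true base into one specific other base, which gives a smaller tail.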


Why don't we have genotypes?
Question: assuming an error rate of 1%,
• Is the individual heterozygous C/T?
• P(2 or more errors | homozygous) = 0.00015
• P(2 or fewer minor bases | heterozygous) = 0.065
• On average there is about 1 heterozygous site per 1000 bases
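The last bullet matters because the two probabilities have to be weighed against a prior. A rough sketch, treating the slide's two tail probabilities as if they were the likelihoods of the data under each hypothesis (an approximation) and comparing only heterozygous C/T against homozygous; `posterior_het` is a hypothetical name. Although the data are over 400 times more probable under heterozygosity, a prior of 1/1000 leaves a posterior of only about 0.3.

```python
def posterior_het(p_data_het, p_data_hom, prior_het):
    """Posterior probability of heterozygosity via Bayes' rule.

    Approximation: the tail probabilities are used in place of exact
    likelihoods, and only two hypotheses (het vs hom) are considered.
    """
    het = p_data_het * prior_het
    hom = p_data_hom * (1 - prior_het)
    return het / (het + hom)

# Values from the slide: 0.065, 0.00015 and about 1 het site per 1000 bases.
print(round(posterior_het(0.065, 0.00015, 1 / 1000), 3))  # 0.303
```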

Genotype likelihoods
Summarise the data in 10 genotype likelihoods, numbered by filling the upper triangle of the A/C/G/T matrix row by row:

     A   C   G   T
  A  1   2   3   4
  C      5   6   7
  G          8   9
  T             10

Bases (b): TCCTTTTTTTT
Quality scores (Q): GHSSBBTTTTG

The likelihood:
$$P(\text{Data} \mid G = \{A_1, A_2\}) \propto P(X \mid G = \{A_1, A_2\}) = P(X \mid G), \quad \text{where } A_1, A_2 \in \{A, C, G, T\}$$
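The numbering of the 10 genotypes follows the upper triangle of the matrix, so it can be generated rather than typed out. A short Python sketch that reproduces the same order (the `GENOTYPES` name is an illustration, not from the slides):

```python
from itertools import combinations_with_replacement

# The 10 unordered diploid genotypes, numbered 1..10 by walking the upper
# triangle of the A/C/G/T matrix row by row.
GENOTYPES = ["".join(g) for g in combinations_with_replacement("ACGT", 2)]

for index, genotype in enumerate(GENOTYPES, start=1):
    print(index, genotype)
```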

Estimating genotype likelihoods
GATK (McKenna et al. 2010):
$$P(X \mid G) \propto \prod_{i=1}^{n} P(b_i \mid A_1, A_2) = \prod_{i=1}^{n} \left( \frac{1}{2} P(b_i \mid A_1) + \frac{1}{2} P(b_i \mid A_2) \right)$$
where
$$P(b \mid A) = \begin{cases} \epsilon/3 & b \neq A \\ 1 - \epsilon & b = A \end{cases}$$
$G = \{A_1, A_2\}$, $n$ is the number of reads, $b$ is the observed base and $\epsilon$ is the probability of error from the quality score.
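The GATK-style formula translates directly into code. This is a minimal sketch of the equation as written on the slide, not GATK's implementation; the function names are hypothetical, and log-probabilities are summed instead of multiplying probabilities so that the product does not underflow at high depth.

```python
import math

def p_base_given_allele(b, allele, eps):
    """P(b | A): 1 - eps if the observed base matches the allele, else eps / 3."""
    return 1 - eps if b == allele else eps / 3

def log_genotype_likelihood(bases, eps_list, a1, a2):
    """log P(X | G = {a1, a2}) under the model on the slide.

    Each read is assumed to come from a1 or a2 with probability 1/2, and
    reads are treated as independent, so per-read log-probabilities add.
    """
    total = 0.0
    for b, eps in zip(bases, eps_list):
        p = 0.5 * p_base_given_allele(b, a1, eps) + 0.5 * p_base_given_allele(b, a2, eps)
        total += math.log(p)
    return total
```

For example, `log_genotype_likelihood("TT", [0.01, 0.01], "T", "T")` equals `2 * math.log(0.99)`.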

Example of genotype likelihood calculations

b  Q (ASCII)  Q score  ε        p(b_i | T)   p(b_i | C)   p(b_i | G or A)
T  G          38       0.00016  1 - 0.00016  5.3e-05      5.3e-05
C  H          39       0.00013  4.2e-05      1 - 0.00013  4.2e-05
C  S          50       1e-05    3.3e-06      1 - 1e-05    3.3e-06
T  S          50       1e-05    1 - 1e-05    3.3e-06      3.3e-06
T  B          33       5e-04    1 - 5e-04    0.00017      0.00017
T  B          33       5e-04    1 - 5e-04    0.00017      0.00017
T  T          51       7.9e-06  1 - 7.9e-06  2.6e-06      2.6e-06
T  T          51       7.9e-06  1 - 7.9e-06  2.6e-06      2.6e-06
T  T          51       7.9e-06  1 - 7.9e-06  2.6e-06      2.6e-06
T  T          51       7.9e-06  1 - 7.9e-06  2.6e-06      2.6e-06
T  G          38       0.00016  1 - 0.00016  5.3e-05      5.3e-05

$$P(\text{Data} \mid G = TC) \propto \prod_{i=1}^{n} P(b_i \mid T, C) = \prod_{i=1}^{n} \left( \frac{1}{2} P(b_i \mid T) + \frac{1}{2} P(b_i \mid C) \right)$$
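The Q score column is consistent with Phred+33 encoding (Q = ASCII code minus 33, e.g. `G` = 71 gives 38) and ε = 10^(-Q/10). A sketch that decodes the example qualities and compares three genotypes with the GATK-style formula; the helper names are hypothetical and the Phred+33 offset is inferred from the table.

```python
import math

BASES = "TCCTTTTTTTT"
QUALS = "GHSSBBTTTTG"

def phred33_to_error(qchar):
    """Decode a Phred+33 quality character into (Q score, error probability)."""
    q = ord(qchar) - 33  # offset inferred from the table (G -> 38)
    return q, 10 ** (-q / 10)

def log_likelihood(bases, eps_list, a1, a2):
    """log P(X | G = {a1, a2}) with P(b | A) = 1 - eps if b == A else eps / 3."""
    total = 0.0
    for b, eps in zip(bases, eps_list):
        p1 = 1 - eps if b == a1 else eps / 3
        p2 = 1 - eps if b == a2 else eps / 3
        total += math.log(0.5 * p1 + 0.5 * p2)
    return total

decoded = [phred33_to_error(c) for c in QUALS]
eps_list = [eps for _, eps in decoded]

for b, c, (q, eps) in zip(BASES, QUALS, decoded):
    print(b, c, q, f"{eps:.2g}")

lls = {g: log_likelihood(BASES, eps_list, g[0], g[1]) for g in ("TT", "CT", "CC")}
for genotype, ll in lls.items():
    print(genotype, round(ll, 2))
```

With two high-quality C reads among nine Ts, the heterozygote CT gets the largest log-likelihood of the three genotypes compared.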

Genotype likelihoods: other methods
• samtools (H. Li et al. 2008): quality scores, quality dependency
• soapSNP (R. Li et al. 2009): quality scores, quality dependency
• GATK (McKenna et al. 2010): quality scores
• Kim et al. 2010?: type-specific errors

Genotype calling
10 genotype likelihoods:

     A      C      G      T
  A  0.0    0.001  0.0    0.01
  C         0.02   0.001  0.12
  G                0.0    0.003
  T                       0.001

Simple genotype callers (maximum likelihood):
• ML I: choose the genotype with the largest likelihood, $\arg\max_G P(X \mid G)$.
• ML II: only call a genotype if its likelihood is much better than the second best, e.g. a likelihood ratio > 2.
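The two callers can be sketched in a few lines using the likelihoods from the example matrix. The function names and the choice to return `None` for "no call" are assumptions for illustration.

```python
# Genotype likelihoods from the example matrix (upper triangle, row by row).
LIKELIHOODS = {
    "AA": 0.0, "AC": 0.001, "AG": 0.0, "AT": 0.01,
    "CC": 0.02, "CG": 0.001, "CT": 0.12,
    "GG": 0.0, "GT": 0.003,
    "TT": 0.001,
}

def call_ml1(likelihoods):
    """ML I: return the genotype with the largest likelihood."""
    return max(likelihoods, key=likelihoods.get)

def call_ml2(likelihoods, min_ratio=2.0):
    """ML II: call the best genotype only if its likelihood is more than
    `min_ratio` times the second best; otherwise return None (no call)."""
    ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if likelihoods[best] == 0:
        return None  # no information at all
    if likelihoods[second] == 0 or likelihoods[best] / likelihoods[second] > min_ratio:
        return best
    return None

print(call_ml1(LIKELIHOODS))                # CT
print(call_ml2(LIKELIHOODS, min_ratio=2))   # CT  (0.12 / 0.02 = 6)
print(call_ml2(LIKELIHOODS, min_ratio=10))  # None
```

ML II trades call rate for accuracy: raising `min_ratio` leaves more sites uncalled but makes the remaining calls more reliable.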
