

  1. Identifying Biologically Relevant Amino Acids in Immunogenetic Studies
     Richard M. Single
     Department of Mathematics and Statistics, University of Vermont

  2. Outline
     • HLA background and nomenclature
     • Asymmetric Linkage Disequilibrium (ALD)
       – Motivation, Definition & Example
     • Amino acid level analyses of HLA disease associations
       – SFVT Analysis & Pairwise allele level analyses
       – Conditional Haplotype analyses & ALD
     • Identifying units of selection
       – ALD as a tool

  3. HLA molecules are cell-surface proteins that present peptide fragments to T-cells
     [Figure: HLA class I and HLA class II molecules, each presenting a peptide fragment to a TCR. TCR = T-cell receptor; β2-m = β2-microglobulin]
     • HLA molecules bind specific sets of peptides (based on structure)
     • Any given HLA allele encodes a molecule that presents only a subset of the available peptides to T-cells

  4. HLA Allele Nomenclature
     Example: HLA-A*24:02:01:02L
     – Locus: HLA-A
     – Field 1 ("2-digit"), 24: serological level (where possible)
     – Field 2 ("4-digit"), 02: peptide level (amino acid difference)
     – Field 3 ("6-digit"), 01: nucleotide level (silent [synonymous] substitutions)
     – Field 4 ("8-digit"), 02: intron level (3' or 5' polymorphism)
     – Expression suffix, L: N = null, L = low, S = soluble, …
     • For most analyses, we want to distinguish among unique peptide sequences, i.e., the 2-field ("4-digit") level
     • This level of resolution treats alleles with the same peptide sequence for exons 2 & 3 (class I) or exon 2 (class II) as equivalent ("binning" alleles)
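The field structure described on this slide can be sketched in a few lines of Python. This is only an illustration: the names `parse_allele` and `two_field` are hypothetical, the pattern handles colon-delimited names only, and dropping the expression suffix when binning is a simplification assumed here, not a rule stated in the deck.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical pattern: optional "HLA-" prefix, locus, "*", 1-4 colon-separated
# fields, and an optional single-letter expression suffix (the slide lists N, L, S).
_ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<locus>[A-Z0-9]+)\*"
    r"(?P<fields>\d+(?::\d+){0,3})"
    r"(?P<expr>[A-Z])?$"
)


class HlaAllele(NamedTuple):
    locus: str
    fields: Tuple[str, ...]
    expression: Optional[str]


def parse_allele(name: str) -> HlaAllele:
    """Split an allele name into locus, fields, and expression suffix."""
    m = _ALLELE_RE.match(name.replace(" ", ""))
    if m is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    return HlaAllele(m["locus"], tuple(m["fields"].split(":")), m["expr"])


def two_field(name: str) -> str:
    """Reduce an allele name to the 2-field (peptide) level.

    Assumption: the expression suffix is dropped, which may not be
    appropriate for null alleles in a real analysis.
    """
    allele = parse_allele(name)
    if len(allele.fields) < 2:
        raise ValueError(f"{name!r} has fewer than two fields")
    return f"{allele.locus}*{allele.fields[0]}:{allele.fields[1]}"


print(two_field("HLA-A*24:02:01:02L"))  # A*24:02
```

Binning at the 2-field level then amounts to mapping every typed allele through `two_field` before counting.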

  5. HLA Nomenclature and why it matters
     • Challenges for HLA data management and analysis
       – The HLA genes are very polymorphic;
       – HLA nomenclature is complicated;
       – There are multiple ways to generate HLA data;
       – All common typing systems generate ambiguous data;
       – There are multiple ways to report alleles and ambiguities.
     → These issues make meta-analyses of HLA data from different sources very difficult.

  6. Extending STREGA to Immunogenomic Studies
     • The STrengthening the REporting of Genetic Association studies (STREGA) statement provides community-based data reporting and analysis standards for genomic disease association studies
     • The IDAWG (immunogenomics.org) has proposed an extension of STREGA: STrengthening the REporting of Immunogenomic Studies (STREIS)

  7. From STREGA to STREIS
     Extensions to the STREGA guidelines for immunogenomic data include:
     • Describing the system(s) used to store, manage, and validate genotype and allele data
     • Documenting all methods applied to resolve ambiguity
     • Defining any codes used to represent ambiguities, e.g., NMDP codes
       – A*0201/0209/0266 = A*02AJEY
       – A*0201/0209/0266/0275/0289 = A*02BSFJ
     • Describing any binning or combining of alleles into common categories, e.g., G-codes
       – A*0201/0209/0243N/0266/0275/0283N/0289 = "A020101g"
     • Avoiding the use of subjective terms (e.g., "high-resolution typing") that may change over time
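As a rough illustration of what "defining any codes used to represent ambiguities" can look like in practice, the sketch below stores the code definitions explicitly alongside the data. The table holds only the two NMDP codes quoted on the slide; the names `AMBIGUITY_CODES`, `expand`, and `shared_alleles` are hypothetical, and a real study would load the full published code list instead.

```python
# Only the two codes quoted on the slide; not a complete code table.
AMBIGUITY_CODES = {
    "A*02AJEY": ["A*0201", "A*0209", "A*0266"],
    "A*02BSFJ": ["A*0201", "A*0209", "A*0266", "A*0275", "A*0289"],
}


def expand(code: str) -> list:
    """Return the alleles an ambiguity code stands for."""
    try:
        return list(AMBIGUITY_CODES[code])
    except KeyError:
        raise ValueError(f"undefined ambiguity code: {code!r}") from None


def shared_alleles(code_a: str, code_b: str) -> list:
    """Alleles consistent with both codes (sorted for stable output)."""
    return sorted(set(expand(code_a)) & set(expand(code_b)))


print(shared_alleles("A*02AJEY", "A*02BSFJ"))
# ['A*0201', 'A*0209', 'A*0266']
```

Raising an error on an undefined code, rather than passing it through, is one way to keep undocumented ambiguity codes from silently entering an analysis.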

  8. Resources for HLA Data Validation & Analysis
     • Immunology Database and Analysis Portal (www.ImmPort.org)
       Developed under the Bioinformatics Integration Support Contract (BISC) for NIH, NIAID, & DAIT (Division of Allergy, Immunology, and Transplantation)
       – Data validation pipeline
       – Analysis tools
       – Standardized ambiguity reduction tools
       – Data from a large number of immunogenomic studies
     • ImmunoGenomics Data Analysis Working Group (www.immunogenomics.org) (www.IgDAWG.org)
       An international collaborative group working to …
       – facilitate the sharing of immunogenomic data (HLA, KIR, etc.) and
       – foster consistent analysis and interpretation of immunogenomic data

  9. Outline
     • HLA background and nomenclature
     • Asymmetric Linkage Disequilibrium (ALD)
       – Motivation, Definition & Example
     • Amino acid level analyses of HLA disease associations
       – SFVT Analysis & Pairwise allele level analyses
       – Conditional Haplotype analyses & ALD
     • Identifying units of selection
       – ALD as a tool

  10. Asymmetric Linkage Disequilibrium (ALD)
      - Standard LD measures give an incomplete description of the correlation of genetic variation at two loci when there are different numbers of alleles at the loci.
      - We developed a pair of conditional asymmetric LD (ALD) measures that more accurately capture this information.
      - For disease association studies, the ALD can help to identify when stratification analyses can be applied to detect primary disease-predisposing genes.
      - For evolutionary studies, the ALD can be informative for the study of forces such as selection acting on individual amino acids, or other loci in high LD.
      - For SNP studies, ALD measures can be used for analyses of LD between haplotype blocks, for SNP–gene LD, and for haplotype block–gene LD.

  11. Linkage Disequilibrium (LD) Measures
      The two most common measures of the strength of LD are:
      (1) the normalized measure of the individual LD values, namely $D'_{ij} = D_{ij} / D_{max}$ (Lewontin 1964); and
      (2) the correlation coefficient $r$ for bi-allelic data, which is most often reported as $r^2 = D^2 / (p_{A1} p_{A2} p_{B1} p_{B2})$.
      $r = 1$ only when the allelic variation at the two loci shows 100% correlation.
      Their multi-allelic extensions, for loci with $I$ and $J$ alleles at frequencies $p_i$ and $q_j$, are:
      $$D' = \sum_{i=1}^{I} \sum_{j=1}^{J} p_i q_j \, |D'_{ij}|$$
      $$W_n = \left[ \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} D_{ij}^2 / (p_i q_j)}{\min(I-1,\, J-1)} \right]^{1/2} = \left[ \frac{X^2_{LD} / N}{\min(I-1,\, J-1)} \right]^{1/2}$$
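The two multi-allelic measures can be computed directly from a table of haplotype frequencies. The sketch below is a minimal illustration, not the author's software: `ld_measures` is a hypothetical name, the input is assumed to be a matrix whose entry `f[i][j]` is the frequency of the haplotype carrying allele i at the first locus and allele j at the second, every allele is assumed to have a frequency strictly between 0 and 1, and $D_{max}$ follows Lewontin's usual sign-dependent definition.

```python
from math import sqrt


def ld_measures(f):
    """Return (overall D', W_n) for a haplotype frequency matrix f.

    Assumes frequencies sum to 1 and every allele frequency is in (0, 1).
    """
    p = [sum(row) for row in f]          # allele frequencies, first locus
    q = [sum(col) for col in zip(*f)]    # allele frequencies, second locus
    d_prime = 0.0
    chi = 0.0                            # sum of D_ij^2 / (p_i q_j)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            d = f[i][j] - p_i * q_j
            # Lewontin's D_max depends on the sign of D_ij.
            if d >= 0:
                d_max = min(p_i * (1 - q_j), (1 - p_i) * q_j)
            else:
                d_max = min(p_i * q_j, (1 - p_i) * (1 - q_j))
            d_prime += p_i * q_j * abs(d) / d_max
            chi += d * d / (p_i * q_j)
    w_n = sqrt(chi / min(len(p) - 1, len(q) - 1))
    return d_prime, w_n


# Two bi-allelic loci in complete association: both measures equal 1.
print(ld_measures([[0.5, 0.0], [0.0, 0.5]]))  # (1.0, 1.0)
```

Both measures return a single number for the locus pair, which is the symmetry that the asymmetric measures introduced next are meant to relax.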

  12. Asymmetric LD measures: $W_{A/B}$ and $W_{B/A}$
      • When there are different numbers of alleles at two loci, the direct correlation property for the $r$ measure is not retained.
      • The asymmetric LD (ALD) measures more accurately reflect covariation at two loci.
        - $W_{A/B}$ and $W_{B/A}$ each describe the variation observed at the first-named locus conditioned on the second (e.g., $W_{A/B}$: locus A given locus B).
      • Example (two and three alleles at the A and B loci):
        $f(A_1 B_1) = 0.3$, $f(A_2 B_2) = 0.5$, $f(A_2 B_3) = 0.2$
        $W_n = 1$, $W_{A/B} = 1$ and $W_{B/A} = 0.73$
        - There is variation at the B locus on haplotypes containing the $A_2$ allele → there is not 100% correlation.
        - ALD measures indicate that, with appropriate sample size, stratification analyses could be carried out for some comparisons.
        - $W_n = 1$ could result in passing over these data for conditional analyses.
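The numbers in this example can be checked with a short script. It is a sketch under stated assumptions: the function name `ald` is hypothetical, and it uses the definition from Table 1 of this deck (Thomson and Single 2014), $W_{A/B}^2 = (F_{A/B} - F_A) / (1 - F_A)$ with $F_{A/B} = \sum_i \sum_j f_{ij}^2 / p_{Bj}$, applied to a haplotype frequency matrix with A alleles in rows and B alleles in columns.

```python
from math import sqrt


def ald(f):
    """Return (W_A/B, W_B/A) for haplotype frequencies f[i][j] = f(A_i B_j).

    Assumes frequencies sum to 1 and every allele has nonzero frequency.
    """
    rows, cols = range(len(f)), range(len(f[0]))
    p_a = [sum(row) for row in f]
    p_b = [sum(col) for col in zip(*f)]
    f_a = sum(p * p for p in p_a)        # single-locus homozygosity, locus A
    f_b = sum(p * p for p in p_b)        # single-locus homozygosity, locus B
    # Overall weighted haplotype-specific homozygosity values.
    f_a_given_b = sum(f[i][j] ** 2 / p_b[j] for i in rows for j in cols)
    f_b_given_a = sum(f[i][j] ** 2 / p_a[i] for i in rows for j in cols)
    # max(0, ...) guards against tiny negative values from rounding.
    w_ab = sqrt(max(0.0, (f_a_given_b - f_a) / (1 - f_a)))
    w_ba = sqrt(max(0.0, (f_b_given_a - f_b) / (1 - f_b)))
    return w_ab, w_ba


# Slide example: f(A1B1) = 0.3, f(A2B2) = 0.5, f(A2B3) = 0.2
w_ab, w_ba = ald([[0.3, 0.0, 0.0], [0.0, 0.5, 0.2]])
print(f"W_A/B = {w_ab:.2f}, W_B/A = {w_ba:.2f}")  # W_A/B = 1.00, W_B/A = 0.73
```

Here $W_{A/B} = 1$ because each B allele occurs with a single A allele, while $W_{B/A} < 1$ reflects the two B alleles found on $A_2$ haplotypes.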

  13. Standard LD measures D' and $W_n$
      Standard LD measures (overall D' & $W_n$) assume/force symmetry, even though with >2 alleles per locus that is not the case.
      Data Source: ImmPort Study SDY26: Identifying polymorphisms associated with risk for the development of myopericarditis following smallpox vaccine

  14. Asymmetric Linkage Disequilibrium (ALD)
      [Figure legend: ALD, row gene conditional on column gene]
      Interpretation:
      • ALD for HLA-DRB1 conditioning on HLA-DQA1: $W_{DRB1/DQA1} = 0.58$
        The overall variation for DRB1 is relatively high given specific DQA1 alleles.
      • ALD for HLA-DQA1 conditioning on HLA-DRB1: $W_{DQA1/DRB1} = 0.95$
        The overall variation for DQA1 is relatively low given specific DRB1 alleles.

  15. Asymmetric Linkage Disequilibrium (ALD)
      Table 1. Linkage disequilibrium and genetic diversity measures (Description: Definition of Measures)
      1. Single locus homozygosity ($F$): $F_A = \sum_i p_{Ai}^2$
      2. Haplotype specific homozygosity (HSF): $F_{A/Bj} = \sum_i (f_{ij} / p_{Bj})^2$
      3. Overall weighted HSF values, $F_{A/B}$ (and $F_{B/A}$): $F_{A/B} = \sum_j (F_{A/Bj})(p_{Bj}) = F_A + \sum_i \sum_j D_{ij}^2 / p_{Bj}$
      4. Multi-allelic ALD squared, $W_{A/B}^2$ (and $W_{B/A}^2$): $W_{A/B}^2 = (F_{A/B} - F_A) / (1 - F_A)$
      Thomson and Single (2014) Genetics

  16. Asymmetric Linkage Disequilibrium (ALD)
      Table 1. Linkage disequilibrium and genetic diversity measures (Description: Definition of Measures)
      1. Single locus homozygosity ($F$): $F_A = \sum_i p_{Ai}^2$
      2. Haplotype specific homozygosity (HSF): $F_{A/Bj} = \sum_i (f_{ij} / p_{Bj})^2$
      3. Overall weighted HSF values, $F_{A/B}$ (and $F_{B/A}$): $F_{A/B} = \sum_j (F_{A/Bj})(p_{Bj}) = F_A + \sum_i \sum_j D_{ij}^2 / p_{Bj}$
      4. Multi-allelic ALD squared, $W_{A/B}^2$ (and $W_{B/A}^2$): $W_{A/B}^2 = (F_{A/B} - F_A) / (1 - F_A)$
      If both loci are bi-allelic:
      $W_{A/B}^2 = [\sum_i \sum_j (D_{ij}^2 / p_{Bj})] / (1 - F_A) = D^2 / (p_{A1} p_{A2} p_{B1} p_{B2}) = r^2$, since $D_{11} = -D_{12} = -D_{21} = D_{22} = D$
      Thomson and Single (2014) Genetics
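The bi-allelic reduction stated on this slide skips two small steps, which can be filled in as follows. This is a worked sketch using only the slide's own symbols, together with $p_{A1} + p_{A2} = 1$ and $p_{B1} + p_{B2} = 1$:

```latex
\begin{aligned}
\sum_{i}\sum_{j} \frac{D_{ij}^{2}}{p_{Bj}}
  &= 2D^{2}\left(\frac{1}{p_{B1}} + \frac{1}{p_{B2}}\right)
   = \frac{2D^{2}}{p_{B1}\,p_{B2}}, \\
1 - F_A
  &= 1 - p_{A1}^{2} - p_{A2}^{2}
   = 2\,p_{A1}\,p_{A2}, \\
W_{A/B}^{2}
  &= \frac{2D^{2} / (p_{B1}\,p_{B2})}{2\,p_{A1}\,p_{A2}}
   = \frac{D^{2}}{p_{A1}\,p_{A2}\,p_{B1}\,p_{B2}}
   = r^{2}.
\end{aligned}
```

The same argument with the roles of A and B exchanged gives $W_{B/A}^2 = r^2$, so the two asymmetric measures coincide when both loci are bi-allelic.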

  17. Other Conditional Measures of LD
      • Other conditional measures of LD have been proposed (Nei and Li, 1980; Chakravarti et al., 1984; Hudson, 1985; Kaplan and Weir, 1992; Guo, 1997).
        - They measure association between alleles at a marker locus (locus B) and alleles at a disease locus (locus A).
        - They were developed to account for study designs in which individuals are not randomly sampled from a single population, but where sampling intensity varies within disease categories.
        - They are equivalent to Somers' D statistic defined on the contingency table relating two categorical variables.
      • In contrast, our statistic is a population-based measure that does not depend on a specific patient sampling scheme.

  18. ALD & tag-SNPs in the HLA region
      • de Bakker et al. (2006) identified tag-SNPs based on $r^2$ between SNPs and recoded HLA alleles (each specific HLA allele recoded as presence/absence)
      de Bakker et al. (2006) Nature Genetics
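The presence/absence recoding can be illustrated with a toy calculation. This is a sketch of the general idea only, not the procedure used in the cited paper: the function names are hypothetical, the haplotypes are assumed to be phased, and the eight haplotypes below are invented purely for illustration.

```python
def allele_indicator(hla_alleles, allele):
    """Recode a multi-allelic HLA locus as presence (1) / absence (0) of one allele."""
    return [1 if a == allele else 0 for a in hla_alleles]


def r_squared(x, y):
    """r^2 between two 0/1 variables observed on the same phased haplotypes."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    d = sum(a * b for a, b in zip(x, y)) / n - p_x * p_y
    return d * d / (p_x * (1 - p_x) * p_y * (1 - p_y))


# Invented toy data: one HLA allele and one SNP allele per phased haplotype.
hla = ["A*02:01", "A*02:01", "A*01:01", "A*03:01",
       "A*02:01", "A*01:01", "A*03:01", "A*02:01"]
snp = [1, 1, 0, 0, 1, 0, 0, 1]

for allele in sorted(set(hla)):
    print(allele, r_squared(allele_indicator(hla, allele), snp))
```

In this toy data the SNP tags A*02:01 perfectly, while its $r^2$ with each of the other two recoded alleles is lower, which shows how a single multi-allelic locus becomes several bi-allelic comparisons under this recoding.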

  19. ALD & tag-SNPs in the HLA region
      Thomson and Single (2014) Genetics
