Modeling cancer cells using multi-omics data

Modeling cancer cells using multi-omics data, February 23, 2019


  1. Modeling cancer cells using multi-omics data. February 23, 2019. The Second Korea-Japan Machine Learning Workshop. Sun Kim, Bioinformatics Institute, Computer Science and Engineering, Seoul National University. BHI lab @ SNU

  2. Cells, Genetic and Epigenetic Elements literature Bio & Health Informatics Lab, SNU

  3. Our Strategies • Understanding biological sequences • Capturing a bigger picture • Modeling complex relationships • Multi-omics data integration • Driver gene identification • Dynamics from time-series data analysis

  4. Today I will present three recently submitted (unpublished) manuscripts towards this goal: (1) sequence level: Ranked k-spectrum kernel for comparative and evolutionary comparison of exons, introns, and CpG islands; (2) transcript level: Cancer subtype classification and modeling by pathway attention and propagation (poster by Sangseon Lee); and (3) epigenetic level: PRISM: Methylation Pattern-based, Reference-free Inference of Subclonal Makeup (poster by Dohoon Lee).

  5. Ranked k-spectrum kernel for comparative and evolutionary comparison of exons, introns, and CpG islands Sangseon Lee, Taeheon Lee, Yung-Kyun Noh, and Sun Kim Bio & Health Informatics Lab.

  6. Cells, Genetic and Epigenetic Elements literature

  7. Introduction Motivation Methods & Materials Results Conclusion Biological sequence analysis • The heart of bioinformatics research • Alignment of biological sequences • Phylogeny • Gene prediction • Structure prediction

  8. Two types of methods for biological sequence analysis • Alignment-based methods • Smith-Waterman or BLAST • Used successfully because of their accuracy • Computationally expensive • Hard to apply to high-throughput sequencing data because of the variable lengths and large numbers of sequences • Alignment-free methods • Based on k-mer frequency vectors • Measure the distance between two vectors by Euclidean distance or Kullback-Leibler divergence → String kernel method
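The alignment-free idea on this slide can be illustrated with a minimal Python sketch: build k-mer frequency vectors and compare them by Euclidean distance or Kullback-Leibler divergence. The function names and the pseudocount (added so that the divergence never takes the logarithm of zero) are assumptions made for illustration, not part of the talk.

```python
from collections import Counter
from itertools import product
from math import log, sqrt


def kmer_frequencies(seq, k, alphabet="ACGT", pseudocount=1.0):
    """Return the k-mer frequency vector of seq in lexicographic k-mer order.

    The pseudocount is an illustrative assumption; it keeps every entry
    positive so that the KL divergence below is always defined.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] + pseudocount for m in kmers)
    return [(counts[m] + pseudocount) / total for m in kmers]


def euclidean(p, q):
    """Euclidean distance between two frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); asymmetric in its arguments."""
    return sum(a * log(a / b) for a, b in zip(p, q))


if __name__ == "__main__":
    p = kmer_frequencies("ACGTACGTACGT", 2)
    q = kmer_frequencies("AAAAACCCCCGG", 2)
    print(euclidean(p, q), kl_divergence(p, q))
```

Both measures compare fixed-length vectors, so sequences of different lengths can be compared without alignment.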

  9. K-mer based Alignment-free Sequence Analysis • Among others, two major issues: • Length of k-mer • Comparison of genome scale sequences

  10. DeepFam: Deep learning based alignment-free method for protein family modeling and prediction (ISMB 2018)

  11. String kernel method for comparative and evolutionary sequence comparison • Traditional string kernel method: k-spectrum kernel (Leslie, 2006) • Designed for protein sequence classification • Various extensions: mismatches, various k-mer lengths, and so on • Limited for comparative and evolutionary comparison of multiple species • It gives a pairwise distance between two genomes → combining many pairwise distances is not straightforward • The k-spectrum kernel is sensitive to over-represented k-mers • Proposal: a new string kernel method, the Ranked k-spectrum string (RKSS) kernel

  12. K-spectrum string kernel • On the input space $\mathcal{X}$ of all finite-length sequences of characters from an alphabet $\mathcal{A}$ with $|\mathcal{A}| = l$ ($l = 4$ for DNA), define a feature map from $\mathcal{X}$ to $\mathbb{R}^{l^k}$: $\Phi_k(x) = (\phi_\alpha(x))_{\alpha \in \mathcal{A}^k}$, where $\phi_\alpha(x)$ is the number of times $\alpha$ occurs in $x$ • K-spectrum string kernel: $K_k(x, y) = \langle \Phi_k(x), \Phi_k(y) \rangle$ • Kernel distance: $\tilde{K}_k(x, y) = \dfrac{K_k(x, y)}{\sqrt{K_k(x, x)\, K_k(y, y)}}$, $D_k(x, y) = \sqrt{\tilde{K}_k(x, x) + \tilde{K}_k(y, y) - 2\tilde{K}_k(x, y)}$
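A minimal Python sketch of the k-spectrum kernel and its kernel distance, as described on this slide. The function names are illustrative, and the square-root form of the normalization and distance follows the usual kernel-induced metric, which is assumed here rather than taken from the slide. Sequences shorter than k would give a zero kernel and are not handled.

```python
from collections import Counter
from math import sqrt


def spectrum(seq, k):
    """Sparse feature map: count of every k-mer that occurs in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """Inner product of the two k-mer count vectors."""
    fx, fy = spectrum(x, k), spectrum(y, k)
    return sum(count * fy[kmer] for kmer, count in fx.items())


def normalized_kernel(x, y, k):
    """Cosine-style normalization so that a sequence has kernel 1 with itself."""
    return spectrum_kernel(x, y, k) / sqrt(
        spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k)
    )


def kernel_distance(x, y, k):
    """Distance induced by the normalized kernel."""
    d2 = (normalized_kernel(x, x, k) + normalized_kernel(y, y, k)
          - 2 * normalized_kernel(x, y, k))
    return sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


if __name__ == "__main__":
    print(spectrum_kernel("AAAA", "AAAT", 2))
    print(kernel_distance("ACGTACGT", "ACGTTTGT", 3))
```

Because the kernel multiplies raw counts, a single highly repeated k-mer can dominate the inner product, which is the sensitivity to over-represented k-mers noted earlier in the talk.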

  13. Ranked k-spectrum string (RKSS) kernel • Two main features of the RKSS kernel • Build and use a common k-mer template (= landmark) to encapsulate information of common ancestry • Use correlation in the rank of k-mers instead of occurrence counts • RKSS kernel: $K_k^{\mathrm{Rank}}(x, y) = RC(\Phi_k^{\mathrm{common}}(x), \Phi_k^{\mathrm{common}}(y))$, where $RC$ is the Kendall tau rank correlation and $\Phi_k^{\mathrm{common}}$ is a feature map on the landmark • Kernel distance: $\tilde{K}_k^{\mathrm{Rank}}(x, y) = \dfrac{1 + K_k^{\mathrm{Rank}}(x, y)}{2}$, $\mathrm{dist}(x, y) = \sqrt{\tilde{K}_k^{\mathrm{Rank}}(x, x) + \tilde{K}_k^{\mathrm{Rank}}(y, y) - 2\tilde{K}_k^{\mathrm{Rank}}(x, y)}$, which depends only on $1 - \tilde{K}_k^{\mathrm{Rank}}(x, y)$ because $\tilde{K}_k^{\mathrm{Rank}}(x, x) = \tilde{K}_k^{\mathrm{Rank}}(y, y) = 1$
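A minimal Python sketch of the RKSS idea: restrict both sequences to a shared set of landmark k-mers and compare the rank order of their counts with the Kendall tau correlation. Several choices here are assumptions for illustration and may differ from the authors' method: the landmark is taken to be the k-mers present in every input sequence, ties are handled with the tau-b variant, and the distance uses the square-root form of the kernel-induced metric.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def spectrum(seq, k):
    """Count of every k-mer that occurs in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def landmark(seqs, k):
    """Hypothetical landmark: k-mers that occur in every sequence, sorted."""
    shared = set(spectrum(seqs[0], k))
    for s in seqs[1:]:
        shared &= set(spectrum(s, k))
    return sorted(shared)


def kendall_tau_b(a, b):
    """Kendall tau-b rank correlation of two equal-length count vectors."""
    s = nx = ny = 0
    for (a1, b1), (a2, b2) in combinations(zip(a, b), 2):
        dx = (a1 > a2) - (a1 < a2)  # sign of the difference in a
        dy = (b1 > b2) - (b1 < b2)  # sign of the difference in b
        s += dx * dy                # +1 concordant, -1 discordant, 0 tied
        nx += dx != 0
        ny += dy != 0
    if nx == 0 or ny == 0:
        raise ValueError("tau-b is undefined when one vector is constant")
    return s / sqrt(nx * ny)


def rkss_kernel(x, y, k, kmers):
    """Rank correlation of k-mer counts restricted to the landmark k-mers."""
    fx, fy = spectrum(x, k), spectrum(y, k)
    return kendall_tau_b([fx[m] for m in kmers], [fy[m] for m in kmers])


def rkss_distance(x, y, k, kmers):
    """Distance from the kernel rescaled from [-1, 1] to [0, 1]."""
    def scaled(u, v):
        return (1 + rkss_kernel(u, v, k, kmers)) / 2
    d2 = scaled(x, x) + scaled(y, y) - 2 * scaled(x, y)
    return sqrt(max(d2, 0.0))


if __name__ == "__main__":
    x, y = "AAACCG", "AACCCG"
    lm = landmark([x, y], 2)
    print(lm, rkss_kernel(x, y, 2, lm), rkss_distance(x, y, 2, lm))
```

Because only the ordering of counts matters, multiplying the count of one abundant k-mer by a large factor leaves the kernel unchanged as long as its rank stays the same.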

  14. Effect of rank information

  16. Phylogenetic tree reconstruction on exons, introns, and CpG islands

  17. Human CpG Island Sequences

  18. DNA Methylation http://www.ks.uiuc.edu/Research/methylation/

  19. DNA Methylation and Gene Silencing in Cancer Cells [Figure: methylation of CG sites in and around a CpG island, normal versus cancer cells; legend: C = cytosine, mC = methylcytosine]

  20. Measuring the information content of genomic regions using the landmark space

  21. Measuring the information content of genomic regions using the landmark space

  22. Conclusion • Using landmark and rank information of k-mers, we proposed a new string kernel method for comparative and evolutionary sequence comparison: the Ranked k-mer spectrum string (RKSS) kernel • From two landmark-based experiments: • We demonstrated the effectiveness of the RKSS kernel on the phylogeny reconstruction problem. • In addition, we found a relationship among the information content of exons, introns, and CpG islands. • In terms of evolutionary information, the three regions ranked as exon > CpG island > intron.

  23. Cancer subtype classification and modeling by pathway attention and propagation Sangseon Lee, Sangsoo Lim, Taeheon Lee, and Sun Kim

  24. Pathway: Prior knowledge for Bioinformatics Analysis • A graph-based representation of a biological system [Figure labels: Dimension Reduction; Enrichment Test; Subpath Mining; Pathway Activity Inference]

  25. Cancer subtype classification and modeling by pathway attention and propagation • What to do: • Model the mechanisms of cancer subtypes in terms of biological pathways • How to do this: • Graph convolutional network (GCN) modeling of each biological pathway • Integration of 287 GCNs using attention mechanisms • To open up the black box of the GCN ensemble model, we used a graph propagation technique to explain how pathways interact differently in cancer subtypes
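The two building blocks named on this slide, a graph convolution over one pathway and attention-based integration of pathway-level outputs, can be sketched in plain Python. This is only an illustration under stated assumptions: the layer uses the common symmetric normalization ReLU(D^-1/2 (A + I) D^-1/2 X W), the readout is a mean over genes, and the attention is a softmax over dot products with a hypothetical query vector. The talk does not specify any of these details, and a real model would learn the weights.

```python
from math import exp, sqrt


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj is a symmetric 0/1 adjacency matrix of one pathway graph,
    feats holds one feature row per gene, weight is the layer matrix.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]  # add self-loops
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(v, 0.0) for v in row] for row in out]


def mean_readout(h):
    """Summarize a pathway as the mean of its gene embeddings."""
    return [sum(col) / len(h) for col in zip(*h)]


def attention_pool(embeddings, query):
    """Softmax attention over pathway embeddings (query is hypothetical)."""
    scores = [sum(e * q for e, q in zip(emb, query)) for emb in embeddings]
    top = max(scores)
    exps = [exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * emb[d] for w, emb in zip(weights, embeddings))
              for d in range(len(query))]
    return weights, pooled


if __name__ == "__main__":
    adj = [[0, 1], [1, 0]]
    h = gcn_layer(adj, [[1.0], [3.0]], [[1.0]])
    weights, pooled = attention_pool([mean_readout(h), [0.5]], [1.0])
    print(h, weights, pooled)
```

The attention weights are what make the ensemble inspectable: each pathway receives an explicit, normalized weight rather than being mixed inside fully connected layers.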

  26. Two major points of modeling cancer subtypes with pathways [Figure: RNA-seq data and Pathway #1, Pathway #2, ..., Pathway #N; labels: Useful Information; A Comprehensive Biological Mechanism]

  27. Idea to address two challenges • Encoding Pathway Information • Graph Convolutional Network (GCN) • Pathway Aggregation with interactions • Open the "black box" using attention • Merging pathways by an MLP, i.e. fully connected layers, is hard to interpret: a "black box" • Solution: attention • Consider pathway interactions by network propagation

  28. What & How to Achieve • Goal • Modeling cancer subtypes while considering • a comprehensive biological mechanism • interaction between pathways • Input & Output • Input • Gene expression profile with cancer subtype • Biological pathways • Output • Classification of cancer subtypes • Importance of pathways with interaction information → Attention & Network propagation

  29. Workflow

  30. Biological Interpretation of Attention and Network propagation • Multi attention based ensemble (MAE) • Network propagation on patient-specific pathway network
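Network propagation on a pathway network can be illustrated with a random walk with restart, one common formulation. The talk does not state which propagation scheme was used, so the update rule, the restart probability, and the function name below are assumptions for illustration only.

```python
def propagate(adj, seed, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart: p <- (1 - restart) * W p + restart * p0.

    adj is a symmetric adjacency matrix of a pathway network, seed holds the
    initial score of each pathway (for example, attention weights), and W is
    the column-normalized adjacency matrix.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    total = sum(seed)
    p0 = [s / total for s in seed]  # normalize seeds to sum to 1
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with all initial weight on node 0.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(propagate(adj, [1.0, 0.0, 0.0]))
```

In this toy run, node 2 starts with zero score yet ends with a positive one because it is connected to a scored neighbor, which is the sense in which propagation can surface pathways that the initial scores alone would miss.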

  31. Dataset

  32. Performance comparison of the proposed model

  33. Heatmap of the attention weights of the GCN+MAE model on BRCA data.

  34. Pathway Network for BRCA subtypes by Network propagation [Figure labels: Natural killer cell mediated cytotoxicity; Rescued by Network propagation; Retrograde endocannabinoid signaling; Apoptosis; Proliferation; Neovascularization and angiogenesis; Metastasis formation; Autophagy]
