Bioinformatics pipeline for revealing tumour heterogeneity


  1. Bioinformatics pipeline for revealing tumour heterogeneity
     Mustafa Anıl Tuncel
     Department of Biosystems Science and Engineering | 10.07.19

  2. Mustafa Anıl Tuncel
     Software Engineer @ ETH Zürich
     Research interests
     § Data analysis workflows
     § Bioinformatics
     § Machine learning
     § Recommender systems
     anilbey /in/aniltuncel anilbey

  3. Outline
     § Background
     § Biology prior
     § Single cell sequencing technologies
     § Mutations on DNA
     § DNA mutation trees
     § Tree model
     § MCMC moves
     § Pipeline
     § Snakemake
     § HDF5

  4. What is a cell?
     Figure 1. Representation of cell, tissue, organ, system and organism. Retrieved from https://www.colscol.com/body-system/

  5. DNA from single cells
     Figure 2. DNA structure. Retrieved from https://www.interleucina.org/

  6. Structural mutations on DNA
     § Copy number variations
     § Deletion
     § Duplication
     § Mutations from DNA of single cells
     § Heterogeneous
     § Have ancestors, children, siblings

  7. Trees to represent structural mutations
     [Figure: a DNA strand divided into Region 1 to Region 5, and a tree whose nodes carry copy number events. Node labels: root; R1: +1; R2: -1, R5: +2; R4: -1; R3: +1; R1: +1; R3: -1.]
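A minimal sketch of how such an event tree could be represented in Python. The `Node` class, the `net_change` helper and the tree topology are assumptions made for illustration; the slide's figure layout is not available in this transcript, and this is not the author's implementation. Each node stores copy number changes per region, and the net change carried by a node is the sum of the events on the path from the root to that node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """A tree node holding copy number events (hypothetical structure)."""
    label: str
    events: Dict[str, int] = field(default_factory=dict)
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add_child(self, child: "Node") -> "Node":
        # Link the child to this node and return it for chaining.
        child.parent = self
        self.children.append(child)
        return child


def net_change(node: Node) -> Dict[str, int]:
    """Sum the events on the path from the root down to `node`."""
    total: Dict[str, int] = {}
    current: Optional[Node] = node
    while current is not None:
        for region, delta in current.events.items():
            total[region] = total.get(region, 0) + delta
        current = current.parent
    return total


# Hypothetical topology reusing event labels from the slide.
root = Node("root")
a = root.add_child(Node("a", {"R1": 1}))
b = a.add_child(Node("b", {"R2": -1, "R5": 2}))
c = b.add_child(Node("c", {"R1": 1}))

print(net_change(c))
```

In this sketch a cell attached to node `c` inherits every event of its ancestors, which mirrors the "ancestors, children, siblings" relationship described for mutations.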

  8. Learning the tree
     • Dirichlet-multinomial model with overdispersion
     • We aim to maximise the tree posterior with an MCMC scheme
     • Prune-reattach
     • Label swap
     • Add/remove events
     • Add/remove node
     • Condense/split node
     • Genotype preserving prune-reattach
     [Figure: example tree with node labels root; R1: +1; R2: -1, R5: +2; R4: -1; R3: +1; R1: +1; R3: -1.]
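For reference, the standard Dirichlet-multinomial log probability can be sketched as follows. This is the textbook form of the distribution, not necessarily the exact parameterisation used in the talk; the function name and arguments are assumptions. A smaller sum of the `alphas` gives more overdispersion relative to a plain multinomial.

```python
import math
from typing import Sequence


def dirichlet_multinomial_logpmf(counts: Sequence[int],
                                 alphas: Sequence[float]) -> float:
    """Log probability of `counts` under a Dirichlet-multinomial.

    counts: observed counts per category (for example, reads per region).
    alphas: positive concentration parameters, one per category.
    """
    if len(counts) != len(alphas):
        raise ValueError("counts and alphas must have the same length")
    n = sum(counts)
    alpha_sum = sum(alphas)
    # Multinomial coefficient and the Dirichlet normalisation terms.
    logp = math.lgamma(n + 1) + math.lgamma(alpha_sum) - math.lgamma(n + alpha_sum)
    for x, alpha in zip(counts, alphas):
        logp += math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
    return logp


# With two categories and alphas (1, 1) every split of n counts is equally likely.
example = dirichlet_multinomial_logpmf([1, 1], [1.0, 1.0])
print(math.exp(example))
```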

  9. Prune-reattach
     [Figure: "Before" and "After" example trees with copy number event labels on regions R1 to R5.]
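A prune-reattach move can be sketched as detaching a subtree and attaching it under a different parent. The code below is an illustrative assumption: the `Node` class, the helper names and the example topology are hypothetical and not taken from the talk. The only constraints enforced are that the root cannot be pruned and that the new parent must lie outside the pruned subtree.

```python
from typing import Dict, Iterator, List, Optional


class Node:
    """Minimal tree node with copy number events (hypothetical structure)."""

    def __init__(self, label: str, events: Optional[Dict[str, int]] = None) -> None:
        self.label = label
        self.events = events or {}
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def subtree(node: Node) -> Iterator[Node]:
    """Yield `node` and all of its descendants."""
    yield node
    for child in node.children:
        yield from subtree(child)


def prune_reattach(node: Node, new_parent: Node) -> None:
    """Detach `node` (with its subtree) and attach it under `new_parent`."""
    if node.parent is None:
        raise ValueError("the root cannot be pruned")
    if any(n is new_parent for n in subtree(node)):
        raise ValueError("new parent lies inside the pruned subtree")
    node.parent.children.remove(node)
    new_parent.add_child(node)


# Hypothetical example tree.
root = Node("root")
a = root.add_child(Node("a", {"R1": 1}))
b = a.add_child(Node("b", {"R2": -1, "R5": 2}))
c = root.add_child(Node("c", {"R4": -1}))

size_before = sum(1 for _ in subtree(root))
prune_reattach(b, c)  # move b from under a to under c
size_after = sum(1 for _ in subtree(root))
```

The move changes only the topology: the set of nodes and their event labels stays the same.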

  10. Add / remove node
      [Figure: "Before" and "After" example trees with copy number event labels on regions R1 to R5.]

  11. Condense / split node
      [Figure: "Before" and "After" example trees with copy number event labels on regions R1 to R5.]
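Each move proposes a new tree, which an MCMC scheme then accepts or rejects. A generic Metropolis acceptance step is sketched below under the simplifying assumption of symmetric proposals; moves that add or remove nodes would in general need a proposal correction term, and the talk does not say how this is handled. The function name and the scores are hypothetical.

```python
import math
import random


def metropolis_accept(current_log_score: float,
                      proposed_log_score: float,
                      rng: random.Random) -> bool:
    """Accept a proposed tree with probability min(1, exp(proposed - current)).

    Assumes symmetric proposals, so no Hastings correction is applied.
    """
    # 1 - random() lies in (0, 1], which keeps the logarithm finite.
    log_u = math.log(1.0 - rng.random())
    return log_u < proposed_log_score - current_log_score


rng = random.Random(0)
# A better-scoring proposal is always accepted under this rule.
print(metropolis_accept(-10.0, -5.0, rng))
```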

  12. Tree learned from mouse data

  13. What else is required?
      § Reproducibility in research
      § Scalability
      § Support for multiple programming languages
      § Multiprocessing
      § Cluster execution
      § Resource management
      § Statistics about resource usage

  14. Workflow management system

  15. Snakemake
      • A Pythonic workflow management system
      • Extends the Python syntax
      • Follows the GNU make paradigm
      • Workflows are defined in terms of rules that define how to create output files from input files
      • Dependencies between the rules are determined automatically
      • Benefits from Python libraries
      • Automated logging of the status
      • Suspend/resume workflow
      • A general-purpose workflow management system for any discipline
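The make paradigm mentioned above can be illustrated in plain Python. The sketch below is not Snakemake's API or its internal algorithm; it only shows the idea that, when each rule declares its inputs and output, the execution order for a target file follows from matching file names. The rule and file names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class Rule(NamedTuple):
    name: str
    inputs: List[str]
    output: str


def plan(target: str, rules: List[Rule], source_files: Set[str]) -> List[str]:
    """Return rule names in an order that produces `target`.

    Simplification: files in `source_files` are treated as already present,
    and no timestamps are compared as a real make-like tool would do.
    """
    by_output: Dict[str, Rule] = {rule.output: rule for rule in rules}
    order: List[str] = []
    visited: Set[str] = set()

    def visit(path: str) -> None:
        if path in visited:
            return
        visited.add(path)
        rule = by_output.get(path)
        if rule is None:
            if path not in source_files:
                raise FileNotFoundError(f"no rule produces {path!r}")
            return
        for dependency in rule.inputs:
            visit(dependency)  # build inputs before the rule itself
        order.append(rule.name)

    visit(target)
    return order


# Hypothetical read-mapping workflow.
rules = [
    Rule("call_variants", ["genome.fa", "sorted.bam"], "calls.vcf"),
    Rule("sort_reads", ["mapped.bam"], "sorted.bam"),
    Rule("map_reads", ["genome.fa", "sample.fastq"], "mapped.bam"),
]
steps = plan("calls.vcf", rules, {"genome.fa", "sample.fastq"})
print(steps)
```

The rules are listed in arbitrary order on purpose: the order of execution comes from the input and output declarations, not from the order in which the rules are written.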

  16. Example: read mapping

  17. Example: read mapping (generalised)

  18. DAG of jobs

  19. Snakefile

  20. Config file

  21. Cluster execution
      • Configurable for LSF/BSUB scheduler
      • Allows scaling without changing the workflow

  22. HDF5
      HDF = Hierarchical Data Format
      § Hierarchical data format v5
      § Binary files
      • Easy to manage multiple datasets
      • Keeps metadata with data
      • Fast I/O operations & storage space optimization (compressed binary files)
      • Platform/language independent
      • Self describing
      • No need to load whole data
      [Figure: an HDF5 dataset split into metadata and data. Metadata shown: Dataspace (Rank = 3; Dim_1 = 4, Dim_2 = 5, Dim_3 = 7), Datatype (Integer), Attributes (Time = 32.4, Pressure = 987, Temp = 56), Storage info (Chunked, Compressed).]

  23. HDF5 wrappers in Python
      h5py is a thin, pythonic wrapper around the HDF5
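A short h5py sketch of the properties listed for HDF5: metadata stored with the data, chunked and compressed storage, and reading a slice without loading the whole dataset. It assumes `h5py` and `numpy` are installed. The dataset name `measurements` and the file name are hypothetical; the shape and attribute values reuse the numbers from the HDF5 slide.

```python
import os
import tempfile

import h5py
import numpy as np

# A rank-3 integer dataset with dimensions 4 x 5 x 7.
data = np.arange(4 * 5 * 7, dtype=np.int32).reshape(4, 5, 7)
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    # Chunked, gzip-compressed storage.
    dset = f.create_dataset("measurements", data=data,
                            chunks=True, compression="gzip")
    # Attributes keep the metadata next to the data.
    dset.attrs["Time"] = 32.4
    dset.attrs["Pressure"] = 987
    dset.attrs["Temp"] = 56

with h5py.File(path, "r") as f:
    dset = f["measurements"]
    shape = dset.shape
    # Only the requested slice is read from disk.
    first_plane = dset[0, :, :]
    pressure = int(dset.attrs["Pressure"])

print(shape, first_plane.shape, pressure)
```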

  24. Outline
      § Background
      § Biology prior
      § Single cell sequencing technologies
      § Mutations on DNA
      § DNA mutation trees
      § Tree model
      § MCMC moves
      § Pipeline
      § Snakemake
      § HDF5
      Future work
      • Publish the method
      • Compare to clustering methods
      • Evaluate on simulated data
      • Show results on real data
      • Wrap up the workflow as a Python package
      • Do the C++ bindings
      • Open source it

  25. Thank you!
      Mustafa Anıl Tuncel
      Software Engineer, ETH Zurich
      anilbey /in/aniltuncel anilbey
      mtuncel@ethz.ch
