How to build a large-scale biological simulator CERN openlab summer student lecture Lukas Breitwieser
Help life scientists understand (patho)physiological processes Source: Kaiser, University of Newcastle, UK; www.dynamic-connectome.org
From atoms to organisms
Agent-based simulations
Platform
BioDynaMo design goals ● Modular system that supports different fields (e.g. neuroscience, oncology, immunology, ...) ● Support large-scale biological simulations ● Hide complexity of parallel and distributed computing ● Promote reproducibility of results
Is this a good research idea? ● Why is this question important? – Most ideas fail – Good ones take a long time to implement – → Terminate bad ideas quickly ● How to determine which ideas to pursue? – Heilmeier catechism (George H. Heilmeier) ● What are you trying to do? Articulate your objectives using absolutely no jargon. ● How is it done today, and what are the limits of current practice? ● What is new in your approach and why do you think it will be successful? Slide credit: Bill Dally; https://www.darpa.mil/work-with-us/heilmeier-catechism
How to answer all these questions? ● Look back ● Look up ● Look forward ● Look down [Figure: levels of transformation (Research Question, Computational Model, BioDynaMo, System Software, Hardware, Electrons) next to a timeline marked "Now"]
Look back ● Literature review ● How to review a research paper? – Summary ● What is the problem the paper is trying to solve? What are the key ideas of the paper? Key insights? ● What is the key contribution to literature at the time it was written? ● What are the most important things you take out from it? – Strengths – Weaknesses – Can you do better? – What have you learned/enjoyed/disliked in the Slide credit: Onur Mutlu
Look forward ● Research is a moving target [Figure label: "Aim here"] Inspired by: Bill Dally, Moving the needle
Look up ● What do users expect from the system? ● Which workflows are they used to? ● Which technologies are they familiar with? ● What kind of models will they run? [Figure: levels of transformation (Research Question, Computational Model, BioDynaMo, System Software, Hardware, Electrons)]
Look down ● Abstraction: A higher level only needs to know about the interface to the lower level, not how the lower level is implemented ● Then, why would you want to know what goes on underneath? – The program you wrote is running slow? – The program you wrote does not run correctly? – ... Source: Onur Mutlu; Andrzej Nowak; http://www.iue.tuwien.ac.at/phd/weinbub/dissertationsu16.html
Design tradeoffs
Software Engineering Best Practices
Testing & Continuous Integration ● Essential to keep code base maintainable – Refactoring ● Reduces the risk of “touching” others' code ● Protect reputation – Ensure that software installs fine on supported systems and demos work ● Continuous integration
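A minimal sketch of an automated test that a continuous integration job could run on every commit. The helper `divide_volume` is hypothetical and written only for this illustration; it is not part of BioDynaMo.

```python
import math


def divide_volume(mother_volume, ratio=1.0):
    """Hypothetical helper: split a mother cell's volume between two daughters.

    ratio = volume of daughter 1 / volume of daughter 2.
    """
    if mother_volume <= 0 or ratio <= 0:
        raise ValueError("volume and ratio must be positive")
    daughter2 = mother_volume / (1.0 + ratio)
    daughter1 = mother_volume - daughter2
    return daughter1, daughter2


def test_volume_is_conserved():
    # The two daughters together must have the mother's volume.
    d1, d2 = divide_volume(100.0, ratio=3.0)
    assert math.isclose(d1 + d2, 100.0)
    assert math.isclose(d1 / d2, 3.0)


def test_invalid_input_is_rejected():
    # Invalid input must fail loudly instead of producing nonsense.
    try:
        divide_volume(-1.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError")


if __name__ == "__main__":
    test_volume_is_conserved()
    test_invalid_input_is_rejected()
    print("all tests passed")
```

Tests like these are what make refactoring safe: if a later change breaks volume conservation, the failure shows up before the change is merged.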
Follow a styleguide ● Set of guidelines and best practices which improve readability and maintainability of a code base ● Code is more often read than (re)written → Important that a developer quickly understands a piece of code ● Use automation [Image: example of code to avoid] Source: https://www.reddit.com/r/badcode/comments/bjsdyc/my_teach_kees_getting_mad_that_i_never_properly
Use existing libraries ● Instead of copy-pasting code from a textbook or Stack Overflow – Correctness – Development effort – Maintenance effort ● Questions to answer before adopting a library – Is the license compatible? – Is it actively maintained? – Does it have an active user community? – How big is the library? – How many dependencies does it have?
Manage scope ● Lifecycle costs of applications over 10 years Slide credit: Dr. Marc Brandis
Advice on debugging ● Remove complexity ● Isolate the issue ● Avoid ad-hoc solutions; find the root cause ● "5 whys" example from Uber: – Why did the issue happen? → A bug was committed as part of the code. – Why did the bug not get caught by someone else? → The code reviewer did not notice that the code change could cause such an issue. – Why did we depend on only a code reviewer catching this bug? → Because we don't have an automated test for this use case. Source: https://blog.pragmaticengineer.com/operating-a-high-scale-distributed-system/
Refactor ● Simplify program while running all the tests ● Because – We all violate our own best practices from time to time. – A reliable, maintainable system is not built overnight. ● Enabled by testing and continuous integration
Some examples that need refactoring Source: https://www.reddit.com/r/badcode
BioDynaMo Implementation
BioDynaMo overview
BioDynaMo core concepts ● Simulation algorithm ● Simulation objects: Cell, NeuronSoma, NeuriteElement ● Local neighborhood ● Biology modules: Move, Divide, Grow, Secrete substance into extracellular matrix ● Event [Figure: cell division event; mother UID 123, 1st daughter UID 123, 2nd daughter UID 456; table of biology modules with columns "Copy to new" and "Remove from existing"]
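The concepts on this slide can be sketched in a few lines. This is an illustration only, not BioDynaMo's API: the class names `Cell`, `GrowDivide` and `Simulation`, their parameters, and the rule that the mother keeps its UID while the new daughter gets a fresh one are assumptions made for this example.

```python
import itertools

_uid_counter = itertools.count()  # source of unique ids (assumption)


class GrowDivide:
    """Hypothetical biology module: grow each step, divide above a threshold."""

    def __init__(self, growth=10.0, threshold=100.0):
        self.growth = growth
        self.threshold = threshold

    def run(self, cell, simulation):
        cell.volume += self.growth
        if cell.volume >= self.threshold:
            simulation.add(cell.divide())


class Cell:
    """Simulation object with a unique id and attached biology modules."""

    def __init__(self, volume, modules=()):
        self.uid = next(_uid_counter)
        self.volume = volume
        self.modules = list(modules)  # shallow copy of the module list

    def divide(self):
        """Cell division event: halve the volume and return the new daughter.

        The existing cell keeps its uid; the daughter gets a new uid and
        the same (stateless) modules.
        """
        self.volume /= 2.0
        return Cell(self.volume, modules=self.modules)


class Simulation:
    """Simulation algorithm: run every module of every object once per step."""

    def __init__(self, cells):
        self.cells = list(cells)
        self._new = []

    def add(self, cell):
        # New objects are queued and become visible after the current step.
        self._new.append(cell)

    def step(self):
        for cell in self.cells:
            for module in cell.modules:
                module.run(cell, self)
        self.cells.extend(self._new)
        self._new = []


if __name__ == "__main__":
    sim = Simulation([Cell(95.0, [GrowDivide()])])
    sim.step()
    for c in sim.cells:
        print(c.uid, c.volume)
```

Queuing new cells until the end of the step keeps the list stable while it is being iterated, which is one simple way to avoid order-dependent results.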
Simulation objects
Spatial organization Source: Ahmad Hesam
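Agents interact only with their local neighborhood, so the simulator needs a fast way to find nearby objects. The slide does not name the data structure; the sketch below assumes a uniform grid whose box length equals the interaction radius, so that only the 27 surrounding boxes have to be searched. The function names `build_grid` and `neighbors` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def box_of(position, box):
    """Integer grid coordinates of the box that contains `position`."""
    return tuple(int(math.floor(c / box)) for c in position)


def build_grid(positions, box):
    """Map each grid box to the indices of the objects inside it."""
    grid = defaultdict(list)
    for idx, p in enumerate(positions):
        grid[box_of(p, box)].append(idx)
    return grid


def neighbors(idx, positions, grid, box):
    """Indices of all objects within distance `box` of positions[idx]."""
    p = positions[idx]
    key = box_of(p, box)
    found = []
    # Only the object's own box and the 26 adjacent boxes can hold neighbors.
    for offset in product((-1, 0, 1), repeat=3):
        cell = tuple(k + o for k, o in zip(key, offset))
        for j in grid.get(cell, ()):
            if j != idx and math.dist(p, positions[j]) <= box:
                found.append(j)
    return sorted(found)


if __name__ == "__main__":
    pts = [(0.5, 0.5, 0.5), (1.2, 0.5, 0.5), (5.0, 5.0, 5.0), (-0.3, 0.5, 0.5)]
    grid = build_grid(pts, box=1.0)
    print(neighbors(0, pts, grid, box=1.0))
```

Compared with testing every pair of objects, the grid limits the work per object to the contents of a fixed number of boxes.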
Biological behavior
Physical processes ● Mechanical interactions ● Diffusion
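As an illustration of the diffusion part, here is a minimal one-dimensional sketch using an explicit finite-difference step for dc/dt = D * d²c/dx². The slide does not state which numerical scheme is used, so the scheme, the closed (zero-flux) boundaries, and the name `diffuse_step` are assumptions for this example.

```python
def diffuse_step(conc, d, dt, dx):
    """One explicit (forward-time, central-space) diffusion step.

    conc: list of concentrations on a 1D grid with spacing dx
    d:    diffusion coefficient, dt: time step
    Boundaries are closed: no substance leaves the domain.
    """
    alpha = d * dt / (dx * dx)
    if alpha > 0.5:
        # The explicit scheme is only stable for D*dt/dx^2 <= 0.5.
        raise ValueError("unstable: need D*dt/dx^2 <= 0.5")
    n = len(conc)
    new = [0.0] * n
    for i in range(n):
        # Mirror the edge value to get zero flux at the boundary.
        left = conc[i - 1] if i > 0 else conc[i]
        right = conc[i + 1] if i < n - 1 else conc[i]
        new[i] = conc[i] + alpha * (left - 2.0 * conc[i] + right)
    return new


if __name__ == "__main__":
    c = [0.0, 0.0, 1.0, 0.0, 0.0]
    for _ in range(3):
        c = diffuse_step(c, d=1.0, dt=0.25, dx=1.0)
        print(c)
```

The stability check matters in practice: choosing a time step that is too large for the grid spacing makes the explicit scheme oscillate and diverge.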
Performance ● Minimize serial part of the application – Amdahl’s law https://en.wikipedia.org/wiki/Amdahl%27s_law ● Load balance ● Optimize data access patterns ● Avoid unnecessary data movement ● Minimize synchronization ● Use caches ● Pitfalls when measuring performance http://htor.inf.ethz.ch/publications/img/hoefler-scientific-benchmarking_slides.pdf From: Scalability! But at what COST!
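Amdahl's law makes the first bullet concrete: if a fraction p of the runtime can be parallelized over n workers, the speedup is 1 / ((1 - p) + p / n). A small sketch (the function name `amdahl_speedup` is chosen for this example):

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the serial runtime that can be parallelized
    workers:           number of parallel workers (threads, cores, ...)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for n in (1, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

With a serial part of 5%, the speedup can never exceed 1 / 0.05 = 20, no matter how many workers are added, which is why shrinking the serial part comes first.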
Current status ● Modular simulation engine ● Fully parallelized with OpenMP ● GPU & FPGA implementation for mechanical interactions using CUDA and OpenCL ● First version of distributed runtime based on the framework Ray ● ROOT I/O for storage of simulation results and snapshots ● Visualization using ParaView and
Demos
“Hello World” Simulation
Chemotaxis
Soma Clustering 1/2 ● Simulation at timestep 0 – Cells are color coded by their type ● Simulation at the end – As expected, cells form clusters based on their type
Soma clustering 2/2
Tumor concept 1/2 Slide credit: Jean De Montigny
Tumor concept 2/2 Slide credit: Jean De Montigny
Neuroscience Demo
Overview Image: https://en.wikipedia.org/wiki/File:Brainmaps-macaque-hippocampus.jpg used under CC Attribution 3.0
Model NeuriteElement NeuronSoma
Simulation ● Single pyramidal cell ● Neurite elements are colored based on their diameter Simulation: Jean De Montigny
Animation Simulation: Jean De Montigny
Comparison with real neurons Simulation and Analysis: Jean De Montigny
Large-Scale Simulation ● 80k Neurons ● ~2M simulation objects
Questions? Lukas.Breitwieser@cern.ch