parallel and hybrid evolutionary algorithm in python

Parallel and Hybrid Evolutionary Algorithm in Python, E. Kieffer (UL)



  1. Parallel Computing & Optimization Group, University of Luxembourg. Parallel and Hybrid Evolutionary Algorithm in Python. E. Kieffer. UL HPC Users' session, UL HPC School 2017.

  2. Contents
     • Context and motivation
       • Clustering of the Parkinson Disease Map
       • Bi-level Clustering approach
     • Python tools on the UL HPC Platform
       • CPLEX solver
       • SCOOP library
       • DEAP library
     • Experiments & Validation
       • Experiments on the Parkinson Disease Map
       • Comparison with Hierarchical Clustering

  3. CONTEXT & MOTIVATION

  4. Parkinson Disease Map
     • Large (hyper-)graph
     • Extract knowledge
     • First experiments with a standard clustering approach
     • Hierarchical Clustering
     • Several metrics (e.g. GO, NET, EU)
     • Hard to combine

  5. Bi-level Clustering
     • Clustering is often based on a two-phase algorithm:
       • Find cluster representatives
       • Assign data to clusters
     • Generally the same metric is used for both steps
     • Consider these two steps as two nested optimization problems with different metrics
     • Metrics:
       • Euclidean distance
       • Network distance
       • Distance based on Gene/Disease Ontology
     • Use an Evolutionary Algorithm (EA) to solve the Bi-level Clustering problem
     • Use a multi-objective EA (MOEA) to detect the number of clusters
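The two nested steps can be illustrated with a minimal sketch. It is not the method from the talk: the Manhattan distance is only a hypothetical stand-in for a second metric (such as a network or ontology distance), and all function names are made up for illustration. The lower level assigns points to representatives with one metric, and the upper level scores the representatives with another.

```python
import math

def euclidean(a, b):
    """Upper-level metric in this sketch."""
    return math.dist(a, b)

def manhattan(a, b):
    """Hypothetical stand-in for a second metric (e.g. a network distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, representatives, lower_metric):
    """Lower level: assign each point to its closest representative."""
    return [
        min(range(len(representatives)),
            key=lambda j: lower_metric(p, representatives[j]))
        for p in points
    ]

def upper_cost(points, representatives, labels, upper_metric):
    """Upper level: score the representatives given the lower-level assignment."""
    return sum(upper_metric(p, representatives[j]) for p, j in zip(points, labels))

def evaluate(points, representatives, upper_metric, lower_metric):
    """Solve the lower level first, then compute the upper-level objective."""
    labels = assign(points, representatives, lower_metric)
    return upper_cost(points, representatives, labels, upper_metric), labels

# Toy data, invented for the example.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
representatives = [(0.0, 0.0), (5.0, 5.0)]
cost, labels = evaluate(points, representatives, euclidean, manhattan)
```

In an EA, `evaluate` would play the role of the fitness function: each candidate set of representatives is scored only after its lower-level assignment has been computed.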

  6. Bi-level Optimization
     • Bi-level ↔ nested problems
     • A problem constraining another one → NP-hard even for convex levels
     • (Figure labels: Upper-level, Lower-level)
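A generic bi-level problem is commonly written as follows. The notation is an assumption, not taken from the slides: $x$ are the upper-level variables, $y$ the lower-level variables, $F, G$ the upper-level objective and constraints, and $f, g$ the lower-level ones.

```latex
\begin{aligned}
\min_{x \in X} \quad & F(x, y^{*}) \\
\text{s.t.} \quad & G(x, y^{*}) \le 0, \\
& y^{*} \in \arg\min_{y \in Y} \left\{ f(x, y) \;:\; g(x, y) \le 0 \right\}
\end{aligned}
```

The upper-level decision $x$ is feasible only if $y^{*}$ is optimal for the lower-level problem that $x$ parameterizes, which is what makes one problem a constraint of the other.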

  7. Bi-level Clustering

  8. Parallel and hybrid EA HPC

  9. PYTHON TOOLS ON THE UL HPC PLATFORM

  10. Using CPLEX on the UL HPC
      • IBM ILOG CPLEX Optimizer's mathematical programming technology
      • One of the most efficient solvers on the market
      • CPLEX is available to HPC users with an IBM Academic Initiative membership
      • First register with the IBM Academic Initiative: https://developer.ibm.com/academic/
      • Forward the membership confirmation mail to the HPC admins
      • To use CPLEX on the cluster:
          $ module use $PROJECTWORK/cplex/soft/modules
          $ module load CPLEX

  11. Parallel Evaluations with SCOOP
      • Scalable COncurrent Operations in Python (SCOOP)
        • is a distributed task module
        • for concurrent parallel programming
        • on various environments, from heterogeneous grids to supercomputers
      • Command to execute a Python script using SCOOP:
          python -m scoop --hostfile $OAR_NODEFILE -n 16 --ssh-executable "oarsh" hello.py
      • Parameters:
        • --hostfile: path to the file containing all hostnames
        • --ssh-executable: the command used to access nodes (here oarsh)
        • -n: the number of workers
      • hello.py:
          from __future__ import print_function
          from scoop import futures
          import socket

          def helloWorld(value):
              # Each task reports the host it ran on.
              return "Hello World from {0}".format(socket.gethostname())

          if __name__ == "__main__":
              # Distribute 16 tasks over the SCOOP workers.
              returnValues = list(futures.map(helloWorld, range(16)))
              print("\n".join(returnValues))

  12. Example

  13. DEAP library for Evolutionary Computation in Python
      • https://github.com/DEAP/deap
      • Rapid prototyping and testing of ideas
      • Parallelization mechanism based on SCOOP
      • CMA-ES algorithm
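The idea of parallelizing evaluations through a map function can be sketched in plain Python. This is not DEAP code and not the algorithm used in the talk: `evolve`, `sphere`, and every parameter are hypothetical. It is a small elitist EA in which fitness evaluation is the only step routed through a replaceable `map_fn`.

```python
import random

def sphere(individual):
    """Toy fitness to minimize; stands in for an expensive evaluation."""
    return sum(x * x for x in individual)

def evolve(evaluate, map_fn=map, dim=5, pop_size=20,
           generations=30, sigma=0.3, seed=0):
    """Minimal elitist EA; all evaluations go through map_fn."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(dim)]
                  for _ in range(pop_size)]
    fitnesses = list(map_fn(evaluate, population))
    initial_best = min(fitnesses)
    for _ in range(generations):
        # Variation: each offspring is a Gaussian mutation of a random parent.
        offspring = [[x + rng.gauss(0, sigma) for x in rng.choice(population)]
                     for _ in range(pop_size)]
        # Evaluation: the step that a distributed map can spread over workers.
        offspring_fitnesses = list(map_fn(evaluate, offspring))
        # Selection: keep the pop_size best among parents and offspring.
        ranked = sorted(zip(fitnesses + offspring_fitnesses,
                            population + offspring),
                        key=lambda pair: pair[0])[:pop_size]
        fitnesses = [fit for fit, _ in ranked]
        population = [ind for _, ind in ranked]
    return initial_best, fitnesses[0], population[0]

initial_best, final_best, best = evolve(sphere)
```

Under SCOOP, one would pass `scoop.futures.map` as `map_fn` and place the call under `if __name__ == "__main__":`, as in the Hello World script. DEAP documents the analogous step as `toolbox.register("map", futures.map)`.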

  14. EXPERIMENTS & VALIDATION

  15. Clustering results

  16. Bi-level Clustering. Enrichment analysis: hypergeometric test
      $$P(X = k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$
      • N: genes altogether (background)
      • m: genes in a GO term
      • n: genes in a cluster
      • k: genes in a cluster and in a GO term
      Adapted from: Florian Markowetz, Network Biology, Lent 2010.
      A cluster represents a sample of n genes from a total population of N genes. It is known that the considered GO term contains m genes. What is the probability that exactly k genes of our cluster also belong to the considered GO term?
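The hypergeometric probability can be computed directly with the standard library. This sketch uses the slide's symbols N, m, n, k. The numbers are invented for illustration, and the upper-tail helper is an addition: enrichment p-values are usually reported as P(X >= k) rather than P(X = k).

```python
from math import comb

def hypergeom_pmf(k, N, m, n):
    """P(X = k): exactly k of the n cluster genes lie in a GO term of m genes,
    given N background genes."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

def enrichment_p_value(k, N, m, n):
    """P(X >= k): upper tail of the hypergeometric distribution."""
    return sum(hypergeom_pmf(i, N, m, n) for i in range(k, min(m, n) + 1))

# Hypothetical numbers: 20 background genes, a GO term of 5 genes,
# a cluster of 4 genes, 2 of which belong to the GO term.
N, m, n, k = 20, 5, 4, 2
p_exact = hypergeom_pmf(k, N, m, n)
p_tail = enrichment_p_value(k, N, m, n)
```

A term would then count as enriched in a cluster when `p_tail` falls below the chosen cutoff.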

  17. Bi-level Clustering: Enrichment of Disease Ontology terms (p-value cutoff 0.001)
      [Figure: number of unique terms (y-axis, 100 to 350) against number of clusters (x-axis, 2 to 90), one curve per distance: 01_net_go_ward, 02_eu_go_ward, 03_eu_net_ward, 04_clusteringNETEU, 05_clusteringEUNET, 06_clusteringGOEU, 07_clusteringEUGO, 08_clusteringGONET, 09_clusteringNETGO, 10_expert]

  18. Conclusions
      • Knowledge extraction on the Parkinson Disease Map
      • Bi-level clustering model
      • Solve the model with a Hybrid and Parallel EA
      • Experiments required a lot of resources → UL HPC Platform
        • Hybrid → CPLEX solver
        • Parallel → SCOOP library for parallel evaluations
        • Evolutionary Computation → DEAP library

  19. Questions? Thank you for your attention. PS9 (13h30-15h30): Advanced Prototyping with Python, presented by Clement Parisot
