FINAL WORKSHOP OF GRID PROJECTS, “PON RICERCA 2000-2006, AVVISO 1575”

Parameters estimate in metabolic networks reconstruction

G. Aprea 1, G. Licciardello 2, and V. Rosato 3

1 ENEA, Via del Vecchio Macello, 80055 Portici (Naples), Italy, giuseppe.aprea@gmail.com
2 Science and Technology Park of Sicily, stradale V. Lancia 57, z.i. Blocco Palma I, 95121 Catania, Italy, gralicci@unict.it
3 ENEA, Computing and Modelling Unit, Via Anguillarese 301, 00123 S.Maria di Galeria (Rome), Italy, rosato@casaccia.enea.it

Abstract: In this paper we describe the implementation of two different parallel codes for parameter estimation in metabolic networks. Both are based on the genetic algorithm (GA) and are designed to take full advantage of modern computational facilities such as the ENEA GRID.

Index Terms: COMETA, journal, LaTeX, paper, template.

I. INTRODUCTION

Advances in omics sciences produce large amounts of data that need analysis and interpretation. Reliable explanations of how processes are regulated require an accurate modeling approach at the systems level. In this paper we focus on metabolic network models which reproduce the time evolution of all the metabolites. Quite often these models rely on several unknown parameters which have to be estimated from experimental data. This task amounts to solving an inverse problem, which requires an efficient optimization algorithm. GA [1] is a widely known optimum search method which yields reliable values for model parameters, at the price of a large computational demand. Our aim is to develop a parallel implementation of GA-based parameter estimation that takes full advantage of the different modern computational facilities. Our implementation relies on:

• E-Cell software from Keio University (Japan) [2], [4] for simulations of biochemical networks;
• LSF (Load Sharing Facility), a job scheduler for (multi)cluster [3].

II. THE GENETIC ALGORITHM: BASICS

According to the GA analogy, in a biochemical network a set $K = \{k_i\}$, $i = 1, \dots, m$, of unknown parameters is defined as the genome. Each such array of missing constants gives rise to a different behavior of the network, that is, to different time evolutions of the metabolites' concentrations. The network with the genome and its behavior together constitute an individual, and a group of individuals is a population. As in the case of populations of organisms in nature, GA populations undergo a selection, in which a good fit to experimental data is the selection criterion. After every selection stage, a new population is created; the current generation is over and a new one is ready. After a large number of generations, GA is expected to yield the individuals which best fit experimental data.

III. THE GENETIC ALGORITHM: COMPUTATIONAL SCHEME

GA is implemented following these steps:

1) A number $I$ of random genomes is chosen, each of them differing from the others by the values of the $m$ unknown constants. This is the starting population.
2) For each individual, the time evolution of the metabolites' concentrations is calculated with the E-Cell tool.
3) The metabolites' concentrations allow evaluating, for each individual, the corresponding value of the cost function: $F_k = \sum_{i} \sum_{j} [x_i(t_{j+1}) - x_i(t_j)]^2$, $k = 1, \dots, I$; a good fit to data implies a small $F_k$ and vice versa.
4) Mating is performed by selecting the individuals of the current generation according to the cost function $F_k$: the smaller $F_k$, the more likely the individual is to be selected. Mutation is also added to yield the final offspring. After mating and mutation, a new generation is ready and the cycle restarts from point 2) unless one of the following conditions is met (in these cases, the procedure ends):
   • the best value for the cost function becomes smaller than a fixed threshold;
   • a pre-defined limit number of generations has been reached.

IV. GA AND SPECIFIC COMPUTATIONAL ARCHITECTURES

The efficiency of the GA implementation stems from the choice of a number of parameters: the number of individuals in a population, the number of generations and the specific values of the parameters used for the selection, mating and mutation processes. Another important way to reduce the overall execution time is to parallelize the code, for example the computation of the fitness of individuals belonging to the same generation. In this work we suggest efficient implementations of GA in relation to specific computing architectures: a single cluster architecture (SCA), characterized by many computational nodes tightly interconnected through a low-latency, high-bandwidth network, and a GRID (or multicluster architecture), characterized by a number of SCAs linked through a Wide Area Network. The computational problem has thus been mapped on a specific computing architecture by assuming different implementations of the method.

A. SCA implementation

In this case we propose a single-stage parallelism approach. A master node generates an instance of a population of $N$ individuals and then, if $\gamma$ is the number of computing nodes, allots to each of them the same number $N/\gamma$ of fitness evaluations. When the fitness of all the genomes has been evaluated, the master node gathers all data from the other computing nodes and performs selection, mating and mutation to provide new offspring, which is subsequently re-allotted, as before, to the computing nodes. The procedure is repeated for $n_G$ generations.

B. GRID implementation

In the GRID case, the specific computing architecture suggests a different implementation of the method. Here the single larger generation of SCA is replaced by a number $\gamma$ of smaller “secondary generations” (SG), each composed of $N/\beta$ genomes. The master node allots the SGs to different secondary master nodes, which carry out the procedure described above for SCA. This procedure can be seen as the definition of many “islands” where subsets of a generation are allowed to evolve. Each “island” evolves new generations independently, on the basis of its initial genomes. After a number of generations $n_G$, the islands are allowed to exchange their resulting genomes. The master node gathers all data from the SGs, ranks their best genomes and sends back those with the lowest cost function. Then each SG generates a new seed population, taking half of it from the best-fitting genomes received from the master and randomly generating the other half. This procedure is iterated $n_S$ times, so that the total number of SG generations is $N_T = n_G n_S$. In this way two main results are achieved: the first is cooperation among SGs, which can exchange their best individuals, and the second is the preservation of diversity, which avoids being trapped in local minima.

V. DETAILS OF THE MODELS USED FOR COMPUTATIONS

We have considered different networks whose topology and kinetic constants are known.
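As an illustration of the “island” scheme described for the GRID implementation, the following Python sketch runs the exchange loop serially on a toy problem. It is only a sketch under stated assumptions, not the authors' code: `toy_cost` is a hypothetical stand-in for the E-Cell simulation followed by the evaluation of the cost function, truncation selection with elitism replaces the cost-weighted selection described in the text, the master pools all genomes instead of only the best of each island, and all names and default values are illustrative.

```python
import random

TARGET = [0.5, 1.5, 2.5]  # hypothetical "true" kinetic constants of a toy network


def toy_cost(genome):
    """Stand-in for 'simulate with E-Cell, then evaluate the cost function'."""
    return sum((g - t) ** 2 for g, t in zip(genome, TARGET))


def random_genome(rng, low=0.0, high=3.0):
    """Draw one random genome with as many constants as TARGET."""
    return [rng.uniform(low, high) for _ in TARGET]


def evolve(population, n_g, rng, sigma=0.1):
    """Evolve one island for n_g generations; return it sorted by cost."""
    size = len(population)
    for _ in range(n_g):
        ranked = sorted(population, key=toy_cost)
        parents = ranked[: max(2, size // 2)]  # truncation selection (assumption)
        children = [ranked[0]]                 # elitism: keep the current best
        while len(children) < size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation.
            children.append([rng.choice(pair) + rng.gauss(0.0, sigma)
                             for pair in zip(a, b)])
        population = children
    return sorted(population, key=toy_cost)


def island_ga(n_islands=4, island_size=10, n_g=5, n_s=6, seed=0):
    """Serial sketch of the island scheme: n_s exchanges of n_g generations each."""
    rng = random.Random(seed)
    islands = [[random_genome(rng) for _ in range(island_size)]
               for _ in range(n_islands)]
    history = []  # best cost seen by the master after each exchange
    for _ in range(n_s):
        # On a real GRID each island would run on its own cluster.
        islands = [evolve(pop, n_g, rng) for pop in islands]
        # Master: gather all genomes, rank them, keep the lowest-cost ones.
        pooled = sorted((g for pop in islands for g in pop), key=toy_cost)
        best = pooled[: island_size // 2]
        history.append(toy_cost(pooled[0]))
        # Each island reseeds: half best-fitting genomes, half new random ones.
        islands = [[list(g) for g in best]
                   + [random_genome(rng) for _ in range(island_size - len(best))]
                   for _ in range(n_islands)]
    return pooled[0], history
```

Calling `island_ga(n_g=5, n_s=6)` performs $N_T = n_G n_S = 30$ generations per island. Because each island keeps its best genome and every reseeded island receives the overall best one, the values in `history` never increase from one exchange to the next.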