Analysis of metabolic networks in presence of biological constraints: geometrical, combinatorial and logical approaches Philippe Dague LRI, Univ. Paris-Sud & CNRS, Univ. Paris-Saclay on leave at INRIA-Saclay Lifeware Acknowledgements: Sabine Peres, Antoine Deza, Martin Morterol, Eva Dechaux BIOSS-IA Workshop: Logical Models for Formal Representation of Living Systems Gif-sur-Yvette, June 22, 2017 1
2
3
Context Input A metabolic network (weighted bipartite graph) given by: m × r stoichiometric real (actually, rational) matrix S . m number of internal metabolites, r number of enzymatic reactions. S ji stoichiometric coefficient of metabolite j in reaction i (< 0 for j substrate, > 0 for j product). m < r and S assumed of full rank m . Irrev : those of reactions which are irreversible (given « compiled » thermodynamic knowledge). Steady-state solutions (feasible flux vectors at steady-state): P = { v ∈ ℝ r | Sv = 0 and v i ≥ 0, i ∈ Irrev }. Convex polyhedral cone in dimension dim(Ker( S )) = r – m with Irrev facets. Two ways of dealing with reversible reactions - Keep them as they are, allowing them to appear with positive or negative coefficient in solutions (thus, the cone P is not pointed). - Decompose each reversible reaction into two exclusive irreversible reactions. This means adding r – |Irrev| columns to S , whose columns number becomes d = 2 r – |Irrev| . Then all d reactions are irreversible and the cone P is thus pointed , why this is the choice adopted by most of the algorithms and that we will adopt in the following. 4
Example X1 X1 X2 m = 2 r = 5 1 1 T1 T2 1 1 T1 T2 T3 T4 R1 A B A 1 0 − 1 0 − 1 R1 = S A A B B B 0 1 0 − 1 2 1 2 Irrev = {T1, T2, T3, T4} 1 1 T3 T4 1 1 T1 T2 T3 T4 R1 R1rev A 1 0 –1 0 –1 1 X3 X3 X4 X4 ( ) B 0 1 0 –1 2 –2 d = 6 Irrev = {T1, T2, T3, T4, R1, R1rev} 5
Context For v ∈ P , Supp( v ) = { i | v i ≠ 0} is the support of v . As P is pointed: minimal (for set inclusion of supports) solutions = undecomposable solutions = extreme rays of P . They form the unique (up to positive scalings) minimal set of generating (for non negative linear combinations) vectors of all solutions in P . They are called Elementary Flux Modes ( EFMs ). 6
The Double Description method: theoretical framework behind EFMs enumeration Almost all known algorithms 1 for enumerating EFMs are variants of the Double Description Method (Motzkin-Raiffa-Thompson-Thrall,1953, known as Fourier-Motzkin). A pair ( A,R ) of real matrices A ( m × d ) and R ( d × n ) is said to be a double description pair or simply a DD pair if the following relationship holds: A x ≥ 0 if and only if x = R λ for some λ ≥ 0 P ( A ) polyhedral cone represented by A ( representation matrix ) as: P ( A ) = { x ∈ ℝ d | A x ≥ 0 } is simultaneously represented by R ( generating matrix ) as: { x ∈ ℝ d | x = R λ for some λ ≥ 0 }. Minkowski’s Theorem for Polyhedral Cones : For any m × d real matrix A , there exists some d × n real matrix R such that ( A,R ) is a DD pair, or in other words, the cone P ( A ) is generated by R . The theorem states that every polyhedral cone admits a generating matrix. The non-triviality comes from the fact that the column size n of R is finite: every cone is finitely generated. 1 For enumerating only part of EFMs according to certain criteria or contraints, other approaches are: LP, MILP, SAT/SMT (our work), constraints, genetic algorithms, … 7
The Double Description method Task : how does one construct a matrix R from a given matrix A ? A more appropriate formulation of the problem is to require the minimality of R : find a matrix R such that no proper submatrix is generating P ( A ). A minimal set of generators is unique up to positive scaling when we assume that the cone is pointed ( ⟺ rank( A ) = d ), i.e., the origin is an extreme point of P ( A ) . Geometrically, the columns of a minimal generating matrix are in 1-to-1 correspondence with the extreme rays of P. Thus the problem is also known as the extreme ray enumeration problem . No efficient (polynomial) algorithm is known for the general problem (which is isomorphic to the vertex enumeration problem for a convex polyhedron). 8
The Double Description method 9
The Double Description Method: Complexity? P : an H-polytope represented by m halfspaces h 1 , . . . , h m in R d . P k = ∩ k i =1 h i : k th polytope ( P = P m ). V k = V ( P k ) : the vertex set computed at k th step. (Pk-1, Vk-1) (Pk, Vk) newly generated for each adj hk pair ( , ) 8 10
How Intermediate Sizes Fluctuate with Different Orderings Size INTERMEDIATE SIZES FOR CCP6 maxcutoff 1500 mincutoff 1250 random 1000 750 500 250 lexmin Iteration 20 22 24 26 28 30 32 The input is a 15 -dimensional polytope with 32 facets. The output is a list of 368 vertices. The lexmin is a sort of shelling ordering. 10 11
How Intermediate Sizes Fluctuate with Different Orderings Size Maxcutoff 30000 25000 Random 20000 15000 10000 5000 Mincutoff Iteration 200 400 600 800 1000 The input is a 10 -dimensional cross polytope with 2 10 facets. The output is a list of 20 vertices. The highest peak is attained by maxcutoff ordering, following by random and mincutoff. Lexmin is the best among all and the peak intermediate size is less than 30 . (Too small too see it above.) 11 12
Application to EFMs computation Problem expressed in DD framework as A x ≥ 0 with solutions P ( A ), where m × d matrix A ( m = 2 m + d ) is given by: A = ( S T , –S T , I d ) T A is of full rank d and P = P ( A ). v is an extreme ray of P if and only if: dim(Ker( S Supp( v ) )) = 1 (and thus |Supp( v )| ≤ m + 1). Extreme rays v and v’ are adjacent if and only if: dim(Ker( S Supp( v ) ∪ Supp( v’ ) )) = 2. P is of dimension d – m and a solution v ∈ P verifies d – |Supp( v )| inequalities with equality. In particular an extreme ray v verifies at least d – m –1 inequalities with equality. d – m –1 is the non-degenerate case. The problem is thus in general degenerate with degenerescence degree equal to m + 1 – sl , where sl is the shortest length of an EFM (the only non-degenerate cases are a linear chain network or a one-metabolite network) . This has prevented up to now to use a combinatorial method, alternative to DD , called lrs ( local reverse search ), very efficient for non-degenerate problems. But very recent results show that, once highly parallelized, it can compete with DD for degenerate problems. 13
Application to EFMs computation: complexity Counting EFMs is #P-complete . Can EFMs be enumerated with polynomial delay is an open problem in case of at least one irreversible reaction (the answer is yes if all reactions are reversible, as EFMs are then the circuits of a linear matroid). Number of EFMs is bounded by 𝚰 (m d/2 ) . Deciding if it exists an EFM with given support Supp is NP-complete if |Supp|>1. Given k , deciding the existence of an EFM with at most k reactions is NP-complete. 14
Results - Despite very efficient and specially tuned implementations ( Metatool, Efmtool, FluxModeCalculator ) allowing the enumeration of millions of EFMs, the DD approach suffers of combinatorial explosion at genome scale . E. Coli: 89 metabolites, 106 reactions, 26,4 10 6 EFMs. Human (estimation of reconstructed network): 2766 metabolites, 4832 reactions, 10 29 EFMs. - Moreover most of the thousands or millions of EFMs that can be computed for medium scale systems are biologically irrelevant . This is because only stoichiometric constraints and some known irreversibility constraints are taken into account, but a lot of other biological constraints are missing. - In order to both being able to deal with systems at genome scale and to drastically reduce the number of biologically unfeasible EFMs, it is necessary to take into account and to integrate in the enumeration process biological constraints such as thermodynamic, kinetic and regulatory constraints . 15
Adding biological constraints: general framework - Given supplementary biological constraints C(v, x) on flux vectors v , depending in general on other variables x (e.g., enzymes concentrations, metabolites concentrations) and parameters p (not represented), we are interested in the solutions satisfying the constraints: Sol C ( x ) = { v ∈ ℝ d | S v = 0 , v ≥ 0 , C(v, x) } ⊆ P ≡ Sol ∅ . - Most often, values of most or all of variables x are unknown. Then, we consider the following weaker form of the solution space, requiring consistency with the constraints: Sol C = { v ∈ ℝ d | S v = 0 , v ≥ 0 , ∃ x C(v, x) } ⊆ P where ∃ x C(v, x) is True iff { x | C(v, x )} ≠ ∅ . Thus: Sol C = ∪ x Sol C ( x ) - Questions: - What is the mathematical (geometrical) structure of Sol C ? - What are the analogues of EFMs , i.e., those solutions in Sol C that are support- minimal in Sol C , that are undecomposable , … ? Do they characterize Sol C ? - How computing these minimal or undecomposable solutions? In particular, how integrating their computation into the DD method ? Or consider alternative methods ? Efficiently enough to be able to process genome scale models. 16
Recommend
More recommend