
Reverse engineering using computational algebra Elena Dimitrova - PowerPoint PPT Presentation



  1. Reverse engineering using computational algebra Elena Dimitrova School of Mathematical and Statistical Sciences Clemson University http://edimit.people.clemson.edu/ Algebraic Biology E. Dimitrova (Clemson) Reverse engineering using computational algebra Algebraic Biology 1 / 57

  2. What is reverse engineering?
     Sometimes, complex biological systems can seem a bit like this: (click here!).
     Systems biology is the study of systems of biological components. A central problem in systems biology is to use experimental data to infer the structure of a system, such as a gene regulatory network.
     Modeling approaches
     Bottom-up: Build a network from the known local information about every single object.
     Top-down ("reverse engineering"): View the system as a black box, then use the available data to build a model.
     Previously, we've mostly studied the first approach to modeling. In this lecture, we'll focus on the second approach. Many problems in statistics (e.g., linear regression) deal with the second approach.

  3. The blind men and the elephant
     An old parable from India tells of several blind men who try to determine what an elephant looks like just by touch. The blind men are trying to reverse engineer an elephant from just a few data points.

  4. Inferring a Boolean model (elephant) from data (observations)
     Consider a Boolean network on $n$ nodes, with update function $f : \mathbb{F}_2^n \to \mathbb{F}_2^n$. There are $2^n$ input states.
     Suppose we don't know the actual function $f$, but through experimental data, we are able to observe several transitions $s_i \mapsto t_i$:
     $s_1 = (s_{11}, s_{12}, \dots, s_{1n}) \mapsto t_1 = (t_{11}, t_{12}, \dots, t_{1n})$
     $s_2 = (s_{21}, \dots, s_{2n}) \mapsto t_2 = (t_{21}, \dots, t_{2n})$
     $\cdots$
     $s_m = (s_{m1}, \dots, s_{mn}) \mapsto t_m = (t_{m1}, \dots, t_{mn})$
     Reverse engineering: Start with experimental data (observations) and reconstruct the model (elephant). The two main features are:
     (i) the network topology, or wiring diagram,
     (ii) the Boolean functions at each node: $f = (f_1, \dots, f_n)$.
     This problem is not limited to models over $\mathbb{F}_2 = \{0, 1\}$; the same approach works for models over larger finite fields $\mathbb{F}$. We will call such models local models.

  5. Inferring a Boolean network (elephant) from data (observations)
     Consider the following Boolean network:
     $f_1(x_1, x_2, x_3) = x_1 \wedge x_2 = x_1 x_2$
     $f_2(x_1, x_2, x_3) = x_1 \wedge x_2 \wedge x_3 = x_1 x_2 x_3$
     $f_3(x_1, x_2, x_3) = x_1 \wedge x_2 = x_1 x_2$
     The state space of $f = (f_1, f_2, f_3)$ is the following graph:
     [Figure: state space graph on the eight states 000, 001, 010, 011, 100, 101, 110, 111]
     Question: What if we only knew part of this state space, e.g., $(1, 1, 0) \to (1, 0, 1) \to (0, 0, 0) \to (0, 0, 0)$? Could we recover the individual functions? How many possible models could yield this "fragment"?
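The questions on this slide can be explored by brute force. The following is a minimal sketch, not part of the original slides: it builds the full state space of the network above and then counts how many Boolean models are consistent with the three observed transitions. The names `f`, `state_space`, and `fragment` are illustrative choices. The count relies on the fact that a Boolean function of 3 variables is a truth table with 8 entries, of which the fragment fixes 3 per node.

```python
from itertools import product

def f(state):
    """Update map f = (f1, f2, f3) from the slide, over F_2."""
    x1, x2, x3 = state
    return (x1 & x2, x1 & x2 & x3, x1 & x2)

# Full state space: each state has exactly one outgoing edge.
state_space = {s: f(s) for s in product((0, 1), repeat=3)}

# The observed fragment, written as (input, output) transitions.
fragment = [
    ((1, 1, 0), (1, 0, 1)),
    ((1, 0, 1), (0, 0, 0)),
    ((0, 0, 0), (0, 0, 0)),
]
assert all(state_space[s] == t for s, t in fragment)

# Each node function is a truth table with 2^3 = 8 entries.
# The fragment fixes the entries at 3 distinct inputs; the rest are free.
num_states = 2 ** 3
free_entries = num_states - len({s for s, _ in fragment})
models_per_node = 2 ** free_entries
total_models = models_per_node ** 3

for s, t in sorted(state_space.items()):
    print(s, "->", t)
print(models_per_node, "functions per node,", total_models, "models in total")
```

So the fragment alone does not determine the functions: many different models reproduce it, which is why a description of the whole model space is needed.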

  6. Reverse engineering the model space
     Broad goal: Find "the best" local model $f = (f_1, \dots, f_n)$ that fits the data:
     Input states: $s_1, \dots, s_m \in \mathbb{F}^n$
     Output states: $t_1, \dots, t_m \in \mathbb{F}^n$
     with $f(s_i) = t_i$.
     Note that $f(s_i) = (f_1(s_i), f_2(s_i), \dots, f_n(s_i)) = (t_{i1}, t_{i2}, \dots, t_{in}) = t_i$.
     Question: What if no models fit the data? (This is actually impossible, provided the data are consistent, i.e., no input state is observed with two different outputs.) What if many models fit the data?
     First, we'll find all local models that fit the data. This is called the model space:
     $F_1 \times \cdots \times F_n = \{ (f_1, \dots, f_n) \mid f_j(s_i) = t_{ij} \text{ for all } i \text{ and } j \}$.
     Once we do this, the new problem becomes choosing the "best" one. This is called model selection.

  7. Similar problems in other areas of mathematics
     1. Parametrize a line in $\mathbb{R}^n$.
     2. Parametrize a plane in $\mathbb{R}^n$.
     3. Solve the underdetermined system $Ax = b$.
     4. Solve the differential equation $x'' + x = 2$.

  8. Parametrize a line in $\mathbb{R}^n$
     Suppose we want to write the equation for a line that contains a vector $v \in \mathbb{R}^n$:
     [Figure: the line $tv$ through the origin and the parallel line $tv + w$, drawn with $x$, $y$, $z$ axes]
     This line, which contains the zero vector, is $tv = \{ tv : t \in \mathbb{R} \}$.
     Now, what if we want to write the equation for a line parallel to $v$? This line, which does not contain the zero vector, is $tv + w = \{ tv + w : t \in \mathbb{R} \}$.
     Note that ANY particular $w$ on the line will work!

  9. Solve an underdetermined system $Ax = b$
     Suppose we have a system of equations that has "too many variables," so there are infinitely many solutions. For example:
     $2x + y + 3z = 4$
     $3x - 5y - 2z = 6$
     In "$Ax = b$ form":
     $\begin{bmatrix} 2 & 1 & 3 \\ 3 & -5 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \end{bmatrix}$.
     How to solve:
     1. Solve the related homogeneous equation $Ax = 0$ (the solution set is the null space, $\mathrm{NS}(A)$);
     2. Find any particular solution $x_p$ to $Ax = b$;
     3. Add these together to get the general solution: $x = \mathrm{NS}(A) + x_p$.
     This works because geometrically, the solution space is just a line, plane, etc. Here are two possible ways to write the solution:
     $C \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}, \qquad C \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 10 \\ 8 \\ -8 \end{bmatrix}$.
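As a sanity check on the three-step recipe, the sketch below (an illustration added here, not taken from the slides) verifies that $(1, 1, -1)$ lies in the null space, that both particular solutions satisfy $Ax = b$, and that the two presentations describe the same line. The helper names `matvec` and `general_solution` are hypothetical.

```python
from fractions import Fraction

# Coefficient matrix and right-hand side from the slide.
A = [[2, 1, 3], [3, -5, -2]]
b = [4, 6]

def matvec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

null_direction = [1, 1, -1]             # spans NS(A)
particular = [[2, 0, 0], [10, 8, -8]]   # the two choices of x_p on the slide

# Step 1: the direction vector solves the homogeneous system Ax = 0.
assert matvec(A, null_direction) == [0, 0]

# Step 2: each x_p solves Ax = b.
for x_p in particular:
    assert matvec(A, x_p) == b

def general_solution(x_p, c):
    """Return the point x_p + c * null_direction on the solution line."""
    return [p + c * d for p, d in zip(x_p, null_direction)]

# Step 3: every value of C gives a solution, including non-integer ones.
for c in (Fraction(-3, 2), 0, 1, Fraction(7, 3)):
    for x_p in particular:
        assert matvec(A, general_solution(x_p, c)) == b

# The two particular solutions differ by an element of NS(A).
difference = [q - p for p, q in zip(particular[0], particular[1])]
print(difference)
```

The difference of the two particular solutions is a multiple of the null space direction, which is exactly why either one can serve as $x_p$.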

  10. Linear differential equations
     Solve the differential equation $x'' + x = 2$.
     How to solve:
     1. Solve the related homogeneous equation $x'' + x = 0$. The solutions are $x_h(t) = a \cos t + b \sin t$.
     2. Find any particular solution $x_p(t)$ to $x'' + x = 2$. By inspection, we see that $x_p(t) = 2$ works.
     3. Add these together to get the general solution: $x(t) = x_h(t) + x_p(t) = a \cos t + b \sin t + 2$.
     Note that while the general solution above is unique as a set of functions, its presentation need not be. For example, we could write it this way:
     $x(t) = x_h(t) + x_p(t) = a (2 \cos t - 3 \sin t) + b \sin t + (2 - \cos t + 8 \sin t)$.
     Here, the particular solution has (unnecessary) "extra terms," $-\cos t + 8 \sin t$, which are themselves solutions of the homogeneous equation $x'' + x = 0$.
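To see that the second presentation describes the same family of solutions, one can collect the $\cos t$ and $\sin t$ terms. The constants $a'$ and $b'$ below are introduced here for illustration and do not appear in the slides:

```latex
\begin{align*}
x(t) &= a(2\cos t - 3\sin t) + b\sin t + (2 - \cos t + 8\sin t) \\
     &= (2a - 1)\cos t + (b - 3a + 8)\sin t + 2 \\
     &= a'\cos t + b'\sin t + 2,
     \qquad a' = 2a - 1, \quad b' = b - 3a + 8 .
\end{align*}
```

The change of constants is invertible ($a = (a' + 1)/2$ and $b = b' + 3a - 8$), so as $a$ and $b$ range over all real numbers, so do $a'$ and $b'$, and both expressions give the same set of functions.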

  11. Reverse engineering: Problem statement
     Recall that a local model over $\mathbb{F}$ is an $n$-tuple $f = (f_1, \dots, f_n)$ of functions $f_i : \mathbb{F}^n \to \mathbb{F}$. The associated finite dynamical system (FDS) map is
     $f : \mathbb{F}^n \to \mathbb{F}^n, \quad f : x \mapsto (f_1(x), \dots, f_n(x))$.
     If $\mathbb{F} = \mathbb{F}_p$, then each $f_i : \mathbb{F}_p^n \to \mathbb{F}_p$ is a polynomial in $\mathbb{F}_p[x_1, \dots, x_n] / \langle x_1^p - x_1, \dots, x_n^p - x_n \rangle$.
     Goal: Given a set of data
     Input states: $s_1, \dots, s_m \in \mathbb{F}^n$
     Output states: $t_1, \dots, t_m \in \mathbb{F}^n$
     with $f(s_i) = t_i$, construct the model space $F_1 \times \cdots \times F_n$ of all local models $f = (f_1, \dots, f_n)$ that fit the data:
     $f(s_i) = (f_1(s_i), \dots, f_n(s_i)) = (t_{i1}, \dots, t_{in}) = t_i$.
     We'll find each $F_1, \dots, F_n$ separately.

  12. Reverse engineering: How to find $F_j$
     We wish to find the set $F_j$ of all local functions (polynomials!) $f_j$ that fit the data:
     $F_j = \{ f_j : f_j(s_1) = t_{1j}, \dots, f_j(s_m) = t_{mj} \}$.
     Define the set $I$ (it is actually an "ideal" of the polynomial ring $\mathbb{F}[x_1, \dots, x_n]$):
     $I = \{ h : h(s_i) = 0 \text{ for all } i = 1, \dots, m \} = \{ \text{all polynomials that vanish on the data} \}$.
     Theorem: The set of polynomials that fit the data at node $j$ is
     $F_j = f_j + I = \{ f_j + h : h \in I \}$,
     where $f_j$ is any one particular polynomial that fits the data.
     Thus, to find $F_j$, we need to do two things:
     1. Find the ideal $I$ (all solutions $h$ to $\{ h(s_i) = 0, \ \forall i \}$);
     2. Find any polynomial $f_j$ that fits the data (one solution to $\{ f_j(s_i) = t_{ij}, \ \forall i \}$).
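The theorem can be checked exhaustively in a small case. The sketch below is an illustration under stated assumptions: it works over $\mathbb{F}_2$ with $n = 3$, represents each function by its truth table rather than by a polynomial (the two are in one-to-one correspondence in the quotient ring from the problem statement), and uses as data the first coordinates of the transitions $(1,1,0) \to (1,0,1) \to (0,0,0) \to (0,0,0)$. All variable names are hypothetical.

```python
from itertools import product

states = list(product((0, 1), repeat=3))        # all of F_2^3
index = {s: k for k, s in enumerate(states)}    # position of each state in a table

# Data for one node j: input state s_i -> observed value t_ij.
data = {(1, 1, 0): 1, (1, 0, 1): 0, (0, 0, 0): 0}

# Every function F_2^3 -> F_2 as a truth table (a tuple of 8 bits).
tables = list(product((0, 1), repeat=len(states)))

def fits(table):
    """True if the function agrees with the data at every input state."""
    return all(table[index[s]] == t for s, t in data.items())

def vanishes(table):
    """True if the function is zero at every input state in the data."""
    return all(table[index[s]] == 0 for s in data)

F_j = {tb for tb in tables if fits(tb)}      # all functions that fit the data
I = {tb for tb in tables if vanishes(tb)}    # all functions vanishing on the data

# Pick any particular solution and form the coset f_j + I (addition mod 2).
f_particular = min(F_j)
coset = {tuple((a + h) % 2 for a, h in zip(f_particular, hb)) for hb in I}

assert coset == F_j
print(len(F_j), len(I))
```

Any other element of `F_j` could replace `f_particular` and the coset would be the same set, mirroring the "any particular solution" step in the linear algebra and differential equation examples.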

  13. Reverse engineering: How to find $I$ and $f_j$
     1. Finding $I$: Define $I(s_i)$ to be the set of polynomials that vanish on $s_i$:
     $I(s_i) = \{ \text{all polynomials } h_i \text{ such that } h_i(s_i) = 0 \}$
     $= \{ (x_1 - s_{i1}) g_1(x) + (x_2 - s_{i2}) g_2(x) + \cdots + (x_n - s_{in}) g_n(x) \}$
     $= \langle x_1 - s_{i1}, x_2 - s_{i2}, \dots, x_n - s_{in} \rangle$.
     Clearly, the set $I$ of polynomials that vanish on all $s_i$ (for $i = 1, \dots, m$) is
     $I = \bigcap_{i=1}^{m} I(s_i)$.
     2. Finding $f_j$: There are many algorithms. Lagrange interpolation is one of them:
     $f(x_1, \dots, x_n) = \sum_{(c_1, \dots, c_n) \in V} f(c_1, \dots, c_n) \prod_{i=1}^{n} \left[ 1 - (x_i - c_i)^{p-1} \right]$.
     In this lecture, we will learn another method which has the Chinese remainder theorem lurking behind the scenes.
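The Lagrange formula is easy to evaluate directly. The following is a minimal sketch over $\mathbb{F}_p$ for a prime $p$, assuming that $V$ is the set of observed input states, so the resulting function takes the value 0 at every unobserved state. The function name `lagrange_interpolant` and the sample data are hypothetical. The key fact is Fermat's little theorem: $(x_i - c_i)^{p-1}$ equals 0 when $x_i = c_i$ and 1 otherwise, so each product is 1 exactly at $x = c$.

```python
def lagrange_interpolant(data, p):
    """Return a function F_p^n -> F_p that matches `data` on its inputs.

    `data` maps input states (tuples of ints mod p) to values in F_p.
    Assumes p is prime; states not in `data` are sent to 0.
    """
    def f(x):
        total = 0
        for c, value in data.items():
            # Product of [1 - (x_i - c_i)^(p-1)]: equals 1 iff x == c.
            indicator = 1
            for xi, ci in zip(x, c):
                indicator = indicator * (1 - pow(xi - ci, p - 1, p)) % p
            total += value * indicator
        return total % p
    return f

# Hypothetical data over F_3 with n = 2 variables.
p = 3
data = {(0, 1): 2, (1, 1): 0, (2, 0): 1}
f = lagrange_interpolant(data, p)

for s, t in data.items():
    assert f(s) == t          # the interpolant fits every data point
print([f(s) for s in data], f((2, 2)))
```

This produces one particular function that fits the data; by the theorem on the previous slide, every other fitting function differs from it by an element of $I$.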
