Reverse engineering using computational algebra

Matthew Macauley
Department of Mathematical Sciences, Clemson University
http://www.math.clemson.edu/~macaule/
Math 4500, Spring 2015
The blind men and the elephant

An old parable from India tells of several blind men who try to determine what an elephant looks like just by touch. The blind men are trying to reverse engineer an elephant from just a few data points.
Inferring a Boolean network model (elephant) from data (observations)

Consider a Boolean network model on $n$ nodes, with update function $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$. There are $2^n$ input states.

Suppose we don't know the actual function $F$, but through experimental data, we are able to observe several transitions:

$$
\begin{array}{lcl}
s_1 = (s_{11}, s_{12}, \dots, s_{1n}) & \longmapsto & t_1 = (t_{11}, t_{12}, \dots, t_{1n}) \\
s_2 = (s_{21}, \dots, s_{2n}) & \longmapsto & t_2 = (t_{21}, \dots, t_{2n}) \\
\quad\vdots & & \quad\vdots \\
s_m = (s_{m1}, \dots, s_{mn}) & \longmapsto & t_m = (t_{m1}, \dots, t_{mn})
\end{array}
$$

Reverse engineering
Start with experimental data (observations) and reconstruct the model (elephant). The two main features are:
(i) the network topology, or wiring diagram,
(ii) the Boolean functions at each node: $F = (f_1, \dots, f_n)$.
Inferring a Boolean network model (elephant) from data (observations)

Consider the following polynomial dynamical system:

$$
\begin{aligned}
f_1(x_1, x_2, x_3) &= \mathrm{AND}(x_1, x_2) = x_1 x_2 \\
f_2(x_1, x_2, x_3) &= \mathrm{AND}(x_1, x_2, x_3) = x_1 x_2 x_3 \\
f_3(x_1, x_2, x_3) &= \mathrm{AND}(x_1, x_2) = x_1 x_2
\end{aligned}
$$

The state space of the FDS map $F = (f_1, f_2, f_3)$ is the following graph:

[Figure: state space graph on the eight states 000, 001, 010, 011, 100, 101, 110, 111.]

Question
What if we only knew part of this state space, e.g.,

$$(1,1,0) \longrightarrow (1,0,1) \longrightarrow (0,0,0) \longrightarrow (0,0,0).$$

Could we recover the individual functions? How many possible models could yield this "fragment"?
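The state space and the fragment above can be checked by brute force. The following is a minimal sketch, not part of the original slides: the names `F`, `state_space`, `fragment`, and `num_models` are illustrative, and the count at the end assumes that a "model" means any function $\mathbb{F}_2^3 \to \mathbb{F}_2^3$ that agrees with the fragment on the observed inputs.

```python
from itertools import product

def F(state):
    """Update map F = (x1*x2, x1*x2*x3, x1*x2) over F_2."""
    x1, x2, x3 = state
    return (x1 * x2 % 2, x1 * x2 * x3 % 2, x1 * x2 % 2)

# Full state space: one transition out of each of the 2^3 states.
state_space = {s: F(s) for s in product((0, 1), repeat=3)}
for s, t in sorted(state_space.items()):
    print(s, "->", t)

# The observed fragment of the state space.
fragment = [(1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 0, 0)]

# Every state except the last one appears as an input of some transition.
observed_inputs = set(fragment[:-1])

# On each unobserved input the output is unconstrained (2^3 choices).
free_inputs = 2**3 - len(observed_inputs)
num_models = (2**3) ** free_inputs
print("models consistent with the fragment:", num_models)
```

Under that assumption the fragment pins down the output on only 3 of the 8 inputs, so many models remain, which is why the rest of the lecture describes the whole set of fitting models rather than a single one.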
Reverse engineering

Broad goal
Find "the best" model $F = (f_1, \dots, f_n)$ that fits the data:

Input states: $s_1, \dots, s_m \in \mathbb{F}^n$
Output states: $t_1, \dots, t_m \in \mathbb{F}^n$, with $F(s_i) = t_i$.

Note that:

$$F(s_i) = (f_1(s_i), f_2(s_i), \dots, f_n(s_i)) = (t_{i1}, t_{i2}, \dots, t_{in}) = t_i.$$

Question
What if no models fit the data? What if many models fit the data? (This is more likely.)

First, we'll find all models that fit the data. This is called the model space:

$$\mathcal{F}_1 \times \mathcal{F}_2 \times \cdots \times \mathcal{F}_n = \{ (f_1, \dots, f_n) \mid f_j(s_i) = t_{ij} \text{ for all } i \text{ and } j \}.$$

Once we do this, the new problem becomes choosing the "best" one. This is called model selection. We will not discuss this problem.
Similar problems in other areas of mathematics

1. Parametrize a line in $\mathbb{R}^n$.
2. Parametrize a plane in $\mathbb{R}^n$.
3. Solve the underdetermined system $Ax = b$.
4. Solve the differential equation $x'' + x = 2$.
Parametrize a line in $\mathbb{R}^n$

Suppose we want to write the equation for a line that contains a vector $\mathbf{v} \in \mathbb{R}^n$:

[Figure: in $xyz$-space, the line $t\mathbf{v}$ through the origin and the parallel line $t\mathbf{v} + \mathbf{w}$, with the vectors $\mathbf{v}$, $\mathbf{w}$, and $\mathbf{v} + \mathbf{w}$ marked.]

This line, which contains the zero vector, is $t\mathbf{v} = \{ t\mathbf{v} : t \in \mathbb{R} \}$.

Now, what if we want to write the equation for a line parallel to $\mathbf{v}$? This line, which does not contain the zero vector, is $t\mathbf{v} + \mathbf{w} = \{ t\mathbf{v} + \mathbf{w} : t \in \mathbb{R} \}$.

Note that ANY particular $\mathbf{w}$ on the line will work!
Solve an underdetermined system $Ax = b$

Suppose we have a system of equations that has "too many variables," so there are infinitely many solutions. For example:

$$
\begin{aligned}
2x + 3y - 6z &= 3 \\
3x - 4y + 3z &= 1
\end{aligned}
\qquad
\text{“}Ax = b\text{ form”:}\quad
\begin{pmatrix} 2 & 3 & -6 \\ 3 & -4 & 3 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 3 \\ 1 \end{pmatrix}.
$$

How to solve:
1. Solve the related homogeneous equation $Ax = 0$ (this is the null space, $\mathrm{NS}(A)$);
2. Find any particular solution $x_p$ to $Ax = b$;
3. Add these together to get the general solution: $x = \mathrm{NS}(A) + x_p$.

This works because geometrically, the solution space is just a line, plane, etc.
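The three steps can be carried out exactly for the example system. This is a sketch under stated assumptions, not the slides' own computation: it uses the cross product of the two rows to get a null vector (valid here because $A$ is $2 \times 3$ with rank 2) and Cramer's rule with $z = 0$ to get one particular solution. The names `matvec`, `null_vec`, `x_p`, and `general_solution` are illustrative.

```python
from fractions import Fraction

A = [[2, 3, -6], [3, -4, 3]]
b = [3, 1]

def matvec(M, v):
    """Exact matrix-vector product."""
    return [sum(Fraction(m) * x for m, x in zip(row, v)) for row in M]

r1, r2 = A

# Step 1: the null space is spanned by the cross product of the rows,
# since that vector is orthogonal to both of them.
null_vec = [
    r1[1] * r2[2] - r1[2] * r2[1],
    r1[2] * r2[0] - r1[0] * r2[2],
    r1[0] * r2[1] - r1[1] * r2[0],
]

# Step 2: a particular solution with z = 0, solving the 2x2 system
# in x and y by Cramer's rule.
det = r1[0] * r2[1] - r1[1] * r2[0]
x_p = [
    Fraction(b[0] * r2[1] - r1[1] * b[1], det),
    Fraction(r1[0] * b[1] - b[0] * r2[0], det),
    Fraction(0),
]

# Step 3: general solution x = x_p + t * null_vec.
def general_solution(t):
    return [xp + t * n for xp, n in zip(x_p, null_vec)]

print("null vector:", null_vec)
print("particular solution:", x_p)
print("A * general_solution(2):", matvec(A, general_solution(2)))
```

Any other particular solution would do equally well in step 2; it would only shift the parameter $t$.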
Linear differential equations

Solve the differential equation $x'' + x = 2$.

How to solve:
1. Solve the related homogeneous equation $x'' + x = 0$. The solutions are $x_h(t) = a \cos t + b \sin t$.
2. Find any particular solution $x_p(t)$ to $x'' + x = 2$. By inspection, we see that $x_p(t) = 2$ works.
3. Add these together to get the general solution:

$$x(t) = x_h(t) + x_p(t) = a \cos t + b \sin t + 2.$$
Reverse engineering: Problem statement

Definition
A finite dynamical system (FDS) is a function $F = (f_1, \dots, f_n)\colon X^n \to X^n$ where each $f_i\colon X^n \to X$ is a local function and $|X| < \infty$ (usually $X = \mathbb{F}_2 = \{0, 1\}$).

Key fact
If $X = \mathbb{F}$ is a finite field (e.g., $\mathbb{Z}_2$, $\mathbb{Z}_3$, $\mathbb{Z}_p$, etc.), then every function $f_i\colon \mathbb{F}^n \to \mathbb{F}$ is a polynomial in $x_1, \dots, x_n$.

Goal
Given a set of data:

Input states: $s_1, \dots, s_m \in \mathbb{F}^n$
Output states: $t_1, \dots, t_m \in \mathbb{F}^n$, with $F(s_i) = t_i$,

construct the model space $\mathcal{F}_1 \times \cdots \times \mathcal{F}_n$ of all models $F = (f_1, \dots, f_n)$ that fit the data:

$$F(s_i) = (f_1(s_i), \dots, f_n(s_i)) = (t_{i1}, \dots, t_{in}) = t_i.$$

We'll find each $\mathcal{F}_1, \dots, \mathcal{F}_n$ separately.
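For $\mathbb{F}_2$, the key fact can be illustrated on the familiar logic gates. The sketch below is an illustration only (the dictionary `gates` and the helper `agrees` are hypothetical names): it checks by brute force that each gate agrees with a polynomial over $\mathbb{F}_2$ on every input.

```python
from itertools import product

# Each entry pairs a Boolean operation with a candidate polynomial over F_2.
# Arithmetic is done on the integers 0 and 1 and reduced mod 2.
gates = {
    "NOT x":   (lambda x, y: int(not x),   lambda x, y: (1 + x) % 2),
    "x AND y": (lambda x, y: int(x and y), lambda x, y: (x * y) % 2),
    "x OR y":  (lambda x, y: int(x or y),  lambda x, y: (x + y + x * y) % 2),
    "x XOR y": (lambda x, y: int(x != y),  lambda x, y: (x + y) % 2),
}

def agrees(boolean_fn, polynomial):
    """True if the two functions coincide on all of F_2 x F_2."""
    return all(
        boolean_fn(x, y) == polynomial(x, y)
        for x, y in product((0, 1), repeat=2)
    )

results = {name: agrees(bf, poly) for name, (bf, poly) in gates.items()}
for name, ok in results.items():
    print(name, "matches its polynomial:", ok)
```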
Reverse engineering: How to find $\mathcal{F}_j$

We wish to find the set $\mathcal{F}_j$ of all local functions (polynomials!) $f_j$ that fit the data:

$$\mathcal{F}_j = \{ f_j : f_j(s_1) = t_{1j}, \dots, f_j(s_m) = t_{mj} \}.$$

Define the set $I$ (it is actually an "ideal" of the polynomial ring $\mathbb{F}[x_1, \dots, x_n]$)

$$I = \{ h : h(s_i) = 0 \text{ for all } i = 1, \dots, m \} = \{ \text{all polynomials that vanish on the data} \}.$$

Theorem
The set of polynomials that fit the data at node $j$ is

$$\mathcal{F}_j = f_j + I = \{ f_j + h : h \in I \},$$

where $f_j$ is any one particular polynomial that fits the data.

Thus, to find $\mathcal{F}_j$, we need to do two things:
1. Find the ideal $I$ (all solutions to $\{ f_j(s_i) = 0 \ \forall i \}$);
2. Find any polynomial $f_j$ that fits the data (one solution to $\{ f_j(s_i) = t_{ij} \ \forall i \}$).
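The theorem can be sanity-checked by brute force in a small case. The sketch below makes two assumptions that are not in the slides: it works over $\mathbb{F}_2$ with $n = 3$ using the transitions $(1,1,0) \to (1,0,1) \to (0,0,0) \to (0,0,0)$ as data, and it represents each polynomial by the function it defines (a tuple of 8 values), so two polynomials that agree everywhere are counted once. All variable names are illustrative.

```python
from itertools import product

states = list(product((0, 1), repeat=3))  # all 8 points of F_2^3

# Data: input states s_i and output states t_i.
inputs = [(1, 1, 0), (1, 0, 1), (0, 0, 0)]
outputs = [(1, 0, 1), (0, 0, 0), (0, 0, 0)]
j = 0  # node 1 (0-based index)

# A function F_2^3 -> F_2 is stored as a tuple of 8 values, one per state.
all_functions = list(product((0, 1), repeat=len(states)))

def value(f, s):
    """Value of the function f at the state s."""
    return f[states.index(s)]

# F_j: functions that fit the data at node j.
fits = {
    f for f in all_functions
    if all(value(f, s) == t[j] for s, t in zip(inputs, outputs))
}

# I: functions that vanish on every input state.
vanishing = {
    h for h in all_functions
    if all(value(h, s) == 0 for s in inputs)
}

# Pick any one function that fits and form the coset f_j + I (mod 2).
particular = min(fits)
coset = {
    tuple((a + b) % 2 for a, b in zip(particular, h))
    for h in vanishing
}

print("|F_j| =", len(fits), " |I| =", len(vanishing))
print("F_j equals f_j + I:", coset == fits)
```

Replacing `min(fits)` by any other element of `fits` gives the same coset, which is the content of "any one particular polynomial".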
Reverse engineering: How to find $I$ and $f_j$

1. Finding $I$: Define $I(s_i)$ to be the set of polynomials that vanish on $s_i$:

$$
\begin{aligned}
I(s_i) &= \{ \text{all polynomials } h_i \text{ such that } h_i(s_i) = 0 \} \\
&= \{ (x_1 - s_{i1}) g_1(x) + (x_2 - s_{i2}) g_2(x) + \cdots + (x_n - s_{in}) g_n(x) \} \\
&= \langle x_1 - s_{i1},\ x_2 - s_{i2},\ \dots,\ x_n - s_{in} \rangle
\end{aligned}
$$

Clearly, the set $I$ of polynomials that vanish on all $s_i$ (for $i = 1, \dots, m$) is

$$I = \bigcap_{i=1}^{m} I(s_i).$$

2. Finding $f_j$: There are many algorithms. Lagrange interpolation is one of them. In this lecture, we will learn another method, and do a hands-on example. We'll get started with this now.
Finding $f_j$ (one method)

Work over $\mathbb{F} = \mathbb{Z}_p$. For each data point $s_i$ ($i = 1, \dots, m$), we'll construct an $r$-polynomial that has the following property on the data:

$$
r_i(x) =
\begin{cases}
1 & x = s_i \\
0 & x = s_k \text{ for some } k \neq i
\end{cases}
$$

Once we have these, the polynomial $f_j(x)$ we seek will be

$$f_j(x) = t_{1j}\, r_1(x) + t_{2j}\, r_2(x) + \cdots + t_{mj}\, r_m(x).$$

One way to construct the $r$-polynomials:

$$r_i(x) = \prod_{\substack{k=1 \\ k \neq i}}^{m} b_{ik}(x), \qquad \text{where } b_{ik}(x) = (s_{i\ell} - s_{k\ell})^{p-2} (x_\ell - s_{k\ell})$$

and $\ell$ is the first coordinate in which $s_i$ and $s_k$ differ. This works because $b_{ik}(s_k) = 0$, while $b_{ik}(s_i) = (s_{i\ell} - s_{k\ell})^{p-1} = 1$ by Fermat's little theorem.
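The construction translates almost line by line into code. The following is a minimal sketch assuming $\mathbb{F} = \mathbb{Z}_p$ with $p$ prime and distinct input states; the helper names `first_difference`, `r_polynomial`, and `interpolate` are hypothetical, and each polynomial is represented as a Python function that evaluates it rather than as a symbolic expression. The sample data are the Boolean transitions $(1,1,0) \to (1,0,1) \to (0,0,0) \to (0,0,0)$ with $p = 2$.

```python
def first_difference(s_i, s_k):
    """Index of the first coordinate in which s_i and s_k differ."""
    for ell, (a, b) in enumerate(zip(s_i, s_k)):
        if a != b:
            return ell
    raise ValueError("data points must be distinct")

def r_polynomial(i, inputs, p):
    """Return r_i as a function on states, with arithmetic mod p."""
    s_i = inputs[i]

    def r(x):
        result = 1
        for k, s_k in enumerate(inputs):
            if k == i:
                continue
            ell = first_difference(s_i, s_k)
            # (s_il - s_kl)^(p-2) is the inverse of (s_il - s_kl) mod p.
            coeff = pow(s_i[ell] - s_k[ell], p - 2, p)
            result = result * coeff * (x[ell] - s_k[ell]) % p
        return result

    return r

def interpolate(j, inputs, outputs, p):
    """Return f_j = sum_i t_ij * r_i as a function on states."""
    rs = [r_polynomial(i, inputs, p) for i in range(len(inputs))]

    def f(x):
        return sum(t[j] * r(x) for t, r in zip(outputs, rs)) % p

    return f

# Sample data over Z_2 (0-based node index j).
inputs = [(1, 1, 0), (1, 0, 1), (0, 0, 0)]
outputs = [(1, 0, 1), (0, 0, 0), (0, 0, 0)]
p = 2

for j in range(3):
    f_j = interpolate(j, inputs, outputs, p)
    print("node", j + 1, "values on the data:", [f_j(s) for s in inputs])
```

Evaluating instead of expanding keeps the sketch short; the resulting $f_j$ is only one element of $\mathcal{F}_j$, and other choices of $\ell$ would in general give a different one.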
An example

Consider the following time series in a 3-node system over $\mathbb{Z}_5$:

$$
\begin{aligned}
s_1 &= (2, 0, 0) = t_0 \\
s_2 &= (4, 3, 1) = t_1 \\
s_3 &= (3, 1, 4) = t_2 \\
s_4 &= (0, 4, 3) = t_3
\end{aligned}
$$

For reference, here are the input vectors $s_i$ and output vectors $t_i$:

$$
\begin{aligned}
s_1 &= (s_{11}, s_{12}, s_{13}) = (2, 0, 0), & t_1 &= (t_{11}, t_{12}, t_{13}) = (4, 3, 1), \\
s_2 &= (s_{21}, s_{22}, s_{23}) = (4, 3, 1), & t_2 &= (t_{21}, t_{22}, t_{23}) = (3, 1, 4), \\
s_3 &= (s_{31}, s_{32}, s_{33}) = (3, 1, 4), & t_3 &= (t_{31}, t_{32}, t_{33}) = (0, 4, 3).
\end{aligned}
$$

Note that $s_1$, $s_2$, and $s_3$ differ pairwise in the $\ell = 1$ coordinate (their first entries are 2, 4, and 3), so this $\ell$ will work for each of $r_1$, $r_2$, and $r_3$.
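Because $\ell = 1$ works for every pair, each $r_i$ and each $f_j$ is a quadratic polynomial in $x_1$ alone, and its coefficients can be computed directly. The sketch below is one way to do that, not the slides' own worked solution: polynomials in $x_1$ are stored as coefficient lists `[c0, c1, c2]` (constant term first), and the helper names `poly_mul`, `r_coeffs`, `f_coeffs`, and `evaluate` are illustrative.

```python
P = 5  # work over Z_5

inputs = [(2, 0, 0), (4, 3, 1), (3, 1, 4)]
outputs = [(4, 3, 1), (3, 1, 4), (0, 4, 3)]

def poly_mul(a, b):
    """Multiply two coefficient lists (lowest degree first) mod P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] = (out[i + k] + ai * bk) % P
    return out

def r_coeffs(i):
    """Coefficients of r_i as a polynomial in x1 over Z_5."""
    result = [1]
    for k in range(len(inputs)):
        if k != i:
            a_i, a_k = inputs[i][0], inputs[k][0]
            c = pow(a_i - a_k, P - 2, P)
            # b_ik(x) = c * (x1 - a_k) = c*x1 - c*a_k
            result = poly_mul(result, [(-c * a_k) % P, c])
    return result

def f_coeffs(j):
    """Coefficients of f_j = sum_i t_ij * r_i (0-based node index j)."""
    total = [0, 0, 0]
    for i in range(len(inputs)):
        r = r_coeffs(i)
        total = [(tc + outputs[i][j] * rc) % P for tc, rc in zip(total, r)]
    return total

def evaluate(coeffs, x1):
    """Evaluate a coefficient list at x1, mod P."""
    return sum(c * x1**d for d, c in enumerate(coeffs)) % P

for i in range(3):
    print("r_%d coefficients:" % (i + 1), r_coeffs(i))
for j in range(3):
    print("f_%d coefficients:" % (j + 1), f_coeffs(j))
```

With this choice of $\ell$ the computation gives $f_1 = x_1^2 + x_1 + 3$, $f_2 = 3x_1^2 + x_1 + 4$, and $f_3 = 2x_1^2 + 2x_1 + 4$ over $\mathbb{Z}_5$; each is one particular polynomial fitting the data at its node, to which any element of $I$ may be added.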