Transfer entropy for network reconstruction in a simple dynamical model
Roy Goodman, NJIT Dept. of Mathematical Sciences
How I Spent My Sabbatical
• My location
• My commute
• My host: Mau Porfiri
The Porfiri Lab
A big group working in a lot of areas, both theoretical and through laboratory experiments:
• Fluid mechanics: fluid-structure interactions during water impact
• Artificial muscles and soft robotics
• Telerehabilitation
• Network-based modeling of infectious diseases
• Fish schooling
• Using robotics and zebrafish to study substance-abuse disorders
• Information-theoretic analysis of social science datasets
• more…
A unifying theme in this work is using methods from information theory for modeling and analysis.
What is this talk about?
A lot of words to define here: Transfer entropy for network reconstruction in a simple dynamical model
• Network: a graph composed of vertices and edges, the subject at the heart of graph theory
• Transfer entropy: a quantity describing the transfer of information from one evolving variable to another, from information theory
• The dynamical model: to be described, a simple probabilistic dynamics
• Reconstruction: figure out properties of the graph based on the dynamics
18 months ago I knew essentially nothing about any of this.
<latexit sha1_base64="LwXKVndP6bQLhYuhEREnmP3bS6A=">AD3icbVJNixMxGE7Hr3X82K4evUSHiqCUSamsF6GwCJ5kV9rdhU4pmcw7bdgkMySZXcowZ8978S948urexKv+AEH8M2baKp1dXwg8zvk3mSvHEuLFh+KvlXbt+4+atrdv+nbv37m+3dx4cmqzQDEYsE5k+jqkBwRWMLcCjnMNVMYCjuKTvVo/OgVteKaGdpHDRNKZ4iln1Dpq2n4cJSAsnZaRluWbqsKvMXkedvtukRp3X07bQdgNl4WvArIGweDpwe8fH8/7U93Wj+jJGOFBGWZoMaMSZjbSUm15UxA5UeFgZyEzqDsYOKSjCTcnmWCnck+A024pi5es39m0DMmkrDVQrGoqJk0gpYVwP5txIXhmg0lFAL0qWxG2NwPd/A7gGSdYOgujOC9TDENFjZNLu2KrLAfKThjmZRUJWVEY1ON3YaRgNRGwl29DUik+WxuI1/Vc1+rhToatz756BqJiAgL4LeX9eSaUQuqTRmIWMXV1I7N5e1mvyfNi5s+mpScpUX69M6LS0EthmuhwMnXAOzYuEAZq798JsTjVl1o2Q7yaBXH73q+Cw1yX9bv+ABIP3aFVb6BF6gp4hgnbRAL1F+2iEGPqAPqMv6MI79y68r963VavXWnseokZ53/8At2H+0w=</latexit> <latexit sha1_base64="cSFswFU9E1zR8dRebv0rXnmP4vU=">ADtnicbVLbtQwFE0aHiW8prBkYxENYoGiZJhCWVRU6oYVKmimU2mSVo5zk7HqOJHtFEZRvoaf4Quo+BuczEPJlCtd6+YcH/vE90YFo1J53l9z7p3/8HD/Uf24ydPnz0fHLw4l3kpCExJznJxEWEJjHKYKqoYXBQCcBYxmEXpw0/uwEhac4nalAmOGU04QSrDR0Nfg9Q8coiClvIoyrAT9WdseoM8932zbtI9PGzLILC9Lr7NDdGX9alxu37okH5Luv72mnELdpXup90D3XHrxQ6Ax1vTVwPHc7020N3CXxfO51v7uPh1a59dHZh/gjgnZQZcEYalnPteocIKC0UJg9oOSgkFJtc4hbkuOc5AhlX75DUaiRGS50coVa1B52JRM/rBoOKn7jExiSHDJ9GUpZYyWsr+hgpKBuMn6FrnoSH6ChCvHUx0X310mnMiQEFXpN2uwBrZAYcfJM8yrB8twJGs5/rAgEGiAqYnRDl+IGi6UIFovur+fso5iHo+2iowTxk4/jtntFG1SM9yhTMpl1mk7eomLeQu14D/4+alSo7CivKiXP+t5pKSIZWjZoZRTAUQxZa6wERQ3S9EFlhgovSk23oS/N2+3y3OR64/dsfPOfku7GKfeOV8dp4a/jGR+PE+GKcGVODmK45MUPz0jqyLi2w0tXWPXOteWn0wir+AZh9GeU=</latexit> <latexit sha1_base64="3oRIr54WmvCXtvbZVTLQLfwvi30=">ADCnicbVJNi9QwGM7Ur7V+zepRD8Ey4EG6bCgF3FgL5klZmdhbaUNH3byW6SliRdGUpBPsXvHt1b7I38Tco/gd/hOnMKNdXyg8eZ73SZ8kb1Jyps1o9KvnXLl67fqNnZvurdt37t7r794/1EWlKMxowQt1lBANnEmYGWY4HJUKiEg4zJOT/Vafn4LSrJBTsywhEiSXLGOUGEvF/UdhCtyQuGYNfoFDXYm4Pm7w3BLHTdz3RsPRqvBl4G+AN3n57dP3+fvD+Ld3o8wLWglQBrKidaBPypNVBNlGOXQuGloST0hOQWCiJAB3Vq2M0eGCZFGeFsp80eMW6g23L1I/qVgNJm6isxQyUnH7s5xzirdbaih4qBORTfC9n54gF8DpJsEU3tXPt4vJFVgYNtk067JBruhHe0EILItA5JopvAbhyEzI7a0bzw8VyxcmVO2q6fYzKUE1wfifg8icg+c/9cZ/XSumE7kmQulSGxcQcxCX9Ra8n9aUJnseVQzWVab01otqzg2BW7nAqdMATV8aQGhitn3wnRBFKHGTo9rJ8G/+O6XweF46O8N9743uQtWtcOeogeoyfIR8/QBL1CB2iGKPqAPqMv6Mz56Jw5X53zdavT23geoE453/8AvsMArQ=</latexit> Graph Theory Graph theory is central to the mathematics of Computer Science, describing the connections between interacting agents. A graph is de fi ned by a set V of vertices, connected by a set E of edges. The graph at right is both directed and weighted . The weight matrix has entries 0 0.3 0 0 0.55 0 W ij de fi ned as follows 0 0 0 0 0 0 • If an edge exists from node j to node i the 0 0.3 0 0 0 0 W = 0 0.4 0.6 0 0 1 entry is the associated weight • If no edge exists, the entry zero. 0.1 0 0.4 1 0 0 0.9 0 0 0 0.45 0 An important notion for us will be the weighted incoming degree . δ i = P j W ij In this example δ E = 1 + 0.4 + 0.1 = 1.5
<latexit sha1_base64="h0CN0sprwUAmbI4BC5SLR0I3FA=">ADUnicdVJdixMxFE1bP9a6ul19CU4FrqgpVMWdB8KhUVYX6RKu1uYlJLJ3JmGzWSGJLNuGeaX+BP8Hz74LIJ/wGdf3SczbZXOrl4I3JxzT3KSe/1UcG16ve+1euPW7Tt3d+417+8+eLjX2n90qpNMZiwRCRq6lMNgkuYG4ETFMFNPYFnPnxyV/dgFK80SOzTKFWUwjyUPOqLHQvBWcdKYHeIBJTM1CxfnrwntjkdngBdFZPM8vMeFyzTIq8mlRYJKkoKhJlKQx5CNVdKaDywNMRBL9h5u3nF63twp8M3E3iTM8+nz049PHZ6P5fu0bCRKWxSANE1Rrz+2lZpZTZTgTUDRJpiGl7JxG4Nm0vE3P8tV3FLhtkQCHibJLGrxCm+1tydid5SUHkhVRocBhDQT9rKIC8EzXS3IROgLuKqhe3zcBu/BQg2Dsb2z18nEimwMC2yLpdgwVuEgkfWBLHVAY5ob4uPHsgERAaImz3jOMSxaOFIarcFdV6LiWowuv/VAZCXDc507/j2qFVCznNZ6GfvWbtlbfZ0rwX9xXmbCV7OcyzTbvNZyYSawSXA5XzjgCpgRS5tQprjtF2YLqigzdgqbdhLc632/mZz2u+5h9/Cd6wzfo3XsoCfoKeogF71EQ3SCRmiCGPqCfqJf6Kr+tX7VqDUa69J6baN5jCrR2P0NZI8Y0w=</latexit> <latexit sha1_base64="+LxPrbxaQrwLNIb/AbHZefKoL54=">ADGXicbVJNj9MwEHXD1I+tgtHLhZRoRVQNdVKcFmp0l7gpqd1eq8pxJqm1thPZzrJVlB/BmR/ClT0gEFckJP4NTltQs8tIkcbvzXOeZybMBDe23/d8K5dv3Hz1s7t5p279+7vtvYeHJk01wmLBWpPgmpAcEVTCy3Ak4yDVSGAo7D08OKPz4DbXiqxnaZwUzSRPGYM2odNG89e9M57+ID/IKINCmIgNh2iKR2oWUx0mXn5OC8SzRPFrZbzlt+v9dfBb6aBJvEHw7I029fn+yO5nuNXyRKWS5BWSaoMdOgn9lZQbXlTEDZJLmBjLJTmsDUpYpKMLNi9aoStx0S4TjV7lMWr9Bme1syDmZFxYFiZ0xcQxzYX7WcKF4LmpFxSQC9Bnsm5h+z7cxm8Bo2DsWtdgA9TxTRY2BY5t2uwxE2i4ANLpaQqKgNTl1F656SoQbgvWDdSuJrk5lvZ4rBbqcDv4pqEoE+MFzf/BXtUJqlgsqjVnK0NmthmYucxX4P26a2/jVrOAqyzevdVycC2xTXK0JjrgGZsXSJZRp7uaF2YJqyqxbpqbhODy3K8mR4NesN/bfxf4w/doHTvoEXqMOihAL9EQvUYjNEMfUSf0Rd04X3yLrzv3o91qdfYaB6iWng/wC7sQOI</latexit> Information Theory • Initiated by Claude Shannon’s 1948 “A Mathematical Theory of Communication” • Originally used to study the transmission of signals down noisy channels and to develop optimal strategies for encoding information • Recently become popular tool for analyzing dynamical systems Fundamental quantity: Consider a discrete random variable X drawn from a sample space X The information associated with the event X=x measures how “surprising” it is that X=x I ( x ) = − log ( Pr ( X = x )) The Shannon entropy of the random variable is the expectation value fo the information H ( X ) = E [ I ( X )] = − P x ∈ X Pr ( X = x ) log Pr ( X = x )
Basic example: biased coin toss
Consider a biased coin that gives heads with probability p and tails with probability 1 − p.
• When p ≈ 0 or p ≈ 1, the entropy is small, since surprising outcomes rarely occur.
• When p ≈ 0.5, the entropy is large, since both outcomes are equally likely.
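A quick numerical check of these claims; this is a sketch with an assumed binary_entropy helper, not code from the talk.

```python
import numpy as np

def binary_entropy(p):
    """Entropy of a biased coin with heads probability p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"p = {p:4.2f}  H = {binary_entropy(p):.3f} bits")
# Near p = 0 or p = 1 the entropy is close to 0; it peaks at 1 bit when p = 0.5.
```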
<latexit sha1_base64="yAy5hPIWO/AbZjOjDkKU1KXu8vw=">AETXicjVPfT9swE4pG6z7AWyPezmtYqKCoQYhbS9ISGgSTxOTCrSqS+W4TrFwnMp2GJXxn7fnvU3a37DX8TZNuzQZIsCmnZTk83d3n8+5czSRwth2+1trj7/4OHC4qPG4ydPny0trzw/MmGT9kqUx1N6KGS6H4oRVW8u5Ec5pEkh9HZ3u5/icayNS1bHTCR8kdKxELBi1SA1XaieARhJqT3XiOu/9iesRm0LXw2vYgf217pdD1uX+Gm13lTXG72cBEIauUYeTkyWDB2+I2MpO3MXQ7fugQhV7MCodChMCFzcR06rZM97pCWPLXHkQBewD7gzrEPY2oFCfSNn8hWi3gyhjhbjUzsAYkXCDWamYyCxpgz+pXT5dyX/n4nX4QUgfrjcbG+2ZwZ3QViCZlDaAXbkKxmlLEu4skxSY/phe2IHjmormOS+QTLDJ/hz6Zj3ESqKJxy42SR4WEVmBHGq8VEWZmxj9WZKJxy43McV81WPiUc8pnEzcZCSpGZaoDjmeT6PKmWcFMPVuED56Oyg6OWwh7qWKaW34zCastSA8NovgnliYJVSNHaGR8HwWLrkscXNsMy1+p85WvxguluPb9resMqsaSN8ON5tafrBlTKdnRxJhpEmG5+aiZ276cvM/Xz2z8buCEmTladEXZxLwuRXC0ZCc2blFAFlWmC/gJ1SnDqLF7CBkxDe7vtdcLS1GW5vbn/cbu7uljOxGLwMXgVrQRi8DXaD/eAgOAxY7XPte+1H7ar+pX5V/1n/VYTO1cqcF0HF5hd+Aw1+YS0=</latexit> <latexit sha1_base64="FwI3fry1QsFcWeCEHvCkZP1kRU=">AD1nicvVNa9tAEJWifqTqR5z2MtS4eKAayxjaC+GQC45hbTYiYzXmNVqJC9ZrcTuKo1RlFvptb+u0H/Qn9GV4xYrKT12QDB6b2bnLW82zDlTut/Ye84Dx4+erz7xH367PmLvdb+yzOVFZLChGY8k0FIFHAmYKZ5hDkEkgacjgPL45q/vwSpGKZGOtVDvOUJILFjBJtoEXrJzruBN3pAXo7Qu+wKtJFeYUwEwinRC8p4WVQdGqCU2rCuEsB0l0JgVJoTyVScYXWno9UBwjxL/sljt56/b+mXtf8ouX1e/1oPuJv0k8axOni37O4yWqQgNOVEqZnfz/W8JFIzyqFycaEgJ/SCJDAzaT1Rzcu1JRVqGyRCcSbNJzRao257u2Xsz8uaA0GrJqPiCGJScDMsYZyzQjULSig4yMu0KWH7PNRGJwDRsHY+O6jo0xQCRq2m4zaW7BCLhbwmWZpSkRUYhKqamYOxBxijbnZIO35WLJkqbGs/6pmPRMCZDUb/OkgIuHg+V1v8LtrjTQklyRVapWGRm7tsbrL1eDfuFmh4w/zkom82NzWcHBkc5QveMoYhKo5iuTECqZ8QvRJZGEavMSXLMJ/l3f7ydng54/7A0/Dr3DT5ud2LVeW2+sjuVb761D69g6tSYWtU9sbVf2jRM4N84X5+t6Y696XlNcL59guXUEHJ</latexit> Transfer entropy Schreiber 2000 Consider two random variables X and Y . De fi ne the joint entropy and the conditional entropy: X H ( X, Y ) = − Pr ( X = x, Y = y ) log Pr ( X = x, Y = y ) x ∈ X ,y ∈ Y X H ( X | Y ) = − Pr ( X = x, Y = y ) log Pr ( X = x | Y = y ) x ∈ X ,y ∈ Y Entropy can be de fi ned analogously for stationary stochastic processes Transfer entropy from Y to X is the di ff erence between the entropy of X(t+1) conditioned on X(t) and that conditioned on both X(t) and Y(t) TE Y → X = H ( X ( t + 1 ) | X ( t )) − H ( X ( t + 1 ) | X ( t ) , Y ( t )) � � Pr [ X ( t + 1 ) = x + , X ( t ) = x, Y ( t ) = y ] × log Pr [ X ( t + 1 ) = x + | X ( t ) = x, Y ( t ) = y ] X = Pr [ X ( t + 1 ) = x + | X ( t ) = x ] x + ∈ X x ∈ X y ∈ Y Transfer entropy measures the reduction in the uncertainty of predicting X(t + 1) from both X(t) and Y(t) relative to predicting it from X(t) alone.
Contrived transfer entropy example
Source: "A Tutorial for Information Theory in Neuroscience," Nicholas M. Timme and Christopher Lapish, 2018
• Series 1: X & Y uncorrelated: TE ≈ 0
• Series 2 & 3: X tends to fire before Y: TE large
• Series 4: X(t+1) determined entirely from X(t), no improvement from knowing Y(t): TE = 0
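A small synthetic demo in the spirit of Series 2 & 3, reusing the transfer_entropy sketch above; the binary series and the coupling probability 0.9 are arbitrary choices for illustration, not the series from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

# Series where X tends to "fire" one step before Y: y(t+1) copies x(t) with high probability.
x = rng.integers(0, 2, size=T)
y = np.empty(T, dtype=int)
y[0] = rng.integers(0, 2)
y[1:] = np.where(rng.random(T - 1) < 0.9, x[:-1], rng.integers(0, 2, size=T - 1))

print(transfer_entropy(y, x))   # TE_{X->Y}: large, since X(t) helps predict Y(t+1)
print(transfer_entropy(x, y))   # TE_{Y->X}: near zero, since X is i.i.d. and Y adds nothing
```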