Exploring the Use of TensorFlow to Predict Connection Table Information within Chemical Structures
Brodie Schroeder
Machine Learning Basics
● Gives "computers the ability to learn without being explicitly programmed." – Arthur Samuel
● Goal is to solve problems with "generalized" algorithms that apply to many different problems
● Unsupervised and supervised learning
● Artificial Neural Networks and Deep Learning
[Image of a handwritten digit] = ?
Basic Artificial Neural Network
y = <activation function>(Wx + b)
Common activation functions: Softmax, Sigmoid, ReLU, ...
[Image of a handwritten digit] = ?
y = [0,0,1,0,0,0,0,0,0,0] → The image appears to be a '2'
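The single-layer computation above, y = activation(Wx + b), can be sketched in plain NumPy. This is a minimal illustration with toy shapes, not the network used later in the deck:

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z), applied elementwise
    return np.maximum(0.0, z)

def dense_layer(x, W, b, activation=relu):
    # One fully connected layer: y = activation(Wx + b)
    return activation(W @ x + b)

# Toy example: 3 inputs -> 2 outputs (weights chosen arbitrarily)
W = np.array([[1.0, -1.0, 0.5],
              [0.0,  2.0, -0.5]])
b = np.array([0.1, -0.2])
x = np.array([1.0, 2.0, 3.0])

y = dense_layer(x, W, b)  # -> array([0.6, 2.3])
```

Stacking several such layers, with a final activation that maps outputs into [0, 1], is all the later model does.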
Goal for this Project Given the XYZ coordinates of atoms and their bonding information within a chemical structure, predict a bonding table for all atoms.
Example Dataset

benzene
ACD/Labs0812062058

  6  6  0  0  0  0  0  0  0  0  1 V2000
   1.9050  -0.7932   0.0000 C
   1.9050  -2.1232   0.0000 C      XYZ coordinates and the atom type.
   0.7531  -0.1282   0.0000 C      This will be the 'x' input in our model.
   0.7531  -2.7882   0.0000 C
  -0.3987  -0.7932   0.0000 C
  -0.3987  -2.1232   0.0000 C
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0              Connection information for bonding
  4  2  2  0  0  0  0              between atoms. We will use this to
  5  3  1  0  0  0  0              train our model.
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
M  END
$$$$
Parsing SDF Files

Each row holds the atomic number of the atom, followed by its Euclidean distances to the other atoms:

  6.0,  0.0,   4.312, 6.223, 7.321, 3.221, 9.023, 2.345, 1.652, 4.791,  C
  6.0,  1.542, 0.0,   4.222, 8.231, 6.321, 1.999, 4.562, 8.345, 2.221,  C
  6.0,  2.221, 5.012, 0.0,   4.223, 6.723, 7.232, 9.821, 3.323, 4.124,  C
  8.0,  7.010, 3.011, 7.221, 0.0,   5.434, 7.777, 8.421, 5.341, 9.981,  O
  6.0,  4.312, 3.221, 3.563, 7.212, 0.0,   6.521, 7.623, 3.253, 7.456,  C
  8.0,  2.333, 5.321, 6.872, 6.454, 8.991, 0.0,   4.221, 6.213, 4.343,  O
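Building this feature matrix amounts to computing all pairwise Euclidean distances from the XYZ block and prepending each atom's atomic number. A minimal sketch (the `atoms` list and variable names are illustrative, not the presentation's actual parser; coordinates are the first three carbons from the benzene example):

```python
import numpy as np

# Hypothetical parsed atoms: (atomic number, x, y, z)
atoms = [
    (6, 1.9050, -0.7932, 0.0),
    (6, 1.9050, -2.1232, 0.0),
    (6, 0.7531, -0.1282, 0.0),
]

coords = np.array([a[1:] for a in atoms])        # (n, 3) XYZ coordinates
nums = np.array([a[0] for a in atoms], float)    # atomic numbers

# Pairwise Euclidean distances via broadcasting:
# diff[i, j] = coords[i] - coords[j]
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))          # (n, n), zero diagonal

# Prepend the atomic-number column -> one row per atom: [Z, d1, ..., dn]
features = np.hstack([nums[:, None], dist])       # (n, n + 1)
```

The broadcasting form avoids an explicit double loop, which is one route to the "improve code for calculating distances" item on the future-work slide.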
Parsing SDF Files

Boolean array of connections (left), built from the SDF bond block (right):

  0.0, 1.0, 1.0, 0.0, 0.0, 0.0      2  1  1
  1.0, 0.0, 0.0, 1.0, 0.0, 0.0      3  1  2
  1.0, 0.0, 0.0, 0.0, 1.0, 0.0      4  2  2
  0.0, 1.0, 0.0, 0.0, 0.0, 1.0      5  3  1
  0.0, 0.0, 1.0, 0.0, 0.0, 1.0      6  4  1
  0.0, 0.0, 0.0, 1.0, 1.0, 0.0      6  5  2
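Turning the bond block into that boolean array is a small symmetric-adjacency construction. A sketch, assuming the bond rows have already been parsed into `(atom_i, atom_j, bond_order)` tuples (the names here are illustrative):

```python
import numpy as np

# Bond block rows from the benzene example: (atom_i, atom_j, bond_order)
bonds = [(2, 1, 1), (3, 1, 2), (4, 2, 2), (5, 3, 1), (6, 4, 1), (6, 5, 2)]
n_atoms = 6

adj = np.zeros((n_atoms, n_atoms))
for i, j, order in bonds:
    # SDF atom indices are 1-based; the boolean array ignores bond order,
    # recording only whether two atoms are connected
    adj[i - 1, j - 1] = 1.0
    adj[j - 1, i - 1] = 1.0
```

Note that bond order (single vs. double) is discarded here, matching the slide: the model predicts only connectivity, not bond type.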
Input and Training Data

x =  6.0, 0.0,   4.312, 6.223, 7.321, 3.221, 9.023,      y_ =  0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
     6.0, 1.542, 0.0,   4.222, 8.231, 6.321, 1.999,            1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
     6.0, 2.221, 5.012, 0.0,   4.223, 6.723, 7.232,            1.0, 0.0, 0.0, 0.0, 1.0, 0.0,
     8.0, 7.010, 3.011, 7.221, 0.0,   5.434, 7.777,            0.0, 1.0, 0.0, 0.0, 0.0, 1.0,
     6.0, 4.312, 3.221, 3.563, 7.212, 0.0,   6.521,            0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
     8.0, 2.333, 5.321, 6.872, 6.454, 8.991, 0.0,              0.0, 0.0, 0.0, 1.0, 1.0, 0.0,

     (6 x 7 = 42 values)                                       (6 x 6 = 36 values)
Input and Training Data
● Build two Python lists
● List 'a' is a list of flattened 2-D NumPy matrices containing Euclidean distances and atom type
● List 'b' is a list of flattened 2-D NumPy matrices containing bonding information for all atoms
● Matrix size is capped at 28 x 29 and 28 x 28 respectively (only molecules with 28 or fewer atoms are included)
● If the molecule has fewer than 28 atoms, the matrix is padded with zeros
● a[n] corresponds to b[n]
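The cap-and-pad step above fixes the input and output sizes used by the model (28 × 29 = 812 and 28 × 28 = 784). A minimal sketch of that step (the function name is illustrative):

```python
import numpy as np

MAX_ATOMS = 28

def pad_and_flatten(mat, n_cols):
    # Zero-pad a per-molecule matrix up to 28 rows x n_cols columns,
    # then flatten to a 1-D vector matching the model's fixed input size.
    n = mat.shape[0]
    if n > MAX_ATOMS:
        return None  # molecules with more than 28 atoms are skipped
    padded = np.zeros((MAX_ATOMS, n_cols))
    padded[:n, :mat.shape[1]] = mat
    return padded.ravel()

# Distance/atom-type matrix is n x (n+1); bond matrix is n x n.
# For the 6-atom benzene example (dummy values here):
x_vec = pad_and_flatten(np.ones((6, 7)), 29)   # length 28 * 29 = 812
y_vec = pad_and_flatten(np.ones((6, 6)), 28)   # length 28 * 28 = 784
```

Zero-padding keeps every molecule the same size, at the cost of the network spending capacity on padding positions that are always zero.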
What is TensorFlow?
● "Open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them."
● Provides an API that makes it easy to set up, design, and train deep learning models.
Building the Model

x = tf.placeholder(tf.float32, [None, 812])
y_ = tf.placeholder(tf.float32, [None, 784])

# Weights and biases must be defined before the layers that use them
W1 = tf.Variable(tf.truncated_normal([812, 784], stddev=0.1))
b1 = tf.Variable(tf.truncated_normal([784], stddev=0.1))
layer1 = tf.nn.relu(tf.add(tf.matmul(x, W1), b1))

W2 = tf.Variable(tf.truncated_normal([784, 784], stddev=0.1))
b2 = tf.Variable(tf.truncated_normal([784], stddev=0.1))
layer2 = tf.nn.relu(tf.add(tf.matmul(layer1, W2), b2))

W3 = tf.Variable(tf.truncated_normal([784, 784], stddev=0.1))
b3 = tf.Variable(tf.truncated_normal([784], stddev=0.1))
layer3 = tf.nn.relu(tf.add(tf.matmul(layer2, W3), b3))

W = tf.Variable(tf.truncated_normal([784, 784], stddev=0.1))
b = tf.Variable(tf.truncated_normal([784], stddev=0.1))
logits = tf.add(tf.matmul(layer3, W), b)
y = tf.nn.sigmoid(logits)

Building the Model

# sigmoid_cross_entropy_with_logits applies the sigmoid itself,
# so it must be given the pre-sigmoid logits, not y
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

a, b = get_batch()
train_len = len(a)

# Round the sigmoid outputs to 0/1 before comparing with the labels;
# comparing raw floats for equality would almost never match
correct_prediction = tf.equal(y_, tf.round(y))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Training
for i in range(train_len):
    batch_xs = a[i]
    batch_ys = b[i]
    _, loss, acc = sess.run([train_step, cross_entropy, accuracy],
                            feed_dict={x: batch_xs, y_: batch_ys})
    print("Loss= " + "{:.6f}".format(loss) + " Accuracy= " + "{:.5f}".format(acc))
Building the Model

# Test trained model
cumulative_accuracy = 0.0
for i in range(train_len):
    acc_batch_xs = a[i]
    acc_batch_ys = b[i]
    cumulative_accuracy += accuracy.eval(
        feed_dict={x: acc_batch_xs, y_: acc_batch_ys})
print("Test Accuracy= {}".format(cumulative_accuracy / train_len))
Results thus far...
Test Accuracy = 0.865 (approx. 10,000 training samples)
Future Improvements and Optimization
● Cache results of parsing the SDF file
● Improve code for calculating distances
● Improve initial values of weights
● Overtraining or undertraining?
● TensorBoard visualization
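The first item, caching the parsed SDF results, could look something like the sketch below: save the two NumPy arrays after the first parse and reload them on later runs. The cache filename, the `parse_sdf` callable, and `load_dataset` are all hypothetical names for illustration, not code from the project:

```python
import os
import numpy as np

# Hypothetical cache file; np.savez stores both arrays in one .npz archive
CACHE = "parsed_sdf_cache.npz"

def load_dataset(sdf_path, parse_sdf):
    # Fast path: reuse a previous parse if the cache file exists
    if os.path.exists(CACHE):
        data = np.load(CACHE)
        return data["a"], data["b"]
    # Slow path: parse the SDF file, then cache the result for next time
    a, b = parse_sdf(sdf_path)
    np.savez(CACHE, a=a, b=b)
    return a, b
```

One caveat with any such cache: it must be invalidated (e.g. by deleting the file) whenever the SDF file or the parsing logic changes.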
Questions? View the code: https://github.com/Allvitende/chemical-modeling/