Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective Chengqiang Lu†, Qi Liu†*, Chao Wang†, Zhenya Huang†, Peize Lin‡, Lixin He‡ †Anhui Province Key Lab. of Big Data Analysis and Application, University of S&T of China ‡China Key Laboratory of Quantum Information, University of S&T of China AAAI 2019
CONTENTS
01 Introduction
02 Related Work
03 MGCN
04 Experiment
Introduction
01 Introduction
Material Discovery Paradigms
[Figure: feedback cycle of material discovery. Stages: material concept, molecular synthesis, device construction, testing & characterization. Labels: example, properties, device prototype, checking.]
Science 2018. Sanchez-Lengeling, et al. "Inverse molecular design using machine learning: Generative models for matter engineering."
Application
• Material Discovery
• Medicine Design
• Food Development
01 Introduction
The Most Time-consuming Step: Material concept -> Molecular synthesis
To find a molecule with the desired properties, we need to explore a molecule database (e.g. GDB-17) and predict molecular properties.
01 Introduction
Our Task
Input (molecule) -> Output (properties)
Properties:
• U0 (Atomization energy at 0K)
• U (Atomization energy at room temperature)
• H (Enthalpy at room temperature)
• G (Free energy of atomization)
• ...
J. Chem. Inf. Model. 2012. Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. "Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17."
01 Introduction
Challenges:
• Molecular quantum interactions are highly complex and hard to model.
• Labeled molecule data is very limited, so the prediction approach must be generalizable.
• The molecule data is unbalanced: most molecules are small and few are large, so the model should be transferable.
Related Work
02 Related Works
DFT (Density Functional Theory)
• Classic physical method dating back to the 1960s.
• States that the quantum interactions between particles (e.g., atoms) create the correlation and entanglement of molecules, which are closely related to their inherent properties.
• Pros: accurate, widely used
• Cons: extremely time-consuming
• Journal of Physics: Condensed Matter. 2014. Behler, Jörg. "Representing potential energy surfaces by high-dimensional neural network potentials."
• Journal of Chemical Physics. 2017. Cubuk, Ekin D., et al. "Representations in neural network based empirical potentials."
02 Related Works
Traditional ML models
Representations:
• BOB (bag of bonds)
• Coulomb matrix
• HDAD (histogram of distance, angle and dihedral angle)
Models:
• Kernel ridge regression
• Random forest
• Elastic Net
Cons:
• Hand-crafted features need much domain expertise
• Restricted in practice
Journal of Chemical Theory and Computation. Faber, F. A.; Hutchison, L.; Huang, B.; Gilmer, J.; Schoenholz, S. S.; Dahl, G. E.; Vinyals, O.; Kearnes, S.; Riley, P. F.; and von Lilienfeld, O. A. 2017. "Prediction errors of molecular machine learning models lower than hybrid DFT error."
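To make the idea of a hand-crafted representation concrete, here is a minimal sketch of the Coulomb matrix in its commonly cited form: diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|. The function name `coulomb_matrix` and the three-atom geometry are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def coulomb_matrix(charges, coords):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / r_ij elsewhere."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(coords, dtype=float)
    # Pairwise distances r_ij via broadcasting: (N, 1, 3) - (1, N, 3) -> (N, N, 3)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid division by zero on the diagonal
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M

# Hypothetical toy geometry (NOT a real molecular structure): one O and two H atoms.
charges = [8, 1, 1]
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
M = coulomb_matrix(charges, coords)
```

The matrix depends on atom ordering, which is one reason such features need extra care (sorting, padding) before being fed to models like kernel ridge regression.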
02 Related Works
Deep Neural Networks I
Use grid-like data as input:
1. Images
2. Text
3. Sphere
• Could utilize the models in CV/NLP
• The transformation into grid-like data usually causes information loss
References:
1. KDD’18. ChemNet: A Transferable and Generalizable Deep Neural Network for Small-Molecule Property Prediction
2. ACS’18. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules
3. NIPS’17. Spherical convolutions and their application in molecular modelling
02 Related Works
Deep Neural Networks II
Use graph-like data as input:
• Deep Tensor Neural Network
• SchNet
• Message Passing Neural Network
Characteristics:
• Implement the convolution operator on graphs
• Achieve some superior experimental results
• Do not utilize the multilevel property
• Poor generalizability and transferability
References:
• Nature Comm’17. Quantum-chemical insights from deep tensor neural networks
• NIPS’17. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions
• ICML’17. Neural Message Passing for Quantum Chemistry
02 Problem Definition
Multilevel Graph Convolutional Network (MGCN)
Potential Energy Surfaces • Behler, Jörg. "Representing potential energy surfaces by high-dimensional neural network potentials." Journal of Physics: Condensed Matter 26.18 (2014): 183001. • Cubuk, Ekin D., et al. "Representations in neural network based empirical potentials." The Journal of Chemical Physics 147.2 (2017): 024104.
Atom-centered symmetry functions
Overview
Input
Example: CH2O2, N = 5 (atoms)
• Atom List: [C, H, H, O, O], 1xN
• Edge Matrix: NxN
• Distance Matrix: NxN
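The three inputs can be sketched for the CH2O2 example as follows. This is only an illustration under stated assumptions: the slides do not say how edges are encoded, so the edge matrix here stores a bond order (0 for non-bonded pairs), the bond list assumes the formic acid connectivity, and the coordinates are made-up placeholders rather than a real geometry.

```python
import numpy as np

# Atom list (1 x N), as on the slide.
atoms = ["C", "H", "H", "O", "O"]
N = len(atoms)

# ASSUMPTION: edges encoded as bond orders for formic acid connectivity:
# C-H, C=O, C-O, O-H. Entries are (i, j, bond order).
bonds = [(0, 1, 1), (0, 3, 2), (0, 4, 1), (4, 2, 1)]
edge_matrix = np.zeros((N, N), dtype=int)
for i, j, order in bonds:
    edge_matrix[i, j] = order
    edge_matrix[j, i] = order  # undirected graph, so the matrix is symmetric

# ASSUMPTION: placeholder 3D coordinates, for illustration only.
coords = np.array([
    [0.00, 0.00, 0.00],    # C
    [-0.95, -0.55, 0.00],  # H bonded to C
    [1.80, 0.95, 0.00],    # H bonded to O
    [0.00, 1.20, 0.00],    # O (double bond)
    [1.15, -0.65, 0.00],   # O (hydroxyl)
])

# Distance matrix (N x N): Euclidean distance between every pair of atoms.
distance_matrix = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```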
Pre-processing
Embedding Layer: generates the initial representations of edges and atoms.
• Atom embedding: $A^0 \in \mathbb{R}^{N \times K}$
• Edge embedding: $E \in \mathbb{R}^{N \times N \times K}$
Radial Basis Function Layer: converts the distance matrix to a robust distance tensor.
• $h$: RBF function
• Distance tensor: $D \in \mathbb{R}^{N \times N \times K}$
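A minimal sketch of the two pre-processing steps, under assumptions the slide does not state: the atom embedding is a plain lookup table with random initial values, and the radial basis function is a Gaussian, exp(-gamma * (d - mu_k)^2), with evenly spaced centers. The names `rbf_expand`, `centers`, `gamma`, and `embed_dim` are hypothetical.

```python
import numpy as np

def rbf_expand(dist, centers, gamma=10.0):
    """Expand each scalar distance into one Gaussian response per center."""
    d = np.asarray(dist, dtype=float)[..., None]  # (N, N, 1)
    return np.exp(-gamma * (d - centers) ** 2)    # (N, N, number of centers)

# ASSUMPTION: atom embedding as a lookup table, one row per element type.
rng = np.random.default_rng(0)
embed_dim = 4
element_index = {"H": 0, "C": 1, "O": 2}
atom_table = rng.normal(size=(len(element_index), embed_dim))
atoms = ["C", "H", "H", "O", "O"]
atom_embedding = atom_table[[element_index[a] for a in atoms]]  # (N, embed_dim)

# ASSUMPTION: six centers evenly spaced at 0, 1, 2, 3, 4, 5.
centers = np.linspace(0.0, 5.0, 6)
dist = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
distance_tensor = rbf_expand(dist, centers)  # shape (2, 2, 6)
```

Each distance activates mostly the centers closest to it, which gives the network a smooth, fixed-size description of distance instead of a single raw scalar.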
Interaction Layers
In each interaction layer, the model generates the atomic representations at a higher level and updates the edge representation.
Aggregate the multilevel representations and pass them to the Readout Layer.
In detail: [slide equations not recoverable]
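The data flow described here can be sketched as below. The update rules are stand-ins chosen for illustration, not the equations from the slides: an atom's next-level representation is assumed to be the sum over its neighbors of an element-wise product of neighbor, edge, and distance features, and the edge update is assumed to add the product of its two endpoint atoms. All function names are hypothetical, and the three tensors are assumed to share the same feature width.

```python
import numpy as np

def interaction_layer(A, E, D):
    """One illustrative interaction step (stand-in update rules, see lead-in).

    A: (N, K) atom representations, E: (N, N, K) edge representations,
    D: (N, N, K) distance tensor.
    """
    N = A.shape[0]
    mask = 1.0 - np.eye(N)[:, :, None]       # zero out self-interactions (j == i)
    messages = A[None, :, :] * E * D * mask  # entry [i, j] is the message from j to i
    A_next = messages.sum(axis=1)            # aggregate over neighbors j
    E_next = E + A[:, None, :] * A[None, :, :]
    return A_next, E_next

def multilevel_representations(A0, E, D, num_layers=2):
    """Stack the atom representations of every level along the feature axis."""
    levels = [A0]
    A = A0
    for _ in range(num_layers):
        A, E = interaction_layer(A, E, D)
        levels.append(A)
    return np.concatenate(levels, axis=-1)   # (N, K * (num_layers + 1))

rng = np.random.default_rng(0)
N, K = 5, 3
A0 = rng.normal(size=(N, K))
E0 = rng.normal(size=(N, N, K))
D0 = rng.normal(size=(N, N, K))
multilevel = multilevel_representations(A0, E0, D0, num_layers=2)

# Reordering the atoms only reorders the rows of the result.
perm = np.array([4, 2, 0, 1, 3])
multilevel_perm = multilevel_representations(
    A0[perm], E0[perm][:, perm], D0[perm][:, perm], num_layers=2
)
```

Because every operation is element-wise or a sum over neighbors, the output rows follow the atom ordering and nothing else, which is the property the later readout step relies on.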
Read Out Layer
Thanks to the additivity and locality of molecular properties, we can process the final representation of each atom separately and then sum the results.
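A minimal sketch of an additive readout, assuming a small two-layer network applied to each atom independently and a plain sum over atoms. The layer sizes, the tanh activation, and the name `readout` are assumptions for illustration.

```python
import numpy as np

def readout(atom_repr, W1, b1, w2, b2):
    """Map each atom to a scalar contribution, then sum over atoms."""
    hidden = np.tanh(atom_repr @ W1 + b1)  # (N, H), applied atom by atom
    contributions = hidden @ w2 + b2       # (N,), one scalar per atom
    return contributions.sum()             # molecular property estimate

rng = np.random.default_rng(0)
K, H = 6, 8
W1, b1 = rng.normal(size=(K, H)), rng.normal(size=H)
w2, b2 = rng.normal(size=H), 0.1

mol_a = rng.normal(size=(5, K))  # hypothetical atom representations, molecule A
mol_b = rng.normal(size=(3, K))  # hypothetical atom representations, molecule B

pred_a = readout(mol_a, W1, b1, w2, b2)
pred_b = readout(mol_b, W1, b1, w2, b2)
pred_shuffled = readout(mol_a[::-1], W1, b1, w2, b2)
pred_joint = readout(np.vstack([mol_a, mol_b]), W1, b1, w2, b2)
```

Summation makes the prediction independent of atom order and lets the same network handle molecules of any size, since a larger molecule simply contributes more terms to the sum.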
Discussion
Generalizability:
• Coordinates -> Distance tensor: translation and rotation invariance.
• Element-wise operations: index invariance.
• Drop-out.
Transferability:
• First-level knowledge is independent of structure and spatial information.
• Pre-trained embedding.
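The invariance claim for the coordinates-to-distance step can be checked numerically. The sketch below uses random placeholder coordinates, a rotation about the z-axis, and an arbitrary shift; all values are illustrative.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances between all atoms."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))  # placeholder coordinates for 5 atoms

# Rigid motion: rotate by theta about the z-axis, then translate.
theta = 0.7
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
shift = np.array([1.0, -2.0, 0.5])
moved = coords @ rotation.T + shift

# Raw coordinates change under the motion, pairwise distances do not.
coords_changed = not np.allclose(coords, moved)
distances_unchanged = np.allclose(distance_matrix(coords), distance_matrix(moved))

# Reindexing the atoms only permutes rows and columns of the distance matrix.
perm = np.array([4, 2, 0, 1, 3])
reindexed_ok = np.allclose(
    distance_matrix(coords[perm]), distance_matrix(coords)[perm][:, perm]
)
```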
Experiment
Data sets
QM9
• Most well-known data set
• Contains 134k stable molecules
• 13 different properties
ANI-1
• Contains 20 million unstable molecules
• Only one property
Conclusion
• We propose a well-designed Multilevel Graph Convolutional Network (MGCN) for predicting molecular properties.
• It models the quantum interactions from a multilevel view, using the molecular graph as input.
• The MGCN model is transferable and generalizable.
Thanks for listening.