
Contents Trend in Computer-Aided Materials Discovery - PowerPoint PPT Presentation



  1. Contents
     ▪ Trend in Computer-Aided Materials Discovery
     ▪ High-Throughput Computational Screening & Exhaustive Enumeration
     ▪ Deep-Learning-based Evolutionary Design
     ▪ Deep-Learning-based Inverse Design
     ▪ Efficacy of Computer-Aided Materials Discovery

  2. Trend in Computer-Aided Materials Discovery
     ▪ For accelerated materials discovery
       – Trial-and-Error (high cost)
       – First-principles Quantum Chemistry Simulation (low throughput)
       – High-performance Computing Virtual screening (low hit-rate)
       – Machine Learning Targeted design (high hit-rate)
     ▪ Aims named on the slide: iterative experiments; pre-validation; high throughput; right solutions with minimum effort
     ▪ Stages: Conventional, Rationalization, Efficiency, Intelligence
     ▪ Generation labels: [1st Gen.], [2nd Gen.], [3rd Gen.]

  3. Trend in Computer-Aided Materials Discovery
     ▪ Prediction of materials property based on machine learning
       – Build-up of Materials vs. Property DB → Materials Informatics
     ▪ Timeline: introduction stage of cheminformatics → development stage of machine learning
       – QSAR* (’62, Hansch & Fujita)
       – ANN** in Chemistry (’71)
       – SMILES*** (’87, Weininger)
       – Kernel methods, Bayesian approaches, Deep Learning: Bayesian Modeling, Graph Kernels; date labels on the slide: (’05 @ UC Irvine), (’09 @ MIT), (’16 @ Stanford), (’18 @ Harvard)
     ▪ Process of Machine Learning @ Materials Research: Descriptor, Training, Analysis
       – Descriptor forms named on the slide: vector, fingerprint, graphs, images
       – SMILES: CC(C)NCC(O)COC1=CC(CC2=CC=CC=C2)=C(CC(N)=O)C=C1
       – Fingerprint: 011100011111101010010100100000101010001001010…
     * QSAR: Quantitative Structure-Activity Relationship
     ** ANN: Artificial Neural Network
     *** SMILES: Simplified Molecular-Input Line-Entry System
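The SMILES-to-fingerprint step on this slide can be illustrated with a minimal sketch. It is a toy, not the method used in the talk: real ECFP-style fingerprints hash atom neighborhoods of the molecular graph, whereas the hypothetical `toy_fingerprint` below simply hashes short substrings of the SMILES string into a fixed-length bit vector. The bit length of 64 and the substring length of 3 are arbitrary assumptions.

```python
import hashlib
from typing import List


def toy_fingerprint(smiles: str, n_bits: int = 64, max_len: int = 3) -> List[int]:
    """Hash every substring of length 1..max_len of a SMILES string into a bit vector.

    Illustrative stand-in only: a real ECFP hashes circular atom
    neighborhoods of the molecular graph, not string substrings.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.sha256(fragment.encode("ascii")).digest()
            # Use the first four bytes of the digest to pick a bit position.
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits


# SMILES string taken from the slide.
smiles = "CC(C)NCC(O)COC1=CC(CC2=CC=CC=C2)=C(CC(N)=O)C=C1"
fp = toy_fingerprint(smiles)
print("".join(map(str, fp)))
```

The resulting bit string plays the role of the descriptor that a model would be trained on; in practice a cheminformatics toolkit would supply the fingerprint.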

  4. Trend in Computer-Aided Materials Discovery
     ▪ Materials design based on machine learning
       – Inverse QSAR → Inverse Design
     ▪ Timeline, from Inverse QSAR to Deep Learning / Generative Models
       – Inverse QSAR (Late 80’s~)
       – Genetic Algorithms (’92 @ Purdue)
       – Exhaustive Generation (’12 @ Tokyo)
       – Inverse Design (’16 @ SAIT)
       – SMILES Autoencoder (’16 @ Harvard)
       – GAN* for molecules (’17 @ Harvard)
     ▪ Approach types: Combinatorial, Evolutionary, Autoencoder
     ▪ Focus on autonomous molecular generation
     * GAN: Generative Adversarial Network

  5. Trend in Computer-Aided Materials Discovery
     ▪ In-silico technologies for materials discovery
       – Elemental Technologies: Machine Learning, Materials Informatics, DB, Molecular Enumeration, Automated Simulation
       – Materials Discovery Methodologies: Inverse Design ([In] Targets → [Out] Molecules), Evolutionary Design, HTCS (High-Throughput Computational Screening)
       – Result: Target molecules

  6. High-Throughput Computational Screening & Exhaustive Enumeration
     “Landscape of phosphorescent light-emitting energies of homoleptic Ir(III)-complexes predicted by a graph-based enumeration and deep learning”, GI01.02.02, 2018 MRS Fall Meeting

  7. High-Throughput Computational Screening
     ▪ Property prediction with high-performance computing for large-scale exploration of materials candidates
     ▪ Workflow: Seed Fragments → Combination → Candidate Pool (database of large amounts of candidates) → Simulation → Verification → Target Materials

  8. High-Throughput Computational Screening
     ▪ ML (Machine Learning)-assisted HTCS for higher efficiency
     ▪ Workflow: Seed Fragments → Combination → Candidate Pool (database of large amounts of candidates) → (1) Simulation + ML → Verification → Target Materials
       – (2) Prioritizing calculation based on active learning
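The idea of prioritizing calculations by active learning can be sketched as a loop that simulates only the candidates a cheap surrogate model is least sure about. Everything below is an assumption made for illustration: `simulate` is a hypothetical stand-in for an expensive simulation, the surrogate is a 1-nearest-neighbour lookup, and the acquisition rule (distance to the nearest simulated candidate) is one simple choice among many. The slide does not say which rule was used.

```python
import random


def simulate(x: float) -> float:
    """Hypothetical stand-in for an expensive simulation (a cheap analytic function here)."""
    return (x - 0.3) ** 2


def predict(x, labeled):
    """1-nearest-neighbour surrogate: return (predicted value, distance to nearest simulated point)."""
    nearest = min(labeled, key=lambda known: abs(known - x))
    return labeled[nearest], abs(nearest - x)


def active_learning(pool, n_seed=3, batch=2, rounds=4, seed=0):
    """Simulate a few seeds, then repeatedly simulate the most uncertain candidates."""
    rng = random.Random(seed)
    labeled = {x: simulate(x) for x in rng.sample(pool, n_seed)}
    for _ in range(rounds):
        unlabeled = [x for x in pool if x not in labeled]
        # Prioritize candidates that lie farthest from anything already simulated.
        unlabeled.sort(key=lambda x: predict(x, labeled)[1], reverse=True)
        for x in unlabeled[:batch]:
            labeled[x] = simulate(x)
    return labeled


pool = [i / 50 for i in range(51)]  # toy candidate pool
labeled = active_learning(pool)
best = min(pool, key=lambda x: predict(x, labeled)[0])
print(len(labeled), "simulations for", len(pool), "candidates; predicted best:", best)
```

Only 11 of the 51 toy candidates are simulated; the surrogate fills in the rest, which is the efficiency gain the slide refers to.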

  9. High-Throughput Computational Screening
     ▪ Exhaustive enumeration based on graph theory
       – “Graphs”
         • Mathematical structures used to model pairwise relations between objects.
         • Made up of nodes and edges.
         • In chemistry, graphs are used to model molecules, where nodes represent atoms and edges represent bonds.
     ※ Exhaustive enumeration: systematic enumeration of all possible molecules in search of the optimal solution

  10. High-Throughput Computational Screening
      ▪ Complete list of non-isomorphic graphs (table columns: ID, No. of edges, No. of edges at each node)
      ▪ Source: http://www.cadaeic.net/graphpics.htm

  11. High-Throughput Computational Screening
      ▪ Landscape of phosphorescent light-emitting energies of homoleptic Ir(III)-complex core structures
        – Ir(III)-complexes
          • Widely used as phosphorescent OLED dopants.
          • Figuring out the full landscape of emission color is important for discovering high-performing molecules in target color regions.
      New J. Chem., 39, 246 (2015); ACS Appl. Mater. Interfaces, 10, 1888–1896 (2018); Organic Electronics, 63, 244–249 (2018)

  12. High-Throughput Computational Screening
      ▪ Approach
        – Treat the nodes of a graph as rings and the edges as ring connections.
        – Limit the total number of rings to between 3 and 5.
        – Exclude the non-planar type (5-21) and structures that are invalid as dopants.
        → Only 11 of the 29 graphs are valid.
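The total of 29 graphs is consistent with counting connected, non-isomorphic simple graphs on 3, 4, and 5 nodes (2 + 6 + 21). The brute-force sketch below checks that count under this reading, which is an assumption about how the slide arrived at 29. It does not reproduce the planarity and dopant-validity filtering that reduces the set to 11. In this size range the only non-planar graph is the complete graph on five nodes, which presumably corresponds to the excluded type (5-21).

```python
from itertools import combinations, permutations


def connected(n, edges):
    """Depth-first search from node 0; True if every node is reached."""
    neighbours = {v: set() for v in range(n)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def canonical(n, edges):
    """Smallest relabelled edge list over all node permutations (brute force, fine for n <= 5)."""
    return min(
        tuple(sorted(tuple(sorted((p[a], p[b]))) for a, b in edges))
        for p in permutations(range(n))
    )


def connected_graphs(n):
    """Return one canonical edge list per connected, non-isomorphic simple graph on n nodes."""
    all_edges = list(combinations(range(n), 2))
    found = set()
    # A connected graph on n nodes needs at least n - 1 edges.
    for k in range(n - 1, len(all_edges) + 1):
        for edges in combinations(all_edges, k):
            if connected(n, edges):
                found.add(canonical(n, edges))
    return found


counts = {n: len(connected_graphs(n)) for n in (3, 4, 5)}
print(counts, sum(counts.values()))  # {3: 2, 4: 6, 5: 21} 29
```

Canonical labelling by trying every permutation scales factorially, so this only works for very small graphs; it is enough here because the ring count is capped at five.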

  13. High-Throughput Computational Screening
      ▪ Enumeration
        – For 5- and 6-membered rings.
        – Substitute some carbons of each molecule with nitrogen atoms (max. five).
        → Total 9,919,469 (~10M) core structures
      ▪ Steps
        1. Graphs
        2. Skeletons (total 405 EA)
        3. Set Iridium positions
        4. Substitute some carbon atoms with nitrogen atoms
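The nitrogen-substitution step can be sketched as choosing up to five carbon positions to replace. The skeleton below is a hypothetical flat string of six carbon symbols, not one of the 405 skeletons from the talk, and the sketch ignores both molecular symmetry (equivalent substitutions are counted separately) and chemical validity, which a real enumeration would have to handle.

```python
from itertools import combinations


def substitutions(n_carbons, max_n=5):
    """Yield tuples of carbon positions to replace with nitrogen, from zero up to max_n positions."""
    for k in range(min(max_n, n_carbons) + 1):
        yield from combinations(range(n_carbons), k)


def substitute(skeleton, positions):
    """Return the skeleton with the atoms at the given positions replaced by 'N'."""
    atoms = list(skeleton)
    for i in positions:
        atoms[i] = "N"
    return "".join(atoms)


skeleton = "CCCCCC"  # hypothetical six-carbon skeleton written as element symbols
variants = [substitute(skeleton, pos) for pos in substitutions(len(skeleton))]
print(len(variants))  # 63
print(variants[:4])
```

Even this tiny skeleton yields 63 variants, which shows how quickly the count grows once ring systems, iridium positions, and substitutions are combined.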

  14. High-Throughput Computational Screening
      ▪ Property prediction
        – Trained a deep-neural-network model with simulated T1 data
          • Input: ECFP (Extended Connectivity FingerPrints) of molecular structures
          • Output: T1 energy (phosphorescent light-emitting wavelength)
        – [Figure: mean absolute error of T1 of the DNN (eV) vs. size of the training dataset (10K to 80K)]
        – With 80k training data, the average prediction error was less than 0.1 eV.
        – 80k / 10M = 0.8%: by simulating the properties of only 0.8% of the molecules, we can fully scan the chemical space of 10M!

  15. High-Throughput Computational Screening
      ▪ Results
        – Distribution of T1 values
        – Blue-color emitting materials are rare compared with red and green: Red (18.4%), Green (4.3%), Blue (0.4%)
        – [Figure: histogram of the number of molecules (×100,000) vs. predicted T1 (eV)]

  16. Conclusions
      ▪ In materials discovery, deep-learning-based HTCS is a good alternative to the conventional trial-and-error approach.
      ▪ Moreover, exhaustive enumeration makes it possible to systematically explore the whole chemical space.
      ▪ With the proposed exhaustive enumeration method based on graph theory and deep learning, the whole landscape of 10M phosphorescent Ir-dopants could be scanned with just 0.8% of the computational cost of the pure simulation-based approach.

  17. Deep-Learning-based Evolutionary Design
      “Evolutionary design of organic molecules based on deep learning and genetic algorithm”, COMP, ACS Fall 2018 National Meeting

  18. Evolutionary Design
      ▪ A generic population-based metaheuristic optimization technique
      ▪ Uses bio-inspired operators to reach near-optimal solutions: mutation, crossover, and selection in the case of a genetic algorithm
      ▪ Flowchart: Initial population → Calculate fitness → Satisfy constraints? (Yes: Done; No: Selection → Crossover + Mutation → New population → Calculate fitness)
      ▪ [Figures: fitness landscape (https://en.wikipedia.org/wiki/Fitness_landscape); average fitness vs. generation]
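The flowchart on this slide corresponds to a standard genetic-algorithm loop, sketched below on plain bit strings. The fitness function (count of 1-bits), population size, mutation rate, and truncation selection are all assumptions chosen to keep the example small; they are not parameters from the talk.

```python
import random


def fitness(bits):
    """Toy fitness: number of 1-bits. A real application would score a predicted property."""
    return sum(bits)


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def crossover(a, b, rng):
    """Single-point crossover: prefix of one parent, suffix of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(n_bits=20, pop_size=30, generations=40, rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)      # calculate fitness
        history.append(fitness(population[0]))
        if history[-1] == n_bits:                        # constraints satisfied: done
            break
        parents = population[: pop_size // 2]            # selection: keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children                  # new population
    return max(population, key=fitness), history


best, history = evolve()
print(fitness(best), history)
```

Because the fitter half is carried over unchanged, the best fitness never decreases from one generation to the next, which mirrors the rising average-fitness curve in the figure.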

  19. Deep-Learning-Based Evolutionary Design
      ▪ Proposed approach (Conventional → Proposed)
        – Molecular Descriptor: Graph or ASCII string → Bit string (ECFP), RNN
        – Molecular Evolution: Heuristic → Random
        – Fitness Evaluation: Simple assessment → DNN
      ▪ Expectations
        • Prevent heuristic bias
        • Secure chemical validity
        • Versatile evaluation is possible
      ▪ Workflow: Seed molecule (ECFP) from DB → Mutation (n=50) → Decoding to SMILES (RNN) → Inspection of chemical validity → Fitness evaluation (DNN) → Selection of parents → Evolution (Crossover → Mutation) → Decoding to SMILES (RNN) → Inspection of chemical validity → Fitness evaluation (DNN) → Iteration → Best-fit molecule
      *ECFP (Extended Connectivity FingerPrint), DNN (Deep Neural Network), RNN (Recurrent Neural Network), SMILES (Simplified Molecular-Input Line-Entry System)

  20. Deep Learning-Based Evolutionary Design
      ▪ Deep learning models
        • [DNN] 3 hidden layers, 500 hidden units in each layer
        • [RNN] 3 hidden layers, 500 long short-term memory units
      ▪ DNN Model: Input (ECFP*) → Output (Properties)
      ▪ RNN Model: Input (ECFP*) → Output (SMILES)
        – Steps t=1, 2, 3, …, T+1: inputs <start>, y1=‘CCC’, y2=‘CCC’, …, yT=‘)=O’; outputs y1=‘CCC’, y2=‘CCC’, y3=‘CC(’, …, <end>
        – y = (‘CCC’, ‘CCC’, ‘CC(’, …, ‘)=O’) → ‘CCCC(N)=O’
      *ECFP (dimension=5,000, neighbor size=6)
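The token sequence shown for the RNN output looks like overlapping three-character windows of the SMILES string, each shifted by one character. Under that assumption (inferred from the first three tokens and the last one on the slide, not stated in the source), the sketch below splits a SMILES string into such tokens and merges them back. The function names are hypothetical, and no neural network is involved.

```python
def to_trigrams(smiles):
    """Split a SMILES string into overlapping 3-character tokens (stride 1)."""
    return [smiles[i:i + 3] for i in range(len(smiles) - 2)]


def from_trigrams(tokens):
    """Rebuild the string: keep the first token, then append the last character of each later token."""
    if not tokens:
        return ""
    for prev, cur in zip(tokens, tokens[1:]):
        # Consecutive tokens must share two characters, otherwise the sequence is inconsistent.
        if prev[1:] != cur[:2]:
            raise ValueError(f"tokens {prev!r} and {cur!r} do not overlap")
    return tokens[0] + "".join(tok[-1] for tok in tokens[1:])


tokens = to_trigrams("CCCC(N)=O")  # SMILES string from the slide
print(tokens)
print(from_trigrams(tokens))  # CCCC(N)=O
```

The overlap check gives a cheap consistency test on a decoded sequence before any chemical-validity inspection is applied.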
