
Chemical Insights from a Random Forest Prediction of Molecular Quantum Properties

Beomchang Kang, Seoul National University. 2019.11.8, 1st XAIENCE Conference.


  1. Chemical Insights from a Random Forest Prediction of Molecular Quantum Properties • Beomchang Kang • Seoul National University • 2019.11.8, 1st XAIENCE Conference

  2. Fluorescent molecule • Bio-imaging • Specification: cell organelles, proteins • Observation: structure, dynamics

  3. Good fluorescent molecule? • High quantum yield in the visible region • Distinctive color • Low toxicity • High synthetic accessibility

  4. Towards discovery of novel and effective fluorescent molecules • Prediction of quantum properties for a given molecule: high quantum yield, distinctive color • Searching the chemical space for molecules with the desired properties

  5. Today, I focus on… • Prediction of oscillator strength (to get high quantum yield) and excitation energy • Gaining chemical insight from Random Forest results

  6. Excitation Energy • Energy difference between two states • Electronic transition • Determines color
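
The link between excitation energy and color can be made concrete with the photon relation λ = hc/E. The sketch below is an illustration, not part of the talk: the function names and the 380 to 750 nm bounds for the visible range are assumptions chosen for the example.

```python
# Convert an excitation energy in eV to the wavelength of the absorbed light in nm.
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm


def excitation_energy_to_wavelength_nm(energy_ev: float) -> float:
    """Return the photon wavelength (nm) for a transition energy (eV)."""
    if energy_ev <= 0:
        raise ValueError("excitation energy must be positive")
    return HC_EV_NM / energy_ev


def in_visible_range(energy_ev: float, low_nm: float = 380.0, high_nm: float = 750.0) -> bool:
    """Check whether the transition falls in an (assumed) visible window."""
    wavelength = excitation_energy_to_wavelength_nm(energy_ev)
    return low_nm <= wavelength <= high_nm


# A 2.5 eV transition corresponds to roughly 496 nm (blue-green light).
wavelength_nm = excitation_energy_to_wavelength_nm(2.5)
```

On this scale, the excitation-energy RMSE of about 0.45 eV reported later in the talk corresponds to a sizeable shift in wavelength, so the conversion is a handy way to read the error in terms of color.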

  7. Oscillator strength (OS) • Dimensionless quantity • Probability of absorption or emission of electromagnetic radiation in transitions between energy levels • To have high OS • Orbital shapes of the two states must be different
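
For reference, the standard textbook expression for the oscillator strength of a transition from the ground state to excited state n is given below in atomic units. It is not shown on the slide; the symbols (ΔE for the excitation energy, μ for the dipole operator, Ψ for the state wavefunctions) are the usual conventions and are introduced here only for illustration.

```latex
f_{0n} \;=\; \frac{2}{3}\,\Delta E_{0n}\,
\bigl|\langle \Psi_0 \,|\, \hat{\boldsymbol{\mu}} \,|\, \Psi_n \rangle\bigr|^{2},
\qquad \Delta E_{0n} = E_n - E_0
```

The oscillator strength therefore grows with both the excitation energy and the squared transition dipole moment between the two states.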

  8. Methods

  9. Prediction of molecular properties • Molecule -> Predictor -> Property

  10. PubChemQC Database • Molecular quantum calculations: DFT, TD-DFT • Molecules from PubChem (actually synthesized compounds) • Provides molecular orbitals, quantum properties, classical properties

  11. Data set for RF • From PubChemQC • Only H, B, C, N, O, F, P, S, Cl • Only neutral molecules • Randomly selected 0.5 M compounds • Training:Test = 9:1
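
The selection rules on this slide can be sketched as a small filter plus a shuffled 9:1 split. This is a minimal illustration under stated assumptions: the record layout (a dict with `formula` and `charge` keys), the function names, and the seed are hypothetical and do not describe the actual PubChemQC data format or the author's pipeline.

```python
import random
import re

# Elements allowed in the data set, as listed on the slide.
ALLOWED_ELEMENTS = {"H", "B", "C", "N", "O", "F", "P", "S", "Cl"}


def elements_in_formula(formula: str) -> set:
    """Extract element symbols (uppercase letter plus optional lowercase letter)."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def keep(record: dict) -> bool:
    """Keep neutral molecules built only from the allowed elements."""
    return record["charge"] == 0 and elements_in_formula(record["formula"]) <= ALLOWED_ELEMENTS


def train_test_split_9_1(records, seed=0):
    """Shuffle with a fixed seed and hold out one tenth as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 10
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records, for illustration only.
records = [{"formula": "C7H8O3", "charge": 0} for _ in range(20)]
records.append({"formula": "C6H5Br", "charge": 0})   # contains Br: dropped
records.append({"formula": "C2H3O2", "charge": -1})  # charged: dropped
filtered = [r for r in records if keep(r)]
train, test = train_test_split_9_1(filtered)
```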

  12. Random Forest • Advantages: simple, white-box, provides feature importance • Feature importance yields chemical insight • To be compared with deep learning methods
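
To show what reading feature importance from a Random Forest looks like in practice, here is a minimal sketch using scikit-learn on synthetic binary features. The data, the choice of `RandomForestRegressor`, and all parameter values are assumptions for illustration; the talk does not state which implementation or settings were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for fingerprint bits: 500 "molecules", 32 binary features.
n_samples, n_bits = 500, 32
X = rng.integers(0, 2, size=(n_samples, n_bits))

# Make one bit strongly informative for the target, plus a little noise.
informative_bit = 5
y = 0.4 * X[:, informative_bit] + 0.01 * rng.normal(size=n_samples)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Impurity-based importances, one value per input bit, normalized to sum to 1.
importances = model.feature_importances_
top_bit = int(np.argmax(importances))
```

Here `top_bit` recovers the bit that was made informative, which is the same kind of lookup the later slides perform on fingerprint bits.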

  13. Extended-Connectivity Fingerprint (ECFP) • 2D molecule -> identifiers • Parameter: radius • Bit vector of ECFP • Hashing • Binary (one-hot) encoding • Parameter: # of bits
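
The hashing step that turns fragment identifiers into a fixed-length bit vector can be sketched as follows. This is a simplified illustration, not the ECFP algorithm itself: real ECFP derives integer identifiers from atom environments iteratively, whereas here fragment strings and SHA-256 are stand-ins, and the function names are hypothetical.

```python
import hashlib


def fragment_to_bit(fragment: str, n_bits: int) -> int:
    """Map a fragment string to a bit position via a stand-in hash."""
    digest = hashlib.sha256(fragment.encode("utf-8")).digest()
    identifier = int.from_bytes(digest[:4], "big")  # 32-bit integer identifier
    return identifier % n_bits  # fold into the fixed-length vector


def fold_fingerprint(fragments, n_bits=16384):
    """Set one bit per fragment; different fragments may share a bit."""
    bits = [0] * n_bits
    for fragment in fragments:
        bits[fragment_to_bit(fragment, n_bits)] = 1
    return bits


fingerprint = fold_fingerprint(["C(=C)c(c)o", "CC", "c1ccoc1"])

# With very few bits, collisions are unavoidable: 3 fragments, 2 positions.
tiny = fold_fingerprint(["C(=C)c(c)o", "CC", "c1ccoc1"], n_bits=2)
```

Because the number of possible fragments far exceeds the number of bits, several fragments can land on the same position, which is why a single important bit has to be traced back to many candidate fragments.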

  14. Results & Discussion

  15. RF result - Excitation Energy • RMSE 0.4500 eV • Pearson r 0.8689

  16. RF result - Oscillator strength • RMSE 0.066 • Pearson r 0.7300

  17. 0.5 M set • Mean 0.042 • Median 0.009 • Std 0.096
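
The two metrics quoted on the result slides, RMSE and the Pearson correlation coefficient, follow their standard definitions. The sketch below computes both in plain Python; the function names and the toy numbers are illustrative and are not taken from the talk's data.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy values for illustration only.
reference = [0.10, 0.25, 0.40, 0.05]
predicted = [0.12, 0.20, 0.35, 0.10]
error = rmse(reference, predicted)
correlation = pearson_r(reference, predicted)
```

An RMSE is easiest to judge against the spread of the target itself, which is why the set-level mean, median, and standard deviation are useful context.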

  18. Feature importance to Fragments • Bit number: 1 … 6128, 6129, 6130 … 16384 • Feature importance: 0.xxx, 0.xxx, 0.022, 0.xxx, 0.xxx • Many Fragments…

  19. Random Forest - Feature importance • Oscillator strength • Bit number 6129 • ECFP6 • Cc1=cc=c(o1)c=C, oscillator strength 0.4690 • n_bit = 16384 • Bits with feature importance > 0.02 (bit number, feature importance, # of fragments): 9352, 0.0330, 115 • 8017, 0.0251, 107 • 6192, 0.0218, 129

  20. Important Fragments • Selection criteria: # of molecules containing the fragment > 3 • Feature importance > 0.02 • ECFP6, 16384-bit vector • Mean OS > 0.1 • Selected fragments (radius, mean OS, # of molecules): 1, 0.175, 10590 • 3, 0.175, 4 • 2, 0.342, 9 • 3, 0.211, 11 • 1, 0.207, 6263 • 3, 0.101, 4
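
The count and mean-OS criteria on this slide amount to a group-by over fragments. The sketch below shows one way to express that, assuming a hypothetical record layout (a dict with a `fragments` list and an `os` value per molecule) and assuming the feature-importance filter has already been applied upstream; none of these names come from the talk.

```python
from collections import defaultdict


def select_fragments(molecules, min_count=3, min_mean_os=0.1):
    """Return {fragment: (mean OS, # of molecules)} for fragments passing both cutoffs."""
    os_by_fragment = defaultdict(list)
    for molecule in molecules:
        # Count each fragment once per molecule, even if it occurs repeatedly.
        for fragment in set(molecule["fragments"]):
            os_by_fragment[fragment].append(molecule["os"])

    selected = {}
    for fragment, values in os_by_fragment.items():
        mean_os = sum(values) / len(values)
        if len(values) > min_count and mean_os > min_mean_os:
            selected[fragment] = (mean_os, len(values))
    return selected


# Hypothetical molecules, for illustration only.
molecules = [
    {"fragments": ["A"], "os": 0.5},
    {"fragments": ["A"], "os": 0.3},
    {"fragments": ["A", "B"], "os": 0.2},
    {"fragments": ["A", "B"], "os": 0.4},
    {"fragments": ["C"], "os": 0.01},
    {"fragments": ["C"], "os": 0.01},
    {"fragments": ["C"], "os": 0.01},
    {"fragments": ["C"], "os": 0.01},
]
selected = select_fragments(molecules)
```

In this toy set only fragment "A" passes: "B" occurs in too few molecules and "C" has a mean OS below the cutoff.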

  21. Fragment with high OS • C(=C)c(c)o • Radius = 2 • 9 molecules • Mean OS = 0.342

  22. Ethyl 5-ethenylfuran-2-carboxylate • OS = 0.5230

  23. 5-Ethenyl-3H-1,3-oxazole-2-thione • OS = 0.4790

  24. Ethyl 2-(5-ethenylfuran-2-yl)propanoate • OS = 0.4730

  25. Thank You!
