Tackling Data Scarcity in Materials Research: Using Semi-supervised, Adversarial Training to Improve Classification of X-ray Diffraction Patterns
Shreyaa Raghavan, Tonio Buonassisi, Zhe Liu
MIT Photovoltaic Research Lab
Contact: shreyaar@mit.edu
X-ray Diffraction (XRD) Pattern Classification
▪ A typical machine learning problem
▪ Classification of crystals by space group and dimensionality
▪ Currently uses both experimental data and computer-generated, synthetic data during training
Figure 1. Examples of perovskite XRD patterns with different dimensionalities (i.e., 0D, 2D, 3D).¹
¹ S. Sun et al., Joule 3, 1437–1451 (2019).
Challenges with the Current Classifier (autoXRD)
(1) Data scarcity: labeled experimental data are hard to come by¹
(2) Simulated data used in training can be detrimental to the classifier
Goal: Mimic experimental data and improve the efficacy of non-experimental (simulated) data using a generative adversarial network (GAN)²
¹ F. Oviedo, Z. Ren, S. Sun et al., npj Comput. Mater. 5, 60 (2019).
² A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, Learning from simulated and unsupervised images through adversarial training, arXiv:1612.07828 (2016).
Proposed Method 1: Using Generative Adversarial Training
Main advantage: training with unlabeled experimental data
Figure: A simulated XRD input passes through the Refiner Model to produce a refined XRD example; the Discriminator Model performs binary classification (real vs. fake) between refined examples and unlabeled experimental XRD examples; the outcome is used to update both models.
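Method 1 follows the SimGAN approach [3]. The two training objectives can be sketched in NumPy in their basic form, assuming the discriminator outputs the probability that a pattern is experimental (SimGAN itself uses the opposite convention, which is equivalent). The function names, the weight `lam`, and the toy arrays are illustrative assumptions, not the poster's code; in practice these losses are computed on neural-network outputs and minimized by backpropagation, which is omitted here.

```python
import numpy as np

EPS = 1e-8  # guards against log(0)

def refiner_loss(d_refined, refined, simulated, lam=0.1):
    """Sketch of a SimGAN-style refiner objective.

    d_refined: discriminator probability that each refined pattern is experimental.
    refined, simulated: arrays of shape (batch, n_points) on the same 2-theta grid.
    lam: hypothetical weight of the self-regularization term.
    """
    # Adversarial term: the refiner is rewarded when D calls its output "real".
    adversarial = -np.sum(np.log(d_refined + EPS))
    # Self-regularization (L1): refined patterns stay close to their simulated
    # inputs, so the simulated space-group label remains valid.
    self_reg = np.sum(np.abs(refined - simulated))
    return adversarial + lam * self_reg

def discriminator_loss(d_experimental, d_refined):
    """Binary cross-entropy: experimental patterns -> real (1), refined -> fake (0)."""
    return -(np.sum(np.log(d_experimental + EPS))
             + np.sum(np.log(1.0 - d_refined + EPS)))

# Usage with hypothetical toy data (4 patterns, 100 points each).
rng = np.random.default_rng(0)
simulated = rng.random((4, 100))
refined = simulated + 0.01          # stand-in for a refiner network's output
loss_r = refiner_loss(np.full(4, 0.5), refined, simulated)
loss_d = discriminator_loss(np.full(4, 0.7), np.full(4, 0.5))
```

The L1 term is what makes the approach usable without experimental labels: the refiner may only nudge a simulated pattern toward the experimental distribution, so the space group known from simulation can be carried over to the refined pattern.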
Proposed Method 2: Gaussian Filter
Figure: Effect of the Refiner Model vs. effect of the Gaussian filter on simulated XRD data (i.e., widening peaks)
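The Gaussian-filter baseline widens the sharp peaks of simulated patterns. A minimal NumPy sketch follows, assuming a uniform 2θ grid and a kernel width `sigma_pts` expressed in grid points; both, and the toy two-peak pattern, are hypothetical choices, since the poster does not state its filter parameters.

```python
import numpy as np

def gaussian_broaden(pattern, sigma_pts):
    """Widen peaks by convolving with a normalized 1D Gaussian kernel.

    pattern: 1D intensity array on a uniform 2-theta grid.
    sigma_pts: Gaussian standard deviation in grid points (hypothetical choice).
    """
    half = int(np.ceil(4 * sigma_pts))          # truncate kernel at +/- 4 sigma
    x = np.arange(-half, half + 1)              # odd length keeps peaks centered
    kernel = np.exp(-0.5 * (x / sigma_pts) ** 2)
    kernel /= kernel.sum()                      # preserve integrated intensity
    return np.convolve(pattern, kernel, mode="same")

# Hypothetical simulated pattern: two sharp peaks on a 1000-point grid.
sim = np.zeros(1000)
sim[200], sim[600] = 1.0, 0.5
broadened = gaussian_broaden(sim, sigma_pts=5.0)
```

Normalizing the kernel keeps each peak's area fixed while its height drops and its width grows. If SciPy is available, `scipy.ndimage.gaussian_filter1d` performs similar smoothing (with its own edge handling); the explicit kernel here just makes the normalization visible.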
Results
Table 1. Accuracies after 5-fold cross-validation of space group classification using the proposed methods and no experimental data. Columns give the amount of augmented data; entries are accuracies (%).
Augmented data               500     1000    2000    4000
Simulated                    12.7    23.9    26.2    34.6
Refiner Model A (20 to 1)    39.8    51.2    53.2    62.9
Refiner Model B (30 to 1)    11.0    17.3    44.2    49.5
Gaussian Filter              11.8    21.6    38.2    49.8
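Table 1 reports accuracies averaged over five folds. A generic k-fold cross-validation sketch in NumPy is given here; `kfold_accuracy`, `fit_predict`, and the majority-class stand-in are hypothetical, and the poster does not state how simulated or augmented patterns are assigned to folds, so this only illustrates the averaging procedure.

```python
import numpy as np

def kfold_accuracy(X, y, fit_predict, k=5, seed=0):
    """Return mean accuracy (%) over k folds and the per-fold accuracies.

    fit_predict(X_train, y_train, X_test) -> predicted labels (hypothetical callable).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # disjoint index sets
    accs = []
    for i, test_idx in enumerate(folds):
        # Train on the other k-1 folds, evaluate on the held-out fold.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accs.append(100.0 * np.mean(pred == y[test_idx]))
    return float(np.mean(accs)), accs

def majority(X_train, y_train, X_test):
    """Trivial stand-in classifier: always predict the most common training label."""
    values, counts = np.unique(y_train, return_counts=True)
    return np.full(len(X_test), values[np.argmax(counts)])

# Hypothetical toy data: 40 patterns (one feature each), 30 of class 0, 10 of class 1.
X = np.arange(40, dtype=float).reshape(-1, 1)
y = np.array([0] * 30 + [1] * 10)
mean_acc, per_fold = kfold_accuracy(X, y, majority)
```

Each pattern is held out exactly once, so the mean over folds estimates accuracy on data the classifier did not see during training.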
Future Work
▪ Accelerating characterization tasks with machine/deep learning
▪ Generalizing the approach to tackle data scarcity in materials research and other fields where large, labeled datasets are lacking
References
[1] C. Bowles et al., GAN augmentation: Augmenting training data using generative adversarial networks, arXiv:1810.10863 (2018).
[2] F. Oviedo, Z. Ren, S. Sun et al., Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks, npj Comput. Mater. 5, 60 (2019). https://doi.org/10.1038/s41524-019-0196-x
[3] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, Learning from simulated and unsupervised images through adversarial training, arXiv:1612.07828 (2016).
[4] S. Sun et al., Accelerated development of perovskite-inspired materials via high-throughput synthesis and machine-learning diagnosis, Joule 3, 1437–1451 (2019).
[5] SimGAN (GitHub repository). https://github.com/mjdietzx/SimGAN
Thank You! ☺ Contact: shreyaar@mit.edu