Machine Learning for the Materials Scientist
Chris Fischer*, Kevin Tibbetts, Gerbrand Ceder (Massachusetts Institute of Technology, Cambridge, MA)
Dane Morgan (University of Wisconsin, Madison, WI)
NGDM, October 10, 2007
Motivation: materials design through calculation. Computing power: exponential scaling with time (Moore, G. ISSCC 2003 slides, http://www.intel.com). Run-time: polynomial scaling with number of atoms, with curves labeled $O(N^3)$ and $O(N)$ (Skylaris, C. et al. J. Chem. Phys. 122, 084119 (2005)).
DFT as a predictive tool. Sources: Burkett, T. et al. Phys. Rev. Lett. 93 (2004); Norskov, J. et al. MRS Bulletin 31 (2006); Marzari, N. MRS Bulletin 31 (2006), courtesy of M. Lazzeri, Paris VI Jussieu; Marzari, N. MRS Bulletin 31 (2006), courtesy of D. Scherlis, MIT.
Computational materials design strategies: calculating properties of realistic nanostructures ab initio (Lee, Y. S. et al. PRL 95, 076804 (2005); Galli, G., University of California, Davis).
Computational materials design strategies: which combinations yield the optimal material?
Outline
- Machine learning in Computational Materials Design
- Searching for Structure: combining historical information with Density Functional Theory
- Data Mining the High-Throughput engine
- Wrap-up
Motivation: searching for new materials (materials by design)
for i in ( relevant chemistries ) {
    ...
    getStablePhases(i);      // Machine learning needed here!
    ...
    calculateProperty(i);    // depends on which phases are stable and their structure
    i = nextChemistry();
}
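The screening loop on this slide can be sketched in Python. This is only an illustration: `screen_chemistries`, the two callables, and the toy data are hypothetical stand-ins, not the authors' code. The explicit `i = nextChemistry()` step of the pseudocode is absorbed by the `for` loop.

```python
from typing import Callable, Dict, Iterable, List


def screen_chemistries(
    chemistries: Iterable[str],
    get_stable_phases: Callable[[str], List[str]],
    calculate_property: Callable[[str, List[str]], float],
) -> Dict[str, float]:
    """Loop over chemistries, find stable phases, then compute a property."""
    results: Dict[str, float] = {}
    for chem in chemistries:
        # The hard step: this is where structure prediction is needed.
        phases = get_stable_phases(chem)
        if not phases:
            continue  # nothing stable is known or predicted; skip this system
        # The property depends on which phases are stable and their structure.
        results[chem] = calculate_property(chem, phases)
    return results


# Toy placeholders (assumptions for illustration only).
TOY_PHASES = {"Mg-Cu": ["MgCu2"], "Fe-C": ["Fe3C"], "X-Y": []}


def toy_get_stable_phases(chem: str) -> List[str]:
    return TOY_PHASES.get(chem, [])


def toy_calculate_property(chem: str, phases: List[str]) -> float:
    # Stand-in "property": the number of stable phases.
    return float(len(phases))


toy_results = screen_chemistries(
    TOY_PHASES, toy_get_stable_phases, toy_calculate_property
)
```

Passing the two steps in as callables keeps the loop independent of how stable phases are obtained, whether from a database lookup, a DFT calculation, or a learned model.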
The need for machine learning. A DFT code turns a Material into Property Predictions, but on its own it doesn't know what to calculate next. The proposed addition is a Machine Learning Framework built on a Database of Computed and Experimental Results.
Computational Materials Design poised for impact: 'commodity' computational resources; open source electronic structure software; ~$200-250k capital investment; computing budget ~50k compounds/year.
Computational Materials Design poised for impact. ICSD: world's largest database of inorganic crystal structures. First entry: 1913; number of entries: 100,243; number of usable compounds: 29,962. Computing budget: ~50k compounds/year.
The structure search problem
for i in ( relevant chemistries ) {
    ...
    getStablePhases(i);      // where do we put the atoms if no experimental structure is known?
    ...
    calculateProperty(i);    // depends on which phases are stable and their structure
    i = nextChemistry();
}
Strategies to search for structure. Coordinate search: optimize energy (or free energy) directly in the space of atomic coordinates. Heuristic rules: chemical intuition.
Methods to search for structure: coordinate search
Optimize energy (or free energy) directly in the space of atomic coordinates:
$$\text{GroundState} \equiv \arg\min_{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N} E(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$$
Number of dimensions = $3N - 3 + \dim(a, b, c, \alpha, \beta, \gamma)$. The energy landscape is complex (Doye, J. PRL 88, 238701 (2002)).
Proposed solutions:
- Calculate the energy of a finite set of structure prototypes.
- Use a stochastic optimization procedure (hop from basin to basin), e.g., simulated annealing or genetic algorithms.
Knowledge is not transferred across chemistries.
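As an illustration of the stochastic route (hopping from basin to basin), here is a minimal simulated annealing sketch on a one-dimensional toy landscape. The energy function, cooling schedule, and all parameter values are assumptions chosen for illustration; a real structure search would work in the full $3N - 3 + \dim(a, b, c, \alpha, \beta, \gamma)$ dimensional space with energies from an electronic structure code.

```python
import math
import random


def toy_energy(x: float) -> float:
    """Hypothetical rugged landscape: a parabola with sinusoidal ripples."""
    return 0.1 * x * x + math.sin(3.0 * x)


def simulated_annealing(energy, x0, n_steps=5000, t_start=2.0, t_end=0.01,
                        step=0.5, seed=0):
    """Metropolis random walk with a geometric cooling schedule.

    Returns the lowest-energy point visited and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / max(n_steps - 1, 1))
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability so the walker can leave a basin.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


x_start = 4.0
x_best, e_best = simulated_annealing(toy_energy, x_start)
```

Each new chemistry restarts this search from scratch, which is the sense in which knowledge is not transferred across chemistries.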
Methods to search for structure: heuristic rules
Use previous experiments to suggest what to calculate. How? Identify a set of simple parameters based on alloy constituents:
- 1932: Pauling, electronegativity
- 1935: Laves & Witte, $r_{A,B}$
- 1926, 1936-7: Hume-Rothery, Mott & Jones, $e/n_{at}$
- 1976: Miedema, $n_{ws}$
Methods to search for structure: heuristic rules
Plot stable structures in the space of parameters (1983: Villars; 1986: Pettifor). [Figure: structure maps; axis labels $r_{A,B}$ and $e/n_{at}$.]
Heuristic rules efficiently code historical knowledge and provide transfer of knowledge.
Can we leverage historical knowledge to intelligently search for structure?
Description of knowledge base. Experimental data: Pauling File, binaries edition (Villars, P. et al. J. of Alloys and Compounds (2004)). 1335 binary alloys; 3975 non-unique compounds; 4263 compounds total. Alloys not containing the elements He, B, C, N, O, F, Ne, Si, P, S, Cl, Ar, As, Se, Br, Kr, Te, I, Xe, At, Rn.
Machine learning framework: concepts
$\mathbf{x} = (x_A, x_0, \ldots, x_{1/2}, \ldots, x_B)$: low temperature state of alloy
$\text{Data} \equiv \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$: database of N binary alloys
$p(\mathbf{x})$: probability of low temperature state (fitted to data)
$p(\mathbf{x} \mid e)$: probability of low temperature state conditioned on evidence 'e'
How to use the machine learning framework. The Machine Learning Framework, $p(\mathbf{x} \mid e)$, draws on the Database of Computed and Experimental Results and supplies a set of likely structure candidates to the DFT Code, which turns a Material into Property Predictions.
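One simple way to picture the "set of likely structure candidates" is to rank structures by a conditional frequency estimated from a database. The sketch below uses plain counting on a toy database; the structure labels, composition keys, and the function `rank_candidates` are hypothetical, and counting is a stand-in assumption rather than the fitted model $p(\mathbf{x} \mid e)$ of the talk.

```python
from collections import Counter

# Toy database: each alloy maps a composition label to the structure
# observed there. Labels S1, S2, T1, T2 are made up for illustration.
TOY_DB = [
    {"3/4": "S1", "1/2": "T1"},
    {"3/4": "S1", "1/2": "T1"},
    {"3/4": "S1", "1/2": "T2"},
    {"3/4": "S2", "1/2": "T2"},
    {"3/4": "S2", "1/2": "T2"},
]


def rank_candidates(db, evidence_site, evidence_structure, target_site):
    """Rank structures at target_site by frequency among alloys that
    show evidence_structure at evidence_site (most likely first)."""
    counts = Counter(
        alloy[target_site]
        for alloy in db
        if alloy.get(evidence_site) == evidence_structure
        and target_site in alloy
    )
    total = sum(counts.values())
    if total == 0:
        return []  # the evidence never occurs in the database
    return [(s, n / total) for s, n in counts.most_common()]


# Given S1 at composition 3/4, which structures are likely at 1/2?
ranking = rank_candidates(TOY_DB, "3/4", "S1", "1/2")
```

In a workflow of this kind, the top-ranked candidates would be the ones passed to the DFT code for energy evaluation.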
Preliminaries and open questions: Are probabilities consistent with physical intuition? Do probabilities encode the physics of structure stability?
Quantifying correlation in probabilistic framework
Pair cumulant:
$$g_{ij}(x_i, x_j) = \frac{p(x_i, x_j)}{p(x_i)\, p(x_j)}$$
$p(x_i, x_j)$: probability that both structures occur in the same system, estimated from database. $p(x_i)$: probability that $x_i$ occurs, considered alone.
Scale of $g^{(2)}(x_i, x_j)$: above 1, correlated; equal to 1, uncorrelated; between 0 and 1, anti-correlated.
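The pair cumulant can be estimated from co-occurrence counts. The following is a minimal sketch on a toy database; the structure labels and the function name `pair_cumulant` are hypothetical, and raw frequencies are used as probability estimates without any smoothing.

```python
# Toy database: composition label -> observed structure (labels made up).
TOY_DB = [
    {"3/4": "S1", "1/2": "T1"},
    {"3/4": "S1", "1/2": "T1"},
    {"3/4": "S1", "1/2": "T2"},
    {"3/4": "S2", "1/2": "T2"},
    {"3/4": "S2", "1/2": "T2"},
]


def pair_cumulant(db, site_i, s_i, site_j, s_j):
    """Estimate g_ij = p(x_i, x_j) / (p(x_i) p(x_j)) by counting."""
    n = len(db)
    n_i = sum(1 for a in db if a.get(site_i) == s_i)
    n_j = sum(1 for a in db if a.get(site_j) == s_j)
    n_ij = sum(
        1 for a in db if a.get(site_i) == s_i and a.get(site_j) == s_j
    )
    if n_i == 0 or n_j == 0:
        raise ValueError("a structure never occurs; g_ij is undefined")
    return (n_ij / n) / ((n_i / n) * (n_j / n))


g_correlated = pair_cumulant(TOY_DB, "3/4", "S1", "1/2", "T1")  # above 1
g_anti = pair_cumulant(TOY_DB, "3/4", "S2", "1/2", "T1")        # equals 0
```

In this toy data, S1 and T1 occur together more often than chance would suggest, while S2 and T1 never co-occur.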
How probabilities represent physics of mixing
Do probabilities embody real physical effects?
Compounds stabilized by the "size" effect, measured with $g_{ij}(x_i, x_j) = p(x_i, x_j) / (p(x_i)\, p(x_j))$.
[Figure: Fe3C and MgCu2 on a composition axis $c_B$ with ticks 0, 1/4, 1/3, 1/2, 2/3, 3/4, 1. One pairing has $g_{ij} = 8.48$; the other, which places 'small' atoms on 'large' atom sites, has $g_{ij} \approx 0$.]
Data from Pauling File, Binaries Edition (G. Ceder).
How probabilities represent physics of mixing: more interesting correlations
PuNi3 and Gd2Co7: $g_{ij}(x_i, x_j) = 54$.
[Figure: stacking sequences AABAAB... and ABAB... of layers A and B.]
Both structures share the same local environments.
Structure correlation observations: correlation factors are the probabilistic analogue of heuristic rules. No explicit reference to physics; physics is embedded in experimental data.
Information theory for structure stability
Suppose I know Fe3C forms at c = ¾; how does this change the prediction at c = ½? How much information is carried by knowledge of structure?
Mutual information:
$$I_{i,j} = \sum_{x_i, x_j} p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\, p(x_j)} = \langle \log [ g_{ij}(x_i, x_j) ] \rangle$$
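The mutual information can be computed directly from the definition as the average of $\log g_{ij}$ over the joint distribution. The sketch below is a minimal illustration: the function name `mutual_information`, the two small databases, and their labels are hypothetical, and probabilities are plain frequency estimates using the natural logarithm.

```python
import math
from collections import Counter


def mutual_information(db, site_i, site_j):
    """I_ij = sum over (x_i, x_j) of p(x_i, x_j) * log g_ij(x_i, x_j)."""
    pairs = [(a[site_i], a[site_j]) for a in db
             if site_i in a and site_j in a]
    n = len(pairs)
    if n == 0:
        return 0.0
    c_ij = Counter(pairs)
    c_i = Counter(p[0] for p in pairs)
    c_j = Counter(p[1] for p in pairs)
    total = 0.0
    # Pairs that never co-occur have p_ij = 0 and contribute nothing.
    for (x_i, x_j), n_ij in c_ij.items():
        p_ij = n_ij / n
        g_ij = p_ij / ((c_i[x_i] / n) * (c_j[x_j] / n))
        total += p_ij * math.log(g_ij)
    return total


# Independent toy data: knowing one structure says nothing about the other.
INDEPENDENT_DB = [
    {"a": "X", "b": "P"}, {"a": "X", "b": "Q"},
    {"a": "Y", "b": "P"}, {"a": "Y", "b": "Q"},
]
# Perfectly correlated toy data: one structure determines the other.
CORRELATED_DB = [{"a": "X", "b": "P"}, {"a": "Y", "b": "Q"}]

i_independent = mutual_information(INDEPENDENT_DB, "a", "b")
i_correlated = mutual_information(CORRELATED_DB, "a", "b")
```

The two toy cases bracket the behavior: independent sites carry no information about each other, while perfectly correlated two-state sites carry $\log 2$.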