Systematic trends in results from different density functional theory models
The quantum mechanical properties of molecular and condensed matter systems are commonly calculated using Kohn-Sham density functional theory (DFT). Its favorable balance of accuracy and computational cost has made DFT the workhorse of both computational quantum chemistry and computational materials science. While the true exchange-correlation functional at the heart of this formally exact theory is an elusive object, numerous groups have made sustained efforts to develop approximations to this functional that yield increasingly accurate results. Many dozens of these approximate functionals have been proposed and implemented over the past decades. A set of archetypical functional designs makes up Jacob's ladder, the hierarchy of DFT methodology. In addition to the approximate nature of the available exchange-correlation functionals, another source of approximation is introduced by the truncated basis set expansions in terms of which the numerical solutions for a given DFT approach are computed. While DFT is generally less sensitive to basis set size than wavefunction methods, this remains a concern in practical applications. It is evident that results from different DFT approximations and basis sets will differ. However, it can reasonably be assumed that these discrepancies are not random but predominantly systematic in nature. Random discrepancies would indicate a fundamentally flawed functional and ultimately pathological behavior. In this project, we analyze a large-scale data set containing the results of different DFT model chemistries (i.e., combinations of quantum chemical methods, in this case DFT flavors, and basis sets) with respect to systematic trends. We begin by comparing the results of popular methods ranging from local density approximations (LDA) through generalized gradient approximations (GGA) to a variety of hybrid approximations.
The quantum chemical properties analyzed in this project are the HOMO and LUMO energies (eV), band gap energies (eV), and dipole moments (D) of organic semiconductor compounds. An initial linear regression analysis with B3LYP/SVP and BP86/SVP as benchmark approximations reveals an overwhelmingly linear trend. This implies that the results of expensive DFT models, which are more accurate but considerably more time-consuming, can be obtained by mapping the results of cheap DFT models through linear regression equations. However, although the majority of compounds follow the linear trend, there are still outliers for which it does not hold. Hence, identifying these outliers, by finding the compound classes or particular structural characteristics associated with them, becomes the key to the success of this new methodology. First, we study the structure-property relationships by implementing pattern recognition of 26 fragments. Then, we employ a hypergeometric distribution analysis to assess the prevalence of the 26 fragments among the outliers. Based on these structure-property features, we dynamically derive an outlier prediction model that distinguishes between the fitted class and the outliers. The area under the curve (AUC) reaches at least 0.8 across the different levels of outliers. This shows that the models can successfully classify a molecule as an outlier or as belonging to the fitted group based on its structural composition. We can accordingly select the appropriate approach for obtaining the results for different model chemistries.
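The workflow described above (linear mapping between model chemistries, outlier identification, and hypergeometric analysis of fragment prevalence) can be illustrated with a minimal sketch. This is not the implementation used in this work: the function names are hypothetical, the 2-sigma residual threshold is an assumed stand-in for the unspecified outlier levels, and the one-sided enrichment test is only one plausible way to apply the hypergeometric distribution.

```python
from math import comb
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept.

    x: property values from a cheap model chemistry (hypothetical input)
    y: the same property from an expensive model chemistry
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_outliers(x, y, slope, intercept, n_sigma=2.0):
    """Flag compounds whose residual exceeds n_sigma standard deviations.

    The n_sigma cutoff is an assumption made for this sketch.
    """
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    sigma = pstdev(residuals)
    return [abs(r) > n_sigma * sigma for r in residuals]


def fragment_enrichment_pvalue(n_total, n_with_fragment,
                               n_outliers, n_outliers_with_fragment):
    """One-sided hypergeometric tail P(X >= k).

    Probability of finding at least n_outliers_with_fragment
    fragment-bearing compounds among n_outliers compounds drawn at
    random from a library of n_total compounds, of which
    n_with_fragment carry the fragment.
    """
    upper = min(n_with_fragment, n_outliers)
    tail = sum(
        comb(n_with_fragment, i)
        * comb(n_total - n_with_fragment, n_outliers - i)
        for i in range(n_outliers_with_fragment, upper + 1)
    )
    return tail / comb(n_total, n_outliers)
```

In such a sketch, a small p-value for a given fragment would suggest that it is over-represented among the outliers, which makes its presence a candidate feature for a classifier that separates outliers from the fitted class.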