Machine Learning Model Selection for Predicting Properties of High Refractive Index Polymers
Abstract
In materials science and chemistry, machine learning has recently emerged as a promising technique for accelerating the discovery of novel materials. This thesis focuses on one major aspect of machine learning: model selection, an important and time-intensive task that remains largely unexplored in the materials community. We present a framework for automated model selection, focusing primarily on our research group's current work on predicting the properties of organic polymers. We first discuss the traditional approach to hyper-parameter optimization for a given machine learning model, then identify its shortcomings and the resulting need for more specialized techniques. We analyze two algorithms for hyper-parameter selection: a genetic algorithm and particle swarm optimization. To this end, we develop genetic algorithm and particle swarm optimization modules and incorporate them into our research group's machine learning software package, ChemML. The two algorithms are compared on their performance as well as on the time required for the optimization to complete. We show that both the genetic algorithm and particle swarm optimization find better hyper-parameter values than the traditional hyper-parameter tuning methods, at the cost of slightly higher computational time. We also explore two approaches for reducing this computational time: feature selection using a genetic algorithm, and reducing the size of the data set. Both approaches lower the computational time with little loss in the prediction accuracy of the machine learning model.
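To make the comparison concrete, the following is a minimal, self-contained sketch of a genetic algorithm and particle swarm optimization searching a two-dimensional hyper-parameter space, alongside a coarse grid search as the traditional baseline. It is not the ChemML implementation: the function names (`validation_error`, `genetic_algorithm`, `particle_swarm`, `grid_search`), the search bounds, and all algorithm settings are hypothetical, and `validation_error` is a toy quadratic standing in for the cross-validated error that a real model would return.

```python
import random

# Hypothetical search space: two continuous hyper-parameters, e.g. the log10 of a
# regularization strength and the log10 of a learning rate (assumed, not from ChemML).
BOUNDS = [(-6.0, 0.0), (-5.0, -1.0)]

def validation_error(params):
    """Toy stand-in for a cross-validated error; its minimum is at (-3, -2)."""
    x, y = params
    return (x + 3.0) ** 2 + (y + 2.0) ** 2

def clip(value, low, high):
    return max(low, min(high, value))

def random_point(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]

def genetic_algorithm(fitness, pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_point(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)              # lowest error first
        parents = population[: pop_size // 2]     # truncation selection (keeps the best)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene is taken from one parent at random.
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
            # Gaussian mutation, clipped back into the search bounds.
            for i, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < mutation_rate:
                    child[i] = clip(child[i] + rng.gauss(0.0, 0.1 * (hi - lo)), lo, hi)
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

def particle_swarm(fitness, n_particles=30, iterations=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    positions = [random_point(rng) for _ in range(n_particles)]
    velocities = [[0.0] * len(BOUNDS) for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    global_best = min(personal_best, key=fitness)[:]
    for _ in range(iterations):
        for k in range(n_particles):
            for i, (lo, hi) in enumerate(BOUNDS):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm's best.
                velocities[k][i] = (w * velocities[k][i]
                                    + c1 * r1 * (personal_best[k][i] - positions[k][i])
                                    + c2 * r2 * (global_best[i] - positions[k][i]))
                positions[k][i] = clip(positions[k][i] + velocities[k][i], lo, hi)
            if fitness(positions[k]) < fitness(personal_best[k]):
                personal_best[k] = positions[k][:]
                if fitness(personal_best[k]) < fitness(global_best):
                    global_best = personal_best[k][:]
    return global_best

def grid_search(fitness, steps=4):
    # Traditional baseline: evaluate every point of an evenly spaced grid.
    grids = [[lo + (hi - lo) * j / (steps - 1) for j in range(steps)] for lo, hi in BOUNDS]
    return min(([x, y] for x in grids[0] for y in grids[1]), key=fitness)

if __name__ == "__main__":
    for name, best in [("grid", grid_search(validation_error)),
                       ("GA", genetic_algorithm(validation_error)),
                       ("PSO", particle_swarm(validation_error))]:
        print(name, [round(v, 3) for v in best], round(validation_error(best), 4))
```

On this toy surface a 4 × 4 grid can only land on its fixed grid points, whereas the two population-based searches can move between them; the price is more objective evaluations, which on a real data set means more model fits and hence the longer run times noted above.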