Advancing near-optimal and optimal feature selection methodology
The objective of a feature selection problem is to reduce the dimensionality of a given data set by retaining only a limited set of the most informative features. This problem is often encountered in machine learning applications and various other domains. Methods for performing feature selection fall into two general categories, near-optimal and optimal, reflecting the trade-off between computational efficiency and optimality guarantees. This thesis reviews and advances the key methodological ideas developed over the years for feature selection. Beyond offering detailed explanations of the ideas behind existing algorithms, the thesis's main contribution is twofold. First, it presents a highly accurate near-optimal feature selection algorithm, the Improved Mutual Information Based Feature Selector. The algorithm uses the Sequential Forward Floating Selection strategy with a criterion function based on mutual information, which is estimated with high accuracy using a new, efficient memory allocation scheme for working with high-dimensional empirical distributions. Second, the thesis advances the state of the art in optimal feature selection. It uses a recently introduced tree search strategy, Cyclic Best First Search, to obtain a novel Memory Based Cyclic Branch and Bound algorithm. Computational experiments show that the designed algorithms generally perform more accurately and consistently than existing counterparts of the same level of design complexity.
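The near-optimal approach described above can be sketched as follows. This is a generic Sequential Forward Floating Selection loop with a histogram-based empirical mutual information criterion, not the thesis's Improved Mutual Information Based Feature Selector (which additionally uses a specialized memory allocation scheme for high-dimensional empirical distributions); all function names here are illustrative.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    # Empirical MI between two discrete sequences:
    # I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    n = len(ys)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def subset_mi(features, labels, subset):
    # Criterion J(S): MI between the joint distribution of the selected
    # feature columns (taken as tuples) and the class labels.
    rows = [tuple(features[j][i] for j in subset) for i in range(len(labels))]
    return mutual_information(rows, labels)

def sffs(features, labels, k):
    # Sequential Forward Floating Selection: greedy forward additions,
    # interleaved with conditional backward removals whenever dropping a
    # feature strictly improves the best value recorded for that subset size.
    selected, best = [], {}          # best: size -> (score, subset)
    while len(selected) < k:
        # forward step: add the feature that maximizes the criterion
        j = max((f for f in range(len(features)) if f not in selected),
                key=lambda f: subset_mi(features, labels, selected + [f]))
        selected.append(j)
        s = subset_mi(features, labels, selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, selected.copy())
        # floating backward step: remove features while strictly improving
        while len(selected) > 2:
            drop = max(selected, key=lambda f: subset_mi(
                features, labels, [g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            s = subset_mi(features, labels, reduced)
            if s > best[len(reduced)][0]:
                selected, best[len(reduced)] = reduced, (s, reduced.copy())
            else:
                break
    return best[k][1]
```

On a toy data set where feature 0 duplicates the class labels, `sffs(features, labels, 2)` picks feature 0 first, since it alone attains the maximum possible criterion value of H(Y) = 1 bit. The strict-improvement test in the backward step is what bounds the floating search: each recorded best score can only increase, and mutual information is bounded above by the label entropy.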
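For the optimal category, a minimal branch-and-bound sketch is given below. It assumes the criterion J is monotone (J(S) ≤ J(T) whenever S ⊆ T, as holds for mutual information), so that J of a partial solution upper-bounds every subset reachable from it. Note this uses plain best-first ordering, not the Cyclic Best First Search strategy the thesis builds on, and the function names are illustrative.

```python
import heapq
from itertools import count

def branch_and_bound(d, k, J):
    # Exact search for the size-k subset of features {0, ..., d-1} that
    # maximizes a monotone set criterion J. Search states are obtained by
    # removing features from the full set; J of a partial state is an upper
    # bound on J of every size-k subset reachable from it, which justifies
    # pruning against the incumbent.
    full = frozenset(range(d))
    tie = count()                      # tiebreaker so the heap never compares sets
    heap = [(-J(full), next(tie), full, -1)]
    best_score, best_subset = float("-inf"), None
    while heap:
        neg_bound, _, subset, last = heapq.heappop(heap)
        if -neg_bound <= best_score:
            continue                   # bound cannot beat the incumbent: prune
        if len(subset) == k:
            best_score, best_subset = -neg_bound, subset
            continue
        # branch: remove one more feature, restricted to indices above the
        # last removed one so each subset is generated exactly once
        for j in range(last + 1, d):
            if j in subset and len(subset) - 1 >= k:
                child = subset - {j}
                bound = J(child)
                if bound > best_score:
                    heapq.heappush(heap, (-bound, next(tie), child, j))
    return best_subset
```

With best-first ordering the first size-k state popped is already optimal, since every remaining heap entry carries a bound no larger than its score; a cyclic strategy such as the one used in the thesis instead alternates between exploring deep and shallow parts of the tree to find strong incumbents earlier.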