Annual Proceedings of the South African Statistical Association Conference - Genetic algorithms for feature selection

Volume 2019 Number Congress 1


This paper proposes using genetic algorithms (GA) for feature selection. Although the focus is on linear regression, the approach can be extended to other machine learning methods. The GA is tailored to regression models and then compared with traditional feature selection by stepwise selection and the lasso. The emphasis is on finding the best feature subset among all possible combinations, as measured by the Bayesian Information Criterion (BIC). The approach is illustrated with a case study from oil well fracking and with simulations.
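As a rough illustration of the idea, and not the paper's actual implementation, a GA over feature subsets can be sketched in Python with NumPy. Each chromosome is a boolean mask over the columns, fitness is the Gaussian BIC of an ordinary least squares fit (up to an additive constant), and the operators are tournament selection, uniform crossover, bit-flip mutation, and elitism. The function names `bic` and `ga_select`, the operator choices, and all parameter defaults are assumptions made for this sketch.

```python
import numpy as np

def bic(X, y, mask):
    """Gaussian BIC (up to a constant) of an OLS fit with intercept
    on the columns of X flagged True in the boolean array mask."""
    n = len(y)
    design = np.column_stack([np.ones(n), X[:, mask]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]  # intercept plus selected features
    return n * np.log(rss / n) + k * np.log(n)

def ga_select(X, y, pop_size=40, generations=60, p_mut=0.05, seed=0):
    """Hypothetical GA: returns (best_mask, best_bic) seen during the search."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5  # random initial subsets
    best_mask, best_score = None, np.inf
    for _ in range(generations):
        scores = np.array([bic(X, y, m) for m in pop])
        i = int(np.argmin(scores))
        if scores[i] < best_score:
            best_mask, best_score = pop[i].copy(), float(scores[i])
        # Tournament selection: the lower-BIC member of each random pair wins.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = np.where((scores[a] < scores[b])[:, None], pop[a], pop[b])
        # Uniform crossover with a randomly permuted set of mates.
        mates = parents[rng.permutation(pop_size)]
        swap = rng.random((pop_size, p)) < 0.5
        children = np.where(swap, parents, mates)
        # Bit-flip mutation, then elitism: keep the best subset found so far.
        children ^= rng.random((pop_size, p)) < p_mut
        children[0] = best_mask
        pop = children
    return best_mask, best_score
```

Calling `ga_select(X, y)` on a numeric feature matrix `X` and response `y` returns the lowest-BIC mask evaluated and its score. Unlike an exhaustive search, the GA offers no guarantee of reaching the global optimum; larger populations or more generations trade run time for coverage.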

The conclusion is that GA selection is of great benefit when applying machine learning to problems with many nuisance features. GA selection is more likely than the traditional methods to find the best model among all possible subsets. Constraints arising from model restrictions, data transformations, and data encoding are naturally incorporated into the algorithm. Although the time needed to find the best solution is longer than for shrinkage methods, in most cases it is acceptable given the improved selection and the greater confidence in the selected features.
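One plausible way to build such constraints into a GA, offered here as an assumption since the abstract does not say how the paper does it, is a repair step applied to every chromosome before it is scored. In the sketch below, `groups` holds column indices that must enter or leave together (for example the dummy columns of one encoded categorical variable), and `requires` holds `(child, parent)` pairs where a derived term such as a transformation needs its parent term. The function name `repair` and both parameter names are hypothetical.

```python
import numpy as np

def repair(mask, groups=(), requires=()):
    """Return a copy of mask that satisfies all-or-nothing groups and
    child-implies-parent rules by switching on the missing features."""
    fixed = np.array(mask, dtype=bool)
    while True:
        before = fixed.copy()
        for idx in groups:
            idx = list(idx)
            # If any member of the group is selected, select the whole group.
            fixed[idx] = fixed[idx].any()
        for child, parent in requires:
            # A selected derived term pulls in the term it depends on.
            if fixed[child]:
                fixed[parent] = True
        # Both rules only switch features on, so this loop terminates.
        if np.array_equal(fixed, before):
            return fixed
```

Repairing by switching features on is only one design choice; dropping the offending features instead, or penalising infeasible subsets in the fitness, would be equally reasonable alternatives.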

The case study and simulations used SAS® Enterprise Miner™ and Python.
