South African Computer Journal - Extracting salient features for network intrusion detection using machine learning methods : research article
|Article Title||Extracting salient features for network intrusion detection using machine learning methods : research article|
|© Publisher:||South African Computer Society (SAICSIT)|
|Journal||South African Computer Journal|
|Affiliations||1 Rhodes University, 2 University of South Africa and 3 University of the Witwatersrand|
|Publication Date||Jul 2014|
|Pages||82 - 96|
|Keyword(s)||Computing methodologies - Machine learning, Computing methodologies - Supervised learning by classification, Decision trees, Feature selection, Information systems - Content analysis and feature selection, Information systems - Data mining, Machine learning, Network intrusion detection, Security and privacy - Intrusion detection systems, Security and privacy - Network security and Theory of computation - Oracles and decision trees|
This work presents a data preprocessing and feature selection framework to support data mining and network security experts in selecting minimal feature sets for intrusion detection data. This process is supported by detailed visualisation and examination of class distributions. Distribution histograms, scatter plots and information gain are presented as supportive feature reduction tools. The feature reduction process applied is based on decision tree pruning and backward elimination. This paper starts with an analysis of the KDD Cup '99 datasets and their potential for feature reduction. The dataset consists of connection records with 41 features whose relevance for intrusion detection is not clear. All traffic is classified either as 'normal' or as one of four attack types: denial-of-service, network probe, remote-to-local or user-to-root. Using our custom feature selection process, we show how we can significantly reduce the number of features in the dataset to a few salient features. We conclude by presenting minimal sets with 4-8 salient features for two-class and multi-class categorisation for detecting intrusions, as well as for the detection of individual attack classes; the performance using a static classifier compares favourably to the performance using all features available. The suggested process is of a general nature and can be applied to any similar dataset.
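As an illustration of the information gain measure mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes discrete feature values, and the toy records, feature names (`protocol`, `flag`) and function names are hypothetical; they are not taken from the KDD Cup '99 data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the label subsets induced by the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy connection records (illustrative only).
labels = ["normal", "normal", "attack", "attack"]
features = {
    "protocol": ["tcp", "tcp", "icmp", "icmp"],  # separates the classes
    "flag": ["SF", "REJ", "SF", "REJ"],          # carries no class information
}

# Rank features from most to least informative.
ranking = sorted(features,
                 key=lambda name: information_gain(features[name], labels),
                 reverse=True)
```

Under these assumptions, a feature with gain close to zero (here `flag`) would be a natural candidate for removal during backward elimination, while a high-gain feature (here `protocol`) would be kept.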