Ensemble Feature Selection for Network Intrusion Detection: Combining Information Gain and Random Forest with Recursive Feature Elimination
DOI:
https://doi.org/10.24203/262j5109
Keywords:
Classification; ensemble; feature selection; network intrusion detection system; pre-processing; recursive feature elimination
Abstract
Network intrusion detection systems (NIDS) are essential for protecting computer networks against cyberattacks. Selecting a minimal set of essential features that adequately discriminates malicious traffic from normal traffic is indispensable when developing a NIDS. As such, a more reliable and accurate detection result may be realized when intrusion detection is carried out on a dataset with an inclusive feature representation. This work presents a pre-processing and feature selection workflow, together with its results, for the CIC-IDS-2017 dataset, focusing on two cyberattacks: Denial-of-Service (DoS) and PortScan. The study applied an ensemble feature selection method based on information gain and Random Forest to filter out important features. The Recursive Feature Elimination method was then applied to the reduced feature set to optimize the selected subset. The selected feature subset was evaluated using two classification algorithms, a support vector machine (SVM) and a multi-layer perceptron (MLP), against four widely used performance metrics. The results demonstrated the efficacy of the proposed ensemble approach in optimizing the selected feature subset for detecting PortScan and DoS attacks in network traffic. The experiments also showed that the SVM had a slight advantage in accuracy and trained more quickly. According to the study's evaluation, a NIDS may be able to shorten processing times without sacrificing its ability to detect PortScan and DoS attacks accurately by choosing a narrow subset of informative features. This suggests the approach might be applicable to real-world NIDS scenarios involving these attacks. The study also provides encouraging perspectives on how ensemble feature selection, paired with MLP and SVM classifiers, can enhance the effectiveness of NIDS.
Building on these findings, further research can develop NIDS solutions that are even more reliable and efficient for the dynamic field of cybersecurity.
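To make the filter step of the workflow concrete, the following is a minimal sketch of ranking features by information gain. It is not the authors' implementation: it covers only the information-gain part of the ensemble (the Random Forest ranking and Recursive Feature Elimination steps are not shown), it assumes discrete feature values, and the feature names and toy flows are hypothetical rather than taken from CIC-IDS-2017.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the label entropy left after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(columns, labels):
    """Return (feature name, gain) pairs sorted from most to least informative."""
    scores = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy flows: names and values are invented for illustration only.
labels = ["attack", "attack", "attack", "benign", "benign", "benign"]
columns = {
    "syn_flag_high": [1, 1, 1, 0, 0, 0],  # separates the two classes perfectly
    "long_duration": [1, 0, 1, 0, 1, 0],  # only weakly related to the label
    "protocol_tcp": [1, 1, 1, 1, 1, 1],   # constant, so it carries no information
}

ranking = rank_by_information_gain(columns, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In a workflow like the one described, a ranking of this kind would be combined with a second ranking (here, Random Forest importance) before a wrapper method prunes the surviving features; how the two rankings are merged is a design choice the abstract does not specify.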
License
Copyright (c) 2025 Stephen Wanjau, Gabriel Kamau

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The articles published in International Journal of Computer and Information Technology (IJCIT) are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.