I-SiamIDS: an improved Siam-IDS for handling class imbalance in network-based intrusion detection systems
This paper proposes an algorithm-level approach called Improved Siam-IDS (I-SiamIDS), a two-layer ensemble for handling the class imbalance problem. Compared with its counterparts, I-SiamIDS showed significant improvement in Accuracy, Recall, Precision, F1-score and Area Under the Curve (AUC) on both the NSL-KDD and CIDDS-001 datasets.
Abstract
Network-based intrusion detection systems (NIDSs) identify malicious activities by analyzing network traffic. NIDSs are trained with samples of benign and intrusive network traffic. Training samples belong to either majority or minority classes, depending on the number of available instances. Majority classes consist of abundant samples of normal traffic as well as of recurrent intrusions, whereas minority classes include fewer samples of unknown events or infrequent intrusions. NIDSs trained on such imbalanced data tend to give predictions biased against minority attack classes, leaving intrusions undetected or misclassified. Past research handled this class imbalance problem using data-level approaches that either increase the minority class samples or decrease the majority class samples in the training data set. Although these data-level balancing approaches indirectly improve the performance of NIDSs, they do not address the underlying issue, namely that NIDSs are unable to identify attacks for which only limited training data is available. This paper proposes an algorithm-level approach called I-SiamIDS, a two-layer ensemble for handling the class imbalance problem. I-SiamIDS identifies both majority and minority classes at the algorithm level without using any data-level balancing techniques. The first layer of I-SiamIDS uses an ensemble of b-XGBoost, Siamese-NN and DNN for hierarchical filtration of input samples to identify attacks. These attacks are then sent to the second layer of I-SiamIDS, which classifies them into different attack classes using m-XGBoost. Compared with its counterparts, I-SiamIDS showed significant improvement in Accuracy, Recall, Precision, F1-score and AUC values on both the NSL-KDD and CIDDS-001 datasets. To further support these results, a computational cost analysis was also performed to study the acceptability of the proposed I-SiamIDS.
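The two-layer control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact filtration rule, so the sketch assumes that a sample is forwarded to the second layer as soon as any first-layer stage flags it, and that a sample judged normal by one stage is passed to the next stage for another check. The function `two_layer_predict`, the toy threshold detectors, and the labels `attack_a` and `attack_b` are all hypothetical stand-ins for the trained b-XGBoost, Siamese-NN, DNN and m-XGBoost models.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]
BinaryDetector = Callable[[Sample], bool]   # True means "looks like an attack"
AttackClassifier = Callable[[Sample], str]  # returns an attack class label


def two_layer_predict(
    samples: List[Sample],
    binary_stages: List[BinaryDetector],
    attack_classifier: AttackClassifier,
) -> List[str]:
    """Label each sample as 'normal' or as an attack class.

    ASSUMPTION (not stated in the abstract): the first-layer stages are
    consulted in order, and a sample counts as an attack as soon as one
    stage flags it. any() short-circuits, so later stages only see the
    samples that earlier stages considered normal.
    """
    labels = []
    for x in samples:
        # Layer 1: hierarchical binary filtration (attack vs. normal).
        is_attack = any(stage(x) for stage in binary_stages)
        # Layer 2: only flagged samples reach the multi-class model.
        labels.append(attack_classifier(x) if is_attack else "normal")
    return labels


# Hypothetical stand-ins for the trained models; each toy detector just
# thresholds one feature so the control flow is easy to follow.
def toy_b_xgboost(x: Sample) -> bool:
    return x[0] > 0.8


def toy_siamese_nn(x: Sample) -> bool:
    return x[1] > 0.8


def toy_dnn(x: Sample) -> bool:
    return x[2] > 0.8


def toy_m_xgboost(x: Sample) -> str:
    # Illustrative labels only; real attack classes depend on the dataset.
    return "attack_a" if x[0] >= x[1] else "attack_b"


if __name__ == "__main__":
    demo = [
        [0.9, 0.1, 0.1],   # flagged by the first stage
        [0.1, 0.9, 0.1],   # missed by the first stage, flagged by the second
        [0.1, 0.1, 0.1],   # passes every stage
        [0.2, 0.1, 0.95],  # flagged only by the last stage
    ]
    stages = [toy_b_xgboost, toy_siamese_nn, toy_dnn]
    print(two_layer_predict(demo, stages, toy_m_xgboost))
    # ['attack_a', 'attack_b', 'normal', 'attack_a']
```

Under this assumed rule, a rare attack missed by one detector still has a chance of being caught by a later one, while the multi-class model never sees traffic that all three stages considered normal.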