This work proposes a malware classification scheme that constructs a model using low-end computing resources and a very large balanced dataset for malware and gives improved performance for accuracy with the tuning of the hyperparameter and achieve higher accuracy.
Article history: Received: 31 July, 2020 Accepted: 06 September, 2020 Online: 26 September, 2020 In this industry 4.0 and digital era, we are more dependent on the use of communication and various transaction such as financial, exchange of information by various means. These transaction needs to be secure. Differentiation between the use of benign and malware is one way to make these transactions secure. We propose in this work a malware classification scheme that constructs a model using low-end computing resources and a very large balanced dataset for malware. To our knowledge, and search the complete dataset is used the first time with the XGBoost GBDT machine learning technique to build a classifier using low-end computing resources. The model is optimized for efficiency with the removal of noisy features by a reduction in features sets of the dataset by domain expertise in malware detection and feature importance functionality of XGboost and hyperparameter tuning. The model can be trained in low computation resources at less time in 1315 seconds with a reduction in feature set without affecting the performance for classification. The model gives improved performance for accuracy with the tuning of the hyperparameter and achieve higher accuracy of 98.5 and on par AUC of .9989.