Role of Machine Learning Models in detecting Cyber Attacks and checking their accuracy using Confusion Matrix
Internet usage has grown rapidly, particularly in the last decade. However, as the Internet becomes part of day-to-day activities, cybercrime is also on the rise and incidents are increasing daily. According to a 2020 Cybersecurity Ventures report, cybercrime will cost nearly $6 trillion per annum by 2021. Cybercriminals use networked computing devices as their primary means of reaching victims' devices, and they profit in terms of finance, publicity and more by exploiting vulnerabilities in the system. Evaluating cybercrime attacks and providing protective measures manually, using existing technical approaches and investigations, has often failed to control these attacks. In this blog, we will discuss various machine learning techniques for analyzing cybercrimes.
What is Machine Learning?
Machine Learning (ML) is an approach to AI that uses a system capable of learning from experience. It is intended not only for AI goals (e.g., copying human behavior); it can also reduce the effort and/or time spent on both simple and difficult tasks, such as stock price prediction. In other words, ML is a system that can recognize patterns by using examples rather than by having them programmed explicitly. If your system learns constantly, makes decisions based on data rather than hand-coded rules, and changes its behavior, it's Machine Learning.
Most ML tasks are subclasses of the most common ones, which are described below:
- Regression (or prediction) — a task of predicting the next value based on the previous values.
- Classification — a task of separating things into different categories.
- Clustering — similar to classification but the classes are unknown, grouping things by their similarity.
- Association rule learning (or recommendation) — a task of recommending something based on the previous experience.
- Dimensionality reduction — or generalization, a task of searching common and most important features in multiple examples.
- Generative models — a task of creating something based on the previous knowledge of the distribution.
How is Binary Classification Used in an Intrusion Detection System?
The binary classification task represents the model's ability to detect whether or not there is an attack in the network environment. The accuracy metric measures how close the predicted classification values are to the true values, that is, the fraction of samples whose predicted class matches the true class.
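As a minimal sketch of the accuracy idea described above, the snippet below compares predicted labels with true labels. The label encoding (1 = attack, 0 = normal) and the sample values are made up for illustration and do not come from any dataset discussed in this post.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 1 = attack, 0 = normal traffic.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 1, 0, 0, 0, 0]

print(accuracy(y_true, y_pred))  # 6 of 8 predictions match -> 0.75
```

Accuracy alone can be misleading when one class dominates the traffic, which is why the confusion-matrix metrics later in this post are also reported.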
At present, no generalized framework is available to categorize cybercrime offenses by extracting features from the cases. In the present work, data analysis and machine learning are incorporated to build a cybercrime detection and analytics system. The proposed system's design and implementation use classification, clustering and supervised learning algorithms. Figure 1 depicts the proposed methodology. Here, naïve Bayes is used for classification. For feature extraction, the TF-IDF (term frequency-inverse document frequency) vector process is used. The methodology is based on four phases that are applied to the data: reconnaissance, preprocessing, data clustering and classification, and prediction analysis (Sustainability 2020, 12, 4087).
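To make the TF-IDF plus naïve Bayes combination more concrete, here is a small pure-Python sketch. It is not the pipeline from the cited work: the multinomial naïve Bayes variant, the smoothed IDF formula, the whitespace tokenizer, and the toy incident descriptions and labels are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for this toy data.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(doc, idf):
    """Map a document to {term: tf * idf}; terms unseen in training are dropped."""
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial naive Bayes over TF-IDF weights with Laplace smoothing."""
    weight_sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, w in tfidf(doc, idf).items():
            weight_sums[label][term] += w
    vocab = list(idf)
    model = {}
    for label, n in Counter(labels).items():
        total = sum(weight_sums[label].values()) + alpha * len(vocab)
        log_prior = math.log(n / len(labels))
        log_like = {t: math.log((weight_sums[label][t] + alpha) / total) for t in vocab}
        model[label] = (log_prior, log_like)
    return model


def predict(doc, model, idf):
    """Return the class with the highest log-posterior score."""
    vec = tfidf(doc, idf)
    scores = {
        label: log_prior + sum(w * log_like[t] for t, w in vec.items())
        for label, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical incident descriptions and categories.
docs = [
    "phishing email asks for bank password",
    "fake bank website steals password",
    "ransomware encrypts files and demands payment",
    "malware encrypts disk and demands bitcoin payment",
]
labels = ["phishing", "phishing", "ransomware", "ransomware"]

idf = fit_idf(docs)
model = train_nb(docs, labels, idf)
print(predict("email link steals bank password", model, idf))  # phishing
print(predict("malware demands payment", model, idf))          # ransomware
```

In practice a library implementation would normally replace the hand-written functions; the sketch only shows how term weights feed the per-class log-likelihoods.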
Two testing sets, KDDTest+ and KDDTest-21, were used to evaluate model performance. In this paper, less than 1% of the training data is used for training, while 100% of the testing data is used for testing. The confusion matrices on the testing data are shown in Table 7 and Table 8. The detailed evaluation metrics are calculated from the confusion matrices, as listed in Table 9. Compared with KDDTest+, the indicators for the normal class on KDDTest-21 declined significantly. The main reason is that KDDTest-21 removes about 7,500 normal-class samples from KDDTest+, which is more than twice the number of abnormal-class samples removed. Traditional machine learning methods, including J48, Naive Bayes, NB Tree, Random Forest, Random Tree, Multi-layer Perceptron and SVM, were evaluated by Tavallaee on the NSL-KDD dataset.
These algorithms were evaluated on the NSL-KDD set using 20% of the dataset for training. The results are listed in Table 10. The traditional machine learning methods perform better on KDDTest+ than on KDDTest-21. Fuzzy-based semi-supervised learning is a semi-supervised method. It divided the training set into two parts, using 10% as labeled data and 90% as unlabeled data to train the model. The method obtained excellent results, and its final performance is better than that of the RNN model. The RNN model is a deep learning method that used the full training set to classify the testing dataset. Although it is better than traditional machine learning methods, it needs more data for training. The channel boosted and residual learning based deep convolutional neural network (CBR-CNN) uses a lightweight CNN architecture for classification. It used 80% of the training set and the remaining 20% as the testing set. The performance of CBR-CNN is inferior to that of our method. The table shows that our method exceeds the other methods on both testing sets. Moreover, our method used the least training data: less than 1% of the data was used for training, and it still obtained the best results.
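The labeled/unlabeled splits mentioned above (for example 10% labeled and 90% unlabeled) can be sketched as a per-class random sample. Whether the cited methods sampled per class is not stated, so the stratification, the function name `stratified_split`, the fixed seed, and the toy class sizes are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, fraction, seed=0):
    """Pick roughly `fraction` of each class as labeled data.

    Returns (labeled_indices, unlabeled_indices). At least one sample
    per class is kept so that no class disappears from the labeled part.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    labeled_idx = []
    for idxs in by_class.values():
        k = max(1, round(len(idxs) * fraction))
        labeled_idx.extend(rng.sample(idxs, k))

    chosen = set(labeled_idx)
    unlabeled_idx = [i for i in range(len(samples)) if i not in chosen]
    return sorted(labeled_idx), unlabeled_idx


# Hypothetical training set: 70 normal records and 30 attack records.
labels = ["normal"] * 70 + ["attack"] * 30
samples = list(range(len(labels)))  # stand-ins for feature vectors

labeled, unlabeled = stratified_split(samples, labels, fraction=0.10)
print(len(labeled), len(unlabeled))  # 10 90
```

Sampling per class keeps the class ratio of the small labeled subset close to that of the full training set, which matters when only a tiny fraction of the data is labeled.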
Evaluation of model using Confusion Matrix
The final classification results fall into four states: TP (true positive), FP (false positive), TN (true negative) and FN (false negative), which are also the four basic entries of the confusion matrix. Here the normal class is treated as the positive class. TP is the number of normal samples that are correctly classified in the normal class. FP is the number of attack samples that are incorrectly classified in the normal class. TN is the number of attack samples that are correctly classified in the attack class. FN is the number of normal samples that are incorrectly classified in the attack class. The following metrics are used to evaluate the performance of the proposed method:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{False Positive Rate (FPR)} = \frac{FP}{FP + TN}$$
$$\text{False Negative Rate (FNR)} = \frac{FN}{FN + TP}$$
$$\text{False Alarm Rate (FAR)} = \frac{FNR + FPR}{2}$$
$$\text{Detection Rate (DR)} = \frac{TP}{TP + FN}$$
$$\text{F-measure} = \frac{2 \cdot (\text{Precision} \cdot DR)}{\text{Precision} + DR}$$
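The formulas above can be sketched in a few lines of Python. The code follows the convention stated in this section, where the normal class is the positive class; the function names, the `safe_div` helper that returns 0.0 for an empty denominator, and the toy labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive="normal"):
    """Return (TP, FP, TN, FN) with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    # Assumption: report 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def metrics(tp, fp, tn, fn):
    precision = safe_div(tp, tp + fp)
    dr = safe_div(tp, tp + fn)   # detection rate
    fpr = safe_div(fp, fp + tn)  # false positive rate
    fnr = safe_div(fn, fn + tp)  # false negative rate
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fn + fp),
        "precision": precision,
        "fpr": fpr,
        "fnr": fnr,
        "far": (fnr + fpr) / 2,
        "dr": dr,
        "f_measure": safe_div(2 * precision * dr, precision + dr),
    }


# Hypothetical labels: 6 normal and 4 attack samples.
y_true = ["normal"] * 6 + ["attack"] * 4
y_pred = ["normal"] * 5 + ["attack"] + ["attack"] * 3 + ["normal"]

counts = confusion_counts(y_true, y_pred)
print(counts)  # (5, 1, 3, 1)
m = metrics(*counts)
print(round(m["accuracy"], 3), round(m["far"], 3))  # 0.8 0.208
```

Passing `positive="attack"` instead swaps the roles of the two classes, which is the more common convention in intrusion detection work.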