Icccn 2019 8847179

This document discusses using machine learning techniques like artificial neural networks and support vector machines for intrusion detection in cloud computing environments. The authors develop models using these techniques on a dataset and perform feature selection to optimize accuracy while reducing training time and complexity. They achieve over 90% accuracy with both techniques and compare their results to existing works.


Supervised Machine Learning Techniques for Efficient Network Intrusion Detection

Nada Aboueata, Qatar University, Doha, Qatar, na090288@qu.edu.qa
Sara Alrasbi, Qatar University, Doha, Qatar, sa099464@qu.edu.qa
Aiman Erbad, Qatar University, Doha, Qatar, aerbad@qu.edu.qa
Andreas Kassler, Karlstad University, Karlstad, Sweden, andreas.kassler@kau.se
Deval Bhamare, Karlstad University, Karlstad, Sweden, deval.bhamare@kau.se

ABSTRACT
Cloud computing is gaining significant traction, and virtualized data centers are becoming popular as a cost-effective infrastructure in the telecommunication industry. Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) are being widely deployed and utilized by end users, including many private as well as public organizations. Despite this widespread acceptance, security is still the biggest concern in cloud computing environments. Users of cloud services are under constant fear of data loss, security breaches, information theft and availability issues. Recently, learning-based methods for security applications have been gaining popularity in the literature with the advances in machine learning (ML) techniques. In this work, we explore the applicability of two well-known machine learning approaches, Artificial Neural Networks (ANN) and Support Vector Machines (SVM), to detect intrusions or anomalous behavior in the cloud environment. We have developed ML models using ANN and SVM techniques and have compared their performance. We have used the UNSW-NB-15 dataset to train and test the models. In addition, we have performed feature engineering and parameter tuning to find the optimal set of features with maximum accuracy, in order to reduce the training time and complexity of the ML models. We observe that, with a proper feature set, the SVM and ANN techniques achieve anomaly detection accuracy of 91% and 92% respectively, which is higher than that reported in the literature, with a reduced number of features needed to train the models.

KEYWORDS
Intrusion Detection, Cloud Computing, Artificial Neural Networks, Support Vector Machines

1 INTRODUCTION
The emergence of cloud computing as a commercial infrastructure has resulted in a plethora of services that are hosted over cloud platforms. This has eventually resulted in a burst in Internet traffic. The possibility of paying-as-you-go, along with on-demand elastic operations by cloud platform providers, is gaining popularity in the enterprise computing model [1, 3]. The transition to this cloud computing paradigm, however, has been hampered by major security issues, which are the subject of many recent studies [2, 5]. Data security is a major concern for end-users in multi-cloud environments. Protecting such environments against attacks and intrusions is a major concern in both research and industry. There are several well-known types of attacks, including insider attacks, flood attacks and user-to-root attacks, to name a few. Usually, the aim of these attacks is to gain unauthorized access to the data on the cloud or to disrupt the services provided to legitimate users.

Firewalls and other rule-based security approaches have been used extensively to provide protection against attacks in data centers and contemporary networks. However, firewalls are not capable of detecting insider attacks or advanced attacks, such as distributed denial of service (DDoS) attacks, since these do not occur where the firewalls are set up or the rules are not enough to detect them. Additionally, large distributed multi-cloud environments would require a significantly large number of complicated rules to be configured, which could be costly, time-consuming and error prone [2, 5]. Cloud providers must take the necessary measures to either prevent these attacks before their occurrence or detect them as soon as they occur. Commonly, attacks can be detected and prevented using Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS). Traditionally, IDSs use predefined rules or behavioral analysis over the network to detect attacks [7]. However, traditional IDSs require humans-in-the-loop (i.e., systems-experts) to continuously update the rules and signature patterns, which is expensive as new sophisticated attacks are introduced every day. This necessitates building intelligent systems that effectively identify attacks with high accuracy and minimal overhead.

Recently, advances in machine learning techniques have proven their efficiency in many applications, including intrusion detection systems (IDS). Learning-based approaches may prove useful for security applications since such models can be trained to counter a large amount of evolving and complex data using comprehensive datasets. Such learning models can be incorporated with firewalls to improve their efficiency. A well-trained model with comprehensive attack types would improve anomaly detection efficiency significantly with a reasonable cost and complexity [7, 22, 25].

In this paper, we tackle the problem of intrusion detection in cloud platforms by leveraging machine learning (ML) techniques, namely Support Vector Machines (SVM) and Artificial Neural Networks (ANN). The aim has been to improve the anomaly detection

978-1-7281-1856-7/19/$31.00 ©2019 IEEE


accuracy, while reducing the model training time and complexity. ML models used in intrusion detection systems need to have a relatively short training time since they need to be retrained to classify new types of attacks or adapt to the changing characteristics of previously seen attacks [23]. We have selected the SVM and ANN algorithms as they are among the best performing algorithms in the literature [12, 22]. Our contributions in this work are twofold:

(1) We implement SVM and ANN algorithms using the tools available in Python (scikit-learn.org/stable/index.html) to achieve high accuracy for anomaly detection in cloud environments. We perform parameter tuning to achieve the best accuracy.
(2) We perform feature engineering to find the optimal set of features that achieves maximum accuracy with minimal training time and complexity. We aim to reduce the training time by selecting an optimal set of features without compromising accuracy. Finally, we compare our results with the existing works in the literature.

The rest of this paper is structured as follows. In Section 2, we review the literature and discuss the prominent approaches for anomaly detection. In Section 3, we discuss the dataset used in our work along with the details of the feature engineering approach. We present the implementation details of our proposed approach along with the results in Section 4, and finally we conclude the paper.

2 LITERATURE REVIEW
In this section, we review the literature on traditional ID techniques (Subsection 2.1) and on techniques that deploy machine learning (ML) algorithms (Subsection 2.2). For the latter, we mainly focus on the SVM and ANN algorithms and discuss how researchers have leveraged these techniques to address the intrusion detection problem.

2.1 Traditional Intrusion Detection Techniques
Catania et al. [7] have discussed several intrusion detection approaches that have been proposed during the past 20 years. The most successful methods are based on implication rules and signature patterns of the intrusion behavior. Despite their success, these techniques are unable to perform real-time detection efficiently and effectively, especially in multi-cloud scenarios. This is mainly due to the rapid growth of novel network attacks, which require the rules and signatures to be rewritten by systems-experts (humans-in-the-loop). The authors have classified intrusion detection techniques into two categories: (1) misuse detection methods and (2) anomaly detection methods. The misuse detection methods include pattern recognition, implication rules, and data mining techniques. The anomaly-based methods are further classified into statistical models, machine learning and data mining techniques. Mishra et al. [18] further categorized intrusion detection techniques into four finer categories:

(i) Misuse Detection Techniques: Catania et al. [7] and Mishra et al. [18] have presented the basic idea of misuse detection techniques, which match the behavior of data in audit logs with the behavior of real attack data. Thus, they are very effective with known threats, but might not be as effective with unknown threats. These techniques are either knowledge-based (using a set of predefined rules or signature patterns) [7] or machine learning-based.
(ii) Anomaly Detection Techniques: These techniques are based on analyzing the profile of legitimate network traffic. Thereafter, any network traffic that deviates from the observed pattern is considered an anomaly. These techniques work by analyzing either the behavior of the network or the behavior of programs. As a result, anomaly detection methods generally outperform signature-based approaches, as they can detect unknown attacks [19].
(iii) Hypervisor and Virtual Machine Introspection Techniques: These techniques refer to the inspection of hypervisor-related resources to prevent and detect attacks in virtual environments. They are implemented using techniques such as adding another virtualization layer, using integrity checks in memory and code, and revising the hypervisor design [19].
(iv) Hybrid Techniques: These techniques are composed of combinations of the above-mentioned techniques. The purpose of hybrid techniques is to overcome the limitations of the individual techniques mentioned earlier [19].

2.2 Machine Learning-based Intrusion Detection
In this subsection, we focus on machine learning approaches that have recently been employed to train intelligent systems to detect anomalies and intrusions over the network automatically [7]. Buczak et al. [5] reviewed the literature on machine learning (ML) and data mining (DM) techniques for detecting intrusions and compared their performance. The work includes Support Vector Machines (SVM), Artificial Neural Networks (ANN), Association Rules, Decision Trees, Naive Bayes and Clustering, to name a few. In the next subsections, we focus on the research works that have employed SVM and ANN techniques for intrusion or anomaly detection in computers or networks. We have selected these two specific techniques as they have been reported in the literature to be well suited to regression and classification problems and to outperform other ML algorithms in binary classification, which is the case in this work.

2.2.1 Support Vector Machines (SVM). A significant amount of work has been done in the literature to implement SVM techniques for anomaly detection. For example, Li et al. [16] have trained an SVM classifier with a radial basis function (RBF) kernel on a subset of the KDD-1999 dataset [8] for misuse detection. Using 10-fold cross validation, the classifier achieved an overall accuracy of 98%. The authors have used feature selection techniques that maximize the classification performance. In contrast, Hu et al. [13] have used a variant of the support vector machine approach, namely robust SVM (RSVM), for attack classification. Their results, using the DARPA-1998 dataset, demonstrated a high accuracy with a minimal false alarm rate. Moreover, Wagner et al. [25] utilized real-world and simulated attack datasets to study the effectiveness of a one-class SVM classifier in detecting anomalies. The authors have proposed a
new window kernel to help find anomalies based on the time position of the NetFlow data. The authors have demonstrated 89% to 94% accuracy on different types of attacks with 0% to 3% false alarm rates.

More recently, Kuang et al. [15] have utilized SVM with particle swarm optimization (PSO) to solve the intrusion detection problem. The authors have used the KDD Cup 1999 dataset to evaluate the proposed system. The classification model designed by Kuang et al. [15] consists of two steps: (1) using kernel principal component analysis (KPCA) for dimensionality reduction and feature extraction, and (2) applying the SVM classification model. The work has considered the SVM penalty parameter C and the kernel function parameters. As each of the proposed approaches used different evaluation metrics, it is not feasible to compare their performance.

Many research studies have applied data mining techniques to design network intrusion detection systems (NIDS), one of the most common and effective techniques being the Support Vector Machine (SVM). Despite its effectiveness, SVM is unable to perform well with large datasets due to various system constraints such as higher memory demands or complex kernel functions. Horng et al. [12] overcame this issue by developing a hierarchical clustering algorithm which provides a representative sample of the training dataset. Clustering reduces the number of data points and, as a result, reduces the training complexity significantly. Similar to [15], they used the KDD Cup 1999 dataset for evaluation and showed better performance. The proposed classifier achieved an accuracy of 95.72% and a false positive rate of 0.7%. The major drawback of these works is that the dataset used is quite old (1999), and models built using such a dataset may not be effective in detecting anomalies in contemporary diversified cloud scenarios.

2.2.2 Artificial Neural Networks (ANNs). In 1998, Cannady [6] conducted preliminary experiments using ANNs as a multi-category classifier for misuse detection. The experimental results achieved a high accuracy of 93% in the testing phase (root-mean-square (RMS) of 0.058 for the training phase and 0.070 for the testing phase). Lippmann and Cunningham [17] also trained ANNs for anomaly detection. They employed two multi-layered ANNs. The first model estimates the probability of an attack, given text statistics extracted from transcripts of telnet sessions. The second model takes labeled attacks as input and classifies them into attack categories.

Recently, Subba et al. [24] trained a simple ANN (with one hidden layer in addition to the input and output layers) to predict anomaly-based intrusion over a network. The authors used both forward and back propagation to optimize binary (normal and attack classes) and multi-class (Normal, DoS, U2R, R2L, Probe classes) ANN models. They evaluated their approach on the benchmark NSL-KDD dataset and compared it with Naive Bayes (NB), SVM and decision tree (C4.5) algorithms. The simple ANN outperformed the NB classifier and performed comparably to SVM and the C4.5 decision tree.

Moradi et al. [20] have studied the possibility of achieving high performance with fewer hidden layers in order to reduce the model training time as well as the model complexity. The results were found to be promising. In another attempt, Hodo et al. [11] proposed an ANN model that is able to classify network traffic as normal vs. abnormal by building a three-layer feed-forward neural network. The authors have used a unipolar sigmoid function as the activation function and a stochastic learning algorithm with mean square error as the error function. The ANN model achieved an overall accuracy of 99.4%. However, they have not considered feature engineering in their work.

Chaouki Khammassi [14] proposed a feature selection approach to select the optimal set of features for intrusion detection. The proposed method is based on a wrapper approach that consists of two components: a search strategy and a predictor strategy. The search strategy is based on a genetic algorithm that determines the subset of features to be evaluated, while the predictor strategy is based on logistic regression as a classification algorithm, whose performance is used to evaluate the subset of features selected by the search strategy. In this paper, we attempt to achieve the benchmarked accuracy for anomaly detection using SVM and ANN techniques, similar to that in the literature, with a reduced number of features and with the latest dataset available for more effectiveness. The performance of our proposed approach is compared to the work proposed in [14], which is based on a similar dataset. In the next section, we discuss the dataset used in our work along with the details of the selected feature engineering approaches.

3 DATASET AND FEATURE ENGINEERING
We now discuss the dataset we have used in this work (Subsection 3.1), the evaluation measures (Subsection 3.2), as well as the feature engineering and parameter tuning approaches (Subsections 3.3 and 3.4).

3.1 Dataset
In this work, we have used the UNSW-NB-15 dataset, which has been collected by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) using the IXIA PerfectStorm tool [21]. The data has been collected for nine families of attacks, for a total of 31 hours. The data is labeled using a ground truth table that contains all simulated attack types. The dataset contains 49 features including the binary class label (1 is abnormal and 0 is normal) and around 2.5M (2,540,044) records (www.unsw.adfa.edu.au/australian-centre-for-cyber-security/cybersecurity/ADFA-NB15-Datasets/). We have divided the original dataset into train and test datasets. The testing dataset contains 175,341 instances, while the training dataset includes 82,332 instances. The training set is used to train the algorithms using different parameters. We have further divided the testing dataset into two sets: a validation set and a testing set. The validation set represents 80% of the testing dataset (140,272 instances) and is used to optimize the algorithms' parameters. The remaining 20% (35,069 instances) of the testing dataset has been used to assess the performance of the trained models. Moreover, as proposed in [21], we have performed feature engineering to prepare five different sets of training features: (1) basic, (2) content, (3) time, (4) general purpose and (5) connection features. This categorization can give insights into the contribution of features to the classification accuracy. Readers are requested to refer to the work presented in [21] for details on the set of features included in each category.
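The 80/20 division of the testing dataset into validation and testing sets described in Subsection 3.1 can be sketched as follows. This is only an illustration under stated assumptions: the feature matrix is random stand-in data rather than UNSW-NB-15 records, and the use of Scikit-learn's train_test_split, the number of columns, and the random seed are choices made for the example, not details taken from this work.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): random values with the same number of rows
# as the 175,341-instance testing dataset; the real records are not used.
rng = np.random.default_rng(0)
n_heldout = 175_341
X_heldout = rng.normal(size=(n_heldout, 4))     # 4 columns chosen arbitrarily
y_heldout = rng.integers(0, 2, size=n_heldout)  # 1 = abnormal, 0 = normal

# 80% becomes the validation set (parameter tuning),
# the remaining 20% becomes the final testing set.
X_val, X_test, y_val, y_test = train_test_split(
    X_heldout, y_heldout, test_size=0.2, random_state=0
)

print(len(X_val), len(X_test))
```

With these settings the two partitions have 140,272 and 35,069 rows, matching the instance counts quoted in the text.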
3.2 Evaluation Measures
Typically, accuracy is the measure used to evaluate classification techniques; however, in cases where classes are not balanced, accuracy might produce misleading results. To resolve this issue, in addition to accuracy, we evaluate and analyze the performance of classification methods using the recall, precision, and F_β measures [2, 22]. Formulas for the aforementioned evaluation measures are given below.

$$\text{Precision}(P) = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall}(R) = \frac{TP}{TP + FN} \tag{2}$$

$$F_\beta = (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \, \text{Precision} + \text{Recall}} \tag{3}$$

$$\text{Accuracy}(Acc) = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

Where:
True Positives (TP) is the number of instances which are predicted as abnormal and are actually abnormal.
False Positives (FP) is the number of instances which are predicted as abnormal but are actually normal.
True Negatives (TN) is the number of instances which are predicted as normal and are actually normal.
False Negatives (FN) is the number of instances which are predicted as normal but are actually abnormal.

3.3 Feature Engineering
Training of ANN and SVM models can be a time-consuming process, especially for larger datasets. Usually, it is recommended to have a large training set since it increases the accuracy of the trained model; however, it can also lead to the over-fitting problem, in which the model fits the training set perfectly but fails to perform well when tested on new unseen data [9]. One of the ways to avoid over-fitting is to use a neural network that is large enough, with a large number of hidden layers, or a complex kernel function in SVM [2]. A complex model can provide a satisfactory fit for the training as well as the testing dataset; however, it might take a significant amount of time to train or re-train the model. In contrast, intrusion detection requires quick responses, since the system is deployed and used in real-time scenarios. Thus, models used in intrusion detection systems need to have a relatively short training time since they need to be retrained to classify new types of attacks or adapt to the changing characteristics of previously seen attacks [23].

Approaches that attempt to solve the aforementioned issue generally use regularization and early stopping methods [23]. Selecting an optimal set of features can also be an effective way to reduce the training time. However, care has to be taken in finding the optimal set. Omitting important features of the dataset from the training set can lead to severely degraded performance of the model [2]. To address this problem, we now discuss the details of the feature engineering approaches used in this work. In addition to the feature categorization mentioned in Section 3.1, we experimented with different feature selection methods. These methods aim at selecting a subset of relevant features to train the classification algorithms. They also reduce the training time as they reduce the dimensionality of the features.

3.3.1 Support Vector Machines. To select an optimal set of features, we have used two feature selection algorithms which are commonly used in the literature: (1) univariate feature selection [16] and (2) Principal Component Analysis (PCA) [13]. Univariate feature selection examines each feature individually to evaluate its contribution to the classification. Based on the result of a univariate statistical test, such as the chi-square (χ²) test, it selects the best K features and removes the others. Principal Component Analysis (PCA) feature selection uses linear algebra to transform the dimensions of the dataset into a compressed form. Using the PCA method, one can identify the most important features (i.e., dimensions), which in turn reduces the training complexity significantly, especially for larger datasets [13].

3.3.2 Artificial Neural Networks. We further experimented with feature selection algorithms from the Scikit-learn library. We selected the chi-square (χ²) statistical method, which excludes all features except the k highest scoring ones. We compare the results of χ² with other feature sets in Section 4.2.

3.4 Parameter Tuning
Parameter tuning is another important aspect of machine learning techniques. The performance of ML techniques varies greatly with different training parameters, and hence it is necessary to observe the performance with different sets of parameters. For the Support Vector Machine (SVM) technique, we experimented with two SVM parameters: (1) the kernel function and (2) the penalty parameter C. As explained in Section 3.1, the kernel is a similarity function that computes the similarity between two objects. It is a configurable parameter that can be either a linear (e.g., Linear SVC) or a non-linear (e.g., sigmoid function) kernel. The penalty parameter C penalizes misclassified instances. It has to be set carefully, since assigning low values (i.e., a soft margin) may lead to misclassified instances, while assigning high values may lead to over-fitting.

ANN parameters that can be tuned to enhance the classification performance include the activation function, the optimization function, the number of layers, the number of neurons, the initial weights, and the learning rate, to name a few. We have varied the number of hidden layers and the number of neurons in each layer and have performed a sensitivity analysis on the classification performance [20]. Additionally, we have experimented with the activation function and the weight optimization function (known as the solver in the Python library).

In the next section, we describe the implementation details, present the results and perform the experimental evaluation.

4 RESULTS AND EVALUATION
We have implemented Support Vector Machines, Artificial Neural Networks, and the feature selection approaches, namely Principal Component Analysis and univariate feature selection, using the Python programming language with the Scikit-Learn (http://scikit-learn.org/) machine learning library. Scikit-Learn includes many implementations of different
machine learning algorithms that are open source. Below we provide the implementation details as well as present our results and analyze them.

4.1 Support Vector Machines
To train SVM models, we started with the feature categorization method. For each category, we trained the SVM model using different values of the penalty parameter C, ranging from 20 to 100 in steps of 20, and different kernel functions, that is, linear, polynomial and sigmoid functions. Table 1 shows the best accuracy achieved on the validation set per category and its corresponding SVM parameters. As shown in Figure 1, the general purpose category outperforms all other categories with an accuracy of 92%, obtained with a sigmoid kernel function and C = 20.

Table 1: SVM results on validation dataset per category. Best value per column is boldfaced.

Category    C_penalty  kernel   P     R     F1    Acc
Basic       20         sigmoid  0.46  0.68  0.55  0.68
Content     20         sigmoid  0.46  0.68  0.55  0.68
Time        20         sigmoid  0.77  0.76  0.77  0.76
General     20         sigmoid  0.92  0.92  0.92  0.92
Connection  80         linear   0.71  0.69  0.59  0.69

For the univariate feature selection method, we have trained the SVM model using different values of the parameter K, ranging from 2 to 28 in steps of 2 (where K refers to the number of top features selected). Table 2 shows the top five values of K that achieve the highest accuracy. As can be seen, the highest accuracy is obtained at K = 4.

Table 2: Optimal value of K in univariate feature selection method. Best value per column is boldfaced.

K   P     R     F1    Acc
4   0.79  0.79  0.79  0.78
12  0.82  0.77  0.78  0.77
10  0.79  0.74  0.75  0.74
20  0.72  0.73  0.72  0.73
2   0.70  0.66  0.67  0.66

After obtaining the optimal value of K, we re-trained the SVM model using this value (K = 4), this time with different SVM parameters. Table 2 shows how the trained SVM model performs on the validation set at different SVM parameters and K = 4. It can be seen that the SVM model achieves almost the same accuracy with the polynomial and sigmoid kernel functions at all values of C. Meanwhile, when the linear kernel is used, the SVM accuracy fluctuates over different values of C.

For the PCA feature selection approach, we trained the SVM model using different numbers of components (i.e., the number of dimensions after reduction), ranging from 2 to 9 in steps of 1. Table 3 shows the top five component sizes that achieve the highest accuracy. As can be seen, the highest accuracy is obtained using 8 components.

Table 3: Optimal value of components size in PCA feature selection method. Best value per column is boldfaced.

# Comp.  P     R     F1    Acc
8        0.70  0.68  0.69  0.6815
7        0.70  0.68  0.69  0.6815
9        0.70  0.68  0.69  0.6814
4        0.71  0.66  0.67  0.65
5        0.61  0.49  0.50  0.49

Finally, Table 4 shows the SVM prediction accuracy on the testing dataset for each feature selection method. The general purpose category outperforms all feature selection methods with an accuracy of 92%, with C = 20 and a sigmoid kernel function.

Table 4: SVM results on testing dataset. Best value per column is boldfaced.

Features              C-penalty  kernel   P     R     F1    Acc
Basic                 20         sigmoid  0.46  0.68  0.55  0.68
Content               20         sigmoid  0.46  0.68  0.55  0.68
Time                  20         sigmoid  0.77  0.76  0.77  0.76
General               20         sigmoid  0.92  0.92  0.91  0.92
Connection            80         linear   0.71  0.69  0.59  0.69
univariate (K = 4)    20         linear   0.55  0.68  0.55  0.68
PCA (components = 8)  100        linear   0.70  0.71  0.70  0.71

4.2 Artificial Neural Networks
In this subsection, we describe the Artificial Neural Network model and analyze the results obtained with it. The ANN model is developed with the help of a built-in model provided by the Scikit-learn machine learning library (scikit-learn.org/stable/index.html). We first explain the experimental setup and then document the results obtained using the ANN model.

4.2.1 ANN Model preparation: The Artificial Neural Network model has two development phases. The first phase in ANN development is parameter initialization, in which the weights and the bias values of the network layers are initialized. The weights are randomly generated [10], whereas the biases are initialized to zeros. Let W_r be a randomly generated matrix of dimensions N × M, where N and M are the numbers of neurons in the current layer and in the previous layer, respectively. Then, the weight of a layer L is initialized by the following equation:
Figure 1: The accuracy of SVM trained Models against different feature sets with different SVM parameters.

$$W^{[L]} = W_r \times \sqrt{\frac{2}{N}} \tag{5}$$

Here, $\sqrt{2/N}$ is a multiplication factor recommended by He et al. [10] for better weight initialization. The next step is the forward propagation phase. Each training example X from the input layer goes through a transformation at each neuron it visits on each hidden layer. At each neuron, two steps are performed. First, the modified value of the input X after applying the weights W and biases b of a layer l is calculated. It is represented by the variable Z and is given by equation 6. Second, an activation function is applied to generate the output at each hidden layer. A simple ReLU activation function is given by equation 7.

$$Z = W^{(l)} X + b^{(l)} \tag{6}$$

$$\text{relu}(Z) = \max(0, Z) \tag{7}$$

Here we have used the Rectified Linear Unit (ReLU) function, which is zero when Z < 0 and linear with slope 1 when Z > 0. The calculations in equations 6 and 7 are repeated at each hidden layer. For the output layer we use the sigmoid function as the activation function, since our aim is to perform binary classification and the sigmoid function is apt for the same [19].

$$\text{sigmoid}(Z) = \frac{1}{1 + e^{-Z}} \tag{8}$$
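The initialization and forward pass described by equations 5 to 8 can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation used in this work, which relies on the built-in Scikit-learn model. The layer sizes, the random seed and the helper names he_init and forward are assumptions made for the example.

```python
import numpy as np

def he_init(n_current, n_previous, rng):
    """Equation 5: a random N x M matrix W_r scaled by sqrt(2 / N),
    with N taken as the current layer size, as defined in the text."""
    w_r = rng.standard_normal((n_current, n_previous))
    return w_r * np.sqrt(2.0 / n_current)

def relu(z):
    """Equation 7: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Equation 8: squashes the output into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Apply equations 6 and 7 at each hidden layer, then the sigmoid
    of equation 8 at the output layer."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = relu(w @ a + b)
    return sigmoid(weights[-1] @ a + biases[-1])

# Hypothetical architecture: 4 inputs, two hidden layers, 1 output neuron.
rng = np.random.default_rng(0)
layer_sizes = [4, 5, 3, 1]
weights = [he_init(n, m, rng) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros((n, 1)) for n in layer_sizes[1:]]  # biases start at zero

x = rng.standard_normal((4, 1))  # one training example as a column vector
a_out = forward(x, weights, biases)
print(a_out.shape)
```

The single output value lies strictly between 0 and 1 and can be read as the predicted probability of the abnormal class.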
Table 6: ANN Training and Testing Results per Feature Category. Best value per column is boldfaced.

         Training Results              Testing Results
Cat.     P      R      F1     Acc.     P      R      F1     Acc.
All      0.86   0.86   0.86   0.86     0.86   0.86   0.86   0.86
Cont.    0.78   0.74   0.75   0.74     0.78   0.74   0.75   0.74
Time     0.88   0.85   0.84   0.85     0.87   0.85   0.84   0.85
Gen.     0.93   0.92   0.92   0.92     0.93   0.92   0.92   0.91
Conn.    0.88   0.88   0.88   0.88     0.87   0.88   0.87   0.88
χ²       0.87   0.85   0.84   0.85     0.88   0.87   0.86   0.85

As Table 6 shows, the general purpose category achieved the best accuracy (92% training accuracy and 91% testing accuracy) compared with the other categories.

At the end of the forward propagation phase, we compute the cost to evaluate the model. We have used the cross-entropy loss function [4], which has many practical applications in which very small probabilities need to be calculated quickly. It is given by equation 9.

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{(i)}\right) \right] \qquad (9)$$

In equation 9, m denotes the total number of training samples, $a^{(i)}$ denotes the prediction for training example i, and $y^{(i)}$ is the true label of training example i. It is expected that the cost decreases in each iteration. The next step is the backward propagation, in which the weights and the biases are updated based on the prediction errors calculated in the earlier stage. The weights are updated using the gradient descent algorithm, which tries to optimize the training parameters so that the errors are minimized.
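The cost computation and the gradient descent update described above can be illustrated as follows. This is a minimal sketch, not the authors' code. It assumes the standard binary cross-entropy form for the cost and, to stay short, updates a single sigmoid output unit rather than back-propagating through hidden layers. The learning rate, the synthetic data and the function names are hypothetical.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def cross_entropy(a, y, eps=1e-12):
    """Mean binary cross-entropy between predictions a and labels y."""
    a = np.clip(a, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(y * np.log(a) + (1.0 - y) * np.log(1.0 - a)))

def gradient_descent_step(W, b, X, y, learning_rate=0.1):
    """One update of a single sigmoid unit; X is (n_features, m), y is (1, m)."""
    m = X.shape[1]
    a = sigmoid(W @ X + b)
    dZ = a - y                        # gradient of the cost with respect to Z
    dW = (dZ @ X.T) / m
    db = float(np.sum(dZ)) / m
    return W - learning_rate * dW, b - learning_rate * db

# Hypothetical data: the label depends only on the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50))
y = (X[0:1] > 0).astype(float)

W, b = np.zeros((1, 3)), 0.0
costs = []
for _ in range(100):
    costs.append(cross_entropy(sigmoid(W @ X + b), y))
    W, b = gradient_descent_step(W, b, X, y)
```

Plotting `costs` against the iteration index is a simple way to check the expectation stated above that the cost decreases as training proceeds.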

Table 5: Artificial Neural Networks Training Results. Best value per column is boldfaced.

Opt.    Actv.      # layers        # nodes                Acc
lbfgs   logistic   5               20                     0.81
lbfgs   tanh       [5, 7]          [15, 20, 25]           0.82
lbfgs   relu       1               25                     0.75
SGD     Sigmoid    5               25                     0.83
SGD     Htan       [1, 3, 5, 7]    [15, 20, 25]           0.82
SGD     Relu       [1, 3, 5, 7]    [5, 10, 15, 20, 25]    0.68
Adam    Sigmoid    [1, 3]          25                     0.76
Adam    Htan       1               15                     0.78
Adam    Relu       5               25                     0.86

4.2.2 ANN Model Results: The above steps are repeated for some finite number of iterations. A higher number of iterations means better tuning of the weights and lower errors. However, more iterations might result in over-fitting of the model. In this work, we have used regularization techniques to avoid over-fitting. We have chosen four parameters to tune the ANN model: (1) weight optimization functions: stochastic gradient descent (SGD), stochastic optimization (Adam) and quasi-Newton methods (lbfgs); (2) activation functions: logistic sigmoid function (Sigmoid), hyperbolic tan function (Htan) and rectified linear unit function (Relu); (3) the number of hidden layers (h_layers), chosen from the range 1 to 7 with a step of size 2; (4) the number of neurons in hidden layers, chosen from the range 15 to 25 with a step of size 5. Table 5 displays the accuracy of different ANN models for different combinations of hidden layers, neurons and activation functions. It can be clearly seen that the SGD optimization function is not affected by the number of neurons and layers. The combination of the Adam optimizer and the ReLU activation function with 5 layers and 25 neurons in each layer outperforms all other combinations when using the whole feature set. We then conducted a sensitivity analysis over the parameters of the ANN in order to select the best set of parameters. Figure 2 displays the results of the sensitivity analysis of different feature sets with the ANN model. We have presented the results of training and testing for each feature category in Table 6.

Figure 2: Parameters sensitivity for lbfgs optimization function

5 COMPARISON AND CONCLUSIONS
In this work, we studied and demonstrated the effectiveness of machine learning techniques, namely Support Vector Machine (SVM) and Artificial Neural Networks (ANN), to detect intrusions in the cloud environment. We compared the performance of different implementations of feature selection and parameter tuning schemes. In the case of SVM, we chose feature categorization, univariate and PCA feature selection methods. In addition, we tuned two SVM parameters: the kernel function and the penalty parameter C. The general purpose feature category outperforms all other feature selection methods with an accuracy of 90%, with C set to 20 and the sigmoid kernel function.

With our results, we observed that ANN had a slightly better performance compared with SVM. The best SVM model achieved an accuracy of 0.68, whereas ANN achieved a higher accuracy of 0.72. Moreover, ANN had much better precision than SVM: SVM scored a precision of 0.46, whereas the ANN precision value was 0.78 for the same feature category. Higher precision and recall values using ANN imply that ANN has a better chance of detecting anomalous traffic compared to SVM. Similarly, the Connection Features model in ANN outperforms the SVM model with a difference of 0.2. In addition, the combination of ANN and χ² proved to be better than SVM with PCA.

Table 7: Performance of our proposed method compared to other methods

Work                         FS Method         #Features   Classifier   Acc.
Khammassi, Chaouki (2017)    GA-LR             20          DT           0.81
Proposed                     General purpose   5           SVM          0.92
Proposed                     General purpose   5           ANN          0.91

The best accuracy is, however, achieved with the general features set, for both ANN and SVM. SVM and ANN have been able to achieve an anomaly detection accuracy of 91% and 92% respectively, which is considerably higher than that achieved in the literature. Table 7 illustrates the performance of our proposed

approach compared with the work proposed in [14]. Similar to our work, the authors in [14] have used the UNSW-NB-15 dataset for evaluation. The results have shown that the classification accuracy with our proposed schemes is improved significantly (from 81% to 92%) by selecting the general purpose features only (i.e., five features), compared against the accuracy of the work proposed in [14] with 20 features. Thus, we can conclude that the main goal of maximizing the accuracy of classification while minimizing the feature dimensionality can be achieved by using the proposed ANN and SVM models while considering the general purpose features for classification. Also, reducing the feature dimensionality by the proposed approach can reduce the training time and complexity significantly, without compromising the accuracy of anomaly detection.

As future work, the authors plan to perform the sensitivity analysis of the problem, especially by implementing the cross validation technique in the evaluation. Also, the authors aim to train multi-class ML models to predict the exact attack type for finer classification.

6 ACKNOWLEDGEMENT
This publication was made possible by NPRP award [NPRP 8-634-1-131] from the Qatar National Research Fund (a member of The Qatar Foundation). Also, parts of this work have been funded by the Knowledge Foundation, Sweden, through the profile HITS. The authors would also like to thank Ms. Zeineb Safi and Ms. Reem Suwaileh for their contributions in the implementation of the algorithms. The statements made herein are solely the responsibility of the author[s].

REFERENCES
[1] Deval Bhamare, Aiman Erbad, Raj Jain, Maede Zolanvari, and Mohammed Samaka. Efficient virtual network function placement strategies for cloud radio access networks. Computer Communications, 127:50–60, 2018.
[2] Deval Bhamare, Tara Salman, Mohammed Samaka, Aiman Erbad, and Raj Jain. Feasibility of supervised machine learning for cloud security. In Information Science and Security (ICISS), 2016 International Conference on, pages 1–5. IEEE, 2016.
[3] Deval Bhamare, Mohammed Samaka, Aiman Erbad, Raj Jain, Lav Gupta, and H Anthony Chan. Multi-objective scheduling of micro-services for optimal service function chains. In 2017 IEEE International Conference on Communications (ICC), pages 1–6. IEEE, 2017.
[4] Pieter-Tjerk de Boer, Dirk P. Kroese, Shie Mannor, and Reuven Y. Rubinstein. A Tutorial on the Cross-Entropy Method. Annals of Operations Research, 134(1):19–67, February 2005.
[5] Anna L Buczak and Erhan Guven. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Communications Surveys & Tutorials, 18(2):1153–1176, 2016.
[6] James Cannady. Artificial neural networks for misuse detection. In National information systems security conference, pages 368–81, 1998.
[7] Carlos A Catania and Carlos García Garino. Automatic network intrusion detection: Current techniques and open issues. Computers & Electrical Engineering, 38(5):1062–1072, 2012.
[8] KDD Cup. Dataset. Available at the following website: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, 72, 1999.
[9] Jiawei Han, Jian Pei, and Micheline Kamber. Data mining: concepts and techniques. Elsevier, 2011.
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
[11] Elike Hodo, Xavier Bellekens, Andrew Hamilton, Pierre-Louis Dubouilh, Ephraim Iorkyase, Christos Tachtatzis, and Robert Atkinson. Threat analysis of iot networks using artificial neural network intrusion detection system. In Networks, Computers and Communications (ISNCC), 2016 International Symposium on, pages 1–6. IEEE, 2016.
[12] Shi-Jinn Horng, Ming-Yang Su, Yuan-Hsin Chen, Tzong-Wann Kao, Rong-Jian Chen, Jui-Lin Lai, and Citra Dwi Perkasa. A novel intrusion detection system based on hierarchical clustering and support vector machines. Expert Systems with Applications, 38(1):306–313, 2011.
[13] Wenjie Hu, Yihua Liao, and V Rao Vemuri. Robust support vector machines for anomaly detection in computer security. In ICMLA, pages 168–174, 2003.
[14] Chaouki Khammassi and Saoussen Krichen. A ga-lr wrapper approach for feature selection in network intrusion detection. Computers & Security, 2017.
[15] Fangjun Kuang, Siyang Zhang, Zhong Jin, and Weihong Xu. A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection. Soft Computing, 19(5):1187–1199, May 2015.
[16] Yinhui Li, Jingbo Xia, Silan Zhang, Jiakai Yan, Xiaochuan Ai, and Kuobin Dai. An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Systems with Applications, 39(1):424–430, 2012.
[17] Richard P Lippmann and Robert K Cunningham. Improving intrusion detection performance using keyword selection and neural networks. Computer Networks, 34(4):597–603, 2000.
[18] Preeti Mishra, Emmanuel S. Pilli, Vijay Varadharajan, and Udaya Tupakula. Intrusion detection techniques in cloud environment: A survey. Journal of Network and Computer Applications, 77(Supplement C):18–47, 2017.
[19] Chirag Modi, Dhiren Patel, Bhavesh Borisaniya, Hiren Patel, Avi Patel, and Muttukrishnan Rajarajan. A survey of intrusion detection techniques in cloud. Journal of Network and Computer Applications, 36(1):42–57, 2013.
[20] Mehdi Moradi and Mohammad Zulkernine. A neural network based system for intrusion detection and classification of attacks. In Proceedings of the IEEE International Conference on Advances in Intelligent Systems-Theory and Applications, pages 15–18, 2004.
[21] Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In Military Communications and Information Systems Conference (MilCIS), 2015, pages 1–6. IEEE, 2015.
[22] Tara Salman, Deval Bhamare, Aiman Erbad, Raj Jain, and Mohammed Samaka. Machine learning for anomaly detection and categorization in multi-cloud environments. In Cyber Security and Cloud Computing (CSCloud), 2017 IEEE 4th International Conference on, pages 97–103. IEEE, 2017.
[23] Naeem Seliya and Taghi M. Khoshgoftaar. Active learning with neural networks for intrusion detection. pages 49–54, 2010.
[24] Basant Subba, Santosh Biswas, and Sushanta Karmakar. A neural network based system for intrusion detection and attack classification. In Communication (NCC), 2016 Twenty Second National Conference on, pages 1–6. IEEE, 2016.
[25] Cynthia Wagner, Jérôme François, Thomas Engel, et al. Machine learning approach for ip-flow record anomaly detection. In International Conference on Research in Networking, pages 28–39. Springer, 2011.
