BACHELOR OF TECHNOLOGY
IN
P. VENKATA PARIMALA
(226D5A0515)
Professor
CERTIFICATE
This is to certify that the Major Project work entitled “NEXT GEN INTRUSION DETECTION
USING DL” submitted by NAVEEN SABBISETTI (216D1A0522), BHAGYA LAKSHMI
THUMATI (216D1A0525), VENKATA PARIMALA PIRUPALLI (226D5A0515) in the
partial fulfillment of the requirements for the award of degree in “BACHELOR OF
TECHNOLOGY” in Computer Science and Engineering is a Bonafide record of the work carried
out under my guidance and supervision AT SANKETIKA INSTITUTE OF TECHNOLOGY
AND MANAGEMENT during the academic year 2021-2025.
External Examiner
DECLARATION
We hereby declare that the project work titled “Next Gen Intrusion Detection Using DL”
submitted to SANKETIKA INSTITUTE OF TECHNOLOGY AND MANAGEMENT is a
record of an original work done by Naveen Sabbisetti (216D1A0522), T. Bhagya Lakshmi
(216D1A0525), P. Venkata Parimala (226D5A0515) under the esteemed guidance of Dr.
K.N.S LAKSHMI Professor. This project work is submitted in the partial fulfilment of the
requirements for the award of the degree Bachelor of Technology in Computer Science &
Systems Engineering. To the best of our knowledge, this project is entirely our own work and has
not been submitted to any other university for the award of a degree.
With great solemnity and sincerity, we express our deepest sense of gratitude and pay our sincere
thanks to our guide Dr. K.N.S Lakshmi, Professor, Department of Computer Science and
Engineering, who evinced keen interest in our efforts and provided her valuable guidance
throughout our project work.
We thank our project coordinator, Ms. J. Kavitha, Assistant Professor, who supported us in a
number of ways and helped us complete our project work properly.
We thank Dr. K.N.S Lakshmi, Professor and Head of the Department of Computer Science &
Engineering, who helped us complete our project work diligently.
We express our gratitude to our principal, Dr. T.V Rama Krishna, for his kind attention and valuable
guidance throughout this course in carrying out the project.
We wish to express our gratitude to our Management Members, who supported us by providing good
lab facilities.
We are thankful to all staff members of the Department of Computer Science & Engineering for
helping us, directly or indirectly, to complete this project work with their valuable suggestions.
Above all, we gratefully acknowledge and express our thanks to our parents, who have played a
vital role in the success of this project.
Title Page No
2.1 Comparative Analysis 10
4.1 Comparative Model Evaluation 21
8.1 ANN Model Confusion Matrix 63
LIST OF FIGURES
Title Page No
5.1 Use Case Diagram 26
5.2 Use Case Diagram 27
5.3 Data Flow Diagram 30
6.1 ANN Training Confusion Matrix 45
6.2 ANN Testing Confusion Matrix 45
6.3 Training and Testing Accuracy of Various Models 52
6.4 Home Screen 54
6.5 About Screen 55
6.6 Prediction Screen 55
6.7 Jupyter Notebook Screen 56
6.8 Python app.py Screen 56
6.9 Output Screen Normal 57
6.10 Output Screen Normal 57
6.11 Output Screen Intrusion 58
6.12 Output Screen Intrusion 58
ABSTRACT
The Next-Gen Intrusion Detection System (NIDS) leverages the capabilities of deep
learning to address the limitations inherent in traditional hybrid approaches. These existing
systems typically rely on a combination of the Differential Harmony Search Algorithm
(DLHA) and Support Vector Machine (SVM). While these traditional systems demonstrate
high accuracy in detecting network intrusions, they often struggle with scalability and are
less effective in identifying complex, non-linear patterns within large, dynamic datasets.
The proposed NIDS improves upon these constraints by incorporating an Artificial Neural
Network (ANN) framework, which is inherently suited for learning complex, non-linear
relationships within data. This deep learning-based approach enhances the system's ability to
detect evolving intrusion patterns and increases both its scalability and operational efficiency.
By removing the dependence on DLHA-SVM and adopting an ANN, the new system
provides a more robust, adaptive, and efficient solution for modern network security
challenges. Additionally, this approach reduces the need for extensive feature engineering,
which is a key challenge in traditional systems, offering a more streamlined process.
Key words:
NIDS :- Next-Gen Intrusion Detection System
DL :- Deep Learning
ANN :- Artificial Neural Network
DLHA:- Differential Harmony Search Algorithm
SVM :- Support Vector Machine
NIDS :- Network Intrusion Detection System
Scalability
Non-linear Pattern Recognition
CHAPTER 1
INTRODUCTION
1.1. Introduction:
In the era of rapid technological advancements, the security of networked systems has become a
paramount concern. As organizations increasingly rely on digital infrastructure, the need for
effective protection against malicious activities such as cyberattacks and unauthorized access has
grown significantly. Network Intrusion Detection Systems (NIDS) serve as a critical component
of network security, designed to monitor network traffic and identify potential threats or intrusions.
These systems are essential for mitigating risks associated with data breaches, system
compromises, and other security vulnerabilities that could harm organizations and their operations.
Traditional NIDS approaches, such as signature-based detection and anomaly-based
detection, have served as the cornerstone of network security. However, these methods often face
limitations, particularly when dealing with large-scale and complex datasets, as well as dynamic
and sophisticated attack vectors. While signature-based systems are effective in detecting known
attacks, they struggle to identify new or evolving threats. On the other hand, anomaly-based
systems are capable of detecting novel intrusions but may generate high false positive rates,
impacting the overall efficiency of the system.
In response to these challenges, recent advancements in machine learning and deep learning have
introduced new possibilities for enhancing the performance and scalability of NIDS. Traditional
hybrid models, such as those utilizing the Differential Harmony Search Algorithm (DLHA) for
feature selection combined with Support Vector Machine (SVM) for classification, have shown
promise in offering high accuracy. However, these approaches are constrained by their inability to
scale effectively and their limited capacity to adapt to the nonlinear and dynamic patterns often
seen in modern network intrusions.
The Next-Gen Intrusion Detection System (NIDS) proposed in this project aims to address these
limitations by replacing the DLHA-SVM framework with a more advanced Artificial Neural
Network (ANN). ANNs, which are capable of learning complex, non-linear relationships within
data, provide a robust solution for detecting evolving intrusion patterns. This deep learning-based
approach is not only more scalable but also enhances the system's ability to adapt to new and
sophisticated attack techniques. The proposed system also reduces the need for extensive feature
engineering, simplifying the overall process while improving detection accuracy. By integrating
deep learning methodologies, the proposed NIDS aims to offer a more robust, adaptive, and
efficient solution for modern network security, enabling organizations to better safeguard their
digital infrastructure against increasingly sophisticated cyber threats.
1.2. Problem Statement:
With the exponential increase in cyber threats, network security has become a critical concern for
organizations worldwide. Traditional Network Intrusion Detection Systems (NIDS) typically
employ hybrid models such as the Differential Harmony Search Algorithm (DLHA) combined
with Support Vector Machines (SVM). While these systems demonstrate reasonable accuracy, they
suffer from major limitations, including scalability challenges, high computational costs, and
inefficiencies in detecting sophisticated and non-linear attack patterns. Additionally, reliance on
extensive feature engineering increases processing overhead and reduces real-time adaptability.
The proposed research aims to enhance intrusion detection by implementing an Artificial Neural
Network (ANN)-based NIDS. ANN models can automatically learn complex attack patterns from
raw data without requiring extensive manual feature selection, making them more scalable and
adaptable. By leveraging deep learning techniques, this study intends to develop a robust, high-
performance NIDS capable of detecting evolving threats with greater accuracy, efficiency, and
responsiveness. This research will contribute towards strengthening cybersecurity by providing an
advanced, intelligent, and self-learning intrusion detection system that outperforms conventional
approaches.
Real-Time Detection Implementation: Developing a system capable of processing
network traffic in real time, ensuring immediate detection of potential threats and
minimizing false positives.
Performance Benchmarking: Measuring system efficiency through key
performance indicators such as detection accuracy, precision, recall, F1-score, and
confusion matrix analysis.
Deployment and Scalability Considerations: Ensuring seamless integration into
high-traffic network environments while maintaining efficient computational performance.
This hypothesis will be tested through comprehensive experimental evaluation, utilizing diverse
datasets and statistical benchmarking to compare the ANN-based NIDS with traditional models.
1.5. Objectives:
The primary objectives of this research are:
Minimize Feature Engineering Dependency: Enable the system to learn directly
from raw data, reducing the need for extensive manual feature selection.
Ensure System Scalability: Design an intrusion detection system that efficiently
processes large datasets without performance degradation.
Implement Real-Time Detection and Response: Develop a low-latency alert
mechanism to ensure timely identification of cyber threats.
Performance Evaluation and Benchmarking: Conduct a comparative analysis
of the ANN-based NIDS against existing machine learning models using standard
performance metrics.
Introduction
Literature Review
Proposed Methodology
Implementation
Selection and Preparation of Benchmark Datasets (NSL-KDD, CICIDS 2017)
Performance Evaluation
CHAPTER 2
LITERATURE SURVEY
2.1. Introduction
Intrusion Detection Systems (IDS) are essential for safeguarding network security, especially in
the face of continuously evolving cyber threats. The increasing sophistication of cyber-attacks
necessitates the development of more advanced detection techniques. Recent years have witnessed
significant progress in IDS through innovations in deep learning, feature selection, and real-time
adaptability. These advancements have contributed to improved accuracy, efficiency, and
responsiveness in detecting and mitigating security threats. This literature review provides an in-
depth analysis of cutting-edge methodologies utilized in IDS, based on research conducted
between 2021 and 2025.
Thaseen et al. (2021) introduced an integrated intrusion detection system that employs
correlation-based attribute selection in conjunction with an artificial neural network (ANN).
The proposed approach focuses on refining the selection of relevant attributes by
eliminating redundancies, which enhances classification accuracy and system efficiency.
By leveraging ANN, the IDS is able to identify patterns in network traffic indicative of
potential security threats. The study demonstrated that this approach effectively improves
anomaly detection capabilities, yielding superior performance metrics such as precision
and recall compared to conventional machine learning techniques. This method highlights
the importance of feature selection in optimizing IDS efficiency while minimizing
computational complexity.
capability of SSID to learn and evolve without human intervention, making it an ideal
solution for rapidly changing network environments. Furthermore, the research highlights
the ability of SSID to improve detection rates while reducing false positives, ensuring a
more robust and scalable IDS solution suitable for modern cybersecurity landscapes.
Kaushik and Bhardwaj (2023) explored an advanced intelligent IDS tailored for Industry
4.0 environments. Their research introduced an optimized feature selection strategy
designed to handle large volumes of network traffic data efficiently. By leveraging
advanced machine learning techniques, the proposed IDS demonstrated enhanced
capability in reducing dimensionality while preserving critical information necessary for
intrusion detection. The study employed the CICIDS2018 dataset, achieving an impressive
accuracy of 99.26%. The results suggest that effective feature selection not only improves
IDS accuracy but also contributes to reduced computational overhead, making the system
more scalable and adaptable for real-world industrial applications. The research
underscores the growing need for intelligent IDS solutions capable of adapting to the
increasing complexity of cyber threats targeting Industry 4.0 infrastructures.
Table 2.1 Comparative Analysis
Despite significant advancements, IDS still face various challenges that hinder their efficiency and
deployment. Some of the key challenges include:
High False Positive Rates: Many IDS solutions generate a high number of false alarms,
leading to increased overhead and reduced operational efficiency.
Adaptability to New Threats: Cyber threats evolve rapidly, and many traditional IDS
struggle to adapt in real-time.
Computational Overhead: Deep learning-based IDS often require significant
computational resources, making them difficult to deploy in resource-constrained
environments.
Data Imbalance: Many datasets used in IDS training are imbalanced, leading to biased
detection performance.
2.4. Conclusion
The reviewed research studies collectively illustrate the rapid evolution of intrusion detection
methodologies. The integration of deep learning, feature selection, and self-supervised techniques
has significantly enhanced IDS performance in terms of accuracy, adaptability, and computational
efficiency. The studies demonstrate how various approaches, such as artificial neural networks,
autoencoder-based learning, and optimized feature selection, contribute to improving intrusion
detection capabilities. Moving forward, research in this field should continue to focus on
developing real-time adaptive IDS solutions that can effectively counter increasingly sophisticated
cyber threats. The adoption of hybrid models and real-time learning frameworks is expected to
play a crucial role in strengthening cybersecurity defenses in the coming years.
CHAPTER 3
SYSTEM DEVELOPMENT
System Development:
The development of the Next-Gen Intrusion Detection System (NIDS) focuses on improving the
accuracy, scalability, and adaptability of traditional intrusion detection approaches by integrating
advanced deep learning techniques, specifically Artificial Neural Networks (ANNs). The system
development process involves several key stages, including data acquisition, preprocessing, model
training, evaluation, and deployment. This section provides a detailed explanation of the steps
followed in the development of the proposed NIDS.
Preprocessing is a critical step to ensure that the data is suitable for use by machine learning
algorithms. The key preprocessing tasks include:
Feature Selection and Encoding: This involves identifying the most relevant features
for classification. Although feature engineering is minimized in the proposed deep learning
approach, some preprocessing steps such as encoding categorical features and normalization
of continuous values may still be necessary.
Data Splitting: The dataset is split into training, validation, and testing subsets. Typically,
a 70-15-15 split is used, with 70% of the data used for training, 15% for validation, and the
remaining 15% for testing.
Labeling: The data is labelled as either normal or intrusive, where "intrusive" encompasses
various attack types, such as DoS (Denial of Service), DDoS (Distributed Denial of Service),
SQL injection, etc.
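The preprocessing tasks listed above can be illustrated with a minimal sketch in plain Python. It is not the project's actual pipeline: the helper names (`one_hot`, `min_max`, `split_70_15_15`) and the toy records are assumptions introduced only for illustration, and integer arithmetic is used for the 70-15-15 cut so that the subset sizes are exact.

```python
import random

def one_hot(values):
    """Encode a categorical column as one-hot lists (categories sorted for stability)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1.0 if index[v] == i else 0.0 for i in range(len(categories))]
            for v in values]

def min_max(values):
    """Normalize a continuous column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def split_70_15_15(rows, seed=42):
    """Shuffle a copy of rows and cut it into training/validation/testing subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Toy records (hypothetical values): protocol type, source bytes, label
protocols = ["tcp", "udp", "tcp", "icmp"] * 5
src_bytes = [float(i * 100) for i in range(20)]
labels = [0 if i % 2 == 0 else 1 for i in range(20)]   # 0 = normal, 1 = intrusive

encoded = one_hot(protocols)
scaled = min_max(src_bytes)
rows = [enc + [s, y] for enc, s, y in zip(encoded, scaled, labels)]
train_rows, val_rows, test_rows = split_70_15_15(rows)
print(len(train_rows), len(val_rows), len(test_rows))   # 14 3 3
```

In practice the normalization statistics would be computed on the training subset only and then reused for the validation and testing subsets, so that no information from held-out data leaks into training.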
3.2. Model Development and Architecture:
The Artificial Neural Network (ANN) is the core of the proposed NIDS, offering significant
improvements in scalability and adaptability over traditional models. The ANN model is designed
to learn complex, non-linear patterns in network traffic data, allowing it to detect both known and
unknown intrusion attempts. The architecture of the model is designed as follows:
Input Layer: The input layer receives the preprocessed features of the network traffic
data. Each feature corresponds to a specific characteristic of the network packet or flow.
Hidden Layers: Several hidden layers are introduced to model the complex relationships
between the input features. Each hidden layer consists of multiple neurons that apply
activation functions (e.g., ReLU) to learn non-linear patterns. The number of hidden layers
and neurons per layer is determined through experimentation.
Output Layer: The output layer consists of a single neuron representing the classification
outcome, with an activation function such as sigmoid for binary classification (normal vs.
intrusive).
The backpropagation algorithm is used to optimize the weights of the network during
training. The loss function, typically binary cross-entropy, is minimized during training to
improve the network's prediction accuracy. The optimizer, such as Adam or SGD, adjusts
the weights in the network to minimize the error in predictions.
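As an illustration of the architecture described above, the following sketch runs one forward pass through ReLU hidden layers and a single sigmoid output neuron, then evaluates the binary cross-entropy loss. It is written in plain Python for readability and is not the project's Keras model; the layer sizes, the random initialization, and the helper names are assumptions.

```python
import math
import random

def relu(x):
    return x if x > 0.0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one weight row per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, params):
    """Input -> ReLU hidden layers -> single sigmoid output neuron."""
    h = x
    for weights, biases in params[:-1]:
        h = dense(h, weights, biases, relu)
    out_w, out_b = params[-1]
    return dense(h, out_w, out_b, sigmoid)[0]

def binary_cross_entropy(y_true, p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
layer_sizes = [4, 8, 4, 1]   # hypothetical: 4 features, two hidden layers, 1 output
params = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

p = forward([0.1, 0.0, 1.0, 0.5], params)    # probability-like score in (0, 1)
print(p, binary_cross_entropy(1, p))
```

Backpropagation would compute the gradient of this loss with respect to every weight, and an optimizer such as Adam or SGD would then apply the update.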
Training Steps:
Epochs and Batching: The data is divided into smaller batches, and the model is
trained over multiple epochs to update its weights incrementally.
Validation: During training, the model's performance is evaluated on the validation
dataset to fine-tune hyperparameters and prevent overfitting. Regularization techniques like
dropout or L2 regularization are applied if necessary.
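The interplay of epochs, mini-batches, validation, and L2 regularization can be sketched as follows. To keep the example short, a single sigmoid neuron stands in for the full network; the toy data, learning rate, batch size, and regularization strength are assumptions chosen for illustration, not the project's settings.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def mean_loss(data, w, b):
    """Average binary cross-entropy over (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = min(max(predict(w, b, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def train(train_data, val_data, epochs=50, batch_size=8, lr=0.5, l2=1e-3, seed=0):
    rng = random.Random(seed)
    data = list(train_data)                  # copy so the caller's list is not reordered
    w, b = [0.0] * len(data[0][0]), 0.0
    history = []
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w, grad_b = [0.0] * len(w), 0.0
            for x, y in batch:
                err = predict(w, b, x) - y   # d(loss)/d(pre-activation)
                grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
                grad_b += err
            # Step on the batch-mean gradient, with L2 weight decay on w
            w = [wi - lr * (g / len(batch) + l2 * wi) for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
        history.append(mean_loss(val_data, w, b))   # validation loss after each epoch
    return w, b, history

# Toy one-feature data (hypothetical): label is 1 when the feature exceeds 0.5
rng = random.Random(1)
samples = [[rng.random()] for _ in range(120)]
dataset = [(x, 1 if x[0] > 0.5 else 0) for x in samples]
w, b, history = train(dataset[:90], dataset[90:])
print(history[0], history[-1])
```

Tracking the validation loss per epoch is what makes hyperparameter tuning and early stopping possible: a validation loss that starts rising while the training loss keeps falling is the usual sign of overfitting.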
Evaluation Metrics: After training, the model is evaluated using the test dataset. Key
evaluation metrics include:
Precision, Recall, and F1-score: These metrics are important for assessing the model's
performance, especially in imbalanced datasets where one class (e.g., normal traffic) might
dominate.
The model’s performance is compared against traditional machine learning models, such as
SVM and Random Forest, to benchmark its effectiveness in detecting network intrusions.
Real-Time Prediction: Users can upload a CSV file containing network traffic data,
and the trained model will classify each entry as either normal or intrusive. The system
provides immediate feedback on the classification results.
Buzzer Alert: In the event of an intrusion detection, a buzzer sound is triggered to alert
the user, ensuring quick action can be taken to mitigate any threats.
Visualization: The system includes a bar plot or other visualizations to compare the
training and testing accuracies of different machine learning models, providing insight into
the model's performance.
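A minimal sketch of the upload-and-alert flow described above is given below. The function `toy_predict` is a hypothetical stand-in for the trained model, the column names in the sample are only an excerpt, and the terminal bell character is an assumed substitute for the buzzer sound.

```python
import csv
import io

def classify_csv(csv_text, predict, alert):
    """Classify each CSV row with predict(features); call alert(i, features) on intrusions."""
    results = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        features = {name: float(value) for name, value in row.items()}
        label = "intrusive" if predict(features) >= 0.5 else "normal"
        if label == "intrusive":
            alert(i, features)
        results.append(label)
    return results

def toy_predict(features):
    # Hypothetical rule standing in for the trained model's probability output
    return 0.9 if features["serror_rate"] > 0.5 else 0.1

def buzzer(i, features):
    # "\a" is the ASCII bell; a deployed system would call an audio or notification API
    print(f"\aALERT: record {i} classified as intrusive")

uploaded = "duration,src_bytes,serror_rate\n0,491,0.0\n0,0,1.0\n2,146,0.0\n"
print(classify_csv(uploaded, toy_predict, buzzer))
# ['normal', 'intrusive', 'normal']
```

Passing the classifier and the alert handler as arguments keeps the file-handling logic independent of the model, so the same loop can be reused with any trained classifier.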
3.5. Scalability and Performance Optimization:
To ensure that the system is scalable and performs efficiently in real-time environments, several
optimization techniques are incorporated:
Accuracy and Precision: How effectively the system can detect both known and
novel intrusions.
Real-Time Performance: The system's ability to classify network traffic in real-time
with low latency.
Scalability: The system's ability to handle large volumes of network traffic without
degradation in performance.
Conclusion of System Development:
The Next-Gen Intrusion Detection System (NIDS) developed in this project integrates an
Artificial Neural Network (ANN) to enhance the detection of network intrusions in complex and
large datasets. The system improves upon traditional intrusion detection methods by offering
greater scalability, adaptability, and accuracy. Through careful data preprocessing, model design,
and integration, the system ensures effective detection of both known and novel attacks while
maintaining real-time performance and ease of use. This system represents a significant step
forward in the evolution of network security, addressing the challenges faced by traditional
intrusion detection approaches and offering a robust solution for modern cybersecurity needs.
CHAPTER 4
PERFORMANCE ANALYSIS
Performance Analysis:
The performance analysis of the proposed Next-Gen Intrusion Detection System (NIDS), based
on Artificial Neural Networks (ANNs), is crucial in evaluating its effectiveness and comparing
it to traditional network intrusion detection methods. This section presents the metrics and results
used to assess the performance of the developed NIDS, focusing on accuracy, precision, recall,
F1-score, scalability, and real-time performance.
Accuracy: This is the proportion of correctly classified instances (both normal and
intrusive) out of the total instances. It is the primary metric used to evaluate the overall
correctness of the model.
\[ \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Instances}} \]
Precision: Precision measures the accuracy of positive predictions, indicating how many
of the predicted intrusions (positive class) are actually correct.
\[ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \]
Recall (Sensitivity): Recall measures the ability of the model to correctly identify
actual intrusions. It reflects the proportion of actual intrusions that the model correctly
detects.
\[ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \]
F1-Score: The F1-score is the harmonic mean of precision and recall, providing a single
metric that balances the trade-off between the two. It is especially useful when dealing
with imbalanced datasets.
\[ \text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
Confusion Matrix: The confusion matrix provides a detailed breakdown of the
model's performance by showing the number of true positives (TP), true negatives (TN),
false positives (FP), and false negatives (FN). This matrix is essential for understanding
the system's classification errors.
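As a worked example of the metrics defined in this section, the sketch below counts TP, TN, FP, and FN from a pair of label lists and derives accuracy, precision, recall, and F1-score from those counts. The labels are made-up values for illustration (1 = intrusive, 0 = normal), not results from the project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) treating `positive` as the intrusion class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score; undefined ratios fall back to 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical labels: 4 intrusions and 6 normal records
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)             # 3 5 1 1
print(metrics(tp, tn, fp, fn))    # (0.8, 0.75, 0.75, 0.75)
```

Here one missed intrusion (FN) and one false alarm (FP) leave accuracy at 0.8, while precision and recall are both 0.75, which shows why accuracy alone can overstate detection quality when normal traffic dominates.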
Random Forest: Random Forest is an ensemble learning method that aggregates
multiple decision trees. It is known for its robustness and ability to handle large datasets
but may struggle with non-linear data patterns in intrusion detection.
Naive Bayes: The Naive Bayes classifier is a simple probabilistic model based on Bayes'
theorem. It is typically used for classification tasks where the assumption of feature
independence holds, though this assumption may not always hold in intrusion detection
datasets.
The performance comparison of the models is presented in terms of accuracy, precision, recall,
and F1-score, allowing a clear comparison of the ANN-based NIDS with traditional methods.
As observed, the ANN-based system outperforms the traditional models in terms of accuracy,
precision, recall, and F1-score, with the highest performance metrics. This suggests that deep
learning techniques are better suited for handling complex, non-linear patterns in network traffic
data, resulting in more accurate and reliable intrusion detection.
4.4. Scalability and Efficiency:
Scalability is a critical consideration for modern intrusion detection systems, particularly as
network traffic volumes increase. The proposed ANN-based NIDS is designed to efficiently handle
large-scale datasets. During performance testing, the system was evaluated in terms of both
processing time and model size:
Processing Time: The ANN model was capable of processing large batches of network
traffic data with low latency. The real-time prediction capability of the model was tested by
classifying network traffic in a live environment, ensuring that detection occurs promptly.
Model Size and Deployment: The model size was optimized using techniques such as
model pruning and quantization. This ensures that the model is lightweight enough for
deployment on resource-constrained environments, such as edge devices and network
gateways.
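The idea behind pruning and quantization can be sketched on a plain list of weights. This is only an illustration of the two techniques under simple assumptions (magnitude-based pruning with a fixed threshold, symmetric 8-bit quantization); it does not reproduce whichever tooling the project used, and the weight values are invented.

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize_int8(weights):
    """Symmetric 8-bit quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0   # avoid a zero scale
    scale = max_abs / 127.0
    q = [int(round(w / scale)) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.50, -0.02, 0.31, 0.004, -1.27]   # hypothetical layer weights
pruned = prune(weights, threshold=0.05)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

print(pruned)     # [0.5, 0.0, 0.31, 0.0, -1.27]
print(q)          # [50, 0, 31, 0, -127]
```

Each quantized weight fits in one byte instead of a 32-bit float, and the rounding error per weight is at most half of `scale`; pruned zeros additionally allow sparse storage.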
Latency: The average latency for real-time detection was measured to be less than 1
second per packet, ensuring quick identification of intrusions.
Alerting: In the event of an intrusion detection, the system triggered an alert via a buzzer
sound or visual notification, allowing administrators to take immediate action.
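One way to obtain an average detection latency of the kind reported above is to time the prediction call over many records. The sketch below is an assumed measurement harness, not the project's benchmark: `toy_predict` is a hypothetical stand-in for the trained model, and the records are synthetic 41-feature rows.

```python
import time

def average_latency(predict, records):
    """Mean wall-clock seconds spent in predict(record) per record."""
    start = time.perf_counter()
    for record in records:
        predict(record)
    return (time.perf_counter() - start) / len(records)

def toy_predict(record):
    # Hypothetical stand-in for the trained model's inference call
    return 1 if sum(record) > 10 else 0

records = [[float(i % 7)] * 41 for i in range(1000)]   # synthetic 41-feature rows
latency = average_latency(toy_predict, records)
print(f"{latency * 1000:.4f} ms per record")
```

For a real model the figure depends heavily on hardware and batch size, so reporting the median and a high percentile alongside the mean gives a more faithful picture of real-time behaviour.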
ability to generalize and detect new, previously unseen attacks, as evidenced by its high recall and
F1-score.
CHAPTER 5
SYSTEM DESIGN
5.1. Introduction
In the design phase, the software requirements are transformed into definitions of the software
components and their interfaces, to establish the framework of the software. This is done by
examining the system design description and building a physical model using recognized software
engineering methods.
The physical model describes the solution in concrete, implementation terms. The logical model
produced in the requirement analysis phase captures the structure of the problem and makes it manageable.
Once the system requirements have been specified and analyzed, system design is the first of the
three technical activities (design, code, and test) that are required to build and verify software.
They represent the functionality of the system from a user's point of view. They define the
boundaries of the system.
A use case diagram, at its simplest, is a representation of a user's interaction with the system,
depicting the specifications of a use case. A use case diagram can portray the different types of
users of a system and the various ways in which they interact with the system. A use case is a methodology
used in system analysis to identify, clarify, and organize system requirements. Use case diagrams
are employed in UML (Unified Modelling Language), a standard notation for the modelling of
real-world objects and systems. Use case diagrams are important for explaining the
interaction between the system and its actors.
Figure 5.2 Use Case Diagram
Class Diagram
A class diagram is an illustration of the relationships and source code dependencies among classes
in the Unified Modelling Language (UML). In this context, a class defines the methods and
variables in an object, which is a specific entity in a program or the unit of code representing that
entity.
Sequence Diagram
A sequence diagram is an interaction diagram that shows how processes operate with one another and in
what order. It is a construct of a message sequence chart. A sequence diagram shows object
interactions arranged in time sequence.
Collaboration Diagram
Activity Diagram
The activity diagram is another important diagram in UML for describing the dynamic aspects of the system.
An activity diagram is basically a flow chart that represents the flow from one activity to another.
State Chart Diagram
The state chart diagram is one of the five UML diagrams used to model the dynamic nature of a system.
State chart diagrams define the different states of an object during its lifetime, and these states are
changed by events. State chart diagrams are useful for modelling reactive systems.
Component Diagram
A component diagram, also known as a UML component diagram, describes the organization and
wiring of the physical components in a system. In the first version of UML, the components included
in these diagrams were physical: documents, database tables, files, and executables, all
physical elements with a location.
The DFD is also called a bubble chart. It is a simple graphical formalism that can be used
to represent a system in terms of the input data to the system, the various processing carried out
on this data, and the output data generated by the system.
The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components. These components are the system processes, the data used by
the processes, the external entities that interact with the system, and the information flows in
the system.
A DFD shows how information moves through the system and how it is modified by a
series of transformations. It is a graphical technique that depicts information flow and the
transformations that are applied as data moves from input to output.
A DFD may be used to represent a system at any level
of abstraction. DFDs may be partitioned into levels that represent increasing information
flow and functional detail.
Figure 5.3 Data Flow Diagram
CHAPTER 6
IMPLEMENTATION
AND
RESULTS
6.1. SAMPLE CODE
IMPORTING DATASET:
# In[1]
train_url = 'https://raw.githubusercontent.com/merteroglu/NSL-KDD-Network-Instrusion-Detection/master/NSL_KDD_Train.csv'
test_url = 'https://raw.githubusercontent.com/merteroglu/NSL-KDD-Network-Instrusion-Detection/master/NSL_KDD_Test.csv'
#In[2]
import pandas as pd
col_names = ["duration","protocol_type","service","flag","src_bytes",
"dst_bytes","land","wrong_fragment","urgent","hot","num_failed_logins",
"logged_in","num_compromised","root_shell","su_attempted","num_root",
"num_file_creations","num_shells","num_access_files","num_outbound_cmds",
"is_host_login","is_guest_login","count","srv_count","serror_rate",
"srv_serror_rate","rerror_rate","srv_rerror_rate","same_srv_rate",
"diff_srv_rate","srv_diff_host_rate","dst_host_count","dst_host_srv_count",
"dst_host_same_srv_rate","dst_host_diff_srv_rate","dst_host_same_src_port_rate",
"dst_host_srv_diff_host_rate","dst_host_serror_rate","dst_host_srv_serror_rate",
"dst_host_rerror_rate","dst_host_srv_rerror_rate","label"]
#In[3]
#In[4]
print(train["label"].value_counts())
print()
print(test["label"].value_counts())
#In[5]
sns.countplot(x=train["label"])
plt.show()
#In[6]
#Rewriting
# Checking the distribution after the change
print(train["label"].value_counts())
print()
print(test["label"].value_counts())
#In[7]
sns.countplot(x=train["label"])
plt.show()
#In[8]
label_encoder = LabelEncoder()
#In[9]
train.drop(columns=train.select_dtypes(include=['object']).columns, inplace=True)
#In[10]
train.info()
#In[11]
RANDOM FOREST:
#In[12]
# Fit the model to the training data
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
#In[13]
y_train_pred = rf_model.predict(X_train)
y_test_pred = rf_model.predict(X_test)
plt.figure(figsize=(8, 6))
xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.title(title)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.tight_layout()
plt.show()
print(classification_report(y_train, y_train_pred))
print(classification_report(y_test, y_test_pred))
#In[14]
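# Calculate accuracies (assumed: accuracy_score from sklearn.metrics)
from sklearn.metrics import accuracy_score
train_accuracy_rf = accuracy_score(y_train, y_train_pred)
test_accuracy_rf = accuracy_score(y_test, y_test_pred)
# Store accuracies
accuracy_scores['Random Forest'] = {
    'train_accuracy': train_accuracy_rf,
    'test_accuracy': test_accuracy_rf
}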
NAIVE BAYES:
#In[15]
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
y_train_pred = nb_model.predict(X_train)
y_test_pred = nb_model.predict(X_test)
test_confusion_matrix = confusion_matrix(y_test, y_test_pred)
plt.figure(figsize=(8, 6))
xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.title(title)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.tight_layout()
plt.show()
print(classification_report(y_train, y_train_pred))
print(classification_report(y_test, y_test_pred))
#In[16]
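# Calculate accuracies (assumed: accuracy_score from sklearn.metrics)
from sklearn.metrics import accuracy_score
train_accuracy_nb = accuracy_score(y_train, y_train_pred)
test_accuracy_nb = accuracy_score(y_test, y_test_pred)
# Store accuracies
accuracy_scores['Naive Bayes'] = {
    'train_accuracy': train_accuracy_nb,
    'test_accuracy': test_accuracy_nb
}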
MLP:
#In[17]
mlp_model.fit(X_train, y_train)
y_train_pred = mlp_model.predict(X_train)
y_test_pred = mlp_model.predict(X_test)
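plt.figure(figsize=(8, 6))
# The start of the heatmap call was lost in extraction; reconstructed here
sns.heatmap(confusion_matrix(y_test, y_test_pred), annot=True, fmt='d',
            xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.title('Confusion Matrix (MLP, test set)')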
plt.xlabel('Predicted')
plt.ylabel('True')
plt.tight_layout()
plt.show()
print(classification_report(y_train, y_train_pred))
print(classification_report(y_test, y_test_pred))
#In[18]
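train_accuracy_mlp = accuracy_score(y_train, y_train_pred)
test_accuracy_mlp = accuracy_score(y_test, y_test_pred)
accuracy_scores['MLP'] = {
    'train_accuracy': train_accuracy_mlp,
    'test_accuracy': test_accuracy_mlp
}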
ANN:
#In[19]
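import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# Standardize features: fit the scaler on the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)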
X_test_scaled = scaler.transform(X_test)
model = Sequential()
# Hidden layers
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(units=1, activation='sigmoid'))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_test_ann))
print("\nClassification Report:")
print(classification_report(y_test, y_test_ann))
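plt.figure(figsize=(8, 6))
# The start of the heatmap call was lost in extraction; reconstructed here
sns.heatmap(confusion_matrix(y_test, y_test_ann), annot=True, fmt='d',
            xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.title('Confusion Matrix (ANN, test set)')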
plt.xlabel('Predicted')
plt.ylabel('True')
plt.tight_layout()
plt.show()
Figure 6.1 ANN Training Confusion Matrix
#In[20]
#In[21]
accuracy_scores['ANN'] = {
    'train_accuracy': train_accuracy_ann,
    'test_accuracy': test_accuracy_ann
}
#In[22]
from tensorflow.keras.models import load_model
model = load_model('cnn_intrusion_model.h5')
#In[23]
import pandas as pd
samp_data = {
'duration': [100.00],
'src_bytes': [1317.00],
'dst_bytes': [404.00],
'land': [0.00],
'wrong_fragment': [0.00],
'urgent': [0.00],
'hot': [0.00],
'num_failed_logins': [0.00],
'logged_in': [1.00],
'num_compromised': [0.00],
'root_shell': [0.00],
'su_attempted': [0.00],
'num_root': [0.00],
'num_file_creations': [0.00],
'num_shells': [0.00],
'num_access_files': [0.00],
'num_outbound_cmds': [0.00],
'is_host_login': [0.00],
'is_guest_login': [0.00],
'count': [1.00],
'srv_count': [1.00],
'serror_rate': [0.00],
'srv_serror_rate': [0.00],
'rerror_rate': [0.00],
'srv_rerror_rate': [0.00],
'same_srv_rate': [1.00],
'diff_srv_rate': [0.00],
'srv_diff_host_rate': [0.00],
'dst_host_count': [36.00],
'dst_host_srv_count': [156.00],
'dst_host_same_srv_rate': [0.50],
'dst_host_diff_srv_rate': [0.08],
'dst_host_same_src_port_rate': [0.03],
'dst_host_srv_diff_host_rate': [0.01],
'dst_host_serror_rate': [0.00],
'dst_host_srv_serror_rate': [0.00],
'dst_host_rerror_rate': [0.00],
'dst_host_srv_rerror_rate': [0.00],
'protocol_type_encoded': [1.00],
'service_encoded': [54.00],
'flag_encoded': [9.00]
}
sample_row = pd.DataFrame(samp_data)
#In[24]
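sample_row = sample_row.iloc[0]
# Reshape the single row into a batch of one. This line was lost in extraction;
# the (1, n_features) shape is an assumption about the model's expected input.
single_row_input_reshaped = sample_row.values.reshape(1, -1)
prediction = model.predict(single_row_input_reshaped)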
#In[25]
#In[26]
import numpy as np
model_names = list(accuracy_scores.keys())
x = np.arange(len(model_names))
# Bar width
width = 0.35
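# Plotting
plt.figure(figsize=(10, 6))
# Grouped bars per model (the bar calls were lost in extraction; reconstructed)
train_acc = [accuracy_scores[m]['train_accuracy'] for m in model_names]
test_acc = [accuracy_scores[m]['test_accuracy'] for m in model_names]
plt.bar(x - width / 2, train_acc, width, label='Train Accuracy')
plt.bar(x + width / 2, test_acc, width, label='Test Accuracy')
# Labels and Title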
plt.xlabel('Models')
plt.ylabel('Accuracy')
plt.xticks(x, model_names)
plt.ylim(0, 1.1)
plt.legend(loc='lower right')
plt.tight_layout()
plt.show()
#In[27]
X_test.to_csv('X_val.csv', index=False)
#In[28]
import gradio as gr
import pandas as pd
model = load_model('cnn_intrusion_model.h5')
Figure 6.3 Training and Testing Accuracy of Various Models
# Define the prediction function
df = pd.read_csv(file.name)
selected_row = df.iloc[row_number]
prediction = model.predict(single_row_input)
# Interpret prediction
if predicted_class == 0:
inputs = [
output = gr.Textbox(label="Prediction")
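The interpretation step in the listing above turns the model's sigmoid output into a readable label. A minimal sketch of that step follows, assuming the model returns a single probability and that 0.5 is the decision threshold. The function name `interpret_prediction`, the `class_names` argument, and the placeholder class names are illustrative assumptions, not taken from the project code.

```python
def interpret_prediction(probability, class_names, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a class name.

    class_names[0] is returned below the threshold and class_names[1]
    at or above it. The order must match the label encoding used in
    training (hypothetical here).
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    predicted_class = 1 if probability >= threshold else 0
    return class_names[predicted_class]


# Placeholder class order; check it against label_encoder.classes_ before use.
labels = ["class_0", "class_1"]
print(interpret_prediction(0.12, labels))
print(interpret_prediction(0.93, labels))
```

Passing the class names in, rather than hard-coding them, keeps the mapping tied to whatever order the label encoder produced during training.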
Figure 6.5 About Screen
6.3. OUTPUT SCREENS:
CHAPTER 7
CONCLUSION
Conclusion:
The proposed Next-Gen Intrusion Detection System (NIDS), built on Artificial Neural
Networks (ANNs), addresses the limitations of traditional hybrid intrusion detection systems
that rely on the Differential Harmony Search Algorithm (DLHA) and a Support Vector Machine
(SVM). The primary motivation for this development was to overcome the scalability challenges
and the limited adaptability to non-linear patterns inherent in those earlier models.
Through the use of deep learning, the system demonstrates enhanced detection accuracy,
operational efficiency, and scalability. The ANN-based architecture excels at learning complex
and evolving attack patterns, making it highly effective for both known and novel intrusion
detection. By reducing the dependency on extensive feature engineering and improving the
system's ability to handle large datasets, the proposed model outperforms traditional machine
learning approaches in terms of accuracy, precision, recall, and F1-score.
In the course of the project, various machine learning models, including Random Forest, Naive
Bayes, and SVM, were benchmarked against the proposed ANN-based system. The results clearly
show that the ANN model offers superior performance, particularly in handling non-linear data
relationships, which are common in network traffic patterns.
Additionally, the integration of the system into a Gradio interface for real-time detection of
network intrusions offers a user-friendly approach to deploying the system in live network
environments. The ability to process network traffic in real time, coupled with low-latency alerts
for detected intrusions, ensures that the system is well-suited for modern cybersecurity demands.
The proposed Next-Gen Intrusion Detection System represents a robust, adaptable, and efficient
solution for modern network security, capable of addressing the evolving and complex nature of
cyber threats.
CHAPTER 8
APPENDICES
1. NSL-KDD Dataset:
o Description: The NSL-KDD dataset is a widely used benchmark dataset for evaluating intrusion
detection systems. It consists of network connection records labelled as either normal or intrusive,
with attacks grouped into four categories: DoS, Probe, R2L (remote-to-local), and U2R (user-to-root).
o Usage: The dataset was used for training and testing the models in this research.
o Features: Each record describes one connection with 41 features, such as duration, protocol type,
service, flag, source and destination byte counts, and statistical features derived from network traffic.
2. CICIDS 2017 Dataset:
o Description: The CICIDS 2017 dataset is another popular dataset for intrusion detection,
containing both normal and attack traffic data across multiple categories such as botnet, DoS, and
web attacks.
o Usage: It was used for performance testing and model evaluation in this project.
Activation Function: ReLU (Rectified Linear Unit) for hidden layers, and Sigmoid for the output
layer
Batch Size: 32
Epochs: 50
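To make these settings concrete, the sketch below implements the two activation functions in plain Python, runs one tiny forward pass, and counts the weight updates implied by a batch size of 32 over 50 epochs. The helper names, the toy weights, and the example sample count of 1,000 are illustrative assumptions, not values from the project.

```python
import math


def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes the rest."""
    return x if x > 0.0 else 0.0


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . inputs + b) per unit."""
    return [
        activation(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def updates_per_run(n_samples, batch_size=32, epochs=50):
    """Number of mini-batch weight updates over a full training run."""
    return math.ceil(n_samples / batch_size) * epochs


# Toy forward pass: 2 inputs -> 2 ReLU hidden units -> 1 sigmoid output.
hidden = dense([1.0, 2.0], [[1.0, 0.5], [-1.0, 0.25]], [0.0, 0.0], relu)
output = dense(hidden, [[1.0, 1.0]], [-2.0], sigmoid)
print(hidden, output, updates_per_run(1000))
```

The sigmoid output lies in (0, 1), which is why it suits a two-class normal versus intrusion decision.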
Appendix C: Confusion Matrix Example
The confusion matrix provides a detailed overview of the model's performance, breaking down the
number of true positives, true negatives, false positives, and false negatives. An example confusion
matrix for the ANN model is as follows:
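As an illustration of how the four counts and the metrics built on them are derived, the sketch below uses made-up toy labels (not the project's results). It assumes label 1 marks the positive (intrusion) class; the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification task."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1, guarding against empty denominators."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
print(metrics_from_counts(*counts))
```

In an intrusion detection setting, false negatives (missed attacks) are usually the costlier error, so recall deserves attention alongside accuracy.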
1. Data Preprocessing: Code for cleaning, encoding, and normalizing the dataset.
2. Model Training: The architecture of the ANN model, including the number of layers,
activation functions, and training parameters.
3. Model Evaluation: Evaluation metrics such as accuracy, precision, recall, F1-score, and
confusion matrix.
4. Real-Time Interface: Integration with Gradio to allow users to upload network traffic data
and receive real-time predictions.
5. Alert System: Code for triggering a buzzer sound when an intrusion is detected.
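The preprocessing step listed above (encoding categorical fields, then normalizing numeric ones) can be sketched in plain Python as follows. The helpers mimic what scikit-learn's `LabelEncoder` and `StandardScaler` compute (sorted class order; zero mean and unit population variance). The function names and toy values are assumptions for illustration.

```python
import math


def label_encode(values):
    """Map each category to its index in the sorted list of unique values."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def standardize(column):
    """Z-score a numeric column using the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    variance = sum((v - mean) ** 2 for v in column) / n
    # A constant column has zero spread; divide by 1 so it maps to zeros.
    std = math.sqrt(variance) or 1.0
    return [(v - mean) / std for v in column]


# Toy values for illustration only.
codes, classes = label_encode(["tcp", "udp", "icmp", "tcp"])
scaled = standardize([1.0, 2.0, 3.0])
print(classes, codes)
print(scaled)
```

In practice the mean and standard deviation should be computed on the training split only and reused for test data, so that no information from the test set leaks into training.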
Appendix E: Visualizations
Several visualizations were generated to aid in understanding the model’s performance and
comparison with other models:
1. Training vs. Testing Accuracy Bar Plot: A bar plot comparing the accuracy of the ANN
model with other machine learning models (SVM, Random Forest, Naive Bayes).
2. Confusion Matrix Heatmap: A heatmap of the confusion matrix, visually representing the
classification outcomes for the ANN model.
3. ROC Curve and AUC: Receiver Operating Characteristic (ROC) curve and the Area Under
the Curve (AUC) for the ANN model, illustrating its classification performance across different
thresholds.
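The ROC curve and AUC described above can be computed without any library, as in the sketch below. It sweeps the decision threshold over the sorted scores and integrates with the trapezoidal rule. The function names and the toy scores are illustrative assumptions, and label 1 is assumed to be the positive class.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores for illustration only.
curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(curve)
print(auc(curve))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.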