An Intrusion Detection System
With the rapid growth of interconnected devices in the Internet of Things (IoT), network security
faces new challenges due to limited computational resources, memory constraints, and the unique protocols
used in these devices. This project introduces a lightweight Intrusion Detection System (IDS) optimized for
resource-constrained environments by leveraging Benford's Law and analyzing network flow size
differences. Our IDS systematically monitors network traffic and identifies deviations from normal flow
patterns, utilizing the first significant digit distribution predicted by Benford's Law. By applying linear
regression and error metrics on network traffic samples, we successfully detect anomalies indicative of cyber
attacks. Experimental evaluations using the NSL-KDD dataset demonstrate the effectiveness of this IDS
approach in distinguishing between normal and malicious TCP flows. The results show promise for scalable
deployment in IoT and similar resource-constrained networks, paving the way for improved, accessible
CHAPTER 1: INTRODUCTION
1.1 INTRODUCTION
1.4 LITERATURE REVIEW
REFERENCES
APPENDIX
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
The number of internet-connected devices is increasing exponentially, which makes enforcing the security
and availability of network services far more challenging. Over the last decade, organizations have
developed various tools and techniques to protect networks against security threats, such as access control
mechanisms, user authentication, and firewalls. Although these solutions prevent unauthorized access by
outsiders, they are not resilient against insider attacks. The intrusion detection system (IDS) [1] was
therefore developed as a second line of defense to prevent the loss of information to intruders.
IDS can be classified into two major types: network-based intrusion detection systems (NIDS) and
host-based intrusion detection systems (HIDS). A NIDS is installed within the network, where it monitors
network traffic and identifies potential threats such as denial-of-service attacks and port scans. A HIDS
is installed on an individual host computer and analyzes local host activity for any intrusion or threat
to the system [2]. Based on the detection technique, IDS can be further classified into three types:
signature-based, anomaly-based, and hybrid. A signature-based IDS searches the network traffic for specific
predefined sequences of bytes or packets. It is very fast at detecting known attacks but cannot detect
unknown or zero-day attacks. An anomaly-based IDS learns the behavior shared by normal traffic flows and
treats anything that deviates from this behavior as an anomaly. This method is effective at finding
unknown attacks.
However, anomaly-based systems often misclassify legitimate traffic as anomalous, which leads to a high
False Positive Rate (FPR). A hybrid IDS combines the properties of signature-based and anomaly-based IDS
to achieve better accuracy. Network-based IDS is further classified into two types based on the source of
the data analyzed: packet-based and flow-based intrusion detection systems. A packet-based IDS has to
inspect each packet in the network traffic, including payload and headers. A flow-based IDS analyzes only
some of the packets: it looks at aggregated information about similar packets in a network flow, which
reduces the amount of data to be analyzed and exposes patterns in the flow. Packet-based systems mostly
rely on signature-based detection, whereas flow-based systems typically support anomaly-based detection.
The flow size difference of normal TCP flows has been found to follow Benford's law approximately, whereas
that of malicious TCP flows deviates from it [4]. Benford's law is applied to the first digit of the flow
size difference. Since Benford's law can work with only a few (even a single) feature values and requires
only simple mathematical operations, it is well suited to resource-constrained systems such as IoT devices.
In this direction, we propose an efficient, lightweight IDS using Benford's law.
Intrusion Detection Systems (IDS) serve as a defensive measure to detect and prevent malicious
activities within a network, acting as an additional layer of protection even with other security tools like
firewalls and antivirus software. According to Kazienko and Dorosz (2003), an IDS is essential for detecting
and thwarting hostile activities that could compromise network integrity if left unchecked. Amoroso (1999)
further defines intrusion detection as “a process of identifying and responding to malicious activity targeted
at computing and networking resources.” Unlike firewalls and antivirus programs, which protect against
unauthorized access from external threats, IDS monitors and flags unusual or malicious internal activities,
enhancing overall network security.
1. Network-based IDS (NIDS): Used to monitor network protocols and detect suspicious activity,
NIDS is typically deployed on virtual private networks, remote access servers, and routers (Sturmer,
2013).
2. Wireless IDS (WIDS): Similar to NIDS but specifically designed for wireless networks, WIDS
identifies unauthorized access points and potential misuse within wireless communication channels
(Adams).
3. Host-based IDS (HIDS): Operating on individual hosts, HIDS monitors for changes in system files,
network traffic anomalies, and suspicious application processes (Sturmer).
4. Network Behavior Analysis (NBA): NBA focuses on detecting unusual network behaviors and
traffic patterns, identifying potential threats through deviations from standard network flow
(Seehorn).
Attacks detected by IDS may be internal or external. Internal attacks originate from within the
network, often executed by trusted insiders who exploit their access to sensitive data. These attacks are
especially damaging as insiders benefit from organizational trust and physical access, making detection more
complex. Conversely, external attacks generally come from the internet, involving actors outside the network
perimeter. With rising incidents of insider attacks and increasing regulatory demands, organizations face
significant challenges in securing data from internal threats while adhering to compliance requirements
(Magalhaes, 2003).
Network-based Intrusion Prevention Systems (NIPS) operate at the network level to scrutinize
incoming and outgoing packets, allowing or blocking them based on predefined security policies, similar to a
firewall. According to Scarfone and Mell (2007), NIPS provides extensive logging capabilities, which help
in tracking detected incidents for further analysis and alert review. Information logged by NIPS includes:
NIPS also facilitates information gathering on host systems within the network, offering insights into
operating systems, applications, and network characteristics, which supports proactive threat assessment and
enhances the network’s defensive capabilities.
Network Intrusion Detection Systems (NIDS) continuously monitor network traffic, identifying
patterns that may signify a potential attack. As noted by Rozenblum (2001), a NIDS serves multiple
purposes: monitoring user and system activities, analyzing vulnerabilities, assessing file integrity, detecting
attack patterns, and logging policy violations. Unlike firewalls, which primarily block unauthorized access,
NIDS can detect subtle indications of intrusions even after an initial entry. NIDS configurations involve the
strategic placement of intrusion detection sensors at network entry points to feed information to a central
management console. This centralized setup allows administrators to analyze logs, update attack signatures,
and manage sensor configurations, enhancing the network’s overall resilience.
4. Phishing
often involves mock websites mimicking authentic sites to lure victims. This form of fraud became widely
recognized in 1996, initially targeting AOL accounts, and has since evolved into a sophisticated threat that
continues to exploit trust in familiar online platforms (Reid, 2009). Rouse (2007) describes phishing attacks
as akin to “fishing” for information, where attackers cast a wide net, hoping to capture unsuspecting victims
who interact with fraudulent links or websites. The growth of online interactions has exacerbated phishing
risks, making it a persistent threat in both personal and organizational contexts.
IDS can play a crucial role in detecting and preventing phishing by using known phishing signatures
to block suspicious IP addresses. Liniger and Vines (2005) highlight the importance of signature-based
detection for identifying characteristics of known phishing attacks. With phishing attacks becoming more
prevalent and sophisticated, organizations need robust Intrusion Detection and Prevention Systems that can
monitor beyond traditional email channels, especially as social media adoption grows (Kaspian, 2013).
Comprehensive IDS solutions are thus essential for protecting sensitive information, as phishing attacks
increasingly target organizations through a range of digital communication channels.
CHAPTER 2
PROJECT MECHANISM
A. Traffic data
We use NSL-KDD [12] as the traffic dataset for our project, as it is one of the most widely used datasets
for intrusion detection systems. The dataset consists of basic features and derived features for
fine-grained analysis of data in time and sequence windows. Its 41 features are divided into three
categories: basic, content, and traffic. NSL-KDD contains a training and a testing dataset. We use only the
training dataset, which contains 125,973 samples of network traffic flows, because we need only a few
windows of flow samples and this number of samples is sufficient for our experiments. The 42nd attribute
labels each flow as normal or attack. The attack classes are divided into four types, namely DoS (Denial
of Service), Probe, U2R (User to Root), and R2L (Remote to Local). The distribution of total samples and
TCP samples for each attack category is shown in Table 2.1.
There are 67,343 normal traffic flows in the NSL-KDD training dataset, and the rest are malicious. Because
we aim to apply Benford's law to TCP flows only, we extracted all TCP connection samples from the 125,973
samples of the training dataset, which gives a dataset of 102,689 samples. This dataset is further divided
into 3 datasets: 1) Normal dataset - which contains all normal TCP connections (53,600 flows). 2) Malicious
dataset - which contains all malicious TCP connections (49,089 flows). 3) Mixed dataset - which contains
normal and malicious flows in different proportions.
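The split described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual preprocessing code: the column positions (`PROTOCOL_COL`, `LABEL_COL`) are assumptions based on the commonly distributed NSL-KDD text layout, the helper names are hypothetical, and the rows are synthetic stand-ins rather than real dataset records.

```python
import csv
import io

PROTOCOL_COL = 1   # assumed position of protocol_type in the NSL-KDD text files
LABEL_COL = 41     # assumed position of the class label (the 42nd attribute)


def split_tcp_flows(rows):
    """Return (normal, malicious) lists that contain only TCP rows."""
    normal, malicious = [], []
    for row in rows:
        if row[PROTOCOL_COL] != "tcp":
            continue  # Benford's law is applied to TCP flows only
        if row[LABEL_COL] == "normal":
            normal.append(row)
        else:
            malicious.append(row)
    return normal, malicious


def make_row(protocol, label):
    """Build a synthetic 43-field record (illustrative values only)."""
    row = ["0"] * 43
    row[PROTOCOL_COL] = protocol
    row[LABEL_COL] = label
    return row


# Tiny in-memory stand-in for the training file; not real NSL-KDD data.
sample_text = "\n".join(
    ",".join(make_row(protocol, label))
    for protocol, label in [
        ("tcp", "normal"),
        ("udp", "normal"),
        ("tcp", "neptune"),
        ("tcp", "normal"),
    ]
)
rows = list(csv.reader(io.StringIO(sample_text)))
normal_tcp, malicious_tcp = split_tcp_flows(rows)
print(len(normal_tcp), len(malicious_tcp))
```

A mixed dataset could then be assembled by sampling from the two lists in whatever proportion an experiment calls for.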
B. The Metrics
We use a metric called "flow size difference", which is the difference between two consecutive values of a
feature having numeric values. Next, we discuss the flow size difference and the reason for using it.
1) Flow size difference: It is defined as the numeric difference between the sizes of two consecutive TCP
flows. It is a promising metric because it inherits the property of long-tailedness. Iorliam et al. [11]
discussed the validity of the flow size difference following Benford's law. We ignore the sign of the
metric and consider only the absolute value of the flow size difference.
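As a small illustration of the metric, the sketch below computes absolute differences between consecutive flow sizes and extracts their first significant digits. The function names and the sample sizes are hypothetical, and dropping zero differences (which have no significant digit) is an assumption rather than something stated in the text.

```python
def flow_size_differences(flow_sizes):
    """Absolute difference between each pair of consecutive flow sizes."""
    return [abs(b - a) for a, b in zip(flow_sizes, flow_sizes[1:])]


def first_digit(value):
    """First significant digit of a non-negative integer, or None for zero."""
    if value == 0:
        return None  # zero has no significant digit
    return int(str(value)[0])


# Illustrative flow sizes in bytes (made-up values).
sizes = [491, 146, 232, 199, 199]
diffs = flow_size_differences(sizes)          # [345, 86, 33, 0]
digits = [first_digit(v) for v in diffs if v != 0]
print(diffs, digits)
```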
C. Detection method
In the previous section, we introduced the flow size difference, which is a core part of the detection
method. In this section, we briefly explain how this metric is used to distinguish normal and attack flows
based on Benford's law. First, we divide the dataset into two sets: the first contains all the normal flows
and the second contains all the attack flows. A three-step procedure is then used to obtain the first-digit
frequency distribution of any feature having numeric values.
These steps are followed for each feature having a numeric value, and the plots corresponding to normal
flows are obtained. The process is then repeated for all features in the second dataset, which contains
only attack flows. A feature is said to follow Benford's law if its plot for normal flows shows high
similarity to the expected Benford's law curve while its plot for attack flows shows a high deviation from
it. We found only seven features in the NSL-KDD dataset that follow Benford's law: duration, src_bytes,
dst_bytes, count, srv_count, dst_host_count, and dst_host_srv_count.
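The individual steps are not listed here, so the following is only one plausible reading of how a first-digit frequency distribution can be obtained and compared with Benford's law, P(d) = log10(1 + 1/d). The function names are hypothetical, and the powers of two are a synthetic example that is well known to follow Benford's law closely; they are not NSL-KDD data.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_distribution(values):
    """Observed relative frequency of each first digit 1..9 (zeros are skipped)."""
    digits = [int(str(abs(int(v)))[0]) for v in values if int(v) != 0]
    if not digits:
        raise ValueError("no non-zero values to analyze")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Synthetic example: the first 100 powers of two.
values = [2 ** k for k in range(100)]
observed = first_digit_distribution(values)
expected = benford_expected()
for d in range(1, 10):
    print(d, round(observed[d], 3), round(expected[d], 3))
```

Plotting `observed` against `expected` for a feature, once for normal flows and once for attack flows, gives the kind of comparison described above.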
Benford’s law requires the distribution to be calculated over a window. In a real-life scenario, the size
of the window plays a crucial role. We found that the window size giving the best result can be determined
with the χ² divergence method.
Benford’s law provides a vector that contains the percentage distribution for each first-digit value. If we
substitute the distribution value for P(d) in the Inverse-Benford function for each first digit d, we
obtain nine pairs of values: (1, f(1)), (2, f(2)), ..., (9, f(9)). We then plot these points with the digits
1, 2, ..., 9 on the x-axis and the corresponding function values f(1), f(2), ..., f(9) on the y-axis. We
call this curve the Inverse-Benford distribution.
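The Inverse-Benford function itself is not written out in this section. The sketch below assumes it is the algebraic inverse of Benford's formula: solving P(d) = log10(1 + 1/d) for d gives f = 1 / (10^P - 1). Under that assumption, a distribution that matches Benford's law exactly maps back onto the straight line f(d) = d. The function name is hypothetical.

```python
import math


def inverse_benford(p):
    """Assumed Inverse-Benford function: solve p = log10(1 + 1/d) for d."""
    if p <= 0:
        raise ValueError("probability must be positive")
    return 1.0 / (10 ** p - 1.0)


# Feeding in the exact Benford probabilities recovers the digits 1..9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
points = [(d, inverse_benford(p)) for d, p in benford.items()]
for d, f_d in points:
    print(d, round(f_d, 6))
```

For an observed window, the same function would be applied to the observed first-digit frequencies, so any departure from Benford's law appears as a departure from a straight line.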
Next, we applied linear regression to find the best-fit line for the Inverse-Benford distribution. We then
evaluated error metrics, namely Mean Absolute Error (MAE), Mean Square Error (MSE), and Root Mean Square
Error (RMSE), between the best-fit line and the Inverse-Benford distribution. The error values are observed
to be significantly lower for normal flows and higher for attack flows, depending on how far the curve
deviates from a straight line. The idea is to set a threshold on these error metrics: if the calculated
error values are higher than the threshold, the flow window is classified as an attack flow; otherwise it
is classified as a normal flow.
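A minimal sketch of this step, using ordinary least squares written out by hand so that it needs only the standard library. The function names, the choice of MAE as the deciding metric, the threshold of 0.5, and both sets of y-values are illustrative assumptions, not values taken from the experiments.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def error_metrics(xs, ys):
    """MAE, MSE and RMSE between the points and their best-fit line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


def classify_window(ys, threshold):
    """Label a window from its Inverse-Benford values f(1)..f(9)."""
    xs = list(range(1, len(ys) + 1))
    errors = error_metrics(xs, ys)
    return "attack" if errors["MAE"] > threshold else "normal"


THRESHOLD = 0.5  # hypothetical value; in practice it has to be tuned
normal_like = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]   # lies on a line
attack_like = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]  # bends away
print(classify_window(normal_like, THRESHOLD))
print(classify_window(attack_like, THRESHOLD))
```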
D. TCP flows
In this subsection, we discuss TCP flows. A flow is a set of packets that have similar properties and are
close in time, i.e., they share the same source and destination. Considering TCP flows rather than
individual packets reduces the amount of data to be analyzed. A TCP flow corresponds to a TCP connection
or session, which consists of a set of IP packets starting with a SYN packet and ending with a FIN or RST
packet. These flows are generated by a large population of users across the internet, and they are
mutually independent.
E. Window size
Since Benford’s law is based on a distribution, we need to collect a given number of samples using a
window-based method [9]. For a given metric, we construct the observed distribution and compare it with
the expected distribution to detect any deviation. We found that as the window size increases, the
first-digit frequencies are estimated more accurately, but beyond a certain window size the accuracy
improves very little while the processing and memory costs continue to grow. Choosing an appropriate
window size is therefore important if the model is to be applicable to resource-limited devices (such as
IoT systems). The window size should be selected so that even a small deviation from the target
distribution can be captured easily. Since the Benford’s law distribution has only 9 values, the window
does not need to be very large. We found that a window size of W = 2000 to 3000 with a step size of S = 50
to 500 gives accurate results. The step size (window sliding) defines how far the flow window slides at
one time. In the window-based method the step size varies from 1 to W; S = W/2 is generally a better
choice.
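The window and step parameters can be illustrated with a short generator. This is a sketch under the assumption that only full windows are analyzed and a trailing partial window is ignored; the function name and the integer stand-ins for flows are hypothetical.

```python
def sliding_windows(flows, window_size, step_size):
    """Yield consecutive full windows of `window_size` flows, advancing by `step_size`."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    for start in range(0, len(flows) - window_size + 1, step_size):
        yield flows[start:start + window_size]


# 10,000 placeholder flows, W = 2500 and S = W/2 = 1250.
flows = list(range(10_000))
windows = list(sliding_windows(flows, window_size=2500, step_size=1250))
print(len(windows), windows[0][0], windows[-1][0])
```

A smaller step produces more overlapping windows and therefore more first-digit calculations per flow, which is the processing cost mentioned above.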
In the χ² divergence, d is the first digit (1, 2, 3, ..., 9), P_d is the observed frequency, and P̂_d is the
expected frequency of the data set. Figure 2.2 plots χ² against window size for normal traffic flows in the
network. The plot shows that as the window size increases, the χ² value decreases, which means that the
observed first-digit distribution becomes very similar to the expected distribution. The reason is that a
small window cannot cover all first-digit values, so some digits may be missing from it. As the window size
increases, all first-digit values are covered within the window and the distribution tends toward the
expected one. As the plot shows, a window size of approximately 2500 gives a small deviation. In a
real-life scenario, we can process this many flows at a time to detect whether the flows are normal or
contain an attack.
G. Code Explanation
This code reads a CSV file containing a list of numbers, cleans the data, calculates expected probabilities for
each digit's appearance in the first and second positions, then compares these expected probabilities with
observed probabilities in the data. If observed probabilities fall outside calculated confidence intervals,
they’re flagged as violations. Finally, it creates bar charts for expected vs. observed frequencies and saves
them, along with flagged data.
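As an illustration of the expected probabilities mentioned above, the following sketch uses the generalized Benford formula for the digit in position n. It is a reconstruction under that assumption, not the project's exact function: the name `expected_digit_probability` is hypothetical, and the result is left unrounded.

```python
import math


def expected_digit_probability(d, n):
    """Benford probability that digit d (0-9) appears in position n (1 = first digit)."""
    if n < 1 or not 0 <= d <= 9:
        raise ValueError("need n >= 1 and 0 <= d <= 9")
    if n == 1:
        # A leading digit can never be zero.
        return 0.0 if d == 0 else math.log10(1 + 1 / d)
    # Sum over every possible prefix k made of the n - 1 leading digits.
    l_bound = 10 ** (n - 2)
    u_bound = 10 ** (n - 1)
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(l_bound, u_bound))


for digit in range(10):
    print(
        digit,
        round(expected_digit_probability(digit, 1), 3),
        round(expected_digit_probability(digit, 2), 3),
    )
```

The second-digit distribution is much flatter than the first-digit one, which is why deviations in the first digit are usually the more informative signal.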
1. Importing Libraries
import numpy as np
import math
from scipy import stats
import pandas as pd
import matplotlib
matplotlib.use('Agg') # Use a non-GUI backend
from matplotlib import pyplot as plt
def calc_confidence_interval(alpha):
    # Relies on module-level matrices: expected_prob_matrix, data_cols_length,
    # h_length_matrix, u_bound_matrix and l_bound_matrix.
    alpha_stat = stats.norm.ppf(1 - (1 - alpha) / 2)
    for d_i in range(expected_prob_matrix.shape[0]):
        for n_i in range(expected_prob_matrix.shape[1]):
            prop_exp = expected_prob_matrix[d_i, n_i]
            # Normal-approximation margin plus a continuity correction of 1 / (2N)
            half_length = (alpha_stat * (prop_exp * (1 - prop_exp) / data_cols_length[n_i]) ** .5
                           + (1 / (2 * data_cols_length[n_i])))
            u_bound = prop_exp + half_length
            l_bound = prop_exp - half_length
            h_length_matrix[d_i, n_i] = half_length
            u_bound_matrix[d_i, n_i] = u_bound
            l_bound_matrix[d_i, n_i] = max(l_bound, 0)  # a probability cannot be negative
Calculates confidence intervals for each digit’s expected probability using a normal distribution.
alpha_stat computes the z-score based on the alpha value.
half_length is the margin of error, calculated for each cell.
u_bound_matrix and l_bound_matrix store the confidence interval limits.
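For a single cell, the same calculation can be followed with a standalone sketch. It uses `statistics.NormalDist` from the Python standard library instead of SciPy purely to stay dependency-free; the function names and the sample numbers (expected proportion 0.301, 2500 samples, 95% level) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def confidence_bounds(prop_exp, n_samples, alpha):
    """Return (lower, upper, half_length) for one expected proportion."""
    z_score = NormalDist().inv_cdf(1 - (1 - alpha) / 2)
    half_length = (
        z_score * math.sqrt(prop_exp * (1 - prop_exp) / n_samples)
        + 1 / (2 * n_samples)  # continuity correction
    )
    lower = max(prop_exp - half_length, 0.0)
    upper = prop_exp + half_length
    return lower, upper, half_length


def is_violation(observed_prop, prop_exp, n_samples, alpha):
    """True when the observed proportion falls outside the interval."""
    lower, upper, _ = confidence_bounds(prop_exp, n_samples, alpha)
    return not lower <= observed_prop <= upper


lower, upper, half = confidence_bounds(0.301, 2500, 0.95)
print(round(lower, 4), round(upper, 4))
print(is_violation(0.31, 0.301, 2500, 0.95), is_violation(0.40, 0.301, 2500, 0.95))
```

A wider confidence level such as .999 widens the interval, so fewer digits are flagged as violations.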
8. Identifying Violations
list_of_rules_violated = []
CHAPTER 3
TECHNOLOGY STACK
3.1 PYTHON
Python is a programming language that is widely used in web applications, software development,
data science, and machine learning (ML). Developers use Python because it is efficient and easy to learn and
can run on many different platforms. Python software is free to download, integrates well with all types of
systems, and increases development speed.
The Python language has several use cases in application development, including the following examples:
Server-side web development includes the complex backend functions that websites perform to
display information to the user. For example, websites must interact with databases, talk to other websites,
and protect data when sending it over the network.
Python is useful for writing server-side code because it offers many libraries that consist of prewritten
code for complex backend functions. Developers also use a wide range of Python frameworks that provide
all the necessary tools to build web applications faster and more easily. For example, developers can create
the skeleton web application in seconds because they don’t need to write it from scratch. They can then test it
using the framework’s testing tools, without depending on external testing tools.
A scripting language is a programming language that automates tasks that humans normally perform.
Programmers widely use Python scripts to automate many day-to-day tasks such as the following:
Downloading content
Data science is extracting valuable knowledge from data, and machine learning (ML) teaches computers
to automatically learn from the data and make accurate predictions. Data scientists use Python for data
science tasks such as the following:
Visualizing data by using charts and graphs such as line charts, bar graphs, histograms, and pie charts
Software development
Software developers often use Python for different development tasks and software applications such as
the following:
Software testing is the process of checking whether the actual results from the software match the
expected results to ensure that the software is error-free.
Developers use Python unit test frameworks, such as Unittest, Robot, and PyUnit, to test the
functions they write.
Software testers use Python to write test cases for various test scenarios. For example, they use it to
test the user interface of a web application, multiple software components, and new features.
Developers can use several tools to automatically run test scripts. These tools are known as Continuous
Integration/Continuous Deployment (CI/CD) tools. Software testers and developers use CI/CD tools such as
Travis CI and Jenkins to automate tests. The CI/CD tool automatically runs the Python test scripts and
reports the test results whenever developers introduce new code changes.
3.2 VS CODE
Visual Studio Code is a free, lightweight but powerful source code editor that runs on your desktop
and on the web and is available for Windows, macOS, Linux, and Raspberry Pi OS. It comes with built-in
support for JavaScript, TypeScript, and Node.js and has a rich ecosystem of extensions for other
programming languages (such as C++, C#, Java, Python, PHP, and Go), runtimes (such as .NET and Unity),
environments (such as Docker and Kubernetes), and clouds (such as Amazon Web Services, Microsoft
Azure, and Google Cloud Platform).
Aside from the whole idea of being lightweight and starting quickly, Visual Studio Code has
IntelliSense code completion for variables, methods, and imported modules; graphical debugging; linting,
multi-cursor editing, parameter hints, and other powerful editing features; snazzy code navigation and
refactoring; and built-in source code control including Git support. Much of this was adapted from Visual
Studio technology.
Visual Studio Code proper is built using the Electron shell, Node.js, TypeScript, and the Language
Server Protocol, and is updated monthly. The many extensions are updated as often as needed. The richness
of support varies across the different programming languages and their extensions, ranging from simple
syntax highlighting and bracket matching to debugging and refactoring. You can add basic support for your
favorite language through TextMate colorizers if no language server is available.
The code in the Visual Studio Code repository is open source under the MIT License. The Visual
Studio Code product itself ships under a standard Microsoft product license, as it has a small percentage of
Microsoft-specific customizations. It’s free despite the commercial license.
CHAPTER 4
RESULTS AND DISCUSSIONS
For experimental analysis, we used the NSL-KDD dataset, which categorizes attacks into four types: DoS,
Probe, R2L, and U2R. The distribution of total samples and TCP samples for each attack category in the
NSL-KDD dataset is shown in Table 2.1. Following the detection method, we obtained only seven features of
the NSL-KDD dataset that follow the Benford’s law distribution. Figure 2.3 shows the frequency distribution
curves for these seven features along with a blue curve that represents the distribution expected under
Benford’s law. It can be observed from Figure 2.3 that the distribution curves of these seven features
closely follow the Benford’s law curve for normal flows. Figure 4.1 shows data without any intrusion,
because there is no deviation between the expected and given data. Figure 4.2 contains attack flows, as
there is a deviation between the expected and given data.
CHAPTER 5
ADVANTAGES AND CHALLENGES
5.1 ADVANTAGES
The project we developed has several key advantages, particularly for applications in resource-constrained
environments such as IoT networks:
1. Efficiency and Resource Optimization:
By leveraging Benford's Law, which operates on statistical distributions rather than
exhaustive analysis of packet data, the system requires fewer computational resources,
making it well-suited for IoT and other low-power devices.
The proposed IDS uses a flow-based approach rather than packet-based inspection, focusing
on the size differences in network flows. This approach reduces data processing requirements,
conserving memory and processing power.
2. Scalability for Large Networks:
This lightweight system is designed to handle extensive IoT networks where traditional,
resource-intensive IDS models may not be feasible. The scalability is achieved by analyzing
fewer data points without compromising on detection effectiveness, allowing it to monitor and
protect large IoT networks with minimal performance impact.
3. Enhanced Detection of Anomalous Behavior:
The system’s reliance on Benford’s Law and flow size differences allows for effective
anomaly detection. By focusing on the natural statistical patterns in network traffic, it can
identify deviations associated with malicious activities, even in encrypted or obfuscated traffic
flows.
This approach is robust against various types of cyber threats, including denial-of-service
(DoS) attacks, port scans, and other network-based intrusions that may bypass signature-based
detection systems.
4. High Detection Accuracy and Low False Positives:
The use of linear regression and error metrics (e.g., Mean Absolute Error, Mean Squared
Error) enables the system to differentiate between normal and malicious flows with high
accuracy. The model adjusts its threshold values based on observed errors, helping to keep
false positives low, which is crucial in minimizing unnecessary alerts and reducing the
workload on security teams.
5. Adaptability for Different Attack Ratios:
The system demonstrates adaptability by maintaining detection accuracy even as the
proportion of malicious traffic in a network changes. This dynamic capability makes it highly
suitable for varying network conditions, which are common in real-world applications.
5.2 CHALLENGES
False Positives: Legitimate but unusual traffic patterns may deviate from Benford's Law,
potentially triggering false alarms.
Dataset Limitations: Reliance on the NSL-KDD dataset might reduce accuracy in diverse, real-
world scenarios with varied traffic behaviors.
Resource Constraints: Although designed for IoT devices, the IDS's statistical calculations
could still strain low-power devices.
Encrypted Traffic: Encrypted data poses a challenge, as the IDS depends on analyzing
unencrypted traffic patterns to detect anomalies.
Threshold Tuning: The IDS requires fine-tuning of detection thresholds to fit different network
environments, which may complicate deployment.
CHAPTER 6
6.1 CONCLUSION
The proposed intrusion detection system (IDS) leveraging Benford's Law and network flow size
differences provides an efficient solution for detecting malicious activity in resource-constrained
environments, such as IoT networks. By focusing on statistical patterns rather than exhaustive packet-level
analysis, this IDS offers a scalable and cost-effective alternative to traditional methods, allowing it to operate
with minimal computational overhead. Experimental results on the NSL-KDD dataset indicate strong
detection accuracy and low false-positive rates, affirming its potential for real-world applications in IoT and
large-scale networks. The implementation behaves as expected, with only slight, negligible variations in
its output.
Enhanced Attack Classification: Future work could integrate machine learning techniques for
classifying different attack types once detected, improving response strategies based on specific
threat categories.
Adaptation for Encrypted Traffic: Enhancements for analyzing encrypted traffic patterns without
Real-World Testing and Dataset Expansion: Testing the IDS on more diverse and contemporary
datasets would ensure adaptability to various network environments and emerging threat vectors.
Dynamic Threshold Adjustment: Implementing adaptive threshold mechanisms could help the IDS
Integration with IoT Security Frameworks: Deploying this IDS as part of a comprehensive IoT
security framework would allow for seamless threat management across heterogeneous devices and
REFERENCES
[1] Anna L. Buczak, and Erhan Guven, ”A Survey of Data Mining and Machine Learning Methods for Cyber
Security Intrusion Detection,” IEEE communication surveys & tutorial, vol. 18, no. 2, pp. 1153-1176,
(2016).
[2] Chandola V, Banerjee A, Kumar V. Anomaly detection: A survey[J]. ACM computing surveys (CSUR),
2009, 41(3): 15
[3] Z. Jasak and L. Banjanovic-Mehmedovic, ”Detecting Anomalies by Benford’s Law,” 2008 IEEE
[4] Iorliam A, Tirunagari S, Ho A T S, et al. ” Flow Size Difference” Can Make a Difference: Detecting
Malicious TCP Network Flows Based on Benford’s Law[J]. arXiv preprint arXiv:1609.04214, 2016.
[5] F. Benford, ”The law of anomalous numbers” Proc. Amer. Philos. Soc. , 78 (1938) pp. 551-572.
[6] Nigrini M J. The detection of income tax evasion through an analysis of digital frequencies[J]. Doctorat
[7] Li X H, Zhao Y Q, Liao M, et al. Detection of tampered region for JPEG images by using mode-based
first digit features[J]. EURASIP Journal on advances in signal processing, 2012, 2012(1): 190.
[8] Sambridge M, Tkalčić H, Jackson A. Benford’s law in the natural sciences[J]. Geophysical research
[9] Arshadi L, Jahangir A H. An empirical study on TCP flow interarrival time distribution for normal and
[10] Asadi A N. An approach for detecting anomalies by assessing the interarrival time of UDP packets and
flows using Benford’s law[C]//Knowledge-Based Engineering and Innovation (KBEI), 2015 2nd
[11] Iorliam A, Tirunagari S, Ho A T S, et al. ” Flow Size Difference” Can Make a Difference: Detecting
Malicious TCP Network Flows Based on Benford’s Law[J]. arXiv preprint arXiv:1609.04214, 2016.
[12] G. Meena and R. R. Choudhary, ”A review paper on IDS classification using KDD 99 and NSL KDD
dataset in WEKA,” in International Conference on Computer, Communications and Electronics, Jaipur, pp.
[13] A. Kumar, M. Sung, J. J. Xu, and J. Wang, “Data streaming algorithms for efficient and accurate
estimation of flow size distribution,” in Proceedings of 2004 Joint International Conference on Measurement
and Modeling of Computer Systems (ACM SIGMETRICS ’04/IFIP Performance ’04). ACM, 2004, pp. 177–
188.
[14] P. Tune and D. Veitch, “OFSS: Skampling for the flow size distribution,” in Proceedings of 2014
Conference on Internet Measurement Conference (IMC 2014). ACM, 2014, pp. 235–240.
APPENDIX
Program Code
import numpy as np
import math
import pandas as pd
import matplotlib
prob = 0
if (d == 0) and (n == 1):
prob = 0
elif n == 1:
else:
l_bound = 10 ** (n - 2)
u_bound = 10 ** (n - 1)
return round(prob, 3)
data.columns = ['original']
data['first_digit'] = data.cleaned.str[0].astype(int)
data_cols_length[n_i]
def calc_confidence_interval(alpha):
data_cols_length[n_i]))
calc_confidence_interval(alpha_level)
list_of_rules_violated = []
list_of_rules_violated.append((d_i, n_i))
data_that_violated = []
data_that_violated.append(number)
def plot_charts(digit_location):
x = np.arange(10)
width = .35
fig, ax = plt.subplots()
yerr=h_length_matrix[:, digit_location],
capsize=7,
label='Expected',
color='#d3d3d3',
figure=fig)
label='Given Data',
color='blue',
figure=fig)
ax.legend()
ax.set_xticks(x)
if digit_location == 0:
else:
plt.title(title, figure=fig)
plt.xlabel('Digit', figure=fig)
plt.ylabel('Frequency', figure=fig)
return fig
plt_0 = plot_charts(0)
plt_1 = plot_charts(1)
plt_0.savefig('first_digit_plot.png')
plt_1.savefig('second_digit_plot.png')
# Export CSV
violations_df.to_csv('transactions_to_investigate.csv')
if __name__ == "__main__":
    main('transactions_real.csv', .999)