Insider Threat Detection in Organization Using Machine Learning

P. Varsha Suresh
Computer Science and Engineering
Sree Buddha College of Engineering
Pattoor, India
varsha2361995@gmail.com

Ms. Minu Lalitha Madhavu
Computer Science and Engineering
Sree Buddha College of Engineering
Pattoor, India
minulalitha@gmail.com

Abstract—A cyber attack is a sudden attempt launched by cybercriminals against multiple computers or networks. With the evolution of cyberspace, the insider attack has become the most serious attack faced by end users all over the world. Insiders who perform attacks have an advantage over other attackers because they are familiar with system policies and procedures. The attack is performed by an authorized person such as a current employee, a former employee or a business organization. Cyber security reports show that both US federal agencies and other organizations face insider threats. Compromised users, careless users and malicious users are some of the sources of insider attacks. User-centric insider threat detection based on data granularity opens a new dimension for insider detection because the data is analysed in depth, but improper selection of features is a drawback. As a result, data granularity with a two-stage confirmation method is used in the proposed system. The first stage involves dual filtering using a hidden Markov model and fuzzy logic. In the second stage, the output predicted by the first stage is checked again using profile-to-profile or template-to-template comparison. The selection of the user's information as well as triple features for generating the training set is an additional advantage of the proposed approach. Two-stage confirmation leads to an increase in the performance measures with a very low false positive rate.

Index Terms—Cyber Security, Machine Learning (ML), Hidden Markov Model, Fuzzy Logic.

I. INTRODUCTION
Securing information from unauthorized access is known as information security. It is the practice of preventing the disclosure, disruption, modification, inspection and destruction of information without the knowledge of the user. Information can be anything, such as a user's details, a profile on social media, data on a mobile phone or biometrics. Authentication and authorization are the essence of information security. Authentication confirms a person's identity, while authorization grants appropriate privileges to an individual after the person's identity has been verified. Cyber security is the application of technologies to check and safeguard systems, networks, devices and data from cyber attacks. A cyber attack can illegally damage computers, steal data, or use a breached computer as a starting point for other attacks. Cyber security is the state of safeguarding computer systems against, and recovering them from, any type of cyber attack. Cyber attacks can be divided into insider and outsider attacks. There are different types of cyber attack, such as phishing, man-in-the-middle attacks and denial-of-service attacks. With the evolution of cyberspace, the insider attack has become the most serious attack faced by users in today's world. A threat that originates inside an industry or government firm and causes exploitation is known as an internal cyber threat or internal cyber attack. Insiders who perform attacks have an advantage over external attackers because they have approved system access and may also be familiar with the network architecture and system guidelines. Moreover, there may be less protection against internal cyber attacks because many firms focus on protection from external attacks. According to the Clearswift Insider Threat Index (CITI) [25] annual report, 74% of data security breaches in the past 12 months originated from insiders. Security attacks are classified by the source of the attack or by the behavior of the attack. The source defines the place from which the attack originates, and the behavior defines the aggressive behavior that leads to forceful access to data. By source, attacks are classified as insider or external attacks; by behavior, they are classified as active or passive attacks. External attacks originate from outside the organization; important examples are network security attacks and physical security attacks. A malicious attack caused by an individual within the organization is known as an insider attack. An insider may be a current employee, a former employee or a business associate. An active attack is more significant than a passive attack because it tries to modify the content of messages. In a passive attack, the attacker observes the messages or their content, and subsequent retransmission takes place.
According to a cyber security report, 25% of all attacks on organizations are due to insiders, and their number is increasing day by day, which makes the problem very important in the current era. According to a recently published IBM report, because of COVID-19, 53% of employees are working from home using personal laptops, and 61% of employees have not been provided with tools to properly secure those devices. This leads to the loss of secure data at the hands of a known person. Insider threats have been an issue for companies for a long time, but they have gained strength as systems have become increasingly interconnected. A Ponemon Institute study sponsored by IBM reports that insider-related incidents cost $4.3 million in 2016. In 2018, the
cost of these internal cyber attacks was $8.7 million. This is the big takeaway: the cost of data breaches is trending upwards both in the US and globally. In 2019 the number of attackers was also increasing tremendously, which leads to increasing losses for organizations. At the same time, individual users may be affected by an insider attack or by an internal friend who behaves as an attacker. The user's personal information or credentials are traced by the attacker without the knowledge of the user. As a result, the user may also face problems such as loss of money or personal property. Different techniques have been introduced to prevent insiders and their harmful attacks.

The objectives of this paper are to understand the harmful effects of internal cyber attacks and the losses faced by individuals and organizations over the years; to analyse different works in order to understand the moves made by an insider; to understand the different methods used by business organizations to stop insiders before an attack takes place; to propose a new and advanced technique to detect insiders and reduce the harmful effects; and to bring forward a novel two-stage confirmation technique to protect against insiders.

The important contributions of this paper are outlined as follows:
• We first give a brief idea of cyber security, followed by background on insider as well as outsider attacks.
• We provide information about the different types of insiders that exist in an organization.
• We present a novel approach using two-stage confirmation to protect against internal cyber attacks.
• We describe the use of two important machine learning algorithms, namely fuzzy logic and the Markov model.
• We bring in the concepts of anomaly-based detection and misuse detection in the second stage of confirmation.
• We describe the advantage of data granularity and of the proper selection of the user's information and triple features for the detection of insiders.
• Finally, we present a theoretical analysis comparing the existing and the proposed insider detection models.
The remainder of the paper is organized as follows. Section 2 describes the method of study. Section 3 provides an overview of insider cyber attacks. Section 4 describes the related works, explaining different insider detection techniques. Section 5 explains the proposed insider attack detection method. Section 6 presents the theoretical analysis comparing the existing and the proposed insider detection models. Section 7 concludes the paper and provides future directions.

II. METHOD OF STUDY
This paper proposes a novel approach for insider detection using two-stage confirmation. The objective is to propose a new approach which tackles the problems in the existing approaches and improves efficiency. The reviewed papers concentrate on diverse tasks that have been considered to study the negative effects of insider attacks (internal cyber attacks) on organizations, such as loss of money, reputation, and secret files and data of companies. The findings of these works are very effective for learning the issues and for working toward a solution. We spent most of the time exploring different sources such as Taylor and Francis, Elsevier, ScienceDirect, Springer, IEEE Xplore, and other computer science journals and conferences. As search sentences and keywords we used the application of cyber security in organizations and government agencies. We inspected every article's reference list to recognize any potentially applicable research or journal title. The publication period taken into consideration is 2010 to 2021. For the exploration we collected abstracts and keywords of PDFs, reports, documents and full-length papers. Furthermore, in searching for content we used journals, conference papers, workshop papers, topic-related publications, expert lectures or talks, and other topic-related communities such as Overview of insider attack, Insider attack challenge to organization, and Insider and Outsider Data Security threats. Different videos and lecture classes related to the harmful effects of insiders paved the way for the research work. Different review and research papers in reputed journals enabled us to understand the demerits of existing systems and the drawback of improper feature selection. So, for this research, we collected information from two sources and applied machine learning with all sincerity.

III. OVERVIEW OF INSIDER ATTACK
An attack that originates inside an industry or government firm and causes exploitation is known as an internal cyber threat or internal cyber attack. Insiders have an advantage over external attackers because they have permitted system access and are also familiar with the network architecture as well as system procedures. Moreover, there is less security against insider attacks than against external attacks.
Types of insider attack are:
• Malicious insider - A turncloak, who maliciously and intentionally abuses credentials such as passwords to steal information for financial or personal reasons.
• Careless insider - An innocent user who unknowingly exposes the system to outside threats. It is a common type of insider threat that arises from mistakes, such as leaving a device exposed. A careless insider may arise when an employee unknowingly clicks an insecure link, infecting the system with malware.
• A mole - A person who is actually an outsider but behaves as an insider to gain access to a privileged network. The outsider impersonates a worker in the organization.
An insider threat is one of the most expensive types of attack and the hardest to detect. It mainly occurs inside the organization and is carried out by a peer or colleague who is known to us. An employee may turn into an insider because of dissatisfaction with his work; a denied promotion or an unnecessary pay cut can change an employee's mind. Sometimes the company does not reward the employee properly, or a sudden termination leads the employee to become an insider. Such circumstances are often exploited by other agencies to turn the worker into an insider. Most probably, a legitimate user becomes an insider because of the negative effects of the environment in which they live.

IV. RELATED WORKS
An insider attack is a cyber attack performed by an authorized person such as a current employee, a former employee or a business organization. Insider or internal cyber attack detection systems have been proposed by several authors; their merits and demerits in detection and prevention are discussed in the following sections.

A. Insider attacks at SC level by using data mining and forensic techniques
Computer systems use user IDs and passwords as login credentials to authenticate users. Often, individuals share their login patterns with coworkers and ask them to assist with shared tasks. As a result, the safety of the pattern used is not up to the mark. Legitimate users of a system who perform malicious actions internally are difficult to detect, because intrusion detection systems and firewalls detect and block harmful actions that originate from outside the system. Fang-Yie Leu et al. [1] propose a system, named Internal Intrusion Detection and Protection System (IIDPS), to identify insider or internal threats at the system call (SC) level by mining the data as well as using forensic features. The IIDPS determines whether the logged-in user is the account holder by comparing the user's behavior with the users' personal profiles. The user's usage habits are used as the forensic feature for detection. The IIDPS consists of distinct elements: an SC monitor and filter, two types of server (a mining server and a detection server), a computational grid, and three repositories (user log files, user profiles and an attacker profile). The SC monitor and filter gathers the SCs submitted to the kernel and saves them in a format that consists of the user ID, the process ID and the SC passed by the user. The user's inputs are stored in the user's log file. The mining server analyses the log data with data mining techniques to learn the user's computer usage habits, which are then stored in the user profile. The detection server compares users' behavior patterns with the SC patterns gathered in the attacker profile and with those in the user profiles to detect harmful behavior and attackers. When an intrusion is identified, the detection server alerts the SC monitor and filter to isolate the harmful user from the protected system. The purpose is to stop the user from continuing to attack the system.
The ability to identify and prevent attacks has been analyzed in this paper. This has been done by means of an experimental testbed, consisting of 12 users, to verify the feasibility and accuracy of the IIDPS. The results show that mining information and using historic features for intrusion detection provide strong resistance against threats. The results also demonstrate that the IIDPS can effectively prevent the aforementioned attacks. The detection and mining speed of the approach is high. Similarity scores in this approach help to identify an unknown user accurately. The attack detection accuracy of the IIDPS is 94.29%, and the response time is less than 0.45 s, which means it can protect the system from insider attacks effectively and efficiently. The IIDPS takes 0.45 s to identify a user. The drawback of the data mining and forensic technique is that third-party shell commands are not used in this approach to improve system performance. The IIDPS may also detect inaccurately when a user's habits change suddenly.

B. Insider Attacks detection in Big Data Systems
Big data is a collection of information that is vast in volume and is growing exponentially. Its size and complexity are so large that none of the conventional data management tools can process it efficiently. In such a system, information security is a major challenge. From a customer's point of view, one of the main risks in adopting big data systems is trusting the provider who protects the information. Santosh Aditham et al. [2] propose a new system architecture in which insider or internal cyber attacks can be identified by using the replication of information on different nodes in the system. It uses a two-step threat detection algorithm and a secure communication protocol to identify processes running in the system. The algorithm has two steps: the first constructs the control instructions and the second matches them. The first step in attack detection is process profiling, which is conducted independently at every node to identify attacks; the second step is hash matching, which is performed by the replica data nodes to establish whether an attack is genuine. The architecture is a combination of independent security modules that work simultaneously and reside on individual nodes of the system. The information transferred between the security elements in the system architecture contains meaningful knowledge about the analysis of a process. Hence, a public key cryptosystem is used for secure communication. All information transferred by a node over the secure communication channel is encrypted using a private key, and copies of the keys are not available to anyone. The associated public key is shared with all the replica nodes with which a data node needs to communicate. The proposed security system is a combination of three parts: a secure communication protocol, process profiling and hash matching. The three parts are formed of different modules that need to be installed in the big data system. The secure communication protocol is used to send, receive or queue messages. Process profiling is used to analyze as well as encrypt data. Verification and consensus are used to decrypt, verify and notify about an attack.
Different methods are used to increase the privacy and security of data in big data systems. Meanwhile, providing a positive
vibe to users who trust big data. Techniques used to prevent insiders have been discussed in this paper. This has been done by means of real-world Hadoop and Spark tests, which indicate that the system needs to consider only 20% of the code to understand a program and suffers a 3.28% time overhead. The results show that the security system can be built for any big data system because of its external workflow.
The results also demonstrate that the system detects insider attacks quickly with low overhead. It aims to provide robust security for big data systems. The limitation of this approach is that methods are still needed to evaluate the system on security-related benchmarks. It also lacks a hardware architecture of security chips that can support the system.

C. Selective Jamming/Dropping Insider Attacks in Wireless Mesh Networks
Wireless mesh networks (WMNs) promise to extend high-speed wireless connectivity beyond what is possible with the current WiFi-based infrastructure. However, their unique architectural features leave them particularly vulnerable to security threats. Loukas Lazos et al. [3] state that the open nature of the wireless channel leaves it exposed to jamming attacks. Anti-jamming methods include some type of spread spectrum (SS) communication, in which the signal is spread across a large bandwidth according to a pseudo-noise (PN) code. SS protects wireless communications only as long as the PN codes are kept secret. An insider who knows the commonly shared PN codes can still launch jamming attacks. A WMN has a two-tier network architecture. The first tier consists of end users, also known as stations (STAs), which are connected to mesh nodes known as mesh access points (MAPs). The second tier is made up of peer-to-peer interconnections of the MAPs. Connectivity in the second tier is supported by intermediate routers known as mesh points (MPs), which connect MAPs (MPs do not accept connections from end users). The interconnection of MAPs and MPs is usually static in nature and uses different frequency bands to transmit data and control information. Finally, mesh gateways (MGs) provide connectivity to the wired infrastructure. Internal attacks, also known as insider attacks, which are launched from compromised nodes, are much more sophisticated in nature. These attacks use knowledge of network secrets and protocols to target critical network functions carefully and secretly. They cannot be prevented using only proactive methods that rely on network secrets, because the attacker already knows them. If selective jamming is not effective because of anti-jamming measures, an insider can selectively drop packets. Once a packet has been received, the compromised node can inspect the packet headers, classify the packet, and decide whether to forward it or not. Such an action is often termed misbehavior. Post-reception dropping is less flexible than selective jamming because the adversary is limited to dropping only the packets routed through it.
The ability to carry out selective jamming or dropping insider attacks in wireless mesh networks has been analyzed in this paper. The results also demonstrate that it helps to avoid the use of common secrets for protecting broadcast communications. The limitation of the approach is that the heightened level of security comes at the expense of performance. A further open issue is the need for accurate behavioral monitoring mechanisms that do not depend on continuous overhearing, and for proper maintenance and dissemination of reputation metrics.

D. Geo-Social Insider Threat Resilient Access Control Framework (G-SIR)
Insiders are regarded as the most dangerous and costly threat to institutions. These attacks are carried out by a user or person who has authorized access to the system. Preventing insider attacks is a daunting task. Nathalie Baracaldo et al. [4] propose the framework G-SIR to deter insider threats by including current and historic geo-social knowledge as part of the access control (AC) decision process. By analyzing users' geo-social behavior, insiders can be identified as those users whose access behavior varies from the normal patterns; such anomalous behavior can indicate potential insider attackers who may intentionally carry out harmful activities. Such information is used to determine how trustworthy a user is before granting access. The proposed system consists of an AC policy specification and an enforcement strategy designed to understand users' geo-social behavior. The AC component collects present and historic geo-social interactions to predict whether an access should be granted or denied. A role may have a spatial scope that specifies particular locations where it can be activated by the users assigned to it. Geo-social constraints indicate areas that users assigned to the constrained role cannot visit and users they cannot frequently meet. Vicinity constraints impose restrictions on individuals who may or may not be within a particular distance of the requester at the time of an access. There are two main types of vicinity constraint: inhibiting and enabling constraints. Inhibiting constraints state that a permission needs to be denied if inhibiting users are identified in the vicinity. Enabling constraints are used to understand the importance of a permission in the vicinity of the requester. Geo-social trace-based constraints require a user to follow a certain geo-social path before they can be authorized to access a particular resource. Geo-social obligations are geo-social actions that users need to fulfil after they have been granted a permission. All monitoring and likelihood computations described take place in the Monitoring, Context and Inference Module. It specifies the context of a user, which contains information such as the current device used by the individual and the type of connection used. The Access Control Module is in charge of making AC decisions.
The ability of G-SIR to identify and prevent attacks has been evaluated in this paper. This has been done by means of a distinct indoor simulator implemented in Java, and the results indicate that this is the first effort to use geo-social information to deter insider threats by integrating it into the AC mechanism.
The results also demonstrate that G-SIR is efficient, scalable and effective in detecting insiders by including historic and geo-social
knowledge. It contains various geo-social constraints, and enforcing these conditions helps to minimize the risk of proximity, social engineering and probing threats. The limitation of this approach is that behavioral knowledge should be considered at the time of AC decisions. However, designing a system that uses such information without expanding the risk exposure is a challenging task.

E. Addressing the DAO Insider Attack in RPL's Internet of Things Networks
In the RPL routing protocol, DAOs, which are control messages, are passed by child nodes to their corresponding parents to build downward routes. A malicious insider node can exploit this characteristic to send fake DAOs to its parents periodically, triggering those parents to forward the fake messages to the root node. This can have a harmful side effect on the performance of the network: power consumption and latency can increase drastically, and reliability can be reduced.
RPL arranges its physical network into Directed Acyclic Graphs (DAGs). If a DAG is rooted at a single destination, it is known as a Destination-Oriented DAG (DODAG). To support upward traffic patterns, a DODAG topology centered at the network root must be constructed. The construction of the DODAG starts with the root multicasting control messages called DODAG Information Objects (DIOs) to its RPL neighbors. Baraq Ghaleb et al. [5] describe an RPL node as a reachable destination from the root. An important fact is that a DAO message sent by a child node leads to the forwarding of as many copies of the DAO as there are intermediate parent nodes. An adversary can exploit this to harm the network by continuously transmitting DAOs to its parent node. To counter a DAO insider attack in RPL, a new approach called SecRPL is used, which restricts the number of DAOs forwarded by a parent. There are two options for applying this restriction. The first is to limit the total number of forwarded DAOs regardless of the source node; the second is to limit the number of forwarded DAOs per destination. The second option is better than the first, which may block some DAOs coming from non-malicious nodes: it may easily block the DAOs of some nodes while having no effect on the DAOs of others. In addition, the parent node maintains a counter for each child node in its sub-DODAG. If the number of forwarded DAOs exceeds a threshold value, the parent discards any further DAO message. To ensure that no node stays blocked over time, the counter is reset between two consecutive DIOs: when the parent node sends a DIO message, all child node counters are reset.

F. Securing Wireless Medium Access Control Against Insider Denial-of-Service Attacks
A malicious user (attacker) who compromises the network can launch more harmful denial-of-service (DoS) attacks than an external user by sending a large number of reservation requests to block the bandwidth. SecureMAC is an approach used to protect against such insider threats. It consists of four components: channelization, which blocks large reservations; randomization, which counters reactive targeted jamming; coordination, which prevents control-message-aware jamming and resolves over-reserved and under-reserved spectrum; and power attribution, which determines each node's contribution to the power. Sang-Yoon Chang et al. [6] present a general handshake-based MAC framework that specifies how a packet is sent and how the transmitter makes a MAC-layer decision based on its observations and on knowledge from previous transmission rounds. Channels are reserved for data transmission; the reserved channels are used to transmit data packets, and feedback is obtained from the receiver as well as the network.
In wireless MAC, an internal attacker can carry out the following actions, which are more damaging than those of an external user: false reservation injection, which holds channel resources without using them; false feedback distribution, which announces wrong data to twist MAC control in the attacker's favor; and MAC-aware jamming, in which jamming is based on the received control messages. False reservation denies bandwidth to actual users while consuming few insider resources, so it uses network resources out of proportion to the attacker's effort and is a more efficient attack than jamming. The goal of channelization is to distribute spectrum bandwidth to each user in proportion to their power capability, guaranteeing a specific power spectral density. Channelization decisions are made only once per round. Coordination solves these problems by refining the bandwidth allocation and the randomization output, resolving conflicting reservations and assigning transmissions to spectrum that would otherwise go unused. Finally, after each round of transmission is over, each node carries out power attribution to calculate the amount of power contributed to the data communication by each node.

G. Securing VPN from insider bandwidth flooding attack
The insider attack is launched by users residing within the trusted zone of the VPN site. They are legitimate users of the VPN service. Flooding packets are used to attack the VPN service. Safeguarding against insider or internal cyber attacks is more difficult than safeguarding against external cyber attacks, because the attack is launched by users who have authorized access to the VPN service. This type of flooding attack disrupts the VPN service for its other legitimate users. Network security usually deals with safeguarding the network perimeter from outside threats, even though internal attacks are more serious. The aim is to add a bandwidth control mechanism that limits the bandwidth of each individual. The bandwidth control mechanism must ensure that the traffic through the reserved bandwidth is within the allowable limit. The control mechanism is used to drop packets from the flooding source, which protects authorized users from the attack. Saraswathi Shunmuganthan et al. [7] describe a Virtual Private Network (VPN) as an encrypted connection over the Internet from a device to a network. It helps
to transmit data from a branch office to main office. the risk is used to increase amount of risk that is faced by
Flooding is a routing algorithm present in computer network the attacker, when he/she do a malicious action which is
in which all arriving packet is passed through every other link harmful to organization. It also reduce the excuse which made
not on the link from which it has came. by worker in doing mistake, since they fear to do it again.
VPN site 1 and VPN site 2, are connected to gateway routers Reward which the attacker get by doing mistake is also reduce
called customer edge (CE). CE1 and CE2 are interconnected drastically.
to provider edge (PE) routers PE1 and PE2. Bandwidth is Social bond theory include attachment with the organization. If
actually maximum data transfer rate over network. Customer their is any problem with the organization as well as worker,
Edge router ensure that bandwidth allocated to VPN site is chance of performing attack is more. Commitment with the
being fairly distributed among users to avoid insider attack. organization is also considered. If a person is commitment
It employs entropy based probabilistic model at CE router to with organization, he/she will not do any negative thing to the
rate limit of insider attack traffic. Entropy is used to measure organization. Workers involvement with the organization show
the uncertainty. Entropy is used to calculate deviation of user that whether he is an attacker or normal user. If user is sensitive
from normal use age. towards organization, he/she will not do any malicious activity.

H. Insider threat risk prediction based on bayesian network J. User Behavior Modeling and Anomaly Detection Algo-
Bayesian network is a graphical model based on probability, rithms For Insider Detection
consist of a number of variables via directed acylic graph Junhong Kim et al. [10] demonstatre user behavior-
(DAG) is used to show conditional dependencies. Nebrase modeling phase, where each user’s behaviors are converted in
Elmrabit a et al [8] demonstrate that the features which to daily activity summary, e-mail contents, and e-mail com-
used by the graph are technological aspects, Organizational munication network. Anomaly detection algorithm consist of
impact and Human Factors. Information are collected from Gaussian density estimation (Gauss), Parzen window density
organization and particular measure sealing to ensure insider estimation (Parzen), principal component analysis (PCA) and
threat breaches are kept to minimum. Investment balance K-means clustering (KMC) are algorithms used for separation
is the balance between investment in insider and outsider of pattern. gaussian density estimation which is important
threat is key to understand insider threat breaches. Detection anomaly detection algorithm is used exhibit probability distri-
level is the measurement of how accurate detection system bution of variable which distributed randomly. Parzen window
with regards to previous insider attack. Security and privacy classification is used for density estimation. It find a point of
control include forensic evidence, network as well as email interest. Only the features inside the window is considered to
logs. Organizational Impact is the information related to the find which group the point of interest is present. It is used to
way in which organization is structured and how insider calculate output probability when a point is given. Principal
threat breaches are managed. Organizational impact deal with component analysis is used in dimensionality reduction for the
information like security breaches, Structure , security policy reduction of noise or unwanted data. Dimensional Reduction
as well as employee work-related stress symptoms. consist of feature selection and feature extraction. PCA comes
Security breaches include breaches that have occured histor- under feature extraction in order to reduce noise or error. As
ically with in the organization. Structure include information the number of feature decreases, processing will be fast. K-
about recruitment procedure, previous employment screening. mean clustering is an unsupervised algorithm does not have
Security policy contain information related to organizational labelled data. Set of data is put together in a group or cluster.
security policy. The fragile link in an information security cluster consist of object which is similar in nature. K denote
chain is one and only human factors. It include motivation the number of cluster or group.For best classification of data in
which include motivation for showing misbehavior, Oppor- to different group, appropriate cluster need to find.The attack
tunity is the factors which is available to perform attack. observation model surrender at most 53.67% of the detection
Capability include the power to do something by the fellow rate by only tracking the top 1% of malicious or suspicious
being. instances.
The papers [11], [12] propose many insider detection meth-
I. Motivation And Opportunity Based Model ods like alarm filters and Psychological model to predict
Situational crime prevention theory (SCPT) opportunities malicious behaviour.
for misbehaviour is lowered to an extent.Social Bond Theory
(SBT) can be used to help understand motivation to engage V. PROPOSED SYSTEM
in misbehaviour. Raise in effort, risk and lower the rewards, Insider Detection techniques have been developed to protect
stimulation, keep away exempt are the elements considered in the system from wide variety of internal attack performed by
SCPT. Nader Sohrabi Safaa et al. [9] explains SBT pivot on current working employee, Pre-Working employee or Business
mainly four factors such as organisation attachment, realtion Organization. It is used to safeguard the privileges, reputation,
with institution or organisational , involve n a particular work, key documents and economics of the organization or institu-
and personal standard. Increase the effort is used to raise the tion. Several feature extraction techniques, Machine learning
amount of effort which is taken to perform attack. Increase schemes, Psycho metrical and behaivour schemes are used to
Fig. 1. Architecture Of Insider Detection Based on Two Stage Confirmation

Fig. 2. Source Of Data Collection


detect insiders. However, due to the lack of proper arrangement or proper capture of data, these techniques remain ineffective or their performance is not up to the mark.
Feature-based decision modelling is required to identify more attacks. Therefore, a novel insider threat detection method is proposed that uses a two-tier mechanism of fuzzy logic and a Markov chain, collecting temporal, geographical and connection or re-connection features.
The proposed system for malicious behavior and insider or internal cyber attack detection is shown in Fig. 1. It has the following stages:
1) Data Collection: Data gathered from different sources are stored in a specified format. The two main categories are
• User's Information.
• Triple features.
2) Pre-processing: Features are identified and the best features are filtered from the aggregated data.
3) Data granularity: An identifiable collection of data segments is called a granule.
4) ML engine based Data analysis: Two-tier processing of data based on two important machine learning algorithms.
• Fuzzy logic (FL) - Fuzzy logic uses the method of human reasoning to derive a solution.
• Hidden Markov Model (HMM) - It is a machine learning model in which the current state depends on the previous state.
5) Detection Module: It is specifically for identifying the insider in the organization.
6) Malicious Behaviour Analysis or Behavior Analysis: This stage is primarily meant for finalizing the insider as well as rejecting normal users.
7) Action Module: The insider who behaves as a normal user will face the consequences.

A. Data Collection
The entire system works under the supervision of a security analyst. A security analyst is a person who makes detailed studies to protect the system from unauthorized attacks or cyber threats. The aggregation of data from different sources and its further processing is done by the system analyst. A good observation method together with proper collection of data leads to successful application of ML techniques and also assists security analysts in making correct decisions. The system analyst mainly concentrates on collecting information related to the user's daily actions during and after working hours on the user's PC, shared PCs and websites. Fig. 2 represents the data collection process in the proposed insider detection system. The system analyst collects information useful for the detection of insiders from the user's PC, concentrating on two main categories of details: the user's information and the triple features. The user's information is mainly collected from the Login, Http, Email and Connect features, which are stored as comma-separated (.csv) files. Triple features are mainly collected from geographical, connect or re-connect, and temporal features. These
are the three most important features used to detect insiders; hence they are known as the triple features.
• User's Information: User's information is collected mainly from the Login, Email, Http and Connect datasets. These are background data that give an idea of the user's motive or work within the organization, and a good idea of the normality or abnormality of the user's behavior. These are actual data that need to be updated periodically, since the user's behavior may change at any time. The above-mentioned input datasets contain the following:
– Http: It contains ID, date, action, PC and URL information. Here ID means the MAC-ID, the machine access identification. No two machines share it, so the MAC-ID is a unique name for identifying the user's PC. Date is the system date on which the data transfer takes place. PC denotes the number of the PC that performs the operation. Action denotes the specific action the user performs. URL stands for Uniform Resource Locator; it includes the domain name along with other detailed information that directs the browser to a certain webpage.
– Email: ID, date, pc, from, attachment, size, content, and bcc and cc email addresses. ID means the MAC-ID, date is the system date, PC denotes the number of the PC that performs the operation, from denotes the sender address, attachment denotes the count of attachments, size denotes the size of the file, and content indicates the details transferred. Bcc (blind carbon copy) allows the sender of the email to conceal the addresses entered in the bcc field from other recipients. Cc means carbon copy, where each recipient can see to whom the message has been sent.
– Login: ID, date, pc and activity. ID means the MAC-ID, date is the system date, PC is the PC number, and activity indicates login and logoff events.
– Connect: ID, date, user, pc and activity. ID is the MAC-ID, date is the system date, user indicates the user name, pc indicates the PC number and activity indicates how many times the user connects or disconnects.
• Triple Features: The triple features are the three main features that contribute to insider detection and provide extra mileage for detection. These are more real-time features. By adopting these features, attackers can be analyzed in depth.
– Geographical: It indicates features of location, area or region. Here, longitude and latitude are considered. A blacklisted area is one where entries are blocked and not considered. It contains elements that are not automatically permitted to access a certain area.
– Temporal: The temporal feature helps to know the particular time at which an attack occurred. The time specification helps to know which attacker was logged in during that time. Since this is an insider attack, performed by a current or pre-working employee of the organization or another internal user, the time specification helps to identify the internal attacker easily. It is time bound and is mainly used to find the time zone as well as the time region.
– Connection or Dis-Connection States: The count of connection and re-connection phases helps to detect the arrival of an attacker. When an insider or internal user misbehaves, a disconnection occurs, which helps to reveal the presence of abnormality. This also contributes to the model building of threat profiles. It indicates the presence of an insider at a particular moment. The effects of disconnection and reconnection states are to be logged in the feature extraction module.

B. Pre-Processing
This stage deletes non-continuous records and records with no data. It identifies all the features properly and filters or selects the best features based on relevant values. It addresses the "garbage in, garbage out" problem of the system. Analysing data carefully helps to remove misleading results and improves the quality of data before any analysis is run. A dataset with missing values leads to wrong results, which can be avoided by applying pre-processing. If there are irrelevant, redundant, noisy and unreliable data, then knowledge discovery during the training phase is more difficult, and the processing time for the preparation and filtering steps increases. Data preprocessing mainly includes cleaning, instance selection, feature extraction and selection. The output of data preprocessing is the final training set.

C. Data granularity
An identifiable collection of data segments is called a granule. In a dataset, certain fields are combined to predict a particular behavior; such data is called granule data. Data granularity is the splitting or fragmenting of data into multiple pieces or granules. The greater the granularity, the deeper the level of detail. Increased granularity can help to drill down into the details of each organization and assess its efficacy and efficiency. The different datasets containing login, http, email, content and triple features are combined to generate the granule data. Here the granule data is the training data. User-centric insider threat detection based on data granularity provides an additional advantage to the proposed approach, as it considers microscopic features to break down the data. Generally, multiple granularity means hierarchically breaking up the database into blocks that can be locked, and tracking what needs to be locked and in what fashion. Such a hierarchy can be represented graphically as a tree.
Hierarchical representation of data granularity is shown in Fig. 3. It is a tree made of four levels of nodes. The highest level represents the entire dataset.
Fig. 3. Hierarchical representation of data granularity

Below it are nodes of type source, which denote the source of information. The dataset consists of exactly these sources of information. Source 1 has a child node called user's information. Source 2 has a child node known as triple features. Finally, user's information has the child nodes http, email, login and connect, and triple features has the child nodes geographical, connect or reconnect, and temporal features. These are comma-separated files, and no file can be present in more than one source of data. Hence, the levels starting from the top level are:
• DataSet.
• Source of Information, namely source 1 and source 2.
• User's Information and Triple features.
• Comma Separated files.
For example, User-Week, User-Day and User-Session are granule data of the user's login information.

D. Machine learning based Data Analysis
This stage uses machine learning for the processing of data and the detection of insiders. Machine learning rests on the idea that systems can learn from data, understand patterns and draw conclusions with reduced human intervention. Two-stage filtering or two-stage pruning is an added advantage of this approach. Two important machine learning algorithms are used for this purpose: the Hidden Markov Model (HMM) and fuzzy logic (FL). Microscopic-level detection takes place in the two-phase detection, where these two prominent algorithms help to accurately classify the data as attackers and normal users.
Fuzzy logic is used for handling combinations of attributes. It is used for feature aggregation and feature reduction, so the efficiency and speed of insider detection increase. Fuzzy logic is an important technique in which the truth value of a variable may be any real number between 0 and 1. It uses the concept of partial truth, where the truth value varies between completely true and completely false. Fuzzy logic considers all possibilities and the human way of decision making. If the membership value of a particular group is above a threshold, then it is considered an important feature for detecting insiders. Attribute-based comparison helps to identify the insider at a microscopic level. It helps to identify malicious users as well as malicious behavior. Fuzzy logic also helps to keep the false positive rate small.
The Hidden Markov Model (HMM) assumes an underlying Markov process. It moves through different states from the start state to the end state. An important property of the hidden Markov model is that the current state depends on the previous state. The Markov process that happens behind the scenes, hidden from the rest of the world, is the hidden Markov process. Based on this property, a person is considered an insider based on previous suspicious actions. An action such as attaching multiple mails per second is a malicious or doubtful action. This hint is considered in the next step for detecting the insider.
The Markov chain is used in the first phase. The remaining conditions are verified using fuzzy logic. This two-phase comparison helps to improve the accuracy of detection.
Fig. 4 illustrates the process of developing the proposed architecture. Initially the datasets of http, login, connect, email and triple features are extracted from .csv format files by the data collection module. After pre-processing and filtering of the best features, the data are converted into granule data. The granule data are split into training and testing data. The two-stage detection model, which uses fuzzy logic and Markov, is trained with the training data. Once the model has been trained, input data selected from the testing dataset are given to the model. The model predicts the MAC-ID of the insider as output. After the MAC-ID is generated, a second-stage confirmation is done through profile-to-profile or template-to-template comparison. Only after the two-stage confirmation is the final decision on the insider reached. Based on that, an alert or warning is generated by the system analyst according to the decision of the organization.

E. Detection Module
The MAC-ID generated by the fuzzy and Markov stages is passed to the detection module. In the detection module, the system recognizes the insider within the organization. The MAC-ID is a unique identification code; each electronic device has its own MAC-ID.

F. Malicious Behavior Analysis
This module enhances the efficiency of the decision-making process. Profile-to-profile or template-to-template comparison is an added advantage. Here, a two-stage confirmation occurs; that is, it strengthens the decision that is made. The second level of comparison occurs in malicious behavior analysis.
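As a rough illustration of the two-tier analysis described in Section D, the following Python sketch scores a user's action sequence with a first-order Markov chain and passes only unlikely sequences on to a small fuzzy rule. It is not the authors' implementation: the state names, the feature names (attachments_per_min, after_hours_logins, reconnects), the membership ranges and both thresholds are assumptions chosen for the example, and a plain Markov chain stands in for a full HMM.

```python
import math
from collections import defaultdict


def train_markov(sequences, states, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = {s: defaultdict(float) for s in states}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
    probs = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        probs[s] = {t: (counts[s][t] + alpha) / total for t in states}
    return probs


def avg_log_likelihood(seq, probs):
    """Mean log-probability per transition; lower means less like normal use."""
    steps = list(zip(seq, seq[1:]))
    return sum(math.log(probs[a][b]) for a, b in steps) / len(steps)


def ramp(x, low, high):
    """Fuzzy membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzy_suspicion(features):
    """Hypothetical rule base: (many attachments AND after-hours) OR many reconnects."""
    m_mail = ramp(features["attachments_per_min"], 2, 10)
    m_hours = ramp(features["after_hours_logins"], 1, 5)
    m_conn = ramp(features["reconnects"], 3, 12)
    return max(min(m_mail, m_hours), m_conn)  # min = fuzzy AND, max = fuzzy OR


def two_stage(seq, features, probs, ll_threshold=-1.5, fuzzy_threshold=0.5):
    """Stage 1: Markov likelihood filter. Stage 2: fuzzy check of the rest."""
    if avg_log_likelihood(seq, probs) >= ll_threshold:
        return False  # sequence looks like normal behaviour
    return fuzzy_suspicion(features) >= fuzzy_threshold


# Illustrative data only (assumed state names and feature values).
states = ["login", "http", "email", "connect", "logoff"]
normal = [["login", "http", "email", "http", "logoff"]] * 20
probs = train_markov(normal, states)

odd = ["login", "connect", "email", "email", "email", "connect", "logoff"]
risky = {"attachments_per_min": 8, "after_hours_logins": 4, "reconnects": 2}
print(two_stage(normal[0], risky, probs), two_stage(odd, risky, probs))
```

In this sketch the Markov stage acts as a coarse filter, so the fuzzy rule only has to judge sequences that already look unusual; a user flagged here would then go on to the profile comparison of this module.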
Fig. 4. Process of Insider Detection Based on Two Stage Confirmations

In profile-to-profile or template-to-template comparison, an automatic profile creation of attackers occurs. In profile-to-profile comparison, an automatic profile of each employee is created in the database based on day-to-day activities. It is a real-time feature that gets updated periodically. In template-to-template comparison, a template of a particular individual is created in advance, based on the expectation that the individual will behave in a particular way; it is a pre-built expectation about an employee. It helps to identify insiders easily. Here, the concepts of misuse detection and anomaly detection are used. Misuse detection is signature based and can only detect known insiders, by matching the features of incoming insiders against historical knowledge and predefined rules. Anomaly detection automatically constructs a normal behavior of the insiders and detects incoming insiders by computing deviations. It can detect new insiders that are not detected by the two-stage pruning of the fuzzy and Markov processes.
A sudden change in user behaviour can be identified using profile-to-profile or template-to-template comparison. Profile comparison helps to identify attackers quickly and accurately. Using these features, automatic profile creation of attackers helps to identify them very easily.
After the two-stage confirmation by the machine learning algorithms and profile-to-profile comparison, the false rate is reduced considerably.

G. Action Module
In the action module, the system analyst analyses the predictions of the machine learning algorithms and their performance measures. After the two-stage confirmation, the final decision on the insider is made. The system analyst compares the predicted MAC-ID with the original report, identifies the employee who behaves as an insider, and forwards the information to the organization. Based on the order from the organization, the system analyst generates an alert, produces a warning or blocks certain URLs. In severe cases, the insider who behaves as a legitimate user faces suspension. For a big loss, the insider will be sent to prison and the entire gang of insiders will be found, protecting the reputation and secret files of the organization and, meanwhile, increasing the trust of users.

VI. THEORETICAL ANALYSIS ON INSIDER DETECTION
A theoretical analysis of the existing and the proposed systems is performed. The existing mechanisms implement only psychometric and behavioral features and certain machine learning algorithms without proper collection of features to predict insiders, which leads to a decrease in performance measures. The proposed method, in contrast, implements a two-stage confirmation method. The first stage uses a novel method of detection using fuzzy logic and Markov features. It is a two-stage filtering or pruning method. Machine learning is a more advanced and efficient technique than the existing ones, and it builds a more effective and accurate model than the existing behavior-based ones. The second stage of confirmation, using profile-to-profile comparison, helps to reduce the false positive rate. The proposed model also uses user-centric insider threat detection based on data granularity, which helps to develop more accurate models than the existing ones. Data granularity is used for microscopic-level detection: each small feature for the detection of insiders is considered with high importance. Since the accuracy level is expected to be comparatively higher for the proposed system, efficiency will also be higher for the machine learning based insider detection model using two-stage confirmation, when compared with the existing insider detection models. The selection of real-time features such as temporal, geographical and connection or re-connection features, which are the triple features, helps to isolate the attack model. This type of filtering method is absent in existing systems, which increases the mileage of the proposed system.
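The second-stage confirmation can be pictured with a minimal Python sketch, assuming that a profile is simply the per-feature mean and standard deviation of a user's past daily counts and that deviation is measured as a z-score. The feature names, the z-score measure, the min_std floor and the threshold of 3 are illustrative assumptions, not values from the proposed system.

```python
from statistics import mean, pstdev


def build_profile(history):
    """history: list of dicts of daily counts -> {feature: (mean, std)}."""
    profile = {}
    for key in history[0]:
        values = [day[key] for day in history]
        profile[key] = (mean(values), pstdev(values))
    return profile


def deviation(profile, today, min_std=1.0):
    """Largest absolute z-score of today's counts against the stored profile.

    min_std is a floor (assumed here) so that near-constant features do not
    produce huge scores from tiny changes.
    """
    return max(
        abs(today[key] - mu) / max(sd, min_std)
        for key, (mu, sd) in profile.items()
    )


def confirm_insider(ml_flagged, profile, today, z_threshold=3.0):
    """Confirm only when the ML stage flagged the user AND the profile deviates."""
    return ml_flagged and deviation(profile, today) >= z_threshold


# Illustrative daily counts for one user (made-up numbers).
history = [
    {"emails": 10, "logins": 2, "connects": 0},
    {"emails": 12, "logins": 2, "connects": 1},
    {"emails": 11, "logins": 3, "connects": 0},
    {"emails": 9, "logins": 2, "connects": 0},
]
profile = build_profile(history)
usual = {"emails": 11, "logins": 2, "connects": 0}
unusual = {"emails": 40, "logins": 2, "connects": 6}
print(confirm_insider(True, profile, usual), confirm_insider(True, profile, unusual))
```

Requiring both conditions is what lowers false positives in this sketch: a user flagged by the first stage is dismissed if the day's behaviour still matches the stored profile.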
VII. CONCLUSION AND FUTURE WORK
An insider detection model has been developed using a novel approach of fuzzy logic and the Hidden Markov Model (HMM) for predicting the MAC-ID of the insider in the organization. User-centric insider threat detection based on data granularity is an important path for the detection of insiders. An identifiable collection of data segments is called a granule. It helps to drill down to microscopic-level features for the detection of insiders. A two-stage confirmation technique is used. In the first confirmation stage, a two-stage detection technique using two important machine learning methods, fuzzy logic and Markov, improves the effectiveness of detection: the Hidden Markov Model is used in the first stage and fuzzy logic in the second. In the second stage of confirmation, profile-to-profile or template-to-template comparison is used. Two-stage confirmation reduces the false positive rate, and based on the identification of the insider, a change in internal policy occurs.
A limitation left for future work is that the Markov model cannot accurately estimate the conditional probability between two states. The current work can be enhanced in this direction, so that this limitation of Markov analysis can be overcome. Modifying the existing Markov model with a fuzzy relation of attributes, to obtain a novel system that can predict risky outcomes from existing attributes, is a new direction for research.

VIII. ACKNOWLEDGEMENT
This research was supported by Dr. K Krishnakumar, the head of the institution. We would also like to show gratitude to the head of our institution, Dr. S.V. Annlin Jeba, for sharing her pearls of wisdom with us during the course of research. We thank our colleagues from Sree Buddha College Of Engineering who provided insight and expertise that greatly assisted us, although they may not agree with all of the interpretations and conclusions of the paper.

REFERENCES
[1] Fang-Yie Leu, Kun-Lin Tsai, Yi-Ting Hsiao, and Chao-Tung Yang, "An Internal Intrusion Detection and Protection System by Using Data Mining and Forensic Techniques," IEEE Systems Journal, 2015.
[2] Santosh Aditham and Nagarajan Ranganathan, "A System Architecture for the Detection of Insider Attacks in Big Data Systems," IEEE Transactions on Dependable and Secure Computing, 2017.
[3] Loukas Lazos and Marwan Krunz, "Selective Jamming/Dropping Insider Attacks in Wireless Mesh Networks," Scopus Indexed Journal, 2011.
[4] Nathalie Baracaldo, Balaji Palanisamy, and James Joshi, "G-SIR: An Insider Attack Resilient Geo-Social Access Control Framework," IEEE Transactions on Dependable and Secure Computing, 2017.
[5] Baraq Ghaleb, Ahmed Al-Dubai, Elias Ekonomou, Mamoun Qasem, Imed Romdhani, and Lewis Mackenzie, "Addressing the DAO Insider Attack in RPL's Internet of Thing," IEEE Communications Letters, Vol. 23, No. 1, January 2019.
[6] Sang-Yoon Chang and Yih-Chun Hu, "SecureMAC: Securing Wireless Medium Access Control Against Insider Denial-of-Service Attacks," IEEE Transactions on Mobile Computing, 2016.
[7] Saraswathi Shunmuganathan, Renuka Devi Saravanan, Yogesh Palanichamy, "Securing VPN from insider and outsider bandwidth flooding attack," Elsevier journal, 2020.
[8] Nebrase Elmrabit, Shuang-Hua Yang, Lili Yang, Huiyu Zhou, "Insider Threat Risk Prediction based on Bayesian Network," Elsevier Journal on Computers and Security, 2020.
[9] Nader Sohrabi Safa, Carsten Maple, Tim Watson, Rossouw Von Solms, "Motivation and opportunity based model to reduce information security insider threats in organisations," Journal of Information Security and Applications, 2017.
[10] Junhong Kim, Minsik Park, Haedong Kim, Suhyoun Cho and Pilsung Kang, "Insider Threat Detection Based on User Behavior Modeling and Anomaly Detection Algorithms," Appl. Sci., 2019.
[11] Guang Yang, Lijun Cai, Aimin Yu and Dan Meng, "A General and Expandable Insider Threat Detection System Using Baseline Anomaly Detection and Scenario-driven Alarm Filters," IEEE International Conference On Trust, Security And Privacy In Computing And Communications, 2018.
[12] Guang Yang, Lijun Cai, yuaimi, JianGang, Dan Me and YuWu, "Potential Malicious Insiders Detection Based on a Comprehensive Security Psychological Model," IEEE Fourth International Conference on Big Data Computing Service and Application, 2018.

Ms. P. Varsha Suresh completed her B.Tech (CSE) at Sree Buddha College Of Engineering, Elavumthitta in 2018 and is currently pursuing an M.Tech (CSE) at Sree Buddha College of Engineering, Pattoor.
Mrs. Minu Lalitha Madhavu pursued her Bachelor of Technology at Rajiv Gandhi Institute of Technology (RIT). She received her M.Tech degree in Technology Management from Kerala University and is pursuing a PhD at the University of Kerala. She is currently working as an Assistant Professor in Computer Science and Engineering at Sree Buddha College of Engineering. She has published around 25 research papers in reputed international journals. Her main areas of research are network and security. She has more than 14 years of experience as an Assistant Professor in Computer Science at Sree Buddha College Of Engineering.
