A Two-Stage Aggregation/Thresholding Scheme for Multi-Model Anomaly-Based Approaches



Karim Tabia, CRIL CNRS UMR 8188, Artois University, France. Email: tabia@cril.univ-artois.fr
Salem Benferhat, CRIL CNRS UMR 8188, Artois University, France. Email: benferhat@cril.univ-artois.fr
Yassine Djouadi, Computer Science Department, Tizi Ouzou University, Algeria. Email: djouadi@ummto.dz

Abstract—This paper deals with anomaly score aggregation and thresholding in multi-model anomaly-based approaches, which require multiple detection models and profiles in order to characterize the different aspects of normal activities. Most works focus on profile/model definition, while critical issues related to anomaly measuring, aggregation and thresholding have not received similar attention. In this paper, we address in particular the issue of anomaly scoring and aggregation, which is a recurring problem in multi-model anomaly-based approaches. We propose a two-stage aggregation/thresholding scheme particularly suitable for multi-model anomaly-based approaches. The basic idea of our scheme is that anomalous behaviors induce either intra-model anomalies or inter-model ones. Our scheme is designed for real-time detection of both intra-model and inter-model anomalies. More precisely, we propose local thresholding in order to detect intra-model anomalies, and we use a Bayesian network in order, on the one hand, to extract inter-model regularities and, on the other hand, to serve as an aggregating function for computing the overall anomaly score associated with each analyzed audit event. Our experimental studies, carried out on recent and real http traffic, show for instance that most Web-based attacks induce only intra-model anomalies and can be effectively detected in real time. Moreover, this scheme significantly improves the detection rate of Web-based attacks involving inter-model anomalies.

I. INTRODUCTION

Intrusion detection [2] is concerned with the real-time or off-line analysis of the events happening in information systems in order to reveal unauthorized resource use, access attempts or any other suspicious activity. Intrusion detection systems (IDSs) are either misuse-based, such as SNORT [19], anomaly-based, such as EMERALD [17], or a combination of both approaches in order to exploit their mutual complementarities [22]. The first approach is based on known attack signatures whereas the latter focuses on the normal activity profile. Misuse-based IDSs are known to achieve very high detection rates on known attacks but they are ineffective in detecting new ones. Anomaly-based approaches build profiles or models representing normal behaviors and detect intrusions by comparing current system activities with the learnt profiles. In practice, anomaly-based IDSs are efficient in detecting new attacks but cause high false alarm rates, which really hampers their application in real environments. In fact, configuring anomaly-based systems to acceptable false alarm rates results in failure to detect most malicious activities. However, the main advantage of anomaly detection lies in its potential capacity to detect both new and unknown (previously unseen) attacks as well as known ones.

Several anomaly-based systems use models and profiles [13] [20] [17] [14] to represent the normal behaviors of networks, hosts, users, programs, etc. In most anomaly-based IDSs, the anomaly score relative to a given audit event (network packet, system call, application log record, etc.) often depends on several local deviations measuring how anomalous the audit event is with respect to the different normal profiles and detection models [14]. Critical issues in anomaly detection are normal profile/model definition on the one hand, and anomaly scoring and thresholding on the other. The first issue is concerned with extracting and selecting the best features to analyze in order to effectively detect anomalies. The second issue is also critical as it provides the anomaly scores determining whether audit events should be flagged normal or anomalous.

The bad tradeoff between the detection rates and the underlying false alarm rates characterizing most anomaly-based IDSs is in part due to problems in anomaly measuring, aggregation and thresholding methods. In this paper, we first address the drawbacks of existing methods for anomaly score measuring, aggregation and thresholding. Then, we propose a two-stage scheme for anomaly score aggregation and thresholding suitable for multi-model anomaly-based approaches. The proposed scheme aims at effectively detecting both intra-model anomalies and inter-model ones. It combines local thresholding, used to detect intra-model anomalies, with a Bayesian network that is used on the one hand to learn inter-model regularities and on the other hand to compute the overall anomaly scores. The proposed scheme is particularly designed to overcome the drawbacks of most existing methods. We carried out experimental studies on real and recent http traffic and several Web-based attacks.

The rest of this paper is organized as follows: Section II briefly presents anomaly scoring and points out the drawbacks of existing methods for anomaly measuring, aggregation and thresholding. A two-stage scheme for anomaly score aggregation and thresholding is proposed in Section III. In Section IV, we detail the Bayesian network-based scheme for aggregating local anomaly scores. Section V presents our experimental studies. Finally, Section VI concludes this paper.

978-1-4244-2413-9/08/$25.00 ©2008 IEEE 919


II. MULTI-MODEL ANOMALY-BASED APPROACHES

Multi-model anomaly-based approaches rely on several models/profiles in order to represent the different characteristics of the normal activities of users, programs, hosts, etc. For instance, the authors in [14] use six detection models in order to characterize the different aspects of http requests. Hence, an anomaly scoring function evaluating the deviation of a given audit event¹ with respect to the learnt models/profiles first computes the local deviations of this audit event with respect to each detection model/profile, then aggregates the obtained local deviations in order to compute an overall anomaly score which is used to decide whether the event in hand is normal or anomalous. Designing a multi-model anomaly-based approach requires designing the following elements:

1) Profiles/Detection models: Effective detection models/profiles ideally consist of "all" the features/aspects that can show differences between normal activities and abnormal ones. Note that the most common form of audit events used in statistical-based IDSs is multivariate audit records describing network packets, connections, system calls, application log records, etc. These audit records involve different data types, among which continuous and categorical data are common. Note that in order to characterize normal activities, detection models/profiles often take into account the deployed security policy and historical data. For instance, an anomaly-based approach may involve a detection model/profile reflecting the fact that the security policy states that files containing confidential data must not be accessed by way of Web accesses. As for historical data, it allows extracting regularities and trends characterizing past activities. For example, Web server log records are used to learn long-term models/profiles characterizing request lengths, character distributions, etc.

2) Anomaly scoring measures: They are functions computing anomaly scores for every analyzed event. According to a fixed or learned threshold, the anomaly score associated with an event allows flagging it normal or anomalous. To compute such anomaly scores, anomaly scoring measures use the following functions:

a) A set of "individual" (or "local") anomaly scoring measures: They are functions that evaluate the normality of an audit event with respect to each normal profile individually. For example, in [15] three statistical profiles represent normal http and DNS requests: a Request type profile, a Request length profile and a Character distribution profile. Three anomaly scoring functions are then used in order to compute local anomaly scores. The most used anomaly measures are distance measures (which are widely used in outlier detection [1] and clustering [7]), probability measures [20], density measures [6] and entropy measures [16]. Note that a multi-model anomaly-based approach may use different anomaly measures within the same system.

b) Aggregating functions: Aggregating functions are used to fuse all the individual anomaly scores into a single anomaly score which is used to decide whether the analyzed event is normal or anomalous. Namely, the global anomaly score AS of an audit event E is computed using an aggregating function G which aggregates all the local anomaly scores AS_Mi(E) relative to the corresponding profiles/models Mi:

AS(E) = G(AS_M1(E), AS_M2(E), .., AS_Mn(E))   (1)

In practice, aggregating functions range from simple sum-based methods [10][15] to complex models such as Bayesian networks [12][20].

3) Anomaly thresholding: Thresholding is needed to transform a numeric anomaly score into a symbolic value (Normal or Abnormal) such that an alert can be raised. Namely, thresholding is done by specifying value intervals for both normal and anomalous behaviors. It is important to note that only few works have addressed anomaly thresholding issues [18]. In fact, some authors just use a single value [15][20] to fix the border-line between normal and abnormal behaviors while others use a range of values to fix this limit and flag events as Normal, Abnormal or Unknown. In practice, thresholds are often fixed according to the false alarm rate which must not be exceeded. Thresholds can be statically or dynamically set. The advantage of dynamically fixing a threshold is the ability to reassign its value in such a way as to limit the amount of triggered alerts.

Figure 1 shows a typical architecture of a multi-model anomaly-based approach.

¹ In the intrusion detection field, an audit event can be a packet or a connection in the case of network-oriented intrusion detection, a system log record in the case of host-oriented intrusion detection, a Web server log record in the case of Web-oriented intrusion detection, etc.

Fig. 1. A typical multi-model anomaly-based approach
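As a deliberately simplified illustration of Equation (1), the sketch below computes two local anomaly scores for a hypothetical http request and fuses them with a sum-based aggregating function G. All model names, training statistics, weights and the alert threshold are invented for illustration; they are not those of the systems cited above.

```python
# Toy sketch of Equation (1): local scores AS_Mi(E) fused by a function G.
# The two "models", their training statistics, the weights and the threshold
# are all assumptions made for this example.

def length_score(request: str) -> float:
    # Local measure for a hypothetical "request length" model:
    # deviation of the observed length from assumed training statistics.
    mu, sigma = 60.0, 20.0
    return abs(len(request) - mu) / sigma

def char_dist_score(request: str) -> float:
    # Local measure for a hypothetical "character distribution" model:
    # fraction of non-alphanumeric characters in the request.
    if not request:
        return 0.0
    return sum(not c.isalnum() for c in request) / len(request)

def aggregate(scores, weights):
    # A simple sum-based G, the common choice discussed in the text.
    return sum(w * s for w, s in zip(weights, scores))

request = "GET /index.html?" + "A" * 200   # oversized, buffer-overflow-like
scores = [length_score(request), char_dist_score(request)]
AS = aggregate(scores, weights=[0.5, 0.5])
flag = "Abnormal" if AS > 2.0 else "Normal"  # threshold chosen arbitrarily
print(scores, AS, flag)
```

Note how the oversized length alone drives the global score here; the drawbacks of exactly this kind of weighted-sum fusion (weighting, accumulation, averaging, commensurability) are discussed in the next section.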

It is clear that the effectiveness of anomaly-based approaches strongly depends on profile/model definition and on the relevance of the anomaly scoring measures. In order to illustrate our ideas, we use a simple but widely used Web-based anomaly approach developed by Kruegel & Vigna [13]. These authors proposed a multi-model approach to detect Web-based attacks relying on six detection models (Attribute length, Character distribution, Structural inference, Token finder, Attribute presence or absence and Attribute order). During the detection phase, the six models output anomaly scores which are aggregated using a weighted sum. Recently, this model has been examined in depth in [9].

A. Drawbacks of existing schemes for anomaly measuring and aggregating

Most multi-model anomaly-based approaches use sum-based aggregating functions [12]. Such methods are "simplistic" and ineffective. In fact, most existing aggregating methods suffer from several problems:

1) Weighting local anomaly scores is often done in a "questionable" way. For example, the authors in [15] explained neither how they assign weights nor why they use the same weighting for http and DNS requests.
2) The accumulation phenomenon, where several small local anomaly scores, once summed, produce a high global anomaly score.
3) The averaging phenomenon, where a very high local anomaly score, once aggregated, results in a low global anomaly score.
4) Commensurability problems, encountered when the outputs of the different detection models do not share the same scale. Some anomaly scores then have much more importance in the overall score than the others.
5) Ignoring the inter-model dependencies existing between the different detection models: most aggregating functions in use assume that the detection models are independent. This results in wasting valuable information that could be exploited to detect anomalies.
6) Real-time detection capabilities: the decision to raise an alert is taken on the basis of the global anomaly score, which requires computing all the local anomaly scores and then aggregating them. This causes several problems, especially regarding effectiveness. For example, when analyzing buffer-overflow attacks, the request length can be sufficient and there is no need to compute the other anomaly scores. Moreover, in buffer-overflow attacks, the request is often segmented over several packets which are reassembled at the destination host. Such an attack can however be detected given the first packets of the request, and there is no need to wait for all the packets in order to detect the anomaly.
7) Handling missing inputs: missing data is an important issue that existing systems have not dealt with conveniently. In fact, many intentional or accidental causes can provoke the loss of some data pieces. For example, in gigabit networks, the packet sniffer may drop packets. When applied to network traffic, how can the model proposed in [15] deal with a request if the sniffer dropped the packet containing the request method? The problem is how to analyze audit events given that some inputs are missing.

In the following, we propose an anomaly score aggregation/thresholding scheme particularly designed to overcome the limitations of existing methods.

III. A TWO-STAGE SCHEME FOR ANOMALY SCORE AGGREGATION AND THRESHOLDING

In this section, we detail the new scheme for aggregating anomaly scores and thresholding, which is particularly suitable for multi-model anomaly-based approaches.

A. Anomalous behaviors

The premise of anomaly-based approaches is the assumption that attacks induce abnormal behaviors. There are different possibilities as to how anomalous events affect and manifest through elementary features. For instance, an anomalous event can take the form of an anomalous (new or outlier) value of a feature, an anomalous combination of normal values or an anomalous sequence of events. Accordingly, the alerts raised by a multi-model anomaly-based approach can be caused by two anomaly categories:

• Intra-model anomalies: anomalous behaviors affecting one single model. Namely, the anomaly evidence is obvious through one detection model only. For example, in the Kruegel & Vigna model, there are buffer-overflow attacks which heavily affect the length model without affecting the other models. Similarly, directory traversal and cross-site scripting attacks only affect the character distribution model. The anomaly score computed using the affected model should then suffice in order to detect such attacks.

• Inter-model anomalies: anomalies that affect the regularities and correlations existing between the different models. In the Kruegel & Vigna model, the authors pointed out correlations between the Length model and the Character distribution one. Audit events violating such regularities are then possibly anomalous. For example, in a Web-based multi-model approach using Request length and Attribute number detection models, there are inter-model regularities between the Length model and the Argument number model. Generally, when an http request has an oversized length, it often involves a large number of arguments and parameters. However, a large request without any argument is potentially a buffer-overflow attack.

It is obvious that intra-model anomalies can be detected without aggregating the different anomaly scores. Moreover, this is interesting because such anomalies can be detected in real time.
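The two anomaly categories can be illustrated with a toy sketch on invented (request length, argument count) pairs. The per-model thresholds follow the maximum-over-training idea developed below; the inter-model check is a deliberately crude ratio test standing in for learnt regularities (long requests normally carry many arguments), not the Bayesian network used in our scheme. All figures are invented for illustration.

```python
# Toy two-stage detection on invented (request_length, num_args) pairs.

# Invented "normal" training observations: length grows with argument count.
training = [(40, 1), (55, 2), (80, 4), (120, 7), (150, 9)]

theta = 1.2  # discounting/enhancing factor, chosen arbitrarily

# Stage 1: per-model (intra-model) thresholds = max over training * theta.
max_len = max(l for l, _ in training) * theta    # 180.0
max_args = max(a for _, a in training) * theta   # 10.8

def detect(length, num_args):
    # Intra-model anomaly: a single model suffices to raise an alert,
    # without waiting for the other models.
    if length > max_len or num_args > max_args:
        return "alert: intra-model anomaly"
    # Inter-model anomaly: both values are individually normal, but the
    # pair violates the length/argument-count regularity seen in training
    # (a crude per-pair ratio test, standing in for learnt regularities).
    ratios = [l / (a + 1) for l, a in training]
    if length / (num_args + 1) > max(ratios) * theta:
        return "alert: inter-model anomaly"
    return "normal"

print(detect(500, 12))  # oversized length: intra-model anomaly
print(detect(150, 0))   # long request, no arguments: inter-model anomaly
print(detect(60, 2))    # unremarkable request: normal
```

The second call is the interesting one: both features pass their local thresholds, yet the combination (a long request with no arguments, a buffer-overflow pattern) is caught only by the inter-model check.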
B. Local and global thresholding

Given that anomalous events can either affect detection models individually or violate the regularities existing between detection models, we propose a two-stage thresholding scheme aiming at raising an alert whenever an anomalous behavior occurs, be it intra-model or inter-model. In fact, any anomaly revealed by a detection model is sufficient to raise an alert, even if the other detection models have not yet returned their anomaly scores. This is the idea motivating the two-stage thresholding scheme. Namely, each detection model Mi has its own anomaly threshold T_Mi. During the detection phase, as soon as the input data for detection model Mi is available, the system can trigger an alert whenever the anomaly score As_Mi(E) exceeds the corresponding threshold T_Mi. If no intra-model anomaly is detected, then we need to look for inter-model anomalies.

• Intra-model thresholding: In order to detect intra-model anomalies, we fix for each detection model Mi a local anomaly threshold in the following way:

Threshold_Mi = Max_{Ei ∈ TS}(As_Mi(Ei)) * θ   (2)

The threshold Threshold_Mi associated with detection model Mi is set to the maximum among all the anomaly scores computed on the normal audit events Ei involved in the training set TS. θ denotes a discounting/enhancing factor used to control the detection rate and the underlying false alarm rate. In case no intra-model anomaly is detected, we then need to check for inter-model anomalies.

• Global thresholding: Similarly to intra-model thresholding, a threshold can be fixed for the global anomaly score as follows:

Threshold = Max_{Ai ∈ TS}(As(Ai)) * θ   (3)

Note that the term As(Ai) denotes the output of the anomaly score aggregating function while the term Ai denotes a training anomaly score record. In order to control the detection rate/false alarm rate tradeoff, one can use the discounting/enhancing parameter θ.

Note that the motivation for setting the anomaly thresholds to the maximum among all the anomaly scores computed on normal training behaviors is to detect any event whose anomaly score exceeds all the normal behavior scores used to build the detection models. This maximum-based thresholding is intuitive and does not require any assumption about the anomaly scores. In fact, the greatest anomaly score on training behaviors is the one associated with normal but unusual behavior. Behaviors having a greater anomaly score are then possibly anomalous.

Local and global thresholding can be combined in order to exploit their respective advantages:
• Real-time detection: with local thresholding, every intra-model anomaly is detected without waiting for the other detection model results.
• Handling missing inputs: missing inputs only affect the models requiring these inputs. The remaining models can work normally and detect intra-model anomalies.
• Intra-model and inter-model anomaly detection: as we will see in the experimental studies, combining local and global thresholding allows detecting both intra-model and inter-model anomalies more effectively.

IV. BAYESIAN NETWORK-BASED AGGREGATION

In this section, we first show how to learn inter-model regularities from empirical normal training data. Then we show how to use the learnt Bayesian network in order to compute the overall anomaly scores associated with audit events.

A. Bayesian networks

Bayesian networks are powerful graphical models for representing and reasoning under uncertainty [11]. They consist of a graphical component, a DAG (Directed Acyclic Graph), and a quantitative probabilistic component. The graphical component allows an easy representation of domain knowledge in the form of an influence network (vertices represent events while edges represent the "influence" relationships existing between these events). The probabilistic component expresses the uncertainty relative to the influence relationships between domain variables using conditional probability tables (CPTs). Learning Bayesian networks requires training data to learn the network structure and compute the conditional probability tables. Note that the structure can also be specified by an expert according to his domain knowledge.

Several works have used Bayesian networks for anomaly detection [8][20][23]. For instance, the authors in [12] used a Bayesian network in order to assess the anomalousness of system calls. In our case, the main advantages of Bayesian networks are their learning capabilities, which allow for instance to automatically extract inter-model regularities, and their very effective inference capacities. Moreover, Bayesian networks can combine a user-supplied structure with empirical data. The other interesting advantage of a Bayesian network is that it can be used as an aggregating function which takes into account inter-model regularities and naturally copes with both the weighting and commensurability problems. In fact, using a Bayesian network to evaluate the normality of a given anomaly score record does not require any transformation of the anomaly score record to be analyzed.

B. Training the Bayesian network: Extracting inter-model regularities

In order to automatically extract inter-model regularities from a normal training set, we train a Bayesian network on historical normal audit events. It is important to note that in order to learn these regularities, we need to train the Bayesian network not directly on the audit events (which may be unstructured data such as raw http requests) but on the

anomaly score records relative to the training audit events (the outputs of the different detection models).

1) Training set for learning inter-model regularities: Here we explain the process of preparing the training set that will be used to learn inter-model regularities.
Given a data set of m normal audit events Ei, we build an equivalent data set of m normal anomaly score records A1, A2, .., Am where each record Ai is composed of all the local anomaly scores (namely Ai = (ai1, .., ain), corresponding to the local anomaly scores of normal audit event Ei with respect to detection models M1, .., Mn and anomaly measures As_M1, .., As_Mn respectively). Learning a Bayesian network from these anomaly vectors will then capture the inter-model regularities involved in the training set. Once the inter-model regularities are learnt, it is possible to detect audit events that violate these regularities.
As shown in Figure 2, the normal training audit events are first transformed into normal anomaly score records composing the training set. Using the latter, a Bayesian network is learnt using a Bayesian network learning algorithm such as K2 [4].

Fig. 2. Training set preparation

Figure 2 shows the training data preparation step, where normal audit events (in this example, http requests) are transformed into anomaly score records. Namely, for each training http request, there will be n local anomaly scores corresponding to the deviations of this request from the n detection models (each training http request will be represented by its anomaly score record). Henceforth, this data set can be used for training a Bayesian network in order to automatically learn the inter-model regularities existing between the selected detection models.

2) Learning inter-model regularities from normal training data: Structure learning from empirical data is an active research topic. It is concerned with learning the best structure (graph or network topology) given a data set of training examples [11]. There are two categories of Bayesian network structure learning algorithms: scoring-based algorithms, which search for the structure that best fits the training data by maximizing a score such as the Bayesian [5] and MDL [21] scores, and CI-based (Conditional Independence) algorithms, where the structure is learnt by searching for the conditional independence relationships existing between the domain variables. The learnt network structure represents the inter-model regularities while the conditional probability tables quantify the inter-model influences. Ideally, all the strong correlations existing between the variables representing the detection models are extracted from the training data. For example, Figure 3 shows a Bayesian network learnt from the normal anomaly score records corresponding to a set of attack-free http requests.

Fig. 3. Example of a Bayesian network showing extracted inter-model regularities

Note that the Bayesian network of Figure 3 is built with the well-known K2 algorithm [4] using a hill-climbing search strategy. Each node in this network represents a detection model (in this example, there are 9 detection models designed to reveal Web-based anomalies) while the arcs represent the influence relationships existing between these detection models. Some of these relationships are natural and intuitive, such as the relationship between the number of parameters and the number of arguments (in the training set, these numbers are in most cases equal) and the relationship between the request length and the URI length (the URI is a part of an http request). Other influence relationships are less intuitive, but they are empirically significant correlations.

3) Computing the global anomaly score using the Bayesian network: Once the Bayesian network is built, it can be used to compute the probability of any anomaly score record. We first compute the different anomaly scores; then, using the Bayesian network, we compute the probability of the current anomaly vector. Semantically, the normality of audit event Ei is proportional to the probability of the corresponding anomaly vector. In fact, anomaly score records which are similar to the training ones will be associated with high probabilities
while anomaly score records which significantly differ from the training anomaly vectors will be associated with zero or very low probabilities.
Using a Bayesian network as an aggregating function has several advantages. First, the inter-model regularities are taken into account when computing the overall anomaly score. Moreover, the Bayesian network-based aggregation overcomes the problems related to commensurability and weighting. Since the Bayesian network is learnt on normal anomaly score records, then whatever the outputs of the different local detection models are, they are easily and efficiently aggregated by the Bayesian network.
Using a Bayesian network, the probability of a given anomaly score record Ai = (Ai1, .., Ain) is computed using the chain rule [11] as follows:

p(Ai) = Π_{j=1..n} p(Aij | parent(Aij))   (4)

where the term parent(Aij) denotes the parent nodes of variable Aij in the Bayesian network. Using the learnt structure and the conditional probability tables (CPTs), it is easy and fast to compute the probability of any anomaly score record, which is proportional to the normality of the audit event in hand. Let us now see how to set the global anomaly threshold.
From an anomaly-based approach point of view, every anomaly score exceeding all the normal training anomaly scores should trigger an alert. The global anomaly threshold can then be fixed as follows:

Threshold = Max_{Ai ∈ TS}(1 − p(Ai)) * θ   (5)

The term p(Ai) in Equation 5 denotes the probability degree computed using the Bayesian network. This threshold flags as anomalous any event having a probability degree smaller than that of the most improbable training event involved in the training set TS. The term θ denotes a discounting/enhancing factor allowing to control the detection rate/false alarm rate tradeoff.

V. EXPERIMENTAL STUDIES

In order to evaluate our aggregation/thresholding scheme, we use a multi-model approach designed to detect Web-based anomalies. We selected a subset of the detection models and features that we designed to detect attacks against server-side and client-side Web applications [3]. The selected detection models are built on real and recent attack-free http traffic and evaluated on real and simulated http traffic involving normal data as well as several Web-based attacks.

A. Detection model definition

In [3], we proposed a set of detection models and classification features including basic features of http connections as well as derived features summarizing past http connections. Note that the detection models' inputs are directly extracted from network packets instead of using Web application logs. Processing the whole http traffic is the only way to detect suspicious activities and attacks targeting either server-side or client-side Web applications. The detection model features are grouped into four categories [3]:

1) Request general features: They are features that provide general information on http requests. Examples of such features are the request method, the request length, etc.
2) Request content features: These features search for particularly suspicious patterns in http requests. The number of non-printable/meta-characters, the number of directory traversal patterns, etc. are examples of features describing the request content.
3) Response features: Response features are computed by analyzing the http response to a given request. Examples of these features are the response code, the response time, etc.
4) Request history features: They are statistics about past connections, given that several Web attacks such as flooding, brute-force, Web vulnerability scans, etc. perform through several repetitive connections. Examples of such features are the number/rate of connections issued by the same source host and requesting same/different URIs, the inter-request time interval, etc.

Note that in our experimentations, we selected the detection models of Table I.

TABLE I
SELECTED DETECTION MODELS

Name                       | Description                                                                                                    | Type
Req-length                 | Request length                                                                                                 | Integer
URI-length                 | URI length                                                                                                     | Integer
Req-method                 | Request method (GET, POST, HEAD...)                                                                            | Nominal
Req-resource-type          | Type of requested resource (html, cgi, php, exe, ...)                                                          | Nominal
Num-param                  | Number of parameters                                                                                           | Integer
Num-arg                    | Number of arguments                                                                                            | Integer
Num-NonPrintChars          | Number of special and meta-characters and shell codes in the http request (x86, carriage return, semicolon...) | Integer
Resp-Code                  | Response code to the http request (200, 404, 500...)                                                           | Nominal
Response-time              | Time elapsed since the corresponding http request                                                              | Real
Script-type                | Type of script included in the response (Java, Visual Basic, ...)                                              | Nominal
Num-Req-Same-Host          | Number of requests issued by the same source host during the last 60 seconds                                   | Integer
Num-Req-Same-URL           | Number of requests with the same URL during the last 60 seconds                                                | Integer
Num-Req-Same-Host-Diff-URI | Number of requests issued by the same source and requesting different URLs during the last 60 seconds          | Integer
Inter-Req-Interval         | Inter-request time interval                                                                                    | Real
Http-Error-Rate            | Rate of http responses with error codes during the last 60 seconds                                             | Real

In our experimentations, numeric features are modeled by their means µ and standard deviations σ while nominal features are represented by the frequencies of their possible values.

B. Local anomaly scoring measures

During the detection phase, the anomaly score associated with a given http connection lies in the local anomaly scores of the connection features with respect to the learnt detection models. We use different anomaly measures according to each profile type (numeric, nominal) and its distribution in training
It is important to note that most numeric features in the training set have exponential rather than Gaussian distributions. For example, Figure 4 shows the distributions of the URI-length and Num-Req-Same-Host features in the training set we used in our experimentations.

Fig. 4. Example of training distributions

Figure 4 clearly shows that the distributions of the URI-length and Num-Req-Same-Host features are not Gaussian. Indeed, the distribution of URI-length involves large numbers of normal requests where the length of the URI is very low. As for the distribution of Num-Req-Same-Host, Figure 4 shows that most normal requests are characterized by values near zero, meaning that most source hosts do not issue several successive http requests to the same destination host within the selected time window (in our experimentations, we used a 60-second time window for computing Request history features). These findings suggest that abnormal behaviors such as buffer-overflow attacks, vulnerability scans or flooding will cause very large values. Accordingly, the local anomaly measures used to compute local anomaly scores take into account the distribution type empirically obtained from the normal training set.

In order to compute the anomaly score of a given feature Fi with respect to the corresponding detection model Mi, we consider two cases:

• If Fi is numerical, then the anomaly score is computed as follows:

    AsMi(Fi) = e^((Fi − µi) / σi)    (6)

The terms µi and σi denote respectively the mean and standard deviation of feature Fi in normal data; σi is used as a normalization parameter. Note that only exceeding values cause high anomaly scores. Intuitively, if the value of Fi is less than, equal to or close to the average µi, then the anomaly score will be negligible. Otherwise, the wider the margin, the greater the anomaly score will be.

• If Fi is a nominal feature, then the anomaly score is computed according to the improbability of the value of Fi in the normal training data. Namely,

    AsMi(Fi) = −log(p(Fi))    (7)

The term p(Fi) denotes the frequency of Fi's value in the normal training data. Intuitively again, the more exceptional (unusual) the value of Fi is in the training data, the higher the anomaly score will be. Conversely, frequent and usual values will be associated with low anomaly scores.

C. Training and testing data

Our experimental studies are carried out on real http traffic collected on a University campus during 2007. Note that this traffic includes both inbound and outbound http connections. We extracted the http traffic and preprocessed it into connection records using only packet payloads. As for attacks, we simulated most of the attacks involved in [9], which is to our knowledge the most extensive and up-to-date open Web-attack data set.

TABLE II
TRAINING/TESTING DATA SET DISTRIBUTION

                              Training data        Testing data
  Class                       Number      %        Number       %
  Normal connections           55342   100%         61378   58.41%
  Buffer overflow                  –      –            18    0.02%
  Value misinterpretation          –      –             2   0.001%
  Poor management                  –      –             3   0.001%
  Flooding                         –      –         12485   11.88%
  Vulnerability scan               –      –         31152   29.64%
  Cross Site Scripting             –      –             6    0.01%
  SQL injection                    –      –            14    0.01%
  Command injection                –      –             9    0.01%
  Other input validation           –      –            46    0.04%
  Total                        55342   100%        105084     100%

Note that in order to train the Bayesian network, the normal http requests (second column in Table II) are first transformed into a training set composed of the anomaly score records corresponding to these training requests. As for the attacks of Table II, they are categorized according to the vulnerability category involved in each attack. Regarding attack effects, the attacks of Table II include denial of service, vulnerability scans, information leak, and unauthorized and remote access [9]. It is important to note that the attacks involved in Table II are the most frequent Web-based attacks during 2007, the year the training and testing sets were built.

D. Comparison of thresholding and aggregation schemes

Table III compares, on the one hand, the results of a sum-based aggregation using a single global threshold, of local thresholding alone, and of a sum-based aggregation combined with local and global thresholding. On the other hand, we evaluate the Bayesian network-based approach (structure learning is performed using the K2 algorithm [4]) using a single global threshold, and the combination of local and global thresholding with Bayesian network-based aggregation. Note that all the anomaly thresholds are computed on normal training data and we do not use any discounting/enhancing parameter θ (θ=1).

Firstly, Table III shows that our scheme performs better than the reference sum-based scheme. Moreover, it is important to note that most attacks induce only intra-model anomalies and can be detected without any aggregation. In fact, combining the sum-based scheme with local thresholding significantly enhances the detection rates without triggering higher false alarm rates. Similarly, Bayesian network-based aggregation using global thresholding achieves better results regarding detection rates and false alarm rates.
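The local anomaly measures of equations (6) and (7) can be sketched as follows. The profile parameters are hypothetical, and the floor probability used for values never seen in training is an assumption of this sketch, not part of the paper's definition:

```python
import math

def numeric_anomaly_score(value, mu, sigma):
    """Equation (6): As(Fi) = e^((Fi - mu) / sigma).
    Values at or below the mean give scores <= 1; large positive
    deviations grow exponentially."""
    return math.exp((value - mu) / sigma)

def nominal_anomaly_score(value, freq, eps=1e-6):
    """Equation (7): As(Fi) = -log(p(Fi)).
    Unseen values get a small floor probability eps so the score stays finite."""
    return -math.log(freq.get(value, eps))

# Hypothetical URI-length profile: mean 13.5, standard deviation 3.0
print(numeric_anomaly_score(13.0, 13.5, 3.0))   # near the mean -> score below 1
print(numeric_anomaly_score(400.0, 13.5, 3.0))  # buffer-overflow-like URI -> huge score
print(nominal_anomaly_score("GET", {"GET": 0.75, "POST": 0.25}))  # frequent value -> low score
```

The asymmetry of equation (6) is visible here: a URI shorter than average scores below 1, while a grossly oversized one scores astronomically, matching the observation that attacks manifest as exceedingly large feature values.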
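The two-stage scheme compared in Table III can be sketched as follows. Thresholds and scores are hypothetical, and a simple sum stands in for the aggregation stage (the Bayesian network variant is not reproduced here):

```python
def detect(local_scores, local_thresholds, global_threshold):
    """Two-stage detection sketch.
    Stage 1 (intra-model): flag the event if any local anomaly score
    exceeds its model-specific threshold.
    Stage 2 (inter-model): otherwise aggregate the local scores (here,
    a plain sum) and flag the event if the aggregate exceeds the
    global threshold."""
    for name, score in local_scores.items():
        if score > local_thresholds[name]:
            return "intra-model anomaly"
    if sum(local_scores.values()) > global_threshold:
        return "inter-model anomaly"
    return "normal"

# Hypothetical thresholds learnt on attack-free training data
local_thr = {"URI-length": 10.0, "Num-Req-Same-Host": 8.0}

print(detect({"URI-length": 55.2, "Num-Req-Same-Host": 1.1}, local_thr, 12.0))  # intra-model anomaly
print(detect({"URI-length": 7.5, "Num-Req-Same-Host": 6.0}, local_thr, 12.0))   # inter-model anomaly
print(detect({"URI-length": 2.0, "Num-Req-Same-Host": 1.0}, local_thr, 12.0))   # normal
```

The second call illustrates the motivation for the global stage: no single model is violated, but the combined deviation across models is itself abnormal.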
TABLE III
EVALUATION OF DIFFERENT AGGREGATION/THRESHOLDING SCHEMES ON http TRAFFIC

                            Sum-based   Sum aggreg.+   Local     Bayes     Bayes aggreg.+
  Audit event class         aggreg.     local thresh.  thresh.   aggreg.   local thresh.
  Normal connections         99.94%      97.37%        97.37%    99.79%     99.66%
  Buffer overflow            16.67%      94.44%        94.44%    27.78%     94.44%
  Value misinterpretation     100%        100%          100%       50%       100%
  Poor management             100%        100%          100%     66.67%      100%
  Flooding                   95.46%      99.62%        99.62%    86.22%     99.93%
  Vulnerability scan          0.00%      51.84%        51.84%    83.06%     90.56%
  Cross Site Scripting        0.00%       100%          100%      100%       100%
  SQL injection               0.00%       100%          100%      100%       100%
  Command injection           0.00%       100%          100%      100%       100%
  Other input validation      2.17%      86.96%        86.96%    23.91%     91.30%
  Total                      69.72%      84.16%        84.16%    93.20%     97.02%

Note that the best results are achieved by the Bayesian network-based aggregation combined with local and global thresholding (see the detection rates over normal connections and Web attacks). This is due to the fact that this scheme detects both intra-model anomalies and violations of the inter-model regularities learnt by the Bayesian network.

VI. CONCLUSION

This paper dealt with anomaly scoring, thresholding and aggregating issues in multi-model anomaly detection approaches. We proposed a two-stage aggregating/thresholding scheme suitable for detecting in real time intra-model and inter-model anomalies. The basic idea of our scheme is that anomalous behaviors either affect a single detection model or violate regularities existing between the different detection models. The proposed scheme combines local thresholding, in order to detect intra-model anomalies in real time, with global thresholding, in order to detect violations of inter-model regularities. The latter are directly extracted from attack-free training data using a Bayesian network, which is used during the detection phase to compute the overall anomaly score associated with each analyzed audit event. Experimental studies, carried out on real and recent http traffic, showed that most Web-related attacks only induce intra-model anomalies and can be detected in real time using local thresholding. Future work will explore the application of our scheme to detect anomalies and attacks when the input data is uncertain or missing.

VII. ACKNOWLEDGMENTS

This work is supported by a French project entitled PLACID (Probabilistic graphical models and Logics for Alarm Correlation in Intrusion Detection).

REFERENCES

[1] Fabrizio Angiulli, Stefano Basta, and Clara Pizzuti. Distance-based detection and prediction of outliers. IEEE Trans. on Knowl. and Data Eng., 18(2):145–160, 2006.
[2] Stefan Axelsson. Intrusion detection systems: A survey and taxonomy. Technical Report 99-15, Chalmers Univ., 2000.
[3] S. Benferhat and K. Tabia. Classification features for detecting server-side and client-side web attacks. In SEC 2008: 23rd International Security Conference, Milan, Italy, 2008.
[4] Gregory F. Cooper and Edward Herskovits. A Bayesian method for constructing Bayesian belief networks from databases. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pages 86–94, San Francisco, CA, USA, 1991. Morgan Kaufmann Publishers Inc.
[5] Gregory F. Cooper and Edward Herskovits. A Bayesian method for the induction of probabilistic networks from data. Mach. Learn., 9(4):309–347, 1992.
[6] Levent Ertöz, Eric Eilertson, Aleksandar Lazarevic, Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava, and Paul Dokas. MINDS - Minnesota intrusion detection system.
[7] Gerhard Münz, Sa Li, and Georg Carle. Traffic anomaly detection using k-means clustering. 2007.
[8] Vaibhav Gowadia, Csilla Farkas, and Marco Valtorta. PAID: A probabilistic agent-based intrusion detection system. Computers & Security, 24(7):529–545, 2005.
[9] Kenneth L. Ingham and Hajime Inoue. Comparing anomaly detection techniques for http. In RAID, pages 42–62, 2007.
[10] Javitz and Valdes. The NIDES statistical component: Description and justification. March 1993.
[11] F. V. Jensen. An Introduction to Bayesian Networks. UCL Press, London, 1996.
[12] Christopher Kruegel, Darren Mutz, William Robertson, and Fredrik Valeur. Bayesian event classification for intrusion detection. In ACSAC '03: Proceedings of the 19th Annual Computer Security Applications Conference, page 14, Washington, DC, USA, 2003.
[13] Christopher Kruegel and Giovanni Vigna. Anomaly detection of web-based attacks. In CCS '03: Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 251–261, New York, NY, USA, 2003.
[14] Christopher Kruegel, Giovanni Vigna, and William Robertson. A multi-model approach to the detection of web-based attacks. Volume 48, pages 717–738, New York, NY, USA, 2005.
[15] Christopher Kruegel, Thomas Toth, and Engin Kirda. Service specific anomaly detection for network intrusion detection. In SAC '02: Proceedings of the 2002 ACM Symposium on Applied Computing, pages 201–208, New York, NY, USA, 2002.
[16] Wenke Lee and Dong Xiang. Information-theoretic measures for anomaly detection. In SP '01: Proceedings of the 2001 IEEE Symposium on Security and Privacy, page 130, Washington, DC, USA, 2001.
[17] Peter G. Neumann and Phillip A. Porras. Experience with EMERALD to date. In First USENIX Workshop on Intrusion Detection and Network Monitoring, pages 73–80, Santa Clara, California, April 1999.
[18] Sathish Alampalayam P. Kumar, Anup Kumar, and S. Srinivasan. Statistical based intrusion detection framework using six sigma technique.
[19] Snort. Snort: The open source network intrusion detection system. http://www.snort.org, 2002.
[20] Stuart Staniford, James A. Hoagland, and Joseph M. McAlerney. Practical automated detection of stealthy portscans. J. Comput. Secur., 10(1-2):105–136, 2002.
[21] J. Suzuki. A construction of Bayesian networks from databases based on an MDL scheme. Pages 266–273, San Mateo, CA: Morgan Kaufmann.
[22] Elvis Tombini, Hervé Debar, Ludovic Mé, and Mireille Ducassé. A serial combination of anomaly and misuse IDSes applied to http traffic. In ACSAC '04: Proceedings of the 20th Annual Computer Security Applications Conference, pages 428–437, Washington, DC, USA, 2004.
[23] Alfonso Valdes and Keith Skinner. Adaptive, model-based monitoring for cyber attack detection. In RAID '00: Proceedings of the Third International Workshop on Recent Advances in Intrusion Detection, pages 80–92, London, UK, 2000.