DeepEAD


MITSUBISHI ELECTRIC RESEARCH LABORATORIES

https://www.merl.com

DeepEAD: Explainable Anomaly Detection from System Logs
Wang, Xinda; Kim, Kyeong Jin; Wang, Ye; Koike-Akino, Toshiaki; Parsons, Kieran
TR2023-050 May 31, 2023

Abstract
System logs record rich information for system events. Practical anomaly detection from system logs should be able to address three challenges: 1) understanding complicated attributes in event logs; 2) extracting complex context relations among events; and 3) providing concrete explanations to human analysts. In this paper, we develop an attention-equipped encoder-decoder system to capture context from system logs for explainable anomaly detection. For each target event, we collect its nearby events in chronological order as its context events. Instead of using a recurrent neural network-based encoder like previous works, we adopt a Transformer-based encoder to extract complex relations among context events and their attributes. Then, a context vector is generated and passed to the decoder, where an attention matrix is learned and used to weigh the context events for detecting the anomalies. Evaluation on the large-scale real-world Los Alamos National Laboratory dataset shows that, compared with existing works, our methods can provide fine-grained one-to-one attention to help explain the importance of each attribute in the context events to the prediction, without sacrificing detection performance.

IEEE International Conference on Communications (ICC) 2023

© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in
any current or future media, including reprinting/republishing this material for advertising or promotional purposes,
creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works.

Mitsubishi Electric Research Laboratories, Inc.


201 Broadway, Cambridge, Massachusetts 02139
DeepEAD: Explainable Anomaly Detection
from System Logs
Xinda Wang∗†, Kyeong Jin Kim∗, Ye Wang∗, Toshiaki Koike-Akino∗, Kieran Parsons∗
∗ Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA
† Center for Secure Information Systems, George Mason University, Fairfax, VA, USA
xwang44@gmu.edu, {kkim, yewang, koike, parsons}@merl.com

Abstract—System logs record rich information for system events. Practical anomaly detection from system logs should be able to address three challenges: 1) understanding complicated attributes in event logs; 2) extracting complex context relations among events; and 3) providing concrete explanations to human analysts. In this paper, we develop an attention-equipped encoder-decoder system to capture context from system logs for explainable anomaly detection. For each target event, we collect its nearby events in chronological order as its context events. Instead of using a recurrent neural network-based encoder like previous works, we adopt a Transformer-based encoder to extract complex relations among context events and their attributes. Then, a context vector is generated and passed to the decoder, where an attention matrix is learned and used to weigh the context events for detecting the anomalies. Evaluation on the large-scale real-world Los Alamos National Laboratory dataset shows that, compared with existing works, our methods can provide fine-grained one-to-one attention to help explain the importance of each attribute in the context events to the prediction, without sacrificing detection performance.

Index Terms—Anomaly detection, Transformer, explainable deep learning, context analysis.

I. INTRODUCTION

Available in almost all computer systems, logs are used to record various events for monitoring, administration, and debugging, which provides a good source of information for analyzing and identifying anomalies. Since modern IT infrastructure systems continuously generate an overwhelming amount of event logs and attacks are evolving and becoming more complex [1], automated anomaly detectors are usually applied to flag potential anomalies. Detected events are then handed over to human analysts for further analysis [2]. However, as reported in FireEye M-Trends 2021 [3], the median time for organizations to identify incidents with the help of anomaly detectors is 24 days, which leaves attackers ample time to conduct malicious activities. This is due to two major weaknesses of existing anomaly detectors: a high false-positive rate and a lack of explanations for detection results.

Although anomaly detectors aim to filter out unlikely suspicious events to reduce the workload of human analysts, they still generate an excessive number of alerts. Recent surveys conducted by [4] and [5] show that a typical security operations team receives over 11,000 alerts daily, while less than a fifth of them are actual attacks. Given such a huge alert volume, only 7% of alerts are investigated by analysts in time [6]. Failing to effectively consider the context around an event is the main reason for false alarms. To circumvent anomaly detectors, recent attackers tend to organize complex attacks where benign events are interleaved with suspicious ones. Without excluding unrelated benign events, the models proposed by [7] and [8] may learn these benign events as anomaly features. To solve this problem, the proposed approach should be able to capture the context in which a suspicious event is triggered in the event logs and put emphasis on suspicious events.

Although several approaches extract the context from log sequences by adopting a series of recurrent neural network (RNN)-based models [9]–[13], they only predict whether there is an anomaly, but cannot explain why, which still necessitates much effort to manually examine neighboring logs and correlate context events. To help analysts understand the context events involved in automated anomaly decisions, attention mechanisms are employed by [14] and [15]. However, their attention vectors are not directly applied to input events. Rather, they are applied over a complex combination of current and historical states, which is less interpretable for understanding the importance of the current event [16]. To mitigate this, a one-to-one mapping between attention and context events should be produced for concrete explanations.

In this paper, we propose a deep learning-based explainable anomaly detection system, named DeepEAD, which adopts an attention-equipped encoder-decoder architecture. Specifically, a Transformer-based encoder is used to extract context relations among different attributes in context events. Then, an attention decoder is designed to decode the context information into an attention matrix, which represents the weights for each context attribute as well as enables explainability. After associating context events with attention weights, the event decoder predicts the target event. An anomaly is detected if the actual event is predicted not to happen. Such a process does not require anomaly labels and is fully unsupervised. If anomaly labels are available, we further leverage transfer learning to transform the event prediction model into the anomaly prediction domain. The transferred model is supervised on the binary prediction task of anomaly detection.

We conduct experimental evaluations on a large-scale real-world dataset collected by the Los Alamos National Laboratory (LANL). Comparison results show that our system enables explainability and, at the same time, achieves detection performance comparable with state-of-the-art works. Besides, we perform a case study to show how attention weights work to explain the model behavior in specific cases.

∗Xinda Wang's work was conducted during her internship at MERL.

Fig. 1: Workflow of the DeepEAD system. [Diagram: system logs feed DeepEAD, which reports explainable anomaly events with context to human analysts.]

II. METHODOLOGY

Assumption. We use system logs as the input to our anomaly detection system. Intuitively, the preceding events can lead to the following event, and the postceding events can be affected by the previous event. Further, attackers tend to launch attacks with the help of several different accounts and devices, which causes victim computers to become the destination of a series of events. Therefore, for a target event, we assume its context to be the events happening around the same time and destined at the same device.

Overview. To reduce the workload of human analysts, we aim at detecting suspicious events from system logs while providing context and explainability for anomaly detection. The workflow of the network security system that employs DeepEAD is shown in Fig. 1, where DeepEAD works as an intermediate step to filter out the overwhelming number of unrelated logs, flag any anomaly event that is part of an attack, and provide a necessary explanation to the human analysts for further investigation.

As illustrated in Fig. 2, DeepEAD is an attention-equipped encoder-decoder system that takes a sequence of preprocessed context events as input. A Transformer-based context encoder addresses complex relations among different attributes in context events to generate a context vector. Then, the attention decoder decodes the context vector into an attention matrix, which represents how each attribute in each context event relates to the target event. Finally, the event decoder associates the context events with the attention matrix through multiplication and uses a neural network to detect the anomaly through unsupervised learning and, optionally, supervised learning. We detail each module in the following.

A. System Log Preprocessing

System logs record various events for monitoring, administration, and debugging. Although logs can be produced by different systems (e.g., operating systems, firewalls, and containers), they are usually well organized and composed of multiple attributes. For instance, authentication logs of the LANL internal network [17] consist of timestamp, source user, destination user, source computer, destination computer, authentication type, logon type, authentication orientation, and success/failure. Such a form enables us to transform each event log into a vector that can be input into the DeepEAD system. Specifically, we represent each value of an attribute with an individual numeric value, so that an event $e$ composed of $m$ attributes can be represented with a single vector $[attr_0, \ldots, attr_{m-1}]$. A numeric value is also used to index each distinct event type. This event representation makes our method system-independent, since it can be easily adapted to any formatted logs generated by other systems. We also discard user-related features (e.g., user, domain, and computer ID) and retain only event-related features to represent events, which allows our system to handle future event logs without considering unseen out-of-vocabulary users, domains, or computer IDs.

Since computer systems usually generate an overwhelming number of event logs, we only need to consider the logs that provide context for the target event; otherwise, unrelated events will introduce a large amount of noise into the detection process. As stated in our assumptions, we sort system logs according to the timestamp and group them by the destination device to determine the context for each event. Considering that both previous and following events can provide context information, we use a bi-directional sliding window to retrieve the context events (after excluding the target event). For an event $e_i$, we use its $n_{pre}$ preceding events and $n_{post}$ postceding events to form a context event sequence $S_i = [e_{i-n_{pre}}, \ldots, e_{i-1}, e_{i+1}, \ldots, e_{i+n_{post}}]$. We empirically set both $n_{pre}$ and $n_{post}$ to 10, such that each event has a context sequence of length 20. Also, we limit the search space for context to within 24 hours to remove irrelevant events that happen too early or too late. Note that these values can be adjusted for time-sensitive applications. A special event vector is used for padding if there are not enough preceding or postceding context events.
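For concreteness, the following is a minimal sketch of the context extraction described above (sort by timestamp, group by destination device, bi-directional window with padding). The field names, the PAD placeholder, and the helper name build_contexts are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the Section II-A context-window extraction.
# Field names and the PAD placeholder are illustrative assumptions.
from collections import defaultdict

N_PRE, N_POST = 10, 10          # empirical window sizes from the paper
WINDOW_SECONDS = 24 * 3600      # 24-hour context search limit
PAD_EVENT = [0, 0, 0, 0]        # special padding vector, one slot per attribute

def build_contexts(logs):
    """logs: dicts with 'time', 'dst_device', and a vectorized 'attrs' list.
    Returns (target_attrs, context_sequence) pairs."""
    by_dst = defaultdict(list)
    for ev in sorted(logs, key=lambda ev: ev["time"]):   # sort by timestamp
        by_dst[ev["dst_device"]].append(ev)              # group by destination

    pairs = []
    for events in by_dst.values():
        for i, target in enumerate(events):
            # Bi-directional window, target excluded, limited to +/- 24 hours.
            pre = [e["attrs"] for e in events[max(0, i - N_PRE):i]
                   if target["time"] - e["time"] <= WINDOW_SECONDS]
            post = [e["attrs"] for e in events[i + 1:i + 1 + N_POST]
                    if e["time"] - target["time"] <= WINDOW_SECONDS]
            # Pad with the special event vector when context is insufficient.
            seq = ([PAD_EVENT] * (N_PRE - len(pre)) + pre
                   + post + [PAD_EVENT] * (N_POST - len(post)))
            pairs.append((target["attrs"], seq))
    return pairs
```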
B. Transformer-Based Context Encoder

Previous work [18] takes the one-hot encoding of events as the input and uses an RNN to extract context information. However, as the LANL event log sample in the last subsection shows, a practical system log is usually composed of multiple attributes, and these attributes may not be independent. Thus, to provide a meaningful context for anomaly detection, the correlations among multiple attributes in the same and different context events should be well extracted. To this end, instead of using one-hot encoding, we first adopt an embedding layer to embed the context events and obtain an embedded context sequence $S_i' = [e_{i-n_{pre}}', \ldots, e_{i-1}', e_{i+1}', \ldots, e_{i+n_{post}}']$, where each attribute of each event is encoded into a 128-dimensional vector. To further address complex context relations among multiple attributes in the context sequence, we apply a Transformer-based encoder [19], where the scaled dot-product attention allows any attribute in an event to attend to any other event, and the multi-head attention mechanism provides multiple different aspects to which an event attribute can attend. Then, the embedded context event sequence is transformed into a single fixed-length (e.g., 128-dimensional in our case) context vector $c_i$.
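A minimal sketch of this encoder stage using standard PyTorch modules is given below. The vocabulary sizes, head/layer counts, and the mean-pooling step that reduces the token sequence to the fixed-length vector are assumptions for illustration; the paper fixes only the 128-dimensional embedding.

```python
# Sketch of the context encoder: per-attribute embeddings followed by a
# Transformer encoder, pooled into one fixed-length context vector c_i.
# Vocabulary sizes, head/layer counts, and mean pooling are assumptions.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, vocab_sizes, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # One embedding table per attribute; each value -> 128-dim vector.
        self.embeds = nn.ModuleList(nn.Embedding(v, d_model) for v in vocab_sizes)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, ctx):                      # ctx: (batch, n_events, m) ids
        # Embed every attribute token: (batch, n_events, m, d_model).
        emb = torch.stack([emb_l(ctx[..., j])
                           for j, emb_l in enumerate(self.embeds)], dim=2)
        b, n, m, d = emb.shape
        # Flatten events x attributes into one token sequence so scaled
        # dot-product attention can relate any attribute to any other event.
        tokens = emb.view(b, n * m, d)
        enc = self.encoder(tokens)               # (batch, n*m, d_model)
        return enc.mean(dim=1)                   # context vector c_i: (batch, 128)

enc = ContextEncoder(vocab_sizes=[8, 4, 6, 2])  # e.g., 4 LANL attributes
c = enc(torch.randint(0, 2, (1, 20, 4)))        # 20 context events -> (1, 128)
```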
Fig. 2: DeepEAD: An attention-based encoder-decoder architecture.

C. Multi-Attribute Attention Decoder

The attention decoder is designed to decode the context vector into an attention matrix $[[\alpha_{0,0}, \ldots, \alpha_{0,m-1}], \ldots, [\alpha_{n-1,0}, \ldots, \alpha_{n-1,m-1}]]$, where $n$ denotes the number of events in a context event sequence (i.e., $n_{pre} + n_{post}$) and $m$ represents the number of attributes in a context event. In such a design, each value in the attention matrix corresponds to one attribute in one context event and can be used to explain its correlation with the predicted target event.

To achieve this, we apply gated recurrent unit (GRU) layers [20] to generate the attention matrix. Then, a series of linear layers and a Softmax function are used to normalize the sum of these attention values to 1. Therefore, each value in the attention matrix can be regarded as a weight describing the importance of the corresponding attribute for the target event. In addition, the sum of the attention values in each context event represents how much that event contributes to the target prediction.
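A minimal sketch of this decoder, under the assumption that the context vector is fed to the GRU once per context event, is shown below; the layer sizes and the exact way the GRU is unrolled are illustrative guesses, not the authors' implementation.

```python
# Sketch of the multi-attribute attention decoder: GRU layers expand the
# context vector into n x m scores, normalized so all weights sum to 1.
# Layer sizes and the unrolling scheme are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionDecoder(nn.Module):
    def __init__(self, d_model=128, n_events=20, m_attrs=4):
        super().__init__()
        self.n, self.m = n_events, m_attrs
        self.gru = nn.GRU(d_model, d_model, batch_first=True)
        self.proj = nn.Linear(d_model, m_attrs)

    def forward(self, c):                         # c: (batch, d_model)
        # Feed the context vector at every step: one GRU step per context event.
        h, _ = self.gru(c.unsqueeze(1).expand(-1, self.n, -1))
        scores = self.proj(h)                     # (batch, n, m)
        # Softmax over all n*m entries so the attention weights sum to 1;
        # each entry maps one-to-one to an attribute of a context event.
        alpha = torch.softmax(scores.flatten(1), dim=1).view(-1, self.n, self.m)
        return alpha
```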
D. Two-Stage Event Decoder

The event decoder applies the attention weights of each attribute in the context event sequence to the embedded context events by performing matrix multiplication between the attention matrix and the embedded context events, producing a weighted context vector.

We design the rest of this module as a two-stage learning process. Since the number of anomalies may be small for a system at the very beginning, and anomaly labels require excessive human work, we design the first stage to be unsupervised, requiring no anomaly labels. In this stage, we train a model to predict the event given the context, so that an anomaly is detected if the actual event is predicted not to happen. In the second stage, we transform the context knowledge learned in the first stage into a supervised model. With the availability of anomaly labels from the prior detection and human verification, we fine-tune the transferred model with a small amount of labeled data. As a result, it can directly predict whether there is an anomaly or not.

1) Stage-1: Unsupervised Detection: In the first stage, we train an event prediction model whose output is a probability distribution over all types of events. After generating the weighted context vector, a series of linear layers and Softmax layers are used to predict not only the event type but also the values of each attribute. Here, our goal is to enable the model to learn more from each attribute of the context events by considering the losses of all attribute branches. Note that we use only the outputs of the main branch (i.e., event type) to determine the anomaly.

The main branch assigns a probability to each event type. By sorting the probabilities, we obtain the top $k$ events, i.e., the $k$ most likely events given the current context. If the actual event falls into the top $k$ events, it is regarded as a normal event. Otherwise, the actual event is considered unlikely to happen, which indicates that an anomaly is detected. This stage is fully unsupervised, since no anomaly labels are required for training.
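A minimal sketch of stage-1 under the preceding description follows: the attention matrix weights the embedded context events, a main branch predicts the event type, per-attribute branches contribute auxiliary losses, and a top-k rule flags anomalies. Layer widths, the value of k, and the helper names are illustrative assumptions.

```python
# Sketch of stage-1 (unsupervised): weight the context, predict the event
# type plus each attribute, and flag an anomaly with the top-k rule.
# Layer widths, k, and helper names are illustrative assumptions.
import torch
import torch.nn as nn

def weighted_context(alpha, emb):
    # Multiply the (batch, n, m) attention matrix with the embedded context
    # events (batch, n, m, d) and pool into one weighted context vector.
    return torch.einsum("bnm,bnmd->bd", alpha, emb)

class EventDecoder(nn.Module):
    def __init__(self, d_model=128, n_event_types=100, attr_sizes=(8, 4, 6, 2)):
        super().__init__()
        # Main branch: distribution over all event types.
        self.main = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, n_event_types))
        # Auxiliary branches: one predictor per attribute, used for the loss.
        self.branches = nn.ModuleList(nn.Linear(d_model, s) for s in attr_sizes)

    def forward(self, w_ctx):                     # w_ctx: (batch, d_model)
        return self.main(w_ctx), [b(w_ctx) for b in self.branches]

def is_anomaly(event_type_logits, actual_type, k=10):
    # Normal if the actual event is among the k most probable event types.
    topk = torch.topk(event_type_logits, k, dim=-1).indices
    return not bool((topk == actual_type).any())
```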
2) Stage-2: Supervised Detection: After the cold start, some anomaly events may be detected and further confirmed by human analysts, so that anomaly labels become available. Considering the scarcity of anomaly samples and the huge manual effort required for anomaly investigation, we adopt transfer learning by transferring the context knowledge learned for event prediction to anomaly prediction. The rationale for this design lies in two similarities. First, some types of events are highly likely to be anomalies. Second, no matter what the prediction goal is, given a series of preceding and postceding events, the specific events composing an activity are determined. For example, given a set of events: i. input the username; ii. check the privacy policy; iii. follow a Twitter account; iv. enter the password; and v. enter the one-time verification code, it is easy to identify that i, iv, and v belong to a logon activity.

To retain such similarity and transfer from event prediction to anomaly prediction, we use a small amount of data labeled as anomalous or normal to fine-tune the model. Specifically, we freeze the context encoder and attention decoder modules (i.e., retain their parameters) to keep the context information among attributes as well as events, and fine-tune the rest of the model using labeled data for anomaly prediction.

More specifically, as shown in Fig. 2 with dotted lines, a set of linear layers is added, taking as input the concatenation of the pre-confidence features of each branch and the weighted context vector after the first linear layer. The adoption of the latter is inspired by ResNet [21] to retain more original information. After the Softmax, the final prediction is binary, i.e., whether there is an anomaly or not. Note that this stage is optional if no anomaly labels are available. However, our experimental results show that the additional supervised learning stage can improve the performance of anomaly detection.
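A minimal sketch of this transfer step, reusing the names from the sketches above, is given below; the feature widths of the new head are assumptions, while the freezing of the encoder and attention decoder follows the description directly.

```python
# Sketch of stage-2: freeze the context encoder and attention decoder,
# then fine-tune a small binary head on labeled data. The head widths
# are assumptions; module names follow the earlier sketches.
import torch.nn as nn

def to_supervised(encoder, attn_decoder, d_model=128, n_branch_feats=120):
    # Retain the learned context knowledge by freezing both modules.
    for module in (encoder, attn_decoder):
        for p in module.parameters():
            p.requires_grad = False
    # New head: concatenates the per-branch pre-confidence features with the
    # weighted context vector (a ResNet-style skip) and predicts anomaly/normal.
    return nn.Sequential(
        nn.Linear(n_branch_feats + d_model, d_model), nn.ReLU(),
        nn.Linear(d_model, 2), nn.Softmax(dim=-1))
```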
III. EVALUATION

A. Implementation

Inspired by DeepCASE [18], we develop DeepEAD with extensive extensions to address complex correlations among multiple attributes of different context events, generate fine-grained attention, and integrate a two-stage multi-branch classifier. In total, we construct DeepEAD with 2K new LoC in Python and PyTorch.

B. Experimental Setup

Dataset. We evaluate DeepEAD on a real-world dataset collected from LANL's internal computer network during 58 consecutive days [17]. The LANL dataset includes large-scale logs from multiple sources, i.e., authentication, process, network flow, and DNS lookup events, and each log represents a single event. In the first 30 days of data, some authentication events have been labeled as redteam compromise events. Therefore, we perform our experiments on the authentication logs and use these redteam events as the ground-truth labels of anomalous behavior.

Runtime Environments. All the experiments are conducted on an Ubuntu 20.04 server with an Intel i7-7700K CPU running at 4.20 GHz, 64 GB of RAM, and an NVIDIA TITAN Xp GPU. The deep learning architecture is built on the NVIDIA CUDA Toolkit 11.6 and cuDNN v7.6.0.

Evaluation Metrics. We adopt the area under the receiver operating characteristic (AUROC) to measure the ability of an anomaly detection system. The ROC curve plots the true positive (TP) rate against the false positive (FP) rate at various threshold settings. For both unsupervised learning and supervised learning (i.e., binary classification), TPs are predicted anomalies that are actual redteam events, whereas FPs are predicted anomalies that are actual non-redteam events.
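As a small illustration of this metric, one could score each event by how unlikely the model finds it (e.g., one minus the probability assigned to the actual event type) and compare the scores against the redteam ground truth; the scikit-learn call below is an assumption, not part of the paper's tooling.

```python
# Sketch of the AUROC metric on toy data. scikit-learn usage is an assumption.
from sklearn.metrics import roc_auc_score

def auroc(anomaly_scores, redteam_labels):
    # redteam_labels: 1 for redteam (anomaly), 0 for normal events.
    return roc_auc_score(redteam_labels, anomaly_scores)

print(auroc([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0]))  # -> 1.0 on this toy ranking
```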
C. Performance Evaluation

For a fair comparison, we adopt the same training and test datasets as those of existing works. Specifically, to compare with existing RNN-based works [11], we train our model on the first 12 days of data, with 133M normal events and 316 redteam events, and conduct the test on the following 18 days of data, composed of 210M normal events and 385 redteam events. We also compare our DeepEAD with a state-of-the-art (SOTA) work with attention mechanisms [14] and the baseline work named DeepCASE [18], where the training phase is performed on the Day 7 data, with 9M normal events and 1 redteam event, and the test is on the Day 8 data, with 9M normal events and 261 redteam events. Note that, since the SOTA work is unsupervised and its adopted training set contains only 1 anomaly sample, which is insufficient for supervised training, we use only the unsupervised stage of DeepEAD for comparison.

TABLE I: Comparison results with existing anomaly detection methods [11].

  Method               AUROC   Attention   Explainability
  EM                   0.932   ×           ×
  BiEM                 0.895   ×           ×
  Tiered-EM            0.948   ×           ×
  Tiered-BiEM          0.902   ×           ×
  DeepEAD (stage-1)    0.939   ✓           ✓
  DeepEAD (stage-2)    0.958   ✓           ✓

TABLE II: Comparison results with existing attention-based anomaly detection methods [14], [18].

  Method          AUROC   Explainability   Fine Granularity
  EM-fixed        0.976   ✓                ×
  EM-syntactic    0.975   ✓                ×
  EM-semantic1    0.980   ✓                ×
  EM-semantic2    0.976   ✓                ×
  DeepCASE        0.920   ✓                ×
  DeepEAD         0.971   ✓                ✓

Table I shows the comparison results with a series of existing RNN-based anomaly detection methods [11], including the simple Long Short-Term Memory (LSTM)-based Event Model (EM), Bidirectional EM (BiEM), Tiered-EM, and Tiered-BiEM. The AUROC of stage-1 of DeepEAD (unsupervised detection) is 0.939, which outperforms all other methods except Tiered-EM (0.948). Stage-2 further increases the AUROC to 0.958. Moreover, these other methods do not incorporate attention mechanisms, so they cannot provide explainability. By contrast, in DeepEAD, the attention matrix multiplied with the context events enables DeepEAD to explain the importance of each context event for the detection results.

Furthermore, we compare the proposed DeepEAD with SOTA attention-based approaches, including EM with fixed, syntactic, and two different semantic attention mechanisms. As shown in Table II, the AUROC of DeepEAD is comparable with the attention-based EM models. More importantly, the attention in the EM models cannot guarantee concrete explainability. The reason is that the attention in EM models (e.g., LSTM) is applied to a complex combination of multiple inputs (e.g., previous hidden states). Instead, the attention in DeepEAD and DeepCASE is directly applied to each simple attribute in the context events, as introduced in Section II-C. In addition, although DeepCASE adopts a similar attention mechanism, it fails to address the complex information between attributes and cannot provide any insights on importance at attribute granularity. Therefore, compared with these works, DeepEAD provides superior one-to-one fine-grained explainability without sacrificing too much detection performance.
Fig. 3: Average attention weights over different attributes in each context event. [Plot: attention weights (relevancy) on the vertical axis over context events on the horizontal axis.]

D. Explainability Analysis

Since each value in the attention matrix of DeepEAD is directly applied to the context events, it enables us to explain the relevancy of each context event and attribute to the prediction. In this subsection, we analyze the explainability of our models from two perspectives: i) the general model behavior, by analyzing the attention weights over all testing logs; and ii) specific decisions of the model, by studying the attention weights in individual cases.

1) Explainability in General: To get an overview of the model behavior during prediction, we consider the statistics of all attention weights over the test samples in the 18 days of data. In Fig. 3, the solid lines show the attention weights for each context event (including 10 preceding and 10 postceding events listed in chronological order) and the shadows denote the standard deviation. In particular, each attribute (i.e., authentication type, logon type, authentication orientation, and success/failure) is represented with a different color.

In general, the context events that happen near the target event in time (e.g., the nearest preceding/postceding events pre1 and post1) are shown to receive more attention than others, which is consistent with our assumption that the nearest events offer more context information and are more relevant to the target event. Among the different attributes, the authentication types in neighboring context events are more important than others. We can also see that the attention weights for authentication orientation and success/failure of the neighboring preceding events (e.g., pre1) are higher than those of the neighboring postceding events (e.g., post1), while those for logon type show the opposite pattern. This indicates that the orientation and success/failure of the nearest preceding events contribute more to the target event, and the logon type of the next postceding events is more affected by the target event.

2) Explainability in Individual Cases: Attention weights can help explain the importance of the corresponding attributes and context during the model prediction. We use heatmaps to visualize the attention weights over the attributes of different context events. Fig. 4 shows a heatmap for a real-world normal case in our test dataset, where lighter colors denote higher attention weights. Each column represents a context event, from left to right in time order. For instance, the first column illustrates the earliest context event (pre1): Unknown Network LogOff Success. The first four rows correspond to the four attributes in the system logs. Each cell exhibits the attention weight on the current attribute value, and the sum of the attention weights in the first four rows of each column is shown in the last row to indicate the importance of each context event. We can find that the nearest context events in this case get higher attention weights, and the authentication type of the nearest preceding and postceding events receives the most attention, which aligns with the trend of the attention statistics depicted in Fig. 3. An Unknown Network LogOff Success (pre1) after three Kerberos Network LogOn Success events brings valuable context to the target event. Also, although both the nearest preceding and postceding events are Unknown Network LogOff Success (pre2, pre3, pre4), different attention weights are given: the authentication type in post1, the logon type in pre1, and the orientation type in post1 are more important than the corresponding ones in post1, pre1, and post1, and the whole post1 event contributes more than the pre1 event during the decision making.

Fig. 4: One example of the heatmap showing attention weights on a normal case.

Even though more attention weight is more likely to be given to the nearest context events, the weights learn to differ across cases. Fig. 5 presents an anomaly case where the fifth/sixth postceding events and the fourth/fifth preceding events are the most relevant to the predicted event, while the nearest preceding event is not as important as in the general case. This may be because there are repeated NTLM Network LogOn Success and Unknown Network LogOff Success events around the target event. When the occurrence of such a pattern exceeds a threshold (e.g., twice in this case), it is brought to the attention of the model. Among all attributes in this case, the orientation types usually gain more attention weight, but some specific values (e.g., logon type Network in post1) can also receive high attention, which indicates that they contribute more to the prediction of the target event.

Fig. 5: One example of the heatmap showing attention weights on an anomaly case.
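For readers who want to reproduce this kind of view, a minimal sketch of such a heatmap is given below: rows are the four log attributes plus their per-event sum, and columns are context events in time order. The matplotlib usage and function name are assumptions, not the figure-generation code from the paper.

```python
# Sketch of the attention heatmap used in this subsection: rows are the
# four log attributes (plus their per-event sum), columns are context
# events in time order. matplotlib usage here is an assumption.
import numpy as np
import matplotlib.pyplot as plt

def plot_attention(alpha, attr_names, event_names):
    """alpha: (n_events, m_attrs) attention matrix for one prediction."""
    grid = np.vstack([alpha.T, alpha.sum(axis=1)])   # last row: event totals
    fig, ax = plt.subplots()
    ax.imshow(grid, cmap="viridis")                  # lighter = higher weight
    ax.set_yticks(range(len(attr_names) + 1),
                  labels=attr_names + ["event total"])
    ax.set_xticks(range(len(event_names)), labels=event_names, rotation=90)
    fig.tight_layout()
    plt.show()
```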
IV. RELATED WORKS

Machine learning has been widely used for detecting anomalies, to cope with the huge amount of log data generated by modern systems and the complicated contexts among the logs. A series of RNN-based systems [9]–[11] are proposed to predict future events from the previous log sequence. Similar to our stage-1, they flag an anomaly if a low probability is assigned to the ground-truth event. The authors in [12] and [13] capture the context relations among different logs with authentication or heterogeneous graphs and then apply a traditional logistic regression or clustering algorithm. Their methods rely on pre-defined rules to construct the graph, which is hard to adapt to other system logs.

ALEAP [22] is one of the earlier works that incorporate an attention mechanism into LSTM-based event prediction for anomaly detection, but it does not leverage attention for explanation. Further, the works of [14] and [15] try to use attention weights to explain the prediction behaviors.
However, these attentions are applied over a complex combination of multiple inputs (e.g., hidden states), not over each attribute token, so their explainability is controversial [16], [23]. While DeepCASE [18] mitigates this by applying a one-to-one mapping between attention and context events, it assumes the inputs are well-represented events and does not consider complex relations among the multiple attributes of system logs.

V. CONCLUSION

In this work, we have presented DeepEAD, an attention-equipped encoder-decoder architecture for explainable anomaly detection from system logs. A Transformer-based encoder is adopted to address complex relations among attributes in multiple context events. A multi-attribute attention decoder is designed to generate fine-grained attention weights so as to enable concrete explainability for each context event attribute. During the cold start, we have applied an unsupervised learning-based event decoder for event prediction, where an anomaly is detected if the actual event is predicted not to happen. When anomaly labels are available, we apply transfer learning to fine-tune a binary classifier for anomaly prediction.

Experimental evaluation on a large-scale real-world dataset has shown that DeepEAD achieves comparable performance with state-of-the-art works. Additionally, the explainability analysis on context events has demonstrated the effectiveness of DeepEAD in facilitating human investigation.

REFERENCES

[1] M. Du et al., "Lifelong anomaly detection through unlearning," in Proc. of the ACM SIGSAC Conf. on Computer and Commun. Security, pp. 1283–1297, 2019.
[2] F. B. Kokulu et al., "Matched and mismatched SOCs: A qualitative study on security operations center issues," in Proc. of the ACM SIGSAC Conf. on Computer and Commun. Security, pp. 1955–1970, 2019.
[3] FireEye, "M-Trends 2021: Cyber Security Insights." https://vision.fireeye.com/editions/11/11-m-trends.html.
[4] D3 Security, "The Time for SOAR is Now." https://d3security.com/blog/the-time-for-soar-is-now/.
[5] Redscan, "Overcoming cyber security alert fatigue." https://www.redscan.com/news/overcoming-cyber-security-alert-fatigue/.
[6] DEMISTO, "The State of SOAR Report, 2018." https://start.paloaltonetworks.com/the-state-of-soar-report-2018.
[7] L. Bilge, Y. Han, and M. Dell'Amico, "Riskteller: Predicting the risk of cyber incidents," in Proc. of the ACM SIGSAC Conf. on Computer and Commun. Security, pp. 1299–1311, 2017.
[8] Y. Liu et al., "Cloudy with a chance of breach: Forecasting cyber security incidents," in 24th USENIX Security Symposium (USENIX Security 15), pp. 1009–1024, 2015.
[9] M. Du et al., "DeepLog: Anomaly detection and diagnosis from system logs through deep learning," in Proc. of the ACM SIGSAC Conf. on Computer and Commun. Security, pp. 1285–1298, 2017.
[10] Y. Shen, E. Mariconti, P. A. Vervier, and G. Stringhini, "Tiresias: Predicting security events through deep learning," in Proc. of the ACM SIGSAC Conf. on Computer and Commun. Security, pp. 592–605, 2018.
[11] A. R. Tuor et al., "Recurrent neural network language models for open vocabulary event-level cyber anomaly detection," in Workshops at the Thirty-Second AAAI Conf. on Artificial Intelligence, 2018.
[12] F. Liu et al., "Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise," in Proc. of the ACM SIGSAC Conf. on Computer and Commun. Security, pp. 1777–1794, 2019.
[13] B. Bowman, C. Laprade, Y. Ji, and H. H. Huang, "Detecting lateral movement in enterprise computer networks with unsupervised graph AI," in 23rd Int. Symp. on Research in Attacks, Intrusions and Defenses (RAID 2020), pp. 257–268, 2020.
[14] A. Brown et al., "Recurrent neural network attention mechanisms for interpretable system log anomaly detection," in Proc. of the First Workshop on Machine Learning for Computing Systems, pp. 1–8, 2018.
[15] A. Patil et al., "Explainable LSTM model for anomaly detection in HDFS log file using layerwise relevance propagation," in 2019 IEEE Bombay Section Signature Conf. (IBSSC), pp. 1–6, IEEE, 2019.
[16] S. Jain and B. C. Wallace, "Attention is not explanation," arXiv preprint arXiv:1902.10186, 2019.
[17] A. D. Kent, "Cybersecurity data sources for dynamic network research," in Dynamic Networks in Cybersecurity, Imperial College Press, 2015.
[18] T. van Ede et al., "DeepCASE: Semi-supervised contextual analysis of security events," in IEEE Symposium on Security and Privacy, 2022.
[19] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[20] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.
[21] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[22] S. Fan et al., "ALEAP: Attention-based LSTM with event embedding for attack projection," in Int. Performance Computing and Commun. Conf. (IPCCC), pp. 1–8, IEEE, 2019.
[23] S. Wiegreffe and Y. Pinter, "Attention is not not explanation," arXiv preprint arXiv:1908.04626, 2019.
