Complex Event
Complex Event
Complex Event
net/publication/343994577
CITATIONS READS
0 713
1 author:
Ariane Ziehn
Deutsches Forschungszentrum für Künstliche Intelligenz
6 PUBLICATIONS 12 CITATIONS
SEE PROFILE
All content following this page was uploaded by Ariane Ziehn on 30 September 2020.
Ariane Ziehn
Supervised by Volker Markl and Steffen Zeuch
Technical University of Berlin,
German Research Center for Artificial Intelligence
ariane.ziehn@dfki.de
cloud
cloud
real-time decision making in data management systems. To-
day, applications leverage CEP only in cloud-based envi-
ronments to provide prompt reactions, although data are 4
fog
Things (IoT) will increase the number of data producers 1
3
sensor network
sensor network
processing introduces a critical bottleneck in current cloud-
based solutions, especially for delay-sensitive IoT applica-
tions. To overcome this bottleneck, fog computing emerged
as a paradigm to process data close to the network edge. (a) (b)
However, CEP systems are not yet ready to leverage the fog Figure 1: Data Flows in (a) Cloud Environments and (b)
layer as an extension for cloud-based stream processing. In Fog-Cloud Environments.
this paper, we examine how the current system has to adapt
to exploit the new capabilities of the fog for CEP. To this
end, we analyze principal CEP methodologies and propose devices centrally 1 . Usually, the cloud environment uses a
new solutions. With this work, we lay the foundation for data broker like Kafka for buffering data. Then, the cloud-
large-scale in-network CEP applications on top of the IoT. based SPE utilizes its almost unlimited resources to process
the collected data in the cloud 2 . However, the central data
1. INTRODUCTION collection before processing is highly resource-intensive and
causes significant delays as well as network overhead. Thus,
Complex Event Processing (CEP) is a common method
cloud environments introduce a critical bottleneck for future
for real-time stream processing to detect sequences of events
IoT data management systems that must handle millions of
in data streams and triggers actions upon detection [7, 14,
data streams [15, 22]. On the right-hand side, the fog-cloud
16]. User-defined rules specify both the events and the ac-
environment introduces an additional fog layer with a typ-
tions that enable autonomous real-time decision making in
ical tree-like network topology 3 . The fog nodes M pro-
a wide range of applications, e.g., traffic congestion moni-
toring, live maps, intelligent transportation systems, smart cess and reduce data on the paths through the network 4 .
street lamps, or vehicle pollution control [1, 8, 24]. Sev- Hence, they mitigate the central bottleneck and release net-
eral cloud-based stream processing engines (SPEs) [4, 10, work capacities for low-latency real-time responses.
21] provide CEP for rule-based monitoring as the current We argue that future IoT applications require a fog layer
solution for the Internet of Things (IoT) scenarios with low- to process millions of data streams efficiently. Therefore,
latency real-time response requirements. Under considera- it is crucial to leverage fog environments for CEP and its
tion of the constant increase of IoT devices, future IoT appli- capabilities to enable future IoT applications with millions of
cations will process data from potentially millions of devices. devices and thousands of rules. However, fog environments
Thus, cloud-based solutions are not capable of fulfilling the introduce new challenges for CEP, e.g., stateful in-network
low-latency real-time response requirements due to the mas- processing on low-end devices, changing network topologies,
sive amount of data generated in these future IoT applica- and mobile IoT devices. We investigate how to tackle these
tions [12, 22]. Zeuch et al. [22] address this problem and challenges in order to leverage the fog layer efficiently for
propose a fog-cloud environment that leverages a fog layer CEP. In this paper, we make the following contributions to
as extension of the cloud. Figure 1 presents the data flow in enable the IoT for CEP:
both (a) cloud and (b) fog-cloud environment. On the left- • We analyze state-of-the-art CEP systems and identify
hand side, the cloud environment collects data of connected the major limitations to leverage the IoT as a mean
for large-scale, in-network CEP.
Proceedings of the VLDB 2020 PhD Workshop, August 31st, 2020. Tokyo,
Japan. Copyright (C) 2020 for this paper by its authors. Copying permitted • We highlight three concrete problems and sketch pos-
for private and academic purposes. sible solutions to enable the IoT for CEP.
full match
• We outline possible improvements to our approaches, ignore: ignore: ignore: T1,T2,T3
which extend state-of-the-art CEP processing in the T2,T3 T1,T3 T1,T2
vents CEP from leveraging the IoT and outline three con- (a) NFA.
crete problems in Section 3. In Section 4, we summarize (b) Tree.
related work and conclude our findings in Section 5. Figure 2: Evaluations Models for Pattern Detection: NFA
(a) and Tree Structure (b).
2. RESEARCH CONTEXT
In this section, we introduce the concepts of rules and event types Tn , including a condition cTn for each type. We
event patterns. Afterward, we show the state-of-the-art pat- use SASE+ [23] to specify the pattern in Listing 1.
tern evaluation and optimization mechanism for CEP.
Rules: Rules build the knowledge base of the CEP engine Listing 1: Vehicle Pollution Control Example
and are used in active database systems for autonomous re- PATTERN SEQ (T1 e1 , T2 e2 , T3 e3 )
actions on data manipulations. For this purpose, the user WHERE ( e1 . vehicleCount > cT1 AND
specifies rules with up to three components, i.e., event e, e2 . speedLevel < cT2 AND
condition c, and action a, which lead to the name ECA- e3 . po llutionL evel > cT3 )
WITHIN TIME . MINUTES (30)
rules. Each tuple manipulation represents an event e that
might trigger the user-desired action a if e matches the Pattern Evaluation Mechanisms: To detect match-
relation-specific or inter-event conditions c. Actions are ing events, SPEs create a pattern detection plan given the
application-specific and can be user notifications such as internal pattern representation. Common detection plans
warnings and alerts or autonomous system actions such as are tree-based plans [17], order-based plans with state ma-
updating attribute values or triggering other rules [18]. chines [23], e.g., non-deterministic finite automaton (NFA),
Data Streams: We want to detect those rules in data as well as event processing networks and graphs [6, 11]. We
streams, where each data stream E is a continuous and un- focus on the research lines with the highest number of repre-
bounded flow of data tuples. Each generated tuple is an sentatives, tree-based and order-based evaluation plans [9].
event e that represents the state of its producer, e.g., an IoT Naive approaches of both lines represent the pattern iden-
device, at a specific point in time ts. Due to the increasing tical to the user-given formulation and create one instance
amount of geographically distributed IoT devices, the device of the evaluation structure for each detected event. The de-
location s is another essential tuple attribute. To sum up, tection of a complex pattern subsequence is denoted as a
each event source produces a stream E of events en with its partial match, while the detection of a complete pattern is
set of attributes E = {ts, s, a1 , ..., an }, where ts and s are called a full match [13]. The example evaluation structures
spatiotemporal attributes, and an are non-spatiotemporal of an NFA in Figure 2a represents each partial match in
attributes, e.g., measurements and identifiers [3]. one state qn and a full pattern match in the final state F .
Event Patterns: In SPEs, users define so-called simple The tree-based mechanism creates a leaf for each simple pat-
event patterns [4] as a complement to traditional ECA-rules. tern (Figure 2b). Each intermediate node presents partial
A simple pattern is a he, ci pair for one event type T . Event matches and the root a full match.
types are the replacement of relations in stream processing CEP belongs to the group of stateful query processing
and provide a uniform schema T = {ts, s, a1 , ..., an } for a methods as SPEs need to store all partial matches either
group of contributing streams En [20, 24]. For instance, all until the next partial match is detected or the window is
sensors that measure the temperature at different locations expired. The number of partial matches is influenced by
contribute to the type Ttemp . All arriving events are mon- many factors, e.g., query selectivity or event frequency, but
itored with the event type-specific conditions cTn , and an worst-case scenarios have exponential growth [14]. There-
action a is triggered if a matching event is detected. fore, pattern detection plan optimization aims to reduce the
Pattern Specification Language: The user formulates number of partial matches, e.g., by rewriting of single pat-
complex patterns of monitoring tasks as a composition of terns or sharing techniques for multi-pattern CEP [14].
simple patterns from different event types. The relation- An effective rewriting method is called Lazy NFA [13]. It
ships between simple patterns are defined by event opera- processes the pattern out-of-order by putting rare events in
tors, which are either logical, e.g., AN D, OR, or temporal, front of the pattern sequence. Frequent events are buffered
e.g., the window operator W IT HIN or the sequence oper- and only analyzed after the rare event has been detected.
ator SEQ, which defines the order of simple patterns [20]. A new line of research focuses on the translation of pat-
A wide range of pattern languages with different sets of event terns into multi-join queries. Thereby, patterns can be eval-
operators exists, e.g., SQL-like languages such as SASE+ [20, uated as stream queries and leverage existing join query op-
23] and CCL [24], or languages based on event logic, e.g., timizations [13]. For instance, we could rewrite the example
CEL [6]. An example pattern for a vehicle pollution control pattern from Listing 1 (excluding W IT HIN ) by replacing
application can be formulated as follows: Detect if within the SEQ operator with the AN D operator and adding ad-
30 minutes an increased amount of vehicles (T1 ) is followed ditional inter-event constraints, as shown in Listing 2 [13,
by a decrease of the average speed (T2 ), which leads to an 24]. By replacing the temporal operator with a logical al-
excess of the air pollution level (T3 ). The example defines ternative, the pattern can be translated into a join query
a sequence of three consecutive simple patterns for different and profit from join-query optimizations.
Listing 2: CEP Traffic Pattern Example where all data is centrally available, can be evaluated by
PATTERN AND (T1 e1 , T2 e2 ,T3 e3 ) accuracy metrics [9]. To this end, we can identify promising
WHERE ( e1 . vehicleCount > conditionT1 AND in-network evaluation mechanism for fog environments and
e2 . speedLevel < cT2 AND optimize them further using our evaluation results in the
e3 . pol lutionL evel > conditionT3 AND following step of our research agenda.
e1 . ts < e2 . ts AND
e2 . ts < e3 . ts )
3.2 Optimization of Evaluation Mechanisms
Storing and maintaining large amounts of partial matches
3. LEVERAGE IOT FOR CEP is already a significant challenge in cloud environments with
As opposed to fog-cloud environments, state-of-the-art almost unlimited resources capacities. For fog environments,
SPEs process a global union of all sources En as one large this challenge is even more critical as they contain low-end
stream. This processing strategy causes delays for mod- devices with limited capabilities but need to deal with par-
ern IoT applications because it enforces the central data tial matches of possibly hundreds of patterns. The hetero-
collection from millions of sensors before processing. Fog- geneous hardware in fog environments lead to the second
cloud environments allow the processing of individual sen- problem we want to tackle:
sor streams E or subsets of event types T close to the data Problem II: Cloud-based optimization techniques do not
producers in the fog layer. Thus, this layer allows data re- consider the limitations and challenges of unreliable and mov-
duction of more than 80% for stream queries, which reduces ing low-end devices and dynamic network topology.
network traffic and enables the system to handle the data Both optimization techniques, rewriting of single patterns,
from millions of devices with low-latency [22]. and sharing techniques for multi patterns (Sec. 2), aim to
Research Goal: We aim to leverage fog environments improve the pattern detection plan in order to reduce par-
for CEP and provide a solution that fulfills the low-latency tial matches. Cloud-based implementations of these strate-
and real-time response requirements of future IoT applica- gies make assumptions that do not hold in fog environ-
tions. To this end, we investigate the core features of CEP: ments. First, cloud-based SPEs examine the NP-complete
pattern evaluation mechanisms, their optimization, and pat- problem of calculating one optimal pattern detection plan
tern specification languages. for their static network [7, 13]. In a dynamically chang-
ing network, the existing solutions would cause the expen-
3.1 Pattern Evaluation Mechanisms sive re-computation of new optimal plans after each topol-
Efficient CEP requires a high-performance pattern eval- ogy change. Second, existing distributable solutions rarely
uation mechanism. Currently, available evaluation mecha- consider heterogeneous hardware and resource limitations of
nisms optimized for cloud-based environments could be ap- low-end devices for distribution strategies [2, 19]. Third, to
plied to fog-cloud environments, yet without leveraging the enable the rewriting technique out-of-order pattern detec-
fog layer, bottlenecks of cloud solutions would remain. Thus, tion, we need to store event buffers for retrospective pattern
the first problem we want to tackle in this work is: evaluation. Storing potentially high frequent events from
Problem I: Common cloud-based pattern evaluation mech- hundred of producers challenges again the capacity limits of
anisms use a central component for data processing, pattern low-end devices and requires data compression.
detection monitoring, or both. This central component pre- Solution Sketch: With this work, we want to iden-
vents leveraging a fog environment without additional in- tify optimization techniques for distributed pattern evalu-
network distribution strategies. ation on mobile low-end devices for hundreds of concur-
Opposed to the cloud paradigm, fog environments allow us rently running patterns. First, we intend to prevent the NP-
to tailor the data to the relevant only on the paths through complete problem of finding one optimal solution and inves-
the network. As data is only shared with nodes on the net- tigate strategies that identify sets of possible near-optimal
work path, the pattern detection plan needs to be aware of pattern detection plans. These plans can be used as avail-
the fog nodes that receive the relevant data to execute sub- able fall-backs in case of topology changes. Second, in a
plans. Further, by running sub-plans on fog nodes, we need fog-cloud environment, we can leverage the cloud as coor-
to consider that CEP is a stateful processing method that dinator for pattern maintenance and distribution. By dis-
needs to store partial matches on low-end devices. tributing stateful computation tasks of pattern evaluation
Solution Sketch: We intend to identify promising eval- to potentially mobile devices, an additional challenge ap-
uation mechanisms for distributed pattern detection and pears: pattern evaluation might not be possible due to data
bring them together with the fog paradigm and distribution producers. In this case, the user must be informed that
strategies. Since no general pattern evaluation mechanism currently, either monitoring is not possible or the results
with explicit performance guarantees exits, the selection of are probabilistic, e.g., derived from nearby sensors. Third,
one research line for fog environments is not straight forward we want to investigate efficient compression techniques, e.g.,
and requires an experimental evaluation. To this end, we partial aggregations [5], for event buffers on low-end devices
consider all three approaches, order-based, tree-based, and to enable out-of-order pattern detection. Furthermore, the
pattern translation into multi-join queries (Sec. 2), as possi- translation of patterns into multi-join queries enables an-
ble candidates. As the next step of our research agenda, we other potential optimization strategy. In essence, we can
intend to implement a naive distributed solution for each of combine both stream queries and pattern detection in one
the three pattern evaluation mechanisms, including the nec- engine for optimization and maximize results sharing in fog-
essary adaptions to leverage the fog layer. Afterward, we cloud environments. To this end, we leverage traditional
can compare our implementations using stream processing CEP optimization strategies for low-end devices in a dy-
metrics, e.g., forward delays for matches. Additionally, the namic network and herewith enable efficient CEP for mil-
accuracy of our result in comparison with cloud solutions, lions of devices and thousands of patterns.
3.3 Pattern Specification Language tor placement for non-hierarchical SPE. Multi-sink operator
Similar to evaluation mechanisms, no general pattern spec- placement is a general stream processing problem and thus
ification language exists, so no comprehensive set of event a complementing feature of our solution.
operators is available. Further, some languages lack formal
semantics, provide limited expressiveness, or prevent auto- 5. CONCLUSION
mated optimization [6, 19, 20]. Besides, these languages are In this paper, we introduce and motivate our goal to en-
designed and optimized for single machines or cloud appli- able efficient CEP for IoT data management systems in fog-
cations without considering future IoT applications. Thus, cloud environments. We review the state-of-the-art solu-
the third problem we want to tackle is: tions, identify their problems to leverage the fog layer, and
Problem III: Existing specification languages lack essen- suggest possible solutions. Further, we propose the follow-
tial event operators because they were initially not intended ing three steps to reach our goal: (I) identify and evaluate
for IoT applications. Further, many of them introduce re- appropriate in-network pattern evaluation mechanisms for
strictions that negatively impact efficient distributed CEP fog-cloud environments, (II) leverage and adapt optimiza-
optimizations. tion techniques for these mechanisms that fit the properties
We want to enable the formulation of complex patterns of low-end devices, and (III) build a pattern specification
for IoT applications by identifying an easy-to-use specifica- language for the IoT with CEP operators that leverage the
tion language with the necessary set of event operators. For fog layer. Next, we focus on Problem (I) and the imple-
Problem I and II, we focus on the most common set of event mentation of evaluation mechanisms, including adaptions,
operators, i.e., AN D, OR, N OT , SEQ [6, 13] and extend to leverage the fog layer. Then, we use this baseline to op-
it in this step of our research agenda. timize our solution.
Solution Sketch: Giatrakos et al. [11] reviewed the pat-
tern specification languages of several CEP systems accord- 6. ACKNOWLEDGMENTS
ing to their expressiveness, including the Big Data SPE
Apache Flink [4]. In contrast to other SPEs such as Apache This work was partly supported by the German Federal
Storm [10] or Spark [21], Flink provides built-in support for Ministry for Economic Affairs and Energy (BMWi) through
CEP [11] and additional operators compared to traditional the KI-SIGS – KI-Space for intelligent health systems (grant
single-machine CEP systems. However, Flink does not offer no. 01MK20012P). Furthermore, we thank Holmer Hemsen
a specification language but provides an API with low-level for the valuable input and discussions.
functions. To this end, we use its pattern API as a baseline
for our operator set and investigate how these operators can 7. REFERENCES
leverage a fog layer. Further, we build a pattern specification [1] A. Ahmed, H. Arkian, D. Battulga, and et al. Fog computing applications:
Taxonomy and requirements. arXiv preprint:1907.11621, 2019.
language on top of this operator set under the consideration [2] M. Akdere, U. Çetintemel, and N. Tatbul. Plan-based complex event
detection across distributed sources. VLDB Endowment, pages 66–77, 2008.
of other leading languages, e.g., ZStream [17]. [3] S. Akili. On the need for distributed complex event processing with
multiple sinks. In DEBS, pages 248–249, 2019.
[4] A. Alexandrov, R. Bergmann, S. Ewen, and et al. The stratosphere
4. RELATED WORK platform for big data analytics. The VLDB Journal, pages 939–964, 2014.
[5] L. Benson, P. M. Grulich, S. Zeuch, and et al. Disco: Efficient distributed
In this section, we summarize the state of the art of related window aggregation. In EDBT, 2020.
[6] M. Bucchi, A. Grez, C. Riveros, and M. Ugarte. Foundations of complex
work and highlight the major differences to our approach. event processing. arXiv preprint:1709.05369, 2017.
[7] J. Chen, L. Ramaswamy, D. K. Lowenthal, and et al. Comet:
CEP Optimization: Kolchinsky and Schuster [13] proved Decentralized complex event detection in mobile delay tolerant networks.
that the pattern detection plan could be translated into In IEEE, pages 131–136, 2012.
[8] W. Fengjuan, Z. Xiaoming, and et al. The research on complex event
multi-join queries and leverage join-query optimization. We processing method of internet of things. In ICMTMA, pages 1219–1222.
IEEE, 2013.
utilize this result and consider multi-join queries as one pos- [9] I. Flouris, N. Giatrakos, and et al. Issues in complex event processing:
sible evaluation mechanism for fog-cloud environments. An- Status and prospects in the big data era. JSS, pages 217–236, 2017.
[10] A. S. Foundation. Apache storm, 2012. Accessed January
other novel approach proposed by Kolchinsky and Schus- 2020: https://storm.apache.org/.
ter [14] is the combination of rewriting and prefix sharing to [11] N. Giatrakos, E. Alevizos, A. Artikis, and et al. Complex event recognition
in the big data era: a survey. The VLDB Journal, pages 313–352, 2020.
optimize multi-pattern CEP. We intend to leverage a subset [12] M. Hung. Leading the iot, gartner insights on how to lead in a connected
world. Gartner Research, pages 1–29, 2017.
of these techniques for our approach. [13] I. Kolchinsky and A. Schuster. Join query optimization techniques for
In-network CEP: Madumal et al. [16] proposed a tree- complex event processing applications. VLDB, pages 1332–1345, 2018.
[14] I. Kolchinsky and A. Schuster. Real-time multi-pattern detection over
based pattern evaluation approach for CEP with a rule en- event streams. In MOD, pages 589–606. ACM, 2019.
gine that schedules the events either to a local CEP engine [15] J. Lin, W. Yu, N. Zhang, and et al. A survey on internet of things:
Architecture, enabling technologies, security and privacy, and
of a fog node or the Cloud CEP engine. As opposed to applications. IoT, pages 1125–1142, 2017.
[16] M. P. Madumal and et al. Adaptive event tree-based hybrid cep
their approach, we focus on pattern evaluation in multiple computational model for fog computing architecture. In ICTer. IEEE, 2016.
nodes to leverage the tree-like topology of fog environments. [17] Y. Mei and S. Madden. Zstream: a cost-based query processor for
adaptively detecting composite events. In SIGMOD, pages 193–206, 2009.
Comet [7] is a decentralized ordered-based CEP approach for [18] N. W. Paton and O. Dı́az. Active database systems. ACM CSUR, pages
63–103, 1999.
delay-tolerant networks. Akdere et al. [2] propose network- [19] N. P. Schultz-Møller, M. Migliavacca, and P. Pietzuch. Distributed
aware distribution strategies managed by a central control complex event processing with query rewriting. In DEBS, pages 1–12, 2009.
[20] E. Wu, Y. Diao, and S. Rizvi. High-performance complex event processing
instance. Parts of both the approaches mentioned above over streams. In SIGMOD, pages 407–418, 2006.
can be reused for our CEP implementation. Nevertheless, in [21] M. Zaharia, R. S. Xin, P. Wendell, and et al. Apache spark: a unified
engine for big data processing. ACM, pages 56–65, 2016.
contrast to both, we focus on a solution that considers the [22] S. Zeuch, A. Chaudhary, B. Del Monte, and et al. The nebulastream
platform: Data and application management for the internet of things.
limitations of low-end devices in a dynamic environment. CIDER, 2020.
Akili [3] motivated the need for decentralized CEP and pro- [23] H. Zhang, Y. Diao, and et al. On complexity and optimization of expensive
queries in complex event processing. In SIGMOD, pages 217–228, 2014.
posed a tree-based approach for efficient multi-sink opera- [24] S. Zhang, H. T. Vo, and et al. Multi-query optimization for complex event
processing in sap esp. In ICDE, pages 1213–1224. IEEE, 2017.