Xavier Franch · Aditya K. Ghose · Grace A. Lewis · Sami Bhiri (Eds.)
Service-Oriented Computing
12th International Conference, ICSOC 2014
Paris, France, November 3–6, 2014
Proceedings
LNCS 8831
Lecture Notes in Computer Science 8831
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany
Xavier Franch Aditya K. Ghose
Grace A. Lewis Sami Bhiri (Eds.)
Service-Oriented
Computing
12th International Conference, ICSOC 2014
Paris, France, November 3-6, 2014
Proceedings
Volume Editors
Xavier Franch
Universitat Politècnica de Catalunya
UPC - Campus Nord, Omega 122, c/Jordi Girona 1-3
08034 Barcelona, Spain
E-mail: franch@essi.upc.edu
Aditya K. Ghose
University of Wollongong
School of Computer Science and Software Engineering
Wollongong, NSW 2522, Australia
E-mail: aditya@uow.edu.au
Grace A. Lewis
Carnegie Mellon Software Engineering Institute
4500 Fifth Ave., Pittsburgh, PA 15213, USA
E-mail: glewis@sei.cmu.edu
Sami Bhiri
Télécom SudParis
9 rue Charles Fourier, 91011 Evry Cedex, France
E-mail: sami.bhiri@gmail.com
General Chair
Samir Tata Télécom SudParis, France
Advisory Board
Paco Curbera IBM Research, USA
Paolo Traverso ITC-IRST, Italy
Program Chairs
Xavier Franch Universitat Politècnica de Catalunya, Spain
Aditya K. Ghose University of Wollongong, Australia
Grace A. Lewis Carnegie Mellon Software Engineering
Institute, USA
Workshop Chairs
Daniela Grigori University of Paris Dauphine, France
Barbara Pernici Politecnico di Milano, Italy
Farouk Toumani Blaise Pascal University, France
Demonstration Chairs
Brian Blake University of Miami, USA
Olivier Perrin University of Lorraine, France
Iman Saleh Moustafa University of Miami, USA
Panel Chairs
Marlon Dumas University of Tartu, Estonia
Henderik A. Proper Henri Tudor Center, Luxembourg
Hong-Linh Truong Vienna University of Technology, Austria
Publicity Chairs
Kais Klai University of Paris 13, France
Hanan Lutfiyya University of Western Ontario, Canada
ZhangBing Zhou China University of Geosciences, China
Publication Chair
Sami Bhiri Télécom SudParis, France
Web Chairs
Chan Nguyen Ngoc LORIA, France
Mohamed Sellami Ecole des Mines de Nantes, France
Program Committee
Rafael Accorsi University of Freiburg, Germany
Rama Akkiraju IBM, USA
Alvaro Arenas Instituto de Empresa Business School, Spain
Ebrahim Bagheri Athabasca University, Canada
Luciano Baresi Politecnico di Milano, Italy
Alistair Barros Queensland University of Technology, Australia
Khalid Belhajjame Paris Dauphine University, LAMSADE, France
Salima Benbernou Paris Descartes University, France
Sami Bhiri Télécom SudParis, France
Domenico Bianculli University of Luxembourg, Luxembourg
Walter Binder University of Lugano, Switzerland
Omar Boucelma University of Aix-Marseille, France
Ivona Brandic Vienna University of Technology, Austria
Christoph Bussler Tropo Inc., USA
Manuel Carro IMDEA Software Institute and Technical
University of Madrid, Spain
Wing-Kwong Chan City University of Hong Kong, Hong Kong
Shiping Chen CSIRO ICT, Australia
Lawrence Chung University of Texas at Dallas, USA
Florian Daniel University of Trento, Italy
Shuiguang Deng Zhejiang University, China
Khalil Drira LAAS-CNRS, France
Abdelkarim Erradi Qatar University, Qatar
Rik Eshuis Eindhoven University of Technology,
The Netherlands
Marcelo Fantinato University of Sao Paulo, Brazil
Marie-Christine Fauvet University of Joseph Fourier, France
Joao E. Ferreira University of Sao Paulo, Brazil
Walid Gaaloul Télécom SudParis, France
G.R. Gangadharan IDRBT, India
Dragan Gasevic Athabasca University, Canada
Paolo Giorgini University of Trento, Italy
Claude Godart University of Lorraine, France
Mohamed Graiet ISIMM, Tunisia
Sven Graupner Hewlett-Packard, USA
Daniela Grigori Paris Dauphine University, France
External Reviewers
Imene Abdennadher LAAS-CNRS, Toulouse, France
Husain Aljafer Wayne State University, USA
Nariman Ammar Wayne State University, USA
Mohsen Asadi Simon Fraser University, Canada
Nour Assy Télécom SudParis, France
Yacine Aydi University of Sfax, Tunisia
Fatma Basak Aydemir University of Trento, Italy
George Baryannis University of Crete, Greece
Mahdi Bennara University of Lyon, France
Lubomir Bulej University of Lugano, Switzerland
Mariam Chaabane University of Sfax, Tunisia
Wassim Derguech NUI, Galway, Ireland
Raffael Dzikowski Humboldt University of Berlin, Germany
Soodeh Farokhi Vienna University of Technology, Austria
Pablo Fernández University of Seville, Spain
José Marı́a Garcı́a University of Innsbruck, Austria
Feng Gao NUI, Galway, Ireland
Amal Gassara University of Sfax, Tunisia
Leopoldo Gomez University of Guadalajara, Mexico
Genady Grabarnik St. John’s University, USA
Gregor Grambow Ulm University, Germany
Khayyam Hashmi Wayne State University, USA
Dragan Ivanovic IMDEA Software Institute, Spain
Nesrine Khabou University of Sfax, Tunisia
Fayez Khazalah Wayne State University, USA
François Bancilhon
Data Publica
francois.bancilhon@data-publica.com
Abstract. Data science is now fashionable and the search for data sci-
entists is a new challenge for headhunters. Even though both terms are
fuzzy and subject to hype and buzzword mania, data science includes
data collection, data cleansing, data management, data analytics, and
data visualization, and a data scientist is a person who can master some
or all of these techniques (or sciences). At Data Publica, we are applying
data science to firmographics (firmographics is to organizations what de-
mographics is to people), and we are using firmographics to answer the
needs of B2B sales and marketing departments. This talk will present
the techniques we use and some of the amazing results they produce.
Table of Contents
Research Papers
Quality of Services
Probabilistic Prediction of the QoS of Service Orchestrations:
A Truly Compositional Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Leonardo Bartoloni, Antonio Brogi, and Ahmad Ibrahim
Service Management
Choreographing Services over Mobile Devices . . . . . . . . . . . . . . . . . . . . . . . . 429
Tanveer Ahmed and Abhishek Srivastava
Trust
A Novel Equitable Trustworthy Mechanism for Service Recommendation
in the Evolving Service Ecosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
Keman Huang, Yi Liu, Surya Nepal, Yushun Fan,
Shiping Chen, and Wei Tan
Industrial Papers
Runtime Management of Multi-level SLAs for Transport and Logistics
Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
Clarissa Cassales Marquezan, Andreas Metzger, Rod Franklin, and
Klaus Pohl
Configuration Rule Mining for Variability Analysis
Nour Assy and Walid Gaaloul
1 Introduction
These research results highlight the need for means of support to derive indi-
vidual variants as integrated models tend to be complex with a large number
of configurable elements [11]. To fill this gap, some works propose to use ques-
tionnaires [12] or ontologies [13] in order to get business requirements and guide
the configuration process. Others propose to use non functional requirements to
assess configuration decisions on the process performance [14]. Although these
works have made considerable efforts on process variability design and configuration, less attention has been paid to understanding the way a configurable process model is configured, that is, which configurations are frequently selected by users and how configuration decisions may impact others in the process model. The configurations' frequencies and interrelationships have
been identified in the requirements for a configurable modeling technique in [3].
In this work, we propose to enhance configurable process models with con-
figuration rules. These rules reveal the frequency and association between the
configuration decisions taken for different variation points in a configurable pro-
cess model. Concretely, we propose to discover from a large collection of process
variants the frequently selected configurations in a configurable process model.
Then, taking advantage of machine learning techniques [15], in particular associ-
ation rule mining, we extract configuration rules between the discovered config-
urations. These rules can then be used to help business analysts develop a better understanding of, and reasoning about, the variability in their configurable process models. For instance, business analysts can manage the complexity of
existing configurable process models by removing or altering the configurations
that were never or rarely selected. Moreover, the automated discovery of the
interrelationships between configuration decisions can assist the configuration
process by predicting next suitable configurations given the selected ones.
The remainder of the paper is organized as follows: in Section 2, we present a running example used throughout the paper to illustrate our approach. Section 3 provides some concepts and definitions needed for our approach. In Section 4, we detail our approach to derive configuration rules using association rule mining techniques. The validation and experimental results are reported in Section 5. In Section 6, we discuss related work, and we conclude in Section 7.
2 Running Example
Our running example is taken from the SAP reference model for procurement process management, modeled with the Configurable Event-Driven Process Chain (C-EPC) notation [3] (see Fig. 1). The EPC notation consists of three elements: event, function and connector. An event can be seen as a pre- and/or post-condition that triggers a function. A function is the active element that describes an activity. Three types of connectors, OR, exclusive OR (XOR) and AND, are used to model the splits and joins. In our example, we index connectors with numbers in order to distinguish between them. The C-EPC notation adds the configurability option for functions and connectors. A configurable function can be included in or excluded from the model. A configurable connector can change its type to an equal or more restrictive one (e.g., a configurable OR may be configured to an OR, XOR, AND, or to a simple sequence) [3].
[Fig. 1. The configurable procurement process model in C-EPC notation, with events (e.g. "Purchase order release", "Invoice received"), functions (e.g. "Goods Receipt", "Invoice verification", "Payment to effect") and configurable XOR/OR/AND connectors ×1, ×2, ×3, ∨]
[Fig. 2. A process variant from the repository, derived from the model in Fig. 1 (e.g. with the event "Purchase requisition released for purchase order" and the function "Do payment")]
3 Preliminaries
In this section, we present the definitions of the business process graph and of the configurable process model, enhanced with our definition of configuration rules.
(Notation: ∨, ∧, × and Seq denote the OR, AND, XOR and sequence connector types; ∨c, ∧c and ×c denote their configurable counterparts.)
4 Mining Configuration Rules
In this section, we present our approach for mining configuration rules. Let Pc = (N, E, T, L, B, Confc, CRc) be a configurable process model and P = {Pi = (Ni, Ei, Ti, Li) : i ≥ 1} an existing business process repository. First, we extract from P the set of similar configurations for the configurable elements in Pc (see Section 4.1). Then, using association rule mining techniques, we mine configuration rules from the retrieved similar configurations (see Section 4.2).
WordNet [18] provides a set of algorithms for returning the synonyms of two words and measuring their relatedness. We use in particular the WUP algorithm [19], which measures the relatedness of two words by considering their depths in the WordNet database. After normalizing the activities' labels (i.e. putting all characters in lowercase, removing stop words, etc.), the total similarity is the average of their syntactic and semantic similarities:

    SimA(L(a), Li(a')) = (LD(L(a), Li(a')) + WUP(L(a), Li(a'))) / 2    (4)

where 0 ≤ SimA ≤ 1, and LD and WUP are functions returning the (normalized) Levenshtein similarity and the WordNet-based similarity, respectively, between L(a) and Li(a'). We say that a' is the best activity matching for a iff SimA(L(a), Li(a')) ≥ minSimA ∧ ∄ax ∈ Ni : SimA(L(a), Li(ax)) > SimA(L(a), Li(a')), where minSimA is a user-specified threshold. For example, in Fig. 1 and 2, the similarity between the events "Purchase order release" and "Purchase requisition released for purchase order" is 0.735. For minSimA = 0.5, "Purchase requisition released for purchase order" is the best activity matching for "Purchase order release" as it has the highest similarity with "Purchase order release".
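To make Eq. (4) concrete, the following is a minimal Python sketch of the combined label similarity; it is not the authors' implementation. It assumes LD is taken as a normalized Levenshtein similarity (so that SimA stays in [0, 1]) and uses NLTK's WordNet interface for the Wu-Palmer (WUP) score; the stop-word list and synset handling are illustrative choices.

# Minimal sketch of the label similarity of Eq. (4) (illustrative, not the
# authors' code). Requires NLTK with the WordNet corpus installed.
from nltk.corpus import wordnet as wn

STOP_WORDS = {"for", "of", "the", "a", "an", "to", "and", "without"}

def normalize(label):
    return [w for w in label.lower().split() if w not in STOP_WORDS]

def levenshtein_sim(a, b):
    # classic dynamic-programming edit distance, turned into a similarity
    m, n = len(a), len(b)
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return 1.0 - d[m][n] / max(m, n, 1)

def wup_sim(words_a, words_b):
    # average, over the words of A, of the best Wu-Palmer score against B
    scores = []
    for wa in words_a:
        best = 0.0
        for wb in words_b:
            for sa in wn.synsets(wa):
                for sb in wn.synsets(wb):
                    best = max(best, sa.wup_similarity(sb) or 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0

def sim_a(label1, label2):
    w1, w2 = normalize(label1), normalize(label2)
    ld = levenshtein_sim(" ".join(w1), " ".join(w2))
    return (ld + wup_sim(w1, w2)) / 2.0   # Eq. (4)

# e.g. sim_a("Purchase order release",
#            "Purchase requisition released for purchase order")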
Table: Excerpt of the configuration matrix (rows: process variants; columns: configurable elements; cells: configuration identifiers)

  Pid | invoice verification |  ×1 |  ×2 |  ×3 |  ∨
  P1  | C2                   |  C3 |  C3 | C20 | C27
  P2  | C1                   |  C4 |  C4 | C21 |  -
  P3  | C1                   |  C4 |  C4 |  -  | C28
  ... | ...                  | ... | ... | ... | ...

Table: Configurations of the configurable elements and their identifiers

  Nc                     Conf                                                          Confid
  invoice verification   OFF                                                           C1
                         ON                                                            C2
  ×1                     <×, {purchase order release, contract order release}>         C3
                         <×, {purchase order release, scheduling agreement release}>   C4
                         ...                                                           ...
  ×2                     <×, {purchase order release, contract order release}>         C8
                         <×, {purchase order release, scheduling agreement release}>   C9
                         ...                                                           ...
  ×3                     <×, {Inbound delivery created, Purchase order created}>       C20
                         <Seq, {purchase order release}>                               C21
                         ...                                                           ...
  ∨                      <∧, {Purchase order created, ∧3}>                             C27
                         <×, {purchase order created, Service accepted}>               C28
                         <×, {Service accepted, ∧3}>                                   C29
                         ...                                                           ...
The configuration matrix, along with user-specified support and confidence thresholds, is used as input by the Apriori algorithm. As output, the Apriori algorithm returns the set of configuration rules having a support and a confidence above the user's thresholds. An example of a configuration rule returned by Apriori for a support S = 0.5 and a confidence C = 0.5 is given in (1).
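As an illustration of this step, the sketch below mines configuration rules from a small, hypothetical excerpt of the configuration matrix. Frequent itemsets are enumerated by brute force as a stand-in for the Apriori pruning of [23]; the row contents, thresholds and rule format are assumptions, not the paper's actual data.

# Illustrative sketch: mining configuration rules (body => head) with minimum
# support and confidence from a configuration matrix. Each row lists the
# configuration identifiers (C1, C2, ...) selected by one process variant.
from itertools import combinations

matrix = {                      # hypothetical excerpt, for illustration only
    "P1": {"C2", "C3", "C20", "C27"},
    "P2": {"C1", "C4", "C21"},
    "P3": {"C1", "C4", "C28"},
    "P4": {"C1", "C4", "C21"},
}

def support(itemset, rows):
    return sum(itemset <= row for row in rows) / len(rows)

def mine_rules(matrix, min_support=0.5, min_confidence=0.5, max_size=3):
    rows = list(matrix.values())
    items = sorted(set().union(*rows))
    # frequent itemsets up to max_size (brute-force stand-in for Apriori)
    frequent = [frozenset(c)
                for k in range(1, max_size + 1)
                for c in combinations(items, k)
                if support(set(c), rows) >= min_support]
    rules = []
    for itemset in frequent:
        for k in range(1, len(itemset)):
            for body in combinations(itemset, k):
                body, head = frozenset(body), itemset - frozenset(body)
                conf = support(itemset, rows) / support(body, rows)
                if conf >= min_confidence:
                    rules.append((set(body), set(head),
                                  support(itemset, rows), conf))
    return rules

for body, head, s, c in mine_rules(matrix):
    print(f"{body} => {head}  (support={s:.2f}, confidence={c:.2f})")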
5 Experimental Results
Table 4. The size statistics of the clusters and the configurable process models

                         size                      # configurable nodes
                         min     max      avg.     min   max   avg.
  cluster                20.55   25.625   23       0     0     0
  configurable model     2       162      34.175   0     36    5.575
    R = 1 − #CR / #C    (6)
where #CR is the number of configurations used by our configuration rules and #C is the total number of valid configurations. The results reported in Table 5 show that, on average, we save up to 70% of the allowed configurations, which are either infrequent configurations or configurations never selected in existing process models. Note that this amount of reduction may vary depending on the selected minSupport and minConfidence thresholds, which are set to 0.5 in our experiments.
In the second experiment, we evaluate the mined configuration rules in order to extract useful characteristics for the configuration decision.
[Table 5. The amount of reduction — columns: Size, #CR, #C, R; e.g. the min row reports Size = 2, #CR = 2, #C = 6, R = 0.6]
[Figure: average number of configuration decisions against the average number of configurable elements]
Since configuration rules can be represented as a graph, where each node represents a rule head or body and edges represent the implication relation between rule heads and bodies [25], we analyze this graph structure in order to derive interesting hypotheses for the configuration decision. We borrow the emission and reception
metrics from the social network analysis domain [26] which measure the ratio
of the outgoing and incoming relations respectively of a node in the graph. The
reason for choosing these two metrics in particular is that a configuration node with a high emission has an impact on a large number of configurations in the process model; therefore, starting with its configuration may reduce the number of configuration decisions that the user has to take. A configuration node with a high reception, in contrast, depends on a large number of configurations; therefore, it may be useful to delay the selection of such a configuration. The emission EC and reception RC ratios of a configuration
node are computed as:
    EC = #outC / maxi(#outCi)        RC = #inC / maxi(#inCi)    (7)
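The sketch below illustrates Eq. (7): it builds the rule graph from mined rules and computes the emission and reception ratios of each configuration node. Deriving one directed edge from every body configuration to every head configuration is an assumption based on the graph representation of association rules in [25].

# Sketch of the emission/reception ratios of Eq. (7) over the rule graph.
# Rules are (body, head, support, confidence) tuples, as in the sketch above.
from collections import defaultdict

def rule_graph(rules):
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for body, head, _support, _conf in rules:
        for b in body:
            for h in head:
                out_deg[b] += 1
                in_deg[h] += 1
    return out_deg, in_deg

def emission_reception(rules):
    out_deg, in_deg = rule_graph(rules)
    max_out = max(out_deg.values(), default=1)
    max_in = max(in_deg.values(), default=1)
    nodes = set(out_deg) | set(in_deg)
    return {c: (out_deg[c] / max_out, in_deg[c] / max_in) for c in nodes}

# configurations with a high emission are good candidates to configure first;
# configurations with a high reception are good candidates to postpone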
6 Related Work
The limitation and rigid representation of existing business process models have
led to the definition of flexible process models [27]. In this paper, we rely on the
work presented in [3] where configurable process models are introduced. In their
work, the authors define the requirements for a configurable process modeling
technique and propose the configurable EPC notation. They highlight the need
for configuration guidelines that guide the configuration process. These guide-
lines should clearly depict the interrelationships between configuration decisions
and can include the frequency information. In our work, we demonstrate how
using association rule mining techniques, we induce frequency-based configura-
tion rules from existing process variants. These rules describe the association
between the frequently selected configurations.
In order to match existing process models for merging, La Rosa et al. [4] use
the notion of graph edit distance [28]. They compute the matching score using
syntactical, semantic and contextual similarities identified in [16]. In our work,
we propose to use existing process variants in order to analyze the variability in
a configurable process model. This analysis can be used to improve the design
and configuration of the configurable process model. We also use similar metrics
for process model matching. However, instead of matching entire process models,
we only search a matching for configurable elements.
To manage the variability in configurable process models, researchers have drawn inspiration from variability management in the field of Software Product Line
Engineering [29]. La Rosa et al. [12] propose a questionnaire-driven approach for
configuring reference models. They describe a framework to capture the system
variability based on a set of questions defined by domain experts and answered
by designers. Their questionnaire model includes order dependencies and domain
constraints represented as logic expressions over facts. The main limitation of
this approach is that it requires the knowledge of a domain expert to define
the questionnaire model. In addition, each change in the configurable process
model requires the update of the questionnaire model by the domain expert. Performing this task manually may affect the performance of the configuration framework. In our work, in contrast, we propose an automated approach that extracts the knowledge resulting from existing configurations using the well-known concept of association rules. Our configuration rules can then support domain experts in defining and updating their configuration models.
Huang et al. [13] propose an ontology-based framework for deriving business
rules using Semantic Web Rule Language (SWRL). They use two types of on-
tologies: a business rule ontology which is specified by a domain expert, and
a process variation points ontology based on the C-EPC language. Using these
ontologies, they derive SWRL rules that guide the configuration process. Unlike their approach, we map the configuration process to a machine learning problem and use association rule mining instead of SWRL-based rules in order to derive configuration rules. Our approach does not require any extra expert effort and can be extended to classify the learned configuration rules w.r.t. specific business requirements.
7 Conclusion
In this paper, we present a frequency-based approach for the variability analysis
in configurable process models. We propose to enhance the configurable process
models with configuration rules. These rules describe the combination of the
frequently selected configurations. Starting from a configurable process model
and an existing business process repository, we take advantage of association
rule mining techniques in order to mine the frequently selected configurations as
configuration rules. Experimental results show that using our configuration rules,
the complexity of existing configurable process models is reduced. In addition,
metrics such as emission and reception, applied to our configuration rules, help identify the configurations that reduce the number of user decisions.
We are currently integrating our approach into an existing business process modeling tool, namely the Oryx editor. In future work, we aim to define more sophisticated rules for retrieving similar connector configurations. Instead of relying only on a connector's direct preset and postset, we aim to look for k-backward and k-forward similar elements. This, in turn, would improve our preprocessing step and therefore our mined configuration rules. Moreover, we plan to enhance our configuration rules, besides the frequency, with other useful information such as the configuration performance, ranking, etc.
References
1. Fettke, P., Loos, P.: Classification of reference models: a methodology and its ap-
plication. Information Systems and eBusiness Management (2003)
2. Schonenberg, H., et al.: Towards a taxonomy of process flexibility. In: CAiSE Fo-
rum, pp. 81–84 (2008)
3. Rosemann, M., van der Aalst, W.M.P.: A configurable reference modelling lan-
guage. Inf. Syst. (2007)
4. La Rosa, M., et al.: Business process model merging: An approach to business process
consolidation. ACM Trans. Softw. Eng. Methodol. (2013)
5. Derguech, W., Bhiri, S.: Merging business process variants. In: Abramowicz, W.
(ed.) BIS 2011. LNBIP, vol. 87, pp. 86–97. Springer, Heidelberg (2011)
6. Gottschalk, F., van der Aalst, W.M.P., Jansen-Vullers, M.H.: Merging event-driven process
chains. In: OTM 2008 (2008)
7. Assy, N., Chan, N.N., Gaaloul, W.: Assisting business process design with config-
urable process fragments. In: IEEE SCC 2013 (2013)
8. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: Mining configurable
process models from collections of event logs. In: Daniel, F., Wang, J., Weber, B.
(eds.) BPM 2013. LNCS, vol. 8094, pp. 33–48. Springer, Heidelberg (2013)
9. Gottschalk, F., van der Aalst, W.M.P., Jansen-Vullers, M.H.: Mining Reference Process
Models and their Configurations. In: EI2N08, OTM 2008 Workshops (2008)
10. Assy, N., Gaaloul, W., Defude, B.: Mining configurable process fragments for
business process design. In: Tremblay, M.C., VanderMeer, D., Rothenberger, M.,
Gupta, A., Yoon, V. (eds.) DESRIST 2014. LNCS, vol. 8463, pp. 209–224. Springer,
Heidelberg (2014)
11. Dijkman, R.M., Rosa, M.L., Reijers, H.A.: Managing large collections of business
process models - current techniques and challenges. Computers in Industry (2012)
12. Rosa, M.L., et al.: Questionnaire-based variability modeling for system configura-
tion. Software and System Modeling 8(2), 251–274 (2009)
13. Huang, Y., Feng, Z., He, K., Huang, Y.: Ontology-based configuration for service-
based business process model. In: IEEE SCC, pp. 296–303 (2013)
14. Santos, E., Pimentel, J., Castro, J., Sánchez, J., Pastor, O.: Configuring the vari-
ability of business process models using non-functional requirements. In: Bider, I.,
Halpin, T., Krogstie, J., Nurcan, S., Proper, E., Schmidt, R., Ukor, R. (eds.) BP-
MDS 2010 and EMMSAD 2010. LNBIP, vol. 50, pp. 274–286. Springer, Heidelberg
(2010)
15. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Tech-
niques, Second Edition (Morgan Kaufmann Series in Data Management Systems).
Morgan Kaufmann Publishers Inc (2005)
16. Dijkman, R.M., et al.: Similarity of business process models: Metrics and evalua-
tion. Inf. Syst. 36(2), 498–516 (2011)
17. Levenshtein, V.I.: Binary Codes Capable of Correcting Deletions, Insertions and
Reversals. Soviet Physics Doklady (1966)
18. Pedersen, T., Patwardhan, S., Michelizzi, J.: Wordnet: Similarity - measuring the
relatedness of concepts. In: AAAI, pp. 1024–1025 (2004)
19. Wu, Z., Palmer, M.: Verbs semantics and lexical selection. In: ACL 1994 (1994)
20. Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of
items in large databases. In: ACM SIGMOD 1993, pp. 207–216 (1993)
21. Fu, X., Budzik, J., Hammond, K.J.: Mining Navigation History for Recommenda-
tion. In: IUI 2000, pp. 106–112 (2000)
22. Lin, W., Alvarez, S.A., Ruiz, C.: Collaborative recommendation via adaptive as-
sociation rule mining. In: Data Mining and Knowledge Discovery (2000)
23. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large
databases. In: VLDB, pp. 487–499 (1994)
24. Keller, G., Teufel, T.: Sap R/3 Process Oriented Implementation, 1st edn. Addison-
Wesley Longman Publishing Co., Inc., Boston (1998)
25. Ertek, G., Demiriz, A.: A framework for visualizing association mining results. In:
Levi, A., Savaş, E., Yenigün, H., Balcısoy, S., Saygın, Y. (eds.) ISCIS 2006. LNCS,
vol. 4263, pp. 593–602. Springer, Heidelberg (2006)
26. Scott, J.P.: Social Network Analysis: A Handbook. SAGE Publications (2000)
27. Bhat, J., Deshmukh, N.: Methods for Modeling Flexibility in Business Processes.
In: BPMDS 2005 (2005)
28. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determina-
tion of minimum cost paths. IEEE Trans. Systems Science and Cybernetics 4(2),
100–107 (1968)
29. Clements, P.C.: Managing variability for software product lines: Working with
variability mechanisms. In: SPLC, pp. 207–208 (2006)
ProcessBase: A Hybrid Process Management
Platform
1 Introduction
Many processes are difficult to model due to the ad-hoc characteristics of these processes [1], which often cannot be determined before the process begins. While certain characteristics could be predicted, the actual activities and their ordering may differ. Moreover, information may only become available during the process, thus putting human beings and knowledge-workers in control of these processes [2–5].
An emerging discipline to deal with such processes (commonly referred to
as "unstructured" or "semi-structured" processes) is Case-Management. Its importance is well recognised, since knowledge-workers, who make up 25-40% of a typical workplace, play a vital role in the long-term success of an enterprise
[3]. However, while research in this area correctly highlights the importance
of combining knowledge with process [3], and calls for increased flexibility [4],
2 Motivating Example
Consider the “Software Development Change-Management” process, as illus-
trated in the BPMN model shown in Figure 1.
[Fig. 1. BPMN model of the Software Development Change-Management process, with an "Approved?" gateway and a "Reject Change" path]
While the overall pattern may be followed, the specifics may vary from case to case. For example, a "formal" software project often views changes as a non-typical event requiring a strict approval process. However, even in a "formal" setting, structured activities may exhibit variations, but only based on preconceived conditions; an example is illustrated in Figure 2.
Fig. 2. (a) Formal Software Project Approval Process; (b) Process Variation Tree
be well adopted in the mainstream, likely because they overarch the extensive range of business rule-types (i.e. integrity, derivation, reaction and deontic rules), thus clouding simplicity with an over-rich vocabulary and semantics [4, 8].
Event Driven Business Process Management (EDBPM). In a similar
approach, EDBPM focuses primarily on “event-driven” reaction-rules. The moti-
vation has been to merge BPM with Complex Event-Processing (CEP) platforms
via events produced by the BPM-workflow engine or any associated (and even
distributed) IT services. In addition, events coming from different sources and
formats can trigger a business process or influence its execution thereof; which
could in turn result in another event. Moreover, the correlation of these events in
a particular context can be treated as a complex, business-level event, relevant
for the execution of other business processes. A business process, arbitrarily fine
or coarse grained, can thus be choreographed with other business processes or
services, even cross-enterprise. Examples of such systems include jBPM [14] and RunMyProcess [15]. However, these systems are usually implemented such that the respective components sourced from BPM or CEP operate almost independently (e.g. event-modeller vs. process-modeller; event-store vs. process-store; rules-engine vs. process-engine; process-instances vs. rules-instances, etc.). In fact, the only thing connecting the two sides is the low-level event stream, which does not directly benefit the process-modeller. These systems also tend to be dominated somewhat by the structured-process side (e.g. a rudimentary process is always required, and even basic changes require restarting the process). They also do not encompass the full range of process-specificity support; nonetheless, they provide a crucial step towards at least a partial hybrid-process methodology.
Case Management. As mentioned, the “case-management” paradigm has
also been recognised as a promising approach to support semi-structured pro-
cesses. Unlike traditional business-process systems that require the sequence and
routing of activities to be specified at design-time (as otherwise they will not be
supported) - case-management is required to empower the ability to add new
activities at any point during the lifecycle and when the need arises, [4]. At the
same time it also requires the ability to capture possibly repeatable process pat-
terns, and variations thereof [3, 16]. However, although there have been several efforts to push this (e.g. OMG is currently working on an appropriate standardisation), at present there are no concrete all-encompassing frameworks capable of adequately supporting these requirements. Emergent Case Management provides a slightly more modernised twist, suggesting a bottom-up approach. Böhringer [2] proposes such a platform, which advocates the use of social software (e.g. tagging, micro-blogging and activity streams) in a process-based manner. It claims to empower people to be at the centre of such information systems, where the goal is to enable users to assign activities and artifacts, independently of their representation, to a certain case, which can be dynamically defined and executed by users. However, this work is currently only at the concept stage and has yet to be implemented and tested. Likewise, case management in general is still considered "a general approach" rather than a "mature tool category".
The Web-Services Layer represents APIs available over the Internet, whose integration in processes offers vital potential: services act as rich and real-time sources of data, as well as providing functionality (software and tools), infrastructure building-blocks, collaboration mechanisms, visualisations, etc.
The ServiceBus components (leveraged from our previous work [6, 7]) act as the middleware between outside Web services and the platform back-end. Most importantly, they help solve the inherent heterogeneity challenges: services may differ in representation and access protocols (e.g. SOAP vs. REST), as well as in message-interchange formats (e.g. JSON, XML, CSV, or media files, etc.). Moreover, APIs are constantly subject to change (e.g. due to system updates).
Fig. 6. Example of Hybrid Process Model (showing key nodes and relationships)
[Figure: Event and action type models for Service and BPEL tasks — Service side: FeedEntry/FeedInstance event types and a ServiceInvoker action type; BPEL side: BPELProcess, BPELInstance and BPELInstanceEvent entities, instance/activity event types (e.g. InstanceStarted, InstanceCompleted, ActivityFailure) and lifecycle action types (e.g. DeployProcess, SuspendInstance, ResumeInstance, TerminateInstance, RetireProcess)]
Lifecycle-State Task. Data and resources are central to any process. How-
ever, since many process systems tend to be activity-centric, data-artifacts ma-
nipulated by these processes are seen as second-class citizens. In contrast, the
“artifact”-centric approach stipulates an artifact modelled to have both an in-
formation and a lifecycle model [22]. We implement this archetype as a LifecycleStateTask, as illustrated in Figure 10, consisting of States and Transitions.
Modelled after a finite-state-machine (FSM), there are three kinds of state-
actions (in our model represented as a Rule - where a pure action could just
be with no event or condition): (i) onEntry is activated when the state is en-
tered; (ii) onDo after finishing the entry-action and anytime while in that state;
(iii) onExit when the state is deactivated. Likewise, in FSM terms, a transition
is modelled as an event, guard and action. A guard is effectively a condition,
which thus means we again re-use the notion of Rule which can thereby also be
attributed to the Transition entity.
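As an illustration of this archetype, the sketch below models a LifecycleStateTask as a small finite-state machine with onEntry/onDo/onExit actions and event/guard/action transitions. All class and method names are ours, not the ProcessBase API.

# Illustrative sketch of a lifecycle-state task as a finite-state machine.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Action = Callable[[dict], None]
Guard = Callable[[dict], bool]

@dataclass
class State:
    name: str
    on_entry: Optional[Action] = None
    on_do: Optional[Action] = None     # invoked by the task runtime while active
    on_exit: Optional[Action] = None

@dataclass
class Transition:
    source: str
    event: str
    target: str
    guard: Guard = lambda ctx: True    # guard plays the role of a rule condition
    action: Optional[Action] = None

@dataclass
class LifecycleStateTask:
    states: Dict[str, State]
    transitions: List[Transition]
    current: str

    def fire(self, event: str, ctx: dict) -> bool:
        for t in self.transitions:
            if t.source == self.current and t.event == event and t.guard(ctx):
                if self.states[self.current].on_exit:
                    self.states[self.current].on_exit(ctx)
                if t.action:
                    t.action(ctx)
                self.current = t.target
                if self.states[self.current].on_entry:
                    self.states[self.current].on_entry(ctx)
                return True
        return False

# e.g. a change-request artifact moving from "submitted" to "approved"
cr = LifecycleStateTask(
    states={"submitted": State("submitted"),
            "approved": State("approved",
                              on_entry=lambda ctx: print("notify dev team"))},
    transitions=[Transition("submitted", "approve", "approved",
                            guard=lambda ctx: ctx.get("pm_ok", False))],
    current="submitted")
cr.fire("approve", {"pm_ok": True})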
Human Task. Although there are several options for integrating human-worker
frameworks into our platform, we have chosen to leverage Asana, due to its
popularity, integration with other tools, and ease-of-use [25]. The model for
a HumanTask has been illustrated in Figure 11. The entity Story represents
any change or human/system activity performed during the execution of some
human-task; which we represent in our system as events.
[Fig. 11. The HumanTask model (integrated via Asana): task attributes (name, notes, assignee, due date, followers, project/workspace identifiers), Story events capturing human or system activity (e.g. CommentAdded, TagAdded, TaskCompleted, AddedToProject, ChangedDueDate, FollowerAdded) and action types (e.g. CreateTask, UpdateTask, DeleteTask, CommentOnTask, AddToProject, AddFollower)]
Current techniques for re-use usually utilise process schema or template libraries.
However, this does not prove efficient with a large and increasing number of
process definitions, cases, and variations. In ProcessBase, we propose a novel
automated recommendation approach based on the currently detected “context”
of a hybrid-process definition. This means that, given the context, the system may suggest the closest-matching existing process definition, which can then be re-used and/or customised as required.
[Figure: Example recommendation rules organised as an incremental rule tree:
  Rule0: IF HProcess(Goals) = <blank> AND Activity(Goals) = <blank> THEN HProcessID = null
  Rule1: IF HProcess(Goals) = (change-management && software-project && agile-project) AND Activity(Goals) = (receive-request && plan/approve && notify dev-team && update codebase) THEN HProcessID = 1
  Rule2: IF HProcess(Goals) = (change-management && software-project && formal-project) AND Activity(Goals) = (receive-request && plan/approve && notify dev-team && update codebase && monitoring) THEN HProcessID = 2
  Rule3: IF HProcess(Goals) = (change-management && software-project && formal-project && audit-reporting) AND Activity(Goals) = (receive-request && plan/approve && notify dev-team && update codebase && monitoring && audit change plan) THEN HProcessID = 3
  (true/false refinement edges link Rule2 to Rule1 and Rule3 to Rule2)]
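To illustrate how such rules could drive a recommendation, the sketch below scores each stored rule against the currently detected goal context and returns the best-matching HProcessID. This is only an assumed, simplified matcher (Jaccard overlap of goal sets); the platform's actual incremental knowledge-acquisition engine is not reproduced here.

# Illustrative sketch (not ProcessBase's rule engine): pick the stored rule
# whose goal sets best cover the currently detected context.
RULES = [
    {"id": None, "process_goals": set(), "activity_goals": set()},
    {"id": 1,
     "process_goals": {"change-management", "software-project", "agile-project"},
     "activity_goals": {"receive-request", "plan/approve",
                        "notify dev-team", "update codebase"}},
    {"id": 2,
     "process_goals": {"change-management", "software-project", "formal-project"},
     "activity_goals": {"receive-request", "plan/approve", "notify dev-team",
                        "update codebase", "monitoring"}},
    {"id": 3,
     "process_goals": {"change-management", "software-project",
                       "formal-project", "audit-reporting"},
     "activity_goals": {"receive-request", "plan/approve", "notify dev-team",
                        "update codebase", "monitoring", "audit change plan"}},
]

def jaccard(a, b):
    return len(a & b) / len(a | b) if a or b else 1.0

def recommend(process_goals, activity_goals):
    # score each rule by the average overlap of its goal sets with the context
    best = max(RULES, key=lambda r: (jaccard(r["process_goals"], process_goals) +
                                     jaccard(r["activity_goals"], activity_goals)) / 2)
    return best["id"]

print(recommend({"change-management", "software-project", "formal-project"},
                {"receive-request", "plan/approve", "monitoring"}))   # -> 2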
Using again the examples we described in Sections 5.2 and 6, starting with a
simple/empty process (Line 1), the recommender system can be invoked (Line
2), which finds the closest process being for an “agile” software project. The
designer can modify this by creating a sub-case (or template) (Line 3), and
then proceed to define a new “monitoring” activity, (Lines 4-6). The monitoring
activity posts a tweet-notification (e.g. “thanks for your patience!”), in the event
the approval process has taken longer than 1-week to complete.
Productivity Study. Given that the task was fixed, productivity was measured by the total number of lines of code (LOC) needed to produce the solution. The results in Figure 13(a-d) present a distributed measure of LOC.
[Fig. 13. (a-d) Distributed lines-of-code per task-type category (automated ECA rules, Web-services integration, structured process support, state/lifecycle management, other) — totals: ProcessBase 648 LOC, JBoss jBPM 1,980 LOC, jOpera 2,033 LOC, code-based (traditional programming) 2,678 LOC; (e) design/implementation time per use-case (seconds) for ProcessBase, JBoss jBPM, jOpera and traditional programming]
Performance Study. Finally, we measured the round-trip time (i.e. from when the change-request was issued until the updates were committed into Git). We repeated this study 5 times, taking the median; the results are presented in Figure 13(e).
9 Conclusions
To the best of our knowledge, the work in this paper proposes the first all-encompassing hybrid-process management platform. Moreover, we propose an architecture in which existing process-support technology (either domain-specific or partial-hybrid) can be leveraged, rather than competing with each other. In addition, our work is the first to propose a novel recommendation system using process context-detection, based on an incremental knowledge-acquisition technique. Experimental results show superior performance across all evaluated dimensions: usability, productivity and performance. Above all, we are optimistic that this work provides the foundation for future growth into a new breed of enhanced process support.
References
1. Marjanovic, O.: Towards is supported coordination in emergent business processes.
Business Process Management Journal 11(5), 476–487 (2005)
2. Böhringer, M.: Emergent case management for ad-hoc processes: A solution based
on microblogging and activity streams. In: Muehlen, M.z., Su, J. (eds.) BPM 2010
Workshops. LNBIP, vol. 66, pp. 384–395. Springer, Heidelberg (2011)
3. BPTrends: Case management - combining knowledge with process (July 2009)
4. de Man, H.: Case management: A review of modelling approaches (January 2009)
5. Holz, H., Rostanin, O., Dengel, A., Suzuki, T., Maeda, K., Kanasaki, K.: Task-
based process know-how reuse and proactive information delivery in tasknavigator.
In: Conference on Information and Knowledge Management, pp. 522–531 (2006)
6. Barukh, M.C., Benatallah, B.: ServiceBase: A programming knowledge-base for
service oriented development. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W.,
Song, W. (eds.) DASFAA 2013, Part II. LNCS, vol. 7826, pp. 123–138. Springer,
Heidelberg (2013)
7. Barukh, M.C., Benatallah, B.: A toolkit for simplified web-services programming.
In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds.) WISE 2013, Part
II. LNCS, vol. 8181, pp. 515–518. Springer, Heidelberg (2013)
8. Olding, E., Rozwell, C.: Expand your bpm horizons by exploring unstructured
processes. Technical Report (2009)
9. Bernstein, A.: How can cooperative work tools support dynamic group process?
bridging the specificity frontier. In: CSCW, pp. 279–288. ACM, New York (2000)
10. Keen, P.G., Morton, M.S.S.: Decision support systems: an organizational perspec-
tive, vol. 35. Addison-Wesley Reading, MA (1978)
11. Vanthienen, J., Goedertier, S.: How business rules define business processes. Busi-
ness Rules Journal 8(3, March) (2007)
12. Agrawal, A.: Semantics of business process vocabulary and process rules. In: Pro-
ceedings of the 4th India Software Engineering Conference, pp. 61–68. ACM (2011)
13. Milanovic, M., Gasevic, D., Wagner, G.: Combining rules and activities for model-
ing service-based business processes. In: 2008 12th Enterprise Distributed Object
Computing Conference Workshops, pp. 11–22. IEEE (2008)
A Multi-objective Approach to Business Process Repair
C. Di Francescomarino et al.
1 Introduction
Business process model repair can be used to automatically make an existing process
model consistent with a set of new behaviours, so that the resulting repaired model is
able to describe them, while being as close as possible to the initial model [4]. Differ-
ently from process discovery, in which a completely new process is discovered from the
new observed behaviours, process model repair starts from an initial process model and incrementally evolves it through a sequence of repair operations [4].
Repair operations range from simple insertion and deletion of activities in the model,
to sophisticated sets of operations. In all cases, however, repair operations have a cost:
they add complexity to the repaired models. Business analysts in charge of repairing
existing models with respect to new behaviours are hence forced to choose whether
to accept the increased complexity of a model consistent with all deviant behaviours,
or to sacrifice consistency for a simpler model. In fact, some deviant behaviours may
correspond to exceptional or error scenarios, that can be safely abstracted away in the
process model.
In this work, we propose a multi-objective optimization approach to support business
analysts repairing existing process models. It uses repair operations from state-of-the-
art process repair algorithms to define a multi-objective optimization problem, whose
two objectives are: (1) minimizing the cost of repair (in terms of complexity added to
the repaired model); and, (2) maximizing the amount of new behaviours represented
consistently in the model. We formulate such a multi-objective optimization problem in terms of a set of pseudo-Boolean constraints and we solve it by means of a Satisfiability Modulo Theories (SMT) solver. The result provides business analysts with a set
of Pareto-optimal alternative solutions. Analysts can choose among them based on the
complexity-consistency trade-off that better fits their needs. The approach has been
evaluated on a real life case study.
The contribution of the paper is twofold: (i) a multi-objective approach for business
process model repair (Section 3); (ii) the results of our evaluation of the approach on a
real-life case study (Section 4).
2 Background
Inputs to the automated process repair techniques are new process behaviours, which
in modern information systems are captured through new execution traces recorded in
log files, so the problem of automated repair can be stated as the problem of repairing
a process model with respect to a log file [4]. In other words, given an initial process
model M (either manually designed or automatically discovered) and a set of execution
traces T (describing the new behaviours of the system), automated process repair aims
at transforming M into a new model M′ that is as close as possible to M and that accepts all traces in T, where an execution trace t ∈ T is a sequence of events (i.e., system activities) t = ⟨e1, ..., en⟩.
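For concreteness, the following is a minimal sketch of what "a model accepts a trace" can mean for a labelled Petri net: the trace is replayed by firing one enabled transition per event and checking that the final marking is reached. It deliberately ignores silent transitions, duplicate labels and arc weights, and is not the conformance machinery used by the repair tools discussed below.

# Minimal token-replay acceptance check for a labelled Petri net (illustrative).
from collections import Counter

class PetriNet:
    def __init__(self, transitions, initial_marking, final_marking):
        # transitions: {label: (consumed_places, produced_places)}
        self.transitions = transitions
        self.initial = Counter(initial_marking)
        self.final = Counter(final_marking)

    def accepts(self, trace):
        marking = Counter(self.initial)
        for event in trace:
            if event not in self.transitions:
                return False
            consumed, produced = self.transitions[event]
            if any(marking[p] < 1 for p in consumed):
                return False                      # transition not enabled
            for p in consumed:
                marking[p] -= 1
            for p in produced:
                marking[p] += 1
        return +marking == +self.final            # reached the final marking

net = PetriNet({"A": (["p0"], ["p1"]),
                "B": (["p1"], ["p2"]),
                "C": (["p2"], ["p3"])},
               initial_marking=["p0"], final_marking=["p3"])
print(net.accepts(["A", "B", "C"]))   # True
print(net.accepts(["A", "C"]))        # False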
Among the different ways in which automated repair can be realized, two main cat-
egories of approaches can be identified in the literature: (i) the approaches performing
repair operations on the initial model M by directly looking at its differences with the
new traces [4]; (ii) the approaches that mine from T one (MT) or more (MT = {MTi})
new process models describing the new behaviours, use delta-analysis [1] techniques
for identifying differences between the new mined models and the initial one, M , and
apply repair operations to M [7,6].
In both cases, the differences of the initial process model with respect to the deviant
behaviours (described as execution traces or as mined process models) have to be iden-
tified (see e.g., [4] and [7]). Once such differences have been identified, a set of repair
operations can be applied to the initial model M . The basic operations consist of inser-
tion and deletion of activities in the model. For example, given the extract of Petri Net in
Figure 1 and the execution traces t1 = ⟨A, B, D, C⟩ and t2 = ⟨A, C⟩, two basic repair
operations, an insertion o1 and a deletion o2 (see Figure 2) can be applied to the Petri
Net in order to make t1 and t2 accepted by the Petri Net. Since these operations might
remove old behaviours of the net, some approaches (e.g., [4]) tend to be conservative
and to introduce the addition or removal of behaviours only as an optional alternative
to the old behaviours. Figure 3 shows how this can be realized in a Petri Net: the black
transitions represent silent transitions, i.e., transitions that are not observed when the net
is replayed. Note that while preserving old behaviours, repair operations can introduce
extra-behaviours such as the one described by the execution ⟨A, D, C⟩.
In this work we use a repair technique belonging to the first group of approaches
(repairs based on trace differences) and, in detail, the ProM1 Repair Model plugin. This
plugin implements the approach proposed by Fahland et al. [4] and takes as input a Petri
Net describing the initial model M and a log. A cost is assigned to insertion and dele-
tion operations. Correspondingly, an optimization problem is defined and the lowest-
cost alignment between the process model and the set of input traces is computed. The
outcome is a Petri net M′ which is able to accept all traces in T.
1 http://www.promtools.org/prom6/
On top of this base alignment algorithm and of the insertion and deletion operations
described above, a set of variations are proposed in the approach by Fahland [4]:
Subprocess repair operations. In order to improve the precision of the repaired model
M′, i.e., to avoid having too many extra-behaviours (besides those in T), a subpro-
cess repair operation is introduced. The idea is that whenever a sequence of inserted
activities occurs at the same place in the model, instead of adding these activities incre-
mentally, they are structured as a subprocess, which is mined starting from the set of
subtraces that maximize the sequence of skipped activities in M . For example, consid-
ering the two traces t3 = ⟨A, B, D, E, C⟩ and t4 = ⟨A, B, E, D, C⟩, the subprocess s1 in Figure 4 is added to the net in Figure 1 to take care of the sequences of activities ⟨D, E⟩ and ⟨E, D⟩ that are inserted at the same place, i.e., after B. Moreover, accord-
ing to whether the inserted actions represented by means of the subprocess are executed
at most once, exactly once or more than once in T, a skipping transition is added to the net, the subprocess is added in sequence, or it is nested in a loop block, respectively. In our example
the subprocess is executed at most once and therefore a skipping transition that directly
connects B and C is added to the net in Figure 4.
Loop repair operations. In order to improve the simplicity of the repaired model, a
special repair operation is dedicated to the identification of loops in the traces. The
identification of a loop, whose body represents a behaviour already described in the
model, allows the addition of a simple loop back transition instead of a new subprocess
duplicating the behaviour already contained in the initial model. For example, given
the net in Figure 1 and a trace t5 = ⟨A, B, C, B, C⟩, the silent transition (loop-back transition) in Figure 5 is added to the net, instead of a new subprocess accepting the second sequence ⟨B, C⟩.
Remove unused part operations. In order to improve the precision and the simplicity
of the repaired model M′, the parts of M′ that are no longer used are removed, by aligning T with M′ and detecting the parts of the model that do not contribute to the acceptance of a minimum number of traces.
In this paper we applied our technique on top of the results provided by the state-of-
the-art ProM Repair Model plugin with the default configuration, which has been set
by the authors to values providing the best results, according to their experiments [4].
Fig. 4. An example of subprocess repair operation.    Fig. 5. An example of loop repair operation.
To repair a model, a set of changes A (repair operations) are discovered and applied by
the repair algorithm. Indeed, every subset Ā ⊆ A is able to partially fix the model M ,
so that a subset T̄ ⊆ T of traces is accepted by the partially repaired model. Assuming
that every operation a ∈ A has a cost c(a), we can formulate the problem of trading the
number of traces accepted by the repaired model for the cost of repairing the model as
a multi-objective optimization problem (MOP).
Definition 3 (Pareto front). The image F* = {(f1(x*), f2(x*), ..., fn(x*)) | x* ∈ X*} of the set X* of points x* which are Pareto optima for the MOP defined by (fi), (oi) is called the Pareto front for the MOP.
Thus, a Pareto optimum is a point for which no other point is equal or better with respect to all the functions fi and strictly better for at least one function fj.
The Pareto front is a useful tool to describe the options that a decision maker has at their disposal and to identify a preferred alternative among the available ones. In particular, when
problems with two objective functions are concerned, a graphical representation of al-
ternative solutions can be obtained by drawing the Pareto front points on the Cartesian
plane. The solutions (points) that are not on the Pareto front are by definition worse at
least in one objective than the solutions on the front and so they can be ignored in the
decision process.
The constraints on the objective functions "total cost" C and "number of accepted traces" N can be expressed as PBCs involving the variables αj and τi, respectively:

    Σ_{i=1,...,NA} ci·αi ≤ C̄    (1)        Σ_{i=1,...,NT} τi ≥ N̄    (2)
Let us define the matrix {mij } with i = 1, ..., NT and j = 1, ..., NA such that
mij = 1 if and only if trace ti requires the repair operation aj to be accepted by
the repaired model. The following system of logical formulas model the relationship
between repair operations and accepted traces:
    τi ↔ ⋀_{j=1,...,NA ∧ mij=1} αj,   with i = 1, ..., NT    (3)
    τi ≥ 1 + Σ_{j=1,...,NA} mij·(αj − 1)    (4a)

    NA·τi ≤ NA + Σ_{j=1,...,NA} mij·(αj − 1)    (4b)
If the resulting problem turns out to be satisfiable, the accepted traces are identified by the true elements of {τi} and the required repair operations by the true elements of {αj}.
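As one possible realisation of this feasibility check, the sketch below encodes constraints (1)-(3) with the z3 Python bindings; the paper does not prescribe a particular solver, so the solver choice and all names here are assumptions. Constraint (3) is encoded directly as a Boolean equivalence rather than through the linearized forms (4a)-(4b).

# Illustrative encoding of the feasibility problem P(c, m, C, N) with z3.
from z3 import Bool, And, PbLe, PbGe, Solver, sat, is_true

def feasible(costs, m, max_cost, min_traces):
    # costs[j]: cost c_j of repair operation a_j; m[i][j] = 1 iff trace t_i
    # requires operation a_j to be accepted (the matrix {m_ij} defined above).
    NA, NT = len(costs), len(m)
    alpha = [Bool(f"alpha_{j}") for j in range(NA)]   # operation a_j applied?
    tau = [Bool(f"tau_{i}") for i in range(NT)]       # trace t_i accepted?
    s = Solver()
    # (1) total repair cost bounded by max_cost
    s.add(PbLe([(alpha[j], costs[j]) for j in range(NA)], max_cost))
    # (2) at least min_traces traces accepted
    s.add(PbGe([(tau[i], 1) for i in range(NT)], min_traces))
    # (3) a trace is accepted iff all the operations it needs are applied
    for i in range(NT):
        needed = [alpha[j] for j in range(NA) if m[i][j] == 1]
        if needed:
            s.add(tau[i] == And(needed))
        else:
            s.add(tau[i])        # a trace needing no operation is always accepted
    if s.check() != sat:
        return False, [], []
    model = s.model()
    ops = [j for j in range(NA)
           if is_true(model.evaluate(alpha[j], model_completion=True))]
    accepted = [i for i in range(NT)
                if is_true(model.evaluate(tau[i], model_completion=True))]
    return True, ops, accepted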
Computing the Pareto front. The Pareto front for the model repair problem MRP can
be computed using Algorithm 1. First (step 1), we compute the point (CT , NT ) of the
front with maximum cost. Second (step 2), starting from C = CT and N = NT ,
the maximum allowed cost C is reduced by one and the maximum number N of
traces that can be accepted with that cost is searched, iteratively solving the problem
P (c, m, C, N ) while decreasing N until a satisfiable problem is found. When found,
the point (C, N) is added to the set F and every point (C′, N′) that is dominated by (C, N) is removed from F; the cost is reduced by one and the loop is repeated. Upon
exit, the algorithm returns the set of points in the Pareto front.
Algorithm 1. Computing the Pareto front for the Model Repair MOP
Input:
c = c1 , ..., cNA vector containing the cost of repair operations,
m = (mij )i=1,...,NT ,j=1,...,NA , matrix specifying what traces are
repaired by what operations
Output:
F , a set of points (C, N ) (cost, number of accepted traces),
i.e. the Pareto front for the Model Repair MOP
// step 1: Compute the cost CT to have a model that accepts the whole set of traces
CT = Σ_{i=1,...,NA} ci
add (CT , NT ) to F
// step 2: Follow the Pareto front
C = CT − 1
N = NT
while C > 0 do
while N > 0 do
if the MRP problem P(c, m, C, N) is satisfiable then
add (C, N ) to F
remove any (C′, N′) dominated by (C, N) from F
break
end if
N =N −1
end while
if N = 0 then
break
end if
C =C−1
end while
return F
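A direct Python transcription of Algorithm 1, built on top of the feasible() sketch above (again an illustration, not the authors' implementation):

# Compute the Pareto front by tightening the cost bound, as in Algorithm 1.
def pareto_front(costs, m):
    NT = len(m)
    front = []
    # step 1: the most expensive point repairs everything
    C = sum(costs)
    front.append((C, NT))
    # step 2: follow the front
    C, N = C - 1, NT
    while C > 0:
        while N > 0:
            ok, _ops, _traces = feasible(costs, m, C, N)
            if ok:
                # drop points dominated by the newly found (C, N)
                front = [(c, n) for (c, n) in front if not (c >= C and n <= N)]
                front.append((C, N))
                break
            N -= 1
        if N == 0:
            break
        C -= 1
    return sorted(front)

# e.g. two operations with costs [2, 4]; trace 0 needs op 0, trace 1 needs both
print(pareto_front([2, 4], [[1, 0], [1, 1]]))   # [(2, 1), (6, 2)]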
4 Experimental Results
RQ1 Does the Pareto front offer a wide and tunable set of solutions?
RQ2 Does the Pareto front offer solutions that can be regarded as repaired models of
good quality?
RQ1 deals with the number and the variety of different solutions provided by Multi-
objective Repair . In particular, the shape of the Pareto front and the number of the
solutions in the Pareto front determine whether a wide range of alternatives that bal-
ance the two dimensions of interest is offered to business analysts. The Pareto front,
in fact, might consist of points spread uniformly in the interesting region or it may be
concentrated in limited, possibly uninteresting regions of the plane (e.g., near the totally
repaired processes accepting almost all traces in T). In our specific setting, the number of solutions available in the Pareto front depends on the number of operations needed to repair the whole set of traces in T. In order to answer this research question, we look at
the number of optimal solutions, as compared to the whole set of repair operations, and
at the shape of the Pareto front.
RQ2 investigates the quality of the repaired models in the Pareto front. Specifically,
two important quality dimensions for repaired models [3] are taken into account: (i)
Precision, i.e., how many new behaviours are introduced in the repaired model with
respect to the real process being modelled; and, (ii) Generality, i.e., how many yet
unobserved behaviours of the real process are accepted by the repaired model.
In the following, we report the case study, the metrics, the experimental procedure,
and the results obtained to positively answer RQ1 and RQ2.
The process used in the case study is a procedure carried out in the Italian Public Ad-
ministration (PA). It deals with the awarding of public tenders by contracting adminis-
trations. Before the winners can be awarded with the final notification, the contracting
administration has to verify whether the winners have all the necessary requirements.
In detail, the procedure is activated when the tender reports and a temporary ranking
are available. According to whether anomalous offers can be accepted or not, a further
step of evaluation is required or the letters for winners, non-winners as well as the result
letter can be directly prepared and entered into the system. At this point, the require-
ments of the temporary winners have to be verified. If such verification is successful, an
award notice is prepared and officially communicated; otherwise, further clarifications
are requested to the temporary winners and the verification is iterated. The award notice
can be published through the Web, through the Council notice board or, if the reward is
greater than a given threshold, it has to go to print.
A Petri net M describing such public tender awarding procedure has been defined
by a team of business experts as part of a local project. M takes into account the “ideal”
procedure described in official documents and is composed of 35 transitions, none of
which silent, and 32 places. No concurrent behaviours and no routing transitions occur
in M, while there are three alternative choices and a loop, involving 5 routing places.
Since discrepancies were found between M and the actually observed execution traces
T, a repaired model M′ was produced from M using the ProM Repair Model plugin.
4.2 Metrics
In order to answer the above research questions, we use precision and generality met-
rics to compare M′ to a gold standard model GSM. Differently from the initial model
M which did take into account the generic “ideal” procedure described in official doc-
uments, the gold standard GSM has been manually defined by a team of business
analysts working on the real process of a specific institution. It contains all and only
behaviours that are legal in the specific setting. Model GSM contains 49 transitions
and 38 places; it contains some parallel behaviours (2 routing transitions), several alter-
native paths and few loops (21 routing places). Transitions are decorated with transition
probabilities estimated from real process executions.
Precision. Precision (P) measures how much of the behaviour of the repaired model M′ is also behaviour of the real process. We compute it as the percentage of traces generated by M′ that are accepted by the gold standard model:
P(M′) = |acc(GSM, TM′)| / |TM′|    (5)
where acc(M, T) is the subset of T accepted by M and TM′ is a set of traces stochastically generated by model M′. It should be noticed that in the general case, where no
GSM is available, measuring the precision of a model might be quite difficult and
might involve substantial manual effort. We expect that good models are characterized
by high precision.
Generality. Generality (G) measures the capability of the repaired model M′ to describe unobserved behaviours of the real process. We compute it as the percentage of
traces generated by GSM that are accepted by M′:
G(M′) = |acc(M′, TGSM)| / |TGSM|    (6)
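As a rough illustration of Equations (5) and (6), the sketch below computes both metrics from finite trace samples; the acceptance checks are plain membership tests here, whereas in the paper they are replay checks on the Petri nets:

def precision(accepts_gsm, traces_from_repaired):
    # fraction of traces generated by the repaired model M' that GSM accepts (Eq. 5)
    return sum(map(accepts_gsm, traces_from_repaired)) / len(traces_from_repaired)

def generality(accepts_repaired, traces_from_gsm):
    # fraction of traces generated by GSM that the repaired model M' accepts (Eq. 6)
    return sum(map(accepts_repaired, traces_from_gsm)) / len(traces_from_gsm)

# toy usage with traces represented as tuples of activity labels
gsm_language = {("a", "b", "c"), ("a", "c", "b")}
repaired_language = {("a", "b", "c"), ("a", "b", "b", "c")}
t_repaired = [("a", "b", "c"), ("a", "b", "b", "c")]   # sampled from M'
t_gsm = [("a", "b", "c"), ("a", "c", "b")]             # sampled from GSM
print(precision(lambda t: t in gsm_language, t_repaired))   # 0.5
print(generality(lambda t: t in repaired_language, t_gsm))  # 0.5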
1. Trace generation. Two sets of traces T and GT are generated from the gold stan-
dard model GSM (in our experiments, |T | = 100; |GT | = 10). Each trace is
generated by a random execution of the Petri net: at each step, the transition to fire
is chosen according to the probabilities associated with the enabled transitions. The
execution ends when no enabled transitions exist;
2. Model Repair. The set of traces T is used to repair the initial model M , producing
the set A of operations required to fix it. For each operation a ∈ A, its cost c(a),
estimated as the number of transitions added by the repair operation a, and the set
of traces T (a) accepted by the repaired model due to the specific repair operation
a are stored;
3. MRP Solver. The Solver applies Algorithm 1 to obtain the Pareto front. Each point
Pi in the Pareto front is associated with a repaired model Mi ;
4. Compliance Computation for Generality. The set of traces GT is used to evaluate
the generality of each repaired model Mi ;
5. Trace Generation from Repaired Models. Each model Mi is used to randomly
generate a set TMi (|TMi | = 100), using a uniform distribution of probabilities
associated with the transitions;
6. Compliance Computation for Precision. Traces TMi are checked against GSM
to measure the precision of model Mi .
Stochastic trace generation from GSM and from the repaired models Mi was re-
peated 10 times, to allow for the computation of average and standard deviation of the
experimental metrics for precision and generality.
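A simplified sketch of the stochastic trace generation used in step 1, assuming the Petri net is given as transitions with input places, output places, and firing weights; the toy net and helper names are assumptions, not the tooling used in the paper:

import random

def generate_trace(transitions, initial_marking, rng):
    # transitions: list of (label, input_places, output_places, weight); a trace is
    # produced by repeatedly firing one enabled transition, chosen according to the
    # weights of the currently enabled transitions, until no transition is enabled
    marking = dict(initial_marking)
    trace = []
    while True:
        enabled = [t for t in transitions
                   if all(marking.get(p, 0) >= 1 for p in t[1])]
        if not enabled:
            return trace
        label, inputs, outputs, _ = rng.choices(enabled, weights=[t[3] for t in enabled], k=1)[0]
        for p in inputs:
            marking[p] -= 1
        for p in outputs:
            marking[p] = marking.get(p, 0) + 1
        trace.append(label)

# toy net: start -> A -> (B with weight 0.7 | C with weight 0.3) -> end
transitions = [("A", ["start"], ["p1"], 1.0),
               ("B", ["p1"], ["end"], 0.7),
               ("C", ["p1"], ["end"], 0.3)]
rng = random.Random(42)
T = [generate_trace(transitions, {"start": 1}, rng) for _ in range(5)]
print(T)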
4.4 Results
Figure 9 shows the Pareto Front obtained by applying Multi-objective Repair to the
presented case study. Each Pareto front point is associated with a model Mi , obtained
by applying a different set of repair operations. The x-axis represents the cost of the
repair operations applied to obtain model Mi , while the y-axis represents the number
of traces in T that are accepted by the repaired model Mi .
The shape of the Pareto front offers an approximately linear set of solutions that
are quite well distributed along the two axes. There are 6 points in the central area of
the plot, distributed two by two along the Pareto front. For each pair, the point with
the lowest cost is clearly associated with a better solution since the more costly solution
accepts just one additional trace. For example, M7 (indicated with an arrow in Figure 9)
and M8 represent a pair of close points. A business analyst in charge of choosing between
the two repaired models would probably prefer M7, since this solution presents a lower
cost, sacrificing only one trace.
Considering that 16 different repair operations have been identified by the ProM
repair plugin – hence, 2^16 different sets of operations can potentially be built – the
12 solutions provided by Multi-objective Repair represent, for a business analyst in
charge of repairing the initial model, a good selection of different trade-off solutions,
all ensured to be Pareto-optimal. Manual inspection of the whole space of solutions
would be unaffordable. Based on these considerations, we can answer RQ1 positively.
Fig. 9. Pareto Front obtained by applying Multi-objective Repair to the awarding of public tenders
Table 1 reports, for each repaired model Mi in the Pareto front, the number of traces
in T accepted by Mi , the cost of the repair operations applied, and the values for preci-
sion and generality. Figure 10 plots the same data as a function of the repair cost. The
low values for precision at increasing repair costs are due to the ProM repair algorithm,
which tends to preserve old behaviours by introducing alternative silent transitions. These
increase the number of extra-behaviours. As a consequence, the trend of the precision
metrics is decreasing when the number of repair operations applied to the model grows.
The opposite trend can be noticed for the generality metrics (blue line in Figure 10).
Starting from very low values for the poorly repaired models, the capability to reproduce
new, unobserved behaviours increases together with the application of repair operations.
It is worth noticing that in our case study the generality value for the repaired model
accepting all traces in T , i.e., M12 , is exactly 1. In fact, the trace set T used to repair
the initial model M provides exhaustive coverage of all possible process behaviours.
Of course, this might not be true in the general case.
Table 1. Precision and generality for the models in the Pareto front

       # of accepted traces   Repair operation cost   Precision (Avg. / Std. dev.)   Generality (Avg. / Std. dev.)
M1       5                      2                      1    / 0                       0.04 / 0.07
M2      15                      4                      1    / 0                       0.13 / 0.17
M3      24                      6                      1    / 0                       0.22 / 0.08
M4      28                      7                      0.51 / 0.04                    0.26 / 0.1
M5      45                      9                      0.53 / 0.07                    0.41 / 0.08
M6      46                     11                      0.29 / 0.03                    0.41 / 0.1
M7      57                     15                      0.51 / 0.04                    0.51 / 0.11
M8      58                     17                      0.3  / 0.03                    0.51 / 0.11
M9      72                     20                      0.4  / 0.03                    0.66 / 0.1
M10     75                     22                      0.25 / 0.02                    0.66 / 0.1
M11     94                     26                      0.26 / 0.03                    0.98 / 0.04
M12    100                     28                      0.13 / 0.01                    1    / 0
4.5 Discussion
We have manually inspected the repaired models in the Pareto front. We found that some
cheap operations (e.g., the introduction of silent transitions, realizing loop back/skipping
activities, or of small subprocesses) enable the acceptance of almost half of the traces
in T (see, e.g., M7 ), at a cost that is around half of the total repair cost (28). Solutions
located in the upper-right part of the Pareto, instead, are characterized by costly repair
operations dealing with the acceptance of parallel behaviours.
The parallelization of activities and the management of mutually exclusive branches
represent typical examples of challenging behaviours for repair techniques (in our case,
for the approach implemented by the ProM plugin). The low precision values of some
repaired models can be ascribed to these two types of criticalities.
Fig. 11. Sequentialization of parallel activities. Fig. 12. Sequentialization of mutually exclusive activities.
Concerning the first one (parallelization), indeed, the lack of a dedicated detector for parallel behaviours
causes the insertion of subprocesses in charge of exploring the different interleavings of
the parallel activities. Figure 11 shows a simplified view of this critical setting, which
makes it also clear why extra-behaviours are introduced in the repaired model (e.g.,
C, B, D, C ). Similarly, Figure 12 shows a simplified representation of a particular
case of the second criticality (mutually exclusive branches). When a new activity has
to be added to a block of mutually exclusive branches, it is added in sequence at the
join place as an optional activity, disregarding whether it is a new branch or part of an
existing one. Figure 12 gives an idea of the extra-behaviour introduced in the repaired
model (e.g., A, B, F, E ).
This analysis gives qualitative indications about the consequence of selecting a so-
lution in the central area of the Pareto front (e.g., M5 or M7 ). A business analyst can
repair the model at lower costs, while sacrificing execution traces involving more com-
plex (and costly to repair) behaviours, such as parallel behaviours (M7 ) or both mutually
exclusive and parallel behaviours (M5). The analysis also provides indications for the
improvement of existing model repair algorithms, e.g., the need to introduce special
rules dealing with parallelism and mutual exclusion.
5 Related Work
Reconciling execution information and process models, as done in process model dis-
covery and repair, involves multiple, often contrasting, dimensions. Some works [5]
Future works will be devoted to performing further experiments involving larger case
studies, as well as investigating the use of different configurations and tools for process
model repair.
Acknowledgments. This work is partly funded by the European Union Seventh Framework Programme FP7-2013-NMP-ICT-FOF (RTD) under grant agreement 609190 -
"Subject-Orientation for People-Centred Production".
References
1. van der Aalst, W.M.P.: Business alignment: Using process mining as a tool for delta analysis
and conformance testing. Requir. Eng. 10(3), 198–211 (2005)
2. Arito, F., Chicano, F., Alba, E.: On the application of SAT solvers to the test suite minimiza-
tion problem. In: Proc. of the 4th Int. Symposium on Search Based Software Engineering
(SSBSE), pp. 45–59 (2012)
3. Buijs, J.C.A.M., La Rosa, M., Reijers, H.A., van Dongen, B.F., van der Aalst, W.M.P.: Im-
proving business process models using observed behavior. In: Cudre-Mauroux, P., Ceravolo,
P., Gašević, D. (eds.) SIMPDA 2012. LNBIP, vol. 162, pp. 44–59. Springer, Heidelberg
(2013)
4. Fahland, D., van der Aalst, W.M.P.: Repairing process models to reflect reality. In: Barros, A.,
Gal, A., Kindler, E. (eds.) BPM 2012. LNCS, vol. 7481, pp. 229–245. Springer, Heidelberg
(2012)
5. Fahland, D., van der Aalst, W.M.P.: Simplifying mined process models: An approach
based on unfoldings. In: Rinderle-Ma, S., Toumani, F., Wolf, K. (eds.) BPM 2011. LNCS,
vol. 6896, pp. 362–378. Springer, Heidelberg (2011)
6. Gambini, M., La Rosa, M., Migliorini, S., Ter Hofstede, A.H.M.: Automated error correction
of business process models. In: Rinderle-Ma, S., Toumani, F., Wolf, K. (eds.) BPM 2011.
LNCS, vol. 6896, pp. 148–165. Springer, Heidelberg (2011)
7. Li, C., Reichert, M., Wombacher, A.: Discovering reference models by mining process vari-
ants using a heuristic approach. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM
2009. LNCS, vol. 5701, pp. 344–362. Springer, Heidelberg (2009)
8. Marchetto, A., Di Francescomarino, C., Tonella, P.: Optimizing the trade-off between com-
plexity and conformance in process reduction. In: Cohen, M.B., Ó Cinnéide, M. (eds.)
SSBSE 2011. LNCS, vol. 6956, pp. 158–172. Springer, Heidelberg (2011)
9. Medeiros, A.K.A.D., Weijters, A.J.M.M.: Genetic process mining: an experimental evalua-
tion. Data Min. Knowl. Discov. 14 (2007)
10. Tomasi, A., Marchetto, A., Di Francescomarino, C.: Domain-driven reduction optimization
of recovered business processes. In: Fraser, G., Teixeira de Souza, J. (eds.) SSBSE 2012.
LNCS, vol. 7515, pp. 228–243. Springer, Heidelberg (2012)
11. Tonella, P., Marchetto, A., Nguyen, C.D., Jia, Y., Lakhotia, K., Harman, M.: Finding the
optimal balance between over and under approximation of models inferred from execution
logs. In: 2012 IEEE Fifth Int. Conf. on Software Testing, Verification and Validation (ICST),
pp. 21–30. IEEE (2012)
12. Van Veldhuizen, D.A., Lamont, G.B.: Multiobjective evolutionary algorithm test suites. In:
Proc. of the 1999 ACM Symp. on Applied Computing, pp. 351–357. ACM (1999)
13. Yoo, S., Harman, M.: Pareto efficient multi-objective test case selection. In: Proc. of the 2007
Int. Symposium on Software Testing and Analysis, pp. 140–150. ACM (2007)
Memetic Algorithms for Mining Change Logs
in Process Choreographies
1 Introduction
As a result of easier and faster iterations during the design process and at run-
time, the management of business process changes, their propagation and their
impacts are likely to become increasingly important [1]. Companies with a higher
amount of critical changes list change propagation as the second most frequent
objective. In particular, in around 50% of the critical changes, the change neces-
sity stems from change propagation. Thus critical changes are tightly connected
to change propagation in terms of cause and effects [2].
In practice, companies still struggle to assess the scope of a given change. This
is mainly because a change initiated in one process partner can create knock-on
changes to others that are not directly connected. Failures of change propaga-
tions can become extremely expensive as they are mostly accompanied by costly
negotiations. Therefore, through accurate assessments of change impact, changes not
providing any net benefit can be avoided. Resource requirements and lead times can be
accounted for when planning the redesign process [3]. Early consideration of derived
costs and prevention of change propagation bear the potential to avoid and reduce both
average and critical changes [2].
(The work presented in this paper has been funded by the Austrian Science Fund (FWF): I743.)
Hence it is crucial to analyze propagation behavior, particularly, transitive
propagation over several partners. Note that change propagation might even be
cyclic, i.e., the propagation affects either the change initiator again or one of the
already affected partners. This is mainly due to transitivity; e.g., when a change
propagation to a partner not only results in direct changes he has to apply, but
also leads to redesigns in different parts of his process. In turn, this may have
consequences on the change initiator or a different partner.
This paper is based on change event logs and uses mining techniques to
understand and manage change propagation, and assess how changes propa-
gate between process partners that are not directly connected (cf. Figure 1). A
novel contribution is the implementation of a memetic mining algorithm, coupled
with appropriate heuristics, that enables the mining of prediction models from
change event logs, i.e., logs in which no information about the propagation between
partners is provided.
In the following, Section 2 illustrates a motivating example, while Section 3
presents change log formats and gives the global overview of the problem. Section
4 follows up with a set of heuristics for change mining. Based on these heuristics,
we introduce a memetic change mining algorithm in Section 5, which we discuss
and evaluate in Section 6. In Section 7 we discuss related work and conclude in
Section 8.
(Figure: change propagation example among the partners Acquirer (πAcq), Airline, Traveler, and Travel Agency. The change δAcq = Replace(F2, F2′) propagates as δAir = Insert(F3) (status = accept), δT = Delete(F4) (status = accept), and δTA = Replace(F5, F5′) (status = reject); each propagation response is annotated with its status, response date, response time, and cost.)
propagation to the Traveler that reacts by deleting process fragment F4. Finally,
the change propagates to the TravelAgency that would have to replace fragment
F5 by new fragment F5’. However, as the TravelAgency rejects the change, the
entire change propagation fails. According to [4], such change propagation is de-
fined as a function γ : {Insert, Delete, Replace} × Π → 2^({Insert, Delete, Replace} × Π)
with γ((δi, πi)) = {(δj, πj)}. γ takes as an input an initial change on a given process model and generates the ripple effects on the different partners affected by
the change.
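Purely as an illustration of the signature of γ (the partner names and propagation table below are hypothetical):

from typing import Callable, FrozenSet, Tuple

Change = Tuple[str, str]                       # (change operation, partner), operation in {Insert, Delete, Replace}
Gamma = Callable[[Change], FrozenSet[Change]]  # gamma maps one initial change to its set of ripple effects

def gamma_example(change: Change) -> FrozenSet[Change]:
    # hypothetical propagation table, only to illustrate the signature
    table = {("Replace", "pi_1"): {("Insert", "pi_2"), ("Delete", "pi_3")}}
    return frozenset(table.get(change, set()))

print(sorted(gamma_example(("Replace", "pi_1"))))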
The approach presented in this paper is based on change event logs collected
from different partners. Figure 2 outlines the overall approach and distinguishes
this work from previous ones. In [4], the overall picture of change propagation
in process choreographies has been set out. Also the basic algorithms for change
propagation are provided in [4]. We started analyzing change impacts in pro-
cess choreography scenarios using a priori techniques in [5]. The latter work is
based on the choreography structure only and does not consider information on
previous change propagations that occurred between the partners.
3 Problem Formulation
In this section, we introduce two different change log types and give a global
view on our approach.
are anonymized [8], normalized, collected and put in one file to be mined. In the
following, we refer to this type of log as CEL (Change Event Log).
In practice, it is also possible to have a change propagation log CPL (i.e.
containing the change requests, their impacts and the propagation information
as well). However, since the processes are distributed, it is not always possible
for a partner to track the complete propagation results of his change requests
(due to transitivity and privacy). To be more generic, we adopt change logs that
contain solely change events CEL (without information about propagations)
to be mined. However, in order to validate our mining approach, and assess
the quality of the mined model from the CEL, we also maintain a log of the
actual propagations CPL. The results of the predicted model from the CEL are
compared and replayed on the CPL propagation events.
Anonymization of the logs represents an important privacy step [8], which is
a trivial operation in a non-distributed setting. In a distributed environment a
consistent anonymization scheme needs to be employed, where for example π2
is consistently labeled as X.
Table 1 describes a sample of a change record. Each record includes informa-
tion about the partner that implemented the change (anonymized), the change
ID and type, the timestamps and the magnitude of the change. The latter is cal-
culated using the number of affected nodes (in the process model), the costs (gen-
erated randomly), and the response time. Other costs can be added as needed.
Table 2 describes a propagation record, with more propagation information.
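A hypothetical sketch of how a CEL record could be represented for mining, following the fields listed above; the field names and the magnitude formula are assumptions, since the paper does not fix a concrete weighting:

from dataclasses import dataclass

@dataclass
class ChangeEvent:
    partner: str          # anonymized partner identifier, e.g. "X"
    change_id: str
    change_type: str      # "Insert", "Delete", or "Replace"
    timestamp: str        # e.g. "2013-09-21T06:54:09.181"
    affected_nodes: int   # number of affected nodes in the process model
    cost: float
    response_time_ms: int

    @property
    def magnitude(self) -> float:
        # calculated from affected nodes, costs, and response time;
        # the concrete weighting is a placeholder assumption
        return self.affected_nodes + self.cost + self.response_time_ms / 1000.0

event = ChangeEvent("X", "c-042", "Replace", "2013-09-21T06:54:09.181", 3, 30.0, 12400)
print(event.magnitude)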
3.2 Overview
(Figure: overview of the C3Pro framework. Process models and change management/negotiation produce change candidates; all individual change events are collected in the change event log (CEL); the a priori prediction model based on the choreography structure is described in [5] and the basic change propagation mechanism in [4]; the posteriori prediction model is mined from the CEL by memetic change mining over N generations.)
of the propagations are stored in the CPL, and all individual change events are
stored in the CEL. Based on the change simulation data, posteriori and a priori
techniques are provided to evaluate and understand the propagation behavior
in the choreography through prediction models. The a priori technique [5] uses
the choreography structure to assess and predict the impact of a change request.
The posteriori technique, described in this paper, generates prediction models by
mining the previously stored change data. Derived models are validated through
a replay of the CPL.
In the CEL, the relationships between the change events are not explicitly
represented. The changes are collected from different partners without any information
on whether a change on a business partner is a consequence of a change on another
business partner or whether they are independent (because of the transitivity). In order
to correlate the change events and understand the propagation behavior, we adopted
different heuristics related to change in process choreographies.
4 Heuristics
In this section we present 4 groups of heuristics that can be exploited for mining
change events in process choreographies.
(Figure: a change log file with events δ11, δ12, δ13, ... of partner π1, illustrating the forward and backward time windows used to select candidate causes and consequences.)
should be approximately equal to ΓC , and therefore ωC = ΓC . However, a change
event A that occurred at tC − ΓC ± does not necessarily mean that C is conse-
quence of A. Indeed, both events can be independent.
Back to Figure 4, the table shows the possible propagation models that can
be generated according to the assumptions based on backward and forward win-
dows. In the first assumption, we assume that the response time of B matches
the time of its occurrence after A, and the time of its occurrence with respect
to B. Therefore, C can be seen as a possible consequence of either A or B. In
the same time the occurrence of B after A falls within its response time ΓB , and
therefore B can be possibly a consequence of A. As aforementioned, matching
the timestamps of the change events do not necessarily mean they are correlated,
and then we have to consider the possibility that the events might be indepen-
dent. The possible propagation models are then depicted in the second column,
which, merged together, give the probabilistic model in column 3. In the second
assumption, we assume that C can not be candidate for A (according to its
response time), and therefore the number of possible propagation models is re-
duced to only 4. In the last assumption, C can not be considered as consequence
of both A and B, and then the number of models is reduced to 2.
To conclude, the forward and backward windows can be very useful in reducing
the search space and highlighting the more probable propagation paths between
the change events. In addition, the example of Figure 4 considers only a small
window of events, where each event type occurred only once. In a bigger log, the same
change event (e.g., a Replace on a partner) can occur several times, which may improve
the precision of the probabilistic models.
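A simplified sketch of backward-window candidate selection, assuming each event carries a timestamp and each partner has an average response time Γ; the tolerance handling and field names are assumptions:

def candidate_causes(event, log, avg_response_time, tolerance):
    # candidate causes of `event` are earlier events whose distance in time is
    # roughly the responding partner's average response time (backward window)
    t_c = event["time"]
    gamma = avg_response_time[event["partner"]]
    lo, hi = t_c - gamma - tolerance, t_c - gamma + tolerance
    return [e for e in log if e is not event and lo <= e["time"] <= hi]

log = [
    {"partner": "A", "op": "Replace", "time": 0},
    {"partner": "B", "op": "Insert", "time": 55},
    {"partner": "C", "op": "Delete", "time": 100},
]
avg_response_time = {"A": 40, "B": 50, "C": 95}
print(candidate_causes(log[2], log, avg_response_time, tolerance=10))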
(Figure content: the 9×9 propagation matrix over the pairs (partner1, partner2, partner3) × (INSERT, DELETE, REPLACE), collapsed into a bitstring, and the corresponding propagation graph over nodes such as partner2-replace, partner3-delete, partner2-delete, partner2-insert, partner3-insert, and partner3-replace. Heuristic annotations: (H2) p1.INSERT: set p2.DELETE and p2.REPLACE to 0; (H3) p1.DELETE: set p2.INSERT and p2.REPLACE to 0; (H6) p1.REPLACE: if PARTNER_LINK(p1,p2) == 0 && PARTNER_LINK(p2,p1) == 0 then set p2.REPLACE to 0, otherwise maybe 1; a further annotation sets p2.DELETE to 0, otherwise maybe 1.)
Fig. 5. (a) Genetic Encoding of a candidate solution (b) Candidate solution in graph
form (c) Visualization of heuristics affecting a candidate solution
The implementation and the evaluation of the proposed heuristics within the
memetic mining is described in the following sections.
In this section we outline the memetic algorithm for mining change event logs
used to build change propagation models. This core algorithm is enriched with
an implementation of the heuristics sketched out in the previous Section 4. Em-
ploying a change propagation model, predicting the behaviour of change requests
in the choreography becomes possible. Memetic algorithms follow the basic flow
of genetic algorithms (GAs) [12], which are stochastic optimization methods
based on the principle of evolution via natural selection. They employ a popula-
tion of individuals that undergo selection in the presence of variation-inducing
operators, such as mutation and crossover. For evaluating individuals, a fitness
function is used, which affects reproductive success. The procedure starts with
an initial population and iteratively generates new offspring via selection, muta-
tion and crossover operators. Memetic algorithms are at their core GAs with an added
inner local optimization loop, with the goal of maintaining a pool of locally optimized
candidate solutions in each generation [13].
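A generic memetic-algorithm skeleton over bitstrings, shown only to make the control flow concrete; the paper's individuals, fitness function, and heuristic-guided local search are problem-specific and differ from this toy setup:

import random

def memetic_search(fitness, local_search, n_bits, pop_size=20, generations=30,
                   p_mut=0.05, rng=random.Random(1)):
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    pop = [local_search(ind, fitness) for ind in pop]            # memetic: local optimization
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)                       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (rng.random() < p_mut) for b in child]  # bit-flip mutation
            offspring.append(local_search(child, fitness))       # inner local loop per child
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

def hill_climb(ind, fitness):
    # flip single bits while this improves fitness (a simple local optimizer)
    improved = True
    while improved:
        improved = False
        for i in range(len(ind)):
            cand = ind[:i] + [1 - ind[i]] + ind[i + 1:]
            if fitness(cand) > fitness(ind):
                ind, improved = cand, True
    return ind

target = [1, 0, 1, 1, 0, 0, 1, 0]
best = memetic_search(lambda ind: sum(a == b for a, b in zip(ind, target)),
                      hill_climb, n_bits=len(target))
print(best)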
Genetic Encoding: The genetic encoding is the most critical decision about
how best to represent a candidate solution for change propagation, as it affects
other parts of the genetic algorithm design. In this paper, we represent an indi-
vidual as a matrix D that states the change relationships between the partners
(the genotype). Each cell dij in the matrix has a boolean value equal to 1 only
if a change on πi is propagated to πj, and zero otherwise. The matrix is not
symmetric and the corresponding graph for change propagation is directed. This
means that the probabilities of propagating changes from πi to πj and from πj
to πi are not equal. This is due to the fact that the log file may contain more
change propagations from πi to πj than from πj to πi. Figure 5(a) shows the
representation of a candidate solution. Internally, the table rows are collapsed,
resulting in a bitstring of length (m × n)^2, where n is the number of partners and
m is the number of change operation types (e.g., (3 × 3)^2 in Figure 5(a)).
Figure 5(b) represents the corresponding propagation model graph. Figure 5(c)
shows the importance of the heuristics in reducing the search space and their
effects on candidate solutions.
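A sketch of the genotype described above, assuming the heuristics are read as constraints that zero out certain cells of the propagation matrix (H2 and H3 as annotated in Fig. 5(c)); the helper names are assumptions:

OPS = ["INSERT", "DELETE", "REPLACE"]

def empty_genotype(partners):
    # genotype: matrix D indexed by (partner, op) pairs; D[src][dst] == 1 means a
    # change of type src_op on src_partner propagates as dst_op to dst_partner
    keys = [(p, op) for p in partners for op in OPS]
    return keys, {s: {d: 0 for d in keys} for s in keys}

def to_bitstring(keys, D):
    # collapse the table rows into a bitstring of length (m * n) ** 2
    return [D[s][d] for s in keys for d in keys]

def apply_h2_h3(keys, D):
    # H2: an INSERT on p1 cannot propagate as DELETE or REPLACE on p2
    # H3: a DELETE on p1 cannot propagate as INSERT or REPLACE on p2
    for (p1, op1) in keys:
        for (p2, op2) in keys:
            if op1 == "INSERT" and op2 in ("DELETE", "REPLACE"):
                D[(p1, op1)][(p2, op2)] = 0
            if op1 == "DELETE" and op2 in ("INSERT", "REPLACE"):
                D[(p1, op1)][(p2, op2)] = 0
    return D

partners = ["partner1", "partner2", "partner3"]
keys, D = empty_genotype(partners)
D[("partner1", "REPLACE")][("partner2", "INSERT")] = 1   # one hypothesized propagation
D = apply_h2_h3(keys, D)
bits = to_bitstring(keys, D)
print(len(bits), sum(bits))   # 81 bits for (3 ops x 3 partners) ** 2, one edge set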
for the most probable candidate events. Both windows are determined by
M axi∈P (Γi ), i.e. the maximum average response times over all partner re-
sponse times. For change event candidate selection inside the window, the
individual timestamps are used to determine Δ. For the actual selection, the
variance value can be determined empirically. We have opted to base this
value on the candidate partner’s average response time.
In this section we briefly describe the data set as well as the experimental setup used
to evaluate the memetic change mining algorithm coupled with the proposed heuristics
for building change propagation models from change event logs (σCEL).
Heuristics
None -48.74 -48.26 -27.80 -268.75 -226.71 -186.64 -553.90 -502.61 -476.96
H1 -38.08 -30.03 -22.96 -241.34 -197.63 -168.61 -520.53 -482.92 -449.32
H1-H2 -25.94 -20.24 -17.17 -138.19 -106.18 -88.86 -337.45 -289.14 -250.25
H1-H3 0.73 0.78 0.80 0.50 0.55 0.56 0.54 0.55 0.55
H1-H4 0.72 0.76 0.75 0.53 0.58 0.55 0.57 0.55 0.61
H1-H5 0.72 0.77 0.75 0.56 0.50 0.64 0.60 0.62 0.56
H1-H6 0.72 0.77 0.80 0.50 0.63 0.63 0.62 0.65 0.67
H1-H7 0.72 0.78 0.77 0.65 0.67 0.56 0.71 0.73 0.72
H1-H8 0.71 0.77 0.75 0.65 0.68 0.70 0.71 0.73 0.75
1 http://www.wst.univie.ac.at/communities/c3pro/index.php?t=downloads
Fig. 6. Mined change propagation models (a) via CPL and (b) via CEL
optimization loop of the memetic algorithm. We have several such heuristic sets
as can be observed by the rows in Table 3. (3) Each heuristic set is loaded
into the memetic algorithm and executed on the change logs in Λ for up to 20
generations in turn. (4) For validating the mined model, we derive a propagation
model from the CPL (i.e. σCP L ), and compare it to the mined model (i.e. σCEL )
by applying the following scoring function:
fitnessvalidation = completeness × precision − penalties    (4)
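A minimal sketch of the validation fitness, assuming both σCPL and σCEL are given as sets of propagation edges; the concrete definitions of completeness, precision, and penalties below are assumptions, since the paper only states the product form:

def validation_fitness(reference_edges, mined_edges, penalty_per_extra=0.1):
    # completeness: fraction of reference (CPL-derived) edges recovered in the mined model
    # precision: fraction of mined edges that are also reference edges
    # penalties: here simply proportional to the number of extraneous edges
    completeness = len(reference_edges & mined_edges) / len(reference_edges)
    precision = len(reference_edges & mined_edges) / len(mined_edges)
    penalties = penalty_per_extra * len(mined_edges - reference_edges)
    return completeness * precision - penalties

sigma_cpl = {(("Replace", "p8"), ("Insert", "p9"))}
sigma_cel = {(("Replace", "p8"), ("Insert", "p9")), (("Replace", "p8"), ("Delete", "p9"))}
print(validation_fitness(sigma_cpl, sigma_cel))   # 1.0 * 0.5 - 0.1 = 0.4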
The differences between these two models are visually illustrated in the annota-
tions in Figure 6(b). As can be seen in that figure, the memetic change mining
algorithm could find a good approximation for the prediction model, showing
only three extraneous edges (i.e. (Replace, π8) → (Delete, π9 ), (Replace, π8 ) →
(Delete, π7 ) and (Delete, π7 ) → (Delete, π9 )). Our proposed memetic change
mining algorithm fared well in this instance. As more partners are added, more
candidates are included as potential consequences, leading to bigger models with
more extraneous edges.
7 Related Work
Only few approaches have been proposed to compute the changes and their prop-
agation in collaborative process settings [4,14,15,16]. Most of these approaches
use either the public parts of the partner processes or the choreography/collab-
oration model; i.e., the global view on all interactions, to calculate the derived
changes. They mainly calculate the public parts to be changed, but cannot an-
ticipate the impacts on the private parts, which, in turn, could trigger knock-on
effects on other partners. Besides, in some collaboration scenarios, a partner may
have access to only a subset of the partner processes, and consequently could
not estimate the transitive effects of the change propagation.
Change impact analysis has been an active research area in the context of large
complex systems and software engineering [17,18,19,20]. As pointed out in [5], we
studied these approaches, but found major differences to the problem discussed in
this paper. One difference is based on the different structure of the underlying sys-
tems. Moreover, the use of the structured change propagation logs combined with
memetic as well as genetic mining has not been employed before in these fields.
There exist approaches on impact analysis of change propagation within chore-
ographies, i.e., [19,21]. However, they do not consider previous change propaga-
tion experience to enhance the prediction models.
Also they do not take into consideration the different metrics related to the
specific structure of business process choreographies. Our previous work [5] on
analyzing change impacts in collaborative process scenarios is based on the chore-
ography structure only, i.e., it does not take into consideration any information
on previously applied changes.
8 Conclusion
to guide the candidate selection process towards higher quality ones. The con-
ducted benchmarks and validation of the mined models (see Table 3) show the
positive effects of the defined heuristics for reducing the search space, thus re-
ducing the exploration time for finding accurate prediction models. Future work
aims at mining change propagation logs (CPL), and analyzing dynamic impacts
of process choreography changes.
References
1. Wynn, D.C., Caldwell, N.H.M., Clarkson, J.: Can change prediction help prioritize
redesign work in future engineering systems? In: DESIGN, pp. 600–607 (2010)
2. Maier, A., Langer, S.: Engineering change management report 2011. Technical
University of Denmark, DTU (2011)
3. Ahmad, N., Wynn, D., Clarkson, P.J.: Change impact on a product and its re-
design process: a tool for knowledge capture and reuse. Research in Engineering
Design 24(3), 219–244 (2013)
4. Fdhila, W., Rinderle-Ma, S., Reichert, M.: Change propagation in collaborative
processes scenarios. In: IEEE CollaborateCom, pp. 452–461 (2012)
5. Fdhila, W., Rinderle-Ma, S.: Predicting change propagation impacts in collabora-
tive business processes. In: SAC 2014 (2014)
6. Rinderle, S., Jurisch, M., Reichert, M.: On deriving net change information from
change logs – The Deltalayer-Algorithm. In: BTW, pp. 364–381 (2007)
7. Günther, C., Rinderle-Ma, S., Reichert, M., van Der Aalst, W., Recker, J.: Using
process mining to learn from process changes in evolutionary systems. International
Journal of Business Process Integration and Management 3(1), 61–78 (2008)
8. Dustdar, S., Hoffmann, T., van der Aalst, W.M.P.: Mining of ad-hoc business
processes with teamlog. Data Knowl. Eng. 55(2), 129–158 (2005)
9. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement
of Business Processes, 1st edn. Springer (2011)
10. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: Mining configurable
process models from collections of event logs. In: Daniel, F., Wang, J., Weber, B.
(eds.) BPM 2013. LNCS, vol. 8094, pp. 33–48. Springer, Heidelberg (2013)
11. Gaaloul, W., Gaaloul, K., Bhiri, S., Haller, A., Hauswirth, M.: Log-based transac-
tional workflow mining. Distributed and Parallel Databases 25(3), 193–240 (2009)
12. Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley Longman Publishing Co. (1989)
13. Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Natural Com-
puting. Springer, Berlin (2007)
14. Rinderle, S., Wombacher, A., Reichert, M.: Evolution of process choreographies
in DYCHOR. In: Meersman, R., Tari, Z. (eds.) OTM 2006. LNCS, vol. 4275, pp.
273–290. Springer, Heidelberg (2006)
15. Fdhila, W., Rinderle-Ma, S., Baouab, A., Perrin, O., Godart, C.: On evolving
partitioned web service orchestrations. In: SOCA, pp. 1–6 (2012)
16. Wang, M., Cui, L.: An impact analysis model for distributed web service process.
In: Computer Supported Cooperative Work in Design (CSCWD), pp. 351–355
(2010)
17. Bohner, S.A., Arnold, R.S.: Software change impact analysis. IEEE Computer So-
ciety (1996)
18. Giffin, M., de Weck, O., Bounova, G., Keller, R., Eckert, C., Clarkson, P.J.: Change
propagation analysis in complex technical systems. Journal of Mechanical De-
sign 131(8) (2009)
19. Oliva, G.A., de Maio Nogueira, G., Leite, L.F., Gerosa, M.A.: Choreography Dy-
namic Adaptation Prototype. Technical report, Universidade de São Paulo (2012)
20. Eckert, C.M., Keller, R., Earl, C., Clarkson, P.J.: Supporting change processes in
design: Complexity, prediction and reliability. Reliability Engineering and System
Safety 91(12), 1521–1534 (2006)
21. Wang, S., Capretz, M.: Dependency and entropy based impact analysis for service-
oriented system evolution. In: Web Intelligence, pp. 412–417 (2011)
Flexible Batch Configuration in Business Processes
Based on Events
1 Introduction
during process execution do influence the execution [10]. Reacting to these events and
changing the specified configuration parameters is required for process improvement.
In this paper, we apply event processing techniques to flexibly adapt these configuration parameters at run-time, in order to react in real time to changes in the business
process execution environment and improve the batch execution. The contributions of
this paper are (i) to provide an overview of changes to batch configuration parameters
triggered by events and (ii) to describe a framework that implements the flexible
adaptation of configuration parameters triggered by event occurrence.
The paper is structured as follows. Section 2 introduces the concepts of batch regions
and event processing before Section 3 presents a motivating example originating from
a real-world scenario in the healthcare domain. It leads to an analysis on how events
may influence batch execution and corresponding requirements in Section 4. Section 5
presents the concept of flexible adaptation of batch regions based on event processing
techniques. In Section 6, the framework is applied to the healthcare scenario from Sec-
tion 3 as evaluation. Section 7 is devoted to related work and Section 8 concludes the
paper.
2 Foundation
Batch Region. A batch region comprises a connected set of activities. For batch pro-
cessing configuration, a batch region contains four configuration parameters: (1) a group-
ing characteristic to cluster process instances to be processed in one batch based on
attribute values of utilized data, (2) an activation rule to determine when a batch may be
processed while balancing the trade-off between waiting time and cost savings, (3) the
maximum batch size indicating the maximum number of entities to be processed, and
(4) the execution order of the processed entities [20].
Each single execution of the batch region is represented by a batch cluster collecting
– based on the grouping characteristic – a number of process instances for synchro-
nization. Thereby, a batch cluster passes multiple states during its lifetime [21]. It is
initialized (state init) upon request of a process instance. The batch cluster transitions
to state ready (enablement), if the activation rule is fulfilled and is then provided to a
resource that decides to start execution at some point in time. The execution is indicated
by state running. If more than one resource is available, several batch clusters can be
executed in parallel. After initialization and before execution start, process instances may
still be added until the maximum batch size is reached (state maxloaded). Termination
of all process instances being part of the batch cluster successfully terminates it.
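A compact sketch of a batch cluster with the four configuration parameters and the lifecycle states named above; the class layout and the threshold-style activation rule are assumptions for illustration:

from dataclasses import dataclass, field

@dataclass
class BatchCluster:
    grouping_value: str            # value of the grouping characteristic shared by its instances
    activation_threshold: int      # activation rule: number of instances ...
    activation_timeout_h: float    # ... or maximum waiting time in hours
    max_batch_size: int
    execution_order: str = "parallel"
    state: str = "init"            # init -> ready -> (maxloaded) -> running -> terminated
    instances: list = field(default_factory=list)

    def add_instance(self, instance_id, waited_h=0.0):
        # instances may be added until the maximum batch size is reached
        if self.state in ("init", "ready") and len(self.instances) < self.max_batch_size:
            self.instances.append(instance_id)
            if len(self.instances) >= self.activation_threshold or waited_h >= self.activation_timeout_h:
                self.state = "ready"
            if len(self.instances) == self.max_batch_size:
                self.state = "maxloaded"

bc = BatchCluster("BloodTestA", activation_threshold=50, activation_timeout_h=1, max_batch_size=100)
for i in range(60):
    bc.add_instance(f"sample-{i}")
print(bc.state, len(bc.instances))   # ready 60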
Collecting multiple objects, e.g., blood samples, may also be done by utilizing loop
or multi-instance structures as specified in the workflow patterns [2]. This requires
merging multiple process instances into one that handles the synchronization. However,
batch regions do not merge instances, in order to retain the autonomy of the single instances outside
the batch regions. This enables dynamic process instance assignment to batch clusters,
e.g., for run-time cluster adjustments as discussed in this paper or for error handling.
Events. Information about changes or exceptions in the business process environment
are provided by events. Often those events are not stored at one place, but in several
information systems and BPMSs [7]. We refer to events being unstructured and avail-
able in an IT system as raw events. Event processing techniques help to utilize these
Flexible Batch Configuration in Business Processes Based on Events 65
raw events and use them during process execution for process monitoring, adjustment,
and control [5, 9, 10]. Structuring raw events according to a certain description referred
to as structured event type, transforms raw events in a first step into normalized events.
Normalized events are the basis for further processing by, for instance, combination,
aggregation, and enrichment with context data [11]. We distinguish two event types
relevant for flexible batch processing: (a) business events and (b) process events. Business events are based on normalized events enriched with business context information
and are relevant for all running process instances. In contrast, a process event is correlated
to a specific process instance and thus provides instance-specific information.
3 Motivating Example
The following healthcare process, the blood testing process in Fig. 1, is used to illustrate
the need for flexible batch execution.
Instantiation of the process takes place, if there is a blood test required for a patient
at the ward. First, the blood test order is prepared before a blood sample is taken from
the respective patient. Afterwards, a nurse transports both to the laboratory, where the
blood sample is first prepared for testing. Then, the actual test is conducted by a blood
analysis machine. The laboratory possesses one machine for each type of blood test. As
the blood analysis machines have an interface to the central hospital information system,
the results are published so that they are accessible by the physicians in the respective
ward. There, they can evaluate the blood test result and use it for diagnosis.
(Figure: the blood testing process (Fig. 1), including the activities Prepare blood sample, Conduct blood test, and Publish blood test result, annotated with the business events "Maintenance of blood analysis machine is planned" and "Section B of blood analysis machine is not available".)
Events occurring within the business process or in its execution environment might require adaptation.
In the following, we discuss three example events of relevance for batch regions in the
blood testing process:
Planned maintenance of a machine: This business event indicates that a maintenance
of a machine is planned. During the maintenance, the machine is not available to con-
duct tests of the specific type. Blood samples in not yet running batch clusters might
expire, because the waiting time of the collected process instances increases by the
maintenance time. Thus, in such situations, the blood analysis should be started shortly
before the maintenance takes place to avoid expired blood samples.
Partly unavailability of a machine: Assume, a blood analysis machine contains four
sections to process blood samples from which one fails. Then, the capacity of the ma-
chine is reduced by one quarter. Hence, the maximum number of process instances
allowed to be contained by a batch cluster should be reduced accordingly.
Transportation of a set of blood samples of the same type is started: Assume, the
timeout is almost reached for a batch cluster while a transportation of blood samples to
the laboratory requiring the same test is started. The respective batch cluster may delay
its activation until the instances arrive to improve cost savings.
These examples show that there exist various situations requiring a flexible adjust-
ment of predefined batch processing behavior in order to (1) reduce costs, (2) avoid
increased waiting time, and (3) ensure correct batch execution, e.g., in case of a reduced capacity
of the task performer. Next, we perform an analysis to derive the requirements before we
present our concept in Section 5.
Table 1. Classification on how batch clusters can be changed and by which events
Reducing the maximum batch size may result in batch clusters exceeding the newly
set limit. Then, newest assigned process instances are removed from the corresponding
clusters and get assigned to other or new batch clusters accordingly. The concept intro-
duced in the next section covers all changes of Table 1 including these special cases.
As described in Section 2, during a batch cluster’s lifetime, it may pass the states init
- ready - maxloaded - running - terminated. When a task performer starts execution of a
batch cluster, it transitions to state running. From this moment, no adjustments shall be
done on the respective batch cluster anymore. Therefore, we assume that batch clusters
can only be adjusted in states init, ready, or maxloaded.
Having presented multiple types of changes according to the configuration param-
eters and their implications, we derive three requirements to implement above obser-
vations. First, at design-time, event types relevant for batch cluster adjustment need to
be identified (R1). Then, at run-time, occurring events must be correlated to respective
batch clusters (R2) and they need to be adjusted accordingly (R3).
mat. Besides attributes specific for a structured event, a structured event type consists
of a content description describing the structure of the event content, e.g., by defining
the attributes (keys) or by an XML schema definition (XSD).
Fig. 2. Events influence the properties of batch clusters during run-time
We propose an approach that enables run-time flexibility of batch clusters by batch
adjustments following a batch adjustment rule. A batch adjustment is triggered by a
certain event and may result in the adaptation of some parameters of one batch cluster.
The events to react on, the conditions that need to be met, and the adjustments that may
need to be applied are defined in the batch adjustment rule. The structure of a batch ad-
justment rule follows the (E)vent-(C)ondition-(A)ction principle originating from the
database domain [6]. Events to react on are described by their event type, e.g., an event
indicating the maintenance of a machine. The condition information enables the cor-
relation of the event to the corresponding batch cluster, e.g., only the batch clusters
containing process instances with blood samples for this machine. The described action
specifies the particular adjustment of a batch cluster, e.g., the immediate execution.
Fig. 3. Class diagram integrating batch region [20] and event processing [11] concepts (classes: Batch Adjustment Rule, Batch Adjustment, Structured Event, Structured Event Type). The model
level shows the design-time concepts and the instance level shows their run-time implementation.
The connection of events and the batch region concept is illustrated in the class di-
agram of Fig. 3. One batch region can have an arbitrary set of batch adjustment rules
which are provided by the process designer. A batch adjustment rule refers to at least
one structured event type which can be a business or process event type. The structured
Flexible Batch Configuration in Business Processes Based on Events 69
event types describe based on which events a batch adjustment is triggered. If a struc-
tured event occurs which is relevant for a set of batch clusters, then for each batch cluster
one batch adjustment is created. Thus, a batch adjustment rule can have an arbitrary set of
batch adjustments being related to one or several structured events, but each adjustment
is assigned to only one batch cluster. During the lifetime of a batch cluster, it can be
adapted by an arbitrary set of batch adjustments.
machineMaintancePlannedb.extraction =
{ machineMaintancePlannedb.id = getGuid();
  machineMaintancePlannedb.timeStamp = getTime(now);
  SELECT
    machineStatusn.name,
  FROM
    machineStatusn,
    technicianSchedulen
  INTO
    machineMaintancePlannedb.MachineName
  WHERE
    machineStatusn.name = technicianSchedulen.machineID AND
    machineStatusn.status = "MaintenanceNeeded" AND
    technicianSchedulen.state = "planned" AND
    technicianSchedulen.time - getTime(now) <= machine(name).getRuntime() }
Listing 1.1. Definition of the business event type machineMaintancePlannedb that captures the
information about a maintenance in near future. This event results from events of the machine
itself (event type machineStatusn ) and the technician schedule (event type technicianSchedulen ).
70 L. Pufahl et al.
takes place. Thus, a time constraint is set to create the corresponding business event if
the time until the maintenance is equal to or lower than the time needed for one run of the machine
(machine(name).getRuntime() returns the duration of a run of machine name).
This defined event type can be used as trigger for a batch adjustment rule that adapts
the activation rule of batch clusters in case of a maintenance for avoiding expired blood
samples. The proposed batch adjustment rule is shown in Listing 1.2, illustrating its
basic structure. In the condition part of the batch adjustment rule, we ensure that batch
adjustments are only created for batch clusters the event is relevant for. In our exam-
ple, the events of type machineMaintancePlannedb are relevant for all batch clusters
that are intended to run in time where the maintenance is planned to be conducted.
Those should be started before the maintenance takes place to avoid unnecessary wait-
ing times for the blood samples. The relevant clusters are those that have the same
blood testing type as the blood analysis machine to be maintained and that are not yet
enabled for execution, i.e., in state init. The instances of the blood testing batch region
are grouped based on their blood test type (cf. Fig. 1) with the grouping characteristics
= Order.bloodTestType. Thus, the batch cluster’s data view provides information which
blood test type its assigned process instances requires, e.g., BC1(BloodTestA). The data
view of the batch cluster can be used for the condition, cf. Listing 1.2 line 2 and 3.
1 EVENT     { machineMaintancePlannedb }
2 CONDITION batchCluster.dataView == machineMaintancePlannedb.name
3           batchCluster.state == "INIT"
4 ACTION    batchCluster.activationRule = Threshold(50, 0h)
Listing 1.2. Definition of a batch adjustment rule to start batch clusters before a maintenance
takes place.
Based on this example, we can observe that a specific batch cluster or a set of spe-
cific batch clusters for which an event is relevant can be identified based on batch clus-
ter specific characteristics, i.e., (1) data view, (2) current state of the cluster, (3) num-
ber of instances contained in a cluster, and (4) type of instances. If no condition is
described, a batch adjustment is created for all batch clusters which are in the init,
ready, or maxloaded state. Clusters being already accepted by the task performer are
not adapted anymore.
The last part of the batch adjustment rule is the definition of actions that need to be
performed when an event happened and the conditions are fulfilled. These actions can
use information of the underlying events to specify the adjustments of the particular
batch cluster. Referring to our example, the action would be to enable the batch execu-
tion before maintenance, cf. Listing 1.2 line 4. With this action, the activation rule of
the cluster is adjusted so that the cluster is activated either by 50 blood samples or after
a waiting time of 0 hours, meaning that the cluster is immediately enabled and can be
finished before the maintenance starts.
Batch adjustment rules are utilized to create batch adjustments for batch cluster. A
batch adjustment holds the ID of the corresponding batch cluster and the action that
need to be taken to change certain parameters of the batch cluster. Applying the batch
adjustment rule of our example, a batch adjustment as shown in Listing 1.3 will be
generated for batch cluster 1234.
batchCluster.id = 1234
batchCluster.activationRule = "Threshold(50, 0h)"
Listing 1.3. Exemplary batch adjustment created for batch cluster 1234.
The batch adjustment mentioned above will replace the activation rule Threshold
(50,1h) of batch cluster 1234 by Threshold (50, 0h). With regards to the generation of
batch adjustments, if an event is received, it is immediately checked whether this event
is relevant for any available batch cluster. For each relevant cluster, a batch adjustment
is created. In case that the event is valid for a certain time period, the event is stored.
For each further initialized cluster, it is checked whether this event applies. Upon inval-
idation of the event, it is removed from the event storage. After presenting the structure
of batch adjustment rules and the generation of batch adjustments, the next section dis-
cusses the special case where a batch cluster is not only adapted, but a reassignment of
process instances is necessary.
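A sketch of how batch adjustment rules could be evaluated when an event is received, following the ECA reading above; the rule encoding and cluster attributes are assumptions (the paper's own notation is shown in Listings 1.2 and 1.3):

from dataclasses import dataclass
from typing import Callable
from types import SimpleNamespace

@dataclass
class BatchAdjustmentRule:
    event_type: str                              # E: structured event type to react on
    condition: Callable[[dict, object], bool]    # C: is the event relevant for this cluster?
    action: Callable[[dict, object], None]       # A: the adjustment to apply

rule = BatchAdjustmentRule(
    event_type="machineMaintancePlanned",
    condition=lambda ev, bc: bc.data_view == ev["MachineName"] and bc.state == "init",
    action=lambda ev, bc: setattr(bc, "activation_rule", "Threshold(50, 0h)"),
)

def handle_event(event_type, event, clusters, rules):
    # one batch adjustment is created (and applied) per matching rule and cluster
    for r in rules:
        if r.event_type == event_type:
            for bc in clusters:
                if r.condition(event, bc):
                    r.action(event, bc)

bc1 = SimpleNamespace(id=1234, data_view="BloodTestA", state="init",
                      activation_rule="Threshold(50, 1h)")
handle_event("machineMaintancePlanned", {"MachineName": "BloodTestA"}, [bc1], [rule])
print(bc1.activation_rule)   # Threshold(50, 0h)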
tained by the batch cluster in case of a decreased maxBatchSize or (b) the cancellation
of a batch cluster in case of a changing groupingCharacteristic. The extended lifecycle
of batch clusters with the canceled state is shown in Fig. 4; a cancellation is only
possible from states init, ready, and maxloaded.
Fig. 4. Lifecycle of a batch cluster extended by the canceled state (states init, ready, maxloaded, running, terminated, and canceled).
batch clusters.
In general, process instances that arrive at a batch region, i.e., the enablement of the
entry activity into the region, are temporarily deactivated and assigned to a queue of the
so-called batch cluster manager in the order of their arrival time (first-in-first-out). The
batch cluster manager organizes the assignment of process instances to batch clusters
and, if necessary, initializes new batch clusters.
If a process instance is reassigned as part of an adjustment, it should be handled with
priority, because it has already experienced a longer waiting time than instances newly
arriving at the batch region. Thus, the to-be reassigned process instance is placed at the
front of the queue based on its arrival time at the batch region. Then, it is assigned to
an existing or a new batch cluster. In the example of Fig. 5, the number of instances of
batch cluster BC1 has to be reduced because an event indicated that a section of machine
A is currently not working. Then, the newest assigned instances are removed from the
size-reduced cluster. The process instance with arrival time 10:07 is placed at the
beginning of the queue, then the instance with 10:10 is added, followed by the newly
arrived instance at 10:36.
Fig. 5. Reassignment of process instances in case of a reduced maxBatchSize (batch region with groupingCharacteristic = Order.testType, activationRule = Threshold(50 cases, 1h), maxBatchSize = 100, executionOrder = parallel; a business event ErrorOfMachineSection triggers a batch adjustment reducing maxBatchSize from 100 by 25).
Often batch regions have an activation rule with a time constraint which describes
the maximum waiting time for a process instance in a batch cluster. In the example
process of Fig. 5, the threshold rule states that either 50 instances have to be available
or the waiting time of 1h is exceeded to activate the batch cluster. For assuring the
maximum waiting time also for reassigned process instances, we propose the usage
of the batch adjustment concept here. If an instance that arrived at the batch region
earlier than the batch cluster was created (or earlier than one of its instances arrived)
is added to this cluster, an event is created. This event triggers a batch adjustment which reduces
the time constraint of the batch cluster by the difference between the batch cluster’s
creation time and the reassigned instance arrival time at the batch region.
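A small sketch of the proposed time-constraint reduction for a reassigned instance; the datetime handling is an assumption:

from datetime import datetime, timedelta

def adjusted_timeout(cluster_created_at, cluster_timeout, reassigned_arrival):
    # reduce the cluster's time constraint by how long the reassigned instance
    # already waited before the cluster was created, so its maximum waiting time
    # is still respected
    if reassigned_arrival >= cluster_created_at:
        return cluster_timeout
    already_waited = cluster_created_at - reassigned_arrival
    return max(timedelta(0), cluster_timeout - already_waited)

created = datetime(2013, 9, 21, 10, 30)
timeout = timedelta(hours=1)
print(adjusted_timeout(created, timeout, datetime(2013, 9, 21, 10, 7)))   # 0:37:00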
5.4 Architecture
Next, we present an architecture showing details about a technical implementation to
flexibly adapt batch cluster configurations. Fig. 6 presents the main components and
their interactions as FMC block diagram [12]. The architecture is structured into three
parts: event producer, event processing platform, and process control. The process en-
gine, which controls process execution and batch handling, is an event producer and at
the same time consumes events provided by the event processing platform. Besides
the process engine, several event producers (event sources) can be connected via an
appropriate event adapter to the event processing platform. These can be information
systems as well as databases. The event processing platform normalizes the received
raw events and creates business and process events based on defined rules. Event con-
sumers are connected by an event consumer interface.
Fig. 6. Architecture to realize batch adjustments during process execution based on an event processing platform. It comprises event producers (the process engine and further event sources connected via event adapters), the event processing platform (event normalization, creation of business and process events, event consumer interface), and process control (process modeling, process repository, and the process engine with process execution, batch region configuration, and the batch adjustment handler).
Process control comprises the process engine and some modeling environment to
create the process model to be executed within the process engine. After creation, a
process model is stored in the process repository. While modeling a process, batch
regions can be designed. Thereby, the process designer can define batch adjustment
rules used at run-time to adapt the batch regions. Those are saved together with the
process model in the process model repository. During process execution, the process
engine retrieves the process model and the adjustment rules from the repository. For
each designed batch region, the batch cluster manager assigns the process instances
to batch clusters. The batch adjustment handler registers for events that are specified
in the batch adjustment rules of a batch region at the event consumer interface. If the
handler receives a registered event from the event processing platform, then the event is
evaluated and the corresponding action is triggered for the appropriate batch clusters. The batch adjustment handler keeps an internal list of all batch clusters which are in state init, ready, or maxloaded, as these are the only ones that might be affected by events.
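The following minimal Java sketch illustrates this ECA-style behaviour of the batch adjustment handler; all names are hypothetical and only hint at one possible realization of the component:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Sketch (hypothetical names): the handler keeps only clusters that can still be
 *  affected by events and applies the configured action when a registered event
 *  satisfies the rule condition. */
class BatchAdjustmentHandlerSketch {

    enum ClusterState { INIT, READY, MAXLOADED, RUNNING, TERMINATED }

    static class Cluster {
        ClusterState state = ClusterState.INIT;
        int maxBatchSize = 100;
    }

    static class Event {
        final String type;                    // e.g. "MaintenanceAnnounced"
        Event(String type) { this.type = type; }
    }

    /** Adjustment rule: event type it listens to, a condition, and the action per cluster. */
    static class AdjustmentRule {
        final String eventType;
        final Predicate<Event> condition;
        final Consumer<Cluster> action;
        AdjustmentRule(String t, Predicate<Event> c, Consumer<Cluster> a) {
            eventType = t; condition = c; action = a;
        }
    }

    final List<Cluster> adjustableClusters = new ArrayList<>();
    final List<AdjustmentRule> rules = new ArrayList<>();

    /** Invoked by the event consumer interface for registered events. */
    void onEvent(Event e) {
        for (AdjustmentRule r : rules) {
            if (r.eventType.equals(e.type) && r.condition.test(e)) {
                for (Cluster c : adjustableClusters) {
                    // Only init, ready and maxloaded clusters may still be adjusted.
                    if (c.state == ClusterState.INIT || c.state == ClusterState.READY
                            || c.state == ClusterState.MAXLOADED) {
                        r.action.accept(c);
                    }
                }
            }
        }
    }
}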
6 Evaluation
The approach is evaluated by showing its applicability to a real-world use case, the blood testing scenario introduced in Section 3, by means of a simulation. As described, the lab-
oratory uses a batch region to synchronize several blood samples for the blood analysis
to save machine costs. The blood analysis machine needs to be maintained regularly
respectively on request. Based on an event informing about the maintenance some time
before it actually starts, the configuration of a running batch cluster can be adjusted.
With the adjustment, the cluster is started in-time to decrease the number of expired
blood samples due to unavailability of the machine. A blood sample expires after a cer-
tain time frame, often 90 to 120 minutes, because the blood structure changes. Then,
the blood sample is not useful for medical analysis. Each expired blood sample causes
costs of taking a new one.
Simulation Setup. For the evaluation, a simulation is used to compare the number of
expired blood samples in case of normal batch execution, i.e., without run-time adap-
tations, to flexible batch execution as presented in this paper. For this purpose, the laboratory part of the blood testing process was implemented as a simulation1 with DESMO-J [8], a Java-based framework for discrete event simulation. The simulation starts with the arrival of process instances, i.e., blood samples, at the laboratory. Each process instance is terminated after finishing the blood test. On average, every 12 minutes (exponentially distributed), a nurse brings 20 ± 5 blood samples (normally distributed) to the laboratory. For this simulation, we assumed that only one blood
analysis machine exists. One run of the machine for analyzing blood samples takes 25
minutes. At maximum, the machine can handle 100 blood samples in one analysis.
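The following short, self-contained Java sketch illustrates the assumed arrival process (exponential inter-arrival times with mean 12 minutes and a normally distributed number of samples per arrival, interpreting 20 ± 5 as mean and standard deviation); it is not the DESMO-J model itself:

import java.util.Random;

/** Simplified sketch of the simulated arrival process (one 8-hour shift). */
public class ArrivalProcessSketch {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        double clockMin = 0;
        int totalSamples = 0;
        while (clockMin < 8 * 60) {
            clockMin += -12.0 * Math.log(1 - rnd.nextDouble());        // Exp(mean = 12 min)
            int samples = Math.max(0, (int) Math.round(20 + 5 * rnd.nextGaussian()));
            totalSamples += samples;
            System.out.printf("t=%6.1f min: %d samples arrive%n", clockMin, samples);
        }
        System.out.println("Total samples in shift: " + totalSamples);
    }
}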
For the simulation, the laboratory selected ThresholdRule(50 instances, 1h) as acti-
vation rule requiring 50 instances or a waiting time of one hour to enable a batch cluster
(cf. Fig. 1). If a batch cluster fulfills this rule, it queues for being processed by the ma-
chine. The machine has already been in use for a long time period. Thus, maintenance is required twice a week, i.e., every 3.5 days with a deviation of 1 day. For the flexible batch han-
dling, some time before the technician arrives, an event regarding the maintenance is
1 The simulation source code and the reports of the different simulation runs are available at http://bpt.hpi.uni-potsdam.de/Public/FlexibleBatchConfig
provided. When the technician arrives, he is prioritized, but a current analysis on the
machine is not interrupted.
Results. We conducted several simulation runs for two scenarios to compare the im-
pact of flexible batch adjustments. The scenarios differ in the expiration time for blood
samples: 120 minutes and 90 minutes. Fig. 7 and 8 summarize the results of the simu-
lation runs over a period of two years, one diagram for each scenario. In both diagrams,
we compare the results for maintenance durations of 45 minutes and 60 minutes (the second and third groups of bars) with the result where no maintenance takes place (the first group). The black bars (1) provide the numbers of expired blood samples if no adjustments are made at run-time. The different gray bars (2)-(4) show the results for event-triggered batch adjustments if the event is sent 1, 1.5, or 2 analysis-run durations, i.e., 25, 37.5, or 50 minutes, before the technician arrives.
If no maintenance were conducted, 1,738 samples in scenario 1 and 19,913 samples in scenario 2 would expire due to the exponential arrival of the blood samples and the resulting waiting times for the machine. If maintenance is conducted on average twice a week as indicated above, the number of expired blood samples increases by 14% and 29% for 45- and 60-minute maintenance durations in scenario 1 (cf. black bars in Fig. 7) and by 13% and 41% in scenario 2 (cf. black bars in Fig. 8).
Fig. 7. Scenario 1 – 120 min expiration time: Number of expired blood samples in two years for different simulations. Bars: (1) usual execution, (2) event sent 1.0 run earlier, (3) event sent 1.5 runs earlier, (4) event sent 2.0 runs earlier; groups: without maintenance, 45 min maintenance duration, 60 min maintenance duration.
Applying flexible batch adjustments aims at reducing the number of expired blood
samples. The recognition of the event indicating the maintenance directly activates all
initialized batch clusters by changing the activation rule accordingly (cf. line 4 in List-
ing 1.2 in Section 5.2). The impact of the batch adjustment rule with respect to the point
in time the event is sent is shown by the different gray bars (2)-(4). In 9 of 12 cases, we
observe measurable improvements. The highest improvements for the different settings
are mostly observed for the light gray bar ((2) Event 1.0 run earlier). It indicates that
it is most beneficial for reducing the number of expired blood samples to inform about
the maintenance one analysis run before the start of the maintenance. The improvement
amounts to 13% and 20% in scenario 1 for 45 and 60 minutes maintenance time, respectively, and to over 3% in scenario 2 (60 minutes maintenance). With these numbers, for scenario 1, we almost compensate for the maintenance.
For scenario 2, shown in Fig. 8, only slight improvements as well as two cases of
no improvements are observed. This may be explained as follows: The arriving event
enables a batch cluster which is then started for the blood analysis. During the analysis,
Fig. 8. Scenario 2 – 90 min expiration time: Number of expired blood samples in two years for different simulations. Bars: (1) usual execution, (2) event sent 1.0 run earlier, (3) event sent 1.5 runs earlier, (4) event sent 2.0 runs earlier; groups: without maintenance, 45 min maintenance duration, 60 min maintenance duration.
multiple new samples might arrive, but they are not processed before the maintenance as the technician is prioritized. Due to the short expiration time of 90 minutes, there is a good chance that those samples expire. For a maintenance time of 45 minutes, all samples which arrive within 5 minutes after the start of the flexibly enabled cluster expire, because they have at least 20 minutes waiting time before the maintenance plus 45 minutes maintenance time plus another 25 minutes analysis time, summing up to at least 90 minutes. For 60 minutes maintenance, all samples arriving within 20 minutes after the start of the flexibly enabled cluster will expire. Thus, if – due to the arrival distribution of the blood samples – many samples arrive within these time frames, negative results can also be observed.
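As a short check of these thresholds, the following derivation is a sketch under the stated timing assumptions (the flexibly enabled run starts when the event is received, i.e., one 25-minute run before the technician arrives, and newly arriving samples are only analyzed in the first run after a maintenance of duration m):

% Total delay of a sample arriving t minutes after the flexibly enabled run starts:
\[
  (25 - t) + m + 25 \;=\; 50 + m - t \;\ge\; 90
  \quad\Longleftrightarrow\quad t \;\le\; m - 40,
\]
% i.e. t <= 5 minutes for m = 45 and t <= 20 minutes for m = 60.

Hence the 5-minute and 20-minute thresholds follow directly from the 90-minute expiration limit.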
Summarizing the above observations, it is important to check the relation between the expiration time and the waiting and maintenance times to decide whether to apply batch adjustments or not. In case the relations are appropriate, as for instance in scenario 1, applying batch adjustments provides reasonable and measurable improvements.
The simulation results indicate that the waiting time of the technician increases slightly, on average by less than a minute. Due to limited space, the reader is referred to our simulation reports (see footnote 1). If we take scenario 1, the cost savings due to the reduction in expired blood samples will be higher than the technician costs caused by the small increase in waiting time.
In most cases, we can observe that the number of zero-waiting occurrences increases, because starting an analysis run shortly before the technician arrives increases the chance that the run terminates just upon arrival. However, sometimes a run may only be started shortly before the technician's arrival because some other analysis run was still busy. Then, the technician must wait longer, resulting in a wider spread of waiting times and a higher total average waiting time.
7 Related Work
In the business process research domain, few works exist that synchronize the execution of multiple instances. For example, in [1,14,23], the integration of batch processing into
process models is discussed. These works provide limited parameters to configure the
batch execution at design-time, often only the maximum capacity. This also limits op-
portunities to conduct adjustments at run-time. [23] provides some means for flexible
run-time batch control by introducing batch activation by user invocation. Extending
the options for batch configuration in business processes, [21] introduces batch activities with three configuration parameters: a capacity (as in the works above), a rule-based activation generalizing the user invocation, and an execution order. Going one step further, [20] extends the parameters by a grouping characteristic to distinguish pro-
cess instances. However, all these works focus on specifications at design-time and do
not support automatic adjustments of the batch configuration at run-time, for instance,
due to changes in the process environment or within the process itself. In this paper,
we extend the concepts presented in [20, 21] to allow run-time flexibility in terms of
configuration adaptation to improve batch processing in business processes. We utilize
events as trigger for taking adjustment actions. These extensions can also be applied to
other works for adapting the configuration parameters offered there.
Batch processing flexibility has also been discussed in other domains, for example,
the manufacturing domain [17]. Here, batch scheduling is used to schedule a number
of available jobs on a single or on multiple machines for saving set-up costs. Changes
of market factors, e.g., a canceled order, or on the operational level, e.g., breakdowns,
require a rescheduling functionality. In [17], an overview of suitable algorithms is pre-
sented and the need for a framework which combines possibly occurring events with
some reschedule action is discussed. The contributions of this paper can be adjusted to
offer a first approach in this direction: instead of configuration parameter adjustments,
a rescheduling action can be used in the batch adjustment rule.
Adaptation of process instances at run-time is a widely explored field. [22] discusses manual ad-hoc changes of single instances, e.g., inserting, deleting, or shifting activities with respect to a given process model. This provides flexibility for single process executions but no possibility to pool several process instances and to work on them as a batch. The CEVICHE framework [9] allows changing process
instances automatically during run-time. Similar to this paper, it uses Complex Event
Processing (CEP) to detect changes and exceptions which then trigger dynamic adap-
tation of the BPEL processes. In the same vein, [5] discusses means to integrate CEP
with BPMSs on architectural level and shows how to do this for a BPEL engine. [24]
introduces an approach to discover deviations of process executions and the underlying
process model by using CEP techniques.
In this paper, we use CEP techniques as, for example, described in [7, 15], to create
the necessary business and process events. [7] lists definitions for CEP-related terms,
e.g., event type, that are used in this paper. Based on these works, a framework for
CEP for business processes was introduced [10, 11]. We utilize this framework to allow
dynamic batch activation and configuration rule adaptations as presented in Section 5.
In this paper, we deal with comparably simple rules to correlate events to each other,
to process instances, and to batch clusters. Applying common correlation techniques
extends the correlation capability of the presented approach. One of these techniques,
the determination of correlation sets based on event attributes, is introduced in [19].
8 Conclusion
In this paper, we showed the necessity to synchronize multiple cases in batch clusters and the requirement of their flexible adjustment during run-time. Therefore, a concept is introduced that applies event processing to batch execution, allowing batch configuration parameters and batch activation to be flexibly adjusted based on run-time changes
represented by events. Based on the principle of Event-Condition-Action rules, rele-
vant events are identified and then compared to defined conditions. If the conditions
are fulfilled, the configured actions are executed as a batch adjustment for the corre-
sponding batch cluster. Further, an architecture is presented showing details about a
technical implementation and the components that are necessary to apply the concept
within a process engine. We showed applicability of the introduced concept of batch
adjustments during run-time with a real-world use case of a blood analysis in a hos-
pital’s laboratory. We simulated two years of work in the laboratory and showed that
the application of the presented concept compensates for maintenance interruptions, decreasing the blood sample expiration rate by up to 7%. By integrating more information about the process environment, e.g., the availability of resources, the presented concept
can be extended. Further, techniques to ensure that batch adjustments do not lead to
inconsistencies should be developed. We will investigate this topic in the future.
References
1. van der Aalst, W., Barthelmess, P., Ellis, C., Wainer, J.: Proclets: A Framework for
Lightweight Interacting Workflow Processes. IJCIS 10(4), 443–481 (2001)
2. van der Aalst, W.M.P., ter Hofstede, A.H.M., Kiepuszewski, B., Barros, A.P.: Workflow Pat-
terns. Distributed and Parallel Databases 14(1), 5–51 (2003)
3. Activiti: Activiti BPM Platform, https://www.activiti.org/
4. Bonitasoft: Bonita Process Engine, https://www.bonitasoft.com/
5. Daum, M., Götz, M., Domaschka, J.: Integrating CEP and BPM: How CEP Realizes Func-
tional Requirements of BPM Applications (Industry Article). In: DEBS, pp. 157–166. ACM
(2012)
6. Dayal, U.: Active Database Management Systems. In: JCDKB, pp. 150–169 (1988)
7. Etzion, O., Niblett, P.: Event Processing in Action. Manning Publications Co. (2010)
8. University of Hamburg, Department of Computer Science: DESMO-J – A Framework for Discrete-Event Modeling and Simulation, http://desmoj.sourceforge.net/
9. Hermosillo, G., Seinturier, L., Duchien, L.: Using Complex Event Processing for Dynamic
Business Process Adaptation. In: SCC, pp. 466–473. IEEE (2010)
10. Herzberg, N., Meyer, A., Weske, M.: An Event Processing Platform for Business Process
Management. In: EDOC, pp. 107–116. IEEE (2013)
11. Herzberg, N., Weske, M.: Enriching Raw Events to Enable Process Intelligence - Research
Challenges. Tech. Rep. 73, HPI at the University of Potsdam (2013)
12. Knöpfel, A., Gröne, B., Tabeling, P.: Fundamental Modeling Concepts: Effective Communi-
cation of IT Systems. Wiley (2005)
13. Lanz, A., Reichert, M., Dadam, P.: Robust and flexible error handling in the aristaFlow BPM
suite. In: Soffer, P., Proper, E. (eds.) CAiSE Forum 2010. LNBIP, vol. 72, pp. 174–189.
Springer, Heidelberg (2011)
14. Liu, J., Hu, J.: Dynamic Batch Processing in Workflows: Model and Implementation. Future
Generation Computer Systems 23(3), 338–347 (2007)
15. Luckham, D.: The Power of Events. Addison-Wesley (2002)
16. Luckham, D., Schulte, R.: Event Processing Glossary - Version 2.0 (July 2011), http://
www.complexevents.com/wp-content/uploads/2011/08/EPTS_Event_
Processing_Glossary_v2.pdf
17. Méndez, C.A., Cerdá, J., Grossmann, I.E., Harjunkoski, I., Fahl, M.: State-of-the-art review
of optimization methods for short-term scheduling of batch processes. Computers & Chem-
ical Engineering 30(6), 913–946 (2006)
18. Meyer, A., Pufahl, L., Fahland, D., Weske, M.: Modeling and Enacting Complex Data De-
pendencies in Business Processes. In: Daniel, F., Wang, J., Weber, B. (eds.) BPM 2013.
LNCS, vol. 8094, pp. 171–186. Springer, Heidelberg (2013)
19. Motahari-Nezhad, H.R., Saint-Paul, R., Casati, F., Benatallah, B.: Event Correlation for Pro-
cess Discovery from Web Service Interaction Logs. VLDB Journal 20(3), 417–444 (2011)
20. Pufahl, L., Meyer, A., Weske, M.: Batch Regions: Process Instance Synchronization based
on Data. In: EDOC. IEEE (2014) (accepted for publication)
21. Pufahl, L., Weske, M.: Batch Activities in Process Modeling and Execution. In: Basu, S.,
Pautasso, C., Zhang, L., Fu, X. (eds.) ICSOC 2013. LNCS, vol. 8274, pp. 283–297. Springer,
Heidelberg (2013)
22. Reichert, M., Dadam, P.: Enabling Adaptive Process-aware Information Systems with
ADEPT2. In: Handbook of Research on Business Process Modeling, pp. 173–203. Infor-
mation Science Reference (2009)
23. Sadiq, S., Orlowska, M., Sadiq, W., Schulz, K.: When Workflows Will Not Deliver: The Case
of Contradicting Work Practice. BIS 1, 69–84 (2005)
24. Weidlich, M., Ziekow, H., Mendling, J., Günther, O., Weske, M., Desai, N.: Event-based
monitoring of process execution violations. In: Rinderle-Ma, S., Toumani, F., Wolf, K. (eds.)
BPM 2011. LNCS, vol. 6896, pp. 182–198. Springer, Heidelberg (2011)
25. Weske, M.: Business Process Management: Concepts, Languages, Architectures, 2nd edn. Springer (2012)
Automatic Generation of Optimized Workflow
for Distributed Computations on Large-Scale Matrices
1 Introduction
Cloud computing offers an attractive alternative to easily and quickly acquire IT
services such as storage and computation services. Its adoption continues to grow as
companies opt for flexibility, cost savings, performance and scalability. Cloud services
such as Elastic MapReduce offer an attractive platform for outsourcing the storage and
computations on large scale data because of their optimized algorithmic
implementations and access to on-demand large-scale resources. We focus particularly
on matrix algebra computations since they are used in many scientific domains, including but not limited to analysis of big data, image processing, computer graphics, information retrieval and data mining applications. The inputs are typically large-scale matrices, and performing mathematical operations (e.g. multiply, inverse, transpose, add/subtract, dot product…) on them can be long-running. In this paper, we consider the scenario where several cloud services offer matrix storage and basic matrix operations with different service characteristics. Based on availability, quality of
service (QoS), reliability, security and data locality, the optimal decomposition, task
2 Related Work
Service composition is closely related to workflow [9]; automatic workflow genera-
tion can be considered a subtask of automated web service composition. The latter
term is considered more general as it includes an extra step of the automatic service
discovery and selection from the set of available services. According to a survey of
automated web services composition [10], this can be done using workflow techniques
or AI planning. The workflow techniques can be further classified as either static or
dynamic [9]. The static techniques mean that the requester should build an abstract
process model before the composition planning starts. Only the selection and binding
to atomic web services is done automatically. On the other hand, the dynamic compo-
sition both creates the process model and selects atomic services automatically. This re-
quires the requester to specify several constraints, including the dependency of atomic
services, the user’s preference and so on. An example for a static workflow generation
approach was implemented in ASTRO project [11].
According to [11], one of the phases for the automatic composition of web services
is the translation between the external and internal languages used by the service
composition system. The external language is used by the service users to express
what they can offer or what they want in a relatively easy manner. For example,
BPMN (Business Process Modeling Notation) to BPEL translation is presented in [12]
where the designer uses BPMN graphical notations to easily describe the process
control flow and data flow and then it gets automatically translated to BPEL. This
work can also be considered static in the sense that BPMN is describing the con-
trol/data flow as input. Similar work was proposed in [13] but using XPDL (XML
Process Definition Language) which is a graph-structured language mainly used in
internal process modeling. However, in this work the generated outputs are abstract
BPEL processes that are not fully executable and deployable and they need some
manual editing to be ready for deployment. Also, it is stated in [12] that the approach cannot detect all pattern types and that the code produced by the transformation lacks readability.
Our approach for automatic workflow generation presented in this paper is consi-
dered dynamic in the sense that the workflow steps and the process model that describes
the control flow and data flow are not input by the requester but they are created auto-
matically according to the parsing of the input expression. Additionally the atomic ser-
vices used for computations are selected based on their functionality and QoS such as
accuracy, reliability, performance and security. We assume that developers/researchers
are using contract-based web service composition; and they are provided with the
WSDLs representing the interfaces of the available services and their characteristics.
Our proposed framework depends on the service-oriented architecture where large-scale
mathematical computations are offered as services and this differs from other distributed
execution engines like MapReduce [23] or DryadLINQ [24].
The main components of our proposed framework are depicted in Fig. 1. First, the
developer/researcher inputs the expression and the resources' references corresponding
to the aliases of the operands (i.e., the location where each operand is stored). A
configuration file specifies additional parameters such as the registry address where the
WSDLs of the services are stored. These WSDLs serve as the interface to the external
cloud services to be invoked or composed in the generated BPEL process. A parser
parses the input mathematical expression into an expression tree. An optimizer then
transforms the tree into a more consolidated form based on the data locality of operands and
identifies independent operations that can be done in parallel. The optimizer also
annotates the nodes of the tree based on their types (operands vs. operators). Then the
translator traverses the tree and maps the tree parts to corresponding BPEL activities.
Attributes of these activities like the partner link to the service to invoke, the values of
the input variables to this service and their types are initialized according to the
annotations set by the optimizer. The output of the translator is a BPEL process
accompanied by a deployment descriptor so that it can be deployed to a BPEL engine
for execution. In the next section we present formal definitions and explain in more
details the different steps of the automation process.
In our framework the service definitions are obtained from a local registry by
parsing the corresponding WSDL files.
Definition 2: [Operators] O is the set of predefined tokens representing unary and binary operations on matrices: addition, subtraction, multiplication, dot product, inverse (^) and transpose (′) of a matrix.
Definition 3: [Operands] L = {l1, l2, ...} is the set of input literals used in the input mathematical expression, where each li is an alias for a resource matrix with metadata (location, nRows, nCols, datatype). The (li, metadata) mapping tuples are stored in a hash map called LM.
Definition 4: [Expression Tree] T is the binary tree obtained from parsing the input string expression and is defined by T = (N, E), where N = {n1, n2, ...} ⊆ O ∪ L is the set of tree nodes and E = {(ni, nj), ...} represents the connections between the nodes, where (ni, nj) means ni is a parent of nj. The following conditions apply:
– The root is the only node with no parent.
– The leaf nodes must belong to L.
– Internal nodes belong to O; a hash map OS maps each operator node to the service offering this operation and selected to perform it according to data locality, concurrency considerations and QoS parameters.
– Each node has at most two direct children.
– Methods left(n) and right(n) get the left and right child of node n.
Given these definitions, we next discuss the expression-to-BPEL translation steps in more detail.
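Before turning to the translation, the following Java sketch summarizes the data structures behind Definitions 2–4; the names (Operator, OperandMeta, Leaf, OpNode) and the service endpoints are illustrative assumptions, not the authors' implementation:

import java.util.Map;

class ExpressionTreeSketch {

    enum Operator { ADD, SUBTRACT, MULTIPLY, DOT_PRODUCT, INVERSE, TRANSPOSE }

    /** Operand metadata as in Definition 3: (location, nRows, nCols, datatype). */
    record OperandMeta(String location, int nRows, int nCols, String datatype) {}

    /** A tree node is either an operand leaf or an operator node with at most two children. */
    sealed interface Node permits Leaf, OpNode {}
    record Leaf(String alias) implements Node {}
    record OpNode(Operator op, Node left, Node right /* null for unary operators */) implements Node {}

    /** LM: operand alias -> metadata; OS: operator node -> selected service endpoint. */
    static void example(Map<String, OperandMeta> lm, Map<OpNode, String> os) {
        lm.put("A", new OperandMeta("S1", 1000, 1000, "double"));
        lm.put("B", new OperandMeta("S2", 1000, 1000, "double"));
        // Expression tree for (A * B)': a transpose applied to a multiplication.
        OpNode mult = new OpNode(Operator.MULTIPLY, new Leaf("A"), new Leaf("B"));
        OpNode root = new OpNode(Operator.TRANSPOSE, mult, null);
        os.put(mult, "http://s1.example.org/MatrixMultiplyService");   // hypothetical endpoints
        os.put(root, "http://s2.example.org/MatrixTransposeService");
    }
}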
expression evaluation. The output of this step is a left-deep parse tree, an example is
shown in Fig. 2(a).
Matrix ID:  A          B          C        D      E         F          G
Size:       1000×1000  1000×1000  1000×1   1×100  100×1000  1000×1000  1000×1000
Location:   S1         S2         S2       S1     S1        S1         S2
and li, lj represent operand nodes. The rules map the expression tree parts to their equivalent BPEL constructs such as assign, invoke, receive, sequence, and flow. The BPEL Assign activity is used to exchange values between incoming and outgoing message variables. The Invoke activity performs the service invocation. The Receive activity receives an input message or a callback message. The Sequence activity groups activities to be executed in sequence. The Flow activity is used when different sequences are to be executed in parallel. Attributes of these activities, like the partner link to the service to invoke, the values of the input variables to the service and their types, are initialized according to the annotation values of the nodes (operands and operators, i.e., the LM and OS mappings).
The output of this transformation is a BPEL process saved to a “.bpel” file and a workflow interface description saved to a “.wsdl” file, because the workflow itself is deployed as a web service. A deployment descriptor saved to “deploy.xml” is also generated so that the workflow can be deployed to a BPEL engine for execution.
The translation algorithm of an expression tree T to executable BPEL code that
includes the BPEL constructs to be used and the control flow is shown in Fig. 4. The
algorithm is a post-order traversal for the expression tree T with the mapping rules
shown in Fig. 3 applied. The rule case (c) in Fig. 3(c) is considered the base case used for the recursive traversal, where the tree has an operator ox as a parent and its two children are operands li, lj (or only a left child li in case ox is a unary operator). In this case, the mapping is a Sequence activity that includes (assign, invoke, receive). The BPEL assign activity assigns input values to the variable used in the invocation. The invoke activity and then the callback receive activity obtain the information about the intermediate result location. The attributes of these activities are determined from the computation service definitions S and the OS mapping; OS(ox) is the selected service for operation ox. The LM mapping is used to get the metadata of the input matrices. Case (a) occurs when the two children are operators, which means that the services in these two paths can be executed in parallel. This corresponds to the BPEL Flow construct including two sequences for the mappings of the two children, where each child has its own scope. Case (b) occurs when one of the children is an operator and the other is a literal, which means that the mapping of ox and that of its operator child are combined in a Sequence activity. A flow stack is maintained so that during traversal
if case (a) is encountered a Flow activity is pushed into the stack and the two paths
are executed in parallel. The activity is popped out once its left and right children
return.
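The following Java sketch illustrates the post-order mapping with cases (a)–(c); it emits a nested textual description instead of real BPEL XML, and all names are illustrative assumptions rather than the authors' code:

class TreeToBpelSketch {

    interface Node {}
    record Leaf(String alias) implements Node {}
    record Op(String operator, Node left, Node right) implements Node {}   // right == null for unary ops

    /** Recursive post-order mapping; returns a textual stand-in for the BPEL fragment. */
    static String mapping(Op node) {
        boolean leftIsOp  = node.left()  instanceof Op;
        boolean rightIsOp = node.right() instanceof Op;

        if (!leftIsOp && !rightIsOp) {
            // Case (c), base case: both children (or the single child) are operands.
            return "sequence(assign, invoke[" + node.operator() + "], receive)";
        }
        if (leftIsOp && rightIsOp) {
            // Case (a): both children are operators -> a Flow of the two child mappings,
            // followed by the invocation for this operator.
            return "sequence(flow(" + mapping((Op) node.left()) + ", " + mapping((Op) node.right())
                 + "), assign, invoke[" + node.operator() + "], receive)";
        }
        // Case (b): one operator child and one literal -> Sequence of the child mapping
        // and the invocation for this operator.
        Op opChild = leftIsOp ? (Op) node.left() : (Op) node.right();
        return "sequence(" + mapping(opChild) + ", assign, invoke[" + node.operator() + "], receive)";
    }

    public static void main(String[] args) {
        // (A*B) + (C*D): the two multiplications can run in parallel (case (a)).
        Op expr = new Op("+",
                new Op("*", new Leaf("A"), new Leaf("B")),
                new Op("*", new Leaf("C"), new Leaf("D")));
        System.out.println(mapping(expr));
    }
}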
execute according to the minimum data transfer criterion. BPEL code generation is
done according to the algorithm in Fig. 4. We modified the Unify framework package [22] for generation and serialization of BPEL constructs. For testing, we used the MapReduce-based web services for matrix multiplication and addition operations from [1]. The input WSDLs are read and de-serialized using the wsdl4j library.
Fig. 3. Mapping expression tree patterns to the corresponding BPEL constructs where
Mapping(ox) is a recursive function with case (c) as the base case
The first experiment tests ten different expressions, available on the project page as a sample dataset, with the number of literals ranging from 4 to 10. The data locality optimization is not taken into consideration in this experiment, and it is assumed that the data matrices are stored on the same server offering these web services. Results are shown in Fig. 5, with an average speed-up (Tsequential/Tworkflow) of 1.8. From the results it is clear that the optimized workflow achieves better results for expressions with a larger number of literals and with operations that can be done in parallel.
Fig. 5. Optimized workflow execution time vs. the sequential execution time for 10 different
expressions
In the second experiment, we assume matrices are stored on different servers and, according to the data locality optimization step, a service is chosen to execute a certain operation in an expression tree so that it minimizes the data transfer between servers. We compare the data transfer time logged by the web services under test for the optimized workflow, with web services selected according to data locality, against random web service selection. Fig. 6 shows that for most of the expressions under test, the data transfer time is lower when web services are selected according to data locality (expression 8 has all its data stored on the same server, which is why no data transfer time is recorded). Some cases show no improvement; this depends on the heterogeneity of the distributed data.
6 Conclusion
For future work, we aim to incorporate QoS-based service selection. This feature
will allow selecting the most appropriate service among functionally-equivalent
computation services having the same score according to data locality and size of input
data but offering different QoS guarantees.
Fig. 6. Data transfer time taken by services selected according to data locality vs. random selec-
tion for different expressions
Acknowledgments. This publication was made possible by a grant from the Qatar
National Research Fund; award number NPRP 09-622-1-090. Its contents are solely
the responsibility of the authors and do not necessarily represent the official views of
the Qatar National Research Fund.
References
1. Nassar, M., Erradi, A., Sabri, F., Malluhi, Q.: Secure Outsourcing of Matrix Operations as
a Service. In: 6th IEEE International Conference on Cloud Computing, pp. 918–925. IEEE
Press (2013)
2. Web Services Business Process Execution Language v2.0, http://docs.oasis-
open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html
3. Van der Aalst, W.M.P., ter Hofstede, A.: YAWL: Yet Another Workflow Language.
Information Systems 30(4), 245–275 (2005)
4. Taverna Workflow Management System, http://www.taverna.org.uk/
5. Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludascher, B., Mock, S.: Kepler: an extensible system for design and execution of scientific workflows. In: Scientific and Statistical Database Management International Conference, pp. 423–424 (2004)
6. Sonntag, M., Karastoyanova, D., Deelman, E.: BPEL4Pegasus: Combining Business and
Scientific Workflows. In: Maglio, P.P., Weske, M., Yang, J., Fantinato, M. (eds.) ICSOC
2010. LNCS, vol. 6470, pp. 728–729. Springer, Heidelberg (2010)
7. Apache ODE: http://ode.apache.org/
8. WebSphere Application Server Enterprise Process Choreographer,
http://www.ibm.com/developerworks/websphere/
9. Dustdar, S., Schreiner, W.: A survey on web services composition. Journal of Web and
Grid Services 1(1), 1–30 (2005)
10. Rao, J., Su, X.: A Survey of Automated Web Service Composition Methods. In: Cardoso,
J., Sheth, A.P. (eds.) SWSWPC 2004. LNCS, vol. 3387, pp. 43–54. Springer, Heidelberg
(2005)
11. Trainotti, M., Pistore, M., Calabrese, G., Zacco, G., Lucchese, G., Barbon, F., Bertoli,
P.G., Traverso, P.: ASTRO: Supporting Composition and Execution of Web Services. In:
Benatallah, B., Casati, F., Traverso, P. (eds.) ICSOC 2005. LNCS, vol. 3826, pp. 495–501.
Springer, Heidelberg (2005)
12. Ouyang, C., Dumas, M., ter Hofstede, A.H.M., van der Aalst, W.M.P.: Pattern-based trans-
lation of BPMN process models to BPEL web services. International Journal of Web Ser-
vices Research 5(1), 42–62 (2007)
13. Yuan, P., Jin, H., Yuan, S., Cao, W., Jiang, L.: WFTXB: A Tool for Translating Between
XPDL and BPEL. In: 10th IEEE International Conference on High Performance Compu-
ting and Communications, pp. 647–652. IEEE Press (2008)
14. JEP (Java Expression Parser), http://www.singularsys.com/jep
15. Kastner, R., Hosangadi, A., Fallah, F.: Arithmetic Optimization Techniques for Hardware
and Software Design. Cambridge University Press, Cambridge (2010)
16. Bacon, D., Graham, S., Sharp, O.: Compiler Transformations for High-Performance Com-
puting. ACM Computing Surveys 26(4), 345–420 (1994)
17. Parr, T., Fisher, K.: LL(*): The Foundation of the ANTLR Parser Generator. In: Program-
ming Language Design and Implementation Conference (PLDI), pp. 425–436 (2011)
18. Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms, 3rd edn., pp.
370–377. MIT Press (2009)
19. Hameurlain, A.: Evolution of Query Optimization Methods: From Centralized Database
Systems to Data Grid Systems. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds.) DEXA
2009. LNCS, vol. 5690, pp. 460–470. Springer, Heidelberg (2009)
20. Evrendilke, C., Dogac, A., Nural, S., Ozcan, F.: Multidatabase query optimization. Journal
of Distributed and Parallel Databases 5(1), 77–114 (1997)
21. Zeng, L., Benatallah, B., Ngu, A.H.H., Dumas, M., Kalagnanam, J., Chang, H.: QoS-Aware
Middleware for Web Services Composition. IEEE Transactions On Software Engineer-
ing 30(5), 311–327 (2004)
22. Unify framework package, Software Languages Lab, Vrije Universiteit Brussel,
http://soft.vub.ac.be/svn-gen/unify/src/org/unify_framework/
23. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters.
Communications of the ACM - 50th anniversary issue 51(1), 107–113 (2008)
24. Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, U., Gunda, P.K., Currey, J.:
DryadLINQ: A system for general-purpose distributed data-parallel computing using a
high-level language. In: OSDI 2008 Proceedings of the 8th USENIX Symposium on Oper-
ating Systems Design and Implementation, pp. 1–14 (2008)
A Dynamic Service Composition Model
for Adaptive Systems in Mobile Computing
Environments
1 Introduction
Extensive use of mobile devices, coupled with advances in wireless technology like Wi-Fi Direct, increases the potential for shared-ownership applications for mobile ad hoc networks (MANETs) [6]. Devices can employ computational resources
in a network to accomplish not only data routing tasks but also a complex
user task with value-added services. A widely accepted mechanism to carry out
such user tasks is service-based applications (SBAs), in which complex tasks are
modelled as loosely-coupled networks of services. An SBA provides appropriate
functionalities to consumers by composing cooperating services.
Typical MANET environments are dynamic; mobile SBAs must be adaptable
to cope with potential changes in their dynamic operating environments (e.g.
topology changes, network disconnections or service failures). Centralized service
management for traditional adaptive systems is not applicable to MANETs, as
device mobility is likely to be unpredictable, with devices joining and leaving the
network at any time. There is, therefore, no guarantee that a suitably resource-
rich central node will be available for the duration of a complex service provision.
which are drawn from an online NAICS2 and Restaurant ontology3. The SSON
layer is built over the service layer, linking different services based on their
semantic similarity and dependency. Services are semantically similar if they
provide similar functionalities or they require similar input data and produce
outputs with similar types. Two services are semantically dependent if one’s
output has the same semantic type as the other’s input. In other words, it is
possible for one to use the other's result to execute its own operations. The highest
layer is a user task specification layer where a set of composition requirements
are defined as a task specification, with the SSON layer aggregating appropriate
services for the task specified, as described in Section 4.
For the service model, we describe a service S = <Sid, IN, OUT>, consisting of a service identification Sid and two sets capturing I/O parameters. Semantic annotations for a service are defined in a Service Annotation Profile AP = <Sid, AF, AIN, AOUT>, where AF, AIN, and AOUT are ontology concepts mapped to corresponding functionalities and I/O parameters in a service description. APs inform service compositions about services' logical semantic interfaces, and can be described by semantically rich data models, such as OWL-S and SAWSDL.
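A minimal Java sketch of these two tuples, and of the dependency notion described above, might look as follows (illustrative types only; the actual profiles would be OWL-S or SAWSDL documents):

import java.util.Set;

class ServiceModelSketch {
    /** S = <Sid, IN, OUT> */
    record Service(String sid, Set<String> in, Set<String> out) {}

    /** AP = <Sid, AF, AIN, AOUT>, annotations given as ontology concept identifiers. */
    record AnnotationProfile(String sid, Set<String> af, Set<String> ain, Set<String> aout) {}

    /** A is semantically dependent on B if one of B's output concepts matches one of A's input concepts. */
    static boolean dependsOn(AnnotationProfile a, AnnotationProfile b) {
        return b.aout().stream().anyMatch(a.ain()::contains);
    }
}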
content-based search. The core notion of a SON is to group similar peers, making query processes discover groups instead of individual peers for faster locating.
Current research on facilitating distributed service discovery with SONs, like
DHT-SON [18] and SDSD [4], shows a SON can be structured at comparable
cost to that of producing a normal network by using probe messages.
This proposed SSON supports service composition by introducing the idea of
I/O dependencies from service overlay networks [13], [23] which create links be-
tween service providers by matching I/O parameters. A SSON combines service
overlay networks with normal SON activities. I/O parameters-dependent links
can be built as a byproduct of general SONs. A SSON is structured by link-
ing semantically-related services, and the relationship between services can be
classified as similarity or dependency. The former represents two services sharing similar functionalities, like those defined in traditional SONs; the latter indicates
two services with potential I/O data dependencies. Semantic links rely on match-
making techniques to be established.
where AP1 = <S1, AF1, AIN1, AOUT1> and AP2 = <S2, AF2, AIN2, AOUT2>. The matching function between two APs can then be defined using Equation 1:
– R0 same: A and B provide the same functionalities and ask for the same
sets of input data. (e.g. A: Get Location, B: Get Address)
– R1 reqI : A provides the same functionalities as B, but asks for additional input data with respect to that of B. (e.g. A: Get Location, B: Get Address
by Name and Phone Number)
– R2 share: A and B have shared functionalities. (e.g. A: Navigator, B:
Route Planner)
– R3 dep: A depends on B’s execution result. (e.g. A: Navigator, B: Get Lo-
cation)
– R4 in: A can provide input data to B. (e.g. A: Navigator, B: Route Render)
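As an illustration, the following self-contained Java sketch classifies the relation between two annotated services into the ranks listed above; exact concept-set comparison stands in for the ontology-based matching of Equation 1 and all names are assumptions for the example:

import java.util.Set;

class LinkRankSketch {

    enum Rank { R0_SAME, R1_REQI, R2_SHARE, R3_DEP, R4_IN, NONE }

    static Rank classify(Set<String> funcA, Set<String> inA, Set<String> outA,
                         Set<String> funcB, Set<String> inB, Set<String> outB) {
        boolean sameFunc    = funcA.equals(funcB);
        boolean sharedFunc  = funcA.stream().anyMatch(funcB::contains);
        boolean sameInputs  = inA.equals(inB);
        boolean extraInputs = inA.containsAll(inB) && !sameInputs;

        if (sameFunc && sameInputs)  return Rank.R0_SAME;   // same functionality, same inputs
        if (sameFunc && extraInputs) return Rank.R1_REQI;   // same functionality, A asks for more inputs
        if (sharedFunc)              return Rank.R2_SHARE;  // overlapping functionalities
        if (outB.stream().anyMatch(inA::contains)) return Rank.R3_DEP; // A depends on B's result
        if (outA.stream().anyMatch(inB::contains)) return Rank.R4_IN;  // A can provide input to B
        return Rank.NONE;
    }

    public static void main(String[] args) {
        // A: Navigator (needs a Location), B: Get_Location (produces a Location) -> R3_dep
        Rank r = classify(Set.of("Navigate"), Set.of("Location"), Set.of("Route"),
                          Set.of("Locate"),   Set.of("Address"),  Set.of("Location"));
        System.out.println(r);   // R3_DEP
    }
}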
successfully matched, the node becomes a composer. This new composer then
stores the received request, eliminates the matched operation of the request and
sends out a new request. It also draws up template information for creating con-
figuration fragments. Therefore, the number of operations in a request reduces
hop by hop. If the eliminated operation is the last operation of a workflow branch
in the request, the composer sends a token to the initiator.
A node that can only provide a partial service for an operation has a chance to provide a full service for it by combining with other peers in the network. To this end, we take into account two situations and apply two corresponding strategies (Line 8 and Line 13 in Algorithm 1 (b)). These strategies allow
composers to adapt the abstract workflow on-the-fly. To illustrate such adap-
tation in our backward planning process, we present a brief example scenario:
finding a restaurant and routing to it. Table 1 shows the operations defined in the
composition request and the original abstract workflow. It also illustrates avail-
able service providers in the example scenario. We assume there is no provider
in the network that can work alone to serve operation oC .
Traditional service composition techniques start service executions only after ser-
vice binding has completed. The composition mechanism in this paper combines
the service binding phase and the execution phase. Our previous work on op-
portunistic service composition illustrates a distributed execution model [10], [9]
that allows systems to bind one service provider, directly execute its provided services, and then forward the remainder of the composition request on to the next node. The bound provider then waits for other providers to reply with messages that include their service functionality information. We apply the distributed execution model in this work with some extensions to the discovery mechanisms. Instead of forwarding the rest of the request, the bound provider (composer) selects the best matched CF, sending its execution results to the nodes in the S_Post of the CF.
This paper provides dynamic adaptation mechanisms for systems. Global adap-
tation is realized by selecting adaptable local CFs hop-by-hop during service exe-
cution. The CFs of a composer can be adapted during composition planning phases
and service execution phases. Such adaptation is modelled by a MAPE (Monitor-
Analyze-Plan-Execute) loop, as shown in Fig 4(b). Composers can monitor adap-
tation trigger events with SSON, analyze these events when they appear and assign
Fig. 4. (a) An example of the CF for Provider C (Table 1), (b) CFs Adaptation (SN:
Semantic Neighbours)
composer leaves the physical network, the absence of the node can be detected through the management of the SSON. The composer removes all the CFs that
contain the absent node. If a composer resigns its role, it sends a message to
all the PreCN and PostCN nodes, asking them to adapt their CFs by removing
invalid CFs. A node can also join the network and engage in the composition at
runtime. If a composer finds a new node in its semantic neighbour list, this new
node can participate in the composition in different ways depending on the rank
of the semantic link established between it and the composer (See Table 2).
5 Evaluation
6 Related Work
Workflow-based adaptive systems [21] choose, or implement services from a pre-
defined abstract workflow that determines the structure of services. The abstract
workflow is implemented as a concrete (executable) workflow by selecting and
composing requested services. Adaptation of concrete workflows has been ex-
plored in the literature [1], [3]. However, these require central entities for com-
position and an explicit static abstract workflow, which is usually created manu-
ally. Decentralized process management approaches [24], [25] explore distributed
mechanisms, like process migration, to manage and update the execution of con-
crete workflows, which is close to our work in terms of service execution. However,
they still need a well-defined system business process at deployment time. In our approach, the partial workflows that composers generate locally are distributed over participating service providers during the service discovery phases to gradually devise a global one.
Dynamic service composition can also be reduced to an AI planning problem.
Similar to our solution, decentralized planning approaches [7], [11], [20] form a
global plan through merging fragments of plans that are created by individual
service agents. However, with these approaches, programmers need to provide
an explicitly defined goal for planning. The initial plan can become unreliable
when the environment changes. Automatic re-planning schemes [12], [14], [17]
allow plans to be adapted when matching services are unavailable, but existing
approaches depend on central knowledge bases.
Considerable research effort has targeted dynamic service compositions supporting one-to-M (M > 1) matching when a single matching basic service cannot be located. They usually define a composition result as a directed acyclic graph [8],
[13]. The nodes in a DAG represent services, and the edges show compositions
between the collaborating services. Service composition is modelled as the problem of finding a shortest path (or the one with the lowest cost) between two services in the DAG. However, existing work has limited support for services with multiple I/O parameters. In addition, creating such a DAG requires the aggregation of service specifications from a central registry. Al-Oqily [2] proposed a decentralized reasoning system for one-to-M (M > 1) matching, which is the work closest to ours. It composes services using a Service-Specific Overlay Network built over P2P
networks and enables self-organizing through management of the network. How-
ever, this approach is based on an assumption that every node in the network
knows its geographic location, as service discovery is realized by broadcasting
a request over its physical neighbours. Geographic locations usually can be ob-
tained from location services like GIS, but these are not readily accessible for
every node in the network. Our approach uses a semantic-based overlay network
to discover logical neighbours instead of geographic ones.
References
1. Adams, M., ter Hofstede, A.H.M., Edmond, D., van der Aalst, W.M.P.: Worklets:
A service-oriented implementation of dynamic flexibility in workflows. In: Proc.
Int. Conf. on On the Move to Meaningful Internet Systems (2006)
2. Al-Oqily, I., Karmouch, A.: A Decentralized Self-Organizing Service Composition
for Autonomic Entities. ACM Trans. Auton. and Adapt. Syst. (2011)
3. Ardissono, L., Furnari, R., Goy, A., Petrone, G., Segnan, M.: Context-aware work-
flow management. In: Proc. 7th Int. Conf. ICWE (2007)
Optimal Transactional WS Composition with Dependency Graph
1 Introduction
As explained in surveys [1,2], the management of a large number of services in the global Internet creates many open problems, in particular in service composition, which consists in selecting/identifying several existing services and combining them into a composite one to produce a value-added process.
Many approaches on QoS-aware web service (WS) composition exist, where
QoS represents the quality of the service (e.g. price or response time) – see
for example survey [3]. As explained in [4], the inter-operation of distributed
software-systems is always affected by failures, dynamic changes, availability
of resources, and other factors. As argued in [5], to make service-oriented applications more reliable, web services must be examined from a transactional
perspective. The execution of a composite WS is reliable if, in case of a compo-
nent WS failure, the negative impacts are negligible for the user [6]. A service
2 Related Work
be huge since it is proportional to the number of WS and data times the number of stages. Moreover, the number of stages is not known; only upper bounds can be chosen (the worst one is to set the number of stages equal to the number of WS). The size of the model does not allow large test sets to be solved: no experimental results are given in [19], while in [12] computational experiments take 200 seconds to find the solution for 20 WS and 200 data, and 900 seconds for 100 WS and 800 data.
Some approaches extend QoS-aware composition to transactional and QoS-
based approaches (see for example survey [20]). However, to the best of our
knowledge, the approach of [7] is the only one proposing a WS composition al-
gorithm based on service dependency graph integrating both transactional and
QoS requirements. In this approach, the dependency graph is represented by a
colored Petri Net, where places correspond to the data and transitions to the
WS. The proposed algorithm is a greedy-like algorithm locally optimizing the
QoS. In order to limit the execution time, the authors proposed to identify the
WS which are potentially useful to answer the user query. This identification
consists in selecting the transactional paths in the dependency graph, that allow
to obtain an output data needed by the user from the inputs of the query. The
greedy-algorithm then consists in selecting the solution from a smaller depen-
dency graph, only containing the WS which are potentially useful or relevant to
answer the user query.
In this article, the service repository (i.e. the set of available services) is repre-
sented by a directed graph G = (X, U ). The set of vertices X can be partitioned
in two sets: S the set of vertices representing WS and, D the set of vertices
representing data. In the following, for all i ∈ S, let us denote by s(i) the WS
represented by vertex i and, for all i ∈ D, d(i) denotes the data represented by
vertex i. The set of directed edges U represents two kinds of dependency: (1)
an edge from i ∈ S to j ∈ D represents the fact that WS s(i) produces data
d(j) (d(j) is an output of s(i)), (2) an edge from i ∈ D to j ∈ S represents the
fact that WS s(j) needs data d(i) to be executed (d(i) is one input of s(j)). Thus, in this graph representation, there does not exist any directed edge of the form (i, j) with i ∈ S and j ∈ S or with i ∈ D and j ∈ D. Such a graph, generally
called a Service Dependency Graph, is used in [7,10,11,12]. An example of SDG
is presented in Fig. 1.
The user query is defined by a set I of input data (with I ⊂ D) corresponding
to the information that the user provides, and a set O of output data represent-
ing the information the user needs (with O ⊂ D). Such query is also used in
[7,10,11,12] for example.
A composite WS (CWS) satisfying the user query, characterized by I and O,
can be represented by a connected sub-graph if and only if: (a) each o ∈ O is
covered by the sub-graph, (b) in this sub-graph, the only vertices without any
predecessor belong to I, (c) if a vertex i ∈ S is covered by the sub-graph, then
all arcs (j, i), with j ∈ D, belong to the sub-graph (indeed each WS s(i) can be
executed if and only if all its input data are available) and, (d) this sub-graph
does not contain any directed cycle.
For example, given the graph of Fig. 1 and the query described by I = {1, 2}
and O = {7, 8}, we can propose different CWS. The CWS {s(16), s(18), s(15)}
is represented by the following sub-graph: {(1, 16), (16, 3), (3, 18), (18, 6), (6, 15),
(15, 7), (15, 8)}. Let us remark that CWS {s(11), s(13), s(15)} is not feasible since
it contains the following conflicting situation: to be executed, s(11) needs d(6)
as input, and d(6) is obtained by executing s(13) which input d(4) is produced
by s(11). In terms of graph, the associated sub-graph {(2, 11), (11, 4), (4, 13),
(13, 6), (6, 11), (6, 15), (15, 7), (15, 8)} satisfies the aforementioned properties (a),
(b) and (c) but does not verify property (d): (11, 4), (4, 13), (13, 6), (6, 11) is a
directed cycle.
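The feasibility conditions (a)–(d) can be checked mechanically. The following Java sketch does so for a candidate sub-graph given as a set of selected arcs; all data structures are illustrative assumptions (vertex ids as in Fig. 1), not part of the optimization model itself:

import java.util.*;

class CwsFeasibilitySketch {

    /** edges: selected arcs (i -> j); wsInputs: for each WS vertex, the data vertices it requires;
     *  I and O: the query inputs and outputs. */
    static boolean isFeasible(Set<int[]> edges, Map<Integer, Set<Integer>> wsInputs,
                              Set<Integer> I, Set<Integer> O) {
        Map<Integer, Set<Integer>> succ = new HashMap<>();
        Set<Integer> covered = new HashSet<>(), hasPred = new HashSet<>();
        for (int[] e : edges) {
            succ.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            covered.add(e[0]); covered.add(e[1]); hasPred.add(e[1]);
        }
        // (a) every requested output is covered by the sub-graph
        if (!covered.containsAll(O)) return false;
        // (b) only query inputs may lack a predecessor
        for (int v : covered) if (!hasPred.contains(v) && !I.contains(v)) return false;
        // (c) a covered WS vertex must have all of its input-data arcs selected
        for (var entry : wsInputs.entrySet())
            if (covered.contains(entry.getKey()))
                for (int d : entry.getValue())
                    if (!edges.stream().anyMatch(e -> e[0] == d && e[1] == entry.getKey())) return false;
        // (d) no directed cycle: Kahn-style elimination over the covered vertices
        Map<Integer, Integer> indeg = new HashMap<>();
        for (int v : covered) indeg.put(v, 0);
        for (int[] e : edges) indeg.merge(e[1], 1, Integer::sum);
        Deque<Integer> queue = new ArrayDeque<>();
        indeg.forEach((v, deg) -> { if (deg == 0) queue.add(v); });
        int visited = 0;
        while (!queue.isEmpty()) {
            int v = queue.poll(); visited++;
            for (int w : succ.getOrDefault(v, Set.of()))
                if (indeg.merge(w, -1, Integer::sum) == 0) queue.add(w);
        }
        return visited == covered.size();
    }
}

Applied to the infeasible example above, condition (d) fails because the arcs (11, 4), (4, 13), (13, 6), (6, 11) form a directed cycle.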
Given a user query, our problem consists in finding a reliable CWS that op-
timizes its overall QoS.
With such a constraint, if xjk equals 1, all directed edges entering vertex j must belong to the solution (ensuring that all input data for s(j) are available). Otherwise, if xjk equals 0, the constraint is relaxed and plays no role.
Considering the graph of Fig. 1, s(11) produces data d(1) and d(4) and needs
data d(2) and d(6) to be executed. Therefore, two constraints must be written for
describing inputs and outputs of s(11): (i) considering output d(1), x2,11 +x6,11 ≥
2x11,1 and, (ii) considering output d(4), x2,11 + x6,11 ≥ 2x11,4 . These constraints
imply that d(1) or d(4) can be computed by s(11) if and only if d(2) and d(6)
are available as inputs for s(11).
For any data d(j) not provided by the user, j ∈ D \ I, d(j) is available when
at least one WS computes it. In the associated graph, the set of WS computing
d(j) exactly corresponds to Γ − (j) inducing the following constraint:
∀j ∈ D \ I, ∀k ∈ Γ+(j): Σ_{i∈Γ−(j)} xij ≥ xjk (C2)
This constraint imposes that d(j) can be used by s(k) (inducing that variable xjk is equal to 1) if and only if d(j) has been computed by at least one WS s(i) (leading to Σ_{i∈Γ−(j)} xij ≥ 1). When xjk is equal to 0, the constraint plays no
role.
Considering the graph of Fig. 1, d(6) is an output of three WS s(13), s(14)
and s(18), and is an input of three WS s(11), s(15) and s(17). Thus, three
constraints must be written: (i) As input of s(11): x13,6 + x14,6 + x18,6 ≥ x6,11 ,
(ii) as input of s(15): x13,6 + x14,6 + x18,6 ≥ x6,15 and, (iii) as input of s(17):
x13,6 + x14,6 + x18,6 ≥ x6,17 . Each constraint imposes that d(6) is available if and
only if it has been computed by s(13), s(14) and/or s(18).
Each data j ∈ O, needed by the user, must be computed by at least one WS:
∀j ∈ O: Σ_{i∈Γ−(j)} xij ≥ 1 (C3)
Given the graph of Fig. 1, we consider a user query with O = {7, 8}. d(7)
can be computed by s(14) or s(15), and d(8) by s(15) only. Then, we have:
x14,7 + x15,7 ≥ 1 and x15,8 ≥ 1.
Any data d(j) provided by the user is available, meaning that ∀j ∈ I, ∀k ∈
Γ + (j), arc (j, k) can belong to the resulting sub-graph (xjk ≤ 1). Therefore,
input data provided by the user do not introduce any specific constraints in the
model.
This constraint imposes that wj equals 1 if at least one directed edge with initial vertex j belongs to the solution. On the contrary, when no edge of the form (j, k) belongs to the solution, the constraint plays no role since it becomes |Γ+(j)| wj ≥ 0. In this case, wj can be equal to 1 even if the x values correspond to a sub-graph which does not cover the vertex j. However, the value of such a solution is strictly greater than the value of the solution with the same x values and wj = 0. Recalling that the objective function is to minimize Σ_{i∈S} qi wi (qi > 0 ∀i), such a solution cannot be an optimal one.
The last family of constraints are introduced for eliminating directed cycle in
the solution. These constraints are classical (initially proposed in [21]) and are
written as follows:
tj − ti ≥ 1 − |X|(1 − xij ) ∀(i, j) ∈ U (C5 )
ti = 0 ∀i ∈ I (C6 )
0 ≤ ti ≤ |X| − 1 ∀i = 1, . . . , |X|
For all i ∈ X, variable ti represents the topological order of vertex i in the sub-
graph. If xij equals 1, arc (i, j) belongs to the sub-graph and constraint (C5) becomes tj − ti ≥ 1. This constraint imposes that WS s(i) is executed (or data d(i) is obtained) before producing data d(j) (or executing service s(j)). If xij equals 0, constraint (C5) becomes tj − ti ≥ (1 − |X|). Since ti belongs to [0, |X| − 1], constraint (C5) plays no role even if ti equals (|X| − 1).
In graph of Fig. 1, we have previously noticed that CWS {s(11), s(13), s(15)}
is not a feasible solution. This CWS is described by variables x11,4 = x4,13 =
x13,6 = x6,11 = 1 and constraints (C5 ) lead to an unfeasibility: (i) t4 − t11 ≥ 1,
(ii) t13 − t4 ≥ 1, (iii) t6 − t13 ≥ 1 and, (iv) t11 − t6 ≥ 1 =⇒ 0 ≥ 4.
if, once it successfully completes, its effects remain forever and cannot be se-
mantically undone. If it fails, then it has no effect at all. For example, a service
delivering a non refundable and non exchangeable plane ticket is pivot. A WS
is compensatable (C) if it exists another WS, or compensation policies, which
can semantically undo its execution. For example, a service allowing to reserve
a room in a hotel with extra fees in case of cancellation is compensatable. A
WS is retriable if it guarantees a successful termination after a finite number
of invocations. This property is always combined with the two previous one,
defining pivot retriable (P R) or compensatable retriable (CR) WS. For example,
a payment service may be (pivot or compensatable) retriable in order to guar-
antee that the payment succeed. The authors of [15] propose transactional rules
defining the possible combinations of component WS to obtain a reliable (i.e.
a transactional) composite WS. These rules are summarized in Table 1 where
the second and the third columns represent the transactional properties of WS
which are incompatible in a composition with a WS of transactional property of
column 1. For more details about these rules, reader must refer to [15].
Considering the graph G = (X, U) associated with the composition problem, the transactional property of each WS induces a partition of the subset of vertices S ⊂ X as follows: S = P ∪ C ∪ PR ∪ CR (each subset is denoted by the
corresponding transactional property). In the graph of Fig. 1, we set the following
partition: P = {12, 16, 17}, C = {10, 13}, P R = {11, 18}, CR = {14, 15}.
Considering a particular query with given inputs and expected outputs, intro-
ducing transactional rules implies that some paths, belonging to the sub-graph,
going from an input to an expected output are eliminated (for example paths
containing two vertices belonging to P – see line 1 in Table 1). Thus, in terms
of our 0-1 linear program, some additional constraints must be introduced in
order to eliminate the solutions which do not respect transactional rules. These
additional constraints of linear form are presented in the following subsection.
∀i ∈ P, ti − tj ≥ −|X|(1 − wi ) ∀j ∈ Ai ∩ C (C8 )
∀i ∈ P R, ti − tj ≥ −|X|(1 − wi ) ∀j ∈ Ai ∩ (P ∪ C) (C10 )
In the following section, we compare our model to the two main related ones:
the linear-programming model of [12] and the approximate approach of [7].
6 Experimental Results
The objectives of our experiments are: without transactional requirements, (i) to
compare our model with another recent model based on 0-1 linear programming
proposed in [12], and (ii) to test our model on the well-known WS composition
benchmark of WS-Challenge 2009 [23], and, with transactional properties, (iii)
to measure the difficulty induced by transactional requirements, and (iv) to
compare the optimal solution given by our model to the feasible solution obtained
with the approximate algorithm proposed in [7].
associated with each WS. On each WS repository, 10 user queries are randomly
generated by varying the number of inputs and the number of outputs between
1 and 3 and by randomly generating the QoS score of each WS. The second test
set corresponds to the 5 data sets of WS-Challenge1 2009 containing 500, 4000,
8000 and 15000 WS (described by their response time and throughput QoS
values) with respectively 1500, 10000, 15000 or 25000 data.
Each 0-1 linear programming problem is solved with the CPLEX solver, which uses a branch-and-bound algorithm to search for the optimal solution. We limit the computation time to 3600 seconds for the first test set and to 300 seconds (the time limit given by the WS-Challenge) for the second one. Thus, two situations can occur: either CPLEX solves the problem in time and, if it exists, the optimal solution is found (otherwise the absence of a solution is proved), or CPLEX is interrupted by the timeout. In this last case, either a solution is proposed but its status is unknown (the algorithm cannot prove that this solution is the optimal one because it does not have enough time to explore the whole feasible solution set), or no solution is found in time (even if one exists).
Table 2. Description of the first test set and LP-based model comparison
We first compare our 0-1 linear model, denoted Pnew , to the one published in
[12]. Results are presented in Lines 4 to 7 and in Line 9 of Table 2. In this table,
the ratio r (line 9) is equal to the average value (over the 10 queries for a given WS repository) of the ratio of the computational time taken by CPLEX to solve the model of [12] to the time taken to solve Pnew.
Let us recall that in Pnew , the variables represent WS execution order,
input/output WS and data availability. The 0-1 linear program sizes are reason-
able since the numbers of variables and constraints only depend on the depen-
dency graph size (number of vertices and edges). In [12], the model is based on a
decomposition in stages (we choose to set a number of stages equal to 10 for all
1
Data sets available at http://www.it-weise.de/documents/files/wsc05-09.zip
experiments) and program sizes are much greater since the number of variables
and constraints depends on the dependency graph size times the number of stages
(number of constraints and variables are presented in lines 4 to 7 of Table 2).
Consequently, an optimal solution is found on average 3 times faster with Pnew. Moreover, considering the big test sets with 1000 WS and 100 data, CPLEX solves Pnew at optimality for all 10 queries (in 266s on average), while this is not the case for the model presented in [12]. More precisely, for 2 queries, the model of [12] finds the optimal solution but cannot prove its optimality within 3600s, and for one query it computes a feasible solution with a greater objective function value. For the 7 queries for which both models can find the optimal solution, the problem is solved 7.4 times faster with Pnew.
These experimental results show that our 0-1 linear model is the more efficient one: it finds the optimal solution for all but 2 of the 100 considered queries.
Table 3. Experiments of our model on the WS-Challenge 2009 (WSC) test sets

WSC test set          1 (500 WS)  2 (4000 WS)  3 (8000 WS)  4 (8000 WS)  5 (15000 WS)
To find optimal sol.  0.35s       3.46s        4.23s        22s          27s
To prove optimality   6.65s       5.8s         6.7s         > 300s       > 300s
We also applied our model on the test sets of WS-Challenge 2009 [23], slightly
adapting it by modifying the objective function in order to optimize the response
time. The transformations are: (1) adding a fictitious vertex f and the fictitious
arcs (i, f ), ∀i ∈ O, (2) replacing the objective function by min tf , (3) modifying
constraints (C5) as follows: tj − ti ≥ di − T(1 − xij), ∀(i, j) ∈ U, with di the
response time of each WS i (i ∈ 1, . . . , |X|) and T an upper bound, and (4)
deleting variables wj and their corresponding constraints C4 (since they are no
more necessary). Our model finds the optimal solution for all the 5 test sets in less
than 5 minutes (the timeout fixed by the WS-Challenge) – see Table 3. The optimality of the solution cannot be proved in 300s for the last two sets. However, the optimal solution found for the biggest set is better than the solution proposed in WS-Challenge 2009. In terms of quality of the solutions, our model is therefore comparable to the recent related approaches [8,11,17] using the WS-Challenge data for their experiments. Note that, in this article, we do not clean the dependency graph by keeping only the services relevant for the query and discarding the rest, as done in the aforementioned approaches. Even so, our model always finds the optimal solution, while another approach [17] cannot always find it for the biggest data set without any cleaning process. Moreover, as shown
in Section 5.2 and in the next section for experiments, our model can be extended
to take into account transactional properties.
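For readability, the response-time variant described in points (1)-(4) above can be summarized as follows; this is only a restatement of those transformations (with T the upper bound and di the response time of vertex i), not an additional model:

\min\; t_f \qquad \text{s.t.} \qquad t_j - t_i \;\ge\; d_i - T\,(1 - x_{ij}) \quad \forall (i,j) \in U, \qquad t_i = 0 \quad \forall i \in I,

where U is extended with the fictitious arcs (i, f) for i ∈ O, and the availability constraints (C2) and (C3) are kept unchanged (the wj variables and (C4) being dropped).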
Table 4. Comparison between our model and the approximate approach of [7]
Secondly, we compare our model with the approximate approach of [7]. When
we compute solutions with this approximate algorithm, two possible results can
be provided: either a solution is proposed (its solution status is unknown), or
no solution is proposed (even if a solution exists). Comparison between our
model and the approximate approach is presented in Table 4. Line 2 contains
the average computation time (in seconds) taken by CPLEX to solve queries at optimality (the number of these solved queries - at most 10 - is given in parentheses in Line 3). Among the queries unsolved at optimality in 3600s, PnewT may find a feasible solution or may not find any solution (see Line 4). Lines 5 and 6 show the results obtained with the approximate algorithm. Line 5 corresponds to the average computation time to compute a feasible solution (the number of queries for which the approximate algorithm is able to find a solution is given in parentheses in Line 6). Line 7 presents the average approximation ratio (approximate
solution value / optimal solution value). Our experimental results show that, for a large majority of queries, our approach based on the CPLEX branch-and-bound algorithm computes an optimal solution more rapidly than the approximate algorithm. In only one case out of more than 100 does the approximate algorithm find a better solution. The queries on non-sparse SDG (R7 to R11) are easier to solve at optimality. For the R7 and R8 data sets, computation times are very small: all optimal solutions are found by PnewT in less than 1 second. Queries on sparse SDG with 20 data (R1 to R5) are much more difficult to solve; the computation times are significant even if, on average, PnewT computes the optimal solution more rapidly (for R5, 9 queries are solved at optimality with an average computation time of 140s, while the approximate algorithm needs 786s to find a feasible solution whose value equals, on average, 3.3 times the optimal value).
Computation times vary a lot with the query. For example, for R2, with an average computation time of 55.6s, the hardest query needs 412s to determine the optimal solution, while 5 queries take less than 3s and 2 queries around 10s;
the 2 remaining queries are not solved at optimality: for one query, PnewT finds a better solution than the one computed by the approximate algorithm (the value is 15% better), and for the other query, PnewT finds a feasible solution while the approximate algorithm cannot provide any solution. When no solution can be found by either algorithm, we cannot conclude that no solution exists: either the approximate algorithm cannot find a feasible solution (this often occurs for queries of R7 to R11), or CPLEX does not have enough time to find a feasible solution. If no time limit were imposed, PnewT should be able to find an optimal solution.
Finally, we experimented with our model on two WS repositories containing 1000 WS (data sets R6 and R12). R6 contains 20 data, and PnewT takes 409s on average to compute the optimal solution of 8 queries. For the two remaining queries, because of the timeout, PnewT only finds a feasible solution for one and cannot find any solution for the other. R12 contains 100 data, and PnewT takes 525s on average to compute the optimal solution of only 3 queries. Because of the timeout, it cannot find any solution for 4 queries and finds only a feasible solution for the 3 remaining ones. With this problem size, the model becomes hard to solve at optimality with CPLEX.
7 Conclusion
In this article, we present a 0-1 linear program for automatically determining, from a service dependency graph, a transactional composite WS optimizing QoS. With our model, the QoS- and transactional-aware composition problem can be solved at optimality which, to the best of our knowledge, is achieved here for the first time. Through extensive experimental results, we show that our model dominates a recently published one [12], also based on linear programming, for solving the QoS-aware composition problem without transactional requirements. Our model also finds all the optimal solutions for the well-known service composition benchmark of WS-Challenge 2009. Then, we compare our approach with the only related one including transactional requirements [7], which is an approximate approach. Experimental results show that, when an optimal solution exists, our model can generally find it faster than the related work. However, for large test sets, a standard solver like CPLEX takes too long to find the optimal solution. Specific resolution methods should be proposed to solve such 0-1 linear programming models.
This topic will be the focus of our future research.
References
1. Issarny, V., Georgantas, N., Hachem, S., Zarras, A., et al.: Service-oriented middle-
ware for the Future Internet: state of the art and research directions. J. of Internet
Services and App. 2(1), 23–45 (2011)
2. Dustdar, S., Pichler, R., Savenkov, V., Truong, H.L.: Quality-aware Service-
oriented Data Integration: Requirements, State of the Art and Open Challenges.
SIGMOD Rec. 41(1), 11–19 (2012)
3. Strunk, A.: QoS-Aware Service Composition: A Survey. In: IEEE ECOWS, pp.
67–74 (2010)
4. Liu, A., Li, Q., Huang, L., Xiao, M.: FACTS: A Framework for Fault-Tolerant
Composition of Transactional Web Service. IEEE Trans. on Serv. Comp. 3(1),
46–59 (2010)
5. Badr, Y., Benslimane, D., Maamar, Z., Liu, L.: Guest Editorial: Special Section on
Transactional Web Services. IEEE Trans. on Serv. Comp. 3(1), 30–31 (2010)
6. Gabrel, V., Manouvrier, M., Megdiche, I., Murat, C.: A new 0-1 linear program for
QoS and transactional-aware web service composition. In: IEEE ISCC, pp. 845–850
(2012)
7. Cardinale, Y., Haddad, J.E., Manouvrier, M., Rukoz, M.: CPN-TWS: a coloured
petri-net approach for transactional-QoS driven Web Service composition. Int. J.
of Web and Grid Services (IJWGS) 7(1), 91–115 (2011)
8. Yan, Y., Chen, M., Yang, Y.: Anytime QoS Optimization over the PlanGraph for
Web Service Composition. In: ACM SAC, pp. 1968–1975 (2012)
9. Liang, Q., Su, S.: AND/OR Graph and Search Algorithm for Discovering Compos-
ite Web Services. Int. J. Web Service Res. (IJWSR) 2(4), 48–67 (2005)
10. Gu, Z., Li, J., Xu, B.: Automatic Service Composition Based on Enhanced Service
Dependency Graph. In: IEEE ICWS, pp. 246–253 (2008)
11. Jiang, W., Zhang, C., Huang, Z., Chen, M., Hu, S., Liu, Z.: QSynth: A Tool for
QoS-aware Automatic Service Composition. In: IEEE ICWS, pp. 42–49 (2010)
12. Paganelli, F., Ambra, T., Parlanti, D.: A QoS-aware service composition approach
based on semantic annotations and integer programming. Int. J. of Web Info. Sys.
(IJWIS) 8(3), 296–321 (2012)
13. Zeng, L., Benatallah, B., Ngu, A., Dumas, M., Kalagnanam, J., Chang, H.: QoS-
Aware Middleware for Web Services Composition. IEEE Trans. on Soft. Eng. 30(5),
311–327 (2004)
14. Yu, T., Zhang, Y., Lin, K.J.: Efficient algorithms for Web services selection with
end-to-end QoS constraints. ACM Trans. on the Web 1, 1–26 (2007)
15. Haddad, J.E., Manouvrier, M., Rukoz, M.: TQoS: Transactional and QoS-aware
selection algorithm for automatic Web service composition. IEEE Trans. on Serv.
Comp. 3(1), 73–85 (2010)
16. Syu, Y., FanJiang, Y.Y., Kuo, J.Y., Ma, S.P.: Towards a Genetic Algorithm Ap-
proach to Automating Workflow Composition for Web Services with Transactional
and QoS-Awareness. In: IEEE SERVICES, pp. 295–302 (2011)
17. Rodriguez-Mier, P., Mucientes, M., Lama, M.: A dynamic qoS-aware semantic web
service composition algorithm. In: Liu, C., Ludwig, H., Toumani, F., Yu, Q. (eds.) Ser-
vice Oriented Computing. LNCS, vol. 7636, pp. 623–630. Springer, Heidelberg (2012)
18. Aleti, A., Buhnova, B., Grunske, L., Koziolek, A., Meedeniya, I.: Software Archi-
tecture Optimization Methods: A Systematic Literature Review. IEEE Trans. on
Soft. Eng. 39(5), 658–683 (2013)
19. Yoo, J.J.W., Kumara, S., Lee, D., Oh, S.C.: A Web Service Composition Frame-
work Using Integer Programming with Non-functional Objectives and Constraints.
In: IEEE CEC/EEE, pp. 347–350 (2008)
20. Cardinale, Y., Haddad, J.E., Manouvrier, M., Rukoz, M.: Transactional-aware Web
Service Composition: A Survey. In: Handbook of Research on Non-Functional Prop.
for Service-oriented Sys.: Future Directions, pp. 116–142. IGI Global (2011)
21. Miller, C.E., Tucker, A.W., Zemlin, R.A.: Integer Programming Formulation of
Traveling Salesman Problems. J. of ACM 7(4), 326–329 (1960)
22. Aho, A.V., Hopcroft, J.E., Ullman, J.: Data Structures and Algorithms, 1st edn.
Addison-Wesley Longman Pub. Co., Inc. (1983)
23. Kona, S., Bansal, A., Blake, M.B., Bleul, S., Weise, T.: WSC-2009: A Quality of
Service-Oriented Web Services Challenge. In: IEEE CEC, pp. 487–490 (2009)
A Framework for Searching Semantic Data
and Services with SPARQL
Abstract. Recent years have witnessed the success of the Linked Open Data (LOD) project and the growing amount of semantic data sources available on the Web. However, there is still a lot of data that will not be published as a fully materialized knowledge base (dynamic data, data with limited access patterns, etc.). Such data is in general available through Web APIs or Web services. In this paper, we introduce a SPARQL-driven approach for searching linked data and relevant services. In our framework, a user data query is analyzed and transformed into service requests. The resulting service requests, formatted for different semantic Web service languages, are addressed to service repositories. Our system also features automatic Web service composition to help find more answers to user queries. The intended applications for such a framework range from mashup development to aggregated search.
1 Introduction
Recent years have witnessed the success of the Linked Open Data (LOD) project and the growing amount of semantic data sources available on the Web (public sector data published by several government initiatives, scientific data facilitating collaboration, ...). The Linked Open Data cloud, representing a large portion of the semantic Web, comprises more than 2000 datasets that are interlinked by RDF links, most of them offering a SPARQL endpoint (according to LODstats1 as of May 2014). To exploit these interlinked data sources, federated query processing techniques were proposed ([1]). However, as mentioned in [2], there is still a lot
techniques were proposed ([1]). However, as mentioned in [2] there is still a lot
of data that will not be published as a fully materialized knowledge base like:
– dynamic data issued from sensors
– data that is computed on demand depending on a large set of input data,
e.g. the faster public transport connection between two city points
– data with limited access patterns, e.g. prices of hotels may be available for
specific requests in order to allow different pricing policies.
Such data is in general available through Web APIs or Web services. In order to allow services to be automatically discovered and composed, research works
1
http://stats.lod2.eu/
Such a search often requires distinct queries: a) data queries to look up data in the LOD, b) service requests to discover relevant services in some SWS repositories, and c) service composition requests to create relevant service compositions in case no single relevant service is found. Our framework searches
for both (data and services) starting from a single query from the user called
the data query, i.e. a query intended to search only for data. From this query,
it automatically issues service requests and finds relevant services or generates
service compositions.
To explain the motivations and goals of our framework, we consider the fol-
lowing example scenario: A user wants to know all writers born in Paris and
holding a Nobel prize as well as the list of all their books. This query is written
in SPARQL in Listing 1. Answering this query over the LOD would supposedly find all these writers in DBpedia. However, their published books are not all listed in DBpedia. In this case, the data is incomplete and might need to be completed with full book listings from services like the Amazon API, the Google Books API, etc. Some of these APIs can also provide complementary information on the books, such as prices, ISBN numbers, etc. In addition, there are some other
relevant services that allow the user to buy a given book online. However, if the user wants to buy a given book from a local store, and there is a service that only takes an ISBN number as input and returns the local stores that sell this book, then a service composition can be made to return such information.
2.1 Definitions
To better explain the details of the service search, we first give the following definitions used in this section.
Similar Concepts (en): For a given concept cn of a node, there exists a set of one or more equivalent (similar) concepts en = Similar(cn), where Similar(cn) is a function that returns the similar concepts of a given concept, defined in its ontology by one of the following rdfs:property predicates: a) owl:sameAs, b) owl:equivalentClass, and c) rdfs:subClassOf in either direction.
Service Query (Qs): Similarly to the QD definition above, the service query is a SPARQL query written to select relevant services from their SWS repositories via their SPARQL endpoints. It consists of sets of triple patterns that match the
inputs and outputs of Rs with inputs and outputs of a service in S. The triples
of Qs follow the SWS description model used by the repositories to describe
services.
2. Services that consume some of the inputs or the outputs of the request, or
that return some of the inputs or the outputs of the request. Such services
would be useful to: a) provide additional information or services to the data,
b) discover candidate services for a mashup or composition of services that fit
as providers or consumers in any intermediate step of the composition. The
service request for such kinds of services is obtained by one of the strategies below, which satisfy the following:
Strategy#2a (Rs, Ds): InD ∩ InS ≠ ∅
Strategy#2b (Rs, Ds): OutD ∩ InS ≠ ∅
Strategy#2c (Rs, Ds): OutD ∩ OutS ≠ ∅
Strategy#2d (Rs, Ds): InD ∩ OutS ≠ ∅
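As a minimal illustration (assuming the inputs/outputs of the request and of a candidate service description are plain sets of concept URIs, which is our encoding, not necessarily the framework's), the Strategy #2 overlap tests boil down to non-empty set intersections:

# Sketch only: the Strategy #2 overlap tests as set intersections.
# in_d/out_d (request) and in_s/out_s (service) are assumed sets of concept URIs.
def matching_strategies(in_d, out_d, in_s, out_s):
    """Return the labels of the Strategy #2 variants a candidate service satisfies."""
    checks = {
        "2a": in_d & in_s,    # service consumes some inputs of the request
        "2b": out_d & in_s,   # service consumes some outputs of the request
        "2c": out_d & out_s,  # service returns some outputs of the request
        "2d": in_d & out_s,   # service returns some inputs of the request
    }
    return [label for label, overlap in checks.items() if overlap]

# Hypothetical example based on the writers/books scenario
print(matching_strategies(
    in_d={"dbpedia:Place", "dbpedia-owl:Award"},
    out_d={"dbpedia-owl:Writer", "dbpedia-owl:Book"},
    in_s={"dbpedia-owl:Book"},      # e.g. a bookstore lookup service
    out_s={"ex:LocalStore"},
))  # -> ['2b']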
Once the service request elements are extracted from the query, we try to find
the semantic concepts cn that describe the previously extracted nodes with no
concept: (n, null).
Listing 1.2. An example query of Concept Lookup in the LOD
Listing 1.3. An example query of Concept Lookup in Ontology
Similarity Lookup. To extend the service search space, we use the similar
concepts en of every concept cn in the service search queries along with the
original concepts. To find these similar concepts, we use the rules given by the definition in Section 2.1. Based on this definition, we issue a SPARQL query qe like the one used for concept lookup, but slightly different: we add a triple that defines a similarity link between cn and a variable ?similar. The triple pattern
has the form cn ?semanticRelation ?similar where ?semanticRelation is
one of the following properties: a) owl:sameAs, owl:equivalentClass for similar
concepts in other ontologies b) rdfs:subClassOf for hierarchically similar concepts
within the same ontology.
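A possible way to assemble such a similarity-lookup query qe is sketched below. Only the three predicates come from the definition in Section 2.1; the UNION shape, the prefixes and the helper function are assumptions of this illustration, not the framework's actual query generator.

# Sketch only: building the similarity-lookup query q_e for a concept c_n.
SIMILARITY_PREDICATES = ["owl:sameAs", "owl:equivalentClass", "rdfs:subClassOf"]

def build_similarity_query(concept_uri: str) -> str:
    patterns = []
    for pred in SIMILARITY_PREDICATES:
        patterns.append(f"{{ <{concept_uri}> {pred} ?similar . }}")
        if pred == "rdfs:subClassOf":
            # rdfs:subClassOf is considered in either direction
            patterns.append(f"{{ ?similar {pred} <{concept_uri}> . }}")
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?similar WHERE {\n  "
        + "\n  UNION\n  ".join(patterns)
        + "\n}"
    )

print(build_similarity_query("http://dbpedia.org/ontology/Writer"))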
The similarity lookup query qe is executed on the sources used in QD as well
as on the other sources of the LOD because the similar concepts can be found
anywhere.
To optimize the search in other sources of the LOD, we use a caching technique to build, on the fly, an index structure of the LOD sources' content. The details of this caching are described in Section 5.
a single basic graph pattern. On the other hand, loose strategies require only partial matching; hence, the query triples are put in a UNION of multiple graph patterns.
SELECT DISTINCT ?service WHERE {
  ?service a service:Service ; service:presents ?profile .
  ?profile profile:hasOutput ?output1 ;
           profile:hasOutput ?output2 .
  ?output1 process:parameterType dbpedia-owl:Writer .
  ?output2 process:parameterType dbpedia-owl:Book .
  OPTIONAL { ?profile profile:hasInput ?input1 .
             ?input1 process:parameterType dbpedia:Place . }
  OPTIONAL { ?profile profile:hasInput ?input2 .
             ?input2 process:parameterType dbpedia-owl:Award . } }
sim(cni, cni+1) = { 0 if cni = cni+1 ; 1 if cni = Similar(cni+1) }   (1)
is a function that determines the similarity between two concepts.
From the functions above, the cost of the best known path to the current node
subset is given by the following function:
g(n) = ∑_{i=0}^{n} cost(ni)   (2)
where the ni are all the accessible services for the next step.
The heuristic function h(n) calculates the distance between the current node and the target AND node n0 in the SDG graph. This is justified by the fact that a better solution is one that uses fewer services.
6 Related Works
The motivations and research questions of our work are tackled by many recent works. In fact, our work emerges from the crossing of several research topics in the Semantic Web and Web services. We list below a few of the most recent works relevant to our paper.
6
http://projects.semwebcentral.org/projects/owls-tc/
SPARQL Query Management. Among the works that tackle query management in the LOD, SPARQL federation approaches are the most relevant for our context. FedX [1] is one of the most popular works, with good performance results; moreover, the tool is available in open source. FedX optimizes query management by performing cache-based source selection and by rewriting queries into sub-queries that are run on the selected sources. Some recent works like [8] introduce further optimizations of FedX and other systems by improving the source selection. We actually use FedX as part of our framework for answering data queries because, as stated in Section 2, managing data queries is out of the scope of our work in this paper.
Search of Data and Services. Our work is inspired by the work in [3], which aims to look for services related to a given query based on a keyword comparison between an SQL-like query and a service ontology. This approach uses generated semantics for services to expand the search area.
Another similar work, ANGIE [14], consists of enriching the LOD from RESTful APIs and SOAP services by discovering, composing and invoking services to answer a user query. However, this work assumes the existence of a global schema for both data and services, which is not the case in the LOD. This assumption makes ANGIE domain-specific and not suitable for general-purpose queries.
Some recent works could complement our work such as [15] which proposes
an approach that uses Karma[11] to integrate linked data on-the-fly from static
and dynamic sources and to manage the data updates.
References
1. Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: Optimization
techniques for federated query processing on linked data. In: Aroyo, L., Welty, C.,
Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC
2011, Part I. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011)
2. Speiser, S., Harth, A.: Integrating linked data and services with linked data ser-
vices. In: Antoniou, G., Grobelnik, M., Simperl, E., Parsia, B., Plexousakis, D.,
De Leenheer, P., Pan, J. (eds.) ESWC 2011, Part I. LNCS, vol. 6643, pp. 170–184.
Springer, Heidelberg (2011)
3. Palmonari, M., Sala, A., Maurino, A., Guerra, F., Pasi, G., Frisoni, G.: Aggregated
search of data and services. Information Systems 36(2), 134–150 (2011)
4. Bizer, C., Heath, T., Berners-Lee, T.: Linked data-the story so far. Intl. journal on
semantic web and information systems 5(3), 1–22 (2009)
5. Kopecky, J., Gomadam, K., Vitvar, T.: hrests: An html microformat for describing
restful web services. In: IEEE/WIC/ACM Intl. Conf. on. Web Intelligence and
Intelligent Agent Technology, WI-IAT 2008, vol. 1, pp. 619–625. IEEE (2008)
6. Bülthoff, F., Maleshkova, M.: RESTful or RESTless - current state of today's top Web APIs. In: 11th ESWC 2014 (May 2014)
7. Yan, Y., Xu, B., Gu, Z.: Automatic service composition using and/or graph. In:
2008 10th IEEE Conf. on E-Commerce Technology and the Fifth IEEE Conf. on
Enterprise Computing, E-Commerce and E-Services, pp. 335–338. IEEE (2008)
1 Introduction
Enterprises rely on business processes to accomplish business goals (handling a loan application, etc.). Business process models are either imperative or declarative [12]. Imperative models typically employ graphs (e.g., automata, Petri nets) to depict how a process should progress. Declarative models are usually based on constraints [2]; they are flexible and easy to change during design time or runtime [18]. A practical problem is whether a given set of constraints allows at least one execution; it is fundamental in business process modeling to test satisfiability of a given set of constraints.
A process execution is a (finite) sequence of activities through time. The declarative language DecSerFlow [2] uses a set of temporal predicates as a process specification. The DECLARE system [11] supports design and execution of DecSerFlow processes. In [14], an orchestrator for declarative business processes called REFlex was developed, where a subset of DecSerFlow can be expressed in REFlex. In this paper, we study the following conformance problem: does there exist an execution that satisfies a given DecSerFlow specification? Clearly, efficient conformance testing provides effective and efficient help to the user of DECLARE and to the scheduler of [14]. Temporal predicates in DecSerFlow can be translated into linear temporal logic (LTL) [13], but limited to finite sequences. A naive approach to conformance checking is to construct automata representing the individual constraints and determine if their cross product accepts a string. The complexity of this approach is exponential in the number of given constraints. This paper aims at efficient conformance checking.
Most DecSerFlow constraints can be categorized into two directions: “response”
(Res), which specifies that an activity should happen in the future, and “precedence”
Supported in part by a grant from Bosch.
(Pre), which specifies that an activity should happen in the past. For each direction,
there are three types of constraints: (1) An ordering constraint Res(a, b) (or Pre(a, b))
for activities a and b specifies that if a occurs, then b should occur in the future (resp.
past). As a practical example of a loan application, if activity “loan approval” hap-
pens, then in the past a “credit check” activity should have happened. (2) An alternating
constraint aRes(a, b) (or aPre(a, b)) specifies that each occurrence of a implies a fu-
ture (resp. past) occurrence of b but before (resp. after) the occurrence of b, a cannot
occur again (i.e., between two occurrences of a, there should exist an occurrence of
b). As an example, if a “house evaluation request” activity happens, a “house evalua-
tion feedback” activity should happen in the future and before receiving the feedback,
the applicant cannot submit another evaluation request, i.e., “request” and “feedback”
should alternate. (3) An immediate constraint iRes(a, b) (or iPre(a, b)) restricts that if a
occurs, then b should immediately follow (resp. occur before). In addition to “response”
and “precedence” constraints, there is a special type of “existence” constraints that only
require occurrences in any order. An existence constraint Exi(a, b) restricts that if a
occurs, then b should occur either earlier or later. In practice, a common existence con-
straint can be that a guest can either choose to pay the hotel expense online then check
in, or check in first and pay the expense later, i.e., Exi(“check in”, “payment”).
In addition to temporal constraints, there may be cardinality requirements on each
activity, i.e., an activity should occur at least once. For example, in an online order
process, the “payment” activity is always required to occur, while “shipping” is not (a customer may pick up the ordered items in a store).
The contributions of this paper are the following: We present a reduction from gen-
eral DecSerFlow to DecSerFlow “Core” with no existence constraints nor cardinality
requirements (Theorem 2.3). For DecSerFlow Core, we formulate syntactic characteri-
zations (sufficient and necessary for conformance) for constraints involving (1) order-
ing and immediate constraints (Theorem 3.5), (2) ordering and alternating constraints
(Theorem 3.9), (3) alternating and immediate constraints (Theorem 3.20), or (4) only
precedence (or only response) constraints (Theorem 3.23). For the general case, it re-
mains open whether syntactic characterizations exist. Algorithms are also developed
to generate conforming strings when the schema is conformable. Finally, we designed
and implemented a conformance analyzer, and our experimental evaluation shows that (1) the syntactic-condition approach is polynomially scalable (in time) compared with the exponential-time naive approach using automata, (2) the time complexity of conforming string generation varies from polynomial to exponential, and (3) increasing the number of constraints increases the time needed by the automata approach exponentially more than the time needed by the syntactic-condition approach.
The remainder of the paper is organized as follows. Section 2 defines DecSerFlow
constraints studied in this paper. Section 3 focuses on different combinations of con-
straints together with their conformance checking and conforming string generation.
A conformance checker is developed and evaluated in Section 4. Related work and
conclusions are provided in Sections 5 and 6, resp. Detailed proofs, some examples,
algorithms, and formal definitions are omitted due to space limitation.
2 DecSerFlow Constraints
In this section we introduce DecSerFlow constraints, define the conformance problem,
state a straightforward result, and present a reduction to the case of “core” constraints.
Let 𝒜 be an infinite set of activities, N the set of natural numbers, and A ⊆ 𝒜 a finite subset of 𝒜. A string over 𝒜 (or A) is a finite sequence of 0 or more activities in 𝒜 (resp. A). 𝒜* (resp. A*) denotes the set of all strings over 𝒜 (resp. A).
A subsequence of a1 a2 ...an is a string ak1 ak2 ...akm, where (1) m ∈ N and m ≥ 1, (2) ki ∈ [1..n] for each i ∈ [1..m], and (3) ki < ki+1 for each i ∈ [1..(m−1)]; a substring is a subsequence ak1 ak2 ...akm where, for each i ∈ [1..(m−1)], ki+1 = ki + 1.
Let A ⊆ 𝒜 and a, b ∈ A. A (sequence) constraint on a, b is a constraint shown in Fig. 1.
Fig. 1. DecSerFlow constraints

              Response                                      Precedence
Ordering      Res(a, b): each occurrence of a is            Pre(a, b): each occurrence of a is
              followed by an occurrence of b                preceded by an occurrence of b
Alternating   aRes(a, b): in addition to Res(a, b),         aPre(a, b): in addition to Pre(a, b),
              a and b alternate                             a and b alternate
Immediate     iRes(a, b): each occurrence of a is           iPre(a, b): each occurrence of a is
              immediately followed by an occurrence of b    immediately preceded by an occurrence of b
Existence     Exi(a, b): each occurrence of a implies an occurrence of b
For ordering precedence constraint Pre(a, b), if “a” occurs in a string, then before
“a”, there must exist a “b”, and between this “b” and “a”, all activities are allowed to
occur. Similarly, for alternating response constraint aRes(a, b), after an occurrence of
“a”, no other a’s can occur until a “b” occurs. For immediate precedence constraint
iPre(a, b), a “b” should occur immediately before each “a”. The existence constraints impose no restriction on temporal order. Given a constraint c and a string s, we write s |= c if s satisfies c; for example, s |= Res(a, b) if s = abcadb.
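To make the semantics of Fig. 1 concrete, the following sketch (an illustration in our own encoding, not the authors' checker) tests whether a string over single-character activities satisfies each kind of constraint; for instance, it confirms that s = abcadb satisfies Res(a, b).

# Sketch only: checking one DecSerFlow constraint against a string of single-character activities.
def satisfies(s, kind, a, b):
    pos_a = [i for i, c in enumerate(s) if c == a]
    if kind == "Res":    # every a is eventually followed by a b
        return all(b in s[i + 1:] for i in pos_a)
    if kind == "Pre":    # every a is preceded by some earlier b
        return all(b in s[:i] for i in pos_a)
    if kind == "aRes":   # Res(a, b), and no further a before the next b
        for i in pos_a:
            rest = s[i + 1:]
            if b not in rest or a in rest[:rest.index(b)]:
                return False
        return True
    if kind == "aPre":   # Pre(a, b), and no further a after the previous b
        for i in pos_a:
            before = s[:i]
            if b not in before or a in before[before.rindex(b) + 1:]:
                return False
        return True
    if kind == "iRes":   # every a immediately followed by b
        return all(i + 1 < len(s) and s[i + 1] == b for i in pos_a)
    if kind == "iPre":   # every a immediately preceded by b
        return all(i > 0 and s[i - 1] == b for i in pos_a)
    if kind == "Exi":    # a occurs only if b occurs somewhere
        return a not in s or b in s
    raise ValueError(kind)

print(satisfies("abcadb", "Res", "a", "b"))   # True, as in the text
print(satisfies("abcadb", "iRes", "a", "b"))  # False: the second a is followed by d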
Definition 2.1. A (DecSerFlow) schema is a triple S = (A, C, κ) where A ⊆ 𝒜 is a finite set of activities, C a finite set of constraints on activities in A, and κ is a total mapping from A to {0, 1}, called cardinality, denoting that an activity a ∈ A should occur at least once (if κ(a) = 1) or has no occurrence requirement (if κ(a) = 0).
Definition 2.2. A finite string s over A conforms to schema S = (A, C, κ) if s satisfies
every constraint in C and, for each activity a ∈ A, s contains a at least κ(a) times. If a string s conforms to S, s is a conforming string of S and S is conformable.
Conformance Problem: Given a schema S , is S conformable?
A naive approach to solve the conformance problem is to construct an automaton A for each given constraint c (and for each cardinality requirement r, i.e., an activity occurring at least 0 or 1 times), such that A accepts exactly the strings that satisfy c (resp. r) and rejects all other strings. Then the conformance problem is reduced to checking whether the cross product of all constructed automata accepts a string. However, the automata approach yields exponential complexity in the size of the input schema. Our goal is to find syntactic conditions for determining conformability that lead to polynomial complexity.
For notation convenience, given a DecSerFlow schema S = (A, C, κ), if for each a ∈
A, κ(a) = 1, we simply use (A, C) to denote S .
Theorem 2.3. Given a schema S = (A, C, κ), there exists a schema S′ = (A′, C′) such that S is conformable iff S′ is conformable.
Theorem 2.3 shows that conformance of arbitrary schemas can be reduced to con-
formance of schemas where each activity occurs at least once. If each activity in a
given schema occurs at least once, the existence constraints are redundant. In the re-
mainder of this paper, we only focus on schemas with core constraints, i.e., from set
{Res, Pre, aRes, aPre, iRes, iPre} and that each activity occurs at least once.
simply as (A, E^or→, E^or←, E^al→, E^al←).
For technical development, we review some well-known graph notions. Given a (di-
rected) graph (V, E) with vertex set V and edge set E ⊆ V × V, a path is a sequence
v1 v2 ...vn where n > 1, for each i ∈ [1..n], vi ∈ V, and for each i ∈ [1..(n−1)], (vi , vi+1 ) ∈ E;
n is the length of the path v1 ...vn . A path v1 ...vn is simple if vi ’s are pairwise distinct ex-
cept that v1 , vn may be the same node. A (simple) cycle is a (resp. simple) path v1 ...vn
where v1 = vn . A graph is cyclic if it contains a cycle, acyclic otherwise. Given an
acyclic graph (V, E), a topological order of (V, E) is an enumeration of V such that
for each (u, v) ∈ E, u precedes v in the enumeration. A subgraph (V′, E′) of (V, E) is a graph such that V′ ⊆ V and E′ ⊆ E ∩ (V′×V′). A graph is strongly connected if there is a path from each node in the graph to each other node. Given a graph G = (V, E) and a set V′ ⊆ V, the projection of G on V′, πV′ G, is the subgraph (V′, E′) of G where E′ = E ∩ (V′×V′). A strongly connected component (V′, E′) of a graph G = (V, E) is a strongly connected subgraph G′ = (V′, E′) of G such that (1) G′ = πV′ G, and (2) for each v ∈ V − V′, πV′∪{v} G is not strongly connected.
To obtain the syntactic conditions for deciding the conformance of ordering and immediate constraints, we first present a pre-processing of a given schema, such that the given schema is conformable if and only if the pre-processed schema is conformable, and we then state the syntactic conditions on the pre-processed schemas.
Lemma 3.1. Given a schema S = (A, C) and its causality graph (A, E^or→, E^or←, E^al→, E^al←, E^im→, E^im←), for each (u, v) ∈ E^im→ ∪ E^im←, if there exists w ∈ A − {u} such that (v, w) ∈ E^or← (or E^or→), then for each conforming string s of S, s satisfies Pre(u, w) (resp. Res(u, w)).
Lemma 3.1 is straightforward. Based on Lemma 3.1, we define the following pre-
processing given a schema.
Definition 3.2. Given a schema S = (A, C), the immediate-plus (or im+) schema of S is a schema (A, C′) constructed as follows: 1. Initially C′ = C. 2. Repeat the following step while C′ is changed: for each distinct u, v, w ∈ A, if (1) iPre(u, v) or iRes(u, v) is in C′ and (2) Pre(v, w) ∈ C′ (or Res(v, w) ∈ C′), then add Pre(u, w) (resp. Res(u, w)) to C′.
Example 3.3. A schema S has 3 activities, a, b, c, and 4 constraints iRes(a, c), iRes(b, c), Pre(c, a), and Pre(c, b). Let S′ be the im+ schema of S. According to the definition of im+ schema, in addition to the constraints in S, S′ also contains the constraints Pre(a, b) (which is obtained from iRes(a, c) and Pre(c, b)) and Pre(b, a).
It is easy to see that for each given schema, its corresponding im+ schema is unique.
The following is a consequence of Lemma 3.1.
Corollary 3.4. A schema is conformable iff its im+ schema is conformable.
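A direct reading of Definition 3.2 as a fixpoint computation might look like the sketch below, where constraints are modelled as (predicate, u, v) triples; this encoding is our assumption and the code is an illustration, not the paper's implementation. On Example 3.3 it adds exactly Pre(a, b) and Pre(b, a).

# Sketch only: computing the im+ schema of Definition 3.2.
def im_plus(constraints):
    c = set(constraints)
    changed = True
    while changed:
        changed = False
        for (p1, u, v) in list(c):
            if p1 not in ("iPre", "iRes") or u == v:
                continue
            for (p2, v2, w) in list(c):
                if v2 != v or p2 not in ("Pre", "Res") or w in (u, v):
                    continue
                new = (p2, u, w)   # Pre(v, w) yields Pre(u, w); Res(v, w) yields Res(u, w)
                if new not in c:
                    c.add(new)
                    changed = True
    return c

# Example 3.3: activities a, b, c
schema = {("iRes", "a", "c"), ("iRes", "b", "c"), ("Pre", "c", "a"), ("Pre", "c", "b")}
print(sorted(im_plus(schema) - schema))  # [('Pre', 'a', 'b'), ('Pre', 'b', 'a')]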
For reading convenience, we introduce the following notations: let x, y, z each be one of ‘or’, ‘al’, ‘im’; we denote E^x→ ∪ E^y→ as E^{x∪y}→, and use similar notations such as E^{x∪y}← or E^{x∪y∪z}→.
Theorem 3.5. Given a schema S = (A, C) where C contains only ordering and immediate constraints, the im+ schema S′ of S, and the causality graph (A, E^or→, E^or←, E^im→, E^im←) of S′, S is conformable iff the following conditions all hold.
(1). (A, E^{or∪im}→) and (A, E^{or∪im}←) are both acyclic,
(2). for each (u, v) ∈ E^im→ (or E^im←), there does not exist w ∈ A such that w ≠ u and (v, w) ∈ E^im← (resp. E^im→), and
(3). for each (u, v) ∈ E^im→ (or E^im←), there does not exist w ∈ A such that w ≠ v and (u, w) ∈ E^im→ (resp. E^im←).
In Theorem 3.5, Condition (1) restricts that the response or precedence direction does
not form a loop (a loop of the same direction can lead to infinite execution). Conditions
(2) and (3) similarly restrict that the immediate constraints are consistent. For example,
it is impossible to satisfy constraints iRes(a, b) and iRes(a, c), where a, b, c are activities.
Example 3.6 shows the importance of “pre-processing” to obtain im+ schemas.
Example 3.6. Let S and S′ be as stated in Example 3.3. Note that S satisfies all conditions in Theorem 3.5. However, S′ does not, since Pre(a, b) and Pre(b, a) form a cycle in (A, E^{or∪im}←), which leads to the non-conformability of S. Therefore, the pre-processing to obtain an im+ schema is necessary when determining conformability.
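Under the same triple-based encoding as in the sketch after Corollary 3.4, the conditions of Theorem 3.5 can be checked along the following lines: acyclicity of the two union graphs and consistency of the immediate constraints as reconstructed above. This is an illustrative sketch, not the implementation evaluated in Section 4.

# Sketch only: checking the conditions of Theorem 3.5 on an im+ schema.
from collections import defaultdict

def conformable_or_im(activities, constraints):
    res_edges = {(u, v) for p, u, v in constraints if p in ("Res", "iRes")}
    pre_edges = {(u, v) for p, u, v in constraints if p in ("Pre", "iPre")}
    ires = {(u, v) for p, u, v in constraints if p == "iRes"}
    ipre = {(u, v) for p, u, v in constraints if p == "iPre"}

    def acyclic(edges):
        graph = defaultdict(list)
        for u, v in edges:
            graph[u].append(v)
        state = {}  # 1 = on the DFS stack, 2 = finished
        def dfs(u):
            state[u] = 1
            for v in graph[u]:
                if state.get(v) == 1 or (v not in state and not dfs(v)):
                    return False
            state[u] = 2
            return True
        return all(state.get(a) == 2 or dfs(a) for a in activities)

    # Condition (1): both directions of the or/im union graph are acyclic.
    cond1 = acyclic(res_edges) and acyclic(pre_edges)

    # Conditions (2)/(3): immediate constraints must be consistent; e.g.
    # iRes(a, b) and iRes(a, c) with b != c can never both be satisfied.
    def consistent(imm, imm_other):
        succ = {}
        for u, v in imm:
            if succ.setdefault(u, v) != v:      # unique immediate neighbour
                return False
        for u, v in imm:
            for v2, w in imm_other:
                if v2 == v and w != u:          # cross-consistency between iRes and iPre
                    return False
        return True
    cond23 = consistent(ires, ipre) and consistent(ipre, ires)
    return cond1 and cond23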
Given a conformable schema that contains only ordering and immediate constraints,
one question to ask is how to generate a conforming string. To solve this problem, we
first introduce a (data) structure, which is also used in the later sections.
For a schema S = (A, C), let πim(S) = (A, C′) be the schema where C′ is the set of all immediate constraints in C. The notation πim(S) denotes the projection of S on immediate constraints. Similarly, let πal(S) be the projection of S on alternating constraints.
Given a schema S = (A, C), if πim (S ) satisfies the conditions stated in Theorem 3.5,
then for each activity a ∈ A, denote S̄im (a) as a string constructed iteratively as follows:
(i) S̄im (a) = a initially, (ii) for the leftmost (or rightmost) activity u of S̄im (a), if there
exists v ∈ A such that iPre(u, v) ∈ C (resp. iRes(u, v) ∈ C), then update S̄im (a) to be vS̄im (a)
(resp. S̄im (a)v), i.e., prepend (resp. append) S̄im (a) with v, and (iii) repeat step (ii) until
no more changes can be made. For each a ∈ A, it is easy to see that S̄im (a) is unique and
is finite. Let Sim (a) be the set of activities that occur in S̄im (a).
Alg. 1 shows the procedure to create a conforming string given a schema with only
ordering and immediate constraints. The main idea of Alg. 1 relies on a topological
order of both the “precedence” and “response” directions (to satisfy the ordering con-
straints); then replace each activity a by S̄im (a) (to satisfy the immediate constraints).
Algorithm 1.
Input: A causality graph (A, E^or→, E^or←, E^im→, E^im←) of an im+ schema of a schema S that satisfies all conditions in Theorem 3.5
Output: A finite string that conforms to S
A. Let “a1 a2 ...an” and “b1 b2 ...bn” be topological sequences of (A, E^{or∪im}→) and (A, E^{or∪im}←), resp.
B. Return the string “S̄im(bn)...S̄im(b1) S̄im(a1)...S̄im(an)”.
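A possible rendering of the S̄im(·) construction and of Alg. 1 is sketched below; the triple encoding, the use of graphlib for the topological order and the assumption that the schema already satisfies Theorem 3.5 (so the loops terminate) are ours.

# Sketch only: S_bar_im(a) and the string generation of Alg. 1.
from graphlib import TopologicalSorter

def s_im(a, ipre, ires):
    # ipre/ires: dicts u -> v meaning iPre(u, v) / iRes(u, v); unique and
    # acyclic under the conditions of Theorem 3.5, so the loops terminate.
    s = [a]
    while s[0] in ipre:
        s.insert(0, ipre[s[0]])   # prepend the immediate predecessor chain
    while s[-1] in ires:
        s.append(ires[s[-1]])     # append the immediate successor chain
    return "".join(s)

def topo_order(activities, edges):
    preds = {a: set() for a in activities}
    for u, v in edges:
        preds[v].add(u)           # u must precede v
    return list(TopologicalSorter(preds).static_order())

def algorithm_1(activities, constraints):
    ires = {u: v for p, u, v in constraints if p == "iRes"}
    ipre = {u: v for p, u, v in constraints if p == "iPre"}
    fwd = {(u, v) for p, u, v in constraints if p in ("Res", "iRes")}   # response direction
    bwd = {(u, v) for p, u, v in constraints if p in ("Pre", "iPre")}   # precedence direction
    a_seq = topo_order(activities, fwd)
    b_seq = topo_order(activities, bwd)
    return ("".join(s_im(b, ipre, ires) for b in reversed(b_seq))
            + "".join(s_im(a, ipre, ires) for a in a_seq))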
Definition 3.7. Given a schema S = (A, C) and its causality graph (A, E^or→, E^or←, E^al→, E^al←, E^im→, E^im←), the alternating-plus (or al+) schema of S is a schema (A, C′) where
C′ = C ∪ {aPre(v, u) | (u, v) ∈ E^al→, u and v are on a common cycle in (A, E^al→ ∪ E^al←)}
        ∪ {aRes(v, u) | (u, v) ∈ E^al←, u and v are on a common cycle in (A, E^al→ ∪ E^al←)}
It is easy to see that for each given schema, its corresponding al+ schema is unique.
Theorem 3.9. Given a schema S that only contains ordering and alternating constraints, let S′ = (A, C′) be the al+ schema of S and (A, E^or→, E^or←, E^al→, E^al←) the causality graph of S′. S is conformable iff both (A, E^{or∪al}→) and (A, E^{or∪al}←) are acyclic.
Definition 3.12. Given an al+ schema S = (A, C) that contains only alternating and immediate constraints, and its causality graph (A, E^al→, E^al←, E^im→, E^im←), S is collapsable if S satisfies all of the following.
(1). (A, E^{al∪im}→) and (A, E^{al∪im}←) are acyclic,
(2). for each (u, v) ∈ E^im→ (or E^im←), there does not exist w ∈ A such that w ≠ u and (v, w) ∈ E^im← (resp. E^im→),
(3). for each (u, v) ∈ E^im→ (or E^im←), there does not exist w ∈ A such that w ≠ v and (u, w) ∈ E^im→ (resp. E^im←), and
(4). for each distinct u, v, w ∈ A, if (u, w), (v, w) ∈ E^im→ or (u, w), (v, w) ∈ E^im←, then there is no path from w to either u or v in the graph (A, E^{al∪im}→ ∪ E^{al∪im}←).
Note that Conditions (1)–(3) in the above definition are similar to the characterization stated in Theorem 3.5.

Fig. 3. A collapsed schema example (activities a, b, c, d, e, f grouped into dashed boxes u1, u2, u3; edge labels such as aRes, aPre, and iPre denote constraint types)

Example 3.13. Consider an al+ schema with activities a, b, c, d, e, f, and constraints shown in Fig. 3 as (A, E^{al∪im}→ ∪ E^{al∪im}←), where the edge labels denote types of constraints. (Ignore the dashed boxes labeled u1, u2, u3 for now.) The schema is collapsable. However, if constraint iPre(a, c) is added to the schema, Condition (4) (in the collapsability definition) is violated and thus the new schema is not collapsable, since (f, c), (a, c) ∈ E^im← and there is a path cda from c to a in (A, E^{al∪im}→ ∪ E^{al∪im}←).
Definition 3.14. Given a collapsable schema S = (A, C) with only alternating and immediate constraints, the collapsed schema of S is a schema (A′, C′) constructed as follows:
1. Initially A′ = A and C′ = C.
2. Repeat the following steps while (A′, C′) is changed:
   i. Let (A′, E^al→, E^al←, E^im→, E^im←) be the corresponding causality graph of (A′, C′).
   ii. For each u, v ∈ A′ on a common cycle in (A′, E^{al∪im}→ ∪ E^{al∪im}←), if (u, v) ∈ E^im→ or E^im←, then (1) remove each X(u, v) or X(v, u) from C′, where X ranges over aRes, aPre, iRes, and iPre, (2) create a node wuv and let A′ := A′ − {u, v} ∪ {wuv}, and (3) replace each u and v in C′ by wuv.
It is easy to show that, given a collapsable al+ schema, the corresponding collapsed schema is unique. The following lemma (Lemma 3.15) is easy to verify.
Lemma 3.15. Given a collapsable al+ schema S with only alternating and immediate constraints, S is conformable iff its collapsed schema is conformable.
By Corollary 3.8 and Lemma 3.15, conformance checking of a schema that only contains alternating and immediate constraints can be reduced to the checking of its collapsed version. Thus, in the remainder of this subsection, we focus on collapsed schemas.
In order to have a clean statement of the necessary and sufficient condition, we in-
troduce a concept of “gap-free”. Essentially, “gap-free” is to deal with a special case of
a schema illustrated in the following Example 3.16.
Example 3.16. Continue with Example 3.13; note that the schema in Fig. 3 is a collapsed schema. Consider a schema Su2 that only contains activities a, b, and f, together with the constraints among them shown in Fig. 3 (i.e., a “subschema” bounded by the dashed box labeled “u2”). Based on Theorem 3.9, Su2 is conformable and a conforming string is baf. Now consider a schema Su1,2 that only contains activities e, a, b, and f, together with the constraints among them shown in Fig. 3 (i.e., a “subschema” bounded by the dashed boxes labeled “u1” and “u2”, together with the constraints crossing u1 and u2). Due to constraints iRes(e, b) and iPre(e, f), if Su1,2 is conformable, then each conforming string of Su1,2 must contain the substring “feb”. This requirement leads to some restriction upon schema Su2: if we take out activity “e” from Su1,2 and focus on schema Su2 again, one restriction would be: is there a conforming string of Su2 that contains the substring fb? If the answer is negative, then apparently Su1,2 is not conformable, since no substring feb can be formed.
With the concern shown in Example 3.16, we need a checking mechanism to decide if two activities can occur as a substring (i.e., be “gap-free”) in some conforming string. More specifically, given (A, E^al→, E^al←, E^im→, E^im←) as the causality graph of a collapsed schema S, we are interested in checking whether two activities that are in the same strongly connected component of (A, E^al→ ∪ E^al←) can form a substring in a conforming string of S. Note that in Example 3.16, activities a, b, and f are in the same strongly connected component, labeled u2, of (A, E^al→ ∪ E^al←).
Definition 3.17. Let S = (A, C) be a schema that only contains alternating constraints and (A, E^al→, E^al←) the causality graph of S, such that (A, E^al→ ∪ E^al←) is strongly connected. Given two distinct activities u, v ∈ A, u, v are gap-free (wrt S) if for each w, x, y ∈ A, the following conditions all hold wrt the graph (A, E^al→):
(a). if there is a path p with length greater than 2 from u to v, the following all hold:
   (i). if w is on p, then (u, v) ∉ E^al←,
   (ii). if there is a path from x to u, then (x, v) ∉ E^al←,
   (iii). if there is a path from v to y, then (u, y) ∉ E^al←,
   (iv). if there are paths from x to u and from v to y, then (x, y) ∉ E^al←, and
(b). if there is a path from v to u, then the following all hold:
   (i). if there is a path from x to v, then (x, u) ∉ E^al←,
   (ii). if there is a path from u to y, then (v, y) ∉ E^al←, and
Given a graph G = (V, E), for each v ∈ V, denote SV(v) to be the set of all the nodes
in the strongly connected component of G that contains v.
Let (A, C) be a collapsed schema and (A, E^al→, E^al←, E^im→, E^im←) its causality graph. Consider the graph (A, E^al→ ∪ E^al← ∪ E^im→ ∪ E^im←); given an activity a ∈ A, denote S(a) to be a schema
Example 3.19. Continue with Example 3.13; consider the schema in Fig. 3. Note that the schema is a collapsed schema. SV(a) = SV(b) = SV(f) is the strongly connected component of the graph in Fig. 3 with nodes a, b, and f. Moreover, S(a) = S(b) = S(f) is a schema that only contains activities a, b, and f, together with the constraints among them in Fig. 3.
The following Theorem 3.20 provides a necessary and sufficient condition for the conformability of schemas with only alternating and immediate constraints.
Theorem 3.20. Given a schema S that only contains alternating and immediate constraints, S is conformable iff the following conditions all hold.
(1). S is collapsable,
(2). πal(S̃) is conformable (recall that πal denotes the “projection” only upon alternating constraints), where S̃ is the collapsed schema of S, and
(3). Let (A, E^al→, E^al←, E^im→, E^im←) be the causality graph of the collapsed schema S̃; for each u, v, w ∈ A, if there is a path from u to w in (A, E^im→), there is a path from u to v in (A, E^im←), and SV(w) = SV(v) wrt (A, E^{al∪im}→ ∪ E^{al∪im}←), then either (1) v, w are gap-free wrt S(v) if v ≠ w, or (2) v has no outgoing edge in the graph (A, E^{al∪im}→ ∪ E^{al∪im}←) if v = w.
Example 3.21. Continue with Example 3.19; consider the schema in Fig. 3. The schema satisfies the conditions in Theorem 3.20 and is conformable. A conforming string can be bdacfebdacf.
Similar to the previous combinations of constraints, given a schema with only alternating and immediate constraints, an algorithm to construct a conforming string is desired. In this case, the algorithm is rather complicated and thus omitted. The main idea is that (1) for each activity a, we construct a string that satisfies each constraint “related” to a as well as each alternating constraint within a strongly connected component, and (2) we hierarchically link these constructed strings together. In this paper, we only provide an example of the algorithm.
Example 3.22. Consider the schema shown in Fig. 3. We first construct a string for activity e starting with the base S̄im(e) = cfeb, where f and b are both in the strongly connected component u2, while c is in u3. According to the gap-free property of f and b, there must exist a string that satisfies every constraint in u2 and has fb as a substring; a possible string could be s1 = bafbaf. Similarly, the string s2 = dc satisfies every constraint in u3 and has c as a substring; we then “glue” the underlined parts of s1 and s2 to each end of S̄im(e) and obtain badcfebaf. Note that this string satisfies every immediate constraint containing e and every alternating constraint within u1, u2, and u3. Further, as there is an alternating precedence constraint from e to b, to satisfy it we “glue” the topological order of u2 before badcfebaf, and obtain ŝ(e) = bafbadcfebaf. Note that ŝ(e) satisfies every constraint containing e and every alternating constraint within u1, u2, and u3. In general, for each activity, a string ŝ(∗) is constructed. For example, ŝ(b) = b and ŝ(a) = dca.
The second step is to link these ŝ(∗) strings together. The way to link them is to first construct a topological order of all the strongly connected components. For example
Algorithm 3.
Input: A causality graph (A, E^or→, E^al→, E^im→) of a schema S that satisfies both conditions in Theorem 3.23
Output: A finite string that conforms to S
A. Let string s be a topological order of (A, E^{or∪al∪im}→). For each a ∈ A, let ŝ(a) be the substring s[k] s[k+1] ... s[len(s)] of s such that s[k] = a (clearly k ∈ [1..len(s)]). Let i = 1.
B. While i ≤ len(s), repeat the following steps:
   B1. If (s[i], v) ∈ E^im→ for some v ∈ A and either i = len(s) or s[i+1] ≠ v, then replace s[i] in s by s[i] ŝ(v).
   B2. Increment i = i + 1.
C. Return s.
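Read literally, Alg. 3 admits the following sketch for a schema with response-direction constraints only; the triple encoding and the use of graphlib are assumptions of this illustration, and, as noted in Section 4, the generated string can grow exponentially in the number of activities.

# Sketch only: Alg. 3 for single-direction (response) constraints.
from graphlib import TopologicalSorter

def algorithm_3(activities, constraints):
    # constraints: (predicate, u, v) triples with predicates Res, aRes, iRes.
    edges = {(u, v) for _, u, v in constraints}
    ires = {u: v for p, u, v in constraints if p == "iRes"}
    preds = {a: set() for a in activities}
    for u, v in edges:
        preds[v].add(u)
    s = list(TopologicalSorter(preds).static_order())   # step A: one topological order
    suffix = {a: s[s.index(a):] for a in activities}    # s_hat(a): suffix of s starting at a
    i = 0
    while i < len(s):                                    # step B
        v = ires.get(s[i])
        if v is not None and (i == len(s) - 1 or s[i + 1] != v):
            s[i:i + 1] = [s[i]] + suffix[v]              # B1: replace s[i] by s[i]·s_hat(v)
        i += 1                                           # B2
    return "".join(s)                                    # step C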
4 Experimental Evaluations
In this section, several experiments are conducted to evaluate the performance of the
syntactic-condition-based conformance checking approaches. Three main types of al-
gorithms are implemented, including: (1) The naive algorithm to check DecSerFlow
conformance using automata (denoted as Chk-A), (2) the syntactic-condition-based
conformance checking algorithms for all four combinations of predicates (denoted as
Chk-Or-Im for ordering and immediate constraints, Chk-Or-Al, Chk-Al-Im, and Chk-
Sin for single direction constraints, i.e., either response or precedence), and (3) all four
conforming string generation algorithms (denoted as Gen-Or-Im, Gen-Or-Al, Gen-Al-
Im, and Gen-Sin). All algorithms are implemented in Java and executed on a com-
puter with 8G RAM and dual 1.7 GHz Intel processors. The data sets (i.e., DecSer-
Flow schemas) used in experiments are randomly generated. Schema generation uses
two parameters: number of activities (#A) and number of constraints (#C), where each
constraint is constructed by selecting a DecSerFlow predicate and two activities in a
uniform distribution. Each experiment records the time needed for an algorithm to complete on an input schema. In order to collect more accurate results, each experiment is repeated 1000 times to obtain an average time result with the same #A and the same #C for schemas having #A < 200, 100 times for schemas having #A ∈ [200, 400), and 10 times for #A ∈ [400, ∞). The reason for running fewer repetitions for larger #A is that a single algorithm execution with large #A takes minutes to hours, which makes it impractical to run 1000 times. We now report the findings.
The automata approach is exponentially more expensive than syntactic conditions
We compared the time needed for the automata and syntactic condition approaches on
checking the same set of schemas that contain only ordering and alternating constraints.
(For the other three types of constraint combinations, the results are similar.) The input schemas have n activities and either n, n/2, or 2n/3 constraints, where n ranges from 4 to 28.
Fig. 4 shows the results (x-axis denotes the number of activities and y-axis denotes the
time needed in the log scale). It can be observed that for the automata approach, the time
needed is growing exponentially wrt the number of activities/constraints. For a schema
with 28 activities and 28 constraints, it takes more than 3 hours to finish the checking.
However, the syntactic condition approaches (whose complexity is polynomial) can
finish the conformance checking almost instantly. As the times needed for n, n/2, and 2n/3 constraints are all very close (around 1 ms), we use only one curve (instead of three) in Fig. 4 to represent the result of the syntactic condition approach.
Fig. 7. String Generation Fig. 8. Str. Gen. / Checking Fig. 9. Changing #Constraints
The syntactic condition approaches have at most a cubic growth rate in the size of
the input schemas
We measured the times needed by the syntactic condition approaches for input schemas with n activities and n constraints, with n between 50 and 500. Figs. 5 and 6 show the same result on linear and logarithmic scales (respectively) for all four combinations of constraints.
From the result, the complexity of the syntactic condition approach for alternating and
immediate constraints appears cubic due to the checking of Condition (4) of Definition
3.12 (collapsable); the complexity for ordering and immediate constraints is quadratic
due to the pre-processing to form an im+ schema; the complexity for ordering and al-
ternating constraints is linear as the pre-processing (to form an al+ schema by detecting
strongly connected components) as well as the acyclicity check of the causality graphs
are linear; finally, the complexity for the constraints of a single direction is also linear.
Conforming string generation requires polynomial to exponential time
With the same experiment setting as above, Fig. 7 shows the time to generate a conform-
ing string for a conformable schema. From the results, all string generation approaches are polynomial except for the single-direction case (i.e., either response or precedence).
According to Alg. 3, the length of a generated string can be as long as 2n , where n is
the number of activities in the given schema. Fig. 8 presents the ratios of the time to
generate a conforming string over the time to check conformance of the same schema
for conformable schemas. The results indicate that the complexity to generate a string
can be polynomially lower (ordering and immediate case), the same (alternating and
immediate case), polynomially higher (ordering and alternating case), and exponen-
tially higher (single direction case) than the corresponding complexity to check con-
formance of the same schema. Note that the curves in Fig. 8 are lower (or "smaller") than dividing "Fig. 7" by "Fig. 5" because the data shown in Fig. 7 is only for conformable schemas, while Fig. 5 covers general schemas; non-conformable schemas can be determined 5–15% faster than conformable ones, since a non-conformable schema fails the check as soon as one condition is violated (e.g., in Theorem 3.5, there are three conditions to check), whereas a conformable schema passes the check only after all conditions are verified.
Increasing the number of constraints increases the time of the automata approach more than that of the syntactic condition approaches
We compute the time needed for the syntactic condition approaches with input schemas
containing only ordering and immediate constraints with n activities and either n, 2n,
or n/2 constraints, where n ranges from 50 to 500. (For the other three types of constraint combinations, the results are similar.) Fig. 9 shows the three curves for n, 2n, and n/2 constraints, respectively. Compared with the similar settings shown in Fig. 4, there is no obvious growth in time when the number of constraints grows, and the curves are almost identical. The reason is that the algorithms used to check conformance and generate strings are graph-based. As #C ∈ [#A/2, 2#A], we have O(#C) = O(#A), which yields the same complexity. Moreover, if #C < #A/2, there will be activities involved in no constraint, which is not a practical setting; if #C > 2#A, almost all the randomly generated schemas will be non-conformable under a uniform distribution.
5 Related Work
The work reported here is a part of the study on collaborative systems and choreography
languages [16]. The constraint language studied is a part of DecSerFlow [2], whose
constraints can be translated to LTL [13].
The original LTL [13] is defined over infinite sequences. [15] proved that LTL satisfiability checking is PSPACE-complete. A well-known result in [17] shows that LTL is equivalent to Büchi automata, so LTL satisfiability checking can be translated to language emptiness checking. Several complexity results on satisfiability have been developed for subsets of LTL. [5] shows that restricting to Horn formulas does not decrease the complexity of
satisfiability checking. [6] investigates the complexity of cases restricted by the use of
temporal operators, their nesting, and number of variables. [4] and [3] provide upper and
lower bounds for different combinations of both temporal and propositional operators.
[7] establishes the tractability of LTL restricted to combinations of "XOR" clauses.
For the finite semantics, [8] studies the semantics of LTL upon truncated paths. [10]
provides an exponential-time algorithm to check if a given LTL formula can be satisfied
by a given finite-state model, but the execution is still infinite.
Business process modeling has been studied extensively in the last decade ([9,1]). Previous studies of declarative models focus mostly on formal verification of general properties involving data; such verification problems generally have exponential or higher time complexity (see [9]).
6 Conclusions
This paper studied syntactic characterization of conformance for “core” DecSerFlow
constraints that are reduced from general DecSerFlow constraints. We provided char-
acterizations for (1) ordering and immediate constraints, (2) ordering and alternating
constraints, (3) alternating and immediate constraints, and (4) ordering, alternating, and
immediate constraints with precedence (or response) direction only. The general case
for ordering, immediate, and alternating constraints with both precedence and response
directions remains an open problem; furthermore, it is unclear whether the conformance
problem for DecSerFlow constraints is in PTIME.
References
1. van der Aalst, W.M.P.: Business process management demystified: A tutorial on models, sys-
tems and standards for workflow management. In: Desel, J., Reisig, W., Rozenberg, G. (eds.)
Lectures on Concurrency and Petri Nets. LNCS, vol. 3098, pp. 1–65. Springer, Heidelberg
(2004)
2. van der Aalst, W.M.P., Pesic, M.: DecSerFlow: Towards a Truly Declarative Service Flow
Language. In: Bravetti, M., Núñez, M., Zavattaro, G. (eds.) WS-FM 2006. LNCS, vol. 4184,
pp. 1–23. Springer, Heidelberg (2006)
3. Artale, A., Kontchakov, R., Ryzhikov, V., Zakharyaschev, M.: The complexity of clausal
fragments of LTL. In: McMillan, K., Middeldorp, A., Voronkov, A. (eds.) LPAR-19 2013.
LNCS, vol. 8312, pp. 35–52. Springer, Heidelberg (2013)
4. Bauland, M., Schneider, T., Schnoor, H., Schnoor, I., Vollmer, H.: The complexity of gen-
eralized satisfiability for linear temporal logic. In: Seidl, H. (ed.) FOSSACS 2007. LNCS,
vol. 4423, pp. 48–62. Springer, Heidelberg (2007)
5. Chen, C.C., Lin, I.P.: The computational complexity of satisfiability of temporal horn formu-
las in propositional linear-time temporal logic. Inf. Proc. Lett. 45(3), 131–136 (1993)
6. Demri, S., Schnoebelen, P., Demri, S.E.: The complexity of propositional linear temporal
logics in simple cases. Information and Computation 174, 61–72 (1998)
7. Dixon, C., Fisher, M., Konev, B.: Tractable temporal reasoning. In: Proc. International Joint
Conference on Artificial Intelligence (IJCAI). AAAI Press (2007)
8. Eisner, C., Fisman, D., Havlicek, J., Lustig, Y., McIsaac, A., Van Campenhout, D.: Reasoning
with temporal logic on truncated paths. In: Hunt Jr., W.A., Somenzi, F. (eds.) CAV 2003.
LNCS, vol. 2725, pp. 27–39. Springer, Heidelberg (2003)
9. Hull, R., Su, J., Vaculín, R.: Data management perspectives on business process manage-
ment: tutorial overview. In: SIGMOD Conference, pp. 943–948 (2013)
10. Lichtenstein, O., Pnueli, A.: Checking that finite state concurrent programs satisfy their linear
specification. In: POPL, pp. 97–107 (1985)
11. Pesic, M., Schonenberg, H., van der Aalst, W.M.P.: Declare: Full support for loosely-
structured processes. In: EDOC, pp. 287–300 (2007)
12. Pichler, P., Weber, B., Zugal, S., Pinggera, J., Mendling, J., Reijers, H.A.: Imperative versus
declarative process modeling languages: An empirical investigation. In: Daniel, F., Barkaoui,
K., Dustdar, S. (eds.) BPM Workshops 2011, Part I. LNBIP, vol. 99, pp. 383–394. Springer,
Heidelberg (2012)
13. Pnueli, A.: The temporal logic of programs. In: FOCS, pp. 46–57 (1977)
14. Silva, N.C., de Carvalho, R.M., Oliveira, C.A.L., Lima, R.M.F.: REFlex: An efficient web
service orchestrator for declarative business processes. In: Basu, S., Pautasso, C., Zhang, L.,
Fu, X. (eds.) ICSOC 2013. LNCS, vol. 8274, pp. 222–236. Springer, Heidelberg (2013)
15. Sistla, A.P., Clarke, E.M.: The complexity of propositional linear temporal logics. J.
ACM 32(3), 733–749 (1985)
16. Sun, Y., Xu, W., Su, J.: Declarative choreographies for artifacts. In: Liu, C., Ludwig, H.,
Toumani, F., Yu, Q. (eds.) Service Oriented Computing. LNCS, vol. 7636, pp. 420–434.
Springer, Heidelberg (2012)
17. Vardi, M.Y., Wolper, P.: An automata-theoretic approach to automatic program verification
(preliminary report). In: LICS, pp. 332–344 (1986)
18. Xu, W., Su, J., Yan, Z., Yang, J., Zhang, L.: An Artifact-Centric Approach to Dynamic
Modification of Workflow Execution. In: Meersman, R., Dillon, T., Herrero, P., Kumar, A.,
Reichert, M., Qing, L., Ooi, B.-C., Damiani, E., Schmidt, D.C., White, J., Hauswirth, M.,
Hitzler, P., Mohania, M. (eds.) OTM 2011, Part I. LNCS, vol. 7044, pp. 256–273. Springer,
Heidelberg (2011)
Integrating On-policy Reinforcement Learning
with Multi-agent Techniques for Adaptive
Service Composition
1 Introduction
With SOC (Service-Oriented Computing) as the mainstream paradigm, research on theories of service composition and on related technologies for the seamless integration of business applications remains a core concern. However, large-scale service composition faces a multitude of thorny issues, such as accuracy, interoperability, efficiency, and adaptability for practical use, when there are massive numbers of services with similar functionality in a highly dynamic environment.
Provided that a service composition is valid, its efficiency, adaptability, and optimality in large-scale and dynamic scenarios are especially significant. First of all, both the complexity of business flows and the number of candidate services may affect the efficiency of service orchestration. Secondly,
how to adapt to the services’ internal changes and external dynamic environment
is a grand challenge. Furthermore, how to achieve the optimal aggregated QoS
should also be taken into consideration. Therefore, a novel method should be
proposed to obtain a good balance between those objectives.
Previous studies mainly focus on integer programming, graph planning, rein-
forcement learning (RL) and so on. Ardagna et al. [1] modelled the QoS infor-
mation of candidate services by a multi-channel framework, and then utilized
Mixed Integer Programming (MIP) to obtain the optimal solution. However,
this method only performs well for small-scale problems, and the computing
resource consumption may become prohibitive when faced with large-scale sce-
narios. Beauche et al. [2] used a hierarchical planning approach based on graph
planning and hierarchical task networks to construct adaptive service compo-
sition. However, continuous emergence and demise of services lead to sustained
search of viable services for updating the corresponding planning graph, which is
not suitable for a highly dynamic environment. Jureta et al. [7] proposed a multi
criteria-driven reinforcement learning algorithm to ensure that the system is re-
sponsive to availability changes of Web services. In our earlier work [20], we proposed an adaptive service composition approach based on reinforcement learning combined with logic preferences. Despite the effectiveness of conventional reinforcement learning in achieving adaptability, such methods cannot ensure high efficiency in large-scale and complex scenarios.
As a subdiscipline of distributed artificial intelligence (DAI) [15], multi-agent
techniques have arisen as a viable solution for modularity, more computing
power, scalability and flexibility required by service composition [16]. Some re-
searchers have already applied multi-agent techniques to service composition.
Maamar et al. [11] proposed a web service composition method based on multi
agents and context awareness. Gutierrez-Garcia et al. [5] characterized behavior
of the services with Colored Petri-net, and exploited multi-agent techniques for
services orchestration in the context of cloud computing. Unfortunately, those
methods seldom take self-adaptivity into consideration.
Given the respective strengths of RL and multi-agent technologies, a natural idea for achieving self-adaptability in a dynamic environment while maintaining acceptable efficiency with massive numbers of candidate services is to combine them, which has already been discussed in the field of DAI under the name multi-agent reinforcement learning (MARL) [15]. On the one hand, RL is a commonly used
machine learning method for planning and optimization in a dynamic environ-
ment [18], which learns by trial-and-error interaction with dynamic environment
and thus has good self-adaptability. On the other hand, multi-agent technology
can compensate for inefficiencies under large-scale and complex scenarios.
In this paper, we propose a new adaptive model that is built upon MARL.
Different from previous work, this new model is based on team Markov Games,
which is more mature and generic for service composition in a multi-agent sce-
nario. To tackle the common problems of agent coordination and equilibrium selection that arise in a multi-agent environment, we introduce the coordination equilibrium and a fictitious play process to ensure that the agents converge to a
unique equilibrium when faced with multiple equilibria. Finally, we propose the multi-agent Sarsa algorithm for our multi-agent service composition.
Our contributions are summarized as follows:
– We introduce a TMG-WSC model for service composition with massive can-
didate services in a highly dynamic and complex environment.
– We propose a multi-agent Sarsa algorithm to adapt to the multi-agent service
composition scenarios and achieve a better performance.
– We present the concept of multi-agent service composition that caters for
the distributed environment and big data era.
The remainder of this paper is organized as follows. Section 2 compares our
approach against some related works. Section 3 introduces the problem formu-
lation and basic definitions. Section 4 presents our approach for service compo-
sition based on MARL. In section 5, some experimental results are presented for
evaluating the proposed approach. Finally, the paper is concluded in Section 6.
2 Related Work
In this section, we review some existing works that are most relevant to our
approach, including RL and agent techniques adopted in service composition.
Moustafa et al. [13] proposed an approach to the QoS-aware service composition problem using multi-objective reinforcement learning, but the method is not very efficient for large-scale service composition scenarios. Our prior work [20] suffers from the same issue as the preceding method.
Xu et al. [22] proposed a multi-agent learning model for service composition,
based on the Markov game and Q-learning with a hierarchical goal structure
to accelerate the search of states during the learning process. However, their model may not work well for a complicated goal with many mutual dependencies between sub-goals, as their agents are fixed to certain service classes. We previously proposed a multi-agent learning model [19] based on MDP and knowledge sharing; however, it cannot be regarded as a true multi-agent framework, as the MDP is designed for a single agent and does not take potential collaboration between agents into consideration.
MARL has strong connections with game theory [4], because the relation be-
tween agents has a great impact on the design of learning dynamics. According
to Claus and Boutilier [4], the MARL can be classified into two forms. The first
one is independent learners (ILs), which just apply RL methods (Q-learning,
Sarsa etc.) and ignore the existence of other agents. The second one is joint
action learners (JALs), which learn their actions in conjunction with others via
integration of RL with a certain kind of Nash equilibrium, just like the coor-
dination equilibrium [3,4]. Consequently, agent coordination and equilibrium selection are the key issues in MARL for JALs. Wang et al. [21] proposed an algorithm that is guaranteed to converge to an optimal equilibrium, but its high computational cost limits its practical use.
In this paper, we integrate on-policy reinforcement learning with multi-agent
techniques for services composition. The proposed approach is fundamentally
3 Problem Formulation
Markov Games are called team Markov Games when all agents strive for a common goal and thus share a common reward function. Here we adopt team Markov Games, as all agents work for a common service workflow.
However, Markov Games cannot directly replace the MDP model for multi-agent service composition, because some differences arise when trying to transfer
some concepts in the MDP-WSC model [20] to the new multi-agent environment.
For example, in MDP-WSC, there is only one learning agent, which always
starts from the initial state. If it finally reaches the terminal state, it can get a
full path from the initial state to the terminal state according to its trajectory.
Unfortunately, the situation is much more complicated in the multi-agent scenario, as there is a group of learning agents and each one starts from a randomly chosen state instead of the fixed initial state of the MDP-WSC model. So even if some agent has reached one of the terminal states, we cannot claim that the team has completed the current learning episode and obtained the full path, because this "lucky" agent may not have started from the initial state, and consequently what it has traversed is only part of the whole path. To handle this problem, we need to introduce some new concepts that fit the multi-agent scenario.
[Figure: example state graph with action sets A(S0) = {Vacation time}, A(S1) = {Vacation place}, A(S2) = {Flight, Train, Ship}, A(S3) = {Airfare}, A(S4) = {Interchange, Ship fare}, A(S5) = {Train fare}, A(S6) = {Train}, A(S7) = {Luxury Hotel, Budget Hotel}, A(S8) = {Luxury Cost}, A(S9) = {Budget Cost}.]
Definition 6 (Passed State Set). The set Sp is a passed state set iff Sp contains all the states that agents in the team have passed by.
Using the Passed State Set, we can perform a back trace from the terminal state and check whether it stretches back to the initial state. Next, we propose our multi-agent model for service composition, called TMG-WSC, which is based on Team Markov Games (TMG) and the new concepts introduced above.
[Figure: TMG-WSC example with three agents at the joint state s = S0 × S2 × S7 and the joint action A = A(S1) × A(S3) × A(S8) = {Vacation Place, Airfare, Luxury Cost}.]
4.1 SARSA
Compared with off-policy learning methods like Q-learning, on-policy learning methods have an advantage in online performance, since the estimation policy, which is iteratively improved, is also the policy used to control the agent's behavior [17].
Sarsa is a classic on-policy reinforcement learning method. The task of the
learner in Sarsa is to learn a policy that maximizes the expected sum of reward.
The cumulative reward starting from an arbitrary state st and following a policy
π is defined as Eq.1, where rt+i is the expected reward in each step, and γ is a
discount factor.
V^π(s_t) = r_t + γ·r_{t+1} + γ²·r_{t+2} + ... = Σ_{i=0}^{∞} γ^i·r_{t+i}    (1)
Based on Eq.1, we can deduce the reward of action pair < st , at >, that is, the
feedback of executing action at at state st , which is defined as Eq.2, where st+1
is the resulting state by executing at and P (st+1 |st , at ) is the probability distri-
bution, r(st , at ) represents the immediate reward of taking action at at state st ,
which is defined as Eq.3.
Q(s_t, a_t) = r(s_t, a_t) + γ · Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) · V^π(s_{t+1})    (2)
r(s_t, a_t) = Σ_{i=1}^{m} w_i · (Att_i^{a_t} − Att_i^{min}) / (Att_i^{max} − Att_i^{min})    (3)
In Eq. 3, Att_i^{a_t} represents the observed value of the i-th attribute of the service corresponding to the executed action a_t, and Att_i^{max}, Att_i^{min} represent the maximum and minimum values of Att_i over all services. w_i is the weighting factor: w_i is positive if users prefer Att_i to be high (e.g., reliability), and negative if they prefer Att_i to be low (e.g., service fee). m is the number of QoS attributes.
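As a minimal worked example of Eq. 3, the following Python sketch computes the immediate reward of an action as a weighted sum of min–max normalised QoS attribute values; the attribute names, bounds, and weights below are invented for illustration.

def immediate_reward(observed, bounds, weights):
    # r(s_t, a_t) = Σ_i w_i · (Att_i^{a_t} − Att_i^min) / (Att_i^max − Att_i^min)   (Eq. 3)
    reward = 0.0
    for att, value in observed.items():
        lo, hi = bounds[att]                 # min and max of the attribute over all services
        reward += weights[att] * (value - lo) / (hi - lo)
    return reward

# Hypothetical attributes: reliability (higher is better, w > 0), fee (lower is better, w < 0).
r = immediate_reward({"reliability": 0.95, "fee": 3.0},
                     {"reliability": (0.5, 1.0), "fee": (1.0, 10.0)},
                     {"reliability": 0.7, "fee": -0.3})
print(round(r, 3))   # ≈ 0.563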
The Q function represents the best possible cumulative reward of executing
at at st . We can run dynamic programming (value iteration) by performing the
Bellman back-ups in terms of the Q function as follows:
Q(s_t, a_t) = r(s_t, a_t) + γ · Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) · Q(s_{t+1}, a_{t+1})    (4)
R(s_t, a_t) = Σ_{i=1}^{n} Σ_{j=1}^{m} w_{ij} · (Att_{ij}^{a_t} − Att_{ij}^{min}) / (Att_{ij}^{max} − Att_{ij}^{min})    (6)
Based on Eq. 6, we can plug it in and rewrite Eq. 5 in a multi-agent form:
Q_{i1,i2,...,in}(s_t, a_t) ← (1 − α) · Q_{i1,i2,...,in}(s_t, a_t) + α · [R(s_t, a_t) + γ · Q_{i1,i2,...,in}(s_{t+1}, a_{t+1})]    (7)
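A minimal sketch of the on-policy update in Eq. 7, assuming the joint Q-values are stored in a dictionary keyed by (joint state, joint action); the parameter values and the next joint state/action are hypothetical.

from collections import defaultdict

Q = defaultdict(float)            # Q_{i1,...,in}(s, a), keyed by (joint_state, joint_action)
alpha, gamma = 0.1, 0.9           # example learning rate and discount factor

def multi_agent_sarsa_update(s, a, reward, s_next, a_next):
    # Eq. 7: Q(s, a) <- (1 - α)·Q(s, a) + α·[R(s, a) + γ·Q(s', a')]
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (reward + gamma * Q[(s_next, a_next)])

# One learning step with the joint state/action from the earlier figure
# (the next joint state and joint action are invented for illustration).
multi_agent_sarsa_update(("S0", "S2", "S7"), ("Vacation Place", "Airfare", "Luxury Cost"),
                         0.56, ("S1", "S3", "S8"), ("Flight", "Airfare", "Budget Cost"))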
For many Markov games, there is no policy that is undominated, because the performance depends critically on the behavior of the other agents. How, then, can we define a deterministic optimal policy in this case? A natural idea from the game-theory literature is to define an agent's optimal behavior as its behavior at a Nash equilibrium. Some researchers, such as Littman, have already done such work in the field of MARL [6,9,10]. Here we adopt Littman's idea and give the definition of the multi-agent optimal policy as follows:
π*(s, a_1, ..., a_n) = Σ_{a_1,...,a_n} π_1(s, a_1) · ... · π_n(s, a_n) · Q(s, a_1, ..., a_n)
Pri_{a_j} = C^j_{a_j} / Σ_{b_j ∈ A_j} C^j_{b_j}    (9)
Dov Monderer gives the definition of Fictitious Play Property and also proves
the following theorem in his work [12].
Definition 8 (Fictitious Play Property). A game has the fictitious play property (FPP) if every fictitious play process converges in beliefs to equilibrium.
Theorem 1: Every game with identical payoff functions has the fictitious play property.
In view of Theorem 1, we can deduce that a team game in which the agents have common interests has the fictitious play property. Hence, the fictitious play process can be applied in Team Markov Games and helps the agents converge to a unique equilibrium despite the existence of multiple equilibria.
To improve the efficiency of the fictitious play process, Young [23] proposed
an optimized version and proved its validity. Based on it, we construct a new
function that combines the Q-value and the fictitious play process for estimating the cumulative reward of a joint action in TMG-WSC. It is defined as follows:
WEQ(s, a_i) = Σ_{A_j ∈ Ψ(s, a_{-i}), 1 ≤ j ≤ n, j ≠ i} (K_t^m(A_j) / k) · Q_{i,j}(s)    (10)
K_t^m(A_j)/k is a probability model for agent i at the joint state s, based on the fictitious play process. t is the number of times state s has been visited. a_i is the action chosen by the i-th agent. m is the length of the queue that stores the reduced joint actions a_{-i} of agent i's opponents in chronological order. Ψ(s, a_{-i}) is the best response for agent i's opponents' joint action at state s.
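The following Python sketch is one plausible reading of Eq. 10 under simplifying assumptions: the probability model K_t^m(A_j)/k is approximated by normalised counts of the opponents' recently observed joint actions (in the spirit of Eq. 9), and the best-response set Ψ(s, a_{-i}) is supplied precomputed. All container shapes and names are illustrative, not the paper's definitions.

def weq(state, own_action, best_responses, opponent_counts, q_values):
    # WEQ(s, a_i) ≈ Σ_{A_j ∈ Ψ(s, a_{-i})} (count(A_j) / total) · Q(s, a_i, A_j)
    total = sum(opponent_counts.values()) or 1
    return sum((opponent_counts.get(a_minus_i, 0) / total) *
               q_values.get((state, own_action, a_minus_i), 0.0)
               for a_minus_i in best_responses)

# Hypothetical two-opponent example at some joint state s.
value = weq("s", "Airfare",
            best_responses=[("Flight", "Luxury Cost")],
            opponent_counts={("Flight", "Luxury Cost"): 3, ("Train", "Budget Cost"): 1},
            q_values={("s", "Airfare", ("Flight", "Luxury Cost")): 0.8})
print(value)   # 0.6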
Finally, we need a learning policy for the learner to follow in TMG-WSC during learning. Here we choose the Boltzmann learning policy, as it can better characterize our coordination mechanism and equilibrium selection technique. The Boltzmann exploration used here can be depicted as follows, where T is the temperature parameter, T = T0 · (0.999)^c, T0 is an initial value, and c is the number of times the learner has been in state s_t:
π(s_t, a) = e^{WEQ(s_t, a)/T} / Σ_{b∈A} e^{WEQ(s_t, b)/T}    (11)
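A short sketch of the Boltzmann exploration described above, assuming a WEQ estimate is available for every candidate action and using the stated cooling schedule T = T0 · 0.999^c; the action names and values are illustrative.

import math, random

def boltzmann_choice(actions, weq_values, T):
    # Select an action with probability proportional to exp(WEQ(s, a) / T).
    prefs = [math.exp(weq_values[a] / T) for a in actions]
    total = sum(prefs)
    r, acc = random.random() * total, 0.0
    for a, p in zip(actions, prefs):
        acc += p
        if r <= acc:
            return a
    return actions[-1]

T0, c = 1.0, 50                               # example initial temperature and visit count
T = T0 * (0.999 ** c)
print(boltzmann_choice(["Flight", "Train", "Ship"],
                       {"Flight": 0.6, "Train": 0.4, "Ship": 0.1}, T))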
Algorithm 1. Multi-agent Sarsa based on TMG-WSC
Initialization: Q_{i1,i2,...,in}(s_t, a_t)
repeat  // for each episode
    each agent chooses an action a_i (i = 1, 2, ..., n) based on Eq. 11,
    and they form the joint action a_t = a_1 × a_2 × ... × a_n;
    repeat  // for each step of the episode
        1. On-policy learning:
            take joint action a_t, observe R and s_{t+1}; each agent chooses an action based on Eq. 11,
            and they form the joint action a_{t+1} = a_1 × a_2 × ... × a_n;
            Q(s_t, a_t) ← (1 − α) · Q(s_t, a_t) + α · [R + γ · Q(s_{t+1}, a_{t+1})];
            s_t ← s_{t+1}, a_t ← a_{t+1};
        2. Terminal-condition check:
            if s_t is a possible terminal state, s_t = s_1 × s_2 × ... × s_n then
                create a set named Temp, Temp = {s_t};
                create a set named Prev containing all the previously passed states of any element in Temp;
            end if
            while Sp ∩ Temp ≠ ∅ and s_0 ∉ Prev do
                Temp ← Sp ∩ Temp;
                Prev ← all the previous states of any element in Temp;
            end while
            if Prev contains s_0 then
                the current episode is ended;
            end if
    until the current episode is ended
until the cumulative reward matrix converges
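The terminal-condition check in Algorithm 1 can be read as a backwards reachability test over the Passed State Set (Definition 6): starting from the components of a candidate terminal joint state, trace predecessors through states the team has already passed and test whether the initial state s0 is reached. The Python sketch below is one plausible reading of that check rather than a literal transcription; the predecessor map and state names are illustrative.

def episode_finished(terminal_components, passed_states, predecessors, s0):
    # terminal_components: the component states of the candidate terminal joint state
    # passed_states:       the Passed State Set Sp
    # predecessors:        {state: set of states immediately preceding it}
    frontier = set(terminal_components) & passed_states
    visited = set(frontier)
    while frontier:
        prev = {p for s in frontier for p in predecessors.get(s, ())}
        if s0 in prev:
            return True                       # a full path back to s0 exists
        frontier = (prev & passed_states) - visited
        visited |= frontier
    return False

# Hypothetical check: the team has passed S0, S2, S7, S9 and one agent reached terminal S9.
print(episode_finished({"S9"}, {"S0", "S2", "S7", "S9"},
                       {"S9": {"S7"}, "S7": {"S2"}, "S2": {"S0"}}, "S0"))   # True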
Fig. 3. (a) Effectiveness and Efficiency Comparison (Multi-Sarsa, Single-S, Single-Q); (b) Different number of services (2000, 3000, and 4000 services). Both panels plot the discounted cumulative reward against episodes.
2. Scalability
The purpose of the second experiment set is to assess the scalability of the
proposed Multi-Sarsa algorithm. We probe the influence of the number of services, state nodes, and agents, respectively.
Firstly, we vary the number of services per state node from 2000 to 4000 while fixing the number of agents at 4 and the number of state nodes at 100. From Fig. 3(b), we see that an increasing number of candidate services per state node may postpone convergence. In the 2000-service scenario, Multi-Sarsa converges at about the 4200th episode, while it converges at about the 4500th episode with 3000 services and about the 4700th episode with 4000 services. However, increasing the number of services does not necessarily imply a corresponding improvement of service quality, so the rewards may be higher or lower. In short, Multi-Sarsa always converges within an acceptable time despite the vast number of candidate services.
Fig. 4. (a) Different number of state nodes (200, 300, and 400 states); (b) Different number of agents (8, 12, and 16 agents). Both panels plot the discounted cumulative reward against episodes; panel (a) annotates deviations of 15.5% and 22.88% from the optimal.
[Figure: discounted cumulative reward over episodes under service change rates of 1%, 5%, and 10%.]
Secondly, we fix the number of agents at 4 and the number of services per state node at 1000, and vary the number of state nodes from 200 to 400. As shown in Fig. 4(a), a larger number of state nodes corresponds to a higher optimal convergence value and a slower convergence rate. In the 200-state-node scenario, Multi-Sarsa converges at about the 4100th episode with a reward of 33.8, and in the 400-state-node scenario it converges at about the 4500th episode with a reward of 61.7. Furthermore, we calculate the deviation of the current convergence reward from the optimal convergence reward in the different scenarios by D = (OPR − CCR)/OPR, where D represents the deviation degree, OPR denotes the optimal convergence reward, and CCR is the current convergence reward. As can be seen from Fig. 4(a), D is 17.17% with 300 state nodes and 22.88% in the 400-state-node scenario. That is to say, an increasing number of state nodes may aggravate the deviation from optimality and lead to local optima. Hence, we conclude that Multi-Sarsa scales acceptably as the number of state nodes increases.
Finally, we consider the effect of the number of agents. We set the number of state nodes to 100 and the number of services per state node to 1000. From Fig. 4(b), we see that the more agents are involved, the more thoroughly the space is explored; consequently the discounted cumulative reward is noticeably higher in the 12- and 16-agent scenarios. However, an increasing number of agents brings another severe problem, namely the communication overhead of the fictitious play process; thus the 16-agent setting does not perform better than the 12-agent setting. In brief, 12 agents may be a good compromise for Multi-Sarsa: increasing the number of agents does not necessarily lead to an improvement in efficiency, and the communication overhead between agents must be considered as an important factor.
References
1. Ardagna, D., Pernici, B.: Adaptive service composition in flexible processes. IEEE
Transactions on Software Engineering 33(6), 369–384 (2007)
2. Beauche, S., Poizat, P.: Automated service composition with adaptive planning. In:
Bouguettaya, A., Krueger, I., Margaria, T. (eds.) ICSOC 2008. LNCS, vol. 5364,
pp. 530–537. Springer, Heidelberg (2008)
2
This work is partially supported by NSFC Key Project (No.61232007) and Doctoral
Fund of Ministry of Education of China (No.20120092110028).
An Agent-Based Service Marketplace for Dynamic and Unreliable Settings
Lina Barakat, Samhar Mahmoud, Simon Miles, Adel Taweel, and Michael Luck
1 Introduction
not possible) to guarantee specific quality values for a service, because of their
dependency on various run-time factors. For instance, the service response time
at any particular moment could be significantly affected by the provider load
and network traffic at that moment. Such dynamism and uncertainty can lead
to highly undesirable situations during service execution (e.g. unfulfilled quality
promises), and may demand costly corrective actions.
Consequently, as an attempt to minimise quality deviations of services at ex-
ecution time, a number of efforts focus on providing more accurate estimation
of service quality values, based on the available information regarding their past
performance [9–13]. Specifically, assessing a quality attribute for a service is typ-
ically performed by applying some aggregation measure (e.g. a time-weighted
average) to the previously observed values, which are obtained as feedback from
service users, or from service-side monitors. Such a single-valued quality estima-
tion model, however, does not capture the uncertainty in the service’s quality
values, and might produce inaccurate or invalid quality predictions, especially for
attributes with high variance in values. For example, assume the values encoun-
tered in the past regarding the learning time attribute of a knowledge service
are 10, 10, 10, 60, 60, 60 (minutes). Estimating the mean of these values would
produce an expected value of 35 minutes, an imprecise indication of the at-
tribute’s actual outcome. Moreover, such a model is only limited to quantitative
attributes, without the ability to accommodate qualitative cases.
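To make the contrast concrete, the short Python sketch below compares the single-valued mean estimate with a simple empirical distribution over the observed learning-time values from the example above; the distribution is only in the spirit of the probabilistic model introduced next, not its exact definition.

from collections import Counter

observations = [10, 10, 10, 60, 60, 60]                 # learning time in minutes

# Single-valued estimate: the mean suggests 35 minutes, a value never actually observed.
mean_estimate = sum(observations) / len(observations)

# Multi-valued estimate: the probability of each observed outcome.
counts = Counter(observations)
distribution = {v: c / len(observations) for v, c in counts.items()}

print(mean_estimate)     # 35.0
print(distribution)      # {10: 0.5, 60: 0.5}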
In response, this paper proposes a probabilistic multi-valued quality estima-
tion model, applicable to both numeric and categorical attributes. It captures
uncertainty in quality values by augmenting these values with reliability scores,
allowing more informative reasoning about the various potential quality out-
comes of a service, thus enabling more reliable and proactive service selection.
The responsibility of instantiating such quality models for services is distributed
among a number of learning-enabled software agents, applying online learning
information based on collected service ratings after each interaction with the
service. The ratings can be collected either directly from consumers via feedback
interfaces, or automatically via appropriate monitors residing at the service side
or over the network. Note that we assume in this paper that the ratings are
honest and objective (false ratings can be handled through appropriate filtering
and reputation mechanisms [15], but this is out of the scope of this paper).
In what follows, we first outline the traditional QoS model of services (the
model corresponding to provider advertisements), and then focus on modelling
the service agent, including an improved QoS model, augmenting the traditional
model with reliability information, and a learning algorithm.
value_ag : AN × T → PROB
Fig. 2. Probabilistic attribute value function valueag (a, t) for a learning object
available for discovery through dedicated repositories, where their properties are
described using a standard language (e.g. IEEE LOM1 ).
5 Learning Model
The learning problem of the service agent concerns devising a reliable QoS de-
scription for the service in the presence of uncertainty in the environment, where
service providers might be untrustworthy, and may change their QoS policies
without notification, either intentionally or unintentionally. Such learning is con-
ducted by observing the behaviour of the service over time, and adapting its
description accordingly. Specifically, the cycle of the service agent involves the
following three steps.
(1) Observe. The agent receives new ratings for the service at time step t (e.g.
user feedback after interaction with the service). Let obs(t) = {(a, rating(a, t)) | a ∈ AN} denote such ratings, where function rating(a, t) ∈ dom_d(a) maps quality attribute a of the service to the value observed for a at time step t.
(2) Learn. The agent utilises the new observation history to update the prob-
ability distributions of the quality values of the service, so that the service be-
haviour is more accurately described for future selection. In other words,
where OBS_t = {obs(i)}_{i=1}^{t} are the past observations of the service up to time t,
and function qoslearn corresponds to the agent’s learning algorithm.
(3) Expose. The agent makes the probability distributions, valueag (a, t), of
the service’s quality attributes a ∈ AN , available to discovery applications as
the best generalisation of the behaviour of the service at time step t.
Next, we define the properties that need to be satisfied by the learning function
qoslearn, followed by a learning algorithm achieving these properties.
2.2. else ∀v ∈ dom_d(a), p(a, v, t_0) = 1 / |dom_d(a)|
3. Repeat
3.1. Observe the behaviour of the service, obs(t) = {(a, rating(a, t)) | a ∈ AN}
3.2. Learn a more accurate QoS policy, value_ag(a, t), for each attribute a ∈ AN:
     Q_2(a, t): q(a, v, t) = ∫_a^b (1/(σ√(2π))) · e^{−(x−μ)²/(2σ²)} dx,  if type(a) = CNT
we refer to re-building the model using all the data observed so far, which is
utilised as a baseline in our evaluation.
QoSLearn δ: this is the learning strategy proposed in this paper, utilising a
learning rate δ, with no memory requirement.
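The excerpt above does not reproduce the exact update rule of QoSLearn δ, so the following Python sketch only illustrates the general idea of a learning-rate-based strategy: the estimated distribution is nudged towards each new rating with weight δ, and accuracy is measured by the Hellinger distance h(P,Q) used on the y-axis of the plots (assumed here to follow the standard definition, cf. [3]). Names and values are illustrative.

import math

def learning_rate_update(dist, observed_value, delta):
    # p(a, v, t) <- (1 - δ)·p(a, v, t-1) + δ·1[v = observed rating]   (illustrative rule)
    return {v: (1 - delta) * p + (delta if v == observed_value else 0.0)
            for v, p in dist.items()}

def hellinger(p, q):
    # Hellinger distance between two discrete distributions over the same domain.
    return math.sqrt(0.5 * sum((math.sqrt(p[v]) - math.sqrt(q[v])) ** 2 for v in p))

estimate = {10: 0.5, 60: 0.5}                        # current estimate for an attribute
estimate = learning_rate_update(estimate, 10, 0.01)  # one new rating of 10 minutes
print(hellinger(estimate, {10: 0.5, 60: 0.5}))       # small drift towards the rating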
[Figure 3: Hellinger distance h(P,Q) against time step. Panel (a) compares the learning strategies SlideWindow_all, SlideWindow_100, QoSLearn_0.01, and QoSLearn_0.03; panel (b) shows the Uninstantiated and Untrustworthy settings for |domain| = 5, 15, and 25.]
the observations rating(a, t), for all the time steps t, according to the same
probability distribution valueprov (a, t0 ) = Q(a, t0 ), s.t. ∀t, Q(a, t) = Q(a, t0 )
(i.e. value vi and mean μ for distributions Q1 and Q2 , respectively, remain fixed
for all time steps). Here, we are interested in testing the ability of the proposed
approach to learn probability distribution Q(a, t0 ), in the following two settings:
Untrustworthy Provider, where the provider acts maliciously and advertises false
capability value(a) for attribute a, i.e. value(a) does not correspond to vi in the
case of actual distribution Q1 (see Table 1), and value(a) differs significantly
from μ in the case of actual distribution Q2 (see Table 1); and Uninstantiated
Attribute, where no performance indication regarding attribute a is available by
the provider, i.e. value(a) = undefined.
Figure 3(a) reports the results of the considered learning strategies. As ex-
pected, SlideWindow all is the best performing strategy, with smaller window
sizes achieving lower accuracy due to excluding relevant observations (all obser-
vations remain relevant in a static environment). By setting the learning rate δ
to a small value of 0.01, the proposed learning strategy, QoSLearn 0.01, keeps
the effect of older observations without necessitating their storage, and man-
ages to approximate the performance of SlideWindow all. However, such a small
learning rate causes slower learning at the beginning, achieving an accuracy of
about 0.2 only after 60 observations, compared to SlideWindow all that achieves
similar accuracy after just 15 observations. This initial learning period is fur-
ther highlighted in Figure 3(b), distinguishing the two cases of Untrustworthy
Provider and Uninstantiated Attribute, and varying the size of the attribute’s
domain domd . As can be seen, the effect of misleading providers generally takes
longer to overcome, especially for a larger domain size, requiring a larger number
of samples to accurately learn the actual underlying distribution.
[Figure: Hellinger distance h(P,Q) over time steps 1–901 for learning rates QoSLearn_0.1, 0.05, 0.03, and 0.01 (panel (b)).]
[Figure: Hellinger distance h(P,Q) over time steps 198–398 for learning rates QoSLearn_0.2, 0.1, 0.05, 0.03, and 0.01 (panel (b)).]
faster reactivity after a change, but lower performance in stable periods. In contrast, smaller learning rates improve the accuracy of the learner by capturing enough samples to reflect the current distribution, yet cause slower adaptation to a change, since irrelevant data takes longer to be forgotten.
The results above demonstrate that, for both SlideWindow w and QoSLearn δ, appropriate parameter setting plays an important role in achieving accurate learning of the probability distributions of quality values and depends on the environment dynamism. Moreover, the learning-rate-based strategy performs almost as well as the memory-based one, while achieving considerable savings in storage and computation, especially as the dimensionality and number of services in the marketplace increase. For instance, given a marketplace with 1000 services, each with 10 quality attributes, applying QoSLearn δ instead of SlideWindow 100 in a gradually changing environment eliminates the need to store and iterate over 100 × 10^4 quality ratings at each time step, with the gain increasing in static environments (which require larger window sizes).
7 Related Work
8 Conclusion
The paper presented a probabilistic QoS learning model, tailored towards dy-
namic and untrustworthy service environments, where each service is associated
with a software agent, able to learn, based on past performance information,
the uncertainty degrees regarding the service’s quality outcomes in the form of
probability distributions over such outcomes. The learning is both efficient and
adaptable to various degrees of environment dynamism via an appropriate choice
of the learning rate, which is demonstrated through experimental results.
Future work involves investigating more complex stochastic models for the
dynamic adjustment of the learning rate during the learning process when envi-
ronment dynamics change over time, as well as accommodating the addition of
new quality characteristics. Moreover, we intend to explore the social ability of
software agents (e.g. collaboration among those monitoring services for the same
provider) to improve QoS predictions in the proposed marketplace architecture,
where the role of agents has been limited so far to individual learning.
References
1. Papazoglou, M.P., Traverso, P., Dustdar, S., Leymann, F.: Service-oriented Com-
puting: State of the Art and Research Challenges. Computer 40, 38–45 (2007)
2. Bowling, M., Veloso, M.: Rational and Convergent Learning in Stochastic Games.
In: 17th Int. Joint Conf. on Artificial Intelligence, pp. 1021–1026 (2001)
3. Simpson, D.G.: Hellinger Deviance Tests: Efficiency, Breakdown Points, and Ex-
amples. Journal of the American Statistical Association 84, 107–113 (1989)
4. Zeng, L., Benatallah, B., Ngu, A.H.H., Dumas, M., Kalagnanam, J., Chang, H.:
QoS-aware Middleware for Web Services Composition. IEEE Trans. Softw. Eng. 30,
311–327 (2004)
5. Ardagna, D., Pernici, B.: Adaptive Service Composition in Flexible Processes.
IEEE Trans. Softw. Eng. 33, 369–384 (2007)
6. Canfora, G., Penta, M.D., Esposito, R., Villani, M.L.: QoS-Aware Replanning of
Composite Web Services. In: IEEE Int. Conf. on Web Services, pp. 121–129 (2005)
7. Barakat, L., Miles, S., Poernomo, I., Luck, M.: Efficient Multi-granularity Service
Composition. In: IEEE Int. Conf. on Web Services, pp. 227–234 (2011)
8. Barakat, L., Miles, S., Luck, M.: Efficient Adaptive QoS-based Service Selection. Ser-
vice Oriented Computing and Applications (2013), doi:10.1007/s11761-013-0149-z
9. Amin, A., Colman, A., Grunske, L.: An Approach to Forecasting QoS Attributes
of Web Services Based on ARIMA and GARCH Models. In: IEEE Int. Conf. on
Web Services, pp. 74–81 (2012)
10. Aschoff, R., Zisman, A.: QoS-driven proactive adaptation of service composition.
In: Kappel, G., Maamar, Z., Motahari-Nezhad, H.R. (eds.) Service Oriented Com-
puting. LNCS, vol. 7084, pp. 421–435. Springer, Heidelberg (2011)
11. Dai, Y., Yang, L., Zhang, B.: QoS-driven Self-healing Web Service Composition
Based on Performance Prediction. Journal of Computer Science and Technology 24,
250–261 (2009)
12. Yang, L., Dai, Y., Zhang, B.: Performance Prediction Based EX-QoS Driven Ap-
proach for Adaptive Service Composition. Information Science and Engineering 25,
345–362 (2009)
13. Maximilien, E.M., Singh, M.P.: Agent-based Trust Model Involving Multiple Qual-
ities. In: 4th Int. Joint Conf. on Autonomous Agents and Multiagent Systems, pp.
519–526 (2005)
14. Xu, Z., Martin, P., Powley, W., Zulkernine, F.: Reputation-Enhanced QoS-based
Web Services Discovery. In: IEEE Int. Conf. on Web Services, pp. 249–256 (2007)
15. Vu, L., Hauswirth, M., Aberer, K.: QoS-based Service Selection and Ranking with
Trust and Reputation Management. In: The Cooperative Information System Con-
ference, pp. 446–483 (2005)
16. Malik, Z., Bouguettaya, A.: RATEWeb: Reputation Assessment for Trust Estab-
lishment among Web Services. The VLDB Journal 18, 885–911 (2009)
17. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A Survey on
Concept Drift Adaptation. ACM Computing Surveys 46, 1–37 (2014)
Architecture-Centric Design of Complex
Message-Based Service Systems
1 Introduction
The last two decades have witnessed the emergence of various techniques for com-
posing complex service systems. Composition approaches based on orchestration
languages such as BPEL [12] and YAWL [1] or those based on choreography lan-
guages such as WS-CDL1 share a common assumption on the underlying system
architecture: namely workflow-like control and data flow among services. Not
all application scenarios, however, fit this workflow-centric scheme and hence
existing approaches are cumbersome to apply. A publish-subscribe architecture is a better match for a complex service system which (i) discourages centralized execution control, (ii) consumes and provides data rather than method invocations, (iii) experiences unpredictable service availability, and (iv) must support a dynamically changing number of service instances.
1 Web Services Choreography Description Language, http://www.w3.org/TR/ws-cdl-10/
In this paper we address challenges emerging from the design and configura-
tion efforts of such decentralized, highly decoupled, event-based (composite) ser-
vices. A system architect following a naive approach would specify the individual
(composite) services and wire them up in an ad-hoc manner via message queues.
The resulting message flow might be documented somewhere but the overall
consistency of the ultimately developed services and the deployed message bro-
kers cannot be guaranteed. The ground truth message flow remains implicit in
the configuration of individual services and the utilized message-oriented mid-
dleware (MOM). It is only a matter of time and complexity before the design
and configuration of such a composite system becomes inconsistent. An engineer
engaging in example tasks such as restructuring the message flow, integrating
additional services, deploying additional instances, or adapting services has lit-
tle means to ensure that a particular change leaves the updated system in a
coherent state. Enterprise Application Integration (EAI) patterns [10] guide the
architect in how to structure the overall system but cannot guarantee correct
implementation. Consequently, high costs in time and invested resources are incurred when attempting to maintain consistency, as well as when detecting and repairing inconsistencies.
We propose to address this problem through a combination of architecture-
centric composite service specification, separation of message routing aspects
from local invocation-centric message processing, and architecture-to-configura-
tion transformation. Specifically, our approach applies a component and connec-
tor view for describing the high-level, overall complex service system’s
architecture. The components represent individual, composite services while the
connectors represent message brokers. The resulting centralized system architec-
ture serves as the authoritative source for configuring the MOM and each ser-
vice’s publish/subscribe endpoints. Individual services leverage the advantages
of proven technologies such as Enterprise Service Buses (ESB) and workflow en-
gines for processing messages locally. This cleanly separates the responsibility
of designing the overall, distributed architecture from designing its constituent
components. Constraint checks ensure that the architecture itself is consistent.
Ultimately transformations derive the actual technology configuration automat-
ically from the architecture description and thus guarantee consistency.
In support of this approach, our contribution in this paper is four-fold. We provide (i) a message-centric extension of the Architecture Description Language xADL [5] (Sec. 5.2), (ii) message-centric architecture consistency checking
(Sec. 5.3), (iii) tool support through extension of ArchStudio4 [4] (Sec. 6), and
(iv) proof-of-concept architecture-to-configuration transformations for the Ac-
tiveMQ JMS server and the Mule ESB (Sec. 6.1). We applied our approach and
techniques to an industry case study, demonstrating that our methodology is
not only feasible, but also easily applicable in real world situations (Sec. 7).
2 Motivating Scenario
Fig. 1. Parking Management Complex Service System comprising Data Services, Filter Services, Aggregator Services, and POS Services, as well as message brokers. (Note that icons are meant to depict services and not servers.)
This scenario reflects the challenges from the introduction. The overall system
relies primarily on asynchronous message exchange. There is no single service
that would logically serve as a central point of control. Individual service partic-
ipants may be disconnected or briefly overloaded and thus temporarily unavail-
able. New services may be introduced anytime but must not affect the remaining
service participants’ interaction nor require their extensive reconfiguration. Our
approach aims at preventing (respectively detecting) the following example problems: a Data service publishing its updates to the wrong topic, respectively a
Filter service reading from the wrong topic; an Aggregator service expecting an
incompatible message type from a filter. Multiple POS services using a single,
3 Related Work
Choreography and Orchestration are the two main contemporary paradigms for
addressing design and configuration of complex service systems. Orchestration
languages such as BPEL [12], JOpera [13], or YAWL [1] represent centralized
approaches and thus need a single coordinating entity (i.e., the workflow en-
gine) at runtime. Decentralized orchestration approaches (e.g.,[11,16]) mitigate
this shortcoming through distributing control flow among the participating ser-
vices. While orchestration takes on a single process view including all participat-
ing services, choreography specification languages such as BPEL4Chor [6], Let’s
Dance [17], or MAP [3], on the other hand, aim at a holistic, overarching system
view. Both choreography and orchestration, however, presume a workflow-like
system style, with services playing fixed roles, and being highly available (respec-
tively easily replaceable on the fly). It is rather cumbersome to model complex
service systems that experience dynamically fluctuating service instances, mul-
tiple (a-priori unknown) instances of the same service type, and temporal un-
availability with the languages and approaches outlined above. Our work caters predominantly to systems that more naturally rely on one-way events and less on request/reply-style information exchange. In addition, our approach offers
more flexibility on where to locate and manage coordinating elements by strictly
separating them from services concerned with business logic as well as modeling
them as first class entities. Enterprise Application Integration patterns (EAI)
[10] demonstrate the benefits of message-centric service interaction. Scheibler et
al. [14] provide a framework for executing EAI-centric configurations; however,
by means of a central workflow engine.
At no point are we suggesting that our approach is superior to any of these
existing approaches, methodologies, or technologies. We rather see our work as
focusing on different service system characteristics. We believe that integrating these existing technologies is well worth investigating as part of future work.
This holds also true for existing research efforts that focus on other qualities
than high-level architectural consistency. Work on integrating QoS or resource
allocation is highly relevant but currently not applicable to our scenarios. Such
approaches [7] typically rely on centralized control and/or exclusively employ
the request/reply invocation pattern.
Our work takes inspiration from significant contributions in the software ar-
chitecture domain. Zheng and Taylor couple architecture-implementation con-
formance with change management in their 1.x mapping methodology [18]. 1.x
mapping focuses primarily on maintaining consistency between an architecture
specification and its underlying Java implementation and how changes are propa-
gated from the architect to the software developer. We follow a similar procedure
by separating high-level architectural design and configuration decisions from the
engineers that implement the actual (composite) services.
4 Approach
Our approach to designing and configuring a complex service system consists of
four phases (depicted in Figure 2). First, a high-level architectural component
and connector view identifies the main (composite) services (architecture-level
components), and their interactions via messages (architecture-level interfaces).
Explicit message channels (architecture-level connectors) enable the clear sepa-
ration of interaction concerns from (service) logic concerns [15]. When connectors become first-class model elements, an architect may model connector-specific properties and configurations, simplify N:M links (which become N:1:M), enable interaction monitoring, and so on. In our specific context, the high-level architecture also separates the
responsibility of the overall service system architect from engineers tasked with
the internal design and wiring of the individual services (incl. applied tools such
as ESBs). We apply an existing extensible Architecture Description Language
(xADL) [5] for expressing the high-level architecture. Subsection 5.1 below pro-
vides a short introduction of xADL and its main modeling elements.
At any stage in the architecture modeling, the architect may choose to spec-
ify messaging-specific configuration properties. For connectors, these properties
define messaging middleware-specific details such as channel name, applicable
protocol, or deployment host. For interfaces, these properties include messaging
endpoint related details such as reply channel references, event-centric request
endpoint references, as well as framework-centric properties.
Upon triggering consistency checking, our algorithm iterates through all com-
ponents, connectors, and links that exhibit messaging-specific configurations. It
verifies allowed link cardinalities, missing configuration values, and matching
interface details. For a detailed description of constraints see Subsection 5.3.
For all elements that passed these constraint checks, the system architect may
then trigger architecture-to-configuration transformations. Distinct tool-centric
Figure 2 displays the four phases in a sequential manner. The system architect
and her co-worker, however, will typically progress through these phases in an
iterative manner. An initial configuration may sufficiently serve for checking
overall consistency and for identifying core individual services. This approach
addresses the four properties of complex service systems outlined earlier in the
introduction:
No centralized execution control. The high-level system architecture con-
stitutes a central, authoritative specification only at design-time. The top-
down specification of message infrastructure and message-centric service end-
points ensures that the decentralized elements remain true to the architecture
at runtime.
Publish/Subscribe interaction. The high-level composite services in this sys-
tem are primarily concerned with their business logic and not how many sources
they receive information from or how many destination in turn are interested
in their processed information. Hence, an event-driven interaction style best
reflects this loose coupling.
Unpredictable service availability. The message-oriented architecture ena-
bles reasoning on the effect of unavailable services. As a durable subscriber,
individual services may process at their own pace without affecting simultane-
ous subscribers. Explicit connector modeling also enables reasoning on where to host which channels, further decoupling message routing from processing.
Fig. 3. Simplified xADL schema excerpt including the messaging extensions (dark grey)
To this end, we provide a set of four implementation extensions (see also Figure 3, bottom). The schemas for Channel Implementation and Endpoint Implementation provide general messaging properties, while Mule Implementation and JMS Implementation express technology-specific properties for Mule and ActiveMQ, respectively. The separation into four schemas also reflects the fact that each schema applies only to a particular core architecture element.
Channel Implementation (aka the EAI message channel pattern) applies to a Connector element; it specifies whether the Connector behaves as a publish-subscribe channel or a point-to-point channel and provides the respective name.
6 Tool Support
6.1 Architecture-to-Configuration Transformation
For the purpose of this paper, we focus on two architecture-to-configuration transformations. As an example of configuring a message-oriented middleware, we generate the XML-based ActiveMQ server configuration. A ConnectorType's JMS implementation is sufficient for deriving a server's configuration, which consists of two parts: the Persistence_Configuration determines the persistenceAdapter, and all Transport_Configurations determine the set of available transportConnectors. The transformation also ensures traceability by adding the connector type's id as a comment to the configuration file's broker element. Note that the transformation ignores any connectors and thus does not specify which queues or topics the server will eventually manage, as ActiveMQ creates these on the fly. Ultimately, every connector type with a JMS implementation results in a separate transportConnector element. Configurations for connector types that share a file_id are aggregated into a single configuration file and subsequently end up collocated on the same ActiveMQ server instance.
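For illustration, the following simplified Java sketch captures the gist of this grouping and emission step; the ConnectorType class and its fields are stand-ins for the Apigen-generated xADL data bindings, not the add-on's actual API.

import java.util.*;

// Illustrative sketch only: ConnectorType is a simplified stand-in for the
// Apigen-generated xADL data bindings, not the add-on's actual API.
class ConnectorType {
    String id;                 // architecture-level id, emitted as a comment for traceability
    String fileId;             // connector types sharing a file_id share one broker configuration
    String persistenceAdapter; // e.g. "kahaDB directory=\"activemq-data\""
    String transportUri;       // e.g. "tcp://0.0.0.0:61616"
    ConnectorType(String id, String fileId, String persistenceAdapter, String transportUri) {
        this.id = id; this.fileId = fileId;
        this.persistenceAdapter = persistenceAdapter; this.transportUri = transportUri;
    }
}

public class ActiveMqConfigGenerator {
    /** Returns one ActiveMQ broker configuration per file_id. */
    public static Map<String, String> generate(List<ConnectorType> connectorTypes) {
        Map<String, List<ConnectorType>> byFile = new LinkedHashMap<>();
        for (ConnectorType ct : connectorTypes)
            byFile.computeIfAbsent(ct.fileId, k -> new ArrayList<>()).add(ct);

        Map<String, String> configs = new LinkedHashMap<>();
        byFile.forEach((fileId, types) -> {
            StringBuilder sb = new StringBuilder("<broker xmlns=\"http://activemq.apache.org/schema/core\">\n");
            for (ConnectorType ct : types)      // traceability back to the architecture model
                sb.append("  <!-- derived from connector type ").append(ct.id).append(" -->\n");
            sb.append("  <persistenceAdapter><").append(types.get(0).persistenceAdapter)
              .append("/></persistenceAdapter>\n  <transportConnectors>\n");
            for (ConnectorType ct : types)      // one transportConnector per connector type
                sb.append("    <transportConnector uri=\"").append(ct.transportUri).append("\"/>\n");
            sb.append("  </transportConnectors>\n</broker>\n");
            configs.put(fileId, sb.toString()); // no queue/topic definitions: ActiveMQ creates them on the fly
        });
        return configs;
    }
}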
As an example of configuring a service endpoint, we provide the complete message specification for a Mule workflow, i.e., a workflow designer may neglect any message-broker-related details and can focus purely on the local message processing. The Mule workflow configuration captures components, interfaces, and their wiring to the various connectors, while the ActiveMQ server configuration represents only the connector types in the complex service system's architecture. Each component results in a separate Mule workflow specification. The transformation places all workflow specifications from components with the same file_id in the same file, and thus collocates them on the same Mule ESB instance. To this end, the transformation first retrieves all connectors (with channel implementation) and obtains the JMS configuration from the corresponding connector type. Each distinct connector type becomes an activemq-connector element. For each interface, a new [inbound|outbound]-endpoint element obtains the configuration properties from the endpoint implementation, the channel name from the linked connector's channel implementation, and the respective connector-ref to the activemq-connector. Our transformation treats two interfaces coupled via a Connection_To_Request_Endpoint property and a Reply_To_Queue property differently depending on whether they represent the requesting component or the replying component. In the former case, the respective two Mule endpoints become wrapped in a request-reply element and a preceding message-properties-transformer element. In the latter case, the receiving interface becomes an inbound-endpoint with an exchange-pattern="request-response" property while the outgoing interface is ignored. The respective reply endpoint information arrives embedded in the request message at runtime.
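The following simplified sketch illustrates how a single interface could be mapped onto a Mule endpoint element under these rules; the ComponentInterface class and its fields are assumptions for illustration, and the emitted XML is abbreviated rather than the full Mule schema.

// Illustrative sketch only: ComponentInterface and the emitted XML are simplified;
// the actual transformation operates on the xADL data bindings and the full Mule schema.
class ComponentInterface {
    String direction;       // "in" or "out"
    String channelName;     // from the linked connector's channel implementation
    String connectorRef;    // references the activemq-connector derived from the connector type
    boolean requestSide;    // requesting component of a Connection_To_Request_Endpoint / Reply_To_Queue pair
    boolean replySide;      // replying component of such a pair
}

public class MuleEndpointSketch {
    static String toEndpointXml(ComponentInterface itf) {
        if (itf.replySide && "out".equals(itf.direction))
            return "";       // outgoing reply interface is ignored; the reply destination arrives with the request
        String tag = "in".equals(itf.direction) ? "jms:inbound-endpoint" : "jms:outbound-endpoint";
        String endpoint = "<" + tag + " queue=\"" + itf.channelName + "\" connector-ref=\"" + itf.connectorRef + "\""
                + (itf.replySide ? " exchange-pattern=\"request-response\"" : "") + "/>";
        if (itf.requestSide) // requesting side: preceding properties transformer plus request-reply wrapper
            return "<message-properties-transformer/>\n<request-reply>\n  " + endpoint + "\n</request-reply>";
        return endpoint;
    }
}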
ArchStudio offers two main editors: ArchEdit provides access to the underlying xADL document (including all extensions) as a tree, while Archipelago offers a drag-and-drop, point-and-click interface for placing and wiring up components and connectors. ArchStudio foresees the integration of additional functionality through extensions.
Schema Extensions. For the purpose of our approach, it proved sufficient to extend xADL at the implementation schema level. The additional elements (recall Subsection 5.2) blend in smoothly with the existing user interface, merely appearing as new implementation options (see Figure 4, left). An existing ArchStudio 4 user will not have to learn any new steps to utilize our schema extensions. Under the hood, ArchStudio applies its Apigen tool for creating a data binding library for each xADL schema. The combined limitations of ArchEdit and Apigen result in configuration properties being limited to strings, references, and complex data structures thereof.
Consistency Checking. We implemented the consistency checker as a dedicated component within ArchStudio. The checker raises warnings and errors during execution, depending on the consistency rule severity. The user may decide to ignore warnings and still continue to configuration transformation later on. Transformation is disabled, however, in the presence of consistency errors (see Figure 4, inset). In general, consistency checking is cheap. The consistency algorithm's runtime complexity is Θ(comp + l) for architecture components (comp) and links (l) as rules are either local (e.g., interface properties, interface link cardinality) or access only a link's two referenced elements (e.g., compatible
(not shown), and one workflow processing structural events. The composite Aggregator Service comprises three workflows: one for obtaining structural data (typically provided by more than one Filter Service), one for checking the structural data for changes relevant to POS services and dispatching those changes, and one for providing POS services with initial complete state information. A generic POS Service contains at minimum flows for (i) obtaining initial data (in this case, the POS service is aware from which Aggregator Service it receives such initial information), (ii) receiving structural updates, and (iii) receiving dynamic data updates. Further locally relevant flows, which contain the actual business logic, are irrelevant at this architectural level. Similarly, shared databases serving multiple flows within a single Mule instance need not be configured at this level but instead fall within the scope of a Mule configuration file. Note also that the architectural substructures are included for the sake of better understanding. Collocation of Mule workflows depends solely on specifying the same implementation file_id property.
Tool-supported consistency checking pays off even for this small architecture excerpt. A single execution of all consistency checks outlined in Section 5.3 on the architecture in Figure 5 results in four architecture-level checks, four link-specific checks, three connector checks, one connector type check, and six component checks (including respective interfaces). Remember that an architect would need to conduct many more checks when performing the same analysis on the Mule and ActiveMQ files alone. The ActiveMQ configuration is void of any topic and queue definitions, thus there exists no authoritative, explicit connector element. Consider a simple example such as ensuring that a queue has only a single receiver or that queue/topic names are unique: the architect needs to traverse the Mule configuration for each queue and topic definition, first to the corresponding Mule messaging endpoint definitions (requiring a detailed understanding of the configuration file), and then pairwise compare this information across all included Mule workflows (i.e., n ∗ (n − 1)/2 comparisons, thus 45 comparisons for our use case's 10 connected component interfaces); a tedious and error-prone task, especially for larger systems.
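For illustration, the following simplified Java sketch shows how such checks (unique channel names, a single receiver per queue) look at the architecture level, where explicit connector elements make them a single linear pass; the model classes are stand-ins, not the actual xADL data bindings.

import java.util.*;

// Illustrative sketch only: stand-in model classes, not the ArchStudio/xADL data bindings.
class Channel {
    String id; String name; boolean pointToPoint;
    List<String> receiverInterfaceIds = new ArrayList<>(); // filled from the architecture's links
    Channel(String id, String name, boolean pointToPoint) { this.id = id; this.name = name; this.pointToPoint = pointToPoint; }
}

public class MessagingConsistencyChecker {
    public static List<String> check(List<Channel> channels) {
        List<String> findings = new ArrayList<>();
        Map<String, String> seenNames = new HashMap<>();
        for (Channel c : channels) {
            // rule: queue/topic names must be unique across the architecture
            String previous = seenNames.put(c.name, c.id);
            if (previous != null)
                findings.add("ERROR: channel name '" + c.name + "' used by " + previous + " and " + c.id);
            // rule: a point-to-point channel (queue) may have at most one receiver
            if (c.pointToPoint && c.receiverInterfaceIds.size() > 1)
                findings.add("ERROR: queue '" + c.name + "' has " + c.receiverInterfaceIds.size() + " receivers");
        }
        return findings; // one linear pass over the explicit connector elements
    }
}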
Complex service systems do not necessarily exhibit all the challenging properties listed in the introduction: prohibiting centralized execution control, consuming and providing data rather than invocations, experiencing unpredictable service availability, and supporting a dynamically changing number of service instances. Systems that exhibit only a subset will equally benefit from our approach and tools.
Currently, our architecture-to-configuration transformation produces only Mule workflows and ActiveMQ configurations. The real-world development project underlying our evaluation scenario identified these technologies as sufficient and as providing a good balance between a light-weight messaging framework and the expressive and extensible Mule workflows for composite service design. Our approach remains valid for other messaging protocols or frameworks as well as for other service design methodologies; the architecture-level consistency checking mechanism remains applicable. Ultimately, supporting other runtime frameworks does not necessarily require adapting our ArchStudio add-on. For small deviating tasks, such as generating an OpenJMS server configuration, access to the architecture model via Apigen's data binding libraries, or directly via the xADL XML file, will be sufficient.
8 Conclusions
References
1. van der Aalst, W., Hofstede, A.H.M.T.: YAWL: Yet another workflow language. Information Systems 30, 245–275 (2003)
2. Baresi, L., Ghezzi, C., Mottola, L.: On accurate automatic verification of publish-
subscribe architectures. In: Proc. of the 29th International Conference on Software
Engineering, ICSE 2007, pp. 199–208. IEEE Computer Society, Washington, DC
(2007)
3. Barker, A., Walton, C., Robertson, D.: Choreographing web services. IEEE Trans-
actions on Services Computing 2(2), 152–166 (2009)
4. Dashofy, E., Asuncion, H., Hendrickson, S., Suryanarayana, G., Georgas, J., Taylor, R.: ArchStudio 4: An architecture-based meta-modeling environment. In: Companion to the Proc. of the 29th International Conference on Software Engineering, pp. 67–68. IEEE Computer Society, Washington, DC (2007)
5. Dashofy, E.M., Van der Hoek, A., Taylor, R.N.: A highly-extensible, XML-based architecture description language. In: Proceedings of the Working IEEE/IFIP Conference on Software Architecture, pp. 103–112. IEEE (2001)
6. Decker, G., Kopp, O., Leymann, F., Weske, M.: BPEL4Chor: Extending BPEL for modeling choreographies. In: IEEE 20th International Conference on Web Services, pp. 296–303. IEEE Computer Society, Los Alamitos (2007)
7. Dustdar, S., Schreiner, W.: A survey on web services composition. Int. J. Web Grid
Serv. 1(1), 1–30 (2005)
8. Esfahani, N., Malek, S., Sousa, J.P., Gomaa, H., Menascé, D.A.: A modeling lan-
guage for activity-oriented composition of service-oriented software systems. In:
Schürr, A., Selic, B. (eds.) MODELS 2009. LNCS, vol. 5795, pp. 591–605. Springer,
Heidelberg (2009)
9. Garcia, J., Popescu, D., Safi, G., Halfond, W.G.J., Medvidovic, N.: Identifying
message flow in distributed event-based systems. In: Proceedings of the 2013 9th
Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 367–
377. ACM, New York (2013)
10. Hohpe, G., Woolf, B.: Enterprise Integration Patterns: Designing, Building, and
Deploying Messaging Solutions. Addison-Wesley, Reading (2003)
11. Nanda, M.G., Chandra, S., Sarkar, V.: Decentralizing execution of composite web
services. SIGPLAN Not 39(10), 170–187 (2004)
12. Organization for the Advancement of Structured Information Standards (OASIS):
Web Services Business Process Execution Language (WS-BPEL) Version 2.0 (April
2007), http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html
13. Pautasso, C., Heinis, T., Alonso, G.: JOpera: Autonomic service orchestration. IEEE Data Eng. Bull. 29(3), 32–39 (2006)
14. Scheibler, T., Leymann, F.: A framework for executable enterprise application in-
tegration patterns. In: Mertins, K., Ruggaber, R., Popplewell, K., Xu, X. (eds.)
Enterprise Interoperability III, pp. 485–497. Springer, London (2008)
15. Taylor, R.N., Medvidovic, N., Dashofy, E.M.: Software Architecture: Foundations,
Theory, and Practice. Wiley (2009)
16. Yildiz, U., Godart, C.: Information flow control with decentralized service compo-
sitions. In: IEEE Int. Conf. on Web Services, pp. 9–17 (July 2007)
17. Zaha, J.M., Barros, A., Dumas, M., ter Hofstede, A.: Let’s dance: A language for
service behavior modeling. In: Meersman, R., Tari, Z. (eds.) OTM 2006. LNCS,
vol. 4275, pp. 145–162. Springer, Heidelberg (2006)
18. Zheng, Y., Taylor, R.N.: Enhancing architecture-implementation conformance with
change management and support for behavioral mapping. In: Proc. of the 34th Int.
Conf. on Software Engineering, ICSE 2012, pp. 628–638. IEEE Press, Piscataway
(2012)
Managing Expectations: Runtime Negotiation
of Information Quality Requirements
in Event-Based Systems
1 Motivation
Having information of adequate quality available at the right time in the right
place is vital for software systems to react to situations or support decisions. Sup-
ply chain management based on the Internet of Things (IoT) and data centre
monitoring are just two examples of reactive systems where information provided
by data sources has to be interpreted and where false alarms, missed events, or otherwise inadequate information quality carry a cost [18]. Event-based systems (EBS) and service-oriented architectures (SOA) complement each other
Fig. 1. Our concept extends the model of EBS (top) with capabilities, expectations
and bidirectional feedback for runtime adaptation (bottom, bold)
Publishers expose their general capabilities, as well as the state they are currently operating at, to brokers as capabilities. Support for new properties can be realized by extending the set of available properties and their relationships, and by associating suitable actions for manipulating them at the middleware.
Expressing expectations and capabilities as ranges of accepted and provided
values over properties automates the process of runtime negotiation: matching
requirements to the system state is reduced to a range matching problem be-
tween corresponding properties. Furthermore, requirements become malleable
due to the individual tradeoffs defined by subscribers, giving the system more
degrees of freedom when deciding on the extent of adaptation necessary. Feed-
back enables participants to adapt their behavior at runtime and extends the
scope of supported properties to those influenced by publishers.
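As a minimal illustration of this range-matching step (the PropertyRange class and its bounds are assumptions, not the prototype's API), consider:

// Illustrative sketch only: PropertyRange and its bounds are assumptions, not the prototype's API.
class PropertyRange {
    final double lb, ub;                          // lower/upper bound of accepted or provided values
    PropertyRange(double lb, double ub) { this.lb = lb; this.ub = ub; }

    boolean covers(PropertyRange other)   { return lb <= other.lb && ub >= other.ub; } // fully contains the range
    boolean overlaps(PropertyRange other) { return lb <= other.ub && other.lb <= ub; } // at least partially matches
}

public class RangeMatchingSketch {
    public static void main(String[] args) {
        PropertyRange expectedRate = new PropertyRange(10, 50);  // subscriber accepts 10..50 events/s
        PropertyRange providedRate = new PropertyRange(5, 100);  // publisher currently provides 5..100 events/s
        System.out.println("provided range covers expectation:   " + providedRate.covers(expectedRate));
        System.out.println("provided range overlaps expectation: " + providedRate.overlaps(expectedRate));
    }
}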
The concept of expectations complements and extends the paradigm of EBS without compromising the model of indirect many-to-many communication, making it backward compatible. As shown in Fig. 1, expectations and capabilities can
be defined independently of advertisements, notifications or subscriptions. They
are matched only at the middleware, preserving the anonymity of the associated
participants necessary for scalability in EBS. Bidirectional feedback enables par-
ticipants to assess their current situation and adapt their behavior at runtime if
necessary. Our concept encompasses related approaches by treating them as ded-
icated actions for enforcing requirements for specific properties.
This paper makes the following contributions to support QoI in EBS:
Related work is discussed in Sec. 5 before Sec. 6 concludes with final remarks.
This section describes the challenges of supporting QoI in EBS at runtime and our proposed solution using expectations, capabilities, and feedback.
A capability profile reflects the full set of capabilities available from a specific publisher for a given event type and can be matched against expectations. Capabilities determinable only by the broker are added at runtime. Capability profiles for the same type of event ($CP^e$) but associated with different publishers can be heterogeneous in terms of (i) the set of properties (e.g., $CP^e_2 = \{rate \wedge latency\} \subset CP^e_1 = \{rate \wedge latency \wedge confidence\}$), (ii) the ranges, and (iii) the current values.
A capability profile's lifecycle starts with registering it at the broker and ends with revoking it. At runtime, the situation of a publisher might change in a way that requires updating registered capability profiles without changing the advertisement. For example, a battery-powered sensor runs low on energy and has to switch to an energy-saving mode, decreasing its rate of publication; or new resources become available at runtime, improving or adding capabilities (e.g., higher confidence due to better contextual information [16]).
Feedback to subscribers and publishers. At runtime, publishers and subscribers are able and willing to adapt their behavior if they get feedback about their actions and the system state. As traditional EBS do not give such feedback at runtime, participants cannot assess if and how they would have to adapt [11]. We introduce bidirectional feedback from the middleware to participants to provide them with additional information about their actions and to support adaptation at runtime. Subscribers get feedback about the state of their active expectations (satisfied or unsatisfied). They are informed about the reason if an expectation cannot be satisfied by the system at the time. Reasons are expressed as tuples $(X^e_i, p^e, \alpha)$, describing the value currently provided by the system for each property that is not satisfied. As soon as the expectation can be satisfied, the subscriber is notified about the new state. Publishers receive feedback about each active capability profile's usage, together with advice to adapt their publications if necessary. This includes the list of capabilities to adapt together with the required target values, expressed as tuples $(CP^e_j, C^e, \beta)$. We consider publishers to be able to adapt automatically at runtime if notified, as we show in [13].
Each broker maintains a SuperSet per event type e that represents the skyline [7] of capabilities available at this broker: for every set of capabilities in $CP^e$, it contains those capability profiles that are as good as or better than all other capability profiles known at this broker in all capabilities and dominating in at least one capability, as illustrated in Fig. 4. The SuperSet is updated with every change to a capability profile. An expectation $X^e_i$ is satisfied ($X^e_i \in Sat$) if it is dominated by the SuperSet, satisfiable ($X^e_i \in \widetilde{Sat}$) if covered by it, and unsatisfiable ($X^e_i \in \overline{Sat}$) if not.
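A simplified sketch of the dominance test behind such a SuperSet, assuming for illustration that every property is improvable by minimization (as in the example of Fig. 4), could look as follows:

import java.util.*;

// Illustrative sketch only: dominance test behind a skyline of capability profiles,
// assuming for simplicity that every property is improvable by minimization.
public class CapabilitySkylineSketch {
    /** a dominates b if a is at least as good in every property and strictly better in one. */
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    /** Keeps only the non-dominated profiles, i.e. the SuperSet. */
    static List<double[]> superSet(List<double[]> profiles) {
        List<double[]> skyline = new ArrayList<>();
        for (double[] candidate : profiles) {
            boolean dominated = false;
            for (double[] other : profiles)
                if (other != candidate && dominates(other, candidate)) { dominated = true; break; }
            if (!dominated) skyline.add(candidate);
        }
        return skyline;
    }

    public static void main(String[] args) {
        // capability profiles over two properties (pa, pb), both minimized
        List<double[]> profiles = List.of(new double[]{1, 4}, new double[]{2, 2}, new double[]{3, 3});
        System.out.println(superSet(profiles).size()); // 2: {1,4} and {2,2} survive, {3,3} is dominated
    }
}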
In a distributed setup, each broker forwards its SuperSet to its directly connected neighbors along the routing tree after modifying it: each contained capability profile is associated with the forwarding broker, masking the identity of the locally known provider (i.e., $CP^e_j \rightarrow CP^e_{b_k}$). Broker-related capabilities such as latency have to be updated as well. Forwarded SuperSets are handled like capability profiles registered by local clients at each neighboring broker, starting an iterative update that generates a global skyline at the edge brokers.
Example: Matching in distributed EBS. Consider a distributed EBS with an acyclic routing topology as shown in Fig. 4 (top), consisting of brokers B and C, five publishers and four subscribers for events of type e. Expectations and capabilities are defined over properties $p_a$ and $p_b$ (improvable by minimization). Publishers $P_1 \rightarrow \{CP^e_1\}$, $P_2 \rightarrow \{CP^e_2\}$, $P_3 \rightarrow \{CP^e_3\}$ register their capability profiles at broker B (cf. Fig. 4, bottom left), $P_4 \rightarrow \{CP^e_4\}$ and $P_5 \rightarrow \{CP^e_5\}$ at broker C. Broker B forwards its SuperSet $S^e_B = \{CP^e_1, CP^e_2\}$ to broker C, masking the identity of $P_1$ and $P_2$. Note that $S^e_C = S^e_B$ as $S^e_B$ dominates all other local capability profiles at broker C. At broker C, the sequentially registered
[Fig. 4: brokers B and C with their publishers and subscribers (top); capability profiles $CP^e_1$–$CP^e_5$ and expectations $X^e_1$, $X^e_2$, $X^e_4$, $X^e_5$ plotted over properties $p_a$ and $p_b$ (both minimized) at brokers B and C (bottom)]
and $\overline{Sat} = \{X^e_4\}$ (not satisfiable as it is not covered by any capability profile).
be more expensive than the current state; we decide to adapt in all other cases. Referring to the example in Fig. 4, we assume $S_1$ to register $X^e_1$ after $X^e_3$ has been satisfied. The middleware would approve satisfying $X^e_1$ by adapting publisher $P_1$ if $\sum_{p^e \in X^e_1} CP^e_1.cost_{p^e}(p^e.ub) < \sum_{p^e \in X^e_3} CP^e_1.cost_{p^e}(p^e.lb)$.
4 Implementation
Runtime support for QoI in EBS using expectations and capabilities is realized
by extending the middleware with an ExpectationController and providing ad-
ditional handlers to participants as shown in Fig. 5. We have implemented two
prototypes in Java, extending the ActiveMQ JMS messaging broker1 and the
distributed REDS middleware2 . We chose these two platforms for their different
features: ActiveMQ is representative of an industrial-strength messaging system
focussing on high performance, while the modular REDS systems allows us to
exploit routing strategies and broker topologies for adaption. Both systems are
easy to extend without affecting existing code. We use our prototypes to support
QoI at runtime within the open-source monitoring system Ganglia3 [13]. In this
paper, we focus on describing the key components for a single broker setup.
Fig. 5. Runtime support for QoI in EBS with expectations and capabilities showing
additional components (dark gray) for participants and middleware (gray)
RessourceMonitor monitors the broker's state and the system's population, reporting changes to the Registry.
Registry stores all expectations and capabilities registered at this broker, together with the definitions of available properties and their matching. Changes trigger a negotiation of requirements at the Balancer.
Balancer matches expectations to capabilities (cf. Sec. 3) while applying different optimization strategies. It triggers the ReactionCoordinator upon completion.
ReactionCoordinator selects applicable actions from the MechanismRepository and coordinates their execution by adapting the broker, advising selected publishers to adapt using feedback, or notifying subscribers.
MechanismRepository stores available actions for specific properties (cf. Sec. 3.3). Actions are objects implementing generic or platform-specific activities.
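The following simplified Java sketch illustrates how these components could hand off work; the interfaces and method signatures are assumptions mirroring Fig. 5 rather than the prototype's actual component contracts.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the interfaces and signatures are assumptions mirroring Fig. 5,
// not the prototype's actual component contracts.
interface ReactionCoordinator { void enact(List<String> decisions); } // applies actions from the MechanismRepository

class Balancer {
    private final ReactionCoordinator coordinator;
    Balancer(ReactionCoordinator coordinator) { this.coordinator = coordinator; }

    void negotiate(List<String> expectations, List<String> capabilities) {
        List<String> decisions = new ArrayList<>();
        // ...match expectations against capabilities (cf. Sec. 3) and collect adaptation decisions...
        coordinator.enact(decisions);  // the ReactionCoordinator is triggered upon completion
    }
}

class Registry {
    private final Balancer balancer;
    private final List<String> expectations = new ArrayList<>();
    private final List<String> capabilities = new ArrayList<>();
    Registry(Balancer balancer) { this.balancer = balancer; }

    // any change to the registered expectations or capabilities triggers a new negotiation round
    void registerExpectation(String x) { expectations.add(x); balancer.negotiate(expectations, capabilities); }
    void registerCapability(String c)  { capabilities.add(c);  balancer.negotiate(expectations, capabilities); }
}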
5 Related Work
References
1. Appel, S., Sachs, K., Buchmann, A.: Quality of service in event-based systems. In:
22nd GI-Workshop on Foundations of Databases, GvD (2010)
2. Araujo, F., Rodrigues, L.: On QoS-aware publish-subscribe. In: ICDCSW (2002)
3. Bahjat, A., Jiang, Y., Cook, T., La Porta, T.: Quality of information functions for
networked applications. In: PERCOM Workshops (2012)
4. Behnel, S., Fiege, L., Mühl, G.: On quality-of-service and publish-subscribe. In:
ICDCS Distributed Computing Systems Workshops (2006)
5. Bellavista, P., Corradi, A., Reale, A.: Quality of service in wide scale publish/sub-
scribe systems. IEEE Communications Surveys & Tutorials (99), 1–26 (2014)
6. Bisdikian, C., Kaplan, L., Srivastava, M.: On the quality and value of information
in sensor networks. ACM Transactions on Sensor Networks 9(4), 39 (2010)
7. Borzsony, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE (2001)
8. Buchmann, A., Appel, S., Freudenreich, T., Frischbier, S., Guerrero, P.E.: From
calls to events: Architecting future BPM systems. In: Barros, A., Gal, A., Kindler,
E. (eds.) BPM 2012. LNCS, vol. 7481, pp. 17–32. Springer, Heidelberg (2012)
4 http://research.spec.org/tools/overview/fincos.html
9. Carvalho, N., Araujo, F., Rodrigues, L.: Scalable QoS-based event routing in
publish-subscribe systems. In: Network Computing and Applications (2005)
10. Frischbier, S., Gesmann, M., Mayer, D., Roth, A., Webel, C.: Emergence as com-
petitive advantage - engineering tomorrow’s enterprise software systems. In: ICEIS
(2012)
11. Frischbier, S., Margara, A., Freudenreich, T., Eugster, P., Eyers, D., Pietzuch,
P.: ASIA: application-specific integrated aggregation for publish/subscribe mid-
dleware. In: Middleware 2012 Posters and Demos Track (2012)
12. Frischbier, S., Margara, A., Freudenreich, T., Eugster, P., Eyers, D., Pietzuch, P.:
Aggregation for implicit invocations. In: AOSD (2013)
13. Frischbier, S., Margara, A., Freundenreich, T., Eugster, P., Eyers, D., Pietzuch, P.:
McCAT: Multi-cloud Cost-aware Transport. In: EuroSys Poster Track (2014)
14. Hinze, A., Sachs, K., Buchmann, A.: Event-based applications and enabling tech-
nologies. In: DEBS (2009)
15. Hoffert, J., Schmidt, D.: Maintaining QoS for publish/subscribe middleware in
dynamic environments. In: DEBS (2009)
16. Hossain, M.A., Atrey, P.K., Saddik, A.E.: Context-aware QoI computation in multi-
sensor systems. In: MASS (2008)
17. Kattepur, A., Georgantas, N., Issarny, V.: QoS analysis in heterogeneous chore-
ography interactions. In: Basu, S., Pautasso, C., Zhang, L., Fu, X. (eds.) ICSOC
2013. LNCS, vol. 8274, pp. 23–38. Springer, Heidelberg (2013)
18. Keeton, K., Mehra, P., Wilkes, J.: Do you know your IQ? A research agenda for
information quality in systems. ACM SIGMETRICS Performance Evaluation Re-
view 37(3), 26–31 (2010)
19. Kritikos, K., Pernici, B., Plebani, P., Cappiello, C., Comuzzi, M., Benrernou, S.,
Brandic, I., Kertész, A., Parkin, M., Carro, M.: A survey on service quality de-
scription. ACM Computing Surveys 46(1), 1 (2013)
20. Perera, C., Zaslavsky, A., Christen, P., Compton, M., Georgakopoulos, D.: Context-
aware sensor search, selection and ranking model for internet of things middleware.
In: Mobile Data Management (2013)
21. Pernici, B., Siadat, S.H.: Adaptation of web services based on QoS satisfaction. In:
Maximilien, E.M., Rossi, G., Yuan, S.-T., Ludwig, H., Fantinato, M. (eds.) ICSOC
2010. LNCS, vol. 6568, pp. 65–75. Springer, Heidelberg (2011)
22. Pietzuch, P., Eyers, D., Kounev, S., Shand, B.: Towards a common API for pub-
lish/subscribe. In: DEBS (2007)
23. Sachidananda, V., Khelil, A., Suri, N.: Quality of information in wireless sensor
networks: a survey. In: ICIQ (2010)
24. Sachs, K., Appel, S., Kounev, S., Buchmann, A.: Benchmarking publish/subscribe-
based messaging systems. In: Database Systems for Advanced Applications: DAS-
FAA 2010 International Workshops: BenchmarX 2010 (2010)
25. Shi, Y., Chen, X.: A survey on QoS-aware web service composition. In: Multimedia
Information Networking and Security (2011)
26. Soberg, J., Goebel, V., Plagemann, T.: CommonSens: personalisation of complex
event processing in automated homecare. In: ISSNIP (2010)
27. Wilkes, J.: Utility functions, prices, and negotiation. HP Labs HPL-2008-81 (2008)
28. Yang, H., Kim, M., Karenos, K., Ye, F., Lei, H.: Message-oriented middleware with
QoS awareness. In: Baresi, L., Chi, C.-H., Suzuki, J. (eds.) ICSOC-ServiceWave
2009. LNCS, vol. 5900, pp. 331–345. Springer, Heidelberg (2009)
C2P: Co-operative Caching
in Distributed Storage Systems
1 Introduction
Distributed storage systems often partition sequential data across storage systems for performance (data striping) or protection (erasure coding). Data striping [5][4] is a technique in which logically sequential data is partitioned into segments and each segment is stored on a different physical storage device (HDD). This helps improve aggregate I/O performance by allowing multiple I/O requests to be serviced in parallel from different devices. Striping has been used in practice by storage controllers to manage HDD storage arrays for improved performance for more than a decade (e.g., RAID 0 [15]). Most popular enterprise cluster/distributed filesystems, such as IBM GPFS [13], EMC Isilon OneFS [14], and Lustre [17], support data striping. Also, popular blob stores like Amazon S3 [2], OpenStack Swift [20], Microsoft Azure [18], and Google Cloud Storage [11] support segmented blob uploads.
Logically correlated data in storage systems also gets partitioned due to newer data protection techniques like erasure coding (EC) [3][12][24], which deliver a higher mean time between data loss (MTBDL) compared to RAID. For example, with a 9:3 EC data protection policy, when new data is written, it is first partitioned into 9 equal-sized segments. Next, 3 additional code segments are computed from the data segments. These 12 segments are then stored on different storage nodes. Any 9 of these 12 segments can then be used to satisfy subsequent read requests for the data. This provides availability of the data for up to 3 disk failures. Thus, either for performance or for redundancy, we increasingly see data segmentation in distributed storage systems today.
2 Design
In this section, we first motivate the need for co-operative caching in distributed storage systems. We then discuss a few key design challenges for C2P and our approach.
Fig. 1. Distributed Systems with (a) Independent and (b) Co-operative Caches
2.1 Motivation
Let us consider a distributed storage application with 3 storage nodes as shown in Fig. 1. Each node has a cache capacity to host only 2 segments. We store 3 objects - A, B, C - in this storage system. Each object is segmented into 3 partitions placed on different storage nodes as shown. Also, consider the latency to access a segment from cache to be 50 ms compared to a disk latency of 200 ms. We identify an object access as complete only when all its segments are read. Hence, access latency is defined as the maximum time taken to read any segment of the object. Disk IO is measured as the total number of segments read from disk across all storage nodes.
Fig. 1(a) shows the cache state at some point in time of a traditional system without any cache co-ordination, and (b) shows the cache state of a co-operative, co-ordinated cache system. At this stage, if objects A, B and C are accessed by the application, we can observe the system characteristics shown in Tab. 1. As we can see, for both the traditional and the C2P system, the total segment hits (6) and misses (3) in the cache are the same. Also, the number of disk IOs (3) is the same. However, applications experience very different access latency with the two systems. In the traditional system without any cache co-ordination, each of the objects suffers disk latency (200 ms) in its access. On the other hand, in a co-operative cache system with co-ordination, we are able to reduce the access latency for 2 objects (A, B) to cache latency (50 ms) and only 1 object (C) incurs disk latency. Hence, if all cache controllers are able to achieve a distributed consensus on the segments to cache, this can lead to improved response time for served objects.
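The two metrics of the example can be captured in a few lines; the following Java sketch (illustrative only, with latencies and names taken from the example rather than from C2P internals) computes access latency and disk IO from per-segment cache states.

// Illustrative sketch only: the two metrics of the example, computed from per-segment cache states.
public class AccessMetricsSketch {
    static final int CACHE_LATENCY_MS = 50, DISK_LATENCY_MS = 200;

    /** Access latency of an object = slowest of its segment reads. */
    static int objectAccessLatency(boolean[] segmentInCache) {
        int latency = 0;
        for (boolean cached : segmentInCache)
            latency = Math.max(latency, cached ? CACHE_LATENCY_MS : DISK_LATENCY_MS);
        return latency;
    }

    /** Disk IO = number of segments that have to be read from disk. */
    static int diskReads(boolean[] segmentInCache) {
        int reads = 0;
        for (boolean cached : segmentInCache) if (!cached) reads++;
        return reads;
    }

    public static void main(String[] args) {
        System.out.println(objectAccessLatency(new boolean[]{true, true, true}));  // 50 ms: all segments cached
        System.out.println(objectAccessLatency(new boolean[]{true, true, false})); // 200 ms: one segment on disk
    }
}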
– Distributed Consensus for Cache Management: Each node in C2P will have 2 sets of cache information - namely Local Cache Metadata (LMD) and Remote Cache Metadata (RMD). The LMD on a node is the repository of the cache events generated by that node, while the RMD is the repository of the cache events received from the peer nodes. In an ideal situation, all cache controllers need to arrive at a consensus on the objects to be cached in a fully distributed manner. Designing a distributed consensus is challenging, and we address this problem by defining global metrics based on the local and remote metrics. Our metrics lead to consistent values for objects across all storage nodes in a probabilistic sense. This ensures that even though each cache controller executes a cache eviction policy in isolation, all of them converge to the same object ordering in most cases.
– Identifying Relevant Events: Every data operation (READ/WRITE/DELETE) has one or more cache event(s) associated with it. Moreover, the same data operation can create different cache events; e.g., a READ request for some data might cause a <cache miss> or a <cache hit> event. It is important to snoop these events efficiently without adding any overhead to the data path. These captured events then need to be classified into predefined categories. These categories then help implement cache management policies in the C2P system; e.g., a prefetching policy would need the <cache miss> category.
– Peer Node Discovery: A set of nodes is identified as peers if they host segments of the same objects. The set of peer nodes is different for each object. Peers are created dynamically and need to be identified quickly to ensure that relevant cache events are quickly communicated to peers. We had two design choices here: 1) each node broadcasts its events to all nodes, but only peer nodes match and process those events; 2) each node sends its events only to its peer nodes. The former option clearly has the downside of overloading the network: in a storage system with 100 nodes, an object with 2 segments would generate 200 events on the network (100 by each hosting node) for each object access. The latter option certainly minimizes this overhead, but raises the challenge of how a node discovers its peers for a given object. Storage applications typically decide on the placement of segments for an object dynamically and also store this mapping. Thus, we can have an application-tailored peer node discovery for this purpose. In C2P we selected the latter option.
– Load-Proportional Communication Overhead: Peak load in storage systems is correlated with a high number of cache activities (reads, evictions, writes). Hence, more cache activities across nodes generate a large number of cache events in the system. As a consequence, the network may become a bottleneck during high load and lead to inefficient caching. We address this problem by implementing an aggregation scheme, which ensures a communication overhead that is almost oblivious to the application I/O load. In aggregation, cache events are buffered for a short duration
before transmitting, and multiple cache events to the same peer node are coalesced together. We also use filtering to prioritize and drop low-priority events (a minimal sketch of this aggregation and filtering step follows this list).
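The following simplified Java sketch illustrates the buffering, coalescing, and filtering idea referred to above; the class names and the overload signal are assumptions for illustration, not C2P's actual message handler.

import java.util.*;

// Illustrative sketch only: class names and the overload signal are assumptions; the actual
// message handler additionally prioritizes events and talks to the messaging layer.
public class EventAggregatorSketch {
    static class CacheEvent {
        final String peer, segment; final int priority;
        CacheEvent(String peer, String segment, int priority) { this.peer = peer; this.segment = segment; this.priority = priority; }
    }

    private final Map<String, List<CacheEvent>> buffer = new HashMap<>();

    /** Buffers an event; under load, low-priority events are dropped (filtering). */
    void publish(CacheEvent event, boolean overloaded) {
        if (overloaded && event.priority < 1) return;
        buffer.computeIfAbsent(event.peer, k -> new ArrayList<>()).add(event);
    }

    /** Called once per buffering window: one coalesced message per peer instead of one per event. */
    Map<String, List<CacheEvent>> flush() {
        Map<String, List<CacheEvent>> batches = new HashMap<>(buffer);
        buffer.clear();
        return batches;
    }
}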
3 Implementation
The design of C2P can be implemented as an extension to any distributed storage system that supports data segmentation. As a concrete implementation, we have integrated C2P into a filesystem cache for the open-source and widely adopted OpenStack Swift - a highly available, distributed, eventually consistent object/blob store. We next discuss the implementation details.
'Aggregation and Filter' policy. In aggregation, before publishing a cache event, we buffer it for a short duration of 200 ms. During this time, if there are more cache events for the same node, they are aggregated, which reduces the payload size. While aggregation is an optimization policy, filtering is a throttling policy: in an overloaded system, filtering essentially prioritizes and drops some events.
– Global cache MD: The Global cache MD (GMD) on a given storage node is the metadata about segments hosted on that storage node which is communicated from the peer storage nodes. Note that the segments in the GMD are not necessarily present in the cache; they could also be on disk. The fields of the GMD are shown in Tab. 2(c). When an object is read, all its segments are accessed, generating a cache event for their peer nodes. Also, each node will receive a cache event from all peer nodes for a given segment. <timestamp> is the latest timestamp
4 Evaluation
We evaluate C2P with the OpenStack Swift object store. Swift was deployed on a set of 8 VMs running Ubuntu 12.04 LTS. The VMs were hosted on 2 servers, each with a 24-core 3.07 GHz Xeon(R) processor and 64 GB memory. Each VM was configured with 2 vCPUs, 2 GB memory and a 50 GB disk formatted with the ext4 filesystem. We configured a 128 MB cache size for C2P-FS on each VM. This cache size was decided based on heuristics and the size of our workloads. We have defined two configurable modes of cache management for C2P-FS - namely C:ON and C:OFF. C:ON indicates that the co-operative caching policy is ON for cache replacement on all storage nodes, while C:OFF indicates that each node implements the default LRU cache replacement policy. We evaluate C2P based on several metrics. First, in the baseline experiments, we measured the overhead of our cache implementation by comparing its performance with a native FUSE implementation. Then, in the case study experiment, we specifically measured the cache efficiency of the C2P cache replacement policy against traditional LRU (Least Recently Used).
We tag every data access (read) on each individual storage node as either a segment hit or a segment miss. A segment hit indicates that the data is read from the cache, while the latter indicates that the data is read from disk. More importantly, we also tag each object access: when every segment of an object is a segment hit, we identify it as an Object hit; if there is a segment miss for even a single segment, it is an Object miss. We further decompose Object miss into Object miss complete and Object miss partial to indicate whether there is a segment miss for all segments or only some segments, respectively. We define comm latency as the delay between the time when a cache event is published by a storage node and the time when it is delivered to the peer nodes. We also measured comm overhead as the number of messages (cache events) generated per second. Finally, we measured object throughput as the size of the object (in MB) read per second.
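The object-level classification follows directly from the per-segment outcomes, as the following illustrative Java sketch shows (names are ours, not C2P-FS internals).

// Illustrative sketch only: object-level classification derived from per-segment outcomes.
public class AccessClassifierSketch {
    enum Outcome { OBJECT_HIT, OBJECT_MISS_PARTIAL, OBJECT_MISS_COMPLETE }

    static Outcome classify(boolean[] segmentHit) {
        int hits = 0;
        for (boolean hit : segmentHit) if (hit) hits++;
        if (hits == segmentHit.length) return Outcome.OBJECT_HIT;           // every segment served from cache
        if (hits == 0)                 return Outcome.OBJECT_MISS_COMPLETE; // every segment read from disk
        return Outcome.OBJECT_MISS_PARTIAL;                                 // some, but not all, segments missed
    }

    public static void main(String[] args) {
        System.out.println(classify(new boolean[]{true, true, false, true, true})); // OBJECT_MISS_PARTIAL
    }
}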
[Fig. 4: number of operations per second (log scale) per operation type (PUTs, GETs, DELETEs) for fs-cache vs. c2p-cache]
co-operation is disabled and the default LRU cache replacement policy is used on storage nodes to match the standard setup. We used swift-bench [21], a standard benchmarking tool for OpenStack Swift. We chose three common IO workloads on any object store - namely PUT, GET and DELETE - for these experiments. We further defined the workload profile with 500 PUTs, 2000 (random) GETs and 500 DELETEs for an object size of 1 MB. We then ran this same workload profile in both phases and measured operation throughput as shown in Fig. 4. As we can see, C2P-FS achieves almost the same throughput as the standard filesystem deployment for all three kinds of workloads. Thus, through this baseline experiment we established that our C2P-FS cache implementation does not incur any performance overhead over a standard Swift deployment. Hence, in the case study experiments below we used C2P-FS in C:OFF mode as a reference system implementing LRU, and compared and contrasted the metric measurements of the C2P system against it.
access. When we access an object, we access it completely, i.e., all 5 segments. But even for partial accesses, C2P's efficiency would be the same. We also measured the total segments accessed across all storage nodes, as shown in Fig. 6(b). As we can see, there is a large variation in the access load across storage nodes, which again is mostly true for all distributed storage systems.
[Fig. 6: (a) number of accesses per object ID; (b) number of segments stored and accessed per storage node (node-1 to node-8)]
Thus, this uneven segment distribution, compounded with the varying access load, creates erratic data pressure across storage nodes in distributed storage systems. Hence, there is a greater need for co-operation to enable highly utilized nodes to maintain their cache states consistent with those of their less utilized peer nodes.
[Fig. 7: CDF of objects against response time (s) for lru vs. c2p-cr]
Single-Threaded Run. In the single-threaded run we used a single Swift client which reads objects from the data store following an access pattern. In the results, we first analyze the most important aspect from the application's point of view, i.e., the Object hit ratio. As shown in Fig. 8(a), with C2P we get almost 6.7% more Object hits in the cache. Amongst object misses, we measured around a 50% reduction in Object miss partial and a 4% increase in Object miss complete. Putting these numbers in perspective, we note that for applications storing segmented objects, the C2P system can help achieve better cache efficiency at the object level and thus reduce application latency. In Fig. 7(b), we also plot the CDF of the number of objects against their response time. This is an important measure which can be translated into an SLA assurance of a storage system. For example, for an SLA of response time < 0.8 s, the C2P system has about 6% more objects satisfying the SLA than the one implementing LRU. Another interesting observation we made here is that, for cache-missed objects, the response time for C2P is between 0.7 s and 0.9 s while that for LRU is between 0.7 s and 1.2 s. We conjecture that this increased latency for LRU for cache-missed objects is attributable to the increased disk queue length for missed segments. Fig. 8(b) shows the object throughput measured for each object access. It shows that for C2P, most of the objects have either high throughput around 9 MBps (Object hit) or low throughput around 4 MBps (Object miss complete), while for LRU there are many objects with throughput in between (Object miss partial). As mentioned earlier, C2P increases Object miss complete, but that does not necessarily mean disk IO is also increased. To elaborate on this, for each object accessed we also traced the segment hit ratio on each storage node. As shown in Fig. 8(c), on each individual storage node we get more segment hits. In effect, we reduce disk IO on each storage node, and overall we observed about 5% reduced disk IO across all storage nodes, which is a very critical measure for any storage system. Finally, Fig. 10 shows RabbitMQ's monitored state of the message queue. As we can see, the C2P system requires around 20 messages/s for cache event co-ordination, with a comm latency of less than 200 ms. Considering that the size of each message is less than 100 Bytes, the network overhead is very minimal.
Multi-Threaded Run. In the multi-threaded run we used 4 Swift clients. We split the access pattern of 2000 objects into 4 access patterns of 500 each. Then, we ran all 4 clients in parallel, requesting objects from the respective split access patterns. Similar to the single-threaded run, we measured system characteristics across different metrics. Fig. 9(a) shows a 4.5% improvement in Object hits, and amongst object misses a 43% reduction in Object miss partial and around a 4% increase in Object miss complete. Fig. 7(a) shows the CDF of the number of objects against their response time. Again, compared to LRU, with C2P we measured a larger percentage of objects under any given response time. Fig. 9(b) shows the object throughput for object accesses across all 4 clients. We observe a pattern similar to that of the single-threaded run. Fig. 9(c) shows the segment hit distribution across all storage nodes. Again, on each storage node we observe better cache hits for C2P, thus reducing the disk IO by around 3.5% across all nodes. Fig. 11 shows a comm overhead in the order of 70 messages/s (7 KBps), which is still very minimal. However, we now get a comm latency of around 1 second. Comm overhead and latency are observed to be higher than the respective numbers in the single-threaded run. This is because in the multi-threaded run the object request rate is higher, coming from 4 clients in parallel, which in turn increases the rate of cache activities on the individual storage nodes; thus cache events are published at a higher rate. We observed that the Object hit rate for the C2P system in this run is slightly lower than in the single-threaded run. This is attributed to the higher comm latency, which increases the delay in cache co-ordination across storage nodes. In future work, we will try to minimize this effect of latency on cache efficiency.
[Figure: (a) Object cache hit/miss, (b) object throughput, (c) cache stats on storage nodes, for lru vs. c2p - single-threaded run (cf. Fig. 8)]
[Figure: (a) Object cache hit/miss, (b) object throughput, (c) cache stats on storage nodes, for lru vs. c2p - multi-threaded run (cf. Fig. 9)]
Fig. 10. Singlethread run overhead
Fig. 11. Multithread run overhead
The maximum value of comm latency for optimal performance of the C2P system is a function of the cache size on each storage node. It is important to note, though, that the effectiveness of C2P systems is not limited to comm latencies of less than a second. But, since we had a small cache size on the storage nodes,
To summarize the case study results, we note: 1) compared to the traditional LRU cache replacement policy, C2P achieves a 4-6% increase in object hits, thus reducing the access latency for more objects; 2) in C2P systems, cache hits improve on each of the comprising storage nodes, reducing disk IO by around 3-5%; and 3) the event-based architecture for co-ordinating caching incurs a very minimal network overhead.
5 Related Work
Distributed systems, and cache coordination techniques in such systems, have been around for a long time [1][23][7][6][16]. But cache cooperation has traditionally been applied in contexts like scaling or disk IO optimization. To the best of our knowledge, C2P is the first system designed to maximize the cache efficiency of distributed storage hosting segmented data.
Scale: Memcached [9] is a general-purpose distributed memory caching system. For large-scale cache requirements, hundreds of nodes are set up, and these nodes then pool their memory through memcached to build a large in-memory key-value store for small chunks of arbitrary data. Facebook is probably the world's largest user of memcached [19].
Disk IO optimization: CCM [8] is probably the closest to our work. For cluster-based servers, CCM keeps accounting information for multiple copies of the same data (blocks) available in the caches across all nodes. This information is then used to forward IO requests between nodes to ensure cache hits. Essentially, they increase network communication to reduce disk accesses. Similarly, in [1] the technique of split caching is used to avoid disk reads by using the combined memory of the clients as a cooperative cache. DSC [16] describes the problems of exposing one node's resources to others. As they state, cache state interactions and the adoption of a common scheme for cache management policies are the two primary reasons behind the problem. [6] mentions interesting techniques for data prefetching with co-operative caching. The bottom line in all this prior work is that cache cooperation happens between nodes only if they contain the same data.
In C2P, the primary distinction is that cache cooperation is designed for logically related data, e.g., different segments of the same object. Also, there is no resource exposition between the nodes in the cluster, i.e., each node serves only the IO requests
The C2P design presented in this paper caters to distributed systems storing segmented data and ensures better cache efficiency for them. In our current implementation we have exploited this cache cooperation only for the cache replacement policy. We also plan to implement and exercise cache prefetching for C2P, wherein we can prefetch segments based on cache events from the peer nodes. We believe such prefetching will further improve the cache efficiency.
One data property we have not considered in C2P is replication. In storage systems, data striping and replication can be applied simultaneously. Here, we first need to understand the placement and access characteristics of such data. Then, for these scenarios, through cache cooperation we can ensure that only one copy of a data segment remains in the cache across all nodes, and these cached segments might belong to different replica copies of the data.
Finally, we plan to deploy C2P in a production distributed system and measure its scalability and overhead for live data.
7 Conclusion
In this paper we presented C2P, a co-operative caching policy for distributed storage systems. C2P implements a coordination protocol wherein each node communicates its local cache events to its peers. Based on this additional cache state information of peers, each node implements a co-ordinated caching policy for cache replacement and cache prefetching. These policies in turn ensure consistent caching across nodes for segments of data that are logically related. Thus, we can reduce the access latency for the data and improve the overall performance of the system.
References
1. Adya, A., Castro, M., Liskov, B., Maheshwari, U., Shrira, L.: Fragment reconstruction: Pro-
viding global cache coherence in a transactional storage system. In: Proceedings of the 17th
International Conference on Distributed Computing Systems, pp. 2–11. IEEE (1997)
2. Amazon: Amazon S3, http://aws.amazon.com/s3/
3. Bloemer, J., Kalfane, M., Karp, R., Karpinski, M., Luby, M., Zuckerman, D.: An XOR-based erasure-resilient coding scheme (1995)
4. Cabrera, L.F., Long, D.: Using data striping in a local area network (1992)
5. Cabrera, L.F., Long, D.D.E.: Swift: Using distributed disk striping to provide high I/O data rates. Computing Systems 4(4), 405–436 (1991)
6. Chi, C.H., Lau, S.: Data prefetching with co-operative caching. In: 5th International Confer-
ence on High Performance Computing, HIPC 1998, pp. 25–32. IEEE (1998)
7. Clarke, K.J., Gittins, R., McPolin, S., Rang, A.: Distributed storage cache coherency system
and method, US Patent 7,017,012 (March 21, 2006)
1 Introduction
Over the last decade, there has been a paradigm shift from traditional standalone software solutions towards the service-oriented paradigm to design, develop, and deploy software systems [1]. The REST (REpresentational State Transfer) [7] architectural style is simpler and more efficient than traditional SOAP-based (Simple Object Access Protocol) Web services in publishing and consuming services [18]. Thus, RESTful services are gaining increased attention. Facebook, YouTube, Twitter, and many more companies leverage REST. However, the increased usage of REST for designing and developing Web-based applications confronts common software engineering challenges. In fact, like
any software system, RESTful systems must evolve to handle new Web entities and resources, i.e., meet new business requirements. Even changes in the underlying technologies or protocols may force REST APIs to change. All these changes may degrade the design of REST APIs, which may cause the introduction of common poor solutions to recurring design problems—antipatterns—as opposed to design patterns, which are good solutions to the problems that software engineers face while designing and developing RESTful systems. (Anti)patterns might be introduced even in the early design phase of RESTful systems. Antipatterns in RESTful systems not only degrade their design but also make their maintenance and evolution difficult, whereas design patterns facilitate them [3, 5, 6].
Forgetting Hypermedia [16] is a common REST antipattern that corresponds to the absence of hypermedia, i.e., links within resource representations. The absence of such links hinders the state transitions of RESTful systems and limits the runtime communication between clients and servers. In contrast, Entity Linking [6]—the corresponding design pattern—promotes runtime communication via links provided by the servers within resource representations. By using such hyperlinks, services and consumers can be more autonomous and loosely coupled. For REST APIs, the automatic detection of such (anti)patterns is an important activity for assessing their design, (1) to ease their maintenance and evolution and (2) to improve their design quality.
REST (anti)patterns require a concrete detection approach to support their rigorous analysis, which is still lacking. Despite the presence of several technology-specific approaches for SCA (Service Component Architecture) and Web services (e.g., [3, 9–11, 13]), they are not applicable for detecting (anti)patterns in REST. Indeed, the key differences between the REST architecture and other SOA standards prevent the application of these approaches because: (1) traditional service orientation is operations-centric, whereas REST is resource-centric; (2) RESTful services are built on top of JSON (or XML) over HTTP, whereas traditional Web services are built on top of SOAP over HTTP or JMS (Java Message Service); (3) Web services use WSDL (Web Service Definition Language) as their formal contracts, whereas REST has no standardised contract except human-readable documentation; (4) traditional services are sets of self-contained software artefacts where operations are denoted using verbs, whereas resources in REST are denoted by nouns and are objects directly accessible via URIs; and (5) REST clients use the standard HTTP methods to interact with resources, whereas Web services clients implement separate client stubs to consume services.
Among many others, the differences discussed above motivate us to propose a
new approach, SODA-R (Service Oriented Detection for Antipatterns in REST) to
detect (anti)patterns in RESTful systems. SODA-R is supported by an underlying
framework, SOFA (Service Oriented Framework for Antipatterns) [9] that supports
static and dynamic analyses of service-based systems.
To validate SODA-R, we first perform a thorough analysis of REST (anti)patterns from the literature [2, 5, 6, 8, 12, 16] and define their detection heuristics. A detection heuristic provides an indication of the presence of certain design issues. For instance, the heuristic "servers should provide entity links in their responses" suggests that REST developers need to provide entity links in the responses that REST clients can use. For such a case, we define a detection heuristic that checks whether the response header or body contains any resource location or entity links. Following the defined heuristics, we implement their concrete detection algorithms, apply them on widely used REST APIs, and obtain the list of REST services detected as (anti)patterns. Our detection results show the effectiveness and accuracy of SODA-R: it can detect five REST patterns and eight REST antipatterns with an average precision and recall of more than 75% on 12 REST APIs including BestBuy, Facebook, and DropBox.
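As an illustration of the kind of heuristic this yields, the following simplified Java sketch checks a response for entity links; the class and the regular expression are simplifications for presentation, not SOFA's actual detection algorithm.

import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only: a simplified check for entity links in a response; SOFA's
// actual detection algorithm inspects the introspected requests and responses in full.
public class EntityLinkingHeuristicSketch {
    private static final Pattern LINK = Pattern.compile("https?://[^\"'\\s>]+");

    /** True if the response advertises follow-up resources (Entity Linking);
     *  false hints at the Forgetting Hypermedia antipattern. */
    static boolean providesEntityLinks(Map<String, String> headers, String body) {
        if (headers.containsKey("Location") || headers.containsKey("Link")) return true; // header-level links
        return body != null && LINK.matcher(body).find();                                // links within the body
    }

    public static void main(String[] args) {
        System.out.println(providesEntityLinks(
            Map.of("Content-Type", "application/json"),
            "{\"id\":42,\"self\":\"https://api.example.com/orders/42\"}"));              // true
    }
}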
Thus, the main contributions of this paper are: (1) the definition of detection heuristics for 13 REST (anti)patterns from the literature, namely [2, 5, 6, 8, 12, 16]; (2) the extension of the SOFA framework from its early version [9] to allow the detection of REST (anti)patterns; and, finally, (3) the thorough validation of the SODA-R approach with 13 REST (anti)patterns on a set of 12 REST APIs by invoking 115 REST methods from them.
The remainder of the paper is organised as follows. Section 2 briefly describes the contributions from the literature on the specification and detection of SOA (anti)patterns. Section 3 presents our approach SODA-R, while Section 4 presents its validation along with detailed discussions. Finally, Section 5 concludes the paper and sketches future work.
2 Related Work
Step 2.2: Dynamic Invocation - After we have Java interfaces for the REST
APIs, we implement REST clients to invoke each service by providing the
correct parameter lists. The REST clients must conform to the API documentation.
At detection time, we dynamically invoke the methods of the service
interfaces. From a REST point of view, the invocation of a method refers to performing
an action on a resource or an entity. For some method invocations, clients
are required to authenticate themselves to the servers. For each authentication
process, we need a user account to request developer credentials from the
server. The server then supplies the user with authentication details that are used
each time a signed HTTP request is made. For instance, YouTube and DropBox
support the OAuth 2.0 authentication protocol to authenticate their clients. At the
end of this step, we gather all the requests and responses.
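As an illustration of this step, the sketch below shows how a client can invoke a REST method with an OAuth 2.0 bearer token and keep the request/response pair for later analysis; the endpoint, path, and helper names are hypothetical, and the sketch is not the SODA-R client code.

```python
import requests  # third-party HTTP client

def invoke_and_record(base_url, path, token, params=None):
    """Perform one signed invocation and return everything the detection
    heuristics may later need: URL, headers on both sides, status, and body."""
    headers = {"Authorization": "Bearer " + token,   # OAuth 2.0 bearer credential
               "Accept": "application/json"}
    response = requests.get(base_url + path, headers=headers, params=params)
    return {
        "request_url": response.request.url,
        "request_headers": dict(response.request.headers),
        "status_code": response.status_code,
        "response_headers": dict(response.headers),
        "body": response.text,
    }

# Hypothetical usage:
# record = invoke_and_record("https://api.example.com", "/files/list",
#                            token="<developer-credential>")
```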
Step 2.3: Application - For the application, we rely on the underlying frame-
work SOFA (Service Oriented Framework for Antipatterns) [9] that enables the
analysis of static and dynamic properties specific to REST (anti)patterns. We au-
tomatically apply the heuristics in the form of detection algorithms on the requests
from the clients and responses from the servers, gathered in the previous step. In
the end, we obtain a list of detected REST (anti)patterns.
From its initial version in [9], we further developed the SOFA framework to support
the detection of REST (anti)patterns. SOFA itself is developed based on the
SCA (Service Component Architecture) standard [4] and is composed of several
SCA components. The SOFA framework uses FraSCAti [15] as its runtime support. We
added a new REST Handler SCA component to the framework. The REST Handler
component supports the detection of REST (anti)patterns by (1) wrapping each
REST API with an SCA component and (2) automatically applying the detection
heuristics on the SCA-wrapped REST APIs. This wrapping allows us to introspect
each request and response at runtime by using an IntentHandler. An intent
handler in FraSCAti is an interceptor that can be applied to a specific service
to implement non-functional features, e.g., transactions or logging. When we
invoke a service that uses an IntentHandler, the service call is interrupted and
the intent handler is notified by calling the invoke(IntentJoinPoint) method.
This interruption of the call enables us to introspect the requests and responses of
an invoked REST service.
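Outside the SCA/FraSCAti setting, the interception idea can be illustrated with a plain Python decorator; this is only an analogy for how an intent handler observes calls, not the FraSCAti IntentHandler API, and the names used are ours.

```python
import functools

def introspection_intent(trace):
    """Wrap a REST-client function so that every call is interrupted, the
    request (arguments) and the response (return value) are recorded in
    `trace`, and the call then proceeds as usual."""
    def wrap(invoke):
        @functools.wraps(invoke)
        def intercepted(*args, **kwargs):
            response = invoke(*args, **kwargs)  # the actual service invocation
            trace.append({"request": (args, kwargs), "response": response})
            return response
        return intercepted
    return wrap

# Hypothetical usage:
# trace = []
# list_files = introspection_intent(trace)(list_files)  # every call is now logged
```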
4 Validation
4.1 Hypotheses
We define heuristics for eight different REST antipatterns and five REST patterns
from the literature. Tables 2 and 3 list those REST antipatterns and patterns
collected from the literature, mainly [6, 8, 12, 16, 17]. In Tables 2 and 3, we put
the relevant properties for each antipattern and pattern in bold-italics.
As for the objects in our experiment, we use widely used and popular
REST APIs for which the underlying HTTP methods, service end-points, and
authentication details are well documented online. Large companies like Facebook
or YouTube provide self-contained documentation with good example sets.
Table 4 lists the 12 REST APIs that we analysed in our experiment.
Through the implemented clients, we invoked a total of 115 methods from
the service interfaces to access resources and received the responses from servers.
Then, we applied the detection algorithms on the REST requests and responses
and reported any existing patterns or antipatterns using our SOFA framework.
We manually validated the detection results to identify the true positives and
to find false negatives. The validation was performed by two professionals who
have knowledge of REST and were not involved in the implementation and
experiment. We provided them with the descriptions of the REST (anti)patterns and the sets
of all requests and responses collected during the service invocations. We used
precision and recall to measure our detection accuracy. Precision is the
ratio between the true detected (anti)patterns and all detected (anti)patterns.
Recall is the ratio between the true detected (anti)patterns and all existing true
(anti)patterns.
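Concretely, with TP the validated true positives, FP the detected instances rejected during validation, and FN the existing instances missed by the detection, the two measures reduce to the following (a plain restatement of the definitions above, not code taken from SOFA):

```python
def precision(tp, fp):
    """True detected (anti)patterns over all detected (anti)patterns."""
    return tp / (tp + fp) if (tp + fp) else 1.0

def recall(tp, fn):
    """True detected (anti)patterns over all existing true (anti)patterns."""
    return tp / (tp + fn) if (tp + fn) else 1.0

# Example: Ignoring Status Code in Table 5 has 1 validated true positive out of
# 2 detections, i.e., precision(1, 1) == 0.5 (50%).
```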
4.4 Results
Table 5 presents detailed detection results for the eight REST antipatterns and five
REST patterns. The table reports the (anti)patterns in the first column, followed
by the analysed REST APIs in the following twelve columns.
Fig. 4. Bar-plots of the detection results for eight antipatterns and five patterns. (APIs
are followed by the number of method invocations in parentheses. The acronyms correspond
to the (anti)pattern name abbreviations, and the numbers represent their detected
instances.)
For each REST API
and for each (anti)pattern, we report: (1) the total number of validated true
positives with respect to the total detected (anti)patterns by our algorithms,
i.e., the precision, in the first row and (2) the total number of detected true
positives with respect to the total existing true positives, i.e., the recall, in the
following row. The last two columns show, for all APIs, the average precision-
following row. The last two columns show, for all APIs, the average precision and
recall and the total detection time for each (anti)pattern. The detailed results
on all the test cases, i.e., 115 methods from the 12 REST APIs, are available on our
web site1.
Ignoring Status Code (ISC, 2 instances) and Misusing Cookies (MC, 3 instances) were not significantly
observed among the 115 tested methods.
As for the REST patterns, Content Negotiation (CN, 70 instances) and Entity
Linking (EL, 62 instances) were the most frequently applied by REST developers.
The Content Negotiation pattern supports the ability to represent REST resources in
diverse formats (implemented by REST developers) as requested by the clients.
The Entity Linking pattern enables clients to follow links provided by the servers.
Furthermore, some APIs also applied the Response Caching (RC, 13 instances) and
End-point Redirection (ER, 2 instances) patterns.
Overall, REST APIs that follow patterns tend to avoid the corresponding antipatterns,
and vice-versa. For example, BestBuy and Facebook are involved in 0 and 8
instances, respectively, of the Forgetting Hypermedia antipattern; however,
these APIs are involved in 11 and 21 instances of the corresponding Entity Linking pattern.
Moreover, the DropBox, Alchemy, YouTube, and Twitter APIs had 27 instances of
the Ignoring Caching antipattern, but they were involved in only 8 instances of the
corresponding Response Caching pattern. Finally, we found the Facebook, DropBox,
BestBuy, and Zappos APIs involved in only 3 instances of the Ignoring MIME Types
antipattern, while they are involved in more than 55 instances of the corresponding
Content Negotiation pattern.
In general, among the 12 analysed REST APIs with 115 methods tested and
eight antipatterns, we found that Twitter (32 instances of four antipatterns), DropBox
(40 instances of four antipatterns), and Alchemy (19 instances of five antipatterns)
are more problematic, i.e., contain more antipatterns than the others (see
Figure 4). On the other hand, considering the five REST patterns, we found that Facebook
(49 instances of four patterns), BestBuy (22 instances of two patterns), and
YouTube (15 instances of three patterns) are better designed, i.e., involve more
patterns than the others (see Figure 4).
Table 5. Detection results of the eight REST antipatterns and five REST patterns obtained
by applying the detection algorithms on the 12 REST APIs: Alchemy (7), BestBuy (12),
Bitly (3), CharlieHarvey (4), DropBox (15), Facebook (29), Musicgraph (8), Ohloh (3),
TeamViewer (8), Twitter (10), YouTube (9), and Zappos (7); numbers in parentheses show
the total test methods for each API, 115 in total. For each (anti)pattern, the table reports
the average precision (p), the average recall (r), and the total detection time over all APIs.

REST Antipatterns                p                 r                 Detection Time
Breaking Self-descriptiveness    86/86 (100%)      86/88 (98.21%)    21.31s
Forgetting Hypermedia            36/38 (94.58%)    36/36 (100%)      19.54s
Ignoring Caching                 33/33 (100%)      33/33 (100%)      18.99s
Ignoring MIME Types              39/39 (100%)      39/39 (100%)      19.39s
Ignoring Status Code             1/2 (50%)         1/3 (25%)         21.22s
Misusing Cookies                 3/3 (100%)        3/3 (100%)        19.1s
Tunnelling Through GET           5/11 (17.86%)     5/5 (100%)        28.26s
Tunnelling Through POST          5/5 (100%)        5/5 (100%)        28.64s

REST Patterns                    p                 r                 Detection Time
Content Negotiation              71/71 (100%)      71/71 (100%)      19.63s
Entity Linking                   65/65 (100%)      65/66 (98.81%)    19.90s
End-point Redirection            2/2 (100%)        2/2 (100%)        20.36s
Entity Endpoint                  10/10 (100%)      10/10 (100%)      23.06s
Response Caching                 13/13 (100%)      13/13 (100%)      19.23s

Average                          369/378 (89.42%)  369/374 (94%)     21.43s
them as non-standard practice. This leads to a precision of 100% and a recall
of 98.21% for this detection.
Any RESTful interaction is driven by hypermedia, by which clients interact
with application servers via URL links provided by the servers in resource
representations [7]. The absence of such an interaction pattern is known as the Forgetting
Hypermedia antipattern [16], which was detected in eight APIs, namely Bitly,
DropBox, Facebook, and so on (see Table 5). Among the 115 methods tested,
we found 38 instances of this antipattern. Moreover, the REST APIs that do not
exhibit this antipattern, e.g., Alchemy, BestBuy, and Ohloh, apply the corresponding
Entity Linking pattern [6] well, which is a good practice. This observation
suggests that, in practice, developers sometimes do not provide hyper-links in
resource representations. As for the validation, 36 instances of the Forgetting
Hypermedia antipattern were manually validated; therefore, we have an average
precision of 94.58% and a recall of 100%. For the Entity Linking pattern, the manual
validation confirmed 66 instances whereas we detected a total of 65 instances,
all of which were true positives. Thus, we had an average precision of 100% and
a recall of 98.81%.
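A minimal sketch of what such an entity-link heuristic can look like is given below; it flags a response as Forgetting Hypermedia when neither the headers nor the body carry any link, and as Entity Linking otherwise. The header names and the URL pattern are our assumptions, not the exact rules used by SOFA.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")  # assumed notion of an entity link

def has_entity_links(response_headers, response_body):
    """Does the response expose hypermedia links in its headers or body?"""
    # Location/Link headers are typical places for resource locations.
    if any(name.lower() in ("location", "link") for name in response_headers):
        return True
    # Otherwise look for URLs inside the (JSON or XML) body.
    return bool(URL_PATTERN.search(response_body or ""))

def classify_hypermedia(response_headers, response_body):
    """Entity Linking when links are present, Forgetting Hypermedia otherwise."""
    return ("Entity Linking"
            if has_entity_links(response_headers, response_body)
            else "Forgetting Hypermedia")
```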
Caching helps developers implement high-performance and scalable REST
services by limiting repetitive interactions; not applying it properly violates
one of the six REST principles [7]. REST developers widely ignore the caching capability
by using the Pragma: no-cache or Cache-Control: no-cache headers in requests,
which forces the application to retrieve duplicate responses from the servers.
This bad practice is known as the Ignoring Caching antipattern [16]. In contrast, the
corresponding pattern, Response Caching [6], supports response cacheability. We
detected six REST APIs that explicitly avoid the caching capability, namely Alchemy,
DropBox, Ohloh, and so on (see Table 5). On the other hand, cacheability is
supported by YouTube and Zappos, which were detected as instances of the Response
Caching pattern. The manual analysis of the requests and responses also confirmed
these detections, and we had an average precision and recall of 100%.
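As with the previous heuristic, a simple header check is enough to flag these two cases; the sketch below is our own illustration of the rule stated above, not SOFA's implementation, and the cacheability indicators on the response side are assumed.

```python
def ignores_caching(request_headers):
    """Ignoring Caching: the request forces no-cache behaviour through the
    Pragma or Cache-Control headers, as described above."""
    pragma = request_headers.get("Pragma", "").lower()
    cache_control = request_headers.get("Cache-Control", "").lower()
    return pragma == "no-cache" or "no-cache" in cache_control

def supports_response_caching(response_headers):
    """Response Caching: the server declares cacheability; max-age, Expires,
    and ETag are assumed indicators, not the exact rule set used by SOFA."""
    cache_control = response_headers.get("Cache-Control", "").lower()
    return ("max-age" in cache_control
            or "Expires" in response_headers
            or "ETag" in response_headers)
```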
As future work, we plan to generalise our findings to other REST APIs. However,
we tried to minimise the threat to the external validity of our results by
performing experiments on 12 REST APIs and by invoking and testing 115 methods
from them. The detection results may vary based on the heuristics defined for
the REST (anti)patterns. To minimise the threats to internal validity, we
made sure that every invocation received responses from the servers with the correct
request URI and that client authentication was performed when necessary. Moreover, we
tested all the major HTTP methods in REST, i.e., GET, DELETE, PUT, and POST, on
resources to minimise the threat to internal validity. Engineers may have different
views and different levels of expertise on REST (anti)patterns, which may
affect the definition of heuristics. We attempted to lessen the threat to construct
validity by defining the heuristics after a thorough review of the existing literature
on REST (anti)patterns. We also involved two professionals in the intensive
validation of the results. Finally, the threat to reliability concerns the
possibility of replicating this study. To minimise this threat, we provide all the
details required to replicate the study, including the heuristics, client requests,
and server responses, on our web site1.
Acknowledgements. The authors thank Abir Dilou for initiating the study.
This study is supported by NSERC (Natural Sciences and Engineering Research
Council of Canada) and FRQNT research grants.
References
1. Bennett, K., Layzell, P., Budgen, D., Brereton, P., Macaulay, L., Munro, M.:
Service-based Software: The Future for Flexible Software. In: Proceedings of Sev-
enth Asia-Pacific Software Engineering Conference, pp. 214–221 (2000)
2. Daigneau, R.: Service Design Patterns: Fundamental Design Solutions for
SOAP/WSDL and RESTful Web Services. Addison-Wesley (November 2011)
3. Demange, A., Moha, N., Tremblay, G.: Detection of SOA Patterns. In: Basu, S.,
Pautasso, C., Zhang, L., Fu, X. (eds.) ICSOC 2013. LNCS, vol. 8274, pp. 114–130.
Springer, Heidelberg (2013)
4. Edwards, M.: Service Component Architecture (SCA). OASIS, USA (April 2011)
5. Erl, T.: SOA Design Patterns. Prentice Hall PTR (January 2009)
6. Erl, T., Carlyle, B., Pautasso, C., Balasubramanian, R.: SOA with REST: Prin-
ciples, Patterns & Constraints for Building Enterprise Solutions with REST. The
Prentice Hall Service Technology Series from Thomas Erl (2012)
7. Fielding, R.T.: Architectural Styles and the Design of Network-based Software
Architectures. PhD thesis (2000)
8. Fredrich, T.: RESTful Service Best Practices: Recommendations for Creating Web
Services (May 2012)
9. Moha, N., Palma, F., Nayrolles, M., Conseil, B.J., Guéhéneuc, Y.-G., Baudry,
B., Jézéquel, J.-M.: Specification and Detection of SOA Antipatterns. In: Liu,
C., Ludwig, H., Toumani, F., Yu, Q. (eds.) Service Oriented Computing. LNCS,
vol. 7636, pp. 1–16. Springer, Heidelberg (2012)
10. Nayrolles, M., Moha, N., Valtchev, P.: Improving SOA Antipatterns Detection in
Service Based Systems by Mining Execution Traces. In: 20th Working Conference
on Reverse Engineering, pp. 321–330 (October 2013)
11. Palma, F., Nayrolles, M., Moha, N., Guéhéneuc, Y.G., Baudry, B., Jézéquel, J.M.:
SOA Antipatterns: An Approach for their Specification and Detection. Interna-
tional Journal of Cooperative Information Systems 22(04) (2013)
12. Pautasso, C.: Some REST Design Patterns (and Anti-Patterns) (October 2009),
http://www.jopera.org/node/442
13. Penta, M.D., Santone, A., Villani, M.L.: Discovery of SOA Patterns via Model
Checking. In: 2nd International Workshop on Service Oriented Software Engineer-
ing: In Conjunction with the 6th ESEC/FSE Joint Meeting, IW-SOSWE 2007, pp.
8–14. ACM, New York (2007)
14. RFC2822: Internet Message Format by Internet Engineering Task Force. Technical
report (2001)
15. Seinturier, L., Merle, P., Rouvoy, R., Romero, D., Schiavoni, V., Stefani, J.B.: A
Component-Based Middleware Platform for Reconfigurable Service-Oriented Ar-
chitectures. Software: Practice and Experience 42(5), 559–583 (2012)
16. Tilkov, S.: REST Anti-Patterns (July 2008),
http://www.infoq.com/articles/rest-anti-patterns
17. Tilkov, S.: RESTful Design: Intro, Patterns, Anti-Patterns (December 2008),
http://www.devoxx.com/
18. Vinoski, S.: Serendipitous Reuse. IEEE Internet Computing 12(1), 84–87 (2008)
How Do Developers React
to RESTful API Evolution?
1 Introduction
Nowadays, on-line users can conduct various tasks, such as posting text on Twitter1,
through web applications or services. With the rapid emergence of REpresentational
State Transfer (REST) and the high demand for an engaging on-line experience
from end-users, software organizations such as Twitter are willing to
open their applications as RESTful web service APIs described in plain HTML
pages [7]. Client code developers often integrate the web APIs into their applications
or services to accelerate their development or to avoid low-level
programming tasks [4]. Typically, web API providers (e.g., Twitter) evolve
their APIs for various reasons, such as adding new functionality [10]. Client
code developers have no control over web API evolution and have to evolve their
client applications accordingly.
1 https://twitter.com/
show that the change type of Adding New Methods accounts for the highest percentage of
the total changes (i.e., 41.52% of the total changes belong to Adding New
Methods).
RQ2. Which types of changes trigger more questions from client code
developers?
To understand the difficulties developers face in adopting different types of API
changes, the first step is to identify which types of changes trigger a larger
volume of discussion (i.e., in terms of the number of questions) regarding the
changes from client code developers. We extract the question posts related to the
change types of web API evolution from StackOverflow, an online platform for
developers to share software development experience. We investigate the differences
between change types in terms of the average number of question posts
per change from developers. Our empirical results show that the change type
of Adding New Methods attracts more questions than other change types do.
RQ3. Which types of changes bring discussion on posted questions
from developers?
When a question post attracts developers’ attention and is worth discussing,
the post starts receiving more answers, comments, and views from developers.
Analyzing such question posts regarding API changes helps to understand
which types of problems developers are stuck on. We use the number of
answers, the number of views, and the score received by a question
as measurements of how much a question is discussed by developers. In RQ2, the
impact of API changes on the volume of discussion is explored. In this question,
we investigate the impact of API changes on the quality of discussion. Our
empirical results show that Delete Method generates the most discussed questions
in the developer community, and the change type Adding New Methods draws
the questions with higher view counts from developers.
The rest of this paper is organized as follows. Section 2 presents the back-
ground of this study. Section 3 introduces the empirical study setup and re-
search questions. Section 4 discusses threats to validity. Section 5 summarizes
the related literature. Finally, Section 6 concludes the paper and outlines some
avenues for future work.
2 Background
In this section, we introduce the basic structure of web APIs and Question & An-
swer websites.
Fig. 1. A labeled screen shot of a question titled “What are REST resources?” in
StackOverflow
answers, comments and a score. StackOverflow opens its dataset on-line through
the StackExchange Data Explorer3. In this paper, we extract developers’ posts from
StackOverflow through the StackExchange Data Explorer to analyze developers’
discussions regarding the evolution of RESTful services.
3 Empirical Study
In this section, we first introduce the study setup. Then we discuss the research
questions of our study. For each question, we introduce its motivation, the
analysis approach, and the findings.
3 https://data.stackexchange.com/
Data Collection
Collecting web APIs: To study web API changes, we need different versions of
a web API. We extracted the list of the most popular APIs in ProgrammableWeb4
and used them as the candidates of our study. Then, we used the following criteria
to choose web APIs as the subject APIs in our study: 1) the web APIs have
at least two versions, and the API documentation of each version is available
on-line; 2) the web APIs are from different application domains; 3) the web
APIs are from different companies, since we aim to study the various change
types of different development teams. For each selected web API, we study all
of the publicly available versions of the API. We identify the types of web API
changes by comparing the differences between subsequent versions of each API.
We downloaded the web pages describing the web API methods for comparison.
Table 1 shows the information about our subject web APIs.
Collecting the developers’ on-line discussion on web API changes in Stack-
Overflow: We composed SQL scripts and ran these scripts to retrieve posts
related to web APIs through StackExchange Data Explorer5. For each web API,
we defined a keyword and conducted a wild-card search to mine all of the posts
tagged with labels including the keyword [11]. We considered only posts with the
matching labels to exclude possible irrelevant posts. For example, we retrieved
46,646 posts with labels including the “twitter” keyword (e.g., twitter, twitter-
api). Table 2 shows the keywords used and the number of posts retrieved for
each API. All of the posts were retrieved on May 1st, 2014.
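A sketch of such a wild-card tag query is shown below; the table and column names follow the publicly documented Data Explorer schema (the Posts table stores tags as "<twitter><twitter-api>..."), and the exact SQL scripts used in the study are not reproduced here.

```python
# Wild-card tag search over the public StackExchange schema, run through the
# Data Explorer or against the Stack Exchange data dump loaded locally.
QUERY_TEMPLATE = """
SELECT Id, CreationDate, Title, Tags, Score, ViewCount, AnswerCount
FROM Posts
WHERE PostTypeId = 1            -- questions only
  AND Tags LIKE '%{keyword}%'   -- matches e.g. <twitter> and <twitter-api>
"""

def build_query(keyword):
    """Return the SQL used to mine question posts whose tags contain `keyword`."""
    return QUERY_TEMPLATE.format(keyword=keyword)

# print(build_query("twitter"))
```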
In this sub-section, we present our three research questions. For each research
question, we introduce its motivation, the analysis approach, and the findings.
RQ1. What are the change types of web APIs during evolution?
Motivation. Usually, web API providers make various changes to their
APIs between two subsequent versions, such as adding new functionality or fixing
bugs [10]. The API client developers have to study the API changes and
incorporate them into the client applications accordingly. Understanding
the types of changes is useful to help client code developers conduct
code migration [10]. In this question, we explore the change types during API
evolution.
Analysis Approach. To answer this question, we conduct the following
steps: Step 1: We first identify API changes among subsequent versions of web
APIs. We manually compare the API documentation, such as migration guides or
reference documents, of subsequent versions of an API. We process two versions
of a web API in the following steps:
1. We cross-reference two versions of the API and identify any changes made for
all of the API methods. Such changes are considered as API-level changes.
4 http://www.programmableweb.com/
5 https://data.stackexchange.com/
Step 2: We summarize and classify the changes identified in Step 1. Then, in order
to identify new change types, we compare the summarized change types of the web
APIs with those reported for web APIs [10], Java APIs [3], and WSDL services [5][6].
Step 3: We summarize and count the frequency of each change type to identify
the common practices.
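As a concrete illustration of Steps 1 and 3, the sketch below compares the method lists of two documented versions and counts the simplest method- and parameter-level change types; the study compared versions manually, and the function and field names here are our own.

```python
from collections import Counter

def diff_method_level(old_methods, new_methods):
    """Count basic change types between two documented API versions; each
    argument maps a method name to its parameter list.
    (A renamed method shows up here as one deletion plus one addition.)"""
    changes = Counter()
    changes["Add Method"] += len(new_methods.keys() - old_methods.keys())
    changes["Delete Method"] += len(old_methods.keys() - new_methods.keys())
    for name in new_methods.keys() & old_methods.keys():
        changes["Add Parameter"] += len(set(new_methods[name]) - set(old_methods[name]))
        changes["Delete Parameter"] += len(set(old_methods[name]) - set(new_methods[name]))
    return changes

# Two invented documentation snapshots:
# v1 = {"GET statuses/show": ["id"], "GET users/search": ["q"]}
# v2 = {"GET statuses/show": ["id", "trim_user"], "GET users/lookup": ["user_id"]}
# diff_method_level(v1, v2)
# -> Counter({'Add Method': 1, 'Delete Method': 1, 'Add Parameter': 1})
```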
Findings. In total, we identify 21 change types on the eleven studied web
APIs. We divide them into three groups: 1) the API-level change types made on
all of the methods; 2) the method-level change types made on specific methods;
3) the parameter-level change types made on parameters of methods.
Table 3 shows our summary of change types at the API level. All of the change
types in Table 3 are observed based on the comparison of subsequent versions
of a web API. However, API providers can support several versions and make
changes to all of the running versions at the same time. For example, on Nov.
2nd, 2012, Twitter changed the format of the “withheld in countries” field from a
comma-separated JSON string to an array of uppercase strings [14], which is a
breaking change applicable to all versions of Twitter.
Table 4 shows our summary of change types at the method level, and Table 5 shows
our summary of change types at the parameter level. We found that the functionality
of several API methods was merged into the functionality of one method, or
the functionality of a method was divided into several methods in the newer version.
Table 6 shows that the average proportion of changed elements (i.e., Methods
and Parameters) between two consecutive versions of a web API is 82%. In contrast,
only 30% of Java API elements and 41% of WSDL service elements (i.e., Types
and Operations) change between two consecutive versions [3][6].
RESTful web APIs are more change-prone than JAVA APIs and WSDL
services during API evolution.
Table 7 summarizes the number of changes of each change type. The four most
common practices are: Add Method, Delete Method, Change Method Name and
Add Parameter. In total, we identify 460 changes, and 191 of them (i.e., 41.52%)
belong to Add Method. The API-level change types are not included in
the frequency counting, since they are typically applicable to all of the methods,
and including their frequency would skew the results.
The change type of Add Method makes up the largest proportion (i.e.,
41.52%) of total changes in the studied RESTful services
RQ2. Which types of changes trigger more questions from client code
developers?
Motivation. When encountering problems in software development, developers
start using crowd-sourced resources such as StackOverflow instead of using mail-
ing lists or project-specific forums [11]. Therefore, analyzing on-line discussion
regarding API changes in StackOverflow is useful to understand the developers’
difficulties in dealing with different types of API changes. The first step in
understanding the developers’ challenges is to identify the change types drawing more
discussion (i.e., in terms of the number of questions) than others in StackOverflow.
By knowing the change types triggering more discussion, RESTful API
providers can allocate their resources to approach such change types carefully and
help client code developers during client code migration.
Analysis Approach. To answer this question, we analyze the StackOverflow
question posts from Twitter, Blogger, Tumblr and OpenStreetMap, because they
have relatively more posts than the other APIs in our dataset in Section 3.1. To
identify which change types trigger more questions from developers, we conduct
the following steps:
Step 1: we link API changes with StackOverflow posts in the following steps:
1) we obtain a mapping from method-level and parameter-level changes to API
HTTP methods from RQ1 and search for API-related posts containing the API
method names; 2) we remove any special characters, such as “/”, in a method
name; 3) some methods can be linked with several change types, in which case
we cannot identify the change type to which a StackOverflow post belongs.
To avoid introducing bias in our results, we remove such methods from our
analysis (there are only very few such methods). We obtain a mapping chain
from a change type to a set of API methods to a set of posts.
Step 2: we compute the average number of questions concerning each method.
In this question, we only study the method and parameter level change types,
because such types of changes can be linked with StackOverflow posts through
API method names and introduce less noise in our data than API-level changes.
Step 3: we compute the Mann-Whitney Test and the Cliff’s Delta, a non-
parametric effect size measure [8], to compare the distribution of questions for
different types of changes (i.e., only change types at method and parameter
level) in our study. We follow the guidelines in [8] to interpret the effect size
Fig. 2. Average Number of Questions per Method with Different Change Types
Table 8. Questions per Method of Change Types: Mann-Whitney Test (adj. p-value)
and Cliff’s Delta (d) Between Different Change Types. Only Significant Results and
Major Change Types are Reported.
Test adj. p-value d
Add Method vs Delete Parameter < 0.01 -0.12 (Small)
Add Method vs Change Method Name < 0.01 0.15 (Small)
Add Method vs Add Parameter < 0.01 -0.57 (Large)
Add Method vs Change Parameter Name < 0.01 -0.48 (Large)
Add Method vs Delete Method < 0.01 0.07 (Small)
Delete Method vs Add Parameter < 0.01 -0.39 (Medium)
Delete Method vs Change Parameter Name < 0.01 -0.36 (Medium)
Delete Parameter vs Add Parameter < 0.01 -0.33 (Medium)
Delete Parameter vs Change Parameter Name < 0.01 -0.29 (Medium)
values: small for d < 0.33 (positive as well as negative values), medium for
0.33 ≤ d < 0.474 and large for d ≥ 0.474.
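For reference, this comparison can be reproduced along the following lines (a sketch using SciPy; the per-method question counts are placeholders, and the p-values would still need the adjustment reported in the tables, e.g. a multiple-comparison correction):

```python
from itertools import combinations
from scipy.stats import mannwhitneyu

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def magnitude(d):
    """Thresholds used above: small < 0.33, medium < 0.474, large otherwise."""
    d = abs(d)
    return "small" if d < 0.33 else "medium" if d < 0.474 else "large"

def compare_change_types(questions_per_method):
    """questions_per_method: change type -> list of question counts,
    one entry per method affected by that change type."""
    for a, b in combinations(questions_per_method, 2):
        xs, ys = questions_per_method[a], questions_per_method[b]
        _, p = mannwhitneyu(xs, ys, alternative="two-sided")
        d = cliffs_delta(xs, ys)
        print(f"{a} vs {b}: p={p:.3f}, d={d:.2f} ({magnitude(d)})")
```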
Findings. Fig. 2 shows that the change type of Add Method draws on average 63
questions per change, which is higher than the other change types. Add Method draws
1.3 times more questions than Delete Method, with a statistically significant difference
(p-value < 0.01), as shown in Table 8. Furthermore, the method-level change
types trigger more questions than the parameter-level change types. Summarizing
the results in Fig. 2 and Table 8, we find that
Add Method draws more questions than other change types.
RQ3. Which types of changes bring discussion on posted questions
from developers?
Motivation. When a question is worth discussing and attracts developers’
attention for various reasons (e.g., the question is hard to solve), the
question post starts receiving more answers, comments, and views from developers.
Identifying the change types drawing such questions is helpful to understand
which change types are of more concern to developers. In RQ2, the impact of change
Table 10. Discussed Questions of Change Types: Mann-Whitney Test (adj. p-value)
and Cliff’s Delta (d) Between Different Change Types. Only Significant Results are
Reported.
Average Score Test adj. p-value d
Delete Method vs Add Method <0.01 -0.92 (large)
Delete Method vs Add Parameter <0.01 -3.41 (large)
Delete Method vs Change Parameter Name <0.01 -2.44 (large)
Add Method vs Add Parameter <0.01 -2.93 (large)
Average View Count Test adj. p-value d
Add Method vs Delete Method <0.01 -1.21 (large)
Add Method vs Add Parameter <0.01 -2.37 (large)
Add Method vs Change Method Name <0.01 -2.62 (large)
Delete Method vs Delete Parameter <0.01 -1.83 (large)
Average Answer Count Test adj. p-value d
Delete Method vs Add Method <0.01 -1.01 (large)
Delete Method vs Add Parameter <0.01 -3.73 (large)
Delete Method vs Change Method Name <0.01 -3.86 (large)
Delete Method vs Delete Parameter <0.01 -1.82 (large)
Add Method vs Change Method Name <0.01 -2.32 (large)
Add Method vs Add Parameter <0.01 -2.22 (large)
Questions related to Add Method have a higher view count than those related to the other types.
Table 10 shows that statistical tests confirm the above three findings. The results
suggest that when dealing with the change type of Delete Method, developers
can find more diverse solutions and communicate more with other developers.
However, when learning new methods in a newer version, developers have a hard
time finding a solution. Our study supports the fact that, since deleted
methods can break client applications, client code developers feel the pressure
to update their client applications and start searching for a solution intensively.
Questions related to Delete Method are the most relevant and discussed, in
terms of higher score values and more answers.
4 Threats to Validity
This section discusses the threats to validity of our study following the guidelines
for case study research [16].
Construct validity threats concern the relation between theory and observation.
In this paper, the construct validity threats mainly stem from the human
judgment involved in the identification and categorization of API changes during
web API evolution. Many research studies (e.g., [3][10]) have conducted manual
analysis of API changes. We set guidelines before conducting the manual study and
took care not to violate any of them, to avoid large fluctuations in the
results if the experiment conductor changes.
External validity threats concern the generalization of our findings. In this paper,
we only analyze the dataset from StackOverflow. Although StackOverflow
is one of the top Question & Answer websites for developers and many research
studies (e.g., [11][1]) have been conducted on StackOverflow alone, further analysis
is needed to claim that our findings on developers’ reactions generalize
well to different Question & Answer websites, different developer populations, and
other programming forums.
Reliability validity threats concern the possibility of replicating this study. We
attempt to provide all the necessary details to replicate our study. The posts
from developers are publicly available on Stack Exchange Data Explorer6. All
the documentation of our subject web APIs is available on-line.
5 Related Work
In this section, we summarize the related work on API changes and developer
discussion in StackOverflow.
Analysis of evolution of Java APIs, WSDL services, and web APIs
Several studies (e.g., [10][3][5]) have studied API evolution. Li et al. [10]
conduct an empirical study on classifying web API changes. They identify 16
API change patterns. This study is the most similar one to our analysis in research
question 1; however, our study is based on more web APIs and identifies
7 more change types. Although Li et al. [10] discuss the potential troubles for
developers during the migration, they do not conduct any empirical study on the
developers’ reactions to the changes. Dig et al. [3] conduct a manual analysis on the
classification of API changes during Java API evolution. They mainly focus on breaking
changes due to refactoring during the evolution. Fokaefs et al. [5] conduct
an empirical study on the changes, potentially affecting client applications,
of WSDL web service interface evolution, using VTracker to differentiate XML
schemas by comparing different versions of a web service. Furthermore, Fokaefs et
al. [6] introduce a domain-specific differencing method called WSDarwin to compare
interfaces of web services described in WSDL or WADL. Romano et al. [13]
propose a tool called WSDLDiff that analyzes fine-grained changes by comparing
the versions of WSDL interfaces. However, none of the above studies analyzes how
developers react to web API evolution.
6 https://data.stackexchange.com/
In the future, we plan to include more web APIs in our analysis. Furthermore,
we want to conduct a fine-grained analysis of source code changes in client
applications.
Acknowledgments. The authors would like to thank Pang Pei and Nasir Ali
for their valuable comments on this work.
References
1. Barua, A., Thomas, S.W., Hassan, A.E.: What are developers talking about?
An analysis of topics and trends in Stack Overflow. Empirical Software Engineering
19(3), 619–654 (2014)
2. Blank, S.: API integration pain survey results (2014), https://www.yourtrove.
com/blog/2011/08/11/api-integration-pain-survey-results (accessed on
May 18, 2014)
3. Dig, D., Johnson, R.: How do APIs evolve? A story of refactoring. Journal of Software
Maintenance and Evolution: Research and Practice 18(2), 83–107 (2006)
4. Espinha, T., Zaidman, A., Gross, H.G.: Web API growing pains: Stories from client
developers and their code. In: 2014 Software Evolution Week-IEEE Conference on
Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE),
pp. 84–93. IEEE (2014)
5. Fokaefs, M., Mikhaiel, R., Tsantalis, N., Stroulia, E., Lau, A.: An empirical study
on web service evolution. In: 2011 IEEE International Conference on Web Services
(ICWS), pp. 49–56. IEEE (2011)
6. Fokaefs, M., Stroulia, E.: WSDarwin: Studying the evolution of web service systems.
In: Advanced Web Services, pp. 199–223. Springer (2014)
7. Gomadam, K., Ranabahu, A., Nagarajan, M., Sheth, A.P., Verma, K.: A faceted
classification based approach to search and rank web apis. In: IEEE International
Conference on Web Services, ICWS 2008, pp. 177–184. IEEE (2008)
8. Grissom, R.J., Kim, J.J.: Effect sizes for research: A broad practical approach.
Lawrence Erlbaum Associates Publishers (2005)
9. Li, H., Xing, Z., Peng, X., Zhao, W.: What help do developers seek, when and how?
In: 2013 20th Working Conference on Reverse Engineering (WCRE), pp. 142–151.
IEEE (2013)
10. Li, J., Xiong, Y., Liu, X., Zhang, L.: How does web service API evolution affect
clients? In: 2013 IEEE 20th International Conference on Web Services (ICWS),
pp. 300–307. IEEE (2013)
11. Linares-Vásquez, M., Bavota, G., Di Penta, M., Oliveto, R., Poshyvanyk, D.: How
do API changes trigger Stack Overflow discussions? A study on the Android SDK. In:
Proceedings of the 22nd International Conference on Program Comprehension, pp.
83–94. ACM (2014)
12. Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., Hartmann, B.: Design lessons
from the fastest q&a site in the west. In: Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems, pp. 2857–2866. ACM (2011)
13. Romano, D., Pinzger, M.: Analyzing the evolution of web services using fine-grained
changes. In: 2012 IEEE 19th International Conference on Web Services (ICWS),
pp. 392–399. IEEE (2012)
14. Twitter: Changes to withheld content fields (2014), https://blog.twitter.com/
2012/changes-withheld-content-fields (accessed on May 1, 2014)
15. Wikipedia: Web API (2014), http://en.wikipedia.org/wiki/Web_API (accessed
on May 19, 2014)
16. Yin, R.K.: Case study research: Design and methods. Sage Publications (2014)
How to Enable Multiple Skill Learning
in a SLA Constrained Service System?
1 Introduction
This need for multi-faceted workers entails not only retaining the right skills,
but also transforming the skills of the workers as dictated by the changing busi-
ness requirements. For example, in the IT services domain, it may happen
that, due to a transformation in the customer’s environment, a provider has to
quickly upskill its team. A current team of 10 people who only had expertise
in the Solaris operating system needs to be transformed into a team where both
the Windows and Solaris operating systems are supported. While one
option for the provider is to replace some of the Solaris personnel with new hires
having Windows skills, a better option is to impart new skills to the existing
service workers (SWs) such that they collectively meet the target skill requirements.
There are several approaches for imparting new skills: (a) classroom training,
where SWs dedicate training time for a certain duration and incur costs, (b)
shadowing, where SWs observe the work of skilled SWs and learn, or (c) on-job
training, where SWs pick up skills while actually doing the work. The nature
of work in services involves substantial interactions not only with the customer
but also with colleagues. Also, carrying out a task is far more difficult than
simply knowing how to carry it out. Hence, on-job training following minimal
classroom training is the approach commonly adopted by service providers. As
of today, very little understanding exists of how on-job training should
be carried out. For example, how does the skill of a SW evolve when one or
multiple new learnings are imparted? Does this evolution of target skills change
when (s)he already has some existing skills? How do multiple learnings interfere
with each other? Can parallel learnings also reinforce each other? How should on-job
training be planned and carried out such that the impact on customer service in
terms of service level agreements (SLAs) is minimized?
In this paper, we address the problem of incorporating on-job training into IT
business processes. This internalizes many of the questions raised above
for on-job training. Our main contributions are:
1) We have developed an on-job training model based on the Dreyfus model
of skill acquisition [10], the Learn-Forget Curve Model (LFCM) [15], and the theory
of interference in learning [19]. This model can be used to create a standalone
training process or can be embedded into existing business processes. The main
components of the model are the service time estimation model, the skill
distribution policies, and the dispatch heuristics.
2) The on-job training model has been woven into the IT incident management
(ITIM) business process as a case study.
3) We have carried out an evaluation of the proposed model using discrete event
simulation.
The evaluation focuses on understanding (i) the role of interference and skill
multiplicity while imparting training for multiple skills simultaneously and (ii)
how dispatch (work assignment) policies influence learning. The rest of the
paper is organized as follows. Section 2 describes the learning model based on
service times during on-job training. Section 3 explains the skill distribution
and dispatch heuristics components. Section 4 explains how the on-job training
components get integrated into a business process. The evaluation of the training
model as part of a business process is presented in Sections 4.1 and 4.2. Related
work is discussed in Section 5, and we conclude in Section 6.
On-job training seems to be an effective way to bridge the gap between the
new and existing skills. In this scenario, a service worker gets to work on tasks
that require the specific new skills (s)he is expected to be upskilled on, and the
improvement in service times is the main observable measure to quantify learning.
While initially the tasks will take longer to complete, as (s)he works on them the
service time to complete tasks becomes smaller. Specifically, the authors of [15] have
shown that the reduction in service time with experience follows a power law1.
However, if there are breaks between the new-skill tasks assigned to a worker,
forgetting may happen. Also, if multiple new skills are being learnt by a worker,
learning interference may creep in among the multiple skills. Both forgetting and
interference slow down the learning process and affect the service time. Keeping
this in mind, the service time model has been designed drawing upon the existing
work on learning and skill acquisition, namely the LFCM and the Dreyfus model,
respectively. We briefly explain the factors that play a role in the service time
estimation below.
Learning Effect on Service Time: During on-job training, when people initially
take on new-skill work, service times are longer. Assuming the difference
between skills can be mapped to a gap function, we state that the larger the gap
between the skills, the longer the service times become. This is modeled as the gap
learning factor, or glf.
Forgetting Effect on Service Time: Time gaps between task executions
[15] cause forgetting, which in turn has the effect of longer service times. Forgetting
is proportional to the time gap [13].
Interference Effect on Service Time: When a service worker works on multiple
new skills within the same span of time, the learning processes for these new skills
interfere with each other. This interference results in lower recall accuracy of
the other skills [19] and hence in longer service times.
Skill Level Gap Effect on Service Time: The Dreyfus model [10] of skill acquisition
models the progression levels as Novice, Advanced Beginner, Competent,
Proficient, and Expert. The interpretation of each level is provided as a
qualitative translation of the level into task performance. This model
is very appropriate for on-job training. The service times are lowest at the expert
level and highest at the novice level. The time taken by a SW at any level to
complete an SR (service request) is stochastic and has been shown [1] to follow a
lognormal distribution for a single skill.
1 While this is true for manufacturing, the same principle can be applied to any industry where there is rhythmic and repeatable work, for example, IT service management.
We now present a service time model that takes into account the above factors.
This represents the skill progression model of a worker as multiple new learnings
are imparted to her.
Let Ts be the service time required by a SW for an SR with a particular skill
requirement while working for the nth time, n > 1, on the same skill, where the SW is
working on the skill after a time gap. TBS is the base service time, which denotes
the time taken by a service worker when working on the skill for the first time;
TBS is defined for each SW skill level. Let dist be the gap between the required skill
level of the SR and the current skill level possessed by the SW; if the latter is higher
or equal, dist is 0. The base service time is computed as TBS(1 + log(1 + dist)).
Equations 1, 2 and 3 show the learning model while factoring in the time gap
[13] only. Here, timeGap is the time spent on resolving SRs with other skills and
timeUsed is the time spent on resolving SRs with the relevant skill. The learning
factor (lf) is a constant [15] that depends on the learning pace of the SW. The
gap learning factor (glf) incorporates lf and γ, 0 ≤ γ < 1, which is a function
of timeGap and timeUsed.
of timeGap and timeU sed.
glf = lf ∗ (1 − γ) (2)
There is sufficient evidence in the literature to indicate that interference
also causes forgetting. To include interference in this model, we used the results
from [7], which show that the effect of interference is equivalent to a stretched
time gap, and modified Equation 1 into Equation 4.
γ = log(1 + (timeGap + interferenceMeter) / timeUsed) / log n    (4)
InterferenceMeter keeps track of the number of times the SW has worked
on other interfering skills since (s)he last worked on the current skill. Each increment
denotes a unit of time. This meter is reset to zero every time the SW works
on the skill. If a SW works on the interfered skill less often, the effect of the
interfering skills is larger, and vice-versa. However, as n increases, the impact of
forgetting and interference reduces.
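To make the interplay of these factors concrete, the sketch below implements the quantities that are fully specified above: the distance-adjusted base service time, γ as in Equation 4, and the gap learning factor of Equation 2. It is a reading aid under those definitions, not the authors' simulation code; the clamp on γ and the numbers in the usage comment are our own assumptions.

```python
import math

def base_service_time(t_bs, dist):
    """T_BS(1 + log(1 + dist)): the base time grows with the skill-level gap."""
    return t_bs * (1 + math.log(1 + dist))

def gamma(time_gap, interference_meter, time_used, n):
    """Equation 4: forgetting grows with the interference-stretched time gap
    relative to the time spent on the relevant skill, and shrinks as n grows."""
    if n <= 1 or time_used <= 0:
        return 0.0
    g = math.log(1 + (time_gap + interference_meter) / time_used) / math.log(n)
    return min(max(g, 0.0), 0.999)  # defensive clamp: the model assumes 0 <= gamma < 1

def gap_learning_factor(lf, g):
    """Equation 2: glf = lf * (1 - gamma)."""
    return lf * (1 - g)

# Invented numbers: third repetition of a skill after a gap, with interference.
# g = gamma(time_gap=8, interference_meter=4, time_used=12, n=3)   # ~0.63
# gap_learning_factor(0.3, g)                                      # ~0.11 < lf
```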
The quantitative model for skill progression corresponding to the Dreyfus
qualitative model is obtained using time and motion studies. These studies provide
a threshold on the quantum of work to be done in order to be eligible to move to
the next skill level. In this work, we assume that the SWs are provided basic classroom
training for skills that they have never worked on before, to make on-job
training feasible. We shall now describe how to carry out on-job training.
3.2 Dispatching
SRSLATime denotes the timestamp by which the SR should be completed in order
to meet the SLA, SRcompTime denotes the timestamp when the SR is completed, and
SLAremain denotes the time remaining to meet the specified SLA. A positive
value of SLAremain indicates an SLA success; otherwise, it indicates an SLA miss.
SRcompTime depends on the expected service time of the SW who is working on the SR.
higher chances of increasing the skill level of the service worker at the cost of
increasing the probability of missing SLA.
Algorithm 2 formally describes the Learning Priority policy for assigning a
service request SR to an appropriate service worker w among the pool of
available service workers SWList. First, the Learning Priority policy checks for
all the service workers whose expected service time is less than the remaining
service time and who are not overloaded. Among them, it finds the least loaded service
worker w with the highest value of the worst-case service time maxWorstServiceTime,
as sketched below.
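A compact sketch of this selection logic follows; the worker attributes and the expected-service-time function are placeholders for the quantities defined in Section 2 and Algorithm 2, and the names are illustrative rather than the authors' implementation.

```python
def learning_priority_dispatch(sr, workers, expected_service_time):
    """Learning Priority policy (after Algorithm 2): among workers that are not
    overloaded and whose expected service time fits within the remaining SLA
    time, pick the least loaded one with the highest worst-case service time.

    `workers` yields objects with illustrative `curload` and `overloaded`
    attributes; `expected_service_time(w, sr)` returns a pair (expected,
    worst_case) computed from the learning model of Section 2."""
    candidates = []
    for w in workers:
        expected, worst_case = expected_service_time(w, sr)
        if not w.overloaded and expected <= sr.sla_remain:
            # Sort key: least loaded first, then highest worst-case time.
            candidates.append(((w.curload, -worst_case), w))
    if not candidates:
        return None  # no eligible worker; the SR stays pending in the queue
    candidates.sort(key=lambda item: item[0])
    return candidates[0][1]
```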
SLA Priority Policy: Work dispatch based on the SLA Priority policy mimics
the on-the-ground reality of existing service systems. It simply dispatches the SR to
the first available SW who has the skills to carry out the work. There is no other
consideration, such as skill level or learning progress. Note that the SLA Priority policy
does not compute the expected service time according to the learning curve model
of Section 2 while choosing the SW, as is done by the Skill-Level and Learning Priority
policies, thus differing from them in a crucial way.
problem described by the user. The incident, once assigned to a work group, is
picked up by an available resource within the work group who then updates the
assignment information indicating the ownership of the incident. The incident
enters the resolution stage. The resource further analyzes the problem in the
ticket, communicates with the business user for more input on the problem, and
resolves the problem. Once an incident is resolved, the resource restores the
functionality of the system as required by the business user. The business user
validates and confirms the service provided by the resource. Once confirmed by
the business user, the incident is closed.
The extension to the existing ITIM process for training is primarily in the
dispatch task. The incident, once assigned to a group, is assessed by a dispatch
engine for the skills that it requires and the load on the SWs that have been
identified to work on those skills. Subsequently, the expected service time for the
shortlisted SWs is computed according to the model in Section 2. Then, after
considering the SLA requirements, the engine selects an incident owner following
one of the proposed policies, and the incident is dispatched. Once the incident is
resolved, the parameters that track the learning of the SWs are updated, and so
is the SLA measurement. The closure of the incident is initiated in parallel.
are assumed to be: as long as a provider completes 95% of all SRs received every
month within the specified hours, the quality of service is deemed adequate. We also
assume that each SR requires a single skill, such as unix, windows, db2, etc. The main
components are described below.
Global Queue: A global queue is maintained which accepts all the incoming
SRs with different priorities and different skill requirements. For every SR, we
maintain information such as the priority, the SLA deadline, the skill and corresponding
level requirement, and the status: inqueue (default), pending, inservice, rework
or completed. This global queue serves as the input to the dispatching module.
Dispatch Module: The dispatching module accepts SRs from the global queue one
by one and uses the list of all SWs to search for the most suitable SW
for the current SR based on a policy described in Section 3. A policy remains in
force for the period of simulation (say, a month). After identifying the SW, the SR is
sent to the SW’s queue and the status of the SR is changed from inqueue to pending.
The SRPendingQueueLoad is updated as described in Equation 6.
Service Workers’ Queue: A service worker’s queue can have SRs of skill levels
different from his/her current level. The load value in such a situation is normalized
by having more complex SRs contribute more to the load than the lower-level
ones. We assume the normal load of an SR for a SW is equal to 20, and Equation
6 is used to calculate the load due to different levels.
Let curload of a SW denote the load due to pending SRs in the queue. We
calculate whether a SW is overloaded or not as follows:
overloaded = yes, if curload ≥ 100; no, otherwise
Observation 1 (On Skill Load vs. Interference): Skill load and interference
are equally strong deterrents to learning. We carried out experiments with a
target skill profile having a high number of skills to be learned uniformly, such
that the skills do not interfere (Table 3, Scheme S-2), and compared the learning
time with a target profile where the skills to be learned do not exceed two but
these skills interfere with each other (Table 3, Scheme S-3). In both scenarios, we
find a similar pattern of skill level progression. In Scheme S-1, we kept both
the skill load and the interference load low. The entries in the table show the
percentage of SWs at levels L1 to L4 at the start of weeks 1, 10, 20 and 40.
These numbers are for the Learning Priority policy. However, the observation holds for
the other two policies as well.
Fig. 2. Skill progression: High/Low skill load and high/low interference load
demonstrates this for two types of workloads. It can be seen that, with the Learning
Priority policy, at any given time more than 90% of SWs are distributed over two
consecutive levels, but this is not so for the other policies. This is indicative of
uniform learning, where the level gap between SWs is never too large at any given
point in time. Skill-Level Priority, however, follows non-uniform learning, and
SWs reach the highest level faster compared to the other policies. SLA Priority
is neither uniform nor greedy. The simulations were run for all combinations of
workload, and the same trends were observed. We have presented results for only
two distributions for the sake of brevity.
Applying the Insights: The insights obtained above from the simulation runs can
be applied in practice by service systems (SSs) to achieve the desired behavior. We summarize some of
the important practical considerations that emerged from the experiments: i) an SS
should adopt the Learning Priority policy if uniform learning is more desirable; ii)
if the goal is to promote a competitive environment, the Skill-Level Priority policy
is most advisable, provided SLAs are relaxed; iii) for efficient learning, the number of new
skills to be learned per worker should be minimized, and an attempt should be
made to minimize the interfering skills to be learnt per worker.
5 Related Work
In this section, we situate our work within prior research on team and organiza-
tional learning theories, resource planning, human skill evolution and learning.
One of the most recent works that studies multi-skill requirements in service
delivery is [8]. That work studies the problem of which skills are optimal to train people
on, while in this paper we study how to train people on multiple skills.
Learning has also been looked at in the context of human resource planning [4],
[3], where there is a need to forecast the future skill mix and levels required, as
well as in the context of dynamic environments like call centers [12], where both
learning and turnover are captured to solve the long- and medium-term staffing
problem.
There has been a significant body of work focused on teams and their learning.
About two decades ago, researchers [25,11] studied the effects of organizational
structure (e.g., hierarchy, team) on metrics like problem solving,
cost, competition and drive for innovation, and also the effect [6] of learning and
turnover on different structures. At the same time, collaboration and communication
within teams have also seen a comprehensive body of research. Carley’s [5]
theory of group stability postulates a relationship between an individual’s current
knowledge and her behavior. She also found that a group’s interaction increases
6 Conclusions
We conclude that distribution and dispatch policies play a crucial role in balanc-
ing SLA success and upskilling when performing on-job training. The presence
of interference slows down the learning rate, and so does the number of skills to
be learned. As part of future work, we plan to formalize the interference model and
study semantic facilitation during training. We also plan to study the training
method in the context of other business processes where on-job training is practiced.
References
8. Dasgupta, G.B., Sindhgatta, R., Agarwal, S.: Behavioral analysis of service delivery
models. In: Proceedings of the Service-Oriented Computing - 11th International
Conference, ICSOC 2013, Berlin, Germany, December 2-5, pp. 652–666 (2013)
9. Dibbern, J., Krancher, O.: Individual knowledge transfer in the transition phase of
outsourced softwaremaintenance projects. In: ISB-IBM Service Science Workshop
(2012)
10. Dreyfus, S.E., Dreyfus, H.L.: A Five-Stage Model of the Mental Activities Involved
in Directed Skill Acquisition. Tech. rep. (February 1980), http://stinet.dtic.
mil/cgi-bin/GetTRDoc?AD=ADA084551&Location=U2&doc=GetTRDoc.pdf
11. Jablin, F.M., Putnam, L.L., Roberts, K.H., Porter, L.W. (eds.): Handbook of Or-
ganizational Communication: An Interdisciplinary Perspective. Sage (1986)
12. Gans, N., Zhou, Y.P.: Managing learning and turnover in employee staffing. Oper.
Res. 50(6) (2002)
13. Jaber, M.Y., Bonney, M.: Production breaks and the learning curve: The forget-
ting phenomenon. Applied Mathematical Modelling 20(2), 162–169 (1996), http://
www.sciencedirect.com/science/article/pii/0307904X9500157F
14. Jaber, M.Y., Kher, H.V., Davis, D.J.: Countering forgetting through training and
deployment. International Journal of Production Economics 85, 33–46 (2003)
15. Jaber, M.Y., Sikstrom, S.: A numerical comparison of three potential learning and
forgetting models. International Journal of Production Economics 92(3) (2004)
16. Knox, W.B., Stone, P.: Reinforcement learning from simultaneous human and
MDP reward. In: Proceedings of the 11th International Conference on Autonomous
Agents and Multiagent Systems, AAMAS 2012, vol. 1 (2012)
17. Liemhetcharat, S., Veloso, M.: Modeling and learning synergy for team formation
with heterogeneous agents. In: Proceedings of the 11th International Conference on
Autonomous Agents and Multiagent Systems, AAMAS 2012, vol. 1, pp. 365–374
(2012)
18. Liu, R., Agarwal, S., Sindhgatta, R.R., Lee, J.: Accelerating collaboration in task
assignment using a socially enhanced resource model. In: Daniel, F., Wang, J.,
Weber, B. (eds.) BPM 2013. LNCS, vol. 8094, pp. 251–258. Springer, Heidelberg
(2013)
19. Mensink, G.J., Raaijmakers, J.G.W.: A model for interference and forgetting.
Psychological Review 95(4), 434–455 (1988)
20. Nembhard, D.A., Uzumeri, M.V.: Experiential learning and forgetting for manual
and cognitive tasks. International Journal of Industrial Ergonomics 25, 315–326
(2000)
21. Sikstrom, S., Jaber, M.Y.: The power integration diffusion (pid) model for produc-
tion breaks. Journal of Experimental Psychology 8, 118–126 (2002)
22. Spohrer, J., Maglio, P., Bailey, J., Gruhl, D.: Steps toward a science of service
systems. Computer 40(1), 71–77 (2007)
23. Subagdja, B., Wang, W., Tan, A.H., Tan, Y.S., Teow, L.N.: Memory formation,
consolidation, and forgetting in learning agents. In: Proceedings of the 11th In-
ternational Conference on Autonomous Agents and Multiagent Systems, AAMAS
2012, vol. 2 (2012)
24. Urieli, D., MacAlpine, P., Kalyanakrishnan, S., Bentor, Y., Stone, P.: On optimizing
interdependent skills: A case study in simulated 3d humanoid robot soccer. In:
Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS
2011) (May 2011)
25. Williamson, O.E.: The economics of organization: The transaction cost approach.
American Journal of Sociology (1981)
ADVISE – A Framework for Evaluating Cloud
Service Elasticity Behavior
1 Introduction
One of the key features driving the popularity of cloud computing is elasticity,
that is, the ability of cloud services to acquire and release resources on demand, in
response to fluctuating runtime workloads. From the customer's perspective, resource
auto-scaling can minimize task execution time without exceeding a given budget.
From the cloud provider's perspective, elasticity provisioning contributes to maximizing
financial gain while keeping customers satisfied and reducing
administrative costs. However, automatic elasticity provisioning is not a trivial
task.
A common approach, employed by many elasticity controllers [1, 2], is to
monitor the cloud service and (de-)provision virtual instances when a metric
threshold is violated. This approach may be sufficient for simple service models
This work was supported by the European Commission in terms of the CELAR FP7
project (FP7-ICT-2011-8 #317790).
but, when considering large-scale distributed cloud services with various interdependencies,
a much deeper understanding of their elasticity behavior is required.
For this reason, existing work [2, 3] has identified a number of elasticity control
processes to improve the performance and quality of cloud services, while additionally
attempting to minimize cost. However, a crucial question remains
unanswered: which elasticity control processes are the most appropriate for a
cloud service in a particular situation at runtime? Both cloud customers and
providers can benefit from insights such as how the addition of a new instance
to a cloud service will affect the throughput of the overall deployment and of
each individual part of the cloud service. Thus, knowledge of cloud service elasticity
behavior under various controls and workloads is of paramount importance for
elasticity controllers to improve runtime decision making.
To this end, a wide range of approaches relying on service profiling or learning
from historic information [3–5] have been proposed. However, these approaches
limit their decisions to evaluating only low-level VM metrics (e.g., CPU and
memory usage) and do not support elasticity decisions based on cloud service
behavior at multiple levels (e.g., per node, tier, or entire service). Additionally,
current approaches only evaluate resource utilization, without considering elasticity
as a multi-dimensional property composed of three dimensions (cost, quality, and
resource elasticity). Finally, existing approaches do not consider the outcome of
a control process on the overall service, where enforcing a control process on the
wrong part of the cloud service can often lead to side effects such as increased
cost or decreased performance of the overall service. In our previous work,
we focused on modeling current and previous behavior with the concepts
of elasticity space and pathway [6], or on using different algorithms to determine
enforcement times in observed behavior (e.g., with change-point detection), but
without modeling the expected behavior of different service parts over time.
In this paper, we focus on addressing the limitations above by introducing the
ADVISE (evAluating clouD serVIce elaSticity bEhavior ) framework, which esti-
mates cloud service elasticity behavior by utilizing different types of information,
such as service structure, deployment strategies, and underlying infrastructure
dynamics, when applying different external stimuli (e.g., elasticity control pro-
cesses). At the core of ADVISE is a clustering-based evaluation process which
uses these types of information for computing expected elasticity behavior, in
time, for various service parts. To evaluate the effectiveness of ADVISE, experiments
were conducted on a public cloud platform with a testbed comprising two different
cloud services. Results show that ADVISE outputs the expected elasticity
behavior, in time, for different services with a low estimation error rate. ADVISE
can be integrated by cloud providers alongside their elasticity controllers to improve
decision quality, or used by cloud service providers to evaluate and
understand how different elasticity control processes impact their services.
The rest of this paper is structured as follows. In Section 2 we model relevant
information regarding cloud services. In Section 3 we present the elasticity
behavior evaluation process. In Section 4 we evaluate the effectiveness of the ADVISE
framework. In Section 5 we discuss related work. Section 6 concludes the paper.
M_a^{SP_i}[start, end] = { M_a(t_j) | SP_i ∈ ServiceParts, j = start, ..., end }    (1)

Behavior_{SP_i}[start, end] = { M_a^{SP_i}[start, end] | M_a ∈ Metrics(SP_i) }    (2)

Behavior_{CloudService}[start, end] = { Behavior_{SP_i}[start, end] | SP_i ∈ ServiceParts(CloudService) }    (3)
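As a reading aid, the following Python sketch (an illustration added here, not part of ADVISE) organizes monitored data according to Equations (1)-(3): a window of each metric time series per service part, grouped into the behavior of a service part and of the whole cloud service. The metric and service-part names are made up for the example.

def metric_window(series, start, end):
    # M_a^{SP_i}[start, end]: values of metric a between the two time indices
    return series[start:end + 1]

def service_part_behavior(metrics, start, end):
    # Behavior_{SP_i}[start, end]: all metric windows of one service part
    return {name: metric_window(s, start, end) for name, s in metrics.items()}

def cloud_service_behavior(service_parts, start, end):
    # Behavior_CloudService[start, end]: behaviors of all service parts
    return {sp: service_part_behavior(m, start, end) for sp, m in service_parts.items()}

monitoring = {
    "ApplicationServerTier": {"throughput": [10, 12, 9, 14], "cpu": [55, 60, 58, 70]},
    "VideoStorageBackend":   {"latency": [5, 6, 8, 7], "cpu": [40, 42, 80, 75]},
}
print(cloud_service_behavior(monitoring, 1, 3))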
Elasticity capabilities (ECs) are the set of actions associated with a cloud service,
which a cloud service stakeholder (e.g., an elasticity controller) may invoke, and
which affect the behavior of a cloud service. Such capabilities can be exposed
by: (i) different SP s, (ii) cloud providers, or (iii) resources which are supplied by
cloud providers. An EC can be considered as the abstract representation of API
calls, which differ amongst providers and cloud services. Fig. 2 depicts the different
subsets of ECs provided for an exemplary web application when deployed
on two different cloud platforms (e.g., Flexiant and an OpenStack private cloud),
as well as the ECs exposed by the cloud service and the installed software. In
each of the two aforementioned cloud platforms, the cloud service needs to run
on a specific environment (e.g., Apache Tomcat web server), and all these capa-
bilities, when enforced by an elasticity controller, will have an effect on various
parts of the cloud service. For instance, even if not evident at first sight, elas-
ticity capabilities of a web server topology of the cloud service could also affect
the performance of its database backend.
regardless of whether the ECP is application-specific or has no
apparent link to other SPs. In fact, as we show in Section 4, the impact of various
ECPs over different SPs and over the entire cloud service is quite interesting.
RTS_{M_a}^{SP_i} = M_a^{SP_i}[ x − (δ + ECP_time)/2 , x + (δ + ECP_time)/2 ],    (4)

[ECP_startTime, ECP_endTime] ⊂ [ x − (δ + ECP_time)/2 , x + (δ + ECP_time)/2 ]

where x is the ECP index and δ is the length of the period we aim to evaluate.
As part of the input pre-processing phase, we represent each window of length δ + ECP_time as a multi-dimensional point, BP in Equation 5, in the n-dimensional Euclidean space (see Fig. 5), where the value for dimension t(j) is the timestamp j of the current RTS:

BP_a^{SP_i}[j] = RTS_{M_a}^{SP_i}[t(j)], j = 0, ..., n,   BP : M^{SP} → R^n,   n = δ + ECP_time    (5)
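The pre-processing of Equations (4)-(5) can be illustrated with the following sketch (again an assumption about the data layout, not ADVISE code), which cuts a window of length δ + ECP_time centered on the ECP enforcement index out of a metric time series and treats it as a point in R^n that can later be clustered.

def behavior_point(series, ecp_index, ecp_duration, delta):
    """Return the window of length delta + ecp_duration centered on ecp_index."""
    half = (delta + ecp_duration) // 2
    start, end = max(0, ecp_index - half), min(len(series), ecp_index + half)
    return series[start:end]            # BP: one coordinate per timestamp

cpu_usage = [40, 42, 45, 80, 85, 70, 55, 50, 48, 47]   # monitored metric
point = behavior_point(cpu_usage, ecp_index=3, ecp_duration=2, delta=4)
print(point)   # [40, 42, 45, 80, 85, 70] -> a point in R^6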
4 Experiments
To evaluate the effectiveness of the proposed approach, we have developed the
ADVISE framework1, which incorporates the previously described concepts. The current
ADVISE version gathers various types of information to populate the elasticity
dependency graph: (i) structural information, from TOSCA service
descriptions; (ii) infrastructure and application performance information, from the
JCatascopia [9] and MELA [6] monitoring systems; (iii) elasticity information
regarding ECPs, from the rSYBL [8] elasticity controller, for which we developed an
enforcement plugin that randomly enforces ECPs on cloud services. To evaluate
the functionality of the ADVISE framework, we established a testbed comprising
two services deployed on the Flexiant public cloud. On both cloud services,
we enforce random ECPs exposed by different SPs. We do not use a rational
controller, since we are interested in estimating the elasticity behavior for all
SPs as a result of enforcing both good and bad elasticity control decisions.
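The random enforcement used in the testbed could look roughly like the sketch below; this is a hypothetical loop, since the actual plugin is part of the rSYBL controller and is not shown here. ECPs exposed by the service parts are chosen uniformly at random, with a pause between enforcements so that the resulting elasticity behavior can be monitored.

import random
import time

ECPS = {
    "ApplicationServerTier": ["ECP1_scale_in", "ECP2_scale_out"],
    "VideoStorageBackend":   ["ECP3_scale_in", "ECP4_scale_out"],
}

def enforce(service_part, ecp):
    # Placeholder for the provider/API calls that run the actual action sequence.
    print(f"enforcing {ecp} on {service_part}")

def random_enforcement(rounds=5, pause_seconds=1):
    for _ in range(rounds):
        part = random.choice(list(ECPS))
        enforce(part, random.choice(ECPS[part]))
        time.sleep(pause_seconds)   # let monitoring capture the resulting behavior

random_enforcement(rounds=3, pause_seconds=0)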
ADVISE currently receives monitoring information in two formats: (i) as
simple *.csv files, or (ii) by automatically pulling monitoring information from
MELA. ADVISE can be used both in the service profiling/pre-deployment phase and
at runtime, for various service types, whenever monitoring information and
enforced ECPs are available for generating estimations for various metrics of
service parts.
1 Code & documents: http://tuwiendsg.github.io/ADVISE
Table 1. Elasticity control processes available for the two cloud services

Video Service:
- ECP1, Scale In Application Server Tier: (i) stop the video streaming service, (ii) remove instance from HAProxy, (iii) restart HAProxy, (iv) stop JCatascopia Monitoring Agent, (v) delete instance
- ECP2, Scale Out Application Server Tier: (i) create new network interface, (ii) instantiate new virtual machine, (iii) deploy and configure video streaming service, (iv) deploy and start JCatascopia Monitoring Agent, (v) add instance IP to HAProxy, (vi) restart HAProxy
- ECP3, Scale In Distributed Video Storage Backend: (i) select instance to remove, (ii) decommission instance data to other nodes (using Cassandra nodetool API), (iii) stop JCatascopia Monitoring Agent, (iv) delete instance
- ECP4, Scale Out Distributed Video Storage Backend: (i) create new network interface, (ii) instantiate new instance, (iii) deploy and configure Cassandra (e.g., assign token to node), (iv) deploy and start JCatascopia Monitoring Agent, (v) start Cassandra

M2M DaaS:
- ECP5, Scale In Event Processing Service Unit: (i) remove service from HAProxy, (ii) restart HAProxy, (iii) recursively remove virtual machine
- ECP6, Scale Out Event Processing Service Unit: (i) create new network interface, (ii) create new virtual machine, (iii) add service IP to HAProxy configuration file
- ECP7, Scale In Data Node Service Unit: (i) decommission node (copy data from virtual machine to be removed), (ii) recursively remove virtual machine
- ECP8, Scale Out Data Node Service Unit: (i) create new network interface, (ii) create virtual machine, (iii) set ports, (iv) assign token to node, (v) set cluster controller, (vi) start Cassandra
The first cloud service is a three-tier web application providing video streaming
services to online users, comprised of: (i) an HAProxy Load Balancer, which distributes
client requests (i.e., download or upload video) across application servers;
(ii) an Application Server Tier, where each application server is an Apache Tomcat
server containing the video streaming web service; (iii) a Cassandra NoSQL
Distributed Data Storage Backend from which the necessary video content is
retrieved. We have evaluated the ADVISE framework by generating client requests
at a stable rate, where the load depends on the type of the requests and the
size of the requested video, as shown in the workload pattern in Fig. 6.
The second service in our evaluation is a Machine-to-Machine (M2M) DaaS,
which processes information originating from several different types of data sensors
(e.g., temperature, atmospheric pressure, or pollution). Specifically, the
M2M DaaS is comprised of an Event Processing Service Topology and a Data End
Service Topology. Each service topology consists of two service units, one with
a processing goal, and the other acting as the balancer/controller. To stress this
cloud service we generate random sensor event information (see Fig. 6), which is
processed by the Event Processing Service Topology and stored/retrieved from
the Data End Service Topology. Tables 1 and 2 list the ECPs associated with each
SP and the monitoring metrics analyzed for the two cloud services, respectively.
Online Video Streaming Service. Fig. 7 depicts both the observed and the
estimated behavior for the Application Server Tier of the cloud service when a remove
application server from tier ECP occurs (ECP1). At first, we observe that
the average request throughput per application server decreases. This
is due to two possible cases: (i) the video storage backend is under-provisioned
and cannot satisfy the current number of requests, which, in turn, results in
requests being queued; (ii) there is a sudden drop in client requests, which indicates
that the application servers are not utilized efficiently. We observe that
after the scale-in action occurs, the average request throughput and
busy thread number rise, which denotes that this behavior corresponds to
the second case, where resources are now efficiently utilized. ADVISE revealed
an insightful correlation between two metrics to consider when deciding which
ECP to enforce for this behavior.
Similarly, in Fig. 8 we depict both the observed and the estimated behavior
for the Distributed Video Storage Backend when a scale-out action occurs
(add Cassandra node to ring) due to high CPU utilization. We observe that
after the scale-out action occurs, the actual CPU utilization decreases
to a normal value, as also indicated by the estimation. Finally, from Figs. 7 and 8,
we conclude that the ADVISE estimation successfully follows the actual behavior
pattern and that in both cases, as time passes, the curves tend to converge.
M2M DaaS. Fig. 9 shows how an ECP targeting a service unit affects the
entire cloud service. The Cost/Client/h is a complex metric (see Table 2)
which depicts how profitable the service deployment is in comparison to the
current number of users. Although Cost/Client/h is not accurately estimated,
due to the high fluctuation in the number of clients, our approach approximates how
the cloud service would behave in terms of expected time and expected metric
fluctuations. This information is important for elasticity controllers to improve
their decisions when enforcing this ECP, by knowing how the Cost/Client/h
for the entire cloud service would be affected. Although the CPU usage is not
estimated perfectly, since it is a highly oscillating metric that depends on the
CPU usage at each service unit level, knowing the baseline of this metric can
also help in deciding whether this ECP is appropriate (e.g., for some applications
CPU usage above 90% for a period of time might be inadmissible).
ADVISE can estimate the effect of an ECP of one SP on a different SP, even if
apparently unrelated. Fig. 10 depicts an estimation of how the Data Controller
Service Unit is impacted by the data transferred at the enforcement of ECP8.
In this case, the controller CPU usage first drops, since the new node is added to
the ring and much effort goes into transferring data to the new node; it then
rises because reconfigurations are also necessary on the controller,
followed by a slight decrease and stabilization. Therefore, even under a
random workload, ADVISE can give useful insights into how different SPs behave
when enforcing ECPs exposed by other SPs.
Overall, even in random cloud service load situations, the ADVISE framework
analyzes and provides accurate information to elasticity controllers, allowing
them to improve the quality of control decisions with regard to the evolution
of monitored metrics at the different cloud service levels. Without this kind of
estimation, elasticity controllers would need to rely on VM-level profiling information,
even though they have to control complex cloud services. This information, for
each SP, is valuable for controlling the elasticity of complex cloud services, which
expose complex control mechanisms.
5 Related Work
Verma et al. [3] study the impact of reconfiguration actions on system performance.
They examine infrastructure-level reconfiguration actions, focusing on
live migration, and observe that VM live migration is affected by the CPU
usage of the source virtual machine, both in terms of the migration duration
and application performance. The authors conclude with a list of recommendations
on dynamic resource allocation. Kaviani et al. [10] propose profiling as
a service, to be offered to other cloud customers, trying to find trade-offs between
profiling accuracy, performance overhead, and costs incurred. Zhang et
al. [4] propose algorithms for performance tracking of dynamic cloud applications,
predicting metric values such as throughput or response time. Shen et al. [5]
propose the CloudScale framework, which uses resource prediction for automating
resource allocation according to service level objectives (SLOs) with minimum
cost. Based on resource allocation prediction, CloudScale uses predictive
migration for solving scaling conflicts (i.e., when there are not enough resources for
accommodating scale-up requirements) and CPU voltage and frequency scaling for
saving energy with minimum SLO impact. Compared with this research, we
construct our model considering multiple levels of metrics, depending on the
application structure for which the behavior is learned. Moreover, the stress
factors considered are also adapted to the application structure and the elasticity
capabilities (i.e., action types) enabled for that application type. Juve et
al. [11] propose a system which helps automate the provisioning process
for cloud-based applications. They consider two application models, a workflow
application and a data storage case, and show how, for these cases, the
applications can be deployed and configured automatically. Li et al. [12] propose
the CloudProphet framework, which uses resource events and dependencies among
them for predicting web application performance on the cloud.
Compared with the presented research, we focus not only on estimating the
effect of an elasticity control process on the service part with which it is associated,
but also on other parts of the cloud service. Moreover, we estimate and
evaluate the elasticity behavior of different cloud service parts over time, because
we are interested not only in the effect after a predetermined period, but also
in the pattern of the effect that the respective ECP introduces.
References
1. Al-Shishtawy, A., Vlassov, V.: Elastman: Autonomic elasticity manager for cloud-
based key-value stores. In: Proceedings of the 22nd International Symposium on
High-Performance Parallel and Distributed Computing, HPDC 2013, pp. 115–116.
ACM, New York (2013)
2. Wang, W., Li, B., Liang, B.: To reserve or not to reserve: Optimal online multi-
instance acquisition in IaaS clouds. Presented as part of the 10th International
Conference on Autonomic Computing, Berkeley, CA, USENIX, pp. 13–22 (2013)
3. Verma, A., Kumar, G., Koller, R.: The cost of reconfiguration in a cloud. In:
Proceedings of the 11th International Middleware Conference Industrial Track.
Middleware Industrial Track 2010, pp. 11–16. ACM, New York (2010)
4. Zhang, L., Meng, X., Meng, S., Tan, J.: K-scope: Online performance tracking for
dynamic cloud applications. Presented as part of the 10th International Conference
on Autonomic Computing, Berkeley, CA, USENIX, pp. 29–32 (2013)
5. Shen, Z., Subbiah, S., Gu, X., Wilkes, J.: Cloudscale: elastic resource scaling for
multi-tenant cloud systems. In: Proceedings of the 2nd ACM Symposium on Cloud
Computing, SOCC 2011, pp. 5:1–5:14. ACM, New York (2011)
6. Moldovan, D., Copil, G., Truong, H.L., Dustdar, S.: Mela: Monitoring and analyz-
ing elasticity of cloud services. In: 2013 IEEE Fifth International Conference on
Cloud Computing Technology and Science, CloudCom (2013)
7. OASIS Committee Specification Draft 01: Topology and Orchestration Specifica-
tion for Cloud Applications Version 1.0 (2012)
8. Copil, G., Moldovan, D., Truong, H.-L., Dustdar, S.: Multi-level Elasticity Control
of Cloud Services. In: Basu, S., Pautasso, C., Zhang, L., Fu, X. (eds.) ICSOC 2013.
LNCS, vol. 8274, pp. 429–436. Springer, Heidelberg (2013)
9. Trihinas, D., Pallis, G., Dikaiakos, M.D.: JCatascopia: Monitoring Elastically
Adaptive Applications in the Cloud. In: 14th IEEE/ACM International Sympo-
sium on Cluster, Cloud and Grid Computing (2014)
10. Kaviani, N., Wohlstadter, E., Lea, R.: Profiling-as-a-service: Adaptive scalable re-
source profiling for the cloud in the cloud. In: Kappel, G., Maamar, Z., Motahari-
Nezhad, H.R. (eds.) Service Oriented Computing. LNCS, vol. 7084, pp. 157–171.
Springer, Heidelberg (2011)
11. Juve, G., Deelman, E.: Automating application deployment in infrastructure
clouds. In: Proceedings of the 2011 IEEE Third International Conference on Cloud
Computing Technology and Science, CLOUDCOM 2011, pp. 658–665. IEEE Com-
puter Society, Washington, DC (2011)
12. Li, A., Zong, X., Kandula, S., Yang, X., Zhang, M.: Cloudprophet: towards
application performance prediction in cloud. In: Proceedings of the ACM SIGCOMM
2011 Conference, SIGCOMM 2011. ACM, New York (2011)
Transforming Service Compositions
into Cloud-Friendly Actor Networks
1 Introduction
In recent years, the use of private and public clouds for providing services to users
has proliferated as organizations of all sizes embraced the Cloud as an increasingly
technically mature and economically viable way to reach markets and meet quality re-
quirements on the global scale. This is especially true for simple (atomic and back-end)
services that perform individually small units of work. Such services can be distributed
on different cloud nodes, and the requests are routed to different instances based on node
availability and load balancing. The key enablers here are distributed databases, which
offer high availability and distribution at the price of limited, eventual consistency [7].
Service compositions typically need to store their internal state (point of execution
and state variables) along with the domain-specific user data on which they operate.
That is needed because service compositions may be long-running and may involve
The research leading to these results has received funding from the EU FP 7 2007-2013 pro-
gramme under agreement 610686 POLCA, from the Madrid Regional Government under CM
project S2013/ICE-2731 (N-Greens), and from the Spanish Ministry of Economy and Compet-
itiveness under projects TIN-2008-05624 DOVES and TIN2011-39391-C04-03 StrongSoft.
many internal steps, so that it would be inefficient to let them occupy the scarce server
resources (such as threads and database connections) for the whole duration of their
execution, most of which is typically spent waiting for responses from other services.
Besides, saving the composition state in a persistent store allows resumption after server
restarts or network failures. This leads to an essentially event-driven implementation of
most composition engines, where incoming events (messages or timeouts) either create
new composition instances or wake up dormant ones, which perform a short burst of
processing and then either terminate or go to sleep until the next wake-up event.
However, even when eventual consistency on user data is permitted, any inconsis-
tency in the saved internal state of an executing composition may lead to wrong or
unpredictable behavior, and must be avoided. That is why most service composition
engines, such as Apache ODE [5], Yawl [1], and Orchestra [17], rely on a transactional
database to ensure state consistency of long-running processes. This presents a prob-
lem for scaling the SOA’s composition layer in the Cloud, as concurrent processing of
events within the same composition instance implicitly requires access synchronization,
transactional isolation, and locking or conflict detection on a central database.
In this paper, we argue that SOA’s service composition layer can more successfully
exploit the advantages offered by the Cloud if it is based on state messaging rather than
mutable shared state kept in a database. This means basing the design of composition
engines on well-defined, fine-grained, and Cloud-friendly parallelism and distribution
formalisms, rather than “hacking” the existing centralized implementations.
In Section 2, we motivate our approach and outline it in Section 3. Section 4 presents
the details of the approach, and Section 5 gives some implementation notes and presents
an experimental validation of the approach. We close with conclusions in Section 6.
2 Motivation
According to the Reactive Manifesto [4], the ability to react to events, load fluctuations, failures, and user requirements is the distinguishing mark of reactive software components, defined as being readily responsive to stimuli. In this paper, we try to facilitate some of those capabilities in service compositions, starting with service orchestrations with centralized control flow.
Take, for instance, an example currency exchange composition whose pseudo-code is shown in Figure 1. (The syntax and semantics of a sample composition language is given in Section 4.1.) This composition takes a list of amounts in different currencies (in), and tries to find the maximal amount of Euros to which they can be converted, using two external currency conversion services, P and Q. Each amount/currency pair (head(in)) is sent to P and Q in parallel, and the larger of the two responses (x and y) is added to the result (r) before continuing with the rest of the input list (tail(in)). Finally, the result is sent to the caller.

r := 0;
while ¬empty(in) do begin
  join begin
    send head(in) to P;
    receive x from P
  end and begin
    send head(in) to Q;
    receive y from Q
  end;
  r := r + max(x, y);
  in := tail(in)
end;
send r to caller

Fig. 1. Sample composition
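For readers unfamiliar with the composition pseudo-code, the following Python sketch (an illustration only, not the paper's composition language) expresses the same logic: each amount is sent to two conversion services in parallel and the larger response is accumulated. The service stubs P and Q are placeholders for the external services.

from concurrent.futures import ThreadPoolExecutor

def P(amount, currency):   # placeholder conversion service
    return amount * 1.10

def Q(amount, currency):   # placeholder conversion service
    return amount * 1.08

def convert_all(inputs):
    r = 0.0
    with ThreadPoolExecutor(max_workers=2) as pool:
        for amount, currency in inputs:            # while not empty(in)
            fx = pool.submit(P, amount, currency)  # join begin ... end
            fy = pool.submit(Q, amount, currency)  # ... and begin ... end
            r += max(fx.result(), fy.result())     # r := r + max(x, y)
    return r                                       # send r to caller

print(convert_all([(100, "USD"), (50, "GBP")]))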
[Figure: overview of the approach. A service composition is translated into an actor network deployed across cluster nodes (A, B, C) with clustering and load balancing; the execution state is reconstructed by monitoring the messages and pushed to a persistence layer.]

To allow the sample composition to scale both up and out, we need to surpass the
limits posed by the shared state store architecture. One way to achieve that is to turn
the logical flow of control within the composition into a message flow, by transforming
the composition into a network of interconnected stateless, reactive components, each
performing a small unit of work, and forwarding results down the logical control flow.
Ideally, slower components would be automatically pooled and load-balanced in order
to enhance throughput, and/or spread between different nodes in a cluster, depending on
available cloud resources. Instead of being kept in a shared data store, the composition
state would be reconstructed from observed messages and pushed to a persistent store
in a write-only, non-blocking manner.
A major challenge – and the main contribution of this paper – is to find a method
for automatically and transparently transforming compositions into such networks of
readily scalable reactive components. The transformation needs to hide the underly-
ing implementation details and preserve semantics of state variables, complex control
constructs (loops and parallel flows), operations on a rich data model, and message
interchange with external services.
We therefore address a problem similar to that of the Liquid Service Architecture
[8], but target specific issues in the composition layer, based on formal models of
composition semantics and semantically correct transformations.
COND:   {C ∧ φ | π} S1 {φ′ | π′}   and   {¬C ∧ φ | π} S2 {φ′ | π′}   imply   {φ | π} if C then S1 else S2 {φ′ | π′}
SKIP:   {φ | π} skip {φ | π}
LOOP:   {C ∧ φ | π} S {φ | π′}   implies   {φ | π′} while C do S {¬C ∧ φ | π′}
SEQ:    {φ | π} S1 {φ′ | π′}   and   {φ′ | π′} S2 {φ″ | π″}   imply   {φ | π} S1 ; S2 {φ″ | π″}
STATE:  {φ[x\E] | π} x := E {φ | π}
JOIN:   {φ | π} S1 {φ′ | π′}   and   {φ | π} S2 {φ′ | π′}   imply   {φ | π} join S1 and S2 {φ′ | π′}
RECV:   if π′ contains no unread message (_ ← P) and π″ contains no already-read message from P, then
        {φ[x\u] | π′ (u ← P) π″} receive x from P {φ | π′ (u ← P) π″}, with (u ← P) marked as read in the post-condition
SEND:   if φ implies E = u and π″ contains no message of the form (_ → P) or (_ ← P), then
        {φ | π′ π″} send E to P {φ | π′ (u → P) π″}

Fig. 4. Inference rules (axiom schemes) of the abstract semantics for the composition language fragment
A, tail : B}, and the empty list [ ] with {cons : false}. In turn, records and lists can rep-
resent JSON and XML documents. In examples, we use sans serif font to distinguish
field, built-in and other global names from local names in cursive.
We use a form of axiomatic semantics to specify the meaning of control constructs,
data operations, and message exchanges for the language fragment, with the inference
rules (axiom schemes) shown in Figure 4. The pre- and post-conditions are expressed
in the form { φ | π }, where logic formula φ characterizes the composition state as in
the classic Hoare Logic [14,6], and π is a chronological sequence of outgoing messages
(u → P), incoming unread messages (u ← P), and incoming read messages (u ← P),
where u is a datum. The consequence rule, which states that pre-conditions can always
be strengthened as well as post-conditions weakened, is implicit. Condition { φ ! | π ! }
is stronger than { φ | π } iff φ ! logically implies φ (in the data domain theory), and π is
a (possibly non-contiguous) sub-sequence of π ! .
Rules COND, SKIP, LOOP, SEQ, and STATE are direct analogues of the classical Hoare
Logic rules for sequential programs. The abstract semantics of the parallel and-join flow is
given in rule JOIN. The parallel branches are started together, and race conditions on
state variables and partner services are forbidden: variables modified by one branch
cannot be read or modified by the other, and the branches cannot send or receive messages
to or from the same partner service.
In rule RECV, the conditions on π′ and π″ ensure that messages are read in the
order in which they are received, and the condition on π″ in rule SEND ensures the
chronological ordering of outgoing messages. The underscores here denote arbitrary
data. The message exchange is asynchronous, and thus the relative ordering of messages
to/from a partner matters more than the absolute ordering of all messages.
The abstract syntax of a functional actor language is given in Fig. 5, along the lines
of Agha et al. [3] and Varela [18], with some syntactic modifications. Its domain of
values (V) is the same as in the sample composition language, with the addition of actor
references (A) used for addressing messages. The expressions (E) extend expressions
in the composition language with functional and actor-specific constructs.
Function abstractions and applications from λ -calculus are included together with
the special recursion operator rec. The match construct searches for the first clause
T → E where pattern T matches a given value, and then executes E to the right of “→”.
At least one match must be found. Variables in patterns capture matched values, and
each underscore stands for a fresh anonymous variable. The order of fields in record
patterns is not significant, and matched records may contain other, unlisted fields. Sev-
eral common derived syntactic forms are shown in Table 1.
Construct new creates a new actor with the given behavior, and returns its reference.
An actor behavior is a function that is applied to an incoming message. Construct ready
makes the same actor wait for a new message with the given behavior. Construct send
sends the message given by its second argument to the actor reference to which the first
argument evaluates. Finally, stop terminates the actor.
Fig. 6 shows two simple examples of actor behaviors. The sink behavior simply accepts
a message (m) without doing anything about it, and repeats itself. The cell behavior
models a mutable cell with content x. On a 'get' message, the current cell value x is sent
to the designated recipient a, and the same behavior is repeated. On a 'set' message, the
cell forgets the current value x and repeats the same behavior with the new value y. Note how
in both cases the construct rec allows the behavior to refer to itself via b.

sink ≡ rec(λb → λm → ready(b))

cell ≡ rec(λb → λx → λm →
  match m with
    {get: a} → do send(a, x) then ready(b(x));
    {set: y} → ready(b(y))
  end)

Fig. 6. Two simple actor behaviors
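The cell behavior can also be mimicked outside the formal actor language. The sketch below is an informal Python analogue that uses a thread and a queue instead of the operational semantics above; returning the behavior for the next message plays the role of ready(...), and the names Actor, cell, and Printer are our own.

import queue
import threading
import time

class Actor:
    """A tiny mailbox-plus-behavior actor; the behavior returns the behavior
    for the next message (like ready(...)) or None (like stop)."""
    def __init__(self, behavior):
        self._mailbox = queue.Queue()
        self._behavior = behavior
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, msg):
        self._mailbox.put(msg)

    def _run(self):
        while self._behavior is not None:
            self._behavior = self._behavior(self._mailbox.get())

def cell(x):
    """The 'cell' behavior of Fig. 6: 'get' sends the content, 'set' replaces it."""
    def behavior(msg):
        if "get" in msg:
            msg["get"].send(x)       # send(a, x)
            return cell(x)           # ready(b(x))
        if "set" in msg:
            return cell(msg["set"])  # ready(b(y))
        return cell(x)               # ignore unknown messages, keep the behavior
    return behavior

class Printer:
    def send(self, msg):
        print("cell content:", msg)

c = Actor(cell(1))
c.send({"set": 42})
c.send({"get": Printer()})
time.sleep(0.2)                      # give the daemon thread time to process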
W ::= V | λx → E
e ::= W(W) | x | f(W, ..., W) | W ∘ W | rec(W) | RW | W{x : W}
   | match W with T → E [; T → E]* end | new(W) | stop | ready(W) | send(W, W)
E ::= □ | W(E) | E(E) | f(W, ..., W, E, E, ..., E) | W ∘ E | E ∘ E
   | rec(E) | {x : W, ..., x : W, x : E, x : E, ..., x : E} | W{x : E} | E{x : E}
   | match E with T → E [; T → E]* end | new(E) | ready(E) | send(W, E) | send(E, E)

APP:    (λx → E)(w) →λ E[x\w]                                for w ∈ W
REC:    e →λ E[x\e]                                           for e ≡ rec(λx → E)
UPD:    r{x : v} →λ {x : v, ϕ}                                for r ≡ {x : _, ϕ} and v ∈ V
BI:     f^n(v1, ..., vn) →λ [[f^n]](v1, ..., vn)              for f^n ∈ Builtins, n ≥ 0, [[f^n]] : V^n → V
MATCH1: (match v with T → E [; τ] end) →λ Eθ                  if v ∈ V and there exists θ such that v ≡ Tθ
MATCH2: (match v with T → E; τ end) →λ (match v with τ end)   if v ∈ V and no θ with v ≡ Tθ exists
FUN:    α, [E⟨e⟩]_a || μ  →  α, [E⟨e′⟩]_a || μ                 if e →λ e′
STOP:   α, [E⟨stop⟩]_a || μ  →  α || μ
NEW:    α, [E⟨new(w)⟩]_a || μ  →  α, [E⟨a′⟩]_a, [ready(w)]_{a′} || μ    for w ∈ W and fresh a′ ∈ A
READY:  α, [E⟨ready(w)⟩]_a || μ, (a ⇐ v)  →  α, [w(v)]_a || μ           for w ∈ W
SEND:   α, [E⟨send(a′, v)⟩]_a || μ  →  α, [E⟨null⟩]_a || μ, (a′ ⇐ v)    for a′ ∈ A and v ∈ V
After explaining the syntax and semantics of the sample composition language and the
actor language, we now proceed with the crucial step in our approach: the transforma-
tion of a service composition into an actor network.
An actor network is a statically generated set of actor message handling expressions
that correspond to different sub-constructs in a composition. At run-time, actor net-
works are instantiated into a set of reactive, stateless actors, which accept, process and
route information to other actors in the network, so that the operational behavior of the
instantiated network is correct with respect to the abstract semantics of the composition
language. The stateless behavior of the actors in an instantiated network enables their
replacement, pooling, distribution, and load-balancing.
For a composition S, by A[[S]] we denote its translation into an actor network, as a
set whose elements have the form ℓi : Ei or ℓi → ℓj. Here, ℓi and ℓj are (distinct) code
location labels, which are either 0 (denoting composition start), 1 (denoting composition
finish), or are hierarchically structured as ℓ.d, where d is a single decimal digit (denoting
a child of ℓ). Element ℓi : E means that the behavior of the construct at ℓi is realized with
actor behavior E over input message m. Element ℓi → ℓj means that ℓi is an alias for ℓj.
Alias ℓi → ℓj is sound iff A[[S]] contains either ℓj : Ej or ℓi → ℓk such that ℓk → ℓj is
sound. Unsound or circular aliases are not permitted.
A[[S]] is derived from the structure of S, by decomposing it into simpler constructs.
Figure 9 shows the translation A[[S′]] of each construct S′ located at ℓ and immediately
followed by a construct at ℓ′. For the whole composition, A[[S]] is the translation of S at
location 0 with successor 1. Items P, ℓ, ℓ′, ℓ.1, ℓ.2, etc. are treated as string literals in actor expressions.
The translation of skip simply maps the behavior of location ℓ to that of ℓ′, without
introducing new actors. For other constructs, the structure of the incoming message m
is relevant: m.inst holds the unique ID of the composition instance, and m.loc maps
location labels to actor addresses (discussed below).
A[[ skip ]] = { ℓ → ℓ′ }
A[[ x := E ]] = { ℓ : send(fget(m.loc, ℓ′), m{env.x : Ē}{from : ℓ}) }
A[[ if C then S1 else S2 ]] = { ℓ : send(fget(m.loc, if C̄ then ℓ.1 else ℓ.2), m{from : ℓ}) }
    ∪ A[[ S1 ]] at ℓ.1 with successor ℓ′  ∪  A[[ S2 ]] at ℓ.2 with successor ℓ′
A[[ while C do S ]] = { ℓ : send(fget(m.loc, if C̄ then ℓ.1 else ℓ′), m{from : ℓ}) }
    ∪ A[[ S ]] at ℓ.1 with successor ℓ
A[[ S1 ; S2 ]] = { ℓ → ℓ.1 } ∪ A[[ S1 ]] at ℓ.1 with successor ℓ.2  ∪  A[[ S2 ]] at ℓ.2 with successor ℓ′
A[[ send E to P ]] = { ℓ : do send(fget(m.link, P), m{out : Ē}) then send(fget(m.loc, ℓ′), m{from : ℓ}) }
A[[ receive x from P ]] = { ℓ : send(fget(m.link, P), m{in : "x"}{from : ℓ}{to : ℓ′}) }
A[[ join S1 and S2 ]] = { ℓ : let m2 = m{from : ℓ}{loc : fset(m.loc, ℓ.2, new(J[[ S1, S2 ]](m)))}
    in do send(fget(m.loc, ℓ.1.1), m2) then send(fget(m.loc, ℓ.1.2), m2) }
    ∪ A[[ S1 ]] at ℓ.1.1 with successor ℓ.2  ∪  A[[ S2 ]] at ℓ.1.2 with successor ℓ.2

J[[ S1, S2 ]] ≡ λm → λm1 → ready(λm2 → do send(fget(m.loc, ℓ′),
    (if m1.from ≥ ℓ.1.1 then m{env : m1.env{z : m2.env.z}}
     else m{env : m2.env{y : m1.env.y}}){from : ℓ.2}) then stop)
where S2 writes the state variables z̄ and S1 writes the state variables ȳ.

Fig. 9. Translation of composition constructs into actor behaviors (each translation is for a construct located at ℓ with successor location ℓ′)
Further, m.env is a record whose fields are the composition state variables with their
current values, and m.link is a map from available partner service names to references
of the actors which serve as their mailbox interfaces. The initial content of m is set up
upon the reception of the initiating message with which the composition is started. For
simplicity, we assume that m.env.in holds the input message, and that the initiating party
is by convention called caller.
The translation of an assignment uses the built-in fget to fetch the value of m.loc
associated with ℓ′ (as a string literal). That value is the reference of the next actor in
the flow, to which a message is sent with the modified value of the assigned variable x.
With Ē we denote the result of replacing each state variable name y encountered in E
with m.env.y. Here, as in the other translations, we additionally modify the from field in m
to hold the location from which the message is sent.
The translation of the conditional creates two sub-locations, ℓ.1 and ℓ.2, to which it
translates the then- and the else-part, respectively. Then, at run-time, the incoming
message is routed down one branch or the other, depending on the value of the condition C̄
(which is rewritten from C in the same way as Ē from E in assignment). The translation
of the while loop is analogous to that of the conditional. When a sequence is translated,
two sub-locations ℓ.1 and ℓ.2 are created and chained in a sequence.
The translations of the messaging primitives rely on partner links in m.link. For send,
the outgoing message is asynchronously sent to the partner link, wrapped in m.out, and
then the incoming message is forwarded to the next location in the flow. For receive,
the partner mailbox is asked to forward m to ℓ′ when the incoming message becomes
available, by placing it in m.env under the name of the receiving variable.
The most complex behavior is that of the join construct, which needs to create a transient
join node (at ℓ.2) which collects and aggregates the results of both parallel branches before
forwarding them to ℓ′. The branches are translated at ℓ.1.1 and ℓ.1.2. The branches
receive a message m2 whose m2.loc is modified (using the built-in fset) to point to the
transient join node under ℓ.2. Its behavior is defined by J[[ S1, S2 ]]: m is the
original incoming message, and m1 and m2 are messages received from the branches.
The outgoing message is based on m, and inherits env from the first branch to termi-
nate, with the added modifications from the other one: the value of each state variable z
written by S2 (or state variable y written by S1 ) is copied into the resulting environment.
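To make the shape of A[[·]] more concrete, the following hedged sketch (not the paper's implementation) computes the translation of three constructs, skip, assignment, and sequence, as a dictionary from location labels to either an alias or a message handler; the representation of messages and expressions is an assumption.

def translate(stmt, loc, nxt):
    kind = stmt[0]
    if kind == "skip":                       # A[[skip]] = { loc -> nxt }
        return {loc: ("alias", nxt)}
    if kind == "assign":                     # x := E
        _, var, expr = stmt
        def handler(m, var=var, expr=expr, loc=loc, nxt=nxt):
            m = dict(m)
            m["env"] = {**m["env"], var: expr(m["env"])}   # env.x := E-bar
            m["from"] = loc
            return [(m["loc"][nxt], m)]      # send to the next location's actor
        return {loc: ("actor", handler)}
    if kind == "seq":                        # S1 ; S2
        _, s1, s2 = stmt
        net = {loc: ("alias", loc + ".1")}
        net.update(translate(s1, loc + ".1", loc + ".2"))
        net.update(translate(s2, loc + ".2", nxt))
        return net
    raise NotImplementedError(kind)

# Example: r := 0 ; skip, translated for the whole composition (start "0", finish "1").
network = translate(("seq", ("assign", "r", lambda env: 0), ("skip",)), "0", "1")
print(sorted(network))   # ['0', '0.1', '0.2']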
Figure 10 shows the topology of the actor network resulting from the translation
of our example composition, annotated with location labels and the corresponding
composition constructs, with the message flow indicated by arrows. The transient
node is marked with an asterisk, and the dotted nodes correspond to the sequences and
are aliased to the next node in the flow.
[Fig. 11 (schematic): observation of messages at a location j with predecessor i and successor k (cases (a)-(b)), and at a parallel split location with successors k and p (cases (c)-(d)); dashed messages belong to the unstable set.]
instance) is sent to location 1; (ii) φ′ holds on the output message; and (iii) the messages
sent to and received from the external service mailboxes are compatible with π′.
The proof of this theorem is based on structural induction of correctness on the
building blocks of S. For each building block, the operational semantics of the actors
in α (augmented with partner mailbox actors) is validated against the pre- and post-conditions
defined in the abstract semantics of the composition language, applied to the
content and the circulation of messages that belong to the same composition instance.
Note that the behavior rec(λb → λm → do E then ready(b)) with which new actors
are created is fully stateless and repetitive, and thus a single actor can be seamlessly
replaced with a load-balanced and possibly dynamically resizable pool of its replicas
attached to the same location, without affecting the semantics of the instantiation.
By observing all messages sent between actors in the instantiated actor network (with
the addition of partner service mailbox actors), a monitor can keep a current snapshot
of the execution state of each executing instance, distinguished by m.inst. The snapshot
can be represented as a tuple (σ, ς), where σ is the stable and ς the unstable set of
observed messages. The two sets are needed because messages can arrive out of order.
For example, part (a) of Figure 11 treats the case of location j, which needs one
incoming message and produces one outgoing message. When messages come in order, the
incoming message m from i is placed in σ and is subsequently replaced with the outgoing
message m′. It may, however, happen, as in Figure 11(b), that the outgoing message m′
is observed first. In that case, it is placed in ς (indicated by the dashed line). When the
incoming message m is observed, it is discarded, and m′ is moved from ς to σ. Two
analogous cases for a location corresponding to a parallel split node, which sends two
outgoing messages, are shown in Figure 11(c)-(d). In these examples, we tacitly merge
the aliased locations together, and include the partner service mailboxes.
After each observation, the stable set of observed messages σ can be written to a
persistent data store, and used for reviving the execution of the instance in case of
a system stop or crash, simply by replaying the messages from σ . This may cause
Fig. 12. Dynamic behavior of the sample composition deployed as an actor network
repetition of some steps (including the external service invocations), whose completion
has not been observed when the last stable set was committed to the persistent store,
but the messages in σ always represent a complete and consistent instance snapshot.
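A monitor along these lines could be sketched as follows (a simplification we add for illustration, covering only the one-input case of Fig. 11(a)-(b)); the field names and the use of location '0' for the composition start are assumptions.

class SnapshotMonitor:
    """sigma is the stable and zeta the unstable set of observed messages for
    one composition instance (simplified to the one-input case of Fig. 11(a)-(b))."""
    def __init__(self):
        self.sigma = []   # stable set, safe to persist and replay
        self.zeta = []    # outputs observed before their inputs

    def observe(self, msg):
        # msg carries 'from' and 'to' location labels (and would carry m.inst)
        pending = [m for m in self.zeta if m["from"] == msg["to"]]
        if pending:
            # Fig. 11(b): msg is the late input of a location whose output we
            # already hold; discard msg and stabilize that output
            for m in pending:
                self.zeta.remove(m)
                self.sigma.append(m)
        elif msg["from"] == "0" or any(m["to"] == msg["from"] for m in self.sigma):
            # Fig. 11(a): the sender's input is already stable; replace it with
            # the newly observed output
            self.sigma = [m for m in self.sigma if m["to"] != msg["from"]]
            self.sigma.append(msg)
        else:
            self.zeta.append(msg)    # output seen before its input: unstable

mon = SnapshotMonitor()
mon.observe({"from": "0", "to": "0.1"})     # start message, stable
mon.observe({"from": "0.2", "to": "1"})     # observed out of order -> unstable
mon.observe({"from": "0.1", "to": "0.2"})   # late input: discarded, 0.2 -> 1 stabilized
print(mon.sigma, mon.zeta)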
1 The experiment was performed on a MacBook Air with a 1.7 GHz Intel Core i5 and
4 GB of RAM, running Mac OS X 10.9.2, Oracle Java 1.7 update 55, and Akka 2.3.2.
Fig. 13. Limits of performance improvements when increasing the scaling factor
Figure 13 shows how the performance improvements degrade as the common scaling
factor increases. While scaling factors n = 2, n = 4, and n = 10 reduce the overall execution
time almost proportionally, by factors of 1.993, 3.937, and 9.097, respectively, for
15 ≤ n ≤ 20 the reduction factor remains close to 10.7.
actor formalism into which compositions are translated can be used for reasoning about
safety and liveness properties of choreographies involving several orchestrations.
References
1. van der Aalst, W.M.P., ter Hofstede, A.H.M.: YAWL: Yet Another Workflow Language. In-
formation Systems 30(4), 245–275 (2005)
2. Agha, G.: Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press,
Cambridge (1986)
3. Agha, G., Mason, I.A., Smith, S.F., Talcott, C.L.: A foundation for actor computation. Journal
of Functional Programming 7(1), 1–72 (1997)
4. The Reactive Manifesto. Web (September 2013),
http://www.reactivemanifesto.org/
5. Apache Software Foundation: Apache ODE Documentation (2013),
https://ode.apache.org/
6. Apt, K.R., De Boer, F.S., Olderog, E.R.: Verification of sequential and concurrent programs.
Springer (2010)
7. Bailis, P., Ghodsi, A.: Eventual consistency today: Limitations, extensions, and beyond.
Commun. ACM 56(5), 55–63 (2013), http://doi.acm.org/10.1145/2447976.2447992
8. Bonetta, D., Pautasso, C.: An architectural style for liquid web services. In: 2011 9th Working
IEEE/IFIP Conference on Software Architecture (WICSA), pp. 232–241 (June 2011)
9. Cardelli, L., Gordon, A.D.: Mobile ambients. Theoretical Computer Science 240(1), 177–
213 (2000)
10. Fournet, C., Gonthier, G.: The join calculus: A language for distributed mobile programming.
In: Barthe, G., Dybjer, P., Pinto, L., Saraiva, J. (eds.) APPSEM 2000. LNCS, vol. 2395, pp.
268–332. Springer, Heidelberg (2002)
11. Gupta, M.: Akka Essentials. Packt Publishing Ltd. (2012)
12. Hewitt, C.: A universal, modular actor formalism for artificial intelligence. In: IJCAI 1973.
IJCAI (1973)
13. Hewitt, C.: Viewing control structures as patterns of passing messages. Artificial Intelli-
gence 8(3), 323–364 (1977)
14. Hoare, C.A.R.: An axiomatic basis for computer programming. Communications of the
ACM 12(10) (1969)
15. Ibsen, C., Anstey, J.: Camel in Action, 1st edn. Manning Publications Co., Greenwich (2010)
16. Milner, R.: Communicating and mobile systems: The pi calculus. Cambridge University
Press (1999)
17. OW2 Orchestra Team: Orchestra User Guide. Bull-SAS OW2 Consortium (October 2011),
http://orchestra.ow2.org/
18. Varela, C.A.: Programming Distributed Computing Systems: A Foundational Approach. MIT
Press (2013)
A Runtime Model Approach for Data
Geo-location Checks of Cloud Services
1 Introduction
cost goals [5, 23]. These elasticity mechanisms replicate and migrate virtual ma-
chines and services among data centers, which may lead to dynamic re-locations
of personal data. During design-time, changes to service deployments and con-
sequently to geo-locations of data are unknown and have to be checked during
runtime. For instance, two interacting services process personal data and are ini-
tially deployed on cloud data centers within the EU. For performance reasons,
one service is migrated to a cloud data center located outside the EU. When
both services interact after the migration, they exchange personal data across
EU borders and thus violate data geo-location policies.
Privacy checking approaches such as host geo-location [8, 13] consider the
cloud as a black box. These approaches are agnostic to migrations and repli-
cations that may occur behind service interfaces. Approaches on access control
mechanisms [7, 17] neither consider changes of data geo-locations imposed by
migration or replication nor transitive data transfers. In summary, existing ap-
proaches are limited in detecting privacy violations that arise from the combi-
nation of cloud elasticity and service interactions.
In this paper, we systematically analyze cloud service elasticity in combination
with service interactions with respect to potential policy violations. We propose a novel
policy checking approach based on runtime models that covers the identified
cases of policy violations. The proposed runtime models reflect the deployment
and interaction of cloud services and components. The models are updated when
migrations or replications are applied to the reflected cloud applications. By
expressing the privacy policy checks as an st-connectivity problem on the runtime
models, potential data transfers that violate the geo-location policies can be
rapidly determined. The empirical evaluation indicates that the approach is both
effective and performant.
The remainder of the paper is structured as follows. Sec. 2 systematically
analyzes the changes in the cloud infrastructure that need to be considered to
identify data geo-location violations. Sec. 3 discusses related work. Sec. 4
introduces our policy checking approach. In Sec. 5 we evaluate our approach
concerning its effectiveness and performance using the CoCoME case study.
Sec. 6 concludes the paper and provides an outlook on future work.
employ a variant of CoCoME that has been adapted to the cloud (within a
working group of the DFG Priority Programme "Design For Future"3, described,
e.g., in [14]). This variant collects shopping transactions of customers in order to
offer payback discounts, thereby involving the storage and processing of personal
data. In the following, we describe the CoCoME case study in terms of service
and component interactions, the characterization of data, and the definition of
data geo-location policies.
A virtual machine stores data, receives data from other virtual machines, or
transfers data to them. Further, a virtual machine can be migrated or replicated.
3 http://www.dfg-spp1593.de
the personal data) among data centers outside the EU, which violates geo-
location policies. Information required for detecting this case: information
R1-5 is required as in case 2. Furthermore, explicit or implicit information
about transitive data transfers among components is required (R6 ).
Checks covering the three cases have to access the relevant information sum-
marized in Tab. 1.
3 Related Work
the service storing the data (case 1). Moreover, data transfers between the client
services and further services are not covered. Transitive data transfers (case 3)
that may lead to policy violations thus remain undetected.
Rules for controlling cloud elasticity have been proposed in [21] as well as
in the MODAClouds and Optimis projects. Those elasticity rules are defined
during design time. They are utilized to achieve quality goals, such as response
time, energy consumption, cost, and reliability, during runtime. However, rules
that implement data geo-location policies have to be defined considering the data
stored by a virtual machine (case 1) as well as the data which may be transferred
to it (cases 2 and 3). Yet, this information is not available during design time (see
Sec. 2.2), and thus defining geo-location rules during design time is not feasible.
To summarize, none of the existing approaches covers cases 1-3. The approaches
in [7, 17] cover case 2, but fall short in detecting policy violations resulting
from transitive data accesses and cloud elasticity.
the policy checks as an st-connectivity problem, and discuss how this covers
cases 1-3 (Sec. 4.3).
The concepts for the runtime models underlying our approach are shown in
Fig. 2. The concepts Datacenter, VM, Component, and the relations between
them provide the information on components and their deployments required
to run the policy check (see R4 in Sec. 2). One GeoLocation references several
Datacenter (R5 ). Modeling this relation is important to facilitate the runtime
check (to be discussed in Sec. 4.3). The Component concept subsumes both
traditional components and services. Components execute processes (Process)
that may interact across components (R1 ) and data centers. The meta-model
allows defining components that access data through further components and can
represent direct or transitive data transfers (R6 ). From this relation potential
data transfers are derived.
[Fig. 2. Runtime model concepts: Data (with Classification: NOT_CLASSIF, PERS_IDENT, ANONYMIZED, NON_PERS; and ContentType: NOT_TYPED, VIDEO_RENT_INF, SALES_INF, ...), Process, Component, Service, Platform, VM, and Datacenter (with GeoLocation: DEU, FRA, ITA, USA, ...), together with the executes, accesses, deployed-on, hosts, and contains relations.]
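For illustration, the runtime model concepts of Fig. 2 could be represented as plain classes along the following lines; the sketch is an assumption, and only the attribute names taken from the figure are meant to match the meta-model.

from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List

class Classification(Enum):
    NOT_CLASSIF = auto(); PERS_IDENT = auto(); ANONYMIZED = auto(); NON_PERS = auto()

class ContentType(Enum):
    NOT_TYPED = auto(); VIDEO_RENT_INF = auto(); SALES_INF = auto()

class GeoLocation(Enum):
    DEU = auto(); FRA = auto(); ITA = auto(); USA = auto()

@dataclass
class Data:
    classification: List[Classification]
    content_type: List[ContentType]

@dataclass
class Process:
    id: int
    accesses: List[Data] = field(default_factory=list)              # R2, R3

@dataclass
class Component:
    id: int
    executes: List[Process] = field(default_factory=list)           # R1
    accesses: List["Component"] = field(default_factory=list)       # direct/transitive transfers (R6)

@dataclass
class VM:
    id: int
    components: List[Component] = field(default_factory=list)       # deployment (R4)

@dataclass
class Datacenter:
    id: int
    location: GeoLocation                                            # R5
    vms: List[VM] = field(default_factory=list)

# Example: a German data center hosting one VM with a component processing personal data.
d = Data([Classification.PERS_IDENT], [ContentType.SALES_INF])
dc = Datacenter(1, GeoLocation.DEU, [VM(1, [Component(1, [Process(1, [d])])])])
print(dc.location, dc.vms[0].components[0].executes[0].accesses[0].classification)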
Processes access data that is stored in the component executing the process
(R2). Anonymized data can potentially be de-anonymized [24]. Furthermore, accidental
or intentional disclosure of different content types is subject to different
severities and penalties, as stipulated, e.g., in the Video Privacy Protection Act7
and the Health Insurance Portability and Accountability Act8. Consequently,
data may be treated differently with respect to its classification and content. To
support a flexible definition, we enrich the modeled Data entity with a Classification
and a ContentType (R3).
7 http://www.law.cornell.edu/
8 http://www.cms.gov/
The runtime model may be created manually during the software design phase
or may be generated from software artifacts (source code, deployment descriptors
etc.) as part of a model-driven engineering process. During runtime, whenever
replication or migration changes the deployment or composition of the reflected
application, the model has to be updated. Due to space limitations we focus on
the presentation and evaluation of the policy checking approach in this paper
and examine the monitoring-driven update of the runtime model structure in
our future work. However, a comprehensive survey on updating runtime models
based on monitoring data can be found in [22].
[Figure: policy meta-model (a Policy references one or more Classifications, ContentTypes, and GeoLocations) and an excerpt of an example runtime model instance with data items Data(1, [PERS_IDENT], [HEALTH_INF]) and Data(2, [ANONYMIZED], [SALES_INF]) accessed by Processes 1-4.]
With Def. 1 we define the runtime model as a graph G, and with Def. 2 we
define a path H in G. When searching for a path H in graph G, modeling the relation
between GeoLocation and Datacenter is important. Based on this, the runtime
model can be traversed from GeoLocation entities Vs to data nodes Vt, which
allows defining the policy check as an st-connectivity problem with Def. 3,
where "true" means the checked equation holds and "false" means the policy is
violated.
The checking approach covers the data re-location cases 1-3 introduced in Sec.
2:
– Case 1 is covered when both vt and the host where the data reside are at
geo-location vs .
– Case 2 is covered when a component executed at geo-location vs directly
accesses vt from a remote component.
– Case 3 is covered when a component at geo-location vs accesses vt from a
remote component transitively through further components.
The implementation of the runtime check may be based on algorithms for graph traversal. For instance, basic breadth-first search or depth-first search may be applied, as well as optimized variants such as the A* algorithm.
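As an illustration of such a traversal-based check, the following Java sketch performs a depth-first search on a generic adjacency-list encoding of the runtime model graph G, starting from the nodes of a geo-location excluded by a policy (Vs) and reporting a violation as soon as a restricted data node (Vt) is reachable. The graph encoding and all names are our assumptions and not the authors' prototype.

import java.util.*;

// Sketch: st-connectivity check on the runtime model graph G via depth-first search.
// A policy is violated if any restricted data node (Vt) is reachable from a node of
// the excluded geo-location (Vs).
class PolicyCheck {

    static boolean violates(Map<String, List<String>> graph,
                            Set<String> sources,    // Vs: nodes of the excluded geo-location
                            Set<String> targets) {  // Vt: data nodes restricted by the policy
        Deque<String> stack = new ArrayDeque<>(sources);
        Set<String> visited = new HashSet<>(sources);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (targets.contains(node)) {
                return true;                         // a path from Vs to Vt exists: violation
            }
            for (String next : graph.getOrDefault(node, List.of())) {
                if (visited.add(next)) {
                    stack.push(next);
                }
            }
        }
        return false;                                // no such path: the policy holds
    }

    public static void main(String[] args) {
        // Toy model: geo-location USA -> datacenter dc1 -> vm1 -> component c1 -> data d1.
        Map<String, List<String>> g = Map.of(
                "USA", List.of("dc1"),
                "dc1", List.of("vm1"),
                "vm1", List.of("c1"),
                "c1", List.of("d1"));
        System.out.println(violates(g, Set.of("USA"), Set.of("d1"))); // prints true
    }
}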
5 Experimental Evaluation
The experimental evaluation aims at analyzing the effectiveness and performance of the geo-location policy checking approach. Here, effectiveness refers to the capability of identifying potential data transfers that may violate data geo-location constraints. Performance refers to the time consumed for checking the violations and indicates how timely one may be informed about violations.
The setup of the experiments is based on the combination of a simulated cloud environment and a prototypical implementation of our approach. The setup includes a runtime model, a set of data geo-location policies, a prototypical implementation of the runtime checking approach using depth-first search, and a simulator that simulates replication and migration of virtual machines. We implemented the runtime and policy meta-models as Ecore instances12. The
runtime model reflects the SOA-version of CoCoME (see Sec. 2.1) and includes
22 data centers distributed among five countries, four virtual machines, seven
components, and six processes accessing two different types of personal data.
The simulation environment allows us to run controlled, reproducible functional
tests and to examine policy checking performance without provider limitations
or side effects.
Our approach passes the functional tests and thus correctly identifies all three cases of policy violations. Of course, the generalizability (external validity) of these findings is limited by the fact that we examined the expressiveness by means of a single case study. Even though CoCoME is used in a multitude of empirical studies, we therefore plan to apply our approach to further applications in future work.
– In the "worst case" (red graph) the search algorithm has to traverse every possible path entirely to detect the policy violation. The measured growth in checking durations of worst cases is polynomial, as the upper bound can be described by a quadratic expression (black curve in the figure), i.e., t_w = 1.112 × 10^-3 · x + 2.460 × 10^-8 · x^2, with x as the model complexity (expression estimated by non-linear regression). This indicates that the growth indeed maps to the analytical complexity of the st-connectivity problem.
– The "typical case" experiment is repeated several times with different seed values. We observe that the checking durations increase according to t_w (all paths are explored) until a violation occurs. After this, the checking durations increase linearly or remain constant. Durations of the "typical case" lie within [t_b, t_w]. Fig. 5 shows one example run (blue graph). First, the run duration increases according to t_w until a complexity of 12275, at which the checked model exhibits a policy violation. After the violation occurs, the number of visited nodes required for detecting the violation grows linearly, since replication adds at most one node per step to the search path. At a complexity of 83847, a service interaction is randomly inserted. The interaction connects a service located at the excluded geo-location to a service that processes personal data. This reduces the checking duration to almost t_b. Further runs of the "typical case" show similar behavior.
More than 800 virtual machines is realistic for large applications; for instance, Hadoop clusters typically utilize several hundred data nodes (see [4]). However, our approach is still able to check the worst case for large cloud applications (> 500 virtual machines) in less than one second (cf. t_b).
Of course, due to the decision to simulate the cloud, the experimental design may have limitations with respect to construct validity. There may be further factors influencing the performance of the approach in a production cloud environment. For instance, delays of monitoring data may result from geographical distances between the monitoring probes and the place where the policy checker resides. These limitations have to be tackled in future experiments conducted on real cloud infrastructures.
To this end, we will investigate its violation detection capabilities with respect
to precision, recall, and further evaluation metrics. In addition, we will comple-
ment our approach by leveraging cloud monitoring data to update the proposed
runtime models.
References
1. van der Aalst, W., Schonenberg, M., Song, M.: Time prediction based on process
mining. Information Systems 36(2) (Apr 2011)
2. Brosig, F., Huber, N., Kounev, S.: Automated extraction of architecture-level
performance models of distributed component-based systems. In: 2011 26th
IEEE/ACM International Conference on Automated Software Engineering, ASE
(2011)
3. Canfora, G., Di Penta, M., Esposito, R., Villani, M.L.: A framework for QoS-aware
binding and re-binding of composite web services. Journal of Systems and Software
81(10) (2008)
4. Chen, Y., Alspaugh, S., Katz, R.: Interactive analytical processing in big data
systems: A cross-industry study of MapReduce workloads. Proc. VLDB Endow.
5(12) (August 2012)
5. Copil, G., Moldovan, D., Truong, H.-L., Dustdar, S.: Multi-level elasticity control
of cloud services. In: Basu, S., Pautasso, C., Zhang, L., Fu, X. (eds.) ICSOC 2013.
LNCS, vol. 8274, pp. 429–436. Springer, Heidelberg (2013)
6. Epifani, I., Ghezzi, C., Mirandola, R., Tamburrelli, G.: Model evolution by runtime parameter adaptation. In: 31st International Conference on Software Engineering (ICSE) (2009)
7. e-Ghazia, U., Masood, R., Shibli, M.A.: Comparative analysis of access control
systems on cloud. In: 2012 13th ACIS International Conference on Software En-
gineering, Artificial Intelligence, Networking and Parallel Distributed Computing
(SNPD) (2012)
8. Gondree, M., Peterson, Z.N.: Geolocation of data in the cloud. In: Proceedings
of the Third ACM Conference on Data and Application Security and Privacy,
CODASPY 2013. ACM, New York (2013)
9. Gutiérrez, A.M., Cassales Marquezan, C., Resinas, M., Metzger, A., Ruiz-Cortés,
A., Pohl, K.: Extending WS-Agreement to support automated conformity check
on transport and logistics service agreements. In: Basu, S., Pautasso, C., Zhang,
L., Fu, X. (eds.) ICSOC 2013. LNCS, vol. 8274, pp. 567–574. Springer, Heidelberg
(2013)
10. van Hoorn, A., Rohr, M., Hasselbring, W.: Engineering and continuously operating
self-adaptive software systems: Required design decisions. In: Engels, G., Reussner,
R.H., Momm, C., Stefan, S. (eds.) Design for Future 2009, Karlsruhe, Germany
(November 2009)
11. Huber, N., Brosig, F., Kounev, S.: Modeling dynamic virtualized resource land-
scapes. In: Proceedings of the 8th International ACM SIGSOFT Conference on
Quality of Software Architectures (2012)
12. Ivanović, D., Carro, M., Hermenegildo, M.: Constraint-based runtime prediction
of sla violations in service orchestrations. In: Kappel, G., Maamar, Z., Motahari-
Nezhad, H.R. (eds.) Service Oriented Computing. LNCS, vol. 7084, pp. 62–76.
Springer, Heidelberg (2011)
13. Juels, A., Oprea, A.: New approaches to security and availability for cloud data.
Commun. ACM 56(2) (February 2013)
14. Jung, R., Heinrich, R., Schmieders, E.: Model-driven instrumentation with kieker
and palladio to forecast dynamic applications. In: Symposium on Software Perfor-
mance: Joint Kieker/Palladio Days 2013. CEUR (2013)
15. Maoz, S.: Using model-based traces as runtime models. Computer 42(10) (2009)
16. von Massow, R., van Hoorn, A., Hasselbring, W.: Performance simulation of
runtime reconfigurable component-based software architectures. In: Crnkovic, I.,
Gruhn, V., Book, M. (eds.) ECSA 2011. LNCS, vol. 6903, pp. 43–58. Springer,
Heidelberg (2011)
17. Park, S., Chung, S.: Privacy-preserving attribute distribution mechanism for ac-
cess control in a grid. In: 21st International Conference on Tools with Artificial
Intelligence (2009)
18. Rausch, A., Reussner, R., Mirandola, R., Plášil, F. (eds.): The Common Compo-
nent Modeling Example. LNCS, vol. 5153. Springer, Heidelberg (2008)
19. Ries, T., Fusenig, V., Vilbois, C., Engel, T.: Verification of data location in cloud
networking. IEEE (December 2011)
20. Schmieders, E., Metzger, A.: Preventing performance violations of service composi-
tions using assumption-based run-time verification. In: Abramowicz, W., Llorente,
I.M., Surridge, M., Zisman, A., Vayssière, J. (eds.) ServiceWave 2011. LNCS,
vol. 6994, pp. 194–205. Springer, Heidelberg (2011)
21. Suleiman, B., Venugopal, S.: Modeling performance of elasticity rules for cloud-
based applications. In: 2013 17th IEEE International Enterprise Distributed Object
Computing Conference (EDOC) (September 2013)
22. Szvetits, M., Zdun, U.: Systematic literature review of the objectives, techniques,
kinds, and architectures of models at runtime. Software & Systems Modeling (De-
cember 2013)
23. Vaquero, L.M., Rodero-Merino, L., Buyya, R.: Dynamically scaling applications in
the cloud. ACM SIGCOMM Computer Communication Review 41(1) (2011)
24. Zang, H., Bolot, J.: Anonymization of location data does not work: A large-scale
measurement study. In: Proceedings of the 17th Annual International Conference
on Mobile Computing and Networking. ACM, New York (2011)
Heuristic Approaches
for Robust Cloud Monitor Placement
1 Introduction
Utilizing services from the cloud, consumers gain a very high level of flexibility. Configurable computing resources are provided on demand, similar to electricity or water [4], with minimal management effort. However, this shift in responsibility to the cloud provider bears several risks for the cloud consumer. Amongst these risks is a loss of control concerning aspects like performance, availability, and security. Quality guarantees in the form of so-called Service Level Agreements (SLAs) offered by cloud providers aim at lowering this risk in favor of cloud consumers. Basically, an SLA represents a contract between a cloud provider and a cloud consumer and defines certain quality levels (e.g., lower bounds for performance parameters such as availability) to be maintained by the cloud provider. In addition, such a contract specifies penalties for the cloud provider in case of SLA violations. Nevertheless, cloud consumers still perceive that providers do not sufficiently measure performance against SLAs [5]. Furthermore, cloud providers often assign the task of violation reporting to their customers [11]. And even though some cloud providers also offer consumers corresponding monitoring solutions, these solutions cannot
2 Related Work
In this section, we give an overview of our extended approach and align our
former work as well as our new contributions in the context of a broker-based
scenario. Our former work is briefly described in the next sections followed by a
detailed description of our new contributions.
This paper focuses on a broker-based scenario that is based on a cloud mar-
ket model. In such a cloud market, cloud consumers submit their functional and
non-functional requirements for their desired cloud services to brokers, which
constitute mediators between cloud consumers and providers [4]. The brokers
are capable of determining the most suitable cloud providers by querying a service registry residing in the market. Furthermore, such a broker enables consumers
to negotiate SLAs to be provided by the cloud provider. For this purpose, a
broker acts on behalf of a cloud consumer and conducts SLA negotiations with
cloud providers. Now, in order to be able to verify compliance with SLAs from
a consumer’s perspective later on, the broker can also act on behalf of con-
sumers and apply our proposed monitoring approach during runtime. The ad-
vantage of a broker-based perspective for our approach is the exploitation of
global knowledge. In case SLA violations are detected, a broker is aware of other cloud providers' adherence to their SLAs and is thus able to recommend alternative cloud providers. In addition, the monitoring information gained at
runtime can be used to initiate SLA re-negotiations or to adapt the properties of
the monitoring approach in order to improve monitoring quality or monitoring
infrastructure reliability.
In the following, we focus on an enterprise cloud consumer utilizing a set of applications running in different data centers of a cloud provider. In order to verify the performance guarantees in the form of SLAs obtained from the cloud provider, the enterprise cloud consumer entrusts the broker that previously conducted the SLA negotiations with monitoring the running cloud applications. As part of such a monitoring service order, we propose the definition of Monitoring Level Agreements (MLAs) specifying the properties of the monitoring tasks for each application (cf. Section 3.1 for details). The broker then
applies our hybrid monitoring approach and places monitoring units for each
cloud application on provider- as well as on broker-side. Besides the provider-
side monitoring, the broker-side monitoring permits an assessment of the status
of a cloud application from a consumer’s perspective. In order to obtain a robust
monitor placement, the broker can select one of our proposed monitor placement
algorithms according to our investigation in Section 4.
Our hybrid monitoring approach introduced in our former work in [15] focuses
on verifying the availability of cloud applications from a consumer’s perspective,
since availability is one of the very few performance parameters contained in
current cloud SLAs. Nevertheless, other performance parameters can be easily
incorporated in our monitoring approach as well. In order to allow for an in-
dependent assessment of the status of a cloud application and visibility of the
end-to-end performance of cloud applications, we proposed a hybrid monitoring
approach with placements of monitoring units on provider and consumer side.
The latter are now replaced by broker-side placements. Furthermore, such a hybrid approach permits differentiating between downtimes caused by issues on broker and on provider side and thus enables a broker to filter downtimes that relate to cloud provider SLAs. In our hybrid monitoring approach, each monitoring unit observes predefined services of a cloud application as well as processes of the underlying VM. For each cloud application, the set of services to be invoked by a monitoring unit can be defined in advance. The same applies to the system processes to be observed at the VM level. For this purpose, MLAs can specify the consumer's requirements concerning all the cloud applications to be monitored. Besides the services and processes, the number of redundant monitoring units to be placed for each application can be defined. Higher numbers of redundant monitoring units are reasonable for business-critical applications, since the probability that all redundant monitors fail decreases. We also follow this assumption in our monitor placement approach described in the next section.
As already stated before, our approach must not only consider monitoring quality, but also has to account for downtimes of the monitoring infrastructure itself. Therefore, the monitoring units have to be placed by a broker in the data centers on provider and broker side in such a way that the robustness of the monitoring infrastructure is maximized. We introduced this problem, denoted as the Robust Cloud Monitor Placement Problem (RCMPP), in our former work in [14]. The corresponding formal model is briefly described in the following. Table 1 shows the basic entities (upper part) and parameters (lower part) used in the formal model.
Each instance of the RCMPP consists of a set S = {1, ..., n} of data center sites, comprising the set S^C = {1, ..., d} of data center sites on broker (consumer) side and the set S^P = {d+1, ..., n} of data center sites on cloud provider side. On each data center site s ∈ S on provider and broker side, a set V_s = {1, ..., i} of VMs is running which constitute candidates for monitor placement. A set of cloud applications C_{s'v'} = {1, ..., j} to be monitored is running on each VM v' ∈ V_{s'} located on a data center site s' ∈ S^P on provider side. A set of links L = {l(sv, s'v')} interconnects the VMs v ∈ V_s constituting placement candidates with the VMs v' ∈ V_{s'} hosting the cloud applications C_{s'v'}. Each cloud application c ∈ C_{s'v'} has certain requirements concerning the corresponding monitoring units. These requirements comprise a specific resource demand of rd_{s'v'cr} ∈ R+ for a specific
Symbol                     Description
S = {1, ..., n}            set of n data center sites
S^C = {1, ..., d}          consumer (broker) sites, S^C ⊂ S
S^P = {d+1, ..., n}        provider sites, S^P ⊂ S
V_s = {1, ..., i}          VM candidates for monitor placement on site s ∈ S
C_{s'v'} = {1, ..., j}     cloud applications to monitor on VM v' ∈ V_{s'}, s' ∈ S^P
L = {l(sv, s'v')}          links interconnecting VM monitor candidates V_s and the VMs of applications C_{s'v'}
R = {1, ..., k}            set of k considered VM resource types
rd_{s'v'cr} ∈ R+           resource demand for monitoring application c ∈ C_{s'v'} for resource r ∈ R
rs_{svr} ∈ R+              resource supply of VM v ∈ V_s for resource r ∈ R
rf_{s'v'c} ∈ N_{>1}        redundancy factor for monitoring application c ∈ C_{s'v'}
p_{l(sv,s'v')} ∈ R+        observed reliability of link l ∈ L
p_{sv} ∈ R+                observed reliability of VM v ∈ V_s
resource type r ∈ R = {1, ..., k}, such as CPU power or memory, and a redundancy factor rf_{s'v'c} ∈ N_{>1}, indicating that the cloud application c has to be monitored by rf_{s'v'c} different monitoring units. In order to account for the reliability of the monitoring infrastructure, it has to be noted that the broker is not aware of the underlying network topologies of the cloud provider and the Internet service provider. However, we assume that the broker is able to utilize traditional network measurement tools in order to estimate the end-to-end performance between any pair of VMs represented by a given link l ∈ L and thereby to determine the observed reliability p_{l(sv,s'v')} ∈ R+ of that link. Furthermore, we assume that the broker can also utilize such measurement tools in order to estimate the reliability p_{sv} ∈ R+ of a given VM v ∈ V_s on a site s ∈ S. Finally, our model must also consider the respective resource supply rs_{svr} ∈ R+ that each VM v ∈ V_s on a site s ∈ S is able to provide. The objective of the RCMPP now is to assign rf_{s'v'c} monitoring units for each cloud application to be monitored on broker and provider side, while maximizing the reliability of the whole monitoring infrastructure. Hereby, we express the reliability as the probability that at least one of the monitoring units for each cloud application is working properly. In doing so, the resource constraints of the VMs must not be exceeded and all monitoring units must be placed. Furthermore, we incorporate a set of placement restrictions for the monitoring units. First of all, no monitoring unit is allowed to be placed on the VM of the cloud application to be monitored and, second, one monitoring unit must be placed on broker and provider side, respectively. Both restrictions directly follow from our hybrid monitoring approach. Third, for reasons of fault tolerance, each monitoring unit to be placed for a single application must be placed on a different site.
subject to

Σ_{s∈S, v∈V_s} x_{sv,s'v',c} = rf_{s'v'c}    ∀ s' ∈ S^P, v' ∈ V_{s'}, c ∈ C_{s'v'}    (4)

Σ_{s'∈S^P, v'∈V_{s'}, c∈C_{s'v'}} rd_{s'v'cr} · x_{sv,s'v',c} ≤ rs_{svr}    ∀ s ∈ S, v ∈ V_s, r ∈ R    (5)

Σ_{v∈V_s} x_{sv,s'v',c} ≤ 1    ∀ s ∈ S, s' ∈ S^P, v' ∈ V_{s'}, c ∈ C_{s'v'}    (6)

Σ_{s∈S^C, v∈V_s} x_{sv,s'v',c} ≥ 1    ∀ s' ∈ S^P, v' ∈ V_{s'}, c ∈ C_{s'v'}    (7)

x_{sv,s'v',c} = 0    ∀ c ∈ C_{s'v'}, s = s' and v = v'    (9)

for the respective application does not fail. Equation 2 represents this probability by 1 minus the probability that all monitors for a specific cloud application c ∈ C_{s'v'} fail. Equation 3 determines the probability to fail (q^path_{sv,s'v'}) for a given

Minimize z    (11)

subject to

q^log_{s'v'c}(x) ≤ z    ∀ s' ∈ S^P, v' ∈ V_{s'}, c ∈ C_{s'v'}, z ∈ R    (12)

q^log_{s'v'c}(x) = Σ_{s∈S, v∈V_s} x_{sv,s'v',c} · log(q^path_{sv,s'v'})    (13)
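The purpose of the logarithmic terms in (12) and (13) can be made explicit with the following short derivation. It is a sketch of the standard log-linearization step, assuming that, as indicated by Equations 2 and 3, the reliability of the monitors placed for application c is expressed via the path failure probabilities q^path.

% Reliability of the monitoring units placed for application c (cf. Eq. 2):
P_{s'v'c}(x) = 1 - \prod_{s \in S,\, v \in V_s} \bigl(q^{path}_{sv,s'v'}\bigr)^{x_{sv,s'v',c}}

% Maximizing the minimum reliability over all applications is equivalent to minimizing
% the maximum product of failure probabilities; taking the logarithm turns this product
% into the linear expression bounded by z in constraints (12) and (13):
\log \prod_{s \in S,\, v \in V_s} \bigl(q^{path}_{sv,s'v'}\bigr)^{x_{sv,s'v',c}}
  = \sum_{s \in S,\, v \in V_s} x_{sv,s'v',c} \,\log\bigl(q^{path}_{sv,s'v'}\bigr)
  = q^{\log}_{s'v'c}(x) \le z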
In each step, the Greedy algorithm tries to improve the partial solution obtained so far to a maximum extent. For this purpose, the set of connections between each VM (sourcevm) where a cloud application (app) to be monitored is running and each VM (targetvm) constituting a candidate for monitor placement is sorted according to decreasing reliability values (line 3). Afterwards, we explore the connections in descending order (line 4). In each step, if not all redundant monitoring units for the respective application have been placed so far (line 8), we examine whether we can place a monitoring unit on the targetvm of the current connection. For this purpose, we check whether any constraints of the RCMPP are violated when the placement is realized. In case no violation is detected (line 10), we add the current connection to the result set of final placements (line 11) and update the auxiliary data structures (line 12). If all redundant monitoring units have been placed for a given application, this application is removed from the set R of monitor requirements (line 16). The Greedy algorithm continues to explore further connections until all monitoring units have been placed for each application (line 21).
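A minimal Java sketch of this greedy procedure is given below. The data structures (a Connection record with a reliability value, a map of remaining redundancy requirements, and a pluggable constraint check) are simplified assumptions of ours and do not reproduce the authors' implementation or the exact algorithm listing referenced by the line numbers above.

import java.util.*;
import java.util.function.BiPredicate;

// Sketch of the greedy monitor placement: explore connections in order of decreasing
// reliability and place a monitoring unit whenever no RCMPP constraint (resource supply,
// at most one unit per site and application, no co-location with the monitored app) is violated.
class GreedyPlacement {

    record Connection(String sourceVm, String targetVm, String app, double reliability) {}

    static List<Connection> place(List<Connection> connections,
                                  Map<String, Integer> redundancy, // app -> monitoring units still to place
                                  BiPredicate<Connection, List<Connection>> violatesConstraints) {
        List<Connection> placements = new ArrayList<>();
        connections.sort(Comparator.comparingDouble(Connection::reliability).reversed()); // cf. line 3
        for (Connection con : connections) {                                               // cf. line 4
            Integer remaining = redundancy.get(con.app());
            if (remaining == null || remaining == 0) continue;   // all units for this app already placed
            if (!violatesConstraints.test(con, placements)) {     // cf. line 10
                placements.add(con);                              // cf. line 11
                redundancy.put(con.app(), remaining - 1);         // cf. line 12
            }
            if (redundancy.values().stream().allMatch(r -> r == 0)) {
                break;                                            // every monitoring unit has been placed
            }
        }
        return placements;
    }
}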
Our tabu search algorithm is depicted in Algorithm 4 and is inspired by the work of Karsu and Azizoglu [9], who proposed a tabu search-based algorithm for
5 Performance Evaluation
We have implemented our solution approaches in Java and conducted an eval-
uation in order to assess their applicability for real-world scenarios. For the
implementation and evaluation of the ILP-based approach, we used the JavaILP
framework1 and the commercial solver framework IBM ILOG CPLEX2 .
1 http://javailp.sourceforge.net/
2 http://www.ibm.com/software/integration/optimization/cplex-optimizer
3 http://aws.amazon.com/ec2/
4 http://www-iepm.slac.stanford.edu/pinger/
[Figs. 1-6: computation time in ms (logarithmic scale, Figs. 1-4) and resulting downtime in seconds (Figs. 5-6) of the RANDOM, GREEDY, GTSEARCH, and ILP approaches for increasing problem sizes and redundancy factors]
Figures 1 to 6 depict selected results of the evaluation. Please note the logarithmic scale in the first four figures.
When using the ILP approach, the computation time shows an exponential growth with increasing problem size, e.g., from 100 ms up to 10000 ms in Fig. 1. However, this effect is considerably less pronounced when increasing the redundancy factor. All in
all, the exponential growth underlines the fact that the RCMPP is NP-complete. Hence, the applicability of the ILP approach in practice is very limited, since the size of the problems considered in the evaluation is already relatively small. Nevertheless, the ILP approach can serve as a baseline in order to assess the heuristic approaches. In comparison to the ILP approach and the GTSearch heuristic, the Greedy heuristic performs best with respect to computation time and yields a linear growth with increasing problem size. The GTSearch heuristic also shows smaller values in computation time than the ILP approach. This effect is most pronounced when the number of cloud applications increases (cf. Fig. 3). However, the GTSearch heuristic does not exhibit the linear growth with respect to problem size shown by the Greedy heuristic (cf. Fig. 2). Therefore, a further improvement of this heuristic with respect to computation time will be considered in future work, since this heuristic performs best, besides the ILP approach, with respect to solution quality (cf. Fig. 5). In comparison, the Greedy heuristic, although showing the best computation times, performs worse regarding solution quality with increasing complexity of the problem (cf. Fig. 6 for a magnified view). Nevertheless, the Greedy heuristic still achieves a considerably large improvement over a random placement. The results of a random placement of monitoring units are depicted in Fig. 5 and emphasize the need for heuristic solutions. Without conducting any optimization, the monitoring units would end up, e.g., with a downtime of 25 minutes (on a yearly basis) in contrast to a few seconds when using the other approaches in case of 3 cloud applications deployed on each VM, a result which is unacceptable when business-critical applications are utilized.
References
1. Andreas, A.K., Smith, J.C.: Mathematical Programming Algorithms for Two-Path
Routing Problems with Reliability Considerations. INFORMS Journal on Comput-
ing 20(4), 553–564 (2008)
2. Bin, E., Biran, O., Boni, O., Hadad, E., Kolodner, E., Moatti, Y., Lorenz,
D.: Guaranteeing High Availability Goals for Virtual Machine Placement. In:
31st International Conference on Distributed Computing Systems (ICDCS), pp.
700–709 (2011)
3. Box, G.E.P., Hunter, J.S., Hunter, W.G.: Statistics for Experimenters: Design, Innovation, and Discovery, 2nd edn. Wiley (2005)
4. Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J., Brandic, I.: Cloud Computing
and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing
as the 5th Utility. Future Generation Computer Systems 25(6), 599–616 (2009)
5. CSA, ISACA: Cloud Computing Market Maturity. Study Re-
sults. Cloud Security Alliance and ISACA (2012), http://www.
isaca.org/Knowledge-Center/Research/ResearchDeliverables/Pages/
2012-Cloud-Computing-Market-Maturity-Study-Results.aspx (last access:
May 30, 2014)
6. Hillier, F.S., Lieberman, G.J.: Introduction to Operations Research, 8th edn. McGraw-Hill (2005)
7. Jensen, P.A., Bard, J.F.: Appendix A: Equivalent Linear Programs. In:
Supplements to Operations Research Models and Methods. John Wiley and
Sons (2003), http://www.me.utexas.edu/~jensen/ORMM/supplements/units/lp_
models/equivalent.pdf (last access: May 30, 2014)
8. Jain, K., Vazirani, V.V.: An Approximation Algorithm for the Fault Tolerant
Metric Facility Location Problem. In: Jansen, K., Khuller, S. (eds.) APPROX
2000. LNCS, vol. 1913, pp. 177–182. Springer, Heidelberg (2000)
9. Karsu, Ö., Azizoğlu, M.: The Multi-Resource Agent Bottleneck Generalised As-
signment Problem. International Journal of Production Research 50(2), 309–324
(2012)
10. Natu, M., Sethi, A.S.: Probe Station Placement for Robust Monitoring of Networks.
Journal of Network and Systems Management 16(4), 351–374 (2008)
11. Patel, P., Ranabahu, A., Sheth, A.: Service Level Agreement in Cloud Computing.
Tech. rep., Knoesis Center, Wright State University, USA (2009)
12. Sharma, P., Chatterjee, S., Sharma, D.: CloudView: Enabling Tenants to Monitor
and Control their Cloud Instantiations. In: 2013 IFIP/IEEE International Sympo-
sium on Integrated Network Management (IM 2013), pp. 443–449 (2013)
13. Shrivastava, V., Zerfos, P., Lee, K.W., Jamjoom, H., Liu, Y.H., Banerjee, S.:
Application-aware Virtual Machine Migration in Data Centers. In: Proceedings
of the 30th IEEE International Conference on Computer Communications (INFO-
COM 2011), pp. 66–70 (April 2011)
14. Siebenhaar, M., Lampe, U., Schuller, D., Steinmetz, R.: Robust Cloud Monitor
Placement for Availability Verification. In: Helfert, M., Desprez, F., Ferguson, D.,
Leymann, F., Muñoz, V.M. (eds.) Proceedings of the 4th International Conference on Cloud Computing and Services Science (CLOSER 2014), pp. 193–198. SciTePress (April 2014)
15. Siebenhaar, M., Wenge, O., Hans, R., Tercan, H., Steinmetz, R.: Verifying the
Availability of Cloud Applications. In: Jarke, M., Helfert, M. (eds.) Proceedings
of the 3rd International Conference on Cloud Computing and Services Science
(CLOSER 2013), pp. 489–494. SciTePress (May 2013)
Compensation-Based vs. Convergent
Deployment Automation for Services Operated
in the Cloud
1 Introduction
Cloud computing [10,21,7] can be used in different setups such as public, private, and hybrid Cloud environments to efficiently run a variety of applications exposed as services (SaaS). Prominent examples are Web applications, back-ends for mobile applications, and applications in the field of the "Internet of Things", e.g., to process large amounts of sensor data. Users of such services
based on Cloud applications expect high availability and low latency when inter-
acting with a service. Consequently, the applications need to scale rapidly and
dynamically to serve thousands or even millions of users properly. To implement
scaling in a cost-efficient way the application has to be elastic, which means that
application instances are provisioned and decommissioned rapidly and automat-
ically based on the current load. Cloud providers offer on-demand self-service
capabilities, e.g., by providing corresponding APIs to provision and manage re-
sources such as virtual machines, databases, and runtime environments. These
capabilities are the foundation for scaling applications and implementing elas-
ticity mechanisms to run them efficiently in terms of costs. Moreover, users of
services operated in the Cloud expect fast responses to their changing and grow-
ing requirements as well as fixes of issues that occur. Thus, underlying applica-
tions need to be redeployed frequently to production, e.g., several times a week.
Development and operations need to be tightly coupled to enable such frequent
redeployments. DevOps [5,3] aims to eliminate the split between developers and
operations to automate the complete deployment process from the source code in
version control to the production environment. Today, the DevOps community
follows a leading paradigm to automate the deployment, namely to implement
idempotent scripts to converge resources toward a desired state. Because this approach has some major drawbacks, we propose an alternative approach based on compensation. The major contributions of this paper are the following:
– We present the fundamentals of state-of-the-art deployment automation ap-
proaches and point out existing deficiencies and difficulties
– We propose an alternative approach to implement deployment automation
based on compensation on different levels of granularity to improve the effi-
ciency and robustness of script execution
– We further show how compensation actions can be automatically derived at
runtime to ease the implementation of compensation based on snapshots
– We evaluate the compensation-based deployment automation approach based
on different kinds of applications operated in the Cloud and exposed as ser-
vices
The remainder of this paper is structured as follows: based on the funda-
mentals showing state-of-the-art deployment automation approaches (Sect. 2),
focusing on convergent deployment automation, we present the problem state-
ment in Sect. 3. To tackle the resulting challenges, Sect. 4 presents approaches
to implement compensation-based deployment automation. Our evaluation of
compensation-based deployment automation is presented and discussed in Sect. 5
and Sect. 6. Finally, Sect. 7 presents related work and Sect. 8 concludes this pa-
per.
2 Fundamentals
The automated deployment of middleware and application components can be
implemented using general-purpose scripting languages such as Perl, Python, or
Unix shell scripts. This is what system administrators and operations personnel
were primarily using before the advent of DevOps tools providing domain-specific
languages [2] to create scripts for deployment automation purposes. We stick to
the following definition 1 for a script to be used for automating operations,
especially considering deployment automation:
Definition 1 (Operations Script). An operations script (in short script) is
an arbitrary executable to deploy and operate middleware and application compo-
nents by modifying the state of resources such as virtual machines. Such a state
[Fig. 1: a script consisting of several actions Ax is run against a resource (e.g., a virtual machine or a container) to reach a desired state]
Fig. 1 shows the basic usage of scripts: several actions Ax, i.e., command statements (install package "mysql", create directory "cache", etc.), are specified in the script to transfer a particular resource such as a virtual machine (VM) or a container [15,17] into a desired state. For instance, the original state of the
virtual machine could be a plain Ubuntu operating system (OS) that is installed,
whereas the desired state is a VM that runs a WordPress blog1 on the Ubuntu
OS. Consequently, a script needs to execute commands required to install and
configure all components (Apache HTTP server, MySQL database server, etc.)
that are necessary to run WordPress on the VM.
This is a straightforward implementation of deployment automation. However,
this approach has a major drawback: in case an error occurs during the execution
1 WordPress: http://www.wordpress.org
of the script, the resource is in an unknown, most probably inconsistent state. For
instance, the MySQL database server is installed and running, but the installa-
tion of Apache HTTP server broke, so the application is not usable. Thus, either
manual intervention is required or the whole resource has to be dropped and a
new resource has to be provisioned (e.g., create a new instance of a VM image)
to execute the script again. This is even more difficult in case the original state
is not captured, e.g., using a VM snapshot. In this case manual intervention is
required to restore the original state. This is error-prone, time-consuming, costly,
and most importantly even impossible in cases where the original state is not
documented or captured. Since errors definitely occur in Cloud environments,
e.g., if the network connection breaks during the retrieval of a software package,
it is a serious challenge to implement full and robust deployment automation.
This is why the DevOps community provides techniques and tools to implement convergent deployment automation: its foundation is the implementation of idempotent scripts [4], meaning that the script execution on a particular resource such as a VM can be repeated arbitrarily, always leading to the same result if no error occurs; if an error occurs during execution and the desired state is not reached (i.e., the resource is in an unknown state), the script is executed repeatedly until the desired state is reached. Thus, idempotent scripts can be used to converge a particular resource toward a desired state without dropping the resource, as shown in Fig. 1. With this approach the resource does not get stuck in an inconsistent state. DevOps tools such as Chef [11]
provide a declarative domain-specific language to define idempotent actions
(e.g., Chef resources2) that are translated to imperative command statements
at runtime, depending on the underlying operating system. For instance, the
declarative statement “ensure that package apache2 is installed” is translated
to the following command on an Ubuntu OS: apt-get -y install apache2; on
a Red Hat OS, the same declarative statement is translated to yum -y install
apache2. Imperative command statements can also be expressed in an idem-
potent manner. For instance, a simple command to install the Apache HTTP
server on Ubuntu (apt-get -y install apache2) is automatically idempotent
because if the package apache2 is already installed, the command will still
complete successfully without doing anything. Other command statements need
to be adapted such as a command to retrieve the content of a remote Git3
repository: git clone http://gitserver/my_repo. This command would fail
when executing it for a second time because the directory my_repo already
exists. To make the command statement idempotent a minor extension is re-
quired that preventively deletes the my_repo directory: rm -rf my_repo && git
clone http://gitserver/my_repo.
2 Chef resources: http://docs.opscode.com/resource.html
3 Git: http://git-scm.com
3 Problem Statement
As discussed in Sect. 2, convergent deployment automation makes the execution
of scripts more robust. However, it may not be the most efficient approach to
repeatedly execute the whole script in case of errors until the desired state is
reached. Furthermore, this approach only works in conjunction with idempotent
scripts. While in most cases it is possible to implement idempotent actions, it can
be challenging and intricate to implement fully idempotent scripts without
holding specific state information for each action that was executed. Typical
examples include:
– An action to create a database or another external entity by name, so the
second execution results in an error such as “the database already exists”.
– An action that sends a non-idempotent request to an external service (e.g.,
a POST request to a RESTful API), so the second request most probably
produces a different result.
– An action to clone a Git repository, so the second execution fails because
the directory for the repository already exists in the local filesystem.
[Fig. 2: if an error occurs during the execution of an action Ax of the script, a compensation script is run to restore the original state of the resource (e.g., a virtual machine or a container), and the script is then run again to reach the desired state]
The following extract of a Unix shell script shows how a corresponding com-
pensation script could be implemented:
#!/bin/sh

echo "DROP DATABASE $DBNAME" | mysql -u $USER

rm -rf my_repo

...
[Figure: each action Ax of the script is associated with a compensation action CAx that reverts its effects on the resource (e.g., a virtual machine, a container, etc.)]
...

RUN curl -H "Content-Type: application/json" -X PUT --data "@$ID.json" -u $USER:$PASSWORD http://.../entries/$ID

COMPENSATE curl -X DELETE -u $USER:$PASSWORD http://.../entries/$ID

RUN ...
COMPENSATE ...
5 Evaluation
[Fig. 5: architectures of the evaluated applications: the Chat Application based on the Node.js runtime with a MySQL database server, Redmine based on the Ruby on Rails framework and Ruby runtime with a MySQL database server, and WordPress based on a PHP module running on the Apache HTTP Server with a MySQL database server]
Fig. 5 outlines the architectures of the applications, namely a simple Chat Appli-
cation7 based on Node.js, the Ruby-based project management and bug-tracking
tool Redmine8 , and WordPress9 to run blogs based on PHP. Each application is
deployed on a clean VM (1 virtual CPU clocked at 2.8 GHz, 2 GB of memory)
on top of the VirtualBox hypervisor10, running a minimalist installation of the
Ubuntu OS, version 14.04.
[Tables: average deployment duration (in sec.) and average memory usage (in MB) per application]
6 Discussion
incremental snapshots are cached to quickly restore the state captured after the
last successfully executed action. This happens preventively, even in case the
snapshots are not used, e.g., if no error occurs (clean environment). For longer-running deployment processes with higher memory consumption in general, such as that of Redmine, this overhead becomes less relevant, so in some cases, such as the Docker-based deployment of Redmine, the memory usage is even lower than that of the corresponding Chef-based deployment.
In a disturbed environment that may be similar to an error-prone Cloud envi-
ronment, where network issues appear and memory bottlenecks occur, the gap
between the compensation-based and the convergent approach is significantly
larger in terms of deployment duration. In this case compensation clearly out-
performs convergence. Considering the design and implementation of scripts, compensation-based scripts and actions are easier to implement because they do not have to be idempotent, as they do in the convergent approach. Moreover, most compensation actions can be automatically generated at runtime based on snapshots, so the implementation of custom compensation actions is not necessary for most actions. Fine-grained snapshots are also a convenient tool when developing, testing, and debugging scripts: snapshots can be created at any point in time to capture a working state and build upon this state, always having the possibility to quickly restore this state. Without using snapshots, the whole script has to be executed for each test run. This can be time-consuming in case of more complex scripts that do not already terminate after a few seconds.
7 Related Work
8 Conclusions
References
1. Breitenbücher, U., Binz, T., Kopp, O., Leymann, F.: Pattern-based Runtime Man-
agement of Composite Cloud Applications. In: Proceedings of the 3rd International
Conference on Cloud Computing and Services Science. SciTePress (2013)
2. Günther, S., Haupt, M., Splieth, M.: Utilizing Internal Domain-Specific Languages
for Deployment and Maintenance of IT Infrastructures. Tech. rep., Very Large Busi-
ness Applications Lab Magdeburg, Fakultät für Informatik, Otto-von-Guericke-
Universität Magdeburg (2010)
3. Humble, J., Molesky, J.: Why Enterprises Must Adopt Devops to Enable Continu-
ous Delivery. Cutter IT Journal 24 (2011)
4. Hummer, W., Rosenberg, F., Oliveira, F., Eilam, T.: Testing Idempotence for In-
frastructure as Code. In: Eyers, D., Schwan, K. (eds.) Middleware 2013. LNCS,
vol. 8275, pp. 368–388. Springer, Heidelberg (2013)
16 Cloud Foundry: http://cloudfoundry.org
1 Introduction
2 Motivating Example
To illustrate the features of the proposed approach, we introduce a Web shopping scenario inspired by Amazon [3]. The booking process can be described as follows: when ordering books online, a collaboration between a customer, a seller company like Amazon, and a shipper company like FedEx is established. Fig. 1 shows such a collaboration with the help of an excerpt of an IOBP involving the processes of the different partners. The BPMN 2.0 standard is used for the depiction of the IOBP, which represents a simple scenario in which the customer sends an order to the seller, makes the payment, and expects to receive the items from a shipper.
As shown in Fig. 1, different temporal constraints can be assigned to business processes. These constraints include durations of activities (e.g., the duration of the activity Ship products is 24 hours) and deadlines (e.g., DSeller = 35 hours to denote that the execution of the Seller process takes no longer than 35 hours). Additionally, dashed lines between activities depict message exchanges. For instance, there is a message exchange between the activities Send order of the Customer and Receive order of the Seller. Although each business process is individually consistent with its temporal constraints, the IOBP does not intrinsically guarantee the satisfaction of all temporal constraints, such as those related to deadlines. We see significant potential in proposing a consistency analysis approach; indeed, it is clear that reasoning about the temporal constraints of the example while respecting the process deadlines is a tedious and error-prone task.
In the remainder of the paper, we refer to the activities of the motivating example by abbreviations formed from the first letters of their names (e.g., RSD to denote Receive shipment details). Fig. 3 exhibits the timed graphs of the processes of the different partners involved in the motivating example, namely the Shipper (P_Ship), the Seller (P_Sel), and the Customer (P_Cust) processes. For more details about the calculation of the timed graphs, we refer the reader to [1,2].
Fig. 3. Timed graphs of the Shipper, the Seller and the Customer processes
Let x ∈ P_j/C_i be the set of solutions satisfying the consistency condition (Eq. 1) between P_j and P_k for the calculation of both P_j/C_i and P_k/C_i. Supposing at least one communication between processes P_j and P_k, additional calculations must be performed in order to adjust the intervals P_j/C_i and P_k/C_i accordingly, which will be the main focus of Step 2.
- Analyzing consistency of multiple processes (Step 2)
The aim of Step 2 is to gather solutions for temporal inconsistencies while
considering all involved processes in the IOBP (i.e. all communications between
the processes). Indeed, it provides a set of constraints on the starting time of
processes such that if each process satisfies the constraint, the whole collabora-
tion is still possible to be successfully carried out. Step 2 requires that Step 1
be completed successfully.
In an inter-organisational business process, we can deduce implicit temporal relations beyond those resulting from direct communications between P_j and P_i. We argue that the communication between processes P_j and P_k has an impact on both time intervals P_j/C_i and P_k/C_i.
In our approach, the transitivity behavior of the temporal relationships introduced by Allen in [4] helps to deduce a new interval P'_j/C_i from the two intervals P_j/C_k and P_k/C_i (the result of Step 1). P'_j/C_i denotes the interval delimiting the starting time of process P_j relative to the start of process P_i (related to clock C_i) while considering indirect communication links between processes P_i and P_j (precisely, the communication between P_j and P_k and between P_k and P_i). Given P_j/C_k = [min_jk, max_jk] and P_k/C_i = [min_ki, max_ki], P'_j/C_i is calculated as follows: P'_j/C_i = [min_jk + min_ki, max_jk + max_ki].
As a result, we introduce P_j^IOBP/C_i to denote the resulting interval delimiting the starting time of process P_j with regard to the starting time of process P_i while considering both direct and indirect communication links, as follows:

P_j^IOBP/C_i = P_j/C_i ∩ P'_j/C_i    (4)
Consider again the timed graphs of the Shipper (P_Ship), the Seller (P_Sel), and the Customer (P_Cust) processes of the motivating example as depicted in Fig. 3, together with the starting time intervals resulting from Step 1 of our approach, namely P_Ship/C_Sel = [3, 32], P_Cust/C_Sel = [−34, 24], and P_Ship/C_Cust = [−23, 23]. Let us now conduct Step 2 of the approach.
P_Ship^IOBP/C_Sel = P_Ship/C_Sel ∩ P'_Ship/C_Sel = [3, 32] ∩ [−57, 47] = [3, 32] (where P'_Ship/C_Sel = [−57, 47] = [−23−34, 23+24] is deduced from P_Ship/C_Cust = [−23, 23] and P_Cust/C_Sel = [−34, 24]). The same applies to P_Cust^IOBP/C_Sel = [−20, 24].
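The interval arithmetic of Step 2 can be illustrated with a small Java sketch; the closed-interval representation and helper names below are our own and merely reproduce the numbers of the example above.

// Sketch of the Step 2 interval calculations: compose indirect intervals by summing
// the bounds and intersect them with the interval obtained from direct communication.
class StartTimeIntervals {

    record Interval(int min, int max) {
        Interval compose(Interval other) {   // P_j/C_k composed with P_k/C_i yields P'_j/C_i
            return new Interval(min + other.min, max + other.max);
        }
        Interval intersect(Interval other) { // P_j^IOBP/C_i = P_j/C_i ∩ P'_j/C_i
            return new Interval(Math.max(min, other.min), Math.min(max, other.max));
        }
    }

    public static void main(String[] args) {
        Interval shipSel  = new Interval(3, 32);    // P_Ship/C_Sel  (Step 1)
        Interval custSel  = new Interval(-34, 24);  // P_Cust/C_Sel  (Step 1)
        Interval shipCust = new Interval(-23, 23);  // P_Ship/C_Cust (Step 1)

        Interval shipSelIndirect = shipCust.compose(custSel);      // [-57, 47]
        Interval shipSelIobp = shipSel.intersect(shipSelIndirect); // [3, 32]
        System.out.println("[" + shipSelIobp.min() + ", " + shipSelIobp.max() + "]"); // prints [3, 32]
    }
}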
4 Related Work
The approach of Bettini et al. [5] provides a temporal constraint reasoning and management tool offering consistency checking of temporal requirements in workflow systems. Furthermore, it monitors workflow activities and predicts their starting and ending times, and it provides the enactment service with useful temporal information for activity scheduling. However, the consistency algorithms have only been defined for the activities of a single process and do not consider collaborative processes exchanging messages.
In [6,7,8,9,10], the authors use temporal properties in order to analyse the timed compatibility of Web service compositions. Several temporal conflicts are identified in asynchronous Web service interactions. In these approaches, the focus has been on the construction of a correct Web service composition using mediators. Nevertheless, their scope is limited to the verification of time constraints caused only by message interactions between the services of the process.
In [11], Du et al. present a Petri net-based method to compose Web services by adding a mediation net to deal with message mismatches. Their approach implements timed compatibility checking by generating modular timed state graphs. Compared to our work, it only operates at the service level and has limited coverage of the temporal dependencies of the services involved in a business collaboration.
The approach proposed by Eder in [1,2] is closely related to ours since it
uses the concept of timed graphs while analysing the consistency issue in inter-
organisational collaborations. Nevertheless, this work is too restrictive since it
assumes that both processes begin at the same time. Furthermore, only the case
with two partners is considered.
5 Conclusion
In this paper, we proposed an approach aiming at discovering temporal inconsistencies that may constitute obstacles to the interaction of business processes. Additionally, it searches for solutions to resolve the temporal inconsistencies by providing each partner with temporal restrictions on the starting times of its processes in accordance with the overall temporal constraints of all involved processes. Consequently, as long as each process starts executing within the specified time period, the overall temporal constraints of the IOBP will be satisfied. Currently, we are working on tool support for the proposed approach based on the Eclipse BPMN2 modeler.
References
1. Eder, J., Panagos, E., Rabinovich, M.I.: Time Constraints in Workflow Systems.
In: Jarke, M., Oberweis, A. (eds.) CAiSE 1999. LNCS, vol. 1626, pp. 286–300.
Springer, Heidelberg (1999)
2. Eder, J., Tahamtan, A.: Temporal Consistency of View Based Interorganizational
Workflows. In: Kaschek, R., Kop, C., Steinberger, C., Fliedl, G. (eds.) UNISCON 2008. LNBIP, vol. 5, pp. 96–107. Springer,
Heidelberg (2008)
3. van der Aalst, W.M.P., Weske, M.: The P2P Approach to Interorganizational
Workflows. In: Dittrich, K.R., Geppert, A., Norrie, M. C. (eds.) CAiSE 2001.
LNCS, vol. 2068, pp. 140–156. Springer, Heidelberg (2001)
4. Allen, J.F.: Maintaining knowledge about temporal intervals. Communications of
the ACM 26(11), 832–843 (1983)
5. Bettini, C., Wang, X.S., Jajodia, S.: Temporal Reasoning in Workflow Systems.
Distributed and Parallel Databases (2002)
6. Guermouche, N., Godart, C.: Timed Conversational Protocol Based Approach for
Web Services Analysis. In: Maglio, P.P., Weske, M., Yang, J., Fantinato, M. (eds.)
ICSOC 2010. LNCS, vol. 6470, pp. 603–611. Springer, Heidelberg (2010)
7. Guermouche, N., Godart, C.: Timed model checking based approach for web ser-
vices analysis. In: ICWS. IEEE CS (2009)
8. Cheikhrouhou, S., Kallel, S., Guermouche, N., Jmaiel, M.: Enhancing Formal Spec-
ification and Verification of Temporal Constraints in Business Processes. In: Pro-
ceedings of the 11th IEEE International Conference on Services Computing (SCC).
IEEE Computer Society (2014)
9. Kallel, S., Charfi, A., Dinkelaker, T., Mezini, M., Jmaiel, M.: Specifying and Mon-
itoring Temporal Properties in Web Services Compositions. In: ECOWS. IEEE CS
(2009)
10. Guidara, I., Guermouche, N., Chaari, T., Tazi, S., Jmaiel, M.: Pruning Based
Service Selection Approach under Qos and Temporal Constraints. In: ICWS. IEEE
CS (2014)
11. Du, Y., Tan, W., Zhou, M.: Timed compatibility analysis of web service composi-
tion: A modular approach based on Petri nets. IEEE Transactions on Automation
Science and Engineering (2014)
Weak Conformance between
Process Models and Synchronized Object Life Cycles
1 Introduction
Business process management allows organizations to specify their processes struc-
turally by means of process models, which are then used for process execution. Process
models comprise multiple perspectives with two of them receiving the most attention
in recent years: control flow and data [22]. These describe behavioral execution con-
straints between activities as well as between activities and data objects. It is usually
accepted that control flow drives execution of a process model. While checking control
flow correctness using soundness [1] is an accepted method, correctness regarding data
and control flow is not addressed in sufficient detail. In this paper, we describe a formal-
ism to integrate control flow and data perspectives that is used to check for correctness.
In order to achieve safe execution of a process model, it must be ensured that every
time an activity attempts to access a data object, the data object is in a certain expected
data state or is able to reach the expected data state from the current one, i.e., data speci-
fication within a process model must conform to relevant object life cycles, where each
describes the allowed behavior of a distinct class of data objects. Otherwise, the execu-
tion of a process model may deadlock. To check for deadlock-free execution in terms
of data constraints, the notion of object life cycle conformance [9, 20] is used. This ap-
proach has some restrictions with respect to data constraint specification, because each single change of a data object as specified in the object life cycle, which we refer to as a data state transition, must be performed by some activity. [21] relaxes this limitation such
that several state changes can be subsumed within one activity. However, gaps within the data constraint specification, i.e., implicit data state transitions, are not allowed, although some other process may be responsible for performing a state change of an object; i.e., these approaches can only check whether an object is in a certain expected state. We assume that implicit data state transitions get realized by an external entity or by detailed implementations of process model activities. In real-world process repositories, many such underspecified process models usually exist, which motivates the introduction of the notion of weak conformance [13], which also allows checking underspecified models.
Additionally, in the real world, dependencies between multiple data objects often exist; e.g., an order may only be shipped to the customer after the payment is recognized. None of the above approaches supports this. Thus, we utilize the concept of synchronized object
life cycles that allows to specify dependencies between data states as well as state tran-
sitions of different object life cycles [16]. Based thereon, we extend the notion of weak
conformance and describe how to compute it for a given process model and the corre-
sponding object life cycles including synchronizations. We utilize the well established
method of soundness checking [1] to check for process model correctness. For mapping
a process model to a Petri net, we utilize an extension covering data constraints [16] to
a widely-used control flow mapping [4] to enable an integrated checking of control flow
and data correctness. Further, fundamentals and preliminaries required in the scope of
this paper are discussed in Section 2 of our report [16].
The remainder is structured as follows. First, we discuss weak conformance in gen-
eral and compare it to existing conformance notions in Section 2 before we introduce
the extended notion of weak conformance in Section 3. Afterwards, we discuss the pro-
cedure for integrated correctness checking in Section 4. Section 5 is devoted to related
work before we conclude the paper in Section 6.
2 Weak Conformance
The notion of weak conformance has been initially proposed in [13] as extension to the
notion of object life cycle conformance [9, 20] to allow the support of underspecified
process models. A fully specified process model contains all reads and writes of data
nodes by all activities. Additionally, each activity reads and writes at least one data node, except for the first and last activities, which may lack a read or a write, respectively, in case they only create or only consume a data node.
In contrast, underspecified process models may lack some reads or writes of data nodes such that they are implicit, performed by some other process, or hidden in aggregated activities changing the state multiple times with respect to the object life cycle. Though, full support of underspecified process models requires that the process model may omit state changes of data nodes although they are specified in the object life cycle.

Table 1. Applicability and time complexity of data conformance computation algorithms

Attribute             [9, 20]   [21]    [13]   this
Full specification    +         +       +      +
Underspecification    -         o       +      +
Synchronization       -         -       -      +
Complexity            poly.     exp.    –      exp.
In this paper, we extend the notion of weak conformance to also support object
life cycle synchronization. First, we compare different approaches to check for confor-
mance between a process model and object life cycles. Table 1 lists the applicability and
specifies the time complexity of the computation algorithms for approaches described
in [9, 20], [21], [13], and this paper. The notion from [9, 20] requires fully specified
process models and abstracts from inter-dependencies between object life cycles by not
considering them for conformance checking in case they are modeled. Conformance
computation is done in polynomial time. In [21], underspecification of process models
is partly supported, because a single activity may change multiple data states at once
(aggregated activity). However, full support of underspecified process models would require
that the process model may omit data state changes completely although they
are specified in the object life cycle. Synchronization between object life cycles is not
considered in that approach and, complexity-wise, it requires exponential time. [13] supports
both fully specified and underspecified process models but lacks support for object life cycle
synchronization, which is solved by the extension described in this section. For [13],
no computation algorithm is given, such that no complexity can be derived. The solution
presented in this paper requires exponential time through the Petri net mapping and sub-
sequent soundness checking as described in Section 4. However, state space reduction
techniques may help to reduce the computation time for soundness checking [6]. The
choice of using soundness checking to verify weak conformance allows checking
control flow soundness as well as weak conformance in one analysis, while still allowing
violations caused by control flow to be distinguished from those caused by data flow.
with d referring to c holds (i) fA ⇒m f′A implies ϑ(f) ⇒lc ϑ(f′), (ii) ∀se ∈ SEϑ(f′)
originating from the same object life cycle l ∈ L : ∃ξ(se) == true, and (iii) fA = f′A
implies f represents a read and f′ represents a write operation of the same activity. #
Given a process scenario, we say that it satisfies weak conformance if the process
model satisfies weak conformance with respect to each of the used data classes. Weak
data class conformance is satisfied (i), (iii) if, for the data states of each two directly
succeeding data nodes referring to the same data class in a process model, there exists a
path from the first to the second data state in the corresponding object life cycle, and (ii)
if the dependencies specified by synchronization edges with a target state matching the
state of the second data node of the two succeeding ones hold such that all dependency
conjunctions and disjunctions are fulfilled. Two data nodes of the same class directly
succeed each other in the process model if either (1) they are accessed by the same activity,
with one being read and one being written, or (2) there exists a path in the process model
in which two different activities access data nodes of the same class in two data states,
with no further access to a node of this data class in between.
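To make conditions (i) and (iii) concrete, the following minimal sketch (our illustration, not the authors' implementation) checks that every pair of directly succeeding data states of a class is backed by a path in that class's object life cycle. The object life cycle is assumed to be given as a mapping from each data state to its successor states, and succeeding_pairs is a hypothetical list of the directly succeeding data-state pairs extracted from the process model; condition (ii) on synchronization edges is omitted here.

from collections import deque

def has_path(olc, source_state, target_state):
    """Breadth-first search over the object life cycle (state -> set of successor states)."""
    if source_state == target_state:
        return True
    seen, queue = {source_state}, deque([source_state])
    while queue:
        state = queue.popleft()
        for nxt in olc.get(state, ()):
            if nxt == target_state:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def weak_data_class_conformance(succeeding_pairs, olc):
    """Conditions (i)/(iii): each pair of directly succeeding data states of the class
    must be connected by a path in the corresponding object life cycle."""
    return all(has_path(olc, s1, s2) for (s1, s2) in succeeding_pairs)

# Example: a hypothetical order life cycle  created -> confirmed -> paid -> shipped
order_olc = {"created": {"confirmed"}, "confirmed": {"paid"}, "paid": {"shipped"}}
print(weak_data_class_conformance([("created", "paid"), ("paid", "shipped")], order_olc))  # True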
type, and originate from the same object life cycle. The preset of an added place com-
prises all transitions directly preceding the places representing the source and the target
data states of the corresponding synchronization edge. The postset of an added place
comprises all transitions directly preceding the place representing the target state of the
synchronization edge. For currently typed edges, the postset additionally comprises the
set of all transitions directly succeeding the place representing the source state.
For each untyped synchronization edge, one transition is added to the Petri net. If
two untyped synchronization edges share a data state, i.e., the sets of their source and target
states intersect, then both transitions are merged. The preset and postset of each transition
comprise newly added places, one for each (transitively) involved synchronization edge for
the preset and the postset, respectively. Such a preset place directly succeeds the transitions
that in turn are part of the preset of the place representing the data state from which
the data state transition originates. Such a postset place directly precedes the transition
representing the corresponding source or target transition of the typed synchronization edge.
2—Petri Net Integration: First, data states occurring in the object life cycles but not in
the process model need to be handled to ensure deadlock-free integration of both Petri
nets. We add one place p to the Petri net, which handles all non-occurring states, i.e.,
avoids execution of these paths. Let each qi be a place representing such a non-occurring
data state. Then, the preset of each transition tj being part of the preset of qi is extended
with place p if the preset of tj contains a data state whose postset comprises more than
one transition in the original Petri net mapped from the synchronized object life cycle.
Each data state represented as a place in the Petri net mapped from the process model
consists of a control flow and a data flow component, as visualized in Fig. 1 with C and D.
Within the integrated Petri net, the control flow component is responsible for the flow of
the object life cycle and the data flow component is responsible for the data flow in the
process model. The integration of both Petri nets follows three rules, distinguishable with
respect to read and write operations. The rules use the data flow component of data state
places.

[Fig. 1. Internal places for a place representing a data state: transition "Read O in data state s", place O.s, and its components C and D]
(IR-1) A place p from the object life cycle Petri net representing a data state of a
data class to be read by some activity in the process model is added to the preset of
the transition stating that this data node (object) is read in this specific state, e.g., the
preset of transition Read O in data state s is extended with the place representing data
state s of class O, and (IR-2) a new place q is added to the integrated Petri net, which
extends the postset of the transition stating that the data node (object) is read in the
specific state and which extends the preset of each transition being part of the postset
of place p, e.g., the place connecting transition Read O in data state s and the two
transitions succeeding the place labeled O.s. (IR-3) Let v be a place from the object life
cycle Petri net representing a data state of a class to be written by some activity in the
process model. Then a new place w is added to the integrated Petri net, which extends
the preset of each transition being part of the preset of v and which extends the postset
of the transition in the Petri net derived from the process model stating that the data node
(object) is written in this specific state.
3—Workflow Net System: Soundness checking has been introduced for workflow net
systems [1,12]. Workflow nets are Petri nets with a single source and a single sink place
and they are strongly connected after adding a transition connecting the sink place with
the source place [1]. The integrated Petri net needs to be post-processed towards these
properties by adding enabler and collector fragments. The enabler fragment consists of
the single source place directly succeeded by a transition y. The postset of y comprises
all places representing an initial data state of some object life cycle and the source place
of the process model Petri net. The preset of each place is adapted accordingly.
The collector fragment first consists of a transition t preceding the single sink node.
For each distinct data class of the process scenario, one place pi and one place qi are
added to the collector. Each place pi has transition t as postset1 . Then, for each final data
state of some object life cycle, a transition ui is added to the collector. Each transition
ui has as preset the place representing the corresponding data state and some place qi
referring to the same data class. The postset of a transition ui is the corresponding place
pi also referring to the same data class. Additionally, a transition z succeeded by one
place is added to the collector. The place’s postset is transition t. The preset of z is the
sink place of the process model Petri net. The postset of z is extended with each place
qi .
Next, the synchronization places need to be considered. If a typed synchronization
edge involves the initial state of some object life cycle as source, then the correspond-
ing place is added to the postset of transition y of the enabler fragment. For all syn-
chronization edges typed previously, the postset of the corresponding place is extended
with transition t of the collector. If a currently typed synchronization edge involves a
final state of some object life cycle as source, then the corresponding place is added
to the postset of the corresponding transition ui of the collector fragment. Finally, the
semaphore places need to be integrated. Therefore, for each semaphore place, the preset
is extended with transition y from the enabler and the postset is extended with transition
t from the collector fragments. Now, connecting sink and source node, the workflow net
is strongly connected. A workflow net system consists of a workflow net and some ini-
tial marking. The workflow net is given above and the initial marking puts a token into
the single source place and nowhere else.
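As a small structural illustration (not part of the authors' tooling), the following sketch checks the workflow-net property stated above for a Petri net given as sets of places, transitions, and arcs: exactly one source place, exactly one sink place, and strong connectedness once a fresh transition from the sink back to the source is added. Soundness itself is a behavioral property and still requires state-space analysis on top of this check.

def is_workflow_net(places, transitions, arcs):
    """places/transitions: sets of node ids; arcs: set of (source, target) pairs."""
    sources = [p for p in places if not any(tgt == p for (_, tgt) in arcs)]
    sinks = [p for p in places if not any(src == p for (src, _) in arcs)]
    if len(sources) != 1 or len(sinks) != 1:
        return False
    # Short-circuit the net with a fresh transition from the sink back to the source
    # ("t*" is assumed not to collide with an existing node id).
    nodes = set(places) | set(transitions) | {"t*"}
    closed_arcs = set(arcs) | {(sinks[0], "t*"), ("t*", sources[0])}
    succ = {n: {b for (a, b) in closed_arcs if a == n} for n in nodes}
    pred = {n: {a for (a, b) in closed_arcs if b == n} for n in nodes}

    def reachable(start, edges):
        seen, stack = {start}, [start]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Strongly connected iff the source reaches every node and every node reaches the source.
    return reachable(sources[0], succ) == nodes and reachable(sources[0], pred) == nodes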
4—Soundness Checking: Assuming control flow correctness, if the workflow net sys-
tem satisfies the soundness property [1], no contradictions between the process model
and the object life cycles exist and all data states presented in all object life cycles are
implicitly or explicitly utilized in the process model, i.e., all paths in the object life
cycles may be taken. If it satisfies the weak soundness property [12], no contradictions
between the process model and the object life cycles exist but some of the data states
are never reached during execution of the process model. If control flow inconsistencies
appear, the places and transitions representing the control flow cause the violation,
allowing control flow issues to be distinguished from data conformance issues.
Validation. The described approach reliably decides about weak conformance of a pro-
cess scenario. It takes sound Petri net fragments as input and combines them with
1 Generally, we assume that addition of one element a to the preset of another element b implies the addition of b to the postset of a and vice versa.
respect to specified data dependencies. Single source and sink places are achieved
through the addition of elements either marking the original source places or collect-
ing tokens from the original final places. Thus, they do not change the behavior of the
process model and the object life cycles, i.e., they do not influence the result.
5 Related Work
The increasing interest in the development of process models for execution has shifted
the focus from the control flow to the data flow perspective, leading to integrated scenarios
providing control as well as data flow views. One step in this regard is object-centric
processes [3, 17, 23], which connect data classes with the control flow of process models by
specifying object life cycles. [8] introduces the essential requirements of this modeling
paradigm. [9, 20] present an approach, which connects object life cycles with process
models by determining commonalities between both representations and transforming
one into the other. Covering one direction of the integration, [10] derives object life
cycles from process models. Tackling the integration of control flow and data, [14, 15]
enable modeling data constraints and enforcing them during process execution directly
from the model. Similar to the mentioned approaches, we concentrate on integrated
scenarios incorporating process models and object life cycles, removing the assumption
that both representations must completely correspond to each other. Instead, we set a
synchronized object life cycle as reference that describes data manipulations allowed in
a traditional, i.e., activity-driven, modeled process scenario, e.g., with BPMN [18].
The field of compliance checking focuses on control flow aspects using predefined
rule sets containing, for instance, business policies. However, some works do consider
data. [11] applies compliance checking to object-centric processes by creating pro-
cess models following this paradigm from a set of rules. However, these rules most
often specify control flow requirements. [7] provides a technique to check for confor-
mance of object-centric processes containing multiple data classes by mapping to an
interaction conformance problem, which can be solved by decomposition into smaller
sub-problems, which in turn are solved by using classical conformance checking tech-
niques. [23] introduces a framework that ensures consistent specialization of object-
centric processes, i.e., it ensures consistency between two object life cycles. In contrast,
we check for consistency between a traditional process model and an object life cycle.
Eshuis [5] uses a symbolic model checker to verify conformance of UML activity di-
agrams [19] considering control and data flow perspectives while data states are not
considered in his approach. [9] introduces compliance between a process model and
an object life cycle as the combination of object life cycle conformance (all data state
transitions induced in the process model must occur in the object life cycle) and cov-
erage (opposite containment relation). [21] introduces conformance checking between
process models and product life cycles, which in fact are object life cycles, because a
product life cycle determines for a product the states and the allowed state transitions.
Compared to the notion of weak conformance, both notions do not support data syn-
chronization and both set restrictions with respect to data constraints specification in
the process model.
6 Conclusion
In this paper, we presented an approach for the integrated verification of control flow
correctness and weak data conformance using soundness checking, considering dependencies
between multiple data classes; e.g., an order is only allowed to be shipped after the payment
has been received, but needs to be shipped with a confirmed invoice in one package.
Therefore, we utilized the concept of synchronized object life cycles. For
checking data correctness, we used the notion of weak conformance and extended it with
means for object life cycle synchronization. Additionally, we utilized a mapping of a
process model with data constraints to a Petri net and described a mapping of a syn-
chronized object life cycle to a Petri net. Both resulting Petri nets are combined for an
integrated control flow and data conformance check based on the soundness criterion.
With respect to the places or transitions causing soundness violations, we can distin-
guish between control flow and data flow issues and therefore, we can verify the notion
of weak conformance. Revealed violations can be highlighted in the process model and
the synchronized object life cycle to support correction. In this paper, we focused on
the violation identification such that correction is subject to future work.
References
1. van der Aalst, W.M.P.: Verification of Workflow Nets. In: Azéma, P., Balbo, G. (eds.)
ICATPN 1997. LNCS, vol. 1248, pp. 407–426. Springer, Heidelberg (1997)
2. van der Aalst, W.M.P.: Workflow Verification: Finding Control-Flow Errors Using Petri-Net-
Based Techniques. In: van der Aalst, W.M.P., Desel, J., Oberweis, A. (eds.) Business Process
Management. LNCS, vol. 1806, pp. 161–183. Springer, Heidelberg (2000)
3. Cohn, D., Hull, R.: Business Artifacts: A Data-centric Approach to Modeling Business Op-
erations and Processes. IEEE Data Engineering Bulletin 32(3), 3–9 (2009)
4. Dijkman, R.M., Dumas, M., Ouyang, C.: Semantics and Analysis of Business Process Mod-
els in BPMN. Information & Software Technology 50(12), 1281–1294 (2008)
5. Eshuis, R.: Symbolic Model Checking of UML Activity Diagrams. ACM Transactions on
Software Engineering and Methodology (TOSEM) 15(1), 1–38 (2006)
6. Fahland, D., Favre, C., Jobstmann, B., Koehler, J., Lohmann, N., Völzer, H., Wolf, K.: In-
stantaneous Soundness Checking of Industrial Business Process Models. In: Dayal, U., Eder,
J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS, vol. 5701, pp. 278–293. Springer,
Heidelberg (2009)
7. Fahland, D., de Leoni, M., van Dongen, B.F., van der Aalst, W.M.P.: Conformance Checking
of Interacting Processes with Overlapping Instances. In: Rinderle-Ma, S., Toumani, F., Wolf,
K. (eds.) BPM 2011. LNCS, vol. 6896, pp. 345–361. Springer, Heidelberg (2011)
8. Künzle, V., Weber, B., Reichert, M.: Object-aware Business Processes: Fundamental Require-
ments and their Support in Existing Approaches. IJISMD 2(2), 19–46 (2011)
9. Küster, J.M., Ryndina, K., Gall, H.C.: Generation of Business Process Models for Object
Life Cycle Compliance. In: Alonso, G., Dadam, P., Rosemann, M. (eds.) BPM 2007. LNCS,
vol. 4714, pp. 165–181. Springer, Heidelberg (2007)
10. Liu, R., Wu, F.Y., Kumaran, S.: Transforming Activity-Centric Business Process Models into
Information-Centric Models for SOA Solutions. J. Database Manag. 21(4), 14–34 (2010)
11. Lohmann, N.: Compliance by design for artifact-centric business processes. In: Rinderle-
Ma, S., Toumani, F., Wolf, K. (eds.) BPM 2011. LNCS, vol. 6896, pp. 99–115. Springer,
Heidelberg (2011)
12. Martens, A.: On Usability of Web Services. In: Web Information Systems Engineering Work-
shops, pp. 182–190. IEEE (2003)
13. Meyer, A., Polyvyanyy, A., Weske, M.: Weak Conformance of Process Models with respect
to Data Objects. In: Services and their Composition (ZEUS), pp. 74–80 (2012)
14. Meyer, A., Pufahl, L., Batoulis, K., Kruse, S., Lindhauer, T., Stoff, T., Fahland, D., Weske,
M.: Automating Data Exchange in Process Choreographies. In: Jarke, M., Mylopoulos, J.,
Quix, C., Rolland, C., Manolopoulos, Y., Mouratidis, H., Horkoff, J. (eds.) CAiSE 2014.
LNCS, vol. 8484, pp. 316–331. Springer, Heidelberg (2014)
15. Meyer, A., Pufahl, L., Fahland, D., Weske, M.: Modeling and Enacting Complex Data Depen-
dencies in Business Processes. In: Daniel, F., Wang, J., Weber, B. (eds.) BPM 2013. LNCS,
vol. 8094, pp. 171–186. Springer, Heidelberg (2013)
16. Meyer, A., Weske, M.: Weak Conformance between Process Models and Object Life Cycles.
Tech. rep., Hasso Plattner Institute at the University of Potsdam (2014)
17. Nigam, A., Caswell, N.S.: Business artifacts: An approach to operational specification. IBM
Systems Journal 42(3), 428–445 (2003)
18. OMG: Business Process Model and Notation (BPMN), Version 2.0 (January 2011)
19. OMG: Unified Modeling Language (UML), Version 2.4.1 (August 2011)
20. Ryndina, K., Küster, J.M., Gall, H.C.: Consistency of Business Process Models and Object
Life Cycles. In: Kühne, T. (ed.) MoDELS 2006. LNCS, vol. 4364, pp. 80–90. Springer, Hei-
delberg (2007)
21. Wang, Z., ter Hofstede, A.H.M., Ouyang, C., Wynn, M., Wang, J., Zhu, X.: How to Guarantee
Compliance between Workflows and Product Lifecycles? Tech. rep., BPM Center Report
BPM-11-10 (2011)
22. Weske, M.: Business Process Management: Concepts, Languages, Architectures, 2nd edn.
Springer (2012)
23. Yongchareon, S., Liu, C., Zhao, X.: A Framework for Behavior-Consistent Specialization of
Artifact-Centric Business Processes. In: Barros, A., Gal, A., Kindler, E. (eds.) BPM 2012.
LNCS, vol. 7481, pp. 285–301. Springer, Heidelberg (2012)
Failure-Proof Spatio-temporal Composition of Sensor
Cloud Services
Azadeh Ghari Neiat, Athman Bouguettaya, Timos Sellis, and Hai Dong
1 Introduction
The large amount of real-time sensor data streaming from Wireless Sensor Networks
(WSNs) is a challenging issue because of storage capacity, processing power and data
management constraints [1]. Cloud computing is a promising technology to support the
storage and processing of the ever increasing amount of data [2]. The integration of
WSNs with the cloud (i.e., Sensor-Cloud) [3] provides unique capabilities and oppor-
tunities, particularly for the use of data service-centric applications. Sensor-Cloud is a
potential key enabler for large-scale data sharing and cooperation among different users
and applications.
A main challenge in Sensor-Cloud is the efficient and real-time delivery of sensor
data to end users. The preferred technology to enable delivery is services [4], i.e., sensor
data made available as a service (i.e. Sensor-Cloud service) to different clients over
a Sensor-Cloud infrastructure. The service paradigm is a powerful abstraction hiding
data-specific information focusing on how data is to be used. In this regard, sensor data
on the cloud is abstracted as Sensor-Cloud services easily accessible irrespective of the
distribution of sensor data sources. In this paper, we propose a service-oriented Sensor-
Cloud architecture that provides an integrated view of the sensor data shared on the
cloud and delivered as services.
The “position” and “time” of sensed data are of paramount importance reflecting the
spatio-temporal characteristics. Spatio-temporal features are fundamental to the func-
tional aspect of the Sensor-Cloud. In this regard, we focus on spatio-temporal aspects
as key parameters to query the Sensor-Cloud.
Composition provides a means to aggregate Sensor-Cloud services. In highly dynamic
environments such as sensed environments, the non-functional
properties (QoS) of Sensor-Cloud services may fluctuate [5]. For example, a participant
service may no longer be available, or its QoS may fluctuate at runtime.
As a result, the service may no longer provide the required QoS and fail. Therefore,
the initial composition plan may become non-optimal and need to be replanned to deal
with the changing conditions of such environments.
This paper focuses on providing an efficient failure-proof spatio-temporal composi-
tion model for Sensor-Cloud services. In particular, new spatio-temporal QoS attributes
to evaluate Sensor-Cloud services based on spatio-temporal properties of the services
are proposed. We propose a failure-proof spatio-temporal combinatorial search algorithm,
based on the D* Lite algorithm [6], an incremental version of the A* algorithm, to deal
with the affecting component Sensor-Cloud services. D* Lite is
efficient at repairing a plan when new information about the environment is received [10].
Our proposed approach continually improves its initial composition plan
and finds the best composition plan from a given source-point to a given destination-point
while QoS constraints change.
The rest of the paper is structured as follows: Section 2 presents the proposed spatio-
temporal model for Sensor-Cloud services. Section 3 illustrates the spatio-temporal
QoS model. Section 4 elaborates the details of the proposed failure-proof composition
approach. Section 5 evaluates the approach and shows the experimental results. Section
6 concludes the paper and highlights some future work.
Motivating Scenario
We use a typical scenario from public transport as our motivating scenario. Suppose
Sarah is planning to travel from ‘A’ to ‘B’. She wants to get information about the travel
services (i.e., buses, trams, trains and ferries) in the city to plan her journey. Different
users may have different requirements and preferences regarding QoS. For example,
Sarah may specify her requirements as maximum walk 300 meters and waiting time 10
minutes at any connecting stop. In this scenario, we assume that each bus (tram / train
/ ferry) has a set of deployed sensors (see Fig. 1). We also assume that there are sev-
eral bus sensor providers (i.e., sensor data providers) who supply sensor data collected
from different buses. We assume that each sensor data provider owns a subset of the set
of sensors on each bus. In addition, there are several Sensor-Cloud data providers who
supply Infrastructure as a Service (IaaS), i.e., CPU services, storage services, and network
services, to sensor data providers. Sensor-Cloud service providers make services
available that may query multiple heterogeneous sensor data providers. We assume that
each Sensor-Cloud service provider offers one or more Sensor-Cloud services to help
commuters devise the “best” journey plan. Different Sensor-Cloud service providers
may query the same sensor data providers. The quality of services that they provide
may also be different.
In our scenario, Sarah uses the Sensor-Cloud services to plan her journey. It is
quite possible that a single service cannot satisfy Sarah’s requirements. In such cases,
Sensor-Cloud services may need to be composed to provide the best travel plan. The
composer acts on behalf of the end users to compose Sensor-Cloud services from
different Sensor-Cloud service providers.
– Definition 1: Sensor sen. A sensor seni is a tuple of < id, (loci , tsi )> where
• id is a unique sensor ID,
• (loci , tsi ) shows the latest recorded location of sensor seni and timestamp
tsi is the latest time in which sensor data related to Sensor-Cloud service is
collected from sensor seni .
– Definition 2: Sensor-Cloud Service S. A Sensor-Cloud service Si is a tuple of <
id, ISi , F Si , di , Fi , Qi , SENi > where
• id is a unique service ID,
• ISi (Initial State) is a tuple < Ps , ts >, where
∗ Ps is a GPS start-point of Si ,
∗ ts is a start-time of Si .
In the remainder of the paper, the terms service and composite service are used to refer to a
Sensor-Cloud service and a composite Sensor-Cloud service, respectively.
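A minimal data-structure sketch of Definitions 1 and 2 is given below. Only the id and the initial state IS are fully defined in this excerpt, so the remaining tuple elements are kept as opaque placeholders rather than guessed; the field names merely mirror the symbols above.

from dataclasses import dataclass, field
from typing import Any, List, Tuple

@dataclass
class Sensor:
    id: str
    loc: Tuple[float, float]   # latest recorded location of the sensor
    ts: float                  # latest time at which its data was collected

@dataclass
class SensorCloudService:
    id: str
    IS: Tuple[Tuple[float, float], float]   # initial state: (GPS start-point Ps, start-time ts)
    FS: Any = None    # final state and the remaining tuple elements are not defined in this excerpt
    d: Any = None
    F: Any = None
    Q: Any = None
    SEN: List[Sensor] = field(default_factory=list)   # sensors the service draws data from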
– Service time (st): Given an atomic service S, the service time qst (S) measures the
expected time in minutes between the start and destination points. The value of
qst (S) is computed as follows:
qst (S) = S.te − S.ts (1)
– Currency (cur): Currency indicates the temporal accuracy of a service. Given an
atomic service S, currency qcur (S) is computed using the expression
(currenttime − timestamp(S)). Since each service consists of a set of sensors
{sen1 , ..., senn }, timestamp(S) will be computed as follows:
timestamp(S) = Avg(tsi) (2)
– Accuracy (acc): Accuracy reflects how well the result of a service is assured. For example,
a smaller accuracy value indicates that fewer sensors contribute to the results of the service.
Given an atomic service S, the accuracy qacc (S) is the number of operating sensors
covering the specific spatial area related to S. The value of the qacc (S) is computed
as follows:
qacc(S) = Nsen(S) / Tc    (3)
where Nsen (S) is the expected number of operating sensors in S and Tc is the total
number of sensors covering the spatial area related to S. Nsen (S) can be estimated
based on the number of sensors sen in S. We assume that Tc is known. It is also assumed
that all sensors have the same functionalities and accuracy. For example, sensor
data related to the bus service S65 is collected from 4 sensors (Nsen = 4) out of 20
sensors (Tc = 20) deployed on a bus, which is the spatial area related to S65 (a short computational sketch of Eqs. (1)-(3) follows this list).
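The sketch below illustrates Eqs. (1)-(3); the helper names and example values are ours, not the paper's.

import time

def service_time(ts_start, te_end):
    """Eq. (1): expected time in minutes between the start and destination points."""
    return (te_end - ts_start) / 60.0

def currency(sensor_timestamps, now=None):
    """Eq. (2): current time minus the average timestamp of the service's sensors."""
    now = time.time() if now is None else now
    return now - sum(sensor_timestamps) / len(sensor_timestamps)

def accuracy(n_operating_sensors, total_covering_sensors):
    """Eq. (3): fraction of operating sensors covering the service's spatial area."""
    return n_operating_sensors / total_covering_sensors

print(accuracy(4, 20))   # bus service S65 example: 0.2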
– Service time: The service time of a composite service is the sum of the service time
of all its component services in addition to the transition time trans between two
component services. The transition time is computed as follows:
trans = Σj=1..n−1 (Sj+1.start-time − Sj.end-time)    (4)
where Sj and Sj+1 are two subsequent component services and S1 .end-time is the
start time of a query Qt .
– Currency: The currency value of a composite service is the average of the currency
of all the selected services.
– Accuracy: The accuracy value for a composite service is the product of the accuracy
of all its component services.
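The three aggregation rules above can be summarized in a short sketch (a plain illustration of the stated rules, not the authors' code); each component service is assumed to be given as a record with its start time, end time, service time, currency, and accuracy.

def aggregate_composite_qos(services):
    """Aggregate service time (with Eq. (4) transitions), currency, and accuracy for an ordered composition."""
    st = sum(s["st"] for s in services)
    # Transition time between subsequent component services, Eq. (4).
    trans = sum(services[j + 1]["start"] - services[j]["end"] for j in range(len(services) - 1))
    cur = sum(s["cur"] for s in services) / len(services)   # average currency
    acc = 1.0
    for s in services:
        acc *= s["acc"]                                      # product of accuracies
    return {"st": st + trans, "cur": cur, "acc": acc}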
where g-score calculates the QoS utility score [9] of selected services from the source-
point ς to the current service and heuristic function h-score estimates the Euclidean
distance between the end-point of candidate service S and the destination-point ξ.
In this section, we propose a novel failure-proof service composition approach based
on spatio-temporal aspects of services to support real-time response to fluctuation
of QoS attributes. We introduce a new heuristic algorithm based on D* Lite, called
STD*Lite. D* Lite is a dynamic shortest path finding algorithm that has been exten-
sively applied in mobile robot and autonomous vehicle navigation. D* Lite is capable
of efficiently replanning paths in changing environment [10]. Whenever the QoS values
of component services in the initial composition plan significantly change at runtime,
STD*Lite recomputes a new optimal composition plan from its current position to the
destination. Without loss of generality, we only consider temporal QoS fluctuations in
service time qst . In our approach, the existence of a temporal QoS change is ascertained
by measuring the value of difference τ between the measured qst of a service at runtime
and its promised qst. If τ exceeds a defined threshold, a QoS change has occurred.
The STD*Lite algorithm, like STA*, maintains an estimate g-score for each service S
in the composition plan. Since STD*Lite searches backward from the destination-point
to the source-point, g-score estimates the QoS utility score of the optimal path from S
to the destination. It also maintains a second kind of estimate, called the rhs value, which
is a one-step lookahead of g-score. Therefore, it is better informed than g-score and is
computed as follows:
rhs(S) = 0                                                              if S.Pe = ξ
rhs(S) = min S′ ∈ SuccNeighboursList(S) (trans(S′, S) + g-score(S′))    if S.Pe ≠ ξ    (6)
where trans(S′, S) is the transition time between S′ and S and SuccNeighboursList(S)
is the set of successor neighbours of the service S. The rationale for using neighbours is
that the optimal plan from S to the destination must pass through one of the neighbours
of S. Therefore, if we can identify the optimal plans from any of the neighbours to
the destination, we can compute the optimal plan for S. The successor neighbours of a
service S are identified through Spatio-TemporalSearch algorithm in [7].
By comparing g-score and rhs, the algorithm identifies all affecting, called inconsistent,
component services. A service is called locally consistent iff its rhs value equals
its g-score value; otherwise it is called locally inconsistent. A locally inconsistent
service falls into two categories: underconsistent (if g-score(S) < rhs(S)) and overconsistent
(if g-score(S) > rhs(S)). A service is called underconsistent if its QoS
values degrade. In such a situation, the QoS values of affecting services should be
updated and the composition plan should adapt to the violations. Moreover, a service
is called overconsistent if its QoS values improve. An overconsistent service implies
that a more optimal plan can be found from the current service. When a service
is inconsistent, the algorithm updates all of its neighbours and itself again. Updating
services makes them consistent.
Algorithm 1 presents the details of the STD*Lite algorithm. The algorithm generates
an optimal initial composition plan like a backward STA* search {Lines 33-42}. If the
QoS values of component services change after generating the initial composition plan,
STD*Lite updates the inconsistent (i.e., affecting) component services and expands
the services to recompute a new optimal composition plan {Lines 43-47}. All inconsistent
services are then inserted into a priority queue CandidateQueue to be updated and made
consistent. STD*Lite avoids redundant updates by updating only the inconsistent
services that need to be modified, whereas A* recomputes the whole plan. The priority
of an inconsistent service in CandidateQueue is determined by its key value as follows:
key(S) = [k1(S), k2(S)]
       = [min(g-score(S), rhs(S)) + h-score(Sstart, S), min(g-score(S), rhs(S))]    (7)
The keys are compared in lexicographical order: key(S) < key(S′)
iff k1(S) < k1(S′), or k1(S) = k1(S′) and k2(S) < k2(S′). The heuristic in k1
serves the same purpose as the f-score in STA*. The algorithm applies this heuristic to
ensure that only the newly overconsistent or newly underconsistent services that
are relevant to repairing the current plan are processed. The inconsistent services are
selected in order of increasing priority, which implies that the services
closer to Sstart (i.e., with a lower h-score value) should be processed first. Note that as the
algorithm tracks the execution of the composition plan, the start service Sstart becomes
the current running service of the plan. Therefore, when a QoS value fluctuates, a new
optimal plan is computed from the original destination to the new start service (i.e.
current service).
The algorithm finally recomputes a new optimal plan by calling the ComputePlan() function
{Line 48}. ComputePlan() expands the locally inconsistent services on CandidateQueue,
updates their g-score and rhs values, and adds them to or removes them from CandidateQueue
with their corresponding keys by calling the UpdateService() function {Lines 4-15}.
When ComputePlan() expands an overconsistent service, it sets the g-score value of the
service equal to its rhs value to make it locally consistent {Line 20}. Since the rhs values
of the predecessor neighbours of a service are computed based on the g-score value of
the service, any change of its g-score value can affect the local consistency of its
predecessor neighbours. As a result, the predecessor neighbours {Line 19} of an inconsistent
service should be updated {Lines 21-23}.
When ComputePlan() expands an underconsistent service, it sets the g-score value of
the service to infinity to make it either overconsistent or consistent {Line 25}. The predecessor
neighbour services of the service also need to be updated {Lines 26-28}. ComputePlan()
expands the services until the key value of the next service to expand is not less than
the key value of Sstart and Sstart is locally consistent {Line 17}.
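The bookkeeping described above (g-score/rhs maintenance, keys, and the expansion loop) can be sketched as follows. This is a simplified illustration of the D* Lite pattern underlying STD*Lite rather than a transcription of Algorithm 1: the successor/predecessor, transition-time, and heuristic functions (succ_neighbours, pred_neighbours, trans, h_score) are assumed to be supplied by the spatio-temporal service model, services are identified by comparable ids such as strings, QoS utility is reduced to a single additive cost, and the key modifier D* Lite uses when the start service moves is omitted.

import heapq
from math import inf

class STDLiteSketch:
    """Simplified D* Lite-style replanning over services identified by comparable ids."""

    def __init__(self, start, goal, succ_neighbours, pred_neighbours, trans, h_score):
        self.start, self.goal = start, goal
        self.succ, self.pred = succ_neighbours, pred_neighbours
        self.trans, self.h = trans, h_score
        self.g, self.rhs = {}, {goal: 0.0}
        self.queue = [(self.key(goal), goal)]          # CandidateQueue of inconsistent services

    def key(self, s):
        # Eq. (7): keys are compared lexicographically
        m = min(self.g.get(s, inf), self.rhs.get(s, inf))
        return (m + self.h(self.start, s), m)

    def update_service(self, s):
        # cf. UpdateService(): recompute rhs via Eq. (6) and (re-)queue s if it is inconsistent
        if s != self.goal:
            self.rhs[s] = min((self.trans(n, s) + self.g.get(n, inf) for n in self.succ(s)),
                              default=inf)
        if self.g.get(s, inf) != self.rhs.get(s, inf):
            heapq.heappush(self.queue, (self.key(s), s))

    def compute_plan(self):
        # cf. ComputePlan(): expand inconsistent services until the start service is consistent
        while self.queue and (self.queue[0][0] < self.key(self.start)
                              or self.g.get(self.start, inf) != self.rhs.get(self.start, inf)):
            k_old, s = heapq.heappop(self.queue)
            if k_old < self.key(s):                     # stale entry: re-insert with its current key
                heapq.heappush(self.queue, (self.key(s), s))
            elif self.g.get(s, inf) == self.rhs.get(s, inf):
                continue                                # duplicate entry for an already consistent service
            elif self.g.get(s, inf) > self.rhs.get(s, inf):
                self.g[s] = self.rhs[s]                 # overconsistent: adopt the better rhs estimate
                for p in self.pred(s):
                    self.update_service(p)
            else:
                self.g[s] = inf                         # underconsistent: reset, then repair neighbours
                for p in list(self.pred(s)) + [s]:
                    self.update_service(p)

    def on_qos_change(self, changed_services):
        """Replan after runtime QoS fluctuations: update the affected services, then recompute."""
        for s in changed_services:
            self.update_service(s)
        self.compute_plan()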
5 Experimental Results
We conduct a set of experiments to assess the effectiveness of the proposed approach
over different QoS fluctuation ratios. We run our experiments on a 3.40 GHz Intel Core
i7 processor and 8 GB RAM under Windows 7. To the best of our knowledge, there is
no spatio-temporal service test case to be used for experimental purposes. Therefore,
we focus on evaluating the proposed approach using synthetic spatio-temporal services.
In our simulation, 1000 nodes are randomly distributed in a 30 km × 30 km region.
The radius for neighbour search r is set to 0.5% of the specified region. All experiments are
conducted 1000 times and the average results are computed. Each experiment starts
from a different source and destination, which are randomly generated. Two spatio-temporal
QoS attributes of the synthetic service instances are randomly generated with
a uniform distribution from the following intervals: qacc ∈ [0, 1] and qcur ∈ [60, 1440].
The qst is assigned based on the distance between Ps and Pe considering a fixed speed.
The remaining parameters are also randomly generated using a uniform distribution.
[Figure: results plotted on a logarithmic scale (10^0 to 10^3) against QoS fluctuation ratios from 5% to 30%]
ratio. The slight difference (i.e., less than 10 ms over 100000 services) shows the relative
stability of our approach when QoS is highly violated.
6 Conclusion
This paper proposes a novel approach for failure-proof composition of Sensor-Cloud
services in terms of spatio-temporal aspects. We introduce a new failure-proof spatio-
temporal combinatorial search algorithm based on D* Lite to replan a composition plan
in case of QoS changes. We conduct preliminary experiments to illustrate the performance
of our approach. Future work includes implementing a prototype and testing it with
real-world applications, focusing on building Sensor-Clouds for public transport.
References
1. Hossain, M.A.: A survey on sensor-cloud: architecture, applications, and approaches. Inter-
national Journal of Distributed Sensor Networks (2013)
2. Lee, K., Murray, D., Hughes, D., Joosen, W.: Extending sensor networks into the cloud using
amazon web services. In: 2010 IEEE International Conference on Networked Embedded
Systems for Enterprise Applications (NESEA), pp. 1–7. IEEE Press (2010)
3. Rajesh, V., Gnanasekar, J., Ponmagal, R., Anbalagan, P.: Integration of wireless sensor
network with cloud. In: 2010 International Conference on Recent Trends in Information,
Telecommunication and Computing (ITC), pp. 321–323. IEEE Press (2010)
4. Carey, M.J., Onose, N., Petropoulos, M.: Data services. Communications of the ACM 55(6),
86–97 (2012)
5. Ben Mabrouk, N., Beauche, S., Kuznetsova, E., Georgantas, N., Issarny, V.: Qos-aware ser-
vice composition in dynamic service oriented environments. In: Bacon, J.M., Cooper, B.F.
(eds.) Middleware 2009. LNCS, vol. 5896, pp. 123–142. Springer, Heidelberg (2009)
6. Koenig, S., Likhachev, M.: D* lite. In: AAAI/IAAI, pp. 476–483 (2002)
7. Ghari Neiat, A., Bouguettaya, A., Sellis, T., Ye, Z.: Spatio-temporal composition of sen-
sor cloud services. In: 21th IEEE International Conference on Web Services (ICWS), pp.
241–248. IEEE Press (2014)
8. Theodoridis, Y., Vazirgiannis, M., Sellis, T.: Spatio-temporal indexing for large multimedia
applications. In: Proceedings of the Third IEEE International Conference on Multimedia
Computing and Systems, pp. 441–448. IEEE Press (1996)
9. Zeng, L., Benatallah, B., Ngu, A.H.H., Dumas, M., Kalagnanam, J., Chang, H.: Qos-
aware middleware for web services composition. IEEE Transactions on Software Engineer-
ing 30(5), 311–327 (2004)
10. Koenig, S., Likhachev, M.: Improved fast replanning for robot navigation in unknown terrain.
In: IEEE International Conference on Robotics and Automation, pp. 968–975 (2002)
Probabilistic Prediction of the QoS of Service
Orchestrations: A Truly Compositional
Approach
1 Introduction
The Quality of Service (QoS) of a service orchestration depends on the QoS of the services
it invokes. When selecting and composing various services together, the designer
of an orchestrator has to consider whether the desired composition yields an
overall QoS level which is acceptable for the application. In order to predict QoS
two characteristics of service orchestration must be considered:
– Different results of service invocations. Each invoked service can return a suc-
cessful reply, a fault notification, or even no reply at all. If a fault is returned,
a fault handling routine will be executed instead of the normal control flow.
If no reply is received, the orchestrator may wait forever for a reply (unless
some parallel branch throws a fault). In either case, the resulting QoS of the
composition differs from the case of successful invocation.
– Non-determinism in the workflow. Different runs of the same application can
have different QoS values just because the orchestration control flow is non-deterministic,
for two reasons. Firstly, different runs of the orchestration
can get different service invocation results (success/fault/no reply). It is
worth noting that a service is not always faulty or successful, rather it has a
certain probability of being successful (as guaranteed in its SLA). Secondly,
Work partly supported by the EU-FP7-ICT-610531 SeaClouds project.
alternative and iterative control flow structures (if/else and loops) depend
on input data which may differ in different runs. This leads, for instance,
to different numbers of loop iterations or to different branches executed in
an if/else structure. Moreover, certain QoS properties of invoked services can
vary from one run to another (e.g., response time).
2 Related Work
Various approaches (e.g., [3–9]) have been proposed to determine the QoS of
service compositions.
Cardoso [3] presented a mathematical model and an algorithm to compute the
QoS of a workflow composition. He iteratively reduces the workflow by removing
parallel, sequence, alternative and looping structures according to a set of re-
duction rules, until only one activity remains. However, some workflow complex
dependencies cannot be decomposed into parallel or sequence, as shown in [9].
This kind of approach has also been adopted by others [5, 7, 8], some of whom
(e.g., [4]) tried to overcome this limitation by defining more reduction patterns.
Mukherjee et al. [6, 9] presented an algorithm to estimate the QoS of WS-BPEL
compositions. They convert a WS-BPEL workflow into an activity dependency
graph, and assign probabilities of being executed to each activity. In their frame-
work it is possible to treat any arbitrary complex dependency structure as well
as fault driven flow control. However, they do not consider correlation between
activities which do not have a direct dependency, and this in some cases can
yield a wrong result.
Zheng et al. [8] focused on QoS estimation for compositions represented by
service graphs. In their approach, however, they only marginally deal with parallelism,
by not considering arbitrary synchronization links (i.e., they restrict
to cases in which it is possible to decompose flow-like structures into parallel and
sequences, as in [3]), and they do not take into account fault handling. Moreover,
they need to fix an upper bound on the number of iterations for cycles in order
to allow decomposition into an acyclic graph. They also assume that service invocations
are deterministic, namely that services are always successful and their QoS does
not change from one run to another.
To the best of our knowledge, all previous approaches require knowing a priori
the exact number of iterations, or at least an upper bound, for each loop in order
to estimate QoS values. Also, other approaches rarely take fault handling into
account, and never deal with non-responding services.
In this section we introduce our algorithm to provide a QoS estimate for a service
orchestration based on the QoS of the services it invokes. Our input workflows
can contain any arbitrary dependency structure (i.e., not only parallel and
sequential execution patterns), fault handling, and unbound loops, and can preserve
correlation, for example in diamond dependencies.
Our algorithm uses a structural recursive function that associates each WS-
BPEL activity with a cost structure. This cost structure is a tuple of metadata
chosen according to the QoS values we want to compute. The cost structure
has to carry enough information to allow computation of QoS values and allow
composing it with other costs using the standard WS-BPEL constructs, i.e. it
needs to have a composition function for each WS-BPEL construct. Later we
will show that it is possible to write a composition function for most of WS-
BPEL composition constructs by only requiring two basic operations on the cost
data type. The first is the compositor for independent parallel execution of two
activities. Suppose we have two activities A and B; we assume that we can
compute the cost of executing both in parallel knowing only the costs of those
activities, by using a given function Both. The second compositor is the one we
use to resolve dependency. If a WS-BPEL construct of A and B introduces some
dependency/synchronization between the two activities, namely we suppose that
it forces the activity B to start after completion of A, we will need to adjust the
cost of B to take into account the dependence introduced by the composition
structure, and we suppose to be able to do it from the costs of A and B by using
a given operation Delay1 . For example in our model the Sequence(A,B)
construct is decomposed into a parallel execution of the independent activity
A and the activity B synchronized after A, as such its cost can be written, in
absence of faults, as:
Cost(A) = cA,   Cost(B) = cB
Cost(Sequence(A, B)) = Both(cA, Delay(cB, cA))
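To make the roles of Both and Delay concrete, here is a minimal sketch (our illustration, not the paper's cost structure) in which a cost is reduced to a pair of completion time and monetary expense; Both combines independent parallel executions and Delay shifts a dependent activity behind the one it waits for.

from dataclasses import dataclass

@dataclass
class Cost:
    completion_time: float   # time at which the activity finishes (relative to orchestration start)
    expense: float           # monetary cost accumulated by the activity

def both(ca: Cost, cb: Cost) -> Cost:
    """Independent parallel execution: pay both, finish when the later one finishes."""
    return Cost(max(ca.completion_time, cb.completion_time), ca.expense + cb.expense)

def delay(cb: Cost, ca: Cost) -> Cost:
    """Make B start only after A completes: shift B's completion time by A's completion time."""
    return Cost(cb.completion_time + ca.completion_time, cb.expense)

def sequence(ca: Cost, cb: Cost) -> Cost:
    """Cost(Sequence(A, B)) = Both(cA, Delay(cB, cA)) in the absence of faults."""
    return both(ca, delay(cb, ca))

print(sequence(Cost(2.0, 0.5), Cost(3.0, 0.25)))   # Cost(completion_time=5.0, expense=0.75)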
This is similar to what has been done in previous approaches (e.g., [3]) in
which the Flow dependency graph is decomposed into parallel and sequence
1 We use Delay as function name because in most cases this affects only time-based properties of the dependent activity, such as completion time.
– Data-dependent control flow cannot be evaluated exactly, because data
values are unknown.
From the flow analysis point of view the Scope activity is very similar to
a Sequence, except that while Sequence executes the second activity only
if the first is successful, in Scope the fault handler is executed only when the
first yields a Fault. For external invocations we expect to have a sampling
function describing the service, which can be written according to the service’s
QoS. Note that if the service has a WS-BPEL description, its sampling function
can be computed with this same algorithm.
let Eval (Invoke(s)) env =
s.getSamplingFunction()
4 Example
To illustrate our approach, we consider a bank customer loan request example
(Figure 1), which is a variation of the well-known WS-BPEL loan example [1].
We want to estimate the reliability, the amortized expense for successful
execution, and the average response time of this composition. Let us assume for the
loan example the distribution of variable assignments and invoked services QoS
shown in Table 1.
The algorithm starts by evaluating the cost and outcome of the outermost
Flow activity: it computes delayed costs for the activities in the flow and then
sums them with the All compositor. Table 2 summarizes six runs of the Eval
function on the loan request example.

[Fig. 1: WS-BPEL workflow of the loan request example. A Flow contains the receipt of the customer loan request, an Assign of bigAmount = (LoanRequest >= 10,000$), an Invoke of the Risk Assessor with an Assign of highRisk = (riskAssessment > 10%), a While loop whose Scope invokes the Loan Approver and whose fault handler (Catch All) changes the endpoint, and a Reply to the customer; the activities are connected by synchronization links.]

To estimate the required QoS proper-
ties, we will perform a Monte Carlo sampling. Reliability can be determined by
computing the expectation of successCount. Amortized expense and average
response time are divided by reliability to normalize them with respect to the
number of successful executions.
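As an illustration of this Monte Carlo step (a sketch under a simplified cost model, not the paper's F# implementation), each run draws an outcome, response time, and expense from a hypothetical sampling function, and the estimates are then normalized by reliability as described above.

import random

def sample_run():
    """Hypothetical sampling function for one orchestration run (success flag, time in s, expense)."""
    success = random.random() < 0.9          # assumed 90% success probability
    return success, random.uniform(1.0, 3.0), 0.15

def monte_carlo_qos(n_runs=100_000):
    successes = total_time = total_expense = 0.0
    for _ in range(n_runs):
        ok, t, e = sample_run()
        successes += ok
        total_time += t
        total_expense += e
    reliability = successes / n_runs                       # expectation of successCount
    return {
        "reliability": reliability,
        "amortized_expense": (total_expense / n_runs) / reliability,
        "avg_response_time": (total_time / n_runs) / reliability,
    }

print(monte_carlo_qos())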
5 Conclusions
In this paper we have presented a novel approach to probabilistically predict the
QoS of service orchestrations. Our algorithm improves previous approaches by
coping with complex dependency structures, unbound loops, fault handling, and
unresponded service invocations. Our algorithm can be fruitfully exploited both
to probabilistically predict QoS values before defining the SLA of an orches-
tration and to compare the effect of substituting one or more endpoints (viz.,
remote services).
We see different possible directions for future work. One of them is to ex-
tend our approach to model some other WS-BPEL constructs that we have not
discussed in this paper, like Pick and EventHandlers. Another possible exten-
sion could be to allow for cases in which no information at all (not even a branch
execution probability) is available for flow control structures. Similarly the un-
correlated samples restriction imposed on invocations and assignments should
be relaxed. We would also like to be able to specify some degree of correlation
between consecutive samples (e.g., if a service invocation yields a fault because
it is "down for maintenance" we should increase the probability of getting the
same fault in the next invocation).
References
1. Jordan, D., Evdemon, J., Alves, A., Arkin, A., Askary, S., Barreto, C., Bloch, B.,
Curbera, F., Ford, M., Goland, Y., et al.: Web services business process execution
language version 2.0. OASIS standard 11 (2007)
2. Dunn, W.L., Shultis, J.K.: Exploring Monte Carlo Methods. Elsevier (2011)
3. Cardoso, A.J.S.: Quality of service and semantic composition of workflows. PhD
thesis, Univ. of Georgia (2002)
4. Jaeger, M., Rojec-Goldmann, G., Muhl, G.: QoS aggregation for web service com-
position using workflow patterns. In: Proceedings of the Eighth IEEE International
Enterprise Distributed Object Computing Conference, EDOC, pp. 149–159 (2004)
5. Ben Mabrouk, N., Beauche, S., Kuznetsova, E., Georgantas, N., Issarny, V.: QoS-
Aware service composition in dynamic service oriented environments. In: Bacon,
J.M., Cooper, B.F. (eds.) Middleware 2009. LNCS, vol. 5896, pp. 123–142. Springer,
Heidelberg (2009)
6. Mukherjee, D., Jalote, P., Gowri Nanda, M.: Determining QoS of WS-BPEL com-
positions. In: Bouguettaya, A., Krueger, I., Margaria, T. (eds.) ICSOC 2008. LNCS,
vol. 5364, pp. 378–393. Springer, Heidelberg (2008)
7. Wang, H., Sun, H., Yu, Q.: Reliable service composition via automatic QoS predic-
tion. In: IEEE International Conference on Services Computing (SCC), pp. 200–207
(2013)
8. Zheng, H., Zhao, W., Yang, J., Bouguettaya, A.: Qos analysis for web service com-
positions with complex structures. IEEE Transactions on Services Computing 6,
373–386 (2013)
9. Mukherjee, D.: QoS in WS-BPEL Processes. Master’s thesis, Indian Institute
of Technology, Delhi (2008)
10. Syme, D., Granicz, A., Cisternino, A.: Expert F# 3.0, 3rd edn. Apress, Berkeley
(2012)
QoS-Aware Complex Event Service Composition
and Optimization Using Genetic Algorithms
1 Introduction
In this paper, we extend the work in [2], which aims to provide CEP applications
as reusable services, where the reusability of those event services is determined
by examining complex event patterns and primitive event types. This paper aims
to enable a QoS-aware event service composition and optimization. In order to
facilitate QoS-aware complex event service composition, two issues should be
considered: QoS aggregation and composition efficiency. The QoS aggregation
for a complex event service relies on how its member events are correlated. The
aggregation rules are inherently different to conventional web services. Efficiency
becomes an issue when the complex event consists of many primitive events, and
each primitive event detection task can be achieved by multiple event services.
This paper addresses both issues by: 1) creating QoS aggregation rules and util-
ity functions to estimate and assess QoS for event service compositions, and 2)
enabling efficient event service compositions and optimization with regard to
QoS constraints and preferences based on Genetic Algorithms.
The remainder of the paper is organized as follows: Section 2 discusses re-
lated works in QoS-aware service planning; Section 3 presents the QoS model
we use and the QoS aggregation rules we define; Section 4 presents the heuristic
algorithm we use to achieve global optimization for event service compositions
based on Genetic Algorithms (GA); Section 5 evaluates the proposed approach;
conclusions and future work are discussed in Section 6.
2 Related Work
The first step of solving the QoS-aware service composition problem is to define a
QoS model, a set of QoS aggregation rules and a utility function. Existing works
have discussed these topics extensively [5,7]. In this paper we extract typical
QoS properties from existing works and define a similar utility function based
on Simple Additive Weighting (SAW). However, the aggregation rules in existing
works focus on conventional web services rather than complex event services,
which have a different QoS aggregation schema. For example, event engines also
have an impact on QoS aggregation, which is not considered in conventional
service QoS aggregation. Also, the aggregation rules for some QoS properties
based on event composition patterns are different to those based on workflow
patterns (as in [5]), which we will explain in detail in Section 3.1.
As a second step, different concrete service compositions are created and com-
pared with regard to their QoS utilities to determine the optimal choice. To
achieve this efficiently, various GA-based approaches are developed [8,1,6]. The
above GA-based approaches can only evaluate service composition plans with
fixed sets of service tasks (abstract services) and cannot evaluate composition
plans which are semantically equivalent, but consist of different service tasks,
i.e., service tasks on different granularity levels. A more recent work in [7] ad-
dresses this issue by developing a GA based on Generalized Component Services.
Results in [7] indicate that up to a 10% utility enhancement can be obtained
by expanding the search space. Composing events on different granularity lev-
els is also a desired feature for complex event service composition. However, [7]
only caters for Input, Output, Precondition and Effect based service composi-
tions. Complex event service composition requires an event pattern based reuse
mechanism [2].
In this section, a QoS aggregation schema is presented to estimate the QoS prop-
erties for event service composition. A utility function is introduced to evaluate
the QoS utility under constraints and preferences.
In this paper, some typical QoS attributes are investigated, including: latency,
price, bandwidth consumption, availability, completeness, accuracy and security.
A numerical quality vector Q = <L, P, E, B, Ava, C, Acc, S> is used to specify
the QoS measures of an event service with regard to these dimensions. The
Composition Plan is a key factor in aggregating quality vectors for event service
compositions. As in [2], a composition plan contains an event pattern correlating
event services with event operators. Event patterns are modeled as event syntax
trees. In this paper, a step-wise transformation of the event syntax tree is adopted
to aggregate QoS properties. Aggregation rules for different QoS dimensions can
be event operator dependent or independent, as shown in Table 1. In Table 1,
E′ denotes an event service composition. P(E′), E(E′), etc. denote QoS values of
E′. E′ice and E′dst denote the set of Immediately Composed Event services and the
Direct Sub-Trees of the syntax tree of E′, respectively. f(E′) gives the frequency of
E′, and card(E′) gives the repetition cardinality of the root node in E′.
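Since the utility function is only characterized above as SAW-based with constraints and preference weights, the following is a minimal sketch of such a Simple Additive Weighting utility; the min-max normalization and the treatment of "negative" attributes (where smaller is better, e.g., latency or price) are our assumptions, not necessarily the paper's exact formulation.

def saw_utility(qos, weights, bounds, negative=frozenset({"latency", "price", "bandwidth"})):
    """qos and weights: dicts keyed by attribute name; bounds: attribute -> (min, max) over all candidates."""
    total = 0.0
    for attr, value in qos.items():
        lo, hi = bounds[attr]
        norm = 0.5 if hi == lo else (value - lo) / (hi - lo)   # min-max normalization to [0, 1]
        if attr in negative:
            norm = 1.0 - norm                                   # for attributes where smaller is better
        total += weights.get(attr, 0.0) * norm
    return total / sum(weights.values())

qos = {"latency": 120.0, "price": 0.4, "availability": 0.99, "accuracy": 0.8}
weights = {attr: 1.0 for attr in qos}                           # equal preference weights, as in the evaluation
bounds = {"latency": (50, 500), "price": (0.1, 1.0), "availability": (0.9, 1.0), "accuracy": (0.2, 1.0)}
print(saw_utility(qos, weights, bounds))                        # aggregated utility of this illustrative candidate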
[Figure: an example query event syntax tree built from SEQ and OR operators over primitive events e1-e4, together with Event Services 1-4 (types e1-e4 at locations loc1-loc4) annotated with the sub-patterns on which each service is reusable]
5 Evaluation
In this section we present the experimental results of the proposed approaches
based on simulated datasets. The weights of the QoS metrics in the preference vector
are all set to 1.0, and a loose constraint that does not reject any event service
composition is defined in the query to enlarge the search space.
[Figure panels: (c) convergence time under different selection factors; (d) max QoS utility under different selection factors]
produces about 79%-optimal results in a much shorter time, compared with the
brute-force enumerations.
There are two ways to increase the utility in the GA results: increase the size
of the initial population or the selection probability for the individuals in each
generation. To evaluate the influence of the initial population size and selection
probability, we execute the genetic evolutions with different population sizes and
selection probabilities over the second dataset (BF-5) in Table 2.
Figure 3(a) and Figure 3(b) show the growth of execution time and the best QoS utility retrieved when using from 200 to 1200 CCPs as the initial population. From the results we can see that the growth of evolution time is (almost) linear in the size of the initial population. In total, increasing the initial population from 200 to 1200 gains an additional 0.276 (15.6%) QoS utility at the cost of 1344 milliseconds of execution time.
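The selection policy used in these tests, roulette-wheel selection with elites (discussed in the next paragraph), could be sketched as follows. The fitness function, elite count, and selection probability below are placeholder values, not the parameters actually used in the experiments.

import random

# Hedged sketch of roulette-wheel (fitness-proportional) selection with elitism.
# 'utility' stands for the QoS utility of a composition plan; the elite count
# and the selection probability are illustrative values only.

def select_next_generation(population, utility, elite_count=2, selection_prob=0.5):
    """population: list of individuals; utility: function individual -> float >= 0."""
    ranked = sorted(population, key=utility, reverse=True)
    survivors = ranked[:elite_count]                 # elites survive unconditionally
    rest = ranked[elite_count:]
    total = sum(utility(ind) for ind in rest)
    n_select = int(len(rest) * selection_prob)       # higher probability -> slower extinction
    for _ in range(n_select):
        if total <= 0:
            break
        r = random.uniform(0, total)
        acc = 0.0
        for ind in rest:
            acc += utility(ind)
            if acc >= r:
                survivors.append(ind)
                break
    return survivors

# usage: composition plans scored by a toy utility
plans = [{'id': i, 'u': random.random()} for i in range(10)]
next_gen = select_next_generation(plans, lambda p: p['u'])
print(len(next_gen), 'individuals selected')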
In the tests above, we adopt the Roulette Wheel selection policy with elites.
However, this selection policy results in early extinction of the population. To
produce more generations, we simply increase the selection probability with a
In this paper, a QoS aggregation schema and a utility function are proposed to calculate QoS vectors for event services (and their compositions) and rank them based on
user-defined constraints and preferences. Then, a genetic algorithm is developed
and evaluated to efficiently create optimal event service compositions. The ex-
perimental results show that the genetic algorithm is scalable, and by leveraging
the trade-off between convergence time and degree of optimization, the algo-
rithm gives 79% to 97% optimized results. As future work, we plan to validate
our approach based on real-world datasets. We also plan to enable adaptive event
compositions based on the GA developed in this paper.
References
1. Canfora, G., Di Penta, M., Esposito, R., Villani, M.L.: A lightweight approach for
qos-aware service composition. In: Proceedings of 2nd International Conference on
Service Oriented Computing, ICSOC 2004 (2004)
2. Gao, F., Curry, E., Bhiri, S.: Complex Event Service Provision and Composition
based on Event Pattern Matchmaking. In: Proceedings of the 8th ACM International
Conference on Distributed Event-Based Systems. ACM, Mumbai (2014)
3. Hasan, S., Curry, E.: Approximate Semantic Matching of Events for The Internet
of Things. ACM Transactions on Internet Technology, TOIT (2014)
4. Hinze, A., Sachs, K., Buchmann, A.: Event-based applications and enabling tech-
nologies. In: Proceedings of the Third ACM International Conference on Distributed
Event-Based Systems, DEBS 2009, pp. 1:1–1:15. ACM, New York (2009)
5. Jaeger, M., Rojec-Goldmann, G., Muhl, G.: Qos aggregation for web service com-
position using workflow patterns. In: Proceedings of the Eighth IEEE International
Enterprise Distributed Object Computing Conference, EDOC 2004, pp. 149–159
(2004)
6. Karatas, F., Kesdogan, D.: An approach for compliance-aware service selection with
genetic algorithms. In: Basu, S., Pautasso, C., Zhang, L., Fu, X. (eds.) ICSOC 2013.
LNCS, vol. 8274, pp. 465–473. Springer, Heidelberg (2013)
7. Wu, Q., Zhu, Q., Jian, X.: Qos-aware multi-granularity service composition based
on generalized component services. In: Basu, S., Pautasso, C., Zhang, L., Fu, X.
(eds.) ICSOC 2013. LNCS, vol. 8274, pp. 446–455. Springer, Heidelberg (2013)
8. Zhang, L.J., Li, B.: Requirements driven dynamic services composition for web
services and grid solutions. Journal of Grid Computing 2(2), 121–140 (2004)
Towards QoS Prediction Based on Composition Structure Analysis and Probabilistic Models∗
1 Introduction
Analyzing and predicting QoS of service compositions during the design phase makes
it possible to explore design decisions under different environment conditions and can
greatly reduce the amount and cost of maintenance, help the adaptation of software
architectures, and increase overall software quality.
The QoS of a service composition depends both on the QoS of the individual compo-
nents and on the structure of the composition. The effects of the execution environment
also impact the observed QoS, which exhibits a stochastic variability due (among oth-
ers) to changes in network traffic, machine load, cache behavior, database accesses at
a given moment, etc. QoS prediction is notoriously challenging when, as in the case of
service-oriented systems, boundaries and behavior are not fully specified.
Fig. 1 shows a fragment of a service composition:

    Input: transport
    if transport == "train"
        call SearchTrain
    else
        call SearchFlight
    end

Fig. 1. Simple orchestration

∗ The research leading to these results has received funding from the EU FP7 2007-2013 program under agreement 610686 POLCA, from the Madrid Regional Government under CM project S2013/ICE-2731 (N-Greens), and from the Spanish Ministry of Economy and Competitiveness under projects TIN-2008-05624 DOVES and TIN2011-39391-C04-03 StrongSoft.

Let us assume that we are interested in execution time and that we know (e.g., from observations) the probability distribution functions for the response times of the two services invoked in it (Fig. 2 (a) and (b)), whose averages are 5 and 3 seconds, respectively. The average response time for Fig. 1 may actually be seldom observed, as executions cluster around 3 and 5 seconds. Moreover, this average is not useful to answer questions such as what is the
probability that the answer time is less than 4 seconds, which is interesting to, for ex-
ample, negotiate penalties to compensate for SLA deviations.
If we know the probability that train / plane trips are requested (e.g., 0.3 and 0.7, re-
spectively), we can construct the probability distribution of the QoS of the composition
(Fig. 2 (c)). This result gives much more information and insight on the expected QoS
of the composition, and makes it possible to answer the question presented above.
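A small Monte Carlo sketch reproduces this construction; the gamma-shaped service-time distributions below are assumptions, since the text only fixes their averages of 5 and 3 seconds and the branch probabilities 0.3/0.7.

import random

# Hedged sketch: response-time distribution of the orchestration of Fig. 1 as a
# mixture of the two branch distributions, weighted by the branch probabilities
# (0.3 train / 0.7 flight). The gamma shapes and scales are assumptions.

def search_train_time():
    return random.gammavariate(5.0, 1.0)   # mean 5 s (assumed shape/scale)

def search_flight_time():
    return random.gammavariate(3.0, 1.0)   # mean 3 s (assumed shape/scale)

def composition_time():
    return search_train_time() if random.random() < 0.3 else search_flight_time()

samples = [composition_time() for _ in range(100_000)]
mean = sum(samples) / len(samples)
p_under_4 = sum(1 for t in samples if t < 4) / len(samples)
print(f"mean response time ~ {mean:.2f} s")
print(f"P[response time < 4 s] ~ {p_under_4:.2f}")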
2 Related Work
The basis for the classical approach to the analysis of QoS for service compositions
[2,1,4] is QoS aggregation. Most approaches focus on the control structure without
considering data operations, and return expected values as a result. This falls short of describing the composition behavior. More recent approaches infer upper and lower bounds on QoS based on input data and environmental factors [5]. However, such bounds often have to be very wide, since services often exhibit a "long-tail" behavior, and bounds do not capture the shape of the distribution of values, which limits the usefulness of the description.
Recent proposals [7] work directly with statistical distributions of QoS, but use a very
abstract composition model that is far from existing implementation languages and does
not take into account internal compositions of data and operations, which is bound to
give less accurate results. The work in [6] uses probability distributions to drive SLA
negotiation and optimization. While its focus is complementary to ours, it does not
take into account the relationships between the internal state and data operations of the
composition (including the initial inputs) and its QoS.
interpretation of every construct starts with a distribution ρ before the construct is executed and produces a ρ′ after it is executed. ρ′ describes all possible executions (and only those) that are consistent with the distribution ρ before the execution.
Let us assume variables x, y ∈ {1, 2} and their probability distributions in Fig. 4 and the code in Fig. 5. At the beginning, all the combinations of these two values are equally probable. The question is what the probabilities of the possible values of x and y are at the end of the if-then-else; let us call these values x′ and y′. Since whether x or y is updated depends on their concrete values, not all combinations of x + 10 and y + 10 are possible. If x′ = 1, it must be that y′ = 11; y′ = 12 is not possible, because y cannot be incremented if x = 1, y = 2. The only valid combinations are (x′, y′) ∈ {(1, 11), (11, 2), (2, 11), (2, 12)}, with probability 0.25 each. Then, the values of x and y have become entangled and need to be described as a joint probability distribution.
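The entanglement can be checked by enumerating the equally probable initial combinations. Since Fig. 5 is not reproduced in this extract, the guarded update below (increment x if x < y, otherwise increment y) is an assumption chosen to be consistent with the valid combinations listed above.

from itertools import product
from collections import Counter

# Hedged sketch: enumerate the joint distribution of (x', y') after a guarded
# update. The guard "if x < y then x := x + 10 else y := y + 10" is an assumption
# consistent with the valid combinations listed in the text.

def step(x, y):
    if x < y:
        return x + 10, y
    return x, y + 10

joint = Counter()
for x, y in product([1, 2], repeat=2):     # all initial combinations equally probable
    joint[step(x, y)] += 0.25

for (xp, yp), p in sorted(joint.items()):
    print(f"P[x'={xp}, y'={yp}] = {p}")
# x' and y' are entangled: e.g. P[x'=1, y'=12] = 0, so the joint distribution
# is not the product of the marginal distributions.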
Other QoS attributes will need specific aggregation operators. For example, availability will need to be aggregated with × instead of +. If the invoked service gives a value to some variable X, its result value will have to be replaced in the resulting distribution ρ′X.
We assume that the QoS and results of si do not depend on its input data. In our framework, taking this into account would require including the probability of outputs given inputs. Although doable, in practice this is a challenge for which we do not yet have a satisfactory solution other than assuming a uniform distribution.
3.6 Conditionals
3.7 Loops
We restrict ourselves to terminating loops. Loop constructs (Eq. (6)) are unfolded into a conditional, treated according to Section 3.6, and a loop (Eq. (7)):

while B do C1    (6)

if B then begin C1; while B do C1 end else skip    (7)
Termination ensures that the unfolding is finitary. Existing techniques [3] can decide
termination for many cases.
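A minimal sketch of this bounded unfolding on a toy abstract syntax follows; the tuple-based representation is our assumption, not the paper's formalism.

# Hedged sketch: unfold "while B do C" into "if B then (C; while B do C) else skip",
# bounding the number of unfoldings so the result stays finite for terminating loops.

def unfold(stmt, depth):
    if stmt[0] == 'while':                       # ('while', B, C)
        _, cond, body = stmt
        if depth == 0:
            return ('skip',)
        rest = unfold(stmt, depth - 1)
        return ('if', cond, ('seq', body, rest), ('skip',))
    if stmt[0] == 'seq':                         # ('seq', C1, C2)
        return ('seq', unfold(stmt[1], depth), unfold(stmt[2], depth))
    return stmt                                  # skip, call, assignment, ...

loop = ('while', 'transport_pending', ('call', 'SearchTrain'))
print(unfold(loop, 2))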
4 Experimental Validation
The experimental validation focused on execution time and was conducted on fully-
deployed services. We compared actual execution times, obtained from a large number
of repeated executions, with the distribution predicted by a tool.
The distribution ρP of the predicted execution time is produced from a single run of the interpreter. In order to find out the network impact on our results, we measured time both on the client and on the service to derive:
The smaller the MSE, the more accurate the prediction. However, the MSE is just a number whose magnitude we need to put in context to decide how good the fit we obtain is, for example by comparing this number with the fit obtained with other prediction techniques. This is not easy due to the difficulty of installing and running tools implementing existing proposals.
Therefore, we repeated the predictions using as input probability distributions to characterize external services either a single point (for the average approach) or a uniform distribution ranging from the observed lower to upper bound (for the bounds approach). We termed these scenarios Constant Probability and Uniform Probability, respectively. Table 1 shows the evaluation results. From them it is clear that using the observed probability distribution produces much more accurate predictions (by orders of magnitude). Most of the prediction errors come from the network characteristics, which are difficult to control. If the network issues are excluded (Pr[TPe < t]), the predictions show very promising results with a very small MSE.
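The scenario comparison can be reproduced along the following lines; computing the MSE between the empirical cumulative distributions of predicted and observed times is an assumption on our part, since the extract does not fix how the MSE is taken.

# Hedged sketch: compare a predicted execution-time distribution with observed
# times via the MSE between their empirical CDFs on a common grid.

def empirical_cdf(samples, grid):
    n = len(samples)
    s = sorted(samples)
    cdf, i = [], 0
    for t in grid:
        while i < n and s[i] <= t:
            i += 1
        cdf.append(i / n)
    return cdf

def mse(pred_samples, obs_samples, points=200):
    lo = min(min(pred_samples), min(obs_samples))
    hi = max(max(pred_samples), max(obs_samples))
    grid = [lo + (hi - lo) * k / (points - 1) for k in range(points)]
    p = empirical_cdf(pred_samples, grid)
    o = empirical_cdf(obs_samples, grid)
    return sum((a - b) ** 2 for a, b in zip(p, o)) / points

# usage with toy data: constant-average vs. full-distribution predictions
observed  = [3.1, 3.4, 2.9, 5.2, 4.8, 3.0, 5.1, 3.3]
constant  = [3.7] * len(observed)            # "Constant Probability" scenario
full_dist = [3.0, 3.2, 3.1, 5.0, 5.1, 2.9, 5.2, 3.4]
print("constant  :", round(mse(constant, observed), 4))
print("full dist.:", round(mse(full_dist, observed), 4))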
References
1. Cardoso, J.: Complexity analysis of BPEL web processes. Software Process: Improvement
and Practice 12(1), 35–49 (2007)
2. Cardoso, J., Sheth, A., Miller, J., Arnold, J., Kochut, K.: Quality of service for workflows
and web service processes. Web Semantics: Science, Services and Agents on the World Wide
Web 1(3), 281–308 (2004),
http://www.sciencedirect.com/science/article/pii/S157082680400006X
3. Cook, B., Podelski, A., Rybalchenko, A.: Proving program termination. Commun.
ACM 54(5), 88–98 (2011)
4. Dumas, M., García-Bañuelos, L., Polyvyanyy, A., Yang, Y., Zhang, L.: Aggregate quality of
service computation for composite services. In: Maglio, P.P., Weske, M., Yang, J., Fantinato,
M. (eds.) ICSOC 2010. LNCS, vol. 6470, pp. 213–227. Springer, Heidelberg (2010)
5. Ivanović, D., Carro, M., Hermenegildo, M.: Towards Data-Aware QoS-Driven Adaptation for
Service Orchestrations. In: Proceedings of the 2010 IEEE International Conference on Web
Services (ICWS 2010), Miami, FL, USA, July 5-10, pp. 107–114. IEEE (2010)
6. Kattepur, A., Benveniste, A., Jard, C.: Negotiation strategies for probabilistic contracts in web
services orchestrations. In: ICWS, pp. 106–113 (2012)
7. Zheng, H., Yang, J., Zhao, W., Bouguettaya, A.: QoS Analysis for Web Service Composi-
tions Based on Probabilistic QoS. In: Kappel, G., Maamar, Z., Motahari-Nezhad, H.R. (eds.)
Service Oriented Computing. LNCS, vol. 7084, pp. 47–61. Springer, Heidelberg (2011)
Orchestrating SOA Using Requirement
Specifications and Domain Ontologies
1 Introduction
Services Computing is an interdisciplinary field that covers the science and technology of using computing and information technology (IT) to model, create, operate, and manage services that bridge the gap between business and IT [1].
The increase in the creation and consumption of web services has made the analysis and generation of composition plans challenging [2]. Approaches that tackle
the issue of service composition require users to capture the service composition
requirements in the form of service templates, service query profiles, or partial
process models [8–10]. The requirements include: list of sub-services, inputs, out-
puts, preconditions and effects (IOPEs) of the sub-services, and the execution
order of these sub-services. Henceforth, we refer to the templates that capture
these requirements as partial process models. The existing approaches assume
that the partial process models are readily available to initiate the service com-
position engine. However, this assumption does not always hold in practice [3].
In this paper, we address the issue of automatically deriving the partial pro-
cess model for service composition. The goal is to reduce the burden of process
designers to a great extent, especially for non-domain experts. Our experiment
discussed in Section 3 indicates that the main challenges for process designers in-
clude: understanding the composition requirements of complex services, correctly
correlating the inputs and outputs of sub-services, and designing the business
logic of the process model. To address these issues, we propose to use domain
ontologies, user stories in the requirement specification documents (RSDs), and
user queries to recommend a set of services along with their inputs, outputs, and
execution order to describe the partial process model.
Ontologies are extensively used in different phases of software engineering [4].
In recent years, organizations have been putting in extra effort to manually create domain ontologies due to their significant advantages as a means of knowledge
sharing and reuse. There also exist automatic and semi-automatic approaches
for ontology creation. For instance, LexOnto [5] uses the web services in the
Programmable Web directory [6] to create classification ontologies. As ontologies
are generally available or they can be generated using existing tools, we consider
using ontologies to facilitate the automatic generation of partial process models.
On the other hand, the popularity of agile methodologies has made the use
of user stories to capture requirements a common practice. The user stories are
expressed in a standard format such as “As a role, I want goal so that benefit.” To
simplify the algorithm, we focus only on the user stories. However, our approach
can be generalized to the individual statements in the RSDs.
Since a service composition may involve services from different domains, one
of the main challenges of our approach is to link domain ontologies to handle
requirements from multiple domains. To address this challenge, we extended the
existing approach by Ajmeri et al. [7] that helps requirement analysts visualize
how requirements span across multiple domains. The ontologies are linked using
the semantic similarity of concepts. The linking of ontologies is used to derive a
conceptual model of the requirements to help requirement analysts improve the
completeness of requirements. Furthermore, we use natural language processing
(NLP) to link the concepts in the ontologies with the terms in the user stories
and to classify the concepts as either services or input-output (i/o) of services.
Once the atomic services and their inputs and outputs are identified for a queried
service, the data dependency constraints determine the execution order.
The main contributions of this paper are two-fold. First we propose an ap-
proach to automatically generate the partial process model for service compo-
sition. Our approach complements the ideas behind the existing ontology-based
and NLP-based service composition approaches. Second we realize our approach
as a recommender system that can integrate ontologies and user stories to sub-
stantially reduce the time and effort involved in service composition.
2 Related Work
The degree of match between the user requests and the services is computed based on the syntactic, semantic, and operational details. Furthermore, Grigori et al. [10] propose a
method to transform the behavior matching of services and user requests to a
graph matching problem. Capturing the requirements in a partial process model
is a non-trivial task. We address this issue in our approach by automatically
deriving the partial process model from the RSDs and the domain ontologies.
Tag-Based Service Composition: The tag-based approaches for service
composition are becoming popular [11]. They are easy to implement and to use.
However, they have their own shortcomings: for instance, tags are usually too short to carry enough semantic information, and most services have only a few tags. These tags can belong to different types, such as content-based, context-based, attribute, and subjective. This results in a large tag space and low efficiency and effectiveness in semantic matching. Therefore, these approaches are oriented towards mashups. They do not address how to generate traditional workflows that involve sequentially and parallel executing tasks.
Publish/Subscribe-Based Distributed Algorithm: Hu et al. [12] pro-
pose a distributed algorithm to enable service composition via a content-based
pub/sub infrastructure. Even though the distributed approach seems promis-
ing, it considers matching of services at a syntactic level, whereas our solution
concerns both syntactic and semantic levels.
Service Composition Using NLP: The service composition system pro-
posed in [13] addresses the shortcomings of the existing NLP-based solutions [14].
The solution proposed in [13] comprises an integrated natural language parser, a semantic matcher, and an AI planner. The parser extracts the grammatical relations from the requirements and generates the service prototypes comprising process names and input values. The service prototypes are further used by
the semantic matcher to identify services from the repository and the AI plan-
ner generates a composed service. In this approach, a clear set of patterns used
to identify the process names and their input values from the requirements is
not captured. Furthermore, a detailed evaluation of the system with respect to
correctness and completeness of the generated composed services is missing.
Although there exists a large body of knowledge addressing service discovery and composition, understanding the technical and domain-specific requirements for process designers to use these approaches still remains a challenge.
4 Approach
The challenges that we address in our approach include: mapping of domain
ontologies using semantic similarity mapping approach, identifying if a concept
is a service or an i/o of a service, capturing the data dependency of services, and
integrating the recommendation with user preferences.
Overview: In this section, we briefly introduce how the partial process model
is generated. As shown in Figure 2, the process designer inputs the keywords cor-
responding to the service he plans to develop. He also selects the relevant domain
ontologies and provides the list of user stories to the system. The system maps
the concepts in the domain ontologies to the terms in the user stories and identi-
fies the candidate sub-services. The concepts in the domain ontologies associated
with the candidate services are identified as i/o concepts of sub-services. Users’
past preferences filter the candidate services and tag the i/o concept as either
input to a service or output of a service. Tagging of concepts as inputs and out-
puts of sub-services establishes the data-dependency constraints to determine the
execution order of the sub-services. The process designers’ selection iteratively
updates the user preference repository to improve the recommendations. Process
designers can also suggest the missing concepts to evolve the ontologies, which
are validated by a domain expert before updating the ontologies. The partial
process model created based on the recommendations is further given as input
to the composition engine that retrieves services from the service repositories.
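As mentioned above, tagging concepts as inputs and outputs of sub-services establishes data-dependency constraints that determine the execution order. A hedged sketch of this last step, assuming that a service must run after every service producing one of its inputs, is the following (service names and i/o concepts are illustrative):

from collections import defaultdict, deque

# Hedged sketch: derive an execution order for sub-services from i/o tagging.
# A service depends on every service whose output it consumes as an input.

def execution_order(services):
    """services: dict name -> {'in': set of concepts, 'out': set of concepts}."""
    producers = defaultdict(set)
    for name, io in services.items():
        for concept in io['out']:
            producers[concept].add(name)
    indeg = {name: 0 for name in services}
    succ = defaultdict(set)
    for name, io in services.items():
        for concept in io['in']:
            for p in producers[concept]:
                if p != name and name not in succ[p]:
                    succ[p].add(name)
                    indeg[name] += 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return order     # services caught in a dependency cycle would be missing here

travel = {
    'flight_booking': {'in': {'source', 'destination'}, 'out': {'flight_ticket'}},
    'hotel_booking':  {'in': {'destination'},           'out': {'hotel_reservation'}},
    'payment':        {'in': {'flight_ticket', 'hotel_reservation'}, 'out': {'receipt'}},
}
print(execution_order(travel))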
Cross-Domain Ontology Mapping: Ajmeri et al. [7] propose an approach
to identify the concepts that link different ontologies using semantic similar-
ity mapping. The semantic similarity between concepts is calculated based on
syntax, sense, and context similarity. The syntactic similarity matches concepts
based on the string equivalence. The sense similarity matches concepts based
on the similar usage sense determined using a set of synonyms called synset [16].
The context similarity matches concepts based on the similarity of their neigh-
borhood. The neighborhood of a concept is determined based on the inheritance
relationships from the parent concept to the child concept. The semantic similar-
ity between the concepts is computed as a weighted mean of syntactic, sense, and
context similarity. The semantically similar concepts identified using the above
approach are the concepts which map two different domain ontologies (cf. [7] for
a detailed description of the cross-domain ontology mapping algorithm).
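A hedged sketch of this weighted-mean combination follows; the three component similarity functions are crude stand-ins for the actual syntactic, sense, and context similarities of [7], and the weights are assumptions.

# Hedged sketch of the weighted-mean combination of syntactic, sense, and
# context similarity. The component functions are simplified stand-ins.

def syntactic_sim(c1, c2):
    return 1.0 if c1.lower() == c2.lower() else 0.0

def sense_sim(c1, c2, synsets):
    s1, s2 = synsets.get(c1, {c1}), synsets.get(c2, {c2})
    inter, union = len(s1 & s2), len(s1 | s2)
    return inter / union if union else 0.0

def context_sim(c1, c2, neighbours):
    n1, n2 = neighbours.get(c1, set()), neighbours.get(c2, set())
    inter, union = len(n1 & n2), len(n1 | n2)
    return inter / union if union else 0.0

def semantic_similarity(c1, c2, synsets, neighbours, w=(0.3, 0.4, 0.3)):
    """Weighted mean of the three component similarities; the weights are assumptions."""
    sims = (syntactic_sim(c1, c2),
            sense_sim(c1, c2, synsets),
            context_sim(c1, c2, neighbours))
    return sum(wi * si for wi, si in zip(w, sims)) / sum(w)

# usage on two illustrative concepts
synsets = {'flight booking': {'flight booking', 'flight reservation'},
           'flight reservation': {'flight booking', 'flight reservation'}}
neighbours = {'flight booking': {'flight', 'passenger'},
              'flight reservation': {'flight', 'traveller'}}
print(semantic_similarity('flight booking', 'flight reservation', synsets, neighbours))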
Identification of Candidate Services: Once the domain ontologies are linked using the semantic similarity mapping approach, we identify whether the concepts in the ontology that are associated with the user query are services. We use the following criteria from [7] to identify whether a concept represents a service: the concept or its equivalent concept in the ontologies must be present in the user stories, and the concept or its substring should be part of a verb phrase (VP). However, if it is part of a noun phrase (NP), it should be prefixed by a VP.
For example, in the user story “As a customer, I want flight booking function-
ality, to book flight from source to destination city” the substring booking of the
concept flight booking is part of a VP and hence the concept flight booking is
suggested as a candidate service. To parse a sentence and to create a constituent
tree of objects, we use the open-source Link Grammar library [17].
service(c1) = USi.contains(c1) and (c1.parseType = "VP" or (c1.parseType = "NP" and c1.parent.parseType = "VP")) ? true : false;

where USi is the i-th user story, c1.parseType is the tag associated with the concept c1 w.r.t. the parse tree of USi, and c1.parent is the parent token in the parse tree.
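Expressed in code, the criterion reads roughly as follows; the concept representation and the containment test are simplified stand-ins for the parse information obtained with the Link Grammar library [17].

# Hedged sketch of the candidate-service criterion above. Concept objects and
# the user-story containment check are simplified placeholders.

class Concept:
    def __init__(self, text, parse_type, parent_parse_type=None):
        self.text = text
        self.parse_type = parse_type                 # 'VP' or 'NP'
        self.parent_parse_type = parent_parse_type   # parse type of the parent token

def is_candidate_service(concept, user_story):
    if concept.text.lower() not in user_story.lower():
        return False
    if concept.parse_type == 'VP':
        return True
    return concept.parse_type == 'NP' and concept.parent_parse_type == 'VP'

story = ("As a customer, I want flight booking functionality, "
         "to book flight from source to destination city")
flight_booking = Concept('flight booking', parse_type='NP', parent_parse_type='VP')
print(is_candidate_service(flight_booking, story))   # True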
5 Evaluation
To evaluate the system developed based on the approach discussed in Section 4,
we have considered two practical use-case scenarios which are commonly used
as the benchmark examples for service composition1 . We also conducted an em-
pirical study to evaluate the quality of the recommended partial process model.
Empirical Study: We introduced our system in the course assignment, which had the same requirements as the assignment in Section 3. The participants were assigned to groups, and each group was required to submit three milestones. M3 involved developing the TA service by composing other services. The participants were given a brief introduction on how to use our recommender system.
27 groups deployed their services for M3, and 33% of the groups correctly implemented the service. Services implemented by 44% of the groups failed to pass the test cases due to syntactical errors such as incorrect variable initialization and incorrect namespaces. However, these services were complete with respect to the requirements, in the sense that the TA services included invocation of all the
1 Due to space limitations, the case studies are available online via https://sites.google.com/site/wsccs2013
6 Conclusions
Our approach, realized as a recommender system, derives the partial process model using domain ontologies and user stories. The observations based on the evaluation indicate that our approach not only helps in understanding the requirements of service composition but also reduces the time and effort involved in the development of service compositions. Our evaluation is based only on the case study and the empirical study. In our future work, we plan to extend the recommender system by integrating a composition engine so as to retrieve desired service compositions in a specific service specification language. Also, in
References
1. Yan, Y., Bode, J., McIver, W.: Between service science and service-oriented soft-
ware systems. In: Congress on Services Part II. SERVICES-2 (2008)
2. Xiao, H., Zou, Y., Ng, J., Nigul, L.: An approach for context-aware service discovery
and recommendation. In: ICWS (2010)
3. Srivastava, B., Koehler, J.: Web service composition - current solutions and open
problems. In: ICAPS Workshop on Planning for Web Services (2003)
4. Happel, H.J., Seedorf, S.: Applications of ontologies in software engineering. In:
Proc. of Workshop on SWESE on the ISWC (2006)
5. Arabshian, K., Danielsen, P., Afroz, S.: Lexont: A semi-automatic ontology creation
tool for programmable web. In: AAAI Spring Symposium Series (2012)
6. The Programmable Web, http://www.programmableweb.com
7. Ajmeri, N., Vidhani, K., Bhat, M., Ghaisas, S.: An ontology-based method and tool
for cross-domain requirements visualization. In: Fourth Intl. Workshop on MARK
(2011)
8. Traverso, P., Pistore, M.: Automated composition of semantic web services into
executable processes. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.)
ISWC 2004. LNCS, vol. 3298, pp. 380–394. Springer, Heidelberg (2004)
9. Cardoso, J., Sheth, A.: Semantic e-workflow composition. Intell. Inf. Syst. (2003)
10. Grigori, D., Corrales, J.C., Bouzeghoub, M., Gater, A.: Ranking BPEL processes
for service discovery. IEEE Trans. on Services Comput. (2010)
11. Liu, X., Zhao, Q., Huang, G., Mei, H., Teng, T.: Composing data-driven service
mashups with tag-based semantic annotations. In: ICWS (2011)
12. Hu, S., Muthusamy, V., Li, G., Jacobsen, H.-A.: Distributed automatic service
composition in large-scale systems. In: Second Intl. Conf. on Distributed Event-
Based Syst. (2008)
13. Pop, F.C., Cremene, M., Vaida, M., Riveill, M.: Natural language service compo-
sition with request disambiguation. In: Service-Oriented Comput. (2010)
14. Lim, J., Lee, K.H.: Constructing composite web services from natural language
requests. In: Web Semantics: Science, Services and Agents on the WWW (2010)
15. Introduction to Service Comput., https://sites.google.com/site/sc2012winter
16. Miller, G.A.: WordNet: A lexical database for English. Commun. of the ACM
(1995)
17. Sleator, D.D., Temperley, D.: Parsing English with a link grammar. arXiv preprint
cmp-lg/9508004 (1995)
18. Horridge, M., Bechhofer, S.: The OWL API: A Java API for OWL ontologies.
Semantic Web (2011)
19. Allen, I.E., Seaman, C.A.: Likert scales and data analyses. Quality Progress (2007)
Estimating Functional Reusability of Services
Felix Mohr
1 Introduction
During the last decade, the focus in software development has moved towards
the service paradigm. Services are self-contained software components that can be used in a platform-independent manner and that aim at maximizing software reuse.
A basic concern in service oriented architectures is to measure the functional
reusability of the services in general or for specific tasks. Such a metric would
support the analysis of relations between services, allow estimating the potential impact of new services, and indicate the suitability of automation techniques like composition; Fig. 1 shows this information gain. Usually, we have no knowledge about how services in a network are related; they are merely members of
a homogeneous set (Fig. 1a). Analyzing their specifications helps us recognize
relations between them and identify reuse potential (Fig. 1b).
Surprisingly, there is no metric of which we could say that it is even close to being satisfactory with regard to measuring reusability. The main problem is that most reusability metrics are based on code analysis [7], e.g. the Halstead metric and others. However, the idea of services is precisely that the implementation need not be available. Reusability metrics for black-box components exist [4, 9, 10] but are notoriously inexpressive; that is, they effectively say nothing about functional reusability even though that was the design goal.
This paper gives a vision statement for the service reusability question and
hints to possible solutions for the problem. Intuitively, the reusability of a service
s is based on the number of problems for which there is a solution that contains
s. Instead of simply using this number directly as a metric for reuse, it should be
somehow weighted, since the complexity and likelihood of occurrence of prob-
lems strongly varies. Unfortunately, it seems to be very hard, or even impossible,
(a) No knowledge about service relations (b) Insights into how services are related
Fig. 1. A metric for reusability helps us learn more about how services are related
3 Problem Description
This section describes the formal frame of this paper. First, I describe my idea of
an ideal reusability metric in terms of composition problems to whose solution
a service may contribute. Second, I explain the obstacles that occur with that
metric and the necessity for alternatives. Third, I introduce the formal service
model that underlies the rest of the paper.
This metric only captures the functional reusability and ignores other factors
that may affect the practical suitability of solutions. In fact, non-functional aspects may cause a service s to never be part of a chosen solution for problem p. However, we shall focus on the purely functional aspect here.
The problem set P is usually infinite, but we may expect the metric to converge towards a finite number for every service s. Of course, divergence is an issue, because there will probably be infinitely many "potential" problems where a service s could be part of a solution. However, I think that the number of actually occurring problems to whose solution a service s may contribute is quite limited. The probability of occurrence of other problems to whose solution s may contribute is, hence, either zero or converges to zero very fast. Consequently, we may assume that r∗(s) takes a fixed value in R+ for every service s.
The reusability of a reference service may be used to normalize the metric. Using service s∗ as the reference service, the normalized reusability of a service s is:

||r∗||(s) = r∗(s) / r∗(s∗)
This allows us to say that a service is, for example, twice as reusable as the
reference service; this makes the metric more intuitive.
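As a purely illustrative numeric example: if r∗(s) = 6 and the reference service has r∗(s∗) = 3, then ||r∗||(s) = 6 / 3 = 2, i.e., s is estimated to be twice as reusable as the reference service.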
kind of assertions about ||r∗|| can be achieved, so, strictly speaking, we cannot claim to approximate reusability. However, it is absolutely possible to define other metrics that, due to some semantic link to reusability, are indicators of reusability. As a consequence, approaches for estimating reusability must qualitatively explain why it is reasonable to believe that they are a reliable indicator of reusability.
goal state that must be reached from the initial state through the application of
services.
Definition 1. A service description is a tuple (I , O , P , E ). I and O are dis-
joint sets of input and output variables. P and E describe the precondition and
effect of the service in first-order logic formulas without quantifiers or functions.
Variables in P must be in I; variables in E must be in I ∪ O.
As an example, consider a service getAvailability that determines the avail-
ability of a book. The service has one input b for the ISBN of a book and one
output a for the availability info; we have I={b} and O={a}. The precondition
is P = Book (b) and requires that the object passed to the input b is known
to have the type Book. The effect is E = HasAvInfo(b, a) and assures that the
object where the output a is stored contains the info whether b is available.
I acknowledge that this formalization may not always be adequate, but it is by far the most established service description paradigm besides finite state machines (FSM), and even FSM service representations can often be efficiently transformed into an IOPE representation. The introduction of a(n even) more sophisticated formalism is beyond the scope of this paper.
For simplicity, I leave a knowledge base out of the model. A knowledge base
is usually used to express ontological information and logical implications in the
model but is not needed to explain the idea of this paper.
This section gives a brief sketch about one possibility to use semantic service
descriptions to estimate their reusability. A service s is most likely to be reused
if there are many other services that can do something with the effect of s; then
s contributes to those services. This section defines the relevance of a service
based on its contribution to other services and the relevance of those services in
turn. The higher the relevance of a service, the higher the number of problems
that can be solved with it; this relevance is a good estimation for reusability.
We can capture the direct relation of two services in a service contribution graph. Given a set of services S with descriptions as in Def. 1, a service contribution graph is a graph (V, E) with exactly one node in V for every service in S and with an edge (si, sj) ∈ E if and only if at least one literal in the effect of si and the precondition of sj can be unified. Intuitively, there is a link from
where c(s) are the child nodes of service s in the composition tree1 and w(s, s′) is the weight of the edge in the contribution graph. Since the maximal recursion depth k will usually be a parameter that is chosen once and then remains unchanged, let the relevance value of a service be denoted as rk(s) := r(s, k).
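A hedged sketch of the contribution graph and of a truncated recursive relevance computation follows; the edge-weight function, the base value, and the recursion shape are assumptions that fill in for the parts of the definition not reproduced in this extract.

# Hedged sketch: service contribution graph from IOPE descriptions and a
# truncated recursive relevance score. Edge weights and the exact form of
# r(s, k) are assumptions.

def contribution_edges(services):
    """services: dict name -> {'effect': set of literals, 'pre': set of literals}.
    Edge (si, sj) iff some effect literal of si also appears in the precondition
    of sj (predicate-name matching stands in for unification here)."""
    edges = {}
    for si, di in services.items():
        for sj, dj in services.items():
            if si == sj:
                continue
            shared = di['effect'] & dj['pre']
            if shared:
                edges[(si, sj)] = len(shared) / len(dj['pre'])   # assumed weight
    return edges

def relevance(s, k, edges):
    """r(s, k): a constant base value plus the weighted relevance of contributed-to services."""
    base = 1.0
    if k == 0:
        return base
    succ = [(t, w) for (a, t), w in edges.items() if a == s]
    return base + sum(w * relevance(t, k - 1, edges) for t, w in succ)

# usage on a tiny IOPE catalogue
catalogue = {
    'getAvailability': {'pre': {'Book'}, 'effect': {'HasAvInfo'}},
    'orderBook':       {'pre': {'Book', 'HasAvInfo'}, 'effect': {'Ordered'}},
    'shipOrder':       {'pre': {'Ordered'}, 'effect': {'Shipped'}},
}
E = contribution_edges(catalogue)
for name in catalogue:
    print(name, round(relevance(name, 2, E), 3))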
4.3 Discussion
The above metric is obviously very rudimentary, but it gives a clue of what a
reusability estimating metric may look like. For example, the constant factor
of a node could be substituted by an expert’s estimation. Also, it would be
a good idea to not only consider the outgoing edges of a service, but also its
incoming edges in the contribution graph. However, compared to the absence of
information as depicted on the left of Fig. 1, this metric already gives very useful
insights. We can argue even for this simple measure that it is an estimator for
reusability: The more compositions that start with a service are imaginable, the
more problems will exist for which that service may be part of a solution.
Note that the alleged redundancy of execution paths that are merely permutations of each other is intended. For example, a service s0 may contribute to both s1 and s2, while in turn s1 contributes to s2 and vice versa. Thereby, the relevance of s0 is increased twice, once via s1, s2 and once via s2, s1. This may seem unreasonable at first, but it is actually quite what we want. The edge (s1, s2) only has a high weight if s1 contributes to s2, and vice versa. If one of the paths does not make sense, it will have a low weight anyway and will only marginally affect the relevance value of s0.
1 More precisely, the last service in the child, since nodes are service sequences.
5 Conclusion
This paper gives a vision statement for metrics of functional reusability of ser-
vices and sketches service relevance as one possible such metric. It defines an ideal
reusability metric and explains why such a metric is usually not computable. The
sketched metric tackles this problem by estimating service reusability through
service relevance, a recursive metric based on the contribution of services to the
preconditions of other services in the network. Its explanatory power is limited
by the quality of the service descriptions.
This paper is merely a first step in the direction of analyzing functional reusability, so there is great potential for future work; I just mention some options. First,
it would be interesting to estimate the reusability of services in a completely dif-
ferent way; for example, we could use a simplification of the service model that
makes the number of composition problems tractable. Second, the presented
metric only works with forward edges in the contribution graph, yet we could
take into account the provision of required service inputs. Third, the weights in
the composition tree could be discounted depending on the depth in order to
consider possible noise between the model and the real services that may affect
composition. Fourth, the proposed metric could be integrated with a learning
approach that collects information about how services are used together.
References
1. Fazal-e Amin, A., Oxley, A.: A review of software component reusability assessment
approaches. Research Journal of Information Technology 3(1), 1–11 (2011)
2. Caldiera, G., Basili, V.R.: Identifying and qualifying reusable software components.
Computer 24(2), 61–70 (1991)
3. Cheesman, J., Daniels, J.: UML components. Addison-Wesley, Reading (2001)
4. Choi, S.W., Kim, S.D.: A quality model for evaluating reusability of services in
soa. In: Proceedings of the 10th IEEE Conference on E-Commerce Technology, pp.
293–298. IEEE (2008)
5. Frakes, W.: Software reuse research: status and future. IEEE Transactions on Soft-
ware Engineering 31(7), 529–536 (2005)
6. Frakes, W., Terry, C.: Software reuse: metrics and models. ACM Computing Sur-
veys (CSUR) 28(2), 415–435 (1996)
7. Gill, N.S., Grover, P.: Component-based measurement: few useful guidelines. ACM
SIGSOFT Software Engineering Notes 28(6), 4 (2003)
8. Krueger, C.W.: Software reuse. ACM Computing Surveys 24(2), 131–183 (1992)
9. Rotaru, O.P., Dobre, M.: Reusability metrics for software components. In: Pro-
ceedings of the 3rd ACS/IEEE International Conference on Computer Systems
and Applications, p. 24. IEEE (2005)
10. Washizaki, H., Yamamoto, H., Fukazawa, Y.: A metrics suite for measuring
reusability of software components. In: Proceedings of 5th Workshop on Enterprise
Networking and Computing in Healthcare Industry, pp. 211–223. IEEE (2003)
Negative-Connection-Aware Tag-Based Association
Mining and Service Recommendation*
Yayu Ni1, Yushun Fan1,*, Keman Huang2, Jing Bi1, and Wei Tan3
1 Tsinghua National Laboratory for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China
nyy07@mails.tsinghua.edu.cn, {fanyus,bijing}@tsinghua.edu.cn
2 School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
victoryhkm@gmail.com
3 IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
wtan@us.ibm.com
1 Introduction
In the Web 2.0 era, a growing number of interactive web services have been published on the Internet. By combining chosen web services, web developers are now able to create novel mashups (i.e., composite services derived from services) to meet specific functional requirements from web users. This programmable paradigm produces a growing ecosystem of web services and their mashups [1].
The rapid increase of available online services makes manual selection of suitable web services a challenging task. Automatic service integration architectures, like SOA [2], and service recommendation algorithms [3-6] have been proposed to facilitate service selection and integration in mashup completion. Recently, an increasing number of
* Yushun Fan is with Tsinghua National Laboratory for Information Science and Technology, the Department of Automation, Tsinghua University, Beijing, 100084 China (Corresponding author, E-mail: fanyus@tsinghua.edu.cn).
1 http://www.programmableweb.com
Let , ,…, denote all the web services, , ,…, denote all
the mashups, and , the collaboration network between services and mashups. If
a mashup is composed by several services , ,…, , then
, ,…, . Let , ,..., denote all tags that used to annotate ser-
vices and |∀ denote all the subsets of . For an service , its anno-
tated tags are , ,..., .
Due to the sparsity of the collaboration network and the incompleteness of the historical mashup dataset, there is an extreme imbalance between positive and negative samples: the number of never-collaborated connections is much greater than that of ever-collaborated ones. Thus, this section proposes a strategy utilizing a service popularity criterion to collect a balanced set of both positive and negative connections.
For two services and which have been co-used in at least one mashup, a
positive connection between them is generated. Thus, given all the historical mashups, a set of positive service connections can be defined as follows:
, , 1 |∀ , , , , 1 (3)
For two services which have never collaborated with each other in the same mashup, there exists a negative connection between them. Due to the incompleteness of the dataset, a negative connection cannot simply be interpreted as meaning that the services cannot collaborate to compose a mashup. However, if two services are very popular but have never collaborated with each other, it is reasonable to believe that these two services will not be able to construct a mashup in the near future. Hence this paper defines the negative connections between popular services as credible negative connections. Here a popular service is defined as follows:
Given the popularity threshold, if a service's popularity is larger than the threshold, then this service is popular.
Then a set of credible negative service connections can be generated as follows:
∀ , , , ,
, , 1 (4)
, , 1
∀ ∃ ⋃ , (5)
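A hedged sketch of this connection-generation strategy follows; the data layout and the use of the number of mashups a service appears in as its popularity are assumptions consistent with the description above.

from itertools import combinations
from collections import Counter

# Hedged sketch: build positive connections (services co-used in a mashup) and
# credible negative connections (popular services that never co-occur).

def build_connections(mashups, popularity_threshold=10):
    """mashups: list of sets of service names."""
    positives = set()
    popularity = Counter()
    for member_services in mashups:
        popularity.update(member_services)
        for s1, s2 in combinations(sorted(member_services), 2):
            positives.add((s1, s2))
    popular = {s for s, n in popularity.items() if n > popularity_threshold}
    negatives = {(s1, s2)
                 for s1, s2 in combinations(sorted(popular), 2)
                 if (s1, s2) not in positives}
    return positives, negatives

# usage on toy data with a low threshold
mashups = [{'maps', 'twitter'}, {'maps', 'flickr'}, {'maps', 'twitter'}, {'maps', 'flickr'}]
pos, neg = build_connections(mashups, popularity_threshold=1)
print('positive:', sorted(pos))
print('credible negative:', sorted(neg))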
The RuleScore algorithm scores the tag collaboration rules previously found by RuleTree. RuleScore employs the classic AdaBoost algorithm [15] to construct a sequence of shallow rule-based decision trees whose depths are restricted to 1 and assigns each of these trees a coefficient indicating its contribution to composability. At each iteration of the main loop, RuleScore constructs a tree over the current training dataset of service connections with the current weights, and then updates the weights of all training samples according to whether they are correctly classified by the generated rule stumps. The detail of the RuleScore algorithm is given as follows:
Algorithm 2. RuleScore: Score Collaboration Rules
Require: the connection set
Require: candidate rule set
Require: number of maximal iterations
RuleScore( , , ):
1. Initialize every connection of with weight 1⁄| |
2. Initialize every rule of do with 0
3. for 1, … , do
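Since the remainder of the listing is not reproduced in this extract, the sketch below shows how such a loop over rule-based stumps could look with a classic discrete AdaBoost update; the data layout and the stump form are assumptions, not the paper's exact RuleScore procedure.

import math

# Hedged sketch of an AdaBoost-style loop over rule stumps. Each "rule" is a
# predicate over a service connection; a stump classifies a connection as
# composable (+1) iff its rule fires.

def rule_score(connections, labels, rules, iterations=20):
    """connections: list of samples; labels: list of +1/-1; rules: list of predicates.
    Returns one coefficient per rule (0 for rules never selected)."""
    n = len(connections)
    weights = [1.0 / n] * n
    alpha = [0.0] * len(rules)
    for _ in range(iterations):
        # pick the rule stump with the smallest weighted error
        best, best_err = None, float('inf')
        for j, rule in enumerate(rules):
            err = sum(w for w, c, y in zip(weights, connections, labels)
                      if (1 if rule(c) else -1) != y)
            if err < best_err:
                best, best_err = j, err
        best_err = min(max(best_err, 1e-10), 1 - 1e-10)
        a = 0.5 * math.log((1 - best_err) / best_err)
        alpha[best] += a
        # re-weight: misclassified samples gain weight
        rule = rules[best]
        weights = [w * math.exp(-a * y * (1 if rule(c) else -1))
                   for w, c, y in zip(weights, connections, labels)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return alpha

# usage: connections are tag sets; a rule fires if its tag is present
conns  = [{'mapping', 'social'}, {'mapping', 'photo'}, {'social', 'photo'}, {'video'}]
labels = [+1, +1, -1, -1]
rules  = [lambda c: 'mapping' in c, lambda c: 'photo' in c]
print([round(a, 2) for a in rule_score(conns, labels, rules, iterations=5)])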
, ∀ _ | , , , (6)
The more positive rules and the fewer negative rules a connection satisfies, the greater its score is, and the more probably the two services are composable. Therefore, if the estimated score is greater than 0, the two services are regarded as composable, and if it is less than 0, they are considered non-composable in mashups.
4 Experiments
, ∑ , 1 (8)
Table 1 shows the average precision, recall, and F1-Score values of the proposed service recommendation model in a ten-fold cross-validation experiment, compared with the baseline algorithms, for the dataset introduced above. Empirically, the popularity threshold for credible negative connection generation is set to 10 in the proposed model, and 20000 iterations are run before it stops. The weighting coefficient of the Apriori-based approach is set to 0.5, and its threshold to determine composability is set to 0.4.
It can be observed that our proposed model has the best performance in terms of F1-Score, which can be interpreted as a weighted average of precision and recall. The classic Apriori algorithm results in an extremely low prediction precision because of its inability to model negative associations, and it misclassifies a large portion of negative connections as positive ones. Our proposed model outperforms the RuleTree-based approach because of the additional adoption of RuleScore, which enhances the accuracy of RuleTree by utilizing the AdaBoost meta-algorithm. Our model also outperforms the baseline method that utilizes subsampling for negative connection generation, because the popularity-based selection strategy of our model produces a more credible set of negative connections than random sampling.
5 Related Works
6 Conclusion
The paper proposes a tag-based association model for service recommendation, combining positive mashup patterns and negative ones. This combination gives a more comprehensive and meaningful illustration of the current trend in mashup creation. To the best of our knowledge, we are the first to mine negative tag collaboration rules from a service collaboration network, shedding new light on service usage pattern discovery. Our model also produces more accurate predictions than the well-known Apriori-based service recommendation approaches, yielding a substantial accuracy improvement in service collaboration prediction.
In future work, we plan to develop a distributed version of our model to improve the efficiency of rule mining, enabling it to scale to a massive number of services and service tags in big data applications.
References
1. Han, Y., Chen, S., Feng, Z.: Mining Integration Patterns of Programmable Ecosystem with
Social Tags. Journal of Grid Computing, 1–19 (2014)
2. Erl, T.: SOA: principles of service design. Prentice Hall Upper Saddle River (1) (2008)
3. Keman, H., Yushun, F., Wei, T., Xiang, L.: Service Recommendation in an Evolving
Eco-system: A Link Prediction Approach. In: International Conference on Web Services,
pp. 507–514 (2013)
4. Tapia, B., Torres, R., Astudillo, H., Ortega, P.: Recommending APIs for Mashup Comple-
tion Using Association Rules Mined from Real Usage Data. In: International Conference
of the Chilean Computer Science Society, pp. 83–89 (2011)
5. Dou, W., Zhang, X., Chen, J.: KASR: A Keyword-Aware Service Recommendation
Method on MapReduce for Big Data Application. IEEE Transactions on Parallel and
Distributed Systems PP, 1 (2014)
6. Xi, C., Xudong, L., Zicheng, H., Hailong, S.: RegionKNN: A Scalable Hybrid Collabora-
tive Filtering Algorithm for Personalized Web Service Recommendation. In: International
Conference on Web Services, pp. 9–16 (2010)
7. Liang, Q.A., Chung, J.Y., Miller, S., Yang, O.: Service Pattern Discovery of Web Service
Mining in Web Service Registry-Repository. In: IEEE International Conference on
E-Business Engineering, pp. 286–293 (2006)
8. Chien-Hsiang, L., San-Yih, H., I-Ling, Y.: A Service Pattern Model for Flexible Service
Composition. In: International Conference on Web Services, pp. 626–627 (2012)
9. Vollino, B., Becker, K.: Usage Profiles: A Process for Discovering Usage Patterns over
Web Services and its Application to Service Evolution. International Journal of Web
Services Research 10(1), 1–28 (2013)
10. Spagnoletti, P., Bianchini, D., De Antonellis, V., Melchiori, M.: Modeling Collaboration
for Mashup Design. In: Lecture Notes in Information Systems and Organization, pp.
461–469 (2013)
11. Goarany, K., Kulczycki, G., Blake, M.B.: Mining social tags to predict mashup patterns.
In: Proceedings of the 2nd International Workshop on Search and Mining User-Generated
Contents, pp. 71–78 (2010)
12. Keman, H., Yushun, F., Wei, T.: An Empirical Study of Programmable Web: A Network
Analysis on a Service-Mashup System. In: International Conference on Web Services,
pp. 552–559 (2012)
13. Huang, K., Fan, Y., Tan, W.: Recommendation in an Evolving Service Ecosystem
Based on Network Prediction. IEEE Transactions on Automation Science and Engineer-
ing PP, 1–15 (2014)
14. Mitchell, T.: Decision tree learning. Mach. Learn. 414 (1997)
15. Friedman, J., Hastie, T., Tibshirani, R.: Special invited paper. additive logistic regression:
A statistical view of boosting. Annals of Statistics, 337–374 (2000)
16. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of
the 20th International Conference on Very Large Databases, pp. 487–499 (1994)
17. Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F.: A review on ensem-
bles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches.
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Re-
views 42(4), 463–484 (2012)
18. Bayati, S., Nejad, A.F., Kharazmi, S., Bahreininejad, A.: Using association rule mining to
improve semantic web services composition performance. In: International Conference on
Computer, Control and Communication, pp. 1–5 (2009)
19. Wu, X., Zhang, C., Zhang, S.: Efficient mining of both positive and negative association
rules. ACM Transactions on Information System 22(3), 381–405 (2004)
20. Antonie, M.-L., Zaïane, O.R.: Mining positive and negative association rules: An approach
for confined rules. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.)
PKDD 2004. LNCS (LNAI), vol. 3202, pp. 27–38. Springer, Heidelberg (2004)
Choreographing Services over Mobile Devices
1 Introduction
Service oriented architecture owes its popularity to web services and their tem-
poral collaboration, commonly referred to as web service composition. Using web
service composition, an organization can achieve a low operation and mainte-
nance cost. As is evident, the Internet today is constantly evolving towards the
‘Future Internet’ (FI). In the FI, a mobile device is envisioned to become the center of computation in all aspects of daily and professional life, especially for applications related to the Internet of Everything1.
At the moment, service orchestration is a widely accepted standard to accom-
plish service composition. In the context of the Future Internet, especially the IoS and IoT, service orchestration is expected to run into several hurdles. The biggest problem is that, considering the ultra-large scale of the FI, orchestration is not scalable and the coordination between consumers and providers is impossible [4].
In addition, an orchestrator can become a potential communication bottleneck
and a single point of failure [3]. In this context, we believe service choreography is
‘the’ solution. However, even on a wired network, enacting service choreography
successfully for an ultra large scale Future Internet is a challenge. This is further
1 http://www.cisco.com/web/about/ac79/innov/IoE.html
2 Proposed Model
In the real world, whenever a charged particle, e.g. an electron, moves in an electromagnetic field, it experiences an electromagnetic force (EMF). The particle experiences acceleration (purely electric) and a drift in the direction of motion perpendicular to both the electric and the magnetic field. If we assume the two fields to be along the X and Y axes, respectively, then the particle will drift along the Z axis. The electromagnetic force experienced by the particle is known as the Lorentz force [8].
In this paper, we have taken inspiration from such a phenomenon in the
physical world. In our model, the movement of an electron is analogous to the
control flow between services. We have assumed that each service hosted on a
mobile device offers both electric and magnetic fields, consequently each node
offers an electromagnetic force (EMF) to the next incoming service request.
The EMF offered by a service is the selection criterion in our model. Next, we
present the definitions of Electric and Magnetic Fields, and show how we make
the dynamic service selection decision.
E = dV/dx    (1)

where dV is the change in electric potential, dx represents the change in displacement, and E is the electric field. The electric potential is the amount of work done to transfer a unit positive electric charge from one position to another position.
In the proposed work, the electric potential experienced at a service is defined
in terms of the waiting time experienced at a service node. Mathematically,
Vx(i) = twx(i)    (2)

and

Vy(i + 1) = twy(i + 1)    (3)
where, twx (i) is the waiting time at a service x realizing the ith step (the current
service/step in the composition), y is the index of the next service to be selected
and twy (i+1) is the waiting time. Here, service y realizes the (i+1)th step (the
next process-step). From equations (1), (2) and (3), the proposed definition of
the electric field offered by a mobile node is:
E(y) = (twx(i) − twy(i + 1)) / td(x(i), y(i + 1))    (4)
where, E(y) is the Electric Field offered by the service realizing the next process
step, td(x(i),y(i+1)) is the data transfer time, defined as “the amount of time
required to pass all the parameters and control from a service at a particular
step to a service at the subsequent step”. In mobile devices, the data transfer
time also represents latency between two individual devices. It can be seen from
the above equation that if a service (realizing the next process step) has less waiting time, then its Electric Field value is high. It is understood that the waiting time experienced at a service gives a measure of the congestion experienced at a node. Also, congestion is directly related to battery consumption. Therefore, selecting services offering a high Electric Field value can help preserve battery power. Further, driven by this field, requests will be passed to services that are not overloaded.
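In code, the field of Eq. (4) amounts to the following; the waiting and transfer times are made-up values.

# Hedged sketch: Electric Field offered by a candidate service y for the next
# process step, per Eq. (4). Times (in seconds) are made-up values.

def electric_field(tw_current, tw_next, transfer_time):
    """E(y) = (tw_x(i) - tw_y(i+1)) / td(x(i), y(i+1))."""
    return (tw_current - tw_next) / transfer_time

# the less loaded (and the closer) a candidate is, the higher its field
candidates = {'device_A': (0.8, 0.4), 'device_B': (2.5, 0.3), 'device_C': (0.2, 1.0)}
tw_current = 3.0   # waiting time at the service realizing the current step
for name, (tw_next, td) in candidates.items():
    print(name, round(electric_field(tw_current, tw_next, td), 2))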
selecting a service following such a biased approach should not be the ideal way
forward. A feasible strategy would be to select a service based on conscious rea-
soning as well. In other words, the subjective approach of the human being must
be complemented by a reasonable objective approach. The same line of reason-
ing could be applied to services computing i.e. while selecting a service based on
QoS attributes, one must follow a human oriented subjective approach comple-
mented by a reasoning based objective approach. Taking this line of reasoning,
we propose Magnetic Field as a preference and QoS based selection function in-
corporating both the subjective and the objective behavior. An ideal candidate
to merge the two choices is the subjective-objective weighted approach.
Therefore, the proposed definition of the Magnetic Field is as follows.
M(y) = β ∗ wQ + (1 − β) ∗ w′Q    (5)

where Q is a matrix containing QoS attribute values, w and w′ are the subjective and objective weight matrices, respectively, and β is a bias parameter in the range [0,1]. The QoS attributes used for weight calculation and for the experiments were chosen from attributes commonly found in the literature2. The method used to calculate the weights was taken from [7].
So far we have presented the definitions of the electric field and the magnetic field. The electric field makes the algorithm congestion-aware and thus aids battery conservation. It is obvious that the combination strategy of the two fields will play an important role in service composition and battery conservation. Basically, the degree of influence each field has on the composition will depend on the method and the parameter of combination. To combine the two fields, there are two broad categories: linear and non-linear. In a mobile environment, processing capabilities are limited. Therefore, considering simplicity and computational efficiency, we have chosen a linear combination strategy. It was outlined previously that a node offering electric and magnetic fields offers an electromagnetic force (EMF). Therefore, the proposed definition of EMF is:
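The EMF formula itself is missing from this extract; the sketch below therefore assumes a convex combination EMF(y) = α · E(y) + (1 − α) · M(y), which is only one possible instantiation of the linear strategy described above. The next service is then the candidate offering the highest EMF.

# Hedged sketch: the exact EMF definition is not reproduced here, so a convex
# combination of the two fields with an assumed bias parameter alpha is used.

def emf(e_field, m_field, alpha=0.5):
    return alpha * e_field + (1 - alpha) * m_field

def select_next_service(candidates, alpha=0.5):
    """candidates: dict name -> (E(y), M(y)); returns the name with maximal EMF."""
    return max(candidates, key=lambda name: emf(*candidates[name], alpha=alpha))

# usage with made-up field values
candidates = {'device_A': (5.5, 0.6), 'device_B': (1.7, 0.9), 'device_C': (2.8, 0.4)}
print(select_next_service(candidates, alpha=0.7))   # favours the less congested node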
During experimentation we observed that the latency factor kept varying all the time. This observation was due to the fact that mobility played a major role here. Because of the nature of the devices, ignoring such factors during composition is not advisable. In the proposed model, we have considered this factor. Therefore, the application completion time is lower.
Fig. 4 shows a snapshot of a few devices only. It can be observed from the two figures that when there is no load balancing, the battery consumption of the device is high, 17%. This is theoretically expected, since all the service requests kept arriving at this service node. A lot of work in the literature suffers from this drawback, i.e., repeated selection of a service. Therefore, they violate the battery power constraint and hence degrade the QoE of a user. Looking at the result in Fig. 4, one can clearly see that the battery consumption saw a significant drop. The battery consumption in this case varied from 5.7%-10.7%. This reduction is due to the fact that requests were distributed across devices. Previously we outlined the effect of congestion on battery consumption. Hence, efficient distribution of requests implies a small queue size, reduced CPU access, and thus a reduction in power usage. Therefore, in addition to providing a human-oriented, QoS-aware composition parameter, the Magnetic Field, the technique performed well in preserving the battery life of a person's mobile device.
4 Related Work
Service choreography has become one of the most important research topics in
the services computing field. However, there are only a few techniques purely de-
veloped and deployed on real mobile devices. A technique to enact a service
choreography using the chemical paradigm is proposed in [2], [3]. Fernandez et
al. [3] propose executing a workflow using the chemical paradigm. However, the
focus of the proposed middleware is to execute a workflow in wired networks, whereas we
have proposed a physics-based approach for the mobile platform. Further, the
authors in [3] do not focus on load balancing or dynamic adaptation. A technique
to achieve choreography in peer-to-peer networks is proposed in [2]. The work pre-
sented in [5] studies the effect of QoS metrics on message integrity and accuracy
of choreographies. A self-* framework for configuring and adapting services at
runtime was proposed in [1]. The framework, PAWS, delivered self-optimization
and ensured guaranteed service provisioning even under failures. A comprehensive
review of service choreographies is available in [6]. However, we did not find
any technique with a special focus on the IoS, let alone on mobile devices.
5 Conclusion
In this paper, we proposed a technique, inspired by the behavior of a charged
particle in physics, to achieve service choreography over mobile devices.
References
1. Ardagna, D., Comuzzi, M., Mussi, E., Pernici, B., Plebani, P.: Paws: A framework
for executing adaptive web-service processes. IEEE Software 24(6), 39–46 (2007)
2. Barker, A., Walton, C.D., Robertson, D.: Choreographing web services. IEEE Trans-
actions on Services Computing 2(2), 152–166 (2009)
3. Fernández, H., Priol, T., Tedeschi, C.: Decentralized approach for execution of com-
posite web services using the chemical paradigm. In: 2010 IEEE International Con-
ference on Web Services (ICWS), pp. 139–146. IEEE (2010)
4. Hamida, A.B., Linagora, G., De Angelis, F.G.: Composing services in the future in-
ternet: Choreography-based approach. iBPMS: Intelligent BPM Systems: Impact
and Opportunity, 163 (2013)
5. Kattepur, A., Georgantas, N., Issarny, V.: Qos composition and analysis in recon-
figurable web services choreographies. In: 2013 IEEE 20th International Conference
on Web Services (ICWS), pp. 235–242 (2013)
6. Leite, L.A., Oliva, G.A., Nogueira, G.M., Gerosa, M.A., Kon, F., Milojicic, D.S.: A
systematic literature review of service choreography adaptation. Service Oriented
Computing and Applications 7(3), 199–216 (2013)
7. Ma, J., Fan, Z.P., Huang, L.H.: A subjective and objective integrated approach
to determine attribute weights. European Journal of Operational Research 112(2),
397–404 (1999)
8. Rothwell, E.J., Cloud, M.J.: Electromagnetics. CRC Press (2001)
Adaptation of Asynchronously Communicating
Software
1 Introduction
system with its 1-bounded asynchronous version (in which each peer is equipped
with one input FIFO buffer bounded to size 1). Thus, this property can be
analysed using equivalence checking techniques on finite systems.
More precisely, given a set of peers modelled using Labelled Transition Sys-
tems and an adaptation contract, we first reuse existing adapter generation tech-
niques for synchronous communication, e.g., [10,16]. Then, we consider a system
composed of a set of peers interacting through the generated adapter, and we
check whether that system satisfies the synchronizability property. If this is the
case, this means that the system will behave exactly the same whatever bound
we choose for buffers, therefore this adapter is a solution to our composition
problem. If synchronizability is not preserved, a counterexample is returned,
which can be used for refining the adaptation contract until synchronizability is
preserved. It is worth observing that the main reason for non-synchronizability
is due to emissions, which are uncontrollable in an asynchronous environment,
hence have to be considered properly in the adaptation contract.
The organization of this paper is as follows. Section 2 defines our models for
peers and introduces the basics on synchronous software adaptation. Section 3
presents our results on the generation of adapters assuming that peers interact
asynchronously. Finally, Section 4 reviews related work and Section 5 concludes.
2 Synchronous Adaptation
We assume that peers are described using a behavioural interface in the form
of a Labelled Transition System. A Labelled Transition System (LTS) is a tuple
(S, s0, Σ, T) where S is a set of states, s0 ∈ S is the initial state, Σ = Σ! ∪ Σ? ∪ {τ}
is a finite alphabet partitioned into a set Σ! (Σ?, resp.) of send (receive, resp.)
messages and the internal action τ, and T ⊆ S × Σ × S is the transition relation.
The alphabet of the LTS is built on the set of operations used by the peer
in its interaction with the world. This means that for each operation p provided
by the peer, there is an event p? ∈ Σ ? in the alphabet, and for each operation
r required from its environment, there is an event r! ∈ Σ ! . Events with the
same name and opposite directions (a!, a?) are complementary, and their match
stands for inter-peer communication through message-passing. Additionally to
peer communication events, we assume that the alphabet also contains a special
τ event to denote internal (not communicating) behaviour. Note that as usually
done in the literature [15,2,20], our interfaces abstract from operation arguments,
types of return values, and exceptions. Nevertheless, they can be easily extended
to explicitly represent operation arguments and their associated data types, by
using Symbolic Transition Systems (STSs) [16] instead of LTSs.
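For illustration only, the following Python sketch shows one possible encoding of such an LTS and of the complementarity of send/receive events; the state and message names echo the buyer/supplier running example below, and the actual adapter generation machinery of [10,16] is not reproduced here.

    from dataclasses import dataclass, field

    # A small sketch of the LTS model (S, s0, Sigma, T) described above.
    # Send events end with '!', receive events with '?', and 'tau' marks internal moves.
    @dataclass
    class LTS:
        states: set
        initial: str
        transitions: set = field(default_factory=set)   # set of (source, label, target)

    def complementary(a: str, b: str) -> bool:
        """a! and a? are complementary and model one message exchange between peers."""
        return a[:-1] == b[:-1] and {a[-1], b[-1]} == {"!", "?"}

    # Fragments of the supplier and buyer peers (illustrative, not the full example).
    supplier = LTS(states={"s0", "s1", "s2"}, initial="s0",
                   transitions={("s0", "type?", "s1"), ("s1", "price?", "s2"), ("s2", "reply!", "s0")})
    buyer = LTS(states={"b0", "b1"}, initial="b0",
                transitions={("b0", "request!", "b1"), ("b1", "reply?", "b0")})
    print(complementary("reply!", "reply?"))   # True: these two events can synchronise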
Example 1. We use as running example an online hardware supplier. This ex-
ample was originally presented in [11] and both participants (a supplier and a
buyer) were implemented using the Microsoft WF/.NET technology. Figure 1
presents the LTSs corresponding to both peers. The supplier first receives a re-
quest under the form of two messages that indicate the reference of the requested
hardware (type), and the max price to pay (price). Then, it sends a response
indicating whether the request can be answered positively or not (reply). Next, the sup-
plier may receive and reply to further requests, or receive an order of purchase for the
last reference requested (buy). In the latter case, a confirmation is sent (ack).
The behaviour of the buyer starts by submitting a request (request). Upon re-
ception of the response (reply), the buyer either submits another request, buys
the requested product (purchase and ack), or ends the session (stop).
3 Asynchronous Adaptation
Example 4. As far as our running example is concerned, given the LTSs of the
peers and the set of vectors presented in Section 2, we can automatically generate
the corresponding adapter (Figure 2). However, if we check whether the com-
position of this adapter with the peers’ LTSs satisfies synchronizability, the ver-
dict is false, and we obtain the following counterexample: b:request!, s:type!,
s:price!, s:reply!, b:reply!, and b:stop!, where the very last event appears
in the asynchronous system but not in the synchronous one. Note that synchro-
nizability focuses on emissions, hence the counterexample above contains only
messages sent by a peer to the adapter (b:request!, s:reply!, b:stop!) or
by the adapter to a peer (s:type!, s:price!, b:reply!). This violation is due
to the fact that the emission of stop is not captured by any vector, and con-
sequently it is inhibited in the synchronous system, while it is still possible in
the asynchronous system because reachable emissions cannot be inhibited under
asynchronous communication.
In order to correct this problem, we extend the adaptation contract by adding
the following vector: Vstop = ⟨b:stop!, s:ε⟩. The corresponding adapter is gener-
ated and shown in Figure 3. The system composed of the two peers interacting
through this adapter turns out to satisfy the synchronizability property. This
means that the adapter can be used under asynchronous communication and
the system will behave exactly the same whatever bound is chosen for buffers or
if buffers are unbounded.
4 Related Work
theory of regular trees and imposes two requirements (regularity and contractiv-
ity) on the orchestrator. However, this work supports neither name mismatches
nor data-related adaptation. Seguel et al. [21] present automatic techniques for
constructing a minimal adapter for two business protocols possibly involving
parallelism and loops. The approach works by assigning to loops a fixed number
of iterations, whereas we do not impose any restriction, and peers may loop in-
finitely. Gierds and colleagues [13] present an approach for specifying behavioural
adapters based on domain-specific transformation rules that reflect the elemen-
tary operations that adapters can perform. The authors also present a novel way
to synthesise complex adapters that adhere to these rules by consistently sep-
arating data and control, and by using existing controller synthesis algorithms.
Asynchronous adaptation is supported in this work, but buffers/places must be
arbitrarily bounded for ensuring computability of the adapter.
5 Conclusion
Most existing approaches for adapting stateful software focus on systems relying
on synchronous communication. In this paper, we tackle the adapter generation
question from a different angle by assuming that peers interact asynchronously
via FIFO buffers. This complicates the synthesis process because we may have to
face infinite systems when generating the adapter behaviour. Our approach jointly
uses adapter generation techniques for synchronous communication and the
synchronizability property to solve this issue. This enables us to propose an
iterative approach for synthesising adapters in asynchronous environments. We
have applied it in this paper on a real-world example for illustration purposes.
References
1. van der Aalst, W.M.P., Mooij, A.J., Stahl, C., Wolf, K.: Service Interaction: Pat-
terns, Formalization, and Analysis. In: Bernardo, M., Padovani, L., Zavattaro, G.
(eds.) SFM 2009. LNCS, vol. 5569, pp. 42–88. Springer, Heidelberg (2009)
2. de Alfaro, L., Henzinger, T.A.: Interface Automata. In: Proc. of ESEC/FSE 2001,
pp. 109–120. ACM Press (2001)
3. Basu, S., Bultan, T., Ouederni, M.: Deciding Choreography Realizability. In: Proc.
of POPL 2012, pp. 191–202. ACM (2012)
4. Bennaceur, A., Chilton, C., Isberner, M., Jonsson, B.: Automated Mediator Syn-
thesis: Combining Behavioural and Ontological Reasoning. In: Hierons, R.M., Mer-
ayo, M.G., Bravetti, M. (eds.) SEFM 2013. LNCS, vol. 8137, pp. 274–288. Springer,
Heidelberg (2013)
5. Bracciali, A., Brogi, A., Canal, C.: A Formal Approach to Component Adaptation.
Journal of Systems and Software 74(1), 45–54 (2005)
Handling Irreconcilable Mismatches in Web Services Mediation
1 Introduction
In order to interact seamlessly, a service requester and a Web service should be
compatible both in signature and in behavior [3]. Service mediation is a feasi-
ble technique to deal with incompatible services by introducing extra compo-
nents such as service mediators (or adaptors) [11]. Most existing approaches for
Web service mediation only focus on how to synthesize service mediators semi-
automatically or automatically in the case when services could be mediated. If
there are irreconcilable mismatches, the services are simply considered "not
mediatable" and no further steps are taken toward mediation.
However, in practice, interactions among many services may not be fully me-
diated due to irreconcilable mismatches. Therefore, it is of great significance to
analyze and resolve irreconcilable mismatches between Web services. On the
one hand, the irreconcilable information can readily be applied to measure i)
the mediation degree of a given service and ii) the difficulty of amending
the service request for a service mediation. Since there are usually multiple
This work has been partially supported by the National High Technology Research
and Development Program of China (863) under Grant No. 2012AA011204, the
National Natural Science Foundation of China under Grant No. 61100065.
– Min is the finite set of messages that are received by service S, and Mout
is the finite set of messages that are sent by the service;
– P is the interaction protocol of service S.
We use the deadlock process concept in CSP to check the existence of a mediator
and locate the irreconcilable mismatches. To automatically perform the checking
process, we further improve the algorithm in [9] to quantify the mediation degree
of a service.
respectively, the behavior of PM_A || PM_B is: (p_A^1 || p_B^1) □ (p_A^1 || p_B^2) □ ... □ (p_A^n || p_B^l), where
each (p_A^i || p_B^j) is an interaction path between PM_A and PM_B.
Algorithm 1 shows the procedure to check and record the deadlock events of
each interaction path between the requester and the provided service. Function
move is invoked alternately to traverse all events of the input sub-protocols (line
2). The return value of function move has four types. NoMove indicates no event
is checked during this invocation, while Moved means some events have been
checked in the invocation. SKIP indicates the checking is finished and ReadNull
means a ReadNull is encountered.
Algorithm 1. Deadlock Event Checking
Requester
Input: a sub-protocol of PM : p1 , a sub-protocol of PM
Service
: p2
Output: the deadlock event set: events
1. while (true) do
2. result1 := move (p1 ), result2 := move (p2 );
3. if (result1 = ReadNull ∨ result2 = ReadNull)
4. if (result1 = ReadNull) record (events, p1 ); end if
5. if (result2 = ReadNull) record (events, p2 ); end if
6. else if (result1 = NoMove ∧ result2 = NoMove)
7. record (events, p2 ); record (events, p1 );
8. else if (result1 = NoMove ∧ result2 = SKIP )
9. record (events, p1 );
10. else if (result2 = NoMove ∧ result1 = SKIP )
11. record (events, p2 );
12. else if (result1 = SKIP ∧ result2 = SKIP )
13. return events;
14. end if
15. end while
If either result1 or result2 is ReadNull (line 3), or both of them cannot move
forward (return NoMove, line 6), the corresponding events can cause a deadlock
and should be recorded (i.e., the function record ). In order to check the remaining
parts of p1 and p2 , we assume the deadlock is resolved and continue the algorithm
(line 2). It is noted that the checking is performed from the perspective of the
requester. In the scenario when both p1 (i.e., the requester) and p2 (i.e., the
service) return NoMove (line 6), the corresponding event in p2 firstly will be
resolved (line 7). If either result is SKIP and the other result is NoMove (line
8 and line 10), all of the remaining events in the corresponding protocol will be
recorded. If both result1 and result2 are SKIP (line 12), the checking procedure
is finished. Algorithm 2 shows the details on function move.
Based on the recording of the deadlock events, we can calculate the mediata-
bility between the requester and the service. The mediatability of one interaction
path is computed as follows:
MDpath = 1 − (Ndeadlocks / Ntotal)    (1)
where Ndeadlocks is the number of recorded deadlock events and Ntotal is the
number of all receiving events in the interaction path. If Ntotal is 0,
(Ndeadlocks / Ntotal) is taken to be 0. Clearly, the value of the mediatability of one
interaction path lies in the range [0, 1].
Algorithm 2. Move
Input: a protocol to be checked: p
Output: the checking result: result
1. if (isSequential(p))
2.   for each subSequentialProtocol pi do
3.     result := move(pi);
4.     if (result = SKIP)
5.       hasMoved := true;
6.     else if (result = ReadNull)
7.       return ReadNull;
8.     else if (hasMoved = true ∨ result = Moved)
9.       return Moved;
10.    else return NoMove;
11.    end if
12.  end for
13.  return SKIP;
14. else if (isParallel(p))
15.  for each subParallelProtocol pi do
16.    resulti := move(pi);
17.  end for
18.  if (all resulti = SKIP)
19.    return SKIP;
20.  else if (∃ resulti = ReadNull)
21.    return ReadNull;
22.  else if (∃ resulti = Moved)
23.    return Moved;
24.  end if
25.  return NoMove;
26. else if (isExternalChoice(p))
27.  for each subChoiceProtocol pi do
28.    if (isChosen(pi))
29.      return move(pi);
30.    end if
31.    return NoMove;
32.  end for
33. else
34.  for each event ai do
35.    if (isWriting(ai))
36.      writePipe(ai);
37.      hasMoved := true;
38.    else if (isReading(ai))
39.      if (canRead(ai))
40.        hasMoved := true;
41.      else if (hasMoved)
42.        return Moved;
43.      else return NoMove;
44.      end if
45.    else if (ai = ReadNull)
46.      return ReadNull;
47.    end if
48.  end for
49.  return SKIP;
50. end if
The mediatability between the requester and the service is calculated using:
MDservice = (Σ_{i=1}^{n} MDpath^i) / n    (2)
Here MDpath^i is the mediatability of path_i in the mediation model and n is
the number of interaction paths. Larger values of the mediatability indicate
fewer deadlock events and higher mediation degrees.
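A small Python sketch of Eqs. (1) and (2), assuming the deadlock and receiving-event counts per interaction path have already been obtained from Algorithm 1 (the input values below are purely illustrative):

    # Sketch of the mediatability metrics of Eqs. (1) and (2).
    def md_path(n_deadlocks: int, n_total_receives: int) -> float:
        if n_total_receives == 0:          # by definition the ratio is taken as 0
            return 1.0
        return 1.0 - n_deadlocks / n_total_receives

    def md_service(deadlocks_per_path, receives_per_path) -> float:
        paths = list(zip(deadlocks_per_path, receives_per_path))
        return sum(md_path(d, r) for d, r in paths) / len(paths)

    # Two interaction paths: one fully mediatable, one with 2 deadlock events out of 5 receives.
    print(md_service([0, 2], [4, 5]))      # -> (1.0 + 0.6) / 2 = 0.8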
4. Missing Choice: the entire choice branch in the service protocol has no counterpart.
The deadlock events belong to a choice branch of the service and the start event of the
branch is WriteNull. Resolution: the requester provides the required choice branch.
5. Missing Choice: the entire choice branch in the requester protocol has no counterpart.
The deadlock events belong to a choice branch of the requester and the start event of
the branch is WriteNull. Resolution: the requester deletes the required choice branch.
6. Missing Loop in Requester: a loop structure in the service protocol interacts with a
non-loop structure in the requester protocol. When p2 ends with the loop flag while p1
ends with SKIP, the receiving events in the loop structure would be recorded.
Resolution: the requester changes the non-loop structure into the loop structure.
Missing Loop: a loop struc... when p1 ends...
4 Conclusion
In this paper, we advance existing work on service mediation by proposing an
approach to analyze and measure irreconcilable behaviors for service
mediation, including a quantifiable metric for measuring mediation degrees, a
pattern-based method for mismatch analysis, a set of resolutions for irreconcilable
patterns, and a further metric for measuring complexity and cost of modification
in service mediation. Our proposed approach, particularly the two metrics, can
also help developers in Web services selection. Our future work will extend the
approach to support more complicated processes and investigate techniques de-
veloped by semantic Web initiatives to automate the service mediation process.
References
1. Benatallah, B., Casati, F., Grigori, D., Nezhad, H.R.M., Toumani, F.: Developing
Adapters for Web Services Integration. In: Pastor, Ó., Falcão e Cunha, J. (eds.)
CAiSE 2005. LNCS, vol. 3520, pp. 415–429. Springer, Heidelberg (2005)
2. Canal, C., Poizat, P., Salaün, G.: Model-Based Adaptation of Behavioral Mis-
matching Components. IEEE Trans. Softw. Eng. 34(4), 546–563 (2008)
3. Dumas, M., Benatallah, B., Nezhad, H.R.M.: Web Service Protocols: Compatibility
and Adaptation. IEEE Data Engineering Bulletin 31(1), 40–44 (2008)
4. Hoare, C.: Communicating Sequential Processes. Prentice-Hall (1985)
5. Kuang, L., Deng, S., Wu, J., Li, Y.: Towards Adaptation of Service Interface Se-
mantics. In: Proc. of the 2009 IEEE Intl. Conf. on Web Services, ICWS 2009 (2009)
6. Li, X., Fan, Y., Madnick, S., Sheng, Q.Z.: A Pattern-based Approach to Protocol
Mediation for Web Services Composition. Info. and Soft. Tech. 52(3), 304–323
(2010)
7. Nezhad, H., et al.: Semi-Automated adaptation of service interactions. In: Proc. of
the 16th Intl. Conf. on World Wide Web, WWW 2007 (2007)
8. Qiao, X., Sheng, Q.Z., Chen, W.: Handling irreconcilable mismatches in web ser-
vices mediation. Tech. Rep. TCSE-TR-20140501,
http://otc.iscas.ac.cn/cms/UploadFile/20140731050648880/
9. Qiao, X., Wei, J.: Implementing Service Collaboration Based on Decentralized
Mediation. In: Proc. of the 11th Intl. Conf. on Quality Software, QSIC 2011 (2011)
10. Tan, W., Fan, Y., Zhou, M., Zhou, M.: A Petri Net-Based Method for Compat-
ibility Analysis and Composition of Web Services in Business Process Execution
Language. IEEE Trans. Autom. Sci. Eng. 6(1), 94–106 (2009)
11. Yellin, D.M., Strom, R.E.: Protocol Specifications and Component Adaptors. ACM
Transactions on Programming Languages And Systems 19(2), 292–333 (1997)
12. Zhou, Z., et al.: Assessment of Service Protocol Adaptability Based on Novel Walk
Computation. IEEE Trans. on Systems, Man and Cybernetics, Part A: Systems
and Humans 42(5), 1109–1140 (2012)
Evaluating Cloud Users’ Credibility of Providing
Subjective Assessment or Objective Assessment
for Cloud Services
Abstract. This paper proposes a novel model for evaluating cloud users’
credibility of providing subjective assessment or objective assessment for
cloud services. In contrast to prior studies, cloud users in our model are
divided into two classes, i.e., ordinary cloud consumers providing sub-
jective assessments and professional testing parties providing objective
assessments. By analyzing and comparing subjective assessments and
objective assessments of cloud services, our proposed model can not only
effectively evaluate the trustworthiness of cloud consumers and reputa-
tions of testing parties on how truthfully they assess cloud services, but
also resist user collusion to some extent. The experimental results demon-
strate that our model significantly outperforms existing work in both the
evaluation of users’ credibility and the resistance of user collusion.
1 Introduction
Due to the diversity and complexity of cloud services, the selection of the most
suitable cloud services has become a major concern for potential cloud con-
sumers. In general, there are three types of approaches which can be adopted
to conduct cloud service evaluation prior to cloud service selection. The first
type is based on cloud users’ subjective assessment extracted from their sub-
jective ratings [5]. The second type is based on objective assessment via cloud
performance monitoring and benchmark testing [10] provided by professional
organizations, such as CloudSleuth1 . The third type is based on the comparison
and aggregation of both subjective assessment and objective assessment [7,8].
Whichever type of approach is adopted, the credibility of cloud users pro-
viding assessments has a strong influence on the effectiveness of cloud service
selection. In cloud environments, cloud users can be generally classified into
two classes according to the different purposes of consuming cloud services. The
first class comprises ordinary cloud consumers whose purpose is to consume a
1
www.cloudsleuth.net
cloud service having high quality performance and spend as little money as pos-
sible. They usually offer subjective assessment of cloud services through user
feedback. The second class comprises professional cloud performance monitor-
ing and testing parties whose purpose is to offer objective assessment of cloud
services to potential cloud consumers for helping them select the most suitable
cloud services. To the best of our knowledge, there are no prior approaches in the
literature, which can effectively evaluate the credibility of both types of cloud
users in cloud service evaluation.
In this paper, we propose a novel model for evaluating cloud users’ credibility
of providing subjective assessment or objective assessment, where subjective as-
sessment is from ordinary cloud consumers (called Ordinary Consumers, OC for
short), and objective assessment is from professional cloud performance monitor-
ing and testing parties (called Testing Parties, TP for short). The credibility of
OCs and T P s providing subjective assessment or objective assessment is respec-
tively represented by trustworthiness of OCs and reputations of T P s. For an OC,
an authority center computes the relative trustworthiness of the other OC s who
consume the same cloud services as the OC. Relative trustworthiness represents
other OCs' trustworthiness from the OC's perspective. The relative trustworthi-
ness can also be affected by the difference of variation trend between the other
OC ’s subjective assessments and TP s’ objective assessments over time. Then,
the authority center selects the OCs who are considered trustworthy enough by
the OC as his/her virtual neighbors according to all the relative trustworthiness
values. The neighborhood relationships of all the OCs form a social network.
The global trustworthiness of an OC on how truthful he/she provides subjective
assessment is computed based on the number of OCs who select him/her as their
virtual neighbor.
In the meantime, the reputation of a TP on providing truthful objective as-
sessment is modeled in a different way based on the difference among the TP ’s
objective assessments, the majority of objective assessments from other T P s
and the majority of subjective assessments from OC s. That implies that the
trustworthiness of OCs and the reputations of T P s can be influenced by each
other. For this reason, our model can resist collusion among cloud users pro-
viding untruthful assessments to some extent. Through our model, a successful
collusion attack would become very difficult in practice since a large number
of cloud users would have to be involved in such collusion. In contrast to the
existing user credibility evaluation model which is based on subjective ratings
only, our experimental results show that our model can significantly improve the
accuracy of evaluating user credibility, and enhance the resistance capability of
user collusion in cloud environments.
In this section, we first introduce the framework of our proposed model for
evaluating cloud users’ credibility, and then present the details of our model.
instead of Zhang et al.’s method. The experimental results demonstrate that our
method is fairer than Zhang et al.’s.
Distance Measurement between Multiple Ratings: In this sub model, we
apply the rating system defined in Table 1, which is frequently used in prior liter-
ature, such as [1,7], to express OCs’ subjective assessments. In order to compare
two ratings, we adopt the approach proposed by Li and Wang [2], which maps the
rating space into a trust space, to measure the distance between two ratings. As
shown in Table 1, fuzzy ratings are first converted into crisp ratings through the
signed distance defuzzification method [1]. Then, the crisp ratings are normalized
into the interval [0, 1] according to their values. Due to space limitations, we omit
the detailed procedure of mapping the rating space into the trust space. In short,
a trust space for a service is defined as a triple T = {(t, d, u) | t ≥ 0, d ≥ 0, u ≥ 0,
t + d + u = 1}. Through Bayesian Inference and the calculation of certainty and
expected probability based on a number of sample ratings, normalized ratings can
be put into three intervals, i.e., for a normalized rating ri ∈ [0, 1], we have
ri is: distrust, if 0 ≤ ri ≤ d; uncertainty, if d < ri < d + u; trust, if d + u ≤ ri ≤ 1.
A rating in the distrust range means the consumer who gave this rating deems
that the service provider did not provide the service with committed quality, and
we have a contrary conclusion when a rating is in the trust range. A rating in
the uncertainty range means the consumer is not sure whether the service is
provided with committed quality. Here, we call such a range a trust level.
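A minimal sketch of this classification, assuming a given trust space (t, d, u) for a service; the concrete values of d and u below are illustrative only:

    # Sketch of placing a normalized rating into the distrust / uncertainty / trust
    # ranges of a trust space (t, d, u) with t + d + u = 1, as described above.
    def trust_level(r: float, d: float, u: float) -> str:
        assert 0.0 <= r <= 1.0
        if r <= d:
            return "distrust"       # the service is deemed below its committed quality
        if r < d + u:
            return "uncertainty"    # the consumer is not sure
        return "trust"              # committed quality is deemed to be met

    # Illustrative trust space: d = 0.3, u = 0.2 (so the trust range starts at 0.5).
    print([trust_level(r, 0.3, 0.2) for r in (0.1, 0.4, 0.9)])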
The Trustworthiness of OCs: The computation of the trustworthiness of an
ordinary consumer OCA consists of two steps: in Step 1, the authority center
computes all the other OCs’ relative trustworthiness based on OCA ’s own expe-
rience, and selects a fixed number of top OCs according to the descending order
of all their relative trustworthiness values, where these top OCs are considered
as OCA ’s virtual neighbors. Here, relative trustworthiness represents other OCs’
trustworthiness from OCA's perspective. In Step 2, all these neighborhood relation-
ships form a virtual social network, based on which, the global trustworthiness
of all OCs are computed.
The details of these two steps are provided below:
Step 1. Computing Relative Trustworthiness of OCs: Suppose there are
two ordinary consumers denoted as OC and OC′, both of whom consume a group
of cloud services, denoted as {s1, s2, · · · , si, · · · , sl}. The relative trustworthiness
of OC′ based on OC is denoted as RTr(OC′ ∼ OC), where OC′ ≠ OC, and is
computed as follows:
RTr(OC′ ∼ OC) = RTP(OC′) ×
[ω × Spri(OC′ ∼ OC) + (1 − ω) × Spub(OC′ ∼ ALL)].    (1)
3. ω (weight for private similarity): ω is the weight for how much the
private similarity and the public similarity of OC′ can be trusted if there are
insufficient correspondent rating pairs between OC and OC′. Such a weight can
be calculated based on the Chernoff Bound [4] as follows:
Nmin = −(1/(2ε²)) ln((1 − γ)/2),    ω = Nall / Nmin, if Nall < Nmin;  ω = 1, otherwise.    (4)
where ε is a small value (e.g., 0.1) representing a fixed maximal error bound
which OC can accept, and γ ∈ (0, 1) is OC’s confidence level about his/her own
subjective assessments.
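A small sketch of Eq. (4); the default values ε = 0.1 and γ = 0.9 are illustrative, not prescribed by the model:

    import math

    # Sketch of Eq. (4): the weight given to private similarity when only
    # n_all correspondent rating pairs are available between OC and OC'.
    def private_similarity_weight(n_all: int, epsilon: float = 0.1, gamma: float = 0.9) -> float:
        n_min = -(1.0 / (2.0 * epsilon ** 2)) * math.log((1.0 - gamma) / 2.0)
        return n_all / n_min if n_all < n_min else 1.0

    # With epsilon = 0.1 and gamma = 0.9, N_min is about 150, so 30 pairs give a low weight.
    print(private_similarity_weight(30), private_similarity_weight(200))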
4. RTP(OC′) (average reputation of TPs similar to OC′): RTP(OC′)
represents the weighted average of the reputations of the TPs whose objective
assessments follow, over time, a variation trend similar to that of OC′'s subjective
assessments. Suppose there are m TPs, denoted as {TP1, TP2, · · · , TPj, · · · , TPm},
providing objective assessments for the l cloud services mentioned above. Follow-
ing the time window partition method introduced above, we build correspondent
assessment pairs between OC′'s subjective assessments and TPj's objective as-
sessments for each cloud service, denoted as (rOC′,si, oaTPj,si), where oa denotes
the value of objective assessments. All rOC′,si and oaTPj,si are then put together
to build two assessment sequences ordered by the time of every time window, de-
noted as r⃗OC′,si and o⃗aTPj,si respectively. After that, each assessment sequence
is converted into a ranking sequence according to the assessment values. Suppose
the converted ranking sequences for r⃗OC′,si and o⃗aTPj,si are x⃗OC′,si and y⃗TPj,si
respectively. Then, the similarity, denoted as ρ(OC′ ∼ TPj, si), between these
two ranking sequences is computed via Spearman's rank correlation coefficient
[3], which is a common method to compute ranking similarity. Hence, the aver-
age similarity of assessment variation trends between OC′ and TPj over all cloud
services can be computed as follows:
ρ(OC′ ∼ TPj) = (1/l) Σ_{i=1}^{l} ρ(OC′ ∼ TPj, si).    (5)
All the TPs with ρ(OC′ ∼ TPj) > 0 are then selected as the TPs whose
objective assessments are similar to OC′'s subjective assessments. Suppose there
are p such TPs for OC′; then the weighted average reputation of these TPs in
Eq. (1) is computed as follows:
RTP(OC′) = (1/p) Σ_{q=1}^{p} (ρ(OC′ ∼ TPq) × RTPq),    (6)
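The following sketch illustrates Eqs. (5) and (6) with a plain-Python Spearman rank correlation (ties are not handled); the assessment sequences and reputation values below are hypothetical:

    # Sketch of Eqs. (5)-(6): rank-based similarity between OC' and each TP,
    # and the resulting averaged reputation RTP(OC').
    def spearman(xs, ys):
        def ranks(v):
            order = sorted(range(len(v)), key=lambda i: v[i])
            r = [0] * len(v)
            for rank, i in enumerate(order, start=1):
                r[i] = rank
            return r
        rx, ry = ranks(xs), ranks(ys)
        n = len(xs)
        d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
        return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

    def rtp(oc_sequences, tp_sequences, tp_reputations):
        """oc_sequences[s] and tp_sequences[j][s] are the per-service assessment sequences."""
        similar = []
        for j, tp_seq in enumerate(tp_sequences):
            rho = sum(spearman(oc_sequences[s], tp_seq[s])
                      for s in range(len(oc_sequences))) / len(oc_sequences)
            if rho > 0:                       # keep only TPs with similar variation trends
                similar.append(rho * tp_reputations[j])
        return sum(similar) / len(similar) if similar else 0.0

    oc = [[0.2, 0.5, 0.9]]                                # one service, three time windows
    tps = [[[0.3, 0.6, 0.8]], [[0.9, 0.4, 0.1]]]          # TP1 follows the trend, TP2 does not
    print(rtp(oc, tps, tp_reputations=[0.8, 0.9]))        # only TP1 contributes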
Tr(OC) = (1 − d)/N + d · Σ_{OCi ∈ G(OC)} Tr(OCi),    (7)
where G(OC) is the set of all vertices that select the OC as their neighbor, N is
the total number of vertices in G, and d is a damping factor which is commonly
set to 0.85 in the PageRank algorithm. In our model, T r(OC) is equivalent to
the probability that a random OC selects the OC as his/her neighbor.
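As a sketch only, and assuming the PageRank-style form of Eq. (7) as reconstructed above (iterated to a fixed point), the global trustworthiness could be computed as follows; the neighbourhood relation is hypothetical:

    # Sketch of the global trustworthiness computation, assuming the reconstructed
    # form of Eq. (7): Tr(OC) = (1-d)/N + d * sum of the trust of the OCs that
    # selected OC as a neighbour, iterated until it stabilises.
    def global_trust(selected_by, n_iterations=50, d=0.85):
        """selected_by[v] is the set of OCs that chose v as their virtual neighbour."""
        nodes = list(selected_by)
        n = len(nodes)
        tr = {v: 1.0 / n for v in nodes}
        for _ in range(n_iterations):
            tr = {v: (1.0 - d) / n + d * sum(tr[u] for u in selected_by[v]) for v in nodes}
        return tr

    # Tiny social network: OC 'a' is picked by both 'b' and 'c', and ends up most trusted.
    print(global_trust({"a": {"b", "c"}, "b": {"a"}, "c": set()}))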
Case 1: If rTPj,si, rTP′,si and rOC,si are all in the same trust level, there is
a high probability that TPj provides truthful objective assessments of si.
Cases 2&3: If (rTPj,si, rTP′,si) or (rTPj,si, rOC,si) is in the same trust level,
but (rTPj,si, rOC,si) or (rTPj,si, rTP′,si) is not, the probability of TPj provid-
ing truthful objective assessments should be lower than that in Case 1. Because
objective assessments are usually considered more reliable than subjective as-
sessments, the payoff in Case 2 should be higher than that in Case 3.
Case 4: If both (rTPj,si, rTP′,si) and (rTPj,si, rOC,si) are in different
trust levels, then TPj is penalized by receiving the lowest reputation payoff. The
reputation payoffs are defined by the inequality εa > εb > εc > εd > 0.
(Fig. 2: two panels over 0–50 days showing the trustworthiness of OCs and the reputations of TPs.)
3 Experimental Results
Because no suitable testing environment exists to evaluate our model, we simu-
late a cloud service environment based on our proposed framework. We collect
the data of response time from CloudSleuth for 59 real cloud services. To the best
of our knowledge, there is no data set of subjective assessments published for
those 59 cloud services. Hence, we select 8 similar cloud services from these cloud
services, and then simulate subjective assessments from 300 OCs and objective
assessments from 36 T P s for the 8 cloud services. We simulate the assessment
behavior of all the participants in the cloud environment for a period of 50 sim-
ulated days. The trustworthiness of every OC and the reputation of every T P
are computed and recorded at the end of each day. In our model, a collusion
attack refers to some users colluding to provide similar untruthful (too high
or too low) assessments for a cloud service in order to manipulate the cloud
service's reputation, and collusive assessments refer to such similar untruthful
assessments. We require that each OC or TP has his/her/its own percentage of
providing randomly untruthful or collusive assessments.
In our experiments, all the OCs or T P s are divided into three groups. The OCs
or T P s in each group provide different percentages of randomly untruthful or
collusive assessments. We have conducted experiments in many different settings.
The experimental results demonstrate that our model can effectively detect the
OCs or T P s who/which provide randomly untruthful or collusive assessments.
Due to space limitations, we only present the experimental results in Fig. 2
when some OCs provide collusive subjective assessments and some T P s provide
randomly untruthful objective assessments. Fig. 2 demonstrates that the more
collusive assessments/randomly untruthful assessments the OCs/T P s provide,
the lower the trustworthiness of the OCs/the reputations of the T P s.
Next, we test the tolerance of our model, i.e., the maximum percentages of ran-
domly untruthful or collusive assessments that our model can withstand to stay
effective. We compare our model with Zhang et al.’s work [9] and the version of our
model without T P s, i.e., only OCs’ subjective assessments are used to compute
Subjective Assessments    Zhang et al.'s model [9]   Our model without TPs   Our model with TPs
Untruthful Assessments    30%                        43%                      55%
Collusive Assessments     21%                        24%                      29%
4 Conclusion
We propose a novel model for evaluating cloud users’ credibility of providing
subjective assessment or objective assessment for cloud services. Our model con-
siders two different classes of cloud users (i.e., ordinary users and testing parties).
The trustworthiness of OC s and the reputation of TP s are computed respec-
tively to reflect how truthfully they provide subjective or objective assessments.
Moreover, our model has the ability to resist user collusion to some extent. The
experimental results demonstrate that our proposed model, which considers both sub-
jective and objective assessments, significantly outperforms existing
work that considers users' subjective assessments only.
References
1. Chou, S.Y., Chang, Y.H., Shen, C.Y.: A fuzzy simple additive weighting system un-
der group decision-making for facility location selection with objective/subjective
attributes. EJOR 189(1), 132–145 (2008)
2. Li, L., Wang, Y.: Subjective trust inference in composite services. In: AAAI Con-
ference on Artificial Intelligence (2010)
3. Marden, J.I.: Analyzing and Modeling Ranking Data. Chapman & Hall (1995)
4. Mui, L., Mohtashemi, M., Halberstadt, A.: A computational model of trust and
reputation for e-businesses. In: HICSS, p. 188 (2002)
5. Noor, T.H., Sheng, Q.Z.: Trust as a service: A framework for trust management
in cloud environments. In: Bouguettaya, A., Hauswirth, M., Liu, L. (eds.) WISE
2011. LNCS, vol. 6997, pp. 314–321. Springer, Heidelberg (2011)
6. Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking:
Bringing order to the web. Technical Report 1999-66 (November 1999)
7. Qu, L., Wang, Y., Orgun, M.A.: Cloud service selection based on the aggregation
of user feedback and quantitative performance assessment. In: IEEE International
Conference on Services Computing (SCC), pp. 152–159 (2013)
8. Qu, L., Wang, Y., Orgun, M.A., Liu, L., Bouguettaya, A.: Cloud service selection
based on contextual subjective assessment and objective assessment. In: AAMAS
2014, pp. 1483–1484 (2014)
9. Zhang, J., Cohen, R., Larson, K.: A trust-based incentive mechanism for E-
marketplaces. In: Falcone, R., Barber, S.K., Sabater-Mir, J., Singh, M.P. (eds.)
Trust 2008. LNCS (LNAI), vol. 5396, pp. 135–161. Springer, Heidelberg (2008)
10. Zheng, Z., Wu, X., Zhang, Y., Lyu, M.R., Wang, J.: QoS ranking prediction for
cloud services. IEEE Trans. Parallel Distrib. Syst. 24(6), 1213–1222 (2013)
Composition of Cloud Collaborations under
Consideration of Non-functional Attributes
1 Introduction
Cloud markets promise unlimited resource supplies, standardized commodities
and proper services in a scalable, pay-as-you-go fashion [1]. Some providers set
up distributed data centers at different geographical locations and jurisdictions
and may not always be able to offer sufficient physical capacity to serve large
customers in one location. A solution is cloud collaborations within cloud mar-
kets, i. e., the cooperation of multiple providers to aggregate their resources and
conjointly satisfy users' demands. However, such cloud collaborations have
both Quality of Service (QoS) and security implications. As a user may potentially
be served by any provider within a collaboration, the aggregated non-functional
service attributes (e. g., availability, security protection level, data center loca-
tion) will be determined by the “weakest link in the chain”, i. e., by the provider
with the lowest guarantees. Consideration of country- and industry-specific data
protection laws and regulations is another concern when building cloud collab-
orations, as providers can act in different jurisdictions (the European Union,
Canada, Singapore, or the United States), where data privacy laws differ [4].
Based on our previous research [5], we examine the Cloud Collaboration Com-
position Problem (CCCP) with a focus on a broker, who aims to maximize
his/her profit through the composition of cloud collaborations from a set of
providers and assignment of users to these collaborations. In that assignment,
QoS and security requirements, i. e., non-functional attributes, are to be consid-
ered and fulfilled. This work extends the previously introduced exact optimiza-
tion solution approach with a heuristic approach that improves the computa-
tional time in the context of cloud markets.
The remainder of this paper is structured as follows: In Section 2, we briefly
describe the problem and the formal optimization model, we discussed in our
position paper [5]. Section 3 introduces a heuristic approach, CCCP-HEU.KOM,
with deterministic and stochastic variants, which is quantitatively evaluated and
compared with the previous results. Section 4 concludes the paper.
In our work, we take the perspective of a broker, who acts within a cloud market
and unites cloud providers to build cloud collaborations and provides assignment
of cloud users to these collaborations. So, the cloud market consists of a set of
providers P = {1, 2, . . . , P # } and a set of users U = {1, 2, . . . , U # }. We define
resource demand of each user u ∈ U as RDu ∈ R+ units, for which he/she is
willing to pay a total of Mu+ ∈ R+ monetary units. Resource supply of each
cloud provider p ∈ P is defined as RSp ∈ R+ units at a total cost of Mp− ∈ R+ .
We define QoS and security constraints as non-functional constraints and
distinguish two sets of quantitative A = {1, 2, . . . , A#} and qualitative Â =
{1, 2, . . . , Â#} non-functional attributes. Quantitative attributes represent nu-
merical properties, e. g., availability. Qualitative attributes depict nominal prop-
erties, e. g., applied security policies. The providers make certain guarantees
with respect to the non-functional attributes. For each quantitative attribute
a ∈ A, the value guaranteed by provider p ∈ P is denoted as AGp,a ∈ R.
For each qualitative attribute â ∈ Â, the corresponding information is given by
AĜp,â ∈ {0, 1}. The users specify certain requirements concerning their non-
functional attributes. With respect to each quantitative attribute a ∈ A, the
value required by user u ∈ U is denoted as ARu,a ∈ R. Likewise, AR̂u,â ∈ {0, 1}
denotes the requirement for each qualitative attribute â ∈ Â, i. e., indicates
whether this attribute is mandatory or not.
Based on these notations, the CCCP can be represented as an optimization
model, as shown in Model 1. We define xu,c and yp,c as the main decision vari-
ables in the model (cf. Equation 11). They are binary and indicate whether user
u or provider p is assigned to collaboration c or not. We further introduce ȳp,c as auxil-
iary decision variables, which are binary as well and indicate the non-assignment
of a provider p to a collaboration c. Furthermore, za,c and ẑâ,c are defined as real
and binary, respectively, and represent the cumulative value of the non-functional
property a or â, respectively, for collaboration c (cf. Equation 12).
such that
Σ_{c∈C} xu,c ≤ 1    ∀u ∈ U    (2)
Σ_{c∈C} yp,c ≤ 1    ∀p ∈ P    (3)
yp,c + ȳp,c = 1    ∀p ∈ P, ∀c ∈ C    (4)
Σ_{u∈U} xu,c × RDu ≤ Σ_{p∈P} yp,c × RSp    ∀c ∈ C    (5)
za,c ≤ yp,c × AGp,a + ȳp,c × max_{p∈P}(AGp,a)    ∀p ∈ P, ∀c ∈ C, ∀a ∈ A    (6)
ẑâ,c ≤ yp,c × AĜp,â + ȳp,c    ∀p ∈ P, ∀c ∈ C, ∀â ∈ Â    (7)
ẑâ,c ≥ xu,c × AR̂u,â    ∀u ∈ U, ∀c ∈ C, ∀â ∈ Â    (9)
ȳp,c ∈ {0, 1}    ∀p ∈ P, ∀c ∈ C    (12)
za,c ∈ R    ∀a ∈ A, ∀c ∈ C
ẑâ,c ∈ {0, 1}    ∀â ∈ Â, ∀c ∈ C
users and providers will be reduced. At the end, a set P̂ of NFAs-valid assign-
ments (provider - users) is built with respect to the defined NFAs. Resource
demand/supply constraints are not considered in this step.
COLLAB: Building of collaborations. In this step, we build cloud col-
laborations Ĉ, i. e., we bring together providers, who can serve the same users.
Thereby, Equations 6 and 7 are to be considered, i. e., the aggregated NFAs of
collaborative providers will be defined by the worst ones. The set of valid collabo-
rations is the intersecting set of P̂. The intersection can be computed
in two ways: deterministic and stochastic. In the deterministic approach (Al-
gorithm 3), the complete set P̂ is searched through: all permutations of
users û ∈ Û from the assign.Pp̂ lists are compared (lines 7-12). Thus, we
have P̂# ∗ 2^(Û#) possibilities (single-provider sets and empty sets are excluded),
which in the worst case leads to asymptotically exponential runtime in Û#, namely
O(P̂# ∗ 2^(Û#)). In the stochastic approach, we generate a random subset of the
set P̂ (Algorithm 4), where not all permutations are considered. Replacing
the input P̂ of Algorithm 3 by the generated subset improves the
algorithm and leads to asymptotically polynomial runtime.
RCHECK: Checking of resource constraints. In this step, we check re-
source constraints (as defined in Model 1). As shown in Algorithm 5, firstly, the
quotients Qû = Mû+ /RDû (willingness to pay for a resource unit) will be calcu-
lated for all users from the provider-users assignment list P̂. These quotients
are then sorted in descending order with respect to our objective
function, namely profit maximization (lines 5-9). Thus, the users with the highest
willingness to pay are considered first.
COMPOSE: Composition of cloud collaborations. In this step, the best
composition of cloud collaborations is selected. As only one collaboration is
allowed for each provider and each user simultaneously, duplicates must be
eliminated. Thus, the cloud collaborations with the same collaborative partners are
examined and the constellation with the maximum profit for
the broker is selected. The selected collaborations then build the complete
solution of the CCCP, CCCP sol. As shown in Algorithm 6, each collaboration
c ∈ C produces a certain profit P Rc. To provide an optimal solution, the most
profitable collaborations must be selected to fulfill the objective function. We
apply here again the greedy principle and go through all collaborations. In lines
3-7, the collaborations that include the same collaborative partners are
compared, and the collaboration with the best profit CCCP best is added
to the complete solution CCCP sol. Thus, the composition of cloud collaborations
occurs in polynomial time.
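A minimal sketch of the greedy RCHECK idea under the stated ordering by Qû = Mû+/RDû; the admission loop, the profit notion, and the numbers below are assumptions introduced for illustration, not the exact Algorithm 5:

    # Sketch of the greedy resource check (RCHECK): users are ordered by their
    # willingness to pay per resource unit and admitted while supply lasts.
    def greedy_admit(users, supply):
        """users: list of (user_id, demand RD_u, budget M_u+); supply: total RS of the collaboration."""
        ranked = sorted(users, key=lambda u: u[2] / u[1], reverse=True)   # Q_u = M_u+ / RD_u
        admitted, revenue, remaining = [], 0.0, supply
        for user_id, demand, budget in ranked:
            if demand <= remaining:
                admitted.append(user_id)
                revenue += budget
                remaining -= demand
        return admitted, revenue

    admitted, revenue = greedy_admit([("u1", 4, 10.0), ("u2", 2, 8.0), ("u3", 5, 6.0)], supply=7)
    print(admitted, revenue)   # users with the best willingness to pay are served first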
3.1 Evaluation
Algorithm 2. Assignment
1: Input: set of providers P = {1, 2, . . . , P#}; set of users U = {1, 2, . . . , U#}
2: Output: set of NFAs-valid provider-users assignments P̂
3: P̂ := ∅
4: for all p ∈ P do
5:   assign.Pp := ∅
6:   for all u ∈ U do
7:     if AGp ≥ ARu and AĜp ≥ AR̂u then     ⊳ check the NFAs fulfillment
8:       assign.Pp := assign.Pp + u           ⊳ assign user u to provider p
9:     end if
10:  end for
11:  if assign.Pp = ∅ then delete p           ⊳ p cannot serve any user w.r.t. the NFAs
12:  else P̂ := P̂ + (p, assign.Pp)
13:  end if
14: end for
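For illustration, a plain-Python sketch of the ASSIGN step of Algorithm 2; the attribute names, the dictionary-based encoding of guarantees and requirements, and the helper nfa_valid are assumptions introduced here:

    # Sketch of the ASSIGN step: a user can be served by a provider only if every
    # quantitative guarantee meets the requirement and every mandatory qualitative
    # attribute is offered. Attribute names below are illustrative.
    def nfa_valid(provider_guarantees, provider_qual, user_requirements, user_qual):
        quantitative_ok = all(provider_guarantees[a] >= user_requirements[a] for a in user_requirements)
        qualitative_ok = all(provider_qual.get(a, 0) >= required for a, required in user_qual.items())
        return quantitative_ok and qualitative_ok

    def assign(providers, users):
        """Return the NFAs-valid provider -> users map (resource constraints checked later)."""
        return {p: [u for u, (req, qual) in users.items() if nfa_valid(*providers[p], req, qual)]
                for p in providers}

    providers = {"p1": ({"availability": 0.99}, {"iso27001": 1}),
                 "p2": ({"availability": 0.95}, {"iso27001": 0})}
    users = {"u1": ({"availability": 0.98}, {"iso27001": 1}),
             "u2": ({"availability": 0.90}, {})}
    print(assign(providers, users))   # p1 can serve both users, p2 only u2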
of our evaluation, i. e., the observed ratio of solved instances and the ratio of the
mean computation times in comparison to the CCCP-EXA.KOM approach, are
summarized in Table 1. As can clearly be seen, the mean computation times are
drastically improved; even for the test case (12,18), CCCP-HEUfull.KOM (the
heuristic with the full-set COLLAB component) takes only 3.46% of the compu-
tation time previously used by the exact approach. This variant shows a near-optimal
ratio of solved instances in all test cases. CCCP-HEUsub.KOM (the heuristic with
the sub-set COLLAB component) has better computation times, but the ratio
of solved instances (out of 100 problem instances) already decreases from the
test case (8,8) onwards. This also explains the drastic improvement in the CCCP-HEUsub.KOM
computation times for test cases (8,12)-(12,18), as not all solutions are exam-
ined, only those in the randomly generated sub-sets.
4 Conclusions
While cloud markets promise virtually unlimited resources, the physical infras-
tructure of cloud providers is actually limited and they may not be able to serve
the demands of large customers. A possible solution is cloud collaborations,
where multiple providers join forces to conjointly serve customers. In this work,
we introduced the corresponding Cloud Collaboration Composition Problem with
our new heuristic optimization approach CCCP-HEU.KOM, as a complement
to our prior exact optimization approach. Our evaluation results indicated dras-
tic improvements in the computation times, but also showed that the proposed
heuristic optimization approach CCCP-HEU.KOM is still rather limited and
needs further improvements, as a broker acts under rigid time constraints. In
our future work, we aim at the development of heuristic approaches with meta-
heuristics and dynamic changes. In addition, we plan to extend the proposed
model with more complex non-functional constraints.
References
1. Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J., Brandic, I.: Cloud Computing and
Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the
5th Utility. Future Generation Computer Systems 25(6), 599–616 (2009)
2. Hillier, F., Lieberman, G.: Introduction to Operations Research. McGraw-Hill (2005)
3. Johnson, D.S.: A Brief History of NP-completeness. In: Documenta Mathematica
(2012)
4. Wenge, O., Lampe, U., Müller, A., Schaarschmidt, R.: Data Privacy in Cloud
Computing–An Empirical Study in the Financial Industry. In: 20th Americas Con-
ference on Information Systems (AMCIS) (2014)
5. Wenge, O., Lampe, U., Steinmetz, R.: QoS- and Security-Aware Composition of
Cloud Collaborations. In: 4th International Conference on Cloud Computing and
Services Science (CLOSER), pp. 578–583 (2014)
Bottleneck Detection and Solution Recommendation
for Cloud-Based Multi-Tier Application
Abstract. Cloud computing has gained extremely rapid adoption in the recent
years. In the complex computing environment of the cloud, automatically
detecting application bottleneck points of multi-tier applications is practically
a challenging problem. This is because multiple potential bottlenecks can co-
exist in the system and affect each other while a management system reallocates
resources. In this paper, we tackle this problem by developing a comprehensive
capability profiling of such multi-tier applications. Based on the capability
profiling, we develop techniques to identify the potential resource bottlenecks
and recommend the additional required resources.
1 Introduction
Cloud computing has gained extremely rapid adoption in the recent years. Enterprises
have started to deploy their complex multi-tier web applications into these clouds for
cost-efficiency. Here, the cloud-based multi-tier application consists of multiple soft-
ware components (i.e., tiers) that are connected over inter- and/or intra-
communication networks in data centers. Detecting application bottleneck points of
multi-tier applications is practically a challenging problem, and yet it is a fundamental
issue for system management. Hence, it is desirable to have a mechanism to monitor
the application performance changes (e.g., application throughput changes) and then,
to correlate system resource usages of all components into the application perfor-
mance saturation for system diagnosis.
However, automatically pinpointing and correlating bottlenecked resources is not
trivial. One of the important factors we should focus on is that multiple potential bottle-
necks can co-exist, typically oscillating back and forth between distributed re-
sources in multi-tier applications [6, 9], and they affect each other while a man-
agement system performs resource reallocations to resolve the immediate bottlenecks
observed individually. Therefore, certain potential and critical bottlenecks may not be
noticed in time until other bottlenecks are completely resolved. In this paper, we tackle
this problem by developing a comprehensive capability profiling of such multi-tier
more because the system bottleneck occurs. Figure 2 illustrates the bottleneck pattern
of the throughput. In the figure, we have plotted normalized throughput of the applica-
tion against normalized load to the application. To capture the knee point, our system
first generates a linear line that connects the first measurement point to the last mea-
surement point and then, computes its length (i.e., z in the figure). Second, at each
measurement point according to the measurement window size, we compute the length
of the orthogonal line drawn from the linear line to the measurement point (i.e., the
height hk in the figure, where k is each measurement point). To compute the height of
each measurement point, it generates two lines and computes their lengths (i.e., xk and
yk in the figure). First line is drawn from the first measurement point to the current
measurement point, and the second line is from the current measurement point to the
last measurement point. Then, using cosine rule and sine rule, the height is computed
as following,
hk = xk · sin(cos⁻¹((xk² + z² − yk²) / (2 xk z)))
Finally, the knee point is the measurement point that has the highest height from the
linear line among all measurement points. And this knee point indicates the capability
of this application (i.e., potential bottleneck starting point of the application or the tier
being considered).
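A self-contained sketch of this knee-point computation on a normalized (load, throughput) curve; the sample curve is illustrative:

    import math

    # Sketch of the knee-point detection described above: the knee is the measurement
    # point with the largest orthogonal distance h_k from the line that joins the
    # first and last (normalized load, normalized throughput) measurements.
    def knee_point(points):
        first, last = points[0], points[-1]
        z = math.dist(first, last)
        best_k, best_h = 0, -1.0
        for k, p in enumerate(points[1:-1], start=1):
            x_k = math.dist(first, p)
            y_k = math.dist(p, last)
            cos_angle = (x_k ** 2 + z ** 2 - y_k ** 2) / (2.0 * x_k * z)          # cosine rule
            h_k = x_k * math.sin(math.acos(max(-1.0, min(1.0, cos_angle))))       # sine rule
            if h_k > best_h:
                best_k, best_h = k, h_k
        return best_k

    # Throughput saturating around 60% of the load range: the knee is detected there.
    curve = [(0.0, 0.0), (0.2, 0.35), (0.4, 0.7), (0.6, 0.95), (0.8, 0.98), (1.0, 1.0)]
    print(knee_point(curve))   # -> 3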
where L is the amount of load, αj is the change rate of resource usage (e.g., a slope
in a linear function), and γj is an initial resource consumption in the system. We can
obtain αj and γj by calibrating the function to fit into actual curve observed. In this
fitting, we use the change rate of resource usage before knee point.
According to Equation 1, the throughput Tj is a function of the resource
usage Uj, given that all other resource types have unlimited capacities (i.e., Cr' = ∞).
This implies that Tj reaches its maximum when the resource being consi-
dered is completely utilized (e.g., Uj = 1, when it is normalized). Thus, from Equation
1, we can derive the maximum load, which this resource can undertake at its
knee point, as follows:
(3)
We can compute the maximum loads of all the different resource types in the
same way, producing a set of maximum loads {L1, L2, L3, …, Ln}, where n
is the number of resource types in the system. Once we have obtained the set of all
maximum loads, finding the bottleneck resource, intuitively enough, amounts to finding the
resource that has the lowest maximum load, since the resource having the lowest
maximum load has the earliest knee point, i.e., it is the argmin over the set of maximum loads.
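A small sketch of this bottleneck test, assuming the linear usage model Uj = αj·L + γj stated above so that resource j saturates (Uj = 1) at load (1 − γj)/αj (the exact form of Equation 3 is not reproduced in this extract); the calibration values are illustrative:

    # Sketch: maximum load per resource type under the assumed linear usage model,
    # and the bottleneck as the resource with the smallest such maximum load.
    def max_loads(alphas, gammas):
        return {r: (1.0 - gammas[r]) / alphas[r] for r in alphas}

    def bottleneck(alphas, gammas):
        loads = max_loads(alphas, gammas)
        return min(loads, key=loads.get), loads

    # Illustrative calibration values for the four resource types of a database tier.
    alphas = {"cpu": 0.00105, "mem": 0.00035, "net": 0.00006, "disk": 0.0000125}
    gammas = {"cpu": 0.03, "mem": 0.03, "net": 0.05, "disk": 0.01}
    print(bottleneck(alphas, gammas))   # cpu has the lowest maximum load, hence the earliest knee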
T = β L    (4)
where L is the amount of load and β is the change rate of throughput. Similarly,
we can obtain β by calibrating the function to fit an actual curve. As mentioned
earlier, the correlation between the load and the application throughput can be defined
as a non-linear function. However, we have focused on the throughput before the knee
point in our performance modeling, and observed the linear function is a good ap-
proximation while calibrating the function. By substituting Equation 4 into Equation 2
in the context of L, we have
(5)
In Equation 6, if we define the target throughput as T*, and the required perfor-
mance capability value of a resource type j as Uj* (i.e., the usage rate required to
achieve T*), we can replace T and Uj with T* and Uj*, respectively, in the equation.
Here, (αj / β) indicates the normalized increase rate of the resource usage needed to
increase a unit of throughput. Thus, the equation indicates how much resource capability is re-
quired to meet T*. Note that if Uj* is more than 1, more resource ca-
pability is required to meet T* than is currently available in the configuration; in this
case the normalized resource shortage is ∆j = Uj* − 1. With this equation, the
required capability of component x for the workload and its
throughput goal is the set of such required performance capability values:
{ U1*, U2*, U3*, …, Un* }    (6)
4 Preliminary Evaluation
To evaluate our approach, we have used an online auction web transaction workload,
called RUBiS (http://rubis.ow2.org) that is deployed as a 3-tier web application includ-
ing Apache web server, Tomcat servlet server, and back-end MySQL database server.
The workload provided by RUBiS package consists of 26 different transaction types
such as “Home,” “Search Category”. Some of transactions need database read or write
transactions, while some of them only need HTML documents. We have created a
database-intensive workload by increasing the rate of database reads/writes so as to make
the MySQL server tier the bottleneck.
Figure 4 shows the throughput curves of the Apache web server, the Tomcat server, and the MySQL database server. The figure points out 3 knee points (the red circles in the figure) that have been computed by the technique described in Section 2.1. The earliest knee point has been observed in the database tier, as shown in the figure, and it correctly indicates that the database tier is bottlenecked for the database-intensive workload. As mentioned above, we intentionally set up the workload to make the database server the bottleneck. Note that the throughputs of the 3 servers differ since, by workload setup, some user requests are controlled not to go through all tiers. As shown in Figure 5, CPU is obviously the bottlenecked resource type in the current configuration. This can be identified by computing the earliest knee point, similar to the way the bottlenecked tier was identified above. Note that the knee points of disk I/O and network I/O are located at the last measurement points. This is because these resource types have no obvious knee points, so the last measurement point is used.
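The knee points themselves are computed with the technique of Section 2.1, which is not reproduced here; as a generic stand-in, one common heuristic is to pick the measurement farthest from the chord joining the first and last points of the curve, falling back to the last point when no clear knee exists.

```python
import numpy as np

def knee_point_index(loads, throughputs):
    """Generic knee heuristic (a stand-in, not the paper's Section 2.1 method):
    the measurement farthest from the chord between the first and last points;
    fall back to the last point when the curve shows no pronounced bend."""
    x = np.asarray(loads, dtype=float)
    y = np.asarray(throughputs, dtype=float)
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    chord_len = np.hypot(dx, dy)
    # Perpendicular distance of every measurement to the chord.
    dist = np.abs(dx * (y - y[0]) - dy * (x - x[0])) / chord_len
    idx = int(np.argmax(dist))
    return idx if dist[idx] > 1e-6 * chord_len else len(x) - 1

loads = [100, 300, 500, 700, 900, 1100]
throughputs = [95, 280, 460, 600, 640, 650]   # throughput flattens after the knee
print(knee_point_index(loads, throughputs))   # index of the knee measurement
```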
Alternatively, we can identify the bottlenecked resource type by computing the maximum load that each resource type can handle, as described in Section 2.2. The result is the set {925.5, 2762.5, 15840.4, 79204.4}, which represents {L_CPU^max, L_Mem^max, L_NW^max, L_Disk^max}, the maximum loads of CPU, memory, network, and disk, respectively. It also shows that CPU is the bottlenecked resource type because L_CPU^max is the lowest maximum load. The maximum throughput of the database tier in Figure 4 shows a similar amount of load at the knee point, which indicates that our performance model is accurate enough to compute the resource shortage. Note that we have also measured the resource usages in the Web and App tiers and observed significant under-utilization of all resources, so those tiers have very high maximum loads.
Fig. 4. Knee points of the 3-tier application
Fig. 5. Resource usages in the DB tier
5 Related Work
As cloud computing has gathered pace and most enterprises move toward the agile hosting of multi-tier applications in public clouds, many researchers have focused on three different research directions: 1) updating application architectures to move from legacy systems to clouds [3], 2) evaluating the functional and non-functional attributes of different clouds so that cloud users can correctly decide which cloud should host their applications [2, 4, 7, 8], and 3) efficiently orchestrating virtual appliances in a cloud, which may also include negotiations with cloud users. While some highly related previous work has principally focused on estimating rudimentary cloud capabilities using benchmarks [8] and automated performance testing [9], our approach focuses on the precise characterization of application capabilities in a cloud infrastructure.
Analytical models like [1, 5] have been proposed for bottleneck detection and performance prediction of multi-tier systems. They predict system performance under bursty workloads and then determine how much resource should be allocated to each tier of the application to meet the target system response time. There are also numerous efforts that have addressed the challenges of managing cloud application performance. For example, [10, 11, 12] are based on a very detailed understanding of the system resource utilization characteristics. Performance management solutions like AzureWatch (http://www.paraleap.com/azurewatch) continuously monitor the utilization of the various resource types and send a notification once they are saturated.
References
1. Urgaonkar, B., Shenoy, P., Chandra, A., Goyal, P.: Dynamic provisioning of multi-tier internet appli-
cations. In: Int. Conf. on Autonomic Computing, pp. 217–228 (2005)
2. Jayasinghe, D., Malkowski, S., Wang, Q., et al.: Variations in performance and scalability when mi-
grating n-tier applications to different clouds. In: Int. Conf. on Cloud Computing, pp. 73–80 (2011)
3. Chauhan, M.A., Babar, A.M.: Migrating service-oriented system to cloud computing: An experience
report. In: Int. Conf. on Cloud Computing, pp. 404–411 (2011)
4. Cunha, M., Mendonca, N., Sampaio, A.: A declarative environment for automatic performance evalu-
ation in IaaS clouds. In: Int. Conf. on Cloud Computing, pp. 285–292 (2013)
5. Mi, N., Casale, G., Cherkasova, L., Smirni, E.: Burstiness in multi-tier applications: symptoms, causes,
and new models. In: Int. Conf. on Middleware, pp. 265–286 (2008)
6. Wang, Q., Kanemasa, Y., et al.: Detecting Transient Bottlenecks in n-Tier Applications through Fine-
Grained Analysis. In: Int. Conf. on Distributed Computing Systems, pp. 31–40 (2013)
7. Calheiros, R., Ranjan, R., Beloglazov, A., DeRose, A.C., Buyya, R.: CloudSim: A toolkit for model-
ing and simulation of cloud computing environments and evaluation of resource provisioning algo-
rithms. Software: Practice and Experience 41, 23–50 (2011)
8. Yao, J., Chen, S., Wang, C., Levy, D., Zic, J.: Accountability as a service for the cloud. In: IEEE Int.
Conf. on Services Computing (SCC), pp. 81–88 (2010)
9. Malkowski, S., Hedwig, M., Pu, C.: Experimental evaluation of n-tier systems: Observation and anal-
ysis of multi-bottlenecks. In: Int. Sym. on Workload Characterization, pp. 118–127 (2009)
10. Abdelzaher, T.F., Lu, C.: Modeling and performance control of internet servers. In: Int. Conf. on De-
cision and Control, pp. 2234–2239 (2000)
11. Diao, Y., Gandhi, N., Hellerstein, J.L., Parekh, S., Tilbury, D.M.: Using MIMO feedback control to
enforce policies for interrelated metrics with application to the Apache web server. In: Network Oper-
ation and Management Symposium, pp. 219–234 (2002)
12. Diao, Y., Hu, X., Tantawi, A.N., Wu, H.: An adaptive feedback controller for sip server memory over-
load protection. In: Int. Conf. on Autonomic Computing, pp. 23–32 (2009)
Towards Auto-remediation in Services Delivery:
Context-Based Classification
of Noisy and Unstructured Tickets
Abstract. Service interactions account for a major source of revenue and employment in many modern economies, and yet the service operations management process remains extremely complex. The ticket is the fundamental management entity in this process, and the resolution of tickets remains largely human intensive. A large portion of these human-executed resolution tasks are repetitive in nature and can be automated. Ticket description analytics can be used to automatically identify the true category of the problem; combined with automated remediation actions, this considerably reduces the human effort. We look at monitoring data in a large provider's domain and abstract out the repeatable tasks from the noisy and unstructured human-readable text in tickets. We present a novel approach for automatic problem determination from this noisy and unstructured text. The approach uses two distinct levels of analysis: (a) correlating different data sources to obtain a richer text, followed by (b) context-based classification of the correlated data. We report on the accuracy and efficiency of our approach using real customer data.
1 Introduction
A Service System (SS) is an organization composed of (a) the resources that support, and (b) the processes that drive, service interactions in order to meet customer expectations. Due to the labor-intensive processes and their complex inter-dependencies, these environments are often at risk of missing performance targets.
To mitigate this risk, and in keeping with the underlying philosophy of "what gets measured, gets done", every SS defines a set of measurement tools that provide insights into the performance of its operational processes. One such set of tools comprises the event management and ticketing tools. Event management is a key function for monitoring and coordinating across several infrastructure components. Ticketing systems record problems that are logged and resolved in the environment. When integrated with the event management setup, ticketing systems enable proactive and quick reaction to situations. Together they help deliver continuous up-time of business services and applications.
Figure 1 shows an integrated event management and ticketing system that traces the life-cycle of a problem ticket in the customer's domain. Lightweight agents or probes
(shown on the left of Figure 1) are configured to monitor the health of the infrastructure, including servers, storage and network. The collected data is fed into the event management server (i.e., component 2), whose main functions include: (a) continuous collection of real-time data from probes on endpoints, and (b) for each of the data streams, configuring individual event rules to indicate when an event should be triggered. Some examples of event rules are: CPU usage is above a utilization threshold, available disk space is below a space threshold, a service is not running, etc. All generated events are stored in the event management database, which can process up to a million events per day. All events can be routed to a manual monitoring team (3A). That team monitors the consoles for alerts, clears the ones that are informational, and raises tickets (4A) in the Ticket DB for the critical ones. Manual handling of a large volume of events results in a human-intensive component. In contrast, automated handling (3B) of events enables auto-clearing of certain classes of bulk alerts. Some event management systems also allow remediation actions to be automatically triggered on event occurrence. If the action is successful, the event is auto-cleared and no ticket is created. If not, a ticket is raised automatically (4B). The path 1-2-3A-4A comprises the manual workflow for event-to-ticket handling, whereas the path 1-2-3B-4B comprises auto-ticket handling. At present the majority of systems continue to operate in manual workflow mode. The main reason for this is that auto-ticketing generates a large volume of tickets. In the absence of reliable auto-clear systems, all tickets have to be manually closed by the monitoring team, thereby adding more manual work to the system.
Thus, in a service delivery environment (SDE), auto-classification of the ticket symptom and auto-remediation through a corrective action are critical. There have been many relevant works [6], [7], [4], [1], [3], [5] in the area of ticket analytics on structured and unstructured data. What makes SDE data particularly challenging is that it is extremely noisy, unstructured and often incomplete. In this paper we present a novel approach for automatic problem determination from such noisy and unstructured text.
The work in [1] comes very close to the text classification approach used in our work. It shows the limitations of SVM-like techniques in terms of scalability and proposes a notion of a discriminative keyword approach. However, it falls short of using context-based analysis to refine the results, which is one of the key differentiating factors in our approach. Another work based on a discriminative term approach is [2]; it focuses on commonly used text classification data sets rather than service tickets. The work in [7] approaches the problem by mining resolution sequence data and does not access the ticket description at all. The rest of the paper is organized as follows. Section 2 introduces our 2-step approach for correlating tickets with event data and classifying the correlated data using context-based classification. In Section 3, we present our experimental results, and we conclude with Section 4.
while the ticket text gives context-specific details. This combination is used for classification as described in the following section. In cases where the heuristic fails to identify the correct relevant event and the confidence is low, we proceed with the ticket text only.
Logical Structure for Unstructured and Noisy Tickets: Semantics is required to understand the contextual information in tickets. Identifying semantic information in unstructured and noisy service delivery tickets is difficult. Furthermore, these tickets are syntactically ill-formed sentences. Hence we define a logical structure for these tickets, as shown in Figure 2. The logical structure contains two sub-structures: category dependent and category independent. The category-dependent structure stores the information corresponding to a specific output category. The category-independent structure stores the information present in the ticket that is independent of the output categories. Below we describe the components of each of these two sub-structures.
(a) Discriminative specific words: help in discriminating an output category from other categories.
(b) Discriminative generic words: help in discriminating categories but are less specific than discriminative specific words.
(c) Context defining words: constitute the contextual keywords. By themselves they do not help in discrimination but are useful for capturing the contextual information of a category.
(d) Special specific patterns: regular expressions which help in discriminating an output category from other categories.
(e) Special generic patterns: regular expressions for discrimination, but less specific than special specific patterns.
(f) Domain invariant words: help in identifying the contextual information. Context
defining words help in identification of contextual information related to a partic-
ular category, whereas domain invariant words help in identification of contextual
information in general.
(g) Domain invariant patterns: regular expressions which help in identifying the con-
textual information.
Table 1. Sample words and patterns for discriminating Disk C Full category
For example, consider the Disk C Full output category. For discriminating this category from other output categories, words such as disk and drive, and patterns such as *c: drive*, help most when compared to words such as percent, threshold, etc. (Table 1). Words such as not, too, and db2 help in identifying contextual information.
For each of the pre-defined output categories, we instantiate a logical structure and populate the fields of the structure with domain-specific keywords and patterns (which we call the domain dictionary). By comparing against the fields of the logical structure of a category, we assign the words/tokens in the ticket to the appropriate fields of the logical structures of the ticket. For a ticket, we define and populate N logical structures corresponding to the N output categories. The category-independent structure remains the same in all these N logical structures. The following notation is used throughout this paper:
Let ti denote the i-th ticket, Li denote the set of logical structures of ticket ti, lik denote the k-th logical structure in the set Li, and fjik denote the j-th field in the logical structure lik. We define a pair of logical structures (lij, lik) of a ticket ti as contextually disjoint if, for every field, at least one of the corresponding fields is empty, i.e., ∀m, either fmik = empty or fmij = empty. Based on contextual disjointness, we categorize the tickets into the following two categories:
(a) Simple Tickets: if all the highly ranked logical structures of a ticket are contextually
disjoint.
(b) Complex Tickets: if any two highly ranked logical structures of a ticket are not con-
textually disjoint.
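A minimal sketch of this test: if logical structures are represented as dictionaries mapping field names to keyword sets, a ticket is simple exactly when every pair of its highly ranked structures is contextually disjoint. The field names and sample content below are illustrative, not taken from the domain dictionary.

```python
from itertools import combinations

def contextually_disjoint(ls_a, ls_b):
    """True if, for every field, at least one of the two structures is empty there."""
    fields = set(ls_a) | set(ls_b)
    return all(not ls_a.get(f) or not ls_b.get(f) for f in fields)

def is_simple_ticket(ranked_structures):
    """Simple ticket: all highly ranked logical structures are pairwise disjoint."""
    return all(contextually_disjoint(a, b) for a, b in combinations(ranked_structures, 2))

# Illustrative logical structures of one ticket for two candidate categories.
disk_c_full   = {"discriminative_specific": {"disk", "drive"}, "context_defining": set()}
backup_failed = {"discriminative_specific": set(), "context_defining": {"not"}}
print(is_simple_ticket([disk_c_full, backup_failed]))  # True -> simple ticket
```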
Classification of simple tickets: We use a linear weight-based approach to score the logical structures of a ticket. The output category corresponding to the highest-scored logical structure is assigned to the ticket. Weights are assigned to the various fields of the logical structure based on their discriminative capability. For example, keywords belonging to discriminative specific words get a higher weight than discriminative generic words.
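A sketch of this linear scoring follows; the field weights are illustrative, chosen only to reflect that discriminative specific words and patterns outweigh the generic and context-defining ones.

```python
# Illustrative field weights, ordered by discriminative capability.
FIELD_WEIGHTS = {
    "discriminative_specific": 3.0,
    "special_specific_pattern": 3.0,
    "discriminative_generic": 1.5,
    "special_generic_pattern": 1.5,
    "context_defining": 0.5,
}

def score_structure(logical_structure):
    """Weighted count of the keywords/patterns matched in each field."""
    return sum(FIELD_WEIGHTS.get(field, 0.0) * len(matches)
               for field, matches in logical_structure.items())

def classify_simple_ticket(structures_by_category):
    """Assign the category whose logical structure scores highest."""
    return max(structures_by_category,
               key=lambda cat: score_structure(structures_by_category[cat]))
```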
Classification of complex tickets: As the logical structures of complex tickets are not contextually disjoint, linear weight-based approaches may fail to discriminate between the logical structures. Hence we need a deeper level of context-based analysis to classify complex tickets. We use a supervised learning approach to learn the contextual information from complex tickets. The keywords belonging to the various fields of the logical structures of the output categories are used as features. Feature weights are assigned based on the discriminative capability of the keywords.
To learn the global contextual information about all the output categories together, a large amount of training data is required. To circumvent this, we build a separate model for each category. A model for category i has knowledge of whether a ticket belongs to category i or not (local contextual information). We use the Support Vector Machine (SVM) method with a Radial Basis Function (RBF) kernel to build the classification engine. Complex tickets pass through all the individual models of the output categories. Since each individual model knows which tickets belong to it, globally all the tickets will be correctly classified.
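Assuming scikit-learn is available and feature vectors have already been built from the logical-structure keywords, the one-model-per-category scheme could be sketched as below; the function names and the final decision rule (strongest margin wins) are illustrative, not the exact engine described in the paper.

```python
from sklearn.svm import SVC

def train_category_models(X, labels, categories):
    """One binary RBF-kernel SVM per output category (one-vs-rest), as described above."""
    models = {}
    for cat in categories:
        y = [1 if lbl == cat else 0 for lbl in labels]
        models[cat] = SVC(kernel="rbf").fit(X, y)
    return models

def classify_complex_ticket(models, x):
    """Pass the ticket through every category model and keep the strongest positive vote."""
    scores = {cat: float(m.decision_function([x])[0]) for cat, m in models.items()}
    return max(scores, key=scores.get)
```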
Using a rule/weight-based approach for classifying simple tickets increases recall but can lower precision. To maintain higher precision, one can further validate the output of the rule-based approach using context-based analysis to filter out any misclassifications.
3 Evaluation
This section outlines the experimental analysis of the proposed approach. The methodologies have been implemented as part of a Ticket Analysis Tool from IBM called BlueFin and deployed to analyze events and tickets for several key customer accounts. We evaluate the performance of BlueFin in comparison with another popular ticket analysis tool, SmartDispatch [1], an SVM-based text-classification tool, on large datasets from several well-known real customer accounts.
For unbiased evaluation, we randomly select tickets from 7 different accounts and first manually label them into categories. For the analysis here, we consider 17 different categories of tickets for classification, as shown in Table 2. Finally we choose 5000 tickets labeled with one of these 17 categories as the ground-truth data. To measure the accuracy, we computed the Precision, Recall and F1-score for each of these categories. Note that in a multi-class or multinomial classification, the precision of the i-th category is the fraction of tickets classified correctly as i (true positives) out of all tickets classified as i (sum of true and false positives). The recall of category i is the fraction of tickets classified correctly as i (true positives) out of all tickets labeled as i in the ground-truth data (sum of true positives and false negatives). The F1-score is the harmonic mean of precision and recall. Alternatively, these can be computed from the confusion matrix by summing over the appropriate rows/columns. Note that the F1-score is a well-known measure of classification accuracy. The accuracy measures are computed for both BlueFin and SmartDispatch, and the results are shown in Table 2. In addition, we also compute the overall accuracy measures across all the categories and present them in Figure 3. Observe that the precision measure for each individual category and the overall precision are extremely good for BlueFin. Moreover, it also maintains high recall values, and thus high F1-scores, delivering significantly better performance than the existing approach in SmartDispatch for all categories.
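For reference, the per-category measures can be read off a confusion matrix exactly as described above; this is a generic computation, not tied to the Table 2 numbers, and it assumes rows hold ground-truth categories and columns hold predicted categories.

```python
import numpy as np

def per_category_scores(confusion):
    """Precision, recall and F1 per category from a square confusion matrix whose
    rows are ground-truth categories and columns are predicted categories."""
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)  # true positives / column sum
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)     # true positives / row sum
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1
```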
The major improvement in precision is attributed to the context-based analysis in BlueFin, while the higher recall is due to the enriched text set obtained using the event-ticket correlation model. To understand this in detail we look at the confusion matrices of BlueFin (Figure 4) and SmartDispatch (Figure 5). Deeper color shades in cells represent higher volumes of tickets. The diagonal elements represent the correct classifications and the non-diagonal elements are the mis-classified tickets. SmartDispatch has a higher number of overall mis-classifications. For example, consider the tickets that are originally "Windows non C drive full" but are mis-classified as "backup failed". There are 83 such mis-classifications in the case of SmartDispatch, while BlueFin has only 11; SmartDispatch mis-classifies these tickets due to the absence of contextual information. The performance of BlueFin far exceeds SmartDispatch in all categories. The reason SmartDispatch underperforms is that it only uses discriminative keywords and completely ignores contextual keywords and special patterns, which provide important discrimination in the case of noisy data.
Fig. 4. Confusion matrix for BlueFin
Fig. 5. Confusion matrix for SmartDispatch
4 Conclusion
In this paper, we proposed a novel approach for automatic problem determination from
noisy and unstructured service delivery tickets. Central to our theme is the use of two
distinct levels of analysis, namely, correlation of event and ticket data followed by con-
text based classification of the correlated data to achieve higher precision and improved
recall. Furthermore, we evaluated our approach on real customer data and the results
confirm the superiority of the proposed approach. In the future, we plan to improve the
precision of our approach by using bi-grams, tri-grams etc. as features and the recall by
increasing the size of domain dictionaries.
References
1. Agarwal, S., Sindhgatta, R., Sengupta, B.: Smartdispatch: Enabling efficient ticket dispatch
in an it service environment. In: Proceedings of the 18th ACM SIGKDD International Con-
ference on Knowledge Discovery and Data Mining, KDD 2012, pp. 1393–1401. ACM, New
York (2012), http://doi.acm.org/10.1145/2339530.2339744
2. Junejo, K., Karim, A.: A robust discriminative term weighting based linear discriminant
method for text classification. In: Proceedings of the Eighth IEEE International Conference
on Data Mining, ICDM 2008, pp. 323–332. IEEE (2008)
3. Kadar, C., Wiesmann, D., Iria, J., Husemann, D., Lucic, M.: Automatic classification of
change requests for improved it service quality. In: Proceedings of the 2011 Annual SRII
Global Conference, SRII 2011, pp. 430–439. IEEE Computer Society, Washington, DC
(2011), http://dx.doi.org/10.1109/SRII.2011.95
4. Parvin, H., Bose, A., Van Oyen, M.P.: Priority-based routing with strict deadlines and server
flexibility under uncertainty. In: Winter Simulation Conference, WSC 2009, pp. 3181–3188
(2009), http://dl.acm.org/citation.cfm?id=1995456.1995888
5. Potharaju, R., Jain, N., Nita-Rotaru, C.: Juggling the jigsaw: Towards automated problem
inference from network trouble tickets. In: Presented as part of the 10th USENIX Sym-
posium on Networked Systems Design and Implementation (NSDI 2013), pp. 127–141.
USENIX, Lombard (2013), https://www.usenix.org/conference/nsdi13/
technical-sessions/presentation/potharaju
6. Shao, Q., Chen, Y., Tao, S., Yan, E.A.X., Anerousis, N.: Easyticket: a ticket routing rec-
ommendation engine for enterprise problem resolution. Proc. VLDB Endow. 1, 1436–1439
(2008), http://dx.doi.org/10.1145/1454159.1454193
7. Shao, Q., Chen, Y., Tao, S., Yan, X., Anerousis, N.: Efficient ticket routing by resolution
sequence mining. In: Proceedings of the 14th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD 2008, pp. 605–613. ACM, New York (2008),
http://doi.acm.org/10.1145/1401890.1401964
ITIL Metamodel
Abstract. The IT Infrastructure Library (ITIL) has become the de facto standard for IT Service Management (ITSM). Despite the advantages of adopting ITIL's best practices, some problems have been identified: different interpretations due to the complexity of concepts with poor specification and formalization; different approaches to the same problems; and difficulties exchanging process models written in different process model languages. Despite all published work, a metamodel expressing the core concepts, their relationships, and their constraints is still missing. In this paper, we propose an ITIL metamodel to reduce conceptual and terminological ambiguity, addressing the identified problems, namely: (1) describing the core concepts of ITIL to be used by other approaches; (2) allowing the integration, exchange, sharing and reuse of models; and (3) enabling the use of different modelling languages following the defined principles.
1 Introduction
The IT Infrastructure Library (ITIL) has become the de facto standard, currently the most widely accepted framework in the world, for implementing IT Service Management (ITSM) [1-3].
Despite the advantages of adopting ITIL's best practices, many organizations follow these best practices without a reference model, and some problems have been identified: (1) the complexity of ITIL concepts with poor specification and formalization, which leads to misunderstandings about these concepts; (2) different tools and methodologies that are not harmonized or grounded in a shared reference model, leading to different approaches to the same problems; and (3) the exchange of process models in different process model languages still remains a challenge [4].
Despite all the published work and books about ITIL, a metamodel expressing the core concepts, the relations between them, and their constraints and limitations is still missing, especially with academic support.
Since a model is an instance of a metamodel, an ITIL metamodel will be a model to shape ITIL: an explicit model of the constructs and rules needed to specify models within this domain of interest.
2 Related Work
3 Research Problem
ITIL processes, concepts, and relationships are specified in natural language. Without formal and commonly accepted semantics, modelling a graphical representation is complex [10]. In addition to the aforementioned problems in ITIL adoption, we identified some weaknesses in ITIL representation: (1) unclear concept definitions leading to different interpretations; (2) models developed from a language description and not from a universal referential; (3) lack of formal notation and representation, leading to loosely depicted graphical diagrams; (4) focus on the logical description of processes; (5) different approaches and methodologies for the same problems, making exchange and knowledge sharing difficult; (6) lack of holistic visibility and traceability from the theory; and (7) different approaches to implementations and tool development.
A metamodel of ITIL, as an explicit model of constructs and rules, is still needed to specify models within this defined domain of interest. The most important contribution of an ITIL metamodel would be the convergence of approaches and applications of leading vendors and a movement towards ITIL-compliant solutions [5].
Metamodels are also closely related to ontologies. Both are often used to describe relations between concepts [12], allowing us to understand the logical structures and generate semantics for best-practice frameworks [13]. We acknowledge the difference between ontologies and metamodels, since their characteristics and goals are different. However, without an ontology, different knowledge representations of the same domain can be incompatible even when using the same metamodel for their implementation [14]. While an ontology is descriptive and belongs to the domain of the problem, a metamodel is prescriptive and belongs to the domain of the solution [15]. Ontologies provide the semantics, while metamodels provide a visual interpretation of the syntax of specific languages that approximate as closely as possible the ideal representation of the domain [16]. As in a semantic model, the relation to reality and the interrelation of concepts are true if they obey a mathematical structure for all axioms and derivation rules of the structure [17]. To the best of our knowledge, there is no universally accepted ITIL metamodel serving as a reference that allows model development and provides a language basis for graphical representation.
4 Proposal
Following previously published work [4, 5, 13, 18, 19], we considered two separate, orthogonal dimensions of metamodelling: one dimension concerned with language definition and the other with representation. A language can be defined as a set of valid combinations of symbols and rules that constitute the language's syntax [11].
Firstly, we identified the core concepts from the ITIL glossary [20] by reducing the concepts to the fundamental ones, with representation needs, that should be part of the metamodel. To that aim we followed an ontology engineering process [19] and analysed the ITIL domain, clarifying abstract concepts from the ITIL books' specifications and developing the proposed metamodel. Secondly, we defined all concepts linguistically for an ontological common understanding, adding a mathematical representation to concepts and relationships. This clarification provided design rules and a modelling process, decreasing the concepts' abstraction while allowing the fundamental distinction of concept relations and interoperability. In the next step, we defined the notation, clarifying the ontological metamodel and, thus, the metamodel. The metamodel's quality was validated through the guidelines of modelling defined by Schuette [17].
Table 1. (Continued)
Action: an atomic activity αij with a defined procedure, taken from a set Ai = {αi1, αi2, …, αin} of possible actions. Actions can be performed by a Stakeholder or automatically by an Application Service asi (αi1 = f(asi)).
We mapped the ITIL concepts into the language's metamodel. The proposed ITIL metamodel formalizes expressiveness through the definition of concepts and a corresponding visual representation, shown in the following graphical representation (Fig. 1). The proposed ITIL metamodel is based on the structure illustrated in Fig. 1, which relies on the concepts presented in Table 1.
Fig. 1. The proposed ITIL metamodel: concepts Service, Contract, Event, Process, Role, Stakeholder, Record, Activity, Action, Data, Application, and Infrastructure, together with the relationships among them (e.g., a Service is associated with a Contract, an Event triggers a Process, a Role is supported by a Stakeholder, an Activity creates a Record and is realised by Actions, and Applications use Data, support Actions, and are supported by Infrastructure)
We used the ArchiMate [26] notation to graphically represent the metamodel for no other reason than ease of use; the metamodel could be represented in any other notation. This generalization makes it possible to model in different languages, and it enables the integration and reuse of models.
We have modelled an overview of all of ITIL's five books [24, 25] to understand which services (and from which books) ITIL can provide to its external environment. We have also modelled each ITIL book, showing which applications ITIL uses to support its processes, as well as the infrastructure components that support those applications. This provides a top view, treating the ITIL core processes as a black-box system that provides services to the environment while using all the ITIL processes. We modelled each of ITIL's processes at a deeper, fine-grained level of representation, which allows us to look inside ITIL's processes and see all of their individual activities. These models are consistent, since the processes' inputs and outputs and the business, application and infrastructure services are the same, although at different granularity levels. We also mapped the activity sequence of ITIL's Incident Management process in two different notations (ArchiMate [26] and BPMN [6]), which matched almost completely. We realized that it would be harder to integrate two approaches if they did not speak the same language. Therefore, a common frame of reference provided by the ITIL metamodel is warranted. Even in the absence of a formal graphical language we are able to model ITIL using the proposed metamodel.
For the purpose of our research, a high-level check of the utility, correctness, consistency and completeness of the ITIL metamodel has been performed. Schuette [17] defines guidelines for evaluating metamodel quality, which are very similar to the set of design criteria for ontologies [15]: clarity, coherence, extendibility, minimal encoding bias, and minimal ontological commitment.
6 Conclusion
Understanding ITIL's concepts and relationships from the ITIL reference books is hard and requires a lot of time and effort. Different organizations and service providers develop their own models for ITIL adoption without a metamodel or a common referential, making it difficult to share and communicate ITIL models between different stakeholders. We developed an ITIL metamodel, providing an academic contribution to this area that was not available by the time we started this research.
An ITIL metamodel is per se a valuable contribution. However, the main contribution of this proposal lies in defining a metamodel that helps ITIL adopters with a universal identification of concepts and the relationships among them, independently of the approach, language or tool used. We identified the core concepts of ITIL's service lifecycle and the relationships among them, proposing an ITIL metamodel. Our approach keeps the semantics of the core concepts intact and thus allows for the reuse of models and reasoning over the customized metamodel.
Our proposed metamodel may serve as a basis to model and implement ITIL. Moreover, it enables the sharing and reuse of models from one approach to another, even with different modelling languages, improves the representation of ITIL concepts, and helps promote ITIL discussion and validation within the ITIL community itself.
References
1. Hochstein, A., Zarnekow, R., Brenner, W.: ITIL as Common Practice Reference Model for
IT Service Management: Formal Assessment and Implications for Practice. In: Interna-
tional Conference on e-Technology, e-Commerce and e-Service (EEE 2005), pp. 704–710.
IEEE Computer Society (2005)
2. Correia, A., Abreu, F.B.E.: Integrating IT Service Management within the Enterprise Ar-
chitecture. In: 4th International Conference on Software Engineering Advances (ICSEA),
pp. 553–558. IEEE, Porto (2009)
3. Gama, N., Sousa, P., Mira da Silva, M.: Integrating Enterprise Architecture and IT Service
Management. In: 21st International Conference on Information Systems Development
(ISD 2012), Springer, Prato (2012)
4. Shen, B., Huang, X., Zhou, K., Tang, W.: Engineering Adaptive IT Service Support
Processes Using Meta-modeling Technologies. In: Münch, J., Yang, Y., Schäfer, W. (eds.)
ICSP 2010. LNCS, vol. 6195, pp. 200–210. Springer, Heidelberg (2010)
5. Strahonja, V.: Definition Metamodel of ITIL. Information Systems Development Chal-
lenges in Practice, Theory, and Education 2, 1081–1092 (2009)
6. Object Management Group: Business Process Model and Notation (BPMN). V 2.0 (2011)
7. OMG: MDA Guide Version 1.0. The Object Management Group (OMG) (2003)
ITIL Metamodel 493
8. OMG: MetaObject Facility (MOF) 2.0 Core Specification Version 2.4.1. OMG Adopted
Specification. The Object Management Group (OMG) (2003)
9. Jantti, M., Eerola, A.: A Conceptual Model of IT Service Problem Management. In: Inter-
national Conference on Service Systems and Service Management (ISSSM 2006), Troyes,
France, vol. 1, pp. 798–803 (2006)
10. Valiente, M.-C., Garcia-Barriocanal, E., Sicilia, M.-A.: Applying an Ontology Approach
to IT Service Management for Business-IT Integration. Knowledge-Based Systems 28,
76–87 (2012)
11. Guizzardi, G.: On Ontology, ontologies, Conceptualizations, Modeling Languages. In:
Vasilecas, O., Ede, J., Caplinskas, A. (eds.) Frontiers in Artificial Intelligence and Appli-
cations, Databases and Information Systems IV, pp. 18–39. IOS Press (2007)
12. Söderström, E., Andersson, B., Johannesson, P., Perjons, E., Wangler, B.: Towards a
Framework for Comparing Process Modelling Languages. In: Pidduck, A.B., Mylopoulos,
J., Woo, C.C., Ozsu, M.T. (eds.) CAiSE 2002. LNCS, vol. 2348, pp. 600–611. Springer,
Heidelberg (2002)
13. Neto, A.N.F., Neto, J.S.: Metamodels of Information Technology Best Practices Frame-
works. Journal of Information Systems and Technology Management (JISTEM) 8,
619–640 (2011)
14. Calero, C., Ruiz, F., Piattini, M.: Ontologies for Software Engineering and Software Tech-
nology. Springer (2006)
15. Gruber, T.R.: Toward principles for the design of ontologies used for knowledge sharing.
International Journal of Human-Computer Studies 43, 907–928 (1995)
16. Baioco, G., Costa, A., Calvi, C., Garcia, A.: IT Service Management and Governance
Modeling an ITSM Configuration Process: A Foundational Ontology Approach. In: Inter-
national Symposium on Integrated Network Management-Workshops (IM 2009)
IFIP/IEEE, New York, pp. 24–33 (2009)
17. Schuette, R., Rotthowe, T.: The Guidelines of Modeling - An Approach to Enhance the
Quality in Information Models. In: Ling, T.-W., Ram, S., Li Lee, M. (eds.) ER 1998.
LNCS, vol. 1507, pp. 240–254. Springer, Heidelberg (1998)
18. Atkinson, C., Kühne, T.: Model-Driven Development: A Metamodeling Foundation. IEEE
Software 20, 36–41 (2003)
19. Ostrowski, L., Helfert, M., Xie, S.: A Conceptual Framework to Construct an Artefact for
Meta-Abstract Design. In: Sprague, R. (ed.) 45th Hawaii International Conference on Sys-
tem Sciences (HICSS), pp. 4074–4081. IEEE, Maui (2012)
20. OGC: ITIL Glossary of Terms, Definitions and Acronyms (2007)
21. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design Science in Information Systems Re-
search. MIS Quarterly 28, 75–105 (2004)
22. Vaishnavi, V.K., William Kuechler, J.: Design Science Research Methods and Patterns:
Innovating Information and Communication Technology. Auerbach Publications, Boston
(2007)
23. Lankhorst, M.: Enterprise Architecture at Work. Springer (2009)
24. Vicente, M., Gama, N., Mira da Silva, M.: Using ArchiMate to Represent ITIL Metamo-
del. In: 15th IEEE Conference on Business Informatics (CBI), IEEE (2013)
25. Vicente, M., Gama, N., da Silva, M.M.: Using archiMate and TOGAF to understand the
enterprise architecture and ITIL relationship. In: Franch, X., Soffer, P. (eds.) CAiSE
Workshops 2013. LNBIP, vol. 148, pp. 134–145. Springer, Heidelberg (2013)
26. The Open Group: ArchiMate 2.0 Specification. Van Haren Publishing (2012)
27. Oates, B.J.: Researching Information Systems and Computing. Sage Publications (2006)
Formal Modeling and Analysis
of Home Care Plans
Abstract. A home care plan defines all the services provided for a given
patient at his/her own home and permits the coordination of the involved
health care professionals. In this paper, we present a DSL (Domain spe-
cific language) based approach tailored to express home care plans using
high level and user-oriented abstractions. Then we describe how home
care plans, formalized as timed automata, can be automatically gener-
ated from these abstractions. We finally show how verification and moni-
toring of the resulting care plan can be handled using existing techniques
and tools.
1 Introduction
A general trend observed in recent years is to enable patients, as much as possible, to stay at their own homes instead of having long-term stays at hospitals or health establishments. This trend is motivated by obvious social and economic reasons. Several types of care may be provided at a patient's home, including health services, specialized care such as parenteral nutrition, and activities related to daily living such as bathing, dressing, toileting, etc. All the medical and social activities delivered for a given patient according to certain frequencies are scheduled in a so-called care plan. Hence, the notion of a care plan is a key concept in the home care area. As part of the project Plas'O'Soins1, we are interested in the problems underlying the design and management of home care plans.
The design of a care plan is, however, a difficult task. Indeed, process modeling in the medical field is not trivial because it requires complex coordination and interdisciplinary cooperation due to the involvement of actors from various health care institutions [7]. Furthermore, care plans are essentially unstructured processes in the sense that each patient must have his/her own specific care plan. Therefore, it is simply not possible to design a unique process capturing in advance the care plans of all the patients. Another important feature of care plans lies in their associated complex temporal constraints. Indeed, the design of a
1 http://plasosoins.univ-jfc.fr/
care plan requires the specification of the frequencies of the delivered home care activities. Such specifications are expressed by healthcare professionals in natural language, usually in a compact form: every day in the morning, each Monday morning, etc. The home care activities are generally repetitive but may have irregularities or exceptions. Given the crucial role played by temporal constraints in home care plans, it appears clearly that such specifications could benefit from existing theory and tools in the area of timed systems [3]. In this paper, we use timed automata [1], one of the most widely used modeling formalisms for dealing with timing constraints, as a basis to develop a formal framework to analyze care plans.
Solving the above problem and supporting the design, analysis and verification, execution and monitoring of home care plans requires tackling a number of challenges. The first challenge consists in the design and modeling of care plans. Due to the aforementioned features of care plans, it is not feasible to ask home care professionals to describe a care plan directly in a formal language such as, for example, timed automata. To cope with this difficulty, we first propose a DSL (Domain Specific Language), a user-centered specification language tailored to express home care plans using high-level abstractions. We then define an automatic transformation of user specifications into timed automata. The resulting automaton is used to support automatic verification and monitoring of home care plans.
The paper is organized as follows. Section 2 describes the DSL-based approach, in which we mainly identify elementary temporal expressions. The general modeling process is presented in Section 3, together with the construction of the proposed automata, i.e., pattern automata, activity automata and care plan automata. Section 4 presents some verification and monitoring issues. We discuss the results of this work in Section 5.
plan for a given patient. The main building block in a care plan is the notion of activity. Our DSL includes several predefined activities identified by our analysis of the application domain. Each activity of the care plan is associated with a set of elementary temporal specifications. These specifications provide the information about the time when the activity should be performed, expressed as a quadruplet (Days, Time ranges, Period, Duration). In [6], we proposed a language that enables regular or irregular repetitions of an activity within some period to be expressed in a condensed form, similar to that used by doctors. Figure 1 shows a simple example of a specification using this language. Each row of the table corresponds to an elementary temporal specification. In the quadruplet (Days, Time ranges, Period, Duration), the Days and Time ranges fields can take different forms (patterns) to reflect the various possibilities encountered in the medical world [6]. Combining elementary specifications permits the superposition of different repetitions to be expressed. Exceptions are introduced via the keyword except. Roughly speaking, the notion of a legal schedule of a care plan activity is defined as a sequence of allowed instances of this activity which satisfies the set of temporal specifications. An appropriate external representation of the care plan is crucial to facilitate the work of the coordinator. Figure 1 shows the current GUI (Graphical User Interface) developed to support a coordinator in designing a care plan using the proposed DSL.
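As a rough illustration (not the Plas'O'Soins DSL itself), an elementary temporal specification can be read as such a quadruplet and checked against a proposed activity instance; the field names, the example specification, and the allowed() helper below are all hypothetical and ignore weekly patterns and exceptions.

```python
from datetime import date, datetime, time, timedelta

# Illustrative elementary temporal specification (Days, Time ranges, Period, Duration):
# "each Monday morning, 30 minutes, during the first quarter of 2014".
spec = {
    "days": {"Monday"},
    "time_ranges": [(time(8, 0), time(12, 0))],
    "period": (date(2014, 1, 1), date(2014, 3, 31)),
    "duration": timedelta(minutes=30),
}

def allowed(spec, start, duration):
    """Check one proposed activity instance against one elementary specification."""
    day_ok = start.strftime("%A") in spec["days"]
    period_ok = spec["period"][0] <= start.date() <= spec["period"][1]
    end = (start + duration).time()
    range_ok = any(lo <= start.time() and end <= hi for lo, hi in spec["time_ranges"])
    return day_ok and period_ok and range_ok and duration <= spec["duration"]

print(allowed(spec, datetime(2014, 1, 6, 9, 0), timedelta(minutes=30)))  # a Monday -> True
```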
We recall that our main objective is to build a care plan automaton for a given patient. To achieve this objective, we propose a three-step approach which consists in: (i) mapping each elementary temporal specification into a pattern automaton, (ii) combining the pattern automata to build an activity automaton, and (iii) constructing the global care plan automaton by composing the activity automata. These different steps are described below.
needed when the activity automata are at waiting states in order to synchronize
the reset of the day and week clocks (respectively, the variables xd and xw ).
In particular we propose to synchronize on ε-transitions (with reset) [5] when their origin states are waiting ones. A more formal definition follows.
Definition 1. (Composition of timed automata) Let A1 = (S1, s10, Σ1, X1, Inv1, T1, W1, E1, St1) and A2 = (S2, s20, Σ2, X2, Inv2, T2, W2, E2, St2) be two timed automata. The composition of A1 and A2, denoted A1 × A2, is the timed automaton (S1 × S2, (s10, s20), Σ1 ∪ Σ2, X1 ∪ X2, Inv, T), where Inv(s1, s2) = Inv(s1) ∧ Inv(s2) and the set of transitions T is the union of the following sets:
1. {((s1, s2), ε, φ, λ, (s1′, s2′)) : (s1, ε, φ1, λ1, s1′) ∈ T1 and (s2, ε, φ2, λ2, s2′) ∈ T2, s1 and s2 are both ∈ W}.
2. {((s1, s2), a, φ, λ, (s1′, s2′)) : ((s1, a, φ1, λ1, s1′) ∈ T1, s2′ = s2) or ((s2, a, φ2, λ2, s2′) ∈ T2, s1′ = s1), s1 and s2 are both ∈ St}.
3. {((s1, s2), a, φ, λ, (s1′, s2′)) : ((s1, a, φ1, λ1, s1′) ∈ T1, s2′ = s2, s2 ∈ W/St, s1 ∈ E) or ((s2, a, φ2, λ2, s2′) ∈ T2, s1′ = s1, s1 ∈ W/St, s2 ∈ E)}.
4. {((s1, s2), a, φ, λ, (s1′, s2′)) : ((s1, a, φ1, λ1, s1′) ∈ T1, s2′ = s2, s2 ∈ W, s1 ∈ St) or ((s2, a, φ2, λ2, s2′) ∈ T2, s1′ = s1, s1 ∈ W, s2 ∈ St)}.
Figure 4 shows the result of composition of the Toilet and Injection automata.
The resulting automaton encompasses all the possible schedules of the activities
Toilet and Injection.
Monitoring of home care plans. Note that most of the activities of a care plan are manual. In the current state of affairs, the activities that have been performed are often recorded manually on paper. Our goal is to enable electronic recording of executed activities in order to keep track of the execution traces of care plans. Such information can then be used to monitor care plans. For example, compliance of execution traces w.r.t. a care plan may be checked by reducing this problem to the membership problem in the timed automata framework. Also, the monitoring system may be used to detect executions that deviate from the specification. More generally, a monitoring system can be enhanced with rules that trigger alerts when particular deviations are detected.
5 Discussion
We described in this paper an approach to generate formal specifications of home care plans, expressed as timed automata, from a set of high-level and user-oriented abstractions. We then briefly discussed how verification and monitoring of the resulting care plan can be handled using existing techniques and tools. The paper focuses on a specific pattern (i.e., the relative days pattern). An extension of this work to the other patterns is described in [8].
Our specification language can easily be extended in order to increase its expressivity and usability. Extensions are performed by introducing other patterns for defining elementary temporal expressions. For example, patterns such as n times per day or per week would be useful in a medical context.
In this study we considered only the activities of a single care plan relative to one patient. We intend to combine care plan automata to allow the planning of interventions for several patients. It is then necessary to take into account movements between patient homes and the availability of human resources. Some works [9,11] have already highlighted the interest of automata for activity planning, but in these works the automata are directly designed by experts. In our approach, the automata result from high-level specifications produced by the administrator users.
References
1. Alur, R.: Timed automata. In: Halbwachs, N., Peled, D.A. (eds.) CAV 1999. LNCS,
vol. 1633, pp. 8–22. Springer, Heidelberg (1999)
2. Alur, R., Dill, D.: A theory of timed automata. TCS (1994)
3. Alur, R., Henzinger, T.: Logics and models of real time: A survey. In: de Bakker,
J.W., Huizing, C., de Roever, W.-P., Rozenberg, G. (eds.) REX 1991. LNCS,
vol. 600, pp. 74–106. Springer, Heidelberg (1992)
4. Behrmann, G., David, A., Larsen, K.G.: A tutorial on uppaal. In: Bernardo,
M., Corradini, F. (eds.) SFM-RT 2004. LNCS, vol. 3185, pp. 200–236. Springer,
Heidelberg (2004)
5. Bérard, B., Petit, A., Diekert, V., Gastin, P.: Characterization of the expressive
power of silent transitions in timed automata. Fundam. Inf. (1998)
6. Bouet, M., Gani, K., Schneider, M., Toumani, F.: A general model for specifying
near periodic recurrent activities - application to home care activities. In: e-Health
Networking, Applications Services (Healthcom) (2013)
7. Dadam, P., Reichertand, M., Kuhn, K.: Clinical workflows - the killer application
for process-oriented information systems? Business (2000)
8. Gani, K., Bouet, M., Schneider, M., Toumani, F.: Modeling home care plan. Rap-
port de recherche RR-14-02, Limos, Clermont Ferrand, France (2014)
9. Abdeddaı̈m, Y., Maler, O.: Job-shop scheduling using timed automata. In: Berry,
G., Comon, H., Finkel, A. (eds.) CAV 2001. LNCS, vol. 2102, pp. 478–492. Springer,
Heidelberg (2001)
10. Menezes, A.L., Cirilo, C.E., de Moraes, J.L.C., de Souza, W.L., do Prado, A.F.:
Using archetypes and domain specific languages on development of ubiquitous ap-
plications to pervasive healthcare. IEEE Computer Society (2010)
11. Panek, S., Engell, S., Stursberg, O.: Scheduling and planning with timed au-
tomata. ISPSE, Elsevier (2006)
Effort Analysis Using Collective Stochastic
Model
Vugranam C. Sreedhar
1 Introduction
Strategic outsourcing (SO) happens when one company outsources part of its business to another company. A service provider and a service consumer negotiate a contract that outlines the different kinds of work that need to be done in terms of managing the consumer's business. A strategic outsourcing company, such as IBM, manages Information Technology (IT) infrastructure and applications for many different companies. A breach of contract happens when services are not delivered as negotiated in the contract. Very often, even when services are delivered on par with what is negotiated in the service level agreements (SLAs), a service consumer can quickly become unhappy when things go wrong. There are many reasons why a contract can become troubled or risky, incurring loss to a service provider. A service provider strives very hard to provide services that will increase profitability, customer loyalty and customer value. An SO contract often includes SLAs that, when violated, allow the service consumer to impose a penalty on the service provider.
A large service provider, such as IBM, has service delivery centers to manage several customers. The management of a customer's IT is broken down into different kinds of work orders (WOs). A work order can be as simple as a request to change someone's password or as complex as migrating 100 physical servers (along with their applications) to a cloud environment. Very often complex WOs are broken down into smaller WOs that are easy to track and manage. Different WOs take different amounts of time to resolve. A key question to ask is:
How much time (or effort) is needed, and hence how many full-time employees (FTEs) are needed, to resolve the work orders, say in a month or a year?
In this article we develop a collective stochastic model (CSM) to determine the total time or effort, and hence the number of FTEs, needed to resolve work orders over a certain time period such as a month or a year. The main contribution of this paper is to apply the well-established theory of collective stochastic process models, and in particular ruin theory developed in actuarial science, to model a services delivery system [7]. Modeling a services delivery system is a non-trivial exercise, and developing mathematical models will allow future researchers to optimize and gain deeper insights into the complex behavior of services delivery systems. To the best of our knowledge, ours is the first comprehensive attempt to leverage concepts from actuarial science and ruin theory to model portions of a services delivery system, and in particular, to model effort, contract loss probability, and staffing requirements.
Work orders arrive one at a time and each work order is independent of the others. Let {N(t), t ≥ 0} denote the number of work orders processed before time t. We assume that N(0) = 0 and N(t) ≥ 0, ∀t ≥ 0. In other words, there are no work orders before t = 0, and there cannot be a negative number of work orders. Therefore, N(t) is non-decreasing in t. For s < t, N(t) − N(s) equals the number of work orders in the time interval (s, t]. We can now define the n-th work order arrival time as Tn = inf{t ≥ 0 : N(t) = n} and the inter-arrival time of work orders as An = Tn − Tn−1. The model described above captures the basic set of assumptions needed to describe work order arrivals. It is important to keep in mind that N(t), Tn, and An are all random variables and, for n ≥ 0, they form a stochastic process.
A (homogeneous) Poisson process is a very simple stochastic process that has two important properties: the independence property and the stationarity property. The independence property states that for ∀i, j, k, 0 ≤ ti ≤ tj ≤ tk, N(tj) − N(ti) is independent of N(tk) − N(tj). In other words, the numbers of events in disjoint intervals are independent of each other. The stationarity property states that ∀s, t, 0 ≤ s < t, h > 0, N(t) − N(s) and N(t + h) − N(s + h) have the same distribution.
A homogeneous Poisson process is too restrictive when we include the time it takes to resolve a work order. We next assume that the time it takes to resolve a work order, that is, the effort, is itself a random variable. We use a collective Poisson process to model this situation. A stochastic process {S(t), t ≥ 0} is called a collective Poisson process if it can be represented as follows:
S(t) = C1 + C2 + · · · + C_N(t) = Σ_{i=1}^{N(t)} Ci ,   t ≥ 0    (1)
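A Monte Carlo sketch of the collective (compound) Poisson effort in Equation 1 follows; the arrival rate, the exponential effort distribution, and the 160-hour FTE conversion are illustrative assumptions of the sketch only, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_total_effort(lam, mean_effort_hours, horizon_days, runs=10_000):
    """Sample S(t) = sum of C_i over N(t) work orders, with N(t) ~ Poisson(lam * t)
    and each effort C_i assumed exponentially distributed (illustrative choice)."""
    counts = rng.poisson(lam * horizon_days, size=runs)            # N(t) per run
    totals = np.array([rng.exponential(mean_effort_hours, n).sum() for n in counts])
    return totals

efforts = simulate_total_effort(lam=12.0, mean_effort_hours=1.5, horizon_days=30)
print("mean monthly effort (hours):", efforts.mean())
print("FTEs at 160 hours/month   :", efforts.mean() / 160.0)
```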
To ensure that the work orders do not all collapse at Ai = 0, we also assume that P(Ai = 0) < 1. Once again we assume both the independence and stationarity properties for work order arrivals. It can easily be shown that N(t) as defined by Equation 2 cannot be infinite for any finite time t [8]. In a renewal process the inter-arrival times An are distributed with a common distribution function FA, with FA(0) = 0 and T0 = 0. The points {Tn} are called the renewal times. Notice that for a Poisson process the inter-arrival distribution FA is the exponential distribution. Let us assume that the distribution function FA has mean μ; one can then show the following result:
lim_{t→∞} N(t)/t = μ⁻¹ if μ < ∞, and 0 if μ = ∞    (4)
Recall that with the collective Poisson process it was simple to derive a model for the aggregated work order effort (see Equation 1). On the other hand, it is almost impossible to determine the exact distribution of the renewal process {N(t), t ≥ 0}, so we use the central limit theorem to obtain an approximate work order distribution.
Let 0 < Var[Ai] < ∞ and μ = E[Ai]; then, for all x ∈ R,

lim_{t→∞} P( (N(t) − tμ⁻¹) / √(ct) ≤ x ) = Φ(x)    (5)
where c = μ−3 V ar[Ai ], and Φ(x) is the standard normal distribution function.
The above result allows us to look for E[N(t)], for which we can use the renewal function. We define the renewal function as the average number of renewals in the interval (0, t], M(t) = E[N(t)] + 1.
Let FA^(k) denote the k-fold convolution of FA, the underlying distribution of the renewal process {N(t)}. Since {N(t) ≥ k} = {Tk ≤ t} for
k = 1, 2, . . . we can derive the following result relating the mean value and the
distribution.
M(t) = 1 + Σ_{k=1}^{∞} P(N(t) ≥ k)
     = 1 + Σ_{k=1}^{∞} P(Tk ≤ t)
     = Σ_{k=0}^{∞} FA^(k)(t)    (6)
In the previous two sections we developed models for WO arrivals and WO effort. In this section we combine the two models to calculate the probability that a contract will exceed the allocated budget for resolving work orders.1 The Contract Loss Probability (CLP) gives a good indication of the health of a contract. This quantity can be used for staffing decisions, resource allocation, staff training, and work order dispatch optimization.
In a typical SO contract, during the engagement phase the customer environment is “discovered” and “analyzed” to size the cost of the contract. Various factors, such as the number of servers, types of servers, number of historical tickets that were generated and resolved, management process, etc., are used to determine the cost of the contract. A typical cost model includes unit prices such as cost per server per month. The way these unit prices are computed is more of an art than a science. Productivity factors, market competition, economies of scale, and other external factors are also incorporated into the pricing or cost model. Once a contract is signed, the service provider allocates quarterly or monthly budgets for the different services of the contract; when the operational cost exceeds the allocated budget, the contract is considered “troubled” and management systems are put in place to track the services.
Let us assume that each client account has a periodic (say, quarterly) budget q(t) = rt, where r is the budget rate, so q(t) is deterministic. We can then define the following contract loss process: Z(t) = a + rt − S(t), t ≥ 0, where a is some initial base budget allocated for resolving work orders. If Z(t) < 0 for some t ≥ 0, then we have a contract loss for that time period, that is, the effort spent exceeds the allocated budget for resolving work orders. Assuming a collective Poisson process, a minimum requirement in determining the contract budget rate r is then r > λE[S], where λ is the Poisson WO arrival rate. The above condition is called the net profit condition. A safer condition would be to include a safety factor ρ, so that r > (1 + ρ)λE[S].
We can define the contract loss time as τ_0 = inf{t ≥ 0 : Z(t) < 0}, and the contract loss probability as φ(z) = P(τ_0 < ∞ | Z(0) = z) = P_z(τ_0 < ∞). If we assume that S(t) is a collective Poisson process, we can then calculate the contract loss probability φ(z) in closed form by focusing on the tail
1 It is important to keep in mind that a contract will allocate budget for different activities, and resolving work orders is one of the major activities of a contract. In this article we focus only on the budget for resolving work orders.
end of the claim size distribution. Let ψ(t) = 1 − φ(t) denote the tail of the contract loss probability; then

\psi(t) = \frac{\theta}{1+\theta} \sum_{n=0}^{\infty} \frac{1}{(1+\theta)^n} F^{*(n)}(t), \quad t \ge 0    (7)
where F^{*(n)} is the n-fold convolution of the distribution function F(x), θ = r/(λμ) − 1, μ = E[C_i], r is the budget rate, and λ is the Poisson arrival rate of the work orders. Now, when the effort sizes are (light-tailed) exponentially distributed, P(C_i > c) = e^{−c/μ}, we can derive the following contract loss probability:
\psi(t) = \frac{1}{1+\theta} \exp\left( -\frac{\theta}{(1+\theta)\mu}\, t \right), \quad t \ge 0    (8)
Notice that we made two assumptions when deriving the above contract loss probability: (1) work order arrivals follow a Poisson process, and (2) the effort, or time spent on work orders, follows a (light-tailed) exponential distribution.
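Under these two assumptions, Equation 8 can be evaluated directly. Below is a minimal sketch with hypothetical values for the arrival rate λ, the mean effort μ, and the budget rate r; reading the argument of ψ as the initial budget reserve (the a in the loss process Z(t)) is our interpretation of the formula, and θ > 0 corresponds to the net profit condition r > λμ implied by the definition of θ.

```python
import math

def contract_loss_probability(t, lam, mu, r):
    """Closed-form contract loss probability of Equation 8, assuming Poisson
    work-order arrivals (rate lam) and exponentially distributed effort (mean mu)."""
    theta = r / (lam * mu) - 1.0   # safety loading; theta > 0 is the net profit condition
    if theta <= 0:
        raise ValueError("net profit condition violated: need r > lam * mu")
    return (1.0 / (1.0 + theta)) * math.exp(-theta * t / ((1.0 + theta) * mu))

# Hypothetical contract: 2 WOs/day, 3 hours mean effort per WO, budget rate 7.5 hours/day.
lam, mu, r = 2.0, 3.0, 7.5
for reserve in (0.0, 10.0, 50.0):   # initial budget reserve (our reading of the argument)
    print(reserve, round(contract_loss_probability(reserve, lam, mu, r), 4))
```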
There are several methods for calculating the staffing budget. The Expected Value principle can be stated as follows [6]: Π(S) = (1 + a)E[S], where a is a safety loading factor. The expected value budget is very simple, but it does not take into account the variability in the effort. We can extend this model to include variability as follows (the Variance principle): Π(S) = E[S] + a Var[S].
One issue with the above Variance principle is that different delivery centers may have custom staffing budgets, depending on local labor policy, pay scale, monetary values, etc. To handle such changes to the loading factor, we can use the following modified Variance principle: Π(S) = E[S] + a Var[S]/E[S].
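All three budget principles are simple functions of the moments of the aggregate effort S. A minimal sketch, assuming the moments are estimated from samples of S (the gamma-distributed samples and the loading factor below are hypothetical placeholders):

```python
import numpy as np

def expected_value_principle(samples, a):
    """Pi(S) = (1 + a) * E[S]."""
    return (1.0 + a) * np.mean(samples)

def variance_principle(samples, a):
    """Pi(S) = E[S] + a * Var[S]."""
    return np.mean(samples) + a * np.var(samples)

def modified_variance_principle(samples, a):
    """Pi(S) = E[S] + a * Var[S] / E[S]."""
    return np.mean(samples) + a * np.var(samples) / np.mean(samples)

# Hypothetical aggregate-effort samples (e.g., produced by the earlier simulation).
rng = np.random.default_rng(1)
s = rng.gamma(shape=60.0, scale=3.0, size=10000)
for f in (expected_value_principle, variance_principle, modified_variance_principle):
    print(f.__name__, round(float(f(s, a=0.1)), 1))
```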
Our focus in this paper is not to develop a new compound stochastic process model, but to apply concepts from ruin theory in actuarial science to model an IT service delivery system, in particular to model the “effort” needed to manage a customer IT environment and to understand under what conditions a contract can become troubled. To the best of our knowledge, ours is the first work that models IT service delivery by leveraging ruin theory from actuarial science. A lot more work is needed to fully model an IT services delivery system. Please refer to the technical report, which explains the modeling of effort, contract loss probability, and staffing requirements in more detail than the current article [10].
IT service delivery is a complex system with many intricate processes, management systems, people’s behaviors, and tool sets. Diao et al. proposed a modeling framework for analyzing interactions among key factors that contribute to decisions about staffing skill level requirements [3,4]. The authors develop a simulation approach based on constructed and real data, taking into consideration factors such as scheduling constraints, service level constraints, and available skill sets. The area of optimal staffing with skill-based routing is mature. Analytical methods are typically complex and do not capture the full generality of real IT service delivery systems. The main focus of our paper is not to model the full generality of an IT service delivery system. We focus on developing a compound stochastic process model for the effort needed to handle service requests, and on understanding the underlying stochastic model for when a contract can become “troubled”.
The staffing problem based on queuing theory is an old problem, and several solutions have been proposed in the past. The staffing problem can be simply stated as determining the number of staff members or agents required to handle work orders, such as calls in a call center, as a function of time. The skill-based routing problem is an extension of the staffing problem in which skill sets are incorporated to determine which staff skill is needed as a function of time [5]. Staffing problems are typically modeled as queuing problems rather than as compound stochastic processes. Coban models the staffing problem in a service center as a multi-server queuing problem with a preemptive-resume priority service discipline and uses a Markov chain model [2].
Buco et al. describe a method wherein they instrument a management system to capture time and effort when SAs work on work orders [1]. They collect this information from multiple SAs working on different kinds of WOs. The collected data is a sample of the universe of IT service environments. One can use the sampled data to estimate the staffing requirements of a contract.
8 Conclusion
An IT services delivery system is a complex system. There has been very little work done to model such a system, mostly due to a lack of mathematical maturity in this field. Fortunately, actuarial science and ruin theory provide foundational mathematics that can be applied to modeling IT services delivery systems. We have made several simplifying assumptions, such as that WOs are independent of each other, that all WOs are the same, etc. We are currently refining the mathematics to relax some of these simplifying assumptions. The resulting analytical model will become even more complex, so we can use a combination of estimators and Monte Carlo simulation to understand the asymptotic behavior of a contract.
References
1. Buco, M., Rosu, D., Meliksetian, D., Wu, F., Anerousis, N.: Effort instrumentation
and management in service delivery environments. In: International Conference on
Network and Service Management, pp. 257–260 (2012)
2. Coban, E.: Deterministic and Stochastic Models for Practical Scheduling Problems.
Ph.D. thesis, Carnegie Mellon University (2012)
3. Diao, Y., Heching, A., Northcutt, D., Stark, G.: Modeling a complex global service
delivery system. In: Winter Simulation Conference, pp. 690–702 (2011)
4. Diao, Y., Lam, L., Shwartz, L., Northcutt, D.: Sla impact modeling for service
engagement. In: International Conference on Network and Service Management,
pp. 185–188 (2013)
5. Gans, N., Koole, G., Mandelbaum, A.: Telephone call centers: tutorial, review and
research prospects. Manufacturing and Service Operations Management 5(2), 79–141
(2013)
6. Geiss, C.: Non-life insurance mathematics (2010), http://users.jyu.fi/~geiss/insu-w09/insurance.pdf
7. Rolski, T., Schmidli, H., Schmidt, V., Teugels, J.: Stochastic Processes for Insurance and Finance. Wiley (1999)
8. Ross, S.: A First Course in Probability. Pearson Prentice Hall (2006)
9. Sparre, A.: On the collective theory of risk in case of contagion between claims.
Transactions of the XVth International Congress of Actuaries 2(6) (1957)
10. Sreedhar, V.: Effort analysis using collective stochastic model. Tech. rep., IBM
Technical Report (2014)
A Novel Equitable Trustworthy Mechanism for Service
Recommendation in the Evolving Service Ecosystem
Keman Huang1, Yi Liu2, Surya Nepal3, Yushun Fan2, Shiping Chen3, and Wei Tan4
1 School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
victoryhkm@gmail.com
2 Department of Automation, Tsinghua University, Beijing 100084, China
{yi-liu10,fanyus}@mails.tsinghua.edu.cn
3 CSIRO, Digital Productivity and Services Flagship, Australia
surya.nepal@csiro.au
4 IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
wtan@us.ibm.com
Equality, also known as fairness, has been studied in many disciplines [9]. For the service ecosystem, we define equality as both existing services and newcomers having a fair chance of being selected and building trust. Some approaches try to offer fairness from the bootstrapping aspect [8,10-12]. However, it is non-trivial to assign an equitable bootstrapping trust value to new services. In fact, the unfairness of traditional trustworthy methods arises because new services have to compete with services that have built trust over time as soon as they enter the ecosystem. Thus the basic idea here is to split all the services in the same domain into a novice service queue and a mature service queue, so that new services only compete with new ones until they mature. Different mechanisms for novice and mature services over the four-step trustworthy service recommendation (trust bootstrapping, service organization, recommendation generation and trust updating) need to be designed to distinguish between them. Hence the major contributions of this paper can be summarized as follows:
The remainder of this paper is organized as follows. Section 2 describes the formal
definition of equality guarantee. Section 3 presents the proposed four-phase equitable
trustworthy recommendation model. Section 4 reports the experimental result. Section
5 concludes the paper.
2 Equality Guarantee
Equality measures are based on the proportions of shared resources in the system. In a service ecosystem, services with similar functionality compete with each other for the opportunity of being selected by consumers. As a consequence, in this paper, the resource in the service ecosystem is defined as the opportunity of being selected in a composition.
Equality Metric:
The Gini index has been widely used as a fairness measure [13]. Here we reuse the Gini index as the service equality metric in a service ecosystem. Suppose S is the set of services in the service ecosystem. According to the number of resources allocated to each service, the services can be divided into x subsets. Let S_{r=i} denote the services with i resources; then our Gini index is defined as:

Gini = 1 - \sum_{i=1}^{x} \left( \frac{|S_{r=i}|}{|S|} \right)^2    (1)
Here the function |·| refers to the number of items in a given set. Additionally, in a manner similar to how Shannon defines information, the entropy-based fairness [14] in the service ecosystem can be defined as:

EnFair = - \sum_{i=1}^{x} \frac{|S_{r=i}|}{|S|} \log\left( \frac{|S_{r=i}|}{|S|} \right)    (2)

The recommendation diversity metric ReDi is defined as:

ReDi = \frac{|RS|}{|S|}    (3)
Here RS refers to all the unique services which are recommended to the consumers.
White-washing Prevention:
The white-washing phenomenon means that services may re-enroll in the ecosystem as new services to whitewash their historical records. Suppose ARB(s_i) refers to the number of resources allocated to service s_i if it keeps the same behavior as before, and ARA(s_i) refers to the number allocated after it white-washes its historical information. Then we can define the white-washing prevention effect for this service as follows:

WWP(s_i) = \frac{ARB(s_i)}{ARA(s_i)}    (4)

The white-washing prevention effect for the service ecosystem can then be considered as the average of the white-washing prevention effect over all services:

WWP = \frac{1}{|S|} \sum_{s_i \in S} \frac{ARB(s_i)}{ARA(s_i)}    (5)

If WWP > 1, the system can prevent the white-washing phenomenon. A larger WWP indicates better performance in white-washing prevention.
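The four metrics above are straightforward to compute once the resource allocation (number of selections per service) is known. Below is a minimal Python sketch with hypothetical allocations; the data structures are our own simplification of the paper's definitions, not part of the original work.

```python
from collections import Counter
import math

def gini(allocations):
    """Gini = 1 - sum_i (|S_{r=i}| / |S|)^2, grouping services by resource count (Eq. 1)."""
    groups = Counter(allocations.values())
    n = len(allocations)
    return 1.0 - sum((size / n) ** 2 for size in groups.values())

def entropy_fairness(allocations):
    """EnFair = -sum_i (|S_{r=i}| / |S|) * log(|S_{r=i}| / |S|) (Eq. 2)."""
    groups = Counter(allocations.values())
    n = len(allocations)
    return -sum((size / n) * math.log(size / n) for size in groups.values())

def recommendation_diversity(recommended, allocations):
    """ReDi = |RS| / |S| (Eq. 3)."""
    return len(set(recommended)) / len(allocations)

def white_washing_prevention(arb, ara):
    """WWP = average of ARB(s_i) / ARA(s_i) over services (Eqs. 4-5); > 1 discourages white-washing."""
    return sum(arb[s] / ara[s] for s in arb) / len(arb)

# Hypothetical ecosystem: five services with selection counts, three of them recommended.
alloc = {"s1": 4, "s2": 4, "s3": 1, "s4": 0, "s5": 0}
print(gini(alloc), entropy_fairness(alloc), recommendation_diversity(["s1", "s2", "s3"], alloc))
print(white_washing_prevention({"s1": 4, "s2": 4}, {"s1": 3, "s2": 2}))
```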
Trustworthy service recommendation consists of the following four important steps: trust bootstrapping, service organization, recommendation generation and trust updating. Note that requirement decomposition and domain mapping are not included, as they are handled in the same way for both novice and mature services. Hence our equitable trustworthy recommendation mechanism (ETRM) works in four steps as follows:
generate the recommendation list. Obviously, the proportion of mature services among the recommendation candidates is adjustable to reflect an ecosystem’s principles and business model. For example, if the system is conservative, q can be very large (even q = k, which is equivalent to having no novice service queue). If the system welcomes and encourages new services, a smaller q would be selected, e.g., q = k/2.
• Single List Presentation Strategy (SLP): The mature service candidates and the
novice service candidates are merged into a single list. Thus it is “One Domain
One Recommend List”.
• Double List Presentation Strategy (DLP): The mature service candidates and the
novice service candidates are recommended to the consumer separately using two
lists for consumers to select. Thus it is “One Domain Double Recommend List”.
where w ∈ [0,1] refers to the weight of the feedback trust, which varies from 0 to 1.

T_t(s_i) = T_{t-1}(s_i) \times e^{-\lambda}    (7)

where T_t(s_i) refers to the service’s trust at the end of time interval t and λ is the parameter that controls the evaporation speed. Obviously, we can use different λ for mature and novice services so that the trust values evaporate at different speeds.
Hence we denote by λ_m the evaporation speed control parameter for mature services and by λ_n that for novice services.
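Equation 7 is applied once per time interval, with separate evaporation parameters for the two queues. A minimal sketch; the λ_m and λ_n values below are hypothetical (the experiments later use 0.005 for both):

```python
import math

def evaporate(trust, lam, intervals=1):
    """Apply Eq. (7), T_t = T_{t-1} * exp(-lam), for the given number of time intervals."""
    return trust * math.exp(-lam * intervals)

# Hypothetical parameters: novice trust evaporates faster than mature trust.
lam_m, lam_n = 0.005, 0.02
print(round(evaporate(0.9, lam_m, intervals=10), 3))   # mature service
print(round(evaporate(0.9, lam_n, intervals=10), 3))   # novice service
```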
To examine the performance of the proposed approach and make the simulation experiment fit actual data, as in our previous work [4], we obtained the data set from ProgrammableWeb, by far one of the largest online service ecosystems, which contains 7077 services and 6726 compositions over 86 time intervals. Each service contains information such as name, domain and publication date. Each composition contains information such as name, creation date, the invoked services’ domain list and its visit count, as well as the user rating, which are used to calculate the composition’s feedback trust for the invoked services.
As discussed before, by setting the protection time window to 0, the proposed ETRM reduces to the traditional trustworthy model. The recommendation candidates will all be mature and the presentation strategy will only be SLP. Also, only one evaporation speed control parameter will be considered. Thus, we can obtain the traditional trustworthy models by setting A_mature = 0, q = k, ps = SLP, and λ_m = λ_n. Hence, based on the different bootstrapping strategies, we consider the following baselines:
The bootstrapping strategy is set to DB and the initial value T_ini is given. If a high initial value is used, T_ini = 0.7, we get the None Approach [12], named nTTDIT; if a low initial value is used, T_ini = 0.3, we get the Punishing Approach [11], named pTTDIT.
The bootstrapping strategy is set to AB and the average trust value in the community is used as the initial trust value. We get the Adaptive Approach [8], named TTAA in this paper.
Equitable Guarantee
First of all, we consider the three ETRMs with different parameter combinations. Here, for nETMDIT and pETMDIT, we set T_mature = T_ini + 0.2 so that novice services can move to the mature queue after they build their trust. For ETMAA with the adaptive initial strategy, we simply use the average trust value over time as the threshold, which is 0.7 in our experiment. Then we set A_mature = 15 to make sure the lengths of the mature and novice queues in the system are comparable. The evaporation speed for both mature and novice services is set to 0.005.
White-washing-prevention
In order to simulate the white-washing phenomenon, at a given time interval t_w all the mature services in the ecosystem are republished. Each service’s status is set to novice and initial trust values are assigned to these services. Then, the total selection frequency of these services after white-washing is collected and the WWP can be calculated. Here, we set t_w as the time interval at which the number of published compositions is half of the total number over the whole period. In order to remove random effects, we run 5 rounds of simulation for each model and the average WWP is used.
From Table 1 we can conclude that the three ETRMs perform better than the traditional trust methods. They achieve a 19.31%–31.08% reduction in Gini index, a 20.61%–30.21% increase in entropy-based fairness and a 215.64%–239.16% improvement in diversity. This is because the separation between novice and mature services gives the novice services an equitable opportunity to be recommended and selected by consumers for compositions. Also, all three ETRMs gain a 5.22%–22.99% higher WWP than the traditional methods, which means that white-washing services in our ETRMs have a lower probability of being reused.
5 Conclusion
Trustworthy service recommendation has become indispensable for the success of a service ecosystem. However, traditional approaches overlook service equality in the usage of services, which harms the extension and growth of the service ecosystem. To the best of our knowledge, this is the first work to: (a) identify the service equality problem in the service ecosystem as well as the evaluation metrics, including the equality measurement and the white-washing-prevention effect; (b) propose an equitable trustworthy mechanism which distinguishes between mature and novice services to ensure equality. The empirical experiments based on ProgrammableWeb show the effectiveness and usefulness of the proposed approach for equality guarantee and white-washing prevention.
In the future, we will further study the effect of the parameter combinations on the performance, construct a mathematical model for the equitable trustworthy mechanism, and investigate approaches to optimize the evolution of service ecosystems.
References
1. Wang, X., Liu, L., Su, J.: Rlm: A general model for trust representation and aggregation.
IEEE Transactions on Services Computing 5(1), 131–143 (2012)
2. Malik, Z., Akbar, I., Bouguettaya, A.: Web Services Reputation Assessment Using a
Hidden Markov Model. In: Baresi, L., Chi, C.-H., Suzuki, J. (eds.) ICSOC-ServiceWave
2009. LNCS, vol. 5900, pp. 576–591. Springer, Heidelberg (2009)
3. Yahyaoui, H.: A trust-based game theoretical model for Web services collaboration.
Knowl.-Based Syst. 27, 162–169 (2012)
4. Huang, K., Yao, J., Fan, Y., Tan, W., Nepal, S., Ni, Y., Chen, S.: Mirror, mirror, on the
web, which is the most reputable service of them all? In: Basu, S., Pautasso, C., Zhang, L.,
Fu, X. (eds.) ICSOC 2013. LNCS, vol. 8274, pp. 343–357. Springer, Heidelberg (2013)
5. Huang, K., Fan, Y., Tan, W.: Recommendation in an Evolving Service Ecosystem Based
on Network Prediction. IEEE Transactions on Automation Science and Engineering 11(3),
906–920 (2014)
6. Sherchan, W., Nepal, S., Paris, C.: A Survey of Trust in Social Networks. ACM Comput.
Surv. 45(4), 41–47 (2013)
7. Malik, Z., Bouguettaya, A.: Rateweb: Reputation assessment for trust establishment
among web services. The VLDB Journal—The International Journal on Very Large Data
Bases 18(4), 885–911 (2009)
8. Malik, Z., Bouguettaya, A.: Reputation bootstrapping for trust establishment among web
services. IEEE Internet Computing 13(1), 40–47 (2009)
9. Seiders, K., Berry, L.L.: Service fairness: What it is and why it matters. The Academy of
Management Executive 12(2), 8–20 (1998)
10. Yahyaoui, H., Zhioua, S.: Bootstrapping trust of Web services based on trust patterns and
Hidden Markov Models. Knowledge and Information Systems 37(2), 389–416 (2013)
11. Zacharia, G., Moukas, A., Maes, P.: Collaborative reputation mechanisms for electronic
marketplaces. Decis. Support Syst. 29(4), 371–388 (2000)
12. Marti, S., Garcia-Molina, H.: Taxonomy of trust: Categorizing P2P reputation systems.
Computer Networks 50(4), 472–484 (2006)
13. Yitzhaki, S.: On an extension of the Gini inequality index. International Economic
Review, 617–628 (1983)
14. Elliott, R.: A measure of fairness of service for scheduling algorithms in multiuser
systems. In: IEEE Canadian Conference on Electrical and Computer Engineering, pp.
1583–1588 (2002)
Semantics-Based Approach for Dynamic
Evolution of Trust Negotiation Protocols
in Cloud Collaboration
1 Introduction
Collaboration environments have been widely adopted in diverse domains, from scientific domains to end-user communities on the Web. Recently, the sharing of resources among people in collaborative environments has been managed using cloud computing platforms [1]. In cloud collaboration environments, making access control decisions for resources managed in cloud platforms is a hard task because of the size and dynamics of the user population [2,3].
Trust negotiation has been proposed as a viable authorization solution for
addressing the issue [7,3]. A trust negotiation protocol 1 describes a negotiation
process between negotiation parties, in the sense that it specifies which creden-
tials (e.g., digital versions of passports or credit cards) a service provider and
users should exchange for the users to access protected resources [6].
Although existing approaches for addressing trust negotiation issues have
made significant progress (see [3] for a recent survey), little work has been
done on the problem of dynamic protocol evolution, which refers to manag-
ing the ongoing negotiations when an existing protocol has been changed. In
Most of the work was done when the author was a postdoc at Qatar University.
1 In this paper we use “trust negotiation protocol” and “protocol” interchangeably.
2 Preliminaries
In what follows, we explain the protocol model for representing trust negotiation
protocols and then present an example scenario.
[Fig. 2. Changed protocol P’ for the education material co-authoring service and some migration strategies applicable to ongoing negotiations. The figure lists three migration strategies: Continue (active negotiations can continue to run according to the old protocol; however, in some cases this strategy could be inapplicable, e.g., when there are security holes in the old protocol); Migration to new protocol (active negotiations are migrated to the new protocol); and Migration to temporary protocol (temporary protocols are defined to manage those active negotiations for which the other strategies are not applicable). Panel (C) shows semantically equivalent sequences in protocols P and P’: P: CourseDesigner.discloseAC().disclosePC().TextbookEditor ≈ P’: CourseDesigner.discloseAC&PC().TextbookEditor.]
Example 1. Consider the old protocol (Figure 1(a)) and the new protocol (Figure 2(a)). Assume that a protocol manager changes the sequence (CourseDesigner.discloseAC().disclosePC().TextbookEditor) of protocol P to the sequence (CourseDesigner.discloseAC&PC().TextbookEditor) of protocol P’ by merging two messages into one message. She applies the composite operator “MergeTransition” to express the intention that the two sequences are semantically equivalent.
SplitTransition(State s, State t, Message m): This operator splits a transition with message m into n transitions with messages m_1, ..., m_n. It is applied in the opposite situation to the one in which the “MergeTransition” operator is applied.
AddSequentialTransition(State s, State t, Message sm): This operator adds a transition with message sm between source state s and target state t. It could be used when the new protocol P’ requires an extra message from s to t which protocol P does not support.
RemoveSequentialTransition(State s, State t, Message sm): This operator removes a sequential transition with message sm between source state s and target state t. It is applied when protocol P needs to receive two messages for clients to reach target state t from source state s, while protocol P’ only requires one of them to grant access to the same state.
MessageSwap(State s, State t, Message m1, Message m2): This operator swaps two messages m1 and m2 between two states s and t. It is applied in situations where, even though the order of exchanged messages is swapped, a protocol manager only needs to make sure that two credentials have been submitted, regardless of the order.
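To illustrate the operators, a protocol can be represented simply as a set of labelled transitions between states. The sketch below is our own simplification (not the paper's formal protocol model): it implements AddSequentialTransition, RemoveSequentialTransition, and a simplified MergeTransition, and replays the merge from Example 1 with a hypothetical name for the intermediate state.

```python
# A protocol is modelled here as a set of transitions (source, message, target).
# This is a simplification for illustration, not the paper's formal protocol model.

def add_sequential_transition(protocol, s, t, sm):
    """AddSequentialTransition: add a transition with message sm between states s and t."""
    return protocol | {(s, sm, t)}

def remove_sequential_transition(protocol, s, t, sm):
    """RemoveSequentialTransition: remove the transition with message sm between s and t."""
    return protocol - {(s, sm, t)}

def merge_transitions(protocol, s, mid, t, m1, m2):
    """MergeTransition (simplified): replace s --m1--> mid --m2--> t by a single
    s --m1&m2--> t transition, recording that the two sequences are semantically equivalent."""
    return (protocol - {(s, m1, mid), (mid, m2, t)}) | {(s, f"{m1}&{m2}", t)}

# Replay of Example 1, with a hypothetical name for the intermediate state of protocol P.
P = {("CourseDesigner", "discloseAC", "Intermediate"),
     ("Intermediate", "disclosePC", "TextbookEditor")}
P_prime = merge_transitions(P, "CourseDesigner", "Intermediate", "TextbookEditor",
                            "discloseAC", "disclosePC")
print(P_prime)   # {('CourseDesigner', 'discloseAC&disclosePC', 'TextbookEditor')}
```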
In contrast to the previous works that only rely on the old and new protocols
for syntactically comparing message sequences in the change impact analysis, we
exploit the semantic equivalence between sequences in terms of message contents.
[Fig. 5. Evaluation results plotted against the number of negotiations for the three protocol pairs: (a) time taken for the replaceability analyses; (b) improvement rate of successful migration.]
5 Evaluation
We now present the evaluation results that show how our system can be effec-
tively utilized in the dynamic evolution of protocols.
Evaluation Methodology: We evaluated the performance of the change impact (replaceability) analysis in terms of scalability and effectiveness. For this, we defined three pairs of protocols (each pair consisting of an old and a new protocol) with different numbers of states. For example, in pair one, the old protocol had 18 states and 19 transitions and the new protocol had 16 states and 17 transitions. We populated the system with various numbers of artificial negotiations (e.g., 5k, 10k, etc.).
Results: Figure 5(a) shows the time taken to perform the replaceability analyses from Section 4.2 for the three pairs of protocols. For example, for 15k negotiations, it took about 100 seconds to perform all the replaceability analyses on the negotiations and determine which ones are migrateable. As we can see from the figure, the time taken to complete the analyses grows linearly with the number of negotiations. In the second evaluation, we measured how much the rate of successful migration could be improved by considering the change semantics as additional knowledge in the replaceability analysis.
Figure 5(b) shows the improvement rate for the protocol pairs. For instance, for 20k negotiations simulated in pair two, we obtained an improvement rate of 89% ((91 − 48)/48 ≈ 0.89), where 48% is the successful migration rate from the previous works and 91% is the rate from this work. In the figure, we can see that we achieve a better migration rate by taking into account the semantics behind protocol changes.
6 Conclusion
This paper proposed an approach that considers both message sequences and their contents in managing the dynamic protocol evolution problem. In particular, we presented composite change operators for expressing the semantic equivalence between message sequences. We also proposed a change impact analysis that considers the change semantics as additional knowledge.
References
1. Chard, K., Bubendorfer, K., Caton, S., Rana, O.F.: Social cloud computing: A vision
for socially motivated resource sharing. IEEE T. Services Computing 5(4), 551–563
(2012)
2. Lee, A.J., Winslett, M., Basney, J., Welch, V.: The traust authorization service.
ACM Trans. Inf. Syst. Secur. 11(1) (2008)
3. Noor, T.H., Sheng, Q.Z., Zeadally, S., Yu, J.: Trust management of services in cloud
environments: Obstacles and solutions. ACM Comput. Surv. 46(1), 12 (2013)
4. Ryu, S.H., Casati, F., Skogsrud, H., Benatallah, B., Saint-Paul, R.: Supporting the
dynamic evolution of web service protocols in service-oriented architectures. ACM
Transactions on the Web 2(2) (2008)
5. Skogsrud, H., Benatallah, B., Casati, F., Toumani, F.: Managing impacts of security
protocol changes in service-oriented applications. In: ICSE (2007)
6. Skogsrud, H., Nezhad, H.R.M., Benatallah, B., Casati, F.: Modeling trust negotia-
tion for web services. IEEE Computer 42(2) (2009)
7. di Vimercati, S.D.C., Foresti, S., Jajodia, S., Paraboschi, S., Psaila, G., Samarati, P.:
Integrating trust management and access control in data-intensive web applications.
TWEB 6(2), 6 (2012)
Social Context-Aware Trust Prediction in Social
Networks
Abstract. Online social networks have been widely used for a large number of activities in recent years. Utilizing social network information to infer or predict trust among people, so as to recommend services from trustworthy providers, has drawn growing attention, especially in online environments. Conventional trust inference approaches predict trust between people along paths connecting them in social networks. However, most state-of-the-art trust prediction approaches do not consider the contextual information that influences trust and trust evaluation. In this paper, we first analyze the personal properties and interpersonal properties which impact trust transference between contexts. Then, a new trust transference method is proposed to predict the trust in a target context from that in different but relevant contexts. Next, a social context-aware trust prediction model based on matrix factorization is proposed to predict trust in various situations, regardless of whether there is a path from a source participant to a target participant. To the best of our knowledge, this is the first context-aware trust prediction model for social networks in the literature. The experimental analysis illustrates that the proposed model can mitigate the sparsity of social networks and generate more reasonable trust results than the most recent state-of-the-art context-aware trust inference approach.
1 Introduction
In recent years, a large and growing number of users have joined e-commerce, online employment and social network web sites, while online social networks have proliferated as platforms for a variety of rich activities, such as seeking employees and jobs, and seeking trustworthy recommendations for products and services. In such activities, trust (the commitment to a future action based on a belief that it will lead to a good outcome, despite the lack of ability to monitor or control the environment [2]) is one of the most critical factors in users’ decision making. It is context dependent and it is rare for a person to have
full trust in another in every facet. For example, full trust in all aspects occurs in less than 1% of cases at the popular product review websites Epinions.com and Ciao.co.uk [12]. In real life, a person’s trust in another is limited to certain domains.
Trust prediction is the process of estimating a new pair-wise trust relation-
ship between two participants in a context, who are not directly connected by
interactions in the context [14]. Recently, some studies have suggested predicting trust by taking into account social contextual information. Liu et al. [7]
propose a randomized algorithm for searching a sub-network between a source
participant and a target one. In this work, contextual factors, such as social inti-
macy and role impact factor, are taken into account as constraints for searching,
rather than simple trust inference or propagation. Wang et al. [12] propose a
probabilistic social trust model to infer trust along a path in a social network
exploring all available social context information. However, this method only
relies on trust paths and ignores participants off the path who might also have
an impact on the predicted trust.
In the literature, most trust prediction models suffer from the following drawbacks: (i) The properties of trust values have not been studied sufficiently. For example, the similarity of people’s trust can be modeled not only from the trust values but also from their distributions [14]. (ii) The diversity of social contexts is not well dealt with. In real life, the connection between two people can be any of friendship, family membership, business partnership, classmate relation, etc. Even for the same relationship, say friendship, the interaction frequency and interaction contexts can be largely different [12]. (iii) The ways to incorporate social information require further study, as an inappropriate introduction of social information may introduce noise and degrade the trust prediction quality. (iv) Differences in contextual information are not handled properly. For example, how should the relationship between two contexts be modeled? To what extent can the trust in context C_i be transferred to context C_j?
In order to address the above drawbacks, we first present a social context-
aware network model taking into account both personal properties (i.e., features
extracted from personal preference, habit, expertise and active context revealed
in historical data) and interpersonal properties (i.e., features extracted from
two participants including social relationship, social intimacy, similarity etc.) of
participants. Then, we propose a new approach to compute the trust transferred
from interaction contexts to a target context considering both the properties of
participants and the features of contexts in which they have interactions. Finally,
we modify matrix factorization methods, by introducing indicator functions of
both interaction trust and transferred trust, to predict the trust of a participant
in others’ minds regarding a certain target context.
The main contributions of our work are summarized as follows: (i) we intro-
duce relevant context information into our model; (ii) we propose a context-aware
trust transference method that can mitigate the sparsity problem and enhance
the trust prediction accuracy; and (iii) we propose a matrix factorization based
method that can predict the trust between two participants in a target context
regardless of whether there is a path connecting them.
Social context describes the context about participants. Before it can be used to predict the trust of participants, the properties of each aspect must be extracted to model the characteristics of participants and the relationships between them. Therefore, social contexts can be divided into two groups according to the characteristics of each impact factor: personal properties (e.g., role impact factor, reliability and preference) and interpersonal properties (e.g., preference similarity, social intimacy and existing trust).
Role Impact Factor: The role impact factor (denoted RIF^{c_i}_{p_1}) has a significant influence on the trust between participants in a society [7]. It captures the impact of a participant’s social position and expertise on his/her trustworthiness when making recommendations, based on the premise that a recommendation from a person who has expertise in a domain is more credible than one from others with less knowledge. There are various ways to calculate the role impact factor in different domains. For example, the social position between email users is discovered by mining the subjects and contents of emails in the Enron Corporation1 [4].
Recommendation Reliability: In a certain context, the reliability of recommendations (RLB^{c_i}_{p_1}) measures the rate at which a participant’s recommendations are accepted by recommendees [3]. On the MovieLens2 dataset, the leave-one-out approach is used in [3] to calculate the deviation between the predicted rating and the actual ratings as the reliability of a participant.
Preference: Preference (PS^{c_i}_{p_1,p_2}) is an individual’s attitude or affinity towards a set of objects in a decision making process [6]. This property may differ greatly between different contexts in real life. The similarity of two participants’ preferences can impact the trust between them to some extent [12]. Here, PS^{c_i}_{p_1,p_2} = PS^{c_i}_{p_2,p_1}. It can be calculated from the rating values given by users using models such as PCC and VSS [8].
Social Intimacy: Social intimacy (SI^{c_i}_{p_1,p_2}) refers to the frequency of connections between participants in a social network. The degree of social intimacy can impact trust, as people tend to trust those with whom they have more intimate social relationships [1]. Here, SI^{c_i}_{p_1,p_2} is not necessarily equal to SI^{c_i}_{p_2,p_1}. Models like PageRank [11] are able to calculate the social intimacy degree values.
1 http://www.cs.cmu.edu/~enron/
2 http://movielens.umn.edu/
[Figure: a social network of five participants p1–p5 with pairwise interaction trust values (e.g., 0.9, 0.7, 0.6) and the corresponding interaction trust matrix.]
Interaction context is the information about the situation when an interaction happens between participants p_1 and p_2. For example, suppose that p_2 has recommended mobile phones to p_1 many times in the past. As a result, p_1 trusts p_2 with the value T^{c_i}_{p_1,p_2} = 0.8 in the context of mobile phones. Now p_2 recommends p_1 a laptop. As there is no historical recommendation in the context of laptops, but there is similarity between the contexts of mobile phones and laptops, we need to calculate the context similarity in order to determine how much p_1 can trust p_2 in the target context of recommending laptops. Let CS_{c_i,c_j} ∈ [0, 1] denote the similarity between two contexts c_i and c_j. Only when c_i and c_j are exactly the same context is CS_{c_i,c_j} = 1, while CS_{c_i,c_j} = 0 indicates that the information in context c_i is not relevant to c_j at all and cannot impact participants’ trust in context c_j. Here, CS_{c_i,c_j} = CS_{c_j,c_i}. We adopt the classification of contexts introduced in [12] together with a number of existing methods to compute similarity [13,12], such as linear discriminant analysis and context hierarchy based similarity calculation. In addition, the interaction context c_j is relevant to the interaction context c_i if CS_{c_i,c_j} > μ (μ is a threshold, e.g., 0.7), denoted as c_i ∼ c_j. Otherwise, if c_j is irrelevant to c_i, this is denoted as c_i ≁ c_j.
[Figure: the same social network of participants p1–p5, where the interaction trust values are annotated with the contexts c1–c5 in which the interactions occurred (e.g., 0.9 in c1, 0.8 in c2), together with the corresponding interaction trust matrix.]
The process to predict the trust between participants px and py in the target
context of cj can be divided into two situations based on available information.
They are discussed in the following subsections.
The trust in relevant interaction contexts can be transferred to the target con-
text. The result is called transferred trust. This process is trust transference.
As introduced in Section 2, the personal properties and interpersonal properties can impact how much of the trust in interaction contexts can be transferred to the target context, which is termed the trust transference degree. Thus the transference degree of the trust in p_y in p_x’s mind from interaction context c_i to target context c_j can be calculated from the following equation:
This equation assumes that participant p_x trusts participant p_y with the trust value T^{c_i}_{p_x,p_y} after interactions in context c_i in the past. It calculates the transference degree from the trust in interaction context c_i to the trust in target context c_j when participant p_y makes recommendations to participant p_x. Here, {ω_i}, i = 1,...,5, are the weights of the properties that impact the trust of p_y in the mind of p_x, and Σ_i ω_i = 1. Therefore, the trust value of p_y in the mind of p_x regarding context c_i, T^{c_i}_{p_x,p_y}, can be transferred to the one in the target context c_j as α^{c_i,c_j}_{p_x,p_y} · T^{c_i}_{p_x,p_y}.
However, in the target context c_j, even if participant p_x has had no interaction with participant p_y, p_x can trust p_y to some extent, primarily due to p_y’s social effect and his/her ability to give an appropriate recommendation, which can be depicted by the role impact factor and recommendation reliability. We use the term “basic trust” [9] to refer to this kind of trust, which can be formulated as:
[Figure: a social network of participants p1–p5 showing both interaction trust and transferred trust values, together with the corresponding trust matrix containing interaction trust and transferred trust entries.]
How much participant p_x can trust p_y in the target context c_j can then be formulated as follows:

\tilde{T}^{c_j}_{p_x,p_y} = \beta_1 \max_{c_i \in C} \{ \alpha^{c_i,c_j}_{p_x,p_y} \cdot T^{c_i}_{p_x,p_y} \} + \beta_2 \, BT^{c_j}_{p_x,p_y}    (3)

where β_1 + β_2 = 1, and max_{c_i∈C}{·} is the maximum trust value among all the trust values transferred from relevant contexts, without the basic trust. These coefficients can be calculated using the leave-one-out approach [3] on the historical data.
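A minimal sketch of Equation 3, assuming the transference degrees α and the interaction trust values per relevant context have already been computed; all numbers below are hypothetical and not taken from the paper's case study:

```python
def transferred_trust(alphas, trusts, basic_trust, beta1=0.5, beta2=0.5):
    """Equation 3: T~ = beta1 * max_{ci}(alpha_{ci,cj} * T_{ci}) + beta2 * BT_{cj}.
    alphas[c] and trusts[c] are keyed by the relevant interaction contexts c."""
    assert abs(beta1 + beta2 - 1.0) < 1e-9
    best = max(alphas[c] * trusts[c] for c in trusts)
    return beta1 * best + beta2 * basic_trust

# Hypothetical example: trust built while recommending in context "teaching Java"
# transferred to the target context "teaching VC".
alphas = {"teaching Java": 0.75}     # transference degree alpha^{ci,cj} (hypothetical)
trusts = {"teaching Java": 0.7}      # interaction trust T^{ci}_{px,py} (hypothetical)
print(transferred_trust(alphas, trusts, basic_trust=0.4))   # 0.4625
```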
R \approx U^T V    (5)

\min_{U,V} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (I_{ij} + \eta \tilde{I}_{ij}) (r_{ij} - u_i^T v_j)^2 + \frac{\lambda_1}{2} \|U\|_F^2 + \frac{\lambda_2}{2} \|V\|_F^2    (6)
where ||·||_F denotes the Frobenius norm. I_{ij} is an indicator function of interaction trust: I_{ij} = 1 iff participant p_i (truster) originally trusts participant p_j (trustee) in the target context, i ≠ j; otherwise, I_{ij} = 0. In addition, Ĩ_{ij} is another indicator function, of transferred trust: Ĩ_{ij} = 1 iff participant p_i (truster) has trust, calculated by Eq. (3), in participant p_j (trustee), i ≠ j; otherwise, Ĩ_{ij} = 0. η ∈ [0, 1] is a coefficient controlling the weight of the transferred trust. Once the learning process of Eq. (6) has converged, the trust we want to predict can be calculated by Eq. (4).
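A minimal numpy sketch of the learning step behind Equations 5 and 6, using plain gradient descent on U and V with the two indicator masks; the latent dimension, learning rate, regularization weights, and the small trust matrix are hypothetical, and the optimizer is our own choice rather than the paper's.

```python
import numpy as np

def factorize(R, I, I_tilde, k=3, eta=0.5, lam1=0.01, lam2=0.01, lr=0.02, epochs=2000):
    """Gradient descent on 0.5 * sum (I + eta*I~)(R - U^T V)^2 + lam1/2||U||^2 + lam2/2||V||^2."""
    n = R.shape[0]
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((k, n))   # column u_i: latent vector of truster p_i
    V = 0.1 * rng.standard_normal((k, n))   # column v_j: latent vector of trustee p_j
    W = I + eta * I_tilde                   # combined indicator/weight mask
    for _ in range(epochs):
        E = W * (R - U.T @ V)               # weighted residual on known entries only
        U -= lr * (-(V @ E.T) + lam1 * U)
        V -= lr * (-(U @ E) + lam2 * V)
    return U.T @ V                          # predicted trust matrix, as in Eq. (4)/(5)

# Hypothetical 5x5 trust matrix; nonzero entries are treated as interaction trust here.
R = np.array([[0, .7, .69, 0, .9], [.7, 0, .9, 0, 0], [0, .9, 0, .91, .6],
              [0, .75, 0, 0, .6], [.8, 0, .9, 0, 0]])
I = (R > 0).astype(float)
I_tilde = np.zeros_like(R)                  # no transferred trust in this toy example
print(np.round(factorize(R, I, I_tilde), 2))
```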
4 Experiments
We evaluate the effectiveness of our model in typical scenarios, including the basic cases of social networks in the real world, and compare our model with the state-of-the-art social context-aware trust inference approach (SocialTrust) [12], as well as the prevalent multiplication strategy (MUL) [5]. Due to space limitations, only the comparison of trust inference between contexts is presented here.
In real life, a typical situation needing trust prediction is that a recommender and a recommendee do not have any interactions in the target context c_j. However, they have had many interactions in the past in other relevant contexts C_h = {c_i}, i = 1,...,n, i ≠ j. Without any loss of generality, the trust values between two participants are generated using a random function in Matlab. We adopt the coefficients from SocialTrust, giving the same weight to each coefficient where applicable, and set ω_1 = ω_2 = ω_3 = 0.333, δ_1 = δ_2 = 0.5, β_1 = β_2 = 0.5, CS_{c_1,c_2} = 0.8, CS_{c_1,c_3} = 0.1. The context information used in this case study can be found in Table 1. In this situation, the trust values to p_2
Table 1. Context information used in the case study

ID  Context        Context Relation       T_{p1,p2}  PS_{p1,p2}  SI_{p1,p2}  RIF_{p1}  RIF_{p2}  RLB_{p1}  RLB_{p2}
c1  Teaching VC    c1 ∼ c2 & c1 ≁ c3      ?          0           0           0         0.8       0         0.9
c2  Teaching Java  c2 ∼ c1 & c2 ≁ c3      0.7        1           1           0.5       0.8       0.5       0.9
c3  Car repair     c3 ≁ c1 & c3 ≁ c2      0.8        1           1           0.5       0.8       0.5       0.9
in p_1’s mind calculated by Eq. (3) and by SocialTrust are 0.57 and 0.74, respectively. MUL does not apply in this case, as it does not deal with trust between contexts.
SocialTrust neglects the concept of basic trust while taking the role impact factor of p_1 in the target context c_1 into account. In real life, this value should consistently be 0, because when a participant seeks suggestions from others, he/she usually has no experience in the target context; otherwise, he/she would already have his/her own trust in the target context and might not need recommendations. Therefore, our result is the most reasonable one in this scenario. It fits the real-life observation that a VC teacher is usually also good at teaching Java, as teaching Java and teaching VC are similar contexts.
5 Conclusions
Trust prediction is a dynamic and context-sensitive process. In this paper, we have first analyzed the properties that can impact trust transference between
different but relevant contexts. Based on these impact properties, we have pro-
posed a new trust transference method to transfer trust from interaction contexts
to a target context considering personal properties and interpersonal properties.
Then, a social context-aware trust prediction model has been proposed to predict
trust from a source participant to a target participant. The proposed approach
analyzes and incorporates the characteristics of participants’ trust values, and
predicts the missing trust in the target context using modified matrix factoriza-
tion. The conducted experiments show that our proposed model transfers trust
between contexts in a reasonable way and is able to predict trust between source
and target participants.
References
1. Brehm, S.: Intimate relationships. Random House (1985)
2. Golbeck, J., Hendler, J.A.: Inferring binary trust relationships in web-based social
networks. ACM Transactions on Internet Technology 6(4), 497–529 (2006)
3. Jia, D., Zhang, F., Liu, S.: A robust collaborative filtering recommendation algo-
rithm based on multidimensional trust model. JSW 8(1), 11–18 (2013)
4. Klimt, B., Yang, Y.: Introducing the Enron Corpus. In: CEAS (2004)
5. Li, L., Wang, Y., Lim, E.-P.: Trust-oriented composite service selection and discov-
ery. In: Baresi, L., Chi, C.-H., Suzuki, J. (eds.) ICSOC-ServiceWave 2009. LNCS,
vol. 5900, pp. 50–67. Springer, Heidelberg (2009)
6. Lichtenstein, S., Slovic, P.: The Construction of Preference. Cambridge University
Press (2006)
7. Liu, G., Wang, Y., Orgun, M.A.: Social context-aware trust network discovery in
complex contextual social networks. In: AAAI, pp. 101–107 (2012)
8. Ma, H., Zhou, D., Liu, C., Lyu, M.R., King, I.: Recommender systems with social
regularization. In: Proceedings of the Fourth ACM International Conference on
Web Search and Data Mining, WSDM 2011, pp. 287–296. ACM (2011)
9. Marsh, S.P.: Formalising Trust as a Computational Concept. Ph.D. thesis, Univer-
sity of Stirling (April 1994)
10. Sherchan, W., Nepal, S., Paris, C.: A survey of trust in social networks. ACM
Comput. Surv. 45(4), 47 (2013)
11. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: Extraction and
mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, KDD 2008,
pp. 990–998. ACM, New York (2008)
12. Wang, Y., Li, L., Liu, G.: Social context-aware trust inference for trust enhance-
ment in social network based recommendations on service providers. World Wide
Web Journal (WWWJ) (2013) (accepted)
13. Zhang, H., Wang, Y., Zhang, X.: Transaction similarity-based contextual trust
evaluation in e-commerce and e-service environments. In: IEEE International Con-
ference on Web Services, pp. 500–507 (2011)
14. Zheng, X., Wang, Y., Orgun, M.A., Zhong, Y., Liu, G.: Trust prediction with prop-
agation and similarity regularization. In: Twenty-Eighth Conference on Artificial
Intelligence, Quebec City, Quebec, Canada, July 27-31 (in press, 2014)
Decidability and Complexity of Simulation
Preorder for Data-Centric Web Services
1 Introduction
Business protocols, and the associated representation models (e.g., state machines [3, 4], Petri nets), are used for specifying the external behavior of services. They open the opportunity for formal analysis, verification and synthesis of services. For example, business protocols have been used as a basis to develop techniques for compatibility and replaceability analysis of web services [5] and also to study the web service composition problem [6]. In the aforementioned research works, the simulation preorder [7] plays a fundamental role in solving the considered problems. Indeed, the simulation preorder formalizes the idea that a given service is able to faithfully reproduce the externally visible behavior of another service.
Recently, the need to incorporate data as a first-class citizen in business protocols has been widely recognized, and a number of research works have been carried out in this direction, laying the foundations of a data-centric approach to web services [1, 8, 9]. Formal models used to describe such specifications, called data-centric services, are essentially communicating guarded transition systems
in which transitions are used to model either the exchange of messages between a service and its environment (i.e., a client), or the service’s actions (i.e., read, write) over a global database shared among the existing services. A configuration (or state) of a data-centric service is made of a control state of the transition system augmented with the current instance of the global database. The incorporation of data turns out to be very challenging since it makes service specifications infinite, which leads, in most cases, to undecidability of many analysis and verification problems.
In this paper, we investigate the decidability and complexity issues of service simulation in the framework of the Colombo model [1]. A Colombo service is specified as a guarded transition system, augmented with a global database as well as a set of variables that are used to send and receive messages. Two sources of infiniteness make the simulation test difficult in this context: (i) the variables used by a service range over infinite domains and hence the number of potential concrete messages that can be received by a service in a given state may be infinite; (ii) the number of possible initial instances of the global database is infinite, which makes the number of configurations of a service infinite.
In a preliminary work [10], we showed that checking simulation in a Colombo model with unbounded access to the database, called Colombo_unb, is undecidable. To complete the picture and provide a decidability border for simulation in the Colombo framework, we study in this paper the simulation problem for Colombo services with bounded databases (i.e., the class of Colombo services restricted to global databases having a number of tuples that cannot exceed a given constant k). This class is called Colombo_bound. We show that simulation is 2EXPTIME-complete for Colombo_bound. The proof is achieved in two steps: (i) first we show that checking simulation is EXPTIME-complete for Colombo services without any access to the database (called DB-less services and denoted Colombo_{DB=∅}). Colombo_{DB=∅} services are also infinite-state systems, because they still manipulate variables ranging over infinite domains. However, a finite symbolic representation of such services can be obtained by partitioning the original infinite state space into a finite number of equivalence classes. As a side effect of this work, we establish a correspondence between Colombo_{DB=∅}, restricted to equality, and Guarded Variable Automata (GVA) [2]. As a consequence, we derive EXPTIME-completeness of simulation for GVA; (ii) then we show that the simulation test for Colombo_bound services reduces exponentially to checking simulation in Colombo_{DB=∅}. The exponential blow-up is an unavoidable price to pay, since we prove that simulation in Colombo_bound is 2EXPTIME-complete. For space reasons, the proofs are omitted; they are given in the extended version of this paper [11].
Organization of the paper. We start by giving an overview of the Colombo framework in Sect. 2; then we present our results on Colombo_{DB=∅} and Colombo_bound in Sect. 3. Finally, we discuss related work in Sect. 4 and conclude in Sect. 5.
In the Colombo model, service actions are achieved using the notion of atomic processes. An atomic process is a triplet p = (I, O, CE) where I and O are respectively the input and output signatures (i.e., sets of typed variables) and CE = {(θ, E)} is a set of conditional effects, with:
3 Main Results
3.1 DB-Less Services (Colombodb=∅ )
We consider the simulation problem for the class of Colombo services without any access to the database, called DB-less services and denoted Colombo_{DB=∅}. Let S be a Colombo_{DB=∅} service. The associated state machine is a tuple E(S) = (Q, Q_0, F, Δ). A configuration of E(S) has the form id = (l, ∅, α), and there is only one initial configuration id_0 = (l_0, ∅, α_0) with α_0(x) = ω for all x ∈ LStore(S). Moreover, in Colombo_{DB=∅} services, atomic processes can only assign constants to variables of LStore(S) or assign the value of one variable to another. Note that E(S) is still an infinite-state system. This is due to the presence of input messages with parameters taking their values from a possibly infinite domain. Using a symbolization technique, it is however possible to abstract from concrete values and hence turn the extended machines associated with Colombo_{DB=∅} services into finite state machines. The main idea is to use the notion of regions to group together states of E(S). Interestingly, the obtained representation, called the Colombo_{DB=∅} region automaton, is a finite state machine, and hence a simulation algorithm can be devised for this case.
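Once a finite region automaton is available, checking the simulation preorder reduces to the standard greatest-fixpoint computation on finite labelled transition systems. Below is a generic sketch of that computation (independent of the Colombo-specific region construction; the states, labels, and the toy automata are hypothetical):

```python
def simulates(trans_a, trans_b, init_a, init_b):
    """Return True if automaton B simulates automaton A (greatest-fixpoint algorithm).
    trans_x maps every state (including states with no outgoing moves) to a set of
    (label, successor) pairs."""
    states_a, states_b = set(trans_a), set(trans_b)
    # Start from the full relation and repeatedly remove pairs (p, q) violating the
    # simulation condition: every move p --label--> p' must be matched by some
    # q --label--> q' with (p', q') still in the relation.
    rel = {(p, q) for p in states_a for q in states_b}
    changed = True
    while changed:
        changed = False
        for (p, q) in list(rel):
            ok = all(any(l2 == label and (p2, q2) in rel for (l2, q2) in trans_b[q])
                     for (label, p2) in trans_a[p])
            if not ok:
                rel.discard((p, q))
                changed = True
    return (init_a, init_b) in rel

# Tiny hypothetical automata: B simulates A (it can match requestCheck and do more).
A = {"a0": {("requestCheck", "a1")}, "a1": set()}
B = {"b0": {("requestCheck", "b1"), ("cancel", "b0")}, "b1": set()}
print(simulates(A, B, "a0", "b0"))   # True
```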
Theorem 1. Given two DB-less Colombo services S and S′, checking whether S ⪯ S′ is EXPTIME-complete.
The detailed proof of this theorem is given in the extended version of this paper [11]. As said before, starting from a test of simulation between two DB-less Colombo services S and S′, we construct a test of simulation between the two corresponding (finite) Colombo_{DB=∅} region automata R_S and R_{S′}. The problem is clearly in exponential time because the number of symbolic states in R_S and R_{S′} is exponential in the size of the two services S and S′. The proof of hardness is obtained by a reduction from the problem of the existence of an infinite execution of an alternating Turing machine M working on space polynomially bounded by the size of the input [12] to a simulation test between two DB-less Colombo services.
4 Related Works
5 Conclusion
for automata over an infinite domain, namely GVA. Our future work will be devoted to the definition of a generic framework that captures the main features of data-centric services and which can be used as a basis to study the problems underlying formal analysis, verification and synthesis of data-centric services.
References
[1] Berardi, D., Calvanese, D., Giacomo, G.D., Hull, R., Mecella, M.: Automatic
composition of transition-based semantic web services with messaging. In: VLDB,
pp. 613–624 (2005)
[2] Belkhir, W., Chevalier, Y., Rusinowitch, M.: Guarded variable automata over
infinite alphabets. CoRR abs/1304.6297 (2013)
[3] Bultan, T., Fu, X., Hull, R., Su, J.: Conversation specification: A new approach
to design and analysis of e-service composition. In: WWW 2003. ACM (2003)
[4] Benatallah, B., Casati, F., Toumani, F.: Web service conversation modeling: A
cornerstone for e-business automation. IEEE Internet Computing 08, 46–54 (2004)
[5] Benatallah, B., Casati, F., Toumani, F.: Representing, analysing and managing
web service protocols. DKE 58, 327–357 (2006)
[6] Muscholl, A., Walukiewicz, I.: A lower bound on web services composition. In: Seidl,
H. (ed.) FOSSACS 2007. LNCS, vol. 4423, pp. 274–286. Springer, Heidelberg (2007)
[7] Milner, R.: Communication and concurrency. Prentice-Hall, Inc., Upper Saddle
River (1989)
[8] Hull, R.: Artifact-centric business process models: Brief survey of research results
and challenges. In: Meersman, R., Tari, Z. (eds.) OTM 2008, Part II. LNCS,
vol. 5332, pp. 1152–1163. Springer, Heidelberg (2008)
[9] Calvanese, D., De Giacomo, G., Montali, M.: Foundations of data-aware process
analysis: A database theory perspective. In: PODS, pp. 1–12 (2013)
[10] Akroun, L., Benatallah, B., Nourine, L., Toumani, F.: On decidability of simulation in data-centric business protocols. In: La Rosa, M., Soffer, P. (eds.) BPM Workshops 2012. LNBIP, vol. 132, pp. 352–363. Springer, Heidelberg (2013)
[11] Akroun, L., Benatallah, B., Nourine, L., Toumani, F.: Decidability and complex-
ity of simulation preorder for data-centric web services (extended version). Tech-
nical report (2014), http://fc.isima.fr/~akroun/fichiers/journal_version_
colombo.pdf
[12] Chandra, A.K., Kozen, D., Stockmeyer, L.J.: Alternation. J. ACM 28, 114–133
(1981)
[13] Belardinelli, F., Lomuscio, A., Patrizi, F.: Verification of deployed artifact systems
via data abstraction. In: Kappel, G., Maamar, Z., Motahari-Nezhad, H.R. (eds.)
ICSOC 2011. LNCS, vol. 7084, pp. 142–156. Springer, Heidelberg (2011)
[14] Patrizi, F., Giacomo, G.D.: Composition of services that share an infinite-state
blackboard (extended abstract). In: IIWEB (2009)
[15] Abdulla, P.A., Cerans, K.: Simulation is decidable for one-counter nets. In: San-
giorgi, D., de Simone, R. (eds.) CONCUR 1998. LNCS, vol. 1466, pp. 253–268.
Springer, Heidelberg (1998)
[16] Grumberg, O., Kupferman, O., Sheinvald, S.: Variable automata over infinite alphabets. In: Dediu, A.-H., Fernau, H., Martín-Vide, C. (eds.) LATA 2010. LNCS, vol. 6031, pp. 561–572. Springer, Heidelberg (2010)
Market-Optimized Service Specification
and Matching
1 Introduction
the extent to which a provider's service complies with the requesters' requirements. Furthermore, providers and requesters often use different specification languages. Thus, the broker has to translate their specifications into her own language, which is supported by a certain matcher. This translation is out of the scope of this paper, as it can be done automatically based on existing approaches [7].
In the market, different brokers compete with each other for customers [5]. Customers prefer a broker who delivers the most suitable services fast and with the least possible effort for them. Thus, in order to succeed in this competition, brokers can distinguish themselves by providing fast and accurate service discovery with low effort for their customers. To this end, brokers have to develop their own business strategies, which they adjust to the given market. A main part of such a strategy is to find the configuration of a language and a matcher that is optimal with respect to service discovery and the customer's effort. Depending on the broker's strategy and the market characteristics, different configurations can be optimal because they are subject to multiple trade-offs. For example, a comprehensive specification language enables very accurate matching, but it requires considerable effort for providers and requesters to create such detailed specifications. In contrast, simpler specifications can be matched much faster, but matching accuracy may suffer. Therefore, a broker becomes successful if she has several languages and matchers optimized according to her different strategies in the given market. However, there are too many variations of languages and matchers to explore them manually; doing so would be a tedious and error-prone task.
In this paper, we present a fully automated approach called LM Optimizer. LM Optimizer supports brokers in finding an optimal "language-matcher configuration" (LM Config) in a service market. An LM Config refers to a pair of a service specification language and a corresponding matcher working on instances of this language. Both the language and the matcher are configured in a certain manner. The configuration possibilities are determined by five kinds of configuration rules. Depending on the configuration, matching accuracy, matching runtime, and specification effort can be improved. LM Optimizer takes as input market characteristics and the broker strategy, described in the form of so-called configuration properties (CPs). Based on the given CPs, a configuration procedure applies well-defined configuration rules to configure a holistic service specification language and its matchers (provided as part of LM Optimizer). As output, the broker receives an LM Config that is optimal for the given CPs.
To sum up the contribution, our approach provides brokers with an optimal configuration of a language and a matcher customized for their business strategy and the given market. This allows brokers to obtain the best possible results in service discovery. Thereby, our approach contributes to the development of successful service markets.
The paper is organized as follows: In the next section, we introduce a running example. Section 3 presents an overview of our approach, while its details are explained in Sections 4 and 5. Section 6 briefly presents related work and Section 7 concludes the paper.
2 Running Example
(Figure: Running example of a service market for university management. A service broker mediates between service providers (e.g., an Exam Manager, a Room Manager, and a Course Manager) and service requesters. The broker's strategy is to prefer complex services; the market is small and has non-standardized terminology and processes.)
3 Overview: LM Optimizer
(Figure: Overview of LM Optimizer. Input: configuration properties. Components: the Service Specification Language (SSL), the SSL matchers, and the Configuration component, which consists of a configuration procedure and configuration rules. Output: an optimal LM Config.)
matching for this language. (3) The SSL and the SSL matchers are configured by the Configuration component, which is responsible for obtaining an optimal LM Config for the given configuration properties. It consists of a set of configuration rules and a configuration procedure. The configuration procedure applies the configuration rules for the given configuration properties in order to configure the SSL and its matchers optimally. The SSL and its matching are described in Section 4, while the configuration part is explained in Section 5.
5 Configuration
In this section, we describe the configuration performed by LM Optimizer. This
includes a set of configuration rules presented in Section 5.1 and a configuration
procedure presented in Section 5.2.
more services returned. Since in a small market the probability of a perfect match is rather low, we can obtain more matching results by decreasing the thresholds. In the matching process, matchers with a higher threshold are moved to the beginning because, after their execution, fewer services have to be matched; this decreases the runtime of the matching process. Rule 5 sets the matching algorithm to string similarity matching if the market has standardized terminology and processes. This saves runtime because names in a standardized market are reliable for matching. Rule 6 configures the aggregation of matching results by increasing the weight of the privacy matching result by a factor of 2 if privacy is important for the broker in this market.
We rely on the knowledge of the broker to assign reasonable range values to her CPs. As future work, we plan to introduce measurable metrics for market properties, which will allow setting the range values at least semi-automatically.
In this section, we present the part of LM Optimizer responsible for the configuration of the SSL and its matchers. It applies the configuration rules to a set of CPs with concrete values given as input by the broker.
The configuration procedure configures the SSL by building a view on a subset of its specification aspects. Each aspect is also reduced to a subset of its language constructs. Thus, whole aspects, such as Signatures, or individual language constructs, e.g., parameter names, can be omitted. Matching for an SSL configuration is limited to the aspects and constructs defined in this configuration. We show three different example configurations in the technical report [2].
There are two phases in the configuration procedure (their order is important
as the configuration of matching steps depends on the preceding selection):
1. Language Configuration: In this phase, the necessary service aspects
are selected by applying the rule types Selection of specification aspects and
Selection of language constructs described in Section 5.1.
2. Matcher Configuration: In this phase, for each selected language aspect,
a corresponding matcher is added as a matching step in the matching process.
The matching process is configured by ordering the matching steps. Then, the
matching algorithms and the aggregation of results are configured.
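To illustrate the two phases, the following is a minimal, hypothetical sketch; it is not part of LM Optimizer, and all rule, aspect, and matcher names are invented. Rules triggered by the given CPs select specification aspects, and one matching step is then added per selected aspect, ordered by decreasing threshold so that stricter matchers run first.

```python
# Illustrative sketch of the two-phase configuration procedure.
# CPs, aspects, matchers and rules are simplified, hypothetical stand-ins.

def configure(cps, rules, matchers):
    """cps: dict mapping configuration property -> value (broker/market input).
    rules: list of (condition, action) pairs; an action edits the config.
    matchers: dict mapping aspect -> (matcher name, default threshold)."""
    config = {"aspects": set(), "steps": [], "weights": {}}

    # Phase 1: language configuration - rules select specification aspects
    # (and, in the full approach, the language constructs within each aspect).
    for condition, action in rules:
        if condition(cps):
            action(config)

    # Phase 2: matcher configuration - one matching step per selected aspect,
    # ordered so that matchers with higher thresholds run first (they filter
    # out more services early and reduce the runtime of the whole process).
    steps = [(aspect, *matchers[aspect]) for aspect in config["aspects"]]
    config["steps"] = sorted(steps, key=lambda s: s[2], reverse=True)
    return config

# Hypothetical rules loosely mirroring the ones discussed above.
rules = [
    (lambda cps: cps["market_size"] == "small",
     lambda c: c["aspects"].update({"signatures", "protocols"})),
    (lambda cps: cps["privacy_important"],
     lambda c: (c["aspects"].add("privacy"),
                c["weights"].update({"privacy": 2.0}))),
]
matchers = {"signatures": ("SignatureMatcher", 0.7),
            "protocols": ("ProtocolMatcher", 0.9),
            "privacy": ("PrivacyMatcher", 0.8)}
print(configure({"market_size": "small", "privacy_important": True},
                rules, matchers)["steps"])
```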
6 Related Work
In the following, we briefly discuss the approaches mostly related to our work.
We also explain why they do not solve the problem we stated in this paper.
Two comprehensive service specification approaches established in academia
are the Unified Specification of Components (UnSCoM) framework [9] and the
Unified Service Description Language (USDL) [3]. These two aim at comprehen-
sive description and matching of a variety of existing service aspects and language
constructs for them. In comparison to our approach, the authors of these two
approaches do not provide any configuration possibilities either of the languages
or of the corresponding matchers. Furthermore, neither the languages nor the
matchers are optimized for any market characteristics or broker strategies.
Di Ruscio et al. propose a framework called BYADL to create a customized
architectural description language (ADL) by tailoring it, e.g., for domain specifics
or operations like analysis or visualization [6]. The authors tailor an ADL by
extending it with another language. In comparison, we configure the SSL by
view building. In addition, we also configure the operation of matching.
Furthermore, there are some configurable service matchers [12]. However, their
configuration possibilities are limited to different signature matching strategies
and not selected automatically. Similarly, there are matchers that configure their
aggregation strategies (but no other features) automatically [8]. Furthermore,
their configuration only influences the matcher but never the specification lan-
guage, and thereby the considered language constructs, as in our approach.
7 Conclusions
In this paper, we presented LM Optimizer, a fully automated approach that supports service brokers in creating a language-matcher configuration that is optimal for a given service market as well as for a broker's strategy. Using this configuration, a broker can distinguish herself from other brokers competing for customers of
their service discovery commissions. Thereby, LM Optimizer supports brokers in succeeding in the given service market.
References
1. Alonso, G., Casati, F., Kuno, H., Machiraju, V.: Web Services: Concepts, Archi-
tectures and Applications, 1st edn. Springer (2010)
2. Arifulina, S., Platenius, M.C., Gerth, C., Becker, S., Engels, G., Schäfer, W.: Con-
figuration of Specification Language and Matching for Services in On-The-Fly
Computing. Tech. Rep. tr-ri-14-342, Heinz Nixdorf Institute (2014)
3. Barros, A., Oberle, D. (eds.): Handbook of Service Description: USDL and Its
Methods. Springer, New York (2012)
4. Benatallah, B., Hacid, M.S., Leger, A., Rey, C., Toumani, F.: On automating web
services discovery. The VLDB Journal 14(1), 84–96 (2005)
5. Caillaud, B., Jullien, B.: Chicken & Egg: Competition among Intermediation
Service Providers. The RAND Journal of Economics 34(2), 309–328 (2003),
http://www.jstor.org/stable/1593720
6. Di Ruscio, D., Malavolta, I., Muccini, H., Pelliccione, P., Pierantonio, A.: Developing next generation ADLs through MDE techniques. In: Proceedings of ICSE 2010, USA, vol. 1, pp. 85–94 (2010)
7. Kappel, G., Langer, P., Retschitzegger, W., Schwinger, W., Wimmer, M.: Model
Transformation By-Example: A Survey of the First Wave. In: Düsterhöft, A., Klet-
tke, M., Schewe, K.-D. (eds.) Conceptual Modelling and Its Theoretical Founda-
tions. LNCS, vol. 7260, pp. 197–215. Springer, Heidelberg (2012)
8. Klusch, M., Kapahnke, P.: The iSeM Matchmaker: A Flexible Approach for Adap-
tive Hybrid Semantic Service Selection. Web Semantics: Science, Services and
Agents on the World Wide Web 15(3) (2012)
9. Overhage, S.: UnSCom: A Standardized Framework for the Specification of Soft-
ware Components. In: Weske, M., Liggesmeyer, P. (eds.) NODe 2004. LNCS,
vol. 3263, pp. 169–184. Springer, Heidelberg (2004)
10. Papazoglou, M.P., Traverso, P., Dustdar, S., Leymann, F.: Service-Oriented Com-
puting: A Research Roadmap. International Journal of Cooperative Information
Systems 17(2), 223–255 (2008)
11. Schlauderer, S., Overhage, S.: How perfect are markets for software services? an
economic perspective on market deficiencies and desirable market features. In:
Tuunainen, V.K., Rossi, M., Nandhakumar, J. (eds.) ECIS (2011)
12. Wei, D., Wang, T., Wang, J., Bernstein, A.: SAWSDL-iMatcher: A customizable and effective semantic web service matchmaker. Web Semantics: Science, Services and Agents on the World Wide Web 9(4), 402–417 (2011)
Designing Secure Service Workflows in BPEL
1 Introduction
Our approach supports the design of secure SBAs. It is based on the use of Secure Service Composition patterns (SSC patterns), which are proven to preserve certain composition-level security properties if the services composed according to the pattern individually satisfy other properties. SSC patterns are used for two purposes: (a) to analyse whether a given workflow fragment satisfies a given security property, and (b) to generate service compositions that could substitute for individual services within a workflow that cause the violation of the security properties required of it. Our approach also supports the replacement of individual services that violate given security properties by other individual services or compositions discovered based on the properties identified by the patterns. The satisfaction of security properties at the service level is determined by digital service security certificates. We have implemented our approach in a tool that extends the Eclipse BPEL Designer [7].
The paper is structured as follows. Section 2 presents scenarios of secure SBA process design. Section 3 introduces the SSC patterns. Section 4 presents the validation and adaptation supported by the SSC patterns. Finally, Section 5 reviews related work and Section 6 summarizes our approach and outlines directions for future work.
(Figure 1: the StockBroker workflow. Activities: GetStockDetails, AnalysisByPreferences, ProcessPayment, TradeStocks, and WriteReport; a sequence labelled "ProcessOrder" groups the order-processing part of the workflow. Data items include symbol, indexID, stockValues, indexValues, currentAccount, paymOrder, paymResult, tradingAccount, stocksOrder, tradeResult, and report.)
Fig. 1 shows the workflow that realises StockBroker. This workflow receives a
stock symbol and a stock market index ID; invokes a stock information service (cf.
activity GetStockDetails) to get the details for the given stock in the particular market;
matches these details with preferences (cf. activity AnalysisByPreferences); and, if a
trade order is to be placed, it invokes in parallel the payment service (cf. activity
ProcessPayment) and the trading service (cf. activity TradeStocks)1. Finally, a report
of all results is produced by the reporting service (cf. activity WriteReport).
1 Carrying out trading in parallel with payment is possible as clearing of payment transactions can be completed after the trade transaction has taken place.
(Figure: a composition of two parallel invocations, GetStockValues (input: symb; output: stockV) and GetStockMarketIndex (input: index; output: indexV), which together provide the data returned by GetStockDetails.)
The second scenario arises in cases where the SBA designer wishes to verify that a
part of a workflow (as opposed to an individual activity of it) satisfies a given security
property. Workflow fragments are identified (delimited) by a control flow activity. In
the StockBroker workflow, for instance, a designer might wish to verify whether the sub-sequence of activities designated as ProcessOrder in Fig. 1 preserves the confidentiality of the personal current account information of the Stock Investor.
SSC patterns are used to specify how security properties of whole abstract workflows
(i.e., composition level security properties) can be guaranteed via security properties
of the individual services used in the workflow. The causal relation between work-
flow and activity level properties specified in such patterns is formally proven.
An SSC pattern is composed of: (a) an abstract workflow structure (Pattern.WF),
called workflow specification, that indicates how services are to be composed and the
data flows between them; (b) the composition level security property that the pattern
guarantees (Pattern.CSP); and (c) the security properties required of the partner ser-
vices that may be bound to the workflow specification (i.e., to the abstract invoke
activities of the workflow) to guarantee the security property specified in (b) (Pat-
tern.ASP). SSC patterns are expressed as rules of the production system Drools [9], to
enable their application for workflow security validation and adaptation.
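To make this structure concrete, the following minimal sketch represents the three parts of an SSC pattern as a plain data structure. It is only an illustration: the actual patterns are encoded as Drools rules, and all identifiers below are hypothetical.

```python
# Illustrative sketch of the three parts of an SSC pattern (not the Drools encoding).
from dataclasses import dataclass, field

@dataclass
class SSCPattern:
    name: str
    wf: str                       # Pattern.WF: abstract workflow structure, e.g. "flow(A, B)"
    csp: str                      # Pattern.CSP: composition-level property guaranteed
    asp: dict = field(default_factory=dict)  # Pattern.ASP: property required of each partner service

# Hypothetical pattern stating that separability of a parallel composition
# follows from separability of the composed services (cf. [20,21]).
separability_flow = SSCPattern(
    name="SeparabilityOfFlow",
    wf="flow(A, B)",                          # two services invoked in parallel
    csp="separability",
    asp={"A": "separability", "B": "separability"},
)
print(separability_flow)
```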
In the following, we present an example of an SSC pattern that we have encoded, specifying the effect of composition on the security property of separability. Separability is a security property introduced in [20] and is defined as the complete independence
between high-level (confidential) and low-level (public) sequences of actions. For this property to hold, there must be no interaction between confidential and public sequences of actions (e.g., these actions run as two separate processes without any communication between them). The composition result for separability, proven in [20,21], is used for the specification of the SSC pattern in Drools as given in Sect. 4.1.
SSC patterns are used to infer the security properties that the individual services should have for the workflow to have another security property as a whole. This makes it possible to: (a) analyse whether a given workflow (or a fragment of it) satisfies a given security property (security validation); and (b) generate compositions of services that could substitute for individual services which prevent the satisfaction of the required security properties (security-driven workflow adaptation). In the following, we present the approaches that enable these two forms of application.
SSC patterns are used to infer the security properties that have to be satisfied by the individual activities (services) of a composition for the whole composition to satisfy a given security property. In general, there can be zero, one, or several alternative combinations of activity-level properties, called security solutions, that can guarantee the security property required of the composition. The algorithm that applies SSC patterns for this purpose is given in Table 1.
As shown in the table, given an input service workflow WF and a required security
property RSP, the algorithm (INFERSECPROPERTIES) tries to apply all the SSC pat-
terns that would be able to guarantee the requested security property RSP. A pattern
is applied if the workflow specification of the pattern (Pattern.WF) matches with WF.
If a pattern matches the workflow, then the security solutions computed up to that
point are updated to replace the requested security property RSP with the security
properties for the matched elements in WF (these can be individual activities or sub-
workflows). If a matched element E of WF is an atomic activity, the process ends with respect to it. If E is a sub-workflow, the algorithm is applied recursively to it.
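The recursive use of the patterns can be pictured as follows. This is only an illustrative reading of the procedure, not the algorithm of Table 1; it assumes the simplified pattern representation sketched above and a hypothetical match function that binds the pattern's abstract activities to elements of WF.

```python
# Illustrative sketch of inferring activity-level security properties from SSC
# patterns (a simplified reading of INFERSECPROPERTIES, not the exact algorithm).

def infer_sec_properties(wf, rsp, patterns, match):
    """wf: workflow (or sub-workflow); rsp: required composition-level property.
    patterns: available SSC patterns; match(pattern, wf) returns a mapping from
    the pattern's abstract activities to elements of wf, or None if no match.
    Returns a list of alternative security solutions: dicts mapping atomic
    activities of wf to the property each must satisfy."""
    solutions = []
    for pattern in patterns:
        if pattern.csp != rsp:
            continue                     # this pattern cannot guarantee rsp
        binding = match(pattern, wf)
        if binding is None:
            continue                     # Pattern.WF does not match wf
        partial = [{}]
        for abstract_act, element in binding.items():
            required = pattern.asp[abstract_act]
            if is_atomic(element):
                # Atomic activity: record the property it must satisfy.
                partial = [dict(s, **{element: required}) for s in partial]
            else:
                # Sub-workflow: recurse to obtain its alternative solutions.
                sub = infer_sec_properties(element, required, patterns, match)
                partial = [dict(s, **sub_s) for s in partial for sub_s in sub]
        solutions.extend(partial)
    return solutions

def is_atomic(element):
    # Hypothetical helper: a real implementation would inspect the BPEL
    # activity type (invoke vs. structured activities such as sequence or flow).
    return not isinstance(element, (list, tuple))
```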
(Figure: the StockBroker workflow WF with a highlighted sub-workflow WF′ covering the ProcessPayment, TradeStocks, and WriteReport activities and their data items, such as currentAccount, paymOrder, paymResult, tradingAccount, stocksOrder, tradeResult, and report.)
5 Related Work
Research related to the security of service-based applications focuses either on making an SBA secure or on verifying its security.
A common approach underpinning research in the former area is to secure SBAs by using additional security services that can enforce the required security properties [12,13,14]. More specifically, an aspect-oriented version of BPEL, called AO4BPEL [12], allows the integration of security specifications in a BPEL process. These specifications are then used to indicate security functionalities that are offered by a special Security Service and to integrate them into the AO4BPEL process.
Sectet [13] is a framework for implementing security patterns from the design to the implementation of an orchestration. Sectet enables the design of orchestrations as UML message flow diagrams, which are converted into workflows and used to generate stubs for actual orchestrations. In the orchestrations, services are wrapped by Policy Enforcement Points, whose purpose is to provide the required security properties.
PWSSec [14] describes a set of complementary stages to be added to the SBA development phases in order to support security. In particular, WSSecArch is a design phase that records which security requirements are addressed and where in the architecture they are realised. The approach makes use of security architectural patterns to convert the security requirements into architecture specifications, with external security services providing the security functionalities.
Unlike the above approaches, our approach does not use special types of security
components or services but supports the discovery of normal services and service
compositions that themselves have the security properties required of an SBA.
Attention has also been given to the model-based verification of security properties during the design of orchestrations [15,16,17]. These works usually require a UML
specification of the system, the security threats associated with it, and a description of the required properties in order to verify the satisfiability of the latter. Our approach does not require the specification of threats. Furthermore, it does not perform exhaustive verification, since its analysis is driven by specific SSC patterns. This is important as it makes security analysis more scalable, at the expense of a loss of completeness.
Some model-based approaches [18,19] also support the transformation of security requirements into security policies and architectures. This usually happens in an early design phase that must be followed by a subsequent phase in which the details of the implementation are worked out. Our approach offers the possibility to add and address security properties during the workflow design phase, without requiring the designer to have a security background.
The METEOR-S project [10] allows the annotation of abstract BPEL processes to specify semantics-aware QoS properties, including security. The annotations are then used to discover appropriate services for the BPEL process, using an annotated registry. The Sec-MoSC (Security for Model-oriented Service Composition) tool [11] is an extension of the Eclipse BPMN Modeller that allows designers to model BPMN business processes and to add security properties to them. These two approaches focus only on the validation of security properties of single services, while our approach allows the validation of workflow fragments and the substitution of services with service compositions.
6 Conclusion
In this paper we have presented an approach supporting the validation of security properties of BPEL workflows and the security-based adaptation of such workflows during their design. A-BPEL Designer implements this approach on the Eclipse platform, making use of a service discovery engine.
Our approach is based on Secure Service Composition (SSC) patterns, which encode formally proven causal relations between individual service-level security properties and composition-level security properties. The validation of workflow security is based on identifying (through the SSC patterns) the security properties that the individual partner services need to have for the workflow to have the required composition-level properties. The identified service-level properties are used to check whether existing partner services satisfy them, to discover alternative services in case they do not, and to discover service compositions satisfying the properties if necessary. Our approach also supports the automatic replacement of security non-compliant services.
Our current implementation supports workflows with sequential, parallel and
choice control activities (i.e., BPEL sequence, flow, if-then-else and pick activities),
and the replacement of individual service invocations. Hence, in its current form, its
application is restricted to non-transactional and stateless services.
Our on-going work focuses on supporting transactional services. We are also con-
ducting performance and scalability tests, in order to compare our results with com-
peting approaches (especially approaches based on full verification of security).
Acknowledgment. The work presented in this paper has been partially funded by the EU FP7 project ASSERT4SOA (grant no. 257351).
References
1. Pawar, P., Tokmakoff, A.: Ontology-Based Context-Aware Service Discovery for
Pervasive Environments. In: 1st IEEE International Workshop on Services Integration in
Pervasive Environments (SIPE 2006), in conjunction with IEEE ICPS 2006 (2006)
2. Mikhaiel, R., Stroulia, E.: Examining usage protocols for service discovery. In: Dan, A.,
Lamersdorf, W. (eds.) ICSOC 2006. LNCS, vol. 4294, pp. 496–502. Springer, Heidelberg
(2006)
3. Spanoudakis, G., Zisman, A.: Discovering Services During Service Based Systems Design
Using UML. IEEE Trans. on Software Eng. 36(3), 371–389 (2010)
4. Fujii, K., Suda, T.: Semantics-Based Dynamic Web Service Composition. IEEE Journal on
Selected Areas in Communications 23(12), 2361–2372 (2005)
5. Silva, E., Pires, L.F., van Sinderen, M.: On the Support of Dynamic Service Composition
at Runtime. In: Dan, A., Gittler, F., Toumani, F. (eds.) ICSOC/ServiceWave 2009. LNCS,
vol. 6275, pp. 530–539. Springer, Heidelberg (2010)
6. Pino, L., Spanoudakis, G.: Constructing Secure Service Compositions with Patterns. In:
IEEE SERVICES 2012, pp. 184–191. IEEE Press (2012)
7. BPEL Designer Project, http://www.eclipse.org/bpel/
8. ASSERT4SOA Consortium: ASSERTs Aware Service Based Systems Adaptation.
ASSERT4SOA Project, Deliverable D2.3 (2012)
9. Drools – Jboss Community, http://drools.jboss.org
10. Aggarwal, R., Verma, K., et al.: Constraint Driven Web Service Composition in
METEOR-S. In: IEEE SCC 2004, pp. 23–30. IEEE Press (2004)
11. Souza, A.R.R., et al.: Incorporating Security Requirements into Service Composition:
From Modelling to Execution. In: Baresi, L., Chi, C.-H., Suzuki, J. (eds.) ICSOC-
ServiceWave 2009. LNCS, vol. 5900, pp. 373–388. Springer, Heidelberg (2009)
12. Charfi, A., Mezini, M.: Using aspects for security engineering of web service composi-
tions. In: IEEE ICWS 2005, pp. 59–66. IEEE Press (2005)
13. Hafner, M., Breu, R., et al.: Sectet: An extensible framework for the realization of secure
inter-organizational workflows. Internet Research 16(5), 491–506 (2006)
14. Gutiérrez, C., Fernández-Medina, E., Piattini, M.: Towards a process for web services se-
curity. J. of Research and Practice in Information Technology 38(1), 57–68 (2006)
15. Bartoletti, M., Degano, P., et al.: Semantics-based design for secure web services. IEEE
Trans. on Software Eng. 34(1), 33–49 (2008)
16. Deubler, M., Grünbauer, J., Jürjens, J., Wimmel, G.: Sound development of secure service-
based systems. In: ICSOC 2004, pp. 115–124. ACM, New York (2004)
17. Georg, G., Anastasakis, K., et al.: Verification and trade-off analysis of security properties
in UML system models. IEEE Trans. on Software Eng. 36(3), 338–356 (2010)
18. Menzel, M., Warschofsky, R., Meinel, C.: A pattern-driven generation of security policies
for service-oriented architectures. In: IEEE ICWS 2010, pp. 243–250. IEEE Press (2010)
19. Séguran, M., Hébert, C., Frankova, G.: Secure workflow development from early require-
ments analysis. In: IEEE ECOWS 2008, pp. 125–134. IEEE Press (2008)
20. McLean, J.: A general theory of composition for trace sets closed under selective interleav-
ing functions. In: 1994 IEEE Symp. on Sec. and Privacy, pp. 79–93. IEEE CS Press (1994)
21. Mantel, H.: On the composition of secure systems. In: 2002 IEEE Symp. on Sec. and Pri-
vacy, pp. 88–101. IEEE CS Press (2002)
Runtime Management of Multi-level SLAs
for Transport and Logistics Services
Clarissa Cassales Marquezan1, Andreas Metzger1, Rod Franklin2, and Klaus Pohl1
1 paluno (The Ruhr Institute for Software Technology), University of Duisburg-Essen, Essen, Germany
{clarissa.marquezan,andreas.metzger,klaus.pohl}@paluno.uni-due.de
2 Kühne Logistics University, Hamburg, Germany
1 Introduction
Managing Service Level Agreements (SLAs) is an essential task for all kinds of ser-
vices, be they computational (e.g., cloud services or web services) or non-computational
(such as transport and logistics, manufacturing, energy, or agriculture services). The
major goals of SLA management are to monitor, check, and ensure that the expected
QoS attributes are met during service execution. Expected QoS attributes are expressed
in terms of Service Level Objectives (SLOs) that are part of SLAs. In the computational
domain, SLA management has been extensively researched. A diversity of languages
and tools have been developed [24,2,15,16,19].
SLA management for transport and logistics services is just beginning to be investigated [12]. This especially holds true for automating SLA management, which is fostered by the increasing digitization of SLAs for transport and logistics services, together with the need to share SLA information among the participants in a business process. The transport and logistics domain would thus significantly benefit from the techniques
and methods developed by the services community for computational services. This
paper investigates opportunities for extending techniques developed for computational
services to non-computational services in the transport and logistics domain by starting
from an understanding of industry requirements and their potential for automation.
Traditionally, managing computational SLAs involves handling two levels of information: (1) QoS monitoring data collected during service execution, used, for instance, to check whether the service-level objectives are met; and (2) the actual SLAs that specify the expected and agreed service-level objectives.
SLA management for transport and logistics requires an additional level of infor-
mation: (3) terms and conditions of so-called frame SLAs. A frame SLA is a general
agreement that constitutes a long-term agreement (e.g., one year) between parties that
have decided to work together. During this period of time, each request for service
execution creates a specific SLA (which is equivalent to the SLA at level (2) for com-
putational services). The terms and conditions of the frame SLA become the governing
terms and conditions for all specific SLAs established under the frame SLA. In con-
trast to computational services, the frame SLA is the actual legally binding document
between the two partners. The advantage of frame SLAs is that they simplify the ex-
ecution of services that will be delivered in a repeated manner over an extended time
frame. These services can all be executed under the same agreement without having to
renegotiate SLAs and SLOs for each service execution.
Automating the SLA management process for transport and logistics services executed under frame agreements requires dedicated solutions capable of handling these three levels of information at run-time in an automated fashion. It is important to consider the multi-level relationships between the frame SLA, the specific SLA, and the actual QoS measurements. Otherwise, SLA management may lead to wrong conclusions and decisions about whether service levels have or have not been met, resulting in inapplicable and avoidable penalties. Section 2 elaborates on these problems using industry data, thereby motivating the industry need for such automated solutions.
In our previous work [12], we presented an analyzer component for runtime SLA
management of transport and logistics services, providing a computational solution for
automatic SLA checking at run-time. In this paper, we integrate this analyzer compo-
nent into the BizSLAM App, which is developed on top of FIspace.eu, a cloud-based
business collaboration platform [23]. To this end, we (i) define an extensive data model
for transport and logistics services and (ii) implement dedicated user interfaces for man-
aging SLAs. Section 3 introduces the conceptual foundations and features of the app as well
as the data model used to express and relate the multiple levels of SLA information. It
also describes how our SLA management approach advances the state of the art.
Section 4 discusses feasibility and usefulness of our SLA management approach,
applying the BizSLAM App prototype to a specific scenario in SLA management.
Transport and logistics services can account for between 10% and 20% of a country's
Gross Domestic Product, and CO2 emissions from transport activities amount to 14%
of total greenhouse gas emissions. Therefore, an improvement in how efficiently these
services are provided can dramatically increase competitiveness and sustainability. Ev-
idence suggests that improved management of transport processes through advanced
IT could yield efficiency gains in the range from 10% to 15% [1]. Many opportuni-
ties for employing IT to optimize and improve transport and logistics processes can
be listed, such as better business collaboration support [23], real-time information han-
dling, better transport and logistics planning tools, predictive process management [17],
and enhanced SLA management solutions [9].
In this paper, we focus on enhanced SLA management. More specifically, we look at
transport and logistics service level agreements (or SLAs) and their management dur-
ing the execution of transport and logistics processes. Illustrated by concrete examples
from an industry dataset (available from http://www.s-cube-network.eu/c2k),
we elaborate on the current situation in industry and the key business requirements
for enhanced IT solutions for SLA management in this domain. The industry dataset is
based on Cargo 2000 logs covering five months of business operations. Cargo 2000 is a
standard established by IATA, the International Air Transport Association, enabling the
cross-organizational monitoring of transport processes.
Figure 1 shows the relationship between the actual and planned units of cargo as-
sociated with the transportation processes covered by this dataset. The "planned" axis denotes the number of units the logistics service client booked and thus constitutes the number of units that the logistics service provider was expecting to receive from the client. This booked value thus forms part of an SLA between the logistics service client and the logistics service provider. The "actual" axis indicates the effective cargo units received by the logistics service provider at the beginning of the air transport service.
From the perspective of the logistics service provider, all the circles off the diagonal
line in Figure 1 would theoretically indicate SLA violations since the actual amount
delivered by the customer does not comply with what has been booked. However,
the aforementioned information is not sufficient to determine whether an actual SLA
Fig. 1. Planned vs. actual weight of cargo for real-world transport processes (adapted from [8])
violation happened for the transport and logistics service. As discussed in Section 1,
the relationships of the three levels of information must be considered, i.e., how the
actual measured QoS value, the planned value of the specific SLA, and the frame SLA
are related. For instance, assume that the points highlighted by the boxes in Figure 1 are
associated with the same frame SLA. Also, assume that this frame SLA establishes that
a logistics service client may ship up to 25 containers, each with up to 3000 units of
cargo, within the time span of one year. This means that whenever the logistic service
client delivers a container with up to 3000 units to be transported, this delivery complies
with the frame SLA and the cargo should be transported for the fixed price established
under the frame SLA (provided that the number of containers delivered previously does
not exceed the established limit of 25). Above this threshold, the fixed price might not
apply and may thus require re-negotiating the SLAs.
The boxes P1, P2, P3, and P4 in Figure 1 show the actual amount delivered by the
logistics service client (axis Y) versus the planned and reserved amount of cargo to
be transported by the logistics service provider (axis X). An analysis of these points
without factoring in the frame SLA would indicate that points P3 (3000, 3000) and P4
(8000, 8000) do not constitute SLA violations, while P1 (0, 2900) and P2 (2900, 1200)
constitute SLA violations. In this case, penalties should be applied for the service exe-
cution of points P1 and P2. Now, taking into account the frame SLA, we actually reach
a different conclusion: We find that points P1, P2, and P3 do not constitute violations
since the respective amount of cargo in these service executions is under, or equal to,
the amount established in the frame SLA (i.e., 3000 units of cargo). In contrast, the
service execution represented by point P4 does constitute a violation of the frame SLA.
Currently, industry follows a manual process to check whether the SLOs of the spe-
cific agreements (i.e., each individual service execution) conform with the SLOs of the
related frame agreement. The numbers provided by a large company from the transport
and logistics domain show that in a given month up to 100,000 transports may have
to be handled by the logistics service provider [17]. Each of these transports may be
associated with a specific agreement, meaning that the number of specific agreements
to be checked by a large transport and logistics company could reach up to hundreds of
thousands of documents per month. This clearly requires automated support.
The situation faced by industry today, as presented above, is mainly caused by the following limitations: First, frame SLA information is currently not available in real-time to the down-stream individuals in charge of the actual operations of the logistics service providers. Second, there are currently no standards for representing SLAs in the domain in a structured way. Third, as a consequence of the aforementioned limitations, SLA management in the transport and logistics domain is performed manually and in a "post-mortem" fashion (i.e., long after the execution of the service). The remainder of this paper introduces our solution to address these limitations.
(Section 3.2). One key element of the solution is an extensive data model that includes
the major data types found in SLAs for transport and logistics services (Section 3.3).
The section concludes with a discussion of related work (Section 3.4).
(Figure: UML class diagram. A Logistics Service Provider and a Logistics Service Client establish a Frame SLA, which has Terms and Conditions and one or more SLOs; zero or more Specific SLAs are established under exactly one Frame SLA; an SLO is either an Atomic SLO or an Aggregated SLO.)
Fig. 2. UML model representing key concepts of Transport and Logistics SLAs
Following from our observations in Section 2, an SLA can either be a Frame SLA
or a Specific SLA. Each Specific SLA is related to exactly one Frame SLA. This also
leads to two types of SLOs specified in the domain: An Atomic SLO defined in a frame
SLA specifies a quality guarantee that has to be met by each of the specific SLAs.
In our example from Section 2, the maximum of 3000 cargo units constitutes such an
atomic SLO. Each specific SLA established under the related frame SLA may only de-
fine a maximum of 3000 cargo units. Another example of an atomic SLO is transit time,
defining a maximum time span during which each individual transport must occur. In
contrast, an Aggregated SLO defined in a frame SLA specifies a quality guarantee in
terms of an accumulative value based on the respective SLOs in the specific SLAs. In
our example from Section 2, the maximum number of 25 containers per year consti-
tutes such an aggregated SLO. This means that the sum of all containers defined in the
specific SLAs may not be more than 25. The two types of SLAs (frame and specific)
together with the two types of SLOs (atomic and aggregated) constitute the core for sup-
porting runtime and automated SLA management for transport and logistics services.
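To illustrate how the two types of SLOs drive an automated conformance check, the following is a minimal sketch. It is not the BizSLAM implementation (which builds on WS-Agreement and constraint solving [12]); the numbers are loosely based on the scenario discussed in Sect. 4, and figures not stated there (e.g., the cargo and container values of the specific SLAs A.1 and A.3) are invented for illustration.

```python
# Illustrative sketch of checking a specific SLA against its frame SLA
# (simplified; not the actual BizSLAM App logic).

frame_sla = {
    "atomic":     {"cargo_units": 3000, "transit_time_days": 25},  # per specific SLA
    "aggregated": {"containers": 25},                               # sum over validity period
}

def check_specific_sla(frame, previous_slas, new_sla):
    """Return a list of (pre-)violation alerts for new_sla under frame."""
    alerts = []
    # Atomic SLOs: each specific SLA must respect the limit individually.
    for slo, limit in frame["atomic"].items():
        if new_sla.get(slo, 0) > limit:
            alerts.append(f"atomic SLO '{slo}' violated: {new_sla[slo]} > {limit}")
    # Aggregated SLOs: the running sum over all specific SLAs must respect the limit.
    for slo, limit in frame["aggregated"].items():
        total = sum(s.get(slo, 0) for s in previous_slas) + new_sla.get(slo, 0)
        if total > limit:
            alerts.append(f"aggregated SLO '{slo}' violated: sum {total} > {limit}")
    return alerts

# Hypothetical specific SLAs A.1, A.2, A.3 (only the 3100 cargo units of A.2 and
# the fact that A.1 and A.2 already use up 25 containers are taken from the text).
a1 = {"cargo_units": 2900, "containers": 10}
a2 = {"cargo_units": 3100, "containers": 15}
a3 = {"cargo_units": 2000, "containers": 20}
print(check_specific_sla(frame_sla, [], a2))        # atomic violation (3100 > 3000)
print(check_specific_sla(frame_sla, [a1, a2], a3))  # aggregated violation (45 > 25)
```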
Currently, there is no de-facto standard in the transport and logistics domain that is able to represent different types of SLAs and the diversity of SLOs. Therefore, based on experience gathered from interviews and repeated interactions with transport and logistics partners from industry, we defined an extensive data model for SLAs in that domain. The data model consolidates all information relevant for SLA management of transport and logistics services. Nowadays, such information is scattered across e-mails, spreadsheets, and paper documents.
The data model defines all information constituting a transport and logistics SLA,
called Transport and Logistics SLA Vocabulary. This model allows for the customiza-
tion of the SLA and SLO types to meet the specific requirements of different sectors and
modes of operations in the industry. Primarily, the data model supports the process of
1 http://catalogue.fi-ware.org/enablers/repository-sap-ri
introducing SLA information during the execution of services. The data model thereby
provides a common frame for expressing SLAs. Based on such a common frame, con-
tract terms (and their definitions) can be announced by the logistics service provider
and agreed on by the logistics service users, thereby ensuring “semantic” equivalence
of the SLOs employed in the various SLAs (e.g., see Section 6 in [21]).
The design of our data model builds upon initial data models proposed by the EU
e-Freight project [7]. It is implemented in Linked-USDL, which is a version of USDL
(the Unified Service Description Language2) that builds upon the Linked Data prin-
ciples and the Web of Data. To this end, we define our Transport and Logistics SLA
Vocabulary as an RDF vocabulary, which is depicted in Figures 3–6. Concepts in green
and purple indicate the extensions we introduced on top of the e-Freight model. Purple
concepts represent transport and logistics concepts defined in existing data models. Blue
concepts represent existing vocabularies adopted by Linked-USDL, such as GoodRela-
tions3 and vCard4 . Due to space limitations we focus the following description on the
most important concepts of the data model.
Part A includes the basic concepts for the Transport and Logistics SLA Vocabulary.
The central concept is Contract, which links to all other concepts in the vocabulary (as
explained below). Contract holds the information about the established SLA like issue
date, issue time, validity period, involved parties and so forth. In order to differentiate
between frame and specific SLAs, the ContractType concept is used.
(Figure: Part A of the RDF data model. The central concept lcontract:Contract carries properties such as contractId, issueDateTime, validityPeriod, contractType, recurrences, reservedCapacity, allowedAmount, maxAllowedWeight, totalTransitTimeInterval, contractDocumentReference, consignments, and transportationServices, and links to concepts such as DocumentReference, Recurrence, Amount, Measure, Appointment, Period, and TransportService.)
Fig. 3. Data model for Transport and Logistics SLAs represented as RDF graph (Part A)
2 http://linked-usdl.org/
3 http://www.heppnetz.de/ontologies/goodrelations/v1
4 http://www.w3.org/Submission/vcard-rdf/
(Figure: Part B of the RDF data model. The lcontract:Consignment concept describes the goods, with measures (e.g., grossWeightMeasure, netWeightMeasure, grossVolumeMeasure, netVolumeMeasure, loadingLengthMeasure, chargeableWeightMeasure), amounts (e.g., totalInvoiceAmount, declaredCustomsValueAmount, insuranceValueAmount, declaredForCarriageValueAmount), indicators (e.g., hazardousRiskIndicator, livestockIndicator, humanFoodIndicator, animalFoodIndicator, splitConsignmentIndicator), quantities, and handling and delivery instructions.)
Fig. 4. Data model for Transport and Logistics SLAs represented as RDF graph (Part B)
The links between the frame SLA and its specific SLAs are realized by means of the ServicePoint concept introduced in Part E of the data model.
Part B is designed to enable a very detailed description of the goods that could
be transported under the SLA terms. Nonetheless, the attributes and relationships of
this section of the SLA are not mandatory and can be used according to the needs of
partners establishing the SLA.
(Figure: Part C of the RDF data model. The lcontract:Party concept describes the parties of an SLA, with a partyId, links to gr:BusinessEntity, a postal address (vcard:adr), person details (e.g., gender, nationalityId, organizationalDepartment), and the roles consigner, forwarder, carrier, and consignee.)
Fig. 5. Data model for Transport and Logistics SLAs represented as RDF graph (Part C)
(Figure: Part D of the RDF data model. The lcontract:TransportService and lcontract:ServicePoint concepts describe the agreed transportation: each service point has a sequence number, from/to locations, delivery and execution terms, a transportation mode and transport means (road, rail, air, or maritime, with identifiers such as licensePlateId, trainId, aircraftId, or vesselId), a transit-time interval, and environmental emissions with their calculation methods.)
Fig. 6. Data model for Transport and Logistics SLAs represented as RDF graph (Part D)
Examples of concepts that allow for expressing detailed information about the goods include Measure (e.g., volume, weight), Amount (e.g., the amount declared for customs), and Indicators (e.g., hazardous).
Part C describes the parties associated with an SLA. The Party concept and its associated concepts define the information about the provider and consumer of the agreed contract.
Part D depicts the transportation service agreed among the parties of the SLA. The
concepts Transport Service and Service Point are the most relevant in this part of the
vocabulary. The ServicePoint concept is used to specify a single transportation service
(transport leg) with a specific sequence number (also see Part E). We designed the
Transport and Logistics SLA Vocabulary in such a way that two basic representations
(Figure: Part E of the RDF data model. The Terms attached to each service point comprise a PaymentTerm (price, bonus, penalty, penaltyPeriod, paymentDueDate, settlementPeriod, amountCurrencyId), a DeliveryTerm (deliveryLocation), and an ExecutionTerm, together with special terms of the transport service provider and user and change conditions.)
Fig. 7. Data model for Transport and Logistics SLAs represented as RDF graph (Part E)
We designed the Transport and Logistics SLA Vocabulary in such a way that two basic representations of transport and logistics SLAs can be chosen, depending on the actual situation faced
in practice: The first representation uses only one service point to define a transport
and logistics SLA. This means that the SLA specifies SLOs for a single transportation
leg. The granularity of this leg is irrelevant. For example, the leg could be from Turkey
to UK, or from the airport of Amsterdam to the port of Rotterdam. The second rep-
resentation uses multiple service points, each with individual SLOs. In this case, the
vocabulary is able – in a more fine-grained way – to represent SLAs that specify differ-
ent SLOs for each transportation service. For example, if the SLA specifies that goods
from partner P1 should be transported by partner P2 from Turkey to the UK, this may
involve two service points with specific SLOs: one for sea transportation from Turkey
(i.e., the first leg of transportation service), and a second for road transportation once
the goods have arrived in the UK (i.e., the second leg).
Part E associates the terms of the SLA to each Service Point. For each of the service
points certain terms must be defined. Using the previous data models as a basis, we
defined three minimal terms that must be specified for each service point using the
contract vocabulary: payment, delivery, and execution. The ServicePoint concept can
thereby be used to define multi-leg, multi-party, as well as multi-level SLAs.
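To give a flavour of how an SLA instance could be expressed with the vocabulary, the following sketch builds a tiny RDF description of a frame SLA with one service point using rdflib. The namespace URI and the class and property names are simplified, hypothetical stand-ins, not the actual identifiers of the lcontract vocabulary.

```python
# Illustrative sketch of an SLA instance in RDF (hypothetical lcontract names).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

LC = Namespace("http://example.org/lcontract#")   # placeholder namespace URI
EX = Namespace("http://example.org/slas#")

g = Graph()
g.bind("lcontract", LC)

# A frame SLA (a Contract whose contractType is "frame") ...
g.add((EX.frameSLA_A, RDF.type, LC.Contract))
g.add((EX.frameSLA_A, LC.contractType, Literal("frame")))
g.add((EX.frameSLA_A, LC.maxAllowedWeight, Literal(3000, datatype=XSD.integer)))

# ... with one service point (a single transport leg) and its payment term.
g.add((EX.frameSLA_A, LC.hasServicePoint, EX.legTurkeyToUK))
g.add((EX.legTurkeyToUK, RDF.type, LC.ServicePoint))
g.add((EX.legTurkeyToUK, LC.sequenceNumber, Literal(1, datatype=XSD.integer)))
g.add((EX.legTurkeyToUK, LC.transportServiceTerms, EX.paymentTerm1))
g.add((EX.paymentTerm1, RDF.type, LC.PaymentTerm))
g.add((EX.paymentTerm1, LC.price, Literal("1200.00", datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```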
different information models used by different logistics partners. Yet, they do not focus
on representing SLA information once an agreement has been established.
In our previous work [12], we presented a first solution for the run-time management
of multi-level transport and logistics services. Specifically, we introduced a computa-
tional solution for automatic SLA checking at run-time that employed WS-Agreement
to formally represent frame and specific SLAs, and that used CSP solvers to check for
inconsistencies. In this paper, we integrate this technical approach into an overall sys-
tems perspective and provide evidence for the industrial relevance, applicability and
usefulness of such an approach in the transport and logistics domain.
This section demonstrates the feasibility of the BizSLAM App (Section 4.1) and dis-
cusses the usefulness of applying the App in an industrial context (Section 4.2).
4.1 Feasibility
As described above, the BizSLAM App can be applied to automatically determine in-
consistencies in multi-level SLAs during business operations. We consider a real-world scenario that shows typical inconsistencies that can be detected. In the given sce-
nario, a logistics service client has established a frame SLA A with a logistics service
provider. This scenario defines two atomic SLOs as part of the frame SLA: a maximum
of 25 days Transit Time as well as a maximum of 3000 Cargo Units. In addition, the
frame SLA defines an aggregated SLO that defines 25 as the SUM of Containers to be
transported during the validity period of the frame SLA. For each execution of a trans-
port and logistics service under the frame SLA A, a specific SLA is created. Three such specific SLAs are considered: A.1, A.2 and A.3.
In this scenario, two violations occur that are detected by the BizSLAM App, as shown in Fig. 9. The automated conformance check of the BizSLAM App detects these violations immediately, i.e., as soon as they occur, and issues so-called pre-violation alerts (the red boxes in Fig. 9). These alerts inform
the logistics service users that if they insist on the chosen SLOs (e.g., in order to ensure
timely delivery of goods) this might imply penalties for violating the frame SLA at the
end of the validity period of the frame SLA.
Fig. 9. BizSLAM App detecting inconsistencies between specific and frame agreements
Violation 1: According to the frame SLA, only 3000 cargo units may be transported
for each specific SLA. However, the specific SLA A.2 asks for a cargo volume of 3100
and thus violates the atomic SLO Cargo Units specified in the frame SLA.
Violation 2: A total of 25 containers may be contracted during the validity period of
the frame SLA. When the specific SLA A.3 asks for 20 containers, 25 containers have
already been contracted in the previous specific SLAs A.1 and A.2. Thus, no containers
remain to be contracted under the frame SLA, which in turn means that the specific
SLA A.3 leads to a violation of the aggregated SLO Containers.
As part of our ongoing research we are preparing an empirical evaluation of our
SLA management approach. This includes more sophisticated examples and use cases,
as well as controlled experiments that combine real data from the field with simulation
to assess performance, scalability, effectiveness and accuracy of the BizSLAM App.
4.2 Usefulness
Having access to the multiple levels of SLA information along the whole supply chain
significantly contributes to a better and more efficient planning and execution of trans-
port and logistics services. The data model underlying the BizSLAM App consolidates
all information relevant for SLA management of transport and logistics services. Nowa-
days, such information is scattered across e-mails, spreadsheets, and paper documents.
Of course, this data model might not cover all cases of SLOs and relationships of the
entire transport and logistics industry. However, encouraging feedback from industry
partners indicates that the data model covers most of such cases. The organizations
we solicited feedback from represented companies of different size (SMEs and large
companies) and industry sectors (sea, air, and road carriers, as well as forwarders).
Considering the service level violations in the above scenario, under current industry
practice penalties would have been enforced only long after the logistics service provider
suffered the actual losses. This happens because the conformity check in transport and
logistics agreements is currently a manual process executed only periodically (e.g.,
quarterly, half-yearly, or annually). Such manual processes might be viable in a small
company, but in large companies with high volumes of specific agreements such manual
processes become extremely costly. Hence, new online, automated conformity check
mechanisms can drastically improve the timeliness of contract violation detection and
should thus lead to cost reductions.
5 Conclusion
Starting from an identification of industry requirements, this paper presents a runtime
SLA management approach for the transport and logistics domain. Specifically, we in-
troduced and demonstrated the usefulness of a novel software component called BizS-
LAM App that is able to manage SLAs of transport and logistics services at runtime.
The App leveraged SLA management approaches from the service-oriented comput-
ing field and adapted them to fit the specific requirements of the transport and logistics
domain, especially the need to support both frame SLAs and specific SLAs.
The BizSLAM App was developed on top of FIspace.eu, a cloud-based business col-
laboration platform that offers novel business-to-business collaboration facilities.
Acknowledgements. We cordially thank our industry partners of the FInest and FIs-
pace projects for their valuable contributions to the SLA data model. In addition, we
thank Stephan Heyne for supporting us in implementing the Linked-USDL models,
as well as Nadeem Bari for his help in implementing the BizSLAM App. We further
express our gratitude to Antonio Manuel Gutierrez, Manuel Resinas and Antonio Ruiz-
Cortés for earlier collaborations on that subject. Finally, we would like to thank the
anonymous reviewers for their constructive comments that benefited this paper.
This work was partially supported by the EU’s Seventh Framework Programme
(FP7/2007-2013) under grant agreements 285598 (FInest) and 604123 (FIspace).
Single Source of Truth (SSOT)
for Service Oriented Architecture (SOA)
1 Introduction
This paper describes a SSOT service model for two common data sharing scena-
rios: mutable and immutable data sources. We use two motivating examples to illu-
strate the variants: (a) the management of Postal Codes (PC) and (b) the management
of Electronic Patient Records (EPR). The proposed SSOT service model is useful for
any business entity that maintains full ownership of its data, and does not want the
clients to duplicate its data, but allows data access by restricted queries. The value of
the model will be illustrated by the PC and EPR examples.
Most business applications use addresses. In Canada, Canada Post is the single au-
thoritative agency that manages PCs for mail-delivery addresses. Each address should
have exactly one PC, while each PC covers an area with multiple addresses. Most
business applications collect address information from their customers, and store cus-
tomers’ addresses with PCs in their local databases. Periodically, business applica-
tions also replicate Canada Post’s PCs to their local databases for data-validation
purposes. For example, an application using billing addresses, which require PCs, may
have a Billing_Address table and a Postal_Code table as shown in Fig. 1(a). The
Postal_Code table contains all valid PCs periodically replicated from Canada Post.
Before adding a new address to the Billing_Address table, the application checks the
validity of the provided PC, i.e., whether the PC exists in the Postal_Code table. If so,
the application will add the new address to the Billing_Address table. This process
can only validate the existence of the PC, but it cannot validate whether the PC is
correct for that address. There is a mapping between the PC and the billing address.
A mapping that changes over time is called a mutable mapping, and a mapping that
does not change is immutable. Canada Post changes PCs from time to time. PCs can
be inserted, updated, deleted, split or merged. The application needs to synchronize
the Postal_Code table with Canada Post, and, if necessary, to correct the PCs in the
Billing_Address table. Since the PC that is mapped to a billing address can change
over time, the mapping between PC and billing address is an example of a mutable
mapping. Mutable mappings are often subject to synchronization errors. For example,
Canada Post only provides PC changes to subscribers monthly. Therefore, the sub-
scribers’ Postal_Code tables are out-of-sync with Canada Post most of the time. As
shown in Fig. 1(b), when a PC is updated (from A1B 2C3 to A2B 2C3), the corres-
ponding mappings in the Billing_Address table can be updated. However, when a PC
is deleted (from A1B 2C4 to none) or split (from A1B 2C5 to A1B 2C5 and A1B
2C8), there is no simple way to correct the mappings in the Billing_Address table
using only the updated PC list.
As a second example, let us consider a regional Electronic Patient Record (EPR)
system that manages patients’ health numbers (PHNs), names and contacts. Each
patient in the region receives services from multiple healthcare providers, with differ-
ent specialties. Each healthcare provider obtains patient information directly from the
patient during the patient’s visit, including information that is also held in the EPR. Then,
each provider independently stores the patient’s information in its local database. In
this model, a provider-specific patient record may be inconsistent with the regional
EPR. In principle, each provider-specific patient record could be mapped to a corres-
ponding EPR in the regional authoritative system. In this case, the mapping between
an EPR and a provider-specific record is immutable. The mapping is immutable de-
spite the fact that patient’s data may change. For example, when a patient changes
his/her name, the patient’s EPR will be updated, but the patient’s EPR is still mapped
to the same provider-specific record. Therefore, the mapping is immutable.
The SOA paradigm can alleviate data synchronization issues. Clients using data
that already exists in an authoritative source will not replicate the authoritative data in
their local databases. Instead, the authoritative source acts as a SSOT service that
provides clients the authoritative data. In the examples above, Canada Post serves as
the SSOT service for PCs and a regional health authority provides the SSOT service
for EPRs. Clients access PCs and EPRs by invoking SSOT services, without replicat-
ing SSOT data in their local databases. Therefore, clients do not need to manage and
synchronize data with the SSOT. The SSOT also preserves its autonomy with respect
to its clients. We advocate the SSOT service model over data-replication.
This paper is structured as follows. Sections 2 and 3 illustrate the challenges and so-
lutions associated with the mutable and the immutable SSOT through the PC and the EPR
use cases respectively. Section 4 evaluates the performance of the SSOT service.
Section 5 describes related work. Section 6 recommends future work, and Sec-
tion 7 concludes the paper by enumerating SSOT’s benefits.
For example, in the PC use case, Canada Post maintains a mutable PC SSOT ser-
vice that provides a single web service operation, PC-query. The PC-query operation
takes PC query criteria as input. Clients may specify Street Number, Number Suffix,
Unit/Suite Apartment, Street Name, Street Type, Street Direction, City, and/or Prov-
ince as criteria. The PC-query operation returns a set of PCs that match the criteria.
This PC SSOT service can replace data replication. For example, in Fig. 1(a), the
PC column in the Billing_Address table and the Postal_Code table are replicated data
that can be dropped in favor of invoking the PC-query operation provided by the mut-
able PC SSOT service. The PC query criteria will come from the remaining columns
(e.g. Street#, City) in the Billing_Address table. When the application needs the PC of
a billing address, it will use the address data in the Billing_Address table as query
criteria to retrieve the PC from the PC SSOT.
The rest of this section will use the PC use case to illustrate how clients can use a
mutable SSOT service to replace data replication.
Fig. 1(c) depicts how an application can use mutable SSOT service for data validation
during record creation. In the PC use case, when the application receives a new billing
address from a customer, the application queries the PC SSOT using the customer
address. If the PC SSOT returns a single valid PC, then the address is valid. The ap-
plication proceeds to create a new Billing_Address record for the customer.
If the PC SSOT returns more than one PC, then the customer address is not defini-
tive. For example, if the customer address contains only the city field, then the PC
SSOT will return all the PCs for the city. In principle, the application should not ac-
cept a non-definitive address as a billing address. Therefore, the application would
seek additional address details from the customer. Similarly, if the PC SSOT returns
no PC for the customer address, the application should alert the customer that the
address is invalid and request the user to take remedial action.
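The validation flow can be sketched as follows; pc_query stands in for the PC SSOT's PC-query
operation and is a hypothetical interface, since the paper does not prescribe a concrete API.

# Illustrative sketch; pc_query is a hypothetical stand-in for the PC-query operation.
def validate_billing_address(address, pc_query):
    """Classify a customer-supplied address based on the PC SSOT response."""
    postal_codes = pc_query(address)          # criteria, e.g. {"Street#": "123", "City": "Calgary"}
    if len(postal_codes) == 1:
        return "valid", postal_codes          # definitive address: create the Billing_Address record
    if len(postal_codes) == 0:
        return "invalid", []                  # alert the customer and request a corrected address
    return "non-definitive", postal_codes     # ask the customer for additional address details

def fake_pc_query(address):
    # Canned stand-in used only to make the example runnable.
    return ["A1B 2C3"] if address.get("Street#") else []

print(validate_billing_address({"Street#": "123", "City": "Calgary"}, fake_pc_query))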
In contrast, the data-replication model may allow non-definitive or invalid billing
addresses in the application’s database. Using a mutable SSOT service not only eli-
minates data replication, but also enhances data quality.
Different clients may have different processes for the SSOT response. The applica-
tion in the PC use case expects a single valid PC from the response. Other applica-
tions may iterate the response records to select the most desired result. To support
different clients’ processes, the mutable SSOT query-by-criteria operation may return
additional information. For example, the PC-query operation may return other address
fields (Street Number, Unit/Suite Apartment, Street Name, etc.) in addition to the PC.
Clients can use the additional address fields to filter the response records.
Since the SSOT data is excluded from the clients’ local databases, each client needs to
combine its local data with the SSOT data to compose the complete data records. The
client data retrieval process is depicted in Fig. 1(d).
In the PC use case, the application first retrieves a Billing_Address record from its
local database, then uses the local address data as query criteria to invoke the PC-
query operation, and retrieve the up-to-date PC from the PC SSOT. If the PC of the
billing address has changed since the last retrieval, then the PC SSOT will return a
different PC from the last retrieval, but it will be the correct PC.
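A corresponding retrieval sketch, again with hypothetical names, shows how the PC is composed
into the record at read time instead of being stored locally.

# Illustrative sketch: the PC is fetched from the SSOT on demand, never persisted locally.
def get_billing_address_with_pc(billing_address_row, pc_query):
    criteria = {k: v for k, v in billing_address_row.items() if k != "PC"}   # local columns only
    pcs = pc_query(criteria)                       # up-to-date PC(s) from the PC SSOT
    return {**billing_address_row, "PC": pcs[0] if pcs else None}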
3.1 Query-by-Criteria
An immutable SSOT service could provide the same query-by-criteria operation as a
mutable SSOT service. Clients could query the SSOT service by criteria and receive a
correlated set of results matching the criteria. However, in some situations, the im-
mutable SSOT query-by-criteria operation may pose a privacy risk. In the
EPR use case, if the query-by-criteria operation returns all resources matching any
given set of criteria, then any client can browse the EPR data, which violates patients’
privacy. For example, a client may specify Firstname=’JOHN’ as the query criterion
for the EPR SSOT query-by-criteria operation. In response, the EPR SSOT returns all
patients whose first name equals ‘JOHN’. The large response set may violate many
patients’ privacy, since many of the returned patients may not be patients of the querying facility.
To avoid browsing, the query-by-criteria operation could specify a maximum
number of returned records (i.e. query-limit). If the response set is larger than the
query-limit, the service would return an error, at which point the client needs to re-
fine the query criteria and query again.
A safer query-by-criteria operation may also demand more than one query criterion
to prevent brute-force attacks. For example, the EPR SSOT query-by-criteria operation
could reject single-criterion queries to avoid PHN cracking by querying with random-
ly generated PHNs until a valid resource is returned.
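Both safeguards can be sketched as a thin guard in front of the query operation; the concrete
limits below are illustrative assumptions, not values prescribed by the paper.

# Illustrative server-side guard for an immutable SSOT query-by-criteria operation.
QUERY_LIMIT = 20     # maximum number of returned records (assumed value)
MIN_CRITERIA = 2     # reject single-criterion queries to hinder brute-force probing

def guarded_query_by_criteria(criteria, run_query):
    if len([v for v in criteria.values() if v]) < MIN_CRITERIA:
        raise ValueError("at least two query criteria are required")
    results = run_query(criteria)
    if len(results) > QUERY_LIMIT:
        raise ValueError("response set too large; please refine the query criteria")
    return results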
For further privacy protection, the query-by-criteria response set can be filtered to
contain partial data. The partial data must include the complete SSOT-ID and suffi-
cient information to identify a SSOT resource. Fig. 2(c) shows a set of query criteria
for the EPR SSOT query-by-criteria operation, and one partial response record. The
partial record includes the SSOT-ID and only part of the PHN, address and birthday.
Using the partial record, a client should be able to determine whether the EPR maps
to the targeted patient. Once a partial record is selected, the client can use the SSOT-
ID to retrieve the resource details using the query-by-SSOT-ID operation.
3.2 Query-by-SSOT-ID
If the query-by-criteria operation returns only partial data for privacy protection, the
immutable SSOT service must provide a query-by-SSOT-ID operation, which takes a
SSOT-ID as input. In response to a query-by-SSOT-ID request, the immutable SSOT
provides details of the resource corresponding to the provided SSOT-ID, but only the
details that the client is authorized to see. This allows the SSOT service to distinguish
between client access permissions, providing different information to different facili-
ties, such as pharmacies, clinics and acute-care facilities.
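As a minimal sketch, permission filtering by client type might look as follows; the field sets
per facility type are invented for illustration.

# Illustrative sketch: different client types see different subsets of an EPR.
PERMITTED_FIELDS = {
    "pharmacy":   {"ssot_id", "name", "phn"},
    "clinic":     {"ssot_id", "name", "phn", "birthday", "home_address"},
    "acute_care": {"ssot_id", "name", "phn", "birthday", "home_address", "contacts"},
}

def query_by_ssot_id(ssot_id, client_type, fetch_epr):
    epr = fetch_epr(ssot_id)                                   # full record from the SSOT store
    allowed = PERMITTED_FIELDS.get(client_type, {"ssot_id"})
    return {k: v for k, v in epr.items() if k in allowed}      # return only the permitted fields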
To protect data privacy, the immutable SSOT should include a proper auditing me-
chanism to detect, identify and stop improper browsing behavior or unlawful use of
data. In the EPR use case, if a client continually invokes the EPR SSOT query-by-
SSOT-ID operation with randomly generated or guessed SSOT-IDs, the EPR SSOT
auditing mechanism should detect and deter the client.
Each SSOT-ID is effectively a foreign key to a remote SSOT resource. Since the
SSOT and clients are loosely coupled, the foreign key constraints in the client data
cannot be enforced at the SSOT. An alternate foreign key constraint handling me-
chanism will be discussed in the later sub-sections.
Fig. 2(d) depicts the role of an immutable SSOT service during client record creation.
Using the clinic application in the EPR use case as an illustration, when a patient first
visits the clinic, the clinic application needs the patient’s SSOT-ID. The patient pro-
vides personal data to the clinic. The clinic application invokes the EPR SSOT query-
by-criteria operation with the patient’s data. From the returned set of partial EPRs, the
clinic and patient together identify the correct EPR. With the SSOT-ID from the se-
lected partial EPR, the clinic application invokes the query-by-SSOT-ID operation to
retrieve the portion of the patient’s EPR that is permitted to the clinic. The clinic ap-
plication then assigns a county to the patient according to the patient’s home address
in the EPR. With the patient’s SSOT-ID and assigned county, the clinic application
adds a new record to the Clinic_Patient table for the patient. Similarly, when a patient
first visits the pharmacy, the pharmacy application uses the EPR SSOT query-by-
criteria operation to retrieve a partial EPR of the patient. The pharmacy application
gets the patient’s SSOT-ID from the partial EPR. Since the pharmacy data does not
depend on data in the EPR, the pharmacy application can add a new record to the
Pharmacy_Patient table for the patient with the patient’s SSOT-ID.
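A sketch of the clinic-side registration flow follows; epr_ssot, clinic_db, assign_county and the
interactive selection step are hypothetical placeholders for components the paper only describes
in prose.

# Illustrative sketch of first-visit registration at the clinic.
def select_with_patient(partial_eprs):
    return partial_eprs[0]        # placeholder for the interactive confirmation with the patient

def assign_county(home_address):
    return "Example County"       # placeholder for the clinic's local business rule

def register_clinic_patient(patient_data, epr_ssot, clinic_db):
    partial_eprs = epr_ssot.query_by_criteria(patient_data)    # partial records incl. SSOT-IDs
    chosen = select_with_patient(partial_eprs)
    epr = epr_ssot.query_by_ssot_id(chosen["ssot_id"])         # permission-filtered details
    county = assign_county(epr["home_address"])
    clinic_db.insert("Clinic_Patient", {"SSOT_ID": chosen["ssot_id"], "County": county})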
As with the mutable SSOT service, the immutable SSOT clients need to combine the
SSOT data with the local data to compose complete data records. The data retrieval
process is depicted in Fig. 2(e).
In the EPR use case, when a patient revisits the clinic or the pharmacy, the clinic
and pharmacy applications use the EPR SSOT query-by-criteria operation to retrieve
the patient’s SSOT-ID. With the SSOT-ID, the applications fetch the permission-
filtered EPR with query-by-SSOT-ID. Using the SSOT-ID again, the applications
retrieve patient’s local data from the local databases, i.e. the Clinic_Patient, Pa-
tient_Visit, Pharmacy_Patient and Drug_Dispensing tables in Fig. 2(b). Finally, the
applications combine the EPR and local data to instantiate a complete patient record.
number, to ensure its uniqueness. Since the SSOT-ID will be shared between systems, it
is recommended that the SSOT-ID take a different format from the SSOT database
standard, so that the SSOT database standard will not be exposed.
Clients store the SSOT-IDs in their local databases. The SSOT-IDs are guaranteed
to be unique and permanent. Therefore, clients may consider using the SSOT-IDs as
the primary keys in their local databases. If the SSOT-ID data type and size do not
match the local database standard, we recommend that the local database create its
own local primary key, and use the SSOT-ID as a foreign key. In the EPR use case,
the Clinic_Patient table could have used the EPR SSOT-ID as the primary key. Since
the primary key of the Clinic_Patient table will be a foreign key of the Patient_Visit
table, the clinic application created its local primary key (PID in the Clinic_Patient
table), to preserve data type consistency between tables.
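As a concrete but purely illustrative example, a local SQLite schema could keep its own PID
primary key and store the SSOT-ID as a unique reference; the column names follow the use case
but are not taken from the paper.

import sqlite3

# Illustrative local schema: PID is the local primary key, SSOT_ID references the remote EPR.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clinic_Patient (
    PID     INTEGER PRIMARY KEY,      -- local primary key, locally ordered
    SSOT_ID TEXT NOT NULL UNIQUE,     -- foreign reference to the remote EPR resource
    County  TEXT
);
CREATE TABLE Patient_Visit (
    VisitID   INTEGER PRIMARY KEY,
    PID       INTEGER NOT NULL REFERENCES Clinic_Patient(PID),
    VisitDate TEXT
);
""")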
The clients should also consider whether the sequence of the primary keys matters
to the local business logic. In the EPR use case, it is likely that a portion of the EPR
SSOT-ID is a sequential number. The clinic serves only a relatively small number of
patients in the regional EPR system. Therefore, only a small number of the EPR
SSOT-IDs will be imported into the clinic’s database. If the clinic application uses the
EPR SSOT-IDs as its primary key, then the primary key will have a lot of gaps in its
sequence. In addition, the order of the primary keys will not represent the order in
which patients’ records are added to the clinic’s database.
Data updates for an immutable SSOT may affect clients. In the EPR use case, the
clinic application assigns a patient’s county based on a patient’s home address. When
a patient moves, the clinic application may assign a different county for the patient. In
this case, data updates in the EPR SSOT affect the clinic application. On the other
hand, none of the pharmacy application local data depends on the EPR SSOT data.
Therefore, data updates in the EPR SSOT do not affect the pharmacy application.
The SSOT cannot determine how data updates will affect the loosely coupled
clients. Clients are responsible for managing their own data. Therefore, the immutable
SSOT must provide an update subscription operation. If a client is concerned about
data updates in the SSOT, then the client is responsible for subscribing to the SSOT
update service through the update subscription operation.
After a SSOT resource is updated, the SSOT will send an update message with the
SSOT-ID of the updated resource to the subscribers. When the subscriber receives the
update message, the subscriber can check whether the SSOT-ID is referenced locally.
If not, the subscriber can ignore the update message. If the SSOT-ID is referenced
locally, then the subscriber can fetch the resource details using the query-by-SSOT-ID
operation. Based on the latest resource details, the subscriber may update its local data
accordingly. The update subscription process is depicted in Fig. 2(f).
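A subscriber-side sketch of this process, with hypothetical helper objects, is shown below.

# Illustrative sketch: the update message carries only the SSOT-ID of the changed resource.
def assign_county(home_address):
    return "Example County"                     # placeholder local business rule

def on_update_message(ssot_id, local_db, epr_ssot):
    if not local_db.references(ssot_id):        # SSOT-ID not used locally: ignore the message
        return
    epr = epr_ssot.query_by_ssot_id(ssot_id)    # fetch the latest, permission-filtered details
    new_county = assign_county(epr["home_address"])
    local_db.update("Clinic_Patient", {"SSOT_ID": ssot_id}, {"County": new_county})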
In the EPR use case, the clinic application would subscribe to the EPR SSOT up-
date service. When an EPR is updated, the clinic application will receive an update
message with the SSOT-ID of the updated EPR. The clinic application determines
whether the SSOT-ID is referenced locally. If so, it retrieves the patient details from
the EPR SSOT using the query-by-SSOT-ID operation. Then the clinic application
can determine whether the patient’s latest home address matches the clinic-assigned
county in the local database. If not, it updates the local database accordingly. On the
other hand, the pharmacy application is not affected by EPR updates. Therefore, the
pharmacy application would not subscribe to the EPR SSOT update service. Notice
that the pharmacy still relies on the SSOT for the latest EPR patient information.
However, it does not subscribe to updates since it does not need to update its own
local database when EPR data changes.
In the EPR use case, if a patient has not registered with the regional EPR SSOT, then
the clinic and pharmacy applications would not find the patient through the query-by-
criteria operation. If the EPR SSOT also provides a patient creation operation, then
the authorized clinic or pharmacy personnel can create SSOT EPRs as needed.
If the clinic or pharmacy personnel are not authorized to create SSOT EPRs, then
the un-registered patients need to register with the EPR authority later. In the mean-
time, the applications can create a local temporary file to store the patient’s data. An
optional column (TEMP FILE#) can be added to the Clinic_Patient and Pharma-
cy_Patient tables to keep track of the temporary file number. In the absence of the
SSOT-ID and the presence of a TEMP FILE#, the applications will not query the EPR
SSOT for the patient’s information, but retrieve data from the local temporary file. After
the patient registers with the EPR SSOT, the applications can insert the EPR SSOT-
ID into the Clinic_Patient and Pharmacy_Patient tables, and delete the temporary file.
At this point, the application returns to normal processing.
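This fallback can be sketched as follows; the temp_store and local_db helpers are hypothetical.

# Illustrative sketch: records without an SSOT-ID fall back to a local temporary file.
def load_patient(row, epr_ssot, temp_store):
    if row.get("SSOT_ID"):
        return epr_ssot.query_by_ssot_id(row["SSOT_ID"])
    return temp_store.read(row["TEMP_FILE#"])    # unregistered patient: read the temporary data

def complete_registration(row, ssot_id, local_db, temp_store):
    local_db.update("Clinic_Patient", {"PID": row["PID"]},
                    {"SSOT_ID": ssot_id, "TEMP_FILE#": None})
    temp_store.delete(row["TEMP_FILE#"])         # resume normal SSOT-backed processing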
The SSOT may publish additional QoS metrics, such as capacity, performance, ro-
bustness, accuracy and more [3]. Clients can design their usage of the SSOT service
according to the SSOT’s QoS metrics.
If the SSOT service is part of a larger enterprise or jurisdiction, then the SSOT ser-
vice usually has the same operational hours and maintenance schedules as its
clients. Public SSOT services, like Canada Post, are usually available 24x7.
5 Related Work
Most SSOT approaches are implemented on the data layer. Ives et al. [5] suggest synchronizing
distributed data on the data layer. Ives et al. propose a “Collaborative Data Sharing
System (CDSS) [that] models the exchange of data among sites as update propagation
among peers, which is subject to transformation (schema mapping), filtering (based
on policies about source authority), and local revision or replacement of data.” Since
data across multiple sites are continuously “updated, cleaned and annotated”, cross-
site synchronization has to deal with issues such as data correctness, schema and ter-
minology consistency, and timing. These data layer synchronization hurdles highlight
the advantage of our SSOT service model that eliminates data layer synchronization.
Others try to implement SSOT using an Enterprise Service Bus (ESB) [6], in which
a SSOT is defined. Clients duplicate the SSOT data locally, and subscribe to ESB for
SSOT updates. Whenever the SSOT is updated, clients synchronize with the SSOT by
repeating the changes in their local copies. Our SSOT model entirely avoids data dup-
lication at the clients’ sites.
Instead of a data-centric model for SSOT, some research has turned to artifact-
centric modeling [7]. An artifact is a set of name-value-pairs related to a business
process or task, where data represents business objects. In the artifact-centric model,
each artifact instance is shared between all process participants. The participants get
information from the artifact and change the state of the artifact to accomplish the
process goal. Since the artifacts are shared between process participants, access and
transaction control is necessary. Hull [8] suggests using artifact-centric hubs to facili-
tate communication and synchronization between the participants. Our SSOT service
model does not require complicated facilitation or a centralized hub.
Other researchers have proposed the Personal Information Management (PIM) [9]
model. In our model, the SSOT does not have any knowledge about its clients’ data.
However, the PIM model finds, links, groups and manages clients’ data references to
the source. PIM is a centralized data management model, while SSOT is a distributed
data management model.
Finally, Ludwig et al. [10] propose a decentralized approach to manage distributed
service configurations. The proposed solution uses RESTful services to exchange
configuration data between hosts, and a subscription mechanism to manage changes.
This approach endorses a data perspective similar to an SSOT service, where each
data source maintains self-contained autonomous data. Data is not synchronized
across multiple sites. Sources and clients are statically bound.
6 Future Work
7 Conclusion
The SSOT service model described in this paper addresses the data-synchronization
problems that arise due to data-layer replication across distributed systems. On the
other hand, the SSOT service model introduces a single point of failure in the system.
Depending on the Service Level Agreement (SLA), the SSOT may need support from
multi-site configurations or cloud-infrastructure with fail-over capability. Although
the data-replication model does not have a single point of failure, it suffers from data-
synchronization and data-inconsistency issues. Data synchronization usually involves
defining a custom peer-to-peer data exchange agreement. The custom agreement
tightly couples the data provider and consumer, which makes switching providers
very costly. Moreover, data synchronization usually happens during overnight
maintenance windows. Data becomes stale between synchronizations. Furthermore,
data replication keeps a full copy of the provider’s data at the clients’ sites. If clients
use only a small portion of the provider’s data, then the clients are wasting resources.
An added benefit of the SSOT service model is that it can control what data each
client is authorized to access, while data replication makes all data available to clients.
The SSOT service model allows the provider and clients to be loosely coupled.
Clients do not need to pledge infrastructure resources for the foreign data. The SSOT
service model also provides up-to-date data. Overall, we believe that the SSOT ser-
vice model can be used to eliminate data replication, enforce data autonomy, advocate
data self-containment, ease data maintenance and enhance data protection. In the long
term, these properties will also increase business adaptability.
Within large enterprises or government agencies, managing large amounts of data
as a single entity is problematic. Decomposing a large data set into smaller autonom-
ous and independently managed data sets can increase flexibility. As in the EPR case,
once the EPR SSOT service is established, a new patient related service can be
created without defining and creating its own patient data set. The new service does
not need to negotiate with other parties regarding data acquisition or synchronization.
The new service can loosely couple with the EPR SSOT and be established quickly.
In addition, the SSOT service model allows each individual service to be self-
contained and maintain its local database. For example, the clinic application and the
pharmacy application in the EPR use case maintain their individual local databases
without sharing data with the EPR system. This characteristic is very important in the
health industry, where patients’ privacy is closely monitored.
The SSOT service model is also applicable to the financial industry. Banking, in-
vestment and insurance businesses are often integrated under one corporation. How-
ever, legislation may require each of these businesses to be separate entities. The
SSOT service model allows the corporation to create a customer SSOT to register
each customer once. Banking, investment and insurance services can run as separate
entities, while being loosely coupled with the customer SSOT service. With the SSOT
service model, new financial services can be introduced more quickly. Similarly, the
SSOT service model can benefit any jurisdiction that provides multiple services.
References
1. Lassila, O., Swick, R.R.: World Wide Web Consortium: Resource Description Frame-
work (RDF) Model and Syntax Specification, W3C Recommendation (1998)
2. Fielding, R.T.: Chapter 5 Representational State Transfer (REST), Architectural Styles and
the Design of Network-based Software Architectures, Doctoral dissertation, University of
California, Irvine (2000)
3. Dustdar, S., Schreiner, W.: Survey on Web services Composition. International Journal on
Web and Grid Services 1, 1–30 (2005)
4. Ran, S.: A Model for Web Services Discovery With QoS. ACM SIGecom Exchanges 4(1),
1–10 (2003)
5. Ives, Z., Khandelwal, N., Kapur, A., Cakir, M.: ORCHESTRA: Rapid, Collaborative Shar-
ing of Dynamic Data. In: The 2nd Biennial Conference on Innovative Data Systems
Research (CIDR 2005), Asilomar, CA, USA (2005)
6. Schmidt, M.-T., Hutchison, B., Lambros, P., Phippen, R.: The Enterprise Service Bus:
Making service-oriented architecture real. IBM Systems Journal 44(4), 781–797 (2005)
7. Nigam, A., Caswell, N.: Business artifacts: An approach to operational specification. IBM
Systems Journal 42(3), 428–445 (2003)
8. Hull, R.: Artifact-Centric Business Process Models: Brief Survey of Research Results and
Challenges. In: OTM 2008, Monterrey, Mexico (2008)
9. Jones, W.: Personal Information Management. Annual Review of Information Science and
Technology 41(1), 453–504 (2007)
10. Ludwig, H., Laredo, J., Bhattacharya, K., Pasquale, L., Wassermann, B.: REST-Based
Management of Loosely Coupled Services. In: The 18th International Conference on
World Wide Web (WWW 2009), Madrid, Spain (2009)
11. HL7 Health Level Seven International, http://www.hl7.org
12. RosettaNet (1999), http://www.rosettanet.org
13. EDIFACT, United Nations Directories for Electronic Data Interchange for Administration,
Commerce and Transport,
http://www.unece.org/trade/untdid/welcome.htm
Model for Service License in API Ecosystems
1 Introduction
The next section presents related work. Section 3 describes our proposed model
and discusses how it can be used to automate the composition of licenses. Section 4
describes the implementation of an SLA analyzer that uses our model. Section 5 con-
cludes and outlines future areas of research.
2 Related Work
Software licenses [2] are centered on the capacity-based model, where the focus is on
capacity elements, such as CPU. They are static and data-centric, and computed when
a new deployment is complete. They do not capture QoS and SLA guarantees.
Web-Service License Agreement (WSLA) framework [3] was designed to capture
involved parties, SLA parameters, their metrics and algorithms, and service licensing
objectives and the corresponding actions. It does not provide support for automati-
cally creating agreements and it does not capture business and legal terms.
Web Service Agreement Specification (WS-Agreement) [4] provides terms and
language to describe services, their properties, and associated guarantees. It is not a
generic tool for conflict classification. There is no mechanism for deriving a service
delivery system based on it, nor for claiming a service against the agreement.
Existing efforts addressed compatibility of functional [5,6,7] and non-functional
[8, 9, 10] parameters as part of service selection and matching process.
A number of commercial solutions aim to help consumers understand service li-
censes [11,12,13]. Our prior work presents a classification of common terms and con-
ditions and describes a terms of service management console [14].
In this section, we introduce a service license meta-model, shown in Figure 1, which
consists of four layers: Information Source, Property Function, License Metric, and License
Terms and Conditions. Details about each layer of the metamodel follow.
The information source meta-model provides constructs for defining the service dep-
loyment and execution information. The deployment information relates to the hard-
ware assets on which software is installed. The execution information describes runtime
events that relate to the execution of software. A runtime event may indicate how many in-
stances are running on a physical server, when an instance is started and terminated,
and how many CPU cycles, how much memory, etc. are consumed. Runtime events also report
the business usage of the service. For example, when a business transaction is in-
itiated, an event with transaction details is created.
With business users in mind, we adopt an object-oriented model to construct the infor-
mation metamodel. On the one hand, the deployment information is usually persisted
in a relational database. Therefore, the proposed metamodel provides constructs to map
the deployment information from a relational to an object-oriented representation. In most
cases, the deployment information model is static and common across organi-
zations and service vendors. Therefore, in practice, most organizations can adopt a
pre-defined information model, without creating a new information model from
scratch.
On the other hand, the execution information usually consists of live events that need to be
processed before being persisted in storage. The information metamodel provides constructs
to define an event catalog that can include a collection of event types. It should be
noted that the events related to service usage can differ across
service vendors. Usually, the service vendors provide the definitions of these events,
along with event dispatchers that can detect and emit related events at runtime.
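A minimal illustration of these two kinds of information sources, with invented class and field
names, could look as follows.

from dataclasses import dataclass

# Illustrative sketch of the information-source layer: deployment data is mapped from
# relational rows to objects, while execution data arrives as typed runtime events.
@dataclass
class DeploymentEntity:
    entity_name: str       # e.g. "PhysicalServer"
    attributes: dict       # typed attributes mapped from a relational table

@dataclass
class ExecutionEvent:
    event_name: str        # e.g. "TransactionInitiated", emitted by a vendor-provided dispatcher
    payload: dict          # e.g. {"transaction_id": "...", "cpu_cycles": 120000}

server = DeploymentEntity("PhysicalServer", {"hostname": "srv01", "cpu_cores": 16})
event = ExecutionEvent("TransactionInitiated", {"transaction_id": "t-42"})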
Fig. 1. Service license meta-model (layers: Information Source, Property Function, License
Metric, and License Terms and Conditions; classes include Action, Condition, Price Calculation,
License Capacity Calculation, License Capacity Unit, License Term, Expression, Formula-based
Function, Table-based Function, Boolean Function, TableRow, Property Function, Metric, ECA
Rule, Deployment Env, Deployment Entity, Execution Event, Data Source, TypedAttribute, and
Event Source)
License metrics fall into two types: capacity-based and usage-based. A capacity-based license
metric usually calculates the license requirement by considering the capacity of the host that
installs or executes the service. The license requirement will not change unless there are
hardware upgrades. An example is a license metric that measures the total number of CPU cores
of the physical server to which the software is deployed. Unlike capacity-based license metrics,
usage-based license metrics are calculated from live execution events in real time. An ex-
ample usage-based license metric measures the number of transactions the service
executes.
In general, both types of license metrics consist of a License Unit and a License Cal-
culation. The license unit defines the denominations that are used to measure the license
requirement. In the case of capacity-based license metrics, the license unit can be
defined as a formula-based function, a fixed value, or a property function. For ex-
ample, a license metric dubbed ”NumberOfCore” uses the total number of CPU
cores in the physical server to measure the license entitlement.
In this case the license capacity unit is the fixed string “core”. In another example, a
license metric “UserTier” uses the number of users as input to derive a user tier:
from one to ten users is considered ”tier 1”, and from eleven to one hundred users is
considered “tier 2”. In this case, a table-based function that specifies the mapping be-
tween numbers of users and tiers can be used to define the license capacity unit.
Similar to license capacity unit, license capacity calculation can also be defined by
formula-based or table-based functions. In the example of “NumberOfCore”, the for-
mula expression \(\sum_{i=0}^{n} CPU_i.numberOfCore\), which sums up all the CPU cores in the
physical server, can be used to define the license capacity calculation. In the expression,
\(CPU_i.numberOfCore\) indicates the number of cores in each CPU.
In another example, the license metric Processor Value Unit (PVU) maps processor
properties such as processor vendor, brand, type and model number to a numerical val-
ue. In this case, the license capacity calculation is defined as a table-based function
(shown in Table 1).
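The two kinds of license capacity calculation can be illustrated as follows; the PVU values in
the lookup table are placeholders rather than actual PVU figures.

# Illustrative sketch of a formula-based and a table-based license capacity calculation.
def number_of_core(cpus):
    # Formula-based: sum of cores over all CPUs in the physical server.
    return sum(cpu["numberOfCore"] for cpu in cpus)

PVU_TABLE = {("ExampleVendor", "ExampleBrand"): 70}    # placeholder rows, not real PVU values

def pvu(processor):
    # Table-based: map processor properties to a per-core numerical value.
    per_core = PVU_TABLE.get((processor["vendor"], processor["brand"]), 100)
    return per_core * processor["numberOfCore"]

print(number_of_core([{"numberOfCore": 8}, {"numberOfCore": 8}]))   # 16 "core" units
print(pvu({"vendor": "ExampleVendor", "brand": "ExampleBrand", "numberOfCore": 8}))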
The above two constraints indicate that a local data copy is retained by the service; how-
ever, the service is not liable for data loss.
Brand usage constraints:
Service.Deployment.BrandPermission=”Not specified”;
Service.Deployment.LogoUsage=”Consumer logo”
The above two constraints indicate that when deploying the service, brand permission
is not specified and the consumer’s logo is automatically used.
4 Model in Use
Using the proposed model, we have developed the SLA analyzer system, which ana-
lyzes service licenses based on the agreed and actual availability. It also has the
capability to analyze licenses based on various other parameters, such as business
constraints (e.g., user eligibility and brand permission). Users can select one or more
APIs and track their violations in a given time window, as shown in Figure 1. Details
about the input and output of the SLA API are shown in Figure 2.
Output format:
{"Results":[{"ServiceID":"<Service ID>","SLA":<Agreed SLA>,"CustomerID":"<Customer ID>",
"QoSAlerts":[{"TimeStamp":"<TimeStamp>","QoSAlert":<true|false>,"QoSValue":<Actual SLA>}]}]}

Sample Input/Output

Input:
{"apiID":["62","209","19"],"customerId":"10","timeWindow":3}

Output:
{"Results":[
  {"ServiceID":"62","SLA":99.9,"CustomerID":"10",
   "QoSAlerts":[{"TimeStamp":"2013-11-05","QoSAlert":true,"QoSValue":98.0},
                {"TimeStamp":"2013-11-06","QoSAlert":false,"QoSValue":100.0},
                {"TimeStamp":"2013-11-07","QoSAlert":false,"QoSValue":100.0}]},
  {"ServiceID":"209","SLA":99.0,"CustomerID":"10",
   "QoSAlerts":[{"TimeStamp":"2013-11-05","QoSAlert":true,"QoSValue":92.0},
                {"TimeStamp":"2013-11-06","QoSAlert":true,"QoSValue":88.0},
                {"TimeStamp":"2013-11-07","QoSAlert":true,"QoSValue":79.0}]},
  {"ServiceID":"19","SLA":99.7,"CustomerID":"10",
   "QoSAlerts":[{"TimeStamp":"2013-11-05","QoSAlert":true,"QoSValue":96.0},
                {"TimeStamp":"2013-11-06","QoSAlert":false,"QoSValue":100.0},
                {"TimeStamp":"2013-11-07","QoSAlert":false,"QoSValue":100.0}]}
]}
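For illustration, a client could post-process such a response to list only the time stamps at
which the agreed SLA was violated; the short Python sketch below assumes exactly the JSON
structure shown above.

import json

# Illustrative sketch: extract QoS alerts from an SLA analyzer response.
sample_output = '''{"Results":[{"ServiceID":"62","SLA":99.9,"CustomerID":"10",
 "QoSAlerts":[{"TimeStamp":"2013-11-05","QoSAlert":true,"QoSValue":98.0},
              {"TimeStamp":"2013-11-06","QoSAlert":false,"QoSValue":100.0}]}]}'''

for result in json.loads(sample_output)["Results"]:
    for alert in result["QoSAlerts"]:
        if alert["QoSAlert"]:    # actual availability fell below the agreed SLA
            print(result["ServiceID"], alert["TimeStamp"], alert["QoSValue"], "<", result["SLA"])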
References
1. Pautasso, C., Zimmermann, O., Leymann, F.: Restful web services vs. big web services:
making the right architectural decision. In: 17th International Conference on World Wide
Web (2008)
2. Minkyong, K., Han, C., Munson, J., Lei, H.: Management-Based License Discovery for
the Cloud. In: International Conference of Service Oriented Computing (2012)
3. Keller, A., Ludwig, H.: The WSLA Framework: Specifying and Monitoring Service Level
Agreements for Web Services. Journal of Network System Management (2003)
4. Web Services Agreement Specification. Available at:
http://www.ogf.org/documents/GFD.107.pdf
5. Liu, Y., Ngu, A.H., Zeng, L.Z.: QoS computation and policing in dynamic web service se-
lection. World Wide Web (2004)
6. Karmarkar, A., Walmsley, P., Haas, H., Yalcinalp, L.U., Liu, K., Orchard, D., Pasley, J.:
Web service contract design and versioning for SOA. Prentice Hall (2009)
7. Verma, K., Akkiraju, R., Goodwin, R.: Semantic Matching of Web Service Policies. In:
Second International Workshop on Semantic and Dynamic Web Processes (2005)
8. Reiff-Marganiec, S., Yu, H.Q., Tilly, M.: Service Selection Based on Non-functional
Properties. In: International Conference of Service Oriented Computing (2007)
9. Web Services Policy (WS-Policy), http://www.w3.org/Submission/WS-
Policy/
10. Gangadharan, G.R., Comerio, M., Truong, H.-L., D’Andrea, V., De Paoli, F., Dustdar, S.:
LASS – License Aware Service Selection: Methodology and Framework. In: International
Conference of Service Oriented Computing (2008)
11. Digital Trends. Terms and Conditions. Available at:
http://www.digitaltrends.com/topic/terms-and-conditions/
12. 500px’s Terms Of Service Are Kind Of Awesome. TechCrunch Article. Available at:
http://techcrunch.com/2012/04/12/500pxs-terms-ofservice-are-
kind-of-awesome/
13. Terms of Service Didn’t Read. Available at: http://tosdr.org
14. Vukovic, M., Rajagopal, S., Laredo, J.: API Terms and Conditions as a Service. In: IEEE
Service Computing Conference (2014)
Author Index