Workflow Management System
The workflow management system consists of a workflow engine and a resource broker. Two scenarios are depicted: one where the Aneka platform is used in its entirety to complete the workflow, and the other where Amazon EC2 is used to supplement a local cluster when there are insufficient resources to meet the QoS requirements of the application.
This relieves the WfMS of the responsibility of managing and allocating resources directly; it simply negotiates the required resources with Aneka.
Aneka also provides a set of Web services for service negotiation, job
submission, and job monitoring.
The WfMS would orchestrate the workflow execution by scheduling jobs
in the right sequence to the Aneka Web Services.
In this case, the data would take the form of a set of files, including the
application binaries.
These data can be uploaded by the user prior to execution, and they can be
stored in storage facilities offered by cloud services for future use. The
WfMS then forwards workflow tasks to Aneka’s scheduler via the Web
service interface.
These tasks are subsequently examined for required files, and the storage
service is instructed to stage them in from the remote storage server, so that
they are accessible by the internal network of execution nodes. The
execution begins by scheduling tasks to available execution nodes (also
known as worker nodes).
The workers download any required files for each task they execute from the storage server, execute the application, and upload the output files produced by the execution back to the storage server.
These files are then staged out to the remote storage server so that they are
accessible by other tasks in the workflow managed by the WfMS. This
process continues until the workflow application is complete.
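The worker-side cycle just described can be summarized in a few lines of code. The sketch below is illustrative only: the task and storage objects, and their fetch/store methods, are assumed stand-ins for the system's actual staging mechanisms, not a real API.

```python
import subprocess

def run_task(task, storage):
    # Stage-in: fetch each required input file from the storage server.
    for name in task.input_files:
        storage.fetch(name, dest=task.workdir)
    # Execute the application binary inside the task's working directory.
    subprocess.run([task.binary, *task.args], cwd=task.workdir, check=True)
    # Stage-out: upload the produced output files back to the storage
    # server, from where they can be staged out for downstream tasks.
    for name in task.output_files:
        storage.store(f"{task.workdir}/{name}")
```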
The second scenario describes a situation in which the WfMS has greater
control over the compute resources and provisioning policies for executing
workflow applications.
In this second scenario, the WfMS interacts directly with the resources
provisioned.
When using Aneka, however, all interaction takes place via the Web service
interface.
Workflow management systems are responsible for managing and executing these workflows.
The Cloudbus workflow management system consists of components that are responsible for handling tasks, data, and resources.
User Interface:
o workflow composition,
o workflow execution planning,
o submission, and
o monitoring.
The Core System:
The core components are responsible for managing the execution of workflows. They
facilitate the translation of high-level workflow descriptions into task and data objects.
These objects are used by the execution subsystem. The scheduling component applies
user-selected scheduling policies and plans to workflows at various stages of their execution.
Plug-ins:
The plug-ins support workflow execution on different environments and platforms. They
are used for querying tasks, transferring data, tracking the execution status of tasks and
applications, and measuring energy consumption.
Resources form the bottom layer of the architecture, which includes clusters,
global grids, and clouds. The resource managers may communicate with the following:
o Market Maker,
o Scalable Application Manager, and
o InterCloud services for global resource management.
First, Aneka serves as a useful tool for utilizing clouds, providing platform abstraction and
dynamic provisioning.
Second, we describe later in the chapter a case study detailing the use of Aneka to execute a
scientific workflow application on clouds.
1. Aneka
Aneka is a distributed middleware for deploying platform-as-a-service (PaaS) offerings (Figure
12.3).
Developed at CLOUDS Lab, University of Melbourne, Aneka is the result of years of research on
cluster, grid, and cloud computing for high-performance computing (HPC) applications.
Aneka, which is both a development and runtime environment, is available for public use (for a
fee); it can be installed on corporate networks or dedicated clusters, or hosted on
infrastructure clouds like Amazon EC2.
In comparison, similar PaaS services such as Google AppEngine [19] and Windows Azure [20]
are in-house platforms hosted on infrastructures owned by the respective companies.
Aneka was developed on Microsoft's .NET Framework 2.0 and is compatible with other
implementations of the ECMA 335 standard [21], such as Mono.
Aneka can run on popular platforms such as Microsoft Windows, Linux, and Mac OS X,
harnessing the collective computing power of a heterogeneous network.
The runtime environment consists of a collection of Aneka containers running on physical or
virtualized nodes.
Each of these containers can be configured to play a specific role such as scheduling or
execution.
The Aneka distribution also provides a set of tools for administrating the cloud, reconfiguring
nodes, managing users, and monitoring the execution of applications.
The Aneka service stack provides services for infrastructure management, application
execution management, accounting, licensing, and security.
Aneka’s Dynamic Resource Provisioning service enables horizontal scaling depending on the
overall load in the cloud.
The platform is thus elastic in nature and can provision additional resources on demand from
external physical or virtualized resource pools in order to meet the QoS requirements of
applications. In a typical scenario, Aneka would acquire new virtualized resources from external
clouds such as Amazon EC2 in order to meet the minimum waiting time of applications submitted
to it. Such a scenario arises when the current load in the cloud is high and there are not enough
available resources to process all jobs in a timely manner.
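A provisioning policy of this kind can be sketched as a simple rule: scale out when queued jobs have waited longer than the QoS bound and no node is idle, and release resources when the queue drains. The threshold, the provisioner object, and its acquire/release methods below are assumptions for illustration, not Aneka's actual API.

```python
MAX_WAIT_SECONDS = 300  # assumed QoS bound on job waiting time

def rebalance(queue, idle_nodes, provisioner, now):
    # Longest time any queued job has been waiting so far.
    oldest_wait = max((now - job.submitted_at for job in queue), default=0)
    if oldest_wait > MAX_WAIT_SECONDS and not idle_nodes:
        # Scale out: request an extra virtualized resource, e.g. from EC2.
        provisioner.acquire(count=1, source="amazon-ec2")
    elif idle_nodes and not queue:
        # Scale back in: release idle resources to stop paying for them.
        provisioner.release(idle_nodes)
```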
The development environment provides a rich set of APIs for developing applications that can
utilize free resources of the underlying infrastructure.
These APIs expose different programming abstractions, such as the task model, thread
model, and MapReduce.
The task programming model is of particular importance to the current discussion.
It models independent "bag of tasks" (BoT) applications, which are composed of a collection of
work units that are independent of one another and may be executed in any order.
One of the benefits of the task programming model is its simplicity, making it easy to run legacy
applications on the cloud.
An application using the task model composes one or more task instances and forwards them as
work units to the scheduler.
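The independence of the work units is the key property of the model. The short Python sketch below illustrates it: a process pool stands in for the scheduler, and echo stands in for a legacy application binary. Aneka's real task API is a .NET interface, not the one shown here.

```python
from concurrent.futures import ProcessPoolExecutor
import subprocess

# Each work unit wraps a legacy command line; "echo" is a stand-in binary.
tasks = [["echo", f"processing frame {i}"] for i in range(10)]

def execute(cmd):
    # Units share no state, so any execution order gives the same result.
    return subprocess.run(cmd, capture_output=True, text=True).stdout

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # The pool plays the scheduler's role, dispatching independent
        # work units to whichever worker is free.
        results = list(pool.map(execute, tasks))
```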
The scheduling service currently supports the First-In-First-Out, First-In-First-Out with
Backfilling, Clock-Rate Priority, and Preemption-Based Priority Queue scheduling
algorithms.
The runtime environment also provides two specialized services to support this model:
1. the task scheduling service and
2. the task execution service.
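Of the algorithms listed above, FIFO with backfilling deserves a closer look. The sketch below shows a single pass of EASY-style backfilling under stated assumptions (user-supplied runtime estimates and a list of running jobs with estimated end times); it illustrates the general technique rather than Aneka's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Job:
    nodes: int          # nodes required
    est_runtime: float  # user-supplied runtime estimate (seconds)

def easy_backfill_pass(queue, free_nodes, running, now):
    """One scheduling pass. `running` holds (est_end_time, nodes) pairs."""
    started = []
    # 1. Start jobs strictly from the head while they fit (plain FIFO).
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.est_runtime, job.nodes))
        started.append(job)
    if not queue:
        return started
    # 2. The head job is blocked: find the earliest time ("shadow time")
    #    at which enough nodes free up, using the runtime estimates.
    head, avail, shadow, spare = queue[0], free_nodes, None, 0
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow, spare = end, avail - head.nodes
            break
    if shadow is None:
        return started  # the head job can never fit; stop here
    # 3. Backfill: a later job may jump the queue only if it cannot delay
    #    the head job -- it finishes before the shadow time, or it fits
    #    in the nodes left spare even after the head job starts.
    for job in list(queue[1:]):
        fits_now = job.nodes <= free_nodes
        harmless = now + job.est_runtime <= shadow or job.nodes <= spare
        if fits_now and harmless:
            queue.remove(job)
            free_nodes -= job.nodes
            if job.nodes <= spare:
                spare -= job.nodes
            running.append((now + job.est_runtime, job.nodes))
            started.append(job)
    return started
```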
The storage service provides a temporary repository for application files—that is, input files that
are required for task execution, and output files that are the result of execution.
Prior to dispatching work units, any files required are staged-in to the storage service from the
remote location.
This remote location can be either the client machine, a remote FTP server, or a cloud storage
service such as Amazon S3.
The work units are then dispatched to executors, which download the files before execution.
Any output files produced as a result of the execution are uploaded back to the storage service.
From here they are staged-out to the remote storage location.
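With Amazon S3 as the remote location, stage-in and stage-out reduce to object downloads and uploads. The boto3 calls below are the real S3 client API; the bucket name, keys, and local paths are placeholder values.

```python
import boto3

s3 = boto3.client("s3")

# Stage-in: pull an input file from remote storage to the storage service host.
s3.download_file("workflow-bucket", "inputs/genome.dat", "/staging/genome.dat")

# ... executors download the staged files, run the task, upload outputs ...

# Stage-out: push a result back so later workflow tasks can reach it.
s3.upload_file("/staging/alignment.out", "workflow-bucket", "outputs/alignment.out")
```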
2. Aneka Web Services
Aneka exposes three SOAP Web services for
1. service negotiation,
2. reservation, and
3. task submission, as depicted in Figure 12.4.
The negotiation and reservation services work in concert, providing interfaces for
negotiating resource use and reserving resources in Aneka for predetermined timeslots.
As such, these services are only useful when Aneka has limited resources to work with
and no opportunities for provisioning additional resources.
The task Web service provides a SOAP interface for executing jobs on Aneka.
Based on the task programming model, this service allows remote clients to submit jobs,
monitor their status, and abort jobs.
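A remote client can drive such a SOAP interface with any standards-compliant toolkit. The sketch below uses Python's zeep library; the endpoint URL and the operation names (SubmitJob, QueryJob, AbortJob) are placeholders, and the actual contract is defined by the WSDL that the Aneka task Web service publishes.

```python
from zeep import Client

# Hypothetical endpoint; the real WSDL location is deployment-specific.
client = Client("http://aneka-master:9090/TaskService?wsdl")

job_id = client.service.SubmitJob(applicationId="wf-42",
                                  command="render.exe scene.dat")
status = client.service.QueryJob(jobId=job_id)   # poll for job status
if status == "FAILED":
    client.service.AbortJob(jobId=job_id)        # abort on failure
```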
3. General Approach
Traditional WfMSs were designed with a centralized architecture and were thus tied to a single
machine.
Moving workflow engines to clouds requires
(a) architectural changes and
(b) the integration of cloud management tools.
Architectural Changes.
Most components of a WfMS can be separated from the core engine so that they can be
executed on different cloud services.
Each separated component could communicate with a centralized or replicated workflow
engine using events.
The manager is responsible for coordinating the distribution of load to its subcomponents,
such as the Web server, persistence, monitoring units, and so forth.
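Event-based communication between the separated components and the engine can be as simple as publishing status messages to a shared queue. The sketch below uses an in-process queue as a stand-in for a network message broker; the event names and payload shape are illustrative, not a fixed protocol.

```python
import json
import queue

event_bus = queue.Queue()  # stand-in for a network message broker

def publish(source, kind, payload):
    # Components (Web server, monitor, persistence) emit events like this.
    event_bus.put(json.dumps({"source": source, "event": kind, "data": payload}))

publish("monitoring-unit", "TASK_COMPLETED", {"task": "t17", "exit_code": 0})

# The (centralized or replicated) engine consumes events and reacts.
event = json.loads(event_bus.get())
```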
In our WfMS, we have separated the components that form the architecture into the
following:
o user interface,
o core, and
o plug-ins.
The user interface can now be coupled with a Web server running on a "large" cloud instance
that can handle an increasing number of users.
Web requests from users accessing the WfMS via a portal are thus offloaded to a different set
of resources.
Similarly, the core and plug-in components can be hosted on different types of instances
separately.
Depending on the size of the workload from users, these components could be migrated or
replicated to other resources, or reinforced with additional resources to satisfy the increased load.
Thus, employing distributed modules of the WfMS on the basis of application requirements helps
scale the architecture.
Integration of Cloud Management Tools.
As the WfMS is broken down into components to be hosted across multiple cloud
resources, we need a mechanism to
(a) access, transfer, and store data and
(b) enable and monitor executions that can utilize this approach of scalable distribution
of components.
The cloud service provider may provide APIs and tools for discovering the VM instances
that are associated with a user's account.
Because various types of instances can be dynamically created, their characteristics such
as CPU capacity and amount of available memory are a part of the cloud service
provider’s specifications.
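With Amazon EC2, for example, this discovery step is a single API call. The boto3 snippet below uses the real describe_instances operation; the region is a placeholder, and other providers expose analogous calls.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

# Enumerate the VM instances associated with the account; the instance
# type encodes CPU and memory capacity per the provider's specifications.
for reservation in ec2.describe_instances()["Reservations"]:
    for inst in reservation["Instances"]:
        print(inst["InstanceId"], inst["InstanceType"], inst["State"]["Name"])
```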
Similarly, for data storage and access, a cloud may provide data sharing, data movement,
and access rights management capabilities to users' applications.
Cloud measurement tools may be in place to account for the amount of data and
computing power used, so that users are charged on a pay-per-use basis.
A WfMS now needs to access these tools to discover and characterize the resources
available in the cloud.
It also needs to interpret the access rights (e.g., access control lists provided by
Amazon), use the data movement APIs, and share mechanisms between VMs to fully
utilize the benefits of moving to clouds.
In other words, traditional catalog services such as the Globus Monitoring and
Discovery Service (MDS), Replica Location Services, Storage Resource Brokers,
Network Weather Service, and so on could easily be replaced by more user-friendly
and scalable tools and APIs associated with a cloud service provider.
We describe some of these tools in the following section.
The range of tools and services offered by cloud providers plays an important role in integrating
WfMSs with clouds (Figure 12.5).
Such services can facilitate the deployment, scaling, execution, and monitoring of workflow
systems.
This section discusses some of the tools and services offered by various service providers that can
complement and support WfMSs. A WfMS manages dynamic provisioning of compute and storage
resources in the cloud with the help of tools and APIs provided by service providers.
The provisioning is required to dynamically scale up/down according to application
requirements.
For instance, data-intensive workflow applications may require a large amount of disk space for
storage.
A WfMS could provision dynamic volumes of large capacity that could be shared across all
instances of VMs (similar to snapshots and volumes provided by Amazon).
Similarly, for compute-intensive tasks in a workflow, a WfMS could provision specific instances
that would help accelerate the execution of these tasks.
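Both provisioning steps map onto standard provider calls. The boto3 sketch below creates a large EBS volume for a data-intensive stage and a compute-optimized instance for a CPU-bound task; the sizes, instance type, and AMI ID are placeholder values.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Large volume for a data-intensive workflow stage (500 GiB scratch space).
volume = ec2.create_volume(Size=500, AvailabilityZone="us-east-1a",
                           VolumeType="gp3")

# Compute-optimized instance to accelerate a compute-intensive task.
instances = ec2.run_instances(ImageId="ami-12345678",  # placeholder AMI
                              InstanceType="c5.4xlarge",
                              MinCount=1, MaxCount=1)
```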
A WfMS implements scheduling policies to assign tasks to resources based on applications’
objectives.
This task-resource mapping is dependent on several factors: compute resource capacity,
application requirements, user’s QoS, and so forth.
Based on these objectives, a WfMS could also direct a VM provisioning system to consolidate
data center loads by migrating VMs so that it could make scheduling decisions based on locality
of data and compute resources.
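A locality-aware mapping policy can be sketched as follows: among the resources with enough free capacity, prefer the one that already holds the most input bytes for the task. The data structures below are illustrative assumptions.

```python
def pick_resource(task, resources, bytes_on):
    """bytes_on[(resource_id, file)] -> bytes of that file already cached."""
    def local_bytes(res):
        return sum(bytes_on.get((res.id, f), 0) for f in task.input_files)
    # Only resources with enough free cores are candidates.
    candidates = [r for r in resources if r.free_cores >= task.cores]
    # Prefer the candidate holding the most input data locally.
    return max(candidates, key=local_bytes, default=None)
```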
A persistence mechanism is often important in workflow management systems for managing
metadata such as available resources, job queues, job status, and user data, including large input
and output files.
Technologies such as Amazon S3, Google’s BigTable, and the Windows Azure Storage
Services can support most storage requirements for workflow systems, while also being
scalable, reliable, and secure.
If large quantities of user data are being dealt with, such as a large number of brain images used
in functional magnetic resonance imaging (fMRI) studies [12], transferring them online can be
both expensive and time-consuming.
In such cases, shipping the data by traditional post can prove cheaper and faster. Amazon's AWS
Import/Export is one such service; it aims to speed up data movement by transferring large
amounts of data on portable storage devices.
The data are shipped to/from Amazon and offloaded into/from S3 buckets using Amazon’s high-
speed internal network.
The cost savings can be significant when transferring data on the order of terabytes. Most cloud
providers also offer services and APIs for tracking resource usage and the costs incurred.
This can complement workflow systems that support budget-based scheduling by utilizing real-
time data on the resources used, the duration, and the expenditure.
This information can be used both for making scheduling decisions on subsequent jobs and for
billing the user at the completion of the workflow application.
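A budget-aware dispatch rule can combine this usage data with a simple cost projection. In the sketch below the hourly rate, pool objects, and field names are assumptions; spent_so_far would come from the provider's usage-tracking API.

```python
HOURLY_RATE = 0.34  # assumed $/hour for the cloud instance type in use

def choose_target(task, spent_so_far, budget, local_pool, cloud_pool):
    # Project total spend if this task runs on a paid cloud resource.
    projected = spent_so_far + (task.est_runtime / 3600.0) * HOURLY_RATE
    if projected <= budget and cloud_pool.has_capacity():
        return cloud_pool   # budget allows the faster, paid resource
    return local_pool       # otherwise fall back to free local resources
```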
Cloud services such as Google App Engine and Windows Azure provide platforms for building
scalable interactive Web applications.
This makes it relatively easy to port the graphical components of a workflow management system
to such platforms while benefiting from their inherent scalability and reduced administration.
For instance, such components deployed on Google App Engine can utilize the same scalable
systems that drive Google applications, including technologies such as BigTable and GFS.