Cloud Computing: Module 2

Module 2: Syllabus

Cloud Computing: Application Paradigms: Challenges of cloud computing, Architectural styles of cloud computing, Workflows: coordination of multiple activities, Coordination based on a state machine model: the ZooKeeper, The MapReduce programming model, A case study: the GrepTheWeb application, Cloud for science and engineering, High-performance computing on a cloud, Cloud computing for biology research, Social computing, digital content and cloud computing.

Cloud Computing: Applications and Paradigms


Cloud computing supports a wide range of applications and is based on distinct paradigms that
define how cloud services are structured and delivered.

Cloud Applications
Cloud applications are software programs that run on cloud infrastructure and are accessed over
the internet. They benefit from on-demand scalability, pay-per-use billing, and ubiquitous
accessibility.

1. Scientific Applications

 Used in physics, biology, astronomy, climate modeling, etc.

 Require large-scale computation and storage.

 Example: CERN, which processes petabytes of particle collision data.

2. Business Applications

 Include ERP, CRM, e-commerce platforms, and data analytics tools.

 Reduce infrastructure costs and enable global access.

 Example: Salesforce, SAP on cloud, Zoho CRM.

3. Social Networking and Media Applications

 Handle massive user-generated content (photos, videos, messages).

 Examples: Facebook, Instagram, YouTube, all relying on cloud infrastructure to store and
deliver content efficiently.

4. Mobile Applications

 Cloud backends support mobile apps with storage, notifications, and synchronization.

 Example: iCloud, Google Drive, or Firebase for backend-as-a-service.

5. Real-time Analytics

 Applications that process live data streams for immediate insights.

 Used in finance, fraud detection, IoT monitoring, and e-commerce personalization.


Cloud Computing Paradigms
Cloud computing is built on multiple paradigms that determine how services are consumed and
delivered:

1. Infrastructure as a Service (IaaS)

 Provides virtualized computing resources (VMs, storage, networks).

 Users manage OS and applications.

 Example: Amazon EC2, Google Compute Engine.

2. Platform as a Service (PaaS)

 Offers development platforms and tools to build applications.

 Abstracts infrastructure management.

 Example: Google App Engine, Microsoft Azure App Services.

3. Software as a Service (SaaS)

 Delivers software applications over the web.

 Users interact via browsers; no need to install or maintain.

 Example: Google Workspace, Dropbox, Salesforce.

4. Data as a Service (DaaS)

 Provides data on demand to users regardless of organizational boundaries.

 Useful for data marketplaces or data aggregation platforms.

5. Everything as a Service (XaaS)

 A broad concept encompassing all the above.

 Includes security as a service, AI as a service, etc.

Conclusion

Cloud computing enables a wide range of real-world applications by leveraging distinct service
paradigms. These paradigms define the responsibilities of the cloud provider vs. the user and
facilitate scalable, cost-effective, and global computing solutions.

Challenges of Cloud Computing


Cloud computing offers significant advantages, such as scalability, cost-efficiency, and flexibility.
However, it also introduces several technical, operational, and security challenges. Below are the key
challenges discussed in the document:

1. Performance Isolation
 Issue: Shared infrastructure (multi-tenancy) can lead to performance fluctuations.

 Impact: Virtual machines (VMs) may experience unpredictable latency or bandwidth due to
noisy neighbors.

 Example: A VM running a high-priority task may slow down if another VM on the same host
consumes excessive resources.

2. Reliability and Fault Tolerance

 Issue: Large-scale systems are prone to node failures.

 Impact: Applications must handle failures gracefully without data loss.

 Example: In Amazon’s 2012 outage, a power failure in one data center caused cascading
failures due to flawed recovery mechanisms.

3. Security and Privacy

 Issue: Multi-tenancy increases risks like data breaches, insider threats, and unauthorized
access.

 Impact: Sensitive data stored in public clouds may violate compliance regulations (e.g.,
GDPR, HIPAA).

 Example: A malicious insider in a cloud provider could access confidential enterprise data.

4. Data Transfer Bottlenecks

 Issue: Moving large datasets to/from the cloud is slow over standard networks.

 Impact: High-latency WANs (vs. LANs) hinder real-time applications.

 Example: Transferring 1 TB (8 × 10^12 bits) over a 10 Mbps link takes about 8 × 10^5 seconds, roughly 9 days; at 1 Gbps the same transfer takes about 8,000 seconds, or ~2.2 hours.
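The arithmetic behind such estimates is easy to check with a small helper (a sketch; the figures are illustrative):

def transfer_hours(terabytes, gbps):
    # 1 TB = 8 * 10^12 bits; link rate given in gigabits per second.
    bits = terabytes * 8e12
    return bits / (gbps * 1e9) / 3600

print(transfer_hours(1, 1.0))   # ~2.2 hours at 1 Gbps
print(transfer_hours(1, 0.01))  # ~222 hours (about 9 days) at 10 Mbps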

5. Vendor Lock-In

 Issue: Proprietary APIs and services make migration between providers difficult.

 Impact: Businesses risk dependency on a single provider (e.g., AWS, Azure).

 Solution: Standardization efforts (e.g., NIST) aim to improve interoperability.

6. Resource Management & Scalability

 Issue: Dynamically scaling resources while maintaining efficiency is complex.


 Impact: Over-provisioning wastes money; under-provisioning causes performance
degradation.

 Example: Auto-scaling in AWS must balance cost and response time for variable workloads.
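A toy target-tracking policy of the kind auto-scalers implement illustrates the balance; the function and thresholds below are illustrative, not an AWS API:

def desired_instances(current, cpu_utilization, target=0.6, lo=1, hi=100):
    # Scale the fleet so average CPU utilization moves toward the target.
    wanted = round(current * cpu_utilization / target)
    return max(lo, min(hi, wanted))

print(desired_instances(4, 0.90))  # -> 6: scale out under load
print(desired_instances(4, 0.30))  # -> 2: scale in when idle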

7. Software Licensing

 Issue: Traditional per-CPU/core licensing models conflict with cloud elasticity.

 Impact: Costs may spike if licenses don’t align with pay-per-use cloud models.

 Example: Running licensed software on 100 VMs temporarily could incur prohibitive fees.

8. Network Latency & Bandwidth

 Issue: Cloud applications often rely on high-speed interconnects, but WANs introduce delays.

 Impact: Communication-heavy applications (e.g., HPC, MPI-based simulations) suffer.

 Example: EC2’s latency (145 μs) is 70× slower than dedicated supercomputers (e.g., Carver
at 2.1 μs).

9. Dynamic Workflow Management

 Issue: Coordinating interdependent tasks in distributed environments is error-prone.

 Impact: Deadlocks, race conditions, or unreachable states can disrupt workflows.

 Example: In ZooKeeper, consensus protocols (ZAB, a Paxos-like atomic broadcast) are needed to synchronize distributed tasks.

10. Ethical and Governance Concerns

 Issue: Lack of transparency in data handling and accountability.

 Impact: Users may lose control over their data (e.g., stored in foreign jurisdictions).

 Example: Government cloud initiatives must ensure data sovereignty and auditability.

Architectural Styles of Cloud Computing


The document discusses several architectural styles that define how cloud applications are
structured and interact with distributed systems. These styles influence scalability, fault tolerance,
and interoperability in cloud environments. Below are the key architectural paradigms:

1. Client-Server Model (Stateless Servers)


 Description: Traditional request-response interaction where clients (users or applications)
communicate with stateless servers.

 Characteristics:

o Statelessness: Servers treat each request independently (no session data stored).

o Scalability: Easy to scale horizontally by adding more servers.

o Fault Tolerance: Failures are isolated since no session state is lost.

 Example: Basic HTTP web servers (e.g., REST APIs).

2. Service-Oriented Architecture (SOA)

 Description: Applications are built as loosely coupled services that communicate via
standardized protocols.

 Key Technologies:

o SOAP (Simple Object Access Protocol): XML-based messaging for structured communication.

o WSDL (Web Services Description Language): Defines service interfaces.

 Use Case: Enterprise systems requiring interoperability (e.g., banking transactions).

3. Representational State Transfer (REST)

 Description: A lightweight alternative to SOAP, using HTTP methods (GET, POST, PUT,
DELETE).

 Advantages:

o Simplicity: Easier to implement than SOAP (no XML overhead).

o Caching: Supports HTTP caching for performance.

o Statelessness: Each request contains all necessary context.

 Example: Cloud storage APIs (e.g., AWS S3, Google Drive).
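A minimal sketch of this style using Python's requests library; the endpoint URL and payloads are hypothetical, and each call carries all the context it needs, so the server stores no session state:

import requests

BASE = "https://api.example.com/v1"  # hypothetical REST endpoint

# Create a resource (POST); assumes the API returns the new item's id.
item_id = requests.post(f"{BASE}/items", json={"name": "report.pdf"}).json()["id"]

# Update (PUT), read (GET), and delete (DELETE) the same resource.
requests.put(f"{BASE}/items/{item_id}", json={"name": "report-v2.pdf"})
print(requests.get(f"{BASE}/items/{item_id}").json())
requests.delete(f"{BASE}/items/{item_id}")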

4. MapReduce (Data-Parallel Processing)

 Description: A programming model for processing large datasets in parallel across distributed nodes.

 Phases:

1. Map: Splits data into chunks and processes them independently.

2. Reduce: Aggregates results from Map tasks.

 Use Case: Batch processing (e.g., log analysis, genomics in Hadoop).


5. Event-Driven Architecture (EDA)

 Description: Systems react to events (e.g., messages, triggers) rather than requests.

 Components:

o Producers: Generate events (e.g., IoT sensors).

o Consumers: React to events (e.g., serverless functions like AWS Lambda).

 Example: Real-time analytics (e.g., fraud detection in financial systems).
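The decoupling of producers and consumers can be sketched with an in-process queue standing in for the broker (a toy illustration, not a production event bus):

import queue
import threading

events = queue.Queue()  # stands in for a broker such as Kafka or AWS SNS/SQS

def producer():
    # e.g., an IoT sensor emitting temperature readings as events
    for reading in (21.5, 22.0, 99.9):
        events.put({"type": "temperature", "value": reading})
    events.put(None)  # sentinel: no more events

def consumer():
    # reacts to each event as it arrives, like a serverless function
    while (event := events.get()) is not None:
        if event["value"] > 50:
            print("ALERT:", event)

threading.Thread(target=producer).start()
consumer()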

6. Peer-to-Peer (P2P) Systems

 Description: Decentralized networks where nodes (peers) share resources without a central
server.

 Characteristics:

o Resilience: No single point of failure.

o Scalability: Resources grow with the number of peers.

 Example: File-sharing systems (e.g., BitTorrent), blockchain networks.

7. Microservices Architecture

 Description: Applications are decomposed into small, independent services.

 Benefits:

o Modularity: Easier to update and scale individual components.

o Fault Isolation: A failure in one service doesn’t crash the entire system.

 Example: Netflix’s cloud-native streaming platform.

8. Workflow-Based Systems

 Description: Orchestrates multi-step processes with dependencies.

 Patterns:

o Sequential: Tasks run one after another.

o Parallel: Tasks execute concurrently (e.g., AND-split in workflows).

 Tools: AWS Step Functions, Apache Airflow.

9. State Machine Coordination (ZooKeeper)


 Description: Ensures consistency in distributed systems using a replicated state machine.

 Use Case: Leader election, configuration management (e.g., Kafka for messaging).

Comparison of Architectural Styles

Style         | Scalability | Fault Tolerance | Use Case
Client-Server | High        | Moderate        | Web applications, APIs
SOA           | Moderate    | High            | Enterprise integration
REST          | High        | High            | Cloud APIs, mobile apps
MapReduce     | Very High   | High            | Big data processing
Event-Driven  | High        | High            | Real-time systems
P2P           | Extreme     | Extreme         | Decentralized apps
Microservices | High        | High            | Cloud-native apps
Workflows     | Moderate    | Moderate        | Batch processing, automation

Workflows: Coordination of multiple activities


What is a Workflow?

A workflow is a structured sequence of interconnected tasks or activities designed to achieve a specific goal in a systematic manner. In cloud computing, workflows coordinate multiple interdependent tasks across distributed systems, ensuring efficient execution of complex applications. Workflows are essential for:

 Orchestrating tasks: Managing dependencies (e.g., Task B runs only after Task A completes).

 Resource allocation: Assigning compute/storage resources dynamically.

 Handling exceptions: Addressing failures or unexpected events during execution.

Workflows are modeled using directed activity graphs or workflow description languages (WFDL),
resembling flowcharts with tasks as nodes and dependencies as edges.
Lifecycle of a Workflow

The lifecycle of a workflow consists of four primary phases, analogous to the lifecycle of a traditional
computer program (see Figure 4.1 in the document):

1. Creation

 Objective: Define the workflow’s purpose and scope.

 Activities:

o Identify tasks (e.g., data processing, computation, storage operations).

o Specify task attributes:

 Preconditions: Conditions required to start a task (e.g., "File X must exist").

 Post-conditions: Outcomes after task completion (e.g., "Output file Y is generated").

 Exceptions: Handling errors (e.g., retry or notify).


o Example: A scientific workflow might include tasks like "Load dataset," "Run
simulation," and "Save results."

2. Definition

 Objective: Formalize the workflow structure.

 Activities:

o Use Workflow Description Language (WFDL) to encode tasks and dependencies.

o Define control flow patterns (e.g., sequential, parallel, conditional branching).

 Basic Patterns (Figure 4.3):

 Sequence: Tasks run one after another (A → B → C).

 AND Split: Concurrent execution (A → [B, C]).

 XOR Split: Conditional branching (A → B OR A → C).

 Advanced Patterns: Synchronization, loops, or dynamic task addition.

o Example: A batch processing workflow might use an AND split to process data
chunks in parallel.

3. Verification

 Objective: Ensure correctness and robustness.

 Activities:

o Safety: Verify "nothing bad happens" (e.g., no deadlocks).

o Liveness: Ensure "something good eventually happens" (e.g., tasks terminate).

o Check for:

 Deadlocks: Circular dependencies (e.g., Task A waits for Task B, which waits
for Task A).

 Unreachable tasks: Tasks that cannot execute due to unmet preconditions.

o Example: In Figure 4.2(a), if Task D is chosen after B, Task F never runs, violating
liveness.

4. Enactment

 Objective: Execute the workflow.

 Activities:

o Static Workflows: Fixed structure; tasks run as predefined.

o Dynamic Workflows: Modify tasks/dependencies at runtime (e.g., add recovery tasks after failure).

o Coordination Models:
 Strong Coordination: Centralized control (e.g., a master process monitors
tasks).

 Weak Coordination: Decentralized (e.g., tasks communicate via shared storage or a coordination service such as ZooKeeper).

o Monitoring: Track progress and handle exceptions (e.g., retry failed tasks).

o Example: The GrepTheWeb application (Section 4.7) uses queues and controllers to
manage tasks like launching EC2 instances and merging results.

The basic workflow patterns illustrated in Figure 4.3 are:

• The sequence pattern occurs when several tasks have to be scheduled one after the completion of
the other [see Figure 4.3(a)].

• The AND split pattern requires several tasks to be executed concurrently. Both tasks B and C are
activated when task A terminates [see Figure 4.3(b)].
In case of an explicit AND split, the activity graph has a routing node and all activities connected to
the routing node are activated as soon as the flow of control reaches the routing node. In the case of
an implicit AND split, activities are connected directly and conditions can be associated with
branches linking an activity with the next ones. Only when the conditions associated with a branch
are true are the tasks activated.

• The synchronization pattern requires several concurrent activities to terminate before an activity
can start. In our example, task C can only start after both tasks A and B terminate [see Figure 4.3(c)].

• The XOR split pattern requires a decision; after the completion of task A, either B or C can be activated [see Figure 4.3(d)].
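These patterns can be sketched in Python with concurrent.futures; the task bodies are placeholders, and only the ordering constraints matter:

from concurrent.futures import ThreadPoolExecutor, wait

def task(name):
    print("running", name)
    return name

with ThreadPoolExecutor() as pool:
    # Sequence: the next task starts only after A completes [Figure 4.3(a)].
    a = pool.submit(task, "A").result()

    # AND split: B and C run concurrently once A has terminated [Figure 4.3(b)].
    b = pool.submit(task, "B")
    c = pool.submit(task, "C")

    # Synchronization: D starts only after both B and C terminate [Figure 4.3(c)].
    wait([b, c])
    d = pool.submit(task, "D").result()

    # XOR split: a decision activates exactly one of two branches [Figure 4.3(d)].
    branch = "E" if d == "D" else "F"
    pool.submit(task, branch).result()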

Workflow Coordination Based on a State Machine Model: ZooKeeper


ZooKeeper is a distributed coordination service designed to manage workflows and synchronization
in large-scale cloud systems. It follows a state machine model, ensuring consistency across
distributed nodes. Below is a detailed breakdown of its architecture and functionality:

1. Core Concepts

(A) State Machine Model

 ZooKeeper treats each distributed process as a deterministic finite state machine (FSM).

 Commands (e.g., read/write operations) trigger state transitions.

 All replicas must execute the same sequence of commands to maintain consistency.

(B) Consensus Protocol (ZAB, a Paxos-like protocol)

 Uses atomic broadcast to ensure all nodes agree on state changes.

 Leader election: A single leader coordinates updates.

 Fault tolerance: Works as long as a majority of nodes are alive.

2. ZooKeeper Architecture

(A) Components

1. Servers (Ensemble)

o A cluster of replicated ZooKeeper servers.

o One leader handles writes; followers handle reads.

2. Clients

o Connect to any server via TCP.

o Read requests are served locally; writes are forwarded to the leader.
3. Znodes (Data Nodes)

o Similar to filesystem inodes but store state metadata (e.g., configuration, locks).

o Two types:

 Ephemeral: Exist only for a session (e.g., for temporary locks).

 Persistent: Survive session termination.

(B) Data Model (Hierarchical Namespace)

 Tree-like structure (e.g., /services/service1/node1).

 Each znode stores:

o Data (small, <1MB).

o Version numbers (for optimistic locking; see the sketch after this list).

o ACLs (access control lists).
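A minimal sketch of optimistic locking with version numbers, using the kazoo Python client (the ensemble address and znode path are illustrative):

from kazoo.client import KazooClient
from kazoo.exceptions import BadVersionError

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/services/service1/node1")

data, stat = zk.get("/services/service1/node1")
try:
    # Succeeds only if no other client wrote since we read stat.version.
    zk.set("/services/service1/node1", b"new-config", version=stat.version)
except BadVersionError:
    print("conflict: another client updated the znode first")
zk.stop()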

3. Workflow Coordination Mechanisms

(A) Leader-Follower Coordination

1. Write Request Handling:

o Client sends a write to any server.

o Follower forwards it to the leader.

o Leader uses atomic broadcast to synchronize updates across followers.

2. Read Request Handling:

o Served directly by any server (no consensus needed).

(B) Use Cases in Cloud Workflows

1. Distributed Locking

o Ensures only one process accesses a resource at a time.

o Example: Preventing concurrent updates to a shared database.

2. Configuration Management

o Centralized storage for cluster-wide settings.

3. Leader Election

o Used in systems like Kafka to assign partition leaders.

4. Service Discovery

o Tracks active instances in microservices (e.g., /services/web/instance1).
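The locking and service-discovery cases above can be sketched with the kazoo client (ensemble address and paths are illustrative; an ephemeral znode disappears automatically when the creating session ends):

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Service discovery: register this instance as an ephemeral znode.
zk.ensure_path("/services/web")
zk.create("/services/web/instance1", b"10.0.0.5:8080", ephemeral=True)

# Distributed locking: at most one client at a time enters this block.
with zk.Lock("/locks/shared-db", "client-1"):
    print("holding the lock; safe to update the shared resource")

zk.stop()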


The ZooKeeper service guarantees:

1. Atomicity - A transaction either completes or fails.


2. Sequential consistency of updates - Updates are applied strictly in the order in which they are
received.
3. Single system image for the clients - A client receives the same response regardless of the server it
connects to.
4. Persistence of updates - Once applied, an update persists until it is overwritten by a client.
5. Reliability - The system is guaranteed to function correctly as long as the majority of servers
function correctly.

The MapReduce Programming Model

MapReduce is a distributed data processing model designed for large-scale parallel computation in cloud environments. It simplifies batch processing by breaking tasks into two phases: Map (data splitting and processing) and Reduce (aggregation of results). Below is a detailed explanation based on the document.

1. Core Principles
1. Divide-and-Conquer Approach
o Splits large datasets into smaller chunks processed in parallel.
o Inspired by functional programming (map and reduce operations in LISP).
2. Key-Value Pair Processing
o Input & output data are structured as <key, value> pairs.
3. Fault Tolerance
o Automatically handles node failures by re-executing tasks.

2. Phases of MapReduce
(A) Map Phase
 Input: A set of <key, value> pairs (e.g., <document_name, text>).
 Process:
o Each Map task processes a split of the input data.
o Applies a user-defined map() function to emit intermediate <k, v> pairs.
 Example (Python; emit() is a callback supplied by the MapReduce framework):

# Input: <doc1, "hello world hello">
def map(key, value):
    for word in value.split():
        emit(word, 1)
# Output: [<"hello", 1>, <"world", 1>, <"hello", 1>]

(B) Shuffle & Sort Phase
 Intermediate Step:
o Groups all values by their keys (e.g., "hello" → [1, 1]).
o Sorts keys to optimize the Reduce phase.
(C) Reduce Phase
 Input: Intermediate <key, list(values)> pairs (e.g., "hello" → [1, 1]).
 Process:
o Applies a user-defined reduce() function to aggregate results.
 Example (Python):

# Input: <"hello", [1, 1]>
def reduce(key, values):
    emit(key, sum(values))
# Output: <"hello", 2>
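The two snippets can be tied together in a single-process simulation of the full map → shuffle → reduce pipeline; the framework's emit() callbacks are replaced by plain lists and a defaultdict, so this is a sketch of the model, not a distributed implementation:

from collections import defaultdict

def map_fn(key, value):
    # Emit <word, 1> for every word in the document text.
    return [(word, 1) for word in value.split()]

def reduce_fn(key, values):
    # Aggregate all counts for one word.
    return key, sum(values)

documents = {"doc1": "hello world hello", "doc2": "world of clouds"}

# Map phase: process each input split independently.
intermediate = []
for name, text in documents.items():
    intermediate.extend(map_fn(name, text))

# Shuffle & sort phase: group intermediate values by key, then sort the keys.
groups = defaultdict(list)
for word, count in intermediate:
    groups[word].append(count)

# Reduce phase: one reduce call per unique key.
result = dict(reduce_fn(w, counts) for w, counts in sorted(groups.items()))
print(result)  # {'clouds': 1, 'hello': 2, 'of': 1, 'world': 2}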

3. Execution Workflow
1. Input Splitting:
o Data is divided into 16–64 MB chunks (e.g., in HDFS).
2. Master-Worker Model:
o Master Node: Assigns tasks to workers and tracks progress.
o Worker Nodes: Run Map/Reduce tasks.
3. Data Flow:
o Map Workers: Read input splits → emit intermediate data to local disk.
o Reduce Workers: Fetch intermediate data → aggregate → write final output.

4. Fault Tolerance
 Task Retries: Failed tasks are re-executed on other nodes.
 Checkpointing: Master periodically saves state to recover from failures.

5. Example Applications
1. Word Count (Classic Example)
o Counts word frequencies in documents.
2. Distributed Sort
o Sorts large datasets (e.g., Google’s web indexing).
3. Log Analysis
o Processes server logs to detect trends (e.g., AWS GrepTheWeb).

When a user program invokes the MapReduce function, the following sequence of actions takes place (see Figure 4.6):
1. The run-time library splits the input files into M splits of 16 to 64 MB each, identifies a number N of systems to run, and starts multiple copies of the program, one of the systems being the master and the others workers. The master assigns each idle system either a Map or a Reduce task. The master makes O(M + R) scheduling decisions and keeps O(M × R) worker state vectors in memory. These considerations limit the sizes of M and R; at the same time, efficiency requires that M, R ≫ N.
2. A worker assigned a Map task reads the corresponding input split, parses <key, value> pairs, and passes each pair to the user-defined Map function. The intermediate pairs produced by the Map function are buffered in memory before being written to a local disk, where they are partitioned into R regions by the partitioning function.
3. The locations of these buffered pairs on the local disk are passed back to the master, which is responsible for forwarding them to the Reduce workers. A Reduce worker uses remote procedure calls to read the buffered data from the local disks of the Map workers; after reading all the intermediate data, it sorts the pairs by the intermediate keys. For each unique intermediate key, the key and the corresponding set of intermediate values are passed to a user-defined Reduce function, whose output is appended to a final output file.
4. When all Map and Reduce tasks have been completed, the master wakes up the user program.

Cloud Computing for Biology Research


Cloud computing has become a game-changer in biology, enabling large-scale data
processing, genomic analysis, and collaborative research. Below is a structured breakdown of
its applications, challenges, and real-world implementations, as discussed in the document.

1. Key Applications in Biology


(A) Genomic Sequencing & Alignment
 Problem: Processing massive DNA/protein datasets (e.g., NCBI’s 10 million sequences).
 Cloud Solution:
o AzureBLAST (Section 4.10): Ran on 475 Azure VMs to compare protein sequences,
completing a 6–7 CPU-year task in 14 days.
o Parallelization: Divides sequences into chunks processed concurrently.
(B) Molecular Dynamics Simulations
 Problem: Simulating protein folding or drug interactions requires immense computational
power.
 Cloud Solution:
o Elastic Clusters: Spin up 100s of VMs on-demand (e.g., AWS EC2).
o Example: Folding@home uses volunteer clouds for COVID-19 research.
(C) Metadata Management & Data Discovery
 Problem: Extracting insights from unstructured biomedical data (e.g., MRI scans, lab results).
 Cloud Solution:
o Glear System (Section 4.8): Uses MapReduce to auto-generate metadata for
scientific datasets.

2. Architectural Approaches
(A) Workflow Automation
 Tools: Apache Airflow, Nextflow.
 Use Case:
o Cirrus (Section 4.10): A cloud platform for legacy biology apps, orchestrating tasks
like:
 Data Preprocessing → Alignment → Analysis.
(B) Distributed Data Storage
 Systems:
o Hadoop HDFS: Stores genomic data (e.g., FASTQ files).
o Amazon S3/Google Cloud Storage: Hosts public datasets (e.g., 1000 Genomes
Project).
(C) Hybrid Cloud for Sensitive Data
 Challenge: Privacy laws (e.g., HIPAA) restrict genomic data movement.
 Solution:
o Private Cloud: Stores raw patient data.
o Public Cloud: Runs anonymized analyses (e.g., AWS GovCloud).

3. Challenges & Solutions

Challenge               | Cloud-Based Solution
Data Volume (Petabytes) | Distributed storage (S3, HDFS) + compression
Compute Cost            | Spot instances (AWS) for cost-efficient batch jobs
Workflow Complexity     | Managed services (e.g., AWS Batch, Google Life Sciences)
Fault Tolerance         | Checkpointing (e.g., Apache Spark RDDs)

4. Real-World Case Studies


(A) AzureBLAST on Microsoft Azure
 Goal: Compare 10 million protein sequences.
 Execution:
o Used 3,700 VM instances across 3 data centers.
o Lesson: Optimized parameter tuning to avoid redundant computations.
(B) DNAnexus Platform
 Function: Cloud-based genomic analysis with HIPAA compliance.
 Tech Stack: AWS + Kubernetes for scalable pipelines.

5. Future Directions
 AI/ML Integration: Train models on genomic data (e.g., Google DeepVariant).
 Federated Learning: Analyze data across hospitals without centralizing it.
 Quantum Computing: Accelerate protein-folding simulations (e.g., Google’s Quantum AI).
