CC Unit - 5
Introduction to Hadoop
History of Hadoop
Hadoop was developed under the Apache Software Foundation; its co-founders
are Doug Cutting and Mike Cafarella. Doug Cutting named it after his son's toy
elephant. In October 2003, Google published its paper on the Google File
System. In January 2006, MapReduce development started within the Apache
Nutch project, with roughly 6,000 lines of code for MapReduce and about
5,000 lines for HDFS. In April 2006, Hadoop 0.1.0 was released.
Hadoop is an open-source software framework for storing and processing big
data. It was created at the Apache Software Foundation in 2006, based on
papers published by Google describing the Google File System (GFS, 2003)
and the MapReduce programming model (2004). The Hadoop framework
allows for the distributed processing of large data sets across clusters of
computers using simple programming models. It is designed to scale up from
single servers to thousands of machines, each offering local computation
and storage. It is used by many organizations, including Yahoo, Facebook,
and IBM, for purposes such as data warehousing, log processing, and
research, and it has become a key technology for big data processing.
Features of Hadoop
Hadoop has several key features that make it well suited for big data
processing:
1. It is fault tolerant.
2. It is highly available.
3. It is easy to program.
4. It offers huge, flexible storage.
5. It is low cost.
What is MapReduce?
MapReduce is a programming model for processing large data sets in
parallel: a Map function transforms input records into intermediate
key/value pairs, and a Reduce function merges all the intermediate values
associated with the same key.
Terminology
PayLoad − Applications implement the Map and the Reduce
functions, and form the core of the job.
Mapper − Maps the input key/value pairs to a set of
intermediate key/value pairs.
NameNode − Node that manages the Hadoop Distributed File
System (HDFS).
DataNode − Node where data is placed in advance, before
any processing takes place.
MasterNode − Node where the JobTracker runs and which accepts
job requests from clients.
SlaveNode − Node where the Map and Reduce programs run.
JobTracker − Schedules jobs and tracks the jobs assigned to
the TaskTracker.
TaskTracker − Tracks the task and reports status to the
JobTracker.
Job − An execution of a Mapper and Reducer across a dataset.
Task − An execution of a Mapper or a Reducer on a slice of
data.
Task Attempt − A particular instance of an attempt to execute
a task on a SlaveNode.
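The Mapper and Reducer roles above can be illustrated with a minimal
pure-Python word-count sketch. This only simulates the MapReduce flow in a
single process; the function names are illustrative, not Hadoop APIs:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit an intermediate (word, 1) pair for every word.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle phase: group intermediate values by key, as the
    # framework does between the Map and Reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Reduce phase: combine all values for one key into a single result.
    return (key, sum(values))

def run_job(lines):
    pairs = [p for line in lines for p in mapper(line)]
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())

counts = run_job(["the quick brown fox", "the lazy dog"])
# counts["the"] == 2
```

In a real cluster, the Map and Reduce tasks run on many SlaveNodes and the
shuffle is handled by the framework; the logic per key/value pair is the same.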
The following GENERIC_OPTIONS are available to the hadoop job command:

-submit <job-file>      Submits the job.
-status <job-id>        Prints the map and reduce completion percentage and
                        all job counters.
-kill <job-id>          Kills the job.
-list [all]             Displays all jobs. -list displays only jobs which
                        are yet to complete.
-kill-task <task-id>    Kills the task. Killed tasks are NOT counted against
                        failed attempts.
-fail-task <task-id>    Fails the task. Failed tasks are counted against
                        failed attempts.
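Assuming a running cluster, these options are passed to the hadoop job
command. A brief sketch, where the job file and job ID are placeholder
values:

```shell
# Submit a job described by a job configuration file (placeholder path).
hadoop job -submit job.xml

# Check completion percentages and counters for a job (placeholder ID).
hadoop job -status job_201310191043_0004

# List all jobs, then kill one of them.
hadoop job -list all
hadoop job -kill job_201310191043_0004
```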
What is VirtualBox?
VirtualBox is a free and open-source software program for virtualizing
the x86 computing architecture, developed by Oracle Corporation. It works as a
hypervisor and creates virtual machines in which the user can run another
operating system. The "host" OS is the operating system on which VirtualBox
runs. The "guest" OS is the operating system running inside the Virtual
Machine. As the host OS, VirtualBox supports Windows, Linux, Solaris,
OpenSolaris, and macOS. When setting up a virtual machine, the user can
determine how many processor cores and how much RAM and disk space are
devoted to the VM. While the VM is running, it may be "paused".
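The resource settings described above can also be made from the command
line with VirtualBox's VBoxManage tool. A sketch, assuming VBoxManage is
installed; the VM name, OS type, and sizes are placeholders:

```shell
# Create and register a new VM (name and OS type are placeholders).
VBoxManage createvm --name "demo-vm" --ostype Ubuntu_64 --register

# Devote 2 processor cores and 2048 MB of RAM to the VM.
VBoxManage modifyvm "demo-vm" --cpus 2 --memory 2048

# Pause and resume a running VM.
VBoxManage controlvm "demo-vm" pause
VBoxManage controlvm "demo-vm" resume
```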
History of VirtualBox
Innotek GmbH originally developed VirtualBox, which was released as an
open-source software package on January 17, 2007. Sun Microsystems later
purchased the company, and Oracle Corporation took over VirtualBox
development when it acquired Sun on January 27, 2010.
Features of VirtualBox
There are various features of VirtualBox. Some of the essential features are as follows:
Portability
Guest Additions
These are collections of tools installed in the guest OS to optimize its
performance and offer extra integration and communication with the host system.
VM Groups
VirtualBox offers group functionality that allows the user to organize virtual
machines individually and collectively. Operations such as start, pause, close,
reset, shutdown, save state, and power off can generally be applied to a whole
VM group just as to an individual VM.
Hardware Support
Snapshot
With the save-snapshot function, VirtualBox records the state of the guest VM
at a point in time. We can later go back to that point and restore the virtual
machine to the saved state.
The hypervisor is implemented as a Ring 0 (kernel-mode) service. The kernel
includes a device driver known as vboxdrv. This device driver manages tasks
such as loading the hypervisor modules, allocating physical memory to the
guest virtual machine, and saving and restoring the guest system's context.
Whenever an interrupt occurs, control passes back to the host so it can
determine whether a VT-x or AMD-V event needs to be handled.
The guest OS manages its own scheduling during its execution: it operates on
the host system as a single process and is scheduled by the host. Apart from
this, additional device drivers are present that let the guest OS access
resources such as disks, network controllers, and other devices. Besides the
kernel modules, other processes run on the host to support a running guest. The
VBoxSVC process starts automatically in the background when a guest VM is
booted from the VirtualBox GUI.
Google App Engine
To create an application for App Engine, you can use Go, Java, PHP, or
Python. You can develop and test an app locally using the SDK's
deployment toolkit. Each language has its own SDK and runtime. Your
program runs in one of:
Java Runtime Environment version 7
Python runtime environment version 2.7
PHP runtime's PHP 5.4 environment
Go runtime 1.2 environment
Features in Preview
Experimental Features
These might or might not be made broadly accessible in future App Engine
updates, and they might be changed in ways that are incompatible with the
past. "Trusted tester" features, by contrast, are only accessible to a limited
user base and require registration in order to use them. The experimental
features include Prospective Search, Page Speed, OpenID, OAuth,
Datastore Admin/Backup/Restore, Task Queue Tagging, MapReduce, the
Task Queue REST API, and app metrics analytics.
Third-Party Services
App Engine applications can be written in the following languages:
Python
Java
Node.js
PHP
Ruby
Go
Instance classes
Each instance's memory and CPU allocations, as well as the amount of free
quota and the cost per hour after your program uses up the free quota, are
determined by the instance class.
Runtime generation affects the RAM limits. The memory cap applies to
all runtime generations and covers both the memory your program requires
and the memory the runtime needs to function. The Java runtimes consume
more memory when running your app than other runtimes. Use the
instance_class property in your app.yaml file to override the default instance
class.
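For example, the default class can be overridden with a minimal app.yaml
like the following. This is a sketch; the runtime and instance class shown
are assumptions for a standard-environment Python app:

```yaml
# Minimal App Engine standard-environment configuration (illustrative).
runtime: python39
# Override the default instance class to get more memory and CPU per instance.
instance_class: F2
```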
Quotas and limits
In the default setting you are given 1 GB of data storage and traffic for
free; you can enable paid applications to raise these limits. To ensure the
system's stability, however, some features impose restrictions that are
unrelated to quotas.
Support for the same gcloud commands and a similar GCP terminal interface
Free tier
Long-term support for legacy runtimes
Python 2.7
Java 8
Go 1.11
PHP 5.5
Commitment
In keeping with its more than ten-year tradition of supporting your apps
as you advance into the future at your own pace, Google is dedicated to
offering long-term support for these runtimes.
Google might, however, have to deprecate some of the APIs or development
tools that you are currently using.
Security updates
Your software may be vulnerable to flaws for which no publicly available patch
exists once communities discontinue support for versions of their languages. As
a result, switching to a runtime with a supported language is safer than
continuing to run your program on some App Engine runtimes.
Introduction to OpenStack
It is a free, open-standard cloud computing platform that first came into
existence on July 21, 2010. It was a joint project of Rackspace Hosting and
NASA to make cloud computing more ubiquitous. It is deployed as
Infrastructure-as-a-Service (IaaS) in both public and private clouds, where
virtual resources are made available to users. The software platform
consists of interrelated components that control multi-vendor hardware pools
of processing, storage, and networking resources throughout a data center. In
OpenStack, the tools used to build this platform are referred to as
"projects". These projects handle a large number of services, including
computing, networking, and storage. Whereas virtualization abstracts
resources such as RAM and CPU from the hardware using hypervisors,
OpenStack uses a number of APIs to abstract those resources so that users
and administrators can interact directly with the cloud services.
OpenStack components
Apart from the various projects which constitute the OpenStack platform, there
are nine major services, namely Nova, Neutron, Swift, Cinder, Keystone,
Glance, Horizon, Ceilometer, and Heat. Here is a basic definition of each
component, which will give us an idea of what these components do.
1. Nova (compute service): It manages the compute resources like
creating, deleting, and handling the scheduling. It can be seen as a
program dedicated to the automation of resources that are responsible for
the virtualization of services and high-performance computing.
2. Neutron (networking service): It is responsible for connecting all the
networks across OpenStack. It is an API driven service that manages all
networks and IP addresses.
3. Swift (object storage): It is an object storage service with high fault
tolerance, used to store and retrieve unstructured data objects through a
RESTful API. Being a distributed platform, it also provides redundant
storage within servers that are clustered together, and it can
successfully manage petabytes of data.
4. Cinder (block storage): It is responsible for providing persistent block
storage that is made accessible through a self-service API.
Consequently, it allows users to define and manage the amount of cloud
storage they require.
5. Keystone (identity service provider): It is responsible for all types of
authentications and authorizations in the OpenStack services. It is a
directory-based service that uses a central repository to map the correct
services with the correct user.
6. Glance (image service provider): It is responsible for registering,
storing, and retrieving virtual disk images from the complete network.
These images are stored in a wide range of back-end systems.
7. Horizon (dashboard): It is responsible for providing a web-based
interface for OpenStack services. It is used to manage, provision, and
monitor cloud resources.
8. Ceilometer (telemetry): It is responsible for metering and billing of
services used. Also, it is used to generate alarms when a certain
threshold is exceeded.
9. Heat (orchestration): It is used for on-demand service provisioning with
auto-scaling of cloud resources. It works in coordination with
Ceilometer.
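As a concrete example of Heat's role, an orchestration template (in the HOT
format) declares the resources to provision. A minimal sketch, where the
image, flavor, and network names are placeholder values for a given cloud:

```yaml
heat_template_version: 2018-08-31
description: Minimal Heat template that boots one Nova server (illustrative).

resources:
  demo_server:
    type: OS::Nova::Server
    properties:
      image: cirros-0.5.2     # placeholder image name
      flavor: m1.small        # placeholder flavor
      networks:
        - network: private    # placeholder network
```

Heat reads such a template and asks the other services (Nova, Neutron, and so
on) to create the declared resources as a single stack.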
These are the services around which the platform revolves. They individually
handle storage, compute, networking, identity, etc. They are the base on which
the rest of the projects rely, enabling service orchestration, bare-metal
provisioning, dashboards, and more.
Features of Federated Cloud
1. In the federated cloud, the users can interact with the architecture either
centrally or in a decentralized manner. In centralized interaction, the user
interacts with a broker to mediate between them and the organization.
Decentralized interaction permits the user to interact directly with the
clouds in the federation.
2. A federated cloud can serve various niches, both commercial and
non-commercial.
3. The visibility of a federated cloud helps the user understand the
organization of the several clouds in the federated environment.
4. Federated cloud can be monitored in two ways. MaaS (Monitoring as a
Service) provides information that aids in tracking contracted services to
the user. Global monitoring aids in maintaining the federated cloud.
5. The providers who participate in the federation publish their offers to a
central entity. The user interacts with this central entity to verify the prices
and propose an offer.
6. Marketed objects such as infrastructure, software, and platforms have to
pass through the federation when consumed in the federated cloud.
The technologies that aid the cloud federation and cloud services are:
1. OpenNebula
It is a cloud computing platform for managing heterogeneous distributed data
center infrastructures. Through its interoperability it can leverage existing
information technology assets, protecting existing investments and exposing
resources through application programming interfaces (APIs).
2. Aneka coordinator
The Aneka coordinator combines the Aneka services and Aneka peer
components (network architectures), which give the cloud the ability and
performance to interact with other cloud services.
3. Eucalyptus
Eucalyptus pools computational, storage, and network resources that can be
dynamically scaled up or down as application workloads change. It is an
open-source framework that provides storage, network, and other
computational resources for accessing the cloud environment.
Levels of Cloud Federation
Each level of the cloud federation poses unique problems and functions at a
different level of the IT stack. Then, several strategies and technologies are
needed. The answers to the problems encountered at each of these levels
when combined form a reference model for a cloud federation.
Conceptual Level
Infrastructure Level
The newest addition to the Radiant One package is the Cloud Federation
Service (CFS), which is powered by identity virtualization. Together with
Radiant One FID, CFS isolates your external and cloud applications from the
complexity of your identity systems by delegating the work of authenticating
against all of your identity stores to a single common virtual layer.
The future of the cloud is federated, and when you look at the broad categories of
apps moving to the cloud, the truth of this statement begins to become clear.
Gaming, social media, Web, eCommerce, publishing, CRM – these applications
demand truly global coverage, so that the user experience is always on, local and
instant, with ultra-low latency. That’s what the cloud has always promised to be.
The problem is that end users can’t get that from a single provider, no matter how
large. Even market giants like Amazon have limited geographic presence, with
infrastructure only where it’s profitable for them to invest. As a result, outside the
major countries and cities, coverage from today’s ‘global’ cloud providers is actually
pretty thin. Iceland, Jordan, Latvia, Turkey, Malaysia? Good luck. Even in the U.S.,
you might find that the closest access point to your business isn’t even in the same
state, let alone the same city.
As part of a cloud federation, even a small service provider can offer a truly global
service without spending a dime building new infrastructure. For companies with
spare capacity in the data center, the federation also provides a simple way to
monetize that capacity by submitting it to the marketplace for other providers to buy,
creating an additional source of revenue.
There are immediate benefits for end users, too. The federated cloud means that end
users can host apps with their federated cloud provider of choice, instead of choosing
from a handful of “global” cloud providers on the market today and making do with
whatever pricing, app support and SLAs they happen to impose. Cloud users can
choose a local host with the exact pricing, expertise and support package that fits
their need, while still receiving instant access to as much local or global IT resources
as they’d like. They get global scalability without restricted choice, and without
having to manage multiple providers and invoices.
The federated cloud model is a force for real democratization in the cloud market. It’s
how businesses will be able to use local cloud providers to connect with customers,
partners and employees anywhere in the world. It’s how end users will finally get to
realize the promise of the cloud. And, it’s how data center operators and other service
providers will finally be able to compete with, and beat, today’s so-called global cloud
providers.