CC-KML051-Unit V
Learn how to use Dataproc to run Apache Hadoop clusters on Google Cloud in a simpler, integrated, more cost-effective way.
Hadoop history
Hadoop has its origins in the early era of the World Wide Web. As the Web grew to millions and then billions
of pages, the task of searching and returning search results became one of the most prominent challenges.
Startups like Google, Yahoo, and AltaVista began building frameworks to automate search results. One
project called Nutch was built by computer scientists Doug Cutting and Mike Cafarella based on Google’s
early work on MapReduce (more on that later) and Google File System. Nutch was eventually moved to the
Apache open source software foundation and was split between Nutch and Hadoop. Yahoo, where Cutting
began working in 2006, open sourced Hadoop in 2008.
While Hadoop is sometimes referred to as an acronym for High Availability Distributed Object Oriented
Platform, it was originally named after Cutting’s son’s toy elephant.
Hadoop defined
Hadoop is an open source framework based on Java that manages the storage and processing of large amounts
of data for applications. Hadoop uses distributed storage and parallel processing to handle big data and
analytics jobs, breaking workloads down into smaller workloads that can be run at the same time.
Four modules comprise the primary Hadoop framework and work collectively to form the Hadoop ecosystem:
Hadoop Distributed File System (HDFS): As the primary component of the Hadoop ecosystem, HDFS is a
distributed file system in which individual Hadoop nodes operate on data that resides in their local storage.
This keeps computation close to the data, reducing network latency and providing high-throughput access to application data. In addition,
administrators don’t need to define schemas up front.
Yet Another Resource Negotiator (YARN): YARN is a resource-management platform responsible for
managing compute resources in clusters and using them to schedule users’ applications. It performs scheduling
and resource allocation across the Hadoop system.
MapReduce: MapReduce is a programming model for large-scale data processing. In the MapReduce model,
subsets of larger datasets and instructions for processing the subsets are dispatched to multiple different nodes,
where each subset is processed by a node in parallel with other processing jobs. After processing, the results from the individual subsets are combined into a smaller, more manageable dataset (see the sketch after this list of modules).
Hadoop Common: Hadoop Common includes the libraries and utilities used and shared by other Hadoop
modules.
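To make the MapReduce model concrete, here is a minimal single-process sketch of a word count, the classic MapReduce example. Real Hadoop jobs implement the map and reduce functions against the Java MapReduce API and let the framework distribute them across nodes; this plain-Python simulation of the map, shuffle, and reduce phases is only illustrative.

```python
# A minimal single-process simulation of the MapReduce model:
# word count. In real Hadoop, these phases run on many nodes.
from collections import defaultdict

def map_phase(document):
    # Map: emit (word, 1) pairs for every word in an input split.
    for word in document.split():
        yield word.lower(), 1

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(key, values):
    # Reduce: combine the per-key values into a single result.
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog and the fox"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs))
print(counts)  # e.g. {'the': 3, 'fox': 2, ...}
```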
Beyond HDFS, YARN, and MapReduce, the entire Hadoop open source ecosystem continues to grow and
includes many tools and applications to help collect, store, process, analyze, and manage big data. These
include Apache Pig, Apache Hive, Apache HBase, Apache Spark, Presto, and Apache Zeppelin.
Hadoop allows for the distribution of datasets across a cluster of commodity hardware. Processing is
performed in parallel on multiple servers simultaneously.
Software clients input data into Hadoop. HDFS handles metadata and the distributed file system. MapReduce
then processes and converts the data. Finally, YARN divides the jobs across the computing cluster.
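As a hedged illustration of that flow, the sketch below drives the standard Hadoop command-line tools from Python. The hdfs dfs and hadoop jar commands ship with a stock Hadoop install; the file names, HDFS paths, and the example-jar path are assumptions for illustration.

```python
# A hedged sketch of the client-side flow: load data into HDFS,
# submit a MapReduce job (scheduled by YARN), read the results.
import subprocess

# Copy a local file into the distributed file system.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/data/input"], check=True)
subprocess.run(["hdfs", "dfs", "-put", "logs.txt", "/data/input/"], check=True)

# Submit the bundled word-count example; YARN schedules its tasks
# across the cluster. (The jar's location varies by install.)
subprocess.run([
    "hadoop", "jar", "hadoop-mapreduce-examples.jar",  # path is illustrative
    "wordcount", "/data/input", "/data/output",
], check=True)

# Read the reducer output back from HDFS.
subprocess.run(["hdfs", "dfs", "-cat", "/data/output/part-r-00000"], check=True)
```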
All Hadoop modules are designed with a fundamental assumption that hardware failures of individual
machines or racks of machines are common and should be automatically handled in software by the
framework.
MapReduce complexity
As a file-intensive system, MapReduce can be a difficult tool to use for complex jobs, such as interactive analytical tasks. MapReduce functions also need to be written in Java and can involve a steep learning curve. The MapReduce ecosystem is quite large, with many components for different functions, which can make it difficult to determine which tools to use.
Security
Data sensitivity and protection can be issues as Hadoop handles such large datasets. An ecosystem of tools
for authentication, encryption, auditing, and provisioning has emerged to help developers secure data in
Hadoop.
Governance and management
Hadoop does not have many robust tools for data management and governance, nor for data quality and standardization.
Talent gap
Like many areas of programming, Hadoop has an acknowledged talent gap. Finding developers with the
combined requisite skills in Java to program MapReduce, operating systems, and hardware can be difficult.
In addition, MapReduce has a steep learning curve, making it hard to get new programmers up to speed on its
best practices and ecosystem.
Research firm IDC estimated that 62.4 zettabytes of data were created or replicated in 2020, driven by the
Internet of Things, social media, edge computing, and data created in the cloud. The firm forecasted that data
growth from 2020 to 2025 was expected at 23% per year. While not all that data is saved (it is either deleted
after consumption or overwritten), the data needs of the world continue to grow.
Hadoop tools
Hadoop has a large ecosystem of open source tools that can augment and extend the capabilities of its core modules. Some of the main software tools used with Hadoop include:
Apache Hive: A data warehouse that allows programmers to work with data in HDFS using a query language called HiveQL, which is similar to SQL (see the query sketch after this list)
Apache HBase: An open source non-relational distributed database often paired with Hadoop
Apache Pig: A tool that serves as an abstraction layer over MapReduce for analyzing large sets of data, enabling functions like filter, sort, load, and join
Apache Impala: An open source, massively parallel processing SQL query engine often used with Hadoop
Apache Sqoop: A command-line interface application for efficiently transferring bulk data between relational
databases and Hadoop
Apache ZooKeeper: An open source server that enables reliable distributed coordination in Hadoop; a service for "maintaining configuration information, naming, providing distributed synchronization, and providing group services"
On Google Cloud, Dataproc is a fast, easy-to-use, and fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, integrated, more cost-effective way. It fully integrates with
other Google Cloud services that meet critical security, governance, and support needs, allowing you to gain
a complete and powerful platform for data processing, analytics, and machine learning.
Big data analytics tools from Google Cloud—such as Dataproc, BigQuery, Vertex AI Workbench,
and Dataflow—can enable you to build context-rich applications, build new analytics solutions, and turn
data into actionable insights.
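As a hedged sketch of what "easy-to-use" means in practice, the snippet below creates a small Dataproc cluster with the google-cloud-dataproc Python client library. The project ID, region, cluster name, and machine types are placeholders, and authentication is assumed to come from application default credentials.

```python
# A hedged sketch of creating a Dataproc cluster with the
# google-cloud-dataproc client library. Project, region, cluster
# name, and machine types are placeholders.
from google.cloud import dataproc_v1

project_id, region, cluster_name = "my-project", "us-central1", "demo-cluster"

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": cluster_name,
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    },
}

# create_cluster returns a long-running operation; result() blocks
# until the cluster is ready.
operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print(operation.result().cluster_name)
```

Deleting the cluster when a job finishes, or using Dataproc's per-job ephemeral clusters, is what makes this model cost-effective compared with a long-running, self-managed Hadoop cluster.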
Google App Engine
A scalable runtime environment, Google App Engine is mostly used to run web applications. These applications scale dynamically as demand changes over time, thanks to Google's vast computing infrastructure. Because it offers a secure execution environment in addition to a number of services, App Engine makes it easier to develop scalable, high-performance web apps. Applications scale up and down in response to shifting demand. These services include cron tasks, communications, scalable data stores, work queues, and in-memory caching.
The App Engine SDK facilitates the testing and staging of applications by emulating the production runtime environment, allowing developers to design and test applications on their own PCs. When an application is finished, developers can quickly migrate it to App Engine, put quotas in place to control the costs generated, and make the program available to everyone. Python, Java, and Go are among the languages currently supported.
The development and hosting platform Google App Engine, which powers anything from web programming
for huge enterprises to mobile apps, uses the same infrastructure as Google’s large-scale internet services.
It is a fully managed PaaS (platform as a service) cloud computing platform that uses in-built services to
run your apps. You can start creating almost immediately after receiving the software development kit
(SDK). You may immediately access the Google app developer’s manual once you’ve chosen the language
you wish to use to build your app.
After creating a Cloud account, you can start building your app using:
the Go template/HTML package
Python-based webapp2 with Jinja2 (a minimal sketch follows this list)
PHP and Cloud SQL
Java with Maven
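Here is a minimal webapp2 handler of the kind used on the App Engine Python 2.7 runtime (newer runtimes use standard WSGI frameworks such as Flask instead); the route and message are illustrative.

```python
# A minimal webapp2 app for the App Engine Python 2.7 runtime.
# App Engine routes requests to this WSGI application according
# to the app.yaml configuration file.
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Respond to GET / with a plain-text greeting.
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, App Engine!')

app = webapp2.WSGIApplication([('/', MainPage)], debug=True)
```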
App Engine runs the programs on various servers while "sandboxing" them. It allows a program to use more resources in order to handle increased demand. App Engine powers applications like Snapchat, Rovio, and Khan Academy.
Features of App Engine
To create an application for App Engine, you can use Go, Java, PHP, or Python. You can develop and test an app locally using the SDK's deployment toolkit. Each language's SDK and runtime are unique. Your program is run in a:
Java Runtime Environment version 7
Python 2.7 runtime environment
PHP 5.4 runtime environment
Go 1.2 runtime environment
These are protected by the service-level agreement and deprecation policy of App Engine. The implementation of such a feature is often stable, and any changes made to it are backward compatible. These features include communications, process management, computing, data storage, retrieval, and search, as well as app configuration and management. The data storage, retrieval, and search category includes features such as the HRD migration tool, Google Cloud SQL, logs, datastore, dedicated Memcache, blobstore, and search.
Features in Preview
These features are expected to be made broadly accessible in a later version of App Engine. However, because they are in preview, their implementation may change in backward-incompatible ways. Sockets, MapReduce, and the Google Cloud Storage Client Library are a few of them.
Experimental Features
These might or might not be made broadly accessible in future App Engine updates, and they may change in backward-incompatible ways. "Trusted tester" features, however, are accessible only to a limited user base and require registration to use. The experimental features include Prospective Search, Page Speed, OpenID, OAuth, Datastore Admin/Backup/Restore, Task Queue Tagging, MapReduce, the Task Queue REST API, and app metrics analytics.
Third-Party Services
Google provides documentation and helper libraries that expand the capabilities of the App Engine platform, so your app can perform tasks that are not built into the core product. To do this, Google collaborates with other organizations. Along with the helper libraries, the partners frequently provide exclusive deals to App Engine users.
Advantages of Google App Engine
Google App Engine has a lot of benefits that can help you advance your app ideas. These include:
1. Infrastructure for Security: The Internet infrastructure that Google uses is arguably the safest in the entire world. Since the application data and code are hosted on extremely secure servers, there has rarely been any unauthorized access to date.
2. Faster Time to Market: For every organization, getting a product or service to market quickly is crucial. Easy development and maintenance of an app are essential to releasing a product quickly, and a firm can grow swiftly with Google Cloud App Engine's assistance.
3. Quick to Start: You don’t need to spend a lot of time prototyping or deploying the app to users
because there is no hardware or product to buy and maintain.
4. Easy to Use: The tools that you need to create, test, launch, and update the applications are
included in Google App Engine (GAE).
5. Rich set of APIs & Services: A number of built-in APIs and services in Google App Engine
enable developers to create strong, feature-rich apps.
6. Scalability: This is one of the deciding factors in the success of any software. When using App Engine to construct apps, you can access technologies like GFS, Bigtable, and others that Google uses to build its own apps.
7. Performance and Reliability: Google ranks among the top international brands, so you can bear that record in mind when considering performance and reliability.
8. Cost Savings: To administer your servers, you don’t need to employ engineers or even do it
yourself. The money you save might be put toward developing other areas of your company.
9. Platform Independence: Since the app engine platform only has a few dependencies, you can
easily relocate all of your data to another environment.
What is OpenStack?
OpenStack is a collection of open source software modules and tools that provides a framework to create and
manage both public cloud and private cloud infrastructure.
Businesses and service providers can deploy OpenStack on premises (in the data center to build a private
cloud), in the cloud to enable or drive public cloud platforms, and at the network edge for distributed
computing systems.
To create a cloud computing environment, an organization typically builds off of its existing virtualized
infrastructure, using a well-established hypervisor such as VMware vSphere, Microsoft Hyper-V or KVM.
However, cloud computing offers more than just virtualization: a public or private cloud also provides extensive provisioning, lifecycle automation, user self-service, cost reporting and billing, orchestration, and other features.
How does OpenStack work?
OpenStack is not an application in the traditional sense, but rather a platform composed of several dozen
separate components, called projects, which interoperate with each other through APIs. Each component is
complementary, but not all components are required to create a basic cloud. Organizations can install only
select components that build the features and functionality in a desired cloud environment.
OpenStack also relies on two additional foundation technologies: a base operating system, such as Linux, and
a virtualization platform, such as VMware or Citrix. The OS handles the commands and data exchanged from
OpenStack, while the virtualization engine manages the virtualized hardware resources used by OpenStack
projects.
Once the OS, virtualization platform and OpenStack components are deployed and configured properly,
administrators can provision and manage the instanced resources that applications require. Actions and
requests made through a dashboard produce a series of API calls, which are authenticated through a security
service and delivered to the destination component, which executes the associated tasks.
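To make that workflow concrete, here is a hedged sketch using the openstacksdk Python library to provision a VM. The cloud profile name and the image, flavor, and network names are assumptions; under the hood, each call is an authenticated API request to the relevant component.

```python
# A hedged sketch of provisioning a VM through OpenStack's APIs
# with the openstacksdk library. Profile, image, flavor, and
# network names are illustrative assumptions.
import openstack

# Credentials come from a clouds.yaml profile or environment variables.
conn = openstack.connect(cloud="my-private-cloud")

image = conn.compute.find_image("ubuntu-22.04")
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("private")

# Each call maps to a component: Keystone authenticates the token,
# Glance serves the image, Neutron the network, Nova the server.
server = conn.compute.create_server(
    name="demo-instance",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)
```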
The OpenStack cloud platform is an amalgam of software components. These components are shaped by open
source contributions from the developer community, and OpenStack adopters can choose to implement some
or all of these components as business needs dictate.
OpenStack setups vary, but typically start with a handful of central components: compute (Nova), VM images
(Glance), networking (Neutron), storage (Cinder or Swift), identity management (Keystone) and resource
management (Placement).
Many enterprises that deploy and maintain an OpenStack infrastructure enjoy several advantages, including
that it is:
Affordable. OpenStack is available freely as open source software released under the Apache 2.0
license. This means there is no upfront cost to acquire and use OpenStack.
Reliable. With almost a decade of development and use, OpenStack provides a comprehensive and
proven production-ready modular platform upon which an enterprise can build and operate a
private or public cloud. Its rich set of capabilities includes scalable storage, good performance and
high data security, and it enjoys broad acceptance across industries.
Vendor-neutral. Because of OpenStack's open source nature, some organizations also see it as a
way to avoid vendor lock-in, as an overall platform as well as its individual component functions.
But potential adopters must also consider some drawbacks, such as the following:
Complexity. Because of its size and scope, OpenStack requires an IT staff with significant
knowledge to deploy the platform and make it work. In some cases, an organization might require
additional staff or a consulting firm to deploy OpenStack, which adds time and cost.
Support. As open source software, OpenStack is not owned or directed by any one vendor or team.
This can make it difficult to obtain support for the technology, beyond the open source community.
Consistency. The OpenStack component suite is always in flux as new components are added and
others are deprecated.
To reduce the complexity of an OpenStack deployment, and to gain direct access to technical support, an
organization can select an OpenStack distribution from a vendor. This is a version of the open source platform
packaged with other components, such as an installation program and management tools. It often comes with
technical support options.
An organization has many OpenStack distributions to choose from, including the Red Hat OpenStack
platform, the Mirantis Cloud Platform and the Rackspace OpenStack private cloud.
OpenStack vs. other cloud platforms
Even simple clouds are complex and require extensive automation, orchestration and management to operate.
This means there are few direct alternatives to OpenStack that are practical and proven. However, there are
some options that can help organizations combine the benefits of cloud and on-premises capabilities to
simplify or speed an enterprise's adoption of next-generation technology.
Kubernetes (containers)
Organizations with small, dynamic container-based environments may balk at OpenStack's embrace of
traditional VMs. They may instead opt for a pure container-based approach using a platform such as
Kubernetes.
Public cloud on-premises offerings
The three major public cloud providers all provide managed offerings for on-premises clouds, with a strong emphasis on hybrid cloud adoption. AWS Outposts, Azure Stack and Google Anthos all offer appliances that sit within a local data center to facilitate a range of services that mimic the providers' public services and capabilities.
VMware vCloud
Given the vast enterprise investments in virtualization technology, it's natural to consider building a private
cloud based on VMware's vCloud Suite. VMware has partnerships with cloud providers, notably AWS, to
support such hybrid cloud projects. However, VMware software is proprietary and requires licensing, and it
may offer fewer capabilities and less flexibility than an open source platform such as OpenStack.
Plenty of organizations decide that the breadth and reliability of public cloud services fulfill their
requirements, thereby avoiding the need to invest financially and intellectually in a private cloud
infrastructure.
OpenStack adoption is a process, not an event. There are potentially dozens of components to understand,
install and employ. Organizations that seek to build a private cloud based on OpenStack need time, financial
investment and support from upper management.
Federated Cloud
Key characteristics of a federated cloud include the following:
1. In the federated cloud, users can interact with the architecture either centrally or in a decentralized manner. In centralized interaction, the user interacts with a broker that mediates between them and the organization. Decentralized interaction permits the user to interact directly with the clouds in the federation.
2. Federated cloud can serve various niches, both commercial and non-commercial.
3. The visibility of a federated cloud helps the user interpret the organization of the several clouds in the federated environment.
4. Federated cloud can be monitored in two ways. MaaS (Monitoring as a Service) provides
information that aids in tracking contracted services to the user. Global monitoring aids in
maintaining the federated cloud.
5. The providers who participate in the federation publish their offers to a central entity. The user
interacts with this central entity to verify the prices and propose an offer.
6. Marketed objects like infrastructure, software, and platform services have to pass through the federation when consumed in the federated cloud.
Cloud federation also poses challenges, including the following:
1. In cloud federation, it is common to have more than one provider processing incoming demands. In such cases, a scheme is needed to distribute the incoming demands equally among the cloud service providers.
2. Increasing requests in cloud federation result in a more heterogeneous infrastructure, making interoperability an area of concern. It becomes a challenge for cloud users to select relevant cloud service providers, which tends to tie them to a particular provider.
3. A federated cloud means constructing a seamless cloud environment that can interact with
people, different devices, several application interfaces, and other entities.
The technologies that aid the cloud federation and cloud services are:
1. OpenNebula
OpenNebula is a cloud computing platform for managing heterogeneous distributed data center infrastructures. It emphasizes interoperability, leverages existing information technology assets, and exposes application programming interfaces (APIs) for integration.
2. Aneka coordinator
The Aneka coordinator combines the Aneka services and Aneka peer components (network architectures), giving the cloud the ability to interact with other cloud services.
3. Eucalyptus
Eucalyptus pools computational, storage, and network resources that can be scaled up or down as application workloads change. It is an open-source framework that exposes storage, network, and other computational resources for access as a cloud environment.
Each level of the cloud federation poses unique problems and functions at a different level of the IT stack, so several strategies and technologies are needed. Combined, the answers to the problems encountered at each of these levels form a reference model for a cloud federation.
Conceptual Level
The difficulties in presenting a cloud federation as an advantageous option for using services rented from a
single cloud provider are addressed at the conceptual level. At this level, it’s crucial to define the new
opportunities that a federated environment brings in comparison to a single-provider solution and to
explicitly describe the benefits of joining a federation for service providers or service users.
At this level, the following factors need attention:
The reasons that cloud providers would want to join a federation.
Motivations for service users to use a federation.
Benefits for service providers who lease their services to other providers.
The obligations providers take on once they join the federation.
Agreements on trust between providers.
Transparency toward consumers.
The incentives of service providers and customers joining a federation stand out among these factors as
being the most important.
Logical and Operational Level
The obstacles in creating a framework that allows the aggregation of providers from various administrative domains within the context of a single overlay infrastructure, or cloud federation, are identified and addressed at the logical and operational level of a federated cloud.
Policies and guidelines for cooperation are established at this level. Additionally, this is the layer where
choices are made regarding how and when to use a service from another provider that is being leased or
leveraged. The operational component characterizes and molds the dynamic behavior of the federation as a
result of the decisions made by the individual providers, while the logical component specifies the context
in which agreements among providers are made and services are negotiated.
At this level, market-oriented cloud computing (MOCC) is put into practice and becomes a reality. At this stage, it is crucial to deal with the following difficulties:
How should a federation be represented?
How should a cloud service, a cloud provider, or an agreement be modeled and represented?
How should the regulations and standards that permit providers to join a federation be defined?
What procedures are in place to resolve disputes between providers?
What obligations does each supplier have to the other?
When should consumers and providers utilize the federation?
What categories of services are more likely to be rented than purchased?
What percentage of resources should be leased, and how should we value the resources that are leased?
The logical and operational level presents opportunities for both academia and industry.
Infrastructure Level
The technological difficulties in making it possible for various cloud computing systems to work together
seamlessly are dealt with at the infrastructure level. It addresses the technical obstacles keeping distinct
cloud computing systems from existing inside various administrative domains. These obstacles can be
removed by using standardized protocols and interfaces.
The following concerns should be addressed at this level:
What types of standards ought to be applied?
How should interfaces and protocols be created to work together?
Which technologies should be used for collaboration?
How can we design platform components, software systems, and services that support
interoperability?
Only open standards and interfaces allow for interoperability and composition among various cloud computing vendors. Additionally, the layers of the Cloud Computing Reference Model each have significantly different interfaces and protocols.
Services of Cloud Federation
Microsoft developed the Single Sign-On (SSO) system known as Active Directory Federation Services (ADFS). It serves as a component of Windows Server operating systems, giving users authenticated access through Active Directory (AD) to applications that cannot use Integrated Windows Authentication (IWA).
Through a proxy service located between Active Directory and the intended application, ADFS manages
authentication. Users’ access is granted through the usage of a Federated Trust, which connects ADFS and
the intended application. As a result, users no longer need to directly validate their identity on the federated
application in order to log on.
The authentication process typically follows these four phases (a sketch of the final phase follows the list):
The user accesses a URL that the ADFS service has provided.
The user is then verified by the AD service of the company through the ADFS service.
The ADFS service then gives the user an authentication claim after successful authentication.
The target application then receives this claim from the user’s browser and decides whether to
grant or deny access based on the Federated Trust service established.
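As a hedged illustration of the final phase, the sketch below shows a relying application validating a signed claim before granting access. ADFS classically issues SAML tokens, but it can also issue JWTs via OAuth 2.0; this example uses the third-party PyJWT library, and the issuer, audience, and signing key are placeholders.

```python
# A hedged illustration of the relying application's side of the
# last phase: validating a signed claim before granting access.
# Issuer, audience, and signing key are placeholders.
import jwt  # pip install PyJWT

def grant_or_deny(token, signing_key):
    try:
        # Verify the signature plus the issuer and audience claims,
        # i.e. that the token really came from the trusted federation
        # service and is intended for this application.
        claims = jwt.decode(
            token,
            signing_key,
            algorithms=["RS256"],
            issuer="https://adfs.example.com/adfs",
            audience="https://app.example.com/",
        )
    except jwt.InvalidTokenError:
        return False  # deny access
    return True  # grant access; claims identify the user
```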
Applications can assign user authentication duties to a different system through a process known as identity federation. You can accomplish single sign-on, where users only need to log in once to be able to access any number of their applications, by delegating access for all of your applications through a single federation system. But because federation enables organizations to centralize the access management function, it is far more significant than single sign-on. User experience, security, application onboarding, service logging and monitoring, operational efficiency in IT, and many other areas may all benefit from this.
The newest addition to the Radiant One package is the Cloud Federation Service (CFS), which is powered
by identity virtualization. Together with Radiant One FID, CFS isolates your external and cloud applications
from the complexity of your identity systems by delegating the work of authenticating against all of your
identity stores to a single common virtual layer.
A new approach to the cloud, one based on a federated model, will be increasingly important for cloud providers and users alike, writes Ditlev Bredahl, CEO and co-founder of OnApp.
It can be tempting to think of ‘the cloud’ as a ubiquitous global phenomenon: always on and always available,
everywhere to anyone. And, it’s easy to assume that cloud providers like Amazon are the only way you can
get access to that kind of global capability. The reality, however, is really quite different. That’s why a new
approach to the cloud – one based on a federated model – will be increasingly important for cloud providers
and users alike.
Why You Can’t Get ‘The Cloud’ From a Single Provider
The future of the cloud is federated, and when you look at the broad categories of apps moving to the cloud,
the truth of this statement begins to become clear. Gaming, social media, Web, eCommerce, publishing, CRM
– these applications demand truly global coverage, so that the user experience is always on, local and instant,
with ultra-low latency. That’s what the cloud has always promised to be.
The problem is that end users can’t get that from a single provider, no matter how large. Even market giants
like Amazon have limited geographic presence, with infrastructure only where it’s profitable for them to
invest. As a result, outside the major countries and cities, coverage from today’s ‘global’ cloud providers is
actually pretty thin. Iceland, Jordan, Latvia, Turkey, Malaysia? Good luck. Even in the U.S., you might find
that the closest access point to your business isn’t even in the same state, let alone the same city.
Of course, these locations aren’t devoid of infrastructure. There are hosting providers, telcos, ISPs and data
center operators pretty much everywhere. If you own infrastructure in one of these locations, you already have
a working business model for your local market. And, like most providers, you are likely to have spare capacity
almost all of the time.
The federated cloud connects these local infrastructure providers to a global marketplace that enables each
participant to buy and sell capacity on demand. As a provider, this gives you instant access to global
infrastructure on an unprecedented scale. If your customer suddenly needs a few hundred new servers, you
just buy the capacity they need from the marketplace. If a customer needs to accelerate a website or an
application in Hong Kong, Tokyo or Latvia, you simply subscribe to those locations and make use of the
infrastructure that’s already there.
As part of a cloud federation, even a small service provider can offer a truly global service without spending
a dime building new infrastructure. For companies with spare capacity in the data center, the federation also
provides a simple way to monetize that capacity by submitting it to the marketplace for other providers to buy,
creating an additional source of revenue.
There are immediate benefits for end users, too. The federated cloud means that end users can host apps with
their federated cloud provider of choice, instead of choosing from a handful of “global” cloud providers on
the market today and making do with whatever pricing, app support and SLAs they happen to impose. Cloud
users can choose a local host with the exact pricing, expertise and support package that fits their need, while
still receiving instant access to as much local or global IT resources as they’d like. They get global scalability
without restricted choice, and without having to manage multiple providers and invoices.
The federated cloud model is a force for real democratization in the cloud market. It’s how businesses will be
able to use local cloud providers to connect with customers, partners and employees anywhere in the world.
It’s how end users will finally get to realize the promise of the cloud. And, it’s how data center operators and
other service providers will finally be able to compete with, and beat, today’s so-called global cloud providers.