CWS Notes


Cloud Computing and Characteristics


Cloud Computing refers to the delivery of computing services such as servers,
storage, databases, networking, software, analytics, and intelligence over the
internet ("the cloud"). This allows for faster innovation, flexible resources, and
economies of scale. Instead of owning and maintaining physical infrastructure,
businesses and individuals can rent computing power and storage on an as-
needed basis.
Characteristics of Cloud Computing:
1. On-demand self-service:
o Users can access computing resources like server time and
network storage automatically, without requiring human
intervention from service providers.
2. Broad network access:
o Cloud services are accessible over the network and can be
accessed from a wide range of devices such as laptops, desktops,
mobile phones, and tablets.
3. Resource pooling:
o The cloud provider's resources are pooled together to serve
multiple consumers using a multi-tenant model, with different
physical and virtual resources dynamically assigned according to
demand. The resources are typically abstracted and assigned
based on the user's needs.
4. Rapid elasticity:
o Cloud services can be scaled up or down rapidly depending on the
demand. From a user's perspective, the resources often appear to
be unlimited, and can be appropriated in any quantity at any time.

5. Measured service:
o Cloud systems automatically control and optimize resource use by
leveraging a metering capability. This can be done at some level of
abstraction appropriate to the type of service (e.g., storage,
processing, bandwidth). Customers only pay for what they use.
6. Cost efficiency:
o Cloud computing eliminates the capital expense of buying
hardware and software and setting up and running on-site data
centers, which includes the racks of servers, round-the-clock
electricity for power and cooling, and IT experts to manage the
infrastructure.
7. Security:
o Cloud providers often offer a set of policies, technologies, and
controls that strengthen security and protect data, applications,
and the infrastructure from potential threats.
8. Scalability and flexibility:
o Cloud computing provides the ability to scale resources seamlessly
based on the application demand. Organizations can quickly and
easily scale their infrastructure and services.
These characteristics have made cloud computing a vital technology for
businesses looking for cost-effective, flexible, and scalable solutions for their IT
needs.

Software-as-a-Service (SaaS)
SaaS (Software as a Service) is a cloud computing model where software
applications are delivered over the internet as a service. Users can access these
applications via a web browser without the need to install, manage, or update
software on their local devices.
1. Cloud-hosted:
o SaaS applications are hosted on the provider’s cloud
infrastructure, eliminating the need for local servers.
2. Subscription-based:
o Users typically access the software through a subscription model,
paying on a monthly or annual basis.
3. No installations required:
o SaaS applications can be used directly from a web browser
without the need for installation or setup on local devices.
4. Automatic updates:
o The SaaS provider manages software updates, security patches,
and maintenance, so users always have access to the latest
version.
5. Cost-effective:
o SaaS eliminates the need for costly hardware and software
purchases, reducing upfront expenses for businesses.
6. Scalability:
o SaaS solutions can easily scale up or down according to the
number of users or business needs, providing flexibility.
7. Anywhere accessibility:
o Since SaaS is internet-based, users can access it from any location
with an internet connection, using any compatible device.
8. Multi-tenant architecture:
o Multiple users (or tenants) can access a single instance of the
software, with data segregated for privacy and security.
9. Collaboration-friendly:
o SaaS applications often include collaboration tools, allowing
multiple users to work on documents or projects simultaneously in
real-time.
10. Security management:
o SaaS providers invest in robust security measures, such as encryption and data protection, helping to safeguard users' information.
Examples of SaaS:
• Google Workspace (formerly G Suite) – Provides tools like Gmail, Google
Drive, Google Docs, etc.
• Microsoft 365 – Offers applications like Word, Excel, PowerPoint,
Outlook, etc., as cloud-based services.
• Salesforce – A customer relationship management (CRM) platform used
to manage customer data, sales, and interactions.
• Slack – A collaboration tool for communication and project
management.
• Dropbox – A cloud storage service that allows users to store files and
access them from any device.
SaaS is popular for its simplicity, affordability, and accessibility, making it ideal for both businesses and individual users.

Infrastructure-as-a-Service (IaaS)
IaaS (Infrastructure as a Service) is a cloud computing model that provides
virtualized computing resources over the internet. In this model, a third-party
provider offers virtual machines, storage, servers, networking, and other
infrastructure components, allowing businesses to rent rather than own and
maintain their own physical infrastructure.
Key Points about IaaS:
1. Cloud-based infrastructure:
o IaaS offers essential infrastructure components like virtual servers,
storage, and networks hosted in the cloud, removing the need for
on-premises data centers.
2. Pay-as-you-go model:
o Users are charged based on their usage of resources such as CPU
time, storage, and bandwidth, making it cost-effective.
3. Scalable resources:
o IaaS platforms offer flexible scaling, allowing businesses to
increase or decrease computing resources as needed without
purchasing new hardware.
4. Virtualization technology:
o IaaS providers use virtualization to create virtual instances of
physical resources, enabling users to run multiple virtual machines
on a single physical server.
5. Self-service provisioning:
o Users can access and manage resources through a web interface
or API, allowing them to deploy, configure, and control
infrastructure as needed.
6. No hardware management:
o The cloud provider is responsible for maintaining and managing
physical hardware, while users focus on their own applications and
services.
7. High availability and redundancy:
o IaaS providers often ensure high availability through
geographically distributed data centers, offering redundancy and
disaster recovery options.
8. Customization:
o IaaS allows for high levels of customization. Users can choose their
operating systems, configure their networks, and set up their own
middleware or runtime environments.
9. Security and compliance:
o IaaS providers often offer security features like firewalls, intrusion
detection systems, and data encryption, but users are also
responsible for securing their own applications and data.
10. Supports a variety of workloads:
o IaaS can handle a wide range of workloads, from testing and
development environments to large-scale, mission-critical
applications.
Examples of IaaS:
• Amazon Web Services (AWS) – Provides services like EC2 (Elastic
Compute Cloud), S3 (Simple Storage Service), and VPC (Virtual Private
Cloud).
• Microsoft Azure – Offers virtual machines, blob storage, virtual
networks, and more.
• Google Cloud Platform (GCP) – Provides Compute Engine (VMs), Cloud
Storage, and Networking services.
• IBM Cloud – Offers virtual servers, storage, and networking, along with
advanced AI and data services.
• Oracle Cloud Infrastructure (OCI) – Provides computing, storage, and
networking services for enterprise-level workloads.
• DigitalOcean – A simplified cloud computing service aimed mainly at developers, offering virtual machines and object storage.
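To illustrate the self-service provisioning point above (key point 5), here is a minimal sketch using the AWS SDK for Python (boto3) to launch a virtual server through the API. The AMI ID, key pair name, and region are hypothetical placeholders, and AWS credentials are assumed to be configured:

import boto3

# Create an EC2 client in a chosen region (placeholder region).
ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one virtual machine; ImageId and KeyName are placeholder values.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # replace with a real AMI ID
    InstanceType="t3.micro",
    KeyName="my-key-pair",            # an existing key pair name
    MinCount=1,
    MaxCount=1,
)
print("Launched instance:", response["Instances"][0]["InstanceId"])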

Platform-as-a-Service (PaaS)
PaaS (Platform as a Service) is a cloud computing model that provides a
platform allowing developers to build, deploy, and manage applications
without dealing with the complexity of infrastructure management. PaaS offers
a pre-configured environment with all the necessary tools and services for
application development, streamlining the process of creating software.
Key Points about PaaS:
1. Application development platform:
o PaaS offers a platform with development tools, runtime
environments, databases, and infrastructure required to build,
test, and deploy applications.

2. No infrastructure management:
o The cloud provider manages the underlying hardware, operating
systems, storage, and networking, so developers can focus on
writing code and developing applications.
3. Pre-configured environment:
o PaaS platforms come with pre-installed software components like
web servers, databases, and frameworks, saving time on
configuration and setup.
4. Supports multiple programming languages:
o Most PaaS offerings support multiple programming languages and
frameworks such as Java, Python, Ruby, Node.js, and PHP, giving
developers flexibility in choosing the tools they are comfortable
with.
5. Automated scaling:
o PaaS platforms automatically scale computing resources based on
the demand of the application, ensuring optimal performance
without manual intervention.
6. Collaboration-friendly:
o PaaS is ideal for team collaboration as it allows multiple
developers to work on the same project simultaneously in a
shared environment.
7. Integrated services:
o PaaS often includes built-in services like databases, caching,
authentication, and APIs, allowing developers to easily integrate
these into their applications without building them from scratch.
8. Rapid development and deployment:
o PaaS accelerates the development cycle by providing ready-to-use
components and tools, allowing developers to build and deploy
applications faster than traditional methods.

9. Cost-effective:
o PaaS reduces development costs by providing a pay-as-you-go model where users only pay for the resources they consume, avoiding the need to purchase and maintain infrastructure.
10. Security and compliance:
o The PaaS provider typically handles security patches, compliance,
and monitoring at the platform level, while developers are
responsible for securing their applications and data.
Examples of PaaS:
• Google App Engine – A fully managed platform for building and hosting
web applications.
• Microsoft Azure App Service – Provides a platform for building,
deploying, and scaling web apps and APIs.
• AWS Elastic Beanstalk – Simplifies the deployment of applications by
managing the infrastructure and scaling automatically.
• Heroku – A popular PaaS platform for developers to build, run, and
operate applications entirely in the cloud.
PaaS is ideal for developers who want to focus on writing code and deploying
applications quickly without managing the underlying infrastructure. It offers a
streamlined environment that makes application development faster and more
efficient while handling scaling and infrastructure concerns.

Difference between SaaS and IaaS


SaaS (Software as a Service):
1. What it is: Ready-to-use software applications accessible via the
internet.
2. User: End-users who need the application without worrying about
infrastructure.
3. Management: Provider manages everything (software, hardware,
networking).
4. Customization: Limited customization; users use the software as is.
5. Scalability: Automatically scales based on usage needs.
6. Cost: Subscription-based, pay-per-user or pay-per-feature.
7. Use Case: Email, collaboration tools, CRM (e.g., Gmail, Salesforce,
Dropbox).
8. Maintenance: Provider handles updates, security, and maintenance.
9. Access: Accessible from any device with a browser.
10. Examples: Google Workspace, Microsoft 365, Salesforce.

IaaS (Infrastructure as a Service):


1. What it is: Virtualized computing resources like servers, storage, and
networking.
2. User: Developers and IT teams who need infrastructure to build and run
applications.
3. Management: Provider manages the physical infrastructure; users
manage applications, OS, and data.
4. Customization: Highly customizable; users control their own software
environment.
5. Scalability: Users scale resources (e.g., VMs, storage) based on demand.
6. Cost: Pay-as-you-go model based on resource consumption.
7. Use Case: Hosting websites, databases, development environments.
8. Maintenance: Users manage their software, operating system, and
updates.
9. Access: Managed via web interfaces or APIs.
10. Examples: AWS EC2, Microsoft Azure, Google Compute Engine.

Key Differences:
• SaaS: Ready-to-use software, no management needed.
• IaaS: Provides infrastructure resources; users manage everything else.
Cloud Elasticity and Scalability
Cloud Elasticity:
• Elasticity allows cloud infrastructure to automatically scale up or down
based on sudden changes in demand.
• It helps manage fluctuating workloads efficiently, minimizing costs.
• Best suited for scenarios with temporary or fluctuating resource needs,
but not for environments that require a persistent, heavy workload.
• It's critical for mission-critical applications where performance is key, as
it ensures that additional resources are provided when needed, like CPU,
memory, or bandwidth.
Cloud Scalability:
• Scalability is used to handle growing workloads with consistent
performance.
• It's often applied in environments requiring persistent resource
deployment to manage static workloads efficiently.
Types of Scalability:
1. Vertical Scalability (Scale-up): Increases the power of existing resources
by adding more capacity (e.g., more CPU or memory).
2. Horizontal Scalability (Scale-out): Adds more resources (e.g., more
servers) to distribute the workload.
3. Diagonal Scalability: Combines both vertical and horizontal scalability,
adding resources in both directions when necessary.

AWS Infrastructure
Amazon Web Services (AWS) provides a comprehensive cloud infrastructure
that allows businesses to scale their applications, store data, and perform
computing tasks in a flexible, cost-effective manner. AWS offers a wide range of
services including computing power, storage options, and networking
capabilities.
1. Regions:
• AWS operates in multiple geographic regions around the world.
• Each region is a separate geographic area that contains multiple
Availability Zones.
• Regions allow users to deploy resources near their customers for low-
latency performance and compliance.
2. Availability Zones (AZs):
• Each Region is divided into Availability Zones (AZs), which are isolated
data centers within a region.
• AZs have independent power, networking, and cooling, ensuring high
availability and fault tolerance.
• Users can distribute workloads across AZs to ensure redundancy and
avoid service disruptions.
3. Edge Locations:
• Edge Locations are global data centers used for content delivery and
caching (via Amazon CloudFront).
• They enable faster content delivery to end-users by caching data closer
to the user's location.
• Used for services like CloudFront, Lambda@Edge, and Route 53.
4. Data Centers:
• AWS infrastructure is built around data centers that house compute,
storage, and networking hardware.
• AWS has numerous data centers across different regions, with security
protocols and physical safeguards in place to protect data.
5. Networking:
• AWS networking services enable secure communication between cloud
resources and external systems.
• Amazon VPC (Virtual Private Cloud) allows users to create isolated
networks in the cloud.
• AWS Direct Connect provides dedicated network connections between
on-premises environments and AWS for low-latency, high-throughput
communication.
6. Storage:
• AWS offers a range of storage services, including:
o Amazon S3 (Simple Storage Service) for scalable object storage.
o Amazon EBS (Elastic Block Store) for block storage attached to EC2
instances.
o Amazon Glacier for long-term, low-cost archival storage.
• AWS storage services are designed to be highly durable, available, and
secure.
7. Compute:
• Amazon EC2 (Elastic Compute Cloud) provides scalable virtual servers
(instances) that can be customized based on computing needs.
• AWS Lambda enables serverless computing, allowing users to run code
without managing infrastructure.
• Elastic Beanstalk automates deployment of applications while managing
the underlying infrastructure.
8. Database:
• AWS provides a variety of managed databases to support different use
cases:
o Amazon RDS (Relational Database Service) for MySQL,
PostgreSQL, Oracle, SQL Server, and MariaDB.
o Amazon DynamoDB for scalable NoSQL database needs.
o Amazon Redshift for data warehousing and analytics.
o Amazon Aurora, a high-performance relational database
compatible with MySQL and PostgreSQL.
Summary of AWS Infrastructure Components:
1. Regions: Geographic areas where AWS resources are deployed.
2. Availability Zones: Isolated data centers within regions for fault
tolerance.
3. Edge Locations: Data centers used for fast content delivery and caching.
4. Data Centers: Physical locations that house AWS hardware.
5. Networking: Virtual private cloud (VPC) for secure communication and
AWS Direct Connect for dedicated links.
6. Storage: Scalable and secure storage services like S3, EBS, and Glacier.
7. Compute: Virtual machines (EC2), serverless functions (Lambda), and
PaaS (Elastic Beanstalk).
8. Database: Managed relational and NoSQL databases like RDS,
DynamoDB, and Redshift.
These components combine to provide a robust, scalable, and highly available
cloud infrastructure for businesses to deploy applications and manage data
with flexibility and security.
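As a small illustration of how this infrastructure is exposed programmatically, the following boto3 sketch lists the Regions available to an account and the Availability Zones of one Region (it assumes configured AWS credentials):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# List the Regions enabled for this account.
regions = ec2.describe_regions()["Regions"]
print("Regions:", [r["RegionName"] for r in regions])

# List the Availability Zones within the client's Region.
zones = ec2.describe_availability_zones()["AvailabilityZones"]
print("Availability Zones:", [z["ZoneName"] for z in zones])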

AWS S3
Amazon S3 (Simple Storage Service) is a scalable, durable, and secure object
storage service provided by AWS. It is used to store and retrieve any amount of
data, from anywhere on the web. S3 is designed to provide high availability,
low-latency access to data, and offers flexible pricing based on the amount of
storage used.
Key Terms Related to Amazon S3:
a. Buckets:
• Definition: A Bucket is the fundamental container in S3 used to store
objects. Think of it as a "folder" or "directory" for organizing files.
• Unique Name: Each bucket name must be globally unique across AWS,
as the bucket name is part of the URL used to access the objects inside.
• Location: Buckets are created in specific AWS regions. This allows users
to store data close to where it is needed.
• Access Control: Buckets have configurable permissions, allowing users to
define who can read/write to the bucket (e.g., public access, specific IAM
users, etc.).
b. Object:
• Definition: An Object is the actual data that you store in a bucket in S3.
An object consists of two parts:
1. Data: The file (e.g., an image, document, video, or any other type
of file).
2. Metadata: Information about the object (such as the file type, last
modified date, and custom metadata added by the user).
• Unique Identifier: Each object is uniquely identified by a key (essentially
the name of the object within the bucket) and the bucket name.
• Storage: Objects can be as large as 5 TB each in size.
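A minimal boto3 sketch of the bucket and object model described above; the bucket name is a hypothetical placeholder (bucket names must be globally unique) and AWS credentials are assumed to be configured:

import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "example-notes-bucket-12345"  # placeholder; must be globally unique

# Create the bucket (in us-east-1 no location constraint is needed).
s3.create_bucket(Bucket=bucket)

# Upload an object: the key is its name inside the bucket, the body is the data.
s3.put_object(Bucket=bucket, Key="docs/hello.txt", Body=b"Hello, S3!")

# Retrieve the object and read its data back.
obj = s3.get_object(Bucket=bucket, Key="docs/hello.txt")
print(obj["Body"].read())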

c. Lifecycle Management in S3:


• Definition: Lifecycle Management allows users to automatically manage
the lifecycle of objects in S3, including transitioning between storage
classes and deleting objects based on specified rules.
• Key Features:
1. Transition: Automatically move objects between different storage
classes (e.g., from S3 Standard to S3 Glacier for archival storage)
based on age or other criteria.
2. Expiration: Automatically delete objects after a specified period or
when they are no longer needed (e.g., after 30 days).
3. Custom Rules: Users can set up custom rules for transitioning or
deleting objects, such as:
▪ Transitioning objects to a cheaper storage class (like Glacier)
after 30 days.
▪ Deleting objects older than 1 year.
• Cost Management: Lifecycle management helps in optimizing storage
costs by moving infrequently accessed data to lower-cost storage classes
or deleting unused data.
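A sketch of a lifecycle configuration implementing the kind of rules described above (transition to Glacier after 30 days, delete after one year); the bucket name and prefix are placeholders:

import boto3

s3 = boto3.client("s3")

# Objects under the "logs/" prefix move to Glacier after 30 days
# and are deleted after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-notes-bucket-12345",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)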

Amazon S3 Storage and Lifecycle Management


Amazon S3 offers various storage tiers (also called storage classes) that are
optimized for different use cases, from frequently accessed data to long-term
archival storage. Each storage class has different cost structures and
performance characteristics. Here's an overview of the S3 storage tiers and
how lifecycle management applies to each tier:
S3 Storage Tiers (Storage Classes):
1. S3 Standard (Standard Storage Class)
o Use Case: Frequently accessed data (e.g., active files, websites,
applications).
o Performance: Low latency and high throughput.
o Durability: 99.999999999% (11 9's) durability.
o Availability: 99.99% availability over a given year.
o Cost: Higher cost due to frequent access.
o Lifecycle Management:
▪ Transition to Lower Tiers: Move to S3 Standard-IA or S3
Glacier for infrequent access after a certain period (e.g., 30
days).
▪ Expiration: Set to delete after a specific time if data is no
longer needed.
2. S3 Standard-IA (Infrequent Access)
o Use Case: Infrequently accessed data that needs fast retrieval
when accessed (e.g., backups, disaster recovery).
o Performance: Similar to S3 Standard but designed for infrequent
access.
o Durability: 99.999999999% durability.
o Availability: 99.9% availability over a given year.
o Cost: Lower cost than S3 Standard but retrieval costs apply.
o Lifecycle Management:
▪ Transition from S3 Standard: Automatically move data from
S3 Standard to S3 Standard-IA after a certain period of
inactivity (e.g., after 30 days).
▪ Transition to S3 Glacier: For very long-term storage,
transition objects to S3 Glacier after a few months or years.
▪ Expiration: Set to delete older or obsolete files that are no
longer needed.

3. S3 One Zone-IA
o Use Case: Infrequently accessed data that does not require
multiple availability zone redundancy (e.g., secondary backups,
non-critical data).
o Performance: Low latency and high throughput, but stored in a
single availability zone.
o Durability: 99.999999999% durability, but limited to one
availability zone (lower fault tolerance than Standard-IA).
o Availability: 99.5% availability over a given year.
o Cost: Lower cost than S3 Standard-IA.
o Lifecycle Management:
▪ Transition: Move data to S3 Glacier for long-term, low-cost
storage.
▪ Expiration: Automatically delete the data after a defined
retention period.
4. S3 Glacier
o Use Case: Archival storage and data that is rarely accessed but
needs to be preserved (e.g., long-term backups, legal records).
o Performance: Retrieval times vary from minutes to hours
(depending on retrieval type: Expedited, Standard, or Bulk).
o Durability: 99.999999999% durability.
o Cost: Very low cost for storage, but retrieval fees apply based on
speed of access.
o Lifecycle Management:
▪ Transition from S3 Standard-IA or One Zone-IA: Move data
to S3 Glacier for long-term storage after a specified period
(e.g., 1 year).
▪ Expiration: Delete data once it is no longer needed or after
a defined retention period.
5. S3 Glacier Deep Archive
o Use Case: Long-term archiving of data that is rarely accessed (e.g.,
regulatory archives, compliance data).
o Performance: Retrieval takes 12 hours or more.
o Durability: 99.999999999% durability.
o Cost: Lowest storage cost in AWS, but retrieval can be expensive
and slow.
o Lifecycle Management:
▪ Transition from S3 Glacier: Move to S3 Glacier Deep
Archive for very infrequent access or compliance storage.
▪ Expiration: Set to delete after a long retention period if the
data is no longer required.
6. S3 Intelligent-Tiering
o Use Case: Data with unpredictable access patterns. It
automatically moves data between two access tiers (frequent and
infrequent access) based on access patterns.
o Performance: Low latency and high throughput.
o Durability: 99.999999999% durability.
o Cost: Slightly higher than S3 Standard-IA due to automation, but
no retrieval charges for infrequent access.
o Lifecycle Management:
▪ Automatic Tiering: Automatically moves data between
Frequent Access and Infrequent Access tiers.
▪ Transition to Glacier: Users can set up rules to transition
older data to S3 Glacier or Glacier Deep Archive for cost
savings.
Lifecycle Management Strategy for Each Tier:
1. S3 Standard: Transition to S3 Standard-IA or S3 Glacier if infrequently
accessed.
2. S3 Standard-IA: Transition to S3 Glacier or S3 Glacier Deep Archive for
long-term archiving, or delete after a specified period.
3. S3 One Zone-IA: Transition to S3 Glacier for archival storage or delete
when no longer needed.
4. S3 Glacier: Transition to S3 Glacier Deep Archive for long-term archiving
or delete when data retention is complete.
5. S3 Glacier Deep Archive: Delete after a specified retention period if no
longer required.
6. S3 Intelligent-Tiering: Automatically moves objects between Frequent
Access and Infrequent Access tiers. Transition to Glacier for long-term
storage.

Summary of S3 Tiers and Lifecycle Management:


• S3 Standard: For frequently accessed data. Lifecycle management moves
it to lower-cost tiers like S3 Standard-IA or S3 Glacier after a period.
• S3 Standard-IA: For less frequently accessed data. Can transition to S3
Glacier or S3 Glacier Deep Archive for archival purposes.
• S3 Glacier & Deep Archive: For long-term storage. Transition from
infrequent access tiers and eventually expire when no longer needed.
• S3 Intelligent-Tiering: Automatically moves data between access tiers
based on usage, with optional transitions to archival storage.
These storage classes allow users to manage their data efficiently by reducing
costs while ensuring performance and durability.
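Objects can also be written directly into a chosen storage class at upload time rather than waiting for a lifecycle transition. A small boto3 sketch (bucket name and key are placeholders):

import boto3

s3 = boto3.client("s3")

# Upload infrequently accessed data straight into S3 Standard-IA.
s3.put_object(
    Bucket="example-notes-bucket-12345",
    Key="backups/2024-01.tar.gz",
    Body=b"...backup data...",
    StorageClass="STANDARD_IA",  # other values include INTELLIGENT_TIERING,
                                 # ONEZONE_IA, GLACIER, DEEP_ARCHIVE
)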

AWS EC2
Amazon EC2 (Elastic Compute Cloud) is one of the core services provided by
AWS that allows you to run virtual machines, called instances, in the cloud. EC2
provides scalable computing power and flexibility to run applications, process
data, host websites, and more. Users can launch virtual machines in minutes,
configure them as needed, and scale them up or down depending on demand.
Key Features of EC2:
• Scalability: Easily scale up or down based on your application’s
requirements.
• Customizable: Choose the instance type (CPU, memory, storage) that fits
your workload.
• Pay-as-you-go: Pay only for the compute resources you use, with options
for long-term savings (like Reserved Instances).
• Global Reach: Launch instances in different AWS Regions and
Availability Zones worldwide for lower latency and high availability.
• Elastic Load Balancing (ELB): Distribute incoming traffic across multiple
instances to ensure availability and fault tolerance.
Different Instance Types in EC2
AWS EC2 instances are categorized into different instance types, each
optimized for different use cases based on CPU, memory, storage, and
networking capabilities.
Here are the major EC2 instance families and their characteristics:
1. General Purpose Instances:
o Instance Types: t3, t3a, t2, m5, m5a, m6g, m6i
o Use Case: Balanced compute, memory, and network resources for
applications like small to medium databases, development and
testing environments, and web servers.

o Characteristics:
▪ Provides a balance of CPU, memory, and network resources.
▪ Can handle diverse workloads such as web hosting,
application servers, and development environments.
2. Compute Optimized Instances:
o Instance Types: c5, c5a, c5n, c6g, c6i
o Use Case: High-performance compute-intensive applications like
high-performance web servers, scientific modeling, and batch
processing.
o Characteristics:
▪ Optimized for compute-heavy applications.
▪ High-performance processors (often Intel or AMD) with high
clock speeds.
▪ Ideal for CPU-bound tasks like gaming servers, data
analytics, and scientific applications.
3. Memory Optimized Instances:
o Instance Types: r5, r5a, r5n, r6g, r6i, x1e, u-6tb1.metal
o Use Case: Applications requiring large amounts of memory, such
as high-performance databases, in-memory caches, and real-time
big data analytics.
o Characteristics:
▪ Provides a high ratio of memory to CPU.
▪ Ideal for workloads such as databases, in-memory caches,
and big data analytics.
4. Storage Optimized Instances:
o Instance Types: i3, i3en, d2, h1, i4i
o Use Case: Applications that require high disk throughput and low-
latency access to large amounts of data (e.g., NoSQL databases,
data warehousing).
o Characteristics:
▪ Provides fast, low-latency access to local storage.
▪ Ideal for workloads that require high storage performance
such as large databases, data warehousing, and real-time
big data processing.
5. Accelerated Computing Instances:
o Instance Types: p4, p3, inf1, g4ad, g5
o Use Case: Machine learning, artificial intelligence (AI), graphics
processing, and video transcoding.
o Characteristics:
▪ Includes hardware accelerators like GPUs (Graphics
Processing Units) and FPGAs (Field-Programmable Gate
Arrays).
▪ Designed for compute-intensive applications like deep learning training and inference, 3D rendering, and high-performance computing.
Choosing the Right Instance:
• General Purpose instances are ideal for a wide range of applications,
including development, testing, and web hosting.
• Compute Optimized instances are best for applications that need high
processing power, such as batch processing and scientific computing.
• Memory Optimized instances are suited for applications that need large
amounts of memory, such as databases and analytics platforms.
• Storage Optimized instances are designed for data-intensive applications
requiring high storage throughput.
• Accelerated Computing instances are built for tasks requiring specialized
hardware accelerators like GPUs for machine learning and AI.

EC2 Purchasing Options
EC2 instances can be purchased under several options, which differ in commitment, pricing model, typical use case, key benefit, and cost profile:
1. On-Demand Instance
o Commitment: None.
o Pricing model: Pay per hour or per second.
o Use case: Unpredictable workloads, short-term or development/testing.
o Key benefit: Flexibility; pay only for what you use.
o Cost: Higher cost compared to the other options.
2. Dedicated Instance
o Commitment: None.
o Pricing model: Pay per hour or per second.
o Use case: Instances run on hardware dedicated to your account, isolated from other AWS accounts.
o Key benefit: Physical server isolation for compliance.
o Cost: More expensive than On-Demand, less than a Dedicated Host.
3. Dedicated Host
o Commitment: None.
o Pricing model: Pay per hour or per second.
o Use case: Full control over instance placement, used for licensing needs (e.g., BYOL).
o Key benefit: Full control over the underlying hardware.
o Cost: Expensive; suitable for BYOL licensing and compliance.
4. Spot Instance
o Commitment: None.
o Pricing model: Bidding (pay based on the current Spot price).
o Use case: Interruptible, flexible workloads (e.g., big data, batch processing).
o Key benefit: Low cost, up to 90% savings compared to On-Demand.
o Cost: Substantial cost savings, but risk of termination.
5. Scheduled Instance
o Commitment: Commitment to specific time windows.
o Pricing model: Pay per hour or per second.
o Use case: Applications with predictable usage patterns at specific times.
o Key benefit: Pre-scheduled instances for predictable workloads.
o Cost: Lower than On-Demand, but higher than Reserved.
6. Reserved Instance
o Commitment: 1 or 3 years.
o Pricing model: All upfront, partial upfront, or no upfront.
o Use case: Predictable workloads with steady demand (e.g., long-term, consistent usage).
o Key benefit: Substantial cost savings for long-term, steady usage.
o Cost: Up to 75% savings compared to On-Demand.
Summary of EC2 Purchasing Options:
• On-Demand Instances: Flexible and good for unpredictable workloads.
Higher cost due to the lack of long-term commitment.
• Dedicated Instances: Run on hardware dedicated to your account,
ensuring isolation. More expensive than On-Demand but cheaper than
Dedicated Hosts.
• Dedicated Hosts: Full control over server placement and use cases like
BYOL licensing. Expensive but ideal for compliance needs.
• Spot Instances: Cheap but can be terminated at any time. Best suited for
flexible, non-critical workloads.
• Scheduled Instances: Suitable for workloads that occur at predictable
times. Lower cost than On-Demand, but not as flexible.
• Reserved Instances: Long-term commitment with up to 75% savings.
Best for consistent, predictable workloads.
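As one concrete example, a Spot Instance can be requested through the EC2 API. A minimal boto3 sketch (the AMI ID and instance type are placeholders, and the instance may be interrupted if AWS reclaims the capacity):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request one interruptible Spot Instance.
response = ec2.request_spot_instances(
    InstanceCount=1,
    Type="one-time",
    LaunchSpecification={
        "ImageId": "ami-0123456789abcdef0",  # placeholder AMI ID
        "InstanceType": "t3.micro",
    },
)
print(response["SpotInstanceRequests"][0]["SpotInstanceRequestId"])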

Versioning in AWS S3
Versioning in Amazon S3 (Simple Storage Service) is a feature that allows you to
preserve, retrieve, and restore every version of every object stored in a bucket.
When versioning is enabled on an S3 bucket, multiple versions of an object
(file) can exist in the same bucket, providing an important mechanism for data
protection.
Key Features of S3 Versioning:
1. Object Versioning:
o When you upload a new object with the same key (name) to a
versioned bucket, instead of overwriting the previous object,
Amazon S3 stores the new object as a new version.
o Every version of the object, including deletes, is tracked and
stored.
2. Protects Against Unintentional Overwrites:
o When versioning is enabled, each update creates a new version.
You can easily recover previous versions if an object is accidentally
overwritten.
3. Delete Protection:
o Even if an object is deleted, the object is not actually removed.
Instead, a delete marker is created, and the previous versions
remain intact.
o You can restore the object by deleting the delete marker to make
the prior version the current version again.
4. Data Archiving:
o You can use versioning to archive older versions of your objects
and move them to cheaper storage tiers (such as S3 Glacier or
Glacier Deep Archive) for long-term retention.
5. Enabling Versioning:
o Versioning must be explicitly enabled for each S3 bucket.
o Once enabled, versioning cannot be disabled, though it can be
suspended. Suspending versioning prevents new versions from
being created but does not remove existing versions.
6. Version IDs:
o Each object version gets assigned a unique Version ID. You can
reference different versions of the object using this Version ID to
access or restore a specific version.
7. Cost Considerations:
o Versioning increases storage costs since every version of an object
is retained. Implementing Lifecycle Management to automatically
transition older versions to cheaper storage classes can help
mitigate costs.
Common Use Cases of Versioning:
• Data Protection: Protect against accidental deletions or overwrites.
• Backup: Keep backup copies of files for disaster recovery.
• Auditing: Keep historical versions of files for compliance or auditing
purposes.
• Archiving: Archive previous versions of files to cheaper storage classes
like S3 Glacier.
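A minimal boto3 sketch of enabling versioning on a (placeholder) bucket and listing the versions stored for one key:

import boto3

s3 = boto3.client("s3")
bucket = "example-notes-bucket-12345"  # placeholder bucket name

# Turn versioning on; once enabled it can only be suspended, not disabled.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Upload the same key twice; each upload becomes a separate version.
s3.put_object(Bucket=bucket, Key="report.txt", Body=b"version 1")
s3.put_object(Bucket=bucket, Key="report.txt", Body=b"version 2")

# List all versions (and any delete markers) for that key.
versions = s3.list_object_versions(Bucket=bucket, Prefix="report.txt")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])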

Amazon S3 ACL (Access Control List) - Key Points:


1. What is ACL? In Amazon S3 (Simple Storage Service), Access Control Lists
(ACLs) are one of the access control mechanisms used to manage
permissions for individual objects (files) and buckets (folders) within S3.
An ACL defines who can access an S3 resource and the type of
operations they can perform (e.g., read, write).
2. How it Works:
o Owner Control: Every S3 object or bucket has an owner (the AWS
account that created the resource). The owner has full control and
can set permissions for other AWS accounts.
o Permissions: ACLs grant read/write permissions to:
▪ Other AWS accounts (by specifying their AWS account ID).
▪ Groups (like predefined Amazon groups such as
Authenticated Users or AllUsers).
▪ Public access (enabling anonymous access).
o Permissions are granular, allowing access to specific operations
like reading an object or listing a bucket.
3. Types of Permissions in ACL:
o READ: Allows reading an object, listing the contents of a bucket, or
reading access logs.
o WRITE: Allows modifying or deleting an object, or adding/deleting
an object in a bucket.
o READ_ACP: Allows reading the ACL of a bucket or object.
o WRITE_ACP: Allows modifying the ACL of a bucket or object.
o FULL_CONTROL: Provides complete control (all permissions).
4. ACL Structure:
o ACLs have a list of grants, where each grant consists of a grantee
(the entity being granted access) and the specific permissions
(e.g., read, write).
o Grants can be assigned to AWS account users, Amazon groups, or
even to the public.

Benefits of Using Amazon S3 ACLs:


1. Fine-Grained Access Control: ACLs allow you to grant precise
permissions to specific users or groups at the object or bucket level. You
can allow different levels of access (read, write, full control) to different
AWS accounts or even anonymous users.
2. Cross-Account Access: With ACLs, you can easily share access with other
AWS accounts by specifying their account IDs without needing to set up
more complex IAM policies.
3. Simple Public Access: ACLs provide a straightforward way to make
objects or buckets public by granting access to the AllUsers group,
enabling easy content distribution for public resources (like images or
documents).
4. Bucket Logging Access: ACLs enable the owner of a bucket to grant
Amazon services like S3 log delivery permission to write log files to a
bucket.
5. Legacy Support: ACLs are a legacy feature that offers compatibility with
older systems or applications that rely on this access control method.
6. Object-Level Permissions: ACLs allow setting specific permissions on a
per-object basis, providing flexibility for scenarios where different
objects in the same bucket need different access levels.
7. Decentralized Permission Management: By using ACLs, specific AWS
accounts can be delegated permission control for their objects in a
shared bucket, reducing the administrative overhead on the bucket
owner.
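A short boto3 sketch of applying a canned ACL to an object; the bucket and key are placeholders, and note that buckets created with newer default settings may have ACLs disabled in favour of bucket policies:

import boto3

s3 = boto3.client("s3")
bucket = "example-notes-bucket-12345"  # placeholder bucket name

# Grant public read access to a single object via a canned ACL.
s3.put_object_acl(Bucket=bucket, Key="public/image.png", ACL="public-read")

# Inspect the resulting grants (the owner plus the AllUsers READ grant).
acl = s3.get_object_acl(Bucket=bucket, Key="public/image.png")
for grant in acl["Grants"]:
    grantee = grant["Grantee"].get("URI", grant["Grantee"].get("ID"))
    print(grantee, grant["Permission"])
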
EC2
a.) Security Groups:
• Definition: Security groups function as virtual firewalls for EC2 instances,
controlling network traffic.
• Features:
1. Acts at the instance level; controls traffic to and from the instance.
2. Can define inbound and outbound rules specifying allowed traffic
by IP addresses, ports, and protocols.
3. Stateful: Inbound rules automatically allow corresponding
outbound traffic for the same session.
4. Multiple security groups can be associated with an instance, and
an instance can belong to several security groups.
5. Rules are applied immediately, with no need to reboot the
instance after changes.
6. Supports group-based rules—you can refer to other security
groups as the source or destination.
7. Logging: Security groups can be integrated with VPC Flow Logs to
capture traffic patterns.
8. Default Security Group: Each VPC has a default security group
with basic rules (allow self-communication and outbound traffic).
9. Security groups are not tied to a specific instance but rather to
the network interface attached to instances.
10. Rules can be modified at any time, and changes apply instantly
across the instances using the security group.
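A minimal boto3 sketch of this workflow: create a security group and allow inbound SSH. The VPC ID is a placeholder:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a security group in a (placeholder) VPC.
sg = ec2.create_security_group(
    GroupName="notes-web-sg",
    Description="Allow SSH from anywhere (demo only)",
    VpcId="vpc-0123456789abcdef0",
)

# Add an inbound rule: TCP port 22 from any IPv4 address.
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)
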
b.) Image Management (AMI - Amazon Machine Image):
• Definition: AMIs are pre-configured templates for creating EC2 instances,
including the OS, application server, and installed software.
• Features:
1. Can be public (available for all AWS users) or private (only
available to your account).
2. Customizable: You can create a custom AMI from an existing EC2
instance with all its configurations and applications.
3. AMI Types: AWS provides different types of AMIs (e.g., Linux,
Windows, custom, etc.).
4. Region-Specific: AMIs are specific to a region, but can be copied
between regions.
5. Snapshots: You can create snapshots of an instance's volumes and
create AMIs from them.
6. Versioning: You can maintain multiple versions of AMIs for
different environments (e.g., dev, staging, production).
7. Automated Creation: AMIs can be automatically created using
tools like AWS Lambda and Amazon CloudWatch Events.
8. Used for scalability—launch identical instances across multiple
regions or accounts based on the same image.
9. Cost-effective: When you launch instances from an AMI, you pay
only for the instances and storage, not the AMI itself.
10. Secure: You can set access control policies for who can use or
modify your AMIs.
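Creating a custom AMI from an existing instance is a single API call; a small boto3 sketch with a placeholder instance ID:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an image from a running or stopped instance (placeholder ID).
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",
    Name="web-server-baseline-v1",
    Description="Baseline image with OS and application server pre-installed",
)
print("New AMI:", image["ImageId"])
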
c.) Key Pairs:
• Definition: A key pair in EC2 is a set of public and private keys used for
secure SSH access to EC2 instances.
• Features:
1. SSH Key-Based Authentication: Instead of passwords, you use the
private key to securely access instances via SSH.
2. The public key is stored on the EC2 instance, and the private key
stays with the user (never shared).
3. No password is required for SSH access when using key pairs
(more secure than passwords).
4. EC2 Instance Connect: A feature that allows temporary, secure
SSH access without needing a previously created key pair.
5. Key pairs are generated via the AWS Management Console, CLI, or
API, and can be downloaded only once.
6. Encryption: Private keys are never stored by AWS and are the
responsibility of the user.
7. Key pairs can be associated with multiple instances—same key
can be used to access all instances.
8. If you lose the private key, you won’t be able to access the
instance, making key management critical.
9. You can rotate keys by creating a new key pair and associating it
with the instance.
10. AWS allows Key Pair Management using IAM roles, allowing easier
distribution in large organizations.
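A boto3 sketch of generating a key pair and saving the private key locally; the key name is a placeholder, and the private key can only be retrieved at creation time:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a new key pair; AWS keeps the public key, we receive the private key once.
key = ec2.create_key_pair(KeyName="notes-demo-key")

# Save the private key material; this is the only opportunity to download it.
with open("notes-demo-key.pem", "w") as f:
    f.write(key["KeyMaterial"])
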
d.) EBS (Elastic Block Store):
• Definition: EBS is a scalable, persistent block storage service that can be
attached to EC2 instances.
• Features:
1. Persistence: Data on EBS volumes is persistent, meaning it persists
even if the EC2 instance is stopped or terminated.
2. EBS volumes can be attached to any running or stopped EC2
instance within the same Availability Zone.
3. Scalability: You can resize or change the type of an EBS volume
without downtime.
4. Snapshot Capabilities: You can create snapshots of EBS volumes
for backup or creating AMIs.
5. Multiple Volume Types: EBS offers different volume types for
different use cases (e.g., SSD-backed (gp2), IOPS (io1), HDD (st1)).
6. Encryption: EBS volumes support encryption at rest for data
security.
7. Performance: EBS volumes can offer high throughput and low
latency, making them suitable for database workloads.
8. You can mount multiple EBS volumes to an instance to increase
storage capacity.
9. EBS volumes are automatically replicated within an Availability
Zone to protect against hardware failures.
10. Cost-Effective: You pay based on the storage size and provisioned
IOPS for your EBS volumes, which can help optimize costs.
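A minimal sketch of creating and attaching an EBS volume with boto3; the Availability Zone and instance ID are placeholders, and the instance must be in the same Availability Zone as the volume:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a 20 GiB general-purpose SSD volume in a specific Availability Zone.
volume = ec2.create_volume(AvailabilityZone="us-east-1a", Size=20, VolumeType="gp3")

# Wait until the volume is available, then attach it to an instance in the same AZ.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/sdf",
)
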
Each of these concepts is critical to effectively using EC2, providing security,
management, access control, and persistent storage for your cloud workloads.

Object Based Storage vs Block Based Storage


• Definition: Object-based storage stores data as discrete, self-contained objects, each with a unique identifier (key). Block-based storage stores data in fixed-size blocks, each block having an address and no inherent metadata.
• Structure: Objects include the data itself, metadata, and a unique ID. Block storage splits data into blocks (e.g., 4 KB or 8 KB) stored at specific addresses on the storage device.
• Data Access: Objects are accessed through API calls (e.g., RESTful APIs, HTTP), primarily for unstructured data. Blocks are accessed through block-level protocols (e.g., iSCSI, Fibre Channel), typically for structured data.
• Use Case: Object storage is ideal for large-scale unstructured data such as images, videos, backups, and archives. Block storage is best for applications that require fast, consistent I/O, such as databases, virtual machines, and OS-level storage.
• Scalability: Object storage is highly scalable (horizontal scaling) and suits large amounts of data with a distributed architecture. Block storage has limited scalability (vertical scaling); adding storage typically requires adding more disks or hardware.
• Performance: Object storage is slower than block storage for high-performance workloads and is optimized for throughput and durability. Block storage provides low latency and fast I/O operations, optimized for transaction-intensive workloads.
• Cost: Object storage is generally cheaper for large, unstructured data and suits static or infrequently accessed data. Block storage is more expensive due to its low-latency performance and high speed, and is often used for high-demand applications.
• Example Services: Object storage – Amazon S3, Google Cloud Storage, Azure Blob Storage. Block storage – Amazon EBS, Azure Managed Disks, Google Persistent Disks.
• Data Management: Objects are self-contained and managed with metadata that helps with organizing and searching. Blocks are managed using file systems (like NTFS, ext4), requiring an OS to handle block management.
• Durability: Object storage is highly durable, often designed for data redundancy across multiple locations (e.g., S3 provides 11 9's durability). Block storage is typically durable within a single system or zone but may require RAID configurations for redundancy.
• Access Patterns: Object storage is best suited for write-once-read-many (WORM) access patterns where data is appended or read frequently. Block storage is ideal for random read/write operations where data is frequently updated or modified.
• Example: Amazon S3 (stores objects such as files, videos, etc.) versus Amazon EBS (provides block-level storage for EC2 instances).
Key Differences:
• Storage Structure: Object-based stores data as objects (with metadata),
while block-based stores data as blocks with an address.
• Use Cases: Object storage is ideal for unstructured data (images, videos),
while block storage is suited for structured data (databases, VMs).
• Performance: Block storage typically offers lower latency and faster
performance, making it suitable for applications needing high I/O
performance.
• Scalability: Object storage is more scalable and cost-efficient for large-
scale storage of unstructured data, while block storage is generally more
expensive but offers higher performance.
In summary:
• Object-based storage is great for cost-effective, large-scale storage of
static, unstructured data.
• Block-based storage is optimized for fast, transactional workloads
requiring low-latency access and high performance.

IAM and Authentication & Authorization


Features of IAM (Identity and Access Management) in AWS:
IAM is a service that helps manage access to AWS resources securely. It allows
you to control who can access your AWS resources and what they can do with
those resources. Here are the key features of IAM:
1. User Management:
o Allows you to create and manage individual users within your AWS
account, each with unique security credentials.
o Users can be assigned to IAM groups to manage permissions
collectively.
2. Role-Based Access Control (RBAC):
o IAM roles are created to grant access to specific resources. Instead
of assigning permissions to individual users, roles are assigned,
and users can assume these roles.
o Roles are often used to grant temporary access or permissions to
services (like EC2 or Lambda) to interact with other AWS services.
3. Granular Permissions:
o IAM allows fine-grained permissions through policies that specify
allowed or denied actions on AWS resources.
o Policies can be applied to users, groups, and roles.
4. Policy-Based Access:
o Policies are written in JSON format that specify what actions are
allowed or denied on which AWS resources.
o Managed Policies (AWS pre-built) and Customer Managed
Policies (user-defined) can be used.
5. Temporary Security Credentials:
o With IAM roles, you can provide temporary security credentials
to users, applications, or AWS services (using AWS STS - Security
Token Service).
o Useful for short-lived sessions or federated users.
6. Multi-Factor Authentication (MFA):
o Enhances security by requiring two forms of authentication:
something you know (password) and something you have (MFA
token).
o MFA can be applied to individual users for increased security.
7. Federation:
o Allows users from external identity providers (e.g., Active
Directory, Google, Facebook) to authenticate and access AWS
services.
o Useful for single sign-on (SSO) across different platforms and
services.
8. Access Advisor:
o Provides visibility into what permissions a user has and the last
time those permissions were used. It helps identify unused
permissions and minimize excessive access.
9. Resource-Based Policies:
o In addition to IAM user, group, and role policies, AWS services (like
S3, Lambda) support resource-based policies to control access at
the resource level.
10. Logging and Monitoring:
o IAM integrates with AWS CloudTrail, which logs API calls for
auditing purposes.
o You can track who did what, when, and from where within your
AWS environment.

Authentication vs. Authorization:


Authentication:
• Authentication is the process of verifying the identity of a user, system,
or application.
• It confirms whether a user or system is who they claim to be.
• AWS supports multiple authentication methods:
o Password-based authentication: A user provides a password to
verify their identity.
o MFA (Multi-Factor Authentication): Requires an additional layer
of security (e.g., OTP from a mobile app or hardware token) to
access resources.
o Federated authentication: Allows users to authenticate using
external identity providers (such as Active Directory, Google,
Facebook).
• Key Concept: Authentication is about "Who are you?".
Authorization:
• Authorization determines what an authenticated user is allowed to do.
• Once the user's identity is confirmed, authorization defines what actions
or resources the user has permission to access.
• In AWS, authorization is implemented via IAM policies, roles, and
groups.
• Key Concept: Authorization is about "What are you allowed to do?".
• Policies define permissions in IAM for various actions (e.g., s3:GetObject,
ec2:StartInstances) on specific resources (e.g., S3 buckets, EC2
instances).
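For illustration, here is a sketch of a customer managed policy built from a JSON document of exactly this form and created with boto3; the bucket name and policy name are hypothetical placeholders:

import boto3, json

iam = boto3.client("iam")

# Policy document: allow reading objects from one bucket and starting EC2 instances.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-notes-bucket-12345/*"},
        {"Effect": "Allow", "Action": "ec2:StartInstances",
         "Resource": "*"},
    ],
}

# Create the customer managed policy from the document.
policy = iam.create_policy(
    PolicyName="NotesDemoPolicy",
    PolicyDocument=json.dumps(policy_document),
)
print(policy["Policy"]["Arn"])
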
IAM and the Authentication-Authorization Process:
1. Authentication: When a user logs into the AWS Management Console or
calls the AWS API, IAM checks the user's credentials (username,
password, MFA).
2. Authorization: Once authenticated, IAM evaluates the user’s
permissions based on IAM policies to decide whether they can access
specific resources (e.g., S3, EC2).

Summary of IAM Features:


• User and Group Management: Simplifies the administration of AWS
users.
• Role-Based Access Control: Grants specific roles to users or AWS
services.
• Granular Access Control: Offers fine-tuned permissions using policies.
• Federation and Single Sign-On: Provides seamless access for users
outside the AWS environment.
• Logging and Monitoring: Enhances security through auditing and
tracking of API calls.
• Multi-Factor Authentication: Adds a second layer of security for
sensitive operations.
IAM's primary job is to secure and manage access to AWS services by
authenticating users and authorizing their permissions.
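A short boto3 sketch of the basic user-management flow: create a user and attach an AWS managed policy that then governs what the user is authorized to do. The user name is a placeholder:

import boto3

iam = boto3.client("iam")

# Create an IAM user (placeholder name).
iam.create_user(UserName="demo-analyst")

# Attach an AWS managed policy granting read-only access to S3.
iam.attach_user_policy(
    UserName="demo-analyst",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)

# List what is now attached, for verification.
attached = iam.list_attached_user_policies(UserName="demo-analyst")
for p in attached["AttachedPolicies"]:
    print(p["PolicyName"], p["PolicyArn"])
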
IP Classes and Subnetting
IP Classes:
IP addresses are categorized into different classes to help organize the vast
number of IP addresses available. These classes were originally designed to
assign different address ranges to different types of networks. The most
common IP addresses used are IPv4 addresses, which are 32-bit numbers
typically represented as four octets (e.g., 192.168.1.1). Here are the most
common IP address classes:
• Class A – Address range 1.0.0.0 to 127.255.255.255; default subnet mask 255.0.0.0. Used for large networks (e.g., large organizations or ISPs); supports roughly 16 million hosts per network.
• Class B – Address range 128.0.0.0 to 191.255.255.255; default subnet mask 255.255.0.0. Used for medium-sized networks (e.g., universities, corporations); supports roughly 65,000 hosts.
• Class C – Address range 192.0.0.0 to 223.255.255.255; default subnet mask 255.255.255.0. Used for small networks (e.g., small businesses, home networks); supports 254 hosts, commonly used in local networks.
• Class D – Address range 224.0.0.0 to 239.255.255.255; no default subnet mask. Multicast addresses used for group communications; not assigned to individual hosts.
• Class E – Address range 240.0.0.0 to 255.255.255.255; no default subnet mask. Reserved for experimental or future use (research and development).

Subnetting:
Subnetting is the process of dividing a larger network (typically a Class A, Class
B, or Class C network) into smaller, more manageable subnetworks (subnets).
Subnetting allows better utilization of IP address space and helps improve
network performance and security.
Key Terms in Subnetting:
1. Network Address: The first address of the subnet. This identifies the
subnet itself and cannot be assigned to any device.
2. Broadcast Address: The last address of the subnet. It is used to send a
message to all devices within the subnet.
3. Host Range: The range of addresses between the network address and
the broadcast address. These are the assignable addresses for devices in
the subnet.
How Subnetting Works:
• Subnetting allows the division of a network into smaller sub-networks.
• By using the subnet mask, you can define the portion of the IP address
that is reserved for the network and the portion that is available for host
addresses.
• The subnet mask (e.g., 255.255.255.0) tells you how many bits are
allocated to the network part and how many bits are available for host
addresses.
Subnet Mask:
A subnet mask is a 32-bit number used to divide an IP address into network
and host portions. Here are some common subnet masks:
• 255.0.0.0 (Class A)
• 255.255.0.0 (Class B)
• 255.255.255.0 (Class C)
CIDR (Classless Inter-Domain Routing):
• CIDR is an alternative to traditional IP class-based addressing.
• It uses a notation like 192.168.1.0/24 where /24 indicates that the first
24 bits of the IP address are used for the network portion and the
remaining bits are used for hosts.
• CIDR is more flexible than the old class system because it allows more
precise allocation of IP addresses.
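Python's standard ipaddress module can be used to work through these terms. A small sketch computing the subnet mask, network address, broadcast address, and host range for 192.168.1.0/26:

import ipaddress

# A /26 subnet carved out of the 192.168.1.0/24 network.
net = ipaddress.ip_network("192.168.1.0/26")

print("Subnet mask:      ", net.netmask)            # 255.255.255.192
print("Network address:  ", net.network_address)    # 192.168.1.0
print("Broadcast address:", net.broadcast_address)  # 192.168.1.63

hosts = list(net.hosts())                           # assignable addresses
print("Host range:", hosts[0], "-", hosts[-1], f"({len(hosts)} usable hosts)")
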
Why Subnetting Is Important:
1. Efficient IP Usage: Subnetting helps optimize the use of IP addresses,
particularly in large networks.
2. Security: Subnetting limits the size of broadcast domains, reducing the
risk of security breaches.
3. Improved Performance: Smaller subnets reduce the volume of traffic
within each subnet, improving network performance.
In summary:
• IP classes help categorize IP addresses into networks based on size.
• Subnetting divides a large network into smaller, more manageable
segments, allowing better resource utilization and security.

Router vs Gateway
• Definition: A router is a networking device that forwards data packets between networks, ensuring data reaches its destination. A gateway is a network device that acts as a point of entry or exit for a network, connecting two different network systems (often with different protocols).
• Function: A router directs traffic between networks based on IP addresses, enabling communication between devices on different networks. A gateway acts as a translator between two different network protocols, enabling communication between otherwise incompatible systems.
• Layer in OSI Model: A router operates at Layer 3 (Network Layer), using IP addresses to determine the best path for data transmission. A gateway can operate across multiple layers, typically Layer 7 (Application Layer) or Layer 3, and often performs protocol conversion at higher layers.
• Use Case: Routers are used primarily for routing traffic between LANs (Local Area Networks) or WANs (Wide Area Networks). Gateways are used to connect two networks with different protocols (e.g., connecting an IP network to a non-IP network such as a legacy system or telecommunication network).
• Protocol Support: A router works within the same protocol family, typically IP-based (IPv4, IPv6). A gateway handles protocol translation, allowing communication between networks with different protocols (e.g., IP to X.25).
• Traffic Handling: A router deals with packet routing based on IP addresses and decides the optimal path for data to travel. A gateway translates, processes, and forwards traffic between different networks, even if they use different communication protocols.
• IP Addressing: A router routes packets based on IP addresses, ensuring proper forwarding of data within and between networks. A gateway may or may not use IP addresses, as it deals with protocol conversion and bridging incompatible systems.
• Security Role: A router provides basic routing but typically does not offer extensive security functions, although some routers include firewall capabilities. A gateway often incorporates firewall features, providing security and acting as a checkpoint between external and internal networks.
• Example: A router directing traffic between a home LAN and the internet. A gateway connecting a private network to the internet, or a VoIP gateway converting data between voice protocols.
• Complexity: A router is a simpler device, mainly focused on routing packets between IP networks. A gateway is more complex, handling protocol translation and additional functionality such as security or data filtering.
• Connection Type: A router connects two or more IP-based networks, such as a local network to an external network (e.g., the internet). A gateway connects two different types of networks, such as a local network to a telecommunications network or a legacy system.
• Main Purpose: A router routes data packets between networks, ensuring efficient data transfer. A gateway acts as an interface between different networks, allowing communication and data exchange across systems with different protocols.
• NAT (Network Address Translation): A router often supports NAT to allow multiple devices on a private network to share a single public IP address. A gateway can perform NAT and other protocol conversion processes to manage communication between different networks.
Summary:
• Router: Specializes in routing data between IP-based networks, ensuring
that data packets are sent to the correct destination.
• Gateway: Translates communication between different network
protocols or systems, often acting as a protocol converter and security
checkpoint.
While routers focus mainly on IP-based routing, gateways are more versatile
and handle protocol translation and connectivity between different types of
networks, such as converting data between internet networks and legacy
systems.
Grid Computing
Grid computing is a distributed architecture where multiple computers,
connected by networks, work together to perform a joint task. The system
operates by breaking down a task into smaller subtasks, which are distributed
across different computers (grid nodes). These nodes then work in parallel, and
their outputs are combined to accomplish the overall task.
How Grid Computing Works
1. Control Node: A server or group of servers that administers and
maintains the network's resource pool.
2. Provider (Grid Node): A computer contributing resources such as
processing power and storage to the grid.
3. User: A computer that utilizes the grid resources to complete a task.
The grid operates via specialized software that manages task distribution,
communication, and resource sharing. The software divides the main task into
subtasks and assigns them to various grid nodes for parallel processing.
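As a rough single-machine analogy (not real grid middleware such as Globus), the sketch below uses Python's multiprocessing module to play the control node's role: it splits a job into subtasks, runs them on parallel worker processes standing in for grid nodes, and combines the partial outputs.

from multiprocessing import Pool

# Toy stand-in for a grid: the "control node" splits a job into subtasks
# and hands them to worker "nodes" (here, local processes) in parallel.
def subtask(chunk):
    return sum(x * x for x in chunk)           # each node computes a partial result

def split(data, parts):
    size = (len(data) + parts - 1) // parts
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(1_000_000))
    with Pool(processes=4) as pool:            # 4 "grid nodes"
        partials = pool.map(subtask, split(data, 4))
    print(sum(partials))                       # combine outputs into the final result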
Key Components of Grid Computing:
1. User Interface:
o Provides users with a unified portal-like interface to launch and
manage applications on the grid.
o Users view the grid as a single large virtual computer offering
computing resources.
2. Security:
o Grid security is ensured through mechanisms like authentication,
authorization, and data encryption.
o Grid Security Infrastructure (GSI) facilitates secure communication
within the grid using tools like OpenSSL.
3. Scheduler:
o Responsible for scheduling tasks across grid nodes, ensuring
efficient execution.
o High-level schedulers may be required to manage resources across
different clusters.
4. Data Management:
o Involves secure data movement and access across grid nodes.
o Example: The Globus toolkit with GridFTP for secure file transfer
and data management.
5. Workload & Resource Management:
o Handles job execution, monitors job status, and retrieves results.
o Coordinates resource availability and workload distribution across
grid nodes.
Types of Grid Computing:
• Computational Grids: Focus on distributing and executing complex
computational tasks.
• Data Grids: Manage and distribute large data sets across geographically
dispersed locations.
Applications of Grid Computing:
• Scientific research (e.g., protein folding simulations).
• Large-scale data analysis (e.g., climate modeling).
• Collaboration between organizations for shared computing resources.
What Are Search Engines?
Search engines are programs that help users find information on the internet.
They use algorithms to index and rank web pages based on relevance to a
user's query. Popular search engines include Google, Bing, and Yahoo. For
example, if a student searches for "C++ tutorial GeeksforGeeks," the search
engine provides links to relevant tutorials.
How Do Search Engines Work?
Search engines operate through three main steps: Crawling, Indexing, and
Ranking.
1. Crawling:
o Computer programs, known as crawlers or spiders, explore the
web to find publicly available information. They scan websites,
read the HTML code, and understand the structure and content of
each page.
o Importance: If crawlers can't access a site, it won't be ranked or
appear in search results.
2. Indexing:
o Once crawlers identify a page, the data is organized and stored in a
large database (index). The index includes details like the title,
description, keywords, and links of a page.
o Importance: If a page is not indexed, it won't appear in search
results.
3. Ranking:
o Search engines use algorithms to rank pages based on how well
they match the user's query.
▪ Step 1: Analyzing the user's query and breaking it down into
keywords.
▪ Step 2: Finding pages that match the query from the index.
▪ Step 3: Presenting the best matching results to the user,
often supplemented by paid ads or direct answers.
In short, search engines crawl the web, index the data, and rank pages to
deliver relevant search results.
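The toy crawler below illustrates only the crawl step, under heavy simplifying assumptions: it uses Python's urllib and a crude regular expression to pull links, starts from the placeholder seed https://example.com, and ignores robots.txt, politeness delays, and the proper HTML parsing a real crawler would need.

import re
import urllib.request
from collections import deque

# Minimal crawl sketch: fetch pages from a frontier of URLs, store their
# content, and add any discovered links back to the frontier.
def crawl(seed_url, max_pages=5):
    frontier, index = deque([seed_url]), {}
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        if url in index:
            continue
        try:
            html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue                                            # skip unreachable pages
        index[url] = html                                       # raw content kept for indexing
        frontier.extend(re.findall(r'href="(https?://[^"]+)"', html))  # discover new links
    return index

pages = crawl("https://example.com")
print(list(pages))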
Search Engine Components
A search engine consists of four basic components: Web Crawlers, Database,
Search Interfaces, and Ranking.
1. Web Crawler:
o Also Known As: Spider or Bot.
o Function: Web crawlers systematically browse the internet to
gather information. They discover new web pages by following
links, retrieve content (HTML, text, images), parse the information,
and filter unnecessary or irrelevant pages.
o Key Features:
▪ Scalable Crawling: Handles billions of pages.
▪ Robots.txt Compliance: Respects rules on which pages can
be crawled.
▪ Politeness: Avoids overloading servers.
o Technologies: Distributed crawling frameworks (e.g., Apache
Nutch), URL management, HTTP protocols, and data storage.
2. Database:
o Purpose: The database stores all the web resources collected by
the crawler. It contains a massive amount of data from across the
internet that will be used for indexing and retrieval.
3. Search Interface:
o Function: Provides the user with a way to input queries and
interact with the database. It acts as the bridge between users and
the search engine's database.
o Key Features:
▪ Search Box: For entering queries with features like
autocomplete.
▪ Result Presentation: Shows structured results with
metadata (e.g., title, URL).
▪ Filters and Sorting: Helps narrow results by date, location,
etc.
▪ Pagination: For navigating through multiple pages of results.
o Technologies: Built using front-end frameworks (e.g., React,
Angular) and UX design principles to optimize user interaction.
4. Ranking:
o Purpose: Determines the order in which search results are
presented. The ranking process involves analyzing the user's
query, finding matching pages in the index, and presenting the
most relevant results.
o Steps:
▪ Analyze Query: Break down into meaningful keywords.
▪ Find Matching Pages: Look up the best matches from the
indexed data.
▪ Present Results: Display results with relevance to the user's
query, often enriched by additional elements like ads or
direct answers.
In summary, search engines work through these components to crawl the web,
store information, provide an intuitive interface for users, and rank search
results for relevance.
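To show how the database and ranking components fit together, here is a toy inverted index in Python: each term maps to the pages that contain it, and a query is ranked simply by how many query terms a page matches. The page contents are made-up placeholders, and real engines use far richer signals such as TF-IDF, link analysis, and freshness.

from collections import defaultdict

# Build a tiny inverted index: term -> set of pages containing that term.
pages = {
    "page1": "c++ tutorial for beginners",
    "page2": "python tutorial and examples",
    "page3": "advanced c++ templates",
}

index = defaultdict(set)
for url, text in pages.items():
    for term in text.split():
        index[term].add(url)

def search(query):
    scores = defaultdict(int)
    for term in query.lower().split():      # analyze the query into keywords
        for url in index.get(term, ()):     # find matching pages in the index
            scores[url] += 1                # score = number of matched terms
    return sorted(scores, key=scores.get, reverse=True)

print(search("c++ tutorial"))   # page1 ranks first; page2/page3 follow in some order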
REST API (Representational State Transfer API) - Key Points:
1. Architecture Style: REST is an architectural style for building APIs
(Application Programming Interfaces) that interact over HTTP.
2. Stateless: Each request from a client to the server must contain all the
information the server needs to process it. The server does not store
client state between requests.
3. Resource-Based: REST APIs operate on resources (e.g., users, orders),
which are identified using URIs (Uniform Resource Identifiers).
4. HTTP Methods:
o GET: Retrieve data from a resource.
o POST: Submit data to create a resource.
o PUT: Update an existing resource.
o DELETE: Remove a resource.
5. Data Format: REST APIs commonly exchange data in formats like JSON
(JavaScript Object Notation) or XML.
6. Stateless Communication: Each API call is independent, and no client
context is stored on the server.
7. Scalability: REST is highly scalable and can handle multiple requests
efficiently due to its stateless nature.
8. Cacheable: Responses can be explicitly marked as cacheable or non-
cacheable to improve performance.
9. Client-Server Separation: The client and server are decoupled; the client
only needs to know the URI of the resource and does not interact with
the backend logic.
10. Uniform Interface: REST APIs follow consistent conventions across all resources and methods, making them predictable and easier to use.
In short, REST APIs are widely used for creating scalable, easy-to-use web
services by adhering to a set of stateless communication principles over HTTP.
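A minimal client-side sketch follows, using only Python's standard library; the host api.example.com and the /users resource are hypothetical placeholders, not a real service.

import json
import urllib.request

BASE = "https://api.example.com"   # placeholder host, not a real API

def get_user(user_id):
    # GET retrieves a representation of the /users/{id} resource.
    with urllib.request.urlopen(f"{BASE}/users/{user_id}") as resp:
        return json.loads(resp.read())

def create_user(payload):
    # POST submits a JSON body to create a new resource under /users.
    req = urllib.request.Request(
        f"{BASE}/users",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# print(get_user(42))   # would issue GET https://api.example.com/users/42

Each call carries everything the server needs (URI, method, body), which is exactly the statelessness described above.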
SOAP API (Simple Object Access Protocol) - Key Points:
1. Protocol-Based: SOAP is a protocol for exchanging structured
information in web services, operating primarily over HTTP or SMTP.
2. XML-Based: SOAP exclusively uses XML for formatting requests and
responses, making it more rigid and verbose compared to other APIs like
REST.
3. Strong Standards: SOAP adheres to strict standards defined by the W3C,
which include message structure, security, and error handling.
4. WS-Security: SOAP supports WS-Security for security features like
authentication, encryption, and integrity, making it a better option for
sensitive transactions (e.g., banking).
5. Transport Agnostic: SOAP is not limited to HTTP; it can be used with
other protocols like SMTP, JMS, or FTP, offering flexibility in
communication channels.
6. Operations Based: SOAP APIs are based on operations or actions (not
resources), defined in a WSDL (Web Services Description Language) file
that describes the API's structure.
7. Tightly Coupled: SOAP services and clients are closely tied, meaning
both need to know about the WSDL structure, making it less flexible than
REST.
8. Stateful: SOAP can support stateful operations, meaning it can maintain
context between requests, unlike REST, which is stateless.
9. Error Handling: SOAP provides robust error handling with detailed fault
messages using a standard <Fault> element, which helps in identifying
issues during communication.
10. Higher Overhead: Due to its use of XML and strict standards, SOAP generally has a higher overhead compared to lightweight alternatives like REST.
In summary, SOAP APIs are protocol-based, using XML for communication and
offering strict standards and security features, often suited for enterprise-level
services where security and transaction reliability are critical.
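The sketch below builds (and could post) a SOAP 1.1 envelope with Python's standard library; the service URL, SOAPAction value, and GetBalance operation are hypothetical, but the envelope namespace and text/xml content type follow the SOAP 1.1 convention.

import urllib.request

# Hypothetical banking operation wrapped in a SOAP 1.1 envelope.
envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header/>
  <soap:Body>
    <GetBalance xmlns="http://example.com/bank">
      <AccountId>12345</AccountId>
    </GetBalance>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    "https://example.com/bank-service",                 # placeholder endpoint
    data=envelope.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",      # SOAP 1.1 content type
        "SOAPAction": "http://example.com/bank/GetBalance",
    },
)
# response = urllib.request.urlopen(req)   # reply would be another XML <soap:Envelope>

Compared with the REST sketch above, note how the operation name and parameters live inside an XML body rather than in the URI and HTTP method.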
REST API vs SOAP API
• Architecture/Protocol:
o REST: An architectural style for building APIs.
o SOAP: A protocol for exchanging structured information.
• Data Format:
o REST: Supports multiple formats: JSON, XML, HTML, plain text.
o SOAP: Strictly uses XML for both request and response.
• State:
o REST: Stateless; each request is independent, and no client information is stored on the server.
o SOAP: Can be stateless or stateful depending on the need.
• Transport Protocol:
o REST: Primarily uses HTTP for communication.
o SOAP: Supports multiple transport protocols: HTTP, SMTP, FTP, etc.
• Performance:
o REST: Lightweight and faster due to less overhead, especially with JSON.
o SOAP: Heavier due to XML messaging, resulting in slower performance.
• Security:
o REST: Depends on transport-layer security (e.g., SSL/TLS); can implement OAuth for authentication.
o SOAP: Built-in security features such as WS-Security, which adds layers like encryption and security tokens.
• Flexibility:
o REST: More flexible, works with various content types, and is easier to implement and modify.
o SOAP: Less flexible; requires strict adherence to WSDL (Web Services Description Language) and the XML structure.
• Caching:
o REST: Supports caching of responses for improved performance, especially for GET requests.
o SOAP: No built-in caching mechanism.
• Error Handling:
o REST: Uses standard HTTP status codes (e.g., 404, 500).
o SOAP: Uses its own error-handling standard via the <Fault> element for detailed error reporting.
• Message Structure:
o REST: Simple, lightweight message structure, typically headers and a body.
o SOAP: More complex and rigid message structure with envelopes and headers.
• Use Cases:
o REST: Ideal for lightweight applications, mobile apps, web services, and public APIs.
o SOAP: Preferred for enterprise-level services, financial services, or any transaction where security, reliability, and compliance are critical.
• Service Description:
o REST: No formal description language (though OpenAPI/Swagger can be used).
o SOAP: Uses WSDL as the service contract, making the API self-descriptive and discoverable.
• Implementation:
o REST: Easier to implement, requiring fewer resources.
o SOAP: More complex to implement due to strict standards and XML handling.
Summary:
• REST is preferred for simplicity, flexibility, and speed, particularly for
lightweight applications and mobile/web development.
• SOAP is better suited for applications requiring high security, reliability,
and formal contracts, such as banking and enterprise systems.
GCP Hierarchy
The Google Cloud Platform (GCP) hierarchy is structured to provide organization, management, and access control across cloud resources. It is typically organized into the following levels:
1. Organization:
o Top-most level representing the entire company or enterprise.
o Centralizes resource management, billing, security policies, and
access control across all departments and teams.
2. Folders:
o Used to logically group resources based on departments, teams,
or business functions.
o Helps organize and manage resources, policies, and permissions.
o Example: Department X, Department Y, Shared Infrastructure.
3. Teams:
o Teams exist within folders and are responsible for specific projects
or services.
o Teams are independent units that manage their resources and
have distinct access controls.
o Example: Team A, Team B under respective departments.
4. Projects:
o Fundamental units where all cloud resources are created (e.g.,
virtual machines, databases, storage).
o Each project is isolated, with its own settings, resources, billing,
and permissions.
o Example: WhatsApp Project (Team A) and Twitter Project (Team
B).
5. Development and Production Environments:
o Projects are often divided into Development (Dev) and Production
(Prod) environments.
o Enables separate resource management for testing and live
deployment without conflicts.
o Example: Test Project (Development) and Production Project.
Key Benefits:
• Organized Resource Management: Hierarchy ensures resources are
logically structured and easily managed.
• Access Control: Permissions can be applied at different levels
(organization, folder, project) for better security.
• Scalability: Flexible enough to accommodate various departments,
teams, and projects as the organization grows.
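The short Python sketch below models the idea of policy inheritance down the hierarchy; the Node class, the resource names, and the role strings are illustrative only and do not use any real GCP API.

# Toy model of the resource hierarchy: IAM-style policy bindings set at a
# higher level (organization or folder) are inherited by everything beneath it.
class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.policies = name, parent, []

    def effective_policies(self):
        inherited = self.parent.effective_policies() if self.parent else []
        return inherited + self.policies          # child inherits the parent's bindings

org = Node("example-org")
dept = Node("Department X", parent=org)
project = Node("whatsapp-project", parent=dept)

org.policies.append("group:security-team -> roles/viewer")
dept.policies.append("group:dept-x-admins -> roles/editor")

print(project.effective_policies())
# ['group:security-team -> roles/viewer', 'group:dept-x-admins -> roles/editor']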
How MapReduce Works?
MapReduce organizes work by dividing a job into multiple tasks, which are
executed in parallel across a cluster of machines. The execution process
involves several phases, each handled by specific components. Here's a
detailed breakdown based on the tutorial:
1. Job Splitting
• The job is divided into smaller units called input splits, each processed
by a separate Map task. These splits ensure parallel processing by
distributing data across the nodes in the cluster.
2. Map Phase
• The Map task processes the input splits, applying a user-defined map
function to produce intermediate key-value pairs. For example, if the
task is word count, the map phase will output each word as a key and its
frequency as the value.
• This phase is fully distributed, with each node processing part of the
data, which allows for efficient data handling across large datasets.
3. Shuffling and Sorting
• Once the Map tasks are completed, the system performs a shuffle and
sort operation, where the intermediate key-value pairs are grouped by
key. This ensures that all values associated with a specific key are
collected together before being processed in the Reduce phase.
4. Reduce Phase
• The Reduce task aggregates the intermediate key-value pairs generated
by the Map tasks. It processes the grouped data and performs further
computations, such as summing the occurrences of words in the word
count example.
• This phase outputs the final reduced result, which can be stored or used
for further analysis.
5. Master and Worker Nodes
• The execution of the MapReduce job is controlled by a JobTracker, which
acts as the master node. The JobTracker assigns tasks to TaskTrackers
(worker nodes) that reside on the individual data nodes.
• TaskTrackers execute the assigned tasks and periodically send progress
reports (heartbeats) to the JobTracker. If a task fails, the JobTracker
reschedules it to another available TaskTracker.
6. Fault Tolerance
• MapReduce provides fault tolerance by redistributing tasks from failed
nodes to other working nodes. The JobTracker monitors the progress of
each task and can reschedule tasks in the event of failure.
7. Output Phase
• After the Reduce phase, the final output is written to the distributed file
system (such as HDFS), where it can be accessed for further processing
or analysis.
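The single-process Python sketch below walks through the map, shuffle/sort, and reduce phases for the word-count example mentioned above; a real framework such as Hadoop runs the same steps distributed across many nodes under the JobTracker/TaskTracker machinery described earlier.

from collections import defaultdict
from itertools import chain

def map_phase(line):
    return [(word.lower(), 1) for word in line.split()]        # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:                                   # group all values by key
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)                                    # aggregate per key

lines = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
print(dict(reduce_phase(k, v) for k, v in shuffle(mapped).items()))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}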
Components of GFS (Google File System)
Google File System (GFS) is a distributed file system designed by Google for
handling large-scale data processing workloads across many machines. It is
highly fault-tolerant and designed to work efficiently on commodity hardware.
The key components of GFS include:
1. GFS Master
• The Master server is responsible for managing metadata, including the
namespace, access control, and the mapping of files to chunks.
• It handles operations such as creating, deleting, and relocating chunks
across the chunkservers.
• The Master server does not directly manage the data itself but
coordinates tasks like replication and rebalancing.
2. Chunkservers
• Chunkservers store actual data in fixed-size blocks called chunks,
typically 64 MB each.
• Each chunk is replicated across multiple chunkservers (usually three
replicas) to ensure fault tolerance and availability.
• Chunkservers respond to read and write requests from clients and
periodically send status updates to the Master.
3. Clients
• Clients interact with the GFS to perform read and write operations.
• The client contacts the Master to locate the chunks and then
communicates directly with the chunkservers for data access.
• Clients cache metadata for a short period to minimize load on the
Master.
4. Chunks
• Files in GFS are divided into chunks of fixed size (64 MB by default), and
each chunk is stored across multiple chunkservers.
• Each chunk is identified by a unique 64-bit identifier assigned by the
Master.
• Data replication is managed to ensure high availability, with a typical
setup having three replicas of each chunk.
5. Replication
• GFS provides data replication to ensure reliability and fault tolerance.
• Chunks are replicated across multiple chunkservers (default is three
copies), so if one server fails, the data can still be retrieved from another
server.
• The Master continuously monitors the health of replicas and ensures
under-replicated chunks are replicated to maintain the desired level of
replication.
6. Lease Mechanism
• GFS uses a lease mechanism to ensure consistent writes. The Master
grants a lease to one of the chunkservers (called the primary), which
then coordinates updates with the other replicas (secondaries) before
committing the write.
• This ensures that all replicas have a consistent view of the data.
7. Garbage Collection
• GFS uses garbage collection to manage and delete obsolete data. When
files are deleted or chunks are no longer needed, the system reclaims
space periodically.
• The Master keeps track of chunk references and removes those that are
no longer referenced.
8. Fault Tolerance and Recovery
• GFS is designed to be fault-tolerant. In the event of a failure (e.g., a
chunkserver crash), the Master detects the failure and schedules
replication of any lost chunks to maintain availability.
• The system automatically recovers from failures, ensuring continued operation even when individual components fail.
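The toy Python sketch below mimics the GFS read path described above: the client asks the master only for metadata (chunk handle and replica locations) and then reads data from a chunkserver. File names, handles, and contents here are made-up placeholders.

# Toy GFS read path: master holds metadata, chunkservers hold data.
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks, as in GFS

master_metadata = {             # file name -> list of (chunk handle, replica servers)
    "/logs/web.log": [("chunk-001", ["cs1", "cs2", "cs3"]),
                      ("chunk-002", ["cs2", "cs4", "cs5"])],
}

chunkservers = {                # chunk handle -> bytes stored on a replica
    "chunk-001": b"first 64 MB of the file ...",
    "chunk-002": b"next 64 MB of the file ...",
}

def read(path, offset, length):
    chunk_index = offset // CHUNK_SIZE                     # which chunk holds this offset
    handle, replicas = master_metadata[path][chunk_index]  # 1) ask the master for metadata
    data = chunkservers[handle]                            # 2) read directly from a replica
    start = offset % CHUNK_SIZE
    return replicas, data[start:start + length]

print(read("/logs/web.log", 0, 16))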
Shared Responsibility Model in Cloud
The Shared Responsibility Model in cloud computing defines the security and
compliance responsibilities of both the cloud service provider (CSP) and the
customer. It clarifies which party is responsible for specific tasks, helping to
ensure better security and regulatory compliance.
1. Cloud Service Provider (CSP) Responsibilities
• Infrastructure Security: Securing physical infrastructure, including
servers, storage, and networking.
• Physical Security: Protecting data centers against unauthorized access
and disasters.
• Network Security: Securing networks with firewalls and intrusion
detection systems.
• Compliance: Ensuring cloud services meet industry regulations (GDPR,
HIPAA, etc.).
• Platform Security: Securing the virtualization layer (hypervisors,
containers).
• Service Availability: Maintaining uptime and disaster recovery solutions.
2. Customer Responsibilities
• Data Security: Protecting their data, including encryption and access
control.
• Access Management: Managing user access via Identity and Access
Management (IAM).
• Application Security: Ensuring secure development, patching, and
vulnerability management.
• Configuration Management: Correctly configuring cloud resources for
security.
• Compliance: Ensuring their cloud use complies with regulations
(especially for sensitive data).
• Monitoring and Logging: Implementing monitoring systems to detect
and respond to incidents.
3. Service Model Variations
• IaaS (Infrastructure as a Service):
o CSP: Manages physical infrastructure, networking, hypervisors.
o Customer: Manages operating systems, applications, and data.
• PaaS (Platform as a Service):
o CSP: Manages infrastructure and runtime environments.
o Customer: Manages applications, data, and configurations.
• SaaS (Software as a Service):
o CSP: Manages the entire application and its security.
o Customer: Manages user access and data security settings.
4. Importance of the Shared Responsibility Model
• Clarifies Roles: Reduces ambiguity about who manages specific security
aspects.
• Enhances Security: Helps customers better protect their data and
applications.
• Promotes Compliance: Ensures customers meet relevant legal and
regulatory requirements.
This model ensures both the provider and the customer play their part in
maintaining a secure cloud environment.
SSH Techniques
1. Password-Based Authentication: Users authenticate with a username
and password.
o Pros: Easy setup and use.
o Cons: Less secure, prone to brute-force attacks.
2. Public Key-Based Authentication: Uses a private key (client) and public
key (server) for authentication.
o Pros: More secure than passwords, no need for repeated logins.
o Cons: Requires key management; losing keys can cause access
issues.
3. SSH Agent Forwarding: Forwards the local private key to the remote
server without storing the key on it.
o Pros: Secure, avoids storing keys on remote servers.
o Cons: Risky if the remote server is compromised.
4. SSH Tunneling (Port Forwarding): Creates a secure tunnel to forward
ports between local and remote machines.
o Types: Local, Remote, and Dynamic port forwarding.
o Pros: Secure access to services behind firewalls.
o Cons: Complicated setup for dynamic tunneling.
5. SSH File Transfer Protocol (SFTP): Securely transfers files over SSH.
o Pros: Secure, supports file manipulation.
o Cons: Slower than non-encrypted file transfer protocols.
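As a hedged illustration of key-based login, remote command execution, and SFTP over one SSH session, the sketch below assumes the third-party paramiko library is installed; the host name, username, and file paths are placeholders.

import paramiko   # assumes the third-party paramiko SSH library is installed

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())   # demo only; verify host keys in production
client.connect("server.example.com", username="deploy",
               key_filename="/home/deploy/.ssh/id_ed25519")    # public key-based authentication

stdin, stdout, stderr = client.exec_command("uptime")          # run a command remotely
print(stdout.read().decode())

sftp = client.open_sftp()                                      # SFTP over the same SSH session
sftp.put("local_report.csv", "/tmp/report.csv")                # secure file transfer
sftp.close()
client.close()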
Benefits of SSH Techniques
• Encryption: All communication is encrypted, ensuring data
confidentiality and integrity.
• Authentication: Secure user authentication with passwords or key pairs.
• Flexibility: SSH can be used for tunneling, file transfers, remote
administration, and much more.
• Access Control: Allows granular access control and secure multi-user
environments.
Together, these SSH techniques provide a secure way to manage remote servers, transfer files, and establish encrypted communication channels.