CWS Notes
5. Measured service:
o Cloud systems automatically control and optimize resource use by
leveraging a metering capability. This can be done at some level of
abstraction appropriate to the type of service (e.g., storage,
processing, bandwidth). Customers only pay for what they use.
6. Cost efficiency:
o Cloud computing eliminates the capital expense of buying
hardware and software and setting up and running on-site data
centers, which includes the racks of servers, round-the-clock
electricity for power and cooling, and IT experts to manage the
infrastructure.
7. Security:
o Cloud providers often offer a set of policies, technologies, and
controls that strengthen security and protect data, applications,
and the infrastructure from potential threats.
8. Scalability and flexibility:
o Cloud computing provides the ability to scale resources seamlessly
based on the application demand. Organizations can quickly and
easily scale their infrastructure and services.
These characteristics have made cloud computing a vital technology for
businesses looking for cost-effective, flexible, and scalable solutions for their IT
needs.
Software-as-a-Service (SaaS)
SaaS (Software as a Service) is a cloud computing model where software
applications are delivered over the internet as a service. Users can access these
applications via a web browser without the need to install, manage, or update
software on their local devices.
Key Points about SaaS:
1. Cloud-hosted:
o SaaS applications are hosted on the provider’s cloud
infrastructure, eliminating the need for local servers.
2. Subscription-based:
o Users typically access the software through a subscription model,
paying on a monthly or annual basis.
3. No installations required:
o SaaS applications can be used directly from a web browser
without the need for installation or setup on local devices.
4. Automatic updates:
o The SaaS provider manages software updates, security patches,
and maintenance, so users always have access to the latest
version.
5. Cost-effective:
o SaaS eliminates the need for costly hardware and software
purchases, reducing upfront expenses for businesses.
6. Scalability:
o SaaS solutions can easily scale up or down according to the
number of users or business needs, providing flexibility.
7. Anywhere accessibility:
o Since SaaS is internet-based, users can access it from any location
with an internet connection, using any compatible device.
8. Multi-tenant architecture:
o Multiple users (or tenants) can access a single instance of the
software, with data segregated for privacy and security.
9. Collaboration-friendly:
o SaaS applications often include collaboration tools, allowing
multiple users to work on documents or projects simultaneously in
real-time.
10. Security management:
o SaaS providers invest in robust security measures, such as encryption
and data protection, helping to safeguard users' information.
Examples of SaaS:
• Google Workspace (formerly G Suite) – Provides tools like Gmail, Google
Drive, Google Docs, etc.
• Microsoft 365 – Offers applications like Word, Excel, PowerPoint,
Outlook, etc., as cloud-based services.
• Salesforce – A customer relationship management (CRM) platform used
to manage customer data, sales, and interactions.
• Slack – A collaboration tool for communication and project
management.
• Dropbox – A cloud storage service that allows users to store files and
access them from any device.
SaaS is popular for its simplicity, affordability, and accessibility, making it
ideal for both businesses and individual users.
Infrastructure-as-a-Service (IaaS)
IaaS (Infrastructure as a Service) is a cloud computing model that provides
virtualized computing resources over the internet. In this model, a third-party
provider offers virtual machines, storage, servers, networking, and other
infrastructure components, allowing businesses to rent rather than own and
maintain their own physical infrastructure.
Key Points about IaaS:
1. Cloud-based infrastructure:
o IaaS offers essential infrastructure components like virtual servers,
storage, and networks hosted in the cloud, removing the need for
on-premises data centers.
2. Pay-as-you-go model:
o Users are charged based on their usage of resources such as CPU
time, storage, and bandwidth, making it cost-effective.
3. Scalable resources:
o IaaS platforms offer flexible scaling, allowing businesses to
increase or decrease computing resources as needed without
purchasing new hardware.
4. Virtualization technology:
o IaaS providers use virtualization to create virtual instances of
physical resources, enabling users to run multiple virtual machines
on a single physical server.
5. Self-service provisioning:
o Users can access and manage resources through a web interface
or API, allowing them to deploy, configure, and control
infrastructure as needed.
6. No hardware management:
o The cloud provider is responsible for maintaining and managing
physical hardware, while users focus on their own applications and
services.
7. High availability and redundancy:
o IaaS providers often ensure high availability through
geographically distributed data centers, offering redundancy and
disaster recovery options.
8. Customization:
o IaaS allows for high levels of customization. Users can choose their
operating systems, configure their networks, and set up their own
middleware or runtime environments.
9. Security and compliance:
o IaaS providers often offer security features like firewalls, intrusion
detection systems, and data encryption, but users are also
responsible for securing their own applications and data.
10. Supports a variety of workloads:
o IaaS can handle a wide range of workloads, from testing and
development environments to large-scale, mission-critical
applications.
Examples of IaaS:
• Amazon Web Services (AWS) – Provides services like EC2 (Elastic
Compute Cloud), S3 (Simple Storage Service), and VPC (Virtual Private
Cloud).
• Microsoft Azure – Offers virtual machines, blob storage, virtual
networks, and more.
• Google Cloud Platform (GCP) – Provides Compute Engine (VMs), Cloud
Storage, and Networking services.
• IBM Cloud – Offers virtual servers, storage, and networking, along with
advanced AI and data services.
• Oracle Cloud Infrastructure (OCI) – Provides computing, storage, and
networking services for enterprise-level workloads.
• DigitalOcean – Simplified cloud computing service mainly aimed at
developers, with virtual machines and object storage.
Platform-as-a-Service (PaaS)
PaaS (Platform as a Service) is a cloud computing model that provides a
platform allowing developers to build, deploy, and manage applications
without dealing with the complexity of infrastructure management. PaaS offers
a pre-configured environment with all the necessary tools and services for
application development, streamlining the process of creating software.
Key Points about PaaS:
1. Application development platform:
o PaaS offers a platform with development tools, runtime
environments, databases, and infrastructure required to build,
test, and deploy applications.
2. No infrastructure management:
o The cloud provider manages the underlying hardware, operating
systems, storage, and networking, so developers can focus on
writing code and developing applications.
3. Pre-configured environment:
o PaaS platforms come with pre-installed software components like
web servers, databases, and frameworks, saving time on
configuration and setup.
4. Supports multiple programming languages:
o Most PaaS offerings support multiple programming languages and
frameworks such as Java, Python, Ruby, Node.js, and PHP, giving
developers flexibility in choosing the tools they are comfortable
with.
5. Automated scaling:
o PaaS platforms automatically scale computing resources based on
the demand of the application, ensuring optimal performance
without manual intervention.
6. Collaboration-friendly:
o PaaS is ideal for team collaboration as it allows multiple
developers to work on the same project simultaneously in a
shared environment.
7. Integrated services:
o PaaS often includes built-in services like databases, caching,
authentication, and APIs, allowing developers to easily integrate
these into their applications without building them from scratch.
8. Rapid development and deployment:
o PaaS accelerates the development cycle by providing ready-to-use
components and tools, allowing developers to build and deploy
applications faster than traditional methods.
9. Cost-effective:
o PaaS reduces costs because organizations pay only for the platform
services they use and avoid buying and managing the underlying
hardware and software themselves.
Key Differences:
• SaaS: Ready-to-use software; no management needed.
• PaaS: Platform and tools for building and deploying applications; the
provider manages the underlying infrastructure.
• IaaS: Provides infrastructure resources; users manage everything else.
Cloud Elasticity and Scalability
Cloud Elasticity:
• Elasticity allows cloud infrastructure to automatically scale up or down
based on sudden changes in demand.
• It helps manage fluctuating workloads efficiently, minimizing costs.
• Best suited for scenarios with temporary or fluctuating resource needs,
but not for environments that require a persistent, heavy workload.
• It's critical for mission-critical applications where performance is key, as
it ensures that additional resources are provided when needed, like CPU,
memory, or bandwidth.
Cloud Scalability:
• Scalability is used to handle growing workloads with consistent
performance.
• It's often applied in environments requiring persistent resource
deployment to manage static workloads efficiently.
Types of Scalability:
1. Vertical Scalability (Scale-up): Increases the power of existing resources
by adding more capacity (e.g., more CPU or memory).
2. Horizontal Scalability (Scale-out): Adds more resources (e.g., more
servers) to distribute the workload.
3. Diagonal Scalability: Combines both vertical and horizontal scalability,
adding resources in both directions when necessary.
AWS Infrastructure
Amazon Web Services (AWS) provides a comprehensive cloud infrastructure
that allows businesses to scale their applications, store data, and perform
computing tasks in a flexible, cost-effective manner. AWS offers a wide range of
services including computing power, storage options, and networking
capabilities.
1. Regions:
• AWS operates in multiple geographic regions around the world.
• Each region is a separate geographic area that contains multiple
Availability Zones.
• Regions allow users to deploy resources near their customers for low-
latency performance and compliance.
2. Availability Zones (AZs):
• Each Region is divided into Availability Zones (AZs), which are isolated
data centers within a region.
• AZs have independent power, networking, and cooling, ensuring high
availability and fault tolerance.
• Users can distribute workloads across AZs to ensure redundancy and
avoid service disruptions.
3. Edge Locations:
• Edge Locations are global data centers used for content delivery and
caching (via Amazon CloudFront).
• They enable faster content delivery to end-users by caching data closer
to the user's location.
• Used for services like CloudFront, Lambda@Edge, and Route 53.
4. Data Centers:
• AWS infrastructure is built around data centers that house compute,
storage, and networking hardware.
• AWS has numerous data centers across different regions, with security
protocols and physical safeguards in place to protect data.
5. Networking:
• AWS networking services enable secure communication between cloud
resources and external systems.
• Amazon VPC (Virtual Private Cloud) allows users to create isolated
networks in the cloud.
• AWS Direct Connect provides dedicated network connections between
on-premises environments and AWS for low-latency, high-throughput
communication.
6. Storage:
• AWS offers a range of storage services, including:
o Amazon S3 (Simple Storage Service) for scalable object storage.
o Amazon EBS (Elastic Block Store) for block storage attached to EC2
instances.
o Amazon Glacier for long-term, low-cost archival storage.
• AWS storage services are designed to be highly durable, available, and
secure.
7. Compute:
• Amazon EC2 (Elastic Compute Cloud) provides scalable virtual servers
(instances) that can be customized based on computing needs.
• AWS Lambda enables serverless computing, allowing users to run code
without managing infrastructure.
• Elastic Beanstalk automates deployment of applications while managing
the underlying infrastructure.
8. Database:
• AWS provides a variety of managed databases to support different use
cases:
o Amazon RDS (Relational Database Service) for MySQL,
PostgreSQL, Oracle, SQL Server, and MariaDB.
o Amazon DynamoDB for scalable NoSQL database needs.
o Amazon Redshift for data warehousing and analytics.
o Amazon Aurora, a high-performance relational database
compatible with MySQL and PostgreSQL.
Summary of AWS Infrastructure Components:
1. Regions: Geographic areas where AWS resources are deployed.
2. Availability Zones: Isolated data centers within regions for fault
tolerance.
3. Edge Locations: Data centers used for fast content delivery and caching.
4. Data Centers: Physical locations that house AWS hardware.
5. Networking: Virtual private cloud (VPC) for secure communication and
AWS Direct Connect for dedicated links.
6. Storage: Scalable and secure storage services like S3, EBS, and Glacier.
7. Compute: Virtual machines (EC2), serverless functions (Lambda), and
PaaS (Elastic Beanstalk).
8. Database: Managed relational and NoSQL databases like RDS,
DynamoDB, and Redshift.
These components combine to provide a robust, scalable, and highly available
cloud infrastructure for businesses to deploy applications and manage data
with flexibility and security.
AWS S3
Amazon S3 (Simple Storage Service) is a scalable, durable, and secure object
storage service provided by AWS. It is used to store and retrieve any amount of
data, from anywhere on the web. S3 is designed to provide high availability,
low-latency access to data, and offers flexible pricing based on the amount of
storage used.
Key Terms Related to Amazon S3:
a. Buckets:
• Definition: A Bucket is the fundamental container in S3 used to store
objects. Think of it as a "folder" or "directory" for organizing files.
• Unique Name: Each bucket name must be globally unique across AWS,
as the bucket name is part of the URL used to access the objects inside.
• Location: Buckets are created in specific AWS regions. This allows users
to store data close to where it is needed.
• Access Control: Buckets have configurable permissions, allowing users to
define who can read/write to the bucket (e.g., public access, specific IAM
users, etc.).
b. Object:
• Definition: An Object is the actual data that you store in a bucket in S3.
An object consists of two parts:
1. Data: The file (e.g., an image, document, video, or any other type
of file).
2. Metadata: Information about the object (such as the file type, last
modified date, and custom metadata added by the user).
• Unique Identifier: Each object is uniquely identified by a key (essentially
the name of the object within the bucket) and the bucket name.
• Storage: Objects can be as large as 5 TB each in size.
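As a small illustration of buckets, keys, objects, and metadata, here is a minimal sketch using Python and boto3 (the bucket name, key, and region are hypothetical; bucket names must be globally unique).

    # Minimal sketch of buckets, objects, keys, and metadata using boto3.
    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")

    # Create a bucket (the container). In regions other than us-east-1 a
    # CreateBucketConfiguration with LocationConstraint is required.
    s3.create_bucket(Bucket="example-notes-bucket-12345")

    # Upload an object: the key "reports/2024/summary.txt" identifies it inside the
    # bucket, and user-defined metadata travels with the object.
    s3.put_object(
        Bucket="example-notes-bucket-12345",
        Key="reports/2024/summary.txt",
        Body=b"quarterly summary",
        Metadata={"department": "finance"},
    )

    # Retrieve the object and its metadata using the same bucket + key pair.
    obj = s3.get_object(Bucket="example-notes-bucket-12345", Key="reports/2024/summary.txt")
    print(obj["Metadata"], obj["Body"].read())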
3. S3 One Zone-IA
o Use Case: Infrequently accessed data that does not require
multiple availability zone redundancy (e.g., secondary backups,
non-critical data).
o Performance: Low latency and high throughput, but stored in a
single availability zone.
o Durability: 99.999999999% durability, but limited to one
availability zone (lower fault tolerance than Standard-IA).
o Availability: 99.5% availability over a given year.
o Cost: Lower cost than S3 Standard-IA.
o Lifecycle Management:
▪ Transition: Move data to S3 Glacier for long-term, low-cost
storage.
▪ Expiration: Automatically delete the data after a defined
retention period.
4. S3 Glacier
o Use Case: Archival storage and data that is rarely accessed but
needs to be preserved (e.g., long-term backups, legal records).
o Performance: Retrieval times vary from minutes to hours
(depending on retrieval type: Expedited, Standard, or Bulk).
o Durability: 99.999999999% durability.
o Cost: Very low cost for storage, but retrieval fees apply based on
speed of access.
o Lifecycle Management:
▪ Transition from S3 Standard-IA or One Zone-IA: Move data
to S3 Glacier for long-term storage after a specified period
(e.g., 1 year).
▪ Expiration: Delete data once it is no longer needed or after
a defined retention period.
5. S3 Glacier Deep Archive
o Use Case: Long-term archiving of data that is rarely accessed (e.g.,
regulatory archives, compliance data).
o Performance: Retrieval takes 12 hours or more.
o Durability: 99.999999999% durability.
o Cost: Lowest storage cost in AWS, but retrieval can be expensive
and slow.
o Lifecycle Management:
▪ Transition from S3 Glacier: Move to S3 Glacier Deep
Archive for very infrequent access or compliance storage.
▪ Expiration: Set to delete after a long retention period if the
data is no longer required.
6. S3 Intelligent-Tiering
o Use Case: Data with unpredictable access patterns. It
automatically moves data between two access tiers (frequent and
infrequent access) based on access patterns.
o Performance: Low latency and high throughput.
o Durability: 99.999999999% durability.
o Cost: Slightly higher than S3 Standard-IA due to automation, but
no retrieval charges for infrequent access.
o Lifecycle Management:
▪ Automatic Tiering: Automatically moves data between
Frequent Access and Infrequent Access tiers.
▪ Transition to Glacier: Users can set up rules to transition
older data to S3 Glacier or Glacier Deep Archive for cost
savings.
Lifecycle Management Strategy for Each Tier:
1. S3 Standard: Transition to S3 Standard-IA or S3 Glacier if infrequently
accessed.
2. S3 Standard-IA: Transition to S3 Glacier or S3 Glacier Deep Archive for
long-term archiving, or delete after a specified period.
3. S3 One Zone-IA: Transition to S3 Glacier for archival storage or delete
when no longer needed.
4. S3 Glacier: Transition to S3 Glacier Deep Archive for long-term archiving
or delete when data retention is complete.
5. S3 Glacier Deep Archive: Delete after a specified retention period if no
longer required.
6. S3 Intelligent-Tiering: Automatically moves objects between Frequent
Access and Infrequent Access tiers. Transition to Glacier for long-term
storage.
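A minimal sketch of one such lifecycle rule, using Python and boto3 (the bucket name, prefix, and day thresholds are illustrative, not values from these notes): objects are moved to Standard-IA, then to Glacier, and finally expired.

    # Minimal sketch of a lifecycle rule implementing the strategy above with boto3.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-notes-bucket-12345",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-old-logs",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "logs/"},
                    "Transitions": [
                        # Move to Standard-IA once the data becomes infrequently accessed,
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        # then to Glacier for long-term archival after one year,
                        {"Days": 365, "StorageClass": "GLACIER"},
                    ],
                    # and delete the objects entirely once the retention period ends.
                    "Expiration": {"Days": 2555},
                }
            ]
        },
    )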
AWS EC2
Amazon EC2 (Elastic Compute Cloud) is one of the core services provided by
AWS that allows you to run virtual machines, called instances, in the cloud. EC2
provides scalable computing power and flexibility to run applications, process
data, host websites, and more. Users can launch virtual machines in minutes,
configure them as needed, and scale them up or down depending on demand.
Key Features of EC2:
• Scalability: Easily scale up or down based on your application’s
requirements.
• Customizable: Choose the instance type (CPU, memory, storage) that fits
your workload.
• Pay-as-you-go: Pay only for the compute resources you use, with options
for long-term savings (like Reserved Instances).
• Global Reach: Launch instances in different AWS Regions and
Availability Zones worldwide for lower latency and high availability.
• Elastic Load Balancing (ELB): Distribute incoming traffic across multiple
instances to ensure availability and fault tolerance.
Different Instance Types in EC2
AWS EC2 instances are categorized into different instance types, each
optimized for different use cases based on CPU, memory, storage, and
networking capabilities.
Here are the major EC2 instance families and their characteristics:
1. General Purpose Instances:
o Instance Types: t3, t3a, t2, m5, m5a, m6g, m6i
o Use Case: Balanced compute, memory, and network resources for
applications like small to medium databases, development and
testing environments, and web servers.
o Characteristics:
▪ Provides a balance of CPU, memory, and network resources.
▪ Can handle diverse workloads such as web hosting,
application servers, and development environments.
2. Compute Optimized Instances:
o Instance Types: c5, c5a, c5n, c6g, c6i
o Use Case: High-performance compute-intensive applications like
high-performance web servers, scientific modeling, and batch
processing.
o Characteristics:
▪ Optimized for compute-heavy applications.
▪ High-performance processors (often Intel or AMD) with high
clock speeds.
▪ Ideal for CPU-bound tasks like gaming servers, data
analytics, and scientific applications.
3. Memory Optimized Instances:
o Instance Types: r5, r5a, r5n, r6g, r6i, x1e, u-6tb1.metal
o Use Case: Applications requiring large amounts of memory, such
as high-performance databases, in-memory caches, and real-time
big data analytics.
o Characteristics:
▪ Provides a high ratio of memory to CPU.
▪ Ideal for workloads such as databases, in-memory caches,
and big data analytics.
4. Storage Optimized Instances:
o Instance Types: i3, i3en, d2, h1, i4i
o Use Case: Applications that require high disk throughput and low-
latency access to large amounts of data (e.g., NoSQL databases,
data warehousing).
o Characteristics:
▪ Provides fast, low-latency access to local storage.
▪ Ideal for workloads that require high storage performance
such as large databases, data warehousing, and real-time
big data processing.
5. Accelerated Computing Instances:
o Instance Types: p4, p3, inf1, g4ad, g5
o Use Case: Machine learning, artificial intelligence (AI), graphics
processing, and video transcoding.
o Characteristics:
▪ Includes hardware accelerators like GPUs (Graphics
Processing Units) and FPGAs (Field-Programmable Gate
Arrays).
▪ Designed for compute-intensive applications like deep
learning training and inferencing, 3D rendering, and high-
performance computing.
Choosing the Right Instance:
• General Purpose instances are ideal for a wide range of applications,
including development, testing, and web hosting.
• Compute Optimized instances are best for applications that need high
processing power, such as batch processing and scientific computing.
• Memory Optimized instances are suited for applications that need large
amounts of memory, such as databases and analytics platforms.
• Storage Optimized instances are designed for data-intensive applications
requiring high storage throughput.
• Accelerated Computing instances are built for tasks requiring specialized
hardware accelerators like GPUs for machine learning and AI.
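As a brief illustration of choosing an instance type at launch, here is a minimal sketch using Python and boto3 (the AMI ID, key pair, and security group are hypothetical placeholders; swap the InstanceType for a c5, r5, i3, or g5 family member depending on the workload).

    # Minimal sketch of launching a general purpose instance with boto3.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # machine image to boot from (placeholder)
        InstanceType="t3.micro",           # general purpose; change per workload
        MinCount=1,
        MaxCount=1,
        KeyName="my-keypair",              # SSH key pair registered in this region
        SecurityGroupIds=["sg-0123456789abcdef0"],
    )

    instance_id = response["Instances"][0]["InstanceId"]
    print("Launched", instance_id)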
EC2 Purchasing Options
AWS offers several purchasing options that trade commitment for cost advantage:
• On-Demand Instances: No commitment; billed per second or hour at the highest
rate; best for short-term or unpredictable workloads.
• Reserved Instances / Savings Plans: 1- or 3-year commitment in exchange for
substantial discounts; best for steady-state workloads.
• Spot Instances: Spare AWS capacity at the deepest discounts; best for fault-
tolerant, interruptible workloads such as batch processing.
• Dedicated Hosts: A physical server dedicated to a single customer; used for
licensing or compliance requirements, at the highest cost.
Versioning in AWS S3
Versioning in Amazon S3 (Simple Storage Service) is a feature that allows you to
preserve, retrieve, and restore every version of every object stored in a bucket.
When versioning is enabled on an S3 bucket, multiple versions of an object
(file) can exist in the same bucket, providing an important mechanism for data
protection.
Key Features of S3 Versioning:
1. Object Versioning:
o When you upload a new object with the same key (name) to a
versioned bucket, instead of overwriting the previous object,
Amazon S3 stores the new object as a new version.
o Every version of the object, including deletes, is tracked and
stored.
2. Protects Against Unintentional Overwrites:
o When versioning is enabled, each update creates a new version.
You can easily recover previous versions if an object is accidentally
overwritten.
3. Delete Protection:
o Even if an object is deleted, the object is not actually removed.
Instead, a delete marker is created, and the previous versions
remain intact.
o You can restore the object by deleting the delete marker to make
the prior version the current version again.
4. Data Archiving:
o You can use versioning to archive older versions of your objects
and move them to cheaper storage tiers (such as S3 Glacier or
Glacier Deep Archive) for long-term retention.
5. Enabling Versioning:
o Versioning must be explicitly enabled for each S3 bucket.
o Once enabled, versioning cannot be disabled, though it can be
suspended. Suspending versioning prevents new versions from
being created but does not remove existing versions.
6. Version IDs:
o Each object version gets assigned a unique Version ID. You can
reference different versions of the object using this Version ID to
access or restore a specific version.
7. Cost Considerations:
o Versioning increases storage costs since every version of an object
is retained. Implementing Lifecycle Management to automatically
transition older versions to cheaper storage classes can help
mitigate costs.
Common Use Cases of Versioning:
• Data Protection: Protect against accidental deletions or overwrites.
• Backup: Keep backup copies of files for disaster recovery.
• Auditing: Keep historical versions of files for compliance or auditing
purposes.
• Archiving: Archive previous versions of files to cheaper storage classes
like S3 Glacier.
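A minimal sketch of enabling versioning and listing object versions with Python and boto3 (the bucket name and key are hypothetical):

    # Minimal sketch of enabling versioning and inspecting object versions with boto3.
    import boto3

    s3 = boto3.client("s3")

    # Turn versioning on; it can later be suspended but never fully disabled.
    s3.put_bucket_versioning(
        Bucket="example-notes-bucket-12345",
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Uploading the same key twice now produces two versions instead of an overwrite.
    s3.put_object(Bucket="example-notes-bucket-12345", Key="config.json", Body=b"v1")
    s3.put_object(Bucket="example-notes-bucket-12345", Key="config.json", Body=b"v2")

    # Each version carries its own VersionId; a specific version can be fetched or restored.
    versions = s3.list_object_versions(Bucket="example-notes-bucket-12345", Prefix="config.json")
    for v in versions.get("Versions", []):
        print(v["Key"], v["VersionId"], v["IsLatest"])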
Subnetting:
Subnetting is the process of dividing a larger network (typically a Class A, Class
B, or Class C network) into smaller, more manageable subnetworks (subnets).
Subnetting allows better utilization of IP address space and helps improve
network performance and security.
Key Terms in Subnetting:
1. Network Address: The first address of the subnet. This identifies the
subnet itself and cannot be assigned to any device.
2. Broadcast Address: The last address of the subnet. It is used to send a
message to all devices within the subnet.
3. Host Range: The range of addresses between the network address and
the broadcast address. These are the assignable addresses for devices in
the subnet.
How Subnetting Works:
• Subnetting allows the division of a network into smaller sub-networks.
• By using the subnet mask, you can define the portion of the IP address
that is reserved for the network and the portion that is available for host
addresses.
• The subnet mask (e.g., 255.255.255.0) tells you how many bits are
allocated to the network part and how many bits are available for host
addresses.
Subnet Mask:
A subnet mask is a 32-bit number used to divide an IP address into network
and host portions. Here are some common subnet masks:
• 255.0.0.0 (Class A)
• 255.255.0.0 (Class B)
• 255.255.255.0 (Class C)
CIDR (Classless Inter-Domain Routing):
• CIDR is an alternative to traditional IP class-based addressing.
• It uses a notation like 192.168.1.0/24 where /24 indicates that the first
24 bits of the IP address are used for the network portion and the
remaining bits are used for hosts.
• CIDR is more flexible than the old class system because it allows more
precise allocation of IP addresses.
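The subnetting and CIDR terms above can be verified with Python's built-in ipaddress module, as in this short sketch:

    # Minimal sketch of the subnetting terms above using Python's ipaddress module.
    import ipaddress

    # 192.168.1.0/24 means 24 network bits (mask 255.255.255.0) and 8 host bits.
    net = ipaddress.ip_network("192.168.1.0/24")

    print("Netmask:          ", net.netmask)            # 255.255.255.0
    print("Network address:  ", net.network_address)    # 192.168.1.0 (not assignable)
    print("Broadcast address:", net.broadcast_address)  # 192.168.1.255 (not assignable)
    print("Usable hosts:     ", net.num_addresses - 2)  # 254 assignable host addresses

    # Subnetting: split the /24 into four smaller /26 subnets.
    for subnet in net.subnets(new_prefix=26):
        print(subnet)   # 192.168.1.0/26, 192.168.1.64/26, 192.168.1.128/26, 192.168.1.192/26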
Why Subnetting Is Important:
1. Efficient IP Usage: Subnetting helps optimize the use of IP addresses,
particularly in large networks.
2. Security: Subnetting limits the size of broadcast domains, reducing the
risk of security breaches.
3. Improved Performance: Smaller subnets reduce the volume of traffic
within each subnet, improving network performance.
In summary:
• IP classes help categorize IP addresses into networks based on size.
• Subnetting divides a large network into smaller, more manageable
segments, allowing better resource utilization and security.
Router vs Gateway
• Layer in OSI Model: A router operates at Layer 3 (Network Layer), using IP
addresses to determine the best path for data transmission. A gateway can
operate across multiple layers, typically Layer 7 (Application Layer) or Layer 3,
and often performs protocol conversion at higher layers.
• Use Case: A router is used primarily for routing traffic between LANs (Local
Area Networks) or WANs (Wide Area Networks). A gateway is used to connect
two networks with different protocols (e.g., connecting an IP network to a
non-IP network such as a legacy system or telecommunication network).
• Protocol Support: A router works within the same protocol family, typically
IP-based (IPv4, IPv6). A gateway handles protocol translation, allowing
communication between networks with different protocols (e.g., IP to X.25).
• Traffic Handling: A router routes data packets based on IP addresses and
decides the optimal path for data to travel. A gateway translates, processes,
and forwards traffic between different networks, even if they use different
communication protocols.
• Security Role: A router provides basic routing but typically does not offer
extensive security functions, though some routers include firewall
capabilities. A gateway often incorporates firewall features, providing
security and acting as a checkpoint between internal and external networks.
• NAT (Network Address Translation): A router often supports NAT so that
multiple devices on a private network can share a single public IP address. A
gateway can perform NAT along with other protocol conversion processes to
manage communication between different networks.
Summary:
• Router: Specializes in routing data between IP-based networks, ensuring
that data packets are sent to the correct destination.
• Gateway: Translates communication between different network
protocols or systems, often acting as a protocol converter and security
checkpoint.
While routers focus mainly on IP-based routing, gateways are more versatile
and handle protocol translation and connectivity between different types of
networks, such as converting data between internet networks and legacy
systems.
Grid Computing
Grid computing is a distributed architecture where multiple computers,
connected by networks, work together to perform a joint task. The system
operates by breaking down a task into smaller subtasks, which are distributed
across different computers (grid nodes). These nodes then work in parallel, and
their outputs are combined to accomplish the overall task.
How Grid Computing Works
1. Control Node: A server or group of servers that administers and
maintains the network's resource pool.
2. Provider (Grid Node): A computer contributing resources such as
processing power and storage to the grid.
3. User: A computer that utilizes the grid resources to complete a task.
The grid operates via specialized software that manages task distribution,
communication, and resource sharing. The software divides the main task into
subtasks and assigns them to various grid nodes for parallel processing.
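As a toy illustration of the control node / grid node division of labour, the sketch below simulates the idea on a single machine with Python's multiprocessing module (a real grid would distribute the subtasks to separate computers via grid middleware such as Globus):

    # Toy sketch of the grid idea on one machine: a "control node" splits a job into
    # subtasks, worker processes (stand-ins for grid nodes) run them in parallel,
    # and the partial results are combined.
    from multiprocessing import Pool

    def subtask(chunk):
        # Each "node" computes a partial result for its chunk of the data.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i::4] for i in range(4)]      # control node divides the main task

        with Pool(processes=4) as pool:              # four workers act as grid nodes
            partial_results = pool.map(subtask, chunks)

        print("Combined result:", sum(partial_results))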
Key Components of Grid Computing:
1. User Interface:
o Provides users with a unified portal-like interface to launch and
manage applications on the grid.
o Users view the grid as a single large virtual computer offering
computing resources.
2. Security:
o Grid security is ensured through mechanisms like authentication,
authorization, and data encryption.
o Grid Security Infrastructure (GSI) facilitates secure communication
within the grid using tools like OpenSSL.
3. Scheduler:
o Responsible for scheduling tasks across grid nodes, ensuring
efficient execution.
o High-level schedulers may be required to manage resources across
different clusters.
4. Data Management:
o Involves secure data movement and access across grid nodes.
o Example: The Globus toolkit with GridFTP for secure file transfer
and data management.
5. Workload & Resource Management:
o Handles job execution, monitors job status, and retrieves results.
o Coordinates resource availability and workload distribution across
grid nodes.
Types of Grid Computing:
• Computational Grids: Focus on distributing and executing complex
computational tasks.
• Data Grids: Manage and distribute large data sets across geographically
dispersed locations.
Applications of Grid Computing:
• Scientific research (e.g., protein folding simulations).
• Large-scale data analysis (e.g., climate modeling).
• Collaboration between organizations for shared computing resources.
What Are Search Engines?
Search engines are programs that help users find information on the internet.
They use algorithms to index and rank web pages based on relevance to a
user's query. Popular search engines include Google, Bing, and Yahoo. For
example, if a student searches for "C++ tutorial GeeksforGeeks," the search
engine provides links to relevant tutorials.
How Do Search Engines Work?
Search engines operate through three main steps: Crawling, Indexing, and
Ranking.
1. Crawling:
o Computer programs, known as crawlers or spiders, explore the
web to find publicly available information. They scan websites,
read the HTML code, and understand the structure and content of
each page.
o Importance: If crawlers can't access a site, it won't be ranked or
appear in search results.
2. Indexing:
o Once crawlers identify a page, the data is organized and stored in a
large database (index). The index includes details like the title,
description, keywords, and links of a page.
o Importance: If a page is not indexed, it won't appear in search
results.
3. Ranking:
o Search engines use algorithms to rank pages based on how well
they match the user's query.
▪ Step 1: Analyzing the user's query and breaking it down into
keywords.
▪ Step 2: Finding pages that match the query from the index.
▪ Step 3: Presenting the best matching results to the user,
often supplemented by paid ads or direct answers.
In short, search engines crawl the web, index the data, and rank pages to
deliver relevant search results.
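The three steps can be illustrated with a toy sketch in Python: a handful of in-memory "pages" stand in for crawled web content, an inverted index is built from them, and results are ranked by how many query keywords each page matches (real engines operate on live HTML at web scale).

    # Toy sketch of the crawl -> index -> rank pipeline over a few in-memory "pages".
    pages = {  # crawling result: URL -> page text discovered by the crawler
        "https://example.com/cpp":    "C++ tutorial covering classes templates and STL",
        "https://example.com/python": "Python tutorial for beginners with examples",
        "https://example.com/cloud":  "Introduction to cloud computing and AWS",
    }

    # Indexing: build an inverted index mapping each keyword to the pages containing it.
    index = {}
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)

    # Ranking: break the query into keywords, then order pages by keyword matches.
    def search(query):
        keywords = query.lower().split()
        scores = {}
        for word in keywords:
            for url in index.get(word, ()):
                scores[url] = scores.get(url, 0) + 1
        return sorted(scores, key=scores.get, reverse=True)

    print(search("c++ tutorial"))   # the C++ page ranks first, the Python page second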
REST vs SOAP
• Data Format: REST supports multiple formats (JSON, XML, HTML, plain text);
SOAP strictly uses XML for both request and response.
• Error Handling: REST reports errors using standard HTTP status codes (e.g.,
404, 500); SOAP uses its own error-handling standard via the <Fault> element
for detailed error reporting.
GCP Hierarchy
The Google Cloud Platform (GCP) hierarchy is structured to provide
organization, management, and access control across cloud resources. The
hierarchy is organized into the following levels:
1. Organization:
o Top-most level representing the entire company or enterprise.
o Centralizes resource management, billing, security policies, and
access control across all departments and teams.
2. Folders:
o Used to logically group resources based on departments, teams,
or business functions.
o Helps organize and manage resources, policies, and permissions.
o Example: Department X, Department Y, Shared Infrastructure.
3. Teams:
o Teams exist within folders and are responsible for specific projects
or services.
o Teams are independent units that manage their resources and
have distinct access controls.
o Example: Team A, Team B under respective departments.
4. Projects:
o Fundamental units where all cloud resources are created (e.g.,
virtual machines, databases, storage).
o Each project is isolated, with its own settings, resources, billing,
and permissions.
o Example: WhatsApp Project (Team A) and Twitter Project (Team
B).
5. Development and Production Environments:
o Projects are often divided into Development (Dev) and Production
(Prod) environments.
o Enables separate resource management for testing and live
deployment without conflicts.
o Example: Test Project (Development) and Production Project.
Key Benefits:
• Organized Resource Management: Hierarchy ensures resources are
logically structured and easily managed.
• Access Control: Permissions can be applied at different levels
(organization, folder, project) for better security.
• Scalability: Flexible enough to accommodate various departments,
teams, and projects as the organization grows.
SSH Techniques
1. Password-Based Authentication: Users authenticate with a username
and password.
o Pros: Easy setup and use.
o Cons: Less secure, prone to brute-force attacks.
2. Public Key-Based Authentication: Uses a private key (client) and public
key (server) for authentication.
o Pros: More secure than passwords, no need for repeated logins.
o Cons: Requires key management; losing keys can cause access
issues.
3. SSH Agent Forwarding: Forwards the local private key to the remote
server without storing the key on it.
o Pros: Secure, avoids storing keys on remote servers.
o Cons: Risky if the remote server is compromised.
4. SSH Tunneling (Port Forwarding): Creates a secure tunnel to forward
ports between local and remote machines.
o Types: Local, Remote, and Dynamic port forwarding.
o Pros: Secure access to services behind firewalls.
o Cons: Complicated setup for dynamic tunneling.
5. SSH File Transfer Protocol (SFTP): Securely transfers files over SSH.
o Pros: Secure, supports file manipulation.
o Cons: Slower than non-encrypted file transfer protocols.
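As a brief illustration of public key authentication and SFTP in practice, here is a minimal sketch using the third-party paramiko library in Python (assumed installed; the hostname, username, and key path are placeholders):

    # Minimal sketch of public key authentication and SFTP over SSH with paramiko.
    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # lab use; verify host keys in production

    # Public key-based authentication: the private key stays on the client, and the
    # matching public key is listed in ~/.ssh/authorized_keys on the server.
    client.connect("server.example.com", username="deploy",
                   key_filename="/home/user/.ssh/id_rsa")

    # Run a remote command over the encrypted channel.
    stdin, stdout, stderr = client.exec_command("uptime")
    print(stdout.read().decode())

    # SFTP: securely transfer a file over the same SSH connection.
    sftp = client.open_sftp()
    sftp.put("report.pdf", "/tmp/report.pdf")
    sftp.close()

    client.close()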
Benefits of SSH Techniques
• Encryption: All communication is encrypted, ensuring data
confidentiality and integrity.
• Authentication: Secure user authentication with passwords or key pairs.
• Flexibility: SSH can be used for tunneling, file transfers, remote
administration, and much more.
• Access Control: Allows granular access control and secure multi-user
environments.
These SSH techniques provide a secure way to manage remote servers, transfer
files, and establish encrypted communication channels.