Unit3 - Cloud Data Storage

unit 3 AIMl for diploma

Uploaded by

dhanashree
Cloud Data Storage

 Cloud Storage Types:-


There are three main cloud storage types: object storage, file storage, and
block storage. Each offers its own advantages and has its own use cases.
1. Object storage
Organizations have to store a massive and growing amount of
unstructured data, such as photos, videos, machine learning (ML)
datasets, sensor data, audio files, and other types of web content.
Finding scalable, efficient, and affordable ways to store it can be
a challenge. Object storage is a data storage architecture for large
stores of unstructured data. It keeps each object in the format it
arrives in and lets you customize metadata in ways that make the
data easier to access and analyze. Instead of being organized in
file and folder hierarchies, objects are kept in secure buckets that
deliver virtually unlimited scalability. Object storage is also
usually less costly for large data volumes.
Applications developed in the cloud often take advantage of the
vast scalability and metadata characteristics of object storage.
Object storage solutions are ideal for building modern applications
from scratch that require scale and flexibility, and can also be used
to import existing data stores for analytics, backup, or archive.
2. File storage
File-based storage, or file storage, is widely used among
applications and stores data in a hierarchical folder and file format.
This type of storage is typically provided by a network-attached
storage (NAS) server and accessed through common file-level
protocols: Server Message Block (SMB), used by Windows instances,
and Network File System (NFS), common on Linux.
3. Block storage
Enterprise applications like databases or enterprise resource
planning (ERP) systems often require dedicated, low-latency
storage for each host. This is analogous to direct-attached storage
(DAS) or a storage area network (SAN). In this case, you can use a
cloud storage service that stores data in the form of blocks. Each
block has its own unique identifier for quick storage and retrieval.
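The practical difference between the three types is how data is addressed. The toy in-memory models below are only a sketch of that idea: the class names (ObjectStore, BlockVolume), the bucket and file names, and the 4-byte block size are hypothetical and do not correspond to any cloud provider's API.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class ObjectStore:
    """Toy object storage: flat buckets holding keys, data, and metadata."""
    buckets: dict = field(default_factory=dict)

    def put(self, bucket, key, data, **metadata):
        # The object is stored as-is, together with caller-defined metadata.
        self.buckets.setdefault(bucket, {})[key] = (data, metadata)

    def get(self, bucket, key):
        return self.buckets[bucket][key]


@dataclass
class BlockVolume:
    """Toy block storage: fixed-size blocks addressed by a numeric identifier."""
    block_size: int = 4
    blocks: dict = field(default_factory=dict)

    def write(self, data):
        # Split the payload into fixed-size blocks and return their identifiers.
        ids = []
        for offset in range(0, len(data), self.block_size):
            block_id = len(self.blocks)
            self.blocks[block_id] = data[offset:offset + self.block_size]
            ids.append(block_id)
        return ids

    def read(self, ids):
        return b"".join(self.blocks[i] for i in ids)


# Object storage: bucket + key, with custom metadata attached to the object.
store = ObjectStore()
store.put("media", "cat.jpg", b"\xff\xd8", content_type="image/jpeg", camera="phone")
data, meta = store.get("media", "cat.jpg")

# File storage: data is addressed by a hierarchical folder path.
path = PurePosixPath("/shared/reports/2024/q1.csv")
folders = path.parts[1:-1]

# Block storage: data is addressed by the identifiers of its blocks.
volume = BlockVolume()
ids = volume.write(b"database page")
restored = volume.read(ids)
```

Here the object is found by bucket and key, the file by walking its folder hierarchy, and the block data by the list of block identifiers returned when it was written.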
 Cloud Data Governance :-
Cloud data governance is a concept that organizations need to be familiar
with if they have data in the cloud or plan to migrate their data to the
cloud. Cloud data governance is a set of processes that ensure data stored in
cloud environments is secure, accurate, and compliant with all relevant
data regulations and policies. It also helps organizations identify and
classify sensitive data, define usage, and manage access to data.

Why Is It Important?
Cloud data governance is important for numerous reasons. Organizations
that deal with sensitive information and data assets need to ensure that
they are managing and governing data across their cloud platforms and
applications. Proper cloud data governance makes this simpler.
Cloud data governance also helps ensure data is more accurate and
reliable by reducing duplications, inconsistencies, and other errors. This
allows organizations to get more value from data and make better data-
driven decisions.

What Are the Benefits?


i. Data Security and Privacy
ii. Enhanced Collaboration and Data Sharing
iii. Data Quality and Integrity
iv. Scalability and Flexibility
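Two of the governance tasks named above, classifying sensitive data and managing access to it, can be sketched in a few lines. This is a minimal illustration, not a governance tool: the patterns, role names, and sample record are all hypothetical, and real policies are far richer.

```python
import re

# Hypothetical patterns for spotting sensitive values.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "card_number": re.compile(r"\b\d{16}\b"),
}

# Hypothetical policy: which roles may read each classification.
ACCESS_POLICY = {
    "public": {"analyst", "admin"},
    "sensitive": {"admin"},
}


def classify(value):
    """Label a value 'sensitive' if any pattern matches, else 'public'."""
    text = str(value)
    if any(p.search(text) for p in SENSITIVE_PATTERNS.values()):
        return "sensitive"
    return "public"


def readable_fields(record, role):
    """Return only the fields the given role is allowed to read."""
    return {
        name: value
        for name, value in record.items()
        if role in ACCESS_POLICY[classify(value)]
    }


record = {"name": "Asha", "email": "asha@example.com", "city": "Pune"}
analyst_view = readable_fields(record, "analyst")
admin_view = readable_fields(record, "admin")
```

In this sketch the analyst role sees only the fields classified as public, while the admin role sees the whole record.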
 Key value databases:-
1. Key-value databases, also known as key-value stores, are a type
of NoSQL (non-relational) database that uses a key-value method
to store data. In a key-value database, data is stored as a
collection of key-value pairs, where a key acts as a unique
identifier for a value. The key can be a simple string, and the value
can be a simple object like a number or string, or a complex
compound object.
2. Key-value databases are often considered the simplest and fastest
type of NoSQL database. They are easy to design and implement,
and they don't require the schema to be constantly changed to
accommodate unstructured data. They are also highly partitionable
and allow for horizontal scaling to a degree that many other
types of databases struggle to match.

 Some key-value database features include:


i. Retrieving values: Users can retrieve values associated with
a given key.
ii. Deleting values: Users can delete values associated with a
given key.
iii. Setting, updating, and replacing values: Users can set,
update, or replace values associated with a given key.
iv. Links: Links can be used to map relationships between
key-value pairs.
v. Search: Some key-value databases, like Riak, have search
capabilities for full-text searches.
vi. Secondary indexes: Developers can tag a value with the
values of one or more fields, and applications can then
query the index to return a list of matching keys.
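The basic operations listed above (set, retrieve, replace, delete, and a secondary-index lookup) can be sketched with a toy in-memory store. The class KeyValueStore, its method names, and the sample keys are hypothetical and chosen for illustration; they are not the API of Riak or any other product.

```python
class KeyValueStore:
    """Toy in-memory key-value store with an optional secondary index."""

    def __init__(self):
        self._data = {}
        self._index = {}  # index value -> set of keys tagged with it

    def set(self, key, value, index=None):
        # Setting an existing key replaces its value and its index entry.
        self.delete(key)
        self._data[key] = value
        if index is not None:
            self._index.setdefault(index, set()).add(key)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)
        for keys in self._index.values():
            keys.discard(key)

    def keys_by_index(self, index):
        # Query the secondary index for all keys tagged with this value.
        return sorted(self._index.get(index, set()))


db = KeyValueStore()
db.set("user:1", {"name": "Asha", "city": "Pune"}, index="Pune")
db.set("user:2", {"name": "Ravi", "city": "Pune"}, index="Pune")
db.set("user:3", {"name": "Meera", "city": "Nagpur"}, index="Nagpur")
db.set("user:2", {"name": "Ravi", "city": "Nagpur"}, index="Nagpur")  # replace
db.delete("user:3")
```

Every lookup goes through the key, which is why the store needs no schema; the secondary index is the only way here to find keys by something other than the key itself.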

 Batch data and Streaming data on Machine Learning:-

| S.No. | Batch processing | Stream processing |
|-------|------------------|-------------------|
| 01 | Batch processing refers to processing a high volume of data in a batch within a specific time span. | Stream processing refers to processing a continuous stream of data immediately as it is produced. |
| 02 | Batch processing processes a large volume of data all at once. | Stream processing analyzes streaming data in real time. |
| 04 | In batch processing the data size is known and finite. | In stream processing the data size is not known in advance and is potentially unbounded. |
| 05 | In batch processing the data is processed in multiple passes. | In stream processing the data is generally processed in a few passes. |
| 06 | A batch processor takes a longer time to process data. | A stream processor takes a few seconds or milliseconds to process data. |
| 07 | In batch processing the input graph is static. | In stream processing the input graph is dynamic. |
| 08 | In this processing the data is analyzed on a snapshot. | In this processing the data is analyzed continuously. |
| 09 | In batch processing the response is provided after job completion. | In stream processing the response is provided immediately. |
| 10 | Examples are distributed programming platforms like MapReduce, Spark, GraphX, etc. | Examples are programming platforms like Spark Streaming and S4 (Simple Scalable Streaming System), etc. |
| 11 | Batch processing is used in payroll and billing systems, food processing systems, etc. | Stream processing is used in the stock market, e-commerce transactions, social media, etc. |
| 12 | Processes data in batches or sets, typically stored in a database or file system. | Processes data in real time, as it is generated or received from a source. |
| 13 | Processes data in discrete, finite batches or jobs. | Processes data continuously and incrementally. |
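The contrast in the table can be made concrete by computing the same statistic both ways. The sketch below is illustrative only: the function names and sample readings are hypothetical, and a plain Python iterator stands in for a real data stream.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch: the whole finite data set is available before processing."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream: update the result incrementally as each value arrives."""
    count, mean = 0, 0.0
    for x in values:
        count += 1
        mean += (x - mean) / count  # incremental mean update
        yield mean


readings = [10.0, 20.0, 30.0, 40.0]

# One answer, available only after the whole batch has been read.
batch_result = batch_mean(readings)

# An up-to-date answer after every element, without knowing the total size.
stream_results = list(streaming_mean(iter(readings)))
```

The batch function needs the complete list before it can answer, while the streaming function never stores the data and produces a current result after each reading. Both agree once the same data has been seen.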
 Cloud data warehouse:-
A cloud data warehouse is a modern way of storing and managing large
amounts of data in a public cloud. It lets you quickly access and use your
data. This makes it a good fit for businesses that rely on data
and require agility, flexibility, and ease of use from their
infrastructure.
 Cloud Data Warehouse Benefits:-
1. Faster Insights: A cloud data warehouse provides more
powerful computing capabilities and can deliver real-time
cloud analytics using data from diverse data sources much
faster than an on-premises data warehouse, giving
business users better insights sooner.
2. Scalability: A cloud-based data warehouse offers immediate
and nearly unlimited storage, and it's easy to scale as your
storage needs grow. Increasing cloud storage doesn't require
you to purchase new hardware, as an on-premises data
warehouse does, and it typically costs much less.
3. Lower Overhead: Maintaining a data warehouse on-premises
requires a dedicated server room full of expensive hardware,
and experienced employees to oversee, manually upgrade,
and troubleshoot it. A cloud data warehouse requires no
physical hardware or allocated office space, making
operational costs significantly lower.
Cloud Data Warehouse Vendors
There are many popular cloud-based data warehouse platforms to choose
from, including Amazon Redshift, Google BigQuery, Microsoft Azure
Synapse Analytics, Snowflake, and others. There are just as many important
considerations when deciding on the right solution for your organization.

 Amazon Redshift:-
For many years, data warehousing was only available as an on-premises
solution. Then in November 2012, Amazon Web Services (AWS)
launched Redshift, a fully managed, petabyte-scale data warehouse
service in the cloud. Although not the first cloud-based data warehouse, it
was the first to gain significant market share. Redshift's SQL
dialect is based on PostgreSQL, which is well understood by analysts
worldwide, and it uses an architecture familiar to many on-premises data
warehouse users.

You can start with as little as a few gigabytes of data and scale to
petabytes. This empowers you to acquire new insights from your business
and customer data.

The first step to creating a Redshift data warehouse is to launch a set of
nodes, called an Amazon Redshift cluster. After you provision your
cluster, you upload your data set and then perform data analysis queries.
Regardless of the size of your data set, Amazon Redshift delivers fast
query performance using familiar SQL-based tools and business
intelligence applications.
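The load-then-query workflow described above can be tried locally. The sketch below deliberately does not connect to Redshift: it uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere, and the table name, columns, and rows are hypothetical. Only the overall shape (load a data set, then analyze it with ordinary SQL) carries over.

```python
import sqlite3

# Local stand-in for a warehouse connection; Redshift itself is not involved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Step 1: load the data set.
rows = [("north", 120.0), ("south", 80.0), ("north", 60.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Step 2: run an analytical query with ordinary SQL.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

Against a real cluster the connection setup and the bulk-loading step would differ, but the analysis would still be expressed as a SQL aggregation like the one above.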

 GCP(Google Cloud Platform) BigQuery:-


BigQuery is a fully managed, serverless data warehouse that
automatically scales to match storage and computing power needs.
Google doesn't expect you to manage your data warehouse infrastructure,
which is why BigQuery hides many of the underlying hardware, database,
node, and configuration details. Its elasticity works automatically, out of
the box. Getting started is simply a matter of creating an account with
Google Cloud Platform (GCP), loading a table, and running a query.
Google takes care of the rest.

With BigQuery, you get a columnar and ANSI SQL database that can
analyze terabytes to petabytes of data at incredible speeds. BigQuery also
lets you do spatial analysis using familiar SQL with BigQuery GIS. In
addition, you can quickly build and operationalize ML models on large-
scale structured or semi-structured data using simple SQL with BigQuery
ML. And you can support real-time interactive dashboarding with
BigQuery BI Engine.
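The word "columnar" above refers to storing a table column by column rather than row by row. The sketch below illustrates that general idea with plain Python lists; it is not BigQuery's actual storage format, and the table and column names are hypothetical.

```python
# The same hypothetical table in two layouts.
rows = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "south", "amount": 80.0},
    {"order_id": 3, "region": "north", "amount": 60.0},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: every whole row is visited to total a single field.
row_total = sum(row["amount"] for row in rows)

# Columnar layout: only the "amount" column is read.
column_total = sum(columns["amount"])
```

Analytical queries usually touch a few columns across many rows, so reading only the needed columns is one reason a columnar layout suits data warehouse workloads.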

The BigQuery architecture is composed of several components. Borg is
the compute. Colossus is the distributed storage. Jupiter is the network.
And Dremel is the execution engine.
