Ch1: Introduction to Information Storage

The document provides an introduction to information storage. Digital data refers to data stored as binary numbers that is accessible by computers for processing. The growth of digital data is driven by increases in data processing capabilities, lower digital storage costs, faster communication technologies, and more applications. Various types of digital data require different storage solutions based on how often the data is created and used, and how valuable it remains over time.

Uploaded by

Naruto Uzamaki
Copyright
© © All Rights Reserved
INTRODUCTION TO INFORMATION STORAGE

-Ashu Mehta
Database systems
Information Storage
Data
Data is a collection of raw facts from which conclusions
might be drawn.
Handwritten letters, a printed book, a family photograph, a
bank’s ledgers, and an airline ticket are all examples that
contain data.
Digital data
Before the advent of computers, the methods adopted
for data creation and sharing were limited.
Today, the same data can be converted into more
convenient forms, such as an e-mail message, an e-
book, a digital image, or a digital movie. This data can
be generated using a computer and stored as strings of
binary numbers (0s and 1s).
Data in this form is called digital data and is accessible
by the user only after a computer processes it.
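As a minimal sketch of what "strings of binary numbers" means in practice, the following Python snippet converts a short piece of text to its bits and back. The choice of UTF-8 and the function names `to_bits` and `from_bits` are assumptions made for illustration; they are not part of the original slides.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated groups of 8 bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Inverse of to_bits: decode space-separated 8-bit groups back to text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


bits = to_bits("Hi")
print(bits)             # 01001000 01101001
print(from_bits(bits))  # Hi
```

The stored form (the 0s and 1s) is not directly readable; the text becomes usable again only after the computer decodes it.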
Factors that have contributed to the growth of digital data:
• Increase in data-processing capabilities: Modern computers provide a
significant increase in processing and storage capabilities. This enables
the conversion of various types of content and media from conventional
forms to digital formats.
• Lower cost of digital storage: Technological advances and the decrease
in the cost of storage devices have provided low-cost storage solutions.
This cost benefit has increased the rate at which digital data is
generated and stored.
• Affordable and faster communication technology: The rate of sharing
digital data is now much faster than traditional approaches. A handwritten
letter might take a week to reach its destination, whereas it typically
takes only a few seconds for an e-mail message to reach its recipient.
• Proliferation of applications and smart devices: Smartphones, tablets,
and newer digital devices, along with smart applications, have
significantly contributed to the generation of digital content.
Data explosion
Inexpensive and easier ways to create, collect, and store
all types of data, coupled with increasing individual and
business needs, have led to accelerated data growth,
popularly termed data explosion.
The importance and value of data vary with time.
Most of the data created holds significance for a short
term but becomes less valuable over time.
• This governs the type of data storage solutions used.
Typically, recent data, which is used more often, is
stored on faster and more expensive storage.
• As it ages, it may be moved to slower, less expensive
but reliable storage.
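The idea of placing data according to its age can be sketched as a simple rule. This is only an illustration: the 30-day threshold, the tier names, and the function `choose_tier` are hypothetical, and real placement policies depend on the workload.

```python
from datetime import date

# Hypothetical cutoff: data touched within this many days counts as "recent".
RECENT_MAX_AGE_DAYS = 30


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from the number of days since the data was last used."""
    age_days = (today - last_accessed).days
    if age_days <= RECENT_MAX_AGE_DAYS:
        return "fast"     # faster, more expensive storage
    return "archive"      # slower, less expensive but reliable storage


print(choose_tier(date(2024, 1, 20), date(2024, 2, 1)))  # fast
print(choose_tier(date(2023, 1, 1), date(2024, 2, 1)))   # archive
```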
Types of Data
Structured data is organized in rows and columns in a
rigidly defined format so that applications can retrieve
and process it efficiently. Structured data is typically
stored using a database management system (DBMS).
Data is unstructured if its elements cannot be stored
in rows and columns, which makes it difficult for
applications to query and retrieve. For example, customer
contacts may be stored in various forms such as sticky
notes, e-mail messages, business cards, or even digital
format files, such as .doc, .txt, and .pdf.
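The difference in how the two kinds of data are queried can be sketched with Python's built-in `sqlite3` module. The table name `contacts`, its columns, and the sample values are made up for illustration.

```python
import sqlite3

# Structured: contacts kept in rows and columns inside a DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Asha", "asha@example.com", "Pune"),
        ("Ravi", "ravi@example.com", "Delhi"),
    ],
)
# A precise question can be asked of a specific column.
structured_hit = conn.execute(
    "SELECT email FROM contacts WHERE city = ?", ("Pune",)
).fetchall()
print(structured_hit)  # [('asha@example.com',)]

# Unstructured: the same facts buried in free text, such as a sticky note.
note = "Met Asha at the Pune office, mail her at asha@example.com next week"
# Only a crude text search is possible; there are no columns to query.
unstructured_hit = "Pune" in note
print(unstructured_hit)  # True
```

The structured query returns exactly the requested field, while the text search can only report that a word appears somewhere in the note.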
Information
• Data, whether structured or unstructured, does not fulfill any
purpose for individuals or businesses unless it is presented in a
meaningful form.
• Information is the intelligence and knowledge derived from
data.
Storage
• In a computing environment, devices designed for storing data
are termed storage devices or simply storage.
• The type of storage used varies based on the type of data and
the rate at which it is created and used.
• Devices such as a media card in a cell phone or digital camera,
DVDs, CD-ROMs, and disk drives in personal computers are
examples of storage devices.
Big data
'Big Data' is also data, but of a huge size.
'Big Data' is a term used to describe a collection of data
that is huge in size and yet growing exponentially
with time.
In short, such data is so large and complex that
none of the traditional data management tools are
able to store it or process it efficiently.
Characteristics Of 'Big Data'
Also called the 3Vs of Big Data:
1. Volume
2. Variety
3. Velocity

Nowadays, two more Vs are added to the characteristics
of Big Data: Veracity and Value.
1. Volume
The name 'Big Data' itself is related to a size that is
enormous.
The size of data plays a very crucial role in determining
the value that can be derived from it.
Also, whether particular data can actually be
considered Big Data or not depends on
the volume of data.
Hence, 'Volume' is one characteristic that needs to
be considered while dealing with 'Big Data'.
2. Variety
The next aspect of 'Big Data' is its variety.
Variety refers to heterogeneous sources and the nature of
data, both structured and unstructured.
In earlier days, spreadsheets and databases were the
only sources of data considered by most
applications.
Nowadays, data in the form of emails, photos, videos,
monitoring devices, PDFs, audio, etc. is also being
considered in analysis applications.
This variety of unstructured data poses certain issues for
storing, mining, and analyzing data.
3. Velocity
The term 'velocity' refers to the speed at which data is
generated.
How fast the data is generated and processed to meet
demand determines the real potential of the data.
Big Data velocity deals with the speed at which data
flows in from sources like business processes,
application logs, networks, social media sites,
sensors, mobile devices, etc.
The flow of data is massive and continuous.
Big data ecosystem
The big data ecosystem consists of the following:
1. Devices that collect data from multiple locations
and also generate new data about this data (metadata).
2. Data collectors who gather data from devices and
users.
3. Data aggregators that compile the collected data to
extract meaningful information.
4. Data users and buyers who benefit from the
information collected and aggregated by others in the
data value chain.
Evolution of storage architecture
In earlier implementations of open systems, the storage
was typically internal to the server. These storage
devices could not be shared with any other servers.
This approach is referred to as server-centric storage
architecture.
In this architecture, each server has a limited number
of storage devices, and any administrative task, such as
maintenance of the server or increasing storage
capacity, might result in unavailability of information.
To overcome these challenges, storage evolved from server-
centric to information-centric architecture.
In this architecture, storage devices are managed centrally and
independent of servers.
These centrally-managed storage devices are shared with
multiple servers.
When a new server is deployed in the environment, storage is
assigned from the same shared storage devices to that server.
The capacity of shared storage can be increased dynamically by
adding more storage devices without impacting information
availability.
In this architecture, information management is easier and
more cost-effective.
Data Center Infrastructure
Organizations maintain data centers to provide centralized
data-processing capabilities across the enterprise.
Data centers house and manage large amounts of data.
The data center infrastructure includes hardware
components, such as computers, storage systems, network
devices, and power backups, and software components, such
as applications, operating systems, and management
software.
It also includes environmental controls, such as air
conditioning, fire suppression, and ventilation.
Large organizations often maintain more than one data
center to distribute data processing workloads and provide
backup if a disaster occurs.
Core Elements of a Data Center
Five core elements are essential for the functionality of a data center:
• Application: A computer program that provides the logic for
computing operations
• Database management system (DBMS): Provides a
structured way to store data in logically organized tables that
are interrelated
• Host or compute: A computing platform (hardware, firmware,
and software) that runs applications and databases
• Network: A data path that facilitates communication among
various networked devices
• Storage: A device that stores data persistently for subsequent
use
Application
An application is a computer program that provides the
logic for computing operations.
The application sends requests to the underlying
operating system to perform read/write (R/W) operations
on the storage devices.
Applications deployed in a data center environment are
commonly categorized as business applications,
infrastructure management applications, data protection
applications, and security applications.
Some examples of these applications are e-mail, enterprise
resource planning (ERP), decision support system (DSS),
resource management, backup, authentication and
antivirus applications, and so on.
DBMS
A database is a structured way to store data in
logically organized tables that are interrelated.
A database helps to optimize the storage and retrieval
of data.
A DBMS controls the creation, maintenance, and use
of a database.
The DBMS processes an application’s request for data
and instructs the operating system to transfer the
appropriate data from the storage.
Host/compute
Users store and retrieve data through applications. The
computers on which these applications run are referred to
as hosts or compute systems. Hosts can be physical or
virtual machines.
Examples of physical hosts include desktop computers,
servers or a cluster of servers, laptops, and mobile devices.
A host consists of CPU, memory, I/O devices, and a
collection of software to perform computing operations.
This software includes the operating system, file system,
logical volume manager, device drivers, and so on.
An operating system controls all aspects of
computing. It works between the application and the
physical components of a compute system.
A device driver is special software that permits the
operating system to interact with a specific device,
such as a printer, a mouse, or a disk drive.
Volume Manager
• In the early days, the entire disk drive was allocated to the file
system. The disadvantage was a lack of flexibility. When a disk drive ran
out of space, there was no easy way to extend the file system’s size.
Also, as the storage capacity of disk drives increased, allocating the
entire disk drive to the file system often resulted in underutilization
of storage capacity.
• The evolution of Logical Volume Managers (LVMs) enabled dynamic
extension of file system capacity and efficient storage management.
• The LVM is software that runs on the compute system and manages
logical and physical storage.
• The LVM is an intermediate layer between the file system and the physical
disk. It can partition a larger-capacity disk into virtual, smaller-capacity
volumes (the process is called partitioning) or aggregate
several smaller disks to form a larger virtual volume (the process is
called concatenation). These volumes are then presented to
applications.
• In partitioning, a disk drive is divided into logical containers called
logical volumes (LVs).
• Concatenation is the process of grouping several physical drives and
presenting them to the host as one big logical volume.
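Concatenation can be pictured as placing the disks end to end and translating each logical block number into a disk and a block on that disk. The sketch below is a simplified model for illustration, not how any particular LVM is implemented; the function name `locate` and the disk sizes are hypothetical.

```python
from typing import Sequence, Tuple


def locate(logical_block: int, disk_sizes: Sequence[int]) -> Tuple[int, int]:
    """Map a block of the concatenated volume to (disk index, block on that disk).

    disk_sizes holds the size of each physical disk in blocks, in the order
    in which the disks are concatenated.
    """
    if logical_block < 0:
        raise IndexError("negative block number")
    start = 0  # first logical block that lives on the current disk
    for disk_index, size in enumerate(disk_sizes):
        if logical_block < start + size:
            return disk_index, logical_block - start
        start += size
    raise IndexError("block beyond end of volume")


disks = [100, 50, 200]     # three small disks, 350 blocks in total
print(locate(0, disks))    # (0, 0)
print(locate(120, disks))  # (1, 20)
print(locate(150, disks))  # (2, 0)
```

The host sees a single volume of 350 blocks and never needs to know which physical disk holds a given block.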
File system
A file is a collection of related records or data stored as a
unit with a name. A file system is a hierarchical structure
of files.
The following list shows the process of mapping user files
to the disk storage subsystem with an LVM:
1. Files are created and managed by users and applications.
2. These files reside in the file systems.
3. The file systems are mapped to file system blocks.
4. The file system blocks are mapped to logical extents of a
logical volume.
5. These logical extents in turn are mapped to the disk physical
extents either by the operating system or by the LVM.
6. These physical extents are mapped to the disk sectors in a
storage subsystem.
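The chain from file system block to disk sector can be sketched with plain integer arithmetic. All sizes here (4 KiB blocks, 4 MiB extents, 512-byte sectors), the extent table, and the function name `map_offset` are assumptions chosen for illustration; real systems use their own layouts.

```python
from typing import Dict, Tuple

BLOCK_SIZE = 4096               # assumed file system block size in bytes
EXTENT_SIZE = 4 * 1024 * 1024   # assumed extent size in bytes
SECTOR_SIZE = 512               # assumed disk sector size in bytes


def map_offset(volume_offset: int, extent_map: Dict[int, Tuple[str, int]]) -> dict:
    """Follow a byte offset in a logical volume down to a disk sector.

    extent_map maps a logical extent number to (disk name, physical extent
    number), standing in for the table an LVM would maintain.
    """
    fs_block = volume_offset // BLOCK_SIZE         # file system block
    logical_extent = volume_offset // EXTENT_SIZE  # logical extent holding it
    disk, physical_extent = extent_map[logical_extent]
    offset_in_extent = volume_offset % EXTENT_SIZE
    sector = (physical_extent * EXTENT_SIZE + offset_in_extent) // SECTOR_SIZE
    return {
        "fs_block": fs_block,
        "logical_extent": logical_extent,
        "disk": disk,
        "physical_extent": physical_extent,
        "sector": sector,
    }


# Hypothetical table: logical extent 0 lives on disk0, logical extent 1 on disk1.
extent_map = {0: ("disk0", 10), 1: ("disk1", 3)}
result = map_offset(5 * 1024 * 1024, extent_map)  # byte offset 5 MiB
print(result)
# {'fs_block': 1280, 'logical_extent': 1, 'disk': 'disk1', 'physical_extent': 3, 'sector': 26624}
```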
Network/Connectivity
Connectivity refers to the interconnection between hosts
or between a host and peripheral devices, such as printers
or storage devices.
Physical components of connectivity are the hardware
elements that connect the host to storage. Three physical
components of connectivity between the host and storage
are the host interface device, port, and cable.
• A host interface device or host adapter connects a host to
other hosts and storage devices.
• A port is a specialized outlet that enables connectivity
between the host and external devices.
• Cables connect hosts to internal or external devices using
copper or fiber optic media.
Interface Protocols
A protocol enables communication between the host and
storage.
Protocols are implemented using interface devices (or
controllers) at both source and destination. The
popular interface protocols used for host-to-storage
communication are:
• Integrated Device Electronics/Advanced Technology
Attachment (IDE/ATA)
• Small Computer System Interface (SCSI)
• Fibre Channel (FC)
• Internet Protocol (IP)
Storage
Storage is a core component in a data center. A
storage device uses magnetic (cheaper),
optical (moderate cost), or solid-state (expensive) media.
Disks, tapes, and diskettes use magnetic media, whereas
CDs and DVDs use optical media for storage.
Removable flash memory and flash drives are examples
of solid-state media.
Key Characteristics of a Data Center
• Availability: A data center should ensure the availability of information when
required.
• Security: Data centers must establish policies, procedures, and core element
integration to prevent unauthorized access to information.
• Scalability: Business growth often requires deploying more servers, new
applications, and additional databases. Data center resources should scale based on
requirements, without interrupting business operations.
• Performance: All the elements of the data center should provide optimal
performance based on the required service levels.
• Data integrity: Data integrity refers to mechanisms, such as error correction codes
or parity bits, which ensure that data is stored and retrieved exactly as it was
received.
• Capacity: Data center operations require adequate resources to store and process
large amounts of data efficiently. When capacity requirements increase, the data
center must provide additional capacity without interrupting availability or with
minimal disruption. Capacity may be managed by reallocating the existing
resources or by adding new resources.
• Manageability: A data center should provide easy and integrated management of
all its elements.
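The parity bits mentioned under data integrity can be illustrated with a small sketch. It uses even parity over a whole byte string, which is an assumption for illustration; real storage systems typically apply stronger error correction codes. The function names `parity_bit` and `verify` are hypothetical.

```python
def parity_bit(data: bytes) -> int:
    """Even parity: return the bit that makes the total count of 1 bits even."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def verify(data: bytes, stored_parity: int) -> bool:
    """Check that the data read back still matches the parity stored with it."""
    return parity_bit(data) == stored_parity


original = b"storage"
stored = parity_bit(original)  # saved alongside the data when it is written

# Simulate a single flipped bit in the first byte while the data is at rest.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

print(verify(original, stored))   # True
print(verify(corrupted, stored))  # False
```

A single parity bit detects any odd number of flipped bits but cannot say which bit changed, which is why error correction codes are used when the data must also be repaired.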
Managing a Data Center
Managing a data center involves many tasks. The key
management activities include the following:
• Monitoring: A continuous process of gathering
information on various elements and services running in a
data center. The aspects of a data center that are monitored
include security, performance, availability, and capacity.
• Reporting: Done periodically on resource performance,
capacity, and utilization. Reporting tasks help to establish
business justifications and chargeback of costs associated
with data center operations.
• Provisioning: The process of providing the hardware,
software, and other resources required to run a data center.
Provisioning activities primarily include resource
management to meet capacity, availability, performance, and
security requirements.
Virtualization & Cloud Computing
Virtualization is a technique of abstracting physical
resources, such as compute, storage, and network, and
making them appear as logical resources.
Virtualization enables pooling of physical resources and
providing an aggregated view of the physical resource
capabilities.
For example, storage virtualization enables multiple pooled
storage devices to appear as a single large storage entity.
Similarly, by using compute virtualization, the CPU
capacity of the pooled physical servers can be viewed as the
aggregation of the power of all CPUs.
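The pooled view that storage virtualization provides can be sketched as a small model. The class `StoragePool`, its methods, and the capacities are hypothetical and only illustrate how several devices can appear as one large storage entity.

```python
from typing import Dict, Sequence


class StoragePool:
    """Presents several physical devices as one logical pool of capacity."""

    def __init__(self, device_capacities_gb: Sequence[int]) -> None:
        self.total_gb = sum(device_capacities_gb)  # aggregated view
        self.volumes: Dict[str, int] = {}          # volume name -> size in GB

    @property
    def free_gb(self) -> int:
        return self.total_gb - sum(self.volumes.values())

    def allocate(self, name: str, size_gb: int) -> None:
        """Carve a logical volume out of the pool, regardless of device boundaries."""
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity in the pool")
        self.volumes[name] = size_gb


pool = StoragePool([500, 1000, 250])  # three physical devices
pool.allocate("vol1", 1200)           # larger than any single device
print(pool.total_gb)                  # 1750
print(pool.free_gb)                   # 550
```

The 1200 GB volume could not fit on any one of the three devices, yet the pool can serve it because consumers see only the aggregate capacity.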
Cloud computing allows users to use virtual
resources from a cloud data center as a service.
To provide users with this capability, cloud
computing makes use of the concept of virtualization.
Virtualization divides hardware resources from
heterogeneous computer systems into various
implementation environments referred to as virtual
machines (VMs).
Each VM is kept isolated from the rest of the
VMs and can behave as an entire system to
run users' applications.
