File Organization
Therefore, the two tables ‘student’ and ‘subject’ can be combined using a
join operation and stored together in the cluster file as follows:
Cluster key: Subject_ID
Subject_ID  Subject_Name  Student_ID  Student_Name  Student_Age
C01         Math          101         John          20
                          103         Anik          21
C02         Java          104         James         22
C03         C             105         Trump         21
                          107         Deny          20
C04         DBMS          102         Robert        20
                          106         Charles       20
                          108         Varun         21
INDEXING
• Indexing is used to optimize the performance of a database by
minimizing the number of disk accesses required when a query is
processed.
• The index is a type of data structure. It is used to locate and access
the data in a database table quickly.
• Index structure:
• Indexes can be created using one or more columns of a database table.
• The first column of the index is the search key, which contains a copy
of the primary key or candidate key of the table. The search-key values
are stored in sorted order so that the corresponding data can be
accessed easily.
• The second column of the index is the data reference. It contains a
set of pointers holding the address of the disk block where the value
of the particular key can be found.
Indexing methods
Ordered indices
• The indices are usually sorted to make searching faster. The indices which
are sorted are known as ordered indices.
• Example: Suppose we have an employee table with thousands of records,
each of which is 10 bytes long. If the IDs start with 1, 2, 3, ... and so on,
and we have to search for the employee with ID 543:
• With no index, the DBMS has to scan the disk blocks from the start until
it reaches record 543. It reads the record after reading
543*10 = 5430 bytes.
• With an index, assuming each index entry is 2 bytes long, the DBMS reads
the record after reading 542*2 = 1084 bytes, which is far less than in
the previous case.
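The arithmetic in this example can be checked with a short sketch. The 10-byte record size comes from the example; the 2-byte index entry size is an assumption implied by the 542*2 figure, and the function names are purely illustrative.

```python
RECORD_SIZE = 10       # bytes per record, as stated in the example
INDEX_ENTRY_SIZE = 2   # bytes per index entry (assumed, implied by 542*2)


def bytes_read_without_index(target_id, record_size=RECORD_SIZE):
    """Sequential scan: read every record up to and including the target."""
    return target_id * record_size


def bytes_read_with_index(target_id, entry_size=INDEX_ENTRY_SIZE):
    """Scan the index entries that come before the target's entry."""
    return (target_id - 1) * entry_size


print(bytes_read_without_index(543))  # 5430
print(bytes_read_with_index(543))     # 1084
```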
Primary Index
• If the index is created on the basis of the primary key of the table,
then it is known as primary indexing.
• Primary key values are unique, so each index entry corresponds to
exactly one record (a 1:1 relation).
• As primary keys are stored in sorted order, the searching operation is
quite efficient.
• The primary index can be classified into two types: dense index and
sparse index.
Dense index
• The dense index contains an index record for every search key value in
the data file. It makes searching faster.
• The number of records in the index table is the same as the number of
records in the main table.
• It needs more space to store the index records themselves. Each index
record holds the search key and a pointer to the actual record on the disk.
Sparse index
• In a sparse index, index records appear for only some of the items in
the data file. Each index record points to a block.
• Instead of pointing to each record in the main table, the index points
to records at intervals. To find a record, the DBMS follows the nearest
preceding index entry and scans from there.
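The difference between the two index types can be sketched in a few lines. This is a simplified in-memory model, not how a real DBMS lays out disk blocks: the block size of 3, the variable names and the lookup functions are all assumptions made for illustration, and the student records are borrowed from the cluster-file table.

```python
from bisect import bisect_right

# Hypothetical data file: records sorted by key, split into blocks of 3.
records = [
    (101, "John"), (102, "Robert"), (103, "Anik"), (104, "James"),
    (105, "Trump"), (106, "Charles"), (107, "Deny"), (108, "Varun"),
]
BLOCK_SIZE = 3  # assumed block capacity
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per search key -> (block number, offset in block).
dense_index = {
    key: (b, o)
    for b, block in enumerate(blocks)
    for o, (key, _) in enumerate(block)
}

# Sparse index: only the first key of each block is indexed.
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    """Jump straight to the record through its own index entry."""
    location = dense_index.get(key)
    if location is None:
        return None
    b, o = location
    return blocks[b][o]


def sparse_lookup(key):
    """Find the last indexed key <= key, then scan that block."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for record in blocks[b]:
        if record[0] == key:
            return record
    return None


print(len(dense_index), len(sparse_keys))  # 8 3
print(dense_lookup(105), sparse_lookup(105))
```

The dense index holds eight entries for eight records, while the sparse index holds only three, at the cost of a short scan inside the block.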
Clustering Index
• A clustered index can be defined as an ordered data file. Sometimes
the index is created on non-primary key columns which may not be
unique for each record.
• In this case, to identify the record faster, we will group two or more
columns to get the unique value and create index out of them. This
method is called a clustering index.
• Records that have similar characteristics are grouped together, and
indexes are created for these groups.
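As a rough sketch of the grouping idea, the rows of the cluster-file table can be kept sorted by the non-unique column Subject_ID, with one index entry per distinct value pointing at the first row of its group. The in-memory list and the names `cluster_index` and `students_for` are illustrative assumptions, not part of any particular DBMS.

```python
# Rows from the cluster-file table, stored sorted by the non-unique
# column Subject_ID: (Subject_ID, Student_ID, Student_Name).
rows = [
    ("C01", 101, "John"),
    ("C01", 103, "Anik"),
    ("C02", 104, "James"),
    ("C03", 105, "Trump"),
    ("C03", 107, "Deny"),
    ("C04", 102, "Robert"),
    ("C04", 106, "Charles"),
    ("C04", 108, "Varun"),
]

# One index entry per distinct key, pointing at the first row of its group.
cluster_index = {}
for position, (subject_id, _, _) in enumerate(rows):
    cluster_index.setdefault(subject_id, position)


def students_for(subject_id):
    """Follow the index to the group's first row, then read the whole group."""
    start = cluster_index.get(subject_id)
    if start is None:
        return []
    names = []
    for row in rows[start:]:
        if row[0] != subject_id:
            break  # rows are sorted, so the group has ended
        names.append(row[2])
    return names


print(cluster_index)         # {'C01': 0, 'C02': 2, 'C03': 3, 'C04': 5}
print(students_for("C04"))   # ['Robert', 'Charles', 'Varun']
```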
BIG DATA
• Big data refers to massive and complex data sets, together with the
data management capabilities, social media analytics and real-time
data associated with them.
• Big data is about data volume: large data sets measured in terabytes
or petabytes. This phenomenon is called big data.
• The challenges include capture, analysis, storage, search, sharing,
visualization, transfer and privacy violations.
• Such data is difficult to process with traditional SQL queries or to
store in a conventional relational database management system (RDBMS).
• However, a wide variety of scalable database tools and techniques has
evolved.
• Hadoop, an open-source distributed data processing framework, is one
of the prominent and well-known solutions.
• NoSQL databases are non-relational; MongoDB is a well-known
example.
Types of Big data
• Structured: Any data that can be stored, accessed and processed in a
fixed format is termed ‘structured’ data.
• E.g. a table with column names
• Unstructured: Any data whose form or structure is unknown is classified
as unstructured data.
• E.g. Google search results
• Semi-structured: Semi-structured data can contain both forms of
data. It looks structured in form, but it is not actually defined by,
e.g., a table definition in a relational DBMS.
• E.g. data represented in an XML file.
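The three types can be illustrated with a small sketch. The sample values are made up for illustration (the student names are borrowed from the earlier table), and the XML layout is an assumed example rather than a standard format.

```python
import xml.etree.ElementTree as ET

# Structured: every row follows the same fixed column layout.
columns = ("Student_ID", "Student_Name", "Student_Age")
row = (101, "John", 20)

# Semi-structured: tags describe the data, but no table definition
# enforces them, so the second student can simply omit <age>.
xml_text = """
<students>
  <student id="101"><name>John</name><age>20</age></student>
  <student id="103"><name>Anik</name></student>
</students>
"""
root = ET.fromstring(xml_text)
students = [
    {"id": s.get("id"), "name": s.findtext("name"), "age": s.findtext("age")}
    for s in root.findall("student")
]

# Unstructured: free text with no predefined fields.
note = "John asked about the Math exam schedule."

print(dict(zip(columns, row)))
print(students)
```

Note that values read from the XML arrive as strings, and the missing `<age>` element yields `None`, which is the kind of irregularity a fixed table schema would not allow.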
NOSQL
• NoSQL (originally referring to ‘non SQL’) is a type of database that
provides a mechanism for storage and retrieval of data.
• This data is modeled by means other than the tabular relations used in
relational databases.
• NoSQL databases are used in real-time web applications and big data,
and their use is increasing over time.
• NoSQL systems are also sometimes called ‘Not only SQL’ to emphasize
the fact that they may support SQL-like query languages.
Types of NoSQL database:
• MongoDB falls in the category of NoSQL document based database.
• Key value store: Memcached, Redis, Coherence
• Tabular: HBase, Bigtable, Accumulo
• Document based: MongoDB, CouchDB, Cloudant
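To make the key-value and document categories more concrete, here is a minimal sketch that models them with plain Python dictionaries. It does not use the client API of Redis, MongoDB or any other product; the `find` helper and the sample documents are hypothetical.

```python
import json

# Key-value store: the value is an opaque blob, retrieved only by its key.
kv_store = {}
kv_store["student:101"] = json.dumps({"name": "John", "age": 20})

# Document store: each document carries its own fields, which can differ
# from one document to the next and can be queried directly.
documents = [
    {"_id": 101, "name": "John", "age": 20, "subjects": ["Math"]},
    {"_id": 104, "name": "James", "age": 22},
]


def find(collection, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(json.loads(kv_store["student:101"])["name"])  # John
print(find(documents, age=22))
```

In the key-value case the application has to decode the value itself, whereas the document model lets a query reach into individual fields.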
Advantages of NOSQL
• High scalability
• High availability
Disadvantages of NOSQL
• Narrow focus: mainly focused on storage, with little other
functionality provided
• Open-source
• Management challenge
• GUI is not available
• Backup
• Large document size