0% found this document useful (0 votes)
13 views29 pages

UNIT I 2 Marks

The document provides an overview of relational databases, including definitions of key concepts such as databases, DBMS, schemas, and data models. It discusses various types of database constraints, normalization forms, and the importance of avoiding redundancy. Additionally, it covers advanced SQL features, including embedded and dynamic SQL, as well as the entity-relationship model and its application in database design.

Uploaded by

Srividhya Sri
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views29 pages

UNIT I 2 Marks

The document provides an overview of relational databases, including definitions of key concepts such as databases, DBMS, schemas, and data models. It discusses various types of database constraints, normalization forms, and the importance of avoiding redundancy. Additionally, it covers advanced SQL features, including embedded and dynamic SQL, as well as the entity-relationship model and its application in database design.

Uploaded by

Srividhya Sri
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 29

UNIT I

RELATIONAL DATABASE :
Purpose of Database System – Views of data – Data Models – Database System Architecture –
Introduction to relational databases – Relational Model – Keys – Relational Algebra – SQL fundamentals
Advanced SQL features – Embedded SQL– Dynamic SQL
PART A

1. What is database?(R)
A database is a basic electronic storage with collection of interrelated data, organized to provide
efficient retrieval. Databases are organized by fields, records and files or tables.

2. What is DBMS? (R) (NOV 2008)


A database management system (DBMS) is a software package designed to define, manipulate,
retrieve and manage data in a database.

3. List out some representative application of databases (U) (DEC


2007)
Databases are widely used. Here are some representative applications.
 Banking
 Airlines
 Universities
 Credit card transactions
 Telecommunication
 Finance
 Sales
 On-line retailers
 Manufacturing
 Human resources
4. What is the purpose of data base system? (U) (NOV 2014)
The primary goal of a DBMS is to provide an environment that is both convenient and
efficient for people to storing and retrieving information.

5. What are problems caused by redundancy?(Nov/Dec-2017)


 The first is that storing values multiple times wastes space.
 The second problem is that when a field value changes, multiple occurrences need
to be updated.
 The third problem occurs if we forget to change the values in any of the records.
The database would then have inconsistent data.

6. Describe the three levels of data abstraction? (R)(Nov/Dec-2017)


There are three levels of abstraction:
 Physical level
 Logical level
 View level

7. What is a physical level? (R)
The lowest level of abstraction describes how data are stored.

8. What is a logical level? (R)


The next higher level of abstraction, describes what data are stored in database and
what relationship among those data.

9. What is a view level? (R)


The highest level of abstraction describes only part of entire database.

10. What is a schema? (R)(MAY 2007)


The structure of a database system, described in a formal language supported by the database
management system (DBMS), In a relational database, the schema defines the tables, the fields in
each table, and the relationships between fields and tables. Schemas are generally stored in a data
dictionary.

11. What are the different types of schemas? (R)


 Physical Schema
 Logical Schema
 External Schema

12. What is an instance? (R)


The collection of information stored in the database at a particular moment is called an
instance of the database.
13. What do u mean by data model?(U) (MAY 2011)
Data Model can be defined as an integrated collection of concepts for describing and
manipulating data, relationships between data, and constraints on the data in an organization.

14. What are the different data models? (R) (MAY 2012)
 Relational Data Model
 The Entity-Relationship Data Model
 Object-Based Data Model
 Semi structured Data Model

15. What is a relational model? (R) (MAY 2010)


The Relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique Value.

16. What is an entity-relationship model? (R) (MAY 2008)


This data model is based on real world that consists of basic objects called entities and of
relationship among these objects. Entities are described in a database by a set of attributes.
17. What is object based data model? (R)
This model is based on collection of objects. An object contains values stored in instance
variables with in the object. An object also contains bodies of code that operate on the object. These
bodies of code are called methods. Objects that contain same types of values and the same methods
are grouped together into classes.

18. What is a semi structured data model? (R)


The semi structured data model permits the specification of data where individual data items of
the same type may have different sets of attributes.

19. What is Meta data? (R) (MAY 2008)


Data dictionary contains Metadata-that is, data about data.

20. What is the use of data dictionary? (R)


The DDL gets input as some instructions (statements) and generates some output. The output
of the DDL is placed in the data dictionary.

21. Define Relational Database. (R)


A relational database is based on the relational model and uses a collection of tables to
represent both data and the relationships among those data. It also includes a DML and DDL.

22. What is data base design ? (R)


Database design mainly involves the design of the database schema. The design of a
complete database application environment that meets the needs of the enterprise being modeled
requires attention to a broder set of issues.

23. What is physical data independence? (R)


Physical data independence deals with hiding the details of the storage structure from user
applications. The application should not be involved with these issues, since there is no difference
in the operation carried out against the data.

24. What are different access types in DML?(U)


The types of access are
 Retrieval of information stored in the database
 Insertion of new information into the database
 Deletion of information from the database
 Modification of information stored in the database

25. What is a procedural DML? (R)


Procedural DMLs require a user to specify what data are needed and how to get those data.
26. What is declarative DML? (R)
Declarative DMLs also referred to as nonprocedural DMLs require a user to specify what
data are needed without specifying how to get those data.

27. Define Assertions(R)


An assertion is any condition that the database must always satisfy.

28. What is storage manager? (R)


A storage manager is a program module that provides the interface between the low level
data stored in the database and the application programs and queries submitted to the system.

29. What are the components of storage manager? (U)


 Authorization and integrity manager
 Transaction manager
 File manager
 Buffer manager
30. List out the data structure used to implement the storage manager. (R)
 Data files
 Data dictionary
 Indices

31. What is the use of embedded SQL? (U) (MAY 2012) (NOV 2014)
A fundamental principle underlying embedded SQL, which we call the dual-mode
principle, is that any SQL statement that can be used interactively can also be embedded in an
application program.
32.Write short note on OPEN, FETCH and CLOSE statements. (R)
OPEN:
EXEC SQL OPEN < Cursor name>;
Opens the specified cursor. A set of rows is thus identified and becomes the current active set for
the cursor.
FETCH:
EXEC SQL FETCH <Cursor name>
INTO <host variable reference
commalist>; Advances the specified cursor to the next
row in the active set.
CLOSE:

EXEC SQL CLOSE <Cursor name>;


Closes the specified cursor. The cursor now has no current active set.

33. Discuss about dynamic SQL. (R) (NOV 2014)


Dynamic SQL is a part of embedded SQL. It consists of a set of ―dynamic statements‖-
which themselves are compiled ahead of time- whose purpose is precisely to support the
compilation and execution of regular SQL statements that are constructed at run time.
34. What is a candidate key? (R)
Let K be a set of attributes of relvar R. Then K is a candidate key for R if and only if it
has both of the following properties: Uniqueness and Irreducibility

35. What is a primary key? (R)


It is possible for a given relation to have two or more candidate keys. In such a case, the
relational model has historically required that exactly one of those keys be chosen as
the primary key

36. What is an alternate key? (R)


It is possible for a given relation to have two or more candidate keys. In such a case, the
relational model has historically required that exactly one of those keys be chosen as the
primary key and the others are then called alternate keys.

37. What is a foreign key? (R)


A foreign key is a set of attributes of some relvar R2 whose values are required to match
values of some candidate key of some relvar R1.

38. What is a trigger? (R)


Triggered procedures are precompiled procedures that are stored along with the database and
invoked automatically whenever some specified event occurs.

39. What are the relational operators? (R)


Union, Intersect, Difference, Product, Select, Project, Join, Divide are the relational operators.

40. Define Cartesian product. (R)


Cartesian product of two relations a and b, a TIMES b, where a and b have no common
attribute names, to be a relation with a heading that is the union of the headings of a and b and with
a body appearing in a and a tuple appearing in b.

41. What is a range variable? (R)


A range variable is a variable that ―ranges over‖ some specified relation. If range
variable V ranges over relation r, then, at any given time, the expression ―V‖ denotes some
tuple of r.

42. Discuss about query-by-example? (U)


Query-By-Example syntax, which is attractive and intuitively very simple, is based on the
idea of making entries in blank tables.

43. What do you mean by type constraints? (R)


A type constraint is, precisely, a definition of the set of values that constitute a given type.
44. What do you mean by database constraints? (R)
A database constraint is a constraint on the values a given database is permitted to assume.

45. What do you mean by relation constraints? (R)


A relation constraint is a constraint on the values a given relvar is permitted to assume.
46. What do you mean by attribute constraints? (R)
An attribute constraint is a constraint on the values a given attribute is permitted to assume.

47. What is referential integrity? (R)


Referential integrity database must not contain any unmatched foreign key values.

48. What is a transition constraint? (R)


A transition constraint is a constraint on the legal transitions that a given
variable-in particular, a given relvar or a given database-can make from one value
to another.

49. Give the usage of the rename operation with an example.(MAY 2010) (U)

RA _S1.Name;S2.Name(_S1.Address = S2.Address(_S1(Students) _ _S2(Students))).

50. What are the desirable properties of decomposition?(Apr/May-2017)

1.LosslessDecomposition
2.DependencyPreservation
3. Lack of Data Redundancy

51. Distinguish between Key and Super Key? (Apr/May-2017)

Key A key is a single or combination of multiple fields. Its purpose is to access or retrieve

data rows from table according to the requirement.

A superkey is a combination of attributes that can be uniquely used to identify a database

record. A table might have many superkeys.


UNIT II
DATABASE DESIGN
Entity-Relationship model – E-R Diagrams – Enhanced-ER Model – ER-to-Relational
Mapping – Functional Dependencies – Non-loss Decomposition – First, Second, Third
Normal Forms, Dependency Preservation – Boyce/Codd Normal Form – Multi-valued
Dependencies and Fourth Normal Form – JoinDependencies and Fifth Normal Form
Fifth Normal Form
PART A
• What is entity? (R)
It is a 'thing' in the real world with an independent existence.

• What is relationship set? (R)


The collection (or set) of similar relationships.

• What is entity set? (R)


It is a collection of all entities of particular entity type in the database.

• What is week entity? (R) (NOV 2008)


(NOV 2016)
An entity set may not have sufficient attributes to form a primary key, and its primary
key compromises of its partial key and primary key of its parent entity, then it is said to be
Weak Entity set.
• What is mapping cardinality? (R)
Mapping cardinality express the number of entities to which another rntity can be
associated via a relationship set.

• What is functional dependence? (U) (Apr/May2015)


A Functional dependency is denoted by X Y between two sets of attributes X and Y
that are subsets of R specifies a constraint on the possible tuple that can form a relation state r
of R. The constraint is for any two tuples t1 and t2 in r if t1[X] = t2[X] then they have t1[Y] =
t2[Y]. This means the value of X component of a tuple uniquely determines the value of
component Y.

• What is object oriented data model? (R)


The object –oriented data model is another data model that has seen increasing attention.
The object-oiented model can be seen as extending the E-R model with notions of encapsulation,
methods and object identity.
• What do you mean by functional dependencies? (U) (MAY 2010)
(nov2013)(Dec2015)
Let R be a relation variable and let X and Y be arbitrary subsets of the set of attributes of
R. then we say that Y is functionally dependent on-in symbols,XY (read ―X functionally
determines Y,‖ or simply ― X arrow Y ‖)- if and only if in every possible legal value of R, each X
Value has associated with it precisely one Y value.

• What is a trivial dependency? (R) (MAY 2012)(nov 2013)


A dependency is trivial if it cannot possibly fail to be satisfied.

• What is a nontrivial dependency? (R) (MAY 2012)


Nontrivial dependencies are the ones that correspond to ―genuine‖ integrity constraints.

• Define Heath‟s theorem. (R)


Let R{A,B,C}be a relvar, where A,B, and C are sets of attributes. If R satisfies the FD
A B, then R is equal to the join of its projections on {A,B} and {A,C}.

• Define first normal form.(R)


A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains
exactly one value for each attribute.

• Define second normal form.(R)


A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly
dependent on the primary key.

• Define third normal form. (R)


A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively
dependent on the primary key.
• What is dependency preservation? (R)
Normalization procedure should decompose relvars into projections that are independent in Rissanen‗s
sense has come to be known as dependency preservation.

• Define Boyce/Codd normal form. (R) (MAY 2012)


A relvar is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate
key as its determinant (OR) A relvar is in BCNF if and only if every determinant is a candidate
key.
• What is multi-valued dependence? (R)
Let R be a relvar, and let A, B and C be subsets of the attributes of R. Then we say that B
is multi-dependent on A-in symbols, AB

• Define Fagin theorem. (R)


Let R{A,B,C} be a relvar, where A, B, and C are sets of attributes. Then R is equal to the
join of its projections on {A, B} and {A, C} if and only if R satisfies the MVDs AB|C.

• Define fourth normal form. (R)


Relvar R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of
R such that the nontrivial MVD AB is satisfied, then all attributes of R are also functionally
dependent on A.
• What is a join dependency? (R)
Let R be a relvar, and let A, B,…. .Z be subsets of the attributes of R. Then we say that R
satisfies the JD *{A, B …Z} if and only if every legal value of R is equal to the join of its
projections on A, B,……Z.

• Define fifth normal form. (R)


A relvar R is in 5NF-also called projection-join normal form(PJ/NF)-if and only if every
nontrivial join dependency that is satisfied by R is implied by the candidate key(s) of R, where

 The join dependency *{A, B,…….Z} on R is trivial if and only if


atleast one of A, B,…Z is the set of all attributes of R.

 The join dependency *{A, B,…….Z} on R is implied by the candidate


 key(s) of R if and only

• What is denormalization? (R)


Let R1, R2,….., Rn be a set of relvars. Then denormalizing those relvars means replacing
them by their join R, such that for all i (i=1, 2,….., n) projecting R over the attributes of Ri is
guaranteed to yield Ri again.

• What is normalization? (R) (MAY 2010)


Normalization is the process of structuring relational database schema such that most
ambiguity is removed.

• What are the aspects of relational model?(R)


The Three aspects of relational model are:
• Structural aspect: The data in the database is perceived by the user as tables, and nothing but
tables.
• Integrity aspect: Those tables satisfy certain integrity constraints.
• Manipulative aspect: The operators available to the user for manipulating those tables- for
example, for purposes of data retrieval – are operators that derive tables from tables. Of those
operators, three particularly important ones are select, project and join.
• Mention the important points about relational databases. (R)
A set of important points about relational databases:
 Relational databases store data in the form of tables (logically).
 The rows of a table are called as tuples.
 The columns of a table are known as attributes.
 Every attribute has a data type associated with it.
 Every attribute has a domain which provides the set of all possible
values that can be stored as values for that attribute.
 Tables are called as relations.
 The table names are called as relational variable
• What is a catalog? (R)
The catalog contains detailed information, sometimes called descriptor information or
Metadata, regarding the various objects that are of interest to the system itself.
• What do you mean by weak entity set? (MAY 2010) (R)
A weak entity is an entity that cannot be uniquely identified by its attributes alone;
therefore, it must use a foreign key in conjunction with its attributes to create a primary key. The
foreign key is typically a primary key of an entity it is related to.

• What is the difference between Data Integrity and Data Security? (NOV 2013) (U)
Data integrity and data security are two different aspects that make sure the usability of
data is preserved all the time. Main difference between integrity and security is that integrity deals
with the validity of data, while security deals with protection of data. Backing up, designing
suitable user interfaces and error detection/correction in data are some of the means to preserve
integrity, while authentication/authorization, encryptions and masking are some of the popular
means of data security. Suitable control mechanisms can be used for both security and integrity.
• Which operators are called as unary operators and why are they called so
An operator that takes a single operand in an expression or a statement. The unary operators in
C# are +, -,!, ~, ++, -- and the cast operator.

• What is an Entity? (R)


It is a 'thing' in the real world with an independent existence.

• What is a query? (NOV2013) (R)


A query is a statement requesting the retrieval of information.

• Define query language? (R)


A query language is a language in which user requests information from the database.
UNIT III
TRANSACTIONS
Transaction Concepts – ACID Properties – Schedules – Serializability – Transaction support
in SQL – Need for Concurrency – Concurrency control –Two Phase Locking- Timestamp –
Multiversion – Validation and Snapshot isolation– Multiple Granularity locking – Deadlock
Handling – Recovery Concepts – Recovery based on deferred and immediate update – Shadow
paging – ARIES Algorithm

PART – A

• Define recovery in a database system.(R)

Recovery in a database system means, primarily, recovering the database itself; that is,
restoring the database to a correct state after some failure has rendered the current state
incorrect, or at least suspect.

• What is a transaction? (MAY 2010) (R)

A transaction is a logical unit of work; it begins with the execution of a BEGIN


TRANSACTION operation, and ends with the execution of a COMMIT or ROLLBACK
operation.

• What is a commit point? (R)

A commit point corresponds to the end of a logical unit of work, and hence to a point at
which the database is supposed to be in a correct state.

• Discuss the ACID properties. (R) (MAY 2010) (NOV 2014)(Dec2015)


ACID properties are

 Atomicity: Transactions are atomic


 Correctness: Transactions transform a correct state of the database into
another correct state, without necessarily preserving correctness at all
intermediate points
 Isolation: Transactions are isolated from one another. That is, even though in
general there will be many transactions running concurrently, any given
transaction‗s updates all connected from all the rest, until that transaction
commits. Another way of saying the same thing is that, for any two distinct
transactions A and B , A might see B‗s updates or B might see A‗s updates,
but not both.
 Durability: Once a transaction commits, its updates persist in the database,
even if there is a subsequent system crash.

• What is a system failure? (R)


System failures (e.g., power outage), which affect all transactions currently in
progress but do not physically damage the database. A system failure is sometimes called a
soft crash.

• What is a media failure? (R)


Media failures (e.g. head crash on the disk), which do cause damage to the database or
some portion thereof, and affect at least those transactions currently using that portion. A
media failure is sometimes called a hard crash.

• What is a check point record? (R)


The checkpoint record contains a list of all transactions that were in progress at the time the
checkpoint was taken.

• Expand ARIES. (R)


The name ARIES stands for ―Algorithms for Recovery and Isolation Exploiting
Semantics.‖
• What are the three broad phases of ARIES? (U)
o ARIES operates in three broad phases:

 Analysis: Build the REDO and UNDO lists


 Redo: Start from a position in the log determined in the analysis phase and
restore the database to the state it was in at the time of the crash.
 Undo: Undo the effects of transactions that failed to commit.
• What is a two-phase commit? (U)
Two-phase commit is important whenever a given transaction can interact with several
independent ―resource managers,‖ each managing its own set of recoverable
resources and maintaining its own recovery log.‖

• What is the need for save points? (U)


It might be possible for a transaction to establish intermediate save points while it is
executing, and subsequently to roll back to a previously established save point, if required,
instead of having to roll back all the way to the beginning.
• Define concurrency (MAY 2012) (R)
Concurrency refers to the fact that DBMSs typically allow many transactions to access
the same database at the same time.

• What are the three problems that any concurrency control mechanism must address? (R)
The three problems are:

 The lost update problem


 The uncommitted dependency problem
 The inconsistent analysis problem
• What is the last update problem? (U)
Transaction A retrieves some tuple t at time t1; transaction B retrieves that same tuple t
at time t2; transaction A updates the tuple at time t3; and transaction B updates the same tuple
at time t4; Transaction A‗s update is lost at time t4, because transaction B overwrites it with-
out even looking at it.

• What is the uncommitted dependency problem? (U)


The uncommitted dependency problem arises if one transaction is allowed to retrieve-
or, worse, update-a tuple that has been updated by another transaction but not yet committed
by that other transaction.

• What is deadlock? (NOV 2014) (R)


Deadlock is a situation in which two or more transactions are in a simultaneous wait
state, each of them waiting for one of the others to release a lock before it can proceed.

• What is serializability? (NOV2016) (NOV 2014) (R)


Serializability is the generally accepted ―criterion for correctness‖ for the
interleaved execution of a set of transactions; that is, such an execution is considered to be
correct if and only if it is serializable.

• State the two-phase locking theorem (R)

Two –phase locking theorem is ―If all transactions obey the two-phase locking
protocol, then all possible interleaved schedules are serializable‖.

• State the two-phase locking protocol. (MAY 2012)(NOV 2013) (R)


The two-phase locking protocol is as follows

Before operating on any object a transaction must acquire a lock on that object.After
releasing a lock, a transaction must never go on to acquire any more locks.
• What is an isolation level? (R)
The isolation level that applies to a given transaction might be defined as the degree of
interference the transaction in question is prepared to tolerate on the part of concurrent
transactions.
• What do you mean by phantom problem? (R)
If transactions operate at less than the maximum isolation level is the so-called
phantom problem

• What is an intent locking protocol? (U)


The intent locking protocol, according to which no transaction is allowed to acquire a
lock on a tuple before first acquiring a lock-probably an intent lock on the relvar that contains
it.

• List the four conditions for Deadlock(U) (NOV 2016)


A deadlock situation can arise if and only if the following four conditions
hold simultaneously in a system-

o Mutual Exclusion: At least one resource is held in a non- sharable mode that is
only one process at a time can use the resource. If another process requests that
resource, the requesting process must be delayed until the resource has been
released.

o Hold and Wait: There must exist a process that is holding at least one resource and is
waiting to acquire additional resources that are currently being held by other processes.

o No Preemption: Resources cannot be preempted; that is, a resource can only be


released voluntarily by the process holding it, after the process has completed its
task.

o Circular Wait: There must exist a set {p0, p1,................................pn} of waiting processes
such that p0 is waiting for a resource which is held by p1, p1 is waiting for a resource which
is held by p2, pn-1 is waiting for a resource which is held by pn and pn is waiting for a resource
which is held by p0.

• What is a shadow copy scheme?


It is simple, but efficient, scheme called the shadow copy schemes. It is based o n making
copies of the database called shadow copies that one transaction is active at a time. The scheme also
assumes that the database is simply a file on disk.

• What type of locking needed for insert and delete operations (April/May- 2017)

When you execute an INSERT, UPDATE, or DELETE statement, the database server uses

exclusive locks. An exclusive lock means that no other users can update or delete the item until the

database server removes the lock.


UNITIV

IMPLEMENTATION TECHNIQUES
RAID – File Organization – Organization of Records in Files – Data dictionary Storage – Column
Oriented Storage– Indexing and Hashing –Ordered Indices – B+ tree Index Files – B tree Index
Files – Static Hashing – Dynamic Hashing – Query Processing Overview – Algorithms for
Selection, Sorting and join operations – Query optimization using Heuristics
- Cost Estimation.

PART – A

• What is cache memory?(U)


The cache is the fastest and most costly form of storage. Cache memory is small; its
use is managed by the computer system hardware.
• What is main memory?(U)
The storage medium used for data that are available to be operated on is main
memory. The general-purpose machine instructions operate on main memory. Although main
memory may contain many megabytes of data, or even hundreds of gigabytes of data in large
server systems, it is generally too small for storing the entire database. The contents of main
memory are usually lost if a power failure or system crash occurs.

• What is flash memory? (MAY 2010)(U)


EEPROM (Electrically Erasable Programmable Read Only Memory)

 Data in flash memory survive from power failure


 Reading data from flash memory takes about 10 nano-secs (roughly as fast as
from main memory), and writing data into flash memory is more complicated:
write-once takes about 4-10 microsecs.
 To overwrite what has been written, one has to first erase the entire bank of the
memory. It may support only a limited number of erase cycles (104 to 106).
 It has found its popularity as a replacement for disks for storing small volumes
of data (5- 10 megabytes).
• What is magnetic disk storage?(U)
Primary medium for long-term storage.
 Typically the entire database is stored on disk.
 Data must be moved from disk to main memory in order for the data to be operated
on.
 After operations are performed, data must be copied back to disk if any
changes were made.
 Disk storage is called direct access storage as it is possible to read data on the
disk in any order (unlike sequential access).
 Disk storage usually survives power failures and system crashes
• What is optical storage?(U)
The most popular form of optical storage are CD-ROM (compact – disk read – only
memory), WORM (write-once read-many)disk (for archival storage of data), and Juke
box(containing a few drives and numerous disks loaded on demand).

• What is tape storage?)(U)


Tape storage used for primarily for backup and archival data. Cheaper, but much slower access,
since tape must be read sequentially from the beginning.Used as protection from disk failures.
• Draw the storage device hierarchy. (MAY 2010)(R)

Cache

Main memory

Flash memory

Magnetic disk

Optical disk

Magnetic tapes

• What is SAN architecture?(R)


In the storage area network (SAN) architecture, large numbers of disks are connected by a
high-speed network to a number of server computers.

• What is NAS?(R)
Network attached storage (NAS) is an alternative to SAN. NAS is much like SAN, except
that instead of the networked storage appearing to be a large disk, it provides a file system
interface using networked file system protocols such as NFS or CIFS.

• Define access time(R)


Access time is the time from when a read or write request is issued to when data
transfer begins.

• Define seek time.(R)


To access data on a given sector of a disk, the arm first must move so that it is positioned
over the correct track, and then must wait for the sector to appear under it as the disk rotates. The
time for repositioning the arm is called the seek time.
• Define average seek time.(R)
The average seek time is the average of the seek times, measured over a sequence of
random requests.
• Define rotational latency time.(U)
Once the head has reached the desired track, the time spent waiting for the
sector to be accessed to appear under the head is called the rotational latency time

• Define average latency time.(U)


Average latency time of the disk is one-half the time for a full rotation of the disk.

• Define data-transfer rate.(U)


The data-transfer rate is the rate at which data can be retrieved from or stored to the disk.
• What is RAID? (NOV 2014)(R)
A variety of disk organization techniques, collectively called redundant arrays of
independent disks (RAID), have been proposed to achieve improved performance and reliability.

• What are the factors to be taken into account in choosing a RAID level?(R)
The factors to be taken into account in choosing a RAID level are

 Monetary cost of extra disk-storage requirements


 Performance requirements in terms of number of I/O operations
 Performance when a disk has failed.
 Performance during rebuild(that is, while the date in a failed disk are being
rebuilt on a new disk)
• What is heap file organization?(R)
Any record can be placed anywhere in the file where there is space for the record. There is no
ordering of records. Typically, there is a single file for each relation.

• What is sequential file organization?(R)


Records are stored in sequential order, according to the value of a search key‖ of each record.

• What is hashing file organization?(R)


A hash function is computed on some attribute of each record. The result of the hash function
specifies in which block of the file the record should be placed,

• What is a multitable clustering file organization?(U)


A multitable clustering file organization is a file organization that stores related records of
two or more relations in each block. Such a file organization allows us to read records that would
satisfy the join condition by using one block read.

• What are the two basic kinds of indices?(U)


The Two basic kinds of indices are
 Ordered indices
 Hash indices

• What are access types?(U)


The types of access that are supported efficiently. Access types can include finding records
with a specified attribute vale and finding records whose attribute values fall in a specified range.
• What is access time?(U)
The time it takes to find a particular data item, or set of items, using the technique
in question.

• What is insertion time?(U)


The time it takes to insert a new data item. This value includes the time it takes to find
the correct place to insert the new data item, as well as the time it takes to update the index
structure.

• What is deletion time?(U)


The time it takes to delete a data item. This value includes the time it takes to find the
item to be deleted, as well as the time it takes to update the index structure.

• What is space overhead?(U)


The additional space occupied by an index structure. Provided that the amount of
additional space is moderate, it is usually worth-while to sacrifice the space to achieve
improved performance.

• What is a search key? (R)


An attribute or set of attributes used to look up records in a file is called a search key.

• What is an clustering index (or) primary index? (NOV 2014)(U)

Clustering index is an index whose search key also defines the sequential
order of the file.

• What is an nonclustering index (or) secondary index?(U)


Indices whose search key specifies an order different from the sequential order of the file
are called nonclustering indices.

• What is an index-sequential file?(U)


All files are ordered sequentially on some search key. Such files, with a clustering index
on the search key, are called index-sequential files.

• What is a dense index? (U) (MAY 2012)


An index record appears for every search-key value in the file. In a dense clustering index,
the index record contains the search-key value and a pointer to the first data record with that
search-key value. The rest of the records with the same search-key value would be stored
sequentially after the first record, since, because the index is a clustering one, records are stored
on the same search key. Dense index implementations may store a list of pointers to all records
with the same search-key value; doing so is not essential for clustering indices.
• What is a sparse index? (U) (MAY 2012)
An index record appears for only some of the search-key values. as is true in dense indices,
each index record contains a search-key value and a pointer to the first data record with that
search-key value. To locate a record, we find the index entry with the largest search-key value that
is less than or equal to the search-key value for which we are looking. We start at the record
pointed to by that index entry, and follow the pointers in the file until we find the desired record.

• What is a multilevel indices?(U)


Indices with two or more levels are called multilevel indices.

• What is a balanced tree?(U)


Balanced tree in which every path from the root of the tree to the leaf of the tree is of
the same length.
• Write the querying a B+ tree.(AP)

o Procedure find (value V)


Set C = root node
While C is not a leaf node
begin Let Ki= smallest
search-key value, if any,
greater than V If
there is no such value
then begin
Let m = the number o pointers in the node
Set C= node pointed to by Pm
End
End
Else
set C= the node pointed to by Pi
If there is a key value
Ki =V Then pointer Pi
directs us to the desired
record or bucket Else
no record with key
value k exists.

• Define hashing.(R)
Hashing allow us to avoid accessing an index structure. It provides a way of constructing
indices.

• Define a bucket.(R)
The term bucket to denote a unit of storage that can store one or more records.
A bucket is typically a disk block, but could be chosen to be a smaller or larger than a
disk

• Define hash function.(U)


A hash function h is a function from K to B. Let h denote a hash function

• What are the two different purposes of hashing?(R)


Hashing can be used for two different purposes.

 Hash file organization


 Hash index organization

• What is a hash index?(R)


A hash index organizes the search keys, with their associated pointers, into a hash file
structure.

• What is dynamic hashing? (R) (NOV 2014)


Dynamic hashing techniques allow the hash function to be modified dynamically to
accommodate the growth or shrinkage of the database.

• What are the steps involved in query processing?(R)


The steps involved in processing a query are

 Parsing and translation


 Optimization
 Evaluation
• Write the nested-loop join algorithm.(U)
 For each tuple tr in r do begin

o For each tuple ts in s do


begin Test pair (tr, ts) to see
if they
 satisfy the join condition If
 they do, add tr.ts
to the result
 En d
 En d

• Write the block nested-loop join and its algorithm.(U)


Block nested-loop join, which is a variant of the nested – loop join where every block of the inner
relation is paired with every block of the outer relation.For each block Br of r do begin For each block Bs
of s do begin For each tuple tr in Br do begin For each tuple ts in Bs do begin Test pair (tr, ts) to see if they
satisfy the join condition If they do, add tr.ts to the resuld En dEn dEn d
• What is an indexed nested-loop join?(U)
Indexed nested loop join can be used with existing indices, as well as with temporary indices
created for the sole purpose of evaluating the join.

• What is a merge join?(U)


The merge-join algorithm can be used to compute natural joins and equi-joins.

• What is a hash-join?(U)
The hash-join algorithm can be used to implement natural joins and equi-joins.

• What are the contents of DBMS catalog?(R)


The DBMS catalog stores the following statistical information about database relations:

 nr,the number of tuples in the relation r


 br, the number of blocks containing tuples of relation r
 lr, the size of a tuple of relation r in bytes
 fr,the blocking factor of relation r-that is,the number of tuples of relation r
that fit into one block
 V (A,r),the number of distinct values that appear in the relation r for attribute
A. This value is the same as the size of (r).If A is a key for relation r,V(A,r)is
nr
• What is a slotted page sheet.draw the diagram(R) (NOV 2013)
 Record id = <page id, slot #>

 Can move records on page without changing rid; so,


attractive for fixed-length records too. Page is full when
data . . space 2Rid a
Rid =

nd slot array meet.


• What is a histogram?(U)
In histogram the values for the attribute are divided into a number of ranges, and with each
range the histogram associates the number of tuples whose attribute value lies in that range.

• What is an index record?(U)


An index record, or index entry, consists of a search – key vale and pointers to one or
more records with that value as their search-key value.

• Define mean time to failure(MTTF)(R)


Mean time to failure is a measure of the reliability of the disk. The mean time to failure of
a disk is the amount of time that, on average, we can expect the system to run continuously
without any failure.
• What is mirroring?(R)
The simplest approach to introducing redundancy is to duplicate every disk. This
technique is called mirroring.

• How dynamic hashing differ from static hashing? (U) DEC2015)


o Static Hashing
 A bucket is a unit of storage containing one or more records (a bucket is
typically a disk block).
 The file blocks are divided into M equal-sized buckets, numbered bucket0,
bucket1... bucketM- 1.Typically, a bucket corresponds to one (or a fixed
number of) disk block.

 In a hash file organization we obtain the bucket of a record directly from


its search-key value using a hash
o Dynamic Hashing

 Good for database that grows and shrinks in size


 Allows the hash function to be modified dynamically
• Extendable hashing – one form of dynamic hashing
 Hash function generates values over a large range —typically b-bit integers, with
b = 32.
 At any time use only a prefix of the hash function to index into a table of bucket
addresses.
• Disadvantages of B TREE over B+ TREE(R)(NOV 2016)
 In a B tree search keys and data stored in internal or leaf nodes. But
in B+- tree data store only leaf nodes.

 Searching any data in a B+ tree is very easy because all data are found
in leaf nodes. In a B tree, data cannot be found in leaf nodes.

 In a B tree, data may be found in leaf nodes or internal nodes.


Deletion of internal nodes is very complicated. In a B+ tree, data is
only found in leaf nodes. Deletion of leaf nodes is easy.

 Insertion in B tree is more complicated than B+ tree.

 B+ trees store redundant search key but B tree has no redundant value.

 In a B+ tree, leaf nodes data are ordered as a sequential linked list but
in B tree the leaf node cannot be stored using a linked list. Many
database systems' implementations prefer the structural simplicity of
a B+ tree.
• What is a query execution plan?(Apr/May-2017)
 Execution plan will be generated by Query optimizer with the help of
statistics and Algebrizer\processor tree.
 It is the result of Query optimizer and tells how to do\perform your work\
requirement.
 There are two different execution plans - Estimated and Actual.
 Estimated execution plan indicates optimizer view.
 Actual execution plan indicates what executed the query and how was it done.
• Which cost component are used most often as the basis for cost function? (Apr/May-
2017)
 Access cost to secondary storage
 Memory usage of cost
• What is replication transparency?(Apr/May-2017)(U)
Distribution Transparency. Distribution transparency is the property of distributed
databases by the virtue of which the internal details of the distribution are hidden from the users.
The DDBMS designer may choose to fragment tables, replicate the fragments and store them at
different sites.
• State the Need for Query Optimization.

 Process of selecting an efficient execution plan for evaluating the query.


 refers to the process of finding lowest cost method of evaluating a given query.
 Need for being logically consistent because the least cost plan will always be
consistently low.
UN I T V
ADVANCED TOPICS
Unit-V Advanced Topics
Distributed Databases: Architecture, Data Storage, Transaction Processing, Query
processing and optimization – NOSQL Databases: Introduction – CAP Theorem –
Document Based systems – Key value Stores – Column Based Systems – Graph
Databases. Database Security: Security issues – Access control based on privileges – Role
Based access control – SQL Injection – Statistical Database security – Flow control –
Encryption and Public Key infrastructures – Challenges
PART-A

1. What are the different aspects of security problem?(R)


There are many aspects to the security problem. Here are some of them:
i. Legal, social, and ethical aspects
ii. Physical controls
iii. Policy questions
iv. Operational problems
v. Hardware
controls vi.Operational
system support
and finally
Issues that are the specific concern of the database system itself.

2. What is discretionary access control?(U)


In the case of discretionary control, a given user will typically have
different access rights on different objects; further, there are few inherent
limitations regarding which users can have which rights on which objects.
Discretionary schemes are thus flexible.

3. What is mandatory access control?(U)


In the case of mandatory control, by contrast, each data object is
labeled with a certain classification level, and each user is given a certain
clearance level.

4. What are the contents of audit trail record?(R)


A typical audit trail record might contain the following information:

 Request(source text)
 Terminal from which the operation was invoked
 User who invoked the operation
 Date and time of the operation
 Relvar(s), tuples(s),attribute(s) affected
 Before images(old values)
 After images(new values)
5. What is entity integrity?(U)
No component of the primary key of any base relvar is allowed to
accept nulls are called entity integrity.

6. What are views for?(U)


There are many reasons why view support is desirable. Here are some of
them:

 Views provide a shorthand or ―macro‖ capability.


 Views allow the same data to be seen by different users in
different ways at the same time.
 Views provide automatic security for hidden data.
 Views can provide logical data independence.
7. Define the golden rule.(R)
Golden Rule is
No update operation must ever assign to any relvar a value that
causes its relvar predicate to evaluate to FALSE.
Or (a little loosely):
No relvar must ever be allowed to violate its own predicate.

8. Define a distributed database system.(U)


A distributed database system consists of a collection of sites,
connected together via some kind of communications network, in which:
a. Each site is a full database system site in its own right but
b. The sites have agreed to work together so that a user at any site
can access data anywhere in the network exactly as if the data
were all stored at the user‗s own site.

9. Define a distributed database management system.(R)(NOV 2016)


A new software component at each site logically an extension of the
local DBMS provides the necessary partnership functionality, and it is the
combination of these new components together with the existing DBMSs that
constitutes what is usually called the distributed database management
system

10. What are the advantages of distributed databases?(U)


It enables the structure of the database to mirror the structure of the
enterprise-local data can be kept locally, where it most logically belongs-
while at the same time remote data can be accessed when necessary.
11. What is the fundamental principle of distributed database?(U)
The fundamental principle of distributed database is to the user, a
distributed system should look exactly like a non-distributed system.

12. What are the objectives of distributed databases?(R)


The objectives of distributed databases are
1) Local autonomy
2) No reliance on a central site
3) Continuous operation
4) Location independence
5) Fragmentation independence
6) Replication independence
7) Distributed query processing
8) Distributed transaction management
9) Hardware independence
10) Operating system independence
11) Network independence
12) DBMS independence
13. What are the problems of distributed databases?(R)
The problems are
 Query processing
 Catalog management
 Update propagation
 Recovery
 Concurrency
14. Define client/server systems.(R)
A client/server system is a distributed system in which
a) Some sites are client sites and some are server sites
b) All data resides at the server sites
c) All applications execute at the client sites, and
d) ―the seams show
15. What is OODBMS ?(U)
Object-oriented database management systems (OODBMSs) combine
database capabilities with object-oriented programming language
capabilities. OODBMSs allow object-oriented programmers to develop
the product, store them as objects, and replicate or modify existing
objects to make new objects within the OODBMS. Because the database
is integrated with the programming language, the programmer can
maintain consistency within one environment, in that both the OODBMS
and the programming language will use the same model of representation.
16. What is meant by Data warehousing? (NOV 2014)(U)
A data warehouse is a relational database that is designed for query and
analysis rather than for transaction processing. It usually contains historical data
derived from transaction data, but it can include data from other sources. It
separates analysis workload from transaction workload and enables an
organization to consolidate data from several sources.

17. What is meant by data mining? (NOV 2014)(R)


The overall goal of the data mining process is to extract
information from a data set and transform it into an understandable
structure for further use.

18. What is meant by information retrieval?(R)


An information retrieval process begins when a user enters a query into
the system. Queries are formal statements of information needs, for example
search strings in web search engines. In information retrieval a query does not
uniquely identify a single object in the collection. Instead, several objects may
match the query, perhaps with different degrees of relevancy.

19. What is meant by Relevancy ranking? (NOV 2014)(U)


Relevancy ranking is the process of sorting the document results so that those
documents which are most likely to be relevant to your query are shown at
the top.
20. Define Crawling. (NOV 2014)(R)
Web crawling is the process of search engines combing through web
pages in order to properly index them. These ―web crawlers‖ systematically
crawl pages and look at the keywords contained on the page, the kind of content,
all the links on the page, and then returns that information to the search engine‗s
server for indexing. Then they follow all the hyperlinks on the website to get to
other websites. When a search engine user enters a query, the search engine will
go to its index and return the most relevant search results based on the keywords
in the search term. Web crawling is an automated process and provides quick, up
to date data.

21. What is meant by XML Database?(R)


An XML database is a data persistence software system that allows data
to be specified, and sometimes stored, in XML format. These data can then be
queried, transformed, exported and returned to a calling system.

22. Define Data classification.(U)


Data classification is the process of organizing data into categories
for its most effective and efficient use.

23. What is meant by Access Control? (NOV 2014)(R)


In the fields of physical security and information security, access control
is the selective restriction of access to a place or other resource. The act of
accessing may mean consuming, entering, or using. Permission to access a
resource is called authorization. Locks and login credentials are two analogous
mechanisms of access control.
24. What is a threat in dbms?(U)
When the data is been accessed by many people, the chances of data theft
increases. In the past, database attacks were prevalent, but were less in number
as hackers hacked the network more to show it was possible to hack and not to
sell proprietary information. Another reason for database attacks is to gain
money selling sensitive information, which includes credit card numbers, Social
Security Numbers, etc.
25. What is Cryptography?(U)
Cryptography is a method of storing and transmitting data in a particular
form so that only those for whom it is intended can read and process it. The
term is most often associated with scrambling plaintext (ordinary text,
sometimes referred to as cleartext) into ciphertext (a process called encryption),
then back again (known as decryption).
26. Define Statistical database.(R)
A statistical database is a database used for statistical analysis
purposes. It is an OLAP (online analytical processing), instead of OLTP
(online transaction processing) system. Modern decision, and classical
statistical databases are often closer to the relational model than the
multidimensional model commonly used in OLAP systems.

27. Define Clustering in DBMS. (NOV 2014)(R)


Clustering, in the context of databases, refers to the ability of several
servers or instances to connect to a single database. An instance is the collection
of memory and processes that interacts with a database, which is the set of
physical files that actually store data.

28. Write about the four types (Star, Snowflake, Galaxy and Fast
constellation) of Data warehouse schemas.(DEC2015)(R)
1. STAR SCHEMA: Centralized Fact table connect the one or more denormalized data

2. SNOW FLAKE SCHEMA: Centralized Fact table connect the one or


more normalized data

3. STAR FLAKE SCHEMA:One or more centralized fact table connect the


single denormolized data

4. Fact Constallation Schema : For each star schema it is possible to construct


fact constellation schema(for example by splitting the original star
schema into more star schemes each of them describes facts on another
level of dimension hierarchies). The fact constellation architecture contains
multiple fact tables that share many dimension tables.

29. What is Association rule mining?(R)

Association rules are if/then statements that help uncover relationships


between seemingly unrelated data in a relational database or other information
repository. An example of an association rule would be "If a customer buys a
dozen eggs, he is 80% likely to also purchase milk."

Applications:

 Cross-Marketing
 Basket Data Analysis
 Catalog design

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy