337 Lecture-01
Lecture-1
Introduction to the Database Systems
Introduction
• A database-management system (DBMS) is a collection of interrelated data and a set
of programs to access those data.
• The collection of data, usually referred to as the database, contains information
relevant to an enterprise.
• The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.
• Database systems are designed to manage large bodies of information.
• Management of data involves both defining structures for storage of information and
providing mechanisms for the manipulation of information.
• In addition, the database system must ensure the safety of the information stored,
despite system crashes or attempts at unauthorized access.
• Because information is so important, a large body of concepts and techniques has
been developed for managing data.
Database-System Applications
• The earliest database systems arose in the 1960s in response to the computerized
management of commercial data. Those earlier applications were relatively simple
compared to modern database applications.
• Modern applications include highly sophisticated, worldwide enterprises.
• All database applications, old and new, share important common elements.
• The central aspect of the application is not a program performing some calculation,
but rather the data themselves.
• Database systems are used to manage collections of data that are:
o highly valuable,
o relatively large, and
o accessed by multiple users and applications, often at the same time.
Database-System Applications – cont.
• The first database applications had only simple, precisely formatted, structured data.
• Today, database applications may include data with complex relationships and a more
variable structure.
• As an example of an application with structured data, consider a university’s records
regarding courses, students, and course registration.
• The university keeps the same type of information about each course: course-
identifier, title, department, course number, etc., and similarly for students: student-
identifier, name, address, phone, etc.
• Course registration is a collection of pairs: one course identifier and one student
identifier.
• Information of this sort has a standard, repeating structure and is representative of
the type of database applications that go back to the 1960s.
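The university example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are assumptions made up for the sketch and do not come from the lecture. It shows course registration stored as (student identifier, course identifier) pairs.

```python
import sqlite3

# In-memory database; all table names, column names, and rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT,
        dept      TEXT
    );
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,
        name       TEXT
    );
    -- Each registration is one (student_id, course_id) pair.
    CREATE TABLE registration (
        student_id TEXT,
        course_id  TEXT,
        PRIMARY KEY (student_id, course_id)
    );
""")
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [("C101", "Databases", "CS"), ("C102", "Algorithms", "CS")])
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("S1", "Ada"), ("S2", "Alan")])
conn.executemany("INSERT INTO registration VALUES (?, ?)",
                 [("S1", "C101"), ("S1", "C102"), ("S2", "C101")])


def roster(course_id):
    """Return the names of the students registered for a course, sorted by name."""
    rows = conn.execute(
        "SELECT s.name FROM registration r "
        "JOIN student s ON s.student_id = r.student_id "
        "WHERE r.course_id = ? ORDER BY s.name",
        (course_id,),
    )
    return [name for (name,) in rows]


print(roster("C101"))  # ['Ada', 'Alan']
```

Every row of a table has the same shape, which is the "standard, repeating structure" the slide refers to.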
Database-System Applications – cont.
• Contrast this simple university database application with a social-networking site.
• Users of the site post varying types of information about themselves ranging from
simple items such as name or date of birth, to complex posts consisting of text,
images, videos, and links to other users.
• There is only a limited amount of common structure among these data.
• Both of these applications, however, share the basic features of a database.
• Modern database systems exploit commonalities in the structure of data to gain
efficiency but also allow for weakly structured data and for data whose formats are
highly variable.
• As a result, a database system is a large, complex software system whose task is to
manage a large, complex collection of data.
This course aims to teach the design and implementation of relational databases (!)
Databases form an essential part not only of every enterprise but
also of a large part of a person’s daily activities.
Lecture-1 COMP337 by Dr. Ferhun Yorgancıoğlu
Database Interaction
• The ways in which people interact with databases have changed over time.
• Early databases were maintained as back-office systems with which users interacted
via printed reports and paper forms for input.
• As database systems became more sophisticated, better languages were developed
for programmers to use in interacting with the data, along with user interfaces that
allowed end users within the enterprise to query and update data.
• Today, virtually every enterprise employs web applications or mobile applications to
allow its customers to interact directly with the enterprise’s database, and, thus, with
the enterprise itself.
• For instance, when you read a social-media post, or access an online bookstore and
browse a book or music collection, you are accessing data stored in a database.
• When you enter an order online, your order is stored in a database. When you access
a bank web site and retrieve your bank balance and transaction information, the
information is retrieved from the bank’s database system. When you access a web
site, information about you may be retrieved from a database to select which
advertisements you should see.
• Almost every interaction with a smartphone results in some sort of database access.
Purpose of Database Systems – cont.
• Keeping organizational information in a file-processing system has a number of major
disadvantages:
o Data redundancy and inconsistency: data are stored in multiple file formats, resulting in
duplication of information in different files
o Data isolation
Multiple files and formats
o Integrity problems
Integrity constraints (e.g., account balance ≥ 0) become “buried” in program code rather
than being stated explicitly
Hard to add new constraints or change existing ones
o Security problems
Hard to provide user access to some, but not all, data
These difficulties, among others, prompted both the initial development of database
systems and the transition of file-based applications to database systems, back in the
1960s and 1970s.
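As a hedged illustration of the integrity-problem point, the sketch below states the "account balance ≥ 0" constraint explicitly in the table definition instead of burying it in program code. It uses Python's sqlite3 module. The account table, the withdraw and balance helpers, and the sample data are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint is declared once, in the schema, rather than in every program.
conn.execute("""
    CREATE TABLE account (
        account_id TEXT PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-1', 100)")
conn.commit()


def withdraw(account_id, amount):
    """Return True if the withdrawal succeeds, False if the constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The database itself refused an update that would make the balance negative.
        return False


def balance(account_id):
    """Return the current balance of an account."""
    row = conn.execute(
        "SELECT balance FROM account WHERE account_id = ?", (account_id,)
    ).fetchone()
    return row[0]


print(withdraw("A-1", 30), balance("A-1"))   # True 70
print(withdraw("A-1", 500), balance("A-1"))  # False 70
```

Changing the rule later means altering one table definition, not hunting through every application program that touches accounts.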
View of Data
• A database system is a collection of interrelated data and a set of programs that allow
users to access and modify these data.
• A major purpose of a database system is to provide users with an abstract view of the
data. That is, the system hides certain details of how the data are stored and
maintained.
o Data models
A collection of conceptual tools for describing data, data relationships, data semantics,
and consistency constraints.
o Data abstraction
Hide from users the complexity of the data structures used to represent data in the
database, through several levels of data abstraction.
Data Models
• Underlying the structure of a database is the data model: a collection of conceptual
tools for describing:
o data,
o data relationships,
o data semantics, and
o consistency constraints.
Relational Data Model
• In the relational model, data are represented in the form of tables.
(Ted Codd, Turing Award 1981)
• Each table has multiple columns, and each column has a unique name.
• Each row of the table represents one piece of information.
Columns (attributes)
Rows (tuples)
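The correspondence between columns and attributes, and between rows and tuples, can be seen in a small sketch using Python's sqlite3 module. The instructor table and its contents are hypothetical sample data, not taken from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each column has a unique name within the table.
conn.execute(
    "CREATE TABLE instructor (id TEXT, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("I1", "Ada", "CS", 90000), ("I2", "Alan", "Math", 80000)],
)

cur = conn.execute("SELECT * FROM instructor ORDER BY id")
attributes = [col[0] for col in cur.description]  # column names
tuples = cur.fetchall()                           # one Python tuple per row

print(attributes)  # ['id', 'name', 'dept_name', 'salary']
for row in tuples:
    print(row)
```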
Data Abstraction
• Managing complexity is challenging, not only in the management of data but in any
domain.
• Key to the management of complexity is the concept of abstraction.
• Abstraction allows a person to use a complex device or system without having to
know the details of how that device or system is constructed.
• A person is able, for example, to drive a car by knowing how to operate its controls.
However, the driver does not need to know how the motor was built nor how it
operates. All the driver needs to know is an abstraction of what the motor does.
• Similarly, for a large, complex collection of data, a database system provides a simpler,
abstract view of the information so that users and application programmers do not
need to be aware of the underlying details of how data are stored and organized.
• By providing a high level of abstraction, a database system makes it possible for an
enterprise to combine data of various types into a unified repository of the
information needed to run the enterprise.
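One form of abstraction that fits in a few lines is a view, which exposes only part of a table to a group of users. The sketch below uses Python's sqlite3 module and assumes a made-up instructor table and a made-up view named instructor_directory. It illustrates the idea only and does not describe how any particular system implements its abstraction levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (
        id        TEXT PRIMARY KEY,
        name      TEXT,
        dept_name TEXT,
        salary    INTEGER
    );
    INSERT INTO instructor VALUES ('I1', 'Ada', 'CS', 90000),
                                  ('I2', 'Alan', 'Math', 80000);
    -- Users of the view never see the salary column or how rows are stored.
    CREATE VIEW instructor_directory AS
        SELECT id, name, dept_name FROM instructor;
""")

cur = conn.execute("SELECT * FROM instructor_directory ORDER BY id")
view_columns = [col[0] for col in cur.description]
view_rows = cur.fetchall()

print(view_columns)  # ['id', 'name', 'dept_name']
print(view_rows)     # [('I1', 'Ada', 'CS'), ('I2', 'Alan', 'Math')]
```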
Instances and Schemas
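• Databases change over time as information is inserted and deleted.
• The collection of information stored in the database at a particular moment is called
an instance of the database; the overall design of the database is called the database
schema.
• The analogy is a program: each variable has a particular value at a given instant, and
the values of the variables at a point in time correspond to an instance of a database
schema.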
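A minimal sketch of the schema/instance distinction, using Python's sqlite3 module: the table definition stays the same while the stored rows change. The student table and the helper functions schema() and instance() are assumptions introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT)")


def schema():
    """Return the stored definition of the student table (its design)."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'student'"
    ).fetchone()[0]


def instance():
    """Return the rows stored in the student table at this moment."""
    return conn.execute(
        "SELECT student_id, name FROM student ORDER BY student_id"
    ).fetchall()


schema_before, instance_before = schema(), instance()
conn.execute("INSERT INTO student VALUES ('S1', 'Ada')")
schema_after, instance_after = schema(), instance()

print(schema_before == schema_after)    # True: the schema did not change
print(instance_before, instance_after)  # [] [('S1', 'Ada')]
```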
Database Languages
• A database system provides a data-definition language (DDL) to specify the database
schema and a data-manipulation language (DML) to express database queries and
updates.
• In practice, the data-definition and data-manipulation languages are not two separate
languages; instead they simply form parts of a single database language, such as the
SQL language.
• Declarative DMLs are usually easier to learn and use than are procedural DMLs.
• However, since a user does not have to specify how to get the data, the database
system has to figure out an efficient means of accessing data.
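The sketch below shows a DDL statement and DML statements side by side, and contrasts a declarative query with a hand-written loop, using Python's sqlite3 module. The instructor table and its rows are made-up sample data. The loop is only an analogy for the procedural style: it spells out how to find the rows, whereas the declarative query states only which rows are wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: specifies the schema.
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")

# DML (update): changes the stored data.
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("Ada", "CS", 90000), ("Alan", "Math", 80000), ("Grace", "CS", 95000)],
)

# DML (query), declarative: state what is wanted; the system decides how to get it.
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM instructor WHERE dept_name = 'CS' ORDER BY name"
    )
]

# Procedural-style analogy: the program spells out how to find the same rows.
procedural = []
for name, dept_name, salary in conn.execute(
    "SELECT name, dept_name, salary FROM instructor"
):
    if dept_name == "CS":
        procedural.append(name)
procedural.sort()

print(declarative)  # ['Ada', 'Grace']
print(procedural)   # ['Ada', 'Grace']
```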
Database Access from Application Programs
• Nonprocedural query languages such as SQL are not as powerful as a universal Turing
machine; that is, there are some computations that are possible using a general-
purpose programming language but are not possible using SQL.
• SQL also does not support actions such as input from users, output to displays, or
communication over the network. Such computations and actions must be written in
a host language, such as C/C++, Java, or Python, with embedded SQL queries that
access the data in the database.
• Application programs are programs that are used to interact with the database in this
fashion. Examples in a university system are programs that allow students to register
for courses, generate class rosters, calculate student GPA, generate payroll checks, and
perform other tasks.
• To access the database, DML statements need to be sent from the host to the
database where they will be executed. This is most commonly done by using an
application-program interface (set of procedures) that can be used to send DML and
DDL statements to the database and retrieve the results.
• The Open Database Connectivity (ODBC) standard defines application program
interfaces for use with C and several other languages. The Java Database Connectivity
(JDBC) standard defines a corresponding interface for the Java language.
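As a rough sketch of the host-language pattern described above, the example below uses Python's built-in sqlite3 module, which plays a role comparable to ODBC or JDBC here but is not either of them. The SQL query retrieves the data through the API, and the GPA arithmetic is done in the host language. The takes table, the grade-point scale, and the gpa function are all assumptions made for illustration.

```python
import sqlite3

# Assumed grade-point scale; a real university may use a different one.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE takes (student_id TEXT, course_id TEXT, credits INTEGER, grade TEXT)"
)
conn.executemany(
    "INSERT INTO takes VALUES (?, ?, ?, ?)",
    [("S1", "C101", 4, "A"), ("S1", "C102", 2, "C"), ("S2", "C101", 4, "B")],
)


def gpa(student_id):
    """Return the credit-weighted GPA of a student, or None if no courses were taken."""
    # The DML statement is sent to the database through the API, with a parameter.
    rows = conn.execute(
        "SELECT credits, grade FROM takes WHERE student_id = ?", (student_id,)
    ).fetchall()
    total_credits = sum(credits for credits, _ in rows)
    if total_credits == 0:
        return None
    # The arithmetic is ordinary host-language code.
    points = sum(credits * GRADE_POINTS[grade] for credits, grade in rows)
    return points / total_credits


print(gpa("S2"))  # 3.0
```

Passing the student identifier as a parameter, rather than pasting it into the SQL string, keeps the statement and the data separate.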
Database Design
• Designing a complete database application that meets the needs of the enterprise
being modelled requires attention to a broader set of issues than the schema alone.
Database Design – cont.
• In terms of the relational model, the conceptual-design process involves decisions on
what attributes we want to capture in the database and how to group these attributes
to form the various tables.
• The “what” part is basically a business decision, and we shall not discuss it further in
this course.
• The “how” part is mainly a computer-science problem. There are principally two ways
to tackle the problem:
o The first one is to use the entity-relationship model;
o The other is to employ a set of algorithms (collectively known as normalization) that
takes as input the set of all attributes and generates a set of tables.
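To make the "how to group attributes" question concrete, the sketch below takes one wide table holding every attribute and splits it by hand into three smaller tables, then checks that joining them back gives the original rows. It is not a normalization algorithm: the grouping is chosen manually, and the attribute names, the sample rows, and the project helper are assumptions made for this illustration.

```python
# One wide table holding every attribute: student and course details are repeated.
flat = [
    {"student_id": "S1", "name": "Ada", "course_id": "C101", "title": "Databases"},
    {"student_id": "S1", "name": "Ada", "course_id": "C102", "title": "Algorithms"},
    {"student_id": "S2", "name": "Alan", "course_id": "C101", "title": "Databases"},
]


def project(rows, attrs):
    """Keep only the given attributes and drop duplicate rows."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})


# Hand-picked grouping of the attributes into three tables.
student = project(flat, ["student_id", "name"])
course = project(flat, ["course_id", "title"])
registration = project(flat, ["student_id", "course_id"])

# Joining the three tables back together recovers the original rows.
names = dict(student)
titles = dict(course)
rejoined = [
    {"student_id": s, "name": names[s], "course_id": c, "title": titles[c]}
    for s, c in registration
]

print(student)           # [('S1', 'Ada'), ('S2', 'Alan')]
print(course)            # [('C101', 'Databases'), ('C102', 'Algorithms')]
print(rejoined == flat)  # True
```

After the split, each student name and each course title is stored once instead of once per registration.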
Database Users
• There are four different types of database-system users:
o Naïve users
unsophisticated users who interact with the system by using predefined user interfaces,
such as web or mobile applications
o Application programmers
are computer professionals who write application programs
application programmers can choose from many tools to develop user interfaces
o Sophisticated users
interact with the system without writing programs; they form their requests either by
using a database query language or by using tools such as data analysis software
analysts who submit queries to explore data in the database fall in this category
o Specialized users
write specialized database applications that do not fit into the traditional data-
processing framework
• For example: CAD, graphic data, audio, video
Database Administrators
• A person who has central control over the system is called a database administrator
(DBA), whose functions are:
o Schema definition
o Storage structure and access-method definition
o Schema and physical-organization modification
o Granting of authorization for data access
o Routine maintenance
o Periodically backing up the database
o Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required
o Monitoring jobs running on the database and ensuring that performance is not
degraded by very expensive tasks submitted by some users
History of Database Systems
• 1950s and early 1960s
o Data processing using magnetic tapes for storage
o Punch cards for input
• Late 1960s and 1970s
o Hard disks allowed direct access to data
o Network and hierarchical data models in widespread use
o Ted Codd defines the relational data model (would win the ACM Turing Award for this work)
o IBM Research begins System R prototype
o Oracle releases the first commercial relational database
• 1980s
o SQL becomes an industry standard
o Parallel and distributed database systems (Wisconsin, IBM, Teradata)
o Object-oriented database systems
• 1990s
o Large decision support and data-mining applications
o Large multi-terabyte data warehouses
o Emergence of web commerce