Data Modeling Vs Database Design


http://www.aisintl.com/case/library/R-Theory_vs_ER/r-theory_vs_er.htm


01 Dec 96 updated

Data modeling and relational database design are nearly synonymous in many minds. Most data
modelers consider their end product to be a relational database instance which implements the
requirements asked of the data model.

Of course not all databases are relational; a very small portion are "object oriented" and a large plurality
are probably still hierarchical or networked. However, the vast majority of databases for which we
construct data models today are relational.

While data modeling and relational database design overlap substantially, their purposes and scope
diverge enough to cause users major problems if we ignore their differences. This article examines some
ways in which current data modeling tools and methods confuse, mislead, and obscure our database
design efforts.

AIS Home duncan2d@pacbell.net feedback form
Copyright © 1996 Applied Information Science International


Structure vs. Behavior in Data Models and Database Design

The OO movement is built on the principle that each object should have total knowledge of and
responsibility for itself. We are admonished to rigorously encapsulate each object's attributes and methods.

Of course these concepts are not new. For twenty-five years we have used data modeling to identify and
isolate the persistent, structural aspects of information systems from the dynamic rules and processes
which govern them.

Yet we freely intermingle structure and behavior in data models. Models represent concepts such as
cardinality and inheritance, which are beyond the capabilities of a relational database. Database schemas
include features such as triggers, which partition applications into fragmented business rules.

Data modeling methodologies and CASE tools muddle the tasks of application architecture, data
requirements, and database design. In this discussion we will look at some specifics of several popular
CASE tools and attempt to clarify which is what.


An Object Based World vs. Relational Theory

Most definitions of object oriented principles agree that objects have attributes as well as methods. In our
own writings and thinking, we refer to those two fundamental property sets as business objects and
business rules.

Data modeling is supposedly concerned only with attributes - the static structure of data found in objects.
That would be true of a purely relational model - i.e., one which represented nothing but tables and their
columns. Such a data model expresses the maximum, least constrained, potential for its design to hold
information.


RDBMS Products vs. Relational Theory

Each vendor competing in the RDBMS features war vies to differentiate its product with speed, reliability,
and ease of use. Vendors also add extensions beyond basic relational theory, all of which are useful, some
of which are problematic.

For example, indexes are clearly non-relational, in the strictest sense. The essence of a relation is lack of
physical order or position. Yet an index holds ordered values with pointers to physical locations.
Obviously, indexes are a valuable, and universal, extension to basic relational theory.
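A minimal sketch of that point, using Python's built-in sqlite3 module as a stand-in RDBMS (our choice for illustration; the article names no product) and a hypothetical CITY table: creating an index changes how rows can be located, not which rows a query returns.

```python
import sqlite3

def cities_named(conn, name):
    # Return the result as a set: a relation has no inherent row order.
    return set(conn.execute("SELECT id, name FROM city WHERE name = ?", (name,)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO city (name) VALUES (?)",
                 [("Oakland",), ("Berkeley",), ("Oakland",)])

before = cities_named(conn, "Oakland")
# The index is a physical structure: ordered key values with pointers to rows.
conn.execute("CREATE INDEX city_name_idx ON city (name)")
after = cities_named(conn, "Oakland")
print(before == after)  # True: same logical answer, possibly a different access path
```

Comparing the results as sets mirrors the relational view; any row order you happen to observe comes from the physical access path, not from the relation.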

Note that, for technical and historical reasons, RDBMS practice abandons the arcane terms of set
mathematics. So in the database we call a relation a table, a tuple becomes a row, and an attribute becomes a column.


Column Aggregation is the Fundamental Relational Constraint

By organizing information into tables, we have already imposed many constraints, possibly the greatest
proportion of all constraints which will eventually affect our data. We organize raw data concepts into (at
least) first normal form - i.e., no repeating groups - precisely because we want to impose constraints which
implement business requirements.

For example, we structure a PERSON table with exactly one column for DATE_OF_BIRTH to implement the
obvious business rule that each person is born only once (ignoring more complex issues of vanity and
deceit).

If we wanted to design a database devoid of constraints, or more constructively, one which would be
maximally adaptable, then we would use the binary technique, where every attribute sits in its own table
and is related to others only through many-to-many association tables.

This scheme is not new. There have been studies of binary databases for at least twenty years. Yet as long
as we must pay any price for I/O and processing, we will not be able to afford to join every attribute
involved in even the simplest view of data.

Binary database concepts bring to mind Object Role Modeling as exemplified in InfoModeler. Although
ORM examines and models data from a binary perspective, it also transforms the binary model into a fully
normalized (i.e., aggregated) relational schema.

The instant we aggregate columns into tables we impose constraints and we begin to partition our
application logic. In addition to whatever business rules are scattered through ten thousand modules of
code, we must also consider the implicit rules designed into the data structures themselves. That much
may be unavoidable unless we want to revert to free-form data under program control, back to the
pre-database days of ISAM files.

However, we must not take column aggregation for granted. Consider the case of airports and
flights:
There are many other sorts of rules and constraints which data modeling conventions encourage us to
embed in our database design without due consideration of the potential costs. Let's examine these issues
in more detail.


(Unfinished raw material)

Null / Not Null


Consider the slight difference between these two statements:

create table PERSON ( NAME char(30) );


create table PERSON ( NAME char(30) not null );

All we have added is "not null". Yet this is the next giant step (after the structure of the tables themselves) toward embedding business rules in data structure.
(C. J. Date would probably argue that the choice between "not null" and "null" is immaterial since, in Date's view, a sound design avoids all nulls, and so
every column could be treated as "not null".)

What we have done by adding "not null" is to insist that the column NAME must always have a value. I know, that's obvious. But what is less obvious is that
we often overlook some point in the lifecycle of a PERSON when she doesn't have a name - perhaps at birth. So insisting that NAME is "not null" is substantially
more constraining than merely saying that a PERSON may have a NAME - i.e., that "null" is allowed.
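The two PERSON definitions above can be tried side by side. A minimal sketch, using Python's sqlite3 module as a stand-in RDBMS (an assumption) and hypothetical table names so that both variants can coexist: the nullable variant accepts an unnamed newborn, the "not null" variant refuses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical variants of the PERSON table, differing only in NOT NULL.
conn.execute("CREATE TABLE person_optional (name CHAR(30))")
conn.execute("CREATE TABLE person_required (name CHAR(30) NOT NULL)")

# A newborn who has not been named yet.
conn.execute("INSERT INTO person_optional (name) VALUES (NULL)")  # accepted

try:
    conn.execute("INSERT INTO person_required (name) VALUES (NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the NOT NULL constraint on name fails
```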

That's obvious too - until we've coded "not null" and the rule changes; or, more properly, until we discover the correct rule, evidenced by a PERSON without a NAME.
How many places must we look to make that change of business rule (or repair of the system to reflect the true rule)?
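Changing the rule means changing the schema. As one concrete case, SQLite (used here through Python's sqlite3 module, our choice for illustration) has no ALTER TABLE form that drops a NOT NULL constraint, so the table has to be rebuilt: create a new one, copy the rows, drop the old one, and rename. This is a simplified sketch of the rebuild procedure SQLite documents, with hypothetical names; in a real system, views, triggers, and application code that assume NAME is always present are further places to look.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name CHAR(30) NOT NULL)")
conn.execute("INSERT INTO person VALUES ('Ada')")
conn.commit()

# Rebuild the table without the NOT NULL constraint, preserving existing rows.
conn.executescript("""
    BEGIN;
    CREATE TABLE person_new (name CHAR(30));
    INSERT INTO person_new (name) SELECT name FROM person;
    DROP TABLE person;
    ALTER TABLE person_new RENAME TO person;
    COMMIT;
""")

# The corrected rule: a PERSON may exist without a NAME.
conn.execute("INSERT INTO person (name) VALUES (NULL)")
rows = conn.execute("SELECT name FROM person ORDER BY rowid").fetchall()
```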

To be fair, a good data modeling technique will place that constraint in only one location (say the PERSON table). But don't forget that the person repairing the
system may have dozens, hundreds, or thousands of details to track on this one repair. And he may not know about that elegant data model in which NAME's
nullability is defined in only one place.

Foreign Keys and Referential Integrity


As long as we are designing relational databases (and I think that will be well into the 21st century), the only means available to us to navigate amongst tables
of data is by the reference of a foreign key to some key of another table. (As we all know, a foreign key is a set of one or more columns whose structure and
values match a set of key columns in another table. Not all of us are quite as clear that the referenced set of columns need only be some key, not the "primary
key". But that is a different subject.)
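A sketch of a foreign key that references an alternate key rather than the primary key, using Python's sqlite3 module as a stand-in RDBMS (an assumption). The EMPLOYEE and PARKING_ASSIGNMENT tables and the BADGE_NO column are hypothetical. SQLite accepts such a reference as long as the referenced column is declared UNIQUE; it also enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,      -- the primary key
        badge_no    TEXT NOT NULL UNIQUE      -- an alternate key
    );
    CREATE TABLE parking_assignment (
        space_code TEXT PRIMARY KEY,
        badge_no   TEXT REFERENCES employee (badge_no)  -- FK to the alternate key
    );
""")
conn.execute("INSERT INTO employee VALUES (1, 'B-007')")
conn.execute("INSERT INTO parking_assignment VALUES ('A19', 'B-007')")  # valid

try:
    # No employee carries badge B-999, so this reference is refused.
    conn.execute("INSERT INTO parking_assignment VALUES ('A20', 'B-999')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
```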

Since a foreign key consists of one set of columns, it follows that any row with a foreign key can reference only one row in the other, "foreign", table. Thus the
maximum cardinality in the parent table is exactly one row referenced. And the minimum is exactly zero, unless someone or something tags "not null" on the
column(s) of the foreign key.

On the other hand, nothing about the existence of a foreign key column set can enforce that there must be some row holding a particular foreign key value; that
is, the child cannot do the parent's work if the parent requires some minimum number of children. And nothing about the existence of a foreign key value in one
row prevents the same foreign key value(s) from occurring in any number of other rows. Thus it is stunningly simple that every foreign key reference in a
relational database is (zero-or-)one-to-zero-or-many. A foreign key reference can be nothing else without some procedural code to embellish the constraint.
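The (zero-or-)one-to-zero-or-many shape can be observed directly. In this sketch (Python's sqlite3 module as a stand-in RDBMS, an assumption; hypothetical TEAM and MEMBER tables), three members share one team's key, one member has a null foreign key, and a second team has no members at all. The declared foreign key objects to none of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE team (team_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,
        team_id   INTEGER REFERENCES team (team_id)  -- nullable FK
    );
""")
conn.executemany("INSERT INTO team VALUES (?, ?)", [(1, "Red"), (2, "Blue")])
# Many children may carry the same FK value ...
conn.executemany("INSERT INTO member (team_id) VALUES (?)", [(1,), (1,), (1,)])
# ... a child may reference no parent at all (FK is NULL) ...
conn.execute("INSERT INTO member (team_id) VALUES (NULL)")
# ... and nothing forces team 2 to have any members.

members_per_team = dict(conn.execute("""
    SELECT t.team_id, COUNT(m.member_id)
    FROM team t LEFT JOIN member m ON m.team_id = t.team_id
    GROUP BY t.team_id
""").fetchall())
```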

We often discuss referential integrity as if it were inherent in relational database implementation. But of course it is not. Just because we have provided foreign
key column(s), even if we have marked the column(s) "not null", does not mean that the column(s) will hold valid values. In this case, "valid" means values
which exist in the appropriate key of the referenced table.
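A sketch of foreign key columns holding invalid values, using Python's sqlite3 module as a stand-in RDBMS (an assumption; table names are hypothetical). One table has a not null DEPT_ID column with no declared constraint; the other declares a REFERENCES clause, but this connection has foreign key enforcement switched off, which is SQLite's usual default. Both accept a department that does not exist. SQLite's PRAGMA foreign_key_check can report the second case after the fact; for the first, the database has no recorded rule to check against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")  # explicit, so the demo does not depend on build defaults
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY);
    -- A NOT NULL "foreign key" column with no declared constraint at all:
    CREATE TABLE staff_plain (staff_id INTEGER PRIMARY KEY, dept_id INTEGER NOT NULL);
    -- A declared constraint that this connection is not enforcing:
    CREATE TABLE staff_declared (
        staff_id INTEGER PRIMARY KEY,
        dept_id  INTEGER NOT NULL REFERENCES department (dept_id)
    );
""")
conn.execute("INSERT INTO department VALUES (10)")
conn.execute("INSERT INTO staff_plain (dept_id) VALUES (999)")     # accepted
conn.execute("INSERT INTO staff_declared (dept_id) VALUES (999)")  # also accepted

# Each result row: (child table, rowid, parent table, FK index).
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
```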

To maintain referential integrity, any insert or update on the child table holding the foreign key must be responsible for providing a valid foreign key value set -
i.e., one which currently exists in the referenced table. Any update to or deletion of that referenced row must be responsible for dealing with potentially
orphaned child rows carrying the foreign key. We're not concerned here with the various options such as restrict, cascade, set null, or set default. What we care
about is how any such options are implemented.

Most modern RDBMS products offer us at least two ways to enforce referential integrity:

Declaratively, with "create constraint" statements


Procedurally, with explicit trigger code.
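The two approaches can be compared side by side. This sketch uses Python's sqlite3 module as a stand-in RDBMS (an assumption) with hypothetical table and trigger names. The declarative version states the rule in the table definition; the procedural version puts the same insert-time check in a trigger. Both reject a member of a nonexistent team, and only the schema text or the trigger source tells you which mechanism is in force.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE team (team_id INTEGER PRIMARY KEY);

    -- 1. Declarative: the rule is part of the table definition.
    CREATE TABLE member_declarative (
        member_id INTEGER PRIMARY KEY,
        team_id   INTEGER NOT NULL REFERENCES team (team_id)
    );

    -- 2. Procedural: the same insert-time rule lives in trigger code.
    CREATE TABLE member_procedural (
        member_id INTEGER PRIMARY KEY,
        team_id   INTEGER NOT NULL
    );
    CREATE TRIGGER member_procedural_fk
    BEFORE INSERT ON member_procedural
    WHEN NOT EXISTS (SELECT 1 FROM team WHERE team_id = NEW.team_id)
    BEGIN
        SELECT RAISE(ABORT, 'member_procedural.team_id has no matching team');
    END;
""")

def rejects(table):
    """Return True if inserting a member of nonexistent team 42 is refused."""
    try:
        conn.execute(f"INSERT INTO {table} (team_id) VALUES (42)")
        return False
    except sqlite3.IntegrityError:
        return True

results = {t: rejects(t) for t in ("member_declarative", "member_procedural")}
```

The trigger covers inserts only. A full procedural equivalent would also need triggers for updates to the child and for updates and deletes on the parent, which is exactly the kind of scattered logic the article describes.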

In either case, it is not the structure per se of the database which provides referential integrity. That is, we cannot ascertain what rules and methods of
referential integrity are in force by inspecting only the tables themselves. We must also examine the named constraints, and their internal behaviors, or the
procedural code executed in triggers. Thus we have already fragmented our system's rules into three separate complex webs: columns aggregated into
tables, not null modifiers, and referential integrity behaviors.

Entity-Relationship models attempt to present the design of an RDBMS instance in a filtered, abstract way.

By hiding the physical details of storage and navigation, the ER model focuses attention on the meaning of data structures - the business rules which they
embody.

Of course all the terms must change again, not so much to add clarity as to sell more methodology textbooks. What was first a relation and then a table
is now called an entity. The relational attribute, which became an RDBMS column, is now called an attribute again, or perhaps a data item or data
element.


levels of abstraction - normalization - 1NF vs 5NF

Dependent key is not aggregation

Cardinality in Entity-Relationship Diagrams


As stated above, a foreign key can reference only one row in the other, "foreign", table but that same foreign key value set may repeat in any number of
rows of the child table. Thus every foreign key reference is one-to-many. Yet almost all Entity-Relationship Diagram (ERD) methods and tools allow us
to record one-to-one, one-to-many, and many-to-many. In fact some even capture specific, numeric minimum and maximum cardinality.

Can a relational database digest all of this? No! Every relational FK-PK reference is inherently one-to-zero-or-more. All other forms require specific,
non-relational, solutions. Some examples:

One-to-one is a commonly allowed form. Yet clearly the existence of a foreign key column set in no way limits the occurrence of a foreign key value to
only one row. That limit can be imposed by several methods. This list is not offered as complete, nor is it ordered with meaning:

A unique index or alternate key on the foreign key column(s).


An inherited primary key where the foreign key is also the complete primary key. This enforces uniqueness of the key value. S-Designor, ERwin, and
other CASE tools use this method to implement sub-types as one-to-one children of their super-types.
Procedural code in insert and update triggers on the child to check for the existence of a row with the proposed foreign key value.
Application code to pre-check for a conflicting row.
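A sketch of the first two methods, using Python's sqlite3 module as a stand-in RDBMS (an assumption) and hypothetical EMPLOYEE, OFFICE_KEY, and EMPLOYEE_BADGE tables. In both cases a second child row for the same parent is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);

    -- Method 1: a unique constraint (alternate key) on the FK column.
    CREATE TABLE office_key (
        key_id      INTEGER PRIMARY KEY,
        employee_id INTEGER UNIQUE REFERENCES employee (employee_id)
    );

    -- Method 2: inherited primary key; the FK is the whole primary key.
    CREATE TABLE employee_badge (
        employee_id INTEGER PRIMARY KEY REFERENCES employee (employee_id),
        photo       TEXT
    );
""")
conn.execute("INSERT INTO employee VALUES (1)")

def second_child_rejected(sql):
    conn.execute(sql)      # the first child row for employee 1 is fine
    try:
        conn.execute(sql)  # a second child row for the same employee
        return False
    except sqlite3.IntegrityError:
        return True

unique_fk = second_child_rejected("INSERT INTO office_key (employee_id) VALUES (1)")
inherited_pk = second_child_rejected("INSERT INTO employee_badge (employee_id) VALUES (1)")
```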

One-to-many relationships in ERDs are by far the most common. These do not cause problems unless they carry specific cardinality constraints on the
child; e.g., "1,1 team : 1,5 team member". It is important to know that each team has at least one and no more than five members. But these values cannot
be imposed by the structure of the database. They require one of:

Procedural code in the parent table's triggers (insert, update, and delete) to maintain the required minimums and maximums in the child table.
Application code external to the database to do the same.
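As a sketch of the procedural route for the maximum, here is a trigger that refuses a sixth member, written for SQLite through Python's sqlite3 module (an assumption; table and trigger names are hypothetical). The trigger is placed on the child table here, an alternative to the parent-table placement described above, and it does not enforce the minimum of one member at all: a brand-new team necessarily starts with zero members.

```python
import sqlite3

MAX_MEMBERS = 5  # from the "1,5 team member" example

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE team (team_id INTEGER PRIMARY KEY);
    CREATE TABLE team_member (
        member_id INTEGER PRIMARY KEY,
        team_id   INTEGER NOT NULL REFERENCES team (team_id)
    );
    -- Procedural check for the maximum; the structure alone cannot express it.
    CREATE TRIGGER team_member_max
    BEFORE INSERT ON team_member
    WHEN (SELECT COUNT(*) FROM team_member WHERE team_id = NEW.team_id) >= {MAX_MEMBERS}
    BEGIN
        SELECT RAISE(ABORT, 'a team may have at most {MAX_MEMBERS} members');
    END;
""")
conn.execute("INSERT INTO team VALUES (1)")
for _ in range(MAX_MEMBERS):
    conn.execute("INSERT INTO team_member (team_id) VALUES (1)")

try:
    conn.execute("INSERT INTO team_member (team_id) VALUES (1)")  # sixth member
    sixth_accepted = True
except sqlite3.IntegrityError:
    sixth_accepted = False
```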

Many-to-many relationships are well understood by all database designers and most CASE tools. They must cause the creation of an intermediate
associative table to resolve the multiple references. This is no problem if your CASE tool either does it or tells you it can't do it. Or you can avoid
many-to-many relationships in an ERD by explicitly modeling the associative entities. The latter path has pros and cons, but it certainly sidesteps failings
of a CASE tool in this regard.
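A sketch of the associative table, using Python's sqlite3 module as a stand-in RDBMS (an assumption) with hypothetical EMPLOYEE, PROJECT, and ASSIGNMENT tables. The many-to-many relationship is carried entirely by two ordinary one-to-many foreign keys, and the composite primary key prevents the same pairing from being recorded twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (project_id  INTEGER PRIMARY KEY, title TEXT);
    -- The associative table: two one-to-many references resolve the many-to-many.
    CREATE TABLE assignment (
        employee_id INTEGER NOT NULL REFERENCES employee (employee_id),
        project_id  INTEGER NOT NULL REFERENCES project (project_id),
        PRIMARY KEY (employee_id, project_id)
    );
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "Billing"), (20, "Payroll")])
conn.executemany("INSERT INTO assignment VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

# Navigate in either direction through the associative table.
projects_of_ann = [t for (t,) in conn.execute("""
    SELECT p.title FROM assignment a JOIN project p ON p.project_id = a.project_id
    WHERE a.employee_id = 1 ORDER BY p.title
""")]
staff_on_billing = [n for (n,) in conn.execute("""
    SELECT e.name FROM assignment a JOIN employee e ON e.employee_id = a.employee_id
    WHERE a.project_id = 10 ORDER BY e.name
""")]
```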

Sub-type / super-type structures pose special problems:

One-to-one relationships between a sub-type and its super-type are generally enforced with an inherited primary key ("dependent" in S-Designor and
Silverrun, "identifying" in ERwin, ER/1, and System Architect). This method is effective but weak because it presumes the designer will never assign a
different primary key in the sub-type. A more appropriate solution would be to force a unique index or alternate key on the super-type foreign key.

Complete sub-type sets are a default property in some CASE tools (Oracle CASE and most SSADM tools) and optional in others such as ERwin, ER/1,
and S-Designor. A complete subtype set is logically equivalent to a minimum sub-type cardinality of one. This situation should create or suggest some
means to enforce the existence of one sub-type row for each super-type row. Unfortunately, we know of no CASE tool which even suggests that the user
should solve this problem, let alone offers to generate code for it.

Incomplete sub-type sets do not require at least one sub-type per super-type row so no enforcement is needed.

Exclusive sub-type sets are a default property in some CASE tools such as ERwin and ER/1 and optional in others such as S-Designor. Exclusivity
implies some check before insert or update of a sub-type. This could be a trigger-based check; a code check of a "category discriminator" attribute, à la
IDEF1X; or some other application code external to the database.
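One way to implement exclusivity with a category discriminator is sketched here for SQLite via Python's sqlite3 module (an assumption; table and column names are hypothetical). The super-type carries a PAY_TYPE discriminator, and each sub-type repeats it, pinned by a CHECK, inside a composite foreign key to the super-type. An employee whose discriminator says "salaried" therefore cannot also get an hourly row. This handles exclusivity only; it does nothing to enforce a complete sub-type set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        pay_type    TEXT NOT NULL CHECK (pay_type IN ('S', 'H')),  -- category discriminator
        UNIQUE (employee_id, pay_type)  -- target for the sub-type FKs below
    );
    CREATE TABLE salaried (
        employee_id   INTEGER PRIMARY KEY,
        pay_type      TEXT NOT NULL DEFAULT 'S' CHECK (pay_type = 'S'),
        annual_salary REAL,
        FOREIGN KEY (employee_id, pay_type) REFERENCES employee (employee_id, pay_type)
    );
    CREATE TABLE hourly (
        employee_id INTEGER PRIMARY KEY,
        pay_type    TEXT NOT NULL DEFAULT 'H' CHECK (pay_type = 'H'),
        hourly_rate REAL,
        FOREIGN KEY (employee_id, pay_type) REFERENCES employee (employee_id, pay_type)
    );
""")
conn.execute("INSERT INTO employee VALUES (1, 'S')")
conn.execute("INSERT INTO salaried (employee_id, annual_salary) VALUES (1, 50000)")

try:
    # Employee 1 is already salaried; no (1, 'H') super-type key exists.
    conn.execute("INSERT INTO hourly (employee_id, hourly_rate) VALUES (1, 20)")
    both_subtypes = True
except sqlite3.IntegrityError:
    both_subtypes = False
```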


In a nutshell, a relational database can deal directly only with the cardinality 0,1 : 0,N. When we model any other cardinality form, we (or our CASE tool)
must create non-structural solutions.

mutually mandatory

optional parent

OR connector in SA; arc

Attribute Roles
So far we have seen how data modeling methods and tools draw us insidiously into a trap of representing some things which are solved and some which
are left undone. So our model represents not the database but rather our use of the database. Or does it? Can any data modeling tool or method capture all
the constraints we need to express?

Our company is small and we have only a few employees. We work and live in an area where people cherish their convenience and free time. Although
many of our employees live within walking or bicycling distance, and in spite of our blandishments about exercise and the environment, only a few
choose not to drive. So parking spaces are a valued benefit to our commuters.

But there are some among us who do walk, bike, or even row to the office every day. They, understandably, felt left out of the goodies when we assigned
parking places to those who drive regularly.

Wanting to be fair to all, we calculated our cost per parking space for the company garage and offered each employee a choice: take an assigned parking
space or receive the cash as a commuting allowance. The drivers are happy; they don't have to give up their parking to accommodate the health nuts and
environmentalists. The non-drivers like this too; one even saved enough to buy a new rowing shell. Internally, however, we have a systems problem: how
do we model this in an entity-based model?

Each employee is assigned either a parking space (say "A19") or a commuting allowance (e.g., $35). In an ERD we draw relationships only between entities -
not between attributes. Clearly "parking space" is an entity. Each space exists with or without an employee assigned to it, and the company pays a garage rental
based on the number and location of spaces.

But just as clearly "commuting allowance" is merely an attribute - a value without any existence or meaning on its own. The answer is: we don't.
Entity-Relationship modeling does not capture all the constraints we must recognize for useful systems solutions. So now we know:

Our data model cannot capture all the constraints in our problem space.
Of those constraints which the model does capture, only some can be generated into a database schema.
The model doesn't tell us which ones are which.
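The parking rule can at least be written into a relational schema, even though the ERD has no way to draw it, provided both benefits live on the EMPLOYEE row. A sketch using Python's sqlite3 module as a stand-in RDBMS (an assumption; names are hypothetical). The CHECK forbids holding both a space and an allowance; it deliberately allows neither, since a new hire may not yet qualify for either benefit. Nothing in an entity-relationship diagram would tell a reader that this CHECK exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE parking_space (space_code TEXT PRIMARY KEY);
    CREATE TABLE employee (
        employee_id         INTEGER PRIMARY KEY,
        space_code          TEXT UNIQUE REFERENCES parking_space (space_code),
        commuting_allowance NUMERIC,
        -- A relationship OR an attribute value, never both.
        CHECK (space_code IS NULL OR commuting_allowance IS NULL)
    );
""")
conn.executemany("INSERT INTO parking_space VALUES (?)", [("A19",), ("A20",)])
conn.execute("INSERT INTO employee VALUES (1, 'A19', NULL)")  # a driver
conn.execute("INSERT INTO employee VALUES (2, NULL, 35)")     # a rower

def accepted(row):
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

both_benefits = accepted((3, "A20", 35))  # refused by the CHECK
neither = accepted((4, None, None))       # allowed: not yet eligible
```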

State Transitions


The above scenario is a lot simpler than almost any real life situation. What are the rules, for example, about when an employee can be assigned a parking
space? Can a parking space be reassigned? What about changing the commuting allowance?

Our data model can only represent the least constrained potential for this database design. We have no way to represent or even capture intermediate
conditions and their enabling or causative factors.

For example, an employee is assigned a commuting allowance 60 days after hire (should he last so long). But parking spaces are only assigned every six
months. So clearly some employees have neither, some are eligible only for an allowance, and some may be ready to receive a parking space assignment.

Some CASE tools throw in a few fragments of state transition. For example, S-Designor has its "Change parent allowed" check box to deal with one
minor aspect of value change. This is worse than no feature at all because it seems to imply greater significance to that one value change control than to
any others, which are not even discussed.
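A "change parent allowed" flag amounts to one small transition rule. This sketch writes its negation as an update trigger in SQLite, through Python's sqlite3 module (an assumption; table and trigger names are hypothetical). Rules such as when a parking space may be reassigned would each need their own code of this kind, and none of them appears in the data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (team_id INTEGER PRIMARY KEY);
    CREATE TABLE team_member (
        member_id INTEGER PRIMARY KEY,
        team_id   INTEGER NOT NULL REFERENCES team (team_id)
    );
    -- "Change parent not allowed": once set, a member's team_id is frozen.
    CREATE TRIGGER team_member_no_reparent
    BEFORE UPDATE OF team_id ON team_member
    WHEN NEW.team_id IS NOT OLD.team_id
    BEGIN
        SELECT RAISE(ABORT, 'team_member.team_id may not change');
    END;
""")
conn.executemany("INSERT INTO team VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO team_member VALUES (100, 1)")

try:
    conn.execute("UPDATE team_member SET team_id = 2 WHERE member_id = 100")
    reparented = True
except sqlite3.IntegrityError:
    reparented = False
```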

On the other hand, classic large CASE tools, and a few smaller ones like System Architect, attempt to provide a complete state transition mechanism
outside of the data modeling environment.

In Summary
We have seen how many of the features and techniques of conventional data modeling are not about database design but rather about the number, range,
and change of values through time. These constraints cannot be designed into a relational schema per se. They typically require some additional code
solution.

This causes two problems:

We are easily lured into ad hoc, implicit partitioning of application logic and away from central definition and execution of each object's behavior.
Data models deceive us by capturing and representing information of two very different sorts: that which can generate database schema code (DDL) and
that which cannot.

We propose that any CASE tool or method capable of modeling constraints beyond relational database structure should identify and report all such
constraints so that the user can make appropriate partitioning decisions and provide code where the CASE tool cannot.

Where to go from here:


Conceptual, Logical, and Physical models
The relational meta model
