Basic Oracle Handout
Basic Oracle Handout
Basic Oracle Handout
Version:2
Introduction to SQL
1. Overview
The computer industry is criss-crossed with languages and standards, most of which are unintelligible to each other. Here and there, true standards have emerged, and in these cases it is well worth the time of any programmer to learn them. Structured Query Language, or SQL as we commonly call it, has, over the last ten years, emerged as the standard language for programmers to talk with databases through a Database Management System (DBMS). Oracle, Microsoft SQL Server, Microsoft Access, IBM's DB2, Sybase, and virtually every other DBMS sold in the last five years use SQL. Knowledge of SQL is becoming necessary for almost every IT professional. And as the development of basic web sites becomes common among non-programmers, a grasp of SQL will help them to integrate data into their HTML pages.
4.2. Standards
As we said at the beginning, SQL is a standard, open language without corporate ownership. The commercial acceptance of SQL was precipitated by the formation of SQL Standards committees by the American National Standards Institute and the International Standards Organization in 1986 and 1987. Two years later they published a specification known as SQL-89. An improvement and expansion (to some 600 pages) to the standard gave the world SQL-92. We now have the third generation standard, SQL 99. The existence of standards is important for the general portability of SQL statements. Who is ANSI? The American National Standards Institute is an administrator and coordinator of voluntary systems of standardization for the United States private sector. About 80 years ago a group of engineering societies and government agencies formed the institute to enhance the "quality of life by promoting and facilitating voluntary consensus standards and conformity." Today the Institute represents the interests of about 1,000 companies, organizations and government agencies. ANSI does not itself develop standards; rather it facilitates development by establishing consensus among qualified groups.
5. Flavors of SQL
The computer industry (like most industries) both benefits and suffers from standards. We said that SQL is an open standard, not owned by a company, and the standard comes from ANSI. Therefore the SQL standard from ANSI is considered the "pure" SQL and called ANSI-SQL. Two problems emerge to sully this pureness. First is that every DBMS vendor wants to differentiate their DBMS products. So if we look at the feature set of each DBMS product we see that not only does the product support ANSISQL but it also offers extra features, enhancements or extensions that are available only from individual vendors. For example, most vendors offer a field type which auto- increments even though this is not described in the SQL standards. These additions to ANSI-SQL are generally proprietary and will not work if we try to use them on competitor's SQL products. At the level we discuss in this book there are only very minor differences between the vendors that we will note throughout the book. Many of these features are powerful and robust, but since they vary from vendor to vendor, programmers should use them with caution. It is always safest to stick with pure SQL whenever possible; if we stray it should be with full knowledge that we are losing the portability of our statements (and perhaps even our data). Such enhancements are not all bad because these extensions are very useful. For example, ANSI-SQL does not contain an automatic way to assign a serial number to each new record but most DBMS sold today have added this feature. Since serial numbering is so common programmers are happy to have the enhancement. However, the method of implementation is not uniform, so code written to get the serial number from data in one DBMS may not work when used with another vendor's DBMS.
1. History
Databases have been in use since the earliest days of electronic computing, but the vast majority of these were custom programs written to access custom databases. Unlike modern systems which can be applied to widely different databases and needs, these systems were tightly linked to the database in order to gain speed at the expense of flexibility.
For instance, a common use of a database system is to track information about users, their name, login information, various addresses and phone numbers. In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided. Linking the information back together is the key to this system. In the relational model some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional (or related) tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as its key. This "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for. Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record. Codd's solution to the necessary looping was a setoriented language, a suggestion that would later spawn the ubiquitous SQL. Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could support all the operations of normal databases (inserting, updating etc.) as well as providing a simple system for finding and returning sets of data in a single operation. Codd's paper was picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project, using student programmers to produce code. Beginning in 1973, INGRES delivered its first test products which were generally ready for widespread use in 1979. During this time a number of people had moved "through" the group perhaps as many as 30 people worked on the project, about five at a time. INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL - QUEL was in fact relational, having been based on Codd's own Alpha language, but has since been corrupted to follow SQL, thus violating much the same concepts of the relational model as SQL itself. IBM itself did only one test implementation of the relational model, PRTV, and a production one, Business System 12, both now discontinued. Honeywell did MRDS for Multics, and now there are two new implementations: Alphora Dataphor and Rel. All other DBMS implementations usually called relational are actually SQL DBMSs.
2. Description of DBMS
A DBMS can be an extremely complex set of software programs that controls the organization, storage and retrieval of data (fields, records and files) in a database. The basic functionalities that a DBMS must provide are: 1. A modeling language to define the schema of each database hosted in the DBMS, according to the DBMS data model. a. The three most common organizations are the hierarchical, network and relational models. A database management system may provide one, two or all three methods. Inverted lists and other methods are also used. The most suitable structure depends on the application and on the transaction rate and the number of inquiries that will be made. The dominant model in use today is the ad hoc one embedded in SQL, a corruption of the relational model by violating several of its fundamental principles. Many DBMSs also support the Open Database Connectivity API that supports a standard way for programmers to access the DBMS.
2. 3.
Data structures optimized to deal with big amounts of data recorded to a permanent data storage device, which are very slow compared to the primary storage (volatile main memory). A database query language and report writer to allow users to interactively interrogate the database, analyse its data and update it according to the users privileges on data. a. b. It also controls the security of the database. Data security prevents unauthorised users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called subschemas (pronounced "sub-skeema"). For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data. If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user
c.
organisation. These controls are only available when a set of application programs are customised for each data entry and updating function. 4. A transaction mechanism, that ideally would guarantee the ACID properties, in order to ensure data integrity despite of concurrent user access (concurrency control) and faults (fault tolerance). a. b. It also controls the integrity of the database. The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. The DBMS can keep duplicate records out of the database; for example, no two customers with the same customer numbers (key fields) can be entered into the database. See ACID properties for more information (Redundancy avoidance).
The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data. When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system. Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems design decisions are performed by data administrators and systems analysts. Detailed database design is performed by database administrators. Database servers are specially designed computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with RAID disk arrays used for stable storage. Connected to one or more servers via a high-speed channel, hardware database accelerators are also used in large volume transaction processing environments.
Codds 12 Rules
Codd's 12 rules are a set of thirteen rules proposed by Edgar F. "Ted" Codd, a pioneer of the relational model for databases, designed to define what is required from a database management system in order for it to be considered relational, i.e., a RDBMS. Codd produced these rules as part of a personal campaign to prevent his vision of the relational database being diluted, as database vendors scrambled in the early 1980s to repackage existing products with a relational veneer. Rule 12 was particularly designed to counter such a positioning. In fact, however, the rules are so strict that even systems whose only interface is the SQL language fail on some of the criteria.
Rule 0: The system must qualify as relational, as a database, and as a management system.
For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.
Describe Table
Use "Describe" To Get Table Definition
SQLPlus command DESCRIBE returns definitions of tables and views. For example, information about our tables are stored in the table TABS. SQL> DESCRIBE TABS; Name Null? -------------------------- -------TABLE_NAME NOT NULL TABLESPACE_NAME CLUSTER_NAME IOT_NAME PCT_FREE PCT_USED INI_TRANS MAX_TRANS INITIAL_EXTENT NEXT_EXTENT MIN_EXTENTS MAX_EXTENTS PCT_INCREASE FREELISTS FREELIST_GROUPS LOGGING BACKED_UP NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE CHAIN_CNT AVG_ROW_LEN AVG_SPACE_FREELIST_BLOCKS NUM_FREELIST_BLOCKS DEGREE INSTANCES CACHE TABLE_LOCK SAMPLE_SIZE LAST_ANALYZED PARTITIONED IOT_TYPE TEMPORARY NESTED BUFFER_POOL Type ---VARCHAR2(30) VARCHAR2(30) VARCHAR2(30) VARCHAR2(30) NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER VARCHAR2(3) VARCHAR2(1) NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER VARCHAR2(10) VARCHAR2(10) VARCHAR2(5) VARCHAR2(8) NUMBER DATE VARCHAR2(3) VARCHAR2(12) VARCHAR2(1) VARCHAR2(3) VARCHAR2(7)
The first column is the list of column names. The second column is a list of Not Null indicators. The third column is the list of data types. This output reveals that the column name "TABLE_NAME" is not allowed to be null. Now to get an alphabetical list of all the tables in our account, use the command SQL> SELECT TABLE_NAME FROM TABS ORDER BY TABLE_NAME; We can use the describe command on the following views to obtain information on our account.
View Name
DICT DICT_COLUMN CAT OBJ
Description
table names and table description column names of table names and column description names of all user's tables, views, synonyms, and sequences information on all objects in our account
table information on all user's tables column information on all user's columns view information on all user's views synonyms information on all user's synonyms sequence information on all user's sequences
USER_CONSTRAINTS constraint information on user's constraints USER_CONS_COLUMNS column information on user's constraints IND index information on all user's indices
USER_IND_COLUMNS column information on user's indices Often it is used to describe the structure of a table because we may not be familiar with a particular table. USER: is the owner of the object. TABLE: is a table, view, or synonym. DATABASE_LINK_NAME: is the node name and database where object exists. The syntax depends on the SQL*Net our computer uses. See LINKS for more information on database links. COLUMN: is the column in table we wish to describe. OBJECT: is the function or procedure we wish to describe. If we want to describe a procedure in a package, object is the name of the package. SUBOBJECT: is the function or procedure in a package that we wish to describe. For each column, the description lists: o The column's name o Whether null values are allowed (NULL or NOT NULL) for the column o The column's datatype, for example, NUMBER, CHAR, VARCHAR2 (VARCHAR), LONG, DATE, MLSLABEL, RAW MLSLABEL, RAW, LONGRAW, ROWID. o The column's precision (and scale, if any, for a numeric column) For each function or procedure the description lists: o The type of PL/SQL object (function or procedure) o The name of the function or procedure o The arguments, their type, input/output, and default values
2. login.sql
Additionally, after reading glogin.sql, sql*plus also looks for a file named login.sql in the directory from where SQL*PLUS was and in the directory that the environment variable SQLPATH points to and reads it and executes it. Settings from the login.sql take precedence over settings from glogin.sql
Oracle 10g
Since Oracle 10g, the login.sql is executed after a connect This allows to have a prompt that displays the username. For that, the following line must be in the login.sql: D. SQL> set sqlprompt "&_user> "
In this example everything is capitalized, but it doesn't have to be. The preceding query would work just as well if it were written like this: H. I. J. SQL> SELECT ename, mgr, hiredate FROM emp WHERE ename = ' SMITH ';
Note that SIMTH appears in capital letters in both examples. All though actual SQL statements are not case sensitive, referred to data in DB. For instance, many companies store their data in uppercase. In preceding example, assume that column ENAME stores its contents in uppercase. Therefore, a query searching for 'Smith' in ENAME column wouldnt find any data to return. Check our implementation and/or company policies for any case requirements. Note: Commands in SQL are not case sensitive. Take another look at the sample query. Is there something magical in the spacing? Again the answer is no. The following code would work as well: K. L. M. SQL> SELECT ename, mgr, hiredate FROM emp WHERE ename = 'SMITH';
However, some regard for spacing and capitalization makes our statements much easier to read. It also makes our statements much easier to maintain when they become a part of our project. Another important feature of (semicolon) semicolon (;) the sample query is the semicolon at the end of the expression. This punctuation mark tells the command-line SQL program that our query is complete. If the magic isn't in the capitalization or the format, then just which elements are important? The answer is keywords, or the words in SQL that are reserved as a part of syntax. (Depending on the SQL statement, a keyword can be either a mandatory element of the statement or optional.) The keywords in the current example are N. O. P. SQL> SELECT FROM WHERE
Check the table of contents to see some of the SQL keywords we will learn and on what days.
Basic statements like SELECT couldn't be simpler. However, SELECT does not work alone. If we typed just SELECT into our system, we might get the following response: R. S. T. U. V. SQL> SELECT; SELECT * ERROR at line 1: ORA-00936: missing expression
The asterisk under the offending line indicates where Oracle thinks the offense occurred. The error message tells we that something is missing. That something is the FROM clause: Syntax: W. FROM <TABLE>
Together, the statements SELECT and FROM begin to unlock the power behind our database. Note: keywords clauses at this point we may be wondering what the difference is between a keyword, a statement, and a clause. SQL keywords refer to individual SQL elements, such as SELECT and FROM. A clause is a part of an SQL statement; for example, SELECT column1, column2, ... is a clause. SQL clauses combine to form a complete SQL statement. For example, we can combine a SELECT clause and a FROM clause to write an SQL statement. Examples Before going any further, look at the sample database that is the basis for the following examples. This database illustrates the basic functions of SELECT and FROM. In the real world we would use the techniques to build this database, but for the purpose of describing how to use SELECT and FROM, assume database already exists. This example uses the EMP table to retrieve information about employees that company has. The EMP table: X. Y. Z. AA. BB. CC. DD. EE. FF. GG. HH. II. JJ. KK. LL. MM. Our First Query NN. OO. PP. SQL> select * from emp; EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO EMPNO ----7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934 ENAME -------SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER JOB --------CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK MGR ---7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782 HIREDATE --------17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82 SAL ---800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300 COMM DEPTNO ----- -----20 300 30 500 30 20 1400 30 30 10 20 10 0 30 20 30 20 10
QQ. RR. SS. TT. UU. VV. WW. XX. YY. ZZ. AAA. BBB. CCC. DDD. EEE.
----7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934
-------SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER
--------CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK
---7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782
--------17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82
---800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300
Analysis:
This output looks just like the code in the example. Notice that columns 1, 4, 6, 7 and 8 in the output statement are right-justified and that columns 2, 3 and 5 are left-justified. This format follows the alignment convention in which numeric data types are right-justified and character data types are left-justified. The asterisk (*) in SELECT * tells the database to return all the columns associated with the given table described in the FROM clause. The database determines the order in which to return the columns.
Notice that each column name is listed in the SELECT clause. The order in which the columns are listed is the order in which they will appear in the output. Notice both the commas that separate the column names and the space between the final column name and the subsequent clause (in this case FROM). The output would look like this: GGG. HHH. III. JJJ. KKK. LLL. MMM. NNN. OOO. PPP. QQQ. RRR. SSS. TTT. UUU. VVV. ENAME ---------SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER JOB EMPNO SAL --------- ---------- ---------CLERK 7369 800 SALESMAN 7499 1600 SALESMAN 7521 1250 MANAGER 7566 2975 SALESMAN 7654 1250 MANAGER 7698 2850 MANAGER 7782 2450 ANALYST 7788 3000 PRESIDENT 7839 5000 SALESMAN 7844 1500 CLERK 7876 1100 CLERK 7900 950 ANALYST 7902 3000 CLERK 7934 1300
Another way to write the same statement follows. WWW. SQL> SELECT ename, job, empno, sal
XXX.
FROM emp;
Notice that the FROM clause has been carried over to the second line. This convention is a matter of personal taste when writing SQL code. The output would look like this: YYY. ZZZ. AAAA. BBBB. CCCC. DDDD. EEEE. FFFF. GGGG. HHHH. IIII. JJJJ. KKKK. LLLL. MMMM. NNNN. OOOO. PPPP. ENAME ---------SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER JOB EMPNO SAL --------- ---------- ---------CLERK 7369 800 SALESMAN 7499 1600 SALESMAN 7521 1250 MANAGER 7566 2975 SALESMAN 7654 1250 MANAGER 7698 2850 MANAGER 7782 2450 ANALYST 7788 3000 PRESIDENT 7839 5000 SALESMAN 7844 1500 CLERK 7876 1100 CLERK 7900 950 ANALYST 7902 3000 CLERK 7934 1300
14 rows selected.
Analysis:
The output is identical because only the format of the statement changed. Now that we have established control over the order of the columns, we will be able to specify which columns we want to see.
Analysis:
Now we have columns we want to see. Notice use of upper- and lowercase in the query. It did not affect the result.
Suppose we had a table called DEPT with this structure: LLLLL. MMMMM. NNNNN. OOOOO. PPPPP. QQQQQ. DEPTNO DNAME LOC ---------- -------------- --------10 ACCOUNTING NEW YORK 20 RESEARCH DALLAS 30 SALES CHICAGO 40 OPERATIONS BOSTON
We would simply change the FROM clause to the desired table and type the following statement: RRRRR. SQL> SELECT * FROM dept; SSSSS. TTTTT. DEPTNO DNAME LOC UUUUU. ---------- -------------- --------VVVVV. 10 ACCOUNTING NEW YORK WWWWW. 20 RESEARCH DALLAS XXXXX. 30 SALES CHICAGO YYYYY. 40 OPERATIONS BOSTON
Analysis:
With a single change we have a new data source.
Analysis:
Notice that only 7 rows are selected. Because we specified DISTINCT, only one instance of the duplicated data is shown, which means that 7 less row is returned. ALL is a keyword that is implied in the basic SELECT statement. We almost never see ALL because SELECT <Table> and SELECT ALL <Table> have the same result. Try this example--for the first (and only!) time in our SQL career: HHHHHHH. SQL> SELECT ALL MGR IIIIIII. FROM EMP; JJJJJJJ. KKKKKKK. MGR LLLLLLL. ---------MMMMMMM. 7902 NNNNNNN. 7698 OOOOOOO. 7698 PPPPPPP. 7839 QQQQQQQ. 7698 RRRRRRR. 7839 SSSSSSS. 7839 TTTTTTT. 7566 UUUUUUU. VVVVVVV. 7698 WWWWWWW. 7788 XXXXXXX. 7698 YYYYYYY. 7566 ZZZZZZZ. 7782 AAAAAAAA. BBBBBBBB. 14 rows selected. It is the same as a SELECT <Column>. Who needs the extra keystrokes?
9. Expressions
The definition of an expression is simple: An expression returns a value. Expression types are very broad, covering different data types such as String, Numeric, and Boolean. In fact, pretty much anything following a clause (SELECT or FROM, for example) is an expression. In the following example EMPNO is an expression that returns the value contained in the EMPNO column. CCCCCCCC. DDDDDDDD. SQL> SELECT empno FROM emp;
In the following statement ENAME, HIREDATE and SAL is expressions: EEEEEEEE. FFFFFFFF. SQL> SELECT ENAME, HIREDATE, SAL FROM EMP;
It contains a condition, NAME = 'SMITH', which is an example of a Boolean expression. NAME = 'SMITH' will be either TRUE or FALSE, depending on the condition =.
9.1. Conditions
If we ever want to find a particular item or group of items in our database, we need one or more conditions. Conditions are contained in the WHERE clause. In the preceding example, the condition is
HHHHHHHH.
ENAME = 'SMITH'
For Ex: to find everyone in our organization that worked more than 100 hours last month, our condition would be IIIIIIII. NUMBEROFHOURS > 100
Conditions enable us to make selective queries. In their most common form, conditions comprise a variable, a constant, and a comparison operator. In the first example the variable is ENAME, the constant is 'SMITH', and the comparison operator is =. In the second example the variable is NUMBEROFHOURS, the constant is 100, and the comparison operator is >. We need to know about two more elements before we can write conditional queries: the WHERE clause and operators.
SELECT, FROM, and WHERE are the three most frequently used clauses in SQL. WHERE simply causes our queries to be more selective. Without the WHERE clause, the most useful thing we could do with a query is display all records in the selected table(s). For example: If we wanted a particular employee, we could type KKKKKKKK. LLLLLLLL. MMMMMMMM. NNNNNNNN. OOOOOOOO. PPPPPPPP. QQQQQQQQ. SQL> SELECT * FROM EMP WHERE ENAME = 'SMITH'; EMPNO ----7369 ENAME JOB -------- --------SMITH CLERK MGR ---7902 HIREDATE --------17-DEC-80 SAL ---800 COMM DEPTNO ----- -----20
Analysis:
This simple example shows how we can place a condition on the data that we want to retrieve.
11. Operators
Operators are the elements we use inside an expression to articulate how we want specified conditions to retrieve data. Operators fall into six groups: Arithmetic Comparison Character Logical Set Miscellaneous
Modulo operator does not work with data types that have decimals, such as Real or Number. If we place several of these arithmetic operators in an expression without any parentheses, the operators are resolved in this order: multiplication, division, modulo, addition, and subtraction. For example, the expression TTTTTTTT. Equals 2*6+9/3
12 + 3 = 15
2 * (6 + 9) / 3
WWWWWWWW. 2 * 15 / 3 = 10 Watch where we put those parentheses! Sometimes the expression does exactly what we tell it to do, rather than what we want it to do. The following sections examine the arithmetic operators in some detail and give us a chance to write some queries.
Here the + adds 15 cents to each salary to produce the following: ZZZZZZZZ. ENAME SAL SAL+1000 AAAAAAAAA. ---------- ---------- ---------BBBBBBBBB. SMITH 800 1800 CCCCCCCCC. ALLEN 1600 2600 DDDDDDDDD. WARD 1250 2250 EEEEEEEEE. JONES 2975 3975 FFFFFFFFF. MARTIN 1250 2250 GGGGGGGGG. BLAKE 2850 3850 HHHHHHHHH. CLARK 2450 3450 IIIIIIIII. SCOTT 3000 4000 JJJJJJJJJ. KING 5000 6000 KKKKKKKKK. TURNER 1500 2500 LLLLLLLLL. ADAMS 1100 2100 MMMMMMMMM. JAMES 950 1950 NNNNNNNNN. FORD 3000 4000 OOOOOOOOO. MILLER 1300 2300 PPPPPPPPP. QQQQQQQQQ. 14 rows selected.
Analysis:
What is this last column with the unattractive column heading SAL+1000? It's not in the original table. (Remember, we used * in the SELECT clause, which causes all the columns to be shown.) SQL allows us to create a virtual or derived column by combining or modifying existing columns.
Analysis:
The output confirms that the original data has not been changed and that the column heading SAL+1000 is not a permanent part of it. In fact, the column heading is so unattractive that we should do something about it. Type the following: RRRRRRRRR. SQL> SELECT ENAME, SAL, (SAL + 1000) INCREMENT_SALARY SSSSSSSSS. FROM EMP; TTTTTTTTT. UUUUUUUUU. ENAME SAL INCREMENTED_SALARY VVVVVVVVV. ---------- ---------- -----------------WWWWWWWWW. SMITH 800 1800 XXXXXXXXX. ALLEN 1600 2600 YYYYYYYYY. WARD 1250 2250 ZZZZZZZZZ. JONES 2975 3975 AAAAAAAAAA. MARTIN 1250 2250
BBBBBBBBBB. BLAKE 2850 CCCCCCCCCC. CLARK 2450 DDDDDDDDDD. SCOTT 3000 EEEEEEEEEE. KING 5000 FFFFFFFFFF. TURNER 1500 GGGGGGGGGG. ADAMS HHHHHHHHHH. JAMES 950 IIIIIIIIII. FORD 3000 JJJJJJJJJJ. MILLER 1300 KKKKKKKKKK. LLLLLLLLLL. 14 rows selected.
3850 3450 4000 6000 2500 1100 1950 4000 2300 2100
Analysis:
This is wonderful! Not only can we create new columns, but we can also rename them on the fly. We can rename any of the columns using the syntax column_name alias (note the space between column_name and alias). For example, the query MMMMMMMMMM. SQL> SELECT ENAME EMPLOYEE_NAME, SAL, SAL + 1000 INCREMENT_SALARY NNNNNNNNNN. FROM EMP; OOOOOOOOOO. PPPPPPPPPP. EMPLOYEE_ENAME SAL INCREMENTED_SALARY QQQQQQQQQQ. --------------- ---------- -----------------RRRRRRRRRR. SMITH 800 1800 SSSSSSSSSS. ALLEN 1600 2600 TTTTTTTTTT. WARD 1250 2250 UUUUUUUUUU. JONES 2975 3975 VVVVVVVVVV. MARTIN 1250 2250 WWWWWWWWWW. BLAKE 2850 3850 XXXXXXXXXX. CLARK 2450 3450 YYYYYYYYYY. SCOTT 3000 4000 ZZZZZZZZZZ. KING 5000 6000 AAAAAAAAAAA. TURNER 1500 2500 BBBBBBBBBBB. ADAMS 1100 2100 CCCCCCCCCCC. JAMES 950 1950 DDDDDDDDDDD. FORD 3000 4000 EEEEEEEEEEE. MILLER 1300 2300 FFFFFFFFFFF. GGGGGGGGGGG. 14 rows selected.
DENOMINATOR
N1
N2
For example, here's a way to manipulate the data: QQQQQQQQQQQ. SQL> SELECT STATE, -HIGHTEMP LOWS, -LOWTEMP HIGHS RRRRRRRRRRR. FROM HILOW; SSSSSSSSSSS. TTTTTTTTTTT. STATE LOWS HIGHS UUUUUUUUUUU. ---------- -------- --------VVVVVVVVVVV. CA 50 -120 WWWWWWWWWWW. FL -20 -110 XXXXXXXXXXX. LA -15 -99 YYYYYYYYYYY. ND 70 -101 ZZZZZZZZZZZ. NE 60 -100 The second (and obvious) use of the minus sign is to subtract one column from another. For example: AAAAAAAAAAAA. SQL> SELECT STATE,HIGHTEMP LOWS, LOWTEMP HIGHS, (LOWTEMP HIGHTEMP) DIFFERENCE BBBBBBBBBBBB. FROM HILOW; CCCCCCCCCCCC. DDDDDDDDDDDD. STATE LOWS HIGHS DIFFERENCE EEEEEEEEEEEE. ---------- -------- -------- ---------FFFFFFFFFFFF. CA -50 120 170 GGGGGGGGGGGG. FL 20 110 90 HHHHHHHHHHHH. LA 15 99 84 IIIIIIIIIIII. ND -70 101 171 JJJJJJJJJJJJ. NE -60 100 160 Notice the use of aliases to fix the data that was entered incorrectly. This remedy is merely a temporary patch, though, and not a permanent fix. We should see to it that the data is corrected and entered correctly in the future. This query not only fixed (at least visually) the incorrect data but also created a new column containing the difference between the highs and lows of each state. If we accidentally use the minus sign on a character field, we get something like this: KKKKKKKKKKKK. LLLLLLLLLLLL. MMMMMMMMMMMM. NNNNNNNNNNNN. OOOOOOOOOOOO. SQL> SELECT -STATE FROM HILOW; ERROR: ORA-01722: invalid number no rows selected
The exact error message varies with implementation, but the result is the same.
3000 1300
1500 650
The use of division in the preceding SELECT statement is straightforward (except that coming up with half salary can be tough).
11.1.6. Precedence
This section examines use of precedence in a SELECT statement. Using the DB HILOW, type the following: Use the following code segment to test precedence: SSSSSSSSSSSSSS. SQL> SELECT N1+N2*N3/N4, (N1+N2)*N3/N4, N1+(N2*N3)/N4 TTTTTTTTTTTTTT. FROM HILOW; UUUUUUUUUUUUUU. VVVVVVVVVVVVVV. N1+N2*N3/N4 (N1+N2)*N3/N4 N1+(N2*N3)/N4 WWWWWWWWWWWWWW. ----------- ------------- ------------XXXXXXXXXXXXXX. 2.5 2.25 2.5 YYYYYYYYYYYYYY. 31.26087 28.152174 31.26087 ZZZZZZZZZZZZZZ. 22.8 55.2 22.8 AAAAAAAAAAAAAAA. 93 975 93 BBBBBBBBBBBBBBB. 7.5 2.25 7.5 Notice that the first and last columns are identical. If we added a fourth column N1+N2* (N3/N4), its values would also be identical to those of the current first and last columns.
COMM ----- --
300 30 20 1400
2450
TTTTTTTTTTTTTTT. 10
7934
MILLER
CLERK
7782
23-JAN-82
1300
Notice that nothing is printed out in the COMM field position for 7934. The value for the field COMM for 7934 is NULL. The NULL is noticeable in this case because it is in a numeric column. However, if the NULL appeared in the EMPNO column, it would be impossible to tell the difference between NULL and a blank. Try to find the NULL: UUUUUUUUUUUUUUU. SQL> SELECT * VVVVVVVVVVVVVVV. FROM EMP WWWWWWWWWWWWWWW. WHERE COMM IS NULL; XXXXXXXXXXXXXXX. YYYYYYYYYYYYYYY. EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO ZZZZZZZZZZZZZZZ. ----- -------- --------- ---- --------------AAAAAAAAAAAAAAAA. 7369 SMITH CLERK 7902 17-DEC-80 800 20 BBBBBBBBBBBBBBBB. 7566 JONES MANAGER 7839 02-APR-81 2975 20 CCCCCCCCCCCCCCCC. 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 DDDDDDDDDDDDDDDD. 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 EEEEEEEEEEEEEEEE. 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 FFFFFFFFFFFFFFFF. 7839 KING PRESIDENT 17-NOV-81 5000 10 GGGGGGGGGGGGGGGG. 7876 ADAMS CLERK 7788 23-MAY-87 20 HHHHHHHHHHHHHHHH. 7900 JAMES CLERK 7698 03-DEC-81 950 30 IIIIIIIIIIIIIIII. 7902 FORD ANALYST 7566 03-DEC-81 3000 JJJJJJJJJJJJJJJJ. 7934 MILLER CLERK 7782 23-JAN-82 1300 10
COMM ----- --
1100
20
Analysis:
As we can see by the output, the above EMPNO whose value for COMM is NULL or does not contain a value. What if we use the equal sign (=) instead? KKKKKKKKKKKKKKKK. SQL> SELECT * LLLLLLLLLLLLLLLL. FROM EMP MMMMMMMMMMMMMMMM. WHERE COMM = NULL; NNNNNNNNNNNNNNNN. OOOOOOOOOOOOOOOO. no rows selected
Analysis:
We didn't find anything because the comparison COMM = NULL returned a FALSE the result is unknown. It would be more appropriate to use an IS NULL instead of =, changing the WHERE statement to WHERE COMM IS NULL. In this case we would get all the rows where a NULL existed. This example also illustrates both the use of the most common comparison operator, the equal sign (=), and the playground of all comparison operators, the WHERE clause. We already know about the WHERE clause, so here's a brief look at the equal sign.
Let's find SMITH row from EMP table. (On a short list this task appears trivial, but we may have more friends than we do or we may have a list with thousands of records.) PPPPPPPPPPPPPPPP. SQL> SELECT * QQQQQQQQQQQQQQQQ. FROM EMP RRRRRRRRRRRRRRRR. WHERE ENAME = 'SMITH'; SSSSSSSSSSSSSSSS. TTTTTTTTTTTTTTTT. EMPNO ENAME JOB MGR DEPTNO UUUUUUUUUUUUUUUU. ----- -------- --------- ------VVVVVVVVVVVVVVVV. 7369 SMITH CLERK 7902 20 We got the result that we expected. Try this: WWWWWWWWWWWWWWWW. SQL> SELECT * XXXXXXXXXXXXXXXX. FROM EMP YYYYYYYYYYYYYYYY. WHERE JOB = 'ANALYST'; ZZZZZZZZZZZZZZZZ. AAAAAAAAAAAAAAAAA. EMPNO ENAME JOB MGR DEPTNO BBBBBBBBBBBBBBBBB. ----- -------- --------- ------CCCCCCCCCCCCCCCCC. 7788 SCOTT ANALYST 20 DDDDDDDDDDDDDDDDD. 7902 FORD ANALYST 20 Note: Here we see that = can pull in multiple records. Here's another very important lesson concerning case sensitivity: EEEEEEEEEEEEEEEEE. SQL> SELECT * FROM EMP FFFFFFFFFFFFFFFFF. WHERE ENAME = 'SMITH'; GGGGGGGGGGGGGGGGG. HHHHHHHHHHHHHHHHH. EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO IIIIIIIIIIIIIIIII. ----- -------- --------- ---- ------------ ----- -----JJJJJJJJJJJJJJJJJ. 7369 SMITH CLERK 7902 17-DEC-80 800 20 KKKKKKKKKKKKKKKKK. LLLLLLLLLLLLLLLLL. 1 row selected. Now try this: MMMMMMMMMMMMMMMMM. SQL> SELECT * FROM EMP NNNNNNNNNNNNNNNNN. WHERE ENAME = 'Smith'; OOOOOOOOOOOOOOOOO. PPPPPPPPPPPPPPPPP. no rows selected.
HIREDATE --------17-DEC-80
SAL ---800
COMM ----- --
SAL ----
19-APR-87 03-DEC-81
Analysis:
Most companies prefer to store data in uppercase to provide data consistency. We should always store data either in all uppercase or in all lowercase. Mixing case creates difficulties when we try to retrieve accurate data.
MGR
HIREDATE
SAL
COMM
-------- --------PRESIDENT
----
--------5000
----
--
17-NOV-81
Analysis:
This example found all the salaries greater than (but not including) 3000. To include 3000, type this: WWWWWWWWWWWWWWWWW. SQL> SELECT * XXXXXXXXXXXXXXXXX. FROM EMP YYYYYYYYYYYYYYYYY. WHERE SAL >= 3000; ZZZZZZZZZZZZZZZZZ. AAAAAAAAAAAAAAAAAA. EMPNO ENAME JOB COMM DEPTNO BBBBBBBBBBBBBBBBBB. ----- -------- ----------- -----CCCCCCCCCCCCCCCCCC. 7788 SCOTT ANALYST 20 DDDDDDDDDDDDDDDDDD. 7839 KING PRESIDENT 10 EEEEEEEEEEEEEEEEEE. 7902 FORD ANALYST 20
MGR ---7566
7566
03-DEC-81
Analysis:
With this change we get salaries starting at 3000 and going up. We could achieve the same results with the statement SAL > 2999. Note: Notice that no quotes surround 3000 in this SQL statement. Number defined fields number-defined fields do not require quotes.
COMM -----
1100 1600
Analysis:
Did we just use < on a character field? Of course we did. We can use any of these operators on any data type. The result varies by data type. For example, use lowercase in the following state search: OOOOOOOOOOOOOOOOOO. SQL> SELECT * PPPPPPPPPPPPPPPPPP. FROM EMP QQQQQQQQQQQQQQQQQQ. WHERE ENAME < 'clark'; RRRRRRRRRRRRRRRRRR. SSSSSSSSSSSSSSSSSS. EMPNO ENAME JOB MGR HIREDATE COMM DEPTNO TTTTTTTTTTTTTTTTTT. ----- -------- --------- ---- ---------------
SAL ----- --
UUUUUUUUUUUUUUUUUU. 7369 20 VVVVVVVVVVVVVVVVVV. 7566 20 WWWWWWWWWWWWWWWWWW. 2850 30 XXXXXXXXXXXXXXXXXX. 7782 10 YYYYYYYYYYYYYYYYYY. 7788 20 ZZZZZZZZZZZZZZZZZZ. 7839 KING 10 AAAAAAAAAAAAAAAAAAA. 7876 20 BBBBBBBBBBBBBBBBBBB. 7900 30 CCCCCCCCCCCCCCCCCCC. 7902 20 DDDDDDDDDDDDDDDDDDD. 7934 10
7902 7839
800 2975
MANAGER ANALYST
Analysis:
Uppercase is usually sorted before lowercase; therefore, the uppercase codes returned are less than 'clark'. Again, to be safe, check our implementation. Tip: To be sure of how these operators will behave, check our language tables. Most PC implementations use the ASCII tables. Some other platforms use EBCDIC.
COMM -----
VVVVVVVVVVVVVVVVVVV. 7566 20 WWWWWWWWWWWWWWWWWWW. 2850 30 XXXXXXXXXXXXXXXXXXX. 7782 10 YYYYYYYYYYYYYYYYYYY. 7788 20 ZZZZZZZZZZZZZZZZZZZ. 7839 KING 10 AAAAAAAAAAAAAAAAAAAA. 7902 20
MANAGER BLAKE
7839
02-APR-81 7839
2975
MANAGER ANALYST
17-NOV-81 7566
03-DEC-81
3000
Note: Notice that both symbols, <> and !=, can express "not equals."
11.3.1. Like
What if we wanted to select parts of a database that fit a pattern but weren't quite exact matches? We could use the equal sign and run through all the possible cases, but that process would be boring and time-consuming. Instead, we could use LIKE. Consider the following: How can we find all the employees whose name starts have OT in there name? A quick visual inspection of this simple table shows that it has two parts, but unfortunately the locations have slightly different names. Try this: BBBBBBBBBBBBBBBBBBBB. SQL> SELECT * CCCCCCCCCCCCCCCCCCCC. FROM EMP DDDDDDDDDDDDDDDDDDDD. WHERE ENAME LIKE '%OT%'; EEEEEEEEEEEEEEEEEEEE. FFFFFFFFFFFFFFFFFFFF. EMPNO ENAME JOB MGR COMM DEPTNO GGGGGGGGGGGGGGGGGGGG. ----- -------- --------- ------ -----HHHHHHHHHHHHHHHHHHHH. 7788 SCOTT ANALYST 7566 20
HIREDATE --------19-APR-87
SAL ---3000 --
Analysis:
We can see the use of the percent sign (%) in the statement after LIKE. When used inside a LIKE expression, % is a wildcard. What we asked for was any occurrence of OT in the column location. If we queried IIIIIIIIIIIIIIIIIIII. SQL> SELECT * JJJJJJJJJJJJJJJJJJJJ. FROM EMP KKKKKKKKKKKKKKKKKKKK. WHERE ENAME LIKE 'SC%'; we would get any occurrence that started with SC: LLLLLLLLLLLLLLLLLLLL. EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO MMMMMMMMMMMMMMMMMMMM. ----- -------- --------- ---- ----------- ----- -----NNNNNNNNNNNNNNNNNNNN. 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 If we queried OOOOOOOOOOOOOOOOOOOO. SQL> SELECT * PPPPPPPPPPPPPPPPPPPP. FROM EMP QQQQQQQQQQQQQQQQQQQQ. WHERE ENAME LIKE 'A%';
we would get any name that starts with A: RRRRRRRRRRRRRRRRRRRR. COMM DEPTNO SSSSSSSSSSSSSSSSSSSS. --- -----TTTTTTTTTTTTTTTTTTTT. 300 30 UUUUUUUUUUUUUUUUUUUU. 20 EMPNO ----7499 7876 ENAME JOB MGR ---7698 7788 HIREDATE --------20-FEB-81 23-MAY-87 SAL ---1600 1100 --
Is LIKE case sensitive? Try the next query to find out. VVVVVVVVVVVVVVVVVVVV. SQL> SELECT * WWWWWWWWWWWWWWWWWWWW. FROM EMP XXXXXXXXXXXXXXXXXXXX. WHERE ENAME LIKE 'a%'; YYYYYYYYYYYYYYYYYYYY. ZZZZZZZZZZZZZZZZZZZZ. no rows selected
Analysis:
The answer is yes. References to data are always case sensitive. What if we want to find data that matches all but one character in a certain pattern? In this case we could use a different type of wildcard: the underscore.
HIREDATE
SAL
VVVVVVVVVVVVVVVVVVVVV. ----- -------- --------- ---- -------------- -----WWWWWWWWWWWWWWWWWWWWW. 7788 SCOTT ANALYST 7566 19APR-87 3000 20
--
Notice that the results are identical. These two wildcards can be combined. The next example finds all records with L as the second character: XXXXXXXXXXXXXXXXXXXXX. SQL> SELECT * YYYYYYYYYYYYYYYYYYYYY. FROM EMP ZZZZZZZZZZZZZZZZZZZZZ. WHERE ENAME LIKE '_L%'; AAAAAAAAAAAAAAAAAAAAAA. BBBBBBBBBBBBBBBBBBBBBB. EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CCCCCCCCCCCCCCCCCCCCCC. ----- -------- --------- ---- ----------- ----- -----DDDDDDDDDDDDDDDDDDDDDD. 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 EEEEEEEEEEEEEEEEEEEEEE. 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 FFFFFFFFFFFFFFFFFFFFFF. 7782 CLARK MANAGER 7839 09-JUN-81 2450 10
Analysis:
Notice that || is used instead of +. If we use + to try to concatenate the strings, the SQL interpreter used for this example returns the following error: RRRRRRRRRRRRRRRRRRRRRR. SQL> SELECT DEPTNO + DNAME CODE_NAME SSSSSSSSSSSSSSSSSSSSSS. FROM DEPT; TTTTTTTTTTTTTTTTTTTTTT. ERROR: UUUUUUUUUUUUUUUUUUUUUU. ORA-01722: invalid number It is looking for two numbers to add and throws the error invalid number when it doesn't find any. Note: Some implementations of SQL use the plus sign to concatenate strings. Check our implementation. Here's a more practical example using concatenation: VVVVVVVVVVVVVVVVVVVVVV. SQL> SELECT DEPTNO || ',' || DNAME CODE_NAME WWWWWWWWWWWWWWWWWWWWWW. FROM DEPT; XXXXXXXXXXXXXXXXXXXXXX. YYYYYYYYYYYYYYYYYYYYYY. CODE_NAME ZZZZZZZZZZZZZZZZZZZZZZ. ----------------------------------------------------AAAAAAAAAAAAAAAAAAAAAAA. 10 , ACCOUNTING BBBBBBBBBBBBBBBBBBBBBBB. 20 , RESEARCH CCCCCCCCCCCCCCCCCCCCCCC. 30 , SALES
DDDDDDDDDDDDDDDDDDDDDDD.
40 , OPERATIONS
Analysis:
This statement inserted a comma between the DEPTNO and the DNAME. Note: The extra spaces between the first name and the last name in these examples. These spaces are actually part of the data. With certain data types, spaces are right-padded to values less than the total length allocated for a field.
Analysis:
This query is the most complicated we have done so far. The SELECT clause (lines 1 and 2) uses arithmetic operators to determine how many days of leave each employee has remaining. The normal precedence computes YEARS * 12 - LEAVETAKEN. (A clearer approach would be to write (YEARS * 12) - LEAVETAKEN.) LIKE is used in line 3 with the wildcard % to find all the B names. Line 3 uses the > to find all occurrences greater than 50. The new element is on line 3. We used the logical operator AND to ensure that we found records that meet the criteria in lines 3.
11.4.1. AND
AND means that the expressions on both sides must be true to return TRUE. If either expression is false, AND returns FALSE. For example, to find out which employees have been with the company for 5 years or less and have taken more than 20 days leave, try this: YYYYYYYYYYYYYYYYYYYYYYY. SQL> SELECT LASTNAME ZZZZZZZZZZZZZZZZZZZZZZZ. FROM VACATION
AAAAAAAAAAAAAAAAAAAAAAAA. WHERE YEARS <= 5 AND LEAVETAKEN > 20; BBBBBBBBBBBBBBBBBBBBBBBB. CCCCCCCCCCCCCCCCCCCCCCCC. LASTNAME DDDDDDDDDDDDDDDDDDDDDDDD. -------EEEEEEEEEEEEEEEEEEEEEEEE. BAKER FFFFFFFFFFFFFFFFFFFFFFFF. BOLIVAR If we want to know which employees have been with the company for 5 years or more and have taken less than 50 percent of their leave, we could write: GGGGGGGGGGGGGGGGGGGGGGGG. SQL> SELECT LASTNAME WORKAHOLICS HHHHHHHHHHHHHHHHHHHHHHHH. FROM VACATION IIIIIIIIIIIIIIIIIIIIIIII. WHERE YEARS >= 5 AND ((YEARS *12)-LEAVETAKEN)/(YEARS * 12) < 0.50; JJJJJJJJJJJJJJJJJJJJJJJJ. KKKKKKKKKKKKKKKKKKKKKKKK. WORKAHOLICS LLLLLLLLLLLLLLLLLLLLLLLL. --------------MMMMMMMMMMMMMMMMMMMMMMMM. BAKER NNNNNNNNNNNNNNNNNNNNNNNN. BLEDSOE Check these people for burnout. Also check out how we used the AND to combine these two conditions.
11.4.2. OR
We can also use OR to sum up a series of conditions. If any of the comparisons is true, OR returns TRUE. To illustrate the difference, conditions run the last query with OR instead of with AND: OOOOOOOOOOOOOOOOOOOOOOOO. SQL> SELECT LASTNAME WORKAHOLICS PPPPPPPPPPPPPPPPPPPPPPPP. FROM VACATION QQQQQQQQQQQQQQQQQQQQQQQQ. WHERE YEARS >= 5 OR ((YEARS *12)LEAVETAKEN)/(YEARS * 12) >= 0.50; RRRRRRRRRRRRRRRRRRRRRRRR. SSSSSSSSSSSSSSSSSSSSSSSS. WORKAHOLICS TTTTTTTTTTTTTTTTTTTTTTTT. --------------UUUUUUUUUUUUUUUUUUUUUUUU. ABLE VVVVVVVVVVVVVVVVVVVVVVVV. BAKER WWWWWWWWWWWWWWWWWWWWWWWW. BLEDSOE XXXXXXXXXXXXXXXXXXXXXXXX. BOLD YYYYYYYYYYYYYYYYYYYYYYYY. COSTALES
Analysis:
The original names are still in the list, but we have three new entries (who would probably resent being called workaholics). These three new names made the list because they satisfied one of the conditions. OR requires that only one of the conditions be true in order for data to be returned.
11.4.3. NOT
NOT means just that. If the condition it applies to evaluates to TRUE, NOT make it FALSE. If the condition after the NOT is FALSE, it becomes TRUE. For example, the following SELECT returns the only two names not beginning with B in the table: ZZZZZZZZZZZZZZZZZZZZZZZZ. SQL> SELECT * AAAAAAAAAAAAAAAAAAAAAAAAA. FROM VACATION BBBBBBBBBBBBBBBBBBBBBBBBB. WHERE LASTNAME NOT LIKE 'B%'; CCCCCCCCCCCCCCCCCCCCCCCCC. DDDDDDDDDDDDDDDDDDDDDDDDD. LASTNAME EMPLOYEENUM YEARS LEAVETAKEN EEEEEEEEEEEEEEEEEEEEEEEEE. -------------- ----------- -------- ---------FFFFFFFFFFFFFFFFFFFFFFFFF. ABLE 101 2 4 GGGGGGGGGGGGGGGGGGGGGGGGG. COSTALES 211 10 78 NOT can also be used with the operator IS when applied to NULL. Recall the EMP table where we put a NULL value in the COMM column. To find the non-NULL items, type this:
HHHHHHHHHHHHHHHHHHHHHHHHH. SQL> SELECT * IIIIIIIIIIIIIIIIIIIIIIIII. FROM EMP JJJJJJJJJJJJJJJJJJJJJJJJJ. WHERE COMM IS NOT NULL; KKKKKKKKKKKKKKKKKKKKKKKKK. LLLLLLLLLLLLLLLLLLLLLLLLL. EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO MMMMMMMMMMMMMMMMMMMMMMMMM. ----- -------- --------- ---- ----------- ----- -----NNNNNNNNNNNNNNNNNNNNNNNNN. 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 OOOOOOOOOOOOOOOOOOOOOOOOO. 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 PPPPPPPPPPPPPPPPPPPPPPPPP. 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 QQQQQQQQQQQQQQQQQQQQQQQQQ. 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
ZZZZZZZZZZZZZZZZZZZZZZZZZZ. CHARLIE AAAAAAAAAAAAAAAAAAAAAAAAAAA. DEAN BBBBBBBBBBBBBBBBBBBBBBBBBBB. DECON CCCCCCCCCCCCCCCCCCCCCCCCCCC. EXITOR DDDDDDDDDDDDDDDDDDDDDDDDDDD. FALCONER EEEEEEEEEEEEEEEEEEEEEEEEEEE. FUBAR FFFFFFFFFFFFFFFFFFFFFFFFFFF. GOOBER GGGGGGGGGGGGGGGGGGGGGGGGGGG. 10 rows selected. UNION returns 10 distinct names from the two lists. How many names are on both lists (including duplicates)? HHHHHHHHHHHHHHHHHHHHHHHHHHH. SQL> SELECT NAME FROM SOFTBALL IIIIIIIIIIIIIIIIIIIIIIIIIII. UNION ALL JJJJJJJJJJJJJJJJJJJJJJJJJJJ. SELECT NAME FROM FOOTBALL; KKKKKKKKKKKKKKKKKKKKKKKKKKK. LLLLLLLLLLLLLLLLLLLLLLLLLLL. NAME MMMMMMMMMMMMMMMMMMMMMMMMMMM. -------------------NNNNNNNNNNNNNNNNNNNNNNNNNNN. ABLE OOOOOOOOOOOOOOOOOOOOOOOOOOO. BAKER PPPPPPPPPPPPPPPPPPPPPPPPPPP. CHARLIE QQQQQQQQQQQQQQQQQQQQQQQQQQQ. DEAN RRRRRRRRRRRRRRRRRRRRRRRRRRR. EXITOR SSSSSSSSSSSSSSSSSSSSSSSSSSS. FALCONER TTTTTTTTTTTTTTTTTTTTTTTTTTT. GOOBER UUUUUUUUUUUUUUUUUUUUUUUUUUU. ABLE VVVVVVVVVVVVVVVVVVVVVVVVVVV. BRAVO WWWWWWWWWWWWWWWWWWWWWWWWWWW. CHARLIE XXXXXXXXXXXXXXXXXXXXXXXXXXX. DECON YYYYYYYYYYYYYYYYYYYYYYYYYYY. EXITOR ZZZZZZZZZZZZZZZZZZZZZZZZZZZ. FUBAR AAAAAAAAAAAAAAAAAAAAAAAAAAAA. GOOBER BBBBBBBBBBBBBBBBBBBBBBBBBBBB. CCCCCCCCCCCCCCCCCCCCCCCCCCCC. 14 rows selected.
Analysis:
The combined list courtesy of the UNION ALL statement has 14 names. UNION ALL works just like UNION except it does not eliminate duplicates. Now show me a list of players who are on both teams. You can't do that with UNION-you need to learn INTERSECT.
11.5.2. INTERSECT
INTERSECT returns only the rows found by both queries. The next SELECT statement shows the list of players who play on both teams: DDDDDDDDDDDDDDDDDDDDDDDDDDDD. SQL> SELECT * FROM FOOTBALL EEEEEEEEEEEEEEEEEEEEEEEEEEEE. INTERSECT FFFFFFFFFFFFFFFFFFFFFFFFFFFF. SELECT * FROM SOFTBALL; GGGGGGGGGGGGGGGGGGGGGGGGGGGG. NAME HHHHHHHHHHHHHHHHHHHHHHHHHHHH. -------------------IIIIIIIIIIIIIIIIIIIIIIIIIIII. ABLE JJJJJJJJJJJJJJJJJJJJJJJJJJJJ. CHARLIE KKKKKKKKKKKKKKKKKKKKKKKKKKKK. EXITOR LLLLLLLLLLLLLLLLLLLLLLLLLLLL. GOOBER
Analysis:
In this example INTERSECT finds the short list of players who are on both teams by combining the results of the two SELECT statements.
FUBAR
Analysis:
The preceding query shows the three football players who are not on the softball team. VVVVVVVVVVVVVVVVVVVVVVVVVVVV. SQL> SELECT * FROM SOFTBALL WWWWWWWWWWWWWWWWWWWWWWWWWWWW. MINUS XXXXXXXXXXXXXXXXXXXXXXXXXXXX. SELECT * FROM FOOTBALL; YYYYYYYYYYYYYYYYYYYYYYYYYYYY. ZZZZZZZZZZZZZZZZZZZZZZZZZZZZ. NAME AAAAAAAAAAAAAAAAAAAAAAAAAAAAA. -------------------BBBBBBBBBBBBBBBBBBBBBBBBBBBBB. BAKER CCCCCCCCCCCCCCCCCCCCCCCCCCCCC. DEAN DDDDDDDDDDDDDDDDDDDDDDDDDDDDD. FALCONER
VVVVVVVVVVVVVVVVVVVVVVVVVVVVV. WWWWWWWWWWWWWWWWWWWWWWWWWWWWW. EMPNO ENAME JOB HIREDATE SAL COMM DEPTNO XXXXXXXXXXXXXXXXXXXXXXXXXXXXX. ----- -------- --------- ------ ----- -----YYYYYYYYYYYYYYYYYYYYYYYYYYYYY. 7369 SMITH CLERK 7902 800 20 ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ. 7566 JONES MANAGER 7839 2975 20 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA. 7782 CLARK MANAGER JUN-81 2450 10 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB. 7788 SCOTT ANALYST APR-87 3000 20 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC. 7839 KING PRESIDENT NOV-81 5000 10 DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD. 7876 ADAMS CLERK MAY-87 1100 20 EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE. 7902 FORD ANALYST DEC-81 3000 20 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF. 7934 MILLER CLERK 7782 1300 10
23-JAN-82
Analysis:
The second example is shorter and more readable than the first. You never know when you might have to go back and work on something you wrote months ago. IN also works with characters. Consider the following, where the column JOB is a number: GGGGGGGGGGGGGGGGGGGGGGGGGGGGGG. SQL> SELECT * HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH. FROM EMP IIIIIIIIIIIIIIIIIIIIIIIIIIIIII. WHERE JOB IN(ANALYST,CLERK); JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ. KKKKKKKKKKKKKKKKKKKKKKKKKKKKKK. EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO LLLLLLLLLLLLLLLLLLLLLLLLLLLLLL. ----- -------- --------- ---- ----------- ----- -----MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM. 7369 SMITH CLERK 7902 17-DEC-80 800 20 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN. 7788 SCOTT ANALYST 7566 19APR-87 3000 20 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO. 7876 ADAMS CLERK 7788 23MAY-87 1100 20 PPPPPPPPPPPPPPPPPPPPPPPPPPPPPP. 7900 JAMES CLERK 7698 03DEC-81 950 30 QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ. 7902 FORD ANALYST 7566 03DEC-81 3000 20 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR. 7934 MILLER CLERK 7782 23JAN-82 1300 10 If you needed a range of things from the PRICE table, you could write the following: SSSSSSSSSSSSSSSSSSSSSSSSSSSSSS. SQL> SELECT * TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT. FROM EMP UUUUUUUUUUUUUUUUUUUUUUUUUUUUUU. WHERE SAL > 5000 AND VVVVVVVVVVVVVVVVVVVVVVVVVVVVVV. WWWWWWWWWWWWWWWWWWWWWWWWWWWWWW. EMPNO ENAME MGR HIREDATE SAL COMM DEPTNO XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. ----- -------- -------------- ----- -----YYYYYYYYYYYYYYYYYYYYYYYYYYYYYY. 7566 JONES MANAGER APR-81 2975 20 ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ. 7698 BLAKE MANAGER 7839 2850 30
01-MAY-81
A. B. C.
10 20 20
Or using BETWEEN, you would write this: D. E. F. G. H. I. J. K. L. M. N. SQL> SELECT * FROM EMP WHERE SAL BETWEEN 2000 AND 5000; EMPNO ----7566 7698 7782 7788 7902 ENAME -------JONES BLAKE CLARK SCOTT FORD JOB --------MANAGER MANAGER MANAGER ANALYST ANALYST MGR ---7839 7839 7839 7566 7566 HIREDATE --------02-APR-81 01-MAY-81 09-JUN-81 19-APR-87 03-DEC-81 SAL ---2975 2850 2450 3000 3000 COMM DEPTNO ----- -----20 30 10 20 20
Again, the second example is a cleaner, more readable solution than the first. Note: If a SAL value of 2000 existed in the EMP table, that record would have been retrieved also. Parameters used in the BETWEEN operator are inclusive parameters inclusive.
Page
40
of
Dual Table
The Oracle DUAL table
Dual is a table which is created by oracle along with the data dictionary. It consists of exactly one column whose name is dummy and one record. The value of that record is X. O. P. Q. R. S. T. U. V. W. X. SQL> DESC DUAL Name Null? Type ----------------------- -------- ---------------DUMMY VARCHAR2(1) SQL> SELECT * FROM DUAL; D X
The owner of dual is SYS but dual can be accessed by every user. As dual contains exactly one row (unless someone fiddled with it), it is guaranteed to return exactly one row in select statements if a constant expression is selected against dual, such as in: Y. SQL> SELECT SYSDATE FROM DUAL;
Although it is possible to delete the one record, or insert additional records, one really should not do that!. DUAL is an interesting table - it contains ONE column and ONE record. Oracle has created this since it makes some calculations more convenient. For example, we can use it for math: Z. SQL> SELECT (319/212)+10 FROM DUAL;
We can use it to increment sequences: AA. SQL> SELECT employee_seq.NEXTVAL FROM DUAL;
We can use it to play around or test some SQL: SQL> SELECT CHR(70) FROM DUAL;
Functions
Functions in SQL enable we to perform feats such as determining the sum of a column or converting all the characters of a string to uppercase. By the end of the day, we will understand and be able to use all the following: Aggregate functions Date and time functions Arithmetic functions Character functions Conversion functions Miscellaneous functions These functions greatly increase our ability to manipulate the information we retrieved using the basic functions of SQL. The first five aggregate functions, COUNT, SUM, AVG, MAX, and MIN, are defined in the ANSI standard. Most implementations of SQL have extensions to these aggregate functions, some of which are covered today. Some implementations may use different names for these functions.
1. Aggregate Functions
These functions are also referred to as group functions. They return a value based on the values in a column. (After all, we wouldn't ask for the average of a single field.) The examples in this section use the table TEAMSTATS: SQL> SELECT * FROM TEAMSTATS; NAME POS AB HITS WALKS SINGLES DOUBLES TRIPLES HR SO --------- --- --- ---- ----- ------- ------- ------- -- -JONES 1B 145 45 34 31 8 1 5 10 DONKNOW 3B 175 65 23 50 10 1 4 15 WORLEY LF 157 49 15 35 8 3 3 16 DAVID OF 187 70 24 48 4 0 17 42 HAMHOCKER 3B 50 12 10 10 2 0 0 13 CASEY DH 1 0 0 0 0 0 0 1 6 rows selected.
1.1. COUNT
The function COUNT returns the number of rows that satisfy the condition in the WHERE clause. Say we wanted to know how many ball players were hitting under 350. We would type SQL> SELECT COUNT(*) FROM TEAMSTATS WHERE HITS/AB < .35; COUNT(*) -------4 To make the code more readable, try an alias: SQL> SELECT COUNT(*) NUM_BELOW_350 FROM TEAMSTATS WHERE HITS/AB < .35; NUM_BELOW_350 ------------4 Would it make any difference if we tried a column name instead of the asterisk? (Notice the use of parentheses around the column names.) Try this: SQL> SELECT COUNT(NAME) NUM_BELOW_350 FROM TEAMSTATS WHERE HITS/AB < .35;
NUM_BELOW_350 ------------4 The answer is no. The NAME column that we selected was not involved in the WHERE statement. If we use COUNT without a WHERE clause, it returns the number of records in the table. SQL> SELECT COUNT(*) FROM TEAMSTATS; COUNT(*) --------6
1.2. SUM
SUM does just that. It returns the sum of all values in a column. To find out how many singles have been hit, type SQL> SELECT SUM(SINGLES) TOTAL_SINGLES FROM TEAMSTATS; TOTAL_SINGLES ------------174 To get several sums, use SQL> SELECT SUM(SINGLES) TOTAL_SINGLES, SUM(DOUBLES) TOTAL_DOUBLES, SUM(TRIPLES) TOTAL_TRIPLES, SUM(HR) TOTAL_HR FROM TEAMSTATS; TOTAL_SINGLES TOTAL_DOUBLES TOTAL_TRIPLES TOTAL_HR ------------- ------------- ------------- -------174 32 5 29 To collect similar information on all 300 or better players, type SQL> SELECT SUM(SINGLES) TOTAL_SINGLES, SUM(DOUBLES) TOTAL_DOUBLES, SUM(TRIPLES) TOTAL_TRIPLES, SUM(HR) TOTAL_HR FROM TEAMSTATS WHERE HITS/AB >- .300; TOTAL_SINGLES TOTAL_DOUBLES TOTAL_TRIPLES TOTAL_HR ------------- ------------- ------------- -------164 30 5 29 To compute a team batting average, type SQL> SELECT SUM(HITS)/SUM(AB) TEAM_AVERAGE FROM TEAMSTATS; TEAM_AVERAGE -----------.33706294 SUM works only with numbers. If we try it on a non-numerical field, we get SQL> SELECT SUM(NAME) FROM TEAMSTATS; ***### ERROR: ORA-01722: invalid number no rows selected This error message is logical because we cannot sum a group of names.
1.3. AVG
The AVG function computes the average of a column. To find the average number of strike outs, use this:
SQL> SELECT AVG(SO) AVE_STRIKE_OUTS FROM TEAMSTATS; AVE_STRIKE_OUTS --------------16.166667 The following example illustrates the difference between SUM and AVG: SQL> SELECT AVG(HITS/AB) TEAM_AVERAGE FROM TEAMSTATS; TEAM_AVERAGE -----------.26803448
Analysis:
The team was batting over 300 in the previous example! What happened? AVG computed the average of the combined column hits divided by at bats, whereas the example with SUM divided the total number of hits by the number of at bats. For example, player A gets 50 hits in 100 at bats for a .500 average. Player B gets 0 hits in 1 at bat for a 0.0 average. The average of 0.0 and 0.5 is .250. If we compute the combined average of 50 hits in 101 at bats, the answer is a respectable .495. The following statement returns the correct batting average: SQL> SELECT AVG(HITS)/AVG(AB) TEAM_AVERAGE FROM TEAMSTATS; TEAM_AVERAGE -----------.33706294 Like the SUM function, AVG works only with numbers.
1.4. MAX
If we want to find the largest value in a column, use MAX. For example, what is the highest number of hits? SQL> SELECT MAX(HITS) FROM TEAMSTATS; MAX(HITS) --------70 Can we find out who has the most hits? SQL> SELECT NAME FROM TEAMSTATS WHERE HITS = MAX(HITS); ***### ERROR at line 3: ORA-00934: group function is not allowed here Unfortunately, we can't. The error message is a reminder that this group function ***### (remember that aggregate functions are also called group functions) doesnt work in WHERE clause. What happens if we try a non-numerical column? SQL> SELECT MAX(NAME) FROM TEAMSTATS; MAX(NAME) --------------WORLEY Here's something new. MAX returns the highest (closest to Z) string. Finally, a function works with both characters and numbers.
1.5. MIN
MIN returns the lowest member of a column. To find out the fewest at bats, type SQL> SELECT MIN(AB) FROM TEAMSTATS; MIN(AB) --------1 The following statement returns the name closest to the beginning of the alphabet: SQL> SELECT MIN(NAME) FROM TEAMSTATS; MIN(NAME) --------------CASEY We can combine MIN with MAX to give a range of values. For example: SQL> SELECT MIN(AB), MAX(AB) FROM TEAMSTATS; MIN(AB) MAX(AB) -------- -------1 187 This sort of information can be useful when using statistical functions. NOTE: As we mentioned in the introduction, the first five aggregate functions are described in the ANSI standard. The remaining aggregate functions have become de facto standards, present in all important implementations of SQL. We use the Oracle7 names for these functions. Other implementations may use different names.
1.6. VARIANCE
VARIANCE produces square of standard deviation, a number vital to many statistical calculations. It works like this: SQL> SELECT VARIANCE(HITS) FROM TEAMSTATS; VARIANCE(HITS) -------------802.96667 If we try a string SQL> SELECT VARIANCE(NAME) FROM TEAMSTATS; ERROR: ORA-01722: invalid number no rows selected We find that VARIANCE is another function that works exclusively with numbers.
1.7. STDDEV
Final group function, STDDEV, finds the standard deviation of a column of numbers, as demonstrated by this example: SQL> SELECT STDDEV(HITS) FROM TEAMSTATS; STDDEV(HITS) -----------28.336666 It also returns an error when confronted by a string:
***### SQL> SELECT STDDEV(NAME) FROM TEAMSTATS; ERROR: ORA-01722: invalid number no rows selected These aggregate functions can also be used in various combinations: SQL> SELECT COUNT(AB),AVG(AB),MIN(AB),MAX(AB),STDDEV(AB),VARIANCE(AB),SUM(AB) FROM TEAMSTATS; COUNT(AB) AVG(AB) MIN(AB) MAX(AB) STDDEV(AB) VARIANCE(AB) SUM(AB) --------- ------- ------- ------- ---------- ------------ ------6 119.167 1 187 75.589 5712.97 715 The next time we hear a sportscaster use statistics to fill the time between plays, we will know that SQL is at work somewhere behind the scenes.
2.1. ADD_MONTHS
This function adds a number of months to a specified date. For example, say something extraordinary happened, and the preceding project slipped to the right by two months. We could make a new schedule by typing SQL> SELECT TASK, STARTDATE, ENDDATE ORIGINAL_END, ADD_MONTHS(ENDDATE,2) FROM PROJECT; TASK -------------KICKOFF MTG TECH SURVEY USER MTGS DESIGN WIDGET CODE WIDGET TESTING STARTDATE --------01-APR-95 02-APR-95 15-MAY-95 01-JUN-95 01-JUL-95 03-SEP-95 ORIGINAL_ --------01-APR-95 01-MAY-95 30-MAY-95 30-JUN-95 02-SEP-95 17-JAN-96 ADD_MONTH --------01-JUN-95 01-JUL-95 30-JUL-95 31-AUG-95 02-NOV-95 17-MAR-96
6 rows selected. Not that a slip like this is possible, but it's nice to have a function that makes it so easy. ADD_MONTHS also works outside the SELECT clause. Typing SQL> SELECT TASK TASKS_SHORTER_THAN_ONE_MONTH FROM PROJECT WHERE ADD_MONTHS(STARTDATE,1) > ENDDATE; TASKS_SHORTER_THAN_ONE_MONTH
Analysis:
***### we will find that all the functions in this section work in more than one place. However, ADD MONTHS does not work with other data types like character or number without the help of functions TO_CHAR and TO_DATE, which are discussed later today.
2.2. LAST_DAY
LAST_DAY returns the last day of a specified month. It is for those of us who haven't mastered the "Thirty days has September..." rhyme--or at least those of us who have not yet taught it to our computers. If, for example, we need to know what the last day of the month is in the column ENDDATE, we would type SQL> SELECT ENDDATE, LAST_DAY(ENDDATE) FROM PROJECT; ENDDATE LAST_DAY(ENDDATE) --------- ----------------01-APR-95 30-APR-95 01-MAY-95 31-MAY-95 30-MAY-95 31-MAY-95 30-JUN-95 30-JUN-95 02-SEP-95 30-SEP-95 17-JAN-96 31-JAN-96 6 rows selected. How does LAST DAY handle leap years? SQL> SELECT LAST_DAY('1-FEB-95') NON_LEAP, LAST_DAY('1-FEB-96') LEAP FROM PROJECT; NON_LEAP LEAP --------- --------28-FEB-95 29-FEB-96 28-FEB-95 29-FEB-96 28-FEB-95 29-FEB-96 28-FEB-95 29-FEB-96 28-FEB-95 29-FEB-96 28-FEB-95 29-FEB-96 6 rows selected.
Analysis:
We got the right result, but why were so many rows returned? Because we didn't specify an existing column or any conditions, the SQL engine applied the date functions in the statement to each existing row. Let's get something less redundant by using the following: SQL>SELECT DISTINCT LAST_DAY('1-FEB-95') NON_LEAP, LAST_DAY('1-FEB-96') LEAP FROM PROJECT; This statement uses the word DISTINCT to produce the singular result NON_LEAP LEAP --------- --------28-FEB-95 29-FEB-96 Unlike we, this function knows which years are leap years. But before we trust our own or our company's financial future to this or any other function, check our implementation!
2.3. MONTHS_BETWEEN
If we need to know how many months fall between month x and month y, use MONTHS_BETWEEN like this: SQL> SELECT TASK,STARTDATE,ENDDATE,MONTHS_BETWEEN(STARTDATE,ENDDATE) DURATION FROM PROJECT; TASK STARTDATE -------------- --------KICKOFF MTG 01-APR-95 TECH SURVEY 02-APR-95 USER MTGS 15-MAY-95 DESIGN WIDGET 01-JUN-95 CODE WIDGET 01-JUL-95 TESTING 03-SEP-95 6 rows selected. ENDDATE --------01-APR-95 01-MAY-95 30-MAY-95 30-JUN-95 02-SEP-95 17-JAN-96 DURATION --------0 -.9677419 -.483871 -.9354839 -2.032258 -4.451613
Wait a minute--that doesn't look right. Try this: SQL> SELECT TASK, STARTDATE,ENDDATE,MONTHS_BETWEEN(ENDDATE,STARTDATE) DURATION FROM PROJECT; TASK STARTDATE -------------- --------KICKOFF MTG 01-APR-95 TECH SURVEY 02-APR-95 USER MTGS 15-MAY-95 DESIGN WIDGET 01-JUN-95 CODE WIDGET 01-JUL-95 TESTING 03-SEP-95 6 rows selected. ENDDATE --------01-APR-95 01-MAY-95 30-MAY-95 30-JUN-95 02-SEP-95 17-JAN-96 DURATION --------0 .96774194 .48387097 .93548387 2.0322581 4.4516129
Analysis:
That's better. We see that MONTHS_BETWEEN is sensitive to the way we order the months. Negative months might not be bad. For example, we could use a negative result to determine whether one date happened before another. For example, the following statement shows all the tasks that started before May 19, 1995: SQL> SELECT * FROM PROJECT WHERE MONTHS_BETWEEN('19 MAY 95', STARTDATE) > 0; TASK -------------KICKOFF MTG TECH SURVEY USER MTGS STARTDATE --------01-APR-95 02-APR-95 15-MAY-95 ENDDATE --------01-APR-95 01-MAY-95 30-MAY-95
2.4. NEW_TIME
If we need to adjust the time according to the time zone we are in, the NEW_TIME function is for us. Here are the time zones we can use with this function:
Abbreviation
AST or ADT BST or BDT CST or CDT EST or EDT GMT HST or HDT MST or MDT
Time Zone
Atlantic standard or daylight time Bering standard or daylight time Central standard or daylight time Eastern standard or daylight time Greenwich mean time Alaska-Hawaii standard or daylight time Mountain standard or daylight time
Newfoundland standard time Pacific standard or daylight time Yukon standard or daylight time
We can adjust our time like this: SQL> SELECT ENDDATE EDT, NEW_TIME(ENDDATE, 'EDT','PDT') FROM PROJECT; EDT ---------------01-APR-95 1200AM 01-MAY-95 1200AM 30-MAY-95 1200AM 30-JUN-95 1200AM 02-SEP-95 1200AM 17-JAN-96 1200AM 6 rows selected. NEW_TIME(ENDDATE ---------------31-MAR-95 0900PM 30-APR-95 0900PM 29-MAY-95 0900PM 29-JUN-95 0900PM 01-SEP-95 0900PM 16-JAN-96 0900PM
Like magic, all the times are in the new time zone and the dates are adjusted.
2.5. NEXT_DAY
NEXT_DAY finds the name of the first day of the week that is equal to or later than another specified date. For example, to send a report on the Friday following the first day of each event, we would type SQL> SELECT STARTDATE, NEXT_DAY(STARTDATE, 'FRIDAY') FROM PROJECT; STARTDATE NEXT_DAY( --------- --------01-APR-95 07-APR-95 02-APR-95 07-APR-95 15-MAY-95 19-MAY-95 01-JUN-95 02-JUN-95 01-JUL-95 07-JUL-95 03-SEP-95 08-SEP-95 6 rows selected.
Analysis:
The output tells us the date of the first Friday that occurs after our STARTDATE.
2.6. SYSDATE
SYSDATE returns the system time and date: SQL> SELECT DISTINCT SYSDATE FROM PROJECT; SYSDATE ---------------18-JUN-95 1020PM If we wanted to see where we stand today in a certain project, we can type SQL> SELECT * FROM PROJECT WHERE STARTDATE > SYSDATE; TASK STARTDATE ENDDATE -------------- --------- --------CODE WIDGET 01-JUL-95 02-SEP-95
TESTING
03-SEP-95 17-JAN-96
Now we can see what parts of the project start after today.
3. Arithmetic Functions
Many of the uses we have for the data we retrieve involve mathematics. Most implementations of SQL provide arithmetic functions similar to the functions covered here. The examples in this section use the NUMCHAR table: SQL> SELECT * FROM NUMCHAR; A B LASTNAME --------- -------- --------------3.1415 4 PURVIS -45 .707 TAYLOR 5 9 CHRISTINE -57.667 42 ADAMS 15 55 COSTALES -7.2 5.3 KONG 6 rows selected. FIRSTNAME --------------KELLY CHUCK LAURA FESTER ARMANDO MAJOR M CODE CHARCODE - --------- -------A 32 15 J 67 30 C 65 25 M 87 40 A 77 35 G 52 20
3.1. ABS
The ABS function returns the absolute value of the number we point to. For example: SQL> SELECT ABS(A) ABSOLUTE_VALUE FROM NUMCHAR; ABSOLUTE_VALUE -------------3.1415 45 5 57.667 15 7.2 6 rows selected. ABS changes all the negative numbers to positive and leaves positive numbers alone.
Analysis:
We would expect the COS of 45 degrees to be in the neighborhood of .707, not .525. To make this function work the way we would expect it to in a degree-oriented world, we need to convert degrees to radians. (When was the last time we heard a news broadcast report that a politician had done a pi-radian turn? We hear about a 180-degree turn.) Because 360 degrees - 2 pi radians, we can write SQL> SELECT A, COS(A* 0.01745329251994) FROM NUMCHAR; A COS(A*0.01745329251994) --------- ----------------------3.1415 .99849724 -45 .70710678 5 .9961947 -57.667 .5348391 15 .96592583 -7.2 .9921147
Analysis:
Note that the number 0.01745329251994 is radians divided by degrees. The trigonometric functions work as follows: SQL> SELECT A, COS(A*0.017453), COSH(A*0.017453) FROM NUMCHAR; A COS(A*0.017453) COSH(A*0.017453) --------- --------------- ---------------3.1415 .99849729 1.0015035 -45 .70711609 1.3245977 5 .99619483 1.00381 -57.667 .53485335 1.5507072 15 .96592696 1.0344645 -7.2 .99211497 1.0079058 6 rows selected. And
SQL> SELECT A, SIN(A*0.017453), SINH(A*0.017453) FROM NUMCHAR; A SIN(A*0.017453) SINH(A*0.017453) --------- --------------- ---------------3.1415 .05480113 .05485607 -45 -.7070975 -.8686535 5 .08715429 .0873758 -57.667 -.8449449 -1.185197 15 .25881481 .26479569 -7.2 -.1253311 -.1259926 6 rows selected. And SQL> SELECT A, TAN(A*0.017453), TANH(A*0.017453) FROM NUMCHAR; A TAN(A*0.017453) TANH(A*0.017453) --------- --------------- ---------------3.1415 .05488361 .05477372 -45 -.9999737 -.6557867 5 .08748719 .08704416 -57.667 -1.579769 -.7642948 15 .26794449 .25597369 -7.2 -.1263272 -.1250043 6 rows selected.
3.4. EXP
EXP enables we to raise e (e is a mathematical constant used in various formulas) to a power. Here's how EXP raises e by the values in column A: SQL> SELECT A, EXP(A) FROM NUMCHAR; A EXP(A) --------- --------3.1415 23.138549 -45 2.863E-20 5 148.41316 -57.667 9.027E-26 15 3269017.4 -7.2 .00074659 6 rows selected.
Analysis:
Notice how we can embed the function ABS inside the LN call. The other logarith-mic function, LOG, takes two arguments, returning the logarithm of the first argument in the base of the second. The following query returns the logarithms of column B in base 10. SQL> SELECT B, LOG(B, 10) FROM NUMCHAR; B LOG(B,10) ----------- --------4 1.660964 .707 -6.640962 9 1.0479516 42 .61604832 55 .57459287 5.3 1.3806894 6 rows selected.
3.6. MOD
We have encountered MOD before. On Day 3, "Expressions, Conditions, and Operators," we saw that the ANSI standard for the modulo operator % is sometimes implemented as the function MOD. Here's a query that returns a table showing the remainder of A divided by B: SQL> SELECT A, B, MOD(A,B) FROM NUMCHAR; A B MOD(A,B) --------- --------- --------3.1415 4 3.1415 -45 .707 -.459 5 9 5 -57.667 42 -15.667 15 55 15 -7.2 5.3 -1.9 6 rows selected.
3.7. POWER
To raise one number to the power of another, use POWER. In this function the first argument is raised to the power of the second: SQL> SELECT A, B, POWER(A,B) FROM NUMCHAR; ERROR: ORA-01428: argument '-45' is out of range
Analysis:
***### At first glance we are likely to think that the first argument can't be negative. But that impression can't be true, because a number like -4 can be raised to a power. Therefore, if the first number in the POWER function is negative, the second must be an integer. We can work around this problem by using CEIL (or FLOOR): SQL> SELECT A, CEIL(B), POWER(A,CEIL(B)) FROM NUMCHAR; A CEIL(B) POWER(A,CEIL(B)) --------- --------- ---------------3.1415 4 97.3976
3.8. SIGN
SIGN returns -1 if its argument is less than 0, 0 if its argument is equal to 0, and 1 if its argument is greater than 0, as shown in the following example: SQL> SELECT A, SIGN(A) FROM NUMCHAR; A SIGN(A) --------- --------3.1415 1 -45 -1 5 1 -57.667 -1 15 1 -7.2 -1 0 0 7 rows selected. We could also use SIGN in a SELECT WHERE clause like this: SQL> SELECT A FROM NUMCHAR WHERE SIGN(A) - 1; A --------3.1415 5 15
3.9. SQRT
The function SQRT returns the square root of an argument. Because the square root of a negative number is undefined, we cannot use SQRT on negative numbers. SQL> SELECT A, SQRT(A) FROM NUMCHAR; ERROR: ORA-01428: argument '-45' is out of range However, we can fix this limitation with ABS: SQL> SELECT ABS(A), SQRT(ABS(A)) FROM NUMCHAR; ABS(A) SQRT(ABS(A)) --------- -----------3.1415 1.7724277 45 6.7082039 5 2.236068 57.667 7.5938791 15 3.8729833 7.2 2.6832816 0 0 7 rows selected.
3.10. ROUND
The ROUND function rounds the column, expression, or value to n decimal places. If the second argument is 0 or is missing, the value is rounded to zero decimal places. If the second argument is 2, the value is rounded to two decimal places. Conversely, if the second argument is -2, the value is rounded to two decimal places to the left. The ROUND function can also be used with DATE function. SQL> SELECT A, ROUND(A) FROM NUMCHAR; A ROUND(A) ---------- ---------3.1415 3 -45 -45 5 5 -57.667 -58 15 15 -7.2 -7 6 rows selected. For example if we want to display records with two decimals. SQL> SELECT A, ROUND(A,2) FROM NUMCHAR; A ROUND(A,2) ---------- ---------3.1415 3.14 -45 -45 5 5 -57.667 -57.67 15 15 -7.2 -7.2 6 rows selected.
3.11. TRUNC
The TRUNC function truncates the column, expression, or value to n decimal places. The TRUNC function works with arguments similar to those of the ROUND function. If the second argument is 0 or is missing, the value is truncated to zero decimal places. It the second argument is 2, the value is truncated to two decimal places. Conversely, if the second argument is -2, the value is rounded to two decimal places to the left. SQL> SELECT A, TRUNC(A) FROM NUMCHAR; A TRUNC(A) ---------- ---------3.1415 3 -45 -45 5 5 -57.667 -57 15 15 -7.2 -7 6 rows selected. For example if we want to display records with two decimals. This works same as the ROUND function but is doesnt round the values as ROUND function does. SQL> SELECT A, TRUNC(A,2) FROM NUMCHAR; A TRUNC(A,2) ---------- ----------
6 rows selected.
4. Character Functions
Many implementations of SQL provide functions to manipulate characters and strings of characters. This section covers the most common character functions. The examples in this section use the table NUMCHAR.
4.1. CHR
CHR returns the character equivalent of the number it uses as argument. The character it returns depends on the character set of the database. For this example the database is set to ASCII. The column CODE includes numbers. SQL> SELECT CODE, CHR(CODE) FROM NUMCHAR; CODE CH --------- -32 67 C 65 A 87 W 77 M 52 4 6 rows selected. The space opposite the 32 shows that 32 is a space in the ASCII character set.
4.2. CONCAT
When we learned about operators. The || symbol splices two strings together, as does CONCAT. It works like this: SQL> SELECT CONCAT(FIRSTNAME, LASTNAME) "FIRST AND LAST NAMES" FROM NUMCHAR; FIRST AND LAST NAMES -----------------------KELLY PURVIS CHUCK TAYLOR LAURA CHRISTINE FESTER ADAMS ARMANDO COSTALES MAJOR KONG 6 rows selected.
Analysis:
Quotation marks surround the multiple-word alias FIRST AND LAST NAMES. Again, it is safest to check our implementation to see if it allows multiple-word aliases. Also notice that even though the table looks like two separate columns, what we are seeing is one column. The first value we concatenated, FIRSTNAME, is 15 characters wide. This operation retained all the characters in the field.
4.3. INITCAP
INITCAP capitalizes the first letter of a word and makes all other characters lowercase. SQL> SELECT FIRSTNAME BEFORE, INITCAP(FIRSTNAME) AFTER FROM NUMCHAR;
BEFORE AFTER -------------- ---------KELLY Kelly CHUCK Chuck LAURA Laura FESTER Fester ARMANDO Armando MAJOR Major 6 rows selected.
-------------- -------------------PURVIS *****PURVIS TAYLOR *****TAYLOR CHRISTINE *****CHRISTINE ADAMS *****ADAMS COSTALES *****COSTALES KONG *****KONG 6 rows selected.
Analysis:
Why were only five pad characters added? Remember that the LASTNAME column is 15 characters wide and that LASTNAME includes the blanks to the right of the characters that make up the name. Some column data types eliminate padding characters if the width of the column value is less than the total width allocated for the column. Check our implementation. Now try the right side: SQL> SELECT LASTNAME, RPAD(LASTNAME,20,'*') FROM NUMCHAR; LASTNAME RPAD(LASTNAME,20,'*' --------------- -------------------PURVIS PURVIS ***** TAYLOR TAYLOR ***** CHRISTINE CHRISTINE ***** ADAMS ADAMS ***** COSTALES COSTALES ***** KONG KONG ***** 6 rows selected.
Analysis:
Here we see that the blanks are considered part of the field name for these operations. The next two functions come in handy in this type of situation.
KONG KONG**************** 6 rows selected. The output proves that trim is working. Now try LTRIM: SQL> SELECT LASTNAME, LTRIM(LASTNAME, 'C') FROM NUMCHAR; LASTNAME LTRIM(LASTNAME, --------------- --------------PURVIS PURVIS TAYLOR TAYLOR CHRISTINE HRISTINE ADAMS ADAMS COSTALES OSTALES KONG KONG 6 rows selected. Note the missing Cs in the third and fifth rows.
4.7. REPLACE
REPLACE does just that. Of its three arguments, the first is the string to be searched. The second is the search key. The last is the optional replacement string. If the third argument is left out or NULL, each occurrence of the search key on the string to be searched is removed and is not replaced with anything. SQL> SELECT LASTNAME, REPLACE(LASTNAME, 'ST') REPLACEMENT FROM NUMCHAR; LASTNAME REPLACEMENT --------------- --------------PURVIS PURVIS TAYLOR TAYLOR CHRISTINE CHRIINE ADAMS ADAMS COSTALES COALES KONG KONG 6 rows selected. If we have a third argument, it is substituted for each occurrence of the search key in the target string. For example: SQL> SELECT LASTNAME, REPLACE(LASTNAME, 'ST','**') REPLACEMENT FROM NUMCHAR; LASTNAME REPLACEMENT --------------- -----------PURVIS PURVIS TAYLOR TAYLOR CHRISTINE CHRI**INE ADAMS ADAMS COSTALES CO**ALES KONG KONG 6 rows selected. If the second argument is NULL, the target string is returned with no changes. SQL> SELECT LASTNAME, REPLACE(LASTNAME, NULL) REPLACEMENT FROM NUMCHAR; LASTNAME REPLACEMENT --------------- --------------PURVIS PURVIS TAYLOR TAYLOR CHRISTINE CHRISTINE ADAMS ADAMS COSTALES COSTALES KONG KONG 6 rows selected.
4.8. SUBSTR
This three-argument function enables we to take a piece out of a target string. The first argument is the target string. The second argument is the position of the first character to be output. The third argument is the number of characters to show. SQL> SELECT FIRSTNAME, SUBSTR(FIRSTNAME,2,3) FROM NUMCHAR; FIRSTNAME SUB --------------- --kelly ell CHUCK HUC LAURA AUR FESTER EST ARMANDO RMA MAJOR AJO 6 rows selected. If we use a negative number as the second argument, the starting point is determined by counting backwards from the end, like this: SQL> SELECT FIRSTNAME, SUBSTR(FIRSTNAME,-13,2) FROM NUMCHAR; FIRSTNAME SU --------------- -kelly ll CHUCK UC LAURA UR FESTER ST ARMANDO MA MAJOR JO 6 rows selected.
Analysis:
Remember the character field FIRSTNAME in this example is 15 characters long. That is why we used a -13 to start at the third character. Counting back from 15 puts us at the start of the third character, not at the start of the second. If we don't have a third argument, use the following statement instead: SQL> SELECT FIRSTNAME, SUBSTR(FIRSTNAME,3) FROM NUMCHAR; FIRSTNAME SUBSTR(FIRSTN --------------- ------------kelly lly CHUCK UCK LAURA URA FESTER STER ARMANDO MANDO MAJOR JOR 6 rows selected.
Analysis:
Reading the results of the preceding output is difficult--Social Security numbers usually have dashes. Now try something fancy and see whether we like the results:
4. 9. TRANSLATE
The function TRANSLATE takes three arguments: the target string, the FROM string, and the TO string. Elements of the target string that occur in the FROM string are translated to the corresponding element in the TO string. SQL> SELECT FIRSTNAME, TRANSLATE(FIRSTNAME '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
'NNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAA) FROM NUMCHAR; FIRSTNAME --------------kelly CHUCK LAURA FESTER ARMANDO MAJOR TRANSLATE(FIRST --------------kelly AAAAA AAAAA AAAAAA AAAAAAA AAAAA
4.10. INSTR
To find out where in a string a particular pattern occurs, use INSTR. Its first argument is the target string. The second argument is the pattern to match. The third and forth are numbers representing where to start looking and which match to report. This example returns a number representing the first occurrence of O starting with the second character: SQL> SELECT LASTNAME, INSTR(LASTNAME, 'O', 2, 1) FROM NUMCHAR; LASTNAME INSTR(LASTNAME,'O',2,1) --------------- ----------------------PURVIS 0 TAYLOR 5 CHRISTINE 0 ADAMS 0 COSTALES 2 KONG 2 6 rows selected.
Analysis:
The default for the third and fourth arguments is 1. If the third argument is negative, the search starts at a position determined from the end of the string, instead of from the beginning.
4.11. LENGTH
LENGTH returns the length of its lone character argument. For example: SQL> SELECT FIRSTNAME, LENGTH(RTRIM(FIRSTNAME)) FROM NUMCHAR; FIRSTNAME LENGTH(RTRIM(FIRSTNAME)) --------------- -----------------------kelly 5 CHUCK 5 LAURA 5 FESTER 6 ARMANDO 7 MAJOR 5 6 rows selected.
Analysis:
Note the use of the RTRIM function. Otherwise, LENGTH would return 15 for every value.
5. Conversion Functions
These three conversion functions provide a handy way of converting one type of data to another. These examples use the table NUMCHAR.
5.1. TO_CHAR
The primary use of TO_CHAR is to convert a number into a character. Different implementations may also use it to convert other data types, like Date, into a character, or to include different formatting arguments. The next example illustrates the primary use of TO_CHAR: SQL> SELECT CODE, TO_CHAR(CODE) FROM NUMCHAR; CODE --------32 67 65 87 77 52 TO_CHAR(CODE) ---------------32 67 65 87 77 52
Not very exciting or convincing. Here's how to verify that the function returned a character string: SQL> SELECT CODE, LENGTH(TO_CHAR(CODE)) FROM NUMCHAR; CODE -----32 67 65 87 77 52 LENGTH(TO_CHAR(CODE)) -----------------------2 2 2 2 2 2
Analysis:
LENGTH of a number would have returned an error. Notice the difference between TO_CHAR and the CHR function discussed earlier. CHR would have turned this number into a character or a symbol, depending on the character set.
5.2. TO_NUMBER
TO_NUMBER is the companion function to TO_CHAR, and of course, it converts a string into a number. For example: SQL> SELECT CODE, CODE * TO_NUMBER(CHARCODE) FROM NUMCHAR; CODE CODE * TO_NUMBER(CHARCODE) ---- -------------------------32 480 67 2010 65 1625 87 3480 77 2695 52 1040
Analysis:
This test would have returned an error if TO_NUMBER had returned a character.
5.3. TO_DATE
To converts character string to data format for this task we use TO_DATE function. SQL> SELECT ENAME,HIREDATE FROM EMP WHERE HIREDATE = TO_DATE('February 22, 1981','Month dd, YYYY');
Analysis:
February 22, 1981 is a character string and cant be compared with a Data format column, for this we have to convert the character string value ('February 22, 1981') into a date format value. This is done by the TO_DATE function as we have seen in the above example. This test would have returned an error if TO_DATE had returned a character.
5.4 NVL
To convert a null value to an actual value, use NVL function to convert data types like date, character and number.
Data Type
NUMBER DATE CHAR or VARCHAR2
Conversion Example
NVL(number_column,9) NVL(date_column,01-JAN-06) NVL(character_column,Unavailable)
SQL> SELECT SAL, COMM, SAL+NVL(COMM,0) FROM EMP; SAL COMM SAL+NVL(COMM,0) ---------- ---------- --------------800 800 1600 300 1900 1250 500 1750 2975 2975 1250 1400 2650 2850 2850 2450 2450 3000 3000 5000 5000 1500 0 1500 1100 1100 950 950 3000 3000 1300 1300 If we dont use NVL function the result will be as shown in the below example: SQL> SELECT SAL, SAL+COMM FROM EMP; SAL SAL+COMM ---------- ---------800 1600 1900 1250 1750 2975 1250 2650 2850 2450 3000 5000 1500 1500 1100 950 3000 1300
5.5. DECODE
The decode function decodes an expression in a way similar to the IF-THEN-ELSE logic used in various languages. The DECODE function decodes expression after comparing it to each search value. If the expression is same as search, result is returned. The syntax for the DECODE function is: SQL> DECODE(EXPRESSION, SEARCH, RESULT[, SEARCH, RESULT]... [, DEFAULT]) SQL> SELECT ENAME, JOB, SAL, DECODE(JOB, 'CLERK', SAL*1.1, 'MANAGER', SAL*1.15, 'SALESMAN', SAL*1.20,SAL) "REVISED_SALARY" FROM EMP; ENAME ---------SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER JOB SAL REVISED_SALARY --------- ---------- -------------CLERK 800 880 SALESMAN 1600 1920 SALESMAN 1250 1500 MANAGER 2975 3421.25 SALESMAN 1250 1500 MANAGER 2850 3277.5 MANAGER 2450 2817.5 ANALYST 3000 3000 PRESIDENT 5000 5000 SALESMAN 1500 1800 CLERK 1100 1210 CLERK 950 1045 ANALYST 3000 3000 CLERK 1300 1430
14 rows selected.
Analysis:
The above decode statement is equivalent to the following IF-THEN-ELSE statement: IF JOB = CLERK THEN SAL = SAL * 1.10; ELSIF JOB = MANAGER THEN SAL = SAL * 1.15; ELSIF JOB = SALESMAN THEN SAL = SAL * 1.20; ELSE SAL = SAL'; END IF; The DECODE function will compare each JOB value, one by one.
6. Miscellaneous Functions
Here are three miscellaneous functions we may find useful.
SALES SALES
Analysis:
Notice GREATEST found the word closest to the end of the alphabet. Notice also a seemingly unnecessary FROM and three occurrences of SALES. If FROM is missing, we will get an error. Every SELECT needs a FROM. The particular table used in the FROM has three rows, so the function in the SELECT clause is performed for each of them. SQL> SELECT LEAST(10, 20, 30, 40) FROM CONVERT; LEAST(10, 20, 30, 40) ----------------------10 10 10 10 As we can see, GREATEST and LEAST also work with numbers.
6.2. USER
USER returns the character name of the current user of the database. SQL> SELECT USER FROM DEPT; USER -----------------------------WILSHIRE WILSHIRE WILSHIRE WILSHIRE There really is only one of us. Again, the echo occurs because of the number of rows in the table. USER is similar to the date functions explained earlier today. Even though USER is not an actual column in the table, it is selected for each row that is contained in the table.
Clauses in SQL
WHERE ORDER BY GROUP BY HAVING To get a feel for where these functions fit in, examine the general syntax for a SELECT statement:
Syntax:
SELECT [DISTINCT | ALL] { * | { [schema.]{table | view | snapshot}.* | expr } [ [AS] c_alias ] [, { [schema.]{table | view | snapshot}.* | expr } [ [AS] c_alias ] ] ... } FROM [schema.]{table | view | snapshot}[@dblink] [t_alias] [, [schema.]{table | view | snapshot}[@dblink] [t_alias] ] ... [WHERE condition ] [GROUP BY expr [, expr] ... [HAVING condition] ] [{UNION | UNION ALL | INTERSECT | MINUS} SELECT command ] [ORDER BY {expr|position} [ASC | DESC] [, {expr|position} [ASC | DESC]] ...] Note: In my experience with SQL, the ANSI standard is really more of an ANSI "suggestion." The preceding syntax will generally work with any SQL engine, but we may find some slight variations.
1. WHERE Clause
Using just SELECT and FROM, we are limited to returning every row in a table. For example, using these two key words on the EMP table, we get all seven rows: With WHERE in our vocabulary, we can be more selective. To find all the employees we wrote with a value of more than 2000 salary, write this: SQL> SELECT * FROM EMP WHERE SAL > 2000; The WHERE clause returns the six instances in the table that meets the required condition: EMPNO ----7566 7698 7782 7788 7839 7902 ENAME -------JONES BLAKE CLARK SCOTT KING FORD JOB --------MANAGER MANAGER MANAGER ANALYST PRESIDENT ANALYST MGR ---7839 7839 7839 7566 7566 HIREDATE --------02-APR-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 03-DEC-81 SAL ---2975 2850 2450 3000 5000 3000 COMM DEPTNO ----- -----20 30 10 20 10 20
WHERE can also be given on the following table with a string condition, we can ask question that, Where's SMITH? SQL> SELECT EMPNO AS "WHERE'S SMITH?" FROM EMP WHERE ENAME = 'SMITH'; WHERE'S SMITH? -------------7369 This query shows that the column used in the condition of the WHERE statement does not have to be mentioned in the SELECT clause. In this example we selected the empno column but used WHERE on the name, which is perfectly legal. Also notice the AS on the SELECT line. AS is an optional assignment operator, assigning the alias WHERE'S SMITH? to EMPNO. We might never see AS again, because it involves extra typing. SQL> SELECT EMPNO "WHERE'S SMITH?"
FROM EMP WHERE ENAME = 'SMITH'; And get the same result as the previous query without using AS: WHERE'S SMITH? -------------7369 After SELECT and FROM, WHERE is the third most frequently used SQL term.
2. ORDER BY Clause
From time to time we presented the results of our query in some kind of order. As we know, however, SELECT FROM gives us a listing, and unless we have defined a primary key, our query comes out in the order we entered. Consider a beefed-up CHECKS table: The ORDER BY clause gives us a way of ordering our results. For example, to order the preceding listing by EMPNO, we would use the following ORDER BY clause: SQL> SELECT * FROM EMP ORDER BY EMPNO; EMPNO ----7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934 ENAME -------SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER JOB --------CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK MGR ---7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782 HIREDATE --------17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82 SAL ---800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300 COMM DEPTNO ----- -----20 300 30 500 30 20 1400 30 30 10 20 10 0 30 20 30 20 10
14 rows selected. Now the data is ordered the way we want it, not the way in which it was entered. As the following example shows, ORDER requires BY; BY is not optional. SQL> SELECT * FROM EMP ORDER EMPNO; SELECT * FROM EMP ORDER EMPNO * ERROR at line 1: ORA-00924: missing BY keyword What if we want to list the data in reverse order, with the highest number or letter first? We're in luck! The following query generates a list of EMPLOYEES that stars at the end of the alphabet: SQL> SELECT * FROM EMP ORDER BY ENAME DESC; EMPNO ----7521 7844 7369 7788 7934 ENAME -------WARD TURNER SMITH SCOTT MILLER JOB --------SALESMAN SALESMAN CLERK ANALYST CLERK MGR ---7698 7698 7902 7566 7782 HIREDATE --------22-FEB-81 08-SEP-81 17-DEC-80 19-APR-87 23-JAN-82 SAL ---1250 1500 800 3000 1300 COMM DEPTNO ----- -----500 30 0 30 20 20 10
1400
300
30 10 20 30 20 10 30 30 20
14 rows selected.
Analysis:
The DESC at the end of the ORDER BY clause orders the list in descending order instead of the default (ascending) order. The rarely used, optional keyword ASC appears in the following statement: SQL> SELECT ENAME, HIREDATE FROM EMP ORDER BY EMPNO ASC; 4 ---------- ---------SMITH 17-DEC-80 ALLEN 20-FEB-81 WARD 22-FEB-81 JONES 02-APR-81 MARTIN 28-SEP-81 BLAKE 01-MAY-81 CLARK 09-JUN-81 SCOTT 19-APR-87 KING 17-NOV-81 TURNER 08-SEP-81 ADAMS 23-MAY-87 JAMES 03-DEC-81 FORD 03-DEC-81 MILLER 23-JAN-82 14 rows selected.
Analysis:
The ordering in this list is identical to the ordering of the list at the beginning of the section (without ASC) because ASC is the default. This query also shows that the expression used after the ORDER BY clause does not have to be in the SELECT statement. Although we selected only ENAME and HIREDATE, we were still able to order the list by EMP. We can also use ORDER BY on more than one field. To order EMP by ENAME and HIREDATE, the query as follows: SQL> SELECT * FROM EMP ORDER BY ENAME, HIREDATE; EMPNO ----7876 7499 7698 7782 7902 7900 7566 7839 7654 7934 7788 7369 ENAME -------ADAMS ALLEN BLAKE CLARK FORD JAMES JONES KING MARTIN MILLER SCOTT SMITH JOB --------CLERK SALESMAN MANAGER MANAGER ANALYST CLERK MANAGER PRESIDENT SALESMAN CLERK ANALYST CLERK MGR ---7788 7698 7839 7839 7566 7698 7839 7698 7782 7566 7902 HIREDATE --------23-MAY-87 20-FEB-81 01-MAY-81 09-JUN-81 03-DEC-81 03-DEC-81 02-APR-81 17-NOV-81 28-SEP-81 23-JAN-82 19-APR-87 17-DEC-80 SAL ---1100 1600 2850 2450 3000 950 2975 5000 1250 1300 3000 800 COMM DEPTNO ----- -----20 300 30 30 10 20 30 20 10 1400 30 10 20 20
7844 7521
TURNER WARD
SALESMAN SALESMAN
1500 1250
0 500
30 30
14 rows selected.
Analysis:
In the previous ORDER BY, the EMP was in the order 7876 to 7521. Adding the field HIREDATE to the ORDER BY clause puts the entries in alphabetical order according to HIREDATE. Does the order of multiple columns in the ORDER BY clause make a difference? Try the same query again but reverse ENAME and HIREDATE: SQL> SELECT * FROM EMP ORDER BY HIREDATE, ENAME; EMPNO ----7369 7499 7521 7566 7698 7782 7844 7654 7839 7902 7900 7934 7788 7876 ENAME -------SMITH ALLEN WARD JONES BLAKE CLARK TURNER MARTIN KING FORD JAMES MILLER SCOTT ADAMS JOB --------CLERK SALESMAN SALESMAN MANAGER MANAGER MANAGER SALESMAN SALESMAN PRESIDENT ANALYST CLERK CLERK ANALYST CLERK MGR ---7902 7698 7698 7839 7839 7839 7698 7698 7566 7698 7782 7566 7788 HIREDATE --------17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 01-MAY-81 09-JUN-81 08-SEP-81 28-SEP-81 17-NOV-81 03-DEC-81 03-DEC-81 23-JAN-82 19-APR-87 23-MAY-87 SAL ---800 1600 1250 2975 2850 2450 1500 1250 5000 3000 950 1300 3000 1100 COMM DEPTNO ----- -----20 300 30 500 30 20 30 10 0 30 1400 30 10 20 30 10 20 20
14 rows selected.
Analysis:
As we probably guessed, the results are completely different. Here's how to list one column in alphabetical order and list the second column is not in alphabetical order: SQL> SELECT * FROM EMP ORDER BY ENAME ASC, HIREDATE DESC; EMPNO ----7876 7499 7698 7782 7902 7900 7566 7839 7654 7934 7788 7369 7844 7521 ENAME -------ADAMS ALLEN BLAKE CLARK FORD JAMES JONES KING MARTIN MILLER SCOTT SMITH TURNER WARD JOB --------CLERK SALESMAN MANAGER MANAGER ANALYST CLERK MANAGER PRESIDENT SALESMAN CLERK ANALYST CLERK SALESMAN SALESMAN MGR ---7788 7698 7839 7839 7566 7698 7839 7698 7782 7566 7902 7698 7698 HIREDATE --------23-MAY-87 20-FEB-81 01-MAY-81 09-JUN-81 03-DEC-81 03-DEC-81 02-APR-81 17-NOV-81 28-SEP-81 23-JAN-82 19-APR-87 17-DEC-80 08-SEP-81 22-FEB-81 SAL ---1100 1600 2850 2450 3000 950 2975 5000 1250 1300 3000 800 1500 1250 COMM DEPTNO ----- -----20 300 30 30 10 20 30 20 10 1400 30 10 20 20 0 30 500 30
14 rows selected.
Analysis:
In this example ENAME is sorted alphabetically, and HIREDATE appears in descending order.
Tip: If we know that a column we want to order our results by is the first column in a table, then we can type ORDER BY 1 in place of spelling out the column name. See the following example. SQL> SELECT * FROM EMP ORDER BY 1; EMPNO ----7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934 ENAME -------SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER JOB --------CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK MGR ---7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782 HIREDATE --------17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82 SAL ---800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300 COMM DEPTNO ----- -----20 300 30 500 30 20 1400 30 30 10 20 10 0 30 20 30 20 10
14 rows selected.
Analysis:
This result is identical to the result produced by the SELECT statement that we used earlier today: SQL> SELECT * FROM EMP ORDER BY EMPNO;
4. GROUP BY Clause
How to use aggregate functions (COUNT, SUM, AVG, MIN, and MAX). If we wanted to find the total amount of money spent on EMP table, we would type: Then we would type: SQL> SELECT SUM(SAL) FROM EMP; SUM --------------29025
Analysis:
This statement returns the sum of the column SAL. What if we wanted to find out how much we have spent on each EMPNO? SQL helps us with the GROUP BY clause. To find out whom we have paid and how much, we would query like this: SQL> SELECT DEPTNO, SUM(SAL) FROM EMP GROUP BY DEPTNO; DEPTNO SUM(SAL) ---------- ---------10 8750 20 10875 30 9400
Analysis:
The SELECT clause has a normal column selection, DEPTNO, followed by the aggregate function SUM(AMOUNT). If we had tried this query with only the FROM EMP that follows, here's what we would see:
SQL> SELECT DEPTNO, SUM(SAL) FROM EMP; Dynamic SQL Error -SQL error code - -104 -invalid column reference
Analysis:
SQL is complaining about combination of the normal column and the aggregate function. This condition requires the GROUP BY clause. GROUP BY runs the aggregate function described in the SELECT statement for each grouping of the column that follows the GROUP BY clause. The table EMP returned 14 rows when queried with SELECT * FROM EMP. The query on the same table, SELECT DEPTNO, SUM(SAL) FROM EMP GROUP BY DEPTNO, took the 14 rows in the table and made 3 groupings, returning the SUM of each grouping. Suppose we wanted to know how many employees are working under which Manager along with total salary. Can we use more than one aggregate function? SQL> SELECT MGR, SUM(SAL), COUNT(EMPNO) FROM EMP GROUP BY MGR; MGR SUM(SAL) COUNT(EMPNO) ---------- ---------- -----------7566 6000 2 7698 6550 5 7782 1300 1 7788 1100 1 7839 8275 3 7902 800 1 5000 1 7 rows selected.
Analysis:
This SQL is becoming increasingly useful. In the preceding example, we were able to perform group functions on unique groups using the GROUP BY clause. Also notice that the results were ordered by MGR. GROUP BY also acts like the ORDER BY clause. What would happen if we tried to group by more than one column? Try this: SQL> SELECT MGR, SUM(SAL), COUNT(EMPNO) FROM EMP GROUP BY MGR, DEPTNO; MGR SUM(SAL) COUNT(EMPNO) ---------- ---------- -----------5000 1 7566 6000 2 7698 6550 5 7782 1300 1 7788 1100 1 7839 2450 1 7839 2975 1 7839 2850 1 7902 800 1 9 rows selected.
Analysis:
The output has gone from 3 groupings of 14 rows to 9 groupings. What is different about the one grouping with more than one Employee associated with it? Look at the entries for MGR: SQL> SELECT MGR, DEPTNO FROM EMP WHERE MGR = 7698;
Analysis:
We see that the combination of MGR and DEPTNO creates identical entities, which SQL groups together into one line with the GROUP BY clause. The other rows produce unique combinations of MGR and DEPTNO and are assigned their own unique groupings. The next example finds the largest and smallest salaries, grouped by DEPTNO: SQL> SELECT MIN(SAL), MAX(SAL) FROM EMP GROUP BY DEPTNO; MIN(SAL) MAX(SAL) ---------- ---------1300 5000 800 3000 950 2850 Here's what will happen if we try to include in the SELECT statement a column that has several different values within the group formed by GROUP BY: SQL> SELECT MGR, MAX(SAL), MIN(SAL) FROM EMP GROUP BY DEPTNO; select mgr,min(sal),max(sal) from emp group by deptno * ERROR at line 1: ORA-00979: not a GROUP BY expression
Analysis:
This query tries to group EMP by DEPTNO. When the query finds two records with the same DEPTNO but different MGRs, such as the rows that have 30 as a DEPTNO but have MGRs of LOCAL UTILITIES and 30, it throws an error. The rule is, don't use the SELECT statement on columns that have multiple values for the GROUP BY clause column. The reverse is not true. We can use GROUP BY on columns not mentioned in the SELECT statement. For example: SQL> SELECT MGR, COUNT(SAL) FROM EMP GROUP BY MGR, SAL; MGR COUNT(SAL) ---------- ---------1 7566 2 7698 1 7698 1 7698 1 7698 2 7782 1 7788 1 7839 1 7839 1 7839 1 7902 1 12 rows selected.
Analysis:
This simple query shows how many EMP have identical amounts to the same MGR. Its real purpose is to show that we can use SAL in the GROUP BY clause, even though it is not mentioned in the SELECT clause. Try moving SAL out of the GROUP BY clause and into the SELECT clause, like this: SQL> SELECT MGR, SAL, COUNT(SAL) FROM EMP GROUP BY MGR; select mgr,sal,count(sal) from emp group by mgr * ERROR at line 1: ORA-00979: not a GROUP BY expression
Analysis:
SQL cannot run the query, which makes sense if we play the part of SQL for a moment. Say we had to group the following lines: SQL> SELECT MGR, SAL, ENAME FROM EMP WHERE MGR = 7698; MGR SAL ENAME ---------- ---------- ---------7698 1600 ALLEN 7698 1250 WARD 7698 1250 MARTIN 7698 1500 TURNER 7698 950 JAMES If the user asked we to output all 3 columns and group by MGR only, where would we put the unique ENAME? Remember we have only one row per group when we use GROUP BY. SQL can't do two things at once, so it complains: Error #31: Can't do two things at once.
5. HAVING Clause
How can we qualify the data used in our GROUP BY clause? Use the same table EMP and try this: If we wanted to group the output into divisions and show the average salary in each division, we would type: SQL> SELECT DEPTNO, AVG(SAL) FROM EMP GROUP BY DEPTNO; DEPTNO AVG(SAL) ---------- ---------10 2916.66667 20 2175 30 1566.66667 The following statement qualifies this query to return only those departments with average salaries under 2500: SQL> SELECT DEPTNO, AVG(SAL) FROM EMP WHERE AVG(SAL) < 2500 GROUP BY DEPTNO; SELECT DEPTNO,AVG(SAL) FROM EMP WHERE AVG(SAL) < 2500 GROUP BY DEPTNO * ERROR at line 1: ORA-00934: group function is not allowed here
Analysis:
This error occurred because WHERE does not work with aggregate functions. To make this query work, we need HAVING clause. If we type the following query, we get what we ask for:
SQL> SELECT DEPTNO, AVG(SAL) FROM EMP GROUP BY DEPTNO HAVING AVG(SAL) < 2500; DEPTNO AVG(SAL) ---------- ---------20 2175 30 1566.66667
Analysis:
HAVING enables us to use aggregate functions in a comparison statement, providing for aggregate functions what WHERE provides for individual rows. Does HAVING works with non-aggregate expressions? Try this: SQL> SELECT DEPTNO, AVG(SAL),SAL FROM EMP GROUP BY DEPTNO, SAL HAVING SAL < 2500; DEPTNO AVG(SAL) SAL ---------- ---------- ---------10 1300 1300 10 2450 2450 20 800 800 20 1100 1100 30 1500 1500 30 1600 1600 30 950 950 30 1250 1250 8 rows selected.
Analysis:
Why is this result different from last query? HAVING AVG(SAL) < 2500 clause evaluated each grouping and returned only those with an average salary of under 2500, just what we expected. HAVING SAL < 2500, on other hand, had a different outcome. If the user asks us to evaluate and return groups of divisions where SAL < 2500, we would examine each group and reject those where an individual SAL is greater than 2500. Can we use more than one condition in our HAVING clause? Try this: SQL> SELECT DEPTNO, AVG(SAL),AVG(COMM) FROM EMP GROUP BY DEPTNO HAVING AVG(SAL)<2500 AND AVG(COMM)<1000; DEPTNO AVG(SAL) AVG(COMM) ---------- ---------- ---------30 1566.66667 550
Analysis:
The following table is grouped by DEPTNO. It shows all the teams with SAL averages below 2500 and COMM averages below 1000. We can also use an aggregate function in the HAVING clause that was not in the SELECT statement. For example: SQL> SELECT DEPTNO, AVG(SAL),AVG(COMM) FROM EMP GROUP BY DEPTNO HAVING COUNT(DEPTNO) > 3; DEPTNO AVG(SAL) AVG(COMM) ---------- ---------- ---------20 2175
30 1566.66667
550
Analysis:
This query returns the number of DEPTNOs with more than 3 departments. COUNT(DEPTNO) is not used in the SELECT statement but still functions as expected in the HAVING clause. The other logical operators all work well within the HAVING clause. Consider this: SQL> SELECT DEPTNO,MIN(SAL),MAX(SAL) FROM EMP GROUP BY DEPTNO HAVING AVG(SAL) > 500 OR MIN(SAL) > 500; DEPTNO MIN(SAL) MAX(SAL) ---------- ---------- ---------10 1300 5000 20 800 3000 30 950 2850 The operator IN also works in a HAVING clause, as demonstrated here: SQL> SELECT DEPTNO,AVG(SAL) FROM EMP GROUP BY DEPTNO HAVING DEPTNO IN (10,30); DEPTNO AVG(SAL) ---------- ---------10 2916.66667 30 1566.66667 Always follow a select with aggregate function with a group clause. We can order by a column which is not in the select clause but we cannot forget the term in SELECT clause in the ORDER BY clause. So also we need to have a GROUP BY clause when we specify an aggregate function in the SELECT clause otherwise it throws an error. We can avoid this by putting a where on 1 select attribute and a GROUP BY or ORDER BY on the other.
6. Combining Clauses
Through some composite we demonstrate how combinations of clauses perform together.
Example 6.1
Find all the employees written for MGR and DEPT in the EMP table and order them by ENAME. SQL> SELECT MGR, ENAME FROM EMP WHERE MGR = 7698 OR ENAME LIKE 'A%' ORDER BY ENAME; MGR ---------7788 7698 7698 7698 7698 7698 ENAME ---------ADAMS ALLEN JAMES MARTIN TURNER WARD
6 rows selected.
Analysis:
Note the use of LIKE to find the ENAME that started with A. With the use of OR, data was returned if the WHERE clause met either one of the two conditions. What if we asked for the same information and group it by MGR? The query would look something like this: SQL> SELECT MGR, ENAME
FROM EMP WHERE MGR = 7698 OR ENAME LIKE 'A%' GROUP BY MGR ORDER BY ENAME;
Analysis:
This query would not work because the SQL engine would not know what to do with the remarks. Remember that whatever columns we put in the SELECT clause must also be in the GROUP BY clause--unless we don't specify any columns in the SELECT clause.
Example 6.2
Using the table EMP, find the salary of everyone with less than 2500. Order the results by ENAME. SQL> SELECT ENAME, SAL FROM EMP WHERE SAL < 2500 ORDER BY ENAME; ENAME SAL ---------- ---------ADAMS 1100 ALLEN 1600 CLARK 2450 JAMES 950 MARTIN 1250 MILLER 1300 SMITH 800 TURNER 1500 WARD 1250 9 rows selected.
Analysis:
This query is straight forward and enables us to use our new found skills with WHERE and ORDER BY.
Example 6.3
Again, using EMP, display DEPTNO, AVG(SAL) and AVG(COMM) on each DEPTNO: SQL> SELECT DEPTNO,AVG(SAL),AVG(COMM) FROM EMP GROUP BY DEPTNO; DEPTNO AVG(SAL) AVG(COMM) ---------- ---------- ---------10 2916.66667 20 2175 30 1566.66667 550
1400
300
30 10 20 30 20 10 30 30 20
GROUP BY and HAVING are normally seen in the company of aggregates: SQL> SELECT MGR,SUM(SAL) TOTAL,COUNT(EMPNO) NUMBER_WRITTEN FROM EMP GROUP BY MGR HAVING SUM(SAL) < 2500; MGR TOTAL NUMBER_WRITTEN ---------- ---------- -------------7782 1300 1 7788 1100 1 7902 800 1 We have seen that combining these two groups of clauses can have unexpected results, including the following: SQL> SELECT MGR,SUM(SAL) TOTAL,COUNT(EMPNO) NUMBER_WRITTEN FROM EMP WHERE SAL >= 500 GROUP BY MGR HAVING SUM(SAL) < 2500; MGR TOTAL NUMBER_WRITTEN ---------- ---------- -------------7782 1300 1 7788 1100 1 7902 800 1 Compare these two result sets and examine the raw data: SQL> SELECT MGR, SAL FROM EMP ORDER BY MGR; MGR SAL ---------- ---------7566 3000 7566 3000 7698 1600 7698 1250 7698 950 7698 1500 7698 1250 7782 1300 7788 1100 7839 2975 7839 2450 7839 2850 7902 800 5000 14 rows selected.
Analysis:
We see how WHERE clause filtered out all employees less than 500 before the GROUP BY was performed on the query. We are not trying to tell we not to mix these groups--we may have a requirement that this sort of construction will meet. However, we should not casually mix aggregate and non-aggregate functions.
Joins
This information will enable us to gather and manipulate data across several tables. We will understand and be able to do the following: Perform an outer join Perform a left join Perform a right join Perform an equi-join Perform a non-equi-join Join a table to itself
1. Introduction
One of the most powerful features of SQL is its capability to gather and manipulate data from across several tables. Without this feature we would have to store all the data elements necessary for each application in one table. Without common tables we would need to store the same data in several tables. Imagine having to redesign, rebuild, and repopulate our tables and databases every time our user needed a query with a new piece of information. The JOIN statement of SQL enables us to design smaller, more specific tables that are easier to maintain than larger tables.
SQL> SELECT * FROM DEPT DEPTNO -----10 20 30 40 DNAME -------------ACCOUNTING RESEARCH SALES OPERATIONS LOC ------------NEW YORK DALLAS CHICAGO BOSTON
EMPNO ----7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934 7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934 7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934 7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934
ENAME ----SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER
JOB --------CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK
MGR ---7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782 7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782 7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782 7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782
HIREDATE --------17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82 17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82 17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82 17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82
SAL ---800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300 800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300 800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300 800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300
COMM DEPTNO DEPTNO ---- ------ -----20 10 300 30 10 500 30 10 20 10 1400 30 10 30 10 10 10 20 10 10 10 0 30 10 20 10 30 10 20 10 10 10 20 20 300 30 20 500 30 20 20 20 1400 30 20 30 20 10 20 20 20 10 20 0 30 20 20 20 30 20 20 20 10 20 20 30 300 30 30 500 30 30 20 30 1400 30 30 30 30 10 30 20 30 10 30 0 30 30 20 30 30 30 20 30 10 30 20 40 300 30 40 500 30 40 20 40 1400 30 40 30 40 10 40 20 40 10 40 0 30 40 20 40 30 40 20 40 10 40
DNAME ---------ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING ACCOUNTING RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS
LOC -------NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK NEW YORK DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON
56 rows selected. Fifty-six rows! Where did they come from? And what kind of join is this?
Analysis:
A close examination of the result of our first join shows that each row from EMP was added to each row from DEPT. An extract from this join shows what happened: EMPNO ----7369 7369 7369 7369 ... ENAME ----SMITH SMITH SMITH SMITH JOB --------CLERK CLERK CLERK CLERK MGR ---7902 7902 7902 7902 HIREDATE --------17-DEC-80 17-DEC-80 17-DEC-80 17-DEC-80 SAL ---800 800 800 800 COMM DEPTNO DEPTNO ---- ------ -----20 10 20 20 20 30 20 40 DNAME ---------ACCOUNTING RESEARCH SALES OPERATIONS LOC -------NEW YORK DALLAS CHICAGO BOSTON
Notice how each row in DEPT was combined with row 1 in EMP. Congratulations! We have performed our first join. But what kind of join? An inner join? an outer join? or what? Well, actually this type of join is called a cross-join. A cross-join is not normally as useful as the other joins covered today, but this join does illustrate the basic combining property of all joins: Joins bring tables together. Suppose we sold parts to bike shops for a living. When we designed our database, we built one big table with all the pertinent columns. Every time we had a new requirement, we added a new column or started a new table with all the old data plus the new data required to create a specific query. Eventually, our database would collapse from its own weight-not a pretty sight. An alternative design, based on a relational model, would have we put all related data into one table. Here's how our customer table would look: SQL> SELECT * FROM CUSTOMER; NAME ---------TRUE WHEEL BIKE SPEC LE SHOPPE AAA BIKE JACKS BIKE ADDRESS ---------55O HUSKER CPT SHRIVE HOMETOWN 10 OLDTOWN 24 EGLIN STATE -----NE LA KS NE FL ZIP ---------58702 45678 54678 56784 34567 PHONE REMARKS --------- ---------555-4545 NONE 555-1234 NONE 555-1278 NONE 555-3421 JOHN-MGR 555-2314 NONE
Analysis:
This table contains all information we need to describe our customers. The items we sold would go into another table: SQL> SELECT * FROM PART; PARTNUM ----------54 42 46 23 76 10 DESCRIPTION PRICE -------------------- ----------PEDALS 54.25 SEATS 24.50 TIRES 15.25 MOUNTAIN BIKE 350.45 ROAD BIKE 530.00 TANDEM 1200.00
And the orders we take would have their own table: SQL> SELECT * FROM ORDERS; ORDEREDON ----------15-MAY-1996 19-MAY-1996 2-SEP-1996 30-JUN-1996 30-JUN-1996 30-MAY-1996 30-MAY-1996 17-JAN-1996 17-JAN-1996 1-JUN-1996 1-JUN-1996 1-JUL-1996 1-JUL-1996 11-JUL-1996 NAME PARTNUM QUANTITY REMARKS ---------- ----------- ----------- ------TRUE WHEEL 23 6 PAID TRUE WHEEL 76 3 PAID TRUE WHEEL 10 1 PAID TRUE WHEEL 42 8 PAID BIKE SPEC 54 10 PAID BIKE SPEC 10 2 PAID BIKE SPEC 23 8 PAID BIKE SPEC 76 11 PAID LE SHOPPE 76 5 PAID LE SHOPPE 10 3 PAID AAA BIKE 10 1 PAID AAA BIKE 76 4 PAID AAA BIKE 46 14 PAID JACKS BIKE 76 14 PAID
One advantage of this approach is that we can have three specialized people or departments responsible for maintaining their own data. We don't need a database administrator who is proficient with all aspects of our project to shepherd one gigantic, multi-departmental database. Another advantage is that in the age of networks, each table could reside on a different machine. Now join PARTS and ORDERS: SQL> SELECT O.ORDEREDON, O.NAME, O.PARTNUM, P.PARTNUM, P.DESCRIPTION FROM ORDERS O, PART P; ORDEREDON ----------15-MAY-1996 19-MAY-1996 2-SEP-1996 30-JUN-1996 30-JUN-1996 30-MAY-1996 30-MAY-1996 17-JAN-1996 17-JAN-1996 1-JUN-1996 1-JUN-1996 1-JUL-1996 1-JUL-1996 11-JUL-1996 ... NAME PARTNUM ---------- ----------TRUE WHEEL 23 TRUE WHEEL 76 TRUE WHEEL 10 TRUE WHEEL 42 BIKE SPEC 54 BIKE SPEC 10 BIKE SPEC 23 BIKE SPEC 76 LE SHOPPE 76 LE SHOPPE 10 AAA BIKE 10 AAA BIKE 76 AAA BIKE 46 JACKS BIKE 76 PARTNUM --------54 54 54 54 54 54 54 54 54 54 54 54 54 54 DESCRIPTION -----------PEDALS PEDALS PEDALS PEDALS PEDALS PEDALS PEDALS PEDALS PEDALS PEDALS PEDALS PEDALS PEDALS PEDALS
Analysis:
The preceding code is just a portion of the result set. The actual set is 14 (number of rows in ORDERS) x 6 (number of rows in PART), or 84 rows. It is similar to the result from joining EMP and DEPT on the top, and it is still one statement shy of being useful. Before we reveal that statement, we need to regress a little and talk about another use for the alias.
3. Equi-Joins
An extract from the PART/ORDERS join provides a clue as to what is missing: 30-JUN-1996 TRUE WHEEL 30-JUN-1996 BIKE SPEC 30-MAY-1996 BIKE SPEC 42 54 10 54 PEDALS 54 PEDALS 54 PEDALS
Notice the PARTNUM fields that are common to both tables. What if we wrote the following? SQL> SELECT O.ORDEREDON, O.NAME, O.PARTNUM, P.PARTNUM, P.DESCRIPTION
FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM; ORDEREDON ----------1-JUN-1996 30-MAY-1996 2-SEP-1996 1-JUN-1996 30-MAY-1996 15-MAY-1996 30-JUN-1996 1-JUL-1996 30-JUN-1996 1-JUL-1996 17-JAN-1996 19-MAY-1996 11-JUL-1996 17-JAN-1996 NAME PARTNUM ---------- ----------AAA BIKE 10 BIKE SPEC 10 TRUE WHEEL 10 LE SHOPPE 10 BIKE SPEC 23 TRUE WHEEL 23 TRUE WHEEL 42 AAA BIKE 46 BIKE SPEC 54 AAA BIKE 76 BIKE SPEC 76 TRUE WHEEL 76 JACKS BIKE 76 LE SHOPPE 76 PARTNUM --------10 10 10 10 23 23 42 46 54 76 76 76 76 76 DESCRIPTION -------------TANDEM TANDEM TANDEM TANDEM MOUNTAIN BIKE MOUNTAIN BIKE SEATS TIRES PEDALS ROAD BIKE ROAD BIKE ROAD BIKE ROAD BIKE ROAD BIKE
Analysis:
Using the column PARTNUM that exists in both of the preceding tables, we have just combined the information we had stored in the ORDERS table with information from the PART table to show a description of parts the bike shops have ordered from us. The join that was used is called an EQUI-JOIN because the goal is to match the values of a column in one table to the corresponding values in the second table. We can further qualify this query by adding more conditions in the WHERE clause. For example: SQL> SELECT O.ORDEREDON, O.NAME, O.PARTNUM, P.PARTNUM, P.DESCRIPTION FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM AND O.PARTNUM = 76; ORDEREDON ----------1-JUL-1996 17-JAN-1996 19-MAY-1996 11-JUL-1996 17-JAN-1996 NAME PARTNUM ---------- ----------AAA BIKE 76 BIKE SPEC 76 TRUE WHEEL 76 JACKS BIKE 76 LE SHOPPE 76 PARTNUM ---------76 76 76 76 76 DESCRIPTION -----------ROAD BIKE ROAD BIKE ROAD BIKE ROAD BIKE ROAD BIKE
The number 76 is not very descriptive, and we wouldn't want our sales people to have to memorize a part number. (We have had the misfortune to see many data information systems in the field that require the end user to know some obscure code for something that had a perfectly good name). Here's another way to write the query: SQL> SELECT O.ORDEREDON, O.NAME, O.PARTNUM, P.PARTNUM, P.DESCRIPTION FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM AND P.DESCRIPTION = 'ROAD BIKE'; ORDEREDON ----------1-JUL-1996 17-JAN-1996 19-MAY-1996 11-JUL-1996 17-JAN-1996 NAME PARTNUM ---------- ----------AAA BIKE 76 BIKE SPEC 76 TRUE WHEEL 76 JACKS BIKE 76 LE SHOPPE 76 PARTNUM ---------76 76 76 76 76 DESCRIPTION -----------ROAD BIKE ROAD BIKE ROAD BIKE ROAD BIKE ROAD BIKE
Find out how much money we have made from selling road bikes: SQL> SELECT SUM(O.QUANTITY * P.PRICE) TOTAL FROM ORDERS O, PART P WHERE O.PARTNUM - P.PARTNUM AND P.DESCRIPTION - 'ROAD BIKE'; TOTAL ----------19610.00
Analysis:
With this setup, the sales people can keep the ORDERS table updated, the production department can keep the PART table current, and we can find our bottom line without redesigning our database. Note: Notice the consistent use of table and column aliases in the SQL statement examples. We will save many, many keystrokes by using aliases. They also help to make our statement more readable. Can we join more than one table? For ex, to generate information to send out an invoice, we could type this statement: SQL> SELECT C.NAME, C.ADDRESS, (O.QUANTITY * P.PRICE) TOTAL FROM ORDER O, PART P, CUSTOMER C WHERE O.PARTNUM = P.PARTNUM AND O.NAME = C.NAME; When 3 tables are used equivalence on all three has to be specified. NAME ---------TRUE WHEEL BIKE SPEC LE SHOPPE AAA BIKE TRUE WHEEL BIKE SPEC TRUE WHEEL AAA BIKE BIKE SPEC TRUE WHEEL BIKE SPEC JACKS BIKE LE SHOPPE AAA BIKE ADDRESS TOTAL ---------- ----------55O HUSKER 1200.00 CPT SHRIVE 2400.00 HOMETOWN 3600.00 10 OLDTOWN 1200.00 55O HUSKER 2102.70 CPT SHRIVE 2803.60 55O HUSKER 196.00 10 OLDTOWN 213.50 CPT SHRIVE 542.50 55O HUSKER 1590.00 CPT SHRIVE 5830.00 24 EGLIN 7420.00 HOMETOWN 2650.00 10 OLDTOWN 2120.00
We could make the output more readable by writing the statement like this: SQL> SELECT C.NAME, C.ADDRESS, O.QUANTITY * P.PRICE TOTAL FROM ORDERS O, PART P, CUSTOMER C WHERE O.PARTNUM = P.PARTNUM AND O.NAME = C.NAME ORDER BY C.NAME; NAME ---------AAA BIKE AAA BIKE AAA BIKE BIKE SPEC BIKE SPEC BIKE SPEC BIKE SPEC JACKS BIKE LE SHOPPE LE SHOPPE TRUE WHEEL TRUE WHEEL TRUE WHEEL TRUE WHEEL ADDRESS TOTAL ---------- ----------10 OLDTOWN 213.50 10 OLDTOWN 2120.00 10 OLDTOWN 1200.00 CPT SHRIVE 542.50 CPT SHRIVE 2803.60 CPT SHRIVE 5830.00 CPT SHRIVE 2400.00 24 EGLIN 7420.00 HOMETOWN 2650.00 HOMETOWN 3600.00 55O HUSKER 196.00 55O HUSKER 2102.70 55O HUSKER 1590.00 55O HUSKER 1200.00
Note: Notice that when joining the three tables (ORDERS, PART, and CUSTOMER) that the ORDERS table was used in two joins and the other tables were used only once. Tables that will return the fewest rows with the given conditions are commonly referred to as driving tables, or base tables. Tables other than the base table in a query are usually joined to the base table for more efficient data retrieval. Consequently, the ORDERS table is the base table in this example. In most databases a few base tables join (either directly or indirectly) all the other tables. We can make the previous query more specific, thus more useful, by adding the DESCRIPTION column as in the following example: SQL> SELECT C.NAME, C.ADDRESS, O.QUANTITY * P.PRICE TOTAL, P.DESCRIPTION FROM ORDERS O, PART P, CUSTOMER C
WHERE O.PARTNUM - P.PARTNUM AND O.NAME - C.NAME ORDER BY C.NAME; NAME ---------AAA BIKE AAA BIKE AAA BIKE BIKE SPEC BIKE SPEC BIKE SPEC BIKE SPEC JACKS BIKE LE SHOPPE LE SHOPPE TRUE WHEEL TRUE WHEEL TRUE WHEEL TRUE WHEEL ADDRESS TOTAL DESCRIPTION ---------- ----------- -------------10 OLDTOWN 213.50 TIRES 10 OLDTOWN 2120.00 ROAD BIKE 10 OLDTOWN 1200.00 TANDEM CPT SHRIVE 542.50 PEDALS CPT SHRIVE 2803.60 MOUNTAIN BIKE CPT SHRIVE 5830.00 ROAD BIKE CPT SHRIVE 2400.00 TANDEM 24 EGLIN 7420.00 ROAD BIKE HOMETOWN 2650.00 ROAD BIKE HOMETOWN 3600.00 TANDEM 55O HUSKER 196.00 SEATS 55O HUSKER 2102.70 MOUNTAIN BIKE 55O HUSKER 1590.00 ROAD BIKE 55O HUSKER 1200.00 TANDEM
Analysis:
This information is a result of joining three tables. We can now use this information to create an invoice. Note: In the example, SQL grouped EMP and DEPT to create a new table with X (rows in EMP) x Y (rows in DEPT) number of rows. A physical table is not created by the join, but rather in a virtual sense. The join between the two tables produces a new set that meets all conditions in the WHERE clause, including the join itself. The SELECT statement has reduced the number of rows displayed, but to evaluate the WHERE clause SQL still creates all the possible rows. The sample tables in today's examples have only a handful of rows. Our actual data may have thousands of rows. If we are working on a platform with lots of horsepower, using a multiple-table join might not visibly affect performance. However, if we are working in a slower environment, joins could cause a significant slowdown. We aren't telling us not to use joins, because we have seen the advantages to be gained from a relational design. Just be aware of the platform we are using and our customer's requirements for speed versus reliability.
4. Non-Equi Joins
Because SQL supports an equi-join, we might assume that SQL also has a non-equi-join. Whereas the equi-join uses an (=) sign in the WHERE statement, the non-equi-join uses everything but an > or < sign. For example: SQL> SELECT O.NAME, O.PARTNUM, P.PARTNUM, O.QUANTITY * P.PRICE TOTAL FROM ORDERS O, PART P WHERE O.PARTNUM > P.PARTNUM; NAME PARTNUM PARTNUM TOTAL ---------- ----------- ----------- ----------TRUE WHEEL 76 54 162.75 BIKE SPEC 76 54 596.75 LE SHOPPE 76 54 271.25 AAA BIKE 76 54 217.00 JACKS BIKE 76 54 759.50 TRUE WHEEL 76 42 73.50 BIKE SPEC 54 42 245.00 BIKE SPEC 76 42 269.50 LE SHOPPE 76 42 122.50 AAA BIKE 76 42 98.00 AAA BIKE 46 42 343.00 JACKS BIKE 76 42 343.00 TRUE WHEEL 76 46 45.75 BIKE SPEC 54 46 152.50 BIKE SPEC 76 46 167.75 LE SHOPPE 76 46 76.25 AAA BIKE 76 46 61.00 JACKS BIKE 76 46 213.50 TRUE WHEEL 76 23 1051.35 TRUE WHEEL 42 23 2803.60
...
Analysis:
This listing goes on to describe all the rows in the join WHERE O.PARTNUM > P.PARTNUM. In the context of our bicycle shop, this information doesn't have much meaning, and in the real world the equi-join is far more common than the non-equi-join. However, we may encounter an application in which a non-equi-join produces the perfect result.
Note: The syntax we used to get this join JOIN ON is not ANSI standard. The implementation we used for this example has additional syntax. We are using it here to specify an inner and an outer join. Most implementations of SQL have similar extensions. Notice the absence of the WHERE clause in this type of join.
Analysis:
The result is that all the rows in PART are spliced on to specific rows in ORDERS where the column PARTNUM is 54. Here's a RIGHT OUTER JOIN statement: SQL> SELECT P.PARTNUM, P.DESCRIPTION, P.PRICE, O.NAME, O.PARTNUM FROM PART P RIGHT OUTER JOIN ORDERS O ON O.PARTNUM = 54; PARTNUM ------<null> <null> <null> <null> 54 42 46 23 76 10 <null> <null> <null> <null> <null> <null> <null> <null> <null> DESCRIPTION PRICE -------------------- ------<null> <null> <null> <null> <null> <null> <null> <null> PEDALS 54.25 SEATS 24.50 TIRES 15.25 MOUNTAIN BIKE 350.45 ROAD BIKE 530.00 TANDEM 1200.00 <null> <null> <null> <null> <null> <null> <null> <null> <null> <null> <null> <null> <null> <null> <null> <null> <null> <null> NAME PARTNUM -------------- ------TRUE WHEEL 23 TRUE WHEEL 76 TRUE WHEEL 10 TRUE WHEEL 42 BIKE SPEC 54 BIKE SPEC 54 BIKE SPEC 54 BIKE SPEC 54 BIKE SPEC 54 BIKE SPEC 54 BIKE SPEC 10 BIKE SPEC 23 BIKE SPEC 76 LE SHOPPE 76 LE SHOPPE 10 AAA BIKE 10 AAA BIKE 76 AAA BIKE 46 JACKS BIKE 76
Analysis:
This query is new. First we specified a RIGHT OUTER JOIN, which caused SQL to return a full set of right table, ORDERS, and to place nulls in fields where ORDERS.PARTNUM = 54. Following is a LEFT OUTER JOIN statement: SQL> SELECT P.PARTNUM, P.DESCRIPTION, P.PRICE, O.NAME, O.PARTNUM FROM PART P LEFT OUTER JOIN ORDERS O ON ORDERS.PARTNUM = 54; PARTNUM ------54 42 46 23 76 10 DESCRIPTION PRICE NAME PARTNUM ------------------ ----------- ---------- ----------PEDALS 54.25 BIKE SPEC 54 SEATS 24.50 BIKE SPEC 54 TIRES 15.25 BIKE SPEC 54 MOUNTAIN BIKE 350.45 BIKE SPEC 54 ROAD BIKE 530.00 BIKE SPEC 54 TANDEM 1200.00 BIKE SPEC 54
Analysis:
We get the same six rows as the INNER JOIN. Because we specified LEFT (the LEFT table), PART determined the number of rows we would return. Because PART is smaller than ORDERS, SQL saw no need to pad those other fields with blanks. Some implementations of SQL use the + sign instead of an OUTER JOIN statement. The + simply means "Show me everything even if something is missing." Here's the syntax:
Syntax:
SQL> SELECT P.PARTNUM, P.DESCRIPTION, P.PRICE, O.NAME, O.PARTNUM FROM PART P, ORDERS O WHERE P.PARTNUM = O.PARTNUM(+) AND O.NAME LIKE '%BIKE%';
Analysis:
This statement is joining two tables. The + sign on the O.PARTNUM column will return all rows even if they are empty.
7782 7788 7839 7844 7876 7900 7902 7934 7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934 7369 7499 7521 7566 7654 7698 7782 7788 7839 7844 7876 7900 7902 7934
CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER SMITH ALLEN WARD JONES MARTIN BLAKE CLARK SCOTT KING TURNER ADAMS JAMES FORD MILLER
MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK CLERK SALESMAN SALESMAN MANAGER SALESMAN MANAGER MANAGER ANALYST PRESIDENT SALESMAN CLERK CLERK ANALYST CLERK
7839 7566 7698 7788 7698 7566 7782 7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782 7902 7698 7698 7839 7698 7839 7839 7566 7698 7788 7698 7566 7782
09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82 17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82 17-DEC-80 20-FEB-81 22-FEB-81 02-APR-81 28-SEP-81 01-MAY-81 09-JUN-81 19-APR-87 17-NOV-81 08-SEP-81 23-MAY-87 03-DEC-81 03-DEC-81 23-JAN-82
2450 3000 5000 1500 1100 950 3000 1300 800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300 800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300
10 20 10 30 20 30 20 10 20 30 30 20 30 30 10 20 10 30 20 30 20 10 20 30 30 20 30 30 10 20 10 30 20 30 20 10
20 20 20 20 20 20 20 20 30 30 30 30 30 30 30 30 30 30 30 30 30 30 40 40 40 40 40 40 40 40 40 40 40 40 40 40
RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH RESEARCH SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES SALES OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS OPERATIONS
DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS DALLAS CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO CHICAGO BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON BOSTON
56 rows selected.
Analysis:
In its complete form, this join produces the same number of combinations as joining two 6-row tables. This type of join could be useful to check the internal consistency of data. What would happen if someone fell asleep in the production department and entered a new part with a PARTNUM that already existed? That would be bad news for everybody: Invoices would be wrong; our application would probably blow up; and in general we would be in for a very bad time. And the cause of all our problems would be the duplicate PARTNUM in the following table: We saved our company from this bad situation by checking PART before anyone used it: SQL> SELECT F.PARTNUM, F.DESCRIPTION, S.PARTNUM, S.DESCRIPTION FROM PART F, PART S WHERE F.PARTNUM = S.PARTNUM AND F.DESCRIPTION <> S.DESCRIPTION; PARTNUM ---------76 76 DESCRIPTION PARTNUM DESCRIPTION ------------------------ ------- -----------ROAD BIKE 76 CLIPPLESS SHOE CLIPPLESS SHOE 76 ROAD BIKE
Analysis:
The join produced two rows that satisfied the condition WHERE F.PARTNUM = S.PARTNUM AND F.DESCRIPTION <> S.DESCRIPTION. Of course, at some point, the row of data containing the duplicate PARTNUM would have to be corrected.
Sub Queries
A subquery is a query whose results are passed as the argument for another query. Sub queries enable us to bind several queries together. We will understand and be able to do the following: Build a subquery Use the keywords EXISTS, ANY, and ALL with our sub queries Build and use correlated sub queries
1. Building a Subquery
Simply put, a subquery let us tie the result set of one query to another. The general syntax is as follows:
Syntax:
SQL> SELECT * FROM TABLE1 WHERE TABLE1.SOMECOLUMN = (SELECT SOMEOTHERCOLUMN FROM TABLE2 WHERE SOMEOTHERCOLUMN - SOMEVALUE) Notice how the second query is nested inside the first. Here's a real example that uses the PART and ORDERS tables: SQL> SELECT * FROM PART; PARTNUM ----------54 42 46 23 76 10 DESCRIPTION PRICE -------------------- ----------PEDALS 54.25 SEATS 24.50 TIRES 15.25 MOUNTAIN BIKE 350.45 ROAD BIKE 530.00 TANDEM 1200.00
SQL> SELECT * FROM ORDERS; ORDEREDON ----------15-MAY-1996 19-MAY-1996 2-SEP-1996 30-JUN-1996 30-JUN-1996 30-MAY-1996 30-MAY-1996 17-JAN-1996 17-JAN-1996 1-JUN-1996 1-JUN-1996 1-JUL-1996 1-JUL-1996 11-JUL-1996 NAME PARTNUM QUANTITY REMARKS ---------- ----------- ----------- -------TRUE WHEEL 23 6 PAID TRUE WHEEL 76 3 PAID TRUE WHEEL 10 1 PAID TRUE WHEEL 42 8 PAID BIKE SPEC 54 10 PAID BIKE SPEC 10 2 PAID BIKE SPEC 23 8 PAID BIKE SPEC 76 11 PAID LE SHOPPE 76 5 PAID LE SHOPPE 10 3 PAID AAA BIKE 10 1 PAID AAA BIKE 76 4 PAID AAA BIKE 46 14 PAID JACKS BIKE 76 14 PAID
Analysis:
The tables share a common field called PARTNUM. Suppose we don't know the PARTNUM, but instead wanted to work with the description of the part. Using a subquery, we could type this: SQL> SELECT * FROM ORDERS WHERE PARTNUM =
(SELECT PARTNUM FROM PART WHERE DESCRIPTION LIKE ROAD%); ORDEREDON ----------19-MAY-1996 17-JAN-1996 17-JAN-1996 1-JUL-1996 11-JUL-1996 NAME PARTNUM QUANTITY REMARKS ---------- ----------- ----------- -------TRUE WHEEL 76 3 PAID BIKE SPEC 76 11 PAID LE SHOPPE 76 5 PAID AAA BIKE 76 4 PAID JACKS BIKE 76 14 PAID
Analysis:
We can enhance the PARTNUM column in the result by including the DESCRIPTION, making PARTNUM clearer for anyone who hasn't memorized it. Try this: SQL> SELECT O.ORDEREDON, O.PARTNUM, P.DESCRIPTION, O.QUANTITY, O.REMARKS FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM AND O.PARTNUM = (SELECT PARTNUM FROM PART WHERE DESCRIPTION LIKE ROAD%); ORDEREDON PARTNUM DESCRIPTION QUANTITY REMARKS ----------- ----------- ------------ ----------- --------19-MAY-1996 76 ROAD BIKE 3 PAID 1-JUL-1996 76 ROAD BIKE 4 PAID 17-JAN-1996 76 ROAD BIKE 5 PAID 17-JAN-1996 76 ROAD BIKE 11 PAID 11-JUL-1996 76 ROAD BIKE 14 PAID
Analysis:
The first part of the query is very familiar: SQL> SELECT O.ORDEREDON, O.PARTNUM, P.DESCRIPTION, O.QUANTITY, O.REMARKS FROM ORDERS O, PART P Here we are using the aliases O and P for tables ORDERS and PART to select the five columns we are interested in. In this case the aliases were not necessary because each of the columns we asked to return is unique. However, it is easier to make a readable query now than to have to figure it out later. The first WHERE clause we encounter WHERE O.PARTNUM = P.PARTNUM Is standard language for the join of tables PART and ORDERS specified in the FROM clause. If we didn't use this WHERE clause, we would have all the possible row combinations of the two tables. The next section includes the subquery. The statement AND O.PARTNUM = (SELECT PARTNUM FROM PART WHERE DESCRIPTION LIKE ROAD%) Adds the qualification that O.PARTNUM must be equal to the result of our simple subquery. The subquery is straightforward, finding all the part numbers that are LIKE "ROAD%". The use of LIKE was somewhat lazy, saving we the keystrokes required to type ROAD BIKE. However, it turns out we were lucky this time. What if someone in the Parts department had added a new part called ROADKILL? The revised PART table would look like this: SQL> SELECT * FROM PART; PARTNUM ----------54 42 46 DESCRIPTION PRICE -------------------- ----------PEDALS 54.25 SEATS 24.50 TIRES 15.25
23 76 10 77
Suppose we are blissfully unaware of this change and try our query after this new product was added. If we enter this: SQL> SELECT O.ORDEREDON, O.PARTNUM, P.DESCRIPTION, O.QUANTITY, O.REMARKS FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM AND O.PARTNUM = (SELECT PARTNUM FROM PART WHERE DESCRIPTION LIKE ROAD%); The SQL engine complains multiple rows in singleton select and we don't get any results. The response from our SQL engine may vary, but it still complains and returns nothing. To find out why we get this undesirable result, assume the role of the SQL engine. We will probably evaluate the subquery first. We would return this: SQL> SELECT PARTNUM FROM PART WHERE DESCRIPTION LIKE ROAD%; PARTNUM ----------76 77 The = can only be on one. Say a=b not a=b,c. so it is wrong. But like works. We would take this result and apply it to O.PARTNUM =, which is the step that causes the problem.
Analysis:
How can PARTNUM be equal to both 76 and 77? This must be what the engine meant when it accused we of being a simpleton. When we used the LIKE clause, we opened yourself up for this error. When we combine the results of a relational operator with another relational operator, such as =, <, or >, we need to make sure the result is singular. In the case of the example we have been using, the solution would be to rewrite the query using an = instead of the LIKE, like this: SQL> SELECT O.ORDEREDON, O.PARTNUM, P.DESCRIPTION, O.QUANTITY, O.REMARKS FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM AND O.PARTNUM = (SELECT PARTNUM FROM PART WHERE DESCRIPTION = ROAD BIKE); ORDEREDON PARTNUM DESCRIPTION QUANTITY REMARKS ----------- ----------- --------------- ----------- ---------19-MAY-1996 76 ROAD BIKE 3 PAID 1-JUL-1996 76 ROAD BIKE 4 PAID 17-JAN-1996 76 ROAD BIKE 5 PAID 17-JAN-1996 76 ROAD BIKE 11 PAID 11-JUL-1996 76 ROAD BIKE 14 PAID
Analysis:
This subquery returns only one unique result; therefore narrowing our = condition to a single value. How can we be sure the subquery won't return multiple values if we are looking for only one value? Avoiding the use of LIKE is a start. Another approach is to ensure the uniqueness of the search field during table design. If we are the untrusting type, we could use the method for joining a table to itself to check a given field for uniqueness. If we design the table yourself or trust the person who designed the table, we could require the column we are searching to have a unique value. We could also use a part of SQL that returns only one answer: the aggregate function.
Analysis:
This statement returns only one value. To find out which orders were above average, use the preceding SELECT statement for our subquery. The complete query and result are as follows: SQL> SELECT O.NAME, O.ORDEREDON, O.QUANTITY * P.PRICE TOTAL FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM AND O.QUANTITY * P.PRICE > (SELECT AVG(O.QUANTITY * P.PRICE) FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM); NAME ---------LE SHOPPE BIKE SPEC LE SHOPPE BIKE SPEC JACKS BIKE ORDEREDON TOTAL ----------- ----------1-JUN-1996 3600.00 30-MAY-1996 2803.60 17-JAN-1996 2650.00 17-JAN-1996 5830.00 11-JUL-1996 7420.00
Analysis:
This example contains a rather unremarkable SELECT/FROM/WHERE clause: SQL> SELECT O.NAME, O.ORDEREDON, O.QUANTITY * P.PRICE TOTAL FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM; These lines represent the common way of joining these two tables. This join is necessary because the price is in PART and the quantity is in ORDERS. The WHERE ensures that we examine only the join-formed rows that are related. We then add the subquery: AND O.QUANTITY * P.PRICE > (SELECT AVG(O.QUANTITY * P.PRICE) FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM) The preceding condition compares the total of each order with the average we computed in the subquery. Note that the join in the subquery is required for the same reasons as in the main SELECT statement. This join is also constructed exactly the same way. They have exactly the same syntax as a standalone query. In fact, most subqueries start out as standalone queries and are incorporated as subqueries after their results are tested.
3. Nested Subqueries
Nesting is the act of embedding a subquery within another subquery. For example: SQL> SELECT * FROM SOMETHING WHERE (SUBQUERY(SUBQUERY(SUBQUERY))); Subqueries can be nested as deeply as our implementation of SQL allows. For example, to send out special notices to customers who spend more than the average amount of money, we would combine the information in the table CUSTOMER
SQL> SELECT * FROM CUSTOMER; NAME ---------TRUE WHEEL BIKE SPEC LE SHOPPE AAA BIKE JACKS BIKE ADDRESS ---------55O HUSKER CPT SHRIVE HOMETOWN 10 OLDTOWN 24 EGLIN STATE -----NE LA KS NE FL ZIP ---------58702 45678 54678 56784 34567 PHONE ----------555-4545 555-1234 555-1278 555-3421 555-2314 REMARKS ---------NONE NONE NONE JOHN-MGR NONE
With a slightly modified version of the query we used to find the above-average orders: SQL> SELECT ALL C.NAME, C.ADDRESS, C.STATE, C.ZIP FROM CUSTOMER C WHERE C.NAME IN (SELECT O.NAME FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM AND O.QUANTITY * P.PRICE > (SELECT AVG(O.QUANTITY * P.PRICE) FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM)); NAME ---------BIKE SPEC LE SHOPPE JACKS BIKE Analysis: Here's a look at what we asked for. In the inner most set of parentheses, we find a familiar statement: SQL> SELECT AVG(O.QUANTITY * P.PRICE) FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM; This result feeds into a slightly modified version of the SELECT clause we used before: SQL> SELECT O.NAME FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM AND O.QUANTITY * P.PRICE (...) ADDRESS ---------CPT SHRIVE HOMETOWN 24 EGLIN STATE -----LA KS FL ZIP ---------45678 54678 34567
>
Note the SELECT clause has been modified to return a single column, NAME, which, not so coincidentally, is common with the table CUSTOMER. Running this statement by itself we get: SQL> SELECT O.NAME FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM AND O.QUANTITY * P.PRICE (SELECT AVG(O.QUANTITY * P.PRICE) FROM ORDERS O, PART P WHERE O.PARTNUM = P.PARTNUM); NAME ---------LE SHOPPE BIKE SPEC LE SHOPPE BIKE SPEC JACKS BIKE
>
Analysis:
We just spent some time discussing why our subqueries should return just one value. The reason this query was able to return more than one value becomes apparent in a moment. We bring these results to the statement: SQL> SELECT C.NAME, C.ADDRESS, C.STATE, C.ZIP
Analysis:
The first two lines are unremarkable. The third reintroduces the keyword IN. IN is the tool that enables us to use the multiple-row output of our subquery. IN, as we remember, looks for matches in the following set of values enclosed by parentheses, which in this case produces the following values: LE SHOPPE BIKE SPEC LE SHOPPE BIKE SPEC JACKS BIKE This subquery provides the conditions that give us the mailing list: NAME ---------BIKE SPEC LE SHOPPE JACKS BIKE ADDRESS ---------CPT SHRIVE HOMETOWN 24 EGLIN STATE -----LA KS FL ZIP -----45678 54678 34567
This use of IN is very common in subqueries. Because IN uses a set of values for its comparison, it does not cause the SQL engine to feel conflicted and inadequate. Subqueries can also be used with GROUP BY and HAVING clauses. Examine the following query: SQL> SELECT NAME, AVG(QUANTITY) FROM ORDERS GROUP BY NAME HAVING AVG(QUANTITY) > (SELECT AVG(QUANTITY) FROM ORDERS); NAME AVG ---------- ----------BIKE SPEC 8 JACKS BIKE 14
Analysis:
Let's examine this query in the order the SQL engine would. First, look at the subquery: SQL> SELECT AVG(QUANTITY) FROM ORDERS; AVG ----------6 By itself, the query is as follows: SQL> SELECT NAME, AVG(QUANTITY) FROM ORDERS GROUP BY NAME; NAME AVG ---------- ----------AAA BIKE 6 BIKE SPEC 8 JACKS BIKE 14 LE SHOPPE 4 TRUE WHEEL 5 When combined through the HAVING clause, the subquery produces two rows that have above-average QUANTITY. HAVING AVG(QUANTITY) > (SELECT AVG(QUANTITY) FROM ORDERS)
This query actually resembles the following JOIN: SQL> SELECT O.ORDEREDON, O.NAME, O.PARTNUM, O.QUANTITY, O.REMARKS FROM ORDERS O, PART P WHERE P.PARTNUM = O.PARTNUM AND P.DESCRIPTION = 'ROAD BIKE'; ORDEREDON ----------19-MAY-1996 1-JUL-1996 17-JAN-1996 17-JAN-1996 11-JUL-1996 NAME PARTNUM QUANTITY REMARKS ---------- ----------- ----------- ------TRUE WHEEL 76 3 PAID AAA BIKE 76 4 PAID LE SHOPPE 76 5 PAID BIKE SPEC 76 11 PAID JACKS BIKE 76 14 PAID
Analysis:
In fact, except for the order, the results are identical. The correlated subquery acts very much like a join. The correlation is established by using an element from the query in the subquery. In this example the correlation was established by the statement WHERE P.PARTNUM = O.PARTNUM In which we compare P.PARTNUM, from the table inside our subquery, to O.PARTNUM, from the table outside our query. Because O.PARTNUM can have a different value for every row, the correlated subquery is executed for each row in the query. In the next example each row in the table ORDERS SQL> SELECT * FROM ORDERS; ORDEREDON ----------15-MAY-1996 19-MAY-1996 2-SEP-1996 30-JUN-1996 30-JUN-1996 30-MAY-1996 30-MAY-1996 17-JAN-1996 17-JAN-1996 1-JUN-1996 NAME PARTNUM QUANTITY REMARKS ---------- ----------- ----------- ------TRUE WHEEL 23 6 PAID TRUE WHEEL 76 3 PAID TRUE WHEEL 10 1 PAID TRUE WHEEL 42 8 PAID BIKE SPEC 54 10 PAID BIKE SPEC 10 2 PAID BIKE SPEC 23 8 PAID BIKE SPEC 76 11 PAID LE SHOPPE 76 5 PAID LE SHOPPE 10 3 PAID
10 76 46 76
1 4 14 14
Is processed against the subquery criteria: SQL> SELECT DESCRIPTION FROM PART P WHERE P.PARTNUM = O.PARTNUM;
Analysis:
This operation returns the DESCRIPTION of every row in PART where P.PARTNUM = O.PARTNUM. These descriptions are then compared in the WHERE clause: WHERE 'ROAD BIKE' = Because each row is examined, the subquery in a correlated subquery can have more than one value. However, don't try to return multiple columns or columns that don't make sense in the context of the WHERE clause. The values returned still must match up against the operation specified in the WHERE clause. For example, in the query we just did, returning the PRICE to compare with ROAD BIKE would have the following result: SQL> SELECT * FROM ORDERS O WHERE 'ROAD BIKE' = (SELECT PRICE FROM PART P WHERE P.PARTNUM = O.PARTNUM); Conversion error from string ROAD BIKE. Here's another example of something not to do: SQL> SELECT * FROM ORDERS O WHERE 'ROAD BIKE' = (SELECT * FROM PART P WHERE P.PARTNUM = O.PARTNUM);
Analysis:
The SQL engine simply can't correlate all the columns in PART with the operator =. Correlated subqueries can also be used with the GROUP BY and HAVING clauses. The following query uses a correlated subquery to find the average total order for a particular part and then applies that average value to filter the total order grouped by PARTNUM: SQL> SELECT O.PARTNUM, SUM(O.QUANTITY*P.PRICE), COUNT(PARTNUM) FROM ORDERS O, PART P WHERE P.PARTNUM = O.PARTNUM GROUP BY O.PARTNUM HAVING SUM(O.QUANTITY*P.PRICE) > (SELECT AVG(O1.QUANTITY * P1.PRICE) FROM PART P1, ORDERS O1 WHERE P1.PARTNUM = O1.PARTNUM AND P1.PARTNUM = O.PARTNUM); PARTNUM SUM COUNT ----------- ----------- ----------10 8400.00 4 23 4906.30 2 76 19610.00 5
Analysis:
The subquery does not just compute one AVG(O1.QUANTITY*P1.PRICE) Because of the correlation between the query and the subquery, AND P1.PARTNUM - O.PARTNUM This average is computed for every group of parts and then compared:
HAVING SUM(O.QUANTITY*P.PRICE) > Tip: When using correlated subqueries with GROUP BY and HAVING, the columns in the HAVING clause must exist in either the SELECT clause or the GROUP BY clause. Otherwise, we get an error message along the lines of invalid column reference because the subquery is evoked for each group, not each row. We cannot make a valid comparison to something that is not used in forming the group.
Analysis:
Not what we might expect. The subquery inside EXISTS is evaluated only once in this uncorrelated example. Because the return from the subquery has at least one row, EXISTS evaluates to TRUE and all the rows in the query are printed. If we change the subquery as shown next, we don't get back any results. SQL> SELECT NAME, ORDEREDON FROM ORDERS WHERE EXISTS (SELECT * FROM ORDERS WHERE NAME = 'MOSTLY HARMLESS');
Analysis:
EXISTS evaluates to FALSE. The subquery does not generate a result set because MOSTLY HARMLESS is not one of our names. Note: Notice the use of SELECT * in the subquery inside the EXISTS. EXISTS does not care how many columns are returned. We could use EXISTS in this way to check on the existence of certain rows and control the output of our query based on whether they exist. If we use EXISTS in a correlated subquery, it is evaluated for every case implied by the correlation we set up. For example: SQL> SELECT NAME, ORDEREDON FROM ORDERS O WHERE EXISTS (SELECT * FROM CUSTOMER C WHERE STATE = 'NE' AND C.NAME = O.NAME); NAME ORDEREDON ---------- ----------TRUE WHEEL 15-MAY-1996 TRUE WHEEL 19-MAY-1996 TRUE WHEEL 2-SEP-1996
TRUE WHEEL 30-JUN-1996 AAA BIKE 1-JUN-1996 AAA BIKE 1-JUL-1996 AAA BIKE 1-JUL-1996 This slight modification of our first uncorrelated query returns all the bike shops from Nebraska that made orders. The following subquery is run for every row in the query correlated on the CUSTOMER name and ORDERS name: SQL> (SELECT * FROM CUSTOMER C WHERE STATE = 'NE' AND C.NAME = O.NAME);
Analysis:
EXISTS is TRUE for those rows that have corresponding names in CUSTOMER located in NE. Otherwise, it returns FALSE. Closely related to EXISTS are the keywords ANY, ALL, and SOME. ANY and SOME are identical in function. An optimist would say this feature provides the user with a choice. A pessimist would see this condition as one more complication. Look at this query: SQL> SELECT NAME, ORDEREDON FROM ORDERS WHERE NAME = ANY (SELECT NAME FROM ORDERS WHERE NAME = 'TRUE WHEEL'); NAME ---------TRUE WHEEL TRUE WHEEL TRUE WHEEL TRUE WHEEL ORDEREDON ----------15-MAY-1996 19-MAY-1996 2-SEP-1996 30-JUN-1996
Analysis:
ANY compared the output of the following subquery to each row in the query, returning TRUE for each row of the query that has a result from the subquery. SQL> (SELECT NAME FROM ORDERS WHERE NAME = 'TRUE WHEEL'); Replacing ANY with SOME produces an identical result: SQL> SELECT NAME, ORDEREDON FROM ORDERS WHERE NAME = SOME (SELECT NAME FROM ORDERS WHERE NAME = 'TRUE WHEEL'); NAME ---------TRUE WHEEL TRUE WHEEL TRUE WHEEL TRUE WHEEL ORDEREDON ----------15-MAY-1996 19-MAY-1996 2-SEP-1996 30-JUN-1996
Analysis:
We may have already noticed the similarity to IN. The same query using IN is as follows: SQL> SELECT NAME, ORDEREDON FROM ORDERS WHERE NAME IN (SELECT NAME
FROM ORDERS WHERE NAME = 'TRUE WHEEL'); NAME ---------TRUE WHEEL TRUE WHEEL TRUE WHEEL TRUE WHEEL ORDEREDON ----------15-MAY-1996 19-MAY-1996 2-SEP-1996 30-JUN-1996
Analysis:
As we can see, IN returns the same result as ANY and SOME. Has the world gone mad? Not yet. Can IN do this? SQL> SELECT NAME, ORDEREDON FROM ORDERS WHERE NAME > ANY (SELECT NAME FROM ORDERS WHERE NAME = 'JACKS BIKE'); NAME ---------TRUE WHEEL TRUE WHEEL TRUE WHEEL TRUE WHEEL LE SHOPPE LE SHOPPE ORDEREDON ----------15-MAY-1996 19-MAY-1996 2-SEP-1996 30-JUN-1996 17-JAN-1996 1-JUN-1996
The answer is no. IN works like multiple equals. ANY and SOME can be used with other relational operators such as greater than or less than. ALL returns TRUE only if all the results of a subquery meet the condition. Oddly enough, ALL is used most commonly as a double negative, as in this query: SQL> SELECT NAME, ORDEREDON FROM ORDERS WHERE NAME <> ALL (SELECT NAME FROM ORDERS WHERE NAME = 'JACKS BIKE'); NAME ---------TRUE WHEEL TRUE WHEEL TRUE WHEEL TRUE WHEEL BIKE SPEC BIKE SPEC BIKE SPEC BIKE SPEC LE SHOPPE LE SHOPPE AAA BIKE AAA BIKE AAA BIKE ORDEREDON ----------15-MAY-1996 19-MAY-1996 2-SEP-1996 30-JUN-1996 30-JUN-1996 30-MAY-1996 30-MAY-1996 17-JAN-1996 17-JAN-1996 1-JUN-1996 1-JUN-1996 1-JUL-1996 1-JUL-1996
Analysis:
This statement returns everybody except JACKS BIKE. <>ALL evaluates to TRUE only if the result set does not contain what is on the left of the <>.
Data Types
Each value manipulated by Oracle Database has a datatype. The datatype of a value associates a fixed set of properties with the value. These properties cause Oracle to treat values of one datatype differently from values of another. For example, we can add values of NUMBER datatype, but not values of RAW datatype. When we create a table or cluster, we must specify a datatype for each of its columns. When we create a procedure or stored function, we must specify a datatype for each of its arguments. These datatypes define the domain of values that each column can contain or each argument can have. For example, DATE columns cannot accept the value February 29 (except for a leap year) or the values 2 or 'SHOE'. Each value subsequently placed in a column assumes the datatype of the column. For example, if we insert '01-JAN-98' into a DATE column, then Oracle treats the '01-JAN-98' character string as a DATE value after verifying that it translates to a valid date. Oracle Database provides a number of built-in datatypes as well as several categories for user-defined types that can be used as datatypes. A datatype is either scalar or nonscalar. A scalar type contains an atomic value, whereas a nonscalar (sometimes called a "collection") contains a set of values. A large object (LOB) is a special form of scalar datatype representing a large scalar value of binary or character data. LOBs are subject to some restrictions that do not affect other scalar types because of their size. The Oracle precompilers recognize other datatypes in embedded SQL programs. These datatypes are called external datatypes and are associated with host variables. Do not confuse built-in datatypes and user-defined types with external datatypes.
Code
1
Datatype
VARCHAR2(size [BYTE | CHAR])
Description
Variable-length character string having maximum length size bytes or characters. Maximum size is 4000 bytes or characters, and minimum is 1 byte or 1 character. We must specify size for VARCHAR2. BYTE indicates that the column will have byte length semantics; CHAR indicates that the column will have character semantics.
NVARCHAR2(size)
Variable-length Unicode character string having maximum length size characters. The number of bytes can be up to two times size for AL16UTF16 encoding and three times size for UTF8 encoding. Maximum size is determined by the national character set definition, with an upper limit of 4000 bytes. We must specify size for NVARCHAR2.
2 8 12
NUMBER[(precision Number having precision p and scale s. The precision p can range from 1 to 38. [, scale]]) The scale s can range from -84 to 127. LONG DATE Character data of variable length up to 2 gigabytes, or 231 -1 bytes. Provided for backward compatibility. Valid date range from January 1, 4712 BC to December 31, 9999 AD. The default format is determined explicitly by the NLS_DATE_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is fixed at 7 bytes. This datatype contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. It does not have fractional seconds or a time zone. 32-bit floating point number. This datatype requires 5 bytes, including length byte. 64-bit floating point number. This datatype requires 9 bytes, including length byte.
21 22 180
BINARY_FLOAT BINARY_DOUBLE
TIMESTAMP Year, month, and day values of date, as well as hour, minute, and second values [(fractional_seco of time, where fractional_seconds_precision is the number of digits in the fractional nds)] part of the SECOND datetime field. Accepted values of fractional_seconds_precision are 0 to 9. The default is 6. The default format is determined explicitly by the NLS_DATE_FORMAT parameter or implicitly by the
Code
Datatype
Description
NLS_TERRITORY parameter. The sizes varies from 7 to 11 bytes, depending on the precision. This datatype contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. It contains fractional seconds but does not have a time zone.
181
All values of TIMESTAMP as well as time zone displacement value, where fractional_seconds_precision is the number of digits in the fractional part of the SECOND datetime field. Accepted values are 0 to 9. The default is 6. The default format is determined explicitly by the NLS_DATE_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is fixed at 13 bytes. This datatype contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, TIMEZONE_HOUR, and TIMEZONE_MINUTE. It has fractional seconds and an explicit time zone.
231
TIMESTAMP All values of TIMESTAMP WITH TIME ZONE, with the following exceptions: [(fractional_seco Data is normalized to database time zone when it is stored in the database. nds)] WITH LOCAL When the data is retrieved, users see the data in the session time zone. TIME ZONE The default format is determined explicitly by the NLS_DATE_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The sizes varies from 7 to 11 bytes, depending on the precision. INTERVAL YEAR Stores a period of time in years and months, where year_precision is the number [(year_precision) of digits in the YEAR datetime field. Accepted values are 0 to 9. The default is 2. ] TO MONTH The size is fixed at 5 bytes. INTERVAL DAY Stores a period of time in days, hours, minutes, and seconds, where [(day_precision)] day_precision is the maximum number of digits in the DAY datetime field. TO SECOND Accepted values are 0 to 9. The default is 2. [(fractional_seco fractional_seconds_precision is the number of digits in fractional part of the nds)] SECOND field. Accepted values are 0 to 9. The default is 6. The size is fixed at 11 bytes. RAW(size) LONG RAW ROWID UROWID [(size)] Raw binary data of length size bytes. Maximum size is 2000 bytes. We must specify size for a RAW value. Raw binary data of variable length up to 2 gigabytes. Base 64 string representing the unique address of a row in its table. This datatype is primarily for values returned by the ROWID pseudocolumn. Base 64 string representing the logical address of a row of an index-organized table. The optional size is the size of a column of type UROWID. The maximum size and default is 4000 bytes.
182
183
23 24 69 208
96
CHAR [(size [BYTE Fixed-length character data of length size bytes. Maximum size is 2000 bytes or | CHAR])] characters. Default and minimum size is 1 byte. BYTE and CHAR have the same semantics as for VARCHAR2. NCHAR[(size)] Fixed-length character data of length size characters. The number of bytes can be up to two times size for AL16UTF16 encoding and three times size for UTF8 encoding. Maximum size is determined by the national character set definition, with an upper limit of 2000 bytes. Default and minimum size is 1 character. A character large object containing single-byte or multibyte characters. Both fixedwidth and variable-width character sets are supported, both using the database character set. Maximum size is (4 gigabytes - 1) * (database block size). A character large object containing Unicode characters. Both fixed-width and variable-width character sets are supported, both using the database national character set. Maximum size is (4 gigabytes - 1) * (database block size). Stores national character set data.
96
112
CLOB
112
NCLOB
Code
113 114
Datatype
BLOB BFILE
Description
A binary large object. Maximum size is (4 gigabytes - 1) * (database block size). Contains a locator to a large binary file stored outside the database. Enables byte stream I/O access to external LOBs residing on the database server. Maximum size is 4 gigabytes.
Datetime Field
YEAR MONTH DAY
0 to 23 0 to 59 0 to 59.9(n), where 9(n) is the precision of interval fractional seconds Not applicable Not applicable Not applicable
TIMEZONE_HOUR
TIMEZONE_MINUTE 00 to 59. Not applicable for DATE or TIMESTAMP. TIMEZONE_REGION Query the TZNAME column of the V$TIMEZONE_NAMES data dictionary view. Not applicable for DATE or TIMESTAMP. TIMEZONE_ABBR
Query the TZABBREV column of the Not applicable V$TIMEZONE_NAMES data dictionary view. Not applicable for DATE or TIMESTAMP.
Note: TIMEZONE_HOUR and TIMEZONE_MINUTE are specified together and interpreted as an entity in the format +|- hh:mm, with values ranging from -12:59 to +14:00.
Transactions Control
We have become an intermediate-level SQL and database user. If required, we could build a database with its associated tables, each of which would contain several fields of different data types. Using proper design techniques, we could leverage the information contained within this database into a powerful application. The basics of transaction control How to finalize and or cancel a transaction
1. Transaction Control
Transaction control, or transaction management, refers to the capability of a relational database management system to perform database transactions. Transactions are units of work that must be done in a logical order and successfully as a group or not at all. The term unit of work means that a transaction has a beginning and an end. If anything goes wrong during the transaction, the entire unit of work can be canceled if desired. If everything looks good, the entire unit of work can be saved to the database. Banking Application We are employed by First Federal Financial Bank to set up an application that handles checking account transactions that consist of debits and credits to customers' checking accounts. We have set up a nice database, which has been tested and verified to work correctly. After calling up our application, we verify that when we take $20 out of the account, $20 actually disappears from the database. When we add $50.25 to the checking account, this deposit shows up as expected. We proudly announce to our bosses that the system is ready to go, and several computers are set up in a local branch to begin work. Within minutes, we notice a situation that we did not anticipate: As one teller is depositing a check, another teller is withdrawing money from the same account. Within minutes, many depositors' balances are incorrect because multiple users are updating tables simultaneously. Unfortunately, these multiple updates are overwriting each other. Shortly thereafter, our application is pulled offline for an overhaul. We will work through this problem with a database called CHECKING. Within this database are two tables, shown in Tables 1 and Table 2. Table 1. CUSTOMERS table.
Name
Bill Turner John Keith
Address
725 N. Deal Parkway 1220 Via De Luna Dr.
City
State
Zip
20085 33581 23478 29652 38764
Customer_ID
1 2 3 4 5
Washington DC Jacksonville FL
Mary Rosenberg 482 Wannamaker Avenue Williamsburg VA David Blanken Rebecca Little 405 N. Davis Highway 7753 Woods Lane Greenville Houston SC TX
Average_Bal
1298.53 5427.22 211.25 73.79 1285.90 1234.56 345.25
Curr_Bal
854.22 6015.96 190.01 25.87 1473.75 1543.67 348.03
Account_ID
1 2 3 4 5 6 7
Assume now that our application program performs a SELECT operation and retrieves the following data for Bill Turner:
NAME: Bill Turner ADDRESS: 725 N. Deal Parkway CITY: Washington STATE: DC ZIP: 20085 CUSTOMER_ID: 1 While this information is being retrieved, another user with a connection to this database updates Bill Turner's address information: SQL> UPDATE CUSTOMERS SET Address = "11741 Kingstowne Road" WHERE Name = "Bill Turner"; As we can see, the information we retrieved earlier could be invalid if the update occurred during the middle of our SELECT. If our application fired off a letter to be sent to Mr. Bill Turner, the address it used would be wrong. Obviously, if the letter has already been sent, we won't be able to change the address. However, if we had used a transaction, this data change could have been detected, and all our other operations could have been rolled back.
2. Beginning a Transaction
Transactions are quite simple to implement. We will examine the syntax used to perform transactions using the Oracle RDBMS SQL syntax. All database systems that support transactions must have a way to explicitly tell the system that a transaction is beginning. (Remember that a transaction is a logical grouping of work that has a beginning and an end.) Using Oracle, the syntax looks like this:
Syntax:
SET TRANSACTION {READ ONLY | USE ROLLBACK SEGMENT segment} The SQL standard specifies that each database's SQL implementation must support statement-level read consistency; that is, data must stay consistent while one statement is executing. However, in many situations data must remain valid across a single unit of work, not just within a single statement. Oracle enables the user to specify when the transaction will begin by using the SET TRANSACTION statement. If we wanted to examine Bill Turner's information and make sure that the data was not changed, we could do the following: SQL> SET TRANSACTION READ ONLY; SQL> SELECT * FROM CUSTOMERS WHERE NAME = 'Bill Turner'; ---Do Other Operations--SQL> COMMIT; The SET TRANSACTION READ ONLY option enables us to effectively lock a set of records until the transaction ends. We can use the READ ONLY option with the following commands: SELECT LOCK TABLE SET ROLE ALTER SESSION ALTER SYSTEM
3. COMMIT a Transaction
The Oracle syntax to end a transaction is as follows:
Syntax:
COMMIT [WORK] [ COMMENT 'text' | FORCE 'text' [, integer] ] ; Here is the same command using Sybase syntax:
Syntax:
COMMIT (TRANSACTION | TRAN | WORK) (TRANSACTION_NAME) The COMMIT command saves all changes made during a transaction. Executing a COMMIT statement before beginning a transaction ensures that no errors were made and no previous transactions are left hanging.
The following example verifies that the COMMIT command can be used by itself without receiving an error back from the database system. SQL> COMMIT; SQL> SET TRANSACTION READ ONLY; SQL> SELECT * FROM CUSTOMERS WHERE NAME = 'Bill Turner'; ---Do Other Operations--SQL> COMMIT; An Oracle SQL use of the COMMIT statement would look like this: SQL> SET TRANSACTION; SQL> INSERT INTO CUSTOMERS VALUES("John MacDowell","2000 Lake Lunge Road","Chicago", "IL", 42854, 7); SQL> COMMIT; SQL> SELECT * FROM CUSTOMERS;
Address
725 N. Deal Parkway 1220 Via De Luna Dr. 482 Wannamaker Avenue 405 N. Davis Highway 7753 Woods Lane 1285 Pineapple Highway 2000 Lake Lunge Road
City
Washington Jacksonville Williamsburg Greenville Houston Greenville Chicago
State
DC FL VA SC TX AL IL
Zip
20085 33581 23478 29652 38764 32854 42854
Customer_ID
1 2 3 4 5 6 7
Remember that every COMMIT command must correspond with a previously executed SET TRANSACTION or BEGIN TRANSACTION command. Note the errors we receive with the following statements: SQL> INSERT INTO BALANCES VALUES (18765.42, 19073.06, 8); SQL> COMMIT WORK;
Syntax:
ROLLBACK [WORK] [ TO [SAVEPOINT] savepoint | FORCE 'text' ] As we can see, this command makes use of a transaction SAVEPOINT. An example with commands look like this: SQL> SET TRANSACTION; SQL> INSERT INTO CUSTOMERS VALUES ("Bubba MacDowell", "2222 Blue Lake Way", "Austin", "TX", 39874, 8); SQL> ROLLBACK; SQL> SELECT * FROM CUSTOMERS;
Bill Turner John Keith Mary Rosenberg David Blanken Rebecca Little Izetta Parsons John MacDowell
725 N. Deal Parkway 1220 Via De Luna Dr. 482 Wannamaker Avenue 405 N. Davis Highway 7753 Woods Lane 1285 Pineapple Highway 2000 Lake Lunge Road
DC FL VA SC TX AL IL
1 2 3 4 5 6 7
As we can see, the new record was not added because the ROLLBACK statement rolled the insert back. Suppose we are writing an application for a graphical user interface, such as Microsoft Windows. We have a dialog box that queries a database and allows the user to change values. If the user chooses OK, the database saves the changes. If the user chooses Cancel, the changes are canceled. Obviously, this situation gives us an opportunity to use a transaction. When the dialog box is loaded, these SQL statements are executed: SQL> SET TRANSACTION; SQL> SELECT CUSTOMERS.NAME, BALANCES.CURR_BAL, BALANCES.ACCOUNT_ID FROM CUSTOMERS, BALANCES WHERE CUSTOMERS.NAME = "Rebecca Little" AND CUSTOMERS.CUSTOMER_ID = BALANCES.ACCOUNT_ID; The dialog box allows the user to change the current account balance, so we need to store this value back to the database. When the user selects OK, the update will run. SQL> UPDATE BALANCES SET CURR_BAL = 'new-value' WHERE ACCOUNT_ID = 6; SQL> COMMIT; When the user selects Cancel, the ROLLBACK statement is issued. SQL> ROLLBACK; The ROLLBACK statement cancels the entire transaction. When we are nesting transactions, the ROLLBACK statement completely cancels all the transactions, rolling them back to the beginning of the outermost transaction. If no transaction is currently active, issuing the ROLLBACK statement or the COMMIT command has no effect on the database system. (Think of them as dead commands with no purpose.) After the COMMIT statement has been executed, all actions with the transaction are executed. At this point it is too late to roll back the transaction.
Syntax:
SAVEPOINT savepoint_name; SQL> SQL> SQL> SQL> SQL> SQL> SQL> SET TRANSACTION; UPDATE BALANCES SET CURR_BAL = 25000 WHERE ACCOUNT_ID = 5; SAVEPOINT save_it; DELETE FROM BALANCES WHERE ACCOUNT_ID = 5; ROLLBACK TO SAVEPOINT save_it; COMMIT; SELECT * FROM BALANCES;
Average_Bal
Curr_Bal
Account_ID
1 2 3 4 5 6 7 8
The previous examples created a savepoint called SAVE_IT. An update was made to the database that changed the value of the CURR_BAL column of the BALANCES table. We then saved this change as a savepoint. Following this save, we executed a DELETE statement, but we rolled the transaction back to the savepoint immediately thereafter. Then we executed COMMIT TRANSACTION, which committed all commands up to the savepoint. Had we executed a ROLLBACK TRANSACTION after the ROLLBACK TRANSACTION savepoint_name command, the entire transaction would have been rolled back and no changes would have been made. SQL> SQL> SQL> SQL> SQL> SQL> SQL> SET TRANSACTION; UPDATE BALANCES SET CURR_BAL = 25000 WHERE ACCOUNT_ID = 5; SAVEPOINT save_it; DELETE FROM BALANCES WHERE ACCOUNT_ID = 5; ROLLBACK TO SAVEPOINT save_it; ROLLBACK; SELECT * FROM BALANCES;
Curr_Bal
854.22 6015.96 190.01 25.87 1473.75 1543.67 348.03 1431.26
Account_ID
1 2 3 4 5 6 7 8
Syntax:
CREATE TABLE ( field1 field2 field3 table_name datatype [ NOT NULL ], datatype [ NOT NULL ], datatype [ NOT NULL ]...)
A simple example of a CREATE TABLE statement follows. SQL> CREATE TABLE BILLS ( NAME CHAR(30), AMOUNT NUMBER, ACCOUNT_ID NUMBER);
Table created.
Analysis:
This statement creates a table named BILLS. Within the BILLS table are three fields: NAME, AMOUNT, and ACCOUNT_ID. The NAME field has a data type of character and can store strings up to 30 characters long. The AMOUNT and ACCOUNT_ID fields can contain number values only. The following section examines components of the CREATE TABLE command.
Note: The LONG data type is often called a MEMO data type in other database management systems. It is primarily used to store large amounts of text for retrieval at some later time. The LONG RAW data type is often called a binary large object (BLOB) in other database management systems. It is typically used to store graphics, sound, or video data. Although relational database management systems were not originally designed to serve this type of data, many multimedia systems today store their data in LONG RAW, or BLOB, fields. The ROWID field type is used to give each record within our table a unique, no duplicating value. Many other database systems support this concept with a COUNTER field (Microsoft Access) or an IDENTITY field (SQL Server).
Analysis:
In this table we want to save the name of the company we owe the money to, along with the bill's amount. If the NAME field and/or the ACCOUNT_ID were not stored, the record would be meaningless. We would end up with a record with a bill, but we would have no idea whom we should pay. The first statement in the next example inserts a valid record containing data for a bill to be sent to Joe's Computer Service for $25. SQL> INSERT INTO BILLS VALUES(Joe's Computer Service, 25, 1); 1 row inserted. SQL> INSERT INTO BILLS VALUES("", 25000, 1); 1 row inserted.
Analysis:
Note that the second record in the preceding example does not contain a NAME value. (We might think that a missing payee is to our advantage because the bill amount is $25,000, but we won't consider that.) If the table had
been created with a NOT NULL value for the NAME field, the second insert would have raised an error. A good rule of thumb is that the primary key field and all foreign key fields should never contain NULL values.
Name
Phone Company Power Company Record Club Software Company Cable TV Company
Amount
125 75 25 250 35
Account_ID
1 1 2 1 3
Account_ID Type
1 2 3 Checking
Balance Band
500 First Federal First Investor's Credit Union
Name
Phone Company Power Company Record Club
Address
111 1st Street
City
Atlanta
State
GA FL CA
Software Company 444 4th Drive Cable TV Company 555 5th Drive
Table created.
Analysis:
In Oracle we can specify a tablespace in which we want the table to reside. A decision is usually made according to the space available, often by the database administrator (DBA). INITIAL SIZE is the size for the initial extent of the table (the initial allocated space). NEXT SIZE is the value for any additional extents the table may take through growth. MINEXTENTS and MAXEXTENTS identify the minimum and maximum extents allowed for the table, and PCTINCREASE identifies the percentage the next extent will be increased each time the table grows, or takes another extent.
Syntax:
SQL> CREATE TABLE NEW_TABLE(FIELD1, FIELD2, FIELD3) AS (SELECT FIELD1, FIELD2, FIELD3 FROM OLD_TABLE <WHERE...> This syntax allows us to create a new table with the same data types as those of the fields that are selected from the old table. It also allows us to rename the fields in the new table by giving them new names. SQL> CREATE TABLE NEW_BILLS(NAME, AMOUNT, ACCOUNT_ID) AS (SELECT * FROM BILLS WHERE AMOUNT < 50); Table created.
Analysis:
The preceding statement creates a new table (NEW_BILLS) with all the records from the BILLS table that have an AMOUNT less than 50. Some database systems also allow us to use the following syntax:
Syntax:
SQL> INSERT NEW_TABLE SELECT <field1, field2... | *> FROM OLD_TABLE <WHERE...> The preceding syntax would create a new table with the exact field structure and data found in the old table. Using SQL Server's Transact-SQL language in the following example illustrates this technique. SQL> INSERT NEW_BILLS SQL> SELECT * FROM BILLS WHERE AMOUNT < 50
Syntax:
SQL> ALTER TABLE table_name <ADD column_name data_type; | MODIFY column_name data_type;> The following command changes the NAME field of the BILLS table to hold 40 characters: SQL> ALTER TABLE BILLS MODIFY NAME CHAR(40); Table altered. Note: We can increase or decrease the length of columns; however, we can not decrease a column's length if the current size of one of its values is greater than the value we want to assign to the column length. Here's a statement to add a new column to the NEW_BILLS table: SQL> ALTER TABLE NEW_BILLS ADD COMMENTS CHAR(80); Table altered.
Analysis:
This statement would add a new column named COMMENTS capable of holding 80 characters. The field would be added to the right of all the existing fields. Several restrictions apply to using the ALTER TABLE statement. We cannot use it to add or delete fields from a database. It can change a column from NOT NULL to NULL, but not necessarily the other way around. A column specification can be changed from NULL to NOT NULL only if the column does not contain any NULL values. To change a column from NOT NULL to NULL, use the following syntax:
Syntax:
SQL> ALTER TABLE table_name MODIFY (column_name data_type NULL); To change a column from NULL to NOT NULL, we might have to take several steps: 1. 2. Determine whether the column has any NULL values. Deal with any NULL values that we find. (Delete those records, update the column's value, and so on.)
3.
Note: Some database management systems allow the use of the MODIFY clause; others do not. Still others have added other clauses to the ALTER TABLE statement. In Oracle, we can even alter the table's storage parameters. Check the documentation of the system we are using to determine the implementation of the ALTER TABLE statement.
Syntax:
SQL> DROP TABLE table_name; Here's how to drop the NEW_BILLS table: SQL> DROP TABLE NEW_BILLS; Table dropped.
Analysis:
Note the absence of system prompts. This command did not ask Are we sure? (Y/N). After the DROP TABLE command is issued, the table is permanently deleted. Warning: If we issue SQL> DROP TABLE NEW_BILLS; We could be dropping the incorrect table. When dropping tables, we should always use the owner or schema name. The recommended syntax is SQL> DROP TABLE OWNER.NEW_BILLS; We are stressing this syntax because we once had to repair a production database from which the wrong table had been dropped. The table was not properly identified with the schema name. Restoring the database was an eighthour job, and we had to work until well past midnight.
programmer in mind. Because these systems are designed to be used in high-volume, multi-user environments, the primary design emphasis is placed on the query optimizer and data retrieval engines.
Syntax:
SQL> INSERT INTO table_name (col1, col2...) VALUES(value1, value2...) The basic format of the INSERT...VALUES statement adds a record to a table using the columns we give it and the corresponding values we instruct it to add. We must follow three rules when inserting data into a table with the INSERT...VALUES statement: The values used must be the same data type as the fields they are being added to. The data's size must be within the column's size. For instance, we cannot add an 80-character string to a 40character column. The data's location in the VALUES list must correspond to the location in the column list of the column it is being added to. (That is, the first value must be entered into the first column, the second value into the second column, and so on.)
Example 1
Assume we have a COLLECTION table that lists all the important stuff we have collected. We can display the table's contents by writing SQL> SELECT * FROM COLLECTION; ITEM WORTH REMARKS -------------------- --------- ---------------------------NBA ALL STAR CARDS 300 SOME STILL IN BIKE SPOKES MALIBU BARBIE 150 TAN NEEDS WORK STAR WARS GLASS 5.5 HANDLE CHIPPED LOCK OF SPOUSES HAIR 1 HASN'T NOTICED BALD SPOT YET If we wanted to add a new record to this table, we would write SQL> INSERT INTO COLLECTION (ITEM, WORTH, REMARKS) VALUES('SUPERMANS CAPE', 250.00, 'TUGGED ON IT'); 1 row created. We can execute a simple SELECT statement to verify the insertion: SQL> SELECT * FROM COLLECTION; ITEM WORTH REMARKS -------------------- --------- ---------------------------NBA ALL STAR CARDS 300 SOME STILL IN BIKE SPOKES MALIBU BARBIE 150 TAN NEEDS WORK STAR WARS GLASS 5.5 HANDLE CHIPPED LOCK OF SPOUSES HAIR 1 HASN'T NOTICED BALD SPOT YET SUPERMANS CAPE 250 TUGGED ON IT
Analysis:
The INSERT statement does not require column names. If the column names are not entered, SQL lines up the values with their corresponding column numbers. In other words, SQL inserts the first value into the first column, the second value into the second column, and so on.
Example 2
The following statement inserts the values from Example 1 into the table: SQL> INSERT INTO COLLECTION VALUES ('STRING',1000.00,'SOME DAY IT WILL BE VALUABLE'); 1 row created.
Analysis:
By issuing the same SELECT statement as we did in Example 1, we can verify that the insertion worked as expected: SQL> SELECT * FROM COLLECTION; ITEM WORTH REMARKS -------------------- --------- ---------------------------NBA ALL STAR CARDS 300 SOME STILL IN BIKE SPOKES MALIBU BARBIE 150 TAN NEEDS WORK STAR WARS GLASS 5.5 HANDLE CHIPPED LOCK OF SPOUSES HAIR 1 HASN'T NOTICED BALD SPOT YET SUPERMANS CAPE 250 TUGGED ON IT STRING 1000 SOME DAY IT WILL BE VALUABLE 6 rows selected.
Analysis:
Using '' instead of NULL inserted a space in the collection table. We then can select the space. SQL> SELECT * FROM collection WHERE remarks = ' ';
ITEM WORTH REMARKS --------------------------- -------- --------SPORES MILDEW FUNGUS 50.00 1 row selected.
Analysis:
The resulting answer comes back as if a NULL is there. With the output of character fields, it is impossible to tell the difference between a null value and a mere space. Assume the column REMARKS in the preceding table has been defined as NOT NULL. Typing SQL> INSERT INTO COLLECTION VALUES('SPORES MILDEW FUNGUS',50.00,NULL); Produces the following error: INSERT INTO COLLECTION * ERROR at line 1: ORA-01400: mandatory (NOT NULL) column is missing or NULL during insert Note: Number data types do not require quotes; NULL does not require quotes; character data types do require quotes.
Analysis:
In this example we tried to insert another ITEM called STRING into the COLLECTION table. Because this table was created with ITEM as a unique value, it returned the appropriate error. ANSI SQL does not offer a solution to this problem, but several commercial implementations include extensions that would allow us to use something like the following: IF NOT EXISTS (SELECT * FROM COLLECTION WHERE NAME = 'STRING' INSERT INTO COLLECTION VALUES('STRING', 50, 'MORE STRING') This particular example is supported in the Sybase system. A properly normalized table should have a unique, or key, field. This field is useful for joining data between tables, and it often improves the speed of our queries when using indexes. Note: Here's an INSERT statement that inserts a new employee into a table: SQL> INSERT INTO employee_tbl VALUES ('300500177', 'SMITHH', 'JOHN'); 1 row inserted.
Many database systems also support temporary tables. Temporary tables exist for the life of our database connection and are deleted when our connection is terminated. The INSERT...SELECT statement can take the output of a SELECT statement and insert these values into a temporary table. Here is an example: SQL> INSERT INTO tmp_tbl SELECT * FROM TABLE; 19,999 rows inserted.
Analysis:
We are selecting all the rows that are in table and inserting them into tmp_tbl. Note: Not all database management systems support temporary tables. Check the documentation for the specific system we are using to determine if this feature is supported. The syntax of the INSERT...SELECT statement is as follows:
Syntax:
SQL> INSERT INTO table_name (col1, col2...) SELECT col1, col2... FROM tablename WHERE search_condition Essentially, the output of a standard SELECT query is then input into a database table. The same rules that applied to the INSERT...VALUES statement apply to the INSERT...SELECT statement. To copy the contents of the COLLECTION table into a new table called INVENTORY, execute the set of statements in Example 3.
Example 3
This example creates the new table INVENTORY. SQL> CREATE TABLE INVENTORY(ITEM CHAR(20),COST NUMBER, ROOM CHAR(20),REMARKS CHAR(40)); Table created. The following INSERT fills the new INVENTORY table with data from COLLECTION. SQL> INSERT INTO INVENTORY (ITEM, COST, REMARKS) SELECT ITEM, WORTH, REMARKS FROM COLLECTION; 6 rows created. We can verify that the INSERT works with this SELECT statement: SQL> SELECT * FROM INVENTORY; ITEM COST ROOM -------------------- --------- -------NBA ALL STAR CARDS 300 MALIBU BARBIE 150 STAR WARS GLASS 5.5 LOCK OF SPOUSES HAIR 1 SUPERMANS CAPE 250 STRING 1000 6 rows selected. Note: The data appears to be in the table; however, the transaction is not finalized until a COMMIT is issued. The transaction can be committed either by issuing the COMMIT command or by simply exiting. REMARKS ---------------------------SOME STILL IN BIKE SPOKES TAN NEEDS WORK HANDLE CHIPPED HASN'T NOTICED BALD SPOT YET TUGGED ON IT SOME DAY IT WILL BE VALUABLE
Analysis:
We have successfully, and somewhat painlessly, moved the data from the COLLECTION table to the new INVENTORY table! The INSERT...SELECT statement requires us to follow several new rules: The SELECT statement cannot select rows from the table that is being inserted into. The number of columns in the INSERT INTO statement must equal the number of columns returned from the SELECT statement. The data types of the columns in the INSERT INTO statement must be the same as the data types of the columns returned from the SELECT statement. Another use of the INSERT...SELECT statement is to back up a table that we are going to drop, truncate for repopulation, or rebuild. The process requires us to create a temporary table and insert data that is contained in our original table into the temporary table by selecting everything from the original table. For example: SQL> INSERT INTO copy_table SELECT * FROM original_table; Now we can make changes to the original table with a clear conscience. Note: Later today we learn how to input data into a table using data from another database format. Nearly all businesses use a variety of database formats to store data for their organizations. The applications programmer is often expected to convert these formats, and we will learn some common methods for doing just that.
Syntax:
SQL> UPDATE table_name SET columnname1 = value1 [, columname2 = value2]... WHERE search_condition This statement checks the WHERE clause first. For all records in the given table in which the WHERE clause evaluates to TRUE, the corresponding value is updated.
Example 4
This example illustrates the use of the UPDATE statement: SQL> UPDATE COLLECTION SET WORTH = 900 WHERE ITEM = 'STRING'; 1 row updated. To confirm the change, the query SQL> SELECT * FROM COLLECTION WHERE ITEM = 'STRING'; ITEM WORTH REMARKS -------------------- --------- -----------------------------STRING 900 SOME DAY IT WILL BE VALUABLE Here is a multiple-column update: SQL> UPDATE collection SET worth = 900, item = ball WHERE item = 'STRING'; 1 row updated. Note: Our implementation might use a different syntax for multiple-row updates. Notice in the set that 900 do not have quotes, because it is a numeric data type. On the other hand, String is a character data type, which requires the quotes.
Example 5
If the WHERE clause is omitted, every record in the COLLECTION table is updated with the value given. SQL> UPDATE COLLECTION SET WORTH = 555; 6 rows updated. Performing a SELECT query shows that every record in the database was updated with that value: SQL> SELECT * FROM COLLECTION; ITEM WORTH REMARKS -------------------- --------- -----------------------------NBA ALL STAR CARDS 555 SOME STILL IN BIKE SPOKES MALIBU BARBIE 555 TAN NEEDS WORK STAR WARS GLASS 555 HANDLE CHIPPED LOCK OF SPOUSES HAIR 555 HASN'T NOTICED BALD SPOT YET SUPERMANS CAPE 555 TUGGED ON IT STRING 555 SOME DAY IT WILL BE VALUABLE 6 rows selected. We, of course, should check whether the column we are updating allows unique values only. Warning: If we omit the WHERE clause from the UPDATE statement, all records in the given table are updated. Some database systems provide an extension to the standard UPDATE syntax. SQL Server's Transact-SQL language, for instance, enables programmers to update the contents of a table based on the contents of several other tables by using a FROM clause. The extended syntax looks like this:
Syntax:
SQL> UPDATE table_name SET columnname1 = value1 [, columname2 = value2]... FROM table_list WHERE search_condition
Example 6
Here's an example of the extension: SQL> UPDATE COLLECTION SET WORTH = WORTH * 0.005; SQL> SELECT * FROM COLLECTION; ITEM WORTH REMARKS -------------------- -------- ---------------------------NBA ALL STAR CARDS 2.775 SOME STILL IN BIKE SPOKES MALIBU BARBIE 2.775 TAN NEEDS WORK STAR WARS GLASS 2.775 HANDLE CHIPPED LOCK OF SPOUSES HAIR 2.775 HASN'T NOTICED BALD SPOT YET SUPERMANS CAPE 2.775 TUGGED ON IT STRING 2.775 SOME DAY IT WILL BE VALUABLE 6 rows selected.
Analysis:
This syntax is useful when the contents of one table need to be updated following the manipulation of the contents of several other tables. Keep in mind that this syntax is nonstandard and that we need to consult the documentation for our particular database management system before we use it. The UPDATE statement can also update columns based on the result of an arithmetic expression. When using this technique, remember the requirement that the data type of the result of the expression must be the same as the data type of the field that is being modified. Also, the size of the value must fit within the size of the field that is being modified.
Two problems can result from the use of calculated values: truncation and overflow. Truncation results when the database system converts a fractional number to an integer, for instance. Overflow results when the resulting value is larger than the capacity of the modified column, which will cause an error to be returned by our database system. Note: Some database systems handle the overflow problem for us. Oracle converts the number to exponential notation and presents the number that way. We should keep this potential error in mind when using number data types.
Syntax:
SQL> DELETE FROM tablename WHERE condition The first thing we will probably notice about the DELETE command is that it doesn't have a prompt. Users are accustomed to being prompted for assurance when, for instance, a directory or file is deleted at the operating system level. Are we sure? (Y/N) is a common question asked before the operation is performed. Using SQL, when we instruct the DBMS to delete a group of records from a table, it obeys our command without asking. That is, when we tell SQL to delete a group of records, it will really do it! Depending on the use of the DELETE statements WHERE clause, SQL can do the following: Delete single rows Delete multiple rows Delete all rows Delete no rows Here are several points to remember when using the DELETE statement: The DELETE statement cannot delete an individual field's values (use UPDATE instead). The DELETE statement deletes entire records from a single table. Like INSERT and UPDATE, deleting records from one table can cause referential integrity problems within other tables. Keep this potential problem area in mind when modifying data within a database. Using the DELETE statement deletes only records, not the table itself. Use the DROP TABLE statement to remove an entire table.
Example 7
This example shows us how to delete all the records from COLLECTION where WORTH is less than 275. SQL> DELETE FROM COLLECTION WHERE WORTH < 275; 4 rows deleted. The result is a table that looks like this: SQL> SELECT * FROM COLLECTION; ITEM WORTH REMARKS -------------------- --------- -----------------------------NBA ALL STAR CARDS 300 SOME STILL IN BIKE SPOKES STRING 1000 SOME DAY IT WILL BE VALUABLE Warning: Like the UPDATE statement, if we omit a WHERE clause from the DELETE statement, all rows in that particular table will be deleted. Example 8 uses all three data manipulation statements to perform a set of database operations. This example inserts some new rows into the COLLECTION table we used earlier today. SQL> INSERT INTO COLLECTION VALUES('CHIA PET', 5,'WEDDING GIFT'); 1 row created.
SQL> INSERT INTO COLLECTION VALUES('TRS MODEL III', 50, 'FIRST COMPUTER'); 1 row created. Now create a new table and copy this data to it: SQL> CREATE TABLE TEMP (NAME CHAR(20),VALUE NUMBER,REMARKS CHAR(40)); Table created. SQL> INSERT INTO TEMP(NAME, VALUE, REMARKS) SELECT ITEM, WORTH, REMARKS FROM COLLECTION; 4 rows created. SQL> SELECT * FROM TEMP; NAME VALUE REMARKS -------------------- --------- -----------------------------NBA ALL STAR CARDS 300 SOME STILL IN BIKE SPOKES STRING 1000 SOME DAY IT WILL BE VALUABLE CHIA PET 5 WEDDING GIFT TRS MODEL III 50 FIRST COMPUTER Now change some values: SQL> UPDATE TEMP SET VALUE = 100 WHERE NAME = 'TRS MODEL III'; 1 row updated. SQL> UPDATE TEMP SET VALUE = 8 WHERE NAME = 'CHIA PET'; 1 row updated. SQL> SELECT * FROM TEMP; NAME VALUE REMARKS -------------------- --------- ---------------------------NBA ALL STAR CARDS 300 SOME STILL IN BIKE SPOKES STRING 1000 SOME DAY IT WILL BE VALUABLE CHIA PET 8 WEDDING GIFT TRS MODEL III 100 FIRST COMPUTER And update these values back to the original table: SQL> INSERT COLLECTION SQL> SELECT * FROM TEMP; SQL> DROP TABLE TEMP;
Analysis:
The DROP TABLE and CREATE TABLE statements are discussed in greater detail. For now, these statements basically do what their names suggest. CREATE TABLE builds a new table with the format we give it, and DROP TABLE deletes the table. Keep in mind that DROP TABLE permanently removes a table, whereas DELETE FROM <TableName> removes only the records from a table. To check what we have done, select out the records from the COLLECTION table. We will see that the changes we made now exist in the COLLECTION table. SQL> SELECT * FROM COLLECTION;
NAME VALUE REMARKS -------------------- -------- ---------------------------NBA ALL STAR CARDS 300 SOME STILL IN BIKE SPOKES STRING 1000 SOME DAY IT WILL BE VALUABLE CHIA PET 8 WEDDING GIFT TRS MODEL III 100 FIRST COMPUTER
Analysis:
The previous example used all three data manipulation commands--INSERT, UPDATE, and DELETE--to perform a set of operations on a table. The DELETE statement is the easiest of the three to use. Warning: Always keep in mind that any modifications can affect the referential integrity of our database. Think through all our database editing steps to make sure that we have updated all tables correctly.
A scheme is a piece of the database that is owned by some specific user. The complete database definition will typically consist of multiple schemas. The CREATE SCHEMA contains table creation, view and grant operations. Important aspects associated with schema definition are: AUTHORIZATION USER1 specifies that user USER1 is the owner of this schema. The CRAETE SCHEMA can contain many (or zero) statements for creating table and/or statements for creating views and/or statements for granting privileges to other users. USER1 is said to own all the tables and views created in this schema. A user is not allowed to own more than one schema. A schema cannot be owned by more than one user. Tables and views in different schemas can have the same (unqualified) name. Hence USER1.EMPLOYEE and USER2.EMPLOYEE refer to two different tables.
5. GRANT Statement
The Syntax of the GRANT command which is used to provide data access is: SQL> GRANT <privilege specification> ON <table> TO <grantee list> [WITH GRANT OPTION] <privilege specification> is either a comma delimited list of specific privileges (SELECT, INSERT, UPDATE, DELETE, REFERENCES) with an optional parenthesized comma delimited list of column names in case of UPDATE and REFERENCE or the word[s] ALL [PRIVILEGES]. <table> is a base table or a view. <grantee> is either a comma delimited list of valid authorization identifiers or the word PUBLIC. The GRANT command is further explained with the help of examples which assumes a user with authorization identifier USER3 and a table RESIGNATIONS (columns : SUBMITTED_TO, RESIGNED_AS, STATED_REASON and ACTUAL_REASON). SQL> GRANT INSERT ON RESIGNATRATION TO USER3; Will all USER3 to ass rows to the RESIGNATIONS table. GRANT of multiple privileges can be combined as follows as: SQL> GRANT SELECT, INSERT, DELETE, UPDATE, REFERENCES ON RESIGNATION TO USER3; Since the above SQL statement gives USER3 a free hand with RESIGNATIONS, it is equivalent to: SQL> GRANT ALL PRIVILEGES ON RESIGNATION TO USER3;
OR
SQL> GRANT ALL ON RESIGNATIONS TO USER3;
Integrity Constraints
Data integrity allows defining certain data quality requirements that the data in the database needs to meet. If a user tries to insert data that doesn't meet these requirements, Oracle will not allow so.
1. Constraint types
There are five integrity constraints in Oracle. Primary Key Foreign Key Unique Key Not Null Check
Now: trying to insert the number 2 again into a: SQL> INSERT INTO ri_unique VLAUES (2,7); This statement issues a ORA-00001: unique constraint (RENE.SYS_C001463 violated). Every constraint, by the way, has a name. In this case, the name is: RENE.SYS_C001463. In order to remove that constraint, an ALTER TABLE ... DROP CONSTRAINT ... is needed: SQL> ALTER TABLE ri_unique DROP CONSTRAINT sys_c001463; Of course, it is also possible to add a unique constraint on an existing table: SQL> ALTER TABLE ri_unique ADD CONSTRAINT uq_ri_b UNIQUE (b); A unique constraint can be extended over multiple columns: SQL> CREATE TABLE ri_3 ( A NUMBER, B NUMBER, C NUMBER, UNIQUE (A,B) ); It is possible to name the constraint. The following example creates a unique constraint on the columns a and b and names the constraint uq_ri_3. SQL> CREATE TABLE ri_3 ( A NUMBER, B NUMBER, C NUMBER, CONSTRAINT uq_ri_3 UNIQUE (A,B) );
1.5. Check
A check constraint allows to state a minimum requirement for the value in a column. If more complicated requirements are desired, an insert trigger must be used. The following table allows only numbers that are between 0 and 100 in the column a; SQL> CREATE TABLE ri_check_1 ( A NUMBER CHECK (A BETWEEN 0 AND 100), B NUMBER ); Check constraints can be added after a table had been created:
SQL> ALTER TABLE ri_check_1 ADD CONSTRAINT ch_b CHECK (B > 50); It is also possible to state a check constraint that check the value of more than one column. The following example makes sure that the value of begin_ is smaller than the value of end_. SQL> CREATE TABLE ri_check_2 BEGIN_ NUMBER, END_ NUMBER, VALUE_ NUMBER, CHECK (BEGIN_ < END_) );
2. Disabling Constraints
2.1. Disabling 'anonymous' constraint
SQL> CREATE TABLE foo(bar number,baz number,UNIQUE(bar,baz)); SQL> ALTER TABLE foo DISABLE UNIQUE (bar, baz);
Indexes
Indexes are totally optional structures that are intended to speed up the execution of SQL statements against table data and cluster data. Indexes are used for direct access to a particular row or set of rows in a table. Indexes are most typically organized as some type of tree structure.
Examples:
The SSN key is used to track individual students at a university. The concatenated primary key index for an ENROLL table (SSN + SectionID + Term + Year) that is used to track the enrollment of a student in a particular course section. The maximum number of columns for a concatenated index is 32; but the combined size of the columns cannot exceed about one-half of a data block size. A unique index allows no two rows to have the same index entry. An example would be an index on student SSN. A non-unique index allows more than one row to have the same index entry (this is also called a secondary key index). An example would be an index on U.S. Mail zip codes. A function-based index is created when using functions or expressions that involve one or more columns in the table that is being indexed. A function-based index pre-computes the value of the function or expression and stores it in the index. Function-based indexes can be created as either a B-tree or a bitmap index. A partitioned index allows an index to be spread across several tablespaces - the index would have more than one segment and typically access a table that is also partitioned to improve scalability of a system. This type of index decreases contention for index lookup and increases manageability. To create an index in our own schema: The table to be indexed must be in our schema, OR We have the INDEX privilege on the table to be indexed, OR We have the CREATE ANY INDEX system privilege. To create an index in another schema: We have the CREATE ANY INDEX system privilege, AND The owner of the other schema has a quota for the tablespace that will store the index segment (or UNLIMITED TABLESPACE privilege).
2. Loading Data
Data initially loaded into a table will load more efficiently if the index is created after the table is created. This is because the index must be updated after each row insertion if the index is created before loading data. Creating an index on an existing table requires sort space typically memory values that are paged in and out of segments in the TEMP tablespace allocated to a user. Users are also allocated memory for index creation based on the SORT_AREA_SIZE parameter if memory is insufficient, then swapping takes place.
The top level of the index is called the Root. The Root points to entries at lower levels of the index - these lower levels are termed Branch Blocks. A node in the index may store multiple (more than one) key values - in fact, almost all B-Trees have nodes with multiple values - the tree structure grows upward and nodes split when they reach a specified size. At the Leaf level the index entries point to rows in the table. At the Leaf level the index entries are linked in a bi-directional chain that allows scanning rows in both ascending and descending order of key values - this supports sequential processing as well as direct access processing. In a non-partitioned table, Key values are repeated if multiple rows have the same key value this would be a non-unique index (unless the index is compressed). Index entries are not made for rows that have all of the key columns NULL. Leaf Index Format: The index entry at the Leaf level is made up of three components. Entry Header - stores number of columns in the index and locking information about the row to which the index points. Key Column Length-Value Pairs - defines the size of the column in the key followed by the actual column value. These pairs are repeated for each column in a composite index. ROWID - This is the ROWID of a row in a table that contains the key value associated with this index entry. Data Manipulation Language Effects: Any DML on a table also causes the Oracle Server to maintain the associated indexes. When a row is inserted into a table, the index must also be updated. This requires the physical insertion of an index entry into the index tree structure. When a row is deleted from a table, the index only has the entry "logically" deleted (turn a delete bit from off to on). The space for the deleted row is not available for new entries until all rows in the block are deleted. When a row key column is updated for a table, index has both logical deletion and physical insertion into the index. PCTFREE has no effect on an index except when the index is created. New entries to an index may be added to an index block even if the free space in the block is less than the PCTFREE setting. o If an indexed table has lots of rows to be inserted, set PCTFREE high to accommodate new index values. o If the table is static, set PCTFREE low. o PCTUSED cannot be specified for indexes.
The UNIQUE clause specifies unique entries - the default is NONUNIQUE. Note that the owner's schema (User350) is specified this is optional. The PCTFREE parameter is only effective when the index is created - after that, new index block entries are made and PCTFREE is ignored. PCTFREE is ignored because entries are not updated instead a logical delete and physical insert of a new index entry is made. PCTUSED cannot be specified for an index because updates are not made to index entries. Use a low PCTFREE when the indexed column is system generated as would be the case with a sequence (sequence indexes tend to increase in an ascending fashion) because new entries tend to be made to new data blocks - there are no or few insertions into data blocks that already contain index entries. Use a high PCTFREE when the indexed column or set of columns can take on random values that are not predictable. Such is the case when a new Orderline row is inserted - the ProductID column may be a non-unique foreign key index and the product to be sold on an Orderline is not predictable for any given order. The Default and Minimum for INITTRANS is 2. The limit on MAXTRANS is 255 - this number would be inordinately large. By default, LOGGING is on so that the index creation is logged into the redo log file. Specifying NOLOGGING would increase the speed of index creation initially, but would not enable recovery at the time the index is created. Interestingly, Oracle will use existing indexes to create new indexes whenever the key for the new index corresponds to the leading part of the key of an existing index.
4. Key-Compressed Index
This is a B-tree with compression compression eliminates duplicate occurrences of a key in an index leaf block. SQL> CREATE INDEX Emp_Name ON Emp (Last_Name, First_Name) TABLESPACE Index01 COMPRESS 1; This approach breaks an index key into a prefix and suffix entry in the index block. Compression causes sharing of the prefix entries among all suffix entries and save lots of space allowing the storage of more keys in a block. Use key compression when: The index is non-unique where the ROWID column is appended to the key to make the index key unique. The index is a unique multicolumn index example: Zip_Code + Last_Name.
SQL> ALTER INDEX User350.Products_Region_Idx ALLOCATE EXTENT (SIZE 400K DATAFILE '/a01/student/user350/oradata/user350index01.dbf'); We can DEALLOCATE unused index space after the insertions are complete. This will free up space that is not in use within the tablespace. SQL> ALTER INDEX User350.Products_Region_Idx DEALLOCATE UNUSED;
Coalesce Index
Cannot move the index to another tablespace. Operation does not require more disk space. Only coalesces leaf blocks. Frees up index leaf blocks for use.
A single row insertion or deletion requires updating all indexes on the table. An update operation may or may not affect indexes depending on the column updated. The use of indexes involves a tradeoff - query performance tends to speed up, but data manipulation language operations tend to slow down. Query performance improves because the index speeds up row retrieval. DML operations slow down because along with row insertions, deletions, and updates, the indexes associated with a table must also have insertions and deletions completed. For very volatile tables, minimize the number of indexes used. When possible, store indexes to a tablespace that does not have rollback segments, temporary segments, and user/data tables DBAs usually create a separate tablespace just used for index segments. We can minimize fragmentation by using extent sizes that are at least multiples of 5 times the DB_BLOCK_SIZE for the database. Create an index when row retrieval involves less than 15% of a large table's rows retrieval of more rows is generally more efficient with a full table scan. We may want to use NOLOGGING when creating large indexes - we can improve performance by avoiding redo generation. Indexes created with NOLOGGING requires a backup as their creation is not archived.
Example:
SQL> CREATE BITMAP INDEX User350.Products_Region_Idx ON User350.Products_Region (RegionId ASC) NOLOGGING; Usually index entries are smaller than the rows they index. Data blocks that store index entries tend to store more entries in each block, so the INITRANS parameter should be higher on indexes than on their related tables.
Sequences in Oracle
Introduction
Use the CREATE SEQUENCE statement to create a sequence, which is a database object from which multiple users may generate unique integers. You can use sequences to automatically generate primary key values. When a sequence number is generated, the sequence is incremented, independent of the transaction committing or rolling back. If two users concurrently increment the same sequence, then the sequence numbers each user acquires may have gaps, because sequence numbers are being generated by the other user. One user can never acquire the sequence number generated by another user. After a sequence value is generated by one user, that user can continue to access that value regardless of whether the sequence is incremented by another user. Sequence numbers are generated independently of tables, so the same sequence can be used for one or for multiple tables. It is possible that individual sequence numbers will appear to be skipped, because they were generated and used in a transaction that ultimately rolled back. Additionally, a single user may not realize that other users are drawing from the same sequence. After a sequence is created, you can access its values in SQL statements with the CURRVAL pseudocolumn, which returns the current value of the sequence, or the NEXTVAL pseudocolumn, which increments the sequence and returns the new value. A sequence is a highly scalable, non-blocking, generator that generates unique numbers. This is demonstrated in the following example. First, two tables are created: SQL> CREATE TABLE SEQ_EX_A ( N NUMBER ); SQL> CREATE TABLE SEQ_EX_B ( S NUMBER, N NUMBER ); One table is populated with five rows: SQL> SQL> SQL> SQL> SQL> INSERT INSERT INSERT INSERT INSERT INTO INTO INTO INTO INTO SEQ_EX_A SEQ_EX_A SEQ_EX_A SEQ_EX_A SEQ_EX_A VALUES VALUES VALUES VALUES VALUES (55); ( 3); (27); (81); (32);
A sequence is generated: SQL> CREATE SEQUENCE SEQ_A START WITH 1 INCREMENT BY 1; Then, the values of table SEQ_EX_A are filled into SEQ_EX_B. The sequence generates a (as mentioned: unique) number by calling NEXTVAL on it: SQL> INSERT INTO SEQ_EX_B SELECT SEQ_A.NEXTVAL, N FROM SEQ_EX_A; The table's content: SQL> SELECT * FROM SEQ_EX_B; This returns: S N ---------- ---------1 55 2 3 3 27 4 81 5 32
Views
Definition
A VIEW is a virtual table that does not exist in reality, but is a logical definition of a set of related columns, usually from multiple tables. A VIEW presents data to the end user of an application system the way that the end user is used to seeing the data. o For example, the Customer Order Form is a view of data from several different tables including CUSTOMER, ORDERS, PRODUCT, ORDERLINE, and SALESPERSON. A VIEW can also be used to simplify query generation and to add data security to a database by limiting the data that an end user can access. A VIEW definition is permanently stored as part of the database. The example below creates a view named PATIENT_BILL that includes the PATIENT_NO, ITEM_CODE and CHARGE columns from the BILLED table and the DESCRIPTION column from the ITEM table, and the DATE_DISCHARGED from the PATIENT table.
Create a View
SQL> CREATE VIEW patient_bill AS SELECT B.patient_no, P.pat_name, B.item_code, charge, description, date_discharged FROM patient P, billed B, item I WHERE P.patient_no = B.patient_no AND I.item_code = B.item_code; View created. Note that the relationship from PATIENT to BILLED is 1:N and the relationship from ITEM to BILLED is 1:N. In other words, the BILLED table is the intersection table linking PATIENT and ITEM. Now we can query the view PATIENT_BILL just as we would a table. When we execute the query, the view is generated by the DBMS and loaded with data, and then our query is executed. SQL> SELECT patient_no, item_code, charge, description FROM patient_bill WHERE patient_no = 1117; PATIENT_NO ITEM_CODE CHARGE DESCRIPTION ---------- --------- -------- --------------------1117 2222 7.54 Syringe, 19 gauge 1117 2255 25 Saline Soln, 1 liter 1117 2245 167.67 Surgical Prep Pack #8 1117 2224 222.21 Surgical Prep Pack #4 1117 2267 4.92 Bed Pan
Derived Columns
A view may contain derived (or virtual) columns. For example, the total charges by patient for room and special items in a room (item codes between 2200 and 2250). A view for this aggregate data can be created from the PATIENT_BILL view that was created in the example above. The new view (code shown below for this view of a view) named ROOM_CHARGE has two virtual columns named OCCUPANT and ROOM_CHGS. Note that ROOM_CHARGES is the sum of the charges for an occupant of the room. SQL> CREATE VIEW room_charge (occupant, room_chgs) AS SELECT pat_name, sum(CHARGE) FROM patient_bill
SET Commands
Sets a system variable to alter the SQL*Plus environment settings for our current session, for example: Display width for data Turn on HTML formatting Enabling or disabling printing of column headings Number of lines per page In iSQL*Plus, we can also use the System Variables screen to set system variables. SET system_variable value where system_variable and value represent one of the following clauses: Syntax: SQL> SET option value SQL> SHO[W] option Options: most of these have an abbreviated and a long form e.g. APPINFO or APPI will do the same thing we can get a list of the set options in SQLPLUS with the command SQL> HELP SET APPI[NFO]{ON|OFF|text} ARRAY[SIZE] {15|n} AUTO[COMMIT] {OFF|ON|IMM[EDIATE]|n} AUTOP[RINT] {OFF|ON} AUTORECOVERY [ON|OFF] AUTOT[RACE] {OFF|ON|TRACE[ONLY]} [EXP[LAIN]] [STAT[ISTICS]] Application info for performance monitor (see DBMS_APPLICATION_INFO) Fetch size (1 to 5000) the number of rows that will be retrieved in one go. Autocommit commits after each SQL command or PL/SQL block Automatic PRINTing of bind variables. Configure the RECOVER command to automatically apply archived redo log files during recovery - without any user confirmation. Display a trace report for SELECT, INSERT, UPDATE or DELETE statements EXPLAIN shows the query execution path by performing an EXPLAIN PLAN. STATISTICS displays SQL statement statistics. Using ON or TRACEONLY with no explicit options defaults to EXPLAIN STATISTICS Set the non-alphanumeric character used to end PL/SQL blocks to c Change or enable command separator - default is a semicolon (;) The text to be printed between SELECTed columns normally a space. Version of oracle - see also init.ora COMPATIBILITY= We can set this back by up to 2 major versions e.g. Ora 9 supports 8 and 7 Termination character for substitution variable reference default is a period. The COPY command will fetch n batches of data between commits. (n= 0 to 5000) the size of each fetch=ARRAYSIZE. If COPYCOMMIT = 0, COPY will commit just once - at the end. Suppress the comparison of data types while inserting or appending to DB2 c = the char used to prefix substitution variables. ON or OFF controls whether to replace substitution variables with their values. (this overrides SET SCAN) Sets the depth of the level to which we can recursively describe an object (1 to 50) see the DESCRIBE command Display commands as they are executed
BLO[CKTERMINATOR] {.|c|OFF|ON} CMDS[EP] {;|c|OFF|ON} COLSEP { |text} COM[PATIBILITY] {V5|V6|V7|V8|NATIVE} CON[CAT] {.|c|OFF|ON} COPYC[OMMIT] {0|n}
EMB[EDDED] {OFF|ON}
OFF = report printing will start at the top of a new page. ON = report printing may begin anywhere on a page. Defines the escape character. OFF undefines. ON enables. Display the number of records returned (when rows > n ) OFF (or n=0) turns the display off ON sets n=1
FLAGGER Checks to make sure that SQL statements conform to the ANSI/ISO SQL92 {OFF|ENTRY|INTERMED[IATE]|FU standard. non-standard constructs are flagged as errors and displayed LL} FLU[SH] {OFF|ON} Buffer display output (OS) (no longer used in Oracle 9) HEA[DING] {OFF|ON} HEADS[EP] {||c|OFF|ON} print column headings Define the heading separator character (used to divide a column heading onto > one line.) OFF will actually print the heading separator char Change the default instance for our session, this command may only be issued when not already connected and requires Net8 Width of a line (before wrapping to the next line) Earlier versions default to 80, Oracle 9 is 150 Starting position from which CLOB and NCLOB data is retrieved and displayed Change the location from which archive logs are retrieved during recovery normally taken from LOG_ARCHIVE_DEST Set the maximum width (in chars) for displaying and copying LONG values. Set the fetch size (in chars) for retrieving LONG values.
MARK[UP] HTML [ON|OFF] Output HTML text, which is the output used by iSQL*Plus. [HEAD text] [BODY text] [TABLE text] [ENTMAP {ON|OFF}][SPOOL {ON|OFF}] [PRE[FORMAT] {ON|OFF}] NEWP[AGE] {1|n} NULL text The number of blank lines between the top of each page and the top title. 0 = a formfeed between pages. NULL text Replace a null value with 'text' The NULL clause of the COLUMN command will override this for a given column. The default number format. The default width for displaying numbers. The height of the page - number of lines. 0 will suppress all headings, page breaks, titles press [Return] after each page enclose text in single quotes Print a single line of the RECSEPCHAR between each record. WRAPPED = print only for wrapped lines EACH=print for every row
PAU[SE] {OFF|ON|text}
RECSEP {WR[APPED]|EA[CH]|OFF}
RECSEPCHAR {_|c} SCAN {OFF|ON} SERVEROUT[PUT] {OFF|ON} [SIZE n] [FOR[MAT] {WRA[PPED]|WOR[D_WRAPPED]|TR U[NCATED]}] SHOW[MODE] {OFF|ON} SPA[CE] {1|n} SQLBL[ANKLINES] {ON|OFF} SQLC[ASE] {MIX[ED]|LO[WER]|UP[PER]} SQLPLUSCOMPAT[IBILITY] {x.y[.z]} SQLCO[NTINUE] {> |text} SQLN[UMBER] {OFF|ON}
Define the RECSEPCHAR character, default= ' ' OFF = disable substitution variables and parameters whether to display the output of stored procedures (or PL/SQL blocks) i.e., DBMS_OUTPUT.PUT_LINE SIZE = buffer size (2000-1,000,000) bytes Display old and new settings of a system variable The number of spaces between columns in output (1-10) Allow blank lines within an SQL command. Reverts to OFF after the current command/block. Convert the case of SQL commands and PL/SQL blocks (but not the SQL buffer itself) Set the behavior or output format of VARIABLE to that of the release or version specified by x.y[.z]. Continuation prompt (used when a command is continued on an additional line using a hyphen -) Set the prompt for the second and subsequent lines of a command or PL/SQL block. ON = set the SQL prompt = the line number. OFF = set the SQL prompt = SQLPROMPT. set a non-alphanumeric prefix char for immediately executing one line of SQL (#) Set the command prompt. Set the char used to end and execute SQL commands to c. OFF disables the command terminator - use an empty line instead. ON resets the terminator to the default semicolon (;). Default file extension for SQL scripts Format white space in terminal output. OFF = use spaces to format white space. ON = use the TAB char. Note this does not apply to spooled output files. The default is system-dependent. Enter SHOW TAB to see the default value. OFF suppresses the display of output from a command file ON displays the output. TERMOUT OFF does not affect the output from commands entered interactively.
TERM[OUT] {OFF|ON}
Display the time at the command prompt. ON = display timing statistics for each SQL command or PL/SQL block run. OFF = suppress timing statistics Display trailing blanks at the end of each line. ON = remove blanks, improving performance OFF = display blanks. This does not affect spooled output. SQL*Plus ignores TRIMOUT ON unless we set TAB ON. Allows trailing blanks at the end of each spooled line.
TRIMS[POOL] {ON|OFF}
This does not affect terminal output. UND[ERLINE] {-|c|ON|OFF} VER[IFY] {OFF|ON} Set the char used to underline column headings to c. ON = list the text of a command before and after replacing substitution variables with values. OFF = dont display the command. Controls whether to truncate or wrap the display of long lines. OFF = truncate ON = wrap to the next line The COLUMN command (WRAPPED and TRUNCATED clause) can override this for specific columns.
WRA[P] {OFF|ON}
7. Middleware
Application Server 10g Application Server MapViewer Application Server Adapters Application Server Containers for J2EE Application Server Integration Application Server Personalization Application Server Web Cache Business Intelligence Standard Edition Collaboration Suite 10g Content Management SDK Content Services Forms & Reports Services HTTP Server Identity Management Portal Real-Time Collaboration Sensor Edge Server
1.1. Embedded
TimesTen In-Memory Database Berkeley DB Database Lite
2. Search
Secure Enterprise Search
3. Enterprise Management
Enterprise Manager 10g Grid Control Enterprise Manager Grid Control Plugins
4. Tape Backup
Secure Backup
5. Migration Tools
Database Migration Verifier Oracle-on-Linux VMware Tool Kits JDeveloper App Migration Assistant Migration Tool Kits Migration Workbench
6. Archived Products
Oracle9i Database Oracle9i Lite Oracle9i Personal Oracle9i Internet Directory Oracle9i Unified Messaging Oracle9iAS Wireless Discoverer Desktop Software Configuration Manager