What Is A Null Value
What Is A Null Value
What Is A Null Value
In databases a common issue is what value or placeholder do you use to represent a missing values. In SQL,
this is solved with null. It is used to signify missing or unknown values. The keyword NULL is used to
indicate these values. NULL really isn’t a specific value as much as it is an indicator. Don’t think of NULL as
similar to zero or blank, it isn’t the same. Zero (0) and blanks “ “, are values.
In most of our beginning lessons we’ve assumed all tables contained data; however, SQL treats missing values
differently. It is important to understand how missing values are used and their effect on queries and
calculations.
In many situations every row and column will contain data, but there cases where it makes sense for some
columns to not contain a value.
Customer
4 Chris Green MI
Consider the above Customer table. Several columns contain missing or unknown values. Reasons why this
can happen include a value is:
missing – Perhaps a customer, such as Sue, doesn’t divulge her age to your customer service representative.
unknown – An employee’s termination date is usually some event in the unforeseen future.
doesn’t apply – If the customer is a business, then Sex doesn’t apply.
You could argue that for text values you could use blanks, such as one space ‘ ‘, or even an empty value, which
is two single quotes ‘’ to represent a missing value. Yet this strategy doesn’t work well for numbers or
dates. If the customer’s age is unknown, what numeric value would you use? Would it make sense to use zero
or a negative number?
I think doing that causes more confusion and would make it very easy to skew results. For instance, if you
were going to calculate the average age of your female customers, the zeros, the ones you were using to
represent missing values, would cause the actual average age to be lower than you would expect.
SQL reserves the NULL keyword to denote an unknown or missing value. In SQL NULL’s are treated
differently than values. Special consideration is required when comparing null value or using them in
expressions.
1
NULL Value in Comparisons:
When it isn’t possible to specially code your data using “N/A” you can use the special keyword NULL to
denote a missing value. NULL is tricky. NULL isn’t a value in the normal sense. For instance no two NULL
are equal to each other. Surprisingly NULL = NULL is FALSE!
SQL covers this dilemma though. You can use the IS NULL and IS NOT NULL comparisons to test for
NULL values.
SELECT ProductID,
Name,
Color
FROM Production.Product
WHERE Color IS NULL
SELECT ProductID,
Name,
Color
FROM Production.Product
WHERE Color IS NOT NULL
NULLS in Expressions
As you can expect nulls have an adverse effect in expressions. Since NULL denotes an undefined value, its
participation in most expressions renders the expression unknown, or NULL, as well.
Check out the following arithmetic expression. The first line evaluates to a number the others to NULL:
20 + (5 * 4) = 20 + 20 = 40
NULL + (5 * 4) = NULL + 20 = NULL
20 + (NULL * 4) = 20 + NULL = NULL
SELECT ProductID,
Color + ' ' +Name
FROM Production.Product
Did you notice you either get value like “Black Chainring” or NULL? You may expect to see just the
product‘s name if the color is missing, but since the color is NULL, the expression is also NULL.
Without help, it would be tough to create expressions, as the NULL’s would “ruin” most results!
In this case you can use the COALESCE function to assist. This function will return the first non-NULL value
from a list of parameters. For example
2
COALESCE(NULL, 'A', 'B')
We can use COALESCE to replace a NULL with another value. In this way we can then continue to build an
expression that will return a result free of NULL value.
SELECT ProductID,
COALESCE(Color + ' ','') + Name
FROM Production.Product
How does the work? I’ve highlighted some important bits in the statement:
If color is the value “Black,” then the COALESCE function will return “Black “ with the training space. The
product name is then appended to form the full value.
If the color is NULL, then COALESCE seeks out the first non-null value. This happens to be the empty string
”. Using this trick, it makes it really easy to return either the combination of two columns, or just the second
column.
Where clauses are used to limit the rows returned from a query. Generally only rows meeting the where clause
are returned. Rows whose where clause evaluates to FALSE are removed from the result.
In similar fashion if the where clause evaluates to NULL, the row is eliminated.
SELECT ProductID,
Name,
Color
FROM Production.Product
WHERE LEN(COLOR) < 100
Will return all rows where a color is specified. LEN is a function that returns the number of characters in a
value; LEN(‘Black’) returns 5. Since this is less than 100 it would be included in the result.
In cases where COLOR is NULL, then LEN(COLOR) returns NULL. Since NULL < 100 is always false, the
row is removed from the result.
Since NULL represents a missing or unknown value, its effect on Boolean expressions is curious. It status as
an unknown value puts a spin on the outcome.
3
I think the results for AND are what you would expect, but the results you would expect for OR are not. In
many cases when a value is unknown, the uncertainty of the value translates through the expression.
The AND operator returns a TRUE only if all conditions are also TRUE. The following truth table shows all
combinations of values the condition (A AND B). Keep in mind we use NULL to denote unknown values.
The one of the inputs being unknown, there is no possibility the expression can be TRUE, as we can’t be
certain both inputs are TRUE. Conversely, we don’t know whether the unknown value is FALSE. This leaves
to concluding the result, in general is unknown.
Boolean OR Operator
The OR operator returns a TRUE when one or more conditions are also TRUE. Here is the Truth table for the
OR operator.
4
NULL TRUE TRUE
You’ll see that in every case one of the conditions is true, so is the end result. This is even the case when the
other input is NULL. All that matters is one input is TRUE, the other input, albeit unknown, is irrelevant.