In SQL and Relational Theory (C.J. Date, 2009), chapter 4 advocates avoiding duplicate rows and also avoiding NULL attributes in the data we store. While I have
Nulls are a consequence of theory meeting reality and having to be adjusted to be usable. In my opinion, attempting to avoid all null values will ultimately lead to uglier and less maintainable code than just using null where appropriate.
Do not allow a column to be defined as NULL if at all possible. For me, it has nothing to do with the business rule of what you want NULL to mean; it has to do with disk I/O.
In SQL Server, a NULL in a nullable variable-length column, say a varchar(10), takes essentially no data space beyond its bit in the row's null bitmap, whereas a stored value takes up to 10 bytes. (A fixed-length char(10) reserves its 10 bytes whether or not it is NULL.) So how does having a null hurt disk I/O? It hurts when a value is later written into a column where a null used to be. Since SQL Server did not reserve space, there is no room in the row to just put the value, so it has to shift data around to make room. Page splits, fragmentation, updating the RID if this is a heap, etc. all hurt disk I/O.
BTW if there is a gender table we could add another row for "Unable to determine the true sexual origin or state of the individual".
Quite simply, by storing only the known information; in other words, the Closed World Assumption. Aim to be in at least Boyce-Codd / Fifth Normal Form and you won't go far wrong.
I disagree with the author and would claim that NULL is actually the CORRECT way to handle missing data for optional fields. In fact, it's the reason that NULL exists at all...
For your specific problem regarding gender:
NULL could/should be used as long as:
A) You have a business reason. For example, in a table of payments, a NULL payment value would mean it was never paid. A 0.00 payment value would mean we intentionally paid nothing. For medical charts, a NULL value for a blood pressure reading would mean you didn't take a BP; a 0 value would mean the patient is dead. This is a significant distinction, and necessary in certain applications.
B) Your queries account for it. If you understand the effect of NULL on IN, EXISTS, inequality operators (like you specified in OP), etc., then it shouldn't be an issue. If you have NULL now in your tables and don't want the value for certain applications, you can employ views and either COALESCE or ISNULL to populate different values if the source table has a NULL.
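As a rough illustration of points A and B, here is a small sketch using Python's built-in sqlite3 module. SQLite is only standing in for whatever engine you actually use, and the payments table, its columns, the payments_filled view and the sample rows are all invented for the example. It shows that NULL and 0.00 remain distinguishable, that a COALESCE view can hide the NULL from consumers who don't want it, and that a NULL inside a NOT IN list quietly empties the result.

```python
import sqlite3

# Hypothetical schema and data, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (invoice_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO payments (invoice_id, amount) VALUES (?, ?)",
    [(1, 25.0), (2, 0.0), (3, None)],  # None is stored as NULL
)

# A) NULL ("never paid") and 0.00 ("intentionally paid nothing") are different facts.
never_paid = conn.execute(
    "SELECT invoice_id FROM payments WHERE amount IS NULL"
).fetchall()
paid_nothing = conn.execute(
    "SELECT invoice_id FROM payments WHERE amount = 0"
).fetchall()

# COUNT(*) counts rows; COUNT(amount) skips rows where amount is NULL.
counts = conn.execute("SELECT COUNT(*), COUNT(amount) FROM payments").fetchone()

# B) A view that substitutes a default for applications that don't want NULL.
conn.execute(
    "CREATE VIEW payments_filled AS "
    "SELECT invoice_id, COALESCE(amount, 0.0) AS amount FROM payments"
)
filled = conn.execute(
    "SELECT invoice_id, amount FROM payments_filled ORDER BY invoice_id"
).fetchall()

# B) NOT IN against a set containing NULL: no comparison can come out true,
# so no row qualifies, not even invoice 1 with its 25.0.
not_in = conn.execute(
    "SELECT invoice_id FROM payments "
    "WHERE amount NOT IN (SELECT amount FROM payments WHERE invoice_id IN (2, 3))"
).fetchall()

print(never_paid, paid_nothing, counts, filled, not_in)
```

The NOT IN case is the one that tends to bite: the usual workarounds are to filter NULLs out of the subquery or to rewrite it with NOT EXISTS.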
EDIT:
To address OP's questions about "real world" inequalities/equalities using NULL, this is a great example I use sometimes.
You are at a party with 3 other people. You know that one person is named "John" but don't know the others.
Logically, the answer to "How many people are named Joe?" is unknown, or NULL. In SQL, this would be something like:
SELECT name FROM party WHERE name = 'Joe'
You would get no rows since you don't know their names. They may or may not be Joe.
Your inequality would be:
SELECT name FROM party WHERE name <> 'Joe'
You would only get a return value for "John" since John's name is all you know. The other people may or may not be Joe, but you have no way to know.
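If you want to try the party example yourself, the following is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption here, standing in for your own database, and the guest_id column is a hypothetical addition so that the two unnamed guests can exist as rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# guest_id is a made-up key; name is NULL for the guests we don't know.
conn.execute("CREATE TABLE party (guest_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO party (guest_id, name) VALUES (?, ?)",
    [(1, "John"), (2, None), (3, None)],
)

# Equality: nobody is known to be Joe, so no rows come back.
named_joe = conn.execute("SELECT name FROM party WHERE name = 'Joe'").fetchall()

# Inequality: only John is known not to be Joe; the NULL rows are skipped.
not_joe = conn.execute("SELECT name FROM party WHERE name <> 'Joe'").fetchall()

# The guests whose names are unknown have to be asked for explicitly.
unknown = conn.execute("SELECT COUNT(*) FROM party WHERE name IS NULL").fetchone()[0]

print(named_joe, not_joe, unknown)
```

Note that the two queries together return one row out of three: the rows with a NULL name satisfy neither the equality nor the inequality.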
NULLs are required - there's no need to replace them.
The entire definition of NULL is that it's unknown. Simply replacing it with an arbitrary value is doing the same thing, so why?
For the comments below:
Just tried this - neither is true:
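DECLARE @x char(1);
SET @x = NULL;
-- NULL = NULL is UNKNOWN (with the default ANSI_NULLS ON), so this branch is skipped
IF @x = @x
BEGIN
    SELECT 'true';
END
-- NULL <> NULL is UNKNOWN as well, so this branch is skipped too
IF @x <> @x
BEGIN
    SELECT 'false';
END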
I can only take this to mean that, because null is unknown, it can't be said to equal or not equal anything, itself included. Hence neither condition evaluates to true.
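The same experiment can be reproduced outside SQL Server. This is a small sketch using Python's built-in sqlite3 module; SQLite is an assumption on my part, and the :x parameter simply plays the role of @x. It suggests that the comparisons are not so much false as unknown: both come back as NULL, and only IS NULL gives a definite answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
params = {"x": None}  # :x plays the role of @x and is bound to NULL

# Both comparisons yield NULL (Python None) rather than true or false;
# IS NULL is the test that returns a definite 1.
eq, ne, is_null = conn.execute(
    "SELECT :x = :x, :x <> :x, :x IS NULL", params
).fetchone()

# Mirror of the two IF blocks: neither WHEN condition is true,
# so the ELSE branch is taken.
branch = conn.execute(
    "SELECT CASE WHEN :x = :x THEN 'true' "
    "WHEN :x <> :x THEN 'false' ELSE 'neither' END",
    params,
).fetchone()[0]

print(eq, ne, is_null, branch)
```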