How can I avoid NULLs in my database, while also representing missing data?

予麋鹿 2020-12-12 14:19

Chapter 4 of SQL and Relational Theory (C.J. Date, 2009) advocates avoiding duplicate rows, and also avoiding NULL attributes in the data we store. While I have

8 Answers
  • 2020-12-12 14:22

    NULLs are a consequence of theory meeting reality and having to be adjusted to be usable. In my opinion, attempting to avoid all NULL values will ultimately lead to uglier and less maintainable code than just using NULL where appropriate.

  • 2020-12-12 14:23

    Do not allow a column to be defined as nullable if at all possible. For me, this has nothing to do with the business rule of what you want NULL to mean; it has to do with disk I/O.

    In SQL Server a nullable column, say a character 10, will take one bit in a bitmap when null and 10 bytes when not nullable. So how does having a NULL hurt disk I/O? It hurts when a value is inserted into a column where a NULL used to be. Since SQL Server did not reserve space, there is no room in the row to just put the value, so SQL Server has to shift data around to make room. Page splits, fragmentation, updating the RID if this is a heap, etc. all hurt disk I/O.

    BTW if there is a gender table we could add another row for "Unable to determine the true sexual origin or state of the individual".

  • 2020-12-12 14:28

    Quite simply, by storing only the known information - in other words, the Closed World Assumption. Aim to be in at least Boyce-Codd / Fifth Normal Form and you won't go far wrong.
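
As a rough illustration of "storing only the known information", the sketch below moves the optional attribute into its own table, so a missing value is represented by a missing row rather than by NULL. It assumes SQLite through Python's `sqlite3` module as a stand-in database; the `person` and `person_gender` tables and the sample names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- One row per person whose gender is actually known.
    -- No row here means "not recorded"; no column is nullable.
    CREATE TABLE person_gender (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        gender    TEXT NOT NULL CHECK (gender IN ('M', 'F'))
    );
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Chris');
    INSERT INTO person_gender VALUES (1, 'F'), (2, 'M');
""")

# People with a recorded gender: a plain inner join, no NULL involved.
known = conn.execute("""
    SELECT p.name, g.gender
    FROM person AS p
    JOIN person_gender AS g ON g.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()

# People without a recorded gender: found by the absence of a row,
# not by an IS NULL test.
unknown = conn.execute("""
    SELECT p.name
    FROM person AS p
    WHERE NOT EXISTS (
        SELECT 1 FROM person_gender AS g WHERE g.person_id = p.person_id
    )
""").fetchall()

print(known)
print(unknown)
```

The trade-off is an extra table and join per optional attribute; note that an outer join between the two tables would bring NULLs back into the query result.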

  • 2020-12-12 14:38

    I disagree with the author and would claim that NULL is actually the CORRECT way to handle missing data for optional fields. In fact, it's the reason that NULL exists at all...

    For your specific problem regarding gender:

    • Are you sure you want a gender table, and to incur the cost of an extra join for every query? For simple enumerated types it's not unreasonable to make the field an int and define 1=male, 2=female, NULL=unknown.
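
A minimal sketch of that int-with-NULL encoding, assuming SQLite through Python's `sqlite3` module; the `person` table, the CHECK constraint, and the sample rows are illustrative additions, not part of the original suggestion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        -- 1 = male, 2 = female, NULL = unknown.
        -- A CHECK constraint passes when its result is NULL, so NULL is allowed.
        gender    INTEGER CHECK (gender IN (1, 2))
    );
    INSERT INTO person VALUES (1, 'Alice', 2), (2, 'Bob', 1), (3, 'Chris', NULL);
""")

# Decode the enumeration without a lookup table; NULL falls through to ELSE.
labels = conn.execute("""
    SELECT name,
           CASE gender WHEN 1 THEN 'male' WHEN 2 THEN 'female' ELSE 'unknown' END
    FROM person
    ORDER BY person_id
""").fetchall()

# Values outside the enumeration are rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO person VALUES (4, 'Dana', 3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(labels)
print(rejected)
```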
  • 2020-12-12 14:39

    NULL could/should be used as long as:

    A) You have a business reason. For example, in a table of payments, a NULL payment value would mean it was never paid. A 0.00 payment value would mean we intentionally paid nothing. For medical charts, a NULL value for a blood pressure reading would mean you didn't take a BP; a 0 value would mean the patient is dead. This is a significant distinction, and necessary in certain applications.

    B) Your queries account for it. If you understand the effect of NULL on IN, EXISTS, inequality operators (like the ones you mentioned in the question), etc., then it shouldn't be an issue. If you have NULLs in your tables now and don't want them for certain applications, you can employ views and either COALESCE or ISNULL to substitute different values where the source table has a NULL.
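
Both points can be tried out in a small sketch. It assumes SQLite through Python's `sqlite3` module (which has COALESCE but not SQL Server's two-argument ISNULL); the `payment` table, the `payment_no_null` view, and the sample amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (
        invoice_id INTEGER PRIMARY KEY,
        amount     REAL   -- NULL = never paid, 0.00 = intentionally paid nothing
    );
    INSERT INTO payment VALUES (1, 25.00), (2, 0.00), (3, NULL);

    -- A view for consumers that do not want to see NULL.
    CREATE VIEW payment_no_null AS
    SELECT invoice_id, COALESCE(amount, 0.00) AS amount
    FROM payment;
""")

# COUNT(*) counts rows; COUNT(amount) skips the NULL amount.
counts = conn.execute("SELECT COUNT(*), COUNT(amount) FROM payment").fetchone()

# NOT IN against a list containing NULL is never true, so no rows come back.
not_in = conn.execute(
    "SELECT invoice_id FROM payment WHERE invoice_id NOT IN (2, NULL)"
).fetchall()

# The view replaces NULL with 0.00 for callers that asked for that.
via_view = conn.execute(
    "SELECT invoice_id, amount FROM payment_no_null ORDER BY invoice_id"
).fetchall()

print(counts)
print(not_in)
print(via_view)
```

Note that the view deliberately erases the "never paid" versus "paid nothing" distinction from point A, so it only suits consumers that do not need it.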

    EDIT:

    To address OP's questions about "real world" inequalities/equalities using NULL, this is a great example I use sometimes.

    You are at a party with 3 other people. You know that one person is named "John" but don't know the others.

    Logically, the answer for "How many people are named Joe" is unknown or NULL. In SQL, this would be something like

    SELECT name FROM party WHERE name = 'Joe'

    You would get no rows since you don't know their names. They may or may not be Joe.

    Your inequality would be:

    SELECT name FROM party WHERE name <> 'Joe'

    You would only get a return value for "John" since John's name is all you know. The other people may or may not be Joe, but you have no way to know.
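
The party example can be reproduced with a short sketch, assuming SQLite through Python's `sqlite3` module; the `guest_id` column is added only to give the rows a key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Three other guests: one known to be John, two whose names are unknown.
    CREATE TABLE party (guest_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO party VALUES (1, 'John'), (2, NULL), (3, NULL);
""")

# name = 'Joe' is false for John and unknown for the other two guests.
equal = conn.execute("SELECT name FROM party WHERE name = 'Joe'").fetchall()

# name <> 'Joe' is true only for John; the unknown names are filtered out.
not_equal = conn.execute("SELECT name FROM party WHERE name <> 'Joe'").fetchall()

# The guests who "may or may not be Joe" have to be asked for explicitly.
unknown_count = conn.execute(
    "SELECT COUNT(*) FROM party WHERE name IS NULL"
).fetchone()[0]

print(equal)
print(not_equal)
print(unknown_count)
```

The two comparison queries together return only one of the three rows, which is the usual surprise: `=` and `<>` are not complementary once NULL is involved.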

  • 2020-12-12 14:40

    NULLs are required - there's no need to replace them

    The entire definition of NULL is that it's unknown - simply replacing this with an arbitrary value is doing the same thing, so why?

    For the comments below:

    Just tried this - neither is true:

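    -- @x holds NULL, so neither comparison below evaluates to true
    DECLARE @x char;
    SET @x = NULL;

    -- Not reached: NULL = NULL is not true
    IF @x = @x
    BEGIN
        SELECT 'true';
    END

    -- Not reached either: NULL <> NULL is not true
    IF @x <> @x
    BEGIN
        SELECT 'false';
    END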
    

    I can only take this to mean that because NULL is unknown, it can't be said that it equals or does not equal anything, even itself - hence neither condition evaluates to true.
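
The same behaviour can be checked outside SQL Server. This is a minimal sketch assuming SQLite through Python's `sqlite3` module, which returns SQL NULL as Python `None`; it is a stand-in for the T-SQL test, not a reproduction of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Evaluate the comparisons directly; sqlite3 maps SQL NULL to Python None.
eq, ne, is_null = conn.execute(
    "SELECT NULL = NULL, NULL <> NULL, NULL IS NULL"
).fetchone()

# A WHERE clause keeps a row only when its condition is true,
# so a condition that evaluates to NULL filters the row out.
rows_eq = conn.execute("SELECT 'true' WHERE NULL = NULL").fetchall()
rows_ne = conn.execute("SELECT 'false' WHERE NULL <> NULL").fetchall()
rows_is = conn.execute("SELECT 'is null' WHERE NULL IS NULL").fetchall()

print(eq, ne, is_null)
print(rows_eq, rows_ne, rows_is)
```

Both comparisons come back as NULL rather than true or false, which is why neither branch fires; `IS NULL` is the test that gives a definite answer.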
