In our product we have a generic search engine, and we are trying to optimize the search performance. A lot of the tables used in the queries allow null values. Should we redesign o
Whether to allow nulls, given that they can affect performance, is one of those balancing acts of database design. You have to balance business needs against performance.
Nulls should be used if they are needed. For instance, you may have a begin date and an end date in a table. You often would not know the end date at the time the record is created. Therefore you must allow nulls whether they affect performance or not, because the data is simply not there to be put in. However, if the business rules say the data must be there at the time the record is created, then you should not allow nulls. This would improve performance, make coding a bit simpler, and make sure data integrity is preserved.
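As a minimal sketch of the begin date / end date case, here is how the two columns might be declared. The table and column names (subscription, begin_date, end_date) are made up for illustration, and I'm using Python's built-in sqlite3 module only so the example can be run anywhere; the same idea applies to any engine that supports NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: begin_date is required by the business rules,
# end_date is unknown until the subscription actually ends.
conn.execute("""
    CREATE TABLE subscription (
        id         INTEGER PRIMARY KEY,
        begin_date TEXT NOT NULL,
        end_date   TEXT
    )
""")

# An open-ended record: end_date is legitimately NULL.
conn.execute("INSERT INTO subscription (begin_date) VALUES ('2024-01-01')")

# A record without the required begin_date is rejected by the constraint.
try:
    conn.execute("INSERT INTO subscription (end_date) VALUES ('2024-06-30')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Open-ended records can be found with IS NULL; no placeholder date is needed.
open_ended = conn.execute(
    "SELECT COUNT(*) FROM subscription WHERE end_date IS NULL"
).fetchone()[0]
```

The constraint does the integrity checking for you on the required column, while the nullable column honestly records that the value is not known yet.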
If you have existing data that you would like to change to no longer allow nulls, then you have to consider the impact of that change. First, do you know what value you need to put into the records which are currently null? Second, do you have a lot of code that uses isnull or coalesce which you need to update (these functions slow performance, so if you no longer need to check for nulls, you should change the code)? Do you need a default value? Can you really assign one? If not, will some of the insert or update code break because it does not account for the field no longer allowing nulls?
Sometimes people will put in bad information just to get rid of nulls. So now the price field has to hold both decimal values and things like 'unknown', which means it can't properly be a decimal datatype, and then you have to go to all sorts of lengths in order to do calculations. This often creates performance problems as bad as or worse than the ones the nulls created. Plus you need to go through all your code, and wherever you referenced the field being null or not null, you need to rewrite it to exclude or include rows based on the possible bad values someone will put in because the data is not allowed to be null.
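To illustrate the 'unknown' price problem, here is a rough sketch comparing a nullable numeric column with a text column that holds a placeholder. The table names (item_nullable, item_text) and the sample rows are invented for the example, and it runs on SQLite through Python's sqlite3 module. One caveat: SQLite quietly casts 'unknown' to 0.0, whereas SQL Server would raise a conversion error on the same cast, so treat the numbers as an illustration of the principle rather than of any one engine's behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical design A: numeric price, NULL when the price is not known.
conn.execute("CREATE TABLE item_nullable (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO item_nullable VALUES (?, ?)",
    [("a", 10.0), ("b", 20.0), ("c", None)],
)

# Hypothetical design B: NOT NULL text price with 'unknown' as a placeholder.
conn.execute("CREATE TABLE item_text (name TEXT, price TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO item_text VALUES (?, ?)",
    [("a", "10.00"), ("b", "20.00"), ("c", "unknown")],
)

# Design A: AVG skips NULLs, so no special handling is needed.
avg_nullable = conn.execute(
    "SELECT AVG(price) FROM item_nullable"
).fetchone()[0]

# Design B, naive: every calculation needs a cast, and in SQLite the
# placeholder silently becomes 0.0 and drags the average down.
avg_naive = conn.execute(
    "SELECT AVG(CAST(price AS REAL)) FROM item_text"
).fetchone()[0]

# Design B, guarded: every query has to remember to filter the placeholder.
avg_guarded = conn.execute(
    "SELECT AVG(CAST(price AS REAL)) FROM item_text WHERE price <> 'unknown'"
).fetchone()[0]
```

Here avg_nullable and avg_guarded both come out to 15.0, while avg_naive is 10.0. The nullable column gets the right answer with the simplest query; the placeholder design only gets there if every query remembers the extra filter.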
I do a lot of data imports from client data, and every time we get a file where some field that should allow nulls does not, we get garbage data that needs to be cleaned up before we import it into our system. Email is one of these. Often the data is entered by someone who does not know this value, and since it's generally some type of string data, the user can type anything in there. We go to import emails and find things like "I don't know". Tough to try to actually send an email to "I don't know". If the system requires a valid email address and checks for something like the existence of an @ sign, we would get "I@dont.know". How is garbage data like this useful to the users of the data?
Some of the performance issues with nulls are a result of writing non-sargable queries, that is, queries whose predicates prevent the optimizer from using an index on the column. Sometimes just rearranging the where clause, rather than eliminating a necessary null, can improve the performance.
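Here is a small sketch of what rearranging the where clause can look like. It assumes a made-up customer table with an index on a nullable email column, and it uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module to show the difference. SQLite's planner is not SQL Server's, so the exact plan text will differ on your system, but the principle of not wrapping an indexed column in a function carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with an index on a nullable column.
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
conn.execute("CREATE INDEX idx_customer_email ON customer (email)")
conn.executemany(
    "INSERT INTO customer (name, email) VALUES (?, ?)",
    [("Ann", "ann@example.com"), ("Bob", None), ("Cy", "cy@example.com")],
)


def plan(sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Non-sargable: the column is wrapped in COALESCE, so the index on email
# cannot be used to seek to the matching rows.
wrapped = "SELECT name FROM customer WHERE COALESCE(email, '') = ?"

# Sargable: the bare column is compared directly. NULL never equals a
# non-NULL value, so the COALESCE added nothing for this comparison.
direct = "SELECT name FROM customer WHERE email = ?"

target = ("ann@example.com",)
wrapped_plan = plan(wrapped, target)
direct_plan = plan(direct, target)

# Both forms return the same rows; only the access path differs.
same_rows = (
    conn.execute(wrapped, target).fetchall()
    == conn.execute(direct, target).fetchall()
)
```

On SQLite the wrapped form is reported as a SCAN of the table, while the direct form is reported as a SEARCH using idx_customer_email. The column still allows nulls in both cases; only the predicate changed.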