How do I not normalize continuous data (INTS, FLOATS, DATETIME, …)?

£可爱£侵袭症+ 提交于 2019-12-02 07:12:35

The Comments (so far) are discussing the misuse of the term "normalization". I accept that criticism. Is there a term for what is being discussed?

Let me elaborate on my 'claim' with this example... Some DBAs replace a DATE with a surrogate ID; this is likely to cause significant performance issues when a date range is used. Contrast these:

-- single table
SELECT ...
    FROM t
    WHERE x = ...
      AND date BETWEEN ... AND ...;   -- `date` is of datatype DATE/DATETIME/etc

-- extra table
SELECT ...
    FROM t
    JOIN Dates AS d  ON t.date_id = d.date_id
    WHERE t.x = ...
      AND d.date BETWEEN ... AND ...;  -- Range test is now in the other table

Moving the range test to a JOINed table causes the slowdown.

The first query is quite optimizable via

INDEX(x, date)

In the second query, the Optimizer will (for MySQL at least) pick one of the two tables to start with, then do a somewhat tedious back-and-forth to the other table to handle rest of the WHERE. (Other Engines use have other techniques, but there is still a significant cost.)

DATE is one of several datatypes where you are likely to have a "range" test. Hence my proclamations about it applying to any "continuous" datatypes (ints, dates, floats).

Even if you don't have a range test, there may be no performance benefit from the secondary table. I often see a 3-byte DATE being replaced by a 4-byte INT, thereby making the main table larger! A "composite" index almost always will lead to a more efficient query for the single-table approach.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!