How important are lookup tables?

你的背包 2020-12-08 02:51

A lot of the applications I write make use of lookup tables, since that was just the way I was taught (normalization and such). The problem is that the queries I make are of

7 Answers
  • 2020-12-08 03:17

    Where possible (and it is not always . . .), I use this rule of thumb: if I need to hard-code a value into my application (vs. letting it remain a record in the database), and also store that value in my database, then something is amiss with my design. It's not ALWAYS true, but basically, whatever the value in question is, it represents either a piece of DATA or a piece of PROGRAM LOGIC. It is a rare case that it is both.

    NOT that you won't find yourself discovering which one it is halfway into the project. But as the others said above, there can be trade-offs either way. Just as we don't always achieve "perfect" normalization in a database design (for reasons of performance, or simply because you CAN take things too far in pursuit of academic perfection . . .), we may make some conscious choices about where we locate our "look-up" values.

    Personally, though, I try to stand by my rule above. It is either DATA or PROGRAM LOGIC, and rarely both. If it ends up as (or IN) a record in the database, I try to keep it out of the application code (except, of course, to retrieve it from the database . . .). If it is hard-coded in my application, I try to keep it out of my database.

    In cases where I can't observe this rule, I DOCUMENT THE CODE with my reasoning, so that three years later, some poor soul will be able to figure out how it broke, if that happens.

  • 2020-12-08 03:24

    For look-up tables I use a sensible primary key -- usually just a CHAR(1) that makes sense in the domain with an additional Title (VARCHAR) field. This can maintain relationship enforcement while "keeping the SQL simple". The key to remember here is the look-up table does not "contain data". It contains identities. Some other identities might be time-zone names or assigned IOC country codes.

    For instance gender:

    ID Label
    M  Male
    F  Female
    N  Neutral
    
    select * from people where gender = 'M'
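A minimal sketch of the CHAR(1) natural-key approach, using Python's built-in `sqlite3` module so it runs anywhere. Only `people`, `gender = 'M'` and the M/F/N codes come from the answer; the lookup table name `gender`, the sample names and the helper functions are illustrative assumptions. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and it does not check the declared length of `CHAR(1)`.

```python
import sqlite3


def build_people_db() -> sqlite3.Connection:
    """In-memory database with a CHAR(1) natural-key lookup table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE gender (
            id    CHAR(1) PRIMARY KEY,   -- the code itself is the identity
            label VARCHAR(20) NOT NULL
        )""")
    conn.executemany(
        "INSERT INTO gender (id, label) VALUES (?, ?)",
        [("M", "Male"), ("F", "Female"), ("N", "Neutral")],
    )
    conn.execute("""
        CREATE TABLE people (
            name   VARCHAR(50) NOT NULL,
            gender CHAR(1) NOT NULL REFERENCES gender(id)
        )""")
    conn.executemany(
        "INSERT INTO people (name, gender) VALUES (?, ?)",
        [("Alice", "F"), ("Bob", "M"), ("Carol", "F")],
    )
    return conn


def rejects_unknown_code(conn: sqlite3.Connection, code: str) -> bool:
    """True if the foreign key refuses a code that is not in the lookup table."""
    try:
        conn.execute("INSERT INTO people (name, gender) VALUES (?, ?)", ("Test", code))
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_people_db()
# The readable code lets the query filter without joining to `gender`.
males = [row[0] for row in conn.execute("SELECT name FROM people WHERE gender = 'M'")]
print(males)                            # ['Bob']
print(rejects_unknown_code(conn, "Z"))  # True
```

The filter reads as clearly as a join on the label would, and the foreign key still rejects codes that are not in the lookup table.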
    

    Alternatively, an ORM could be used and manual SQL generation might never have to be done -- in this case the standard "int" surrogate key approach is fine because something else deals with it :-)

    Happy coding.

  • 2020-12-08 03:27

    Question is: do you need to include the lookup tables (domain tables 'round my neck of the woods) in your queries? Presumably, these sorts of tables are usually

    • pretty static in nature — the domain might get extended, but it probably won't get shortened.
    • their primary key values are pretty unlikely to change as well (e.g., the status_id for a status of 'open' is unlikely to suddenly get changed to something other than what it was created as).

    If the above assumptions are correct, there's no real need to add all those extra tables to your joins just so your where clause can use a friendly name instead of an id value. Just filter on status_id directly where you need to. I'd suspect the non-key attribute in the where clause ('name' in your example above) is more likely to change than the key attribute ('status_id'): you're better protected by referencing the desired key value(s) of the domain table directly in your query.

    Domain tables serve

    • to limit the domain of the variable via a foreign key relationship,
    • to allow the domain to be expanded by adding data to the domain table,
    • to populate UI controls and the like with user-friendly information.

    Naturally, you'd need to pull domain tables into your queries where you actually require the non-key attributes from the domain table (e.g., the descriptive name of the value).
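To illustrate the "more protected" argument, here is a small sketch (Python's standard `sqlite3`, schema assumed) comparing a filter on the key with a join that filters on the name. The `posts`/`status` tables, the post ids and the choice of id 1 for 'open' are hypothetical stand-ins for the question's example.

```python
import sqlite3


def build_status_db() -> sqlite3.Connection:
    """Hypothetical domain table with an integer surrogate key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE status (
            status_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE posts (
            post_id   INTEGER PRIMARY KEY,
            status_id INTEGER NOT NULL REFERENCES status(status_id)
        );
        INSERT INTO status (status_id, name) VALUES (1, 'open'), (2, 'closed');
        INSERT INTO posts (post_id, status_id) VALUES (10, 1), (11, 2), (12, 1);
    """)
    return conn


def open_posts_by_key(conn: sqlite3.Connection) -> list:
    # Filter on the stable key value: no join needed.
    return [r[0] for r in conn.execute(
        "SELECT post_id FROM posts WHERE status_id = 1 ORDER BY post_id")]


def open_posts_by_name(conn: sqlite3.Connection) -> list:
    # Join only so that the WHERE clause can use the friendly name.
    return [r[0] for r in conn.execute(
        "SELECT p.post_id FROM posts p JOIN status s ON s.status_id = p.status_id "
        "WHERE s.name = 'open' ORDER BY p.post_id")]


conn = build_status_db()
print(open_posts_by_key(conn), open_posts_by_name(conn))  # [10, 12] [10, 12]

# Relabel the non-key attribute: the name-based query silently stops matching.
conn.execute("UPDATE status SET name = 'active' WHERE status_id = 1")
print(open_posts_by_key(conn), open_posts_by_name(conn))  # [10, 12] []
```

After the relabel, the key-based query still returns both open posts while the name-based one returns nothing and raises no error. A join remains the right tool when the name is actually needed in the output.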

    YMMV: a lot depends on context and the nature of the problem space.

  • 2020-12-08 03:28

    The answer depends a little on whether you are limited to small filing systems in MyNonSQL, or are thinking about SQL and large databases.

    In real Databases, where there are many apps using one database, and many users using different report tools (not just the apps) to access the data, standards, normalisation, and open architecture requirements are important.

    Despite the people who attempt to change the definition of "normalisation", etc. to suit the purpose, Normalisation has not changed.

    • if you have "Open" and "Closed" repeated in data tables, that is a simple Normalisation error. If you change those values, you may have to update millions of rows, which is a very limited design. Such values are commonly normalised into a Reference or Lookup table. It also saves space: the values "Open", "Closed", etc. are no longer duplicated.

    • the second point is ease of change: if "Closed" were changed to "Expired", only one row needs to be changed, and that is reflected in the entire database; whereas in the unnormalised files, millions of rows need to be changed.

    • Adding new values is simply a matter of inserting one row.

    • in Open Architecture terms, the Lookup table is an ordinary table. It exists in the (standard SQL) catalogue, so any report tool can find it; and as long as the PK::FK relation is defined, the report tool can find that as well.

    • Enum is only for the non-SQLs. In SQL, the Enum is a Lookup table.

    • The next point relates to the meaningfulness of the key. If the Key is meaningless to the user, fine, use an INT or TINYINT or whatever is suitable; number them incrementally; allow "gaps". But if the Key is meaningful to the user, do not use a meaningless number, do use the meaningful key. "M" and "F" for Male and Female, etc.

      • Now some people will go off on tangents about the permanence of PKs. That is a separate point. Yes, of course, always use a stable value for a PK. "M" and "F" are unlikely to change; if you have used {0,1,2,4,5,6}, well, don't change it; why would you want to? Those values were supposed to be meaningless; only a meaningful Key would ever need to change.
    • if you do use meaningful keys, use short alphabetic codes that both users and developers can readily understand (and infer the long description from).

    • Since PKs are stable, particularly in Lookup tables, you can safely code:

      WHERE status_id = 'O'

      You do not have to join with the Lookup table and examine the Value "Open". That loses the value of the Lookup table in the code segments.
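The normalisation points above can be sketched with Python's `sqlite3` (a minimal sketch: the single-letter codes 'O' and 'C' and the `posts`/`status` names follow the answer, while the 1,000 generated posts, the 'P'/'Pending' value and the `rename_status` helper are assumptions for illustration). Renaming a status touches one row in the lookup table, adding a value is one insert, and application code can filter on the stable code without a join.

```python
import sqlite3


def build_posts_db(n_posts: int = 1000) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE status (
            status_id CHAR(1) PRIMARY KEY,   -- short, meaningful, stable code
            name      VARCHAR(20) NOT NULL
        );
        CREATE TABLE posts (
            post_id   INTEGER PRIMARY KEY,
            status_id CHAR(1) NOT NULL REFERENCES status(status_id)
        );
        INSERT INTO status (status_id, name) VALUES ('O', 'Open'), ('C', 'Closed');
    """)
    # Posts store only the code; each word is stored once, in `status`.
    conn.executemany(
        "INSERT INTO posts (status_id) VALUES (?)",
        [("O",) if i % 2 else ("C",) for i in range(n_posts)],
    )
    return conn


def rename_status(conn: sqlite3.Connection, code: str, new_name: str) -> int:
    """Relabel one status and return how many rows the UPDATE touched."""
    cur = conn.execute("UPDATE status SET name = ? WHERE status_id = ?", (new_name, code))
    return cur.rowcount


conn = build_posts_db()
print(rename_status(conn, "C", "Expired"))  # 1, although 500 posts are 'C'
# Adding a new value is a single insert into the lookup table.
conn.execute("INSERT INTO status (status_id, name) VALUES ('P', 'Pending')")
# Application code can filter on the stable code without joining.
print(conn.execute("SELECT COUNT(*) FROM posts WHERE status_id = 'O'").fetchone()[0])  # 500
# A join picks up the new label everywhere it is displayed.
print(conn.execute(
    "SELECT s.name FROM posts p JOIN status s ON s.status_id = p.status_id "
    "WHERE p.status_id = 'C' LIMIT 1").fetchone()[0])  # Expired
```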

    SQL is a cumbersome language, especially when it comes to joins. But that is all we have, so we need to just accept the encumbrance and deal with it. Your example code is fine. But simpler forms can do the same thing. A report tool would generate:

    SELECT  p.*,
             s.name
        FROM posts p, 
             status s
        WHERE p.status_id = s.status_id 
        AND   p.status_id = 'O'

    • For banking systems, where we use short codes which are meaningful (since they are meaningful, we do not change them with the seasons, we just add to them), given a Lookup table such as (carefully chosen, similar to ISO Country Codes):

      Eq   Equity
      EqCS Equity/Common Share
      O    Over The Counter
      OF   OTC/Future

      Code such as this is common:

      WHERE InstrumentTypeCode LIKE 'Eq%'

    And the users would choose the value from a drop-down that displayed "Open", "Closed", etc., not {0,1,2,4,5,6}, not {M, F, U}. Both in the apps, and in the report tool. Without a lookup table, you can't do that.
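A sketch of the prefix-code idea, again with Python's `sqlite3` and the four codes from the table above. The table name `instrument_type`, the `Description` column and the helper functions are hypothetical. Two caveats: SQLite's `LIKE` is case-insensitive for ASCII characters by default, and a prefix that itself contains `%` or `_` would act as a wildcard.

```python
import sqlite3


def build_instrument_types() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE instrument_type (
            InstrumentTypeCode VARCHAR(4) PRIMARY KEY,
            Description        VARCHAR(40) NOT NULL
        )""")
    conn.executemany("INSERT INTO instrument_type VALUES (?, ?)", [
        ("Eq", "Equity"),
        ("EqCS", "Equity/Common Share"),
        ("O", "Over The Counter"),
        ("OF", "OTC/Future"),
    ])
    return conn


def codes_under(conn: sqlite3.Connection, prefix: str) -> list:
    """All codes in the family whose code starts with `prefix`."""
    return [r[0] for r in conn.execute(
        "SELECT InstrumentTypeCode FROM instrument_type "
        "WHERE InstrumentTypeCode LIKE ? || '%' ORDER BY InstrumentTypeCode",
        (prefix,))]


def dropdown_options(conn: sqlite3.Connection) -> list:
    """(code, description) pairs a drop-down could display."""
    return list(conn.execute(
        "SELECT InstrumentTypeCode, Description FROM instrument_type ORDER BY Description"))


conn = build_instrument_types()
print(codes_under(conn, "Eq"))  # ['Eq', 'EqCS']
print(codes_under(conn, "O"))   # ['O', 'OF']
```

The same table serves both purposes: the codes drive the filter, and the descriptions feed the user-facing list.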

    Last, if the database was large and supported BI, DSS, or OLAP functions (the highly Normalised databases do), then the Lookup table is actually a Dimension or Vector in Dimension-Fact analyses. If it were not there, it would have to be added in, to satisfy the requirements of that software, before such analyses could be mounted.

  • 2020-12-08 03:29

    The answer is "whatever makes sense".

    Lookup tables involve joins or subqueries, which are not always efficient. I make use of enums a lot to do this job. It's efficient and fast.
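What "enum" means depends on the product; MySQL, for instance, has an `ENUM` column type. SQLite has none, so the sketch below (Python's `sqlite3`, hypothetical schema and helper names) approximates one with a `CHECK` constraint. It shows the trade-off being discussed: filtering needs no join, but the allowed values live in the table definition, and the label is stored in every row.

```python
import sqlite3


def build_enum_style_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The allowed values are part of the column definition, not a lookup table,
    # so changing the set means changing the table definition itself.
    conn.execute("""
        CREATE TABLE posts (
            post_id INTEGER PRIMARY KEY,
            status  TEXT NOT NULL CHECK (status IN ('Open', 'Closed'))
        )""")
    conn.executemany("INSERT INTO posts (status) VALUES (?)",
                     [("Open",), ("Closed",), ("Open",)])
    return conn


def accepts(conn: sqlite3.Connection, status: str) -> bool:
    """True if the CHECK constraint lets `status` be inserted."""
    try:
        conn.execute("INSERT INTO posts (status) VALUES (?)", (status,))
    except sqlite3.IntegrityError:
        return False
    return True


conn = build_enum_style_db()
print(conn.execute("SELECT COUNT(*) FROM posts WHERE status = 'Open'").fetchone()[0])  # 2
print(accepts(conn, "Expired"))  # False
```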

  • 2020-12-08 03:32

    Create a function for each lookup. There is no easy way: you want both performance and query simplicity. Make sure the functions are kept in step with the lookup tables. You could create a stored procedure such as SP_TestAppEnums that compares the existing lookup values against each function and looks for values that are out of sync (for a name it doesn't know, the function below returns NULL).

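    -- Maps a status name to its lookup-table id so that queries can filter
    -- on the id without joining. Returns NULL for an unknown name,
    -- because the CASE has no ELSE branch.
    CREATE FUNCTION dbo.Enum_Post(@postname varchar(10))
    RETURNS int
    AS
    BEGIN
        DECLARE @postId int
        SET @postId =
            CASE @postname
                WHEN 'Open' THEN 1
                WHEN 'Closed' THEN 2
            END

        RETURN @postId
    END
    GO
    
    /* Calling the function */
    SELECT dbo.Enum_Post('Open')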
    SELECT dbo.Enum_Post('Closed')
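A rough application-side version of the SP_TestAppEnums idea, sketched in Python with `sqlite3` rather than as a T-SQL stored procedure. The `ENUM_POST` dictionary mirrors the `CASE` in `Enum_Post`; the `status` table, its columns and the helper names are assumptions. It reports lookup rows that the mapping disagrees with, and mapping entries missing from the table.

```python
import sqlite3

# Hypothetical application-side copy of the mapping, mirroring the CASE in Enum_Post.
ENUM_POST = {"Open": 1, "Closed": 2}


def enum_post(name: str):
    """Like the T-SQL CASE without ELSE: None (NULL) for an unknown name."""
    return ENUM_POST.get(name)


def build_status_table(rows) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE status (status_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO status (status_id, name) VALUES (?, ?)", rows)
    return conn


def out_of_sync(conn: sqlite3.Connection) -> list:
    """List disagreements between the lookup table and ENUM_POST."""
    problems = []
    table_names = set()
    for status_id, name in conn.execute("SELECT status_id, name FROM status ORDER BY status_id"):
        table_names.add(name)
        if enum_post(name) != status_id:
            # Row exists in the table, but the mapping lacks it or uses another id.
            problems.append((status_id, name, enum_post(name)))
    for name, status_id in ENUM_POST.items():
        if name not in table_names:
            # The mapping still hands out an id for a name the table no longer has.
            problems.append((status_id, name, "missing from table"))
    return problems


conn = build_status_table([(1, "Open"), (2, "Closed")])
print(out_of_sync(conn))  # []
conn.execute("INSERT INTO status (status_id, name) VALUES (3, 'Pending')")
print(out_of_sync(conn))  # [(3, 'Pending', None)]
```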
    