Any disadvantages to bit flags in database columns?

野趣味 2020-12-30 07:27

Consider the following tables:

CREATE TABLE user_roles(
    pkey         SERIAL PRIMARY KEY,
    bit_id       BIGINT NOT NULL,
    name         VARCHAR(256)         


        
4 Answers
  • 2020-12-30 07:55

    Adding to previous answers for SQL Server's implementation, you wouldn't save any space by having a single bitfield integer vs a pile of BIT NOT NULL columns:

    The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or less bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.

    bit at docs.microsoft.com

    As JNK mentioned, partial comparisons on a bitfield integer would not be SARGable, so an index on a bitfield integer would be useless unless comparing the entire value at once.
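
    To illustrate the difference, here is a minimal sketch that assumes a hypothetical `role_bits INT` column on `news` in place of separate flag columns. The column name and the bit values are made up for illustration:

    ```sql
    -- Hypothetical bitfield: 1 = public, 2 = restricted, 4 = confidential, 8 = secret.

    -- Partial comparison: the bitwise expression wraps the column, so an
    -- ordinary index on role_bits cannot be used for a seek.
    SELECT pkey, title FROM news WHERE (role_bits & 4) <> 0;

    -- Whole-value comparison: a plain equality on the column, which an
    -- index on role_bits can seek on.
    SELECT pkey, title FROM news WHERE role_bits = 6;
    ```

    The second query only helps when you know the exact combination of bits you are looking for, which is rarely the case for role checks.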

    On-disk indexes on SQL Server are based on sorting, so getting to the rows that have any particular bit set in isolation would require a separate index for each bit column. If you are only looking for 1s, one way to save space is to make them filtered indexes that only store the 1 values (rows with a zero value get no index entry at all).

    CREATE TABLE news(
        pkey          INT IDENTITY PRIMARY KEY,
        title         VARCHAR(256),
        company_fk    INTEGER REFERENCES companies(pkey), -- updated since asking the question
        body          VARCHAR(512),
        public_role       BIT NOT NULL DEFAULT 0,
        restricted_role   BIT NOT NULL DEFAULT 0,
        confidential_role BIT NOT NULL DEFAULT 0,
        secret_role       BIT NOT NULL DEFAULT 0
    );
    
    CREATE UNIQUE INDEX ByPublicRole ON news(public_role, pkey) WHERE public_role=1;
    CREATE UNIQUE INDEX ByRestrictedRole ON news(restricted_role, pkey) WHERE restricted_role=1;
    CREATE UNIQUE INDEX ByConfidentialRole ON news(confidential_role, pkey) WHERE confidential_role=1;
    CREATE UNIQUE INDEX BySecretRole ON news(secret_role, pkey) WHERE secret_role=1;
    
    SELECT * FROM news WHERE company_fk=2 AND (restricted_role=1 OR confidential_role=1);
    SELECT * FROM news WHERE company_fk=2 AND restricted_role=1 AND confidential_role=1;
    

    Both of those queries produce a nice plan with the random test data I produced.

    As always, indexes should be based on actual query usage and balanced against maintenance cost.

  • 2020-12-30 07:56

    There is at least one huge disadvantage here...

    These conditions are non-SARGable!

    This is a big one and for me would be a dealbreaker. The bitwise evaluations you need to perform are (to my knowledge) not indexable in any database - the engine needs to check every row to perform the evaluation, which means terrible performance.

  • 2020-12-30 07:57

    Disadvantages: Hard to write data, hard to read data, hard to debug, but especially: slow queries because the database cannot use indexes on a query like this.

    Advantages: you save a few bytes. Compared to a BIT field, you may save a few MB on a table with a million records. Hardly worth it. :)

  • 2020-12-30 07:59

    If you only have a handful of roles, you don't even save any storage space in PostgreSQL. An integer column uses 4 bytes, a bigint 8 bytes. Both may require alignment padding:

    • Making sense of Postgres row sizes
    • Calculating and saving space in PostgreSQL

    A boolean column uses 1 byte. Effectively, you can fit four or more boolean columns for one integer column, eight or more for a bigint.

    Also take into account that NULL values only use one bit (simplified) in the NULL bitmap.
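
    If you want to check the per-value sizes yourself, `pg_column_size()` reports the bytes used to store a value. A quick sketch (it shows the raw value size only, not alignment padding or the NULL bitmap):

    ```sql
    -- Compare the storage size of a single value of each type.
    SELECT pg_column_size(1::integer) AS integer_bytes,
           pg_column_size(1::bigint)  AS bigint_bytes,
           pg_column_size(true)       AS boolean_bytes;
    ```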

    Individual columns are easier to read and index. Others have commented on that already.

    You could still utilize indexes on expressions or partial indexes to circumvent problems with indexes ("non-sargable"). Generalized statements like:

    database cannot use indexes on a query like this

    or

    These conditions are non-SARGable!

    are not entirely true - maybe for some other RDBMSs lacking these features.
    But why circumvent when you can avoid the problem altogether?
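
    For reference, the circumvention could look roughly like this in PostgreSQL. This is only a sketch: it assumes a hypothetical `role_bits` integer column on `news` in which bit value 8 marks "secret", neither of which comes from the question:

    ```sql
    -- Partial index: only rows with the (hypothetical) secret bit set get an entry.
    CREATE INDEX news_secret_idx ON news (pkey) WHERE (role_bits & 8) <> 0;

    -- The planner can consider the partial index when the query
    -- repeats the index predicate.
    SELECT * FROM news WHERE (role_bits & 8) <> 0;

    -- Expression index: indexes the computed value instead of the raw column.
    CREATE INDEX news_secret_expr_idx ON news ((role_bits & 8));

    SELECT * FROM news WHERE (role_bits & 8) = 8;
    ```

    Note that you end up with one such index per bit you want to search on, which is no simpler than indexing individual boolean columns.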

    As you have clarified, we are talking about 6 distinct types (maybe more). Go with individual boolean columns. You'll probably even save space compared to one bigint. Space requirement seems immaterial in this case.


    If these flags were mutually exclusive, you could use one column of type enum or a small look-up table and a foreign key referencing it. (Ruled out in question update.)
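
    A minimal sketch of the look-up table variant, for the mutually exclusive case only. The table names `access_level` and `news_item` and the level names are hypothetical, chosen just for illustration:

    ```sql
    -- Hypothetical look-up table: one row per mutually exclusive level.
    CREATE TABLE access_level (
        access_level_id  smallint PRIMARY KEY,
        name             text NOT NULL UNIQUE
    );

    INSERT INTO access_level (access_level_id, name) VALUES
        (1, 'public'), (2, 'restricted'), (3, 'confidential'), (4, 'secret');

    -- Each row references exactly one level through a foreign key.
    CREATE TABLE news_item (
        pkey             serial PRIMARY KEY,
        title            varchar(256),
        access_level_id  smallint NOT NULL REFERENCES access_level
    );
    ```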
