PostgreSQL create a new column with values conditioned on other columns

时光取名叫无心 2021-01-04 11:22

I use PostgreSQL 9.1.2 and I have a basic table as below, where I have the Survival status of an entry as a boolean (Survival) and also in number of days

2 Answers
  • 2021-01-04 12:09

    Honestly, I think you are better off not storing data in the db that is quickly and easily calculated from stored data. A better option is to simulate a calculated field (gotchas noted below). In this case you would write (changing spaces etc. to underscores for easier maintenance):

    CREATE FUNCTION one_year_survival(mytable)
    RETURNS BOOL
    IMMUTABLE
    LANGUAGE SQL AS $$
    select $1.survival OR $1.survival_days >= 365;
    $$;
    

    then you can actually:

    SELECT *, m.one_year_survival from mytable m;
    

    and it will "just work." Note the following gotchas:

    • mytable.one_year_survival will not be returned by the default column list (SELECT *), and
    • you cannot omit the table identifier (m in the above example) because the parser converts this into one_year_survival(m).
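
    To make the two gotchas concrete, here is a minimal sketch. The table definition and sample rows are assumptions for illustration (the question's actual table is not shown); only the survival and survival_days columns come from the function above.

    ```sql
    -- Hypothetical table; only the two survival columns are taken from the answer.
    CREATE TABLE mytable (
        id            serial PRIMARY KEY,
        survival      boolean NOT NULL,
        survival_days integer NOT NULL
    );

    INSERT INTO mytable (survival, survival_days)
    VALUES (true, 120), (false, 400), (false, 90);

    -- The function must be created after the table,
    -- because its argument type is the table's row type.
    CREATE OR REPLACE FUNCTION one_year_survival(mytable)
    RETURNS bool
    IMMUTABLE
    LANGUAGE sql AS $$
    SELECT $1.survival OR $1.survival_days >= 365;
    $$;

    -- Gotcha 1: SELECT * returns only id, survival, survival_days.
    SELECT * FROM mytable;

    -- Attribute notation and function notation are equivalent here.
    SELECT m.id, m.one_year_survival, one_year_survival(m) FROM mytable m;

    -- Gotcha 2: without the table identifier there is no row to pass to the
    -- function, so the following fails because no such column exists:
    -- SELECT one_year_survival FROM mytable;
    ```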

    However, the benefit is that the value can be proven never to get out of sync with the other values. Otherwise you end up with a rat's nest of check constraints.

    You can actually take this approach quite far. See http://ledgersmbdev.blogspot.com/2012/08/postgresql-or-modelling-part-2-intro-to.html

  • 2021-01-04 12:16

    The one-time operation can be achieved with a plain UPDATE:

    UPDATE tbl
    SET    one_year_survival = (survival OR survival_days >= 365);
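
    The UPDATE presupposes that the target column already exists. A minimal sketch of the whole one-time operation, assuming the table is named tbl as above and that a nullable boolean column is what you want:

    ```sql
    BEGIN;

    -- Add the new (initially NULL) column; the boolean type is an assumption.
    ALTER TABLE tbl ADD COLUMN one_year_survival boolean;

    -- Fill it from the existing columns in a single pass.
    UPDATE tbl
    SET    one_year_survival = (survival OR survival_days >= 365);

    COMMIT;
    ```

    Because DDL is transactional in PostgreSQL, wrapping both statements in one transaction means other sessions never see the column half-filled.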
    

    I would advise against using camel case, white space, and parentheses in your names. While allowed between double quotes, they often lead to complications and confusion. Consider the chapter about identifiers and key words in the manual.
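
    As a sketch of what such a cleanup could look like: the quoted names below are hypothetical stand-ins, since the question's real column names are not shown here, so substitute your own.

    ```sql
    -- Hypothetical original names; replace them with the actual quoted identifiers.
    ALTER TABLE tbl RENAME COLUMN "Survival" TO survival;
    ALTER TABLE tbl RENAME COLUMN "Survival (days)" TO survival_days;

    -- After renaming, no double quotes are needed and case no longer matters.
    SELECT survival, survival_days FROM tbl;
    ```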

    Are you aware that you can export the results of a query as CSV with COPY?
    Example:

    COPY (SELECT *, (survival OR survival_days >= 365) AS one_year_survival FROM tbl)
    TO '/path/to/file.csv' WITH (FORMAT csv);
    

    You wouldn't need the redundant column this way to begin with.
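
    Note that COPY to a file writes on the database server and needs the corresponding privileges there. If you only have client access, psql's \copy meta-command does the same thing on the client side. A sketch, with an assumed output file name:

    ```sql
    -- psql meta-command, not plain SQL: it must be written on a single line
    -- and is not terminated by a semicolon. The file name is an assumption.
    \copy (SELECT *, (survival OR survival_days >= 365) AS one_year_survival FROM tbl) TO 'one_year_survival.csv' CSV HEADER
    ```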


    Additional answer to comment

    To avoid empty updates (rewriting rows whose values would not change), add a matching WHERE clause:

    UPDATE tbl
    SET    "Dead after 1-yr" = (dead AND my_survival_col < 365)
          ,"Dead after 2-yrs" = (dead AND my_survival_col < 730)
    ....
    WHERE  "Dead after 1-yr" IS DISTINCT FROM (dead AND my_survival_col < 365)
       OR  "Dead after 2-yrs" IS DISTINCT FROM (dead AND my_survival_col < 730)
    ...
    

    Personally, I would only add such redundant columns if I had a compelling reason. Normally I wouldn't. If it's about performance: are you aware of indexes on expressions and partial indexes?
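
    For illustration, a sketch of both index types, assuming the tbl, survival and survival_days names used earlier in this answer; the index names are made up, and whether either index pays off depends on your data and queries.

    ```sql
    -- Index on an expression: note the extra pair of parentheses around it.
    CREATE INDEX tbl_one_year_survival_idx
        ON tbl ((survival OR survival_days >= 365));

    -- Partial index: covers only entries that died within the first year.
    CREATE INDEX tbl_dead_within_1yr_idx
        ON tbl (survival_days)
        WHERE NOT survival AND survival_days < 365;

    -- A query has to repeat the indexed expression or predicate
    -- for the planner to consider these indexes.
    SELECT * FROM tbl WHERE (survival OR survival_days >= 365);
    SELECT * FROM tbl WHERE NOT survival AND survival_days < 365;
    ```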
