How do I create a dates table in Redshift?

我寻月下人不归 2020-12-12 06:56

tl;dr: I want to generate a dates table in Redshift to make a report easier to produce. Preferably without needing large tables already in Redshift, needing to up

3 Answers
  • 2020-12-12 07:22

    As a workaround, you can spin up a Postgres instance on your local machine, run the code there, export the result to CSV, then run only the CREATE TABLE portion in Redshift and load the data from the CSV. Since this is a one-time operation, that is acceptable; it is what I actually do for new Redshift deployments.
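    If a local Postgres instance is not at hand, the same CSV could be produced with a short script instead. The following is only a sketch of that alternative, not what this answer does: it assumes the column order of the facts.dates table shown elsewhere in this thread, and the names date_row and dates_csv are hypothetical helpers.

```python
import csv
import io
from datetime import date, timedelta

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")
# Indexed by date.weekday(), where 0 = Monday.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
             "Saturday", "Sunday")


def date_row(seq, d):
    """Build one row in the assumed column order of facts.dates."""
    return [
        seq + 1,                                   # date_id
        d.isoformat(),                             # full_date
        d.year,                                    # year_number
        d.isocalendar()[1],                        # year_week_number (ISO week)
        d.timetuple().tm_yday,                     # year_day_number
        (d.month - 1) // 3 + 1,                    # qtr_number
        d.month,                                   # month_number
        MONTH_NAMES[d.month - 1],                  # month_name
        d.day,                                     # month_day_number
        d.isoweekday() % 7 + 1,                    # week_day_number, 1 = Sunday
        DAY_NAMES[d.weekday()],                    # day_name
        1 if d.weekday() < 5 else 0,               # day_is_weekday
        1 if (d + timedelta(days=1)).month != d.month else 0,  # day_is_last_of_month
    ]


def dates_csv(start, days):
    """Return CSV text with one row per day, beginning at `start`."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for seq in range(days):
        writer.writerow(date_row(seq, start + timedelta(days=seq)))
    return buf.getvalue()
```

    Writing dates_csv(date(2000, 1, 1), 81 * 365 + 21) to a file would cover 2000 through 2080; loading that file into Redshift is left to whatever bulk-load path you already use.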

  • 2020-12-12 07:29

    In asking the question, I figured it out. Oops.

    I started with a "facts" schema.

    CREATE SCHEMA facts;
    

    Run the following to start a numbers table:

    create table facts.numbers
    (
      number int PRIMARY KEY
    )
    ;
    

    Use this to generate your number list. I used a million to get started:

    SELECT ',(' || generate_series(0,1000000,1) || ')'
    ;
    

    Then copy-paste the numbers from your results in the query below, after VALUES:

    INSERT INTO facts.numbers
    VALUES
     (0)
    ,(1)
    ,(2)
    ,(3)
    ,(4)
    ,(5)
    ,(6)
    ,(7)
    ,(8)
    ,(9)
    -- etc
    

    ^ Make sure to remove the leading comma from the first row of the copy-pasted list of numbers, and to terminate the statement with a semicolon.
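    The copy-paste step could also be scripted so that the leading comma never appears. This is a minimal sketch under that assumption; numbers_insert_sql is a hypothetical helper, and the table name defaults to the facts.numbers table created above.

```python
def numbers_insert_sql(start, stop, table="facts.numbers"):
    """Return an INSERT statement covering start..stop inclusive.

    Rows are joined with a newline-plus-comma separator, so the first row
    has no leading comma and nothing needs to be trimmed by hand.
    """
    rows = "\n,".join(f"({n})" for n in range(start, stop + 1))
    return f"INSERT INTO {table}\nVALUES\n {rows}\n;"


if __name__ == "__main__":
    # Print a small statement to inspect the shape before generating a large one.
    print(numbers_insert_sql(0, 9))
```

    For a million rows the resulting statement is large, so it may be worth generating it in several smaller batches.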

    Once you have a numbers table, you can generate a dates table (the code is adapted from elliot.land: http://elliot.land/post/building-a-date-dimension-table-in-redshift ):

    CREATE TABLE facts.dates (
      "date_id"              INTEGER                     NOT NULL PRIMARY KEY,
    
      -- DATE
      "full_date"            DATE                        NOT NULL,
    
      -- YEAR
      "year_number"          SMALLINT                    NOT NULL,
      "year_week_number"     SMALLINT                    NOT NULL,
      "year_day_number"      SMALLINT                    NOT NULL,
    
      -- QUARTER
      "qtr_number"           SMALLINT                    NOT NULL,
    
      -- MONTH
      "month_number"         SMALLINT                    NOT NULL,
      "month_name"           CHAR(9)                     NOT NULL,
      "month_day_number"     SMALLINT                    NOT NULL,
    
      -- WEEK
      "week_day_number"      SMALLINT                    NOT NULL,
    
      -- DAY
      "day_name"             CHAR(9)                     NOT NULL,
      "day_is_weekday"       SMALLINT                    NOT NULL,
      "day_is_last_of_month" SMALLINT                    NOT NULL
    ) DISTSTYLE ALL SORTKEY (date_id)
    ;
    
    
    INSERT INTO facts.dates
    (
       "date_id"
      ,"full_date"
      ,"year_number"
      ,"year_week_number"
      ,"year_day_number"
    
      -- QUARTER
      ,"qtr_number"
    
      -- MONTH
      ,"month_number"
      ,"month_name"
      ,"month_day_number"
    
      -- WEEK
      ,"week_day_number"
    
      -- DAY
      ,"day_name"
      ,"day_is_weekday"
      ,"day_is_last_of_month"
    )
      SELECT
        cast(seq + 1 AS INTEGER)                                      AS date_id,
    
        -- DATE
        datum                                                         AS full_date,
    
        -- YEAR
        cast(extract(YEAR FROM datum) AS SMALLINT)                    AS year_number,
        cast(extract(WEEK FROM datum) AS SMALLINT)                    AS year_week_number,
        cast(extract(DOY FROM datum) AS SMALLINT)                     AS year_day_number,
    
        -- QUARTER
        cast(to_char(datum, 'Q') AS SMALLINT)                         AS qtr_number,
    
        -- MONTH
        cast(extract(MONTH FROM datum) AS SMALLINT)                   AS month_number,
        to_char(datum, 'Month')                                       AS month_name,
        cast(extract(DAY FROM datum) AS SMALLINT)                     AS month_day_number,
    
        -- WEEK
        cast(to_char(datum, 'D') AS SMALLINT)                         AS week_day_number,
    
        -- DAY
        to_char(datum, 'Day')                                         AS day_name,
        CASE WHEN to_char(datum, 'D') IN ('1', '7')
          THEN 0
        ELSE 1 END                                                    AS day_is_weekday,
        CASE WHEN
          extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
                            INTERVAL '1' MONTH) :: DATE -
                           INTERVAL '1' DAY) = extract(DAY FROM datum)
          THEN 1
        ELSE 0 END                                                    AS day_is_last_of_month
      FROM
        -- Generate days for 81 years starting from 2000.
        (
          SELECT
            '2000-01-01' :: DATE + number AS datum,
            number                        AS seq
          FROM facts.numbers
          WHERE number between 0 and 81 * 365 + 20
        ) DQ
      ORDER BY 1;
    

    ^ Be sure to adjust the start date ('2000-01-01') and the upper bound in the WHERE clause (81 * 365 + 20) to match the date range you need.
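    To pick the upper bound for a different range, it may help to compute it rather than count leap years by hand. A small sketch, where upper_bound is a hypothetical helper name:

```python
from datetime import date


def upper_bound(start, end):
    """Largest `number` needed so that start + number lands on `end`.

    The WHERE clause is inclusive on both sides, so the range
    0..upper_bound yields every day from `start` through `end`.
    """
    return (end - start).days


if __name__ == "__main__":
    # The range used in the query above: 2000-01-01 through 2080-12-31.
    print(upper_bound(date(2000, 1, 1), date(2080, 12, 31)))
```

    For the range in the query above this gives the same value as 81 * 365 + 20, since 2000 through 2080 contains 21 leap days and the range starts at 0.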

  • 2020-12-12 07:46

    Here is a different suggestion for building the facts.numbers table that does not require manual intervention:

    1. Take a system table (guaranteed to exist) of a known or stable size
    2. Cross join that table to itself enough times to get the desired number of rows
    3. Select the row_number() over (order by 1) to turn those created records into an ascending set of numbers

    Example using the Redshift system table pg_catalog.pg_operator (which as of Oct 2020 has 659 records):

    -- Prep, so that you can copy/paste the code sample
    create schema if not exists facts;   -- Make sure the schema exists
    drop table if exists facts.numbers;  -- Avoid an error if that table already exists;
    create table facts.numbers           -- Create the table definition
    (
      number int primary key
    );
    
    -- The bit you care about
    insert into facts.numbers
        select     row_number() over (order by 1) -- return 1..n in place of the original record
        from       pg_catalog.pg_operator a       -- 659 records
        cross join pg_catalog.pg_operator b       -- to get 659^2=434k records 
        cross join pg_catalog.pg_operator c       -- to get 659^3=286M records
        limit      2000000                        -- to limit the result to a reasonable size
    ;
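    If the system table you pick has a different row count, the number of copies needed in the cross join can be estimated up front. A rough sketch; copies_needed is a hypothetical helper, and the row count is whatever you observe in your own cluster:

```python
def copies_needed(table_rows, target_rows):
    """Return (copies, total_rows) for the smallest self cross join
    of a table with `table_rows` rows that reaches `target_rows`."""
    if table_rows < 2:
        raise ValueError("the source table needs at least 2 rows to grow")
    copies, total = 1, table_rows
    while total < target_rows:
        total *= table_rows  # each extra copy multiplies the row count
        copies += 1
    return copies, total


if __name__ == "__main__":
    # 659 rows (the figure quoted above) and a 2,000,000-row target.
    print(copies_needed(659, 2_000_000))
```

    With 659 rows, two copies fall short of two million, so three copies plus the limit clause are needed, which matches the query above.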
    