Return elements of Redshift JSON array on separate rows

北荒 2020-12-06 13:46

I have a Redshift table that looks like this:

 id | metadata
---------------------------------------------------------------------------
 1  | [{\"pet\":\"do         


        
2 answers
  • 2020-12-06 13:58

    There is a generic version of the seq_0_to_3 view. Let's call it seq_0_to_n. It can be generated with:

    CREATE VIEW seq_0_to_n AS (
        SELECT row_number() OVER (ORDER BY TRUE)::integer - 1 AS i
        FROM <insert_large_enough_table>
        LIMIT <number_less_than_table_entries>
    );
    

    This helps in generating large sequences as a view.
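
If no sufficiently large table is at hand, the hand-written UNION ALL form of the sequence view can also be generated by a small script. The following is only a sketch in Python: the helper name `seq_view_sql` is hypothetical, and it simply builds the SQL text, which you would then run against Redshift yourself.

```python
def seq_view_sql(n: int) -> str:
    """Build a CREATE VIEW statement that yields the integers 0..n as column i.

    Hypothetical helper: it only produces SQL text and does not talk to Redshift.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # The first SELECT names the column; the remaining ones only add values.
    selects = ["SELECT 0 AS i"] + [f"SELECT {k}" for k in range(1, n + 1)]
    body = " UNION ALL\n    ".join(selects)
    return f"CREATE VIEW seq_0_to_{n} AS\n    {body};"


if __name__ == "__main__":
    print(seq_view_sql(3))
```

Choose `n` to be at least the largest array length minus one, so that every array index is covered by the view.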

  • 2020-12-06 14:06

    Thanks to this inspired blog post, I've been able to craft a solution:

    1. Create a look-up table to effectively 'iterate' over the elements of each array. The number of rows in this table has to be equal to or greater than the maximum number of elements in any array. Let's say this is 4 (it can be calculated using SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table):

      CREATE VIEW seq_0_to_3 AS (
          SELECT 0 AS i UNION ALL
          SELECT 1 UNION ALL
          SELECT 2 UNION ALL
          SELECT 3
      );
      
    2. From this, we can create one row per JSON element:

      WITH exploded_array AS (
          SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
          FROM input_table, seq_0_to_3 AS seq
          WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
      )
      SELECT *
      FROM exploded_array;
      

      Producing:

       id | json
      ------------------------------
       1  | {"pet":"dog"}
       1  | {"country":"uk"}
       2  | {"pet":"cat"}
       4  | {"country":"germany"}
       4  | {"education":"masters"}
       4  | {"country":"belgium"}
      
    3. However, I also needed to extract the field names and values. As I can't see any way to extract JSON field names using Redshift's limited functions, I'll do this using a regular expression:

      WITH exploded_array AS (
          SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
          FROM input_table, seq_0_to_3 AS seq
          WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
      )
      SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field)
      FROM (
          SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\\w+[^"]') AS field
          FROM exploded_array
      ) AS extracted;
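
To check the logic of steps 2 and 3 without a Redshift cluster, the cross join, the length filter, and the field-name regex can be emulated locally. This is a rough Python sketch, not Redshift code: the sample rows are made up to match the output shown in step 2, the function names are hypothetical, and Python's `re` module is only assumed to behave like Redshift's regex engine on these inputs.

```python
import json
import re

# Hypothetical (id, metadata) rows, chosen to reproduce the output in step 2.
INPUT_TABLE = [
    (1, '[{"pet":"dog"},{"country":"uk"}]'),
    (2, '[{"pet":"cat"}]'),
    (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]'),
]
SEQ_0_TO_3 = [0, 1, 2, 3]  # stands in for the seq_0_to_3 view

# Same pattern as in the SQL; the doubled backslash there is SQL string escaping.
FIELD_PATTERN = re.compile(r'[^{"]\w+[^"]')


def explode(rows, seq):
    """Cross join rows with seq, keeping only indexes that fall inside each array."""
    out = []
    for row_id, metadata in rows:
        elements = json.loads(metadata)
        for i in seq:
            if i < len(elements):  # mirrors WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
                out.append((row_id, json.dumps(elements[i], separators=(",", ":"))))
    return out


def extract_fields(exploded):
    """Pull the first field name out of each element with the regex, then look up its value."""
    result = []
    for row_id, element in exploded:
        match = FIELD_PATTERN.search(element)
        field = match.group(0) if match else None
        value = json.loads(element).get(field) if field else None
        result.append((row_id, field, value))
    return result


if __name__ == "__main__":
    for row in extract_fields(explode(INPUT_TABLE, SEQ_0_TO_3)):
        print(row)
```

One thing the emulation makes visible: the pattern needs at least three consecutive word characters, so in this Python version a two-character field name such as `id` is not matched as a field. It is worth testing the regex against your real field names before relying on it.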
      