How to change the col type of a BigQuery repeated record

放肆的年华 提交于 2020-05-15 22:43:47

问题


I'm trying to change a col type of a repeated record from STRING to TIMESTAMP. There are a few suggestions from BQ docs here (manually-changing-schemas). However, I'm running into issues with each of the recommended suggestions.

Here is an example schema:

{
  'name' => 'id',
  'type' => 'STRING',
  'mode' => 'REQUIRED'
},
{
  'name' => 'name',
  'type' => 'STRING',
  'mode' => 'REQUIRED'
},
// many more fields including nested records and repeated records
{
  'name' => 'locations',
  'type' => 'RECORD',
  'mode' => 'REPEATED',
  'fields' => [
    {
      'name' => 'city',
      'type' => 'STRING',
      'mode' => 'REQUIRED'
    },
    {
      'name' => 'updated_at',
      'type' => 'STRING',   // ** want this as TIMESTAMP **
      'mode' => 'REQUIRED'
    },
  ]
}

Issue with using Query:

I think we'd have to UNNEST the repeated record, cast the field to timestamp for each repeated record and then somehow recreate the row for insertion into the new table.

Issue with exporting table as JSON:

When exporting the table in JSON format it exports raw json representations of the data (with maps and dictionaries, as we'd expect it to).

However, we're not able to import that raw data back into BQ:

BigQuery does not support maps or dictionaries in JSON. For example, "product_categories": {"my_product": 40.0} is not valid, but "product_categories": {"column1": "my_product" , "column2": 40.0} is valid.

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#limitations

Any suggestions would be much appreciated!


回答1:


Below answer is based on that:REPEATED RECORD type in BigQuery StandardSQL is represented as type ARRAY<STRUCT<f1 f1_type, f2 f2_type ... >>.

It is not my favorite since you have to specify full column list. Maybe there is a better way.

#standardSQL
-- Build sample data, try to mimic what's in question.
CREATE OR REPLACE TABLE
  <your_dataset>.sample_table AS
SELECT name, 
       array<struct<city string, update_at string>>[("SFO", "2011-1-1"), ("SEA", "2022-2-2")] 
       as locations
FROM UNNEST(['Name1', "Name2", "Name3"]) as name;

Then below SQL will convert update_at column into DATE and save to new table (or the same one if you'd like).

#standardSQL
CREATE OR REPLACE TABLE
  <your_dataset>.output_table AS
SELECT * REPLACE (
   ARRAY(SELECT AS STRUCT * REPLACE(CAST(update_at AS DATE) AS update_at)
         FROM UNNEST(locations)) 
   AS locations 
   )
FROM
  <your_dataset>.sample_table;



来源:https://stackoverflow.com/questions/55485264/how-to-change-the-col-type-of-a-bigquery-repeated-record

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!