Google BigQuery SQL: Prevent column prefix renaming after join

Question

Say you have a table "table_with_100_columns."

And you want to add one more column with a simple join... without changing all of the column names. In other words, you wish to write something like

SELECT a.* as <a's columns without prefix>, additional_field
FROM [table_with_100_columns] a
JOIN [table_with_2_columns] b
ON a.col1 = b.key

You should be able to do this to generate a new table with 101 columns, without having to rename every single column by hand. Right now the only way I know how to do this is as follows:

SELECT
  a.col1 as col1,
  a.col2 as col2,
  a.col3 as col3,
  ...
  a.col100 as col100,
  b.additional_field as additional_field
FROM [table_with_100_columns] a
JOIN [table_with_2_columns] b
ON a.col1 = b.key

Having to write 100 unnecessary lines of code simply to add one more column to a table is unbelievably inefficient, so I'm hoping there is a better way to preserve column names while joining.

UPDATE

It appears this is not yet possible in BigQuery. It is very easy to implement and I suggest the following to the Google BigQuery team:

if no fields share a name in SELECT clause:
  if no subtable reference names given:
    Do not rename fields after JOIN

This will not break any current functionality and adds simple support for a very useful feature.
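To make the proposed rule concrete, here is a minimal Python sketch of the naming decision. It is only an illustration of the suggestion above, not how BigQuery works: the function `output_names` and its input shape are hypothetical, it models only the first condition (no shared field names), and it assumes the `alias_column` prefix format that Legacy SQL produces.

```python
from collections import Counter


def output_names(tables):
    """Decide output column names for a join (hypothetical helper).

    `tables` maps a join alias to the list of columns selected from it.
    Proposed rule: keep the bare names when no column name is selected
    from more than one table; otherwise prefix every column with its alias.
    """
    counts = Counter(col for cols in tables.values() for col in cols)
    has_collision = any(n > 1 for n in counts.values())
    return [
        f"{alias}_{col}" if has_collision else col
        for alias, cols in tables.items()
        for col in cols
    ]


# No shared names: the columns keep their original names.
print(output_names({"a": ["col1", "col2"], "b": ["additional_field"]}))
# "key" is selected from both tables: every column gets its alias prefix.
print(output_names({"a": ["col1", "key"], "b": ["key", "additional_field"]}))
```

Under this rule the query in the question would return unprefixed names, because `additional_field` does not clash with any column of `a`.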

Answer 1:

I think this problem is specific to BigQuery Legacy SQL.
If you use BigQuery Standard SQL, you will not have this issue. See the example below:

#standardSQL
WITH table_with_100_columns AS (
  SELECT 11 AS col1, 21 AS col2, 31 AS col3 UNION ALL 
  SELECT 12 AS col1, 22 AS col2, 32 AS col3 UNION ALL
  SELECT 13 AS col1, 23 AS col2, 33 AS col3 UNION ALL
  SELECT 14 AS col1, 24 AS col2, 34 AS col3 UNION ALL
  SELECT 15 AS col1, 25 AS col2, 35 AS col3   
),
table_with_2_columns AS (
  SELECT 11 AS key, 17 AS additional_field UNION ALL
  SELECT 12 AS key, 27 AS additional_field UNION ALL
  SELECT 13 AS key, 37 AS additional_field UNION ALL
  SELECT 14 AS key, 47 AS additional_field UNION ALL
  SELECT 15 AS key, 57 AS additional_field   
)
SELECT a.*, additional_field
FROM `table_with_100_columns` AS a
JOIN `table_with_2_columns` AS b
ON a.col1 = b.key

See Migrating from legacy SQL if you need to rewrite the rest of the query in Standard SQL.

The output is shown below, with the original column names (without prefixes):

col1    col2    col3    additional_field     
13      23      33      37   
11      21      31      17   
15      25      35      57   
12      22      32      27   
14      24      34      47

Answer 2:

I don’t know of any option available right now other than keeping those 100 unnecessary lines in the code.
So it comes down to how to produce them in the most efficient way for your particular use case.
There can be many ways, I think. The most obvious ones are below; they are more or less trivial, but I list them here for the sake of completeness:

Option 1 – one-off action/need

Just paste the output of the statement below into any spreadsheet, transpose it, and dress it up into the expected SQL (at least the portion between SELECT and FROM in the second query of your question).

SELECT * FROM table_with_100_columns WHERE false

In other words, you do this fairly manually, with whichever office tool you find friendliest for this kind of manipulation.

Option 2 – you need this on a more or less frequent basis or as part of some process

Generate the SQL code in any language/client of your choice by retrieving the schema with the Tables:get API and reading schema.fields[].

After the SQL code is assembled, execute it using the API of your choice.
It can be get or insert, or whatever fits your implementation logic.
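The generation step in Option 2 could look roughly like the Python sketch below. It is an illustration under stated assumptions rather than a finished tool: `build_join_select` is a hypothetical helper, the `fields` list is a hardcoded stand-in for the names you would read from schema.fields[] (no API call is made here), and the table names and join condition are taken from the question.

```python
def build_join_select(fields, extra_fields, left="a", right="b"):
    """Build a SELECT list that aliases each column back to its bare name."""
    lines = [f"  {left}.{name} AS {name}" for name in fields]
    lines += [f"  {right}.{name} AS {name}" for name in extra_fields]
    return "SELECT\n" + ",\n".join(lines)


# Stand-in for the field names found under schema.fields[] (assumption:
# in a real script these come from the table schema, not a comprehension).
fields = [f"col{i}" for i in range(1, 101)]

# Assemble the Legacy SQL query from the question around the generated list.
sql = (
    build_join_select(fields, ["additional_field"])
    + "\nFROM [table_with_100_columns] a"
    + "\nJOIN [table_with_2_columns] b"
    + "\nON a.col1 = b.key"
)
print(sql)
```

The printed text is the 101-column query from the question, ready to be submitted through whichever client you already use.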

Option 3 – BigQuery Mate “Add Fields” Button

Step 1 – select the table in the Navigation bar so you can see the table’s schema in the content panel
Step 2 – place the cursor in the Query Editor at the position where the fields need to be inserted
Step 3 – click the “Add Fields” button

Option 3 is now deployed with support for aliases and is available in the web store.

Answer 3:

The easiest solution for now is to use standard SQL; it will not prefix any fields that are unique to one of the joined tables.

Answer 4:

As of release 127.0.0 (2016-09-21) of the Cloud SDK, new Standard SQL query parameters include a FULL [OUTER] JOIN feature as part of Cloud BigQuery. In fact, a FULL OUTER JOIN (or simply FULL JOIN) returns all fields for all rows in both from_items that meet the join condition.

Therefore, running your query in Standard SQL would enable you to add another column (without renaming any) to a pre-existing table, as long as you specify FULL JOIN as part of your query. For more information on how to enable standard SQL for your BigQuery statements, see the Enabling Standard SQL reference.

Answer 5:

Since I needed to stick with Legacy SQL (because I'm integrating with another system that uses Legacy SQL and crashes due to column prefixes), I managed to fix the problem by replacing the selection part of the SQL, changing the first query below into the second:

SELECT *
FROM table1 t1
LEFT JOIN table2 t2
ON [some_condition]
GROUP BY [group_columns]

SELECT 
column1 as new_name1,
column2 as new_name2,
column3 as new_name3
FROM table1 t1
LEFT JOIN table2 t2
ON [some_condition]
GROUP BY [group_columns]

Now column1 will be shown as new_name1 rather than t1_column1.

Source: https://stackoverflow.com/questions/35640533/google-bigquery-sql-prevent-column-prefix-renaming-after-join

Tags

sql

join

google-bigquery

prefix