Question
Say you have a table "table_with_100_columns."
And you want to add one more column with a simple join... without changing all of the column names. In other words, you wish to write something like
SELECT a.* as <a's columns without prefix>, additional_field
FROM [table_with_100_columns] a
JOIN [table_with_2_columns] b
ON a.col1 = b.key
You should be able to do this to generate a new table with 101 columns, without having to rename every single column by hand. Right now the only way I know how to do this is as follows:
SELECT
a.col1 as col1,
a.col2 as col2,
a.col3 as col3,
...
a.col100 as col100,
b.additional_field as additional_field
FROM [table_with_100_columns] a
JOIN [table_with_2_columns] b
ON a.col1 = b.key
Having to write 100 unnecessary lines of code simply to add one more column to a table is unbelievably inefficient - so I'm hoping there is a better way to preserve column names while joining?
UPDATE
It appears this is not yet possible in BigQuery. It is very easy to implement and I suggest the following to the Google BigQuery team:
if no fields share a name in SELECT clause:
    if no subtable reference names given:
        Do not rename fields after JOIN
This will not break any current functionality and adds simple support for a very useful feature.
Answer 1:
I think this problem is specific to BigQuery Legacy SQL.
If you use BigQuery Standard SQL, you will not have this issue. See the example below:
#standardSQL
WITH table_with_100_columns AS (
SELECT 11 AS col1, 21 AS col2, 31 AS col3 UNION ALL
SELECT 12 AS col1, 22 AS col2, 32 AS col3 UNION ALL
SELECT 13 AS col1, 23 AS col2, 33 AS col3 UNION ALL
SELECT 14 AS col1, 24 AS col2, 34 AS col3 UNION ALL
SELECT 15 AS col1, 25 AS col2, 35 AS col3
),
table_with_2_columns AS (
SELECT 11 AS key, 17 AS additional_field UNION ALL
SELECT 12 AS key, 27 AS additional_field UNION ALL
SELECT 13 AS key, 37 AS additional_field UNION ALL
SELECT 14 AS key, 47 AS additional_field UNION ALL
SELECT 15 AS key, 57 AS additional_field
)
SELECT a.*, additional_field
FROM `table_with_100_columns` AS a
JOIN `table_with_2_columns` AS b
ON a.col1 = b.key
See Migrating from legacy SQL in case you need to rewrite the rest of the query in Standard SQL.
The output will be as below, with the original column names (without prefixes):
col1 col2 col3 additional_field
13 23 33 37
11 21 31 17
15 25 35 57
12 22 32 27
14 24 34 47
Answer 2:
I don't know of any option available right now other than having those 100 unnecessary lines be part of the code.
So it comes down to how to produce them in the most efficient way for your particular use case.
There can be many ways, I think, but the most obvious ones are below. They are more or less trivial, but I put them here for the sake of completeness of my answer:
Option 1 – one-off action/need
Just take the output of the statement below into any spreadsheet, transpose it, and dress it up into the expected SQL (at least the portion between SELECT and FROM in the second query of your question):
SELECT * FROM table_with_100_columns WHERE false
In other words, you do this quite manually, with whatever office tool you have at hand that is friendliest for such manipulation.
Option 2 – you need this on a more or less frequent basis or as part of some process
Generate the SQL code using any language/client of your choice by retrieving the schema with the Tables:get API and looking at schema.fields[].
After the SQL code is assembled, you execute it using the API of your choice.
It can be get, or insert, or whatever fits your implementation logic.
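As a rough illustration of Option 2, here is a minimal Python sketch of the code-generation step. It assumes the column names have already been pulled out of schema.fields[]; to keep the sketch runnable without credentials, the list is hard-coded as col1 through col100. The helper name build_join_query and its parameters are hypothetical, not part of any BigQuery client.

```python
# Hypothetical helper (not a BigQuery API): assembles the Legacy SQL query
# from a list of column names, aliasing each column back to its bare name.
def build_join_query(field_names, extra_fields, left_table, right_table,
                     left_key, right_key):
    # Columns of the wide table, each aliased to its original name.
    select_items = [f"a.{name} AS {name}" for name in field_names]
    # Columns taken from the joined table.
    select_items += [f"b.{name} AS {name}" for name in extra_fields]
    select_list = ",\n  ".join(select_items)
    return (
        f"SELECT\n  {select_list}\n"
        f"FROM [{left_table}] a\n"
        f"JOIN [{right_table}] b\n"
        f"ON a.{left_key} = b.{right_key}"
    )


# Assumption: stand-in for the names found in schema.fields[] of the
# Tables:get response for table_with_100_columns.
columns = [f"col{i}" for i in range(1, 101)]

query = build_join_query(
    columns,
    ["additional_field"],
    "table_with_100_columns",
    "table_with_2_columns",
    "col1",
    "key",
)
print(query)
```

The resulting string is the same 100-line SELECT list from the question, just produced by a loop instead of by hand; it would then be submitted through whichever API call fits your process.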
Option 3 – BigQuery Mate “Add Fields” Button
Step 1 – select the table in the Navigation bar so you can see the table's schema in the content panel
Step 2 – set the cursor within the Query Editor at the position where the fields need to be inserted
Step 3 – click on the “Add Fields” button
Deployed Option 3 with support for alias use. Available now in the web store.
Answer 3:
The easiest solution for now is to use Standard SQL; it will not prefix any fields that are unique to one of the joined tables.
Answer 4:
As of release 127.0.0 (2016-09-21) of the Cloud SDK, new Standard SQL query parameters include a FULL [OUTER] JOIN feature as part of Cloud BigQuery. In fact, calling a FULL OUTER JOIN (or simply FULL JOIN) returns all fields for all rows in both from_items that meet the join condition.
Therefore, running your query in Standard SQL would enable you to add another column (without renaming any) to a pre-existing table as long as you specify FULL JOIN as part of your query. For more information on how to enable Standard SQL for use with your BigQuery statements, see this Enabling Standard SQL reference.
Answer 5:
Since I needed to stick with Legacy SQL (because I'm integrating with another system that uses Legacy SQL and crashes due to column prefixes), I managed to fix the problem by changing the selection part of the SQL from
SELECT *
FROM table1 t1
LEFT JOIN table2 t2
ON [some_condition]
GROUP BY [group_columns]
To
SELECT
column1 as new_name1,
column2 as new_name2,
column3 as new_name3
FROM table1 t1
LEFT JOIN table2 t2
ON [some_condition]
GROUP BY [group_columns]
Now column1 will be shown as new_name1 rather than t1_column1.
Source: https://stackoverflow.com/questions/35640533/google-bigquery-sql-prevent-column-prefix-renaming-after-join