Question
We have a dataset in BigQuery with more than 500,000 tables. When we run queries against this dataset using legacy SQL, it throws an error.
As per Jordan Tigani (in "How do I use the TABLE_QUERY() function in BigQuery?"), it executes SELECT table_id FROM <dataset>.__TABLES_SUMMARY__ to get the relevant tables to query.
Do queries that use _TABLE_SUFFIX (standard SQL) also execute __TABLES_SUMMARY__ to get the relevant tables to query?
Answer 1:
According to the documentation, _TABLE_SUFFIX is a pseudo column that contains the values matched by the table wildcard, and it is only available in Standard SQL. Meanwhile, __TABLES_SUMMARY__ is a meta-table that contains information about the tables within a dataset, and it is available in both Standard and Legacy SQL. They are therefore two different concepts.
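To illustrate how _TABLE_SUFFIX behaves, here is a minimal sketch against the public noaa_gsod dataset. It assumes the tables in that dataset are named gsod followed by a four-digit year; the prefix gsod19 and the suffix range are chosen only for illustration.

```sql
-- Wildcard query: the prefix gsod19 selects the candidate tables,
-- and _TABLE_SUFFIX holds the part of each table name matched by *.
SELECT
  _TABLE_SUFFIX AS year_suffix,  -- e.g. '29' for table gsod1929
  COUNT(*) AS row_count
FROM `bigquery-public-data.noaa_gsod.gsod19*`
WHERE _TABLE_SUFFIX BETWEEN '29' AND '35'  -- restrict to gsod1929..gsod1935
GROUP BY year_suffix
ORDER BY year_suffix
```

The longer the literal prefix before the *, the fewer tables the wildcard has to match, and the filter on _TABLE_SUFFIX then narrows the selection further.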
In Standard SQL, you can also use INFORMATION_SCHEMA.TABLES to retrieve information about the tables within a chosen dataset, similarly to __TABLES_SUMMARY__. The documentation gives usage examples and lists its limitations.
Below, I queried against a public dataset using both methods:
First, using INFORMATION_SCHEMA.TABLES:
SELECT * FROM `bigquery-public-data.noaa_gsod.INFORMATION_SCHEMA.TABLES`
Second, using __TABLES_SUMMARY__:
SELECT * FROM `bigquery-public-data.noaa_gsod.__TABLES_SUMMARY__`
Each method returns its output in its own particular format, even though both retrieve metadata about the tables within a particular dataset.
NOTE: BigQuery queries are subject to quotas. These quotas apply in several situations, including the number of tables a single query can reference, which is 1000 per query.
Answer 2:
No, querying a wildcard table does not execute __TABLES_SUMMARY__. You can have more than 500k tables in the dataset, but the number of tables matching the prefix pattern must be less than 500k. For other limitations on wildcard tables, refer to the documentation.
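One way to see how many tables a given wildcard prefix would match is to count the matching names in INFORMATION_SCHEMA.TABLES. This is only a rough sketch: the dataset and the prefix gsod19 are illustrative assumptions, and whether this metadata query stays practical on a dataset with hundreds of thousands of tables is not something confirmed here.

```sql
-- Count the tables whose names start with the prefix you plan to use
-- in a wildcard query such as `...noaa_gsod.gsod19*`.
SELECT COUNT(*) AS matching_tables
FROM `bigquery-public-data.noaa_gsod.INFORMATION_SCHEMA.TABLES`
WHERE STARTS_WITH(table_name, 'gsod19')
```

If the count is too high, lengthening the literal prefix reduces the number of tables the wildcard has to match.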
Source: https://stackoverflow.com/questions/61791953/does-big-query-executes-tables-summary-for-standard-sql