Indexes on BigQuery Table

I have a use case in which we have a few tables in BigQuery. I want to implement an index on one of the columns of a BigQuery table, but I cannot find enough documentation on how to do that. I found a few blogs and posts saying that BigQuery doesn't support indexes. Please point me to a blog or post that explains how to implement an index on BigQuery. Thanks in advance.

2019 update: Check out how clustering improves your query times and reduces the data scanned:

https://medium.com/google-cloud/bigquery-optimized-cluster-your-tables-65e2f684594b

As stated in the comments, this question is really about "how would BigQuery deal with my data if it were 100 times larger?". With traditional databases an index is the right solution, but BigQuery is different: as data size grows, BigQuery adds more servers to the mix, keeping performance almost constant.

In other words, as your data grows you should expect costs to increase linearly, with performance staying almost constant. No indexes needed. And this is one of the big reasons why people choose BigQuery for their analytical workloads.

(It all depends on your specific use case of course, please test these assertions and report back!)
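To make the "costs increase linearly" point concrete, here is a rough back-of-the-envelope sketch. It assumes on-demand billing by bytes scanned; the function name and the price of 5.0 per TiB are placeholders of mine, not figures from this thread, so plug in your own numbers.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def on_demand_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """Estimate the cost of one query that scans `bytes_scanned` bytes.

    `price_per_tib` is a placeholder; look up the current price for your region.
    """
    return bytes_scanned / TIB * price_per_tib


# Hypothetical example: the same full-table query before and after 100x growth.
today = on_demand_cost(1 * TIB, price_per_tib=5.0)
later = on_demand_cost(100 * TIB, price_per_tib=5.0)
print(f"today: {today:.2f}, after 100x growth: {later:.2f}")
```

The cost grows by the same factor as the data, which is why cutting the bytes scanned (partitioning, clustering, selecting fewer columns) matters more here than an index would.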

The closest you can get to an "index" in BigQuery is a partitioned table. At the time of writing it only supports partitioning by date, though.

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and reduce the number of bytes that are billed by restricting the amount of data that is scanned. BigQuery offers date-partitioned tables, which means that the table is divided into a separate partition for each date.
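As a minimal sketch of what that looks like in practice, the helper below builds a standard SQL `CREATE TABLE ... PARTITION BY` statement and a query that filters on the partitioning column, so that only one day's partition should be scanned. The table name `mydataset.events` and the columns are made up for illustration, and the query helper assumes the partitioning column is a TIMESTAMP.

```python
from datetime import date, timedelta
from typing import Dict


def partitioned_table_ddl(table: str, columns: Dict[str, str], partition_column: str) -> str:
    """Build a CREATE TABLE statement partitioned by the date of one column."""
    column_type = columns[partition_column].upper()
    if column_type == "DATE":
        partition_expr = partition_column  # DATE columns are used as-is
    elif column_type == "TIMESTAMP":
        partition_expr = f"DATE({partition_column})"  # truncate to a day
    else:
        raise ValueError(f"cannot date-partition on a {column_type} column")
    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE `{table}` (\n{column_lines}\n)\nPARTITION BY {partition_expr}"


def single_day_query(table: str, timestamp_column: str, day: date) -> str:
    """Build a query restricted to one day via a range filter on the partitioning column."""
    next_day = day + timedelta(days=1)
    return (
        f"SELECT *\nFROM `{table}`\n"
        f"WHERE {timestamp_column} >= TIMESTAMP('{day.isoformat()}')\n"
        f"  AND {timestamp_column} < TIMESTAMP('{next_day.isoformat()}')"
    )


# Hypothetical table and columns, for illustration only.
columns = {"event_ts": "TIMESTAMP", "user_id": "STRING", "payload": "STRING"}
print(partitioned_table_ddl("mydataset.events", columns, "event_ts"))
print(single_day_query("mydataset.events", "event_ts", date(2019, 1, 31)))
```

You would paste the generated statements into the BigQuery console or pass them to whatever client you use. The point is only that the filter is on the partitioning column, which is what lets BigQuery skip the other partitions.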

You can approximate an index on a BigQuery table by using the Clustering order parameter, available under the advanced options when creating a table. At the time of writing, this clustering option is only available for partitioned tables. See the Google documentation on clustered tables for additional details.
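If you prefer DDL over the console, the same idea can be written with a `CLUSTER BY` clause. The sketch below only builds the statement as a string; the table name, columns, and the helper itself are hypothetical, and the check for at most four clustering columns reflects the BigQuery limit as I understand it.

```python
from typing import Dict, List

MAX_CLUSTER_COLUMNS = 4  # BigQuery accepts at most four clustering columns


def clustered_table_ddl(
    table: str,
    columns: Dict[str, str],
    partition_expr: str,
    cluster_by: List[str],
) -> str:
    """Build a CREATE TABLE statement that is partitioned and clustered."""
    if not 1 <= len(cluster_by) <= MAX_CLUSTER_COLUMNS:
        raise ValueError(f"need between 1 and {MAX_CLUSTER_COLUMNS} clustering columns")
    unknown = [name for name in cluster_by if name not in columns]
    if unknown:
        raise ValueError(f"unknown clustering columns: {unknown}")
    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns.items())
    cluster_list = ", ".join(cluster_by)  # order matters: data is sorted by the first column first
    return (
        f"CREATE TABLE `{table}` (\n{column_lines}\n)\n"
        f"PARTITION BY {partition_expr}\n"
        f"CLUSTER BY {cluster_list}"
    )


# Hypothetical example: partition by day, then cluster by the column you filter on most.
columns = {"event_ts": "TIMESTAMP", "user_id": "STRING", "country": "STRING"}
print(clustered_table_ddl("mydataset.events", columns, "DATE(event_ts)", ["user_id", "country"]))
```

Queries that filter on `user_id` (the first clustering column here) are the ones that benefit most, which is the closest analogue to the column index the question asks about.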

Besides partitioning, one could also use multiple tables, e.g. each holding one day's worth of data. BigQuery can query a maximum of 1000 tables at once, so that should cover most cases and lets you keep costs constant.
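With one table per day, a date range can be queried through a wildcard table and the `_TABLE_SUFFIX` pseudo column in standard SQL. A minimal sketch, assuming the tables are named with a common prefix followed by `YYYYMMDD` (the prefix `mydataset.events_` is made up):

```python
from datetime import date


def sharded_range_query(table_prefix: str, start: date, end: date) -> str:
    """Build a wildcard query over daily tables named <prefix>YYYYMMDD."""
    if end < start:
        raise ValueError("end must not be before start")
    return (
        f"SELECT *\nFROM `{table_prefix}*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'"
    )


# Hypothetical example: one week of daily tables events_20190101 .. events_20190107.
print(sharded_range_query("mydataset.events_", date(2019, 1, 1), date(2019, 1, 7)))
```

Only the tables whose suffix falls in the range are read, so the bytes scanned depend on the number of days you ask for rather than on how much history you keep.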

Source: https://stackoverflow.com/questions/28600228/indexes-on-bigquery-table

Tags

cloud

google-bigquery