Indexes on BigQuery Table

Posted by 隐身守侯 on 2019-12-05 03:57:22

2019 update: Check out how clustering improves your query times and reduces the data scanned:


As stated in the comments, this question is really about "how would BigQuery deal with my data if it were 100 times larger". With traditional databases an index is the right solution, but BigQuery is different: as data size grows, BigQuery adds more servers to the mix, keeping performance almost constant.

In other words, as your data grows you should expect costs to increase linearly, with performance staying almost constant. No indexes needed. And this is one of the big reasons why people choose BigQuery for their analytical workloads.

(It all depends on your specific use case, of course. Please test these assertions and report back!)

The closest you can get to an "index" in BigQuery is partitioned tables. At the time of writing, they only support partitioning by date, though.

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and reduce the number of bytes that are billed by restricting the amount of data that is scanned. BigQuery offers date-partitioned tables, which means that the table is divided into a separate partition for each date.
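To make the pruning idea concrete, here is a toy model in Python. It is only a sketch of the concept, not how BigQuery is implemented: it assumes rows are grouped by date in a plain dict, and the names `PartitionedTable`, `insert`, and `scan` are hypothetical.

```python
from collections import defaultdict
from datetime import date


class PartitionedTable:
    """Toy model of a date-partitioned table (hypothetical, not a BigQuery API)."""

    def __init__(self):
        # One bucket of rows per calendar date.
        self.partitions = defaultdict(list)

    def insert(self, day, row):
        self.partitions[day].append(row)

    def scan(self, start, end):
        """Return (rows, rows_scanned) for start <= day <= end.

        Only partitions inside the date range are read, so the number of
        rows scanned depends on the range, not on the size of the table.
        """
        rows = []
        for day, bucket in self.partitions.items():
            if start <= day <= end:
                rows.extend(bucket)
        return rows, len(rows)


# Load 30 daily partitions of 100 rows each.
table = PartitionedTable()
for d in range(1, 31):
    for i in range(100):
        table.insert(date(2019, 11, d), {"day": d, "value": i})

# A query restricted to one date touches one partition only.
rows, scanned = table.scan(date(2019, 11, 5), date(2019, 11, 5))
total = sum(len(bucket) for bucket in table.partitions.values())
print(f"scanned {scanned} of {total} rows")  # scanned 100 of 3000 rows
```

In this model a filter on the partitioning date plays the role an index lookup would play in a traditional database: it limits what has to be read, and in BigQuery's case what is billed.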

You can get something close to an index on a BigQuery table by using the "Clustering order" parameter, available under the advanced options when creating a table. This clustering option is only available for partitioned tables. Follow the link below for additional details: link to Google documentation
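The same setup can also be expressed as a DDL statement rather than through the web UI. The sketch below only builds the SQL text in Python and does not talk to BigQuery. The helper `build_clustered_table_ddl`, the table `mydataset.events`, and all column names are made up for illustration; the `PARTITION BY` and `CLUSTER BY` clauses are BigQuery standard SQL.

```python
def build_clustered_table_ddl(table, columns, partition_column, cluster_columns):
    """Build a CREATE TABLE statement for a partitioned, clustered table.

    Hypothetical helper: it returns a string and never contacts BigQuery.
    `columns` is a list of (name, type) pairs.
    """
    if not cluster_columns:
        raise ValueError("at least one clustering column is required")
    column_lines = ",\n".join(f"  {name} {kind}" for name, kind in columns)
    return (
        f"CREATE TABLE `{table}`\n"
        f"(\n{column_lines}\n)\n"
        f"PARTITION BY DATE({partition_column})\n"
        f"CLUSTER BY {', '.join(cluster_columns)}"
    )


ddl = build_clustered_table_ddl(
    table="mydataset.events",  # assumed dataset and table name
    columns=[("event_ts", "TIMESTAMP"), ("user_id", "STRING"), ("country", "STRING")],
    partition_column="event_ts",
    cluster_columns=["country", "user_id"],
)
print(ddl)
```

Queries that filter on `country` (and then `user_id`) are the ones that should benefit, since clustering sorts the data within each partition by those columns in that order.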

Besides partitioning, one could also use multiple tables, e.g. each holding one day's worth of data. BigQuery can query a maximum of 1000 tables at once, so that should cover most cases and lets you keep costs constant.
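As a rough illustration of the multiple-tables approach, the sketch below generates one table name per day and builds a wildcard query over a date range. It assumes tables are named with a `YYYYMMDD` suffix (for example `events_20190101`); the dataset `mydataset`, the prefix `events_`, and both helper functions are hypothetical. `_TABLE_SUFFIX` is the pseudo-column BigQuery standard SQL exposes for wildcard tables, and the 1000-table check simply mirrors the limit quoted above.

```python
from datetime import date, timedelta

MAX_TABLES_PER_QUERY = 1000  # limit quoted in the text above


def daily_table_names(prefix, start, end):
    """Return one table name per day, e.g. events_20190101."""
    days = (end - start).days + 1
    return [f"{prefix}{start + timedelta(days=d):%Y%m%d}" for d in range(days)]


def build_wildcard_query(dataset, prefix, start, end):
    """Build a query over the daily tables between start and end (inclusive)."""
    table_count = len(daily_table_names(prefix, start, end))
    if table_count > MAX_TABLES_PER_QUERY:
        raise ValueError(f"{table_count} tables exceeds the per-query limit")
    return (
        f"SELECT COUNT(*) AS n FROM `{dataset}.{prefix}*` "
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'"
    )


names = daily_table_names("events_", date(2019, 1, 1), date(2019, 1, 31))
query = build_wildcard_query("mydataset", "events_", date(2019, 1, 1), date(2019, 1, 31))
print(len(names), names[0], names[-1])
print(query)
```

Only the tables whose suffix falls inside the range are read, so the amount of data scanned tracks the date range rather than the total history.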
