BigQuery: Do clustered tables remain sorted in the face of streaming inserts? [duplicate]

耗尽温柔 submitted on 2020-02-07 03:12:52

Question


I have hourly batch jobs that need to scan all the data that has streamed into my table in the last hour. Right now I'm using a date-partitioned table, which means that every time I scan a date partition for an hour's worth of data, I have to scan rows from all hours of that day.

I've been thinking about clustering this table on an hour field; however, I'm under the impression that BigQuery won't actually keep the table effectively clustered in the face of streaming inserts. So here's my question:

Does BigQuery guarantee to keep clustered tables sorted even in the face of streaming inserts?
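
For concreteness, the setup being considered could be sketched roughly as follows. This is only an illustration: the table name `my_dataset.events` and the columns `event_ts`, `event_hour`, and `payload` are hypothetical and not taken from the question. The snippet just builds the SQL text; it does not call BigQuery.

```python
import datetime

# Hypothetical schema (all names are assumptions for illustration only).
# The table stays partitioned by date and is additionally clustered on an hour field.
DDL = """
CREATE TABLE my_dataset.events (
  event_ts   TIMESTAMP,
  event_hour INT64,
  payload    STRING
)
PARTITION BY DATE(event_ts)
CLUSTER BY event_hour
""".strip()


def hourly_scan_sql(day: str, hour: int) -> str:
    """Build the hourly batch query for one date partition and one hour."""
    datetime.date.fromisoformat(day)  # raises ValueError if `day` is not YYYY-MM-DD
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return (
        "SELECT * FROM my_dataset.events "
        f"WHERE DATE(event_ts) = DATE '{day}' AND event_hour = {hour}"
    )


print(DDL)
print(hourly_scan_sql("2019-04-17", 13))
```

If the table were kept well clustered, the filter on `event_hour` would let the hourly job read only part of the date partition instead of all of it.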


Answer 1:


Currently the answer is no: clustered tables do not remain sorted/clustered in the face of streaming inserts. Many thanks to Tamir for pointing out that there's an answer relevant to this question here. Check that answer out for details, as well as for a trick to force sorting on part of a partition.

It also looks like the BigQuery team is working on this. According to this issue tracker comment from April 17, 2019:
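
To see why this matters for the hourly scan, here is a toy model in Python. It is an assumption-laden sketch, not a description of BigQuery's actual storage layout: rows are grouped into fixed-size blocks, and a block must be read whenever the requested hour falls within that block's minimum and maximum hour. The block size and row counts are arbitrary.

```python
BLOCK_SIZE = 4  # rows per block in this toy model (arbitrary choice)


def to_blocks(rows, size=BLOCK_SIZE):
    """Split a list of rows into fixed-size blocks, preserving order."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def blocks_scanned(blocks, hour):
    """Count blocks whose [min, max] hour range could contain `hour`."""
    return sum(1 for block in blocks if min(block) <= hour <= max(block))


# Each row is represented only by its hour value: 4 rows for each of 24 hours.
sorted_rows = [h for h in range(24) for _ in range(4)]
# The same rows, interleaved so they are not ordered by the clustering key.
interleaved_rows = [h for _ in range(4) for h in range(24)]

clustered = to_blocks(sorted_rows)
unclustered = to_blocks(interleaved_rows)

print(blocks_scanned(clustered, 13))    # 1
print(blocks_scanned(unclustered, 13))  # 4
```

In this model the sorted layout needs one block to answer a query for hour 13, while the interleaved layout needs four, even though both hold exactly the same rows.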

"We are doing some a fair amount of work with streaming to keep the table clustered upto a certain recent time interval. We don't have a good ETA to offer on this at this point, but we hope to have more information on this soon."



Source: https://stackoverflow.com/questions/55723409/bigquery-do-clustered-tables-remain-sorted-in-the-face-of-streaming-inserts
