Is “count distinct” exact with BigQuery new standard SQL syntax?

Submitted by 天涯浪子 on 2020-01-13 19:53:13

Question


With the legacy BigQuery SQL syntax, we have to use the EXACT_COUNT_DISTINCT function if we want the exact number of distinct values for a field.

With the standard SQL (SQL 2011) syntax, I wonder whether COUNT(DISTINCT myfield) will always return the exact number of distinct values if I leave the 'Use Legacy SQL' option unselected.


Answer 1:


COUNT(DISTINCT input) gives an exact count in standard SQL.

One important difference is that COUNT(DISTINCT input) in standard SQL scales better than EXACT_COUNT_DISTINCT(input) in legacy BigQuery SQL, so performance is generally better and you are less likely to hit "resources exceeded" errors.
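
As a minimal sketch of what this looks like in practice, the query below counts distinct values exactly in standard SQL. The table `bigquery-public-data.samples.shakespeare` and its `word` column are assumptions chosen for illustration (a public sample dataset), not something from the original question; substitute your own table and field.

```sql
#standardSQL
-- Exact number of distinct words in the (assumed) public sample table.
-- The legacy SQL counterpart would be a separate query prefixed with #legacySQL:
--   SELECT EXACT_COUNT_DISTINCT(word) FROM [bigquery-public-data:samples.shakespeare]
SELECT
  COUNT(DISTINCT word) AS distinct_words
FROM
  `bigquery-public-data.samples.shakespeare`;
```

The `#standardSQL` prefix on the first line selects the dialect for that query, which has the same effect as leaving the 'Use Legacy SQL' option unselected.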

You can read about other differences between legacy and standard SQL in the migration guide.




Answer 2:


Based on the documentation for APPROX_COUNT_DISTINCT (reading between the lines):

COUNT(DISTINCT input) - exact count
APPROX_COUNT_DISTINCT(input) - approximate result
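
To see the two functions side by side, a query along these lines can be run in standard SQL. This is only a sketch: the table and column are the same illustrative assumptions (a public sample dataset), and how far the approximate value drifts from the exact one depends on the data.

```sql
#standardSQL
-- Compare the exact and the approximate distinct count on the same column.
-- Table and column names are illustrative assumptions.
SELECT
  COUNT(DISTINCT word) AS exact_distinct,          -- exact result
  APPROX_COUNT_DISTINCT(word) AS approx_distinct   -- approximate result
FROM
  `bigquery-public-data.samples.shakespeare`;
```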



Source: https://stackoverflow.com/questions/37171324/is-count-distinct-exact-with-bigquery-new-standard-sql-syntax
