What does the hint USE_ADDITIONAL_PARALLELISM do in Cloud Spanner?

Submitted by 匆匆过客 on 2020-03-25 18:50:22

Question


The documentation lists a query hint named USE_ADDITIONAL_PARALLELISM here: https://cloud.google.com/spanner/docs/query-syntax#statement-hints

However, the documentation for it is very short. From my understanding, it spreads a single query across multiple nodes for execution. Is that correct?

In what scenarios would we use it? What is its impact on the infrastructure? How does it scale with the number of nodes? Does it need a query that reads data from different splits, or does it also work on a single split? Any meaningful information about it is welcome.

PS: The hint was originally brought up in the thread "Using multiple 'count distinct' has huge performance impact".


Answer 1:


A Cloud Spanner query may have multiple levels of distribution. The USE_ADDITIONAL_PARALLELISM query hint causes a node executing a query to try to prefetch the results of subqueries further up in the distribution queue. This can be useful for queries that do full table scans, or full table scans with aggregations such as COUNT(), MAX, MIN, etc., where identical subqueries can be distributed to many splits and each subquery returns relatively little data (such as aggregation state). However, if the individual subqueries return a significant amount of data, the prefetching triggered by this hint can significantly increase memory usage on the consuming node.
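As a rough illustration of the aggregation case described above, the hint is written as a statement hint in front of the query text. The sketch below only builds the SQL string; the helper name `with_additional_parallelism` and the table name `Events` are hypothetical and are not taken from the question or the Spanner documentation.

```python
# Statement hint syntax as used by Cloud Spanner: @{NAME=VALUE} before the query.
HINT = "@{USE_ADDITIONAL_PARALLELISM=TRUE}"


def with_additional_parallelism(sql: str) -> str:
    """Prefix a query with the hint (hypothetical helper).

    If the statement already starts with a statement hint block,
    it is returned unchanged rather than being hinted twice.
    """
    stripped = sql.lstrip()
    if stripped.startswith("@{"):
        return stripped
    return f"{HINT} {stripped}"


# A full-table aggregation: each split returns only a small aggregation
# state, which is the situation the answer describes as a good fit.
# `Events` is an assumed table name used purely for illustration.
query = with_additional_parallelism("SELECT COUNT(*) AS n FROM Events")
print(query)
```

With the `google-cloud-spanner` client, a string like this would typically be passed to `snapshot.execute_sql(query)` inside a `database.snapshot()` context. Whether the hint actually helps depends on the query and data, so it is worth comparing latency and memory usage with and without it.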



Source: https://stackoverflow.com/questions/60334962/what-does-the-hint-use-additional-parallelism-do-in-cloud-spanner
