Question
In the docs we can find a query hint named USE_ADDITIONAL_PARALLELISM here: https://cloud.google.com/spanner/docs/query-syntax#statement-hints
However, the documentation for it is very short. From my understanding, it spreads a single query across multiple nodes for execution. Is that correct?
In what scenario would we use it? What is its impact on the infrastructure? How does it scale with the number of nodes? Does it need a query that reads data from different splits, or does it also work on a single split? Any meaningful information about it is welcome.
PS: it was originally introduced in the thread 'Using multiple "count distinct" has huge performance impact'.
Answer 1:
A Cloud Spanner query may have multiple levels of distribution. The USE_ADDITIONAL_PARALLELISM query hint causes a node executing a query to try to prefetch the results of subqueries further up in the distribution queue. This can be useful for queries that do full table scans, with or without aggregations such as COUNT(), MAX, or MIN, where identical subqueries can be distributed to many splits and where each subquery to a split returns relatively little data (such as aggregation state). However, if the individual subqueries return significant amounts of data, using this hint can cause memory usage on the consuming node to go up significantly due to prefetching.
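As a rough illustration of the kind of query described above, here is a minimal Python sketch that prefixes a statement with the hint using the `@{...}` statement-hint syntax from the linked documentation page. The helper name `with_additional_parallelism` and the table and column names (`Events`, `created_at`) are hypothetical and only there for the example; the sketch builds the SQL text and does not connect to Spanner.

```python
def with_additional_parallelism(sql: str, enabled: bool = True) -> str:
    """Prefix a SQL statement with the USE_ADDITIONAL_PARALLELISM statement hint.

    Hypothetical helper for illustration; it only builds the query string.
    """
    value = "TRUE" if enabled else "FALSE"
    # Doubled braces in an f-string produce literal '{' and '}'.
    return f"@{{USE_ADDITIONAL_PARALLELISM={value}}} {sql.strip()}"


# A full-table aggregation: each split returns only a small aggregation
# state, which is the case the answer describes as a good fit for the hint.
# "Events" and "created_at" are made-up names.
query = with_additional_parallelism(
    "SELECT COUNT(*) AS n, MAX(created_at) AS latest FROM Events"
)
print(query)
```

The resulting string can be passed to whichever Spanner client is in use. Going by the answer, a query that returns many rows per split would be a poor candidate, since prefetching would then raise memory usage on the consuming node.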
Source: https://stackoverflow.com/questions/60334962/what-does-the-hint-use-additional-parallelism-do-in-cloud-spanner