SolrCloud: workaround for classic pagination with “start,rows” parameters

依然范特西╮ submitted on 2019-12-08 12:02:01

Question


I have SolrCloud with 3 shards.

My goal: select and process all products in a category.

Current implementation: fetch the results in batches inside a loop.

  • 1st iteration: q=cat:1&start=0&rows=100
  • 2nd iteration: q=cat:1&start=100&rows=100
  • 3rd iteration: q=cat:1&start=200&rows=100

...
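For illustration, the offset-based loop above can be sketched in Python as a generator of request parameters. The function name `offset_pages` and the `num_found` input (the total hit count, e.g. taken from `numFound` in the first response) are assumptions for this sketch; it only builds the parameters and does not call Solr.

```python
from typing import Dict, Iterator


def offset_pages(query: str, rows: int, num_found: int) -> Iterator[Dict[str, str]]:
    """Yield the request parameters for classic start/rows paging.

    `num_found` is a hypothetical input: the total number of matches,
    assumed to be known before the loop starts.
    """
    # start grows by `rows` on every iteration: 0, rows, 2*rows, ...
    for start in range(0, num_found, rows):
        yield {"q": query, "start": str(start), "rows": str(rows)}


if __name__ == "__main__":
    for params in offset_pages("cat:1", rows=100, num_found=300):
        print(params)
```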

But as "start" grows, performance degrades. The explanation is here: https://wiki.apache.org/solr/DistributedSearch

Makes it more inefficient to use a high "start" parameter. For example, if you request start=500000&rows=25 on an index with 500,000+ docs per shard, this will currently result in 500,000 records getting sent over the network from the shard to the coordinating Solr instance. If you had a single-shard index, in contrast, only 25 records would ever get sent over the network. (Granted, setting start this high is not something many people need to do.)
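To make the cost concrete, the sketch below counts how many records the shards send to the coordinating node, under the simplified model described in the quote: for each page, every shard returns its own top `start + rows` matches (or all of its matches, if it has fewer), and the coordinator merges them and keeps only `rows`. The function names and the `shard_sizes` input are hypothetical.

```python
from typing import Sequence


def records_sent_for_page(shard_sizes: Sequence[int], start: int, rows: int) -> int:
    """Records the shards send to the coordinator for one start/rows page."""
    # Each shard returns its own top (start + rows) matches, capped by how
    # many matches it actually has; the coordinator discards all but `rows`.
    return sum(min(start + rows, size) for size in shard_sizes)


def records_sent_for_full_walk(shard_sizes: Sequence[int], rows: int) -> int:
    """Total records sent when paging through every match with start/rows."""
    total_hits = sum(shard_sizes)
    return sum(records_sent_for_page(shard_sizes, start, rows)
               for start in range(0, total_hits, rows))


if __name__ == "__main__":
    shards = [1000, 1000, 1000]
    print(records_sent_for_page(shards, 0, 100))    # 300
    print(records_sent_for_page(shards, 200, 100))  # 900
    print(records_sent_for_full_walk(shards, 100))  # 76500
```

With three shards of 1,000 matching documents each and `rows=100`, walking all 3,000 hits ships 76,500 records in total rather than 3,000, because the per-page cost keeps growing with `start` until it reaches the shard size.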

Are there any ideas on how I can walk through all the records in a category?


Answer 1:


There is another, more efficient way to paginate in Solr: cursors, which track the current position in the sort order instead of using an offset. This is particularly useful for deep pagination.

See the section about cursors on the Pagination of Results wiki page. This should speed up delivery, because each shard should be able to sort its local documents, determine where the cursor falls in that sequence, and return only the 25 documents that follow it.
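A minimal sketch of cursor-based iteration in Python, assuming a Solr version with cursor support (added in Solr 4.7), JSON responses, and a collection whose uniqueKey field is `id`. The first request sends `cursorMark=*`, each later request sends the `nextCursorMark` returned by the previous one, and the loop stops when the cursor no longer changes. The URL, collection name and function names are placeholders; `fetch` is injectable so the loop can be exercised without a running server.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

# Hypothetical endpoint; replace with your collection's /select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def http_fetch(params: Dict[str, str]) -> dict:
    """Send one /select request and return the parsed JSON response."""
    query_string = urllib.parse.urlencode({**params, "wt": "json"})
    with urllib.request.urlopen(SOLR_SELECT_URL + "?" + query_string) as resp:
        return json.load(resp)


def iterate_with_cursor(query: str, rows: int, unique_key: str = "id",
                        fetch: Callable[[Dict[str, str]], dict] = http_fetch
                        ) -> Iterator[dict]:
    """Yield every matching document using Solr's cursorMark parameter."""
    cursor = "*"  # "*" starts a new cursor at the beginning of the sort
    while True:
        params = {
            "q": query,
            "rows": str(rows),
            # No "start": the cursor replaces the offset. The sort must
            # include the uniqueKey field so positions are unambiguous.
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
        }
        response = fetch(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # cursor did not move: no more results
            break
        cursor = next_cursor
```

To sort by another field, change the `sort` value but keep the key as a tie-breaker, e.g. `price asc,id asc`.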

UPDATE: Another useful link: coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets




Answer 2:


I think the short answer is "no": it's a limitation of how Solr does sharding. Instead, can you build a list of the documents' unique keys outside Solr, presumably from a backing database, and then retrieve documents from the index using batches of those keys?

e.g. ID:(1 OR 2 OR 3 OR ...very long list...)
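A sketch of the key-list approach in Python, assuming the keys come from an external source such as the backing database. It splits the list into batches and builds one `ID:(... OR ...)` query per batch; the category restriction can be sent separately, e.g. as `fq=cat:1`. The function name and `batch_size` are hypothetical. Keep each batch below Solr's `maxBooleanClauses` limit (1024 by default), and escape the keys if they are strings containing query-syntax characters.

```python
from typing import Iterable, Iterator, List


def id_batch_queries(ids: Iterable[object], batch_size: int,
                     id_field: str = "ID") -> Iterator[str]:
    """Turn an externally obtained key list into ID:(a OR b OR ...) queries."""
    batch: List[str] = []
    for doc_id in ids:
        batch.append(str(doc_id))  # assumes keys need no escaping
        if len(batch) == batch_size:
            yield f"{id_field}:({' OR '.join(batch)})"
            batch = []
    if batch:  # leftover keys that did not fill a whole batch
        yield f"{id_field}:({' OR '.join(batch)})"
```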

Or, if the unique keys are numeric you could use a moving range instead:

ID:[1 TO 1000] then ID:[1001 TO 2000] and so forth.
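The moving-range variant can be sketched the same way, assuming numeric keys and a known minimum and maximum key (both hypothetical inputs here). Each query covers an inclusive range of `step` key values.

```python
from typing import Iterator


def id_range_queries(min_id: int, max_id: int, step: int,
                     id_field: str = "ID") -> Iterator[str]:
    """Yield inclusive range queries ID:[lo TO hi] covering min_id..max_id."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step - 1, max_id)  # last range may be shorter
        yield f"{id_field}:[{lo} TO {hi}]"
        lo = hi + 1
```

Because a range of `step` consecutive integers can hold at most `step` documents, requesting `rows=step` for each range query retrieves the whole range in one request; sparse key ranges simply return fewer documents.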

With either option you would also restrict by category. Both should avoid the slowdown associated with deep windowing, however.



Source: https://stackoverflow.com/questions/25306028/solrcloud-workaround-for-classic-pagination-with-start-rows-parameters
