What's the expected behavior of the Bing Search API v5 when deeply paginating?

Submitted by 拈花ヽ惹草 on 2021-01-28 07:32:26

Question


I perform a Bing API search for webpages with the query "cameras".

The first "page" of results (offset=0, count=50) returns 49 actual results. It also returns a totalEstimatedMatches of 114000000 (114 million). Neat, that's a lot of results.

The second "page" of results (offset=49, count=50) performs similarly...

...until I reach page 7 (offset=314, count=50). Suddenly totalEstimatedMatches is 544.

And the actual count of results returned per-page trails off precipitously from there. In fact, over 43 "pages" of results, I get 413 actual results, of which only 311 have unique URLs.
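A tally like the one above (total results versus unique URLs across all pages) can be computed with a short helper. This is a minimal sketch, assuming each page has already been parsed from JSON and that web results sit under webPages.value with a url field on each result; tally_pages is a hypothetical name, not part of any SDK.

```python
from typing import Any, Iterable


def tally_pages(pages: Iterable[dict[str, Any]]) -> tuple[int, int]:
    """Return (total results, unique URLs) across parsed response pages.

    Assumption: each page looks like {"webPages": {"value": [{"url": ...}, ...]}}.
    """
    total = 0
    seen: set[str] = set()
    for page in pages:
        # A page may lack "webPages" entirely if nothing survived filtering.
        results = page.get("webPages", {}).get("value", [])
        total += len(results)
        seen.update(result["url"] for result in results)
    return total, len(seen)


# Made-up example data: the same URL appears on two different pages.
pages = [
    {"webPages": {"value": [{"url": "https://a.example"}, {"url": "https://b.example"}]}},
    {"webPages": {"value": [{"url": "https://b.example"}]}},
    {},
]
print(tally_pages(pages))  # (3, 2)
```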

This appears to happen for any query after a small number of pages.

Is this expected behavior? There's no hint from the API documentation that exhaustive pagination should lead to this behavior... but there you have it.



Answer 1:


Each time the API is called, the search API obtains a group of possible matches starting at the offset in the result set, then filters them based on various criteria (e.g., spam, duplicates, the SafeSearch setting), leaving a final result set. If the final result set after filtering and optimization contains more results than the count parameter, then count results are returned. If count is larger than the final result set, the whole final result set is returned, which will be smaller than count. If the search API is called again, passing in the offset parameter to get the next set of results, the filtering runs again on that next set, which means it may also return fewer than count results.
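The filter-then-truncate behavior described above can be illustrated with a toy model. This is only a sketch of the idea, not Bing's implementation: serve_page, the candidate list, the spam rule and the window_size knob are all hypothetical, and the model says nothing about how offset maps onto Bing's internal index. It only shows why count acts as an upper bound rather than a guarantee.

```python
from typing import Callable


def serve_page(
    candidates: list[str],
    offset: int,
    count: int,
    is_filtered: Callable[[str], bool],
    window_size: int,
) -> list[str]:
    """Toy model of one call: take a candidate window, filter it, cap at count.

    window_size is a hypothetical knob; the real service does not expose
    how many candidates it considers per call.
    """
    # 1. Obtain a group of possible matches starting at offset.
    window = candidates[offset:offset + window_size]
    # 2. Drop items rejected by filters (spam, duplicates, SafeSearch, ...).
    kept = [item for item in window if not is_filtered(item)]
    # 3. Return at most `count` items; fewer if not enough survived.
    return kept[:count]


def is_spam(item: str) -> bool:
    # Made-up rule: every third candidate counts as spam.
    return int(item[1:]) % 3 == 0


candidates = [f"r{i}" for i in range(10)]
print(serve_page(candidates, 0, 5, is_spam, window_size=6))   # ['r1', 'r2', 'r4', 'r5']
print(serve_page(candidates, 0, 5, is_spam, window_size=10))  # ['r1', 'r2', 'r4', 'r5', 'r7']
```

In the first call too few candidates survive, so fewer than count come back; in the second more survive than count, so the result is truncated to count.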

You should not expect the full count number of results to be returned on every API call. If you need results beyond those returned, call the query again, passing in the offset parameter set to the total number of results returned by the previous API calls. This also means that on subsequent API calls, offset should never be a hard-coded value; it should always be calculated from the results of the previous queries.
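Computing offsets as a running total of what each call actually returned (the sum the answer describes) is a one-liner with itertools.accumulate. A minimal sketch; offsets_from_counts is a hypothetical helper, and the per-call counts in the example are made up apart from the first page of 49 mentioned in the question.

```python
from itertools import accumulate


def offsets_from_counts(returned_counts: list[int]) -> list[int]:
    """Offsets for successive calls, given how many results each call returned.

    The first call starts at 0; each later offset is the running total of
    results actually returned so far, never a hard-coded multiple of count.
    The last element is the offset to use for the next, not-yet-made call.
    """
    return list(accumulate(returned_counts, initial=0))


print(offsets_from_counts([49, 47, 50]))  # [0, 49, 96, 146]
```

A hard-coded scheme such as offset = page_number * 50 would instead jump to 50, 100, 150 and silently skip results whenever a call returns fewer than 50.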

totalEstimatedMatches can also add to the confusion around Bing Search API results. The word 'estimated' is important: the number is an estimate based on an initial quick result set, computed before the filtering described above. In addition, totalEstimatedMatches can change as you iterate through the result set with subsequent API calls at increasing offsets. Treat totalEstimatedMatches only as a rough guide to the order of magnitude of the possible result set; do not use it to determine how many results will ultimately be returned. To retrieve all of the possible results, keep making API calls, passing offset equal to the sum of the results returned by previous calls, until that sum is greater than the totalEstimatedMatches of the most recent API call.
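The stop rule above can be sketched as a loop that re-reads totalEstimatedMatches from every response. This is a sketch under stated assumptions: fetch_page is a hypothetical callable that performs one request for a given offset and count and returns the parsed JSON (how the HTTP call is made is left out), results are read from webPages.value, and max_calls is an added safety cap. fake_fetch is a made-up stand-in used only to exercise the loop.

```python
from typing import Any, Callable

# Hypothetical signature: fetch_page(offset, count) -> parsed JSON response.
FetchPage = Callable[[int, int], dict[str, Any]]


def collect_all(fetch_page: FetchPage, count: int = 50, max_calls: int = 100) -> list[dict[str, Any]]:
    """Page through results using the offset/stop rule described above."""
    results: list[dict[str, Any]] = []
    offset = 0  # running sum of results returned so far
    for _ in range(max_calls):  # hard cap as a safety net
        web_pages = fetch_page(offset, count).get("webPages", {})
        batch = web_pages.get("value", [])
        if not batch:
            break  # nothing came back, so later offsets are pointless
        results.extend(batch)
        offset += len(batch)
        # Stop once the sum exceeds the estimate from the most recent call.
        if offset > web_pages.get("totalEstimatedMatches", 0):
            break
    return results


def fake_fetch(offset: int, count: int) -> dict[str, Any]:
    """Made-up service: 120 results, filtering leaves count - 5 per call."""
    corpus = [{"url": f"https://example.com/{i}"} for i in range(120)]
    batch = corpus[offset:offset + count - 5]
    # The estimate is huge on the first call and shrinks afterwards.
    estimate = 1_000_000 if offset == 0 else 120
    return {"webPages": {"totalEstimatedMatches": estimate, "value": batch}}


print(len(collect_all(fake_fetch)))  # 120
```

Because offset advances by len(batch) rather than by count, no results are skipped even though every call returns fewer than count.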

Note that you can see the same behavior on bing.com directly with a query such as https://www.bing.com/search?q=bill+gates&count=50. You will get around 34 results with a totalEstimatedMatches of about 567,000 (valid as of June 2017; future searches may differ). If you click the 'next page' arrow, the next query starts at an offset of 34, the number returned by the first query (i.e. https://www.bing.com/search?q=bill+gates&count=50&first=34). If you click 'next' several more times, you may also see totalEstimatedMatches change from page to page.
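To reproduce the browser experiment, the two URLs quoted above can be built with urllib.parse.urlencode, which encodes spaces as '+'. A minimal sketch; bing_page_url is a hypothetical helper, and it simply copies the answer's use of first as the running offset rather than asserting how bing.com interprets that parameter.

```python
from urllib.parse import urlencode


def bing_page_url(query: str, count: int, first: int = 0) -> str:
    """Build a bing.com results URL in the form quoted in the answer."""
    params: dict[str, object] = {"q": query, "count": count}
    if first:  # the first-page URL in the answer omits "first"
        params["first"] = first
    return "https://www.bing.com/search?" + urlencode(params)


print(bing_page_url("bill gates", 50))
# https://www.bing.com/search?q=bill+gates&count=50
print(bing_page_url("bill gates", 50, first=34))
# https://www.bing.com/search?q=bill+gates&count=50&first=34
```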




Answer 2:


This seems to be expected behavior. The Web Search API is not a crawler API, so it only delivers results that the algorithms deem relevant to a human. Simply put, most people won't skim through more than a few pages of results; furthermore, they expect to find relevant results on the first page.

If you could retrieve the results in the millions, you could simply copy their search index and Bing would be out of business.

Search indices seem to be instruments of political and economic power. As far as I know, there are only four relevant search indices worldwide: Google's, Microsoft's (Bing), one from Russia, and one from China. Those who control the search control the Spice... ;-)



Source: https://stackoverflow.com/questions/46735916/whats-the-expected-behavior-of-the-bing-search-api-v5-when-deeply-paginating
