db.collection.count() returns a lot more documents for sharded collection in MongoDB

Submitted by 旧巷老猫 on 2019-12-09 13:15:26

Question


I have 2 shards, each a replica set with 3 instances. When I call count() on a sharded collection, I get far more than the real number of documents (a difference of more than 2.5 million). The same happens when I run find() and increment a counter in a forEach() loop.

How do I know the real number of documents? First, I know the growth trend, and the collection cannot have grown this drastically. Second, when I count documents with the following M/R script, I get what I assume is the real number. I use this script to find duplicate documents. The number of duplicates is several thousand, not millions, and the count on test_duplicate_collection minus the duplicates gives me the real number of documents.

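// Map: emit each document's doc_id with a count of 1.
var map = function () {
   emit(this.doc_id, 1);
};

// Reduce: sum the counts emitted for the same doc_id.
var reduce = function (key, values) {
   var result = 0;
   values.forEach(function (value) {
     result += value;
   });

   return result;
};

// Write one output document per distinct doc_id; a value > 1 marks a duplicate.
db.test_collection.mapReduce(map, reduce, "test_duplicate_collection");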
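
To make the duplicate arithmetic concrete, here is a minimal in-memory sketch in plain JavaScript of what the M/R job above computes. It does not talk to MongoDB; countByDocId, summarize, and the sample data are hypothetical names invented for illustration only.

```javascript
// In-memory sketch of the map/reduce job: count occurrences per doc_id.
// (Hypothetical helper, not part of the MongoDB shell.)
function countByDocId(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.doc_id, (counts.get(doc.doc_id) || 0) + 1);
  }
  return counts;
}

// Summarize the per-doc_id counts.
function summarize(counts) {
  let total = 0;          // all documents, duplicates included
  let duplicatedIds = 0;  // doc_ids that occur more than once
  for (const n of counts.values()) {
    total += n;
    if (n > 1) duplicatedIds += 1;
  }
  return {
    total,
    distinct: counts.size,        // one entry per distinct doc_id
    duplicatedIds,
    surplus: total - counts.size, // extra copies beyond the first
  };
}

// Made-up sample data: doc_id "a" appears three times.
const sampleDocs = [
  { doc_id: "a" }, { doc_id: "b" }, { doc_id: "a" },
  { doc_id: "c" }, { doc_id: "a" },
];
const summary = summarize(countByDocId(sampleDocs));
console.log(summary);
```

In this sketch, distinct corresponds to the number of documents the job writes to test_duplicate_collection, and surplus is the number of extra copies beyond the first of each doc_id.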

Now, I understand that during balancing, chunks being transferred to another shard may not yet have been deleted from the source shard. But sh.status() shows that all chunks are evenly distributed. I also tried pausing write operations to see whether the count would settle after some time, but nothing changed.

You might say that deletion of moved chunks is still in progress, and indeed, when I first started using sharding, I saw slight decreases in the count for the sharded collection (with no write operations). Currently, however, the count does not change over time. I also tried orphanage.js, hoping to find orphaned documents (using the script from https://groups.google.com/forum/#!topic/mongodb-user/OKH5_KDO04I), but none were found.

My question is: what could cause count() and find().forEach() to report more than the real number of documents (i.e. compared with the M/R script)?
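
For readers unfamiliar with orphaned documents: the MongoDB documentation notes that count() on a sharded cluster can be inaccurate when orphaned documents exist or a chunk migration is in progress. The toy model below is only a sketch of that idea, not MongoDB's implementation. The shard layout, the key ranges, and the names ownsKey, rawCount, and filteredCount are all assumptions made up for illustration.

```javascript
// Toy model: each shard stores some shard-key values and knows which
// half-open key ranges [min, max) it currently owns.
// Key 70 was migrated to shardB but never deleted from shardA (an orphan).
const shards = [
  { name: "shardA", owns: [[0, 50]], keys: [5, 12, 49, 70] },
  { name: "shardB", owns: [[50, 100]], keys: [55, 70, 99] },
];

// True if the shard currently owns the range containing this key.
function ownsKey(shard, key) {
  return shard.owns.some(([min, max]) => key >= min && key < max);
}

// Sums whatever each shard physically holds, orphans included.
function rawCount(allShards) {
  return allShards.reduce((sum, s) => sum + s.keys.length, 0);
}

// Counts only keys that fall inside a range the shard owns.
function filteredCount(allShards) {
  return allShards.reduce(
    (sum, s) => sum + s.keys.filter((k) => ownsKey(s, k)).length,
    0
  );
}

console.log(rawCount(shards), filteredCount(shards));
```

The gap between the two numbers in this model is the number of orphaned copies. Whether this is what happens in a given cluster has to be checked against the cluster itself.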

Appreciate your help.

EDIT1

There was a problem with the replica set configuration in one of the shards. Specifically, no master had been set in the configuration file. In the MMS dashboard, the host that the other replica set members listened to always showed as Slave instead of Primary. After we fixed this, the forEach loop count started to match the number of documents from the M/R script above. So the only remaining problem is with count() itself.

In the MongoDB JIRA I found the following unresolved bug with count() in a sharded environment: https://jira.mongodb.org/browse/SERVER-3645. However, it really concerns count() during balancing, i.e. count() may include chunks that the balancer is currently moving. As a workaround, the ticket proposes adding a query that is always true. I tried that as well, but it still returns the same count as before.


Answer 1:


Try using the slower (but apparently more accurate) .itcount()
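
In the shell this would be called as db.test_collection.find().itcount(), which counts by exhausting the cursor rather than asking the server for a count. The sketch below shows that counting style in plain JavaScript so it can run without a server. countByIteration and arrayCursor are hypothetical stand-ins written for this example, assuming only that a cursor exposes hasNext() and next().

```javascript
// Counts documents by pulling every one of them through a cursor-like
// object. This is slower than a server-side count because every document
// is actually fetched.
function countByIteration(cursor) {
  let n = 0;
  while (cursor.hasNext()) {
    cursor.next();
    n += 1;
  }
  return n;
}

// Hypothetical stand-in for a shell cursor, backed by an array.
function arrayCursor(items) {
  let i = 0;
  return {
    hasNext: () => i < items.length,
    next: () => items[i++],
  };
}

const iterated = countByIteration(
  arrayCursor([{ doc_id: 1 }, { doc_id: 2 }, { doc_id: 3 }])
);
console.log(iterated);
```

This matches the observation in EDIT1 that the find().forEach() counter agreed with the M/R result once the replica set was fixed, since both approaches read the documents themselves.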



Source: https://stackoverflow.com/questions/17557696/db-collection-count-returns-a-lot-more-documents-for-sharded-collection-in-mon
