Duplicate documents on _id (in mongo)

Posted by 亡梦爱人 on 2019-12-01 20:10:25

Question


I have a sharded mongo collection with over 1.5 million documents. I use the _id field as the shard key, and the values in this field are integers (rather than ObjectIds).

I do a lot of write operations on this collection, using the Perl driver (insert, update, remove, save) and mongoimport.

My problem is that somehow, I have duplicate documents on the same _id. From what I've read, this shouldn't be possible.

I've removed the duplicates, but others still appear.

Do you have any idea where they could come from, or what I should start looking at? (I've also tried to reproduce this on a smaller test collection, but no duplicates are inserted, no matter which write operation I perform.)


Answer 1:


This actually isn't a problem with the Perl driver; it is related to the characteristics of sharding. MongoDB can only enforce uniqueness among the documents located on a single shard at the time of creation, so the default index does not require uniqueness.

In the MongoDB: Configuring Sharding documentation there is specific mention that:

  • When you shard a collection, you must specify the shard key. If there is data in the collection, mongo will require an index to be created upfront (it speeds up the chunking process); otherwise, an index will be automatically created for you.

  • You can use the {unique: true} option to ensure that the underlying index enforces uniqueness so long as the unique index is a prefix of the shard key.

  • If the "unique: true" option is not used, the shard key does not have to be unique.
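The point about uniqueness being checked per shard can be pictured with a small toy model. This is plain JavaScript, not MongoDB code: the `Shard` class and the names `shardA` and `shardB` are hypothetical, and a `Map` stands in for each shard's local index on _id. The sketch does not model routing or how a second copy reaches another shard; it only shows that a shard-local check cannot see documents stored elsewhere.

```javascript
// Toy model only: each shard keeps its own local index on _id.
class Shard {
  constructor(name) {
    this.name = name;
    this.docs = new Map(); // _id -> document; stands in for a shard-local index
  }

  insert(doc) {
    // The check only looks at documents stored on this shard.
    if (this.docs.has(doc._id)) {
      throw new Error(`duplicate key ${doc._id} on ${this.name}`);
    }
    this.docs.set(doc._id, doc);
  }
}

const shardA = new Shard("shardA");
const shardB = new Shard("shardB");

shardA.insert({ _id: 42, name: "first copy" });

// Same shard: the local check rejects the second copy.
let rejected = false;
try {
  shardA.insert({ _id: 42, name: "second copy" });
} catch (err) {
  rejected = true;
}

// Different shard: shardB's index knows nothing about shardA's document.
shardB.insert({ _id: 42, name: "second copy" });

// What a read across both shards would see.
const merged = [...shardA.docs.values(), ...shardB.docs.values()];
const copies = merged.filter((doc) => doc._id === 42).length;

console.log({ rejected, copies });
```

In this model the duplicate is rejected on the shard that already holds the _id, yet two copies exist once both shards are read together, which matches the symptom described in the question.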




Answer 2:


How have you implemented generating the integer Ids?

If you use a system like the one suggested on the MongoDB website, you should be fine. For reference:

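// Atomically increment and return the next value of the named counter.
function counter(name) {
    var ret = db.counters.findAndModify({
         query: {_id: name},
         update: {$inc: {next: 1}},
         "new": true,    // return the document as it is after the increment
         upsert: true    // create the counter document on first use
    });

    return ret.next;
}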

db.users.insert({_id:counter("users"), name:"Sarah C."}) // _id : 1
db.users.insert({_id:counter("users"), name:"Bob D."}) // _id : 2

If you are generating your IDs by reading the most recent record in the document store, incrementing the number in the Perl code, and then inserting with the incremented number, you could be running into timing issues: two writers can read the same latest value before either of them inserts.
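The timing issue can be sketched with an in-memory simulation. This is a toy under stated assumptions, not driver code: `readThenIncrement` and `atomicCounter` are hypothetical helper names, a plain array stands in for the collection (and, unlike a real unique index, accepts duplicates), and the worst-case interleaving is hard-coded so that every client reads before any client writes.

```javascript
// Racy pattern: read the highest _id, then insert highest + 1 as a separate step.
function readThenIncrement(store, names) {
  // Phase 1: every client reads the current highest _id before anyone writes.
  const planned = names.map((name) => {
    const highest = store.reduce((max, doc) => Math.max(max, doc._id), 0);
    return { _id: highest + 1, name };
  });
  // Phase 2: every client inserts the _id it computed from its stale read.
  planned.forEach((doc) => store.push(doc));
  return store.map((doc) => doc._id);
}

// Counter pattern: increment and read happen as one step per client,
// which is the role findAndModify with $inc plays in the snippet above.
function atomicCounter(store, counters, names) {
  names.forEach((name) => {
    counters.next += 1;
    store.push({ _id: counters.next, name });
  });
  return store.map((doc) => doc._id);
}

const racyIds = readThenIncrement([], ["Sarah C.", "Bob D."]);
const safeIds = atomicCounter([], { next: 0 }, ["Sarah C.", "Bob D."]);

console.log(racyIds); // [ 1, 1 ]
console.log(safeIds); // [ 1, 2 ]
```

Both clients in the racy version compute the same _id because neither sees the other's insert. The counter version hands out distinct values because no client can read between another client's increment and its result.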



Source: https://stackoverflow.com/questions/11241819/duplicate-documents-on-id-in-mongo
