How to determine if $addToSet actually added a new item into a MongoDB document or if the item already existed?

Posted by 落花浮王杯 on 2019-12-24 12:43:36

Question


I'm using the C# driver (v1.8.3 from NuGet), and I'm having a hard time determining whether an $addToSet/upsert operation actually added a NEW item to the given array, or whether the item already existed.

Adding a new item can happen in two cases: either the document didn't exist at all and was just created by the upsert, or the document existed but the array didn't exist or didn't contain the given item.

The reason I need this is that I have large sets of data to load into MongoDB, and the load may (shouldn't, but may) break during processing. If that happens, I need to be able to start again from the beginning without duplicating downstream processing (that is, keep processing idempotent). In my flow, if an item is determined to be newly added, I queue up downstream processing of that item; if it is determined to already be in the document, no more downstream work is required. My issue is that the result always says that the call modified one document, even if the item already existed in the array and nothing was actually modified.

Based on my understanding of the C# driver API, I should be able to make the call with WriteConcern.Acknowledged and then check WriteConcernResult.DocumentsAffected to see whether it actually updated a document.

My issue is that in all cases, the write concern result reports that 1 document was updated. :/

Here is an example document that my code is calling $addToSet on, which may or may not have this specific item in the "items" list to start with:

{
    "_id" : "some-id-that-we-know-wont-change",
    "items" : [
        {
            "s" : 4,
            "i" : "some-value-we-know-is-static"
        }
    ]
}

My query always uses an _id value which is known based on the processing metadata:

var query = new QueryDocument
{
     {"_id", "some-id-that-we-know-wont-change"}                       
};

My update is as follows:

var result = mongoCollection.Update(query, new UpdateDocument()
{
     {                                                
          "$addToSet", new BsonDocument()
               {
                    { "items", new BsonDocument()
                         {
                              { "s", 4 },
                              { "i", "some-value-we-know-is-static" }                                                                            
                          } 
                    }
               }
     }
}, new MongoUpdateOptions() { Flags = UpdateFlags.Upsert, WriteConcern = WriteConcern.Acknowledged }); 

if(result.DocumentsAffected > 0 || result.UpdatedExisting)
{
     //DO SOME POST PROCESSING WORK THAT SHOULD ONLY HAPPEN ONCE PER ITEM                                                
}

If I run this code once on an empty collection, the document is added and the response is as expected (DocumentsAffected = 1, UpdatedExisting = false). If I run it again (any number of times), the document remains unchanged, but the result is now unexpected (DocumentsAffected = 1, UpdatedExisting = true).

Shouldn't this be returning DocumentsAffected = 0 if the document is unchanged?

As we need to do many millions of these calls a day, I'm hesitant to turn this logic into multiple calls per item (first checking if the item exists in the given documents array, and then adding/queuing or just skipping) if at all possible.

Is there some way to get this working in a single call?


Answer 1:


What you are doing here is checking the response, which indicates whether a document was updated, inserted, or neither. That is your best indicator: for an $addToSet to have performed an update, the document itself must have been updated.
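As a rough illustration of reading the response, here is a minimal sketch in JavaScript. It assumes a result object shaped like the Node.js driver's UpdateResult (matchedCount, modifiedCount, upsertedCount), as reported by servers that use write commands (MongoDB 2.6 and later). That is not the v1.8.3 C# API from the question, and the helper names classifyAddToSetResult and needsDownstreamProcessing are hypothetical.

```javascript
// Hypothetical helper: classify the outcome of an upsert that uses $addToSet.
// Assumes a result object with matchedCount, modifiedCount and upsertedCount,
// as returned by the Node.js driver's updateOne() on MongoDB 2.6+.
function classifyAddToSetResult(result) {
  if (result.upsertedCount > 0) return "inserted";  // document created by the upsert
  if (result.modifiedCount > 0) return "added";     // existing document, new set member
  if (result.matchedCount > 0) return "unchanged";  // matched, but the item was already there
  return "no-match";                                // nothing matched and nothing was upserted
}

// Hypothetical helper: only brand-new items should trigger downstream work.
function needsDownstreamProcessing(result) {
  const outcome = classifyAddToSetResult(result);
  return outcome === "inserted" || outcome === "added";
}
```

The result passed in would come from a call such as collection.updateOne(filter, { $addToSet: { items: item } }, { upsert: true }). Whether the counters are populated this way depends on the server and driver version in use.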

The $addToSet operator itself cannot produce duplicates, that is the nature of the operator. But you may indeed have some problems with your logic:

{                                                
      "$addToSet", new BsonDocument()
           {
                { "items", new BsonDocument()
                     {
                          { "id", item.Id },
                          { "v", item.Value } 
                     }
                }
           }
 }

So you are showing that an item in your "set" is composed of two fields. If that content varies in any way (i.e. same id but different value), then the item is a "unique" member of the set and will be added. There is no way, for instance, for the $addToSet operator to refuse new values based purely on the "id" as a unique identifier. You would have to implement that in code.
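The whole-element comparison can be illustrated with a small in-memory model. This is only a sketch of the semantics, not the server's implementation: JSON.stringify stands in for BSON document equality (same fields, same order, same values), and the function names are made up for the example.

```javascript
// Illustrative in-memory model of $addToSet's duplicate check, NOT server code.
// Two embedded documents count as equal only if they have the same fields,
// in the same order, with the same values. JSON.stringify is an assumed
// stand-in for BSON comparison and only works for plain JSON-like values.
function sameElement(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Returns the resulting array and whether the candidate was appended.
function addToSet(items, candidate) {
  if (items.some((existing) => sameElement(existing, candidate))) {
    return { items, added: false };
  }
  return { items: [...items, candidate], added: true };
}

const start = [{ id: 123, v: 10 }];
const sameItem = addToSet(start, { id: 123, v: 10 });     // exact match: skipped
const changedValue = addToSet(start, { id: 123, v: 11 }); // same id, different v: appended
```

Here sameItem.added is false and changedValue.added is true, which mirrors the point that a matching "id" alone does not make an element a duplicate.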

A second possibility here for a form of duplicate is that your query portion is not correctly finding the document that has to be updated. The result of this would be creating a new document that contains only the newly specified member in the "set". So a common usage mistake is something like this:

db.collection.update(
    {
        "id": "ABC",
        // Mistake: the query also requires the item to already be present
        "items": { "$elemMatch": {
            "id": 123, "v": 10
        }}
    },
    {
        "$addToSet": {
            "items": {
                "id": 123, "v": 10
            }
        }
    },
    { "upsert": true }
)

Whenever the existing document does not already contain the specified element in the "set", the query matches nothing, so the upsert creates a new document instead of updating the existing one. The correct implementation is to not check for the presence of the "set" member in the query and to let $addToSet do the work.
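The difference between the two query shapes can be seen in a toy in-memory model. This is a sketch under stated assumptions: upsertAddToSet is a hypothetical helper (not a driver or shell API), the filter is a plain predicate, and the seed object stands for the equality fields an upsert copies from the query into a new document.

```javascript
// Toy in-memory upsert, for illustration only (hypothetical helper, not a driver API).
// `filter` is a predicate; `seed` holds the fields a newly upserted document starts with.
function upsertAddToSet(docs, filter, seed, item) {
  const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);
  let doc = docs.find(filter);
  if (!doc) {
    doc = { ...seed, items: [] }; // no match: the upsert inserts a new document
    docs.push(doc);
  }
  if (!doc.items.some((e) => same(e, item))) doc.items.push(item);
  return docs;
}

const item = { id: 123, v: 10 };
const hasItem = (d) => d.items.some((e) => e.id === item.id && e.v === item.v);

// Mistake: the filter also requires the item to be present already.
const wrong = upsertAddToSet([{ id: "ABC", items: [] }],
  (d) => d.id === "ABC" && hasItem(d), { id: "ABC" }, item);

// Intended: match on the document id only and let $addToSet deduplicate.
const right = upsertAddToSet([{ id: "ABC", items: [] }],
  (d) => d.id === "ABC", { id: "ABC" }, item);
```

In this model, wrong ends up holding two documents with the same "id", while right still holds one document whose items array now contains the new member.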

If indeed you do have true duplicate entries occurring in the "set" where all elements of the sub-document are exactly the same, then it has been caused by some other code either present or in the past.

If you are sure that new entries are being created, look through the code for instances of $push, or indeed any array manipulation in code that seems to be acting on the same field.

But if you are using the operator correctly then $addToSet does exactly what it is intended to do.



Source: https://stackoverflow.com/questions/22971821/how-to-determine-if-addtoset-actually-added-a-new-item-into-a-mongodb-document
