Find and Replace Strings in Documents Efficiently

Posted by 久未见 on 2019-12-04 12:22:28

Question


I have the following queries to find &nbsp; tags in a name field and replace them with an empty string, to get rid of them.
Name strings can contain one or more &nbsp; tags, e.g.

AA&nbsp;aa
AA&nbsp;&nbsp;aa
AA&nbsp;&nbsp;&nbsp;aa
AA&nbsp;&nbsp;&nbsp;&nbsp;aa
AA&nbsp;AA&nbsp;aaaaaaaa

... like that.

db.tests.find({'name':/.*&nbsp;.*/}).forEach(function(test){
    test.name = test.name.replace("&nbsp;","");
    db.tests.save(test);
});

db.tests.find({'name':/.*&nbsp;&nbsp;.*/}).forEach(function(test){
    test.name = test.name.replace("&nbsp;&nbsp;","");
    db.tests.save(test);
});

db.tests.find({'name':/.*&nbsp;&nbsp;&nbsp;.*/}).forEach(function(test){
    test.name = test.name.replace("&nbsp;&nbsp;&nbsp;","");
    db.tests.save(test);
});

Other than repeating the same query pattern, is there a better solution to handle this scenario, in terms of less duplication and higher performance?


Answer 1:


If all you want to do is strip the &nbsp; entities from your text, then you just do a global match and replace:

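// The query only needs to test for presence, so no "g" flag is needed here
db.tests.find({ "name": /&nbsp;/ }).forEach(function(doc) {
    // The "g" flag makes replace() remove every occurrence, not just the first
    doc.name = doc.name.replace(/&nbsp;/g,"");
    db.tests.update({ "_id": doc._id },{ "$set": { "name": doc.name } });
});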

So there is no need to write out every combination: with the /g option the regex replaces every match. The /m option only changes how ^ and $ behave, so it is not required for this pattern even if your "name" string contains newline characters.
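The difference the /g option makes can be checked outside the database. The following is a minimal plain JavaScript sketch; the sample string and variable names are made up for illustration:

```javascript
// Hypothetical sample value containing three &nbsp; entities
const sample = "AA&nbsp;&nbsp;aa&nbsp;AA";

// A string pattern removes only the first occurrence
const firstOnly = sample.replace("&nbsp;", "");

// A regex with the g flag removes every occurrence
const allGone = sample.replace(/&nbsp;/g, "");

console.log(firstOnly); // AA&nbsp;aa&nbsp;AA
console.log(allGone);   // AAaaAA
```

This is why a single pass with the global regex covers names with any number of entities.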

It is also recommended to use $set in order to only modify the field(s) you really want to rather than .save() the whole document back. There is less traffic and less chance of overwriting changes that might have been made by another process since the document was read.
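To make the overwrite risk concrete, here is a toy in-memory model in plain JavaScript. It is not MongoDB: `store`, `save`, `setFields` and the `views` field are hypothetical stand-ins, used only to show how writing back a stale copy of a whole document can discard a concurrent change while a field-level write does not.

```javascript
// Toy in-memory "collection" keyed by _id (an illustration, not MongoDB)
const store = { 1: { _id: 1, name: "AA&nbsp;aa", views: 0 } };

// Whole-document overwrite, loosely analogous to .save()
const save = (doc) => { store[doc._id] = { ...doc }; };
// Field-level write, loosely analogous to { $set: ... }
const setFields = (id, fields) => { Object.assign(store[id], fields); };

// Case 1: read a copy, another writer changes "views", then save the stale copy
const staleCopy = { ...store[1] };
store[1].views = 5;                                  // concurrent change
staleCopy.name = staleCopy.name.replace(/&nbsp;/g, "");
save(staleCopy);
const viewsAfterSave = store[1].views;               // 0: the concurrent change is lost

// Case 2: same sequence, but write back only the "name" field
store[1] = { _id: 1, name: "AA&nbsp;aa", views: 0 };
const staleCopy2 = { ...store[1] };
store[1].views = 5;                                  // concurrent change
setFields(1, { name: staleCopy2.name.replace(/&nbsp;/g, "") });
const viewsAfterSet = store[1].views;                // 5: the concurrent change survives

console.log(viewsAfterSave, viewsAfterSet); // 0 5
```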

Ideally you would use the Bulk Operations API with MongoDB versions 2.6 and greater. This allows the updates to "batch" so there is again less traffic between the client and the server:

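var bulk = db.tests.initializeOrderedBulkOp();
var count = 0;

// The query only needs to test for presence, so no "g" flag is needed here
db.tests.find({ "name": /&nbsp;/ }).forEach(function(doc) {
    doc.name = doc.name.replace(/&nbsp;/g,"");
    // Queue the update rather than sending it immediately
    bulk.find({ "_id": doc._id })
        .updateOne({ "$set": { "name": doc.name } });
    count++;

    // Send every 1000 queued updates, then start a fresh batch
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.tests.initializeOrderedBulkOp();
    }
});

// Send whatever is left over from the last partial batch
if ( count % 1000 != 0 )
    bulk.execute();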
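The flush-every-1000-plus-remainder pattern used above can be exercised without a database. The sketch below is plain JavaScript with a hypothetical `makeBatcher` helper standing in for the bulk builder; it only records how many operations each flush would send.

```javascript
// Hypothetical helper: queues operations and hands them to `flush` in batches
function makeBatcher(batchSize, flush) {
  let pending = [];
  return {
    add(op) {
      pending.push(op);
      // A full batch is sent immediately and a fresh one is started
      if (pending.length === batchSize) {
        flush(pending);
        pending = [];
      }
    },
    finish() {
      // Send the last partial batch, if there is one
      if (pending.length > 0) {
        flush(pending);
        pending = [];
      }
    }
  };
}

// Record the size of each flush instead of talking to a server
const flushed = [];
const batcher = makeBatcher(1000, (ops) => flushed.push(ops.length));

for (let i = 0; i < 2500; i++) {
  batcher.add({ _id: i });
}
batcher.finish();

console.log(flushed); // [ 1000, 1000, 500 ]
```

With 2500 queued updates, the server sees three round trips instead of 2500.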

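Those are your primary ways to improve this. In the MongoDB versions this answer targets (2.6 and thereabouts), an update statement cannot use a field's existing value as part of its update expression, so looping on the client is the only option there; later releases (4.2 and up) lift this restriction by accepting an aggregation pipeline as the update. Either way, you can do a lot to reduce the number of operations, as shown.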
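On a server new enough for pipeline-style updates, the loop can in principle collapse into a single statement. The sketch below assumes MongoDB 4.4 or later, where the `$replaceAll` aggregation operator is available; it is not part of the original answer. The `db` object exists only inside the mongo shell, so the call is guarded and the snippet stays runnable elsewhere.

```javascript
// Match documents whose name contains at least one &nbsp; entity
const filter = { name: /&nbsp;/ };

// Pipeline-style update (assumes MongoDB 4.4+): rewrite "name" from its own current value
const pipeline = [
  {
    $set: {
      name: { $replaceAll: { input: "$name", find: "&nbsp;", replacement: "" } }
    }
  }
];

// `db` is defined only in the mongo shell; skip the call anywhere else
if (typeof db !== "undefined") {
  db.tests.updateMany(filter, pipeline);
}
```

The replacement then happens on the server, so no documents have to travel to the client and back.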



Source: https://stackoverflow.com/questions/28866930/find-and-replace-strings-in-documents-efficiently
