mongodb move documents from one collection to another collection

感情败类 2020-11-30 22:10

How can documents be moved from one collection to another collection in MongoDB? For example: I have a lot of documents in

15 Answers
  •  自闭症患者
    2020-11-30 23:02

    I planned to archive 1000 records at a time using the bulk insert and bulk remove methods of pymongo.

    For both source and target

    1. Create MongoDB objects to connect to the database.

    2. Instantiate the bulk objects. Note: I created backups of the bulk objects too. These help me roll back the insertions or removals when an error occurs. Example:

      For source:

          # replace this with MongoDB object creation logic
          source_db_obj = db_help.create_db_obj(source_db, source_col)
          source_bulk = source_db_obj.initialize_ordered_bulk_op()
          source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()

      For target:

          # replace this with MongoDB object creation logic
          target_db_obj = db_help.create_db_obj(target_db, target_col)
          target_bulk = target_db_obj.initialize_ordered_bulk_op()
          target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()

    3. Obtain the source records that match the filter criteria:

          source_find_results = source_db_obj.find(filter)

    4. Loop through the source records:

      Create the target and source bulk operations.

      Append an archived_at field with the current datetime, then queue the insert into the target collection and the removal from the source collection:

          # replace this with the logic to obtain the UTC time
          doc['archived_at'] = db_help.getUTCTime()
          target_bulk.insert(doc)
          source_bulk.find({'_id': doc['_id']}).remove_one()

      For rollback in case of any errors or exceptions, create the target_bulk_bak and source_bulk_bak operations:

          target_bulk_bak.find({'_id': doc['_id']}).remove_one()
          # remove the extra field before queuing the re-insert
          doc.pop('archived_at', None)
          source_bulk_bak.insert(doc)

    5. When the record count reaches 1000, execute the target bulk insertion and the source bulk removal. Note: this method takes the target_bulk and source_bulk objects for execution.

          execute_bulk_insert_remove(source_bulk, target_bulk)

    6. When an exception occurs, execute the target_bulk_bak removals and the source_bulk_bak insertions. This rolls back the changes. Since MongoDB doesn't have rollback, I came up with this hack:

      execute_bulk_insert_remove(source_bulk_bak, target_bulk_bak)

    7. Finally, re-initialize the source and target bulk and bulk_bak objects. This is necessary because each bulk object can be executed only once.

    8. Complete code

          from datetime import datetime

          from pymongo.errors import BulkWriteError

          # db_help and logger are helper objects defined elsewhere by the author

          def execute_bulk_insert_remove(source_bulk, target_bulk):
              try:
                  target_bulk.execute()
                  source_bulk.execute()
              except BulkWriteError as bwe:
                  raise Exception(
                      "could not archive documents, reason: {}".format(bwe.details))
      
          def archive_bulk_immediate(filter, source_db, source_col, target_db, target_col):
              """
              filter: filter criteria for backup
              source_db: source database name
              source_col: source collection name
              target_db: target database name
              target_col: target collection name
              """
              count = 0
              bulk_count = 1000
      
              source_db_obj = db_help.create_db_obj(source_db, source_col)
              source_bulk = source_db_obj.initialize_ordered_bulk_op()
              source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()
      
              target_db_obj = db_help.create_db_obj(target_db, target_col)
              target_bulk = target_db_obj.initialize_ordered_bulk_op()
              target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()
      
              source_find_results = source_db_obj.find(filter)
      
              start = datetime.now()
      
              for doc in source_find_results:
                  doc['archived_at'] = db_help.getUTCTime()
      
                  target_bulk.insert(doc)
                  source_bulk.find({'_id': doc['_id']}).remove_one()
                  target_bulk_bak.find({'_id': doc['_id']}).remove_one()
                  doc.pop('archived_at', None)
                  source_bulk_bak.insert(doc)
      
                  count += 1
      
                  if count % bulk_count == 0:
                      logger.info("count: {}".format(count))
                      try:
                          execute_bulk_insert_remove(source_bulk, target_bulk)
                      except BulkWriteError as bwe:
                          execute_bulk_insert_remove(source_bulk_bak, target_bulk_bak)
                          logger.info("Bulk Write Error: {}".format(bwe.details))
                          raise
      
                      source_bulk = source_db_obj.initialize_ordered_bulk_op()
                      source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()
      
                      target_bulk = target_db_obj.initialize_ordered_bulk_op()
                      target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()
      
              # flush any remaining (< bulk_count) queued operations
              if count % bulk_count != 0:
                  execute_bulk_insert_remove(source_bulk, target_bulk)

              end = datetime.now()
      
              logger.info("archived {} documents to {} in {}".format(
                  count, target_col, end - start))
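
    A note on the API: initialize_ordered_bulk_op() belongs to PyMongo's old Bulk API, which was deprecated in PyMongo 3.5 and removed in PyMongo 4.0. Below is a minimal sketch of the same batch-move idea using the current bulk_write() API; the function and parameter names here are my own placeholders, and the collection objects are assumed to come from a standard MongoClient:

    ```python
    from datetime import datetime, timezone

    from pymongo import InsertOne, DeleteOne
    from pymongo.errors import BulkWriteError


    def chunked(cursor, size):
        """Yield lists of up to `size` documents from a cursor or iterable."""
        batch = []
        for doc in cursor:
            batch.append(doc)
            if len(batch) == size:
                yield batch
                batch = []
        if batch:
            yield batch


    def archive_bulk(source_col, target_col, criteria, batch_size=1000):
        """Move documents matching `criteria` from source_col to target_col,
        stamping each copy with an archived_at timestamp."""
        moved = 0
        for batch in chunked(source_col.find(criteria), batch_size):
            for doc in batch:
                doc['archived_at'] = datetime.now(timezone.utc)
            try:
                # insert the copies first, then remove the originals
                target_col.bulk_write([InsertOne(d) for d in batch], ordered=True)
                source_col.bulk_write(
                    [DeleteOne({'_id': d['_id']}) for d in batch], ordered=True)
            except BulkWriteError as bwe:
                raise Exception(
                    "could not archive documents, reason: {}".format(bwe.details))
            moved += len(batch)
        return moved
    ```

    Like the answer above, this is not transactional: a crash between the two bulk_write calls can leave a document present in both collections. On MongoDB 4.0+ replica sets, the two calls could instead be wrapped in a client.start_session() / session.start_transaction() block, avoiding the hand-rolled rollback entirely.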
      
