Fast or Bulk Upsert in pymongo

Asked by 逝去的感伤 on 2020-11-29 02:07

How can I do a bulk upsert in pymongo? I want to update a bunch of entries, and doing them one at a time is very slow.

The answer to an almost identical question is here.
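
For anyone on a current PyMongo release, bulk_write with UpdateOne(..., upsert=True) is the supported way to batch upserts into one round trip per batch. A minimal sketch; the server address, database/collection names, and the documents are illustrative placeholders:

    from pymongo import MongoClient, UpdateOne

    client = MongoClient()  # assumes a local mongod; adjust the URI as needed
    collection = client['database']['collection']

    # Illustrative documents to upsert; in practice these come from your data source.
    docs = [{'_id': i, 'value': i * 2} for i in range(10000)]

    # One UpdateOne per document with upsert=True, sent as a single unordered
    # bulk_write instead of 10000 individual update calls.
    requests = [
        UpdateOne(
            {'_id': doc['_id']},
            {'$set': {'value': doc['value']}},
            upsert=True,
        )
        for doc in docs
    ]
    result = collection.bulk_write(requests, ordered=False)
    print(result.upserted_count, result.modified_count)

With ordered=False the server keeps processing the remaining operations even if some of them fail, which is usually what you want for idempotent upserts.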

6 Answers
  •  小蘑菇 (OP)
    2020-11-29 03:10

    Fastest bulk update with Python 3.5+, motor and asyncio:

    import asyncio
    import datetime
    import logging
    import random
    import time
    
    import motor.motor_asyncio
    import pymongo.errors
    
    
    async def execute_bulk(bulk):
        # Run one unordered bulk operation; log write errors instead of raising.
        try:
            await bulk.execute()
        except pymongo.errors.BulkWriteError as err:
            logging.error(err.details)
    
    
    async def main():
        cnt = 0
        bulk = db.initialize_unordered_bulk_op()
        tasks = []
        async for document in db.find({}, {}, no_cursor_timeout=True):
            cnt += 1
            bulk.find({'_id': document['_id']}).update({'$set': {"random": random.randint(0,10)}})
            if not cnt % 1000:
                # Dispatch every 1000 queued operations and start a fresh builder.
                task = asyncio.ensure_future(execute_bulk(bulk))
                tasks.append(task)
                bulk = db.initialize_unordered_bulk_op()
        if cnt % 1000:
            # Flush the final, partially filled batch through the same helper.
            task = asyncio.ensure_future(execute_bulk(bulk))
            tasks.append(task)
        logging.info('%s processed', cnt)
        await asyncio.gather(*tasks)
    
    
    logging.basicConfig(level='INFO')    
    db = motor.motor_asyncio.AsyncIOMotorClient()['database']['collection']
    start_time = time.time()
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        execution_time = time.time() - start_time
        logging.info('Execution time: %s', datetime.timedelta(seconds=execution_time))
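
    A caveat: the loop above only updates existing documents. The same builder supports upserts by chaining .upsert() before .update(...). In PyMongo 4.x and recent Motor the initialize_*_bulk_op builders were removed, so the equivalent batching goes through bulk_write; a rough sketch of main() in that style, keeping the module-level db collection, the "random" field, and the batch size of 1000 from the code above:

    import random

    from pymongo import UpdateOne


    async def main():
        requests = []
        async for document in db.find({}, {}, no_cursor_timeout=True):
            requests.append(UpdateOne(
                {'_id': document['_id']},
                {'$set': {'random': random.randint(0, 10)}},
                upsert=True,
            ))
            if len(requests) == 1000:
                # One unordered round trip per 1000 queued operations.
                await db.bulk_write(requests, ordered=False)
                requests = []
        if requests:
            # Flush the final, partially filled batch.
            await db.bulk_write(requests, ordered=False)

    This version awaits each batch inline instead of gathering tasks; the asyncio.gather pattern above works just as well with these coroutines.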
    
