Make Pandas DataFrame apply() use all cores?

陌清茗 2020-11-27 09:52

As of August 2017, Pandas DataFrame.apply() is unfortunately still limited to working with a single core, meaning that a multi-core machine will waste the majority of its compute time.

6 answers
  •  天涯浪人
    2020-11-27 10:33

    To use all (physical or logical) cores, you could try mapply as an alternative to swifter and pandarallel.

    You can set the number of cores (and the chunking behaviour) upon init:

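    import pandas as pd
    import mapply
    
    # n_workers=-1 (the default): use all physical CPUs
    mapply.init(n_workers=-1)
    
    # ... (build your DataFrame `df` and define `myfunc` here)
    
    # parallel counterpart of df.apply(myfunc, axis=1)
    df.mapply(myfunc, axis=1)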
    

    By default (n_workers=-1), the package uses all physical CPUs available on the system. If your system uses hyper-threading (in which case usually twice as many logical CPUs as physical CPUs show up), mapply will spawn one extra worker to prioritise the multiprocessing pool over other processes on the system.
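
    The worker-count rule described above can be sketched as follows. This only illustrates the rule as stated in this answer and is not mapply's actual source; `default_n_workers` and `detect_n_workers` are hypothetical names, and psutil (a third-party package) is treated as optional, with `os.cpu_count()` supplying the logical CPU count.

```python
import os


def default_n_workers(n_physical, n_logical):
    """Hypothetical sketch of the rule described above (not mapply's source):
    use all physical CPUs, plus one extra worker when hyper-threading
    makes more logical CPUs than physical CPUs visible."""
    if n_physical is None:  # psutil returns None when it cannot tell
        n_physical = n_logical
    if n_logical > n_physical:  # hyper-threading detected
        return n_physical + 1
    return n_physical


def detect_n_workers():
    """Detect CPU counts on this machine and apply the rule above."""
    n_logical = os.cpu_count() or 1  # logical CPUs, hyper-threads included
    try:
        import psutil  # optional third-party dependency
        n_physical = psutil.cpu_count(logical=False)
    except ImportError:
        n_physical = None  # fall back to treating logical CPUs as physical
    return default_n_workers(n_physical, n_logical)


if __name__ == "__main__":
    print(detect_n_workers())
```

    Under this rule, a 4-core machine with hyper-threading enabled (8 logical CPUs) would get 5 workers.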

    Depending on your definition of "all your cores", you could also use all logical cores instead (beware that the CPU-bound processes will then compete for physical CPUs, which might slow down your operation):

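    import mapply
    import multiprocessing
    
    # number of logical CPUs (hyper-threads included)
    n_workers = multiprocessing.cpu_count()
    
    # or, more explicitly, via psutil
    import psutil
    n_workers = psutil.cpu_count(logical=True)
    
    mapply.init(n_workers=n_workers)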
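
    For comparison, the chunked multiprocessing idea behind helpers like mapply can be roughly sketched with only pandas and the standard library's `concurrent.futures`. This is a hypothetical helper, not mapply's implementation: `parallel_apply`, `split_rows`, `chunks_per_worker` and `add_cols` are illustrative names. The function you pass must be picklable (defined at module level, not a lambda).

```python
import os
from concurrent.futures import ProcessPoolExecutor

import pandas as pd


def split_rows(df, n_chunks):
    """Split df into at most n_chunks contiguous row blocks."""
    step = max(1, -(-len(df) // n_chunks))  # ceiling division
    return [df.iloc[i:i + step] for i in range(0, len(df), step)]


def _apply_chunk(args):
    # Runs inside a worker: plain row-wise apply on one chunk
    chunk, func = args
    return chunk.apply(func, axis=1)


def parallel_apply(df, func, n_workers=None, chunks_per_worker=4):
    """Hypothetical helper: row-wise apply spread over a process pool."""
    n_workers = n_workers or os.cpu_count() or 1  # default: logical CPUs
    if df.empty:
        return df.apply(func, axis=1)
    chunks = split_rows(df, n_workers * chunks_per_worker)
    tasks = [(chunk, func) for chunk in chunks]
    if n_workers == 1:
        # No pool needed: avoids process start-up and pickling overhead
        parts = [_apply_chunk(task) for task in tasks]
    else:
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            # map() yields results in submission order, so rows stay in order
            parts = list(pool.map(_apply_chunk, tasks))
    return pd.concat(parts)


def add_cols(row):
    # Example row-wise function; module-level so it can be pickled
    return row["a"] + row["b"]
```

    Call it as `parallel_apply(df, myfunc)` from inside an `if __name__ == "__main__":` block, so that platforms which spawn worker processes (Windows, macOS) do not re-run the pool setup on import. Using more chunks than workers (`chunks_per_worker`) helps balance rows of uneven cost, at the price of more pickling.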
    
