pandas multiprocessing apply

借酒劲吻你 2020-11-28 06:02

I'm trying to use multiprocessing with a pandas DataFrame, that is, split the DataFrame into 8 parts and apply some function to each part using apply (with each part processed in d

8 Answers
  •  甜味超标
    2020-11-28 06:35

    To use all (physical or logical) cores, you could try mapply as an alternative to swifter and pandarallel.

    You can set the number of cores (and the chunking behaviour) upon init:

    import pandas as pd
    import mapply
    
    mapply.init(n_workers=-1)
    
    def process_apply(x):
        # do some stuff to data here
        return x  # placeholder so the function body is valid; return the processed row
    
    def process(df):
        # spawns a pathos.multiprocessing.ProcessPool if sensible
        res = df.mapply(process_apply, axis=1)
        return res
    

    By default (n_workers=-1), the package uses all physical CPUs available on the system. If your system uses hyper-threading (in which case usually twice the number of physical CPUs shows up as logical CPUs), mapply will spawn one extra worker to prioritise the multiprocessing pool over other processes on the system.
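
To check whether hyper-threading is in play on your machine, you can compare the physical and logical core counts. This is only a minimal sketch: `count_cores` is a hypothetical helper name (not part of mapply), and it assumes psutil may or may not be installed, falling back to the logical count when the physical one is unknown.

```python
import os

try:
    import psutil  # third-party; optional in this sketch
except ImportError:
    psutil = None


def count_cores():
    """Return (physical, logical) core counts as integers >= 1."""
    logical = os.cpu_count() or 1
    physical = None
    if psutil is not None:
        # May return None when the platform cannot determine physical cores.
        physical = psutil.cpu_count(logical=False)
    # Fall back to the logical count if the physical count is unavailable.
    return physical or logical, logical


physical, logical = count_cores()
print(f"physical={physical}, logical={logical}, more logical than physical={logical > physical}")
```

If the two numbers differ, the machine exposes more logical than physical cores, which is the hyper-threading case described above.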

    You could also use all logical cores instead (beware that the CPU-bound processes will then compete for physical CPUs, which might slow down your operation):

    import multiprocessing
    n_workers = multiprocessing.cpu_count()
    
    # or more explicit
    import psutil
    n_workers = psutil.cpu_count(logical=True)
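
For comparison, the manual approach from the question (split the DataFrame into parts, apply a function to each part in its own process, then concatenate) can be sketched with the standard library plus pandas. This is not how mapply works internally. `split_frame`, `apply_to_chunk` and `parallel_apply` are hypothetical helper names, and `process_apply` here is a stand-in that assumes columns `a` and `b` exist.

```python
import multiprocessing

import pandas as pd


def process_apply(row):
    # Stand-in row function (assumption): replace with your own logic.
    return row["a"] + row["b"]


def split_frame(df, n_parts):
    # Cut the frame into up to n_parts contiguous, near-equal row slices;
    # empty slices are dropped when df has fewer rows than n_parts.
    bounds = [len(df) * i // n_parts for i in range(n_parts + 1)]
    return [df.iloc[lo:hi] for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def apply_to_chunk(chunk):
    # Runs inside a worker process; it is a top-level function so it can be pickled.
    return chunk.apply(process_apply, axis=1)


def parallel_apply(df, n_workers=8):
    # Assumes a non-empty DataFrame. pool.map returns results in input order,
    # so concatenating them restores the original row order.
    chunks = split_frame(df, n_workers)
    with multiprocessing.Pool(processes=n_workers) as pool:
        results = pool.map(apply_to_chunk, chunks)
    return pd.concat(results)
```

Call `parallel_apply(df, n_workers=8)` from under an `if __name__ == "__main__":` guard, since platforms that start workers with spawn (Windows, macOS) re-import the main module. For small frames the cost of starting processes and pickling chunks can outweigh the gain, so a plain `df.apply` may still be faster.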
    
