pandas multiprocessing apply

借酒劲吻你 2020-11-28 06:02

I'm trying to use multiprocessing with a pandas dataframe, that is, split the dataframe into 8 parts and apply some function to each part using apply (with each part processed in d

8 answers
  •  北荒 (original poster)
    2020-11-28 06:25

    A more generic version based on the author's solution, which can be run with any function and any dataframe:

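    from multiprocessing import Pool
    from functools import partial
    import numpy as np
    import pandas as pd
    
    def parallelize(data, func, num_of_processes=8):
        # Split the dataframe into one chunk per worker process.
        data_split = np.array_split(data, num_of_processes)
        pool = Pool(num_of_processes)
        # Run func on each chunk in parallel and stitch the results back together.
        data = pd.concat(pool.map(func, data_split))
        pool.close()
        pool.join()
        return data
    
    def run_on_subset(func, data_subset):
        # Apply func row by row to a single chunk.
        return data_subset.apply(func, axis=1)
    
    def parallelize_on_rows(data, func, num_of_processes=8):
        # partial binds func so that pool.map only has to pass the chunk.
        return parallelize(data, partial(run_on_subset, func), num_of_processes)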
    

    So the following line:

    df.apply(some_func, axis=1)
    

    becomes:

    parallelize_on_rows(df, some_func) 
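
    For reference, here is a minimal end-to-end sketch of how this might be used. The dataframe, its columns `a` and `b`, and the `row_total` function are made up for illustration. As a variation on the code above, the chunks are cut by splitting row positions and slicing with `iloc` rather than passing the dataframe itself to `np.array_split`, and the pool is managed with a `with` block. The row function is defined at module level because `multiprocessing` needs to pickle it, and the `if __name__ == "__main__":` guard is needed on platforms that start workers by spawning a fresh interpreter.

```python
from functools import partial
from multiprocessing import Pool

import numpy as np
import pandas as pd


def run_on_subset(func, data_subset):
    # Apply func row by row to a single chunk.
    return data_subset.apply(func, axis=1)


def parallelize_on_rows(data, func, num_of_processes=8):
    # Split the row positions (a plain ndarray), then slice the dataframe with iloc.
    positions = np.array_split(np.arange(len(data)), num_of_processes)
    chunks = [data.iloc[idx] for idx in positions]
    with Pool(num_of_processes) as pool:
        results = pool.map(partial(run_on_subset, func), chunks)
    # pool.map preserves chunk order, so concat restores the original row order.
    return pd.concat(results)


def row_total(row):
    # Hypothetical row function, defined at module level so it can be pickled.
    return row["a"] + row["b"]


if __name__ == "__main__":
    # Illustrative data only: 20 rows, two integer columns.
    df = pd.DataFrame({"a": range(20), "b": range(20, 40)})
    serial = df.apply(row_total, axis=1)
    parallel = parallelize_on_rows(df, row_total, num_of_processes=4)
    # Compare the parallel result with the plain df.apply result.
    print(parallel.equals(serial))
```

    This sketch assumes the dataframe has at least as many rows as there are processes. For a cheap row function like this one, the cost of starting processes and pickling chunks will likely outweigh any speedup, so it is worth timing both versions on real data.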
    
