Pandas df.iterrows() parallelization

长情又很酷 2020-12-02 10:02

I would like to parallelize the following code:

for row in df.iterrows():
    idx = row[0]
    k = row[1]['Chromosome']
    start, end = row[1]['Bin'].spl


        
3 Answers
  •  孤城傲影
    2020-12-02 10:41

    Consider using dask.dataframe, as shown in this answer to a similar question: https://stackoverflow.com/a/53923034/4340584

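    import dask.dataframe as ddf
    # the number of partitions is the number of cores you want to use
    df_dask = ddf.from_pandas(df, npartitions=4)
    # dask only supports row-wise apply, so axis=1 must be given; meta is the (name, dtype) of the result
    # compute() returns a pandas Series, which is stored on the original pandas frame
    df['output'] = df_dask.apply(your_function, axis=1, meta=('output', 'object')).compute(scheduler='processes')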
    
