I would like to parallelize the following code:
for row in df.iterrows():
    idx = row[0]
    k = row[1]['Chromosome']
    start, end = row[1]['Bin'].spl
Consider using dask.dataframe, as shown, for example, in this answer to a similar question: https://stackoverflow.com/a/53923034/4340584
import dask.dataframe as ddf

# the number of partitions is the number of cores you want to use
df_dask = ddf.from_pandas(df, npartitions=4)
# axis=1 applies your_function to each row; meta declares the name and dtype of the result
df['output'] = df_dask.apply(lambda x: your_function(x), axis=1, meta=('output', 'object')).compute(scheduler='multiprocessing')