python multiprocessing: write to same excel file

谁说我不能喝 提交于 2019-12-03 22:09:20

1) Why did you implement time.sleep in several places in your 2nd method?

In __main__, time.sleep(0.1), to give the started process a timeslice to startup.
In f2(fq, q), to give the queue a timeslice to flushed all buffered data to the pipe and as q.get_nowait() are used.
In w(q), are only for testing simulating long run of writer.to_excel(...), i removed this one.

2) What is the difference between pool.map and pool = [mp.Process( . )]?

Using pool.map needs no Queue, no parameter passed, shorter code. The worker_process have to return immediately the result and terminates. pool.map starts a new process as long as all iteration are done. The results have to be processed after that.

Using pool = [mp.Process( . )], starts n processes. A process terminates on queue.Empty

Can you think of a situation where you would prefer one method over the other?

Methode 1: Quick setup, serialized, only interested in the result to continue.
Methode 2: If you want to do all workload parallel.


You could't use global writer in processes.
The writer instance has to belong to one process.

Usage of mp.Pool, for instance:

def f1(k):
  # *** DO SOME STUFF HERE***
  results = pd.DataFrame(df_)
  return results

if __name__ == '__main__':
    pool = mp.Pool()
    results = pool.map(f1, range(len(list_of_days)))

    writer = pd.ExcelWriter('../test/myfile.xlsx', engine='xlsxwriter')
    for k, result in enumerate(results):
        result.to_excel(writer, sheet_name=list_of_days[k])

    writer.save()
    pool.close()

This leads to .to_excel(...) are called in sequence in the __main__ process.


If you want parallel .to_excel(...) you have to use mp.Queue().
For instance:

The worker process:

# mp.Queue exeptions have to load from
try:
    # Python3
    import queue
except:
    # Python 2
    import Queue as queue

def f2(fq, q):
    while True:
        try:
            k = fq.get_nowait()
        except queue.Empty:
            exit(0)

        # *** DO SOME STUFF HERE***

        results = pd.DataFrame(df_)
        q.put( (list_of_days[k], results) )
        time.sleep(0.1)  

The writer process:

def w(q):
    writer = pd.ExcelWriter('myfile.xlsx', engine='xlsxwriter')
    while True:
        try:
            titel, result = q.get()
        except ValueError:
            writer.save()
            exit(0)

        result.to_excel(writer, sheet_name=titel)

The __main__ process:

if __name__ == '__main__':
    w_q = mp.Queue()
    w_p = mp.Process(target=w, args=(w_q,))
    w_p.start()
    time.sleep(0.1)

    f_q = mp.Queue()
    for i in range(len(list_of_days)):
        f_q.put(i)

    pool = [mp.Process(target=f2, args=(f_q, w_q,)) for p in range(os.cpu_count()+1)]
    for p in pool:
        p.start()
        time.sleep(0.1)

    for p in pool:
        p.join()

    w_q.put('STOP')
    w_p.join()

Tested with Python:3.4.2 - pandas:0.19.2 - xlsxwriter:0.9.6

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!