What's the source of thread-unsafety in this usage of pandas.Series.reindex(, copy=True)?

大兔子大兔子 submitted on 2020-01-16 03:03:51

Question


Calling pd.Series.reindex is not thread safe (https://github.com/pandas-dev/pandas/issues/25870). My question is: why is Series.reindex, which returns a copy and looks like a functionally pure operation, not thread safe even when no one is writing to that object's data?

The operation I'm performing is:

s = pd.Series(...)
f(s)  # Success!

# Thread 1:
   while True: f(s)  

# Thread 2:
   while True: f(s)  # Exception !

... which fails when f(s) is s.reindex(..., copy=True).

So, why did the threaded call fail? I'm surprised by this, because if there were any thread-unsafe calls, such as populating the Series' index, I would have thought they would already have done their mutating work in the main thread.

Pandas does have an open issue that .copy is not threadsafe. However, the discussion there is around issues of people reading and writing to the object at the same time. https://github.com/pandas-dev/pandas/issues/2728

The maintainers marked the .reindex not-thread-safe issue as a duplicate of https://github.com/pandas-dev/pandas/issues/2728. I'm suspicious that it has the same cause, but if .copy is the source, then I suspect almost all of pandas is not thread safe in any situation, even for 'functionally pure' operations.

import traceback
import pandas as pd
import numpy as np
from multiprocessing.pool import ThreadPool

def f(arg):
    s,idx = arg
    try:
        # s.loc[idx].values   # No problem
        s.reindex(idx) # Fails
    except Exception:
        traceback.print_exc()
    return None


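def gen_args(n=10000):
    a = np.arange(0, 3000000)
    for i in range(n):
        if i % 1000 == 0:
            # Every 1000 iterations, build a fresh Series and call f once
            # on it before handing it to the worker threads.
            # print("?", i)
            s = pd.Series(data=a, index=a)
            f((s, a))  # <<< LOOK. IT WORKS HERE!!!
        yield s, np.arange(0, 1000)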

# for arg in gen_args():
#     f(arg)   # Works fine

t = ThreadPool(4)
for result in t.imap(f, gen_args(), chunksize=1):
    # print("==>", result)
    pass
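
For comparison, here is a minimal sketch of a conservative workaround. It does not explain the root cause. It simply assumes the failure comes from some state shared between threads inside the Series or its index, and serializes every reindex call on the shared object with a lock. The names _reindex_lock and locked_reindex are hypothetical helpers for illustration, not part of pandas, and the array sizes are reduced so the example runs quickly.

```python
import threading
from multiprocessing.pool import ThreadPool

import numpy as np
import pandas as pd

# Hypothetical helper (not a pandas API): one lock shared by all workers.
_reindex_lock = threading.Lock()


def locked_reindex(arg):
    s, idx = arg
    # Only one thread at a time is allowed to call reindex on the shared Series.
    with _reindex_lock:
        return s.reindex(idx)


a = np.arange(0, 3000)
s = pd.Series(data=a, index=a)
idx = np.arange(0, 1000)

pool = ThreadPool(4)
try:
    # Every task receives the same Series object, as in the question.
    results = pool.map(locked_reindex, [(s, idx)] * 20)
finally:
    pool.close()
    pool.join()
```

The lock removes any parallelism from the reindex step itself, so this only trades speed for safety; whether a narrower fix is possible depends on what exactly is being mutated, which is the open question here.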

Source: https://stackoverflow.com/questions/55347139/whats-the-source-of-thread-unsafety-in-this-usage-of-pandas-series-reindex-co
