Numpy vectorization messes up data type (2)

小蘑菇 2020-12-21 06:22

I'm seeing unwanted behaviour from np.vectorize: it changes the data type of the argument passed into the original function. My original question

3 answers

心在旅途 (original poster)

2020-12-21 07:16

I think @rpanai's answer on the original post is still the best. Here are my tests:

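import numpy as np
import pandas as pd

def qualifies(dt, excluded_months=()):
    # Reject dates in the first four days of the month.
    if dt.day < 5:
        return False
    # Reject dates fewer than five days before the next month begins.
    if (dt + pd.tseries.offsets.MonthBegin(1) - dt).days < 5:
        return False
    if dt.month in excluded_months:
        return False
    return True

def new_qualifies(dt, excluded_months=()):
    # Same checks, but convert the argument to a pd.Timestamp first.
    dt = pd.Timestamp(dt)
    if dt.day < 5:
        return False
    if (dt + pd.tseries.offsets.MonthBegin(1) - dt).days < 5:
        return False
    if dt.month in excluded_months:
        return False
    return True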
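To illustrate why the pd.Timestamp(dt) conversion matters, here is a minimal sketch. It assumes the function is handed a numpy.datetime64 scalar, which is one way the argument type can differ from pd.Timestamp; the names raw, failed and converted_result are only for illustration.

```python
import numpy as np
import pandas as pd

def qualifies(dt, excluded_months=()):
    if dt.day < 5:
        return False
    if (dt + pd.tseries.offsets.MonthBegin(1) - dt).days < 5:
        return False
    if dt.month in excluded_months:
        return False
    return True

def new_qualifies(dt, excluded_months=()):
    # Normalise whatever arrives into a pd.Timestamp before using .day / .month.
    return qualifies(pd.Timestamp(dt), excluded_months)

# Assumed input: a raw numpy scalar rather than a pd.Timestamp.
raw = np.datetime64('2020-01-15')

try:
    qualifies(raw, [3, 8])
    failed = False
except AttributeError:
    # numpy.datetime64 has no .day attribute, so the unconverted version breaks.
    failed = True

converted_result = new_qualifies(raw, [3, 8])
print(failed, converted_result)
```

The conversion adds only a small per-call cost, which fits the near-identical timings of the two apply runs.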

df = pd.DataFrame({'date': pd.date_range('2020-01-01', freq='7D', periods=12000)})

apply method:

%%timeit
df['qualifies1'] = df['date'].apply(lambda x: qualifies(x, [3, 8]))

385 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

conversion method:

%%timeit
df['qualifies1'] = df['date'].apply(lambda x: new_qualifies(x, [3, 8]))

389 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

vectorized code:

%%timeit
df['qualifies2'] = np.logical_not(
    (df['date'].dt.day < 5).values
    | ((df['date'] + pd.tseries.offsets.MonthBegin(1) - df['date']).dt.days < 5).values
    | df['date'].dt.month.isin([3, 8]).values
)

4.83 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
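A speed-up is only useful if both approaches agree, so here is a small sanity check. It is a sketch rather than part of the original benchmark: the 200-row frame is an arbitrary smaller size, and the name same is only for illustration.

```python
import numpy as np
import pandas as pd

def qualifies(dt, excluded_months=()):
    # Same rules as the scalar function in the post.
    if dt.day < 5:
        return False
    if (dt + pd.tseries.offsets.MonthBegin(1) - dt).days < 5:
        return False
    if dt.month in excluded_months:
        return False
    return True

# Smaller frame than the benchmark; 200 rows is an arbitrary choice.
df = pd.DataFrame({'date': pd.date_range('2020-01-01', freq='7D', periods=200)})

# Row-by-row version.
df['qualifies1'] = df['date'].apply(lambda x: qualifies(x, [3, 8]))

# Column-wise version: the same three conditions, combined with | and negated.
df['qualifies2'] = np.logical_not(
    (df['date'].dt.day < 5).values
    | ((df['date'] + pd.tseries.offsets.MonthBegin(1) - df['date']).dt.days < 5).values
    | df['date'].dt.month.isin([3, 8]).values
)

# True when every row matches between the two columns.
same = bool((df['qualifies1'] == df['qualifies2']).all())
print(same)
```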
