BaseIndexer still broken with .count() and .min()?

大城市里の小女人 submitted on 2021-01-07 02:48:43

Question

I use a custom window indexer (a BaseIndexer subclass) in pandas. This works fine for aggregations such as .mean() and .sum(). As I understand it, other aggregations such as .count() and .min() used to have problems with custom indexers, but those should have been fixed. As far as I can see, .count() currently uses the internal roll_count function. But I still don't get the expected results:

import numpy as np
import pandas as pd

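# Window = the most recent k values, where k is the largest multiple of *modulo*
# that fits into the values seen so far:
class ModuloIndexer(pd.api.indexers.BaseIndexer):
    # pandas >= 1.5 also passes a `step` argument and rejects indexers whose
    # get_window_bounds signature differs from BaseIndexer's; add step=None there.
    def get_window_bounds(self, num_values, min_periods, center, closed):
        end = np.arange(1, num_values + 1, dtype=np.int64)
        start = end % self.modulo
        return start, end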

s = pd.Series(2 ** np.arange(8))  # [1,   2,   4,   8,  16,  32,  64, 128]
r = s.rolling(ModuloIndexer(s.index, modulo=4))
print(r.sum())         # Correct:   [0,   0,   0,  15,  30,  60, 120, 255]
print(r.apply(len))    # Correct:   [0,   0,   0,   4,   4,   4,   4,   8]
print(r.count())       # Weird:   [nan, nan, nan,   1,   1,   1,   1,   2]
print(r.apply(np.min)) # Correct: [nan, nan, nan,   1,   2,   4,   8,   1]
print(r.min())         # Weird:   [nan, nan, nan,   8,   8,   8,   8,   8]
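
As a cross-check that bypasses pandas' rolling kernels entirely, each window can be evaluated from scratch using the bounds the indexer returns. This is a NumPy-only sketch; modulo_bounds and reference_count_and_min are hypothetical helper names, and NaNs are skipped the way count() and min() skip them. It reproduces the apply(len) and apply(np.min) columns above. It also makes visible that start falls back from 3 to 0 at positions 3 and 7, so these windows do not slide forward monotonically; whether that is what trips up .count() and .min() is only a guess, not something this check confirms.

```python
import numpy as np


def modulo_bounds(num_values, modulo):
    # Same arithmetic as ModuloIndexer.get_window_bounds in the question.
    end = np.arange(1, num_values + 1, dtype=np.int64)
    start = end % modulo
    return start, end


def reference_count_and_min(values, start, end):
    # Evaluate each window values[start[i]:end[i]] from scratch, skipping NaNs
    # like pandas' count() and min() do; empty windows give count 0, min NaN.
    values = np.asarray(values, dtype=float)
    counts, mins = [], []
    for s, e in zip(start, end):
        window = values[s:e]
        valid = window[~np.isnan(window)]
        counts.append(valid.size)
        mins.append(valid.min() if valid.size else np.nan)
    return np.array(counts), np.array(mins)


values = 2.0 ** np.arange(8)  # [1, 2, 4, 8, 16, 32, 64, 128]
start, end = modulo_bounds(len(values), modulo=4)
print(start)  # -> [1 2 3 0 1 2 3 0]; start drops back to 0 at positions 3 and 7
counts, mins = reference_count_and_min(values, start, end)
print(counts)
print(mins)
```

Recomputing every window from scratch is O(n * window) and only meant as a reference to compare pandas' output against, not as a replacement for rolling().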

Am I doing something wrong, or is this a bug I should report?

PS: Use apply(len) as a workaround only when there are no NaNs, because len counts NaN entries as well!
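
A NaN-safe variant of that workaround is to count the non-NaN values inside each window with a raw=True apply. This is a sketch, not a confirmed fix for the underlying issue; nan_safe_count and _modulo_bounds are illustrative names. Because pandas 1.5 added a step parameter to get_window_bounds and rejects subclasses whose signature differs from BaseIndexer's, the class below picks its signature at import time so it should work on either side of that change. min_periods=0 is passed explicitly, assuming the apply path then still calls the function on empty or all-NaN windows.

```python
import inspect

import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer

# pandas >= 1.5 added a `step` parameter to get_window_bounds and rejects
# subclasses whose signature differs from BaseIndexer's, so detect it here.
_HAS_STEP = "step" in inspect.signature(BaseIndexer.get_window_bounds).parameters


def _modulo_bounds(num_values, modulo):
    # Same window bounds as the question's ModuloIndexer.
    end = np.arange(1, num_values + 1, dtype=np.int64)
    return end % modulo, end


class ModuloIndexer(BaseIndexer):
    if _HAS_STEP:
        def get_window_bounds(self, num_values=0, min_periods=None,
                              center=None, closed=None, step=None):
            return _modulo_bounds(num_values, self.modulo)
    else:
        def get_window_bounds(self, num_values=0, min_periods=None,
                              center=None, closed=None):
            return _modulo_bounds(num_values, self.modulo)


def nan_safe_count(window):
    # With raw=True each window arrives as a NumPy array; count non-NaN entries.
    return np.count_nonzero(~np.isnan(window))


s = pd.Series([1.0, np.nan, 4.0, 8.0, 16.0, np.nan, 64.0, 128.0])
# min_periods=0: windows are not skipped for having too few non-NaN values.
r = s.rolling(ModuloIndexer(s.index, modulo=4), min_periods=0)
counts = r.apply(nan_safe_count, raw=True)
naive = r.apply(len, raw=True)  # len also counts the NaN entries
print(counts.tolist())
print(naive.tolist())
```

On this sample series the NaN-safe count should come out as 0, 0, 0, 3, 3, 3, 3, 6, while apply(len) gives 0, 0, 0, 4, 4, 4, 4, 8; the difference is exactly the two NaNs at positions 1 and 5.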

Source: https://stackoverflow.com/questions/64984049/baseindexer-still-broken-with-count-and-min
