Can numpy bincount work with 2D arrays?

前端未结

关注

 2  1266

I am seeing behaviour with numpy bincount that I cannot make sense of. I want to bin the values in a 2D array in a row-wise manner and see the behaviour below. Why would i

相关标签:

2条回答

南笙

2020-12-03 17:34

The problem is that bincount isn't always returning the same shaped objects, in particular when values are missing. For example:

>>> m = np.array([[0,0,1],[1,1,0],[1,1,1]])
>>> np.apply_along_axis(np.bincount, 1, m)
array([[2, 1],
       [1, 2],
       [0, 3]])
>>> [np.bincount(m[i]) for i in range(m.shape[1])]
[array([2, 1]), array([1, 2]), array([0, 3])]

works, but:

>>> m = np.array([[0,0,0],[1,1,0],[1,1,0]])
>>> m
array([[0, 0, 0],
       [1, 1, 0],
       [1, 1, 0]])
>>> [np.bincount(m[i]) for i in range(m.shape[1])]
[array([3]), array([1, 2]), array([1, 2])]
>>> np.apply_along_axis(np.bincount, 1, m)
Traceback (most recent call last):
  File "<ipython-input-49-72e06e26a718>", line 1, in <module>
    np.apply_along_axis(np.bincount, 1, m)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 117, in apply_along_axis
    outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (1)

won't.

You could use the minlength parameter and pass it using a lambda or partial or something:

>>> np.apply_along_axis(lambda x: np.bincount(x, minlength=2), axis=1, arr=m)
array([[3, 0],
       [1, 2],
       [1, 2]])

0 讨论(0)

我在风中等你

2020-12-03 17:48

As @DSM has already mentioned, bincount of a 2d array cannot be done without knowing the maximum value of the array, because it would mean an inconsistency of array sizes.

But thanks to the power of numpy's indexing, it was fairly easy to make a faster implementation of 2d bincount, as it doesn't use concatenation or anything.

def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = np.arange(len(arr))
    for col in arr.T:
        count[indexing, col] += 1
    return count


t = np.array([[1,2,3],[4,5,6],[3,2,2]], dtype=np.int64)
print(bincount2d(t))

P.S.

This:

t = np.empty(shape=[10000, 100], dtype=np.int64)
s = time.time()
bincount2d(t)
e = time.time()
print(e - s)

gives ~2 times faster result, than this:

t = np.empty(shape=[100, 10000], dtype=np.int64)
s = time.time()
bincount2d(t)
e = time.time()
print(e - s)

because of the for loop iterating over columns. So, it's better to transpose your 2d array, if shape[0] < shape[1].

UPD

Better than this can't be done (using python alone, I mean):

def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = (np.ones_like(arr).T * np.arange(len(arr))).T
    np.add.at(count, (indexing, arr), 1)

    return count

0 讨论(0)