Can numpy bincount work with 2D arrays?

生来不讨喜 2020-12-03 17:21

I am seeing behaviour with numpy bincount that I cannot make sense of. I want to bin the values in a 2D array in a row-wise manner and see the behaviour below. Why would i

2 answers
  • 2020-12-03 17:34

    The problem is that bincount does not always return arrays of the same shape: the length of its output depends on the largest value in the input, so a row that lacks the larger values produces a shorter result. For example:

    >>> m = np.array([[0,0,1],[1,1,0],[1,1,1]])
    >>> np.apply_along_axis(np.bincount, 1, m)
    array([[2, 1],
           [1, 2],
           [0, 3]])
    >>> [np.bincount(m[i]) for i in range(m.shape[0])]
    [array([2, 1]), array([1, 2]), array([0, 3])]
    

    works, but:

    >>> m = np.array([[0,0,0],[1,1,0],[1,1,0]])
    >>> m
    array([[0, 0, 0],
           [1, 1, 0],
           [1, 1, 0]])
    >>> [np.bincount(m[i]) for i in range(m.shape[0])]
    [array([3]), array([1, 2]), array([1, 2])]
    >>> np.apply_along_axis(np.bincount, 1, m)
    Traceback (most recent call last):
      File "<ipython-input-49-72e06e26a718>", line 1, in <module>
        np.apply_along_axis(np.bincount, 1, m)
      File "/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 117, in apply_along_axis
        outarr[tuple(i.tolist())] = res
    ValueError: could not broadcast input array from shape (2) into shape (1)
    

    won't.

    You can use the minlength parameter, passing it through a lambda or functools.partial, so that every row yields the same number of bins:

    >>> np.apply_along_axis(lambda x: np.bincount(x, minlength=2), axis=1, arr=m)
    array([[3, 0],
           [1, 2],
           [1, 2]])
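
The same idea can be sketched with functools.partial instead of a lambda. This is only an illustration: the names n_bins and count_row are made up here, and it assumes the array holds non-negative integers so that the largest value plus one is a safe minlength.

```python
from functools import partial

import numpy as np

m = np.array([[0, 0, 0],
              [1, 1, 0],
              [1, 1, 0]])

# Assumption: values are non-negative integers, so the largest value
# plus one gives enough bins for every row.
n_bins = int(m.max()) + 1

# partial fixes minlength so each row's result has the same length.
count_row = partial(np.bincount, minlength=n_bins)
counts = np.apply_along_axis(count_row, 1, m)
print(counts)
```

Deriving minlength from the data avoids hard-coding 2, at the cost of one extra pass over the array.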
    
  • 2020-12-03 17:48

    As @DSM has already mentioned, a bincount over a 2d array cannot be done without knowing the maximum value of the array, because otherwise the rows would produce results of inconsistent sizes.

    But thanks to numpy's integer array indexing, it is fairly easy to write a faster 2d bincount that avoids concatenation and per-row function calls:

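    import numpy as np


    def bincount2d(arr, bins=None):
        # Count occurrences of each non-negative integer, row by row.
        if bins is None:
            bins = np.max(arr) + 1  # one bin per value in 0..max
        count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
        indexing = np.arange(len(arr))  # row indices 0..n_rows-1
        # Each pass handles one column, so every (row, value) pair is
        # unique within a pass and the in-place += is safe.
        for col in arr.T:
            count[indexing, col] += 1
        return count


    t = np.array([[1, 2, 3], [4, 5, 6], [3, 2, 2]], dtype=np.int64)
    print(bincount2d(t))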
    

    P.S.

    This:

    import time

    # np.empty leaves the contents uninitialized; np.zeros gives defined values.
    t = np.zeros(shape=[10000, 100], dtype=np.int64)
    s = time.time()
    bincount2d(t)
    e = time.time()
    print(e - s)
    

    runs about 2 times faster than this:

    import time

    # np.zeros instead of np.empty, so the values are defined.
    t = np.zeros(shape=[100, 10000], dtype=np.int64)
    s = time.time()
    bincount2d(t)
    e = time.time()
    print(e - s)
    

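    because the Python-level for loop runs once per column, so the number of iterations equals shape[1]. The loop version is therefore at its best when shape[0] > shape[1]. If shape[0] < shape[1], working on the transposed array is faster, but keep in mind that this counts along the other axis.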
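
A slightly more controlled way to compare the two shapes is sketched below. It is not the measurement quoted above: it uses timeit, smaller arrays, and random values in 0..9 (all assumptions chosen for illustration), and the function is repeated under the hypothetical name bincount2d_loop so the snippet runs on its own.

```python
import timeit

import numpy as np


def bincount2d_loop(arr, bins=None):
    # Same loop-over-columns approach as the function above.
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = np.arange(len(arr))
    for col in arr.T:
        count[indexing, col] += 1
    return count


rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
tall = rng.integers(0, 10, size=(2000, 50))  # few columns: 50 loop passes
wide = rng.integers(0, 10, size=(50, 2000))  # many columns: 2000 loop passes

t_tall = timeit.timeit(lambda: bincount2d_loop(tall), number=5)
t_wide = timeit.timeit(lambda: bincount2d_loop(wide), number=5)
print(f"tall: {t_tall:.4f} s, wide: {t_wide:.4f} s")
```

The actual ratio depends on the machine, the NumPy version, and the array sizes, so treat any single run as indicative only.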

    Update

    I doubt this can be improved much further (staying in Python with numpy, I mean):

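    import numpy as np


    def bincount2d(arr, bins=None):
        if bins is None:
            bins = np.max(arr) + 1
        count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
        # Row index of every element, with the same shape as arr.
        indexing = np.broadcast_to(np.arange(len(arr))[:, None], arr.shape)
        # np.add.at accumulates correctly even when an index pair repeats.
        np.add.at(count, (indexing, arr), 1)
        return count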
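
For comparison, another common approach (not the one used in this answer) is to shift each row into its own block of bin ids and call np.bincount once on the flattened array. The sketch below uses the hypothetical name bincount2d_offset and assumes non-negative integer input whose values are all smaller than bins; a value at or above bins would spill into the next row's block.

```python
import numpy as np


def bincount2d_offset(arr, bins=None):
    # Hypothetical helper name; assumes non-negative integer input.
    arr = np.asarray(arr)
    if bins is None:
        bins = int(arr.max()) + 1
    n_rows = arr.shape[0]
    # Shift row i by i * bins so every row gets its own block of bin ids.
    shifted = arr + bins * np.arange(n_rows)[:, None]
    flat = np.bincount(shifted.ravel(), minlength=n_rows * bins)
    return flat.reshape(n_rows, bins)


t = np.array([[1, 2, 3], [4, 5, 6], [3, 2, 2]], dtype=np.int64)
counts = bincount2d_offset(t)
print(counts)
```

Whether this beats np.add.at on a given workload is worth measuring rather than assuming.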
    