Sum array by number in numpy

别跟我提以往 2020-11-30 08:51

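Assuming I have a NumPy array like [1,2,3,4,5,6] and another array [0,0,1,2,2,1], I want to sum the items in the first array by group (the labels in the second array) and obtain one sum per group, i.e. n sums for n groups (for this example: 1+2 = 3 for group 0, 3+6 = 9 for group 1, 4+5 = 9 for group 2).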

  •  失恋的感觉
    2020-11-30 09:33

    There's more than one way to do this, but here's one way:

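    import numpy as np
    data = np.arange(1, 7)               # [1, 2, 3, 4, 5, 6]
    groups = np.array([0,0,1,2,2,1])     # group label for each element of data
    
    unique_groups = np.unique(groups)    # sorted distinct labels: [0, 1, 2]
    sums = []
    for group in unique_groups:
        # Boolean mask selects the elements of data that belong to this group
        sums.append(data[groups == group].sum())
    # sums now holds 3, 9, 9, in the order of unique_groups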
    

    You can vectorize this so that there's no for loop at all, but I'd recommend against it: it becomes unreadable, and the approach I'd use needs a couple of 2D temporary arrays (one row per group, one column per element), which can take a lot of memory if you have a lot of data.

    Edit: Here's one way to vectorize this entirely. Keep in mind that this may (and likely will) be slower than the version above. (And there may be a better way to vectorize this, but it's late and I'm tired, so this is just the first thing to pop into my head...)

    However, keep in mind that this is a bad example... You're really better off (both in terms of speed and readability) with the loop above...

    import numpy as np
    data = np.arange(1, 7)
    groups = np.array([0,0,1,2,2,1])
    
    unique_groups = np.unique(groups)
    
    # Forgive the bad naming here...
    # I can't think of more descriptive variable names at the moment...
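    # x[i, j] == groups[j] and y[i, j] == unique_groups[i]; both have shape (n_groups, n_data)
    x, y = np.meshgrid(groups, unique_groups)
    # One copy of data per group (row), matching the shape of x and y
    data_stack = np.tile(data, (unique_groups.size, 1))
    
    # Keep data[j] in row i only where element j belongs to group i
    in_group = x == y
    data_in_group = np.zeros_like(data_stack)
    data_in_group[in_group] = data_stack[in_group]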
    sums = data_in_group.sum(axis=1)
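
    A different vectorized route, not the one used in this answer, avoids the 2D temporaries entirely: `np.bincount` with its `weights` argument adds up the weights that share each non-negative integer label in a single call. This is a minimal sketch. The string labels in the second half are hypothetical, added only to show that `np.unique(..., return_inverse=True)` can first map arbitrary labels to the 0..n-1 integer indices that `bincount` requires.

```python
import numpy as np

data = np.arange(1, 7)                  # [1, 2, 3, 4, 5, 6]
groups = np.array([0, 0, 1, 2, 2, 1])   # non-negative integer labels, as in the question

# bincount sums the weights that share each label; entry i is the sum for label i.
# The result is float64 because weights are used.
sums = np.bincount(groups, weights=data)
print(sums)  # [3. 9. 9.]

# Hypothetical non-integer labels: map them to 0..n-1 indices first.
labels = np.array(["b", "b", "x", "a", "a", "x"])
# unique_labels is sorted; inverse[k] is the position of labels[k] in unique_labels
unique_labels, inverse = np.unique(labels, return_inverse=True)
label_sums = np.bincount(inverse, weights=data)
print(dict(zip(unique_labels.tolist(), label_sums.tolist())))  # {'a': 9.0, 'b': 3.0, 'x': 9.0}
```

    Because `weights` is given, the sums come back as float64; call `.astype(data.dtype)` if you need integers. With integer labels, any label between 0 and `groups.max()` that never occurs simply gets a sum of 0.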
    
