Assuming I have a numpy array like: [1,2,3,4,5,6] and another array: [0,0,1,2,2,1] I want to sum the items in the first array by group (the second array) and obtain n-groups
This is a vectorized method of doing this sum based on the implementation of numpy.unique. According to my timings it is up to 500 times faster than the loop method and up to 100 times faster than the histogram method.
def sum_by_group(values, groups):
order = np.argsort(groups)
groups = groups[order]
values = values[order]
values.cumsum(out=values)
index = np.ones(len(groups), 'bool')
index[:-1] = groups[1:] != groups[:-1]
values = values[index]
groups = groups[index]
values[1:] = values[1:] - values[:-1]
return values, groups