Suppose I have a numpy array like [1,2,3,4,5,6] and another array [0,0,1,2,2,1]. I want to sum the items in the first array grouped by the corresponding values in the second array, obtaining one sum per group (n groups).
If the groups are indexed by consecutive integers, you can abuse the numpy.histogram() function to get the result:
import numpy

data = numpy.arange(1, 7)
groups = numpy.array([0, 0, 1, 2, 2, 1])
# One bin per group: the edges run from the smallest group to one past the largest.
sums = numpy.histogram(groups,
                       bins=numpy.arange(groups.min(), groups.max() + 2),
                       weights=data)[0]
# array([3, 9, 9])
This will avoid any Python loops.
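Note that the groups only need to be consecutive, not zero-based. A quick sketch with shifted (made-up) labels:

import numpy

data = numpy.arange(1, 7)
groups = numpy.array([5, 5, 6, 7, 7, 6])  # consecutive, but not starting at 0
sums = numpy.histogram(groups,
                       bins=numpy.arange(groups.min(), groups.max() + 2),
                       weights=data)[0]
# array([3, 9, 9]) -- one sum per group 5, 6, 7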
I noticed the numpy tag, but in case you don't mind using pandas, this task becomes a one-liner:
import pandas as pd
import numpy as np
data = np.arange(1, 7)
groups = np.array([0, 0, 1, 2, 2, 1])
df = pd.DataFrame({'data': data, 'groups': groups})
So df then looks like this:
   data  groups
0     1       0
1     2       0
2     3       1
3     4       2
4     5       2
5     6       1
Now you can use the functions groupby() and sum():
print(df.groupby(['groups'], sort=False).sum())
which gives you the desired output:
        data
groups
0          3
1          9
2          9
By default, groupby() would sort the group keys, so I use the flag sort=False, which might improve speed for huge dataframes.
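If you want the result back as a plain numpy array rather than a DataFrame, a minimal follow-up (assuming the df from above; to_numpy() needs a reasonably recent pandas, .values works on older versions):

sums = df.groupby('groups', sort=False)['data'].sum().to_numpy()
# array([3, 9, 9])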
You're all wrong! The best way to do it is:
import numpy as np

a = [1, 2, 3, 4, 5, 6]
ix = [0, 0, 1, 2, 2, 1]
accum = np.zeros(np.max(ix) + 1)
np.add.at(accum, ix, a)  # unbuffered in-place add: repeated indices accumulate
print(accum)
# [3. 9. 9.]
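The unbuffered np.add.at() matters here: the seemingly equivalent fancy-indexing accum[ix] += a applies only one update per repeated index. A quick sketch of the pitfall:

import numpy as np

a = [1, 2, 3, 4, 5, 6]
ix = [0, 0, 1, 2, 2, 1]
wrong = np.zeros(np.max(ix) + 1)
wrong[ix] += a  # buffered: only the last write per index survives
print(wrong)
# [2. 6. 5.]  -- not the group sums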
A pure Python implementation:
from operator import itemgetter
from collections import defaultdict

l = [1, 2, 3, 4, 5, 6]
g = [0, 0, 1, 2, 2, 1]

def group_sum(l, g):
    # accumulate each value into its group's running total
    groups = defaultdict(int)
    for li, gi in zip(l, g):
        groups[gi] += li
    # sums ordered by group key
    return list(map(itemgetter(1), sorted(groups.items())))

print(group_sum(l, g))
# [3, 9, 9]
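For comparison (a sketch, not part of the original answer), itertools.groupby can do the same after sorting the pairs by group key; the defaultdict version above avoids that sort:

from itertools import groupby
from operator import itemgetter

l = [1, 2, 3, 4, 5, 6]
g = [0, 0, 1, 2, 2, 1]

pairs = sorted(zip(g, l))  # group key first, so sorting clusters the groups
sums = [sum(v for _, v in grp) for _, grp in groupby(pairs, key=itemgetter(0))]
print(sums)
# [3, 9, 9]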
There's more than one way to do this, but here's one way:
import numpy as np

data = np.arange(1, 7)
groups = np.array([0, 0, 1, 2, 2, 1])
unique_groups = np.unique(groups)

sums = []
for group in unique_groups:
    sums.append(data[groups == group].sum())
You can vectorize things so that there's no for loop at all, but I'd recommend against it. It becomes unreadable, and will require a couple of 2D temporary arrays, which could require large amounts of memory if you have a lot of data.
Edit: Here's one way you could entirely vectorize. Keep in mind that this may (and likely will) be slower than the version above. (And there may be a better way to vectorize this, but it's late and I'm tired, so this is just the first thing to pop into my head...)
However, keep in mind that this is a bad example... You're really better off (both in terms of speed and readability) with the loop above...
import numpy as np
data = np.arange(1, 7)
groups = np.array([0,0,1,2,2,1])
unique_groups = np.unique(groups)
# Forgive the bad naming here...
# I can't think of more descriptive variable names at the moment...
x, y = np.meshgrid(groups, unique_groups)
data_stack = np.tile(data, (unique_groups.size, 1))
data_in_group = np.zeros_like(data_stack)
data_in_group[x==y] = data_stack[x==y]
sums = data_in_group.sum(axis=1)
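For what it's worth, broadcasting gives a somewhat shorter fully vectorized version (my sketch, not the original answer's); it still materializes a 2D temporary, so the memory caveat above applies:

import numpy as np

data = np.arange(1, 7)
groups = np.array([0, 0, 1, 2, 2, 1])
unique_groups = np.unique(groups)

# (n_groups, n_items) boolean mask, then a broadcasted multiply and row sums
mask = groups == unique_groups[:, None]
sums = (mask * data).sum(axis=1)
# array([3, 9, 9])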
The numpy function bincount was made exactly for this purpose, and I'm sure it will be much faster than the other methods for all sizes of inputs:
import numpy as np

data = [1, 2, 3, 4, 5, 6]
ids = [0, 0, 1, 2, 2, 1]
np.bincount(ids, weights=data)  # returns array([3., 9., 9.]) as float64
The i-th element of the output is the sum of all the data elements corresponding to "id" i.
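bincount requires the ids to be non-negative integers starting near 0; if your labels don't fit that, one workaround (a sketch with hypothetical string labels, not part of the original answer) is to remap them with np.unique first:

import numpy as np

data = [1, 2, 3, 4, 5, 6]
labels = ['a', 'a', 'b', 'c', 'c', 'b']  # hypothetical non-integer labels

# return_inverse maps each label to its index among the sorted unique labels
uniq, inv = np.unique(labels, return_inverse=True)
sums = np.bincount(inv, weights=data)
# uniq is ['a' 'b' 'c'], sums is array([3., 9., 9.])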
Hope that helps.